A Stitch in “Time”


Central to nearly every dimensional data model is one or more date
dimensions (aka time dimensions).  The benefits of the date (or time) dimension have been discussed at length since the genesis of dimensional modeling  in the 1990’s, so I won’t rehash them here. But for the uninitiated:

A date dimension is a table that contains a single row for each date represented in your fact table.  In addition to the surrogate key and an actual date, they also contain various attributes of dates (ex: day of week, day of month, quarter name, month name, etc) as well as attributes for representing multiple calendars (ex: fiscal calendar, academic calendar, etc.).  While it is true that many of these values can be calculated using built-in dbms date functions, these functions perform slowly and are different for each database vendor.

Relative Time Attributes

One set of attributes that I think are often overlooked are relative time attributes.  Relative to what? Usually to the current date (either today or the last date that the fact tables were updated).  Let’s start with the typical date dimension

dimDates

DateSKey: INTEGER [PK]
CalendarDate: DATE
DateDescription: VARCHAR(40)
Year: VARCHAR(4)
Month: VARCHAR(10)
(various other attributes)

Now if, from this table, I needed to filter on yesterday, it wouldn’t be too difficult to do a little math to find that date for yesterday.

SELECT
  *
FROM
  dimDates
WHERE
  CalendarDate = ( GetDate()-1 )

But what if I needed all the records from last week? How about last quarter?  What if I wanted to compare last quarter to the same quarter from last year?  Some of these calculations can be infuriatingly difficult, and don’t forget that the SQL necessary for SQL Server won’t be the same as the SQL for Oracle (or DB2 or mySQL…).

Enter; Relative time attributes. Let’s start with the most basic, RelativeCalendarDate:

dimDates

DateSKey: INTEGER [PK]
CalendarDate: DATE
DateDescription: VARCHAR(40)
Year: VARCHAR(4)
Month: VARCHAR(10)
(various other attributes)
RelativeCalendarDate: INTEGER

This integer column will be populated with a 0 for the current date, -1 for yesterday, -2 for the date before, and so on.  For handling future dates, it would also have a 1 for tomorrow, 2 for the day after, and so on. So every row in the table will have a number representing the number of days from to day, negative going back in time, positive going forward.

You would then add similar relative attributes for every level of granularity you represent in the date dimension:

dimDates

DateSKey: INTEGER [PK]
CalendarDate: DATE
DateDescription: VARCHAR(40)
Year: VARCHAR(4)
Month: VARCHAR(10)
(various other attributes)
RelativeCalendarDate: INTEGER
RelativeCalendarWeek: INTEGER
RelativeCalendarMonth: INTEGER
RelativeCalendarQuarter: INTEGER
RelativeCalendarYear: INTEGER
RelativeFiscalYear: INTEGER

So, similar to RelativeCalendarDate, RelativeCalendarWeek would have 0 for all 7 rows in the current week, -1 for all 7 rows in last week, -2 for all 7 rows in the prior week, +1 for all 7 rows in next week, etc..

An Example

Assuming that today was July 2, 2010, your week starts on Mondays, and your fiscal calendar begins in July, your table would look something like this:

Calendar

Date

Date
Description
(var.) Relative

Calendar

Date

Relative

Calendar

Week

Relative

Calendar

Month

Relative

Calendar

Quarter

Relative

Calendar

Year

Relative

Fiscal

Year

6/25/2010 Friday,
June 25, 2010
-7 -1 -1 -1 0 -1
6/26/2010 Saturday,
June 26, 2010
-6 -1 -1 -1 0 -1
6/27/2010 Sunday, June 27, 2010 -5 -1 -1 -1 0 -1
6/28/2010 Monday, June 28, 2010 -4 0 -1 -1 0 -1
6/29/2010 Tuesday, June 29, 2010 -3 0 -1 -1 0 -1
6/30/2010 Wednesday,
June 30, 2010
-2 0 -1 -1 0 -1
7/1/2010 Thursday, July 01, 2010 -1 0 0 0 0 0
7/2/2010 Friday, July 02, 2010 0 0 0 0 0 0
7/3/2010 Saturday, July 03, 2010 1 0 0 0 0 0
7/4/2010 Sunday, July 04, 2010 2 0 0 0 0 0
7/5/2010 Monday, July 05, 2010 3 1 0 0 0 0
7/6/2010 Tuesday, July 06, 2010 4 1 0 0 0 0
7/7/2010 Wednesday, July 07, 2010 5 1 0 0 0 0
7/8/2010 Thursday, July 08, 2010 6 1 0 0 0 0
7/9/2010 Friday, July 09, 2010 7 1 0 0 0 0

With the addition of relative time attributes to your date dimension, comparing ranges of dates at any level of granularity becomes simple. For example, let’s compare last month to the same month from last year.

SELECT
  D.Year,
  D.Month,
  SUM(F.Measure1)
FROM
  FactTable F
  INNER JOIN dimDates D ON F.DateSKey = D.DateSKey
WHERE
  D.RelativeCalendarMonth IN (-1, -13)
GROUP BY
  D.Year,
  D.Month

Keeping Them Up-To-Date

Of course, to make the relative dates useful, they’ll need to be updated every day.  Below we see a SQL statement that will take care of this for SQL Server. You can run this on a nightly schedule, preferably right at midnight if your warehouse is available 24 hours a day.  This SQL can be adapted to work with any DBMS.

UPDATE
  dimDates
SET
  RelativeCalendarYear = DATEDIFF(yyyy, GETDATE(), CalendarDate), -- RelativeYears
  RelativeCalendarQuarter = DATEDIFF(q, GETDATE(), CalendarDate), -- RelativeQuarters
  RelativeCalendarMonth = DATEDIFF(m, GETDATE(), CalendarDate), -- RelativeMonths
  RelativeCalendarWeek = DATEDIFF(ww, GETDATE(), CalendarDate), -- RelativeWeeks
  RelativeCalendarDay = DATEDIFF(d, GETDATE(), CalendarDate) -- RelativeDays
WHERE
  DateSKey > 0

You’ll note the Where clause at the end.  In my date dimensions, I have rows with negative surrogate key values which represent such things as “No Date” (eliminating the need for nulls), “Unknown Date”  (another replacement for null values) and “Invalid Date” (because some transactional systems I’ve pulled from allow 31
days in February).  If you don’t have rows with negative surrogate keys, then the Where clause wouldn’t be necessary for you.

Things To Try

  • The concept should work well for even more granular time dimensions.  If shifts are part of your time dimension, or even hours, then you should be able to create relative time attributes for these periods of time as well.
  • I had considered adding named attributes as well (ex. RelativeCalendarDate = -1, RelativeCalendarDateName = “Yesterday”), but I was unable to come up with a comprehensive naming scheme. If you can think of one, please let me know.

Conclusion

As you can see, relative time attributes are easy to add to any data warehouse and are simple to maintain. They’ll reduce your SQL development time when working with dates, and in many cases they’ll improve the performance of your queries. They are not entirely intuitive, however with a brief explanation, your query developers will pick up the concept easily.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: