Central to nearly every dimensional data model is one or more date
dimensions (aka time dimensions). The benefits of the date (or time) dimension have been discussed at length since the genesis of dimensional modeling in the 1990’s, so I won’t rehash them here. But for the uninitiated:
A date dimension is a table that contains a single row for each date represented in your fact table. In addition to the surrogate key and an actual date, they also contain various attributes of dates (ex: day of week, day of month, quarter name, month name, etc) as well as attributes for representing multiple calendars (ex: fiscal calendar, academic calendar, etc.). While it is true that many of these values can be calculated using built-in dbms date functions, these functions perform slowly and are different for each database vendor.
Relative Time Attributes
One set of attributes that I think are often overlooked are relative time attributes. Relative to what? Usually to the current date (either today or the last date that the fact tables were updated). Let’s start with the typical date dimension
dimDates
DateSKey: INTEGER [PK] |
CalendarDate: DATE |
DateDescription: VARCHAR(40) |
Year: VARCHAR(4) |
Month: VARCHAR(10) |
(various other attributes) |
Now if, from this table, I needed to filter on yesterday, it wouldn’t be too difficult to do a little math to find that date for yesterday.
SELECT
*
FROM
dimDates
WHERE
CalendarDate = ( GetDate()-1 )
|
But what if I needed all the records from last week? How about last quarter? What if I wanted to compare last quarter to the same quarter from last year? Some of these calculations can be infuriatingly difficult, and don’t forget that the SQL necessary for SQL Server won’t be the same as the SQL for Oracle (or DB2 or mySQL…).
Enter; Relative time attributes. Let’s start with the most basic, RelativeCalendarDate:
dimDates
DateSKey: INTEGER [PK] |
CalendarDate: DATE |
DateDescription: VARCHAR(40) |
Year: VARCHAR(4) |
Month: VARCHAR(10) |
(various other attributes) |
RelativeCalendarDate: INTEGER |
This integer column will be populated with a 0 for the current date, -1 for yesterday, -2 for the date before, and so on. For handling future dates, it would also have a 1 for tomorrow, 2 for the day after, and so on. So every row in the table will have a number representing the number of days from to day, negative going back in time, positive going forward.
You would then add similar relative attributes for every level of granularity you represent in the date dimension:
dimDates
DateSKey: INTEGER [PK] |
CalendarDate: DATE |
DateDescription: VARCHAR(40) |
Year: VARCHAR(4) |
Month: VARCHAR(10) |
(various other attributes) |
RelativeCalendarDate: INTEGER |
RelativeCalendarWeek: INTEGER |
RelativeCalendarMonth: INTEGER |
RelativeCalendarQuarter: INTEGER |
RelativeCalendarYear: INTEGER |
RelativeFiscalYear: INTEGER |
So, similar to RelativeCalendarDate, RelativeCalendarWeek would have 0 for all 7 rows in the current week, -1 for all 7 rows in last week, -2 for all 7 rows in the prior week, +1 for all 7 rows in next week, etc..
An Example
Assuming that today was July 2, 2010, your week starts on Mondays, and your fiscal calendar begins in July, your table would look something like this:
Calendar
Date |
Date
Description |
(var.) |
Relative
Calendar
Date |
Relative
Calendar
Week |
Relative
Calendar
Month |
Relative
Calendar
Quarter |
Relative
Calendar
Year |
Relative
Fiscal
Year |
6/25/2010 |
Friday,
June 25, 2010 |
… |
-7 |
-1 |
-1 |
-1 |
0 |
-1 |
6/26/2010 |
Saturday,
June 26, 2010 |
… |
-6 |
-1 |
-1 |
-1 |
0 |
-1 |
6/27/2010 |
Sunday, June 27, 2010 |
… |
-5 |
-1 |
-1 |
-1 |
0 |
-1 |
6/28/2010 |
Monday, June 28, 2010 |
… |
-4 |
0 |
-1 |
-1 |
0 |
-1 |
6/29/2010 |
Tuesday, June 29, 2010 |
… |
-3 |
0 |
-1 |
-1 |
0 |
-1 |
6/30/2010 |
Wednesday,
June 30, 2010 |
… |
-2 |
0 |
-1 |
-1 |
0 |
-1 |
7/1/2010 |
Thursday, July 01, 2010 |
… |
-1 |
0 |
0 |
0 |
0 |
0 |
7/2/2010 |
Friday, July 02, 2010 |
… |
0 |
0 |
0 |
0 |
0 |
0 |
7/3/2010 |
Saturday, July 03, 2010 |
… |
1 |
0 |
0 |
0 |
0 |
0 |
7/4/2010 |
Sunday, July 04, 2010 |
… |
2 |
0 |
0 |
0 |
0 |
0 |
7/5/2010 |
Monday, July 05, 2010 |
… |
3 |
1 |
0 |
0 |
0 |
0 |
7/6/2010 |
Tuesday, July 06, 2010 |
… |
4 |
1 |
0 |
0 |
0 |
0 |
7/7/2010 |
Wednesday, July 07, 2010 |
… |
5 |
1 |
0 |
0 |
0 |
0 |
7/8/2010 |
Thursday, July 08, 2010 |
… |
6 |
1 |
0 |
0 |
0 |
0 |
7/9/2010 |
Friday, July 09, 2010 |
… |
7 |
1 |
0 |
0 |
0 |
0 |
With the addition of relative time attributes to your date dimension, comparing ranges of dates at any level of granularity becomes simple. For example, let’s compare last month to the same month from last year.
SELECT
D.Year,
D.Month,
SUM(F.Measure1)
FROM
FactTable F
INNER JOIN dimDates D ON F.DateSKey = D.DateSKey
WHERE
D.RelativeCalendarMonth IN (-1, -13)
GROUP BY
D.Year,
D.Month
|
Keeping Them Up-To-Date
Of course, to make the relative dates useful, they’ll need to be updated every day. Below we see a SQL statement that will take care of this for SQL Server. You can run this on a nightly schedule, preferably right at midnight if your warehouse is available 24 hours a day. This SQL can be adapted to work with any DBMS.
UPDATE
dimDates
SET
RelativeCalendarYear = DATEDIFF(yyyy, GETDATE(), CalendarDate), -- RelativeYears
RelativeCalendarQuarter = DATEDIFF(q, GETDATE(), CalendarDate), -- RelativeQuarters
RelativeCalendarMonth = DATEDIFF(m, GETDATE(), CalendarDate), -- RelativeMonths
RelativeCalendarWeek = DATEDIFF(ww, GETDATE(), CalendarDate), -- RelativeWeeks
RelativeCalendarDay = DATEDIFF(d, GETDATE(), CalendarDate) -- RelativeDays
WHERE
DateSKey > 0
|
You’ll note the Where clause at the end. In my date dimensions, I have rows with negative surrogate key values which represent such things as “No Date” (eliminating the need for nulls), “Unknown Date” (another replacement for null values) and “Invalid Date” (because some transactional systems I’ve pulled from allow 31
days in February). If you don’t have rows with negative surrogate keys, then the Where clause wouldn’t be necessary for you.
Things To Try
- The concept should work well for even more granular time dimensions. If shifts are part of your time dimension, or even hours, then you should be able to create relative time attributes for these periods of time as well.
- I had considered adding named attributes as well (ex. RelativeCalendarDate = -1, RelativeCalendarDateName = “Yesterday”), but I was unable to come up with a comprehensive naming scheme. If you can think of one, please let me know.
Conclusion
As you can see, relative time attributes are easy to add to any data warehouse and are simple to maintain. They’ll reduce your SQL development time when working with dates, and in many cases they’ll improve the performance of your queries. They are not entirely intuitive, however with a brief explanation, your query developers will pick up the concept easily.