I have a table with the following info:
Task Start Finish Sleep 01-oct-04 19:00 02-Oct-04 07:00 Awake 02-oct-04 07:00 02-Oct-04 20:00 Sleep 02-oct-04 20:00 03-Oct-04 02:00
Not a very exciting life really!
If I were to report what I did in a 24 hour period (midnight to midnight), what would be the best way to obtain the data below:
01-Oct-04 Sleep 5hrs 02-Oct-04 Awake 13hrs 02-Oct-04 Sleep 11hrs 03-Oct-04 Sleep 2hrs
This is a really tough question. The following explanation does not include any of the wrong answers or false starts that I struggled with while trying to solve it. What you see is the result of many hours of comprehensive testing, as well as many more happy hours trying to write up the answer in such a way that it makes sense, and yet requires fewer than eleventy-seven parts. It was tough getting it down to 3.
To begin, let's look at the desired results and work backwards. We will need to do a GROUP BY on date and task, with a SUM() for the portion of elapsed time of any task that occurred on that date. Notice that the data contains datetimes. However, we need to GROUP BY date values, not a datetime values.
To help understand this type of problem, which involves datetime ranges, it helps to construct a timeline diagram to illustrate the various start and finish scenarios relative to a given date.
taskday nextday | | 1 <--S====F | | | | 2 <------S==|=F | | | 3 <------S==|============|=F | | 4 |<--S====F | | | 5 |<--------S==|=F | | 6 | |<--S====F | |
In this diagram, the S====F's are the tasks. Each task has start (S) and finish (F) datetime values. The taskday line represents time 00:00:00, i.e. midnight, while nextday is midnight of the next day. Midnight is what you get when you remove (or "zero out" if you prefer) the time portion of a datetime value. There's an arrow pointing from each start datetime value to its corresponding date, i.e. midnight value.
So relative to taskday, the diagram shows 6 scenarios which a task can have, based on its start and finish datetime values. Remember, we are going to use GROUP BY date, so we want to sum up, when grouping, all tasks relative to each date, whenever any part of the elapsed time of the task occurred on that date.
For a given taskday, scenarios 1 and 6 can be ignored, as they will be included entirely within previous or next days. In scenarios 2 through 5, then, some portion of the task elapsed time occurs between taskday and nextday. These portions are shown in the diagram in bold red.
The following conditions will select only scenarios 2 through 5:
where start < nextday and finish > taskday
Scenario 6 is excluded because its start is not less than nextday. Scenario 1 is excluded because its finish is not greater than taskday. Neat, eh?
Now to consider grouping by date. We need to have every date for which even a portion of any task occurs on that date. So we need at least the date of every start datetime in the data.
Secondly, to cover all possible cases, we also need, somehow, all dates that occur between the start and finish of any task, i.e. cases similar to scenario 3 but which have multiple dates separating start and finish.
Finally, we need the date corresponding to each finish, to include the "tail end" portions of scenarios 3 and 5, i.e. to include these portions when they become scenario 2 when nextday becomes taskday.
Please proceed to Sum elapsed hours by task by date, part 2 of 3.
This was first published in November 2004