Assume that we have the following tables of Departments, Employee, and EmployeeSickDays (schema somewhat abridged for brevity):
CREATE TABLE dbo.Departments ( DepartmentId INT NOT NULL PRIMARY KEY, DepartmentName VARCHAR(35) NOT NULL UNIQUE ) CREATE TABLE dbo.Employees ( EmployeeId INT NOT NULL PRIMARY KEY, DepartmentId INT NOT NULL REFERENCES dbo.Departments (DepartmentId), EmployeeName VARCHAR(50) NOT NULL ) CREATE TABLE dbo.EmployeeSickDays ( EmployeeId INT NOT NULL REFERENCES dbo.Employees (EmployeeId), SickDate SMALLDATETIME NOT NULL, CONSTRAINT PK_EmployeeSickDays PRIMARY KEY (EmployeeId, SickDate), CONSTRAINT CK_SickDate_Day CHECK (SickDate = DATEADD(dd, 0, DATEDIFF(dd, 0, SickDate))) )
These table definitions are relatively straightforward: A table of departments, each with a unique surrogate identifier. A table of employees, each row of which is related to a department, forming the one-to-many relationship between departments and employees (a department can have many employees; an employee is a member of exactly one department). And finally, a table of employee sick days, with a composite primary key consisting of the employee's surrogate identifier and the sick date.
It's important to notice the CHECK constraint, CK_SickDate_Day, which ensures that any date in the SickDate column will have a time of 00:00:00. This is especially important because if dates were allowed to have other time portions, an employee could be marked sick multiple times on the same day! Remember that, even with the best-written queries, your data will only ever be as good as the constraints that maintain its integrity.
The challenge -- now that we've ensured quality data -- is to write the query that will return the names and departments of employees who've been out of the office more than six times in the last six months. Luckily, this task is also fairly straightforward. The following query does the job:
SELECT E.EmployeeName, D.DepartmentName, COUNT(*) AS NumberDaysOut FROM dbo.Departments D JOIN dbo.Employees E ON E.DepartmentId = D.DepartmentId JOIN dbo.EmployeeSickDays ES ON ES.EmployeeId = E.EmployeeId WHERE ES.SickDate >= DATEADD(mm, -6, GETDATE()) GROUP BY E.EmployeeId, E.EmployeeName, D.DepartmentName HAVING COUNT(*) >= 6
So how does this query work? Starting in the FROM clause: Each table is joined based on foreign keys that were set up as per the DDL listed above. Departments to Employees based on the DepartmentId, and Employees to EmployeeSickDays based on the EmployeeId. Next, to the WHERE clause, where we filter the SickDate column of the EmployeeSickDays table, making sure to only return dates that occur after the current date minus six months (i.e., those dates within the last six months.)
The results are grouped by the EmployeeId, EmployeeName and DepartmentName -- which means that for each distinct combination of values found in the results, an aggregation will occur. In this case, that aggregation is a COUNT of rows -- this will count the number of rows in the EmployeeSickDays table for each employee and department, that have rows with dates in the last six months. Note that although EmployeeId does not appear in the SELECT list, it's still being used for grouping. Why? Insurance! What if there are two employees named John Smith in the Sales department?
Finally, the HAVING clause checks to make sure that at least six rows were counted for each aggregation, so the query won't return rows for employees who are under the threshold.
This query will work fine in older versions of SQL Server, as well as in SQL Server 2005. There are no new tricks in SQL Server 2005 for situations like these. The best trick in any developer's toolkit is always having a solid understanding of foundations!
Do you have comments on this Ask the Expert Q&A? Let us know.
This was first published in May 2005