Q

NULL columns in a LEFT OUTER JOIN

When I run the following query I am getting 10 rows:

SELECT A.COL1,A.COL_BEG_DT,A.COL_END_DT
FROM SCHEMA1.TABL_1 A
INNER JOIN
SCHEMA1.TABL_2 B
ON A.ID = B.ID
LEFT OUTER JOIN
SCHEMA1.TABL_C
ON A.ID = C.ID
LEFT OUTER JOIN
SCHEMA1.TABL_D
ON A.ID = D.ID
WHERE A.COL_BEG_DT < 1999
AND B.COL_CD = 'X'
AND C.ID IS NULL
AND D.ID IS NULL

If I change the NULL checking clause as below, then my result set is 33 rows:

SELECT A.COL1,A.COL_BEG_DT,A.COL_END_DT
FROM SCHEMA1.TABL_1 A
INNER JOIN
SCHEMA1.TABL_2 B
ON A.ID = B.ID
LEFT OUTER JOIN
SCHEMA1.TABL_C
ON A.ID = C.ID

AND C.ID IS NULL
LEFT OUTER JOIN
SCHEMA1.TABL_D
ON A.ID = D.ID
AND D.ID IS NULL
WHERE A.COL_BEG_DT < 1999
AND B.COL_CD = 'X'

I am not sure why the difference in result sets due to the change in position of NULL checking. Your help regarding this is much appreciated.

To understand why this is happening, you have to know how a left outer join works. In a left outer join, all the rows from the left table(s) will be returned, together with matching rows from the right table, if any.

It is the "matching" specifications that are important here.

In your first query, the matching specifications are given by the ON conditions. You are specifically looking for rows from the C and D tables with A.ID=C.ID and A.ID=D.ID. Any A and B rows which do not have matching C or D rows will be returned, but all columns in the result set which would have come from C or D will be NULL. Then, when you test for C.ID IS NULL and D.ID IS NULL in the WHERE clause, you are in effect asking for only the result set rows which had neither a matching C nor a matching D row. It appears that there are 10 A and B rows which have no matching C or D row.

In your second query, the matching specifications given by the ON conditions are slightly different. In order to satisfy the ON conditions, the C.ID and D.ID columns must be NULL. This is highly unlikely, given that they are probably keys. Furthermore, it means that in order for there to be a match, A.ID would have to be equal to NULL. And as we all know, nothing is equal to NULL, not even another NULL. So since there is no matching row in the C or D table with a NULL ID, then there is no match. This will occur for every A and B, not just for those which have no non-null match, but for all of them. This is why the second query returns more rows. But note that for all of the returned rows, the columns in the result set that would've come from C and D will be NULL, because no matching rows were found.

This was first published in November 2004

Dig deeper on SQL Server Database Modeling and Design

Pro+

Features

Enjoy the benefits of Pro+ membership, learn more and join.

Have a question for an expert?

Please add a title for your question

Get answers from a TechTarget expert on whatever's puzzling you.

You will be able to add details on the next page.

0 comments

Oldest 

Forgot Password?

No problem! Submit your e-mail address below. We'll send you an email containing your password.

Your password has been sent to:

-ADS BY GOOGLE

SearchBusinessAnalytics

SearchDataCenter

SearchDataManagement

SearchAWS

SearchOracle

SearchContentManagement

SearchWindowsServer

Close