When I run the following query I am getting 10 rows:
SELECT A.COL1,A.COL_BEG_DT,A.COL_END_DT FROM SCHEMA1.TABL_1 A INNER JOIN SCHEMA1.TABL_2 B ON A.ID = B.ID LEFT OUTER JOIN SCHEMA1.TABL_C ON A.ID = C.ID LEFT OUTER JOIN SCHEMA1.TABL_D ON A.ID = D.ID WHERE A.COL_BEG_DT < 1999 AND B.COL_CD = 'X' AND C.ID IS NULL AND D.ID IS NULL
If I change the NULL checking clause as below, then my result set is 33 rows:
SELECT A.COL1,A.COL_BEG_DT,A.COL_END_DT FROM SCHEMA1.TABL_1 A INNER JOIN SCHEMA1.TABL_2 B ON A.ID = B.ID LEFT OUTER JOIN SCHEMA1.TABL_C ON A.ID = C.ID AND C.ID IS NULL LEFT OUTER JOIN SCHEMA1.TABL_D ON A.ID = D.ID AND D.ID IS NULL WHERE A.COL_BEG_DT < 1999 AND B.COL_CD = 'X'
I am not sure why the difference in result sets due to the change in position of NULL checking. Your help regarding this is much appreciated.
To understand why this is happening, you have to know how a left outer join works. In a left outer join, all the rows from the left table(s) will be returned, together with matching rows from the right table, if any.
It is the "matching" specifications that are important here.
In your first query, the matching specifications are given by the ON conditions. You are specifically looking for rows from the C and D tables with A.ID=C.ID and A.ID=D.ID. Any A and B rows which do not have matching C or D rows will be returned, but all columns in the result set which would have come from C or D will be NULL. Then, when you test for C.ID IS NULL and D.ID IS NULL in the WHERE clause, you are in effect asking for only the result set rows which had neither a matching C nor a matching D row. It appears that there are 10 A and B rows which have no matching C or D row.
In your second query, the matching specifications given by the ON conditions are slightly different. In order to satisfy the ON conditions, the C.ID and D.ID columns must be NULL. This is highly unlikely, given that they are probably keys. Furthermore, it means that in order for there to be a match, A.ID would have to be equal to NULL. And as we all know, nothing is equal to NULL, not even another NULL. So since there is no matching row in the C or D table with a NULL ID, then there is no match. This will occur for every A and B, not just for those which have no non-null match, but for all of them. This is why the second query returns more rows. But note that for all of the returned rows, the columns in the result set that would've come from C and D will be NULL, because no matching rows were found.
This was first published in November 2004