The following tip was excerpted from Chapter 8, 'Building Fast-Performing Data Models,' from the book "Beginning...
Database Design" by Gavin Powell, courtesy of WROX Publishing. Click here for the complete collection of book excerpts.
Filtering with the WHERE Clause
The WHERE clause can be used either to include wanted records or exclude unwanted records (or both). The WHERE clause can be built in specific ways, allowing for faster execution of SQL code. Use of the WHERE clause can be applied to tune SQL statements simply by attempting to match WHERE clause specifications to indexes, sorted orders, and physical ordering in tables. In other words, filter according to how the metadata is constructed.
The WHERE clause is used to filter records and can, therefore, be placed in all three of SELECT, UPDATE, and DELETE commands.
There are numerous points to keep in mind when building efficient filtering:
- Single record searches: The best filters utilize a primary key on a single table, preferably finding a single record only, or a very small number of records. This query finds the only author with primary key identifier as 10:
SELECT * FROM AUTHOR WHERE AUTHOR_ID = 10;
- Record range searches: Using the >, >=, <, and <= operators executes range searching. Range searches are not as efficient as using equality with an = operator. A group of rows rather than a single row are targeted. Range searching can still use indexing and is fairly efficient. This query finds the range of all author records, with identifiers between 5 and 10, inclusive:
SELECT * FROM AUTHOR WHERE AUTHOR_ID >= 5 AND AUTHOR_ID <= 10;
- Negative WHERE clauses: Negative filters using NOT, !=, or <> (an operator that is different in different databases) try to find something that is not there. Indexes are ignored and the entire table is read. This query reads all records, excluding only the record with author identifier as 10:
SELECT * FROM AUTHOR WHERE AUTHOR_ID != 10;
- The LIKE operator: Beware of the LIKE operator. It usually involves a full scan of a table and ignores indexing. If searching for a small number of records, this could be extremely inefficient. When searching for 10 records in 10 million, it is best to find those 10 records only using something like equality, and not pull them from all 10 million records, because all those 10 million records are read. The following query finds all authors with the vowel "a" in their names, reading the entire table:
SELECT * FROM AUTHOR WHERE NAME LIKE '%a%';
- Functions in the WHERE clause: Any type of functional expression used in a WHERE clause must be used carefully. Functions are best not to be used where you expect a SQL statement to use an index. In the following query, the function will prohibit use on an index created on the PRINT_DATE field:
SELECT * FROM EDITION WHERE TO_CHAR(PRINT_DATE,'DD-MON-YYYY')='01-JAN-2005';
- Utilize the index by not applying the function to a field in a table, but using the literal value on the opposite side of the expression:
SELECT * FROM EDITION WHERE PRINT_DATE=TO_DATE('01-JAN-2005','DD-MON-YYYY');
- Small and large tables: Very small tables are often more efficiently read by reading only the table, and not the index plus the table. The same applies when large portions of a single table are read. If enough of the table is read at once, the index may as well be ignored. Reading an index involves scanning an index and then passing pointers through to a table, scanning the table with index values found. When enough of the table is read, index scanning activity to find table records can become more time-consuming than reading only the table (ignoring the index).
- Composite index field sequence: In many databases, the sequence of fields in a WHERE clause can determine if an index is matched or missed. For example, create a composite index with three fields, indexed as follows:
CREATE INDEX CAK_EDITION_1 ON EDITION (PUBLISHER_ID, PUBLICATION_ID, ISBN);
- When a table is accessed with the following WHERE clause, the index will be used because all fields are included, and in the indexed field sequence:
SELECT ... WHERE PUBLISHER_ID=1 AND PUBLICATION=10 AND ISBN='1555583059';
- When a table is accessed with the following WHERE clauses, the composite index may not be used (depending on the database):
SELECT ... WHERE ISBN='1555583059' AND PUBLISHER_ID=1 AND PUBLICATION=10;
SELECT ... WHERE ISBN='1555583059';
SELECT ... WHERE PUBLISHER_ID=1;
Some databases may allow index use of composite indexes for the previous queries, but it is unusual.
The first query above does not match the sequence of indexed fields. The second query contains only the last field in the index. The last query contains only the first field in the index.
- IN and EXISTS set operators: IN is often used to test a single value against a list of literal values, as in the following query:
SELECT * FROM AUTHOR WHERE AUTHOR_ID IN (1,2,3,4,5);
- EXISTS is used to check against a dynamic set of values, such as that produced by a subquery, as in the following example:
SELECT * FROM AUTHOR WHERE EXISTS
(SELECT AUTHOR_ID FROM PUBLICATION);
- IN and EXISTS can be efficient, depending on how they are used. In other words, performing an IN check against a non-indexed field forces a full table scan. The IN operator query above checks literal values against a primary key and, thus, uses an index. The EXISTS operator full scans two tables. This can be made more efficient by incorporating a WHERE clause in both queries and by using a correlation between the calling query and subquery.
- Using AND and OR: AND and OR allow logical combination of multiple expressions, such as in a WHERE clause:
SELECT * FROM AUTHOR
WHERE NAME LIKE '%a%' OR (AUTHOR_ID >= 5 AND AUTHOR_ID <= 10);
Matching indexes where AND and OR operators are used is important because anything missing in an index field use could result in a full table scan.
Some databases allow for use of specialized built-in functions, very similar to regular programming CASE statements. Sometimes these CASE statement, much like functions, can be used to more efficiently replace expressions logically joined with AND and OR operators. The UNION clause is another option to consider. UNION merges two separate queries into a single set of records and can be equivalent to an OR logical operation.
The above tip was excerpted from Chapter 8, 'Building Fast-Performing Data Models,' from the book "Beginning Database Design" by Gavin Powell, courtesy of WROX Publishing. Click here for the complete collection of book excerpts.