How to Apply Indexes in the Real World
There are various ways in which indexes can be applied, generally dependent on the function of the table for which an index is created:
- No index: Data in a table is heap structured (in a heap or disorganized pile). Both small and large tables can benefit from having no indexing. Small tables may be best accessed as a whole, rather than with table and index, because they access a small amount of physical space. Large tables could very well only be read in their entirety based on application requirements. Why index a table when indexes are never used? It is even sometimes beneficial to drop referential integrity keys and indexes.
- Static data: Small static data tables are often better off without indexing. Be aware of two potential problems: removing foreign key indexes can cause serious locking problems that can drastically hamper performance; and highly complex joins with many tables usually benefit from all tables having indexes, particularly unique primary key indexes, even on small static tables.
- Dynamic data: Dynamic data is data that changes all the time (transactional data). These indexes are changed frequently, are subject to overflow, and require frequent refreshing. Be acutely aware of the type of index used for dynamic data. The default index type for a particular database is usually the best option for dynamic data. This index type is usually some form of binary tree indexing structure. Other index types involving pre-calculated structures such as ISAM, hash tables, and bitmaps will overflow almost immediately when subject to change.
Overflow of an index is seriously ugly for performance. Index overflow happens to certain types of indexes where any changes to data in tables cannot be slotted into the proper physical point in the original structure of the index. This is because of the way in which certain index types are constructed. The result of overflow and a lot of data changes could be query I/O quite literally bouncing all over disk storage trying to find data. This can cause serious performance problems.
- Read only reporting index: Unlike dynamic data indexing, read-only data is much more flexible with respect to index types, because data is not subject to change. Read-only indexing is specially designed for read only queries, often in data warehouses. Types of indexing in this category are pre-built structures subject to overflow but highly efficient when used for read only I/O activity. Index types proficient as read-only indexes (bitmaps, clusters, hash tables) are ineffective in highly dynamic environments.
- Unique and non-unique indexes: A unique index is an index allowing only a single value in a table. Be careful when creating unique indexes because every insertion or update to a uniquely indexed field (or fields) requires a scan of the entire index space (to verify a value as unique). Anon-unique index allows more than one record with the same value and is typical of foreign key indexes. Unique indexing is better for performance and is typical of primary keys. Aunique index is better for performance because subset index searching can be used to find single records, in theory making for less I/O, and less traversal through index structures.
- Single field versus multiple field indexes: Multiple field indexes are generally known as composite field indexes. Single field indexes are more efficient than composite multiple field indexes. The simple fact is the fewer fields, the less to search through. Also, fewer fields means the index is relatively much smaller than its parent table. The bigger the relative difference in size between table and index, the more effective the index is at reducing I/O, especially for larger tables.
- Datatypes to index: Integers are always best. An integer is a whole number with no digits to the right of the decimal point. Any other datatypes are nearly always flexible in terms of both length and content. Fixed-length strings are not quite as efficient as integers but they can be good options for index construction if the strings are a few characters, such as with use of codes. Quite often, codes are used to represent structures, such as code names for states in the United States. For example, NY represents the state of New York and CA represents California. Numbers are still better because there are 10 possible digits. Character strings have 26 different variations for the letters of the alphabet, plus 10 possible digits (strings are alphanumeric and can contain numerals as well), plus all sorts of punctuation and other odd characters.
- Sacrificing referential integrity for performance: Sometimes this is a good idea, but most often it is not. Dropping of foreign key indexing can cause serious locking issues. Referential integrity uses primary and foreign keys to validate relationships between records in related tables. If there is a lot of validation occurring, and a table containing a foreign key has no foreign key index, the child table could be frequently fully scanned, resulting in huge competition for the entire table rather than just the index.
- Optimizing with alternate indexes: Alternate indexing is often referred to as secondary indexing. Alternate indexing includes any indexes created against tables in a database model, which are not part of referential integrity constraints. Quite often, the need for alternate indexing is a mismatch between the database model and functionality required by applications. Excessive alternate indexing could indicate the database model not catering to application needs. Reporting or data warehousing application requirements will usually cause further demand for alternate indexing.
The above tip was excerpted from Chapter 8, 'Building Fast-Performing Data Models,' from the book "Beginning Database Design" by Gavin Powell, courtesy of WROX Publishing. Click here for the complete collection of book excerpts.
This was first published in April 2006