Problem solve Get help with specific problems with your technologies, process and projects.

Horizontal partitioning of large databases

Learn about the details and considerations of performing horizontal partitioning of large databases.

Last time, we looked at partitioning large databases by vertically partitioning the dataset into different tables contained in other databases. This scheme offers you logical unification through views, but operational complexity managing data references, additional statements and backups. Horizontal partitioning schemes represent the alternative.

You can horizontally partition your database by splitting a table into complete datasets and placing those datasets into other databases. Typically a field is used to create that separation, such as an ID field, a data field, location and so on. You can horizontally partition your data using a logical separation such as a year or an office location. Each partition of the data has the same structure as the original database. You would use horizontal partitioning to archive a year's data and remove it from the main table. So whereas both partitioning types create new tables in different databases, in horizontal partitioning you are maintaining your original architecture.

Horizontal partitioning has both advantages and disadvantages. From a performance standpoint, any user requiring access to a record uses the device that has access to the table with that data. You can create multiple paths to the data and cut down on contention and record locking making user access faster. Partitioning data horizontally also lets you physically and logically protect data by limiting access to that data.

Separating data through a logical division is certainly convenient, but it adds the complexity that now your code has to account for where the data is stored when a user requires access. Sometimes that is easy to do, other times it isn't. However, logical data division is certainly easier to access than the alternative method of partitioning your dataset using a hash partition. When you use a hash key to create the partition the hash key normally generated from a random number algorithm provides an even data split, maximizes your performance and lowers your access issues, but makes finding the data's location difficult. Consider what happens when a RAID set fails. You can restore that set, but that requires restoring the entire data set. For a logical partition you know which data was lost and you only need to restore that particular part of your data set.

The other problem with horizontally partitioning a data set is that you have to access the data using something like a UNION operator to create a unified query resultset. You can store the necessary access details into a unifying query, but INSERTs, UPDATEs, and DELETEs are still problematic.

Barrie Sosinsky is president of consulting company Sosinsky and Associates (Medfield MA). He has written extensively on a variety of computer topics. His company specializes in custom software (database and Web related), training and technical documentation.


Dig Deeper on Microsoft SQL Server Performance Monitoring and Tuning