Partitioning is the distribution of a table over multiple subtables that may reside on different databases or servers in order to improve read/write performance. SQL Server partitioning is typically done
Vertical partitioning improves access to data
In a vertically partitioned table, columns are removed from the main table and placed in child tables through a process called denormalization. This type of partitioning allows you to fit more rows on a database page, making tables narrower to improve data-access performance. Therefore, a single I/O operation will return more rows. By vertically partitioning your data, you may have to resort to joins to return the denormalized columns. Carefully consider the impact of vertical partitioning on your overall data access patterns.
SQL Server will, by default, vertically partition text and image datatype columns on the database page. Columns created using the text and image datatype will be stored in separate data pages from the non-text and image columns. That way they take up less space than if you stored the text data in columns using the varchar data type.
Consider the following table:
Create Table Table1
(PK int not null identity constraint pk primary key,
You can store a total of 8096/(4+10+10) = 338 rows per page, ignoring per-row overhead. If you stored textual data in a varchar(8000) column, assuming you have 8 KB of data in this column, you would have one row stored per page. If the majority of your queries do not return the text column, this storage-engine optimization will be highly beneficial to you. If the majority of your queries do return the text column, you may wish to use the text in row option. The table option text in row stores text data in a single database page if the amount of text is less than a configurable threshold.
Horizontal partitioning improves overall read/write performance, sort of
A horizontally positioned table distributes groups of rows into different tables or databases (sometimes residing on different servers) according to a criterion, which may include a primary key value, date column or a location identifier.
Consider an application where 10,000 users must access a single table on a single server: You can expect high levels of read and write contention on this table. To improve overall read performance, you may want to scale out the same table and distribute it across 10 SQL Servers, then load balance them so each of the 10 databases services 1,000 customers. This may seem like a good approach to improve performance -- but there are several inefficiencies:
- Storage usage: The same copy of the data is distributed to each node.
- Data distribution: How will you keep all nodes in sync with the same copy of data?
- DML synchronization: How will you synchronize changes across all nodes?
SQL Server 2005 will offer a peer-to-peer (P2P) replication model designed to distribute uniform sets of data across multiple servers and keep them synchronized. Updates in a node are replicated to all other nodes in this topology: When a node comes back online, it is synchronized with updates that occurred in other nodes participating in the P2P model. This solution is designed to scale out servers in different geographic locations.
However, in most horizontal partitioning implementations, each server or database contains a subset of the data. Prior to SQL Server 7, these implementations were done at the application level where the calling application would make decisions on which table to access based on the SARG (Search Argument). If tables were partitioned within the same database, the T-SQL logic had to account for the partitions so the correct partition was referenced or updated. Such partitioning methods quickly became complex and did not scale well. If you created a view to present a complete results set, all the underlying base tables would have to be referenced.
Distributed partition views
In SQL Server 7, the Query Optimizer would consider check constraints in the execution plan. If you had a check constraint on a column and your SARG referenced this column, only the correct partition would be referenced in the execution plan. Consider a table with a check constraint on a last name that looks like this:
CONSTRAINT [CK_Employee] CHECK ([LastName] > 'a' and [LastName] < 'b')
Such a table would only accept employees with a last name beginning with the letter A. Say you had a distributed topology consisting of 26 tables, one table for each letter, and these tables were referenced in a view of the union of these 26 tables. If you were to issue a query like the following, the Query Optimizer would reference the check constraints for each table and only reference the table containing last names beginning with the letter A.
Select * from EmployeeTableView where LastName ='a'
However, inserts, updates and deletes would have to be performed against the correct base tables. In SQL Server 2000, you could perform DML operations against the view and the correct base table would be referenced.
Partitioning in SQL Server 2005
SQL Server 2005, available in November 2005, introduces several database partitioning enhancements to improve performance and reduce the administrative overhead involved with managing the partitioned data set.
Prior versions of SQL Server required you to manually create the tables to form a partition. In our example above we would have to create 26 tables (one for each letter) to form the Employee partition. In SQL Server 2005, you can create a partition function, then a partitioned table using this function, and SQL Server will handle the distribution of data ranges to the different partitions according to your range function. This will look like a single table. For performance reasons, you would probably want to have the partitioned table span multiple file groups (possibly on different disks or arrays) with each partition allocated to its own file group. To do this, you would create a partition scheme that contains all file groups forming your partition, and then create your table to use this partition scheme instead of the normal file group.
You can also create partitioned indexes. By default, indexes created on a partition will use the same partitioning scheme as the partitioned table. The best resource for how to use the partitioning capabilities of SQL Server 2005 is Kimberly Tripp's white paper.
Both partitioning methods have their strengths and weaknesses. SQL Server 2000 offers several options to partition your database: Vertical partitioning is used to narrow the effective width of your rows through denormalization and careful use of text datatype columns. Horizontal partitioning breaks VLDB (very large database) tables into smaller tables according to ranges. Query Optimizer in SQL Server 7 is aware of check constraints and will only reference the base table that corresponds to the where condition. SQL 2000, on the other hand, allows you to perform DML against the views. And SQL Server 2005 will allow you to scale out using the peer-to-peer replication model, simplifying the creation and administration of a partitioning solution.
About the author: Hilary Cotter has been involved in IT for more than 20 years as a Web and database consultant. Microsoft first awarded Cotter the Microsoft SQL Server MVP award in 2001. Cotter received his bachelor of applied science degree in mechanical engineering from the University of Toronto and subsequently studied both economics at the University of Calgary and computer science at UC Berkeley. He is the author of a book on SQL Server transactional replication and is currently working on books on merge replication and Microsoft search technologies.
More information from SearchSQLServer.com
- Tip: Get normalization rules you must follow
- Tip: Learn how to improve partition views in SQL Server
- Topic: Get tips and
advice on how to improve SQL Server performance
This was first published in September 2005