Want to improve SQL Server performance? Fine-tune your applications
As you develop SQL
Server applications that hundreds of users will tap into at once -- using databases with up to
millions of rows -- it is critical that you make those applications as efficient as possible so
that the hardware can keep up with the workload. This applies to both the database and the
client-side application code. Follow these tips and guidelines on fine-tuning your applications and
you’ll improve SQL
Server performance in no time.
It’s all about indexing
Pretty much everyone with even a little database
Premium Access
Register now for unlimited access to our premium content across our network of over 70 information Technology web sites.
By submitting you agree to receive email from TechTarget and its partners. If you reside outside of the United States, you consent to having your personal data transferred to and processed in the United States.
Privacy
Dig Deeper
-
People who read this also read...
-
This was first published in October 2011
experience agrees
that indexes are crucial for developing efficient database applications. Yet in practice, indexing
gets neglected or isn’t implemented properly. Often, senior-level software engineers develop
applications that lack proper indexes (see “Four steps to proper table indexing” below).
Without proper indexes, SQL Server has to scan all data in the table to find matching rows.
Imagine how difficult it would be to look for certain words in a book without having an index at
the end. You would have to read the book cover to cover every time you needed to find something.
But that’s essentially what SQL Server has to do anytime you look for data in a large table that’s
not indexed. As a rule, you should index columns that are included in the WHERE
clause.
Code matters
Batch-processing and
batch-import applications are used to read and process huge amounts of input data and move it over
to a destination database. It is not unusual for these applications to handle millions of records.
One mistake I have seen developers make in these types of applications is to write the code as if
they were developing regular applications -- handling one update at a time as a self-contained
piece of code.
For example, for each row processed by ADO.NET, an
application starts a while loop: It creates a connection object, connects to the database,
creates a command object, populates it with a newly created collection of parameters and makes a
database call, and it cleans up all these objects as the loop is running. In a batch-processing
application, all that initializing and cleanup is expensive. One time is enough.
For maximum efficiency, pull initialization and cleanup code out of the loop and tighten it as
much as possible. The code that creates and initializes ADO.NET objects like connections, commands
and parameters should not be in the while loop that processes the data. Create and initialize
ADO.NET objects just once, and then modify the parameter values and update the database in the
processing loop. Do the cleanup at the end, outside of the loop.
Remember, when you are processing millions of rows, even a few milliseconds saved makes a
difference. When you multiply that by the astronomical number of processed records, you could be
saving hours of processing time.
Careful what you COUNT(*)
Here is another coding inefficiency developers commit: “If the result of SELECT COUNT(*)
is greater than 0, do something.” If you look at it more closely, you’ll see that what we want is
“If at least one row meets the condition, do something.” The problem with using COUNT(*)
here is that the SQL query will count all the rows that match; instead, we can stop it after
finding the first match using the EXISTS query. So keep this subtle difference in mind and
count everything only when you need the actual count. Use EXISTS when all you need is to
check for the existence of at least one matching row.
Get queries, data types in sync
When writing Transact-SQL and ADO.NET code that uses parameters, it is important that you match
data types in the queries with column data types in the tables. In some cases SQL Server can do an
implicit data type conversion for you (e.g., convert a number to a string). But what you may not
realize is that while the conversion of the data types will not affect the results, SQL Server may
not be able to use the existing index on the column being queried.
For an example, let’s turn to Microsoft’s AdventureWorks sample database. In the Employee table,
there is a column called NationalIDNumber; it is defined in the database as the variable-length
data type NVARCHAR(15). The column is indexed, so SQL Server should be able to perform a
quick lookup if you search for a particular ID. Now look at these two queries that return the same
result:
Query 1
SELECT
* FROM [HumanResources].[Employee]
WHERE NationalIDNumber = N'233069302'
Query 2
SELECT
* FROM [HumanResources].[Employee]
WHERE NationalIDNumber = 233069302
Figure 1 shows the execution plans for both queries. (In each window, “Query cost” shows what
percentage of execution time the query took, and percentages for each step are displayed below the
query lines.)
Figure 1. This execution plan shows that when the data types match (top window), SQL Server
performs an index seek to find matching data. When the data types don’t match in the second query,
SQL Server uses a less efficient index scan.
In the first query, we search using an NVARCHAR
string, N'233069302', so the data type in the query matches the data type in the table.
SQL Server uses the existing index on the NationalIDNumber column and performs an index seek. That
gives us a quick lookup.
In the second query, we search using an integer, 233069302, so there is a type mismatch between
the query and the data type in the table. Subsequently, SQL Server converts the value in each row
in the table from NVARCHAR to INTEGER to perform a comparison, and it scans all
the rows in the table. It still uses the index, because it’s faster to scan all national IDs in the
index pages as opposed to data pages, but a scan requires much more processing to deliver the same
result.
In this particular database, the difference in execution is not very large (54% of total
execution time for both queries versus 46%), but if the table contained millions of rows, we could
be talking about an execution time taking valuable seconds instead of milliseconds. The difference
in the query is subtle, but it’s an easy mistake to make if the programmer assumes the data type in
the table was stored as an integer and doesn’t double check.
Indexing your databases makes the applications faster and more responsive, and efficient coding
cuts down on processing time. When you are using small databases and your front-end applications
aren’t accessed by many concurrent users, inefficiencies aren’t noticeable. But as the data piles
on and your applications become more heavily used, the database and the applications can slow down
to the point where they’re not usable, users will get timeouts and your customers will be unhappy.
Following these coding and database efficiency practices and coding your applications with
scalability in mind will ultimately improve SQL Server performance.
Four steps to proper table indexing
The importance of indexing for speedier applications cannot be stressed enough. Use this
indexing checklist as a guideline for making sure your tables are indexed properly.
- Create a clustered index. To help SQL Server organize data more efficiently, each table
should have a clustered index, which determines the order of data in a table. Choose your ordering
method carefully (by last name, ID, date of birth) because you can only have one clustered
index.
- Create a primary key. Each table should have a primary key to implement entity
integrity. In some cases, you may also want to have a clustered index on the primary key. When you
are doing a range search on dates, for example, it’s best to create a clustered index on the date
column.
- Create unique constraints. If you have a group of columns in which each row should have
a distinct value, enforce this requirement in the table design. You could implement it as a unique
index or a unique constraint. The latter option is encouraged in SQL Server Books Online because it
makes the objective clearer.
- Create indexes on foreign keys. You’ll want to create an index on your foreign keys so
that child records can be found quickly. But don’t blindly create an index on each foreign key in
the table. Take column selectivity into account and index only columns that will contain numerous
distinct values. For example, indexing the OrderID column in the OrderDetails table makes sense,
because the column is selective; it will have as many distinct values as the number of orders. But
if you have a column like OrderStatusID, you will only have a few distinct values across the table,
so the index won’t likely be used.
ABOUT THE AUTHOR
Roman Rehak is principal database architect at MyWebGrocerin Colchester, Vt. He specializes in SQL Server
development, database performance tuning, ADO.NET and writing database tools. He contributes to
Visual Studio Magazine, SQL Server Magazine and other publications and presents at
user groups and conferences in the U.S. and Canada.
Disclaimer:
Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
Join the conversationComment
Share
Comments
Results
Contribute to the conversation