When application users start experiencing performance problems, they typically call the database administrator...
and ask if there is a problem with the database. Interestingly, most people will look at memory, CPU and disk utilization, but not at the particular code modules that cause problems. This is unfortunate because poorly written Transact-SQL (T-SQL) code can cause serious performance issues; and fixing such problems by throwing more memory at the server is simply relieving pain rather than fixing the root cause of the problem. Instead, you should identify poorly performing queries and try to make them more efficient.
T-SQL is a powerful programming language that lets you read and modify data, change SQL Server settings, create and modify database objects, change Registry settings and much more. Let's not forget, however, that no one piece of T-SQL functionality is appropriate for every environment and every application. For example, if your application needs to copy numerous files between different network shares, T-SQL is not your best option. Similarly, e-mailing multi-megabyte attachments using SQL Server's built-in e-mail functionality is not a good idea. On the other hand, if you need to retrieve data rows from your database or modify all rows that meet specific criteria, T-SQL is your best bet. Since T-SQL supports a multitude of functionalities, I can't cover every specific situation that can cause performance woes. Instead, I will offer a few scenarios where one way of writing code will afford superior performance over others.
Row processing with cursors
Cursors are perhaps the most common cause of suboptimal performance, even though they're a powerful and handy mechanism for certain tasks. For example, suppose you have a complicated stored procedure you need to call repeatedly for each row stored in a temporary table. You could use a cursor to retrieve parameter values from the temporary table and call the stored procedure once for each retrieved value. Basically, cursors allow you to process each row within a data set and apply the same piece of logic to a collection of rows, one row at a time. However, T-SQL is a set-based language; the language is optimal for data read and write operations on sets of rows, not one row at a time. Beginner T-SQL programmers who switch from development with front-end specific languages (such as VBScript, ASP or ColdFusion) often make the mistake of using loops within T-SQL when loops aren't necessary. For example, the following piece of code marks all California authors' last names as "changed" in pubs database:
DECLARE @last_name VARCHAR(50), @au_id CHAR(11) DECLARE last_name_cursor
CURSOR FOR SELECT au_id, au_lname FROM authors WHERE state = 'ca' OPEN
last_name_cursor FETCH NEXT FROM last_name_cursor INTO @au_id, @last_name
WHILE @@FETCH_STATUS = 0 BEGIN UPDATE authors SET au_lname = @last_name +
'changed' WHERE au_id = @au_id FETCH NEXT FROM last_name_cursor INTO
@au_id, @last_name END CLOSE last_name_cursor DEALLOCATE last_name_cursor
This code works, and since there are only a handful of California authors, it finishes quickly. Now let's see if we can rewrite the same logic with a simple update statement:
UPDATE authors SET au_lname = @last_name + ' changed' WHERE state = 'ca'
What should you do? Indeed both snippets of code accomplish the same task, but not only is the second option much simpler, it is also likely to be 100 times faster on a large data set. I've worked on several projects where changing a cursor solution to a set-based solution reduced the duration of the query from several hours to less than a minute.
Stored procedures vs. inline queries
Using inline queries is typically a beginner's mistake. Some languages offer straightforward incorporation of queries sent from a front-end application to the database. If your application is used by a handful of people, you can get away with such a practice. But once your user base grows into dozens and hundreds, you are guaranteed to notice suboptimal performance (and perhaps even an irate user's wrath).
What's the deal? SQL Server has a very intelligent built-in query optimizer that creates the plan for executing queries at run time. The optimizer can cache execution plans that have been recently used and reuse them. Doing so results in reduced load on the server and therefore better performance. However, the optimizer is much more likely to reuse query plans for stored procedures than for inline queries. Your stored procedures are guaranteed to perform at least as well as inline queries -- 99% of the time they will perform better. The remaining 1% is for rare cases when reusing the query plan is actually a poor idea because the parameter values have changed dramatically. What should you do? As a rule of thumb, always use stored procedures instead of inline queries.
Performance issues with views
SQL Server views offer several benefits to a T-SQL programmer: they can be used to obfuscate underlying table structure, restrict sensitive data from certain users and so forth. However, views are not the best tool for every job.
Some Oracle programmers who switch to SQL Server tend to use views extensively; older versions of Oracle didn't allow returning data set from stored procedures, so one would have to create views. This is not the case with SQL Server. But, to be fair to Oracle programmers, they're certainly not the only ones who make this mistake.
Perhaps the most common misconception is that a view with dozens of joins will perform better than a stored procedure containing the same number of joins. This simply isn't true. A view has no performance advantage over the query that has the same number of joins. In fact, views have a major limitation -- they do not accept parameters. So, if you have a view joining 10 tables and one of those tables has a million rows, your view will return at least 1 million rows. Joining such a view to a few other tables within a stored procedure could be disastrous.
What should you do? A better alternative in such cases would be a user-defined function (UDF), which accepts parameters and allows you to limit the number of returned rows. Another option is to join all tables within a single unit stored procedure and limit the output by using parameters.
Performance issues with UDFs
UDFs were a welcome addition to a T-SQL programmer's arsenal when they became available in SQL Server 2000. UDFs allow you to create routines that execute much like built-in functions, but they perform data manipulations specific to your business needs. For example, you could write a function that calculates the number of business days between two dates.
There are three types of UDFs:
- Scalar: returns a single value
- Inline table: returns a row set (or table)
- Multistatement: returns a row set (or table)
Scalar UDFs can cause a significant performance problem if called within a query that searches through a large table. The problem is that a UDF is executed once for each row in the table calling it. For example, the following query will call my_udf 1 million times if Authors table has a million rows:
SELECT dbo.my_udf(au_lname), phone, address FROM authors
For this reason, be careful using scalar UDFs. Even if you have simple logic within a scalar UDF, it can slow your queries dramatically when executed against a large table.
What should you do? Consider whether you really need the scalar UDF. Let's say dbo.my_udf looks like this:
CREATE FUNCTION dbo.my_udf (@string VARCHAR(50)) RETURNS VARCHAR(100)
AS BEGIN DECLARE @return_value VARCHAR(100) SELECT @return_value =
'Mr. or Mrs. ' + @string RETURN @return_value END
Executing the SELECT query above, which calls the scalar UDF, will result in 23 reads (because pubs database happens to have 23 records in Authors table). On the other hand, the following query will return the same results in one read:
SELECT 'Mr. or Mrs. ' + au_lname, phone, address FROM authors
Unnecessary record blocking
If your application performs slowly, you may be locking records when doing so is not necessary. This is often true of the applications that combine reporting and transactional activity on the same server.
By default, SQL Server locks records every time you issue SELECT statements for the duration of the statement. Other users will have to wait until your SELECT completes before they can modify data.
Reports rarely need up-to-the minute data. For instance, a timesheet application typically shows data for previous weeks or months. In this case, you can use locking hint (NOLOCK) or a less restrictive transaction isolation level without affecting the quality of your reports.
What should you do? Add NOLOCK hints to the tables referenced in your reporting queries. For example, the following query retrieves a list of all titles and authors without locking any records:
SELECT au_lname + ', ' + au_fname, title, price FROM authors a (NOLOCK)
INNER JOIN titleauthor b (NOLOCK) ON a.au_id = b.au_id INNER JOIN titles c
(NOLOCK) ON b.title_id = c.title_id
Returning number of affected rows
By default, SQL Server returns a friendly message to report the total number of rows affected by each query. This is a great setting for debugging your applications or modifying data directly in Query Analyzer. However, your front-end application doesn't need to know the number of affected rows -- it simply needs the data. Sending this message can introduce unnecessary network overhead.
What should you do? Use "SET NOCOUNT ON" in all your stored procedures to reduce network chatter.
Extensive use of triggers
Triggers offer a way to perform certain actions when data is added, modified or removed from a given table. SQL Server 2005 also offers database-level triggers but I'm only referring to table-level triggers here. Triggers impose additional overhead on the server because they initiate an implicit transaction. As soon as a trigger is executed, a new implicit transaction is started, and any data retrieval within a transaction will hold locks on affected tables.
What should you do? Use triggers sparingly and make them as short as possible. You can easily replace certain triggers with referential-integrity constraints; other trigger functionality can be enclosed in stored procedures.
Often, you need to apply different coding logic depending on the value of parameters passed to your stored procedures. For example, I could retrieve values from one set of tables if my parameter has a value of zero, and from another set of tables if the parameter has a value of one. The most common way of programming such logic is to use an IF statement as follows:
IF @parameter = 0 BEGIN SELECT column1, column2 FROM some_tables…
END ELSE BEGIN SELECT column1, column2 FROM other_tables… END
Such code forces my procedure to be recompiled at each execution because SQL Server has no way of knowing which of the two plans will be useful until runtime.
What should you do? Split the above procedure into two separate procedures, each querying a different set of tables. Determine the value of the parameter in your middle tier (or front end) code, then call the appropriate stored procedure.
How to diagnose and fix T-SQL problems
There are several ways to diagnose code-related performance problems and I don't have room in this tip to cover each one of them. But you can use SQL Profiler to narrow down a list of the routines that perform the worst. You should also examine the query execution plan to ensure that SQL Server comes up with optimal execution plans for your routines.
Once you know that a stored procedure has a problem, take the following steps to resolve it:
1. Create a procedure with a slightly different name (say "routine1" instead of "routine").
2. Make code changes in the new procedure and ensure it is more efficient than the existing one.
3. Rename the old procedure to "routine2."
4. Rename the new procedure to "routine."
How do you resolve problems with inline queries? You can attempt to rewrite the existing queries but, more than likely, you will need to recompile and redeploy the code. It is to avoid inline queries altogether and enclose all T-SQL queries in stored procedures.
How to prevent future occurrences
- Have coding standards and guidelines. Microsoft offers resources that may be helpful in this regard.
- Perform code reviews before each deployment. Code reviews should:
- Ensure that code accomplishes what it's set out to do.
- Ensure that coding standards and guidelines are followed.
- Educate newer programmers about any coding tips and techniques they might not have encountered in the past.
- Test your routines thoroughly under load before deploying them into a production environment.
ABOUT THE AUTHOR
Baya Pavliashvili is a DBA manager with HealthStream Inc. -- a leader in online healthcare education. In this role, Pavliashvili oversees database operations supporting over one million users. His primary areas of expertise include performance tuning, replication and data warehousing. You can reach him at email@example.com.
More information from SearchSQLServer.com