The much-lauded SQL CLR is a powerful tool in the SQL Server 2005 developer's arsenal. Being able to develop routines in .NET languages that run within SQL Server's process space gives developers the ability to create new kinds of custom methods and extend the server's type and aggregation system, giving them quite a bit of newfound control!
Using these new features is fairly straightforward, but like any other new programming paradigm there are bound to be pitfalls. The following is a loose collection of tips that represent some of my more important findings and realizations as I've worked with SQL CLR integration over the last several months.
Treat SQL CLR data access code like client data access code
The ADO.NET unified provider makes it incredibly easy to migrate code between tiers; virtually all of the ADO.NET classes you can use in client tiers function the same in the data tier. As a matter of fact, code migration from a client tier into SQL Server typically only requires changing the connection string to make use of the context connection.
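As a rough illustration of how little has to change, here is a minimal sketch. The OrderData class, the dbo.Orders table, and the server and database names in the comments are all hypothetical placeholders, not part of any real system. The same method runs in either tier; only the connection string passed in differs.

```csharp
using System.Data.SqlClient;

public static class OrderData
{
    // Hypothetical helper: counts rows in an assumed dbo.Orders table.
    public static int CountOrders(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.Orders", conn))
        {
            conn.Open();                      // open at the last possible moment
            return (int)cmd.ExecuteScalar();  // COUNT(*) comes back as a boxed int
        }                                     // both objects are disposed (closed) here
    }
}

// Client tier (placeholder server and database names):
//   OrderData.CountOrders("Data Source=myServer;Initial Catalog=myDb;Integrated Security=SSPI");
//
// Inside a SQL CLR routine, reuse the caller's session via the context connection:
//   OrderData.CountOrders("context connection=true");
```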
As developers, we have quite a few standards and best practices built upon data access. A common adage to live by is "open late, close early." In other words, open connections at the last possible moment and close them as soon as you're done accessing data. Another example is stored procedures. Although there is certainly some debate over whether stored procedures should be used instead of ad hoc SQL, most data access developers know the benefits of using stored procedures to encapsulate data access.
Given the ease with which code can be migrated between tiers when coding SQL CLR routines, you can never say for certain that a given piece of code will not have to be migrated between tiers. So keep the coding standards simple: Use the same standards inside SQL CLR routines that you'd use in client tiers for data access. This will make later migration and maintenance much easier.
Avoid DataSets in SQL CLR stored procedures and functions
Rules are meant to be broken. Unfortunately, the previous tip has its caveats. If you're migrating code that uses DataSets to store intermediate data, ask yourself why you're migrating. Hoping to see performance improvements by transferring less data across the network? Consider that you'll still be marshalling data in and out of the CLR via the ADO.NET objects; the data has to get into the DataSet somehow.
Does the migration still make sense? Consider a rewrite. DataSets (and collection classes) used as intermediate stores for data processing can take up a lot of memory, which may be needed for one of SQL Server's other pools. Thanks to SQL Server's pricey licensing, every megabyte of memory SQL Server may need is quite a bit more expensive than a megabyte of memory in the middle tier.
Instead of using the DataSet, try rewriting your code with a DataReader. The DataReader streams rows one at a time from SQL Server and therefore uses a lot less memory at once. That translates into less memory overhead for your routine and cheaper overall cost for a scalable solution.
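The streaming approach might look something like the sketch below. It assumes a hypothetical dbo.Customers table with a Website column, and the procedure name is made up for illustration. The per-row check (whether a string is a well-formed absolute URI) is just a stand-in for whatever processing your routine needs; the point is that only one row is held in the reader at a time, rather than the whole result set being buffered in a DataSet.

```csharp
using System;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class StoredProcedures
{
    // Hypothetical procedure: counts customers whose Website value is a
    // well-formed absolute URI. Table and column names are assumptions.
    [SqlProcedure]
    public static void CountWellFormedWebsites(out SqlInt32 count)
    {
        int matches = 0;

        using (SqlConnection conn = new SqlConnection("context connection=true"))
        using (SqlCommand cmd = new SqlCommand("SELECT Website FROM dbo.Customers", conn))
        {
            conn.Open();

            // The reader streams rows one at a time; nothing is cached
            // beyond the current row.
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    if (reader.IsDBNull(0))
                        continue;   // skip NULL values

                    if (Uri.IsWellFormedUriString(reader.GetString(0), UriKind.Absolute))
                        matches++;
                }
            }
        }

        count = new SqlInt32(matches);
    }
}
```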
Procedural logic is no better in SQL CLR than in T-SQL
If you've spent a lot of time reading SQL Server Web sites, you've probably learned the most important rule for writing high-performance T-SQL: Avoid procedural logic (cursors) unless there is truly no other solution.
Before migrating procedural code that performs data access into SQL Server, carefully consider whether the benefits you're hoping to see will manifest. Just like the RAM mentioned in the previous tip, CPU time in SQL Server is expensive compared to CPU time in client tiers. Scaling out a middle tier is generally cheaper than scaling up a database server.
Procedural code is often inefficient in a database context. Unless you have absolutely no other option, consider a rewrite instead of a straight migration. Can you rewrite the procedural logic using set-based T-SQL? Time spent doing such a rewrite will probably end up being cheaper than having to upgrade the SQL Server later because of excessive resource utilization.
Modularize: Use UDFs instead of stored procedures whenever possible
User-defined functions (UDFs) are a great way to encapsulate logic for flexible reuse. Unlike stored procedures, which can only be executed, UDFs can be used in other queries' SELECT lists, FROM clauses, WHERE clauses, etc. This makes them ideal for utility logic that needs to be reused again and again in a database system.
T-SQL UDFs were introduced in SQL Server 2000, but many database developers shied away from them due to performance concerns. T-SQL UDFs are called once for every row of the result set they are being executed against. Alone this might not be a problem, but T-SQL UDFs are also known to be quite slow; combining row-by-row execution with slow processing means greatly decreased performance. These concerns are no longer warranted, thanks to SQL CLR integration.
SQL CLR UDFs perform better than their T-SQL counterparts, showing especially vast improvements in the areas of mathematical operations and string handling. If you require common utility methods in these areas, there is no question that UDFs are the best choice in SQL Server 2005.
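A string-handling utility of this kind could be sketched as follows. The function name and class name are illustrative assumptions; the function simply wraps the .NET regular expression engine, which T-SQL in SQL Server 2005 has no built-in equivalent for.

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public static class UserDefinedFunctions
{
    // Hypothetical scalar UDF: returns 1 if input matches the pattern,
    // 0 if it does not, and NULL if either argument is NULL.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean RegexIsMatch(SqlString input, SqlString pattern)
    {
        if (input.IsNull || pattern.IsNull)
            return SqlBoolean.Null;

        return Regex.IsMatch(input.Value, pattern.Value);
    }
}
```

Once the assembly is cataloged and the function is created in the database, it can be called from a SELECT list or a WHERE clause like any other scalar function.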
Also consider which type of routine to use when migrating code from client tiers into SQL Server. Once again, SQL CLR UDFs make a lot of sense. Instead of creating a stored procedure, which may or may not be usable in all future scenarios, consider a UDF. Its flexibility means you'll be able to reuse the logic in many different situations.
Favor T-SQL triggers that call SQL CLR UDFs over SQL CLR triggers
A SQL CLR trigger is a CLR method -- just like a UDF or a stored procedure -- but it is defined as a trigger instead of a callable routine. This means that the logic it contains cannot be reused in other situations.
Instead of creating SQL CLR triggers, try creating T-SQL triggers that call SQL CLR UDFs for logic or external operations. This will make your code more modular and more reusable for a variety of requirements. Who knows when you'll need to use the trigger logic elsewhere?
Encapsulate data verification logic in SQL CLR UDTs
SQL CLR user-defined types (UDTs) have received mixed reactions from database developers. Some hate the new functionality and others can't wait to use it. I won't get into that debate here (stay tuned for an upcoming tip on this topic), but if you are going to use UDTs you should take advantage of all of the power they bring to the table.
New instances of SQL CLR UDTs are created under the covers by a method called Parse. This method's stated goal is to manipulate (parse) the input string into its component parts in order to populate the UDT's member fields and, therefore, create an instance of the type. But Parse can be much more than that.
Instead of simply breaking up the string, use Parse to encapsulate data verification logic. Once you've done that, your type will be self-enforcing. This eliminates the need to do things like bind CHECK constraints to columns that use the type -- if you know that the type enforces its own rules, there is no reason to repeat the work. This will also allow you to encode rules in exactly one place. Using the same type two times in the same database? You will not need two constraints if the type implements its own rule checking logic. The economy of embedding verification logic in UDTs should be immediately apparent: The more you use the type, the better the return on investment.
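To make the idea concrete, here is a minimal sketch of a self-enforcing type. The Percentage type and its 0-to-100 rule are invented for illustration; a real UDT would usually also expose properties and methods for working with the value. Parse rejects anything outside the allowed range, so an invalid string can never be converted into an instance of the type.

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

// Hypothetical UDT: a whole-number percentage that must fall between 0 and 100.
[Serializable]
[SqlUserDefinedType(Format.Native)]
public struct Percentage : INullable
{
    private bool isNull;
    private int percent;

    public bool IsNull
    {
        get { return isNull; }
    }

    public static Percentage Null
    {
        get
        {
            Percentage p = new Percentage();
            p.isNull = true;
            return p;
        }
    }

    // Called when a string is converted to the type. Besides parsing,
    // it enforces the type's rule, so no CHECK constraint is needed
    // on columns that use the type.
    public static Percentage Parse(SqlString s)
    {
        if (s.IsNull)
            return Null;

        int parsed;
        if (!int.TryParse(s.Value, out parsed))
            throw new ArgumentException("Percentage must be a whole number.");
        if (parsed < 0 || parsed > 100)
            throw new ArgumentException("Percentage must be between 0 and 100.");

        Percentage p = new Percentage();
        p.percent = parsed;
        return p;
    }

    public override string ToString()
    {
        return isNull ? "NULL" : percent.ToString();
    }
}
```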
Those who've read this far are probably somewhat surprised to see that these tips mainly fall into the realm of monetary savings rather than raw performance. Money is a topic that all of my clients -- and most likely your employers -- are extremely interested in. By carefully considering when and how to use SQL CLR integration you can enhance your system's overall performance. But by misusing these features, you may incur financial losses.
The most important lesson learned so far is to think and test carefully before putting any solution into effect. There are always alternatives, so spend a few minutes before getting too invested in any one solution. A better, cheaper choice may exist.
About the author: Adam Machanic is a database-focused software engineer, writer and speaker based in Boston, Mass. He has implemented SQL Server for a variety of high-availability OLTP and large-scale data warehouse applications, and also specializes in .NET data access layer performance optimization. He is a Microsoft Most Valuable Professional (MVP) for SQL Server and a Microsoft Certified Professional. Machanic is co-author of Pro SQL Server 2005, published by Apress.