What makes a database scalable -- able to grow and keep performing well for years to come, whether it serves 10 users or 1,000 or more?
There are a few things that go without saying: Don't expect a single Pentium 4 box to scale out to 10,000 users as effectively as an eight-processor system. Beyond that, many scalability issues in SQL Server rest on how you've built your database, where you put your code and how it's run, and how much of your business logic you put into it.
Parallelization across file structures
SQL Server was designed to break databases and tables across multiple physical files. The bigger the data structure, the more it should be spread out to take advantage of the computer's own file system. Always find ways to break things across physical file boundaries, both by need and by patterns of access, and segregate each file across physical disks whenever you can.
I have a table of several hundred million rows that contains date/time, integer and ntext data types. The ntext data is the biggest and most cumbersome, so it gets placed in its own physical filegroup -- not just its own file, but its own filegroup. Each file in that group is allowed to grow as needed, and at the end of each month, a new file is introduced to the group.
I've placed this table's indexes in their own filegroup, so they can be accessed independently of everything else as well. However, I've balanced this against the fact that while I have a lot of data online, relatively few users try to access it at once, so I don't need to spread it across as many physical disks. In fact, the whole thing fits nicely on one disk and runs quite well.
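As a rough sketch of the arrangement described above (the names ArchiveDB, FG_Notes, FG_Indexes and dbo.EventLog are illustrative, not taken from my actual system), the filegroups might be set up like this:

```sql
-- Illustrative only: create a dedicated filegroup for the bulky ntext
-- column, with a new file added to it at the end of each month.
ALTER DATABASE ArchiveDB ADD FILEGROUP FG_Notes;

ALTER DATABASE ArchiveDB
ADD FILE (
    NAME = 'FG_Notes_2006_01',
    FILENAME = 'E:\Data\FG_Notes_2006_01.ndf',
    SIZE = 512MB,
    FILEGROWTH = 128MB
) TO FILEGROUP FG_Notes;

-- A separate filegroup for the table's indexes (a file is added to it
-- the same way as above).
ALTER DATABASE ArchiveDB ADD FILEGROUP FG_Indexes;

-- TEXTIMAGE_ON routes the ntext data to its own filegroup, apart from
-- the row data on PRIMARY.
CREATE TABLE dbo.EventLog (
    LoggedAt  datetime NOT NULL,
    EventCode int      NOT NULL,
    Notes     ntext    NULL
) ON [PRIMARY]
  TEXTIMAGE_ON FG_Notes;

-- Indexes land on their own filegroup, so they can be read
-- independently of the data.
CREATE NONCLUSTERED INDEX IX_EventLog_LoggedAt
    ON dbo.EventLog (LoggedAt) ON FG_Indexes;
```

With each filegroup's files placed on separate physical disks, reads against the ntext data, the row data and the indexes no longer contend for the same spindle.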
Proper placement and management of code
There has been endless debate about where to put code that works directly with a database: Should it all be placed in the front end (i.e., dynamic SQL), in the back end (i.e., stored procedures) or some mix of both?
If you're concerned about how well the database will perform with 10,000 users instead of just one, you should rely as little as possible on dynamically generated SQL. This includes SQL statements produced in the context of a stored procedure, which have to be evaluated at runtime.
You'll also need to determine, through real-world performance testing if possible, whether it's best to use a broad range of narrowly tailored procedures or a few general-purpose ones. For instance, in the above-mentioned database, I have one stored procedure that returns an open-ended range of rows from a particular table when called in one context, but a fixed number of rows when invoked in another. It was easier to fold both behaviors into one procedure and let the procedure's own logic, nothing more than an IF statement, determine what to return, rather than maintain multiple redundant procedures.
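A minimal sketch of that one-procedure approach (the dbo.EventLog table and parameter names here are hypothetical, not my production code):

```sql
-- Illustrative only: one procedure, two behaviors, chosen by a single IF.
-- Passing @MaxRows = NULL returns the open-ended range; passing a value
-- caps the result at a fixed number of rows.
CREATE PROCEDURE dbo.GetEvents
    @StartDate datetime,
    @MaxRows   int = NULL
AS
BEGIN
    SET NOCOUNT ON;

    IF @MaxRows IS NULL
        SELECT LoggedAt, EventCode
        FROM dbo.EventLog
        WHERE LoggedAt >= @StartDate
        ORDER BY LoggedAt;
    ELSE
        SELECT TOP (@MaxRows) LoggedAt, EventCode
        FROM dbo.EventLog
        WHERE LoggedAt >= @StartDate
        ORDER BY LoggedAt;
END
```

Note that TOP with a parameter, TOP (@MaxRows), requires SQL Server 2005; on SQL Server 2000 you'd use SET ROWCOUNT instead.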
SQL Server 2005's CLR (Common Language Runtime) integration makes it far easier to perform non-database activities within the database, but it seems best suited to solutions where the emphasis is on flexibility rather than massive scalability (e.g., a turnkey product deployed to a site that never has more than a few dozen users at once).
The trimmer your T-SQL code is, the better. Don't return columns you don't really need; break the habit of using SELECT * in your queries if you haven't already. Avoid cursors, but if you must use them, choose the right lock type for the work you're doing. Don't pile indexes onto tables that see a lot of INSERT, UPDATE or DELETE activity; one clustered index and one or two non-clustered indexes should do it for almost every kind of table. Most of this will already be familiar to anyone who practices good T-SQL hygiene, and it may seem like nitpicking, but it really pays off on a server with many users.
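To illustrate a couple of those habits (the dbo.Orders table and its columns are made up for the example):

```sql
-- Illustrative only: ask for exactly the columns you need, rather than
-- SELECT * FROM dbo.Orders WHERE Status = 'OPEN';
SELECT OrderID, CustomerID, OrderDate
FROM dbo.Orders
WHERE Status = 'OPEN';

-- For a write-heavy table, one clustered index plus one or two
-- non-clustered indexes is usually plenty; every extra index is
-- extra work on each INSERT, UPDATE and DELETE.
CREATE CLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);
```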
Less handling of business logic
Business logic (output formatting, for example) is normally handled by your Web server, and that's ideally where it should stay. Anything that deals with manipulating files, for instance, shouldn't be done in a stored procedure unless it's absolutely necessary: it kills scalability and is best left to the front end, whatever that may be, since the front end is built from the ground up to juggle many chores of this type at once. Along the same lines, I caution against storing images as binary objects in a database; it's easier and faster to store a file path to the actual image on disk.
A database must always assume responsibility for maintaining data integrity (e.g., type checking or constraining input). This way, no matter what front end the database is hitched to (or whether it's being accessed directly through an administrator's console), it becomes that much harder to wreck the data without changing the constraints themselves.
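A hedged sketch of what that looks like in practice (the dbo.Customers table and its rules are hypothetical):

```sql
-- Illustrative only: basic integrity rules live with the data, so every
-- front end, and the admin console, is held to the same constraints.
CREATE TABLE dbo.Customers (
    CustomerID int IDENTITY PRIMARY KEY,
    Email      varchar(255) NOT NULL,
    Balance    money NOT NULL DEFAULT 0,
    -- Input constraints enforced by the database itself:
    CONSTRAINT CK_Customers_Email   CHECK (Email LIKE '%@%'),
    CONSTRAINT CK_Customers_Balance CHECK (Balance >= 0)
);
```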
How you enforce data integrity will also affect scalability. For instance, avoid triggers unless you have no other choice. Trigger actions are automatically wrapped in a transaction, which can mean a lot of overhead when many users are firing the same trigger and modifying rows. If there are relationships you always want enforced between tables, start by declaring them as relationships (database diagrams make these easy to set up and grasp); triggers, by contrast, are buried in the schema and not always easy to debug.
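For example, a relationship that might otherwise be policed by a delete trigger can be declared outright (the dbo.Orders and dbo.OrderItems tables here are hypothetical):

```sql
-- Illustrative only: enforce the parent-child relationship declaratively.
-- The constraint is checked inline, without a trigger's transaction
-- overhead, and it shows up plainly in a database diagram instead of
-- being buried in trigger code.
ALTER TABLE dbo.OrderItems
ADD CONSTRAINT FK_OrderItems_Orders
    FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID)
    ON DELETE CASCADE;  -- optional: cascade in place of a delete trigger
```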
About the author: Serdar Yegulalp is the editor of the Windows 2000 Power Users Newsletter. Check it out for the latest advice and musings on the world of Windows network administrators -- and please share your thoughts as well!