Face it, you never intended to become a SQL Server expert, but the proliferation of this database engine – and...
its many editions – requires somebody to feed and care for it. You're the "Microsoft Guy" (or Gal), so whether you wanted to be or not, you were elected. This series of articles is all about making you more effective with SQL Server as an administrator, not a programmer.
Before we start diving into real-world tasks, however, a little bit of background information will be helpful. So how does SQL Server work? Believe it or not, understanding this aspect of the "black box" will help you get a handle on nearly every aspect of Microsoft SQL Server, from backups and restores to replication and mirroring.
SQL Server stores things on disk in 8 KB chunks called pages. It also manipulates those same 8 KB chunks in memory, meaning the smallest unit of data SQL Server works with is 8 KB.
When data is written to disk, an entire row of it must fit within that 8 KB page. It's possible for multiple rows to share a page, but a row cannot span multiple pages. So, if a Customers table has columns for Name, Address, City, State, and Phone, then all of that data combined must be less than 8 KB. An exception is made for certain data types where the actual page only contains a pointer to the real data, such as binary data like photos, or large gobs of text. The real data can then be spread across multiple pages, or even stored in a file (that's the special FILESTREAM type which we'll discuss later). SQL Server gathers all these 8 KB pages into a simple file on disk, which usually has either a .MDF or .NDF filename extension.
Looking for more on the fundamentals of Microsoft SQL Server?
Check out our tutorial:
SQL Server basics for A to Z
When SQL Server is told to do something, it's by means of a query written in the Structured Query Language (SQL) syntax. This is what happenes first: SQL Server's internal query optimizer looks at the query and constructs a battle plan for executing it (i.e. figuring out what steps it will need to take in order to get that data off of the disk). This is actually pretty complicated, since SQL Server has a number of techniques it can use, some which are better in certain conditions than others.
Once SQL Server has the plan, it executes it and retrieves the needed data off of the disk. In the case of a retrieval query, the data is then streamed to the requesting client across the network. With a modification query, SQL Server modifies the pages of data in memory. It doesn't write those modifications back out to disk – oh no, not yet. That would be silly, since there might be additional changes coming along for those pages and the system load might not offer a good disk-writing opportunity right then. What SQL Server does, however, is make a copy of the modification query in a special log file called the transaction log. This file, which has an .LDF filename extension, keeps a record of every transaction SQL Server has executed.
Eventually – maybe a few seconds later – SQL Server will decide to write the modified pages out to disk. When it does so, it goes back to the transaction log and "checks off" the transaction that made the modifications. Essentially, it is saying, "OK, I made that change and it's been written to disk." This way, SQL Server knows that the change is safe on disk.
In the event that SQL Server crashes, it has an automated recovery mode that kicks in when it starts back up. It goes straight to the transaction log and looks for uncommitted transactions, or those which have not yet been checked off. It knows that the checked off transactions are safe on disk; anything else had not been written to disk and was still floating around in memory when the server crashed. So SQL Server reads those transactions out of the log, re-executes them, and immediately writes the affected pages to disk. This process allows SQL Server to catch up with all in-progress work, and ensures that you never lose any data – provided your disk files are okay, of course. Now think about this important fact -- EVERYTHING that happens in SQL Server occurs only through the transaction log, and SQL Server can re-read the log to repeat whatever has happened. This process makes nearly everything that SQL Server does possible.
Of course, this is only the default, and you can change it. Individual databases can be switched from the full recovery model I've described here to a simple recovery model which doesn't use a transaction log (well, it does, but checked off transactions are automatically removed to keep the log small). Simple recovery is only appropriate for read-only databases that have no changes being made. With no changes, there's no chance of losing data in a crash.
So that's how data moves from disk to memory. This entire process is absolutely essential to how most of SQL Server's functionality actually works, as well as how to administer it. In my next article, I'll look at how disaster recovery works in SQL Server, and how you can implement a sensible, safe disaster recovery plan for your databases.
Miss a tip? Check out the archive page for Don's series, SQL Server for the Reluctant DBA.
ABOUT THE AUTHOR:
Don Jones is a co-founder of Concentrated Technology LLC, the author of more than 30 IT books and a speaker at technical conferences worldwide. Contact him through his website at www.ConcentratedTech.com.