Problem solve Get help with specific problems with your technologies, process and projects.

Data collection the automated way in SQL Server 2008

SQL Server 2008's data collector feature automates the collection of critical performance data, which DBAs can then analyze using brand new SSMS reports.

The data collector is a new feature in SQL Server 2008 designed to collect performance-related data from multiple...

servers, store it in a central data warehouse and present the data through reports in SQL Server Management Studio (SSMS).

The main purpose of the data collector is to provide an easy way to automate the collection of critical performance data such as performance counters, snapshots of data grabbed from dynamic management views (DMVs), and disk space details.

Since the feature collects information from the latest DMVs, it only works on SQL Server 2008. Still, it's worth noting that unlike many other useful DBA features, the data collector is not limited to the Enterprise edition only.

The data collector feature in SQL Server 2008 consists of several components:

  • The msdb system database is used to store so-called data collection sets containing the data collection definitions and jobs for collecting data with their associated schedules. Along with collection auditing and historical details, the msdb database also stores SQL Server Integration Services (SSIS) packages that are used to collect and upload the data.

  • The dcexec.exe utility executes the SSIS packages mentioned above. It also manages data collection sets.

  • The management data warehouse database stores collected data and contains view and stored procedures for collection management. It is recommended that this database is stored separately from the servers where data collection takes place.

  • SQL Server Management Studio 2008 reports are used for viewing collected data. There are three built-in reports -- Server Activity History, Disk Usage Summary and Query Statistics History.

SQL Server Management Studio provides a wizard for configuring data collection. To start the wizard, expand the Management node on the server where you want to collect the data, right-click on the Data Collection node and select Configure Management Data Warehouse. You will be presented with the following dialog:

Figure 1. Configuring the MDW wizard (click to enlarge)

When you run the wizard for the first time, you need to select the Create or upgrade a management data warehouse option. The wizard will then walk you through the steps necessary to create a central database for storing collected data. During this process, you need to map a login to the database roles in the warehouse database. I suggest creating a new login and mapping it, as shown below in this dialog from the wizard.

Figure 2. Map logins and users to MDW roles (click to enlarge)

Once you create and configure a data warehouse database, you can run the wizard again and select the second option, Set up data collection. You should do this on the server where you want to collect the data. On the first screen, select the server and database you created in step one, as well as a directory for storing cache data. Repeat this step for each server you want to collect data from.

When the wizard is finished creating data collectors and scheduled SQL Agent jobs, you should see three additional nodes under the Data Collector node -- Disk Usage, Query Statistics and Server Activity. You can double-click each node to open the Properties window. I highly recommend taking the time to examine all of them to get a better idea of what they do and how they are configured. Here you can also modify things like retention period, caching mode, and schedules. Since the data collector generates a lot of data and its storage tables fill up with millions of rows within hours, you may want to modify job schedules and decrease the frequency of data collections depending on the use of each server.

After running the wizard, data collection will start. It should take a while for some meaningful data to accumulate, so you may want to wait an hour or so before looking at the reports. SQL Server Management Studio 2008 now has three new reports for viewing data accumulated by the data collector -- Server Activity History, Disk Usage Summary and Query Statistics History. You can view these reports by right-clicking on the Data Collection node and selecting Reports, followed by Management Data Warehouse. SQL Server Management Studio 2008 also recognizes which database is used for warehousing the data, so when you right-click on that database, you will have the option to select the Management Data Warehouse Overview report shown below.

Figure 3. The Management Data Warehouse Overview report (click to enlarge)

This report shows you what servers the collection runs on. You can click on each link to drill into additional details on each server. The snapshot below shows the top part of a Server Activity History report after four hours of server activity.

Figure 4. A Server Activity History report (click to enlarge)

As you can see, the top of a data collector report shows a navigational toolbar that allows you to scroll through captured snapshots and also select a timeframe of the data you want to see. When you click on some of the above graphs, you can drill into the sub-reports for additional details. Below is an example of a SQL Server Waits sub-report.

Figure 5. The SQL Server Waits sub-report (click to enlarge)

I encourage you to examine each of these reports by drilling down, selecting different timeframes and so on to get familiar with what they have to offer. For example, you can drill from the Query Statistics History report to the individual query details, including the graphical execution plan.

The data collector comes at a cost of a 2- 5% performance hit on the servers where it's collecting data, mainly on the CPU. The storage required to store the data is roughly 300 MB per day, so you will need around 2 GB of database storage per server each week. How long should you retain that data? The answer really depends on your needs and storage availability, but for the most part you can go with the defaults. These are 14 days for Query Statistics and Server Activity History data collections, and two years for Disk Usage Summary collections.

If you want to keep performance data for longer periods without storing millions of rows that quickly accumulate, you could write your own queries and generate a daily or weekly aggregate of important data to keep around. SQL Server Books Online contains fairly good documentation of tables used for data collection, making it easier to do custom queries against collected data.

So that's a pretty general overview of the data collector feature and how you can configure your server to start collecting and analyzing performance data with SQL Server 2008. In a follow-up article, I will dig deeper into the internals and show you how to write your own queries to extract snapshots of data that you can keep around for longer periods of time, as well as how to create your own custom reports for displaying that data.

Roman Rehak is a senior database architect at in Colchester, Vt. He specializes in SQL Server development, database performance tuning, ADO.NET and writing database tools. Roman also serves as the technical chair for the SQL Server track at the annual DevTeach conferences in Canada and is the president of the Vermont SQL Server User Group.

Dig Deeper on Microsoft SQL Server Performance Monitoring and Tuning