Database administrators have become begrudgingly familiar with virtual servers and the concept that several virtual servers can share the same physical host server, and that these virtual servers can easily be moved from one physical server to another. The next virtualization technology has an even bigger impact on DBAs because it revolutionizes the one thing database servers are all about: data storage.
Virtualized storage, like virtualized servers, is a piece of technology that sits between the server and the storage, abstracting the middle layer so that the storage can be changed on the fly. Virtual storage gives SAN administrators the ability to seamlessly move data from fast and expensive RAID 10 fiber arrays down to slower, cheaper RAID 5 arrays on SATA. Just like virtual servers can be quickly moved back and forth from fast to slow host servers, virtualized storage does the same thing for data.
The most advanced vendors perform this tiering at the block level. A single drive presented to the database could be partially on RAID 10 fiber, partially on RAID 5 fiber and partially on SATA. As data is accessed less and less frequently, it drops to slower tiers. The telltale sign is doing a SELECT against different date ranges of the data: The most current data might perform extremely fast, while data from several months ago performs very slowly.
Unpredictability: A DBA's foe
SQL Servers live and die by I/O speed, so surprise storage changes strike fear in the hearts of database administrators everywhere. When DBAs respond to performance questions, they don't stop to ask if their underlying drives have gotten faster or slower. But with virtual storage, that question has to come up at the start of every troubleshooting call. When a shop invests in virtual storage, it's doing it to gain the flexibility of changing arrays to higher or lower performance drives – without planning or warning -- and the SAN team will use that power.
Even well-intentioned SAN administrators can wreak havoc on databases by changing storage. For example, a large company's Internet filtering database may see constant, heavy write activity, but it would not need high-performance storage since it doesn't care how long it takes to write the Web logs. A SAN admin might see the large write volume
and think this data drive needs a faster tier of storage. He looks around, sees a decision support database with a relatively low amount of load and decides to swap the storage around. Upshot: The database administrator would suddenly start fielding calls from angry executives, and the DBA probably wouldn't take the time to notice that the Internet filtering database was suddenly much faster. DBAs have a hard enough job troubleshooting performance problems without having to ask the storage team if any drives have been moved between tiers.
While block-level tiering SANs lessen the chance that an overzealous administrator will manually move data around, other risks still exist. An underestimated SAN department budget combined with storage growth can undermine the SAN. In theory, block-level tiering will automatically push individual blocks of data down to slower tiers of storage only when they're not accessed frequently. However, if the SAN admin doesn't buy enough fast storage, or buys too much slow storage, the SAN will have no choice but to push some of the blocks down to slower tiers, because the company doesn't have enough fast storage to keep all of the frequently accessed data on the top tier.
An unplanned application can have the same effect: A new app can come into the same block-level tiered SAN that houses databases. If that project doesn't buy enough fast storage to match, and if it does a lot of writes, then it can push the databases down to a slower tier, since all writes happen on the fastest tiers.
Careful SAN management can confine specific applications to specific tiers, but this requires great attention to detail by the SAN team. Since this type of tiering is so new, the reporting capabilities aren't quite there yet, so until problems manifest in slow database response times, the SAN team may not know about the problems.
Show me a shop with virtualized storage, and I'll show you a shop where the DBA can never sleep. The DBA must constantly be on the watch, patrolling for sudden I/O bottlenecks, measuring storage statistics night and day.
Flexibility: A DBA's friend
With dangers like that, why would anyone pay money for virtualized storage? Because if it's used wisely, virtualized storage offers amazing flexibility.
Database administrators usually go to the SAN team long before a new project goes live and the two groups work together to predict load requirements. But here are some reasons we can't always predict the kind of load an application will have:
- If the app has a slow adoption rate, the SQL Server might be able to live on RAID 5 for months.
- If the app catches on like wildfire, it might need expensive RAID 10 arrays of 15k drives right away.
- In the worst-case scenario, the project never really catches on, and it could live on slow, cheap SATA forever.
This type of prediction isn't easy, and shops end up overbuying or under buying storage with disastrous consequences. After the project goes live, expensive disk drives sit unused or else users complain about slow performance -- causing the project to run over budget.
With virtualized storage, the server can start on a relatively slow, cheap array and move up to faster disk drives only if the application shows I/O bottlenecks. Everybody's a winner: The SAN team wins because it doesn't waste expensive drives, and the DBA wins because users get the speed they need. Both teams spend less time trying to guess an application's performance before it goes live and less time planning outages to move data around between storage tiers to rectify a bad guess. As the project's needs change, the SAN team can just migrate data without server downtime.
Another great example is a data warehouse that retains seven years of financial history. The recent data will get heavy read/write access and needs to be on the fastest disk possible. But as time goes on, it needs to automatically sink down to a slower, cheaper tier of storage. With SQL Server 2005 partitioning the database administrator can automate that movement, but database partitioning requires careful planning, testing and active management.
Block-based virtualized storage performs this task automatically. When blocks are not frequently accessed (relative to the other blocks on the drive), they will gradually sink down to slower tiers, eventually settling on RAID 5 SATA. This frees up fast and expensive 15k RPM drives. Since the storage tiering is done on a block level, not a drive level, a single table's contents could be automatically segregated between fast and moderate storage. This frees up the DBA to partition data only in cases where SQL locking control is important -- instead of trying to learn storage techniques.
So which is it: Friend or foe?
The key to a successful virtualized storage environment is storage performance monitoring. The DBA must discover performance pain points before they become user complaints.
Block-level virtualized storage requires even more performance awareness. The DBA can't just watch whole drives, but has to know what tables and what types of queries are running more slowly than expected.
With careful monitoring and management, virtualized storage can be the DBA's best friend. But without a careful eye, the DBA's job gets a lot harder.
ABOUT THE AUTHOR
Brent Ozar is a SQL Server DBA based in Houston with a leading global financial services firm. Brent has a decade of broad IT experience in systems administration and project management before moving into database administration. In his current role, he specializes in database performance tuning, SANs and data warehousing. Brent conducts training sessions, has written several technical articles and blogs at BrentOzar.com. He is also a member of Quest's Association of SQL Server Experts.