Implementing Supertypes and Subtypes
 |
|
| You are reading part 5 from "Physical data storage in SQL Server 2005 and 2008," excerpted from A Developer's Guide to Data Modeling for SQL Server, Covering SQL Server 2005 and 2008, by Eric Johnson and Joshua Jones, copyright 2008, printed with permission from Addison-Wesley Professional. |
|
|
 |
 |
We discuss supertypes and subtypes in Chapter 2. These are entities
that have several kinds of real-world objects being modeled. For example,
we might have a supertype called phone with subtypes for corded and cordless phones. We separate objects into a subtype cluster because even
though a phone is a phone, different types will require that we track different
attributes. For example, on a cordless phone, you need to know the
working range of the handset and the frequency on which it operates, and
with a corded phone, you could track something like cord length. These
differences are tracked in the subtypes, and all the common attributes of
phones are held in the supertype.
How do you go about physically implementing a subtype cluster in
SQL Server? You have three options. The first is to create a single table
that represents the attributes of the supertype and also contains the attributes
of all the subtypes. Your second option is to create tables for each of
the subtypes, adding the supertype attributes to each of these subtype tables.
Third, you can create the supertype table and the subtype tables, effectively
implementing the subtype cluster in the same way it was logically
modeled.
To determine which method is correct, you must look closely at the
data being stored. We will walk through each of these options and look at
the reasons you would use them, along with the pros and cons of each.
Supertype Table
You would choose this option when the subtypes contain few or no differences
from the data stored in the supertype. For example, let's look at a
cluster that stores employee data. While building a model, you discover
that the company has salaried as well as hourly employees, and you decide
to model this difference using subtypes and supertypes. After hashing out
all the requirements, you determine that the only real difference between
these types is that you store the annual salary for the salaried employees
and you need to store the hourly rate and the number of hours for an
hourly employee.
In this example, the subtypes contain very subtle differences, so you
could build this subtype cluster by using only the supertype table. For this
situation, you would likely create a single employee table that contains all
the attributes for employees, including all three of the subtype attributes
for salary, hourly rate, and hours. Whenever you insert an hourly employee,
you would require that data be in the hourly rate and hour columns
and that the salary column be left NULL. For salaried employees, you
would do the exact opposite.
Implementing the types in this way makes it easy to find the employee
data because all of it is in the same place. The only drawback is that you
must implement some logic to look at the columns that are appropriate to
the type of employee you are working with. This supertype-only implementation
works well only because there are very few additional attributes
from the subtype's entities. If there were a lot of differences, you would
end up with many of the columns being NULL for any given row, and it
would take a great deal of logic to pull the data together in a meaningful
way.
Subtype Tables
When the data contained in the subtypes is dissimilar and the number of
common attributes from the supertype is small, you would most likely implement
the subtype tables by themselves. This is effectively the opposite
data layout that would prompt you to use the supertype-only model.
Suppose you're creating a system for a retail store that sells camera
equipment. You could build a subtype cluster for the products that the
store sells, because the products fall into distinct categories. If you look
only at cameras, lenses, and tripods, you have three very different types of
product. For each one, you need to store the model number, stock number,
and the product's availability, but that is where the similarities end. For
cameras you need to know the maximum shutter speed, frames per second,
viewfinder size, battery type, and so on. Lenses have a different set of attributes,
such as the focal length, focus type, minimum distance to subject,
and minimum aperture. And tripods offer a new host of data; you need to
store the minimum and maximum height, the planes on which it can pivot,
and the type of head. Anyone who has ever bought photography equipment
knows that the differences listed here barely scratch the surface; you
would need many other attributes on each type to accurately describe all
the options.
The sheer number of attributes that are unique for each subtype, and
the fact that they have only a few in common, will push you toward implementing
only the subtype tables. When you do this, each subtype table will
end up storing the common data on its own. In other words, the camera,
lens, and tripod tables would have columns to store model numbers, SKU
numbers, and availability. When you're querying for data implemented in
this way, the logic needs to support looking at the appropriate table for the
type of product you need to find.
Supertype and Subtype Tables
You have probably guessed this: When there are a good number of shared
attributes and a good number of differences in the subtypes, you will
probably implement both the supertype and the subtype tables. A good example
is a subtype cluster that stores payment information for your customers.
Whether your customer pays with an electronic check, credit card,
gift certificate, or cash, you need to know a few things. For any payment,
you need to know who made it, the time the payment was received, the
amount, and the status of the payment. But each of these payment types
also requires you to know the details of the payment. For credit cards, you
need the card number, card type, security code, and expiration date. For
an electronic check, you need the bank account number, routing number,
check number, and maybe even a driver's license number. Gift cards are
simple; you need only the card number and the balance. As for cash, you
probably don't need to store any additional data.
This situation calls for implementing both the supertype and the subtype
tables. A Payment table could contain all the high-level detail, and
individually credit card, gift card, and check tables would hold the information
pertinent to each payment type. We do not have a cash table, because
we do not need to store any additional data on cash payments beyond
what we have in the Payment table.
When implementing a subtype cluster in this way, you also need to
store the subtype discrimination, usually a short code or a number that is
stored as a column in the supertype table to designate the appropriate subtype
table. We recommend using a single character when possible, because
they are small and offer more meaning to a person than a number does. In
this example, you would store CC for credit card, G for a gift card, E for
electronic check, and C for cash. (Notice that we used CC for a credit card
to distinguish it from cash.) When querying a payment, you can join to the
appropriate payment type based on this discriminator.
If you need data only from either the supertype or the subtype, this
method offers two benefits: you need go to only one table, and you don't
retrieve extraneous data. However, the flip side is that you must determine
which subtype table you need to query and then join both tables if you
need data from both the supertype and a subtype table. Additionally, you
may find yourself needing information from the supertype and multiple
subtypes; this will add overhead to your queries because you must join
multiple tables.
Supertypes and Subtypes: A Final Word
Implementing supertypes and subtypes can, at times, be tricky. If you take
the time to fully understand the data and look at the implications of splitting
the data into multiple tables versus keeping it tighter, you should be
able to determine the best course of action. Don't be afraid to generate
some test data and run various options through performance tests to make
sure you make the correct choice. When we get to building the physical
model, we look at using subtype clusters as well as other alternatives for especially
complex situations.
Summary
In this chapter, we have looked at the available objects inside SQL Server
that you will use when implementing your physical model. It's important to
understand these objects for many reasons. You must keep all this in mind
when you design your logical model so that you design with SQL Server in
mind. This also plays a large part later when you build and implement your
physical model. You will probably not use every object in SQL Server for
every database you build, but you need to know your options. Later, we
walk through creating your physical model, and at that time we go over the
various ways you can use these physical objects to solve problems.
In the next chapter, we talk about normalization, and then we move on
to the meat and potatoes of this book by getting into our sample project
and digging into a lot of real-world issues.
Check out the authors' website where they co-host a podcast show for IT professionals.