With so many data warehouses growing to behemoth size, companies should be looking for ways to link to data in external sources rather than simply continuing to store more information, according to Barry Devlin, a data warehouse expert and IBM Corp. consultant.
Devlin, who was responsible for the definition of IBM's warehouse architecture in the mid-'80s, spoke to nearly 500 attendees at The Data Warehousing Institute's (TDWI) 2003 World Conference in Boston last week. He currently works on DB2 Information Integrator.
Devlin said that companies should link and copy the data needed from partners and government agencies, but not archive and store the information in data warehouses, which are growing bigger every day. Companies need to closely examine whether the information they are accumulating is important enough to store, he said.
"The percentage of data in the warehouse that will be used actually decreases as you put more information into it," Devlin told attendees. "As we move closer and closer to the individual transaction level, we need to ask ourselves if we really need this information."
Over the last 15 years, databases have been forced to move faster to provide automated decision making, he said. External information is becoming more important, and informational and operational systems are beginning to merge, he added.
"We are moving forward from a controlled and refined way of looking at things to a more open and multidimensional
TDWI attendee Patrico Espinosa, of the Ecuador-based consultancy Grupomas, said Devlin showed that the concepts are changing in the industry. Companies want real-time databases so they can make quick business decisions, and that requires a lot of data gathering, but they risk making databases inefficient as they grow larger, Espinosa said.
"Before, the techniques were focused on making the data warehouse bigger and better," Espinosa said. "Now, the focus is on separating the external data instead of gathering it."
Devlin said companies should concentrate on decreasing the storage of infrequently accessed data and instead leverage content stores and Web services while cooperating to get data from partners and customers.
"We need to think outside the box in order to increase the relevance of what we want to achieve," Devlin said. "We are seeing emerging technologies that can join the warehouse with external data."
To provide your feedback on this article, contact Robert Westervelt.