A data structure is a specialized format for organizing, processing, retrieving and storing data. There are several basic and advanced types of data structures, all designed to arrange data to suit a specific purpose. Data structures make it easy for users to access and work with the data they need in appropriate ways. Most importantly, data structures frame the organization of information so that machines and humans can better understand it.
In computer science and computer programming, a data structure may be selected or designed to store data for the purpose of using it with various algorithms. In some cases, the algorithm's basic operations are tightly coupled to the data structure's design. Each data structure contains information about the data values, relationships between the data and -- in some cases -- functions that can be applied to the data.
For instance, in an object-oriented programming language, the data structure and its associated methods are bound together as part of a class definition. In non-object-oriented languages, there may be functions defined to work with the data structure, but they are not technically part of the data structure.
Why are data structures important?
Typical base data types, such as integers or floating-point values, that are available in most computer programming languages are generally insufficient to capture the logical intent for data processing and use. Yet applications that ingest, manipulate and produce information must understand how data should be organized to simplify processing. Data structures bring together the data elements in a logical way and facilitate the effective use, persistence and sharing of data. They provide a formal model that describes the way the data elements are organized.
Data structures are the building blocks for more sophisticated applications. They are designed by composing data elements into a logical unit representing an abstract data type that has relevance to the algorithm or application. An example of an abstract data type is a "customer name" that is composed of the character strings for "first name," "middle name" and "last name."
It is not only important to use data structures, but it is also important to choose the proper data structure for each task. Choosing an ill-suited data structure could result in slow runtimes or unresponsive code. Five factors to consider when picking a data structure include the following:
- What kind of information will be stored?
- How will that information be used?
- Where should data persist, or be kept, after it is created?
- What is the best way to organize the data?
- What aspects of memory and storage reservation management should be considered?
How are data structures used?
In general, data structures are used to implement the physical forms of abstract data types. Data structures are a crucial part of designing efficient software. They also play a critical role in algorithm design and how those algorithms are used within computer programs.
Software engineers use algorithms that are tightly coupled with the data structures -- such as lists, queues and mappings from one set of values to another. This approach can be fused in a variety of applications, including managing collections of records in a relational database and creating an index of those records using a data structure called a binary tree.
Some examples of how data structures are used include the following:
- Storing data. Data structures are used for efficient data persistence, such as specifying the collection of attributes and corresponding structures used to store records in a database management system.
- Managing resources and services. Core operating system (OS) resources and services are enabled through the use of data structures such as linked lists for memory allocation, file directory management and file structure trees, as well as process scheduling queues.
- Data exchange. Data structures define the organization of information shared between applications, such as TCP/IP packets.
- Ordering and sorting. Data structures such as binary search trees -- also known as an ordered or sorted binary tree -- provide efficient methods of sorting objects, such as character strings used as tags. With data structures such as priority queues, programmers can manage items organized according to a specific priority.
- Indexing. Even more sophisticated data structures such as B-trees are used to index objects, such as those stored in a database.
- Searching. Indexes created using binary search trees, B-trees or hash tables speed the ability to find a specific sought-after item.
- Scalability. Big data applications use data structures for allocating and managing data storage across distributed storage locations, ensuring scalability and performance. Certain big data programming environments -- such as Apache Spark -- provide data structures that mirror the underlying structure of database records to simplify querying.
Characteristics of data structures
Data structures are often classified by their characteristics. The following three characteristics are examples:
- Linear or non-linear. This characteristic describes whether the data items are arranged in sequential order, such as with an array, or in an unordered sequence, such as with a graph.
- Homogeneous or heterogeneous. This characteristic describes whether all data items in a given repository are of the same type. One example is a collection of elements in an array, or of various types, such as an abstract data type defined as a structure in C or a class specification in Java.
- Static or dynamic. This characteristic describes how the data structures are compiled. Static data structures have fixed sizes, structures and memory locations at compile time. Dynamic data structures have sizes, structures and memory locations that can shrink or expand, depending on the use.
If data structures are the building blocks of algorithms and computer programs, the primitive -- or base -- data types are the building blocks of data structures. The typical base data types include the following:
- Boolean, which stores logical values that are either true or false.
- integer, which stores a range on mathematical integers -- or counting numbers. Different sized integers hold a different range of values -- e.g., a signed 8-bit integer holds values from -128 to 127, and an unsigned long 32-bit integer holds values from 0 to 4,294,967,295.
- Floating-point numbers, which store a formulaic representation of real numbers.
- Fixed-point numbers, which are used in some programming languages and hold real values but are managed as digits to the left and the right of the decimal point.
- Character, which uses symbols from a defined mapping of integer values to symbols.
- Pointers, which are reference values that point to other values.
- String, which is an array of characters followed by a stop code -- usually a "0" value -- or is managed using a length field that is an integer value.
Types of data structures
The data structure type used in a particular situation is determined by the type of operations that will be required or the kinds of algorithms that will be applied. The various data structure types include the following:
- Array. An array stores a collection of items at adjoining memory locations. Items that are the same type are stored together so the position of each element can be calculated or retrieved easily by an index. Arrays can be fixed or flexible in length.
- Stack. A stack stores a collection of items in the linear order that operations are applied. This order could be last in, first out (LIFO) or first in, first out (FIFO).
- Queue. A queue stores a collection of items like a stack; however, the operation order can only be first in, first out.
- Linked list. A linked list stores a collection of items in a linear order. Each element, or node, in a linked list contains a data item, as well as a reference, or link, to the next item in the list.
- Tree. A tree stores a collection of items in an abstract, hierarchical way. Each node is associated with a key value, with parent nodes linked to child nodes -- or subnodes. There is one root node that is the ancestor of all the nodes in the tree.
- Heap. A heap is a tree-based structure in which each parent node's associated key value is greater than or equal to the key values of any of its children's key values.
- Graph. A graph stores a collection of items in a nonlinear fashion. Graphs are made up of a finite set of nodes, also known as vertices, and lines that connect them, also known as edges. These are useful for representing real-world systems such as computer networks.
- Trie. A trie, also known as a keyword tree, is a data structure that stores strings as data items that can be organized in a visual graph.
- Hash table. A hash table -- also known as a hash map -- stores a collection of items in an associative array that plots keys to values. A hash table uses a hash function to convert an index into an array of buckets that contain the desired data item.
These are considered complex data structures as they can store large amounts of interconnected data.
How to choose a data structure
When choosing a data structure for a program or application, developers should consider the answers to the following three questions:
- Supported operations. What functions and operations does the program need?
- Computational complexity. What level of computational performance is tolerable? For speed, a data structure whose operations execute in time linear to the number of items managed -- using Big O notation: O(n) -- will be faster than a data structure whose operations execute in time proportional to the square of the number of items managed -- O(n^2).
- Programming elegance. Are the organization of the data structure and its functional interface easy to use?
Some real-world examples include:
- Linked lists are best if a program is managing a collection of items that don't need to be ordered, constant time is required for adding or removing an item from the collection and increased search time is OK.
- Stacks are best if the program is managing a collection that needs to support a LIFO order.
- Queues should be used if the program is managing a collection that needs to support a FIFO order.
- Binary trees are good for managing a collection of items with a parent-child relationship, such as a family tree.
- Binary search trees are appropriate for managing a sorted collection where the goal is to optimize the time it takes to find specific items in the collection.
- Graphs work best if the application will analyze connectivity and relationships among a collection of individuals in a social media network.