Data structures are the core building blocks of algorithms and real-life applications. By analogy, a well-designed algorithm built on the right data structures is like a well-constructed house: just as a good house is built by putting its components together according to sound construction principles, a good algorithm must be designed on a solid understanding of data structures.
The maximum benefit from good data structures results from designing algorithms around them in the first place. If we observe closely, changing the data structure does not change the correctness of our code because we replace a correct implementation with another correct implementation.
However, different data structures take different amounts of time to execute the same operations. So we compare the efficiency of the operations provided by several candidate data structures and choose the one that gives the best performance for our code.
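As a small illustration (the class and method names below are only for this sketch), checking whether a number is present gives the same correct answer with an unsorted array and with a hash set, but the array needs a linear scan while the hash set answers in O(1) time on average:

```java
import java.util.HashSet;
import java.util.Set;

public class MembershipDemo {
    // Linear scan: correct, but O(n) per query.
    static boolean containsLinear(int[] data, int key) {
        for (int value : data) {
            if (value == key) return true;
        }
        return false;
    }

    // Hash-based lookup: equally correct, but O(1) per query on average.
    static boolean containsHashed(Set<Integer> data, int key) {
        return data.contains(key);
    }

    public static void main(String[] args) {
        int[] array = {7, 3, 9, 1, 5};
        Set<Integer> set = new HashSet<>();
        for (int value : array) set.add(value);

        // Both calls return the same answer; only the running time differs.
        System.out.println(containsLinear(array, 9)); // true
        System.out.println(containsHashed(set, 9));   // true
    }
}
```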
A data structure is a way of organizing various types of data in memory. In other words, data structures are different ways to organize data efficiently in memory so that we can perform various operations on it. We use them to manage, process, and retrieve relevant information efficiently.
Every data structure has two primary components: the data and the operations that work on that data. Data is the information, and operations are the algorithms that work on that data to produce useful results.
Data Structures = Data + Permissible operations on the data
Operations on any data structure can be grouped into two types: query operations (such as searching or finding the minimum, which only read the data) and modification operations (such as insertion or deletion, which change the data). These operations appear frequently in real-life applications, so we should study the design, code, and analysis of the critical operations of each data structure.
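As a minimal sketch of this idea (the class below is hypothetical, not a standard library type), here is a tiny collection of integers where insert is a modification operation and search and findMin are query operations:

```java
public class IntCollection {
    private int[] data = new int[16]; // the data
    private int size = 0;

    // Modification operation: changes the stored data.
    public void insert(int value) {
        if (size == data.length) {
            data = java.util.Arrays.copyOf(data, 2 * size);
        }
        data[size++] = value;
    }

    // Query operation: reads the data without changing it.
    public boolean search(int key) {
        for (int i = 0; i < size; i++) {
            if (data[i] == key) return true;
        }
        return false;
    }

    // Another query operation.
    public int findMin() {
        if (size == 0) throw new IllegalStateException("empty collection");
        int min = data[0];
        for (int i = 1; i < size; i++) min = Math.min(min, data[i]);
        return min;
    }
}
```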
We can classify data structures in several ways based on their use cases and properties.
A primitive data structure is a fundamental building block of data structures. It is a predefined way of storing data of only one type, i.e., it can hold a single value at a specific memory location. Some examples of primitive data types are: boolean (true or false), char (16 bits), byte (8 bits), short (16 bits), int (32 bits), long (64 bits), float (32 bits), and double (64 bits).
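These sizes match Java's primitive types; as a quick illustration, here is how each of them can be declared in Java:

```java
public class PrimitiveTypes {
    public static void main(String[] args) {
        boolean flag = true;                     // true or false
        char letter = 'A';                       // 16 bits
        byte smallNumber = 100;                  // 8 bits
        short mediumNumber = 30000;              // 16 bits
        int count = 100000;                      // 32 bits
        long bigCount = 10000000000L;            // 64 bits
        float ratio = 3.14f;                     // 32 bits
        double preciseRatio = 3.141592653589793; // 64 bits
        System.out.println(flag + " " + letter + " " + smallNumber + " " + mediumNumber
                + " " + count + " " + bigCount + " " + ratio + " " + preciseRatio);
    }
}
```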
Non-primitive data structures are derived from primitive data structures. These are user-defined data structures that store data items, possibly of different types, as a single entity with relationships between the items.
Based on the order of organizing and accessing data, non-primitive data structures can be classified into two categories: Linear data structures and Non-linear data structures.
Linear data structures: Here, elements are arranged or linked linearly, one after another. In other words, elements can be traversed or accessed in a single direction. Examples of linear data structures are arrays, linked lists, stacks, and queues.
Non-linear data structures: Here, elements are not arranged sequentially. In other words, each data element can be linked to more than one other element. They can exhibit hierarchical relationships between elements, and there can be multiple directions to go from one element to another. Examples of non-linear data structures are trees, graphs, sets, etc.
Based on the memory representation of data, i.e., whether they are built using arrays or pointers, non-primitive data structures can be further classified into sequential and linked data structures.
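As a small sketch in Java (the class names are illustrative, not library types): a linked list node has a single next link, so we can only move in one direction; an array stores elements sequentially in contiguous memory; and a binary tree node links to multiple nodes, so there are multiple directions to move:

```java
// Linked, linear: each node has one "next" link, so traversal goes in a single direction.
class ListNode {
    int value;
    ListNode next;
}

// Linked, non-linear: a node can link to multiple nodes (here, two children).
class TreeNode {
    int value;
    TreeNode left, right;
}

class RepresentationDemo {
    public static void main(String[] args) {
        // Sequential (array-based) representation: elements in contiguous memory.
        int[] sequence = {10, 20, 30, 40};

        // Linked (pointer-based) linear representation: 10 -> 20.
        ListNode head = new ListNode();
        head.value = 10;
        head.next = new ListNode();
        head.next.value = 20;

        // Non-linear representation: 20 links to both 10 and 30.
        TreeNode root = new TreeNode();
        root.value = 20;
        root.left = new TreeNode();
        root.left.value = 10;
        root.right = new TreeNode();
        root.right.value = 30;

        System.out.println(sequence.length + " " + head.next.value + " " + root.right.value);
    }
}
```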
Data sets manipulated by algorithms can grow, shrink, or otherwise change over time. We call such a set a dynamic set. Algorithms may perform several different operations on dynamic sets, and the best way to implement a dynamic set depends on the operations that must be supported.
We can categorize dynamic sets into three categories:
Containers: We use the term container for a data structure that allows us to store and retrieve data items independent of their content. For example, arrays, linked lists, stacks, queues, binary trees, graphs, etc., work as containers.
Dictionaries: Many algorithms need only the ability to insert, delete, and search elements in a set. We call a dynamic set that supports these operations a dictionary. For example, we can implement dictionary operations using an array, a linked list, a BST, or a hash table. Here, a hash table is an efficient implementation of a dictionary where each operation works in O(1) time on average.
Priority queues: Many algorithms need the operations of inserting elements and searching for or deleting the element with the maximum or minimum priority in a data set. We call a dynamic set that supports these operations a priority queue. For example, we can implement a priority queue using an array, a linked list, a BST, or a heap. Here, a heap provides an efficient implementation of a priority queue, where the findMin() or findMax() operation works in O(1) time and the deleteMin() or deleteMax() operation works in O(log n) time (see the sketch after the note below).
Note: If an application requires processing data using both dictionary and priority queue operations, then a balanced BST could be a perfect choice. Here, all of these operations work in O(log n) time.
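As a quick sketch using Java's built-in HashMap (a hash table) and PriorityQueue (a binary min-heap), here is how the dictionary and priority queue operations mentioned above look in code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class DynamicSetDemo {
    public static void main(String[] args) {
        // Dictionary: insert, search, and delete in O(1) average time with a hash table.
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("apple", 3);                  // insert
        dictionary.put("banana", 7);
        System.out.println(dictionary.get("apple")); // search -> 3
        dictionary.remove("banana");                 // delete

        // Priority queue: a min-heap where peek() is O(1) and poll() is O(log n).
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        minHeap.add(42);
        minHeap.add(7);
        minHeap.add(19);
        System.out.println(minHeap.peek()); // findMin -> 7
        System.out.println(minHeap.poll()); // deleteMin -> 7
    }
}
```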
The basic idea of data structures starts with the study of abstract data types. An abstract data type (ADT) is a type (or class) of objects whose behaviour is defined by a set of values and a set of operations.
An abstract data type only provides an interface: it tells us what operations can be performed, but not how these operations will be implemented. In other words, it does not specify how the data will be organized in memory or which algorithms will be used to implement the operations.
From another perspective, we can think of an ADT as a black box that hides the inner structure and design of the data type. It is a model of abstraction that provides an implementation-independent view: a user only needs to know what a data type can do, not how it is implemented.
For example, we may want to maintain a first-in-first-out (FIFO) queue of elements. The required operations are the insertion and deletion of elements, where elements must be removed in the same order in which they were inserted. It is convenient to design efficient code for these operations and provide an interface to the user without specifying implementation details. We call the abstract data type that supports these operations a FIFO queue.
Another example of an abstract data type is a queue where elements are associated with priorities. Here, deletions do not happen in the order of insertion but according to the priorities: the first element to be removed at each step is the item with the highest (or lowest) priority among the elements in the queue. This abstract data type is called a priority queue.
So the most crucial part of an abstract data type is the list of operations we want to support. We concentrate on the critical operations of a data structure and design an efficient algorithm for each operation. In simple words, abstract data types allow us to make the algorithm-design process more modular.
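For example, a FIFO queue ADT can be written down as a plain interface: it lists only the operations, not the implementation. This is just a sketch, and the names FifoQueue, enqueue, and dequeue are illustrative:

```java
// A minimal sketch of a FIFO queue as an abstract data type: the interface
// lists only what operations exist, not how they are implemented.
interface FifoQueue<T> {
    void enqueue(T item);  // insert an element at the rear
    T dequeue();           // remove and return the element at the front
    boolean isEmpty();     // query: is the queue empty?
}

// Any class implementing this interface (using an array, a linked list, etc.)
// can be swapped in without changing the code that uses the queue.
```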
Array: Storing any kind of sequential records (like contacts on our phone, speech signals in speech processing, etc.), Implementing stacks, queues, hash tables, heaps, and Fenwick trees, Adjacency matrix representation of a graph, etc.
Singly-linked list: Maintaining a directory of names, Performing arithmetic operations on long integers, Manipulation of polynomials, Implementing stack and queue, Adjacency list representation of a graph, Hashing by chaining method, Dynamic memory allocation, etc.
Doubly-linked list: Implementing LRU cache and Fibonacci heap, Thread scheduler in operating systems, Representing a deck of cards in games, Implementing undo and redo functionality in applications, Previous and next page navigation in a web browser, etc.
Circular linked list: Processes scheduling in the operating system, Keep track of the turn in a multiplayer game, etc.
Stack: Storing browser history, UNDO/REDO options in a text editor, Process scheduling, Static memory allocation, Storing function calls in recursion, Identifying missing braces in an IDE or a compiler (see the sketch below), Expression evaluation, Syntax parsing, etc.
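For instance, the missing-braces check mentioned above can be sketched with a stack: push every opening bracket, and pop and match it when the corresponding closing bracket appears (a simplified illustration, not a full parser):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BraceChecker {
    // Returns true if every opening bracket has a matching closing bracket
    // in the correct order; the stack remembers the most recent unmatched opener.
    static boolean isBalanced(String code) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : code.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;
                char open = stack.pop();
                if ((c == ')' && open != '(') ||
                    (c == ']' && open != '[') ||
                    (c == '}' && open != '{')) {
                    return false;
                }
            }
        }
        return stack.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("int f() { return (a[1] + b); }")); // true
        System.out.println(isBalanced("if (x > 0 { y = 1; }"));           // false
    }
}
```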
Queue: Process scheduling in operating systems (CPU and IO scheduling), Interprocess communication, Waiting lists in applications, Customer care call management, etc.
Hash table: Accessing websites using keywords in search engines, Finding phone numbers on mobile devices, Employee information systems, Spell checkers in word processing software, Symbol tables in a compiler, Load balancing, Database partitioning, Encrypting critical data in cryptography, etc.
Binary tree: Storing hierarchical data (folder structure, organization structure, etc.), Document Object Model to represent documents with a logical tree, etc.
Binary search tree: Applications where data is continuously being inserted and deleted, Implementing ordered maps and sets in programming language libraries, etc.
Heap: Implementing priority queues, Scheduling processes in operating systems, Dynamic memory allocation, Order statistics like finding the kth smallest or largest element (see the sketch after this list), etc.
Trie: Autocompletion in Google Search, Spell-checking in word processors, Swipe features in mobile keyboards, Storing browser history of URLs, Longest prefix matching used by routers in internet protocols, Predictive text technology (T9 dictionary in mobile phones), etc.
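As a small example of the order statistics use case mentioned under heaps, we can find the kth largest element by keeping a min-heap of size k (a sketch using Java's PriorityQueue; the method name kthLargest is just for this illustration):

```java
import java.util.PriorityQueue;

public class KthLargest {
    // Keep a min-heap of size k; its root is the kth largest element seen so far.
    static int kthLargest(int[] values, int k) {
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int value : values) {
            minHeap.add(value);
            if (minHeap.size() > k) {
                minHeap.poll(); // discard the smallest of the k+1 elements
            }
        }
        return minHeap.peek();
    }

    public static void main(String[] args) {
        int[] values = {7, 10, 4, 3, 20, 15};
        System.out.println(kthLargest(values, 3)); // 10
    }
}
```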
Advanced tree data structures
Syntax parsing in compiler design (syntax tree), Wireless networking and memory allocation (treap), Range query processing (segment tree or binary indexed tree), Finding the nearest neighbour to a particular point (k-d tree), Storing data on a drive (B-tree), 3D computer graphics (BSP tree), Fast full-text search in word processors (suffix tree), Database design (B-tree), etc.
Shortest path algorithms: Network routing, Google Maps, Shipment routing in e-commerce, Robot path planning, etc.
Breadth-first search (BFS): Finding people within a given distance k from a person on social network websites, Web crawling in search engines like Google, Finding all neighbour nodes in peer-to-peer networks like BitTorrent, Finding neighbouring places via GPS navigation, etc.
Depth-first search (DFS): Detecting cycles in a graph, Bipartiteness testing in a graph, Finding strongly connected components, Path-finding and solving maze puzzles, Topological sorting, etc.
Topological sorting: Generating a dependency graph of the dependent modules in IDE build systems, Determining the order in which to create complex database tables with interdependencies, Determining the order in which to take courses and their prerequisites to complete a degree (see the sketch below), etc.
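As a sketch of the course-ordering use case (illustrative method names, using Kahn's algorithm), we can compute a valid order by repeatedly picking a course whose remaining prerequisites are all done:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class CourseOrder {
    // Kahn's algorithm: repeatedly take a course with no remaining prerequisites.
    // prereq[i] = {a, b} means course a must be taken before course b.
    static List<Integer> courseOrder(int numCourses, int[][] prereq) {
        List<List<Integer>> adj = new ArrayList<>();
        int[] indegree = new int[numCourses];
        for (int i = 0; i < numCourses; i++) adj.add(new ArrayList<>());
        for (int[] edge : prereq) {
            adj.get(edge[0]).add(edge[1]);
            indegree[edge[1]]++;
        }

        Queue<Integer> ready = new ArrayDeque<>();
        for (int course = 0; course < numCourses; course++) {
            if (indegree[course] == 0) ready.add(course);
        }

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int course = ready.poll();
            order.add(course);
            for (int next : adj.get(course)) {
                if (--indegree[next] == 0) ready.add(next);
            }
        }
        return order; // shorter than numCourses if the prerequisites contain a cycle
    }

    public static void main(String[] args) {
        // 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3: course 0 comes first, course 3 last.
        int[][] prereq = {{0, 1}, {0, 2}, {1, 3}, {2, 3}};
        System.out.println(courseOrder(4, prereq)); // e.g. [0, 1, 2, 3]
    }
}
```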
Other important applications of graphs
Assigning the fastest pick-ups to Uber drivers (Hungarian algorithm), Facebook's friend suggestions and graph search, Google's PageRank algorithm and Knowledge Graph, Resource allocation graphs in operating systems, Transaction graphs in cryptocurrencies (a blockchain is a large graph), Artificial neural networks, Product recommendation graphs (Amazon recommendation system), etc.
Enjoy learning, Enjoy algorithms!