Database Indexing in System Design

What is Database Indexing?

Database Indexing is a technique to improve the performance of a database by organizing data in a way that makes it easier to retrieve information. An index is similar to an index in a book, serving as a reference to the data stored in a database table.

When you index a database, you create an additional data structure that helps quickly locate a specific field value and its corresponding record. This structure is usually sorted for efficient binary search operation, which makes it faster to perform read operations. Indexing can also help to enforce the uniqueness of data in the database.

There are several different indexing methods available. But before exploring those, it's important to understand the need for indexing. For example, indexing might be necessary when a database has many fields and needs to be organized for improved performance.

What do we need Database Indexing?

Suppose you are in your city’s central library and have 5 minutes to find a particular book. Would you accomplish this task and find the book in the specified time frame? It would be nearly impossible to search for any book in such an extensive collection by checking each section. However, if we request access to the library’s index, our task would be easier because indexes provide all the information required to locate the book quickly. Similarly, a database index includes all of the information needed to access data efficiently.

Nowadays, IT industry is generating vast amounts of data. All such data need to be stored, but we need a way to efficiently store and retrieve such a massive amount of data. The ability of a company to access the data quickly is critical to its success. One way to achieve this is using the similar idea of maintaining indexes as we saw in the Library example. The indexes function as lookup tables for faster retrieval.

  • To improve query performance: Indexing speeds up data retrieval process from a database by allowing the system to quickly locate the desired data without having to search through the entire table.
  • To enforce unique constraints: Indexing can help enforce the uniqueness of data in the database, which prevent duplicate entries and ensure data integrity.
  • For efficient sorting and grouping: Indexing can also improve the efficiency of sorting and grouping operations. This make us possible to retrieve and manipulate data in an organized and meaningful way.

Methods of Database Indexing

There are primarily three methods of database indexing:

  • Clustered Indexing
  • Secondary Indexing
  • Multilevel Indexing

Clustered Indexing

Cluster indexing is a form of storage in which more than two records are stored in the same file. We may reduce the search cost by using cluster indexing since many records related to the same item are stored in one location.

In this method, an ordered data file is used to describe the clustering index, and a non-key field is used to sort the data file. In some instances, the index is built on columns other than the primary key, which may or may not be unique for each record. In such instances, we can group two or more columns to get the unique values and index them to make it easier to find the data. Hence records with similar characteristics are grouped, and indexes for these groups are generated.

Secondary Indexing

A secondary index indicates where the data is stored by providing a list of virtual pointers or references to the data’s location. The data is not physically processed in the index’s order but is stored in leaf nodes.

For example, consider a book’s contents page. Each entry specifies the information’s page number or position. Although the data is not structured, we have an ordered reference showing where the data points are located. We can only have dense ordering in the non-clustered index because data is not physically ordered, so sparse ordering is unlikely. It takes longer than the clustered index because more work is required to extract the data by following the pointer further. The data is immediately in front of the index in the case of a clustered index.

Multilevel Indexing

Indexes expand in tandem with the size of the database. A single-level index can become too big to store with multiple disc accesses because it is stored in the main memory. Multilevel indexing divides the main block into smaller blocks stored together in a single block. Internal blocks are separated into outer blocks, which are then pointed to the data blocks. With fewer overheads, this can be conveniently stored in the main memory.

How to create indexes?

Data is stored in rows and organized into tables in a database. Each row has a unique key that distinguishes it from the others, and these keys are stored in an index for easy retrieval. Since keys are stored in indexes, the index is automatically modified whenever a new row with a specific key is inserted.

As data is saved on a disk-based storage unit, it is organized into blocks. The atomic disc access operation is when these blocks are accessed in their entirety. Disk blocks are organized similarly to linked lists: they have a data section and a pointer to the next node (or block), and they don’t have to be stored in the same order.

We may claim that searching on a field that isn’t sorted requires a Linear Search because multiple records can only be sorted on one field. If that field is a non-key field, N block accesses are needed to search the entire table space. With a sorted field with log2 N block accesses, a Binary Search can be used. Furthermore, since the data is sorted based on a non-key field, the rest of the table does not need to be searched for duplicate values until a higher value is discovered. As a result, the performance boost is essential.

Advantages of Indexing

  • Indexing improves the search and reads the speed of the database
  • It allows you to reduce the total number of I/O operations needed to retrieve the data by eliminating the need to access a database row via an index structure.

Disadvantages of Indexing

  • With indexing partitioning of the table is not permitted
  • Indexing takes up more storage space since the indices are stored together in a table. If several fields within the same table are indexed, this file can easily exceed the size limits of the underlying file system.
  • Indexing reduces the write speed of the database.
  • A primary key on the table with a unique value is needed to perform the indexing database management scheme.

Conclusion

Indexes are a powerful method that a database uses in the background to speed up querying. Indexes power queries by allowing for a fast lookup of the requested information. It’s a data structure technique for quickly locating and accessing data in a database. Indexing is a technique for improving database efficiency by reducing the number of disc accesses needed when a query is running.

More from EnjoyAlgorithms

Self-paced Courses and Blogs