Binary search tree (BST) is a linked representation of a binary tree, where each node has a key and associated value. Each node in a BST contains a key and a value, and the keys must follow a specific order known as the "binary search tree property." This property states that the key in each node must be greater than or equal to the keys in the left subtree and less than or equal to the keys in the right subtree. In other words, both the left and right subtrees of a node must also be binary search trees.
Note: In the linked representation of a binary tree, each node is an object that contains a key, value, and pointers to the left and right children. Depending on the requirements, additional attributes such as a pointer to the parent node or a count of nodes may also be stored.
The binary search tree data structure supports various basic operations, such as search, insert, delete, find maximum, find minimum, find predecessor, find successor, find the kth smallest element, etc. These operations all have a time complexity of O(h), where h is the height of the tree.
If the binary search tree is balanced, meaning that it has roughly the same number of nodes on the left and right subtrees of the root, then the above operations will take O(log n) time, where n is the number of nodes in the tree. On the other hand, if the binary search tree is a linear chain of n nodes, then the operations will take O(n) time. On average, the height of a randomly built binary search tree is O(log n), so the basic operations on such a tree will take O(log n) time on average.
class BinarySearchTree
{
private BSTNode root
private class BSTNode
{
private int key
private int value
private BSTNode left
private BSTNode right
public BSTNode(int x)
{
key = x;
left = right = NULL
}
}
public BinarySearchTree()
{
root = null
}
public void insert(int key)
{
root = insertKey(root, key)
}
private BSTNode insertKey(BSTNode root, int key)
{
//Implementation logic
}
public BSTNode search(int key)
{
return searchKey(root, key)
}
private BSTNode searchKey(BSTNode root, int key)
{
//Implementation logic
}
public void delete(int key)
{
root = deleteKey(root, key)
}
private BSTNode deleteKey(BSTNode root, int key)
{
//Implementation logic
}
public void inorder()
{
inorderWalk(root)
}
private void inorderWalk(BSTNode root)
{
//Implementation logic
}
.... Other Methods....
}
One way to construct a binary search tree is to insert each key one by one, starting with an empty tree. However, the height of the resulting tree will depend on the order in which the keys are inserted. For example:
As we know, the time complexity of basic operations on a binary search tree depends on the height of the tree. If there are frequent insertions and deletions, the height of the tree will also change. In the worst case, the height will be O(n), while in the best case, it will be O(log n).
Therefore, it is important to maintain a balanced BST to ensure that operations can be performed in O(log n) time. One way to do this is to use a self-balancing binary search tree data structure, such as a red-black tree or an AVL tree.
If a binary search tree is created by randomly inserting n keys, the average height of the tree will be close to the best case, which is O(log n). This assumption is based on the idea that all n! permutations of the input keys are equally likely. Note: We will discuss the proof and properties of randomized BST in a separate blog.
This behaviour is similar to that of randomized quicksort, where the average case time complexity is O(n log n), which is much closer to the best case (O(n log n)) than to the worst case (O(n^2)). In other words, the construction of a BST and the quicksort algorithm are similar. The root node of the BST corresponds to the pivot in quicksort (keys to the left are smaller and keys to the right are larger), and the left and right subtrees correspond to the left and right partitions in quicksort.
It is clear that basic operations on a binary search tree will be fast if the tree is balanced, meaning that the height is O(log n). However, maintaining a height-balanced tree can be challenging when frequent insertions and deletions occur. To address this problem, we can use self-balancing binary search tree data structures such as AVL trees and red-black trees, which guarantee that the height of the tree will be O(log n) in the worst case.
An AVL tree is a self-balancing binary tree that ensures that the difference in height between the left and right subtrees of every node is at most 1. It achieves this balance by performing rotation operations during insertions and deletions in constant time.
A red-black tree is a self-balancing binary search tree that stores an extra bit of information (the color red or black) in each node. These colours are used to ensure that the tree remains balanced during insertions and deletions. While the balance of a red-black tree is not perfect, it is sufficient to keep the time complexity of basic operations around O(log n), where n is the total number of nodes in the tree.
The time complexity of dictionary operations (insert, delete, and search) can vary depending on the data structure used. For example, doubly-linked lists can perform insertions and deletions in O(1) time, but searching will take O(n) time in the worst case. Similarly, searching in a sorted array will take O(log n) time, but insertions and deletions will take O(n) time in the worst case.
Self-balancing binary search trees can perform all three operations (insert, delete, and search) in O(log n) time. However, one of the most efficient solutions is to use a hash table, which can perform these operations in an average of O(1) time. The question then arises: If hash tables are faster than BSTs for these basic operations, when should we prefer BSTs over hash tables? Let's think.
Binary search trees are memory efficient because they only allocate as much space as needed, unlike hash tables which may waste space if the exact number of elements to be stored is unknown. For example, if a hash function has a range of 0 ≤ h(k) ≤ 100, then an array of 100 elements must be allocated, even if only 20 elements are being stored. In contrast, a binary search tree would only allocate the necessary space.
BSTs store keys in sorted order and allow for the efficient performance of various operations in O(log n) time, such as finding the minimum or maximum, deleting the minimum or maximum, finding the successor or predecessor, and finding the kth largest or smallest. This versatility makes a self-balancing BST suitable as both a dictionary and a double-ended priority queue. While a heap data structure can support either the extractMin() or extractMax() operation, a self-balancing BST can support both.
BSTs are easier to implement and customize than hash tables, which often rely on libraries provided by programming languages. However, poor choice of the hash function can result in collisions and decreased performance of the hash table. Additionally, as the number of elements in a hash table grows, it may need to be extended and reallocated, which can be costly. BSTs, on the other hand, are simple to implement and do not require sudden allocation of large amounts of data or rehashing operations.
Binary search trees also allow for efficient range searches, where only the necessary subtrees are searched rather than iterating over every slot in a hash table (O(n)).
We will add more insights to this blog in the coming future. Enjoy learning, enjoy algorithms :)
Looking forward to your feedback and insights.