Binary Search Tree (BST) Introduction and Properties

A binary search tree (BST) is a linked representation of a binary tree in which each node stores a key and an associated value. The keys must satisfy a specific order known as the binary search tree property: the key in each node is greater than or equal to every key in its left subtree and less than or equal to every key in its right subtree. In other words, both the left and right subtrees of a node must themselves be binary search trees.

Note: In the linked representation of a binary tree, each node is an object that contains a key, a value, and pointers to its left and right children. Depending on the requirements, additional attributes such as a pointer to the parent node or the number of nodes in the subtree may also be stored.

Binary search tree example

Operations on Binary Search Trees

The binary search tree data structure supports various basic operations such as search, insert, delete, find minimum, find maximum, find predecessor, find successor, and find the kth smallest element. All of these operations run in O(h) time, where h is the height of the tree.

If the binary search tree is balanced, meaning that the left and right subtrees of every node contain roughly the same number of nodes, then the above operations take O(log n) time, where n is the number of nodes in the tree. On the other hand, if the binary search tree is a linear chain of n nodes, the operations take O(n) time. On average, the height of a randomly built binary search tree is O(log n), so the basic operations on such a tree take O(log n) time on average.

class BinarySearchTree 
{
    private BSTNode root;

    // Each node stores a key, an associated value, and links to its left and right children.
    private class BSTNode 
    {
        private int key;
        private int value;
        private BSTNode left;
        private BSTNode right;

        public BSTNode(int x) 
        {
            key = x;
            left = right = null;
        }
    }
    
    public BinarySearchTree() 
    {
        root = null;
    }
    
    public void insert(int key) 
    {
        root = insertKey(root, key);
    }
    
    private BSTNode insertKey(BSTNode root, int key) 
    {
        // Implementation logic
    }
    
    public BSTNode search(int key) 
    {
        return searchKey(root, key);
    }

    private BSTNode searchKey(BSTNode root, int key) 
    {
        // Implementation logic
    }
    
    public void delete(int key) 
    {
        root = deleteKey(root, key);
    }

    private BSTNode deleteKey(BSTNode root, int key) 
    {
        // Implementation logic
    }
    
    public void inorder() 
    {
        inorderWalk(root);
    }

    private void inorderWalk(BSTNode root) 
    {
        // Implementation logic
    }
    
    // .... Other methods ....
}
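
For reference, here is one way the recursive helpers insertKey(), searchKey(), and inorderWalk() of the above class could be implemented. This is a minimal sketch that ignores the value field and duplicate keys; deleteKey() is omitted here because deleting a node with two children needs extra care.

private BSTNode insertKey(BSTNode root, int key) 
{
    // A null link marks an empty position: place the new node here.
    if (root == null)
        return new BSTNode(key);
    // Otherwise, recurse into the subtree where the key belongs.
    if (key < root.key)
        root.left = insertKey(root.left, key);
    else
        root.right = insertKey(root.right, key);
    return root;
}

private BSTNode searchKey(BSTNode root, int key) 
{
    // Either we fell off the tree (key is absent) or we found the key.
    if (root == null || root.key == key)
        return root;
    // The BST property tells us which subtree can contain the key.
    if (key < root.key)
        return searchKey(root.left, key);
    return searchKey(root.right, key);
}

private void inorderWalk(BSTNode root) 
{
    if (root == null)
        return;
    inorderWalk(root.left);               // keys smaller than root.key
    System.out.print(root.key + " ");     // the key itself
    inorderWalk(root.right);              // keys larger than root.key
}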

Construction of a binary search tree

One way to construct a binary search tree is to insert each key one by one, starting with an empty tree. However, the height of the resulting tree will depend on the order in which the keys are inserted. For example:

  • In the best case, a BST with n keys will be perfectly balanced, meaning that the path from the root to any leaf contains O(log n) nodes.
  • In the worst case, if the keys are inserted in increasing or decreasing order, the tree will be a linear chain of nodes with a height of n - 1.

As we know, the time complexity of basic operations on a binary search tree depends on the height of the tree. If there are frequent insertions and deletions, the height of the tree will also change. In the worst case, the height will be O(n), while in the best case, it will be O(log n).
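
The effect of insertion order on height is easy to observe. The standalone sketch below (it uses its own minimal Node type and a height() helper, which are not part of the class shown earlier) inserts the same keys once in sorted order and once in shuffled order, then prints the resulting heights.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BSTHeightDemo 
{
    static class Node 
    {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node root, int key) 
    {
        if (root == null)
            return new Node(key);
        if (key < root.key)
            root.left = insert(root.left, key);
        else
            root.right = insert(root.right, key);
        return root;
    }

    // Height counted in edges: a single node has height 0, an empty tree -1.
    static int height(Node root) 
    {
        if (root == null)
            return -1;
        return 1 + Math.max(height(root.left), height(root.right));
    }

    public static void main(String[] args) 
    {
        int n = 1023;
        List<Integer> keys = new ArrayList<>();
        for (int i = 1; i <= n; i++)
            keys.add(i);

        // Sorted insertion builds a linear chain of height n - 1 = 1022.
        Node sortedTree = null;
        for (int key : keys)
            sortedTree = insert(sortedTree, key);

        // Shuffled insertion typically gives a height close to O(log n).
        Collections.shuffle(keys);
        Node shuffledTree = null;
        for (int key : keys)
            shuffledTree = insert(shuffledTree, key);

        System.out.println("Height after sorted insertion:   " + height(sortedTree));
        System.out.println("Height after shuffled insertion: " + height(shuffledTree));
    }
}

On a typical run, the shuffled insertion produces a height of a few dozen at most, while sorted insertion always produces a chain of height 1022.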

Therefore, it is important to maintain a balanced BST to ensure that operations can be performed in O(log n) time. One way to do this is to use a self-balancing binary search tree data structure, such as a red-black tree or an AVL tree.

Randomly built binary search tree

If a binary search tree is built by inserting n keys in random order, the average height of the tree will be close to the best case, which is O(log n). This result assumes that all n! permutations of the input keys are equally likely. Note: We will discuss the proof and properties of randomly built BSTs in a separate blog.

This behaviour is similar to that of randomized quicksort, where the average-case time complexity of O(n log n) is much closer to the best case (O(n log n)) than to the worst case (O(n^2)). In fact, building a BST is closely related to quicksort: the root node of the BST plays the role of the pivot (keys smaller than the root go to the left, larger keys go to the right), and the left and right subtrees correspond to the left and right partitions in quicksort.

Height balanced BST

It is clear that basic operations on a binary search tree will be fast if the tree is balanced, meaning that the height is O(log n). However, maintaining a height-balanced tree can be challenging when frequent insertions and deletions occur. To address this problem, we can use self-balancing binary search tree data structures such as AVL trees and red-black trees, which guarantee that the height of the tree will be O(log n) in the worst case.

An AVL tree is a self-balancing binary search tree that ensures the difference in height between the left and right subtrees of every node is at most 1. It maintains this balance by performing rotation operations, each taking constant time, during insertions and deletions.
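
For intuition, here is roughly what a single left rotation looks like on a plain node type (a hypothetical minimal Node, not the BSTNode class above). AVL and red-black trees apply such rotations to restore balance after insertions and deletions.

class RotationSketch 
{
    // A hypothetical minimal node type used only for this sketch.
    static class Node 
    {
        int key;
        Node left, right;
    }

    // Rotate the subtree rooted at x to the left:
    //
    //      x                  y
    //     / \                / \
    //    A   y      ==>     x   C
    //       / \            / \
    //      B   C          A   B
    //
    // Only a constant number of links change, so a rotation takes O(1) time,
    // and the in-order sequence of keys (A, x, B, y, C) is preserved.
    static Node rotateLeft(Node x) 
    {
        Node y = x.right;   // y becomes the new root of this subtree
        x.right = y.left;   // subtree B moves under x
        y.left = x;         // x becomes the left child of y
        return y;
    }
}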

A red-black tree is a self-balancing binary search tree that stores an extra bit of information (the colour red or black) in each node. These colours are used to keep the tree balanced during insertions and deletions. While the balance of a red-black tree is not perfect, it is sufficient to guarantee a height of O(log n), so the basic operations still run in O(log n) time, where n is the total number of nodes in the tree.

Properties of binary search trees

  1. The binary search tree property allows us to explore all keys in sorted order by performing an in-order tree traversal. This means we first explore all keys in the left subtree (less than the root key), then the root key itself, and finally, all keys in the right subtree (greater than the root key).
  2. A binary search tree is a recursive object that consists of the root node and two smaller binary search trees (the left and right subtrees). This means that many BST problems can be solved using recursion.
  3. A BST combines the fast search capabilities of a sorted array with the flexibility of insertion and deletion in a linked list.
  4. Many different BSTs can represent the same set of keys and values. The total number of structurally different BSTs with n distinct keys is the nth Catalan number Cn = (2n)! / ((n + 1)! * n!); a quick way to compute this formula is sketched after this list.
  5. Each node in a BST is labelled with a single key such that all keys in the left subtree of a node are less than the key of the node, and all keys in the right subtree are greater than the key of the node. As a result, for any binary tree shape with n nodes and any set of n distinct keys, there is exactly one way to assign the keys to the nodes so that the tree becomes a BST.
  6. In a BST, new nodes are inserted into null links at the bottom of the tree, preserving the tree structure. For example, the root node will contain the first key inserted, and one of the children of the root will contain the second key inserted, and so on. Because each node has two links, the tree grows outward rather than just downward. This means that we only need to explore keys on the path from the root to the leaf, and the number of keys explored becomes a smaller fraction of the total number of keys in the tree as the tree size increases.
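
As a quick sanity check of the Catalan number formula from property 4, the small helper below computes Cn with exact integer arithmetic. It avoids computing huge factorials by building the binomial coefficient incrementally; the result fits in a long for n up to about 30.

public class CatalanDemo 
{
    // C(n) = (2n)! / ((n + 1)! * n!) = binomial(2n, n) / (n + 1)
    static long catalan(int n) 
    {
        long binom = 1;
        // Build binomial(2n, n) as the product over k = 1..n of (n + k) / k.
        // Every partial product is itself a binomial coefficient, so the
        // integer division below is always exact.
        for (int k = 1; k <= n; k++)
            binom = binom * (n + k) / k;
        return binom / (n + 1);
    }

    public static void main(String[] args) 
    {
        System.out.println(catalan(3));  // 5: five different BST shapes on 3 distinct keys
    }
}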

Why is the binary search tree important?

The time complexity of dictionary operations (insert, delete, and search) can vary depending on the data structure used. For example, doubly-linked lists can perform insertions and deletions in O(1) time, but searching will take O(n) time in the worst case. Similarly, searching in a sorted array will take O(log n) time, but insertions and deletions will take O(n) time in the worst case.

Self-balancing binary search trees can perform all three operations (insert, delete, and search) in O(log n) time. However, one of the most efficient solutions is to use a hash table, which can perform these operations in an average of O(1) time. The question then arises: If hash tables are faster than BSTs for these basic operations, when should we prefer BSTs over hash tables? Let's think.

Advantages of binary search tree over hash table

Binary search trees are memory efficient because they allocate only as much space as needed, unlike hash tables, which may waste space if the exact number of elements to be stored is unknown. For example, if a hash function maps keys to the range 0 ≤ h(k) < 100, then an array of 100 slots must be allocated, even if only 20 elements are being stored. In contrast, a binary search tree would allocate only the space it needs.

BSTs store keys in sorted order and allow for the efficient performance of various operations in O(log n) time, such as finding the minimum or maximum, deleting the minimum or maximum, finding the successor or predecessor, and finding the kth largest or smallest. This versatility makes a self-balancing BST suitable as both a dictionary and a double-ended priority queue. While a heap data structure can support either the extractMin() or extractMax() operation, a self-balancing BST can support both.
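
As an illustration, Java's TreeMap, which is backed by a red-black tree, already exposes these ordered operations; the snippet below is only a usage sketch.

import java.util.TreeMap;

public class OrderedOperationsDemo 
{
    public static void main(String[] args) 
    {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(40, "forty");
        map.put(10, "ten");
        map.put(30, "thirty");
        map.put(20, "twenty");

        System.out.println(map.firstKey());    // 10 -> minimum key
        System.out.println(map.lastKey());     // 40 -> maximum key
        System.out.println(map.lowerKey(30));  // 20 -> predecessor of 30
        System.out.println(map.higherKey(30)); // 40 -> successor of 30

        // Used as a double-ended priority queue: remove both ends in O(log n).
        System.out.println(map.pollFirstEntry());  // 10=ten
        System.out.println(map.pollLastEntry());   // 40=forty
    }
}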

BSTs are easier to implement and customize than hash tables, which often rely on libraries provided by programming languages. A poor choice of hash function can cause many collisions and degrade the performance of a hash table. Additionally, as the number of elements in a hash table grows, the table may need to be resized and rehashed, which can be costly. BSTs, on the other hand, are simple to implement and do not require sudden allocation of large amounts of memory or rehashing operations.

Binary search trees also support efficient range searches: only the subtrees that can contain keys in the given range are explored, whereas a hash table would require scanning every slot, which takes O(n) time.
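
A recursive range search only descends into subtrees that can actually contain keys in the query range. Here is a minimal sketch, again using a hypothetical minimal Node type rather than the class from earlier.

import java.util.List;

class RangeSearchDemo 
{
    // A hypothetical minimal BST node used only for this sketch.
    static class Node 
    {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Collect all keys in [lo, hi] from the subtree rooted at node.
    // Subtrees that cannot contain any key in the range are never visited.
    static void rangeSearch(Node node, int lo, int hi, List<Integer> result) 
    {
        if (node == null)
            return;
        if (lo < node.key)
            rangeSearch(node.left, lo, hi, result);   // left side may hold keys >= lo
        if (lo <= node.key && node.key <= hi)
            result.add(node.key);                     // keys are collected in sorted order
        if (node.key < hi)
            rangeSearch(node.right, lo, hi, result);  // right side may hold keys <= hi
    }
}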

Critical ideas to think!

  • We can sort a given set of n numbers by first building a binary search tree containing these numbers and then printing the numbers using an inorder tree walk. What are the worst-case and best-case time complexities of this sorting algorithm?
  • Design an array implementation of a BST using three arrays: one to store keys, one to store the array indices corresponding to left pointers, and one to store the array indices corresponding to right pointers. Also, compare the efficiency of its operations with the standard linked implementation.
  • How can we modify BST implementation to keep the most recently accessed key so that it can be accessed in O(1) time if the next insert() or search() uses the same key?
  • Design a method that takes a node and two keys, min and max, as arguments and returns true if all the keys in the subtree rooted at that node lie between min and max.
  • What is the difference between the binary-search-tree property and the min-heap property? Can we use the min-heap property to print keys in sorted order in O(n) time?
  • How many binary search tree shapes of n nodes are there with height n? How many ways are there to insert n distinct keys into an initially empty BST that results in a tree of height n?
  • Suppose we construct a BST by repeatedly inserting distinct keys. Prove that the number of nodes examined in searching for a key is one plus the number of nodes examined when that key was first inserted into the tree.
  • Does deleting key x and then key y from a binary search tree leave the same tree as deleting key y and then key x?
  • Prove that if a node in a BST has two children, its successor has no left child, and its predecessor has no right child.
  • Suppose we have an estimate ahead of time of how often search keys are to be accessed in a BST and the freedom to insert them in any order we desire. Should the keys be inserted into the tree in increasing order, decreasing order of likely frequency of access, or some other order?
  • Comparison-based sorting takes at least O(n log n) time to sort n elements. Similarly, prove that no comparison-based algorithm can build a BST using fewer than O(n log n) comparisons.

We will add more insights to this blog in the future. Enjoy learning, enjoy algorithms :)

Looking forward to your feedback and insights.
