Binary Search Trees
Learn all about Binary Search Trees in this comprehensive tutorial.
A Binary Search Tree (BST) is a type of Binary Tree data structure, where the following properties must be true for any node "X" in the tree:
- The X node's left child and all of its descendants (children, children's children, and so on) have lower values than X's value.
- The right child, and all its descendants have higher values than X's value.
- Left and right subtrees must also be Binary Search Trees.
These properties make it faster to search, add and delete values than in a regular binary tree.
To make this as easy to understand and implement as possible, let's also assume that all values in a Binary Search Tree are unique.
The size of a tree is the number of nodes in it (n).
A subtree starts with one of the nodes in the tree as a local root, and consists of that node and all its descendants.
The descendants of a node are all the child nodes of that node, and all their child nodes, and so on. Just start with a node, and the descendants will be all nodes that are connected below that node.
A node's height is the maximum number of edges between that node and a leaf node below it.
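The size and height definitions above can be sketched in a few lines of Python. The `TreeNode` class, the function names `size()` and `height()`, and the example values are assumptions made for illustration; returning -1 for an empty subtree is one common convention that makes a leaf's height come out as 0.

```python
class TreeNode:
    """Minimal binary tree node (hypothetical helper, not from the text)."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def size(node):
    """Number of nodes in the subtree rooted at node."""
    if node is None:
        return 0
    return 1 + size(node.left) + size(node.right)


def height(node):
    """Maximum number of edges from node down to a leaf (-1 if empty)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


# Small example tree (values chosen for illustration):
#       13
#      /  \
#     7    15
#    / \
#   3   8
root = TreeNode(13, TreeNode(7, TreeNode(3), TreeNode(8)), TreeNode(15))

print(size(root))          # 5
print(height(root))        # 2
print(height(root.right))  # 0, because node 15 is a leaf
```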
A node's in-order successor is the node that comes after it if we were to do in-order traversal. In-order traversal of the BST above would result in node 13 coming before node 14, and so the successor of node 13 is node 14.
Traversal of a Binary Search Tree
Just to confirm that we actually have a Binary Search Tree data structure in front of us, we can check if the properties at the top of this page are true. So for every node in the figure above, check if all the values to the left of the node are lower, and that all values to the right are higher.
Another way to check if a Binary Tree is a BST is to do an in-order traversal (like we did on the previous page) and check that the resulting list of values is in increasing order.
The code below is an implementation of the Binary Search Tree in the figure above, with traversal.
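The figure and the original listing are not reproduced here, so the following is only a minimal sketch. The node values are an assumption, chosen so that node 14 follows node 13 as in the successor example, and the names `TreeNode` and `inOrderTraversal()` are likewise illustrative.

```python
class TreeNode:
    """Minimal BST node (hypothetical helper)."""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def inOrderTraversal(node, visited=None):
    """Visit the left subtree, then the node itself, then the right subtree."""
    if visited is None:
        visited = []
    if node is not None:
        inOrderTraversal(node.left, visited)
        visited.append(node.data)
        inOrderTraversal(node.right, visited)
    return visited


# Assumed example tree:
#         13
#        /  \
#       7    15
#      / \   / \
#     3   8 14  19
#              /
#             18
root = TreeNode(13)
root.left = TreeNode(7)
root.right = TreeNode(15)
root.left.left = TreeNode(3)
root.left.right = TreeNode(8)
root.right.left = TreeNode(14)
root.right.right = TreeNode(19)
root.right.right.left = TreeNode(18)

values = inOrderTraversal(root)
print(values)  # [3, 7, 8, 13, 14, 15, 18, 19]

# A Binary Tree with unique values is a BST exactly when this list is increasing.
is_increasing = all(a < b for a, b in zip(values, values[1:]))
print(is_increasing)  # True
```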
As we can see by running the code example above, the in-order traversal produces a list of numbers in an increasing (ascending) order, which means that this Binary Tree is a Binary Search Tree.
Search for a Value in a BST
Searching for a value in a BST is very similar to how we found a value using Binary Search on an array.
For Binary Search to work, the array must be sorted already, and searching for a value in an array can then be done really fast.
Similarly, searching for a value in a BST can also be done really fast because of how the nodes are placed.
The algorithm can be implemented like this:
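One possible version is sketched below, assuming a simple `TreeNode` class and a recursive `search()` function (both names are illustrative). At each node, the comparison decides which half of the tree can be ignored, just as Binary Search discards half of the array.

```python
class TreeNode:
    """Minimal BST node (hypothetical helper)."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def search(node, target):
    """Return the node holding target, or None if target is not in the tree."""
    if node is None:
        return None                        # Reached an empty spot: not found
    if node.data == target:
        return node                        # Found it
    if target < node.data:
        return search(node.left, target)   # Lower values live on the left
    return search(node.right, target)      # Higher values live on the right


# Assumed example tree with values 3, 7, 8, 13, 14, 15, 18, 19
root = TreeNode(
    13,
    TreeNode(7, TreeNode(3), TreeNode(8)),
    TreeNode(15, TreeNode(14), TreeNode(19, TreeNode(18))),
)

found = search(root, 8)
print(found.data if found else "not found")    # 8

missing = search(root, 51)
print(missing.data if missing else "not found")  # not found
```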
The time complexity for searching a BST for a value is O(h), where h is the height of the tree.
For a BST with most nodes on the right side for example, the height of the tree becomes larger than it needs to be, and the worst case search will take longer. Such trees are called unbalanced.
Both Binary Search Trees above have the same nodes, and in-order traversal of both trees gives us the same result, but the height is very different. It takes longer to search the unbalanced tree above because it is taller.
We will use the next page to describe a type of Binary Tree called AVL Trees. AVL trees are self-balancing, which means that the height of the tree is kept to a minimum so that operations like search, insertion and deletion take less time.
Insert a Node in a BST
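Inserting a node in a BST is similar to searching for a value: starting at the root, we go left if the new value is lower than the current node's value and right if it is higher, until we reach an empty spot, and the new node is attached there.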
Inserting nodes as described above means that an inserted node will always become a new leaf node.
All nodes in the BST are unique, so in case we find the same value as the one we want to insert, we do nothing.
This is how node insertion in BST can be implemented:
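The sketch below is one way to write it, assuming a `TreeNode` class and a recursive `insert()` that returns the (possibly new) root of the subtree it was called on. These names, the helper `inOrderTraversal()`, and the example values are illustrative assumptions.

```python
class TreeNode:
    """Minimal BST node (hypothetical helper)."""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert(node, data):
    """Insert data into the subtree rooted at node and return that subtree's root."""
    if node is None:
        return TreeNode(data)              # Empty spot: the new leaf goes here
    if data < node.data:
        node.left = insert(node.left, data)
    elif data > node.data:
        node.right = insert(node.right, data)
    # data == node.data: the value already exists, so we do nothing
    return node


def inOrderTraversal(node):
    """Return the values of the subtree in in-order sequence."""
    if node is None:
        return []
    return inOrderTraversal(node.left) + [node.data] + inOrderTraversal(node.right)


# Build a tree by repeated insertion (assumed example values)
root = None
for value in [13, 7, 15, 3, 8, 14, 19, 18]:
    root = insert(root, value)

root = insert(root, 10)   # New value: becomes the right child of node 8
root = insert(root, 8)    # Duplicate: the tree is left unchanged

print(inOrderTraversal(root))  # [3, 7, 8, 10, 13, 14, 15, 18, 19]
```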
Find The Lowest Value in a BST Subtree
The next section will explain how we can delete a node in a BST, but to do that we need a function that finds the lowest value in a node's subtree.
This is what a function for finding the lowest value in the subtree of a BST node looks like:
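A minimal sketch, assuming a `TreeNode` class with `data`, `left` and `right` fields and assuming the function is only called with a non-empty subtree. Because lower values are always to the left, the lowest value is found by following left children until there are none.

```python
class TreeNode:
    """Minimal BST node (hypothetical helper)."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def minValueNode(node):
    """Return the node with the lowest value in the subtree rooted at node.

    Assumes node is not None.
    """
    current = node
    while current.left is not None:
        current = current.left   # Keep going left: lower values are on the left
    return current


# Assumed example tree with values 3, 7, 8, 13, 14, 15, 18, 19
root = TreeNode(
    13,
    TreeNode(7, TreeNode(3), TreeNode(8)),
    TreeNode(15, TreeNode(14), TreeNode(19, TreeNode(18))),
)

print(minValueNode(root).data)        # 3, the lowest value in the whole tree
print(minValueNode(root.right).data)  # 14, the in-order successor of the root
```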
We will use this minValueNode() function in the section below, to find a node's in-order successor, and use that to delete a node.
Delete a Node in a BST
To delete a node, our function must first search the BST to find it.
After the node is found there are three different cases where deleting a node must be done differently.
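When the node we want to delete has both a left and a right child, we use its in-order successor, which is the lowest node in its right subtree. The successor never has a left child, so it is either a leaf node or has only a right child. Because it is the node that comes right after the node we want to delete, we can copy its value into that node and then delete the successor.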
This is how a BST can be implemented with functionality for deleting a node:
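The original listing is not reproduced here. The sketch below is one possible version, arranged so that `delete()` starts on line 1 of the block and its line numbers line up with the walkthrough that follows. The `TreeNode` class, the helper `inOrderTraversal()`, and the example values are assumptions; the helpers are placed after `delete()` only to keep that numbering.

```python
def delete(node, data):
    if not node:
        return None

    if data < node.data:
        node.left = delete(node.left, data)
    elif data > node.data:
        node.right = delete(node.right, data)
    else:
        # Node with only one child or no child
        if not node.left:
            # No left child: the right child (or None for a leaf)
            # takes this node's place in the parent.
            return node.right
        elif not node.right:
            # No right child: the left child takes this node's
            # place in the parent.
            return node.left

        # Node with two children: copy the in-order successor, then remove it
        node.data = minValueNode(node.right).data
        node.right = delete(node.right, node.data)

    return node


class TreeNode:
    """Minimal BST node (hypothetical helper)."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def minValueNode(node):
    """Lowest node in a non-empty subtree: follow left children to the end."""
    current = node
    while current.left is not None:
        current = current.left
    return current


def inOrderTraversal(node):
    """Return the values of the subtree in in-order sequence."""
    if node is None:
        return []
    return inOrderTraversal(node.left) + [node.data] + inOrderTraversal(node.right)


# Assumed example tree with values 3, 7, 8, 13, 14, 15, 18, 19
root = TreeNode(
    13,
    TreeNode(7, TreeNode(3), TreeNode(8)),
    TreeNode(15, TreeNode(14), TreeNode(19, TreeNode(18))),
)

root = delete(root, 13)        # Two children: successor 14 replaces 13
print(inOrderTraversal(root))  # [3, 7, 8, 14, 15, 18, 19]

root = delete(root, 19)        # One child: 18 takes the place of 19
root = delete(root, 3)         # Leaf: simply removed
print(inOrderTraversal(root))  # [7, 8, 14, 15, 18]
```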
Line 1: The node argument here makes it possible for the function to call itself recursively on smaller and smaller subtrees in the search for the node with the data we want to delete.
Lines 2-8: This searches for the node with the data that we want to delete.
Lines 9-22: The node we want to delete has been found. There are three cases:
- **Case 1**: Node with no child nodes (leaf node). None is returned, and that becomes the parent node's new left or right value by recursion (line 6 or 8).
- **Case 2**: Node with only a left or only a right child node. That left or right child node becomes the parent's new left or right child through recursion (line 6 or 8).
- **Case 3**: Node has both left and right child nodes. The in-order successor is found using the minValueNode() function. We keep the successor's value by setting it as the value of the node we want to delete, and then we can delete the successor node.
Line 24: node is returned to maintain the recursive functionality.
BST Compared to Other Data Structures
Binary Search Trees take the best from two other data structures: Arrays and Linked Lists.
| Data Structure | Searching for a value | Delete / Insert leads to shifting in memory |
|---|---|---|
| Sorted Array | O(log n) | Yes |
| Linked List | O(n) | No |
| Binary Search Tree | O(log n) | No |
Searching a BST is just as fast as Binary Search on an array, with the same time complexity O(log n).
And deleting and inserting new values can be done without shifting elements in memory, just like with Linked Lists.
BST Balance and Time Complexity
On a Binary Search Tree, operations like inserting a new node, deleting a node, or searching for a node are actually O(h). That means that the higher the tree is (h), the longer the operation will take.
The reason why we wrote that searching for a value is O(log n) in the table above is because that is true if the tree is "balanced", like in the image below.
We call this tree balanced because there are approximately the same number of nodes on the left and right side of the tree.
The exact way to tell that a Binary Tree is balanced is that, for every node, the heights of its left and right subtrees differ by at most one. In the image above, the left subtree of the root node has height h=2, and the right subtree has height h=3.
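The balance condition above can be checked with a short sketch like the one below. The names `TreeNode`, `height()` and `is_balanced()` are illustrative assumptions, and so are the example trees. This straightforward version recomputes heights at every node, which is fine for small examples but not the most efficient approach.

```python
class TreeNode:
    """Minimal binary tree node (hypothetical helper)."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def height(node):
    """Maximum number of edges from node down to a leaf (-1 if empty)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if, at every node, the subtree heights differ by at most one."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)


# Assumed example: subtree heights differ by at most one at every node
balanced = TreeNode(
    13,
    TreeNode(7, TreeNode(3), TreeNode(8)),
    TreeNode(15, TreeNode(14), TreeNode(19, TreeNode(18))),
)

# Assumed example: every node only has a right child
chain = TreeNode(1, None, TreeNode(2, None, TreeNode(3)))

print(is_balanced(balanced))  # True
print(is_balanced(chain))     # False
```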
For a balanced BST, with a large number of nodes (big n), we get height h ≈ log₂ n, and therefore the time complexity for searching, deleting, or inserting a node can be written as O(h) = O(log n).
But, in case the BST is completely unbalanced, like in the image below, the height of the tree is approximately the same as the number of nodes, h ≈ n, and we get time complexity O(h) = O(n) for searching, deleting, or inserting a node.
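To make the difference concrete, the sketch below builds two BSTs from the same 15 values, using only a different insertion order, and compares their heights. The helper names and the chosen values are assumptions for illustration.

```python
import math


class TreeNode:
    """Minimal BST node (hypothetical helper)."""
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None


def insert(node, data):
    """Insert data into the subtree rooted at node and return that subtree's root."""
    if node is None:
        return TreeNode(data)
    if data < node.data:
        node.left = insert(node.left, data)
    elif data > node.data:
        node.right = insert(node.right, data)
    return node


def height(node):
    """Maximum number of edges from node down to a leaf (-1 if empty)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def build(values):
    """Build a BST by inserting the values in the given order."""
    root = None
    for value in values:
        root = insert(root, value)
    return root


n = 15

# Inserting already-sorted values puts every node to the right of the previous one
unbalanced = build(range(1, n + 1))

# Inserting the middle value first, then the middles of each half, and so on
balanced = build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15])

print(height(unbalanced))         # 14, which is n - 1
print(height(balanced))           # 3
print(math.floor(math.log2(n)))   # 3
```

With the same 15 nodes, a search in the unbalanced tree may have to follow 14 edges, while in the balanced tree it never follows more than 3.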
So, to optimize operations on a BST, the height must be minimized, and to do that the tree must be balanced.
And keeping a Binary Search Tree balanced is exactly what AVL Trees do, which is the data structure explained on the next page.