1. Introduction to Advanced Data Structures
Advanced data structures are designed to optimize specific operations beyond what basic structures like arrays and linked lists offer. They maintain special properties that guarantee efficient performance.
Why Self-Balancing Trees?
In a regular Binary Search Tree (BST), the height can become O(n) in the worst case (skewed tree), making operations O(n). Self-balancing trees maintain a height of O(log n), ensuring O(log n) operations.
| Data Structure | Search | Insert | Delete | Space |
|---|---|---|---|---|
| BST (avg) | O(log n) | O(log n) | O(log n) | O(n) |
| BST (worst) | O(n) | O(n) | O(n) | O(n) |
| AVL Tree | O(log n) | O(log n) | O(log n) | O(n) |
| Red-Black Tree | O(log n) | O(log n) | O(log n) | O(n) |
| B-Tree | O(log n) | O(log n) | O(log n) | O(n) |
| Trie | O(m)* | O(m)* | O(m)* | O(ALPHABET × m × n) |
*m = length of string, n = number of strings
2. AVL Trees
2.1 Properties of AVL Trees
- It is a Binary Search Tree (left < root < right)
- Balance Factor of every node is -1, 0, or +1
- Height is always O(log n)
- Rebalancing is done through rotations after insertion/deletion
2.2 AVL Tree Rotations
When an insertion or deletion causes imbalance, rotations restore balance:
- LL (Left-Left) case: perform a right rotation at the unbalanced node A.
- RR (Right-Right) case: perform a left rotation at A.
- LR (Left-Right) case: first a left rotation at B (A's left child), then a right rotation at A.
- RL (Right-Left) case: first a right rotation at B (A's right child), then a left rotation at A.
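Each single rotation is just a constant number of pointer changes. A minimal Python sketch (the `Node` class and function names are illustrative, not from any particular library):

```python
class Node:
    """Minimal BST node; a full AVL tree would also maintain a height field."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(a):
    """LL case: a's left child b becomes the new subtree root."""
    b = a.left
    a.left = b.right      # b's right subtree becomes a's left subtree
    b.right = a           # a becomes b's right child
    return b              # b is the new root of this subtree

def rotate_left(a):
    """RR case: a's right child b becomes the new subtree root."""
    b = a.right
    a.right = b.left      # b's left subtree becomes a's right subtree
    b.left = a            # a becomes b's left child
    return b

# LR case = rotate_left(a.left) followed by rotate_right(a);
# RL case = rotate_right(a.right) followed by rotate_left(a).

# Example: the skewed chain 30 -> 20 -> 10 (LL case)
root = Node(30, left=Node(20, left=Node(10)))
root = rotate_right(root)
print(root.key, root.left.key, root.right.key)  # 20 10 30
```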
2.3 Rotation Diagrams
Right rotation (LL case):

  Before (BF = 2):          After (BF = 0):

        30                      20
       /                       /  \
      20         ─────►      10    30
     /
    10

  Steps:
  1. B (20) becomes the new root
  2. A (30) becomes the right child of B
  3. The right subtree of B becomes the left subtree of A

Left rotation (RR case):

  Before (BF = -2):         After (BF = 0):

    10                          20
      \                        /  \
       20        ─────►      10    30
         \
          30

  Steps:
  1. B (20) becomes the new root
  2. A (10) becomes the left child of B
  3. The left subtree of B becomes the right subtree of A

Left-Right (LR) rotation:

  Before:          Step 1 (Left at 20):     Step 2 (Right at 30):

    30                   30                        25
   /                    /                         /  \
  20        ─────►    25           ─────►       20    30
   \                 /
    25             20

Right-Left (RL) rotation:

  Before:          Step 1 (Right at 20):    Step 2 (Left at 10):

  10                 10                            15
    \                  \                          /  \
     20     ─────►      15         ─────►       10    20
    /                     \
  15                       20
Worked example: insert 10, 20, 30, 40, 50, 25 into an empty AVL tree.

Insert 10:     Insert 20:     Insert 30 (BF(10) = -2):    After RR rotation at 10:

  10             10               10                               20
                   \                \                             /  \
                    20               20                         10    30
                                       \
                                        30

Insert 40 (balanced):     Insert 50 (BF(30) = -2):     After RR rotation at 30:

      20                        20                             20
     /  \                      /  \                           /  \
   10    30                  10    30                       10    40
           \                         \                           /  \
            40                        40                       30    50
                                        \
                                         50

Insert 25 (BF(20) = -2, RL case):     After right rotation at 40,
                                      then left rotation at 20:
      20                                     30
     /  \                                   /  \
   10    40                               20    40
        /  \                             /  \     \
      30    50                         10    25    50
     /
   25

(Inserting 25 makes node 20's balance factor -2 with the new key in the right child's left subtree, so an RL double rotation is required.)
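The insertion sequence 10, 20, 30, 40, 50, 25 can be replayed with a compact AVL insert. A sketch with illustrative names (`AVLNode`, `insert`), using the convention that an empty subtree has height -1; note that inserting 25 makes node 20's balance factor -2, so an RL double rotation fires and 30 becomes the root:

```python
class AVLNode:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 0

def height(n):
    return n.height if n else -1          # empty subtree has height -1

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def balance_factor(n):
    return height(n.left) - height(n.right)

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a          # b becomes the new subtree root
    update(a); update(b)
    return b

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    update(a); update(b)
    return b

def insert(node, key):
    if node is None:
        return AVLNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    bf = balance_factor(node)
    if bf > 1 and key < node.left.key:    # LL case
        return rotate_right(node)
    if bf < -1 and key > node.right.key:  # RR case
        return rotate_left(node)
    if bf > 1 and key > node.left.key:    # LR case
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf < -1 and key < node.right.key:  # RL case
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

root = None
for k in [10, 20, 30, 40, 50, 25]:
    root = insert(root, k)
print(root.key)  # 30
```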
2.4 Time Complexity Analysis
| Operation | Time Complexity | Explanation |
|---|---|---|
| Search | O(log n) | Height is always log n |
| Insert | O(log n) | Search + at most 2 rotations |
| Delete | O(log n) | Search + at most O(log n) rotations |
| Rotation | O(1) | Constant pointer changes |
3. B-Trees
3.1 Properties of B-Tree of Order m
- Every node has at most m children
- Every non-leaf node (except root) has at least ⌈m/2⌉ children
- Root has at least 2 children if it's not a leaf
- A non-leaf node with k children contains k-1 keys
- All leaves appear at the same level (perfectly balanced)
- Keys within a node are sorted in ascending order
Max keys per node = m - 1
Min keys per node = ⌈m/2⌉ - 1
                   [30]
                 /      \
         [10, 20]        [40, 50]
        /    |    \     /    |    \
     [5]  [15]  [25] [35]  [45]  [55, 60]
Properties:
- Order m = 3
- Max children = 3, Max keys = 2
- Min children (non-root) = ⌈3/2⌉ = 2, Min keys = 1
- All leaves at same level
3.2 B-Tree Operations
Search Operation
- Start from root
- Compare key with keys in current node
- If key found, return success
- If key < smallest key, go to leftmost child
- If key > largest key, go to rightmost child
- Otherwise, go to appropriate child between keys
- If leaf reached without finding, return failure
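The search steps above can be sketched with a node holding a sorted key list plus a child list (class and function names are illustrative), reusing the order-3 example tree from section 3.1:

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                   # sorted keys in this node
        self.children = children or []     # empty list for a leaf

def btree_search(node, key):
    """Return True if `key` is stored in the subtree rooted at `node`."""
    i = bisect_left(node.keys, key)        # first position with keys[i] >= key
    if i < len(node.keys) and node.keys[i] == key:
        return True                        # key found in this node
    if not node.children:
        return False                       # leaf reached without finding it
    return btree_search(node.children[i], key)   # descend between keys

# The order-3 example tree from section 3.1
root = BTreeNode([30], [
    BTreeNode([10, 20], [BTreeNode([5]), BTreeNode([15]), BTreeNode([25])]),
    BTreeNode([40, 50], [BTreeNode([35]), BTreeNode([45]), BTreeNode([55, 60])]),
])
print(btree_search(root, 45), btree_search(root, 42))  # True False
```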
Insertion Operation
- Search for the appropriate leaf node
- If leaf has room (< m-1 keys), insert key in sorted order
- If leaf is full, split the node:
- Find median key
- Keys less than median go to left node
- Keys greater than median go to right node
- Median moves up to parent
- If parent overflows, repeat split process upward
- If root splits, create new root (tree grows in height)
Insert 1, 3, 7, 10:     [1, 3, 7, 10]   (fits in one node)

Insert 11 — split required (5 keys > max of 4):

         [7]
        /   \
   [1, 3]   [10, 11]

Insert 13, 14, 15 — inserting 15 overfills the right child ([10, 11, 13, 14, 15]), so split it on median 13:

         [7, 13]
        /   |    \
  [1, 3] [10, 11] [14, 15]

Insert 18, 16:

         [7, 13]
        /   |    \
  [1, 3] [10, 11] [14, 15, 16, 18]

Insert 19 — split the rightmost leaf on median 16:

          [7, 13, 16]
        /    |      |     \
  [1, 3] [10, 11] [14, 15] [18, 19]

Insert 24:

          [7, 13, 16]
        /    |      |     \
  [1, 3] [10, 11] [14, 15] [18, 19, 24]
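The split logic can be sketched recursively: an overfull node hands its median key up to its parent, and a root split grows the tree by one level. This is a simplified sketch, assuming order 5 to match the example's max of 4 keys (class and function names are my own; deletion is omitted):

```python
from bisect import bisect_left, insort

class BNode:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf

def insert(root, key, order):
    """Insert `key`; returns the (possibly new) root of the tree."""
    split = _insert(root, key, order)
    if split:                               # root itself split: tree grows taller
        mid, right = split
        new_root = BNode(leaf=False)
        new_root.keys = [mid]
        new_root.children = [root, right]
        return new_root
    return root

def _insert(node, key, order):
    """Insert into this subtree; on overflow return (median, new right sibling)."""
    if node.leaf:
        insort(node.keys, key)              # keep keys sorted within the leaf
    else:
        i = bisect_left(node.keys, key)     # child that should hold the key
        split = _insert(node.children[i], key, order)
        if split:
            mid, right = split
            node.keys.insert(i, mid)        # median moves up into this node
            node.children.insert(i + 1, right)
    if len(node.keys) > order - 1:          # overflow: split around the median
        m = len(node.keys) // 2
        mid, right = node.keys[m], BNode(leaf=node.leaf)
        right.keys, node.keys = node.keys[m + 1:], node.keys[:m]
        if not node.leaf:
            right.children, node.children = node.children[m + 1:], node.children[:m + 1]
        return mid, right
    return None

root = BNode()
for k in [1, 3, 7, 10, 11, 13, 14, 15, 18, 16, 19, 24]:
    root = insert(root, k, order=5)
print(root.keys)                            # [7, 13, 16]
print([c.keys for c in root.children])      # [[1, 3], [10, 11], [14, 15], [18, 19, 24]]
```

Replaying the example sequence reproduces the final tree shown above.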
Deletion Operation
- Case 1: Key is in leaf node
- If node has more than minimum keys, simply delete
- If at minimum, borrow from sibling or merge
- Case 2: Key is in internal node
- Replace with inorder predecessor or successor
- Delete the predecessor/successor from leaf
- Case 3: Underflow handling
- Borrow from left or right sibling if possible
- Otherwise, merge with sibling
3.3 Time Complexity
| Operation | Time Complexity |
|---|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
4. B+ Trees
4.1 Differences: B-Tree vs B+ Tree
B-Tree
- Data in all nodes
- No duplicate keys
- Leaves not linked
- Search may end at any node
- Fewer keys in internal nodes
B+ Tree
- Data only in leaf nodes
- Keys duplicated in leaves
- Leaves linked (sequential access)
- Search always goes to leaf
- More keys in internal nodes
                  [30 | 50]                    ← Internal node (only keys, no data)
                 /    |    \
    [10|20|30] ↔ [40|50] ↔ [60|70|80]          ← Leaf nodes, linked left-to-right
        ↓           ↓            ↓
     [data]      [data]       [data]             (keys + data pointers)
Key features:
- Internal nodes: only keys for navigation
- Leaf nodes: keys + data pointers
- Leaves linked for range queries
- Keys 30 and 50 appear in both internal and leaf nodes
4.2 Advantages of B+ Trees
- Better for range queries: Leaf nodes are linked, allowing sequential traversal
- More keys per internal node: Internal nodes don't store data, so they can hold more keys
- Better cache performance: Internal nodes fit better in memory
- Uniform access time: All searches go to leaf level
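The range-query advantage follows directly from the leaf links: descend to the first relevant leaf once, then walk sideways. A simplified sketch over just the linked leaf level of the example above (the `Leaf` class is illustrative, and a real B+ tree would locate the starting leaf in O(log n) via the internal nodes; here we start from the leftmost leaf for brevity):

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys        # sorted keys (data pointers omitted for brevity)
        self.next = None        # link to the right sibling leaf

def link(leaves):
    """Chain the leaves left-to-right and return the list."""
    for a, b in zip(leaves, leaves[1:]):
        a.next = b
    return leaves

def range_query(first_leaf, lo, hi):
    """Collect all keys in [lo, hi] by walking the linked leaves."""
    leaf, out = first_leaf, []
    while leaf:
        for k in leaf.keys:
            if k > hi:
                return out       # keys are globally sorted: safe to stop
            if k >= lo:
                out.append(k)
        leaf = leaf.next         # sideways hop: no need to revisit the root
    return out

# Leaves from the example: [10|20|30] <-> [40|50] <-> [60|70|80]
leaves = link([Leaf([10, 20, 30]), Leaf([40, 50]), Leaf([60, 70, 80])])
print(range_query(leaves[0], 25, 65))  # [30, 40, 50, 60]
```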
4.3 Time Complexity
| Operation | Time Complexity |
|---|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
| Range Query | O(log n + k) where k = results |
5. Red-Black Trees
5.1 Properties (Rules)
- Property 1: Every node is either RED or BLACK
- Property 2: Root is always BLACK
- Property 3: Every leaf (NIL/NULL) is BLACK
- Property 4: If a node is RED, both its children must be BLACK (no two consecutive reds)
- Property 5: For each node, all paths from node to descendant leaves have same number of black nodes (Black Height)
                  (13,B)                  B = Black, R = Red
                 /      \
            (8,R)        (17,R)
           /     \       /     \
      (1,B)   (11,B) (15,B)   (25,B)
          \                   /
         (6,R)           (22,R)

Black height from the root = 2 (count the black nodes, including the NIL leaves, on any path below the root)
No two consecutive red nodes
Root is black
5.2 Insertion in Red-Black Tree
New nodes are always inserted as RED (to maintain black height). Then fix violations:
- Case 1: Uncle is RED
- Recolor parent and uncle to BLACK
- Recolor grandparent to RED
- Move up to grandparent and repeat
- Case 2: Uncle is BLACK, node is inner grandchild
- Rotate to make it outer grandchild (Case 3)
- Case 3: Uncle is BLACK, node is outer grandchild
- Rotate at grandparent
- Swap colors of parent and grandparent
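The three cases can be traced in code. Below is a CLRS-style sketch (the class layout, the NIL sentinel node, and all names are implementation choices, not the only way to write this):

```python
RED, BLACK = "R", "B"

class RBNode:
    def __init__(self, key, color, nil=None):
        self.key, self.color = key, color
        self.left = self.right = self.parent = nil

class RBTree:
    def __init__(self):
        self.nil = RBNode(None, BLACK)             # shared sentinel leaf
        self.nil.left = self.nil.right = self.nil.parent = self.nil
        self.root = self.nil

    def _rotate_left(self, x):
        y = x.right
        x.right = y.left
        if y.left is not self.nil:
            y.left.parent = x
        y.parent = x.parent
        if x.parent is self.nil:
            self.root = y
        elif x is x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        y.left = x
        x.parent = y

    def _rotate_right(self, x):                    # mirror of _rotate_left
        y = x.left
        x.left = y.right
        if y.right is not self.nil:
            y.right.parent = x
        y.parent = x.parent
        if x.parent is self.nil:
            self.root = y
        elif x is x.parent.right:
            x.parent.right = y
        else:
            x.parent.left = y
        y.right = x
        x.parent = y

    def insert(self, key):
        z = RBNode(key, RED, self.nil)             # new nodes start RED
        y, x = self.nil, self.root
        while x is not self.nil:                   # ordinary BST descent
            y = x
            x = x.left if key < x.key else x.right
        z.parent = y
        if y is self.nil:
            self.root = z
        elif key < y.key:
            y.left = z
        else:
            y.right = z
        self._fixup(z)

    def _fixup(self, z):
        while z.parent.color == RED:
            if z.parent is z.parent.parent.left:
                uncle = z.parent.parent.right
                if uncle.color == RED:             # Case 1: recolor, move up
                    z.parent.color = uncle.color = BLACK
                    z.parent.parent.color = RED
                    z = z.parent.parent
                else:
                    if z is z.parent.right:        # Case 2: inner grandchild
                        z = z.parent
                        self._rotate_left(z)
                    z.parent.color = BLACK         # Case 3: outer grandchild
                    z.parent.parent.color = RED
                    self._rotate_right(z.parent.parent)
            else:                                  # mirror image of the above
                uncle = z.parent.parent.left
                if uncle.color == RED:
                    z.parent.color = uncle.color = BLACK
                    z.parent.parent.color = RED
                    z = z.parent.parent
                else:
                    if z is z.parent.left:
                        z = z.parent
                        self._rotate_right(z)
                    z.parent.color = BLACK
                    z.parent.parent.color = RED
                    self._rotate_left(z.parent.parent)
        self.root.color = BLACK                    # Property 2

t = RBTree()
for k in [10, 5, 15, 3, 7, 12, 18]:
    t.insert(k)
print(t.root.key, t.root.color)  # 10 B
```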
5.3 Comparison: AVL vs Red-Black Trees
| Aspect | AVL Tree | Red-Black Tree |
|---|---|---|
| Balancing | Strictly balanced (BF: -1,0,1) | Loosely balanced (color rules) |
| Height | ≤ 1.44 log n | ≤ 2 log n |
| Search | Faster (shorter height) | Slightly slower |
| Insert/Delete | More rotations | Fewer rotations |
| Use Case | Search-heavy applications | Insert/delete-heavy (C++ STL map) |
5.4 Time Complexity
| Operation | Time Complexity |
|---|---|
| Search | O(log n) |
| Insert | O(log n) |
| Delete | O(log n) |
| Rotations per operation | O(1) amortized |
6. Tries (Prefix Trees)
6.1 Properties of Tries
- Root node is empty
- Each edge represents a character
- Path from root to a node represents a prefix
- End-of-word nodes are marked specially
- Common prefixes are shared
            (root)
           /      \
          c        d
          |        |
          a        o
         / \       |
       t*   r     g*
           / \
         d*   e*

* = end-of-word marker   (stored words: cat, card, care, dog)

Searching "card":
root → c → a → r → d (found, and 'd' is marked as a word end)

Searching "car":
root → c → a → r (path exists, but 'r' is not marked as a word end, so "car" is only a prefix, not a stored word)

Searching "cab":
root → c → a → b (not found, no 'b' edge)
6.2 Trie Operations
structure TrieNode:
    children[ALPHABET_SIZE]   // array of child pointers
    isEndOfWord               // boolean flag

function INSERT(root, word):
    node = root
    for each character c in word:
        index = c - 'a'
        if node.children[index] is NULL:
            node.children[index] = new TrieNode()
        node = node.children[index]
    node.isEndOfWord = true

function SEARCH(root, word):
    node = root
    for each character c in word:
        index = c - 'a'
        if node.children[index] is NULL:
            return false
        node = node.children[index]
    return node.isEndOfWord

function STARTS_WITH(root, prefix):
    node = root
    for each character c in prefix:
        index = c - 'a'
        if node.children[index] is NULL:
            return false
        node = node.children[index]
    return true
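A direct Python translation of the pseudocode, using a dict for children instead of a fixed-size array so any character works (class names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}            # char -> TrieNode (dict, not fixed array)
        self.is_end_of_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for c in word:
            node = node.children.setdefault(c, TrieNode())
        node.is_end_of_word = True    # mark the last node as a word end

    def _walk(self, s):
        """Follow s from the root; None if the path breaks off."""
        node = self.root
        for c in s:
            if c not in node.children:
                return None
            node = node.children[c]
        return node

    def search(self, word):           # exact-word lookup
        node = self._walk(word)
        return node is not None and node.is_end_of_word

    def starts_with(self, prefix):    # prefix lookup
        return self._walk(prefix) is not None

t = Trie()
for w in ["cat", "card", "care", "dog"]:
    t.insert(w)
print(t.search("card"), t.search("car"), t.starts_with("car"))  # True False True
```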
6.3 Time & Space Complexity
| Operation | Time Complexity | Explanation |
|---|---|---|
| Insert | O(m) | m = length of word |
| Search | O(m) | m = length of word |
| Delete | O(m) | m = length of word |
| Prefix Search | O(p + k) | p = prefix length, k = matches |
6.4 Applications of Tries
- Autocomplete: Finding all words with given prefix
- Spell Checker: Checking if word exists in dictionary
- IP Routing: Longest prefix matching
- Phone Directory: Contact search
7. Graphs
7.1 Types of Graphs
Undirected Graph
- Edges have no direction
- Edge (u,v) = Edge (v,u)
- Example: Social networks
A --- B
| |
C --- D
Directed Graph (Digraph)
- Edges have direction
- Edge (u,v) ≠ Edge (v,u)
- Example: Web links, dependencies
A → B
↓ ↓
C → D
7.2 Graph Representations
Adjacency Matrix
Graph:               Adjacency Matrix:

  0 --- 1                 0  1  2  3
  |   / |             0 [ 0  1  1  0 ]
  | /   |             1 [ 1  0  1  1 ]
  2 --- 3             2 [ 1  1  0  1 ]
                      3 [ 0  1  1  0 ]

Space: O(V²)
Check edge: O(1)
Find all neighbors: O(V)
Adjacency List
Graph:               Adjacency List:

  0 --- 1             0: [1, 2]
  |   / |             1: [0, 2, 3]
  | /   |             2: [0, 1, 3]
  2 --- 3             3: [1, 2]

Space: O(V + E)
Check edge: O(degree of vertex)
Find all neighbors: O(degree of vertex)
7.3 Comparison of Representations
| Aspect | Adjacency Matrix | Adjacency List |
|---|---|---|
| Space | O(V²) | O(V + E) |
| Check if edge exists | O(1) | O(V) worst case |
| Find all neighbors | O(V) | O(degree) |
| Add edge | O(1) | O(1) |
| Best for | Dense graphs | Sparse graphs |
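Both representations take only a few lines to build. A sketch for the 4-vertex example graph above (the edge list is read off the diagram):

```python
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # the example graph above
n = 4

# Adjacency matrix: O(V^2) space, O(1) edge check
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1                 # undirected: symmetric

# Adjacency list: O(V + E) space, O(degree) neighbor scan
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[1])   # [1, 0, 1, 1]
print(adj[1])      # [0, 2, 3]
```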
8. Breadth First Search (BFS)
8.1 BFS Algorithm
function BFS(graph, start):
    visited = set()
    queue = empty queue
    visited.add(start)
    queue.enqueue(start)
    while queue is not empty:
        vertex = queue.dequeue()
        print(vertex)                     // process vertex
        for each neighbor of vertex in graph:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.enqueue(neighbor)
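The pseudocode maps almost line-for-line onto Python, with `collections.deque` as the queue (returning the visit order instead of printing, which is my choice for testability):

```python
from collections import deque

def bfs(graph, start):
    """Level-by-level traversal; returns vertices in visit order."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        vertex = queue.popleft()          # FIFO: oldest discovered first
        order.append(vertex)
        for neighbor in graph[vertex]:
            if neighbor not in visited:   # mark on discovery, not on dequeue
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Adjacency list for the example graph below
graph = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
print(bfs(graph, 0))  # [0, 1, 2, 3, 4, 5]
```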
Graph:
          0
        /   \
       1     2
      / \      \
     3   4      5
BFS from 0:
Step 1: Visit 0, Queue: [1, 2]
Step 2: Visit 1, Queue: [2, 3, 4]
Step 3: Visit 2, Queue: [3, 4, 5]
Step 4: Visit 3, Queue: [4, 5]
Step 5: Visit 4, Queue: [5]
Step 6: Visit 5, Queue: []
BFS Order: 0 → 1 → 2 → 3 → 4 → 5
(Level by level traversal)
8.2 BFS Properties & Applications
- Visits vertices level by level
- Finds shortest path in unweighted graphs
- Uses O(V) extra space for queue
- Time Complexity: O(V + E)
- Space Complexity: O(V)
Applications:
- Shortest path in unweighted graph
- Finding connected components
- Level order traversal in trees
- Finding all nodes within k distance
- Web crawlers
- Social network friend suggestions
9. Depth First Search (DFS)
9.1 DFS Algorithm
function DFS(graph, vertex, visited):
    visited.add(vertex)
    print(vertex)                         // process vertex
    for each neighbor of vertex in graph:
        if neighbor not in visited:
            DFS(graph, neighbor, visited)

// Initial call: DFS(graph, start, empty set)

function DFS_Iterative(graph, start):
    visited = set()
    stack = empty stack
    stack.push(start)
    while stack is not empty:
        vertex = stack.pop()
        if vertex not in visited:
            visited.add(vertex)
            print(vertex)                 // process vertex
            for each neighbor of vertex in graph:
                if neighbor not in visited:
                    stack.push(neighbor)
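The recursive version in Python, again returning the visit order rather than printing (my choice for testability):

```python
def dfs(graph, vertex, visited=None, order=None):
    """Recursive depth-first traversal; returns vertices in visit order."""
    if visited is None:
        visited, order = set(), []
    visited.add(vertex)
    order.append(vertex)
    for neighbor in graph[vertex]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited, order)   # go deep before siblings
    return order

# Adjacency list for the example graph below
graph = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
print(dfs(graph, 0))  # [0, 1, 3, 4, 2, 5]
```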
Graph:
          0
        /   \
       1     2
      / \      \
     3   4      5
DFS from 0 (recursive, left-first):
Step 1: Visit 0, explore neighbor 1
Step 2: Visit 1, explore neighbor 3
Step 3: Visit 3 (no unvisited neighbors), backtrack
Step 4: Back to 1, explore neighbor 4
Step 5: Visit 4, backtrack to 1, then to 0
Step 6: Explore neighbor 2
Step 7: Visit 2, explore neighbor 5
Step 8: Visit 5, done
DFS Order: 0 → 1 → 3 → 4 → 2 → 5
(Go deep, then backtrack)
9.2 DFS Properties & Applications
- Explores as deep as possible before backtracking
- Uses O(V) space for recursion stack
- Time Complexity: O(V + E)
- Space Complexity: O(V)
- Does NOT find shortest path
Applications:
- Detecting cycles in graph
- Topological sorting
- Finding strongly connected components
- Solving puzzles (mazes)
- Path finding
- Checking if graph is bipartite
9.3 BFS vs DFS Comparison
| Aspect | BFS | DFS |
|---|---|---|
| Data Structure | Queue (FIFO) | Stack (LIFO)/Recursion |
| Approach | Level by level | Go deep, backtrack |
| Shortest Path | Yes (unweighted) | No |
| Time | O(V + E) | O(V + E) |
| Space | O(V) | O(V) |
| Complete | Yes | Yes (finite graphs) |
| Best for | Shortest path, nearest | Connectivity, cycles |
10. Practice Questions
Key rotations needed:
- After 10: LL rotation at 50
- After 7: LR rotation at 10
- After 3: LL rotation at 7
Final tree will be balanced with height ~3
Differences:
- B-Tree: Data in all nodes; B+ Tree: Data only in leaves
- B-Tree: No linked leaves; B+ Tree: Leaves are linked
- B-Tree: No duplicate keys; B+ Tree: Keys duplicated in leaves
Advantages of B+ Tree:
- Better for range queries (linked leaves)
- More keys per node (internal nodes smaller)
- Better cache performance
Edges: A-B, A-C, B-D, B-E, C-F, D-G
BFS: A → B → C → D → E → F → G
(Level by level: A first, then B,C, then D,E,F, then G)
DFS: A → B → D → G → E → C → F
(Go deep: A→B→D→G, backtrack, E, backtrack, C→F)
- Every node is RED or BLACK
- Root is always BLACK
- Every NULL leaf is BLACK
- RED nodes have BLACK children (no consecutive reds)
- All paths from node to leaves have same black height
Trie: Tree structure where each node represents a character. Path from root represents prefix.
Time: O(m) for search/insert where m = word length
Applications: Autocomplete, Spell checker, IP routing, Dictionary
Key Points Summary
- AVL: Balance Factor = Height(left) - Height(right), must be -1, 0, or +1
- AVL Rotations: LL, RR, LR, RL - know when to apply each
- B-Tree order m: max m-1 keys, min ⌈m/2⌉-1 keys per node
- B+ Tree: Data only in leaves, leaves are linked
- Red-Black: 5 properties, root is black, no consecutive reds
- Trie: O(m) operations, good for prefix matching
- BFS: Queue, level-by-level, finds shortest path
- DFS: Stack/recursion, go deep then backtrack
- Both BFS and DFS: O(V+E) time, O(V) space