Module II

Advanced Data Structures

Master self-balancing trees (AVL, Red-Black, B-Tree, B+ Tree), the Trie data structure, and graph representations with traversal algorithms (BFS and DFS).

1. Introduction to Advanced Data Structures

Advanced data structures are designed to optimize specific operations beyond what basic structures like arrays and linked lists offer. They maintain special properties that guarantee efficient performance.

Why Self-Balancing Trees?

In a regular Binary Search Tree (BST), the height can become O(n) in the worst case (skewed tree), making operations O(n). Self-balancing trees maintain a height of O(log n), ensuring O(log n) operations.

Data Structure Search Insert Delete Space
BST (avg) O(log n) O(log n) O(log n) O(n)
BST (worst) O(n) O(n) O(n) O(n)
AVL Tree O(log n) O(log n) O(log n) O(n)
Red-Black Tree O(log n) O(log n) O(log n) O(n)
B-Tree O(log n) O(log n) O(log n) O(n)
Trie O(m)* O(m)* O(m)* O(ALPHABET × m × n)

*m = length of string, n = number of strings

2. AVL Trees

AVL Tree (Adelson-Velsky and Landis)
An AVL tree is a self-balancing Binary Search Tree where the difference between heights of left and right subtrees (Balance Factor) cannot be more than 1 for all nodes.
Balance Factor (BF) = Height(Left Subtree) - Height(Right Subtree)
Valid BF values: -1, 0, +1

2.1 Properties of AVL Trees

  • It is a Binary Search Tree (left < root < right)
  • Balance Factor of every node is -1, 0, or +1
  • Height is always O(log n)
  • Rebalancing is done through rotations after insertion/deletion

2.2 AVL Tree Rotations

When an insertion or deletion causes imbalance, rotations restore balance:

LL
Left-Left (Single Right Rotation)
Imbalance at node A, caused by insertion in left subtree of left child.
Perform right rotation at A.
RR
Right-Right (Single Left Rotation)
Imbalance at node A, caused by insertion in right subtree of right child.
Perform left rotation at A.
LR
Left-Right (Double Rotation)
Imbalance at node A, caused by insertion in right subtree of left child.
First left rotation at B (left child), then right rotation at A.
RL
Right-Left (Double Rotation)
Imbalance at node A, caused by insertion in left subtree of right child.
First right rotation at B (right child), then left rotation at A.

2.3 Rotation Diagrams

LL Rotation (Right Rotation)
    Before (BF=2):              After (BF=0):
         30                          20
        /                           /  \
       20         ─────►          10    30
      /
     10

    Steps:
    1. B becomes new root
    2. A becomes right child of B
    3. Right subtree of B becomes left subtree of A
RR Rotation (Left Rotation)
    Before (BF=-2):             After (BF=0):
       10                            20
         \                          /  \
          20       ─────►         10    30
            \
             30

    Steps:
    1. B becomes new root
    2. A becomes left child of B
    3. Left subtree of B becomes right subtree of A
LR Rotation (Left-Right)
    Before:                Step 1 (Left):        Step 2 (Right):
       30                      30                     25
      /                       /                      /  \
     20        ─────►        25       ─────►       20    30
       \                    /
        25                 20
RL Rotation (Right-Left)
    Before:                Step 1 (Right):       Step 2 (Left):
     10                       10                     15
       \                        \                   /  \
        20     ─────►            15    ─────►     10    20
       /                           \
      15                            20
📝 Example: Build AVL Tree by inserting 10, 20, 30, 40, 50, 25
Step-by-Step Construction
Insert 10:     Insert 20:     Insert 30:        After RR:
   10            10              10                20
                   \               \  BF=-2       /  \
                    20              20    →      10   30
                                      \
                                       30

Insert 40:         Insert 50:  BF(30)=-2     After RR at 30:
    20                 20                         20
   /  \               /  \                       /  \
  10   30     →     10    30           →       10    40
         \                  \                       /  \
          40                 40                   30    50
                               \
                                50

Insert 25:  BF(20)=-2    Step 1 (Right at 40):    Step 2 (Left at 20):
      20                       20                       30
     /  \                     /  \                     /  \
   10    40        →        10    30        →        20    40
        /  \                     /  \               /  \     \
      30    50                 25    40           10    25    50
     /                                 \
    25                                  50

Inserting 25 makes BF(20) = 1 - 3 = -2, with the new key in the left
subtree of the right child, so an RL (double) rotation is required.
Final AVL tree: root 30, left 20 (10, 25), right 40 (50).

2.4 Time Complexity Analysis

Operation Time Complexity Explanation
Search O(log n) Height is always log n
Insert O(log n) Search + at most 2 rotations
Delete O(log n) Search + at most O(log n) rotations
Rotation O(1) Constant pointer changes
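The rotation cases above can be sketched in Python. This is a minimal illustrative implementation (class and helper names are my own, not from any particular library), replayed on the example sequence 10, 20, 30, 40, 50, 25:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.h = key, None, None, 1

def height(n):
    return n.h if n else 0

def bf(n):                     # Balance Factor = height(left) - height(right)
    return height(n.left) - height(n.right)

def update(n):
    n.h = 1 + max(height(n.left), height(n.right))

def rotate_right(a):           # fixes LL: left child b becomes the new root
    b = a.left
    a.left, b.right = b.right, a
    update(a); update(b)
    return b

def rotate_left(a):            # fixes RR: right child b becomes the new root
    b = a.right
    a.right, b.left = b.left, a
    update(a); update(b)
    return b

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    balance = bf(node)
    if balance > 1 and key < node.left.key:      # LL: single right rotation
        return rotate_right(node)
    if balance < -1 and key > node.right.key:    # RR: single left rotation
        return rotate_left(node)
    if balance > 1 and key > node.left.key:      # LR: left then right
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1 and key < node.right.key:    # RL: right then left
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

root = None
for k in [10, 20, 30, 40, 50, 25]:
    root = insert(root, k)
# root ends as 30, with 20 (10, 25) on the left and 40 (50) on the right
```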

3. B-Trees

B-Tree of Order m
A B-Tree is a self-balancing search tree designed for systems that read and write large blocks of data (like databases and file systems). It generalizes the BST by allowing each node to store multiple keys and have more than two children.

3.1 Properties of B-Tree of Order m

  • Every node has at most m children
  • Every non-leaf node (except root) has at least ⌈m/2⌉ children
  • Root has at least 2 children if it's not a leaf
  • A non-leaf node with k children contains k-1 keys
  • All leaves appear at the same level (perfectly balanced)
  • Keys within a node are sorted in ascending order
For B-Tree of order m:
Max keys per node = m - 1
Min keys per node = ⌈m/2⌉ - 1
Example: Order 5 B-Tree has 2-4 keys per node (except root)
B-Tree of Order 3 Example
                    [30]
                   /    \
            [10, 20]    [40, 50]
           /   |   \    /   |   \
         [5] [15] [25] [35] [45] [55, 60]

Properties:
- Order m = 3
- Max children = 3, Max keys = 2
- Min children (non-root) = ⌈3/2⌉ = 2, Min keys = 1
- All leaves at same level

3.2 B-Tree Operations

Search Operation

📋 B-Tree Search Algorithm
  1. Start from root
  2. Compare key with keys in current node
  3. If key found, return success
  4. If key < smallest key, go to leftmost child
  5. If key > largest key, go to rightmost child
  6. Otherwise, go to appropriate child between keys
  7. If leaf reached without finding, return failure
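The search steps above can be sketched as follows — a minimal sketch in which nodes are plain dicts (the node layout is an illustrative assumption), run against the order-3 example tree from section 3.1:

```python
def btree_search(node, key):
    """node = {'keys': sorted keys, 'children': subtrees}; leaves have no children."""
    i = 0
    while i < len(node['keys']) and key > node['keys'][i]:
        i += 1                                    # step past smaller keys
    if i < len(node['keys']) and node['keys'][i] == key:
        return True                               # key found in this node
    if not node['children']:
        return False                              # leaf reached without finding
    return btree_search(node['children'][i], key) # descend between keys i-1 and i

# The order-3 example tree from section 3.1
tree = {'keys': [30], 'children': [
    {'keys': [10, 20], 'children': [
        {'keys': [5],  'children': []},
        {'keys': [15], 'children': []},
        {'keys': [25], 'children': []}]},
    {'keys': [40, 50], 'children': [
        {'keys': [35], 'children': []},
        {'keys': [45], 'children': []},
        {'keys': [55, 60], 'children': []}]}]}
# btree_search(tree, 45) → True; btree_search(tree, 12) → False
```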

Insertion Operation

📋 B-Tree Insertion Algorithm
  1. Search for the appropriate leaf node
  2. If leaf has room (< m-1 keys), insert key in sorted order
  3. If leaf is full, split the node:
    • Find median key
    • Keys less than median go to left node
    • Keys greater than median go to right node
    • Median moves up to parent
  4. If parent overflows, repeat split process upward
  5. If root splits, create new root (tree grows in height)
📝 Example: Insert 1, 3, 7, 10, 11, 13, 14, 15, 18, 16, 19, 24 into B-Tree of Order 5
Step-by-Step Insertion
Insert 1,3,7,10:        [1, 3, 7, 10]  (fits in one node)

Insert 11:              Split required (5 keys > max 4)
                              [7]
                             /   \
                     [1, 3]     [10, 11]

Insert 13, 14:               [7]
                            /   \
                    [1, 3]     [10, 11, 13, 14]

Insert 15:          Right leaf overflows (5 keys > max 4);
                    split it, median 13 moves up:
                              [7, 13]
                             /   |   \
                     [1, 3] [10,11] [14, 15]

Insert 18, 16:           [7, 13]
                        /   |   \
               [1, 3] [10,11] [14, 15, 16, 18]

Insert 19:     Split rightmost leaf:
                         [7, 13, 16]
                       /   |   |    \
              [1,3] [10,11] [14,15] [18, 19]

Insert 24:            [7, 13, 16]
                     /   |   |    \
            [1,3] [10,11] [14,15] [18, 19, 24]
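The node split used repeatedly above can be sketched as a small helper (illustrative only; a full B-Tree insert would also wire the two halves into the parent):

```python
def split_node(keys):
    """Split an overfull sorted key list around its median.
    The median moves up to the parent; the halves become sibling nodes."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# The first split in the example: inserting 11 overflows [1, 3, 7, 10, 11]
left, median, right = split_node([1, 3, 7, 10, 11])
# median 7 moves up; the leaves become [1, 3] and [10, 11]
```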

Deletion Operation

📋 B-Tree Deletion Cases
  1. Case 1: Key is in leaf node
    • If node has more than minimum keys, simply delete
    • If at minimum, borrow from sibling or merge
  2. Case 2: Key is in internal node
    • Replace with inorder predecessor or successor
    • Delete the predecessor/successor from leaf
  3. Case 3: Underflow handling
    • Borrow from left or right sibling if possible
    • Otherwise, merge with sibling

3.3 Time Complexity

Operation Time Complexity
Search O(log n)
Insert O(log n)
Delete O(log n)

4. B+ Trees

B+ Tree
A B+ Tree is a variation of B-Tree where all data is stored in leaf nodes, and internal nodes only store keys for navigation. Leaf nodes are linked together for efficient range queries.

4.1 Differences: B-Tree vs B+ Tree

B-Tree

  • Data in all nodes
  • No duplicate keys
  • Leaves not linked
  • Search may end at any node
  • Fewer keys in internal nodes

B+ Tree

  • Data only in leaf nodes
  • Keys duplicated in leaves
  • Leaves linked (sequential access)
  • Search always goes to leaf
  • More keys in internal nodes
B+ Tree Structure
                         [30 | 50]                    ← Internal node
                        /    |    \                    (only keys, no data)
                      /      |      \
              [10|20|30]  [40|50]  [60|70|80]        ← Leaf nodes
                 ↓           ↓          ↓              (keys + data)
              [data]      [data]     [data]
                 ↔           ↔          ↔            ← Linked list

Key features:
- Internal nodes: only keys for navigation
- Leaf nodes: keys + data pointers
- Leaves linked for range queries
- Keys 30 and 50 appear in both internal and leaf nodes

4.2 Advantages of B+ Trees

  • Better for range queries: Leaf nodes are linked, allowing sequential traversal
  • More keys per internal node: Internal nodes don't store data, so they can hold more keys
  • Better cache performance: Internal nodes fit better in memory
  • Uniform access time: All searches go to leaf level
💡
Database Usage: B+ Trees are the preferred data structure for database indexes (used in MySQL, PostgreSQL, Oracle) because they excel at both point queries and range queries.
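The range-query advantage can be sketched by walking the linked leaf level of the B+ Tree diagram above (a minimal sketch; a real index would first descend from the root to the leaf holding the lower bound):

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys          # sorted keys in this leaf (data omitted)
        self.next = None          # link to the right sibling

def range_query(first_leaf, lo, hi):
    """Collect keys in [lo, hi] by scanning the linked leaves left to right."""
    out, leaf = [], first_leaf
    while leaf:
        for k in leaf.keys:
            if k > hi:
                return out        # keys are sorted: nothing larger can match
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

# Leaf level of the diagram above
a, b, c = Leaf([10, 20, 30]), Leaf([40, 50]), Leaf([60, 70, 80])
a.next, b.next = b, c
# range_query(a, 20, 60) → [20, 30, 40, 50, 60]
```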

4.3 Time Complexity

Operation Time Complexity
Search O(log n)
Insert O(log n)
Delete O(log n)
Range Query O(log n + k) where k = results

5. Red-Black Trees

Red-Black Tree
A Red-Black Tree is a self-balancing BST where each node has an extra bit for color (red or black). The tree uses coloring rules to ensure the tree remains approximately balanced.

5.1 Properties (Rules)

📌 Five Properties of Red-Black Trees
  • Property 1: Every node is either RED or BLACK
  • Property 2: Root is always BLACK
  • Property 3: Every leaf (NIL/NULL) is BLACK
  • Property 4: If a node is RED, both its children must be BLACK (no two consecutive reds)
  • Property 5: For each node, all paths from node to descendant leaves have same number of black nodes (Black Height)
Valid Red-Black Tree Example
                    (13,B)                    B = Black, R = Red
                   /      \
               (8,R)      (17,R)
              /    \      /    \
          (1,B)  (11,B) (15,B) (25,B)
             \                  /
            (6,R)            (22,R)

Black Height from root = 2 (count black nodes to any leaf)
No two consecutive red nodes
Root is black
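The five properties can be checked mechanically. A minimal sketch (tuple-based nodes are an illustrative choice), run on the valid tree above; note it counts the black NIL level too, so it returns the diagram's black height plus one:

```python
def check_rb(node):
    """Verify the red-black rules; node = (key, color, left, right), None = NIL.
    Returns the black height (including NIL), or raises on a violation."""
    if node is None:
        return 1                                   # Property 3: NIL leaves are black
    _key, color, left, right = node
    if color == 'R':                               # Property 4: no red-red edge
        for child in (left, right):
            if child is not None and child[1] == 'R':
                raise ValueError('red node with red child')
    lh, rh = check_rb(left), check_rb(right)
    if lh != rh:                                   # Property 5: equal black heights
        raise ValueError('unequal black heights')
    return lh + (1 if color == 'B' else 0)

# The valid tree from the diagram above
tree = (13, 'B',
        (8, 'R',
         (1, 'B', None, (6, 'R', None, None)),
         (11, 'B', None, None)),
        (17, 'R',
         (15, 'B', None, None),
         (25, 'B', (22, 'R', None, None), None)))
assert tree[1] == 'B'          # Property 2: root is black
assert check_rb(tree) == 3     # 2 black nodes per path, plus the NIL level
```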

5.2 Insertion in Red-Black Tree

New nodes are always inserted as RED (inserting a BLACK node would change black heights, violating Property 5). Any red-red violation is then fixed:

📋 Red-Black Tree Insertion Fix-Up Cases
  1. Case 1: Uncle is RED
    • Recolor parent and uncle to BLACK
    • Recolor grandparent to RED
    • Move up to grandparent and repeat
  2. Case 2: Uncle is BLACK, node is inner grandchild
    • Rotate to make it outer grandchild (Case 3)
  3. Case 3: Uncle is BLACK, node is outer grandchild
    • Rotate at grandparent
    • Swap colors of parent and grandparent

5.3 Comparison: AVL vs Red-Black Trees

Aspect AVL Tree Red-Black Tree
Balancing Strictly balanced (BF: -1,0,1) Loosely balanced (color rules)
Height ≤ 1.44 log n ≤ 2 log n
Search Faster (shorter height) Slightly slower
Insert/Delete More rotations Fewer rotations
Use Case Search-heavy applications Insert/delete-heavy (C++ STL map)

5.4 Time Complexity

Operation Time Complexity
Search O(log n)
Insert O(log n)
Delete O(log n)
Rotations per operation O(1) amortized

6. Tries (Prefix Trees)

Trie
A Trie (pronounced "try") is a tree-like data structure used to store strings. Each node represents a character, and paths from root to nodes represent prefixes. The name comes from reTRIEval.

6.1 Properties of Tries

  • Root node is empty
  • Each edge represents a character
  • Path from root to a node represents a prefix
  • End-of-word nodes are marked specially
  • Common prefixes are shared
Trie containing: "cat", "car", "card", "care", "dog"
                        (root)
                       /      \
                     c          d
                     |          |
                     a          o
                    / \         |
                   t*  r*       g*
                      / \
                    d*   e*

    * = End of word marker

    Searching "car":
    root → c → a → r (found, marked as word end)

    Searching "cab":
    root → c → a → b (not found, no 'b' edge)

6.2 Trie Operations

Pseudocode
structure TrieNode:
    children[ALPHABET_SIZE]  // Array of child pointers
    isEndOfWord              // Boolean flag

function INSERT(root, word):
    node = root
    for each character c in word:
        index = c - 'a'
        if node.children[index] is NULL:
            node.children[index] = new TrieNode()
        node = node.children[index]
    node.isEndOfWord = true

function SEARCH(root, word):
    node = root
    for each character c in word:
        index = c - 'a'
        if node.children[index] is NULL:
            return false
        node = node.children[index]
    return node.isEndOfWord

function STARTS_WITH(root, prefix):
    node = root
    for each character c in prefix:
        index = c - 'a'
        if node.children[index] is NULL:
            return false
        node = node.children[index]
    return true
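The pseudocode above translates directly to Python. This sketch uses a dict for children instead of a fixed ALPHABET_SIZE array (an implementation choice, not required by the structure):

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # char → TrieNode (dict replaces the 26-slot array)
        self.is_end = False       # end-of-word marker

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):       # exact word lookup
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):  # prefix lookup
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

t = Trie()
for w in ["cat", "car", "card", "care", "dog"]:
    t.insert(w)
# t.search("car") → True; t.search("cab") → False; t.starts_with("ca") → True
```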

6.3 Time & Space Complexity

Operation Time Complexity Explanation
Insert O(m) m = length of word
Search O(m) m = length of word
Delete O(m) m = length of word
Prefix Search O(p + k) p = prefix length, k = matches
Space Complexity: O(ALPHABET_SIZE × m × n)
Where m = average word length, n = number of words

6.4 Applications of Tries

  • Autocomplete: Finding all words with given prefix
  • Spell Checker: Checking if word exists in dictionary
  • IP Routing: Longest prefix matching
  • Phone Directory: Contact search

7. Graphs

Graph
A Graph G = (V, E) consists of a set of vertices V and a set of edges E. Each edge connects two vertices. Graphs can be directed or undirected, weighted or unweighted.

7.1 Types of Graphs

Undirected Graph

  • Edges have no direction
  • Edge (u,v) = Edge (v,u)
  • Example: Social networks
    A --- B
    |     |
    C --- D
                        

Directed Graph (Digraph)

  • Edges have direction
  • Edge (u,v) ≠ Edge (v,u)
  • Example: Web links, dependencies
    A → B
    ↓   ↓
    C → D
                        

7.2 Graph Representations

Adjacency Matrix

Adjacency Matrix
A 2D array of size V×V where A[i][j] = 1 (or weight) if edge exists from i to j, else 0.
Adjacency Matrix Example
Graph:                    Adjacency Matrix:
    0 --- 1                     0   1   2   3
    |   / |               0  [  0   1   1   0  ]
    |  /  |               1  [  1   0   1   1  ]
    2 --- 3               2  [  1   1   0   1  ]
                          3  [  0   1   1   0  ]

Space: O(V²)
Check edge: O(1)
Find all neighbors: O(V)

Adjacency List

Adjacency List
An array of lists where the index represents a vertex and its list contains all adjacent vertices.
Adjacency List Example
Graph:                    Adjacency List:
    0 --- 1               0: [1, 2]
    |   / |               1: [0, 2, 3]
    |  /  |               2: [0, 1, 3]
    2 --- 3               3: [1, 2]

Space: O(V + E)
Check edge: O(degree of vertex)
Find all neighbors: O(degree of vertex)

7.3 Comparison of Representations

Aspect Adjacency Matrix Adjacency List
Space O(V²) O(V + E)
Check if edge exists O(1) O(V) worst case
Find all neighbors O(V) O(degree)
Add edge O(1) O(1)
Best for Dense graphs Sparse graphs
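Both representations of the 4-vertex example graph above can be built in a few lines (a minimal sketch):

```python
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # the example graph above
V = 4

# Adjacency matrix: O(V^2) space, O(1) edge check
matrix = [[0] * V for _ in range(V)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1                # undirected: symmetric entries

# Adjacency list: O(V + E) space, O(degree) neighbor scan
adj = {u: [] for u in range(V)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# matrix[1][3] == 1 (edge exists), matrix[0][3] == 0 (no edge)
# adj matches the table: 0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]
```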

8. Breadth First Search (BFS)

Breadth First Search
BFS is a graph traversal algorithm that explores all vertices at the current depth level before moving to vertices at the next depth level. It uses a queue data structure.

8.1 BFS Algorithm

Pseudocode
function BFS(graph, start):
    visited = set()
    queue = empty queue

    visited.add(start)
    queue.enqueue(start)

    while queue is not empty:
        vertex = queue.dequeue()
        print(vertex)  // Process vertex

        for each neighbor of vertex in graph:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.enqueue(neighbor)
📝 Example: BFS Traversal from vertex 0
BFS Step-by-Step
Graph:
        0
       / \
      1   2
     / \   \
    3   4   5

BFS from 0:
Step 1: Visit 0, Queue: [1, 2]
Step 2: Visit 1, Queue: [2, 3, 4]
Step 3: Visit 2, Queue: [3, 4, 5]
Step 4: Visit 3, Queue: [4, 5]
Step 5: Visit 4, Queue: [5]
Step 6: Visit 5, Queue: []

BFS Order: 0 → 1 → 2 → 3 → 4 → 5
(Level by level traversal)
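The traversal above can be reproduced with a queue-based implementation (a minimal sketch of the example graph; neighbor lists are kept in ascending order so the visit order matches):

```python
from collections import deque

def bfs(adj, start):
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        v = queue.popleft()          # FIFO: process the oldest discovered vertex
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                visited.add(w)       # mark on enqueue to avoid duplicates
                queue.append(w)
    return order

# The example graph: 0-(1,2), 1-(3,4), 2-(5)
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
# bfs(adj, 0) → [0, 1, 2, 3, 4, 5], level by level
```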

8.2 BFS Properties & Applications

📌 BFS Properties
  • Visits vertices level by level
  • Finds shortest path in unweighted graphs
  • Uses O(V) extra space for queue
  • Time Complexity: O(V + E)
  • Space Complexity: O(V)

Applications:

  • Shortest path in unweighted graph
  • Finding connected components
  • Level order traversal in trees
  • Finding all nodes within k distance
  • Web crawlers
  • Social network friend suggestions

9. Depth First Search (DFS)

Depth First Search
DFS is a graph traversal algorithm that explores as far as possible along each branch before backtracking. It uses a stack (or recursion).

9.1 DFS Algorithm

Pseudocode - Recursive
function DFS(graph, vertex, visited):
    visited.add(vertex)
    print(vertex)  // Process vertex

    for each neighbor of vertex in graph:
        if neighbor not in visited:
            DFS(graph, neighbor, visited)

// Call: DFS(graph, start, empty set)
Pseudocode - Iterative
function DFS_Iterative(graph, start):
    visited = set()
    stack = empty stack

    stack.push(start)

    while stack is not empty:
        vertex = stack.pop()

        if vertex not in visited:
            visited.add(vertex)
            print(vertex)  // Process vertex

            for each neighbor of vertex in graph:
                if neighbor not in visited:
                    stack.push(neighbor)
📝 Example: DFS Traversal from vertex 0
DFS Step-by-Step
Graph:
        0
       / \
      1   2
     / \   \
    3   4   5

DFS from 0 (recursive, left-first):
Step 1: Visit 0, explore neighbor 1
Step 2: Visit 1, explore neighbor 3
Step 3: Visit 3 (no unvisited neighbors), backtrack
Step 4: Back to 1, explore neighbor 4
Step 5: Visit 4, backtrack to 1, then to 0
Step 6: Explore neighbor 2
Step 7: Visit 2, explore neighbor 5
Step 8: Visit 5, done

DFS Order: 0 → 1 → 3 → 4 → 2 → 5
(Go deep, then backtrack)
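The recursive pseudocode, applied to the same example graph, reproduces the order above (a minimal sketch):

```python
def dfs(adj, v, visited=None, order=None):
    if visited is None:              # first call: create fresh bookkeeping
        visited, order = set(), []
    visited.add(v)
    order.append(v)
    for w in adj[v]:                 # go deep into the first unvisited neighbor
        if w not in visited:
            dfs(adj, w, visited, order)
    return order                     # backtracking happens as calls return

# The example graph: 0-(1,2), 1-(3,4), 2-(5)
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
# dfs(adj, 0) → [0, 1, 3, 4, 2, 5]: deep along 0→1→3, backtrack, 4, backtrack, 2→5
```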

9.2 DFS Properties & Applications

📌 DFS Properties
  • Explores as deep as possible before backtracking
  • Uses O(V) space for recursion stack
  • Time Complexity: O(V + E)
  • Space Complexity: O(V)
  • Does NOT find shortest path

Applications:

  • Detecting cycles in graph
  • Topological sorting
  • Finding strongly connected components
  • Solving puzzles (mazes)
  • Path finding
  • Checking if graph is bipartite

9.3 BFS vs DFS Comparison

Aspect BFS DFS
Data Structure Queue (FIFO) Stack (LIFO)/Recursion
Approach Level by level Go deep, backtrack
Shortest Path Yes (unweighted) No
Time O(V + E) O(V + E)
Space O(V) O(V)
Complete Yes Yes (finite graphs)
Best for Shortest path, nearest Connectivity, cycles

10. Practice Questions

Question 1 10 Marks
Insert the following elements into an AVL tree: 50, 25, 10, 5, 7, 3, 30, 20. Show the tree after each insertion and any rotations performed.
View Answer

Key rotations needed:

  • After inserting 10: LL rotation at 50 (subtree root becomes 25)
  • After inserting 7: LR rotation at 10 (subtree root becomes 7)
  • After inserting 3: LL rotation at 25 (7 becomes the overall root)

Final tree (balanced, height ~3):
        7
       /  \
      5     25
     /     /  \
    3    10    50
           \   /
           20 30

Question 2 5 Marks
What is the difference between B-Tree and B+ Tree? Give advantages of B+ Tree.
View Answer

Differences:

  • B-Tree: Data in all nodes; B+ Tree: Data only in leaves
  • B-Tree: No linked leaves; B+ Tree: Leaves are linked
  • B-Tree: No duplicate keys; B+ Tree: Keys duplicated in leaves

Advantages of B+ Tree:

  • Better for range queries (linked leaves)
  • More keys per node (internal nodes smaller)
  • Better cache performance
Question 3 10 Marks
Perform BFS and DFS traversal on the following graph starting from vertex A:
Edges: A-B, A-C, B-D, B-E, C-F, D-G
View Answer

BFS: A → B → C → D → E → F → G

(Level by level: A first, then B,C, then D,E,F, then G)

DFS: A → B → D → G → E → C → F

(Go deep: A→B→D→G, backtrack, E, backtrack, C→F)

Question 4 5 Marks
List the five properties of Red-Black trees.
View Answer
  1. Every node is RED or BLACK
  2. Root is always BLACK
  3. Every NULL leaf is BLACK
  4. RED nodes have BLACK children (no consecutive reds)
  5. All paths from node to leaves have same black height
Question 5 5 Marks
Explain Trie data structure with example. What are its applications?
View Answer

Trie: Tree structure where each node represents a character. Path from root represents prefix.

Time: O(m) for search/insert where m = word length

Applications: Autocomplete, Spell checker, IP routing, Dictionary

Key Points Summary

📌 Remember for Exam
  • AVL: Balance Factor = Height(left) - Height(right), must be -1, 0, or +1
  • AVL Rotations: LL, RR, LR, RL - know when to apply each
  • B-Tree order m: max m-1 keys, min ⌈m/2⌉-1 keys per node
  • B+ Tree: Data only in leaves, leaves are linked
  • Red-Black: 5 properties, root is black, no consecutive reds
  • Trie: O(m) operations, good for prefix matching
  • BFS: Queue, level-by-level, finds shortest path
  • DFS: Stack/recursion, go deep then backtrack
  • Both BFS and DFS: O(V+E) time, O(V) space