Module II

Advanced Data Structures

Master self-balancing trees (AVL, Red-Black, B-Tree, B+ Tree), the Trie data structure, and graph representations with traversal algorithms (BFS and DFS).

1. Introduction to Advanced Data Structures

Advanced data structures are designed to optimize specific operations beyond what basic structures like arrays and linked lists offer. They maintain special properties that guarantee efficient performance.

Why Self-Balancing Trees?

In a regular Binary Search Tree (BST), the height can become O(n) in the worst case (skewed tree), making operations O(n). Self-balancing trees maintain a height of O(log n), ensuring O(log n) operations.

Data Structure Search Insert Delete Space
BST (avg) O(log n) O(log n) O(log n) O(n)
BST (worst) O(n) O(n) O(n) O(n)
AVL Tree O(log n) O(log n) O(log n) O(n)
Red-Black Tree O(log n) O(log n) O(log n) O(n)
B-Tree O(log n) O(log n) O(log n) O(n)
Trie O(m)* O(m)* O(m)* O(ALPHABET × m × n)

*m = length of string, n = number of strings

2. AVL Trees

AVL Tree (Adelson-Velsky and Landis)
An AVL tree is a self-balancing Binary Search Tree where the difference between heights of left and right subtrees (Balance Factor) cannot be more than 1 for all nodes.
Balance Factor (BF) = Height(Left Subtree) - Height(Right Subtree)
Valid BF values: -1, 0, +1

2.1 Properties of AVL Trees

  • It is a Binary Search Tree (left < root < right)
  • Balance Factor of every node is -1, 0, or +1
  • Height is always O(log n)
  • Rebalancing is done through rotations after insertion/deletion

2.2 AVL Tree Rotations

When an insertion or deletion causes imbalance, rotations restore balance:

LL
Left-Left (Single Right Rotation)
Imbalance at node A, caused by insertion in left subtree of left child.
Perform right rotation at A.
RR
Right-Right (Single Left Rotation)
Imbalance at node A, caused by insertion in right subtree of right child.
Perform left rotation at A.
LR
Left-Right (Double Rotation)
Imbalance at node A, caused by insertion in right subtree of left child.
First left rotation at B (left child), then right rotation at A.
RL
Right-Left (Double Rotation)
Imbalance at node A, caused by insertion in left subtree of right child.
First right rotation at B (right child), then left rotation at A.

2.3 Rotation Diagrams

LL Rotation (Right Rotation)
    Before (BF=2):              After (BF=0):
         30                          20
        /                           /  \
       20         ─────►          10    30
      /
     10

    Steps:
    1. B becomes new root
    2. A becomes right child of B
    3. Right subtree of B becomes left subtree of A
RR Rotation (Left Rotation)
    Before (BF=-2):             After (BF=0):
       10                            20
         \                          /  \
          20       ─────►         10    30
            \
             30

    Steps:
    1. B becomes new root
    2. A becomes left child of B
    3. Left subtree of B becomes right subtree of A
LR Rotation (Left-Right)
    Before:                Step 1 (Left):        Step 2 (Right):
       30                      30                     25
      /                       /                      /  \
     20        ─────►        25       ─────►       20    30
       \                    /
        25                 20
RL Rotation (Right-Left)
    Before:                Step 1 (Right):       Step 2 (Left):
     10                       10                     15
       \                        \                   /  \
        20     ─────►            15    ─────►     10    20
       /                           \
      15                            20
📝 Example: Build AVL Tree by inserting 10, 20, 30, 40, 50, 25
Step-by-Step Construction
Insert 10:     Insert 20:     Insert 30:        After RR:
   10            10              10                20
                   \               \  BF=-2       /  \
                    20              20    →      10   30
                                      \
                                       30

Insert 40:         Insert 50:  BF(30)=-2     After RR at 30:
    20                 20                         20
   /  \               /  \                       /  \
  10   30     →     10    30           →       10    40
         \                  \                       /  \
          40                 40                   30    50
                               \
                                50

Insert 25:  BF(20)=-2    Step 1 (Right at 40):    Step 2 (Left at 20):
      20                       20                       30
     /  \                     /  \                     /  \
   10    40        →        10    30        →        20    40
        /  \                     /  \               /  \     \
      30    50                 25    40           10    25    50
     /                                 \
    25                                  50

Inserting 25 makes BF(20) = 1 - 3 = -2, with the new key in the left
subtree of the right child, so an RL (double) rotation is required.
Final AVL tree: root 30, left 20 (10, 25), right 40 (50).

2.4 Time Complexity Analysis

Operation Time Complexity Explanation
Search O(log n) Height is always log n
Insert O(log n) Search + at most 2 rotations
Delete O(log n) Search + at most O(log n) rotations
Rotation O(1) Constant pointer changes
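The rotation cases above can be sketched in Python. This is a minimal illustrative implementation (class and helper names are my own, not from any particular library), replayed on the example sequence 10, 20, 30, 40, 50, 25:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.h = key, None, None, 1

def height(n):
    return n.h if n else 0

def bf(n):                     # Balance Factor = height(left) - height(right)
    return height(n.left) - height(n.right)

def update(n):
    n.h = 1 + max(height(n.left), height(n.right))

def rotate_right(a):           # fixes LL: left child b becomes the new root
    b = a.left
    a.left, b.right = b.right, a
    update(a); update(b)
    return b

def rotate_left(a):            # fixes RR: right child b becomes the new root
    b = a.right
    a.right, b.left = b.left, a
    update(a); update(b)
    return b

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    balance = bf(node)
    if balance > 1 and key < node.left.key:      # LL: single right rotation
        return rotate_right(node)
    if balance < -1 and key > node.right.key:    # RR: single left rotation
        return rotate_left(node)
    if balance > 1 and key > node.left.key:      # LR: left then right
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1 and key < node.right.key:    # RL: right then left
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

root = None
for k in [10, 20, 30, 40, 50, 25]:
    root = insert(root, k)
# root ends as 30, with 20 (10, 25) on the left and 40 (50) on the right
```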

3. B-Trees

B-Tree of Order m
A B-Tree is a self-balancing search tree designed for systems that read and write large blocks of data (like databases and file systems). It generalizes the BST by allowing each node to store multiple keys and have more than two children.

3.1 Properties of B-Tree of Order m

  • Every node has at most m children
  • Every non-leaf node (except root) has at least ⌈m/2⌉ children
  • Root has at least 2 children if it's not a leaf
  • A non-leaf node with k children contains k-1 keys
  • All leaves appear at the same level (perfectly balanced)
  • Keys within a node are sorted in ascending order
For B-Tree of order m:
Max keys per node = m - 1
Min keys per node = ⌈m/2⌉ - 1
Example: Order 5 B-Tree has 2-4 keys per node (except root)
B-Tree of Order 3 Example
                    [30]
                   /    \
            [10, 20]    [40, 50]
           /   |   \    /   |   \
         [5] [15] [25] [35] [45] [55, 60]

Properties:
- Order m = 3
- Max children = 3, Max keys = 2
- Min children (non-root) = ⌈3/2⌉ = 2, Min keys = 1
- All leaves at same level

3.2 B-Tree Operations

Search Operation

📋 B-Tree Search Algorithm
  1. Start from root
  2. Compare key with keys in current node
  3. If key found, return success
  4. If key < smallest key, go to leftmost child
  5. If key > largest key, go to rightmost child
  6. Otherwise, go to appropriate child between keys
  7. If leaf reached without finding, return failure
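The search steps above can be sketched as follows — a minimal sketch in which nodes are plain dicts (the node layout is an illustrative assumption), run against the order-3 example tree from section 3.1:

```python
def btree_search(node, key):
    """node = {'keys': sorted keys, 'children': subtrees}; leaves have no children."""
    i = 0
    while i < len(node['keys']) and key > node['keys'][i]:
        i += 1                                    # step past smaller keys
    if i < len(node['keys']) and node['keys'][i] == key:
        return True                               # key found in this node
    if not node['children']:
        return False                              # leaf reached without finding
    return btree_search(node['children'][i], key) # descend between keys i-1 and i

# The order-3 example tree from section 3.1
tree = {'keys': [30], 'children': [
    {'keys': [10, 20], 'children': [
        {'keys': [5],  'children': []},
        {'keys': [15], 'children': []},
        {'keys': [25], 'children': []}]},
    {'keys': [40, 50], 'children': [
        {'keys': [35], 'children': []},
        {'keys': [45], 'children': []},
        {'keys': [55, 60], 'children': []}]}]}
# btree_search(tree, 45) → True; btree_search(tree, 12) → False
```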

Insertion Operation

📋 B-Tree Insertion Algorithm
  1. Search for the appropriate leaf node
  2. If leaf has room (< m-1 keys), insert key in sorted order
  3. If leaf is full, split the node:
    • Find median key
    • Keys less than median go to left node
    • Keys greater than median go to right node
    • Median moves up to parent
  4. If parent overflows, repeat split process upward
  5. If root splits, create new root (tree grows in height)
📝 Example: Insert 1, 3, 7, 10, 11, 13, 14, 15, 18, 16, 19, 24 into B-Tree of Order 5
Step-by-Step Insertion
Insert 1,3,7,10:        [1, 3, 7, 10]  (fits in one node)

Insert 11:              Split required (5 keys > max 4)
                              [7]
                             /   \
                     [1, 3]     [10, 11]

Insert 13, 14:               [7]
                            /   \
                    [1, 3]     [10, 11, 13, 14]

Insert 15:          Right leaf overflows (5 keys > max 4);
                    split it, median 13 moves up:
                              [7, 13]
                             /   |   \
                     [1, 3] [10,11] [14, 15]

Insert 18, 16:           [7, 13]
                        /   |   \
               [1, 3] [10,11] [14, 15, 16, 18]

Insert 19:     Split rightmost leaf:
                         [7, 13, 16]
                       /   |   |    \
              [1,3] [10,11] [14,15] [18, 19]

Insert 24:            [7, 13, 16]
                     /   |   |    \
            [1,3] [10,11] [14,15] [18, 19, 24]
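The node split used repeatedly above can be sketched as a small helper (illustrative only; a full B-Tree insert would also wire the two halves into the parent):

```python
def split_node(keys):
    """Split an overfull sorted key list around its median.
    The median moves up to the parent; the halves become sibling nodes."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# The first split in the example: inserting 11 overflows [1, 3, 7, 10, 11]
left, median, right = split_node([1, 3, 7, 10, 11])
# median 7 moves up; the leaves become [1, 3] and [10, 11]
```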

Deletion Operation

📋 B-Tree Deletion Cases
  1. Case 1: Key is in leaf node
    • If node has more than minimum keys, simply delete
    • If at minimum, borrow from sibling or merge
  2. Case 2: Key is in internal node
    • Replace with inorder predecessor or successor
    • Delete the predecessor/successor from leaf
  3. Case 3: Underflow handling
    • Borrow from left or right sibling if possible
    • Otherwise, merge with sibling

3.3 Time Complexity

Operation Time Complexity
Search O(log n)
Insert O(log n)
Delete O(log n)

4. B+ Trees

B+ Tree
A B+ Tree is a variation of B-Tree where all data is stored in leaf nodes, and internal nodes only store keys for navigation. Leaf nodes are linked together for efficient range queries.

4.1 Differences: B-Tree vs B+ Tree

B-Tree

  • Data in all nodes
  • No duplicate keys
  • Leaves not linked
  • Search may end at any node
  • Fewer keys in internal nodes

B+ Tree

  • Data only in leaf nodes
  • Keys duplicated in leaves
  • Leaves linked (sequential access)
  • Search always goes to leaf
  • More keys in internal nodes
B+ Tree Structure
                         [30 | 50]                    ← Internal node
                        /    |    \                    (only keys, no data)
                      /      |      \
              [10|20|30]  [40|50]  [60|70|80]        ← Leaf nodes
                 ↓           ↓          ↓              (keys + data)
              [data]      [data]     [data]
                 ↔           ↔          ↔            ← Linked list

Key features:
- Internal nodes: only keys for navigation
- Leaf nodes: keys + data pointers
- Leaves linked for range queries
- Keys 30 and 50 appear in both internal and leaf nodes

4.2 Advantages of B+ Trees

  • Better for range queries: Leaf nodes are linked, allowing sequential traversal
  • More keys per internal node: Internal nodes don't store data, so they can hold more keys
  • Better cache performance: Internal nodes fit better in memory
  • Uniform access time: All searches go to leaf level
💡
Database Usage: B+ Trees are the preferred data structure for database indexes (used in MySQL, PostgreSQL, Oracle) because they excel at both point queries and range queries.
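The range-query advantage can be sketched by walking the linked leaf level of the B+ Tree diagram above (a minimal sketch; a real index would first descend from the root to the leaf holding the lower bound):

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys          # sorted keys in this leaf (data omitted)
        self.next = None          # link to the right sibling

def range_query(first_leaf, lo, hi):
    """Collect keys in [lo, hi] by scanning the linked leaves left to right."""
    out, leaf = [], first_leaf
    while leaf:
        for k in leaf.keys:
            if k > hi:
                return out        # keys are sorted: nothing larger can match
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

# Leaf level of the diagram above
a, b, c = Leaf([10, 20, 30]), Leaf([40, 50]), Leaf([60, 70, 80])
a.next, b.next = b, c
# range_query(a, 20, 60) → [20, 30, 40, 50, 60]
```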

4.3 Time Complexity

Operation Time Complexity
Search O(log n)
Insert O(log n)
Delete O(log n)
Range Query O(log n + k) where k = results

5. Red-Black Trees

Red-Black Tree
A Red-Black Tree is a self-balancing BST where each node has an extra bit for color (red or black). The tree uses coloring rules to ensure the tree remains approximately balanced.

5.1 Properties (Rules)

📌 Five Properties of Red-Black Trees
  • Property 1: Every node is either RED or BLACK
  • Property 2: Root is always BLACK
  • Property 3: Every leaf (NIL/NULL) is BLACK
  • Property 4: If a node is RED, both its children must be BLACK (no two consecutive reds)
  • Property 5: For each node, all paths from node to descendant leaves have same number of black nodes (Black Height)
Valid Red-Black Tree Example
                    (13,B)                    B = Black, R = Red
                   /      \
               (8,R)      (17,R)
              /    \      /    \
          (1,B)  (11,B) (15,B) (25,B)
             \                  /
            (6,R)            (22,R)

Black Height from root = 2 (count black nodes to any leaf)
No two consecutive red nodes
Root is black
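The five properties can be checked mechanically. A minimal sketch (tuple-based nodes are an illustrative choice), run on the valid tree above; note it counts the black NIL level too, so it returns the diagram's black height plus one:

```python
def check_rb(node):
    """Verify the red-black rules; node = (key, color, left, right), None = NIL.
    Returns the black height (including NIL), or raises on a violation."""
    if node is None:
        return 1                                   # Property 3: NIL leaves are black
    _key, color, left, right = node
    if color == 'R':                               # Property 4: no red-red edge
        for child in (left, right):
            if child is not None and child[1] == 'R':
                raise ValueError('red node with red child')
    lh, rh = check_rb(left), check_rb(right)
    if lh != rh:                                   # Property 5: equal black heights
        raise ValueError('unequal black heights')
    return lh + (1 if color == 'B' else 0)

# The valid tree from the diagram above
tree = (13, 'B',
        (8, 'R',
         (1, 'B', None, (6, 'R', None, None)),
         (11, 'B', None, None)),
        (17, 'R',
         (15, 'B', None, None),
         (25, 'B', (22, 'R', None, None), None)))
assert tree[1] == 'B'          # Property 2: root is black
assert check_rb(tree) == 3     # 2 black nodes per path, plus the NIL level
```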

5.2 Insertion in Red-Black Tree

New nodes are always inserted as RED (inserting a BLACK node would change black heights, violating Property 5). Any red-red violation is then fixed:

📋 Red-Black Tree Insertion Fix-Up Cases
  1. Case 1: Uncle is RED
    • Recolor parent and uncle to BLACK
    • Recolor grandparent to RED
    • Move up to grandparent and repeat
  2. Case 2: Uncle is BLACK, node is inner grandchild
    • Rotate to make it outer grandchild (Case 3)
  3. Case 3: Uncle is BLACK, node is outer grandchild
    • Rotate at grandparent
    • Swap colors of parent and grandparent

5.3 Comparison: AVL vs Red-Black Trees

Aspect AVL Tree Red-Black Tree
Balancing Strictly balanced (BF: -1,0,1) Loosely balanced (color rules)
Height ≤ 1.44 log n ≤ 2 log n
Search Faster (shorter height) Slightly slower
Insert/Delete More rotations Fewer rotations
Use Case Search-heavy applications Insert/delete-heavy (C++ STL map)

5.4 Time Complexity

Operation Time Complexity
Search O(log n)
Insert O(log n)
Delete O(log n)
Rotations per operation O(1) amortized

6. Tries (Prefix Trees)

Trie
A Trie (pronounced "try") is a tree-like data structure used to store strings. Each node represents a character, and paths from root to nodes represent prefixes. The name comes from reTRIEval.

6.1 Properties of Tries

  • Root node is empty
  • Each edge represents a character
  • Path from root to a node represents a prefix
  • End-of-word nodes are marked specially
  • Common prefixes are shared
Trie containing: "cat", "car", "card", "care", "dog"
                        (root)
                       /      \
                     c          d
                     |          |
                     a          o
                    / \         |
                   t*  r*       g*
                      / \
                    d*   e*

    * = End of word marker

    Searching "car":
    root → c → a → r (found, marked as word end)

    Searching "cab":
    root → c → a → b (not found, no 'b' edge)

6.2 Trie Operations

Pseudocode
structure TrieNode:
    children[ALPHABET_SIZE]  // Array of child pointers
    isEndOfWord              // Boolean flag

function INSERT(root, word):
    node = root
    for each character c in word:
        index = c - 'a'
        if node.children[index] is NULL:
            node.children[index] = new TrieNode()
        node = node.children[index]
    node.isEndOfWord = true

function SEARCH(root, word):
    node = root
    for each character c in word:
        index = c - 'a'
        if node.children[index] is NULL:
            return false
        node = node.children[index]
    return node.isEndOfWord

function STARTS_WITH(root, prefix):
    node = root
    for each character c in prefix:
        index = c - 'a'
        if node.children[index] is NULL:
            return false
        node = node.children[index]
    return true
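The pseudocode above translates directly to Python. This sketch uses a dict for children instead of a fixed ALPHABET_SIZE array (an implementation choice, not required by the structure):

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # char → TrieNode (dict replaces the 26-slot array)
        self.is_end = False       # end-of-word marker

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def search(self, word):       # exact word lookup
        node = self._walk(word)
        return node is not None and node.is_end

    def starts_with(self, prefix):  # prefix lookup
        return self._walk(prefix) is not None

    def _walk(self, s):
        node = self.root
        for ch in s:
            if ch not in node.children:
                return None
            node = node.children[ch]
        return node

t = Trie()
for w in ["cat", "car", "card", "care", "dog"]:
    t.insert(w)
# t.search("car") → True; t.search("cab") → False; t.starts_with("ca") → True
```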

6.3 Time & Space Complexity

Operation Time Complexity Explanation
Insert O(m) m = length of word
Search O(m) m = length of word
Delete O(m) m = length of word
Prefix Search O(p + k) p = prefix length, k = matches
Space Complexity: O(ALPHABET_SIZE × m × n)
Where m = average word length, n = number of words

6.4 Applications of Tries

  • Autocomplete: Finding all words with given prefix
  • Spell Checker: Checking if word exists in dictionary
  • IP Routing: Longest prefix matching
  • Phone Directory: Contact search

7. Graphs

Graph
A Graph G = (V, E) consists of a set of vertices V and a set of edges E. Each edge connects two vertices. Graphs can be directed or undirected, weighted or unweighted.

7.1 Types of Graphs

Undirected Graph

  • Edges have no direction
  • Edge (u,v) = Edge (v,u)
  • Example: Social networks
    A --- B
    |     |
    C --- D
                        

Directed Graph (Digraph)

  • Edges have direction
  • Edge (u,v) ≠ Edge (v,u)
  • Example: Web links, dependencies
    A → B
    ↓   ↓
    C → D
                        

7.2 Graph Representations

Adjacency Matrix

Adjacency Matrix
A 2D array of size V×V where A[i][j] = 1 (or weight) if edge exists from i to j, else 0.
Adjacency Matrix Example
Graph:                    Adjacency Matrix:
    0 --- 1                     0   1   2   3
    |   / |               0  [  0   1   1   0  ]
    |  /  |               1  [  1   0   1   1  ]
    2 --- 3               2  [  1   1   0   1  ]
                          3  [  0   1   1   0  ]

Space: O(V²)
Check edge: O(1)
Find all neighbors: O(V)

Adjacency List

Adjacency List
An array of lists where the index represents a vertex and its list contains all adjacent vertices.
Adjacency List Example
Graph:                    Adjacency List:
    0 --- 1               0: [1, 2]
    |   / |               1: [0, 2, 3]
    |  /  |               2: [0, 1, 3]
    2 --- 3               3: [1, 2]

Space: O(V + E)
Check edge: O(degree of vertex)
Find all neighbors: O(degree of vertex)

7.3 Comparison of Representations

Aspect Adjacency Matrix Adjacency List
Space O(V²) O(V + E)
Check if edge exists O(1) O(V) worst case
Find all neighbors O(V) O(degree)
Add edge O(1) O(1)
Best for Dense graphs Sparse graphs
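Both representations of the 4-vertex example graph above can be built in a few lines (a minimal sketch):

```python
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # the example graph above
V = 4

# Adjacency matrix: O(V^2) space, O(1) edge check
matrix = [[0] * V for _ in range(V)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1                # undirected: symmetric entries

# Adjacency list: O(V + E) space, O(degree) neighbor scan
adj = {u: [] for u in range(V)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

# matrix[1][3] == 1 (edge exists), matrix[0][3] == 0 (no edge)
# adj matches the table: 0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]
```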

8. Breadth First Search (BFS)

Breadth First Search
BFS is a graph traversal algorithm that explores all vertices at the current depth level before moving to vertices at the next depth level. It uses a queue data structure.

8.1 BFS Algorithm

Pseudocode
function BFS(graph, start):
    visited = set()
    queue = empty queue

    visited.add(start)
    queue.enqueue(start)

    while queue is not empty:
        vertex = queue.dequeue()
        print(vertex)  // Process vertex

        for each neighbor of vertex in graph:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.enqueue(neighbor)
📝 Example: BFS Traversal from vertex 0
BFS Step-by-Step
Graph:
        0
       / \
      1   2
     / \   \
    3   4   5

BFS from 0:
Step 1: Visit 0, Queue: [1, 2]
Step 2: Visit 1, Queue: [2, 3, 4]
Step 3: Visit 2, Queue: [3, 4, 5]
Step 4: Visit 3, Queue: [4, 5]
Step 5: Visit 4, Queue: [5]
Step 6: Visit 5, Queue: []

BFS Order: 0 → 1 → 2 → 3 → 4 → 5
(Level by level traversal)
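The traversal above can be reproduced with a queue-based implementation (a minimal sketch of the example graph; neighbor lists are kept in ascending order so the visit order matches):

```python
from collections import deque

def bfs(adj, start):
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        v = queue.popleft()          # FIFO: process the oldest discovered vertex
        order.append(v)
        for w in adj[v]:
            if w not in visited:
                visited.add(w)       # mark on enqueue to avoid duplicates
                queue.append(w)
    return order

# The example graph: 0-(1,2), 1-(3,4), 2-(5)
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
# bfs(adj, 0) → [0, 1, 2, 3, 4, 5], level by level
```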

8.2 BFS Properties & Applications

📌 BFS Properties
  • Visits vertices level by level
  • Finds shortest path in unweighted graphs
  • Uses O(V) extra space for queue
  • Time Complexity: O(V + E)
  • Space Complexity: O(V)

Applications:

  • Shortest path in unweighted graph
  • Finding connected components
  • Level order traversal in trees
  • Finding all nodes within k distance
  • Web crawlers
  • Social network friend suggestions

9. Depth First Search (DFS)

Depth First Search
DFS is a graph traversal algorithm that explores as far as possible along each branch before backtracking. It uses a stack (or recursion).

9.1 DFS Algorithm

Pseudocode - Recursive
function DFS(graph, vertex, visited):
    visited.add(vertex)
    print(vertex)  // Process vertex

    for each neighbor of vertex in graph:
        if neighbor not in visited:
            DFS(graph, neighbor, visited)

// Call: DFS(graph, start, empty set)
Pseudocode - Iterative
function DFS_Iterative(graph, start):
    visited = set()
    stack = empty stack

    stack.push(start)

    while stack is not empty:
        vertex = stack.pop()

        if vertex not in visited:
            visited.add(vertex)
            print(vertex)  // Process vertex

            for each neighbor of vertex in graph:
                if neighbor not in visited:
                    stack.push(neighbor)
📝 Example: DFS Traversal from vertex 0
DFS Step-by-Step
Graph:
        0
       / \
      1   2
     / \   \
    3   4   5

DFS from 0 (recursive, left-first):
Step 1: Visit 0, explore neighbor 1
Step 2: Visit 1, explore neighbor 3
Step 3: Visit 3 (no unvisited neighbors), backtrack
Step 4: Back to 1, explore neighbor 4
Step 5: Visit 4, backtrack to 1, then to 0
Step 6: Explore neighbor 2
Step 7: Visit 2, explore neighbor 5
Step 8: Visit 5, done

DFS Order: 0 → 1 → 3 → 4 → 2 → 5
(Go deep, then backtrack)
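The recursive pseudocode, applied to the same example graph, reproduces the order above (a minimal sketch):

```python
def dfs(adj, v, visited=None, order=None):
    if visited is None:              # first call: create fresh bookkeeping
        visited, order = set(), []
    visited.add(v)
    order.append(v)
    for w in adj[v]:                 # go deep into the first unvisited neighbor
        if w not in visited:
            dfs(adj, w, visited, order)
    return order                     # backtracking happens as calls return

# The example graph: 0-(1,2), 1-(3,4), 2-(5)
adj = {0: [1, 2], 1: [0, 3, 4], 2: [0, 5], 3: [1], 4: [1], 5: [2]}
# dfs(adj, 0) → [0, 1, 3, 4, 2, 5]: deep along 0→1→3, backtrack, 4, backtrack, 2→5
```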

9.2 DFS Properties & Applications

📌 DFS Properties
  • Explores as deep as possible before backtracking
  • Uses O(V) space for recursion stack
  • Time Complexity: O(V + E)
  • Space Complexity: O(V)
  • Does NOT find shortest path

Applications:

  • Detecting cycles in graph
  • Topological sorting
  • Finding strongly connected components
  • Solving puzzles (mazes)
  • Path finding
  • Checking if graph is bipartite

9.3 BFS vs DFS Comparison

Aspect BFS DFS
Data Structure Queue (FIFO) Stack (LIFO)/Recursion
Approach Level by level Go deep, backtrack
Shortest Path Yes (unweighted) No
Time O(V + E) O(V + E)
Space O(V) O(V)
Complete Yes Yes (finite graphs)
Best for Shortest path, nearest Connectivity, cycles

10. Practice Questions

Question 1 10 Marks
Insert the following elements into an AVL tree: 50, 25, 10, 5, 7, 3, 30, 20. Show the tree after each insertion and any rotations performed.
View Answer

Key rotations needed:

  • After inserting 10: LL rotation at 50 (subtree root becomes 25)
  • After inserting 7: LR rotation at 10 (subtree root becomes 7)
  • After inserting 3: LL rotation at 25 (7 becomes the overall root)

Final tree (balanced, height ~3):
        7
       /  \
      5     25
     /     /  \
    3    10    50
           \   /
           20 30

Question 2 5 Marks
What is the difference between B-Tree and B+ Tree? Give advantages of B+ Tree.
View Answer

Differences:

  • B-Tree: Data in all nodes; B+ Tree: Data only in leaves
  • B-Tree: No linked leaves; B+ Tree: Leaves are linked
  • B-Tree: No duplicate keys; B+ Tree: Keys duplicated in leaves

Advantages of B+ Tree:

  • Better for range queries (linked leaves)
  • More keys per node (internal nodes smaller)
  • Better cache performance
Question 3 10 Marks
Perform BFS and DFS traversal on the following graph starting from vertex A:
Edges: A-B, A-C, B-D, B-E, C-F, D-G
View Answer

BFS: A → B → C → D → E → F → G

(Level by level: A first, then B,C, then D,E,F, then G)

DFS: A → B → D → G → E → C → F

(Go deep: A→B→D→G, backtrack, E, backtrack, C→F)

Question 4 5 Marks
List the five properties of Red-Black trees.
View Answer
  1. Every node is RED or BLACK
  2. Root is always BLACK
  3. Every NULL leaf is BLACK
  4. RED nodes have BLACK children (no consecutive reds)
  5. All paths from node to leaves have same black height
Question 5 5 Marks
Explain Trie data structure with example. What are its applications?
View Answer

Trie: Tree structure where each node represents a character. Path from root represents prefix.

Time: O(m) for search/insert where m = word length

Applications: Autocomplete, Spell checker, IP routing, Dictionary

Key Points Summary

📌 Remember for Exam
  • AVL: Balance Factor = Height(left) - Height(right), must be -1, 0, or +1
  • AVL Rotations: LL, RR, LR, RL - know when to apply each
  • B-Tree order m: max m-1 keys, min ⌈m/2⌉-1 keys per node
  • B+ Tree: Data only in leaves, leaves are linked
  • Red-Black: 5 properties, root is black, no consecutive reds
  • Trie: O(m) operations, good for prefix matching
  • BFS: Queue, level-by-level, finds shortest path
  • DFS: Stack/recursion, go deep then backtrack
  • Both BFS and DFS: O(V+E) time, O(V) space