Indexing, B and B+ Tree
Indexing in DBMS
Section titled “Indexing in DBMS”Indexing is a data structure technique used to quickly locate and access data in a database table. It works like an index in a book—rather than scanning the entire table, the DBMS uses the index to find the desired record faster.
Indexes are created on one or more columns== of a table to ==improve retrieval speed at the cost of additional storage and maintenance overhead.
Types of Indexes in DBMS
Section titled “Types of Indexes in DBMS”- Primary Index
- Built on primary key
- Data file ordered on key
- One index entry per block
- Secondary Index
- Built on non-ordering attribute
- Data file not ordered
- Multiple entries may point to same block
- Clustering Index
- Built on non-key ordering attribute
- Data grouped on same value
- One entry per distinct value
- Dense Index
- Index entry for every record
- Faster search, more space
- Sparse Index
- Index entry for some records (usually one per block)
- Less space, slower than dense
- Multilevel Index
- Index on index
- Reduces search to O(log n)
- B-Tree Index
- Balanced tree
- Maintains sorted order
- Used for range queries
- B+ Tree Index
- Data pointers only at leaf nodes
- Leaf nodes linked
- Most widely used in DBMS
- Hash Index
- Uses hash function
- O(1) average search
- Not suitable for range queries
- Bitmap Index
- Uses bit vectors
- Efficient for low-cardinality attributes
- Used in data warehouses
- Composite (Multicolumn) Index
- Built on multiple attributes
- Order of attributes matters
Indexes Types are Classified Based on
Index TYPE → Primary / Secondary / ClusteringIndex DENSITY → Dense / SparseIndex STRUCTURE → B-Tree / B+-Tree / Hash / BitmapIndex Different Classification
Section titled “Index Different Classification”A. Indexes - based on ordering and Key
→ Logical classification based on file ordering & key vs non-key
| ⭐ | Key | Non Key |
|---|---|---|
| Ordered file → | Primary Index | Clustered Index |
| Unordered file → | Secondary Index | Secondary Index |
1. Primary Index
-
Created automatically== on the ==primary key of a table.
-
The index is based on the primary key field, which is unique and sorted.
- One entry per data block.
2. Secondary Index
-
Created on non-primary key attributes== to ==speed up queries.
- Can have duplicates.
- Useful when searching on non-key columns.
3. Clustered Index
-
Determines the physical order of data in the table.
- A table can have only one clustered index because data can be physically sorted in only one order.
4. Non-Clustered Index
-
Does not affect the physical order of data.
- Contains pointers to the actual data blocks.
- A table can have multiple non-clustered indexes.
B. Indexes - based on Density
→ Index record density, independent of index type
1. Dense Index ⭐
-
Contains one index entry for every record in the data file.
-
Provides faster lookups because each record has a direct index entry.
-
Requires more storage space== and has ==higher maintenance cost.
-
Commonly used in secondary indexes.
2. Sparse Index ⭐
-
Contains index entries for only some records (typically the first record of each block).
-
Requires less storage space== and ==lower maintenance.
- Searching involves locating the correct block and then scanning it.
-
Commonly used in primary indexes== where data is ==sorted sequentially.
- Note: Sparse indices can be used only if the relation is stored in the sorted order of the search key. ⭐
Sparse Index vs Dense Index
| Feature | Dense Index | Sparse Index |
|---|---|---|
| Index Entries | Contains one entry for every record in the data file. | Contains entries for some records only (typically one per data block). |
| Search Speed | Faster—direct mapping to every record. ⭐ | Slower—may need to search the data block after locating the closest index entry. |
| Storage Space | Requires more storage space due to more index entries. | Requires less storage space. ⭐ |
| Maintenance Cost | Higher—each record insertion/deletion affects the index. | Lower—only some index entries need updates. |
| Usage Scenario | Suitable for small datasets or frequently queried columns. | Suitable for large sequential files where data is sorted. |
Note: Typically, primary indexes are sparse==, and ==secondary indexes are dense.
Dense Index
Data File (sorted on key):+---------+---------+---------+| Key=10 | Key=20 | Key=30 |+---------+---------+---------+
Dense Index:+------+---------+| Key | Pointer |+------+---------+| 10 | → Rec10 || 20 | → Rec20 || 30 | → Rec30 |+------+---------+- Every record in the data file has a corresponding entry in the index.
Sparse Index
Data File (sorted on key):Block1: 10, 20, 30Block2: 40, 50, 60Block3: 70, 80, 90
Sparse Index:+------+----------+| Key | Pointer |+------+----------+| 10 | → Block1 || 40 | → Block2 || 70 | → Block3 |+------+----------+- Only the first key of each block appears in the index.
- To find
Key=50, the DBMS locates Block2 from the index and then searches within that block.
C. Indexes - based on Structure
→ Data structure used to implement index
- Can implement: Primary index, Secondary index and Clustering index
- B+-Tree is default in DBMS
1. B-Tree Index
- Balanced multi-level tree.
- Keys and data pointers at all nodes.
- Slower range queries than B+ tree.
2. B+ Tree Index ⭐
- Balanced tree.
- Data pointers only at leaf nodes.
- Internal nodes store keys only.
- Leaf nodes are linked.
- Excellent for range queries.
- Default index structure in DBMS.
3. Hash Index
- Uses hash function.
- Average O(1) search time.
- No support for range queries.
- Used for equality search only.
4. Bitmap Index
- Uses bit vectors.
- Efficient for low-cardinality attributes.
- Used in data warehousing / OLAP.
B-Tree and B+ Tree Notes for Practical DBMS Questions
Section titled “B-Tree and B+ Tree Notes for Practical DBMS Questions”1. B-Tree
Section titled “1. B-Tree”- Balanced search tree used for dynamic indexing in databases.
- Maintains sorted data and allows search, insert, delete in O(log n) time.
Properties
-
All leaves appear at the same level.
-
Each node can have at most m children (order = m).
-
Each internal node has k keys such that
ceil(m/2)-1 ≤ k ≤ m-1. -
Keys in a node separate subtrees:
For node keys K1 < K2 < ... < Kk:Child C1 < K1 < C2 < K2 < ... < Ck < Kk < Ck+1 -
Root can have minimum 2 children (or 1 key if leaf).
Operations
- Search: Follow keys recursively → O(log n)
- Insertion: Split node if full
- Deletion: Merge or redistribute nodes
Advantages
- Balanced → guaranteed O(log n) search
- Efficient for disk storage (minimizes disk access)
2. B+ Tree
Section titled “2. B+ Tree”- Variant of B-tree used in databases and file systems.
- All values stored at leaf nodes, internal nodes only store keys for indexing.
Properties
- Leaf nodes are linked as a linked list → efficient range queries.
- Internal nodes store keys only, not actual records.
- Order m: Internal nodes have
ceil(m/2) ≤ children ≤ m - Leaf nodes store d ≤ keys ≤ 2d
Advantages over B-Tree
- Range queries and sequential access → very fast (linked leaves)
- Less duplication of data in internal nodes → more keys per node → fewer levels → fewer disk I/O
- Insert/Delete easier to manage
Operations
- Search: Follow keys from root to leaf → O(log n)
- Insertion: Split leaf if full; propagate key to parent
- Deletion: Merge or redistribute leaves; propagate changes up
Comparison Table
Section titled “Comparison Table”| Feature | B-Tree | B+ Tree |
|---|---|---|
| Keys in leaf | Yes | Only pointers in internal nodes, values in leaves |
| Leaf nodes | Not linked | Linked list → fast sequential access |
| Range query | Slower | Fast |
| Tree height | Slightly taller | Slightly shorter (more keys per node) |
| Search | O(log n) | O(log n) |
| Insert/Delete | Complex | Simpler in leaves, internal nodes only updated |
Practical Tips for Solving Questions
Section titled “Practical Tips for Solving Questions”- Given order (m) and keys, always check min/max keys per node.
- Insertion: Start from leaf → insert → split if full → propagate middle key up.
- Deletion: Borrow from sibling → if underflow persists → merge nodes → update parent.
- Range search in B+ Tree: Go to starting key leaf → traverse linked leaf nodes.
- B+ Tree usually preferred in DBMS for indexing because of faster range queries and disk I/O efficiency.
Heap vs BST vs B-Tree (DBMS Practical Notes)
Section titled “Heap vs BST vs B-Tree (DBMS Practical Notes)”| Feature | Heap | BST (Binary Search Tree) | B-Tree |
|---|---|---|---|
| Definition | Complete binary tree satisfying heap property (Max/Min) | Binary tree where left < node < right | Balanced m-ary search tree used for indexing in DBMS |
| Purpose | Efficient priority queue operations | Efficient search, insert, delete in memory | Efficient disk-based search, insert, delete |
| Node Children | 2 (binary heap) | 2 | m (order of tree) |
| Balance | Complete binary → height ≈ log n | Not necessarily balanced → height can be n | Always balanced → height ≈ logₘ n |
| Search | O(n) (no ordering for arbitrary search) | O(h), h = height → worst O(n) | O(log n) guaranteed |
| Insertion | O(log n) (heapify up) | O(h) | O(log n) |
| Deletion | O(log n) (delete root, heapify) | O(h) | O(log n) |
| Range Queries | Not efficient | Possible but traversal required | Very efficient (leaves linked in B+ tree) |
| Use Case | Priority queue, scheduling | In-memory search | Database indexing, file systems |
| Storage | Array / in-memory | Pointer-based nodes | Disk blocks / nodes designed for I/O |
Summary Notes:
- Heap: Best for priority-based retrieval, not good for search.
- BST: Good for in-memory search, can degrade if unbalanced.
- B-Tree/B+ Tree: Best for disk-based databases, ensures balanced structure and efficient search, insert, delete, especially for large datasets.