DSA Material

Hash Table

Share to

A hash table maps keys to values by converting each key into a bucket index.

Book alignment (Chapter 11: Hash Tables)

This page is aligned to the hash-table chapter in the book:

Even when using Python dict, understanding these mechanisms is essential for reasoning about average vs worst-case behavior.

Core ideas

This chapter uses separate chaining (each bucket stores a small list of key-value pairs).

Invariants

Operations

set(key, value)

Compute bucket index, search the bucket for an existing key, and either update or append.

Hash table set operation

def set(self, key, value):
    index = self._index(key)
    bucket = self.buckets[index]

    for i, (stored_key, _) in enumerate(bucket):
        if stored_key == key:
            bucket[i] = (key, value)
            return

    bucket.append((key, value))
    self.size += 1
    if self.load_factor > self.max_load_factor:
        self._resize(self.capacity * 2)

get(key)

Jump directly to one bucket, then scan only inside that bucket.

Hash table get operation

def get(self, key):
    bucket = self.buckets[self._index(key)]
    for stored_key, stored_value in bucket:
        if stored_key == key:
            return stored_value
    raise KeyError(key)

delete(key)

Find matching pair in the bucket and remove it. Other buckets are untouched.

Hash table delete operation

def delete(self, key):
    bucket = self.buckets[self._index(key)]
    for i, (stored_key, _) in enumerate(bucket):
        if stored_key == key:
            del bucket[i]
            self.size -= 1
            return True
    return False

resize(new_capacity)

Allocate new buckets and reinsert every pair. This rebuild keeps bucket chains short.

def _resize(self, new_capacity):
    old_items = [pair for bucket in self.buckets for pair in bucket]
    self.capacity = max(4, new_capacity)
    self.buckets = [[] for _ in range(self.capacity)]
    self.size = 0

    for key, value in old_items:
        self.set(key, value)

Full implementation

class HashTable:
    def __init__(self, capacity=16, max_load_factor=0.75):
        self.capacity = max(4, capacity)
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0
        self.max_load_factor = max_load_factor

    @property
    def load_factor(self):
        return self.size / self.capacity

    def _index(self, key):
        return hash(key) % self.capacity

    def set(self, key, value):
        index = self._index(key)
        bucket = self.buckets[index]

        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)
                return

        bucket.append((key, value))
        self.size += 1
        if self.load_factor > self.max_load_factor:
            self._resize(self.capacity * 2)

    def get(self, key):
        bucket = self.buckets[self._index(key)]
        for stored_key, stored_value in bucket:
            if stored_key == key:
                return stored_value
        raise KeyError(key)

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                del bucket[i]
                self.size -= 1
                return True
        return False

    def contains(self, key):
        bucket = self.buckets[self._index(key)]
        return any(stored_key == key for stored_key, _ in bucket)

    def _resize(self, new_capacity):
        old_items = [pair for bucket in self.buckets for pair in bucket]
        self.capacity = max(4, new_capacity)
        self.buckets = [[] for _ in range(self.capacity)]
        self.size = 0

        for key, value in old_items:
            self.set(key, value)

Complexity summary

Operation Average Worst case
Insert O(1) O(n)
Lookup O(1) O(n)
Delete O(1) O(n)
Resize O(n) O(n)

Worst-case behavior appears when many keys collide into one bucket. Rehashing reduces that risk.

Collision strategy note

Two common strategies exist:

Python dict uses a highly optimized open-addressing approach with additional tricks.

Practical notes

Typical mistakes