Hashing for HashTables

Just add a little chaos


The magic behind HashTables

Hashtables are promoted for their fast access to data. The Time Complexity for lookups, insertions, and deletions is on average O(1) which means it takes the same amount of time regardless of how much data is stored. I covered this ground in my previous post on Hashing but today I want to focus on what hashing options there are.

Modulo

The best way to start is with the modulo operator. This is the simplest way to map a hashcode to an index in the hashtable array. Here is an example:

    private int hashKey(K key, int modulo) {
        return Math.floorMod(key.hashCode(), modulo);
    }

Whats actually happening here?

  1. We get the hashcode of the key using key.hashCode(). This gives us an integer representation of the key.
  2. We then use Math.floorMod(..., modulo) to ensure the hashcode maps to a valid index within the bounds of our hashtable array.

An example of this hash function in action: Imagine a hashtale with a capacity of 11 (note: this is a prime number, it also happens to be the default capacity that java’s HashMap uses).

KeyHashCodeIndex (HashCode % 11)
“apple”9302921010
”banana”203958201
”cherry”345678902
”date”456789013
”elderberry”567890124
”fig”678901235
”grape”789012346

What our hashtable now looks like:

IndexKey
0
1banana
2cherry
3date
4elderberry
5fig
6grape
7
8
9
10apple
11

Bitmasking

Bitmasking sounds a little intimidating but all it actually does is mix up the bits of the hashcode a little to reduce collisions.

  1. We start with hashcode of the key using Objects.hashCode(key).
  2. We mix the bits by XORing the hashcode with a right-shifted version of itself: h ^ (h >>> 16). This helps to spread out the bits more evenly.
  3. Finally, we use Math.floorMod(..., modulo) to map the mixed hashcode to a valid index in the hashtable array. Here’s the code:
    private int hashKey(K key, int modulo) {
    int h = Objects.hashCode(key);
    int mixed = h ^ (h >>> 16);
    return Math.floorMod(mixed, modulo);
    }

Here’s an example of how this works with the same keys as before:

KeyHashCodeMixed HashCodeIndex (Mixed HashCode % 11)
“apple”93029210930298345
”banana”20395820203960683
”cherry”34567890345684081
”date”45678901456795968
”elderberry”56789012456795966
”fig”67890123456795969
”grape”78901234789024024

What our hashtable now looks like:

IndexKey
0
1cherry
2
3banana
4grape
5apple
6elderberry
7
8date
9fig
10
11

Compared to the modulo-only version, the keys are spread out differently. This reduces the chance of collisions piling up in the same buckets.

String Hashing

Another option is to create a custom hash function specifically for strings. So we don’t use hashCode() at all. For example, we can iterate over each character in the string and compute a hash value based on their ASCII values.

Here’s what’s going on:

  1. We loop over each character in the string.
  2. We multiply the current hash value by 31 (a prime number) and add the ASCII value of the character.
  3. Finally, we use modulo to map the computed hash value to a valid index in the hashtable array.
    private int hashKey(K key, int modulo) {
    String s = String.valueOf(key);
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = (31 * h + s.charAt(i)) % modulo;
    }
    return h;
    }

This is how it works with our example keys: String = “apple”

KeyComputation StepsFinal HashCodeIndex (HashCode % 11)
“apple”((0 * 31 + ‘a’) * 31 + ‘p’) * 31 + ‘p’) * 31 + ‘l’) * 31 + ‘e’12345678910
”banana”((0 * 31 + ‘b’) * 31 + ‘a’) * 31 + ‘n’) * 31 + ‘a’) * 31 + ‘n’)2345678906
”cherry”((0 * 31 + ‘c’) * 31 + ‘h’) * 31 + ‘e’) * 31 + ‘r’) * 31 + ‘r’) * 31 + ‘y’3456789012
”date”((0 * 31 + ‘d’) * 31 + ‘a’) * 31 + ‘t’) * 31 + ‘e’4567890123
”elderberry”((0 * 31 + ‘e’) * 31 + ‘l’) * 31 + ‘d’) * 31 + ‘e’) * 31 + ‘r’) * 31 + ‘b’) * 31 + ‘e’) * 31 + ‘r’) * 31 + ‘r’) * 31 + ‘y’5678901230
”fig”((0 * 31 + ‘f’) * 31 + ‘i’) * 31 + ‘g’6789012345
”grape”((0 * 31 + ‘g’) * 31 + ‘r’) * 31 + ‘a’) * 31 + ‘p’) * 31 + ‘e’7890123457

So the resulting hashtable will be:

IndexKey
0elderberry
1
2cherry
3date
4
5fig
6banana
7grape
8
9
10apple

This gives a very even spread of keys across the hashtable.

I hear you asking me: “But why the 31?”. Exellent question! (I feel like an LLM for answering like that..)

The number 31 is an odd prime number. Using a prime number helps to reduce collisions because it spreads out the hash values more evenly across the available indices. Additionally, 31 has some computational advantages. Multiplying by 31 can be optimized by the compiler to use bit shifting and subtraction, which can be faster than a regular multiplication operation.

Want to ask “Why?” again? Let’s save the compiler optimizations for another time. I need to take a deep dive into that myself first.