
Hash Tables


Hash tables store key/value pairs in a table. On average, inserting, deleting and searching each take O(1) time, while traversing every entry takes O(n).

They are great for quickly looking something up: the time complexity for looking up an item in a hash table is O(1), which is constant time. Each object in our table has a unique key, and we access the object by applying a hash function to that key.

The hash function is an operation which, given the key as input, returns the index of the memory slot where the value is stored.
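As a minimal sketch of that idea (the names hash_index and TABLE_SIZE are invented for this example), in Python we can hash the key and take the result modulo the table size to get an index:

```python
TABLE_SIZE = 8  # assumed size, just for illustration

def hash_index(key, table_size=TABLE_SIZE):
    # Python's built-in hash() turns the key into an integer;
    # the modulo keeps the index inside the bounds of the array.
    return hash(key) % table_size

table = [None] * TABLE_SIZE
table[hash_index("apple")] = 42    # store a value under the key "apple"
print(table[hash_index("apple")])  # 42 -- the same key gives the same index
```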

The values, known as hashes, are stored in an array of memory slots.

The load factor of a table is the number of entries divided by the number of buckets. If hash tables have a high load factor, they lose much of their speed advantage. It's desirable to keep the load factor under 50%.
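A quick sketch of the calculation (load_factor is an invented helper, not part of any library):

```python
def load_factor(num_entries, num_buckets):
    # Load factor = entries / buckets.
    return num_entries / num_buckets

print(load_factor(3, 8))  # 0.375 -- comfortably under 50%
print(load_factor(6, 8))  # 0.75  -- the table is losing its speed advantage
```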

 

Rehashing is the process of creating a new, larger hash table once the current one becomes too full. It is a good idea to make the new table twice the size of the original, as this brings the load factor back down to around 50%, restoring efficiency.
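A minimal sketch of rehashing, assuming the table stores (key, value) pairs and ignoring collisions during reinsertion for brevity:

```python
def rehash(old_table):
    # Allocate a new table twice the size and reinsert every entry.
    # Indices must be recomputed because the table size has changed.
    new_size = len(old_table) * 2
    new_table = [None] * new_size
    for entry in old_table:
        if entry is not None:
            key, value = entry
            new_table[hash(key) % new_size] = (key, value)
    return new_table
```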

 

Hash collisions occur when two or more keys hash to the same index. We want to avoid hash collisions if we can. Here are some ways to get around the problems caused by collisions:

 

  • Buckets 

These are data types that group objects together. When two or more keys collide, their entries are bundled together into the same bucket. A disadvantage of this is that you need to allocate a two-dimensional array from the start, which can be costly in terms of memory if the table is large. Another drawback is that we may need to search through a key's entire bucket in order to find the entry we want.
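A small sketch of the bucket approach, with the names and the bucket capacity assumed for illustration; note that the whole two-dimensional array is allocated up front:

```python
NUM_BUCKETS = 8
BUCKET_CAPACITY = 4  # fixed depth per bucket, chosen arbitrarily here

# Each row is a bucket; colliding keys end up in the same row.
buckets = [[None] * BUCKET_CAPACITY for _ in range(NUM_BUCKETS)]

def bucket_insert(key, value):
    row = buckets[hash(key) % NUM_BUCKETS]
    for i, slot in enumerate(row):
        if slot is None:
            row[i] = (key, value)
            return True
    return False  # this bucket is full -- the table would need rehashing

def bucket_lookup(key):
    # We may have to scan the whole bucket to find the key we want.
    for slot in buckets[hash(key) % NUM_BUCKETS]:
        if slot is not None and slot[0] == key:
            return slot[1]
    return None
```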

 

  • Direct chaining

We can make a colliding slot point to a linked list in memory instead of a single value. In the linked list we store the entries that were previously colliding. This way there are no collisions, but it does take a little longer to access entries in the linked lists, as we now have to search through them to reach a value, which is O(n) in the length of the chain.
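A sketch of direct chaining using Python lists as the chains (a linked list would behave the same way for this purpose; the names are invented for illustration):

```python
NUM_SLOTS = 8
chains = [[] for _ in range(NUM_SLOTS)]  # one chain per slot

def chain_insert(key, value):
    chains[hash(key) % NUM_SLOTS].append((key, value))

def chain_lookup(key):
    # Colliding keys share a slot, so we scan the chain -- linear in
    # the length of the chain rather than constant time.
    for k, v in chains[hash(key) % NUM_SLOTS]:
        if k == key:
            return v
    return None
```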

 

  • Linear probing

The idea is that if a value already occupies the slot we are trying to store data in, we decrease the index by one. If there is still a collision, we decrease by one again, and so on until we find an empty slot. If the search reaches index 0, we wrap around to the last index of the array and continue decreasing by one each time.

 

This is fine until we want to actually find a key: we may have to follow its possible memory locations one by one until we find it or hit an empty slot (in which case the entry is not present). Linear probing can also cause clustering, which makes searching inefficient.
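A sketch of linear probing using the decreasing-index convention described above (the names and table size are assumed for illustration):

```python
SIZE = 8
slots = [None] * SIZE

def probe_insert(key, value):
    index = hash(key) % SIZE
    for _ in range(SIZE):
        if slots[index] is None or slots[index][0] == key:
            slots[index] = (key, value)
            return True
        # Step the index down by one, wrapping to the end of the
        # array once we pass index 0.
        index = (index - 1) % SIZE
    return False  # the table is full

def probe_lookup(key):
    index = hash(key) % SIZE
    for _ in range(SIZE):
        if slots[index] is None:
            return None  # hit an empty slot -- the key is not present
        if slots[index][0] == key:
            return slots[index][1]
        index = (index - 1) % SIZE
    return None
```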

 

  • Double hashing

To get around the clustering problem caused by linear probing, we can apply a secondary hash function to decide how many slots to jump when the current slot is full. There are different approaches to this, but the main thing to remember is that it is more efficient than just stepping left by one each time.
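A sketch of the secondary hash idea (the step-size formula here is one common choice, not the only one; a prime table size helps the probe sequence reach every slot):

```python
SIZE = 11  # prime table size

def step_size(key):
    # Secondary hash: decides how many slots to jump on a collision.
    # It must never be zero, otherwise the probe would not move at all.
    return 1 + (hash(key) % (SIZE - 1))

def probe_sequence(key):
    # Yields the slots to try for this key, jumping by step_size(key)
    # instead of by one, which breaks up the clusters linear probing creates.
    index = hash(key) % SIZE
    step = step_size(key)
    for _ in range(SIZE):
        yield index
        index = (index - step) % SIZE  # step "left", per the convention above
```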
