CS 201 - 11/4/14
Hash Table with Chaining
The Hash Table array is really array of
heads of linked lists.
The search through a linked list of a "set"
number of values is Constant (Big-Oh (1))
The linked lists only contain a "set" number of
values if we are able to control/reduce the
number of collisions during the "hash function"
calculation.
In order to help keep the size of the linked lists
to a "set" number of values, we should have
1) the array size be at least twice the expected
number of data elements.
2) the array size should be prime
The speed of the hash table relies on the inverse
relationship of execution speed and allocated
memory size
Closed Hash Tables (No linked list)
- values are stored directly in the array
- the hash table array size MUST be larger than
(or equal to) the number of data items
- resolution of collisions becomes much more
complicated
-when two items are hashed to the same location
- the first item gets stored in that
location
- the second item get store in a nearby
location
linear probing - when a collision occurs, we search for
the next open spot in the table in a linear
search manner.
if pos X is in use, then try pos X+1
if pos X+1 is in use, then try pos X+2
if pos X+2 is in use, then try pos X+3
....
clustering - when multiple data items get hashed to
a small number of nearby positions
linear probing tends to increase clustering
One solution to linear probing is quadratic probing
- we use a quadratic sequence to determine the
next "available location"
if pos X is in use, then try pos X+1
if pos X+1 is in use, then try pos X+4
if pos X+4 is in use, then try pos X+9
if pos X+9 is in use, then try pos X+16
if pos X+16 is in use, then try pos X+25
....
When using a closed hash table, at each array
position we need to store some "Meta-data"
along with the read data for the data items
This meta-data includes a "used" bit
- initially all positions are marked "unused"
- as a value is added into a position, that
position is marked as "used"
- when search for a value that is not in the
hash table, we stop the search only when
we encounter a position that is still "unused"
A closed hash table may need to be regularly
"rehashed" to help increase speed due to
currently empty positions still marked as "used".
A second meta-data element of "empty/inuse" bit
is sometimes helpful with deleting of data items