community/kernel/pom.xml · 67ddb8743598aab82ad6898378d781676e71d565 · 小白蛋 / Neo4jsource

Primitive hash collections and maps using hop-scotch hashing · 67ddb874

Mattias Persson authored Apr 09, 2014

Hop-scotch hashing is an algorithm for resolving the conflicts that arise
when values are hashed to an index. It is not about the hash function
itself: any hash function can be used with it.

Measured to be comparable to other leading libraries with equivalent
functionality, and with substantially less code, which is also DRY.
The algorithm itself is pulled out into its own class and clocks in at
about 200-300 LOC, whereas the hash set/map implementations call that
algorithm with one-liners. The state itself is abstracted into a Table
interface, which is trivial to implement with, for example, a primitive
long[], a combination of long[] and int[] for a long->int primitive map,
and so on.
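
As an illustration of the split described above, here is a minimal sketch
of a long set built on hop-scotch hashing. It is not the code from this
commit: the Table methods, the class names, the neighbourhood size H = 8,
the per-bucket hop bitmap and the multiplicative hash are all assumptions
made for the example, and removal is left out for brevity.

```java
/** Hypothetical stand-in for the Table abstraction: slot storage only, no algorithm. */
interface Table {
    int capacity();                        // a power of two in this sketch
    boolean isFree(int index);
    long key(int index);
    void put(int index, long key);
    void free(int index);
    int hopBits(int index);                // bit i set: slot index+i holds a key homed at index
    void setHopBits(int index, int bits);
}

/** Table for a long set, backed by primitive arrays. */
final class LongArrayTable implements Table {
    private final long[] keys;
    private final boolean[] used;
    private final int[] hops;
    LongArrayTable(int n) { keys = new long[n]; used = new boolean[n]; hops = new int[n]; }
    public int capacity() { return keys.length; }
    public boolean isFree(int i) { return !used[i]; }
    public long key(int i) { return keys[i]; }
    public void put(int i, long key) { keys[i] = key; used[i] = true; }
    public void free(int i) { used[i] = false; }
    public int hopBits(int i) { return hops[i]; }
    public void setHopBits(int i, int bits) { hops[i] = bits; }
}

/** The conflict-resolution algorithm, independent of how slots are stored. */
final class HopScotch {
    static final int H = 8; // neighbourhood size; capacity must be at least H

    // Any hash function works here; this multiplicative mix is just one choice.
    static int home(Table t, long key) {
        return (int) ((key * 0x9E3779B97F4A7C15L) >>> 32) & (t.capacity() - 1);
    }

    static boolean contains(Table t, long key) {
        int home = home(t, key), mask = t.capacity() - 1, bits = t.hopBits(home);
        for (int i = 0; i < H; i++) {
            if ((bits & (1 << i)) != 0 && t.key((home + i) & mask) == key) return true;
        }
        return false;
    }

    /** Places a key known to be absent; returns false if the table must grow first. */
    static boolean place(Table t, long key) {
        int home = home(t, key), mask = t.capacity() - 1, free = 0;
        while (free < t.capacity() && !t.isFree((home + free) & mask)) free++;
        if (free == t.capacity()) return false;          // table completely full
        while (free >= H) {                              // free slot too far away: hop it closer
            free = hopCloser(t, home, free);
            if (free < 0) return false;
        }
        t.put((home + free) & mask, key);
        t.setHopBits(home, t.hopBits(home) | (1 << free));
        return true;
    }

    /** Moves one earlier key into the free slot; returns the new free distance, or -1. */
    private static int hopCloser(Table t, int home, int free) {
        int mask = t.capacity() - 1;
        for (int back = H - 1; back >= 1; back--) {
            int bucket = (home + free - back) & mask, bits = t.hopBits(bucket);
            for (int i = 0; i < back; i++) {
                if ((bits & (1 << i)) == 0) continue;
                t.put((home + free) & mask, t.key((bucket + i) & mask));
                t.free((bucket + i) & mask);
                t.setHopBits(bucket, (bits & ~(1 << i)) | (1 << back));
                return free - back + i;
            }
        }
        return -1;
    }
}

/** The set itself: thin calls into the algorithm, plus growth. */
final class LongHopScotchSet {
    private Table table = new LongArrayTable(16);
    private int size;

    boolean contains(long key) { return HopScotch.contains(table, key); }
    int size() { return size; }

    boolean add(long key) {
        if (contains(key)) return false;
        insert(key);
        size++;
        return true;
    }

    private void insert(long key) {
        while (!HopScotch.place(table, key)) {           // grow and re-insert until the key fits
            Table old = table;
            table = new LongArrayTable(old.capacity() * 2);
            for (int i = 0; i < old.capacity(); i++) {
                if (!old.isFree(i)) insert(old.key(i));
            }
        }
    }
}
```

In this sketch a long->int map would only need a Table that also keeps an
int[] of values and moves the value together with the key; the HopScotch
class would stay as it is.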

Implementation was driven by an iterative randomized testing framework,
also added in this commit, in which actions are defined, for example "add"
and "remove", together with the checks that go with them. If a check does
not return OK, execution stops, the list of actions is reduced to a
minimum (by brute-force reduction, one action at a time), and finally a
Java test case that reproduces the problem is printed. Confidence in the
code is high since it has been tested with a very large combination of
values and sizes.
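
The loop described above (random actions, a check after each one,
one-at-a-time reduction, then a printed test case) could look roughly like
the sketch below. It is not the framework from this commit: Action,
RandomizedTest and BuggySet are hypothetical names, the check simply
compares against java.util.HashSet as a reference model, and BuggySet is a
deliberately broken target that exists only so the harness has something
to find.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

/** One step of a test run: add or remove a value, checked against a reference model. */
final class Action {
    final boolean add;
    final long value;
    Action(boolean add, long value) { this.add = add; this.value = value; }

    /** Applies the action to both sets; the check is OK when both return the same result. */
    boolean applyAndCheck(Set<Long> target, Set<Long> model) {
        return add ? target.add(value) == model.add(value)
                   : target.remove(value) == model.remove(value);
    }

    String toJava() { return "target." + (add ? "add" : "remove") + "(" + value + "L);"; }
}

/** Deliberately broken target, for demonstration only: it refuses to remove 7. */
final class BuggySet extends HashSet<Long> {
    @Override public boolean remove(Object o) {
        return !Long.valueOf(7).equals(o) && super.remove(o);
    }
}

final class RandomizedTest {
    /** Replays the actions on a fresh target; true if any check does not return OK. */
    static boolean fails(List<Action> actions, Supplier<Set<Long>> factory) {
        Set<Long> target = factory.get(), model = new HashSet<>();
        for (Action a : actions) {
            if (!a.applyAndCheck(target, model)) return true;
        }
        return false;
    }

    /** Brute-force reduction: drop one action at a time while the run still fails. */
    static List<Action> reduce(List<Action> failing, Supplier<Set<Long>> factory) {
        List<Action> current = new ArrayList<>(failing);
        boolean shrunk = true;
        while (shrunk) {
            shrunk = false;
            for (int i = 0; i < current.size(); i++) {
                List<Action> candidate = new ArrayList<>(current);
                candidate.remove(i);
                if (fails(candidate, factory)) { current = candidate; shrunk = true; break; }
            }
        }
        return current;
    }

    /** Runs random actions; returns a reduced failing list, or an empty list if all checks pass. */
    static List<Action> run(long seed, int count, Supplier<Set<Long>> factory) {
        Random rnd = new Random(seed);
        List<Action> actions = new ArrayList<>();
        Set<Long> target = factory.get(), model = new HashSet<>();
        for (int i = 0; i < count; i++) {
            Action a = new Action(rnd.nextBoolean(), rnd.nextInt(20));
            actions.add(a);
            if (!a.applyAndCheck(target, model)) return reduce(actions, factory);
        }
        return new ArrayList<>();
    }

    /** Renders the reduced actions as Java statements that reproduce the problem. */
    static String toTestCase(List<Action> actions) {
        StringBuilder sb = new StringBuilder();
        for (Action a : actions) sb.append(a.toJava()).append('\n');
        return sb.toString();
    }
}
```

Calling RandomizedTest.run(seed, 1000, BuggySet::new) and passing the
result to toTestCase would print the reproducing statements. Stopping at
the first failed check keeps the list to be reduced short, and replaying
every candidate on a fresh target keeps each reduction step independent of
the previous one.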

These sets and maps should be preferred over java.util collections in
cases where the key is int or long and access is single-threaded.
If need be, a multi-threaded version of the algorithm can be added later.

The new sets and maps, and all the interfaces they depend on from kernel,
have been moved into a primitive-collections module that kernel depends on.
Since everything else depends on kernel, this change will barely be
noticeable for any other component, apart from class import changes.
