|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectit.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<K>
it.unimi.dsi.sux4j.mph.AbstractHashFunction<T>
it.unimi.dsi.sux4j.mph.MinimalPerfectHashFunction<T>
public class MinimalPerfectHashFunction<T>
A minimal perfect hash function.
Given a list of elements without duplicates, the constructors of this class finds a minimal perfect hash function for
the list. Subsequent calls to the getLong(Object)
method will return a distinct
number for each elements in the list. For elements out of the list, the resulting number is not specified. In
some (rare) cases it might be possible to establish that an element was not in the original list, and
in that case -1 will be returned. The class can then be saved by serialisation and reused later.
The theoretical memory requirements are 2γ
=2.46 + o(n) bits per element, plus the bits
for the random hashes (which are usually negligible). The o(n) part is due to
an embedded ranking scheme that increases space
occupancy by 0.625%, bringing the actual occupied space to around 2.65 bits per element.
At construction time, however, about 15n integers (i.e., 60n bytes) are necessary.
As a commodity, this class provides a main method that reads from standard input a (possibly gzip'd) sequence of newline-separated strings, and writes a serialised minimal perfect hash function for the given list.
The technique used is very similar (but developed independently) to that described by Botelho, Pagh and Ziviani in “Simple and Efficient Minimal Perfect Hashing Functions”, Proc. WADS 2007, number 4619 of Lecture Notes in Computer Science, pages 139−150, 2007. In turn, the mapping technique described therein was actually first proposed by Chazelle, Kilian, Rubinfeld and Tal in “The Bloomier Filter: an Efficient Data Structure for Static Support Lookup Tables”, Proc. SODA 2004, pages 30−39, 2004, as one of the steps to implement a mutable table.
The basic ingredient is the Majewski-Wormald-Havas-Czech 3-hypergraph technique. After generating a random 3-hypergraph with suitably sorted edges, we assign to each vertex a two-bit number in such a way that for each 3-hyperedge the sum of the values associated to its vertices modulo 3 gives the index of the hash function generating the hinge. The hinge is distinct for each 3-hyperedge, and as a result we obtain a perfect hash of the original set (one just has to compute the three hash functions, collect the three two-bit values, add them modulo 3 and take the corresponding hash function value).
To obtain a minimal perfect hash, we simply notice that we whenever we have to assign a value to a vertex,
we can take care of using the number 3 instead of 0 if the vertex is actually the output value for some element. As
a result, the final value of the minimal perfect hash function is the number of nonzero pairs of bits that precede
the perfect hash value for the element.
To compute this number, we use a simple table-free ranking scheme, recording the number
of nonzero pairs each BITS_PER_BLOCK
bits and modifying the
standard broadword algorithm for computing the number of ones
in a word into an algorithm that counts
the number of nonzero pairs of bits in a word.
Field Summary | |
---|---|
protected long[] |
array
The bit array supporting values . |
static int |
BITS_PER_BLOCK
The number of bits per block in the rank structure. |
protected int[] |
count
The number of nonzero bit pairs up to a given block of BITS_PER_BLOCK bits. |
protected int |
m
The number of vertices of the intermediate hypergraph. |
protected int |
n
The number of elements. |
protected long |
seed
The seed of the underlying 3-hypergraph. |
static long |
serialVersionUID
|
protected TransformationStrategy<? super T> |
transform
The transformation strategy. |
protected LongBigList |
values
The final magick—the list of modulo-3 values that define the output of the minimal hash function. |
Fields inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction |
---|
defRetValue |
Constructor Summary | |
---|---|
|
MinimalPerfectHashFunction(Iterable<? extends T> elements,
TransformationStrategy<? super T> transform)
Creates a new minimal perfect hash table for the given elements. |
protected |
MinimalPerfectHashFunction(MinimalPerfectHashFunction<T> mph)
Creates a new minimal perfect hash by copying a given one; non-transient fields are (shallow) copied. |
Method Summary | |
---|---|
static int |
countNonzeroPairs(long x)
Counts the number of nonzero pairs of bits in a long. |
long |
getLong(Object key)
|
static void |
main(String[] arg)
|
long |
numBits()
Returns the number of bits used by this structure. |
int |
rank(long x)
|
int |
size()
|
Methods inherited from class it.unimi.dsi.sux4j.mph.AbstractHashFunction |
---|
containsKey |
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction |
---|
clear, defaultReturnValue, defaultReturnValue, get, put, put, remove, removeLong |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final long serialVersionUID
public static final int BITS_PER_BLOCK
protected final int n
protected final int m
protected final long seed
protected final LongBigList values
protected final long[] array
values
.
protected final int[] count
BITS_PER_BLOCK
bits.
protected final TransformationStrategy<? super T> transform
Constructor Detail |
---|
public MinimalPerfectHashFunction(Iterable<? extends T> elements, TransformationStrategy<? super T> transform)
elements
- the elements to hash.transform
- a transformation strategy for the elements.protected MinimalPerfectHashFunction(MinimalPerfectHashFunction<T> mph)
mph
- the perfect hash to be copied.Method Detail |
---|
public long numBits()
public static int countNonzeroPairs(long x)
This method uses a very simple modification of the classical broadword algorithm to count the number of ones in a word. Usually, in the first step of the algorithm one computes the number of ones in each pair of bits. Instead, we compute one iff at least one of the bits in the pair is nonzero.
x
- a long.
x
.public int rank(long x)
public long getLong(Object key)
getLong
in interface Object2LongFunction<T>
public int size()
size
in interface Function<T,Long>
size
in class AbstractHashFunction<T>
public static void main(String[] arg) throws NoSuchMethodException, IOException, JSAPException
NoSuchMethodException
IOException
JSAPException
|
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | ||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |