|
|||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |
See:
Description
Class Summary | |
---|---|
AbstractHashFunction<K> | A very minimal abstract hash implementation. |
BitstreamImmutablePaCoTrie<T> | A succinct implementation of a binary partial compacted trie based on a recursive bitstream. |
Hashes | Basic hash functions. |
HollowTrie<T> | A hollow trie, that is, a compacted trie recording just the length of the paths associated to the internal nodes. |
HollowTrieDistributor<T> | A distributor based on a hollow trie. |
HollowTrieMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a hollow trie as a distributor. |
HypergraphSorter<T> | A class implementing the 3-hypergraph edge sorting procedure that is necessary for the Majewski-Wormald-Havas-Czech technique. |
LcpMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses longest common prefixes as distributors. |
MinimalPerfectHashFunction<T> | A minimal perfect hash function. |
MWHCFunction<T> | An immutable function stored using the Majewski-Wormald-Havas-Czech 3-hypergraph technique. |
PaCoTrieMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a partial compacted binary trie (PaCo trie) as distributor. |
RelativeTrieDistributor<T> | A distributor based on a relative trie. |
RelativeTrieMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a relative trie as a distributor. |
TwoStepsLcpMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses
longest common prefixes as distributors, and store their lengths using a TwoStepsMWHCFunction . |
TwoStepsMWHCFunction<T> | A read-only function stored using two Majewski-Wormald-Havas-Czech functions—one for frequent values, and one for infrequent values. |
Minimal perfect hash functions.
This package provides a number of state-of-the-art implementations of static (i.e., immutable) minimal perfect hash functions, and, more generally, of static functions from objects to integers. The classes can be gathered in three broad groups:
MWHCFunction
and TwoStepsMWHCFunction
). They can be used to associate arbitrary values to a set of objects,
so, in particular, they can be used to implement order-preserving minimal perfect hashing (elements are mapped
to their order in which they were provided, independently of their lexicographical order). They are also essential
building blocks for all other classes.
MinimalPerfectHashFunction
); they map a set
of n object to the set n = { 0, 1,… n − 1 }.
LcpMonotoneMinimalPerfectHashFunction
,
PaCoTrieMonotoneMinimalPerfectHashFunction
, HollowTrie
and HollowTrieMonotoneMinimalPerfectHashFunction
). These
functions requires keys to be prefix-free and provided in lexicographical order; they will map back each key to its position using
a very small number of bits per element, providing different space/time tradeoffs (in what follows,
l is maximum the string length):
LcpMonotoneMinimalPerfectHashFunction
is very fast, as it has just to evaluate three MWHCFunction
s
(so if the length of the strings is a constant multiplied by the machine word, it is actually constant time); however it uses
2.7 + log log n + log l bits per element. TwoStepsLcpMonotoneMinimalPerfectHashFunction
gains
a few bits by performing some additional compression, but it is usually slower (albeit always constant time).PaCoTrieMonotoneMinimalPerfectHashFunction
is slower, as it uses a partial compacted trie
(which requires linear time to be accessed) to distribute keys between buckets; theoretically it uses
2.7 + 2 log l bits per element, but the partial compacted trie is every efficiency in exploiting data redundancy, so the actual
occupancy is in general half with respect to the previous function.HollowTrie
is very slow as it has to traverse a succinct trie on the whole key set; it
is mainly of theoretical interest, as it uses just 2 + log l bits per element, and also in practice it
occupies less space than the previous two functions.HollowTrieMonotoneMinimalPerfectHashFunction
is slow, as it
uses a enriched hollow trie as a distributor, but it is faster than a hollow trie,
and it has the (quite surprising) property of using 3.44 + 1.23 log log l bits per element (note the double log). In practice,
it will use less than a byte per element for strings of length up to a billion bits.RelativeTrieMonotoneMinimalPerfectHashFunction
is faster than
HollowTrieMonotoneMinimalPerfectHashFunction
, but occupies in practice more space
(even if, from an asymptotic viewpoint, the space required is the same). Presently it is mainly of theoretical interest.LcpMonotoneMinimalPerfectHashFunction
and RelativeTrieMonotoneMinimalPerfectHashFunction
were introduced by Djamal Belazzougui, Paolo Boldi, Rasmus Pagh and Sebastiano Vigna
in “Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses”,
Proc. of the 20th Annual ACM–SIAM Symposium On Discrete Mathematics (SODA), ACM Press, 2009.
TwoStepsLcpMonotoneMinimalPerfectHashFunction
, PaCoTrieMonotoneMinimalPerfectHashFunction
,
HollowTrie
and HollowTrieMonotoneMinimalPerfectHashFunction
were introduced
by the same authors in “Theory and Practise of Minimal Monotone Perfect Hashing” (the class MWHCFunction
implements a compacted version
of the classical 3-hypergraph-based structure introduced therein).
Functions in this package implement the Object2LongFunction
interface. However,
the underlying machinery manipulates bit vectors only. To bring you own data
into the bit vector world, each contructor requires to specify a transformation strategy
that maps your objects into bit vectors. For instance, TransformationStrategies.utf16()
,
TransformationStrategies.prefixFreeUtf16()
, TransformationStrategies.iso()
,
and TransformationStrategies.prefixFreeIso()
are ready-made strategies that can be used with character sequences.
Note that if you plain to use monotone hashing, you must provide objects in an order such that the corresponding bit vectors
are lexicographically ordered. For instance, TransformationStrategies.utf16()
obtain this
results by concatenating the reversed 16-bit representation of each character.
All functions in this package will return a value in their range for most of the keys that are not in their domain. In other words, they will produce false positives; in the few cases in which it is possible to detect a negative, you will get the default return value.
If you are interested in getting a more precise behaviour (e.g., you are migrating from the deprecated SignedMinimalPerfectHash
class that was distributed with MG4J), you can sign a map, that is, you can record
a signature for each key and use it to filter false positives. A signing class for character sequences is provided
by the DSI utilities class
ShiftAddXorSignedStringMap. By
creating a function using one of the implementation provided with Sux4J and signing it using the above class, you can obtain the same functionality of
the old signed classes, but you can choose the size of the signature, whether to require monotonicity, and also the space/time tradeoff
of your function.
|
|||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |