|
|||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |
See:
Description
Class Summary | |
---|---|
AbstractHashFunction<K> | A very minimal abstract hash implementation. |
Hashes | Basic hash functions. |
HollowTrieDistributor<T> | A distributor based on a hollow trie. |
HollowTrieDistributorMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a hollow trie as a distributor. |
HollowTrieMonotoneMinimalPerfectHashFunction<T> | A hollow trie, that is, a compacted trie recording just the length of the paths associated to the internal nodes. |
HypergraphSorter<T> | A class implementing the 3-hypergraph edge sorting procedure that is necessary for the Majewski-Wormald-Havas-Czech technique. |
LcpMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses longest common prefixes as distributors. |
MinimalPerfectHashFunction<T> | A minimal perfect hash function. |
MWHCFunction<T> | An immutable function stored using the Majewski-Wormald-Havas-Czech 3-hypergraph technique. |
PaCoTrieDistributor<T> | A succinct implementation of a binary partial compacted trie based on a recursive bitstream. |
PaCoTrieDistributorMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a partial compacted binary trie (PaCo trie) as distributor. |
TwoStepsLcpMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses
longest common prefixes as distributors, and store their lengths using a TwoStepsMWHCFunction . |
TwoStepsMWHCFunction<T> | A read-only function stored using two Majewski-Wormald-Havas-Czech functions—one for frequent values, and one for infrequent values. |
ZFastTrieDistributor<T> | A distributor based on a z-fast trie. |
ZFastTrieDistributorMonotoneMinimalPerfectHashFunction<T> | A monotone minimal perfect hash implementation based on fixed-size bucketing that uses a z-fast trie as a distributor. |
Enum Summary | |
---|---|
HypergraphSorter.Result |
Minimal perfect hash functions.
This package provides a number of state-of-the-art implementations of static (i.e., immutable) minimal perfect hash functions, and, more generally, of static functions from objects to integers. The classes can be gathered in three broad groups:
MWHCFunction
and TwoStepsMWHCFunction
). They can be used to associate arbitrary values to a set of objects,
so, in particular, they can be used to implement order-preserving minimal perfect hashing (elements are mapped
to their order in which they were provided, independently of their lexicographical order). They are also essential
building blocks for all other classes.
MinimalPerfectHashFunction
); they map a set
of n object to the set n = { 0, 1,… n − 1 }.
LcpMonotoneMinimalPerfectHashFunction
,
PaCoTrieDistributorMonotoneMinimalPerfectHashFunction
, HollowTrieMonotoneMinimalPerfectHashFunction
and HollowTrieDistributorMonotoneMinimalPerfectHashFunction
). These
functions requires keys to be prefix-free and provided in lexicographical order; they will map back each key to its position using
a very small number of bits per element, providing different space/time tradeoffs (in what follows,
l is maximum the string length):
LcpMonotoneMinimalPerfectHashFunction
is very fast, as it has just to evaluate three MWHCFunction
s
(so if the length of the strings is a constant multiplied by the machine word, it is actually constant time); however it uses
2.14 + log log n + log l bits per element. TwoStepsLcpMonotoneMinimalPerfectHashFunction
gains
a few bits by performing some additional compression, but it is usually slightly slower (albeit always constant time).PaCoTrieDistributorMonotoneMinimalPerfectHashFunction
is slower, as it uses a partial compacted trie
(which requires linear time to be accessed) to distribute keys between buckets; theoretically it uses
2.14 + log(l - log n) bits per element, but the partial compacted trie is every efficiency in exploiting data redundancy, so the actual
occupancy is in general half with respect to the previous function.HollowTrieMonotoneMinimalPerfectHashFunction
is rather slow as it has to traverse a succinct trie on the whole key set;
it uses just 4 + log l + log log lvar> bits per element, and in practice it
is the monotone minimal perfect hash function that uses less space.HollowTrieDistributorMonotoneMinimalPerfectHashFunction
is slow, as it
uses a enriched hollow trie as a distributor, but it is faster than a hollow trie,
and it has the (quite surprising) property of using 3.21 + 1.23 log log l bits per element (note the double log). In practice,
it will use less than a byte per element for strings of length up to a billion bits.ZFastTrieDistributorMonotoneMinimalPerfectHashFunction
is faster than
HollowTrieDistributorMonotoneMinimalPerfectHashFunction
, but occupies in practice more space,
even if, from an asymptotic viewpoint, the space required is the same. Presently it is mainly of theoretical interest, but
it has the best behaviour when scaling to very large data sets (billions of strings).LcpMonotoneMinimalPerfectHashFunction
and ZFastTrieDistributorMonotoneMinimalPerfectHashFunction
were introduced by Djamal Belazzougui, Paolo Boldi, Rasmus Pagh and Sebastiano Vigna
in “Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses”,
Proc. of the 20th Annual ACM–SIAM Symposium On Discrete Mathematics (SODA), ACM Press, 2009.
TwoStepsLcpMonotoneMinimalPerfectHashFunction
, PaCoTrieDistributorMonotoneMinimalPerfectHashFunction
,
HollowTrieMonotoneMinimalPerfectHashFunction
and HollowTrieDistributorMonotoneMinimalPerfectHashFunction
were introduced
by the same authors in “Theory and Practise of Minimal Monotone Perfect Hashing” (the class MWHCFunction
implements a compacted version
of the classical 3-hypergraph-based structure introduced therein).
Functions in this package implement the Object2LongFunction
interface. However,
the underlying machinery manipulates bit vectors only. To bring you own data
into the bit vector world, each constructor requires to specify a transformation strategy
that maps your objects into bit vectors. For instance, TransformationStrategies.utf16()
,
TransformationStrategies.prefixFreeUtf16()
, TransformationStrategies.iso()
,
and TransformationStrategies.prefixFreeIso()
are ready-made strategies that can be used with character sequences.
Note that if you plain to use monotone hashing, you must provide objects in an order such that the corresponding bit vectors
are lexicographically ordered. For instance, TransformationStrategies.utf16()
obtain this
results by concatenating the reversed 16-bit representation of each character.
All functions in this package will return a value in their range for most of the keys that are not in their domain. In other words, they will produce false positives; in the few cases in which it is possible to detect a negative, you will get the default return value.
If you are interested in getting a more precise behaviour (e.g., you are migrating from the deprecated SignedMinimalPerfectHash
class that was distributed with MG4J), you can sign a function, that is, you can record
a signature for each key and use it to filter false positives. A signing class for character sequences is provided
by the DSI utilities class
ShiftAddXorSignedStringMap. By
creating a function using one of the implementation provided with Sux4J and signing it using the above class, you can obtain the same functionality of
the old signed classes, but you can choose the size of the signature, whether to require monotonicity, and also the space/time tradeoff
of your function. Alternatively, by signing with
LiterallySignedStringMap
we will get a two-way function (i.e., a full
StringMap implementation).
|
|||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |