it.unimi.dsi.sux4j.mph
Class HypergraphSorter<T>

java.lang.Object
  extended by it.unimi.dsi.sux4j.mph.HypergraphSorter<T>

public class HypergraphSorter<T>
extends Object

A class implementing the 3-hypergraph edge sorting procedure that is necessary for the Majewski-Wormald-Havas-Czech technique.

Bohdan S. Majewski, Nicholas C. Wormald, George Havas, and Zbigniew J. Czech have described in “A family of perfect hashing methods”, Comput. J., 39(6):547−554, 1996, a 3-hypergraph based technique to store functions (actually, the paper uses the technique just to store a permutation of the key set, but it is clear it can be used to store any function). More generally, the procedure first generates a random 3-hypergraph whose edges correspond to elements of the function domain. The, it sorts the edges of the random 3-hypergraph so that for each edge at least one vertex, the hinge, never appeared before in the sorted edge list (this happens with high probability if the number of vertices is at least γ times the number of edges).

Instances of this class contain the data necessary to generate the random hypergraph and apply the sorting procedure. At construction time, you provide just the desired number of edges; then, each call to generateAndSort() will generate a new 3-hypergraph using a 64-bit seed, an iterator returning the key set, and a corresponding TransformationStrategy. If the method returns true, the sorting was successful and in the public field stack you can retrieve the opposite of the desired order (so enumerating edges starting from the last in stack you are guaranteed to find each time a vertex that never appeared before). The public fields edge, numEdges and numVertices expose the structure of the generated 3-hypergraph. For m edges, the number of vertices will be &lceil γm ⌉ + 1, unless m is zero, in which case the number of vertices will be zero, too.

To guarantee the same results when reading a Majewski-Wormald-Havas-Czech-like structure, the method bitVectorToEdge() can be used to retrieve, starting from a bit vector, the corresponding edge. While having a function returning the edge starting from a key would be more object-oriented and avoid hidden dependencies, it would also require storing the transformation provided at construction time, which would make this class non-thread-safe. Just be careful to transform the keys into bit vectors using the same TransformationStrategy used to generate the random 3-hypergraph.

Support for preprocessed keys

This class provides two special access points for classes that have pre-digested their keys. The methods generateAndSort(Iterator, long) and tripleToEdge(long[], long, int, int[]) use fixed-length 192-bit keys under the form of triples of longs. The intended usage is that of turning the keys into such a triple using Jenkins's hash and then operating directly on the hash codes. This is particularly useful in chunked constructions, where the keys are replaced by their 192-bit hashes in the first place. Note that the hashes are actually rehashed using Hashes.jenkins(long[], long, long[])—this is necessary to vary the associated edges whenever the generated 3-hypergraph is not acyclic.

Warning: you cannot mix the bitvector-based and the triple-based constructors and static methods. It is your responsibility to pair them correctly.

Implementation details

We use Jenkin's hash in its 64-bit incarnation: beside providing an excellent hash function, it actually computes three 64-bit hash values, which is exactly what we need.

Rounds and Logging

Building and sorting a large 3-hypergraph may take hours. As it happens with all probabilistic algorithms, one can just give estimates of the expected time.

There are two probabilistic sources of problems: duplicate edges and non-acyclic hypergraphs. However, the probability of duplicate edges is vanishing when n approaches infinity, and once the hypergraph has been generated, the stripping procedure succeeds in an expected number of trials that tends to 1 as n approaches infinity.

To help diagnosing problem with the generation process class, this class will log at DEBUG level what's happening.

Note that if during the generation process the log warns more than once about duplicate edges (this happens at INFO level), you should suspect that there are duplicates in the string list, as duplicate edges are extremely unlikely.

Author:
Sebastiano Vigna

Nested Class Summary
static class HypergraphSorter.Result
           
 
Field Summary
 int[][] edge
          An 3×n array recording the triple of vertices involved in each edge.
static double GAMMA
          The mythical threshold (or better, a very closed upper bound of): random 3-hypergraphs are acyclic with high probability if the ratio vertices/edges exceeds this constant.
 int numEdges
          The number of edges in the hypergraph.
 int numVertices
          The number of vertices in the hypergraph ( ⌈ GAMMA * numEdges ⌉ + 1 ).
 int[] stack
          The edge stack.
 
Constructor Summary
HypergraphSorter(int numEdges)
          Creates a hypergraph sorter for a given number of edges.
 
Method Summary
static void bitVectorToEdge(BitVector bv, long seed, int numVertices, int[] e)
          Turns a bit vector into an edge.
 HypergraphSorter.Result generateAndSort(Iterator<? extends T> iterator, TransformationStrategy<? super T> transform, long seed)
          Generates a random 3-hypergraph and tries to sort its edges.
 HypergraphSorter.Result generateAndSort(Iterator<long[]> iterator, long seed)
          Generates a random 3-hypergraph and tries to sort its edges.
static void tripleToEdge(long[] triple, long seed, int numVertices, int[] e)
          Turns a triple of longs into an edge.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GAMMA

public static final double GAMMA
The mythical threshold (or better, a very closed upper bound of): random 3-hypergraphs are acyclic with high probability if the ratio vertices/edges exceeds this constant.

See Also:
Constant Field Values

numVertices

public final int numVertices
The number of vertices in the hypergraph ( ⌈ GAMMA * numEdges ⌉ + 1 ).


numEdges

public final int numEdges
The number of edges in the hypergraph.


edge

public final int[][] edge
An 3×n array recording the triple of vertices involved in each edge. It is *reversed* w.r.t. what you would expect to reduce object creation.


stack

public final int[] stack
The edge stack. Used also to invert the edge permutation.

Constructor Detail

HypergraphSorter

public HypergraphSorter(int numEdges)
Creates a hypergraph sorter for a given number of edges.

Parameters:
numEdges - the number of edges of this hypergraph sorter.
Method Detail

bitVectorToEdge

public static void bitVectorToEdge(BitVector bv,
                                   long seed,
                                   int numVertices,
                                   int[] e)
Turns a bit vector into an edge.

This method will never return a degenerate edge. However, if there are no edges the vector e will be filled with -1.

Parameters:
bv - a bit vector.
seed - the seed for the hash function.
numVertices - the number of vertices in the underlying hypergraph.
e - an array to store the resulting edge.

tripleToEdge

public static void tripleToEdge(long[] triple,
                                long seed,
                                int numVertices,
                                int[] e)
Turns a triple of longs into an edge.

This method will never return a degenerate edge. However, if there are no edges the vector e will be filled with -1.

Parameters:
triple - a triple of intermediate hashes.
seed - the seed for the hash function.
numVertices - the number of vertices in the underlying hypergraph.
e - an array to store the resulting edge.

generateAndSort

public HypergraphSorter.Result generateAndSort(Iterator<? extends T> iterator,
                                               TransformationStrategy<? super T> transform,
                                               long seed)
Generates a random 3-hypergraph and tries to sort its edges.

Parameters:
iterator - an iterator returning numEdges keys.
transform - a transformation from keys to bit vectors.
seed - a 64-bit random seed.
Returns:
true if the sorting procedure succeeded.

generateAndSort

public HypergraphSorter.Result generateAndSort(Iterator<long[]> iterator,
                                               long seed)
Generates a random 3-hypergraph and tries to sort its edges.

Parameters:
iterator - an iterator returning numEdges triples of longs.
seed - a 64-bit random seed.
Returns:
true if the sorting procedure succeeded.