it.unimi.dsi.sux4j.mph
Class MWHCFunction<T>

java.lang.Object
  extended by it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<T>
      extended by it.unimi.dsi.sux4j.mph.MWHCFunction<T>
All Implemented Interfaces:
Function<T,Long>, Object2LongFunction<T>, Serializable

public class MWHCFunction<T>
extends AbstractObject2LongFunction<T>
implements Serializable

An immutable function stored using the Majewski-Wormald-Havas-Czech 3-hypergraph technique.

Instance of this class store a function from keys to values. Keys are provided by an iterable object (whose iterators must return elements in a consistent order), whereas values are provided by a LongList. A convenient constructor automatically assigns to each key its rank (e.g., its position in iteration order).

As a commodity, this class provides a main method that reads from standard input a (possibly gzip'd) sequence of newline-separated strings, and writes a serialised function mapping each element of the list to its position.

Implementation Details

After generating a random 3-hypergraph with suitably sorted edges, we assign to each vertex a value in such a way that for each 3-hyperedge the xor of the three values associated to its vertices is the required value for the corresponding element of the function domain (this is the standard Majewski-Wormald-Havas-Czech construction).

Then, we measure whether it is favourable to compact the function, that is, to store nonzero values in a separate array, using a ranked marker array to record the positions of nonzero values.

A non-compacted, r-bit MWHCFunction on n elements requires γrn bits, whereas the compacted version takes just (γ + r)n bits (plus the bits that are necessary for the ranking structure; the current implementation uses Rank16). This class will transparently chose the most space-efficient method.

Since:
0.2
Author:
Sebastiano Vigna
See Also:
Serialized Form

Field Summary
protected  LongBigList data
          The final magick—the list of modulo-3 values that define the output of the minimal hash function.
protected  long globalSeed
          The seed used to generate the initial hash triple.
static int LOG2_CHUNK_SIZE
          The logarithm of the desired chunk size.
protected  int m
          The number of vertices of the intermediate hypergraph.
protected  LongArrayBitVector marker
          Optionally, a rank structure built on this bit array is used to mark positions containing non-zero value; indexing in data is made by ranking if this field is non-null.
protected  int n
          The number of elements.
protected  int[] offset
          The start offset of each block.
protected  Rank16 rank
          The ranking structure on marker.
protected  long[] seed
          The seed of the underlying 3-hypergraphs.
static long serialVersionUID
           
protected  TransformationStrategy<? super T> transform
          The transformation strategy to turn objects of type T into bit vectors.
protected  int width
          The data width.
 
Fields inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
defRetValue
 
Constructor Summary
  MWHCFunction(Iterable<? extends T> elements, TransformationStrategy<? super T> transform)
           
  MWHCFunction(Iterable<? extends T> elements, TransformationStrategy<? super T> transform, ChunkedHashStore<T> chunkedHashStore)
           
  MWHCFunction(Iterable<? extends T> elements, TransformationStrategy<? super T> transform, LongList values, int width)
           
  MWHCFunction(Iterable<? extends T> elements, TransformationStrategy<? super T> transform, LongList values, int width, ChunkedHashStore<T> chunkedHashStore)
          Creates a new function for the given elements and values.
protected MWHCFunction(MWHCFunction<T> function)
          Creates a new function by copying a given one; non-transient fields are (shallow) copied.
 
Method Summary
 boolean containsKey(Object o)
           
 long getLong(Object o)
           
 long getLongByTriple(long[] triple)
           
static void main(String[] arg)
           
 long numBits()
          Returns the number of bits used by this structure.
 int size()
          Returns the number of elements in the function domain.
 
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
clear, defaultReturnValue, defaultReturnValue, get, put, put, remove, removeLong
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

serialVersionUID

public static final long serialVersionUID
See Also:
Constant Field Values

LOG2_CHUNK_SIZE

public static final int LOG2_CHUNK_SIZE
The logarithm of the desired chunk size.

See Also:
Constant Field Values

n

protected final int n
The number of elements.


m

protected final int m
The number of vertices of the intermediate hypergraph.


width

protected final int width
The data width.


globalSeed

protected final long globalSeed
The seed used to generate the initial hash triple.


seed

protected final long[] seed
The seed of the underlying 3-hypergraphs.


offset

protected final int[] offset
The start offset of each block.


data

protected final LongBigList data
The final magick—the list of modulo-3 values that define the output of the minimal hash function.


marker

protected final LongArrayBitVector marker
Optionally, a rank structure built on this bit array is used to mark positions containing non-zero value; indexing in data is made by ranking if this field is non-null.


rank

protected final Rank16 rank
The ranking structure on marker.


transform

protected final TransformationStrategy<? super T> transform
The transformation strategy to turn objects of type T into bit vectors.

Constructor Detail

MWHCFunction

public MWHCFunction(Iterable<? extends T> elements,
                    TransformationStrategy<? super T> transform,
                    ChunkedHashStore<T> chunkedHashStore)
             throws IOException
Throws:
IOException

MWHCFunction

public MWHCFunction(Iterable<? extends T> elements,
                    TransformationStrategy<? super T> transform)
             throws IOException
Throws:
IOException

MWHCFunction

public MWHCFunction(Iterable<? extends T> elements,
                    TransformationStrategy<? super T> transform,
                    LongList values,
                    int width)
             throws IOException
Throws:
IOException

MWHCFunction

public MWHCFunction(Iterable<? extends T> elements,
                    TransformationStrategy<? super T> transform,
                    LongList values,
                    int width,
                    ChunkedHashStore<T> chunkedHashStore)
             throws IOException
Creates a new function for the given elements and values.

Parameters:
elements - the elements in the domain of the function.
transform - a transformation strategy for the elements.
values - values to be assigned to each element, in the same order of the iterator returned by elements; if null, the assigned value will the the ordinal number of each element.
width - the bit width of the values.
chunkedHashStore - a (not necessarily checked) chunked hash store containing the elements. ALERT: check for accuracy in wording
Throws:
IOException

MWHCFunction

protected MWHCFunction(MWHCFunction<T> function)
Creates a new function by copying a given one; non-transient fields are (shallow) copied.

Parameters:
function - the function to be copied.
Method Detail

getLong

public long getLong(Object o)
Specified by:
getLong in interface Object2LongFunction<T>

getLongByTriple

public long getLongByTriple(long[] triple)

size

public int size()
Returns the number of elements in the function domain.

Specified by:
size in interface Function<T,Long>
Returns:
the number of the elements in the function domain.

numBits

public long numBits()
Returns the number of bits used by this structure.

Returns:
the number of bits used by this structure.

containsKey

public boolean containsKey(Object o)
Specified by:
containsKey in interface Function<T,Long>

main

public static void main(String[] arg)
                 throws NoSuchMethodException,
                        IOException,
                        JSAPException
Throws:
NoSuchMethodException
IOException
JSAPException