java.lang.Object
  it.unimi.dsi.mg4j.index.Index
    it.unimi.dsi.mg4j.index.BitStreamIndex
public abstract class BitStreamIndex
A bitstream-based index. Instances of this class contain additional index data related to compression, such as the codes used for each part of the index.
Implementing subclasses must provide access to the index bitstream both at byte and bit level. A bitstream-based index usually exposes term or prefix maps, but this is not compulsory. Additionally, the index could also expose the offset list and the size list; the latter, in particular, is compulsory with certain codings.
The standard readers associated with an instance of this class are of type BitStreamIndexReader. Nonetheless, it is possible to automatically generate sources for wired classes that work only for a particular set of codings and flags. The wired classes will be fetched automatically by reflection, if available. Please read the section about performance in the MG4J manual.
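The reflection-based lookup mentioned above can be pictured with the following minimal sketch. It is not MG4J's actual lookup code: the class names, the nested reader types and the findConstructor helper are all hypothetical, and only the general pattern (try to load a specialized class by name, fall back to the generic one) is illustrated.

```java
import java.lang.reflect.Constructor;

final class WiredLookupSketch {
    /** Hypothetical stand-in for a generic reader that handles every coding. */
    static class GenericReader {
        public GenericReader() {}
    }

    /** Hypothetical stand-in for a generated reader wired to one set of codings. */
    static class WiredReader extends GenericReader {
        public WiredReader() {}
    }

    /**
     * Returns the public no-argument constructor of the class named wiredName
     * if that class can be loaded and extends GenericReader; otherwise returns
     * the constructor of GenericReader.
     */
    static Constructor<? extends GenericReader> findConstructor(String wiredName) {
        try {
            Class<? extends GenericReader> wired =
                Class.forName(wiredName).asSubclass(GenericReader.class);
            return wired.getConstructor();
        } catch (ClassNotFoundException | ClassCastException | NoSuchMethodException e) {
            // No usable specialized class: fall back to the generic reader.
            try {
                return GenericReader.class.getConstructor();
            } catch (NoSuchMethodException impossible) {
                throw new AssertionError(impossible);
            }
        }
    }
}
```

A caller would keep the returned constructor and invoke newInstance() on it each time a new reader is needed, so the lookup cost is paid only once.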
Nested Class Summary | |
---|---|
static class | BitStreamIndex.PropertyKeys - Symbolic names for additional properties of a BitStreamIndex. |
Nested classes/interfaces inherited from class it.unimi.dsi.mg4j.index.Index |
---|
Index.EmptyIndexIterator, Index.UriKeys |
Field Summary | |
---|---|
int | bufferSize - The size of the buffer used to read the bit stream. |
CompressionFlags.Coding | countCoding - The coding for counts. |
static int | DEFAULT_BUFFER_SIZE - The default buffer size. |
static int | DEFAULT_HEIGHT - The default height (fairly low, due to memory consumption). |
static int | DEFAULT_QUANTUM - The default quantum (4% of index size). |
static int | FIXED_POINT_BITS - Fixed number of fractional binary digits used in fixed-point computation of Golomb moduli. |
static long | FIXED_POINT_MULTIPLIER - 1L << FIXED_POINT_BITS. |
CompressionFlags.Coding | frequencyCoding - The coding for frequencies. |
int | height - The parameter h (the maximum height of a skip tower), or -1 if this index has no skips. |
LongList | offsets - The offset of each term, if offsets were loaded or specified at creation time, or null. |
CompressionFlags.Coding | pointerCoding - The coding for pointers. |
CompressionFlags.Coding | positionCoding - The coding for positions. |
PrefixMap<? extends CharSequence> | prefixMap - The prefix map for this index, or null if the prefix map was not loaded. |
int | quantum - The quantum, or -1 if this index has no skips, or 0 if this is a BitStreamHPIndex and quanta are variable. |
Constructor<? extends IndexReader> | readerConstructor - The constructor that will be used to create new index readers. |
StringMap<? extends CharSequence> | termMap - The term map for this index, or null if the term map was not loaded. |
Fields inherited from class it.unimi.dsi.mg4j.index.Index |
---|
field, hasCounts, hasPayloads, hasPositions, keyIndex, maxCount, numberOfDocuments, numberOfOccurrences, numberOfPostings, numberOfTerms, payload, properties, singletonSet, sizes, termProcessor |
Constructor Summary | |
---|---|
BitStreamIndex(int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, CompressionFlags.Coding frequencyCoding, CompressionFlags.Coding pointerCoding, CompressionFlags.Coding countCoding, CompressionFlags.Coding positionCoding, int quantum, int height, int bufferSize, TermProcessor termProcessor, String field, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntList sizes, LongList offsets) |
Method Summary | |
---|---|
IndexIterator | documents(CharSequence prefix, int limit) - Returns a MultiTermIndexIterator over all terms starting with the given prefix, provided their number does not exceed the given limit and that this index has a prefixMap. |
protected static String | featureName(CompressionFlags.Coding coding) |
static int | gaussianGolombModulus(long quantumSigma, int shift) - Computes the Gaussian Golomb modulus for a given standard deviation and shift using fixed-point arithmetic. |
protected Constructor<? extends IndexReader> | getConstructor() |
abstract InputBitStream | getInputBitStream(int bufferSize) - Returns an input bit stream over the index. |
abstract InputStream | getInputStream() - Returns an input stream over the index. |
IndexReader | getReader(int bufferSize) - Creates and returns a new IndexReader based on this index. |
static int | golombModulus(int p, int q) - Computes the Golomb modulus for a given fraction using fixed-point arithmetic and a precomputed table for small values. |
static long | quantumSigma(int frequency, int numberOfDocuments, int quantum) - Computes the standard deviation associated with a given quantum and document frequency. |
String | toString() |
Methods inherited from class it.unimi.dsi.mg4j.index.Index |
---|
documents, documents, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getEmptyIndexIterator, getInstance, getInstance, getInstance, getInstance, getReader, getTermProcessor, keyIndex |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
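public static final int DEFAULT_HEIGHT
The default height (fairly low, due to memory consumption).
public static final int DEFAULT_QUANTUM
The default quantum (4% of index size).
public static final int DEFAULT_BUFFER_SIZE
The default buffer size.
public final CompressionFlags.Coding frequencyCoding
The coding for frequencies. See CompressionFlags.
public final CompressionFlags.Coding pointerCoding
The coding for pointers. See CompressionFlags.
public final CompressionFlags.Coding countCoding
The coding for counts. See CompressionFlags.
public final CompressionFlags.Coding positionCoding
The coding for positions. See CompressionFlags.
public final LongList offsets
The offset of each term, if offsets were loaded or specified at creation time, or null.
public final StringMap<? extends CharSequence> termMap
The term map for this index, or null if the term map was not loaded.
public final PrefixMap<? extends CharSequence> prefixMap
The prefix map for this index, or null if the prefix map was not loaded.
public final int height
The parameter h (the maximum height of a skip tower), or -1 if this index has no skips.
public final int quantum
The quantum, or -1 if this index has no skips, or 0 if this is a BitStreamHPIndex and quanta are variable.
public final int bufferSize
The size of the buffer used to read the bit stream.
public final Constructor<? extends IndexReader> readerConstructor
The constructor that will be used to create new index readers.
public static final int FIXED_POINT_BITS
Fixed number of fractional binary digits used in fixed-point computation of Golomb moduli.
public static final long FIXED_POINT_MULTIPLIER
1L << FIXED_POINT_BITS.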
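To illustrate how a pair of constants such as FIXED_POINT_BITS and FIXED_POINT_MULTIPLIER is typically used, here is a minimal fixed-point sketch. The value 16 for the number of fractional bits is an assumption chosen for illustration (this page does not state the real value), and the helper methods are hypothetical rather than part of BitStreamIndex.

```java
final class FixedPointSketch {
    /** Assumed number of fractional binary digits; the real FIXED_POINT_BITS may differ. */
    static final int BITS = 16;

    /** Same shape as FIXED_POINT_MULTIPLIER: 1L << BITS. */
    static final long MULTIPLIER = 1L << BITS;

    /** Represents the fraction p/q as a fixed-point long (truncating). */
    static long toFixed(int p, int q) {
        return ((long) p << BITS) / q;
    }

    /** Multiplies two fixed-point values, rescaling the product back to BITS fractional digits. */
    static long multiply(long a, long b) {
        return (a * b) >> BITS;
    }

    /** Converts a fixed-point value back to a double, for inspection only. */
    static double toDouble(long fixed) {
        return (double) fixed / MULTIPLIER;
    }
}
```

With these assumptions, 1/2 is stored as 32768, and multiplying it by itself yields 16384, that is, 1/4. All arithmetic stays in integer registers, which is the usual motivation for fixed-point computation.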
Constructor Detail |
---|
public BitStreamIndex(int numberOfDocuments, int numberOfTerms, long numberOfPostings, long numberOfOccurrences, int maxCount, Payload payload, CompressionFlags.Coding frequencyCoding, CompressionFlags.Coding pointerCoding, CompressionFlags.Coding countCoding, CompressionFlags.Coding positionCoding, int quantum, int height, int bufferSize, TermProcessor termProcessor, String field, Properties properties, StringMap<? extends CharSequence> termMap, PrefixMap<? extends CharSequence> prefixMap, IntList sizes, LongList offsets)
Method Detail |
---|
protected Constructor<? extends IndexReader> getConstructor()
protected static String featureName(CompressionFlags.Coding coding)
public abstract InputBitStream getInputBitStream(int bufferSize) throws IOException
Returns an input bit stream over the index.
Parameters:
bufferSize - a suggested buffer size.
Throws:
IOException
public abstract InputStream getInputStream() throws IOException
Returns an input stream over the index.
Throws:
IOException
public IndexReader getReader(int bufferSize) throws IOException
Description copied from class: Index
Creates and returns a new IndexReader based on this index. After that, you can use the reader to read this index.
Overrides:
getReader in class Index
Parameters:
bufferSize - the size of the buffer to be used when accessing the reader, or -1 for a default buffer size.
Returns:
a new IndexReader to read this index.
Throws:
IOException
public IndexIterator documents(CharSequence prefix, int limit) throws IOException, TooManyTermsException
Returns a MultiTermIndexIterator over all terms starting with the given prefix, provided their number does not exceed the given limit and that this index has a prefixMap.
Overrides:
documents in class Index
Parameters:
prefix - a prefix.
limit - a limit on the number of terms that will be used to resolve the prefix query; if more than limit terms start with prefix, a TooManyTermsException will be thrown.
Throws:
IOException - if an exception occurred while accessing the index.
TooManyTermsException - if there are more than limit terms starting with prefix.
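The contract of documents(prefix, limit) can be pictured with a small self-contained sketch. It does not use MG4J: the sorted term set stands in for the prefix map, the unchecked TooManyTermsSketchException is a hypothetical stand-in for TooManyTermsException, and the method returns the matching terms rather than an iterator over their documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

final class PrefixLimitSketch {
    /** Hypothetical, unchecked stand-in for MG4J's TooManyTermsException. */
    static final class TooManyTermsSketchException extends RuntimeException {
        TooManyTermsSketchException(String message) {
            super(message);
        }
    }

    /**
     * Collects the terms starting with prefix, failing as soon as more than
     * limit terms are found.
     */
    static List<String> termsWithPrefix(SortedSet<String> terms, String prefix, int limit) {
        List<String> result = new ArrayList<>();
        // In natural String order, terms sharing a prefix are contiguous and start at tailSet(prefix).
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) break; // Past the end of the prefix range.
            if (result.size() == limit) {
                throw new TooManyTermsSketchException("More than " + limit + " terms start with " + prefix);
            }
            result.add(term);
        }
        return result;
    }
}
```

The limit is checked while scanning, so a very common prefix fails fast instead of materializing every matching term first.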
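public static int golombModulus(int p, int q)
Computes the Golomb modulus for a given fraction using fixed-point arithmetic and a precomputed table for small values. The value corresponds to ⌈ -log( 2 - p/q ) / log( 1 - p/q ) ⌉, but the computation is orders of magnitude quicker.
Parameters:
p - the numerator.
q - the denominator (larger than or equal to p).
Returns:
the Golomb modulus for p/q.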
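As a reference point for golombModulus(p, q), the standard closed-form Golomb modulus ⌈ -log( 2 - p/q ) / log( 1 - p/q ) ⌉ can be evaluated directly in floating point. The sketch below is that slow reference computation, not the fixed-point, table-based implementation of BitStreamIndex; the class and method names are hypothetical.

```java
final class GolombModulusSketch {
    /**
     * Evaluates ceil(-log(2 - p/q) / log(1 - p/q)) in floating point.
     * Requires 0 < p <= q; the result is clamped to at least 1.
     */
    static int referenceGolombModulus(int p, int q) {
        if (p <= 0 || p > q) {
            throw new IllegalArgumentException("Need 0 < p <= q");
        }
        double x = (double) p / q;
        // For p == q the quotient evaluates to 0, so clamp to the smallest valid modulus.
        return Math.max(1, (int) Math.ceil(-Math.log(2 - x) / Math.log(1 - x)));
    }
}
```

For example, the sketch yields 1 for p/q = 1/2, 7 for 1/10 and 69 for 1/100: the rarer the event, the larger the modulus.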
public static int gaussianGolombModulus(long quantumSigma, int shift)
Computes the Gaussian Golomb modulus for a given standard deviation and shift using fixed-point arithmetic.
The Golomb modulus for (positive and negative) integers normally distributed with standard deviation σ can be computed using the formula ⌈ 2 sqrt( 2 / π ) ln(2) σ ⌉. The resulting Golomb modulus is close to optimal for coding such integers after they have been passed through Fast.int2nat(int). Note, however, that Golomb coding is not optimal for a normal distribution.
This function is used to compute the correct Golomb modulus for skip towers.
Parameters:
quantumSigma - the standard deviation of a quantum as returned by quantumSigma(int, int, int).
shift - a shift parameter.
Returns:
the Golomb modulus for the standard deviation obtained by multiplying quantumSigma by the square root of 2^(shift-1).
public static long quantumSigma(int frequency, int numberOfDocuments, int quantum)
Computes the standard deviation associated with a given quantum and document frequency.
Parameters:
frequency - the document frequency.
numberOfDocuments - the overall number of documents.
quantum - the quantum.
Returns:
Math.sqrt( quantum * ( 1 - p ) ) / p, where p is the relative frequency.
public String toString()
Overrides:
toString in class Object
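The two formulas quoted for quantumSigma and gaussianGolombModulus can be checked with a short floating-point sketch. This is an illustration only: the real methods work in fixed-point arithmetic on long values, and the scaling by the shift parameter is deliberately left out here. The class and method signatures below are hypothetical.

```java
final class SkipTowerModulusSketch {
    /**
     * Floating-point version of sqrt(quantum * (1 - p)) / p,
     * where p = frequency / numberOfDocuments. Assumes frequency > 0.
     */
    static double quantumSigma(int frequency, int numberOfDocuments, int quantum) {
        double p = (double) frequency / numberOfDocuments;
        return Math.sqrt(quantum * (1 - p)) / p;
    }

    /** Evaluates ceil(2 * sqrt(2 / pi) * ln(2) * sigma); the shift scaling is omitted. */
    static int gaussianGolombModulus(double sigma) {
        return (int) Math.ceil(2 * Math.sqrt(2 / Math.PI) * Math.log(2) * sigma);
    }
}
```

For a term occurring in 100 of 1000 documents with quantum 16, the sketch gives σ ≈ 37.95 and a modulus of 42.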