it.unimi.dsi.mg4j.tool
Class PartitionDocumentally

java.lang.Object
  extended by it.unimi.dsi.mg4j.tool.PartitionDocumentally

public class PartitionDocumentally
extends Object

Partitions an index documentally.

A global index is partitioned documentally by providing a DocumentalPartitioningStrategy that specifies, for each document, a destination local index and a local document pointer. The global index is scanned, and the postings are partitioned among the local indices using the provided strategy. For instance, a ContiguousDocumentalStrategy divides an index into blocks of contiguous documents.
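The mapping described above, from a global document pointer to a local index and a local document pointer, can be sketched for the contiguous case as follows. This is a minimal illustration, not the code of ContiguousDocumentalStrategy: the class ContiguousPartitionSketch, its methods, and the cut-point representation are all hypothetical names chosen for this example.

```java
import java.util.Arrays;

/** Hypothetical illustration of contiguous documental partitioning; not part of MG4J. */
final class ContiguousPartitionSketch {
    /** cutPoints[k] is the first global document of local index k; the last entry is the total number of documents. */
    private final int[] cutPoints;

    ContiguousPartitionSketch(int... cutPoints) {
        if (cutPoints.length < 2 || cutPoints[0] != 0)
            throw new IllegalArgumentException("Cut points must start at 0 and define at least one block");
        for (int i = 1; i < cutPoints.length; i++)
            if (cutPoints[i] <= cutPoints[i - 1])
                throw new IllegalArgumentException("Cut points must be strictly increasing");
        this.cutPoints = cutPoints.clone();
    }

    int numberOfLocalIndices() {
        return cutPoints.length - 1;
    }

    /** Returns the local index that receives the given global document. */
    int localIndex(int globalPointer) {
        if (globalPointer < 0 || globalPointer >= cutPoints[cutPoints.length - 1])
            throw new IndexOutOfBoundsException("No such document: " + globalPointer);
        int pos = Arrays.binarySearch(cutPoints, globalPointer);
        // An exact hit is the start of a block; otherwise take the block preceding the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Returns the document pointer inside the destination local index. */
    int localPointer(int globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    /** Inverse mapping: rebuilds the global pointer from a local one. */
    int globalPointer(int localIndex, int localPointer) {
        return cutPoints[localIndex] + localPointer;
    }
}
```

With cut points 0, 4, 10, documents 0 to 3 go to local index 0 and documents 4 to 9 go to local index 1; global document 5 becomes local document 1 of local index 1.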

Since each local index contains a (proper) subset of the original set of documents, it contains in general a (proper) subset of the terms in the global index. Thus, the local term numbers and the global term numbers will not in general coincide. As a result, when a set of local indices is accessed transparently as a single index using a DocumentalCluster, a call to Index.documents(int) will throw an UnsupportedOperationException, because there is no way to map the global term numbers to local term numbers.
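The mismatch between local and global term numbers can be seen in a small, self-contained sketch. It assumes that a term's number is its position in the sorted list of terms occurring in an index; the class LocalTermNumbersSketch and its method are hypothetical and only illustrate the reasoning above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of why local and global term numbers differ; not part of MG4J. */
final class LocalTermNumbersSketch {
    /**
     * Returns the sorted list of distinct terms occurring in the given documents.
     * Assumption of this sketch: the number of a term is its position in this list.
     */
    static List<String> termList(List<List<String>> documents) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> document : documents) terms.addAll(document);
        return new ArrayList<>(terms);
    }
}
```

Take three documents containing, respectively, "apple cherry", "banana" and "cherry date". In the global index "cherry" has term number 2. In a local index holding only the last two documents the term list is banana, cherry, date, so "cherry" has term number 1, and "apple" has no number at all. A term number alone is therefore not enough to address the same term in every local index, whereas the term itself is.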

On the other hand, a call to Index.documents(CharSequence) will be passed to each local index to build a global iterator. To speed up this phase for not-so-frequent terms, when partitioning an index you can request the construction of Bloom filters, which are used to avoid querying local indices that do not contain a term. The precision of the filters is configurable.
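The role of the Bloom filters can be sketched as follows: one filter per local index records the terms of that index, and a term lookup consults the filters first so that only the local indices that may contain the term are queried. This is a minimal sketch, not the filter implementation used by this class. The class TermBloomFilterSketch, its hash functions, and the reading of "precision" as the number of hash functions are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical illustration of Bloom-filter pruning of local indices; not MG4J's implementation. */
final class TermBloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    /** Assumption: the precision is given as the number of hash functions. */
    TermBloomFilterSketch(int expectedTerms, int numHashes) {
        if (expectedTerms <= 0 || numHashes <= 0)
            throw new IllegalArgumentException("Arguments must be positive");
        this.numHashes = numHashes;
        // About numHashes / ln 2 bits per term keeps the false-positive rate near 2^-numHashes.
        this.numBits = (int) Math.max(64, Math.ceil((double) expectedTerms * numHashes / Math.log(2)));
        this.bits = new BitSet(numBits);
    }

    /** The i-th bit position for a term, obtained by combining two simple hashes. */
    private int position(CharSequence term, int i) {
        int h1 = 0, h2 = 0x811C9DC5;
        for (int j = 0; j < term.length(); j++) {
            char c = term.charAt(j);
            h1 = 31 * h1 + c;           // polynomial hash
            h2 = (h2 ^ c) * 0x01000193; // FNV-1a style hash
        }
        return Math.floorMod(h1 + i * (h2 | 1), numBits);
    }

    void add(CharSequence term) {
        for (int i = 0; i < numHashes; i++) bits.set(position(term, i));
    }

    /** Returns false only if the term was never added; true may be a false positive. */
    boolean mightContain(CharSequence term) {
        for (int i = 0; i < numHashes; i++)
            if (!bits.get(position(term, i))) return false;
        return true;
    }

    /** Returns the local indices that have to be queried for the given term. */
    static List<Integer> candidates(TermBloomFilterSketch[] filters, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < filters.length; k++)
            if (filters[k].mightContain(term)) result.add(k);
        return result;
    }
}
```

A filter never reports a term it contains as absent, so skipping the local indices left out by candidates() cannot lose postings. A false positive only costs one unnecessary lookup, which is why a higher precision trades filter size for fewer wasted queries.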

The property file will use a DocumentalMergedCluster unless you provide a ContiguousDocumentalStrategy, in which case a DocumentalConcatenatedCluster will be used instead. Note that there might be other cases in which the latter is appropriate; in such cases you can edit the property file manually.

Important: this class just partitions the index. No auxiliary files (most notably, term maps or prefix maps) will be generated. To build them, please refer to a StringMap implementation (e.g., ShiftAddXorSignedStringMap or ImmutableExternalPrefixMap).

Write-once output and distributed index partitioning

Please see PartitionLexically: the same comments apply.

Since:
1.0.1
Author:
Alessandro Arrabito, Sebastiano Vigna

Field Summary
static int DEFAULT_BUFFER_SIZE
          The default buffer size for all involved indices.
 
Constructor Summary
PartitionDocumentally(String inputBasename, String outputBasename, DocumentalPartitioningStrategy strategy, String strategyFilename, int bloomFilterPrecision, int bufferSize, Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags, boolean interleaved, boolean skips, int quantum, int height, int skipBufferSize, long logInterval)
           
 
Method Summary
static void main(String[] arg)
           
 void run()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_BUFFER_SIZE

public static final int DEFAULT_BUFFER_SIZE
The default buffer size for all involved indices.

See Also:
Constant Field Values

Constructor Detail

PartitionDocumentally

public PartitionDocumentally(String inputBasename,
                             String outputBasename,
                             DocumentalPartitioningStrategy strategy,
                             String strategyFilename,
                             int bloomFilterPrecision,
                             int bufferSize,
                             Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags,
                             boolean interleaved,
                             boolean skips,
                             int quantum,
                             int height,
                             int skipBufferSize,
                             long logInterval)
                      throws ConfigurationException,
                             IOException,
                             ClassNotFoundException,
                             SecurityException,
                             InstantiationException,
                             IllegalAccessException
Throws:
ConfigurationException
IOException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException

Method Detail

run

public void run()
         throws Exception
Throws:
Exception

main

public static void main(String[] arg)
                 throws ConfigurationException,
                        IOException,
                        URISyntaxException,
                        ClassNotFoundException,
                        Exception
Throws:
ConfigurationException
IOException
URISyntaxException
ClassNotFoundException
Exception