it.unimi.dsi.mg4j.tool
Class PartitionDocumentally
java.lang.Object
it.unimi.dsi.mg4j.tool.PartitionDocumentally
public class PartitionDocumentally
- extends Object
Partitions an index documentally.
A global index is partitioned documentally by providing a DocumentalPartitioningStrategy
that specifies, for each document, a destination local index and a local document pointer. The global index
is scanned, and the postings are partitioned among the local indices using the provided strategy. For instance,
a ContiguousDocumentalStrategy divides an index into blocks of contiguous documents.
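The mapping performed by a contiguous strategy can be illustrated with a minimal, self-contained sketch. It does not use the MG4J API: the class ContiguousPartitionSketch, its cut-point array and its method names are hypothetical, and the sketch only assumes that block i covers the global documents from cutPoints[i] (inclusive) to cutPoints[i + 1] (exclusive).

```java
import java.util.Arrays;

/** Hypothetical illustration of contiguous documental partitioning; not part of MG4J. */
public class ContiguousPartitionSketch {
    /** cutPoints[i] is the first global document of block i; the last entry is the number of documents. */
    private final int[] cutPoints;

    public ContiguousPartitionSketch(int... cutPoints) {
        this.cutPoints = cutPoints.clone();
    }

    /** Returns the local index receiving a global document (valid for 0 <= globalPointer < last cut point). */
    public int localIndex(int globalPointer) {
        int pos = Arrays.binarySearch(cutPoints, globalPointer);
        // On a miss, binarySearch returns -(insertionPoint) - 1, and the block is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Returns the document pointer inside the local index. */
    public int localPointer(int globalPointer) {
        return globalPointer - cutPoints[localIndex(globalPointer)];
    }

    public static void main(String[] args) {
        // Ten documents split into two blocks: [0..4) and [4..10).
        ContiguousPartitionSketch strategy = new ContiguousPartitionSketch(0, 4, 10);
        for (int d = 0; d < 10; d++) {
            System.out.println("global " + d + " -> local index " + strategy.localIndex(d)
                    + ", local pointer " + strategy.localPointer(d));
        }
    }
}
```

Because the blocks are contiguous, the local pointer is just the global pointer minus the first document of the block, so the relative order of documents is preserved inside each local index.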
Since each local index contains a (proper) subset of the original set of documents, it contains in general a (proper)
subset of the terms in the global index. Thus, the local term numbers and the global term numbers will not in general coincide.
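A small sketch may help to see why the numbers diverge. It is not MG4J code: the class TermNumberSketch is hypothetical, and it assumes, for illustration only, that each index numbers its terms by their position in its own sorted term list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical illustration of global versus local term numbers; not part of MG4J. */
public class TermNumberSketch {
    /** Sorted distinct terms of documents[from..to); a term's number is its position in the list. */
    static List<String> termList(String[][] documents, int from, int to) {
        TreeSet<String> terms = new TreeSet<>();
        for (int d = from; d < to; d++) terms.addAll(Arrays.asList(documents[d]));
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        String[][] documents = {
            { "apple", "cherry" }, { "apple" }, { "banana", "cherry" }, { "date" }
        };
        List<String> global = termList(documents, 0, 4); // all documents
        List<String> local0 = termList(documents, 0, 2); // first block
        List<String> local1 = termList(documents, 2, 4); // second block

        System.out.println("cherry, global: " + global.indexOf("cherry"));
        System.out.println("cherry, local 0: " + local0.indexOf("cherry"));
        System.out.println("apple, local 1: " + local1.indexOf("apple")); // -1: term absent
    }
}
```

In this toy collection the term "cherry" has number 2 globally but number 1 in both local indices, and "apple" does not occur at all in the second local index.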
As a result, when a set of local indices is accessed transparently as a single index
using a DocumentalCluster, a call to Index.documents(int) will throw an
UnsupportedOperationException, because there is no way to map the global term numbers to local term numbers.
On the other hand, a call to Index.documents(CharSequence) will be passed to each local index to
build a global iterator. To speed up this phase for not-so-frequent terms, when partitioning an index you can request
the construction of Bloom filters, which are used to avoid
querying local indices that do not contain a term. The precision of the filters is settable.
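The role of the filters can be sketched as follows. This is a toy Bloom filter, not the one used by MG4J: the class BloomFilterSketch, its hash scheme and the candidates helper are hypothetical, and the "precision" is modeled here, as an assumption, by the number of hash functions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical toy Bloom filter over terms; not the MG4J implementation. */
public class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes; // assumption: stands in for the settable precision

    public BloomFilterSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Derives the i-th bit position from two hash values (double hashing). */
    private int position(CharSequence term, int i) {
        int h1 = term.toString().hashCode();
        int h2 = Integer.rotateLeft(h1, 16) * 0x9E3779B9 | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(CharSequence term) {
        for (int i = 0; i < numHashes; i++) bits.set(position(term, i));
    }

    /** False means the term is certainly absent; true means it may be present. */
    public boolean mightContain(CharSequence term) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(term, i))) return false;
        }
        return true;
    }

    /** Returns the local indices that still have to be queried for the given term. */
    static List<Integer> candidates(List<BloomFilterSketch> filters, CharSequence term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).mightContain(term)) result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        BloomFilterSketch first = new BloomFilterSketch(1024, 4);
        BloomFilterSketch second = new BloomFilterSketch(1024, 4);
        first.add("apple"); // only the first local index contains "apple"
        System.out.println(candidates(List.of(first, second), "apple"));
    }
}
```

A filter never reports a term it contains as absent, so skipping a local index is always safe; false positives only cost an unnecessary lookup, and they become rarer as more hash functions and bits are used.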
The property file will use a DocumentalMergedCluster unless you provide
a ContiguousDocumentalStrategy, in which case a DocumentalConcatenatedCluster
will be used instead. Note that there might be other cases in which the latter is appropriate;
in such cases you can edit the property file manually.
Important: this class just partitions the index. No auxiliary files (most notably, term maps
or prefix maps) will be generated. Please refer to a StringMap implementation
(e.g., ShiftAddXorSignedStringMap or ImmutableExternalPrefixMap).
Write-once output and distributed index partitioning
Please see PartitionLexically: the same comments apply.
- Since:
- 1.0.1
- Author:
- Alessandro Arrabito, Sebastiano Vigna
Constructor Summary
PartitionDocumentally(String inputBasename,
String outputBasename,
DocumentalPartitioningStrategy strategy,
String strategyFilename,
int bloomFilterPrecision,
int bufferSize,
Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags,
boolean interleaved,
boolean skips,
int quantum,
int height,
int skipBufferSize,
long logInterval)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_BUFFER_SIZE
public static final int DEFAULT_BUFFER_SIZE
- The default buffer size for all involved indices.
- See Also:
- Constant Field Values
PartitionDocumentally
public PartitionDocumentally(String inputBasename,
String outputBasename,
DocumentalPartitioningStrategy strategy,
String strategyFilename,
int bloomFilterPrecision,
int bufferSize,
Map<CompressionFlags.Component,CompressionFlags.Coding> writerFlags,
boolean interleaved,
boolean skips,
int quantum,
int height,
int skipBufferSize,
long logInterval)
throws ConfigurationException,
IOException,
ClassNotFoundException,
SecurityException,
InstantiationException,
IllegalAccessException
- Throws:
ConfigurationException
IOException
ClassNotFoundException
SecurityException
InstantiationException
IllegalAccessException
run
public void run()
throws Exception
- Throws:
Exception
main
public static void main(String[] arg)
throws ConfigurationException,
IOException,
URISyntaxException,
ClassNotFoundException,
Exception
- Throws:
ConfigurationException
IOException
URISyntaxException
ClassNotFoundException
Exception