java.lang.Object
  it.unimi.dsi.mg4j.document.ZipDocumentCollectionBuilder
public class ZipDocumentCollectionBuilder
A builder to create ZipDocumentCollections.
After creating an instance of this class, new documents can be added incrementally. Each document must be started with startDocument(CharSequence, CharSequence) and ended with endDocument(); inside each document, each non-text field must be written by passing an object to nonTextField(Object), whereas each text field must be started with startTextField() and ended with endTextField(): in between, a call to add(MutableString, MutableString) must be made for each word/nonword pair retrieved from the original collection. At the end, close() returns a ZipDocumentCollection that must be serialised.
Alternatively, you can just call build(DocumentSequence) and all the above will be handled for you.
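The call order described above can be sketched with a small stand-in class. RecordingBuilder below is hypothetical: it is not the MG4J class, it takes String arguments instead of MutableString, it omits close(), and it only logs the calls it receives. It is meant to illustrate the expected sequence and the documented rule that add() does nothing outside a text field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that only records calls; it is NOT the MG4J builder. */
final class RecordingBuilder {
    final List<String> log = new ArrayList<>();
    private boolean inTextField;

    void startDocument(CharSequence title, CharSequence uri) {
        log.add("startDocument(" + title + ")");
    }

    void startTextField() {
        inTextField = true;
        log.add("startTextField");
    }

    void add(String word, String nonWord) {
        if (!inTextField) return; // documented behaviour: ignored outside a text field
        log.add("add(" + word + ")");
    }

    void endTextField() {
        inTextField = false;
        log.add("endTextField");
    }

    void nonTextField(Object o) {
        log.add("nonTextField(" + o + ")");
    }

    void endDocument() {
        log.add("endDocument");
    }

    /** Drives one document through the documented call sequence. */
    static List<String> demo() {
        RecordingBuilder builder = new RecordingBuilder();
        builder.startDocument("Greeting", "http://example.com/0");
        builder.startTextField();
        builder.add("hello", ", ");
        builder.add("world", "!");
        builder.endTextField();
        builder.add("stray", " "); // no text field is open, so this is dropped
        builder.nonTextField(42);
        builder.endDocument();
        return builder.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

With the real builder, the same sequence would be repeated for every document and followed by a single close() call.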
Each Zip entry corresponds to a document: the title is recorded in the comment field, whereas the URI is written with MutableString.writeSelfDelimUTF8(java.io.OutputStream) directly to the zipped output stream. When building an exact ZipDocumentCollection, subsequent word/nonword pairs are written in the same way and delimited by two empty strings. If the collection is not exact, only words are written, delimited by an empty string. Non-text fields are written directly to the zipped output stream.
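The layout described above can be approximated with the standard java.util.zip classes. This is a sketch under stated assumptions, not the MG4J implementation: DataOutputStream.writeUTF (length-prefixed modified UTF-8) is swapped in for MutableString.writeSelfDelimUTF8, the entry name "0" is an arbitrary choice, and the class and method names are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of the entry layout; names and encoding are assumptions. */
final class ZipLayoutSketch {
    /** Writes one document as one Zip entry; each element of pairs is {word, nonword}. */
    static void writeDocument(ZipOutputStream zip, String entryName, String title, String uri,
            String[][] pairs, boolean exact) throws IOException {
        ZipEntry entry = new ZipEntry(entryName);
        entry.setComment(title); // the title is stored in the entry comment
        zip.putNextEntry(entry);
        DataOutputStream out = new DataOutputStream(zip);
        out.writeUTF(uri); // stand-in for writeSelfDelimUTF8
        for (String[] pair : pairs) {
            out.writeUTF(pair[0]);
            if (exact) out.writeUTF(pair[1]); // nonwords are kept only when exact
        }
        out.writeUTF(""); // one empty string ends a list of words
        if (exact) out.writeUTF(""); // a second one completes the empty pair
        out.flush();
        zip.closeEntry();
    }

    /** Reads back the words of one entry, skipping the URI and any nonwords. */
    static List<String> readWords(ZipFile zipFile, String entryName, boolean exact) throws IOException {
        List<String> words = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(zipFile.getInputStream(zipFile.getEntry(entryName)))) {
            in.readUTF(); // the URI comes first
            while (true) {
                String word = in.readUTF();
                String nonWord = exact ? in.readUTF() : "";
                if (word.isEmpty() && nonWord.isEmpty()) break; // terminator reached
                words.add(word);
            }
        }
        return words;
    }

    /** Round trip through a temporary file: returns the title followed by the words. */
    static String demo() throws IOException {
        File file = File.createTempFile("layout-sketch", ".zip");
        file.deleteOnExit();
        String[][] pairs = { { "hello", ", " }, { "world", "!" } };
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(file))) {
            writeDocument(zip, "0", "Greeting", "http://example.com/0", pairs, true);
        }
        try (ZipFile zipFile = new ZipFile(file)) {
            return zipFile.getEntry("0").getComment() + " " + readWords(zipFile, "0", true);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The sketch reads the title through ZipFile rather than ZipInputStream because entry comments live in the Zip central directory, which a purely sequential reader does not see.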
Constructor Summary | |
---|---|
ZipDocumentCollectionBuilder(String zipFilename, DocumentFactory factory, boolean exact, ProgressLogger progressLogger): Creates a new zipped collection builder. |
Method Summary | |
---|---|
void | add(MutableString word, MutableString nonWord): Adds a word and a nonword to the current text field, provided that a text field has started but not yet ended; otherwise, does nothing. |
ZipDocumentCollection | build(DocumentSequence inputSequence): A utility method copying all documents of an input sequence to a zipped collection. |
ZipDocumentCollection | close(): Terminates the construction of the zipped collection and returns it. |
void | endDocument(): Ends a document entry. |
void | endTextField(): Ends the current text field. |
static void | main(String[] arg) |
void | nonTextField(Object o): Adds a non-text field. |
void | startDocument(CharSequence title, CharSequence uri): Starts a document entry. |
void | startTextField(): Starts a new text field. |
void | virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments): Adds a virtual field. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public ZipDocumentCollectionBuilder(String zipFilename, DocumentFactory factory, boolean exact, ProgressLogger progressLogger) throws FileNotFoundException
Parameters:
zipFilename - the filename of the zip file.
factory - the factory of the base document sequence.
exact - true iff non-words should also be preserved.
progressLogger - a progress logger.
Throws:
FileNotFoundException
Method Detail |
---|
public void startDocument(CharSequence title, CharSequence uri) throws IOException
Parameters:
title - the document title (usually, the result of Document.title()).
uri - the document URI (usually, the result of Document.uri()).
Throws:
IOException
public void endDocument() throws IOException
Throws:
IOException
public void startTextField()
public void nonTextField(Object o) throws IOException
Parameters:
o - the content of the non-text field.
Throws:
IOException
public void virtualField(ObjectList<Scan.VirtualDocumentFragment> fragments) throws IOException
Parameters:
fragments - the virtual fragments to be added.
Throws:
IOException
public void endTextField() throws IOException
Throws:
IOException
public void add(MutableString word, MutableString nonWord) throws IOException
Usually, word and nonWord are just the result of a call to WordReader.next(MutableString, MutableString).
Parameters:
word - a word.
nonWord - a nonword.
Throws:
IOException
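To make the notion of a word/nonword pair concrete, here is a minimal sketch of how a text could be cut into such pairs. It is not the WordReader implementation: the class and method names are hypothetical, plain String is used instead of MutableString, and the rule that a word is a maximal run of letters or digits is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter; the letter-or-digit word rule is an assumption. */
final class WordNonWordSketch {
    /** Returns {word, nonword} pairs covering the whole text, in order. */
    static List<String[]> split(String text) {
        List<String[]> pairs = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // A word is a (possibly empty) run of letters and digits...
            int wordEnd = start;
            while (wordEnd < text.length() && Character.isLetterOrDigit(text.charAt(wordEnd))) wordEnd++;
            // ...followed by a (possibly empty) run of everything else.
            int nonWordEnd = wordEnd;
            while (nonWordEnd < text.length() && !Character.isLetterOrDigit(text.charAt(nonWordEnd))) nonWordEnd++;
            pairs.add(new String[] { text.substring(start, wordEnd), text.substring(wordEnd, nonWordEnd) });
            start = nonWordEnd; // at least one character was consumed, so the loop terminates
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : split("hello, world!")) {
            System.out.println("word=[" + pair[0] + "] nonword=[" + pair[1] + "]");
        }
    }
}
```

Each pair produced this way corresponds to one add(word, nonWord) call between startTextField() and endTextField().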
public ZipDocumentCollection close() throws IOException
Throws:
IOException
public ZipDocumentCollection build(DocumentSequence inputSequence) throws IOException
Throws:
IOException
public static void main(String[] arg) throws com.martiansoftware.jsap.JSAPException, IOException, ClassNotFoundException, InvocationTargetException, NoSuchMethodException, IllegalAccessException, InstantiationException
Throws:
com.martiansoftware.jsap.JSAPException
IOException
ClassNotFoundException
InvocationTargetException
NoSuchMethodException
IllegalAccessException
InstantiationException