org.apache.solr.handler.extraction
Class SolrContentHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by org.apache.solr.handler.extraction.SolrContentHandler
All Implemented Interfaces:
ExtractingParams, ContentHandler, DTDHandler, EntityResolver, ErrorHandler

public class SolrContentHandler
extends DefaultHandler
implements ExtractingParams

The class responsible for handling Tika events and translating them into SolrInputDocuments. This class is not thread-safe.

Users may wish to extend this class to provide their own functionality.
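The event flow this class takes part in can be illustrated with a minimal sketch that uses only the JDK's SAX classes. It is not the Solr implementation: FieldCollectingHandler, its fields() accessor, and parse() are hypothetical names, and a plain Map stands in for SolrInputDocument. Like the class described here, the sketch keeps mutable per-parse state and is therefore not thread-safe.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in: collects the text of each element into a named field.
final class FieldCollectingHandler extends DefaultHandler {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        // Reset state so the same instance can be reused for sequential parses.
        fields.clear();
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Start buffering text for the element that just opened.
        text.setLength(0);
    }

    @Override
    public void characters(char[] chars, int offset, int length) {
        // The parser may deliver one text node in several chunks, so append.
        text.append(chars, offset, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.computeIfAbsent(qName, k -> new ArrayList<>()).add(value);
        }
        text.setLength(0);
    }

    Map<String, List<String>> fields() {
        return fields;
    }

    // Convenience wrapper: parse an XML string and return the collected fields.
    static Map<String, List<String>> parse(String xml) {
        FieldCollectingHandler handler = new FieldCollectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler.fields();
    }
}
```

Calling FieldCollectingHandler.parse on a small XML string returns one entry per element name that carried text, with repeated elements collected as multiple values.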

See Also:
SolrContentHandlerFactory, ExtractingRequestHandler, ExtractingDocumentLoader

Field Summary
 
Fields inherited from interface org.apache.solr.handler.extraction.ExtractingParams
BOOST_PREFIX, CAPTURE_ATTRIBUTES, CAPTURE_ELEMENTS, DEFAULT_FIELD, EXTRACT_FORMAT, EXTRACT_ONLY, LITERALS_PREFIX, LOWERNAMES, MAP_PREFIX, RESOURCE_NAME, STREAM_TYPE, UNKNOWN_FIELD_PREFIX, XPATH_EXPRESSION
 
Constructor Summary
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, org.apache.solr.common.params.SolrParams params, org.apache.solr.schema.IndexSchema schema)
           
SolrContentHandler(org.apache.tika.metadata.Metadata metadata, org.apache.solr.common.params.SolrParams params, org.apache.solr.schema.IndexSchema schema, Collection<String> dateFormats)
           
 
Method Summary
 void characters(char[] chars, int offset, int length)
           
 void endElement(String uri, String localName, String qName)
           
protected  String findMappedName(String name)
          Get the mapped name for the given name, if a mapping exists.
protected  float getBoost(String name)
          Get the value of any boost factor for the mapped name.
 org.apache.solr.common.SolrInputDocument newDocument()
          This is called by a consumer when it is ready to deal with a new SolrInputDocument.
 void startDocument()
           
 void startElement(String uri, String localName, String qName, Attributes attributes)
           
protected  String transformValue(String val, org.apache.solr.schema.SchemaField schFld)
          Can be used to transform input values based on their SchemaField.

This implementation only formats dates, using DateUtil.

 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SolrContentHandler

public SolrContentHandler(org.apache.tika.metadata.Metadata metadata,
                          org.apache.solr.common.params.SolrParams params,
                          org.apache.solr.schema.IndexSchema schema)

SolrContentHandler

public SolrContentHandler(org.apache.tika.metadata.Metadata metadata,
                          org.apache.solr.common.params.SolrParams params,
                          org.apache.solr.schema.IndexSchema schema,
                          Collection<String> dateFormats)

Method Detail

newDocument

public org.apache.solr.common.SolrInputDocument newDocument()
This is called by a consumer when it is ready to deal with a new SolrInputDocument. Overriding classes can use this hook to add in or change whatever they deem fit for the document at that time. The base implementation adds the metadata as fields, allowing for potential remapping.

Returns:
The SolrInputDocument.
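
The idea of adding metadata as fields with optional remapping can be sketched with plain JDK collections. This is an illustration under stated assumptions, not the base implementation: MetadataFieldsSketch and toFields are hypothetical names, a Map of String arrays stands in for Tika's Metadata, and a Map of lists stands in for SolrInputDocument.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: copy multi-valued metadata into document fields,
// passing every metadata name through a caller-supplied name mapper.
final class MetadataFieldsSketch {
    static Map<String, List<String>> toFields(Map<String, String[]> metadata,
                                              UnaryOperator<String> nameMapper) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> entry : metadata.entrySet()) {
            // Remap the metadata name; names without a mapping pass through unchanged.
            String fieldName = nameMapper.apply(entry.getKey());
            List<String> values = doc.computeIfAbsent(fieldName, k -> new ArrayList<>());
            for (String value : entry.getValue()) {
                values.add(value);
            }
        }
        return doc;
    }
}
```

In this sketch, two metadata names that map to the same field name have their values merged into one multi-valued field. Whether the real class behaves this way is not stated in this page.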

startDocument

public void startDocument()
                   throws SAXException
Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class DefaultHandler
Throws:
SAXException

startElement

public void startElement(String uri,
                         String localName,
                         String qName,
                         Attributes attributes)
                  throws SAXException
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class DefaultHandler
Throws:
SAXException

endElement

public void endElement(String uri,
                       String localName,
                       String qName)
                throws SAXException
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class DefaultHandler
Throws:
SAXException

characters

public void characters(char[] chars,
                       int offset,
                       int length)
                throws SAXException
Specified by:
characters in interface ContentHandler
Overrides:
characters in class DefaultHandler
Throws:
SAXException

transformValue

protected String transformValue(String val,
                                org.apache.solr.schema.SchemaField schFld)
Can be used to transform input values based on their SchemaField.

This implementation only formats dates, using DateUtil.

Parameters:
val - The value to transform
schFld - The SchemaField
Returns:
The potentially new value.
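
A rough sketch of this kind of date transformation, using only the JDK. It does not reproduce DateUtil or the schema-type check: DateValueSketch and normalizeDate are hypothetical names, the UTC output pattern is an assumption about the target format, and the fallback of returning the input unchanged is a choice made for this sketch.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Collection;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: try each candidate pattern in order and, on the first
// successful parse, re-emit the value in a single canonical UTC format.
final class DateValueSketch {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");
    // Assumed canonical output format (ISO 8601, UTC).
    private static final String OUTPUT_PATTERN = "yyyy-MM-dd'T'HH:mm:ss'Z'";

    static String normalizeDate(String val, Collection<String> dateFormats) {
        for (String pattern : dateFormats) {
            SimpleDateFormat in = new SimpleDateFormat(pattern, Locale.ROOT);
            in.setTimeZone(UTC);
            in.setLenient(false);
            try {
                Date parsed = in.parse(val);
                SimpleDateFormat out = new SimpleDateFormat(OUTPUT_PATTERN, Locale.ROOT);
                out.setTimeZone(UTC);
                return out.format(parsed);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        // No pattern matched: return the value unchanged.
        return val;
    }
}
```

Pattern order matters here, because SimpleDateFormat.parse accepts a matching prefix and ignores trailing text. List more specific patterns before shorter ones.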

getBoost

protected float getBoost(String name)
Get the value of any boost factor for the mapped name.

Parameters:
name - The name of the field to see if there is a boost specified
Returns:
The boost value

findMappedName

protected String findMappedName(String name)
Get the mapped name for the given name, if a mapping exists.

Parameters:
name - The name to check to see if there is a mapping
Returns:
The new name, if there is one, else name
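
The lookups described for getBoost and findMappedName can be sketched as prefixed parameter lookups with a default. This is an assumption about how such lookups might work, not the class's code: PrefixedParamsSketch is a hypothetical name, a plain Map stands in for SolrParams, the prefix strings "fmap." and "boost." are assumed values for MAP_PREFIX and BOOST_PREFIX, and the default boost of 1.0 is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve per-field settings from request parameters
// whose keys are built as prefix + field name.
final class PrefixedParamsSketch {
    // Assumed values; check ExtractingParams for the real constants.
    static final String MAP_PREFIX = "fmap.";
    static final String BOOST_PREFIX = "boost.";

    private final Map<String, String> params;

    PrefixedParamsSketch(Map<String, String> params) {
        this.params = new HashMap<>(params);
    }

    // Returns the mapped name if a mapping parameter exists, else the name itself.
    String findMappedName(String name) {
        return params.getOrDefault(MAP_PREFIX + name, name);
    }

    // Returns the configured boost for the name, or 1.0 when none is given.
    float getBoost(String name) {
        String raw = params.get(BOOST_PREFIX + name);
        return raw == null ? 1.0f : Float.parseFloat(raw);
    }
}
```

With the parameters fmap.author=author_s and boost.title=2.5, this sketch maps "author" to "author_s", leaves "title" unmapped, and returns 2.5 as the boost for "title".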


Copyright © 2011 Apache Software Foundation. All Rights Reserved.