org.apache.jackrabbit.core.query
Class PdfTextFilter

java.lang.Object
  extended by org.apache.jackrabbit.core.query.PdfTextFilter
All Implemented Interfaces:
org.apache.jackrabbit.core.query.TextFilter

public class PdfTextFilter
extends Object
implements org.apache.jackrabbit.core.query.TextFilter

Extracts text from Adobe PDF document binary data. The implementation is taken from the Jakarta Slide class org.apache.slide.extractor.PDFExtractor.
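Because PdfTextFilter implements TextFilter, a caller holding several filters can ask each one through canFilter whether it handles a MIME type before calling doFilter. The sketch below illustrates that dispatch pattern only. It uses a simplified, hypothetical SimpleTextFilter interface (byte array in, map of Readers out) in place of the real TextFilter, whose doFilter takes a PropertyState; PlainTextStub, PdfStub and FilterDispatcher are likewise hypothetical and are not Jackrabbit classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for org.apache.jackrabbit.core.query.TextFilter.
// The real doFilter takes a PropertyState; a byte array is used here for brevity.
interface SimpleTextFilter {
    boolean canFilter(String mimeType);

    Map<String, Reader> doFilter(byte[] data, String encoding) throws IOException;
}

// Hypothetical plain-text filter, included only to give the dispatcher a second choice.
class PlainTextStub implements SimpleTextFilter {
    public boolean canFilter(String mimeType) {
        return "text/plain".equalsIgnoreCase(mimeType);
    }

    public Map<String, Reader> doFilter(byte[] data, String encoding) throws IOException {
        // "FULLTEXT" is a placeholder key, not the real FieldNames.FULLTEXT value.
        return Collections.singletonMap("FULLTEXT",
                (Reader) new StringReader(new String(data, encoding)));
    }
}

// Hypothetical PDF stub: it claims application/pdf but performs no real parsing.
class PdfStub implements SimpleTextFilter {
    public boolean canFilter(String mimeType) {
        return "application/pdf".equalsIgnoreCase(mimeType);
    }

    public Map<String, Reader> doFilter(byte[] data, String encoding) throws IOException {
        throw new IOException("PDF extraction is not implemented in this sketch");
    }
}

// Picks the first registered filter whose canFilter accepts the MIME type.
class FilterDispatcher {
    private final List<SimpleTextFilter> filters;

    FilterDispatcher(List<SimpleTextFilter> filters) {
        this.filters = filters;
    }

    static FilterDispatcher of(SimpleTextFilter... filters) {
        return new FilterDispatcher(Arrays.asList(filters));
    }

    // Returns null when no registered filter handles the MIME type.
    SimpleTextFilter select(String mimeType) {
        for (SimpleTextFilter f : filters) {
            if (f.canFilter(mimeType)) {
                return f;
            }
        }
        return null;
    }
}
```

Taking the first match makes the result depend on registration order; a caller could instead collect every filter that accepts the type.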


Constructor Summary
PdfTextFilter()
           
 
Method Summary
 boolean canFilter(String mimeType)
           Returns true for application/pdf, false otherwise.
 Map doFilter(org.apache.jackrabbit.core.state.PropertyState data, String encoding)
          Returns a map with a single entry for field FieldNames.FULLTEXT.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PdfTextFilter

public PdfTextFilter()
Method Detail

canFilter

public boolean canFilter(String mimeType)
Specified by:
canFilter in interface org.apache.jackrabbit.core.query.TextFilter
Returns:
true for application/pdf, false otherwise.
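The canFilter contract amounts to a single MIME-type comparison. A minimal sketch of that check follows as a hypothetical standalone helper (PdfMimeCheck is not part of Jackrabbit). It assumes a case-insensitive comparison, since MIME type and subtype names are case-insensitive, and that a null argument yields false; this page does not say whether the real class matches case-insensitively.

```java
// Hypothetical standalone version of the MIME check; not the Jackrabbit source.
final class PdfMimeCheck {
    static final String PDF_MIME_TYPE = "application/pdf";

    private PdfMimeCheck() {
    }

    // Assumption: case-insensitive match; equalsIgnoreCase(null) returns false,
    // so a null MIME type is simply rejected.
    static boolean canFilter(String mimeType) {
        return PDF_MIME_TYPE.equalsIgnoreCase(mimeType);
    }
}
```

An exact match like this rejects values carrying MIME parameters (for example "application/pdf; version=1.4"); a caller that may see them would need to strip the parameters first.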

doFilter

public Map doFilter(org.apache.jackrabbit.core.state.PropertyState data,
                    String encoding)
             throws RepositoryException
Returns a map with a single entry for field FieldNames.FULLTEXT.

Specified by:
doFilter in interface org.apache.jackrabbit.core.query.TextFilter
Parameters:
data - object containing Adobe PDF document data.
encoding - ignored; the text encoding is determined by the PDF data itself.
Returns:
a map with a single Reader value for field FieldNames.FULLTEXT.
Throws:
RepositoryException - if data is a multi-valued property or does not contain a valid PDF document.
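The doFilter contract (one Reader under the full-text field, an exception for multi-valued or invalid input, the encoding argument ignored) can be sketched without a PDF library by delegating extraction to a pluggable function. Everything in the block is hypothetical: FilterException stands in for javax.jcr.RepositoryException, a property is modelled as a List<byte[]> of values (anything other than exactly one value is treated as the multi-valued case), the "FULLTEXT" key is a placeholder for FieldNames.FULLTEXT, and the %PDF- header test is only a cheap plausibility check, not real validation. The actual class takes its extraction logic from Jakarta Slide's PDFExtractor.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for javax.jcr.RepositoryException.
class FilterException extends Exception {
    FilterException(String message) {
        super(message);
    }
}

// Hypothetical extraction hook; a real implementation would call a PDF library.
interface PdfTextExtractor {
    String extract(byte[] pdf) throws IOException;
}

final class PdfFilterSketch {
    // Placeholder key; the real class uses FieldNames.FULLTEXT.
    static final String FULLTEXT = "FULLTEXT";
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private final PdfTextExtractor extractor;

    PdfFilterSketch(PdfTextExtractor extractor) {
        this.extractor = extractor;
    }

    // values models the property's values; encoding is ignored, as documented.
    Map<String, Reader> doFilter(List<byte[]> values, String encoding) throws FilterException {
        if (values.size() != 1) {
            throw new FilterException("PDF text filter requires a single-valued property");
        }
        byte[] data = values.get(0);
        if (!startsWith(data, PDF_MAGIC)) {
            throw new FilterException("data does not start with a PDF header");
        }
        try {
            String text = extractor.extract(data);
            // Exactly one entry: the full-text field mapped to a Reader over the text.
            return Collections.singletonMap(FULLTEXT, (Reader) new StringReader(text));
        } catch (IOException e) {
            throw new FilterException("unable to extract text: " + e.getMessage());
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Returning a Reader rather than a String leaves room for an indexer to consume text incrementally; this sketch still builds the whole string before wrapping it.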


Copyright © -2005 The Apache Software Foundation. All Rights Reserved.