org.pdfbox.util
Class PDFHighlighter

java.lang.Object
  extended by org.pdfbox.util.PDFStreamEngine
      extended by org.pdfbox.util.PDFTextStripper
          extended by org.pdfbox.util.PDFHighlighter

public class PDFHighlighter
extends PDFTextStripper

Generates an XML highlight file that marks occurrences of words in a PDF document.

Version:
$Revision: 1.5 $
Author:
slagraulet (slagraulet@cardiweb.com), Ben Litchfield (ben@csh.rit.edu)
See Also:
Adobe Highlight File Format

Field Summary
 
Fields inherited from class org.pdfbox.util.PDFTextStripper
output
 
Constructor Summary
PDFHighlighter()
          Default constructor.
 
Method Summary
protected  void endPage(PDPage pdPage)
          End a page.
 void generateXMLHighlight(PDDocument pdDocument, String[] sWords, Writer xmlOutput)
          Generate an XML highlight string based on the PDF.
 void generateXMLHighlight(PDDocument pdDocument, String highlightWord, Writer xmlOutput)
          Generate an XML highlight string based on the PDF.
 Color getHighlightColor()
          Get the color to highlight the strings with.
 String getHighlightColorAsString()
          Get the highlight color as an HTML-like string.
static void main(String[] args)
          Command line application.
 void setHighlightColor(Color color)
          Set the color to highlight the strings with.
 void setHighlightColor(String color)
          Set the highlight color using an HTML-like RGB string.
 
Methods inherited from class org.pdfbox.util.PDFTextStripper
endDocument, endParagraph, flushText, getCharactersByArticle, getCurrentPage, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getStartBookmark, getStartPage, getText, getText, getWordSeparator, processPage, processPages, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSuppressDuplicateOverlappingText, showCharacter, startDocument, startPage, startParagraph, writeCharacters, writeText, writeText
 
Methods inherited from class org.pdfbox.util.PDFStreamEngine
getColorSpaces, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, processOperator, processOperator, processStream, processSubStream, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFHighlighter

public PDFHighlighter()
               throws IOException
Default constructor.

Throws:
IOException - If there is an error constructing this class.

Method Detail

generateXMLHighlight

public void generateXMLHighlight(PDDocument pdDocument,
                                 String highlightWord,
                                 Writer xmlOutput)
                          throws IOException
Generate an XML highlight string based on the PDF.

Parameters:
pdDocument - The PDF to find words in.
highlightWord - The word to search for.
xmlOutput - The writer that receives the resulting XML highlight output.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.

generateXMLHighlight

public void generateXMLHighlight(PDDocument pdDocument,
                                 String[] sWords,
                                 Writer xmlOutput)
                          throws IOException
Generate an XML highlight string based on the PDF.

Parameters:
pdDocument - The PDF to find words in.
sWords - The words to search for.
xmlOutput - The writer that receives the resulting XML highlight output.
Throws:
IOException - If there is an error reading from the PDF, or writing to the XML.

endPage

protected void endPage(PDPage pdPage)
                throws IOException
Description copied from class: PDFTextStripper
End a page. Default implementation is to do nothing. Subclasses may provide additional information.

Overrides:
endPage in class PDFTextStripper
Parameters:
pdPage - The page that has just been processed.
Throws:
IOException - If there is any error writing to the stream.
See Also:
PDFTextStripper.endPage( PDPage )

main

public static void main(String[] args)
                 throws IOException
Command line application.

Parameters:
args - The command line arguments to the application.
Throws:
IOException - If there is an error generating the highlight file.

getHighlightColor

public Color getHighlightColor()
Get the color to highlight the strings with. Default is Color.YELLOW.

Returns:
The color to highlight strings with.

setHighlightColor

public void setHighlightColor(Color color)
Set the color to highlight the strings with. Default is Color.YELLOW.

Parameters:
color - The color to highlight strings with.

setHighlightColor

public void setHighlightColor(String color)
Set the highlight color using an HTML-like RGB string. The string must be exactly 6 characters long.

Parameters:
color - The color to use for highlighting. Should be in the format of "FF0000".
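
The six-character format described above can be illustrated with a small standalone sketch. It is not the PDFHighlighter implementation: the class name HighlightColorParseSketch and the helper parseHexColor are hypothetical, and the strict validation is an assumption, since this page does not say how invalid input is handled.

```java
import java.awt.Color;

public class HighlightColorParseSketch {

    // Hypothetical helper, not part of PDFBox: turns a six-digit
    // hexadecimal RGB string such as "FF0000" into a java.awt.Color.
    public static Color parseHexColor(String rgb) {
        // Assumption: reject anything that is not exactly six hex digits.
        if (rgb == null || !rgb.matches("[0-9A-Fa-f]{6}")) {
            throw new IllegalArgumentException(
                "Expected six hexadecimal digits, for example FF0000");
        }
        // Red occupies bits 16-23, green bits 8-15, blue bits 0-7.
        return new Color(Integer.parseInt(rgb, 16));
    }

    public static void main(String[] args) {
        Color red = parseHexColor("FF0000");
        // prints "255,0,0"
        System.out.println(red.getRed() + "," + red.getGreen() + "," + red.getBlue());
    }
}
```

With the real class, the equivalent call would presumably be setHighlightColor("FF0000") on a PDFHighlighter instance.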

getHighlightColorAsString

public String getHighlightColorAsString()
Get the highlight color as an HTML-like string. This will return a string of six characters.

Returns:
The current highlight color, for example "FF0000".
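
As a rough illustration of the six-character result, the following standalone sketch formats a java.awt.Color as a hexadecimal RGB string. The class name HighlightColorFormatSketch and the helper toHexString are hypothetical and are not taken from PDFBox. The sketch uses upper-case digits to match the "FF0000" example; whether the real method returns upper or lower case is not stated on this page.

```java
import java.awt.Color;

public class HighlightColorFormatSketch {

    // Hypothetical helper, not part of PDFBox: formats a color as six
    // hexadecimal digits in red, green, blue order.
    public static String toHexString(Color color) {
        // getRGB() includes the alpha channel in the top byte; mask it off.
        int rgb = color.getRGB() & 0xFFFFFF;
        // Zero-pad to six digits so that dark colors keep their leading zeros.
        return String.format("%06X", rgb);
    }

    public static void main(String[] args) {
        // Color.YELLOW is the documented default highlight color.
        System.out.println(toHexString(Color.YELLOW)); // prints "FFFF00"
    }
}
```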