org.apache.cocoon.portal.util
Class HtmlDomParser

java.lang.Object
  extended by org.apache.xerces.parsers.XMLParser
      extended by org.apache.xerces.parsers.AbstractXMLDocumentParser
          extended by org.apache.xerces.parsers.AbstractDOMParser
              extended by org.apache.cocoon.portal.util.HtmlDomParser
All Implemented Interfaces:
org.apache.xerces.xni.XMLDocumentHandler, org.apache.xerces.xni.XMLDTDContentModelHandler, org.apache.xerces.xni.XMLDTDHandler

public class HtmlDomParser
extends org.apache.xerces.parsers.AbstractDOMParser

This parser uses the NekoHTML parser to parse HTML and generate a DOM document.

Version:
$Id: HtmlDomParser.java 280293 2005-09-12 08:52:02Z cziegeler $

Field Summary
 
Fields inherited from class org.apache.xerces.parsers.AbstractDOMParser
abort, CORE_DOCUMENT_CLASS_NAME, CREATE_CDATA_NODES_FEATURE, CREATE_ENTITY_REF_NODES, CURRENT_ELEMENT_NODE, DEFAULT_DOCUMENT_CLASS_NAME, DEFER_NODE_EXPANSION, DOCUMENT_CLASS_NAME, fBaseURIStack, fCreateCDATANodes, fCreateEntityRefNodes, fCurrentCDATASection, fCurrentCDATASectionIndex, fCurrentEntityDecl, fCurrentNode, fCurrentNodeIndex, fDeferNodeExpansion, fDeferredDocumentImpl, fDeferredEntityDecl, fDocument, fDocumentClassName, fDocumentImpl, fDocumentIndex, fDocumentType, fDocumentTypeIndex, fDOMFilter, fErrorHandler, fFilterReject, fFirstChunk, fInCDATASection, fIncludeComments, fIncludeIgnorableWhitespace, fInDTD, fInDTDExternalSubset, fInEntityRef, fInternalSubset, fNamespaceAware, fRejectedElement, fRoot, fSkippedElemStack, fStorePSVI, fStringBuffer, INCLUDE_COMMENTS_FEATURE, INCLUDE_IGNORABLE_WHITESPACE, NAMESPACES, PSVI_DOCUMENT_CLASS_NAME
 
Fields inherited from class org.apache.xerces.parsers.AbstractXMLDocumentParser
fDocumentSource, fDTDContentModelSource, fDTDSource
 
Fields inherited from class org.apache.xerces.parsers.XMLParser
ENTITY_RESOLVER, ERROR_HANDLER, fConfiguration
 
Fields inherited from interface org.apache.xerces.xni.XMLDTDHandler
CONDITIONAL_IGNORE, CONDITIONAL_INCLUDE
 
Fields inherited from interface org.apache.xerces.xni.XMLDTDContentModelHandler
OCCURS_ONE_OR_MORE, OCCURS_ZERO_OR_MORE, OCCURS_ZERO_OR_ONE, SEPARATOR_CHOICE, SEPARATOR_SEQUENCE
 
Constructor Summary
HtmlDomParser(Properties properties)
           
 
Method Summary
protected static org.cyberneko.html.HTMLConfiguration getConfig(Properties properties)
           
static Document parse(String systemId, InputStream stream, String encoding)
          Parses HTML.
 
Methods inherited from class org.apache.xerces.parsers.AbstractDOMParser
abort, attributeDecl, characters, comment, createAttrNode, createElementNode, doctypeDecl, elementDecl, emptyElement, endAttlist, endCDATA, endConditional, endDocument, endDTD, endElement, endExternalSubset, endGeneralEntity, endParameterEntity, externalEntityDecl, getDocument, getDocumentClassName, handleBaseURI, handleBaseURI, ignorableWhitespace, ignoredCharacters, internalEntityDecl, notationDecl, processingInstruction, reset, setCharacterData, setDocumentClassName, setLocale, startAttlist, startCDATA, startConditional, startDocument, startDTD, startElement, startExternalSubset, startGeneralEntity, startParameterEntity, textDecl, unparsedEntityDecl, xmlDecl
 
Methods inherited from class org.apache.xerces.parsers.AbstractXMLDocumentParser
any, element, empty, endContentModel, endGroup, getDocumentSource, getDTDContentModelSource, getDTDSource, occurrence, pcdata, separator, setDocumentSource, setDTDContentModelSource, setDTDSource, startContentModel, startGroup
 
Methods inherited from class org.apache.xerces.parsers.XMLParser
parse
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlDomParser

public HtmlDomParser(Properties properties)
Method Detail

getConfig

protected static org.cyberneko.html.HTMLConfiguration getConfig(Properties properties)

parse

public static Document parse(String systemId,
                             InputStream stream,
                             String encoding)
                      throws IOException
Parses the HTML read from the given stream and returns the resulting document.

Throws:
IOException
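
Example (a minimal sketch, not taken from the Cocoon sources): the helper below shows one way to prepare the three arguments of parse when the HTML is held in a String. The names HtmlParseSketch, HtmlParser and parseString are hypothetical, and the unqualified Document return type is assumed to be org.w3c.dom.Document. The small functional interface only mirrors the signature of parse, so the sketch compiles without the Cocoon and NekoHTML jars on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
final class HtmlParseSketch {

    /** Mirrors the shape of HtmlDomParser.parse(String, InputStream, String). */
    @FunctionalInterface
    interface HtmlParser {
        Document parse(String systemId, InputStream stream, String encoding)
                throws IOException;
    }

    /**
     * Encodes the HTML text as UTF-8, wraps the bytes in a stream, and
     * passes the stream to the parser together with the matching encoding name.
     */
    static Document parseString(HtmlParser parser, String systemId, String html)
            throws IOException {
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            // The encoding name must match the charset used to produce the bytes.
            return parser.parse(systemId, in, "UTF-8");
        }
    }

    private HtmlParseSketch() {
        // Static helper, not meant to be instantiated.
    }
}
```

With Cocoon on the classpath, and assuming the signatures match as described, the call would look like HtmlParseSketch.parseString(HtmlDomParser::parse, systemId, html). Passing the parser as a parameter also lets the helper be exercised with a stub in tests.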


Copyright © 1999-2005 The Apache Software Foundation. All Rights Reserved.