com.quiotix.html.parser
Class HtmlScrubber
java.lang.Object
  com.quiotix.html.parser.HtmlVisitor
    com.quiotix.html.parser.HtmlScrubber
public class HtmlScrubber extends HtmlVisitor
HtmlScrubber is a visitor that walks an HtmlDocument and cleans it up.
It can convert tags and tag attributes to uppercase or lowercase, strip
unnecessary quotes from attribute values, and strip trailing spaces
before a newline.
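This page does not say when a quote counts as "unnecessary". Below is a minimal sketch that assumes the conservative HTML 4.01 rule (section 3.2.2): an attribute value may be left unquoted only if it consists solely of letters, digits, hyphens, periods, underscores, and colons. The class `QuoteStripSketch` and its `stripQuotes` method are hypothetical and are not part of the Quiotix API.

```java
// Hypothetical helper, not part of com.quiotix.html.parser.
// Assumes the HTML 4.01 rule: a value may be unquoted only if it contains
// letters, digits, '-', '.', '_' and ':' exclusively.
final class QuoteStripSketch {
    private QuoteStripSketch() {}

    /** Returns the value without its surrounding quotes if they are unnecessary. */
    static String stripQuotes(String value) {
        if (value.length() < 3) {
            return value; // too short to be a quoted, non-empty value
        }
        char first = value.charAt(0);
        char last = value.charAt(value.length() - 1);
        if ((first != '"' && first != '\'') || last != first) {
            return value; // not a properly quoted value, leave it alone
        }
        String inner = value.substring(1, value.length() - 1);
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == ':';
            if (!safe) {
                return value; // quotes are required, keep them
            }
        }
        return inner;
    }
}
```

For example, `stripQuotes("\"center\"")` returns `center`, while `"two words"` keeps its quotes because of the space. HTML5 allows more characters in unquoted values, so the real scrubber may use a different test.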
- Author:
- Brian Goetz, Quiotix
Additional contributions by: Thorsten Weber
Constructor Summary
HtmlScrubber()
Create an HtmlScrubber with the default options (downcase tags and tag attributes, strip out unnecessary quotes).
HtmlScrubber(int flags)
Create an HtmlScrubber with the desired set of options.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
TAGS_UPCASE
public static final int TAGS_UPCASE
- See Also:
- Constant Field Values
TAGS_DOWNCASE
public static final int TAGS_DOWNCASE
- See Also:
- Constant Field Values
ATTR_UPCASE
public static final int ATTR_UPCASE
- See Also:
- Constant Field Values
ATTR_DOWNCASE
public static final int ATTR_DOWNCASE
- See Also:
- Constant Field Values
STRIP_QUOTES
public static final int STRIP_QUOTES
- See Also:
- Constant Field Values
TRIM_SPACES
public static final int TRIM_SPACES
- See Also:
- Constant Field Values
DEFAULT_OPTIONS
public static final int DEFAULT_OPTIONS
- See Also:
- Constant Field Values
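The option constants are meant to be combined into the `flags` bitmask with bitwise OR. The actual values are only listed on the separate "Constant Field Values" page, so the sketch below uses hypothetical powers of two in a stand-in class `ScrubFlagsSketch`. Its `DEFAULT_OPTIONS` follows the documented default (downcase tags and tag attributes, strip unnecessary quotes).

```java
// Hypothetical stand-in for the HtmlScrubber option constants.
// The real values may differ; only the distinct-bit pattern matters here.
final class ScrubFlagsSketch {
    private ScrubFlagsSketch() {}

    static final int TAGS_UPCASE   = 1 << 0;
    static final int TAGS_DOWNCASE = 1 << 1;
    static final int ATTR_UPCASE   = 1 << 2;
    static final int ATTR_DOWNCASE = 1 << 3;
    static final int STRIP_QUOTES  = 1 << 4;
    static final int TRIM_SPACES   = 1 << 5;

    // Mirrors the documented default: downcase tags and attributes, strip quotes.
    static final int DEFAULT_OPTIONS = TAGS_DOWNCASE | ATTR_DOWNCASE | STRIP_QUOTES;

    /** True if every bit of {@code option} is set in {@code flags}. */
    static boolean isSet(int flags, int option) {
        return (flags & option) == option;
    }
}
```

With the real class, adding trailing-space trimming to the defaults would follow the same idiom: `new HtmlScrubber(HtmlScrubber.DEFAULT_OPTIONS | HtmlScrubber.TRIM_SPACES)`.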
flags
protected int flags
previousElement
protected HtmlDocument.HtmlElement previousElement
inPreBlock
protected boolean inPreBlock
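The fields `previousElement` and `inPreBlock` are undocumented. One plausible reading, which is an assumption, is that the scrubber remembers the element preceding each newline so it can trim that element's trailing spaces, and that it skips trimming inside `<pre>`, where whitespace is significant. The sketch below models that idea on a flat token list. `TrimSketch`, `Kind`, `Token`, and `trimBeforeNewlines` are hypothetical, and the sketch always trims instead of checking a `TRIM_SPACES` flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token model; the real parser uses HtmlDocument.Tag, EndTag,
// Text and Newline elements instead.
final class TrimSketch {
    private TrimSketch() {}

    enum Kind { TAG, END_TAG, TEXT, NEWLINE }

    static final class Token {
        final Kind kind;
        final String value; // tag name or text; empty for NEWLINE

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /** Strips trailing spaces from text directly before a newline, except inside pre. */
    static List<Token> trimBeforeNewlines(List<Token> in) {
        List<Token> out = new ArrayList<>();
        boolean inPreBlock = false;
        for (Token t : in) {
            if (t.kind == Kind.TAG && t.value.equalsIgnoreCase("pre")) {
                inPreBlock = true;
            } else if (t.kind == Kind.END_TAG && t.value.equalsIgnoreCase("pre")) {
                inPreBlock = false;
            } else if (t.kind == Kind.NEWLINE && !inPreBlock && !out.isEmpty()) {
                int last = out.size() - 1;
                Token previous = out.get(last); // plays the role of previousElement
                if (previous.kind == Kind.TEXT) {
                    out.set(last, new Token(Kind.TEXT, previous.value.replaceAll(" +$", "")));
                }
            }
            out.add(t);
        }
        return out;
    }
}
```

A visitor-based version would keep these two variables as fields and reset them before each document, which may be one reason the class overrides `start()`.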
Constructor Detail
HtmlScrubber
public HtmlScrubber()
- Create an HtmlScrubber with the default options (downcase tags and tag attributes, strip out unnecessary quotes).
HtmlScrubber
public HtmlScrubber(int flags)
- Create an HtmlScrubber with the desired set of options.
- Parameters:
- flags - A bitmask representing the desired scrubbing options
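This page does not say what happens when both the upcase and downcase bits for the same target are set. Below is a hypothetical helper showing how a bitmask could drive case conversion of a tag or attribute name. The bit values are placeholders, and letting upcase win when both bits are set is an arbitrary choice made only for this sketch.

```java
import java.util.Locale;

// Hypothetical sketch of flag-driven case conversion.
// Bit values are placeholders, not the real HtmlScrubber constants.
final class CaseFoldSketch {
    private CaseFoldSketch() {}

    static final int UPCASE = 1;
    static final int DOWNCASE = 2;

    /** Applies the requested case; UPCASE wins if both bits are set (arbitrary). */
    static String applyCase(String name, int flags) {
        if ((flags & UPCASE) != 0) {
            return name.toUpperCase(Locale.ROOT);
        }
        if ((flags & DOWNCASE) != 0) {
            return name.toLowerCase(Locale.ROOT);
        }
        return name; // neither bit set: leave the name unchanged
    }
}
```

Passing `Locale.ROOT` matters for markup. Under a Turkish default locale, `"title".toUpperCase()` yields `TİTLE` with a dotted capital I, which no longer matches the tag name.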
Method Detail
start
public void start()
- Overrides:
- start in class HtmlVisitor
visit
public void visit(HtmlDocument.Tag t)
- Overrides:
- visit in class HtmlVisitor
visit
public void visit(HtmlDocument.EndTag t)
- Overrides:
- visit in class HtmlVisitor
visit
public void visit(HtmlDocument.Text t)
- Overrides:
- visit in class HtmlVisitor
visit
public void visit(HtmlDocument.Comment c)
- Overrides:
- visit in class HtmlVisitor
visit
public void visit(HtmlDocument.Newline n)
- Overrides:
- visit in class HtmlVisitor
visit
public void visit(HtmlDocument.Annotation a)
- Overrides:
- visit in class HtmlVisitor
visit
public void visit(HtmlDocument.TagBlock bl)
- Overrides:
- visit in class HtmlVisitor
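Each `visit` overload above handles one element type. Below is a minimal sketch of the double-dispatch pattern that lets a walker reach the right overload. It uses hypothetical classes (`Visitor`, `Element`, `StartTag`, `EndTag`, `DowncaseVisitor`) in place of `HtmlVisitor` and the `HtmlDocument` element types, models only two element types, and assumes an `accept` method name that this page does not document.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical miniature of the visitor structure; not the Quiotix classes.
abstract class Visitor {
    void visit(StartTag t) {}
    void visit(EndTag t) {}

    /** Walks a list of elements, letting each one select the matching overload. */
    void visitAll(List<Element> elements) {
        for (Element e : elements) {
            e.accept(this);
        }
    }
}

abstract class Element {
    abstract void accept(Visitor v);
}

final class StartTag extends Element {
    String name;
    StartTag(String name) { this.name = name; }
    // The static type of 'this' (StartTag) selects visit(StartTag).
    @Override void accept(Visitor v) { v.visit(this); }
}

final class EndTag extends Element {
    String name;
    EndTag(String name) { this.name = name; }
    @Override void accept(Visitor v) { v.visit(this); }
}

/** Downcases tag names in place, much as the default scrubber options do. */
final class DowncaseVisitor extends Visitor {
    @Override void visit(StartTag t) { t.name = t.name.toLowerCase(Locale.ROOT); }
    @Override void visit(EndTag t) { t.name = t.name.toLowerCase(Locale.ROOT); }
}
```

Java resolves overloads at compile time, so the `accept` method in each element class is what converts the runtime type into the correct `visit` call. A subclass then overrides only the overloads it needs and inherits the empty defaults for the rest.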