java.lang.Object
  com.google.streamhtmlparser.util.HtmlUtils
public final class HtmlUtils
Utility functions for HTML and Javascript that are most likely not interesting to users outside this package.
The HtmlParser will be open-sourced, so we decided to keep these utilities in this package rather than leverage others that may exist in the google3 code base.
The functionality exposed is designed to be 100% compatible with the corresponding logic in the C version of the HtmlParser; as such, we are particularly concerned with cross-language compatibility.
Note: The words Javascript and ECMAScript are used interchangeably unless otherwise noted.
Nested Class Summary | |
---|---|
static class | HtmlUtils.META_REDIRECT_TYPE: Indicates the type of content contained in the content HTML attribute of the meta HTML tag. |
Method Summary | |
---|---|
static String | encodeCharForAscii(char chr): Encodes the specified character using ASCII for convenient insertion into a single-quote enclosed String. |
static boolean | isAttributeJavascript(String attribute): Determines if the HTML attribute specified expects javascript for its value. |
static boolean | isAttributeStyle(String attribute): Determines if the HTML attribute specified expects a style for its value. |
static boolean | isAttributeUri(String attribute): Determines if the HTML attribute specified expects a URI for its value. |
static boolean | isHtmlSpace(char chr): Determines if the specified character is an HTML whitespace character. |
static boolean | isJavascriptIdentifier(char chr): Determines if the specified character is a valid character in an ECMAScript identifier. |
static boolean | isJavascriptRegexpPrefix(String input): Determines if the input token provided is a valid token prefix to a javascript regular expression. |
static boolean | isJavascriptWhitespace(char chr): Determines if the specified character is an ECMAScript whitespace or line terminator character. |
static HtmlUtils.META_REDIRECT_TYPE | parseContentAttributeForUrl(String value): Parses the given String to determine if it contains a URL in the format followed by the content attribute of the meta HTML tag. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Method Detail |
---|
public static boolean isAttributeJavascript(String attribute)
Determines if the HTML attribute specified expects javascript for its value. Such is the case, for example, with the onclick attribute.
Currently returns true for any attribute name that starts with "on". This is not exactly correct, but we trust developers not to use non-spec-compliant attribute names (e.g. onbogus).
Parameters:
attribute - the name of an HTML attribute
Returns:
false if the input is null or is not an attribute that expects javascript code; true otherwise
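The prefix rule described above can be sketched as follows. This is a minimal illustration, not the library's code: the class and method names are hypothetical, and the case-insensitive comparison is an assumption that the description does not confirm.

```java
import java.util.Locale;

final class AttributeJavascriptSketch {

    // Hypothetical stand-in for the documented behavior: null is rejected,
    // and any name starting with "on" is accepted.
    static boolean looksLikeJavascriptAttribute(String attribute) {
        if (attribute == null) {
            return false;
        }
        // Assumption: attribute names are compared case-insensitively.
        return attribute.toLowerCase(Locale.ROOT).startsWith("on");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJavascriptAttribute("onclick")); // true
        System.out.println(looksLikeJavascriptAttribute("onbogus")); // true: the known over-approximation
        System.out.println(looksLikeJavascriptAttribute("href"));    // false
        System.out.println(looksLikeJavascriptAttribute(null));      // false
    }
}
```

The onbogus case shows the trade-off the description mentions: the check is cheap, but it accepts names that no specification defines.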
public static boolean isAttributeStyle(String attribute)
Determines if the HTML attribute specified expects a style for its value. Currently this is only true for the style HTML attribute.
Parameters:
attribute - the name of an HTML attribute
Returns:
true if and only if the attribute name is one that expects a style for a value; otherwise false
public static boolean isAttributeUri(String attribute)
Determines if the HTML attribute specified expects a URI for its value. For example, both href and src expect a URI but style does not. Returns false if the attribute given was null.
Parameters:
attribute - the name of an HTML attribute
Returns:
true if the attribute name is one that expects a URI for a value; otherwise false
See Also:
ATTRIBUTE_EXPECTS_URI
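A set lookup is one plausible way to implement a check like this. The sketch below is hypothetical: the class name, method name, and the two-element set are illustrative only, since this page does not list the contents of ATTRIBUTE_EXPECTS_URI, and the case-insensitive lookup is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class AttributeUriSketch {

    // Illustrative subset only: just the two attributes named in the description.
    private static final Set<String> URI_ATTRIBUTES =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("href", "src")));

    // Returns false for null, mirroring the documented null handling.
    static boolean expectsUri(String attribute) {
        if (attribute == null) {
            return false;
        }
        // Assumption: lookups ignore case.
        return URI_ATTRIBUTES.contains(attribute.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(expectsUri("href"));  // true
        System.out.println(expectsUri("src"));   // true
        System.out.println(expectsUri("style")); // false
        System.out.println(expectsUri(null));    // false
    }
}
```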
public static boolean isHtmlSpace(char chr)
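Determines if the specified character is an HTML whitespace character. The following characters are considered HTML whitespace: the Space character, the Tab character, the Line feed character, the Carriage Return character, and the Zero-Width Space character (U+200B), which is not included in the C version.
Parameters:
chr - the char to check
Returns:
true if the character is an HTML whitespace character
See Also:
White space
public static boolean isJavascriptWhitespace(char chr)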
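Determines if the specified character is an ECMAScript whitespace or line terminator character. The whitespace characters are Tab, Vertical Tab, Form Feed, Space and No-break space; the line terminators are Line Feed, Carriage Return, Line separator and Paragraph Separator.
Encompasses the characters in sections 7.2 and 7.3 of ECMAScript 3. In particular, this list is quite different from that in Character.isWhitespace.
See the ECMAScript Language Specification.
Parameters:
chr - the char to check
Returns:
true if the character is an ECMAScript whitespace or line terminator character; otherwise false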
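To make the difference from Character.isWhitespace concrete, here is a minimal sketch that encodes the nine characters listed above and compares the result with the JDK method. The class and method names are hypothetical; only the character list comes from the description.

```java
final class JavascriptWhitespaceSketch {

    // Hypothetical stand-in covering the whitespace and line terminator
    // characters named in the description.
    static boolean isEcmaScriptWhitespaceOrLineTerminator(char chr) {
        switch (chr) {
            case '\t':      // Tab
            case 0x000B:    // Vertical Tab
            case '\f':      // Form Feed
            case ' ':       // Space
            case 0x00A0:    // No-break space
            case '\n':      // Line Feed
            case '\r':      // Carriage Return
            case 0x2028:    // Line separator
            case 0x2029:    // Paragraph separator
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        char noBreakSpace = (char) 0x00A0;
        char fileSeparator = (char) 0x001C;

        // No-break space: whitespace for ECMAScript, excluded by Character.isWhitespace.
        System.out.println(isEcmaScriptWhitespaceOrLineTerminator(noBreakSpace)); // true
        System.out.println(Character.isWhitespace(noBreakSpace));                 // false

        // File separator: accepted by Character.isWhitespace, not in the list above.
        System.out.println(isEcmaScriptWhitespaceOrLineTerminator(fileSeparator)); // false
        System.out.println(Character.isWhitespace(fileSeparator));                 // true
    }
}
```

The two methods disagree in both directions, which is why one cannot simply be substituted for the other.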
public static boolean isJavascriptIdentifier(char chr)
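Determines if the specified character is a valid character in an ECMAScript identifier.
An alternative would be to rely on Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, given that Java and Javascript follow similar identifier naming rules, but doing so would lose compatibility with the C version.
Parameters:
chr - char to check
Returns:
true if chr is a valid character in a Javascript identifier; otherwise false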
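The sketch below shows how the two JDK methods mentioned above behave on a few sample characters, next to a simple hand-written check. The hand-written rule (ASCII letters, digits, underscore and dollar sign) is an assumption made for comparison; this page does not state the exact rule the library applies, and the class and method names are hypothetical.

```java
final class JavascriptIdentifierSketch {

    // Assumption for illustration: an ASCII-only rule accepting letters,
    // digits, underscore and dollar sign. The library's actual rule may differ.
    static boolean isAsciiIdentifierChar(char chr) {
        return (chr >= 'a' && chr <= 'z')
            || (chr >= 'A' && chr <= 'Z')
            || (chr >= '0' && chr <= '9')
            || chr == '_'
            || chr == '$';
    }

    public static void main(String[] args) {
        // Samples: a letter, a digit, a dollar sign, a hyphen, and U+00E9 (e with acute).
        char[] samples = {'a', '7', '$', '-', (char) 0x00E9};
        for (char c : samples) {
            System.out.printf("U+%04X ascii=%b javaStart=%b javaPart=%b%n",
                (int) c,
                isAsciiIdentifierChar(c),
                Character.isJavaIdentifierStart(c),
                Character.isJavaIdentifierPart(c));
        }
    }
}
```

Two differences stand out: the JDK methods distinguish the first character from later ones (a digit is a valid part but not a valid start), and they accept non-ASCII letters that a simple ASCII check rejects.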
public static boolean isJavascriptRegexpPrefix(String input)
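Determines if the input token provided is a valid token prefix to a javascript regular expression. The token is checked against a Set of identifiers that can precede a regular expression in the javascript grammar, and the method returns true if the provided String is in that Set.
Parameters:
input - the String token to check
Returns:
true if and only if the token is a valid prefix of a regexp
public static String encodeCharForAscii(char chr)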
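Encodes the specified character using ASCII for convenient insertion into a single-quote enclosed String. Printable characters are returned as-is. Carriage Return, Line Feed, Horizontal Tab, backslash and single quote are all backslash-escaped. All other characters are returned hex-encoded.
Parameters:
chr - char to encode
Returns:
the encoded char, as a String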
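The three encoding rules above can be sketched as follows. Two details are assumptions, since the description does not specify them: "printable" is taken to mean ASCII 0x20 through 0x7E, and the hex encoding is taken to be a four-digit `\uXXXX` escape. The class and method names are hypothetical.

```java
final class AsciiEncodeSketch {

    // Hypothetical stand-in for the documented encoding rules.
    static String encodeForSingleQuotedString(char chr) {
        // Rule 1: the five special characters are backslash-escaped.
        switch (chr) {
            case '\r': return "\\r";
            case '\n': return "\\n";
            case '\t': return "\\t";
            case '\\': return "\\\\";
            case '\'': return "\\'";
            default:   break;
        }
        // Rule 2: printable characters are returned as-is.
        // Assumption: printable means the ASCII range 0x20 to 0x7E.
        if (chr >= 0x20 && chr <= 0x7E) {
            return String.valueOf(chr);
        }
        // Rule 3: everything else is hex-encoded.
        // Assumption: a backslash, the letter u, and four lowercase hex digits.
        return String.format("\\u%04x", (int) chr);
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder("'");
        for (char c : "it's\n".toCharArray()) {
            out.append(encodeForSingleQuotedString(c));
        }
        out.append("'");
        // Prints the text as a single-quoted literal with the quote and newline escaped.
        System.out.println(out);
    }
}
```

Encoding character by character keeps the output pure ASCII, so the result can be placed inside a single-quoted Javascript string without terminating it early.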
public static HtmlUtils.META_REDIRECT_TYPE parseContentAttributeForUrl(String value)
Parses the given String to determine if it contains a URL in the format followed by the content attribute of the meta HTML tag.
This function expects to receive the value of the content HTML attribute. This attribute takes on different meanings depending on the value of the http-equiv HTML attribute of the same meta tag. Since we may not have access to the http-equiv attribute, we instead rely on parsing the given value to determine if it contains a URL.
The specification of the meta HTML tag can be found at: http://dev.w3.org/html5/spec/Overview.html#attr-meta-http-equiv-refresh
We return an HtmlUtils.META_REDIRECT_TYPE value indicating whether the value contains a URL and whether we are at the start of the URL or past the start. We are at the start of the URL if and only if one of the two conditions below is true:
Examples:
A meta tag where the content attribute contains a URL [we are not at the start of the URL]:
<meta http-equiv="refresh" content="5; URL=http://www.google.com">
A meta tag where the content attribute contains a URL [we are at the start of the URL]:
<meta http-equiv="refresh" content="5; URL=">
A meta tag where the content attribute does not contain a URL:
<meta http-equiv="content-type" content="text/html">
Parameters:
value - String to parse
Returns:
HtmlUtils.META_REDIRECT_TYPE indicating the presence of a URL in the given value
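The three examples above can be reproduced with a small classifier. This is a sketch under stated assumptions, not the library's parser: the enum and its constant names are hypothetical, the regular expression is an approximation of the refresh format, and the only "start of URL" case it recognizes is the one shown in the second example, where nothing follows "URL=". Quoted URLs are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetaContentSketch {

    // Hypothetical result type; the constant names are assumptions.
    enum RedirectType { NONE, URL_START, URL }

    // Approximation of the refresh prefix: optional delay, a semicolon, then "URL=".
    private static final Pattern REFRESH_PREFIX =
        Pattern.compile("^\\s*\\d*\\s*;\\s*URL\\s*=\\s*", Pattern.CASE_INSENSITIVE);

    static RedirectType classify(String value) {
        if (value == null) {
            return RedirectType.NONE;
        }
        Matcher matcher = REFRESH_PREFIX.matcher(value);
        if (!matcher.find()) {
            return RedirectType.NONE;
        }
        // If the prefix consumed the whole value, no URL character has been seen yet.
        return matcher.end() == value.length() ? RedirectType.URL_START : RedirectType.URL;
    }

    public static void main(String[] args) {
        System.out.println(classify("5; URL=http://www.google.com")); // URL
        System.out.println(classify("5; URL="));                      // URL_START
        System.out.println(classify("text/html"));                    // NONE
    }
}
```

Classifying from the value alone matches the approach described above: the http-equiv attribute is never consulted, so a content-type value is rejected only because it does not look like a refresh directive.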