Apache JMeter
2.0.1.20050615

org.apache.jmeter.protocol.http.parser
Class HTMLParser

java.lang.Object
  extended by org.apache.jmeter.protocol.http.parser.HTMLParser

public abstract class HTMLParser
extends Object

An HTMLParser parses HTML content to obtain URLs.

Version:
$Revision: 1.23 $ updated on $Date: 2004/03/24 03:04:46 $
Author:
Jordi Salvat i Alabart

Nested Class Summary
static class HTMLParser.Test
           
 
Constructor Summary
protected HTMLParser()
          Protected constructor to prevent instantiation except from within subclasses.
 
Method Summary
 Iterator getEmbeddedResourceURLs(byte[] html, URL baseUrl)
          Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
 Iterator getEmbeddedResourceURLs(byte[] html, URL baseUrl, Collection coll)
          Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
abstract  Iterator getEmbeddedResourceURLs(byte[] html, URL baseUrl, URLCollection coll)
          Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
static HTMLParser getParser()
           
static HTMLParser getParser(String htmlParserClassName)
           
protected  boolean isReusable()
          Parsers should override this method if the parser class is reusable, in which case the class will be cached for the next getParser() call.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLParser

protected HTMLParser()
Protected constructor to prevent instantiation except from within subclasses.

Method Detail

getParser

public static final HTMLParser getParser()

getParser

public static final HTMLParser getParser(String htmlParserClassName)

getEmbeddedResourceURLs

public Iterator getEmbeddedResourceURLs(byte[] html,
                                        URL baseUrl)
                                 throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

URLs should not appear twice in the returned iterator.

Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.

Parameters:
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
Returns:
an Iterator for the resource URLs
Throws:
HTMLParseException
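
Because the returned Iterator may yield either URL objects or, for malformed URLs, plain Strings, a caller has to check the type of each element. The following is a minimal sketch of that check, not JMeter code: the class EmbeddedResourceSorter and its fields are hypothetical names introduced for illustration, and the iterator passed in stands for whatever getEmbeddedResourceURLs returned.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of JMeter): splits parser results into
// well-formed URLs and the raw strings reported for malformed ones.
class EmbeddedResourceSorter {
    final List<URL> urls = new ArrayList<>();
    final List<String> malformed = new ArrayList<>();

    // Takes an Iterator of unknown element type because the documented
    // method returns a raw Iterator.
    static EmbeddedResourceSorter sort(Iterator<?> results) {
        EmbeddedResourceSorter sorted = new EmbeddedResourceSorter();
        while (results.hasNext()) {
            Object item = results.next();
            if (item instanceof URL) {
                sorted.urls.add((URL) item);
            } else {
                // Assumption: anything that is not a URL is the original
                // string of a reference that could not be parsed.
                sorted.malformed.add(String.valueOf(item));
            }
        }
        return sorted;
    }
}
```

A caller would pass the parser's result to EmbeddedResourceSorter.sort(...), download the entries in urls, and log the entries in malformed.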

getEmbeddedResourceURLs

public abstract Iterator getEmbeddedResourceURLs(byte[] html,
                                                 URL baseUrl,
                                                 URLCollection coll)
                                          throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

All URLs should be added to the Collection.

Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException. N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

Parameters:
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - URLCollection
Returns:
an Iterator for the resource URLs
Throws:
HTMLParseException
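
The contract described for this method (resolve each reference against the base URL, avoid duplicates, and hand malformed references back as Strings) can be illustrated without JMeter. The sketch below is an assumption-laden stand-in, not the behaviour of URLCollection or URLString: ResourceUrlSketch and resolveAll are hypothetical names, duplicates are detected by comparing the resolved URL text, and the list of references stands for whatever a concrete parser extracted from the HTML.

```java
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration (not JMeter code) of the documented contract.
class ResourceUrlSketch {
    // Resolves each reference against baseUrl, keeps the first occurrence of
    // every distinct result, and keeps unparseable references as Strings.
    static Iterator<Object> resolveAll(URL baseUrl, List<String> references) {
        // Keyed by text so that de-duplication never compares URL objects.
        Map<String, Object> unique = new LinkedHashMap<>();
        for (String ref : references) {
            try {
                URL resolved = baseUrl.toURI().resolve(ref).toURL();
                unique.putIfAbsent(resolved.toExternalForm(), resolved);
            } catch (URISyntaxException | IllegalArgumentException
                    | MalformedURLException e) {
                // Malformed reference: report the original string instead.
                unique.putIfAbsent(ref, ref);
            }
        }
        return unique.values().iterator();
    }
}
```

With a base URL of http://example.com/index.html, the references img/logo.png and /img/logo.png resolve to the same address and so appear only once, while a reference containing a space is returned as a String.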

getEmbeddedResourceURLs

public Iterator getEmbeddedResourceURLs(byte[] html,
                                        URL baseUrl,
                                        Collection coll)
                                 throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc. N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

Parameters:
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - Collection - will contain URLString objects, not URLs
Returns:
an Iterator for the resource URLs
Throws:
HTMLParseException

isReusable

protected boolean isReusable()
Parsers should override this method if the parser class is reusable, in which case the class will be cached for the next getParser() call.

Returns:
true if the Parser is reusable
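
The interplay between isReusable() and getParser(String) can be sketched as a small factory with a cache. This is only a model of the idea described above, under stated assumptions, and not JMeter's actual implementation: SketchParser, StatelessParser and StatefulParser are hypothetical stand-ins, the default return value of false is an assumption, and the cache here stores parser instances keyed by class name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the documented class; only caching is modelled.
abstract class SketchParser {
    private static final Map<String, SketchParser> CACHE = new HashMap<>();

    // Assumed default: not reusable, so every lookup creates a new instance.
    protected boolean isReusable() {
        return false;
    }

    // Returns a cached parser if one exists, otherwise instantiates the named
    // class and caches the instance only if it declares itself reusable.
    static synchronized SketchParser getParser(String className) {
        SketchParser cached = CACHE.get(className);
        if (cached != null) {
            return cached;
        }
        try {
            SketchParser parser = (SketchParser) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            if (parser.isReusable()) {
                CACHE.put(className, parser);
            }
            return parser;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot create parser " + className, e);
        }
    }
}

// Keeps no per-document state, so it opts in to reuse.
class StatelessParser extends SketchParser {
    @Override
    protected boolean isReusable() {
        return true;
    }
}

// Keeps the default and is therefore created afresh on every call.
class StatefulParser extends SketchParser {
}
```

In this sketch, two calls to getParser with the name of StatelessParser return the same object, while two calls with the name of StatefulParser return distinct objects.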

Copyright © 1998-2005 Apache Software Foundation. All Rights Reserved.