Module Iterator
Iterate over records of an XML parse tree.
The standard parser is callback-based and runs over all the elements of
a file. If the file contains records, many people would like to iterate
over each record and use the callback parser only to analyze that
record.
If the expression is a 'ParseRecords', then the code to do this is
easy: use its make_reader to grab records and its record_expression to
parse them. However, this isn't general enough. The use of a
ParseRecords in the format definition should be strictly an
implementation decision for better memory use. So there needs to be an
API which allows both full and record-oriented parsers.
Here's an example use of the API:

>>> import sys
>>> import swissprot38  # one is in Martel/test/testformats
>>> from xml.dom import pulldom
>>> iterator = swissprot38.format.make_iterator("swissprot38_record")
>>> text = open("sample.swissprot").read()
>>> for record in iterator.iterateString(text, pulldom.SAX2DOM()):
...     print "Read a record with the following AC numbers:"
...     for acc in record.document.getElementsByTagName("ac_number"):
...         acc.writexml(sys.stdout)
...         sys.stdout.write(" ")
...
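
The record-at-a-time pattern in the example can be tried without Martel.
The following is a minimal sketch that uses only the standard library's
xml.dom.pulldom (Python 3) as a stand-in; it is not how Martel implements
its iterator. The helper names iterate_records and accession_numbers and
the "entry" tag are hypothetical; only "ac_number" comes from the example
above.

```python
from xml.dom import pulldom


def iterate_records(xml_text, record_tag):
    """Yield one fully built DOM element for each record_tag element.

    Stand-in for a record iterator: only the current record is expanded
    into a DOM subtree, so the whole document is never held as one tree.
    """
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == record_tag:
            # Pull in all children of this record before handing it out.
            events.expandNode(node)
            yield node


def accession_numbers(record):
    """Return the text of every ac_number element inside one record."""
    numbers = []
    for acc in record.getElementsByTagName("ac_number"):
        text = "".join(
            child.data
            for child in acc.childNodes
            if child.nodeType == child.TEXT_NODE
        )
        numbers.append(text)
    return numbers
```

Each yielded record is an ordinary DOM element, so the usual DOM calls
such as getElementsByTagName work on it, much as record.document does in
the Martel example.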
There are two parts to the API. One is the EventStream. This
contains a single method called "next()" which returns a list
of SAX events, each a 2-tuple (event_name, args). It is called
repeatedly to return successive event lists and returns None when no
events are available.
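
The EventStream contract described above can be sketched as follows. This
is an illustration under stated assumptions, not Martel's own code: the
class ListEventStream, the function replay and the handler TagCollector
are hypothetical, and the sketch assumes that each event_name matches a
method name on a standard xml.sax ContentHandler.

```python
from xml.sax.handler import ContentHandler


class ListEventStream:
    """Hypothetical EventStream that hands out pre-built event batches."""

    def __init__(self, batches):
        self._batches = list(batches)

    def next(self):
        # Return the next list of (event_name, args) 2-tuples,
        # or None once no events are available.
        if not self._batches:
            return None
        return self._batches.pop(0)


def replay(stream, handler):
    """Drain the stream, dispatching each event to the handler.

    Assumes event_name is the name of a handler method and args is the
    tuple of positional arguments for it. Returns the number of events.
    """
    count = 0
    while True:
        events = stream.next()
        if events is None:
            break
        for event_name, args in events:
            getattr(handler, event_name)(*args)
            count += 1
    return count


class TagCollector(ContentHandler):
    """Records the name of every element that is started."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)
```

A caller would build a stream, pass it to replay together with any
ContentHandler, and stop when next() returns None. Returning None rather
than raising an exception follows the description above.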
The other is the Iterator, the object returned by make_iterator in the
example above.
Sean McGrath has a RAX parser (Record API for XML) which uses a
concept similar to this.