Change History
1.4 Release
Update of Cookbook containing a chapter about rule API.
Patched SAXWriter to not pass in null System or Public IDs which can cause problems in Saxon.
Patched dom4j to work against Jaxen 1.0 RC1 or later build.
Applied patch to bug found by Tom Oehser that XPath expressions using elements or attributes whose name starts
with '_' were not being handled correctly. It turns out this was a SAXPath issue.
Applied patch to bug found by Soumanjoy Das that creating a new DOMDocument then calling createElement() would generate
a ClassCastException.
Applied patch supplied by James Dodd that fixes a MIME encoding issue in the embedded Aelfred parser.
Applied patch to fix bug found by David Frankson. Adding attributes with null values causes problems in XSLT engines,
so adding a null valued attribute is now equivalent to removing the attribute, and null attribute values are silently ignored.
e.g.
Element element = ...;
element.addAttribute( "foo", "123" );
...
Attribute attribute = element.attribute( "foo" );
assertTrue( attribute != null );
...
element.addAttribute( "foo", null );
attribute = element.attribute( "foo" );
assertTrue( attribute == null );
1.3 Release
Patches
Applied patch to bug found by Mike Skells that was causing XPath.matches() to return true for absolute XPaths which selected nodes other than the node provided to the XPath.
Applied patch provided by Stefan that fixes an IndexOutOfBoundsException thrown when using the evaluate() method in DefaultXPath on an empty result set. Also added a test case to org.dom4j.TestXPathBug called testStefan().
Applied patch suggested by Frank Walinsky so that XPath objects are now Serializable.
Applied patch provided by Bill Burton that fixes union pattern matching.
1.2 Release
New Swing TableModel for displaying XML
Added a new Swing TableModel for displaying XML data in a Swing user interface. It uses
an XPath based model to define the rows and column values.
A table definition can be specified using a simple XML format and then loaded in a small amount of code.
e.g. here's an example of a table that will list the servlets used in a web.xml document
<table select="/web-app/servlet">
<column select="servlet-name">Name</column>
<column select="servlet-class">Class</column>
<column select="../servlet-mapping[servlet-name=$Name]/url-pattern">Mapping</column>
</table>
Notice the use of the $Name XPath variable to access other cells on the row.
Here's the pseudo code to display a table for an XML document.
Document tableDefinition = ...;
Document source = ...;
TableModel tableModel = new XMLTableModel( tableDefinition, source );
JTable table = new JTable( tableModel );
There is a sample program in samples/swing/JTableTool which will display any table definition
for a given source XML document. There is an example table definition for the periodic table
in xml/swing/tableForAtoms.xml.
Registering Namespace URIs for XPath
Added a new helper method to make it easier to create namespace contexts for doing
namespace aware XPath expressions. The new setNamespaceURIs(Map) method on XPath makes it easier
to pass in the prefixes and URIs you wish to use in an XPath expression. Here's an example of it in action
Map uris = new HashMap();
uris.put( "SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/" );
uris.put( "m", "urn:xmethodsBabelFish" );
XPath xpath = document.createXPath( "/SOAP-ENV:Envelope/SOAP-ENV:Body/m:BabelFish" );
xpath.setNamespaceURIs( uris );
Node element = xpath.selectSingleNode( document );
In addition DocumentFactory has a setXPathNamespaceURIs(Map) method so that common namespace URIs can be associated with
a DocumentFactory so namespace prefixes can be used across many XPath expressions in an easy way. e.g.
// register prefixes with my factory
Map uris = new HashMap();
uris.put( "SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/" );
uris.put( "m", "urn:xmethodsBabelFish" );
DocumentFactory factory = new DocumentFactory();
factory.setXPathNamespaceURIs( uris );
// now parse a document using my factory
SAXReader reader = new SAXReader();
reader.setDocumentFactory( factory );
Document doc = reader.read( "soap.xml" );
// now lets use the prefixes
Node element = doc.selectSingleNode( "/SOAP-ENV:Envelope/SOAP-ENV:Body/m:BabelFish" );
Whitespace handling
There is a new mergeAdjacentText option available on SAXReader
to concatenate adjacent text nodes into a single Text node.
In addition there is a new stripWhitespaceText option to strip text which occurs between
start/end tags which only consists of whitespace.
For example, parsing the following XML with the stripWhitespaceText option enabled
and the mergeAdjacentText option enabled
will result in a single child node of the parent element, rather than 3 (2 text nodes containing whitespace and one element).
<parent>
<child>foo</child>
</parent>
Note that this option will not break most mixed content markup such as the following,
since it is only whitespace between tag start/ends that gets removed; non-whitespace strings are not trimmed.
<p>hello <b>James</b> how are you?</p>
Both these options together can improve the parsing performance by around 10-12% depending on the
structure of the document. However, the whitespace layout of the XML document will be lost, so only
use these modes in data-centric applications like XML messaging and SOAP.
So a typical SOAP or XML messaging developer, who may care more about performance than
preserving exact whitespace layout, may use the following code to
make the SAX parsing faster.
SAXReader reader = new SAXReader();
reader.setMergeAdjacentText( true );
reader.setStripWhitespaceText( true );
Document doc = reader.read( "soap.xml" );
Patches
Applied patch to HTMLWriter to fix bug found by Dominik Deimling where CDATA sections were not being output correctly.
Patched the setName() method on Element so that elements can be renamed. Also added a new setQName() to the Element interface so that elements can be renamed in a namespace aware manner. Thanks to Robert Lebowitz for this.
Applied fix to bug found by Manfred Lotz that XMLWriter in whitespace trimming mode would sometimes not correctly insert a space when text is separated by newlines. The test case testWhitespaceBug() in org.dom4j.TestXMLWriter reproduces the bug that has now been fixed.
Applied patches supplied by Stefan Graeber that enhance the datatype support to support included schemata and derived element types.
Applied patches suggested by Omer van der Horst Jansen to enable dom4j to work
properly on JDK1.1 platforms. There were some uses of java.util.Stack which have been changed to ArrayList.
Applied patches supplied by Maarten Coene that fix some issues with using the correct
DocumentFactory when using the DOM implementation.
Updated the MSV support to comply with the latest MSV version, 1.12 (Nov 01 2001).
In addition the MSVDemo.java in dom4j/src/samples/validate has been replaced by
JARVDemo.java which now uses the JARV API to validate a dom4j document using the MSV implementation.
This demo can validate any XML document against any DTD, XML Schema, Relax NG, Relax or Trex
schema - thanks to the excellent JARV API and MSV library.
Applied patches supplied by Steen Lehmann that fix handling of external DTD entities in SAXContentHandler
and fix the XML output of the ExternalEntityDecl.
Applied patch to bug found by Steen Lehmann that XPath expressions on the root element
were not handling namespaces correctly. The test case is demonstrated in
dom4j/src/test/org/dom4j/xpath/TestSelectSingleNode.java
Added patch for bug found by Howard Moore when using XTags that XPath string values which contained
strings with entities, such as the use of & in a text, would result in redundant spaces
occurring, breaking URLs.
1.1 Release
New features
Added a new package, org.dom4j.dtd, which contains some DTD declaration classes. Instances of these are added to the
DocumentType interface's lists of declarations. This is useful for finding out details of the attribute or element declarations inside either the
internal or external DTD subset of a document.
To expand internal or external DTD subsets when parsing with SAXReader use the two properties on SAXReader (and SAXContentHandler).
SAXReader reader = new SAXReader();
reader.setIncludeInternalDTDDeclarations( true );
reader.setIncludeExternalDTDDeclarations( true );
Document doc = reader.read( "foo.xml" );
DocumentType docType = doc.getDocType();
List internalDecls = docType.getInternalDeclarations();
List externalDecls = docType.getExternalDeclarations();
This new feature means that XML documents which use internal DTD subsets, external DTDs or a mixture of internal and external
DTD subsets can now be properly round tripped.
Note that there appears to be a bug in Crimson 1.1.3 which does not properly differentiate between internal and external DTD
subsets. Refer to the startDTD() method of LexicalHandler
for details of how startEntity/endEntity is meant to demarcate external DTD subsets.
It's our intention to expand internal DTD subsets by default (so that documents can be properly round tripped by default)
but require external DTD subsets to be explicitly enabled via the property on the SAXReader (or SAXContentHandler).
This bug in Crimson causes all DTD declarations to appear as internal DTD subsets, which is both a performance overhead
and breaks round tripping of documents which just use external DTD declarations. So until this matter is resolved
neither internal nor external declarations are expanded by default.
Note that the code works perfectly against Xerces.
Patches
Applied patch submitted by Yuxin Ruan which fixes some issues with XML Schema Data Type support.
Following Dennis Sosnoski's suggestion, adding a null text String to an Element
now throws an IllegalArgumentException.
To ensure that the IllegalArgumentException is not thrown it's advisable
to check for null first. For example...
Element element = ...;
String text = ...;
// might throw IllegalArgumentException
// if text == null
element.addText( text );
// safer to do this
if ( text != null ) {
element.addText( text );
}
Fixed problem found by Kesav Kumar Kolla whereby a deserialized Document could have problems if
new elements were added to it. The problem was that DocumentFactory was not
deserializing itself properly.
Fixed problem found by David Hooker with Ant build file for the
binary and source distribution that was not including the manifest file
in the distribution.
Applied patch submitted by Lari Hotari that fixes a failure in XMLWriter when it is used as a
SAX XMLFilter or ContentHandler to turn SAX events into XML text. Thanks Lari!
Fixed bug found by Kohsuke Kawaguchi that there was a problem in XMLWriter during its serialization of
a document which redeclared the default namespace prefix. It turned out to be a bug in
org.dom4j.tree.NamespaceStack where redeclarations of namespace prefixes were not being handled properly
during serialization. The test cases in org.dom4j.TestXMLWriter and org.dom4j.TestNamespaces have been improved
to test these features more rigorously.
Fixed bug found by Toby
that was causing a security exception in applets when using a DocumentFactory.
Implemented the suggestion by Kesav Kumar that the detach() method now returns the node (this) so that moving
nodes from one part of a document to another can now be one line of code. Here's an example of it in use.
Document doc1 = ...;
Document doc2 = ...;
Element destination = doc2.getRootElement();
// selectSingleNode() returns a Node, so cast to Element
Element source = (Element) doc1.selectSingleNode( "//foo[@style='bar']" );
// lets move the source to the destination
destination.add( source.detach() );
Added better checking in selectSingleNode() implementation so that XPath expressions which do not
return a Node throw a meaningful exception (not ClassCastException) informing the user of
why the XPath expression did not succeed.
Added patch for bug found by Kesav Kumar that a document containing null Strings would cause a NullPointerException
to be thrown if it was passed into SAXWriter (used by the JAXP - XSLT code). Now the SAXWriter will
quietly ignore null Strings, as will XMLWriter.
1.0 release
New features
Added helper method setXMLFilter() to SAXReader making it easier to install SAX filters to filter or preprocess SAX events
before a dom4j Document is created. Also added a new sample program called sax.FilterDemo that demonstrates how to use a SAX filter
with dom4j.
Added full support for Jaxen function, namespace and variable context interfaces.
This allows the XPath engine to be fully customized. e.g.
XPath xpath = document.createXPath( "//foo[@code='123']" );
// customize function, namespace and variable contexts
xpath.setFunctionContext( myFunctionContext );
xpath.setNamespaceContext( myNamespaceContext );
xpath.setVariableContext( myVariableContext );
List nodes = xpath.selectNodes( document );
Added new helper class org.dom4j.util.XMLErrorHandler which
turns SAX ErrorHandler callbacks into XML that can then be output in a JAXM or SOAP message
or styled via XSLT or whatever.
Added new helper method DocumentHelper.makeElement(doc, "a/b/c") which will
navigate from a document or element to the given simple path, creating new elements along the way if need be.
This allows elements to be found or created using a simple path expression mechanism.
Added helper method getQName(String qualifiedName) to Element so that easier element name matching can be done. Here are some examples of it in use.
// find all elements with a local name of "foo"
// in any namespace
List anyNamespace = element.elements( "foo" );
// find all elements with a local name "foo"
// and the default namespace URI
List defaultNamespace = element.elements( element.getQName( "foo" ) );
// find all elements which match the local name "foo"
// and the namespace URI mapped to the "x" prefix
List prefixed = element.elements( element.getQName( "x:foo" ) );
Added helper method on org.dom4j.DocumentFactory called getQNames
that returns a List of all the QNames that were used to parse the documents.
Added an EntityResolver property to SAXReader to make it easier to configure a specific EntityResolver.
Patches
Added patch so that patterns such as @id='123' and name()='foo' are now
working properly again. Also patterns such as not(@id='123') work now too.
Patched the dynamic loading of classes to fix some ClassLoader issues found with some application servers.
Ported the data type support to work with the latest MSV library from Sun
Fixed bug spotted by Stefan Graeber that was causing a DocumentException to be thrown with Xerces
when turning validation mode on.
Patched bug in QName which was using the qualified name rather than the local name along with the namespace URI
to determine equality.
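The equality rule described above can be sketched in plain Java. This is only an illustration of the rule, not the dom4j QName source; the class name QNameEquality and its helper methods are hypothetical.

```java
class QNameEquality {
    /** Returns the part of a qualified name after the prefix, e.g. "foo" for "x:foo". */
    static String localName(String qualifiedName) {
        int colon = qualifiedName.indexOf(':');
        return colon < 0 ? qualifiedName : qualifiedName.substring(colon + 1);
    }

    /** Compares two names by namespace URI and local name, ignoring the prefix. */
    static boolean sameName(String uri1, String qualifiedName1,
                            String uri2, String qualifiedName2) {
        return uri1.equals(uri2)
            && localName(qualifiedName1).equals(localName(qualifiedName2));
    }

    public static void main(String[] args) {
        // Same URI and local name, different prefixes: treated as equal.
        System.out.println(sameName("urn:example", "x:foo", "urn:example", "y:foo"));
        // Same qualified name, different URIs: treated as different.
        System.out.println(sameName("urn:one", "x:foo", "urn:two", "x:foo"));
    }
}
```

Comparing on the qualified name instead would wrongly treat x:foo and y:foo as different names even when both prefixes map to the same URI.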
Added patch kindly supplied by Michal Palicka that SAXReader was passing in the wrong name for the SAX string-interning feature. Thanks Michal!
Fixed the behaviour of DocumentFactory.createXPathFilter() to use XPath filtering
rather than XSLT style patterns. One of the major differences is that an XSLT pattern
(used in the <xsl:template match="pattern"/> element in XSLT) works slightly
differently. An element <foo> would match an XSLT pattern "foo" whereas an
element <bar> could match an XPath filter "foo" if it contained a child <foo>
element.
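To make the difference concrete, here is a small sketch that uses the JDK's own javax.xml.xpath engine rather than dom4j. The class FilterVersusPattern and both helper methods are hypothetical, and matchesNamePattern is a deliberate simplification that only handles a bare element name, not full XSLT patterns.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class FilterVersusPattern {
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Filter semantics: evaluate the expression with the root element as context node. */
    static boolean matchesFilter(String xml, String expression) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expression, parseRoot(xml), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Simplified pattern semantics for a bare name: the element itself must have that name. */
    static boolean matchesNamePattern(String xml, String name) {
        return parseRoot(xml).getNodeName().equals(name);
    }

    public static void main(String[] args) {
        String bar = "<bar><foo/></bar>";
        System.out.println(matchesFilter(bar, "foo"));      // true: <bar> has a <foo> child
        System.out.println(matchesNamePattern(bar, "foo")); // false: <bar> is not named foo
    }
}
```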
Patched the behaviour of Node.matches(String xpathExpression) so that it uses
XPath filters now rather than XSLT patterns.
Patched bug in XRule implementation in org.dom4j.rule that was causing ordering problems
when using stylesheets - the Rule precedence order was not being correctly used.
Backed out a previous patch added to 0.9 such that attributes with no namespace prefix are in
no namespace. An attribute does not inherit the default namespace - the only way to put an attribute
into a namespace is via a namespace prefix.
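The same rule can be observed with the JDK's namespace-aware DOM parser. The sketch below does not use dom4j; the class AttributeNamespaces and the urn:example URI are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeNamespaces {
    // The element declares a default namespace; the id attribute has no prefix.
    static final String XML = "<a xmlns=\"urn:example\" id=\"1\"/>";

    static Element parseRoot() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Namespace URI of the element, taken from the default namespace declaration. */
    static String elementNamespace() {
        return parseRoot().getNamespaceURI();
    }

    /** Namespace URI of the unprefixed attribute; null means "no namespace". */
    static String attributeNamespace() {
        return parseRoot().getAttributeNode("id").getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println(elementNamespace());   // urn:example
        System.out.println(attributeNamespace()); // null
    }
}
```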
Patched XMLWriter so that a flush() is not required when using an OutputStream
and the various sub-document write() methods
are called such as write(Element), write(Attribute), write(Node), write(Namespace) etc.
Fixed bug in SAXReader that setEntityResolver() was not always behaving properly.
Also the default entity resolver used to locate XML Schemas seems to work properly now.
Moved the XML Schema Data Type supporting classes in org.dom4j.schema.Schema* to
org.dom4j.datatype.Datatype*. This should avoid confusion and better describe the
intent of the classes, to implement Data typing, rather than schema validation.
We hope to use the MSV
library for all of our schema validation requirements.
0.9 release
Full support for the Jaxen XPath engine
The XPath engine in dom4j has been migrated to using Jaxen.
This single XPath engine can be plugged into any model such that Jaxen will support
DOM, dom4j, EXML and JDOM. Hopefully we'll get Jaxen working on Java Beans too.
In general this will mean a much better, more compliant and more bug-free XPath
engine for dom4j as it will be used extensively across XML object models.
Already numerous irregularities have been fixed in the XPath support in dom4j.
We have donated the dom4j XPath test harness to Jaxen so that we now have a large
rigorous test harness to ensure correct XPath behaviour - this test harness is run
against all 4 current XML object models to ensure consistent behaviour and valid
XPath compliance.
We are also in the process of migrating over our XPath extension functions as
well as adding additional XPath functions such as those defined in XSLT and XPointer.
New features
New class org.dom4j.io.XMLResult which is-a JAXP Result which uses the same
org.dom4j.io.OutputFormat object to provide its formatting options
to allow XML output from JAXP (such as via XSLT) to be pretty printed.
XMLWriter now implements the SAX XMLFilter interface so that it can
be added to a SAX parsing filter chain to output the XML being parsed in a simple way.
Many thanks to Joseph Bowbeer for his help in this area.
Added setProperty() and setFeature() methods to SAXReader to allow
the easy configuration of custom parser properties via SAXReader, such
as being able to specify the location of schema or DTD resources.
Added new method OutputFormat.createCompactFormat() for those wishing
to output their XML in a compact format, such as in messaging systems.
Patches and bug fixes
Fixed bug in getNamespaceForPrefix() where if the prefix is null or ""
and there is a default namespace defined, this method was returning
a namespace instance with the incorrect URI.
Patched DOM writer so that it uses JAXP if it is available on the CLASSPATH
using namespace aware mode by default.
Fixed a number of issues relating to namespaces and the redefinition of namespace prefixes.
We now have a quite aggressive JUnit test harness to ensure that we handle namespace URIs
correctly when prefixes are mapped and unmapped.
Applied patch from Andrew Wason for HTMLWriter to support the full
HTML 4.01 DTD elements which do not require proper XML element closes.
The new elements are PARAM, AREA, LINK, COL, BASE and META.
Fixed bug found by Dennis Sosnoski that SAX warnings were causing
exceptions to be thrown by the SAXReader. Now warnings are silently ignored.
If you want to detect warnings then an ErrorHandler should be registered with the
SAXReader.
Patched bug that was also found by Jonathan Doughty for the non-standard
behaviour of the FilterIterator. Also added Jonathan's JUnit test case
to the distribution so that this problem should not come back.
Fixed bug that when round tripping into JAXP and back again, sometimes
additional namespace attributes were appearing.
Now the TestRoundTrip JUnit test case includes JAXP round tripping.
Fixed bug with attributes without a namespace prefix which are inside
an element with a default namespace declaration of the form xmlns="theURI":
the attribute now correctly inherits the namespace URI.
Applied patch found by Stefan Graeber that the UserDataFactory was not
correctly creating UserDataAttribute instances.
Fixed bug that SAXWriter and DocumentSource were not correctly producing
lexical events such as entities, comments and DOCTYPE declarations.
Many thanks to Joseph Bowbeer for his help in this area.
0.8 release
New methods
hasContent() has been added to the Node interface
so that it is easy to decide if a node is a leaf node or not.
This method was suggested by Dane Foster.
This method returns true if the node is a Branch (i.e. an Element or Document)
which contains at least one node.
getPath(Element context)
getUniquePath(Element context)
These new methods allow paths and unique paths to be created relatively. Previously both
getPath() and getUniquePath() would create absolute XPath expressions.
These new methods allow relative path expressions to be created by providing
an ancestor Element from which to make the path expression.
This method was suggested by Chris Nokleberg.
Patches and bug fixes
Fixed bug found by Chris Nokleberg when using the UserDataElement
that the clone() and createCopy() methods were not correctly
copying the user data object. A JUnit test case has been added that
tests this fix (org.dom4j.TestUserData).
If any deep copying of user data objects is required then
UserDataElement now has a method getCopyOfUserData()
which can be overridden to perform a deep copy of any user data
objects.
Minor patch for dom4j implementors wishing to create their own
QName implementations. Previously the DocumentFactory class was
hardwired to use QNameCache internally which was hard wired to
only create QName instances.
Now some factory methods have been added such that you can derive
your own DocumentFactory which uses your own
QNameCache which creates your own QName classes.
If JAXP can be found in the CLASSPATH then it is now used first by
the SAXReader to find the correct SAX parser class.
We have found that sometimes (e.g. Tomcat 4.0 beta 6) the value
of the org.xml.sax.driver system property is set to a class which is not
in the CLASSPATH but a valid JAXP parser is present.
So now we try JAXP first, then the SAX system property then if all else fails
we use the bundled Aelfred SAX parser.
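The lookup order can be sketched as follows. This is an approximation of the described strategy written against the JDK only, not the actual SAXReader code; the class ParserLookup is hypothetical and the FALLBACK_DRIVER class name is an assumption standing in for the bundled parser.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class ParserLookup {
    // Assumed class name for the bundled fallback parser (not confirmed by this document).
    static final String FALLBACK_DRIVER = "org.dom4j.io.aelfred.SAXDriver";

    static XMLReader createXMLReader() {
        // 1. Try JAXP first.
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception | FactoryConfigurationError e) {
            // JAXP is missing or misconfigured: fall through to the next strategy.
        }
        // 2. Then the class named by the org.xml.sax.driver system property.
        XMLReader reader = instantiate(System.getProperty("org.xml.sax.driver"));
        if (reader != null) {
            return reader;
        }
        // 3. If all else fails, the bundled parser.
        reader = instantiate(FALLBACK_DRIVER);
        if (reader == null) {
            throw new IllegalStateException("No SAX parser available");
        }
        return reader;
    }

    /** Returns a new instance of the named XMLReader class, or null if it cannot be loaded. */
    static XMLReader instantiate(String className) {
        if (className == null) {
            return null;
        }
        try {
            return (XMLReader) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(createXMLReader().getClass().getName());
    }
}
```

Trying JAXP first means a stale org.xml.sax.driver value, as in the Tomcat case above, no longer prevents a parser from being found.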
Fixed XPath bug found by James Elson that the path /foo[@a and @b] or
/foo[@a='1' and @b='2'] was no longer working correctly. This is now fixed
and many tests of this nature have been added to the JUnit test harness.
Fixed some namespace related bugs found by Steen Lehmann.
It appears that for a document of:-
<a xmlns="dummyNamespace">
<b>
<c>Hello</c>
</b>
</a>
Then the path /a/b/c will not find anything - this is correct according to the XPath spec.
Instead the path /*[name()='a']/*[name()='b']/*[name()='c'] is required.
These changes have been applied to getPath() and getUniquePath() such that
these methods now work, irrespectively of the namespaces used in a document.
Finally many new test cases have been added to validate a variety of XPath
expressions with various uses of namespaces.
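The behaviour of the two paths can be reproduced with the JDK's javax.xml.xpath engine, used here instead of dom4j so that the sketch is self-contained. The class name DefaultNamespacePaths and the count() helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNamespacePaths {
    static final String XML =
        "<a xmlns=\"dummyNamespace\"><b><c>Hello</c></b></a>";

    /** Counts the nodes that the XPath expression selects in the sample document. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-correct XPath
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed name tests only match elements in no namespace.
        System.out.println(count("/a/b/c"));
        // name() compares the qualified name, so the default namespace does not matter.
        System.out.println(count("/*[name()='a']/*[name()='b']/*[name()='c']"));
    }
}
```

The first expression selects nothing and the second selects the single c element.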
SAXWriter now fully supports the SAX feature "http://xml.org/sax/features/namespace-prefixes".
Failure to support this feature properly was causing problems
when outputting a dom4j Document using JAXP - the namespace declarations often did not appear correctly.
Patched bug in XMLWriter which caused multiple duplicate namespace declarations to sometimes appear.
0.7 release
Integration with SAXPath
The SAXPath project is a Simple API for XPath parsing. It's analogous to
SAX in that the API abstracts away the details of parsing XPath
expressions and provides a simple event based callback
interface.
Originally dom4j was using a parser generated via the Antlr tool
which resulted in a considerably larger code base.
Now dom4j uses SAXPath for its XPath parsing which results in faster XPath parsing
and a much smaller code base.
The dom4j.jar is now about 100 Kb smaller!
Also several XPath related bugs are now fixed. For example the numeric paths like '2 + count(//foo)' are now working.
Patches and bug fixes
Fixed bug found by Tobias Rademacher that XML Schema Data Type support
wasn't working correctly when the XSD document used a namespace prefix.
The bug was hidden by a further bug in the JUnit test case that was not correctly
testing this case. Both these bugs are now fixed.
Fixed bug found by Piero de Salvia that some invalid XPath expressions were not correctly
throwing exceptions. Now any attempt to use any invalid XPath expressions should result
in an InvalidXPathException being thrown.
Applied patch submitted by Theodor Schwarzinger that fixes the preceding-sibling and preceding axes.
Fixed bug found by James Elson that the normalize() method was being quite aggressive and removing
all text nodes! New JUnit test case added to ensure this doesn't break again.
Improved the setContent() semantics on Branch (and so Element and Document) such that the
parent and document relationships are correctly removed for old content and added for new content.
As a helper method, the setContent() method will clone any content nodes which are already
part of an existing document. So for example the following code will clone the content of a document.
Document doc1 = ...;
Document doc2 = DocumentHelper.createDocument();
doc2.setContent( doc1.content() );
Though this behaviour is much more useful when used with elements...
Element sourceElement = ...;
Element destElement = ...;
// copy the content of sourceElement
destElement.setContent( sourceElement.content() );
0.6 release
Serialization support added
Support has been added for Java Serialization so dom4j documents can be serialized over RMI or EJB calls.
Note that currently Serialization is much slower (by a factor of 2-5 times) than using the textual format of
XML so we recommend sending XML text over RMI rather than serialization if possible. Over time we will tune
the serialization implementation to be at least as fast as using the text format (even if that means under the
covers we just use the text format).
Patches and bug fixes
Fixed bug in XPath engine found by Christophe Ponsard
for paths of the form /* which were not finding
anything. Now we have an extensible XPath test harness (in src/test/org/dom4j/TestXPathExamples.java)
which contains some test cases for these kinds of paths. We can extend these cases
to test other XPath expressions easily.
Fixed bug in elementByID() method found by Thomas Nichols that was resulting in
the element not being found correctly.
Fixed bug in IndexedElement reported by Kerstin Grünefeld that was causing
a null pointer exception when using XPath on an IndexedElement.
Applied the patch supplied by Mike Skells that fix problems with the
getUniquePath() method not returning properly indexed elements
Applied a fix to the problem found by Dane Foster when using dom4j with JTidy.
JTidy returns null for getLocalName() so DOMReader has been patched to handle
nulls returned from either getLocalName() or getName().
Fixed bug reported anonymously to the Sourceforge site
that explicitly creating a Document from an existing Element could cause problems when
using XMLWriter.
Assorted performance tunings of SAX parsing, avoiding unnecessary repeated code paths.
Tidied factory and construction of Element code such that there are no longer
dependencies on the SAX Attributes class. This was originally added as a performance
enhancement, but after further refactoring this is now no longer needed.
This makes the process of creating new Element derivations or DocumentFactory
implementations easier.
0.5 release
NodeComparator available
For those wishing to do value based comparisons of Nodes,
Element, Attributes, Documents or Document fragments
there is a new NodeComparator class which implements the
Comparator interface from the Java Collections Framework.
New helper method DocumentHelper.parseText()
A new helper method has been added for parsing text.
For example:-
Document document = DocumentHelper.parseText(
"<team> <author>James</author> </team>"
);
New Branch.normalize() method
The Branch interface (and so Document and Element interfaces) has
a new normalize() method that has the same semantics as the same method
in the DOM API to remove empty Text nodes and merge adjacent Text nodes.
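For reference, the DOM semantics that normalize() mirrors look like this when exercised through the JDK's org.w3c.dom API. The sketch does not use dom4j, and the class NormalizeDemo and its buildTeam() helper are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {
    /** Builds an element with three Text children, optionally normalizing it. */
    static Element buildTeam(boolean normalize) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element team = doc.createElement("team");
            doc.appendChild(team);
            team.appendChild(doc.createTextNode("James"));
            team.appendChild(doc.createTextNode(""));          // empty Text node
            team.appendChild(doc.createTextNode(" Strachan")); // adjacent Text node
            if (normalize) {
                team.normalize(); // drops empty Text nodes, merges adjacent ones
            }
            return team;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildTeam(false).getChildNodes().getLength());
        System.out.println(buildTeam(true).getChildNodes().getLength());
        System.out.println(buildTeam(true).getTextContent());
    }
}
```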
Easier document building methods
A document can now be constructed more easily now that the addXXX() methods
return a reference to the Document or Element which created them.
An example is shown below
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
public class Foo {
public Document createDocument() {
Document document = DocumentHelper.createDocument();
Element root = document.addElement( "root" );
Element author1 = root.addElement( "author" )
.addAttribute( "name", "James" )
.addAttribute( "location", "UK" )
.addText( "James Strachan" );
Element author2 = root.addElement( "author" )
.addAttribute( "name", "Bob" )
.addAttribute( "location", "Canada" )
.addText( "Bob McWhirter" );
return document;
}
}
Note that the addElement() method returns the new child element
not the parent element.
To promote consistency, the Element.setAttributeValue() method is now deprecated
and should be replaced with Element.addAttribute().
Patches and bug fixes
Applied Theo's patch for cloning of Documents correctly
together with JUnit test cases to ensure this keeps working.
Applied Rob Wilson's patch that NullPointerExceptions
were being thrown if a Document is output with the XMLWriter
and an attribute value is null.
Fixed problem found by Nicolas Fonrose that XPath expressions
using namespace prefixes were not working correctly.
Fixed problem found by Thomas Nichols whereby default namespaces
with no prefix were not being processed correctly.
As a result of finding this bug we now have a rigorous JUnit round trip test
harness in place which highlighted a number of issues with namespaces when
round tripping from dom4j to SAX to DOM to Text and back again.
These issues have now been fixed and should not show up again hopefully.
Fixed some detach() bugs that were found with Attributes.
Default encoding is now "UTF-8" rather than "UTF8". Thanks to
Thomas Nichols for spotting that one. Also the default
line separator when using XMLWriter is now "\n" rather than "\r\n".
If an XMLWriter is used with an OutputStream then an explicit call to
flush() is no longer required after calling write(Document).
Some housekeeping was performed in the naming of some implementation classes.
The old XPathXXX.java classes in the org.dom4j.tree package
where XXX = Attribute, CDATA,
Comment, Entity, ProcessingInstruction and Text have been renamed to
DefaultXXX and the corresponding DefaultXXX has been renamed to
FlyweightXXX. This makes it clearer the purpose of these implementation
classes. The default implementations of the leaf nodes are mutable but cannot
be shared across elements. The FlyweightXXX implementations are immutable
and can be shared across nodes and documents.
0.4 release
Enhanced event notification mechanism
A new enhanced event notification mechanism has been implemented
by David White.
Now you can register multiple ElementHandler instances with a SAXReader
object before you parse a document such that the different
handlers are notified when different paths are reached.
The ElementHandler interface now has both onStart() and onEnd()
allowing more fine grained control over when you are called
and the ability to perform actions before or after the content
for an Element is populated.
The methods also take a reference to an ElementPath
to allow more optimised and powerful access to the path to the specified document.
Early alpha release of XML Schema Data Type support
This release contains an alpha release of XML Schema Data Type
support.
The main class in question is the XML Schema Data Type aware DatatypeDocumentFactory
which will create an XML Schema Data Type aware XML object model.
The getData() and setData(Object) methods on Attribute and Element
allow access to the concrete data types such as Dates and Numbers.
Patches and bug fixes
Applied Theo's patch for the XPath substring function
that was causing the incorrect string indexes to be returned.
The substring now returns the correct answer.
Applied Theo's patch for the incorrect escaping of element text.
Fixed bug in the XPath engine for absolute path expressions which
now work correctly when applied to leaf nodes.
Fixed bug
in the name() and local-name()
functions such that the following expressions now work correctly:
local-name(..), name(parent::*).
A variety of minor performance tuning optimisations have been made.
0.3 release
The org.dom4j.io.OutputFormat class now has a new helper
method to make it easier to create pretty print formatting objects.
The new method is OutputFormat.createPrettyPrint().
So to pretty print some XML (trimming all whitespace and indenting nicely)
the following code should do the job...
OutputFormat format = OutputFormat.createPrettyPrint();
XMLWriter writer = new XMLWriter( out, format );
writer.write( document );
writer.close();
SAXReader.read(String url) can now accept either
a URL or a file name which makes things a little easier.
The logic uses the existence of a ':' in the url String to determine if
it should be treated as a URL or a File name.
For more explicit control over whether documents are Files or URLs
call SAXReader.read(File file) or SAXReader.read(URL url).
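The ':' rule described above can be sketched as follows. This is only an illustration of the stated heuristic, not the SAXReader source; SystemIdSketch, Kind and classify() are hypothetical names.

```java
final class SystemIdSketch {
    enum Kind { URL, FILE }

    /** Applies the stated rule: any ':' in the string means "treat it as a URL". */
    static Kind classify(String systemId) {
        return systemId.indexOf(':') >= 0 ? Kind.URL : Kind.FILE;
    }

    public static void main(String[] args) {
        System.out.println(classify("http://example.org/doc.xml")); // URL
        System.out.println(classify("data/doc.xml"));               // FILE
        // A Windows drive path also contains ':' and is classified as a URL
        // under this rule, which is one reason to prefer the explicit overloads.
        System.out.println(classify("C:\\data\\doc.xml"));          // URL
    }
}
```

Under the rule as stated, a drive-letter path is indistinguishable from a URL, so passing a File or URL object directly avoids any ambiguity.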
A new extension function, matrix-concat() was submitted by
James Pereira.
By default, the XPath concat() function takes the 'string-value'
of each argument.
So for a document:-
<root project="dom4j">
<contributor>James Pereira</contributor>
<contributor>Bob McWhirter</contributor>
</root>
Then the XPath
concat( 'thanks ', /root/contributor )
would return
"thanks James Pereira"
as the /root/contributor expression matches a node set of 2
elements, but the "string-value" of a node set is the text of its first element.
Whereas matrix-concat will do a cartesian product of all the
arguments and then do the concatenation of each combination of nodes. So
matrix-concat( 'thanks ', /root/contributor )
will produce
"thanks James Pereira"
"thanks Bob McWhirter"
The cartesian product is done such that multiple paths can be used.
matrix-concat( 'thanks ', /root/contributor, ' for working on ', /root/@project )
will produce
"thanks James Pereira for working on dom4j"
"thanks Bob McWhirter for working on dom4j"
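The cartesian-product behaviour can be sketched in plain Java. This is a minimal illustration and not the extension function's actual source: it assumes each XPath argument has already been evaluated to a list of string values, and MatrixConcatSketch and matrixConcat() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class MatrixConcatSketch {
    /**
     * Concatenates every combination of values, taking one value per argument.
     * Each inner list holds the string values that one argument evaluated to.
     */
    static List<String> matrixConcat(List<List<String>> arguments) {
        List<String> results = new ArrayList<>();
        results.add(""); // a single empty prefix to start from
        for (List<String> values : arguments) {
            List<String> next = new ArrayList<>();
            // Extend every prefix built so far with every value of this argument.
            for (String prefix : results) {
                for (String value : values) {
                    next.add(prefix + value);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> lines = matrixConcat(List.of(
                List.of("thanks "),
                List.of("James Pereira", "Bob McWhirter"),
                List.of(" for working on "),
                List.of("dom4j")));
        // Prints:
        // thanks James Pereira for working on dom4j
        // thanks Bob McWhirter for working on dom4j
        lines.forEach(System.out::println);
    }
}
```

In this sketch an argument that evaluates to an empty list yields no results at all, which is one plausible reading of a cartesian product over an empty node set.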
Fixed bug where XMLWriter.write(Object) was not correctly
writing a Document instance.
Finally, a couple of small issues with the build process have been fixed.
The dom4j.jar no longer contains any SAX or DOM classes (they are all in dom4j-full.jar),
and the Antlr grammar files for the XPath parser are now correctly
included in the binary distribution.
0.2 release
The following new features were added:-
|
 | Clean integration with XSLT via JAXP / TrAX API.
|
 | New SAXValidator to allow validation on prebuilt Document instances
|
 | XMLWriter and HTMLWriter rewritten so that they work at either
the SAX level or the dom4j level.
API much improved and more like Reader and Writer in the JDK.
|
 | API modified to avoid clashes with W3C DOM such that a dual
implementation of dom4j and DOM is now possible. An early alpha
release of a DOM implementation of dom4j is available.
|
 | New sorting method added to Node for easier selections of nodes
which are sorted via an XPath expression. The following
code sorts all CUSTOMER elements by their name attributes and
removes duplicates:-
Document document
= new SAXReader().read( new File( "customers.xml" ) );
List customers
= document.selectNodes( "//CUSTOMER", "@name", true );
|
 | The getText() and getStringValue() methods of Element now return
the textual values of CDATA, Entity and Text nodes.
The previous version only returned Text node values.
|
 | Refactored code and removed XPathEngine, XPathHelper and
all the static newXXX() methods in DocumentFactory.
Added equivalent methods to DocumentHelper and DocumentFactory.
|
This release also includes full XPath source code.
0.1 release
Initial release which comes complete with DOM, JAXP and SAX
support and integrated XPath.
To Do List
|
 | The internal subset does not pass through DOMReader and DOMWriter. This needs patching!
|
 | We should add support for Xerces XNI API via an XNIReader and XNIWriter. This would also allow
dom4j users to make good use of the NekoHTML parser that is layered on top of XNI.
|
 | Better documentation and user guides
|
 | A lazy parser; implement a special Element implementation (or probably a special List)
which allows the XPP (XML Pull Parser) to parse the document as it is navigated rather than
all up front.
|
 | Build a dom4j validator based on top of Sun's MSV library
|
 | Ensure that the optional DOM implementation passes the DOM compliance tests
|
 | Implement a ValidatingDocumentFactory and an EncodingDocumentFactory
which can be used by developers where invalid strings may be specified
allowing validation or encoding of names or text values to be
done in one place for use across parsers or application code.
This would avoid any performance hit by making this kind of validation
the default behaviour.
|
 | Implement a canonical XML processor
|
 | Implement XML Signature
|
 | Implement XPointer, XLink and XInclude
|
 | Build a version of XMLC which uses the dom4j API rather than DOM which
could also make use of XPath, XSLT and Java 2 Collections support.
|
 | Consider adding support for Java Generics
such that typesafe Iterators can be used. For example:
Iterator<Node> iter = element.nodeIterator();
while ( iter.hasNext() ) {
Node node = iter.next();
}
Iterator<Element> iter2 = element.elementIterator( "foo" );
while ( iter2.hasNext() ) {
Element foo = iter2.next();
}
|
 | Implement XSLT engine on top of dom4j?
|
 | XML Query implementation on top of dom4j?
|
Known problems
The following functions are not yet fully supported in the inbuilt
XPath engine:
|
 | id() |
 | generate-id() |
 | format-number() |
The optional W3C DOM implementation of the dom4j API is not yet at
full DOM compliance.
Contributors
The following people have contributed to the dom4j project.
Many thanks to you all!
|
 | James Strachan |
 | Bob McWhirter |
 | James Dodd |
 | James Elson |
 | Jakob Jenkov |
 | James Pereira |
 | David White |
 | Tobias Rademacher |
 | Rashmi Mathew |
 | Jonathan Doughty |
 | Joseph Bowbeer |
 | Michal Palicka |
 | Yuxin Ruan |
 | Steen Lehmann |
 | Maarten Coene |
 | Stefan Graeber |
|