DocumentValidation

(Uche 2000-05-02)

XMLDocument's builder does not currently support validation. In fact, it is a low-level handler for pyexpat, which itself does not support validation.
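As a quick check of that claim, here is a minimal sketch that feeds pyexpat (through the standard xml.parsers.expat module) a document that violates its own internal DTD. The sample document and the helper name parses_without_error are made up for illustration and are not part of XMLDocument.

```python
import xml.parsers.expat

# The DTD says <a> must contain exactly one <b>, but the document uses <c>.
INVALID_DOC = """<?xml version="1.0"?>
<!DOCTYPE a [
  <!ELEMENT a (b)>
  <!ELEMENT b EMPTY>
]>
<a><c/></a>"""


def parses_without_error(text):
    """Return True if expat accepts the text as well-formed XML."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Well-formed but invalid: expat accepts it without complaint.
accepted_invalid = parses_without_error(INVALID_DOC)
# Not even well-formed (mismatched tags): expat rejects it.
accepted_broken = parses_without_error("<a><b></a>")
```

The parser only enforces well-formedness, so the content-model violation goes unnoticed while the mismatched tag does not.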

One nice benefit of using 4DOM would be that DTD validation support comes along for the ride. It is important to note that DTD validation is pretty broken in some areas, such as namespaces, and very limited in others, such as data typing, but it does have its uses and, more importantly, its strong adherents in the XML community.

For one thing, DTD support would allow the use of parsed general entities (a mechanism not unlike macro expansion), unparsed general entities and notations (which allow inclusion of external non-XML data such as images) and, of course, document validation.
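The macro-like behaviour of parsed general entities can be sketched with the standard library. This example assumes an entity declared in the internal subset, which even a non-validating expat-based parser expands; entities declared in an external DTD need a parser that actually reads that DTD. The entity name hq and the ENTRY elements are invented for the example.

```python
import xml.etree.ElementTree as ET

# One entity declaration, reused in two places like a macro.
DOC = """<!DOCTYPE ADDRBOOK [
  <!ENTITY hq "123 Main Street">
]>
<ADDRBOOK><ENTRY>Office: &hq;</ENTRY><ENTRY>Billing: &hq;</ENTRY></ADDRBOOK>"""

root = ET.fromstring(DOC)

# Collect the text of each ENTRY after entity expansion.
entries = ["".join(entry.itertext()) for entry in root.findall("ENTRY")]
```

Changing the replacement text in the single ENTITY declaration changes every place that references it.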

It is also important to establish a framework for validation so that once the XML Schema spec is complete, it will be easy to add such support (note that some parties are already working on schema validators in Python). Other schema methodologies, such as Schematron and RELAX, should also be considered as options.

The good news is that 4DOM currently uses SAX for reading, which allows us to use xmlproc, a validating parser.
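Because SAX treats validation as a feature of the reader, application code can ask a reader whether it validates before relying on it. The sketch below uses only the standard library's expat-based SAX reader, which refuses the feature. The helper name supports_validation is hypothetical, and a validating driver such as xmlproc would be expected to accept the same request, though that is not verified here.

```python
import xml.sax
from xml.sax import expatreader
from xml.sax.handler import feature_validation


def supports_validation(reader):
    """Return True if the SAX reader accepts the validation feature."""
    try:
        reader.setFeature(feature_validation, True)
    except (xml.sax.SAXNotSupportedException,
            xml.sax.SAXNotRecognizedException):
        return False
    return True


# The standard library's expat-based SAX driver is non-validating.
expat_reader = expatreader.create_parser()
expat_validates = supports_validation(expat_reader)
```

Code written against the SAX interface stays the same whichever reader is plugged in; only the reader construction changes.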

There is also the matter of specifying the schema.

Schemas are specified in the document-type declaration (note: this is different from the DTD, the document-type definition) as follows:

<?xml version="1.0"?>
<!DOCTYPE ADDRBOOK SYSTEM "http://url.org/addrschema.dtd">

Note that an additional PUBLIC identifier is optional (it takes the form PUBLIC "public-id" "system-uri"), but with the SYSTEM identifier above we can simply read the schema from the given URL. Most validating XML parsers already do this.
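To illustrate how an application can find out which schema a document names, here is a small sketch using the standard library's minidom (not 4DOM) to read the document-type declaration. It assumes a declaration with a SYSTEM identifier only; the external DTD is not fetched by this parser, so no network access is involved.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!DOCTYPE ADDRBOOK SYSTEM "http://url.org/addrschema.dtd">
<ADDRBOOK/>"""

document = minidom.parseString(DOC)
doctype = document.doctype

# The declared root element name and the location of the schema.
declared_root = doctype.name
schema_url = doctype.systemId
```

A validating layer could use schema_url to load the DTD, or pass it through a catalog lookup first.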

There is also a mechanism for locating resources referenced in SYSTEM identifiers: XCatalog. 4DOM's reader already supports XCatalog through xmlproc.

It might be useful to also add an attribute to XMLDocument objects holding their schema. This would allow more flexible validation on the fly and would enable alternative schema languages not supported by xmlproc, such as XML Schema, Schematron and RELAX.
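One possible shape for such an attribute is sketched below. Everything here is hypothetical: SchemaAwareDocument stands in for XMLDocument, and RequiredChildSchema is a toy schema object, not a DTD, XML Schema, Schematron or RELAX implementation. The only assumption that matters is that a schema is any object with a validate(document) method returning a list of error strings.

```python
from xml.dom import minidom


class RequiredChildSchema:
    """Toy schema: the root element must contain the named child elements."""

    def __init__(self, root_name, required_children):
        self.root_name = root_name
        self.required_children = list(required_children)

    def validate(self, document):
        """Return a list of error strings; an empty list means valid."""
        root = document.documentElement
        errors = []
        if root.tagName != self.root_name:
            errors.append("root element is %s, expected %s"
                          % (root.tagName, self.root_name))
        present = {node.tagName for node in root.childNodes
                   if node.nodeType == node.ELEMENT_NODE}
        for name in self.required_children:
            if name not in present:
                errors.append("missing required child %s" % name)
        return errors


class SchemaAwareDocument:
    """Hypothetical wrapper pairing a DOM document with a schema object."""

    def __init__(self, dom, schema=None):
        self.dom = dom
        self.schema = schema  # may be swapped at any time

    def validate(self):
        """Validate against the attached schema; no schema means no errors."""
        if self.schema is None:
            return []
        return self.schema.validate(self.dom)


doc = SchemaAwareDocument(
    minidom.parseString("<ADDRBOOK><ENTRY/></ADDRBOOK>"),
    RequiredChildSchema("ADDRBOOK", ["ENTRY", "OWNER"]),
)
first_errors = doc.validate()

# Swap in a looser schema on the fly and validate again.
doc.schema = RequiredChildSchema("ADDRBOOK", ["ENTRY"])
second_errors = doc.validate()
```

Keeping the schema behind a single validate method is what would let different schema languages be attached to the same document without changing the document class.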