Using Structured Text
The goal of StructuredText is to make it possible to express structured text using a relatively simple plain text format. Simple structures, like bullets or headings, are indicated through conventions that are natural, for some definition of "natural". Hierarchical structures are indicated through indentation. The use of indentation to express hierarchical structure is inspired by the Python programming language.
Use of StructuredText consists of one to three logical steps. In the
first step, a text string is converted to a network of objects using
the StructuredText.Basic facility, as in the following:
  import StructuredText
  raw=open("mydocument.txt").read()
  st=StructuredText.Basic(raw)
The output of StructuredText.Basic is simply a StructuredTextDocument
object containing StructuredTextParagraph objects arranged in a
hierarchy. Paragraphs are delimited by strings of two or more
whitespace characters beginning and ending with newline characters.
Hierarchy is indicated by indentation. The indentation of a paragraph
is the minimum number of leading spaces in a line containing
non-white-space characters after converting tab characters to spaces
(assuming a tab stop every eight characters).
StructuredTextNode objects support the read-only subset of the
Document Object Model (DOM) API. It should be possible to process
StructuredTextNode hierarchies using XML tools such as XSLT.
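The paragraph and indentation rules described above can be sketched in a few lines of plain Python. This is a minimal illustration of the stated rules only, not the code used by StructuredText.Basic; the function names `split_paragraphs` and `indentation` are hypothetical and do not come from the library.

```python
import re

TAB_STOP = 8  # the text assumes a tab stop every eight characters


def split_paragraphs(text):
    """Split on runs of whitespace that begin and end with a newline."""
    chunks = re.split(r"\n\s*\n", text)
    return [chunk for chunk in chunks if chunk.strip()]


def indentation(paragraph):
    """Minimum number of leading spaces over the non-blank lines."""
    widths = []
    for line in paragraph.splitlines():
        expanded = line.expandtabs(TAB_STOP)
        if expanded.strip():
            widths.append(len(expanded) - len(expanded.lstrip(" ")))
    return min(widths) if widths else 0


sample = "Title\n\n  First child\n  continues\n\n\tDeeper child\n\nAgain\n"
for chunk in split_paragraphs(sample):
    # Print each paragraph's indentation next to its text.
    print(indentation(chunk), repr(chunk.strip()))
```

A paragraph whose indentation is greater than that of the paragraph before it would become a child of that paragraph, which is how the hierarchy arises.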
The second step in using StructuredText is to apply additional structuring rules based on text content. A variety of different text rules can be used. Typically, these are used to implement a structured text language for producing documents, but any sort of structured text language could be implemented in the second step. For example, it is possible to use StructuredText to implement structured text formats for representing structured data. The second step, which could consist of multiple processing steps, is performed by processing, or "coloring", the hierarchy of generic StructuredTextParagraph objects into a network of more specialized objects. Typically, the objects produced should also implement the DOM API to allow processing with XML tools.
A document processor is provided to convert a StructuredTextDocument object containing only StructuredTextParagraph objects into a StructuredTextDocument object containing a richer collection of objects such as bullets, headings, emphasis, and so on, using hints in the text. Hints are selected based on conventions of the sort typically seen in electronic mail or news-group postings. These conventions are somewhat culturally dependent. Fortunately, the document processor is easily customized to implement alternative rules. Here's an example of using the document processor to convert the output of the previous example:
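The idea of coloring can also be illustrated without the library. The following is a minimal sketch, not the document processor itself: the `Paragraph` and `Colored` classes, the `color` function, and the particular bullet marks are all hypothetical stand-ins chosen for illustration. It walks a hierarchy of generic paragraphs and returns a parallel hierarchy of more specialized nodes, picked from hints in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """Generic node, standing in for an uncolored paragraph."""
    text: str
    children: list = field(default_factory=list)


@dataclass
class Colored:
    """Specialized node produced by the coloring pass."""
    kind: str  # "heading", "bullet" or "paragraph"
    text: str
    children: list = field(default_factory=list)


BULLET_MARKS = ("- ", "* ", "o ")  # illustrative e-mail style hints


def color(node):
    """Return a Colored copy of node, choosing a kind from text hints."""
    stripped = node.text.strip()
    if stripped.startswith(BULLET_MARKS):
        kind, text = "bullet", stripped[2:].strip()
    elif node.children and "\n" not in stripped:
        # A one-line paragraph with sub-paragraphs is treated as a heading.
        kind, text = "heading", stripped
    else:
        kind, text = "paragraph", stripped
    return Colored(kind, text, [color(child) for child in node.children])


generic = Paragraph("Intro", [Paragraph("- first"), Paragraph("plain\ntext")])
colored = color(generic)
print(colored.kind, [child.kind for child in colored.children])
```

Swapping `BULLET_MARKS` for a different tuple is the kind of customization the text alludes to when it mentions alternative rules.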
The final step is to process the colored networks produced from the second step to produce additional outputs. The final step could be performed by Python programs, or by XML tools. A Python outputter is provided for the document processor output that produces Hypertext Markup Language (HTML) text:
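As a rough picture of what an outputter does, here is a minimal sketch that renders a small colored tree as HTML. It is not the HTML outputter shipped with StructuredText; the `(kind, text, children)` tuple layout and the names `to_html` and `render_children` are assumptions made for this example.

```python
from html import escape
from itertools import groupby


def render_children(children, level):
    """Render child nodes, wrapping each run of bullets in one <ul>."""
    parts = []
    for is_bullet, group in groupby(children, key=lambda n: n[0] == "bullet"):
        rendered = [to_html(node, level) for node in group]
        if is_bullet:
            parts.append("<ul>\n" + "\n".join(rendered) + "\n</ul>")
        else:
            parts.extend(rendered)
    return parts


def to_html(node, level=1):
    """Render a (kind, text, children) tuple as an HTML fragment."""
    kind, text, children = node
    body = escape(text)
    inner = render_children(children, level + 1)
    if kind == "heading":
        tag = f"h{min(level, 6)}"  # HTML defines h1 through h6 only
        return "\n".join([f"<{tag}>{body}</{tag}>"] + inner)
    if kind == "bullet":
        return "<li>" + "\n".join([body] + inner) + "</li>"
    return "\n".join([f"<p>{body}</p>"] + inner)


tree = ("heading", "A & B", [
    ("bullet", "one", []),
    ("bullet", "two", []),
    ("paragraph", "text", []),
])
page = to_html(tree)
print(page)
```

Because the renderer only reads the tree, the same colored hierarchy could be fed to other outputters without change.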
- karl (May 24, 2001 8:54 pm; Comment #1)
> StructuredTextNode objects support the read-only subset of the
> Document Object Model (DOM) API. It should be possible to process
> StructuredTextNode hierarchies using XML tools such as XSLT.
No, they don't.
First of all, they don't support the API. They may have calls that do the same thing as DOM calls, but that's not the same as supporting the API.
Second, even if they were named properly, the calls aren't very compliant. There are differences.
Why am I being a standards wonk? The DOM is a very basic interface. It's meant for random complex tools to work with it. Those tools need the standard to be followed to work; little differences break them.
The upshot is, I'd be surprised if any XML tools such as XSLT worked with these objects. This shouldn't be advertised as a possibility. STX shouldn't be advertised as supporting any DOM until there's a reason to believe that it does.
I want the DOM to be supported, and ParsedXML has a great DOM test suite that works with any DOM implementation. Unfortunately, the tests are very interdependent right now: there are no read-only tests, for example, so you have to write in order to test reading. I hope that this gets fixed and becomes useful to make STX a real DOM.