Using Structured Text

  The goal of StructuredText is to make it possible to express
  structured text using a relatively simple plain text format. Simple
  structures, like bullets or headings, are indicated through
  conventions that are natural, for some definition of
  "natural". Hierarchical structures are indicated through
  indentation. The use of indentation to express hierarchical
  structure is inspired by the Python programming language.

  Use of StructuredText consists of one to three logical steps. In the
  first step, a text string is converted to a network of objects using
  the 'StructuredText.Basic' facility, as in the following
  example::

    import StructuredText

    # Read the source text and parse it into a paragraph hierarchy.
    raw = open("mydocument.txt").read()
    st = StructuredText.Basic(raw)

  The output of 'StructuredText.Basic' is simply a
  StructuredTextDocument object containing StructuredTextParagraph
  objects arranged in a hierarchy. Paragraphs are delimited by strings
  of two or more whitespace characters beginning and ending with
  newline characters. Hierarchy is indicated by indentation. The
  indentation of a paragraph is the minimum number of leading spaces
  in a line containing non-white-space characters after converting tab
  characters to spaces (assuming a tab stop every eight characters).
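
  The paragraph and indentation rules above can be sketched in plain
  Python. This is only an illustration of the rules as described here,
  not the code 'StructuredText.Basic' uses; the names
  'split_paragraphs', 'paragraph_indent' and 'TAB_STOP' are
  hypothetical.

```python
import re

TAB_STOP = 8  # assumption taken from the text: a tab stop every eight characters


def split_paragraphs(text):
    """Split on runs of whitespace that begin and end with a newline."""
    chunks = re.split(r"\n\s*\n", text)
    return [c for c in chunks if c.strip()]


def paragraph_indent(paragraph):
    """Return the minimum leading-space count over the non-blank lines."""
    indents = []
    for line in paragraph.split("\n"):
        expanded = line.expandtabs(TAB_STOP)
        if expanded.strip():
            indents.append(len(expanded) - len(expanded.lstrip(" ")))
    return min(indents) if indents else 0


sample = "Title\n\n  First paragraph,\n    continued.\n\n\tTabbed paragraph.\n"
levels = [(paragraph_indent(p), p.strip()) for p in split_paragraphs(sample)]
```

  Here 'levels' pairs each paragraph with its indentation, which is
  the information needed to arrange the paragraphs in a hierarchy.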

  StructuredTextNode objects support the read-only subset of the
  Document Object Model (DOM) API. It should be possible to process
  'StructuredTextNode' hierarchies using XML tools such as XSLT.

  The second step in using StructuredText is to apply additional
  structuring rules based on text content. A variety of different text
  rules can be used. Typically, these are used to implement a
  structured text language for producing documents, but any sort of
  structured text language could be implemented in the second
  step. For example, it is possible to use StructuredText to implement
  structured text formats for representing structured data. The second
  step, which could consist of multiple processing steps, is
  performed by processing, or "coloring", the hierarchy of generic
  StructuredTextParagraph objects into a network of more specialized
  objects. Typically, the objects produced should also implement the DOM
  API to allow processing with XML tools.

  A document processor is provided to convert a StructuredTextDocument
  object containing only StructuredTextParagraph objects
  into a StructuredTextDocument object containing a richer collection
  of objects such as bullets, headings, emphasis, and so on using
  hints in the text. Hints are selected based on conventions of the
  sort typically seen in electronic mail or news-group postings. It
  should be noted, however, that these conventions are somewhat
  culturally dependent. Fortunately, the document processor is easily
  customized to implement alternative rules. Here's an example of
  using the document processor to convert the output of the previous example::

    doc=StructuredText.Document(st)
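
  To give a feel for what "coloring" means, here is a minimal sketch
  that relabels generic paragraphs using hints in their text. The
  'Node' class, the 'color' function and the two hint rules (a
  leading "-", "*" or "o" marks a bullet; a one-line paragraph
  followed by more deeply indented text is a heading) are assumptions
  made for illustration, not the rules 'StructuredText.Document'
  actually applies.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    kind: str    # hypothetical labels: "heading", "bullet", "paragraph"
    indent: int
    text: str


BULLET = re.compile(r"^[-*o]\s+")  # assumed bullet hint


def color(paragraphs):
    """Turn generic (indent, text) pairs into labelled nodes using text hints."""
    nodes = []
    for i, (indent, text) in enumerate(paragraphs):
        next_indent = paragraphs[i + 1][0] if i + 1 < len(paragraphs) else indent
        if BULLET.match(text):
            nodes.append(Node("bullet", indent, BULLET.sub("", text, count=1)))
        elif "\n" not in text and next_indent > indent:
            # A one-line paragraph with deeper material under it reads as a heading.
            nodes.append(Node("heading", indent, text))
        else:
            nodes.append(Node("paragraph", indent, text))
    return nodes


colored = color([
    (0, "Using Structured Text"),
    (2, "See the following pages:"),
    (2, "- CustomizingTheDocumentProcessor"),
    (2, "- CustomizingTheOutputter"),
])
```

  Swapping in a different set of hint rules would amount to changing
  the tests inside 'color', which is roughly the kind of customization
  the text refers to.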

  The final step is to process the colored networks produced from the
  second step to produce additional outputs. The final step could be
  performed by Python programs, or by XML tools. A Python outputter is
  provided for the document processor output that produces Hypertext Markup
  Language (HTML) text::

    html=StructuredText.HTML(doc)
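
  The idea of an outputter can be sketched as a function that walks
  the colored objects and emits markup. The 'render_html' function
  and the (kind, text) pairs it consumes are hypothetical stand-ins;
  the real 'StructuredText.HTML' outputter works on the document
  processor's own objects.

```python
from html import escape


def render_html(nodes):
    """Render (kind, text) pairs as HTML, grouping adjacent bullets in one <ul>."""
    out = []
    in_list = False
    for kind, text in nodes:
        # Open or close the list when we enter or leave a run of bullets.
        if kind == "bullet" and not in_list:
            out.append("<ul>")
            in_list = True
        elif kind != "bullet" and in_list:
            out.append("</ul>")
            in_list = False
        if kind == "bullet":
            out.append("<li>%s</li>" % escape(text))
        elif kind == "heading":
            out.append("<h1>%s</h1>" % escape(text))
        else:
            out.append("<p>%s</p>" % escape(text))
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


page = render_html([
    ("heading", "Using Structured Text"),
    ("paragraph", "Bullets & headings are supported."),
    ("bullet", "CustomizingTheDocumentProcessor"),
    ("bullet", "CustomizingTheOutputter"),
])
```

  Because the outputter only reads the colored objects, a different
  output format could be produced by writing another function of the
  same shape.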

  One of the most important features of StructuredText is its
  customizability. For information on customizing StructuredText, see:

  - CustomizingTheDocumentProcessor

  - CustomizingTheOutputter


<hr solid id=comments_below>


karl (May 24, 2001 8:54 pm; Comment #1)  --
 <pre>
 >   StructuredTextNode objects support the read-only subset of the
 >   Document Object Model (DOM) API. It should be possible to process
 >   'StructuredTextNode' hierarchies using XML tools such as XSLT.
 </pre>
 
 No, they don't.
 
 First of all, they don't support the API.  They may have calls that
 do the same thing as DOM calls, but that's not the same as supporting
 the API.
 
 Second, even if they were named properly, the calls aren't very
 compliant.  There are differences.
 
 Why am I being a standards wonk?  The DOM is a very basic interface.
 It's meant for random complex tools to work with it.  Those tools need the
 standard to be followed to work; little differences break them.
 
 The upshot is, I'd be surprised if any XML tools such as XSLT worked with
 these objects.  This shouldn't be advertised as a possibility.  STX
 shouldn't be advertised as supporting any DOM until there's a reason
 to believe that it does.
 
 I want the DOM to be supported, and ParsedXML has a great DOM test
 suite that works with any DOM implementation.  Unfortunately,
 the tests are very interdependent right now: there are no read-only
 tests, for example, so you have to write in order to test reading.  I hope that
 this gets fixed and becomes useful to make STX a real DOM.