Customizing the document processor
The document processor is driven by two tables. The first table,
named paragraph_types, is a sequence of callable objects or method
names for coloring paragraphs. If a table entry is a string, then it
is the name of a method of the document processor to be used. For
each input paragraph, the objects in the table are called until one
returns a value (not None). The value returned replaces the
original input paragraph in the output. If none of the objects in
the paragraph types table return a value, then a copy of the
original paragraph is used. The new object returned by calling a
paragraph type should implement the ReadOnlyDOM,
StructuredTextColorizable, and StructuredTextSubparagraphContainer
interfaces. See the Document.py source file for examples.
A paragraph type may also return a list or tuple of replacement paragraphs, which allows a single paragraph to be split into multiple paragraphs.
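The dispatch rule for paragraph types can be sketched in a few lines. This is a minimal, self-contained illustration and not the package's actual implementation: ToyProcessor, color_paragraph, doc_header, and split_on_semicolon are hypothetical names, and paragraphs are plain strings rather than objects implementing the interfaces listed above.

```python
def split_on_semicolon(paragraph):
    """Hypothetical paragraph type: split 'a; b' into two paragraphs."""
    parts = [part.strip() for part in paragraph.split(';')]
    return parts if len(parts) > 1 else None


class ToyProcessor(object):
    """Hypothetical stand-in for the document processor (illustration only)."""

    # Entries are either method names (strings) or callable objects.
    paragraph_types = ['doc_header', split_on_semicolon]

    def doc_header(self, paragraph):
        # Claim paragraphs that start with '= ' and return a replacement.
        if paragraph.startswith('= '):
            return 'HEADER(%s)' % paragraph[2:]
        return None  # not handled here; the next entry gets a chance

    def color_paragraph(self, paragraph):
        for entry in self.paragraph_types:
            # A string entry names a method of the processor itself.
            func = getattr(self, entry) if isinstance(entry, str) else entry
            result = func(paragraph)
            if result is not None:
                # A list or tuple stands for several replacement paragraphs.
                if isinstance(result, (list, tuple)):
                    return list(result)
                return [result]
        # No entry returned a value: keep the original paragraph.
        return [paragraph]
```

Calling ToyProcessor().color_paragraph('one; two') falls through doc_header, which returns None, and is then split by the second entry; a paragraph that no entry claims comes back unchanged.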
The second table, text_types, is a sequence of callable objects or
method names for coloring text. The callable objects in this table
are used in sequence to transform the input text into new text or
objects. The callable objects are passed a string and return
nothing (None) or a three-element tuple consisting of:
- a replacement object,
- a starting position, and
- an ending position
The text from the starting position to the ending position is (logically) replaced with the replacement object. The replacement object is typically an object that implements the ReadOnlyDOM and StructuredTextColorizable interfaces. The replacement object can also be a string or a list of strings or objects. Replacement is done from beginning to end, and the text after the replacement's ending position is passed to the text type objects again for further processing.
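The text type protocol can be illustrated with a small, self-contained sketch. Everything named here is hypothetical and exists only for illustration: the Strong class stands in for a real replacement object, toy_strong is an invented rule for `**bold**` text, and apply_text_type is a simplified driver that handles a single text type, whereas the real processor works through every entry of text_types in sequence.

```python
import re


class Strong(object):
    """Hypothetical replacement object standing in for a strong-text node."""

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return 'Strong(%r)' % self.text


def toy_strong(s, expr=re.compile(r'\*\*([^*\n]+)\*\*').search):
    """Text type: return (replacement, start, end), or None if no match."""
    m = expr(s)
    if m is None:
        return None
    return Strong(m.group(1)), m.start(), m.end()


def apply_text_type(text_type, s):
    """Apply one text type repeatedly, working from beginning to end."""
    pieces = []
    while s:
        result = text_type(s)
        if result is None:
            break
        replacement, start, end = result
        if start:
            pieces.append(s[:start])  # untouched text before the match
        pieces.append(replacement)
        s = s[end:]                   # only the remainder is scanned again
    if s:
        pieces.append(s)              # trailing text with no further match
    return pieces
```

The driver returns a list that mixes plain strings with replacement objects, which is why a text type only ever needs to report where its match starts and ends.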
To create a new StructuredText format based on the document
processor, simply subclass the document processor's class and
override the processing tables or the methods that the processing
table references. The class of the document processor can be found
in the DocumentClass module of the StructuredText package.
Example 1, Disabling use of single quotes for literal inline text
Many people don't like the ClassicStructuredTextRule that causes
single-quoted strings to be translated to literal text (e.g. HTML
code tags). We can disable this in two ways. First, we can
modify the text_types table to remove this text type. The original
text_types table in the DocumentClass class looks like:
    text_types = [
        'doc_href',
        'doc_strong',
        'doc_emphasize',
        'doc_literal',
        ]
We can create our own document processor class with a different table:
    import StructuredText, StructuredText.DocumentClass, re

    class myDocumentClass(StructuredText.DocumentClass.DocumentClass):
        # list() keeps the table a reusable sequence even where
        # filter() returns a one-shot iterator
        text_types = list(filter(
            lambda t: t != 'doc_literal',
            StructuredText.DocumentClass.DocumentClass.text_types))

    Document = myDocumentClass()
    src = open('mydata').read()          # get some source text
    basic = StructuredText.Basic(src)    # convert it to a basic document
    doc = Document(basic)                # convert it to a document-style
    html = StructuredText.HTML(doc)      # generate HTML
Note that we created the subclass table with a filter so that we still pick up new text types as they are added to the base class. Another approach would be to replace the method that detects literal text with one that does nothing:
    class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

        def doc_literal(self, s):
            pass
Example 2, Provide an alternate literal format
Rather than disable the ability to provide literal text, we could simply change it by providing a function that implements a different rule. For example, we might want to allow literal inline text to be spelled with double backward and forward single quotes as in:
We can use expressions in the DTML var tag as in ``<dtml-var "x+'.txt'">''
In this case, we simply override the method that recognizes literal text with one that implements this rule:
    class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

        def doc_literal(
            self, s,
            expr=re.compile(
                r"(?:\s|^)``"            # open
                r"([^\n]+?)"             # contents
                r"''(?:\s|[,.;:!?]|$)"   # close
                ).search):
            r = expr(s)
            if r:
                # span(1) covers only the contents, so widen it by two
                # characters on each side to take in the quote marks
                start, end = r.span(1)
                return (
                    StructuredText.DocumentClass.StructuredTextLiteral(
                        s[start:end]),
                    start-2, end+2)
            else:
                return None
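To see which part of the input this rule claims, the regular expression can be tried on its own, without the StructuredText package. The sketch below copies the pattern from the example above; the variable names (expr, sample, literal, replaced) are only for illustration.

```python
import re

# Same pattern as in the doc_literal override above.
expr = re.compile(
    r"(?:\s|^)``"            # open
    r"([^\n]+?)"             # contents
    r"''(?:\s|[,.;:!?]|$)"   # close
).search

sample = """as in ``<dtml-var "x+'.txt'">''"""

m = expr(sample)
start, end = m.span(1)                 # span of the contents only
literal = sample[start:end]            # text that becomes the literal
replaced = sample[start - 2:end + 2]   # text that the literal replaces

print(literal)
print(replaced)
```

Because the replaced range is computed from the contents group rather than from the whole match, the whitespace before the opening quotes and any punctuation after the closing quotes stay in the surrounding text. The single quotes inside the expression do not end the literal early, since the close requires two consecutive single quotes followed by whitespace, punctuation, or the end of the string.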