Customizing the document processor
The document processor is driven by two tables. The first table,
named 'paragraph_types', is a sequence of callable objects or method
names for coloring paragraphs. If a table entry is a string, then it
is the name of a method of the document processor to be used. For
each input paragraph, the objects in the table are called until one
returns a value (not 'None'). The value returned replaces the
original input paragraph in the output. If none of the objects in
the paragraph types table return a value, then a copy of the
original paragraph is used. The new object returned by calling a
paragraph type should implement the ReadOnlyDOM,
StructuredTextColorizable, and StructuredTextSubparagraphContainer
interfaces. See the 'Document.py' source file for examples.
A paragraph type may return a list or tuple of replacement
paragraphs, thus allowing a paragraph to be split into multiple
paragraphs.
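The lookup described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the processor's actual code: 'color_paragraph', 'ToyProcessor', 'doc_bullet' and 'split_on_bar' are hypothetical names, and plain strings and dictionaries stand in for real paragraph objects.

```python
def split_on_bar(paragraph):
    """Hypothetical callable entry: split 'a | b' into two paragraphs."""
    if ' | ' in paragraph:
        return paragraph.split(' | ')
    return None


class ToyProcessor:
    """Hypothetical processor, for illustration only."""

    # Entries are method names (strings) or callable objects.
    paragraph_types = ['doc_bullet', split_on_bar]

    def doc_bullet(self, paragraph):
        if paragraph.startswith('- '):
            return {'type': 'bullet', 'text': paragraph[2:]}
        return None


def color_paragraph(processor, paragraph):
    """Try each paragraph type in order; the first non-None result wins."""
    for entry in processor.paragraph_types:
        # A string entry names a method of the processor.
        handler = getattr(processor, entry) if isinstance(entry, str) else entry
        result = handler(paragraph)
        if result is None:
            continue
        # A list or tuple result splits the paragraph into several.
        if isinstance(result, (list, tuple)):
            return list(result)
        return [result]
    # No type matched: keep the original paragraph
    # (the text above says the processor uses a copy).
    return [paragraph]
```

Each call returns a list of output paragraphs, so a split result and a single replacement can be handled the same way by the caller.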
The second table, 'text_types', is a sequence of callable objects or
method names for coloring text. The callable objects in this table
are used in sequence to transform the input text into new text or
objects. The callable objects are passed a string and return
nothing ('None') or a three-element tuple consisting of:
- a replacement object,
- a starting position, and
- an ending position
The text between the starting and ending positions is (logically)
replaced with the replacement object. The replacement object is
typically an object that implements the ReadOnlyDOM and
StructuredTextColorizable interfaces. The replacement object can
also be a string or a list of strings or objects. Replacement is
done from beginning to end, and text after the replacement ending
position is passed to the text type objects for further processing.
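The contract for a text type can be illustrated with a small standalone sketch. Everything here is an assumption made for illustration: 'doc_strong_toy' and 'color_text' are hypothetical names, a plain tuple stands in for a real replacement object, and the driver loop only approximates what the processor does with a single text type.

```python
import re


def doc_strong_toy(s, expr=re.compile(r'\*\*([^*\n]+)\*\*').search):
    """Hypothetical text type: recognize **text**.

    Returns None, or (replacement, start, end) as described above.
    """
    r = expr(s)
    if r is None:
        return None
    start, end = r.span(1)
    # Widen the span by 2 on each side to cover the '**' markers.
    return ('strong', s[start:end]), start - 2, end + 2


def color_text(s, text_type):
    """Apply one text type from beginning to end of the string."""
    pieces = []
    while s:
        hit = text_type(s)
        if hit is None:
            break
        replacement, start, end = hit
        if start > 0:
            pieces.append(s[:start])   # untouched text before the match
        pieces.append(replacement)
        s = s[end:]                    # continue after the ending position
    if s:
        pieces.append(s)
    return pieces
```

The result is a list mixing plain strings with replacement objects, which matches the statement that a replacement may be a string, an object, or a list of either.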
To create a new StructuredText format based on the document
processor, simply subclass the document processor's class and
override the processing tables or the methods that the processing
tables reference. The class of the document processor can be found
in the 'DocumentClass' module of the StructuredText package.
Example 1, Disabling use of single quotes for literal inline text
Many people don't like the ClassicStructuredTextRule that causes
single-quoted strings to be translated to literal text (e.g. HTML
'code' tags). We can disable this in two ways. First, we can
modify the 'text_types' table to remove this text type. The original
'text_types' table in the 'DocumentClass' class looks like::
  text_types = [
      'doc_href',
      'doc_strong',
      'doc_emphasize',
      'doc_literal',
      ]
We can create our own document processor class with a different
table::
  import StructuredText, StructuredText.DocumentClass, re

  class myDocumentClass(StructuredText.DocumentClass.DocumentClass):
      # Copy the base class table, dropping only the literal text type
      text_types = [
          t for t in StructuredText.DocumentClass.DocumentClass.text_types
          if t != 'doc_literal']

  Document = myDocumentClass()
  src = open('mydata').read()        # get some source text
  basic = StructuredText.Basic(src)  # convert it to a basic document
  doc = Document(basic)              # convert it to a document-style
  html = StructuredText.HTML(doc)    # generate HTML
Note that we created the subclass table by filtering the base class
table, so that we can still pick up new text types as they are added
to the base class.
Another approach would be to replace the method that detects
literal text with one that does nothing::
  class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

      def doc_literal(self, s): pass
Example 2, Provide an alternate literal format
Rather than disable the ability to provide literal text, we could
simply change it by providing a function that implements a
different rule. For example, we might want to allow literal inline
text to be spelled with two backquotes to open and two single quotes
to close, as in::
  We can use expressions in the DTML var tag as
  in ``<dtml-var "x+'.txt'">''
In this case, we simply override the method that recognizes
literal text with one that implements this rule::
  class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

      def doc_literal(
          self, s,
          expr=re.compile(
              r"(?:\s|^)``"           # open
              r"([^\n]+?)"            # contents
              r"''(?:\s|[,.;:!?]|$)"  # close
              ).search):
          r = expr(s)
          if r:
              start, end = r.span(1)
              return (
                  StructuredText.DocumentClass.StructuredTextLiteral(
                      s[start:end]),
                  start-2, end+2)
          else:
              return None
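The rule in Example 2 can be tried on its own, without the StructuredText package. The sketch below is a standalone check rather than the library's code: 'LITERAL' and 'find_literal' are hypothetical names, and a plain string stands in for the 'StructuredTextLiteral' object.

```python
import re

# Same pattern as in Example 2, written with raw strings.
LITERAL = re.compile(
    r"(?:\s|^)``"            # open: two backquotes at start or after space
    r"([^\n]+?)"             # contents: shortest run without a newline
    r"''(?:\s|[,.;:!?]|$)"   # close: two single quotes, then space,
                             # punctuation, or end of string
)


def find_literal(s):
    """Return (contents, start, end) for the first literal, or None."""
    r = LITERAL.search(s)
    if r is None:
        return None
    start, end = r.span(1)
    # Widen the span by 2 on each side so that it covers the quote marks.
    return s[start:end], start - 2, end + 2


sample = "in ``<dtml-var \"x+'.txt'\">'' now"
text, start, end = find_literal(sample)
```

Because the span is taken from group 1 and then widened by two characters on each side, the replaced region covers the quote marks but not the surrounding whitespace or punctuation.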