Customizing the document processor

  The document processor is driven by two tables. The first table,
  named 'paragraph_types', is a sequence of callable objects or method
  names for coloring paragraphs. If a table entry is a string, then it
  is the name of a method of the document processor to be used. For
  each input paragraph, the objects in the table are called until one
  returns a value (not 'None'). The value returned replaces the
  original input paragraph in the output. If none of the objects in
  the paragraph types table return a value, then a copy of the
  original paragraph is used.  The new object returned by calling a
  paragraph type should implement the ReadOnlyDOM,
  StructuredTextColorizable, and StructuredTextSubparagraphContainer
  interfaces. See the 'Document.py' source file for examples.

  A paragraph type may return a list or tuple of replacement
  paragraphs, thus allowing a paragraph to be split into multiple
  paragraphs.
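  The dispatch rule described above can be sketched with a toy model.
  This is only an illustration of the table-driven lookup, not the
  code in 'Document.py': the class 'ToyProcessor', its method names,
  and the use of plain dictionaries in place of objects implementing
  the interfaces listed above are all assumptions made for the example.

```python
# Toy model of paragraph-type dispatch. ToyProcessor and its methods are
# hypothetical; they are not part of the StructuredText package.

class ToyProcessor:
    # Entries may be method names (strings) or callable objects.
    paragraph_types = ['doc_bullet', 'doc_split']

    def doc_bullet(self, paragraph):
        # Claim paragraphs that start with "- ".
        if paragraph.startswith('- '):
            return {'kind': 'bullet', 'text': paragraph[2:]}
        return None

    def doc_split(self, paragraph):
        # Return a list to split one paragraph into several.
        if ' || ' in paragraph:
            return [{'kind': 'plain', 'text': part}
                    for part in paragraph.split(' || ')]
        return None

    def color_paragraphs(self, paragraphs):
        result = []
        for paragraph in paragraphs:
            for entry in self.paragraph_types:
                # A string entry names a method of the processor.
                if isinstance(entry, str):
                    handler = getattr(self, entry)
                else:
                    handler = entry
                replacement = handler(paragraph)
                if replacement is not None:
                    break  # the first non-None result wins
            else:
                # No paragraph type returned a value: keep a copy.
                replacement = {'kind': 'plain', 'text': paragraph}
            if isinstance(replacement, (list, tuple)):
                result.extend(replacement)
            else:
                result.append(replacement)
        return result


colored = ToyProcessor().color_paragraphs(
    ['- item', 'a || b', 'plain text'])
```

  Here the first paragraph is claimed by 'doc_bullet', the second is
  split in two by 'doc_split', and the third falls through to the
  copy of the original.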

  The second table, 'text_types', is a sequence of callable objects or
  method names for coloring text. The callable objects in this table
  are used in sequence to transform the input text into new text or
  objects.  The callable objects are passed a string and return
  nothing ('None') or a three-element tuple consisting of:

    - a replacement object,

    - a starting position, and

    - an ending position

  The text from the starting position to the ending position is
  (logically) replaced with the replacement object. The replacement
  object is typically an object that implements the ReadOnlyDOM and
  StructuredTextColorizable interfaces. The replacement object can
  also be a string or a list of strings or objects. Replacement is
  done from beginning to end, and text after the replacement ending
  position will be passed to the text type objects for processing.
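  As a rough sketch of this contract, the following toy code defines
  one text type and a small driver that applies a table of text
  types. The names 'doc_strong_toy' and 'color_text', the '**'
  markup, and the use of a tuple as the replacement object are
  assumptions made for the example. The real processor's driver may
  differ, in particular in how it treats the text before a match.

```python
import re

def doc_strong_toy(s, expr=re.compile(r'\*\*([^*\n]+)\*\*').search):
    # Hypothetical text type: **text** becomes a ('strong', text) pair.
    r = expr(s)
    if r:
        start, end = r.span(1)
        # Widen the span by 2 on each side to cover the ** markers.
        return ('strong', s[start:end]), start - 2, end + 2
    return None

def color_text(s, text_types):
    # Strings in the working list are still open for processing;
    # anything else is a finished replacement object.
    pieces = [s]
    for text_type in text_types:
        next_pieces = []
        for piece in pieces:
            if not isinstance(piece, str):
                next_pieces.append(piece)
                continue
            while piece:
                hit = text_type(piece)
                if hit is None:
                    break
                replacement, start, end = hit
                if start:
                    next_pieces.append(piece[:start])
                next_pieces.append(replacement)
                # Text after the ending position is processed next.
                piece = piece[end:]
            if piece:
                next_pieces.append(piece)
        pieces = next_pieces
    return pieces


colored = color_text('a **b** c **d**', [doc_strong_toy])
```

  Each call returns the replacement object together with the span of
  source text it stands for, so the driver never needs to know how
  the match was found.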

  To create a new StructuredText format based on the document
  processor, simply subclass the document processor's class and
  override the processing tables or the methods that the processing
  table references.  The class of the document processor can be found
  in the 'DocumentClass' module of the StructuredText package.

  Example 1, Disabling use of single quotes for literal inline text

    Many people don't like the ClassicStructuredTextRule that causes
    single-quoted strings to be translated to literal text (e.g. HTML
    'code' tags).  We can disable this in two ways. First, we can
    modify the 'text_types' table to remove this text type. The original
    'text_types' table in the 'DocumentClass' class looks like::

      text_types = [
          'doc_href',
          'doc_strong',
          'doc_emphasize',
          'doc_literal',
          ]

    We can create our own document processor class with a different
    table::

      import StructuredText, StructuredText.DocumentClass, re

      class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

          # Keep every inherited text type except 'doc_literal'.
          text_types = list(filter(
              lambda t: t != 'doc_literal',
              StructuredText.DocumentClass.DocumentClass.text_types))

      Document=myDocumentClass()

      src=open('mydata').read()        # get some source text
      basic=StructuredText.Basic(src)  # convert it to a basic document
      doc=Document(basic)              # convert it to a document-style
      html=StructuredText.HTML(doc)    # generate HTML

    Note that we created the subclass table with a filter so that we
    can still pick up new text types as they are added to the base class.
    Another approach would be to replace the method that detects
    literal text with one that does nothing::

      class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

          def doc_literal(self, s): pass

  Example 2, Provide an alternate literal format

    Rather than disable the ability to provide literal text, we could
    simply change it by providing a function that implements a
    different rule. For example, we might want to allow literal inline
    text to be opened with two backquotes and closed with two single
    quotes, as in::

       We can use expressions in the DTML var tag as 
       in ``<dtml-var "x+'.txt'">''

    In this case, we simply override the method that recognizes
    literal text with one that implements this rule::

      class myDocumentClass(StructuredText.DocumentClass.DocumentClass):

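          def doc_literal(
              self, s,
              expr=re.compile(
                  r"(?:\s|^)``"            # open: two backquotes
                  r"([^\n]+?)"             # contents
                  r"''(?:\s|[,.;:!?]|$)"   # close: two single quotes
                  ).search):

              r = expr(s)
              if r:
                  # span(1) covers only the contents; widen it by 2 on
                  # each side so the quote marks are replaced as well.
                  start, end = r.span(1)
                  return (
                      StructuredText.DocumentClass.StructuredTextLiteral(
                          s[start:end]),
                      start - 2, end + 2)
              else:
                  return None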
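    To see what the expression matches without installing the
    StructuredText package, the pattern can be tried on its own. The
    snippet below is a standalone check that uses only the standard
    're' module. The sample string is the second line of the DTML
    example above, and the variable names are chosen for
    illustration.

```python
import re

expr = re.compile(
    r"(?:\s|^)``"            # open: two backquotes
    r"([^\n]+?)"             # contents
    r"''(?:\s|[,.;:!?]|$)"   # close: two single quotes
    ).search

s = "in ``<dtml-var \"x+'.txt'\">''"

r = expr(s)
start, end = r.span(1)           # span of the contents only
literal = s[start:end]           # text that would become the literal
replaced = (start - 2, end + 2)  # span including the quote marks
```

    The single quotes inside the DTML expression do not end the match
    early, because the closing rule requires two single quotes
    followed by whitespace, punctuation, or the end of the text.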