The next generation of structured text has three main goals:
- Break the processes of organizing the paragraphs, tagging special elements, and parsing the paragraphs into separate phases.
- Enable the user to define new structured text types and/or overload or ignore old types.
- Enable the user to define new or custom parsers.
How are the processes broken up?
- The first module is ST. This module contains the StructuredText function and the DOC class.
- The first step is organizing the paragraphs. This is done by the StructuredText function. This function returns a structure of the format ["A",[B]]. Part A is a paragraph; part B is a list of sub-paragraphs of part A. This list is nested and can be traversed to find the sub-paragraphs of each of part A's sub-paragraphs. Every element of the list has the same format, ["A",[B]].
- The second step is tagging structured text types in each
paragraph. This is done by the DOC class. DOC first receives the
structure returned by StructuredText. DOC maintains an
internal list of class instances, one for each structured text type.
Each class takes a raw string (an unprocessed paragraph) and searches
it for its structured text type. The structure returned by DOC is very
similar to the structure returned by StructuredText. The only difference
is that part A is now a list of raw strings and structured text instances.
'EX : original paragraph = "this is a link to john"\n" is returned as [["this is a link to ",
link_instance, "\n"], sub_paragraphs]', where link_instance stands for the instance created for the link and sub_paragraphs for part B. Each instance maintains an internal string which holds the original matching string and any other structured text types found in the string.
- The final stage is parsing the structure returned by DOC. A parser, by my definition, is something that interprets the raw strings and type instances and generates the appropriate output, such as HTML. The parser traverses each paragraph and each instance's string, and generates output based on the type of instance each string belongs to.
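The nested ["A",[B]] layout from the first step can be sketched as follows. This is only an illustration: the sample data is hand-written rather than captured from the StructuredText function, the helper name walk is hypothetical, and the sketch assumes the top level is a list of such pairs.

```python
# Hand-written sample in the ["A", [B]] shape described above.
# Assumption: the top level is itself a list of [paragraph, sub-paragraphs] pairs.
doc = [
    ["Top paragraph", [
        ["First sub-paragraph", [
            ["Nested sub-paragraph", []],
        ]],
        ["Second sub-paragraph", []],
    ]],
]


def walk(paragraphs, depth=0):
    """Yield (depth, text) for every paragraph, parents before children."""
    for text, children in paragraphs:
        yield depth, text
        # Part B has the same format, so the same function handles it.
        yield from walk(children, depth + 1)


outline = list(walk(doc))
for depth, text in outline:
    print("  " * depth + text)
```

Because every element has the same shape, a single recursive function is enough to reach a paragraph at any depth.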
How does the user define new types or extend/overload old ones?
- To define a new type, the user would need to create a class for
the new type. This class would contain the expr that matches the
new type. The class's
__call__ method would be overloaded to receive a raw string and determine whether it matches the new type. If the raw string matches, a new instance of the type is created and that instance's string becomes the matching sub-string. The instance also maintains the start and end positions of the sub-string in relation to the original string. The class also needs a span method, which returns a tuple (start,end) giving the sub-string's position.
- To change how a type is matched the user would need to alter the expr in the class for the type to be changed.
- To ignore a type, the easy way is to remove the type from self.types in DOC. This can be done either by brute force, literally removing it from the code, or by subclassing DOC and splicing the type out of self.types. NOTE : this requires knowing the location of the type in the list.
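A minimal sketch of the subclassing approach to ignoring a type. The DOC class and the type classes below are simplified stand-ins written for this example, not the real ST module, and the name DocWithoutLinks is hypothetical. Instead of hard-coding the position, the sketch looks it up through each instance's .type() string.

```python
class Header:
    def type(self):
        return "header"


class Link:
    def type(self):
        return "link"


class Emphasis:
    def type(self):
        return "emphasis"


class DOC:
    """Stand-in for the real DOC class; only self.types matters here."""

    def __init__(self):
        self.types = [Header(), Link(), Emphasis()]


class DocWithoutLinks(DOC):
    """A DOC that no longer recognizes the (stand-in) link type."""

    def __init__(self):
        DOC.__init__(self)
        # Find where the unwanted type sits in the list, then splice it out.
        index = [t.type() for t in self.types].index("link")
        del self.types[index]


print([t.type() for t in DocWithoutLinks().types])
```

The original DOC is untouched, so code that still needs links can keep using it.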
How to extend DOC to recognize new types
Define the new type
- Need to write a new class for the type. This class must have the following:
- an overloaded call function
- an overloaded init function
- a span function
- a type function
- a string function
- The init function creates a self.str item for the new type. It also creates the two items used by the span function, self.start and self.end.
- The type function returns a string which tells what type the instance is. Ex : the current header class's .type() returns "header"
- The string function returns self.str
- The overloaded call function receives a string and determines if there is a matching structured text type in the string. If there is, set self.start and self.end for the range of the sub-string that matches. Create a new instance whose string is the matching sub-string. Return the new instance.
- The span function returns the tuple (self.start,self.end)
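The five required functions can be sketched in one small class. Everything specific here is an assumption made for illustration: the type name "strong", the **text** markup it matches, the class attribute name expr, and returning None when nothing matches.

```python
import re


class Strong:
    """Hypothetical new type: text wrapped in double asterisks."""

    # Assumed markup for this example only.
    expr = re.compile(r"\*\*(.+?)\*\*")

    def __init__(self, text="", start=0, end=0):
        self.str = text      # the matching sub-string
        self.start = start   # start of the match in the original string
        self.end = end       # end of the match in the original string

    def __call__(self, raw_string):
        match = self.expr.search(raw_string)
        if match is None:
            return None      # assumption: no match is signalled with None
        self.start, self.end = match.span()
        # The new instance's string is the matching sub-string.
        return Strong(match.group(0), self.start, self.end)

    def type(self):
        return "strong"

    def string(self):
        return self.str

    def span(self):
        return (self.start, self.end)


matcher = Strong()
found = matcher("this is **very** important")
print(found.type(), found.string(), found.span())
```

The matcher instance is the one that would sit in DOC's list of types; each call hands back a fresh instance describing one match.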
Make it so DOC can recognize the new type
- Need to create a new DOC, which subclasses the old DOC
- Overload the init function. Perform the original DOC init first, then modify self.types, which is the list of structured text types. An instance of the new type must be inserted into or appended to the list. NOTE : Order does matter.
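A sketch of such a subclass, using simplified stand-ins for DOC and its types rather than the real classes. The names MyDOC and Strong are hypothetical, and the position chosen for the new type is arbitrary; which position is right depends on how the new type interacts with the existing ones.

```python
class Header:
    def type(self):
        return "header"


class Link:
    def type(self):
        return "link"


class Strong:
    """Hypothetical new type to be registered."""

    def type(self):
        return "strong"


class DOC:
    """Stand-in for the real DOC class; only self.types matters here."""

    def __init__(self):
        self.types = [Header(), Link()]


class MyDOC(DOC):
    def __init__(self):
        # Run the original init so the built-in types are set up first.
        DOC.__init__(self)
        # Order matters, so insert at a chosen position instead of
        # blindly appending. Position 1 is only an example.
        self.types.insert(1, Strong())


print([t.type() for t in MyDOC().types])
```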
How to extend DOC to overload old types
How to extend a parser
Sub-Class an older parser
- Why Sub-class an older parser?
- If the user is modifying a small number of structured text types, it is faster to sub-class and have the majority of the types pre-defined by an old parser.
To add a new type
- Modify self.types for the parser class. This is a dictionary, so order is irrelevant. Ex: self.types["newtypename"] = self.typefunction, where newtypename is the string returned by the new instance's .type() call and typefunction is the function in the parser which handles that instance type.
- Modify self.self_par if the structure marks paragraphs internally. Ex: self.self_par.append("newtypename"), where newtypename is the string returned by the new instance's .type() call. Headers and lists currently do this.
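The two registration steps might look like this in a parser subclass. BaseParser is a simplified stand-in for an existing parser, and the "note" type, the Note class, and the HTML it produces are hypothetical.

```python
class BaseParser:
    """Stand-in for an existing parser with one built-in type."""

    def __init__(self):
        self.string = ""                      # accumulated output
        self.types = {"header": self.header}  # type name -> handler
        self.self_par = ["header"]            # types that mark their own paragraphs

    def header(self, instance):
        self.string = self.string + "<h1>" + instance.string() + "</h1>"


class NoteParser(BaseParser):
    def __init__(self):
        BaseParser.__init__(self)
        # Register the handler under the string the instance's .type() returns.
        self.types["note"] = self.note
        # Assumption for this example: notes mark paragraphs internally.
        self.self_par.append("note")

    def note(self, instance):
        self.string = self.string + '<div class="note">' + instance.string() + "</div>"


class Note:
    """Hypothetical instance of the new type."""

    def __init__(self, text):
        self.str = text

    def type(self):
        return "note"

    def string(self):
        return self.str


parser = NoteParser()
item = Note("remember this")
parser.types[item.type()](item)  # dispatch on the type name
print(parser.string)
```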
To overload a built-in type
- In the new class, re-define the function which handles the type. Ex: to overload header, re-define header in the sub-class: def header(self,object): self.string = self.string + "I am not a header or crook"
- Remember that functions receive instances of the types they handle. To go through the instance's string, use the .string() call.
- The .string() call returns either a string (if the object's string is text only) or a list (if the object's string contains other instances). If a list is returned, it is necessary to go through each item. There can be only three things in a list: strings, lists, and instances. For strings, call the self.paragraph function, or whatever function handles strings. For lists, call the self.loop function. For instances, call the self.instance function.
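The dispatch over strings, lists, and instances can be sketched as below. The function names paragraph, loop, and instance follow the text; everything else (the Instance stand-in, the body helper, the type names, and the generated tags) is hypothetical.

```python
class Instance:
    """Stand-in for a tagged structured text instance."""

    def __init__(self, kind, content):
        self.kind = kind
        self.content = content  # a string, or a list of strings/lists/instances

    def type(self):
        return self.kind

    def string(self):
        return self.content


class SketchParser:
    def __init__(self):
        self.string = ""
        self.types = {"emphasis": self.emphasis, "link": self.link}

    def paragraph(self, text):
        self.string = self.string + text

    def loop(self, items):
        for item in items:
            if isinstance(item, str):
                self.paragraph(item)
            elif isinstance(item, list):
                self.loop(item)
            else:
                self.instance(item)

    def instance(self, obj):
        # Look up the handler by the name the instance reports.
        self.types[obj.type()](obj)

    def body(self, obj):
        # Hypothetical helper: .string() may be plain text or a list.
        content = obj.string()
        if isinstance(content, str):
            self.paragraph(content)
        else:
            self.loop(content)

    def emphasis(self, obj):
        self.string = self.string + "<em>"
        self.body(obj)
        self.string = self.string + "</em>"

    def link(self, obj):
        self.string = self.string + "<a>"
        self.body(obj)
        self.string = self.string + "</a>"


link = Instance("link", ["see ", Instance("emphasis", "this"), " page"])
parser = SketchParser()
parser.loop(["Please ", link, "."])
print(parser.string)
```

Checking the return value of .string() in one helper keeps each handler short, since every handler faces the same string-or-list question.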