Basic rules for Structured Text
Separate the three steps: organizing the paragraphs, tagging structured text structures, and parsing the paragraphs to generate output
Make it possible to define new types and overload/extend existing types
Make it possible to customize parsers and define new ones
How does one use Structured Text NG
There are two basic files currently, ST.py and Html.
ST.py contains the function StructuredText and the class DOC
The paragraphs input is either a list of paragraphs, such as the result of file.readlines(), or a single large string. ST.StructuredText can accept either input.
ST.StructuredText returns a structure which ST.DOC will accept.
The structure returned has the format [A,[B]]. A is the original paragraph and [B] is a list of sub-paragraphs of A.
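To make the [A,[B]] shape concrete, here is a minimal sketch that nests paragraphs by indentation. It is not the code in ST.py: the function name nest_paragraphs and the use of leading whitespace to decide nesting are assumptions made for illustration only.

```python
def nest_paragraphs(paragraphs):
    """Group paragraphs into [text, [sub-paragraphs]] nodes by indentation.

    Hypothetical helper, not part of ST.py.
    """
    root = []                # top-level list of [A, [B]] nodes
    stack = [(-1, root)]     # (indent, children list) for each open ancestor
    for raw in paragraphs:
        text = raw.strip()
        if not text:
            continue         # skip blank entries
        indent = len(raw) - len(raw.lstrip())
        # Close every ancestor that is not shallower than this paragraph.
        while stack[-1][0] >= indent:
            stack.pop()
        node = [text, []]    # A is the text, [B] starts out empty
        stack[-1][1].append(node)
        stack.append((indent, node[1]))
    return root


doc = nest_paragraphs([
    "Title",
    "  First child",
    "    Grandchild",
    "  Second child",
])
```

Each node is a two-item list, so a consumer can always unpack it as text, children and recurse on the children.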
ST.DOC is a class whose call function has been overloaded to accept the structure returned by ST.StructuredText. Since ST.DOC is a class, an instance must first be created.
ST.DOC returns a structure of the same format as the one returned by ST.StructuredText. However, the original paragraph (part A) is altered to reflect the structured text types that were found. Part A is either a string, if no structured text types were found, or a list. The list contains the sub-strings of the original paragraph that were not part of a structured text item, together with instances that internally hold the sub-string containing the structured text item.
Html contains a class, HTML, whose call function has been overloaded to accept an ST.DOC structure.
Since Html.HTML is a class, an instance of it must first be created before it can be used.
Html.HTML will traverse the ST.DOC structure. The instances will be interrogated to determine which type they are, and Html.HTML will generate the appropriate code.
How does the user define new types, or extend/overload old ones?
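The three steps can be sketched end to end with toy stand-ins. Nothing below is taken from ST.py or Html: structured_text, Emphasis, ToyDoc and ToyHtml are hypothetical names, and the only recognized type is an assumed *emphasis* marker. The sketch only shows the calling pattern: build the structure, create a tagger instance and call it, then create an output instance and call it.

```python
import re


def structured_text(paragraphs):
    # Stand-in for step 1: accept a list or one large string and
    # return flat [A, [B]] nodes (no sub-paragraphs in this toy version).
    if isinstance(paragraphs, str):
        paragraphs = re.split(r"\n\s*\n", paragraphs)
    return [[p.strip(), []] for p in paragraphs if p.strip()]


class Emphasis:
    # Hypothetical structured text type: *text*
    expr = re.compile(r"\*([^*]+)\*")

    def __init__(self, text):
        self.text = text

    def type(self):
        return "emphasis"


class ToyDoc:
    # Stand-in for step 2: replace part A with a list of plain
    # sub-strings and type instances when a type is found.
    def __call__(self, structure):
        return [[self.tag(a), self(b)] for a, b in structure]

    def tag(self, text):
        parts, pos = [], 0
        for match in Emphasis.expr.finditer(text):
            if match.start() > pos:
                parts.append(text[pos:match.start()])
            parts.append(Emphasis(match.group(1)))
            pos = match.end()
        if not parts:
            return text          # nothing found: part A stays a string
        if pos < len(text):
            parts.append(text[pos:])
        return parts


class ToyHtml:
    # Stand-in for step 3: ask each instance for its type and emit markup.
    tags = {"emphasis": "em"}

    def __call__(self, structure):
        out = []
        for a, b in structure:
            items = [a] if isinstance(a, str) else a
            body = ""
            for item in items:
                if isinstance(item, str):
                    body += item
                else:
                    tag = self.tags[item.type()]
                    body += "<%s>%s</%s>" % (tag, item.text, tag)
            out.append("<p>%s</p>" % body)
            out.append(self(b))  # recurse into sub-paragraphs
        return "".join(out)


tagger = ToyDoc()      # an instance must be created before it is called
writer = ToyHtml()
html = writer(tagger(structured_text("Hello *world*\n\nBye")))
```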
To define a new type, the user would need to create a class for the new type. This class would contain the expr that matches the new type. The class's __call__ method would be overloaded to receive a raw string and determine if it matches the new type. If the raw string matches, a new instance of the type is created and that instance's string becomes the matching sub-string. The instance also maintains the start and end positions of the sub-string in relation to the original string. The class also needs to provide a span method, which returns a tuple (start,end) giving the sub-string's position.
To change how a type is matched the user would need to alter the expr in the class for the type to be changed.
To ignore a type, the easy way is to remove the type from self.types in DOC. This can be done either by brute force, literally removing it from the code, or by subclassing DOC and simply splicing the type out of self.types. NOTE : this requires knowing the location of the type in the list.
How to extend DOC to recognize new types
Define the new type
Need to write a new class for the type. This class must have the following
an overloaded call function
an overloaded init function
a span function
a type function
The init function will create a self.str item for the new type. It also creates two items for the span function, self.start and self.end
The type function will return a string which tells what type the instance is. Ex : the current header class's .type() returns "header"
A string function which returns self.str
The overloaded call function receives a string and determines if there is a matching structured text type in the string. If there is, set self.start and self.end for the range of the sub-string that matches. Create a new instance whose string is the matching sub-string. Return the new instance
span returns the tuple (self.start,self.end)
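The requirements listed above could be met by a class along these lines. This is only a sketch under stated assumptions: the type name "strong", the **text** syntax, its regular expression, and returning None when nothing matches are all invented for illustration and are not taken from ST.py.

```python
import re


class Strong:
    """Hypothetical new type that matches **text**."""

    def __init__(self, text="", start=0, end=0):
        # expr is what the type uses to find a matching sub-string.
        self.expr = re.compile(r"\*\*[^*]+\*\*")
        self.str = text        # the matching sub-string
        self.start = start     # position of the sub-string in the
        self.end = end         # original string

    def __call__(self, raw):
        match = self.expr.search(raw)
        if match is None:
            return None        # assumption: no match is signalled by None
        start, end = match.span()
        # Create a new instance whose string is the matching sub-string.
        return Strong(match.group(0), start, end)

    def span(self):
        return (self.start, self.end)

    def type(self):
        return "strong"

    def string(self):
        return self.str


found = Strong()("a **b** c")
```

Here found.span() gives the position of the match in the raw string, which lets the caller split the paragraph into the text before, the instance, and the text after.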
Make it so DOC can recognize the new type
Need to create a new DOC, which subclasses the old DOC
Overload the init function. Perform the original DOC init, but then self.types needs to be modified. This is a list of structured text types. An instance of the new type must be inserted/appended to the list. NOTE : Order does matter.
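A sketch of that subclassing step follows. DOC here is a stand-in with an assumed list of two types, not the real ST.DOC, and Header, Emphasis and Strong are placeholder classes. Strong is inserted at the front on the assumption that it should be tried before the other types, which is one way order can matter.

```python
class Header:
    # Placeholder type; a real type would also carry expr, span, etc.
    def type(self):
        return "header"


class Emphasis:
    def type(self):
        return "emphasis"


class Strong:
    # The hypothetical new type to be recognized.
    def type(self):
        return "strong"


class DOC:
    # Stand-in for ST.DOC: only the self.types list is modelled.
    def __init__(self):
        self.types = [Header(), Emphasis()]


class MyDOC(DOC):
    def __init__(self):
        DOC.__init__(self)                # perform the original init
        self.types.insert(0, Strong())    # then add an instance of the new type


names = [t.type() for t in MyDOC().types]
```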
How to extend DOC to overload old types
Remove a type
Sub-class ST.DOC, and overload the init function
Splice the type to remove from self.types.
Now use the sub-class of ST.DOC to accept the ST.StructuredText structure.
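The removal can be sketched as below, again with a stand-in DOC rather than the real ST.DOC. Instead of splicing by position, this version filters on the string returned by .type(), which avoids having to know where the type sits in the list. That lookup is an assumption of this sketch; with a known index i, del self.types[i] is the positional splice.

```python
class Header:
    # Placeholder types; only .type() is modelled.
    def type(self):
        return "header"


class Emphasis:
    def type(self):
        return "emphasis"


class DOC:
    # Stand-in for ST.DOC with an assumed type list.
    def __init__(self):
        self.types = [Header(), Emphasis()]


class NoEmphasisDOC(DOC):
    def __init__(self):
        DOC.__init__(self)
        # Keep every type except the one to ignore.
        self.types = [t for t in self.types if t.type() != "emphasis"]


remaining = [t.type() for t in NoEmphasisDOC().types]
```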
Modify an existing type
sub-class the existing type. Ex: class my_header(ST.header):
To modify what recognizes the structure, overload the init function. Perform the original init. There is a self.expr item which is how each type finds a matching sub-string. Modify this to fit your needs.
If necessary, modify the overloaded call function. It is up to the call function to receive a string and find any sub-strings in the string which match self.expr. If a sub-string is found, call will create an instance of the type. For example, the header call will create a header instance in this manner: result = header(sub-string). Call must also give the result a start and end (result.start, result.end) which indicate the starting and ending positions of the sub-string in the original string. Call will return the newly created instance.
Sub-class ST.DOC and overload the init function. Call the original init, then in self.types, replace the previous class name with the new one. Ex: in the new DOC __init__ do the following. (For this example I will use header)
self.types[:2] = my_header()
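The whole sequence for modifying a type might look like the sketch below. The header and DOC classes are simplified stand-ins, both regular expressions are invented, and the replacement locates the old entry by its .type() string and assigns to a single index, which is an assumption of this sketch rather than the original's approach.

```python
import re


class header:
    # Stand-in for an existing type; the expr is an assumed pattern.
    def __init__(self, text="", start=0, end=0):
        self.expr = re.compile(r"^[A-Z][^\n]*$")
        self.str, self.start, self.end = text, start, end

    def __call__(self, raw):
        match = self.expr.search(raw)
        if match is None:
            return None
        # self.__class__ makes sub-classes return their own kind.
        result = self.__class__(match.group(0))
        result.start, result.end = match.span()
        return result

    def type(self):
        return "header"


class my_header(header):
    def __init__(self, text="", start=0, end=0):
        header.__init__(self, text, start, end)      # original init
        self.expr = re.compile(r"^=+ [^\n]+$")        # new matching rule


class DOC:
    # Stand-in for ST.DOC.
    def __init__(self):
        self.types = [header()]


class MyDOC(DOC):
    def __init__(self):
        DOC.__init__(self)
        for i, t in enumerate(self.types):
            if t.type() == "header":
                self.types[i] = my_header()           # swap in the new type


matched = my_header()("== Title")
```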
How to extend a parser
Sub-Class an older parser
Why Sub-class an older parser?
If the user is modifying a small number of structured text types, it is faster to sub-class and have the majority of the types pre-defined by an old parser.
To add a new type
Modify the self.types for the parser class. This is a dictionary, so order is irrelevant. Ex: self.types["newtypename"] = self.typefunction where newtypename is the string returned by the new instance's .type() call and typefunction is the function in the parser which handles that instance type.
Modify the self.self_par if the structure marks paragraphs internally. Ex: self.self_par.append("newtypename") where newtypename is the string returned by the new instance's .type() call. Headers and lists do this currently
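As a sketch of these two steps, the parser below is a stand-in that models only self.string, self.types and self.self_par. The class names, the "strong" type and the markup it emits are assumptions for illustration, not the contents of Html.

```python
class StrongInstance:
    # Placeholder for an instance produced by the tagging step.
    def __init__(self, text):
        self.text = text

    def type(self):
        return "strong"

    def string(self):
        return self.text


class Parser:
    # Stand-in for an existing parser class.
    def __init__(self):
        self.string = ""
        self.types = {"header": self.header}
        self.self_par = ["header"]

    def header(self, instance):
        self.string = self.string + "<h1>%s</h1>" % instance.string()


class MyParser(Parser):
    def __init__(self):
        Parser.__init__(self)
        # Register the handler under the name returned by .type().
        self.types["strong"] = self.strong
        # Only needed if the type marks paragraphs internally:
        # self.self_par.append("strong")

    def strong(self, instance):
        self.string = self.string + "<strong>%s</strong>" % instance.string()


parser = MyParser()
item = StrongInstance("bold")
parser.types[item.type()](item)   # dispatch on the instance's type name
```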
To overload a built-in type
In the new class, re-define the function which handles the type. Ex: to overload header, re-define header in the sub-class:
    def header(self,object):
        self.string = self.string + "I am not a header or crook"
Remember that functions receive instances of the types they handle. To go through the instance's string, use the .string() call.
The .string() call returns either a string (if the object's string is text only) or a list if the object's string contains other instances. If a list is returned it is necessary to go through each item. There can be only three things in a list: strings, lists, and instances. For strings, call the self.paragraph function, or whatever function handles strings. For lists, call the self.loop function. For instances, call the self.instance function.
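The three-way dispatch described above can be sketched as follows. Node and Renderer are hypothetical stand-ins; only the method names paragraph, loop and instance come from the text, and their bodies here are assumed.

```python
class Node:
    # Placeholder instance: .string() returns a str or a list.
    def __init__(self, kind, content):
        self.kind = kind
        self.content = content

    def type(self):
        return self.kind

    def string(self):
        return self.content


class Renderer:
    def __init__(self):
        self.string = ""
        self.types = {"emphasis": self.emphasis}

    def paragraph(self, text):
        # Handles plain strings.
        self.string = self.string + text

    def loop(self, items):
        # Handles lists: each item is a string, a list, or an instance.
        for item in items:
            if isinstance(item, str):
                self.paragraph(item)
            elif isinstance(item, list):
                self.loop(item)
            else:
                self.instance(item)

    def instance(self, obj):
        # Handles instances by looking up the function for their type.
        self.types[obj.type()](obj)

    def emphasis(self, obj):
        self.string = self.string + "<em>"
        content = obj.string()
        if isinstance(content, str):
            self.paragraph(content)
        else:
            self.loop(content)
        self.string = self.string + "</em>"


renderer = Renderer()
inner = Node("emphasis", "c")
renderer.loop(["a ", Node("emphasis", ["b ", inner]), [" d"]])
```

Because the type handler calls back into paragraph or loop, nested instances are handled without any extra code.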