
Strip HTML-tags

Based on striphtml.py from Itamar.

Modified by Frank Ristau
Modified by Zeljko Trulec

License: GPL


Problem:
You want to use the output of a document or method (for example in a sendmail task), but it contains HTML tags that are not needed there.
Solution:
Use an External Method.
Line breaks are controlled by the second parameter (w_wrap):
 0 = Keeps only the line breaks already present in the source (\n)
 1 = Additionally converts <br> to a newline (\n) and </p> to two newlines (\n\n)
 2 = Removes all line breaks; every run of whitespace is collapsed into a single space
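For comparison, the three w_wrap modes can be sketched in Python 3. The sgmllib module used by the original script no longer exists in Python 3, so this hypothetical port relies on the standard html.parser.HTMLParser instead. The class name, the strip() function and the wrap_mode parameter only mirror striphtml.py from Step 1 and are not part of the original.

```python
from html.entities import entitydefs
from html.parser import HTMLParser


class StrippingParser(HTMLParser):
    """Hypothetical Python 3 port: keeps text, drops every tag."""

    def __init__(self, wrap_mode=0):
        # convert_charrefs=False delivers &amp; / &#169; as separate callbacks,
        # so they can be re-emitted unchanged like in the sgmllib version
        super().__init__(convert_charrefs=False)
        self.wrap_mode = wrap_mode
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)

    def handle_entityref(self, name):
        # known entities keep their ';', unknown ones are emitted without it
        suffix = ";" if name in entitydefs else ""
        self.parts.append("&%s%s" % (name, suffix))

    def handle_starttag(self, tag, attrs):
        # mode 1: <br> becomes one newline
        if tag == "br" and self.wrap_mode == 1:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # mode 1: </p> becomes two newlines
        if tag == "p" and self.wrap_mode == 1:
            self.parts.append("\n\n")


def strip(s, wrap_mode=0):
    parser = StrippingParser(wrap_mode)
    parser.feed(s)
    parser.close()
    text = "".join(parser.parts)
    if wrap_mode == 2:
        # mode 2: collapse every whitespace run (including \n) into one space
        return " ".join(text.split())
    return text


if __name__ == "__main__":
    html_input = "<p>Hello<br>World</p><p>Bye</p>"
    for mode in (0, 1, 2):
        print(mode, repr(strip(html_input, mode)))
```

Passing convert_charrefs=False is a design choice of this sketch. It lets references such as &amp; pass through untouched, as in the original, instead of being decoded by the parser.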
 
 
Step 1:
Save the following Python script as striphtml.py in the Extensions folder of your Zope installation.
#!/usr/bin/python

import sgmllib, string

class StrippingParser(sgmllib.SGMLParser):

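    # Replace sgmllib's minimal entity table (lt, gt, amp, quot, apos)
    # with the full HTML entity table, so known entities keep their ';'
    from htmlentitydefs import entitydefs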

    def __init__(self, umbr):
        sgmllib.SGMLParser.__init__(self)
        self.result = ""
        # umbr ("Umbruch", German for line break) is the w_wrap mode: 0, 1 or 2
        self.umbruch = umbr

    def handle_data(self, data):
        if data:
            self.result = self.result + data

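    def handle_charref(self, name):
        # keep character references such as &#169; unchanged
        self.result = "%s&#%s;" % (self.result, name)

    def handle_entityref(self, name):
        # keep entity references such as &amp; unchanged
        if name in self.entitydefs:
            x = ';'
        else:
            # unknown entities are re-emitted without ';', so a
            # non-standard entity written as &foo; comes out as &foo
            x = ''
        self.result = "%s&%s%s" % (self.result, name, x)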

    def start_br(self, attrs):
        if self.umbruch == 1:
            self.result = self.result + "\n"

    def start_p(self, attrs):
        # must be defined, otherwise sgmllib never calls end_p()
        pass

    def end_p(self):
        if self.umbruch == 1:
            self.result = self.result + "\n\n"

    def unknown_starttag(self, tag, attrs):
        pass

    def unknown_endtag(self, tag):
        pass


def strip(s, umbr=0):
    parser = StrippingParser(umbr)
    parser.feed(s)
    parser.close()
    if umbr == 2:
        # mode 2: collapse every whitespace run (including \n) into one space
        return string.join(string.split(parser.result))
    else:
        return parser.result

if __name__=='__main__':
    import sys
    print strip(open(sys.argv[1], "r").read())
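The parser re-emits character and entity references verbatim, so the stripped text can still contain sequences such as &amp; or &#169;. If a plain-text mail should show the actual characters, one option (an assumption, not part of the original script) is to decode them afterwards. This Python 3 sketch uses the standard html.unescape function; decode_entities is a hypothetical helper name.

```python
import html


def decode_entities(stripped_text):
    """Hypothetical post-processing step for the output of strip().

    Turns references such as &amp;, &lt; or &#169; into the characters
    they stand for.
    """
    return html.unescape(stripped_text)


if __name__ == "__main__":
    print(decode_entities("Fish &amp; Chips &#169; &lt;Zope&gt;"))
```

Only decode text that will not be inserted into HTML again, because decoding &lt; produces a literal "<".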
Step 2:
In the Zope Management Interface (ZMI), add an External Method with these settings:
Id: striphtml
Title: striphtml
Module Name: striphtml
Function Name: strip
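The Module Name field names the file striphtml.py in the Extensions folder (without the .py suffix), and Function Name selects the function inside that file. As a rough analogy only, not Zope's own loading code, the lookup resembles this Python 3 sketch built on importlib; load_external_function is a hypothetical name.

```python
import importlib.util
import os
import tempfile


def load_external_function(extensions_dir, module_name, function_name):
    """Load function_name from <extensions_dir>/<module_name>.py.

    Analogy to the Module Name / Function Name fields of an External
    Method; this is not how Zope itself implements the lookup.
    """
    path = os.path.join(extensions_dir, module_name + ".py")
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the file to define its functions
    return getattr(module, function_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ext_dir:
        # stand-in striphtml.py with a trivial strip() for the demo
        with open(os.path.join(ext_dir, "striphtml.py"), "w") as f:
            f.write("def strip(s, umbr=0):\n    return ' '.join(s.split())\n")
        strip = load_external_function(ext_dir, "striphtml", "strip")
        print(strip("Hello\n   World"))
```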
 
Step 3:
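Call striphtml from DTML as follows. Here doc_or_meth is the id of the document or method to convert (_['doc_or_meth'] renders it), and w_wrap is the line-break mode 0, 1 or 2: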
<dtml-with striphtml_dir>
    <dtml-call "REQUEST.set('variable', striphtml(_['doc_or_meth'],w_wrap))">
</dtml-with>
Finally, output the converted text:
<dtml-var variable>