


Unicode aware lexicon type for ZCTextIndex


The stock lexicon handles 8-bit strings well, provided the locale setting in zope.conf is right, but it does not work with Unicode or UTF-8. UnicodeLexicon fills this gap.
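To see why, consider a splitter built around a \w+ regular expression. Treating the stock splitter as a plain \w+ match is a simplifying assumption, and the sketch uses Python 3's re module rather than the Python 2 runtime of Zope 2. Applied to the raw UTF-8 bytes of a word, the byte-level pattern only recognises ASCII word characters, so an umlaut breaks the word apart; applied to the decoded Unicode string, the word survives intact.

```python
import re

# Byte pattern: in Python 3, \w in a bytes regex matches only ASCII [a-zA-Z0-9_].
BYTE_WORDS = re.compile(rb"\w+")
# Str pattern: \w matches any Unicode word character.
UNICODE_WORDS = re.compile(r"\w+")


def split_bytes(data):
    """Split undecoded UTF-8 bytes the way an 8-bit splitter would."""
    return BYTE_WORDS.findall(data)


def split_unicode(text):
    """Split decoded text with a Unicode-aware pattern."""
    return UNICODE_WORDS.findall(text)


raw = "Müller".encode("utf-8")              # b'M\xc3\xbcller'
print(split_bytes(raw))                     # [b'M', b'ller']
print(split_unicode(raw.decode("utf-8")))   # ['Müller']
```

The two non-ASCII bytes of "ü" are not word characters to the byte pattern, so a search for "Müller" could never match the indexed fragments.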


This product adds a ZCTextIndex Unicode Lexicon type to Zope. The lexicon comes with word splitters, stop word removers, and a case normalizer.
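A lexicon of this kind runs text through its elements in order, each element consuming the previous one's output. The sketch below models every element as an object with a process(list) method, loosely following the pipeline element convention of ZCTextIndex; the class names, the tiny stop word set, and the use of Python 3 are illustrative assumptions, not UnicodeLexicon's actual code.

```python
import re


class UnicodeSplitter:
    """Hypothetical splitter: turns a list of texts into a list of words."""
    rx = re.compile(r"\w+")

    def process(self, lst):
        result = []
        for text in lst:
            result.extend(self.rx.findall(text))
        return result


class UnicodeCaseNormalizer:
    """Hypothetical normalizer: lowercases every word."""

    def process(self, lst):
        return [word.lower() for word in lst]


class StopWordRemover:
    """Hypothetical remover: drops words found in a stop word set."""

    def __init__(self, stop_words):
        self.stop_words = frozenset(stop_words)

    def process(self, lst):
        return [word for word in lst if word not in self.stop_words]


def run_pipeline(elements, texts):
    """Feed each element's output into the next, like a lexicon pipeline."""
    words = texts
    for element in elements:
        words = element.process(words)
    return words


pipeline = [
    UnicodeSplitter(),
    UnicodeCaseNormalizer(),
    StopWordRemover({"the", "is", "a", "an", "of"}),
]
print(run_pipeline(pipeline, ["The Straße is GRÜN"]))  # ['straße', 'grün']
```

Normalizing case before removing stop words means the stop list only needs lowercase entries.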

If you have GenericSetup installed, you can use the provided extension profile to create Unicode lexicons in your portal_catalog and update the Title, Description, and SearchableText ZCTextIndexes.


The lexicon expects either Unicode or UTF-8 input. If your site uses another encoding, e.g. UTF-16 (or UCS-2) or UTF-32 (UCS-4), you will have to change the enc constant in the product's code accordingly.
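Whatever the lexicon receives must end up as Unicode before splitting. A minimal sketch of that step, assuming a module-level constant named ENC that stands in for the product's enc constant and a purely hypothetical helper to_unicode: byte strings are decoded with the configured encoding, and Unicode passes through unchanged. Decoding UTF-16 bytes with a UTF-8 setting can fail outright, as here, or silently produce garbage, which is why the constant has to match the site's encoding.

```python
ENC = "utf-8"  # hypothetical stand-in for the product's enc constant


def to_unicode(value, enc=ENC):
    """Return text unchanged; decode byte strings using the configured encoding."""
    if isinstance(value, bytes):
        return value.decode(enc)
    return value


print(to_unicode(b"Gr\xc3\xbcn"))                         # Grün
print(to_unicode("Grün".encode("utf-16"), enc="utf-16"))  # Grün

try:
    # UTF-16 bytes (with a byte order mark) decoded under a UTF-8 setting.
    to_unicode("Grün".encode("utf-16"))
except UnicodeDecodeError as exc:
    print("wrong enc:", exc.reason)
```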

The extension profile installs lexicons without stop word removers. This is because only English-language stop words are supported, and it is safe to assume you are using Unicode precisely because you need to handle non-English text.
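The risk of applying an English stop list to other languages is easy to demonstrate. In the sketch below, which uses a hypothetical five-word stop list rather than the list ZCTextIndex actually ships, the Dutch sentence "mijn been doet pijn" ("my leg hurts") loses "been", a content word in Dutch that happens to be an English stop word. With no remover in the pipeline, every word stays searchable.

```python
# A deliberately tiny, hypothetical English stop list; real lists are much longer.
ENGLISH_STOP_WORDS = frozenset({"a", "been", "is", "the", "to"})


def remove_stop_words(words, stop_words=ENGLISH_STOP_WORDS):
    """Drop any word that appears in the stop word set."""
    return [w for w in words if w not in stop_words]


dutch = ["mijn", "been", "doet", "pijn"]       # "my leg hurts"
print(remove_stop_words(dutch))                # ['mijn', 'doet', 'pijn']
print(remove_stop_words(dutch, frozenset()))   # ['mijn', 'been', 'doet', 'pijn']
```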

Latest Release: 1.0.0 (Zope 2.8-2.11)
Last Updated: 2006-08-14 04:36:43
Author: shh
License: ZPL
Categories: Search/Catalog
Maturity: Stable

Available Releases

Version  Maturity  Platform  Released
1.0.0    Stable    All       2006-08-14 04:36:43
         UnicodeLexicon-1.0.0.tar.gz (9 K)