ZCTextIndex splitter that works with Chinese, Japanese, and Korean text (Bjorn Stabell)

INSTALLATION

  1. To enable CJKSplitter for your portal:
    1. Unpack the product in your Products directory and restart Zope.
    2. Create a new Lexicon in portal_catalog that uses CJKSplitter. Feel free to enable stop words and case normalizing: non-Latin letters are not case normalized, and the only stop words are English.
    3. Recreate the full-text indexes (Title, SearchableText) and have them use the new lexicon.
    4. Recatalog the portal (portal_catalog -> Advanced -> Update Catalog). This may fail with UnicodeError if your objects don't return Unicode strings from the methods that get indexed using CJKSplitter, for example SearchableText and Title. To be safe, always return Unicode strings. If you want to be brave, set default_encoding in CJKSplitter.py to the encoding of your non-Unicode text (the default is ASCII); CJKSplitter will then assume that encoding when converting non-Unicode strings into Unicode.
    5. RECOMMENDED: Add a string property called "management_character_set" with value "UTF-8" to the root of your Zope installation. This will enable you to query and view the CJK characters in the lexicon.
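The UnicodeError described in step 4 can be reproduced outside Zope. The sketch below is written for Python 3, where byte strings are `bytes`. `to_unicode` is a hypothetical helper, not CJKSplitter's actual code, and its `default_encoding` parameter only mimics the role of the setting in CJKSplitter.py.

```python
def to_unicode(value, default_encoding="ascii"):
    """Return text for `value`, decoding byte strings with `default_encoding`.

    Hypothetical helper for illustration; not taken from CJKSplitter.
    """
    if isinstance(value, bytes):
        return value.decode(default_encoding)
    return value


# Stand-in for a SearchableText value that was stored as GB2312 bytes.
raw = "中文".encode("gb2312")

try:
    to_unicode(raw)  # ASCII default cannot decode Chinese bytes
except UnicodeError as exc:
    print("default encoding failed:", type(exc).__name__)

# Naming the real encoding makes the same bytes decode cleanly.
print(to_unicode(raw, default_encoding="gb2312"))
```

Text that is already Unicode passes through unchanged, which is why returning Unicode strings from the indexed methods is the safe option.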
  2. To enable Chinese encodings in Unicode in Python (OPTIONAL):
    1. Get and install the Chinese Python codecs from SourceForge.
    2. Add these aliases to .../python2.X/encodings/aliases.py:
                'gb2312': 'eucgb2312_cn',
                'big5': 'big5_tw',
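Whether the encoding names above resolve can be checked from a Python prompt. The following is a minimal sketch; `codec_available` is a hypothetical helper name, and the only library call it relies on is `codecs.lookup`. On Python 2.4 and later, codecs for gb2312 and big5 ship with the standard library, so the check may already succeed without the extra package.

```python
import codecs


def codec_available(name):
    """Return True if Python can resolve `name` to an installed codec."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in ("gb2312", "big5"):
    print(name, "available" if codec_available(name) else "missing")
```

If a name is reported as missing, the codec package or the alias entry for that name is the likely cause.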
      
  3. To print Unicode strings to your terminal (OPTIONAL):
    1. In Python's site.py, change if 0 to if 1 where it says "Enable to support locale aware default string encodings".
    2. Set your LANG and LC_* variables to match your system's locale.
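To see which encoding the LANG and LC_* variables imply, the locale can be queried directly. This sketch is written for Python 3 and is not the code behind the `if 0` switch. It uses `locale.getpreferredencoding`, and `locale_encoding` with its `fallback` parameter is a hypothetical helper.

```python
import codecs
import locale


def locale_encoding(fallback="ascii"):
    """Return the codec name implied by the current locale, or `fallback`."""
    name = locale.getpreferredencoding(False) or fallback
    try:
        # Normalize to Python's canonical codec name.
        return codecs.lookup(name).name
    except LookupError:
        # The locale names an encoding Python has no codec for.
        return fallback


print("locale encoding:", locale_encoding())
```

If the printed value is not the encoding your terminal expects, adjust LANG or LC_ALL before starting Python.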