CHANGES


Version 0.2 - 2003-03-09 - Bjorn Stabell, Exoweb [email protected]

  • Rewrote the processing algorithm to work directly on Unicode; there was no need to work in UTF-8, as the old version did. All text must now be stored as Unicode, so people need to start using ...:utf8:utext etc. in their form fields.
  • Removed the configuration file. The definitions of symbols and CJK characters are replaced with calls to unicodedata.category().
  • Made structure and interface conform more closely to another splitter: HTMLSplitter.
  • Removed Chinese comments and added English comments.
  • Documented how the algorithm works.
  • Made the product refreshable by ignoring the ValueError from registerFactory. (There is probably a better way of doing this.)
  • Added instructions to the installation notes on how to enable GB2312 and Unicode support in Python. Hopefully we can get Chinese support in the Python distributed with Zope in the future.
  • Added a simple set of unit tests.
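
To illustrate the switch to unicodedata.category() mentioned above, here is a minimal sketch of a category-driven splitter. It is not the splitter's actual code: the function name split_text and the tokenization policy (one token per "Lo" character, runs of other letters and digits grouped into lowercase words) are assumptions made for illustration. Note also that the "Lo" category covers more than CJK (for example Hebrew and Arabic letters), so a real splitter may need a narrower test.

```python
import unicodedata


def split_text(text):
    """Hypothetical splitter: classify each character by Unicode category.

    Assumed policy (not taken from the original product):
      - category "Lo" (includes CJK ideographs, kana, Hangul): one token each
      - other letters (L*) and numbers (N*): grouped into lowercase words
      - everything else (punctuation, symbols, spaces): treated as a separator
    """
    tokens = []
    word = []

    def flush():
        # Emit the pending alphanumeric word, if any.
        if word:
            tokens.append("".join(word))
            del word[:]

    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Lo":
            flush()
            tokens.append(ch)
        elif cat[0] in ("L", "N"):
            word.append(ch.lower())
        else:
            flush()
    flush()
    return tokens


if __name__ == "__main__":
    print(split_text("Zope 中文 test!"))
```

Because the classification comes from the Unicode character database, no hand-maintained table of symbols or CJK ranges is needed, which is what made the configuration file unnecessary.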

Version 0.1 - 2002-12-14 - ZopeChina.com [email protected]

  • Original version of splitter. Works for UTF-8.