CJKSplitter - Chinese, Japanese, Korean word splitter for ZCTextIndex

CJKSplitter is a ZCTextIndex splitter for CJK (Chinese, Japanese, Korean) text stored as Unicode. It uses a simple but workable "hack" instead of attempting real dictionary-based word splitting. Compared to a dictionary-based word splitter, this produces a bigger index and more matches than necessary, but that is a small price to pay for the reduced complexity.
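
The page does not spell out what the "hack" is. One common dictionary-free approach, assumed here purely for illustration, is to index every overlapping pair of adjacent CJK characters (bigrams) and to split everything else as ordinary words. The sketch below follows that assumption; `TOKEN_RE` and `split_text` are hypothetical names, the character range is the one listed among the features, and the code does not implement the ZCTextIndex splitter interface.

```python
import re

# Assumption: a token is either a run of CJK ideographs in the range
# named in the feature list, or a run of ASCII letters and digits.
TOKEN_RE = re.compile(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+")


def split_text(text):
    """Split text into index terms without consulting a dictionary."""
    terms = []
    for run in TOKEN_RE.findall(text):
        if "\u4e00" <= run[0] <= "\u9fff":
            if len(run) == 1:
                # A lone CJK character becomes a term of its own.
                terms.append(run)
            else:
                # Overlapping bigrams: every adjacent pair of characters.
                terms.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            # Non-CJK runs are kept whole, lowercased.
            terms.append(run.lower())
    return terms


print(split_text("Zope 中文搜索"))  # ['zope', '中文', '文搜', '搜索']
```

A four-character CJK run yields three terms here, where a dictionary-based splitter might yield two words. This is the kind of index growth and over-matching the description refers to.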

Note: go to plone.org for newer releases.

Features

  • supports multiple encodings: unicode/utf-8/gb18030/gbk/gb2312/mbcs/big5. Provides 3 splitters (more to come):
    • CJK splitter: supports unicode/utf-8 encodings. It is compatible with version 0.1
    • CJK GB splitter: supports unicode/gb18030/gbk/gb2312/mbcs encodings
    • CJK BIG5 splitter: supports unicode/big5/mbcs encodings
  • small index storage for CJK: terms are stored as Unicode (2 bytes per character) rather than UTF-8 (3 bytes per character)
  • supports English globbing
  • precise CJK character identification (\u4E00-\u9FFF)
  • uses regular expressions to stay compatible with the default English whitespace splitter
  • easy to install, easy to use
  • supports single-character search
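
The storage claim in the list above can be checked with a few lines of Python. This is only an illustration: it uses UTF-16 (without a byte-order mark) as a stand-in for a 2-byte Unicode representation, and `encoded_sizes` is a hypothetical helper, not part of CJKSplitter.

```python
def encoded_sizes(text):
    """Return (bytes as 2-byte Unicode, bytes as UTF-8) for the given text."""
    # "utf-16-be" writes no byte-order mark, so the length is 2 bytes for
    # each character in the \u4E00-\u9FFF range.
    return len(text.encode("utf-16-be")), len(text.encode("utf-8"))


sample = "中文搜索"  # four characters, all within \u4E00-\u9FFF
two_byte_size, utf8_size = encoded_sizes(sample)
print(two_byte_size, utf8_size)  # 8 12
```

For plain ASCII text the relationship is reversed (UTF-8 needs 1 byte per character), so the saving applies to the CJK portion of the index.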

About ZOpen

ZOpen is one of the leading ZSPs (Zope Service Providers) in China. We are also the founder of CZUG (Chinese Zope User Group). We are trying to make Zope/CMF/Plone work for the Chinese people. We hope all Chinese Zope developers can come together and make Zope work better for Chinese :)

Latest Release: 0.7
Last Updated: 2004-11-15 10:05:52
Author: panjunyong
License: ZPL
Categories:
Maturity: Stable

Available Releases

Version Maturity Platform Released
cjkspliter-0_7 Stable   2004-11-15 10:05:52
  cjkspliter-0_7.tgz (7 K) All md5
  cjksplitter-0_7_1.tgz (6 K) All md5
  cjksplitter-0_7_3.tgz (3 K) All md5
CJKSplitter-0_6 Stable   2004-06-03 22:50:44
  cjksplitter-0_6.tgz (4 K) All md5
CJKSplitter-0_5_1 Stable   2004-02-10 20:56:57
  cjksplitter-0_5_1.tgz (11 K) All md5
CJKSplitter-0_5 Stable   2004-02-05 21:40:17
  cjksplitter-0_5.tgz (11 K) All md5
0.1 Stable   2006-08-14 03:23:00
  CJKSplitter0.1.zip (7 K) All