CJKSplitter - Chinese, Japanese, Korean word splitter for ZCTextIndex (by panjunyong)

change log

v0.6

  • support single Chinese character search

v0.5.1

  • add CREDIT.txt and licence information

v0.5

  • use regular expressions to stay compatible with the default English whitespace splitter
  • remove the configuration file: much simpler code, easier to install and use
  • support multiple encodings (unicode/utf-8/gb18030/gbk/gb2312/mbcs/big5) through 3 splitters:
    • CJK splitter: supports unicode/utf-8 encodings; compatible with version 0.1
    • CJK GB splitter: supports unicode/gb18030/gbk/gb2312/mbcs encodings
    • CJK BIG5 splitter: supports unicode/big5/mbcs encodings
  • unicode input is detected automatically, making CJKSplitter compatible with Archetypes 1.2+ (which stores strings as unicode)
  • better encoding handling: undecodable characters are replaced instead of raising an exception
  • smaller index storage for CJK text: stored as unicode (2 bytes per character) rather than utf-8 (3 bytes per character)
  • support English globbing (wildcard search)
  • more precise recognition of CJK characters (range \u4E00-\u9FFF)
  • possibly better performance (not benchmarked)
  • better documentation (thanks, bjorn!)

v0.2

contributed by bjorn

v0.1

initial release; supports utf-8 encoding only.