ZCatalogs with Umlauts

Zope comes with a standard product, ZCatalog, which lets you create and query text, field, and keyword indexes for arbitrary content.

By default, a ZCatalog treats only standard ASCII characters as printable. This makes it impossible to search for words containing letters with diacritical marks ("Umlauts"). Fortunately, there are simple ways around this obstacle. Unfortunately, they are somewhat hidden and not very well known.
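To make the problem concrete, here is a minimal sketch in modern Python of what happens when a word splitter accepts only ASCII characters. The two regular expressions and the function names `split_ascii` and `split_latin1` are hypothetical stand-ins for illustration; they are not the code ZCatalog actually uses.

```python
import re

# Hypothetical stand-ins for a word splitter; NOT ZCatalog's real code.
# An ASCII-only notion of "word character":
ASCII_WORD = re.compile(r"[A-Za-z0-9]+")
# The same, extended by the ISO 8859-1 letters (skipping the
# multiplication and division signs at 0xD7 and 0xF7):
LATIN1_WORD = re.compile("[A-Za-z0-9\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff]+")

def split_ascii(text):
    """Split text into lowercased words, treating non-ASCII as separators."""
    return [w.lower() for w in ASCII_WORD.findall(text)]

def split_latin1(text):
    """Split text into lowercased words, keeping ISO 8859-1 letters."""
    return [w.lower() for w in LATIN1_WORD.findall(text)]

sample = "Die Größe der Kette"
print(split_ascii(sample))   # the umlaut word falls apart: 'gr' and 'e'
print(split_latin1(sample))  # the umlaut word survives as 'größe'
```

With the ASCII-only splitter, a query for "Größe" can never match, because that word was never stored in the index in the first place.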

I'll describe two standard ways of making a ZCatalog "umlaut-aware" below.

Use the correct Locale settings.

If the Zope server in question is under your control, and if you don't mind changing the behaviour globally, start Zope using the Locale flag. On a German Windows 2000, for example, one may start Zope with:

    start -L german

That's it. If setting the Locale works on your platform, new ZCatalog content (text content for text indexes, that is) will now be split into words using the German Locale's definition of "printable".
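Whether a given locale name works at all depends on the platform. The following sketch checks this from plain Python, outside of Zope. The helper name `locale_available` is hypothetical, and the sketch assumes that the character classification category (`LC_CTYPE`) is the one that matters for splitting; the locale names in the usage loop are only examples and differ between Windows and Unix systems.

```python
import locale

def locale_available(name):
    """Return True if the C library accepts `name` as a locale name."""
    # Remember the current setting so the check has no lasting effect.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        # The platform does not know this locale.
        return False
    else:
        return True
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)

# Example names only; which of these exist is platform-specific.
for candidate in ("german", "de_DE", "de_DE.ISO8859-1"):
    print(candidate, locale_available(candidate))
```

If none of the names a platform offers is accepted, the Locale approach is not going to work there, and the splitter approach is the remaining option.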

Use an ISO8859-1-Splitter

Sometimes, using locales is not an option. On some platforms, the necessary locale might simply not work. In other cases, the start parameters of a Zope instance aren't under our control. This applies, for example, to some of the free Zope services like http://www.freezope.org, which share one Zope among many users and have to use identical settings for all.

In 2000, I wrote a patch to the standard splitter and published it on zope.org (IsoSplitter, now obsolete). A few versions ago, Andreas Jung integrated parts of this code into the new, modularized search index machinery.

The following recipe gives a simple example of creating a full-text index for a bunch of DTML Documents, using an IsoSplitter.

Recipe

  • create a ZCatalog object, call it Katalog
  • delete the Vocabulary from that catalog
  • create a new Vocabulary in Katalog, using "Werner Strobles ISO8859-1-Splitter" (well, my name is Wolfgang Strobl, but so it goes ... :-)), and call it Vocabulary. Set "Globbing" as you see fit. Globbing means that you can use wildcards in queries. But take care: this is somewhat expensive for large documents or large numbers of documents.
  • the newly created ZCatalog has text indexes for Title and PrincipiaSearchSource by default.
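To give an idea of what globbing involves, here is a naive sketch in Python: a wildcard pattern is expanded into all matching words of a vocabulary before the index is consulted. The function `expand_glob` and the sample vocabulary are hypothetical, and ZCatalog's own globbing is implemented differently; the sketch only shows why wildcard queries get more expensive as the vocabulary grows.

```python
from fnmatch import fnmatchcase

def expand_glob(pattern, vocabulary):
    """Return all vocabulary words matching a wildcard pattern, sorted.

    This naive version tests every word, so its cost grows with the
    size of the vocabulary.
    """
    return sorted(w for w in vocabulary if fnmatchcase(w, pattern))

# Hypothetical vocabulary, as a splitter might have produced it.
vocabulary = {"kette", "ketten", "kettenblatt", "fichtel", "sachs", "größe"}

print(expand_glob("kette*", vocabulary))
# → ['kette', 'ketten', 'kettenblatt']
```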

Now create the following DTML-Methods within 'Katalog':

displaySearchResult.html:

        <dtml-var standard_html_header>
        <dtml-var searchField>
        <hr>
        <h2>Hits</h2>
        <table>
        <dtml-in "Katalog(meta_type='DTML Document',
                PrincipiaSearchSource=searchWhat)" sort=title>
        <dtml-var displaySearchResultLine>
        </dtml-in>
        </table>
        <dtml-var standard_html_footer>

displaySearchResultLine:

        <tr>
        <td><a href="<dtml-var "Katalog.getURL()">"><dtml-var title>
        </a>
        </td>
        <td><small><dtml-var bobobase_modification_time fmt=ISO> </small>
        </td>
        </tr>

searchField:

        <h3>Search</h3>
        <p>
        <form enctype="application/x-www-form-urlencoded" 
         action="displaySearchResult.html" method="POST">
        <font size="3">Search: <input type="text" name="searchWhat" size="40"
        <dtml-if searchWhat>
        value="&dtml-searchWhat;"
        <dtml-else>
        value=""
        </dtml-if>
        ></font> <input type="submit" value="Search">
        </form>
        <small>You may use wildcards and Boolean expressions here.
        For example: <strong><font face="courier">kette*</font></strong>
        or <strong><font face="courier">fichtel and sachs</font></strong>
        </small>
        </p>

and finally

search.html:

        <dtml-var standard_html_header>
        <dtml-var searchField>
        <dtml-var standard_html_footer>

  • create a few DTML Documents in the folder where Katalog is located, or copy some there
  • use the Find Objects tab in the Management View of Katalog to add all objects of type DTML Document to this ZCatalog. Push Find and Catalog. This traverses all documents within the folder where Katalog is located, selects only those of type DTML Document, and catalogs them along the way.
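The traversal that Find and Catalog performs can be sketched in a few lines of Python. The `Node` class, the function `find_objects`, and the sample tree are hypothetical toy stand-ins for Zope objects, not the Zope API; the point is only the walk-and-filter logic.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy stand-in for a Zope object with an id, a type and children."""
    id: str
    meta_type: str
    children: list = field(default_factory=list)

def find_objects(folder, wanted_type, path=""):
    """Yield (path, object) for every object of wanted_type below folder."""
    for child in folder.children:
        child_path = f"{path}/{child.id}"
        if child.meta_type == wanted_type:
            yield child_path, child
        # Descend into the child; leaves simply have no children.
        yield from find_objects(child, wanted_type, child_path)

root = Node("site", "Folder", [
    Node("Katalog", "ZCatalog"),
    Node("intro", "DTML Document"),
    Node("docs", "Folder", [
        Node("howto", "DTML Document"),
        Node("logo", "Image"),
    ]),
])

# "Cataloging" here just means remembering each hit under its path.
catalog = dict(find_objects(root, "DTML Document"))
print(sorted(catalog))
# → ['/docs/howto', '/intro']
```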

Finally, display search.html and try to find words or some combination of words.

Notes

  • A ZCatalog's payload (indexes, metadata) is in no way related to its content as a folderish object. A ZCatalog usually doesn't contain anything but a single Vocabulary. This is somewhat confusing to newcomers.
  • Putting all the methods into the ZCatalog object is in no way necessary, but it gives an easy way of dropping a pre-configured "search machine" into arbitrary folder hierarchies with a simple copy operation.