Chapter 11: Searching and Categorizing Content

The Catalog is Zope's built in search engine. It allows you to categorize and search all kinds of Zope objects. You can also use it to search external data such as relational data, files, and remote web pages. In addition to searching you can use the Catalog to organize collections of objects.

Anonymous User - Nov. 29, 2002 5:55 pm:
 Uh... Much of this section needs to be rewritten as of Zope 2.6.  A lot has changed!
amit_khan - Dec. 4, 2002 4:52 am:
How can it be used to search external data such as relational data in MySQL, Filesystem objects and web pages
 - i mean where the object store is not ZODB? An "How-To" on the same would be really helpfull.

The Catalog supports a rich query interface. You can perform full text searching, and can search multiple indexes at once. In addition, the catalog keeps track of meta-data about indexed objects. Here are the two most common ZCatalog usage patterns:

Mass Cataloging
Cataloging a large collection of objects all at once.

Automatic Cataloging
Cataloging objects as they are created and tracking changes made to them.

Anonymous User - July 20, 2002 10:34 am:
 s/indexes/indices/

Anonymous User - July 20, 2002 12:22 pm:
 s/indexes/indices/

Getting started with Mass Cataloging

Let's take a look at how to use the catalog to search documents. Cataloging a bunch of objects all at once is called mass cataloging. Mass cataloging involves three steps:

  • Creating a ZCatalog
  • Finding objects and cataloging them
  • Creating a web interface to search the catalog.

Choose ZCatalog from the product add list to create a ZCatalog object. This takes you to the ZCatalog add form, as shown in [9-1].

Figure 9-1 ZCatalog add form

The Add form asks you for an Id and a Title. The third form element is the Vocabulary select box. For now, leave this box on "Create one for me". Give your ZCatalog the Id "AnimalTracker" and click Add to create your new catalog. The Catalog icon looks like a folder with a small magnifying glass on it. Select the AnimalTracker icon to see the Contents view of the Catalog.

Anonymous User - Nov. 24, 2002 6:11 am:
 In my version (Zope 2.6.0) there is no Vocabulary select box!
 Where did it go and how do I create a vocabulary on my own?
Anonymous User - Dec. 24, 2002 9:04 am:
 Yes, and I was unable to make full-text indexing work in 2.6.0 at all. Where do I find updated info on
 cataloging? It seems that *all* existing documentation describes pre-2.6 releases of ZCatalog...
Anonymous User - Dec. 24, 2002 9:25 am:
 The ZCatalog chapter in the 2.6 edition of this book
 http://www.zope.org/Documentation/Books/ZopeBook/2_6Edition/SearchingZCatalog.stx should be updated within
 the next few weeks. In the meantime, if you're having problems, ask a question on the Zope maillist
 ([email protected]).

A ZCatalog looks a lot like a folder, but it has a few more tabs. Six of the tabs on a ZCatalog are exactly the same six tabs you find on a standard folder. A ZCatalog has the following views: Contents, Catalog, Properties, Indexes, MetaData, Find Objects, Advanced, Undo, Security, and Ownership. When you click on a ZCatalog, you are on the Contents view. Here, you can add new objects and the ZCatalog will contain them just as any folder does. Note that containment does not imply that an object is searchable: an object contained in a ZCatalog is not cataloged unless you catalog it.

Now that you have created a ZCatalog, you can move onto the next step, finding objects and cataloging them. Suppose you have a zoo site with information about animals. To work with these examples, create two DTML Documents that contain information about reptiles and amphibians:

Title: Chilean four-eyed frog
The Chilean four-eyed frog has a bright pair of spots on its rump that look like enormous eyes. When seated, the frog's thighs conceal these eyespots. When predators approach, the frog lowers its head and lifts its rump, creating a much larger and more intimidating head. Frogs are amphibians.

Title: Carpet python
Morelia spilotes variegata averages 2.4 meters in length. It is a medium-sized python with black-to-gray patterns of blotches, crossbands, stripes, or a combination of these markings on a light yellowish-to-dark brown background. Snakes are reptiles.

Anonymous User - July 12, 2002 3:57 am:
 when you create the two documents

 the title and the id is Chilean four-eyed frog for example
 where do you place your text once your DTML document is made?

Visitors to your Zoo want to be able to search for information on the Zoo's animals. Eager herpetologists want to know if you have their favorite snake, so you should provide them with the ability to search for certain words and show all the documents that contain those words. Searching is one of the most useful and common web activities.

Anonymous User - Aug. 21, 2002 6:20 am:
 Since the examples tries to show us how to index contents of files, I presume they mean that the content
 should be somewhere in the DTML-file, in the header or body.

The AnimalTracker ZCatalog you created can catalog all of the documents in your Zope site and let your users search for specific words. To catalog your documents, go to the AnimalTracker ZCatalog and click on the Find Objects tab.

In this view, you tell the ZCatalog what kind of objects you are interested in. You want to catalog all DTML Documents so select DTML Document from the Find objects of type multiple selection and click Find and Catalog.

The ZCatalog will now start from the folder where it is located and search for all DTML Documents. It will search the folder and then descend down into all of the sub-folders and their sub-folders. If you have lots and lots of objects, this may take a long time to complete, so be patient.
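The descent through folders and sub-folders can be pictured with a small stand-alone sketch. It is not ZCatalog code: the nested dictionaries, the `children` key, and the `find_objects` helper are hypothetical stand-ins for a Zope folder tree, used only to show the recursive walk.

```python
# Toy folder tree (hypothetical structure, not real Zope objects).
site = {
    "meta_type": "Folder",
    "children": {
        "AnimalTracker": {"meta_type": "ZCatalog", "children": {}},
        "frog": {"meta_type": "DTML Document"},
        "Reptiles": {
            "meta_type": "Folder",
            "children": {"python": {"meta_type": "DTML Document"}},
        },
    },
}

def find_objects(folder, meta_type, path=""):
    """Collect the paths of all objects of the given type, recursing into sub-folders."""
    found = []
    for name, obj in folder.get("children", {}).items():
        obj_path = path + "/" + name
        if obj["meta_type"] == meta_type:
            found.append(obj_path)
        # Descend into anything that has children of its own.
        found.extend(find_objects(obj, meta_type, obj_path))
    return found

print(find_objects(site, "DTML Document"))
```

The walk starts at the folder that holds the catalog, which is why documents in sibling sub-folders such as `Reptiles` are found as well.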

Anonymous User - Nov. 29, 2002 5:47 pm:
 Is this saying that the ZCatalog should be the root of the site?
Anonymous User - Dec. 24, 2002 9:07 am:
The text clearly states that. However it seems that it starts indexing from the container of ZCatalog object.

After a period of time, the Catalog will take you to the Catalog view automatically, with a status message telling you what it just did.

Below the status information is a list of the objects that were cataloged. They are all DTML Documents. To confirm that these are the objects you are interested in, you can click on them to visit them.

You have completed the second step, cataloging your objects into a ZCatalog. Your documents are now in the ZCatalog's database, and you can move on to the third step: creating a web page and result form to query the ZCatalog.


Search and Report Forms

To create search and report forms, make sure you are inside the AnimalTracker catalog and select Z Search Interface from the add list. Select the AnimalTracker ZCatalog as the searchable object, as shown in [9-2].

russf - Nov. 12, 2002 5:53 pm:
 This does not mention ZCTTextIndex and the lexicon that I tripped over in the current version.

Figure 9-2 Creating a search form for a ZCatalog

Name the Report Id "SearchResults" and the Search Input Id "SearchForm" and click Add. This will create two new DTML Methods in the AnimalTracker ZCatalog named SearchForm and SearchResults.

Anonymous User - Jan. 7, 2003 4:00 pm:
 Assuming in the lastest version this means to select "Generate DTML Methods" rather than "Generate Page
 Templates". Not sure what the latter would be.

These objects are contained in the ZCatalog, but they are not cataloged by the ZCatalog. The AnimalTracker has only cataloged DTML Documents. The search Form and Report methods are just a user interface to search the animal documents in the Catalog. You can verify this by noting that the search and report forms are not listed in the Cataloged Objects tab.

To search the AnimalTracker ZCatalog, select the SearchForm method and click on its View tab. This form has a number of elements on it. There is one search element for each index in the ZCatalog. Indexes are explained further in the next section. For now, you want to use the PrincipiaSearchSource form element. You can leave all the other form elements blank.

ypan - Dec. 5, 2002 11:18 pm:
 There mast be something wrong. I get following form output:
 "This query requires no input."
Anonymous User - Jan. 6, 2003 2:34 pm:
 I've got the same message when Viewing.
Anonymous User - Jan. 6, 2003 3:53 pm:
 I do too.  This appears to be broken.
Anonymous User - Jan. 6, 2003 4:35 pm:
 I don't understand how my objects get indexed. My searches are returning nothing, and I assume it is because
 they never got indexed? This whole section is very unclear and confuses which sections need to come before
 which others. For example, One needs to create the Z Search Interface AFTER the indices have been defined,
 not before, as laid out in this chapter. The search interface does not get automatically updated. It is very
 unclear what to name indices, and there are no predefined indices to assist in this understanding.
Anonymous User - Jan. 7, 2003 4:04 pm:
 As far as I can tell, there is no "PrincipiaSearchSource" form element any longer, and that is the source of
 these problems.

By typing words into the PrincipiaSearchSource form element you can search all of the documents cataloged by the AnimalTracker ZCatalog. For example, type in the word "Reptiles". The AnimalTracker ZCatalog will be searched and return a simple table of objects that have the word "Reptiles" in them. The search results should include the carpet python. You can also try specifying multiple search terms like "reptiles amphibians". Search results for this query should include both the Chilean four-eyed frog and the carpet python. Note that the index matches whole words as they appear in the text, so a search for "reptile" will not find a document that only contains "reptiles". Congratulations, you have successfully created a catalog, cataloged content into it and searched it through the web.

Anonymous User - May 30, 2002 7:58 pm:
 Actually, the search return none of them. I believe it's mispelled. And the search engine doesn't perform
 term expansion (e.g. plural<->singular. If the search term is "reptile" and the term one item is indexed by
 is "reptiles", the item won't be in the result list).
ajung - May 31, 2002 9:19 am:
 ZCatalog/TextIndex does not claim to perform term expansion or stemming!
Anonymous User - July 12, 2002 4:54 am:
 No, so the query is misspelt in the above paragraph. Search for "reptiles amphibians"

Configuring Catalogs

The Catalog is capable of much more powerful and complex searches than the one you just performed. Let's take a look at how the Catalog stores information. This will help you tailor your catalogs to provide the sort of searching you want.

Defining Indexes

ZCatalogs store information about objects and their contents in fast databases called indexes. Indexes can store and retrieve large volumes of information very quickly. You can create different kinds of indexes that remember different kinds of information about your objects. For example, you could have one index that remembers the text content of DTML Documents, and another index that remembers any objects that have a specific property.

When you search a ZCatalog you are not searching through your objects one by one. That would take far too much time if you had a lot of objects. Before you search a ZCatalog, it looks at your objects and remembers whatever you tell it to remember about them. This process is called indexing. From then on, you can search for certain criteria and the ZCatalog will return objects that match the criteria you provide.

A good way to think of an index in a ZCatalog is just like an index in a book. For example, in a book's index you can look up the word Python:


Python: 23, 67, 227

The word Python appears on three pages. Zope indexes work like this except that they map the search term, in this case the word Python, to a list of all the objects that contain it, instead of a list of pages in a book.
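The book-index analogy can be made concrete with a few lines of stand-alone Python. This is only a toy inverted index, not how ZCatalog is implemented; the `build_index` helper and the sample documents are made up for illustration.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index

documents = {
    "frog": "Frogs are amphibians.",
    "python": "The carpet python is a snake. Snakes are reptiles.",
}
index = build_index(documents)

# Looking up a word is a single dictionary access, not a scan of every document.
print(sorted(index["reptiles"]))
print(sorted(index["are"]))
```

Because the toy index stores words exactly as they appear, "reptile" is not a key even though "reptiles" is.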

In Zope 2.4, indexes can be added and removed from a Catalog using a new, "pluggable" index interface as shown in [9-3]:

Figure 9-3 Managing indexes

Here, you can see that ZCatalogs come with some predefined indexes. Each index has a name, like PrincipiaSearchSource, and a type, like TextIndex.

Anonymous User - Jan. 7, 2003 4:33 pm:
 TextIndex is now deprecated. Instead, use ZCTextIndex, which requires the previous creation of a ZCTextIndex
 Lexicon. Boy does this section of the book need to be updated!

When you catalog an object the Catalog uses each index to examine the object. The catalog consults attributes and methods to find an object's value for each index. For example, in the case of the DTML Documents cataloged with a PrincipiaSearchSource index, the Catalog calls each document's PrincipiaSearchSource method and records the results in its PrincipiaSearchSource index. If the Catalog cannot find an attribute or method for an index, then it ignores it. In other words it's fine if an object does not support a given index. There are four kinds of indexes:

TextIndex
Searches text. Use this kind of index when you want a full-text search.

FieldIndex
Searches objects for specific values. Use this kind of index when you want to search date objects, numbers, or specific strings.

KeywordIndex
Searches collections of specific values. This index is like a FieldIndex, but it allows you to search collections rather than single values.

PathIndex
Searches for all objects that contain certain URL path elements. For example, you could search for all the objects whose paths begin with /Animals/Zoo.
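Whatever the index type, the value lookup described before this list (consult an attribute or method named after the index, and skip the object if neither exists) can be sketched as follows. The `index_value` helper and the `Document` class are hypothetical, and using `None` to mean "no value" is a simplification of this sketch, not a statement about ZCatalog internals.

```python
def index_value(obj, name):
    """Return obj's value for the index called `name`, or None if it has none."""
    value = getattr(obj, name, None)
    if callable(value):
        # Methods are called; plain attributes are used as they are.
        value = value()
    return value

class Document:
    def __init__(self, title, body):
        self.title = title
        self._body = body

    def PrincipiaSearchSource(self):
        return self._body

doc = Document("Carpet python", "Snakes are reptiles.")
for name in ("title", "PrincipiaSearchSource", "author"):
    print(name, "->", index_value(doc, name))
```

The `author` lookup yields nothing, which corresponds to the case where an object simply does not support a given index.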

Anonymous User - June 20, 2002 9:35 pm:
 TopicIndex --
   Searches among FilteredSets;  each set contains the document IDs of documents
   which match the set's filter expression.  Use this kind of index to optimize
   frequently-accessed searches.

 DateIndex --
   A subclass of FieldIndex, optimized for date-time values.  Use this index
   for any field known to be a date or a date-time.

 DateRangeIndex --
   Searches objects based on a pair of dates / date-times.  Use this index to
   search for objects which are "current" or "in effect" at a given time.

Anonymous User - Jan. 7, 2003 5:34 pm:
Not mentioned at all here is that the way you establish the index for an object is to add Properties for that
 index. The Property names should exactly match the names that will be indexed.

We'll examine these different indexes more closely later in the chapter. New indexes can be created from the Indexes view of a ZCatalog. There, you can enter the name and select a type for your new index. This creates a new empty index in the ZCatalog. To populate this index with information, you need to go to the Advanced view and click the Update Catalog button. Recataloging your content may take a while if you have lots of cataloged objects.

ajung - Apr. 21, 2002 4:44 pm:
 The addForm for TextIndexes allow to select a vocabulary. 
 It should be mentioned that a fulltext search with globbing support
 requires to create a new vocabulary with globbing support and
 that his vocabulary must be selected at index creation time.
Anonymous User - Sep. 27, 2002 6:47 pm:
 what is "globbing"?

To remove an index from a Catalog, select the index on the Indexes view and click on the Delete button. This will delete the index and all of its indexed content. As usual, this operation is undoable.

Defining Meta Data

The ZCatalog can not only index information about your objects, it can also store information about them in a tabular database called the Meta-Data Table. The Meta-Data Table works similarly to a relational database table: it consists of one or more columns that define the schema of the table, and it is filled with rows of information about cataloged objects. Your meta-data columns don't need to match your Catalog's indexes. Indexes allow you to search; meta-data allows you to report search results.

The Meta-Data Table is useful for generating search reports. It keeps track of information about objects that goes on your report forms. For example, if you create a Meta-Data Table column called absolute_url, then your report forms can use this information to create links to your objects that are returned in search results.

To add a new Meta-Data Table column, type in the name of the column on the Meta-Data Table view and click Add. To remove a column from the Meta-Data Table, select the column check box and click on the Delete button. This will delete the column and all of its content for each row. As usual, this operation is undoable. Next let's look more closely at how to search a Catalog.

Searching Catalogs

You can search a Catalog by passing it search terms. These search terms describe what you are looking for in one or more indexes. The Catalog can glean this information from the web request, or you can pass this information explicitly from DTML or Python. In response to a search request, a Catalog will return a list of records corresponding to the cataloged objects that match the search terms.

Searching with Forms

In this chapter you used the Z Search Interface to automatically build a Form/Action pair to query a Catalog (the Form/Action pattern is discussed in Chapter 4, "Dynamic Content with DTML"). The Z Search Interface builds a very simple form and a very simple report. These two methods are a good place to start understanding how Catalogs are queried and how you can customize and extend your search interface.

Anonymous User - Sep. 27, 2002 6:59 pm:
 The Form/Action pattern is discussed in
 http://www.zope.org/Documentation/Books/ZopeBook/current/AdvZPT.stx
 Section "Form Processing"

Suppose you have a catalog that holds news items. Each news item has contents, an author and a date. Your catalog has three indexes that correspond to these attributes. The contents index is a text index, and the author and date indexes are field indexes. Here is the search form that would allow you to query such a catalog:


<dtml-var standard_html_header>

<form action="Report" method="get">
<h2><dtml-var document_title></h2>
Enter query parameters:<br><table>

<tr><th>Content</th>
    <td><input name="content" width=30 value=""></td></tr>
<tr><th>Author</th>
    <td><input name="author" width=30 value=""></td></tr>
<tr><th>Date</th>
    <td><input name="date"  width=30 value=""></td></tr>

<tr><td colspan=2 align=center>
<input type="SUBMIT" value="Submit Query">
</td></tr>
</table>
</form>

<dtml-var standard_html_footer>
Anonymous User - June 4, 2002 5:47 pm:
 This can easily made XML-friendly. Terminate the INPUT elements with / and quote the WIDTH attribute. Oh and
 use br/ unstead of br.

This form consists of three input boxes named content, author, and date. The names of the input elements must match the names of the catalog's indexes so that the catalog can find the search terms. Here is a report form that works with the search form:


<dtml-var standard_html_header>

<table>
  <dtml-in NewsCatalog>
  <tr>
    <td><dtml-var author></td>
    <td><dtml-var date></td>
  </tr>
  </dtml-in>
</table>

<dtml-var standard_html_footer>
Anonymous User - Sep. 11, 2002 8:45 am:
Again you managed to give a teriffic example. This doesn't do anything but to list what you have typed in for
 author and date in the 'form'.
 And even more: It does so for each object that is in your catalog.
 What did you want to explain, here?
 I mean, if one has come so far, he/she should be at a point where he/she understands how the in-loop works
 ... and there isn't anything else you can learn from this example, is there??
Anonymous User - Jan. 14, 2003 6:32 am:
 this works if you want to look for a 
 object containing the data AND author.
 but what about if you want to search in the catalog
 looking for data OR author ???

There are a few things going on here which merit closer examination. The heart of the whole thing is the in tag:


<dtml-in NewsCatalog>

This tag calls the NewsCatalog Catalog. Notice how the form parameters from the search form (content, author, date) are not mentioned here at all. Zope automatically makes sure that the query parameters from the search form are given to the Catalog. All you have to do is make sure the report form calls the Catalog. Zope locates the search terms in the web request and passes them to the Catalog.

The Catalog returns a sequence of Record Objects (just like ZSQL Methods). These record objects correspond to search hits, which are objects that match the search criteria you typed in. For a record to match a search, it must match all criteria for each specified index. So if you enter an author and some search terms for the contents, the Catalog will only return records that match both the author and the contents.
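The "must match all criteria" rule amounts to intersecting the hits from each queried index. The sketch below models that with plain sets; the `indexes` dictionary and the `search` function are hypothetical and do not reflect ZCatalog's actual data structures.

```python
# Toy indexes: index name -> term -> set of record ids.
indexes = {
    "author": {"bob": {1, 2}, "alice": {3}},
    "content": {"zoo": {2, 3}, "python": {1}},
}

def search(query):
    """Return the ids that match every index named in the query."""
    result = None
    for index_name, term in query.items():
        hits = indexes[index_name].get(term, set())
        # Each additional criterion narrows the result (logical AND).
        result = hits if result is None else result & hits
    return result if result is not None else set()

print(search({"author": "bob"}))
print(search({"author": "bob", "content": "zoo"}))
```

In this toy model, replacing the intersection with a union of two separate result sets would give "author OR content" behavior; how to achieve that with a real Catalog is not covered by this sketch.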

Anonymous User - Jan. 14, 2003 6:36 am:
 "the Catalog will only return records that match both 
 the author and the contents"
 And if i want the author OR the contents ?????

Record objects returned by ZSQL Methods have an attribute for every column in the database table. Record objects for Catalogs work very similarly, except that a Catalog Record object has an attribute for every column in the Meta-Data Table. In fact, the purpose of the Meta-Data Table is to define the schema for the Record objects that Catalog queries return.
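As a rough illustration of the Meta-Data Table acting as a schema for result records, here is a stand-alone sketch. The `METADATA_COLUMNS` tuple, the `Record` class, and the sample news item are all hypothetical; real Catalog records are created by Zope, not by code like this.

```python
# Hypothetical meta-data columns chosen for this example.
METADATA_COLUMNS = ("title", "absolute_url")

class Record:
    """Toy search result with one attribute per meta-data column."""
    def __init__(self, values):
        for column in METADATA_COLUMNS:
            setattr(self, column, values.get(column))

news_item = {
    "title": "New python arrives",
    "absolute_url": "/Zoo/News/python",
    "author": "jane",
}
record = Record(news_item)

print(record.title)
# 'author' is not a meta-data column, so the record does not carry it.
print(hasattr(record, "author"))
```

This is why a report form can only display values that have been added as meta-data columns, even if other values are indexed and searchable.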

Anonymous User - June 3, 2002 4:34 pm:
 s/Record objects had/Record objects have/

Searching from Python

DTML makes querying a Catalog from a form very simple. For the most part, DTML will automatically make sure your search parameters are passed properly to the Catalog.

Sometimes, though, you may not want to search a Catalog from a web form; some other part of your application may want to query a Catalog. For example, suppose you want to add a sidebar to the Zope Zoo that shows only the news items that relate to the animals in the section of the site that you are currently looking at. As you've seen, the Zope Zoo site is built up from Folders that organize all the sections according to animal. Each Folder's id is a name that specifies the group or animal the folder contains. Suppose you want your sidebar to show you all the news items that contain the id of the current section. Here is a Script called relevantSectionNews that queries the news Catalog with the current folder's id:


## Script (Python) "relevantSectionNews"
##
""" Returns news relevant to the current folder's id """
id=context.getId()
return context.NewsCatalog({'content' : id})

This script queries the NewsCatalog by calling it like a method. Catalogs expect a mapping as the first argument when they are called. The argument maps the name of an index to the search terms you are looking for. In this case, the content index will be queried for all news items that contain the name of the current Folder. To use this in your sidebar, just edit the Zope Zoo's standard_html_header to use the relevantSectionNews script:


<html>
<body>
<dtml-var style_sheet>
<dtml-var navigation>
<ul>
<dtml-in relevantSectionNews>
  <li><a href="&dtml-absolute_url;"><dtml-var title></a></li>
</dtml-in>
</ul>
Anonymous User - June 29, 2002 1:19 pm:
 Should be "Catalogs expect...".

This method assumes that you have defined absolute_url and title as meta-data columns in the news Catalog. Now, when you are in a particular section, the sidebar will show a simple list of links to news items that contain the id of the current animal section you are viewing.

Searching and Indexing Details

Earlier you saw that the Catalog supports several types of indexes, including text indexes, field indexes and keyword indexes. Let's examine these indexes more closely to understand what they are good for and how to search them.

Searching Text Indexes

A Text Index is used to index text. After indexing, you can search the index for objects that contain certain words. Text Indexes support a rich search grammar for doing more advanced searches than just looking for a word. ZCatalog's Text Index can:

  • Search for Boolean expressions like "word1 AND word2". This will search for all objects that contain both "word1" and "word2". Valid Boolean operators include AND, OR, and AND NOT.
  • Control search order with parenthetical expressions like "((word1 AND word2) OR word3)". This will return objects that contain both "word1" and "word2", as well as objects that contain "word3".
  • If you use a special kind of Vocabulary object (explained a little further on) you can search using simple wild cards like "Z*", which returns all words that begin with "Z".

All of these advanced features can be mixed together. For example, "((bob AND uncle) AND NOT Zoo*)" will return all objects that contain the terms "bob" and "uncle" but will not include any objects that contain words that start with "Zoo" like "Zoologist", "Zoology", or "Zoo" itself.
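The mixed query above can be read as plain set algebra over an inverted index. The following sketch evaluates it by hand rather than parsing the query string; the sample documents and the `term` and `glob` helpers are hypothetical, and the real TextIndex query parser is not shown.

```python
docs = {
    1: "bob is your uncle",
    2: "bob and his uncle visit the zoo",
    3: "uncle bob the zoologist",
    4: "a day at the zoo",
}

# Build a toy inverted index: word -> set of document ids.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def term(word):
    """Documents containing exactly this word."""
    return index.get(word.lower(), set())

def glob(prefix):
    """Documents containing any word that starts with the prefix (like 'Zoo*')."""
    prefix = prefix.lower()
    hits = set()
    for word, ids in index.items():
        if word.startswith(prefix):
            hits |= ids
    return hits

# ((bob AND uncle) AND NOT Zoo*)
result = (term("bob") & term("uncle")) - glob("zoo")
print(result)
```

Note that the wildcard lookup has to scan every indexed word in this sketch, which hints at why wildcard support costs extra resources.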

eisokangas - July 16, 2002 3:22 am:
 I don't understand why you need a vocabulary for wildcards. In Zope 2.5.1 even more advanced wildcards (e.g.
 *comput*) appear to work, even on German words/documents, without a German-specific vocab.

Querying a TextIndex with these advanced features works just like querying it with the original simple features. In the HTML search form for DTML Documents, for example, you could enter "Koala AND Lion" and get all documents about Koalas and Lions. Querying a TextIndex from Python with advanced features works much the same; suppose you want to change your relevantSectionNews Script to not include any news items that contain the word "catastrophic":


## Script (Python) "relevantSectionNews"
##
""" Returns relevant, non-catastrophic news """
id=context.getId()
return context.NewsCatalog(
         {'content' : id + ' AND NOT catastrophic'}
        )

TextIndexes are very powerful. When mixed with the Automatic Cataloging pattern described later in the chapter, they give you the ability to automatically free-text search all of your objects as you create and edit them.

Vocabularies

Vocabularies are used by text indexes. A vocabulary is an object that manages language-specific text indexing options. In order for the ZCatalog to work with any kind of language, it must understand certain behaviors of that language. For example, different languages:

  • have a different concept of words. In English and many other languages, words are defined by white space boundaries, but in other languages, like Chinese and Japanese, words are defined by their contextual usage.
  • have different concepts of stop words. A stop word is a common word that should be ignored by indexes. The French word nous would be extremely common in French text and should probably be removed as a stop word, but in English text it might make perfect sense to catalog this word because it is very infrequent.
  • have different concepts of synonyms. The synonym pair automobile/car would not make sense in any language but English.
  • have different concepts of stemming. In English, it is common for text indexers to strip suffixes like ing from words, so that bake and baking match the same word. This is called stemming. These suffix strippings only make sense in English, and other languages would want to provide their own stemming (or none at all).
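To make stop words and stemming less abstract, here is a deliberately naive English-only sketch. The `STOP_WORDS` set, the `SUFFIXES` tuple, and both helpers are assumptions made for this example; they do not describe what any Zope Vocabulary actually does.

```python
# Hypothetical, tiny English stop list and suffix list.
STOP_WORDS = {"the", "a", "and", "is", "are"}
SUFFIXES = ("ing", "e")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Split on white space, drop stop words, and stem what remains."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [stem(w) for w in words if w and w not in STOP_WORDS]

print(stem("bake"), stem("baking"))
print(index_terms("The frog is baking"))
```

Every choice here (white space as the word boundary, the stop list, the suffixes) is specific to English, which is exactly the kind of decision a Vocabulary is meant to encapsulate.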

Current Vocabularies

There are a number of vocabularies currently available for ZCatalog:

Plain Vocabularies
Plain vocabularies are very simple and do minimal English language specific tasks.

Globbing Vocabularies
Globbing vocabularies are more complex vocabularies that allow wild card searches on English text to be performed. The down side of them is that they consume a lot more memory and database space than plain vocabularies.

Anonymous User - Sep. 27, 2002 7:13 pm:
 /strip suffixes like *ing*/strip suffixes like *e* and *ing*/ blf

The idea behind Vocabularies is to customize the way text in any language is indexed. Because of this, other languages may be supported in the future by people who create a Vocabulary specific to their language. Creating your own Vocabulary is an advanced topic, and beyond the scope of this book.

ajung - Apr. 21, 2002 4:41 pm:
 There should be a short example how globbing works and what
 wildcards are used.
Anonymous User - Apr. 29, 2002 11:15 pm:
 and where to find/install one
ajung - May 31, 2002 9:25 am:
 For Zope 2.6 there are some new parameters on the Add form for vocabularies:

 Index numbers -- allows to index and search for numbers
 Index single characters  -- allows to index and search for single characters
 Case-insensitive  -- make indexing and searching case-sensitive or not
eisokangas - July 16, 2002 3:18 am:
 The concepts behind globbing / stoplists, wildcards, stemming, etc., as applied in Zcatalog, need much more
 fleshing out.

Using Vocabularies

When you create a new ZCatalog, the ZCatalog add form has a select box for you to choose a vocabulary to use. If you do not select a vocabulary, the ZCatalog automatically creates a Plain Vocabulary for you, and adds it to the ZCatalog's contents (this can be seen on the Contents view of the AnimalTracker you created for the examples in this chapter).

To use a Globbing Vocabulary or any other kind of Vocabulary, you must create it before you create the Catalog you want to use it with. A ZCatalog can use any Vocabulary inside its contents or any Vocabulary that it can find above it in the Zope Folder hierarchy.

Searching Field Indexes

FieldIndexes differ slightly from TextIndexes. A TextIndex will treat the value it finds in your object, for example the contents of a News Item, like text. This means that it breaks the text up into words and indexes all the individual words.

A FieldIndex does not break up the value it finds. Instead, it indexes the entire value it finds. This is very useful for tracking objects that have traits with fixed values.
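The difference between the two index types can be shown with a toy model in which the same author names are indexed both ways. The dictionaries below are hypothetical stand-ins, not the real TextIndex or FieldIndex classes, and the names are invented.

```python
items = {1: "Jane Doe", 2: "John Doe"}

text_index = {}   # word -> ids (value is split into words)
field_index = {}  # whole value -> ids (value is kept intact)
for item_id, author in items.items():
    for word in author.lower().split():
        text_index.setdefault(word, set()).add(item_id)
    field_index.setdefault(author, set()).add(item_id)

print(text_index.get("doe", set()))        # a single word matches both items
print(field_index.get("Doe", set()))       # a partial value matches nothing
print(field_index.get("Jane Doe", set()))  # only the exact value matches
print(sorted(field_index))                 # the unique values the index holds
```

Because the field-style index keeps whole values as keys, listing those keys gives the set of unique values, which is handy for building a selection list.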

In the news item example, you created two FieldIndexes, date and author. With the existing search form, these fields are not very useful. To use them more effectively you have to customize your search form a little. Before doing that though, let's consider some use cases for these indexes.

The date index lets you search for News Items by the time they were created. The existing search form is not very useful for this, because you have to type in exactly the time you are looking for, right down to the second, to get any hits. It would be better to search for a range of dates, like all of the News Items added in the last 24 hours, or all of the News Items from last month.

The author index lets you search for News Items by certain authors. Unless you know exactly the name of the author you are looking for though, you will not get any results. It would be better to be able to select from a list of all the unique authors indexed by the author index.

FieldIndexes are designed to do both range searching and searching for a unique value in the index. To take advantage of these features, you need only change your search form a little bit. Let's try the first example, range searching with dates.

tseaver - June 20, 2002 9:38 pm:
 This example should be ripped out and re-used for the "Searching DateIndexes" example; maybe we could use a
 different example (e.g., a download count?)
 for the FieldIndex version.

Like TextIndexes, FieldIndexes can be passed special options to enable these features. These special features need to be passed in as form elements that get turned into Catalog queries. Here is the search form used in the previous section Searching with Forms, but with some new form elements added to enable searching for News Items modified since "Yesterday", "Last Week", "Last Month", "Last Year" or "Ever":


<dtml-var standard_html_header>

<form action="Report" method="get">
<h2><dtml-var document_title></h2>
Search for News Items:<br><table>

<tr><th>Content</th>
    <td><input name="content" width=30 value=""></td></tr>
<tr><th>Author</th>
    <td><input name="author" width=30 value=""></td></tr>
<tr>
  <td><p>modified since:</p></td>
  <td>
    <input type="hidden" name="date_usage" value="range:min">
    <select name="date:date">
      <option value="<dtml-var expr="ZopeTime(0)" >">Ever</option> 
      <option value="<dtml-var expr="ZopeTime() - 1" >">Yesterday</option>
      <option value="<dtml-var expr="ZopeTime() - 7" >">Last Week</option>
      <option value="<dtml-var expr="ZopeTime() - 30" >">Last Month</option>
      <option value="<dtml-var expr="ZopeTime() - 365" >">Last Year</option>
    </select>
  </td>
</tr>

<tr><td colspan=2 align=center>
<input type="SUBMIT" value="Submit Query">
</td></tr>
</table>
</form>
<dtml-var standard_html_footer>

This should make your search form look like [9-4].

Figure 9-4 Range searching by Date

This HTML form changes the date format from the old search form. Instead of just a text box, it offers you a selection box where you can choose a date. But remember, this is a range search. Can you spot the part that tells the date FieldIndex to search by range? Here it is:


<input type="hidden" name="date_usage" value="range:min">
Anonymous User - Sep. 13, 2002 8:25 am:
 No matter what I do, with this script I always get my whole catalog returned.
 Even if I change <dtml-var expr="ZopeTime() - 1"> to 
<dtml-var expr="ZopeTime() - 0"> I get the whole catalog returned. I am afraid I am to dumb to use Zope
 properly. Please... HELP!!!!!

This is a special kind of HTML form element called a hidden element. It does not show up anywhere on the search form that you look at, but it is still passed into Zope when you submit the form. This special element, called date_usage, tells the date FieldIndex that the value in the date form element is a minimum range boundary. This means that the FieldIndex will not just return objects that have that date, but will return objects that have that date or any later date.

Any kind of FieldIndex can be told what kind of range specifiers to use by adding an additional search argument that suffixes the index name with "_usage". In addition to specifying a minimum range boundary, you can specify a maximum range boundary by changing the hidden form element to:


<input type="hidden" name="date_usage" value="range:max">
Anonymous User - Apr. 24, 2002 8:28 am:
 The usage of <index>_usage should no longer be mentioned as we
 deprecated this syntax in favour for using records.

This will cause the search form to return all News Items modified before the specified date, instead of after.

The "_usage" syntax can also be used when calling a Catalog directly from a script, like this Script, relevantRecentSectionNews:


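## Script (Python) "relevantRecentSectionNews"
##
""" Return relevant, and recent, news for this section """ 
id=context.getId()
# ZopeTime is acquired through the context; it is not a global name in a script
return context.NewsCatalog(
         {'content'    : id,
          'date'       : context.ZopeTime() - 7,   # seven days ago
          'date_usage' : 'range:min',              # that date or later
         } 
        )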

This works just like your old relevantSectionNews script, except that it only shows news items created in the last week.

You can also supply both a minimum and maximum range boundary. There's one catch to this, however. Normally if you specify no range boundary or just one boundary, ZCatalog uses the value you pass in as the search term. But when you provide two range boundaries, the ZCatalog needs two values, not one. Here is the relevantRecentSectionNews Script above with some slight modification to provide a list of date objects instead of just one:


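## Script (Python) "relevantRecentSectionNews"
##
""" 
Return relevant news modified in the last month, but not the
last week
"""
id=context.getId()
# ZopeTime is acquired through the context; it is not a global name in a script
return context.NewsCatalog(
         {'content'    : id,
          # first value is the minimum, second value is the maximum
          'date'       : [context.ZopeTime() - 30, context.ZopeTime() - 7],
          'date_usage' : 'range:min:max',
         } 
        )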

This script will return all of the relevant News Items modified in the last month, but not in the last week. When using two range specifiers, make sure that the order of the values matches the order of the range specifiers. If you accidentally switch "min" and "max" without also switching the two dates, you will get no search results, because the query does not make sense (its minimum value is larger than its maximum value).
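The three range specifiers can be modelled in a few lines of plain Python 3. This is only a sketch of the behaviour described in this section, with each value paired to the specifier in the same position. The function range_search and the sample data are invented for illustration and are not ZCatalog code. Day numbers stand in for dates, so a larger number means a later date.

```python
# Toy model only: mimics the '_usage' range semantics described in the text.

def range_search(indexed, query, usage):
    """Return the sorted ids whose value lies inside the requested range.

    indexed: dict mapping object id -> indexed value
    query:   a single value, or a list with one value per range specifier
    usage:   'range:min', 'range:max' or 'range:min:max'
    """
    values = list(query) if isinstance(query, (list, tuple)) else [query]
    specifiers = usage.split(":")[1:]      # 'range:min:max' -> ['min', 'max']
    bounds = dict(zip(specifiers, values))  # pair each specifier with a value
    low, high = bounds.get("min"), bounds.get("max")
    return sorted(
        obj_id
        for obj_id, value in indexed.items()
        if (low is None or value >= low) and (high is None or value <= high)
    )


today = 100  # hypothetical "now", counted in days
created = {"old": 60, "last_month": 80, "this_week": 95, "yesterday": 99}

since_last_week = range_search(created, today - 7, "range:min")
before_last_week = range_search(created, today - 7, "range:max")
month_not_week = range_search(created, [today - 30, today - 7], "range:min:max")
swapped = range_search(created, [today - 7, today - 30], "range:min:max")
```

In this model, swapping the two values without swapping the specifiers gives a minimum that is larger than the maximum, so nothing matches.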

The second use case you considered above was being able to search from a list of all unique authors. The ZCatalog has a special method, uniqueValuesFor, that does exactly this: it returns a list of the unique values for a certain index. Let's change your search form yet again, and replace the original author input box with something a little more useful:


<dtml-var standard_html_header>

<form action="Report" method="get">
<h2><dtml-var document_title></h2>
Search for News Items:<br><table>

<tr><th>Content:</th>
    <td><input name="content" width=30 value=""></td></tr>
<tr valign="top">
   <td><p>Author:</p></td>

   <td>
     <select name="author:list" size=6 MULTIPLE>
     <dtml-in expr="AnimalTracker.uniqueValuesFor('author')">
       <option value="<dtml-var sequence-item>">
       <dtml-var sequence-item></option>
     </dtml-in>
     </select>
   </td>
 </tr>

<tr>
  <td><p>modified since:</p></td>
  <td>
    <input type="hidden" name="date_usage" value="range:min">
    <select name="date:date">
      <option value="<dtml-var "ZopeTime(0)" >">Ever</option> 
      <option value="<dtml-var "ZopeTime() - 1" >">Yesterday</option>
      <option value="<dtml-var "ZopeTime() - 7" >">Last Week</option>
      <option value="<dtml-var "ZopeTime() - 30" >">Last Month</option>
      <option value="<dtml-var "ZopeTime() - 365" >">Last Year</option>
    </select>
  </td>
</tr>

<tr><td colspan=2 align=center>
<input type="SUBMIT" name="SUBMIT" value="Submit Query">
</td></tr>
</table>
</form>
<dtml-var standard_html_footer>

The new, important bit of code added to the search form is:


<select name="author:list" size=6 MULTIPLE>
<dtml-in expr="AnimalTracker.uniqueValuesFor('author')">
  <option value="<dtml-var sequence-item>">
  <dtml-var sequence-item></option>
</dtml-in>
</select>
Anonymous User - Apr. 27, 2002 6:32 pm:
 A generic sample in addition would also help for us who use the ZB as a reference. Then we dont have to read
 the whole chapter above to figure out what is going on.
 Example: 
 <select name="zCatalogIndexName:list" size=6 MULTIPLE>
 <dtml-in expr="ZCatalogID.uniqueValuesFor('zCatalogIndexName')">
   <option value="<dtml-var sequence-item>">
   <dtml-var sequence-item></option>
 </dtml-in>
 </select>

 This would also help to establish some standards for writing additional how-to's etc. BTW-I can't get this
 script to work properly.
Anonymous User - Apr. 28, 2002 6:05 pm:
 uniqueValuesFor ONLY works if the property in the Index is a FieldIndex, not TextIndex.

The HTML was also changed a bit to make the on-screen presentation make sense.

In this example, you are changing the form element author from just a simple text box to an HTML multiple select box. This box contains a unique list of all the authors that are indexed in the author FieldIndex. Now, your search form should look like [9-5].

Figure 9-5 Range searching and unique Authors

Anonymous User - Apr. 28, 2002 5:59 pm:
 This Figure does not match the HTML in the example above.

That's it. You can continue to extend this search form using HTML form elements to be as complex as you'd like. In the next section, we'll show you how to use the next kind of index, keyword indexes.

Searching Keyword Indexes

A KeywordIndex indexes a sequence of keywords for objects and can be queried for any objects that have one or more of those keywords.

Suppose that you have a number of Image objects that have a topics property. The topics property is a lines property that lists the relevant topics for a given Image, for example, "Portraits", "19th Century", and "Women" for a picture of Queen Victoria.

The topics provide a way of categorizing Images. Each Image can belong in one or more categories depending on its topics property. For example, the portrait of Queen Victoria belongs to three categories and can thus be found by searching for any of the three terms.

You can use a KeyWord index to search the topics property. Define a KeyWord index with the name topics on your ZCatalog. Then catalog your Images. Now you should be able to find all the Images that are portraits by creating a search form and searching for "Portraits" in the topics field. You can also find all pictures that represent 19th Century subjects by searching for "19th Century".

It's important to realize that the same Image can be in more than one category. This gives you much more flexibility in searching and categorizing your objects than you get with a field index. Using a field index, your portrait of Queen Victoria can only be categorized one way. Using a keyword index, it can be categorized in several different ways.

Often you will use a small list of terms with KeyWord indexes. In this case you may want to use the uniqueValuesFor method to create a custom search form. For example here's a snippet of DTML that will create a multiple select box for all the values in the topics index:


<select name="topics:list" multiple>
<dtml-in expr="uniqueValuesFor('topics')">
  <option value="&dtml-sequence-item;"><dtml-var sequence-item></option>
</dtml-in>
</select>
Anonymous User - Aug. 7, 2002 7:44 am:
 what if I want something to input by hand? (Ie I've 1000 keywords, cannot display them all)

  <textarea name="topics:lines"></textarea>

  BIG PROBLEM: works only if you've one keywordindex! If you've 2:

  <textarea name="topics:lines"></textarea>
  <textarea name="topics2:lines"></textarea>

  Doesnt work. Only work if you put ALL the keywords in "topics" and "topics2".

  how to solve it? Is is a bug?

Using this search form you can provide users with a range of valid search terms. You can select as many topics as you want and Zope will find all the Images that match one or more of your selected topics. Not only can each object have several indexed terms, but you can provide several search terms and find all objects that have one or more of those values.
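The following plain Python 3 sketch models this behaviour outside Zope. The class ToyKeywordIndex, its method names, and the two extra image names are invented for illustration. It is not Zope's KeywordIndex, only a picture of the idea that one object can sit under several keywords and that a search matches any of the selected terms.

```python
# Toy model only: a stand-in for a keyword index, not Zope's implementation.

class ToyKeywordIndex:
    def __init__(self):
        self._by_keyword = {}  # keyword -> set of object ids

    def index_object(self, obj_id, keywords):
        """File the object under every one of its keywords."""
        for keyword in keywords:
            self._by_keyword.setdefault(keyword, set()).add(obj_id)

    def unique_values(self):
        """All distinct keywords, as you might list them in a select box."""
        return sorted(self._by_keyword)

    def search_any(self, keywords):
        """Ids of objects carrying at least one of the given keywords."""
        hits = set()
        for keyword in keywords:
            hits |= self._by_keyword.get(keyword, set())
        return hits


topics = ToyKeywordIndex()
topics.index_object("victoria.jpg", ["Portraits", "19th Century", "Women"])
topics.index_object("lincoln.jpg", ["Portraits", "19th Century"])    # hypothetical
topics.index_object("eiffel.jpg", ["19th Century", "Architecture"])  # hypothetical

portraits = topics.search_any(["Portraits"])
women_or_architecture = topics.search_any(["Women", "Architecture"])
```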

Searching Path Indexes

Path indexes allow you to search for objects based on their location in Zope. Suppose you have an object whose path is /zoo/animals/Africa/tiger.doc. You can find this object with the path queries: /zoo, or /zoo/animals, or /zoo/animals/Africa. In other words, a path index allows you to find objects within a given folder (and below).

If you place related objects within the same folders, you can use path indexes to quickly locate these objects. For example:


<h2>Lizard Pictures</h2>

<p>
<dtml-in expr="Catalog(meta_type='Image',
                       path='/Zoo/Animals/Lizard')">
<a href="&dtml-absolute_url;"><dtml-var title></a>
</dtml-in>
</p>
Anonymous User - Apr. 28, 2002 6:25 pm:
 typo - quickly "locate" these objects

This query searches a catalog for all images that are located within the /Zoo/Animals/Lizard folder and below. It creates a link to each image.

tseaver - June 20, 2002 9:46 pm:
 More from the TopicIndex README:

   TopicIndex API

     'addFilteredSet(Id, filterType, expression)' --
       Add a new filtered set.

       o 'Id':  unique Id for the FilteredSet

       o 'filterType':     'PythonFilteredSet'

       o 'expression':  Python expression defining the filter

     'delFilteredSet(Id)' --
       Remove the given FilteredSet from the index.

       o 'Id':  unique Id for the FilteredSet

     'clearFilteredSet(Id)' --
       Remove all document IDs from the FilteredSet.

       o 'Id':  unique Id for the FilteredSet

Depending on how you choose to arrange objects in your site, you may find that path indexes are more or less effective. If you locate objects without regard to their subject (for example, if objects are mostly located in user "home" folders) then path indexes may be of limited value. In these cases, keyword and field indexes will be more useful.

tseaver - June 20, 2002 9:41 pm:
 Here is the README for TopicIndex:

  TopicIndex

     Reference: http://dev.zope.org/Wikis/DevSite/Proposals/TopicIndexes

     A TopicIndex is a container for so-called FilteredSet. A FilteredSet
     consists of an expression and a set of internal ZCatalog document 
     identifiers that represent a pre-calculated result list for performance
     reasons. Instead of executing the same query on a ZCatalog multiple times
     it is much faster to use a TopicIndex instead.

     Building up FilteredSets happens on the fly when objects are cataloged
     and uncatalogued. Every indexed object is evaluated against the expressions
     of every FilteredSet. An object is added to a FilteredSet if the expression
     with the object evaluates to 1. Uncatalogued objects are removed from the
     FilteredSet.

   Types of FilteredSet

     PythonFilteredSet

       A PythonFilteredSet evaluates using the eval() function inside the
       context of the FilteredSet class. The object to be indexes must
       be referenced inside the expression using "o.".

       Examples::

              "o.meta_type=='DTML Method'"

   Queries on TopicIndexes

     A TopicIndex is queried in the same way as other ZCatalog Indexes and
     supports usage of the 'operator' parameter to specify how to combine
     search results.
tseaver - June 20, 2002 9:49 pm:
 From the DateIndexes README:

   Overview

     Normal FieldIndexes *can* be used to index values which are DateTime
     instances, but they are hideously expensive:

     o DateTime instances are *huge*, both in RAM and on disk.

     o DateTime instances maintain an absurd amount of precision, far
       beyond any reasonable search criteria for "normal" cases.

     DateIndex is a pluggable index which addresses these two issues
     as follows:

     o It normalizes the indexed value to an integer representation
       with a granularity of one minute.

     o It normalizes the 'query' value into the same form.

     o Objects which return 'None' for the index query are omitted from
       the index.
tseaver - June 20, 2002 9:51 pm:
 From the DateRangeIndexes README:

   Overview

     Zope applications frequently wish to perform efficient queries
     against a pair of date attributes/methods, representing a time
     interval (e.g., effective / expiration dates).  This query *can*
     be done using a pair of indexes, but this implementation is
     hideously expensive:

     o DateTime instances are *huge*, both in RAM and on disk.

     o DateTime instances maintain an absurd amount of precision, far
       beyond any reasonable search criteria for "normal" cases.

     o Results must be fetched and intersected between two indexes.

     o Handling objects which do not specify both endpoints (i.e.,
       where the interval is open or half-open) is iffy, as the
       default value needs to be coerced into a different abnormal
       value for each end to permit ordered comparison.

     o The *very* common case of the open interval (neither endpoint
       specified) should be optimized.

     DateRangeIndex is a pluggable index which addresses these issues
     as follows:

     o It groups the "open" case into a special set, '_always'.

     o It maintains separate ordered sets for each of the "half-open"
       cases: '_since_only' and '_until_only'

     o It performs the expensive "intersect two range search" operation
       only on the (usually small) set of objects which provide a
       closed interval.

     o It flattens the key values into integers with granularity of
       one minute.

     o It normalizes the 'query' value into the same form.

Advanced Searching with Records

A new feature in Zope 2.4 is the ability to query indexes more precisely using record objects. Record objects contain information about how to query an index. Records are Python objects with attributes, or mappings. Different indexes support different record attributes.

Keyword Index Record Attributes

query
Either a sequence of words or a single word. (mandatory)
operator
Specifies whether all keywords or only one need to match. Allowed values: and, or. (optional, default: or)

For example:


# big or shiny
results=Catalog(categories=['big', 'shiny'])

# big and shiny
results=Catalog(categories={'query':['big','shiny'],
                            'operator':'and'})
Anonymous User - Aug. 7, 2002 7:50 am:
 how is this related with the input form, for example? More example needed.
Anonymous User - Jan. 14, 2003 6:48 am:
 and what about ???
 ((categorires = A or B) AND name = C) or
  (categories = C or name = D)

 etc, etc, etc..
 and NOR, XOR...

The second query matches objects that have both the keywords "big" and "shiny". Without using the record syntax you can only match objects that are big or shiny.
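One way to picture how a record generalizes a plain query value is the plain Python 3 sketch below. The function keyword_query and the sample data are invented for illustration, and the real index does not work on a simple dictionary like this. A bare value or list is treated as an "or" query, while a mapping may carry its own operator.

```python
# Toy model only: shows plain values versus records, not Zope's own code.

def keyword_query(index, request):
    """index: dict keyword -> set of ids; request: str, list, or record dict."""
    if isinstance(request, dict):
        terms = request["query"]                 # mandatory
        operator = request.get("operator", "or")  # optional, defaults to 'or'
    else:
        terms, operator = request, "or"
    if isinstance(terms, str):
        terms = [terms]
    hit_sets = [index.get(term, set()) for term in terms]
    if not hit_sets:
        return set()
    if operator == "and":
        return set.intersection(*hit_sets)
    if operator == "or":
        return set.union(*hit_sets)
    raise ValueError("operator must be 'and' or 'or'")


# Hypothetical index contents.
categories = {"big": {"elephant", "sun"}, "shiny": {"coin", "sun"}}

big_or_shiny = keyword_query(categories, ["big", "shiny"])
big_and_shiny = keyword_query(
    categories, {"query": ["big", "shiny"], "operator": "and"}
)
```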

Field Index Record Attributes

query
Either a sequence of objects or a single value to be passed as query to the index (mandatory)
range
Defines a range search on a Field Index (optional, default: not set).

Allowed values:

min
Searches for all objects with values larger than the minimum of the values passed in the query parameter.
max
Searches for all objects with values smaller than the maximum of the values passed in the query parameter.
minmax
Searches for all objects with values smaller than the maximum of the values passed in the query parameter and larger than the minimum of the values passed in the query parameter.

For example:


# items modified in the last week
results=Catalog(bobobase_modification_time={
                  'query':DateTime() - 7,
                  'range': 'min'}
                )
Anonymous User - Aug. 2, 2002 10:46 am:
 typo: "passwd" -> "passed"

This query matches objects whose bobobase_modification_time is DateTime() - 7 or later, that is, objects modified in the last week. Compare this query with the one defined in relevantRecentSectionNews earlier in this chapter, which uses date_usage to accomplish the same kind of query.
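If you have queries written in the older "_usage" style, the mapping to records is mechanical. The plain Python 3 helper below is a hypothetical convenience, not something Zope provides. It folds each index_usage key into a record for that index, and spells the two-sided range "minmax" as in the list of allowed values above.

```python
# Hypothetical helper, not part of Zope: rewrites '_usage' queries as records.

def usage_to_records(query):
    """Return a copy of query with '<index>_usage' keys folded into records."""
    converted = {}
    for key, value in query.items():
        if key.endswith("_usage"):
            continue  # handled together with the index it belongs to
        usage = query.get(key + "_usage")
        if usage is None:
            converted[key] = value
        else:
            # 'range:min' -> 'min', 'range:min:max' -> 'minmax'
            range_spec = "".join(usage.split(":")[1:])
            converted[key] = {"query": value, "range": range_spec}
    return converted


# Placeholder values; in Zope the date would be a DateTime object.
old_style = {"content": "sports", "date": 20020101, "date_usage": "range:min"}
new_style = usage_to_records(old_style)
```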

Text Index Record Attributes

query
Either a sequence of words (separated by white space) or a single word to be passed as query to the index. (mandatory)
operator
Specifies how to combine the search terms. (optional, default: or).

Allowed values:

and
All terms must be present.
or
At least one term must be present.
andnot
The first term must be present, but none of the rest of the terms.

There's not much reason to use record queries with text indexes since you can embed the operator information in the query string itself in a very flexible manner.

Path Index Record Attributes

query
Path to search for either as a string (e.g. "/Zoo/Birds") or list (e.g. ["Zoo", "Birds"]). (mandatory)
level
The path level to begin searching at. (optional, default: 0)

Suppose you have a collection of objects with these paths:

  1. /aa/bb/aa
  2. /aa/bb/bb
  3. /aa/bb/cc
  4. /bb/bb/aa
  5. /bb/bb/bb
  6. /bb/bb/cc
  7. /cc/bb/aa
  8. /cc/bb/bb
  9. /cc/bb/cc

Here are some example queries and their results to show how the level attribute works:

jshell - Aug. 8, 2002 5:54 pm:
 This example full of aa's and bb's and such is very hard to follow, especially when wanting the answer "do
 path indexes do Depth based searching?"
 For example, WebDAV (and also Exchange 2000, LDAP, etc) have the concept of Depth in their queries. A query
 of 'depth 0' means 'search/match ONLY the object specified (no children)'. A query of 'depth 1' means
 'search/match the object specified and its immediate children, but no further'. This yields results similar
 to calling 'objectValues()' in an ObjectManager. 'depth infinite' means 'search/match from this particular
 point in the tree, downward to infinity'.
 It seems every other tree based storage system has the concept of 'depth' in their query languages, and Path
 Indexes still seem broken without it.
Anonymous User - Sep. 12, 2002 6:55 am:
 This 'level' concept needs more explaining. I guess that level=0 means
 that the path is interpreted as absolute starting from the root, -1 means 
 relative/starting from anywhere in the path and the other values means 
 absolute starting as the nth path element in the searched path. 

 I second the opinion that the example with all the 'bb's is difficult to
 understand.

  • query=/aa/bb, level=0 returns 1, 2, 3
  • query=/bb/bb, level=0 returns 4, 5, 6
  • query=/bb/bb, level=1 returns 2, 5, 8
  • query=/bb/bb, level=-1 returns 2, 4, 5, 6, 8
  • query=/xx, level=-1 returns none

You can use the level attribute to flexibly search different parts of the path.
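The level rules can be reproduced with a short plain Python 3 model. The function path_search is invented for illustration and is not Zope's PathIndex. It treats a level of 0 or more as "the query must match starting at that position in the path" and a level of -1 as "the query may match starting at any position".

```python
# Toy model only: reproduces the example results above, not Zope's PathIndex.

def path_search(paths, query, level=0):
    """Return the 1-based numbers of the paths that match query at level."""
    wanted = [part for part in query.split("/") if part]
    hits = []
    for number, path in enumerate(paths, start=1):
        parts = [part for part in path.split("/") if part]
        if level >= 0:
            starts = [level]  # one fixed starting position
        else:
            starts = range(len(parts) - len(wanted) + 1)  # any position
        if any(parts[s:s + len(wanted)] == wanted for s in starts):
            hits.append(number)
    return hits


paths = [
    "/aa/bb/aa", "/aa/bb/bb", "/aa/bb/cc",
    "/bb/bb/aa", "/bb/bb/bb", "/bb/bb/cc",
    "/cc/bb/aa", "/cc/bb/bb", "/cc/bb/cc",
]

at_root = path_search(paths, "/bb/bb", level=0)
one_down = path_search(paths, "/bb/bb", level=1)
anywhere = path_search(paths, "/bb/bb", level=-1)
```

With level=1 the first path element is ignored, which is why paths 2, 5 and 8 match regardless of whether they start with aa, bb or cc.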

As of Zope 2.4.1, you can also include level information in a search without using a record. Simply use a tuple containing the query and the level. Here's an example tuple: ("/aa/bb", 1).

Anonymous User - Apr. 30, 2002 1:04 am:
 Link broken, example needs context.

Creating Records in HTML

You can also perform record queries using HTML forms. Here's an example showing how to create a search form using records:


<form action="Report" method="get">
<table>
<tr><th>Search Terms (must match all terms)</th>
    <td><input name="content.query:record" width=30 value=""></td></tr>
    <input type="hidden" name="content.operator:record" value="and">
<tr><td colspan=2 align=center>
<input type="SUBMIT" value="Submit Query">
</td></tr>
</table>
</form>

For more information on creating records in HTML see the section "Passing Parameters to Scripts" in Chapter 10, Advanced Zope Scripting.
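To see what the ":record" suffix in the form above amounts to, here is a simplified plain Python 3 model of the grouping step. The function marshal_records is invented for illustration and is not ZPublisher's code. It handles only the ":record" converter, and it builds ordinary dictionaries where Zope builds record objects.

```python
# Toy model only: groups 'name.attr:record' form fields, not ZPublisher code.

def marshal_records(form_fields):
    """Turn (field name, value) pairs into a request-like dictionary."""
    request = {}
    for field, value in form_fields:
        name, _, converter = field.partition(":")
        if converter == "record" and "." in name:
            record_name, attribute = name.split(".", 1)
            request.setdefault(record_name, {})[attribute] = value
        else:
            request[name] = value  # other converters are ignored in this sketch
    return request


# What the browser would send for the form above, with a sample search term.
submitted = [
    ("content.query:record", "zope catalog"),
    ("content.operator:record", "and"),
]
request = marshal_records(submitted)
```

The resulting content entry has the same shape as the mapping you would pass to the Catalog from a script.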

Stored Queries

While the main use of the Catalog is to provide interactive searching, you can also use stored queries to categorize and organize your site. For example, in the section on keyword indexes you saw how you can use the Catalog and properties to search for categories of Images such as portraits. In addition to providing interactive searching for categories of Images you can create web pages with canned queries. So for example, here's some DTML that you could use for a page that displays all your portraits:


<dtml-var standard_html_header>

<h1>Portraits</h1>

<dtml-in expr="ImageCatalog({'topics':'Portraits'})">
<p> 
<dtml-var sequence-item>
<dtml-var title_or_id>
</p>
</dtml-in>

<dtml-var standard_html_footer>
Anonymous User - May 27, 2002 8:31 am:
 Through out the tutorial you ahve not mentioned what is ImageCatalog and suddenly u put that ...can you be
 more specific .
 I acn really tell one thing this is the best tutorial i ahve ever seen in any language.

The dynamic nature of this page is not visible to the viewer. However, just add another portrait, update the catalog and this page will automatically include the new Image.

This technique can be very powerful. Not only can you organize and display public resources, but you can easily institute workflow systems by tagging objects with properties to indicate their state and cataloging them. After that it's easy for you to create pages for different people that show which objects need their attention. This technique is even more powerful when using the Automatic Cataloging pattern.

Automatic Cataloging

Automatic Cataloging is an advanced Catalog usage pattern that keeps objects up to date as they are changed. It requires that as objects are created, changed, and destroyed, they are automatically tracked by a ZCatalog. This usually involves the objects notifying the Catalog when they are created, changed, or deleted.

Anonymous User - May 6, 2002 8:58 am:
 This usually involves the objects notifying the Catalog when they are created, changed, or deleted.

 For creation & modification, I understand it wors with the Object.reindex() method. I didn't find any
 information on how to tell the catalog an oject's been destroyed.
 Is the catalog supposed to detect the situation by itself.

 Did I mis something ??
Anonymous User - May 13, 2002 7:07 pm:
 correction between fig 9.6 & 9.7
Randolpho - May 17, 2002 3:57 pm:
 Anonymous User 1 -- you're think of the Z Catalog, when automatic cataloging is done by the item cataloged,
not the Z Catalog. The Item either has to roll its own form of catalog awareness, or it needs to inherit from
 Z Catalog's CatalogPathAwareness class. That gives the object methods like index_object(), unindex_object(),
 and reindex_object() (which is "surprisingly useful" ;)). It also implements some standard Zope hooks called
 manage_beforeDelete(), manage_afterAdd(), and manage_beforeClone(). These index, unindex, or reindex the
object. The object that inherits from CatalogPathAwareness must provide its own form of notification to the Z
 Catalog. This is best done by calling self.reindex_object() at the end of an edit method. I wrote a special
 Product for just such an occasion that is fully catalog aware and can contain any number of attributes. It's
 available at http://www.zope.org/Members/Randolpho/ZCatalogedObject

This usage pattern has a number of advantages in comparison to mass cataloging. Mass cataloging is simple but has drawbacks. The total amount of content you can index in one transaction is equivalent to the amount of free virtual memory available to the Zope process, plus the amount of temporary storage the system has. In other words, the more content you want to index all at once, the better your computer hardware has to be. Mass cataloging works well for indexing up to a few thousand objects, but beyond that automatic indexing works much better.

Another major advantage of automatic cataloging is that it can handle objects that change. As objects evolve and change, the index information is always current, even for rapidly changing information sources like message boards.
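The pattern itself is easy to sketch in plain Python 3, outside Zope. The classes ToyCatalog and ToyNewsItem below are invented for illustration. Only the method names catalog_object and uncatalog_object are borrowed from ZCatalog, and the toy versions keep a simple dictionary where the real ones maintain indexes.

```python
# Toy model only: shows objects notifying a catalog, not Zope's CatalogAware.

class ToyCatalog:
    """Stand-in for a ZCatalog: maps a path to a snapshot of indexed values."""

    def __init__(self):
        self.entries = {}

    def catalog_object(self, obj, path):
        self.entries[path] = {"author": obj.author, "content": obj.content}

    def uncatalog_object(self, path):
        self.entries.pop(path, None)


class ToyNewsItem:
    """Object that keeps its own catalog entry current."""

    def __init__(self, catalog, path, author, content):
        self.catalog, self.path = catalog, path
        self.author, self.content = author, content
        self.catalog.catalog_object(self, self.path)  # index on creation

    def edit(self, content):
        self.content = content
        self.catalog.catalog_object(self, self.path)  # reindex on change

    def delete(self):
        self.catalog.uncatalog_object(self.path)      # unindex on removal


catalog = ToyCatalog()
item = ToyNewsItem(catalog, "/news/launch", "bob", "Draft text")
item.edit("Final text")
```

Because every change goes through the object, the catalog entry never lags behind the object it describes, and no mass recataloging step is needed.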

In this section, we'll show you an example that creates "news" items that people can add to your site. These items will get automatically cataloged. This example consists of two steps:

  • Creating a new type of object to catalog.
  • Creating a Catalog to catalog the newly created objects.

Anonymous User - May 29, 2002 4:57 pm:
 s/thatpeople/that people/

As of Zope 2.3, none of the "out-of-the-box" Zope objects support automatic cataloging. This is for backwards compatibility reasons. For now, you have to define your own kind of objects that can be cataloged automatically. One of the ways this can be done is by defining a ZClass.

Anonymous User - Sep. 27, 2002 7:54 pm:
 Read this book from start on. dunno "ZClass". explain. blf

A ZClass is a Zope object that defines new types of Zope objects. In a way, a ZClass is like a blueprint that describes how new Zope objects are built. Consider a news item as discussed in examples earlier in the chapter. News items not only have content, but they also have specific properties that make them news items. Often these Items come in collections that have their own properties. You want to build a News site that collects News Items, reviews them, and posts them online to a web site where readers can read them.

In this kind of system, you may want to create a new type of object called a News Item. This way, when you want to add a new news item to your site, you just select it from the product add list. If you design this object to be automatically cataloged, then you can search your news content very powerfully. In this example, you will just skim a little over ZClasses, which are described in much more detail in Chapter 14, "Extending Zope."

New types of objects are defined in the Products section of the Control Panel. This is reached by clicking on the Control Panel and then clicking on Product Management. Products contain new kinds of ZClasses. On this screen, click "Add" to add a New product. You will be taken to the Add form for new Products.

Name the new Product "News" and click "Generate". This will take you back to the Products Management view and you will see your new Product.

Select the News Product by clicking on it. This new Product looks a lot like a Folder. It contains one object called Help and has an Add menu, as well as the usual Folder "tabs" across the top. To add a new ZClass, pull down the Add menu and select ZClass. This will take you to the ZClass add form, as shown in [9-6].

Figure 9-6 ZClass add form

This is a complicated form which will be explained in much more detail in Chapter 14, "Extending Zope". For now, you only need to do three things to create your ZClass:

  • Specify the Id "NewsItem". This is the name of the new ZClass.
  • Specify the meta_type "News Item". This will be used to create the Add menu entry for your new type of object.
  • Select ZCatalog:CatalogAware from the left hand Base Classes box, and click the button with the arrow pointing to the right hand Base Classes box. This should cause ZCatalog:CatalogAware to show up in the right hand window.

When you're done, don't change any of the other settings in the Form. To create your new ZClass, click Add. This will take you back to your News Product. Notice that there is now a new object called NewsItem as well as several other objects. The NewsItem object is your new ZClass. The other objects are "helpers" that you will examine more in Chapter 14, "Extending Zope".

Anonymous User - May 8, 2002 5:57 am:
 If the class has to inherit from another base class (for example : OFSFolder)
 CatalogAware has to be put FIRST, otherwise it doesn't work.
Randolpho - May 17, 2002 3:38 pm:
 The reason it doesn't work has to do with Python's multiple Inheritance; most base classes enherit the
 methods "manage_afterAdd", "manage_afterClone" and "manage_beforeDelete". CatalogAwareness requires the use
 of those methods to inform the ZCatalog of an instantiation, deletion, or copy. Python specifies that the
 classes that come first in the inheritance list get priority when it comes to overridden methods. That means
 that if you inherit from a base class before CatalogAware, you inherit the base classes "manage_afterAdd",
 "manage_afterClone", and "manage_beforeDelete" methods, rather than CatalogAware's versions of them.
 Also, unless there has been a change in Zope 2.5.1 that I'm unaware of, you need to inherit from
 CatalogPathAwareness, not CatalogAwareness.

Select the NewsItem ZClass object. Your view should now look like [9-7].

Figure 9-7 A ZClass Methods View

This is the Methods View of a ZClass. Here, you can add Zope objects that will act as methods on your new type of object. For example, you can create DTML Methods or Scripts here, and these objects will become methods on any new News Items that are created. Before creating any methods, however, let's review the needs of this new "News Item" object:

News Content
The News Item contains news content; this is its primary purpose. The content can be plain text or marked-up content such as HTML or XML.

Author Credit
The News Item should provide some kind of credit to the author or organization that created it.

Date
News Items are timely, so the date that the item was created is important.

Keywords
News Items fit into various lists of categories. By convention, these lists of categories are often called keywords.

tseaver - June 20, 2002 9:55 pm:
 The catalog index on 'Date' should be a 'DateRangeIndex'.

You may want your new News Item object to have other properties; these are just suggestions. To add new properties to your News Item, click on the Property Sheets tab. This takes you to the Property Sheets view.

Properties are added to new types of objects in groups called Property Sheets. Since your object has no property sheets defined, this view is empty. To add a new Property Sheet, click Add Common Instance Property Sheet, and give the sheet the name "News". Now click Add. This will add a new Property Sheet called News to your object. Clicking on the new Property Sheet will take you to the Properties view of the News Property Sheet, as shown in Figure 9-8.

Figure 9-8 The properties screen for a Property Sheet

This view is almost identical to the Properties view found on Folders and other objects. Here, you can create the properties of your News Item object. Create three new properties in this form:

content
This property's type should be text. Each newly created News Item will contain its own unique content property.

author
This property's type should be string. This will contain the name of the news author.

date
This property's type should be date. This will contain the time and date the news item was last updated. A date property requires a value, so for now you can enter the string "01/01/2000".

That's it! You have now created a Property Sheet that describes your News Items and the kind of information they contain. Properties can be thought of as the data that an object contains. Now that the data is all set, you need to create an interface for your new kind of object. This is done by creating new Views for your object.

Click on the Views tab. This will take you to the Views view, as shown in Figure 9-9.

Figure 9-9 The Views view

Here, you can see that Zope has created three default Views for you. These views will be described in much more detail in Chapter 14, "Extending Zope", but for now, it suffices to say that these views define the tabs that your objects will eventually have.

To create a new view, use the form at the bottom of the Views view. Create a new View with the name "News", select "propertysheets/News/manage" from the select box, and click Add. This will create a new View on this screen under the original three Views, as shown in Figure 9-10.

Figure 9-10 The new News View

Since this View is going to give us the ability to edit the News Item, we want to make it the first view that you see when you select a News Item object. To change the order of the views, select the newly created News view and click the First button. This should move the new view from the bottom to the top of the list.

The final step in creating a ZClass is defining the methods for the class. Methods are defined on the Methods View. Click on the Methods tab and you will be taken to the Methods view. Select DTML Method from the add list and add a new DTML Method with the id "index_html". This will be the default view of your news item. Add the following DTML to the new method:


<dtml-var standard_html_header>

<h1>News Flash</h1>

<p><dtml-var date></p>

<p><dtml-var author></p>

<p><dtml-var content></p>

<dtml-var standard_html_footer>

That's it! You've created your own kind of object called a News Item. When you go to the root folder, you will now see a new entry in your add list.

But don't add any new News Items yet, because the second step in this exercise is to create a Catalog that will catalog your new News Items. Go to the root folder and create a new ZCatalog with the id "Catalog". The id matters: CatalogAware objects look for a catalog with this name through acquisition.

Anonymous User - Apr. 15, 2002 12:44 pm:
 Why the root folder? Is it because automatic cataloguing only works within the acquisition path? This should
 be stated more clearly. Perhaps "in the root folder, or any folder that your object will acquire by
 acquisition."
neves - June 19, 2002 6:40 pm:
 It should be clear that you must use the id "Catalog" for your zcatalog. It 
 is the default name the catalogaware class will search in the acquisition 
 path.
Anonymous User - July 3, 2002 7:55 am:
 But I need to have different catalogs which work with different z-classes. How can I change the catalog id
 that the class searches for, from the default 'Catalog'?
Anonymous User - Oct. 9, 2002 8:56 pm:
 A workaround for me was to use multiple "Catalogs" in different folders. Let's say you have one Catalog in
 the Root Folder and another Catalog in Foo/Foe/Products. If you add a CatalogPathAware ZClass below
 Foo/Foe/Products it will be added to the Catalog in the same folder but not to the Catalog in the Root
 Folder. If you add the ZClass somewhere above Foo/Foe/Products, e.g. Foo, it will be added to the Catalog in
 the Root Folder. This leads to the conclusion that ZClass simply scans upwards in the hierarchy for the first
 ZCatalog named "Catalog".
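
The upward scan described in the comment above can be pictured with a toy containment tree. This is only a sketch of the idea, not Zope's acquisition machinery: the Node class and the find_catalog function are hypothetical, and the folder names are borrowed from the comment.

```python
class Node:
    """Hypothetical stand-in for a folder or object in a containment tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self

def find_catalog(start, catalog_id="Catalog"):
    """Walk upward from start and return the first child named catalog_id."""
    node = start
    while node is not None:
        if catalog_id in node.children:
            return node.children[catalog_id]
        node = node.parent
    return None

# Layout from the comment: one catalog in the root folder, another in
# Foo/Foe/Products.
root = Node("root")
root_catalog = Node("Catalog", root)
foo = Node("Foo", root)
foe = Node("Foe", foo)
products = Node("Products", foe)
products_catalog = Node("Catalog", products)

item_deep = Node("item1", products)   # added below Foo/Foe/Products
item_shallow = Node("item2", foo)     # added higher up, in Foo

deep_hit = find_catalog(item_deep)
shallow_hit = find_catalog(item_shallow)
```

In this model the item in Foo/Foe/Products finds the nearby catalog first, while the item in Foo falls through to the one in the root folder.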

As in the previous two examples of using a ZCatalog, you need to create Indexes and a Meta-Data Table that make sense for your objects. First, delete the default indexes in the new ZCatalog and create the following indexes to replace them:

content
This should be a TextIndex. This will index the content of your News Items.

title
This should be a TextIndex. This will index the title of your News Items.

author
This should be a FieldIndex. This will index the author of the News Item.

date
This should be a FieldIndex. This will index the date of the News Item.

tseaver - June 20, 2002 9:59 pm:
 'date' --
   this should be a DateIndex.  This will index the date of the News Item.
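
To make the difference between the index types above more concrete, here is a rough sketch with two toy indexes. It assumes a FieldIndex matches a whole attribute value while a TextIndex matches individual words; the class names, the word-splitting rule, and the author "Jane Doe" are all invented for illustration, and the real Zope indexes are considerably more sophisticated.

```python
from collections import defaultdict

class ToyFieldIndex:
    """Maps each whole attribute value to the ids of objects holding it."""
    def __init__(self):
        self._ids_by_value = defaultdict(set)

    def index(self, doc_id, value):
        self._ids_by_value[value].add(doc_id)

    def search(self, value):
        return set(self._ids_by_value.get(value, set()))

class ToyTextIndex:
    """Maps each lowercased word to the ids of objects whose text has it."""
    def __init__(self):
        self._ids_by_word = defaultdict(set)

    def index(self, doc_id, text):
        for word in text.lower().split():
            # Strip simple trailing/leading punctuation from each word.
            self._ids_by_word[word.strip(".,!?")].add(doc_id)

    def search(self, word):
        return set(self._ids_by_word.get(word.lower(), set()))

author_index = ToyFieldIndex()
content_index = ToyTextIndex()

author_index.index("KoalaGivesBirth", "Jane Doe")
content_index.index(
    "KoalaGivesBirth",
    "Today, Bob the Koala bear gave birth to little baby Jimbo.",
)

exact_author = author_index.search("Jane Doe")   # whole value matches
partial_author = author_index.search("Jane")     # part of a value does not
word_hit = content_index.search("Koala")         # single word matches
```

This is why free-text fields such as content and title suit a TextIndex, while values you filter on exactly, such as author, suit a FieldIndex.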

After creating these Indexes, delete the default Meta-Data columns and add these columns to replace them:

  • author
  • date
  • title
  • absolute_url
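
The Meta-Data Table keeps a copy of these values for each cataloged object, so a result listing can show them without loading the object itself. The sketch below models that idea under stated assumptions: ToyNewsItem, make_record, the example.invalid URL, and the sample values are all hypothetical, and the real ZCatalog record format differs.

```python
METADATA_COLUMNS = ("author", "date", "title", "absolute_url")

class ToyNewsItem:
    """Hypothetical object with plain attributes and one method."""
    def __init__(self, id, title, author, date):
        self.id = id
        self.title = title
        self.author = author
        self.date = date

    def absolute_url(self):
        # Assumed base URL, for illustration only.
        return "http://example.invalid/" + self.id

def make_record(obj, columns=METADATA_COLUMNS):
    """Copy each column's value from obj, calling it first if it is a method."""
    record = {}
    for name in columns:
        value = getattr(obj, name, None)
        if callable(value):
            value = value()
        record[name] = value
    return record

item = ToyNewsItem("KoalaGivesBirth", "Koala Gives Birth", "Jane Doe", "2000/01/01")
record = make_record(item)
print(record)
```

Because the record is a snapshot taken at cataloging time, it goes stale if the object changes and is not cataloged again.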

After creating the Indexes and Meta-Data Table columns, create a search interface for the Catalog using the Z Search Interface tool described previously in this chapter.

Anonymous User - Apr. 15, 2002 12:47 pm:
 Is this step necessary in order for automatic cataloguing to work? If so, state that and why; if not, put
 this information later, after showing that the catalogue works, such as "now that the automatic catalogue is
 working, you will probably want to go back and add an interface with which to search it, as shown earlier in
 this chapter."

Now you are ready to go. Start by adding some new News Items to your Zope. Go anywhere in Zope and select News Item from the add list. This will take you to the add form for News Items.

Anonymous User - Apr. 15, 2002 12:51 pm:
 "Go anywhere in Zope" because the catalogue is in the root folder, or because the catalogue works from
 anywhere? What is unclear is how the catalogue finds items to be automatically catalogued, and how the items
 find the catalogue. From these instructions it would appear that any object that subclasses "ZCatalogAware"
 will show up in every ZCatalog on the site. Certainly this isn't true. So how does a ZCatalog make the
 distinction which items to automatically catalog? I've completed these instructions three times with no
success, but believe I know my way around Zope fairly well. Is there something missing in these instructions?
Anonymous User - May 24, 2002 1:48 pm:
 Create a ZCatalog and name it 'Catalog', and it should work :)

Give your new News Item the id "KoalaGivesBirth" and click Add. This will create a new News Item. Select the new News Item.

Notice that it has four tabs, matching the four Views defined in the ZClass. The first View is News, which corresponds to the News Property Sheet you created in the News Item ZClass.

Enter your news in the content box:


Today, Bob the Koala bear gave birth to little baby Jimbo.

Enter your name in the Author box, and today's date in the Date box.

Click Change and your News Item should now contain some news. Because the News Item object is CatalogAware, it is automatically cataloged when it is changed or added. Verify this by looking at the Cataloged Objects tab of the ZCatalog you created for this example.

Anonymous User - May 6, 2002 7:54 am:
 I have tried this a couple of times and had no success. Although I made sure that my ZClass was CatalogAware,
 I find myself having to enter the ZCatalog method and update it myself.
Anonymous User - May 6, 2002 9:01 am:
 So did I,

 It's very confusing. I understand that the catalog CAN do it, but it actually does so only if told to with an
 object.reindex() command.
Anonymous User - May 27, 2002 9:07 am:
 I have one doubt: I have created my content types in Python. Can you tell me how to make them automatically
 catalog aware? As mentioned, if I add them through ZClasses then I guess it may work, but
 what if I add them through Python code?
Anonymous User - May 30, 2002 3:27 am:
 This part of the tutorial does seem to be wrong - the object is successfully added to the catalogue when
 initially created, and removed when it is deleted, but is not reindexed when - for example - a property is
 changed. This seems to be the same whether you extend CatalogAware, CatalogPathAware, or both.
 What's the best way of getting reindex() called after a property update? Someone has mentioned adding it to
 the end of edit methods, but I can't see what these are/where to hook into.
Anonymous User - June 27, 2002 9:01 am:
 Well, read the how-to on zope.org. It calls reindex if the object is changed.
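
The pattern the commenters describe, calling a reindex method at the end of every edit method, can be sketched with toy classes. Nothing here is the real Zope API: ToyCatalog and ToyNewsItem are hypothetical, and the method name reindex_object is an assumption modeled on the hook that catalog-aware objects are expected to provide, so check it against your Zope version.

```python
class ToyCatalog:
    """Hypothetical catalog that stores a snapshot of each object's author."""
    def __init__(self):
        self.author_by_id = {}

    def catalog_object(self, obj):
        self.author_by_id[obj.id] = obj.author

class ToyNewsItem:
    """Hypothetical catalog-aware object."""
    def __init__(self, id, author, catalog):
        self.id = id
        self.author = author
        self._catalog = catalog
        self.reindex_object()  # cataloged once, when added

    def reindex_object(self):
        # Assumed hook name; refreshes the catalog's snapshot of this object.
        self._catalog.catalog_object(self)

    def edit_without_reindex(self, author):
        self.author = author  # the catalog snapshot is now stale

    def edit(self, author):
        self.author = author
        self.reindex_object()  # keep the catalog in step with the object

catalog = ToyCatalog()
item = ToyNewsItem("KoalaGivesBirth", "Alice", catalog)

item.edit_without_reindex("Bob")
stale_author = catalog.author_by_id["KoalaGivesBirth"]

item.edit("Carol")
fresh_author = catalog.author_by_id["KoalaGivesBirth"]
```

The stale value after the first edit mirrors the behavior reported above, where a changed property does not show up in searches until the object is cataloged again.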

The News Item you added is the only object that is cataloged. As you add more News Items to your site, they will automatically get cataloged here. Add a few more items, and then experiment with searching the ZCatalog. For example, if you search for "Koala" you should get back the KoalaGivesBirth News Item.

At this point you may want to use some of the more advanced search forms that you created earlier in the chapter. You can see for example that as you add new News Items with new authors, the authors select list on the search form changes to include the new information.

Conclusion

The cataloging features of ZCatalog allow you to search your objects for certain attributes very quickly. This can be very useful for sites with lots of content that many people need to be able to search in an efficient manner.

Searching the ZCatalog works a lot like searching a relational database, except that the searching is more object-oriented. Not all data models are object-oriented, however, so in some cases you will want to use the ZCatalog, and in others a relational database. The next chapter goes into more detail about how Zope works with relational databases, and how you can use relational data as objects in Zope.
