You are not logged in Log in Join
You are here: Home » Members » rbickers » Cataloging SQL Data and Almost Anything Else

Log in
Name

Password

 

Cataloging SQL Data and Almost Anything Else

Overview

This How-To describes specifically how to get SQL data into a ZCatalog for searching, but the principles here can be used to put just about anything into the catalog.

Disclaimer: This information is based on my experience using ZCatalog with SQL and other external data. It is not based on deep knowledge of the inner workings of the catalog, so this may or may not be making use of the catalog as it was designed to be used.

ZCatalog Basics (Plus)

There are some basic things about ZCatalog that you should understand before trying this procedure. If are not already very familiar with how ZCatalog works, you should read Chapter 9 of the Zope Book before you continue.

To summarize what you need to know, a ZCatalog (hereinafter referred to as just the catalog) stores 2 main types of information for each object in the catalog: indexes and meta data. The indexes are used during the search to find objects based on search parameters, and the meta data is made available as the results of a matching object. For example, if I have an index PrincipiaSearchSource that contains the text of a document object, I can search document text by using that index. If I have a meta data title, I will be able to display the title of the object when it's returned in the list of search results.

The catalog stores information from an object by going through each defined index and meta data name. If the object either has a property or a method with the same name, that information is taken from the object and stored in the catalog. When searching on the catalog, a special result object is returned for each match, not the object that was cataloged (since it is never actually stored in the catalog). These are two very important things to understand and remember.

One discovery I made (maybe it's documented somewhere other than the source code, I don't know), is that the unique ID under which an object is cataloged is used to construct the URL returned by the result object's getURL() method. This comes in very handy when cataloging things that are not in the object database. You can catalog an object that contains the appropriate index and meta data information, but use your own uid to construct the URL you want used when the object shows up in the search results. This should make more sense as you follow the examples below.

Setting Up the Catalog

The first step in any search is to set up the catalog. There is no difference in setting up a catalog for "normal" searches and setting one up for a search of SQL data, so I'm not going to go into detail here on how to do that. Just make sure you have created a catalog with the desired indexes and meta data.

Cataloging SQL Data

There are two main steps in cataloging the SQL data: 1) setup a ZSQL Method that returns the fields you wish to store in indexes and/or meta data, and 2) create a script that calls catalog_object() on each record returned by the ZSQL Method. These two steps are described in detail below.

Creating a ZSQL Method

Before you can catalog SQL data, you must be able to tell the catalog what data you want it to store. You do this by creating a ZSQL Method that returns records whose field names correspond to the indexes and/or meta data names in the catalog.

For example, the SQL table Books contains the following fields:

  • SQL Fields: Number, Title, Author, Description, Price

The catalog contains the following indexes and meta data:

  • Indexes: PrincipiaSearchSource, bobobase_modification_time, id, meta_type, title
  • Meta data: bobobase_modification_time, id, meta_type, summary, title

If you want the PrincipiaSearchSource index to contain the Description, Title, and Author SQL fields, the summary meta data to contain the Description, and the book Number to be the id, you could use the following ZSQL Method (named getBooksToCatalog):

        SELECT Number, Title,
          concat(Description, Title, Author) as PrincipiaSearchSource,
          'Book' as meta_type, Number as id, Description as summary
        FROM Books

You now have a ZSQL Method that will return a list of all of the books using field names that match the indexes and meta data of the catalog. I use Book as the meta_type so that I can easily know in the search results that the object refers to a book. It's a nice feature for me, but it isn't necessary for this to work. This illustrates the point that you can provide static information to be stored in the catalog by including it in your query with the desired meta data name.

Creating a Script to Catalog the Objects

Now that you have a list of records you want to store, you need to create a script that will iterate over the list and add the information to the catalog. This is easily done with a Python Script. This script is created in the catalog object itself. If you place it anywhere else (it really doesn't matter where), you will need to modify it so that it can find the catalog. Since this script is in the catalog, you can just use the bound variable container to access it. The following script (named catalogBooks) will do the job::

for book in container.getBooksToCatalog(): container.catalog_object(book, '/Publications/getBook/+book.Number+/bookdetails.html') print Book # + book.Number return printed

This iterates over the records returned by the ZSQL Method created above, and calls catalog_object() on each. This is where most of the magic happens. Each book number that is cataloged will be printed, one per line...nothing fancy. You don't need the print statements, but it gives some reassuring feedback.

catalog_object() takes two parameters: 1) the object to be cataloged, and 2) a unique ID (uid). The object is a record from the ZSQL Method, and as mentioned above, the uid is used to form the URL that will be returned by the result object's getURL() method. The use of the uid will be explained in detail in the next section.

All you need to do now is execute the script and the data will be put in the catalog. You can run the script either through the Test tab in the management interface, or by entering its URL in your browser. Either way will work the same. You can browse through the catalog to verify that the indexes and meta data have been filled in as expected.

The Result Object's URL

A catalog search returns a list of matching result records. As with creating the catalog, there is no difference in how you search a catalog with SQL data in it. The difference is in the result object's URL. The URL as returned by getURL() does not reference an actual object in the object database. Instead, the uid used to catalog the SQL data will cause getURL() to return URLs like http://wherever/Publications/getBook/00123/bookdetails.html. This URL refers to a ZSQL Method getBook and a DTML Method bookdetails.html, that when used together will display the information on that book. This is actually a feature of Zope and ZSQL Methods. It has nothing to do with the catalog, but it's included here to illustrate the use of the uid.

The folder Publications contains the methods for displaying the information on a book:

  • getBook has the single argument number and has the advanced setting Allow "simple" direct traversal checked. The SQL query is as follows:
    
    

SELECT * FROM Books WHERE

  • bookdetails.html has dtml-var statements that display the fields of the result. For example (a very plain example):
            <dtml-var name="standard_html_header">
            <TABLE><TR>
            <TD>Book Number</TD><TD><dtml-var Number></TD>
            <TD>Description</TD><TD><dtml-var Description></TD>
            </TR></TABLE>
            <dtml-var name="standard_html_footer">
    

If you provide links using getURL() on the results page of a search, you can use whatever uid suits you for creating URLs that will displaying the SQL data.

Cataloging Just About Anything

Using a process similar to that above, you can catalog just about anything. As demonstrated above, it doesn't actually have to be an object in the Zope object database. It's a virtual object. If the object you pass to catalog_object() has properties or methods with the same name as the indexes and/or meta data of your catalog, you can add it to the catalog. Note that it is not necessary to provide all index and meta data information in object to be cataloged. If an object doesn't have a particular property or method with the same name as an index/meta data, it is ignored and left blank in the catalog. The key to allowing the user to view the virtual object is in the uid used when cataloging the information. You just need to construct a uid (and thus a valid URL) that will display the object.

If this doesn't make sense, the following example will hopefully help.

Cataloging Virtual Objects Example

What Is Already There

In this example, there are a set of articles in PDF format that can be viewed online. Each article is listed in the SQL table Articles with the following fields:

  • SQL Fields: Number, Title, Author, Description, Filename

The first four fields should be self explanatory. The Filename field contains the filename of the PDF document associated with the article. The PDF documents are stored on the file system in a directory accessible via a URL such as http://wherever/pdfdocs/filename.pdf. They are not stored in the Zope object database.

The site contains the ZSQL Method getArticle and the DTML Method articledetails.html that show the article information stored in the SQL table (similar to what was used in the book example above). This "preview" of the article also includes a link to the PDF document so that it can be downloaded or viewed through a browser plug-in.

getArticle is as follows:

        SELECT * FROM Articles WHERE <dtml-sqltest number column="Number" type=nb>

A very plain version of articledetails.html might is as follows:

        <dtml-var name="standard_html_header">
        <TABLE><TR>
        <TD>Article Number</TD><TD><dtml-var Number></TD>
        <TD>Description</TD><TD><dtml-var Description></TD>
        <TD>PDF</TD><TD><A HREF="/pdfdocs/&dtml-Filename;">View/Download</A></TD>
        </TR></TABLE>
        <dtml-var name="standard_html_footer">

Setting Up the Search

The goal is to be able to search the text of the article, but have the results direct the user to the article display page above where they can then view the PDF document.

First, create a catalog that contains the following indexes and meta data:

  • Indexes: PrincipiaSearchSource, bobobase_modification_time, id, meta_type, title
  • Meta data: bobobase_modification_time, id, meta_type, summary, title

To get a list of the articles to catalog, start with the following ZSQL Method 'getArticlesToCatalog':

        SELECT Number, Title, 'Article' as meta_type, Filename
        FROM Books

This gives you a list of the articles, but in order to search the text of the PDF document, the text needs to be stored in the PrincipiaSearchSource index. Since the document source is not in the database, you can't just return it using the ZSQL Method as you did with the book descriptions above.

This where it gets exciting. You need to get the records from the ZSQL Method results to return the text of the PDF document as the PrincipiaSearchSource. To do this, you need to use an advanced feature of ZSQL Methods to assign the record to a class. But first you must create the class.

Creating the ZSQL Record Class

Since Zope cannot read PDF documents, and I don't know of any Python modules that will do it, you must first convert the PDF documents into corresponding text files so that Zope can read the article text. There are a number of tools you can use to do this very quickly and with little effort.

Next, create a file sqlrecord.py in the Extensions directory of your Zope installation that contains the following source code::

from string import split

class Article: """Class used by ZSQL for indexing articles"""

def id(self): """Use article file as id"""

return split(self.Filename, .)[0]

def PrincipiaSearchSource(self): """Read article text"""

# Get the .txt version of the .pdf filename basename = split(self.Filename, '.)[0] filename = /path-to-docs/pdfdocs/+basename+.txt'

try: fp = open(filename, r) except IOError: return ''

text = fp.read() fp.close() return text

You will need to modify the path information to point to the directory where the converted .txt files are.

Now go to the Advanced tab of the getArticlesToCatalog ZSQL Method and enter Article as the Class Name and sqlrecord as the Class File. The Article class will now provide the methods id() and PrincipiaSearchSource() that can be used by the ZSQL Method to provide information to the catalog.

The final step is to create the Python Script the iterates over each ZSQL record and passes it to catalog_object(). The following script (named catalogArticles) will do the job. As with the book example, if you do not place this script in the catalog object itself, you must modify container to be your catalog:

        for article in container.getArticlesToCatalog():
            container.catalog_object(article,
              '/Publications/getArticle/'+article.Number+'/articledetails.html')
            print 'Article #' + article.Number
        return printed

When you run this script, Zope will use getArticlesToCatalog, along with the associated class methods, to store the article information and searchable text in the catalog. As you can see from the uid in the catalogArticles script, the search result object will provide URLs that point to the "preview" of the article described above, where the user can then select to view the PDF article. You could also provide a uid that points directly to the PDF file if you wanted to skip the preview.

Summary

You can see by example that the possibilities of what can be stored in the catalog are great. Although both examples used SQL data, there's no reason you can't use an object that gets the index and meta data information from the Zope object database, other external files, remote data on other servers, or any combination of the above. If you can create an object that gathers the appropriate information, you can stick it in the catalog.

Questions/Comments

I hope this How-To has been helpful. If you have any questions, comments, or information to improve this process, please let me know.