Cataloging SQL Data and Almost Anything Else

Overview

This How-To describes specifically how to get SQL data into a ZCatalog for searching, but the principles here can be used to put just about anything into the catalog.

Disclaimer: This information is based on my experience using ZCatalog with SQL and other external data. It is not based on deep knowledge of the inner workings of the catalog, so this may or may not be making use of the catalog as it was designed to be used.

ZCatalog Basics (Plus)

There are some basic things about ZCatalog that you should understand before trying this procedure. If you are not already very familiar with how ZCatalog works, you should read Chapter 9 of the Zope Book before you continue.

To summarize what you need to know, a ZCatalog (hereinafter referred to as just the catalog) stores two main types of information for each object in the catalog: indexes and meta data. The indexes are used during the search to find objects based on search parameters, and the meta data is made available as the results of a matching object. For example, if I have an index

The catalog stores information from an object by going through each defined index and meta data name. If the object has either a property or a method with the same name, that information is taken from the object and stored in the catalog. When searching the catalog, a special result object is returned for each match, not the object that was cataloged (since it is never actually stored in the catalog). These are two very important things to understand and remember.

One discovery I made (maybe it's documented somewhere other than the source code, I don't know) is that the unique ID under which an object is cataloged is used to construct the URL returned by the result object's

Setting Up the Catalog

The first step in any search is to set up the catalog.
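When picking index and meta data names, it can help to picture how the catalog collects a value for each name from an object. The following is only a rough, standalone illustration of the property-or-method lookup described under ZCatalog Basics. It is not ZCatalog's actual code, and `collect_values` and the `Book` class are hypothetical names invented for this sketch.

```python
# Simplified illustration only; this is NOT how ZCatalog is implemented.
def collect_values(obj, names):
    """Gather a value for each index/meta data name found on obj."""
    values = {}
    for name in names:
        attr = getattr(obj, name, None)
        if attr is None:
            continue  # the object supplies nothing for this name
        # A method is called to get its value; a plain property is used as-is.
        values[name] = attr() if callable(attr) else attr
    return values


class Book:
    """Hypothetical stand-in for a record returned by a ZSQL Method."""
    Title = "Example Title"   # plain property
    meta_type = "Book"        # plain property

    def summary(self):        # method sharing its name with a meta data column
        return "A short description"


result = collect_values(Book(), ["Title", "meta_type", "summary", "Author"])
```

In this sketch, a name the object does not supply (`Author` here) is simply left out of the collected values.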
There is no difference between setting up a catalog for "normal" searches and setting one up for a search of SQL data, so I'm not going to go into detail here on how to do that. Just make sure you have created a catalog with the desired indexes and meta data.

Cataloging SQL Data

There are two main steps in cataloging the SQL data: 1) set up a ZSQL Method that returns the fields you wish to store in indexes and/or meta data, and 2) create a script that calls

Creating a ZSQL Method

Before you can catalog SQL data, you must be able to tell the catalog what data you want it to store. You do this by creating a ZSQL Method that returns records whose field names correspond to the indexes and/or meta data names in the catalog. For example, the SQL table
The catalog contains the following indexes and meta data:
If you want the

    SELECT Number, Title,
           concat(Description, Title, Author) as PrincipiaSearchSource,
           'Book' as meta_type, Number as id, Description as summary
    FROM Books

You now have a ZSQL Method that will return a list of all of the books using field names that match the indexes and meta data of the catalog. I use

Creating a Script to Catalog the Objects

Now that you have a list of records you want to store, you need to create a script that will iterate over the list and add the information to the catalog. This is easily done with a Python Script. This script is created in the catalog object itself. If you place it anywhere else (it really doesn't matter where), you will need to modify it so that it can find the catalog. Since this script is in the catalog, you can just use the bound variable

    for book in container.getBooksToCatalog():
        container.catalog_object(book,
            '/Publications/getBook/

This iterates over the records returned by the ZSQL Method created above, and calls

All you need to do now is execute the script and the data will be put in the catalog. You can run the script either through the

The Result Object's URL

A catalog search returns a list of matching result records. As with creating the catalog, there is no difference in how you search a catalog with SQL data in it. The difference is in the result object's URL. The URL as returned by

The folder
SELECT * FROM Books WHERE
If you provide links using

Cataloging Just About Anything

Using a process similar to that above, you can catalog just about anything. As demonstrated above, it doesn't actually have to be an object in the Zope object database. It's a virtual object. If the object you pass to

If this doesn't make sense, the following example will hopefully help.

Cataloging Virtual Objects Example

What Is Already There

In this example, there is a set of articles in PDF format that can be viewed online. Each article is listed in the SQL table
The first four fields should be self-explanatory. The

The site contains the ZSQL Method

    SELECT * FROM Articles WHERE
    <dtml-sqltest number column="Number" type=nb>

A very plain version of

    <dtml-var name="standard_html_header">
    <TABLE><TR>
    <TD>Article Number</TD><TD><dtml-var Number></TD>
    <TD>Description</TD><TD><dtml-var Description></TD>
    <TD>PDF</TD><TD><A HREF="/pdfdocs/&dtml-Filename;">View/Download</A></TD>
    </TR></TABLE>
    <dtml-var name="standard_html_footer">

Setting Up the Search

The goal is to be able to search the text of the article, but have the results direct the user to the article display page above, where they can then view the PDF document. First, create a catalog that contains the following indexes and meta data:
To get a list of the articles to catalog, start with the following ZSQL Method 'getArticlesToCatalog':

    SELECT Number, Title, 'Article' as meta_type, Filename
    FROM Articles

This gives you a list of the articles, but in order to search the text of the PDF document, the text needs to be stored in the

This is where it gets exciting. You need to get the records from the ZSQL Method results to return the text of the PDF document as the

Creating the ZSQL Record Class

Since Zope cannot read PDF documents, and I don't know of any Python modules that will do it, you must first convert the PDF documents into corresponding text files so that Zope can read the article text. There are a number of tools you can use to do this very quickly and with little effort.

Next, create a file

    from string import split

    class Article:
        """Class used by ZSQL for indexing articles"""

        def id(self):
            """Use article file as id"""
            return split(self.Filename,

        def PrincipiaSearchSource(self):
            """Read article text"""
            # Get the .txt version of the .pdf filename
            basename = split(self.Filename, '.
            try:
                fp = open(filename,
                text = fp.read()
                fp.close()
                return text

You will need to modify the path information to point to the directory where the converted .txt files are. Now go to the

The final step is to create the Python Script that iterates over each ZSQL record and passes it to

    for article in container.getArticlesToCatalog():
        container.catalog_object(article,
            '/Publications/getArticle/'+article.Number+'/articledetails.html')
        print 'Article #' + article.Number
    return printed

When you run this script, Zope will use

Summary

You can see by example that the possibilities of what can be stored in the catalog are great. Although both examples used SQL data, there's no reason you can't use an object that gets the index and meta data information from the Zope object database, other external files, remote data on other servers, or any combination of the above. If you can create an object that gathers the appropriate information, you can stick it in the catalog.

Questions/Comments

I hope this How-To has been helpful. If you have any questions, comments, or information to improve this process, please let me know.
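As a closing illustration of the "object that gathers the appropriate information" idea, here is a minimal standalone sketch of such an object. It assumes that each PDF has a converted .txt file with the same base name in a single directory. The class name `VirtualArticle`, the `text_dir` parameter, and the choice to return an empty string when the text file is missing are all assumptions made for this sketch, not part of the original record class.

```python
import os


class VirtualArticle:
    """Hypothetical object exposing the names a catalog might look up."""

    def __init__(self, number, filename, text_dir):
        self.Number = number          # article number, kept as a string here
        self.Filename = filename      # name of the PDF file
        self.text_dir = text_dir      # assumed directory of converted .txt files
        self.meta_type = "Article"

    def id(self):
        """Use the PDF file name without its extension as the id."""
        return os.path.splitext(self.Filename)[0]

    def PrincipiaSearchSource(self):
        """Return the converted article text, or '' if it cannot be read."""
        path = os.path.join(self.text_dir, self.id() + ".txt")
        try:
            with open(path) as fp:
                return fp.read()
        except IOError:
            # Assumption: an unreadable text file yields no searchable text.
            return ""
```

An object like this needs no database record behind it: anything that can answer the index and meta data names by property or method could be handed to the catalog in the same way as the ZSQL records above.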