CMF News Feed

The CMF News Feed is an application that allows you to import news headlines from other sites directly into your site.

For example, combine headlines from Zope.org, Slashdot, The Washington Post, and other sites into one set of news headlines on your private web site. Alternatively, monitor industry news sites and display the headlines on your company intranet.

Disclaimer

This application makes use of public web news feeds. The author of this program strongly urges users to not abuse the services provided by these feeds.

Do not configure your news feed to run more often than once every 15-30 minutes.

Requirements

The news feed operates directly on the database for your CMF-based Zope portal. You need to be running a CMF portal on a Zope server with a ZEO backend to allow the agent to connect and insert content into the database.

The RDF/RSS content used by the news feed is XML based, so an XML parser is required. The standard Python XML-SIG package PyXML, available from python.org, is supported.
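To illustrate why a parser is needed, here is a minimal sketch that pulls headline titles and links out of an RSS document. It is written for modern Python 3 and uses the standard library's `xml.dom.minidom` rather than PyXML; the helper names `extract_headlines` and `_text_of` are hypothetical and this is not the news feed's own parsing code.

```python
from xml.dom import minidom


def _text_of(parent, tag):
    """Return the text of the first <tag> element under parent, or ''."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return ""
    return "".join(
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    ).strip()


def extract_headlines(rss_text):
    """Hypothetical helper: return (title, link) pairs for every <item>."""
    doc = minidom.parseString(rss_text)
    return [
        (_text_of(item, "title"), _text_of(item, "link"))
        for item in doc.getElementsByTagName("item")
    ]


sample = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

print(extract_headlines(sample))
```

Because only `<item>` elements are searched, the channel's own `<title>` is not mistaken for a headline.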

Since all of that software, including the agent, is written in Python, you also need to have Python installed. Versions 2.1.1+ are supported.

Operation

The news feed interprets RSS/RDF feeds from sources you define and creates "CMF News Items" in your portal. To do this, it needs to connect to the object database and assume the identity of a portal member. The portal member to be used can be specified in the configuration file (see below), but it is best to create a new user, such as newsfeed, for this purpose. The name does not matter, but the user should not be used for other purposes.

When the news feed connects to the database, it looks through the member's home folder in the portal. It examines each Folder object it finds to determine whether it contains a Link object named RDF. If the RDF link is found, its URL is taken as a source for news. The feed is read, and stories which have not been seen before are converted to News Story objects and stored in a folder named for the current date.
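The discovery and filtering steps just described can be sketched with plain Python 3 data standing in for portal objects. Everything here is hypothetical: folders are modelled as dicts mapping object ids to objects, the `Link` class only mimics a CMF Link's URL, "seen" stories are tracked by link URL, and the YYYY-MM-DD folder id format is an assumption. The real agent works on CMF objects in the ZODB.

```python
import datetime
from dataclasses import dataclass


@dataclass
class Link:
    """Stand-in for a CMF Link object (hypothetical)."""
    url: str


def find_feed_sources(home_folder):
    """Return {folder_id: feed_url} for each sub-folder holding a Link named 'RDF'."""
    sources = {}
    for folder_id, obj in home_folder.items():
        if isinstance(obj, dict):  # nested dicts play the role of Folder objects
            rdf = obj.get("RDF")
            if isinstance(rdf, Link):
                sources[folder_id] = rdf.url
    return sources


def new_stories(headlines, seen_links):
    """Keep only (title, link) pairs whose link has not been stored before."""
    return [(title, link) for title, link in headlines if link not in seen_links]


def date_folder_id(today=None):
    """Id of the per-day folder; the ISO date format is an assumption."""
    if today is None:
        today = datetime.date.today()
    return today.isoformat()


home = {
    "zope": {"RDF": Link("http://www.zope.org/SiteIndex/news.rss")},
    "notes": {"readme": "not a feed"},
}
print(find_feed_sources(home))
print(new_stories([("A", "http://x/1"), ("B", "http://x/2")], {"http://x/1"}))
print(date_folder_id(datetime.date(2002, 3, 14)))
```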

Installation

Create an Identity for the News Feed

First, create the user to be used by the news feed program. For the purposes of this document, the name newsfeed will be used. If the news feed is to be allowed to publish documents directly, give the new user the Reviewer role.

NOTE
Currently only the default workflow is supported. This should not be difficult to change, if someone wants to contribute a patch.
NOTE
Currently the user must have the Reviewer role. An enhancement to make this requirement optional has been submitted by Tres Seaver, but I have not integrated it yet.

Install and Configure the News Feed Software

Once the user is created, the next step is to install the news feed software. The news feed can run on any computer system which has network access to the ZEO server for the portal. For the purposes of these instructions, we will assume that the feed is going to run on the same computer as the ZEO server.

  1. Log in to the server as the user that owns the Zope installation. Change directory to one level above the INSTANCE_HOME for the Zope server. For the purposes of these instructions, we will assume that the two are separate and that your INSTANCE_HOME is in a directory called InstanceHome.
  2. Extract the tarball containing the CMF News Feed source files to create a new directory called CMFNewsFeed-X where X is a version number that will depend on which version of the package you have downloaded. Change directory into the CMFNewsFeed-X directory, hereafter referred to as the FEED_HOME.
  3. In order for the feed program to connect to the ZEO server, you must tell it where to find the server process. Two files are needed: custom_zodb.py tells the Zope libraries how to connect to the database, and zope.conf tells custom_zodb.py where that database is on the network. It is normally sufficient to copy both files from the INSTANCE_HOME of your server.

    For example:

            % cp ../InstanceHome/custom_zodb.py .
            % cp ../InstanceHome/zope.conf .
    

  4. As the feed connects to the ZEO server like any other ZEO client, it will have access to the data stored within. It will not, however, know how to interpret that data until all of the Zope software and associated Products are made available to it. The zope.conf file from the previous step sets things up so the agent can access the Zope core software. Products which are installed outside of your INSTANCE_HOME should also be available as a result of copying this file over.

    Products installed within the INSTANCE_HOME, however, will not be. If there are any product directories within your INSTANCE_HOME, it is important to make them visible in the FEED_HOME. On a system which supports it, symbolic links are the easiest way to do this.

    For example:

             % ln -s ../InstanceHome/Products .
    

Configuring the Feed Software

Now all that remains is to configure the news feed so that it knows its identity within the portal, and set up the search settings to be monitored.

  1. The news feed configuration is read from the file getnews.conf in the FEED_HOME directory. For a new installation, create a fresh version of this file by copying getnews.conf.in. For an upgrade, merge the changes in the new getnews.conf.in into the getnews.conf from your existing FEED_HOME to create a new file.
  2. The format of the configuration file is standard Python. The file is read into memory and executed in a protected sandbox when the news feed starts up. Required values are extracted from the resulting namespace, and other values are ignored. This means you can litter the configuration file with whatever you might need in order to build up the configuration values.
  3. Edit the configuration file to set the member_name and portal_name properties. The value member_name should be set to the name of the portal user created earlier. The portal_name should refer to the dotted path from the root of the ZODB to the CMF Portal.

    For example, if you reach the portal through the URL:

             http://www.madeupname.com/path/to/Portal
    

    then the portal_name would be:

             portal_name = 'path.to.Portal'
    
    NOTE: If you use SiteAccess, you must provide the full path from the true root of the database, including any folders that contain SiteRoot objects.
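To make these configuration steps concrete, here is a sketch in Python 3 of how a Python-syntax file like getnews.conf can be executed into a namespace with its required values pulled out, plus a helper that derives portal_name from a portal URL. The loader, the helper names, and the sample values are hypothetical, and getnews.py's actual loading code may differ. Removing builtins, as done here, only illustrates the idea; it is not a real security sandbox.

```python
from urllib.parse import urlparse

REQUIRED = ("member_name", "portal_name")

SAMPLE_CONF = """
# Anything may appear here; only the required names are extracted.
site_path = ['path', 'to', 'Portal']
member_name = 'newsfeed'
portal_name = '.'.join(site_path)
"""


def load_config(source):
    """Execute config source in a fresh namespace and return the required values."""
    namespace = {"__builtins__": {}}  # illustration only, NOT a security sandbox
    exec(source, namespace)
    missing = [name for name in REQUIRED if name not in namespace]
    if missing:
        raise KeyError("missing configuration values: %s" % ", ".join(missing))
    return {name: namespace[name] for name in REQUIRED}


def portal_name_from_url(url):
    """Turn http://host/path/to/Portal into the dotted path 'path.to.Portal'."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return ".".join(parts)


print(load_config(SAMPLE_CONF))
print(portal_name_from_url("http://www.madeupname.com/path/to/Portal"))
```

The helper variable site_path shows how extra names in the file can be used to build a required value and are then ignored. With SiteAccess virtual hosting, the visible URL path may omit folders, so the dotted path derived from the URL would need those folders added, as the note above says.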
    

Dirty Feeds

Some sites that claim to deliver RSS/RDF XML feeds actually deliver something that is not valid XML. For that reason, you can tell the news feed to translate some strings before parsing the XML. This is done through the fixup_translation_table in the configuration file. Some common problems are handled by the table in the default getnews.conf.in file. New values can be added by appending tuples containing the "search for" text and the "replace with" text to the list.
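A fixup table of this shape can be applied with plain string replacement before parsing. The sketch below is Python 3; the two table entries are hypothetical examples (an HTML `&nbsp;` entity, which XML does not predefine, and a bare ampersand between spaces), not the defaults shipped in getnews.conf.in, and `apply_fixups` is a hypothetical helper name.

```python
from xml.dom import minidom

# Hypothetical entries: ("search for", "replace with") pairs, applied in order.
fixup_translation_table = [
    ("&nbsp;", "&#160;"),  # HTML entity that XML does not predefine
    (" & ", " &amp; "),    # bare ampersand surrounded by spaces
]


def apply_fixups(raw, table):
    """Apply each (search, replace) pair in order to the raw feed text."""
    for search, replace in table:
        raw = raw.replace(search, replace)
    return raw


dirty = ("<rss><channel><item><title>Q&nbsp;&amp;&nbsp;A: Tom & Jerry"
         "</title></item></channel></rss>")
clean = apply_fixups(dirty, fixup_translation_table)

# The cleaned text now parses; the dirty text would raise an ExpatError.
title = minidom.parseString(clean).getElementsByTagName("title")[0]
title_text = "".join(node.data for node in title.childNodes)
print(repr(title_text))
```

Order matters: each replacement sees the output of the previous one, so entries that produce `&` must not be matched by later entries.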

Configuring the Feed Sources

And now the software is ready to use! All that remains is to set up the feed sources in the portal. That configuration is done in the portal as the user created earlier. Log in to the portal through your web browser as the user specified as member_name above, then follow these instructions for each feed source.

  1. Create a folder to contain the results of the feed.
  2. Set the id of the folder to a meaningful name.
  3. Within the folder, create a Link object named RDF. Set the URL for the new object to the RSS/RDF feed for the news source.

    For example, to import Zope.org headlines, use:

             http://www.zope.org/SiteIndex/news.rss
    

Running the News Feed

To run the news feed, change directory to the FEED_HOME and run the program getnews.py. The configuration data is detected from the location of the software, and the feed sources are pulled out of the portal. As the feed runs, new articles will be added to the portal and automatically indexed. This makes it easy for you to set up a cron job to run every 15-30 minutes (not more frequently, please) to update the portal contents.
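If you want a safety net against running the feed too often (for example, a cron entry mistyped to fire every minute), a small guard can refuse to run when the previous run was under 15 minutes ago. This is a hypothetical Python 3 wrapper, not part of getnews.py; the stamp-file approach and the `should_run` name are assumptions, and the 15-minute floor comes from the disclaimer above.

```python
import os
import tempfile
import time

MIN_INTERVAL = 15 * 60  # seconds; lower bound recommended in the disclaimer


def should_run(stamp_path, now=None, min_interval=MIN_INTERVAL):
    """Return True and record the run if enough time has passed since the last one."""
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(stamp_path)
    except OSError:
        last = None  # no stamp file yet: treat as the first run
    if last is not None and now - last < min_interval:
        return False
    with open(stamp_path, "a"):
        pass  # create the stamp file if it does not exist
    os.utime(stamp_path, (now, now))  # record this run's time as the mtime
    return True


with tempfile.TemporaryDirectory() as d:
    stamp = os.path.join(d, "last-run")
    print(should_run(stamp, now=1000.0))  # first run
    print(should_run(stamp, now=1600.0))  # 10 minutes later: refused
    print(should_run(stamp, now=2000.0))  # about 16.7 minutes later: allowed
```

A wrapper script could call should_run before starting getnews.py and exit quietly when it returns False, so a misconfigured schedule cannot hammer the feed providers.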

Additional command line arguments, mostly to control the amount of progress information printed during operation, are available. Use -h to get a list.

Credits

The CMF News Feed makes use of slashbox.py, Copyright (c) 2000 Richard Offer. See the source file for licensing and author information.