README for Webalizer-CT

This fork of Webalizer was created in response to a problem with access log analysis that came up while using Zope.

Why Would You Care?

This may be useful if you're running Zope behind Apache and would like to see the actual number of page views for your website. If you have the time or inclination, you could go through your Zope site and rename objects so that they all have extensions, but that's not always feasible.

What's The Problem?

Zope doesn't require you to use specific extensions in the IDs of Zope objects. This can be a good thing, since it keeps technical details out of your URLs. Unfortunately, many log analysis programs (such as Webalizer, Analog, and AWStats) use file extensions in URLs to distinguish "pages" or "page views" from other hits. (Note that there's no specific dependency on Zope in this version of Webalizer. Non-Zopers may also find it useful if they have similar issues with file extensions.)

And The Solution?

It occurred to me that it would be nice if an analyzer used the response Content-Type to determine whether a request resulted in a "page" or not.

It turns out that Apache's LogFormat directive makes it very easy to add the response Content-Type header value to the access logs. This is convenient, since a very common practice for Zope virtual hosting is to run Apache in front of Zope and use RewriteRules and the Zope "Virtual Host Monster". (If you run Zope standalone, it may be possible to configure Zope, or in the worst case monkey-patch it, to log the Content-Type in its Z2.log. I haven't investigated this yet. If anyone figures out how to do this, let me know so I can document it here.)

What To Do:

To add the Content-Type to the end of your "Combined Log Format" access logs, edit your Apache configuration file (httpd.conf, which is often located at /etc/httpd/conf/httpd.conf on Red Hat Linux systems). Find the line that defines the combined log format. It looks like this:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

We want to add the Content-Type to the end of the line. So change it to look like this:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{Content-Type}o" combined

Then restart Apache.
From this point on, your access logs should contain the response MIME type at the end of each line. (Access log records written before this change will, of course, not have the MIME type, so this version of Webalizer will treat those records as non-pages.)

The next step is to build and install this version of Webalizer. The build and install process is the same as for the standard Webalizer. In brief:

  1. ./configure --enable-dns --with-dblib=/usr/lib

    If you want to install the software to a location other than /usr/local use the --prefix configure option.
    If you want to set the default location of the config file to a location other than /etc use the --with-etcdir configure option.
    For example:

    ./configure --enable-dns --with-dblib=/usr/lib --prefix=/opt --with-etcdir=/opt/etc
  2. make
  3. make install

Please refer to the INSTALL file included in this tarball, and consult the FAQ at the Webalizer site if you need further assistance.
Also note that if Webalizer was already installed as a package on your system, you may want to uninstall that package before installing this version.

This version adds a couple of extra options to the webalizer.conf file. Please read the Webalizer docs for help on the standard options; here I discuss only the options unique to this version.

You'll want to set LogType to eclf ("Enhanced Combined Log Format"):

LogType eclf

And you'll want to make sure that the new PageContentType option includes the MIME types that you want Webalizer to consider "pages":

PageContentType text/html

You can specify multiple PageContentType options like so:

PageContentType text/html
PageContentType text/wml
PageContentType text/xml

One other difference in this version of Webalizer is that, in addition to reporting the Top URLs, it also reports the Top Pages! You don't have to configure anything to enable this extra functionality. Also, if you enable AllURLs and/or DumpURLs to create HTML and/or text files of all URL hits, Webalizer will also create files containing all page hits. These files will be in the output directory, and their filenames will start with "page_" instead of "url_".