Using PostgreSQL's aggregate functions to speed up loops

Created by pupq. Last modified 2003-10-02 03:18:34.

PostgreSQL's aggregate functions can be used to significantly speed up loops that display items.

Note for PostgreSQL users: This tutorial is about using PostgreSQL and Zope together, but the core idea here can be used in any scripting or reporting language. Instead of Zope's looping command (<dtml-in>), you might be using Perl, Python, PHP, or even (shudder) Visual Basic. The message here is: you can handle a large list as a single result value. It doesn't matter what scripting language you're writing in! :-)

Looping over sequences returned by ZSQL methods with <dtml-in> is a great technique. It lets you handle complicated cases, such as alternating row colors, batching, etc.

Much of the time, though, you're probably doing something simple, like this:

  <ul>
    <dtml-in goals>
      <li><dtml-var goal></li>
    </dtml-in>
  </ul>

which is fine for straightforward lists.

Imagine, though, that you were listing goals not just for yourself, but for all staff at your company. To handle this, you might use nested ZSQL <dtml-in> clauses:

  <dtml-in staff>
    <h2><dtml-var staffname></h2>
    <dtml-in "goals({ 'staffname': staffname})">
      <dtml-var goal>
    </dtml-in>
  </dtml-in>

This works great, except that for every single staff member, you're running a new query to find their goals. Imagine that you have 100 staff members, and each has, on average, five goals. Instead of one single query, you're running one outer query that finds 100 records, plus 100 inner queries that find 500 records in total. That's 101 queries, and it can become fairly sluggish.

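Now suppose you also wanted to show progress against each goal (about 5 progress points per goal). That adds a third level of nesting: 1 query for the staff list, 100 queries for goals, and 500 queries (one per goal) for progress. You'd be running 601 queries to fetch 100 x 5 x 5 = 2,500 progress records, and it's not even a terribly complicated page!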
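To make the query arithmetic concrete, here is a small Python sketch that counts the queries issued by the nested approach. It makes some assumptions:

- `staff()`, `goals()`, and `progress()` are hypothetical stand-ins for ZSQL methods.
- Each function only bumps a counter and returns fake rows.
- The sizes are the ones assumed above: 100 staff, 5 goals each, and 5 progress points per goal.

```python
# Hypothetical model of the nested-loop page: each function stands in for a
# ZSQL method and bumps a counter every time it "runs a query".
query_count = 0


def run_query(rows):
    """Pretend to send one query to the database and return its rows."""
    global query_count
    query_count += 1
    return rows


def staff():
    return run_query([f"staff{i}" for i in range(100)])            # 100 staff members


def goals(staffname):
    return run_query([f"{staffname}/goal{j}" for j in range(5)])   # 5 goals each


def progress(goal):
    return run_query([f"{goal}/point{k}" for k in range(5)])       # 5 points per goal


def nested_page(show_progress):
    """Walk the page the nested-<dtml-in> way; return (queries, records shown)."""
    global query_count
    query_count = 0
    records = 0
    for name in staff():
        for goal in goals(name):
            if show_progress:
                records += len(progress(goal))   # one extra query per goal
            else:
                records += 1
    return query_count, records


print(nested_page(show_progress=False))  # (101, 500)
print(nested_page(show_progress=True))   # (601, 2500)
```

The query count grows with the number of *outer* rows at each level, not with the number of innermost records. That is why flattening the query pays off so quickly.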

Often, you need a solution that lets you handle this without so many nested queries.

Solution One: Use first- Variables

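One good solution is to use the sometimes-ignored first- variables of the <dtml-in> tag. A variable such as first-staffname is true whenever the current item's staffname differs from the previous item's, that is, at the first item of each run of identical values. Its partner, last-staffname, is true at the last item of each run. Because these variables compare neighboring items, the sequence must be sorted on the grouping column.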

For example:

  <dtml-in staff_and_goals>

    <dtml-if first-staffname>
      <h2><dtml-var staffname></h2>
      <ul>
    </dtml-if>

    <li><dtml-var goal></li>

    <dtml-if last-staffname>
      </ul>
    </dtml-if>

  </dtml-in>

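This loops through a single flat query, staff_and_goals, that returns all goals for all staff (sorted by staff member, so that each person's goals are adjacent). Every time the staff member name changes, it outputs a new header and opens a new list. It outputs every goal as a list item, and after each staff member's last goal, it closes the list.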

(Of course, in a real example, you'd want to group on something guaranteed to be unique, such as a staffid column. Otherwise, two staff members named John Smith would show up just once, with their goals combined into a single list.)
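Outside of DTML, the same single-pass grouping is easy to reproduce. This Python sketch mimics what the template above does with first- and last- variables. It makes some assumptions:

- The rows are a hypothetical flat staff_and_goals result.
- The rows are already sorted by a hypothetical `staffid` column.
- "First" means the key differs from the previous row, and "last" means the key differs from the next row.

```python
from html import escape


def render(rows, key="staffid"):
    """Mimic the DTML template: first-<key> opens a group, last-<key> closes it."""
    out = []
    for i, row in enumerate(rows):
        first = i == 0 or rows[i - 1][key] != row[key]
        last = i == len(rows) - 1 or rows[i + 1][key] != row[key]
        if first:                                   # like <dtml-if first-staffid>
            out.append(f"<h2>{escape(row['staffname'])}</h2>")
            out.append("<ul>")
        out.append(f"<li>{escape(row['goal'])}</li>")
        if last:                                    # like <dtml-if last-staffid>
            out.append("</ul>")
    return "\n".join(out)


# Hypothetical flat query result, sorted by staffid.
rows = [
    {"staffid": 1, "staffname": "John Smith", "goal": "Make friends"},
    {"staffid": 1, "staffname": "John Smith", "goal": "Learn Python"},
    {"staffid": 2, "staffname": "Jane Doe", "goal": "Learn Perl"},
]

print(render(rows))
```

If the rows are not sorted, each change of key starts a new group. John Smith would then appear twice, which is why the query needs an ORDER BY on the grouping column.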

We've replaced 101 queries (slow!) with one query that returns 500 records (faster!). The first- and last- variables work with any <dtml-in> sequence, not just database results, and not just results from PostgreSQL.

Solution Two: Create a PostgreSQL Aggregate for HTML Loops

A nifty PostgreSQL-specific solution can be more flexible and faster.

In PostgreSQL, you can create new aggregate functions. Aggregate functions (called domain functions in some database systems) are functions like Min(), Sum(), etc., which compute a single value from a set (or domain) of rows.

A simple query using a domain function would be:

  SELECT     staffname, Count(goal)
  FROM       Staff NATURAL JOIN Goals
  GROUP BY   staffname

which would show all staff, and a count of the number of goals they have.
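Under the hood, a PostgreSQL aggregate is defined by three pieces:

- an initial state (INITCOND),
- a state transition function (SFUNC) called once per input row,
- an optional final function (FINALFUNC) that turns the last state into the result.

The Python sketch below models that machinery to show how the Count() query above builds one value per group. The `Aggregate` class and `group_by` helper are illustrative only, not a PostgreSQL or Zope API. The rows are the example data used later in this article.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


class Aggregate:
    """Toy model of a PostgreSQL aggregate: initial state, transition, final."""

    def __init__(self, sfunc, initcond, finalfunc=None):
        self.sfunc = sfunc          # called once per input row: (state, value) -> state
        self.initcond = initcond    # starting state for each group
        self.finalfunc = finalfunc  # optional: turns the last state into the result

    def __call__(self, values):
        state = reduce(self.sfunc, values, self.initcond)
        return self.finalfunc(state) if self.finalfunc else state


# count(x): start at 0 and add 1 for every non-NULL (non-None) value.
count = Aggregate(lambda state, value: state + (value is not None), 0)


def group_by(rows, key, column, agg):
    """Model of: SELECT key, agg(column) FROM rows GROUP BY key."""
    by_key = itemgetter(key)
    return {
        k: agg(row[column] for row in group)
        for k, group in groupby(sorted(rows, key=by_key), key=by_key)
    }


# Rows of a hypothetical Staff NATURAL JOIN Goals result.
rows = [
    {"staffname": "John Smith", "goal": "Make friends"},
    {"staffname": "John Smith", "goal": "Learn Python"},
    {"staffname": "Jane Doe", "goal": "Learn Perl"},
]

print(group_by(rows, "staffname", "goal", count))
```

A custom aggregate only needs a different transition function, initial state, and (optionally) final function. The GROUP BY machinery stays the same.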

PostgreSQL comes with many functions like this, including interesting ones to calculate standard deviations of a set and such.

One of the nicest features of PostgreSQL is that you can use procedural languages to write new user functions, even for aggregate functions. (In fact, though it's still beta, you can use Python as a procedural language, leading to Zope nirvana.)

We can create our own aggregate, html_ul, which returns all members of a set as a single HTML list.

For example:

    SELECT    staffname, html_ul(goal) AS goals
    FROM      Staff NATURAL JOIN Goals
    GROUP BY  staffname;

would yield::

  staffname    | goals
  -------------+-----------------
  John Smith   | - Make friends
               | - Learn Python
  Jane Doe     | - Learn Perl