You are not logged in Log in Join
You are here: Home » Members » pupq » Using PostgreSQL's aggregates functions to speed up loops » howto_view

Log in
Name

Password

 

Using PostgreSQL's aggregates functions to speed up loops

 

Created by pupq . Last modified 2003-10-02 03:18:34.

PostgreSQL's aggregate functions can be used to significantly speed up loops that display items.

Note for PostgreSQL-users: This tutorial is about using PostgreSQL and Zope together, but the core idea here can be used in any scripting or reporting language. Instead of using Zope-looping commands (<dtml-in>), you might be using Perl, Python, PHP, or even (shudder) Visual Basic. The message here is: you can handle a large list as a single result value. Doens't matter what scripting language that's written it! :-)

Looping over sequences returned by ZSQL methods by using <dtml-in> is a great technique, and allows you to handle complicated cases, such as alternative colors, batching, etc.

Much of the time, though, you're probably doing something simple, like this:

  <ul>
    <dtml-in goals>
      <li><dtml-var goal></li>
    </dtml-in>
  </ul>

which is fine for straightforward lists.

Imagine, though, that you were listing goals not just for yourself, but for all staff at your company. To handle this, you might use nested ZSQL <dtml-in> clauses:

  <dtml-in staff>
    <h2><dtml-var staffname></h2>
    <dtml-in "goals({ 'staffname': staffname})">
      <dtml-var goal>
    </dtml-in>
  </dtml-in>

which works great, except that this means that for every single staff member, you're running a new query to find their goals. Imagine that you have 100 staff members, and each has, on average, five goals. Instead of one single query, you're running an outer query finding 100 records, and 100 inner queries to find 500 records total. This can become fairly sluggish.

If instead, you wanted to show all staff members, their goals (about 5 goals each), and progress against their goals (about 5 points each), you're running 100 x 5 x 5 = 2,500 queries -- and it's not even a terribly complicated page!

Often, you need a solution that lets you handle this without so many nested queries.

Solution One: Use first- Variables

One good solution is to use the sometimes-ignored first- variables of the <dtml-in> tag. This is a variable that is true only for the first occurence of any particular value in the loop.

For example:

  <dtml-in staff_and_goals>

    <dtml-if first-staffname>
      <h2><dtml-var staffname></h2>
      <ul>
    </dtml-if>

    <li><dtml-var goal></li>

    <dtml-if last-staffname>
      </ul>
    </dtml-if>

  </dtml-in>

runs through a single flat query that returns all goals for all staff. Every time the staff member name changes, it outputs a new header, and outputs a new goal every time.

(Of course, in a real example, you'd better use something guaranteed to be unique, such as a unique 'staffid'--otherwise you might have two staff members named John Smith, and they would show up just once, with the combined list of goals.)

We've replaced 101 queries (slow!) with one query that returns 500 records (faster!). first- tags work with any <dtml-in> sequence, not just those from a database, or those from PostgreSQL.

Solution Two: Create a PostgreSQL Aggregate for HTML Loops

A nifty PostgreSQL-specific solution can be more flexible and faster.

In PostgreSQL, you can create new aggregate functions. Aggregate functions (called domain functions in some database systems) are functions like Min(), Sum(), etc., which show information about a domain or set.

A simple query using a domain function would be:

  SELECT     staffname, Count(goals)
  FROM       Staff NATURAL JOIN Goals
  GROUP BY   staffname

which would show all staff, and a count of the number of goals they have.

PostgreSQL comes with many functions like this, including interesting ones to calculate standard deviations of a set and such.

One of the nicest features of PostgreSQL is that you can use procedural languages to write new user functions, even for aggregate functions. (In fact, though it's still beta, you can use Python as a procedural language, leading to Zope nirvana.)

We can create our own aggregate, which will be an HTML list aggregate, showing all members of a set as an HTML list.

For example:

    SELECT    staffname, html_ul(goal) 
    FROM      Staff NATURAL JOIN Goals
    GROUP BY  staffname;

would yield::

|| staffname || goals || || John Smith ||

  • Make friends
  • Learn Python;|| || Jane Doe ||
    • Learn Perl
      • ||

        So that we get a single text value for the entire list, already formatted as a simple HTML unordered list.

        This way, we can run a single query, and get 100 records (less time in DTML looping, less time for Zope to assign every single instance variable, etc.) This can be a good win for programmer productivity and performance.

        The biggest advantage, though, is that this technique can be used when you have more than one sub-list. For example, if we were listing all staff's goals and their possessions, we'd have a nasty, nested in loop--the -first loop idea above becomes cumbsersome. With the aggregated-list method, though, this is as simple as adding html_ul(possessions) to our SQL query.

        How To Do It

        You need to have the plpgsql procedural language installed and set up in PostgreSQL. See the help that comes with PostgreSQL if you don't know how to do this.

        (You could also do this in any of the other procedural languages that PostgreSQL can use: tcl, perl, or python. For simple things like this, though, plpgsql is probably the easiest choice.)

        Then:

          CREATE FUNCTION html_li_agg(text,text) RETURNS text AS '
            begin
              return $1 || ''<li>'' || $2 || ''</li>'';
            end;
          ' LANGUAGE 'plpgsql';
        

        This is the function which is called for every item in the list. It returns whatever it has already seen (passed in the first parameter, $1, and adds the new item, $2, surrounded by <li> ... </li>.

        To handle the wrapping of the entire thing in <ul> ... '

      ':
        CREATE FUNCTION html_ul_final(text) RETURN text AS '
          begin
            if $1 <> '''' then
              return ''<ul>'' || $1 || ''</ul>'';
            else
              return null;
            end if;
          end;
        ' LANGUAGE 'plpgsql';
      

      This is the final function, meaning that it is called at the end of the aggregating. If any items were found in the list, it wraps them in a <ul> set; otherwise, it returns null. This way, we can examine our field for null values (to find empty lists), and won't insert empty <ul></ul> lists into our pages.

      Lastly:

        CREATE AGGREGATE html_ul (basetype=text, sfunc=html_li_agg, stype=text, initcond='', finalfunc=html_ul_final);
      

      and that's it--you can now use the html_ul aggregate function.

      Extending the Idea

      You can even extend this: want to build a html_ol function for ordered lists?

      Just re-used the html_li_agg function, and make a new html_ol() aggregate:

        CREATE FUNCTION html_ol_final(text) RETURN text AS '
          begin
            if $1 <> '''' then
              return ''<ol>'' || $1 || ''</ol>'';
            else
              return null;
            end if;
          end;
        ' LANGUAGE 'plpgsql';
      

      and:

        CREATE AGGREGATE html_ol (basetype=text, sfunc=html_li_agg, stype=text, initcond='', finalfunc=html_ol_final);
      

      You can also use html_li() by itself, in case you want to insert the open-list and close-list tags yourself:

        CREATE AGGREGATE html_li (basetype=text, sfunc=html_li_agg, stype=text, initcond='');
      

      At the our nonprofit site, we have several list-aggregates tags defined, including html_ul(), html_ol(), html_p() (for showing the list as paragraphs), comma() (for showing with commas), semicolon(), etc.

      Final thoughts

      No, it's not usual database practice.

      Yes, it's probably an abuse of domain functions.

      Yes, it blurs the data-storage vs. presentation separations. However, it works great, and can often simplify complicated and slow in loops.

      Plus, since creating views in PostgreSQL is so easy, you can abstract away the implementation with a view:

        CREATE VIEW staff_and_goals AS
          SELECT     staffname, html_ul(goals) AS goals_list
          FROM       Staff NATURAL JOIN Goals
          GROUP BY   staffname
          ORDER BY   staffname;
      

      and now your Zope fiends needs only have:

        SELECT staffname, goals_list FROM staff_and_goals;
      

      in their ZSQL method to handle this complicated loop.

      Happy Zoping! Love Live PostgreSQL!