Using PostgreSQL's aggregates functions to speed up loops |
Created by pupq . Last modified 2003-10-02 03:18:34. |
PostgreSQL's aggregate functions can be used to significantly speed up loops that display items. |
Note for PostgreSQL-users: This tutorial is
about using PostgreSQL and Zope together, but
the core idea here can be used in any
scripting or reporting language. Instead of
using Zope-looping commands ( Looping over sequences returned by ZSQL methods
by using Much of the time, though, you're probably doing something simple, like this: <ul> <dtml-in goals> <li><dtml-var goal></li> </dtml-in> </ul> which is fine for straightforward lists. Imagine, though, that you were listing goals
not just for yourself, but for all staff
at your company. To handle this,
you might use nested ZSQL <dtml-in staff> <h2><dtml-var staffname></h2> <dtml-in "goals({ 'staffname': staffname})"> <dtml-var goal> </dtml-in> </dtml-in> which works great, except that this means that for every single staff member, you're running a new query to find their goals. Imagine that you have 100 staff members, and each has, on average, five goals. Instead of one single query, you're running an outer query finding 100 records, and 100 inner queries to find 500 records total. This can become fairly sluggish. If instead, you wanted to show all staff members, their goals (about 5 goals each), and progress against their goals (about 5 points each), you're running 100 x 5 x 5 = 2,500 queries -- and it's not even a terribly complicated page! Often, you need a solution that lets you handle this without so many nested queries. Solution One: Use first- Variables One good solution is to use the sometimes-ignored
For example: <dtml-in staff_and_goals> <dtml-if first-staffname> <h2><dtml-var staffname></h2> <ul> </dtml-if> <li><dtml-var goal></li> <dtml-if last-staffname> </ul> </dtml-if> </dtml-in> runs through a single flat query that returns all goals for all staff. Every time the staff member name changes, it outputs a new header, and outputs a new goal every time. (Of course, in a real example, you'd better use something guaranteed to be unique, such as a unique 'staffid'--otherwise you might have two staff members named John Smith, and they would show up just once, with the combined list of goals.) We've replaced 101 queries (slow!) with
one query that returns 500 records (faster!).
Solution Two: Create a PostgreSQL Aggregate for HTML LoopsA nifty PostgreSQL-specific solution can be more flexible and faster. In PostgreSQL, you can create new aggregate
functions. Aggregate functions (called
domain functions in some database systems)
are functions like A simple query using a domain function would be: SELECT staffname, Count(goals) FROM Staff NATURAL JOIN Goals GROUP BY staffname which would show all staff, and a count of the number of goals they have. PostgreSQL comes with many functions like this, including interesting ones to calculate standard deviations of a set and such. One of the nicest features of PostgreSQL is that you can use procedural languages to write new user functions, even for aggregate functions. (In fact, though it's still beta, you can use Python as a procedural language, leading to Zope nirvana.) We can create our own aggregate, which will
be an For example: SELECT staffname, html_ul(goal) FROM Staff NATURAL JOIN Goals GROUP BY staffname; would yield:: || staffname || goals || || John Smith ||
So that we get a single text value for the entire list, already formatted as a simple HTML unordered list. This way, we can run a single query, and get 100 records (less time in DTML looping, less time for Zope to assign every single instance variable, etc.) This can be a good win for programmer productivity and performance. The biggest advantage, though, is that this
technique can be used when you have more than
one sub-list. For example, if we were listing
all staff's goals and their possessions,
we'd have a nasty, nested in loop--the How To Do It You need to have the (You could also do this in any of the other procedural languages that PostgreSQL can use: tcl, perl, or python. For simple things like this, though, plpgsql is probably the easiest choice.) Then: CREATE FUNCTION html_li_agg(text,text) RETURNS text AS ' begin return $1 || ''<li>'' || $2 || ''</li>''; end; ' LANGUAGE 'plpgsql'; This is the function which is called for every
item in the list. It returns whatever it has
already seen (passed in the first parameter,
To handle the wrapping of the entire thing
in CREATE FUNCTION html_ul_final(text) RETURN text AS ' begin if $1 <> '''' then return ''<ul>'' || $1 || ''</ul>''; else return null; end if; end; ' LANGUAGE 'plpgsql'; This is the final function, meaning that it
is called at the end of the aggregating. If
any items were found in the list, it wraps them
in a Lastly: CREATE AGGREGATE html_ul (basetype=text, sfunc=html_li_agg, stype=text, initcond='', finalfunc=html_ul_final); and that's it--you can now use the html_ul aggregate function. Extending the IdeaYou can even extend this: want to build a html_ol function for ordered lists? Just re-used the html_li_agg function, and make a new html_ol() aggregate: CREATE FUNCTION html_ol_final(text) RETURN text AS ' begin if $1 <> '''' then return ''<ol>'' || $1 || ''</ol>''; else return null; end if; end; ' LANGUAGE 'plpgsql'; and: CREATE AGGREGATE html_ol (basetype=text, sfunc=html_li_agg, stype=text, initcond='', finalfunc=html_ol_final); You can also use html_li() by itself, in case you want to insert the open-list and close-list tags yourself: CREATE AGGREGATE html_li (basetype=text, sfunc=html_li_agg, stype=text, initcond=''); At the our nonprofit site, we have
several list-aggregates tags defined, including
Final thoughtsNo, it's not usual database practice. Yes, it's probably an abuse of domain functions. Yes, it blurs the data-storage vs. presentation separations. However, it works great, and can often simplify complicated and slow in loops. Plus, since creating views in PostgreSQL is so easy, you can abstract away the implementation with a view: CREATE VIEW staff_and_goals AS SELECT staffname, html_ul(goals) AS goals_list FROM Staff NATURAL JOIN Goals GROUP BY staffname ORDER BY staffname; and now your Zope fiends needs only have: SELECT staffname, goals_list FROM staff_and_goals; in their ZSQL method to handle this complicated loop. Happy Zoping! Love Live PostgreSQL! |