Current Articles | RSS Feed
The PostgreSQL relational database management system (RDBMS) extends the standard SQL3 it implements with several conveniences for application development. Among the most useful extensions are window functions, which perform a calculation across a set of table rows that are somehow related to the current row.
Suppose you want to know the total media market by state from a reference tabulated in terms of city (or metropolitan) listings. You can write a query with a window function like this:
SELECT state_name, city_name, market_size, SUM(market_size)OVER (PARTITION BY state_name) FROM markets;
and see something like:
A slightly more involved expression might report, for instance:
The examples above are typical of questions that come up in real-world reporting requirements. Window functions make their expression concise and maintainable.
In PostgreSQL, you can create a window function by replacing any use of a function in a SELECT list, or its associated ORDER BY clause, with an OVER clause immediately after the function. In an ordinary select statement such as:
SELECT
ORDER BY
OVER
SELECT emp_id, salary, AVG(salary) FROM salaries;
the average is over all reported employees. Contrast this with a statement that uses a windows function:
SELECT emp_id, salary, AVG(salary) OVER(PARTITION by site) FROM salaries;
The first two columns of these two reports will be identical. The third column in the latter case, though, will show averages only for people at the same site. All employees working in Miami share one average, all those in Toronto have a different average, and so on.
Some window functions depend on the order of their inputs. You can control this order with an optional ORDER BY clause. ORDER BY can be particularly useful when considering budget problems, as this example shows:
SELECT request, sum(request) FROM projects ORDER BY priority;
Now change the query to:
SELECT request, sum(request) OVER (ORDER BY priority)FROM projects ORDER BY priority;
and notice the subtly different result:
One way to read this table in English is: "If we fund only the top-priority project, our total expense will be 35; if we fund the top two in priority, we'll spend 145; ..." This is a view of data that decision-makers frequently prize – and a window function makes it easy to see!
It's certainly possible to compute the same results without window functions; plenty of applications do. Many retrieve items from a table into a host language such as Java or Ruby, then use procedural logic in the host language to calculate the same "windows." Among other disadvantages, this approach has the potential to transmit large volumes of undisplayed intermediate data over the network, and thus slow overall performance.
Another alternative is to use stored procedures or user-defined functions on the RDBMS side. While this minimizes runtime network traffic, it generally still falls short of window functions in clarity and query performance. When written from scratch, code to compute a window function often takes four to 10 times as many lines as the simpler window function-based query.
The SQL/OLAP amendment to SQL:1999 (SQL:1999 is a synonym for SQL3) introduced a <window> clause to act on collections or partitions of data. Not long after this, the SQL:2003 standard formalized window functions as described here. The subsequent SQL:2008 standard (labeled "SQL-2008" in some Postgres documents) slightly refined the definition of window functions.
<window>
PostgreSQL gained its window functions with the release of 8.4 in mid-2009. Postgres targets the SQL:2008 definition with its implementation. The principal difference between the Postgres implementation and the SQL:2008 definition has to do with specialized treatment of NULLS; see the Note at the bottom of the Window Functions documentation page for more details. Also, in Postgres, any built-in or user-defined aggregate function, such as count, max, bit_or, or xmlagg, is available for computation as a window function.
NULLS
count
max
bit_or
xmlagg
Many of the queries or reports that arise in business intelligence (BI) and analytical applications can be expressed naturally in terms of window functions. When you prepare a table where a common value ties together several distinct rows, it's worth your time to investigate whether your calculation can best be done as a window function. You will save both your own development time, and runtime resource usage, when you write window functions. They're among the best and most useful standardized extensions to SQL3.
Allowed tags: <a> link, <b> bold, <i> italics