A correlated subquery is a subquery that references columns from the outer query. Unlike a regular subquery that PostgreSQL evaluates once, a correlated subquery is re-executed for each row processed by the outer query. This makes them powerful for row-level comparisons against group-level aggregates, but potentially expensive on large datasets.
How a correlated subquery works
The key characteristic is the reference to the outer query's alias inside the inner query. In the example below, the inner query references f.rating from the outer query's current row:
SELECT film_id, title, length, rating
FROM film f
WHERE length > (
SELECT AVG(length)
FROM film
WHERE rating = f.rating
);
Output:
film_id | title | length | rating
---------+-----------------------------+--------+--------
133 | Chamber Italian | 117 | NC-17
4 | Affair Prejudice | 117 | G
5 | African Egg | 130 | G
6 | Agent Truman | 169 | PG
...
For each row in the outer query, PostgreSQL computes the average length of all films that share the same rating as the current film. Only films whose length exceeds that per-rating average are returned.
Step-by-step execution
Here is what PostgreSQL does for each row in the film table:
- The outer query reads a row from
film fand notes itsratingvalue (e.g.,'G'). - The correlated subquery runs:
SELECT AVG(length) FROM film WHERE rating = 'G'. - The result (e.g.,
118.9) is compared against the current film'slength. - If the film's length exceeds the average, the row is included in the output.
The condition rating = f.rating is what creates the correlation — it binds the inner query to the outer row's context.
Performance considerations and alternatives
On large tables, re-executing the inner query for every outer row is costly. A common optimization is to pre-compute the aggregate in a derived table and join it:
SELECT f.film_id, f.title, f.length, f.rating
FROM film f
JOIN (
SELECT rating, AVG(length) AS avg_length
FROM film
GROUP BY rating
) avg_by_rating ON f.rating = avg_by_rating.rating
WHERE f.length > avg_by_rating.avg_length;
This version computes the per-rating average once instead of once per film, which is significantly faster at scale.
Tips for working with correlated subqueries
- Use
EXPLAIN ANALYZEto measure the actual execution cost — the plan will show how many times the inner node is executed. - Index the correlated column in the inner query's table (e.g.,
film.rating) to speed up each inner execution. - If the same aggregate is needed multiple times, move the correlated subquery to a CTE or derived table that computes it once.
- Correlated subqueries are often used with
EXISTS— which short-circuits after finding the first matching row and can be more efficient than returning a full aggregate.
Reference: PostgreSQL documentation — Subquery Expressions.