Correlated Subquery

Learn how PostgreSQL correlated subqueries reference the outer query to perform row-by-row comparisons, such as filtering rows that exceed a per-group average.

5 min read · Last updated: March 2026 · Back to overview

Quick Answer

A correlated subquery in PostgreSQL is a subquery that references columns from the outer query. Unlike a regular subquery that runs once, PostgreSQL re-executes the correlated subquery for each row processed by the outer query.

Spin up a Postgres database in 20 seconds with Vela.

Try Vela Sandbox

A correlated subquery is a subquery that references columns from the outer query. Unlike a regular subquery that PostgreSQL evaluates once, a correlated subquery is re-executed for each row processed by the outer query. This makes them powerful for row-level comparisons against group-level aggregates, but potentially expensive on large datasets.

How a correlated subquery works

The key characteristic is the reference to the outer query's alias inside the inner query. In the example below, the inner query references f.rating from the outer query's current row:

SELECT film_id, title, length, rating
FROM film f
WHERE length > (
  SELECT AVG(length)
  FROM film
  WHERE rating = f.rating
);

Output:

film_id |            title            | length | rating
---------+-----------------------------+--------+--------
    133 | Chamber Italian             |    117 | NC-17
      4 | Affair Prejudice            |    117 | G
      5 | African Egg                 |    130 | G
      6 | Agent Truman                |    169 | PG
...

For each row in the outer query, PostgreSQL computes the average length of all films that share the same rating as the current film. Only films whose length exceeds that per-rating average are returned.

Step-by-step execution

Here is what PostgreSQL does for each row in the film table:

  • The outer query reads a row from film f and notes its rating value (e.g., 'G').
  • The correlated subquery runs: SELECT AVG(length) FROM film WHERE rating = 'G'.
  • The result (e.g., 118.9) is compared against the current film's length.
  • If the film's length exceeds the average, the row is included in the output.

The condition rating = f.rating is what creates the correlation — it binds the inner query to the outer row's context.

Performance considerations and alternatives

On large tables, re-executing the inner query for every outer row is costly. A common optimization is to pre-compute the aggregate in a derived table and join it:

SELECT f.film_id, f.title, f.length, f.rating
FROM film f
JOIN (
  SELECT rating, AVG(length) AS avg_length
  FROM film
  GROUP BY rating
) avg_by_rating ON f.rating = avg_by_rating.rating
WHERE f.length > avg_by_rating.avg_length;

This version computes the per-rating average once instead of once per film, which is significantly faster at scale.

Tips for working with correlated subqueries

  • Use EXPLAIN ANALYZE to measure the actual execution cost — the plan will show how many times the inner node is executed.
  • Index the correlated column in the inner query's table (e.g., film.rating) to speed up each inner execution.
  • If the same aggregate is needed multiple times, move the correlated subquery to a CTE or derived table that computes it once.
  • Correlated subqueries are often used with EXISTS — which short-circuits after finding the first matching row and can be more efficient than returning a full aggregate.

Reference: PostgreSQL documentation — Subquery Expressions.

Continue in Subquery: ANY Operator.

Related in this section: Subquery · ANY Operator · ALL Operator

Frequently Asked Questions

What makes a subquery "correlated" in PostgreSQL?

A subquery is correlated when it references one or more columns from the outer query. The reference creates a dependency — the subquery cannot be evaluated independently because it needs the current row value from the outer query. PostgreSQL re-runs the inner query once per outer row.

What is a common use case for a correlated subquery?

The most common use case is comparing each row against a group-level aggregate. For example, finding films longer than the average for their specific rating category, or finding employees who earn more than the average salary in their department. The correlation ties the aggregate to the current row's group.

Why can correlated subqueries be slow?

Because PostgreSQL re-executes the inner query for every row in the outer query, the total cost grows linearly with the outer row count. On large tables this leads to many individual executions of the inner query. Adding an index on the correlated column (e.g., the rating column) can significantly reduce this cost.

How can I rewrite a correlated subquery for better performance?

Often you can replace a correlated subquery with a JOIN to a derived table or CTE that pre-computes the aggregate. For example, instead of re-computing AVG(length) per rating for every row, compute it once with GROUP BY rating and then join the result back to the film table on the rating column.

Can a correlated subquery appear in the SELECT list?

Yes. A scalar correlated subquery in the SELECT list returns one value per outer row. For example: SELECT title, (SELECT AVG(length) FROM film WHERE rating = f.rating) AS avg_rating_length FROM film f. This pattern is convenient but carries the same per-row execution cost as in the WHERE clause.