Correlated Subquery in PostgreSQL

A correlated subquery is a subquery that references columns from the outer query. Unlike a regular subquery that PostgreSQL evaluates once and reuses, a correlated subquery is re-executed for each row processed by the outer query. This makes them powerful for row-level comparisons against group-level aggregates, but potentially expensive on large datasets.

How a Correlated Subquery Works

The key characteristic is a reference to the outer query’s row alias inside the inner query. The outer alias binds the inner query to the current row’s context:

SELECT column_list
FROM outer_table o
WHERE column operator (
  SELECT aggregate(column)
  FROM inner_table
  WHERE inner_table.group_col = o.group_col  -- correlation
);

PostgreSQL processes each outer row in sequence, substitutes its group_col value into the inner query, runs the inner query, evaluates the comparison, and includes or excludes the row.

Practical Example

Create an events table and an event_categories table to track ticket sales across venues:

CREATE TABLE event_categories (
  category_id   SERIAL PRIMARY KEY,
  category_name VARCHAR(50) NOT NULL
);

CREATE TABLE events (
  event_id      SERIAL PRIMARY KEY,
  title         VARCHAR(100) NOT NULL,
  category_id   INT REFERENCES event_categories (category_id),
  tickets_sold  INT         NOT NULL DEFAULT 0,
  ticket_price  NUMERIC(8, 2)
);

INSERT INTO event_categories (category_name) VALUES
  ('Concert'), ('Conference'), ('Sports');

INSERT INTO events (title, category_id, tickets_sold, ticket_price) VALUES
  ('Summer Fest',     1, 9200,  85.00),
  ('Jazz Evening',    1, 4100,  45.00),
  ('Rock Night',      1, 7800,  65.00),
  ('Tech Summit',     2, 1200, 299.00),
  ('DevConf 2026',    2,  680, 199.00),
  ('AI Workshop',     2,  950, 249.00),
  ('Championship',    3, 18000, 120.00),
  ('Cup Final',       3, 22000, 150.00),
  ('League Match',    3,  8500,  55.00);

Find all events with more tickets sold than the average for their category:

SELECT e.title, e.tickets_sold, ec.category_name
FROM events e
JOIN event_categories ec ON ec.category_id = e.category_id
WHERE e.tickets_sold > (
  SELECT AVG(tickets_sold)
  FROM events
  WHERE category_id = e.category_id  -- correlation to outer row
)
ORDER BY ec.category_name, e.tickets_sold DESC;

     title       | tickets_sold | category_name
-----------------+--------------+---------------
 Summer Fest     |         9200 | Concert
 Rock Night      |         7800 | Concert
 Tech Summit     |         1200 | Conference
 AI Workshop     |          950 | Conference
 Cup Final       |        22000 | Sports
 Championship    |        18000 | Sports

For each row in events e, the inner query computes AVG(tickets_sold) only for rows sharing the same category_id. The correlation is WHERE category_id = e.category_id.

Step-by-step Execution

For the first row (Summer Fest, category_id = 1):

The outer query notes category_id = 1 (Concert).
The correlated subquery runs: SELECT AVG(tickets_sold) FROM events WHERE category_id = 1 — result: 7033.
9200 > 7033 is true, so the row is included.
The process repeats for every subsequent outer row.

Performance Considerations and Alternatives

On large tables, re-executing the inner query per outer row is expensive. Pre-compute the aggregate once with a derived table:

SELECT e.title, e.tickets_sold, ec.category_name
FROM events e
JOIN event_categories ec ON ec.category_id = e.category_id
JOIN (
  SELECT category_id, AVG(tickets_sold) AS avg_sold
  FROM events
  GROUP BY category_id
) cat_avg ON cat_avg.category_id = e.category_id
WHERE e.tickets_sold > cat_avg.avg_sold
ORDER BY ec.category_name, e.tickets_sold DESC;

This version computes the per-category average once (one pass over events), then joins, rather than once per outer row. Use EXPLAIN ANALYZE to compare actual execution costs between the two approaches.

Testing with Vela

Correlated subqueries are one of the query patterns most likely to degrade silently as tables grow — what takes milliseconds on a small dataset can take seconds on production. Vela’s database branching lets you clone your production database to a branch, run EXPLAIN ANALYZE against real data volumes, and test the rewritten JOIN version side by side, all without touching the live environment.

Production Tips

Index the correlated column in the inner query’s table (e.g., events.category_id) to reduce the cost of each individual inner execution.
Use EXPLAIN ANALYZE and inspect the “loops” field on the inner plan node — it shows exactly how many times the inner query ran.
Replace correlated subqueries with a JOIN to a pre-aggregated derived table or CTE when the outer table has more than a few thousand rows.
Correlated subqueries are a natural fit for EXISTS checks (not aggregates): EXISTS (SELECT 1 FROM orders WHERE product_id = p.product_id) short-circuits after the first match, making it more efficient than counting.
A scalar correlated subquery in the SELECT list is readable but carries the same per-row cost — consider computing it in a lateral join (JOIN LATERAL) for better planner flexibility.

Frequently Asked Questions

What makes a subquery 'correlated' in PostgreSQL?

A subquery is correlated when it references a column from the outer query inside the inner SELECT. This reference creates a dependency: the inner query cannot be evaluated independently because it needs the value of the current outer row. PostgreSQL re-runs the inner query once per outer row.

Does a correlated subquery lock rows in the inner table?

A correlated subquery that only reads data acquires the same locks as a regular SELECT — typically no row-level locks under read-committed isolation. The inner query is simply re-evaluated per outer row. Only if you use SELECT ... FOR UPDATE inside the subquery will row locks be acquired.

Why can correlated subqueries be slow?

Because PostgreSQL re-executes the inner query for every row of the outer query, the total cost grows linearly with the outer row count. On a table with 100,000 rows the inner query runs 100,000 times. Indexing the correlated column and rewriting as a JOIN to a pre-aggregated derived table are the standard remediations.

Can I use IF EXISTS with a correlated subquery?

There is no IF EXISTS syntax for subqueries. To test row existence in a correlated context, use the EXISTS operator: WHERE EXISTS (SELECT 1 FROM orders o WHERE o.product_id = p.product_id). EXISTS short-circuits after finding the first matching row, which is often faster than returning an aggregate.

What is the safest way to use correlated subqueries in production?

Always run EXPLAIN ANALYZE to check how many times the inner node executes and what its actual cost is. Replace correlated subqueries with a JOIN to a derived table or CTE that pre-computes the aggregate once. Index the column used in the correlation predicate to reduce the cost of each inner execution.