A correlated subquery is a subquery that references columns from the outer query. Unlike a regular subquery that PostgreSQL evaluates once and reuses, a correlated subquery is re-executed for each row processed by the outer query. This makes them powerful for row-level comparisons against group-level aggregates, but potentially expensive on large datasets.
How a Correlated Subquery Works
The key characteristic is a reference to the outer query’s row alias inside the inner query. The outer alias binds the inner query to the current row’s context:
SELECT column_list
FROM outer_table o
WHERE column operator (
SELECT aggregate(column)
FROM inner_table
WHERE inner_table.group_col = o.group_col -- correlation
);
PostgreSQL processes each outer row in sequence, substitutes its group_col value into the inner query, runs the inner query, evaluates the comparison, and includes or excludes the row.
Practical Example
Create an events table and an event_categories table to track ticket sales across venues:
CREATE TABLE event_categories (
category_id SERIAL PRIMARY KEY,
category_name VARCHAR(50) NOT NULL
);
CREATE TABLE events (
event_id SERIAL PRIMARY KEY,
title VARCHAR(100) NOT NULL,
category_id INT REFERENCES event_categories (category_id),
tickets_sold INT NOT NULL DEFAULT 0,
ticket_price NUMERIC(8, 2)
);
INSERT INTO event_categories (category_name) VALUES
('Concert'), ('Conference'), ('Sports');
INSERT INTO events (title, category_id, tickets_sold, ticket_price) VALUES
('Summer Fest', 1, 9200, 85.00),
('Jazz Evening', 1, 4100, 45.00),
('Rock Night', 1, 7800, 65.00),
('Tech Summit', 2, 1200, 299.00),
('DevConf 2026', 2, 680, 199.00),
('AI Workshop', 2, 950, 249.00),
('Championship', 3, 18000, 120.00),
('Cup Final', 3, 22000, 150.00),
('League Match', 3, 8500, 55.00);
Find all events with more tickets sold than the average for their category:
SELECT e.title, e.tickets_sold, ec.category_name
FROM events e
JOIN event_categories ec ON ec.category_id = e.category_id
WHERE e.tickets_sold > (
SELECT AVG(tickets_sold)
FROM events
WHERE category_id = e.category_id -- correlation to outer row
)
ORDER BY ec.category_name, e.tickets_sold DESC;
title | tickets_sold | category_name
-----------------+--------------+---------------
Summer Fest | 9200 | Concert
Rock Night | 7800 | Concert
Tech Summit | 1200 | Conference
AI Workshop | 950 | Conference
Cup Final | 22000 | Sports
Championship | 18000 | Sports
For each row in events e, the inner query computes AVG(tickets_sold) only for rows sharing the same category_id. The correlation is WHERE category_id = e.category_id.
Step-by-step Execution
For the first row (Summer Fest, category_id = 1):
- The outer query notes
category_id = 1(Concert). - The correlated subquery runs:
SELECT AVG(tickets_sold) FROM events WHERE category_id = 1— result: 7033. 9200 > 7033is true, so the row is included.- The process repeats for every subsequent outer row.
Performance Considerations and Alternatives
On large tables, re-executing the inner query per outer row is expensive. Pre-compute the aggregate once with a derived table:
SELECT e.title, e.tickets_sold, ec.category_name
FROM events e
JOIN event_categories ec ON ec.category_id = e.category_id
JOIN (
SELECT category_id, AVG(tickets_sold) AS avg_sold
FROM events
GROUP BY category_id
) cat_avg ON cat_avg.category_id = e.category_id
WHERE e.tickets_sold > cat_avg.avg_sold
ORDER BY ec.category_name, e.tickets_sold DESC;
This version computes the per-category average once (one pass over events), then joins, rather than once per outer row. Use EXPLAIN ANALYZE to compare actual execution costs between the two approaches.
Testing with Vela
Correlated subqueries are one of the query patterns most likely to degrade silently as tables grow — what takes milliseconds on a small dataset can take seconds on production. Vela’s database branching lets you clone your production database to a branch, run EXPLAIN ANALYZE against real data volumes, and test the rewritten JOIN version side by side, all without touching the live environment.
Production Tips
- Index the correlated column in the inner query’s table (e.g.,
events.category_id) to reduce the cost of each individual inner execution. - Use EXPLAIN ANALYZE and inspect the “loops” field on the inner plan node — it shows exactly how many times the inner query ran.
- Replace correlated subqueries with a JOIN to a pre-aggregated derived table or CTE when the outer table has more than a few thousand rows.
- Correlated subqueries are a natural fit for
EXISTSchecks (not aggregates):EXISTS (SELECT 1 FROM orders WHERE product_id = p.product_id)short-circuits after the first match, making it more efficient than counting. - A scalar correlated subquery in the
SELECTlist is readable but carries the same per-row cost — consider computing it in a lateral join (JOIN LATERAL) for better planner flexibility.