Introduction to PostgreSQL DISTINCT ON
DISTINCT ON is a PostgreSQL extension to standard SQL that lets you retrieve the first row per unique value of one or more specified columns, while keeping all other columns from that row in the result. This is particularly useful when you want the "latest," "highest," or "best" record per group without a subquery or window function.
The basic syntax is:
SELECT DISTINCT ON (column1, column2, ...)
column1, column2, other_columns, ...
FROM table_name
ORDER BY column1, column2, ...;
The key rule: the expressions inside DISTINCT ON (...) must match the leftmost expressions in the ORDER BY clause. The remaining ORDER BY expressions determine which row within each group is kept as "first."
DISTINCT ON example: highest score per student
Consider a student_scores table with scores for multiple subjects per student:
CREATE TABLE student_scores (
id SERIAL PRIMARY KEY,
name VARCHAR(50) NOT NULL,
subject VARCHAR(50) NOT NULL,
score INTEGER NOT NULL
);
INSERT INTO student_scores (name, subject, score) VALUES
('Alice', 'Math', 90),
('Bob', 'Math', 85),
('Alice', 'Physics', 92),
('Bob', 'Physics', 88),
('Charlie', 'Math', 95),
('Charlie', 'Physics', 90);
To retrieve each student's highest score across all subjects:
SELECT DISTINCT ON (name)
name, subject, score
FROM student_scores
ORDER BY name, score DESC;
Result:
name | subject | score
---------+---------+-------
Alice | Physics | 92
Bob | Physics | 88
Charlie | Math | 95
(3 rows)
Here DISTINCT ON (name) keeps one row per student, and ORDER BY name, score DESC ensures that the row with the highest score is the one retained.
Most recent order per customer
A common application pattern: finding the latest order for each customer:
SELECT DISTINCT ON (customer_id)
customer_id, order_id, placed_at, total_amount
FROM orders
ORDER BY customer_id, placed_at DESC;
This is more concise than the equivalent subquery approach and typically performs well with a compound index on (customer_id, placed_at DESC).
DISTINCT ON vs. SELECT DISTINCT
SELECT DISTINCT eliminates rows that are completely identical across all selected columns. DISTINCT ON (col) keeps one row per unique value of col — the other columns in that row can vary. They solve fundamentally different problems:
- Use
SELECT DISTINCTto deduplicate rows where all columns match. - Use
DISTINCT ONto get one representative row per group (like the row with the max or most recent value).
Important rules and tips
- The expression in
DISTINCT ON (...)must be the leftmost expression in theORDER BYclause. Misaligning them causes an error. - Omitting
ORDER BYentirely is allowed but makes the "first" row selection unpredictable — always include it in practice. - For performance on large tables, create a composite index matching your
ORDER BYcolumns (e.g.,CREATE INDEX ON orders (customer_id, placed_at DESC)). DISTINCT ONis PostgreSQL-specific. If cross-database portability matters, use a subquery withROW_NUMBER()instead.
Reference: PostgreSQL documentation — SELECT DISTINCT ON.