SELECT DISTINCT in PostgreSQL | PostgreSQL Tutorial

By default, a SELECT query returns every row that matches the FROM and WHERE conditions, including duplicates. SELECT DISTINCT eliminates those duplicates, retaining only one row for each unique value — or unique combination of values — across all columns in the select list.

Syntax

Deduplicate on a single column:

SELECT DISTINCT column1
FROM table_name;

Deduplicate on a combination of columns — a row is only eliminated if all listed columns match another row simultaneously:

SELECT DISTINCT column1, column2
FROM table_name;

Deduplicate across every column:

SELECT DISTINCT *
FROM table_name;

Practical Example

Create a table of event registrations where attendees may register for multiple sessions:

CREATE TABLE registrations (
  id          SERIAL PRIMARY KEY,
  attendee    VARCHAR(100) NOT NULL,
  session     VARCHAR(100) NOT NULL,
  location    VARCHAR(100),
  registered_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

INSERT INTO registrations (attendee, session, location) VALUES
  ('Ada Lovelace',   'Keynote',        'New York'),
  ('Grace Hopper',   'Keynote',        'San Francisco'),
  ('Ada Lovelace',   'Workshop A',     'New York'),
  ('Linus Torvalds', 'Keynote',        'Helsinki'),
  ('Grace Hopper',   'Workshop A',     'San Francisco'),
  ('Linus Torvalds', 'Workshop A',     'Helsinki'),
  ('Ada Lovelace',   'Keynote',        'New York'),   -- duplicate registration
  ('Grace Hopper',   'Keynote',        'San Francisco'); -- duplicate registration

Find unique sessions being attended:

SELECT DISTINCT session
FROM registrations
ORDER BY session;

Result:

session
-----------
Keynote
Workshop A
(2 rows)

Find unique (attendee, session) combinations:

SELECT DISTINCT attendee, session
FROM registrations
ORDER BY attendee, session;

This removes the two duplicate rows, returning 6 unique pairs instead of 8.

Find all cities where attendees are located:

SELECT DISTINCT location
FROM registrations
ORDER BY location;

Use DISTINCT to enumerate enum-like values:

-- Discover all distinct session types without scanning the whole table
SELECT DISTINCT session FROM registrations;

Verify deduplication by comparing counts:

SELECT
  COUNT(*)                            AS total_rows,
  COUNT(DISTINCT attendee)            AS unique_attendees,
  COUNT(DISTINCT session)             AS unique_sessions,
  COUNT(DISTINCT (attendee, session)) AS unique_pairs
FROM registrations;

SELECT DISTINCT vs. GROUP BY

For simple deduplication, SELECT DISTINCT and GROUP BY produce the same result:

-- These are equivalent:
SELECT DISTINCT location FROM registrations ORDER BY location;
SELECT location FROM registrations GROUP BY location ORDER BY location;

The difference emerges when you need aggregates alongside the unique values:

-- DISTINCT cannot do this:
SELECT location, COUNT(*) AS registrations_from_location
FROM registrations
GROUP BY location
ORDER BY registrations_from_location DESC;

Use case	Recommended
List unique values only	`SELECT DISTINCT`
Unique values with counts or sums	`GROUP BY`
One representative row per group	`DISTINCT ON`

Testing with Vela

When using SELECT DISTINCT to power dropdowns, faceted search, or deduplication logic in an application, the correct output depends heavily on the actual data distribution in production. Vela’s database branching lets you clone your production database and run DISTINCT queries — or compare them against GROUP BY alternatives with EXPLAIN ANALYZE — on real data volumes before any schema or query change goes live.

Production Tips

DISTINCT forces a full deduplication pass over all retrieved rows. On large result sets, this is expensive — use EXPLAIN to see whether PostgreSQL is using a sort or hash strategy.
If you only need unique values from a column with a B-tree index, PostgreSQL can use an index-only skip-scan to retrieve distinct values efficiently without reading the full table.
SELECT DISTINCT on multiple columns is rarely the right tool if you also need to filter by one of those columns — consider rewriting as GROUP BY with HAVING instead.
Avoid DISTINCT as a band-aid for fixing incorrect JOINs that produce duplicate rows — identify and fix the join condition instead.
COUNT(DISTINCT col) is supported alongside regular SELECT but is computed independently from any DISTINCT in the SELECT clause.
When building unique-value lists for UI dropdowns, consider a materialized view refreshed on a schedule rather than running SELECT DISTINCT on the full table for every page load.

Frequently Asked Questions

What is the difference between SELECT DISTINCT and SELECT DISTINCT ON in PostgreSQL?

SELECT DISTINCT removes rows that are completely identical across all listed columns. SELECT DISTINCT ON (column) keeps the first row per unique value of the specified column, allowing other columns to vary — it requires an ORDER BY clause to determine which row is 'first'. DISTINCT ON is a PostgreSQL-specific extension not available in standard SQL.

Does SELECT DISTINCT lock the table?

No. DISTINCT is processed after rows are retrieved; it only adds a deduplication step (sort- or hash-based) to the query execution. The SELECT itself acquires only an ACCESS SHARE lock, which does not block writes.

How does SELECT DISTINCT handle NULL values?

PostgreSQL treats NULLs as equal for DISTINCT purposes. Multiple rows where a column is NULL collapse into a single NULL row in the result. This differs from equality comparisons where NULL = NULL evaluates to NULL (unknown), not TRUE.

Is SELECT DISTINCT the same as GROUP BY?

They produce identical unique-value output for simple cases: SELECT DISTINCT col and SELECT col FROM t GROUP BY col return the same rows. The key difference is that GROUP BY supports aggregate functions like COUNT(*) or SUM() alongside the grouped column, while SELECT DISTINCT does not.

Can SELECT DISTINCT slow down my query?

Yes. DISTINCT forces PostgreSQL to deduplicate results, which typically involves sorting or hashing all retrieved rows. On large tables this adds cost. If you only need unique values from an indexed column, PostgreSQL may use a skip-scan; check EXPLAIN ANALYZE output. Consider GROUP BY or an EXISTS subquery if they serve your use case more efficiently.