Duplicate rows accumulate when tables lack unique constraints or data is imported from external sources without deduplication. PostgreSQL offers several approaches to find and remove duplicates: a self-join with DELETE USING, a ROW_NUMBER() subquery, and an immediate table technique.
Finding duplicate rows
Before deleting, identify which values are duplicated and how many copies exist:
SELECT fruit, COUNT(fruit)
FROM basket
GROUP BY fruit
HAVING COUNT(fruit) > 1
ORDER BY fruit;
Output:
fruit | count
---------+-------
apple | 2
orange | 3
(2 rows)
Method 1: DELETE USING with a self-join
Join the table to itself and delete the rows with lower IDs, keeping the row with the highest ID per duplicate group:
DELETE FROM basket a
USING basket b
WHERE a.id < b.id
AND a.fruit = b.fruit;
After this, each fruit value appears only once — specifically the row with the highest ID. To keep the lowest ID instead, flip the comparison operator:
DELETE FROM basket a
USING basket b
WHERE a.id > b.id
AND a.fruit = b.fruit;
Method 2: Subquery with ROW_NUMBER()
Use ROW_NUMBER() partitioned by the duplicate column to number each duplicate group, then delete rows with a row number greater than 1:
DELETE FROM basket
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id) AS row_num
FROM basket
) t
WHERE t.row_num > 1
);
This keeps the row with the lowest ID per group. To keep the highest, change ORDER BY id to ORDER BY id DESC.
For duplicates defined across multiple columns, partition by all of them:
DELETE FROM table_name
WHERE id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY column_1, column_2 ORDER BY id) AS row_num
FROM table_name
) t
WHERE t.row_num > 1
);
Method 3: Immediate table
Create a deduplicated copy, drop the original, and rename the copy. This is useful when the table lacks a primary key or other unique identifier:
-- 1. Create a new table with the same structure
CREATE TABLE basket_temp (LIKE basket);
-- 2. Insert only distinct rows
INSERT INTO basket_temp (fruit, id)
SELECT DISTINCT ON (fruit) fruit, id
FROM basket;
-- 3. Drop the original
DROP TABLE basket;
-- 4. Rename the copy
ALTER TABLE basket_temp RENAME TO basket;
Practical tips
- Always run a
SELECTwith your deduplication logic first to confirm which rows will be deleted before executing theDELETE. - Wrap the delete in a transaction so you can roll back if the result is not what you expected.
- After removing duplicates, add a
UNIQUEconstraint on the relevant columns to prevent future duplicates from accumulating. - The self-join method (
DELETE USING) is typically faster on small-to-medium tables. TheROW_NUMBER()subquery is more flexible for complex duplicate definitions.
Reference: PostgreSQL documentation — DELETE.