Delete Duplicate Rows in PostgreSQL

Duplicate rows accumulate when tables lack unique constraints or data is imported from external sources without deduplication. PostgreSQL offers three main approaches to find and remove duplicates: a self-join using DELETE USING, a ROW_NUMBER() window function subquery, and an immediate table technique. All three should be wrapped in a transaction so you can ROLLBACK if something goes wrong.

Finding Duplicate Rows First

Before deleting, identify which values are duplicated and how many copies exist:

CREATE TABLE shipments (
  id          SERIAL PRIMARY KEY,
  tracking_no VARCHAR(50) NOT NULL,
  carrier     VARCHAR(50) NOT NULL,
  destination VARCHAR(150) NOT NULL
);

INSERT INTO shipments (tracking_no, carrier, destination)
VALUES
  ('TRK-1001', 'FastFreight', 'Berlin'),
  ('TRK-1001', 'FastFreight', 'Berlin'),   -- duplicate
  ('TRK-2050', 'SkyMove',     'Tokyo'),
  ('TRK-2050', 'SkyMove',     'Tokyo'),    -- duplicate
  ('TRK-2050', 'SkyMove',     'Tokyo'),    -- duplicate again
  ('TRK-3099', 'RailEx',      'Paris');

Find duplicated tracking numbers:

SELECT tracking_no, COUNT(*) AS copies
FROM shipments
GROUP BY tracking_no
HAVING COUNT(*) > 1
ORDER BY tracking_no;

 tracking_no | copies
-------------+--------
 TRK-1001    |      2
 TRK-2050    |      3
(2 rows)

Method 1: DELETE USING (Self-Join)

Join the table to itself and delete the row with the lower ID, keeping the highest ID for each duplicate group:

DELETE FROM shipments a
USING shipments b
WHERE a.id < b.id
  AND a.tracking_no = b.tracking_no
  AND a.carrier = b.carrier;

Output:

DELETE 3

To keep the lowest ID instead, flip the operator:

DELETE FROM shipments a
USING shipments b
WHERE a.id > b.id
  AND a.tracking_no = b.tracking_no
  AND a.carrier = b.carrier;

Method 2: ROW_NUMBER() Subquery

Use ROW_NUMBER() partitioned by the duplicate columns. Rows numbered greater than 1 are duplicates:

DELETE FROM shipments
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (
             PARTITION BY tracking_no, carrier
             ORDER BY id
           ) AS row_num
    FROM shipments
  ) ranked
  WHERE row_num > 1
);

To keep the highest ID, change ORDER BY id to ORDER BY id DESC inside the window function.

For duplicates defined across multiple columns, include all of them in PARTITION BY:

DELETE FROM shipments
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (
             PARTITION BY tracking_no, carrier, destination
             ORDER BY id
           ) AS row_num
    FROM shipments
  ) ranked
  WHERE row_num > 1
);

Preview the rows that would be deleted before running the DELETE:

SELECT id, tracking_no, carrier, destination
FROM (
  SELECT id, tracking_no, carrier, destination,
         ROW_NUMBER() OVER (
           PARTITION BY tracking_no, carrier
           ORDER BY id
         ) AS row_num
  FROM shipments
) ranked
WHERE row_num > 1;

Method 3: Immediate Table

When the table has no usable unique identifier, create a deduplicated copy:

-- 1. Create a new table with the same structure
CREATE TABLE shipments_dedup (LIKE shipments INCLUDING ALL);

-- 2. Insert only the first row per duplicate group
INSERT INTO shipments_dedup (tracking_no, carrier, destination)
SELECT DISTINCT ON (tracking_no, carrier) tracking_no, carrier, destination
FROM shipments
ORDER BY tracking_no, carrier, id;

-- 3. Rename the tables (swap them)
ALTER TABLE shipments RENAME TO shipments_old;
ALTER TABLE shipments_dedup RENAME TO shipments;

-- 4. Drop the original after verifying the result
DROP TABLE shipments_old;

Note: this method fails if other tables have foreign keys referencing the original table. Use methods 1 or 2 in that case.

Testing with Vela

Deduplication is irreversible once committed. Use Vela database branching to run the deduplication logic against a copy of your production data first. Preview the DELETE with a SELECT, execute on the branch, verify the result set, and then apply the same SQL to production. After deduplication, add a UNIQUE constraint on the relevant columns so duplicates cannot accumulate again.

Production Tips

Always run a SELECT with the same deduplication logic first to confirm which row IDs will be deleted.
Wrap the DELETE in an explicit transaction so you can ROLLBACK if the result is unexpected.
After deduplication, add a UNIQUE constraint to prevent future duplicates: ALTER TABLE shipments ADD CONSTRAINT shipments_tracking_carrier_unique UNIQUE (tracking_no, carrier);
The self-join method is typically faster on small-to-medium tables. The ROW_NUMBER() subquery is more flexible for complex duplicate definitions spanning many columns.

For very large tables, chunk the delete using a CTE with LIMIT to avoid long lock contention:

WITH to_delete AS (
  SELECT a.id FROM shipments a
  JOIN shipments b ON a.tracking_no = b.tracking_no
    AND a.carrier = b.carrier AND a.id < b.id
  LIMIT 10000
)
DELETE FROM shipments WHERE id IN (SELECT id FROM to_delete);

If the immediate table method is used, recreate all indexes and constraints on the new table before renaming.

Frequently Asked Questions

What does deleting duplicate rows mean in PostgreSQL?

Deleting duplicate rows means removing all but one copy of rows that share identical values in one or more columns. PostgreSQL does not automatically prevent duplicates unless you have a UNIQUE constraint, so duplicates can accumulate through bulk imports or missing constraints.

Does deleting duplicates lock the table?

Yes. The DELETE statement acquires a ROW EXCLUSIVE lock on the table and row-level locks on each deleted row. For large deduplication operations, consider running during a maintenance window or batching the deletes to limit lock duration.

What happens if the table has no primary key or unique ID column?

Without a unique row identifier you cannot use the self-join method reliably. Use the immediate table technique instead: create a new table with the same structure, INSERT DISTINCT rows into it, DROP the original, and RENAME the copy.

Can I preview which rows will be deleted before running the DELETE?

Yes. Replace DELETE with SELECT in the self-join or subquery to see exactly which row IDs would be deleted. Always do this before executing the destructive statement, especially on large tables.

What is the safest way to delete duplicates in production?

Preview with SELECT first, wrap the DELETE in a transaction so you can ROLLBACK if the result is unexpected, then add a UNIQUE constraint afterward to prevent duplicates from accumulating again.