PostgreSQL’s query planner makes decisions—which index to use, which join algorithm to choose, how much memory to allocate for a sort—based on statistics about your data. The ANALYZE command collects these statistics by sampling a fraction of each table and storing the results in the pg_statistic system catalog. Outdated statistics cause the planner to choose suboptimal plans; fresh statistics enable the optimal ones.
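A quick way to see those statistics at work is to compare the planner's row estimate for a filter before and after ANALYZE. A minimal sketch (it uses the `orders` table built in the example below; the exact `rows=` figure depends on your data):

```sql
-- The rows= estimate in the plan comes straight from pg_statistic.
-- Before ANALYZE has run it is a default guess; afterwards it tracks the data.
EXPLAIN SELECT * FROM orders WHERE status = 'cancelled';
--  Seq Scan on orders  (cost=... rows=... width=...)
--    Filter: ((status)::text = 'cancelled'::text)
```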
Syntax
-- Analyze a specific table
ANALYZE orders;
-- Analyze specific columns (faster than full table)
ANALYZE orders (product_id, status, placed_at);
-- Analyze all tables in the current database
ANALYZE;
-- Show detailed progress during analysis
ANALYZE VERBOSE orders;
ANALYZE takes only a ShareUpdateExclusiveLock on the table while sampling, so it does not block concurrent reads or writes; it does conflict with VACUUM and other ANALYZE runs on the same table.
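The lock level can be verified empirically. A sketch, assuming `ANALYZE orders;` is running in a second session:

```sql
-- List locks currently held on the orders table; a running ANALYZE
-- appears as ShareUpdateExclusiveLock, which allows concurrent
-- SELECT/INSERT/UPDATE/DELETE but conflicts with VACUUM and other ANALYZEs.
SELECT l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'orders';
```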
Practical Example
Set up an orders table, load data, and inspect what statistics ANALYZE collects:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
product_id INTEGER NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total_amount NUMERIC(10,2) NOT NULL,
placed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Simulate a bulk import
INSERT INTO orders (product_id, status, total_amount, placed_at)
SELECT
    (random() * 499 + 1)::INT,
    -- draw one random value per row so the split stays ~60/30/10;
    -- calling random() in each WHEN independently would skew it to ~60/36/4
    CASE WHEN g.r < 0.6 THEN 'completed'
         WHEN g.r < 0.9 THEN 'pending'
         ELSE 'cancelled' END,
    (random() * 500 + 5)::NUMERIC(10,2),
    now() - (random() * 365 || ' days')::INTERVAL
FROM (SELECT random() AS r FROM generate_series(1, 50000)) AS g;
-- After the bulk insert, refresh statistics
ANALYZE orders;
Query the statistics that ANALYZE stored for the status column:
SELECT
attname AS column_name,
n_distinct,
most_common_vals,
most_common_freqs,
correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'status';
Key statistics fields:
| Field | Meaning |
|---|---|
| `n_distinct` | Estimated number of distinct values. Negative = fraction of total rows (e.g., -0.003 means distinct values ≈ 0.3% of row count) |
| `most_common_vals` | The most frequent values in the column |
| `most_common_freqs` | Frequency of each most-common value (used for equality filter estimates) |
| `histogram_bounds` | Bucket boundaries used to estimate range-query selectivity |
| `correlation` | Correlation between physical and logical row order. Near ±1.0 = physically sorted; affects index scan cost estimates |
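These fields feed directly into the planner's selectivity math: for an equality filter on a value that appears in `most_common_vals`, the estimated row count is roughly the matching `most_common_freqs` entry times the table's row count. A sketch to check that by hand:

```sql
-- Multiply the frequency for a given status by reltuples and compare
-- the product to the rows= estimate EXPLAIN shows for that filter.
SELECT s.most_common_vals,
       s.most_common_freqs,
       c.reltuples
FROM pg_stats s
JOIN pg_class c ON c.relname = s.tablename
WHERE s.tablename = 'orders'
  AND s.attname = 'status';
```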
Controlling Statistics Detail
The default statistics target is 100 (set by default_statistics_target): ANALYZE stores up to 100 most-common values and 100 histogram buckets per column, sampling roughly 300 × target rows per table. For columns with skewed distributions or high cardinality that appear in WHERE or JOIN conditions, raise the target:
-- Increase statistics detail for product_id
ALTER TABLE orders ALTER COLUMN product_id SET STATISTICS 500;
-- Apply the new target immediately
ANALYZE orders (product_id);
-- Confirm the updated statistics
SELECT attname, n_distinct, array_length(histogram_bounds, 1) AS histogram_buckets
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'product_id';
Higher targets improve plan quality at the cost of slightly more time to run ANALYZE and more space in pg_statistic.
Monitoring Autovacuum and ANALYZE Activity
The autovacuum daemon runs ANALYZE automatically when enough rows have changed. Monitor its activity:
SELECT
schemaname,
relname AS table_name,
last_analyze,
last_autoanalyze,
n_live_tup AS live_rows,
n_mod_since_analyze AS rows_since_last_analyze
FROM pg_stat_user_tables
ORDER BY rows_since_last_analyze DESC NULLS LAST;
If last_autoanalyze is NULL or very old on an active table, autovacuum may be disabled or its threshold may need tuning:
-- Trigger autovacuum analyze more aggressively on a high-write table
ALTER TABLE orders SET (
autovacuum_analyze_scale_factor = 0.01, -- 1% row changes triggers analyze
autovacuum_analyze_threshold = 50
);
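Autovacuum triggers an analyze when the modified-row count exceeds `autovacuum_analyze_threshold + autovacuum_analyze_scale_factor × reltuples`. With the per-table settings above, a 50,000-row table is re-analyzed after about 50 + 0.01 × 50,000 = 550 modified rows, versus roughly 5,050 with the default 0.1 scale factor. You can check how close a table is to its trigger point:

```sql
-- Compare rows modified since the last analyze against the trigger point
-- implied by the settings above (threshold 50, scale factor 0.01).
SELECT relname,
       n_mod_since_analyze,
       50 + 0.01 * n_live_tup AS approx_analyze_trigger
FROM pg_stat_user_tables
WHERE relname = 'orders';
```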
When to Run ANALYZE Manually
- After a bulk `INSERT`, `COPY`, or large `UPDATE` that modifies a large fraction of a table's rows.
- After creating an index on an expression: the planner keeps separate statistics for each indexed expression, and they are collected only at the next ANALYZE.
- When `EXPLAIN ANALYZE` shows large discrepancies between estimated and actual row counts.
- After restoring a database from a logical backup (`pg_dump`/`pg_restore`): statistics are not included in the dump, so the planner has none for the restored tables until ANALYZE runs.
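The estimate-versus-actual check looks like this in practice; a sketch using the `orders` table:

```sql
-- rows=... in the top plan line is the planner's estimate;
-- actual rows=... is what the execution really produced.
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE placed_at > now() - INTERVAL '30 days';
-- If the two differ by an order of magnitude, run ANALYZE orders;
-- and re-run: the estimate should move toward the actual count.
```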
Testing with Vela
Statistics drift is hard to test in staging environments that have different data volumes than production. On a Vela branch, you can run a bulk import against a production-scale clone, immediately run ANALYZE, and verify with EXPLAIN ANALYZE that the planner’s row estimates match real counts—all without touching the live database.
Production Tips
- Run `VACUUM ANALYZE table_name` to reclaim dead-tuple space and update statistics in one pass; this is more efficient than running them separately.
- After a major bulk load (`COPY`, `INSERT ... SELECT`), do not wait for autovacuum; run `ANALYZE` immediately so queries over the new data get good plans from the start.
- Raising `default_statistics_target` globally (in `postgresql.conf`) increases plan quality across all tables at the cost of slower `ANALYZE` runs. Target specific columns with `SET STATISTICS` instead.
- Use `ANALYZE VERBOSE` in development to see which tables and columns are sampled and how long each takes; useful for budgeting `ANALYZE` time in maintenance windows.
- The `pg_stats` view is a friendlier interface to `pg_statistic` and is the best place to inspect statistics for a specific column before opening a tuning investigation.