PostgreSQL’s query planner makes decisions—which index to use, which join algorithm to choose, how much memory to allocate for a sort—based on statistics about your data. The ANALYZE command collects these statistics by sampling a fraction of each table and storing the results in the pg_statistic system catalog. Outdated statistics cause the planner to choose suboptimal plans; fresh statistics enable the optimal ones.
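A quick way to see those statistics at work is to compare the planner's row estimate for a filter before and after ANALYZE. A minimal sketch (it uses the `orders` table built in the example below; the exact `rows=` figure depends on your data):

```sql
-- The rows= estimate in the plan comes straight from pg_statistic.
-- Before ANALYZE has run it is a default guess; afterwards it tracks the data.
EXPLAIN SELECT * FROM orders WHERE status = 'cancelled';
--  Seq Scan on orders  (cost=... rows=... width=...)
--    Filter: ((status)::text = 'cancelled'::text)
```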
Syntax
-- Analyze a specific table
ANALYZE orders;
-- Analyze specific columns (faster than full table)
ANALYZE orders (product_id, status, placed_at);
-- Analyze all tables in the current database
ANALYZE;
-- Show detailed progress during analysis
ANALYZE VERBOSE orders;
ANALYZE takes only a ShareUpdateExclusiveLock on the table while sampling, so it does not block concurrent reads or writes; it does conflict with VACUUM and other ANALYZE runs on the same table.
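The lock level can be verified empirically. A sketch, assuming `ANALYZE orders;` is running in a second session:

```sql
-- List locks currently held on the orders table; a running ANALYZE
-- appears as ShareUpdateExclusiveLock, which allows concurrent
-- SELECT/INSERT/UPDATE/DELETE but conflicts with VACUUM and other ANALYZEs.
SELECT l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'orders';
```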
Practical Example
Set up an orders table, load data, and inspect what statistics ANALYZE collects:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
product_id INTEGER NOT NULL,
status VARCHAR(20) NOT NULL DEFAULT 'pending',
total_amount NUMERIC(10,2) NOT NULL,
placed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Simulate a bulk import
INSERT INTO orders (product_id, status, total_amount, placed_at)
SELECT
    (random() * 499 + 1)::INT,
    -- draw one random value per row so the split stays ~60/30/10;
    -- calling random() in each WHEN independently would skew it to ~60/36/4
    CASE WHEN g.r < 0.6 THEN 'completed'
         WHEN g.r < 0.9 THEN 'pending'
         ELSE 'cancelled' END,
    (random() * 500 + 5)::NUMERIC(10,2),
    now() - (random() * 365 || ' days')::INTERVAL
FROM (SELECT random() AS r FROM generate_series(1, 50000)) AS g;
-- After the bulk insert, refresh statistics
ANALYZE orders;
Query the statistics that ANALYZE stored for the status column:
SELECT
attname AS column_name,
n_distinct,
most_common_vals,
most_common_freqs,
correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'status';
Key statistics fields:
| Field | Meaning |
|---|---|
| `n_distinct` | Estimated number of distinct values. Negative = fraction of total rows (e.g., -0.003 means distinct values ≈ 0.3% of row count) |
| `most_common_vals` | The most frequent values in the column |
| `most_common_freqs` | Frequency of each most-common value (used for equality filter estimates) |
| `histogram_bounds` | Bucket boundaries used to estimate range-query selectivity |
| `correlation` | Correlation between physical and logical row order. Near ±1.0 = physically sorted; affects index scan cost estimates |
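These fields feed directly into the planner's selectivity math: for an equality filter on a value that appears in `most_common_vals`, the estimated row count is roughly the matching `most_common_freqs` entry times the table's row count. A sketch to check that by hand:

```sql
-- Multiply the frequency for a given status by reltuples and compare
-- the product to the rows= estimate EXPLAIN shows for that filter.
SELECT s.most_common_vals,
       s.most_common_freqs,
       c.reltuples
FROM pg_stats s
JOIN pg_class c ON c.relname = s.tablename
WHERE s.tablename = 'orders'
  AND s.attname = 'status';
```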
Controlling Statistics Detail
The default statistics target is 100 (set by default_statistics_target): ANALYZE stores up to 100 most-common values and 100 histogram buckets per column, sampling roughly 300 × target rows per table. For columns with skewed distributions or high cardinality that appear in WHERE or JOIN conditions, raise the target:
-- Increase statistics detail for product_id
ALTER TABLE orders ALTER COLUMN product_id SET STATISTICS 500;
-- Apply the new target immediately
ANALYZE orders (product_id);
-- Confirm the updated statistics
SELECT attname, n_distinct, array_length(histogram_bounds, 1) AS histogram_buckets
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'product_id';
Higher targets improve plan quality at the cost of slightly more time to run ANALYZE and more space in pg_statistic.
Monitoring Autovacuum and ANALYZE Activity
The autovacuum daemon runs ANALYZE automatically when enough rows have changed. Monitor its activity:
SELECT
schemaname,
relname AS table_name,
last_analyze,
last_autoanalyze,
n_live_tup AS live_rows,
n_mod_since_analyze AS rows_since_last_analyze
FROM pg_stat_user_tables
ORDER BY rows_since_last_analyze DESC NULLS LAST;
If last_autoanalyze is NULL or very old on an active table, autovacuum may be disabled or its threshold may need tuning:
-- Trigger autovacuum analyze more aggressively on a high-write table
ALTER TABLE orders SET (
autovacuum_analyze_scale_factor = 0.01, -- 1% row changes triggers analyze
autovacuum_analyze_threshold = 50
);
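Autovacuum triggers an analyze when the modified-row count exceeds `autovacuum_analyze_threshold + autovacuum_analyze_scale_factor × reltuples`. With the per-table settings above, a 50,000-row table is re-analyzed after about 50 + 0.01 × 50,000 = 550 modified rows, versus roughly 5,050 with the default 0.1 scale factor. You can check how close a table is to its trigger point:

```sql
-- Compare rows modified since the last analyze against the trigger point
-- implied by the settings above (threshold 50, scale factor 0.01).
SELECT relname,
       n_mod_since_analyze,
       50 + 0.01 * n_live_tup AS approx_analyze_trigger
FROM pg_stat_user_tables
WHERE relname = 'orders';
```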
When to Run ANALYZE Manually
- After a bulk `INSERT`, `COPY`, or large `UPDATE` that modifies a large fraction of a table's rows.
- After creating an index on an expression: the planner keeps separate statistics for each indexed expression, and they are collected only at the next ANALYZE.
- When `EXPLAIN ANALYZE` shows large discrepancies between estimated and actual row counts.
- After restoring a database from a logical backup (`pg_dump`/`pg_restore`): statistics are not included in the dump, so the planner has none for the restored tables until ANALYZE runs.
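The estimate-versus-actual check looks like this in practice; a sketch using the `orders` table:

```sql
-- rows=... in the top plan line is the planner's estimate;
-- actual rows=... is what the execution really produced.
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE placed_at > now() - INTERVAL '30 days';
-- If the two differ by an order of magnitude, run ANALYZE orders;
-- and re-run: the estimate should move toward the actual count.
```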
Testing with Vela
Statistics drift is hard to test in staging environments that have different data volumes than production. On a Vela branch, you can run a bulk import against a production-scale clone, immediately run ANALYZE, and verify with EXPLAIN ANALYZE that the planner’s row estimates match real counts—all without touching the live database.
Production Tips
- Run `VACUUM ANALYZE table_name` to reclaim dead-tuple space and update statistics in one pass; this is more efficient than running them separately.
- After a major bulk load (`COPY`, `INSERT ... SELECT`), do not wait for autovacuum; run `ANALYZE` immediately so queries over the new data get good plans from the start.
- Raising `default_statistics_target` globally (in `postgresql.conf`) increases plan quality across all tables at the cost of slower `ANALYZE` runs. Target specific columns with `SET STATISTICS` instead.
- Use `ANALYZE VERBOSE` in development to see which tables and columns are sampled and how long each takes; useful for budgeting `ANALYZE` time in maintenance windows.
- The `pg_stats` view is a friendlier interface to `pg_statistic` and is the best place to inspect statistics for a specific column before opening a tuning investigation.