PostgreSQL's query planner makes decisions—which index to use, which join algorithm to choose, how much memory to allocate for a sort—based on statistics about your data. The ANALYZE command collects these statistics by sampling a fraction of each table and storing the results in the pg_statistic system catalog. Outdated statistics lead to bad plans; fresh statistics lead to optimal ones.
Basic ANALYZE syntax
-- Analyze a specific table
ANALYZE orders;
-- Analyze specific columns (faster than full table analyze)
ANALYZE orders (customer_id, status, placed_at);
-- Analyze all tables in the current database
ANALYZE;
-- Analyze with verbose progress output
ANALYZE VERBOSE orders;
ANALYZE does not lock the table for reads or writes—it takes only a brief share lock while sampling, making it safe to run on production tables.
What statistics ANALYZE collects
For each analyzed column, PostgreSQL stores information in pg_statistic (summarized in the more readable pg_stats view):
SELECT
attname AS column_name,
n_distinct,
correlation,
most_common_vals,
most_common_freqs
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'status';
Key statistics fields:
- n_distinct: estimated number of distinct values. Negative means it's a fraction of total rows (e.g., -0.01 = 1% distinct).
- most_common_vals / most_common_freqs: the most frequent values and their frequencies—used for selectivity estimates on equality filters.
- histogram_bounds: bucket boundaries for range queries—used when a value does not appear in most_common_vals.
- correlation: physical-to-logical row order correlation. Values near 1.0 mean the column is physically sorted, which makes index scans more efficient.
Controlling statistics detail
The default statistics target is 100 histogram buckets per column (controlled by default_statistics_target). For columns with skewed distributions or high cardinality, increase the target to give the planner more detail:
-- Increase statistics detail for a specific column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
-- Apply the new target
ANALYZE orders (customer_id);
Higher targets improve plan quality at the cost of slightly more time to run ANALYZE and more space in pg_statistic.
Autovacuum and automatic ANALYZE
PostgreSQL's autovacuum daemon runs ANALYZE automatically when a table has changed enough rows (default: 20% of row count + 50 rows). You can view autovacuum activity:
SELECT
schemaname,
relname,
last_analyze,
last_autoanalyze,
n_live_tup,
n_dead_tup
FROM pg_stat_user_tables
ORDER BY last_autoanalyze DESC NULLS LAST;
If last_autoanalyze is null or very old on an active table, autovacuum may be disabled or the table's autovacuum settings may need tuning.
When to run ANALYZE manually
- After a bulk
INSERT,COPY, orUPDATEthat changes a large fraction of a table's rows. - After creating a new index, to ensure the planner has fresh statistics for index selectivity estimates.
- When
EXPLAIN ANALYZEshows large discrepancies between estimated and actual row counts. - After restoring a database from backup—pg_statistic is included in pg_dump, but may be stale.