ANALYZE Command

Learn how the PostgreSQL ANALYZE command collects table statistics stored in pg_statistic, how autovacuum runs it automatically, and when to run it manually for accurate query plans.

5 min read · Last updated: March 2026 · Back to overview

Quick Answer

ANALYZE collects statistics about table column distributions and stores them in pg_statistic. The query planner uses these statistics to choose efficient execution plans. PostgreSQL runs ANALYZE automatically via autovacuum, but you should run it manually after bulk data loads or when query plans look wrong.

Spin up a Postgres database in 20 seconds with Vela.

Try Vela Sandbox

PostgreSQL's query planner makes decisions—which index to use, which join algorithm to choose, how much memory to allocate for a sort—based on statistics about your data. The ANALYZE command collects these statistics by sampling a fraction of each table and storing the results in the pg_statistic system catalog. Outdated statistics lead to bad plans; fresh statistics lead to optimal ones.

Basic ANALYZE syntax

-- Analyze a specific table
ANALYZE orders;

-- Analyze specific columns (faster than full table analyze)
ANALYZE orders (customer_id, status, placed_at);

-- Analyze all tables in the current database
ANALYZE;

-- Analyze with verbose progress output
ANALYZE VERBOSE orders;

ANALYZE does not lock the table for reads or writes—it takes only a brief share lock while sampling, making it safe to run on production tables.

What statistics ANALYZE collects

For each analyzed column, PostgreSQL stores information in pg_statistic (summarized in the more readable pg_stats view):

SELECT
  attname       AS column_name,
  n_distinct,
  correlation,
  most_common_vals,
  most_common_freqs
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'status';

Key statistics fields:

  • n_distinct: estimated number of distinct values. Negative means it's a fraction of total rows (e.g., -0.01 = 1% distinct).
  • most_common_vals / most_common_freqs: the most frequent values and their frequencies—used for selectivity estimates on equality filters.
  • histogram_bounds: bucket boundaries for range queries—used when a value does not appear in most_common_vals.
  • correlation: physical-to-logical row order correlation. Values near 1.0 mean the column is physically sorted, which makes index scans more efficient.

Controlling statistics detail

The default statistics target is 100 histogram buckets per column (controlled by default_statistics_target). For columns with skewed distributions or high cardinality, increase the target to give the planner more detail:

-- Increase statistics detail for a specific column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;

-- Apply the new target
ANALYZE orders (customer_id);

Higher targets improve plan quality at the cost of slightly more time to run ANALYZE and more space in pg_statistic.

Autovacuum and automatic ANALYZE

PostgreSQL's autovacuum daemon runs ANALYZE automatically when a table has changed enough rows (default: 20% of row count + 50 rows). You can view autovacuum activity:

SELECT
  schemaname,
  relname,
  last_analyze,
  last_autoanalyze,
  n_live_tup,
  n_dead_tup
FROM pg_stat_user_tables
ORDER BY last_autoanalyze DESC NULLS LAST;

If last_autoanalyze is null or very old on an active table, autovacuum may be disabled or the table's autovacuum settings may need tuning.

When to run ANALYZE manually

  • After a bulk INSERT, COPY, or UPDATE that changes a large fraction of a table's rows.
  • After creating a new index, to ensure the planner has fresh statistics for index selectivity estimates.
  • When EXPLAIN ANALYZE shows large discrepancies between estimated and actual row counts.
  • After restoring a database from backup—pg_statistic is included in pg_dump, but may be stale.

Continue in Performance: VACUUM Command.

Related in this section: EXPLAIN · EXPLAIN ANALYZE · Indexes and Index-Only Scans

Frequently Asked Questions

What does ANALYZE do in PostgreSQL?

ANALYZE samples rows from a table and computes statistics about column value distributions—distinct value counts, most common values, histogram bounds, and physical sort correlation. The query planner reads these statistics from pg_statistic to estimate row counts and choose optimal execution plans.

How is ANALYZE different from VACUUM?

ANALYZE collects statistics for the query planner. VACUUM reclaims storage from dead rows created by UPDATE and DELETE operations and updates the visibility map used by index-only scans. VACUUM ANALYZE does both in one pass.

Does autovacuum run ANALYZE automatically?

Yes. The autovacuum daemon runs ANALYZE automatically when a table has accumulated enough changed rows (by default, 20% of its row count plus 50 rows). You can monitor autovacuum activity through pg_stat_user_tables.

When should I increase the statistics target for a column?

Increase the statistics target with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS N when EXPLAIN ANALYZE shows large row-count misestimates on a column, especially for highly skewed data or high-cardinality columns used in WHERE clauses or JOINs. Common values are 200–500.

Is ANALYZE safe to run on a production database?

Yes. ANALYZE takes only a brief share lock during the sampling phase and does not block reads or writes. It is designed to run concurrently with normal database activity. Running it on individual tables or columns further limits the scope and duration.