pg_stat_statements in PostgreSQL | PostgreSQL Tutorial

The pg_stat_statements extension records query-level performance data for every database on a PostgreSQL instance. Every time a query runs, its statistics are aggregated by a normalized query fingerprint—literals are replaced with parameters, so queries that differ only in their values are counted together. This makes it straightforward to identify which query patterns consume the most time across millions of executions.

Enabling pg_stat_statements

The extension ships with PostgreSQL but must be explicitly loaded. Add it to postgresql.conf and create the extension in the target database:

-- In postgresql.conf (requires a PostgreSQL restart)
shared_preload_libraries = 'pg_stat_statements'

-- After restarting, create the extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Verify that it is active:

SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';

Finding the Slowest Queries by Mean Execution Time

These are the queries where each individual call is expensive—prime candidates for index improvements or query rewrites:

SELECT
  LEFT(query, 100)                          AS query_snippet,
  calls,
  ROUND(mean_exec_time::NUMERIC, 2)         AS mean_ms,
  ROUND(total_exec_time::NUMERIC, 2)        AS total_ms,
  rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

Finding Queries That Consume the Most Total Time

A fast query called millions of times can dominate overall database load. Sort by total execution time to find the biggest contributors:

SELECT
  LEFT(query, 100)                                                 AS query_snippet,
  calls,
  ROUND(total_exec_time::NUMERIC, 2)                               AS total_ms,
  ROUND(mean_exec_time::NUMERIC, 2)                                AS mean_ms,
  ROUND((total_exec_time / SUM(total_exec_time) OVER ()) * 100, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Identifying Cache-Unfriendly Queries

Queries with low buffer cache hit rates are reading data from disk. Adding an index or increasing shared_buffers can help:

SELECT
  LEFT(query, 100)                 AS query_snippet,
  calls,
  shared_blks_read                 AS disk_reads,
  shared_blks_hit                  AS cache_hits,
  ROUND(
    shared_blks_hit::NUMERIC /
    NULLIF(shared_blks_hit + shared_blks_read, 0) * 100, 1
  )                                AS cache_hit_pct
FROM pg_stat_statements
WHERE shared_blks_read > 0
ORDER BY shared_blks_read DESC
LIMIT 20;

A cache_hit_pct below 90% on a frequently executed query is worth investigating.

Identifying High-Variability Queries

Queries with high standard deviation relative to their mean may have data-dependent performance issues—for example, queries that work well for most values but fall back to a sequential scan for a few popular ones:

SELECT
  LEFT(query, 100)                                   AS query_snippet,
  calls,
  ROUND(mean_exec_time::NUMERIC, 2)                  AS mean_ms,
  ROUND(stddev_exec_time::NUMERIC, 2)                AS stddev_ms,
  ROUND(max_exec_time::NUMERIC, 2)                   AS max_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY stddev_exec_time DESC
LIMIT 20;

Resetting Statistics

-- Reset all query statistics (superuser only)
SELECT pg_stat_statements_reset();

-- Reset a single query by queryid (PostgreSQL 14+)
SELECT pg_stat_statements_reset(userid, dbid, queryid)
FROM pg_stat_statements
WHERE query LIKE '%orders%'
LIMIT 1;

Reset after a tuning effort to get a clean baseline for the next measurement period.

Testing with Vela

pg_stat_statements data is most valuable when collected against a production-representative workload. Create a Vela branch cloned from production, run your actual application traffic or a replay tool against the branch, and query pg_stat_statements on the branch to identify which queries dominate load before making any index or schema changes. This approach lets you measure the impact of each optimization in isolation without affecting production query plans or statistics.

Production Tips

Set pg_stat_statements.track = all in postgresql.conf to also capture queries executed inside stored procedures and functions—not just top-level statements.
The query column normalizes literals. Use queryid for programmatic lookups to join with other monitoring data.
Statistics persist across connections but reset on server restart unless pg_stat_statements.save = on (default is on).
Combine pg_stat_statements with EXPLAIN ANALYZE on the top offenders to pinpoint exactly which plan node consumes the most time.
Check pg_stat_statements.max (default 5000) if the view seems to be missing queries—when the limit is reached, the least-used entries are evicted. Increase it for high-cardinality workloads.

Frequently Asked Questions

What is pg_stat_statements in PostgreSQL?

pg_stat_statements is a built-in PostgreSQL extension that tracks cumulative execution statistics for every distinct SQL query pattern—call count, total and mean execution time, rows processed, and buffer usage. It is the primary tool for identifying slow or high-load queries in production without any application-level instrumentation.

How do I enable pg_stat_statements?

Add pg_stat_statements to shared_preload_libraries in postgresql.conf and restart PostgreSQL. Then run CREATE EXTENSION IF NOT EXISTS pg_stat_statements in the database where you want to query the statistics.

How are query texts normalized in pg_stat_statements?

Literal values (numbers, strings, timestamps) in queries are replaced with parameter placeholders ($1, $2, ...) so that queries differing only in their literal values are grouped under the same fingerprint. This means two queries identical except for their WHERE clause values count as a single entry.

How do I find which queries use the most total database time?

Query pg_stat_statements ordered by total_exec_time DESC. This surfaces the queries with the highest cumulative execution time. These are the best optimization targets because improving them reduces overall database load the most, even if individual executions are fast.

When should I reset pg_stat_statements?

Reset statistics with pg_stat_statements_reset() after significant schema changes, after deploying optimizations, or at the start of a load test to get a clean baseline. Without resetting, stale entries from retired queries continue to appear and skew your analysis.