The GROUP BY clause in PostgreSQL collapses rows that share the same values in one or more columns into a single summary row. It is the foundation of aggregate reporting — without GROUP BY, aggregate functions like COUNT or SUM collapse the entire table into one row. With it, you get one row per unique grouping.
Syntax
SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1
ORDER BY column1;
Every column in the SELECT list that is not wrapped in an aggregate function must appear in the GROUP BY clause (PostgreSQL relaxes this rule for columns that are functionally dependent on a grouped primary key). You can group by multiple columns, by expressions, or by the ordinal position of a SELECT-list item.
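For example, grouping by an expression works as long as GROUP BY repeats it exactly. A sketch against a hypothetical `orders` table with a `created_at` timestamp and a `status` column:

```sql
-- Hypothetical table: orders(created_at timestamptz, status text)
SELECT
    EXTRACT(year FROM created_at) AS order_year,  -- expression in the SELECT list
    status,                                       -- plain column
    COUNT(*) AS order_count
FROM orders
GROUP BY EXTRACT(year FROM created_at), status;   -- expression repeated verbatim
```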
Practical Example
Create a simple sales schema and load some data:
CREATE TABLE products (
    product_id   SERIAL PRIMARY KEY,
    product_name VARCHAR NOT NULL,
    category     VARCHAR NOT NULL,
    price        NUMERIC(10, 2) NOT NULL
);

CREATE TABLE order_lines (
    line_id    SERIAL PRIMARY KEY,
    product_id INT NOT NULL REFERENCES products,
    quantity   INT NOT NULL,
    sold_at    DATE NOT NULL
);
INSERT INTO products (product_name, category, price) VALUES
('Widget A', 'Hardware', 12.50),
('Widget B', 'Hardware', 8.00),
('Gadget X', 'Electronics', 49.99),
('Gadget Y', 'Electronics', 29.99),
('Service Pack', 'Services', 99.00);
INSERT INTO order_lines (product_id, quantity, sold_at) VALUES
(1, 3, '2025-01-05'), (2, 10, '2025-01-07'),
(3, 1, '2025-01-10'), (4, 2, '2025-01-15'),
(1, 5, '2025-02-03'), (3, 4, '2025-02-18'),
(5, 1, '2025-02-20'), (2, 7, '2025-03-01');
Count order lines and units sold per product category. Note that AVG(p.price) here averages the price across order lines, so products that appear on more lines carry more weight:
SELECT
    p.category,
    COUNT(ol.line_id) AS order_count,
    SUM(ol.quantity) AS units_sold,
    ROUND(AVG(p.price), 2) AS avg_price
FROM order_lines ol
JOIN products p ON p.product_id = ol.product_id
GROUP BY p.category
ORDER BY units_sold DESC;
Expected result:
  category   | order_count | units_sold | avg_price
-------------+-------------+------------+-----------
 Hardware    |           4 |         25 |     10.25
 Electronics |           3 |          7 |     43.32
 Services    |           1 |          1 |     99.00
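To filter on an aggregate, add a HAVING clause after GROUP BY. With the sample data, this keeps only categories that sold more than five units:

```sql
SELECT
    p.category,
    SUM(ol.quantity) AS units_sold
FROM order_lines ol
JOIN products p ON p.product_id = ol.product_id
GROUP BY p.category
HAVING SUM(ol.quantity) > 5
ORDER BY units_sold DESC;
-- Returns Hardware (25) and Electronics (7); Services (1 unit) is filtered out.
```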
Group by month and category together to get a time-series breakdown:
SELECT
    date_trunc('month', ol.sold_at) AS month,
    p.category,
    SUM(ol.quantity * p.price) AS revenue
FROM order_lines ol
JOIN products p ON p.product_id = ol.product_id
GROUP BY 1, 2
ORDER BY 1, revenue DESC;
Using the ordinal positions 1, 2 is a PostgreSQL convenience: it is exactly equivalent to listing the full expressions, and it spares you repeating long expressions such as the date_trunc call.
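Written out in full, the ordinal form above is equivalent to:

```sql
GROUP BY date_trunc('month', ol.sold_at), p.category
```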
Cross-check the grouping with a LEFT JOIN from products, which would keep categories even if they had no order lines at all:
SELECT
    p.category,
    COUNT(DISTINCT p.product_id) AS distinct_products,
    COUNT(ol.line_id) AS total_orders
FROM products p
LEFT JOIN order_lines ol ON ol.product_id = p.product_id
GROUP BY p.category;
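With the sample data every category has at least one sale, so the LEFT JOIN behaves like an inner join here. Each category should report (group order may vary without an ORDER BY):

  category   | distinct_products | total_orders
-------------+-------------------+--------------
 Electronics |                 2 |            3
 Hardware    |                 2 |            4
 Services    |                 1 |            1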
Aggregate Functions Reference
| Function | Returns |
|---|---|
| `COUNT(*)` | Total rows in the group |
| `COUNT(col)` | Non-NULL values in the column |
| `COUNT(DISTINCT col)` | Unique non-NULL values |
| `SUM(col)` | Total of all values |
| `AVG(col)` | Mean value |
| `MIN(col)` / `MAX(col)` | Smallest / largest value |
| `STRING_AGG(col, sep)` | Concatenated strings with separator |
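STRING_AGG is the one entry above that returns text rather than a number. For example, listing each category's products on one row using the sample schema:

```sql
SELECT
    category,
    STRING_AGG(product_name, ', ' ORDER BY product_name) AS products
FROM products
GROUP BY category;
-- Electronics: 'Gadget X, Gadget Y'; Hardware: 'Widget A, Widget B'; Services: 'Service Pack'
```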
Testing with Vela
Before deploying GROUP BY-heavy reporting queries to production, use Vela’s database branching to benchmark them on a production-data clone. Create a branch from the live database, run EXPLAIN ANALYZE on the GROUP BY query, and experiment with index changes — all without impacting the production workload. This is especially useful when introducing GROUPING SETS, ROLLUP, or materialized views to replace a slow GROUP BY query.
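A minimal version of that workflow, assuming the branch is reachable like any other PostgreSQL connection (the index name is illustrative):

```sql
-- Run inside the throwaway branch, not production.
CREATE INDEX idx_order_lines_product_id ON order_lines (product_id);

EXPLAIN ANALYZE
SELECT p.category, SUM(ol.quantity) AS units_sold
FROM order_lines ol
JOIN products p ON p.product_id = ol.product_id
GROUP BY p.category;
```

The plan output depends on data volume and existing indexes, which is exactly why benchmarking on a production-data clone is more informative than on a small development dataset.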
Production Tips
- Index the columns used in `GROUP BY` when the query runs on a reporting schedule — a B-tree index on the grouping column can turn a sequential scan into an index scan.
- Place selective filters in `WHERE` rather than `HAVING` when possible; `WHERE` reduces the row count before grouping, letting the planner use indexes. `HAVING` is evaluated after aggregation and cannot use standard indexes.
- `COUNT(DISTINCT col)` is slower than `COUNT(*)` on large groups because it must hash or sort within each group. For high-frequency dashboards, consider a pre-aggregated table or a materialized view with a unique index.
- Use `GROUPING SETS`, `ROLLUP`, or `CUBE` to compute multiple levels of aggregation in a single table scan instead of writing several `UNION ALL` queries.
- When using `GROUP BY` with a `JOIN`, filter the joined table in the `WHERE` clause before grouping to avoid aggregating unnecessary rows.
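The GROUPING SETS tip can be sketched with ROLLUP, which produces per-category-per-month rows, per-category subtotals, and a grand total in a single scan of the sample schema:

```sql
SELECT
    p.category,
    date_trunc('month', ol.sold_at) AS month,
    SUM(ol.quantity * p.price) AS revenue
FROM order_lines ol
JOIN products p ON p.product_id = ol.product_id
GROUP BY ROLLUP (p.category, date_trunc('month', ol.sold_at));
-- Rows where "month" is NULL are category subtotals;
-- the row where both grouping columns are NULL is the grand total.
```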