SELECT - GROUP BY#
A SELECT statement clause that divides the query result into groups of rows, usually for the purpose of performing one or more aggregations on each group. The SELECT statement returns one row per group.
Syntax#
GROUP BY {
column-expression
| ROLLUP ( <group_by_expression> [ ,...n ] )
| CUBE ( <group_by_expression> [ ,...n ] )
| GROUPING SETS ( <grouping_set> [ ,...n ] )
| () --calculates the grand total
} [ ,...n ]
<group_by_expression> ::=
column-expression
| ( column-expression [ ,...n ] )
<grouping_set> ::=
() --calculates the grand total
| <grouping_set_item>
| ( <grouping_set_item> [ ,...n ] )
<grouping_set_item> ::=
<group_by_expression>
| ROLLUP ( <group_by_expression> [ ,...n ] )
| CUBE ( <group_by_expression> [ ,...n ] )
-- For backward compatibility only.
GROUP BY
[ ALL ] column-expression [ ,...n ]
| column-expression [ ,...n ] [ WITH { CUBE | ROLLUP } ]
Arguments#
column-expression
Specifies a column or a non-aggregate calculation on a column. This column can belong to a table, derived table, or view. The column must appear in the FROM clause of the SELECT statement, but is not required to appear in the SELECT list.
For valid expressions, see expression.
The column must appear in the FROM clause of the SELECT statement, but is not required to appear in the SELECT list. However, each table or view column in any nonaggregate expression in the <select> list must be included in the GROUP BY list.
Specifies a column or a non-aggregate calculation on a column. This column can belong to a table, derived table, or view. The column must appear in the FROM clause of the SELECT statement, but is not required to appear in the SELECT list.
For valid expressions, see expression.
The column must appear in the FROM clause of the SELECT statement, but is not required to appear in the SELECT list. However, each table or view column in any nonaggregate expression in the <select> list must be included in the GROUP BY list:
The following statements are allowed:
SELECT ColumnA, ColumnB FROM T GROUP BY ColumnA, ColumnB;
SELECT ColumnA + ColumnB FROM T GROUP BY ColumnA, ColumnB;
SELECT ColumnA + ColumnB FROM T GROUP BY ColumnA + ColumnB;
SELECT ColumnA + ColumnB + constant FROM T GROUP BY ColumnA, ColumnB;
The following statements are not allowed:
SELECT ColumnA, ColumnB FROM T GROUP BY ColumnA + ColumnB;
SELECT ColumnA + constant + ColumnB FROM T GROUP BY ColumnA + ColumnB;
The column expression cannot contain:
A column alias that is defined in the SELECT list. It can use a column alias for a derived table that is defined in the FROM clause.
A column of type text, ntext, or image. However, you can use a column of text, ntext, or image as an argument to a function that returns a value of a valid data type. For example, the expression can use SUBSTRING() and CAST(). This also applies to expressions in the HAVING clause.
A subquery. Error 144 is returned.
GROUP BY column-expression [ ,…n ]
Groups the SELECT statement results according to the values in a list of one or more column expressions.
For example, this query creates a Sales table with columns for Country, Region, and Sales. It inserts four rows and two of the rows have matching values for Country and Region.
CREATE TABLE Sales ( Country varchar(50), Region varchar(50), Sales int );
INSERT INTO sales VALUES (N'Canada', N'Alberta', 100);
INSERT INTO sales VALUES (N'Canada', N'British Columbia', 200);
INSERT INTO sales VALUES (N'Canada', N'British Columbia', 300);
INSERT INTO sales VALUES (N'United States', N'Montana', 100);
The Sales table contains these rows:
Country |
Region |
Sales |
---|---|---|
Canada |
Alberta |
100 |
Canada |
British Columbia |
200 |
Canada |
British Columbia |
300 |
United States |
Montana |
100 |
This next query groups Country and Region and returns the aggregate sum for each combination of values.
SELECT Country, Region, SUM(sales) AS TotalSales
FROM Sales
GROUP BY Country, Region;
The query result has 3 rows since there are 3 combinations of values for Country and Region. The TotalSales for Canada and British Columbia is the sum of two rows.
Country |
Region |
TotalSales |
---|---|---|
Canada |
Alberta |
100 |
Canada |
British Columbia |
500 |
United States |
Montana |
100 |
GROUP BY ROLLUP#
Creates a group for each combination of column expressions. In addition, it “rolls up” the results into subtotals and grand totals. To do this, it moves from right to left decreasing the number of column expressions over which it creates groups and the aggregation(s).
The column order affects the ROLLUP output and can affect the number of rows in the result set.
For example, GROUP BY ROLLUP (col1, col2, col3, col4) creates groups for each combination of column expressions in the following lists.
col1, col2, col3, col4
col1, col2, col3, NULL
col1, col2, NULL, NULL
col1, NULL, NULL, NULL
NULL, NULL, NULL, NULL –This is the grand total
Using the table from the previous example, this code runs a GROUP BY ROLLUP operation instead of a simple GROUP BY.
SELECT Country, Region, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY ROLLUP (Country, Region);
The query result has the same aggregations as the simple GROUP BY without the ROLLUP. In addition, it creates subtotals for each value of Country. Finally, it gives a grand total for all rows. The result looks like this:
Country |
Region |
TotalSales |
---|---|---|
Canada |
Alberta |
100 |
Canada |
British Columbia |
500 |
Canada |
NULL |
600 |
United States |
Montana |
100 |
United States |
NULL |
100 |
NULL |
NULL |
700 |
GROUP BY CUBE ( )#
GROUP BY CUBE creates groups for all possible combinations of columns. For GROUP BY CUBE (a, b) the results has groups for unique values of (a, b), (NULL, b), (a, NULL), and (NULL, NULL).
Using the table from the previous examples, this code runs a GROUP BY CUBE operation on Country
and Region
.
SELECT Country, Region, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY CUBE (Country, Region);
The query result has groups for unique values of (Country, Region), (NULL, Region), (Country, NULL), and (NULL, NULL). The results look like this:
Country |
Region |
TotalSales |
---|---|---|
Canada |
Alberta |
100 |
NULL |
Alberta |
100 |
Canada |
British Columbia |
500 |
NULL |
British Columbia |
500 |
United States |
Montana |
100 |
NULL |
Montana |
100 |
NULL |
NULL |
700 |
Canada |
NULL |
600 |
United States |
NULL |
100 |
GROUP BY GROUPING SETS ( )#
The GROUPING SETS option gives you the ability to combine multiple GROUP BY clauses into one GROUP BY clause. The results are the equivalent of UNION ALL of the specified groups.
For example, GROUP BY ROLLUP (Country, Region)
and GROUP BY GROUPING SETS ( ROLLUP (Country, Region) )
return the same results.
When GROUPING SETS has two or more elements, the results are a union of the elements. This example returns the union of the ROLLUP and CUBE results for Country and Region.
SELECT Country, Region, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY GROUPING SETS ( ROLLUP (Country, Region), CUBE (Country, Region) );
The results are the same as this query that returns a union of the two GROUP BY statements.
SELECT Country, Region, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY ROLLUP (Country, Region)
UNION ALL
SELECT Country, Region, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY CUBE (Country, Region);
SQL does not consolidate duplicate groups generated for a GROUPING SETS list.
For example, in GROUP BY ( (), CUBE (Country, Region) )
, both elements return a row for the grand total
and both rows will be listed in the results.
GROUP BY ()#
Specifies the empty group which generates the grand total. This is useful as one of the elements of a GROUPING SET. For example, this statement gives the total sales for each country and then gives the grand-total for all countries.
SELECT Country, SUM(Sales) AS TotalSales
FROM Sales
GROUP BY GROUPING SETS ( Country, () );
GROUP BY [ ALL ] column-expression [ ,…n ]#
Specifies to include all groups in the results regardless of whether they meet the search criteria in the WHERE clause. Groups that don’t meet the search criteria have NULL for the aggregation.
General Remarks#
How GROUP BY interacts with the SELECT statement#
SELECT list:
Vector aggregates. If aggregate functions are included in the SELECT list, GROUP BY calculates a summary value for each group. These are known as vector aggregates.
Distinct aggregates. The aggregates AVG (DISTINCT column_name), COUNT (DISTINCT column_name), and SUM (DISTINCT column_name) are supported with ROLLUP, CUBE, and GROUPING SETS.
WHERE clause:
SQL removes Rows that do not meet the conditions in the WHERE clause before any grouping operation is performed.
HAVING clause:
SQL uses the having clause to filter groups in the result set.
ORDER BY clause:
Use the ORDER BY clause to order the result set. The GROUP BY clause does not order the result set.
NULL values:
If a grouping column contains NULL values, all NULL values are considered equal and they are collected into a single group.
Examples#
A. Use a simple GROUP BY clause#
The following example retrieves the total for each SalesOrderID from the SalesOrderDetail table. This example uses AdventureWorks.
SELECT SalesOrderID, SUM(LineTotal) AS SubTotal
FROM Sales.SalesOrderDetail AS sod
GROUP BY SalesOrderID
ORDER BY SalesOrderID;
B. Use a GROUP BY clause with multiple tables#
The following example retrieves the number of employees for each City
from the Address
table joined to the EmployeeAddress
table.
This example uses AdventureWorks.
SELECT a.City, COUNT(bea.AddressID) EmployeeCount
FROM Person.BusinessEntityAddress AS bea
INNER JOIN Person.Address AS a
ON bea.AddressID = a.AddressID
GROUP BY a.City
ORDER BY a.City;
C. Use a GROUP BY clause with an expression#
The following example retrieves the total sales for each year by using the DATEPART
function.
The same expression must be present in both the SELECT
list and GROUP BY
clause.
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
, SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
ORDER BY DATEPART(yyyy,OrderDate);
D. Use a GROUP BY clause with a HAVING clause#
The following example uses the HAVING clause to specify which of the groups generated in the GROUP BY clause should be included in the result set.
SELECT DATEPART(yyyy,OrderDate) AS N'Year'
, SUM(TotalDue) AS N'Total Order Amount'
FROM Sales.SalesOrderHeader
GROUP BY DATEPART(yyyy,OrderDate)
HAVING DATEPART(yyyy,OrderDate) >= N'2003'
ORDER BY DATEPART(yyyy,OrderDate);