If you're looking to master the art of writing advanced SQL queries, you can't go wrong with studying SQL window functions. Although they will add an extra layer of complexity to your basic data management and analysis, learning to apply them will be rewarding and absolutely essential for a power user of SQL.
As usual, we'll start with the definition.
What are SQL window functions?
A window function in SQL is a type of function that performs a calculation across a specific set of rows (the 'window' in question), defined by an OVER() clause.
Window functions use values from one or multiple rows to return a value for each row, which makes them different from traditional aggregate functions, which return a single value for multiple rows.
To make things clearer from the very beginning, let's proceed directly to the general syntax for window functions.
Key components of SQL window functions
The syntax for window functions is as follows:
SELECT column_1, column_2, column_3, function()
OVER (PARTITION BY partition_expression ORDER BY order_expression) as output_column_name
FROM table_name
In this syntax:
The SELECT clause defines the columns you want to select from the table_name table.
function() is the window function you want to use.
The OVER clause defines the partitioning and ordering of rows in the window.
The PARTITION BY clause divides rows into partitions based on the specified partition_expression; if the partition_expression is not specified, the result set will be treated as a single partition.
The ORDER BY clause uses the specified order_expression to define the order in which rows will be processed within each partition; if the order_expression is not specified, rows will be processed in an undefined order.
Finally, output_column_name is the name of your output column.
These are the key SQL window function components. One more thing worth mentioning is that window functions are applied after the processing of WHERE, GROUP BY, and HAVING clauses.
This means you can use the output of your window functions in subsequent clauses of your queries.
The OVER() clause
Now let's take a closer look at the OVER() clause in SQL, which is essentially the core of window functions. It determines the partitioning and ordering of a rowset before the associated window function is applied.
The OVER() clause can be applied with functions to compute aggregated values such as moving averages, running totals, cumulative aggregates, or top N per group results.
The PARTITION BY clause
The PARTITION BY clause is used to partition the rows of a table into groups. This comes in handy when dealing with large datasets that need to be split into smaller parts, which are easier to manage.
PARTITION BY is always used inside the OVER() clause; if it is omitted, the entire table is treated as a single partition.
The ORDER BY clause
Another notable clause is ORDER BY, which determines the order of rows within a partition; if it is omitted, the order is undefined.
For instance, when it comes to ranking functions, ORDER BY specifies the order in which ranks are assigned to rows.
Frame specification
In the same OVER() clause, you can specify the upper and lower bounds of a window frame using one of the two subclauses, ROWS or RANGE. The basic syntax for both of these subclauses is essentially the same:
ROWS BETWEEN lower_bound AND upper_bound
RANGE BETWEEN lower_bound AND upper_bound
And in some cases, they might even return the same result. However, there's an important difference.
In the ROWS subclause, the frame is defined by beginning and ending row positions. Offsets are differences in row numbers from the current row number.
As opposed to that, in the RANGE subclause, the frame is defined by a value range. Offsets are differences in row values from the current row value.
Types of SQL window functions
Window functions in SQL Server are divided into three main types: aggregate, ranking, and value functions. Let's have a brief overview of each.
Aggregate window functions
AVG() returns the average of the values in a group, ignoring null values.
MAX() returns the maximum value in the expression.
MIN() returns the minimum value in the expression.
SUM() returns the sum of all the values, or only the DISTINCT values, in the expression.
COUNT() returns the number of items found in a group.
STDEV() returns the statistical standard deviation of all values in the specified expression.
STDEVP() returns the statistical standard deviation for the population for all values in the specified expression.
VAR() returns the statistical variance of all values in the specified expression; it may be followed by the OVER clause.
VARP() returns the statistical variance for the population for all values in the specified expression.
Ranking window functions
ROW_NUMBER() assigns a unique sequential integer to rows within a partition of a result set.
RANK() assigns a unique rank to each row within a partition with gaps in the ranking sequence when there are ties.
DENSE_RANK() assigns a unique rank to each row within a partition without gaps in the ranking sequence when there are ties.
PERCENT_RANK() calculates the relative rank of a row within a group of rows.
NTILE() distributes rows in an ordered partition into a specified number of approximately equal groups.
Value window functions
LAG() retrieves values from rows that precede the current row in the result set.
LEAD() retrieves values from rows that follow the current row in the result set.
FIRST_VALUE() returns the first value in an ordered set of values within a partition.
LAST_VALUE() returns the last value in an ordered set of values within a partition.
NTH_VALUE() returns the value of the nth row in the ordered set of values.
CUME_DIST() returns the cumulative distribution of a value in a group of values.
Practical examples of SQL window functions
With the theory out of the way, let's move on to practice and have a look at several real-life examples of window functions.
Cumulative totals
Our first example calculates cumulative totals of sales from the Sales.SalesOrderHeader table. Note that we partition our data by CustomerID, which allows us to check cumulative totals for each client.
Additionally, the results are ordered by date, which shows us the gradual growth of the said totals.
SELECT SalesOrderID,
OrderDate,
CustomerID,
TotalDue,
SUM(TotalDue) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM Sales.SalesOrderHeader
ORDER BY CustomerID,
OrderDate;
Our next example deals with the ranking of employees based on their total sales amounts.
For this, we use the Sales.SalesOrderHeader and HumanResources.Employee tables.
Note that we use both RANK() and DENSE_RANK(); the former assigns consecutive ranks with gaps in case it encounters identical values,
while the latter assigns consecutive ranks without gaps.
SELECT e.BusinessEntityID AS EmployeeID,
p.FirstName,
p.LastName,
SUM(soh.TotalDue) AS TotalSales,
RANK() OVER (ORDER BY SUM(soh.TotalDue) DESC) AS SalesRank,
DENSE_RANK() OVER (ORDER BY SUM(soh.TotalDue) DESC) AS SalesDenseRank
FROM Sales.SalesOrderHeader soh
JOIN Sales.SalesPerson sp
ON soh.SalesPersonID = sp.BusinessEntityID
JOIN HumanResources.Employee e
ON sp.BusinessEntityID = e.BusinessEntityID
JOIN Person.Person p
ON e.BusinessEntityID = p.BusinessEntityID
GROUP BY e.BusinessEntityID,
p.FirstName,
p.LastName
ORDER BY SalesRank;
Like in the previous case, let's run this query in the Studio.
Rolling and moving averages
Finally, here's the third and most complex example, which returns a rolling average of sales over the course of three months. Here, we do the following:
Find the years with minimum and maximum sales.
Get a list of months that will take part in our calculations.
Calculate total sales for each month.
Calculate the average sales for each month and the two months that precede it (which makes it three months overall).
WITH SalesYears AS (
SELECT
MIN(YEAR(OrderDate)) AS MinYear,
MAX(YEAR(OrderDate)) AS MaxYear
FROM Sales.SalesOrderHeader
),
DateSeries AS (
SELECT MIN(YEAR(OrderDate)) AS SalesYear, 1 AS SalesMonth
FROM Sales.SalesOrderHeader
UNION ALL
SELECT
CASE WHEN SalesMonth = 12 THEN SalesYear + 1 ELSE SalesYear END,
CASE WHEN SalesMonth = 12 THEN 1 ELSE SalesMonth + 1 END
FROM DateSeries
WHERE SalesYear * 100 + SalesMonth < (SELECT MaxYear * 100 + 12 FROM SalesYears)
),
MonthlySales AS (
SELECT
YEAR(OrderDate) AS SalesYear,
MONTH(OrderDate) AS SalesMonth,
SUM(TotalDue) AS MonthlyTotal
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate), MONTH(OrderDate)
),
FinalData AS (
SELECT
m.SalesYear,
m.SalesMonth,
m.MonthlyTotal
FROM MonthlySales m
)
SELECT
SalesYear,
SalesMonth,
MonthlyTotal,
AVG(MonthlyTotal) OVER (
ORDER BY SalesYear, SalesMonth
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS Rolling3MonthAvg
FROM FinalData
ORDER BY SalesYear, SalesMonth
OPTION (MAXRECURSION 1000);
And again, we run the query and get the result that we can analyze.
Why is it better to use SQL window functions in dbForge Studio for SQL Server?
As an aside, let us tell you a bit more about dbForge Studio for SQL Server, which helped us show the examples above.
It's a multifeatured IDE that handles a rich variety of database development, management, and administration tasks, including, but not limited to the following.
Context-aware code completion, formatting, refactoring, and debugging
Query optimization
Integration with multiple version control systems
Comparison and synchronization of database schemas and table data
Data import and export supporting 14 most popular formats
Database migration, backup and recovery
Generation of meaningful test data
Pivot tables for data aggregation and analysis
Generation of comprehensive database documentation and data reports
There's much more to it, so we'd love to invite you to download dbForge Studio for a free 30-day trial and explore its capabilities, which go far beyond practice with SQL window functions.
Benefits of SQL window functions
Now, let's briefly recap the key advantages of window functions that make them so useful for data analysts and developers.
Efficiency
First and foremost, window functions are effective at complex operations with your data.
Once you master them, your queries will be concise and precise, without multiple subqueries or excessive JOINs.
This also has a positive impact on query performance, since there are fewer rows to be processed by the query.
Versatility
Window functions are versatile as well. They help you with aggregations, rankings, distributions, and other operations without narrowing the result set down to a single row.
You can partition the result set by one or more columns, and then perform your calculations within each partition.
Row-level context
It is also worth noting that window functions preserve the original rows and columns in the result set.
Unlike aggregate functions, they maintain the row-level context and don't collapse rows into a single result.
Conclusion
Now that you know the basics of SQL window functions, you can try applying them in real-life cases using the top IDE on the market.
Get started today - download dbForge Studio for SQL Server for a free 30-day trial right now
and write even the most complex queries with ease.
dbForge Studio for SQL Server
Your ultimate IDE for SQL Server development and administration
A window function (a.k.a. analytic function) is a function that uses values from a specific set of rows (a window) and returns a value for each row. The window is defined by an OVER() clause.
How do window functions differ from regular aggregate functions?
A window function doesn't change the number of rows, but rather assigns a value to each row of the dataset, even if these values are the same.
This makes a window function different from an aggregate function, which returns a single value for multiple rows.
How many window functions are there in SQL Server?
We can single out 20 window functions in SQL Server, typically distributed among 3 types.
What are the most common types of window functions in SQL Server?
Window functions in SQL Server can be divided into 3 common types:
Value window functions: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE(), NTH_VALUE(), CUME_DIST()
Are SQL window functions efficient?
Yes. They help avoid writing long queries with multiple subqueries and JOINs. As a result, they require fewer rows to be examined and thus are executed faster.
What are some best practices for writing efficient SQL window functions?
Apply window functions only where they deliver a definite advantage over other SQL constructs; in some cases, a JOIN or a subquery might suffice and provide better performance
You might want to filter data before applying window functions; this will reduce the size of the processed dataset, which will also have a positive impact on performance
Once you write a query with a window-function, double-check its execution plan for bottlenecks or steps that consume excessive resources; if there are any, optimize your query