## SQL WINDOW FUNCTIONS |

SQL > Window Functions
## What is a window function in SQL?With aggregate functions, you usually get a single result for each group that you look at (as specified by the columns in the GROUP BY clause). However, there are times when you want to apply the aggregate function to each row, and this is where window functions come in. Window functions are called such because they operate on each row within a window. A window is defined by the **PARTITION BY**: Defines the window partition into groups of rows. This is similar to the**GROUP BY**clause. Note that this is optional. You do not need to specify PARTITION BY if your window covers the entire data set.**ORDER BY**: Orders the value within each window partition. This is optional, although in most of the use cases, it makes sense to order your results in some way.
There are a number of aggregate functions that can act as a window function. The following are the common aggregate functions used as part of a window function: - Common aggregate functions:
**SUM**,**AVG**,**COUNT** - Ranking functions:
**RANK**,**DENSE_RANK**,**ROW_NUMBER** - Grouping function:
**NTILE** - Before- and after-functions:
**LEAD**,**LAG**
We go through each type of functions below: ## Common Aggregate Functions: SUM, AVG, COUNTWe use the following table for our examples. Table
To list the average sales amount of the store each salesperson belongs to, we use the following window function: SELECT Store_ID, Salesperson, Sales, AVG(Sales) OVER (PARTITION BY Store_ID) Avg_StoreFROM Store_Sales; Result:
## Ranking Functions: RANK, DENSE_RANK, ROW_NUMBER
When there is no tie among the values, these three functions all return the same result set. Here we use the same SELECT Store_ID, Salesperson, Sales, RANK() OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_RankFROM Store_Sales ORDER BY Store_ID, Sales_Rank; Result:
If we replace SELECT Store_ID, Salesperson, Sales, DENSE_RANK() OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Dense_RankFROM Store_Sales ORDER BY Store_ID, Sales_Rank; Result:
If we replace SELECT Store_ID, Salesperson, Sales, ROW_NUMBER() OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Row_NumberFROM Store_Sales ORDER BY Store_ID, Sales_Rank; Result:
Notice the different values for Store_ID = 3. For this store, two salespeople are tied for the second most sales. Under ## Grouping Function: NTILEThe For example, if there are 100 rows in a window and we use If the number of rows is not divisible by the number of groups, then the size of each group can vary by 1, with the earlier groups having the larger count. For example, if there are 99 rows in a window and we use Let's look at an example. If we want to divide all salesperson in our example into four groups based on the amount of sales each generates regardless of which store the salesperson belongs to, we would use the following query: SELECT Store_ID, Salesperson, Sales, NTILE(4) OVER (ORDER BY Sales DESC) Sales_GroupFROM Store_Sales ORDER BY Sales DESC; Result:
Here we have a total of 13 rows and we are trying to group them into four groups. So Group 1 gets 4 rows, and Groups 2-4 each get 3 rows. ## Before- and After-Functions: LEAD, LAGThere are instances where you are interested in doing a calculation based on the row(s) prior to the particular row you are interested in or based on the row(s) after the particular row you are interested in. In these cases, the The LAG(expression [, interval [, default_value]]) OVER ()
The The LAG(expression [, interval [, default_value]]) OVER ()
The Let's look at an example with the following query: SELECT Store_ID, Salesperson, Sales,
LEAD(Sales) OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Lead,
LAG(Sales) OVER (PARTITION BY Store_ID ORDER BY Sales DESC) Sales_Lag FROM Store_Sales ORDER BY Store_ID, Sales DESC; Result:
Let's look at the data for Cathy. In the Sales_Lead column, the amount is 374, which corresponds to the Sales amount of Aaron, who ranks right below Cathy. In the Sales_Lag column, the amount is 492, which corresponds to the Sales amount of Beatrice, who ranks right above Cathy. The above query itself may not look very useful. However, the usefulness becomes more evident if we are trying to see how much difference there is between each person and the person who ranks right above. In this case, we would use the following query: SELECT Store_ID, Salesperson, Sales,
LAG(Sales,1,Sales) OVER (PARTITION BY Store_ID ORDER BY Sales DESC) - Sales Sales_Difference
FROM Store_SalesORDER BY Store_ID, Sales DESC; Result:
SQL DECODE |