Introduction
Window Functions were introduced in ANSI SQL:2003, making them a relatively new feature. They allow functions to be applied to a set of rows using the OVER clause.
One of the most common use cases for window functions is analytical queries, specifically identifying islands and gaps in a dataset.
- Islands are groups of continuous data sequences.
- Gaps are missing sequences between data points.
Dataset
Here’s the initial dataset:
| col1 |
|---|
| 2 |
| 3 |
| 11 |
| 12 |
| 13 |
| 27 |
| 33 |
| 34 |
| 35 |
| 42 |
Creating the Dataset
|
|
Expected Output
After identifying islands in the dataset, we should get the following result:
| start_range | end_range |
|---|---|
| 2 | 3 |
| 11 | 13 |
| 27 | 27 |
| 33 | 35 |
| 42 | 42 |
Algorithmic Approach
To solve this, we need to group numbers that belong to the same continuous sequence (island).
| col1 | grp |
|---|---|
| 2 | a |
| 3 | a |
| 11 | b |
| 12 | b |
| 13 | b |
| 27 | c |
| 33 | d |
| 34 | d |
| 35 | d |
| 42 | e |
Each group (grp) represents an island, where numbers belong to a continuous sequence. For example, group D represents the sequence 33, 34, 35.
SQL Implementation
We can use window functions to achieve this:
|
|
Output
| col1 | grp |
|---|---|
| 2 | 1 |
| 3 | 1 |
| 11 | 8 |
| 12 | 8 |
| 13 | 8 |
| 27 | 21 |
| 33 | 26 |
| 34 | 26 |
| 35 | 26 |
| 42 | 32 |
Explanation
- By subtracting
ROW_NUMBER() OVER(ORDER BY col1)fromcol1, we create a unique identifier (grp) for each island. - Numbers in the same sequence get the same
grpvalue.
Final Query
Now, we just need to group by the grp column to get the start and end of each island:
|
|
Final Output
| start_range | end_range |
|---|---|
| 2 | 3 |
| 11 | 13 |
| 27 | 27 |
| 33 | 35 |
| 42 | 42 |
Conclusion
By leveraging window functions, we efficiently identify islands in a dataset using row numbering and grouping techniques. This method can be extended to various real-world use cases, such as:
- Tracking continuous patient visits in healthcare analytics.
- Identifying consecutive login days in user behavior analysis.
- Detecting gaps in financial transactions for fraud detection.
Credits
Pictures
References
- SQL Example were taken from “T-SQL Window Functions, Second Edition For data analysis and beyond” by Itzik Ben-Gan