Introduction
In this post, we explore the concept of gaps in sequential data and how Window Functions can be leveraged to identify and analyze them effectively.
Refresher on Window Functions
Window Functions were introduced in ANSI SQL:2003. These functions allow computations across a set of table rows that are related to the current row using the OVER
clause.
A common use case for window functions is analytical queries, particularly in identifying islands and gaps in datasets:
- Islands: Groups of continuous data sequences.
- Gaps: Missing sequences between data points.
Problem Scenario
Consider a scenario where a patient enrolled in a health program is required to check in daily. Our objective is to identify the gaps in their consecutive visits.
Dataset
Sample Data
visit_date |
---|
2025-02-02 |
2025-02-03 |
2025-02-11 |
2025-02-12 |
2025-02-13 |
2025-02-27 |
2025-04-03 |
2025-04-04 |
2025-04-05 |
2025-05-12 |
Creating the Dataset in SQL
|
|
Expected Output
After analyzing gaps, we should obtain the following missing date ranges:
start_range | end_range |
---|---|
2025-02-04 | 2025-02-10 |
2025-02-14 | 2025-02-26 |
2025-02-28 | 2025-04-02 |
2025-04-06 | 2025-05-11 |
Algorithmic Approach
To identify gaps, we calculate the difference in days between each visit. If the gap is greater than one day, it indicates a missing period.
Step 1: Compare Each Visit to the Next
current_visit | next_visit |
---|---|
2025-02-02 | 2025-02-03 |
2025-02-03 | 2025-02-11 |
2025-02-11 | 2025-02-12 |
2025-02-12 | 2025-02-13 |
2025-02-13 | 2025-02-27 |
2025-02-27 | 2025-04-03 |
2025-04-03 | 2025-04-04 |
2025-04-04 | 2025-04-05 |
2025-04-05 | 2025-05-12 |
2025-05-12 | NULL |
Step 2: Compute Days Between Visits
current_visit | next_visit | days_between |
---|---|---|
2025-02-02 | 2025-02-03 | 1 |
2025-02-03 | 2025-02-11 | 8 |
2025-02-11 | 2025-02-12 | 1 |
2025-02-12 | 2025-02-13 | 1 |
2025-02-13 | 2025-02-27 | 14 |
2025-02-27 | 2025-04-03 | 35 |
2025-04-03 | 2025-04-04 | 1 |
2025-04-04 | 2025-04-05 | 1 |
2025-04-05 | 2025-05-12 | 37 |
2025-05-12 | NULL | NULL |
Any difference greater than 1 day represents a gap.
SQL Implementation
We use the LEAD Window Function to fetch the next visit date for comparison.
|
|
Output
startDate | endDate |
---|---|
2025-02-02 | 2025-02-03 |
2025-02-03 | 2025-02-11 |
2025-02-11 | 2025-02-12 |
2025-02-12 | 2025-02-13 |
2025-02-13 | 2025-02-27 |
2025-02-27 | 2025-04-03 |
2025-04-03 | 2025-04-04 |
2025-04-04 | 2025-04-05 |
2025-04-05 | 2025-05-12 |
2025-05-12 | NULL |
Final Query
The final query identifies gaps by filtering where the difference is greater than 1 day:
|
|
Explanation
- The CTE_LEAD common table expression fetches the next visit date.
- The outer query filters out rows where the gap is more than one day.
- The
DATEADD()
function adjusts the start and end dates to reflect the missing range.
Final Output
startDate | endDate |
---|---|
2025-02-04 | 2025-02-10 |
2025-02-14 | 2025-02-26 |
2025-02-28 | 2025-04-02 |
2025-04-06 | 2025-05-11 |
Conclusion
This post demonstrated how Window Functions help identify gaps in sequential data efficiently, making them a powerful tool for analytical SQL queries.