Merging/Creating Intervals with SELECT statement

Puzzle: For each department generate all ranges of consecutive years of hiring. For ex, if a department hired every year from 1981 to 1983 and then after a break from 1985 to 1986, there should be 2 ranges for that department in the output:

1981-1983
1985-1986

Expected result for emp table (in scott schema):

 DEPTNO YR_RANGE
------- --------------
     10 1981 - 1982
     20 1980 - 1981
     20 1987 - 1987
     30 1981 - 1981

Level: Advanced

col yr_range for a20

WITH x AS (
SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year, LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) <=1 THEN 0 ELSE 1 END step
FROM EMP
ORDER BY deptno, hiredate
), y AS (
SELECT x.*, SUM(STEP)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x
)
SELECT deptno, MIN(year) || ' - ' || MAX(year) AS yr_range
FROM Y
GROUP BY deptno, gr_id
ORDER BY 1,2;
 DEPTNO YR_RANGE
------- --------------
     10 1981 - 1982
     20 1980 - 1981
     20 1987 - 1987
     30 1981 - 1981

Let’s go over querie’s logic step by step.

Step 1: Identify records that will contribute to a range row in the output.

Visually, it is easy to see if we simply sort all records by deptno and hiredate:

set pagesize 100

SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year
FROM emp
ORDER BY deptno, hiredate;

DEPTNO ENAME      HIREDATE  YEAR
------ ---------- --------- -----
    10 CLARK      09-JUN-81 1981
    10 KING       17-NOV-81 1981
    10 MILLER     23-JAN-82 1982
    20 SMITH      17-DEC-80 1980
    20 JONES      02-APR-81 1981
    20 FORD       03-DEC-81 1981
    20 SCOTT      19-APR-87 1987
    20 ADAMS      23-MAY-87 1987
    30 ALLEN      20-FEB-81 1981
    30 WARD       22-FEB-81 1981
    30 BLAKE      01-MAY-81 1981
    30 TURNER     08-SEP-81 1981
    30 MARTIN     28-SEP-81 1981
    30 JAMES      03-DEC-81 1981

In the result above I marked alternating groups of records with bold font. So the first interval will be deptno:10, years: 1981-1982; the 2nd: deptno:20, years: 1980-1981; the 3rd: deptno: 20, years: 1987-1987, and the last one: deptno: 30, years: 1981-1981.

The challenge is to add a column “group id” to the above output that would uniquely identify each group. Once this is done, we will group by this column and take MIN/MAX on the year column to form the range.

Step 2: Though it is not so easy to immediately add the group id column, we can easily identify when each group starts and ends. This can be done by comparing current record’s year with the previous record’s year staying within a department based partition:

SELECT deptno, ename, hiredate, 
       TO_CHAR(hiredate,'yyyy') year, 
       LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-
	         LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate)<=1 THEN 0 
	    ELSE 1 
       END step
FROM emp
ORDER BY deptno, hiredate

 DEPTNO ENAME      HIREDATE  YEAR  PREV_YEAR        STEP
------- ---------- --------- ----- ---------- ----------
     10 CLARK      09-JUN-81 1981                      1
     10 KING       17-NOV-81 1981  1981                0
     10 MILLER     23-JAN-82 1982  1981                0
     20 SMITH      17-DEC-80 1980                      1
     20 JONES      02-APR-81 1981  1980                0
     20 FORD       03-DEC-81 1981  1981                0
     20 SCOTT      19-APR-87 1987  1981                1
     20 ADAMS      23-MAY-87 1987  1987                0
     30 ALLEN      20-FEB-81 1981                      1
     30 WARD       22-FEB-81 1981  1981                0
     30 BLAKE      01-MAY-81 1981  1981                0
     30 TURNER     08-SEP-81 1981  1981                0
     30 MARTIN     28-SEP-81 1981  1981                0
     30 JAMES      03-DEC-81 1981  1981                0

We can see that the STEP column when it turns to 1 indicates the beginning of the new range. In order to turn the step column into a group id, we simply need to make a cumulative summation on this column:

Step 3: Add Analytic SUM function

WITH x AS (
SELECT deptno, ename, hiredate, 
       TO_CHAR(hiredate,'yyyy') year, 
       LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-
	         LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate)<=1 THEN 0 
	    ELSE 1 
       END step
FROM emp
ORDER BY deptno, hiredate
) SELECT x.*, SUM(step)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x

DEPTNO ENAME      HIREDATE  YEAR        STEP      GR_ID
------ ---------- --------- ----- ---------- ----------
    10 CLARK      09-JUN-81 1981           1          1
    10 KING       17-NOV-81 1981           0          1
    10 MILLER     23-JAN-82 1982           0          1
    20 SMITH      17-DEC-80 1980           1          1
    20 JONES      02-APR-81 1981           0          1
    20 FORD       03-DEC-81 1981           0          1
    20 SCOTT      19-APR-87 1987           1          2
    20 ADAMS      23-MAY-87 1987           0          2
    30 ALLEN      20-FEB-81 1981           1          1
    30 WARD       22-FEB-81 1981           0          1
    30 BLAKE      01-MAY-81 1981           0          1
    30 TURNER     08-SEP-81 1981           0          1
    30 MARTIN     28-SEP-81 1981           0          1
    30 JAMES      03-DEC-81 1981           0          1

Step 4: Now we can see that a combination of depton and gr_id columns uniquely identify each group of records that will fall in a corresponding year range, so we are ready to do the aggregation:

WITH x AS (
SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year, LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) <=1 THEN 0 ELSE 1 END step
FROM EMP
ORDER BY deptno, hiredate
), y AS (
SELECT x.*, SUM(STEP)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x
)
SELECT deptno, MIN(year) || ' - ' || MAX(year) AS yr_range
FROM Y
GROUP BY deptno, gr_id
ORDER BY 1,2;

DEPTNO YR_RANGE
------ ------------
    10 1981 - 1982
    20 1980 - 1981
    20 1987 - 1987
    30 1981 - 1981

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s