Puzzle: For each department generate all ranges of consecutive years of hiring. For ex, if a department hired every year from 1981 to 1983 and then after a break from 1985 to 1986, there should be 2 ranges for that department in the output:
1981-1983
1985-1986
Expected result for emp table (in scott schema):
DEPTNO YR_RANGE
------- --------------
10 1981 - 1982
20 1980 - 1981
20 1987 - 1987
30 1981 - 1981
Level: Advanced
col yr_range for a20
WITH x AS (
SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year, LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
CASE WHEN TO_CHAR(hiredate,'yyyy')-LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) <=1 THEN 0 ELSE 1 END step
FROM EMP
ORDER BY deptno, hiredate
), y AS (
SELECT x.*, SUM(STEP)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x
)
SELECT deptno, MIN(year) || ' - ' || MAX(year) AS yr_range
FROM Y
GROUP BY deptno, gr_id
ORDER BY 1,2;
DEPTNO YR_RANGE
------- --------------
10 1981 - 1982
20 1980 - 1981
20 1987 - 1987
30 1981 - 1981
Let’s go over querie’s logic step by step.
Step 1: Identify records that will contribute to a range row in the output.
Visually, it is easy to see if we simply sort all records by deptno and hiredate:
set pagesize 100
SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year
FROM emp
ORDER BY deptno, hiredate;
DEPTNO ENAME HIREDATE YEAR
------ ---------- --------- -----
10 CLARK 09-JUN-81 1981
10 KING 17-NOV-81 1981
10 MILLER 23-JAN-82 1982
20 SMITH 17-DEC-80 1980
20 JONES 02-APR-81 1981
20 FORD 03-DEC-81 1981
20 SCOTT 19-APR-87 1987
20 ADAMS 23-MAY-87 1987
30 ALLEN 20-FEB-81 1981
30 WARD 22-FEB-81 1981
30 BLAKE 01-MAY-81 1981
30 TURNER 08-SEP-81 1981
30 MARTIN 28-SEP-81 1981
30 JAMES 03-DEC-81 1981
In the result above I marked alternating groups of records with bold font. So the first interval will be deptno:10, years: 1981-1982; the 2nd: deptno:20, years: 1980-1981; the 3rd: deptno: 20, years: 1987-1987, and the last one: deptno: 30, years: 1981-1981.
The challenge is to add a column “group id” to the above output that would uniquely identify each group. Once this is done, we will group by this column and take MIN/MAX on the year column to form the range.
Step 2: Though it is not so easy to immediately add the group id column, we can easily identify when each group starts and ends. This can be done by comparing current record’s year with the previous record’s year staying within a department based partition:
SELECT deptno, ename, hiredate,
TO_CHAR(hiredate,'yyyy') year,
LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
CASE WHEN TO_CHAR(hiredate,'yyyy')-
LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate)<=1 THEN 0
ELSE 1
END step
FROM emp
ORDER BY deptno, hiredate
DEPTNO ENAME HIREDATE YEAR PREV_YEAR STEP
------- ---------- --------- ----- ---------- ----------
10 CLARK 09-JUN-81 1981 1
10 KING 17-NOV-81 1981 1981 0
10 MILLER 23-JAN-82 1982 1981 0
20 SMITH 17-DEC-80 1980 1
20 JONES 02-APR-81 1981 1980 0
20 FORD 03-DEC-81 1981 1981 0
20 SCOTT 19-APR-87 1987 1981 1
20 ADAMS 23-MAY-87 1987 1987 0
30 ALLEN 20-FEB-81 1981 1
30 WARD 22-FEB-81 1981 1981 0
30 BLAKE 01-MAY-81 1981 1981 0
30 TURNER 08-SEP-81 1981 1981 0
30 MARTIN 28-SEP-81 1981 1981 0
30 JAMES 03-DEC-81 1981 1981 0
We can see that the STEP column when it turns to 1 indicates the beginning of the new range. In order to turn the step column into a group id, we simply need to make a cumulative summation on this column:
Step 3: Add Analytic SUM function
WITH x AS (
SELECT deptno, ename, hiredate,
TO_CHAR(hiredate,'yyyy') year,
LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
CASE WHEN TO_CHAR(hiredate,'yyyy')-
LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate)<=1 THEN 0
ELSE 1
END step
FROM emp
ORDER BY deptno, hiredate
) SELECT x.*, SUM(step)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x
DEPTNO ENAME HIREDATE YEAR STEP GR_ID
------ ---------- --------- ----- ---------- ----------
10 CLARK 09-JUN-81 1981 1 1
10 KING 17-NOV-81 1981 0 1
10 MILLER 23-JAN-82 1982 0 1
20 SMITH 17-DEC-80 1980 1 1
20 JONES 02-APR-81 1981 0 1
20 FORD 03-DEC-81 1981 0 1
20 SCOTT 19-APR-87 1987 1 2
20 ADAMS 23-MAY-87 1987 0 2
30 ALLEN 20-FEB-81 1981 1 1
30 WARD 22-FEB-81 1981 0 1
30 BLAKE 01-MAY-81 1981 0 1
30 TURNER 08-SEP-81 1981 0 1
30 MARTIN 28-SEP-81 1981 0 1
30 JAMES 03-DEC-81 1981 0 1
Step 4: Now we can see that a combination of depton and gr_id columns uniquely identify each group of records that will fall in a corresponding year range, so we are ready to do the aggregation:
WITH x AS (
SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year, LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
CASE WHEN TO_CHAR(hiredate,'yyyy')-LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) <=1 THEN 0 ELSE 1 END step
FROM EMP
ORDER BY deptno, hiredate
), y AS (
SELECT x.*, SUM(STEP)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x
)
SELECT deptno, MIN(year) || ' - ' || MAX(year) AS yr_range
FROM Y
GROUP BY deptno, gr_id
ORDER BY 1,2;
DEPTNO YR_RANGE
------ ------------
10 1981 - 1982
20 1980 - 1981
20 1987 - 1987
30 1981 - 1981
If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/
For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.