3 Solutions to Puzzle of the Week #20

Puzzle of the Week #20:

Produce a historical highest/lowest salary report that complies with the following requirements:

  • Use a single SELECT statement only
  • Only employees who were paid the highest or lowest salary in their respective department at the moment of hiring should be selected
  • Show the name, date of hire, department number, job title, and salary columns of the emp table, plus two additional calculated columns (flags), min_flag and max_flag, indicating that the employee was hired with the min/max salary in their respective department as of the time of hiring.
  • If two or more employees in the same department are paid the same max/min salary, only the one who was hired first should be picked for the report.
  • The query should work in Oracle 11g.

Expected Result:

[Image: expected result (POW20ER)]

#1. Using a recursive Common Table Expression (recursive WITH clause)

WITH y AS (
SELECT ename, job, deptno, hiredate, sal, 
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY hiredate) rn
FROM emp
), x (ename, job, deptno, hiredate, sal, min_sal, max_sal, min_flag, max_flag, rn) AS (
SELECT ename, job, deptno, hiredate, sal, sal, sal, 1, 1, 1
FROM y
WHERE rn=1
UNION ALL
SELECT y.ename, y.job, y.deptno, y.hiredate, y.sal, 
       LEAST(x.min_sal, y.sal), GREATEST(x.max_sal, y.sal),
       CASE WHEN y.sal<x.min_sal THEN 1 END, 
       CASE WHEN y.sal>x.max_sal THEN 1 END, y.rn
FROM y JOIN x ON y.deptno=x.deptno AND y.rn=x.rn+1
)
SELECT ename, job, deptno, hiredate, sal, min_flag, max_flag
FROM x
WHERE 1 IN (min_flag, max_flag)
ORDER BY deptno, hiredate;

#2. Using Cumulative Analytic Functions MIN, MAX, and ROW_NUMBER

WITH x AS (
SELECT ename, job, deptno, hiredate, sal,
       MIN(sal)OVER(PARTITION BY deptno ORDER BY hiredate) min_sal,
       MAX(sal)OVER(PARTITION BY deptno ORDER BY hiredate) max_sal,
       ROW_NUMBER()OVER(PARTITION BY deptno, sal ORDER BY hiredate) rn
FROM emp
)
SELECT ename, job, deptno, hiredate, sal,
       DECODE(sal, min_sal, 1) min_flag,
       DECODE(sal, max_sal, 1) max_flag
FROM x
WHERE sal IN (min_sal, max_sal)
  AND rn=1;
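
To see what the cumulative window functions in solution #2 compute, it may help to look at the intermediate columns for a single department. The query below is only an illustrative sketch, assuming the standard scott.emp table; the deptno=10 filter is an arbitrary choice and is not part of the original solution.

```sql
-- Sketch: running min/max salary by hire date within one department
-- (deptno=10 is picked only to keep the output short)
SELECT ename, hiredate, sal,
       MIN(sal)OVER(PARTITION BY deptno ORDER BY hiredate) min_sal,
       MAX(sal)OVER(PARTITION BY deptno ORDER BY hiredate) max_sal
FROM emp
WHERE deptno=10
ORDER BY hiredate;
```

With an ORDER BY in the window and no explicit frame, Oracle applies the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so min_sal and max_sal are running values up to each hire date (including any employees hired on the same date). A row where sal equals min_sal or max_sal is therefore a candidate for the report.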

#3. Using Cumulative Analytic Functions MIN, MAX, and COUNT

WITH x AS (
SELECT ename, job, deptno, hiredate, sal,
       CASE WHEN MIN(sal)OVER(PARTITION BY deptno ORDER BY hiredate)=sal
             AND COUNT(*)OVER(PARTITION BY deptno, sal ORDER BY hiredate)=1 THEN 1 
       END min_flag,
       CASE WHEN MAX(sal)OVER(PARTITION BY deptno ORDER BY hiredate)=sal
             AND COUNT(*)OVER(PARTITION BY deptno, sal ORDER BY hiredate)=1 THEN 1 
       END max_flag
FROM emp
)
SELECT *
FROM x
WHERE 1 IN (min_flag, max_flag);

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques, check out my book “Oracle SQL Tricks and Workarounds”.

 

 

 

Three Solutions to Puzzle of the Week #15

Puzzle of the Week #15:

Find all the year-based intervals from 1975 up to now when the company did not hire employees. Use a single SELECT statement against the emp table.

Expected Result:

years
------------
1975 - 1979
1983 - 1986
1988 - 2016

Solutions

#1: Grouping by an expression on ROWNUM (no Analytic functions!)

col years for a15

WITH x AS (
SELECT 1975+LEVEL-1 yr
FROM dual
CONNECT BY 1975+LEVEL-1<=EXTRACT(YEAR FROM SYSDATE)
MINUS
SELECT EXTRACT(YEAR FROM hiredate)
FROM emp
)
-- yr-ROWNUM is constant within a run of consecutive years; this relies on
-- the MINUS result arriving sorted by yr, which is typical but not guaranteed
SELECT MIN(yr) || ' - ' || MAX(yr) "years"
FROM x
GROUP BY yr-ROWNUM
ORDER BY yr-ROWNUM;

years
---------------
1975 - 1979
1983 - 1986
1988 - 2016
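
Why does grouping by yr-ROWNUM in solution #1 work? Within a run of consecutive years, both the year and the row number grow by 1 from row to row, so their difference stays constant; after a gap, the difference jumps to a new value. The sketch below illustrates this on a small hard-coded list of years (hypothetical values, not the real emp data). It uses ROW_NUMBER instead of ROWNUM purely to make the row order explicit in the illustration.

```sql
-- Sketch: the difference between a year and its row number
-- is the same for all years in one consecutive run
WITH x AS (
SELECT 1975 yr FROM dual UNION ALL
SELECT 1976 FROM dual UNION ALL
SELECT 1977 FROM dual UNION ALL
SELECT 1983 FROM dual UNION ALL
SELECT 1984 FROM dual
)
SELECT yr,
       ROW_NUMBER()OVER(ORDER BY yr) rn,
       yr-ROW_NUMBER()OVER(ORDER BY yr) grp
FROM x
ORDER BY yr;

-- yr    rn  grp
-- 1975   1  1974
-- 1976   2  1974
-- 1977   3  1974
-- 1983   4  1979
-- 1984   5  1979
```

Grouping by grp and taking MIN(yr) and MAX(yr) then yields one row per run: 1975 - 1977 and 1983 - 1984.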

#2: Calculating steps with an analytic function and grouping by the running sum of steps

WITH x AS (
SELECT 1975+LEVEL-1 yr
FROM dual
CONNECT BY 1975+LEVEL-1<=EXTRACT(YEAR FROM SYSDATE)
MINUS
SELECT EXTRACT(YEAR FROM hiredate)
FROM emp
), y AS (
SELECT DECODE(yr, LAG(yr,1)OVER(ORDER BY yr)+1, 0, 1) AS step, yr
FROM x
), z AS (
SELECT yr, SUM(step)OVER(ORDER BY yr) grp
FROM y
)
SELECT MIN(yr) || ' - ' || MAX(yr) "years"
FROM z
GROUP BY grp
ORDER BY grp;

years
---------------
1975 - 1979
1983 - 1986
1988 - 2016

#3: Using Self Outer Join to calculate steps

WITH x AS (
SELECT 1975+LEVEL-1 yr
FROM dual
CONNECT BY 1975+LEVEL-1<=EXTRACT(YEAR FROM SYSDATE)
MINUS
SELECT EXTRACT(YEAR FROM hiredate)
FROM emp
), y AS (
SELECT x1.yr, NVL2(x2.yr, 0, 1) step
FROM x x1 LEFT JOIN x x2 ON x1.yr=x2.yr+1
), z AS (
SELECT yr, SUM(step)OVER(ORDER BY yr) grp
FROM y
)
SELECT MIN(yr) || ' - ' || MAX(yr) "years"
FROM z
GROUP BY grp
ORDER BY grp;

years
---------------
1975 - 1979
1983 - 1986
1988 - 2016

 


Three Solutions to Puzzle of the Week #14

Puzzle of the Week #14:

For each department, find its share in the company’s total payroll. The puzzle should be solved with a single SELECT statement that does not use sub-queries, the WITH clause, in-line views, temporary tables, or PL/SQL functions.

Expected Result:

DEPTNO   Share, %
------ ----------
    10      30.15
    20      37.47
    30      32.39

Solutions

#1: Using Analytic Functions (SUM)

SELECT DISTINCT deptno, 
                ROUND(100*SUM(sal)OVER(PARTITION BY deptno)/SUM(sal)OVER(),2) AS "Share, %"
FROM emp
ORDER BY 1
/

DEPTNO   Share, %
------ ----------
    10      30.15
    20      37.47
    30      32.39

#2: Using Cartesian Product

SELECT a.deptno, 
       ROUND(100*SUM(a.sal)*COUNT(DISTINCT a.ROWID)/(SUM(b.sal)*COUNT(DISTINCT b.ROWID)), 2) AS "Share, %"
FROM emp a, emp b
GROUP BY a.deptno
ORDER BY 1
/
DEPTNO   Share, %
------ ----------
    10      30.15
    20      37.47
    30      32.39
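
The formula in solution #2 can look puzzling at first. Let N be the total number of employees, n the number of employees in a given department, S the company payroll, and s the department payroll. In the Cartesian product grouped by a.deptno, every row of "a" from that department is repeated N times and every row of "b" is repeated n times. The sketch below (an illustration only, with hypothetical column aliases) exposes the four aggregates separately.

```sql
-- Sketch: the building blocks of the Cartesian product formula
SELECT a.deptno,
       SUM(a.sal)              sum_a,  -- s * N
       COUNT(DISTINCT a.ROWID) cnt_a,  -- n
       SUM(b.sal)              sum_b,  -- S * n
       COUNT(DISTINCT b.ROWID) cnt_b   -- N
FROM emp a, emp b
GROUP BY a.deptno
ORDER BY 1;
```

Multiplying as in the solution gives (s*N*n)/(S*n*N) = s/S, which is exactly the department's share of the total payroll.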

#3: Using SUM(DISTINCT ..) on a Cartesian Product

SELECT a.deptno,
       ROUND(100*TRUNC(SUM(DISTINCT a.sal+a.empno/1000000)) /
                 TRUNC(SUM(DISTINCT b.sal+b.empno/1000000)), 2) "Share, %"
FROM emp a, emp b
GROUP BY a.deptno
ORDER BY 1;
DEPTNO   Share, %
------ ----------
    10      30.15
    20      37.47
    30      32.39


Solutions to Puzzle of the Week #13

Puzzle of the Week #13:

Table Setup and Puzzle description can be located here

Expected Result:

  ID FULL_NAME                             GROUP_ID
---- ----------------------------------- ----------
   8 Oscar Pedro Fernando Rodriguez               1
   9 Rodriguez, Oscar Pedro Fernando              1
  10 Oscar Fernando Rodriguez Pedro               1
   1 John Smith                                   2
   2 John L. Smith                                2
   4 Smith, John                                  2
   5 Tom Khan                                     3
  11 KHAN, TOM S.                                 3

Solutions:

#1. Using CTE (Recursive WITH) and LISTAGG

WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y(id, token, lvl) AS (
SELECT name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, 1), 1 
FROM x
UNION ALL
SELECT x.name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1), y.lvl+1
FROM x JOIN y ON x.name_id=y.id AND REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1) IS NOT NULL
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name, 
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

  ID FULL_NAME                                  GROUP_ID
--- ---------------------------------------- ----------
  8 Oscar Pedro Fernando Rodrigues                    1
  9 Rodrigues, Oscar Pedro Fernando                   1
 10 Oscar Fernando Rodrigues Pedro                    1
  1 John Smith                                        2
  2 John L. Smith                                     2
  4 Smith, John                                       2
  5 Tom Khan                                          3
 11 KHAN, TOM S.                                      3

Explanation:

The key idea is to split each name into multiple name tokens, then sort the tokens and merge them back into a single line. Matching (duplicate) names produce the same merged line, so we can use it to identify duplicates. The DENSE_RANK analytic function is used to generate sequential group id values.

The same idea is used in the solution below. The only difference is the way to split the names into tokens.
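
The split, sort, and merge idea can be tried on a single hard-coded name before applying it to the whole table. The sketch below is only an illustration: the name 'Smith, John L.' is a made-up input, and REGEXP_COUNT (available since Oracle 11g) is used here as one convenient way to count the tokens.

```sql
-- Sketch: normalize one name into a sorted token list
WITH t AS (
SELECT UPPER(REGEXP_REPLACE('Smith, John L.','[[:punct:]]')) full_name
FROM dual
), tokens AS (
SELECT REGEXP_SUBSTR(full_name, '[^ ]+', 1, LEVEL) token
FROM t
CONNECT BY LEVEL <= REGEXP_COUNT(full_name, '[^ ]+')
)
SELECT LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name
FROM tokens
WHERE LENGTH(token)>1;  -- drop initials such as "L"

-- ORDERED_NAME
-- ------------
-- JOHN SMITH
```

'John Smith' and 'John L. Smith' reduce to the same ordered_name, which is why such names end up in the same group.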

#2: Using CONNECT BY and TABLE/CAST/MULTISET functions

WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y AS (
SELECT name_id AS id, y.column_value AS token
FROM x,
     TABLE(CAST(MULTISET(SELECT REGEXP_SUBSTR(x.full_name, '[^ ]+', 1, LEVEL) token
                    FROM dual
                    CONNECT BY LEVEL <= LENGTH(full_name)-LENGTH(REPLACE(full_name,' '))+1
                        )
                AS sys.odcivarchar2list)
          ) y
WHERE LENGTH(y.column_value)>1
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name,
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

  ID FULL_NAME                                  GROUP_ID
---- ---------------------------------------- ----------
   8 Oscar Pedro Fernando Rodrigues                    1
   9 Rodrigues, Oscar Pedro Fernando                   1
  10 Oscar Fernando Rodrigues Pedro                    1
   1 John Smith                                        2
   2 John L. Smith                                     2
   4 Smith, John                                       2
   5 Tom Khan                                          3
  11 KHAN, TOM S.                                      3

 

Solutions to Puzzle of the Week #9

Puzzle of the Week #9:

All employees are sorted by employee number and need to be split into 3 groups of equal size (or as close to equal as possible). Employees with the smallest numbers go into the 1st group, the 2nd group gets employees with the next (bigger) range of numbers, and so on.

Write a single SELECT statement (against emp table) that would show group number, the range of employee numbers, and the size of each group.

Expected Result:

Group RANGE           Count
----- ---------- ----------
    1 7369-7654           5
    2 7698-7844           5
    3 7876-7934           4

Solutions:

#1: Using NTile Analytic function

WITH x AS (
SELECT empno, NTILE(3)OVER(ORDER BY empno) nt
FROM emp
)
SELECT nt "Group", MIN(empno)||'-'||MAX(empno) "Range", COUNT(*) "Count"
FROM x
GROUP BY nt
ORDER BY 1;

 Group Range           Count
------ ---------- ----------
     1 7369-7654           5
     2 7698-7844           5
     3 7876-7934           4

#2: Simulating NTile function

WITH x AS (
SELECT empno, CEIL(ROW_NUMBER()OVER(ORDER BY empno)/CEIL(COUNT(*)OVER()/3)) nt
FROM emp
)
SELECT nt "Group", MIN(empno)||'-'||MAX(empno) "Range", COUNT(*) "Count"
FROM x
GROUP BY nt
ORDER BY 1;

 Group Range           Count
------ ---------- ----------
     1 7369-7654           5
     2 7698-7844           5
     3 7876-7934           4
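
A word of caution on solution #2: the CEIL-based expression matches NTILE for the 14 rows of emp, but the two do not agree for every row count. The sketch below compares them on 10 generated rows (an arbitrary number chosen for illustration): NTILE produces groups of 4, 3, and 3 rows, while the simulation produces 4, 4, and 2.

```sql
-- Sketch: NTILE(3) versus the CEIL-based simulation on 10 rows
WITH x AS (
SELECT LEVEL n
FROM dual
CONNECT BY LEVEL<=10
), y AS (
SELECT n,
       NTILE(3)OVER(ORDER BY n) nt,
       CEIL(ROW_NUMBER()OVER(ORDER BY n)/CEIL(COUNT(*)OVER()/3)) sim
FROM x
)
SELECT n, nt, sim
FROM y
ORDER BY n;

-- n = 8 lands in group 3 with NTILE but in group 2 with the simulation
```

So the simulation is best treated as a technique that fits this puzzle's data rather than as a general drop-in replacement for NTILE.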

 


 

Solutions to Puzzle of the Week #8

Puzzle of the Week #8:

Find job titles represented in every department. Write a single SELECT statement only.

Expected Result: (Only clerks and managers work in all 3 departments: 10,20, and 30)

JOB
--------
CLERK
MANAGER

Solutions:

#1: Two COUNT(DISTINCT ..) in HAVING

SELECT job
FROM emp
GROUP BY job
HAVING COUNT(DISTINCT deptno)=(SELECT COUNT(DISTINCT deptno) FROM emp)

#2: Analytic COUNT(DISTINCT ..) with CONNECT BY

SELECT DISTINCT job
FROM (
SELECT job, deptno, LEVEL level#, COUNT(DISTINCT deptno) OVER() cnt
FROM emp
CONNECT BY job=PRIOR job
AND deptno>PRIOR deptno
)
WHERE level#=cnt

#3: Two Analytic COUNT(DISTINCT..)

WITH x AS (
SELECT deptno, job, COUNT(DISTINCT deptno)OVER() cnt, COUNT(DISTINCT deptno)OVER(PARTITION BY job) cnt2
FROM emp
)
SELECT DISTINCT job
FROM x
WHERE cnt=cnt2

OR

WITH x AS (
SELECT deptno, job, COUNT(DISTINCT deptno)OVER() cnt, COUNT(DISTINCT deptno)OVER(PARTITION BY job) cnt2
FROM emp
)
SELECT job
FROM x
WHERE cnt=cnt2
GROUP BY job

#4: Cartesian Product and Two COUNT(DISTINCT ..)

SELECT a.job
FROM emp a, emp b
GROUP BY a.job
HAVING COUNT(DISTINCT a.deptno)=COUNT(DISTINCT b.deptno)

#5: ROLLUP with RANK OVER COUNT(DISTINCT..)

WITH x AS (
SELECT job, COUNT(DISTINCT deptno) cnt, 
       RANK()OVER(ORDER BY COUNT(DISTINCT deptno)  DESC) rk
FROM emp
GROUP BY ROLLUP(job)
)
SELECT job
FROM x
WHERE rk=1
  AND job IS NOT NULL

#6: Analytic COUNT(DISTINCT..) comparison with MINUS

WITH x AS (
SELECT job, 
       CASE WHEN COUNT(DISTINCT deptno)OVER()=COUNT(DISTINCT deptno)OVER(PARTITION BY job) THEN 1 END
FROM emp
MINUS
SELECT job, NULL
FROM emp
)
SELECT job
FROM x

#7: No COUNT(DISTINCT ..) solution:

WITH x AS (
SELECT a.deptno, b.job, NVL(COUNT(c.empno),0) idx
FROM (SELECT DISTINCT deptno FROM emp) a CROSS JOIN (SELECT DISTINCT job FROM emp) b
    LEFT JOIN emp c ON a.deptno=c.deptno AND b.job=c.job
GROUP BY a.deptno, b.job
)
SELECT job
FROM x
GROUP BY job
HAVING MIN(idx)>0


Simulate LEAD and LAG functions using other analytic functions

Puzzle: Simulate LEAD and LAG functions using other analytic functions

Level: Intermediate

Solution:

Let’s say that we need to see every employee’s name along with the names of the 2 employees who were hired right before them. The expected result may look like this:

ENAME      HIREDATE  EMPL1      EMPL2
---------- --------- ---------- ---------
SMITH      17-DEC-80
ALLEN      20-FEB-81 SMITH
WARD       22-FEB-81 ALLEN      SMITH
JONES      02-APR-81 WARD       ALLEN
BLAKE      01-MAY-81 JONES      WARD
CLARK      09-JUN-81 BLAKE      JONES
TURNER     08-SEP-81 CLARK      BLAKE
MARTIN     28-SEP-81 TURNER     CLARK
KING       17-NOV-81 MARTIN     TURNER
JAMES      03-DEC-81 KING       MARTIN
FORD       03-DEC-81 JAMES      KING
MILLER     23-JAN-82 FORD       JAMES
SCOTT      19-APR-87 MILLER     FORD
ADAMS      23-MAY-87 SCOTT      MILLER

This is a trivial task if we employ the LAG function:

SELECT ename, hiredate, 
              LAG(ename,1) OVER(ORDER BY hiredate) empl1, 
              LAG(ename,2) OVER(ORDER BY hiredate) empl2
FROM emp;

To get the same result without LAG, one possible approach is to use the ROWS windowing clause with the MIN/MAX analytic functions:

SELECT ename, hiredate, 
       MAX(ename) OVER(ORDER BY hiredate ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) empl1,
       MAX(ename) OVER(ORDER BY hiredate ROWS BETWEEN 2 PRECEDING AND 2 PRECEDING) empl2
FROM emp;

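The general rule, which holds as long as value_expr itself is never NULL (NVL would also replace a genuine NULL value with the default, which LAG and LEAD do not do), is: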

LAG(value_expr,offset,default) OVER ([partition_clause] order_by_clause )

is the same as

NVL(MIN(value_expr)OVER ([partition_clause] order_by_clause  
   ROWS BETWEEN offset PRECEDING AND offset PRECEDING), default)

and

LEAD(value_expr,offset,default) OVER ([partition_clause] order_by_clause )

is the same as

NVL(MIN(value_expr)OVER ([partition_clause] order_by_clause  
  ROWS BETWEEN offset FOLLOWING AND offset FOLLOWING), default)

This substitution becomes essential in other RDBMS where MIN/MAX analytic functions are supported while LEAD/LAG are not. Teradata is one of the examples.
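
The rule can be checked side by side on the emp table. The query below is a sketch under two assumptions that are not in the original example: empno is added to the ORDER BY as a tie-breaker (JAMES and FORD share a hire date, so without it the row order is not deterministic), and 'n/a' is an arbitrary default value.

```sql
-- Sketch: built-in LAG/LEAD next to their MIN ... ROWS simulations
SELECT ename, hiredate,
       LAG(ename,1,'n/a') OVER(ORDER BY hiredate, empno) lag1,
       NVL(MIN(ename) OVER(ORDER BY hiredate, empno
                           ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 'n/a') lag1_sim,
       LEAD(ename,1,'n/a') OVER(ORDER BY hiredate, empno) lead1,
       NVL(MIN(ename) OVER(ORDER BY hiredate, empno
                           ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING), 'n/a') lead1_sim
FROM emp
ORDER BY hiredate, empno;
```

Each one-row window contains a single value, so MIN and MAX are interchangeable here; the lag1/lag1_sim and lead1/lead1_sim pairs should match on every row as long as ename has no NULL values.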


Merging/Creating Intervals with SELECT statement

Puzzle: For each department, generate all ranges of consecutive years of hiring. For example, if a department hired every year from 1981 to 1983 and then, after a break, from 1985 to 1986, there should be 2 ranges for that department in the output:

1981-1983
1985-1986

Expected result for emp table (in scott schema):

 DEPTNO YR_RANGE
------- --------------
     10 1981 - 1982
     20 1980 - 1981
     20 1987 - 1987
     30 1981 - 1981

Level: Advanced

col yr_range for a20

WITH x AS (
SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year, LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) <=1 THEN 0 ELSE 1 END step
FROM EMP
ORDER BY deptno, hiredate
), y AS (
SELECT x.*, SUM(STEP)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x
)
SELECT deptno, MIN(year) || ' - ' || MAX(year) AS yr_range
FROM Y
GROUP BY deptno, gr_id
ORDER BY 1,2;
 DEPTNO YR_RANGE
------- --------------
     10 1981 - 1982
     20 1980 - 1981
     20 1987 - 1987
     30 1981 - 1981

Let’s go over the query’s logic step by step.

Step 1: Identify records that will contribute to a range row in the output.

Visually, it is easy to see if we simply sort all records by deptno and hiredate:

set pagesize 100

SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year
FROM emp
ORDER BY deptno, hiredate;

DEPTNO ENAME      HIREDATE  YEAR
------ ---------- --------- -----
    10 CLARK      09-JUN-81 1981
    10 KING       17-NOV-81 1981
    10 MILLER     23-JAN-82 1982
    20 SMITH      17-DEC-80 1980
    20 JONES      02-APR-81 1981
    20 FORD       03-DEC-81 1981
    20 SCOTT      19-APR-87 1987
    20 ADAMS      23-MAY-87 1987
    30 ALLEN      20-FEB-81 1981
    30 WARD       22-FEB-81 1981
    30 BLAKE      01-MAY-81 1981
    30 TURNER     08-SEP-81 1981
    30 MARTIN     28-SEP-81 1981
    30 JAMES      03-DEC-81 1981

The result above splits into four groups of records. The first interval is deptno 10, years 1981-1982; the second is deptno 20, years 1980-1981; the third is deptno 20, years 1987-1987; and the last is deptno 30, years 1981-1981.

The challenge is to add a column “group id” to the above output that would uniquely identify each group. Once this is done, we will group by this column and take MIN/MAX on the year column to form the range.

Step 2: Though it is not so easy to add the group id column immediately, we can easily identify when each group starts and ends. This can be done by comparing the current record’s year with the previous record’s year within a department-based partition:

SELECT deptno, ename, hiredate, 
       TO_CHAR(hiredate,'yyyy') year, 
       LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-
	         LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate)<=1 THEN 0 
	    ELSE 1 
       END step
FROM emp
ORDER BY deptno, hiredate

 DEPTNO ENAME      HIREDATE  YEAR  PREV_YEAR        STEP
------- ---------- --------- ----- ---------- ----------
     10 CLARK      09-JUN-81 1981                      1
     10 KING       17-NOV-81 1981  1981                0
     10 MILLER     23-JAN-82 1982  1981                0
     20 SMITH      17-DEC-80 1980                      1
     20 JONES      02-APR-81 1981  1980                0
     20 FORD       03-DEC-81 1981  1981                0
     20 SCOTT      19-APR-87 1987  1981                1
     20 ADAMS      23-MAY-87 1987  1987                0
     30 ALLEN      20-FEB-81 1981                      1
     30 WARD       22-FEB-81 1981  1981                0
     30 BLAKE      01-MAY-81 1981  1981                0
     30 TURNER     08-SEP-81 1981  1981                0
     30 MARTIN     28-SEP-81 1981  1981                0
     30 JAMES      03-DEC-81 1981  1981                0

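We can see that the STEP column turns to 1 at the beginning of each new range. For the first record in each department, LAG returns NULL, so the comparison is not true and the CASE expression falls through to ELSE 1. To turn the step column into a group id, we simply need to take a cumulative sum of this column: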

Step 3: Add Analytic SUM function

WITH x AS (
SELECT deptno, ename, hiredate, 
       TO_CHAR(hiredate,'yyyy') year, 
       LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-
	         LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate)<=1 THEN 0 
	    ELSE 1 
       END step
FROM emp
ORDER BY deptno, hiredate
) SELECT x.*, SUM(step)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x

DEPTNO ENAME      HIREDATE  YEAR        STEP      GR_ID
------ ---------- --------- ----- ---------- ----------
    10 CLARK      09-JUN-81 1981           1          1
    10 KING       17-NOV-81 1981           0          1
    10 MILLER     23-JAN-82 1982           0          1
    20 SMITH      17-DEC-80 1980           1          1
    20 JONES      02-APR-81 1981           0          1
    20 FORD       03-DEC-81 1981           0          1
    20 SCOTT      19-APR-87 1987           1          2
    20 ADAMS      23-MAY-87 1987           0          2
    30 ALLEN      20-FEB-81 1981           1          1
    30 WARD       22-FEB-81 1981           0          1
    30 BLAKE      01-MAY-81 1981           0          1
    30 TURNER     08-SEP-81 1981           0          1
    30 MARTIN     28-SEP-81 1981           0          1
    30 JAMES      03-DEC-81 1981           0          1

Step 4: Now we can see that the combination of the deptno and gr_id columns uniquely identifies each group of records that will fall into the corresponding year range, so we are ready to do the aggregation:

WITH x AS (
SELECT deptno, ename, hiredate, TO_CHAR(hiredate,'yyyy') year, LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) prev_year,
       CASE WHEN TO_CHAR(hiredate,'yyyy')-LAG(TO_CHAR(hiredate,'yyyy'),1)OVER(PARTITION BY deptno ORDER BY hiredate) <=1 THEN 0 ELSE 1 END step
FROM EMP
ORDER BY deptno, hiredate
), y AS (
SELECT x.*, SUM(STEP)OVER(PARTITION BY deptno ORDER BY hiredate) gr_id
FROM x
)
SELECT deptno, MIN(year) || ' - ' || MAX(year) AS yr_range
FROM Y
GROUP BY deptno, gr_id
ORDER BY 1,2;

DEPTNO YR_RANGE
------ ------------
    10 1981 - 1982
    20 1980 - 1981
    20 1987 - 1987
    30 1981 - 1981


Generate a department/employee roll report

Level: Intermediate/Advanced

Puzzle: Generate a department/employee roll report (with a single SELECT statement) that would look as follows:

10         20         30
---------- ---------- -------
CLARK      ADAMS      ALLEN
KING       FORD       BLAKE
MILLER     JONES      JAMES
           SCOTT      MARTIN
           SMITH      TURNER
                      WARD

Assumption: Only departments 10, 20, and 30 are expected in the output.

Note that the columns in the report may, and likely will, contain different numbers of values. This makes the puzzle somewhat tricky.

Method/Workaround #1: Using FULL join on 3 in-line views

WITH d10 AS (
SELECT ename, ROW_NUMBER()OVER(ORDER BY ename) rn
FROM emp
WHERE deptno=10
),   d20 AS (
SELECT ename, ROW_NUMBER()OVER(ORDER BY ename) rn
FROM emp
WHERE deptno=20
),   d30 AS (
SELECT ename, ROW_NUMBER()OVER(ORDER BY ename) rn
FROM emp
WHERE deptno=30
)
SELECT  d10.ename "10", d20.ename "20", d30.ename "30"
FROM d10 FULL JOIN d20 ON d10.rn=d20.rn
	 FULL JOIN d30 ON d10.rn=d30.rn OR d20.rn=d30.rn
ORDER BY COALESCE(d10.rn, d20.rn, d30.rn)

Note the OR operator in the 2nd FULL JOIN condition. If you omit it, the result will be different:

WITH d10 AS (
SELECT ename, ROW_NUMBER()OVER(ORDER BY ename) rn
FROM emp
WHERE deptno=10
),   d20 AS (
SELECT ename, ROW_NUMBER()OVER(ORDER BY ename) rn
FROM emp
WHERE deptno=20
),   d30 AS (
SELECT ename, ROW_NUMBER()OVER(ORDER BY ename) rn
FROM emp
WHERE deptno=30
)
SELECT  d10.ename "10", d20.ename "20", d30.ename "30"
FROM d10 FULL JOIN d20 ON d10.rn=d20.rn
	 FULL JOIN d30 ON d10.rn=d30.rn --OR d20.rn=d30.rn
ORDER BY COALESCE(d10.rn, d20.rn, d30.rn)
/

10         20         30
---------- ---------- -------
CLARK      ADAMS      ALLEN
KING       FORD       BLAKE
MILLER     JONES      JAMES
                      MARTIN
           SCOTT
                      TURNER
           SMITH
                      WARD

Since we don’t know which department will have more employees, we can’t reliably pick the right order for joining tables, so we have to twist it with an additional OR condition.

Overall, this solution is quite simple and straightforward, but very bulky and not scalable. Imagine having 10 departments to show in the report: the SQL would not be neat at all.
The following 2 workarounds offer a substantially better solution.

Method/Workaround #2: Using PIVOT clause

SELECT "10","20","30"
FROM (
  SELECT ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY ename) rn, deptno, ename
  FROM emp
)
PIVOT
(
  MAX(ename)
  FOR deptno IN (10,20,30)
)
ORDER BY rn

Note that the implicit aggregation is done by the “rn” column, the only column not referenced in the PIVOT clause and the only attribute shared by all 3 output columns. Since rn is unique within each department, grouping by it makes MAX(ename) evaluate to ename itself, as each group always has at most one value per department.

Method/Workaround #3: Traditional simulation of PIVOT clause

WITH x AS (
SELECT CASE WHEN deptno=10 THEN ename END "10",
       CASE WHEN deptno=20 THEN ename END "20",
       CASE WHEN deptno=30 THEN ename END "30",
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY ename) rn
FROM emp
)
SELECT MAX("10") AS "10",
       MAX("20") AS "20",
       MAX("30") AS "30"
FROM x
GROUP BY rn
ORDER BY rn

It is a less compact but much more generic approach in a sense that it will work even in those RDBMS that don’t support PIVOT. The idea behind this method is identical to the one used in Method 2.
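
To see why grouping by rn works in Method 3, it can help to look at the rows before aggregation. The query below is an illustrative sketch that simply runs the inner part of the method on its own, with an ORDER BY added for readability.

```sql
-- Sketch: the sparse rows that Method 3 collapses with MAX ... GROUP BY rn
SELECT ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY ename) rn,
       CASE WHEN deptno=10 THEN ename END "10",
       CASE WHEN deptno=20 THEN ename END "20",
       CASE WHEN deptno=30 THEN ename END "30"
FROM emp
ORDER BY rn, deptno;
```

Every row has a name in exactly one of the three columns and NULL in the other two. For a given rn there is at most one row per department, so MAX over each column picks the single non-NULL name (or stays NULL when a department has fewer employees).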


How to generate a random sample of numeric and alpha-numeric values

Puzzle: Generate a random sample of numeric and alpha-numeric values

Level: Intermediate/Advanced

This problem often arises when you need to run a test and you don’t have a table to experiment with. Not a problem any more!

Related Post: Use TRUNC function to generate various date ranges

Step 1: Generate a random list of 10 alpha-numeric values:

SELECT dbms_random.string('x',3) rnd
FROM dual
CONNECT BY ROWNUM<=10

RND
-----
3TI
1JB
CIP
9SE
79K
YNZ
VEG
V0B
KPN
ILR

Step 2: Generate a random list of 10 integer values:

SELECT TRUNC(dbms_random.VALUE(-999,999)) rnd
FROM dual
CONNECT BY ROWNUM<=10

RND
----
 539
 153
 979
 689
 212
 267
  -5
-832
-160
 665

Step 3: Mix the above lists together (with equal shares):

SELECT CASE WHEN MOD(level,2)=0 THEN dbms_random.string('x',3)
            ELSE TO_CHAR(TRUNC(dbms_random.VALUE(-999,999)))
       END rnd
FROM dual
CONNECT BY ROWNUM<=10

RND
-------
-513
SVA
-475
NRM
-903
G45
-654
2S3
415
0HG

Step 4: Let’s make the number of integers and strings in the output random as well, by generating 40 values, shuffling them, and keeping the first 10:

WITH x AS (
SELECT CASE WHEN MOD(level,2)=0 THEN dbms_random.string('x',3)
            ELSE TO_CHAR(TRUNC(dbms_random.VALUE(-999,999)))
       END rnd,
       ROW_NUMBER()OVER(ORDER BY dbms_random.value) rk
FROM dual
CONNECT BY ROWNUM<=40
)
SELECT CAST(rnd AS VARCHAR2(4)) rnd
FROM x
WHERE rk<=10

RND
----
-985
0TG
8JZ
-714
500
199
7IJ
249
RNI
F2G

Step 5: Final touch – let’s add a column that would flag integers

WITH x AS (
SELECT CASE WHEN MOD(level,2)=0 THEN dbms_random.string('x',3)
            ELSE TO_CHAR(TRUNC(dbms_random.VALUE(-999,999)))
       END rnd,
       ROW_NUMBER()OVER(ORDER BY dbms_random.value) rk
FROM dual
CONNECT BY ROWNUM<=40
)
SELECT CAST(rnd AS VARCHAR2(4)) rnd, CASE WHEN REGEXP_LIKE(rnd,'^-?[[:digit:]]+$') THEN 1 ELSE 0 END is_int
FROM x
WHERE rk<=10

RND      IS_INT
---- ----------
WIS           0
-558          1
0QR           0
-433          1
RB0           0
PT8           0
409           1
YOV           0
969           1
FFI           0
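
Because the sample in Step 5 is random, the integer flag is easier to verify on fixed input. The sketch below applies the same regular expression to a few hard-coded values (made up for this illustration), including two edge cases: a lone minus sign and a minus sign in the middle.

```sql
-- Sketch: checking the is_int expression on known values
WITH x AS (
SELECT '-558' rnd FROM dual UNION ALL
SELECT '0QR' FROM dual UNION ALL
SELECT '409' FROM dual UNION ALL
SELECT '-' FROM dual UNION ALL
SELECT '12-3' FROM dual
)
SELECT rnd, CASE WHEN REGEXP_LIKE(rnd,'^-?[[:digit:]]+$') THEN 1 ELSE 0 END is_int
FROM x;

-- '-558' and '409' are flagged 1; '0QR', '-' and '12-3' are flagged 0
```

Note that dbms_random.string('x',3) can occasionally return a string made of digits only, and such a value would be flagged as an integer too.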

 
