Explore the power of analytic functions – Part 2

In the previous blog post on that subject we reviewed a couple a notable applications of Oracle analytic functions. Today, I came across another interesting illustration of the same concept.

Problem: List all employees from the same department and holding the same job title as ADAMS.

Expected Result:

ENAME JOB DEPTNO
ADAMS CLERK 20
SMITH CLERK 20

Like before, we start with traditional approaches that every experienced developer would easily demonstrate.

Strategy #1:  Using multi-column subquery

SELECT ename, job, deptno
FROM scott.emp
WHERE (deptno, job) IN (SELECT deptno, job
                        FROM scott.emp
                        WHERE ename = 'ADAMS')
ORDER BY 2, 3, 1

Strategy #2:  Using self-join

SELECT a.ename, job, deptno
FROM scott.emp a JOIN scott.emp b USING(deptno, job)
WHERE b.ename = 'ADAMS'
ORDER BY job, deptno, a.ename

Strategy #3:  Using EXISTS predicate

SELECT ename, job, deptno
FROM scott.emp a
WHERE EXISTS (SELECT 1
              FROM scott.emp
              WHERE ename  = 'ADAMS'
                AND deptno = a.deptno
                AND job    = a.job)
ORDER BY 2, 3, 1

A common feature of all the strategies above is having two copies of the emp table with two joining conditions (deptno, job) and one filter (ename = ‘ADAMS’)

As we have seen before, with analytic functions, we can get away with a single copy of th emp table.

Strategy #4:  Using COUNT Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       COUNT(DECODE(ename, 'ADAMS', 1)) 
             OVER(PARTITION BY deptno, job) cnt
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE cnt > 0
ORDER BY 2, 3, 1

Of course, you can use different analytic functions here:

Strategy #5:  Using MAX Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       MAX(DECODE(ename, 'ADAMS', ename)) 
           OVER(PARTITION BY deptno, job) adams
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE adams = 'ADAMS'
ORDER BY 2, 3, 1

Strategy #6:  Using LISTAGG Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       LISTAGG(DECODE(ename, 'ADAMS', 'Y'), '|') WITHIN GROUP (ORDER BY 1) 
               OVER(PARTITION BY deptno, job) flag
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE flag LIKE 'Y%'
ORDER BY 2, 3, 1

We need to use LIKE operator in case we have more than a single Adams working in the same department and holding the same job title.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Explore the power of analytic functions

Analytic functions are still underutilized among database developers. The goal of this publication is to demonstrate hidden opportunities to improve query performance by using analytic functions where traditional approach is still dominating.

Let’s consider a few examples. We will use Oracle’s traditional educational schema scott and its famous emp and dept tables.

Problem #1: Find all employees from the department where a top paid clerk works.

A quick look at the clerk records reveals the expected department (10):

SELECT deptno, job, sal
FROM scott.emp
WHERE job = 'CLERK'
ORDER BY sal DESC

Result:

DEPTNO JOB SAL
10 CLERK 1300
20 CLERK 1100
30 CLERK 950
20 CLERK 800

Traditional approach suggests finding the department (10) first and then the task becomes trivial:

SELECT ename, deptno, job, sal
FROM scott.emp
WHERE deptno IN (<department that we found>)

OK, that works, but now we need to produce a query that will get us that department number (of several ones if they all have top paid clerks).

There are many ways of finding a top record with and without analytic functions. If we don’t use analytic functions we will end up using two copies of the emp table as in the following example:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
  AND sal = (SELECT MAX(sal) FROM scott.emp WHERE job = 'CLERK')

or this:

SELECT a.deptno
FROM scott.emp a LEFT JOIN scott.emp b ON a.sal < b.sal
AND b.job = 'CLERK'
WHERE b.deptno IS NULL 
  AND a.job = 'CLERK'

The use of analytic functions reduces the number of table copies as we can get all the necessary details in a single table scan as in the following query:

WITH x AS (
SELECT deptno, MAX(sal) max_sal, RANK() OVER(ORDER BY MAX(sal) DESC) rk
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
)
SELECT deptno
FROM x
WHERE rk = 1

Such SQL looks even shorter in databases, such as Snowflake and Terdata, that support QUALIFY clause in SELECT statement:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
QUALIFY RANK() OVER(ORDER BY MAX(sal) DESC) = 1

So a complete traditional approach will use 3 copies of the emp table as in the following query:

SELECT ename, deptno, job, sal
FROM scott.emp                                 -- copy #1
WHERE deptno IN (SELECT deptno
                 FROM scott.emp                -- copy #2
                 WHERE job = 'CLERK'
                   AND sal = (SELECT MAX(sal) 
                              FROM scott.emp   -- copy #3
                              WHERE job = 'CLERK')
                 )

Yes, it works, but what an overkill! Is it still possible to use a single emp table scan to solve this problem? The answer is YES:

WITH x AS (
SELECT ename, deptno, job, sal, 
       MAX(DECODE(job,'CLERK',sal)) OVER() max_sal_global,
       MAX(DECODE(job,'CLERK',sal)) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

And of course, in the Snowflake/Teradata SQL we would not even have to use a CTE (common table expression) , thanks to the QUALIFY clause.

MAX analytic functions combined with DECODE (CASE would work as well) here ignore all non-Clerk rows . The OVER clause gives us either department level top salary or global top salary value for the clerks. And since it is done on a row level, which means that the these analytic functions return the same values for all employees in the same department, we can achieve our goal easily.

MAX function is not the only one analytic function that can be used here. The following example demonstrates the FIRST_VALUE function to achieve the same result:

WITH x AS (
SELECT ename, deptno, job, sal, 
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_global,
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(PARTITION BY deptno
              ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

As an exercise, try to use LISTAGG analytic function to solve this problem.

Problem #2: Find employees who are paid above the average salary in their respective department.

Again, we will start with a “traditional” approach:

SELECT ename, deptno, job, sal
FROM scott.emp a
WHERE sal > (SELECT AVG(sal)
             FROM scott.emp
             WHERE deptno = a.deptno)
ORDER BY deptno, sal

Result:

ENAME DEPTNO JOB SAL
KING 10 PRESIDENT 5000
JONES 20 MANAGER 2975
SCOTT 20 ANALYST 3000
FORD 20 ANALYST 3000
ALLEN 30 SALESMAN 1600
BLAKE 30 MANAGER 2850

An experienced developer would quickly see that this problem can be solved with the same approach as the problem #1 (above):

WITH x AS (
SELECT ename, deptno, job, sal, AVG(sal) OVER(PARTITION BY deptno) avg_sal
FROM scott.emp
)
SELECT *
FROM x
WHERE sal > avg_sal
ORDER BY deptno, sal

Result:

ENAME DEPTNO JOB SAL AVG_SAL
KING 10 PRESIDENT 5000 2916.66667
JONES 20 MANAGER 2975 2175
FORD 20 ANALYST 3000 2175
SCOTT 20 ANALYST 3000 2175
ALLEN 30 SALESMAN 1600 1566.66667
BLAKE 30 MANAGER 2850 1566.66667

With this strategy we can even see the department average salary value.

Problem #3: List all employees who work in the same department as the president.

The problem was discussed in one of my old blog posts and I strongly suggest you to check it out: A trick that helps avoiding multiple table scans.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

 

 

 

A trick that helps avoiding multiple table scans.

Let’s look at a fairly simple SQL problem:

In a traditional scott.emp table, find all employees who work in the same department as the president.

  • Make your query work even if there are more than 1 president records exist in emp table
  • Make Oracle scan emp table just ONCE

A “traditional” solution to this problem may look like this:

Strategy #1: Using a subquery

SELECT *
FROM scott.emp
WHERE deptno IN (SELECT deptno 
                 FROM scott.emp 
                 WHERE job='PRESIDENT')

or this:

Strategy #2: Using a self-join

SELECT DISTINCT a.*
FROM scott.emp a JOIN scott.emp b ON a.deptno=b.deptno
WHERE b.job='PRESIDENT'

Note, that DISTINCT option in the above query is needed to prevent duplicates if there were multiple presidents in а specific department.

Both solutions above use 2 copies of the emp table which makes oracle scan the same scott.emp table twice.

A trick presented below allows you to use only a single copy of the emp table to solve the problem. The trick involves different conceptual and technical approaches compared to the solutions we have seen so far.

Conceptually, we should rephrase the problem in a way that would keep it identical and at the same time allows us to use different technical arsenal. This approach is explained in a detailed manner in my book “Oracle SQL Tricks and Workarounds”. We can rephrase the puzzle and say that we are looking for employees from departments with some “positive” number of presidents working there. Technically speaking, we need to use analytic function COUNT and check if it is greater than 0:

Strategy #3: Using analytic function COUNT

WITH x AS (
SELECT e.*, 
       COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno) cnt
FROM scott.emp e
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE cnt>0
ORDER BY empno

We do need to use a common table expression as we cannot filter out by analytic function in the same query where the function is used. Nevertheless, we scan the emp table just once, and during this scan, Oracle engine counts the number of presidents in each department.

COUNT is not the only analytic function that can be employed to solve the problem.

Strategy #4: Using analytic function LISTAGG

WITH x AS (
SELECT e.*, 
       LISTAGG(job,'|') 
         WITHIN GROUP (ORDER BY job) OVER(PARTITION BY deptno) jobs
FROM scott.emp e 
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE '|' || jobs || '|' LIKE '%|PRESIDENT|%'
ORDER BY empno

Instead of counting the presidents by department, we simply concatenate all the job titles and check if the resulting string includes a president.

Finally, if you don’t like using sub-queries in general, we can leverage the power of MODEL clause:

Strategy #5: Using MODEL clause to avoid sub-queries

SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM scott.emp
MODEL RETURN UPDATED ROWS 
DIMENSION BY (
  empno, 
  SIGN(COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno)) cnt
) 
MEASURES(ename, job, mgr, hiredate, sal, comm, deptno, 0 dummy) 
RULES(dummy[ANY, 1]=1)
ORDER BY empno

The tricky part here is using a composition of SIGN, COUNT, and DECODE functions (i.e. SIGN on top of what we used in Strategy #3) as a secondary dimension and empno as primary. Employee number is unique by itself, so adding another dimension will still maintain uniqueness required by MODEL clause. The only MODEL RULE changes the dummy measure which “triggers” the “RETURN UPDATED ROWS” instruction and returns only those rows where the dummy dimension was set to 1 – notice that its default value is 0.

You can check the execution plan for all of the above strategies to see how many times Oracle scans the emp table.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

 

 

Monday Hiring SQL Puzzle and Lateral View Usage Nuance

Let’s solve a fairly simple SQL problem:

For each department count the number of people hired on Monday

• Use scott.emp table
• Show department number and count columns
• If no employees from a given department was hired on Monday, we should list such
department with 0 in the count column
• Sort the result by the department number

Expected Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

We will start with the in-line scalar subquery approach as it is probably one of the most intuitive:

Strategy #1: In-Line Scalar Subquery

SELECT deptno, (SELECT COUNT(*)
                FROM scott.emp
                WHERE deptno=e.deptno
                AND TO_CHAR(hiredate, 'DY')='MON') mon_hires
FROM scott.emp e
GROUP BY deptno
ORDER BY 1

When you only need a single value/column/expression from a correlated subquery, the in-line subquery in SELECT clause works just fine. If we needed more than one: count and let say total salary, we would need to use LATERAL view (or a completely different approach – see below).

Strategy #2: Lateral View

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires, m.total_sal
FROM x, LATERAL (SELECT COUNT(*) mon_hires, SUM(sal) total_sal
                 FROM scott.emp e
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

So far, all is good. I know that some database developers don’t like using table aliases too much and when an opportunity comes they use ANSI standard JOIN syntax with USING clause. Can it be applied in the lateral view?

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(empno) mon_hires
                 FROM scott.emp JOIN x USING (deptno)
                 WHERE TO_CHAR(hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The syntax is correct but the result is apparently not! What happened?

We replaced the WHERE clause condition of e.deptno=x.deptno with the JOIN x USING(deptno) – it first looked legitimate to me until I realized that we have just added an extra join instead of a reference to an external table (CTE x). Essentially, our last (incorrect) query is the same as the following:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(*) mon_hires
                 FROM scott.emp e, x
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

We simply introduced a new (and unwanted) join on the CTE x and turned the correlated reference (in the WHERE clause) into an old-style joining condition which did not even require the LATERAL view functionality:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, (SELECT COUNT(*) mon_hires
         FROM scott.emp e, x
         WHERE e.deptno=x.deptno
           AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The above examples prove that we need to be very careful when mixing up different techniques and new syntax options.

Finally, the last approach will show that only a single scan of the emp table is needed to get the result:

Strategy #3: Conditional Counting

SELECT deptno, COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

This is the best and incidentally the shortest solution that once again demonstrates the power of conditional counting (aggregation) right in SELECT clause.

Likewise we can also show the total salary of those hired on Monday:

SELECT deptno, 
       COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires,
       SUM(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', sal)) total_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

8 Solutions to 2018 Oracle SQL Puzzle of the Week #9

Recent employment Puzzle

For each location, show 2 most recently hired employees

  • Use a single SELECT statement only.
  • ename1 and hiredate1 columns should correspond the latest hired employee while ename1 and hiredate1 columns – the previous one

Expected Result:

LOC ENAME1 HIREDATE1 ENAME2 HIREDATE2
NEW YORK MILLER 23-JAN-82 KING 17-NOV-81
CHICAGO JAMES 03-DEC-81 MARTIN 28-SEP-81
DALLAS ADAMS 23-MAY-87 SCOTT 19-APR-87

Solutions:

Solution #1. Using Self-Join and MAX functions

SELECT d.loc, 
     MAX(e1.ename) KEEP(DENSE_RANK FIRST ORDER BY e1.hiredate DESC) ename1, 
     MAX(e1.hiredate) hiredate1, 
     MAX(e2.ename) KEEP(DENSE_RANK FIRST ORDER BY e2.hiredate DESC) ename2, 
     MAX(e2.hiredate) hiredate2 
FROM scott.emp e1 JOIN scott.emp e2 ON e1.deptno=e2.deptno 
 AND e1.hiredate>=e2.hiredate 
 AND e1.ROWID!=e2.ROWID 
                  JOIN scott.dept d ON e1.deptno=d.deptno 
GROUP BY d.loc;

Solution #2. Using LISTAGG and REGEXP_SUBSTR functions

SELECT d.loc, 
       MAX(ename) KEEP(DENSE_RANK FIRST ORDER BY hiredate DESC) ename1,
       MAX(hiredate) hiredate1, 
       REGEXP_SUBSTR(LISTAGG(ename, '|') WITHIN GROUP (ORDER BY hiredate DESC),
 '[^|]+',1,2) ename2,
       REGEXP_SUBSTR(LISTAGG(hiredate, '|') WITHIN GROUP (ORDER BY hiredate DESC),
 '[^|]+',1,2) hiredate2
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
GROUP BY d.loc
ORDER BY 1;

Solution #3. Using CTE, ROW_NUMBER, and Self-Join

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT a.loc, a.ename ename1, a.hiredate hiredate1,
              b.ename ename2, b.hiredate hiredate2
FROM x a JOIN x b ON a.loc=b.loc AND a.rn=1 AND b.rn=2;

Solution #4. Using Pivot

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, e1_ename AS ename1, e1_hdate AS hiredate1,
       e2_ename AS ename2, e2_hdate AS hiredate2
FROM x
PIVOT (
MAX(ename) ename, MAX(hiredate) hdate FOR rn IN (1 AS e1, 2 AS e2) 
)
ORDER BY 1;

Solution #5. Simulating Pivot with MAX and DECODE functions

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, 
       MAX(DECODE(rn,1,ename)) ename1, 
       MAX(DECODE(rn,1,hiredate)) hiredate1,
       MAX(DECODE(rn,2,ename)) ename2, 
       MAX(DECODE(rn,2,hiredate)) hiredate2
FROM x
GROUP BY loc
ORDER BY 1;

Solution #6. Using CONNECT BY

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, PRIOR ename ename1, PRIOR hiredate hiredate1, 
       ename ename2, hiredate hriedate2
FROM x
WHERE rn=2
START WITH rn=1
CONNECT BY loc=PRIOR loc
       AND rn=PRIOR rn+1;

Solution #7. Using LEAD and ROW_NUMBER Analytic functions

WITH x AS (
SELECT d.loc, e.ename ename1, e.hiredate hiredate1, 
 LEAD(e.ename,1) OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) ename2,
 LEAD(e.hiredate,1) OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) hiredate2,
 ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
) 
SELECT loc, ename1, hiredate1, ename2, hiredate2
FROM x
WHERE rn=1
ORDER BY 1;

Solution #8. Using Model Clause:

SELECT loc, ename1, hiredate1, ename2, hiredate2
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
MODEL
RETURN UPDATED ROWS
PARTITION BY (d.loc)
DIMENSION BY (
   ROW_NUMBER()OVER(PARTITION BY d.loc ORDER BY e.hiredate DESC) AS rn
)
MEASURES(
    ename AS ename1, hiredate AS hiredate1, 
    ename AS ename2, hiredate AS hiredate2
)
RULES(
    ename2[1]   =ename1[2],
    hiredate2[1]=hiredate1[2]
)
ORDER BY 1;

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

3 Solutions to 2018 Oracle SQL Puzzle of the Week #2

2018 Puzzle of the Week #2:

For each of the following salary ranges select two randomly chosen employees:
0-999
1000-1999
2000-2999
3000+

Expected Result (in SQL*Plus):

ENAME      SAL        RANGE
---------- ---------- ---------
SCOTT            3000 3000+
FORD             3000 3000+
BLAKE            2850 2000-2999
CLARK            2450 2000-2999
TURNER           1500 1000-1999
MILLER           1300 1000-1999
JAMES             950 0-999
SMITH             800 0-999
  • Remember to use only a single SELECT statement.
  • Use table emp (from Oracle scott schema)

Solutions:

Solution #1: Using ROW_NUMBER with random.value functions:

We are applying a random sorting order to each of the salary ranges and take 2 top records from each range:

WITH x AS (
SELECT CASE WHEN sal<=999  THEN '0-999'
            WHEN sal<=1999 THEN '1000-1999'
            WHEN sal<=2999 THEN '2000-2999'
            ELSE                '3000+'
       END range,
       ename, sal
FROM emp
), y AS (
SELECT ename, sal, range, 
       ROW_NUMBER()OVER(PARTITION BY range 
                        ORDER BY dbms_random.value) rn
FROM x
)
SELECT range, ename, sal
FROM y
WHERE rn<=2
ORDER BY range

Result:

RANGE     ENAME      SAL
--------- ---------- ----------
0-999     JAMES      950
0-999     SMITH      800
1000-1999 WARD 1250
1000-1999 TURNER 1500
2000-2999 JONES 2975
2000-2999 CLARK 2450
3000+     FORD 3000
3000+     KING 5000

Result (of subsequent execution):

RANGE     ENAME             SAL
--------- ---------- ----------
0-999     SMITH             800
0-999     JAMES             950
1000-1999 WARD             1250
1000-1999 MARTIN           1250
2000-2999 BLAKE            2850
2000-2999 JONES            2975
3000+     SCOTT            3000
3000+     KING             5000

Solution #2: Using DECODE, MAX() KEEP and UNION ALL:

Instead of taking top 2 records (randomly sorted), we are taking top 1 and bottom 1 and combine them together. DECODE function mimics the CASE from the previous solution.

WITH x AS (
SELECT DECODE(1, SIGN(999-sal), '0-999', SIGN(1999-sal), '1000-1999',
                 SIGN(2999-sal), '2000-2999', '3000+') range,
       ename, sal, ROWNUM || dbms_random.value rnd
FROM scott.emp
)
SELECT range, MAX(ename)KEEP(DENSE_RANK FIRST ORDER BY rnd) ename,
              MAX(sal)  KEEP(DENSE_RANK FIRST ORDER BY rnd) sal
FROM x
GROUP BY range
UNION ALL
SELECT range, MAX(ename)KEEP(DENSE_RANK LAST ORDER BY rnd) ename,
              MAX(sal)  KEEP(DENSE_RANK LAST ORDER BY rnd) sal
FROM x
GROUP BY range
ORDER BY range

Result:

RANGE     ENAME             SAL
--------- ---------- ----------
0-999     JAMES             950
0-999     SMITH             800
1000-1999 MARTIN           1250
1000-1999 WARD             1250
2000-2999 JONES            2975
2000-2999 BLAKE            2850
3000+     FORD             3000
3000+     KING             5000

Note, that we concatenated ROWNUM with dbms_random.value to produce UNIQUE random value. Without ROWNUM (or any other KEY) there is always a chance that dbms_random.value will repeat on different rows and hence top and bottom values could be mixed and the same employee will be repeated twice.

Solution #3: Using SIN for random value simulation and multi-column UNPIVOT with MAX() KEEP function:

Instead of combining top and bottom records from two statements using UNION ALL, here were calculating top and bottom values as 1 record and UNPIVOT them to produce two rows per salary range:

WITH x AS (
SELECT DECODE(1, SIGN(999-sal), '0-999', SIGN(1999-sal), '1000-1999',
                 SIGN(2999-sal), '2000-2999', '3000+') range,
       ename, sal, 
       SIN(ROWNUM*TO_NUMBER(SUBSTR(
                             extract(second 
                                     from current_timestamp),-3))
           ) rnd
FROM scott.emp
), y AS (
SELECT range, MAX(ename)KEEP(DENSE_RANK FIRST ORDER BY rnd) ename1,
              MAX(sal)  KEEP(DENSE_RANK FIRST ORDER BY rnd) sal1,
              MAX(ename)KEEP(DENSE_RANK LAST ORDER BY rnd) ename2,
              MAX(sal)  KEEP(DENSE_RANK LAST ORDER BY rnd) sal2
FROM x
GROUP BY range
)
SELECT range, ename, sal
FROM y
UNPIVOT (
  (ename, sal) for (t1, t2) in ((ename1,sal1), (ename2,sal2))
)
ORDER BY range

Result:

RANGE     ENAME             SAL
--------- ---------- ----------
0-999     SMITH             800
0-999     JAMES             950
1000-1999 MILLER           1300
1000-1999 MARTIN           1250
2000-2999 CLARK            2450
2000-2999 BLAKE            2850
3000+     FORD             3000
3000+     SCOTT            3000

Note the use of multi-column UNPIVOT. Randomization simulation is based on a fairly random selection of the last 3 digits in the current timestamp’s second value. This number is used as a “seed”. When this seed is multiplied by the rownum, the result is used as a SIN function argument which makes the outcome pseudo-random.

You can execute the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

 

9 Solutions to Puzzle of the Week #21

Puzzle of the Week #21:

Produce a report that shows employee name, his/her immediate manager name, and the next level manager name. The following conditions should be met:

  • Use Single SELECT statement only
  • Use mgr column to identify employee’s immediate manager
  • The query should work in Oracle 11g.
  • A preferred solution should use only a single instance of emp table.

Expected Result:

NAME1      NAME2      NAME3
---------- ---------- ------
SMITH      FORD       JONES
ALLEN      BLAKE      KING
WARD       BLAKE      KING
JONES      KING
MARTIN     BLAKE      KING
BLAKE      KING
CLARK      KING
SCOTT      JONES      KING
KING
TURNER     BLAKE      KING
ADAMS      SCOTT      JONES
JAMES      BLAKE      KING
FORD       JONES      KING
MILLER     CLARK      KING

Solutions:

#1. Using connect_by_root, sys_connect_by_path, and regexp_substr functions

col name1 for a10
col name2 for a10
col name3 for a10
WITH x AS(
SELECT CONNECT_BY_ROOT(ename) name,
       SYS_CONNECT_BY_PATH(ename, ',') path,
       CONNECT_BY_ROOT(empno) empno
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
SELECT name, REGEXP_SUBSTR(MAX(path), '[^,]+', 1, 2) name2,
             REGEXP_SUBSTR(MAX(path), '[^,]+', 1, 3) name3
FROM x
GROUP BY name, empno
ORDER BY empno;

#2. Using CONNECT BY twice

WITH x AS (
SELECT ename, PRIOR ename mname, empno, mgr
FROM emp
WHERE LEVEL=2 OR mgr IS NULL
CONNECT BY PRIOR empno=mgr
)
SELECT ename name1, mname name2, MAX(PRIOR mname) name3
FROM x
WHERE LEVEL<=2
CONNECT BY PRIOR empno=mgr
GROUP BY ename, mname, empno
ORDER BY empno

#3. Using CONNECT BY and Self Outer Join

WITH x AS (
SELECT ename, PRIOR ename mname, PRIOR mgr AS mgr, empno
FROM emp
WHERE LEVEL=2 OR mgr IS NULL
CONNECT BY PRIOR empno=mgr
)
SELECT x.ename name1, x.mname name2, e.ename name3
FROM x LEFT JOIN emp e ON x.mgr=e.empno
ORDER BY x.empno

#4. Using 2 Self Outer Joins

SELECT a.ename name1, b.ename name2, c.ename name3
FROM emp a LEFT JOIN emp b ON a.mgr=b.empno
           LEFT JOIN emp c ON b.mgr=c.empno
ORDER BY a.empno

#5. Using CONNECT BY and PIVOT

SELECT name1, name2, name3
FROM (
SELECT ename, LEVEL lvl, CONNECT_BY_ROOT(empno) empno
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
PIVOT(
MAX(ename)
FOR lvl IN (1 AS name1, 2 AS name2, 3 AS name3)
)
ORDER BY empno;

#6. PIVOT Simulation

WITH x AS (
SELECT ename, LEVEL lvl, CONNECT_BY_ROOT(empno) empno
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
SELECT MAX(DECODE(lvl, 1, ename)) name1,
       MAX(DECODE(lvl, 2, ename)) name2,
       MAX(DECODE(lvl, 3, ename)) name3
FROM x
GROUP BY empno
ORDER BY empno;

#7. Using CONNECT BY and no WITH/Subqueries (Credit to Krishna Jamal)

SELECT ename Name1, PRIOR ename Name2,
DECODE(LEVEL, 
    3, CONNECT_BY_ROOT(ename), 
    4, TRIM(BOTH ' ' FROM 
        REPLACE(
            REPLACE(SYS_CONNECT_BY_PATH(PRIOR ename, ' '), PRIOR ename), 
        CONNECT_BY_ROOT(ename)))
        ) Name3
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
ORDER BY empno;

#8. A composition of Methods 1 and 7:

SELECT ename Name1, PRIOR ename Name2,
       CASE WHEN LEVEL IN (3,4) 
          THEN REGEXP_SUBSTR(SYS_CONNECT_BY_PATH(ename, ','),'[^,]+',1,LEVEL-2) 
       END AS Name3
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
ORDER BY empno;

#9. Using NTH_VALUE Analytic function (Oracle 11.2 and up):

WITH x AS (
SELECT CONNECT_BY_ROOT(ename) n1, CONNECT_BY_ROOT(empno) empno,
 NTH_VALUE(ename, 2) OVER(PARTITION BY CONNECT_BY_ROOT(ename) ORDER BY LEVEL) n2,
 NTH_VALUE(ename, 3) OVER(PARTITION BY CONNECT_BY_ROOT(ename) ORDER BY LEVEL) n3
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
SELECT n1 name1, MAX(n2) name2, MAX(n3) name3
FROM x
GROUP BY n1, empno
ORDER BY empno

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

3 Solutions to Puzzle of the Week #20

Puzzle of the Week #20:

Produce the historical highest/lowest salary report that should comply with the following requirements:

  • Use Single SELECT statement only
  • Only employees who was paid the highest or lowest salary in their respective department at the moment of hiring should be selected
  • Show name, date of hire, department number, job title, salary table (emp) columns and two additional calculated columns/flags: min_flag and max_flag to indicate that the employee was hired with the min/max salary in their respective department as of the time of hiring.
  • If two or more employees in the same department are paid the same max/min salary, only the one who was hired first should be picked for the report.
  • The query should work in Oracle 11g.

Expected Result:

POW20ER

#1. Using Common Table Expression (CTE) or Recursive WITH clause

WITH y AS (
SELECT ename, job, deptno, hiredate, sal, 
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY hiredate) rn
FROM emp
), x (ename, job, deptno, hiredate, sal, min_sal, max_sal, min_flag, max_flag, rn) AS (
SELECT ename, job, deptno, hiredate, sal, sal, sal, 1, 1, 1
FROM y
WHERE rn=1
UNION ALL
SELECT y.ename, y.job, y.deptno, y.hiredate, y.sal, 
       LEAST(x.min_sal, y.sal), GREATEST(x.max_sal, y.sal),
       CASE WHEN y.sal<x.min_sal THEN 1 END, 
       CASE WHEN y.sal>x.max_sal THEN 1 END, y.rn
FROM y JOIN x ON y.deptno=x.deptno AND y.rn=x.rn+1
)
SELECT ename, job, deptno, hiredate, sal, min_flag, max_flag
FROM x
WHERE 1 IN (min_flag, max_flag)
ORDER BY deptno, hiredate;

#2. Using Cumulative Analytic Functions MIN, MAX, and ROW_NUMBER

WITH x AS (
SELECT ename, job, deptno, hiredate, sal,
       MIN(sal)OVER(PARTITION BY deptno ORDER BY hiredate) min_sal,
       MAX(sal)OVER(PARTITION BY deptno ORDER BY hiredate) max_sal,
       ROW_NUMBER()OVER(PARTITION BY deptno, sal ORDER BY hiredate) rn
FROM emp
)
SELECT ename, job, deptno, hiredate, sal,
       DECODE(sal, min_sal, 1) min_flag,
       DECODE(sal, max_sal, 1) max_flag
FROM x
WHERE sal IN (min_sal, max_sal)
  AND rn=1;

#3. Using Cumulative Analytic Functions MIN, MAX, and COUNT

WITH x AS (
SELECT ename, job, deptno, hiredate, sal,
       CASE WHEN MIN(sal)OVER(PARTITION BY deptno ORDER BY hiredate)=sal
             AND COUNT(*)OVER(PARTITION BY deptno, sal ORDER BY hiredate)=1 THEN 1 
       END min_flag,
       CASE WHEN MAX(sal)OVER(PARTITION BY deptno ORDER BY hiredate)=sal
             AND COUNT(*)OVER(PARTITION BY deptno, sal ORDER BY hiredate)=1 THEN 1 
       END max_flag
FROM emp
)
SELECT *
FROM x
WHERE 1 IN (min_flag, max_flag);

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

 

 

 

Interview Question: get 2 random employees for each salary range?

Interview Question

Level: Intermediate/Advanced

For each of the following salary ranges select two randomly chosen employees:

0-999
1000-1999
2000-2999
3000+

Expected Result:

ENAME             SAL RANGE
---------- ---------- ---------
SCOTT            3000 3000+
FORD             3000 3000+
BLAKE            2850 2000-2999
CLARK            2450 2000-2999
TURNER           1500 1000-1999
MILLER           1300 1000-1999
JAMES             950 0-999
SMITH             800 0-999

Solution:

WITH x AS (
SELECT ename, sal,
       CASE WHEN sal>=3000 THEN '3000+'
            WHEN sal>=2000 THEN '2000-2999'
            WHEN sal>=1000 THEN '1000-1999'
            ELSE                '0-999'
       END as range,
       ROW_NUMBER() OVER(PARTITION BY DECODE(GREATEST(sal, 3000), sal, 0, 1) +
                                      DECODE(GREATEST(sal, 2000), sal, 0, 1) +
                                      DECODE(GREATEST(sal, 1000), sal, 0, 1)
                         ORDER BY DBMS_RANDOM.VALUE) rn
FROM emp
)
SELECT ename, sal, range
FROM x
WHERE rn<=2
ORDER BY sal DESC

 

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Two ways to build a salary range report without using CASE function

Interview Question: Produce a salary range report with a single SELECT statement. Decode function is allowed, CASE function – is not.

Level: Intermediate

Expected Result:

RANGE                             Employees
-------------------------------- ----------
0-999                                     2
1000-2999                                 9
3000-5999                                 3

Strategy #1:

SELECT COALESCE(DECODE(LEAST(sal, 999), sal, '0-999'),
                DECODE(LEAST(sal, 2999), GREATEST(sal, 1000), '1000-2999'),
                DECODE(LEAST(sal, 9999), GREATEST(sal, 3000), '3000-5999')
                ) AS range,
       COUNT(*) "Employees"
FROM emp
GROUP BY COALESCE(DECODE(LEAST(sal, 999), sal, '0-999'),
                  DECODE(LEAST(sal, 2999), GREATEST(sal, 1000), '1000-2999'),
                  DECODE(LEAST(sal, 9999), GREATEST(sal, 3000), '3000-5999')
                  )
ORDER BY 1

Explanation:

In Oracle SQL terms, a mathematical condition

a <=x <=b

can be  interpreted as

x BETWEEN a AND b

however, this condition is good only for CASE function, and not for DECODE. The trick is to use another interpretation:

LEAST(b,x)=GREATEST(x,a)

– that can be used in DECODE.

CASE-based Solution:

SELECT CASE WHEN sal<=999 THEN '0-999'
            WHEN sal BETWEEN 1000 AND 2999 THEN '1000-2999'
            WHEN sal BETWEEN 3000 AND 5999 THEN '3000-5999'
       END AS range,
       COUNT(*) "Employees"
FROM emp
GROUP BY CASE WHEN sal<=999 THEN '0-999'
              WHEN sal BETWEEN 1000 AND 2999 THEN '1000-2999'
              WHEN sal BETWEEN 3000 AND 5999 THEN '3000-5999'
         END
ORDER BY 1

Strategy #2:

WITH x AS (
SELECT DECODE(1, (SELECT COUNT(*) FROM dual WHERE emp.sal<=999), '0-999',
                 (SELECT COUNT(*) FROM dual WHERE emp.sal BETWEEN 1000 AND 2999), '1000-2999',
                 (SELECT COUNT(*) FROM dual WHERE emp.sal BETWEEN 3000 AND 5999), '3000-5999'
             ) AS range
FROM emp
)
SELECT range, COUNT(*) AS "Employees"
FROM x
GROUP BY range
ORDER BY 1

Explanation:
This query demonstrates how to mimic CASE function using DECODE and in-line scalar subquery from dual.

Suggested further reading:

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions. The book is also available on Amazon and in all major book stores.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/