A trick that helps avoiding multiple table scans.

Let’s look at a fairly simple SQL problem:

In a traditional scott.emp table, find all employees who work in the same department as the president.

  • Make your query work even if there are more than 1 president records exist in emp table
  • Make Oracle scan emp table just ONCE

A “traditional” solution to this problem may look like this:

Strategy #1: Using a subquery

SELECT *
FROM scott.emp
WHERE deptno IN (SELECT deptno 
                 FROM scott.emp 
                 WHERE job='PRESIDENT')

or this:

Strategy #2: Using a self-join

SELECT DISTINCT a.*
FROM scott.emp a JOIN scott.emp b ON a.deptno=b.deptno
WHERE b.job='PRESIDENT'

Note, that DISTINCT option in the above query is needed to prevent duplicates if there were multiple presidents in а specific department.

Both solutions above use 2 copies of the emp table which makes oracle scan the same scott.emp table twice.

A trick presented below allows you to use only a single copy of the emp table to solve the problem. The trick involves different conceptual and technical approaches compared to the solutions we have seen so far.

Conceptually, we should rephrase the problem in a way that would keep it identical and at the same time allows us to use different technical arsenal. This approach is explained in a detailed manner in my book “Oracle SQL Tricks and Workarounds”. We can rephrase the puzzle and say that we are looking for employees from departments with some “positive” number of presidents working there. Technically speaking, we need to use analytic function COUNT and check if it is greater than 0:

Strategy #3: Using analytic function COUNT

WITH x AS (
SELECT e.*, 
       COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno) cnt
FROM scott.emp e
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE cnt>0
ORDER BY empno

We do need to use a common table expression as we cannot filter out by analytic function in the same query where the function is used. Nevertheless, we scan the emp table just once, and during this scan, Oracle engine counts the number of presidents in each department.

COUNT is not the only analytic function that can be employed to solve the problem.

Strategy #4: Using analytic function LISTAGG

WITH x AS (
SELECT e.*, 
       LISTAGG(job,'|') 
         WITHIN GROUP (ORDER BY job) OVER(PARTITION BY deptno) jobs
FROM scott.emp e 
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE '|' || jobs || '|' LIKE '%|PRESIDENT|%'
ORDER BY empno

Instead of counting the presidents by department, we simply concatenate all the job titles and check if the resulting string includes a president.

Finally, if you don’t like using sub-queries in general, we can leverage the power of MODEL clause:

Strategy #5: Using MODEL clause to avoid sub-queries

SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM scott.emp
MODEL RETURN UPDATED ROWS 
DIMENSION BY (
  empno, 
  SIGN(COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno)) cnt
) 
MEASURES(ename, job, mgr, hiredate, sal, comm, deptno, 0 dummy) 
RULES(dummy[ANY, 1]=1)
ORDER BY empno

The tricky part here is using a composition of SIGN, COUNT, and DECODE functions (i.e. SIGN on top of what we used in Strategy #3) as a secondary dimension and empno as primary. Employee number is unique by itself, so adding another dimension will still maintain uniqueness required by MODEL clause. The only MODEL RULE changes the dummy measure which “triggers” the “RETURN UPDATED ROWS” instruction and returns only those rows where the dummy dimension was set to 1 – notice that its default value is 0.

You can check the execution plan for all of the above strategies to see how many times Oracle scans the emp table.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

 

 

3 Solutions to 2018 Oracle SQL Puzzle of the Week #14

Yet Another Top Employee Puzzle

Find the employee who remained the top paid employee (excluding the president) the longest period of time between 1980 and 1981

  • Use a single SELECT statement only.
  • President should be excluded from the analysis.
  • Show the number of days the employee remained the top paid person as well as Start Date (hiredate) and End Date (the date when another top employee started)
  • The End Date for the last top paid employee in the interval should be 31-DEC-1981.

Expected Result:

EMPNO ENAME JOB SAL Start Date End Date Days on Top
7566 JONES MANAGER 2975 02-APR-81 03-DEC-81 245

Solutions:

Solution #1. Using RANK to filter the top employee:

WITH x AS ( 
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
WHERE job!='PRESIDENT' 
), y AS ( 
SELECT empno, ename, job, sal, hiredate start_date, max_sal,  
       LEAD(hiredate) OVER(ORDER BY hiredate) end_date 
FROM x 
WHERE sal=max_sal 
), z AS ( 
SELECT y.*, LEAST(end_date, date'1981-12-31')-start_date days_on_top, 
RANK()OVER(ORDER BY LEAST(end_date, date'1981-12-31')-start_date DESC) rk 
FROM y 
WHERE EXTRACT(YEAR FROM start_date) IN (1980, 1981)  
) 
SELECT empno,ename,job,sal, start_date "Start Date", 
       end_date "End Date", days_on_top	"Days on Top" 
FROM z 
WHERE rk=1

Solution #2. Using Subquery to filter the top employee:

WITH x AS ( 
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
WHERE job!='PRESIDENT' 
  AND hiredate>=date'1980-01-01'  
), y AS ( 
SELECT empno, ename, job, sal, hiredate start_date,  
       LEAST(date'1981-12-31', 
             LEAD(hiredate) OVER(ORDER BY hiredate)) end_date 
FROM x 
WHERE sal=max_sal 
) 
SELECT empno,ename,job,sal, start_date "Start Date", 
       end_date "End Date", end_date-start_date "Days on Top" 
FROM y 
WHERE end_date-start_date=(SELECT MAX(end_date-start_date) FROM y)

Solution #3. Using MODEL with RETURN UPDATED ROWS to filter the top employee:

WITH e AS ( 
SELECT empno, ename, sal, job, LEAST(hiredate, date'1981-12-31') hiredate,  
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp 
WHERE hiredate>=date'1980-01-01'  
  AND job!='PRESIDENT' 
), x AS ( 
SELECT empno, ename, job, sal, hiredate,  
       NVL(LEAD(hiredate)OVER(ORDER BY hiredate),
           date'1981-12-31')-hiredate diff, 
       NVL(LEAD(hiredate)OVER(ORDER BY hiredate),
           date'1981-12-31') end_date 
FROM e 
WHERE sal=max_sal 
) 
SELECT empno, ename, job, sal, hiredate "Start Date", 
       end_date "End Date", diff "Days on Top" 
FROM x 
MODEL RETURN UPDATED ROWS 
DIMENSION BY (empno, RANK()OVER(ORDER BY diff DESC) rk) 
MEASURES(ename,job,sal, hiredate, end_date, diff, 0 dummy) 
RULES(dummy[ANY, 1]=1)

The following query will only work as long as there is only 1 top paid employee who stayed on top the longest. In case if we had more than 1 it would only list one of those:

WITH x AS (
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal
FROM scott.emp 
WHERE job!='PRESIDENT'
), y AS (
SELECT empno, ename, job, sal, hiredate start_date, 
 LEAST(date'1981-12-31', 
       LEAD(hiredate) OVER(ORDER BY hiredate)) end_date,
 LEAST(date'1981-12-31', 
       LEAD(hiredate) OVER(ORDER BY hiredate))-hiredate days_top
FROM x
WHERE sal=max_sal
ORDER BY days_top DESC NULLS LAST, hiredate
)
SELECT *
FROM y
WHERE ROWNUM=1

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Further Reading:

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.