Explore the power of analytic functions – Part 2

May 4, 2020, by Zahar Hilkevich

In the previous blog post on this subject, we reviewed a couple of notable applications of Oracle analytic functions. Today, I came across another interesting illustration of the same concept.

Problem: List all employees from the same department and holding the same job title as ADAMS.

Expected Result:

ENAME	JOB	DEPTNO
ADAMS	CLERK	20
SMITH	CLERK	20

Like before, we start with traditional approaches that every experienced developer would easily demonstrate.

Strategy #1: Using multi-column subquery

SELECT ename, job, deptno
FROM scott.emp
WHERE (deptno, job) IN (SELECT deptno, job
                        FROM scott.emp
                        WHERE ename = 'ADAMS')
ORDER BY 2, 3, 1

Strategy #2: Using self-join

SELECT a.ename, job, deptno
FROM scott.emp a JOIN scott.emp b USING(deptno, job)
WHERE b.ename = 'ADAMS'
ORDER BY job, deptno, a.ename

Strategy #3: Using EXISTS predicate

SELECT ename, job, deptno
FROM scott.emp a
WHERE EXISTS (SELECT 1
              FROM scott.emp
              WHERE ename  = 'ADAMS'
                AND deptno = a.deptno
                AND job    = a.job)
ORDER BY 2, 3, 1

A common feature of all the strategies above is having two copies of the emp table with two joining conditions (deptno, job) and one filter (ename = 'ADAMS').

As we have seen before, with analytic functions we can get away with a single copy of the emp table.

Strategy #4: Using COUNT Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       COUNT(DECODE(ename, 'ADAMS', 1)) 
             OVER(PARTITION BY deptno, job) cnt
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE cnt > 0
ORDER BY 2, 3, 1

Of course, you can use different analytic functions here:

Strategy #5: Using MAX Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       MAX(DECODE(ename, 'ADAMS', ename)) 
           OVER(PARTITION BY deptno, job) adams
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE adams = 'ADAMS'
ORDER BY 2, 3, 1

Strategy #6: Using LISTAGG Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       LISTAGG(DECODE(ename, 'ADAMS', 'Y'), '|') WITHIN GROUP (ORDER BY 1) 
               OVER(PARTITION BY deptno, job) flag
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE flag LIKE 'Y%'
ORDER BY 2, 3, 1

We need to use the LIKE operator in case more than one ADAMS works in the same department and holds the same job title: the flag would then contain several Y's separated by '|' rather than a single 'Y'.
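To see why an equality check would not be enough, here is a small sketch that runs against made-up rows built from dual instead of scott.emp. The second ADAMS row is hypothetical and exists only to show the concatenated flag.

```sql
-- Hypothetical data: two ADAMS rows share the same department and job
WITH emp AS (
SELECT 'ADAMS' ename, 'CLERK' job, 20 deptno FROM dual UNION ALL
SELECT 'ADAMS',       'CLERK',     20        FROM dual UNION ALL
SELECT 'SMITH',       'CLERK',     20        FROM dual UNION ALL
SELECT 'JONES',       'MANAGER',   20        FROM dual
)
SELECT ename, job, deptno,
       -- LISTAGG skips the NULLs that DECODE returns for non-ADAMS rows
       LISTAGG(DECODE(ename, 'ADAMS', 'Y'), '|') WITHIN GROUP (ORDER BY 1)
               OVER(PARTITION BY deptno, job) flag
FROM emp
ORDER BY 2, 3, 1
```

All three CLERK rows get flag = 'Y|Y', so flag = 'Y' would miss them while flag LIKE 'Y%' still matches. The MANAGER row gets a NULL flag and is filtered out either way.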

***

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? Check my book “Oracle SQL Tricks and Workarounds” for more clever tricks and cool techniques.

Explore the power of analytic functions

April 29, 2020, by Zahar Hilkevich

Analytic functions are still underutilized among database developers. The goal of this publication is to demonstrate hidden opportunities to improve query performance by using analytic functions where the traditional approach still dominates.

Let’s consider a few examples. We will use Oracle’s traditional educational schema scott and its famous emp and dept tables.

Problem #1: Find all employees from the department where the top-paid clerk works.

A quick look at the clerk records reveals the expected department (10):

SELECT deptno, job, sal
FROM scott.emp
WHERE job = 'CLERK'
ORDER BY sal DESC

Result:

DEPTNO	JOB	SAL
10	CLERK	1300
20	CLERK	1100
30	CLERK	950
20	CLERK	800

The traditional approach suggests finding the department (10) first, and then the task becomes trivial:

SELECT ename, deptno, job, sal
FROM scott.emp
WHERE deptno IN (<department that we found>)

OK, that works, but now we need to produce a query that will get us that department number (or several of them, if clerks in different departments are tied for the top salary).

There are many ways of finding a top record with and without analytic functions. If we don’t use analytic functions we will end up using two copies of the emp table as in the following example:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
  AND sal = (SELECT MAX(sal) FROM scott.emp WHERE job = 'CLERK')

or this:

SELECT a.deptno
FROM scott.emp a LEFT JOIN scott.emp b ON a.sal < b.sal
                                      AND b.job = 'CLERK'
WHERE b.deptno IS NULL
  AND a.job = 'CLERK'

The use of analytic functions reduces the number of table copies as we can get all the necessary details in a single table scan as in the following query:

WITH x AS (
SELECT deptno, MAX(sal) max_sal, RANK() OVER(ORDER BY MAX(sal) DESC) rk
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
)
SELECT deptno
FROM x
WHERE rk = 1

Such SQL looks even shorter in databases, such as Snowflake and Teradata, that support the QUALIFY clause in the SELECT statement:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
QUALIFY RANK() OVER(ORDER BY MAX(sal) DESC) = 1

So a complete traditional approach will use 3 copies of the emp table as in the following query:

SELECT ename, deptno, job, sal
FROM scott.emp                                 -- copy #1
WHERE deptno IN (SELECT deptno
                 FROM scott.emp                -- copy #2
                 WHERE job = 'CLERK'
                   AND sal = (SELECT MAX(sal) 
                              FROM scott.emp   -- copy #3
                              WHERE job = 'CLERK')
                 )

Yes, it works, but what an overkill! Is it still possible to use a single emp table scan to solve this problem? The answer is YES:

WITH x AS (
SELECT ename, deptno, job, sal, 
       MAX(DECODE(job,'CLERK',sal)) OVER() max_sal_global,
       MAX(DECODE(job,'CLERK',sal)) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

And of course, in Snowflake/Teradata SQL we would not even have to use a CTE (common table expression), thanks to the QUALIFY clause.
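A rough sketch of what that could look like in a QUALIFY-capable database. It assumes a copy of the emp table is reachable there as scott.emp, and it uses CASE rather than DECODE for portability. It was written with Snowflake in mind and has not been checked against every QUALIFY dialect.

```sql
-- Sketch only: assumes the emp table has been loaded as scott.emp
SELECT ename, deptno, job, sal
FROM scott.emp
-- keep rows whose department-level top clerk salary equals the global one
QUALIFY MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER()
      = MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER(PARTITION BY deptno)
```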

The MAX analytic functions combined with DECODE (CASE would work as well) ignore all non-clerk rows. Depending on the OVER clause, we get either the department-level or the global top salary among the clerks. Since both values are attached to every row, and each is the same for all employees in the same department, comparing them keeps exactly the employees of the department (or departments) where a top-paid clerk works.
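For readers who prefer standard SQL syntax over DECODE, the same single-scan query could be written with CASE. This is a direct translation of the query above, not a new technique:

```sql
WITH x AS (
SELECT ename, deptno, job, sal,
       -- CASE without ELSE yields NULL for non-clerks, which MAX ignores
       MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER() max_sal_global,
       MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global = max_sal_dept
```

Departments without any clerk get a NULL max_sal_dept, so the equality is never true for them and they drop out.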

MAX is not the only analytic function that can be used here. The following example uses the FIRST_VALUE function to achieve the same result:

WITH x AS (
SELECT ename, deptno, job, sal, 
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_global,
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(PARTITION BY deptno
              ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

As an exercise, try to use the LISTAGG analytic function to solve this problem.

Problem #2: Find employees who are paid above the average salary in their respective department.

Again, we will start with a “traditional” approach:

SELECT ename, deptno, job, sal
FROM scott.emp a
WHERE sal > (SELECT AVG(sal)
             FROM scott.emp
             WHERE deptno = a.deptno)
ORDER BY deptno, sal

Result:

ENAME	DEPTNO	JOB	SAL
KING	10	PRESIDENT	5000
JONES	20	MANAGER	2975
SCOTT	20	ANALYST	3000
FORD	20	ANALYST	3000
ALLEN	30	SALESMAN	1600
BLAKE	30	MANAGER	2850

An experienced developer would quickly see that this problem can be solved with the same approach as Problem #1 above:

WITH x AS (
SELECT ename, deptno, job, sal, AVG(sal) OVER(PARTITION BY deptno) avg_sal
FROM scott.emp
)
SELECT *
FROM x
WHERE sal > avg_sal
ORDER BY deptno, sal

Result:

ENAME	DEPTNO	JOB	SAL	AVG_SAL
KING	10	PRESIDENT	5000	2916.66667
JONES	20	MANAGER	2975	2175
FORD	20	ANALYST	3000	2175
SCOTT	20	ANALYST	3000	2175
ALLEN	30	SALESMAN	1600	1566.66667
BLAKE	30	MANAGER	2850	1566.66667

With this strategy we can even see the department average salary value.

Problem #3: List all employees who work in the same department as the president.

The problem was discussed in one of my old blog posts and I strongly suggest you check it out: A trick that helps avoiding multiple table scans.
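Without reproducing that older post, the pattern from Problem #1 carries over directly. The following is a sketch along the same lines and not necessarily the solution given there. The president_cnt alias is made up for this illustration.

```sql
WITH x AS (
SELECT ename, deptno, job, sal,
       -- counts the presidents in each employee's department
       COUNT(DECODE(job, 'PRESIDENT', 1)) OVER(PARTITION BY deptno) president_cnt
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE president_cnt > 0
ORDER BY ename
```

With the standard scott data, this should return the three employees of department 10 (CLARK, KING, and MILLER) from a single scan of emp.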

3 Solutions to 2018 Oracle SQL Puzzle of the Week #14

April 25, 2018, by Zahar Hilkevich

Yet Another Top Employee Puzzle

Find the employee who remained the top-paid employee (excluding the president) for the longest period of time between 1980 and 1981.

Use a single SELECT statement only.
President should be excluded from the analysis.
Show the number of days the employee remained the top paid person as well as Start Date (hiredate) and End Date (the date when another top employee started)
The End Date for the last top paid employee in the interval should be 31-DEC-1981.

Expected Result:

EMPNO	ENAME	JOB	SAL	Start Date	End Date	Days on Top
7566	JONES	MANAGER	2975	02-APR-81	03-DEC-81	245

Solutions:

Solution #1. Using RANK to filter the top employee:

WITH x AS ( 
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
WHERE job!='PRESIDENT' 
), y AS ( 
SELECT empno, ename, job, sal, hiredate start_date, max_sal,  
       LEAD(hiredate) OVER(ORDER BY hiredate) end_date 
FROM x 
WHERE sal=max_sal 
), z AS ( 
SELECT y.*, LEAST(end_date, date'1981-12-31')-start_date days_on_top, 
RANK()OVER(ORDER BY LEAST(end_date, date'1981-12-31')-start_date DESC) rk 
FROM y 
WHERE EXTRACT(YEAR FROM start_date) IN (1980, 1981)  
) 
SELECT empno,ename,job,sal, start_date "Start Date", 
       end_date "End Date", days_on_top "Days on Top" 
FROM z 
WHERE rk=1

Solution #2. Using Subquery to filter the top employee:

WITH x AS ( 
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
WHERE job!='PRESIDENT' 
  AND hiredate>=date'1980-01-01'  
), y AS ( 
SELECT empno, ename, job, sal, hiredate start_date,  
       LEAST(date'1981-12-31', 
             LEAD(hiredate) OVER(ORDER BY hiredate)) end_date 
FROM x 
WHERE sal=max_sal 
) 
SELECT empno,ename,job,sal, start_date "Start Date", 
       end_date "End Date", end_date-start_date "Days on Top" 
FROM y 
WHERE end_date-start_date=(SELECT MAX(end_date-start_date) FROM y)

Solution #3. Using MODEL with RETURN UPDATED ROWS to filter the top employee:

WITH e AS ( 
SELECT empno, ename, sal, job, LEAST(hiredate, date'1981-12-31') hiredate,  
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp 
WHERE hiredate>=date'1980-01-01'  
  AND job!='PRESIDENT' 
), x AS ( 
SELECT empno, ename, job, sal, hiredate,  
       NVL(LEAD(hiredate)OVER(ORDER BY hiredate),
           date'1981-12-31')-hiredate diff, 
       NVL(LEAD(hiredate)OVER(ORDER BY hiredate),
           date'1981-12-31') end_date 
FROM e 
WHERE sal=max_sal 
) 
SELECT empno, ename, job, sal, hiredate "Start Date", 
       end_date "End Date", diff "Days on Top" 
FROM x 
MODEL RETURN UPDATED ROWS 
DIMENSION BY (empno, RANK()OVER(ORDER BY diff DESC) rk) 
MEASURES(ename,job,sal, hiredate, end_date, diff, 0 dummy) 
RULES(dummy[ANY, 1]=1)

The following query will only work as long as there is only one top-paid employee who stayed on top the longest. If there were more than one, it would list only one of them:

WITH x AS (
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal
FROM scott.emp 
WHERE job!='PRESIDENT'
), y AS (
SELECT empno, ename, job, sal, hiredate start_date, 
 LEAST(date'1981-12-31', 
       LEAD(hiredate) OVER(ORDER BY hiredate)) end_date,
 LEAST(date'1981-12-31', 
       LEAD(hiredate) OVER(ORDER BY hiredate))-hiredate days_top
FROM x
WHERE sal=max_sal
ORDER BY days_top DESC NULLS LAST, hiredate
)
SELECT *
FROM y
WHERE ROWNUM=1
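If ties matter, one possible workaround on Oracle 12c or later is to swap the ROWNUM filter for the row limiting clause with WITH TIES. This is a sketch of that variation, not one of the original puzzle solutions:

```sql
WITH x AS (
SELECT empno, ename, job, sal, hiredate,
       MAX(sal)OVER(ORDER BY hiredate) max_sal
FROM scott.emp
WHERE job!='PRESIDENT'
), y AS (
SELECT empno, ename, job, sal, hiredate start_date,
       LEAST(date'1981-12-31',
             LEAD(hiredate) OVER(ORDER BY hiredate)) end_date
FROM x
WHERE sal=max_sal
)
SELECT y.*, end_date-start_date days_top
FROM y
ORDER BY days_top DESC NULLS LAST
-- WITH TIES also returns any rows that share the top days_top value
FETCH FIRST 1 ROW WITH TIES
```

WITH TIES needs the ORDER BY at the same query level, which is why the sort moves out of the CTE into the final SELECT.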

You can execute the above SQL statements in the Oracle Live SQL environment.
sqlpatterns

Oracle SQL Tricks, Tips, and Workarounds
