Explore the power of analytic functions

Analytic functions are still underutilized among database developers. The goal of this publication is to demonstrate hidden opportunities to improve query performance by using analytic functions where traditional approach is still dominating.

Let’s consider a few examples. We will use Oracle’s traditional educational schema scott and its famous emp and dept tables.

Problem #1: Find all employees from the department where a top paid clerk works.

A quick look at the clerk records reveals the expected department (10):

SELECT deptno, job, sal
FROM scott.emp
WHERE job = 'CLERK'
ORDER BY sal DESC

Result:

DEPTNO JOB SAL
10 CLERK 1300
20 CLERK 1100
30 CLERK 950
20 CLERK 800

Traditional approach suggests finding the department (10) first and then the task becomes trivial:

SELECT ename, deptno, job, sal
FROM scott.emp
WHERE deptno IN (<department that we found>)

OK, that works, but now we need to produce a query that will get us that department number (of several ones if they all have top paid clerks).

There are many ways of finding a top record with and without analytic functions. If we don’t use analytic functions we will end up using two copies of the emp table as in the following example:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
  AND sal = (SELECT MAX(sal) FROM scott.emp WHERE job = 'CLERK')

or this:

SELECT a.deptno
FROM scott.emp a LEFT JOIN scott.emp b ON a.sal < b.sal
AND b.job = 'CLERK'
WHERE b.deptno IS NULL 
  AND a.job = 'CLERK'

The use of analytic functions reduces the number of table copies as we can get all the necessary details in a single table scan as in the following query:

WITH x AS (
SELECT deptno, MAX(sal) max_sal, RANK() OVER(ORDER BY MAX(sal) DESC) rk
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
)
SELECT deptno
FROM x
WHERE rk = 1

Such SQL looks even shorter in databases, such as Snowflake and Terdata, that support QUALIFY clause in SELECT statement:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
QUALIFY RANK() OVER(ORDER BY MAX(sal) DESC) = 1

So a complete traditional approach will use 3 copies of the emp table as in the following query:

SELECT ename, deptno, job, sal
FROM scott.emp                                 -- copy #1
WHERE deptno IN (SELECT deptno
                 FROM scott.emp                -- copy #2
                 WHERE job = 'CLERK'
                   AND sal = (SELECT MAX(sal) 
                              FROM scott.emp   -- copy #3
                              WHERE job = 'CLERK')
                 )

Yes, it works, but what an overkill! Is it still possible to use a single emp table scan to solve this problem? The answer is YES:

WITH x AS (
SELECT ename, deptno, job, sal, 
       MAX(DECODE(job,'CLERK',sal)) OVER() max_sal_global,
       MAX(DECODE(job,'CLERK',sal)) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

And of course, in the Snowflake/Teradata SQL we would not even have to use a CTE (common table expression) , thanks to the QUALIFY clause.

MAX analytic functions combined with DECODE (CASE would work as well) here ignore all non-Clerk rows . The OVER clause gives us either department level top salary or global top salary value for the clerks. And since it is done on a row level, which means that the these analytic functions return the same values for all employees in the same department, we can achieve our goal easily.

MAX function is not the only one analytic function that can be used here. The following example demonstrates the FIRST_VALUE function to achieve the same result:

WITH x AS (
SELECT ename, deptno, job, sal, 
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_global,
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(PARTITION BY deptno
              ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

As an exercise, try to use LISTAGG analytic function to solve this problem.

Problem #2: Find employees who are paid above the average salary in their respective department.

Again, we will start with a “traditional” approach:

SELECT ename, deptno, job, sal
FROM scott.emp a
WHERE sal > (SELECT AVG(sal)
             FROM scott.emp
             WHERE deptno = a.deptno)
ORDER BY deptno, sal

Result:

ENAME DEPTNO JOB SAL
KING 10 PRESIDENT 5000
JONES 20 MANAGER 2975
SCOTT 20 ANALYST 3000
FORD 20 ANALYST 3000
ALLEN 30 SALESMAN 1600
BLAKE 30 MANAGER 2850

An experienced developer would quickly see that this problem can be solved with the same approach as the problem #1 (above):

WITH x AS (
SELECT ename, deptno, job, sal, AVG(sal) OVER(PARTITION BY deptno) avg_sal
FROM scott.emp
)
SELECT *
FROM x
WHERE sal > avg_sal
ORDER BY deptno, sal

Result:

ENAME DEPTNO JOB SAL AVG_SAL
KING 10 PRESIDENT 5000 2916.66667
JONES 20 MANAGER 2975 2175
FORD 20 ANALYST 3000 2175
SCOTT 20 ANALYST 3000 2175
ALLEN 30 SALESMAN 1600 1566.66667
BLAKE 30 MANAGER 2850 1566.66667

With this strategy we can even see the department average salary value.

Problem #3: List all employees who work in the same department as the president.

The problem was discussed in one of my old blog posts and I strongly suggest you to check it out: A trick that helps avoiding multiple table scans.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

 

 

 

3 Solutions to the 2018 Oracle SQL Puzzle of the Week #5

Puzzle of the Week #5:

Find the shortest and longest last names of the employees in each department.

 

  • If two or more employees tie for the shortest or longest name, pick the name that comes first in alphabetical order
  • Use hr.employees or scott.emp tables
  • Use a single SELECT statement only
  • Ideally, the solution should NOT rely on any sub-queries, CTEs  (WITH clause), or inline views
  • Exclude unknown (NULL) departments

Solutions:

#1. Using MIN() KEEP Group Function

SELECT department_id, 
 MIN(last_name) KEEP(DENSE_RANK FIRST 
                     ORDER BY LENGTH(last_name)) shortest,
 MIN(last_name) KEEP(DENSE_RANK FIRST 
                     ORDER BY LENGTH(last_name) DESC) longest
FROM hr.employees
WHERE department_id IS NOT NULL
GROUP BY department_id

#2. Using FIRST_VALUE Analytic Function and DISTINCT option

(Credit to Igor Shpungin)

SELECT DISTINCT department_id, 
   FIRST_VALUE(last_name) OVER(PARTITION BY department_id 
                               ORDER BY LENGTH(last_name))      shortest, 
   FIRST_VALUE(last_name) OVER(PARTITION BY department_id 
                               ORDER BY LENGTH(last_name) DESC) longest 
FROM hr.employees 
WHERE department_id IS NOT NULL 
ORDER BY 1

#3. Using MODEL clause

(Credit to Naoto Katayama)

SELECT department_id, shortest, longest
FROM hr.employees
WHERE department_id IS NOT NULL
MODEL
RETURN UPDATED ROWS
PARTITION BY (department_id)
DIMENSION BY (
 ROW_NUMBER()OVER(PARTITION BY department_id 
                  ORDER BY LENGTH(last_name), last_name) rn1,
 ROW_NUMBER()OVER(PARTITION BY department_id 
                  ORDER BY LENGTH(last_name) DESC, last_name) rn2)
MEASURES(last_name, 
         CAST(NULL AS VARCHAR2(25)) AS shortest, 
         CAST(NULL AS VARCHAR2(25)) AS longest)
RULES(
      shortest[0,0]=MAX(last_name)[1,ANY], 
      longest [0,0]=MAX(last_name)[ANY,1]
)
ORDER BY department_id

You can execute the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.