Substitution SQL Puzzle

May 17, 2020Zahar HilkevichLeave a comment

Level: Advanced

A colleague of mine approached me recently with a puzzle he struggled with: you have a table (let’s call it data_table) with id and val (i.e. value) columns. You are given two parameters: value_to_overwrite and value_to_use that should transform the content of the data_table in a special way:

If both parameters exist in the data_table in the val column for the same id, then the one that is equal to value_to_overwrite should be substituted with value_to_use
If none or just one of the parameters exist in the data_table.val column, than the val column should remain the same
List all the rows from the data_table after the transformation.

Let’s create the data_table using the following DDL command:

CREATE TABLE data_table AS
SELECT 1 id, 'a' val FROM dual
UNION ALL
SELECT 1 id, 'b' val FROM dual
UNION ALL
SELECT 1 id, 'c' val FROM dual
UNION ALL
SELECT 2 id, 'b' val FROM dual
UNION ALL
SELECT 2 id, 'd' val FROM dual

ID	VAL
1	a
1	b
1	c
2	b
2	d

For parameters value_to_overwrite = ‘a’ and value_to_use = ‘b’ the expected result should look like this:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	c
2	b	b
2	d	d

Note, that for id = 1, value ‘b’ is substituted with new value ‘a’ because both, value_to_overwrite (‘a’) and value_to_use (‘b’) exist in the val column. All other values should remain the same as substitution condition is not met.

To mimic the parameter use in the query we will create another table (rule_table) with a single row in it.

CREATE TABLE rule_table AS
SELECT 'a' value_to_use, 'b' value_to_overwrite
FROM dual

Translating requirements from English to SQL will likely result in a bulky and inefficient query. Let’s demonstrate that:

/* Values that need to be substituted */
SELECT d.id, d.val AS original_value, r.value_to_use AS new_value
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use IN (SELECT val
                         FROM data_table
                         WHERE id = d.id)
UNION ALL
/* Values that remain the same as only value_to_overwrite exist for given id */
SELECT d.id, d.val, d.val
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use NOT IN (SELECT val
                             FROM data_table
                             WHERE id = d.id)
UNION ALL
/* Values that remain the same as value_to_overwrite does not match val */
SELECT d.id, d.val, d.val
FROM data_table d
WHERE d.val NOT IN (SELECT value_to_overwrite
                    FROM rule_table)

As you can see, there are multiple (five) copies of the data_table used, which will lead to a poor performance when the size of the table increases dramatically.

A way better approach is to take the first SELECT from the UNIONed statement above and turn the INNER JOIN into an LEFT OUTER JOIN. At the same time, we need to move the filtering condition from the WHERE clause to the JOIN (otherwise, the LEFT JOIN will work as INNER JOIN):

SELECT d.id,
       d.val                      AS original_value,
       NVL(r.value_to_use, d.val) AS new_value
FROM data_table d LEFT JOIN rule_table r 
                  ON d.val = r.value_to_overwrite
                 AND r.value_to_use IN (SELECT val
                                        FROM data_table
                                        WHERE id = d.id)

This is a quite efficient and fairly short query that uses only two copies of the data_table. Can we do better than that? Yes, we can!

WITH x AS (
SELECT id, val,
       MIN(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  min_val,
       MAX(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  max_val,
       LEAST(value_to_use, value_to_overwrite)    min_ow,
       GREATEST(value_to_use, value_to_overwrite) max_ow,
       value_to_use, value_to_overwrite
FROM data_table CROSS JOIN rule_table 
)
SELECT id, val AS original_value,
       CASE WHEN min_val=min_ow AND
                 max_val=max_ow AND
                 val=value_to_overwrite THEN value_to_use
       ELSE val
       END AS new_value
FROM x

Analytic functions MIN and MAX let us scan the data_table vertically while LEAST and GREATEST do the same horizontally. The later pair of functions come very handy when you need to compare pairs of values, so the smaller of the values should match LEAST and the other – GREATEST.

And still, the last strategy has one flaw: we used a Cartesian Product (CROSS JOIN) which means that had we have more than one substitution rule, the method would not work properly. Let’s fix it.

First, we will add one more rule:

INSERT INTO rule_table VALUES('b', 'c')

Now, the expected result should looks as the following:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	b
2	b	b
2	d	d

Note, that the second rule turns original ‘c’ value into ‘b’.

And again, Analytic functions do all the magic:

WITH x AS (
SELECT id, val, value_to_overwrite, value_to_use,
       LEAST(value_to_overwrite, value_to_use) || '|' ||
       GREATEST(value_to_overwrite, value_to_use) rule_vals,
       LISTAGG(DISTINCT val, '|') WITHIN GROUP(ORDER BY val)
       OVER(PARTITION BY id) vals
FROM data_table LEFT JOIN rule_table ON val = value_to_overwrite
)
SELECT id, val AS original_value,
       CASE WHEN value_to_overwrite IS NULL THEN val
            WHEN INSTR(vals, rule_vals)=0 THEN val
            ELSE value_to_use
       END     AS new_value
FROM x

This time, LISTAGG analytic function (with DISTINCT option – recently supported by Oracle) helps matching the val against value_to_overwrite and value_to_use pair.

I strongly recommend executing parts of the above queries to gain a better understanding of the demonstrated strategies. livesql.oracle.com site offers you a great query tool with the latest version of Oracle database.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Explore the power of analytic functions – Part 2

May 4, 2020Zahar HilkevichLeave a comment

In the previous blog post on that subject we reviewed a couple a notable applications of Oracle analytic functions. Today, I came across another interesting illustration of the same concept.

Problem: List all employees from the same department and holding the same job title as ADAMS.

Expected Result:

ENAME	JOB	DEPTNO
ADAMS	CLERK	20
SMITH	CLERK	20

Like before, we start with traditional approaches that every experienced developer would easily demonstrate.

Strategy #1: Using multi-column subquery

SELECT ename, job, deptno
FROM scott.emp
WHERE (deptno, job) IN (SELECT deptno, job
                        FROM scott.emp
                        WHERE ename = 'ADAMS')
ORDER BY 2, 3, 1

Strategy #2: Using self-join

SELECT a.ename, job, deptno
FROM scott.emp a JOIN scott.emp b USING(deptno, job)
WHERE b.ename = 'ADAMS'
ORDER BY job, deptno, a.ename

Strategy #3: Using EXISTS predicate

SELECT ename, job, deptno
FROM scott.emp a
WHERE EXISTS (SELECT 1
              FROM scott.emp
              WHERE ename  = 'ADAMS'
                AND deptno = a.deptno
                AND job    = a.job)
ORDER BY 2, 3, 1

A common feature of all the strategies above is having two copies of the emp table with two joining conditions (deptno, job) and one filter (ename = ‘ADAMS’)

As we have seen before, with analytic functions, we can get away with a single copy of th emp table.

Strategy #4: Using COUNT Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       COUNT(DECODE(ename, 'ADAMS', 1)) 
             OVER(PARTITION BY deptno, job) cnt
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE cnt > 0
ORDER BY 2, 3, 1

Of course, you can use different analytic functions here:

Strategy #5: Using MAX Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       MAX(DECODE(ename, 'ADAMS', ename)) 
           OVER(PARTITION BY deptno, job) adams
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE adams = 'ADAMS'
ORDER BY 2, 3, 1

Strategy #6: Using LISTAGG Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       LISTAGG(DECODE(ename, 'ADAMS', 'Y'), '|') WITHIN GROUP (ORDER BY 1) 
               OVER(PARTITION BY deptno, job) flag
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE flag LIKE 'Y%'
ORDER BY 2, 3, 1

We need to use LIKE operator in case we have more than a single Adams working in the same department and holding the same job title.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Explore the power of analytic functions

April 29, 2020Zahar Hilkevich1 Comment

Analytic functions are still underutilized among database developers. The goal of this publication is to demonstrate hidden opportunities to improve query performance by using analytic functions where traditional approach is still dominating.

Let’s consider a few examples. We will use Oracle’s traditional educational schema scott and its famous emp and dept tables.

Problem #1: Find all employees from the department where a top paid clerk works.

A quick look at the clerk records reveals the expected department (10):

SELECT deptno, job, sal
FROM scott.emp
WHERE job = 'CLERK'
ORDER BY sal DESC

Result:

DEPTNO	JOB	SAL
10	CLERK	1300
20	CLERK	1100
30	CLERK	950
20	CLERK	800

Traditional approach suggests finding the department (10) first and then the task becomes trivial:

SELECT ename, deptno, job, sal
FROM scott.emp
WHERE deptno IN (<department that we found>)

OK, that works, but now we need to produce a query that will get us that department number (of several ones if they all have top paid clerks).

There are many ways of finding a top record with and without analytic functions. If we don’t use analytic functions we will end up using two copies of the emp table as in the following example:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
  AND sal = (SELECT MAX(sal) FROM scott.emp WHERE job = 'CLERK')

or this:

SELECT a.deptno
FROM scott.emp a LEFT JOIN scott.emp b ON a.sal < b.sal
AND b.job = 'CLERK'
WHERE b.deptno IS NULL 
  AND a.job = 'CLERK'

The use of analytic functions reduces the number of table copies as we can get all the necessary details in a single table scan as in the following query:

WITH x AS (
SELECT deptno, MAX(sal) max_sal, RANK() OVER(ORDER BY MAX(sal) DESC) rk
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
)
SELECT deptno
FROM x
WHERE rk = 1

Such SQL looks even shorter in databases, such as Snowflake and Terdata, that support QUALIFY clause in SELECT statement:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
QUALIFY RANK() OVER(ORDER BY MAX(sal) DESC) = 1

So a complete traditional approach will use 3 copies of the emp table as in the following query:

SELECT ename, deptno, job, sal
FROM scott.emp                                 -- copy #1
WHERE deptno IN (SELECT deptno
                 FROM scott.emp                -- copy #2
                 WHERE job = 'CLERK'
                   AND sal = (SELECT MAX(sal) 
                              FROM scott.emp   -- copy #3
                              WHERE job = 'CLERK')
                 )

Yes, it works, but what an overkill! Is it still possible to use a single emp table scan to solve this problem? The answer is YES:

WITH x AS (
SELECT ename, deptno, job, sal, 
       MAX(DECODE(job,'CLERK',sal)) OVER() max_sal_global,
       MAX(DECODE(job,'CLERK',sal)) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

And of course, in the Snowflake/Teradata SQL we would not even have to use a CTE (common table expression) , thanks to the QUALIFY clause.

MAX analytic functions combined with DECODE (CASE would work as well) here ignore all non-Clerk rows . The OVER clause gives us either department level top salary or global top salary value for the clerks. And since it is done on a row level, which means that the these analytic functions return the same values for all employees in the same department, we can achieve our goal easily.

MAX function is not the only one analytic function that can be used here. The following example demonstrates the FIRST_VALUE function to achieve the same result:

WITH x AS (
SELECT ename, deptno, job, sal, 
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_global,
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(PARTITION BY deptno
              ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

As an exercise, try to use LISTAGG analytic function to solve this problem.

Problem #2: Find employees who are paid above the average salary in their respective department.

Again, we will start with a “traditional” approach:

SELECT ename, deptno, job, sal
FROM scott.emp a
WHERE sal > (SELECT AVG(sal)
             FROM scott.emp
             WHERE deptno = a.deptno)
ORDER BY deptno, sal

Result:

ENAME	DEPTNO	JOB	SAL
KING	10	PRESIDENT	5000
JONES	20	MANAGER	2975
SCOTT	20	ANALYST	3000
FORD	20	ANALYST	3000
ALLEN	30	SALESMAN	1600
BLAKE	30	MANAGER	2850

An experienced developer would quickly see that this problem can be solved with the same approach as the problem #1 (above):

WITH x AS (
SELECT ename, deptno, job, sal, AVG(sal) OVER(PARTITION BY deptno) avg_sal
FROM scott.emp
)
SELECT *
FROM x
WHERE sal > avg_sal
ORDER BY deptno, sal

Result:

ENAME	DEPTNO	JOB	SAL	AVG_SAL
KING	10	PRESIDENT	5000	2916.66667
JONES	20	MANAGER	2975	2175
FORD	20	ANALYST	3000	2175
SCOTT	20	ANALYST	3000	2175
ALLEN	30	SALESMAN	1600	1566.66667
BLAKE	30	MANAGER	2850	1566.66667

With this strategy we can even see the department average salary value.

Problem #3: List all employees who work in the same department as the president.

The problem was discussed in one of my old blog posts and I strongly suggest you to check it out: A trick that helps avoiding multiple table scans.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

A trick that helps avoiding multiple table scans.

November 29, 2018Zahar Hilkevich1 Comment

Let’s look at a fairly simple SQL problem:

In a traditional scott.emp table, find all employees who work in the same department as the president.

Make your query work even if there are more than 1 president records exist in emp table
Make Oracle scan emp table just ONCE

A “traditional” solution to this problem may look like this:

Strategy #1: Using a subquery

SELECT *
FROM scott.emp
WHERE deptno IN (SELECT deptno 
                 FROM scott.emp 
                 WHERE job='PRESIDENT')

or this:

Strategy #2: Using a self-join

SELECT DISTINCT a.*
FROM scott.emp a JOIN scott.emp b ON a.deptno=b.deptno
WHERE b.job='PRESIDENT'

Note, that DISTINCT option in the above query is needed to prevent duplicates if there were multiple presidents in а specific department.

Both solutions above use 2 copies of the emp table which makes oracle scan the same scott.emp table twice.

A trick presented below allows you to use only a single copy of the emp table to solve the problem. The trick involves different conceptual and technical approaches compared to the solutions we have seen so far.

Conceptually, we should rephrase the problem in a way that would keep it identical and at the same time allows us to use different technical arsenal. This approach is explained in a detailed manner in my book “Oracle SQL Tricks and Workarounds”. We can rephrase the puzzle and say that we are looking for employees from departments with some “positive” number of presidents working there. Technically speaking, we need to use analytic function COUNT and check if it is greater than 0:

Strategy #3: Using analytic function COUNT

WITH x AS (
SELECT e.*, 
       COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno) cnt
FROM scott.emp e
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE cnt>0
ORDER BY empno

We do need to use a common table expression as we cannot filter out by analytic function in the same query where the function is used. Nevertheless, we scan the emp table just once, and during this scan, Oracle engine counts the number of presidents in each department.

COUNT is not the only analytic function that can be employed to solve the problem.

Strategy #4: Using analytic function LISTAGG

WITH x AS (
SELECT e.*, 
       LISTAGG(job,'|') 
         WITHIN GROUP (ORDER BY job) OVER(PARTITION BY deptno) jobs
FROM scott.emp e 
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE '|' || jobs || '|' LIKE '%|PRESIDENT|%'
ORDER BY empno

Instead of counting the presidents by department, we simply concatenate all the job titles and check if the resulting string includes a president.

Finally, if you don’t like using sub-queries in general, we can leverage the power of MODEL clause:

Strategy #5: Using MODEL clause to avoid sub-queries

SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM scott.emp
MODEL RETURN UPDATED ROWS 
DIMENSION BY (
  empno, 
  SIGN(COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno)) cnt
) 
MEASURES(ename, job, mgr, hiredate, sal, comm, deptno, 0 dummy) 
RULES(dummy[ANY, 1]=1)
ORDER BY empno

The tricky part here is using a composition of SIGN, COUNT, and DECODE functions (i.e. SIGN on top of what we used in Strategy #3) as a secondary dimension and empno as primary. Employee number is unique by itself, so adding another dimension will still maintain uniqueness required by MODEL clause. The only MODEL RULE changes the dummy measure which “triggers” the “RETURN UPDATED ROWS” instruction and returns only those rows where the dummy dimension was set to 1 – notice that its default value is 0.

You can check the execution plan for all of the above strategies to see how many times Oracle scans the emp table.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Interview Question: Count all rows in a table without using COUNT

March 17, 2018Zahar Hilkevich6 Comments

This is a fairly simple interview question:

Count all rows in a table without using COUNT and any Analytic functions.

A bit more advanced version of the puzzle could be to count rows without using ANY functions at all, neither aggregate nor analytic.

The following video will give you answers to this and some other questions:

My Oracle Group on Facebook:

If you like this post, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Interview Question: Get top and bottom paid employees in each department

February 15, 2018February 15, 2018Zahar Hilkevich1 Comment

This is a typical interview problem: list all bottom and top paid employees in each department. A preferred solution should not be using UNION or UNION ALL operators.

Please watch this short video to learn a couple of non-obvious techniques and to impress your potential employers on your next job interview.

My Oracle Group on Facebook:

If you like this post, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

3 Solutions to the 2018 Oracle SQL Puzzle of the Week #5

February 12, 2018Zahar HilkevichLeave a comment

Puzzle of the Week #5:

Find the shortest and longest last names of the employees in each department.

If two or more employees tie for the shortest or longest name, pick the name that comes first in alphabetical order
Use hr.employees or scott.emp tables
Use a single SELECT statement only
Ideally, the solution should NOT rely on any sub-queries, CTEs (WITH clause), or inline views
Exclude unknown (NULL) departments

Solutions:

#1. Using MIN() KEEP Group Function

SELECT department_id, 
 MIN(last_name) KEEP(DENSE_RANK FIRST 
                     ORDER BY LENGTH(last_name)) shortest,
 MIN(last_name) KEEP(DENSE_RANK FIRST 
                     ORDER BY LENGTH(last_name) DESC) longest
FROM hr.employees
WHERE department_id IS NOT NULL
GROUP BY department_id

#2. Using FIRST_VALUE Analytic Function and DISTINCT option

(Credit to Igor Shpungin)

SELECT DISTINCT department_id, 
   FIRST_VALUE(last_name) OVER(PARTITION BY department_id 
                               ORDER BY LENGTH(last_name))      shortest, 
   FIRST_VALUE(last_name) OVER(PARTITION BY department_id 
                               ORDER BY LENGTH(last_name) DESC) longest 
FROM hr.employees 
WHERE department_id IS NOT NULL 
ORDER BY 1

#3. Using MODEL clause

(Credit to Naoto Katayama)

SELECT department_id, shortest, longest
FROM hr.employees
WHERE department_id IS NOT NULL
MODEL
RETURN UPDATED ROWS
PARTITION BY (department_id)
DIMENSION BY (
 ROW_NUMBER()OVER(PARTITION BY department_id 
                  ORDER BY LENGTH(last_name), last_name) rn1,
 ROW_NUMBER()OVER(PARTITION BY department_id 
                  ORDER BY LENGTH(last_name) DESC, last_name) rn2)
MEASURES(last_name, 
         CAST(NULL AS VARCHAR2(25)) AS shortest, 
         CAST(NULL AS VARCHAR2(25)) AS longest)
RULES(
      shortest[0,0]=MAX(last_name)[1,ANY], 
      longest [0,0]=MAX(last_name)[ANY,1]
)
ORDER BY department_id

You can execute the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

9 Solutions to Puzzle of the Week #21

August 29, 2016August 13, 2017Zahar HilkevichLeave a comment

Puzzle of the Week #21:

Produce a report that shows employee name, his/her immediate manager name, and the next level manager name. The following conditions should be met:

Use Single SELECT statement only
Use mgr column to identify employee’s immediate manager
The query should work in Oracle 11g.
A preferred solution should use only a single instance of emp table.

Expected Result:

NAME1      NAME2      NAME3
---------- ---------- ------
SMITH      FORD       JONES
ALLEN      BLAKE      KING
WARD       BLAKE      KING
JONES      KING
MARTIN     BLAKE      KING
BLAKE      KING
CLARK      KING
SCOTT      JONES      KING
KING
TURNER     BLAKE      KING
ADAMS      SCOTT      JONES
JAMES      BLAKE      KING
FORD       JONES      KING
MILLER     CLARK      KING

Solutions:

#1. Using connect_by_root, sys_connect_by_path, and regexp_substr functions

col name1 for a10
col name2 for a10
col name3 for a10
WITH x AS(
SELECT CONNECT_BY_ROOT(ename) name,
       SYS_CONNECT_BY_PATH(ename, ',') path,
       CONNECT_BY_ROOT(empno) empno
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
SELECT name, REGEXP_SUBSTR(MAX(path), '[^,]+', 1, 2) name2,
             REGEXP_SUBSTR(MAX(path), '[^,]+', 1, 3) name3
FROM x
GROUP BY name, empno
ORDER BY empno;

#2. Using CONNECT BY twice

WITH x AS (
SELECT ename, PRIOR ename mname, empno, mgr
FROM emp
WHERE LEVEL=2 OR mgr IS NULL
CONNECT BY PRIOR empno=mgr
)
SELECT ename name1, mname name2, MAX(PRIOR mname) name3
FROM x
WHERE LEVEL<=2
CONNECT BY PRIOR empno=mgr
GROUP BY ename, mname, empno
ORDER BY empno

#3. Using CONNECT BY and Self Outer Join

WITH x AS (
SELECT ename, PRIOR ename mname, PRIOR mgr AS mgr, empno
FROM emp
WHERE LEVEL=2 OR mgr IS NULL
CONNECT BY PRIOR empno=mgr
)
SELECT x.ename name1, x.mname name2, e.ename name3
FROM x LEFT JOIN emp e ON x.mgr=e.empno
ORDER BY x.empno

#4. Using 2 Self Outer Joins

SELECT a.ename name1, b.ename name2, c.ename name3
FROM emp a LEFT JOIN emp b ON a.mgr=b.empno
           LEFT JOIN emp c ON b.mgr=c.empno
ORDER BY a.empno

#5. Using CONNECT BY and PIVOT

SELECT name1, name2, name3
FROM (
SELECT ename, LEVEL lvl, CONNECT_BY_ROOT(empno) empno
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
PIVOT(
MAX(ename)
FOR lvl IN (1 AS name1, 2 AS name2, 3 AS name3)
)
ORDER BY empno;

#6. PIVOT Simulation

WITH x AS (
SELECT ename, LEVEL lvl, CONNECT_BY_ROOT(empno) empno
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
SELECT MAX(DECODE(lvl, 1, ename)) name1,
       MAX(DECODE(lvl, 2, ename)) name2,
       MAX(DECODE(lvl, 3, ename)) name3
FROM x
GROUP BY empno
ORDER BY empno;

#7. Using CONNECT BY and no WITH/Subqueries (Credit to Krishna Jamal)

SELECT ename Name1, PRIOR ename Name2,
DECODE(LEVEL, 
    3, CONNECT_BY_ROOT(ename), 
    4, TRIM(BOTH ' ' FROM 
        REPLACE(
            REPLACE(SYS_CONNECT_BY_PATH(PRIOR ename, ' '), PRIOR ename), 
        CONNECT_BY_ROOT(ename)))
        ) Name3
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
ORDER BY empno;

#8. A composition of Methods 1 and 7:

SELECT ename Name1, PRIOR ename Name2,
       CASE WHEN LEVEL IN (3,4) 
          THEN REGEXP_SUBSTR(SYS_CONNECT_BY_PATH(ename, ','),'[^,]+',1,LEVEL-2) 
       END AS Name3
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
ORDER BY empno;

#9. Using NTH_VALUE Analytic function (Oracle 11.2 and up):

WITH x AS (
SELECT CONNECT_BY_ROOT(ename) n1, CONNECT_BY_ROOT(empno) empno,
 NTH_VALUE(ename, 2) OVER(PARTITION BY CONNECT_BY_ROOT(ename) ORDER BY LEVEL) n2,
 NTH_VALUE(ename, 3) OVER(PARTITION BY CONNECT_BY_ROOT(ename) ORDER BY LEVEL) n3
FROM emp
WHERE LEVEL<=3
CONNECT BY empno=PRIOR mgr
)
SELECT n1 name1, MAX(n2) name2, MAX(n3) name3
FROM x
GROUP BY n1, empno
ORDER BY empno

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

3 Solutions to Puzzle of the Week #20

August 15, 2016August 15, 2016Zahar HilkevichLeave a comment

Puzzle of the Week #20:

Produce the historical highest/lowest salary report that should comply with the following requirements:

Use Single SELECT statement only
Only employees who was paid the highest or lowest salary in their respective department at the moment of hiring should be selected
Show name, date of hire, department number, job title, salary table (emp) columns and two additional calculated columns/flags: min_flag and max_flag to indicate that the employee was hired with the min/max salary in their respective department as of the time of hiring.
If two or more employees in the same department are paid the same max/min salary, only the one who was hired first should be picked for the report.
The query should work in Oracle 11g.

Expected Result:

POW20ER

#1. Using Common Table Expression (CTE) or Recursive WITH clause

WITH y AS (
SELECT ename, job, deptno, hiredate, sal, 
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY hiredate) rn
FROM emp
), x (ename, job, deptno, hiredate, sal, min_sal, max_sal, min_flag, max_flag, rn) AS (
SELECT ename, job, deptno, hiredate, sal, sal, sal, 1, 1, 1
FROM y
WHERE rn=1
UNION ALL
SELECT y.ename, y.job, y.deptno, y.hiredate, y.sal, 
       LEAST(x.min_sal, y.sal), GREATEST(x.max_sal, y.sal),
       CASE WHEN y.sal<x.min_sal THEN 1 END, 
       CASE WHEN y.sal>x.max_sal THEN 1 END, y.rn
FROM y JOIN x ON y.deptno=x.deptno AND y.rn=x.rn+1
)
SELECT ename, job, deptno, hiredate, sal, min_flag, max_flag
FROM x
WHERE 1 IN (min_flag, max_flag)
ORDER BY deptno, hiredate;

#2. Using Cumulative Analytic Functions MIN, MAX, and ROW_NUMBER

WITH x AS (
SELECT ename, job, deptno, hiredate, sal,
       MIN(sal)OVER(PARTITION BY deptno ORDER BY hiredate) min_sal,
       MAX(sal)OVER(PARTITION BY deptno ORDER BY hiredate) max_sal,
       ROW_NUMBER()OVER(PARTITION BY deptno, sal ORDER BY hiredate) rn
FROM emp
)
SELECT ename, job, deptno, hiredate, sal,
       DECODE(sal, min_sal, 1) min_flag,
       DECODE(sal, max_sal, 1) max_flag
FROM x
WHERE sal IN (min_sal, max_sal)
  AND rn=1;

#3. Using Cumulative Analytic Functions MIN, MAX, and COUNT

WITH x AS (
SELECT ename, job, deptno, hiredate, sal,
       CASE WHEN MIN(sal)OVER(PARTITION BY deptno ORDER BY hiredate)=sal
             AND COUNT(*)OVER(PARTITION BY deptno, sal ORDER BY hiredate)=1 THEN 1 
       END min_flag,
       CASE WHEN MAX(sal)OVER(PARTITION BY deptno ORDER BY hiredate)=sal
             AND COUNT(*)OVER(PARTITION BY deptno, sal ORDER BY hiredate)=1 THEN 1 
       END max_flag
FROM emp
)
SELECT *
FROM x
WHERE 1 IN (min_flag, max_flag);

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Three Solutions to Puzzle of the Week #15

June 20, 2016June 20, 2016Zahar Hilkevich1 Comment

Puzzle of the Week #15:

Find all the year based intervals from 1975 up to now when the company did not hire employees. Use a single SELECT statement against emp table.

Expected Result:

years
------------
1975 - 1979
1983 - 1986
1988 - 2016

Solutions

#1: Grouping by an expression on ROWNUM (no Analytic functions!)

SQL> col years for a15

SQL> WITH x AS (
  2  SELECT 1975+LEVEL-1 yr
  3  FROM dual
  4  CONNECT BY 1975+LEVEL-1<=EXTRACT(YEAR FROM SYSDATE)
  5  MINUS
  6  SELECT EXTRACT(YEAR FROM hiredate)
  7  FROM emp
  8  )
  9  SELECT MIN(yr) || ' - ' || MAX(yr) "years"
 10  FROM x
 11  GROUP BY yr-ROWNUM
 12  ORDER BY yr-ROWNUM;

years
---------------
1975 - 1979
1983 - 1986
1988 - 2016

#2: Calculating steps with Analytic function and grouping by a sum of step.

WITH x AS (
SELECT 1975+LEVEL-1 yr
FROM dual
CONNECT BY 1975+LEVEL-1<=EXTRACT(YEAR FROM SYSDATE)
MINUS
SELECT EXTRACT(YEAR FROM hiredate)
FROM emp
), y AS (
SELECT DECODE(yr, LAG(yr,1)OVER(ORDER BY yr)+1, 0, 1) AS step, yr
FROM x
), z AS (
SELECT yr, SUM(step)OVER(ORDER BY yr) grp
FROM y
)
SELECT MIN(yr) || ' - ' || MAX(yr) "years"
FROM z
GROUP BY grp
ORDER BY grp;

years
---------------
1975 - 1979
1983 - 1986
1988 - 2016

#3: Using Self Outer Join to calculate steps

WITH x AS (
SELECT 1975+LEVEL-1 yr
FROM dual
CONNECT BY 1975+LEVEL-1<=EXTRACT(YEAR FROM SYSDATE)
MINUS
SELECT EXTRACT(YEAR FROM hiredate)
FROM emp
), y AS (
SELECT x1.yr, NVL2(x2.yr, 0, 1) step
FROM x x1 LEFT JOIN x x2 ON x1.yr=x2.yr+1
), z AS (
SELECT yr, SUM(step)OVER(ORDER BY yr) grp
FROM y
)
SELECT MIN(yr) || ' - ' || MAX(yr) "years"
FROM z
GROUP BY grp
ORDER BY grp;

years
---------------
1975 - 1979
1983 - 1986
1988 - 2016

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.