Substitution SQL Puzzle

May 17, 2020, Zahar Hilkevich

Level: Advanced

A colleague of mine approached me recently with a puzzle he struggled with: you have a table (let’s call it data_table) with id and val (i.e. value) columns. You are given two parameters: value_to_overwrite and value_to_use that should transform the content of the data_table in a special way:

If both parameters exist in the val column of the data_table for the same id, then the value equal to value_to_overwrite should be substituted with value_to_use
If none or just one of the parameters exists in the data_table.val column for that id, then the val column should remain the same
List all the rows from the data_table after the transformation.

Let’s create the data_table using the following DDL command:

CREATE TABLE data_table AS
SELECT 1 id, 'a' val FROM dual
UNION ALL
SELECT 1 id, 'b' val FROM dual
UNION ALL
SELECT 1 id, 'c' val FROM dual
UNION ALL
SELECT 2 id, 'b' val FROM dual
UNION ALL
SELECT 2 id, 'd' val FROM dual

ID	VAL
1	a
1	b
1	c
2	b
2	d

For parameters value_to_overwrite = ‘b’ and value_to_use = ‘a’, the expected result should look like this:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	c
2	b	b
2	d	d

Note that for id = 1, value ‘b’ is substituted with the new value ‘a’ because both value_to_overwrite (‘b’) and value_to_use (‘a’) exist in the val column. All other values remain the same because the substitution condition is not met.

To mimic the parameter use in the query we will create another table (rule_table) with a single row in it.

CREATE TABLE rule_table AS
SELECT 'a' value_to_use, 'b' value_to_overwrite
FROM dual

Translating requirements from English to SQL will likely result in a bulky and inefficient query. Let’s demonstrate that:

/* Values that need to be substituted */
SELECT d.id, d.val AS original_value, r.value_to_use AS new_value
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use IN (SELECT val
                         FROM data_table
                         WHERE id = d.id)
UNION ALL
/* Values that remain the same as only value_to_overwrite exist for given id */
SELECT d.id, d.val, d.val
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use NOT IN (SELECT val
                             FROM data_table
                             WHERE id = d.id)
UNION ALL
/* Values that remain the same as value_to_overwrite does not match val */
SELECT d.id, d.val, d.val
FROM data_table d
WHERE d.val NOT IN (SELECT value_to_overwrite
                    FROM rule_table)

As you can see, there are multiple (five) copies of the data_table used, which will lead to poor performance when the size of the table increases dramatically.

A much better approach is to take the first SELECT from the UNIONed statement above and turn the INNER JOIN into a LEFT OUTER JOIN. At the same time, we need to move the filtering condition from the WHERE clause to the JOIN (otherwise, the LEFT JOIN will work as an INNER JOIN):

SELECT d.id,
       d.val                      AS original_value,
       NVL(r.value_to_use, d.val) AS new_value
FROM data_table d LEFT JOIN rule_table r 
                  ON d.val = r.value_to_overwrite
                 AND r.value_to_use IN (SELECT val
                                        FROM data_table
                                        WHERE id = d.id)

This is quite an efficient and fairly short query that uses only two copies of the data_table. Can we do better than that? Yes, we can!

WITH x AS (
SELECT id, val,
       MIN(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  min_val,
       MAX(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  max_val,
       LEAST(value_to_use, value_to_overwrite)    min_ow,
       GREATEST(value_to_use, value_to_overwrite) max_ow,
       value_to_use, value_to_overwrite
FROM data_table CROSS JOIN rule_table 
)
SELECT id, val AS original_value,
       CASE WHEN min_val=min_ow AND
                 max_val=max_ow AND
                 val=value_to_overwrite THEN value_to_use
       ELSE val
       END AS new_value
FROM x

The analytic functions MIN and MAX let us scan the data_table vertically, while LEAST and GREATEST do the same horizontally. The latter pair of functions comes in very handy when you need to compare pairs of values regardless of their order: the smaller of the two values should match LEAST and the larger one should match GREATEST.
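As a small illustration of the pair-matching idea (a sketch only; the two inline rows and the column names v1, v2, lo, hi are made up for this demo and are not part of the puzzle data), LEAST and GREATEST return the same normalized pair no matter which column holds the smaller value:

```sql
-- Two made-up rows holding the same pair of values in opposite order.
-- LEAST/GREATEST normalize both rows to lo = 'a', hi = 'b'.
SELECT v1, v2,
       LEAST(v1, v2)    AS lo,
       GREATEST(v1, v2) AS hi
FROM (SELECT 'a' v1, 'b' v2 FROM dual
      UNION ALL
      SELECT 'b', 'a' FROM dual)
```

This is why the query compares min_val with min_ow and max_val with max_ow: the comparison does not depend on which of the two rule values sorts first.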

And still, the last strategy has one flaw: we used a Cartesian product (CROSS JOIN), which means that if we had more than one substitution rule, the method would not work properly. Let’s fix it.

First, we will add one more rule:

INSERT INTO rule_table VALUES('b', 'c')

Now, the expected result should look like this:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	b
2	b	b
2	d	d

Note that the second rule turns the original ‘c’ value into ‘b’.

And again, analytic functions do all the magic:

WITH x AS (
SELECT id, val, value_to_overwrite, value_to_use,
       LEAST(value_to_overwrite, value_to_use) || '|' ||
       GREATEST(value_to_overwrite, value_to_use) rule_vals,
       LISTAGG(DISTINCT val, '|') WITHIN GROUP(ORDER BY val)
       OVER(PARTITION BY id) vals
FROM data_table LEFT JOIN rule_table ON val = value_to_overwrite
)
SELECT id, val AS original_value,
       CASE WHEN value_to_overwrite IS NULL THEN val
            WHEN INSTR(vals, rule_vals)=0 THEN val
            ELSE value_to_use
       END     AS new_value
FROM x

This time, the LISTAGG analytic function (with the DISTINCT option, recently supported by Oracle) helps match the val against the value_to_overwrite and value_to_use pair.
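One caveat worth checking: INSTR(vals, rule_vals) looks for the two rule values as neighbors in the sorted list, so it can miss a pair when another value of the same id sorts between them (for example, a hypothetical rule that turns ‘c’ into ‘a’ while ‘b’ is also present for that id). A possible variant, sketched here under the assumption that values never contain the '|' delimiter, checks only for the presence of value_to_use, since the joined row already carries value_to_overwrite:

```sql
WITH x AS (
SELECT id, val, value_to_use,
       -- all distinct values of the current id, e.g. 'a|b|c'
       LISTAGG(DISTINCT val, '|') WITHIN GROUP(ORDER BY val)
       OVER(PARTITION BY id) vals
FROM data_table LEFT JOIN rule_table ON val = value_to_overwrite
)
SELECT id, val AS original_value,
       -- wrap both sides in delimiters so that only whole values match;
       -- when no rule matched, value_to_use is NULL and the search
       -- string becomes '||', which never occurs in the list
       CASE WHEN INSTR('|' || vals || '|', '|' || value_to_use || '|') > 0
            THEN value_to_use
            ELSE val
       END AS new_value
FROM x
```

With the two rules used in this post, this sketch should return the same five rows as the query above.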

I strongly recommend executing parts of the above queries to gain a better understanding of the demonstrated strategies. The livesql.oracle.com site offers a great query tool with the latest version of Oracle database.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Explore the power of analytic functions – Part 2

May 4, 2020, Zahar Hilkevich

In the previous blog post on this subject we reviewed a couple of notable applications of Oracle analytic functions. Today, I came across another interesting illustration of the same concept.

Problem: List all employees from the same department and holding the same job title as ADAMS.

Expected Result:

ENAME	JOB	DEPTNO
ADAMS	CLERK	20
SMITH	CLERK	20

Like before, we start with traditional approaches that every experienced developer would easily demonstrate.

Strategy #1: Using multi-column subquery

SELECT ename, job, deptno
FROM scott.emp
WHERE (deptno, job) IN (SELECT deptno, job
                        FROM scott.emp
                        WHERE ename = 'ADAMS')
ORDER BY 2, 3, 1

Strategy #2: Using self-join

SELECT a.ename, job, deptno
FROM scott.emp a JOIN scott.emp b USING(deptno, job)
WHERE b.ename = 'ADAMS'
ORDER BY job, deptno, a.ename

Strategy #3: Using EXISTS predicate

SELECT ename, job, deptno
FROM scott.emp a
WHERE EXISTS (SELECT 1
              FROM scott.emp
              WHERE ename  = 'ADAMS'
                AND deptno = a.deptno
                AND job    = a.job)
ORDER BY 2, 3, 1

A common feature of all the strategies above is having two copies of the emp table with two joining conditions (deptno, job) and one filter (ename = ‘ADAMS’).

As we have seen before, with analytic functions, we can get away with a single copy of the emp table.

Strategy #4: Using COUNT Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       COUNT(DECODE(ename, 'ADAMS', 1)) 
             OVER(PARTITION BY deptno, job) cnt
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE cnt > 0
ORDER BY 2, 3, 1

Of course, you can use different analytic functions here:

Strategy #5: Using MAX Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       MAX(DECODE(ename, 'ADAMS', ename)) 
           OVER(PARTITION BY deptno, job) adams
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE adams = 'ADAMS'
ORDER BY 2, 3, 1

Strategy #6: Using LISTAGG Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       LISTAGG(DECODE(ename, 'ADAMS', 'Y'), '|') WITHIN GROUP (ORDER BY 1) 
               OVER(PARTITION BY deptno, job) flag
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE flag LIKE 'Y%'
ORDER BY 2, 3, 1

We need to use the LIKE operator in case more than one ADAMS works in the same department and holds the same job title. In that case the flag would be 'Y|Y' rather than 'Y'.
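Since LISTAGG ignores NULL values, the flag should stay NULL for every (deptno, job) group that has no ADAMS in it. Relying on that behavior, a sketch of the same strategy with a plain NULL check instead of LIKE could look like this:

```sql
WITH x AS (
SELECT ename, job, deptno,
       -- 'Y' is produced only for ADAMS rows; all other rows contribute NULL
       LISTAGG(DECODE(ename, 'ADAMS', 'Y'), '|') WITHIN GROUP (ORDER BY 1)
               OVER(PARTITION BY deptno, job) flag
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE flag IS NOT NULL   -- at least one ADAMS in this department/job group
ORDER BY 2, 3, 1
```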


Explore the power of analytic functions

April 29, 2020, Zahar Hilkevich

Analytic functions are still underutilized among database developers. The goal of this publication is to demonstrate hidden opportunities to improve query performance by using analytic functions where the traditional approach still dominates.

Let’s consider a few examples. We will use Oracle’s traditional educational schema scott and its famous emp and dept tables.

Problem #1: Find all employees from the department where a top paid clerk works.

A quick look at the clerk records reveals the expected department (10):

SELECT deptno, job, sal
FROM scott.emp
WHERE job = 'CLERK'
ORDER BY sal DESC

Result:

DEPTNO	JOB	SAL
10	CLERK	1300
20	CLERK	1100
30	CLERK	950
20	CLERK	800

The traditional approach suggests finding the department (10) first, and then the task becomes trivial:

SELECT ename, deptno, job, sal
FROM scott.emp
WHERE deptno IN (<department that we found>)

OK, that works, but now we need to produce a query that will get us that department number (or several of them, if they all have top-paid clerks).

There are many ways of finding a top record with and without analytic functions. If we don’t use analytic functions we will end up using two copies of the emp table as in the following example:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
  AND sal = (SELECT MAX(sal) FROM scott.emp WHERE job = 'CLERK')

or this:

SELECT a.deptno
FROM scott.emp a LEFT JOIN scott.emp b ON a.sal < b.sal
AND b.job = 'CLERK'
WHERE b.deptno IS NULL 
  AND a.job = 'CLERK'

The use of analytic functions reduces the number of table copies as we can get all the necessary details in a single table scan as in the following query:

WITH x AS (
SELECT deptno, MAX(sal) max_sal, RANK() OVER(ORDER BY MAX(sal) DESC) rk
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
)
SELECT deptno
FROM x
WHERE rk = 1

Such SQL looks even shorter in databases, such as Snowflake and Teradata, that support the QUALIFY clause in the SELECT statement:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
QUALIFY RANK() OVER(ORDER BY MAX(sal) DESC) = 1

So a complete traditional approach will use 3 copies of the emp table as in the following query:

SELECT ename, deptno, job, sal
FROM scott.emp                                 -- copy #1
WHERE deptno IN (SELECT deptno
                 FROM scott.emp                -- copy #2
                 WHERE job = 'CLERK'
                   AND sal = (SELECT MAX(sal) 
                              FROM scott.emp   -- copy #3
                              WHERE job = 'CLERK')
                 )

Yes, it works, but what an overkill! Is it still possible to use a single emp table scan to solve this problem? The answer is YES:

WITH x AS (
SELECT ename, deptno, job, sal, 
       MAX(DECODE(job,'CLERK',sal)) OVER() max_sal_global,
       MAX(DECODE(job,'CLERK',sal)) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

And of course, in Snowflake/Teradata SQL we would not even have to use a CTE (common table expression), thanks to the QUALIFY clause.
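A minimal sketch of that idea in Snowflake-style syntax (it will not run in Oracle releases that lack QUALIFY; it assumes the same emp table is available in that database and uses CASE instead of DECODE for portability):

```sql
SELECT ename, deptno, job, sal
FROM scott.emp
-- keep only the rows whose department holds the globally top-paid clerk
QUALIFY MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER()
      = MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER(PARTITION BY deptno)
```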

The MAX analytic functions combined with DECODE (CASE would work as well) ignore all non-clerk rows here. The OVER clause gives us either the department-level or the global top salary for the clerks. And since this is done at the row level, which means that these analytic functions return the same values for all employees in the same department, we can achieve our goal easily.
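To see these values for yourself, you can run the analytic expressions without the final filter. This sketch uses CASE in place of DECODE and adds an ORDER BY purely for readability:

```sql
SELECT ename, deptno, job, sal,
       -- top clerk salary across the whole table (same value on every row)
       MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER() max_sal_global,
       -- top clerk salary within the row's department
       MAX(CASE WHEN job = 'CLERK' THEN sal END) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
ORDER BY deptno, ename
```

Rows where the two computed columns are equal are exactly the ones the final WHERE clause keeps.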

MAX is not the only analytic function that can be used here. The following example uses the FIRST_VALUE function to achieve the same result:

WITH x AS (
SELECT ename, deptno, job, sal, 
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_global,
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(PARTITION BY deptno
              ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

As an exercise, try to use LISTAGG analytic function to solve this problem.

Problem #2: Find employees who are paid above the average salary in their respective department.

Again, we will start with a “traditional” approach:

SELECT ename, deptno, job, sal
FROM scott.emp a
WHERE sal > (SELECT AVG(sal)
             FROM scott.emp
             WHERE deptno = a.deptno)
ORDER BY deptno, sal

Result:

ENAME	DEPTNO	JOB	SAL
KING	10	PRESIDENT	5000
JONES	20	MANAGER	2975
SCOTT	20	ANALYST	3000
FORD	20	ANALYST	3000
ALLEN	30	SALESMAN	1600
BLAKE	30	MANAGER	2850

An experienced developer would quickly see that this problem can be solved with the same approach as Problem #1 above:

WITH x AS (
SELECT ename, deptno, job, sal, AVG(sal) OVER(PARTITION BY deptno) avg_sal
FROM scott.emp
)
SELECT *
FROM x
WHERE sal > avg_sal
ORDER BY deptno, sal

Result:

ENAME	DEPTNO	JOB	SAL	AVG_SAL
KING	10	PRESIDENT	5000	2916.66667
JONES	20	MANAGER	2975	2175
FORD	20	ANALYST	3000	2175
SCOTT	20	ANALYST	3000	2175
ALLEN	30	SALESMAN	1600	1566.66667
BLAKE	30	MANAGER	2850	1566.66667

With this strategy we can even see the department average salary value.

Problem #3: List all employees who work in the same department as the president.

The problem was discussed in one of my old blog posts, and I strongly suggest you check it out: A trick that helps avoiding multiple table scans.


A trick that helps avoiding multiple table scans.

November 29, 2018, Zahar Hilkevich

Let’s look at a fairly simple SQL problem:

In a traditional scott.emp table, find all employees who work in the same department as the president.

Make your query work even if more than one president record exists in the emp table
Make Oracle scan the emp table just ONCE

A “traditional” solution to this problem may look like this:

Strategy #1: Using a subquery

SELECT *
FROM scott.emp
WHERE deptno IN (SELECT deptno 
                 FROM scott.emp 
                 WHERE job='PRESIDENT')

or this:

Strategy #2: Using a self-join

SELECT DISTINCT a.*
FROM scott.emp a JOIN scott.emp b ON a.deptno=b.deptno
WHERE b.job='PRESIDENT'

Note that the DISTINCT option in the above query is needed to prevent duplicates if there are multiple presidents in a specific department.

Both solutions above use 2 copies of the emp table, which makes Oracle scan the same scott.emp table twice.

A trick presented below allows you to use only a single copy of the emp table to solve the problem. The trick involves different conceptual and technical approaches compared to the solutions we have seen so far.

Conceptually, we should rephrase the problem in a way that keeps it equivalent and at the same time allows us to use a different technical arsenal. This approach is explained in detail in my book “Oracle SQL Tricks and Workarounds”. We can rephrase the puzzle and say that we are looking for employees from departments with a positive number of presidents working there. Technically speaking, we need to use the analytic function COUNT and check if it is greater than 0:

Strategy #3: Using analytic function COUNT

WITH x AS (
SELECT e.*, 
       COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno) cnt
FROM scott.emp e
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE cnt>0
ORDER BY empno

We do need to use a common table expression as we cannot filter out by analytic function in the same query where the function is used. Nevertheless, we scan the emp table just once, and during this scan, Oracle engine counts the number of presidents in each department.

COUNT is not the only analytic function that can be employed to solve the problem.

Strategy #4: Using analytic function LISTAGG

WITH x AS (
SELECT e.*, 
       LISTAGG(job,'|') 
         WITHIN GROUP (ORDER BY job) OVER(PARTITION BY deptno) jobs
FROM scott.emp e 
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE '|' || jobs || '|' LIKE '%|PRESIDENT|%'
ORDER BY empno

Instead of counting the presidents by department, we simply concatenate all the job titles and check if the resulting string includes a president.

Finally, if you don’t like using sub-queries in general, we can leverage the power of MODEL clause:

Strategy #5: Using MODEL clause to avoid sub-queries

SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM scott.emp
MODEL RETURN UPDATED ROWS 
DIMENSION BY (
  empno, 
  SIGN(COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno)) cnt
) 
MEASURES(ename, job, mgr, hiredate, sal, comm, deptno, 0 dummy) 
RULES(dummy[ANY, 1]=1)
ORDER BY empno

The tricky part here is using a composition of the SIGN, COUNT, and DECODE functions (i.e. SIGN on top of what we used in Strategy #3) as a secondary dimension and empno as the primary one. The employee number is unique by itself, so adding another dimension still maintains the uniqueness required by the MODEL clause. The only MODEL rule changes the dummy measure, which “triggers” the “RETURN UPDATED ROWS” instruction and returns only those rows where the dummy measure was set to 1. Notice that its default value is 0.

You can check the execution plan for all of the above strategies to see how many times Oracle scans the emp table.
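For example, one way to do that (assuming your account can use the default plan table and the DBMS_XPLAN package) is EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY, shown here for Strategy #3:

```sql
-- Store the plan for the statement without executing it
EXPLAIN PLAN FOR
WITH x AS (
SELECT e.*,
       COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno) cnt
FROM scott.emp e
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE cnt>0;

-- Display the plan that was just stored
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Counting how many times the EMP table appears in the displayed plan gives the number of scans for each strategy.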


12 Solutions to 2018 Oracle SQL Puzzle of the Week #10

March 19, 2018, Zahar Hilkevich

Top Salary Puzzle

Find highest salary in each department without using MAX function

Use a single SELECT statement only.
For an added complexity (optional): try not using ANY functions at all (neither group, nor analytic, not even scalar)

Expected Result:

DEPTNO	MAX_SAL
10	5000
20	3000
30	2850

Solutions:

We will begin with a simpler problem that does allow us to use functions.

Solution #1. Using MIN function

Credit to: Boobal Ganesan

The MIN function can be seen as the opposite of MAX, so it is trivial to employ it here:

SELECT deptno, -MIN(-sal) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #2. Using LISTAGG and REGEXP_SUBSTR functions

This is an “order” based approach that sorts the values within a concatenated string and then uses regular expression to cut the first token.

SELECT deptno,
       REGEXP_SUBSTR(LISTAGG(sal,',') 
                     WITHIN GROUP(ORDER BY sal DESC),'[^,]+',1,1) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #3. Using AVG(…) KEEP() group function

This is another “order” based strategy where the AVG function can be replaced with MIN or any other aggregate function that returns a single value out of a set of identical ones.

SELECT deptno, AVG(sal) KEEP(DENSE_RANK FIRST ORDER BY sal DESC) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #4. Using Analytic function and CTE

ROW_NUMBER is chosen in this approach, though other analytic functions, such as RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE, etc., can be used here (with some changes) as well. ROW_NUMBER is convenient to use as it lets us avoid the DISTINCT option.

WITH x AS (
SELECT deptno, sal, 
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY sal DESC) rn
FROM scott.emp
)
SELECT deptno, sal max_sal
FROM x
WHERE rn=1
ORDER BY 1;
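As a sketch of the “other analytic functions” remark, here is a possible FIRST_VALUE variant (assuming sal contains no NULLs). Unlike ROW_NUMBER, it returns the top salary on every row of the department, so DISTINCT is needed to collapse the duplicates:

```sql
SELECT DISTINCT deptno,
       -- the first salary in descending order is the department's top salary
       FIRST_VALUE(sal) OVER(PARTITION BY deptno ORDER BY sal DESC) max_sal
FROM scott.emp
ORDER BY 1;
```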

Solution #5. Using MATCH_RECOGNIZE clause

Credit to: KATAYAMA NAOTO

This approach is similar to the previous one if we had used the LAG analytic function, which would return NULL for the top record.

SELECT deptno, sal max_sal 
FROM scott.emp
MATCH_RECOGNIZE (
PARTITION BY deptno
ORDER BY sal DESC
ALL ROWS PER MATCH
PATTERN (ISNULL)
DEFINE ISNULL AS PREV(ISNULL.sal) IS NULL
);
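For comparison, the LAG-based version that the description refers to might be sketched as follows (assuming sal contains no NULLs; the prev_sal alias is chosen just for this example):

```sql
WITH x AS (
SELECT deptno, sal,
       -- salary of the previous row in descending order;
       -- NULL only for the first (top-paid) row of each department
       LAG(sal) OVER(PARTITION BY deptno ORDER BY sal DESC) prev_sal
FROM scott.emp
)
SELECT deptno, sal max_sal
FROM x
WHERE prev_sal IS NULL
ORDER BY 1;
```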

Solution #6. CONNECT BY and CONNECT_BY_ISLEAF while avoiding Analytic functions

This approach is a bit artificial. We could have used DISTINCT and avoided the START WITH clause completely. CTEs x and y are used to simulate the ROW_NUMBER analytic function.

WITH x AS (
SELECT deptno, sal
FROM scott.emp
ORDER BY 1,2
), y AS (
SELECT x.*, ROWNUM rn
FROM x
)
SELECT deptno, sal
FROM y
WHERE CONNECT_BY_ISLEAF=1
CONNECT BY deptno=PRIOR deptno
       AND rn=PRIOR rn+1
START WITH (deptno, rn) IN (SELECT deptno, MIN(rn)
                            FROM y
                            GROUP BY deptno);
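The DISTINCT alternative mentioned above could be sketched like this. Without START WITH every row becomes a root, so the same leaf (the last, top-paid row of a department) is reached through several paths and DISTINCT removes the repeats:

```sql
WITH x AS (
SELECT deptno, sal
FROM scott.emp
ORDER BY 1,2
), y AS (
SELECT x.*, ROWNUM rn
FROM x
)
SELECT DISTINCT deptno, sal max_sal
FROM y
WHERE CONNECT_BY_ISLEAF=1          -- only the end of each chain
CONNECT BY deptno=PRIOR deptno     -- stay within the department
       AND rn=PRIOR rn+1           -- step to the next higher salary
ORDER BY 1;
```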

Solution #7. Using MODEL clause with ROW_NUMBER function

This method is pretty much the same as Solution #4 above. RETURN UPDATED ROWS and the dummy measure are used to return only the rows with rn=1.

SELECT deptno, max_sal
FROM scott.emp
MODEL
RETURN UPDATED ROWS
PARTITION BY (deptno)
DIMENSION BY (ROW_NUMBER() OVER(PARTITION BY deptno ORDER BY sal DESC) rn)
MEASURES(sal max_sal, 0 dummy)
RULES(
 dummy[1]=1
)
ORDER BY 1;

The following 5 solutions (#8 through #12) satisfy the “added complexity” requirement and do NOT use any functions at all.

Solution #8. Using ALL predicate

Generally speaking, >=ALL filter is identical to =(SELECT MAX() …). See my book for more detailed explanations.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE sal>=ALL(SELECT sal
               FROM scott.emp
               WHERE deptno=a.deptno)
GROUP BY deptno, sal
ORDER BY 1;
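For comparison only (it uses MAX, so it does not qualify as a puzzle solution), the equivalent filter mentioned above could be written as shown below. The two forms can differ when sal contains NULLs: >=ALL then never evaluates to true for that department and returns no rows for it, while MAX simply ignores the NULLs.

```sql
SELECT deptno, sal max_sal
FROM scott.emp a
WHERE sal=(SELECT MAX(sal)
           FROM scott.emp
           WHERE deptno=a.deptno)
GROUP BY deptno, sal
ORDER BY 1;
```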

Solution #9. Using NOT EXISTS predicate

See Chapter 10 of my book for details.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE NOT EXISTS(SELECT 1
                 FROM scott.emp
                 WHERE deptno=a.deptno
                   AND sal>a.sal)
GROUP BY deptno, sal
ORDER BY 1;

Solution #10. Using Outer-Join with IS NULL filter

This approach is also covered very deeply in my book, Chapter 10.

SELECT a.deptno, a.sal max_sal
FROM scott.emp a LEFT JOIN scott.emp b ON a.deptno=b.deptno
                                      AND b.sal>a.sal 
WHERE b.empno IS NULL
GROUP BY a.deptno, a.sal
ORDER BY 1;

Solution #11. Using MINUS and ANY predicate

MINUS serves 2 purposes: it removes non-top rows and eliminates duplicates, so no DISTINCT option (or GROUP BY) is required.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE sal<ANY(SELECT sal 
              FROM scott.emp
              WHERE deptno=a.deptno);

Solution #12. Using MINUS and EXISTS predicate

The last two approaches are covered in the drill in Chapter 10 of my book.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE EXISTS(SELECT 1 
             FROM scott.emp
             WHERE deptno=a.deptno
               AND sal>a.sal);

You can execute the above SQL statements in Oracle Live SQL environment.

8 Solutions to 2018 Oracle SQL Puzzle of the Week #9

March 12, 2018, Zahar Hilkevich

Recent employment Puzzle

For each location, show 2 most recently hired employees

Use a single SELECT statement only.
The ename1 and hiredate1 columns should correspond to the most recently hired employee, while the ename2 and hiredate2 columns should correspond to the previous one

Expected Result:

LOC	ENAME1	HIREDATE1	ENAME2	HIREDATE2
NEW YORK	MILLER	23-JAN-82	KING	17-NOV-81
CHICAGO	JAMES	03-DEC-81	MARTIN	28-SEP-81
DALLAS	ADAMS	23-MAY-87	SCOTT	19-APR-87

Solutions:

Solution #1. Using Self-Join and MAX functions

SELECT d.loc, 
     MAX(e1.ename) KEEP(DENSE_RANK FIRST ORDER BY e1.hiredate DESC) ename1, 
     MAX(e1.hiredate) hiredate1, 
     MAX(e2.ename) KEEP(DENSE_RANK FIRST ORDER BY e2.hiredate DESC) ename2, 
     MAX(e2.hiredate) hiredate2 
FROM scott.emp e1 JOIN scott.emp e2 ON e1.deptno=e2.deptno 
 AND e1.hiredate>=e2.hiredate 
 AND e1.ROWID!=e2.ROWID 
                  JOIN scott.dept d ON e1.deptno=d.deptno 
GROUP BY d.loc;

Solution #2. Using LISTAGG and REGEXP_SUBSTR functions

SELECT d.loc, 
       MAX(ename) KEEP(DENSE_RANK FIRST ORDER BY hiredate DESC) ename1,
       MAX(hiredate) hiredate1, 
       REGEXP_SUBSTR(LISTAGG(ename, '|') WITHIN GROUP (ORDER BY hiredate DESC),
 '[^|]+',1,2) ename2,
       REGEXP_SUBSTR(LISTAGG(hiredate, '|') WITHIN GROUP (ORDER BY hiredate DESC),
 '[^|]+',1,2) hiredate2
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
GROUP BY d.loc
ORDER BY 1;

Solution #3. Using CTE, ROW_NUMBER, and Self-Join

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT a.loc, a.ename ename1, a.hiredate hiredate1,
              b.ename ename2, b.hiredate hiredate2
FROM x a JOIN x b ON a.loc=b.loc AND a.rn=1 AND b.rn=2;

Solution #4. Using Pivot

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, e1_ename AS ename1, e1_hdate AS hiredate1,
       e2_ename AS ename2, e2_hdate AS hiredate2
FROM x
PIVOT (
MAX(ename) ename, MAX(hiredate) hdate FOR rn IN (1 AS e1, 2 AS e2) 
)
ORDER BY 1;

Solution #5. Simulating Pivot with MAX and DECODE functions

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, 
       MAX(DECODE(rn,1,ename)) ename1, 
       MAX(DECODE(rn,1,hiredate)) hiredate1,
       MAX(DECODE(rn,2,ename)) ename2, 
       MAX(DECODE(rn,2,hiredate)) hiredate2
FROM x
GROUP BY loc
ORDER BY 1;

Solution #6. Using CONNECT BY

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, PRIOR ename ename1, PRIOR hiredate hiredate1, 
       ename ename2, hiredate hiredate2
FROM x
WHERE rn=2
START WITH rn=1
CONNECT BY loc=PRIOR loc
       AND rn=PRIOR rn+1;

Solution #7. Using LEAD and ROW_NUMBER Analytic functions

WITH x AS (
SELECT d.loc, e.ename ename1, e.hiredate hiredate1, 
 LEAD(e.ename,1) OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) ename2,
 LEAD(e.hiredate,1) OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) hiredate2,
 ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
) 
SELECT loc, ename1, hiredate1, ename2, hiredate2
FROM x
WHERE rn=1
ORDER BY 1;

Solution #8. Using Model Clause:

SELECT loc, ename1, hiredate1, ename2, hiredate2
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
MODEL
RETURN UPDATED ROWS
PARTITION BY (d.loc)
DIMENSION BY (
   ROW_NUMBER()OVER(PARTITION BY d.loc ORDER BY e.hiredate DESC) AS rn
)
MEASURES(
    ename AS ename1, hiredate AS hiredate1, 
    ename AS ename2, hiredate AS hiredate2
)
RULES(
    ename2[1]   =ename1[2],
    hiredate2[1]=hiredate1[2]
)
ORDER BY 1;

You can execute the above SQL statements in Oracle Live SQL environment.

7 Solutions to 2018 Oracle SQL Puzzle of the Week #4

February 6, 2018, Zahar Hilkevich

2018 Puzzle of the Week #4:

Calculate Mutual Funds’ Performance

For a given table fund_performance (see the CREATE TABLE statement below), calculate each fund’s performance over the 6-month period from Jan-2016 till Jun-2016.

Use a single SELECT statement
Performance is calculated as a multiplication of all the months’ performance rates for the given time frame
The solution should work for any time frame, so treat from-month and to-month as query parameters
DDL command:

CREATE TABLE fund_performance AS
SELECT 1 fund_id, '2016-01' perf_month, 1.05 perf_rate
FROM dual 
UNION ALL
SELECT 1, '2016-02', 1.02 FROM dual UNION ALL
SELECT 1, '2016-03', 0.92 FROM dual UNION ALL
SELECT 1, '2016-04', 1.01 FROM dual UNION ALL
SELECT 1, '2016-05', 1.04 FROM dual UNION ALL
SELECT 1, '2016-06', 0.95 FROM dual UNION ALL
SELECT 2, '2016-01', 1.04 FROM dual UNION ALL
SELECT 2, '2016-02', 1.03 FROM dual UNION ALL
SELECT 2, '2016-03', 0.98 FROM dual UNION ALL
SELECT 2, '2016-04', 1.04 FROM dual UNION ALL
SELECT 2, '2016-05', 1.01 FROM dual UNION ALL
SELECT 2, '2016-06', 0.98 FROM dual;

Expected Result:

FUND_ID	Cumulative Performance
1	0.98
2	1.08

Solutions:

Solution #1: Using Math Formula (Sum of Logs = Log of Product)
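The identity named in the heading can be checked on a few of the sample rates before applying it to the table. This is only a sketch with three hard-coded values; note that LN accepts positive arguments only, so the technique assumes every perf_rate is greater than zero:

```sql
SELECT 1.05 * 1.02 * 0.92                  AS direct_product,
       -- sum of logs = log of product, so EXP() recovers the product
       EXP(LN(1.05) + LN(1.02) + LN(0.92)) AS product_via_logs
FROM dual
```

The two columns should agree up to rounding in the last digits.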

SELECT fund_id, ROUND(EXP(SUM(LN(perf_rate))),2) "Cumulative Performance" 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
GROUP BY fund_id 
ORDER BY 1
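The puzzle asks to treat the month boundaries as query parameters. A sketch of the same query with bind variables (the names :from_month and :to_month are placeholders chosen for this example) might look like this:

```sql
SELECT fund_id, ROUND(EXP(SUM(LN(perf_rate))),2) "Cumulative Performance"
FROM fund_performance
-- perf_month is stored as 'YYYY-MM' text, so string comparison
-- orders the months chronologically
WHERE perf_month BETWEEN :from_month AND :to_month
GROUP BY fund_id
ORDER BY 1
```

Binding '2016-01' and '2016-06' should reproduce the expected result shown above.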

Solution #2: Using Dynamic XML Query with XMLTYPE

(Credit to: Katayama Naoto)

SELECT fund_id,  
       ROUND(TO_NUMBER(EXTRACTVALUE(XMLTYPE(
		dbms_xmlgen.getxml('SELECT '|| LISTAGG(perf_rate,'*') 
                                      WITHIN GROUP(ORDER BY perf_month)||' C 
				    FROM dual')),'/ROWSET/ROW/C')),2) AS "cumulative performance" 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
GROUP BY fund_id 
ORDER BY 1

Solution #3: Using Dynamic XML with XMLQUERY

(Credit to: Boobal Ganesan)

SELECT fund_id, 
       ROUND(TO_NUMBER(XMLQUERY((LISTAGG(perf_rate,'*') 
	             WITHIN GROUP(ORDER BY fund_id)) RETURNING CONTENT)),2) "cumulative performance" 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
GROUP BY fund_id

Solution #4: Using Model Clause with 2 measures

(Credit to: Katayama Naoto)

WITH x AS ( 
SELECT fund_id, cump, flag 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
MODEL 
PARTITION BY (fund_id) 
DIMENSION BY (ROW_NUMBER()OVER(PARTITION BY fund_id ORDER BY perf_month) AS N) 
MEASURES(perf_rate, 
         CAST(0 AS NUMBER) AS cump, 
         CAST(0 AS NUMBER) AS flag) 
RULES( 
      cump[ANY] ORDER BY N = perf_rate[CV(N)] * NVL(cump[CV(N)-1],1),
      flag[ANY] ORDER BY N = NVL2(perf_rate[CV(N)+1],0,1)
     )
)
SELECT fund_id, ROUND(cump,2) "Cumulative Performance" 
FROM x
WHERE flag=1 
ORDER BY fund_id

Solution #5: Using Model clause with 1 measure

WITH d AS (
SELECT fund_id, perf_month, perf_rate, 
       RANK()OVER(PARTITION BY fund_id ORDER BY perf_month DESC) rk
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
), x AS ( 
SELECT * 
FROM d
MODEL 
  PARTITION BY (fund_id) 
  DIMENSION BY (ROW_NUMBER()OVER(PARTITION BY fund_id ORDER BY perf_month) AS N) 
  MEASURES     (perf_rate, rk, CAST(0 AS NUMBER) AS cump) 
  RULES        (cump[ANY] ORDER BY N = perf_rate[CV(N)] * NVL(cump[CV(N)-1],1) )
)
SELECT fund_id, ROUND(cump,2) "Cumulative Performance" 
FROM x
WHERE rk=1 
ORDER BY fund_id

Solution #6: Using Recursive CTE

WITH d AS (
SELECT fund_id, perf_rate, 
       ROW_NUMBER()OVER(PARTITION BY fund_id ORDER BY perf_month) rn,
       COUNT(*)OVER(PARTITION BY fund_id) cnt
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
), x(fund_id, cum_perf, rn, cnt) AS (
SELECT fund_id, perf_rate, 1, cnt
FROM d
WHERE rn=1
UNION ALL
SELECT x.fund_id, x.cum_perf*d.perf_rate, d.rn, d.cnt
FROM x JOIN d ON x.fund_id=d.fund_id
             AND x.rn+1=d.rn
)
SELECT fund_id, ROUND(cum_perf,2) "Cumulative Performance"
FROM x
WHERE rn=cnt
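
For readers who want to trace the recursion by hand, here is a small Python sketch of the same step-by-step multiplication. It is an illustration only: running_product is a hypothetical helper, and the rates are those of fund 2 from the setup script. Each element corresponds to cum_perf at a given rn, and the last one is the row picked by rn=cnt.

```python
from itertools import accumulate
from operator import mul

# Rates of fund 2 ordered by perf_month (2016-01 .. 2016-06).
fund_2_rates = [1.04, 1.03, 0.98, 1.04, 1.01, 0.98]


def running_product(rates):
    """Return the cumulative product after each month, like cum_perf per rn."""
    return list(accumulate(rates, mul))


steps = running_product(fund_2_rates)
for rn, cum_perf in enumerate(steps, start=1):
    print(rn, round(cum_perf, 4))

# The final step plays the role of the row where rn = cnt.
final = round(steps[-1], 2)
```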

Solution #7: Using 12c new Function based WITH clause

(Credit to: Katayama Naoto)

WITH
FUNCTION product(list IN sys.odcinumberlist) RETURN NUMBER IS
   v_result NUMBER DEFAULT 1;
BEGIN
  FOR i IN list.FIRST .. list.LAST LOOP
      v_result := v_result * list(i);
  END LOOP;
  RETURN v_result;
END;
SELECT fund_id, ROUND(product(CAST(COLLECT(perf_rate) AS sys.odcinumberlist)),2) AS "Cumulative Performance"
FROM fund_performance
WHERE perf_month BETWEEN '2016-01' AND '2016-06'
GROUP BY fund_id
ORDER BY 1

You can execute the first 6 of the above SQL statements in the Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques, check my book “Oracle SQL Tricks and Workarounds”.

How to generate a list of first N binary numbers in Oracle SQL?

June 4, 2017 (updated June 30, 2017) by Zahar Hilkevich

In my recent post I showed how to convert a decimal number (i.e. an integer) into a binary string. We can build upon that technique to answer the question:

WITH x AS (
SELECT LEVEL n
FROM dual
CONNECT BY LEVEL<=50
)
SELECT x.N, y.bin
FROM x, LATERAL (SELECT LISTAGG(SIGN(BITAND(x.N, POWER(2,LEVEL-1))),'') 
                        WITHIN GROUP(ORDER BY LEVEL DESC) bin
                 FROM dual
                 CONNECT BY POWER(2, LEVEL-1)<=x.N) y

Note the LATERAL keyword (a new feature in Oracle 12c) that enables us to reference “x” in the inline view “y”. In the pre-12c world, we would have to use a composition of the TABLE, CAST and MULTISET functions to achieve the same result:

WITH x AS (
SELECT LEVEL n
FROM dual
CONNECT BY LEVEL<=50
)
SELECT x.N, y.column_value bin
FROM x, TABLE(CAST(MULTISET(
          SELECT LISTAGG(SIGN(BITAND(x.N, POWER(2,LEVEL-1))),'') 
                 WITHIN GROUP(ORDER BY LEVEL DESC) bin
          FROM dual
          CONNECT BY POWER(2, LEVEL-1)<=x.N) AS sys.odcivarchar2list)) y

The idea used in the following query is based on a totally different approach. It builds a string of “0”s and “1”s in a loop until its length reaches a desired value:

WITH x(v, n) AS (
SELECT column_value, 1
FROM TABLE(sys.odcivarchar2list('0','1'))
UNION ALL
SELECT x.v || t.column_value, x.n+1
FROM TABLE(sys.odcivarchar2list('0','1')) t JOIN x on LENGTH(x.v)=n
WHERE n<=CEIL(LOG(2,50))
), y AS (
SELECT NVL(LTRIM(x.v,'0'),'0') bin, ROWNUM-1 dec
FROM x
WHERE n=(SELECT MAX(n) FROM x)
)
SELECT *
FROM y
WHERE dec<=50

To better understand the above query, try the following one:

SELECT *                            
FROM TABLE(sys.odcivarchar2list('0','1')), 
     TABLE(sys.odcivarchar2list('0','1')),
     TABLE(sys.odcivarchar2list('0','1')),
     TABLE(sys.odcivarchar2list('0','1'))

If we put enough tables in the Cartesian product and concatenate all column_value columns in a single character string expression, we will achieve our goal. The challenge with this approach is to dynamically change the number of the tables in the FROM clause. This can be simulated in the recursive WITH clause by repeatedly adding more and more collections of bits (0 and 1).
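
The Cartesian product idea can be sketched in a few lines of Python as well. This is only an illustration of the principle, not a translation of the query: binary_strings is a made-up name, and itertools.product stands in for the repeated TABLE(...) collections. Unlike ROWNUM in the SQL version, the numbering here does not depend on the order in which rows happen to arrive, because product() always yields its tuples in lexicographic order.

```python
from itertools import product


def binary_strings(limit):
    """List (decimal, binary string) pairs for 0..limit via a Cartesian product."""
    # Number of bit positions needed to represent `limit`.
    width = max(limit.bit_length(), 1)
    rows = []
    # product("01", repeat=width) yields tuples in lexicographic order, which is
    # also increasing numeric order, so enumerate() supplies the decimal value.
    for dec, bits in enumerate(product("01", repeat=width)):
        if dec > limit:
            break
        # Strip leading zeros, keeping a single "0" for the value zero.
        rows.append((dec, "".join(bits).lstrip("0") or "0"))
    return rows


rows = binary_strings(50)
print(rows[:4], rows[-1])
```

The last pair printed is (50, '110010'), the binary form of 50.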

Integer to Binary Conversion in Oracle SQL

May 25, 2017 by Zahar Hilkevich

Interestingly enough, Oracle does not have a built-in function to convert Decimal numbers (i.e. integers) into Binary. This post offers an elegant way of doing so.

The following script is intended to be executed in SQL*Plus, so it uses some SQL*Plus commands:

column bin format a40
undefine N
SELECT LISTAGG(SIGN(BITAND(&&N, POWER(2,LEVEL-1))),'') 
       WITHIN GROUP(ORDER BY LEVEL DESC) bin
FROM dual
CONNECT BY POWER(2, LEVEL-1)<=&&N;

Result (for N=400):

BIN
-------------
110010000

Result (for N=1401):

BIN
------------
10101111001

Explanation:

How many digits may the resulting binary string have? The answer comes from Math: not more than LOG(2, N) + 1. Let’s first generate a numeric range from 1 to LOG(2,N)+1:

SELECT LEVEL
FROM dual
CONNECT BY LEVEL<=LOG(2,&N)+1

Result (for N=1401):

LEVEL
-----
    1
    2
    3
    4
    5
    6
    7
    8
    9
   10
   11

Alternatively, we can use a mathematically equivalent condition in the CONNECT BY clause, based on the POWER function instead of LOG. Either of the following two queries will do:

SELECT LEVEL
FROM dual
CONNECT BY POWER(2,LEVEL)<=&N*2

SELECT LEVEL
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&N
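
A quick way to convince yourself that the three CONNECT BY conditions are interchangeable is to count how many LEVEL values each one lets through. The following Python sketch does that. It is only an illustration: level_counts is a name invented for this example, and Python's floating-point log2 may behave slightly differently from Oracle's LOG on very large numbers.

```python
import math


def level_counts(n):
    """Count the LEVEL values each CONNECT BY condition would generate for n."""
    # Try a few more levels than can possibly qualify.
    candidates = range(1, n.bit_length() + 3)
    by_log = sum(1 for level in candidates if level <= math.log2(n) + 1)
    by_power_doubled = sum(1 for level in candidates if 2 ** level <= n * 2)
    by_power = sum(1 for level in candidates if 2 ** (level - 1) <= n)
    return by_log, by_power_doubled, by_power


print(level_counts(1401))
print(level_counts(12))
```

For N=1401 all three counts are 11, and for N=12 all three are 4, which is exactly the number of binary digits in each number.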

Now, we will check every bit of the desired result (i.e. binary representation of N) by using BITAND function:

SELECT LEVEL, BITAND(&&N, POWER(2,LEVEL-1)) bit
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&&N

Result (for N=12):

LEVEL        BIT
----- ----------
    1          0
    2          0
    3          4
    4          8

A positive value in the bit column means that the bit in the corresponding position of the binary value (counting from the right) is 1. It’s easy to turn those values into 1 by using the SIGN function:

SELECT LEVEL, SIGN(BITAND(&&N, POWER(2,LEVEL-1))) bit
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&&N

Result (for N=12):

LEVEL        BIT
----- ----------
    1          0
    2          0
    3          1
    4          1

Here, we can see that we need to concatenate the values in the bit column in reverse order. This is very easy to do using LISTAGG function:

SELECT LISTAGG(SIGN(BITAND(&&N, POWER(2,LEVEL-1))),'') 
       WITHIN GROUP(ORDER BY LEVEL DESC) bin
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&&N

Result (for N=12):

BIN
----------
1100

Note that we sorted all the rows in descending order of the LEVEL to obtain the correct order of bits.
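
The steps of this walk-through can be replayed outside the database. Below is a minimal Python sketch that mirrors them one by one. It is an illustration rather than a port of the query: to_binary is a hypothetical name, and the function assumes N is a positive integer.

```python
def to_binary(n):
    """Convert a positive integer to a binary string, mirroring the SQL steps."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    bits = []
    level = 1
    # CONNECT BY POWER(2, LEVEL-1) <= N
    while 2 ** (level - 1) <= n:
        # BITAND(N, POWER(2, LEVEL-1)) yields 0 or the power of two itself.
        masked = n & 2 ** (level - 1)
        # SIGN(...) turns any positive value (4, 8, ...) into 1.
        bits.append("1" if masked > 0 else "0")
        level += 1
    # LISTAGG ... WITHIN GROUP (ORDER BY LEVEL DESC)
    return "".join(reversed(bits))


for n in (12, 400, 1401):
    print(n, to_binary(n))
```

The three calls print 1100, 110010000 and 10101111001, the same strings as in the results above.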

Solutions to Puzzle of the Week #13

June 6, 2016 by Zahar Hilkevich

Puzzle of the Week #13:

Table Setup and Puzzle description can be located here

Expected Result:

  ID FULL_NAME                             GROUP_ID
---- ----------------------------------- ----------
   8 Oscar Pedro Fernando Rodriguez               1
   9 Rodriguez, Oscar Pedro Fernando              1
  10 Oscar Fernando Rodriguez Pedro               1
   1 John Smith                                   2
   2 John L. Smith                                2
   4 Smith, John                                  2
   5 Tom Khan                                     3
  11 KHAN, TOM S.                                 3

Solutions:

#1. Using CTE (Recursive WITH) and LISTAGG

WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y(id, token, lvl) AS (
SELECT name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, 1), 1 
FROM x
UNION ALL
SELECT x.name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1), y.lvl+1
FROM x JOIN y ON x.name_id=y.id AND REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1) IS NOT NULL
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name, 
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

 ID FULL_NAME                                  GROUP_ID
--- ---------------------------------------- ----------
  8 Oscar Pedro Fernando Rodrigues                    1
  9 Rodrigues, Oscar Pedro Fernando                   1
 10 Oscar Fernando Rodrigues Pedro                    1
  1 John Smith                                        2
  2 John L. Smith                                     2
  4 Smith, John                                       2
  5 Tom Khan                                          3
 11 KHAN, TOM S.                                      3

Explanation:

The key idea is to split each name into multiple name tokens, then sort the tokens and merge them back into a single line. Matching (duplicate) names produce the same merged line, so we can use it to identify duplicates. Single-character tokens (such as middle initials) are ignored, which is why “John L. Smith” matches “John Smith”. The DENSE_RANK analytic function is used to generate sequential group id values.
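
The split, sort and merge idea can be sketched in Python as follows. This is an illustration under a few assumptions: the names are taken from the query output above, the row with id 99 is a hypothetical non-duplicate added only to show the filter at work, and name_key is a name invented for this example.

```python
import string
from collections import defaultdict

names = {
    1: "John Smith",
    2: "John L. Smith",
    4: "Smith, John",
    5: "Tom Khan",
    8: "Oscar Pedro Fernando Rodrigues",
    9: "Rodrigues, Oscar Pedro Fernando",
    10: "Oscar Fernando Rodrigues Pedro",
    11: "KHAN, TOM S.",
    99: "Jane Doe",  # hypothetical non-duplicate row, not from the source table
}


def name_key(full_name):
    """Upper-case, drop punctuation and 1-letter tokens, then sort the tokens."""
    cleaned = full_name.upper().translate(str.maketrans("", "", string.punctuation))
    tokens = [token for token in cleaned.split() if len(token) > 1]
    return " ".join(sorted(tokens))


by_key = defaultdict(list)
for name_id, full_name in names.items():
    by_key[name_key(full_name)].append(name_id)

# Keep keys shared by more than one id and number them in key order (DENSE_RANK).
duplicates = sorted(key for key, ids in by_key.items() if len(ids) > 1)
groups = {key: rank for rank, key in enumerate(duplicates, start=1)}
for key in duplicates:
    print(groups[key], key, sorted(by_key[key]))
```

The three printed groups contain ids 8, 9, 10, then 1, 2, 4, then 5, 11, in line with the GROUP_ID column of the query output.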

The same idea is used in the solution below. The only difference is the way to split the names into tokens.

#2. Using CONNECT BY and TABLE/CAST/MULTISET functions

WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y AS (
SELECT name_id AS id, y.column_value AS token
FROM x,
     TABLE(CAST(MULTISET(SELECT REGEXP_SUBSTR(x.full_name, '[^ ]+', 1, LEVEL) token
                    FROM dual
                    CONNECT BY LEVEL <= LENGTH(full_name)-LENGTH(REPLACE(full_name,' '))+1
                        )
                AS sys.odcivarchar2list)
          ) y
WHERE LENGTH(y.column_value)>1
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name,
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

  ID FULL_NAME                                  GROUP_ID
---- ---------------------------------------- ----------
   8 Oscar Pedro Fernando Rodrigues                    1
   9 Rodrigues, Oscar Pedro Fernando                   1
  10 Oscar Fernando Rodrigues Pedro                    1
   1 John Smith                                        2
   2 John L. Smith                                     2
   4 Smith, John                                       2
   5 Tom Khan                                          3
  11 KHAN, TOM S.                                      3