Explore the power of analytic functions – Part 2

In the previous blog post on that subject we reviewed a couple a notable applications of Oracle analytic functions. Today, I came across another interesting illustration of the same concept.

Problem: List all employees from the same department and holding the same job title as ADAMS.

Expected Result:

ENAME JOB DEPTNO
ADAMS CLERK 20
SMITH CLERK 20

Like before, we start with traditional approaches that every experienced developer would easily demonstrate.

Strategy #1:  Using multi-column subquery

SELECT ename, job, deptno
FROM scott.emp
WHERE (deptno, job) IN (SELECT deptno, job
                        FROM scott.emp
                        WHERE ename = 'ADAMS')
ORDER BY 2, 3, 1

Strategy #2:  Using self-join

SELECT a.ename, job, deptno
FROM scott.emp a JOIN scott.emp b USING(deptno, job)
WHERE b.ename = 'ADAMS'
ORDER BY job, deptno, a.ename

Strategy #3:  Using EXISTS predicate

SELECT ename, job, deptno
FROM scott.emp a
WHERE EXISTS (SELECT 1
              FROM scott.emp
              WHERE ename  = 'ADAMS'
                AND deptno = a.deptno
                AND job    = a.job)
ORDER BY 2, 3, 1

A common feature of all the strategies above is having two copies of the emp table with two joining conditions (deptno, job) and one filter (ename = ‘ADAMS’)

As we have seen before, with analytic functions, we can get away with a single copy of th emp table.

Strategy #4:  Using COUNT Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       COUNT(DECODE(ename, 'ADAMS', 1)) 
             OVER(PARTITION BY deptno, job) cnt
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE cnt > 0
ORDER BY 2, 3, 1

Of course, you can use different analytic functions here:

Strategy #5:  Using MAX Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       MAX(DECODE(ename, 'ADAMS', ename)) 
           OVER(PARTITION BY deptno, job) adams
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE adams = 'ADAMS'
ORDER BY 2, 3, 1

Strategy #6:  Using LISTAGG Analytic function

WITH x AS (
SELECT ename, job, deptno, 
       LISTAGG(DECODE(ename, 'ADAMS', 'Y'), '|') WITHIN GROUP (ORDER BY 1) 
               OVER(PARTITION BY deptno, job) flag
FROM scott.emp
)
SELECT ename, job, deptno
FROM x
WHERE flag LIKE 'Y%'
ORDER BY 2, 3, 1

We need to use LIKE operator in case we have more than a single Adams working in the same department and holding the same job title.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Explore the power of analytic functions

Analytic functions are still underutilized among database developers. The goal of this publication is to demonstrate hidden opportunities to improve query performance by using analytic functions where traditional approach is still dominating.

Let’s consider a few examples. We will use Oracle’s traditional educational schema scott and its famous emp and dept tables.

Problem #1: Find all employees from the department where a top paid clerk works.

A quick look at the clerk records reveals the expected department (10):

SELECT deptno, job, sal
FROM scott.emp
WHERE job = 'CLERK'
ORDER BY sal DESC

Result:

DEPTNO JOB SAL
10 CLERK 1300
20 CLERK 1100
30 CLERK 950
20 CLERK 800

Traditional approach suggests finding the department (10) first and then the task becomes trivial:

SELECT ename, deptno, job, sal
FROM scott.emp
WHERE deptno IN (<department that we found>)

OK, that works, but now we need to produce a query that will get us that department number (of several ones if they all have top paid clerks).

There are many ways of finding a top record with and without analytic functions. If we don’t use analytic functions we will end up using two copies of the emp table as in the following example:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
  AND sal = (SELECT MAX(sal) FROM scott.emp WHERE job = 'CLERK')

or this:

SELECT a.deptno
FROM scott.emp a LEFT JOIN scott.emp b ON a.sal < b.sal
AND b.job = 'CLERK'
WHERE b.deptno IS NULL 
  AND a.job = 'CLERK'

The use of analytic functions reduces the number of table copies as we can get all the necessary details in a single table scan as in the following query:

WITH x AS (
SELECT deptno, MAX(sal) max_sal, RANK() OVER(ORDER BY MAX(sal) DESC) rk
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
)
SELECT deptno
FROM x
WHERE rk = 1

Such SQL looks even shorter in databases, such as Snowflake and Terdata, that support QUALIFY clause in SELECT statement:

SELECT deptno
FROM scott.emp
WHERE job = 'CLERK'
GROUP BY deptno
QUALIFY RANK() OVER(ORDER BY MAX(sal) DESC) = 1

So a complete traditional approach will use 3 copies of the emp table as in the following query:

SELECT ename, deptno, job, sal
FROM scott.emp                                 -- copy #1
WHERE deptno IN (SELECT deptno
                 FROM scott.emp                -- copy #2
                 WHERE job = 'CLERK'
                   AND sal = (SELECT MAX(sal) 
                              FROM scott.emp   -- copy #3
                              WHERE job = 'CLERK')
                 )

Yes, it works, but what an overkill! Is it still possible to use a single emp table scan to solve this problem? The answer is YES:

WITH x AS (
SELECT ename, deptno, job, sal, 
       MAX(DECODE(job,'CLERK',sal)) OVER() max_sal_global,
       MAX(DECODE(job,'CLERK',sal)) OVER(PARTITION BY deptno) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

And of course, in the Snowflake/Teradata SQL we would not even have to use a CTE (common table expression) , thanks to the QUALIFY clause.

MAX analytic functions combined with DECODE (CASE would work as well) here ignore all non-Clerk rows . The OVER clause gives us either department level top salary or global top salary value for the clerks. And since it is done on a row level, which means that the these analytic functions return the same values for all employees in the same department, we can achieve our goal easily.

MAX function is not the only one analytic function that can be used here. The following example demonstrates the FIRST_VALUE function to achieve the same result:

WITH x AS (
SELECT ename, deptno, job, sal, 
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_global,
       FIRST_VALUE(DECODE(job,'CLERK',sal)) 
         OVER(PARTITION BY deptno
              ORDER BY DECODE(job,'CLERK',sal) DESC NULLS LAST) max_sal_dept
FROM scott.emp
)
SELECT ename, deptno, job, sal
FROM x
WHERE max_sal_global=max_sal_dept

As an exercise, try to use LISTAGG analytic function to solve this problem.

Problem #2: Find employees who are paid above the average salary in their respective department.

Again, we will start with a “traditional” approach:

SELECT ename, deptno, job, sal
FROM scott.emp a
WHERE sal > (SELECT AVG(sal)
             FROM scott.emp
             WHERE deptno = a.deptno)
ORDER BY deptno, sal

Result:

ENAME DEPTNO JOB SAL
KING 10 PRESIDENT 5000
JONES 20 MANAGER 2975
SCOTT 20 ANALYST 3000
FORD 20 ANALYST 3000
ALLEN 30 SALESMAN 1600
BLAKE 30 MANAGER 2850

An experienced developer would quickly see that this problem can be solved with the same approach as the problem #1 (above):

WITH x AS (
SELECT ename, deptno, job, sal, AVG(sal) OVER(PARTITION BY deptno) avg_sal
FROM scott.emp
)
SELECT *
FROM x
WHERE sal > avg_sal
ORDER BY deptno, sal

Result:

ENAME DEPTNO JOB SAL AVG_SAL
KING 10 PRESIDENT 5000 2916.66667
JONES 20 MANAGER 2975 2175
FORD 20 ANALYST 3000 2175
SCOTT 20 ANALYST 3000 2175
ALLEN 30 SALESMAN 1600 1566.66667
BLAKE 30 MANAGER 2850 1566.66667

With this strategy we can even see the department average salary value.

Problem #3: List all employees who work in the same department as the president.

The problem was discussed in one of my old blog posts and I strongly suggest you to check it out: A trick that helps avoiding multiple table scans.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

 

 

 

Monday Hiring SQL Puzzle and Lateral View Usage Nuance

Let’s solve a fairly simple SQL problem:

For each department count the number of people hired on Monday

• Use scott.emp table
• Show department number and count columns
• If no employees from a given department was hired on Monday, we should list such
department with 0 in the count column
• Sort the result by the department number

Expected Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

We will start with the in-line scalar subquery approach as it is probably one of the most intuitive:

Strategy #1: In-Line Scalar Subquery

SELECT deptno, (SELECT COUNT(*)
                FROM scott.emp
                WHERE deptno=e.deptno
                AND TO_CHAR(hiredate, 'DY')='MON') mon_hires
FROM scott.emp e
GROUP BY deptno
ORDER BY 1

When you only need a single value/column/expression from a correlated subquery, the in-line subquery in SELECT clause works just fine. If we needed more than one: count and let say total salary, we would need to use LATERAL view (or a completely different approach – see below).

Strategy #2: Lateral View

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires, m.total_sal
FROM x, LATERAL (SELECT COUNT(*) mon_hires, SUM(sal) total_sal
                 FROM scott.emp e
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

So far, all is good. I know that some database developers don’t like using table aliases too much and when an opportunity comes they use ANSI standard JOIN syntax with USING clause. Can it be applied in the lateral view?

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(empno) mon_hires
                 FROM scott.emp JOIN x USING (deptno)
                 WHERE TO_CHAR(hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The syntax is correct but the result is apparently not! What happened?

We replaced the WHERE clause condition of e.deptno=x.deptno with the JOIN x USING(deptno) – it first looked legitimate to me until I realized that we have just added an extra join instead of a reference to an external table (CTE x). Essentially, our last (incorrect) query is the same as the following:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(*) mon_hires
                 FROM scott.emp e, x
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

We simply introduced a new (and unwanted) join on the CTE x and turned the correlated reference (in the WHERE clause) into an old-style joining condition which did not even require the LATERAL view functionality:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, (SELECT COUNT(*) mon_hires
         FROM scott.emp e, x
         WHERE e.deptno=x.deptno
           AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The above examples prove that we need to be very careful when mixing up different techniques and new syntax options.

Finally, the last approach will show that only a single scan of the emp table is needed to get the result:

Strategy #3: Conditional Counting

SELECT deptno, COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

This is the best and incidentally the shortest solution that once again demonstrates the power of conditional counting (aggregation) right in SELECT clause.

Likewise we can also show the total salary of those hired on Monday:

SELECT deptno, 
       COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires,
       SUM(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', sal)) total_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

3 Solutions to 2018 Oracle SQL Puzzle of the Week #14

Yet Another Top Employee Puzzle

Find the employee who remained the top paid employee (excluding the president) the longest period of time between 1980 and 1981

  • Use a single SELECT statement only.
  • President should be excluded from the analysis.
  • Show the number of days the employee remained the top paid person as well as Start Date (hiredate) and End Date (the date when another top employee started)
  • The End Date for the last top paid employee in the interval should be 31-DEC-1981.

Expected Result:

EMPNO ENAME JOB SAL Start Date End Date Days on Top
7566 JONES MANAGER 2975 02-APR-81 03-DEC-81 245

Solutions:

Solution #1. Using RANK to filter the top employee:

WITH x AS ( 
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
WHERE job!='PRESIDENT' 
), y AS ( 
SELECT empno, ename, job, sal, hiredate start_date, max_sal,  
       LEAD(hiredate) OVER(ORDER BY hiredate) end_date 
FROM x 
WHERE sal=max_sal 
), z AS ( 
SELECT y.*, LEAST(end_date, date'1981-12-31')-start_date days_on_top, 
RANK()OVER(ORDER BY LEAST(end_date, date'1981-12-31')-start_date DESC) rk 
FROM y 
WHERE EXTRACT(YEAR FROM start_date) IN (1980, 1981)  
) 
SELECT empno,ename,job,sal, start_date "Start Date", 
       end_date "End Date", days_on_top	"Days on Top" 
FROM z 
WHERE rk=1

Solution #2. Using Subquery to filter the top employee:

WITH x AS ( 
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
WHERE job!='PRESIDENT' 
  AND hiredate>=date'1980-01-01'  
), y AS ( 
SELECT empno, ename, job, sal, hiredate start_date,  
       LEAST(date'1981-12-31', 
             LEAD(hiredate) OVER(ORDER BY hiredate)) end_date 
FROM x 
WHERE sal=max_sal 
) 
SELECT empno,ename,job,sal, start_date "Start Date", 
       end_date "End Date", end_date-start_date "Days on Top" 
FROM y 
WHERE end_date-start_date=(SELECT MAX(end_date-start_date) FROM y)

Solution #3. Using MODEL with RETURN UPDATED ROWS to filter the top employee:

WITH e AS ( 
SELECT empno, ename, sal, job, LEAST(hiredate, date'1981-12-31') hiredate,  
       MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp 
WHERE hiredate>=date'1980-01-01'  
  AND job!='PRESIDENT' 
), x AS ( 
SELECT empno, ename, job, sal, hiredate,  
       NVL(LEAD(hiredate)OVER(ORDER BY hiredate),
           date'1981-12-31')-hiredate diff, 
       NVL(LEAD(hiredate)OVER(ORDER BY hiredate),
           date'1981-12-31') end_date 
FROM e 
WHERE sal=max_sal 
) 
SELECT empno, ename, job, sal, hiredate "Start Date", 
       end_date "End Date", diff "Days on Top" 
FROM x 
MODEL RETURN UPDATED ROWS 
DIMENSION BY (empno, RANK()OVER(ORDER BY diff DESC) rk) 
MEASURES(ename,job,sal, hiredate, end_date, diff, 0 dummy) 
RULES(dummy[ANY, 1]=1)

The following query will only work as long as there is only 1 top paid employee who stayed on top the longest. In case if we had more than 1 it would only list one of those:

WITH x AS (
SELECT empno, ename, job, sal, hiredate, 
       MAX(sal)OVER(ORDER BY hiredate) max_sal
FROM scott.emp 
WHERE job!='PRESIDENT'
), y AS (
SELECT empno, ename, job, sal, hiredate start_date, 
 LEAST(date'1981-12-31', 
       LEAD(hiredate) OVER(ORDER BY hiredate)) end_date,
 LEAST(date'1981-12-31', 
       LEAD(hiredate) OVER(ORDER BY hiredate))-hiredate days_top
FROM x
WHERE sal=max_sal
ORDER BY days_top DESC NULLS LAST, hiredate
)
SELECT *
FROM y
WHERE ROWNUM=1

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Further Reading:

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

 

3 Solutions to 2018 Oracle SQL Puzzle of the Week #11

Mimic ROW_NUMBER function

Write a single SELECT statement that produces the same result as the following one:

SELECT e.*, ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY ename) rn
FROM scott.emp e
  • Use a single SELECT statement only.
  • Analytic functions are NOT allowed
  • Any SQL clauses that use PARTITION BY keywords are NOT allowed

Expected Result:

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO RN
7782 CLARK MANAGER 7839 09-JUN-81 2450 10 1
7839 KING PRESIDENT 17-NOV-81 5000 10 2
7934 MILLER CLERK 7782 23-JAN-82 1300 10 3
7876 ADAMS CLERK 7788 23-MAY-87 1100 20 1
7902 FORD ANALYST 7566 03-DEC-81 3000 20 2
7566 JONES MANAGER 7839 02-APR-81 2975 20 3
7788 SCOTT ANALYST 7566 19-APR-87 3000 20 4
7369 SMITH CLERK 7902 17-DEC-80 800 20 5
7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30 1
7698 BLAKE MANAGER 7839 01-MAY-81 2850 30 2
7900 JAMES CLERK 7698 03-DEC-81 950 30 3
7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30 4
7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30 5
7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30 6

Solutions:

Solution #1. Using MATCH_RECOGNIZE clause

Credit to: Naoto Katayama

SELECT empno,ename,job,mgr,hiredate,sal,comm,deptno,rn  
FROM scott.emp 
MATCH_RECOGNIZE ( 
ORDER BY deptno,ename,empno 
MEASURES RUNNING COUNT(*) AS rn 
ALL ROWS PER MATCH 
PATTERN (FIRSTROW NEXTROWS*) 
DEFINE 
   FIRSTROW AS PREV(FIRSTROW.deptno) IS NULL  
OR PREV(FIRSTROW.deptno) != FIRSTROW.deptno, 
   NEXTROWS AS PREV(NEXTROWS.deptno) = NEXTROWS.deptno 
)

Solution #2. Using Self-Join with Cartesian Product and GROUP BY

Partial Credit to: Boobal Ganesan

SELECT e1.empno,e1.ename,e1.job,e1.mgr,e1.hiredate,e1.sal,e1.comm,e1.deptno,  
       COUNT(*) rn 
FROM scott.emp e1 LEFT OUTER JOIN scott.emp e2  
  ON e1.deptno = e2.deptno 
 AND e2.ename || ROWIDTOCHAR(e2.ROWID) <= e1.ename || ROWIDTOCHAR(e1.ROWID) 
GROUP BY e1.empno,e1.ename,e1.job,e1.mgr,e1.hiredate,e1.sal,e1.comm,e1.deptno 
ORDER BY e1.deptno, COUNT(*)

Solution #3. Using CTE, ROWNUM, and arithmetic formula

WITH x AS ( 
SELECT * 
FROM scott.emp 
ORDER BY deptno, ename 
), y AS ( 
SELECT deptno, MIN(ROWNUM) min_rn 
FROM x 
GROUP BY deptno 
) 
SELECT x.*, ROWNUM-y.min_rn+1 AS rn 
FROM x JOIN y ON x.deptno=y.deptno 
ORDER BY x.deptno, x.ename

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

12 Solutions to 2018 Oracle SQL Puzzle of the Week #10

Top Salary Puzzle

Find highest salary in each department without using MAX function

  • Use a single SELECT statement only.
  • For an added complexity (optional): try not using ANY functions at all (neither group, nor analytic, not even scalar)

Expected Result:

DEPTNO MAX_SAL
10 5000
20 3000
30 2850

Solutions:

We will begin with a simpler problem that does allow us using functions.

Solution #1. Using MIN function

Credit to: Boobal Ganesan

MIN function can be seen as an opposite to the MAX, so it is trivial to employ it here:

SELECT deptno, -MIN(-sal) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #2. Using LISTAGG and REGEXP_SUBSTR functions

This is an “order” based approach that sorts the values within a concatenated string and then uses regular expression to cut the first token.

SELECT deptno,
       REGEXP_SUBSTR(LISTAGG(sal,',') 
                     WITHIN GROUP(ORDER BY sal DESC),'[^,]+',1,1) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #3. Using AVG(…) KEEP() group function

This is another “order” based strategy whete AVG function can be replaced with MIN or any other aggregate function that returns a single value out of a set of identical ones.

SELECT deptno, AVG(sal) KEEP(DENSE_RANK FIRST ORDER BY sal DESC) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #4. Using Analytic function and CTE

ROW_NUMBER is chosen in this approach, though other analytic functions, such as RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE, etc can be used here (with some changes) as well. ROW_NUMBER is convenient to use as it allows to avoid DISTINCT option.

WITH x AS (
SELECT deptno, sal, 
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY sal DESC) rn
FROM scott.emp
)
SELECT deptno, sal max_sal
FROM x
WHERE rn=1
ORDER BY 1;

Solution #5. Using MATCH_RECOGNIZE clause

Credit to: KATAYAMA NAOTO

This approach is similar to the previous one if we used LAG analytic function: which would return NULL for the top record.

SELECT deptno, sal max_sal 
FROM scott.emp
MATCH_RECOGNIZE (
PARTITION BY deptno
ORDER BY sal DESC
ALL ROWS PER MATCH
PATTERN (ISNULL)
DEFINE ISNULL AS PREV(ISNULL.sal) IS NULL
);

Solution #6. CONNECT BY and CONNECT_BY_ISLEAF while avoiding Analytic functions

This approach is a bit artificial. We could have used DISTINCT and avoid START WITH clause completely.  CTEs x and y are used to simulate ROW_NUMBER analytic function.

WITH x AS (
SELECT deptno, sal
FROM scott.emp
ORDER BY 1,2
), y AS (
SELECT x.*, ROWNUM rn
FROM x
)
SELECT deptno, sal
FROM y
WHERE CONNECT_BY_ISLEAF=1
CONNECT BY deptno=PRIOR deptno
       AND rn=PRIOR rn+1
START WITH (deptno, rn) IN (SELECT deptno, MIN(rn)
                            FROM y
                            GROUP BY deptno);

Solution #7. Using MODEL clause with ROW_NUMBER function

This method is pretty much the same as in the Solution #4 above. The RETURN UPDATED ROWS and dummy measures are used to only return rows with rn=1.

SELECT deptno, max_sal
FROM scott.emp
MODEL
RETURN UPDATED ROWS
PARTITION BY (deptno)
DIMENSION BY (ROW_NUMBER() OVER(PARTITION BY deptno ORDER BY sal DESC) rn)
MEASURES(sal max_sal, 0 dummy)
RULES(
 dummy[1]=1
)
ORDER BY 1;

The following 5 solutions (##8-12) satisfy the “added complexity” term and do NOT use any functions at all.

Solution #8. Using ALL predicate

Generally speaking, >=ALL filter is identical to =(SELECT MAX() …). See my book for more detailed explanations.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE sal>=ALL(SELECT sal
               FROM scott.emp
               WHERE deptno=a.deptno)
GROUP BY deptno, sal
ORDER BY 1;

Solution #9. Using NOT EXISTS predicate

See Chapter 10 of my book for details.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE NOT EXISTS(SELECT 1
                 FROM scott.emp
                 WHERE deptno=a.deptno
                   AND sal>a.sal)
GROUP BY deptno, sal
ORDER BY 1;

Solution #10. Using Outer-Join with IS NULL filter

This approach is also covered very deeply in my book, Chapter 10.

SELECT a.deptno, a.sal max_sal
FROM scott.emp a LEFT JOIN scott.emp b ON a.deptno=b.deptno
                                      AND b.sal>a.sal 
WHERE b.empno IS NULL
GROUP BY a.deptno, a.sal
ORDER BY 1;

Solution #11. Using MINUS and ANY predicate

MINUS serves 2 purposes: it removes non-top rows and eliminates duplicates, so no DISTINCT option (or GROUP BY) is required.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE sal<ANY(SELECT sal 
              FROM scott.emp
              WHERE deptno=a.deptno);

Solution #12. Using MINUS and EXISTS predicate

Last two approaches covered in the drill from the Chapter 10 of my book.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE EXISTS(SELECT 1 
             FROM scott.emp
             WHERE deptno=a.deptno
               AND sal>a.sal);

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

3 Solutions to 2018 Oracle SQL Puzzle of the Week #3

2018 Puzzle of the Week #3:

Exact Coin Change Puzzle.

Suppose that you are a sales person at a cash register and you have one purchase to serve before you close. A buyer has to pay X dollars and N cents with bills only (no coins). You have lots of bills of various nomination and limited number of coins: 3 quarters, 9 dimes, 19 nickels, and 4 pennies left in the register. You are required to give the exact change (between 1 and 99 cents) using smallest number of (available) coins.

  • Use a single SELECT statement
  • The result should return 1 row and 4 columns indicating how many coins of each type to use
  • 1 Quarter = 25 cents; 1 Dime = 10 cents; 1 Nickel = 5 cents

Sample result for a change of 63 cents:

 
  Quarters      Dimes    Nickels    Pennies
---------- ---------- ---------- ----------
         2          1          0          3

Solutions:

Solution #1: Using Math formula and MODEL clause:

For American coins one can rely on a mathematical formula to get the smallest number of coins for exact change:

Quarters: FLOOR of [Change Amount]/25
Dimes: FLOOR(([Change Amount] – 25*[Quarters])/10)
Nickels: FLOOR(([Change Amount] – 25*[Quarters]-10*[Dimes])/5)
Pennies: [Change Amount] – 25*[Quarters]-10*[Dimes] – 5*[Nickels]

One of the easiest ways to implement this strategy is to employ the MODEL clause:

WITH m AS (
SELECT 63 AS cents
FROM dual 
)
SELECT cents "Change", 
       Q "Quarters", 
       D "Dimes", 
       N "Nickels", 
       P "Pennies"
FROM m
MODEL
DIMENSION BY(0 AS dummy)
MEASURES(
 cents,
 CAST(0 AS NUMBER(3)) AS Q,
 CAST(0 AS NUMBER(3)) AS D,
 CAST(0 AS NUMBER(3)) AS N,
 CAST(0 AS NUMBER(3)) AS P
)
RULES (
 Q[0]=FLOOR(CENTS[0]/25),
 D[0]=FLOOR((CENTS[0]-Q[0]*25)/10),
 N[0]=FLOOR((CENTS[0]-Q[0]*25-D[0]*10)/5),
 P[0]=(CENTS[0]-Q[0]*25-D[0]*10-N[0]*5)
)

Result:

Change Quarters Dimes Nickels Pennies
63 2 1 0 3

If we want to extend this solution to see the change combinations for all values from 1 to 99, we will need to change the above solution as follows:

WITH m AS (
SELECT LEVEL cents
FROM dual 
CONNECT BY LEVEL<=99
)
SELECT cents "Change", 
       Q "Quarters", 
       D "Dimes", 
       N "Nickels", 
       P "Pennies"
FROM m
MODEL
PARTITION BY(ROWNUM AS rn)
DIMENSION BY(0 AS dummy)
MEASURES(
 cents,
 CAST(0 AS NUMBER(3)) AS Q,
 CAST(0 AS NUMBER(3)) AS D,
 CAST(0 AS NUMBER(3)) AS N,
 CAST(0 AS NUMBER(3)) AS P
)
RULES (
 Q[0]=FLOOR(CENTS[0]/25),
 D[0]=FLOOR((CENTS[0]-Q[0]*25)/10),
 N[0]=FLOOR((CENTS[0]-Q[0]*25-D[0]*10)/5),
 P[0]=(CENTS[0]-Q[0]*25-D[0]*10-N[0]*5)
)
ORDER BY 1

Result:

Change Quarters Dimes Nickels Pennies
1 0 0 0 1
2 0 0 0 2
3 0 0 0 3
4 0 0 0 4
5 0 0 1 0
6 0 0 1 1
7 0 0 1 2
8 0 0 1 3
9 0 0 1 4
10 0 1 0 0
95 3 2 0 0
96 3 2 0 1
97 3 2 0 2
98 3 2 0 3
99 3 2 0 4

Solution #2: Using Enhanced Math formula:

It’s easy to see that the MOD function is very handy in determining the number of coins other than quarters (the largest):

WITH a AS (
SELECT 63 cents
FROM dual
)
SELECT a.cents "Change",
       FLOOR(a.cents/25) "Quarters", 
       FLOOR(MOD(a.cents,25)/10) "Dimes",
       FLOOR(MOD(MOD(a.cents,25),10)/5) "Nickels",
       MOD(MOD(MOD(a.cents,25),10),5) "Pennies"
FROM a

Alternatively, we can see coin combinations for all change amounts from 1 to 99 cents:

WITH a AS (
SELECT LEVEL cents
FROM dual
CONNECT BY LEVEL<100
)
SELECT a.cents "Change",
       FLOOR(a.cents/25) "Quarters", 
       FLOOR(MOD(a.cents,25)/10) "Dimes",
       FLOOR(MOD(MOD(a.cents,25),10)/5) "Nickels",
       MOD(MOD(MOD(a.cents,25),10),5) "Pennies"
FROM a
ORDER BY a.cents

Solution #3: Using Cartesian Product and Top Record pattern approach:

If we did not know the exact math formula, we could still count on the brute force approach – go over all possible coin permutations (Cartesian product) that sum up to the required total amount and then chose the combination with the fewest number of coins (top record pattern):

WITH r AS (
SELECT LEVEL-1 n
FROM dual
CONNECT BY LEVEL<=20
), x AS (
SELECT q.n "Quarters", d.n "Dimes", n.n "Nickels", p.n "Pennies",
 RANK() OVER(ORDER BY q.n + d.n + n.n + p.n) rk
FROM r q, r d, r n, r p
WHERE q.n<=3
 AND d.n<=9
 AND n.n<=19 --not needed
 AND p.n<=4
 AND q.n*25 + d.n*10 + n.n*5 + p.n = 63 --amount of change
)
SELECT "Quarters", "Dimes", "Nickels", "Pennies"
FROM x
WHERE rk=1

If we want to extend this solution to see the change combinations for all values from 1 to 99, we will need to change the above solution as follows:

WITH r AS ( -- this range is to be reused 5 times in this query
SELECT LEVEL-1 n
FROM dual
CONNECT BY LEVEL<=100
), x AS (
SELECT c.n "Change", q.n "Quarters", d.n "Dimes", 
       n.n "Nickels", p.n "Pennies",
       RANK() OVER(PARTITION BY c.n ORDER BY q.n + d.n + n.n + p.n) rk
FROM r q, r d, r n, r p, r c
WHERE q.n<=3
 AND d.n<=9
 AND n.n<=19 --now it is needed
 AND p.n<=4  AND q.n*25 + d.n*10 + n.n*5 + p.n = c.n --amount of change  
 AND c.n>0
)
SELECT "Change", "Quarters", "Dimes", "Nickels", "Pennies"
FROM x
WHERE rk=1
ORDER BY 1

You can execute the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

5 Solutions to 2018 Oracle SQL Puzzle of the Week #1

2018 Puzzle of the Week #1:

For a given text string, find the first (from the beginning) longest sub-string that does not have repeating characters.

Solutions:

Solution #1: Using CONNECT BY clause (for range generation), REGEXP_COUNT, and RANK() functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
), x AS ( 
SELECT SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1) substr, 
       RANK() OVER(ORDER BY r2.rn - r1.rn DESC, r1.rn) rk 
FROM r r1, r r2, w 
WHERE r1.rn<=r2.rn 
 AND REGEXP_COUNT(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1), '(.).*\1') = 0 
) 
SELECT substr 
FROM x 
WHERE rk=1

Result of execution in Oracle Live SQL client:

SUBSTR
rkans

Solution #2: Using CONNECT BY clause (for range generation), REGEXP_LIKE, and MAX() KEEP functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
) 
SELECT MAX(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1)) 
 KEEP(DENSE_RANK FIRST ORDER BY r2.rn - r1.rn DESC, r1.rn) substr 
FROM r r1, r r2, w 
WHERE r1.rn<=r2.rn 
 AND NOT REGEXP_LIKE(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1), '(.).*\1')

Solution #3: Using CONNECT BY clause (twice), LATERAL view, REGEXP_COUNT, and RANK() functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), s AS ( 
SELECT SUBSTR(word, LEVEL) word, LEVEL rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
) 
SELECT MAX(x.substr) 
       KEEP(DENSE_RANK FIRST ORDER BY LENGTH(x.substr) DESC, s.rn) substr 
FROM s, LATERAL(SELECT SUBSTR(s.word, 1, LEVEL) substr 
                FROM dual 
                CONNECT BY LEVEL<=LENGTH(s.word)) x 
WHERE REGEXP_COUNT(x.substr, '(.).*\1') = 0

Solution #4: Using XMLTable function (for range generation), Correlated subquery with COUNT(DISTINCT), and MAX() KEEP function:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn, word
FROM w, XMLTABLE('for $i in 1 to $N cast as xs:integer return $i' 
                 PASSING LENGTH(w.word) AS N) x
) 
SELECT MAX(SUBSTR(r1.word, r1.rn, r2.rn - r1.rn + 1))
 KEEP(DENSE_RANK FIRST ORDER BY r2.rn - r1.rn DESC, r1.rn) substr 
FROM r r1, r r2
WHERE r1.rn<=r2.rn 
 AND r2.rn - r1.rn + 1 = 
 (SELECT COUNT(DISTINCT SUBSTR(SUBSTR(r1.word, r1.rn, r2.rn - r1.rn + 1), 
                               LEVEL, 1)) 
 FROM dual 
 CONNECT BY LEVEL<=r2.rn - r1.rn + 1 
 )

Solution #5: Using CONNECT BY, Recursive CTE, INSTR, SUBSTR, and MAX() KEEP functions:

WITH w AS (
 SELECT 'arkansas' word
 FROM dual
), s(sub, word, lvl, rn) AS (
SELECT SUBSTR(word, LEVEL, 1), SUBSTR(word, LEVEL) word, 1, ROWNUM
FROM w
CONNECT BY SUBSTR(word, LEVEL) IS NOT NULL
UNION ALL
SELECT SUBSTR(word, 1, lvl+1), word, lvl+1, ROWNUM
FROM s
WHERE LENGTH(SUBSTR(word, 1, lvl+1))=lvl+1
 AND INSTR(sub, SUBSTR(word, lvl+1, 1))=0
)
SELECT MAX(sub) KEEP (DENSE_RANK FIRST ORDER BY lvl DESC, rn) substr
FROM s

You can execute the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

 

 

 

 

 

 

Three Strategies for replacing multiple spaces with single ones.

Sometimes, we get to work with a text that has multiple sequential space characters that we don’t need, and hence need to remove. There are many ways to accomplish the task in pure Oracle SQL. Below, you can find 3 of them that I believe are worth mentioning.

We will be removing undesired spaces from the following character string:

Three   Strategies    for    replacing    multiple         spaces

Strategy #1: Regular Expressions

Of course, every time we need to process some text data, Regular Expressions come to mind. Here, the choice is very natural and straightforward:

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
)
SELECT REGEXP_REPLACE(text, '[ ]+',' ') text
FROM x

Result:

TEXT
----------------------------------------------
Three Strategies for replacing multiple spaces

Regular expression [ ]+    finds all occurrences of sequential spaces and regexp_replace function substitutes each of such occurrences with a single space.

Strategy #2: Triple Replace

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
)
SELECT REPLACE(REPLACE(REPLACE(text, ' ', '| '), ' |'), '|') text
FROM x

Result:

TEXT
----------------------------------------------
Three Strategies for replacing multiple spaces

To see how this trick works, let’s break it down in 3 steps:

Step 1: Replace Spaces with Pipe-Space combination:

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
)
SELECT REPLACE(text, ' ', '| ') text
FROM x

Result:

Three| | | Strategies| | | | for| | | | replacing| | | | multiple| | | | | | | | | spaces

Step 2: Remove all Space-Pipe combinations

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
)
SELECT REPLACE(REPLACE(text, ' ', '| '), ' |') text
FROM x

Result:

TEXT
---------------------------------------------------
Three| Strategies| for| replacing| multiple| spaces

Note, that we first replaced each space with PIPE-SPACE sequence and then removed the opposite order sequence SPACE-PIPE, which left us with non-repeated occurrences of PIPE-Space combinations.

Step 3: Remove PIPE characters

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
)
SELECT REPLACE(REPLACE(REPLACE(text, ' ', '| '), ' |'), '|') text
FROM x

Result:

TEXT
----------------------------------------------
Three Strategies for replacing multiple spaces

Strategy #3: Recursive WITH Clause

WITH x AS ( 
SELECT 'Three Strategies for replacing multiple spaces' AS text 
FROM dual 
), y(text) AS (
SELECT REPLACE(x.text, '  ', ' ')
FROM x
UNION ALL
SELECT REPLACE(y.text, '  ', ' ')
FROM y
WHERE INSTR(y.text, '  ')>0
)
SELECT text
FROM y
WHERE INSTR(y.text, '  ')=0

Result:

TEXT
----------------------------------------------
Three Strategies for replacing multiple spaces

The idea is to replace two-space combination with a single space until no more two-space sequence will be left in the text.

Final thoughts

What will change if we need to process not a single text value but a table/collection?

Let’s examine how each of the above strategies will work:

Strategy #1: Regular Expressions

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
UNION ALL
SELECT 'One    more    string with     spaces   '
FROM dual
)
SELECT REGEXP_REPLACE(text, '[ ]+',' ') text
FROM x

Result:

TEXT
-----------------------------------------------
Three Strategies for replacing multiple spaces
One more string with spaces

Strategy #2: Triple Replace

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
UNION ALL
SELECT 'One    more    string with     spaces   '
FROM dual
)
SELECT REPLACE(REPLACE(REPLACE(text, ' ', '| '), ' |'), '|') text
FROM x

Result:

TEXT
-----------------------------------------------
Three Strategies for replacing multiple spaces
One more string with spaces

So far, so good.

Strategy #3: Recursive WITH Clause

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
UNION ALL
SELECT 'One    more    string with     spaces   '
FROM dual
), y(text) AS (
SELECT REPLACE(x.text, '  ', ' ')
FROM x
UNION ALL
SELECT REPLACE(y.text, '  ', ' ')
FROM y
WHERE INSTR(y.text, '  ')>0
)
SELECT text
FROM y
WHERE INSTR(y.text, '  ')=0

Result:

TEXT
-----------------------------------------------
One more string with spaces
Three Strategies for replacing multiple spaces

All is good except for the order – the line with the smaller number of “double” spaces came first as it was first cleaned up. To preserve the original sort order we will need to make a change:

WITH x AS (
SELECT 'Three   Strategies    for    replacing    multiple         spaces'  AS text
FROM dual
UNION ALL
SELECT 'One    more    string with     spaces   '
FROM dual
), y(text, rn) AS (
SELECT REPLACE(x.text, '  ', ' '), ROWNUM 
FROM x
UNION ALL
SELECT REPLACE(y.text, '  ', ' '), y.rn
FROM y
WHERE INSTR(y.text, '  ')>0
)
SELECT text
FROM y
WHERE INSTR(y.text, '  ')=0
ORDER BY rn

Result:

TEXT
-----------------------------------------------
Three Strategies for replacing multiple spaces
One more string with spaces

Now, it’s all good!

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

How to generate a list of first N binary numbers in Oracle SQL?

In my recent post I showed how to convert a decimal number (i.e. an integer) into a binary string. We can build upon that technique to answer the question:

WITH x AS (
SELECT LEVEL n
FROM dual
CONNECT BY LEVEL<=50
)
SELECT x.N, y.bin
FROM x, LATERAL (SELECT LISTAGG(SIGN(BITAND(x.N, POWER(2,LEVEL-1))),'') 
                        WITHIN GROUP(ORDER BY LEVEL DESC) bin
                 FROM dual
                 CONNECT BY POWER(2, LEVEL-1)<=x.N) y

Note the LATERAL keyword (Oracle 12c new feature) that enables us to reference “x” in the inline view “y”. In pre-12c world, we would have to use TABLE/CAST/MULTISET function composition to achieve the same result:

WITH x AS (
SELECT LEVEL n
FROM dual
CONNECT BY LEVEL<=50
)
SELECT x.N, y.column_value bin
FROM x, TABLE(CAST(MULTISET(
          SELECT LISTAGG(SIGN(BITAND(x.N, POWER(2,LEVEL-1))),'') 
                 WITHIN GROUP(ORDER BY LEVEL DESC) bin
          FROM dual
          CONNECT BY POWER(2, LEVEL-1)<=x.N) AS sys.odcivarchar2list)) y

The idea used in the following query is based on a totally different approach. It builds a string of “0”s and “1”s in a loop until its length reaches a desired value:

WITH x(v, n) AS (
SELECT column_value, 1
FROM TABLE(sys.odcivarchar2list('0','1'))
UNION ALL
SELECT x.v || t.column_value, x.n+1
FROM TABLE(sys.odcivarchar2list('0','1')) t JOIN x on LENGTH(x.v)=n
WHERE n<=CEIL(LOG(2,50))
), y AS (
SELECT NVL(LTRIM(x.v,'0'),'0') bin, ROWNUM-1 dec
FROM x
WHERE n=(SELECT MAX(n) FROM x)
)
SELECT *
FROM y
WHERE dec<=50

To better understand the above query, try the following one:

SELECT *                            
FROM TABLE(sys.odcivarchar2list('0','1')), 
     TABLE(sys.odcivarchar2list('0','1')),
     TABLE(sys.odcivarchar2list('0','1')),
     TABLE(sys.odcivarchar2list('0','1'))

If we put enough tables in the Cartesian product and concatenate all column_value columns in a single character string expression, we will achieve our goal. The challenge with this approach is to dynamically change the number of the tables in the FROM clause. This can be simulated in the recursive WITH clause by repeatedly adding more and more collections of bits (0 and 1).

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.