Monday Hiring SQL Puzzle and Lateral View Usage Nuance

Let’s solve a fairly simple SQL problem:

For each department count the number of people hired on Monday

• Use scott.emp table
• Show department number and count columns
• If no employees from a given department was hired on Monday, we should list such
department with 0 in the count column
• Sort the result by the department number

Expected Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

We will start with the in-line scalar subquery approach as it is probably one of the most intuitive:

Strategy #1: In-Line Scalar Subquery

SELECT deptno, (SELECT COUNT(*)
                FROM scott.emp
                WHERE deptno=e.deptno
                AND TO_CHAR(hiredate, 'DY')='MON') mon_hires
FROM scott.emp e
GROUP BY deptno
ORDER BY 1

When you only need a single value/column/expression from a correlated subquery, the in-line subquery in SELECT clause works just fine. If we needed more than one: count and let say total salary, we would need to use LATERAL view (or a completely different approach – see below).

Strategy #2: Lateral View

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires, m.total_sal
FROM x, LATERAL (SELECT COUNT(*) mon_hires, SUM(sal) total_sal
                 FROM scott.emp e
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

So far, all is good. I know that some database developers don’t like using table aliases too much and when an opportunity comes they use ANSI standard JOIN syntax with USING clause. Can it be applied in the lateral view?

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(empno) mon_hires
                 FROM scott.emp JOIN x USING (deptno)
                 WHERE TO_CHAR(hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The syntax is correct but the result is apparently not! What happened?

We replaced the WHERE clause condition of e.deptno=x.deptno with the JOIN x USING(deptno) – it first looked legitimate to me until I realized that we have just added an extra join instead of a reference to an external table (CTE x). Essentially, our last (incorrect) query is the same as the following:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(*) mon_hires
                 FROM scott.emp e, x
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

We simply introduced a new (and unwanted) join on the CTE x and turned the correlated reference (in the WHERE clause) into an old-style joining condition which did not even require the LATERAL view functionality:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, (SELECT COUNT(*) mon_hires
         FROM scott.emp e, x
         WHERE e.deptno=x.deptno
           AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The above examples prove that we need to be very careful when mixing up different techniques and new syntax options.

Finally, the last approach will show that only a single scan of the emp table is needed to get the result:

Strategy #3: Conditional Counting

SELECT deptno, COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

This is the best and incidentally the shortest solution that once again demonstrates the power of conditional counting (aggregation) right in SELECT clause.

Likewise we can also show the total salary of those hired on Monday:

SELECT deptno, 
       COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires,
       SUM(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', sal)) total_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Interview Question: For each department count the number of employees who get no commission.

Interview Question:

Write a single SELECT statement that returns  the number of employees who get no commission broken down by department. (Use scott.emp table)

Level:

Intermediate

Expected Result:

DEPTNO NO_COMM_COUNT
10 3
20 5
30 3

 Solutions

A very typical attempt to solve this problem results in the following query:

SELECT deptno, COUNT(*) no_comm_count
FROM scott.emp
WHERE comm IS NULL OR comm=0
GROUP BY deptno
ORDER BY 1

Yes, the result looks correct, but is the query correct?

The answer is NO! It would become apparent if we had a department where all employees get paid commission, so the number of those who does not would be 0.

Let’s change the requirement a bit – we will show all department and number of employees hired on Friday:

SELECT deptno, COUNT(*) fri_count
FROM scott.emp
WHERE TO_CHAR(hiredate, 'DY')='FRI'
GROUP BY deptno 
ORDER BY 1

The result of this query is clearly not what we want:

DEPTNO FRI_COUNT
30 2

We would expect the following instead:

DEPTNO FRI_COUNT
10 0
20 0
30 2

Why don’t we get the departments 10 and 20? The answer is very simple – because we filter “all” those department rows with our WHERE clause. So how should we work around?

Let’s start with more intuitive but less efficient approaches – we will use the same query as before and UNION it with another query that returns “empty” departments. Essentially, the original problem transforms into a new one – find all department where no employees were hired on Friday.

Strategy #1: Using UNION ALL with multi-column non-correlated subquery:

SELECT deptno, COUNT(*) fri_count 
FROM scott.emp 
WHERE TO_CHAR(hiredate, 'DY')='FRI' 
GROUP BY deptno
UNION ALL
SELECT deptno, 0 fri_count 
FROM scott.emp
WHERE (deptno, 'FRI') NOT IN (SELECT deptno, TO_CHAR(hiredate, 'DY')
                              FROM scott.emp)
GROUP BY deptno
ORDER BY 1
DEPTNO FRI_COUNT
10 0
20 0
30 2

Strategy #2: Using UNION ALL with ALL predicate on correlated subquery:

SELECT deptno, COUNT(*) fri_count
FROM scott.emp
WHERE TO_CHAR(hiredate, 'DY')='FRI'
GROUP BY deptno
UNION ALL 
SELECT deptno, 0 no_comm_count 
FROM scott.emp a
WHERE 'FRI'!=ALL(SELECT TO_CHAR(hiredate, 'DY')
                 FROM scott.emp b
                 WHERE a.deptno=b.deptno) 
GROUP BY deptno
ORDER BY 1

It is apparent that the ALL predicate ensures that no employees were hired on Friday.

Now we will mimic the behavior of the UNION ALL operator using LEFT JOIN:

Strategy #3: Using LEFT JOIN:

SELECT a.deptno, COUNT(DISTINCT b.empno) fri_count
FROM scott.emp a LEFT JOIN scott.emp b ON a.deptno=b.deptno
                                      AND TO_CHAR(b.hiredate, 'DY')='FRI'
GROUP BY a.deptno
ORDER BY 1

COUNT(DISTINCT …) is needed to handle a Cartesian Product as the join by deptno column produces many to many  relationship, i.e. Cartesian product.

Strategy #4: Generic substitution technique for an outer-join using UNION ALL

WITH e AS (
SELECT deptno, COUNT(*) fri_count
FROM scott.emp
WHERE TO_CHAR(hiredate, 'DY') = 'FRI'
GROUP BY deptno
UNION ALL
SELECT deptno, 0
FROM scott.emp
GROUP BY deptno
)
SELECT deptno, MAX(fri_count) fri_count
FROM e
GROUP BY deptno
ORDER BY 1

All the above techniques may look cool but they are clearly an overkill for such a simple problem. There is a simple rule worth remembering:

If you need to conditionally aggregate all records in the table but you fail doing so due to a WHERE clause filter, consider moving the filter into the GROUP function you use in SELECT.

Strategy #5: Conditional Aggregation

SELECT deptno, COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'FRI', 1)) fri_count
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Alternatively, you can use CASE function inside of COUNT. It is especially convenient for our original question/problem, i.e. to count employees who is not paid a commission:

SELECT deptno, COUNT(CASE WHEN LNNVL(comm>0) THEN 1 END) no_comm_count
FROM scott.emp
GROUP BY deptno
ORDER BY 1
DEPTNO NO_COMM_COUNT
10 3
20 5
30 3

This approach is the most efficient as it makes Oracle scanning the emp table only once.

Notice the use of the LNNVL function. You can read more about it in my recent post here.

My Oracle Group on Facebook:

If you like this post, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

7 Solutions to 2018 Oracle SQL Puzzle of the Week #13

Second Top Employee as of the Start of Employment

List all employees who were 2nd top paid in the entire company as of the time their employment started

  • Use a single SELECT statement only.
  • At the time of employment start the rank of the employee by salary should be 2.
  • Show the top salary at the time when the employee started with the company.
  • We assume that no employees have ever been terminated since day 1.

Expected Result:

ENAME JOB SAL HIREDATE MAX_SAL
WARD SALESMAN 1250 22-FEB-81 1600
BLAKE MANAGER 2850 01-MAY-81 2975
FORD ANALYST 3000 03-DEC-81 5000
SCOTT ANALYST 3000 19-APR-87 5000

Solutions:

Solution #1. Using LATERAL view, RANK and cumulative MAX analytic functions (Oracle 12g+):

SELECT e.ename, e.job, e.hiredate, e.sal, r.max_sal 
FROM scott.emp e, LATERAL(SELECT a.empno,  
                                 RANK() OVER(ORDER BY a.sal DESC) rk, 
                                 MAX(a.sal) OVER() max_sal 
                          FROM scott.emp a 
                          WHERE a.hiredate<=e.hiredate) r 
WHERE e.empno=r.empno  
  AND rk=2 
ORDER BY e.hiredate

Solution #2. Using CTE, cumulative MAX analytic function and a correlated subquery with COUNT to mimic the filter by RANK:

WITH x AS ( 
SELECT ename, job, hiredate, sal, MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp a 
) 
SELECT * 
FROM x 
WHERE 1=(SELECT COUNT(*) 
         FROM scott.emp 
         WHERE hiredate<=x.hiredate 
           AND sal>x.sal) 
ORDER BY hiredate

Solution #3. Using CTE, cumulative MAX analytic function and an in-line scalar subquery in SELECT to mimic the RANK:

WITH x AS ( 
SELECT ename, job, hiredate, sal, MAX(sal)OVER(ORDER BY hiredate) max_sal, 
       (SELECT COUNT(*)+1 
        FROM scott.emp 
        WHERE sal>e.sal  
          AND hiredate<=e.hiredate) rk 
FROM scott.emp e 
) 
SELECT ename, job, hiredate, sal, max_sal 
FROM x 
WHERE rk=2 
ORDER BY hiredate

Solution #4. Using self-join and Cartesian Product with aggregation:

SELECT a.ename, a.job, a.hiredate, a.sal, MAX(b.sal) max_sal 
FROM scott.emp a JOIN scott.emp b ON b.hiredate<=a.hiredate 
                                 AND b.sal>a.sal 
GROUP BY a.ename, a.job, a.hiredate, a.sal 
HAVING COUNT(DISTINCT b.empno)=1 
ORDER BY a.hiredate

Solution #5. Using CTE and cumulative MAX analytic function (twice):

WITH x AS ( 
SELECT ename, job, hiredate, sal, 
       MAX(sal) OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
), y  AS ( 
SELECT ename, job, hiredate, sal, max_sal, MAX(sal) OVER(ORDER BY hiredate) max_sal2 
FROM x 
WHERE sal<max_sal 
) 
SELECT ename, job, hiredate, sal, max_sal 
FROM y 
WHERE sal=max_sal2 
ORDER BY hiredate

Solution #6. Using regular and recursive CTEs, ROWNUM, GREATEST, and CASE functions (no Analytic functions!):

WITH e AS ( 
SELECT ename, job, sal, hiredate 
FROM scott.emp 
ORDER BY hiredate 
), x AS ( 
SELECT ename, job, sal, hiredate, ROWNUM rn 
FROM e 
), y(max_sal, sal2, rn) AS ( 
SELECT sal, 0, 1 
FROM x 
WHERE rn=1 
UNION ALL 
SELECT GREATEST(x.sal, y.max_sal) AS max_sal, 
       CASE WHEN x.sal>y.max_sal THEN y.max_sal 
            WHEN x.sal>y.sal2 AND x.sal<=y.max_sal THEN x.sal  
            ELSE y.sal2  
       END AS sal2, 
       x.rn 
FROM x JOIN y ON x.rn=y.rn+1 
) 
SELECT x.ename, x.job, x.sal, x.hiredate, y.max_sal 
FROM y JOIN x ON y.rn=x.rn AND y.sal2=x.sal

Solution #7. Using CTE and MODEL clause to mimic Solution #6:

WITH x AS ( 
SELECT * 
FROM scott.emp 
MODEL 
DIMENSION BY (ROW_NUMBER() OVER(ORDER BY hiredate) rn) 
MEASURES(ename, job, sal, hiredate, sal max_sal, 0 sal2) 
RULES( 
    max_sal[rn>1] = GREATEST(max_sal[CV()-1], sal[CV()]), 
    sal2[rn>1] = CASE WHEN sal[CV()]> max_sal[CV()-1] THEN max_sal[CV()-1] 
                      WHEN sal[CV()]> sal2[CV()-1]   
		       AND sal[CV()]<=max_sal[CV()-1] THEN sal[CV()]  
                      ELSE sal2[CV()-1] 
                 END 
     ) 
) 
SELECT ename, job, sal, hiredate, max_sal 
FROM x 
WHERE sal=sal2

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Further Reading:

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

12 Solutions to 2018 Oracle SQL Puzzle of the Week #10

Top Salary Puzzle

Find highest salary in each department without using MAX function

  • Use a single SELECT statement only.
  • For an added complexity (optional): try not using ANY functions at all (neither group, nor analytic, not even scalar)

Expected Result:

DEPTNO MAX_SAL
10 5000
20 3000
30 2850

Solutions:

We will begin with a simpler problem that does allow us using functions.

Solution #1. Using MIN function

Credit to: Boobal Ganesan

MIN function can be seen as an opposite to the MAX, so it is trivial to employ it here:

SELECT deptno, -MIN(-sal) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #2. Using LISTAGG and REGEXP_SUBSTR functions

This is an “order” based approach that sorts the values within a concatenated string and then uses regular expression to cut the first token.

SELECT deptno,
       REGEXP_SUBSTR(LISTAGG(sal,',') 
                     WITHIN GROUP(ORDER BY sal DESC),'[^,]+',1,1) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #3. Using AVG(…) KEEP() group function

This is another “order” based strategy whete AVG function can be replaced with MIN or any other aggregate function that returns a single value out of a set of identical ones.

SELECT deptno, AVG(sal) KEEP(DENSE_RANK FIRST ORDER BY sal DESC) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #4. Using Analytic function and CTE

ROW_NUMBER is chosen in this approach, though other analytic functions, such as RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE, etc can be used here (with some changes) as well. ROW_NUMBER is convenient to use as it allows to avoid DISTINCT option.

WITH x AS (
SELECT deptno, sal, 
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY sal DESC) rn
FROM scott.emp
)
SELECT deptno, sal max_sal
FROM x
WHERE rn=1
ORDER BY 1;

Solution #5. Using MATCH_RECOGNIZE clause

Credit to: KATAYAMA NAOTO

This approach is similar to the previous one if we used LAG analytic function: which would return NULL for the top record.

SELECT deptno, sal max_sal 
FROM scott.emp
MATCH_RECOGNIZE (
PARTITION BY deptno
ORDER BY sal DESC
ALL ROWS PER MATCH
PATTERN (ISNULL)
DEFINE ISNULL AS PREV(ISNULL.sal) IS NULL
);

Solution #6. CONNECT BY and CONNECT_BY_ISLEAF while avoiding Analytic functions

This approach is a bit artificial. We could have used DISTINCT and avoid START WITH clause completely.  CTEs x and y are used to simulate ROW_NUMBER analytic function.

WITH x AS (
SELECT deptno, sal
FROM scott.emp
ORDER BY 1,2
), y AS (
SELECT x.*, ROWNUM rn
FROM x
)
SELECT deptno, sal
FROM y
WHERE CONNECT_BY_ISLEAF=1
CONNECT BY deptno=PRIOR deptno
       AND rn=PRIOR rn+1
START WITH (deptno, rn) IN (SELECT deptno, MIN(rn)
                            FROM y
                            GROUP BY deptno);

Solution #7. Using MODEL clause with ROW_NUMBER function

This method is pretty much the same as in the Solution #4 above. The RETURN UPDATED ROWS and dummy measures are used to only return rows with rn=1.

SELECT deptno, max_sal
FROM scott.emp
MODEL
RETURN UPDATED ROWS
PARTITION BY (deptno)
DIMENSION BY (ROW_NUMBER() OVER(PARTITION BY deptno ORDER BY sal DESC) rn)
MEASURES(sal max_sal, 0 dummy)
RULES(
 dummy[1]=1
)
ORDER BY 1;

The following 5 solutions (##8-12) satisfy the “added complexity” term and do NOT use any functions at all.

Solution #8. Using ALL predicate

Generally speaking, >=ALL filter is identical to =(SELECT MAX() …). See my book for more detailed explanations.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE sal>=ALL(SELECT sal
               FROM scott.emp
               WHERE deptno=a.deptno)
GROUP BY deptno, sal
ORDER BY 1;

Solution #9. Using NOT EXISTS predicate

See Chapter 10 of my book for details.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE NOT EXISTS(SELECT 1
                 FROM scott.emp
                 WHERE deptno=a.deptno
                   AND sal>a.sal)
GROUP BY deptno, sal
ORDER BY 1;

Solution #10. Using Outer-Join with IS NULL filter

This approach is also covered very deeply in my book, Chapter 10.

SELECT a.deptno, a.sal max_sal
FROM scott.emp a LEFT JOIN scott.emp b ON a.deptno=b.deptno
                                      AND b.sal>a.sal 
WHERE b.empno IS NULL
GROUP BY a.deptno, a.sal
ORDER BY 1;

Solution #11. Using MINUS and ANY predicate

MINUS serves 2 purposes: it removes non-top rows and eliminates duplicates, so no DISTINCT option (or GROUP BY) is required.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE sal<ANY(SELECT sal 
              FROM scott.emp
              WHERE deptno=a.deptno);

Solution #12. Using MINUS and EXISTS predicate

Last two approaches covered in the drill from the Chapter 10 of my book.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE EXISTS(SELECT 1 
             FROM scott.emp
             WHERE deptno=a.deptno
               AND sal>a.sal);

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Interview Question: Get top and bottom paid employees in each department

This is a typical interview problem: list all bottom and top paid employees in each department. A preferred solution should not be using UNION or UNION ALL operators.

Please watch this short video to learn a couple of non-obvious techniques and to impress your potential employers on your next job interview.

My Oracle Group on Facebook:

If you like this post, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

 

5 Solutions to 2018 Oracle SQL Puzzle of the Week #1

2018 Puzzle of the Week #1:

For a given text string, find the first (from the beginning) longest sub-string that does not have repeating characters.

Solutions:

Solution #1: Using CONNECT BY clause (for range generation), REGEXP_COUNT, and RANK() functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
), x AS ( 
SELECT SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1) substr, 
       RANK() OVER(ORDER BY r2.rn - r1.rn DESC, r1.rn) rk 
FROM r r1, r r2, w 
WHERE r1.rn<=r2.rn 
 AND REGEXP_COUNT(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1), '(.).*\1') = 0 
) 
SELECT substr 
FROM x 
WHERE rk=1

Result of execution in Oracle Live SQL client:

SUBSTR
rkans

Solution #2: Using CONNECT BY clause (for range generation), REGEXP_LIKE, and MAX() KEEP functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
) 
SELECT MAX(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1)) 
 KEEP(DENSE_RANK FIRST ORDER BY r2.rn - r1.rn DESC, r1.rn) substr 
FROM r r1, r r2, w 
WHERE r1.rn<=r2.rn 
 AND NOT REGEXP_LIKE(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1), '(.).*\1')

Solution #3: Using CONNECT BY clause (twice), LATERAL view, REGEXP_COUNT, and RANK() functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), s AS ( 
SELECT SUBSTR(word, LEVEL) word, LEVEL rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
) 
SELECT MAX(x.substr) 
       KEEP(DENSE_RANK FIRST ORDER BY LENGTH(x.substr) DESC, s.rn) substr 
FROM s, LATERAL(SELECT SUBSTR(s.word, 1, LEVEL) substr 
                FROM dual 
                CONNECT BY LEVEL<=LENGTH(s.word)) x 
WHERE REGEXP_COUNT(x.substr, '(.).*\1') = 0

Solution #4: Using XMLTable function (for range generation), Correlated subquery with COUNT(DISTINCT), and MAX() KEEP function:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn, word
FROM w, XMLTABLE('for $i in 1 to $N cast as xs:integer return $i' 
                 PASSING LENGTH(w.word) AS N) x
) 
SELECT MAX(SUBSTR(r1.word, r1.rn, r2.rn - r1.rn + 1))
 KEEP(DENSE_RANK FIRST ORDER BY r2.rn - r1.rn DESC, r1.rn) substr 
FROM r r1, r r2
WHERE r1.rn<=r2.rn 
 AND r2.rn - r1.rn + 1 = 
 (SELECT COUNT(DISTINCT SUBSTR(SUBSTR(r1.word, r1.rn, r2.rn - r1.rn + 1), 
                               LEVEL, 1)) 
 FROM dual 
 CONNECT BY LEVEL<=r2.rn - r1.rn + 1 
 )

Solution #5: Using CONNECT BY, Recursive CTE, INSTR, SUBSTR, and MAX() KEEP functions:

WITH w AS (
 SELECT 'arkansas' word
 FROM dual
), s(sub, word, lvl, rn) AS (
SELECT SUBSTR(word, LEVEL, 1), SUBSTR(word, LEVEL) word, 1, ROWNUM
FROM w
CONNECT BY SUBSTR(word, LEVEL) IS NOT NULL
UNION ALL
SELECT SUBSTR(word, 1, lvl+1), word, lvl+1, ROWNUM
FROM s
WHERE LENGTH(SUBSTR(word, 1, lvl+1))=lvl+1
 AND INSTR(sub, SUBSTR(word, lvl+1, 1))=0
)
SELECT MAX(sub) KEEP (DENSE_RANK FIRST ORDER BY lvl DESC, rn) substr
FROM s

You can execute the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

 

 

 

 

 

 

Interview Question: Get the 2nd highest salary in the company

Question: Get the 2nd highest salary in the company.

Question level: Intermediate

We picked just 6 workarounds for this fairly simple problem. Overall, there are at least 15 different approaches available.

Method/Workaround #1 (Level: Beginner)

SELECT MAX(SAL)
FROM (SELECT SAL
      FROM EMP
      WHERE SAL<(SELECT MAX(SAL) 
                 FROM EMP)
     )

Method/Workaround #2 (Level: Intermediate)

SELECT sal
FROM (SELECT sal, ROWNUM rn
      FROM (SELECT SAL
            FROM EMP
            GROUP BY SAL
            ORDER BY 1 DESC)
      WHERE ROWNUM<=2
      )
WHERE RN=2 

Method/Workaround #3 (Level: Intermediate)

SELECT DISTINCT sal
FROM (SELECT SAL, DENSE_RANK()OVER(ORDER BY SAL DESC) RK
      FROM EMP)
WHERE RK=2      

Method/Workaround #4 (Level: Intermediate)

SELECT SAL
FROM (SELECT DISTINCT SAL, DENSE_RANK()OVER(ORDER BY SAL DESC) RK
      FROM EMP)
WHERE RK=2     

Method/Workaround #5 (Level: Advanced)

SELECT A.SAL
FROM EMP A JOIN EMP B ON A.SAL<=b.SAL
GROUP BY a.sal
HAVING COUNT(DISTINCT b.sal)=2

Method/Workaround #6 (Level: Intermediate)

SELECT DISTINCT SAL
FROM emp a
WHERE 2=(SELECT COUNT(DISTINCT sal)
         FROM emp b
         WHERE b.sal>=a.sal) 

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Interview Question: How to Retrieve Unique Values without Using DISTINCT keyword

Question: List unique jobs (from emp table) without using DISTINCT keyword

Question Level: Beginner+

We picked 5 different methods (workarounds) to show in this post. It is possible to produce at least 15 if you read some other posts in this blog.

Method/Workaround #1: Use GROUP BY (Level: Beginner)

SELECT job
FROM emp
GROUP BY job

Method/Workaround #2: Use Analytical functions with a subquery (Level: Intermediate)

SELECT job
FROM(SELECT job, ROW_NUMBER()OVER(PARTITION BY job ORDER BY 1) rn
     FROM emp) 
WHERE rn=1

Method/Workaround #3: Use correlated subquery (Level: Intermediate)

SELECT job
FROM emp a
WHERE empno=(SELECT MAX(empno) 
             FROM emp
             WHERE job=a.job)

Method/Workaround #4: Use left join with IS NULL filter (Level: Advanced)

SELECT a.job
FROM emp a LEFT JOIN emp b ON a.job=b.job AND a.empno<b.empno
WHERE b.empno IS NULL

Method/Workaround #5: Use NOT EXIST (Level: Intermediate)

SELECT job
FROM emp a
WHERE NOT EXISTS(SELECT 1
                 FROM emp b
                 WHERE a.job=b.job 
                   AND b.empno>a.empno)

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

15 Workarounds for Getting Top Records

To illustrate the concept we will be solving the following problem defined for scott schema:

Find all top paid employees in each department. Display employee names, salaries, jobs, and department.

To qualify for a workaround, a solution’s execution plan should have a distinct hash value (More on that can be found in my book “Oracle SQL Tricks and Workarounds”).

Workaround #1: Correlated subquery

SELECT ename, job, sal, deptno
FROM emp a
WHERE sal=(SELECT MAX(sal)
           FROM emp b
           WHERE b.deptno=a.deptno);

Result:

ENAME      JOB              SAL     DEPTNO
---------- --------- ---------- ----------
BLAKE      MANAGER         2850         30
SCOTT      ANALYST         3000         20
KING       PRESIDENT       5000         10
FORD       ANALYST         3000         20

Execution Plan:

Plan hash value: 1245077725

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |     1 |    47 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN           |         |     1 |    47 |     8  (25)| 00:00:01 |
|   2 |   VIEW               | VW_SQ_1 |     3 |    78 |     4  (25)| 00:00:01 |
|   3 |    HASH GROUP BY     |         |     3 |    21 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP     |    14 |    98 |     3   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | EMP     |    14 |   294 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Workaround #2: Correlated subquery with arithmetic transformation

SELECT ename, job, sal, deptno
FROM emp a
WHERE 0=(SELECT MAX(b.sal)-a.sal
         FROM emp b
         WHERE b.deptno=a.deptno)

Execution Plan:

Plan hash value: 2649664444

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    21 |    24   (0)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |      |     1 |     7 |            |          |
|*  4 |    TABLE ACCESS FULL| EMP  |     5 |    35 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Workaround #3: Non-Correlated subquery

SELECT ename, job, sal, deptno
FROM emp a
WHERE (deptno, sal) IN (SELECT deptno, MAX(sal)
                        FROM emp
                        GROUP BY deptno)

Execution Plan:

Plan hash value: 2491199121

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     1 |    47 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN SEMI      |          |     1 |    47 |     8  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP      |    14 |   294 |     3   (0)| 00:00:01 |
|   3 |   VIEW               | VW_NSO_1 |     3 |    78 |     4  (25)| 00:00:01 |
|   4 |    HASH GROUP BY     |          |     3 |    21 |     4  (25)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| EMP      |    14 |    98 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Workaround #4: Aggregating over Cartesian Product

SELECT a.ename, a.job, a.sal, a.deptno
FROM emp a, emp b
WHERE a.deptno=b.deptno
GROUP BY a.ename, a.job, a.sal, a.deptno
HAVING a.sal=MAX(b.sal)

Execution Plan:

Plan hash value: 2435006919

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     6 |   168 |     8  (25)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   HASH GROUP BY      |      |     6 |   168 |     8  (25)| 00:00:01 |
|*  3 |    HASH JOIN         |      |    65 |  1820 |     7  (15)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| EMP  |    14 |    98 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Workaround #5: Outer Join with IS NULL check

SELECT a.ename, a.job, a.sal, a.deptno
FROM emp a LEFT JOIN emp b ON a.deptno=b.deptno
                          AND a.sal<b.sal
WHERE b.empno IS NULL

Execution Plan:

Plan hash value: 1201587841

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    32 |     7  (15)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|*  2 |   HASH JOIN OUTER   |      |     1 |    32 |     7  (15)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| EMP  |    14 |   154 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Workaround #6: Using NOT EXISTS

SELECT ename, job, sal, deptno
FROM emp a
WHERE NOT EXISTS (SELECT 1
                  FROM emp b
                  WHERE b.deptno=a.deptno
                    AND b.sal>a.sal)

Execution Plan:

Plan hash value: 3353202012

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |   140 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |      |     5 |   140 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP  |    14 |    98 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Synonymous Workaround for #6 (execution plan has the same hash value): Using COUNT(*)=0 Equivalent

SELECT ename, job, sal, deptno
FROM emp a
WHERE 0=(SELECT COUNT(*)
         FROM emp b
         WHERE b.deptno=a.deptno
           AND b.sal>a.sal)

Execution Plan:

Plan hash value: 3353202012

Workaround #7: Using ALL Predicate

SELECT ename, job, sal, deptno
FROM emp a
WHERE a.sal>=ALL(SELECT b.sal
                 FROM emp b
                 WHERE b.deptno=a.deptno)

Execution Plan:

Plan hash value: 2561671593

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    14 |   294 |    24   (0)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMP  |     2 |    14 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Workaround #8: Using In-Line View

SELECT a.ename, a.sal, a.deptno
FROM emp a, (SELECT deptno, MAX(sal) max_sal
             FROM emp
             GROUP BY deptno) b
WHERE a.deptno=b.deptno
  AND a.sal=b.max_sal

Execution Plan:

Plan hash value: 269884559

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    39 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN           |      |     1 |    39 |     8  (25)| 00:00:01 |
|   2 |   VIEW               |      |     3 |    78 |     4  (25)| 00:00:01 |
|   3 |    HASH GROUP BY     |      |     3 |    21 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP  |    14 |    98 |     3   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | EMP  |    14 |   182 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Workaround #9: Using EXISTS Predicate

SELECT ename, job, sal, deptno
FROM emp a
WHERE EXISTS (SELECT 1
              FROM emp b
              WHERE b.deptno=a.deptno
              HAVING a.sal=MAX(b.sal))

Execution Plan:

Plan hash value: 3057787348

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    21 |    24   (0)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL  | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|*  3 |   FILTER             |      |       |       |            |          |
|   4 |    SORT AGGREGATE    |      |     1 |     7 |            |          |
|*  5 |     TABLE ACCESS FULL| EMP  |     5 |    35 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Synonymous Workaround for #9 (execution plan has the same hash value): Using COUNT(*)>0 Equivalent

SELECT ename, job, sal, deptno
FROM emp a
WHERE 0< (SELECT COUNT(1)
          FROM emp b
          WHERE b.deptno=a.deptno
          HAVING a.sal=MAX(b.sal))

Execution Plan:

Plan hash value: 3057787348

Here is a practical example which happens to qualify as another Synonymous Workaround for #9:

SELECT ename, job, sal, deptno
FROM emp a
WHERE NOT EXISTS (SELECT 1
                  FROM emp b
                  WHERE b.deptno=a.deptno
                  HAVING a.sal<MAX(b.sal))

Execution Plan:

Plan hash value: 3057787348

Workaround #10: Using Analytical Function RANK()

WITH x AS (
SELECT ename, job, sal, deptno,
       RANK()OVER(PARTITION BY deptno ORDER BY sal DESC) rk
FROM emp a
)
SELECT ename, job, sal, deptno
FROM x
WHERE rk=1

Execution Plan:

Plan hash value: 3291446077

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    14 |   728 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |      |    14 |   728 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   294 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Workaround #11: Using Analytical Function MAX

WITH x AS (
SELECT ename, job, sal, deptno,
       MAX(sal)OVER(PARTITION BY deptno) max_sal
FROM emp a
)
SELECT ename, job, sal, deptno
FROM x
WHERE sal=max_sal

Execution Plan:

Plan hash value: 4130734685

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    14 |   728 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |      |    14 |   728 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |      |    14 |   294 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Workaround #12: Using Analytical Function COUNT with CONNECT BY

WITH x AS (
SELECT ename, job, sal, deptno, COUNT(*)OVER(PARTITION BY empno) cnt
FROM emp a
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
)
SELECT ename, job, sal, deptno
FROM x
WHERE cnt=1

Execution Plan:

Plan hash value: 704858046

-----------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |    14 |   728 |     2   (0)| 00:00:01 |
|*  1 |  VIEW                          |        |    14 |   728 |     2   (0)| 00:00:01 |
|   2 |   WINDOW SORT                  |        |    14 |   350 |     2   (0)| 00:00:01 |
|*  3 |    CONNECT BY WITHOUT FILTERING|        |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID| EMP    |    14 |   350 |     2   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN           | PK_EMP |    14 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Workaround #13: Using Analytical Function COUNT with CONNECT BY filtered by LEVEL

WITH x AS (
SELECT ename, job, sal, deptno, COUNT(*)OVER(PARTITION BY empno) cnt
FROM emp a
WHERE level<=2
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
)
SELECT ename, job, sal, deptno
FROM x
WHERE cnt=1

Execution Plan:

Plan hash value: 2668428643

------------------------------------------------------------------------------------------
| Id  | Operation                       | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |        |    14 |   728 |     2   (0)| 00:00:01 |
|*  1 |  VIEW                           |        |    14 |   728 |     2   (0)| 00:00:01 |
|   2 |   WINDOW SORT                   |        |    14 |   350 |     2   (0)| 00:00:01 |
|*  3 |    FILTER                       |        |       |       |            |          |
|*  4 |     CONNECT BY WITHOUT FILTERING|        |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| EMP    |    14 |   350 |     2   (0)| 00:00:01 |
|   6 |       INDEX FULL SCAN           | PK_EMP |    14 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Workaround #14: CONNECT BY with GROUP BY and HAVING

SELECT ename, job, sal, deptno
FROM emp a
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
GROUP BY ename, job, sal, deptno
HAVING COUNT(*)=1

Execution Plan:

Plan hash value: 2144516570

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  1 |  FILTER                        |      |       |       |            |          |
|   2 |   HASH GROUP BY                |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  3 |    CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   4 |     TABLE ACCESS FULL          | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Workaround #15: GROUP BY and HAVING over CONNECT BY filtered by LEVEL

SELECT ename, job, sal, deptno
FROM emp a
WHERE level<=2
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
GROUP BY ename, job, sal, deptno
HAVING COUNT(*)=1

Execution Plan:

Plan hash value: 1946770371

----------------------------------------------------------------------------------------
| Id  | Operation                       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  1 |  FILTER                         |      |       |       |            |          |
|   2 |   HASH GROUP BY                 |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  3 |    FILTER                       |      |       |       |            |          |
|*  4 |     CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   5 |      TABLE ACCESS FULL          | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

If you want to learn how to come up with numerous workarounds on your own, check my book “Oracle SQL Tricks and Workarounds” for instructions.