Solutions to Puzzle of the Week #6

Puzzle of the Week #6

Find all employees who is paid one of top 4 salaries in the entire company without using sub-queries and in-line views/WITH clause.

Expected Result:

ENAME             SAL
---------- ----------
KING             5000
FORD             3000
SCOTT            3000
JONES            2975
BLAKE            2850

 

Method/Workaround #1: Use Aggregation over Cartesian Product

This method works in all RDBMS systems, such as Oracle, SQL Server, Sybase, Teradata, etc.

SELECT a.ename, a.sal
FROM emp a, emp b
WHERE a.sal<=b.sal
GROUP BY a.ename, a.sal
HAVING COUNT(DISTINCT b.sal)<=4
ORDER BY 2 DESC;

ENAME             SAL
---------- ----------
KING             5000
FORD             3000
SCOTT            3000
JONES            2975
BLAKE            2850

This method is described in details in my book Oracle SQL Tricks and Workarounds.

Method/Workaround #2: Use DENSE_RANK and MINUS

This method is more Oracle specific, but it is as good as the 1st one. It was suggested by one of the contest participants:

SELECT ename, 
       CASE WHEN DENSE_RANK()OVER(ORDER BY sal DESC)<=4 THEN sal END sal
FROM emp
MINUS
SELECT ename, NULL
FROM emp
ORDER BY 2 DESC;

ENAME             SAL
---------- ----------
KING             5000
FORD             3000
SCOTT            3000
JONES            2975
BLAKE            2850

This is a very elegant variation of the following query with MINUS substituting the filter for analitic function result:

SELECT ename, sal
FROM (SELECT ename, sal, DENSE_RANK()OVER(ORDER BY sal DESC) rk
      FROM emp)
WHERE rk<=4;

ENAME             SAL
---------- ----------
KING             5000
SCOTT            3000
FORD             3000
JONES            2975
BLAKE            2850

Interestingly enough, Teradata SQL has a special clause QUALIFY for that matter. It is very elegant and maybe one day it will become an ANSII standard:

SELECT ename, sal
FROM emp
QUALIFY DENSE_RANK()OVER(ORDER BY sal DESC)<=4

Finally, I would like to share a couple of more traditional approaches that DO USE subqueries:

SELECT ename, sal
FROM emp a
WHERE 4>=(SELECT COUNT(DISTINCT b.sal)
          FROM emp b
          WHERE b.sal>=a.sal)
ORDER BY sal DESC;

ENAME             SAL
---------- ----------
KING             5000
SCOTT            3000
FORD             3000
JONES            2975
BLAKE            2850
SELECT ename, sal
FROM emp a
WHERE 4>(SELECT COUNT(DISTINCT b.sal)
          FROM emp b
          WHERE b.sal>a.sal)
ORDER BY sal DESC;

ENAME             SAL
---------- ----------
KING             5000
SCOTT            3000
FORD             3000
JONES            2975
BLAKE            2850

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Advertisements

Common rule to create or validate the GROUP BY clause

Level: Intermediate

Do you know that the content of the GROUP BY clause can be derived from the rest of the clauses in SELECT statement?

Well, it is true to a great extent. What can be derived is a combination of columns and expressions that must be included in GROUP BY.

We will start a demonstration with a simple puzzle:

Complete GROUP BY Clause in the following query:

SELECT 'Salesmen Report' AS "Report Title", 
       d.deptno, COUNT(e.empno) cnt
FROM dept d, emp e 
WHERE d.deptno=e.deptno
GROUP BY _______________________________
HAVING e.job=’SALESMAN’
   AND SUM(e.sal)>2000
ORDER BY TO_CHAR(e.hiredate,’YYYY’), SUM(e.sal) DESC;

If you are not sure about the answer, learn the following rule.

Common rule to create or validate the GROUP BY clause:

It should at least include all columns and scalar expressions (constants, aggregate functions, and analytical functions are excluded) referenced in SELECT, HAVING, and ORDER BY clauses. Other columns can be specified as well, but they don’t have to.

Here is a solution to the above puzzle:

SELECT 'Salesmen Report' AS "Report Title", 
       d.deptno, COUNT(e.empno) cnt
FROM dept d, emp e 
WHERE d.deptno=e.deptno
GROUP BY d.deptno, e.job, TO_CHAR(e.hiredate,'YYYY')
HAVING e.job=’SALESMAN’
   AND SUM(e.sal)>2000
ORDER BY TO_CHAR(e.hiredate,’YYYY’), SUM(e.sal) DESC;

Note that logically e.job=’SALESMAN’ condition does not belong to HAVING clause. If we push it to WHERE clause (where it belongs to), the group by clause may change:

SELECT 'Salesmen Report' AS "Report Title", 
       d.deptno, COUNT(e.empno) cnt
FROM dept d, emp e 
WHERE d.deptno=e.deptno AND e.job=’SALESMAN’
GROUP BY d.deptno, TO_CHAR(e.hiredate,'YYYY')
HAVING SUM(e.sal)>2000
ORDER BY TO_CHAR(e.hiredate,’YYYY’), SUM(e.sal) DESC;

Since e.job is no longer in the HAVING clause, we don’t have to put it in GROUP BY, but we CAN. The Rule on GROUP BY clause says that at very least, the columns referenced in SELECT, HAVING, and ORDER BY clauses must be referenced in GROUP BY as well, but there might be other columns.

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Interview Question: Can a Select statement have HAVING clause and no GROUP BY?

Question: Can a Select statement have HAVING clause and no GROUP BY?
Level: Intermediate/Advanced.

I was first asked this question about 20 years ago and I have to admit I did not provide the right answer which seemed to be very counter intuitive.

The correct answer: Yes, it can.

Example:

SELECT SUM(sal)
FROM emp
HAVING COUNT(*)>10

Essentially, when we don’t use GROUP BY clause we treat the entire table as a single group. As with any group, we can reference various aggregate functions in SELECT, HAVING, and ORDER BY clauses.

We can even use some analytical functions as follows:

SELECT COUNT(*), SUM(COUNT(*)) OVER() sm
FROM emp
HAVING COUNT(*)<100

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

15 Workarounds for Getting Top Records

To illustrate the concept we will be solving the following problem defined for scott schema:

Find all top paid employees in each department. Display employee names, salaries, jobs, and department.

To qualify for a workaround, a solution’s execution plan should have a distinct hash value (More on that can be found in my book “Oracle SQL Tricks and Workarounds”).

Workaround #1: Correlated subquery

SELECT ename, job, sal, deptno
FROM emp a
WHERE sal=(SELECT MAX(sal)
           FROM emp b
           WHERE b.deptno=a.deptno);

Result:

ENAME      JOB              SAL     DEPTNO
---------- --------- ---------- ----------
BLAKE      MANAGER         2850         30
SCOTT      ANALYST         3000         20
KING       PRESIDENT       5000         10
FORD       ANALYST         3000         20

Execution Plan:

Plan hash value: 1245077725

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |     1 |    47 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN           |         |     1 |    47 |     8  (25)| 00:00:01 |
|   2 |   VIEW               | VW_SQ_1 |     3 |    78 |     4  (25)| 00:00:01 |
|   3 |    HASH GROUP BY     |         |     3 |    21 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP     |    14 |    98 |     3   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | EMP     |    14 |   294 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Workaround #2: Correlated subquery with arithmetic transformation

SELECT ename, job, sal, deptno
FROM emp a
WHERE 0=(SELECT MAX(b.sal)-a.sal
         FROM emp b
         WHERE b.deptno=a.deptno)

Execution Plan:

Plan hash value: 2649664444

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    21 |    24   (0)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |      |     1 |     7 |            |          |
|*  4 |    TABLE ACCESS FULL| EMP  |     5 |    35 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Workaround #3: Non-Correlated subquery

SELECT ename, job, sal, deptno
FROM emp a
WHERE (deptno, sal) IN (SELECT deptno, MAX(sal)
                        FROM emp
                        GROUP BY deptno)

Execution Plan:

Plan hash value: 2491199121

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     1 |    47 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN SEMI      |          |     1 |    47 |     8  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | EMP      |    14 |   294 |     3   (0)| 00:00:01 |
|   3 |   VIEW               | VW_NSO_1 |     3 |    78 |     4  (25)| 00:00:01 |
|   4 |    HASH GROUP BY     |          |     3 |    21 |     4  (25)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| EMP      |    14 |    98 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Workaround #4: Aggregating over Cartesian Product

SELECT a.ename, a.job, a.sal, a.deptno
FROM emp a, emp b
WHERE a.deptno=b.deptno
GROUP BY a.ename, a.job, a.sal, a.deptno
HAVING a.sal=MAX(b.sal)

Execution Plan:

Plan hash value: 2435006919

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     6 |   168 |     8  (25)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   HASH GROUP BY      |      |     6 |   168 |     8  (25)| 00:00:01 |
|*  3 |    HASH JOIN         |      |    65 |  1820 |     7  (15)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| EMP  |    14 |    98 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Workaround #5: Outer Join with IS NULL check

SELECT a.ename, a.job, a.sal, a.deptno
FROM emp a LEFT JOIN emp b ON a.deptno=b.deptno
                          AND a.sal<b.sal
WHERE b.empno IS NULL

Execution Plan:

Plan hash value: 1201587841

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    32 |     7  (15)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|*  2 |   HASH JOIN OUTER   |      |     1 |    32 |     7  (15)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| EMP  |    14 |   154 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Workaround #6: Using NOT EXISTS

SELECT ename, job, sal, deptno
FROM emp a
WHERE NOT EXISTS (SELECT 1
                  FROM emp b
                  WHERE b.deptno=a.deptno
                    AND b.sal>a.sal)

Execution Plan:

Plan hash value: 3353202012

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     5 |   140 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |      |     5 |   140 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| EMP  |    14 |    98 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Synonymous Workaround for #6 (execution plan has the same hash value): Using COUNT(*)=0 Equivalent

SELECT ename, job, sal, deptno
FROM emp a
WHERE 0=(SELECT COUNT(*)
         FROM emp b
         WHERE b.deptno=a.deptno
           AND b.sal>a.sal)

Execution Plan:

Plan hash value: 3353202012

Workaround #7: Using ALL Predicate

SELECT ename, job, sal, deptno
FROM emp a
WHERE a.sal>=ALL(SELECT b.sal
                 FROM emp b
                 WHERE b.deptno=a.deptno)

Execution Plan:

Plan hash value: 2561671593

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    14 |   294 |    24   (0)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMP  |     2 |    14 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Workaround #8: Using In-Line View

SELECT a.ename, a.sal, a.deptno
FROM emp a, (SELECT deptno, MAX(sal) max_sal
             FROM emp
             GROUP BY deptno) b
WHERE a.deptno=b.deptno
  AND a.sal=b.max_sal

Execution Plan:

Plan hash value: 269884559

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    39 |     8  (25)| 00:00:01 |
|*  1 |  HASH JOIN           |      |     1 |    39 |     8  (25)| 00:00:01 |
|   2 |   VIEW               |      |     3 |    78 |     4  (25)| 00:00:01 |
|   3 |    HASH GROUP BY     |      |     3 |    21 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMP  |    14 |    98 |     3   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | EMP  |    14 |   182 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Workaround #9: Using EXISTS Predicate

SELECT ename, job, sal, deptno
FROM emp a
WHERE EXISTS (SELECT 1
              FROM emp b
              WHERE b.deptno=a.deptno
              HAVING a.sal=MAX(b.sal))

Execution Plan:

Plan hash value: 3057787348

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    21 |    24   (0)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL  | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
|*  3 |   FILTER             |      |       |       |            |          |
|   4 |    SORT AGGREGATE    |      |     1 |     7 |            |          |
|*  5 |     TABLE ACCESS FULL| EMP  |     5 |    35 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Synonymous Workaround for #9 (execution plan has the same hash value): Using COUNT(*)>0 Equivalent

SELECT ename, job, sal, deptno
FROM emp a
WHERE 0< (SELECT COUNT(1)
          FROM emp b
          WHERE b.deptno=a.deptno
          HAVING a.sal=MAX(b.sal))

Execution Plan:

Plan hash value: 3057787348

Here is a practical example which happens to qualify as another Synonymous Workaround for #9:

SELECT ename, job, sal, deptno
FROM emp a
WHERE NOT EXISTS (SELECT 1
                  FROM emp b
                  WHERE b.deptno=a.deptno
                  HAVING a.sal<MAX(b.sal))

Execution Plan:

Plan hash value: 3057787348

Workaround #10: Using Analytical Function RANK()

WITH x AS (
SELECT ename, job, sal, deptno,
       RANK()OVER(PARTITION BY deptno ORDER BY sal DESC) rk
FROM emp a
)
SELECT ename, job, sal, deptno
FROM x
WHERE rk=1

Execution Plan:

Plan hash value: 3291446077

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    14 |   728 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |      |    14 |   728 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   294 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Workaround #11: Using Analytical Function MAX

WITH x AS (
SELECT ename, job, sal, deptno,
       MAX(sal)OVER(PARTITION BY deptno) max_sal
FROM emp a
)
SELECT ename, job, sal, deptno
FROM x
WHERE sal=max_sal

Execution Plan:

Plan hash value: 4130734685

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    14 |   728 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |      |    14 |   728 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |      |    14 |   294 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| EMP  |    14 |   294 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Workaround #12: Using Analytical Function COUNT with CONNECT BY

WITH x AS (
SELECT ename, job, sal, deptno, COUNT(*)OVER(PARTITION BY empno) cnt
FROM emp a
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
)
SELECT ename, job, sal, deptno
FROM x
WHERE cnt=1

Execution Plan:

Plan hash value: 704858046

-----------------------------------------------------------------------------------------
| Id  | Operation                      | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |        |    14 |   728 |     2   (0)| 00:00:01 |
|*  1 |  VIEW                          |        |    14 |   728 |     2   (0)| 00:00:01 |
|   2 |   WINDOW SORT                  |        |    14 |   350 |     2   (0)| 00:00:01 |
|*  3 |    CONNECT BY WITHOUT FILTERING|        |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID| EMP    |    14 |   350 |     2   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN           | PK_EMP |    14 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Workaround #13: Using Analytical Function COUNT with CONNECT BY filtered by LEVEL

WITH x AS (
SELECT ename, job, sal, deptno, COUNT(*)OVER(PARTITION BY empno) cnt
FROM emp a
WHERE level<=2
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
)
SELECT ename, job, sal, deptno
FROM x
WHERE cnt=1

Execution Plan:

Plan hash value: 2668428643

------------------------------------------------------------------------------------------
| Id  | Operation                       | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |        |    14 |   728 |     2   (0)| 00:00:01 |
|*  1 |  VIEW                           |        |    14 |   728 |     2   (0)| 00:00:01 |
|   2 |   WINDOW SORT                   |        |    14 |   350 |     2   (0)| 00:00:01 |
|*  3 |    FILTER                       |        |       |       |            |          |
|*  4 |     CONNECT BY WITHOUT FILTERING|        |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| EMP    |    14 |   350 |     2   (0)| 00:00:01 |
|   6 |       INDEX FULL SCAN           | PK_EMP |    14 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Workaround #14: CONNECT BY with GROUP BY and HAVING

SELECT ename, job, sal, deptno
FROM emp a
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
GROUP BY ename, job, sal, deptno
HAVING COUNT(*)=1

Execution Plan:

Plan hash value: 2144516570

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  1 |  FILTER                        |      |       |       |            |          |
|   2 |   HASH GROUP BY                |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  3 |    CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   4 |     TABLE ACCESS FULL          | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Workaround #15: GROUP BY and HAVING over CONNECT BY filtered by LEVEL

SELECT ename, job, sal, deptno
FROM emp a
WHERE level<=2
CONNECT BY deptno=PRIOR deptno
       AND sal<PRIOR sal
GROUP BY ename, job, sal, deptno
HAVING COUNT(*)=1

Execution Plan:

Plan hash value: 1946770371

----------------------------------------------------------------------------------------
| Id  | Operation                       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  1 |  FILTER                         |      |       |       |            |          |
|   2 |   HASH GROUP BY                 |      |     1 |    21 |     4  (25)| 00:00:01 |
|*  3 |    FILTER                       |      |       |       |            |          |
|*  4 |     CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   5 |      TABLE ACCESS FULL          | EMP  |    14 |   294 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

If you want to learn how to come up with numerous workarounds on your own, check my book “Oracle SQL Tricks and Workarounds” for instructions.