Monday Hiring SQL Puzzle and Lateral View Usage Nuance

Let’s solve a fairly simple SQL problem:

For each department count the number of people hired on Monday

• Use scott.emp table
• Show department number and count columns
• If no employees from a given department was hired on Monday, we should list such
department with 0 in the count column
• Sort the result by the department number

Expected Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

We will start with the in-line scalar subquery approach as it is probably one of the most intuitive:

Strategy #1: In-Line Scalar Subquery

SELECT deptno, (SELECT COUNT(*)
                FROM scott.emp
                WHERE deptno=e.deptno
                AND TO_CHAR(hiredate, 'DY')='MON') mon_hires
FROM scott.emp e
GROUP BY deptno
ORDER BY 1

When you only need a single value/column/expression from a correlated subquery, the in-line subquery in SELECT clause works just fine. If we needed more than one: count and let say total salary, we would need to use LATERAL view (or a completely different approach – see below).

Strategy #2: Lateral View

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires, m.total_sal
FROM x, LATERAL (SELECT COUNT(*) mon_hires, SUM(sal) total_sal
                 FROM scott.emp e
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

So far, all is good. I know that some database developers don’t like using table aliases too much and when an opportunity comes they use ANSI standard JOIN syntax with USING clause. Can it be applied in the lateral view?

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(empno) mon_hires
                 FROM scott.emp JOIN x USING (deptno)
                 WHERE TO_CHAR(hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The syntax is correct but the result is apparently not! What happened?

We replaced the WHERE clause condition of e.deptno=x.deptno with the JOIN x USING(deptno) – it first looked legitimate to me until I realized that we have just added an extra join instead of a reference to an external table (CTE x). Essentially, our last (incorrect) query is the same as the following:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, LATERAL (SELECT COUNT(*) mon_hires
                 FROM scott.emp e, x
                 WHERE e.deptno=x.deptno
                   AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

We simply introduced a new (and unwanted) join on the CTE x and turned the correlated reference (in the WHERE clause) into an old-style joining condition which did not even require the LATERAL view functionality:

WITH x AS (
SELECT DISTINCT deptno 
FROM scott.emp
)
SELECT x.deptno, m.mon_hires
FROM x, (SELECT COUNT(*) mon_hires
         FROM scott.emp e, x
         WHERE e.deptno=x.deptno
           AND TO_CHAR(e.hiredate, 'DY')='MON') m
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 1
20 1
30 1

The above examples prove that we need to be very careful when mixing up different techniques and new syntax options.

Finally, the last approach will show that only a single scan of the emp table is needed to get the result:

Strategy #3: Conditional Counting

SELECT deptno, COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES
10 0
20 0
30 1

This is the best and incidentally the shortest solution that once again demonstrates the power of conditional counting (aggregation) right in SELECT clause.

Likewise we can also show the total salary of those hired on Monday:

SELECT deptno, 
       COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', 1)) mon_hires,
       SUM(DECODE(TO_CHAR(hiredate, 'DY'), 'MON', sal)) total_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Result:

DEPTNO MON_HIRES TOTAL_SAL
10 0
20 0
30 1 1250

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Advertisements

7 Solutions to 2018 Oracle SQL Puzzle of the Week #13

Second Top Employee as of the Start of Employment

List all employees who were 2nd top paid in the entire company as of the time their employment started

  • Use a single SELECT statement only.
  • At the time of employment start the rank of the employee by salary should be 2.
  • Show the top salary at the time when the employee started with the company.
  • We assume that no employees have ever been terminated since day 1.

Expected Result:

ENAME JOB SAL HIREDATE MAX_SAL
WARD SALESMAN 1250 22-FEB-81 1600
BLAKE MANAGER 2850 01-MAY-81 2975
FORD ANALYST 3000 03-DEC-81 5000
SCOTT ANALYST 3000 19-APR-87 5000

Solutions:

Solution #1. Using LATERAL view, RANK and cumulative MAX analytic functions (Oracle 12g+):

SELECT e.ename, e.job, e.hiredate, e.sal, r.max_sal 
FROM scott.emp e, LATERAL(SELECT a.empno,  
                                 RANK() OVER(ORDER BY a.sal DESC) rk, 
                                 MAX(a.sal) OVER() max_sal 
                          FROM scott.emp a 
                          WHERE a.hiredate<=e.hiredate) r 
WHERE e.empno=r.empno  
  AND rk=2 
ORDER BY e.hiredate

Solution #2. Using CTE, cumulative MAX analytic function and a correlated subquery with COUNT to mimic the filter by RANK:

WITH x AS ( 
SELECT ename, job, hiredate, sal, MAX(sal)OVER(ORDER BY hiredate) max_sal 
FROM scott.emp a 
) 
SELECT * 
FROM x 
WHERE 1=(SELECT COUNT(*) 
         FROM scott.emp 
         WHERE hiredate<=x.hiredate 
           AND sal>x.sal) 
ORDER BY hiredate

Solution #3. Using CTE, cumulative MAX analytic function and an in-line scalar subquery in SELECT to mimic the RANK:

WITH x AS ( 
SELECT ename, job, hiredate, sal, MAX(sal)OVER(ORDER BY hiredate) max_sal, 
       (SELECT COUNT(*)+1 
        FROM scott.emp 
        WHERE sal>e.sal  
          AND hiredate<=e.hiredate) rk 
FROM scott.emp e 
) 
SELECT ename, job, hiredate, sal, max_sal 
FROM x 
WHERE rk=2 
ORDER BY hiredate

Solution #4. Using self-join and Cartesian Product with aggregation:

SELECT a.ename, a.job, a.hiredate, a.sal, MAX(b.sal) max_sal 
FROM scott.emp a JOIN scott.emp b ON b.hiredate<=a.hiredate 
                                 AND b.sal>a.sal 
GROUP BY a.ename, a.job, a.hiredate, a.sal 
HAVING COUNT(DISTINCT b.empno)=1 
ORDER BY a.hiredate

Solution #5. Using CTE and cumulative MAX analytic function (twice):

WITH x AS ( 
SELECT ename, job, hiredate, sal, 
       MAX(sal) OVER(ORDER BY hiredate) max_sal 
FROM scott.emp  
), y  AS ( 
SELECT ename, job, hiredate, sal, max_sal, MAX(sal) OVER(ORDER BY hiredate) max_sal2 
FROM x 
WHERE sal<max_sal 
) 
SELECT ename, job, hiredate, sal, max_sal 
FROM y 
WHERE sal=max_sal2 
ORDER BY hiredate

Solution #6. Using regular and recursive CTEs, ROWNUM, GREATEST, and CASE functions (no Analytic functions!):

WITH e AS ( 
SELECT ename, job, sal, hiredate 
FROM scott.emp 
ORDER BY hiredate 
), x AS ( 
SELECT ename, job, sal, hiredate, ROWNUM rn 
FROM e 
), y(max_sal, sal2, rn) AS ( 
SELECT sal, 0, 1 
FROM x 
WHERE rn=1 
UNION ALL 
SELECT GREATEST(x.sal, y.max_sal) AS max_sal, 
       CASE WHEN x.sal>y.max_sal THEN y.max_sal 
            WHEN x.sal>y.sal2 AND x.sal<=y.max_sal THEN x.sal  
            ELSE y.sal2  
       END AS sal2, 
       x.rn 
FROM x JOIN y ON x.rn=y.rn+1 
) 
SELECT x.ename, x.job, x.sal, x.hiredate, y.max_sal 
FROM y JOIN x ON y.rn=x.rn AND y.sal2=x.sal

Solution #7. Using CTE and MODEL clause to mimic Solution #6:

WITH x AS ( 
SELECT * 
FROM scott.emp 
MODEL 
DIMENSION BY (ROW_NUMBER() OVER(ORDER BY hiredate) rn) 
MEASURES(ename, job, sal, hiredate, sal max_sal, 0 sal2) 
RULES( 
    max_sal[rn>1] = GREATEST(max_sal[CV()-1], sal[CV()]), 
    sal2[rn>1] = CASE WHEN sal[CV()]> max_sal[CV()-1] THEN max_sal[CV()-1] 
                      WHEN sal[CV()]> sal2[CV()-1]   
		       AND sal[CV()]<=max_sal[CV()-1] THEN sal[CV()]  
                      ELSE sal2[CV()-1] 
                 END 
     ) 
) 
SELECT ename, job, sal, hiredate, max_sal 
FROM x 
WHERE sal=sal2

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Further Reading:

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.