A trick that helps avoiding multiple table scans.

Let’s look at a fairly simple SQL problem:

In a traditional scott.emp table, find all employees who work in the same department as the president.

  • Make your query work even if there are more than 1 president records exist in emp table
  • Make Oracle scan emp table just ONCE

A “traditional” solution to this problem may look like this:

Strategy #1: Using a subquery

SELECT *
FROM scott.emp
WHERE deptno IN (SELECT deptno 
                 FROM scott.emp 
                 WHERE job='PRESIDENT')

or this:

Strategy #2: Using a self-join

SELECT DISTINCT a.*
FROM scott.emp a JOIN scott.emp b ON a.deptno=b.deptno
WHERE b.job='PRESIDENT'

Note, that DISTINCT option in the above query is needed to prevent duplicates if there were multiple presidents in а specific department.

Both solutions above use 2 copies of the emp table which makes oracle scan the same scott.emp table twice.

A trick presented below allows you to use only a single copy of the emp table to solve the problem. The trick involves different conceptual and technical approaches compared to the solutions we have seen so far.

Conceptually, we should rephrase the problem in a way that would keep it identical and at the same time allows us to use different technical arsenal. This approach is explained in a detailed manner in my book “Oracle SQL Tricks and Workarounds”. We can rephrase the puzzle and say that we are looking for employees from departments with some “positive” number of presidents working there. Technically speaking, we need to use analytic function COUNT and check if it is greater than 0:

Strategy #3: Using analytic function COUNT

WITH x AS (
SELECT e.*, 
       COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno) cnt
FROM scott.emp e
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE cnt>0
ORDER BY empno

We do need to use a common table expression as we cannot filter out by analytic function in the same query where the function is used. Nevertheless, we scan the emp table just once, and during this scan, Oracle engine counts the number of presidents in each department.

COUNT is not the only analytic function that can be employed to solve the problem.

Strategy #4: Using analytic function LISTAGG

WITH x AS (
SELECT e.*, 
       LISTAGG(job,'|') 
         WITHIN GROUP (ORDER BY job) OVER(PARTITION BY deptno) jobs
FROM scott.emp e 
)
SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM x
WHERE '|' || jobs || '|' LIKE '%|PRESIDENT|%'
ORDER BY empno

Instead of counting the presidents by department, we simply concatenate all the job titles and check if the resulting string includes a president.

Finally, if you don’t like using sub-queries in general, we can leverage the power of MODEL clause:

Strategy #5: Using MODEL clause to avoid sub-queries

SELECT empno, ename, job, mgr, hiredate, sal, comm, deptno
FROM scott.emp
MODEL RETURN UPDATED ROWS 
DIMENSION BY (
  empno, 
  SIGN(COUNT(DECODE(job,'PRESIDENT',1))OVER(PARTITION BY deptno)) cnt
) 
MEASURES(ename, job, mgr, hiredate, sal, comm, deptno, 0 dummy) 
RULES(dummy[ANY, 1]=1)
ORDER BY empno

The tricky part here is using a composition of SIGN, COUNT, and DECODE functions (i.e. SIGN on top of what we used in Strategy #3) as a secondary dimension and empno as primary. Employee number is unique by itself, so adding another dimension will still maintain uniqueness required by MODEL clause. The only MODEL RULE changes the dummy measure which “triggers” the “RETURN UPDATED ROWS” instruction and returns only those rows where the dummy dimension was set to 1 – notice that its default value is 0.

You can check the execution plan for all of the above strategies to see how many times Oracle scans the emp table.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

 

 

Advertisements

12 Solutions to 2018 Oracle SQL Puzzle of the Week #10

Top Salary Puzzle

Find highest salary in each department without using MAX function

  • Use a single SELECT statement only.
  • For an added complexity (optional): try not using ANY functions at all (neither group, nor analytic, not even scalar)

Expected Result:

DEPTNO MAX_SAL
10 5000
20 3000
30 2850

Solutions:

We will begin with a simpler problem that does allow us using functions.

Solution #1. Using MIN function

Credit to: Boobal Ganesan

MIN function can be seen as an opposite to the MAX, so it is trivial to employ it here:

SELECT deptno, -MIN(-sal) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #2. Using LISTAGG and REGEXP_SUBSTR functions

This is an “order” based approach that sorts the values within a concatenated string and then uses regular expression to cut the first token.

SELECT deptno,
       REGEXP_SUBSTR(LISTAGG(sal,',') 
                     WITHIN GROUP(ORDER BY sal DESC),'[^,]+',1,1) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #3. Using AVG(…) KEEP() group function

This is another “order” based strategy whete AVG function can be replaced with MIN or any other aggregate function that returns a single value out of a set of identical ones.

SELECT deptno, AVG(sal) KEEP(DENSE_RANK FIRST ORDER BY sal DESC) max_sal
FROM scott.emp
GROUP BY deptno
ORDER BY 1;

Solution #4. Using Analytic function and CTE

ROW_NUMBER is chosen in this approach, though other analytic functions, such as RANK, DENSE_RANK, LEAD, LAG, FIRST_VALUE, etc can be used here (with some changes) as well. ROW_NUMBER is convenient to use as it allows to avoid DISTINCT option.

WITH x AS (
SELECT deptno, sal, 
       ROW_NUMBER()OVER(PARTITION BY deptno ORDER BY sal DESC) rn
FROM scott.emp
)
SELECT deptno, sal max_sal
FROM x
WHERE rn=1
ORDER BY 1;

Solution #5. Using MATCH_RECOGNIZE clause

Credit to: KATAYAMA NAOTO

This approach is similar to the previous one if we used LAG analytic function: which would return NULL for the top record.

SELECT deptno, sal max_sal 
FROM scott.emp
MATCH_RECOGNIZE (
PARTITION BY deptno
ORDER BY sal DESC
ALL ROWS PER MATCH
PATTERN (ISNULL)
DEFINE ISNULL AS PREV(ISNULL.sal) IS NULL
);

Solution #6. CONNECT BY and CONNECT_BY_ISLEAF while avoiding Analytic functions

This approach is a bit artificial. We could have used DISTINCT and avoid START WITH clause completely.  CTEs x and y are used to simulate ROW_NUMBER analytic function.

WITH x AS (
SELECT deptno, sal
FROM scott.emp
ORDER BY 1,2
), y AS (
SELECT x.*, ROWNUM rn
FROM x
)
SELECT deptno, sal
FROM y
WHERE CONNECT_BY_ISLEAF=1
CONNECT BY deptno=PRIOR deptno
       AND rn=PRIOR rn+1
START WITH (deptno, rn) IN (SELECT deptno, MIN(rn)
                            FROM y
                            GROUP BY deptno);

Solution #7. Using MODEL clause with ROW_NUMBER function

This method is pretty much the same as in the Solution #4 above. The RETURN UPDATED ROWS and dummy measures are used to only return rows with rn=1.

SELECT deptno, max_sal
FROM scott.emp
MODEL
RETURN UPDATED ROWS
PARTITION BY (deptno)
DIMENSION BY (ROW_NUMBER() OVER(PARTITION BY deptno ORDER BY sal DESC) rn)
MEASURES(sal max_sal, 0 dummy)
RULES(
 dummy[1]=1
)
ORDER BY 1;

The following 5 solutions (##8-12) satisfy the “added complexity” term and do NOT use any functions at all.

Solution #8. Using ALL predicate

Generally speaking, >=ALL filter is identical to =(SELECT MAX() …). See my book for more detailed explanations.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE sal>=ALL(SELECT sal
               FROM scott.emp
               WHERE deptno=a.deptno)
GROUP BY deptno, sal
ORDER BY 1;

Solution #9. Using NOT EXISTS predicate

See Chapter 10 of my book for details.

SELECT deptno, sal max_sal
FROM scott.emp a
WHERE NOT EXISTS(SELECT 1
                 FROM scott.emp
                 WHERE deptno=a.deptno
                   AND sal>a.sal)
GROUP BY deptno, sal
ORDER BY 1;

Solution #10. Using Outer-Join with IS NULL filter

This approach is also covered very deeply in my book, Chapter 10.

SELECT a.deptno, a.sal max_sal
FROM scott.emp a LEFT JOIN scott.emp b ON a.deptno=b.deptno
                                      AND b.sal>a.sal 
WHERE b.empno IS NULL
GROUP BY a.deptno, a.sal
ORDER BY 1;

Solution #11. Using MINUS and ANY predicate

MINUS serves 2 purposes: it removes non-top rows and eliminates duplicates, so no DISTINCT option (or GROUP BY) is required.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE sal<ANY(SELECT sal 
              FROM scott.emp
              WHERE deptno=a.deptno);

Solution #12. Using MINUS and EXISTS predicate

Last two approaches covered in the drill from the Chapter 10 of my book.

SELECT deptno, sal max_sal 
FROM scott.emp
MINUS
SELECT deptno, sal
FROM scott.emp a
WHERE EXISTS(SELECT 1 
             FROM scott.emp
             WHERE deptno=a.deptno
               AND sal>a.sal);

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

8 Solutions to 2018 Oracle SQL Puzzle of the Week #9

Recent employment Puzzle

For each location, show 2 most recently hired employees

  • Use a single SELECT statement only.
  • ename1 and hiredate1 columns should correspond the latest hired employee while ename1 and hiredate1 columns – the previous one

Expected Result:

LOC ENAME1 HIREDATE1 ENAME2 HIREDATE2
NEW YORK MILLER 23-JAN-82 KING 17-NOV-81
CHICAGO JAMES 03-DEC-81 MARTIN 28-SEP-81
DALLAS ADAMS 23-MAY-87 SCOTT 19-APR-87

Solutions:

Solution #1. Using Self-Join and MAX functions

SELECT d.loc, 
     MAX(e1.ename) KEEP(DENSE_RANK FIRST ORDER BY e1.hiredate DESC) ename1, 
     MAX(e1.hiredate) hiredate1, 
     MAX(e2.ename) KEEP(DENSE_RANK FIRST ORDER BY e2.hiredate DESC) ename2, 
     MAX(e2.hiredate) hiredate2 
FROM scott.emp e1 JOIN scott.emp e2 ON e1.deptno=e2.deptno 
 AND e1.hiredate>=e2.hiredate 
 AND e1.ROWID!=e2.ROWID 
                  JOIN scott.dept d ON e1.deptno=d.deptno 
GROUP BY d.loc;

Solution #2. Using LISTAGG and REGEXP_SUBSTR functions

SELECT d.loc, 
       MAX(ename) KEEP(DENSE_RANK FIRST ORDER BY hiredate DESC) ename1,
       MAX(hiredate) hiredate1, 
       REGEXP_SUBSTR(LISTAGG(ename, '|') WITHIN GROUP (ORDER BY hiredate DESC),
 '[^|]+',1,2) ename2,
       REGEXP_SUBSTR(LISTAGG(hiredate, '|') WITHIN GROUP (ORDER BY hiredate DESC),
 '[^|]+',1,2) hiredate2
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
GROUP BY d.loc
ORDER BY 1;

Solution #3. Using CTE, ROW_NUMBER, and Self-Join

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT a.loc, a.ename ename1, a.hiredate hiredate1,
              b.ename ename2, b.hiredate hiredate2
FROM x a JOIN x b ON a.loc=b.loc AND a.rn=1 AND b.rn=2;

Solution #4. Using Pivot

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, e1_ename AS ename1, e1_hdate AS hiredate1,
       e2_ename AS ename2, e2_hdate AS hiredate2
FROM x
PIVOT (
MAX(ename) ename, MAX(hiredate) hdate FOR rn IN (1 AS e1, 2 AS e2) 
)
ORDER BY 1;

Solution #5. Simulating Pivot with MAX and DECODE functions

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, 
       MAX(DECODE(rn,1,ename)) ename1, 
       MAX(DECODE(rn,1,hiredate)) hiredate1,
       MAX(DECODE(rn,2,ename)) ename2, 
       MAX(DECODE(rn,2,hiredate)) hiredate2
FROM x
GROUP BY loc
ORDER BY 1;

Solution #6. Using CONNECT BY

WITH x AS (
SELECT d.loc, e.ename, e.hiredate, 
       ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
)
SELECT loc, PRIOR ename ename1, PRIOR hiredate hiredate1, 
       ename ename2, hiredate hriedate2
FROM x
WHERE rn=2
START WITH rn=1
CONNECT BY loc=PRIOR loc
       AND rn=PRIOR rn+1;

Solution #7. Using LEAD and ROW_NUMBER Analytic functions

WITH x AS (
SELECT d.loc, e.ename ename1, e.hiredate hiredate1, 
 LEAD(e.ename,1) OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) ename2,
 LEAD(e.hiredate,1) OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) hiredate2,
 ROW_NUMBER()OVER(PARTITION BY d.deptno ORDER BY e.hiredate DESC) rn
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
) 
SELECT loc, ename1, hiredate1, ename2, hiredate2
FROM x
WHERE rn=1
ORDER BY 1;

Solution #8. Using Model Clause:

SELECT loc, ename1, hiredate1, ename2, hiredate2
FROM scott.emp e JOIN scott.dept d ON e.deptno=d.deptno
MODEL
RETURN UPDATED ROWS
PARTITION BY (d.loc)
DIMENSION BY (
   ROW_NUMBER()OVER(PARTITION BY d.loc ORDER BY e.hiredate DESC) AS rn
)
MEASURES(
    ename AS ename1, hiredate AS hiredate1, 
    ename AS ename2, hiredate AS hiredate2
)
RULES(
    ename2[1]   =ename1[2],
    hiredate2[1]=hiredate1[2]
)
ORDER BY 1;

You can execute the above SQL statements in Oracle Live SQL environment.
My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

7 Solutions to 2018 Oracle SQL Puzzle of the Week #4

2018 Puzzle of the Week #4:

Calculate Mutual Funds’ Performance

For a given table fund_performance (see the CREATE TABLE statement below), calculate each fund’s performance over the 6-month period from Jan-2016 till Jun-2016.

  • Use a single SELECT statement
  • Performance is calculated as a multiplication of all the months’ performance rates for the given time frame
  • The solution should work for any time frame, so treat from-month and to-month as query parameters
  • DDL command:
  • CREATE TABLE fund_performance AS
    SELECT 1 fund_id, '2016-01' perf_month, 1.05 perf_rate
    FROM dual 
    UNION ALL
    SELECT 1, '2016-02', 1.02 FROM dual UNION ALL
    SELECT 1, '2016-03', 0.92 FROM dual UNION ALL
    SELECT 1, '2016-04', 1.01 FROM dual UNION ALL
    SELECT 1, '2016-05', 1.04 FROM dual UNION ALL
    SELECT 1, '2016-06', 0.95 FROM dual UNION ALL
    SELECT 2, '2016-01', 1.04 FROM dual UNION ALL
    SELECT 2, '2016-02', 1.03 FROM dual UNION ALL
    SELECT 2, '2016-03', 0.98 FROM dual UNION ALL
    SELECT 2, '2016-04', 1.04 FROM dual UNION ALL
    SELECT 2, '2016-05', 1.01 FROM dual UNION ALL
    SELECT 2, '2016-06', 0.98 FROM dual;
    

    Expected Result:

    FUND_ID Cumulative Performance
    1 0.98
    2 1.08

Solutions:

Solution #1: Using Math Formula (Sum of Logs = Log of Product)

SELECT fund_id, ROUND(EXP(SUM(LN(perf_rate))),2) "Cumulative Performance" 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-02' AND '2016-05' 
GROUP BY fund_id 
ORDER BY 1

Solution #2: Using Dynamic XML Query with XMLTYPE

(Credit to: Katayama Naoto)

SELECT fund_id,  
       ROUND(TO_NUMBER(EXTRACTVALUE(XMLTYPE(
		dbms_xmlgen.getxml('SELECT '|| LISTAGG(perf_rate,'*') 
                                      WITHIN GROUP(ORDER BY perf_month)||' C 
				    FROM dual')),'/ROWSET/ROW/C')),2) AS "cumulative performance" 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
GROUP BY fund_id 
ORDER BY 1

Solution #3: Using Dynamic XML with XMLQUERY

(Credit to: Boobal Ganesan)

SELECT fund_id, 
       ROUND(TO_NUMBER(XMLQUERY((LISTAGG(perf_rate,'*') 
	             WITHIN GROUP(ORDER BY fund_id)) RETURNING CONTENT)),2) "cumulative performance" 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
GROUP BY fund_id

Solution #4: Using Model Clause with 2 measures

(Credit to: Katayama Naoto)

WITH x AS ( 
SELECT fund_id, cump, flag 
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
MODEL 
PARTITION BY (fund_id) 
DIMENSION BY (ROW_NUMBER()OVER(PARTITION BY fund_id ORDER BY perf_month) AS N) 
MEASURES(perf_rate, 
         CAST(0 AS NUMBER) AS cump, 
         CAST(0 AS NUMBER) AS flag) 
RULES( 
      cump[ANY] ORDER BY N = perf_rate[CV(N)] * NVL(cump[CV(N)-1],1),
      flag[ANY] ORDER BY N = NVL2(perf_rate[CV(N)+1],0,1)
     )
)
SELECT fund_id, ROUND(cump,2) "Cumulative Performance" 
FROM x
WHERE flag=1 
ORDER BY fund_id

Solution #5: Using Model clause with 1 measure

WITH d AS (
SELECT fund_id, perf_month, perf_rate, 
       RANK()OVER(PARTITION BY fund_id ORDER BY perf_month DESC) rk
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
), x AS ( 
SELECT * 
FROM d
MODEL 
  PARTITION BY (fund_id) 
  DIMENSION BY (ROW_NUMBER()OVER(PARTITION BY fund_id ORDER BY perf_month) AS N) 
  MEASURES     (perf_rate, rk, CAST(0 AS NUMBER) AS cump) 
  RULES        (cump[ANY] ORDER BY N = perf_rate[CV(N)] * NVL(cump[CV(N)-1],1) )
)
SELECT fund_id, ROUND(cump,2) "Cumulative Performance" 
FROM x
WHERE rk=1 
ORDER BY fund_id

Solution #6: Using Recursive CTE

WITH d AS (
SELECT fund_id, perf_rate, 
       ROW_NUMBER()OVER(PARTITION BY fund_id ORDER BY perf_month) rn,
       COUNT(*)OVER(PARTITION BY fund_id) cnt
FROM fund_performance 
WHERE perf_month BETWEEN '2016-01' AND '2016-06' 
), x(fund_id, cum_perf, rn, cnt) AS (
SELECT fund_id, perf_rate, 1, cnt
FROM d
WHERE rn=1
UNION ALL
SELECT x.fund_id, x.cum_perf*d.perf_rate, d.rn, d.cnt
FROM x JOIN d ON x.fund_id=d.fund_id
             AND x.rn+1=d.rn
)
SELECT fund_id, ROUND(cum_perf,2) "Cumulative Performance"
FROM x
WHERE rn=cnt

Solution #7: Using 12c new Function based WITH clause

(Credit to: Katayama Naoto)

WITH
FUNCTION product(list IN sys.odcinumberlist) RETURN NUMBER IS
   v_result NUMBER DEFAULT 1;
BEGIN
  FOR i IN list.FIRST .. list.LAST LOOP
      v_result := v_result * list(i);
  END LOOP;
  RETURN v_result;
END;
SELECT fund_id, product(CAST(COLLECT(perf_rate) AS sys.odcinumberlist)) AS "Cumulative Performance"
FROM fund_performance
GROUP BY fund_id
ORDER BY 1

You can execute first 6 of the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

How to generate a list of first N binary numbers in Oracle SQL?

In my recent post I showed how to convert a decimal number (i.e. an integer) into a binary string. We can build upon that technique to answer the question:

WITH x AS (
SELECT LEVEL n
FROM dual
CONNECT BY LEVEL<=50
)
SELECT x.N, y.bin
FROM x, LATERAL (SELECT LISTAGG(SIGN(BITAND(x.N, POWER(2,LEVEL-1))),'') 
                        WITHIN GROUP(ORDER BY LEVEL DESC) bin
                 FROM dual
                 CONNECT BY POWER(2, LEVEL-1)<=x.N) y

Note the LATERAL keyword (Oracle 12c new feature) that enables us to reference “x” in the inline view “y”. In pre-12c world, we would have to use TABLE/CAST/MULTISET function composition to achieve the same result:

WITH x AS (
SELECT LEVEL n
FROM dual
CONNECT BY LEVEL<=50
)
SELECT x.N, y.column_value bin
FROM x, TABLE(CAST(MULTISET(
          SELECT LISTAGG(SIGN(BITAND(x.N, POWER(2,LEVEL-1))),'') 
                 WITHIN GROUP(ORDER BY LEVEL DESC) bin
          FROM dual
          CONNECT BY POWER(2, LEVEL-1)<=x.N) AS sys.odcivarchar2list)) y

The idea used in the following query is based on a totally different approach. It builds a string of “0”s and “1”s in a loop until its length reaches a desired value:

WITH x(v, n) AS (
SELECT column_value, 1
FROM TABLE(sys.odcivarchar2list('0','1'))
UNION ALL
SELECT x.v || t.column_value, x.n+1
FROM TABLE(sys.odcivarchar2list('0','1')) t JOIN x on LENGTH(x.v)=n
WHERE n<=CEIL(LOG(2,50))
), y AS (
SELECT NVL(LTRIM(x.v,'0'),'0') bin, ROWNUM-1 dec
FROM x
WHERE n=(SELECT MAX(n) FROM x)
)
SELECT *
FROM y
WHERE dec<=50

To better understand the above query, try the following one:

SELECT *                            
FROM TABLE(sys.odcivarchar2list('0','1')), 
     TABLE(sys.odcivarchar2list('0','1')),
     TABLE(sys.odcivarchar2list('0','1')),
     TABLE(sys.odcivarchar2list('0','1'))

If we put enough tables in the Cartesian product and concatenate all column_value columns in a single character string expression, we will achieve our goal. The challenge with this approach is to dynamically change the number of the tables in the FROM clause. This can be simulated in the recursive WITH clause by repeatedly adding more and more collections of bits (0 and 1).

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

 

Integer to Binary Conversion in Oracle SQL

Interestingly enough, Oracle does not have a built-in function to convert Decimal numbers (i.e. integers) into Binary. This post offers an elegant way of doing so.

The following script is intended to be executed in SQL*Plus, so it uses some SQL*Plus commands:

column bin format a40
undefine N
SELECT LISTAGG(SIGN(BITAND(&&N, POWER(2,LEVEL-1))),'') 
       WITHIN GROUP(ORDER BY LEVEL DESC) bin
FROM dual
CONNECT BY POWER(2, LEVEL-1)<=&&N;

Result (for N=400):

BIN
-------------
110010000

Result (for N=1401):

BIN
------------
10101111001

Explanation:

How many digits may the resulting binary string have? The answer comes from Math: not more than LOG(2, N) + 1. Let’s first generate a numeric range from 1 to LOG(2,N)+1:

SELECT LEVEL
FROM dual
CONNECT BY LEVEL<=LOG(2,&N)+1

Result (for N=1401):

 LEVEL
------
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11

Alternatively, we can use mathematically equivalent condition in the CONNECT BY clause using POWER instead of LOG function:

SELECT LEVEL
FROM dual
CONNECT BY POWER(2,LEVEL)<=&N*2

or

SELECT LEVEL
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&N

Now, we will check every bit of the desired result (i.e. binary representation of N) by using BITAND function:

SELECT LEVEL, BITAND(&&N, POWER(2,LEVEL-1)) bit
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&&N

Result (for N=12):

LEVEL        BIT
----- ----------
    1          0
    2          0
    3          4
    4          8

Positive values in the bit column refer to a bit 1 in the corresponding position (in reverse order) of the binary value. It’s easy to turn those values to 1 by using SIGN function:

SELECT LEVEL, SIGN(BITAND(&&N, POWER(2,LEVEL-1))) bit
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&&N

Result (for N=12):

LEVEL        BIT
----- ----------
    1          0
    2          0
    3          1
    4          1

Here, we can see that we need to concatenate the values in the bit column in reverse order. This is very easy to do using LISTAGG function:

SELECT LISTAGG(SIGN(BITAND(&&N, POWER(2,LEVEL-1))),'') 
       WITHIN GROUP(ORDER BY LEVEL DESC) bin
FROM dual
CONNECT BY POWER(2,LEVEL-1)<=&&N

Result (for N=12):

BIN
----------
1100

Note that we sorted all the rows in descending order of the LEVEL to obtain the correct order of bits.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Solutions to Puzzle of the Week #13

Puzzle of the Week #13:

Table Setup and Puzzle description can be located here

Expected Result:

  ID FULL_NAME                             GROUP_ID
---- ----------------------------------- ----------
   8 Oscar Pedro Fernando Rodriguez               1
   9 Rodriguez, Oscar Pedro Fernando              1
  10 Oscar Fernando Rodriguez Pedro               1
   1 John Smith                                   2
   2 John L. Smith                                2
   4 Smith, John                                  2
   5 Tom Khan                                     3
  11 KHAN, TOM S.                                 3

Solutions:

#1. Using CTE (Recursive WITH) and LISTAGG

WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y(id, token, lvl) AS (
SELECT name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, 1), 1 
FROM x
UNION ALL
SELECT x.name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1), y.lvl+1
FROM x JOIN y ON x.name_id=y.id AND REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1) IS NOT NULL
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name, 
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

  ID FULL_NAME                                  GROUP_ID
--- ---------------------------------------- ----------
  8 Oscar Pedro Fernando Rodrigues                    1
  9 Rodrigues, Oscar Pedro Fernando                   1
 10 Oscar Fernando Rodrigues Pedro                    1
  1 John Smith                                        2
  2 John L. Smith                                     2
  4 Smith, John                                       2
  5 Tom Khan                                          3
 11 KHAN, TOM S.                                      3

Explanation:

The key idea is to split each name into multiple name tokens, then sort and merge them back into a single line. Matching (duplicate) names will have the same merged line so we could use it to identify duplicates. DENSE_RANK analytic function is used to generate sequential group id values.

The same idea is used in the solution below. The only difference is the way to split the names into tokens.

#2: Using CONNECT BY and TABLE/CAST/MULTISET functions

 WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y AS (
SELECT name_id AS id, y.column_value AS token
FROM x,
     TABLE(CAST(MULTISET(SELECT REGEXP_SUBSTR(x.full_name, '[^ ]+', 1, LEVEL) token
                    FROM dual
                    CONNECT BY LEVEL <= LENGTH(full_name)-LENGTH(REPLACE(full_name,' '))+1
                        )
                AS sys.odcivarchar2list)
          ) y
WHERE LENGTH(y.column_value)>1
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name,
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

  ID FULL_NAME                                  GROUP_ID
---- ---------------------------------------- ----------
   8 Oscar Pedro Fernando Rodrigues                    1
   9 Rodrigues, Oscar Pedro Fernando                   1
  10 Oscar Fernando Rodrigues Pedro                    1
   1 John Smith                                        2
   2 John L. Smith                                     2
   4 Smith, John                                       2
   5 Tom Khan                                          3
  11 KHAN, TOM S.                                      3

 

7 Solutions to Puzzle of the Week #11

Puzzle of the Week #11

Produce the Employee Roll Report that satisfies the following list of requirements:

  • Use single SELECT statement
  • Single column “Names” should have a list of the employee names separated by comma
  • The maximum size of the values in the “Names” column should be 23
  • The report should have as few rows as possible
  • All the employee names should be concatenated in the alphabetical order

Expected Result:

(the Length column is added for length verification only)

Names                                        Length
---------------------------------------- ----------
ADAMS,ALLEN,BLAKE,CLARK                          23
FORD,JAMES,JONES,KING                            21
MARTIN,MILLER,SCOTT                              19
SMITH,TURNER,WARD                                17

Solutions

#1 – Using Recursive WITH clause (Common Table Expression) – to contactenate names

WITH e AS (
SELECT ename, ROW_NUMBER()OVER(ORDER BY ename) rn, 23 AS maxlen
FROM emp
), x (rn, txt, grp) AS (
SELECT 1, CAST(ename AS VARCHAR2(100)), 1
FROM e
WHERE rn=1
UNION ALL
SELECT e.rn,
       CASE WHEN LENGTH(x.txt||','||e.ename)>e.maxlen THEN e.ename
            ELSE x.txt||','||e.ename
       END,
       CASE WHEN LENGTH(x.txt||','||e.ename)>e.maxlen THEN x.grp+1
            ELSE x.grp
       END
FROM e JOIN x ON e.rn=x.rn+1
)
SELECT MAX(txt) "Names", LENGTH(MAX(txt)) "Length"
FROM x
GROUP BY grp
ORDER BY grp;

Names                         Length
------------------------- ----------
ADAMS,ALLEN,BLAKE,CLARK           23
FORD,JAMES,JONES,KING             21
MARTIN,MILLER,SCOTT               19
SMITH,TURNER,WARD                 17

#2 – Using Recursive WITH clause (Common Table Expression) – to group names AND LISTAGG function

WITH t (ename, len, rn) AS (  
SELECT ename, LENGTH(ename) + 1, ROW_NUMBER() OVER(ORDER BY ename)  
FROM emp  
), r (ename, running_len, rn, gp) AS (  
SELECT ename, len, rn, 1 
FROM t 
WHERE rn = 1  
UNION ALL  
SELECT t.ename,  
       CASE WHEN t.len > 24 - r.running_len THEN t.len ELSE r.running_len + t.len END,  
       t.rn,  
       r.gp + CASE WHEN t.len > 24 - r.running_len THEN 1 ELSE 0 END  
FROM t JOIN r ON t.rn = r.rn + 1  
)  
SELECT LISTAGG(ename, ',') WITHIN GROUP(ORDER BY rn) AS "Names", MAX(running_len) - 1 AS "Length"  
FROM r  
GROUP BY gp  
ORDER BY gp
/

#3: Using Recursive WITH clause (CTE) – to group names in a different way

WITH data (ename, grp, pass) AS (  
SELECT ename,  
    CASE WHEN SUM(LENGTH(ename) + 1) OVER(ORDER BY  ename) - 1 <= 23  
   THEN 1  
   ELSE 0  
    END, 1  
FROM emp  
UNION ALL  
SELECT ename,  
    CASE WHEN SUM(LENGTH(ename) + 1) OVER (ORDER BY  ename) - 1 <= 23  
   THEN 1  
    END, pass + 1  
FROM data  
WHERE (grp = 0 AND pass = 1) OR grp IS NULL  
), x AS (
SELECT LISTAGG(ename, ',') WITHIN GROUP(ORDER BY ename) AS names, pass  
FROM data  
WHERE grp = 1  
GROUP BY pass 
)  
SELECT names "Names", LENGTH(names) AS "Length"  
FROM x 
ORDER BY 1;

#4: Using XMLAGG with Regular Expressions

WITH t AS (
SELECT TRIM(',' FROM XMLAGG(xmlelement(e, ename||',') ORDER BY ename).EXTRACT('//text()')) AS txt
FROM  emp
), x AS (
SELECT LEVEL AS l,
       TRIM(',' FROM TRIM(REGEXP_SUBSTR(txt,'.{1,23}(,|$)',1,LEVEL))) AS names
       FROM t
       CONNECT BY TRIM(',' FROM TRIM(REGEXP_SUBSTR(txt,'.{1,23}(,|$)',1,LEVEL))) IS NOT NULL
)
SELECT names "Names", LENGTH(names) "Length"
FROM x
/

#5: Using LISTAGG with Regular Expressions

WITH  x AS (
SELECT LISTAGG (ename, ',') WITHIN GROUP (ORDER BY 1) str
FROM emp
)
SELECT RTRIM(REGEXP_SUBSTR (str, '.{1,23}(,|$)', 1, LEVEL), ',')  "Names",
       LENGTH(RTRIM(REGEXP_SUBSTR (str, '.{1,23}(,|$)', 1, LEVEL), ',')) "Length"
FROM x
CONNECT BY RTRIM(REGEXP_SUBSTR (str, '.{1,23}(,|$)', 1, LEVEL), ',') IS NOT NULL

#6: Using MODEL clause for grouping names

WITH m AS (
SELECT i, ename, grp, len, prevlen  
FROM emp  
MODEL  
   DIMENSION BY (ROW_number() OVER (ORDER BY  ename) AS i)  
   MEASURES 
    (
       ename AS ename, 
    CAST('' AS VARCHAR2(24)) AS names,
    0 AS grp,
    0 AS len, 
    0 AS prevlen
 )  
    RULES 
 (
   len[i] = LENGTH(ename[CV()]),
   prevlen[i] = CASE WHEN (CASE WHEN NVL(prevlen[CV()-1],0) = 0 THEN NVL(len[CV()-1],0) 
           ELSE NVL(prevlen[CV()-1],0) + 1 +  NVL(len[CV()-1],0) 
         END) > 23  
         THEN NVL(len[CV()-1],0)  
         ELSE CASE WHEN NVL(prevlen[CV()-1],0) = 0 THEN NVL(len[CV()-1],0) 
          ELSE NVL(prevlen[CV()-1],0) + 1 +  NVL(len[CV()-1],0) 
           END  
       END,
   grp[i] = NVL(grp[CV()-1],0) + CASE WHEN prevlen[CV()+1] < prevlen[CV()] THEN 1 ELSE 0 END   
 )  
)             
SELECT LISTAGG(ename,',') WITHIN GROUP (ORDER BY ename) AS "Names" , LENGTH(listagg(ename,',') WITHIN GROUP (ORDER BY  ename)) AS "Length"  
FROM m  
GROUP BY grp;  
 

#7: Oracle 12c Solution – Using MATCH_RECOGNIZE clause

SELECT  LISTAGG(name,',') WITHIN GROUP(ORDER BY name) "Names",
        LENGTH(LISTAGG(name,',') WITHIN GROUP(ORDER BY name)) "Length"
FROM  EMP
MATCH_RECOGNIZE
 (
  ORDER BY ENAME
  MEASURES
 MATCH_NUMBER() rn,
 UP.ENAME name
 ALL ROWS PER MATCH
 PATTERN (UP+)
 DEFINE
   UP AS SUM(LENGTH(UP.ENAME || ',')) <= 24
  )
GROUP BY RN
ORDER BY RN

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

Get a comma separated line of Oracle table column names

When you have to manually put together an insert statement into a table with large number of columns you may feel some pain – it is tedious to concatenate all the columns in a comma delimited line. The following small procedure may save you some time and effort.

CREATE OR REPLACE PROCEDURE sp_get_column_line(p_table VARCHAR2)
AS
    v_result VARCHAR2(2000);
BEGIN
    SELECT listagg (column_name, ', ') WITHIN GROUP (ORDER BY column_id)
    INTO v_result
    FROM user_tab_cols
    WHERE table_name=UPPER(p_table);

    DBMS_OUTPUT.PUT_LINE(LOWER(v_result));
END;
/
SQL> exec sp_get_column_line('emp')

PL/SQL procedure successfully completed.

SQL> SET SERVEROUTPUT ON
SQL> exec sp_get_column_line('emp')
empno, ename, job, mgr, hiredate, sal, comm, deptno

PL/SQL procedure successfully completed.

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

How to Dynamically Generate SELECT Statement with all Table Columns Excluding a Given List

Problem: Dynamically Generate SELECT Statement with all Table Columns Excluding a Given List

Level: Intermediate/Advanced

Let say, we need to dynamically generate a SELECT statement that lists all columns from emp table except for hiredate and comm.

SQL Solution:

SELECT 'SELECT ' || LOWER(LISTAGG(column_name,', ') WITHIN GROUP (ORDER BY column_id)) || ' FROM ' || table_name AS SQL
FROM user_tab_columns
WHERE table_name='EMP'
  AND column_name NOT IN ('COMM', 'HIREDATE')
GROUP BY table_name
/

--Result:
SQL
------------------------------------------------------
SELECT empno, ename, job, mgr, sal, deptno FROM EMP

You can also do it in a number of different ways in PL/SQL.

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.