Substitution SQL Puzzle

May 17, 2020, Zahar Hilkevich

Level: Advanced

A colleague of mine approached me recently with a puzzle he struggled with: you have a table (let’s call it data_table) with id and val (i.e. value) columns. You are given two parameters: value_to_overwrite and value_to_use that should transform the content of the data_table in a special way:

If both parameters exist in the data_table in the val column for the same id, then the one that is equal to value_to_overwrite should be substituted with value_to_use
If none or just one of the parameters exists in the data_table.val column for a given id, then the val column should remain the same
List all the rows from the data_table after the transformation.

Let’s create the data_table using the following DDL command:

CREATE TABLE data_table AS
SELECT 1 id, 'a' val FROM dual
UNION ALL
SELECT 1 id, 'b' val FROM dual
UNION ALL
SELECT 1 id, 'c' val FROM dual
UNION ALL
SELECT 2 id, 'b' val FROM dual
UNION ALL
SELECT 2 id, 'd' val FROM dual

ID	VAL
1	a
1	b
1	c
2	b
2	d

For parameters value_to_overwrite = ‘b’ and value_to_use = ‘a’, the expected result should look like this:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	c
2	b	b
2	d	d

Note that for id = 1, the value ‘b’ is substituted with the new value ‘a’ because both value_to_overwrite (‘b’) and value_to_use (‘a’) exist in the val column for that id. All other values remain the same because the substitution condition is not met.

To mimic the parameter use in the query we will create another table (rule_table) with a single row in it.

CREATE TABLE rule_table AS
SELECT 'a' value_to_use, 'b' value_to_overwrite
FROM dual

Translating requirements from English to SQL will likely result in a bulky and inefficient query. Let’s demonstrate that:

/* Values that need to be substituted */
SELECT d.id, d.val AS original_value, r.value_to_use AS new_value
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use IN (SELECT val
                         FROM data_table
                         WHERE id = d.id)
UNION ALL
/* Values that remain the same as only value_to_overwrite exist for given id */
SELECT d.id, d.val, d.val
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use NOT IN (SELECT val
                             FROM data_table
                             WHERE id = d.id)
UNION ALL
/* Values that remain the same as value_to_overwrite does not match val */
SELECT d.id, d.val, d.val
FROM data_table d
WHERE d.val NOT IN (SELECT value_to_overwrite
                    FROM rule_table)

As you can see, five copies of data_table are used, which will lead to poor performance as the table grows.

A much better approach is to take the first SELECT from the UNIONed statement above and turn the INNER JOIN into a LEFT OUTER JOIN. At the same time, we need to move the filtering condition from the WHERE clause to the JOIN (otherwise, the LEFT JOIN will work as an INNER JOIN):

SELECT d.id,
       d.val                      AS original_value,
       NVL(r.value_to_use, d.val) AS new_value
FROM data_table d LEFT JOIN rule_table r 
                  ON d.val = r.value_to_overwrite
                 AND r.value_to_use IN (SELECT val
                                        FROM data_table
                                        WHERE id = d.id)

This is a quite efficient and fairly short query that uses only two copies of the data_table. Can we do better than that? Yes, we can!

WITH x AS (
SELECT id, val,
       MIN(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  min_val,
       MAX(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  max_val,
       LEAST(value_to_use, value_to_overwrite)    min_ow,
       GREATEST(value_to_use, value_to_overwrite) max_ow,
       value_to_use, value_to_overwrite
FROM data_table CROSS JOIN rule_table 
)
SELECT id, val AS original_value,
       CASE WHEN min_val=min_ow AND
                 max_val=max_ow AND
                 val=value_to_overwrite THEN value_to_use
       ELSE val
       END AS new_value
FROM x

The analytic functions MIN and MAX let us scan data_table vertically (across the rows of each id), while LEAST and GREATEST do the same horizontally (across the two columns of the rule). The latter pair of functions comes in very handy when you need to compare pairs of values: the smaller of the two values should match LEAST and the other one GREATEST.
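The pairing of MIN/MAX with LEAST/GREATEST can be tried outside the database. The following is only a rough Python model of the comparison, not the query itself; the helper names both_present and substitute are made up for this illustration.

```python
def both_present(vals, value_to_use, value_to_overwrite):
    """True when both rule values occur among the vals of one id."""
    rule = (value_to_use, value_to_overwrite)
    # CASE WHEN val IN (value_to_use, value_to_overwrite) THEN val END
    matching = [v for v in vals if v in rule]
    if not matching:              # MIN/MAX over no matching rows would be NULL
        return False
    # "vertical" MIN/MAX over the rows vs. "horizontal" LEAST/GREATEST over the rule
    return (min(matching), max(matching)) == (min(rule), max(rule))


def substitute(rows, value_to_use, value_to_overwrite):
    """rows: list of (id, val); returns (id, original_value, new_value) tuples."""
    vals_by_id = {}
    for row_id, val in rows:
        vals_by_id.setdefault(row_id, []).append(val)
    result = []
    for row_id, val in rows:
        hit = val == value_to_overwrite and both_present(
            vals_by_id[row_id], value_to_use, value_to_overwrite)
        result.append((row_id, val, value_to_use if hit else val))
    return result


data_table = [(1, "a"), (1, "b"), (1, "c"), (2, "b"), (2, "d")]
for row in substitute(data_table, value_to_use="a", value_to_overwrite="b"):
    print(row)
```

For id = 2 only ‘b’ is present, so the MIN/MAX pair is (b, b) while the LEAST/GREATEST pair is (a, b), and nothing is substituted.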

And still, the last strategy has one flaw: we used a Cartesian product (CROSS JOIN), which means that had we had more than one substitution rule, the method would not work properly. Let’s fix it.

First, we will add one more rule:

INSERT INTO rule_table VALUES('b', 'c')

Now, the expected result should look as follows:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	b
2	b	b
2	d	d

Note that the second rule turns the original ‘c’ value into ‘b’.

And again, analytic functions do all the magic:

WITH x AS (
SELECT id, val, value_to_overwrite, value_to_use,
       LEAST(value_to_overwrite, value_to_use) || '|' ||
       GREATEST(value_to_overwrite, value_to_use) rule_vals,
       LISTAGG(DISTINCT val, '|') WITHIN GROUP(ORDER BY val)
       OVER(PARTITION BY id) vals
FROM data_table LEFT JOIN rule_table ON val = value_to_overwrite
)
SELECT id, val AS original_value,
       CASE WHEN value_to_overwrite IS NULL THEN val
            WHEN INSTR(vals, rule_vals)=0 THEN val
            ELSE value_to_use
       END     AS new_value
FROM x

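This time, the LISTAGG analytic function (with the DISTINCT option, supported since Oracle 19c) helps match val against the value_to_overwrite and value_to_use pair. Keep in mind that INSTR looks for the two rule values as neighbors in the sorted list, so a rule such as ('a', 'c') would not fire for id = 1, whose list is a|b|c.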
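The intended result of the two-rule example can be reproduced with a small Python model. This is only a sketch: it uses plain set membership in place of the LISTAGG/INSTR string test, the function name apply_rules is hypothetical, and it assumes at most one rule per value_to_overwrite.

```python
def apply_rules(rows, rules):
    """rows: (id, val) pairs; rules: (value_to_use, value_to_overwrite) pairs."""
    vals_by_id = {}
    for row_id, val in rows:
        vals_by_id.setdefault(row_id, set()).add(val)    # distinct vals per id
    # assumption: each value_to_overwrite appears in at most one rule
    use_for = {overwrite: use for use, overwrite in rules}
    out = []
    for row_id, val in rows:
        use = use_for.get(val)          # None plays the role of a LEFT JOIN miss
        if use is not None and use in vals_by_id[row_id]:
            out.append((row_id, val, use))
        else:
            out.append((row_id, val, val))
    return out


data_table = [(1, "a"), (1, "b"), (1, "c"), (2, "b"), (2, "d")]
rule_table = [("a", "b"), ("b", "c")]   # same column order as rule_table
for row in apply_rules(data_table, rule_table):
    print(row)
```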

I strongly recommend executing parts of the above queries to gain a better understanding of the demonstrated strategies. The livesql.oracle.com site offers a great query tool with the latest version of Oracle database.


My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Interview Question: For each department count the number of employees who get no commission.

September 10, 2018, Zahar Hilkevich

Interview Question:

Write a single SELECT statement that returns the number of employees who get no commission broken down by department. (Use scott.emp table)

Level:

Intermediate

Expected Result:

DEPTNO	NO_COMM_COUNT
10	3
20	5
30	3

Solutions

A very typical attempt to solve this problem results in the following query:

SELECT deptno, COUNT(*) no_comm_count
FROM scott.emp
WHERE comm IS NULL OR comm=0
GROUP BY deptno
ORDER BY 1

Yes, the result looks correct, but is the query correct?

The answer is NO! It would become apparent if we had a department where all employees are paid a commission, so the number of those who are not would be 0.

Let’s change the requirement a bit: we will show all departments and the number of employees hired on Friday:

SELECT deptno, COUNT(*) fri_count
FROM scott.emp
WHERE TO_CHAR(hiredate, 'DY')='FRI'
GROUP BY deptno 
ORDER BY 1

The result of this query is clearly not what we want:

DEPTNO	FRI_COUNT
30	2

We would expect the following instead:

DEPTNO	FRI_COUNT
10	0
20	0
30	2

Why don’t we get departments 10 and 20? The answer is very simple: our WHERE clause filters out all the rows of those departments. So how should we work around this?

Let’s start with more intuitive but less efficient approaches: we will use the same query as before and UNION it with another query that returns “empty” departments. Essentially, the original problem transforms into a new one: find all departments where no employees were hired on Friday.

Strategy #1: Using UNION ALL with multi-column non-correlated subquery:

SELECT deptno, COUNT(*) fri_count 
FROM scott.emp 
WHERE TO_CHAR(hiredate, 'DY')='FRI' 
GROUP BY deptno
UNION ALL
SELECT deptno, 0 fri_count 
FROM scott.emp
WHERE (deptno, 'FRI') NOT IN (SELECT deptno, TO_CHAR(hiredate, 'DY')
                              FROM scott.emp)
GROUP BY deptno
ORDER BY 1

DEPTNO	FRI_COUNT
10	0
20	0
30	2

Strategy #2: Using UNION ALL with ALL predicate on correlated subquery:

SELECT deptno, COUNT(*) fri_count
FROM scott.emp
WHERE TO_CHAR(hiredate, 'DY')='FRI'
GROUP BY deptno
UNION ALL 
SELECT deptno, 0 fri_count
FROM scott.emp a
WHERE 'FRI'!=ALL(SELECT TO_CHAR(hiredate, 'DY')
                 FROM scott.emp b
                 WHERE a.deptno=b.deptno) 
GROUP BY deptno
ORDER BY 1

It is apparent that the ALL predicate ensures that no employees were hired on Friday.

Now we will mimic the behavior of the UNION ALL operator using LEFT JOIN:

Strategy #3: Using LEFT JOIN:

SELECT a.deptno, COUNT(DISTINCT b.empno) fri_count
FROM scott.emp a LEFT JOIN scott.emp b ON a.deptno=b.deptno
                                      AND TO_CHAR(b.hiredate, 'DY')='FRI'
GROUP BY a.deptno
ORDER BY 1

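COUNT(DISTINCT …) is needed because the self-join on the deptno column produces a many-to-many relationship: within a department, every row of a is paired with every Friday hire in b, so each Friday hire would otherwise be counted once per employee of that department.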

Strategy #4: Generic substitution technique for an outer-join using UNION ALL

WITH e AS (
SELECT deptno, COUNT(*) fri_count
FROM scott.emp
WHERE TO_CHAR(hiredate, 'DY') = 'FRI'
GROUP BY deptno
UNION ALL
SELECT deptno, 0
FROM scott.emp
GROUP BY deptno
)
SELECT deptno, MAX(fri_count) fri_count
FROM e
GROUP BY deptno
ORDER BY 1

All the above techniques may look cool, but they are clearly overkill for such a simple problem. There is a simple rule worth remembering:

If you need to conditionally aggregate all records in the table but fail to do so because of a WHERE clause filter, consider moving the filter into the aggregate (GROUP) function you use in SELECT.

Strategy #5: Conditional Aggregation

SELECT deptno, COUNT(DECODE(TO_CHAR(hiredate, 'DY'), 'FRI', 1)) fri_count
FROM scott.emp
GROUP BY deptno
ORDER BY 1

Alternatively, you can use a CASE expression inside of COUNT. It is especially convenient for our original problem, i.e. to count employees who are not paid a commission:

SELECT deptno, COUNT(CASE WHEN LNNVL(comm>0) THEN 1 END) no_comm_count
FROM scott.emp
GROUP BY deptno
ORDER BY 1

DEPTNO	NO_COMM_COUNT
10	3
20	5
30	3

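This approach is the most efficient as it makes Oracle scan the emp table only once.

Notice the use of the LNNVL function: it returns TRUE when its condition evaluates to FALSE or UNKNOWN, so LNNVL(comm>0) covers both comm=0 and comm IS NULL. You can read more about it in my recent post.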
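The difference between filtering in WHERE and filtering inside the aggregate is easy to reproduce on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle (so no LNNVL or DECODE), and the six rows are hypothetical, not scott.emp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, comm REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [   # hypothetical rows for illustration only
        (1, 10, None), (2, 10, 0),
        (3, 20, 300), (4, 20, 500),     # everyone in department 20 earns a commission
        (5, 30, None), (6, 30, 1400),
    ],
)

# Filter in WHERE: departments without a qualifying row disappear.
filtered = conn.execute(
    "SELECT deptno, COUNT(*) FROM emp "
    "WHERE comm IS NULL OR comm = 0 GROUP BY deptno ORDER BY deptno"
).fetchall()

# Filter inside COUNT: every department is kept, non-matching rows count as NULL.
conditional = conn.execute(
    "SELECT deptno, COUNT(CASE WHEN comm IS NULL OR comm = 0 THEN 1 END) "
    "FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

print(filtered)
print(conditional)
```

Only the second query reports department 20, with a count of 0.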


3 Solutions to 2018 Oracle SQL Puzzle of the Week #2

January 22, 2018, Zahar Hilkevich

2018 Puzzle of the Week #2:

For each of the following salary ranges select two randomly chosen employees:
0-999
1000-1999
2000-2999
3000+

Expected Result (in SQL*Plus):

ENAME      SAL        RANGE
---------- ---------- ---------
SCOTT            3000 3000+
FORD             3000 3000+
BLAKE            2850 2000-2999
CLARK            2450 2000-2999
TURNER           1500 1000-1999
MILLER           1300 1000-1999
JAMES             950 0-999
SMITH             800 0-999

Remember to use only a single SELECT statement.
Use table emp (from Oracle scott schema)

Solutions:

Solution #1: Using ROW_NUMBER with the dbms_random.value function:

We apply a random sort order within each salary range and take the top 2 records from each range:

WITH x AS (
SELECT CASE WHEN sal<=999  THEN '0-999'
            WHEN sal<=1999 THEN '1000-1999'
            WHEN sal<=2999 THEN '2000-2999'
            ELSE                '3000+'
       END range,
       ename, sal
FROM emp
), y AS (
SELECT ename, sal, range, 
       ROW_NUMBER()OVER(PARTITION BY range 
                        ORDER BY dbms_random.value) rn
FROM x
)
SELECT range, ename, sal
FROM y
WHERE rn<=2
ORDER BY range

Result:

RANGE     ENAME             SAL
--------- ---------- ----------
0-999     JAMES             950
0-999     SMITH             800
1000-1999 WARD             1250
1000-1999 TURNER           1500
2000-2999 JONES            2975
2000-2999 CLARK            2450
3000+     FORD             3000
3000+     KING             5000

Result (of subsequent execution):

RANGE     ENAME             SAL
--------- ---------- ----------
0-999     SMITH             800
0-999     JAMES             950
1000-1999 WARD             1250
1000-1999 MARTIN           1250
2000-2999 BLAKE            2850
2000-2999 JONES            2975
3000+     SCOTT            3000
3000+     KING             5000
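The idea behind Solution #1 (shuffle each salary range, keep the first two) can be mirrored in a few lines of Python. This is a sketch rather than a translation of the query: the helper names are made up, and the EMP list holds the standard scott.emp names and salaries.

```python
import random

EMP = [("SMITH", 800), ("ALLEN", 1600), ("WARD", 1250), ("JONES", 2975),
       ("MARTIN", 1250), ("BLAKE", 2850), ("CLARK", 2450), ("SCOTT", 3000),
       ("KING", 5000), ("TURNER", 1500), ("ADAMS", 1100), ("JAMES", 950),
       ("FORD", 3000), ("MILLER", 1300)]


def salary_range(sal):
    """Same buckets as the CASE expression in Solution #1."""
    if sal <= 999:
        return "0-999"
    if sal <= 1999:
        return "1000-1999"
    if sal <= 2999:
        return "2000-2999"
    return "3000+"


def pick_per_range(emps, k=2, rng=None):
    """Return up to k randomly chosen (range, ename, sal) rows per salary range."""
    rng = rng or random.Random()
    buckets = {}
    for ename, sal in emps:
        buckets.setdefault(salary_range(sal), []).append((ename, sal))
    picked = []
    for label in sorted(buckets):                 # ORDER BY range
        members = buckets[label]
        rng.shuffle(members)                      # ORDER BY dbms_random.value
        picked += [(label, ename, sal) for ename, sal in members[:k]]  # rn <= 2
    return picked


for row in pick_per_range(EMP):
    print(row)
```

Passing a seeded random.Random instance as rng makes a run repeatable, which is handy when comparing outputs.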

Solution #2: Using DECODE, MAX() KEEP and UNION ALL:

Instead of taking the top 2 records (randomly sorted), we take the top 1 and the bottom 1 and combine them. The DECODE function mimics the CASE expression from the previous solution.

WITH x AS (
SELECT DECODE(1, SIGN(1000-sal), '0-999', SIGN(2000-sal), '1000-1999',
                 SIGN(3000-sal), '2000-2999', '3000+') range, /* SIGN(...)=1 means sal is below the bound, so 999, 1999 and 2999 stay in their own range */
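       ename, sal,
       /* rnd is a string: the random part goes first so that it, not ROWNUM,
          drives the sort order; ROWNUM only keeps the value unique */
       dbms_random.value || '|' || ROWNUM rnd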
FROM scott.emp
)
SELECT range, MAX(ename)KEEP(DENSE_RANK FIRST ORDER BY rnd) ename,
              MAX(sal)  KEEP(DENSE_RANK FIRST ORDER BY rnd) sal
FROM x
GROUP BY range
UNION ALL
SELECT range, MAX(ename)KEEP(DENSE_RANK LAST ORDER BY rnd) ename,
              MAX(sal)  KEEP(DENSE_RANK LAST ORDER BY rnd) sal
FROM x
GROUP BY range
ORDER BY range

Result:

RANGE     ENAME             SAL
--------- ---------- ----------
0-999     JAMES             950
0-999     SMITH             800
1000-1999 MARTIN           1250
1000-1999 WARD             1250
2000-2999 JONES            2975
2000-2999 BLAKE            2850
3000+     FORD             3000
3000+     KING             5000

Note that we concatenated ROWNUM with dbms_random.value to produce a UNIQUE random value. Without ROWNUM (or any other key) there is always a chance that dbms_random.value repeats on different rows, in which case the top and bottom values could get mixed up and the same employee would be listed twice.

Solution #3: Using SIN for random value simulation and multi-column UNPIVOT with MAX() KEEP function:

Instead of combining the top and bottom records from two statements using UNION ALL, here we calculate the top and bottom values as one record and UNPIVOT them to produce two rows per salary range:

WITH x AS (
SELECT DECODE(1, SIGN(1000-sal), '0-999', SIGN(2000-sal), '1000-1999',
                 SIGN(3000-sal), '2000-2999', '3000+') range, /* SIGN(...)=1 means sal is below the bound, so 999, 1999 and 2999 stay in their own range */
       ename, sal, 
       SIN(ROWNUM*TO_NUMBER(SUBSTR(
                             extract(second 
                                     from current_timestamp),-3))
           ) rnd
FROM scott.emp
), y AS (
SELECT range, MAX(ename)KEEP(DENSE_RANK FIRST ORDER BY rnd) ename1,
              MAX(sal)  KEEP(DENSE_RANK FIRST ORDER BY rnd) sal1,
              MAX(ename)KEEP(DENSE_RANK LAST ORDER BY rnd) ename2,
              MAX(sal)  KEEP(DENSE_RANK LAST ORDER BY rnd) sal2
FROM x
GROUP BY range
)
SELECT range, ename, sal
FROM y
UNPIVOT (
  (ename, sal) for (t1, t2) in ((ename1,sal1), (ename2,sal2))
)
ORDER BY range

Result:

RANGE     ENAME             SAL
--------- ---------- ----------
0-999     SMITH             800
0-999     JAMES             950
1000-1999 MILLER           1300
1000-1999 MARTIN           1250
2000-2999 CLARK            2450
2000-2999 BLAKE            2850
3000+     FORD             3000
3000+     SCOTT            3000

Note the use of multi-column UNPIVOT. Randomization simulation is based on a fairly random selection of the last 3 digits in the current timestamp’s second value. This number is used as a “seed”. When this seed is multiplied by the rownum, the result is used as a SIN function argument which makes the outcome pseudo-random.
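To get a feel for how SIN scrambles the row order, here is a tiny Python model. It is an illustration only: sin_order is a made-up helper, and the seeds 678 and 679 are arbitrary stand-ins for the last three digits of the timestamp's seconds.

```python
import math


def sin_order(n_rows, seed):
    """Row numbers 1..n_rows sorted by SIN(rownum * seed)."""
    return sorted(range(1, n_rows + 1),
                  key=lambda rownum: math.sin(rownum * seed))


print(sin_order(14, 678))   # one seed, one ordering
print(sin_order(14, 679))   # a neighbouring seed gives a different ordering
```

Even neighbouring seeds lead to different orderings, which is why the outcome looks random from one execution to the next.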

You can execute the above SQL statements in Oracle Live SQL environment.


How to Simulate SIGN Function

February 16, 2017, Zahar Hilkevich

Puzzle of the day:

How can the SIGN function be simulated in Oracle SQL using only the CEIL, FLOOR, and ABS functions along with arithmetic operators? No PL/SQL.

Solution:

SIGN(x)=CEIL(x/(1+ABS(x)))+FLOOR(x/(1+ABS(x)))

In SQL, we can demonstrate it as follows:

WITH r AS (
SELECT dbms_random.VALUE(-999,999) rnd
FROM dual
CONNECT BY LEVEL<=10
UNION ALL
SELECT 0
FROM dual
)
SELECT rnd, SIGN(rnd), CEIL(rnd/(1+ABS(rnd)))+FLOOR(rnd/(1+ABS(rnd))) "MySign"
FROM r

Result:

       RND  SIGN(RND)     MySign
---------- ---------- ----------
  -519.606         -1         -1
-657.62692         -1         -1
414.625079          1          1
736.175183          1          1
268.689074          1          1
-647.12649         -1         -1
338.192233          1          1
784.780876          1          1
-529.69184         -1         -1
-596.56803         -1         -1
         0          0          0

As you can see, the “MySign” column perfectly matches the SIGN column.

Comment:

The WITH clause is needed to generate 10 random values in the range of -999 .. +999. The “0” value is added to demonstrate a special case, as it is unlikely that zero will be randomly generated.
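Why does the formula work? Dividing x by 1+ABS(x) squeezes any number into the open interval (-1, 1) without changing its sign. For a positive fraction CEIL gives 1 and FLOOR gives 0; for a negative one CEIL gives 0 and FLOOR gives -1; for zero both give 0. A minimal Python check of that reasoning (my_sign is just an illustrative name):

```python
import math


def my_sign(x):
    """SIGN(x) built from CEIL, FLOOR and ABS only."""
    t = x / (1 + abs(x))          # same sign as x, strictly between -1 and 1
    # note: with binary floats, a huge |x| rounds t to exactly +/-1,
    # so this model is meant for moderate values only
    return math.ceil(t) + math.floor(t)


for x in (-519.606, -0.001, 0, 0.001, 736.175183):
    print(x, my_sign(x))
```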


Solutions to Puzzle of the Week #13

June 6, 2016, Zahar Hilkevich

Puzzle of the Week #13:

The table setup and the puzzle description can be found in the original Puzzle of the Week #13 post.

Expected Result:

  ID FULL_NAME                             GROUP_ID
---- ----------------------------------- ----------
   8 Oscar Pedro Fernando Rodriguez               1
   9 Rodriguez, Oscar Pedro Fernando              1
  10 Oscar Fernando Rodriguez Pedro               1
   1 John Smith                                   2
   2 John L. Smith                                2
   4 Smith, John                                  2
   5 Tom Khan                                     3
  11 KHAN, TOM S.                                 3

Solutions:

#1. Using CTE (Recursive WITH) and LISTAGG

WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y(id, token, lvl) AS (
SELECT name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, 1), 1 
FROM x
UNION ALL
SELECT x.name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1), y.lvl+1
FROM x JOIN y ON x.name_id=y.id AND REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1) IS NOT NULL
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name, 
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

 ID FULL_NAME                                  GROUP_ID
--- ---------------------------------------- ----------
  8 Oscar Pedro Fernando Rodrigues                    1
  9 Rodrigues, Oscar Pedro Fernando                   1
 10 Oscar Fernando Rodrigues Pedro                    1
  1 John Smith                                        2
  2 John L. Smith                                     2
  4 Smith, John                                       2
  5 Tom Khan                                          3
 11 KHAN, TOM S.                                      3

Explanation:

The key idea is to split each name into multiple name tokens, then sort and merge them back into a single line. Matching (duplicate) names will have the same merged line so we could use it to identify duplicates. DENSE_RANK analytic function is used to generate sequential group id values.
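The split, sort, and merge idea is independent of SQL. Below is a rough Python sketch of it, using the rows visible in the expected result plus one hypothetical non-duplicate (id 99, "Mary Jones"); the helper names are made up, and the regular expression only approximates [[:punct:]].

```python
import re
from collections import defaultdict


def name_key(full_name):
    """Upper-case, strip punctuation, drop one-letter tokens, sort, re-join."""
    cleaned = re.sub(r"[^\w\s]", "", full_name).upper()
    tokens = [t for t in cleaned.split() if len(t) > 1]    # LENGTH(token) > 1
    return " ".join(sorted(tokens))                        # LISTAGG ... ORDER BY token


def duplicate_groups(names):
    """Map name_id -> group_id for names that share a key with another name."""
    by_key = defaultdict(list)
    for name_id, full_name in names:
        by_key[name_key(full_name)].append(name_id)
    dup_keys = sorted(k for k, ids in by_key.items() if len(ids) > 1)   # cnt > 1
    return {name_id: group_id                              # DENSE_RANK over the key
            for group_id, key in enumerate(dup_keys, start=1)
            for name_id in by_key[key]}


name_list = [
    (1, "John Smith"), (2, "John L. Smith"), (4, "Smith, John"),
    (5, "Tom Khan"), (11, "KHAN, TOM S."),
    (8, "Oscar Pedro Fernando Rodriguez"),
    (9, "Rodriguez, Oscar Pedro Fernando"),
    (10, "Oscar Fernando Rodriguez Pedro"),
    (99, "Mary Jones"),          # hypothetical row without a duplicate
]
print(duplicate_groups(name_list))
```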

The same idea is used in the solution below. The only difference is the way to split the names into tokens.

#2: Using CONNECT BY and TABLE/CAST/MULTISET functions

WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y AS (
SELECT name_id AS id, y.column_value AS token
FROM x,
     TABLE(CAST(MULTISET(SELECT REGEXP_SUBSTR(x.full_name, '[^ ]+', 1, LEVEL) token
                    FROM dual
                    CONNECT BY LEVEL <= LENGTH(full_name)-LENGTH(REPLACE(full_name,' '))+1
                        )
                AS sys.odcivarchar2list)
          ) y
WHERE LENGTH(y.column_value)>1
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name,
       COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
       DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

  ID FULL_NAME                                  GROUP_ID
---- ---------------------------------------- ----------
   8 Oscar Pedro Fernando Rodrigues                    1
   9 Rodrigues, Oscar Pedro Fernando                   1
  10 Oscar Fernando Rodrigues Pedro                    1
   1 John Smith                                        2
   2 John L. Smith                                     2
   4 Smith, John                                       2
   5 Tom Khan                                          3
  11 KHAN, TOM S.                                      3

Round-Robin Schedule SQL Puzzle

April 13, 2016, Zahar Hilkevich

Puzzle: With a single SELECT statement, create a schedule of play for a round-robin tournament. The query should work for an odd or even number of players. For an odd number of players, the player with a bye should be listed last for that round.

You can read about round-robin tournament definition here: https://en.wikipedia.org/wiki/Round-robin_tournament

Expected Results:

4 players:

 ROUND PAIRING
------ --------
     1 1 vs 2
     1 3 vs 4
     2 1 vs 4
     2 2 vs 3
     3 3 vs 1
     3 4 vs 2

ROUND PAIRING
----- --------
    1 1 vs 2
      3 vs 4

    2 1 vs 4
      2 vs 3

    3 3 vs 1
      4 vs 2

5 players:

 ROUND PAIRING
------ --------
     1 5 vs 1
     1 4 vs 2
     1 3 - bye
     2 1 vs 2
     2 5 vs 3
     2 4 - bye
     3 1 vs 4
     3 2 vs 3
     3 5 - bye
     4 2 vs 5
     4 3 vs 4
     4 1 - bye
     5 3 vs 1
     5 4 vs 5
     5 2 - bye

ROUND PAIRING
----- -------
    1 5 vs 1
      4 vs 2
      3 - bye

    2 1 vs 2
      5 vs 3
      4 - bye

    3 1 vs 4
      2 vs 3
      5 - bye

    4 2 vs 5
      3 vs 4
      1 - bye

    5 3 vs 1
      4 vs 5
      2 - bye

6 players:

ROUND PAIRING
----- --------
    1 5 vs 1
    1 4 vs 2
    1 3 vs 6
    2 1 vs 2
    2 5 vs 3
    2 6 vs 4
    3 1 vs 4
    3 2 vs 3
    3 5 vs 6
    4 1 vs 6
    4 2 vs 5
    4 3 vs 4
    5 3 vs 1
    5 6 vs 2
    5 4 vs 5

ROUND PAIRING
----- -------
    1 5 vs 1
      4 vs 2
      3 vs 6

    2 1 vs 2
      5 vs 3
      6 vs 4

    3 1 vs 4
      2 vs 3
      5 vs 6

    4 1 vs 6
      2 vs 5
      3 vs 4

    5 3 vs 1
      6 vs 2
      4 vs 5
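A schedule like the ones above is a proper round-robin when no player is booked twice in a round and every pair of players meets exactly once. The small Python checker below is a sketch under those two rules only (byes for an odd number of players are not modelled, and is_valid_round_robin is a hypothetical helper); it is fed the 6-player schedule listed above.

```python
from itertools import combinations


def is_valid_round_robin(schedule, players):
    """schedule: {round: [(p1, p2), ...]}; True if each pair meets exactly once."""
    seen = set()
    for pairs in schedule.values():
        in_round = [p for pair in pairs for p in pair]
        if len(in_round) != len(set(in_round)):   # a player booked twice in a round
            return False
        for a, b in pairs:
            pair = frozenset((a, b))
            if pair in seen:                      # the same two players meet again
                return False
            seen.add(pair)
    every_pair = {frozenset(c) for c in combinations(range(1, players + 1), 2)}
    return seen == every_pair


six_players = {
    1: [(5, 1), (4, 2), (3, 6)],
    2: [(1, 2), (5, 3), (6, 4)],
    3: [(1, 4), (2, 3), (5, 6)],
    4: [(1, 6), (2, 5), (3, 4)],
    5: [(3, 1), (6, 2), (4, 5)],
}
print(is_valid_round_robin(six_players, 6))
```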

Solution (SQL*Plus script):

accept players prompt "Enter the number of players: "
set pagesize 100
break on "ROUND" skip 1
col pairing for a10

WITH prompt AS (
   SELECT &players AS oplayers
   FROM dual
), x AS (
SELECT LEVEL n, oplayers,  oplayers + MOD(oplayers,2) AS players
FROM dual, prompt
CONNECT BY LEVEL<=oplayers + MOD(oplayers,2)-1
), w AS (
SELECT a.n AS rnd, 1+MOD(a.n+b.n,a.players-1) AS p,
       CASE WHEN ROW_NUMBER()OVER(PARTITION BY a.n ORDER BY 1)<=a.players/2 THEN ROW_NUMBER()OVER(PARTITION BY a.n ORDER BY a.n, b.n )-1
            ELSE a.players-ROW_NUMBER()OVER(PARTITION BY a.n ORDER BY 1)
       END AS match_id, a.oplayers  
FROM x a, x b 
UNION ALL
SELECT n, players AS p, 0 AS match_id, oplayers
FROM x
)
SELECT rnd AS "ROUND", 
       CASE WHEN MAX(p)>oplayers THEN  MIN(p) || ' - bye ' 
            WHEN MOD(MAX(p)-MIN(p),2)=1 THEN MIN(p) || ' vs ' || MAX(p)
            ELSE MAX(p) || ' vs ' || MIN(p) 
       END AS pairing
FROM w
GROUP BY rnd, match_id, oplayers
ORDER BY 1, CASE WHEN MAX(p)>oplayers THEN oplayers ELSE MIN(p) END;


Interview Question: Show Odd/Even rows without using any functions and pseudo-columns

April 7, 2016 (updated July 6, 2017), Zahar Hilkevich

Sushil Kumar, Database Developer at JP Morgan Chase & Co, has recently asked me this question on my Facebook group page. My first reaction was: “What a silly question! Of course it is impossible to identify odd and even rows without using functions”. But shortly after that, I realized that this is a great SQL puzzle. It took me about 30 minutes (which is a lot!) to solve it.

Interview Question: Show Odd/Even rows without using any functions and pseudo-columns

Level: Advanced

Sample Expected Result:

ENAME           EMPNO   ODD_EVEN
---------- ---------- ----------
MILLER           7934          1
FORD             7902          0
JAMES            7900          1
ADAMS            7876          0
TURNER           7844          1
KING             7839          0
SCOTT            7788          1
CLARK            7782          0
BLAKE            7698          1
MARTIN           7654          0
JONES            7566          1
WARD             7521          0
ALLEN            7499          1
SMITH            7369          0

--Note: Rows are sorted by empno in descending order

The idea behind the following solution is quite simple: substitute functions with operators and predicates. Several similar techniques were described in my book Oracle SQL Tricks and Workarounds.

Solution:

WITH x (ename, empno, odd_even) AS
(
SELECT ename, empno, 1 as odd_even
FROM emp
WHERE empno>=ALL(SELECT empno FROM emp)
UNION ALL
SELECT e.ename, e.empno, 1-odd_even
FROM emp e, x
WHERE e.empno<x.empno
  AND e.empno>=ALL(SELECT empno FROM emp WHERE empno<x.empno)
)
SELECT *
FROM x
/

ENAME           EMPNO   ODD_EVEN
---------- ---------- ----------
MILLER           7934          1
FORD             7902          0
JAMES            7900          1
ADAMS            7876          0
TURNER           7844          1
KING             7839          0
SCOTT            7788          1
CLARK            7782          0
BLAKE            7698          1
MARTIN           7654          0
JONES            7566          1
WARD             7521          0
ALLEN            7499          1
SMITH            7369          0
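The recursion is easier to follow as a loop: start from the empno that is greater than or equal to all others, then keep moving to the highest empno below the current one while flipping the flag with 1-odd_even. Here is a Python model of that walk; it is only a sketch of the logic (Python's all() and next() stand in for the >=ALL predicate), it assumes unique empno values, and odd_even is an illustrative name.

```python
EMPNOS = [7369, 7499, 7521, 7566, 7654, 7698, 7782,
          7788, 7839, 7844, 7876, 7900, 7902, 7934]


def odd_even(empnos):
    """Walk empnos from highest to lowest, alternating the flag 1, 0, 1, ..."""
    rows = []
    # anchor member: the empno that is >= ALL empnos
    current = next(e for e in empnos if all(e >= other for other in empnos))
    flag = 1
    while True:
        rows.append((current, flag))
        lower = [e for e in empnos if e < current]
        if not lower:                 # nothing below the current row: stop
            break
        # recursive member: the empno below current that is >= ALL lower empnos
        current = next(e for e in lower if all(e >= other for other in lower))
        flag = 1 - flag
    return rows


for empno, flag in odd_even(EMPNOS):
    print(empno, flag)
```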


Puzzle of the week #3 Solutions

March 12, 2016, Zahar Hilkevich

Puzzle of the week #3 – Calendar Summary Report:

Write a single SELECT statement that outputs the number of Sundays, Mondays, Tuesdays, etc. in each month of the current year.

The output should look like this:

MONTH  SUN  MON  TUE  WED  THU  FRI  SAT
----- ---- ---- ---- ---- ---- ---- ----
JAN      5    4    4    4    4    5    5
FEB      4    5    4    4    4    4    4
MAR      4    4    5    5    5    4    4
APR      4    4    4    4    4    5    5
MAY      5    5    5    4    4    4    4
JUN      4    4    4    5    5    4    4
JUL      5    4    4    4    4    5    5
AUG      4    5    5    5    4    4    4
SEP      4    4    4    4    5    5    4
OCT      5    5    4    4    4    4    5
NOV      4    4    5    5    4    4    4
DEC      4    4    4    4    5    5    5

We suggest that you go over the post that explains how to generate various date ranges before checking the solutions below.

Solution #1: Using PIVOT simulation

WITH days AS (
SELECT TRUNC(SYSDATE,'YEAR')+ROWNUM-1 d
FROM dual
CONNECT BY TO_CHAR(TRUNC(SYSDATE,'YEAR')+ROWNUM-1, 'YYYY')=TO_CHAR(SYSDATE,'YYYY')
)
SELECT TO_CHAR(d,'MON') Month,
       SUM(CASE WHEN TO_CHAR(d,'DY')='SUN' THEN 1 END) SUN,
       SUM(CASE WHEN TO_CHAR(d,'DY')='MON' THEN 1 END) MON,
       SUM(CASE WHEN TO_CHAR(d,'DY')='TUE' THEN 1 END) TUE,
       SUM(CASE WHEN TO_CHAR(d,'DY')='WED' THEN 1 END) WED,
       SUM(CASE WHEN TO_CHAR(d,'DY')='THU' THEN 1 END) THU,
       SUM(CASE WHEN TO_CHAR(d,'DY')='FRI' THEN 1 END) FRI,
       SUM(CASE WHEN TO_CHAR(d,'DY')='SAT' THEN 1 END) SAT
FROM days
GROUP BY TO_CHAR(d,'MON'), TO_CHAR(d,'MM')
ORDER BY TO_CHAR(d,'MM');

Solution #2: Using PIVOT

SELECT month, sun, mon, tue, wed, thu, fri, sat
FROM 
(
SELECT TO_CHAR(TRUNC(SYSDATE,'YEAR')+ROWNUM-1, 'MON') month,
       TO_CHAR(TRUNC(SYSDATE,'YEAR')+ROWNUM-1, 'DY') dy,
       TO_CHAR(TRUNC(SYSDATE,'YEAR')+ROWNUM-1, 'MM') mm
FROM dual
CONNECT BY TO_CHAR(TRUNC(SYSDATE,'YEAR')+ROWNUM-1, 'YYYY')=TO_CHAR(SYSDATE,'YYYY')
)
PIVOT
(
   COUNT(dy)
   FOR dy IN ('SUN' sun, 'MON' mon, 'TUE' tue, 'WED' wed, 'THU' thu, 'FRI' fri, 'SAT' sat)
)
ORDER BY mm;

Solution #3: Using PIVOT simulation and Recursive WITH

WITH days(d) AS
(
SELECT TRUNC(SYSDATE,'YEAR') d
FROM dual
UNION ALL
SELECT d+1
FROM days
WHERE TO_CHAR(d+1,'YYYY')=TO_CHAR(SYSDATE,'YYYY')
)
SELECT TO_CHAR(d,'MON') Month,
       SUM(CASE WHEN TO_CHAR(d,'DY')='SUN' THEN 1 END) SUN,
       SUM(CASE WHEN TO_CHAR(d,'DY')='MON' THEN 1 END) MON,
       SUM(CASE WHEN TO_CHAR(d,'DY')='TUE' THEN 1 END) TUE,
       SUM(CASE WHEN TO_CHAR(d,'DY')='WED' THEN 1 END) WED,
       SUM(CASE WHEN TO_CHAR(d,'DY')='THU' THEN 1 END) THU,
       SUM(CASE WHEN TO_CHAR(d,'DY')='FRI' THEN 1 END) FRI,
       SUM(CASE WHEN TO_CHAR(d,'DY')='SAT' THEN 1 END) SAT
FROM days
GROUP BY TO_CHAR(d,'MON'), TO_CHAR(d,'MM')
ORDER BY TO_CHAR(d,'MM');
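The expected output can be cross-checked outside the database. The Python sketch below walks through every day of a year, just like the recursive WITH does, and counts weekdays per month; it is run for 2016, the year this post was published, and the function name weekday_counts is made up for the example.

```python
from datetime import date, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def weekday_counts(year):
    """Return {month_number: [SUN, MON, TUE, WED, THU, FRI, SAT counts]}."""
    counts = {m: [0] * 7 for m in range(1, 13)}
    d = date(year, 1, 1)
    while d.year == year:                         # stop once the year rolls over
        # date.weekday() is Monday=0 ... Sunday=6; shift so that Sunday comes first
        counts[d.month][(d.weekday() + 1) % 7] += 1
        d += timedelta(days=1)
    return counts


for month, row in weekday_counts(2016).items():
    print(MONTHS[month - 1], *row)
```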


Puzzle of the Week Challenge – Solutions to the 2nd Puzzle

March 4, 2016 (updated April 30, 2016), Zahar Hilkevich

Last week we presented the 2nd puzzle of our contest, Puzzle of the Week. Today we publish the correct answers for that puzzle. Thanks to all who accepted the challenge!

Dish washing schedule puzzle:

Four roommate students, Anna, Betty, Carla, and Daniela decided to make a “Dish washing schedule”. Every day one of the girls should do all the dishes. The challenge is to make a schedule for the next month that will spread the responsibilities among the girls as evenly as possible. At the same time the schedule should be completely random.

Solution #1: Traditional approach for mimicking pivoted report.

WITH x AS (
SELECT FLOOR((LEVEL-1)/4) id, 
       LEVEL AS d,  
       RANK()OVER(PARTITION BY FLOOR((LEVEL-1)/4) ORDER BY DBMS_RANDOM.VALUE) rk
FROM dual 
CONNECT BY LEVEL<=32
)
SELECT MAX(CASE WHEN MOD(rk,4)=1 THEN D END) AS "Anna",
       MAX(CASE WHEN MOD(rk,4)=2 THEN D END) AS "Betty",
       MAX(CASE WHEN MOD(rk,4)=3 THEN d END) AS "Carla",
       MAX(CASE WHEN MOD(rk,4)=0 THEN D END) AS "Daniela"
FROM x
WHERE d<=TO_CHAR(LAST_DAY(SYSDATE),'DD')
GROUP BY id
ORDER BY id

Sample output #1:

      Anna      Betty      Carla    Daniela
---------- ---------- ---------- ----------
         2          1          4          3
         6          5          8          7
        11          9         10         12
        14         15         16         13
        20         19         18         17
        24         21         22         23
        28         27         26         25
        31                    30         29

Sample output #2 (after re-running the same query):

      Anna      Betty      Carla    Daniela
---------- ---------- ---------- ----------
         3          4          2          1
         7          5          6          8
        12          9         11         10
        16         14         13         15
        20         19         18         17
        23         24         21         22
        25         27         26         28
                   31         29         30

Solution #2: Using Recursive WITH clause for range generation:

WITH x(d) AS (
  SELECT 1 AS d
  FROM dual
  UNION ALL
  SELECT d+1
  FROM x
  WHERE d<32
), y AS (
SELECT FLOOR((d-1)/4) id, 
       CASE WHEN d<=TO_CHAR(LAST_DAY(SYSDATE),'DD') THEN d END d, 
       RANK()OVER(PARTITION BY FLOOR((d-1)/4) ORDER BY DBMS_RANDOM.VALUE) rk
FROM x
)
SELECT MAX(CASE WHEN MOD(rk,4)=1 THEN D END) AS "Anna",
       MAX(CASE WHEN MOD(rk,4)=2 THEN D END) AS "Betty",
       MAX(CASE WHEN MOD(rk,4)=3 THEN d END) AS "Carla",
       MAX(CASE WHEN MOD(rk,4)=0 THEN D END) AS "Daniela"
FROM y
WHERE d<=TO_CHAR(LAST_DAY(SYSDATE),'DD')
GROUP BY id
ORDER BY id

Sample output #1:

      Anna      Betty      Carla    Daniela
---------- ---------- ---------- ----------
         1          4          2          3
         8          7          6          5
        11         10          9         12
        15         14         13         16
        18         19         20         17
        24         22         21         23
        25         27         28         26
        29                    30         31

Sample output #2 (after re-running the same query):

      Anna      Betty      Carla    Daniela
---------- ---------- ---------- ----------
         1          4          3          2
         5          6          7          8
        11         12         10          9
        13         15         16         14
        20         19         17         18
        22         23         24         21
        28         26         27         25
                   29         30         31

Solution #3: Using PIVOT clause:

SELECT "1" AS "Anna","2" AS "Betty", "3" AS "Carla", "4" AS "Daniela"  
FROM (
SELECT FLOOR((LEVEL-1)/4) id, 
       CASE WHEN LEVEL<=TO_CHAR(LAST_DAY(SYSDATE),'DD') THEN LEVEL END AS d,  
       RANK()OVER(PARTITION BY FLOOR((LEVEL-1)/4) ORDER BY DBMS_RANDOM.VALUE) rk
FROM dual 
CONNECT BY LEVEL<=4*CEIL(31/4)
)
PIVOT 
(
  MAX(d)
  FOR rk IN (1,2,3,4)
)
ORDER BY id;

Sample output #1:

      Anna      Betty      Carla    Daniela
---------- ---------- ---------- ----------
         4          1          3          2
         7          6          8          5
        10          9         11         12
        16         14         13         15
        20         19         17         18
        23         22         21         24
        26         25         27         28
        31                    29         30

Sample output #2 (after re-running the same query):

      Anna      Betty      Carla    Daniela
---------- ---------- ---------- ----------
         4          1          3          2
         8          7          5          6
        11         10         12          9
        16         13         14         15
        20         19         17         18
        21         22         23         24
        28         26         27         25
        31         29         30
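All three solutions share one idea: cut the month into blocks of four consecutive days and hand each block to the four girls in a random order, so nobody can be more than one day ahead of anybody else. A minimal Python sketch of that idea follows (dish_schedule is a hypothetical helper, and a 31-day month is assumed for the demo):

```python
import random

NAMES = ["Anna", "Betty", "Carla", "Daniela"]


def dish_schedule(days_in_month, names, rng=None):
    """Assign each day to one name, shuffling the names within every block."""
    rng = rng or random.Random()
    schedule = {}
    block = len(names)
    for start in range(1, days_in_month + 1, block):
        order = list(names)
        rng.shuffle(order)                # random rank within the block of days
        for offset, name in enumerate(order):
            day = start + offset
            if day <= days_in_month:      # drop padding days past the month end
                schedule[day] = name
    return schedule


for day, name in dish_schedule(31, NAMES).items():
    print(day, name)
```

In the last, incomplete block the girl who gets the day off is also picked at random, which matches the blank cell in the sample outputs.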
