Substitution SQL Puzzle

May 17, 2020Zahar HilkevichLeave a comment

Level: Advanced

A colleague of mine approached me recently with a puzzle he struggled with: you have a table (let’s call it data_table) with id and val (i.e. value) columns. You are given two parameters: value_to_overwrite and value_to_use that should transform the content of the data_table in a special way:

If both parameters exist in the data_table in the val column for the same id, then the one that is equal to value_to_overwrite should be substituted with value_to_use
If none or just one of the parameters exist in the data_table.val column, than the val column should remain the same
List all the rows from the data_table after the transformation.

Let’s create the data_table using the following DDL command:

CREATE TABLE data_table AS
SELECT 1 id, 'a' val FROM dual
UNION ALL
SELECT 1 id, 'b' val FROM dual
UNION ALL
SELECT 1 id, 'c' val FROM dual
UNION ALL
SELECT 2 id, 'b' val FROM dual
UNION ALL
SELECT 2 id, 'd' val FROM dual

ID	VAL
1	a
1	b
1	c
2	b
2	d

For parameters value_to_overwrite = ‘a’ and value_to_use = ‘b’ the expected result should look like this:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	c
2	b	b
2	d	d

Note, that for id = 1, value ‘b’ is substituted with new value ‘a’ because both, value_to_overwrite (‘a’) and value_to_use (‘b’) exist in the val column. All other values should remain the same as substitution condition is not met.

To mimic the parameter use in the query we will create another table (rule_table) with a single row in it.

CREATE TABLE rule_table AS
SELECT 'a' value_to_use, 'b' value_to_overwrite
FROM dual

Translating requirements from English to SQL will likely result in a bulky and inefficient query. Let’s demonstrate that:

/* Values that need to be substituted */
SELECT d.id, d.val AS original_value, r.value_to_use AS new_value
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use IN (SELECT val
                         FROM data_table
                         WHERE id = d.id)
UNION ALL
/* Values that remain the same as only value_to_overwrite exist for given id */
SELECT d.id, d.val, d.val
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use NOT IN (SELECT val
                             FROM data_table
                             WHERE id = d.id)
UNION ALL
/* Values that remain the same as value_to_overwrite does not match val */
SELECT d.id, d.val, d.val
FROM data_table d
WHERE d.val NOT IN (SELECT value_to_overwrite
                    FROM rule_table)

As you can see, there are multiple (five) copies of the data_table used, which will lead to a poor performance when the size of the table increases dramatically.

A way better approach is to take the first SELECT from the UNIONed statement above and turn the INNER JOIN into an LEFT OUTER JOIN. At the same time, we need to move the filtering condition from the WHERE clause to the JOIN (otherwise, the LEFT JOIN will work as INNER JOIN):

SELECT d.id,
       d.val                      AS original_value,
       NVL(r.value_to_use, d.val) AS new_value
FROM data_table d LEFT JOIN rule_table r 
                  ON d.val = r.value_to_overwrite
                 AND r.value_to_use IN (SELECT val
                                        FROM data_table
                                        WHERE id = d.id)

This is a quite efficient and fairly short query that uses only two copies of the data_table. Can we do better than that? Yes, we can!

WITH x AS (
SELECT id, val,
       MIN(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  min_val,
       MAX(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  max_val,
       LEAST(value_to_use, value_to_overwrite)    min_ow,
       GREATEST(value_to_use, value_to_overwrite) max_ow,
       value_to_use, value_to_overwrite
FROM data_table CROSS JOIN rule_table 
)
SELECT id, val AS original_value,
       CASE WHEN min_val=min_ow AND
                 max_val=max_ow AND
                 val=value_to_overwrite THEN value_to_use
       ELSE val
       END AS new_value
FROM x

Analytic functions MIN and MAX let us scan the data_table vertically while LEAST and GREATEST do the same horizontally. The later pair of functions come very handy when you need to compare pairs of values, so the smaller of the values should match LEAST and the other – GREATEST.

And still, the last strategy has one flaw: we used a Cartesian Product (CROSS JOIN) which means that had we have more than one substitution rule, the method would not work properly. Let’s fix it.

First, we will add one more rule:

INSERT INTO rule_table VALUES('b', 'c')

Now, the expected result should looks as the following:

ID	ORIGINAL_VALUE	NEW_VALUE
1	a	a
1	b	a
1	c	b
2	b	b
2	d	d

Note, that the second rule turns original ‘c’ value into ‘b’.

And again, Analytic functions do all the magic:

WITH x AS (
SELECT id, val, value_to_overwrite, value_to_use,
       LEAST(value_to_overwrite, value_to_use) || '|' ||
       GREATEST(value_to_overwrite, value_to_use) rule_vals,
       LISTAGG(DISTINCT val, '|') WITHIN GROUP(ORDER BY val)
       OVER(PARTITION BY id) vals
FROM data_table LEFT JOIN rule_table ON val = value_to_overwrite
)
SELECT id, val AS original_value,
       CASE WHEN value_to_overwrite IS NULL THEN val
            WHEN INSTR(vals, rule_vals)=0 THEN val
            ELSE value_to_use
       END     AS new_value
FROM x

This time, LISTAGG analytic function (with DISTINCT option – recently supported by Oracle) helps matching the val against value_to_overwrite and value_to_use pair.

I strongly recommend executing parts of the above queries to gain a better understanding of the demonstrated strategies. livesql.oracle.com site offers you a great query tool with the latest version of Oracle database.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

5 Solutions to 2018 Oracle SQL Puzzle of the Week #1

January 15, 2018Zahar HilkevichLeave a comment

2018 Puzzle of the Week #1:

For a given text string, find the first (from the beginning) longest sub-string that does not have repeating characters.

Solutions:

Solution #1: Using CONNECT BY clause (for range generation), REGEXP_COUNT, and RANK() functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
), x AS ( 
SELECT SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1) substr, 
       RANK() OVER(ORDER BY r2.rn - r1.rn DESC, r1.rn) rk 
FROM r r1, r r2, w 
WHERE r1.rn<=r2.rn 
 AND REGEXP_COUNT(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1), '(.).*\1') = 0 
) 
SELECT substr 
FROM x 
WHERE rk=1

Result of execution in Oracle Live SQL client:

SUBSTR
rkans

Solution #2: Using CONNECT BY clause (for range generation), REGEXP_LIKE, and MAX() KEEP functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
) 
SELECT MAX(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1)) 
 KEEP(DENSE_RANK FIRST ORDER BY r2.rn - r1.rn DESC, r1.rn) substr 
FROM r r1, r r2, w 
WHERE r1.rn<=r2.rn 
 AND NOT REGEXP_LIKE(SUBSTR(w.word, r1.rn, r2.rn - r1.rn + 1), '(.).*\1')

Solution #3: Using CONNECT BY clause (twice), LATERAL view, REGEXP_COUNT, and RANK() functions:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), s AS ( 
SELECT SUBSTR(word, LEVEL) word, LEVEL rn 
FROM w 
CONNECT BY LEVEL<=LENGTH(word) 
) 
SELECT MAX(x.substr) 
       KEEP(DENSE_RANK FIRST ORDER BY LENGTH(x.substr) DESC, s.rn) substr 
FROM s, LATERAL(SELECT SUBSTR(s.word, 1, LEVEL) substr 
                FROM dual 
                CONNECT BY LEVEL<=LENGTH(s.word)) x 
WHERE REGEXP_COUNT(x.substr, '(.).*\1') = 0

Solution #4: Using XMLTable function (for range generation), Correlated subquery with COUNT(DISTINCT), and MAX() KEEP function:

WITH w AS ( 
SELECT 'arkansas' AS word 
FROM dual 
), r AS ( 
SELECT ROWNUM rn, word
FROM w, XMLTABLE('for $i in 1 to $N cast as xs:integer return $i' 
                 PASSING LENGTH(w.word) AS N) x
) 
SELECT MAX(SUBSTR(r1.word, r1.rn, r2.rn - r1.rn + 1))
 KEEP(DENSE_RANK FIRST ORDER BY r2.rn - r1.rn DESC, r1.rn) substr 
FROM r r1, r r2
WHERE r1.rn<=r2.rn 
 AND r2.rn - r1.rn + 1 = 
 (SELECT COUNT(DISTINCT SUBSTR(SUBSTR(r1.word, r1.rn, r2.rn - r1.rn + 1), 
                               LEVEL, 1)) 
 FROM dual 
 CONNECT BY LEVEL<=r2.rn - r1.rn + 1 
 )

Solution #5: Using CONNECT BY, Recursive CTE, INSTR, SUBSTR, and MAX() KEEP functions:

WITH w AS (
 SELECT 'arkansas' word
 FROM dual
), s(sub, word, lvl, rn) AS (
SELECT SUBSTR(word, LEVEL, 1), SUBSTR(word, LEVEL) word, 1, ROWNUM
FROM w
CONNECT BY SUBSTR(word, LEVEL) IS NOT NULL
UNION ALL
SELECT SUBSTR(word, 1, lvl+1), word, lvl+1, ROWNUM
FROM s
WHERE LENGTH(SUBSTR(word, 1, lvl+1))=lvl+1
 AND INSTR(sub, SUBSTR(word, lvl+1, 1))=0
)
SELECT MAX(sub) KEEP (DENSE_RANK FIRST ORDER BY lvl DESC, rn) substr
FROM s

You can execute the above SQL statements in Oracle Live SQL environment.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

6 Solutions to Puzzle of the Week #19

August 8, 2016August 8, 2016Zahar HilkevichLeave a comment

Puzzle of the Week #19:

Produce the department salary report (shown below) with the following assumptions/requirements:

Use Single SELECT statement only
DECODE and CASE functions are not allowed
An employee’s salary is shown in the corresponding department column (10, 20 or 30), all other department columns should contain NULLs.
The query should work in Oracle 11g.

Expected Result:

ENAME              10         20         30
---------- ---------- ---------- ----------
SMITH                        800
ALLEN                                  1600
WARD                                   1250
JONES                       2975
MARTIN                                 1250
BLAKE                                  2850
CLARK            2450
SCOTT                       3000
KING             5000
TURNER                                 1500
ADAMS                       1100
JAMES                                   950
FORD                        3000
MILLER           1300

Solutions:

#1: Using NULLIF, ABS, and SIGN functions

SELECT ename, NULLIF(sal * (1-ABS(SIGN(deptno-10))),0) "10",
              NULLIF(sal * (1-ABS(SIGN(deptno-20))),0) "20",
              NULLIF(sal * (1-ABS(SIGN(deptno-30))),0) "30"
FROM emp

#2: Using NULLIF and INSTR functions

SELECT ename, NULLIF(sal * INSTR(deptno, 10), 0) "10",
              NULLIF(sal * INSTR(deptno, 20), 0) "20",
              NULLIF(sal * INSTR(deptno, 30), 0) "30"
FROM emp

#3: Using NVL2 and NULLIF functions

SELECT ename, NVL2(NULLIF(deptno, 10), NULL, 1) * sal "10",
              NVL2(NULLIF(deptno, 20), NULL, 1) * sal "20",
              NVL2(NULLIF(deptno, 30), NULL, 1) * sal "30"
FROM emp

#4: Using PIVOT clause

SELECT *
FROM  (SELECT deptno, ename, sal
       FROM emp)
PIVOT (MAX(sal)
       FOR deptno IN (10, 20, 30)
      );

#5: Using Scalar SELECT statements in SELECT clause

SELECT ename,
       (SELECT sal FROM emp WHERE empno=e.empno AND deptno=10) "10",
       (SELECT sal FROM emp WHERE empno=e.empno AND deptno=20) "20",
       (SELECT sal FROM emp WHERE empno=e.empno AND deptno=30) "30"
FROM emp e;

#6: Using UNION (different sort order)

SELECT ename, sal "10", NULL "20", NULL "30"
FROM emp
WHERE deptno=10
UNION
SELECT ename, NULL, sal, NULL
FROM emp
WHERE deptno=20
UNION
SELECT ename, NULL, NULL, sal
FROM emp
WHERE deptno=30;

ENAME              10         20         30
---------- ---------- ---------- ----------
ADAMS                       1100
ALLEN                                  1600
BLAKE                                  2850
CLARK            2450
FORD                        3000
JAMES                                   950
JONES                       2975
KING             5000
MARTIN                                 1250
MILLER           1300
SCOTT                       3000
SMITH                        800
TURNER                                 1500
WARD                                   125

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

An interview question that checks knowledge of Oracle SQL character functions.

November 22, 2015February 22, 2016Zahar HilkevichLeave a comment

Level: Beginner

Question: Find all employee names beginning with ‘S’ not using LIKE operator.

Method #1: INSTR function

SELECT ename
FROM emp
WHERE INSTR(ename, 'S')=1
/

--Result:

ENAME
-------
SMITH
SCOTT

Workaround/Method #2: Range Test

SELECT ename
FROM emp
WHERE ename>='S' AND ename<'T'
/

--Result:

ENAME
-------
SMITH
SCOTT

Workaround/Method #3: SUBSTR function

SELECT ename
FROM emp
WHERE SUBSTR(ename, 1, 1) = 'S'
/

--Result:

ENAME
-------
SMITH
SCOTT

Workaround/Method #4: ASCII function

SELECT ename
FROM emp
WHERE ASCII(ename)=ASCII('S')
/

--Result:

ENAME
-------
SMITH
SCOTT

Workaround/Method #5: SUBSTR functions

SELECT ename
FROM emp
WHERE 'S' || SUBSTR(ename, 2) = ename
/

--Result:

ENAME
-------
SMITH
SCOTT

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.