Substitution SQL Puzzle

Level: Advanced

A colleague of mine approached me recently with a puzzle he struggled with: you have a table (let’s call it data_table) with id and val (i.e. value) columns. You are given two parameters: value_to_overwrite and value_to_use that should transform the content of the data_table in a special way:

  • If both parameters exist in the data_table in the val column for the same id, then the one that is equal to value_to_overwrite should be substituted with value_to_use
  • If none or just one of the parameters exist in the data_table.val column, than the val column should remain the same
  • List all the rows from the data_table after the transformation.

Let’s create the data_table using the following DDL command:

CREATE TABLE data_table AS
SELECT 1 id, 'a' val FROM dual
UNION ALL
SELECT 1 id, 'b' val FROM dual
UNION ALL
SELECT 1 id, 'c' val FROM dual
UNION ALL
SELECT 2 id, 'b' val FROM dual
UNION ALL
SELECT 2 id, 'd' val FROM dual
IDVAL
1a
1b
1c
2b
2d

For parameters value_to_overwrite = ‘a’ and value_to_use = ‘b’ the expected result should look like this:

IDORIGINAL_VALUENEW_VALUE
1aa
1ba
1cc
2bb
2dd

Note, that for id = 1, value ‘b’ is substituted with new value ‘a’ because both, value_to_overwrite (‘a’) and value_to_use (‘b’) exist in the val column. All other values should remain the same as substitution condition is not met.

To mimic the parameter use in the query we will create another table (rule_table) with a single row in it.

CREATE TABLE rule_table AS
SELECT 'a' value_to_use, 'b' value_to_overwrite
FROM dual

Translating requirements from English to SQL will likely result in a bulky and inefficient query. Let’s demonstrate that:

/* Values that need to be substituted */
SELECT d.id, d.val AS original_value, r.value_to_use AS new_value
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use IN (SELECT val
                         FROM data_table
                         WHERE id = d.id)
UNION ALL
/* Values that remain the same as only value_to_overwrite exist for given id */
SELECT d.id, d.val, d.val
FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite
WHERE r.value_to_use NOT IN (SELECT val
                             FROM data_table
                             WHERE id = d.id)
UNION ALL
/* Values that remain the same as value_to_overwrite does not match val */
SELECT d.id, d.val, d.val
FROM data_table d
WHERE d.val NOT IN (SELECT value_to_overwrite
                    FROM rule_table)

As you can see, there are multiple (five) copies of the data_table used, which will lead to a poor performance when the size of the table increases dramatically.

A way better approach is to take the first SELECT from the UNIONed statement above and turn the INNER JOIN into an LEFT OUTER JOIN. At the same time, we need to move the filtering condition from the WHERE clause to the JOIN (otherwise, the LEFT JOIN will work as INNER JOIN):

SELECT d.id,
       d.val                      AS original_value,
       NVL(r.value_to_use, d.val) AS new_value
FROM data_table d LEFT JOIN rule_table r 
                  ON d.val = r.value_to_overwrite
                 AND r.value_to_use IN (SELECT val
                                        FROM data_table
                                        WHERE id = d.id)

This is a quite efficient and fairly short query that uses only two copies of the data_table. Can we do better than that? Yes, we can!

WITH x AS (
SELECT id, val,
       MIN(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  min_val,
       MAX(CASE WHEN val IN (value_to_use, value_to_overwrite) 
                THEN val 
           END)
       OVER(PARTITION BY id, value_to_overwrite)  max_val,
       LEAST(value_to_use, value_to_overwrite)    min_ow,
       GREATEST(value_to_use, value_to_overwrite) max_ow,
       value_to_use, value_to_overwrite
FROM data_table CROSS JOIN rule_table 
)
SELECT id, val AS original_value,
       CASE WHEN min_val=min_ow AND
                 max_val=max_ow AND
                 val=value_to_overwrite THEN value_to_use
       ELSE val
       END AS new_value
FROM x

Analytic functions MIN and MAX let us scan the data_table vertically while LEAST and GREATEST do the same horizontally. The later pair of functions come very handy when you need to compare pairs of values, so the smaller of the values should match LEAST and the other – GREATEST.

And still, the last strategy has one flaw: we used a Cartesian Product (CROSS JOIN) which means that had we have more than one substitution rule, the method would not work properly. Let’s fix it.

First, we will add one more rule:

INSERT INTO rule_table VALUES('b', 'c')

Now, the expected result should looks as the following:

IDORIGINAL_VALUENEW_VALUE
1aa
1ba
1cb
2bb
2dd

Note, that the second rule turns original ‘c’ value into ‘b’.

And again, Analytic functions do all the magic:

WITH x AS (
SELECT id, val, value_to_overwrite, value_to_use,
       LEAST(value_to_overwrite, value_to_use) || '|' ||
       GREATEST(value_to_overwrite, value_to_use) rule_vals,
       LISTAGG(DISTINCT val, '|') WITHIN GROUP(ORDER BY val)
       OVER(PARTITION BY id) vals
FROM data_table LEFT JOIN rule_table ON val = value_to_overwrite
)
SELECT id, val AS original_value,
       CASE WHEN value_to_overwrite IS NULL THEN val
            WHEN INSTR(vals, rule_vals)=0 THEN val
            ELSE value_to_use
       END     AS new_value
FROM x

This time, LISTAGG analytic function (with DISTINCT option – recently supported by Oracle) helps matching the val against value_to_overwrite and value_to_use pair.

I strongly recommend executing parts of the above queries to gain a better understanding of the demonstrated strategies. livesql.oracle.com site offers you a great query tool with the latest version of Oracle database.

***

If you find this post useful, please press the LIKE button and subscribe.

My Oracle Group on Facebook:

Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Suggested Reading:

Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.

Leave a comment