Level: Advanced
A colleague of mine approached me recently with a puzzle he struggled with: you have a table (let’s call it data_table) with id and val (i.e. value) columns. You are given two parameters: value_to_overwrite and value_to_use that should transform the content of the data_table in a special way:
- If both parameters exist in the data_table in the val column for the same id, then the one that is equal to value_to_overwrite should be substituted with value_to_use
- If none or just one of the parameters exist in the data_table.val column, than the val column should remain the same
- List all the rows from the data_table after the transformation.
Let’s create the data_table using the following DDL command:
CREATE TABLE data_table AS SELECT 1 id, 'a' val FROM dual UNION ALL SELECT 1 id, 'b' val FROM dual UNION ALL SELECT 1 id, 'c' val FROM dual UNION ALL SELECT 2 id, 'b' val FROM dual UNION ALL SELECT 2 id, 'd' val FROM dual
ID | VAL |
---|---|
1 | a |
1 | b |
1 | c |
2 | b |
2 | d |
For parameters value_to_overwrite = ‘a’ and value_to_use = ‘b’ the expected result should look like this:
ID | ORIGINAL_VALUE | NEW_VALUE |
---|---|---|
1 | a | a |
1 | b | a |
1 | c | c |
2 | b | b |
2 | d | d |
Note, that for id = 1, value ‘b’ is substituted with new value ‘a’ because both, value_to_overwrite (‘a’) and value_to_use (‘b’) exist in the val column. All other values should remain the same as substitution condition is not met.
To mimic the parameter use in the query we will create another table (rule_table) with a single row in it.
CREATE TABLE rule_table AS SELECT 'a' value_to_use, 'b' value_to_overwrite FROM dual
Translating requirements from English to SQL will likely result in a bulky and inefficient query. Let’s demonstrate that:
/* Values that need to be substituted */ SELECT d.id, d.val AS original_value, r.value_to_use AS new_value FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite WHERE r.value_to_use IN (SELECT val FROM data_table WHERE id = d.id) UNION ALL /* Values that remain the same as only value_to_overwrite exist for given id */ SELECT d.id, d.val, d.val FROM data_table d JOIN rule_table r ON d.val = r.value_to_overwrite WHERE r.value_to_use NOT IN (SELECT val FROM data_table WHERE id = d.id) UNION ALL /* Values that remain the same as value_to_overwrite does not match val */ SELECT d.id, d.val, d.val FROM data_table d WHERE d.val NOT IN (SELECT value_to_overwrite FROM rule_table)
As you can see, there are multiple (five) copies of the data_table used, which will lead to a poor performance when the size of the table increases dramatically.
A way better approach is to take the first SELECT from the UNIONed statement above and turn the INNER JOIN into an LEFT OUTER JOIN. At the same time, we need to move the filtering condition from the WHERE clause to the JOIN (otherwise, the LEFT JOIN will work as INNER JOIN):
SELECT d.id, d.val AS original_value, NVL(r.value_to_use, d.val) AS new_value FROM data_table d LEFT JOIN rule_table r ON d.val = r.value_to_overwrite AND r.value_to_use IN (SELECT val FROM data_table WHERE id = d.id)
This is a quite efficient and fairly short query that uses only two copies of the data_table. Can we do better than that? Yes, we can!
WITH x AS ( SELECT id, val, MIN(CASE WHEN val IN (value_to_use, value_to_overwrite) THEN val END) OVER(PARTITION BY id, value_to_overwrite) min_val, MAX(CASE WHEN val IN (value_to_use, value_to_overwrite) THEN val END) OVER(PARTITION BY id, value_to_overwrite) max_val, LEAST(value_to_use, value_to_overwrite) min_ow, GREATEST(value_to_use, value_to_overwrite) max_ow, value_to_use, value_to_overwrite FROM data_table CROSS JOIN rule_table ) SELECT id, val AS original_value, CASE WHEN min_val=min_ow AND max_val=max_ow AND val=value_to_overwrite THEN value_to_use ELSE val END AS new_value FROM x
Analytic functions MIN and MAX let us scan the data_table vertically while LEAST and GREATEST do the same horizontally. The later pair of functions come very handy when you need to compare pairs of values, so the smaller of the values should match LEAST and the other – GREATEST.
And still, the last strategy has one flaw: we used a Cartesian Product (CROSS JOIN) which means that had we have more than one substitution rule, the method would not work properly. Let’s fix it.
First, we will add one more rule:
INSERT INTO rule_table VALUES('b', 'c')
Now, the expected result should looks as the following:
ID | ORIGINAL_VALUE | NEW_VALUE |
---|---|---|
1 | a | a |
1 | b | a |
1 | c | b |
2 | b | b |
2 | d | d |
Note, that the second rule turns original ‘c’ value into ‘b’.
And again, Analytic functions do all the magic:
WITH x AS ( SELECT id, val, value_to_overwrite, value_to_use, LEAST(value_to_overwrite, value_to_use) || '|' || GREATEST(value_to_overwrite, value_to_use) rule_vals, LISTAGG(DISTINCT val, '|') WITHIN GROUP(ORDER BY val) OVER(PARTITION BY id) vals FROM data_table LEFT JOIN rule_table ON val = value_to_overwrite ) SELECT id, val AS original_value, CASE WHEN value_to_overwrite IS NULL THEN val WHEN INSTR(vals, rule_vals)=0 THEN val ELSE value_to_use END AS new_value FROM x
This time, LISTAGG analytic function (with DISTINCT option – recently supported by Oracle) helps matching the val against value_to_overwrite and value_to_use pair.
I strongly recommend executing parts of the above queries to gain a better understanding of the demonstrated strategies. livesql.oracle.com site offers you a great query tool with the latest version of Oracle database.
***
If you find this post useful, please press the LIKE button and subscribe.
My Oracle Group on Facebook:
Also, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/
Suggested Reading:
Would you like to read about many more tricks and puzzles? For more clever tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds”.