## Solutions to Puzzle of the Week #13

### Puzzle of the Week #13:

Table Setup and Puzzle description can be located here

### Expected Result:

```  ID FULL_NAME                             GROUP_ID
---- ----------------------------------- ----------
8 Oscar Pedro Fernando Rodriguez               1
9 Rodriguez, Oscar Pedro Fernando              1
10 Oscar Fernando Rodriguez Pedro               1
1 John Smith                                   2
2 John L. Smith                                2
4 Smith, John                                  2
5 Tom Khan                                     3
11 KHAN, TOM S.                                 3```

### Solutions:

#### #1. Using CTE (Recursive WITH) and LISTAGG

```WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y(id, token, lvl) AS (
SELECT name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, 1), 1
FROM x
UNION ALL
SELECT x.name_id, REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1), y.lvl+1
FROM x JOIN y ON x.name_id=y.id AND REGEXP_SUBSTR(full_name, '[^ ]+', 1, y.lvl+1) IS NOT NULL
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name,
COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

ID FULL_NAME                                  GROUP_ID
--- ---------------------------------------- ----------
8 Oscar Pedro Fernando Rodrigues                    1
9 Rodrigues, Oscar Pedro Fernando                   1
10 Oscar Fernando Rodrigues Pedro                    1
1 John Smith                                        2
2 John L. Smith                                     2
4 Smith, John                                       2
5 Tom Khan                                          3
11 KHAN, TOM S.                                      3```

#### Explanation:

The key idea is to split each name into multiple name tokens, then sort and merge them back into a single line. Matching (duplicate) names will have the same merged line so we could use it to identify duplicates. DENSE_RANK analytic function is used to generate sequential group id values.

The same idea is used in the solution below. The only difference is the way to split the names into tokens.

#### #2: Using CONNECT BY and TABLE/CAST/MULTISET functions

``` WITH x AS (
SELECT name_id, UPPER(REGEXP_REPLACE(full_name,'[[:punct:]]')) full_name
FROM name_list
), y AS (
SELECT name_id AS id, y.column_value AS token
FROM x,
TABLE(CAST(MULTISET(SELECT REGEXP_SUBSTR(x.full_name, '[^ ]+', 1, LEVEL) token
FROM dual
CONNECT BY LEVEL <= LENGTH(full_name)-LENGTH(REPLACE(full_name,' '))+1
)
AS sys.odcivarchar2list)
) y
WHERE LENGTH(y.column_value)>1
), z AS (
SELECT id, LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token) ordered_name,
COUNT(*)OVER(PARTITION BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) cnt,
DENSE_RANK()OVER(ORDER BY LISTAGG(token, ' ') WITHIN GROUP(ORDER BY token)) group_id
FROM y
WHERE LENGTH(token)>1
GROUP BY id
)
SELECT z.id, n.full_name, DENSE_RANK()OVER(ORDER BY group_id) group_id
FROM z JOIN name_list n ON z.id=n.name_id
WHERE z.cnt>1
ORDER BY 3, 1;

ID FULL_NAME                                  GROUP_ID
---- ---------------------------------------- ----------
8 Oscar Pedro Fernando Rodrigues                    1
9 Rodrigues, Oscar Pedro Fernando                   1
10 Oscar Fernando Rodrigues Pedro                    1
1 John Smith                                        2
2 John L. Smith                                     2
4 Smith, John                                       2
5 Tom Khan                                          3
11 KHAN, TOM S.                                      3```