Puzzle of the Week #13: Duplicate Names

Puzzle of the Week #13

Consider the following table:

SQL> SELECT *
  2  FROM name_list;

   NAME_ID FULL_NAME
---------- -----------------------------------
         1 John Smith
         2 John L. Smith
         3 Tom Blake
         4 Smith, John
         5 Tom Khan
         6 Pete Ward
         7 Peter Ward
         8 Oscar Pedro Fernando Rodriguez
         9 Rodriguez, Oscar Pedro Fernando
        10 Oscar Fernando Rodriguez Pedro
        11 KHAN, TOM S.
        12 Ward, Peter Peter

12 rows selected.

With a single SELECT statement retrieve a list of all duplicate names assuming the following:

  • Case should be ignored
  • Punctuation characters should be ignored
  • Single character initials should be ignored
  • Names are considered matching if they consist of the same set of name tokens, for ex: “John Smith” should match “SMITH, John L.” as the names consist of the same list of two tokens: John and Smith (case is ignored as well as punctuation and middle initial).

Expected Result:

   ID FULL_NAME                             GROUP_ID
---- ----------------------------------- ----------
   8 Oscar Pedro Fernando Rodriguez               1
   9 Rodriguez, Oscar Pedro Fernando              1
  10 Oscar Fernando Rodriguez Pedro               1
   1 John Smith                                   2
   2 John L. Smith                                2
   4 Smith, John                                  2
   5 Tom Khan                                     3
  11 KHAN, TOM S.                                 3

Table Setup:

CREATE TABLE name_list
(
    name_id   NUMBER,
    full_name VARCHAR2(100)
);

INSERT INTO name_list VALUES(1,'John Smith');
INSERT INTO name_list VALUES(2,'John L. Smith');
INSERT INTO name_list VALUES(3,'Tom Blake');
INSERT INTO name_list VALUES(4,'Smith, John');
INSERT INTO name_list VALUES(5,'Tom Khan');
INSERT INTO name_list VALUES(6,'Pete Ward');
INSERT INTO name_list VALUES(7,'Peter Ward');
INSERT INTO name_list VALUES(8,'Oscar Pedro Fernando Rodriguez');
INSERT INTO name_list VALUES(9,'Rodriguez, Oscar Pedro Fernando');
INSERT INTO name_list VALUES(10,'Oscar Fernando Rodriguez Pedro');
INSERT INTO name_list VALUES(11,'KHAN, TOM S.');
INSERT INTO name_list VALUES(12,'Ward, Peter Peter');

COMMIT;

To submit your answer (one or more!) please start following this blog and add a comment to this post.

A correct answer (and workarounds!) will be published here in a week.

My Oracle Group on Facebook:

If you like this post, you may want to join my new Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/

Would you like to read about many more tricks and puzzles?

For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.

 

 

Advertisements

One thought on “Puzzle of the Week #13: Duplicate Names

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s