This puzzle is sometimes offered at non-SQL technical interviews with companies like Google or Amazon:
For a given text string, find the first (from the beginning) longest sub-string that does not have repeating characters.
For example, the longest sub-string with no repeating characters in the word anaconda is acond; and in the word: stereotype – it is reotyp.
We move the bar higher and will solve this quite challenging problem in Oracle SQL:
- Remember to use only a single SELECT statement.
- The character case should be ignored (use or convert the value to a lower case)
- You have about 1 week to solve the puzzle and submit your solution(s) but whoever does it sooner will earn more points.
- The scoring rules can be found here.
If this puzzle is a bit tricky for you, start with a simpler one:
For a given text string, list all possible sub-strings.
If you are ready to give up, please don’t – try to come up with a PL/SQL solution (at very least). It could be a function that takes a VARCHAR2 argument and returns a VARCHAR2 result value.
A correct answer (and workarounds!) will be published here in a week.
My Oracle Group on Facebook:
If you like this post, you may want to join my Oracle group on Facebook: https://www.facebook.com/groups/sqlpatterns/
Would you like to read about many more tricks and puzzles?
For more tricks and cool techniques check my book “Oracle SQL Tricks and Workarounds” for instructions.
WITH
—————————
DATA(COL)AS(
SELECT ‘anaconda’ FROM DUAL UNION ALL
SELECT ‘stereotype’ FROM DUAL),
—————————
DATAVIEW(RN,COL)AS(
SELECT ROWNUM,COL FROM DATA),
—————————
WORKVIEW1(RN,POS,COL)AS(
SELECT
RN,ROW_NUMBER()OVER(PARTITION BY RN ORDER BY NULL),COL
FROM DATAVIEW
CROSS JOIN XMLTABLE(‘for $i in 1 to $LENGTH_COL cast as xs:integer return $i’
PASSING LENGTH(COL) AS LENGTH_COL)),
—————————
WORKVIEW2(RN,POS,COL,R)AS(
SELECT
RN,POS,COL,QTYCHK
FROM WORKVIEW1
CROSS JOIN XMLTABLE(‘for $i in 1 to $LENGTH_COL cast as xs:integer return $i’
PASSING LENGTH(COL)-POS+1 AS LENGTH_COL
COLUMNS QTYCHK NUMBER(32) PATH ‘.’)),
—————————
WORKVIEW3(RN,POS,COL,R,R2,COL_PART,COL_PART_DETAIL)AS(
SELECT
RN,POS,COL,R,COLUMN_VALUE,SUBSTR(COL,POS,R),SUBSTR(SUBSTR(COL,POS,R),QTYCHK,1)
FROM WORKVIEW2
CROSS JOIN XMLTABLE(‘for $i in 1 to $R cast as xs:integer return $i’
PASSING R AS R
COLUMNS QTYCHK NUMBER(32) PATH ‘.’)),
—————————
WORKVIEW4(RN,POS,COL,R,R2,COL_PART,COL_PART_DETAIL,COL_PART_DETAIL_CNT)AS(
SELECT RN,POS,COL,R,R2,COL_PART,COL_PART_DETAIL,COUNT(DISTINCT COL_PART_DETAIL)OVER(PARTITION BY RN,POS,R)
FROM WORKVIEW3),
—————————
WORKVIEW5(RN,POS,COL,R,R2,COL_PART,COL_PART_DETAIL,COL_PART_DETAIL_CNT,MAX_R)AS(
SELECT RN,POS,COL,R,R2,COL_PART,COL_PART_DETAIL,COL_PART_DETAIL_CNT,MAX(R)OVER(PARTITION BY RN)
FROM WORKVIEW4
WHERE R = COL_PART_DETAIL_CNT),
—————————
WORKVIEW6(RN,POS,COL,R,R2,COL_PART,COL_PART_DETAIL,COL_PART_DETAIL_CNT,MAX_R,MIN_POS)AS(
SELECT RN,POS,COL,R,R2,COL_PART,COL_PART_DETAIL,COL_PART_DETAIL_CNT,MAX_R,MIN(POS)OVER(PARTITION BY RN)
FROM WORKVIEW5
WHERE R = MAX_R)
—————————
SELECT MAX(COL) COL,MAX(COL_PART) COL_PART FROM WORKVIEW6
WHERE POS = MIN_POS
GROUP BY RN
ORDER BY RN;
CORRECT though overly complex. The use of XMLTable is interesting. The whole thing can be greatly simplified. Anyway, it’s a good start!
WITH temp
AS ( SELECT LEVEL ll
FROM DUAL
CONNECT BY LEVEL <= LENGTH ('anaconda')),
temp2
AS ( SELECT DISTINCT SUBSTR ('anaconda', ll, LEVEL) str, ll
FROM DUAL, temp
CONNECT BY LEVEL <= LENGTH ('anaconda')),
temp3
AS (SELECT str, ll
FROM temp2
WHERE NOT REGEXP_LIKE (str, '([A-Za-z]).*\1')),
temp4
AS (SELECT str, ll
FROM temp3
WHERE LENGTH (str) = (SELECT MAX (LENGTH (str)) FROM temp3))
SELECT str
FROM temp4
WHERE ll = (SELECT MIN (ll) FROM temp4);
–takes around 34 seconds
This seems to work, but so inefficient … At the same time, your solution has ALL components of a very efficient strategy that I will show in a subsequent post with solutions. Keep trying!
with x as
(
select level i from dual connect by level length(str) then i+1 else i end, case when j <= length(str) then j+1 else i+1 end, substr(str, i, j) from y
where i<=j
and i<= length(str)
),
z as
(
select distinct letter, i from y
where letter is not null
and NOT REGEXP_LIKE (letter, '([A-Za-z]).*\1')
),
final as
(
select * from z
where LENGTH (letter) = (SELECT MAX (LENGTH (letter)) FROM z)
)
select letter from final
where i = (select min(i) from final);
–takes miliseconds
with x as
(
select level i from dual connect by level length(str) then i+1 else i end, case when j length(str) then substr(str, i+1, 1) else substr(str, i, j) end from y
where i<=j
and i<= length(str)
),
z as
(
select distinct letter, i from y
where letter is not null
and NOT REGEXP_LIKE (letter, '([A-Za-z]).*\1')
),
final as
(
select * from z
where LENGTH (letter) = (SELECT MAX (LENGTH (letter)) FROM z)
)
select letter from final
where i = (select min(i) from final);
–bit of correct in y to get all the possible sub-strings (the logic is same and it takes milliseconds)
— Some lines are missing out at the time of commenting here.. posting again
WITH x
AS ( SELECT LEVEL i
FROM DUAL
CONNECT BY LEVEL LENGTH (str) THEN i + 1 ELSE i END,
CASE WHEN j LENGTH (str) THEN SUBSTR (str, i + 1, 1)
ELSE SUBSTR (str, i, j)
END
FROM y
WHERE i <= j AND i <= LENGTH (str)),
z
AS (SELECT DISTINCT letter, i
FROM y
WHERE letter IS NOT NULL
AND NOT REGEXP_LIKE (letter, '([A-Za-z]).*\1')),
final
AS (SELECT *
FROM z
WHERE LENGTH (letter) = (SELECT MAX (LENGTH (letter)) FROM z))
SELECT letter
FROM final
WHERE i = (SELECT MIN (i) FROM final);
Not sure why the lines are being trimmed
Use livesql.oracle.com – save a script and share a link with me
–Live SQL Link
https://livesql.oracle.com/apex/livesql/s/f377kbahyxe3az3dbuqft1bfe
WITH ALIAS1 AS (SELECT ‘stereotype’ AS INPUT FROM DUAL),
ALIAS2(LVL) AS
(
SELECT LENGTH(INPUT)-1 AS LVL FROM ALIAS1
UNION ALL
SELECT LVL-1
FROM ALIAS2
WHERE LVL > 0
)
SELECT SUBSTRINGS
FROM
(SELECT SUBSTRINGS , ROW_NUMBER() OVER (ORDER BY INITIALS) AS RN
FROM
(SELECT MAX(LENGTH(SUBSTRINGS)) OVER () AS MAX_LEN, SUBSTRINGS , MAX(LENGTH(SUBSTRINGS)) AS LEN, MAX(LVL) AS INITIALS FROM
(SELECT DISTINCT SUBSTR((SELECT INPUT FROM ALIAS1) , A1.LVL , LVLL) AS SUBSTRINGS , A1.LVL FROM ALIAS2 A1 CROSS APPLY (SELECT A2.LVL AS LVLL FROM ALIAS2 A2 WHERE A1.LVL < A2.LVL))
WHERE NOT REGEXP_LIKE (SUBSTRINGS, '([A-Za-z]).*\1')
GROUP BY SUBSTRINGS)
WHERE MAX_LEN = LEN)
WHERE RN = 1
The above query works but I guess it would fail at certain point. Lets say I am taking the word “aana”, then the longest substring with out non-repetitive characters would be “an”.
But your query as it is not finding all possible substrings, it would give “a”, which is wrong.
Thanks a lot for your feedback.
Had updated, hopefully these one will suit well for all cases!
WITH ALIAS1 AS (SELECT ‘anaconda’ AS INPUT FROM DUAL),
ALIAS2(LVL) AS
(
SELECT LENGTH(INPUT) AS LVL FROM ALIAS1
UNION ALL
SELECT LVL-1
FROM ALIAS2
WHERE LVL > 1
)
SELECT SUBSTRINGS
FROM
(SELECT SUBSTRINGS , ROW_NUMBER() OVER (ORDER BY INITIALS) AS RN
FROM
(SELECT MAX(LENGTH(SUBSTRINGS)) OVER () AS MAX_LEN, SUBSTRINGS , MAX(LENGTH(SUBSTRINGS)) AS LEN, MAX(LVL) AS INITIALS FROM
(SELECT DISTINCT SUBSTR((SELECT INPUT FROM ALIAS1) , A1.LVL , LVLL) AS SUBSTRINGS , A1.LVL FROM ALIAS2 A1 CROSS APPLY (SELECT A2.LVL AS LVLL FROM ALIAS2 A2 WHERE A1.LVL <= A2.LVL))
WHERE NOT REGEXP_LIKE (SUBSTRINGS, '([A-Za-z]).*\1')
GROUP BY SUBSTRINGS)
WHERE MAX_LEN = LEN)
WHERE RN = 1
WITH
DATA(COL)AS(
SELECT ‘anaconda’ FROM DUAL UNION ALL
SELECT NULL FROM DUAL UNION ALL
SELECT ‘stereotype’ FROM DUAL),
DATAVIEW(RN,COL)AS(
SELECT ROWNUM,COL FROM DATA),
WORKVIEW(RN,POS,COL,COLPART)AS(
SELECT RN,ROW_NUMBER()OVER(PARTITION BY RN ORDER BY NULL),COL,SUBSTR(COL,QTYCHK)
FROM DATAVIEW
CROSS JOIN XMLTABLE(‘for $i in 1 to $LENGTH_COL cast as xs:integer return $i’
PASSING NVL(LENGTH(COL),0) AS LENGTH_COL
COLUMNS QTYCHK NUMBER(32) PATH ‘.’))
SELECT MAX(COL),MAX(ANS)KEEP(DENSE_RANK LAST ORDER BY LENGTH(ANS),POS DESC)
FROM(
SELECT RN,COL,POS,ANS
FROM WORKVIEW
MODEL
PARTITION BY(RN,POS)
DIMENSION BY(0 AS SOEJI)
MEASURES(COL,COLPART,CAST(NULL AS VARCHAR2(4000)) AS ANS,CAST(NULL AS VARCHAR2(4000)) AS WILLANS)
RULES ITERATE(4000) UNTIL NVL(LENGTH(COLPART[0]),0) <= ITERATION_NUMBER+1
(WILLANS[0] = WILLANS[0]||SUBSTR(COLPART[0],ITERATION_NUMBER+1,1)
,WILLANS[0] = CASE WHEN REGEXP_COUNT(WILLANS[0],SUBSTR(COLPART[0],ITERATION_NUMBER+1,1)) = 1 THEN WILLANS[0] ELSE SUBSTR(COLPART[0],ITERATION_NUMBER+1,1) END
,ANS[0]=CASE WHEN NVL(LENGTH(ANS[0]),0)<NVL(LENGTH(WILLANS[0]),0) THEN WILLANS[0] ELSE ANS[0] END)
)
GROUP BY RN;