Finding all Palindromes Contained in Strings with SQL

SQL is a really cool language. I can write really complex business logic with this logic programming language. I was again thrilled about SQL recently, at a customer site:

But whenever I tweet something like the above, the inevitable happened. I was nerd sniped. Oleg Šelajev from ZeroTurnaround challenged me to prove that SQL is so awesome:

Given a string, find all substrings from that string, which are palindromes. Challenge accepted! (For the moment, let’s forget about algorithmic complexity.)

Here’s the Full Query

Spoiler first.

Bear with me, I’ll explain it step by step afterwards. Here’s with PostgreSQL syntax:

WITH RECURSIVE
  words (word) AS (
    VALUES
      ('pneumonoultramicroscopicsilicovolcanoconiosis'),
      ('pseudopseudohypoparathyroidism'),
      ('floccinaucinihilipilification'),
      ('antidisestablishmentarianism'),
      ('supercalifragilisticexpialidocious'),
      ('incomprehensibilities'),
      ('honorificabilitudinitatibus'),
      ('tattarrattat')
  ),
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  ),
  palindromes (word, palindrome, start, length) AS (
    SELECT word, substring(word, start, x), start, x
    FROM starts CROSS JOIN (VALUES(0), (1)) t(x)
    UNION ALL
    SELECT word, palindrome, start, length + 2
    FROM (
      SELECT 
        word, 
        substring(word, start - length / 2, length) AS palindrome,
        start, length
      FROM palindromes
    ) AS p
    WHERE start - length / 2 > 0 
    AND start + (length - 1) / 2 <= length(word) 
    AND substring(palindrome, 1, 1) = 
        substring(palindrome, length(palindrome), 1)
  )
SELECT DISTINCT 
  word, 
  trim(replace(word, palindrome, ' ' || upper(palindrome) || ' '))
    AS palindromes
FROM palindromes
WHERE length(palindrome) > 1
ORDER BY 2

(You can run it yourself on SQLFiddle)

The result being:

word                                          |palindromes                                    
----------------------------------------------|-----------------------------------------------
antidisestablishmentarianism                  |ant  IDI  sestablishmentarianism               
antidisestablishmentarianism                  |antidi  SES  tablishmentarianism               
floccinaucinihilipilification                 |flo  CC  inaucinihilipilification              
floccinaucinihilipilification                 |floccinauc  INI  hilipilification              
floccinaucinihilipilification                 |floccinaucin  IHI  lipilification              
floccinaucinihilipilification                 |floccinaucinih  ILI  p  ILI  fication          
floccinaucinihilipilification                 |floccinaucinih  ILIPILI  fication              
floccinaucinihilipilification                 |floccinaucinihi  LIPIL  ification              
floccinaucinihilipilification                 |floccinaucinihil  IPI  lification              
floccinaucinihilipilification                 |floccinaucinihilipil  IFI  cation              
honorificabilitudinitatibus                   |h  ONO  rificabilitudinitatibus                
honorificabilitudinitatibus                   |honor  IFI  cabilitudinitatibus                
honorificabilitudinitatibus                   |honorificab  ILI  tudinitatibus                
honorificabilitudinitatibus                   |honorificabilitud  INI  tatibus                
honorificabilitudinitatibus                   |honorificabilitudin  ITATI  bus                
honorificabilitudinitatibus                   |honorificabilitudini  TAT  ibus                
incomprehensibilities                         |incompr  EHE  nsibilities                      
incomprehensibilities                         |incomprehens  IBI  lities                      
incomprehensibilities                         |incomprehensib  ILI  ties                      
incomprehensibilities                         |incomprehensibil  ITI  es                      
pneumonoultramicroscopicsilicovolcanoconiosis |pneum  ONO  ultramicroscopicsilicovolcanoconios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopics  ILI  covolcanoconios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilic  OVO  lcanoconios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolca  NOCON  ios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolcan  OCO  nios
pneumonoultramicroscopicsilicovolcanoconiosis |pneumonoultramicroscopicsilicovolcanoconio  SIS
pseudopseudohypoparathyroidism                |pseudopseudohy  POP  arathyroidism             
pseudopseudohypoparathyroidism                |pseudopseudohypop  ARA  thyroidism             
pseudopseudohypoparathyroidism                |pseudopseudohypoparathyro  IDI  sm             
supercalifragilisticexpialidocious            |supercalifrag  ILI  sticexpialidocious         
tattarrattat                                  |t  ATTA  rr  ATTA  t                           
tattarrattat                                  |t  ATTARRATTA  t                               
tattarrattat                                  |ta  TT  arra  TT  at                           
tattarrattat                                  |ta  TTARRATT  at                               
tattarrattat                                  |tat  TARRAT  tat                               
tattarrattat                                  |TAT  tarrat  TAT                               
tattarrattat                                  |tatt  ARRA  ttat                               
tattarrattat                                  |tatta  RR  attat                               
tattarrattat                                  |TATTARRATTAT                                    

This query uses a couple of nice features:

Common Table Expressions

They’re the only way to declare variables in SQL – i.e. you can “store” a query in such a table expression, assign a name to it, and reuse it several times. This is done with the WITH clause. The nice thing about common table expressions is that they are allowed to be RECURSIVE (depending on the database, the RECURSIVE keyword may be required / optional / not available).

VALUES() clause

The VALUES() clause is a very handy tool to create ad-hoc data in the form of tables. We did this to create a table called WORDS, which contains a couple of words inside of which we’d like to look for palindromes. In some databases (including DB2 and PostgreSQL), it’s totally possible to just use the VALUES() clause as a standalone clause instead of SELECT:

VALUES
  ('pneumonoultramicroscopicsilicovolcanoconiosis'),
  ('pseudopseudohypoparathyroidism'),
  ('floccinaucinihilipilification'),
  ('antidisestablishmentarianism'),
  ('supercalifragilisticexpialidocious'),
  ('incomprehensibilities'),
  ('honorificabilitudinitatibus'),
  ('tattarrattat')

Recursive Generation of Integers

In order to look for palindromes, the algorithm used here lists, for each word, the character index of each individual character in the word:

WITH RECURSIVE
  ...
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  ),
  ...

For example, if we used the word “word”…

WITH RECURSIVE
  words (word) AS (VALUES('word')),
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  )
SELECT *
FROM starts

We’d get the following result:

word |start 
-----|------
word |1     
word |2     
word |3     
word |4     

The idea of the algorithm is that we start from a given character and “fan out” in both directions of a character, again recursively. The characters to the left and to the right of such a character must be the same and so on.

If you’re interested in the syntax, stay tuned. There will be a blog post coming up soon.

Using CROSS JOIN to run the Algorithm Twice

There are two types of palindromes:

  • Those with an even amount of characters: “”, “aa”, “abba”, “babbab”
  • Those with an odd amount of characters: “b”, “aba”, “aabaa”

In our algorithm, we’re treating them in almost the same way by running the algorithm twice. Whenever you think “hey, I need to run this twice in SQL”, CROSS JOIN could be a good candidate, as we’re creating a cartesian product between:

  • The previous “starts” table, giving the starting character index
  • A table containing values 0 (palindromes with even amounts of letters) and 1 (palindromes with odd amounts of letters)

For more information about CROSS JOIN, read our article about JOINs.

Run this query for illustration:

WITH RECURSIVE
  words (word) AS (VALUES('word')),
  starts (word, start) AS (
    SELECT word, 1 FROM words
    UNION ALL
    SELECT word, start + 1 FROM starts WHERE start < length(word)
  )
SELECT *
FROM starts CROSS JOIN (VALUES(0),(1)) AS t(x)

Output:

word |start |x 
-----|------|--
word |1     |0  <-- Even length palindromes centering around position 1 length 0
word |1     |1  <-- Odd length palindromes centering around position 1 length 1
word |2     |0 
word |2     |1 
word |3     |0 
word |3     |1 
word |4     |0 
word |4     |1 

Trivially, a palindrome of length 0 (empty string) or length 1 (“w”, “o”, “r”, “d”) is acceptable in principle, but boring. We’ll filter them out later, but keep them in the algorithm as this simplifies the algorithm. If you want to tune the query, you could prevent generating them in the first place.

The Palindrome Algorithm

Thus far, we’ve just prepared utility data to calculate palindromes:

  • The words inside of which to search for palindromes
  • The individual character indexes to start fanning out from, in each individual word
  • The minimum palindrome length 0 (even) or 1 (odd)

Now comes the interesting part. Recursively fanning out from a starting character to find more palindromes, stopping the fanning out as soon as the new candidate is no longer a plaindrome:

WITH RECURSIVE
  ...
  palindromes (word, palindrome, start, length) AS (

    -- This part just creates the even/odd semantics
    SELECT word, substring(word, start, x), start, x
    FROM starts CROSS JOIN (VALUES(0), (1)) t(x)
    UNION ALL

    -- This part recurses by "fanning out"
    SELECT word, palindrome, start, length + 2
    FROM (
      SELECT 
        word, 
        substring(word, start - length / 2, length) AS palindrome,
        start, length
      FROM palindromes
    ) AS p
    WHERE start - length / 2 > 0 
    AND start + (length - 1) / 2 <= length(word) 
    AND substring(palindrome, 1, 1) = 
        substring(palindrome, length(palindrome), 1)
  )
  ...

It isn’t really so hard in fact. The recursion part selects recursively from the PALINDROMES table (the previously calculated palindromes). That table has 4 columns:

  • WORD: The word we’re looking for palindromes in. This is always the same per recursion
  • PALINDROME: The palindrome, i.e. the substring inside of a word. This changes per recursion
  • START: The start character index from which we started fanning out. This is always the same per recursion
  • LENGTH: The palindrome length. This increases by 2 per recursion

Let’s look at the result for “floccinaucinihilipilification”:

flo  CC  inaucinihilipilification              
floccinauc  INI  hilipilification              
floccinaucin  IHI  lipilification              
floccinaucinih  ILI  p  ILI  fication          
floccinaucinih  ILIPILI  fication              
floccinaucinihi  LIPIL  ification              
floccinaucinihil  IPI  lification              
floccinaucinihilipil  IFI  cation              

There are a total of 8 distinct palindromes contained in this word (and believe it or not, the word exists, too. Pronunciation is another thing).

Let’s “debug” the algorithm to get to this list (remember, SQL indexes are 1 based):

Start 1-4: No palindromes
Start 5: Even palindrome [4:5]
  flo  CC  inaucinihilipilification              

Start 6-11: No palindromes (I wont' repeat this further down)
Start 12: Odd palindrome [11:13]
  floccinauc  INI  hilipilification              

Start 14: Odd palindrome [13:15]
  floccinaucin  IHI  lipilification              

Start 16: Odd palindrome [15:17]
  floccinaucinih  ILI  p  ILI  fication          

Start 18: Odd palindrome [17:19], [16:20], [15:21] (Fanning out 3 times)
  floccinaucinihil  IPI  lification              
  floccinaucinihi  LIPIL  ification              
  floccinaucinih  ILIPILI  fication              

Start 20: Odd palindrome [19:17] (already found this)
  floccinaucinih  ILI  p  ILI  fication          

Start 22: Odd palindrome [21:23]
floccinaucinihilipil  IFI  cation              

The IPI, LIPIL, ILIPILI chain is the most interesting. We’ve succeeded to fan out 3 times adding new characters from WORD on both sides of the initial character.

When do we stop fanning out? Whenever one of these conditions hold true:

    WHERE start - length / 2 > 0 
    AND start + (length - 1) / 2 <= length(word) 
    AND substring(palindrome, 1, 1) = 
        substring(palindrome, length(palindrome), 1)

I.e.

  • When we’ve reached the beginning of WORD (no more characters to the left)
  • When we’ve reached the end of WORD (no more characters to the right)
  • When the letter to the left of the palindrome of the previous recursion doesn’t match the letter to the right of the palindrome

That last predicate could also simply read:

    AND palindrome = reverse(palindrome)

But that might be a bit slower as it compares something that we’ve already proven to be true in the previous recursion.

Finally, formatting the result

The final part isn’t too interesting anymore:

SELECT DISTINCT 
  word, 
  trim(replace(word, palindrome, ' ' || upper(palindrome) || ' '))
    AS palindromes
FROM palindromes
WHERE length(palindrome) > 1
ORDER BY 2

We’ll simply:

  • Select DISTINCT results only, as palindromes might appear several times in a word
  • Replace the palindrome substring by its upper case version and add some whitespace to better visualise it. That’s totally optional, of course
  • Remove the trivial palindromes of length 0 and 1

And we’re done! Again, feel free to play around with this on SQLFiddle, or, much better, provide a cool palindrome algorithm implementation in your favourite language in the comments section, or on Twitter:

More beautiful SQL in these articles here:

3 thoughts on “Finding all Palindromes Contained in Strings with SQL

  1. I got asked to do this as an interview test recently in PL/SQL. I did a simple nested loops solution. Here’s the rough equivalent in Oracle SQL:

    WITH words AS (
    	SELECT COLUMN_VALUE word, Length (COLUMN_VALUE) len
          FROM TABLE (SYS.ODCIVarchar2List (
    		'pneumonoultramicroscopicsilicovolcanoconiosis',
    		'pseudopseudohypoparathyroidism',
    		'floccinaucinihilipilification',
    		'antidisestablishmentarianism',
    		'supercalifragilisticexpialidocious',
    		'incomprehensibilities',
    		'honorificabilitudinitatibus',
    		'tattarrattat'
    	))
    ),
    rowgen AS (
    	SELECT LEVEL rn
          FROM DUAL
        CONNECT BY LEVEL <= (SELECT Max(len) FROM words)
    )
    SELECT w.word, s.rn srn, e.rn ern, Substr (w.word, s.rn, e.rn - s.rn + 1) subword
      FROM words w
      JOIN rowgen s
        ON s.rn < w.len
      JOIN rowgen e
        ON e.rn  s.rn
     WHERE Substr (w.word, s.rn, e.rn - s.rn + 1) 
         = Reverse (Substr (w.word, s.rn, e.rn - s.rn + 1))
     ORDER BY 1, 2, 3
    
    • Yeah, nice. Thanks for sharing. Oracle’s CONNECT BY syntax is much simpler than the CTE / WITH syntax for simple iterations.

      At first, I was also going to simply iterate over all the positions and possible lengths and check for palindromes with REVERSE(). But I got nerd sniped into further optimising this and avoiding REVERSE() 😉

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s