§ Burrows Wheeler

We aim to get the $O(n)$ algorithm for burrows wheeler, by starting from the naive $O(n^2)$ implementation and then slowly chipping away to get to the fastest algorithm

§ String rotations

Given a string $s$ of length $n$, we can index it as $s[0]$, $s[1]$, upto $s[n-1]$. We can now build a table consisting of rotations of the string. We'll define:
lrot :: [a] -> [a]
lrot [] = []; lrot (x:xs) = xs ++ [x]

We can build a table of left-rotations:
foobar
oobarf
obarfo
barfoo
arfoob
rfooba

We can immediately notice that it's a symmetric matrix. We can prove this as follows. We write $lrot(rot, s)$ as an array such that:
$lrot(rot, s)[i] = s[(rot + i) % n] \text{where n = |s|}$
Now, we note that on row $r$ of our array we have the string $lrot(r, s)$. So the value at row $r$, column $c$ is $lrot(r, s)[c] = s[(r + c)%n]$. But this is symmetric in $c, r$ so can be written as $s[(c + r)%n]$, which is equal to $lrot(c, s)[r]$. Formally:
$lrot(r, s)[c] = s[(r + c) %n] = s[(c + r)%n] = lrot(c, s)[r]$

§ Sorts of string rotations

We're next interested in considering sorted order of the string rotations. For example, in the case of the string "here, there", all the rotations are:
here-there  0
ere-thereh  1
re-therehe  2
e-thereher  3
-therehere  4
therehere-  5
herehere-t  6
erehere-th  7
rehere-the  8
ehere-ther  9

which is calculated with the help of the following haskell code:
lrot :: [a] -> [a]
lrot [] = []; lrot (a:as) = as ++ [a]

lrots :: [a] -> [[a]]; lrots as = take (length as) (iterate lrot as)

main =  putStrLn $intercalate "\n" (zipWith (\s i -> s <> " " <> show i) (lrots "here-there") [0,1..])  If we now sort these rotations, we get: -therehere 0 e-thereher 1 ehere-ther 2 ere-thereh 3 erehere-th 4 here-there 5 herehere-t 6 re-therehe 7 rehere-the 8 therehere- 9  we produce this by chainging the above definition of main to: main = putStrLn$ intercalate "\n"
(zipWith (\s i -> s <> "  " <> show i)
-- | extra sort here
(sort \$ lrots "here-there") [0,1..])

Let's look at the final column of that table. We have:
-thereher|e|  0
e-therehe|r|  1
ehere-the|r|  2
ere-there|h|  3
erehere-t|h|  4
here-ther|e|  5 <- ORIGINAL STRING
herehere-|t|  6
re-thereh|e|  7
rehere-th|e|  8
therehere|-|  9

0123456789
errhhetee-

Now, we'll show how to write a really cool function call bwt such that:
bwtinv ("errhhetee-",5) = "here-there"

The 5 is the index of the original string in the list. That is, given the jumbled up last-column and the index of the original string, we're able to reconstruct the original string. The reason this is useful is that the jumbled string "errhhetee-" is easier to compress: it has long runs of r, h, and e which makes it easier to compress. The process we performed of rotating, sorting the rotations and taking the final column is the Burrows-Wheeler transform. in code:
import Data.List
lrot :: [a] -> [a]
lrot [] = []; lrot (a:as) = as ++ [a]

lrots :: [a] -> [[a]]
lrots as = take (length as) (iterate lrot as)

findix :: Eq a => a -> [a] -> Int
findix a as = length (takeWhile (/= a) as)

lastcol :: [[a]] -> [a]; lastcol = map last

bwt :: Eq a => Ord a => [a] -> ([a], Int)
bwt as = let as' = (sort . lrots) as in (lastcol as', findix as as')

Now we need to understand how bwtinv is defined and what it does. The definition is:
import Control.Arrow (&&&)

bwtinv :: Eq a => Ord a => ([a], Int) -> [a]
bwtinv (as, k) = recreate as !! k

-- recreate · lastcol · sort · rots = sort · rots
recreate :: (Eq a, Ord a) => [a] -> [[a]]
recreate as = recreate' (length as) as

recreate' :: (Eq a, Ord a) => Int -> [a] -> [[a]]
recreate' 0 = map (const [])
recreate' n = hdsort . consCol . (id &&& recreate' (n-1))

hdsort :: Ord a => [[a]] -> [[a]]
hdsort = let cmp (x:xs) (y:ys) = compare x y
in sortBy cmp

consCol :: ([a], [[a]]) -> [[a]]
consCol = uncurry (zipWith (:))

OK, so much for the code. what does it do? The idea is that:
• we recreate the entire matrix from the last column using recreateand take the kth element.
• recreate apprents a copy of the initial last column to a matrix ofcolumns, and then sorts this.

§ References

• Richard Bird: Pearls of functional algorithm design