lrot :: [a] -> [a]
lrot [] = []; lrot (x:xs) = xs ++ [x]
We can build a table of left-rotations:
foobar
oobarf
obarfo
barfoo
arfoob
rfooba
We can immediately notice that it's a symmetric matrix. We can prove
this as follows. We write $lrot(rot, s)$ as an array such that:
here-there 0
ere-thereh 1
re-therehe 2
e-thereher 3
-therehere 4
therehere- 5
herehere-t 6
erehere-th 7
rehere-the 8
ehere-ther 9
which is calculated with the help of the following haskell code:
lrot :: [a] -> [a]
lrot [] = []; lrot (a:as) = as ++ [a]
lrots :: [a] -> [[a]]; lrots as = take (length as) (iterate lrot as)
main = putStrLn $ intercalate "\n"
(zipWith (\s i -> s <> " " <> show i)
(lrots "here-there") [0,1..])
If we now sort these rotations, we get:
-therehere 0
e-thereher 1
ehere-ther 2
ere-thereh 3
erehere-th 4
here-there 5
herehere-t 6
re-therehe 7
rehere-the 8
therehere- 9
we produce this by chainging the above definition of main
to:
main = putStrLn $ intercalate "\n"
(zipWith (\s i -> s <> " " <> show i)
-- | extra `sort` here
(sort $ lrots "here-there") [0,1..])
Let's look at the final column of that table. We have:
-thereher|e| 0
e-therehe|r| 1
ehere-the|r| 2
ere-there|h| 3
erehere-t|h| 4
here-ther|e| 5 <- ORIGINAL STRING
herehere-|t| 6
re-thereh|e| 7
rehere-th|e| 8
therehere|-| 9
0123456789
errhhetee-
Now, we'll show how to write a really cool function call bwt
such that:
bwtinv ("errhhetee-",5) = "here-there"
The 5
is the index of the original string in the list. That is, given
the jumbled up last-column and the index of the original string, we're
able to reconstruct the original string. The reason this is useful is
that the jumbled string "errhhetee-"
is easier to compress: it has
long runs of r
, h
, and e
which makes it easier to compress.
The process we performed of rotating, sorting the rotations and taking
the final column is the Burrows-Wheeler transform. in code:
import Data.List
lrot :: [a] -> [a]
lrot [] = []; lrot (a:as) = as ++ [a]
lrots :: [a] -> [[a]]
lrots as = take (length as) (iterate lrot as)
findix :: Eq a => a -> [a] -> Int
findix a as = length (takeWhile (/= a) as)
lastcol :: [[a]] -> [a]; lastcol = map last
bwt :: Eq a => Ord a => [a] -> ([a], Int)
bwt as = let as' = (sort . lrots) as in (lastcol as', findix as as')
Now we need to understand how bwtinv
is defined and what it does.
The definition is:
import Control.Arrow (&&&)
bwtinv :: Eq a => Ord a => ([a], Int) -> [a]
bwtinv (as, k) = recreate as !! k
-- recreate · lastcol · sort · rots = sort · rots
recreate :: (Eq a, Ord a) => [a] -> [[a]]
recreate as = recreate' (length as) as
recreate' :: (Eq a, Ord a) => Int -> [a] -> [[a]]
recreate' 0 = map (const [])
recreate' n = hdsort . consCol . (id &&& recreate' (n-1))
hdsort :: Ord a => [[a]] -> [[a]]
hdsort = let cmp (x:xs) (y:ys) = compare x y
in sortBy cmp
consCol :: ([a], [[a]]) -> [[a]]
consCol = uncurry (zipWith (:))
OK, so much for the code. what does it do?
The idea is that:
recreate
and take the k
th element.recreate
apprents a copy of the initial last column to a matrix ofcolumns, and then sorts this.