Monday, 4 August 2008

Yet more mini benchmarks

Ive uploaded IntMap (like OrdMap its finished except for 'merge'). This mini-benchmark is much like the last.

> {-# OPTIONS_GHC -O2 -fno-monomorphism-restriction #-}
> import qualified Data.Map as M

> import qualified Data.GMap as G
> import Data.GMap.ListMap
> import Data.GMap.OrdMap
> import Data.GMap.IntMap

> import Control.Monad
> import Data.Maybe

> ($$) = liftM

This time we will turn gibberish characters in gibberish Ints

> dict = {-# SCC "dict" #-} map (map fromEnum) $$ take 800000 $$ lines $$ readFile "gibberish"

> test = {-# SCC "test" #-} do
> d <- dict
> let mp :: M.Map [Int] ()
> mp = {-# SCC "Map.fromList" #-} M.fromAscList $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "Map.lookup" #-} M.lookup w mp) d

> testL = {-# SCC "testL" #-} do
> d <- dict
> let mp :: ListMap (OrdMap Int) Int ()
> mp = {-# SCC "OrdList-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "OrdList-GMap.lookup" #-} G.lookup w mp) d

IntMap is an avl tree over unboxed Ints.

> testI = {-# SCC "testI" #-} do
> d <- dict
> let mp :: ListMap IntMap Int ()
> mp = {-# SCC "IntList-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "IntList-GMap.lookup" #-} G.lookup w mp) d

> main = do
> print =<< test
> print =<< testL
> print =<< testI

And ...

total time = 24.48 secs (1224 ticks @ 20 ms)
total alloc = 7,092,895,320 bytes (excludes profiling overheads)

COST CENTRE %time %alloc

Map.lookup 8.2 0.1
Map.fromList 0.2 0.5

OrdList-GMap.lookup 8.9 0.7
OrdList-GMap.fromAssocs 0.5 0.5

IntList-GMap.lookup 4.7 0.7
IntList-GMap.fromAssocs 0.3 0.5

IntMap is a fair bit faster than the others, as you would expect from such a specialised data structure. So what am I going to do with it?

Heres a snippet of Data.Serial (based heavily on Data.Binary)

> class Serial a where
> put :: a -> Put a
> get :: Get a

> encode :: Serial a => a -> [Int]

So I can now run off and write

> data SerialMap k a = SerialMap (ListMap IntMap Int a)

> instance Serial k => Map (SerialMap k) k where ...

Hopefully with some tweaking the Serial package will become fast enough for this to be interesting. This is aimed at situations where you might want to trade off access time (encoding is always going to slow this down) for space (4 UTF8 chars to an Int :-D ).

0 comments:

About Me