I've almost finished updating the polymorphic version of the avl trees. The only function missing is 'merge' which I'm hoping Adrian will be free to write as it requires some digging in his AvlTree package. During the rest of the week I'm going to work on the unboxed Int avl trees and then the serialisation package which will be using them.
Time for another benchmark...
Boring imports
> {-# OPTIONS_GHC -O2 -fno-monomorphism-restriction #-}
> import qualified Data.Map as M
Data.GMap can be found at code.haskell.org/gmap/api
> import qualified Data.GMap as G
> import Data.GMap.ListMap
> import Data.GMap.OrdMap
> import Data.GMap.AssocList
> import System.IO
> import System.IO.Unsafe
> import Control.Monad
> import Data.Maybe
> import Data.List
> ($$) = liftM
"gibberish" is /usr/dict/words, except to make the benchmark large enough I've made 26 versions of each word by prepending characters.
> dict = {-# SCC "dict" #-} take 2000000 $$ lines $$ readFile "gibberish"
We'll turn the dictionary into a map and then lookup each word in turn.
> test = {-# SCC "test" #-} do
> d <- dict
> let mp = {-# SCC "Map.fromList" #-} M.fromAscList $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "Map.lookup" #-} M.lookup w mp) d
And the same using a plain OrdMap (avl tree).
> testO = {-# SCC "testO" #-} do
> d <- dict
> let mp :: OrdMap String ()
> mp = {-# SCC "Ord-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "Ord-GMap.lookup" #-} G.lookup w mp) d
And again with a ListMap parameterised over an OrdMap.
> testL = {-# SCC "testL" #-} do
> d <- dict
> let mp :: ListMap (OrdMap Char) Char ()
> mp = {-# SCC "List-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "List-GMap.lookup" #-} G.lookup w mp) d
> main = do
> print =<< test
> print =<< testO
> print =<< testL
And these are the results (edited for easier reading):
total time = 44.14 secs (2207 ticks @ 20 ms)
total alloc = 10,699,481,064 bytes (excludes profiling overheads)
COST CENTRE %time %alloc
Map.lookup 11.7 0.1
Map.fromList 3.4 3.3
Ord-GMap.lookup 12.0 0.1
Ord-GMap.fromAssocs 0.4 0.9
List-GMap.lookup 8.4 1.2
List-GMap.fromAssocs 0.3 0.9
The point of this is not that ListMap is blazingly fast (although its not bad) but that the Map interface makes it possible to build maps tailored to a particular problem by composing maps and map combinators.
Now I have investigate why ListMap's lookup is causing so much allocation. 1.2% of 10.6 GB is a lot of allocation for a function thats not modifying anything.
0 comments:
Post a Comment