Musings

Saturday, 21 February 2009

Shaking off the dust

I sort of forgot I had a blog. So here is a big rambling summary of anything that happened in the last 3 months that might be interesting to the sorts of people reading planet haskell.

Firstly the 'PCI in Haskell' reading group - which unfortunately never quite picked up enough momentum to work. The code that was produced is here. You can find most of a binding to the *huge* last.fm api and a fairly sparse collection of machine learning algorithms. I'll carry on posting code to that repo since I've discovered that writing code works better than taking lecture notes. Right now I have some almost-finished rewrites of the neural network section in terms of linear algebra and automatic differentiation. It makes a lot of things much clearer, to the point where I prefer not to think in terms of neural networks anymore.

I also found a tentative supervisor for my dissertation. I'm going to be analysing Tribler's epidemic protocols. Lurking behind that clearly programmer-designed interface is a p2p social network featuring cool stuff like gossip-based reputation, bandwidth as currency, decentralised user clustering and torrent recommendation all laid on top of an massive, iterated game of prisoners dilema. What they don't have is any kind of model for how this network behaves, other than the traditional 'Well we ran it and it worked'. The idea is that if I throw fancy maths at it for 6 months I will get some way towards finding out why it works, how to make it work better and how to stop bad people from breaking/abusing it.

I have a new background project going too. Since I graduate (again) soon and the horrible spectre of employment is looming I've decided to try my hand at making some money in the exciting world of mobile phone games. Think cross between scorched earth and nethack overlaid on the real world using the phones gps. If you want to dodge an incoming mortar you have to physically outrun it. Want to explore far off areas? Catch a bus.

Having seen so many indy game projects fail to ever get started (looking at you, squidi) I'm very wary of being over-ambitious. This is a really simple, self-contained project thats totally finishable. I estimate that even with my dissertation, (ridiculously easy) lectures/homework, inevitable unforseen bugs, innate laziness and general distractability I can have a working, saleable game
by september-ish. Just in case, my roadmap is set up so that at each tiny step along the way I produce a working game. Hello world -> single player + joystick -> 10-20 players + joystick -> 10-20 players + gps in local area etc.

Most of the games on the iphone seem to be priced about the $10-20 mark, which seems silly. I imagine the android market and the new nokia/symbian market will end up with similar prices. I'm thinking of aiming at $3-5, the kind of price people might pay to kill half an hour waiting for the bus. But I dont know, these things are always more complicated then they seem. If the whole thing flops I'll still have had fun writing the game.

On the technical side, I'm planning to have all the game logic on the server. It means a slightly higher load but it makes getting everything in sync easier and means I can easily write dumb interfaces for lots of different OSs. Besides, on the scale of google maps, people just dont move very fast. Its hardly Quake.

The server I'm planning to write in OCaml (gasp!).

Because it seems to have better networking libraries than Haskell (hardened by mldonkey).

Because type classes are starting to feel like global state and I want to see how ml modules compare.

Because ocaml batteries, though still in alpha, looks like it might be mature enough to take the edge off the bleak and desolate standard library.

Because I can pretend its Alice (it was never meant to be).

But mainly because of Opis, which is more awesome than a pack of laser-guided narwhals strung out on awesome juice. Seriously, check it out. Write your code in arrowised FRP. Then use the same piece of code in a live system, simulate the entire network, run it in a step-by-step debugger with event rewind, even prove properties in a model checker or have the simulator guess the asymptotic complexity of each event function. I even have a sneaking suspicion that the pretty system-design-diagrams in the accompanying paper were also generated directly from the source code. If not, thats the first thing I'm implementing :-)

Friday, 12 December 2008

Zombified GMap

This summer I participated in the google summer of code, working on a haskell library for fast, composable maps. This never materialised and is generally assumed to be dead.

What happened? Myself and Adrian Hey put a lot of work into this. The code as is stands is just about usable and is in places very fast (mostly in the places that Adrian was responsible for :-) ). When the summer ended I wanted to tidy up some ugliness in the api (which was causing *loads* of silly code duplification, and was my fault) before releasing the first version. [EDIT: duplification is not a real word]

Then I moved house, started at a new university and generally did lots of time consuming things that caused this release to be put off. I'm an incredibly proficient procastinator so this level of work completely stopped me from doing the 2-3 days worth of work that was needed to bring GMap to life.

It would be a crying shame to let it die after so much work, so I have a plan. The code is going up on hackage, tonight, in all of its ugly glory. Between the 6th and 16th January I have absolutely no work, no commitments and no excuse for leaving my desk. Hopefully the sheer embarassment of having my half-finished code on display will drive me to fix it.

So now I've made a public commitment to fixing this sorry state of affairs and have no way to back out. Fingers crossed.

[EDIT: Huzzah]

Monday, 4 August 2008

Yet more mini benchmarks

Ive uploaded IntMap (like OrdMap its finished except for 'merge'). This mini-benchmark is much like the last.

> {-# OPTIONS_GHC -O2 -fno-monomorphism-restriction #-}
> import qualified Data.Map as M

> import qualified Data.GMap as G
> import Data.GMap.ListMap
> import Data.GMap.OrdMap
> import Data.GMap.IntMap

> import Control.Monad
> import Data.Maybe

> ($$) = liftM

This time we will turn gibberish characters in gibberish Ints

> dict = {-# SCC "dict" #-} map (map fromEnum) $$ take 800000 $$ lines $$ readFile "gibberish"

> test = {-# SCC "test" #-} do
> d <- dict
> let mp :: M.Map [Int] ()
> mp = {-# SCC "Map.fromList" #-} M.fromAscList $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "Map.lookup" #-} M.lookup w mp) d

> testL = {-# SCC "testL" #-} do
> d <- dict
> let mp :: ListMap (OrdMap Int) Int ()
> mp = {-# SCC "OrdList-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "OrdList-GMap.lookup" #-} G.lookup w mp) d

IntMap is an avl tree over unboxed Ints.

> testI = {-# SCC "testI" #-} do
> d <- dict
> let mp :: ListMap IntMap Int ()
> mp = {-# SCC "IntList-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "IntList-GMap.lookup" #-} G.lookup w mp) d

> main = do
> print =<< test
> print =<< testL
> print =<< testI

And ...

total time = 24.48 secs (1224 ticks @ 20 ms)
total alloc = 7,092,895,320 bytes (excludes profiling overheads)

COST CENTRE %time %alloc

Map.lookup 8.2 0.1
Map.fromList 0.2 0.5

OrdList-GMap.lookup 8.9 0.7
OrdList-GMap.fromAssocs 0.5 0.5

IntList-GMap.lookup 4.7 0.7
IntList-GMap.fromAssocs 0.3 0.5

IntMap is a fair bit faster than the others, as you would expect from such a specialised data structure. So what am I going to do with it?

Heres a snippet of Data.Serial (based heavily on Data.Binary)

> class Serial a where
> put :: a -> Put a
> get :: Get a

> encode :: Serial a => a -> [Int]

So I can now run off and write

> data SerialMap k a = SerialMap (ListMap IntMap Int a)

> instance Serial k => Map (SerialMap k) k where ...

Hopefully with some tweaking the Serial package will become fast enough for this to be interesting. This is aimed at situations where you might want to trade off access time (encoding is always going to slow this down) for space (4 UTF8 chars to an Int :-D ).

Saturday, 2 August 2008

More benchmarks

I've taken David Overtons advice and written run hscolour over my post, so this should be easier to read than the last couple.

I've almost finished updating the polymorphic version of the avl trees. The only function missing is 'merge' which I'm hoping Adrian will be free to write as it requires some digging in his AvlTree package. During the rest of the week I'm going to work on the unboxed Int avl trees and then the serialisation package which will be using them.

Time for another benchmark...

Boring imports

> {-# OPTIONS_GHC -O2 -fno-monomorphism-restriction #-}

> import qualified Data.Map as M

Data.GMap can be found at code.haskell.org/gmap/api

> import qualified Data.GMap as G
> import Data.GMap.ListMap
> import Data.GMap.OrdMap
> import Data.GMap.AssocList

> import System.IO
> import System.IO.Unsafe

> import Control.Monad 
> import Data.Maybe
> import Data.List

> ($$) = liftM

"gibberish" is /usr/dict/words, except to make the benchmark large enough I've made 26 versions of each word by prepending characters.

> dict = {-# SCC "dict" #-} take 2000000 $$ lines $$ readFile "gibberish"

We'll turn the dictionary into a map and then lookup each word in turn.

> test =  {-# SCC "test" #-} do
> d <- dict
> let mp = {-# SCC "Map.fromList" #-} M.fromAscList $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "Map.lookup" #-} M.lookup w mp) d

And the same using a plain OrdMap (avl tree).

> testO = {-# SCC "testO" #-} do
> d <- dict
> let mp :: OrdMap String ()
> mp = {-# SCC "Ord-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "Ord-GMap.lookup" #-} G.lookup w mp) d

And again with a ListMap parameterised over an OrdMap.

> testL = {-# SCC "testL" #-} do
> d <- dict
> let mp :: ListMap (OrdMap Char) Char ()
> mp = {-# SCC "List-GMap.fromAssocs" #-} G.fromAssocsAsc $ zip d $ repeat ()
> return $ all isJust $ map (\w -> {-# SCC "List-GMap.lookup" #-} G.lookup w mp) d

> main = do
> print =<< test
> print =<< testO
> print =<< testL

And these are the results (edited for easier reading):

total time = 44.14 secs (2207 ticks @ 20 ms)
total alloc = 10,699,481,064 bytes (excludes profiling overheads)

COST CENTRE %time %alloc

Map.lookup 11.7 0.1
Map.fromList 3.4 3.3

Ord-GMap.lookup 12.0 0.1
Ord-GMap.fromAssocs 0.4 0.9

List-GMap.lookup 8.4 1.2
List-GMap.fromAssocs 0.3 0.9

The point of this is not that ListMap is blazingly fast (although its not bad) but that the Map interface makes it possible to build maps tailored to a particular problem by composing maps and map combinators.

Now I have investigate why ListMap's lookup is causing so much allocation. 1.2% of 10.6 GB is a lot of allocation for a function thats not modifying anything.

Wednesday, 30 July 2008

Week 7 progress (respect my formatting damnit)

I still haven't found a way to make blogger respect whitespace, so yet again I present unformatted code.

I now have everything up and running bug free on the new api except Adrian's avl trees (which I've saved for last). Heres a simple benchmark running ListMap against Data.Map. Bear in mind that as I don't have the avl trees yet the inner map used for list map is the association list from my reference implementation - not the fastest data structure in the world.

import qualified Data.Map as M

import qualified Data.GMap as G
import Data.GMap.ListMap
import Data.GMap.AssocList

import System.IO
import System.IO.Unsafe

import Data.Maybe

dict = lines $ unsafePerformIO $ readFile "/usr/share/dict/words"

test =
let mp = {-# SCC "Map.fromList" #-} M.fromAscList $ zip dict $ repeat ()
in all isJust $ map (\w -> {-# SCC "Map.lookup" #-} M.lookup w mp) dict

testG =
let mp :: ListMap (SList Char) Char ()
mp = {-# SCC "GMap.fromAssocs" #-} G.fromAssocsAsc $ zip dict $ repeat ()
in all isJust $ map (\w -> {-# SCC "GMap.lookup" #-} G.lookup w mp) dict

main = do
print $ test
print $ testG

And the results:

COST CENTRE MODULE %time %alloc

CAF Main 68.9 91.7
Map.lookup Main 14.8 0.3
GMap.lookup Main 14.8 1.9
Map.fromList Main 1.6 4.6
GMap.fromAssocs Main 0.0 1.6

So, not unsurprisingly, ListMap is already faster for trie friendly tests. Once I've updated the avl trees there should be no competition.

Wednesday, 23 July 2008

QuickCheck strikes again

I've just finished converting ListGT from the collections package. ListGT has never seen the light of QuickCheck. Running my test suite on it for the first time was quite scary. I'm not sure yet how much of this is due to actual bugs and how much is due to illphrased tests but I've certainly got my work cut out for me over the next few days.

*Test.GMap> run 100
prop_lookup_empty : OK, passed 100 tests.
prop_lookup_singleton : OK, passed 100 tests.
prop_insert_with : Falsifiable, after 63 tests:
fromAssocs [([],5),([-5],-2),([3,0,4,4,-6,-4,4,7,7],0),([7,-4,-7,-8],4),([7,4,0,-8,3,-8,8,0],6),([7,7,8,-4,8,-2,-1,-1,-1],5)]
([],-6)
prop_insert_without : OK, passed 100 tests.
prop_insertWith_with : Falsifiable, after 30 tests:
fromAssocs [([],2),([3,5,8,1,-1],-5)]
([],11,2,)
prop_insertWith_without : OK, passed 100 tests.
prop_insertWith'_with : Falsifiable, after 1 tests:
fromAssocs [([],2),([1],-2)]
([-2],1,-2,)
prop_insertWith'_without : OK, passed 100 tests.
prop_insertMaybe_with : Falsifiable, after 16 tests:
fromAssocs [([],-6),([-3,-7,-4,1,0,-6,5],-3),([-2,1,3,-5,-6,5],-7),([-1,7],2)]
([],-1,2,)
prop_insertMaybe_without : OK, passed 100 tests.
prop_insertMaybe'_with : Falsifiable, after 30 tests:
fromAssocs [([],-13),([-12,-1,-4],-5),([-8,-5,-4,-5,-14,-13,-5,-9,-2,1,0,2,8],-3),([-8,-5,-3,-1,-13],9),([-7,5,16,15,-9,16,6,7,15,9],11),([-1,-8,5,-12,-1,8,2,7,-12],-1),([6,6,-8,2,11,-10,2,-1,-2,-7,-9,-10,13,-9,13,0],5),([7,1,-16,16,-14,-5,16,14,-7,16,-13,7,9],-3),([10,1,-9,15],-1),([12,-13,2,4],-10),([13,-10,4,-8,-4,9,-7,-1,-13,13,8,-11,-8,7,6,5],-6),([15,-5,9,-10,6,9,-3],-12),([15,12,13,-8],7)]
([],0,-8,)
prop_insertMaybe'_without : OK, passed 100 tests.
prop_delete_with : OK, passed 100 tests.
prop_delete_without : OK, passed 100 tests.
prop_adjustWith_with : Falsifiable, after 6 tests:
fromAssocs [([],3),([-5,-4,-2,-3,-4,6],3),([0,5,3],5),([4,6,-5,0,5,-5],-2),([6,0],-1)]
([],1,)
prop_adjustWith_without : OK, passed 100 tests.
prop_adjustWith'_with : Falsifiable, after 20 tests:
fromAssocs [([],11),([-10,6,9,7],0),([-9,4,1,11,0,-10,11],6),([-4,-4,0,3,3,-3,6,9,3],8),([-4,-2],-9),([-2,11,7,-2,-8,11,4,-4,-3,7,-6],10),([-1,-1,-5,3,0,-3,8,-9,7,-7],3),([3,2,-4,3,-8,9,7],-3),([4,7,2,-2,-8],7),([11,9,11,-9,0],-11)]
([],8,)
prop_adjustWith'_without : OK, passed 100 tests.
prop_adjustMaybe_with : Falsifiable, after 0 tests:
fromAssocs [([],2)]
([],-3,)
prop_adjustMaybe_without : OK, passed 100 tests.
prop_adjustMaybe'_with : Falsifiable, after 33 tests:
fromAssocs [([],1),([-2,2],0)]
([],0,)
prop_adjustMaybe'_without : OK, passed 100 tests.
prop_isSubsetOf : OK, passed 100 tests.
prop_isSubmapOf : OK, passed 100 tests.
prop_map : Falsifiable, after 9 tests:
fromAssocs [([],-2),([2,2],-2)]
([],0,)
prop_map' : Falsifiable, after 10 tests:
fromAssocs [([],0)]
([],-1,)
prop_mapMaybe : Falsifiable, after 3 tests:
fromAssocs [([],3),([-3,-1,0],2),([-2,1,3],-1)]
([],-2,)
prop_mapMaybe' : Falsifiable, after 5 tests:
fromAssocs [([],0),([2],0)]
([2],-3,)
prop_mapWithKey : Falsifiable, after 10 tests:
fromAssocs [([],-7),([-7,-5,6,-4,4,-2],-6),([-6,3,2,5],-5),([0,-5,-3,-1,-5,6],-3),([0,-3,4],0),([2,-3,-3,-5,5,-3,1],2)]
([],3,)
prop_mapWithKey' : Falsifiable, after 5 tests:
fromAssocs [([],0)]
([],2,)
prop_filter_in : Falsifiable, after 10 tests:
fromAssocs [([],-2),([-5,-4],-5),([-4,-3,-5,3,0],0),([5,4],4)]
([],2)
prop_filter_out : Falsifiable, after 57 tests:
fromAssocs [([-1],-1)]
([-1],0)
prop_valid : OK, passed 100 tests.
prop_lazy_alter : OK, passed 100 tests.
prop_strict_alter' : Falsifiable, after 5 tests:
fromAssocs [([],1)]
[]
prop_lazy_insertWith : OK, passed 100 tests.
prop_strict_insertWith' : OK, passed 100 tests.
prop_lazy_insertMaybe : OK, passed 100 tests.
prop_strict_insertMaybe' : Falsifiable, after 0 tests:
fromAssocs []
[]
prop_lazy_adjustWith : OK, passed 100 tests.
prop_strict_adjustWith' : Falsifiable, after 0 tests:
fromAssocs [([],3),([-1,-1],-1),([1,-1],-2)]
([0,3,-2],1)
prop_lazy_adjustMaybe : OK, passed 100 tests.
prop_strict_adjustMaybe' : Falsifiable, after 0 tests:
fromAssocs [([],1)]
([],-1)
prop_lazy_map : OK, passed 100 tests.
prop_strict_map' : OK, passed 100 tests.
prop_lazy_mapMaybe : OK, passed 100 tests.
prop_strict_mapMaybe' : Falsifiable, after 3 tests:
fromAssocs [([-3,1,-3],1),([1,-2,3],-3),([2,-4,-4],2),([3,3],-1)]
([0,4],0)
prop_lazy_mapWithKey : OK, passed 100 tests.
prop_strict_mapWithKey' : OK, passed 100 tests.
prop_lazy_foldKeys : OK, passed 100 tests.
prop_strict_foldKeys' : OK, passed 100 tests.
prop_lazy_foldElems : OK, passed 100 tests.
prop_strict_foldElems' : OK, passed 100 tests.
prop_lazy_foldAssocs : OK, passed 100 tests.
prop_strict_foldAssocs' : OK, passed 100 tests.
propO_keysAsc : OK, passed 100 tests.
propO_keysDesc : OK, passed 100 tests.
propO_elemsAsc : OK, passed 100 tests.
propO_elemsDesc : OK, passed 100 tests.
propO_assocsAsc : OK, passed 100 tests.
propO_assocsDesc : OK, passed 100 tests.
propO_lazy_foldKeysAsc : OK, passed 100 tests.
propO_lazy_foldKeysAsc' : OK, passed 100 tests.
propO_lazy_foldKeysDesc : OK, passed 100 tests.
propO_lazy_foldKeysDesc' : OK, passed 100 tests.
propO_lazy_foldElemsAsc : OK, passed 100 tests.
propO_lazy_foldElemsAsc' : OK, passed 100 tests.
propO_lazy_foldElemsDesc : OK, passed 100 tests.
propO_lazy_foldElemsDesc' : OK, passed 100 tests.
propO_lazy_foldAssocsAsc : OK, passed 100 tests.
propO_lazy_foldAssocsAsc' : OK, passed 100 tests.
propO_lazy_foldAssocsDesc : OK, passed 100 tests.
propO_lazy_foldAssocsDesc' : OK, passed 100 tests.
prop2_lazy_merge : OK, passed 100 tests.
prop2_strict_merge' : Falsifiable, after 17 tests:
(fromAssocs [],fromAssocs [([-3,3],6)])
([3,3],-4)
prop2_lazy_union : OK, passed 100 tests.
prop2_strict_union' : OK, passed 100 tests.
prop2_lazy_unionMaybe : OK, passed 100 tests.
prop2_strict_unionMaybe' : OK, passed 100 tests.
prop2_lazy_intersection : OK, passed 100 tests.
prop2_strict_intersection' : OK, passed 100 tests.
prop2_lazy_intersectionMaybe : OK, passed 100 tests.
prop2_strict_intersectionMaybe' : OK, passed 100 tests.
prop2_lazy_differenceMaybe : OK, passed 100 tests.
prop2_strict_differenceMaybe' : OK, passed 100 tests.
comp_empty : OK, passed 100 tests.
comp_singleton : OK, passed 100 tests.
comp_pair : OK, passed 100 tests.
comp_status : OK, passed 100 tests.
comp_nonEmpty : OK, passed 100 tests.
comp_addSize : OK, passed 100 tests.
comp_lookup : OK, passed 100 tests.
comp_lookupCont : OK, passed 100 tests.
comp_alter : OK, passed 100 tests.
comp_alter' : Falsifiable, after 0 tests:
([([-3,-2],-3)],([-2],))
comp_insertWith : OK, passed 100 tests.
comp_insertWith' : Falsifiable, after 1 tests:
([([],-1)],([],1,))
comp_insertMaybe : OK, passed 100 tests.
comp_insertMaybe' : OK, passed 100 tests.
comp_delete : OK, passed 100 tests.
comp_adjustWith : OK, passed 100 tests.
comp_adjustWith' : OK, passed 100 tests.
comp_adjustMaybe : OK, passed 100 tests.
comp_adjustMaybe' : OK, passed 100 tests.
comp_map : OK, passed 100 tests.
comp_map' : OK, passed 100 tests.
comp_mapMaybe : OK, passed 100 tests.
comp_mapMaybe' : OK, passed 100 tests.
comp_mapWithKey : OK, passed 100 tests.
comp_mapWithKey' : OK, passed 100 tests.
comp_filter : OK, passed 100 tests.
comp_insert : Falsifiable, after 20 tests:
([([],2)],([],-1))
comp_size : OK, passed 100 tests.
comp_insertAssocs : OK, passed 100 tests.
comp_fromAssocs : OK, passed 100 tests.
comp_fromAssocsWith : OK, passed 100 tests.
comp_keys : OK, passed 100 tests.
comp_elems : OK, passed 100 tests.
comp_assocs : OK, passed 100 tests.
compO_insertAssocsAscWith : OK, passed 100 tests.
compO_insertAssocsDescWith : OK, passed 100 tests.
compO_insertAssocsAscMaybe : OK, passed 100 tests.
compO_insertAssocsDescMaybe : OK, passed 100 tests.
compO_foldElemsAsc : OK, passed 100 tests.
compO_foldElemsDesc : OK, passed 100 tests.
compO_foldElemsAsc' : OK, passed 100 tests.
compO_foldElemsDesc' : OK, passed 100 tests.
compO_foldKeysAsc : OK, passed 100 tests.
compO_foldKeysDesc : OK, passed 100 tests.
compO_foldKeysAsc' : OK, passed 100 tests.
compO_foldKeysDesc' : OK, passed 100 tests.
compO_foldAssocsAsc : OK, passed 100 tests.
compO_foldAssocsDesc : OK, passed 100 tests.
compO_foldAssocsAsc' : OK, passed 100 tests.
compO_foldAssocsDesc' : OK, passed 100 tests.
compO_fromAssocsAscWith : OK, passed 100 tests.
compO_fromAssocsDescWith : OK, passed 100 tests.
compO_fromAssocsAscMaybe : OK, passed 100 tests.
compO_fromAssocsDescMaybe : OK, passed 100 tests.
compO_elemsAsc : OK, passed 100 tests.
compO_elemsDesc : OK, passed 100 tests.
compO_keysAsc : OK, passed 100 tests.
compO_keysDesc : OK, passed 100 tests.
compO_assocsAsc : OK, passed 100 tests.
compO_assocsDesc : OK, passed 100 tests.
comp2_merge : Falsifiable, after 2 tests:
(([([1],2),([-3,4,-3,4],-2),([3],-2)],[([3,3],-1),([2,4,-2,-1],-1),([4,-3,-3,-4],-3)]),(,))
comp2_merge' : Falsifiable, after 7 tests:
(([([4],-5),([-2,4,-2,4],-4)],[([],3),([-2,-5],-1),([4,3,-5],-3)]),(,))
comp2_union : OK, passed 100 tests.
comp2_union' : OK, passed 100 tests.
comp2_unionMaybe : OK, passed 100 tests.
comp2_unionMaybe' : OK, passed 100 tests.
comp2_intersection : OK, passed 100 tests.
comp2_intersection' : OK, passed 100 tests.
comp2_intersectionMaybe : OK, passed 100 tests.
comp2_intersectionMaybe' : OK, passed 100 tests.
comp2_difference : OK, passed 100 tests.
comp2_differenceMaybe : OK, passed 100 tests.
comp2_differenceMaybe' : OK, passed 100 tests.
comp2_isSubsetOf : OK, passed 100 tests.
comp2_isSubmapOf : OK, passed 100 tests.
comp2_isProperSubsetOf : OK, passed 100 tests.
comp2_isProperSubmapOfBy : OK, passed 100 tests.

Tuesday, 15 July 2008

Week 5 progress

What happened to week 4? The power cable for my laptop silently gave up (the first sign of trouble was the melting plastic). It took me a long time to replace it - the first replacement I ordered still hasn't arrived. So instead of coding I spent the week sketching out designs and idea's on paper which actually ended up being really useful.

I'm now in Scotland, for some reason. I'm currently working on, in parallel, tweaking the api (code.haskell.org/gmap/api), writing tests and writing reference implementations on sorted and unsorted association lists. Once I have a really formidable test suite I'll begin converting the ~9000 lines of heavily optimised maps to the new api.

The current Map and OrderedMap classes are a compromise between generality and speed. The classes are huge - 50 or so methods each. I've started with some very neat, general methods like merge and alter as was suggested on the libraries list. I've then added lots of very specific methods like insertMaybe which have default implementations in terms of the general methods. The idea is that you can very quickly produce a working instance and then redefine the faster, more specific methods as necessary.

I'm writing the main part of the test suite as a test against the reference instances. You run a method once with the instance to be tested and once with the reference and check that both produce the same answer (or set of associations). This is a bit fiddly to set up type-wise but I think TestM will come to the rescue again. I'll write more on that later when I've made it less fiddly (and typechecked it) but the basic idea looks like this:

data Compare m1 m2 = forall a b. Eq b => Compare (PolyGen (m1 a) b) (PolyGen (m2 a) b)

checkMethod :: (Map map1 k, Map map2 k, Eq b) => (forall a map. Map map k => PolyGen (map a) b) -> Compare (map1 a) (map2 a)
checkMethod test = Compare test test

runCompare :: Compare m1 m2 -> Gen Bool
runCompare (Compare ma mb) =
a <- unPoly ma
b <- unPoly mb
return (a==b)

The PolyGen monad (which evolved from the test monad of 2 weeks back) is like the Gen monad but additonally can fix the type of class methods. An example might make this clearer:

empty :: Map map k => map a

prop_size_empty :: Map map k => PolyGen (map a) Bool
prop_size_empty = do
mp <- Poly empty
return $ (size empty = 0)

main = do
quickCheck (prop_size_empty :: PolyGen (IntMap Int) Bool)
quickCheck (prop_size_empty :: PolyGen (ListMap OrdMap Char String) Bool)

So now thanks to GADT magic (fiddly, verbose magic :-/ ) we can get test code that looks like:

checkSize :: Map map1 k, Map map2 k => Compare (map1 a) (map2 a)
checkSize = checkMethod $ do
mp <- PolyArb
return $ size mp

comps :: Map map1 k, Map map2 k => [Compare (map1 a) (map2 a)]
comps =
[
....
,checkSize
....
]

type CA map a = [Compare (AssocList a) (map a)]
type CSA map a = [Compare (SortedAssocList a) (map a)]

runComps = mapM (quickCheck . runCompare)

main = do
runComps (comps :: CA IntMap String)
runComps (comps :: CSA IntMap String)
runComps (comps :: CA (OrdMap Int) String)
runComps (comps :: CSA (OrdMap Int) String)
runComps (comps :: CA (ListMap IntMap Int) String)
runComps (comps :: CSA (ListMap IntMap Int) String)
etc ...

What this does is for each instance specified in main and for each test in the list of tests check that the same results are obtained when the test is run in the specified instance, in the AssocList reference instance and in the SortedAssocList reference instance. Which is damn handy, seeing as Adrian already wrote 6 0r 7 more or less working instances and there will be plenty more to follow.

Its a bit complicated and messy. And it would be nicer if I didn't have to keep rerunning the same reference tests against each instance. But I'll get all these wrinkles ironed out as I use it more. The whole thing would be far easier if I could just pass the types as an argument to the testing function. I also need to eventually need to look into testing laziness and strictness properties so I can make sure that

I've decided that I like writing this blog. I've come up with some nice ideas whilst writing progress updates, mostly due to that little twinge of embarrassment that you get when you post manky code.

About Me