Monday, 16 June 2008

GSoC Week 1

With the aftermath of my exams the first week has got off to a slow start. The first day was eaten by setting up ghc on a fresh computer and prodding the collections library into building (I still can't persuade cabal install to correctly register libraries).

I decided to try a serialised trie first. Adrian pointed out that Data.Binary use c-style allocation rather than the haskell gc so its not so efficient on large numbers of small keys. I've now ripped out most of the internals for my Serial class but left the front end untouched so I can (almost) reuse the deriving code and the default instances.

I've also implemented bitpacking rather than using a whole Word8 per constructor. I expect this will hurt speedwise but I think the focus for serialising should be mostly saving space. Later map implementations can trade off in the other direction.

The internal Builder in Data.Binary has been replaced with a Buildable class. This is only so I can easily experiment with different outputs - that change alone tripled the encoding time (because it prevents the Put monad from inlining the builder's puts?) so the final implementation will have to go back to a single output type.

Here are some small demonstrations:

*Data.Binary> encode ([1,2,3,4,5,6,7,8,9,0] :: [Word]) :: [Word]
[0,10,1,2,3,4,5,6,7,8,9,0]
*Data.Binary> encode ([1,2,3,4,5,6,7,8,9,0] :: [Word8]) :: [Word]
[0,655618,50595078,117967104]

*Data.Binary.Buildable.WordList> build $ foldl append empty $ map (putBits 32) [0,1,0,0,0,1,1,1,0,0,1,1,1,0,0]
[0,1,0,0,0,1,1,1,0,0,1,1,1,0,0]
*Data.Binary.Buildable.WordList> build $ foldl append empty $ map (putBits 4) [0,1,0,0,0,1,1,1,0,0,1,1,1,0,0]
[1048593,268505344]
*Data.Binary.Buildable.WordList> build $ foldl append empty $ map (putBits 1) [0,1,0,0,0,1,1,1,0,0,1,1,1,0,0]
[9116]

The key encoding is currently strict which means that lookups which fail early still have to pay the encoding cost. This kind of defeats the purpose of having a trie so I want to have a strict version for building the trie and a lazy version for lookups.

Goals for this week are:
Proper tests for the Serial class
Lazy Serial output
Set of benchmarks for comparing performance of different trie implementations
[UArray Word] output for Buildable

Thats all. Noodle time for me now.

0 comments:

About Me