Last night at the NYC Ruby hackfest, I got into a discussion about serializing data. Brian mentioned Ruby's built-in Marshal module, which for some reason had completely escaped my attention until then. He said it was wicked fast, so we decided to run a quick benchmark comparison.
The test data is designed to roughly approximate what my stored classifier data will look like. We benchmarked four methods: Marshal, JSON, eval, and YAML. For each one, we serialized the in-memory object and then read it back in. For eval, that meant converting the object to Ruby source code and then running eval on that string. Here are the results (times in seconds) for 100 iterations on a 10k-element array and a hash with 10k key/value pairs, run on my MacBook Pro (2.4 GHz Core 2 Duo):
                    user     system      total        real
array marshal   0.210000   0.010000   0.220000 (  0.220701)
array json      2.180000   0.050000   2.230000 (  2.288489)
array eval      2.090000   0.060000   2.150000 (  2.240443)
array yaml     26.650000   0.350000  27.000000 ( 27.810609)
hash marshal    2.000000   0.050000   2.050000 (  2.114950)
hash json       3.700000   0.060000   3.760000 (  3.881716)
hash eval       5.370000   0.140000   5.510000 (  6.117947)
hash yaml      68.220000   0.870000  69.090000 ( 72.370784)
The order in which I tested them is pretty much the order in which they ranked for speed. Marshal was amazingly fast: about ten times faster than JSON on the array and nearly twice as fast on the hash. JSON and eval came out roughly equal on the array, with eval trailing quite a bit on the hash. YAML was just slow as all hell. A note on the JSON numbers: I used version 1.1.3 of the json library, which uses a C extension to parse. I assume it would be quite a bit slower with the pure-Ruby implementation. Here's a gist of the benchmark code if you're curious and want to run it yourself.
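For a rough idea of what one round trip looks like for each method, here is a minimal sketch. It is not the gist itself: the `sample_data` helper and its string/integer contents are hypothetical stand-ins (the real classifier data isn't shown here), and the timing uses a plain monotonic clock rather than the Benchmark report format in the tables.

```ruby
require 'json'
require 'yaml'

# One lambda per method: serialize the object to a string, then read it back.
ROUND_TRIPS = {
  'marshal' => ->(obj) { Marshal.load(Marshal.dump(obj)) },
  'json'    => ->(obj) { JSON.parse(JSON.generate(obj)) },
  # inspect emits Ruby literal syntax for simple arrays and hashes.
  # Only acceptable here because the data is generated locally and trusted.
  'eval'    => ->(obj) { eval(obj.inspect) },
  'yaml'    => ->(obj) { YAML.load(YAML.dump(obj)) }
}.freeze

# Hypothetical test data (an assumption, not the original classifier data):
# an array of strings and a hash mapping each string to an integer.
def sample_data(size)
  array = (0...size).map { |i| "token#{i}" }
  hash  = array.each_with_index.to_h
  { 'array' => array, 'hash' => hash }
end

# Returns a hash of "kind method" => elapsed wall-clock seconds.
def time_round_trips(size:, iterations:)
  results = {}
  sample_data(size).each do |kind, obj|
    ROUND_TRIPS.each do |name, round_trip|
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      iterations.times { round_trip.call(obj) }
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      results["#{kind} #{name}"] = elapsed
    end
  end
  results
end

# Small run so the sketch finishes quickly.
time_round_trips(size: 100, iterations: 5).each do |label, seconds|
  puts format('%-14s %10.6f', label, seconds)
end
```

Bumping the call to `size: 10_000, iterations: 100` should approximate the setup described above, though the absolute numbers will differ with hardware, Ruby version, and library versions.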
If you're serializing user data, be super careful about using eval, since it will execute whatever Ruby code is in the string you hand it. It's probably best to avoid it completely. Finally, just for fun I took YAML out (it was too slow) and ran the benchmark again with 1k iterations:
                    user     system      total        real
array marshal   2.080000   0.110000   2.190000 (  2.242235)
array json     21.860000   0.500000  22.360000 ( 23.052403)
array eval     20.730000   0.570000  21.300000 ( 21.992454)
hash marshal   19.510000   0.500000  20.010000 ( 20.794111)
hash json      39.770000   0.670000  40.440000 ( 41.689297)
hash eval      51.410000   1.290000  52.700000 ( 54.155711)
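Coming back to the eval warning: here is a small sketch of why it matters. The payload string and the `load_untrusted` helper are hypothetical, made up for illustration. The point is that eval runs any code embedded in the "data", while a parser like JSON simply rejects input that isn't valid.

```ruby
require 'json'

$side_effect = false

# Hypothetical payload: looks like a stored array, but carries Ruby code.
payload = '[1, 2, ($side_effect = true; 3)]'

# eval happily executes the embedded assignment while building the array.
evaluated = eval(payload)

# A data-only parser refuses the same input instead of running it.
def load_untrusted(text)
  JSON.parse(text)
rescue JSON::ParserError
  nil
end

puts "eval returned #{evaluated.inspect}, side effect ran: #{$side_effect}"
puts "JSON returned #{load_untrusted(payload).inspect}"
```

In a real attack the embedded code would do something much worse than flip a flag, which is why eval is only reasonable for data you generated yourself.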