August 27, 2008

Serializing data speed comparison: Marshal vs. JSON vs. Eval vs. YAML

Last night at the NYC Ruby hackfest, I got into a discussion about serializing data. Brian mentioned the Marshal library to me, which for some reason had completely escaped my attention until then. He said it was wicked fast, so we decided to run a quick benchmark comparison.

The test data is designed to roughly approximate what my stored classifier data will look like. The methods we decided to benchmark were Marshal, JSON, eval, and YAML. With each one we took the in-memory object, serialized it, and then read it back in. With eval we had to convert the object to Ruby code to serialize it, then run eval against that. Here are the results (times in seconds) for 100 iterations on a 10k element array and a hash with 10k key/value pairs, run on my MacBook Pro 2.4 GHz Core 2 Duo:

                 user      system     total       real
array marshal  0.210000   0.010000   0.220000 (  0.220701)
array json     2.180000   0.050000   2.230000 (  2.288489)
array eval     2.090000   0.060000   2.150000 (  2.240443)
array yaml    26.650000   0.350000  27.000000 ( 27.810609)

hash marshal   2.000000   0.050000   2.050000 (  2.114950)
hash json      3.700000   0.060000   3.760000 (  3.881716)
hash eval      5.370000   0.140000   5.510000 (  6.117947)
hash yaml     68.220000   0.870000  69.090000 ( 72.370784)

The order in which I tested them is pretty much the order in which they ranked for speed. Marshal was amazingly fast: roughly 10x faster than JSON on the array, and a little under 2x faster on the hash. JSON and eval came out roughly equal on the array, with eval trailing quite a bit on the hash (6.1s vs. 3.9s). YAML was just slow as all hell, over 100x slower than Marshal on the array. A note on the JSON numbers: I used the 1.1.3 library, which uses a C extension to parse. I assume it would be quite a bit slower if I used the pure Ruby implementation. Here's a gist of the benchmark code if you're curious and want to run it yourself.
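For readers who can't reach the gist, here is a minimal sketch of what a round-trip benchmark along these lines might look like. It is not the original benchmark code: the data shapes, the `SIZE` and `ITERATIONS` values, and the `ROUND_TRIPS` and `round_trip` names are all assumptions made for illustration, it serializes to in-memory strings instead of files, and it assumes a modern Ruby with the bundled `json` and `yaml` libraries.

```ruby
require 'benchmark'
require 'json'
require 'yaml'

# Hypothetical sizes, kept small so the sketch finishes quickly.
# The post used 10k elements and 100 (or 1k) iterations.
SIZE = 1_000
ITERATIONS = 10

# Stand-in test data: an array of strings and a string => integer hash.
array = (1..SIZE).map { |i| "element #{i}" }
hash = {}
SIZE.times { |i| hash["key #{i}"] = i }

# Each entry maps a label to a [serializer, deserializer] pair.
ROUND_TRIPS = {
  'marshal' => [->(obj) { Marshal.dump(obj) },  ->(str) { Marshal.load(str) }],
  'json'    => [->(obj) { JSON.generate(obj) }, ->(str) { JSON.parse(str) }],
  # "Serializing" for eval means producing Ruby source via inspect.
  'eval'    => [->(obj) { obj.inspect },        ->(str) { eval(str) }],
  'yaml'    => [->(obj) { YAML.dump(obj) },     ->(str) { YAML.load(str) }]
}

# Serialize the object, then read it back in.
def round_trip(obj, dumper, loader)
  loader.call(dumper.call(obj))
end

Benchmark.bm(14) do |bm|
  { 'array' => array, 'hash' => hash }.each do |label, data|
    ROUND_TRIPS.each do |name, (dumper, loader)|
      bm.report("#{label} #{name}") do
        ITERATIONS.times { round_trip(data, dumper, loader) }
      end
    end
  end
end
```

Absolute timings from a sketch like this won't match the tables here, since the Ruby version, the libraries, and the hardware are all different.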

If you're serializing user data, be super careful about using eval, since it will run whatever Ruby code ends up in the string. It's probably best to avoid it completely. Finally, just for fun I took YAML out (it was too slow) and ran the benchmark again with 1k iterations:

                 user      system     total       real
array marshal  2.080000   0.110000   2.190000 (  2.242235)
array json    21.860000   0.500000  22.360000 ( 23.052403)
array eval    20.730000   0.570000  21.300000 ( 21.992454)

hash marshal  19.510000   0.500000  20.010000 ( 20.794111)
hash json     39.770000   0.670000  40.440000 ( 41.689297)
hash eval     51.410000   1.290000  52.700000 ( 54.155711)
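To make the earlier caution about eval concrete, here is a small sketch of the difference between eval and a data-only parser. The `untrusted` string and the `$eval_side_effect` global are hypothetical, made up purely for illustration.

```ruby
require 'json'

# Hypothetical untrusted input: it looks like an array literal,
# but it carries an extra Ruby statement inside the parentheses.
untrusted = '[1, 2, ($eval_side_effect = true; 3)]'

# eval returns an ordinary-looking array, but it also runs the
# embedded assignment. Real attacker input could run anything.
evaled = eval(untrusted)

# JSON.parse only understands JSON, so it rejects the same string
# instead of executing it.
parsed = begin
  JSON.parse(untrusted)
rescue JSON::ParserError
  nil
end

puts "eval result:      #{evaled.inspect}"
puts "side effect ran:  #{$eval_side_effect.inspect}"
puts "JSON.parse result: #{parsed.inspect}"
```

The embedded statement here is harmless, but the same mechanism would let stored data delete files or read secrets when it is loaded.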


Comments

Nice post, Paul.

One issue I ran into with Marshaling data into the database this week was related to character encoding. I never totally nailed down the specific problem, but I had to base64 encode the Marshal output in order to safely store it in a TEXT column.

-Bryan

I heard that the YAML lib may be removed from Ruby 1.9 as it has no maintainer. Perhaps a new maintainer could start by rewriting it, and code the lib in C?

I think the current YAML parser is Syck, which is a Ragel parser written in C. If that's the case I'm not sure how its speed could be improved.

What about XML serialization? I doubt it would be faster than Marshal, but I'd like to see it in the results.

Have a nice day and thank you for this great post,

Lukas

why the lucky stiff wrote Syck and it's been a part of Ruby since 1.8.0.

however, it could definitely be faster! Syck might not be the bottleneck - the Syck page says Syck is hella fast - but I've seen Ruby YAML go slow as hell on my password gem. first thing I did when I read this post was make a note to switch password from YAML to Marshal.

maybe you have to explicitly invoke Syck to get faster YAML. I don't know. very weird that why brags about its speed yet users complain about its not-speed.
