Update: I've noticed that a lot of people land on this page from a search for "Ruby HTTP client." Since writing this post, I've created what I think is the best HTTP client around for Ruby. It's called Typhoeus; it's super fast and has its own bindings to libcurl.
I'm currently working on a soon-to-be-released feed (Atom and RSS) library called Feedzirra. One of my criteria for this library is that it should be able to pull multiple feeds as quickly as possible, a feature I find lacking in all current Ruby feed libraries. All of the libraries I've checked out use Net::HTTP, which has been shown to be slow. So I decided to look at other HTTP clients, focusing on ones that could finish multiple requests quickly.
I ran a benchmark test against my own site (just http://www.pauldix.net) using the following libraries:
- Net::HTTP
- RFuzz
- taf2-curb (a GitHub version of the curb gem that supports the libcurl multi interface)
- EventMachine (specifically, HttpClient2)
- curl-multi (another Ruby library that provides an interface to the libcurl multi API)
Here is the output from runs fetching the page once, 10 times, and 100 times (benchmark code here, if you're interested).
Getting http://www.pauldix.net 1 time:
                  user     system      total        real
taf2-curb     0.010000   0.010000   0.020000 (  0.687772)
nethttp       0.000000   0.000000   0.000000 (  0.627915)
rfuzz         0.000000   0.000000   0.000000 (  0.601139)
eventmachine  0.000000   0.000000   0.000000 (  0.688079)
curl-multi    0.000000   0.000000   0.000000 (  0.596781)
Getting http://www.pauldix.net 10 times:
                  user     system      total        real
taf2-curb     0.020000   0.050000   0.070000 (  0.779883)
nethttp       0.070000   0.050000   0.120000 (  6.462604)
rfuzz         0.010000   0.010000   0.020000 (  7.973339)
eventmachine  0.010000   0.020000   0.030000 (  1.046070)
curl-multi    0.020000   0.040000   0.060000 (  0.869918)
Getting http://www.pauldix.net 100 times:
                  user     system      total        real
taf2-curb     0.610000   1.850000   2.460000 (  8.791512)
nethttp       0.720000   0.530000   1.250000 ( 69.995164)
rfuzz         0.100000   0.140000   0.240000 ( 67.059852)
eventmachine  0.140000   0.390000   0.530000 (  7.506380)
curl-multi    0.740000   1.430000   2.170000 ( 12.315727)
There are plenty of problems with this kind of test. Network speed varies from run to run, and a GET on my site doesn't represent the many situations we can run into (like slow responses, slow transfers, or large pages). Still, the results are clear: EventMachine and the two libraries that use the libcurl multi interface win on real (wall-clock) time, which is the only metric we care about. At 100 requests they finished in roughly 7.5 to 12.3 seconds, versus about 67 to 70 seconds for RFuzz and Net::HTTP.
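The `user system total real` columns in the tables above match the layout of Ruby's standard `Benchmark.bm` report: the first three are CPU time in seconds and `real` is elapsed wall-clock time. The sketch below is a hedged illustration, not the original benchmark code. It uses `sleep` as a hypothetical stand-in for waiting on a server, to show why a network-bound job can report near-zero CPU time while its real time keeps growing.

```ruby
require 'benchmark'

# Hypothetical stand-in for waiting on a slow server: the process sleeps,
# so it burns almost no CPU while the wall clock keeps running.
def wait_on_network
  sleep 0.2
end

# Hypothetical stand-in for CPU-bound work, such as parsing a response.
def busy_parse
  (1..300_000).reduce(0) { |sum, i| sum + (i % 7) }
end

# Benchmark.bm prints "user system total real" and returns one
# Benchmark::Tms object per report, in order.
results = Benchmark.bm(8) do |x|
  x.report('waiting') { wait_on_network }
  x.report('parsing') { busy_parse }
end

waiting = results.first
# For the waiting job, CPU time (user + system) is a tiny fraction of real time.
puts format('waiting: cpu %.3fs, real %.3fs', waiting.total, waiting.real)
```

The same effect shows up in the 100-request Net::HTTP row: about 1.25 seconds of total CPU time against roughly 70 seconds of real time.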
That left library features as the deciding factor, and on features taf2-curb was the clear winner. With EventMachine, I had problems getting it to pull down anything from FeedBurner, and I couldn't get it to handle redirects. With curl-multi, it wasn't obvious how to handle redirects or set header data. The taf2-curb library supported all the features I needed and let me set any header data I wanted. I'd highly recommend it if you need to do any HTTP client work in Ruby.
Did the servers you tested against support any form of compression, and if so, which one(s) were used at the time? If no compression was used, it seems fair to repeat the experiments with the various compression options each library supports, because that is the point of them. Which version(s) of Ruby did you test?
Posted by: hgs | January 30, 2009 at 06:01 AM
I was just testing against my site (hosted by TypePad). My guess is that they support compression; however, I didn't indicate that compression was acceptable in any of my request headers. I may rerun the test with compression, but I doubt it will make a difference. I expect the libcurl multi and EventMachine libraries to far outperform RFuzz and Net::HTTP when making multiple requests. That's more a function of their deferred pattern than of compression support (see the reactor pattern).
I performed the tests on Ruby 1.8.7. Based on the post I linked to, my guess is that the differences would have been even more pronounced on 1.8.6. I'm not sure about 1.9, but even if it were slightly better, I think libcurl is the better option. It's widely used and widely tested, and people put it to good use in many languages.
Posted by: Paul Dix | January 30, 2009 at 07:53 AM
I think proving that compression works has value, because if many people use your reader, any advantage will be multiplied. See, for example, http://griffin.oobleyboo.com/archive/ruby-net-http-and-content-encoding-http_encoding_helper/ and http://www.codinghorror.com/blog/archives/000807.html, although there are many others discussing the scalability of RSS. And this is optimisation, so getting the right benchmarks is part of making the right decision.
Posted by: hgs | January 30, 2009 at 09:41 AM
I'll certainly test compression for my library. However, that will be a test of taf2-curb with and without compression; this test was about selecting an HTTP client.
Decompression will happen outside the client, via Zlib or something similar, after the download completes. The test I'll run will determine whether the extra CPU spent on decompression makes fetching feeds faster or slower overall (my guess is compression = faster).
Posted by: Paul Dix | January 30, 2009 at 09:49 AM
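The plan in the comment above (download first, then decompress outside the client with Zlib) might look roughly like the following sketch. The `decode_body` helper is hypothetical, and it assumes the request advertised `Accept-Encoding: gzip, deflate` so the server could compress its reply; the download itself is omitted, and a locally gzipped string stands in for a compressed response body.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: decompress a body after it has been downloaded,
# choosing the codec from the response's Content-Encoding header value.
def decode_body(body, content_encoding)
  case content_encoding.to_s.strip.downcase
  when 'gzip', 'x-gzip'
    # wrap closes the reader after the block and returns the block's value
    Zlib::GzipReader.wrap(StringIO.new(body)) { |gz| gz.read }
  when 'deflate'
    # Assumes zlib-wrapped deflate data, as HTTP specifies; servers that send
    # raw deflate would need Zlib::Inflate.new(-Zlib::MAX_WBITS) instead.
    Zlib::Inflate.inflate(body)
  else
    body # identity (no compression) or unknown: pass through unchanged
  end
end

# Simulated feed and its gzip-compressed form; no network is involved.
feed = '<rss><channel><item><title>post</title></item></channel></rss>' * 200
buffer = StringIO.new(''.b)
gz = Zlib::GzipWriter.new(buffer)
gz.write(feed)
gz.finish # writes the gzip trailer without closing the StringIO
gzipped = buffer.string

puts "raw: #{feed.bytesize} bytes, gzipped: #{gzipped.bytesize} bytes"
```

To answer the question raised in the thread for a given feed, one could wrap the `decode_body` call in `Benchmark.realtime` and compare that cost with the time saved by the smaller download.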
Could you test against a local server to get more accurate results? Maybe something like Apache serving files of different sizes on your own computer or on a spare one? Then you wouldn't have to worry about variations in network conditions. I don't know, just a thought.
Posted by: John Nunemaker | January 30, 2009 at 01:12 PM
Testing against a local server would yield more accurate results for that type of benchmark (against a server with a very quick response time). However, a realistic test would have to simulate latency and servers with variable response times, which is a much more realistic scenario. Further, because EventMachine and libcurl multi defer processing instead of blocking on each request, variable conditions are exactly where they'll have an even bigger advantage over Net::HTTP and RFuzz. I realize that last statement is speculation since I don't have actual numbers, but I'm fairly comfortable making it given the huge performance difference in the test this post describes.
I think the differences between the libraries on a single request, and the differences between the EventMachine and libcurl multi options on many requests, could be attributed to variable network conditions. Actually, even single-request performance would change considerably if the response were particularly large; the post I linked at the beginning details those problems.
Posted by: Paul Dix | January 30, 2009 at 04:25 PM
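A rough way to see the variable-response-time argument without a network: give each simulated request a different delay, then compare issuing the requests one after another with issuing them all at once. This is a hedged sketch. The delay values are made up, `sleep` stands in for a slow server, and plain Ruby threads stand in for the deferred (reactor or libcurl multi) approach rather than reproducing either library.

```ruby
# Hypothetical per-request delays in seconds, standing in for servers with
# variable response times. Nothing here touches the network.
DELAYS = [0.02, 0.15, 0.05, 0.30, 0.01, 0.08].freeze

def fake_request(delay)
  sleep delay # the process is idle while "waiting on the server"
  :done
end

def elapsed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Blocking: each request waits for the previous one, so the total is about
# the sum of all delays.
blocking = elapsed { DELAYS.each { |d| fake_request(d) } }

# Overlapped: every request is in flight at once, so the total is about the
# slowest single delay.
overlapped = elapsed { DELAYS.map { |d| Thread.new { fake_request(d) } }.each(&:join) }

puts format('blocking:   %.2fs (sum of delays %.2fs)', blocking, DELAYS.sum)
puts format('overlapped: %.2fs (max delay %.2fs)', overlapped, DELAYS.max)
```

The blocking total grows with the sum of the delays while the overlapped total tracks the slowest request, so a few slow servers hurt the blocking approach far more.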
Try http://github.com/igrigorik/em-http-request. It combines the power of EM with the robust HTTP parser bundled with Mongrel.
Posted by: Aman | January 30, 2009 at 09:36 PM
This test is pretty biased against net/http since libcurl uses keep-alive requests by default and net/http does not. :-(
I've submitted patches to Ruby 1.9 to make use of non-blocking requests. I need to fix some tests so that they get rolled into the next release. I think that with non-blocking socket calls and keep-alive requests, net/http should be nearly as fast as curb.
Posted by: Aaron Patterson | January 31, 2009 at 05:52 PM
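To see the keep-alive effect described in the comment above, one can count TCP connections on the server side. The sketch below starts a throwaway HTTP/1.1 server on 127.0.0.1 (hypothetical, for illustration only; it answers every request with `ok`) and compares `Net::HTTP.get`, which sets up a new connection per call, with several requests issued inside one `Net::HTTP.start` block, which reuses a single persistent connection.

```ruby
require 'net/http'
require 'socket'

# Throwaway local HTTP/1.1 server (illustration only): answers every request
# with "ok" and counts how many TCP connections it accepts.
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
accepted = 0
lock = Mutex.new

acceptor = Thread.new do
  loop do
    client = server.accept
    lock.synchronize { accepted += 1 }
    Thread.new(client) do |sock|
      begin
        # Serve requests on this connection until the client hangs up.
        while sock.gets # request line; nil once the client closes
          close_after = false
          while (header = sock.gets) && header != "\r\n"
            close_after = true if header =~ /\Aconnection:\s*close/i
          end
          sock.write "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nok"
          break if close_after
        end
      rescue IOError, SystemCallError
        # client went away mid-request; nothing to do beyond closing
      ensure
        sock.close
      end
    end
  end
end

connection_count = -> { lock.synchronize { accepted } }

# 1) Net::HTTP.get sets up and tears down a fresh connection for every call.
before = connection_count.call
5.times { Net::HTTP.get(URI("http://127.0.0.1:#{port}/")) }
separate_connections = connection_count.call - before

# 2) Requests issued inside one Net::HTTP.start block share one connection.
before = connection_count.call
Net::HTTP.start('127.0.0.1', port) do |http|
  5.times { http.get('/') }
end
keep_alive_connections = connection_count.call - before

puts "separate: #{separate_connections} connections, keep-alive: #{keep_alive_connections}"

acceptor.kill
server.close
```

This isolates connection reuse only; against a remote server, every reused connection also saves a TCP handshake round trip.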
Paul, curb is good, but EventMachine makes it possible to combine several different kinds of functionality in one process, and that seems to be impossible with curb =(
I need to rewrite feeds (caching the data locally), so I need to use Thin plus some feed parser. It seems I'll have to rewrite your library to work with EM. Perhaps a pluggable backend?
Posted by: Max Lapshin | July 31, 2009 at 04:26 AM
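One hedged reading of the "pluggable backend" idea: the feed library depends only on an object with a `fetch_all(urls)` method that yields `(url, body)` pairs, so a blocking backend, a threaded one, or an EventMachine-driven one could be swapped in. All class and method names below are hypothetical and are not Feedzirra's API; the getter and parser are fake lambdas so the sketch runs without a network.

```ruby
# Hypothetical backend interface: #fetch_all(urls) yields (url, body) once per
# URL. A callback style fits evented code, where results arrive later.
class SequentialBackend
  def initialize(getter)
    @getter = getter # any callable that takes a URL and returns its body
  end

  def fetch_all(urls)
    urls.each { |url| yield url, @getter.call(url) }
  end
end

class ThreadedBackend
  def initialize(getter)
    @getter = getter
  end

  # Runs every request in its own thread, then yields on the caller's thread
  # so callbacks never run concurrently.
  def fetch_all(urls)
    threads = urls.map { |url| Thread.new { [url, @getter.call(url)] } }
    threads.each do |t|
      url, body = t.value # Thread#value waits for the thread to finish
      yield url, body
    end
  end
end

# The feed library only knows about the interface, not the HTTP client.
class FeedFetcher
  def initialize(backend, parser)
    @backend = backend
    @parser = parser
  end

  def fetch(urls, &on_feed)
    @backend.fetch_all(urls) { |url, body| on_feed.call(url, @parser.call(body)) }
  end
end

fake_get = ->(url) { "<rss>#{url}</rss>" } # stands in for a real HTTP GET
parser = ->(body) { body.upcase }           # stands in for a real feed parser
urls = %w[http://a.example/feed http://b.example/feed]

results = {}
[SequentialBackend, ThreadedBackend].each do |backend_class|
  collected = {}
  FeedFetcher.new(backend_class.new(fake_get), parser).fetch(urls) { |url, feed| collected[url] = feed }
  results[backend_class] = collected
end

results.each { |klass, feeds| puts "#{klass}: #{feeds.size} feeds" }
```

An EventMachine backend could implement the same `fetch_all` by calling the block from its request callbacks inside the reactor; whether that fits cleanly alongside Thin is an open question this sketch does not settle.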