Update: I've noticed that a lot of people land on this page from a search for "Ruby HTTP client". Since writing this post I created what I think is the best HTTP client around for Ruby. It's called Typhoeus, it's super fast, and it has its own bindings to libcurl.
I'm currently working on a soon-to-be-released feed (Atom and RSS) library called Feedzirra. One of my criteria for this library is that it should be able to pull multiple feeds as quickly as possible, which is something I find lacking in all current Ruby feed libraries. All of the libraries I've checked out use Net::HTTP, which has a reputation for being slow. So I decided to look at different HTTP clients, with a focus on ones that could finish multiple requests quickly.
I ran a benchmark test against my own site (just http://www.pauldix.net) using the following libraries:
- taf2-curb (a GitHub version of the curb gem that supports the libcurl multi interface)
- Net::HTTP (from the Ruby standard library)
- RFuzz
- EventMachine (specifically, HttpClient2)
- curl-multi (another Ruby library that provides an interface to the libcurl multi API)
Here is the output from runs fetching the page 1, 10, and 100 times (benchmark code here if you're interested). All times are in seconds.
Getting http://www.pauldix.net 1 time:
                  user     system      total         real
taf2-curb     0.010000   0.010000   0.020000 (  0.687772)
nethttp       0.000000   0.000000   0.000000 (  0.627915)
rfuzz         0.000000   0.000000   0.000000 (  0.601139)
eventmachine  0.000000   0.000000   0.000000 (  0.688079)
curl-multi    0.000000   0.000000   0.000000 (  0.596781)
Getting http://www.pauldix.net 10 times:
                  user     system      total         real
taf2-curb     0.020000   0.050000   0.070000 (  0.779883)
nethttp       0.070000   0.050000   0.120000 (  6.462604)
rfuzz         0.010000   0.010000   0.020000 (  7.973339)
eventmachine  0.010000   0.020000   0.030000 (  1.046070)
curl-multi    0.020000   0.040000   0.060000 (  0.869918)
Getting http://www.pauldix.net 100 times:
                  user     system      total         real
taf2-curb     0.610000   1.850000   2.460000 (  8.791512)
nethttp       0.720000   0.530000   1.250000 ( 69.995164)
rfuzz         0.100000   0.140000   0.240000 ( 67.059852)
eventmachine  0.140000   0.390000   0.530000 (  7.506380)
curl-multi    0.740000   1.430000   2.170000 ( 12.315727)
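For anyone who wants to produce output in this shape themselves, here is a minimal sketch of a harness built on Ruby's standard Benchmark module. It is not the benchmark code linked above. The method name run_benchmark, the NET_HTTP_FETCHER constant, and the idea of passing each client in as a lambda are my own assumptions for illustration. Only a plain sequential Net::HTTP fetcher is shown, and the other clients would need their own lambdas.

```ruby
require 'benchmark'
require 'net/http'
require 'uri'

# Hypothetical fetcher: one plain, sequential Net::HTTP GET per call.
NET_HTTP_FETCHER = lambda { |url| Net::HTTP.get_response(URI.parse(url)) }

# Hypothetical driver: for each count, call every fetcher `count` times
# and print user/system/total/real timings, one row per fetcher.
# `fetchers` maps a label (String) to an object that responds to #call(url).
def run_benchmark(url, counts, fetchers)
  counts.each do |count|
    puts "Getting #{url} #{count} time(s):"
    Benchmark.bm(14) do |bm|
      fetchers.each do |name, fetcher|
        bm.report(name) { count.times { fetcher.call(url) } }
      end
    end
  end
end
```

Calling run_benchmark('http://www.pauldix.net', [1, 10, 100], { 'nethttp' => NET_HTTP_FETCHER }) would make real network requests, so the numbers will vary from run to run. Note that this sketch issues requests one after another. A client with a multi interface would be wrapped in a lambda that hands all of its requests over at once.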
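There are plenty of problems with this kind of test. There's variability in speed over the net, and doing a GET on my site doesn't represent the many situations we can run into (like slow responses, slow transfers, or large pages). Still, the results are hard to argue with. For a single request all five libraries land between 0.6 and 0.7 seconds, but at 100 requests EventMachine (7.5s), taf2-curb (8.8s), and curl-multi (12.3s) finish far ahead of Net::HTTP (70.0s) and RFuzz (67.1s). EventMachine and the two libraries that use the libcurl multi interface are the clear winners on real (wall-clock) time taken, which is the only metric we care about here.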
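To put the gap in one number per library, the "real" column from the 100-request run can be turned into a speedup factor relative to Net::HTTP. This is just arithmetic on the figures reported above. The constant and method names are made up for the example.

```ruby
# "real" (wall-clock) seconds from the 100-request run reported above.
REAL_SECONDS_100 = {
  'taf2-curb'    => 8.791512,
  'nethttp'      => 69.995164,
  'rfuzz'        => 67.059852,
  'eventmachine' => 7.506380,
  'curl-multi'   => 12.315727
}

# Hypothetical helper: how many times faster each entry is than `baseline`,
# rounded to one decimal place.
def speedup_over(baseline, timings)
  base = timings.fetch(baseline)
  timings.transform_values { |seconds| (base / seconds).round(1) }
end

speedups = speedup_over('nethttp', REAL_SECONDS_100)
speedups.each { |name, factor| puts format('%-14s %4.1fx', name, factor) }
```

On these figures EventMachine comes out at about 9.3x, taf2-curb at about 8.0x, and curl-multi at about 5.7x faster than Net::HTTP, while RFuzz is essentially level with it.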
Now the only thing left to decide on was library features. On this, taf2-curb was the clear winner. With EventMachine, I had problems getting it to pull down anything from FeedBurner, and I couldn't get it to handle redirects. With curl-multi, it wasn't obvious how to handle redirects or set header data. The taf2-curb library supported all the features I needed and let me set any header data I wanted. I'd highly recommend it if you have to do any HTTP client work in Ruby.