Typhoeus is a mythical Greek god with 100 fire-breathing serpent heads. He's also the father of the better-known Hydra. Like that fearsome beast, Typhoeus is a formidable Ruby library: it runs HTTP requests in parallel while cleanly encapsulating the logic that handles them. Specifically, it uses libcurl and libcurl's multi interface to run HTTP requests really fast. It's also designed for building client libraries that work with web services. These could be external services like Twitter, systems like CouchDB and SimpleDB, or custom web services that you write yourself.
The libcurl interface is contained within the library. Rather than trying to get Curb to do what I wanted, I decided to start with a clean slate and write the C bindings myself. On top of the libcurl interface, it has a nice DSL for creating classes and client libraries for web services.
The inspiration for the library came from an interview with Amazon CTO Werner Vogels. In the interview he states that when a user visits the Amazon.com home page, the site calls out to as many as 100 different services to construct that single page before returning it to the user. I like Amazon's approach of a services architecture, especially AWS, and wondered if I could do the same thing in Ruby. Typhoeus is the result of that effort.
I set up a benchmark to compare Typhoeus's parallel performance against Ruby's built-in Net::HTTP. The setup was a local evented HTTP server that would take a request, sleep for 500 milliseconds, and then issue a blank response. I set up the client to call this 20 times. Here are the results (Ruby Benchmark output, in seconds):
               user     system      total        real
net::http  0.030000   0.010000   0.040000 ( 10.054327)
typhoeus   0.020000   0.070000   0.090000 (  0.508817)
We can see from this that Net::HTTP performs as expected, taking 10 seconds to run 20 sequential 500 ms requests (20 × 0.5 s). Typhoeus takes only about 500 ms, the time of the slowest response, because the requests run in parallel.
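The serial-versus-parallel gap can be reproduced with nothing but Ruby's standard library. This is a minimal sketch, not the benchmark above, and it does not use Typhoeus or libcurl's multi interface: it runs Net::HTTP requests from plain Ruby threads against a local stand-in server (assumed thread-per-connection rather than evented), with the delay scaled down to 100 ms. The helper name `get_once` is illustrative.

```ruby
require 'socket'
require 'net/http'
require 'benchmark'

DELAY = 0.1 # seconds; scaled down from the post's 500 ms so the sketch runs quickly
COUNT = 20

# Stand-in for the benchmark server (assumption: thread-per-connection, not
# evented). Each connection waits DELAY seconds, then gets an empty 200 response.
server = TCPServer.new('127.0.0.1', 0) # port 0 lets the OS pick a free port
port = server.addr[1]
Thread.new do
  loop do
    Thread.new(server.accept) do |sock|
      # Consume the request line and headers, up to the blank line.
      loop do
        line = sock.gets
        break if line.nil? || line == "\r\n"
      end
      sleep DELAY
      sock.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
      sock.close
    end
  end
end

# One blocking GET; the explicit nil proxy address ignores http_proxy settings.
def get_once(port)
  Net::HTTP.new('127.0.0.1', port, nil).get('/').code
end

serial = Benchmark.realtime do
  COUNT.times { get_once(port) }
end

threaded = Benchmark.realtime do
  COUNT.times.map { Thread.new { get_once(port) } }.each(&:join)
end

puts format('serial:   %.2fs', serial)   # roughly COUNT * DELAY
puts format('threaded: %.2fs', threaded) # roughly one DELAY plus overhead
```

With `DELAY = 0.5` the setup matches the post's. The serial run should then take about 10 seconds, and the threaded run should land near half a second. Threads help here because MRI releases its interpreter lock while a thread blocks on socket I/O or `sleep`.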
Hopefully I've whetted your appetite. Before I get to the code examples and the API, I'd like to put in a plug for my employer, kgb. They were nice enough to let me release this library as open source. We're also hiring good front-end Rails developers, designers, and anyone with experience in search, information retrieval, and machine learning. Please drop me a line if you dominate code and are interested in joining an awesome team.
UPDATE: This interface has been deprecated. Please visit the Typhoeus Readme for updated documentation.
Finally, on to the codez. Here are some usage examples and notes from the readme (gist for easier reading).
# here's an example for Twitter search
# Including Typhoeus adds HTTP methods like get, put, post, and delete.
# What's more interesting, though, is the stuff to build up what I call
# remote_methods.
require 'rubygems'
require 'typhoeus'
require 'json'
class Twitter
  include Typhoeus
  remote_defaults :on_success => lambda {|response| JSON.parse(response.body)},
                  :on_failure => lambda {|response| puts "error code: #{response.code}"},
                  :base_uri   => "http://search.twitter.com"
  define_remote_method :search, :path => '/search.json'
  define_remote_method :trends, :path => '/trends/:time_frame.json'
end
tweets = Twitter.search(:params => {:q => "railsconf"})
# if you look at the path argument for the :trends method, it has :time_frame.
# this tells it to add in a parameter called :time_frame that gets interpolated
# and inserted.
trends = Twitter.trends(:time_frame => :current)
# and then the calls don't actually happen until the first time you
# call a method on one of the objects returned from the remote_method
puts tweets.keys # it's a hash from parsed JSON
# you can also do things like override any of the default parameters
Twitter.search(:params => {:q => "hi"}, :on_success => lambda {|response| puts response.body})
# on_success and on_failure lambdas take a response object.
# It has four accessors: code, body, headers, and time
# here's an example of memoization
twitter_searches = []
10.times do
  twitter_searches << Twitter.search(:params => {:q => "railsconf"})
end
# this next part will actually make the call. However, it only makes one
# http request and parses the response once. The rest are memoized.
twitter_searches.each {|s| puts s.keys}
# you can also have it cache responses and do gets automatically
# here we define a remote method that caches the responses for 60 seconds
klass = Class.new do
  include Typhoeus
  define_remote_method :foo, :base_uri => "http://localhost:3001", :cache_responses => 60
end
klass.cache = some_memcached_instance_or_whatever
response = klass.foo
puts response.body # makes the request
second_response = klass.foo
puts second_response.body # pulls from the cache without making a request
# you can also pass timeouts on the define_remote_method or as a parameter
# Note that timeouts are in milliseconds.
Twitter.trends(:time_frame => :current, :timeout => 2000)
# you also get the normal get, put, post, and delete methods
class Remote
  include Typhoeus
end
Remote.get("http://www.pauldix.net")
Remote.put("http://", :body => "this is a request body")
Remote.post("http://localhost:3001/posts.xml",
            {:params => {:post => {:author => "paul", :title => "a title", :body => "a body"}}})
Remote.delete("http://localhost:3001/posts/1")
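The `:time_frame` placeholder and the `on_success`/`on_failure` defaults are easiest to picture with a small stand-alone model. This is a hypothetical sketch, not Typhoeus's implementation. `interpolate_path`, `dispatch`, and `FakeResponse` are invented names, and treating any 2xx code as success is an assumption.

```ruby
# Hypothetical sketch of two remote-method mechanics; not Typhoeus's code.

# Stand-in for the response object (code, body, headers, time).
FakeResponse = Struct.new(:code, :body, :headers, :time)

# Replace each :name segment of a path template with the matching param and
# return [path, leftover_params]; leftovers could become the query string.
def interpolate_path(template, params)
  remaining = params.dup
  path = template.gsub(/:(\w+)/) do
    key = Regexp.last_match(1).to_sym
    raise ArgumentError, "missing path parameter :#{key}" unless remaining.key?(key)

    remaining.delete(key).to_s
  end
  [path, remaining]
end

# Choose a handler by status code (assumption: any 2xx code counts as success).
def dispatch(response, on_success:, on_failure:)
  handler = (200..299).cover?(response.code) ? on_success : on_failure
  handler.call(response)
end

path, rest = interpolate_path('/trends/:time_frame.json', time_frame: :current, lang: 'en')
puts path # => /trends/current.json

handlers = {
  on_success: ->(r) { "parsed: #{r.body}" },
  on_failure: ->(r) { "error code: #{r.code}" }
}
ok  = dispatch(FakeResponse.new(200, '{"trends":[]}', {}, 0.2), **handlers)
bad = dispatch(FakeResponse.new(503, '', {}, 0.2), **handlers)
puts ok  # => parsed: {"trends":[]}
puts bad # => error code: 503
```

Keeping the handlers as plain lambdas means a per-call override, like the `:on_success` passed to `Twitter.search` in the listing, only has to replace one entry in the hash.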
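The listing's comments say that remote calls are deferred until the result is first used, and that ten identical searches produce a single HTTP request. One way to model that is a lazy proxy plus a table of pending calls keyed by their parameters. This is a hypothetical sketch, not Typhoeus's actual internals. `LazyResult`, `MemoizingRemote`, and `fake_transport` are invented names, and the transport does no real HTTP.

```ruby
# Hypothetical sketch of deferred, memoized remote calls; not Typhoeus's internals.

# Proxy that runs its block on first use and forwards every call to the result.
class LazyResult < BasicObject
  def initialize(&fetch)
    @fetch = fetch
    @loaded = false
  end

  def method_missing(name, *args, **kwargs, &block)
    __value.__send__(name, *args, **kwargs, &block)
  end

  private

  # Run the fetch block once and keep its result.
  def __value
    unless @loaded
      @value = @fetch.call
      @loaded = true
    end
    @value
  end
end

# Hands out one LazyResult per distinct params hash, so repeated identical
# calls share a single deferred request.
class MemoizingRemote
  attr_reader :request_count

  # transport: hypothetical callable that takes params and returns parsed data.
  def initialize(transport)
    @transport = transport
    @pending = {}
    @request_count = 0
  end

  def search(params)
    @pending[params] ||= LazyResult.new do
      @request_count += 1
      @transport.call(params)
    end
  end
end

fake_transport = ->(params) { { 'query' => params[:q], 'results' => [] } }
remote = MemoizingRemote.new(fake_transport)

searches = Array.new(10) { remote.search(q: 'railsconf') }
before = remote.request_count      # nothing fetched yet
keys = searches.map { |s| s.keys } # first use triggers the fetch
after = remote.request_count       # ten handles, one request
puts before, after
```

Subclassing `BasicObject` keeps the proxy nearly empty, so almost every method, including `keys`, falls through to `method_missing` and forces the fetch. A different params hash gets its own pending entry and therefore its own request.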
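`:cache_responses => 60` keeps a response for 60 seconds. The expiry check behind that kind of option can be sketched with an in-memory store and an injectable clock. `TTLCache` and its `fetch(key, ttl)` method are hypothetical. The example above instead assigns a memcached-style client via `klass.cache =`, and this sketch does not reproduce that client's interface.

```ruby
# Hypothetical in-memory TTL cache; a stand-in for the memcached store above.
class TTLCache
  Entry = Struct.new(:value, :expires_at)

  # clock: callable returning seconds; injectable so expiry can be tested.
  def initialize(clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {}
  end

  # Return the cached value for key if still fresh; otherwise run the block,
  # store its result for ttl seconds, and return it.
  def fetch(key, ttl)
    entry = @entries[key]
    return entry.value if entry && @clock.call < entry.expires_at

    value = yield
    @entries[key] = Entry.new(value, @clock.call + ttl)
    value
  end
end

# Usage with a fake clock and a counter standing in for real requests.
now = 0
cache = TTLCache.new(-> { now })
requests = 0
fetch_foo = lambda do
  cache.fetch('GET http://localhost:3001', 60) { requests += 1; "body #{requests}" }
end

first  = fetch_foo.call # miss: makes the "request"
second = fetch_foo.call # hit: served from the cache
now = 61
third  = fetch_foo.call # expired: makes a new "request"
puts [first, second, third].inspect # => ["body 1", "body 1", "body 2"]
```

Injecting the clock keeps the expiry logic testable without real waiting. A monotonic clock is the default because wall-clock time can jump.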
Important Update: I should have mentioned that some bits of C were pulled from Todd Fisher's update to curb for the multi interface. The easy code is completely different, and some of the C stuff in Multi has changed. However, a good chunk of the multi code comes straight from there. Thanks, Todd. Sorry for not mentioning it earlier.
Good Good Good! Could you compare the performance with something like em-http-request?
Thanks
Posted by: Julien | May 07, 2009 at 08:02 PM
Should be compared with curl-multi for performance, since it sends multiple requests as well; Net::HTTP isn't a good comparison IMO. But I'm assuming curl-multi doesn't have this library's functionality and doesn't implement many of libcurl's offerings, so I'm sure this will be quite worthwhile even if the speed improvement isn't big. :)
Posted by: ehsanul | May 07, 2009 at 08:18 PM
Very COOL!
But why:
class Twitter
include Typhoeus
for me, only work:
class Twitter
require "Typhoeus"
require 'json'
include Typhoeus
Posted by: Max | May 07, 2009 at 08:42 PM
Nice work, but no credit to curb? The multi code looks familiar and the comments still point to curb ;-) I like the typhoeus_easy.c implementation: very clean and easy to follow!
Posted by: Todd A. Fisher | May 07, 2009 at 11:26 PM
Reminds me of my profiling of Net/HTTP and curb with multi patch applied... see this: http://www.idle-hacking.com/2008/07/updated-curb-multi-interface-patch/ ?
Posted by: Todd A. Fisher | May 07, 2009 at 11:30 PM
Would be nice if HTTParty could use this - that has a way cleaner interface imho.
Posted by: Vishnu Gopal | May 08, 2009 at 02:45 AM
Which version of ruby is this? I believe the Net::HTTP performance is significantly improved in 1.9
Posted by: Glenn Gillen | May 08, 2009 at 02:54 AM
Wow. I love this. Great job Paul.
Posted by: Alistair Holt | May 08, 2009 at 06:17 AM
Have you changed out Feedzirra's backend with this? Looks really sweet...
Posted by: Sean | May 08, 2009 at 09:44 AM
ehsanul, I didn't run a comparison against curl-multi because it lacks many of these features.
Julien, em-http-request is probably as fast. However, I wasn't able to get it to support everything that libcurl does.
Glenn, I tested against 1.8.7. Even in 1.9 I doubt it would compare. The problem would be with blocking IO. That said, it would be interesting to test against a Net::HTTP Fibers implementation for parallelism.
Sean, I want to update Feedzirra to use this. I'll hopefully get to that in the next week or so.
Posted by: Paul Dix | May 09, 2009 at 11:25 AM
(Sorry if this somehow ends up being posted multiple times... The commenting wasn't quite working for me.)
Paul,
You've obviously done some great work here, and your interface to the core libraries is fantastic and looks like a joy to use. I must admit, though, that I'm a little confused as to why you've compared your parallel request benchmarks to a single-threaded, blocking request benchmark?
I suppose I mean to ask, do we even need to benchmark these things to know they're faster? As has been previously mentioned by you and ehsanul, it would be a much more impressive comparison if your implementation was faster than, say, Ruby 1.8.7's green threads with a request in each thread... or Ruby 1.9's fibers in such a scenario (as you mentioned specifically).
I say all that because my current project is calling out to six or seven different web-services, and I've had to implement my own green threading solution with each request. I know I could probably swap out my library with your own, but I wonder a) how much of a performance increase it would give me, and b) how much work it'd be for me :P
If the performance increase is significant, then I'd have to put my nose to the grindstone and update my code with yours. Though, admittedly, I can't see how one could speed up the response time of a server :P Or how using a (probably) faster implementation of a C library instead of Ruby's Net::HTTP would really increase each request's performance. On that note, it would also be great to see the request time of a single request in your library compared with Ruby's HTTP library...
Looking forward to your thoughts.
Posted by: BigLove | May 11, 2009 at 05:16 PM
BigLove,
I suppose I should probably put together a comparison. What would be the proper ones to do? I can think of:
* EMHttpClient
* 1.8.6 Threaded
* 1.8.7 Threaded
* 1.9.1 Fibers
Running all those would be a pretty big pain in the ass though. If I have time I'd love to do that, but the real point of the benchmarks was to showcase the difference of running parallel vs. serial execution.
Posted by: Paul Dix | May 13, 2009 at 04:15 PM
Not to forget about the latest version of curb!
Posted by: Konstantin Haase | May 14, 2009 at 04:07 AM
But it doesn't support setting custom headers, only a custom user-agent?
Posted by: Andy | May 18, 2009 at 09:00 PM
I have a question about the copyright requirement with respect to the usage of the curb code within Typhoeus. Is there a need to explicitly add the original copyright of both curb and the patch that Todd Fisher added? I ask not to be an idiot but rather to understand the requirements better in this area. Open source offers so much flexibility, but I've never been sure where the lines are in terms of what you need to do to ensure proper attribution to the folks who wrote the original concepts that we then mold into our own creations.
Posted by: John W Higgins | May 20, 2009 at 12:43 PM
Hi John,
I don't know what the requirements are. To be honest, I used very little from curb. I definitely used a few pieces in the multi code, but almost everything is completely new.
With open source, everyone builds on everyone else. Neither of these libraries could have been written without libcurl. That couldn't have been written without gcc. I think it's just best efforts to give credit where it is due and not to rip off blatantly.
Have a look at the c code in both libraries. I think you'll find that they are quite different. However, I definitely couldn't have made typhoeus without the help of looking at the curb code.
Posted by: Paul Dix | July 19, 2009 at 04:59 PM
So we just started using this in a fairly high volume production env, and we are seeing posts instantly timeout (response code 0), regardless of what the timeout is set to. We built it against the latest libcurl source with --enable-ares and --enable-nonblocking. Seems to happen every 50 or so requests. Has anyone else run into this?
Posted by: Chris | August 05, 2009 at 07:17 PM
Hey Chris,
I assume this was you that posted on the Typhoeus list, but just in case it wasn't, you can join in on the thread about this issue here:
http://groups.google.com/group/typhoeus/browse_thread/thread/924d460ebbb240ac
Posted by: Paul Dix | August 06, 2009 at 08:57 AM