July 08, 2009

Using Named Pipes in Ruby for Inter-process Communication

Named pipes are an old Unix trick for communicating between processes. I had never actually used them, and it took me a little time to figure out how to do it in Ruby. In this post I present a quick example for your enjoyment.

First, in a terminal window create the named pipe:

mkfifo my_pipe

At that point it looks like a file on the file system. It even has permissions on it. Now a Ruby process can open it like a file:

output = open("my_pipe", "w+") # the w+ means we don't block
output.puts "hello world"
output.flush # do this when we're done writing data

And here we are in a completely separate process:

input = open("my_pipe", "r+") # the r+ means we don't block
puts input.gets # will block if there's nothing in the pipe

If you pair that up with JSON, you've got a quick and efficient way to do inter-process communication. I'm actually using this trick so I can have my Sinatra web service call out to C++, Java, and Python processes.
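To make the JSON pairing concrete, here is a minimal sketch that sends one JSON document per line through a named pipe. The fifo_round_trip helper and the temporary directory are assumptions for illustration, and fork stands in for the "completely separate process" so the whole thing fits in one file. It uses File.mkfifo (Ruby 2.3 and later) in place of the shell command.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper: sends one JSON message through a named pipe
# and returns what the reading side decoded.
def fifo_round_trip(message)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "my_pipe")
    File.mkfifo(path) # same effect as running `mkfifo my_pipe` in a shell

    # Writer process: one JSON document per line.
    # Closing the file at the end of the block flushes the data.
    writer = fork do
      File.open(path, "w") { |out| out.puts JSON.generate(message) }
    end

    # Reader: gets blocks until the writer has sent a full line.
    received = File.open(path, "r") { |input| JSON.parse(input.gets) }
    Process.wait(writer)
    received
  end
end

puts fifo_round_trip("greeting" => "hello world")["greeting"]
```

Note that plain "w" and "r" modes block in open until the other end shows up, which is fine here because both ends are started together. One message per line keeps the framing simple, since JSON.generate never emits a raw newline.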

June 05, 2009

On Ruby Interview with me

Pat Eyler of On Ruby has posted an interview with me about Typhoeus, Feedzirra, and software development in general.

May 07, 2009

Breathe fire over HTTP in Ruby with Typhoeus

Typhoeus is a mythical Greek god with 100 fire-breathing serpent heads. He's also the father of the better-known Hydra. Like the fearsome beast, Typhoeus is a fearsome Ruby library that enables parallel HTTP requests while cleanly encapsulating handling logic. Specifically, it uses libcurl and libcurl-multi to run HTTP really fast. Further, it's designed with a focus on creating client libraries that work with web services. These could be external services like Twitter, systems like CouchDB and SimpleDB, or custom web services that you write yourself.

The libcurl interface is contained within the library. Rather than trying to get Curb to do what I wanted, I decided to start with a clean slate and write the C bindings myself. Beyond the libcurl interface, it has a nice DSL for creating classes and client libraries for web services.

The inspiration for the library came from an interview with Amazon CTO Werner Vogels. In the interview he states that when a user visits the Amazon.com home page, it calls out to up to 100 different services to construct the single page before returning it to the user. I like Amazon's approach of a services architecture, especially AWS, and wondered if I could do the same thing in Ruby. Typhoeus is the result of that effort.

I set up a benchmark to test how the parallel performance compares with Ruby's built-in Net::HTTP. The setup was a local evented HTTP server that would take a request, sleep for 500 milliseconds, and then issue a blank response. I set up the client to call this 20 times. Here are the results:

            user     system   total    real
net::http   0.030000 0.010000 0.040000 ( 10.054327)
typhoeus    0.020000 0.070000 0.090000 (  0.508817)

We can see from this that Net::HTTP performs as expected, taking 10 seconds to run 20 requests of 500ms each, one after another. Typhoeus takes only about 500ms (the time of the response that took the longest).
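The arithmetic behind those numbers can be reproduced without any HTTP at all. The sketch below is an illustration only: fake_request is a hypothetical stand-in that just sleeps, the delay is shortened to 50ms so it runs quickly, and Ruby threads stand in for the concurrency that Typhoeus gets from libcurl-multi. It is not the benchmark that produced the results above.

```ruby
require "benchmark"

DELAY = 0.05   # stand-in for the 500ms server delay, shortened for a quick run
REQUESTS = 20

# Hypothetical stand-in for one HTTP request: waits, then returns.
def fake_request
  sleep DELAY
  :ok
end

# One request after another: total time is roughly REQUESTS * DELAY.
def serial_time
  Benchmark.realtime { REQUESTS.times { fake_request } }
end

# All requests in flight at once: total time is roughly one DELAY.
def parallel_time
  Benchmark.realtime do
    REQUESTS.times.map { Thread.new { fake_request } }.each(&:join)
  end
end

puts format("serial:   %.3fs", serial_time)
puts format("parallel: %.3fs", parallel_time)
```

The serial run costs the sum of the delays, while the parallel run costs only the longest single delay, which is the same shape as the 10 seconds versus 0.5 seconds above.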

Hopefully I've whetted your appetite. Before I get to the code examples and the API, I'd like to put in a plug for my employer, kgb. They were nice enough to let me release this library as open source. We're also hiring good front-end Rails developers, designers, and anyone with experience in search, information retrieval, and machine learning. Please drop me a line if you dominate code and you're interested in joining an awesome team.

Finally, on to the codez. Here are some usage examples and notes from the readme (gist for easier reading).

# here's an example for twitter search
# Including Typhoeus adds http methods like get, put, post, and delete.
# What's more interesting though is the stuff to build up what I call
# remote_methods.
class Twitter
  include Typhoeus
  remote_defaults :on_success => lambda {|response| JSON.parse(response.body)},
                  :on_failure => lambda {|response| puts "error code: #{response.code}"},
                  :base_uri   => "http://search.twitter.com"

  define_remote_method :search, :path => '/search.json'
  define_remote_method :trends, :path => '/trends/:time_frame.json'
end

tweets = Twitter.search(:params => {:q => "railsconf"})

# if you look at the path argument for the :trends method, it has :time_frame.
# this tells it to add in a parameter called :time_frame that gets interpolated
# and inserted.
trends = Twitter.trends(:time_frame => :current)

# and then the calls don't actually happen until the first time you
# call a method on one of the objects returned from the remote_method
puts tweets.keys # it's a hash from parsed JSON

# you can also do things like override any of the default parameters
Twitter.search(:params => {:q => "hi"}, :on_success => lambda {|response| puts response.body})

# on_success and on_failure lambdas take a response object.
# It has four accessors: code, body, headers, and time

# here's an example of memoization
twitter_searches = []
10.times do
  twitter_searches << Twitter.search(:params => {:q => "railsconf"})
end

# this next part will actually make the call. However, it only makes one
# http request and parses the response once. The rest are memoized.
twitter_searches.each {|s| puts s.keys}

# you can also have it cache responses and do gets automatically
# here we define a remote method that caches the responses for 60 seconds
klass = Class.new do
  include Typhoeus

  define_remote_method :foo, :base_uri => "http://localhost:3001", :cache_responses => 60
end

klass.cache = some_memcached_instance_or_whatever
response = klass.foo
puts response.body # makes the request

second_response = klass.foo
puts second_response.body # pulls from the cache without making a request

# you can also pass timeouts on the define_remote_method or as a parameter
# Note that timeouts are in milliseconds.
Twitter.trends(:time_frame => :current, :timeout => 2000)

# you also get the normal get, put, post, and delete methods
class Remote
  include Typhoeus
end

Remote.get("http://www.pauldix.net")
Remote.put("http://", :body => "this is a request body")
Remote.post("http://localhost:3001/posts.xml",
  {:params => {:post => {:author => "paul", :title => "a title", :body => "a body"}}})
Remote.delete("http://localhost:3001/posts/1")
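
The deferred, memoized calls in the examples above can be pictured as a lazy proxy: the remote method hands back a placeholder right away, and the real work happens the first time a method is called on it. The sketch below is a hypothetical illustration in plain Ruby, not how Typhoeus is actually implemented. LazyResult and FakeSearch are made-up names, and the "request" is just a block that builds a hash.

```ruby
# Hypothetical illustration only; not Typhoeus's actual implementation.
# Wraps a block and runs it once, on the first method call.
class LazyResult < BasicObject
  def initialize(&fetch)
    @fetch = fetch
    @loaded = false
  end

  # Any method call triggers the fetch (once), then delegates to the result.
  def method_missing(name, *args, &block)
    unless @loaded
      @value = @fetch.call
      @loaded = true
    end
    @value.__send__(name, *args, &block)
  end
end

# Made-up client that memoizes one lazy result per query.
class FakeSearch
  attr_reader :request_count

  def initialize
    @cache = {}
    @request_count = 0
  end

  def search(query)
    @cache[query] ||= LazyResult.new do
      @request_count += 1 # stands in for the real HTTP request
      { "query" => query, "results" => [] }
    end
  end
end

client = FakeSearch.new
searches = Array.new(10) { client.search("railsconf") }
searches.each { |s| s.keys } # the first call fetches; the rest reuse it
puts client.request_count
```

Ten searches for the same query share one placeholder, so the simulated request runs a single time no matter how many of the results are read.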

Important Update: I should have mentioned that some bits of C were pulled from Todd Fisher's update to Curb for the multi interface. The easy code is completely different and some of the C stuff in Multi has changed. However, a good chunk of the multi code comes straight from there. Thanks, Todd. Sorry for not mentioning it earlier.

May 03, 2009

RailsConf this week

I'm heading to Vegas tomorrow to attend RailsConf. Other than sitting in talks and meeting and talking to interesting people, I'd like to get some hacking done. If anyone is interested in meeting up to hack on either Feedzirra or Typhoeus (or a CouchDB or SimpleDB library backed by Typhoeus), let me know. If you're going to be in Vegas, drop me a line or find me on Twitter as pauldix.

April 17, 2009

Small Feedzirra Release

A few people had complained that Feedzirra was throwing segfaults on some feeds. After looking into this a bit I found that it was an issue when trying to deal with gzip or deflate encoded feeds. Since the primary goal of Feedzirra is to be used as a multi-feed background fetcher, segfaults are completely unacceptable. So for now I've made it not request gzip or deflate encodings by default.

You can still enable gzip and deflate by passing :compress => true when you fetch:

Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/PaulDixExplainsNothing", :compress => true)

This should work with all the fetching methods.

I also removed the ITunesRSS parser as a default option. The field names were a bit different so it was breaking normalization (which is something this library values). You can add that back in by doing the following:

Feedzirra::Feed.add_feed_class(Feedzirra::ITunesRSS)