Service oriented architectures (SOA) are defined by Wikipedia as "systems [that] group functionality around business processes and package these as interoperable services." In more practical terms this means setting up various services to build out a full system. In the web 2.0 world people tend to think of SOAs as building mashups with public APIs from Google, Twitter, New York Times, Netflix, and countless others. However, more traditional SOAs are built internally for doing things like connecting with legacy systems and sprinkling magic scaling fairy dust to create "enterprise" systems.
One of the most visible examples of this kind of SOA is Amazon.com's architecture, which makes 100-150 different service requests to build a single page. Other high-profile examples include LinkedIn's architecture and eBay's scalability best practices. This kind of thing is common in the Java world, but I've heard very little about developers within the Rails community building service architectures. The closest we've come is using message queues for background work, memcached, and sometimes a full-text search engine (read: Solr, Sphinx, etc.).
I think the reasons for the lack of service architectures in the Ruby community are twofold. First, Rails encourages a monolithic application architecture. Everything is contained within a single application code base. While this enables super fast iteration and development in the beginning, it can get messy and unwieldy later on (just ask anyone who has to wait 15 minutes for their test suite to run or who has had to refactor a big mess of code that contains all sorts of interdependencies). The second reason is a shortcoming of the Ruby implementation. In Ruby it's impossible to run multiple requests in parallel. You could try threading, but not only will the Ruby threading model kill you, blocking IO will stop your interpreter cold as well.
Let's look at some numbers to put this into perspective. Let's say you have a request that calls out to 10 services to render a page. Further, these services can take anywhere from 100 to 150 ms to respond. In the regular Ruby world, those service requests run one after another, so together they would take 1 to 1.5 seconds to return. Inside of the request/response life cycle, this just isn't feasible. That's like making a call to the database that takes 1.5 seconds to come back. It just doesn't work. In the parallel Java world those 10 requests will return in no more time than the longest-running one (150 ms). With this model it suddenly becomes possible to make multiple service requests inside of the client's request/response chain.
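To make that arithmetic concrete, here's a tiny Ruby sketch. The individual latency values are made up for illustration (I've only assumed they fall in the 100 to 150 ms range above). Run one after another, the total is the sum of the calls; run fully in parallel, it's roughly the slowest single call.

```ruby
# Hypothetical response times (ms) for the 10 service calls; illustrative only.
latencies_ms = [100, 105, 110, 115, 120, 125, 130, 135, 140, 150]

# Serial: every call waits for the previous one to finish.
sequential_ms = latencies_ms.sum

# Parallel: all calls are in flight at once, so the slowest one sets the total.
parallel_ms = latencies_ms.max

puts "sequential: #{sequential_ms} ms" # prints "sequential: 1230 ms"
puts "parallel:   #{parallel_ms} ms"   # prints "parallel:   150 ms"
```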
HTTPMachine, a new Ruby library I'm writing for interacting with service architectures, addresses the problem of running many requests in parallel. As a test for this I created a local evented HTTP server that simply sleeps for 100 ms and then returns a 60-line XML response. Here's some benchmark code that hits the server to test out speeds:
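    require "benchmark"
    require "open-uri"
    # HTTPMachine itself must also be loaded before this runs.

    calls = 10
    @klass = Class.new do
      include HTTPMachine
    end

    Benchmark.bm do |t|
      # All requests queued inside service_access are run in parallel.
      t.report("httpmachine") do
        HTTPMachine.service_access do
          calls.times do
            s = nil
            @klass.get("http://127.0.0.1:3000") do |response_code, response_body|
              s = response_body
            end
          end
        end
      end

      # Baseline: the same requests made one after another via open-uri
      # (which sits on top of Net::HTTP). On Ruby 3.0+, use URI.open instead.
      t.report("net::http") do
        calls.times do
          s = open("http://127.0.0.1:3000").read
        end
      end
    end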
And finally, here are the results of the benchmark:
    httpmachine  0.010000   0.000000   0.010000 (  0.105024)
    net::http    0.010000   0.010000   0.020000 (  1.027697)
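If you want to try something similar yourself, a stand-in for the test server can be sketched with nothing but the standard library. To be clear, this is not the server I used: it's thread-per-connection rather than evented, and the XML body, the method name `start_sleepy_server`, and the port handling are all assumptions made for illustration. It waits 100 ms per request and then returns 60 lines of XML.

```ruby
require "socket"
require "net/http"

# Hypothetical 60-line XML payload (58 items plus the opening and closing tags).
XML_BODY = (["<items>"] +
            Array.new(58) { |i| "  <item id=\"#{i}\"/>" } +
            ["</items>"]).join("\n")

# Starts a background server that sleeps `delay` seconds before each response.
# Returns the port it is listening on.
def start_sleepy_server(delay: 0.1)
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  Thread.new do
    loop do
      Thread.new(server.accept) do |conn|
        # Read and discard the request line and headers.
        while (line = conn.gets) && line != "\r\n"; end
        sleep delay
        conn.write "HTTP/1.1 200 OK\r\n" \
                   "Content-Type: application/xml\r\n" \
                   "Content-Length: #{XML_BODY.bytesize}\r\n" \
                   "Connection: close\r\n\r\n" \
                   "#{XML_BODY}"
        conn.close
      end
    end
  end
  server.addr[1]
end

port = start_sleepy_server

# Time a single request against the stand-in server.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
body = Net::HTTP.get(URI("http://127.0.0.1:#{port}/"))
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts "#{body.lines.count} lines in #{(elapsed * 1000).round} ms"
```

Pointing the benchmark at this port instead of 3000 should give the same shape of comparison, though the exact timings will differ.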
This is only the beginning of this library. It's only a test to prove out the concept that making multiple service calls within a client request is feasible. Now that I have that out of the way I'll start building up the other pieces to make a full services architecture a snap to build out. I'd really like to get feedback on the API and some possible use cases for this library. For pure Ruby environments I see something like a bunch of Sinatra services being called inside a main Rails application server. For my environment at work, it's a main Rails application server calling out to a bunch of services written in Java, Python, and C++.
Here's an example of a more fully fleshed out use:
    class Post
      include HTTPMachine
      remote_method :search, :server => "http://localhost/search", :response_format => :json
    end

    # and using it
    Post.search(params) # => returns fully parsed post results from the search
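To give a feel for what a `remote_method` declaration could expand to, here's a rough sketch. This is not HTTPMachine's implementation: the `RemoteMethodSketch` module, the GET-with-query-string behavior, and the `fetcher` option (a hook so the example runs without a live search service) are all assumptions made for illustration.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical mixin showing one way a remote_method class macro could work.
module RemoteMethodSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Defines a class method `name` that GETs `server` with the given params
    # and parses the body according to `response_format`.
    # `fetcher` is an illustrative hook; by default it makes a real HTTP call.
    def remote_method(name, server:, response_format: :json,
                      fetcher: ->(uri) { Net::HTTP.get(uri) })
      define_singleton_method(name) do |params = {}|
        uri = URI(server)
        uri.query = URI.encode_www_form(params) unless params.empty?
        body = fetcher.call(uri)
        response_format == :json ? JSON.parse(body) : body
      end
    end
  end
end

class Post
  include RemoteMethodSketch

  # The canned fetcher stands in for the search service so this runs offline.
  remote_method :search,
                server: "http://localhost/search",
                response_format: :json,
                fetcher: ->(uri) { %([{"title":"SOA in Ruby","query":"#{uri.query}"}]) }
end

results = Post.search(q: "ruby")
puts results.first["title"] # prints "SOA in Ruby"
```

Note that this sketch makes each call on its own; it says nothing about how the calls would be batched and run in parallel.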
Thoughts?
This looks interesting. One of the things that concerns me about using all these libraries built on curl is that there doesn't seem to be a fakeweb ( http://github.com/chrisk/fakeweb/tree/master ) equivalent to make testing easier. So you're left with setting up local servers, etc.
Nevertheless this looks useful. I think some benchmarks of this against equivalent HTTParty code would be helpful both for showing differences in code implementation for wrapping services and also would highlight the speed boost that the curl libraries provide.
I'm still hoping for a curb-compatible fakeweb, but I think implementing it in the short-term is a bit over my head :/
Posted by: Jeff | March 05, 2009 at 12:33 PM
That's a good point. I'll have to include a good testing framework as part of this whole thing.
Posted by: Paul Dix | March 05, 2009 at 12:42 PM
Hi Paul,
This was exactly the use case I wrote evdispatch to resolve. Check it out; there is definitely room to improve it. http://evdispatch.rubyforge.org/
The simple idea is to have a per-process background POSIX thread running a libev loop, waiting for work to be signaled. Once a request comes into the queue, the libcurl multi interface is used to send the request. This enables Ruby to dispatch or queue work for the background POSIX thread to fetch while not blocking the Ruby interpreter. At a later point, Ruby can block or time out while waiting for all the concurrent requests to complete.
Posted by: Todd Fisher | March 05, 2009 at 07:09 PM
Sorry to double post, but I took a quick look at your HTTPMachine and I see you're already using my curb fork. In this case, it is probably a better solution to stick with curb... evdispatch was an experiment I developed before extending curb. evdispatch would in theory get better throughput, but in practice, unless you're making 1000s of service requests, curb will definitely work better... also, evdispatch has bugs that I haven't gone back to resolve...
Posted by: Todd Fisher | March 05, 2009 at 07:13 PM
Yeah, I think curb & the libcurl multi interface is the best way to go. The one thing that this doesn't do yet is perform POST, PUT, and DELETE in parallel. For the time being I'm ok with that, but if I find I need it later I might have to fork Curb to write that support in. In the meantime, thanks for all the great work on Curb!
Posted by: Paul Dix | March 06, 2009 at 08:17 AM
Paul,
Not sure if this makes sense, but I'm more in favor of ESI and / or Nginx SSI as the ESI spec supports backend timeouts, expiry etc.
Todd's the author of mongrel-esi, but then again this isn't SOA, simply a variation thereof that's perhaps more feasible with Ruby's typical multi-process (rather than multi-threaded) deployment model.
- Lourens
Posted by: Lourens | March 06, 2009 at 06:37 PM
I understand what you're going for, but why HTTP? Seems like things like Thrift, Jabber, AMQP, etc. would be more fitting.
Posted by: Josh Knowles | March 07, 2009 at 10:56 AM
SOA def presents some special technical challenges in the Ruby world as you point out. You mentioned message queues, but there's a lot more to be said about that topic. One strategy is to subscribe to events from the different systems and then take the relevant data from them and stick it in your db. Much much faster since you're hitting a local db instead of an external service on each request. You also don't have to handle failure scenarios in your app code. Finally, it provides a natural seam for sanitizing and translating the data if necessary. Take a look at the anti-corruption layer section of Domain-Driven Design.
Posted by: Pat Maddox | March 08, 2009 at 01:58 AM
One of the points of SOA is to not have everything going into a single DB. You partition out functionality early so you don't have the classic DB scaling problems later. On the idea of not hitting an external service, that's exactly what a DB is. It doesn't matter if you hit only that 1 or 1000, as long as they all return within an acceptable amount of time to render a request to the user.
Josh, on the issue of AMQP, the kind of SOA I'm describing is meant to be synchronous: things that the user is waiting on. Think of a comment service, or a newsfeed service on your site. The user needs to see these for the pages they're on, and it's not something that gets queued. The writing to the data store can be queued, but pulling back the data to render the request needs to happen in real time. As for using Thrift or Jabber to do this kind of thing, it's definitely worth looking into.
Posted by: Paul Dix | March 08, 2009 at 04:10 PM