Yesterday I described the basic design of a distributed Ruby system I'm implementing. One of the first parts of building this system is figuring out how the different machines are going to communicate and cooperate. Distributed Ruby (DRb) seems like the perfect choice for the task. Programming Ruby describes it as a simple, lightweight Ruby version of RMI or CORBA.
Unfortunately, the Pickaxe book (Programming Ruby) is my only reference, and it includes only the most basic example. It should help to lay out how I think things work, a few of the concerns I have, and my guesses at their answers.
The first thing to note is that DRb objects are multithreaded: the DRb server handles each incoming call in its own thread. So if I start some object of mine running on port 1337, multiple remote processes can connect to it, and they can even make calls at the same time. To see the multithreading in action, I wrote a couple of simple DRb scripts, starting with the worker:
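```ruby
require 'drb'
require 'drb/observer' # not used by this script

class Worker
  def initialize
    @test = 0
  end

  # Sleeps for the requested number of seconds, then bumps and returns
  # the counter. Every caller shares the same @test.
  def run(sleep_val)
    sleep(sleep_val)
    @test += 1
    return @test
  end
end

worker = Worker.new
# Serve the worker object on port 1337 and block until the server stops.
DRb.start_service("druby://127.0.0.1:1337", worker)
DRb.thread.join
```

And a test script to call it:

```ruby
require 'drb'

sleep_time = ARGV[0].to_i # seconds the remote worker should sleep
worker = DRbObject.new(nil, "druby://127.0.0.1:1337")
val = worker.run(sleep_time) # waits until the remote call returns
puts "test val: #{val}"
```

I bring up three console windows. In the first I run the worker, which then sits waiting for calls on port 1337. In the second I kick off the test caller with an argument of 10, so the worker's run call sleeps for 10 seconds. In the third, while that call is still sleeping, I kick off the test caller with an argument of 1, so its call sleeps for only a second.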
Running that test shows three things that I think are important. First, both calls succeeded, so the DRb object handles multiple simultaneous connections exactly as I expected. Second, the @test instance variable in the worker object is shared: a single worker object handles both requests, each in its own thread. That means any instance variables in a DRb object have to be made thread-safe, either with a Queue or with a locking mechanism such as Mutex or Monitor. Third, the calls from the test script don't return until the remote worker finishes. The .run call doesn't just send the data to the remote worker and move on; it waits for a response.

I also checked when the connection is actually made, using a slight modification of the test script. The call to DRbObject.new doesn't connect to the remote worker; the connection isn't opened until the .run method is called.
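A minimal sketch of the Mutex approach, applied to the Worker above. SafeWorker and its count method are illustrative names, not part of the original scripts, and plain local threads stand in for the threads the DRb server would create for simultaneous callers. Note that the value is read back inside the lock; in the original run method, another caller could increment @test between the `+= 1` and the `return`.

```ruby
# Variant of the Worker class whose counter update is guarded by a Mutex.
# SafeWorker and #count are hypothetical names used only for this sketch.
class SafeWorker
  def initialize
    @test = 0
    @lock = Mutex.new
  end

  def run(sleep_val)
    sleep(sleep_val) # the slow part runs outside the lock, so calls overlap
    @lock.synchronize do
      @test += 1 # read-modify-write happens while holding the lock
      @test      # value returned to this caller, read while still locked
    end
  end

  def count
    @lock.synchronize { @test }
  end
end

# Local threads stand in for the threads DRb creates for concurrent callers.
worker = SafeWorker.new
threads = 10.times.map { |i| Thread.new { worker.run(0.01 * (i % 3)) } }
results = threads.map(&:value)

puts results.sort.inspect # every caller saw a distinct value from 1 to 10
puts worker.count         # → 10
```

Serving it is unchanged: pass `SafeWorker.new` to `DRb.start_service` in place of `Worker.new`.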
All of this is incredibly boring and simple, but I need to take small steps to think about the problem. So here's a slight rehash of my structure for the different pieces. One Master process runs for each MapReduce that gets kicked off. One DRbWorker runs on each machine that is available to perform processing. The Master sends the DRbWorker as many input Chunks as it can handle. For each Chunk processing request, the DRbWorker starts a new process and detaches from it. It hands that process a druby URI so the Chunk process can report its status back to the Master. This means the Master's call to the DRbWorker to process a Chunk returns almost immediately, so the connection doesn't stay open (I'm assuming the connection is closed after the call returns, but I still have to find a way to test this).
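A minimal sketch of the DRbWorker as a pure dispatcher, assuming each Chunk job is its own script that receives the chunk id and the Master's druby URI as command-line arguments. ChunkDispatcher, process_chunk, and that argument order are hypothetical. Process.spawn starts the child without waiting for it, and Process.detach reaps it from a background thread so no zombie is left behind, which is why the DRb call can return as soon as the process is launched.

```ruby
require 'drb'
require 'rbconfig'

# Hypothetical DRbWorker front object: it only launches processes and
# never waits for them, so each DRb call returns almost immediately.
class ChunkDispatcher
  def initialize(chunk_command)
    @chunk_command = chunk_command # argv prefix for the chunk script (assumed)
  end

  # Starts one OS process for the chunk and returns its PID right away.
  # The chunk id and the Master's druby URI are passed as arguments.
  def process_chunk(chunk_id, master_uri)
    pid = Process.spawn(*@chunk_command, chunk_id.to_s, master_uri)
    Process.detach(pid) # a background thread reaps the child when it exits
    pid
  end
end

# Stand-in chunk script: a Ruby one-liner that just sleeps for 2 seconds.
command = [RbConfig.ruby, "-e", "sleep 2"]
# Port 0 lets the OS pick a free port; DRb.uri reports the real one.
DRb.start_service("druby://127.0.0.1:0", ChunkDispatcher.new(command))

# The Master's side of the call, made from the same process here for brevity.
# The Master URI passed along is a placeholder.
dispatcher = DRbObject.new(nil, DRb.uri)
started = Time.now
pid = dispatcher.process_chunk(1, "druby://127.0.0.1:9999")
elapsed = Time.now - started
puts "chunk 1 running as pid #{pid}; the call took #{elapsed.round(2)}s"
```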
This also means that the Master has to start listening for incoming calls from the various Chunk processes before handing the Chunks off to the DRbWorker. I was looking at using 'drb/observer' to have the DRbWorker send update notifications back to the Master, but I think it's easier to just have the Chunk process handle that. The DRbWorker is really just a dispatcher: it starts new processes on its machine whenever the Master requests them.
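A minimal sketch of the Master side of that ordering, assuming a hypothetical StatusBoard front object with a report(chunk_id, status) method. The Master starts its DRb service first and only then launches a Chunk process, handing it DRb.uri; a Ruby one-liner stands in for a real Chunk script and calls back over DRb. In the real design the DRbWorker would do the launching; spawning directly here just keeps the sketch self-contained.

```ruby
require 'drb'
require 'rbconfig'

# Hypothetical Master front object that Chunk processes call into.
class StatusBoard
  def initialize
    @statuses = {}
    @lock = Mutex.new # reports arrive on DRb server threads
  end

  def report(chunk_id, status)
    @lock.synchronize { @statuses[chunk_id] = status }
    nil
  end

  def status_of(chunk_id)
    @lock.synchronize { @statuses[chunk_id] }
  end
end

board = StatusBoard.new
# 1. Start listening before any Chunk process exists (port 0 = any free port).
DRb.start_service("druby://127.0.0.1:0", board)
master_uri = DRb.uri

# 2. Launch a stand-in Chunk process and give it the Master's URI.
chunk_script = <<~'RUBY'
  require 'drb'
  master = DRbObject.new(nil, ARGV[1])
  master.report(ARGV[0].to_i, "done")
RUBY
pid = Process.spawn(RbConfig.ruby, "-e", chunk_script, "7", master_uri)
Process.wait(pid)

puts board.status_of(7) # → done
```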
My only remaining concern is tracking which PID is responsible for processing each Chunk. I want the Master to be able to call the DRbWorker to kill and/or restart a Chunk process if it thinks the process has hung. I guess it's time to dig into launching and monitoring child processes.
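One possible approach to the PID bookkeeping, sketched as a hypothetical ChunkRegistry living inside the DRbWorker: map each chunk id to its PID and to the waiter thread returned by Process.detach. A kill request then sends a signal and waits on that thread for the exit status, and a restart is a kill followed by a fresh launch. The class, its methods, and the choice of TERM as the default signal are assumptions for illustration.

```ruby
require 'rbconfig'

# Hypothetical bookkeeping for the DRbWorker: chunk id => running child.
class ChunkRegistry
  Child = Struct.new(:pid, :waiter)

  def initialize(chunk_command)
    @chunk_command = chunk_command # argv prefix for the chunk script (assumed)
    @children = {}
    @lock = Mutex.new
  end

  # Launch the process for a chunk and remember its PID and waiter thread.
  def start(chunk_id)
    pid = Process.spawn(*@chunk_command, chunk_id.to_s)
    child = Child.new(pid, Process.detach(pid))
    @lock.synchronize { @children[chunk_id] = child }
    pid
  end

  def pid_for(chunk_id)
    child = @lock.synchronize { @children[chunk_id] }
    child && child.pid
  end

  # Signal the chunk process and wait for it; returns its Process::Status.
  def kill(chunk_id, signal = "TERM")
    child = @lock.synchronize { @children.delete(chunk_id) }
    return nil unless child
    begin
      Process.kill(signal, child.pid)
    rescue Errno::ESRCH
      # already exited; the waiter thread still holds its status
    end
    child.waiter.value
  end

  def restart(chunk_id)
    kill(chunk_id)
    start(chunk_id)
  end
end

# Stand-in "hung" chunk: a Ruby process that sleeps for a minute.
registry = ChunkRegistry.new([RbConfig.ruby, "-e", "sleep 60"])
first_pid = registry.start(3)
second_pid = registry.restart(3)
status = registry.kill(3)
puts "restarted #{first_pid} as #{second_pid}; killed by signal: #{status.signaled?}"
```

A process that traps or ignores TERM can still be stopped by passing "KILL" as the signal, since SIGKILL cannot be caught.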