
July 03, 2007

Comments

Lucas Carlson

You are absolutely right about ActiveRecord being an unfortunate memory hog. Have you thought about running Starfish on EC2, where you would get a more adequate amount of RAM? You are also right that GFS is a very important piece of the puzzle. Depending on how deep you want to go down the rabbit hole, I did have an idea that could help you out: create a new Starfish type called DRb, which would let you pull a DRb array of any collection of objects over the wire. Then you could serve a lightweight collection of db objects as an array or hash through DRb and keep the Starfish interface.

Also, the way you kicked off new clients was very clever. I just use a shell script that does the exact same thing.

Feel free to contact me directly: lucas at rufy.com

Paul Dix

Lucas, I have an EC2 account, but I'm still worried about the memory. The truth is that this is just the first in a large number of parallel/distributed background tasks that I will need to write and execute. Since machines represent a scarce resource right now, I need to design something with a much smaller footprint so I can run many more tasks per machine.

I have to think about it a little more and write some experimental code. Starfish is a great start and I will definitely be looking at it more to help give me ideas.

drbrain

DRb sharing two Queues will probably be just right. Shove URLs (as Strings) into one Queue which clients pull from, and dump results into the other Queue (which the server reads from). Should be plenty light-weight.

For bonus points, use RingyDingy to help processes automatically find each other.
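
The two-queue arrangement suggested above could look roughly like the following. This is only a minimal sketch, not code from the post or from Starfish: the `WorkQueues` class, its method names, the example URLs, and the "work" (taking the length of the URL string) are all hypothetical stand-ins. The client is run in the same process here to keep the example self-contained; in practice each client would be a separate process connecting to the server's URI.

```ruby
require 'drb/drb'

# Hypothetical front object: one queue of URLs to process and one
# queue of results, both held in the server's memory.
class WorkQueues
  def initialize
    @urls    = Queue.new
    @results = Queue.new
  end

  def add_url(url)
    @urls << url
  end

  # Non-blocking pop: returns nil once the queue is empty.
  def next_url
    @urls.pop(true)
  rescue ThreadError
    nil
  end

  def add_result(url, result)
    @results << [url, result]
  end

  # Non-blocking pop: returns nil once the queue is empty.
  def next_result
    @results.pop(true)
  rescue ThreadError
    nil
  end
end

# Server side: publish the queues over DRb. Port 0 asks the OS for any
# free port; the actual address is available from service.uri.
queues  = WorkQueues.new
service = DRb.start_service('druby://127.0.0.1:0', queues)
%w[http://example.com/a http://example.com/b].each { |u| queues.add_url(u) }

# Client side: pull URLs until none are left, push a result for each.
remote = DRbObject.new_with_uri(service.uri)
while (url = remote.next_url)
  remote.add_result(url, url.length) # stand-in for the real work
end

# Server side again: drain the results queue.
collected = []
while (pair = queues.next_result)
  collected << pair
end

DRb.stop_service
```

Only Strings and small Arrays cross the wire in this sketch, which is what keeps the footprint small compared with passing full ActiveRecord objects around.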

Paul Dix

drbrain, I'm a little confused on RingyDingy vs. Rinda. RingyDingy uses Rinda and DRb and helps monitor everything?

The comments to this entry are closed.
