
January 11, 2008

ThruDB ORM for Ruby

Sebastian Delmont has posted his thoughts about putting together a ThruDB for Rails. If you're not familiar with ThruDB, you should go check it out. It's a document store written by Jake at Third Rail that is in the same vein as CouchDB, SimpleDB, or Google's BigTable (here's an intro to ThruDB).

Anyway, back to ThruDB for Rails. It would be an ActiveRecord-style abstraction for ThruDB. The truth is it really has nothing to do with Rails: it could be thread-safe, support connection pooling, and work with Merb. Done right, it could be an abstraction over any document-style database combined with an index, not just the ThruDB, Lucene, and Thrift combination.

After a brief discussion with Sebastian yesterday in #nyc.rb, I got to work to see if I could convert the ThruDB bookmark example to something more Ruby-like. Here's what my bookmark example looks like using something I whipped up quickly called ThruMapper.

require 'ThruMapper'

class Bookmark < ThruMapper
  attribute :title, :string
  attribute :url, :string
  attribute :posted_date, :integer
  attribute :tags, :string
end

def load_tsv_file(file)
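  # File.foreach closes the file when the block finishes
  File.foreach(file) do |line|
    # chomp returns a new string, so split its result instead of discarding it
    bs = line.chomp.split("\t")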
    Bookmark.create(:url => bs[0], :title => bs[1], :tags => bs[2])
  end
 
  # we'd need to do a commit after each one and possibly have a bulk insert
  # Commit just calls commitAll() on thrucene, which probably updates the index
  Bookmark.commit
end

# just a helper method to print it out
def print_bookmark(bookmark)
  puts "id    : #{bookmark.id}"
  puts "title : #{bookmark.title}"
  puts "url   : #{bookmark.url}"
  puts "tags  : #{bookmark.tags}"
end

# initialize the connection and load up some test data
ThruMapper.initialize_connection
load_tsv_file("../bookmarks.tsv")

# run a few find methods and check their results
Bookmark.find("tags:(+css +examples)", { :random => 1 }).each do |bookmark|
  print_bookmark(bookmark)
end

linux_bookmarks = Bookmark.find("title:(linux)", { :sortby => "title" }).each do |bookmark|
  print_bookmark(bookmark)
end

some_id = linux_bookmarks.first.id

# check that the find_by_id method works
print_bookmark(Bookmark.find_by_id(some_id))

# check that the find_all method works
puts Bookmark.find_all.size

# now remove them all from the store and clear the index
Bookmark.destroy_all

I reworked the BookmarkManager and didn't use the Thrift auto-generated bookmark files, but it works. ThruDB uses Thrift, which is Facebook's "framework for scalable cross-language services development". The annoying thing about Thrift is that you have to define a .thrift file for each object type, run a generator, and then work with the generated code. This is almost exactly like Google's Protocol Buffers, which were the bane of my existence there over the last summer.

I guess it makes sense to have the whole thing be cross-language capable, but it drags dynamic languages down into compiled land. Why have schema files and run static generators when the code can be your schema? So, following the syntax style Sebastian mentioned, ThruMapper constructs the necessary Thrift pieces for each class that inherits from it.
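
To make the "code as schema" idea concrete, here is a minimal sketch of how an attribute class macro can record a schema at class-definition time. This is not the actual ThruMapper source: SketchMapper, SketchBookmark, schema, and to_document are hypothetical names, and the Thrift side is left out entirely.

```ruby
# Hypothetical sketch: NOT the real ThruMapper code, and no Thrift involved.
class SketchMapper
  # Class-level instance variable, so each subclass keeps its own table.
  def self.schema
    @schema ||= {}
  end

  # Class macro: remember the field's declared type and define accessors.
  def self.attribute(name, type)
    schema[name] = type
    attr_accessor name
  end

  # Assign each given value through the accessor defined by attribute.
  def initialize(values = {})
    values.each { |name, value| send("#{name}=", value) }
  end

  # Plain-hash form of the declared fields, ready to hand to a serializer.
  def to_document
    self.class.schema.keys.each_with_object({}) do |name, doc|
      doc[name] = send(name)
    end
  end
end

class SketchBookmark < SketchMapper
  attribute :title, :string
  attribute :url, :string
end

bookmark = SketchBookmark.new(:title => "Ruby", :url => "http://ruby-lang.org")
p SketchBookmark.schema
p bookmark.to_document
```

The class body is the only place the fields are declared, so anything that needs the schema (a serializer, an index definition) can read it from SketchBookmark.schema instead of from a generated file.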

So I've eliminated the .thrift files along with the generation step. The way I have it running now is that each ThruMapper class expects its own index. This creates a little bit of a configuration problem as the thrucene.conf file needs to be updated with each index. Does anyone have suggestions for dealing with this? I'd like to have the class definition of each ThruMapper class be the one point to get things going.
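
One possible direction, sketched under assumptions: Ruby's inherited hook can collect every mapper class as it is defined, so the list of indexes the application expects is at least available in one place. SketchIndexed, mapped_classes, index_name, and required_indexes are hypothetical names, and the lowercased-class-name convention is an assumption. The sketch only gathers names; it does not write thrucene.conf.

```ruby
# Hypothetical sketch: collects the index names that mapper classes expect.
class SketchIndexed
  @mapped_classes = []

  class << self
    # The registry lives on the base class so all subclasses share it.
    def mapped_classes
      SketchIndexed.instance_variable_get(:@mapped_classes)
    end

    # Ruby calls this hook each time a class inherits from SketchIndexed.
    def inherited(subclass)
      super
      mapped_classes << subclass
    end

    # Assumed convention: one index per class, named after the class.
    def index_name
      name.downcase
    end

    # Every index name the defined mapper classes expect to exist.
    def required_indexes
      mapped_classes.map { |klass| klass.index_name }
    end
  end
end

class IndexedBookmark < SketchIndexed; end
class IndexedNote < SketchIndexed; end

puts SketchIndexed.required_indexes.inspect
```

A startup check could compare this list against whatever indexes the server actually has configured and complain about any that are missing.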

You can download the ThruMapper bookmark example and the ThruMapper code. Put them in tutorial/rb and run the ThruMapperBookmarkExample.rb to see it go.

Now I have to make it prettier, make it a gem, make it more generic so it can use CouchDB, implement some associations and automatic finders, make it thread-safe, and have it pool connections. After all that it might be sweet. Hopefully Sebastian will agree to meet up with me next week to hack on this together. Anyone else up for a ThruDB hack session?

Comments

Since maintaining Thrift type declaration files is cumbersome, notably for dynamic languages and for evolving, versionable doc types, yet you still want to maintain cross-platform capabilities, why not use a structured text format for document representation? Something readily parseable and lightweight, such as JSON or YAML?

This would keep Thrift where it belongs, at the communication interface perimeter, while keeping the persisted doc's representation cross-platform. Using Thrift to store temporary execution objects, as Jake has done with ThruQueue, is one thing, but Thrift-encoding long-lived, and presumably business-critical, entities pushes Thrift far too deep into the infrastructure, coupling your documents with it in perpetuity. This would be like persisting the docs in their serialized CORBA representation because CORBA is what's used for IPC at the outer edge.
