ThruDB ORM for Ruby
Sebastian Delmont has posted his thoughts about putting together a ThruDB for Rails. If you're not familiar with ThruDB, you should go check it out. It's a document store written by Jake at Third Rail that is in the same vein as CouchDB, SimpleDB, or Google's BigTable (here's an intro to ThruDB).
Anyway, back to ThruDB for Rails. It would be kind of like an ActiveRecord-style abstraction for ThruDB. The truth is it really has nothing to do with Rails. It could be thread-safe, support connection pooling, and work with Merb. If done right it could be an abstraction to work with any document-style database combined with an index and not just the ThruDB, Lucene, Thrift combination.
After a brief discussion with Sebastian yesterday in #nyc.rb, I got to work to see if I could convert the ThruDB bookmark example to something more Ruby-like. Here's what my bookmark example looks like using something I whipped up real quick called ThruMapper.
require 'ThruMapper'

class Bookmark < ThruMapper
  attribute :title, :string
  attribute :url, :string
  attribute :posted_date, :integer
  attribute :tags, :string
end

def load_tsv_file(file)
  # File.foreach closes the file when it is done
  File.foreach(file) do |line|
    # each line is url<TAB>title<TAB>tags; chomp returns a new string, so use its result
    bs = line.chomp.split("\t")
    Bookmark.create(:url => bs[0], :title => bs[1], :tags => bs[2])
  end
  # we'd need to do a commit after each one and possibly have a bulk insert
  # Commit just calls commitAll() on thrucene, which probably updates the index
  Bookmark.commit
end

# just a helper method to print it out
def print_bookmark(bookmark)
  puts "id : #{bookmark.id}"
  puts "title : #{bookmark.title}"
  puts "url : #{bookmark.url}"
  puts "tags : #{bookmark.tags}"
end

# initialize the connection and load up some test data
ThruMapper.initialize_connection
load_tsv_file("../bookmarks.tsv")

# run a few find methods and check their results
Bookmark.find("tags:(+css +examples)", { :random => 1 }).each do |bookmark|
  print_bookmark(bookmark)
end

linux_bookmarks = Bookmark.find("title:(linux)", { :sortby => "title" })
linux_bookmarks.each do |bookmark|
  print_bookmark(bookmark)
end
some_id = linux_bookmarks.first.id

# see the find_by_id method works
print_bookmark(Bookmark.find_by_id(some_id))

# see the find_all method works
puts Bookmark.find_all.size

# now remove them all from the store and clear the index
Bookmark.destroy_all
I reworked the BookmarkManager and didn't use the Thrift auto-generated bookmark files, but it works. ThruDB uses Thrift, which is a Facebook "framework for scalable cross-language services development". One annoying thing about Thrift is that you have to define a .thrift file for each object type, then run a generator, and then work with the generated code. This is almost exactly like Google's Protocol Buffers, and they were the bane of my existence there over the last summer.
I guess it makes sense to have the whole thing be cross language capable, but it drags dynamic languages down into compiled land. Why have schema files and run static generators when you can have the code be your schema? So following the syntax style Sebastian mentioned, ThruMapper constructs the necessary Thrift pieces for each inherited class.
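To make the "code is your schema" idea concrete, here is a minimal sketch of how an attribute class macro can record a schema at class-definition time. It is not the actual ThruMapper source: SketchMapper, SketchBookmark, schema, and to_document are hypothetical names made up for illustration, and the Thrift serialization step is left out entirely.

```ruby
# Hypothetical sketch, not the actual ThruMapper source.
class SketchMapper
  # Each subclass keeps its own ordered list of [name, type] pairs.
  def self.schema
    @schema ||= []
  end

  # Class macro: records the field and defines a reader and writer for it.
  def self.attribute(name, type)
    schema << [name, type]
    attr_accessor name
  end

  # Assign any values passed in through the generated writers.
  def initialize(values = {})
    values.each { |name, value| public_send("#{name}=", value) }
  end

  # Plain hash of the declared attributes, ready to hand to a serializer.
  def to_document
    self.class.schema.each_with_object({}) do |(name, _type), doc|
      doc[name] = public_send(name)
    end
  end
end

class SketchBookmark < SketchMapper
  attribute :title, :string
  attribute :url,   :string
end

bookmark = SketchBookmark.new(:title => "Ruby", :url => "http://ruby-lang.org")
p SketchBookmark.schema
p bookmark.to_document
```

Because the schema lives on the class itself, anything that needs field names and types (a serializer, an index definition) can read it at runtime instead of from a generated file.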
So I've eliminated the .thrift files along with the generation step. The way I have it running now is that each ThruMapper class expects its own index. This creates a little bit of a configuration problem as the thrucene.conf file needs to be updated with each index. Does anyone have suggestions for dealing with this? I'd like to have the class definition of each ThruMapper class be the one point to get things going.
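One possible direction, purely as a sketch: Ruby's inherited hook lets a base class learn about every subclass as it is defined, so the class definitions themselves could be the source of the list of indexes. Everything here is an assumption for illustration (IndexedMapper, mapped_classes, and the pluralized index_name convention are not part of ThruMapper), and it deliberately stops short of writing thrucene.conf.

```ruby
# Hypothetical sketch: names and conventions are assumptions, not ThruMapper's API.
class IndexedMapper
  # Registry of every class that has subclassed IndexedMapper.
  def self.mapped_classes
    @mapped_classes ||= []
  end

  # Ruby calls this hook whenever a subclass is defined.
  def self.inherited(subclass)
    super
    IndexedMapper.mapped_classes << subclass
  end

  # Assumed convention: index name is the lowercased class name plus "s".
  def self.index_name
    "#{name.downcase}s"
  end
end

class Bookmark < IndexedMapper; end
class Note < IndexedMapper; end

# List the index each mapped class expects.
puts IndexedMapper.mapped_classes.map(&:index_name)
```

A small script could walk mapped_classes to report which indexes need to exist, which would at least keep the class definition as the single place where a new document type is declared.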
You can download the ThruMapper bookmark example and the ThruMapper code. Put them in tutorial/rb and run the ThruMapperBookmarkExample.rb to see it go.
Now I have to make it prettier, make it a gem, make it more generic so it can use CouchDB, implement some associations and automatic finders, make it thread safe, and have it pool connections. After all that it might be sweet. Hopefully Sebastian will agree to meet up with me next week to hack on this together. Anyone else up for a ThruDB hack session?
Since maintaining Thrift type declaration files is cumbersome, notably for dynamic languages and for evolving, versionable doc types, yet you still want to maintain cross-platform capabilities, why not use a structured text format for document representation? Something readily parseable and lightweight, such as JSON or YAML?
This would keep Thrift where it belongs, at the communication interface perimeter, while keeping the persisted doc's representation cross-platform. Using Thrift to store temporary execution objects, such as what Jake's done with ThruQueue, is one thing, but Thrift-encoding long-lived, and presumably business-critical, entities is pushing Thrift far too deep into the infrastructure, coupling your documents with it in perpetuity. This would be like persisting the docs as their serialized CORBA representation because CORBA is what's used as the outer edge IPC.
Posted by: Jason | January 11, 2008 at 01:36 PM