ActiveDocument: More than just a document store
Last week I met up with Sebastian, Jake, and others to talk about ActiveDocument and Thrudb. Rick Olson (aka technoweenie) and Ross McFarland (a Thrudb contributor) were also in attendance in #activedocument on freenode. Before I talk about that, I'd like to pass along a couple of pointers. We've all decided to focus on Rick's ActiveDocument implementation to bring our efforts together. Sebastian also set up a Google Group for ActiveDocument.
We spent our time asking Jake a bunch of questions about the future of Thrudb and trying to figure out how ActiveDocument could become an alternative data layer for Rails apps (replacing ActiveRecord). Thrudb is made up of many pieces, and together they reach well beyond a simple document store. The major parts include Thrucene for indexing, Thrudb as the document store, Thruqueue as a message queue, and Throxy to proxy requests and provide load balancing between Thrudb and Thrucene instances. If AD is going to replace AR and an RDBMS, we'll have to take advantage of most of these pieces.
Thrudb is still very much in development. There isn't an official release, and Jake has stated that it will probably just live in the Subversion repo for a while. I believe that Throxy still needs some work before it's ready, but it's high on the list. That being the case, it will probably be a little while before ActiveDocument has a release.
However, we still came up with a list of design and implementation questions that we're thinking about or concerned with:
- Support for multiple serialization formats. The Thrudb examples use Thrift as the serialization format; AD supports that as well as JSON.
- How to deal with concurrency, locking and counters. Thrudb uses atomic writes where the last write wins. If a group is a document and the membership is stored, how do we deal with two processes updating the group membership at the same time?
- Ability to support other document stores or indexes. Jake is separating Thrudb out into its constituent parts so it's conceivable that you could use SimpleDB as the document store and use the rest.
- Thinking outside the RDBMS box. AR provides great built-in finders and relationship functionality. How much of this should we try to replicate, and what other things can we include that just wouldn't have been possible in an RDBMS?
- Speed for writes and updates. Rick posted some metrics on ActiveDocument vs. ActiveRecord. The write speed is slow due to an update to the index each time a record is created or updated. Jake tells me that this problem only gets worse as the index grows in size. He's actively working on this and has recently improved it quite a bit. However, it still highlights that index design is important when thinking about how an AD schema will be set up.
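To make the group membership question from the list above concrete, here's a minimal Python sketch. Everything in it is an assumption for illustration: `LastWriteWinsStore`, `VersionedStore`, `put_if_version`, and `add_member` are hypothetical in-memory stand-ins, not part of Thrudb's or ActiveDocument's API. The first half shows how two read-modify-write cycles lose an update under last-write-wins; the second half shows one possible mitigation (a version check with retry), which is just one option we could consider and not something Thrudb is known to provide.

```python
class LastWriteWinsStore:
    """Toy in-memory document store (hypothetical, not Thrudb's API)."""

    def __init__(self):
        self._docs = {}

    def get(self, key):
        # Return a copy so each caller works on its own snapshot.
        return list(self._docs.get(key, []))

    def put(self, key, members):
        # Unconditional overwrite: whoever writes last wins.
        self._docs[key] = list(members)


class VersionedStore:
    """Toy store that rejects writes based on a stale version (hypothetical)."""

    def __init__(self):
        self._docs = {}  # key -> (version, members)

    def get(self, key):
        version, members = self._docs.get(key, (0, []))
        return version, list(members)

    def put_if_version(self, key, expected_version, members):
        current_version, _ = self._docs.get(key, (0, []))
        if current_version != expected_version:
            return False  # someone else wrote in between
        self._docs[key] = (current_version + 1, list(members))
        return True


def lost_update_demo():
    """Two processes read the same group, then both write it back."""
    store = LastWriteWinsStore()
    store.put("group:1", ["alice"])
    a = store.get("group:1")  # process A reads
    b = store.get("group:1")  # process B reads the same snapshot
    a.append("bob")
    b.append("carol")
    store.put("group:1", a)
    store.put("group:1", b)   # overwrites A's change; "bob" is lost
    return store.get("group:1")


def add_member(store, key, member, max_retries=5):
    """Re-read and retry until the versioned write succeeds."""
    for _ in range(max_retries):
        version, members = store.get(key)
        if member not in members:
            members.append(member)
        if store.put_if_version(key, version, members):
            return True
    return False


def versioned_demo():
    """Same interleaving, but the stale write is rejected and retried."""
    store = VersionedStore()
    store.put_if_version("group:1", 0, ["alice"])
    va, a = store.get("group:1")
    vb, b = store.get("group:1")
    a.append("bob")
    b.append("carol")
    first = store.put_if_version("group:1", va, a)
    second = store.put_if_version("group:1", vb, b)  # stale version
    add_member(store, "group:1", "carol")            # B retries from a fresh read
    return first, second, store.get("group:1")[1]
```

Calling `lost_update_demo()` leaves the group without "bob", while `versioned_demo()` rejects the stale write and ends up with all three members after the retry. The trade-off is that every writer has to be prepared to re-read and retry, and the store has to track a version per document.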
I haven't had a chance to actually do more development on this. Right now I'm still hacking on Tahiti, but I'm pretty sure that it will use AD and Thrudb in some capacity. I'll post more details as these things develop.