I've released another library just in time for the new year. The new library is called Domainatrix and she's used for parsing domain names, canonicalizing URLs, and a few other things. It uses the list of domain names from the Public Suffix List to know what constitutes a subdomain, domain, and public suffix.
Usage couldn't be simpler.
url = Domainatrix.parse("http://www.pauldix.net")
url.public_suffix # => "net"
url.domain # => "pauldix"
url.canonical # => "net.pauldix"
url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
It's simple, but I think quite useful. Enjoy!
I like it! But I wonder if at some point the Ruby community should consider using boring names, e.g. Net::Domain, instead of coining a neologism for every release :)
Posted by: Jeremy Voorhis | December 30, 2009 at 07:58 PM
What advantages do you tout over Addressable::URI?
And I for one like the name. The trouble with "boring" names, as Mr. Voorhis suggests above, is that they are prone to naming collisions with other libraries which are trying to do the same thing. Boring names like Net::HTTP should be reserved for the standard library.
Then again, I named one of my projects HookR, so I could be biased ;-)
Posted by: Avdi Grimm | December 31, 2009 at 10:31 AM
I wasn't familiar with Addressable::URI, but at a glance it seems as though it presents the same basic functionality as the standard library URI. The problem I needed to solve with this library was identifying subdomain, domain, and tld (not just the full host).
I also needed something that could create a canonical representation of URLs. This is helpful if you're indexing pages by URL and want to do a site search or something similar. If I index a bunch of pages and key off the canonical URL, the query to show everything from this site would be two starts-with queries like this:
net.pauldix.*
net.pauldix/*
That doesn't seem to be something that Addressable::URI supports. As for names, I'm a big fan of coming up with memorable and searchable library names (since they're not part of a standard lib). However, I generally try to use descriptive class names within the libraries I write.
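As a rough sketch of the two starts-with queries above, here is how they might look over an in-memory list of canonical keys. The `CANONICAL_KEYS` data and the `pages_for_site` helper are hypothetical names made up for illustration; they are not part of Domainatrix, and a real index would more likely run prefix queries against a database or search engine.

```ruby
# Hypothetical index keys in canonical form (illustrative data only).
CANONICAL_KEYS = [
  "net.pauldix/about.html",
  "net.pauldix.blog/2009/12/post.html",
  "net.pauldix.feeds/atom.xml",
  "net.pauldixon/index.html",
  "uk.co.pauldix.bar.foo/asdf.html?q=arg"
].freeze

# Hypothetical helper: returns the keys that belong to one site.
# "<domain>." catches pages on subdomains, "<domain>/" catches pages
# on the bare domain. Checking the separator keeps look-alike domains
# such as "net.pauldixon" out of the results.
def pages_for_site(keys, canonical_domain)
  keys.select do |key|
    key.start_with?("#{canonical_domain}.", "#{canonical_domain}/")
  end
end

puts pages_for_site(CANONICAL_KEYS, "net.pauldix")
```

Because the canonical form reverses the host, every page of a site shares one prefix, which is what makes a plain starts-with match enough here.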
Posted by: Paul Dix | December 31, 2009 at 10:59 AM
Those few new bits just don't justify creating a whole new library that duplicates tons of functionality in the Ruby standard library.
Why not submit a patch for URI instead? We already have URI#parse and friends; it's got a host method, so why not extend it with your domain-specific extensions? If you don't want it to be a part of the standard library, just monkeypatch URI with those methods:
class URI::HTTP
  def tld
    host =~ /\.(\w+)$/ ? $1 : nil # Or whatever
  end
  # …
end
Now:
>> URI.parse("http://foo.com/").tld
=> "com"
Posted by: Alexander | December 31, 2009 at 01:31 PM
Hi Alexander,
Getting the tld is actually trickier than that. Have a look at the public suffix list. Sometimes the known suffix is two or three dots deep (co.uk, *.*.jp, etc). As for adding to Ruby, I'd be more than happy if ruby-core took it, I just don't see that happening. Easier to write my own library.
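To illustrate why a single regex on the last label falls short, here is a minimal sketch of suffix matching against a rule list. It is not how Domainatrix is implemented. `SAMPLE_RULES` is a tiny hand-picked sample written in the style of the public suffix list, and `rule_matches?` and `split_host` are hypothetical helpers. The sketch also ignores the exception rules that the real list contains.

```ruby
# Assumption: a tiny illustrative sample, not the real public suffix list.
SAMPLE_RULES = %w[com net uk co.uk jp *.kawasaki.jp].freeze

# True when a rule such as "co.uk" or "*.kawasaki.jp" matches the given
# labels one for one, with "*" standing in for any single label.
def rule_matches?(rule, candidate_labels)
  rule_labels = rule.split(".")
  return false unless rule_labels.size == candidate_labels.size

  rule_labels.zip(candidate_labels).all? { |r, c| r == "*" || r == c }
end

# Splits a host into subdomain, domain, and public suffix by finding the
# longest trailing run of labels that matches a rule, while always
# leaving at least one label over for the domain itself.
def split_host(host, rules = SAMPLE_RULES)
  labels = host.downcase.split(".")
  suffix_size = (labels.size - 1).downto(1).find do |n|
    rules.any? { |rule| rule_matches?(rule, labels.last(n)) }
  end
  return nil unless suffix_size # no known suffix for this host

  rest = labels[0...-suffix_size]
  {
    subdomain: rest[0...-1].join("."),
    domain: rest.last,
    public_suffix: labels.last(suffix_size).join(".")
  }
end

p split_host("foo.bar.pauldix.co.uk")
p split_host("www.pauldix.net")
```

Trying the longest candidate first is what lets "co.uk" win over plain "uk" for the same host.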
Finally, I don't like monkey patching other people's code for public release. If it's something in my own code base, fine, but not if it's something I'm going to share with the world.
The beauty of gems and OSS is that I can write something really small that is useful to other people and easily distribute it.
Posted by: Paul Dix | December 31, 2009 at 02:50 PM
I was searching the web for ML in NYC, and somehow landed here.
At the cost of being unabashedly honest, might I say that I like your profile picture a lot :)
{as my ML professor in cmu used to say, 'dude, ML will make you stumble on unexpectedly attractive things in life'}
**apologies for the complete randomness**
Posted by: kdboy | April 20, 2010 at 12:57 AM
Hi Paul
I like your gem better than Addressable, but if I call Domainatrix.parse("index.html").domain I get an error. Addressable returns nil, which is more correct.
What do you think?
Radek
Posted by: Radek | April 27, 2010 at 01:18 AM
Have you seen http://github.com/toddsundsted/Public-Suffix-List ?
Posted by: dubek | May 19, 2010 at 04:27 AM
I haven't seen that before, but my first commit predates that project by about 30 days.
Posted by: Paul Dix | May 19, 2010 at 11:24 AM