TruffleHog is a small library I wrote for detecting and parsing Atom and RSS feeds in a web page. I'm sure there are other libraries out there, but I couldn't find one easily. I found Cory Forsyth's FeedDetector, but there were some cases that it didn't handle.
Anyway, here is everything you need to get running.
gem install truffle-hog --source http://gemcutter.org
# and then in teh rubys
require 'rubygems'
require 'truffle-hog'
# get atom and rss
feed_urls = TruffleHog.parse_feed_urls(some_html)
# get atom if available, otherwise rss
feed_urls = TruffleHog.parse_feed_urls(some_html, :atom)
# get rss if available, otherwise atom
feed_urls = TruffleHog.parse_feed_urls(some_html, :rss)
Behind the scenes, it grabs the URLs with a regex. It's really just a few lines of code. I may update it with some fancy Nokogiri parsing and other improvements, but for now it's just a simple way to parse feed URLs out of a page.
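To give a rough idea of what regex-based feed detection can look like, here is a minimal sketch. It is not TruffleHog's actual source: the `FeedSniffer` module, its constants, and its `feed_urls` method are hypothetical names made up for illustration, and it assumes feeds are advertised through `<link>` tags with a quoted `type` of `application/atom+xml` or `application/rss+xml` and a quoted `href`.

```ruby
# Hypothetical sketch of regex-based feed discovery.
# This is NOT TruffleHog's real implementation; all names are illustrative.
module FeedSniffer
  # Matches a whole <link ...> tag.
  LINK_TAG  = /<link\b[^>]*>/i
  # Captures "atom" or "rss" from the tag's type attribute.
  FEED_TYPE = /\btype\s*=\s*["']application\/(atom|rss)\+xml["']/i
  # Captures the value of the tag's href attribute.
  HREF      = /\bhref\s*=\s*["']([^"']+)["']/i

  # Returns the feed URLs found in the html string.
  # With no preference, Atom URLs come first, then RSS URLs.
  # With :atom or :rss, returns only that format if any were found,
  # otherwise falls back to the other format.
  def self.feed_urls(html, preferred = nil)
    unless [nil, :atom, :rss].include?(preferred)
      raise ArgumentError, "preferred must be :atom, :rss, or nil"
    end

    found = { atom: [], rss: [] }
    html.scan(LINK_TAG) do |tag|
      type = tag[FEED_TYPE, 1]
      href = tag[HREF, 1]
      next unless type && href
      found[type.downcase.to_sym] << href
    end

    return found[:atom] + found[:rss] if preferred.nil?

    fallback = preferred == :atom ? :rss : :atom
    found[preferred].empty? ? found[fallback] : found[preferred]
  end
end
```

Calling `FeedSniffer.feed_urls(some_html, :atom)` on a page that only advertises an RSS feed would return the RSS URL. A regex like this will miss unquoted attributes and other unusual markup, which is the kind of case a real HTML parser such as Nokogiri would handle better.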
Do you know about jnunemaker-columbus?
It does the GET on the URL and also follows redirects.
Posted by: Denis Barushev | October 23, 2009 at 02:29 PM
I didn't. However, doing the GET is exactly what I didn't want. I don't want my parsing code mixed in with the HTTP code.
I probably would have used something else if I could find it. I just didn't turn anything up on a search.
Posted by: Paul Dix | October 23, 2009 at 03:50 PM
I have heard of people using the google reader api to do feed discovery on a given url. Might be worth looking into.
Posted by: Quinn Shanahan | October 24, 2009 at 01:52 PM
You can use Feedbag to autodiscover feeds in a given url! cheers :) btw luv feedzirra & typhoeus!
Posted by: Paddy | February 25, 2010 at 12:41 AM