TruffleHog is a small library I wrote for detecting and parsing Atom and RSS feeds in a web page. I'm sure there are other libraries out there, but I couldn't find one easily. I found Cory Forsyth's FeedDetector, but there were some cases that it didn't handle.
Anyway, here is everything you need to get running.
gem install truffle-hog --source http://gemcutter.org
# and then in Ruby
# get atom and rss
feed_urls = TruffleHog.parse_feed_urls(some_html)
# get atom if available, otherwise rss
feed_urls = TruffleHog.parse_feed_urls(some_html, :atom)
# get rss if available, otherwise atom
feed_urls = TruffleHog.parse_feed_urls(some_html, :rss)
Behind the scenes, it grabs the URLs with a regex; it's really just a few lines of code. I may update it with some fancy Nokogiri parsing and other improvements, but for now it's just a simple way to parse out feed URLs.
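To give a feel for what "a few lines of regex" can look like, here is a rough sketch of the general approach. This is not TruffleHog's actual source: the `FeedSniffer` module, its `feed_urls` method, and the three patterns are made-up names for illustration, and it assumes feeds are advertised through `<link>` tags with a quoted `type` of `application/atom+xml` or `application/rss+xml`.

```ruby
# Hypothetical sketch of regex-based feed detection; not TruffleHog's real code.
module FeedSniffer
  # Matches a whole <link ...> tag, including ones that span several lines.
  LINK_TAG  = /<link\b[^>]*>/i
  # Captures "atom" or "rss" from the tag's type attribute.
  TYPE_ATTR = /type\s*=\s*["']application\/(atom|rss)\+xml["']/i
  # Captures the quoted href value.
  HREF_ATTR = /href\s*=\s*["']([^"']+)["']/i

  # With no preference, returns every Atom and RSS URL in document order.
  # With :atom or :rss, returns feeds of that kind if any exist,
  # otherwise falls back to the other kind.
  def self.feed_urls(html, prefer = nil)
    feeds = html.scan(LINK_TAG).map do |tag|
      type = tag[TYPE_ATTR, 1]
      href = tag[HREF_ATTR, 1]
      [type.downcase.to_sym, href] if type && href
    end.compact

    return feeds.map(&:last) if prefer.nil?

    preferred, others = feeds.partition { |kind, _| kind == prefer }
    (preferred.empty? ? others : preferred).map(&:last)
  end
end
```

A regex like this will miss unquoted attributes and anything else a real HTML parser would handle, which is the trade-off for keeping it this small.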