I finally got around to handling the pull requests and various feature requests people have sent me over the last few weeks. The main new features are: you can add new parsing elements across all feeds, ids and guids now get parsed, modified now gets parsed, HTTP basic authentication is now supported, iTunes RSS is now a supported format, and some zlib errors people were having should now be gone.
Also, when no parser is available for a feed, it now calls the on_failure callback instead of throwing an error. This is better behavior when you're fetching hundreds of feeds and don't want one bad feed to stop the others.
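To illustrate why a failure callback beats raising in a bulk run, here is a small standalone sketch. It is not Feedzirra's actual code: `PARSERS`, `parser_for`, and `process_all` are made-up names, and the "parsers" are just regular expressions standing in for real format detection.

```ruby
# Standalone sketch: none of these names come from Feedzirra itself.
# Hypothetical parsers keyed by a pattern that the document must match.
PARSERS = {
  /<feed\b/ => :atom,
  /<rss\b/  => :rss
}

# Returns the name of the first parser whose pattern matches, or nil if none does.
def parser_for(xml)
  match = PARSERS.find { |pattern, _name| xml =~ pattern }
  match && match.last
end

# Walks every document. An unparseable one triggers on_failure and the loop moves on
# instead of raising and aborting the remaining documents.
def process_all(documents, on_success:, on_failure:)
  documents.each do |url, xml|
    parser = parser_for(xml)
    if parser
      on_success.call(url, parser)
    else
      on_failure.call(url)
    end
  end
end

parsed = []
failed = []

process_all(
  {
    "http://example.com/atom" => "<feed xmlns='http://www.w3.org/2005/Atom'></feed>",
    "http://example.com/html" => "<html></html>",
    "http://example.com/rss"  => "<rss version='2.0'></rss>"
  },
  on_success: lambda { |url, parser| parsed << [url, parser] },
  on_failure: lambda { |url| failed << url }
)
```

In this sketch the HTML page in the middle ends up in `failed`, while the feeds before and after it are still processed.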
Parsing of the id/guid was a real debate for me. I intentionally left that out of the library. I had two reasons for this. First, there are many feeds that don't have ids or guids in them. Second, the permalink of the entry should act as a unique identifier. I felt that having another id to track was wasteful and could cause errors. However, enough people have asked about the feature that I've decided to include it in this release.
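If you do track ids, one way to cope with feeds that lack them is to fall back to the permalink. This is only a sketch of that idea: the `Entry` struct and the `unique_key` helper are hypothetical and are not part of Feedzirra.

```ruby
# Hypothetical entry type for illustration; Feedzirra's entry classes are not used here.
Entry = Struct.new(:id, :url)

# Prefer the feed-supplied id/guid; fall back to the permalink when it is missing or blank.
def unique_key(entry)
  id = entry.id.to_s.strip
  id.empty? ? entry.url : id
end

with_guid    = Entry.new("tag:example.com,2009:post-1", "http://example.com/posts/1")
without_guid = Entry.new(nil, "http://example.com/posts/2")

unique_key(with_guid)    # uses the id/guid
unique_key(without_guid) # falls back to the permalink
```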
I also had to do a small release of sax-machine to fix some parsing behavior. The feedzirra gemspec should automatically pull that one down.
Here's what all those new features look like in code:
# Access a feed behind HTTP basic authentication. You can't do this in a bulk feed get; it only works when fetching a single feed.
Feedzirra::Feed.fetch_and_parse(some_url, :http_authentication => ["username", "password"])
# You can add custom parsing to the feed entry classes. Say you want the wfw:commentRss field in an entry
Feedzirra::Feed.add_common_feed_entry_element("wfw:commentRss", :as => :comment_rss)
# The arguments are the same as the SAXMachine arguments for the element method. For more example usage look at the RSSEntry and
# AtomEntry classes. Now you can access those in an atom feed:
Feedzirra::Feed.parse(some_atom_xml).entries.first.comment_rss # => wfw:commentRss is now parsed!
# and the new accessors for feed entries
entry.id # => the atom id or rss guid
entry.modified # => the modified time of the post
Most of these features were thanks to outside contributors. In fact, the only one I did was for adding custom parsing across feed entries. I'd like to thank the following github users for helping out!
- Hamish Rickerby for the itunes stuff
- Daniel Insley for rdoc
- Alexey Dmitriev for the deflate fix
- Nelson Morris for refactoring the tests to mock.
- Julien Genestoux for the id, modified, and updated publishing parsing. Also the new behavior when no parser is available.
Sorry if I forgot anyone. Please yell at me if I haven't pulled in changes that you think should be included.
Daniel Insley was the one who updated the tests to run using mocks. I had an old fork that did it, but it was ugly and I killed it. His solution is much better.
Posted by: Nelson Morris | March 19, 2009 at 12:10 PM
Hey Paul!
Thanks for the heads up and for this new version... I just downloaded it and tried it, but I am bumping into an error (haven't had time to look into it):
no such file to load -- feedzirra/rdf_entry (MissingSourceFile)
Is it possible that you "forgot" some files in the Manifest?
Posted by: Julien | March 19, 2009 at 01:19 PM
Hey Julien,
Whoops, I removed the old rdf files and forgot to remove them from the main loader and the gemspec. I just fixed it and bumped the gem version to 0.0.8. Github should hopefully be putting it up soon.
Thanks
Posted by: Paul Dix | March 19, 2009 at 02:23 PM
Hey Paul,
Is there a ML or a group somewhere to discuss Feedzirra issues? Couldn't find it, so I thought this could be a nice place to post.
So, here is my problem : while testing with lots of feeds I found one that was not working as expected : http://hg.mozilla.org/mozilla-central/atom-log
If you use it, you'll see that the parsed entries don't have a url, while the feed's entries actually do have a link item.
This is due to the fact that this link item doesn't have rel=alternate nor type="text/html"...
I can't help but think that feedzirra shouldn't miss the link, but I also think that when an entry has several links, it should only take the one with rel=alternate and type=text/html.
So, here is my suggestion: add a links attribute to entries, which contains all the links (and their attributes?), and then override the link reader to return either the @link or a chosen link (the first one?) from @links.
I'll try to set this up if you think it is a good idea. There might be an easier way. The main "drawback" of this is that our entries object will get "heavier" by an array...
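Roughly, the idea could look like the sketch below. The `Link` struct and the `SketchEntry` class are made up for illustration and are not Feedzirra's classes; the selection rule (prefer rel=alternate with type=text/html, otherwise take the first link) is just one possible choice.

```ruby
# Hypothetical structures for illustration only; not Feedzirra's classes.
Link = Struct.new(:href, :rel, :type)

class SketchEntry
  attr_reader :links

  def initialize(links)
    @links = links
  end

  # Prefer the link with rel=alternate and type=text/html;
  # otherwise fall back to the first link, or nil if there are none.
  def url
    preferred = links.find { |l| l.rel == "alternate" && l.type == "text/html" }
    chosen = preferred || links.first
    chosen && chosen.href
  end
end

# An entry whose only link carries no rel or type still gets a url.
bare = SketchEntry.new([Link.new("http://example.com/rev/1", nil, nil)])

# An entry with several links picks the rel=alternate, type=text/html one.
multi = SketchEntry.new([
  Link.new("http://example.com/posts/1.atom", "self", "application/atom+xml"),
  Link.new("http://example.com/posts/1", "alternate", "text/html")
])

bare.url
multi.url
```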
Any thoughts?
Posted by: Julien | March 24, 2009 at 02:30 PM