I’ve written an RSS aggregator. Go me.
As a first Python/PHP project, it’s really simple: Mark’s RSS Parser does all the work, and my junior Python puts the results into a directory structure, which is then read by a PHP script.
It splits into two parts. The first, generate.py, reads a list of feeds. The feeds live in a subdirectory of the script directory called “categories”, and each text file within it is a category of feeds. For example, in categories/weblog is an entry
Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml (\t means “tab”). generate.py finds that, fetches the feed, and saves the feed’s ETag (if one is given) back to the file, so the line becomes
Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml\t(etag) and we don’t keep downloading the whole feed if it hasn’t changed. It then splits out every item and saves each into
cache/weblogs/Mark Pilgrim as a file like this:
Site: dive into mark
Link: http://diveintomark.org/
Title: The importance of human-readable markup
Plink: http://diveintomark.org/archives/2003/05/03/the_importance_of_humanreadable_markup.html
Category:
<p><cite>Slashdot</cite>: <a href="http://slashdot.org/article.pl?sid=03/05/02/1845241">HTML Rendering crashing IE</a>. Here's a <a href="http://vibrantlogic.com/ new.html">test page</a> that crashes my Internet Explorer 6.0 SP2 with all patches. Thanks to the wonders of integration, [...]
(Note: ESF format rides again :-)) The file name is the checksum of the item. If a file with that checksum already exists, generate.py won’t overwrite it.
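For anyone who wants to see the generate.py steps in code, here is a minimal sketch in modern Python 3. It is not the original script. The function names are made up for illustration, MD5 is only an assumed choice of checksum (the post doesn't say which one is used), and the "headers, blank line, body" file layout is a guess based on the example above. Fetching and parsing the feed itself is left out.

```python
import hashlib
import os
import urllib.request


def parse_feed_line(line):
    """Split 'name<TAB>url[<TAB>etag]' into (name, url, etag or None)."""
    fields = line.rstrip("\n").split("\t")
    name, url = fields[0], fields[1]
    etag = fields[2] if len(fields) > 2 and fields[2] else None
    return name, url, etag


def format_feed_line(name, url, etag=None):
    """Rebuild the line, appending the ETag as a third field when known."""
    fields = [name, url] + ([etag] if etag else [])
    return "\t".join(fields)


def conditional_request(url, etag=None):
    """Build a request that lets the server reply 304 if the feed is unchanged."""
    headers = {"If-None-Match": etag} if etag else {}
    return urllib.request.Request(url, headers=headers)


def save_item(cache_dir, headers, body):
    """Write one item to a checksum-named file; never overwrite an existing one.

    headers is a list of (key, value) pairs such as ("Title", "...").
    Returns (path, True) if the file was created, (path, False) if it existed.
    """
    text = "".join(f"{key}: {value}\n" for key, value in headers) + "\n" + body
    data = text.encode("utf-8")
    checksum = hashlib.md5(data).hexdigest()  # assumption: any stable checksum works
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, checksum)
    if os.path.exists(path):
        return path, False  # already seen this item, keep the old file and its timestamp
    with open(path, "wb") as handle:
        handle.write(data)
    return path, True
```

Because an existing file is left alone, its timestamp still records when the item was first seen, which is what the viewer relies on.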
The second part, view.php, sits on the web server. When called on its own, it scans the cache directory for categories. When called with a category, it grabs every file within it and puts the parsed contents into an array indexed by the time the file was created. (Remember, the file was created from the item: if the item still existed when we read the feed again, it wasn’t overwritten, because there was already a file with that checksum. There is also some fiddling to make sure that two items that arrived in the same second don’t overwrite each other.) Said array is then displayed in a neat fashion.
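The real viewer is PHP, but the indexing idea is easy to sketch in Python. This is an illustration only: the function names are hypothetical, the file's modification time stands in for its creation time (the two match as long as files are never overwritten), bumping a clashing timestamp by one second is just one possible form of the "fiddling", and newest-first ordering is an assumption about the display.

```python
import os


def load_items_by_time(category_dir):
    """Return {timestamp: file text} for every item file in a category directory."""
    items = {}
    for name in sorted(os.listdir(category_dir)):
        path = os.path.join(category_dir, name)
        if not os.path.isfile(path):
            continue
        key = int(os.path.getmtime(path))
        # Two items saved in the same second would share a key,
        # so nudge the later one forward until the key is free.
        while key in items:
            key += 1
        with open(path, encoding="utf-8") as handle:
            items[key] = handle.read()
    return items


def newest_first(items):
    """Return the item texts ordered from most recent to oldest."""
    return [items[key] for key in sorted(items, reverse=True)]
```

Calling newest_first(load_items_by_time("cache/weblogs/Mark Pilgrim")) would then give the items in display order.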
It needs polishing, but it means I now have an LJ-friends-page-style list of every weblog and journal I read that has a feed, allowing me to stay up to date from my own browser 🙂
The demo (which is my current feed setup) is here until I move it somewhere nicer 🙂