
I’ve written an RSS aggregator. Go me.

As a first Python/PHP project it’s really simple: Mark’s RSS Parser does all the work, and my junior Python puts the results into a directory structure, which is then read by a PHP script.

It splits into two parts. generate.py reads a list of feeds. The feeds live in a subdirectory of the script dir called “categories”, and each text file within that is a category of feeds. For example, in categories/weblog is an entry Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml (\t means “tab”). generate.py finds that, gets the feed, and saves the ETag of the feed – if given – back to the file (so the line becomes Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml\t(etag), and we don’t keep downloading the whole feed if it hasn’t changed). It then splits out every item and saves each into cache/weblogs/Mark Pilgrim as a file like this:

Site: dive into mark
Link: http://diveintomark.org/
Title: The importance of human-readable markup
Plink: http://diveintomark.org/archives/2003/05/03/the_importance_of_humanreadable_markup.html

<p><cite>Slashdot</cite>: <a href="http://slashdot.org/article.pl?sid=03/05/02/1845241">HTML Rendering crashing IE</a>. Here's a <a href="http://vibrantlogic.com/ new.html">test page</a> that crashes my Internet Explorer 6.0 SP2 with all patches. Thanks to the wonders of integration, [...]

(Note, ESF format rides again :-)) Each file is named after the checksum of the item, so if a file with that checksum already exists, it isn’t overwritten.
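For the curious, the generate side can be sketched roughly like this. It is not the actual generate.py: the function names, the item dictionary keys, and the choice of MD5 as the checksum are all assumptions made for illustration, and the feed parsing itself (done by Mark’s RSS Parser) is left out.

```python
import hashlib
import urllib.request
from pathlib import Path


def parse_category_line(line):
    """Split 'Name<TAB>URL[<TAB>ETag]' into its three parts."""
    fields = line.rstrip("\n").split("\t")
    name, url = fields[0], fields[1]
    etag = fields[2] if len(fields) > 2 and fields[2] else None
    return name, url, etag


def format_category_line(name, url, etag=None):
    """Rebuild the line, appending the ETag only when one is known."""
    parts = [name, url]
    if etag:
        parts.append(etag)
    return "\t".join(parts)


def build_request(url, etag=None):
    """Ask the server to skip the body if the feed still matches etag."""
    headers = {"If-None-Match": etag} if etag else {}
    return urllib.request.Request(url, headers=headers)


def save_item(cache_dir, category, feed_name, item):
    """Write one item as headers, a blank line, then the body.

    Returns the path written, or None if the item was already cached.
    """
    text = (
        f"Site: {item['site']}\n"
        f"Link: {item['link']}\n"
        f"Title: {item['title']}\n"
        f"Plink: {item['plink']}\n"
        "\n"
        f"{item['body']}\n"
    )
    # MD5 is an assumption; the post only says "checksum".
    checksum = hashlib.md5(text.encode("utf-8")).hexdigest()
    target = Path(cache_dir) / category / feed_name / checksum
    if target.exists():
        return None  # keep the original file and its creation time
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Passing the result of build_request to urllib.request.urlopen raises an HTTPError with code 304 when the feed hasn’t changed, which is the cue to skip that feed for this run.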

view.php sits on the web server. When called without a category, it scans the cache directory for categories. When called with a category, it grabs every file from within it and puts the parsed contents into an array indexed by the time the file was created. (Remember, the file was created from the item; if the item still existed when we read the feed again, it wasn’t overwritten, because there was already a file with that checksum. There is also some fiddling to make sure that two items that arrived in the same second don’t overwrite each other.) Said array is then displayed in a neat fashion.
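The real viewer is PHP, but the indexing idea can be sketched in Python to match the generator. Everything here is hypothetical: the function names are made up, the file's modification time stands in for its creation time (reasonable only because items are never rewritten), and bumping a clashing timestamp by one second is just one guess at what the “fiddling” might be.

```python
from pathlib import Path


def parse_item(text):
    """Split an item file into a header dict and a body string."""
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        key, _, value = line.partition(": ")
        headers[key] = value
    return headers, body.strip()


def load_category(category_dir):
    """Return (timestamp, headers, body) tuples, newest first."""
    items = {}
    for path in Path(category_dir).rglob("*"):
        if not path.is_file():
            continue
        # Assumption: mtime approximates creation time for never-rewritten files.
        stamp = int(path.stat().st_mtime)
        while stamp in items:
            stamp += 1  # nudge so same-second items don't overwrite each other
        headers, body = parse_item(path.read_text(encoding="utf-8"))
        items[stamp] = (headers, body)
    return [(stamp, *items[stamp]) for stamp in sorted(items, reverse=True)]
```

Keying on a (timestamp, filename) pair would avoid altering the timestamps at all; the nudge is used here only because it keeps the index a plain number, as the description of the array suggests.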

It needs polishing, but it means I now have an LJ-friends-page-style list of every weblog and journal I read that has a feed, allowing me to stay up to date from my own browser 🙂

The demo (which is my current feed setup) is here until I move it somewhere nicer 🙂
