
I’ve written an RSS aggregator. Go me.

As a first-time Python/PHP project, it’s really simple: Mark’s RSS Parser does all the work, and my junior Python puts the results into a directory structure, which is then read by a PHP script.

It splits into two parts. The first, generate.py, reads a list of feeds. The feeds live in a subdirectory of the script directory called “categories”, and each text file within it is a category of feeds. For example, categories/weblog contains the entry Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml (\t means “tab”). generate.py finds that entry, fetches the feed, and saves the feed’s ETag, if one is given, back to the file, so the line becomes Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml\t(etag) and we don’t keep downloading the whole feed if it hasn’t changed. It then splits out every item and saves each one into cache/weblogs/Mark Pilgrim as a file like this:

Site: dive into mark
Link: http://diveintomark.org/
Title: The importance of human-readable markup
Plink: http://diveintomark.org/archives/2003/05/03/the_importance_of_humanreadable_markup.html
Category:

<p><cite>Slashdot</cite>: <a href="http://slashdot.org/article.pl?sid=03/05/02/1845241">HTML Rendering crashing IE</a>. Here's a <a href="http://vibrantlogic.com/ new.html">test page</a> that crashes my Internet Explorer 6.0 SP2 with all patches. Thanks to the wonders of integration, [...]

(Note: ESF format rides again :-)) The file name is the checksum of the item. If a file with that checksum already exists, it won’t be overwritten.
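The generate.py steps above can be sketched roughly as follows. This is not the actual script: the function names are made up for illustration, MD5 is an assumption (the post only says “checksum”), and the network fetch is left out. The only HTTP detail relied on is that a stored ETag is sent back in an If-None-Match header so the server can reply “not modified”.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_feed_line(line):
    """Split 'name<TAB>url[<TAB>etag]' into parts; etag is None if absent."""
    parts = line.rstrip("\n").split("\t")
    name, url = parts[0], parts[1]
    etag = parts[2] if len(parts) > 2 and parts[2] else None
    return name, url, etag


def format_feed_line(name, url, etag=None):
    """Rebuild the line, appending the ETag as a third field when known."""
    fields = [name, url] + ([etag] if etag else [])
    return "\t".join(fields)


def conditional_headers(etag):
    """Headers for a conditional GET; empty if no ETag has been saved yet."""
    return {"If-None-Match": etag} if etag else {}


def save_item(feed_dir, item_text):
    """Write the item to a file named after its checksum.

    Returns True if a new file was written, False if it was already cached.
    """
    feed_dir = Path(feed_dir)
    feed_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: MD5 of the item text stands in for "the checksum of the item".
    checksum = hashlib.md5(item_text.encode("utf-8")).hexdigest()
    path = feed_dir / checksum
    if path.exists():
        return False  # never overwrite, so the original file time survives
    path.write_text(item_text, encoding="utf-8")
    return True


if __name__ == "__main__":
    line = "Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml"
    name, url, etag = parse_feed_line(line)
    print(conditional_headers(etag))
    with tempfile.TemporaryDirectory() as cache:
        feed_dir = Path(cache) / "weblogs" / name
        item = "Title: example\n\n<p>body</p>"
        print(save_item(feed_dir, item))
        print(save_item(feed_dir, item))
```

Saving the same item twice writes the file only the first time, which is what lets the file’s timestamp stand in for “when this item first showed up”.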

view.php sits on the web server. When called, it scans the cache directory for categories. When called with a category, it grabs every file within it and puts the parsed contents into an array indexed by the time the file was created. (Remember, the file was created from the item. If the item still existed when we read the feed again, it wasn’t overwritten, because there was already a file with that checksum, so the creation time records when the item was first seen.) There is also some fiddling to make sure that two items that arrived in the same second don’t overwrite each other. The array is then displayed in a neat fashion.
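The real view.php is PHP, but the indexing idea can be sketched in Python to match generate.py. Everything here is an assumption for illustration: the function names, the use of the file’s modification time as the “created” time, and nudging a clashing timestamp forward by one second as the “fiddling”.

```python
from pathlib import Path


def parse_item(text):
    """Split a cached item into a header dict and the HTML body."""
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, value = line.partition(":")
        headers[key.strip()] = value.strip()
    return headers, body.strip()


def index_by_time(pairs):
    """Build {timestamp: item} from (timestamp, item) pairs.

    If a timestamp is already taken, move the newcomer forward one second
    at a time until it finds a free slot, so nothing is overwritten.
    """
    indexed = {}
    for stamp, item in pairs:
        while stamp in indexed:
            stamp += 1
        indexed[stamp] = item
    return indexed


def load_category(category_dir):
    """Read every cached item under a category, keyed by file time."""
    pairs = []
    for path in sorted(Path(category_dir).rglob("*")):
        if path.is_file():
            # Assumption: mtime stands in for creation time, since the
            # files are written once and never overwritten.
            stamp = int(path.stat().st_mtime)
            pairs.append((stamp, parse_item(path.read_text(encoding="utf-8"))))
    return index_by_time(pairs)


def newest_first(indexed):
    """Return the items ordered from most recent to oldest."""
    return [item for _, item in sorted(indexed.items(), reverse=True)]
```

Calling newest_first(load_category("cache/weblogs")) would give the friends-page ordering: the most recently seen item at the top.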

It needs polishing, but it means I now have an LJ-friends-page-style list of every weblog and journal I read that has a feed, allowing me to stay up to date from my own browser 🙂

The demo (which is my current feed setup) is here until I move it somewhere nicer 🙂
