Aquarionics

Tuesday 6th May 2003

Aquaintances

I’ve written an RSS aggregator. Go me.

As a first Python/PHP project, it’s really simple: Mark’s RSS Parser does all the work, and my junior Python puts the results into a directory structure, which is then read by a PHP script.

It splits into two parts. generate.py reads a list of feeds. The feeds are in a subdirectory of the script dir called “categories”, and each text file within that is a category of feeds. For example, in categories/weblog is an entry Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml (\t means “tab”). generate.py finds that, gets the feed, saves the ETag of the feed – if given – back to the file (so the line becomes Mark Pilgrim\thttp://diveintomark.org/xml/rss.xml\t(etag), and we don’t keep downloading the whole feed if it hasn’t changed), splits out every item, and saves each into cache/weblogs/Mark Pilgrim as a file like this:

Site: dive into mark
Link: http://diveintomark.org/
Title: The importance of human-readable markup
Plink: http://diveintomark.org/archives/2003/05/03/the_importance_of_humanreadable_markup.html
Category:

<p><cite>Slashdot</cite>: <a href="http://slashdot.org/article.pl?sid=03/05/02/1845241">HTML Rendering crashing IE</a>. Here's a <a href="http://vibrantlogic.com/ new.html">test page</a> that crashes my Internet Explorer 6.0 SP2 with all patches. Thanks to the wonders of integration, [...]

(Note: ESF format rides again :-)) The file name is the checksum of the item. If a file with that checksum already exists, generate.py won’t overwrite it.
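The steps above could be sketched in Python roughly like this. This is not the actual generate.py: the function names, the item dictionary keys, and the choice of MD5 as the checksum are all assumptions made for illustration, and the feed parsing itself (the part Mark’s RSS Parser does) is left out.

```python
import hashlib
import os
import urllib.request


def parse_feed_line(line):
    """Split 'name<TAB>url[<TAB>etag]' from a category file into its parts."""
    fields = line.rstrip("\n").split("\t")
    name, url = fields[0], fields[1]
    etag = fields[2] if len(fields) > 2 and fields[2] else None
    return name, url, etag


def format_feed_line(name, url, etag=None):
    """Rebuild the category line, appending the ETag only when there is one."""
    parts = [name, url]
    if etag:
        parts.append(etag)
    return "\t".join(parts)


def build_request(url, etag=None):
    """Prepare a conditional GET: a server that honours If-None-Match
    can answer 304 Not Modified instead of resending the whole feed."""
    headers = {"If-None-Match": etag} if etag else {}
    return urllib.request.Request(url, headers=headers)


def save_item(directory, item):
    """Write one item as headers, a blank line, then the body.

    The file name is a checksum of the item text (MD5 here, which is an
    assumption). An existing file is never overwritten. Returns the new
    path, or None if the item was already cached.
    """
    header_keys = ("Site", "Link", "Title", "Plink", "Category")
    headers = "\n".join(
        f"{key}: {item.get(key, '')}".rstrip() for key in header_keys
    )
    text = headers + "\n\n" + item.get("Body", "") + "\n"
    checksum = hashlib.md5(text.encode("utf-8")).hexdigest()
    path = os.path.join(directory, checksum)
    if os.path.exists(path):
        return None
    os.makedirs(directory, exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path
```

Because the file name depends only on the item’s content, running the script again over an unchanged feed writes nothing new, and the original files keep their original timestamps.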

view.php sits on the web server. When called, it scans the cache directory for categories. When called with a category, it grabs every file within it and puts the parsed contents into an array indexed by the time the file was created. (Remember, the file was created from the item: if the item still existed when we read the feed again, it wasn’t overwritten, because there was already a file with that checksum. There is also some fiddling to make sure that two items that arrived in the same second don’t overwrite each other.) Said array is then displayed in a neat fashion.
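The real view.php is PHP, but the same idea can be sketched in Python to keep the examples in one language. Everything here is illustrative: the function names are made up, modification time stands in for creation time (reasonable only because the files are never rewritten), and bumping the key by one second is just one possible form of the “fiddling” mentioned above.

```python
import os


def parse_item(text):
    """Turn 'Key: value' header lines, a blank line, and a body into a dict."""
    head, _, body = text.partition("\n\n")
    fields = {}
    for line in head.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    fields["Body"] = body.strip()
    return fields


def load_items_by_time(directory):
    """Return (timestamp, path) pairs for a category, newest first."""
    items = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        key = int(os.path.getmtime(path))
        # Two items saved in the same second would share a key and the
        # second would replace the first, so move on to the next free key.
        while key in items:
            key += 1
        items[key] = path
    return sorted(items.items(), reverse=True)
```

Each path in the result can then be read and passed to parse_item before being displayed.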

It needs polishing, but it means I now have an LJ-friends-page-style list of every weblog and journal I read that has a feed, allowing me to stay up to date from my own browser :-)

The demo (which is my current feed setup) is here until I move it somewhere nicer :)


Nicholas 'Aquarion' Avenell is a web developer in London; you can find out more about him or how to get in touch.

© 2000 to 2008 inclusive Nicholas Avenell