Aquaintances Next
Aquaintances now works, but there are a few things it doesn't handle well that I'd like it to handle better.
- Edits
- A downside of storing articles under a checksum of their content is that if an article is edited, it immediately moves to the top of the list. I should probably checksum some other detail, like the GUID if supplied, or the link, but then I wouldn't detect edits at all. I could generate a checksum of the GUID or link (not the title: some feeds don't supply titles, or their titles aren't unique), save the article under that, generate the file, check its checksum against the checksum of the existing file (if it exists), and then magically highlight changes. Hmm.
- RSS Parser
- Either edit "Mark's":diveintomark.org/projects/rss_parser/ to return all the stuff I want or make my own.
- Categories
- This is where it gets complicated. I want to map other people's categories onto Epistula's category set, meaning I could associate "Nick's Virtual Culture":http://www.frejol.org/archives.live?category=VirtualCulture category and "BB's Metablogging":http://weblog.burningbird.net/fires/cat_metablogging.htm category with my own "Metablog":http://www.aquarionics.com/category/Metablog category. This would let me start on the "crossreferencing theory":http://www.aquarionics.com/journal/id/1059 I was talking about, and would also mean that if I was in a hurry, I could just see posts on the stuff I'm really interested in.
- News Feed
- With a little work, I could make it so that I could put a symlink in my newsserver's directory and read all my feeds by NNTP. That's one of the nice things about the flat file format :-). With a little more work, I could track those weblogs which allow comments sent as email, or have comment feeds per entry, and turn them into threads on the server.
- 2003-05-09 11:34:40
- Updated 6 weeks later
- By Aquarion
- Filed under Aquaintances
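The edit-detection idea from the Edits item above could look roughly like this. It is only a sketch under assumptions: the function names are made up for illustration, a plain dict stands in for the flat files, and `difflib` is just one possible way of doing the "magically highlight changes" part.

```python
import difflib
import hashlib


def article_key(guid, link):
    """Checksum the GUID if the feed supplies one, otherwise the link."""
    identity = guid or link
    return hashlib.md5(identity.encode("utf-8")).hexdigest()


def content_checksum(body):
    """Checksum of the article text, used only to spot edits."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


def store_article(store, guid, link, body):
    """Save an article under its identity key.

    Returns (status, changed_lines), where status is "new",
    "unchanged" or "edited". `store` is a dict standing in for
    the flat files (an assumption made for this sketch).
    """
    key = article_key(guid, link)
    old_body = store.get(key)
    if old_body is None:
        store[key] = body
        return "new", []
    if content_checksum(old_body) == content_checksum(body):
        return "unchanged", []
    # Keep only the removed ("- ") and added ("+ ") lines as the highlights.
    changed = [
        line
        for line in difflib.ndiff(old_body.splitlines(), body.splitlines())
        if line.startswith(("- ", "+ "))
    ]
    store[key] = body
    return "edited", changed


store = {}
link = "http://example.org/post/1"  # hypothetical article link
print(store_article(store, None, link, "one\ntwo"))
print(store_article(store, None, link, "one\nthree"))
```

Because the key comes from the GUID or link rather than the content, an edited article keeps its place in the list, and the content checksum is only used to decide whether to flag it.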
Stuart Langridge:
Contrary (my homebrew aggregator) stores articles in separate files named weblog-name/md5-hash-of-link-to-article. When we get an updated RSS feed (you can tell it’s updated through the ETag thing and Last-Modified), we try and save all articles in it. If we hit one that already exists, then we compare an md5 hash of the saved article’s body with an md5 hash of the body from the RSS, and if different, overwrite the saved article on the filesystem. When we read an article, we add a “Read: yes” header to the saved article (mine stores in ESF-a-like too; that was a brilliant idea of yours), and overwriting it obviously removes that, which means it’s marked as unread again. We also order articles in the display by date-last-modified on the filesystem, so new ones (and changed ones that have overwritten old copies) jump to the top.
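For what it's worth, the scheme described in this comment can be sketched in a few lines of Python. This is not Contrary's actual code: the function names, the header layout (headers, a blank line, then the body) and the choice to keep the old modification time when marking an article read are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def article_path(root, weblog, link):
    """weblog-name/md5-hash-of-link-to-article, as in the comment."""
    return Path(root) / weblog / md5_hex(link)


def split_article(raw):
    """Split a saved article into (headers, body) at the first blank line."""
    head, _, body = raw.partition("\n\n")
    headers = dict(line.split(": ", 1) for line in head.splitlines() if ": " in line)
    return headers, body


def save_article(root, weblog, link, body):
    """Write the article unless an identical body is already saved.

    Returns True if a file was written (new or changed article).
    """
    path = article_path(root, weblog, link)
    if path.exists():
        _, saved_body = split_article(path.read_text(encoding="utf-8"))
        if md5_hex(saved_body) == md5_hex(body):
            return False
    path.parent.mkdir(parents=True, exist_ok=True)
    # Overwriting drops any "Read: yes" header, so the article is unread again.
    path.write_text(f"Link: {link}\n\n{body}", encoding="utf-8")
    return True


def mark_read(root, weblog, link):
    """Add a "Read: yes" header to a saved article."""
    path = article_path(root, weblog, link)
    stat = path.stat()
    headers, body = split_article(path.read_text(encoding="utf-8"))
    headers["Read"] = "yes"
    head = "\n".join(f"{name}: {value}" for name, value in headers.items())
    path.write_text(f"{head}\n\n{body}", encoding="utf-8")
    # Assumption: restore the old timestamps so reading doesn't reorder the list.
    os.utime(path, (stat.st_atime, stat.st_mtime))


def is_read(root, weblog, link):
    headers, _ = split_article(article_path(root, weblog, link).read_text(encoding="utf-8"))
    return headers.get("Read") == "yes"


def newest_first(root, weblog):
    """Saved articles ordered by date-last-modified, most recent first."""
    files = (Path(root) / weblog).iterdir()
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


with tempfile.TemporaryDirectory() as root:
    link = "http://example.org/post/1"  # hypothetical article link
    save_article(root, "example", link, "Hello")
    mark_read(root, "example", link)
    save_article(root, "example", link, "Hello, edited")
    print(is_read(root, "example", link))
```

The ETag and Last-Modified check on the feed itself is left out here; the sketch only covers what happens once a fresh copy of the feed has been fetched.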