Categories
Imported From Epistula web development XML

ESF's second birthday

In the great and powerful world of weblogs, anything older than a week, that has vanished into the archives, is dead, gone, and will never be seen again.

Well, almost. ESF appears to have reappeared on people’s radar, and since today is exactly two years (and one month, damn) to the day that I released the spec, I thought it might be time for a little retrospective on why it existed, why it still exists, and where it went.

Well, like a small child with a paintbrush, it went everywhere. Plugins and templates exist for almost every major weblogging tool (Including an MT plugin just to create the required date format) and an increasingly scary number of minor ones. There’s even a feed reader for it (Which has, irritatingly, “extended” the format to allow a text description, which is somewhat against the spirit of the format). Oh, and a CPAN module to create and read it. I’m absolutely freaking amazed by all of this. I created the format for two main reasons:

  1. I was annoyed at the syndication wars.
  2. Epistula needed it.

    The second was the actual reason for all this. I wanted a basic format to include a list of the last x items of a section without hitting the database each time. I wanted to use an existing feed format for that, but really didn’t want to touch XML parsing with a sixty-foot pole at that point in the system. Because I was – and am – a *nix Admin, the most natural format for me to put this in was something approaching the classic news/mail format, which has passed data between systems for decades without needing to involve XML. I swapped the colon-separated format of that for a tab-based format, mostly because anyone using colons in a title field can be forgiven, but anyone using tabs has larger problems already. Hash marks marking non-parsed items are traditional, and after that it really just built itself.

    The two technical decisions it comes under fire most often for are that it sets the mime-type to text/plain and that it uses Epoch time format, both of which I’d probably do differently if I were to write ESF mk2. The mime-type was chosen because it really *is* just a text document, and can be read as such. Also, I’m not sure creating a new mime-type for a tin pot format is at all responsible, and it was never really meant to go as far as it did.

    The date is less excusable. When I was doing background reading for all this I saw that for every method of displaying the time, there were three or four variations to be detected and accounted for (From the case of the time/date delimiter to the order of the pieces), so I fell back to the one format I felt was most common to all languages, Unix’s default Epoch time. Of course, this doesn’t allow for any kind of time zoning and isn’t actually supported by MT, so in future I’ll stick to the ISO standard (And indeed for Aquaintances Feed Instances – something of a natural successor to ESF, though it never got released – which was a mail/news based format for single articles, I used the ISO standard).
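To make the trade-off concrete, here is a small Python sketch of the same instant written both ways. The helper name `epoch_to_iso8601` and its `utc_offset_hours` parameter are illustrative only; nothing here comes from the ESF spec. It shows that the Epoch value carries no zone information, while the ISO 8601 form states its offset explicitly.

```python
from datetime import datetime, timedelta, timezone


def epoch_to_iso8601(epoch_seconds, utc_offset_hours=0):
    """Render a Unix Epoch timestamp as an ISO 8601 string with an explicit offset.

    Hypothetical helper for illustration; the offset has to be supplied
    separately because the Epoch number itself cannot carry it.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch_seconds, tz).isoformat()


# The same instant, first in UTC and then as seen from UTC+1.
print(epoch_to_iso8601(1000000000))     # 2001-09-09T01:46:40+00:00
print(epoch_to_iso8601(1000000000, 1))  # 2001-09-09T02:46:40+01:00
```

Going the other way, `datetime.fromisoformat(...)` followed by `.timestamp()` recovers the original Epoch value, so nothing is lost by publishing the ISO form.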

    So it’s this first, this hatred of the XML based format wars, that got Epistula published. I fully accept that anger is an incredibly bad reason to put a new specification into the wild, and is the fountain of fuckwittery from which a number of the recent syndication debacles have spewed forth. But this was September 2002, when RSS 0.92, RSS 1.0 and various variations were appearing, all incompatible, all increasingly difficult to parse (And I really don’t like XML modules), and I didn’t – and don’t – want full content feeds. So I created a brand new format with thin slivers of metadata that shouldn’t ever break the bandwidth-bank, wasn’t ever going to change (Scout’s Honour) and, above all, could be parsed with a regular expression or two.
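As a rough illustration of the "regular expression or two" claim, here is a minimal Python sketch. The field layout it assumes (optional `key<TAB>value` header lines, then `epoch<TAB>title<TAB>link` item lines, with `#` lines skipped) is inferred from the description in this entry and not quoted from the ESF spec. The name `parse_esf`, the sample feed and its URLs are all made up.

```python
import re

# Assumed layout, not taken from the spec:
#   '#' lines are comments, "key<TAB>value" lines are feed metadata,
#   "epoch<TAB>title<TAB>link" lines are items.
ITEM_RE = re.compile(r"^(\d+)\t([^\t]*)\t(\S+)$")
META_RE = re.compile(r"^([^\t#]+)\t(.+)$")


def parse_esf(text):
    """Return (metadata dict, list of item dicts) for an ESF-like document."""
    meta, items = {}, []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and hash-marked lines are not parsed
        m = ITEM_RE.match(line)
        if m:
            items.append(
                {"time": int(m.group(1)), "title": m.group(2), "link": m.group(3)}
            )
            continue
        m = META_RE.match(line)
        if m:
            meta[m.group(1)] = m.group(2)
    return meta, items


SAMPLE = (
    "# made-up example feed\n"
    "title\tExample Weblog\n"
    "link\thttp://example.org/\n"
    "\n"
    "1032000000\tFirst: a title with a colon\thttp://example.org/1\n"
    "1032086400\tSecond post\thttp://example.org/2\n"
)

meta, items = parse_esf(SAMPLE)
print(meta["title"], len(items))
```

Note that the colon in the first title needs no escaping, which is the reason given above for preferring tabs as the separator.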

    The problems haven’t gone away. Bloglines’ Web Services Thingy is helping to solve the bandwidth problem, but the more I watch Atom’s development, the more it worries me as it gets more and more complicated, and picks up more and more things that feeds will carry just because one day something might come along to support them.

    ESF is the simplest thing that could possibly work, and that’s why it exists.

Categories
Imported From Epistula web development XML

Atom Comments

Looking up your hostname…
Got your hostname.
Welcome to the Internet Relay Network Aquarion
Your host is excalibur.esper.net running tiamat-1.0(04).ylist.hfix via dircproxy 1.0.5
This proxy has been running since Tue, 24 Aug 2004 09:41:55 +0100
nickserv
identify ****
Aquarion sets mode +i Aquarion
Now talking on #eddings
Topic for #eddings is: ‘I am A PRODUSER.’
Topic for #eddings set by itagne at Wed Aug 25 08:50:42 2004
Mandy
[10:29] yeah
Mandy
[10:29] occasionally I’ll log into the server via the website and delete the crap manually
Senji [10:41] continues to think that feeds should include magic information on how to comment on them, rather than you having to go to the actual entry’s page to comment.
Mandy
[10:46] it would be useful, yes
gilmae
[10:46] I’m betting that will be Atom’s Killer Feature
-NickServ-
Password accepted – you are now recognized.
(You Connected)
Aquarion
Er, no. I think that’s a really sucky idea
gilmae
almost everything required to do it already exists in Atom, and the last step has been mooted
gilmae
why?
Aquarion
Mostly because it dictates what I store (and can store) about a person
Senji
Aquarion – why?
Senji
Aquarion – the feature I want is basically a URL.
Aquarion
That already exists in RSS.
gilmae
what else would you demand of the user?
Aquarion
It’s what the <comments /> element is for.
Senji
<comments />– no comments? 🙂
Senji
So it does… 🙂
Aquarion
gilmae, For any Epistula entry, I store various things from the location from where I posted it, date, time. I could include mood, current music playing, a whole host of things.
gilmae
for comments?
Aquarion
I could also do that with comments
gilmae
you could
Senji
Aquarion’s comments have lots of tickyboxes.
Aquarion
The point is that any standardised comments interface wouldn’t let me.
Aquarion
And there’s that too. It’s why any given API cannot do everything, and the Atom API won’t ever work fully for all weblogs
gilmae
The comments over Atom would just be a way for commenting for people who don’t want to go to your site though, in the same way that the feed is for people to read your posts without going there
gilmae
people who want the frilly bits are going to read the site, and comment there
Aquarion
They can fuck off. If they can’t be bothered to go to my site, I don’t need their opinion.
Senji also wishes that people wouldn’t use javascript popups for comment interfaces…
gilmae
It’s been a long time since *you* were on dial up, eh, Aqn
gilmae
frankly, when I was on dialup, I didn’t have the time to sit around for people’s designs to load up
Senji
gil – people’s designs were too heavyweight then 😛
gilmae
I wanted to read what they said, not see the same design I saw yesterday
Aquarion
Because if it’s infinitely extendable, then no one client will support everything, and if it isn’t, then I can’t store everything I want.
gilmae
obviously, Aqn excepted cause he changes his header image often :- )
Aquarion
gilmae: It’s not the design. It’s the validation, the optional info (And different sites having different optional bits)
Mandy
Senji – I used to have javascript popups, but then I changed to a better blogging system. :-p
gilmae
Senji, I read about two dozen feeds daily, about another three dozen weekly…even light designs start to add up over that 56K link
Aquarion has about 180 RSS feeds he checks at the moment
gilmae
Aqn: validation? you can still do that over Atom
Aquarion
But I tend not to read them in the aggregator. If I want to read them, I go to the site they came from.
Aquarion
gilmae: Last I saw it, all atom’s validation was post effect. I can’t say “these you need”, and I still can’t include my tickyboxes
gilmae shrugs…behaviour differs obviously
Aquarion
I really don’t like the homogeneousness that reading everything through an aggregator (or a f/list, for that matter) dictates
Aquarion
I spend a while with my site making it look readable, and having it parsed through a sucky interface negates that
gilmae
don’t get me wrong, I’m really trying not to be personal, but that is so arrogant
Aquarion
Yes
gilmae
it’s a pretty short step to only allowing people to email you if they do it through the client you give them
Aquarion
I’d disagree with that.
Aquarion
My biggest problem with the comment-by-atom thing, and the reason why nothing I will ever write will use it, is that it’s a gaping hole with a large sign pointing into it saying “Free google-rank! Point your spambot here!”
gilmae
the only possible response I can give to that is awful, cause it is “It will all be better in the future”ism
Aquarion
When we have jetpacks and a base on the moon, I shall possibly rethink my position
gilmae
but the spec is pretty raw, and I just can’t see it going through without the ability to return response codes to indicate that an entry doesn’t validate, nor can I see it going through without allowing the author to close off comments, either permanently or on a spam by spam basis
Aquarion
Yes, but that just means we have to run spamassassin on all our comments
Aquarion
And I’d rather not have another pile of “possibly spam” to sort through every week
gilmae
I can’t see why something like mt-blacklist can’t be used for Atom comments
Aquarion
Me neither, but I don’t see blocking spam as preferable to having the hole in the first place
Aquarion considers pasting this conversation into an entry
gilmae
actually, that’s very well done, Aqn
gilmae
you just came up with The Perfect Reason for people to write their own blogging engines
gilmae
“Well, I wrote my own so that my commenting system isn’t Movable Type’s”
Aquarion
Yeah, it’s a side benefit 🙂

(Slightly later)

gilmae
shall i make a Perfect World statement? :- )
Aquarion
You can, I might even append it to the entry I just posted 🙂
gilmae
In A Perfect World, and I accept that this is the same world as the one with Mr Fusion-powered-DeLoreans, you (Aqn) would be able to publish your schema/DTD/whatever of your commenting-frilly-bits, and the Atom client would be able to use discovery to see that you support this functionality and the schema would tell it how to support it
Aquarion
That is indeed a perfect world
Aquarion
Actually X-Forms would solve some of that.
Aquarion
Include an X-Form for the comment – which has validation information – for each entry
Aquarion
And, while we’re at it, I want a pony and a castle.
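
For reference, the `<comments>` element mentioned early in the conversation is an optional child of an RSS 2.0 `<item>` holding the URL of that entry's comments page. A minimal Python sketch of reading it follows; the feed text, the URLs and the function name `comment_links` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample feed; the example.org URLs are placeholders.
RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>Atom Comments</title>
<link>http://example.org/entry/1</link>
<comments>http://example.org/entry/1#comments</comments></item>
<item><title>No comments here</title>
<link>http://example.org/entry/2</link></item>
</channel></rss>"""


def comment_links(rss_text):
    """Map each item's link to its comments URL (None when the element is absent)."""
    root = ET.fromstring(rss_text)
    return {
        item.findtext("link"): item.findtext("comments")
        for item in root.iter("item")
    }


for link, comments in comment_links(RSS).items():
    print(link, "->", comments)
```

This only gives a reader somewhere to go; what the comment form at that URL asks for stays entirely up to the site.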
Categories
Imported From Epistula Python XML

Accomplish

Today’s accomplishment:

XML + XSLT + Python = HTML

(I should point out that the above has required my relearning Python from scratch – previous efforts have been basically PHP in Python – learning XSLT, and learning how mod_python works. Lots of work for so little gain)

Categories
aqcom Aquaintances epistula Imported From Epistula intertwingularity XML

Updates

Aquaintances now exports a valid OPML file.

This was far more work than it needed to be, because I have been unable to find a reference for a valid OPML file anywhere. The OPML files from blo.gs got imported by Dave’s Wonderful New Toy as “0 feeds added”, which is odd, because they were in exactly the same format as Dave’s old Blogroll before he redesigned Scripting.com. Grr.
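
For anyone hunting for the same reference, here is a minimal Python sketch of a subscription-list style OPML export. It uses the common `outline` attributes `text`, `type="rss"`, `xmlUrl` and `htmlUrl`. The `version="1.0"` value, the function name `build_opml` and the feed data are assumptions for illustration, and this is not necessarily what Aquaintances itself emits.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build a minimal OPML subscription list and return it as a string."""
    opml = ET.Element("opml", {"version": "1.0"})  # version is an assumption
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for feed in feeds:
        ET.SubElement(
            body,
            "outline",
            {
                "text": feed["name"],
                "type": "rss",
                "xmlUrl": feed["feed_url"],
                "htmlUrl": feed["site_url"],
            },
        )
    return ET.tostring(opml, encoding="unicode")


# Placeholder feed data for the example.
FEEDS = [
    {
        "name": "Example Weblog",
        "feed_url": "http://example.org/index.xml",
        "site_url": "http://example.org/",
    },
]

print(build_opml("My subscriptions", FEEDS))
```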

Paul? Does this lower my Winer Scorecard number?

Oh, yeah, the other thing I’ve done today.

Banners are sticky. That is, it seems a shame to lose all these nice banners I spend ages making, so they now stick to the archive. If I can find my archive of all the ones I did last time I did the rotating banner thing, I’ll put those up too, but right now it’s just this week’s and January’s.

And yes, I’ll explain the “Frowny Lightbulb” thing soon. Promise.

Categories
Imported From Epistula XML

Repeat

Look, if I say something I can see it being ignored, but when Dorathea who has both a better knowledge of all this and A-List status says it, and people ignore it also, it’s just silly. So, what do we think will happen if other people start with the same sentiments?

Well, they’ll be ignored still. Please don’t bastardize the XML forms. I mean, make it RDF if you want to, make it its own vocab and DTD if that is what you feel, but please, please for the sake of my sanity as a coder, don’t allow people to randomly mix up any vocabularies they think they might like! No piece of software can nor should have to parse any random form of XML, be it past, present or future. The ability to add cool things is all very well, and I can even live with “Hang all backwards compatibility”, but the current temptation to allow anything in any documents means that in order to process any single document I’m either going to have to work with some kind of Uberparser which takes XML into its wide gaping maw, spends a week digesting it and shits it out as something I can actually understand; or I’m simply going to have to ignore any elements I don’t understand (Which could be categories, authors, extras or equally the content itself).

I have a complex RSS2 feed; it contains data from RSS 0.9* all the way up to the latest and greatest in RSS modula, from the number of comments to the trackback URL. If there is a reader out there which understands even 90% of the tags in it, I’ll be shocked and stunned.

I’m in favour of loosely defined data formats for internal use, for passing between known systems. But for a defined internet specification to rival SMTP, HTTP, NNTP et al. (and that’s what Atom/PIE/WOX could be if it works) you simply can’t give it this wide a canvas; it’s simply too much for parsing it on a major scale to be economical.

Categories
Imported From Epistula intertwingularity XML

Living in Syn

Hot topic within the geekoblogsphere this month is – in reverse order – the WOX project and WinerWatch.

I’m going to ignore WinerWatch (which is password protected now).

The WOX project is also known as “PIE” or “nECHO”, but I like “WOX”, to stand for “Weblogs over XML”. Eventually they’ll think of a better name and a permanent one; ‘till then I’ll call it WOX.

The project, whatever its name, is really simple at its heart. They are trying to define an XML format for weblogs. Problem is they are making a number of mistakes, and because I don’t trust Wikism they’ll never know I think that. (I was involved in Everything2, one of the first wiki-likes, and then went away for three months. In that time the mood of the site and general consensus was changed, and half my work was deleted. I’m now extremely wary of putting anything into that kind of public editing process) so you get this rant instead 🙂

When I wrote XML is the new black I meant it. All-things-to-all-people will be the death of XML. If you look at RSS2 you can see exactly why Dave Winer doesn’t like Funky Feeds (which, by careful calculation, seems to mean “Anything that uses name spaces”), but his reasoning is different to mine.

My point, and the reason I created ESF last year, is that when you are sending out a version of your site that’ll be collected once every hour or so by anyone who is even vaguely interested in what you say, you want to keep the amount of bandwidth that is being taken up by that feed to an absolute minimum. To a site like Aqcom where most of my visitors are normal browsers this isn’t much of an issue, but for people like Mark or Stuart where a large percentage of their readership browses with aggregators (Last time I saw Kryogenix’s stats (Which were updated in April on the page I found) his XML-feed count was twice his home-page hit-count. RSS Readers account for 1.58% of my readership (IE 49.74%, Moz 22.7%)) this is a bandwidth-breaker. It’s the reason Mark only puts excerpts in his feeds. If you feed your entire site, including meta-data, I can’t help thinking you’re giving too much away.

Syndication means feeding your content out so other people can use it. The current model includes facilities for extending the feed infinitely using name spaces (meaning you can include foaf, ent, dc or whatever data you want in your feed) which seems like a neat idea, until you have to support it. Do you know how many XML specifications there are for categories? DC has one, ENT _is_ one, WOX itself has a proposed “metadata” tag for this kind of thing. How is an aggregator meant to be able to tell what it is? The problem with name-spaced XML is that in order to display a page correctly, you have to understand each and every tin-pot format the creator has used, meaning it’s ideal in an enclosed environment where somebody somewhere defines what name spaces the document uses, but loose on the Internet it means that any given aggregator has to keep track of hundreds of specifications if it wants to get all the information it can out of the feed, not to mention the problems of people who pollute the given name of a – and I use this phrase in the loosest possible sense – standard. On top of all this metadata for the entry, you are now putting in metadata for the feed itself, meaning that for every element of data you include, you have to explain it, further bloating the feed.
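
A small Python sketch of what that means for an aggregator. The feed below is hand-written, the `x:` name space and its URL are hypothetical, and `categories` is an illustrative function name. The reader only recovers categories from the vocabularies it was explicitly taught, and everything else falls on the floor.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # the Dublin Core elements name space

# Hand-written feed; the "x" vocabulary is invented for this example.
FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:x="http://example.org/hypothetical-categories/">
  <channel>
    <title>Example</title>
    <item>
      <title>One entry, three ways to categorise it</title>
      <category>syndication</category>
      <dc:subject>xml</dc:subject>
      <x:topic>weblogs</x:topic>
    </item>
  </channel>
</rss>"""

# Every vocabulary the reader understands needs its own entry here.
KNOWN_CATEGORY_TAGS = {"category", "{%s}subject" % DC}


def categories(item):
    """Return (categories understood, name-spaced tags that were skipped)."""
    found, ignored = [], []
    for child in item:
        if child.tag in KNOWN_CATEGORY_TAGS:
            found.append(child.text)
        elif child.tag.startswith("{"):
            ignored.append(child.tag)  # unknown name space: cannot interpret it
    return found, ignored


item = ET.fromstring(FEED).find("channel/item")
found, ignored = categories(item)
print("understood:", found)
print("skipped:", ignored)
```

The third category is lost unless someone adds yet another entry to `KNOWN_CATEGORY_TAGS`, which is the bookkeeping problem described above multiplied by every vocabulary in circulation.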

This is why I think WOX is making large mistakes. Also, I disagree with the decision that trackbacks and pingbacks are comments, and have to be treated as such, when I don’t.

Categories
Imported From Epistula windows XML

Office XML to be real XML after all

I got an interesting email today.

It being Saturday, this is a rare thing. Well, it’s a rare thing anyway, my email has been 90% spam since January, but this was interesting in a sort of XML-type way, so you get to hear it too.

Last week, Microsoft put a new newsgroup on its public NNTP server (news:msnews.microsoft.com) called microsoft.public.office.xml, which immediately caught my interest. A little while ago, there was some question over whether MS was going to back down on the “Office Will Do XML” stuff, so I asked. I was answered. This is what they said:

From: "Cybarber"

Hi,
There has been a lot of talk about Microsoft scaling down on XML in the
versions of Office 2003.

Hopefully this rumour is unfounded and XML functionality will be in the
Standard edition as well.

Is there any clarity about this subject already?

From: "Joe_MSFT"

In short, yes, the Professional version will have additional XML capabilities
from the other editions, primarily those based on how customer-defined XML
schema can be used. This was done because the product editions have been
designed to meet the needs of different audiences. A brief overview is
here:
http://www.microsoft.com/presspass/newsroom/office/factsheets/OfficeSKUFS.asp
We will be publishing more details in the coming months.

Can you use XML with all 2003 versions – yes. They will all continue to
have XML Web services support through the Web services toolkit and Word and
Excel will be able to be saved in their respective native XML file formats
that allow for content reuse, transformation, construction and such.

--
Joe Andreshak, Microsoft Product Manager
This post is provided "AS IS" with no warranties, and confers no rights
Sample code subject to http://www.microsoft.com/info/cpyright.htm

From: Aquarion

Does this mean that any given Office 2003 file in its native XML format
– from any version of Office – can be transformed by, say, an XSLT
stylesheet?

It’s possible to remain within XML specs, yet having the main content in
a binary format, and I’ve heard rumours that this is the case.

From: "Joe_MSFT"
I’m not sure I understand what you’re asking, but let me explain.

The binary file format is separate from the XML file format. With
Office 2003, users can choose to save Word and Excel files in EITHER the
binary or XML file format. The binary file formats are similar to what
has existed for the previous versions.

The Word XML format is new for Office 2003. Previous versions of Office
will not know how to interpret the XML and simply open it as a plain
text file. There is no straightforward way to convert old binary files
to XML files, however you could open an old binary file in Word 2003,
for instance, and now save it as XML. This XML file is now usable by
any program that can interpret XML tags.

Hope this helps,
Joe

So it looks like all versions will save as XML, but only the high-price versions will have all the XML modification/DTD stuff in it.