Aquarionics


The Extensible Markup Language

Saturday 31st May 2003

Office XML to be real XML after all

I got an interesting email today.

It being Saturday, this is a rare thing. Well, it's a rare thing anyway, as my email has been 90% spam since January, but this was interesting in a sort of XML-type way, so you get to hear it too.

Last week, Microsoft put a new newsgroup on its public NNTP server (news:msnews.microsoft.com) called microsoft.public.office.xml, which immediately caught my interest. A little while ago, there was some question over whether MS was going to back down on the "Office Will Do XML" stuff, so I asked. I was answered. This is what they said:

From: "Cybarber"

Hi,
There has been a lot of talk about Microsoft scaling down on XML in the versions of Office 2003.

Hopefully this rumour is unfounded and XML functionality will be in the Standard edition as well.

Is there any clarity about this subject already?

From: "Joe_MSFT"

In short, yes, the Professional version will have additional XML capabilities beyond the other editions, primarily those based on how customer-defined XML schema can be used. This was done because the product editions have been designed to meet the needs of different audiences. A brief overview is here: http://www.microsoft.com/presspass/newsroom/office/factsheets/OfficeSKUFS.asp We will be publishing more details in the coming months.

Can you use XML with all 2003 versions - yes. They will all continue to have XML Web services support through the Web services toolkit and Word and Excel will be able to be saved in their respective native XML file formats that allow for content reuse, transformation, construction and such.

--
Joe Andreshak, Microsoft Product Manager
This post is provided "AS IS" with no warranties, and confers no rights
Sample code subject to http://www.microsoft.com/info/cpyright.htm

From: Aquarion

Does this mean that any given Office 2003 file in its native XML format - from any version of Office - can be transformed by, say, an XSLT stylesheet?

It's possible to remain within the XML specs yet have the main content in a binary format, and I've heard rumours that this is the case.

From: "Joe_MSFT"

I'm not sure I understand what you're asking, but let me explain.

The binary file format is separate from the XML file format. With Office 2003, users can choose to save Word and Excel files in EITHER the binary or XML file format. The binary file formats are similar to what has existed for the previous versions.

The Word XML format is new for Office 2003. Previous versions of Office will not know how to interpret the XML and simply open it as a plain text file. There is no straightforward way to convert old binary files to XML files, however you could open an old binary file in Word 2003, for instance, and now save it as XML. This XML file is now usable by any program that can interpret XML tags.

Hope this helps,
Joe

So it looks like all versions will save as XML, but only the high-price versions will have all the XML modification/DTD stuff in them.
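To make "usable by any program that can interpret XML tags" concrete, here is a minimal Python sketch that pulls the plain text out of a Word-style XML document using only the standard library. The sample document is hand-written for illustration, and the namespace URI is an assumption about the Word 2003 format that should be checked against a real saved file.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Word 2003's XML format; verify against a real file.
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written stand-in for a document saved as XML (not real Word output).
SAMPLE = f"""<?xml version="1.0"?>
<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p>
    <w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""


def paragraphs(xml_text):
    """Return the plain text of each paragraph, ignoring all formatting."""
    root = ET.fromstring(xml_text)
    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is made of runs; each run carries text in w:t elements.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        result.append("".join(runs))
    return result


if __name__ == "__main__":
    for line in paragraphs(SAMPLE):
        print(line)
```

If the main content were stored as an opaque binary blob inside the XML, a loop like this would find tags but no readable text, which is the case the question to Joe was trying to rule out.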

Those who spoke on this:


Colin Watson:

2003-06-01 00:10 7 hrs after the Original Article

Unfortunately that last reply reads to me as “product manager didn’t understand your question but has been told that XML is good because it’s interoperable, so repeated that”; I’m not sure I’d start celebrating based on that reply alone. It would be nice if I turned out to be wrong …


Aquarion:

2003-06-01 08:24 8 hrs after Colin Watson

Could be, but he is showing clue in other threads. The key point was "This XML file is now usable by any program that can interpret XML tags.", which I’d say means you can at least get at the tags within the document and get at the text.


A Nameless One:

2003-06-01 10:15 17 hrs after the Original Article

It sounds to me that the default saving format will be the binary one, and not the xml one – so it won’t help interoperability at all – everyone will just use the default format, and telling others to “save as xml” will be just like telling them to “save as rtf” – not very friendly.


MP:

2003-06-01 20:04 10 hrs after parent

I think it’s friendly. Well, I do it to people sending me files, and will quite happily email a standard explanation of why and how to do it to them.
Most of the people I deal with, at least, are quite willing to do this…



Saturday 12th July 2003

Living in Syn

Hot topic within the geekoblogsphere this month is – in reverse order – the WOX project and WinerWatch.

I’m going to ignore WinerWatch (which is password protected now).

The WOX project is also known as “PIE” or “nECHO”, but I like “WOX”, to stand for “Weblogs over XML”. Eventually they’ll think of a better and more permanent name; till then I’ll call it WOX.

The project, whatever its name, is really simple at its heart. They are trying to define an XML format for weblogs. Problem is they are making a number of mistakes, and because I don’t trust Wikism they’ll never know I think that. (I was involved in Everything2, one of the first wiki-likes, and then went away for three months. In that time the mood of the site and the general consensus changed, and half my work was deleted. I’m now extremely wary of putting anything into that kind of public editing process.) So you get this rant instead :-)

When I wrote XML is the new black I meant it. All-things-to-all-people will be the death of XML. If you look at RSS2 you can see exactly why Dave Winer doesn’t like Funky Feeds (which a careful calculation suggests means “anything that uses namespaces”), but his reasoning is different to mine.

My point, and the reason I created ESF last year, is that when you are sending out a version of your site that will be collected once every hour or so by anyone who is even vaguely interested in what you say, you want to keep the bandwidth taken up by that feed to an absolute minimum. For a site like Aqcom, where most of my visitors use normal browsers, this isn’t much of an issue: RSS readers account for 1.58% of my readership (IE 49.74%, Moz 22.7%). But for people like Mark or Stuart, where a large percentage of the readership browses with aggregators, it is a bandwidth-breaker. The last time I saw Kryogenix’s stats (which were updated in April on the page I found), his XML-feed count was twice his home-page hit count. It’s the reason Mark only puts excerpts in his feeds. If you feed your entire site, including metadata, I can’t help thinking you’re giving too much away.
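The bandwidth argument is simple multiplication, sketched below in Python. Every figure here is hypothetical (feed sizes, subscriber count, hourly polling), and the sum assumes each poll re-downloads the whole feed with no caching or conditional requests, so treat it as an upper bound rather than a measurement of any real site.

```python
def monthly_feed_bytes(feed_bytes, subscribers, polls_per_day=24, days=30):
    """Bytes served per month if every subscriber re-fetches the whole feed."""
    return feed_bytes * subscribers * polls_per_day * days


if __name__ == "__main__":
    # Hypothetical figures: a 60 KB full-content feed against a 2 KB
    # headline-only feed, each polled hourly by 500 aggregators.
    full = monthly_feed_bytes(feed_bytes=60_000, subscribers=500)
    slim = monthly_feed_bytes(feed_bytes=2_000, subscribers=500)
    print(f"full-content feed:  {full / 1e9:.2f} GB/month")
    print(f"headline-only feed: {slim / 1e9:.2f} GB/month")
```

The ratio between the two results is just the ratio of the feed sizes, which is why trimming the feed itself is the lever that matters when you cannot control how often people poll.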

Syndication means feeding your content out so other people can use it. The current model includes facilities for extending the feed infinitely using namespaces (meaning you can include foaf, ent, dc or whatever data you want in your feed), which seems like a neat idea until you have to support it. Do you know how many XML specifications there are for categories? DC has one, ENT _is_ one, and WOX itself has a proposed “metadata” tag for this kind of thing. How is an aggregator meant to be able to tell which one it is looking at? The problem with namespaced XML is that in order to display a page correctly, you have to understand each and every tin-pot format the creator has used. That makes it ideal in an enclosed environment, where somebody somewhere defines what namespaces the document uses, but loose on the Internet it means that any given aggregator has to keep track of hundreds of specifications if it wants to get all the information it can out of the feed. That is before you get to the problem of people who pollute the given name of a (and I use this phrase in the loosest possible sense) standard. On top of all this metadata for the entry, you are now putting in metadata for the feed itself, meaning that for every element of data you include, you have to explain it, further bloating the feed.

This is why I think WOX is making large mistakes. Also, I disagree with the decision that trackbacks and pingbacks are comments and have to be treated as such; I don’t treat them that way.
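A small Python sketch of the category problem, from the aggregator's side. The item below is invented for illustration: it carries one plain RSS category, one Dublin Core subject, and one element from a made-up vocabulary (`http://example.org/made-up-vocab` is hypothetical). The aggregator only understands the vocabularies someone bothered to teach it.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The only category-like elements this imaginary aggregator knows about.
KNOWN_CATEGORY_TAGS = {"category", f"{{{DC_NS}}}subject"}

# Invented feed item mixing three ways of saying "this is the category".
ITEM = f"""<item xmlns:dc="{DC_NS}" xmlns:x="http://example.org/made-up-vocab">
  <title>Living in Syn</title>
  <category>XML</category>
  <dc:subject>Syndication</dc:subject>
  <x:topic>Weblogs</x:topic>
</item>"""


def split_categories(item_xml):
    """Return (categories understood, namespaced tags silently ignored)."""
    root = ET.fromstring(item_xml)
    understood, ignored = [], []
    for child in root:
        if child.tag in KNOWN_CATEGORY_TAGS:
            understood.append(child.text)
        elif child.tag.startswith("{"):
            # Namespaced element from a vocabulary we were never taught.
            ignored.append(child.tag)
    return understood, ignored


if __name__ == "__main__":
    understood, ignored = split_categories(ITEM)
    print("understood:", understood)
    print("ignored:", ignored)
```

Nothing in the ignored element says it was a category; the aggregator cannot even tell what it missed, and every new vocabulary means another entry in `KNOWN_CATEGORY_TAGS`.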

Those who spoke on this:


beaneater:

2003-07-12 08:59 66 mins after the Original Article

Eek.

I’ve heard of the idea of trackback as comment, and there is sensible reasoning there. But pingback? I’ve not even heard a suggestion that those are comments.


gilmae:

2003-07-12 11:08 2 hrs after beaneater

I think it helps if you forget the word ‘Comment’ and think ‘Followup’. Comments, Trackbacks and Pingbacks are all just different followups.


beaneater:

2003-07-12 15:45 5 hrs after gilmae

I’m not so sure. I don’t think a pingback is as strong as that. It just signifies a reference, not an active followup. No?


Paul Freeman:

2003-07-12 20:18 9 hrs after gilmae

Would you consider an entry in a referral log to be a comment?


gilmae:

2003-07-14 02:17 1 day after Paul Freeman

I don’t think of pingbacks as being a comment, I think of them as both being ways in which a reader (essentially) says "Hey, I read what you said". It doesn’t even imply that the reader has something to add, because technically you don’t really need to add anything to the conversation when you track- or pingback, just link and ping. After pausing for a few minutes and thinking about it, I don’t really even need to add anything when I comment, I could just put whitespace in the comment body. I’d be a dickhead for doing so but I could. Additionally, Pingback is just glorified referral log searching. It merely automates the process, saving the pingee the hassle of actually having to trawl through their logs, and guaranteeing that referrals will be known rather than relying on someone clicking the referring link. So if you buy that Pingbacks are a form of followup (can you guess that I do?), then you are more or less bound to accept that referral log entries are also a followup. Of course, it is contingent on that Pingback==followup bit though.


Simon Willison:

2003-07-14 06:08 2 days after the Original Article

Actually, funky doesn’t mean “stuff that uses namespaces” – it has since been defined by Dave to mean “stuff that uses an element from a namespace in place of an element from the RSS 2.0 core specification when the core element would have been fine”. The classic example is using dc:date while leaving out pubDate. This complaint actually makes a lot more sense, and a massive amount of bother could have been avoided if only Dave had explained what on earth he was going on about sooner.


Aquarion:

2003-07-14 06:13 5 mins after Simon Willison

Yes, I saw that yesterday. I was about to blog about it, but got caught up in fixing the comments system :-)



Repeat

Look, if I say something I can see it being ignored, but when Dorathea who has both a better knowledge of all this and A-List status says it, and people ignore it also, it’s just silly. So, what do we think will happen if other people start with the same sentiments?

Well, they’ll be ignored still. Please don’t bastardize the XML forms. I mean, make it RDF if you want to, make it its own vocab and DTD if that is what you feel, but please, please, for the sake of my sanity as a coder, don’t allow people to randomly mix up any vocabularies they think they might like! No piece of software can, nor should have to, parse any random form of XML, be it past, present or future. The ability to add cool things is all very well, and I can even live with “Hang all backwards compatibility”, but the current temptation to allow anything in any document means that in order to process any single document I’m either going to have to work with some kind of Uberparser, which takes XML into its wide gaping maw, spends a week digesting it and shits it out as something I can actually understand; or I’m simply going to have to ignore any elements I don’t understand (which could be categories, authors, extras or equally the content itself).

I have a complex RSS2 feed. It contains data from RSS 0.9* all the way up to the latest and greatest in RSS modules, from the number of comments to the trackback URL. If there is a reader out there which understands even 90% of the tags in it, I’ll be shocked and stunned.

I’m in favour of loosely defined data formats for internal use, for passing between known systems. But for a defined internet specification to rival SMTP, HTTP, NNTP et al. (and that’s what Atom/PIE/WOX could be if it works) you simply can’t give it this wide a canvas. It’s simply too much for parsing on a major scale to be economical.


Updates

Aquaintances now exports a valid OPML file.

This was far more work than it needed to be, because I have been unable to find a reference for a valid OPML file anywhere. The OPML files from blo.gs got imported by Dave’s Wonderful New Toy as “0 feeds added”, which is odd, because they were in exactly the same format as Dave’s old Blogroll before he redesigned Scripting.com. Grr.
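For anyone hitting the same wall, here is a minimal Python sketch of writing a subscription list as OPML. The attribute names (`text`, `type`, `xmlUrl`, `htmlUrl`) and the `version="1.1"` value follow common usage as far as I know; they are assumptions, not a statement of what Aquaintances emits or of what any particular importer accepts. The feed entry is a placeholder.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Build an OPML subscription list.

    feeds is an iterable of (name, feed_url, site_url) tuples.
    Attribute names are assumed from common usage, not from a checked spec.
    """
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, feed_url, site_url in feeds:
        # One <outline> per feed; all the data lives in attributes.
        ET.SubElement(body, "outline", text=name, type="rss",
                      xmlUrl=feed_url, htmlUrl=site_url)
    return ET.tostring(opml, encoding="unicode")


if __name__ == "__main__":
    # Placeholder feed, purely for illustration.
    print(build_opml("My subscriptions", [
        ("Example weblog", "http://example.org/index.rss", "http://example.org/"),
    ]))
```

Building the tree with a library rather than string concatenation at least guarantees well-formed output with correctly escaped attribute values; whether a given tool then reports "0 feeds added" is a separate question.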

Paul? Does this lower my Winer Scorecard number?

Oh, yeah, the other thing I’ve done today.

Banners are sticky. That is, it seems a shame to lose all these nice banners I spent ages making, so they now stick to the archive. If I can find my archive of all the ones I did last time I did the rotating banner thing, I’ll put those up too, but right now it’s just this week’s and January’s.

And yes, I’ll explain the “Frowny Lightbulb” thing soon. Promise.

Those who spoke on this:


Paul:

2004-01-19 12:43 11 hrs after the Original Article

Err, don’t think so, that was so long ago that I’ve forgotten all about it. :)

Who was searching and found Burnt_Offerings then?
(n.b. it didn’t like your url because it was too long)

Line 3 word 7 is still a little long (Lines should be shorter than 80 chars)



Monday 10th May 2004

Accomplish

Today's accomplishment:

XML + XSLT + Python = HTML

(I should point out that the above has required my relearning Python from scratch – previous efforts have been basically PHP in Python – learning XSLT, and learning how mod_python works. Lots of work for so little gain)


Thursday 26th August 2004

Atom Comments

Looking up your hostname...
Got your hostname.
Welcome to the Internet Relay Network Aquarion
Your host is excalibur.esper.net running tiamat-1.0(04).ylist.hfix via dircproxy 1.0.5
This proxy has been running since Tue, 24 Aug 2004 09:41:55 +0100
nickserv
identify ****
Aquarion sets mode +i Aquarion
Now talking on #eddings
Topic for #eddings is: 'I am A PRODUSER.'
Topic for #eddings set by itagne at Wed Aug 25 08:50:42 2004
Mandy
[10:29] yeah
Mandy
[10:29] occasionally I'll log into the server via the website and delete the crap manually
Senji [10:41] continues to think that feeds should include magic information on how to comment on them, rather than you having to go to the actual entry's page to comment.
Mandy
[10:46] it would be useful, yes
gilmae
[10:46] I'm betting that will be Atom's Killer Feature
-NickServ-
Password accepted - you are now recognized.
(You Connected)
Aquarion
Er, no. I think that's a really sucky idea
gilmae
almost everything required to do it already exists in Atom, and the last step has been mooted
gilmae
why?
Aquarion
Mostly because it dictates what I store (and can store) about a person
Senji
Aquarion - why?
Senji
Aquarion - the feature I want is basically a URL.
Aquarion
That already exists in RSS.
gilmae
what else would you demand of the user?
Aquarion
It's what the <comments /> element is for.
Senji
<comments />-- no comments? :)
Senji
So it does... :)
Aquarion
gilmae, For any Epistula entry, I store various things from the location from where I posted it, date, time. I could include mood, current music playing, a whole host of things.
gilmae
for comments?
Aquarion
I could also do that with comments
gilmae
you could
Senji
Aquarion's comments have lots of tickyboxes.
Aquarion
The point is that any standardised comments interface wouldn't let me.
Aquarion
And there's that too. It's why any given API cannot do everything, and the Atom API won't ever work fully for all weblogs
gilmae
The comments over Atom would just be a way for commenting for people who don't want to go to your site though, in the same way that the feed is for people to read your posts without going there
gilmae
people who want the frilly bits are going to read the site, and comment there
Aquarion
They can fuck off. If they can't be bothered to go to my site, I don't need their opinion.
Senji also wishes that people wouldn't use javascript popups for comment interfaces...
gilmae
It's been a long time since *you* were on dial up, eh, Aqn
gilmae
frankly, when I was on dialup, I didn't have the time to sit around for people's designs to load up
Senji
gil - people's designs were too heavyweight then :-P
gilmae
I wanted to read what they said, not see the same design I saw yesterday
Aquarion
Because if it's infinitely extendable, then no one client will support everything, and if it isn't, then I can't store everything I want.
gilmae
obviously, Aqn excepted cause he changes his header image often :- )
Aquarion
gilmae: It's not the design. It's the validation, the optional info (And different sites having different optional bits)
Mandy
Senji - I used to have javascript popups, but then I changed to a better blogging system. :-p
gilmae
Senji, I read about two dozen feeds daily, about another three dozen weekly...even light designs start to add up over that 56K link
Aquarion has about 180 RSS feeds he checks at the moment
gilmae
Aqn: validation? you can still do that over Atom
Aquarion
But I tend not to read them in the aggregator. If I want to read them, I go to the site they came from.
Aquarion
gilmae: Last I saw it, all atom's validation was post effect. I can't say "these you need", and I still can't include my tickyboxes
gilmae shrugs...behaviour differs obviously
Aquarion
I really don't like the homogeneousness that reading everything through an aggregator (or a f/list, for that matter) dictates
Aquarion
I spend a while with my site making it look readable, and having it parsed through a sucky interface negates that
gilmae
don't get me wrong, I'm really trying not to be personal, but that is so arrogant
Aquarion
Yes
gilmae
its a pretty short step to only allowing people to email you if they do it through the client you give them
Aquarion
I'd disagree with that.
Aquarion
My biggest problem with the comment-by-atom thing, and the reason why nothing I will ever write will use it, is that it's a gaping hole with a large sign pointing into it saying "Free google-rank! Point your spambot here!"
gilmae
the only possible response I can give to that is awful, cause it is "It will all be better in the future"ism
Aquarion
When we have jetpacks and a base on the moon, I shall possibly rethink my position
gilmae
but the spec is pretty raw, and I just can't see it going through without the ability to return response codes to indicate that an entry doesn't validate, nor can I see it going through without allowing the author to close off comments, either permanently or on a spam by spam basis
Aquarion
Yes, but that just means we have to run spamassassin on all our comments
Aquarion
And I'd rather not have another pile of "possibly spam" to sort through every week
gilmae
I can't see why something like mt-blacklist can't be used for Atom comments
Aquarion
Me neither, but I don't see blocking spam as preferable to having the hole in the first place
Aquarion considers pasting this conversation into an entry
gilmae
actually, that's very well done, Aqn
gilmae
you just came up with The Perfect Reason for people to write their own blogging engines
gilmae
"Well, I wrote my own so that my commenting system isn't Movable Type's"
Aquarion
Yeah, it's a side benefit :-)
(Slightly later)
gilmae
shall i make a Perfect World statement? :- )
Aquarion
You can, I might even append it to the entry I just posted :-)
gilmae
In A Perfect World, and I accept that this is the same world as the one with Mr Fusion-powered-DeLoreans, you (Aqn) would be able to publish your schema/DTD/whatever of your commenting-frilly-bits, and the Atom client would be able to use discovery to see that you support this functionality and the schema would tell it how to support it
Aquarion
That is indeed a perfect world
Aquarion
Actually X-Forms would solve some of that.
Aquarion
Include an X-Form for the comment - which has validation information - for each entry
Aquarion
And, while we're at it, I want a pony and a castle.
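For reference, the RSS 2.0 `<comments>` element mentioned in the conversation is just a per-item URL pointing at the comments page. A minimal Python sketch of an aggregator reading it follows; the feed is invented and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 feed: one item advertises a comments URL, one does not.
FEED = """<rss version="2.0"><channel>
  <title>Example weblog</title>
  <item>
    <title>Atom Comments</title>
    <link>http://example.org/2004/08/26/atom-comments</link>
    <comments>http://example.org/2004/08/26/atom-comments#comments</comments>
  </item>
  <item>
    <title>No comments here</title>
    <link>http://example.org/2004/08/25/quiet</link>
  </item>
</channel></rss>"""


def comment_links(feed_xml):
    """Map each item title to its comments URL, or None if it has none."""
    root = ET.fromstring(feed_xml)
    return {
        item.findtext("title"): item.findtext("comments")
        for item in root.iter("item")
    }


if __name__ == "__main__":
    for title, url in comment_links(FEED).items():
        print(title, "->", url or "(go to the entry page)")
```

This only tells the reader where to go to comment. It says nothing about which fields the form wants or how they are validated, which is the gap the argument above is about.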

Those who spoke on this:


gilmae:

2004-08-26 09:49 26 mins after the Original Article

You’ve got to have dreams


emma:

2004-08-26 10:08 45 mins after the Original Article

I totally agree with you. I want people to visit my site as well. I spend a lot of time working on it, changing the layout every now and then.

I post almost everything on my LJ as well though, because a) it’s easy to do, with wBloggar supporting LJ and my weblog supporting the Blogger API, and b) that’s where the most feedback comes from.

It is not A Perfect World :(


Peter:

2004-08-26 10:18 55 mins after the Original Article

>Aquarion
>I spend a while with my site making it look
>readable, and having it parsed though a sucky
>interface negates that

“I spend a while getting the fonts and colours in my emails just right, and telling me to send in plain text negates that”


Aquarion:

2004-08-26 10:42 23 mins after Peter

Your point is well made, but misses the point I was trying to make.

LJ & Bloglines – for example – fuck around with the HTML. In fact, LJ strips it, and only displays tiny bits. Bloglines applies its own stylesheet. People reading the above in Bloglines would find it more difficult, because it uses a definition list and Aquarionics’ stylesheet to make it readable as a conversation (This will work better when I fix the stylesheet a bit more, but I don’t have time right now, hence the direct paste rather than detailed essay). I want people to see it in the most readable way, and the most readable way for this journal will not be the same for every other journal in existence. There is no such thing as a Perfect Design for every journal or weblog, we can just make it the best we can for the content we write.

Also – and this is also what I mean by having the comment-data inline – You cannot – and should never – autogenerate forms. Never not ever. After some consideration, the layout of the commenting form for this site is (I think) fairly logical (I rarely get mis-directed comments anymore). Even if ATOM-Comment did allow me to say “show these check-boxen” (for the email) or “this is a location combo box” or whatever, I’ve no control over the logic and flow of the form. It would be autogenerated, and that’s Not Good.



ESF's second birthday

In the great and powerful world of weblogs, anything older than a week, that has vanished into the archives, is dead, gone, and will never be seen again.

Well, almost. ESF appears to have reappeared on people’s radar, and since today is exactly two years (and one month, damn) to the day that I released the spec, I thought it might be time for a little retrospective on why it existed, why it still exists, and where it went.

Well, like a small child with a paintbrush, it went everywhere. Plugins and templates exist for almost every major weblogging tool (Including an MT plugin just to create the required date format) and an increasingly scary number of minor ones. There’s even a feed reader for it (Which has, irritatingly, “extended” the format to allow a text description, which is somewhat against the spirit of the format). Oh, and a CPAN module to create and read it. I’m absolutely freaking amazed by all of this. I created the format for two main reasons:

  1. I was annoyed at the syndication wars
  2. Epistula needed it.

The second was the actual reason for all this. I wanted a basic format to list the last x items of a section without hitting the database each time. I wanted to use an existing feed format for that, but really didn’t want to touch XML parsing with a sixty-foot pole at that point in the system. Because I was – and am – a *nix Admin, the most natural format for me to put this in was something approaching the classic news/mail format, which has passed data between systems for decades without needing to involve XML. I swapped the colon-separated format of that for a tab-based format, mostly because anyone using colons in a title field can be forgiven, but anyone using tabs has larger problems already. Hash marks marking non-parsed items are traditional, and after that it really just built itself.

The two technical decisions it comes under fire most often for are that it sets the mime-type to text/plain and that it uses Epoch time format, both of which I’d probably do differently if I were to write ESF mk2. The mime-type was chosen because it really *is* just a text document, and can be read as such. Also, I’m not sure creating a new mime-type for a tin-pot format is at all responsible, and it was never really meant to go as far as it did.

The date is less excusable. When I was doing background reading for all this I saw that for every method of displaying the time, there were three or four variations to be detected and accounted for (from the case of the time/date delimiter to the order of the pieces), so I fell back to the one format I felt was most common to all languages, Unix’s default Epoch time. Of course, this doesn’t allow for any kind of time zoning and isn’t actually supported by MT, so in future I’ll stick to the ISO standard (and indeed for Aquaintances Feed Instances – something of a natural successor to ESF, though it never got released – which was a mail/news based format for single articles, I used the ISO standard).
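The trade-off between the two date formats can be shown in a few lines of Python. This is only a sketch: the function name and the fixed whole-hour offset are illustrative choices, and "the ISO standard" is taken here to mean ISO 8601.

```python
from datetime import datetime, timedelta, timezone


def epoch_to_iso(seconds, utc_offset_hours=0):
    """Render Unix epoch seconds as an ISO 8601 string at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz=tz).isoformat()


if __name__ == "__main__":
    stamp = 1032000000  # arbitrary example value
    # The epoch number is the same instant everywhere, but carries no hint
    # of the author's local time; the ISO strings make the offset explicit.
    print(epoch_to_iso(stamp))
    print(epoch_to_iso(stamp, utc_offset_hours=1))
```

Both strings name the same instant, but only the ISO form can also record where the author's clock was, which is the "time zoning" the epoch format has no room for.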

So it was the first reason, this hatred of the XML-based format wars, that got Epistula published. I fully accept that anger is an incredibly bad reason to put a new specification into the wild, and is the fountain of fuckwittery from which a number of the recent syndication debacles have spewed forth. But this was September 2002, when RSS 0.92, RSS 1.0 and various variations were appearing, all incompatible, all increasingly difficult to parse (and I really don’t like XML modules), and I didn’t, and don’t, want full-content feeds. So I created a brand new format with thin slivers of metadata that shouldn’t ever break the bandwidth-bank, wasn’t ever going to change (Scout’s Honour) and, above all, could be parsed with a regular expression or two.

The problems haven’t gone away. Bloglines’ Web Services Thingy is helping to solve the bandwidth problem, but the more I watch Atom’s development, the more it worries me: it gets more and more complicated, and gains more and more things that feeds will have because one day something will come along to support them.
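To show what "a regular expression or two" can look like, here is a minimal Python sketch of a parser for a tab-separated feed of this general shape. The exact layout used here is an assumption for illustration, not the ESF specification: `#` lines are skipped, `name<TAB>value` lines are feed metadata, and `epoch<TAB>title<TAB>link` lines are items. Check the published spec before relying on any of it.

```python
import re

# One regex per line type. The layout is assumed, not taken from the spec.
ITEM_RE = re.compile(r"^(\d+)\t([^\t]*)\t(\S+)$")  # epoch, title, link
META_RE = re.compile(r"^([A-Za-z]+)\t(.*)$")       # name, value

# Hypothetical feed text, written for this example.
SAMPLE = (
    "# hypothetical example feed\n"
    "title\tExample weblog\n"
    "link\thttp://example.org/\n"
    "\n"
    "1032000000\tFirst post: with a colon\thttp://example.org/1\n"
    "1032003600\tSecond post\thttp://example.org/2\n"
)


def parse_feed(text):
    """Return (metadata dict, list of (epoch, title, link) tuples)."""
    meta, items = {}, []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and hash-marked lines are not parsed
        item = ITEM_RE.match(line)
        if item:
            items.append((int(item.group(1)), item.group(2), item.group(3)))
            continue
        field = META_RE.match(line)
        if field:
            meta[field.group(1)] = field.group(2)
    return meta, items


if __name__ == "__main__":
    meta, items = parse_feed(SAMPLE)
    print(meta)
    for when, title, link in items:
        print(when, title, link)
```

The colon in the first title passes through untouched, which is the point of splitting on tabs rather than colons.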

ESF is the simplest thing that could possibly work, and that’s why it exists.


Nicholas 'Aquarion' Avenell is a web developer in London, you can find out more about him or how to get in touch.

© 2000 to 2008 inclusive Nicholas Avenell
All comments are the property of their creators, published with permission
(Unless otherwise indicated, the opinions and sentiments expressed on this site are those of the author and not of any organisation of which he is an affiliate, including his employer. Caveat Lector, E&OE. sigh)
0.460 seconds, 120 queries, 2.76Mb on Fri, 08 Aug 2008 19:12:23 +0000
Generated by Epistula Version 2.0.3