Archive for May, 2003
Office XML to be real XML after all
Saturday, May 31st, 2003
I got an interesting email today.
It being Saturday, this is a rare thing. Well, it’s a rare thing anyway, my email has been 90% spam since January, but this was interesting in a sort of XML-type way, so you get to hear it too.
Last week, Microsoft put a new newsgroup on its public NNTP server (news:msnews.microsoft.com) called microsoft.public.office.xml, which immediately caught my interest. A little while ago, there was some question over whether MS was going to back down on the “Office Will Do XML” stuff, so I asked. I was answered. This is what they said:
From: "Cybarber"
Hi,
There has been a lot of talk about Microsoft scaling down on XML in the
versions of Office 2003.
Hopefully this rumour is unfounded and XML functionality will be in the
Standard edition aswell.
Is there any clarity about this subject already?
From: "Joe_MSFT"
In short, yes, the Professional version will have additional XML capabilites
from the other editions, primarily those based on how customer-defined XML
schema can be used. This was done because the product editions have been
designed to meet the needs of different audiences. A brief overview is
here:
http://www.microsoft.com/presspass/newsroom/office/factsheets/OfficeSKUFS.asp
We will be publishing more details in the coming months.
Can you use XML with all 2003 versions – yes. They will all continue to
have XML Web services support through the Web services toolkit and Word and
Excel will be able to be saved in their respective native XML file formats
that allow for content reuse, transformation, construction and such.
-- Joe Andreshak, Microsoft Product Manager
This post is provided "AS IS" with no warranties, and confers no rights
Sample code subject to http://www.microsoft.com/info/cpyright.htm
From: Aquarion
Does this mean that any given Office 2003 file in it’s native XML format
- from any version of Office – can be transformed by, say, an XSLT
stylesheet?
It’s possible to remain within XML specs, yet having the main content in
a binary format, and I’ve heard rumours that this is the case.
From: "Joe_MSFT"
I’m not sure I understand what you’re asking, but let me explain.
The binary file format is separate from the XML file format. With
Office 2003, users can choose to save Word and Excel files in EITHER the
binary or XML file format. The binary file formats are similar to what
has existed for the previous versions.
The Word XML format is new for Office 2003. Previous versions of Office
will not know how to interpret the XML and simply open it as a plain
text file. There is no straightforward way to convert old binary files
to XML files, however you could open an old binary file in Word 2003,
for instance, and now save it as XML. This XML file is now usable by
any program that can interpret XML tags.
Hope this helps, Joe
So it looks like all versions will save as XML, but only the high-price versions will have all the XML modification/DTD stuff (the customer-defined schema support) in them.
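To make “usable by any program that can interpret XML tags” concrete, here is a minimal sketch in Python that pulls the paragraph text out of an XML-saved document using only the standard library. The sample document, its `urn:example:office-doc` namespace and the `document`/`body`/`p`/`r`/`t` element names are placeholders invented for illustration; they are not the real Word 2003 schema, which hadn’t been published in detail when this was written.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a document saved as XML. The namespace and the
# element names are placeholders, NOT the actual Word 2003 schema.
SAMPLE = """<doc:document xmlns:doc="urn:example:office-doc">
  <doc:body>
    <doc:p><doc:r><doc:t>Hello, </doc:t></doc:r><doc:r><doc:t>XML.</doc:t></doc:r></doc:p>
    <doc:p><doc:r><doc:t>Second paragraph.</doc:t></doc:r></doc:p>
  </doc:body>
</doc:document>"""

# Prefix-to-namespace map used when searching the tree.
NS = {"doc": "urn:example:office-doc"}


def paragraphs(xml_text):
    """Return the text of each paragraph element as a list of strings."""
    root = ET.fromstring(xml_text)
    result = []
    for para in root.iterfind(".//doc:p", NS):
        # A paragraph is split into runs; join the text of every run.
        result.append("".join(t.text or "" for t in para.iterfind(".//doc:t", NS)))
    return result


if __name__ == "__main__":
    for line in paragraphs(SAMPLE):
        print(line)
```

If the content really is plain elements and text like this, an XSLT stylesheet could walk it the same way; if the main content turned out to be an opaque binary blob inside a single element, neither approach would get anything useful out.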
In defense of weblogs
Friday, May 30th, 2003
It’s probably fair to say that there is an element of the online population who feel that the recent explosion of weblogs and the publicity it has received is undue. Furthermore, it has emerged that All Webloggers Are Teenage Girls.
Well, most of them.
Well, most of them that live in Poland.
Well, most of them that live in Poland and filled out the survey.
I was going to start this article as “There are two types of weblogs in this world” and divide the world into Them (The webloggers who are, as Andrew carefully and eloquently describes, a bunch of sixteen-year-old pop-idol fans gazing at their navels whilst describing how great the macaroni and cheese they had for dinner last night was) versus Us (The cool, level-headed Webloggers, who discuss new W3C standards and the direction of them; the future of weblogging; the whys and wherefores of XML development and such), before I did something that stopped me doing this. I read my archives. So I ask this instead:
What is a weblog?
It’s a collection of articles displayed in descending chronological order on the front page, normally archived by month or week. Easy. Does that stop FTrain being a kind of weblog? Not really. Does that make The Register a weblog? Probably not. So, we have these weblog things, where we can safely define Weblogs as “Something I think is a weblog”, and that’s about it. This means that you can make something a weblog by putting it on the web and saying “This is a weblog”, and since weblogs are currently Cool, an awful lot of people are putting things online and calling them weblogs.
Saying that all webloggers act like 12-year-olds is roughly equivalent to saying that “All journalists are no-talent hacks with the integrity of a ball of water”. Andrew Orlowski saying that blogs are a waste of bandwidth is roughly equivalent to the classic “All Greeks Are Liars” argument, since his irregular columns are mostly a rant three folds long about whatever is on his mind at the time. But I digress, for this isn’t about attacking Andrew Orlowski.
So, ninety percent of weblogs are crap, depending on how you define weblog.
So far, so normal. Ninety percent of everything is crap. There is still the 10%, there are still people like Stavros, BB, Mark, CavLec et al. who make wading through the rest of it worth it.
One day I’ll submit something like this to The Register to see if they print it. I doubt it.
Holy Permalinks, Badman!
Friday, May 30th, 2003
I have been convinced that permalinks should be nicer, and so they are now /journal/year/month/day/title. Happy now? :-)
Edit: I’m not sure Pingback is working on the new permalinks yet. Could someone pingback me? I should also point out that the old permalinks still work too :)
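For the curious, a permalink in the /journal/year/month/day/title shape can be built with something like the sketch below. This is not the code the site runs: the `slugify` rule (lowercase, runs of non-alphanumerics collapsed to hyphens) and the zero-padded month and day are assumptions made for the example.

```python
import re
from datetime import date


def slugify(title):
    """Turn a post title into a URL-safe slug.

    Assumed rule: lowercase everything and collapse each run of
    characters other than a-z and 0-9 into a single hyphen.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def permalink(day, title):
    """Build a /journal/year/month/day/title path (zero-padding is assumed)."""
    return f"/journal/{day.year:04d}/{day.month:02d}/{day.day:02d}/{slugify(title)}"


if __name__ == "__main__":
    print(permalink(date(2003, 5, 30), "Holy Permalinks, Badman!"))
```

Keeping the old permalinks working alongside the new ones then just means looking entries up by either form.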
Syndication::ESF
Thursday, May 29th, 2003
Um. Someone has proposed a Perl Module for ESF.
Golly
In an attempt to cut down my hits…
Thursday, May 29th, 2003
robots.txt is a file that ninety-nine percent of all search engines download from the root of a webserver and use as instructions for what – and what not – to index.
This is my robots.txt file:
User-Agent: *
Disallow: /fun/mp3s.html
Disallow: /comment
Disallow: /trackback
Disallow: /logging
Disallow: /attachment
Disallow: /search
Disallow: /archive
See that last one? That’s the odd one out. It’s going to take a while (the top one has been there for a couple of months now, and was only removed for two weeks, and searches for MP3s account for most of my search traffic), but I’ve blocked Google from my date-based archives.
Why? Have I gone insane? Not quite. I’m currently plagued by incorrect search results. Until earlier this week, This Page was the top match on Google for the phrase “I Hate Dominos”. When I mentioned this a couple of days ago, that page became the top match within hours. This is stupid. Not only is Aquarionics definitely not about my hatred of Dominos, I didn’t even say I hated them; some random anonymous commenter did.
Part of the problem with this is that every article gets indexed by Google twice (multiplied by the number of sites I get spidered as, now down to just one from six last week) and the top 200 words get indexed once more (The first two are part of the daily and single-item archives, the third is as the monthly archives which only show extracts or descriptions). This means that not only do people search for random things and get my website, when they search for things I do talk about they get the monthly page, where the phrase might be fifteen folds down.
So I’ve blocked search engines from searching archives, and instead made sure that there is a big list of links to every single entry in each section, so the engines can still find them but now will only index the page-per-article sections instead of having four copies of every item.
Which is neat.
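If you want to check what a well-behaved spider would make of the robots.txt rules above, Python’s standard `urllib.robotparser` module can evaluate them offline. This is only a sketch: the paths being tested are made-up examples rather than real URLs on this site, and actual crawlers may interpret the rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, one directive per line.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /fun/mp3s.html
Disallow: /comment
Disallow: /trackback
Disallow: /logging
Disallow: /attachment
Disallow: /search
Disallow: /archive
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Hypothetical paths, chosen only to exercise the rules.
    for path in ("/archive/2003/05/", "/journal/2003/05/29/example", "/fun/mp3s.html"):
        verdict = "allowed" if parser.can_fetch("Googlebot", path) else "blocked"
        print(path, verdict)
```

Disallow rules are prefix matches, so blocking /archive also covers everything underneath it, while the per-article /journal pages stay crawlable.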