Aquarionics

Thursday 29th May 2003

In an attempt to cut down my hits...

robots.txt is a file that ninety-nine percent of all search engines download from the root of a website and use as instructions for what, and what not, to index.

This is my robots.txt file:

User-Agent: *
Disallow: /fun/mp3s.html
Disallow: /comment
Disallow: /trackback
Disallow: /logging
Disallow: /attachment
Disallow: /search
Disallow: /archive

See that last one? That’s the odd one out: I’ve blocked Google from my date-based archives. It’s going to take a while to have any effect (the top one has been there for a couple of months now, apart from two weeks when I removed it, and searches for MP3s still account for most of my search traffic).

Why? Have I gone insane? Not quite. I’m currently plagued by incorrect search results. Until earlier this week, this page was the top match on Google for the phrase “I Hate Dominos”. When I mentioned this a couple of days ago, that page became the top match within hours. This is stupid. Not only is Aquarionics definitely not about my hatred of Dominos, I didn’t even say I hated them; some random anonymous commenter did.

Part of the problem with this is that every article gets indexed by Google twice (multiplied by the number of sites I get spidered as, now down to just one from six last week) and the top 200 words get indexed once more. The first two copies come from the daily and single-item archives; the third comes from the monthly archives, which only show extracts or descriptions. This means that not only do people search for random things and get my website, but when they search for things I do talk about they get the monthly page, where the phrase might be fifteen folds down.

So I’ve blocked search engines from searching archives, and instead made sure that there is a big list of links to every single entry in each section, so the engines can still find them but now will only index the page-per-article sections instead of having four copies of every item.

Which is neat.
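
For anyone wanting to check what a rule set like this actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. It feeds in the robots.txt from above and asks whether a few paths may be fetched. The example paths (`/archive/2003/05/29`, `/journal/some-entry`) and the helper names are made up for illustration and are not necessarily real URLs on this site; real crawlers may also interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the post, copied verbatim.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /fun/mp3s.html
Disallow: /comment
Disallow: /trackback
Disallow: /logging
Disallow: /attachment
Disallow: /search
Disallow: /archive
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the parsed rules."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise the rules.
for path in ("/archive/2003/05/29", "/journal/some-entry", "/fun/mp3s.html"):
    verdict = "allowed" if is_allowed(parser, path) else "blocked"
    print(f"{path}: {verdict}")
```

Note that `Disallow` lines are prefix matches, so `/archive` covers everything underneath it, and would also cover any other path that merely starts with those characters.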


Nicholas 'Aquarion' Avenell is a web developer in London; you can find out more about him or how to get in touch.

© 2000 to 2008 inclusive Nicholas Avenell
All comments are the property of their creators, published with permission
(Unless otherwise indicated, the opinions and sentiments expressed on this site are those of the author and not of any organisation of which he is an affiliate, including his employer. Caveat Lector, E&OE. sigh)