
Emergency Server Moves, Automation,

So, I’m having a bad server day.

This is a Monday Morning email if ever there was one

It’s my fault, to a large extent. Earlier this year I discovered that my server had been sending daily emails complaining about a problem with one of the mirrored hard-drives, but these were going directly into Spam. I asked my hosting provider (Hetzner) to swap a new drive in, which they did, but the process of copying the old hard-drive mirror to the new one failed because of an error on the old hard drive. Not having the time to devote to fixing it properly, I bodged a fix and moved on with my life.

This is not a good email to receive either, but in addition to the last one, it’s worse

Earlier this week that problem developed into a fatal one. It’s getting worse, so now I have a limited time to move everything off.


So now I own a brand new server, continuing with my usual conventions, this is

One of the reasons I didn’t jump to a new server earlier is that I wanted to improve the setup. Atoll – the server which is failing – was set up three years ago in a hurry as I attempted to move stuff off of fjord – its predecessor – before the billing period was up. All of this was before I moved back into sysadmin/devops, and it’s a very traditionally set-up server, the kind I refer to at work as “Artisanal”: hand-crafted by craftsmen, unique and unrepeatable, because there’s no documentation as to *why* any config is the way it is. My mission at work is to automate that stuff out of existence, and I wanted to apply some of what I’ve learned in the last few years to my own systems.

Server herding over the last ten years has shifted away from big servers doing multiple jobs, and towards huge servers pretending to be many tiny servers doing single jobs. Initially I thought of setting up the new server as a virtual hosting server, setting up many small servers. I’ve come to the conclusion this is impractical for my use-case, which is that the server tends to host many dozens of tiny websites and services. Dedicating a slice of resource to each does none of them any favours, and increases the admin burden rather than decreases it. Instead, I’ve gone towards a halfway house solution of putting most of the separate servers on Docker.

(Docker, for those who haven’t seen it, is basically chroot on steroids: an enclosed mini-OS that shares physical resources with the host, but only has access to the services and disk resources you specifically give it. For example, you can grant it a port to talk to MySQL on the host (or another Docker container), and a directory so it can maintain state between restarts.)
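As a rough illustration of those two grants, here is a minimal sketch of starting a WordPress container that reaches a MySQL container over a private Docker network and keeps its files in one host directory. The network name `sites`, the database container name `mysql`, and the `/srv/...` path are hypothetical. `WORDPRESS_DB_HOST` is an environment variable of the official `wordpress` image, and a real setup would also pass database credentials. Setting `DOCKER=echo` prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: names and paths are hypothetical, not this server's real setup.
DOCKER=${DOCKER:-docker}   # override (e.g. DOCKER=echo) to print instead of run

# run_site NAME DATA_DIR HOST_PORT
#   --network sites      : joins a user-defined network (created once with
#                          "docker network create sites"), so the name "mysql"
#                          resolves to the database container on that network
#   -p 127.0.0.1:PORT:80 : publishes the container's port 80 on the host's
#                          loopback only, for a front-end proxy to talk to
#   -v DIR:/var/www/html : the only host directory the container can touch,
#                          so uploads and plugins survive restarts
run_site() {
    name="$1"; data_dir="$2"; host_port="$3"
    $DOCKER run -d --name "$name" \
        --network sites \
        -e WORDPRESS_DB_HOST=mysql:3306 \
        -p "127.0.0.1:$host_port:80" \
        -v "$data_dir:/var/www/html" \
        wordpress
}
```

For example, `run_site blog /srv/blog 8080` would start a blog container serving on `127.0.0.1:8080`.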

Rather than manually setting up the server for Docker, and ending up with many automated boxes inside a grand, artisanally crafted shelf, I’ve decided to use Ansible to manage the main server setup (and a lot of the DNS, since most of my sites use Amazon’s easily scriptable Route53 DNS service). I’m still learning Docker, but I’m comfortable in Ansible, so I haven’t gone as far as orchestrating the Docker setup itself with Ansible, just the install.
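The Ansible tasks aren't reproduced here. As a sketch of why Route53 is so easy to script, the same kind of change can be made with the AWS CLI's `route53 change-resource-record-sets` command and a JSON change batch. This uses the AWS CLI rather than Ansible's route53 module. The zone ID, record name and address are placeholders, and `AWS` can be overridden for a dry run.

```shell
#!/bin/sh
# Sketch: create-or-update a single A record in Route53 via the AWS CLI.
# Assumes credentials are already configured (e.g. with "aws configure").
AWS=${AWS:-aws}

# upsert_a_record ZONE_ID NAME IP [TTL]
upsert_a_record() {
    zone_id="$1"; name="$2"; ip="$3"; ttl="${4:-300}"
    # UPSERT creates the record if it is missing, or overwrites it if present.
    batch=$(printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}' \
        "$name" "$ttl" "$ip")
    $AWS route53 change-resource-record-sets \
        --hosted-zone-id "$zone_id" --change-batch "$batch"
}
```

Usage would look like `upsert_a_record Z1EXAMPLE www.example.com. 203.0.113.10`, with the placeholder values replaced by your own.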

All of this is kind of like getting everyone off of the capital of Sokovia before the rockets hit the right altitude. In this metaphor, the unbeatable forces of entropy will be playing the part of Ultron

Docker’s given me some fantastic advantages in just the 12 hours or so I’ve spent using it. I’ve long been a user of things like virtualenv for Python to separate out my language and library versions on a per-application basis, and the ability to do that with whole OS packages is very useful. It enables one of the holy grails of PHP development: hosting multiple sites with different versions of PHP at the same time. So WordPress can be on the brand new PHP7 (you’re using it now, in fact), while a phpBB install can remain on 5.6 until they get their act together (or I switch it to Discourse or something). For this traditional hosted kind of server, which does many things for many different people, it’s really useful.
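Here is a minimal sketch of what that looks like in practice. It assumes the official `php` images, whose `X.Y-apache` tags bundle Apache with a given PHP version. The site names, ports and document roots are hypothetical, and something on the host (a reverse proxy, say) would still need to route each hostname to its port.

```shell
#!/bin/sh
# Sketch: two sites on one host, each pinned to its own PHP version.
DOCKER=${DOCKER:-docker}   # override (e.g. DOCKER=echo) to dry-run

# run_php_site NAME PHP_VERSION HOST_PORT DOCROOT
run_php_site() {
    name="$1"; php_version="$2"; host_port="$3"; docroot="$4"
    $DOCKER run -d --name "$name" \
        -p "127.0.0.1:$host_port:80" \
        -v "$docroot:/var/www/html" \
        "php:$php_version-apache"
}

# Hypothetical layout:
#   run_php_site blog  7.0 8080 /srv/blog/htdocs    # newer code on PHP 7
#   run_php_site forum 5.6 8081 /srv/forum/htdocs   # legacy forum stays on 5.6
```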

Taking Control (Photo by Ady Satria Herzegovina, used with permission)

All this is still an arse, though. Working through today, I’ve got WordPress, the forums and some of the static sites moved over, but still less than 20% of the functionality has been moved. Taking a break now (it’s midnight as I write this) for some gaming and then probably sleep. Maybe I can finish this tomorrow…




Week 15 – Outstanding in our field

I had a lovely Empire event. It had nice weather, nice people, and things that I wanted to happen happened, and happened well.

Most of what I’ve done other than that, though, has been dribs and drabs.

I’ve turned my dust-gathering Raspberry Pi into an OpenVPN server for the house, enabling me to get at my media server and desktop files and folders from anywhere in the world without opening those services up to it. This required a bit of playing around, because OpenVPN didn’t work through my fibre router. Putting the Virgin router into “Modem mode” and moving all routing duties to a nicer Belkin box got around that, but stopped my kettle talking to the wifi (also my Kindle and media server).

Some playing with radio settings and configuration later, the Belkin box is ruling the roost, with everything connected to it. Of course, the public story is just that I can turn on my kettle from anywhere in the world, because that’s the obvious bit.

The other bit was adding reCAPTCHA to a quotes service. I’ve been running quotefiles for channels for eight years or so, but I’ve never found a service that didn’t suck. Rash and the other QDB lookalikes have ownership, maintenance and being-awful problems, which pushed me towards Chirpy, a Perl-based system written for Mozilla. I don’t generally work with Perl, which meant that Chirpy didn’t really work for me: when it crashed with obscure templating errors that repaired themselves a few minutes later, there was nothing I could do. Plus, as we drift further from its last release in 2007 and its last code commit in 2010, I trust it less and less.

So when it failed for the last time, as I upgraded the server it was on, I honestly couldn’t be bothered to go through the CPAN dance again, and hacked my own together in PHP. It doesn’t have the features, or tagging, or any of the other things we didn’t use. But it worked.

Well, nearly. When I checked the queue of quotes to approve, as I do every few weeks, I found a spambot had hit the form, so I’ve added very basic reCAPTCHA support, which took 45 minutes only because I can never spell captcha the same way twice in a file.
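The PHP isn't shown, but the server-side half of reCAPTCHA is a single POST to Google's `siteverify` endpoint. It carries the site's secret key and the token the form submitted, which arrives in the `g-recaptcha-response` field. Here is a minimal shell sketch of that check, assuming `curl` is available. The function name is hypothetical, and `CURL` can be overridden for testing.

```shell
#!/bin/sh
# Sketch: verify a reCAPTCHA token server-side before accepting a submission.
CURL=${CURL:-curl}

# verify_recaptcha SECRET TOKEN
#   TOKEN is the value the form posted as "g-recaptcha-response".
#   Exit status 0 means Google accepted the token.
verify_recaptcha() {
    secret="$1"; token="$2"
    reply=$($CURL -s https://www.google.com/recaptcha/api/siteverify \
        --data-urlencode "secret=$secret" \
        --data-urlencode "response=$token") || return 1
    # The JSON reply contains "success": true or "success": false.
    printf '%s\n' "$reply" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}
```

Anything that fails the check goes nowhere near the approval queue.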



apt-get dist-upgrade

The release of a new Debian version is one of those Deep Thought moments. The great machinery has been churning and grinding for seven and a half million years to produce this single result, and it pops out and stands there for a while, while everyone looks at it to see who will bite first.

This weekend I wanted to add Redis as a caching layer, more for practice than because the site desperately needs the speed boost, but the php-redis package doesn’t exist in Wheezy, so it was time to upgrade to Jessie, the new version.

The server hosting is, which is almost pure webhosting right now. Specifically, it hosts:

So a mix of stuff that nobody would notice if it died forever, and stuff people will send me messages about when it goes down.

First stage was to RTFM. Debian’s release upgrade process is deceptively simple, and I’ve successfully taken servers through four releases – that’s nearly ten years – with just an apt-get dist-upgrade, but one of the ways I’ve done that is to read the release notes and see where the icebergs might be.

Here, a few security updates (Root logins disabled by default), but nothing major for anything I use… go forth.
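For reference, the core of a Debian release upgrade is short. You point APT at the new codename, then update and upgrade in two stages, as the release notes describe. Here is a minimal sketch. It assumes everything lives in `/etc/apt/sources.list` (entries in `sources.list.d` need the same edit) and that the release notes have already been read. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch of a Wheezy -> Jessie upgrade. Run as root, with backups, after
# reading the release notes.

# switch_release FILE FROM TO: rewrite codename FROM to TO, keeping FILE.bak
switch_release() {
    # \b (word boundary) is a GNU sed extension: it matches the codename as a
    # whole word, so "wheezy/updates" and "wheezy-backports" are caught too.
    sed -i.bak "s/\b$2\b/$3/g" "$1"
}

upgrade_wheezy_to_jessie() {
    switch_release /etc/apt/sources.list wheezy jessie &&
    apt-get update &&
    apt-get upgrade &&      # minimal upgrade first
    apt-get dist-upgrade    # then the full upgrade, which may add/remove packages
}
```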

The biggest problem I had was, in fact, Apache. I’d missed that the version Jessie upgrades to, 2.4, is a big release, and the biggest problem was that the old permissions system’s “allow from all” declarations weren’t legal under the new one. This, coupled with a few changes to SSL config, caused me mild panic. A simple read-through of the Apache 2.2 -> 2.4 upgrade guide soon set me right, though.
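The change that bites is in access control. Apache 2.2's `Order`/`Allow`/`Deny` directives give way to 2.4's `Require`, so `Order allow,deny` plus `Allow from all` becomes `Require all granted`. Here is a small sketch for finding the old directives before (or after) the upgrade. The function name is hypothetical, and `/etc/apache2` is Debian's default config location.

```shell
#!/bin/sh
# Sketch: list Apache 2.2-style access directives that need rewriting for 2.4.
#   2.2: Order allow,deny / Allow from all  ->  2.4: Require all granted
#   2.2: Order deny,allow / Deny from all   ->  2.4: Require all denied
find_legacy_access() {
    grep -rniE '^[[:space:]]*(Order[[:space:]]+(allow|deny)|Allow[[:space:]]+from|Deny[[:space:]]+from)' \
        "${1:-/etc/apache2}"
}
```

Each hit is printed as `file:line:directive`, which makes it easy to work through them one by one.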

The upgrade of PHP to the numerically pleasing 5.6.7 seems not to have broken anything major.

The packaging of web apps, however, is still moderately fucked. MediaWiki’s stable version is going to be abandoned at some point during Jessie’s lifetime, which will stop security updates for it. MediaWiki’s upgrade process is horrible anyway, and the Debian package’s complex maze of symlinks, which breaks every point release, hasn’t helped keep people on it, but I think the security release abandonment is the final nail in the coffin of me using it at all.

At some point soon I’ll need to upgrade my other server to Jessie as well, a more complicated process, since while Atoll is almost entirely my stuff, Cenote hosts over 100 sites for a couple of dozen users, as well as things like a Mumble server. Time to schedule some downtime there, I think…


Week Twelve: May the… sixth be… something something

My major success story this week has been to utterly screw up my desktop computer.

Over the last few years, I’ve started selecting PC components based on silence. When I did a replacement of the motherboard/CPU/cooling systems last year, though, I kept them in the same case. It wasn’t a bad case, really, although it reminded me what a poor life-choice cheap cases are (a cheap case generally has thin aluminium inside, which never fails to take a blood sacrifice). However, since then the machine has started making irritating “my fan is dying” noises, so I decided an upgrade of the fans was in order. One short trip to Amazon later, and I had a brand new Corsair case, with smoothed aluminium edging, no-screws hard-drive and optical drive installation, and – in a nod towards the new world order I hadn’t seen before – places to put SSD drives!

It’s black, it’s got white LEDs in it, and it’s got a window in the side (I last bought a case with a window in it in 2003, during the saga of the Gold Plated Power Supply) (Hell’s bells, this site’s old). Putting things into the new case did not require a blood sacrifice, but taking stuff out of the old one required a significant sacrifice of my right index finger. Then there were the cross-threaded screws, and the fact that my machine now has four fans in it, plus a water-cooled CPU taking up a fan pin, and only three fan pins on the motherboard. This can be repaired later. Before summer, probably.

For the full several hour experience, you needed to be here, but in rapid portrait-o-vision here is a 30 second timelapse of the whole thing (Portrait-o-vision because it was taken from my phone mounted in the dock, and there’s no sensible way to crop it)

Having done all that…

… it still made the noise. Turns out it was the power supply, which I’ll have to replace next month. In the meantime, turning it upside-down replaced the irritating fan-scraping noise with an even more irritating but far more intermittent high pitched whining.


The other major feature of the long weekend (I was off Friday through Monday) was that part of it was Streetfest, at which my new company, Skute, were doing a thing.

Due to some happy fun organisational issues, I didn’t get a full version of our thing up on the testing servers until Friday, at which point it became obvious that there were a couple of major differences from my exploratory tests. Differences that were pushing page generation times over thirty seconds, which is a magic death figure for our architecture. On top of that, there were some issues with the mobile experience that meant it was now missing the point in all three dimensions at once.

Basically, over twelve hours of Friday, I ended up doing a lot of the query optimisation work I’d never got time to complete to get initial loads somewhere sane, and then a heavy caching layer over the top of that to get subsequent loads closer to tenths of a second than tens of seconds. Something of a heavy day. The load times when we went live were… mostly fine, though apparently the mobile bandwidth on site may have not been up to much. It’s all a learning experience.

I’m feeling somewhat over-teched this week. May need to devote some evenings to pure creative.

Plus, it’s election time again, which is always good for my faith in humanity.


Sysadmin in a Field, Episode one: The tyranny of little bits of paper

For LARP events, including the most recent Empire, PD relies on quite a bit of technology. With all the will in the world, keeping track of 1500 players and their characters, their medical highlights and plot highlights, is hard to do with bits of paper.

In this series, I’ll take you through some of the solutions we’ve found to problems in the field.

One of these is the tyranny of little bits of paper.

We run in a mock-medieval setting, of kingdoms and knights, orcs and wizards. But on the field, we generally avoid doing in-character things on computers. The refs are issued Android tablets, which can be used to record the game events we need to keep track of (rituals cast, etc.), and we’ll get to the tech of that some time later. Empire, though, has admin that happens in the field, be it the resource trading of the Bourse, or the House of Cards politics of the senate. All this happens on bits of paper, because there’s nothing quite so immersion breaking as dealing with a medieval clerk poking away at an iPad.

Plus, characters in the field communicate with off-site NPCs – and sometimes each other – with letters.

However, this leaves us with important information on bits of paper that needs to be kept, and bits of paper are absolutely fucking awful. They get lost, they get muddy, they get out of order, out of place. Only one person has it at any one time, and whoever has it has to physically transport it somewhere else before it can be viewed by others. Kill it, kill it with – terrifyingly effective – fire.

We try to keep information on the wiki, one of three MediaWiki installs (a crew information one, a plot one, and a public one), so generally the first thing that happens after an event is that people try to type up their information and put it on the wiki. But typing up that kind of thing is time-consuming and likely to lose any interesting layout or design the players have put into it; it would be far better if we could do it in the field. And nobody has time to type up their notes in the field; we’ve got things to run.

For Odyssey last year, I brought my Doxie scanner to solve this problem for the smaller game, and when that worked well, asked PD to get one themselves.

The Process

1. People who had stuff to scan fed it through the Doxie, and it went on the SD card.

A doxie promo picture. Not pictured: Mud

The Doxie document scanner is a wonder of modern technology. It’s a small box, about the size of a roll of tin-foil, and you put documents in one side, and it scans them and puts them onto an SD card. It’s got rechargeable batteries, so it doesn’t need to be plugged in, and it looks like a USB Mass Storage device when you plug its USB port in. It’s the centre point of my own paperless system, which I’ll talk about in a future article.

Importantly, once the person had finished scanning stuff, they could go away and do whatever their job is supposed to be.

2. When the Doxie is plugged into my laptop, do stuff.

Sitting on my laptop was Hazel, an OS X utility to Do Things When Things Happen. It’s an awesome utility, but in this case could be replaced by anything with the ability to notice a directory has changed, and do something.

3. Specifically, run a python script to upload anything new in the directory to the Wiki

For Odyssey last year, this was a shared directory. In the field I whipped up a simple script that took a file name and sent it to MediaWiki instead.

4. People tag stuff

Once it’s on the wiki, users could add categories and stuff to make sure things didn’t get lost.
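The original step 3 was a Python script triggered by Hazel. As a shell sketch of the same idea, here is a function that walks the scanner's card and hands each file it hasn't seen before to an uploader command. It remembers content hashes rather than file names, so an identical rescan isn't uploaded twice. The uploader is a stand-in for a hypothetical wrapper that sends one file to the wiki, and the function and ledger names are hypothetical too. `sha1sum` comes from GNU coreutils; on OS X, `shasum` does the same job.

```shell
#!/bin/sh
# Sketch: upload every not-yet-seen file under DIR, remembering what was sent.
# upload_new_scans DIR LEDGER UPLOADER
#   LEDGER   : text file of SHA-1 sums already uploaded (keep it outside DIR)
#   UPLOADER : hypothetical command that uploads one file to the wiki
upload_new_scans() {
    dir="$1"; ledger="$2"; uploader="$3"
    touch "$ledger"
    find "$dir" -type f | while IFS= read -r file; do
        sum=$(sha1sum < "$file" | cut -d' ' -f1)
        # Skip anything already uploaded, including identical rescans.
        grep -qx "$sum" "$ledger" && continue
        # </dev/null stops the uploader from swallowing the rest of the file list.
        if "$uploader" "$file" < /dev/null; then
            echo "$sum" >> "$ledger"
        fi
    done
}
```

A directory watcher (Hazel, or anything similar) would call this whenever the card is mounted. Only files that uploaded successfully are recorded, so failures are retried next time.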

What went right

  • It worked. There are a few dozen senate motions, and a load of things from the Conclave, that are sitting up on the wiki that would usually be waiting until someone had time to type them up.
  • It was easy to use. Once people got the hang of feeding stuff to the scanner, they could do so fairly quickly.

Future Improvements

  • Knowledge Transfer. I ended up being the person who used it most, partly because of the few people who knew it was there, I was the one with the most paper.
  • The Mud. There is no list of positives and negatives for the last event that doesn’t feature the mud, but in this case people were – rightfully – wary of feeding a document scanner anything that had encountered the wet and sticky ground.
  • Single point of failure. Because I wrote the software for my Mac, it only worked when the scanner was plugged into that, and my desk is a bit out of the way.
  • Power. The Doxie Go, which I have, has a built-in rechargeable battery. The Doxie One, which I recommended to PD, needs NiCad batteries to run unplugged, and we didn’t have any on site, so it was tethered to my desk by a dodgy power cable.
  • Fussy Scanning. The Doxie is a bit fussy about straight edges of things going in, which isn’t great for the random edges of player-supplied paper. We can solve this by having a clear plastic wallet to put things in if the Doxie’s being picky.
  • No Preview. Stuff going directly to the wiki includes duplicates and failures, like things that went through diagonally.

In general, though, it was a nice solution to a problem we’ve been having, and now we can make the incremental steps to make it even better…


My Terribly Organised Life III:B – Technical Development

Code starts in a text editor. Your text editor might be a full IDE custom built for your language, a vim window with more commands than you can remember, or an emacs with more metakeys than you have fingers. Nowadays it might even be a window in a browser tab, though that always gives me flashbacks to deploying software by pasting lines into textareas in Zope. The lines I type are in a text editor, and currently that’s Sublime Text 3.

I used Eclipse, Netbeans, Aptana and other variants on the Java-based juggernaut for years, but partly because my main development languages are PHP and Python, it never really worked that well for me. My primary development OS is OS X, on my beloved MacBook Air, but I don’t want that to matter at all. I use Sublime Text because it has plugins to do most of the things I liked about IDEs (SublimeCodeIntel, some VCS plugins, and a small host of other things) and it works the same, and looks the same, across every OS I use day to day. I’ve even got my prefs and package lists syncing via Dropbox, so the plugins stay the same.
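Syncing Sublime Text settings through Dropbox commonly comes down to replacing the local `Packages/User` folder with a symlink into Dropbox. Here is a sketch of that approach. It assumes OS X's Sublime Text 3 location (`~/Library/Application Support/Sublime Text 3/Packages/User`) and a hypothetical `~/Dropbox/Sublime/User` folder. On Windows, a directory junction plays the same role. Quit Sublime before running it.

```shell
#!/bin/sh
# Sketch: make Sublime Text's User package folder a symlink into Dropbox.
# link_sublime_prefs USER_DIR SYNCED_DIR
#   USER_DIR   e.g. "$HOME/Library/Application Support/Sublime Text 3/Packages/User"
#   SYNCED_DIR e.g. "$HOME/Dropbox/Sublime/User" (hypothetical location)
link_sublime_prefs() {
    user_dir="$1"; synced_dir="$2"
    mkdir -p "$synced_dir" "$(dirname "$user_dir")"
    if [ -d "$user_dir" ] && [ ! -L "$user_dir" ]; then
        # Seed Dropbox from this machine only if nothing is there yet,
        # then keep the local copy aside rather than deleting it.
        if [ -z "$(ls -A "$synced_dir")" ]; then
            cp -R "$user_dir"/. "$synced_dir"/
        fi
        mv "$user_dir" "$user_dir.bak"
    fi
    # -n: replace an existing link itself rather than descending into it.
    ln -sfn "$synced_dir" "$user_dir"
}
```

Run it once per machine. The first one seeds Dropbox, and later ones just pick up the synced settings and plugin lists.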

I work as a contractor for hire most of the time, and I’m terminally addicted to new projects, so I’ve generally got upwards of a dozen different development projects active at any one time. Few of them use the same language/framework combination, and all of them need to be kept separate, and talk to each other only in the prescribed ways. Moore’s law has made that a lot easier, with things like VirtualBox able to run several VMs at once, but getting them all consistently set up and easy to control was always a bit of an arse. Vagrant is my dev-box wrangler of choice right now. It could do a lot more, but mostly I use it to bring up and shut down development VMs as I need them, safe in the knowledge that I can reformat the environment with a single command, and – with most projects, after prep work I’ve already done – anyone can set up a fresh and working dev environment in a few minutes.

(In theory. In practice there’s always some “How up to date is your system” crap)

Plus, the command line history always looks like it’s instructions for some kind of evil gift-giving robot. Vagrant up! Vagrant Provision! Vagrant Reload! VAGRANT DESTROY!
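Those commands are, roughly, the whole lifecycle. As a sketch of the “reformat the environment with a single command” case, here is a helper that throws away a project's VM and builds it fresh. `vagrant destroy -f` skips the confirmation prompt, and `vagrant up --provision` forces the provisioners to run. The function name is hypothetical, and `VAGRANT` can be overridden for a dry run.

```shell
#!/bin/sh
# Sketch: rebuild a project's development VM from scratch.
VAGRANT=${VAGRANT:-vagrant}

# rebuild_devbox PROJECT_DIR  (the directory containing the Vagrantfile)
rebuild_devbox() {
    (
        cd "$1" || exit 1
        $VAGRANT destroy -f &&      # VAGRANT DESTROY! (minus the "are you sure?")
        $VAGRANT up --provision     # boot a fresh VM and run the provisioners
    )
}

# Everyday commands, for comparison:
#   vagrant up         start the VM (creating and provisioning it on first run)
#   vagrant provision  re-run the provisioners on the running VM
#   vagrant reload     restart the VM, picking up Vagrantfile changes
```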

It’s a year or so since I switched almost everything to Vagrant environments, but it’s only in the last few months I’ve looked in more depth at using something other than shell scripts to provision them. I don’t really want to run a separate server for it, as I’m not working at that kind of scale, so Ansible is currently my provisioning system of choice.

Ansible technically breaks my rule that development environments should be platform agnostic, since it’s fairly militantly anti-Windows as a host platform, but with babun it works fine. (Babun is a Cygwin repackage, complete with a replacement for the awful Cygwin interactive shell, zsh, and a full package manager. If you take away nothing else from this, never install Cygwin again.)

I’m fairly lucky in that all my clients have standardized on git as their VCS of choice, as it’s my choice too. Tower absolutely shatters my platform independence rule, but it’s hands-down the best git GUI I’ve used, and its built-in git-flow support makes a lot of things easier. On Windows I’m using Atlassian SourceTree for the same job, which it does passably. I still wouldn’t recommend a git GUI unless you know how to drive the command line to some degree, if only because the terminology gets weird, but at the same time I’ve really liked being able to work with CLI-phobic front-end developers who could still commit directly to the repo and make changes without needing a dev to rebuild.
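Under the GUI, git-flow's feature branches are a naming-and-merging convention on top of plain git. Here is a rough sketch of what starting and finishing a feature amounts to. It assumes the usual `develop` branch and `feature/` prefix. The real `git flow feature finish` has more options and checks than this.

```shell
#!/bin/sh
# Sketch of git-flow's feature-branch convention in plain git.
# Assumes a "develop" integration branch and the "feature/" branch prefix.

# feature_start NAME: branch feature/NAME off develop and switch to it
feature_start() {
    git checkout -q -b "feature/$1" develop
}

# feature_finish NAME: merge feature/NAME back into develop, then delete it
feature_finish() {
    git checkout -q develop &&
    # --no-ff keeps an explicit merge commit, so the feature stays visible in history
    git merge -q --no-ff --no-edit "feature/$1" &&
    git branch -q -d "feature/$1"
}
```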

For that, and not much else, I’ll recommend the GitHub client (in both Windows and Mac forms). It’s the easiest-to-use git client out there, but it achieves that by hiding a lot of complexity rather than by only doing simple things. It will even work with non-GitHub repos, though it’s not terribly happy about the concept. It does have the massive advantage of being free, though.

For the full Rained On By The Cloud experience, the current primary deploy stack for the Skute backend involves pushes to GitHub branches automatically triggering CodeShip CI, which runs the test suite and then (assuming success, of course) deploys to Heroku. The secondary stack is similar, but deploys with Ansible to AWS (for Reasons; at some point in the future I’ll no doubt be writing in more depth about how I’ve built the backend for Skute). Leaning heavily on the cloud is, in IT as much as life, not entirely a good idea, but it’s a really good starting point, and redundancy is in place.
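The CodeShip configuration isn't shown here, but the rule it enforces is simple: only push to Heroku if the test suite passes. Here is a local shell sketch of that gate. The function name, the default remote (`heroku`) and the branch (`master`) are assumptions, and `GIT` can be overridden for a dry run.

```shell
#!/bin/sh
# Sketch: deploy to Heroku only when the test command succeeds.
GIT=${GIT:-git}

# deploy_if_green "TEST COMMAND" [REMOTE] [BRANCH]
deploy_if_green() {
    test_cmd="$1"; remote="${2:-heroku}"; branch="${3:-master}"
    # $test_cmd is deliberately unquoted so "python -m pytest" splits into words.
    if $test_cmd; then
        $GIT push "$remote" "$branch"
    else
        echo "Tests failed; not deploying." >&2
        return 1
    fi
}
```

For example, `deploy_if_green "python -m pytest"` runs the suite and pushes only on success.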

Heroku’s mostly been a good experience. We’ve run into some fun issues with their autodetection (they decided our Flask-based frontend service should be deployed as Node.js, because the asset build system had left a package.json in the root), but the nodes have been rock-solid. Anyway, I’ve drifted into specifics.
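There are two ways round that kind of misdetection. You can pin the buildpack explicitly (`heroku buildpacks:set heroku/python` with the Heroku CLI), or you can catch the stray file before it's pushed. Here is a sketch of the latter as a pre-deploy check. It only looks for the two marker files involved here, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: warn when a Python app's root also contains a Node.js marker file,
# which can make buildpack autodetection pick Node.js instead of Python.
check_buildpack_markers() {
    dir="${1:-.}"
    if [ -f "$dir/requirements.txt" ] && [ -f "$dir/package.json" ]; then
        echo "warning: $dir has both requirements.txt and package.json" >&2
        return 1
    fi
}
```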

Other dev utilities I couldn’t live without? PuTTY on Windows, for all the normal reasons. ExpanDrive is a Windows/Mac utility for mounting SFTP services as logical drives (or, indeed, S3 buckets or a dozen other similar things). LiveReload automatically watches and recompiles CoffeeScript, SASS, LESS etc. when necessary. Sequel Pro is an OS X GUI for MySQL access… and then there’s Evernote, where checklists and almost every other bit of writing that isn’t also code go.

There’s probably more, but that’ll be another article now.


Windows 7: How To Automatically Back Up Your PuTTY Connections

Go to:

  1. Control Panel
  2. Administrative Tools
  3. Task Scheduler
  4. Create Basic Task (In the bar on the right)
  5. Name: “Backup Putty Connections”
  6. Next
  7. Run Daily
  8. Next, Next, Next (Until “Start a Program”)
  9. Program/Script: C:\Windows\regedit.exe
  10. Arguments: /E "Putty_connections_backup.reg" "HKEY_CURRENT_USER\Software\SimonTatham"
  11. Start In: (The directory to put the backups in. Somewhere in your Dropbox would be good)
  12. Open Properties when finished
  13. Finish.
  14. Check the “Run with highest privileges” option. (If the Properties window didn’t open, find your new task, clicking “Task Scheduler Library” if necessary, then right-click it and select “Properties”.)
  15. Right-click the task again.
  16. Select “Run”.
  17. Make sure the backup file has been created.

How to restore them:

  1. Install PuTTY
  2. Double-click the backup .reg file to import it into the registry.


You should back up your data.

You should periodically review this backup system to make sure you can get data back off it, but you should back up your data.
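Checking that you can get data back can be automated too. Here is a minimal sketch. It assumes the backup is a plain copy of a directory that can be compared file by file, which is true of a synced folder or an rsync mirror but not of compressed or deduplicated archives. The function name, the paths and the cron schedule are all hypothetical.

```shell
#!/bin/sh
# Sketch: compare a backup copy with the live data and complain if they differ.
# verify_backup LIVE_DIR BACKUP_DIR
verify_backup() {
    if report=$(diff -rq "$1" "$2" 2>&1); then
        echo "OK: $2 matches $1"
    else
        printf 'Backup differs from live data:\n%s\n' "$report" >&2
        return 1
    fi
}

# Schedule it rather than remembering to run it, e.g. a crontab line calling a
# hypothetical wrapper script around this function:
#   0 3 * * 0  /usr/local/bin/verify-backup /home/me/Documents /mnt/backup/Documents
```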

I’ve been doing this “computer guy” thing for a while now, and these are three things I have learnt are always true:

  1. The only way to be absolutely sure you get rid of a virus is to nuke the system it’s sitting on.
  2. Backup solutions that require the user to perform an action don’t work.
  3. The universe tends to irony WRT backup systems. And everything else.

The second is always true: from the lowliest “I want the blue ‘e’ to work again” casual user through the most fan-speed-obsessed power user to the most jaded and cynical sysadmin, if your backup system requires you to actually do something, then it’ll happen less often than it needs to, and the time you fail to do it, or fail to check it, will be the day your hard drive dies.

Apple’s Time Machine is great, because it automatically syncs stuff to an external hard drive or network backup as you’re using the machine. If you have a laptop with it, set it up with a network backup, because while it warns you about your last backup being out of date, the warning will go away if you tell it to, and that’s a recipe for badness.

Windows 7 has a great backup system too. It will continually pester you until you set up some kind of backup, and then it will pester you if that stops working.

Neither of them is good enough, though. You can ignore the messages; you can forget to plug in the external drive every so often.

Obviously this is where I explain how my backup system is the ultimate, most perfect way to solve this problem, but it isn’t. This is the way that works for me:

My music, documents and projects get synced to Dropbox on a continuous basis as I’m using them (it used to be just Documents, but I started using it as a full backup system last year) and from there synced back down to my other machines (directly over the LAN, with the latest version). By sharing subfolders with other accounts, Documents & Projects get synced down to my laptop when it connects to wifi (it doesn’t have the disk space for the whole thing), and as soon as Dropbox sorts out sub-folder syncing even that complication will go away. Various subfolders are shared with other actual people too, for the things I work on with other people.

It’s good, it works, I know within a few hours if it’s broken, and I don’t have to think about it.

Which is, for me, the perfect backup system.

(Downsides mainly involve trusting a third party with your data)

If Ubuntu’s magic personal backup system were cross-platform, I’d probably use that (this syncs between three OSs), but trading into a different closed ghetto isn’t something I consider a good idea.


To remove a host that denyhosts has banned

Denyhosts is a utility that automatically bans IPs that attempt to SSH into your server and get three wrong passwords. This is great when people are dictionary-attacking your SSH server, but less good when you have actual users who might get their password wrong.

The FAQ for denyhosts says how to fix this if it happens and your users are banned, but it’s a bit faffy, so I’m putting my script here. It works for me, it may screw your life up. Backups are your friend.

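#!/bin/sh
# Usage: unban.sh IP-OR-HOSTNAME   (run as root)
REMOVE="$1"
[ -n "$REMOVE" ] || { echo "usage: $0 IP-OR-HOSTNAME" >&2; exit 1; }

/etc/init.d/denyhosts stop
cd /var/lib/denyhosts || exit 1
for THISFILE in hosts hosts-restricted hosts-root hosts-valid users-hosts; do
	[ -f "$THISFILE" ] || continue
	mv "$THISFILE" "/tmp/$THISFILE"
	# -F: match the address literally (dots aren't wildcards); -w: whole words only,
	# so unbanning 10.0.0.1 doesn't also unban 10.0.0.12
	grep -vwF -- "$REMOVE" "/tmp/$THISFILE" > "$THISFILE"
	rm "/tmp/$THISFILE"
done
mv /etc/hosts.deny /tmp/hosts.deny
grep -vwF -- "$REMOVE" /tmp/hosts.deny > /etc/hosts.deny
rm /tmp/hosts.deny
/etc/init.d/denyhosts start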

Needs to run as root, or as someone with access to all of denyhosts’ files (plus hosts.deny).

2015 Addition:

As time has moved on, service management’s changed a bit. For Debian-derived distros (Ubuntu, probably Mint?) you’ll need to replace the /etc/init.d/denyhosts lines with “service denyhosts stop” etc. Slackware uses “/usr/share/denyhosts/daemon-control”. Check what your own system uses; everything else should still be fine. Thanks to Bill B and Velimir Kalik in the comments.
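Because the right start/stop command depends on the system, one option is a small wrapper that picks whichever mechanism is present. With it, the `/etc/init.d/denyhosts stop` and `start` lines in the script above could become `denyhosts_ctl stop` and `denyhosts_ctl start`. This is a sketch that assumes the service is called `denyhosts` everywhere and that Slackware's `daemon-control` accepts `start` and `stop`. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: start or stop denyhosts with whatever service manager this box has.
# denyhosts_ctl start|stop
denyhosts_ctl() {
    if command -v systemctl >/dev/null 2>&1; then
        systemctl "$1" denyhosts
    elif command -v service >/dev/null 2>&1; then
        service denyhosts "$1"
    elif [ -x /usr/share/denyhosts/daemon-control ]; then
        /usr/share/denyhosts/daemon-control "$1"   # Slackware
    else
        /etc/init.d/denyhosts "$1"
    fi
}
```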