I’m on the train to London now, using the internet thanks to National Express including it for free on their trains. I’d be online anyway, since I can always use my phone as a modem (as an aside, that, plus 3G support, are all that’s stopping me from getting an iPhone when my T-Mobile contract expires).
This weekend is The Big One for our server. It’s getting an absolutely massive overhaul that should see us going from a non-virtualised Gentoo Linux machine to a Xen-based system using Ubuntu 7.10 Server as the Dom0 and various versions of Linux as DomUs. The biggest trick, as far as I’m concerned, is that we’re going to preserve the existing Gentoo install – and all its data – and bring it back to life as a VM.
We’re also carrying out some hardware work – RAM upgrade, and installing a second drive.
And that is where it started to get complicated.
The machine currently has an 80GB hard drive with the Gentoo install on it. I’m adding a second drive with a much higher capacity – 250GB. Because we’re using Xen I want to use LVM to make it easy to create partitions for the virtual machines. I also need to preserve the existing Gentoo install and was going to move it onto the 250GB drive, originally in an LVM partition.
The plan went through several iterations, increasing in complexity and madness, until, writing this, I had a moment of clarity, startled the people sitting near us by exclaiming ‘WTF?!’ and devised a much simpler – and saner – way of doing this.
But I’ll show you the madness first, as a warning to others of the dangers of not properly examining your plans.
After deciding not to move the Gentoo install onto an LVM partition, for fear of the subsequent Ubuntu Server install not recognising the existing LVM partitions (unlikely, but not unheard of), I decided to just make a physical partition for it – it could always be included in the LVM group after the services on the Gentoo install have been split off into VMs and the Gentoo install is no longer required.
At this point, the plan was to have some swap on both disks, the 80GB disk for Ubuntu Server, running as Dom0, an 80GB partition on the 250GB disk holding the existing Gentoo install, and the remainder of the 250GB disk as LVM, ready to take on new VMs created using xen-tools.
That wasn’t too bad, but it did involve copying the Gentoo install across disks and formatting the original disk, which can be done safely, but carries more risk than I’m really comfortable with. A lot of space would also be wasted on the 80GB disk, since the Ubuntu Server Dom0 isn’t going to take up anything like 80GB, and we wouldn’t be using RAID, which could add some resiliency to our system.
So I revised the already complicated plan to include formatting the 250GB disk to have five partitions, and the 80GB disk to have three. Both disks would have 4GB of swap, 20GB for the Dom0, and 50GB for one LVM group. The 20GB and 50GB partitions would be set up using Linux software RAID1. The additional two partitions on the 250GB disk would be an 80GB physical partition to receive the current Gentoo install and the remainder of the disk as a second LVM group.
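As a rough sanity check on that layout, here is a minimal Python sketch that adds up the partitions on each disk. It assumes the nominal sizes quoted above (real usable capacity will be a little lower), and the names in it are purely illustrative.

```python
# Nominal sizes in GB, as quoted in the post; these are assumptions,
# since formatted capacity is always slightly smaller.
DISK_GB = {"small": 80, "large": 250}

# Partitions that would appear on BOTH disks in the RAID plan.
SHARED_GB = {"swap": 4, "dom0_raid1": 20, "lvm_raid1": 50}

# Physical partition for the existing Gentoo install (large disk only).
GENTOO_GB = 80


def leftover(disk_gb, partitions):
    """Return the space left on a disk after carving out the partitions."""
    used = sum(partitions.values())
    if used > disk_gb:
        raise ValueError(f"plan needs {used} GB but disk has only {disk_gb} GB")
    return disk_gb - used


# What is left unallocated on the 80GB disk.
small_spare = leftover(DISK_GB["small"], SHARED_GB)

# The "remainder" of the 250GB disk, which becomes the second LVM group.
second_lvm = leftover(DISK_GB["large"], {**SHARED_GB, "gentoo": GENTOO_GB})

print(f"80GB disk unallocated: {small_spare} GB")
print(f"250GB disk, second LVM group: {second_lvm} GB")
```

On these nominal figures the 80GB disk is almost fully used, and the second LVM group ends up at a little under 100GB.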
50GB is enough for providing VMs for my paying customers and my own essential services, and the second LVM group would provide space for other VMs and experiments which didn’t justify a shot on the ‘important’ RAID-protected space.
So now we have two software RAID groups, two LVM groups, copying an entire disk still, and I left my house this morning with an awful headache.
My moment of clarity came when I started to write this down and it struck me just how complicated it is, when there’s no real need. Instantly a much better plan presented itself.
We’re carrying this work out tomorrow (tonight we’re installing RAM and the 250GB disk), and the new plan is to just leave the 80GB disk alone. I’m going to create three partitions on the 250GB disk: 4GB of swap (with an existing 2GB of swap on the 80GB disk), 20GB for the Ubuntu Dom0, and the remainder as LVM for virtual machines.
This means I never have to touch the most precious thing there – my customer data. I can use the existing partition as the VM disk without moving it at all. When the VM is ready to be decommissioned, I can simply add its partition to the LVM group, or create a new one. No RAID, but also no ‘wasted’ space, and no risk to the existing Gentoo install.
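The same back-of-the-envelope arithmetic applies to the simpler plan. This is only a sketch using the nominal sizes from this post; the variable names are made up for illustration.

```python
# Nominal sizes in GB taken from the post; actual formatted sizes will differ slightly.
LARGE_DISK_GB = 250
EXISTING_SWAP_GB = 2  # swap already present on the untouched 80GB disk

# Fixed-size partitions on the 250GB disk in the simple plan.
fixed_gb = {"swap": 4, "dom0": 20}

# Everything else on the 250GB disk goes to LVM for virtual machines.
lvm_gb = LARGE_DISK_GB - sum(fixed_gb.values())

# Swap available across both disks.
total_swap_gb = fixed_gb["swap"] + EXISTING_SWAP_GB

# For comparison: VM space in the RAID plan was a 50GB mirrored group plus
# whatever remained of the 250GB disk after 4 + 20 + 50 + 80 GB were carved out.
raid_plan_vm_gb = 50 + (LARGE_DISK_GB - (4 + 20 + 50 + 80))

print(f"LVM space for VMs: {lvm_gb} GB")
print(f"Total swap: {total_swap_gb} GB")
print(f"Extra VM space versus the RAID plan: {lvm_gb - raid_plan_vm_gb} GB")
```

The trade-off is visible in the numbers: the simple plan gives up mirroring but ends up with noticeably more room for VMs, and the 80GB disk is never reformatted.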
In the future – long after the Gentoo system is decommissioned – we plan to replace the hardware and everything will have nice hardware RAID 1 mirrored disks. In the meantime, this gives me a system that’s no worse than the current setup or my original plan and will be far quicker and safer to install and much easier to maintain.
It’s a plan that leaves me much happier than I was this morning.
The trap I fell into was over-thinking things and feeling that because I could deploy some technology I should deploy it. Taken to its logical conclusion, this will leave you with a system that’s completely unmanageable and a headache to set up. I’m glad that I caught this before we reached London – while mistakes provide an important learning opportunity, I’d rather learn before I make them.
While this new plan simplifies part of our upcoming work, there’s still a lot left to do. We’ve set aside Monday for continuing work if it’s disastrous, and I’ll write up my post-mortem on the train home on Tuesday.