Tuesday, May 09, 2006

A fascination with recompiling kernels

Okay, another trend for you:

Why do people who use Linux INSIST on recompiling their kernels over and over again? I'm a regular on several Linux forums and I see a lot of questions along the lines of "I've recompiled my kernel and now something doesn't work". They are usually not talking about upgrading to a new kernel, but about recompiling their already-working kernel with different options (usually to take everything that is modularised and build it into a single, fat kernel).

Now there are several very good reasons for customising a kernel in such a way. If you are using PXE boots, USB keys, bootable CDs, (going back a bit now) single-floppy distributions or something else with filesystem limitations (whether they are direct technical limitations or, with PXE for instance, something like bandwidth), then a single-file, small, customised kernel is ideal. However, most people asking these types of questions are home desktop or laptop users.

When you have gigabytes of space in which to store hundreds of unnecessary modules, recompiling the kernel ultimately costs you lots of time: time in preparation, configuration, compilation and testing and, ultimately, in the problems brought about by missing a vital checkbox when configuring the new kernel.

One oft-touted advantage is speed. Recompiling a kernel to target a different architecture may or may not give you a speed advantage (there are benchmarks which suggest there isn't much difference at all between any of the x86 architectures), and for general use you wouldn't even notice it. The current theory is that compiling a kernel with only the modules you require, built in rather than loadable, somehow makes it faster.

I can't see this myself. A module is ONLY ever loaded when you (or your hotplug programs) tell it to load. Otherwise it's not even in RAM, let alone taking any of your CPU time. The time it takes to load a module is minuscule, and it's a rare event... maybe ten or twenty times each time your computer is booted, if that? A few milliseconds each time, once properly cached? And most of that happens at bootup, where things can hang around forever detecting hardware anyway. After that, a module is only loaded when you actually insert or remove some piece of hardware (and even then it may already be loaded for something else, or decide to stick around). Compared to most other possible optimisations, that's an abysmally small one. Building this stuff into a monolithic kernel every time doesn't save you anything in terms of processor time.
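You can see the cost for yourself in about thirty seconds. The module name below is only an example - substitute any module your kernel actually ships:

    # nothing is loaded that isn't needed right now
    lsmod

    # load a module on demand and see how long it really takes
    time modprobe usb-storage

    # unload it and load it again - now the file is sitting in the page cache
    rmmod usb-storage
    time modprobe usb-storage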

However, having a module PERMANENTLY in RAM even when you are not using it does not seem an efficient use of such a vital resource. By loading modules only on demand, surely you are saving enough RAM to cache anything you may want to load off your disk (e.g. modules) and thereby increase the system speed overall.

Additionally, hand-configuring a kernel is an enormously time-consuming task, especially for the unskilled, who may omit several vital options. And when you next purchase hardware, you're going to have to do it all over again. When you insert a USB device, the right options had better be ticked or you won't see ANYTHING. Borrow your friend's USB hub, which happens to need slightly different modules, and you'll have to recompile the ENTIRE kernel all over again. Plug in a USB device that uses a driver you didn't build in and, guess what, you have to recompile.

Why not just use a standard, modularised kernel that has modules for anything you could EVER use (but which merely lounge in a few tens of megabytes of disk), save yourself several hours of configuration, compilation, testing and frustration, and sacrifice only those minuscule, negligible, unnoticeable optimisations for the sake of more time actually USING the machine?

There are a lot of people on these forums complaining that something didn't work, and their eventual solution is to recompile the kernel to incorporate some module that they didn't have the foresight to include. Add up the time taken to discover the problem, the time to diagnose it (which must be non-trivial if they have to post on a forum for help with it), the time to recompile the kernel and reinstall it in the right locations, then to reboot and test.

Is anyone seriously telling me that that is going to be LESS time than the productive time they would have lost by having a single module loaded automatically from disk under their previous, standard kernel configuration? I use hardware which, on the whole, is considered obsolete (it's cheaper and it does everything I need it to) and yet I still don't notice any significant improvement from not using modules.

Kernels take on the order of minutes to compile on modern machines, but the human element - configuring it correctly, installing it, rebooting and so on - is where the time is really wasted.
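For anyone keeping score, the manual cycle on a 2.6-series kernel looks roughly like this (paths and the bootloader step are illustrative and vary from system to system):

    cd /usr/src/linux
    make menuconfig                     # the part where vital checkboxes get missed
    make && make modules_install        # the bit the machine does for you
    cp arch/i386/boot/bzImage /boot/vmlinuz-custom
    cp System.map /boot/System.map-custom
    # point /etc/lilo.conf (or your GRUB config) at the new image, then:
    lilo
    reboot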

Also, when troubleshooting, omitting problematic modules can be necessary. With a modularised kernel this is simply a question of blacklisting them in the right configuration file, editing a simple script to prevent insmod or modprobe from loading them, or even just changing the permissions on the module file itself. Doing the same with an all-in-one kernel involves, yes, yet another compile, install, reboot, etc...
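As a rough sketch of the modular approach (file locations differ between distributions, so treat the paths as examples):

    # stop a troublesome module being auto-loaded, e.g. the PC speaker driver
    echo "blacklist pcspkr" >> /etc/modprobe.d/blacklist

    # or, more forcefully, make any attempt to load it do nothing at all
    echo "install pcspkr /bin/true" >> /etc/modprobe.conf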

I use Slackware, which comes with the kernel config file used to compile the main Slackware kernel. It incorporates a lot of junk, most of it as modules, but it works on virtually ANY x86 machine (even 64-bit, etc.) with little to no performance loss. I gladly sacrifice 50MB of hard disk space to be able to buy/borrow ANY Linux-compatible hardware, stick it in the machine (which doesn't always even involve shutting it down, if you include USB, FireWire, PCMCIA etc.) and have it work without me having to TOUCH the software side of things.
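If you're curious what that "sacrifice" actually amounts to, it's easy to check (the numbers will obviously vary from machine to machine):

    # total disk space taken by every module shipped for the running kernel
    du -sh /lib/modules/$(uname -r)

    # how many modules are available (.ko files on a 2.6 kernel)...
    find /lib/modules/$(uname -r) -name '*.ko' | wc -l

    # ...versus how few are actually loaded right now
    lsmod | wc -l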

People lend me their USB keys, their portable CD-RWs, their scanners, their printers etc. so that I can diagnose them. So long as my software is up-to-date (which takes two commands at most), I know that anything Linux-compatible will run without me having to play with my machine or reboot it. The time I save by not recompiling the kernel each time more than makes up for any imaginary performance saving.

And when a new kernel comes out, the same config STILL works (make oldconfig). I get asked whether I want to include X or Y in the new kernel and I always select the best option - the one that lets me use it if I ever need to (though I don't think I'll have a gigabit card, etc. in my personal machine any time soon) but doesn't let it get in my way - build as module.
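The whole upgrade routine amounts to something like this (directory and file names are illustrative):

    cd /usr/src/linux-2.6.xx            # the new kernel source tree
    cp /path/to/previous/config .config # the config Slackware shipped, or your old .config
    make oldconfig                      # only asks about options new to this release
    # answer "m" wherever it's offered, then build and install as usual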

Now let's pretend that my motherboard blows up tomorrow. So what? I move the hard drive to a spare machine and it boots up just the same, taking account of the new hardware without panicking, hanging, missing the required modules, or requiring me to play about with kernels and LILO/GRUB just to get the damn thing to boot.

It's like defragging - there is a purpose to it, but for most tasks the trade-off just doesn't add up. Spend three hours defragging to save milliseconds of disk access time over the course of a month or so, until the filesystem fragments again? There are scenarios where it may well be worth it, or have been worth it in the past, but most of the time you're just sitting staring at a screen when you could be getting something done.