Adding a system call to the Linux Kernel

So, despite ostensibly being a ‘systems’ guy, I haven’t spent too much time in my life getting hands-on with the Linux kernel. I’ve written tiny toy operating-system-like projects before, but haven’t done much open-heart surgery on real-life code.

I think this should change, so in my very limited spare time I’m doing some very simple projects to teach me more about the Linux kernel code layout. That way, if someone asks me in a job interview whether I’m comfortable hacking at the kernel level, I can answer ‘yes’ with far more conviction (I would probably answer positively anyhow, because I’m arrogant enough to think that it’s not beyond me, but there’s a lot of metaphorical difference between having the book on your shelf and having read it 🙂 ).

The first important goal was to get a working development environment where I could boot kernels that I had built myself quickly and easily. After some searching for a minimal Linux system – but rejecting embedded systems, I wanted minimal in the sense that little else ran but the kernel, init and the shell – I decided upon ArchLinux. Arch seems to be an absolutely no-nonsense, no bells and whistles kind of distribution, which is exactly what I was after. I didn’t want anything to obscure what was going on at the kernel level.

So I downloaded and installed Arch into a VirtualBox virtual machine (as ever, VirtualBox works flawlessly), and got to work.

Building a kernel from source is very easy – download the source (I’m on 2.6.25), make config / make bzImage / make modules / make modules_install. Getting it to boot was more of a pain, simply because for some reason the kernel couldn’t mount the rootfs which is an ext3 filesystem. It turned out that at some point in the configuration step, the ATA IDE modules weren’t being selected even for building, and this obviously leads to problems when trying to mount a filesystem on a (virtual) IDE device. Running mkinitcpio gave the right hints when it said it couldn’t find the ATA module. Fixing that and rebuilding finally gave me a bootable kernel, and I was in a position to start hacking.

Knowing how labyrinthine things can get, I didn’t want to start with anything too taxing. So I went for the obvious OS 101 extension project: adding a system call. System calls are essentially the kernel’s API to userspace, and are called (on 32-bit x86 at least) by loading the syscall number into the eax register – Linux has 326 at the moment on x86 – putting the arguments in further registers (ebx, ecx, edx and so on) and then invoking a software interrupt (int 0x80) which is trapped by the kernel. Once control passes to the kernel it can tell that a system call is to be called, find the right call point by indexing a function table with the system call number, and jump to the actual code.

System calls are only added to the kernel when there is a compelling case to do so, as once added they can never be removed for fear of breaking compatibility – userspace programs invoke syscalls by number, so removing one or renumbering the ones added after it breaks existing code. Therefore there’s not really much practical use in adding one, but a good deal of educational value!

The first thing I did was to write my syscall code. I didn’t really care what my syscall did, except I wanted its results to be verifiable so it needed to return some value. Going with an example from the book I am following (Linux Kernel Development, highly, highly recommended) I settled for a simple call that returns the size of the kernel per-process stack, as defined by THREAD_SIZE. Here’s the code:

asmlinkage long sys_kstacksize( void )
{
    return THREAD_SIZE;
}

Very, very simple. asmlinkage just tells the compiler to find the arguments for the function (although there are none in this case) on the stack. Everything else is self-explanatory. This goes in kernel/sys.c, which is where a good many of the other system calls are implemented.

Now the (relatively) tricky bit – plumbing the system call in such that when a call comes in for our syscall the kernel knows where to find it. This is architecture specific, but on x86 the only thing you have to do is to add it to arch/x86/kernel/syscall_table_32.S, right at the end of the list of extant syscalls. This list is assembled into a function call table such that the address of the syscall is at the offset of its syscall number. You can also infer the syscall number from this file as it’s well commented every five syscalls. My addition was number 327.

Then for completeness I added a #define __NR_kstacksize 327 to include/asm-x86/unistd_32.h so that there was a corresponding lexical definition for the syscall number – you don’t really want to see syscalls called by number in code because you’ve got no idea what’s being called.

Quick re-build of the kernel to check that everything’s ok, then on to a user space test. Most syscalls are already plumbed in to userspace via libc or similar. However, a newly minted syscall won’t have any convenient userspace stubs, so we have to call it manually.

This is where my book let me down – or at least, understandably, hadn’t kept up with the last three years of kernel development since I bought it. There used to be macros called _syscallN, for N arguments, that would generate userspace stubs for system calls. They no longer exist. It took me a little while to find out what had replaced them, but it turns out that there is a standard syscall(2) call that will invoke system calls on a userspace application’s behalf.

So, writing a tiny test program to invoke my new syscall was as easy as pie:

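#include <stdio.h>    /* printf() */
#include <unistd.h>   /* syscall() */
#include "../linux-2.6.25/include/linux/unistd.h"   /* __NR_kstacksize from the patched tree */

int main( void )
{
  long stack_size;

  /* No libc stub exists for the new call, so invoke it by number. */
  stack_size = syscall( __NR_kstacksize );
  printf( "The kernel stack size is %ld\n", stack_size );

  return 0;
}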

Testing was even easier – I ran it once on the kernel that came with Arch and got -1 as a response, which is how syscall(2) reports an error; errno is set to ENOSYS in this case, an indication that syscall 327 didn’t exist. Rebooting into my newly rebuilt kernel tells me that the kernel stack size is 8192. Success!

Ok, so that’s a really small start. I’ve proved to myself at least that I can deterministically affect the behaviour of the kernel (it doesn’t always feel that way when you’re working with such a complex piece of software). My intended future projects are more ambitious: the next thing I’m going to do is to understand the scheduler, and perhaps replace it with a massively less efficient one. Then I’m off into subsystem land – I want to write a small filesystem (an in-memory one might be a simple start) and then figure out the IP stack. Should be enough to keep me busy on Sunday evenings for a while!

5 thoughts on “Adding a system call to the Linux Kernel”

  1. That’s a pretty interesting project you have started there; looking forward to more articles in the same vein.

    Regards.

  2. Nice article. I haven’t used ArchLinux but it sounds like it would be good. If you’re going to be looking at kernel internals I would highly recommend “Understanding the Linux Kernel” and “The Linux Process Manager”; both a little out of date, but still excellent, especially the latter.

    Good luck!

  3. Hi,
    I am trying to do the same but got horribly confused, as I am trying to make the same modifications for a 64-bit kernel.

    So I added it to syscall_table_32.S (as there was no _64 equivalent), and when I went into unistd_64.h I was shocked to find that the total numbers of syscalls for the _32 and _64 variants are completely different.

    So if you can clarify how to do this in newer kernels, that will be sweet!

  4. Thank you very much for posting this tutorial, because it resolved so many problems for me… I encourage you to do more, and I hope you succeed in your projects

    :))
