Archive for January, 2012

Archive for January, 2012

Writing a Simple OS Kernel — Part 3, Syscalls

Posted on

Updated 2012-33 to fix a bug in the context switch assumptions.
Edited for clarity, 2012-34.

Last time, we got our kernel to set up space for a task, and run that task in user mode. This time we’re going to add a facility for the user mode task to call back into the kernel.


But wait! Didn’t I say last time that one of the things user mode tasks cannot do is change the CPU mode? Well, yes, they can’t change it directly. What they can do is trigger an event that will make the CPU switch to supervisor mode and then jump to some predefined bit of code. That way, there’s no security problem, only kernel code is running in supervisor mode, but the user mode task can still ask the kernel to do things for it.

The instruction that lets us do this on an ARMv6 CPU is svc (used to be called swi). It takes an immediate value as an “argument”, which it actually does nothing with. If the kernel wants to use that number for something (like specifying what the user mode task wants done), it has to read the number right out of the instruction in RAM. This is doable, but not always ideal, and so some kernels (such as modern Linux) actually just always use a zero, and then store information they want to pass elsewhere.

Vector Interrupt Table

The svc instruction actually causes an interrupt. Interrupts are signals that (when they’re enabled) cause the CPU to switch modes and jump to some predefined instruction. What instruction will it jump to? Well, ARM CPUs have something called the vector interrupt table that is at the very start of RAM. These locations are where it will jump to (each word, starting at 0x0, is the location of a particular sort of interrupt, up until there are no more interrupt types).

That may not seem very useful. We can only execute one instruction? Well, yes, but that instruction can jump us to somewhere more useful. Now, your first thought may be to put a branch instruction there. Great idea, but it won’t work. Branch instructions are calculated relative to their position in the linked binary. If we copy one of them to another location in RAM, the offset will be wrong. We need to use a load instruction to load an absolute value into the program counter. What value should we load? The address of a function in our kernel, of course! It turns out that the assembler contains a syntax for including the address of a function directly, which is then an absolute value and so does not move when we copy it.

Here’s what the new bootstrap.s looks like:

.global _start
	mov r0, #
	ldr r1, =interrupt_table
	ldr r3, =interrupt_table_end
	ldr r2, [r1, #]
	str r2, [r0, #]
	add r0, r0, #
	add r1, r1, #
	cmp r1, r3
	bne keep_loading

	ldr sp, =0x07FFFFFF
	bl main

	ldr pc, svc_entry_address
	svc_entry_address: .word svc_entry

Why do we need to copy two instructions? Well, even that load instruction is loading from a relative address. Luckily if we move them both then the relative position remains the same. This may look a bit complicated, and that’s because it is. You could just copy the two words directly across, but this way we have a loop that copies everything from our interrupt table section, so we can easily add other interrupt handlers to it later. You’ll note we started at 0x8 instead of 0x0, because that’s where the SVC handler is, so if we want to add ones that come before 0x8, we just have to remember to change the base address at the start. With some hacks we could do the copying part in C, but for this example I decided that keeping all the bits for this in assembly was easiest.

Syscall Wrapper

We need a way for our C code to call the svc instruction. Since we don’t have anything we really need to pass through, we’ll just add a dummy wrapper for now.

Create a new file called syscalls.s and add it to the Makefile as a dependency of kernel.elf. Put this in it:

.global syscall
	svc 0
	bx lr

Pretty simple. It just activates the svc, and then when it comes back jumps to the caller (note that this assumes the registers it had before it called svc are reset before it comes back).

You’ll also need to add a line to asm.h:

void syscall(void);

Save Kernel State

Now, let’s think about what we want the kernel to do when it actually gets control back. We’d like to go back to the point in our kernel code we left off when calling activate in the first place. The problem is, we’ve not got any clue where that was! In loading the user mode task state, we did not keep around any information about what state the kernel was in. Let’s add that to activate now:

mov ip, sp
push {r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}

Put that at the top of activate, before we begin messing about with the registers. That’s all the kernel state we need to save!

SVC Entry

Alright, we’re now ready to define our first version of the svc_entry function. All we really want to go at this point is get the kernel back into a state where it can run, and then return to our code. We’ll put this in context_switch.s:

.global svc_entry
	pop {r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	mov sp, ip
	bx lr

Just reverse the save we just did, and jump back to where we came from in the kernel. How is lr where we came from? Well, you’ll have noted that when we call functions (and the C compiler does this as well), we use bl, which saves the address of the instruction after itself in lr before jumping to the function.

Alright, add a call to syscall to the end of first and then add another bwputs call after the activate call in main, build, and run. You should see your new message printed last.

Code so far on GitHub

Heading Back to User Mode

Add another print and another syscall to first, and another print and call to activate to main. Compile and run your code. What do you see?

When we call activate again, the user mode task re-starts at the beginning! That’s not what we want, but it makes sense. We never saved our place inside the user mode task. In fact, we don’t even have a reference to where its stack was when it called the syscall, so we’re just going back with the stack from before. That’s not going to work. We need to add some code to our SVC entry to save the user mode task’s state. If you recall the way we set up the task before, you’ll now see why. The way the stack looks after we save our state on it is exactly the same as the way we set it up! Everything will be where activate expects it:

msr CPSR_c, # /* System mode */
push {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
mov r0, sp
msr CPSR_c, # /* Supervisor mode */

mrs ip, SPSR
stmfd r0!, {ip,lr}

There, all the registers are on the user stack. We use stmfd which is just like push, but lets us operate on another register. You’ll note we had to save both versions of lr. One is state for the syscall wrapper to use, the other is the address inside the syscall wrapper we need to jump back to later. Now we just need some way to get the new location of the top of the stack back to the kernel.

Conveniently, our C code expects the return value of a function to be in r0, which is where I’ve put the user mode sp in this example. Just change the definition of activate in asm.h to return the right type, and then assign the return value somewhere and pass that to the next call to activate. You can now keep calling into your user mode task as many times as you want!

That’s It!

We now have a kernel that can start a user mode task, and switch back and forth between the kernel and the user mode task at will. Next time we’ll look at multitasking and maybe some other stuff.

Code for this post is on GitHub.

Writing a Simple OS Kernel — Part 2, User Mode

Posted on

Updated 2012-33 to fix a bug in the context switch assumptions.
Edited for clarity, 2012-34.

Last time, we got a basic bit of C code bootstrapped and running all by itself on QEMU. This is often called “bare-metal” programming. This time we’re going to add to that and do something a bit more complicated: we’re going to run a single user-mode program on our kernel.

What is User Mode?

User mode is the name given to the mode we put the computer in when running anything other than the kernel. This often has implications for memory protection and similar, but we don’t have any of that. For us it’s just going to be the CPU mode in which certain things (such as changing CPU modes) cannot be done.

User mode also often refers to the slices of the system resources given to different processes, etc, such that they can all be run on the same computer without interfering with each other.

First, some abstractions

We hard-coded some values and functionality last time to write to the serial port. That code was very simple for printing one statement, but we may want to clean it up a bit if we’re going to use the serial port a lot.

First, let’s pull out everything machine-specific from kernel.c and put it in a header file for the machine, call it versatilepb.h:

# UART0 ((volatile unsigned int*)0x101f1000)

So far we just throw the characters at the serial port as fast as we can, and hope they get caught. That will probably always work on QEMU, but will not work on real hardware, so let’s add the ability to check if the serial port can handle a byte right now. For that we’ll need two more constants:

# UARTFR 0x06

UARTFR is the offset (in words) from the UART0 base address of the flags for the serial port. UARTFR_TXFF is the mask that gets the bit representing if the transmit buffer is full.

Now you can put the following at the top of kernel.c:

# "versatilepb.h"

void bwputs(char *s) {
	while(*s) {
		while(*(UART0 + UARTFR) & UARTFR_TXFF);
		*UART0 = *s;

Now you can replace the big mess in main with bwputs("Hello, World!\n"); or any other message. Much cleaner!

Code so far on GitHub

Calling Into Assembly Code

Last time, we had a small piece of assembly code that called in to our C program. This time, we are going to want to call into assembly code from our C code. If you are on Ubuntu or some other systems, your C compiler may be defaulting to generating “thumb mode” instructions, which are not what we need to use for our assembly code. So we need to tell the C compiler to generate normal ARM instructions. To do this with gcc, add -marm to the end of your CFLAGS

A User Program

Create a simple “user program” in the form of a function in kernel.c called first that just prints and then hangs (since we will not build the ability to get out of user mode until later):

void first(void) {
	bwputs("In user mode\n");

Assembly stub

Create a new file called context_switch.s with the following:

.global activate

And a new file called asm.h with the following:

void activate(void);

Include this new header file into kernel.c so that you can call the assembly stub from your C code, then add a call to activate to your main.

Finally, add context_switch.o as a dependency to kernel.elf in your Makefile so that it will get built.

The Context Switch

Alright, what are the absolute minimum things we need our switch to user mode (called the “context switch”) to do? Well, it the very least we need a way to start running some function in user mode.

The way to switch an ARM system into user mode is to use the movs instruction to put some address to jump to (like the address of our function) into the pc register (the “program counter”, which is where the CPU is currently executing). But what mode will the CPU enter when we do this? The answer is that it will read the contents of a special register called SPSR (Saved Processor Status Register) and use that to change CPSR (Current Processor Status Register), and thus change modes. Couldn’t we just change CPSR directly? Because we’re not in user mode, we could, but since we want to jump into our function the moment we switch modes, this is the safest way to do it:

mov r0, #
msr SPSR, r0
ldr lr, =first
movs pc, lr

0x10 is just the value that sets the bit meaning “user mode”. We set that to SPSR, load the location of first and then movs there.

You can stick that at the start of activate and try to run that if you like, but it won’t work. Why is that? Remember how we had to set up the stack in order to jump into C code? Well, it turns out that one of the differences of user mode is that it uses a different sp register. This can be very handy later when we’re doing more complicated things, but for now we can just set the user mode stack to be the same as the kernel stack, by adding the following before the movs:

mov ip, sp
msr CPSR_c, # /* System mode */
mov sp, ip
msr CPSR_c, # /* Supervisor mode */

So what are we doing here? We copy our current sp to ip (because we’ll have a different sp in user mode, so we need to copy it somewhere), then we set a part of CPSR directly to enter “system mode”. What is system mode? It’s a special mode on the ARM processor that lets us access the registers as though we were in user mode, but still be able to do privileged things. We set user mode sp to our copy, then switch back to supervisor mode (which is where we normally operate in the kernel).

If you build the kernel now, and run it under QEMU, you should get “In user mode” printed out. Good job!

Code so far on GitHub

A Better Stack

Using the same kernel stack for our user mode program isn’t going to work very well if we want to be able to pause the program and go back to it, because other things will use the kernel stack in between, so we really want the program to have it’s own stack.

First, declare some space for your user stack:

unsigned int first_stack[256];

Then pass first_stack + 256 to activate, and change asm.h to have activate take an argument.

The first four arguments to an assembly call come in as r0-r3, so we can easily access this parameter inside activate:

msr CPSR_c, # /* System mode */
mov sp, r0
msr CPSR_c, # /* Supervisor mode */

One less line, since we can access r0 from user mode directly.

Less hardcoding

The program should still run, but now it’s using its own stack. We still have the value of SPSR and the name of the function we’re calling hardcoded into the assembly. We could pass these as parameters, but then we would have to remember them in a special way when it comes to being able to enter and re-enter the same user mode function multiple times (since the current stack, CPU mode, and entry point can change between calls), so it’s actually easiest to store these two additional values on the user mode program’s stack. We will store them in special positions so that all the user mode registers can be saved along with them easily.

We’ll want to move the calculation of the end of the stack up, so that we can put our data into in:

unsigned int *first_stack_start = first_stack + 256 - 16;
first_stack_start[0] = 0x10;
first_stack_start[1] = (unsigned int)&first;

You’ll note the cast to (unsigned int) of the function pointer. This is mostly to make the compiler not warn us about using a function pointer as data. You should now pass first_stack_start to activate. If you want to, you can test that it still works, but we aren’t actually using this new data yet.

We’ve done a lot of work on the assembly, and are about to change it quite a bit, so I’ll reproduce the whole context switch here with the changes to use these values:

.global activate
	ldmfd r0!, {ip,lr} /* Get SPSR and lr */
	msr SPSR, ip

	msr CPSR_c, # /* System mode */
	mov sp, r0
	pop {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	msr CPSR_c, # /* Supervisor mode */

	movs pc, lr

This loads the first two elements from the passed-in stack to SPSR and lr (ldmfd, when coupled with the ! on the first argument, is the same as pop, but works with any register instead of just sp), then we switch to usermode and set the stack to r0, as before. Finally we use the pop instruction to load the rest of the stuff into our registers. We have nothing in there just now, but we’ll use more later, and this also makes the stack skip over all that stuff so that the process can use all of the space.

You’ll note we had to set lr twice. This is because, like sp, lr has a different version used in user mode from the one used in supervisor mode.

That’s it!

We now have a kernel that sets up a user mode task and then switches to it. Next time: getting back out of user mode!

The code for this post is on GitHub.

Writing a Simple OS Kernel — Part 1

Posted on

This is the first of what I intend to be a series of posts detailing the specifics of writing and testing a simple OS kernel.

Because some parts of any kernel have to be platform-specific, I had to choose a platform to target for this example. In order to make the assembly code simple but still relevant, I chose ARMv6 for the architecture target. In order to make testing easy, I chose the versatilepb board as a target, which is very well supported by QEMU. As a result (unfortunately) the target is little-endian, but you can’t have everything. I will try to make sure I point out when something platform-specific is going on.

Setting up your system

First, you’ll need a build system for the ARM target. On Debian you’ll need to add the emdebian repository to get a cross-compiling toolchain. On Ubuntu it is already in the repositories. Instal the gcc-arm-linux-gnueabi package. Other systems may vary. This package adds all your standard tools (like gcc, ld, etc) but with as arm-linux-gnueabi- prefix.

Then, you’ll need the qemu-system package for use in testing your code.

Building code

Create a file called kernel.c containing the following:

int main(void) { return 0; }

It’s not very exciting, and we’ll revisit it in a minute, but we need something there or we can’t test our build.

One thing our build system is going to need to handle is running the arm-linux-gnueabi- prefixed commands instead of the normal toolchain. There are multiple ways you can choose to do this:

  1. Write a simple env shell script that you source in any terminal you cross-compile in. In this script, set aliases for all the standard commands to the cross-compiler commands.
  2. Write a simple env shell script that you source in any terminal you cross-compile in. In this script, set environment variables (like CC, LD, OBJCOPY) to point at the cross-compiler commands.
  3. Set the cross-compiler commands as the defaults for your make variables at the top of your Makefile.

Any of these ways, and probably many others, can work. I’m going to choose option 3 for this example, so I’ll start my Makefile off with:


We’re going to need to send some options to the C compiler to get it to generate the right sort of output, so that means defining a CFLAGS make variable. We need to specify the target architecture, tell the compiler to emulate floating point in software and to generate position-independant code, and tell the assembler to generate complete stack frames. I also prefer to turn on lots of warnings, so my CFLAGS looks like:

CFLAGS=-ansi -pedantic -Wall -Wextra -march=armv6 -msoft-float -fPIC -mapcs-frame

If you don’t know what some of these options do, you can check the gcc man page.

Now run the following to build the object code:

make kernel.o

Linking an ELF file

Now we have some code compiled for our target, but we also need to link it. You may be used to invoking the C compiler in order to link programs, but that won’t work for us because it actually links in some extra things that we don’t want, so we’ll need direct access to the linker by adding this to the top of our Makefile:


Next we’re going to have to send some special flags to the linker. One to tell it not to do a bunch of formatting and dynamic linking, and another to tell it where to start the code (since our code will not be loaded by an OS, it needs to sit at a specific place in RAM. This will depend on your bootloader and hardware. The memory address for the loader we’re using with QEMU is 0x10000). My LDFLAGS looks like:

LDFLAGS=-N -Ttext=0x10000

Finally, we need to write a make rule to actually link *.o files into *.elf files. Here it is:

.SUFFIXES: .o .elf
	$(LD) $(LDFLAGS) -o $@ $^

You should now be able to link your code with the following:

make kernel.elf

It probably gives you a warning about not being able to find the entry symbol. This is because the linker doesn’t actually expect to start in main, but rather in _start. We’ll deal with this in a bit.

Running the code

It turns out that qemu’s loader can actually load ELF files, so we now have something that we can run. Run the following:

qemu-system-arm -M versatilepb -cpu arm1176 -nographic -kernel kernel.elf

The switches just select the target and tell QEMU not to bother launching a video display. You will note that the program does not exit. This is because when you return 0; in our trivial code, you have nowhere to return to except QEMU’s loader, so the program just gets loaded again. Hit Ctrl-a and then hit x to quit.

Making the code do something interesting

Ok, now that we have some code building and running, lets make it do something. I’d like to emulate the good old “Hello, World!” program. Unfortunately, our code is everything running on the system. We have no standard out. We haven’t even initialised a video device. Are we going to have to write a whole video driver before we can even see anything?

Thankfully, serial ports come to our rescue. You may think serial ports are outdated, but they have one major adavtage: writing a serial port driver is trivial. When we run QEMU with -nographic, it is actually binding the first serial port of the emulated system to STDIN/STDOUT. Nice!

So, how does one talk to a serial port? Well, that is platform specific. If you look at the documentation for our target board, you’ll see that it uses reads and writes to special memory addresses in order to talk to devices. This is called memory-mapped I/O. The memory address we need to write to in order to send a byte over the first serial port (called UART0 in the documentation) is 0x101f1000. So let’s write a trivial C program to shove bytes into that memory address:

int main(void) {
	char *string = "Hello, World!\n";
	while(*string) {
		*(volatile char *)0x101f1000 = *string;

	while(1); /* We can't exit, there's nowhere to go */
	return 0;

Pretty simple. If you haven’t seen the volatile keyword before, it just tells the optomiser that this memory address needs to be treated like it is in RAM, otherwise the optomiser could decide to store it somewhere else, which wouldn’t work for us because we’re not actually storing a value, we’re talking to a device.

Notice that we also put an infinite loop at the end of the program, to keep it from going back to the QEMU loader.

Rebuild kernel.elf and run it with QEMU again. What happened? Nothing? Hmm, something’s not right.

Setting up the stack

C programs store their area on a stack in RAM. Unfortunately, we haven’t told anything where in RAM to put that, so our C program is not a happy camper. But how can we set things up if we can’t run code?

Well, we can run assembly code. This sounds more complicated than it is. All we need to do is set the stack pointer (which is a register that all C code defined for ARM knows about and uses to find the stack) and then call our main function. Here’s the code, saved as bootstrap.s:

.global _start
	ldr sp, =0x07FFFFFF
	bl main

Ah! There’s the _start symbol that our liker has been looking for! The first line just declares _start as visible to the linker. The ldr instruction sets the stack pointer to 0x07FFFFFF, which is the the address of the end of the 128MB of RAM that our QEMU system gives us. The bl instruction just jumps to our main function in the C program.

Building assembly code

We need to add some stuff to our Makefile so that we can build this new assembly file. Here’s the rule to build *.s files to *.o files:

.SUFFIXES: .s .o
	$(CC) $(CFLAGS) -o $@ -c $^

We also need to add a new rule for kernel.elf now, beacuse we’re linking multiple *.o files into it:

kernel.elf: bootstrap.o kernel.o

We’re done!

If you rebuild kernel.elf and run it with QEMU again, you should see “Hello, World!” printed in your terminal. Well done. I think that’s enough for one blog post, but I’ll be back with more.

The code from this post can be found on Github.