Updated 2012-33 to fix a bug in the context switch assumptions.
Edited for clarity, 2012-34.

Last time, we got a basic bit of C code bootstrapped and running all by itself on QEMU. This is often called “bare-metal” programming. This time we’re going to add to that and do something a bit more complicated: we’re going to run a single user-mode program on our kernel.

What is User Mode?

User mode is the name given to the mode we put the computer in when running anything other than the kernel. This often has implications for memory protection and similar, but we don’t have any of that. For us it’s just going to be the CPU mode in which certain things (such as changing CPU modes) cannot be done.

User mode also often refers to the slices of the system resources given to different processes, etc, such that they can all be run on the same computer without interfering with each other.

First, some abstractions

We hard-coded some values and functionality last time to write to the serial port. That code was very simple for printing one statement, but we may want to clean it up a bit if we’re going to use the serial port a lot.

First, let’s pull out everything machine-specific from kernel.c and put it in a header file for the machine, call it versatilepb.h:

#define UART0 ((volatile unsigned int*)0x101f1000)

So far we just throw the characters at the serial port as fast as we can, and hope they get caught. That will probably always work on QEMU, but will not work on real hardware, so let’s add the ability to check if the serial port can handle a byte right now. For that we’ll need two more constants:

#define UARTFR 0x06
#define UARTFR_TXFF 0x20

UARTFR is the offset (in words) from the UART0 base address of the flags for the serial port. UARTFR_TXFF is the mask that gets the bit representing if the transmit buffer is full.
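
To see why an offset of 0x06 "in words" reaches the flag register, here is a small hosted C sketch, meant for your desktop compiler rather than the kernel. The helper names uartfr_byte_offset and tx_fifo_full are hypothetical, and the register block is just an ordinary array standing in for the hardware. It only illustrates that adding 0x06 to a pointer to 32-bit words moves 0x18 bytes, and how the mask picks out the "full" bit.

```c
#include <stddef.h>
#include <stdint.h>

#define UARTFR 0x06       /* offset of the flag register, in 32-bit words */
#define UARTFR_TXFF 0x20  /* mask for the "transmit buffer full" flag */

/* Hypothetical helper: byte distance covered by base + UARTFR. */
size_t uartfr_byte_offset(void) {
	uint32_t regs[UARTFR + 1] = {0};  /* stand-in for the UART registers */
	volatile uint32_t *base = regs;
	return (size_t)((volatile char *)(base + UARTFR) - (volatile char *)base);
}

/* Hypothetical helper: nonzero when a flag value says the buffer is full. */
int tx_fifo_full(uint32_t flags) {
	return (flags & UARTFR_TXFF) != 0;
}
```

Calling uartfr_byte_offset() returns 0x18, and tx_fifo_full(0x20) is true while tx_fifo_full(0x90) is false, because only the 0x20 bit is inspected.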

Now you can put the following at the top of kernel.c:

#include "versatilepb.h"

void bwputs(char *s) {
	while(*s) {
		while(*(UART0 + UARTFR) & UARTFR_TXFF);
		*UART0 = *s;
		s++;
	}
}

Now you can replace the big mess in main with bwputs("Hello, World!\n"); or any other message. Much cleaner!

Code so far on GitHub

Calling Into Assembly Code

Last time, we had a small piece of assembly code that called into our C program. This time, we want to call into assembly code from our C code. If you are on Ubuntu or some other systems, your C compiler may default to generating “thumb mode” instructions, which are not what we need for our assembly code, so we need to tell the C compiler to generate normal ARM instructions. To do this with gcc, add -marm to the end of your CFLAGS.

A User Program

Create a simple “user program” in the form of a function in kernel.c called first that just prints and then hangs (since we will not build the ability to get out of user mode until later):

void first(void) {
	bwputs("In user mode\n");
	while(1);
}

Assembly stub

Create a new file called context_switch.s with the following:

.global activate
activate:

And a new file called asm.h with the following:

void activate(void);

Include this new header file into kernel.c so that you can call the assembly stub from your C code, then add a call to activate to your main.

Finally, add context_switch.o as a dependency to kernel.elf in your Makefile so that it will get built.

The Context Switch

Alright, what are the absolute minimum things we need our switch to user mode (called the “context switch”) to do? Well, at the very least we need a way to start running some function in user mode.

The way to switch an ARM system into user mode is to use the movs instruction to put some address to jump to (like the address of our function) into the pc register (the “program counter”, which is where the CPU is currently executing). But what mode will the CPU enter when we do this? The answer is that it will read the contents of a special register called SPSR (Saved Processor Status Register) and use that to change CPSR (Current Processor Status Register), and thus change modes. Couldn’t we just change CPSR directly? Because we’re not in user mode, we could, but since we want to jump into our function the moment we switch modes, this is the safest way to do it:

	mov r0, #0x10
	msr SPSR, r0
	ldr lr, =first
	movs pc, lr

0x10 is just the value of the mode bits that means “user mode”. We put that in SPSR, load the address of first into lr, and then movs there.

You can stick that at the start of activate and try to run that if you like, but it won’t work. Why is that? Remember how we had to set up the stack in order to jump into C code? Well, it turns out that one of the differences of user mode is that it uses a different sp register. This can be very handy later when we’re doing more complicated things, but for now we can just set the user mode stack to be the same as the kernel stack, by adding the following before the movs:

	mov ip, sp
	msr CPSR_c, #0xDF /* System mode */
	mov sp, ip
	msr CPSR_c, #0xD3 /* Supervisor mode */

So what are we doing here? We copy our current sp to ip (because we’ll have a different sp in user mode, so we need to copy it somewhere), then we set a part of CPSR directly to enter “system mode”. What is system mode? It’s a special mode on the ARM processor that lets us access the registers as though we were in user mode, but still be able to do privileged things. We set user mode sp to our copy, then switch back to supervisor mode (which is where we normally operate in the kernel).
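
The three magic numbers used so far (0x10, 0xDF and 0xD3) are easier to read once split into fields. The following hosted C sketch, for your desktop compiler rather than the kernel, assumes the usual ARM status register layout: the low five bits select the mode, and bits 6 and 7 mask FIQ and IRQ interrupts. The macro and function names are made up for illustration.

```c
#include <stdint.h>

#define PSR_MODE_MASK 0x1Fu  /* low five bits select the CPU mode */
#define PSR_F_BIT     0x40u  /* set: FIQ interrupts are masked */
#define PSR_I_BIT     0x80u  /* set: IRQ interrupts are masked */

#define MODE_USER       0x10u
#define MODE_SUPERVISOR 0x13u
#define MODE_SYSTEM     0x1Fu

/* Hypothetical helper: extract the mode field from a status value. */
uint32_t psr_mode(uint32_t psr) {
	return psr & PSR_MODE_MASK;
}

/* Hypothetical helper: nonzero when both IRQ and FIQ are masked. */
int psr_interrupts_masked(uint32_t psr) {
	uint32_t both = PSR_I_BIT | PSR_F_BIT;
	return (psr & both) == both;
}
```

With these, psr_mode(0xDF) gives MODE_SYSTEM and psr_mode(0xD3) gives MODE_SUPERVISOR, both with interrupts masked, while 0x10 is MODE_USER with neither mask bit set.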

If you build the kernel now, and run it under QEMU, you should get “In user mode” printed out. Good job!

Code so far on GitHub

A Better Stack

Using the same kernel stack for our user mode program isn’t going to work very well if we want to be able to pause the program and go back to it, because other things will use the kernel stack in between, so we really want the program to have its own stack.

First, declare some space for your user stack:

unsigned int first_stack[256];

Then pass first_stack + 256 to activate, and change asm.h to have activate take an argument.

The first four arguments to an assembly call come in as r0-r3, so we can easily access this parameter inside activate:

	msr CPSR_c, #0xDF /* System mode */
	mov sp, r0
	msr CPSR_c, #0xD3 /* Supervisor mode */

One fewer line, since r0 is shared by all CPU modes and so still holds our argument after the switch to system mode.

Less hardcoding

The program should still run, but now it’s using its own stack. We still have the value of SPSR and the name of the function we’re calling hardcoded into the assembly. We could pass these as parameters, but then we would have to remember them in a special way when it comes to being able to enter and re-enter the same user mode function multiple times (since the current stack, CPU mode, and entry point can change between calls), so it’s actually easiest to store these two additional values on the user mode program’s stack. We will store them in special positions so that all the user mode registers can be saved along with them easily.

We’ll want to move the calculation of the end of the stack up, so that we can put our data into it:

unsigned int *first_stack_start = first_stack + 256 - 16;
first_stack_start[0] = 0x10;
first_stack_start[1] = (unsigned int)&first;

You’ll note the cast to (unsigned int) of the function pointer. This is mostly to make the compiler not warn us about using a function pointer as data. You should now pass first_stack_start to activate. If you want to, you can test that it still works, but we aren’t actually using this new data yet.

We’ve done a lot of work on the assembly, and are about to change it quite a bit, so I’ll reproduce the whole context switch here with the changes to use these values:

.global activate
activate:
	ldmfd r0!, {ip,lr} /* Get SPSR and lr */
	msr SPSR, ip

	msr CPSR_c, #0xDF /* System mode */
	mov sp, r0
	pop {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	msr CPSR_c, #0xD3 /* Supervisor mode */

	movs pc, lr

This loads the first two elements from the passed-in stack into ip and lr, and copies ip into SPSR (ldmfd, when coupled with the ! on the first argument, is the same as pop, but works with any register instead of just sp). Then we switch to system mode and set the user mode stack to r0, as before. Finally we use the pop instruction to load the rest of the stuff into our registers. We have nothing in there just now, but we’ll use more later, and this also makes the stack skip over all that stuff so that the process can use all of the space.

You’ll note we had to set lr twice. This is because, like sp, lr has a different version used in user mode from the one used in supervisor mode.
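
If the bookkeeping is hard to follow, the same walk over the stack can be modelled in plain C. This is only a sketch to run on your desktop, not part of the kernel: struct activation and model_activate are hypothetical names, and the entry point is a placeholder number rather than a real function address. It mirrors the two words read by ldmfd and the fourteen registers read by pop, which is where the 16 in first_stack + 256 - 16 comes from.

```c
#include <stddef.h>

#define STACK_WORDS 256
#define FRAME_WORDS 16  /* SPSR, entry point, and 14 saved registers */

/* Hypothetical model of what activate takes from the prepared stack. */
struct activation {
	unsigned int spsr;       /* first word: ends up in SPSR */
	unsigned int entry;      /* second word: ends up in supervisor lr */
	const unsigned int *sp;  /* user mode sp once everything is popped */
};

struct activation model_activate(const unsigned int *r0) {
	struct activation a;
	a.spsr = *r0++;   /* ldmfd r0!, {ip,lr} reads two words... */
	a.entry = *r0++;  /* ...and advances r0 past them */
	r0 += 14;         /* pop {r0-r10,fp,ip,lr} consumes 14 more words */
	a.sp = r0;
	return a;
}
```

Given a 256-word array whose last 16 words are prepared as in kernel.c, the model returns the stored mode and entry point and leaves sp exactly one past the end of the array, so the user program starts with the whole stack available.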

That’s it!

We now have a kernel that sets up a user mode task and then switches to it. Next time: getting back out of user mode!

The code for this post is on GitHub.

6 Responses

Rakesh S • 2013-052.497Z

unsigned int first_stack[256];
unsigned int *first_stack_start = first_stack + 256 - 16;
first_stack_start[0] = 0x10;
first_stack_start[1] = (unsigned int)&first;

Can you please explain those lines in simplified way ?

Thanks

Stephen Paul Weber • 2013-052.635Z

unsigned int first_stack[256];

This line allocates an array of 256 unsigned ints, which will be 256 machine words on our architecture.

unsigned int *first_stack_start = first_stack + 256 - 16;

This line sets a pointer to a location that is 16 positions before the end of the above allocated array. It is equivalent to &first_stack[256 - 16]

first_stack_start[0] = 0x10;

This sets the position pointed to by first_stack_start to the number 0x10. It is equivalent to: first_stack[256 - 16] = 0x10

first_stack_start[1] = (unsigned int)&first;

Set the second position in first_stack_start to the address of the procedure first.

Does that help?

Rakesh S. • 2013-058.254Z

Thanks for your reply.

I have another question out of this tutorial series. I want to do read-write operation on NAND memory, so how do I get access of it ?

Stephen Paul Weber • 2013-059.618Z

If you look in versatilepb.h on Github, you’ll see links to the datasheet site for this board. The navigation pane on the datasheet site (or perhaps a search of that site) should turn up information on the NAND controller.

sm • 2018-200.017Z

Hi Stephen,

Really appreciate this tutorial. I just wanted to get a clearer picture of the (256 -16) adjustment. We made that adjustment so the sp does not read over the user-defined data. The rest of the array elements (240 ints) are allocated for ARM registers (r0-r15)?

But why 16? We only passed two params (SPSR val and function address), so shouldn’t 256-2 suffice?

Thanks,

Stephen Paul Weber • 2018-200.052Z

@sm the `-16` is because we do 16 pops from the stack. `ldmfd r0!, {ip,lr}` and then `pop {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}` — on first activate that pop will just load uninitialized garbage from our stack, which is fine since nothing is expecting the registers to have any particular values. This is done to be consistent with future context switches which *will* have saved register values on that part of the stack.

Singpolyma

Writing a Simple OS Kernel — Part 2, User Mode