Singpolyma

Personal Blog

Archive for the "0x4" Category

Writing a Simple OS Kernel — Part 6, Inter-Process Communication

Posted on

Last time we got hardware interrupts working, and used them to implement preemption for our multitasking. This time we’re going to get our user space processes talking to one another.

getpid

We’re going to need some way for replies to our messages to get sent back to a process. We can’t use whatever other mechanisms we build for registering a new communications channel for this, because without it we cannot receive communications, even about a new communications channel. Luckily we have a piece of information that comes with every process: its ID. In our case, the ID of a task is going to just be its index in the array of stacks in our main.

So the normal drill for a new syscall in the case statement in main:

case 0x2: /* getpid */
	tasks[current_task][2+0] = current_task;
	break;

And in syscall.s:

.global getpid
getpid:
	push {r7}
	mov r7, #
	svc 0
	bx lr

And in asm.h:

int getpid(void);

Removed the accidentally shared static variable. Do not use statics without a virtual memory system!

Code for this section on Github.

Putting processes to sleep

Currently our scheduler just round-robins through all tasks, but if we want a task to be able to wait for a message, we need some way for it to go to sleep while it waits. We will do this by having a new piece of metadata in our process space for “status”:

# TASK_READY       0
# TASK_WAIT_READ   1
# TASK_WAIT_WRITE  2

On coming out of a task (right after the activate call in main) it is definitely ready:

tasks[current_task][-1] = TASK_READY;

Then replace the two lines at the end of the while loop in main with a new scheduler that skips over tasks that aren’t ready:

/* Select next TASK_READY task */
while(TASK_READY != tasks[current_task =
	(current_task+1 >= task_count ? 0 : current_task+1)][-1]);

Now if any syscall sets the status to something else, the process will sleep until woken up.

Code for this section on Github.

Ringbuffers

We’re going to do our communication between processes using a rudimentary implementation of UNIX-style pipes. The pipes will be implemented using ringbuffers. Why? Because ringbuffers are easy to use for FIFO-style communication without needing any dynamic memory allocation. Pipes are allowed to get “full” and luckily we can tell when a ringbuffer is full, so this fits. We’re going to want some generic ringbuffer utilities, and it won’t hurt to have versions specialised to the pipe buffers:

# STACK_SIZE 1024 /* Size of task stacks in words */
# TASK_LIMIT 2   /* Max number of tasks we can handle */
# PIPE_BUF   512 /* Size of largest atomic pipe message */
# PIPE_LIMIT (TASK_LIMIT*5)

struct pipe_ringbuffer {
	int start;
	int end;
	char data[PIPE_BUF];
};

# RB_PUSH(rb, size, v) do { \
		(rb).data[(rb).end] = (v); \
		(rb).end++; \
		if((rb).end > size) (rb).end = 0; \
	} while(0)

# RB_POP(rb, size, v) do { \
		(v) = (rb).data[(rb).start]; \
		(rb).start++; \
		if((rb).start > size) (rb).start = 0; \
	} while(0)

# RB_LEN(rb, size) (((rb).end - (rb).start) + \
	(((rb).end < (rb).start) ? size : 0))

# PIPE_PUSH(pipe, v) RB_PUSH((pipe), PIPE_BUF, (v))
# PIPE_POP(pipe, v)  RB_POP((pipe), PIPE_BUF, (v))
# PIPE_LEN(pipe)     (RB_LEN((pipe), PIPE_BUF))

You’ll note I’ve increased the stack size. We’re going to be handling more data, and with our current setup if you need more space than the stack size it’s just going to cause very strange and hard-to-find bugs. The pipe limit is expressed in terms of the number of tasks. This makes sense since as the number of tasks increases, it is likely that the number of channels we’ll need increases, especially since we’re going to reserve one per task to begin with.

Now we just need to allocate our pipes in main:

struct pipe_ringbuffer pipes[PIPE_LIMIT];

and initialize them all:

size_t i;
/* Initialize all pipes */
for(i = 0; i < PIPE_LIMIT; i++) {
	pipes[i].start = pipes[i].end = 0;
}

Code for this section on Github.

Reading and writing

Now for the meaty syscalls. Some boilerplate first:

int write(int fd, const void *buf, size_t count);
int read(int fd, void *buf, size_t count);

.global write
write:
	push {r7}
	mov r7, #
	svc 0
	bx lr
.global read
read:
	push {r7}
	mov r7, #
	svc 0
	bx lr
case 0x3: /* write */
	_write(tasks[current_task], tasks, task_count, pipes);
	break;
case 0x4: /* read */
	_read(tasks[current_task], tasks, task_count, pipes);
	break;

You’ll note that I’ve placed the bodies of these syscalls off in their own functions. That’s partly because they’re going to be much bigger than what we’ve seen before, but also because they actually need to be able to call each other.

So, definition before use:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);
void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);

These are a bit long and maybe non-obvious, so one piece at a time:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	task[-1] = TASK_READY;

In case there’s an error, make sure we don’t stay blocked forever.

/* If the fd is invalid, or trying to read too much  */
	if(task[2+0] > PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

An out-of-range pipe index is obviously an error. And the ringbuffers can’t hold more than PIPE_BUF bytes, so it’s impossible to read more than that at once.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];
		if((size_t)PIPE_LEN(*pipe) < task[2+2]) {
			/* Trying to read more than there is: block */
			task[-1] = TASK_WAIT_READ;

Ah, our first actual blocking call! We get the ringbuffer for the pipe that is being read from, and if it has less bytes in it than the task wants to read, set the task into the TASK_WAIT_READ state.


		} else {
			size_t i;
			char *buf = (char*)task[2+1];
			/* Copy data into buf */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_POP(*pipe,buf[i]);
			}

We have a valid pipe with enough bytes in the buffer. Pop each byte off into the read buffer until we have read as many as were requested.


			/* Unblock any waiting writes
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_WRITE) {
					_write(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Loop over all the tasks, and any that were waiting to write, unblock them and try the write again. This is somewhat inefficient, since only tasks that were waiting to write to the pipe we just read from have any chance at all of being able to succeed (all the other pipes are just as full as they used to be), but it works.


void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	/* If the fd is invalid or the write would be non-atomic */
	if(task[2+0] > PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

Again, an out-of-range pipe index is obviously an error. And because of the ringbuffers, you’ll never be able to write more than PIPE_BUF bytes into a pipe all at once. We want our writes to be atomic (because we have preemptive multitasking going on), so trying to write more than PIPE_BUF bytes needs to be an error as well.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];

		if((size_t)PIPE_BUF - PIPE_LEN(*pipe) < 
			task[2+2]) {
			/* Trying to write more than we have space for: block */
			task[-1] = TASK_WAIT_WRITE;

If the write would put in more bytes than the pipe currently has space for, then block waiting for a read to make room for us.


		} else {
			size_t i;
			const char *buf = (const char*)task[2+1];
			/* Copy data into pipe */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_PUSH(*pipe,buf[i]);
			}

We have a valid write and there is enough space in the pipe. Push all the bytes from the write buffer into the pipe.


			/* Unblock any waiting reads
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_READ) {
					_read(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Same as before, this is inefficient. Only tasks waiting to read from pipes we just wrote to are really interested in waking up. But it doesn’t take too long to check, and it’s easier for now.

Code for this section on Github.

Pathserver

So, we have working read and write calls now, but in order for processes to communicate they’re going to need to agree on what pipe indices to use. A parent process can get its child’s PID (from fork) and write to it that way, maybe, but that’s not a generic solution.

We’re going to solve this problem by building a nameserver. The names in question are going to be pseudo-filesystem “paths”, so let’s call it a pathserver. We’ll need some string functions:

int strcmp(const char* a, const char* b) {
	int r = 0;
	while(!r && *a && *b) r = (*a++) - (*b++);
	return (*a) - (*b);
}

size_t strlen(const char *s) {
	size_t r = 0;
	while(*s++) r++;
	return r;
}

And since everything is static sized for us, how long can a path be?

# PATH_MAX 255 /* Longest absolute path */

Oh, and we need a way to determine what the pipe index of the pathserver itself is. Otherwise this whole concept is a nonstarter.

# PATHSERVER_FD (TASK_LIMIT+3) /* File descriptor of pipe to pathserver */

We said we’re reserving one pipe for each task for replies, and let’s also reserve 0-2, since those file descriptor numbers are usually “magic” and we don’t want to have any mix-ups until we get real FD tables in place later. The pathserver will manage all other pipes, so it can assign itself the first one. Now we have a constant that tells us the pipe index of the pathserver.

We can use this information to define a protocol and pseudo-syscall for registering a new path:

int mkfifo(const char *pathname, int mode) {
	size_t plen = strlen(pathname)+1;
	char buf[4+4+PATH_MAX];
	(void)mode;

	*((unsigned int*)buf) = 0;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);

	/* XXX: no error handling */
	return 0;
}

Notice I said pseudo-syscall. It may act sort of like a syscall, but in fact it is built entirely based on primitives we already have! This is one of the huge benefits of a microkernel: once you get your basic functionality in place, most things can be built on top of what you already have, as wrappers.

The protocol here is a zero, followed by the length of the path we’re registering, followed by the characters of that path. Pretty simple. There’s no way for the pathserver to signal an error to us, because we don’t wait for a reply, but unless there are no more pipes this can’t fail, so we’ll just leave it for now.

int open(const char *pathname, int flags) {
	unsigned int replyfd = getpid() + 3;
	size_t plen = strlen(pathname)+1;
	unsigned int fd = -1;
	char buf[4+4+PATH_MAX];
	(void)flags;

	*((unsigned int*)buf) = replyfd;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);
	read(replyfd, &fd, 4);

	return fd;
}

The protocol here looks the same as before, only instead of 0 as the first word, we send our reserved pipe. That’s our PID, moved ahead by the three we’re reserving. Conveniently, this means no valid replyfd will ever be 0, so we can differentiate between mkfifo and open.

Now for the pathserver itself, piece by piece.

void pathserver(void) {
	char paths[PIPE_LIMIT - TASK_LIMIT - 3][PATH_MAX];
	int npaths = 0;
	int i = 0;
	unsigned int plen = 0;
	unsigned int replyfd = 0;
	char path[PATH_MAX];

	memcpy(paths[npaths++], "/sys/pathserver", sizeof("/sys/pathserver"));

Need to store the paths of the pipes. We’ll just use an array and a linear lookup for now. We also register the first pipe as being the pathserver.


	while(1) {
		read(PATHSERVER_FD, &replyfd, 4);
		read(PATHSERVER_FD, &plen, 4);
		read(PATHSERVER_FD, path, plen);

Loop forever, reading the three pieces of data out of our pipe. Recall that we wrote all of this data with a single write call. That’s important because we need all these bytes together, so the writes must be atomic. Once we’re reading, however, the bytes are already in a deterministic order, so we can read them out in pieces. This is especially good, because we wrote the length of the message as part of the message! So we certainly need to be able to read one piece at a time.


		if(!replyfd) { /* mkfifo */
			memcpy(paths[npaths++], path, plen);

Registering a new path is super easy. Should probably check for out-of-bounds here, but this works for now.


		} else { /* open */
			/* Search for path */
			for(i = 0; i < npaths; i++) {
				if(*paths[i] && strcmp(path, paths[i]) == 0) {
					i += 3; /* 0-2 are reserved */
					i += TASK_LIMIT; /* FDs reserved for tasks */
					write(replyfd, &i, 4);
					i = 0;
					break;
				}
			}

Linear search for the path. If we find it, adjust the index by the reserved indices and write the reply. We ignore empty paths, to allow us the possibility of pipes without names that the pathserver knows about, but you can’t usefully search for. Notice also that we set i back to zero after writing a reply, because:


			if(i >= npaths) {
				i = -1; /* Error: not found */
				write(replyfd, &i, 4);
			}
		}
	}
}

If we shoot off the end, then i >= npaths, in which case we didn’t find it. Well, don’t let the caller just block forever waiting for a reply that’s not coming! Send them -1.

Now we’ll just tie this all together with some tasks that use this new functionality:

# TASK_LIMIT 3

void otherguy() {
	int fd;
	unsigned int len;
	char buf[20];
	mkfifo("/proc/0", 0);
	fd = open("/proc/0", 0);
	while(1) {
		read(fd, &len, 4);
		read(fd, buf, len);
		bwputs(buf);
	}
}

void first(void) {
	int fd;

	if(!fork()) pathserver();
	if(!fork()) otherguy();

	fd = open("/proc/0", 0);
	while(1) {
		int len = sizeof("Ping\n");
		char buf[sizeof("Ping\n")+4];
		memcpy(buf, &len, 4);
		memcpy(buf+4, "Ping\n", len);
		write(fd, buf, len+4);
	}
}

We spin up the pathserver and the otherguy. The otherguy just waits for incoming strings (19 characters max, but it’s just a demo) and prints any that come in. Then, our first task run in a loop forever sending Ping to the otherguy, and they get printed.

We just built a rudimentary serial port driver! It’s not using interrupts properly yet, but at least strings send to the otherguy are printed atomically, because the otherguy “owns” the serial port and we don’t have all the tasks preempting each other in the middle of a write.

Code for this section on Github.

We’re done!

We now have all the machinery we need in order to send messages between processes. Next time, we’ll look at using all of this to build a more fully-functional serial port driver.

The code so far is on Github.

Writing a Simple OS Kernel — Part 5, Hardware Interrupts

Posted on

Last time we got the kernel switching back and forth between multiple tasks. This time we’re going to get a simple timer to wake up the kernel at fixed intervals.

Motivation

So far we’ve been interacting with hardware when we want to. Writing to the serial port when we have something to print, for example. Why would we want to let the hardware interrupt what we’re doing?

A few reasons:

  1. Currently, our user mode tasks have to call a syscall on purpose in order for any other task to be able to run. This means one task can run forever and block anyone else from running (called starvation). So we at minimum need some way to interrupt running tasks (called preemption).
  2. The hardware knows when it’s ready. It is way more efficient to let the hardware say “I’m ready now” then to keep asking it.
  3. The hardware may actually stop being ready by the time we get around to talking to it again. We want to talk to the hardware during the window when it’s able to talk

Enabling a Simple Timer

Before we get to the serial port, there is an even simpler piece of hardware we can work with first: timers. Timers just take a value and count down to zero, sort of like an alarm clock.

I could make you go look up in the user guides what all the magic numbers for our timers are, but I’m a nice guy, so here’s the ones we need to add to versatilepb.h:

/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/index.html */
# TIMER0 ((volatile unsigned int*)0x101E2000)
# TIMER_VALUE 0x1 /* 0x04 bytes */
# TIMER_CONTROL 0x2 /* 0x08 bytes */
# TIMER_INTCLR 0x3 /* 0x0C bytes */
# TIMER_MIS 0x5 /* 0x14 bytes */
/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/Babgabfg.html */
# TIMER_EN 0x80
# TIMER_PERIODIC 0x40
# TIMER_INTEN 0x20
# TIMER_32BIT 0x02
# TIMER_ONESHOT 0x01

Now add the following somewhere near the top of your main:

*TIMER0 = 1000000;
*(TIMER0 + TIMER_CONTROL) = TIMER_EN | TIMER_ONESHOT | TIMER_32BIT;

The timer on our QEMU system defaults to using a 1Mhz reference, which means that it ticks 1000000 times per second. So, by setting it’s value to 1000000, it will reach 0 after one second.

Then, on the second line, we enable the timer, set it to “oneshot” mode (that is, it will just stop when it’s done counting, we’ll see another mode later), and set it to 32BIT mode (so that it can hold really big numbers).

Now, somewhere in your while loop, put this:

if(!*(TIMER0+TIMER_VALUE)) {
   bwputs("tick\n");
   *TIMER0 = 1000000;
   *(TIMER0 + TIMER_CONTROL) = TIMER_EN | TIMER_ONESHOT | TIMER_32BIT;
}

We’re just printing out “tick” if the timer has reached 0, and then resetting and re-enabling the timer.

Build and run your kernel. It should still work as before, but now it will also print out “tick” once per second.

Back to the Vector Interrupt Table

The CPU uses the same mechanism for hardware interrupting operation (often referred to as an “interrupt request” or IRQ) as it used know where to jump when we used svc. We need to add an entry to the right place. Here’s the new code for bootstrap.s:

interrupt_table:
ldr pc, svc_entry_address
nop
nop
nop
ldr pc, irq_entry_address
svc_entry_address: .word svc_entry
irq_entry_address: .word irq_entry
interrupt_table_end:

IRQ entry point

Back over in context_switch.s, irq_entry is going to be very similar to svc_entry, but with some changes to handle an IRQ instead of a syscall.

As soon as we enter system mode, we need to push the value of r7 onto the task’s stack (because that’s what the syscall wrapper does, and we need to set things up to look the same). The syscall puts the number identifying the syscall into r7, so we should come up with something that will identify an IRQ and put that there instead. To make sure the number never clashes with a syscall number, lets use negative numbers:

push {r7} /* Just like in syscall wrapper */
/* Load PIC status */
ldr r7, PIC
ldr r7, [r7]
PIC: .word 0x10140000
/* Most significant bit index */
clz r7, r7
sub r7, r7, #31

That value at PIC is the address of the interrupt controller status (that is, the value that can tell us which interrupts are currently firing) and then we just do some operations on that value to find out what the most significant bit set in the value is, and transform that into a negative index to use for the identifier of this IRQ.

The only other weird thing is that we have to get the value of the lr from IRQ mode instead of Supervisor mode, and the value is set to the instruction after the one we should be returning to, so we’ll need to subtract 4:

msr CPSR_c, # /* IRQ mode */
mrs ip, SPSR
sub lr, lr, # /* lr is address of next instruction */
stmfd r0!, {ip,lr}

So those bits, together with what we already had from svc_entry give us this complete entry point code:

.global irq_entry
irq_entry:
	/* Save user state */
	msr CPSR_c, # /* System mode */

	push {r7} /* Just like in syscall wrapper */
	/* Load PIC status */
	ldr r7, PIC
	ldr r7, [r7]
	PIC: .word 0x10140000
	/* Most significant bit index */
	clz r7, r7
	sub r7, r7, #31

	push {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	mov r0, sp

	msr CPSR_c, # /* IRQ mode */
	mrs ip, SPSR
	sub lr, lr, # /* lr is address of next instruction */
	stmfd r0!, {ip,lr}

	msr CPSR_c, # /* Supervisor mode */

	/* Load kernel state */
	pop {r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	mov sp, ip
	bx lr

A Small Problem

We’re pushing the value of r7, but where is it going to get popped? The syscall wrappers do both the push and the pop, but in the case of an IRQ we’re pushing in the entry point, and we have no easy way to tell activate that it’s exiting from an IRQ instead of a syscall. Hmm.

The solution to this that I’ve chosen is to remove the pop instruction from the syscall wrappers, and add an identical instruction to right after the pop instruction already in activate. Now the pop always happens.

More Magic Numbers

We hardcoded a single magic number into our assembly, but luckily that’s the only one our assembly code will need. In fact, we should not need any more assembly! (Unless we wanted to implement other sorts of hardware exceptions, which we may get to.) We do need that and a few other interrupt-controller related magic numbers in our C code too, so here’s the stuff for versatilepb.h:

/* http://infocenter.arm.com/help/topic/com.arm.doc.dui0224i/I1042232.html */
# PIC ((volatile unsigned int*)0x10140000)
# PIC_TIMER01 0x10
/* http://infocenter.arm.com/help/topic/com.arm.doc.ddi0181e/I1006461.html */
# VIC_INTENABLE 0x4 /* 0x10 bytes */

Enable the Interrupt

We have to tell the timer to generate interrupts, and we have to tell the interrupt controller to pass those interrupts through. So, replace your current one-shot timer code with this:

*(PIC + VIC_INTENABLE) = PIC_TIMER01;

*TIMER0 = 1000000;
*(TIMER0 + TIMER_CONTROL) = TIMER_EN | TIMER_PERIODIC | TIMER_32BIT | TIMER_INTEN;

Note that we’re also using a new timer mode here. Periodic mode will automatically cause the timer to restart from the value we set when it hits 0, which is perfect for when we’re getting an interrupt instead of waiting until the value hits 0.

Handle the Interrupt

Now we have to add a case to our switch statement for this request:

case -4: /* Timer 0 or 1 went off */
	if(*(TIMER0 + TIMER_MIS)) { /* Timer0 went off */
		*(TIMER0 + TIMER_INTCLR) = 1; /* Clear interrupt */
		bwputs("tick\n");
	}

The way our hardware is set up, the interrupt is actually the same for either timer 0 or timer 1. Now, we know that we’ve only set timer 0, but that’s lazy. TIMER_MIS is the offset where we’ll find a flag telling us if Timer0 was the one that is causing the interrupt. After checking that, we set a value on the timer to clear the interrupt (basically to say that we’ve seen it so that it won’t call us again unitl the next interrupt). Finally, print out “tick”.

As it is, this code should operate exactly the same as the version without the interrupts.

Pre-emption

Up until now, we’ve needed the dummy syscall syscall in an infinite loop at the end of every task in order to let other tasks run. Since the timer interrupts the tasks now, we don’t need them to interrupt themselves. Remove all references to the syscall syscall and try running the kernel again. Now you’ll note that the first task just stays in its infinite loop and does not let the other task keep running. Does not, that is, until the timer goes off and you see “tick” printed for the first time, after which the tasks switch. So the tasks are now being properly interrupted even if they do not give up control!

We’re done!

The kernel now has all the machinery it needs to be able to handle hardware interrupts, and tasks are being pre-empted by the timer. Next time we’ll start looking at IPC.

The code so far is on GitHub.

Writing a Simple OS Kernel — Part 3, Syscalls

Posted on

Updated 2012-33 to fix a bug in the context switch assumptions.
Edited for clarity, 2012-34.

Last time, we got our kernel to set up space for a task, and run that task in user mode. This time we’re going to add a facility for the user mode task to call back into the kernel.

Syscalls

But wait! Didn’t I say last time that one of the things user mode tasks cannot do is change the CPU mode? Well, yes, they can’t change it directly. What they can do is trigger an event that will make the CPU switch to supervisor mode and then jump to some predefined bit of code. That way, there’s no security problem, only kernel code is running in supervisor mode, but the user mode task can still ask the kernel to do things for it.

The instruction that lets us do this on an ARMv6 CPU is svc (used to be called swi). It takes an immediate value as an “argument”, which it actually does nothing with. If the kernel wants to use that number for something (like specifying what the user mode task wants done), it has to read the number right out of the instruction in RAM. This is doable, but not always ideal, and so some kernels (such as modern Linux) actually just always use a zero, and then store information they want to pass elsewhere.

Vector Interrupt Table

The svc instruction actually causes an interrupt. Interrupts are signals that (when they’re enabled) cause the CPU to switch modes and jump to some predefined instruction. What instruction will it jump to? Well, ARM CPUs have something called the vector interrupt table that is at the very start of RAM. These locations are where it will jump to (each word, starting at 0x0, is the location of a particular sort of interrupt, up until there are no more interrupt types).

That may not seem very useful. We can only execute one instruction? Well, yes, but that instruction can jump us to somewhere more useful. Now, your first thought may be to put a branch instruction there. Great idea, but it won’t work. Branch instructions are calculated relative to their position in the linked binary. If we copy one of them to another location in RAM, the offset will be wrong. We need to use a load instruction to load an absolute value into the program counter. What value should we load? The address of a function in our kernel, of course! It turns out that the assembler contains a syntax for including the address of a function directly, which is then an absolute value and so does not move when we copy it.

Here’s what the new bootstrap.s looks like:

.global _start
_start:
	mov r0, #
	ldr r1, =interrupt_table
	ldr r3, =interrupt_table_end
keep_loading:
	ldr r2, [r1, #]
	str r2, [r0, #]
	add r0, r0, #
	add r1, r1, #
	cmp r1, r3
	bne keep_loading

	ldr sp, =0x07FFFFFF
	bl main

interrupt_table:
	ldr pc, svc_entry_address
	svc_entry_address: .word svc_entry
interrupt_table_end:

Why do we need to copy two instructions? Well, even that load instruction is loading from a relative address. Luckily if we move them both then the relative position remains the same. This may look a bit complicated, and that’s because it is. You could just copy the two words directly across, but this way we have a loop that copies everything from our interrupt table section, so we can easily add other interrupt handlers to it later. You’ll note we started at 0x8 instead of 0x0, because that’s where the SVC handler is, so if we want to add ones that come before 0x8, we just have to remember to change the base address at the start. With some hacks we could do the copying part in C, but for this example I decided that keeping all the bits for this in assembly was easiest.

Syscall Wrapper

We need a way for our C code to call the svc instruction. Since we don’t have anything we really need to pass through, we’ll just add a dummy wrapper for now.

Create a new file called syscalls.s and add it to the Makefile as a dependency of kernel.elf. Put this in it:

.global syscall
syscall:
	svc 0
	bx lr

Pretty simple. It just activates the svc, and then when it comes back jumps to the caller (note that this assumes the registers it had before it called svc are reset before it comes back).

You’ll also need to add a line to asm.h:

void syscall(void);

Save Kernel State

Now, let’s think about what we want the kernel to do when it actually gets control back. We’d like to go back to the point in our kernel code we left off when calling activate in the first place. The problem is, we’ve not got any clue where that was! In loading the user mode task state, we did not keep around any information about what state the kernel was in. Let’s add that to activate now:

mov ip, sp
push {r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}

Put that at the top of activate, before we begin messing about with the registers. That’s all the kernel state we need to save!

SVC Entry

Alright, we’re now ready to define our first version of the svc_entry function. All we really want to go at this point is get the kernel back into a state where it can run, and then return to our code. We’ll put this in context_switch.s:

.global svc_entry
svc_entry:
	pop {r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	mov sp, ip
	bx lr

Just reverse the save we just did, and jump back to where we came from in the kernel. How is lr where we came from? Well, you’ll have noted that when we call functions (and the C compiler does this as well), we use bl, which saves the address of the instruction after itself in lr before jumping to the function.

Alright, add a call to syscall to the end of first and then add another bwputs call after the activate call in main, build, and run. You should see your new message printed last.

Code so far on GitHub

Heading Back to User Mode

Add another print and another syscall to first, and another print and call to activate to main. Compile and run your code. What do you see?

When we call activate again, the user mode task re-starts at the beginning! That’s not what we want, but it makes sense. We never saved our place inside the user mode task. In fact, we don’t even have a reference to where its stack was when it called the syscall, so we’re just going back with the stack from before. That’s not going to work. We need to add some code to our SVC entry to save the user mode task’s state. If you recall the way we set up the task before, you’ll now see why. The way the stack looks after we save our state on it is exactly the same as the way we set it up! Everything will be where activate expects it:

msr CPSR_c, # /* System mode */
push {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
mov r0, sp
msr CPSR_c, # /* Supervisor mode */

mrs ip, SPSR
stmfd r0!, {ip,lr}

There, all the registers are on the user stack. We use stmfd which is just like push, but lets us operate on another register. You’ll note we had to save both versions of lr. One is state for the syscall wrapper to use, the other is the address inside the syscall wrapper we need to jump back to later. Now we just need some way to get the new location of the top of the stack back to the kernel.

Conveniently, our C code expects the return value of a function to be in r0, which is where I’ve put the user mode sp in this example. Just change the definition of activate in asm.h to return the right type, and then assign the return value somewhere and pass that to the next call to activate. You can now keep calling into your user mode task as many times as you want!

That’s It!

We now have a kernel that can start a user mode task, and switch back and forth between the kernel and the user mode task at will. Next time we’ll look at multitasking and maybe some other stuff.

Code for this post is on GitHub.