Singpolyma

Archive of "define"

Archive for the "define" Category

Writing a Simple OS Kernel — Part 7, Serial Port Driver

Posted on

Last time we got IPC working, and used it to build a simple path-resolution module. This time we’re going to write a fully-functional serial port driver.

Generic Interrupts

When we got interrupts working, we sort of hacked the kernel part in to support the timer, which is all we were interested in at the time. Let’s make that a bit more generic. We’re going to need one more syscall, since interrupts have to be handled by the kernel. The simplest one we can implement just allows a process to wait for an interrupt to fire on a particular line:

void interrupt_wait(int intr);

.global interrupt_wait
interrupt_wait:
	push {r7}
	mov r7, #
	svc 0
	bx lr

# TASK_WAIT_INTR 3

case 0x5: /* interrupt_wait */
	/* Enable interrupt */
	*(PIC + VIC_INTENABLE) = tasks[current_task][2+0];
	/* Block task waiting for interrupt to happen */
	tasks[current_task][-1] = TASK_WAIT_INTR;
	break;

We are already enabling the timer interrupt when the kernel starts. Here, we enable the interrupt that the process is asking to wait on, since if it is not enabled the task will block forever, so this seems like a useful place to do it.

We also want to unblock tasks when the interrupt fires. Get rid of the -4 case that we have hardcoded for the timer and put this in:

default: /* Catch all interrupts */
	if((int)tasks[current_task][2+7] < 0) {
		unsigned int intr = (1 < < -tasks[current_task][2+7]);

		if(intr == PIC_TIMER01) {
			/* Never disable timer. We need it for pre-emption */
			if(*(TIMER0 + TIMER_MIS)) { /* Timer0 went off */
				*(TIMER0 + TIMER_INTCLR) = 1; /* Clear interrupt */
			}
		} else {
			/* Disable interrupt, interrupt_wait re-enables */
			*(PIC + VIC_INTENCLEAR) = intr;
		}
		/* Unblock any waiting tasks
			XXX: nondeterministic unblock order
		*/
		for(i = 0; i < task_count; i++) {
			if(tasks[i][-1] == TASK_WAIT_INTR && tasks[i][2+0] == intr) {
				tasks[i][-1] = TASK_READY;
			}
		}
	}

You’ll notice we also need a new magic number for versatilepb.h:

# VIC_INTENCLEAR 0x5 /* 0x14 bytes */

Code for this section on Github.

Output Driver

Let’s use this new ability to implement half of our driver: output. This will allow us to write to a specific fifo for output to the serial port instead of using bwputs.

A refresher of serial port related magic numbers:

/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0224i/Bbabegge.html */
# UART0 ((volatile unsigned int*)0x101f1000)
/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0183g/I18381.html */
# UARTFR 0x06 /* 0x18 bytes */
# UARTIMSC 0x0E /* 0x38 bytes */
# UARTFR_TXFF 0x20
# UARTIMSC_TXIM 0x20

# PIC_UART0 0x1000

Let’s make a generic driver that accepts parameters telling it what serial port to drive:

void serialout(volatile unsigned int* uart, unsigned int intr) {
	int fd;
	char c;

We need to create the fifo that processes send bytes on:

	mkfifo("/dev/tty0/out", 0);
	fd = open("/dev/tty0/out", 0);

We also need to tell the UART to trigger an interrupt when it becomes ready to send a byte:

	*(uart + UARTIMSC) |= UARTIMSC_TXIM;

And finally a loop to read bytes and send them to the UART:

	while(1) {
		read(fd, &c, 1);
		interrupt_wait(intr);
		*uart = c;
	}

This seems pretty reasonable, but there is a problem. The UART triggers an interrupt when it becomes ready to send a byte. If it is already ready (such as at boot time), the interrupt will not trigger. So we try this:

	while(1) {
		read(fd, &c, 1);
		*uart = c;
		interrupt_wait(intr);
	}

This is closer, but in both of these versions there is a race condition where the UART might not actually be ready anymore by the time we process the interrupt (depending on what other tasks are doing), so let’s add a check for that:

	while(1) {
		read(fd, &c, 1);
		if(!(*(uart + UARTFR) & UARTFR_TXFF)) {
			*uart = c;
		}
		interrupt_wait(intr);
	}

Now, however, if the UART is not ready, we will skip bytes. So we only want to read a new byte if we actually wrote the current one:

	int doread = 1;
	while(1) {
		if(doread) read(fd, &c, 1);
		doread = 0;
		if(!(*(uart + UARTFR) & UARTFR_TXFF)) {
			*uart = c;
			doread = 1;
		}
		interrupt_wait(intr);
	}

To test this out, let’s set up a new first:

void first(void) {
	int fd;

	if(!fork()) pathserver();
	if(!fork()) serialout(UART0, PIC_UART0);

	fd = open("/dev/tty0/out", 0);
	write(fd, "woo\n", sizeof("woo\n"));
	write(fd, "thar\n", sizeof("thar\n"));
	while(1);
}

Our current pre-emption speed will make this a bit of a pain to run, so you can set the clock value in main to something lower.

We can now print to the serial port by writing to a fifo!

Code for this section on Github.

Input Driver

Some new magic numbers:

# UARTICR 0x11 /* 0x44 bytes */
# UARTFR_RXFE 0x10
# UARTIMSC_RXIM 0x10
# UARTICR_RXIC 0x10
# UARTICR_TXIC 0x20

The input driver will be another process that waits on the same interrupt. Unfortunately, because this one interrupt actually can signal one of several things, if two of them are true, and different processes work on processing them, the interrupt will fire constantly and prevent any work from getting done. We solve this by clearing the more specific interrupt in the drivers where we handle them. For example, in serialout:

*(uart + UARTICR) = UARTICR_TXIC;

Next, start with some setup:

void serialin(volatile unsigned int* uart, unsigned int intr) {
	int fd;
	char c;
	mkfifo("/dev/tty0/in", 0);
	fd = open("/dev/tty0/in", 0);

	/* enable RX interrupt on UART */
	*(uart + UARTIMSC) |= UARTIMSC_RXIM;

	while(1) {

We want to wait on the interrupt, and then immidiately clear the specific interrupt we handle, just like in the output driver:

	interrupt_wait(intr);
	*(uart + UARTICR) = UARTICR_RXIC;

And we want to make sure, not only that no race condition has changed the UART status, but that the interrupt was meant for us at all:

	if(!(*(uart + UARTFR) & UARTFR_RXFE)) {

And then just read the byte and put it in our fifo:

	c = *uart;
	write(fd, &c, 1);

This input server has a bit of a bug. If the fifo is ever full, it will block, and maybe miss input bytes from the serial port. Our processes are likely to be way faster than any serial port, but still, in a real kernel you would want to account for this by splitting the driver into two processes: one that waits on the interrupt, and one that buffers the bytes internally.

Code for this section on Github.

Echo Application

To tie all this up, we write a simple application that will echo back every character you type:

void echo(void) {
	int fdout, fdin;
	char c;
	fdout = open("/dev/tty0/out", 0);
	fdin = open("/dev/tty0/in", 0);

	while(1) {
		read(fdin, &c, 1);
		write(fdout, &c, 1);
	}
}

Code for this section on Github.

We’re done!

We now have a working serial port driver that respects the hardware by using interrupts instead of burning CPU time looping on flag checks. Remember, any interactios with the UART will affect these drivers, so using bwputs may have strange side effects now.

To be honest, this is about as far as I expected to get when I originally started this series. I could write a clock driver, to allow for sleep calls and similar, but you should be able to see how that might work. One big topic I have avoided thus far is enabling the MMU and doing virtual memory / memory protection. I may try to figure out how that works on this harware, but it will require quite a bit of doing. Suggestions for others things you would like to see in this series go in the comments!

Code so far on Github.

Writing a Simple OS Kernel — Part 6, Inter-Process Communication

Posted on

Last time we got hardware interrupts working, and used them to implement preemption for our multitasking. This time we’re going to get our user space processes talking to one another.

getpid

We’re going to need some way for replies to our messages to get sent back to a process. We can’t use whatever other mechanisms we build for registering a new communications channel for this, because without it we cannot receive communications, even about a new communications channel. Luckily we have a piece of information that comes with every process: its ID. In our case, the ID of a task is going to just be its index in the array of stacks in our main.

So the normal drill for a new syscall in the case statement in main:

case 0x2: /* getpid */
	tasks[current_task][2+0] = current_task;
	break;

And in syscall.s:

.global getpid
getpid:
	push {r7}
	mov r7, #
	svc 0
	bx lr

And in asm.h:

int getpid(void);

Removed the accidentally shared static variable. Do not use statics without a virtual memory system!

Code for this section on Github.

Putting processes to sleep

Currently our scheduler just round-robins through all tasks, but if we want a task to be able to wait for a message, we need some way for it to go to sleep while it waits. We will do this by having a new piece of metadata in our process space for “status”:

# TASK_READY       0
# TASK_WAIT_READ   1
# TASK_WAIT_WRITE  2

On coming out of a task (right after the activate call in main) it is definitely ready:

tasks[current_task][-1] = TASK_READY;

Then replace the two lines at the end of the while loop in main with a new scheduler that skips over tasks that aren’t ready:

/* Select next TASK_READY task */
while(TASK_READY != tasks[current_task =
	(current_task+1 >= task_count ? 0 : current_task+1)][-1]);

Now if any syscall sets the status to something else, the process will sleep until woken up.

Code for this section on Github.

Ringbuffers

We’re going to do our communication between processes using a rudimentary implementation of UNIX-style pipes. The pipes will be implemented using ringbuffers. Why? Because ringbuffers are easy to use for FIFO-style communication without needing any dynamic memory allocation. Pipes are allowed to get “full” and luckily we can tell when a ringbuffer is full, so this fits. We’re going to want some generic ringbuffer utilities, and it won’t hurt to have versions specialised to the pipe buffers:

# STACK_SIZE 1024 /* Size of task stacks in words */
# TASK_LIMIT 2   /* Max number of tasks we can handle */
# PIPE_BUF   512 /* Size of largest atomic pipe message */
# PIPE_LIMIT (TASK_LIMIT*5)

struct pipe_ringbuffer {
	int start;
	int end;
	char data[PIPE_BUF];
};

# RB_PUSH(rb, size, v) do { \
		(rb).data[(rb).end] = (v); \
		(rb).end++; \
		if((rb).end > size) (rb).end = 0; \
	} while(0)

# RB_POP(rb, size, v) do { \
		(v) = (rb).data[(rb).start]; \
		(rb).start++; \
		if((rb).start > size) (rb).start = 0; \
	} while(0)

# RB_LEN(rb, size) (((rb).end - (rb).start) + \
	(((rb).end < (rb).start) ? size : 0))

# PIPE_PUSH(pipe, v) RB_PUSH((pipe), PIPE_BUF, (v))
# PIPE_POP(pipe, v)  RB_POP((pipe), PIPE_BUF, (v))
# PIPE_LEN(pipe)     (RB_LEN((pipe), PIPE_BUF))

You’ll note I’ve increased the stack size. We’re going to be handling more data, and with our current setup if you need more space than the stack size it’s just going to cause very strange and hard-to-find bugs. The pipe limit is expressed in terms of the number of tasks. This makes sense since as the number of tasks increases, it is likely that the number of channels we’ll need increases, especially since we’re going to reserve one per task to begin with.

Now we just need to allocate our pipes in main:

struct pipe_ringbuffer pipes[PIPE_LIMIT];

and initialize them all:

size_t i;
/* Initialize all pipes */
for(i = 0; i < PIPE_LIMIT; i++) {
	pipes[i].start = pipes[i].end = 0;
}

Code for this section on Github.

Reading and writing

Now for the meaty syscalls. Some boilerplate first:

int write(int fd, const void *buf, size_t count);
int read(int fd, void *buf, size_t count);

.global write
write:
	push {r7}
	mov r7, #
	svc 0
	bx lr
.global read
read:
	push {r7}
	mov r7, #
	svc 0
	bx lr
case 0x3: /* write */
	_write(tasks[current_task], tasks, task_count, pipes);
	break;
case 0x4: /* read */
	_read(tasks[current_task], tasks, task_count, pipes);
	break;

You’ll note that I’ve placed the bodies of these syscalls off in their own functions. That’s partly because they’re going to be much bigger than what we’ve seen before, but also because they actually need to be able to call each other.

So, definition before use:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);
void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);

These are a bit long and maybe non-obvious, so one piece at a time:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	task[-1] = TASK_READY;

In case there’s an error, make sure we don’t stay blocked forever.

/* If the fd is invalid, or trying to read too much  */
	if(task[2+0] > PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

An out-of-range pipe index is obviously an error. And the ringbuffers can’t hold more than PIPE_BUF bytes, so it’s impossible to read more than that at once.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];
		if((size_t)PIPE_LEN(*pipe) < task[2+2]) {
			/* Trying to read more than there is: block */
			task[-1] = TASK_WAIT_READ;

Ah, our first actual blocking call! We get the ringbuffer for the pipe that is being read from, and if it has less bytes in it than the task wants to read, set the task into the TASK_WAIT_READ state.


		} else {
			size_t i;
			char *buf = (char*)task[2+1];
			/* Copy data into buf */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_POP(*pipe,buf[i]);
			}

We have a valid pipe with enough bytes in the buffer. Pop each byte off into the read buffer until we have read as many as were requested.


			/* Unblock any waiting writes
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_WRITE) {
					_write(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Loop over all the tasks, and any that were waiting to write, unblock them and try the write again. This is somewhat inefficient, since only tasks that were waiting to write to the pipe we just read from have any chance at all of being able to succeed (all the other pipes are just as full as they used to be), but it works.


void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	/* If the fd is invalid or the write would be non-atomic */
	if(task[2+0] > PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

Again, an out-of-range pipe index is obviously an error. And because of the ringbuffers, you’ll never be able to write more than PIPE_BUF bytes into a pipe all at once. We want our writes to be atomic (because we have preemptive multitasking going on), so trying to write more than PIPE_BUF bytes needs to be an error as well.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];

		if((size_t)PIPE_BUF - PIPE_LEN(*pipe) < 
			task[2+2]) {
			/* Trying to write more than we have space for: block */
			task[-1] = TASK_WAIT_WRITE;

If the write would put in more bytes than the pipe currently has space for, then block waiting for a read to make room for us.


		} else {
			size_t i;
			const char *buf = (const char*)task[2+1];
			/* Copy data into pipe */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_PUSH(*pipe,buf[i]);
			}

We have a valid write and there is enough space in the pipe. Push all the bytes from the write buffer into the pipe.


			/* Unblock any waiting reads
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_READ) {
					_read(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Same as before, this is inefficient. Only tasks waiting to read from pipes we just wrote to are really interested in waking up. But it doesn’t take too long to check, and it’s easier for now.

Code for this section on Github.

Pathserver

So, we have working read and write calls now, but in order for processes to communicate they’re going to need to agree on what pipe indices to use. A parent process can get its child’s PID (from fork) and write to it that way, maybe, but that’s not a generic solution.

We’re going to solve this problem by building a nameserver. The names in question are going to be pseudo-filesystem “paths”, so let’s call it a pathserver. We’ll need some string functions:

int strcmp(const char* a, const char* b) {
	int r = 0;
	while(!r && *a && *b) r = (*a++) - (*b++);
	return (*a) - (*b);
}

size_t strlen(const char *s) {
	size_t r = 0;
	while(*s++) r++;
	return r;
}

And since everything is static sized for us, how long can a path be?

# PATH_MAX 255 /* Longest absolute path */

Oh, and we need a way to determine what the pipe index of the pathserver itself is. Otherwise this whole concept is a nonstarter.

# PATHSERVER_FD (TASK_LIMIT+3) /* File descriptor of pipe to pathserver */

We said we’re reserving one pipe for each task for replies, and let’s also reserve 0-2, since those file descriptor numbers are usually “magic” and we don’t want to have any mix-ups until we get real FD tables in place later. The pathserver will manage all other pipes, so it can assign itself the first one. Now we have a constant that tells us the pipe index of the pathserver.

We can use this information to define a protocol and pseudo-syscall for registering a new path:

int mkfifo(const char *pathname, int mode) {
	size_t plen = strlen(pathname)+1;
	char buf[4+4+PATH_MAX];
	(void)mode;

	*((unsigned int*)buf) = 0;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);

	/* XXX: no error handling */
	return 0;
}

Notice I said pseudo-syscall. It may act sort of like a syscall, but in fact it is built entirely based on primitives we already have! This is one of the huge benefits of a microkernel: once you get your basic functionality in place, most things can be built on top of what you already have, as wrappers.

The protocol here is a zero, followed by the length of the path we’re registering, followed by the characters of that path. Pretty simple. There’s no way for the pathserver to signal an error to us, because we don’t wait for a reply, but unless there are no more pipes this can’t fail, so we’ll just leave it for now.

int open(const char *pathname, int flags) {
	unsigned int replyfd = getpid() + 3;
	size_t plen = strlen(pathname)+1;
	unsigned int fd = -1;
	char buf[4+4+PATH_MAX];
	(void)flags;

	*((unsigned int*)buf) = replyfd;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);
	read(replyfd, &fd, 4);

	return fd;
}

The protocol here looks the same as before, only instead of 0 as the first word, we send our reserved pipe. That’s our PID, moved ahead by the three we’re reserving. Conveniently, this means no valid replyfd will ever be 0, so we can differentiate between mkfifo and open.

Now for the pathserver itself, piece by piece.

void pathserver(void) {
	char paths[PIPE_LIMIT - TASK_LIMIT - 3][PATH_MAX];
	int npaths = 0;
	int i = 0;
	unsigned int plen = 0;
	unsigned int replyfd = 0;
	char path[PATH_MAX];

	memcpy(paths[npaths++], "/sys/pathserver", sizeof("/sys/pathserver"));

Need to store the paths of the pipes. We’ll just use an array and a linear lookup for now. We also register the first pipe as being the pathserver.


	while(1) {
		read(PATHSERVER_FD, &replyfd, 4);
		read(PATHSERVER_FD, &plen, 4);
		read(PATHSERVER_FD, path, plen);

Loop forever, reading the three pieces of data out of our pipe. Recall that we wrote all of this data with a single write call. That’s important because we need all these bytes together, so the writes must be atomic. Once we’re reading, however, the bytes are already in a deterministic order, so we can read them out in pieces. This is especially good, because we wrote the length of the message as part of the message! So we certainly need to be able to read one piece at a time.


		if(!replyfd) { /* mkfifo */
			memcpy(paths[npaths++], path, plen);

Registering a new path is super easy. Should probably check for out-of-bounds here, but this works for now.


		} else { /* open */
			/* Search for path */
			for(i = 0; i < npaths; i++) {
				if(*paths[i] && strcmp(path, paths[i]) == 0) {
					i += 3; /* 0-2 are reserved */
					i += TASK_LIMIT; /* FDs reserved for tasks */
					write(replyfd, &i, 4);
					i = 0;
					break;
				}
			}

Linear search for the path. If we find it, adjust the index by the reserved indices and write the reply. We ignore empty paths, to allow us the possibility of pipes without names that the pathserver knows about, but you can’t usefully search for. Notice also that we set i back to zero after writing a reply, because:


			if(i >= npaths) {
				i = -1; /* Error: not found */
				write(replyfd, &i, 4);
			}
		}
	}
}

If we shoot off the end, then i >= npaths, in which case we didn’t find it. Well, don’t let the caller just block forever waiting for a reply that’s not coming! Send them -1.

Now we’ll just tie this all together with some tasks that use this new functionality:

# TASK_LIMIT 3

void otherguy() {
	int fd;
	unsigned int len;
	char buf[20];
	mkfifo("/proc/0", 0);
	fd = open("/proc/0", 0);
	while(1) {
		read(fd, &len, 4);
		read(fd, buf, len);
		bwputs(buf);
	}
}

void first(void) {
	int fd;

	if(!fork()) pathserver();
	if(!fork()) otherguy();

	fd = open("/proc/0", 0);
	while(1) {
		int len = sizeof("Ping\n");
		char buf[sizeof("Ping\n")+4];
		memcpy(buf, &len, 4);
		memcpy(buf+4, "Ping\n", len);
		write(fd, buf, len+4);
	}
}

We spin up the pathserver and the otherguy. The otherguy just waits for incoming strings (19 characters max, but it’s just a demo) and prints any that come in. Then, our first task run in a loop forever sending Ping to the otherguy, and they get printed.

We just built a rudimentary serial port driver! It’s not using interrupts properly yet, but at least strings send to the otherguy are printed atomically, because the otherguy “owns” the serial port and we don’t have all the tasks preempting each other in the middle of a write.

Code for this section on Github.

We’re done!

We now have all the machinery we need in order to send messages between processes. Next time, we’ll look at using all of this to build a more fully-functional serial port driver.

The code so far is on Github.

Writing a Simple OS Kernel — Part 5, Hardware Interrupts

Posted on

Last time we got the kernel switching back and forth between multiple tasks. This time we’re going to get a simple timer to wake up the kernel at fixed intervals.

Motivation

So far we’ve been interacting with hardware when we want to. Writing to the serial port when we have something to print, for example. Why would we want to let the hardware interrupt what we’re doing?

A few reasons:

  1. Currently, our user mode tasks have to call a syscall on purpose in order for any other task to be able to run. This means one task can run forever and block anyone else from running (called starvation). So we at minimum need some way to interrupt running tasks (called preemption).
  2. The hardware knows when it’s ready. It is way more efficient to let the hardware say “I’m ready now” then to keep asking it.
  3. The hardware may actually stop being ready by the time we get around to talking to it again. We want to talk to the hardware during the window when it’s able to talk

Enabling a Simple Timer

Before we get to the serial port, there is an even simpler piece of hardware we can work with first: timers. Timers just take a value and count down to zero, sort of like an alarm clock.

I could make you go look up in the user guides what all the magic numbers for our timers are, but I’m a nice guy, so here’s the ones we need to add to versatilepb.h:

/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/index.html */
# TIMER0 ((volatile unsigned int*)0x101E2000)
# TIMER_VALUE 0x1 /* 0x04 bytes */
# TIMER_CONTROL 0x2 /* 0x08 bytes */
# TIMER_INTCLR 0x3 /* 0x0C bytes */
# TIMER_MIS 0x5 /* 0x14 bytes */
/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0271d/Babgabfg.html */
# TIMER_EN 0x80
# TIMER_PERIODIC 0x40
# TIMER_INTEN 0x20
# TIMER_32BIT 0x02
# TIMER_ONESHOT 0x01

Now add the following somewhere near the top of your main:

*TIMER0 = 1000000;
*(TIMER0 + TIMER_CONTROL) = TIMER_EN | TIMER_ONESHOT | TIMER_32BIT;

The timer on our QEMU system defaults to using a 1Mhz reference, which means that it ticks 1000000 times per second. So, by setting it’s value to 1000000, it will reach 0 after one second.

Then, on the second line, we enable the timer, set it to “oneshot” mode (that is, it will just stop when it’s done counting, we’ll see another mode later), and set it to 32BIT mode (so that it can hold really big numbers).

Now, somewhere in your while loop, put this:

if(!*(TIMER0+TIMER_VALUE)) {
   bwputs("tick\n");
   *TIMER0 = 1000000;
   *(TIMER0 + TIMER_CONTROL) = TIMER_EN | TIMER_ONESHOT | TIMER_32BIT;
}

We’re just printing out “tick” if the timer has reached 0, and then resetting and re-enabling the timer.

Build and run your kernel. It should still work as before, but now it will also print out “tick” once per second.

Back to the Vector Interrupt Table

The CPU uses the same mechanism for hardware interrupting operation (often referred to as an “interrupt request” or IRQ) as it used know where to jump when we used svc. We need to add an entry to the right place. Here’s the new code for bootstrap.s:

interrupt_table:
ldr pc, svc_entry_address
nop
nop
nop
ldr pc, irq_entry_address
svc_entry_address: .word svc_entry
irq_entry_address: .word irq_entry
interrupt_table_end:

IRQ entry point

Back over in context_switch.s, irq_entry is going to be very similar to svc_entry, but with some changes to handle an IRQ instead of a syscall.

As soon as we enter system mode, we need to push the value of r7 onto the task’s stack (because that’s what the syscall wrapper does, and we need to set things up to look the same). The syscall puts the number identifying the syscall into r7, so we should come up with something that will identify an IRQ and put that there instead. To make sure the number never clashes with a syscall number, lets use negative numbers:

push {r7} /* Just like in syscall wrapper */
/* Load PIC status */
ldr r7, PIC
ldr r7, [r7]
PIC: .word 0x10140000
/* Most significant bit index */
clz r7, r7
sub r7, r7, #31

That value at PIC is the address of the interrupt controller status (that is, the value that can tell us which interrupts are currently firing) and then we just do some operations on that value to find out what the most significant bit set in the value is, and transform that into a negative index to use for the identifier of this IRQ.

The only other weird thing is that we have to get the value of the lr from IRQ mode instead of Supervisor mode, and the value is set to the instruction after the one we should be returning to, so we’ll need to subtract 4:

msr CPSR_c, # /* IRQ mode */
mrs ip, SPSR
sub lr, lr, # /* lr is address of next instruction */
stmfd r0!, {ip,lr}

So those bits, together with what we already had from svc_entry give us this complete entry point code:

.global irq_entry
irq_entry:
	/* Save user state */
	msr CPSR_c, # /* System mode */

	push {r7} /* Just like in syscall wrapper */
	/* Load PIC status */
	ldr r7, PIC
	ldr r7, [r7]
	PIC: .word 0x10140000
	/* Most significant bit index */
	clz r7, r7
	sub r7, r7, #31

	push {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	mov r0, sp

	msr CPSR_c, # /* IRQ mode */
	mrs ip, SPSR
	sub lr, lr, # /* lr is address of next instruction */
	stmfd r0!, {ip,lr}

	msr CPSR_c, # /* Supervisor mode */

	/* Load kernel state */
	pop {r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	mov sp, ip
	bx lr

A Small Problem

We’re pushing the value of r7, but where is it going to get popped? The syscall wrappers do both the push and the pop, but in the case of an IRQ we’re pushing in the entry point, and we have no easy way to tell activate that it’s exiting from an IRQ instead of a syscall. Hmm.

The solution to this that I’ve chosen is to remove the pop instruction from the syscall wrappers, and add an identical instruction to right after the pop instruction already in activate. Now the pop always happens.

More Magic Numbers

We hardcoded a single magic number into our assembly, but luckily that’s the only one our assembly code will need. In fact, we should not need any more assembly! (Unless we wanted to implement other sorts of hardware exceptions, which we may get to.) We do need that and a few other interrupt-controller related magic numbers in our C code too, so here’s the stuff for versatilepb.h:

/* http://infocenter.arm.com/help/topic/com.arm.doc.dui0224i/I1042232.html */
# PIC ((volatile unsigned int*)0x10140000)
# PIC_TIMER01 0x10
/* http://infocenter.arm.com/help/topic/com.arm.doc.ddi0181e/I1006461.html */
# VIC_INTENABLE 0x4 /* 0x10 bytes */

Enable the Interrupt

We have to tell the timer to generate interrupts, and we have to tell the interrupt controller to pass those interrupts through. So, replace your current one-shot timer code with this:

*(PIC + VIC_INTENABLE) = PIC_TIMER01;

*TIMER0 = 1000000;
*(TIMER0 + TIMER_CONTROL) = TIMER_EN | TIMER_PERIODIC | TIMER_32BIT | TIMER_INTEN;

Note that we’re also using a new timer mode here. Periodic mode will automatically cause the timer to restart from the value we set when it hits 0, which is perfect for when we’re getting an interrupt instead of waiting until the value hits 0.

Handle the Interrupt

Now we have to add a case to our switch statement for this request:

case -4: /* Timer 0 or 1 went off */
	if(*(TIMER0 + TIMER_MIS)) { /* Timer0 went off */
		*(TIMER0 + TIMER_INTCLR) = 1; /* Clear interrupt */
		bwputs("tick\n");
	}

The way our hardware is set up, the interrupt is actually the same for either timer 0 or timer 1. Now, we know that we’ve only set timer 0, but that’s lazy. TIMER_MIS is the offset where we’ll find a flag telling us if Timer0 was the one that is causing the interrupt. After checking that, we set a value on the timer to clear the interrupt (basically to say that we’ve seen it so that it won’t call us again unitl the next interrupt). Finally, print out “tick”.

As it is, this code should operate exactly the same as the version without the interrupts.

Pre-emption

Up until now, we’ve needed the dummy syscall syscall in an infinite loop at the end of every task in order to let other tasks run. Since the timer interrupts the tasks now, we don’t need them to interrupt themselves. Remove all references to the syscall syscall and try running the kernel again. Now you’ll note that the first task just stays in its infinite loop and does not let the other task keep running. Does not, that is, until the timer goes off and you see “tick” printed for the first time, after which the tasks switch. So the tasks are now being properly interrupted even if they do not give up control!

We’re done!

The kernel now has all the machinery it needs to be able to handle hardware interrupts, and tasks are being pre-empted by the timer. Next time we’ll start looking at IPC.

The code so far is on GitHub.

Writing a Simple OS Kernel — Part 4, Multitasking

Posted on

Added a description of fork, 2012-36.

Last time we got our kernel switch back and forth between a user mode task. The focus in this post is going to be adding the ability to run multiple tasks.

Hack Two Tasks in There

Our kernel currently has no way for tasks to exit, so we need to make sure they never try to return, since that will probably cause an error and hang the system (in the best case). So change the syscall invocation at the end of first to be in an infinite loop, such that if the task gets re-activated after it is done it just immediately calls back into the kernel.

Then, add a second function so that we have something else to run:

void task(void) {
	bwputs("In other task\n");
	while(1) syscall();
}

We’re going to run two tasks, so change the first_stack and first_stack_start to something like:

unsigned int stacks[2][256];
unsigned int *tasks[2];

And setup both tasks, in basically the same way as we did before:

tasks[0] = stacks[0] + 256 - 16;
tasks[0][0] = 0x10;
tasks[0][1] = (unsigned int)&first;

tasks[1] = stacks[1] + 256 - 16;
tasks[1][0] = 0x10;
tasks[1][1] = (unsigned int)&task;

Then, call activate with tasks[0] and tasks[1] instead of first_stack_start. Recompile and run, you should see both tasks running in the order that you activate them.

Some abstractions

We’ve hardcoded a bunch of stuff again. We should clean it up so that we don’t have to keep copying these chunks every time we want a new task. Let’s move those magic numbers into constants and make a function to set up a new task:

# STACK_SIZE 256 /* Size of task stacks in words */
# TASK_LIMIT 2   /* Max number of tasks we can handle */

unsigned int *init_task(unsigned int *stack, void (*start)(void)) {
	stack += STACK_SIZE - 16; /* End of stack, minus what we're about to push */
	stack[0] = 0x10; /* User mode, interrupts on */
	stack[1] = (unsigned int)start;
	return stack;
}

Then, back in main clean up the task initialisation to use these abstractions:

unsigned int stacks[TASK_LIMIT][STACK_SIZE];
unsigned int *tasks[TASK_LIMIT];

tasks[0] = init_task(stacks[0], &first);
tasks[1] = init_task(stacks[1], &task);

Scheduling

Scheduling (deciding what tasks to run, when, and for how long, in a multitasking system) is a big topic. We are going to implement the simplest possible (in our situation) schedule: round-robin. This means that we’ll just activate each task in the order that they were created, and then go back to the beginning and activate each in sequence again.

Since we have no way for tasks to exit, make sure the following is at the end of each task:

while(1) syscall();

This will just cause the task to repeatedly call back into the kernel when it is done.

We’ll also need to keep track in main of how many tasks there are and which one we’re currently running:

size_t task_count = 0;
size_t current_task = 0;

In order to use size_t you’ll need to include stddef.h. Now just make sure that task_count gets set correctly, and replace the code that activates user tasks with this:

while(1) {
	tasks[current_task] = activate(tasks[current_task]);
	current_task++;
	if(current_task >= task_count) current_task = 0;
}

This just activates each task in order, repeatedly, forever. Fairly simple.

You may be wondering why I didn’t use the classic current_task = (current_task + 1) % task_count trick. The reason is that, at least on ARMv6, gcc actually generates a library call for the % operator. You could try linking in libgcc, but there are some problems. For this case, it just wasn’t worth it, and so I’m using an if statement instead.

Code so far on GitHub

Setup for new Syscall

So, our syscall syscall is cute, but it’s not very useful. For good multitasking, we want a way for a user mode task to create new tasks. We’ll do this the way Unix does: fork. This syscall copies the current process, and then returns the ID of the new process to the parent, and 0 to the child.

We need a way for the kernel to know what the user mode task wants it to do. As I mentioned in a previous post, this is what the “argument” to svc is for. We could add code to our context switch to mask the bits off of the end of the svc instruction, but let’s not bother our context switch now. We’re going to store the id of our syscall in a register. syscalls.s is now:

.global syscall
syscall:
	push {r7}
	mov r7, #
	svc 0
	pop {r7}
	bx lr

.global fork
fork:
	push {r7}
	mov r7, #
	svc 0
	pop {r7}
	bx lr

That’s two syscalls, and the context switch will save the new value of r7 on the top of the stack, which we can read in the kernel. The actual value of r7 gets saved and restored by the syscall wrapper.

In order to call our new syscall, we’ll need to add the following to asm.h:

int fork(void);

memcpy

As you may have guessed, we’re going to implement the fork syscall, in order to make our multitasking more actually useful. To do that, we’re going to need to copy stuff. Because we’re not linking in libc at this point, we need our own memcpy, like so:

void *memcpy(void *dest, const void *src, size_t n) {
	char *d = dest;
	const char *s = src;
	size_t i;
	for(i = 0; i < n; i++) {
		d[i] = s[i];
	}
	return d;
}

Forking

We’re going to use fork to spawn task from first. So, move task up above first, and replace the first call to syscall with:

if(!fork()) task();

Then, modify main so that only first gets set up to run.

The Actual Syscall

So, we’re jumping into the kernel now, but so far we don’t actually do anything with this new syscall. How can we even tell which syscall is being called? Well, remember that the syscall id is in r7 when the context switch happens, so by the time we get to the kernel, we can access it like so:

switch(tasks[current_task][2+7]) {
	case 0x1:
		bwputs("fork!\n");
	break;
}

What’s that 2+7 for? Well, remember, we push SPSR and the supervisor mode lr on the top of the stack, so we have to move past those to get to the registers.

Our kernel has a limitation. Specifically, TASK_LIMIT. If we call fork when there is no space for a new task, that would be bad, so we should return an error:

if(task_count == TASK_LIMIT) {
	tasks[current_task][2+0] = -1;
} else {
}

What are we doing here? We’re changing the saved value of r0, which, you will recall, is the return value of a function. So, we are setting the return value that the user mode task will see to -1.

Now, what does fork actually need to do? It copies all the state from one task into a new one. The new task will need a stack pointer, and said stack pointer will need to be exactly as far from the end of the stack as the current task’s stack pointer:

size_t used = stacks[current_task] + STACK_SIZE - tasks[current_task];
tasks[task_count] = stacks[task_count] + STACK_SIZE - used;

Now we actually have to copy the stack over. Luckily, we know exactly how much to copy, since it’s the same as the distance from the stack pointer to the end:

memcpy(tasks[task_count], tasks[current_task], used*sizeof(*tasks[current_task]));

fork is specified to return the new PID in the parent process, and 0 in the child process:

tasks[current_task][2+0] = task_count;
tasks[task_count][2+0] = 0;

And, finally, we should probably record the fact that there’s a new task:

task_count++;

That’s it!

We now have a multitasking kernel (remember to increase TASK_LIMIT if you want to run more tasks!), and user mode tasks can start new tasks. Next time: hardware interrupts!

Code for this post is on GitHub.

Writing a Simple OS Kernel — Part 2, User Mode

Posted on

Updated 2012-33 to fix a bug in the context switch assumptions.
Edited for clarity, 2012-34.

Last time, we got a basic bit of C code bootstrapped and running all by itself on QEMU. This is often called “bare-metal” programming. This time we’re going to add to that and do something a bit more complicated: we’re going to run a single user-mode program on our kernel.

What is User Mode?

User mode is the name given to the mode we put the computer in when running anything other than the kernel. This often has implications for memory protection and similar, but we don’t have any of that. For us it’s just going to be the CPU mode in which certain things (such as changing CPU modes) cannot be done.

User mode also often refers to the slices of the system resources given to different processes, etc, such that they can all be run on the same computer without interfering with each other.

First, some abstractions

We hard-coded some values and functionality last time to write to the serial port. That code was very simple for printing one statement, but we may want to clean it up a bit if we’re going to use the serial port a lot.

First, let’s pull out everything machine-specific from kernel.c and put it in a header file for the machine, call it versatilepb.h:

# UART0 ((volatile unsigned int*)0x101f1000)

So far we just throw the characters at the serial port as fast as we can, and hope they get caught. That will probably always work on QEMU, but will not work on real hardware, so let’s add the ability to check if the serial port can handle a byte right now. For that we’ll need two more constants:

# UARTFR 0x06
# UARTFR_TXFF 0x20

UARTFR is the offset (in words) from the UART0 base address of the flags for the serial port. UARTFR_TXFF is the mask that gets the bit representing if the transmit buffer is full.

Now you can put the following at the top of kernel.c:

# "versatilepb.h"

void bwputs(char *s) {
	while(*s) {
		while(*(UART0 + UARTFR) & UARTFR_TXFF);
		*UART0 = *s;
		s++;
	}
}

Now you can replace the big mess in main with bwputs("Hello, World!\n"); or any other message. Much cleaner!

Code so far on GitHub

Calling Into Assembly Code

Last time, we had a small piece of assembly code that called in to our C program. This time, we are going to want to call into assembly code from our C code. If you are on Ubuntu or some other systems, your C compiler may be defaulting to generating “thumb mode” instructions, which are not what we need to use for our assembly code. So we need to tell the C compiler to generate normal ARM instructions. To do this with gcc, add -marm to the end of your CFLAGS

A User Program

Create a simple “user program” in the form of a function in kernel.c called first that just prints and then hangs (since we will not build the ability to get out of user mode until later):

void first(void) {
	bwputs("In user mode\n");
	while(1);
}

Assembly stub

Create a new file called context_switch.s with the following:

.global activate
activate:

And a new file called asm.h with the following:

void activate(void);

Include this new header file into kernel.c so that you can call the assembly stub from your C code, then add a call to activate to your main.

Finally, add context_switch.o as a dependency to kernel.elf in your Makefile so that it will get built.

The Context Switch

Alright, what are the absolute minimum things we need our switch to user mode (called the “context switch”) to do? Well, it the very least we need a way to start running some function in user mode.

The way to switch an ARM system into user mode is to use the movs instruction to put some address to jump to (like the address of our function) into the pc register (the “program counter”, which is where the CPU is currently executing). But what mode will the CPU enter when we do this? The answer is that it will read the contents of a special register called SPSR (Saved Processor Status Register) and use that to change CPSR (Current Processor Status Register), and thus change modes. Couldn’t we just change CPSR directly? Because we’re not in user mode, we could, but since we want to jump into our function the moment we switch modes, this is the safest way to do it:

mov r0, #
msr SPSR, r0
ldr lr, =first
movs pc, lr

0x10 is just the value that sets the bit meaning “user mode”. We set that to SPSR, load the location of first and then movs there.

You can stick that at the start of activate and try to run that if you like, but it won’t work. Why is that? Remember how we had to set up the stack in order to jump into C code? Well, it turns out that one of the differences of user mode is that it uses a different sp register. This can be very handy later when we’re doing more complicated things, but for now we can just set the user mode stack to be the same as the kernel stack, by adding the following before the movs:

mov ip, sp
msr CPSR_c, # /* System mode */
mov sp, ip
msr CPSR_c, # /* Supervisor mode */

So what are we doing here? We copy our current sp to ip (because we’ll have a different sp in user mode, so we need to copy it somewhere), then we set a part of CPSR directly to enter “system mode”. What is system mode? It’s a special mode on the ARM processor that lets us access the registers as though we were in user mode, but still be able to do privileged things. We set user mode sp to our copy, then switch back to supervisor mode (which is where we normally operate in the kernel).

If you build the kernel now, and run it under QEMU, you should get “In user mode” printed out. Good job!

Code so far on GitHub

A Better Stack

Using the same kernel stack for our user mode program isn’t going to work very well if we want to be able to pause the program and go back to it, because other things will use the kernel stack in between, so we really want the program to have it’s own stack.

First, declare some space for your user stack:

unsigned int first_stack[256];

Then pass first_stack + 256 to activate, and change asm.h to have activate take an argument.

The first four arguments to an assembly call come in as r0-r3, so we can easily access this parameter inside activate:

msr CPSR_c, # /* System mode */
mov sp, r0
msr CPSR_c, # /* Supervisor mode */

One less line, since we can access r0 from user mode directly.

Less hardcoding

The program should still run, but now it’s using its own stack. We still have the value of SPSR and the name of the function we’re calling hardcoded into the assembly. We could pass these as parameters, but then we would have to remember them in a special way when it comes to being able to enter and re-enter the same user mode function multiple times (since the current stack, CPU mode, and entry point can change between calls), so it’s actually easiest to store these two additional values on the user mode program’s stack. We will store them in special positions so that all the user mode registers can be saved along with them easily.

We’ll want to move the calculation of the end of the stack up, so that we can put our data into in:

unsigned int *first_stack_start = first_stack + 256 - 16;
first_stack_start[0] = 0x10;
first_stack_start[1] = (unsigned int)&first;

You’ll note the cast to (unsigned int) of the function pointer. This is mostly to make the compiler not warn us about using a function pointer as data. You should now pass first_stack_start to activate. If you want to, you can test that it still works, but we aren’t actually using this new data yet.

We’ve done a lot of work on the assembly, and are about to change it quite a bit, so I’ll reproduce the whole context switch here with the changes to use these values:

.global activate
activate:
	ldmfd r0!, {ip,lr} /* Get SPSR and lr */
	msr SPSR, ip

	msr CPSR_c, # /* System mode */
	mov sp, r0
	pop {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,fp,ip,lr}
	msr CPSR_c, # /* Supervisor mode */

	movs pc, lr

This loads the first two elements from the passed-in stack to SPSR and lr (ldmfd, when coupled with the ! on the first argument, is the same as pop, but works with any register instead of just sp), then we switch to usermode and set the stack to r0, as before. Finally we use the pop instruction to load the rest of the stuff into our registers. We have nothing in there just now, but we’ll use more later, and this also makes the stack skip over all that stuff so that the process can use all of the space.

You’ll note we had to set lr twice. This is because, like sp, lr has a different version used in user mode from the one used in supervisor mode.

That’s it!

We now have a kernel that sets up a user mode task and then switches to it. Next time: getting back out of user mode!

The code for this post is on GitHub.