Singpolyma

Technical Blog

Archive for the "Tech" Category

ARM Tooling Interlude


This is just a brief interlude to document something that took me a very long time to find out. Some ARM cross-compiling toolchains default to Thumb instructions. Thumb is a nifty feature of ARM that allows most common instructions to be represented in 16 bits instead of 32, which can save RAM and improve instruction fetching. Many of the instructions needed in the context-switch part of the kernel, however, must be in full 32-bit ARM. No problem: there are instructions for switching the CPU mode, so just switch whenever moving from one to the other, right?

One of the downsides of writing C over raw assembly is that you don’t have full control over what instructions are output. I needed a way to make the toolchain realize that mode-switching instructions needed to be inserted. I could not (at the time) find such a way.

So, I used a different solution: tell the C compiler to always output 32-bit ARM, no matter what the default is. This worked, but presented another problem: the default and standard libraries for my gcc were also in Thumb, and the interwork was not getting set up there either! So I could no longer use anything from libgcc or libc. Not only could I not use their memcpy, etc. (not necessarily bad, since in an OS one might want to write one’s own version of much of libc anyway), but I couldn’t use things like the % operator (which on ARM is implemented using a helper procedure in libgcc).

A helpful StackOverflow user finally found the answer for me recently, and I have confirmed that by marking procedure entry points as functions I can stop being concerned about -marm, which allows me to use -lgcc and % and even use parts of -lc.

Why Care About Typeclasses?


Let’s say you have some code and you want it to be able to interact in a very similar way with either a network connection or a named pipe. You might structure this using a sum type (also called a tagged union):


data Stream = Network NHandle | Pipe PHandle

def interaction(h)
	input = case h
		when (Network handle)
			readNetwork(handle)
		when (Pipe handle)
			readPipe(handle)
	end

	output = transform_input(input)

	case h
		when (Network handle)
			writeNetwork(handle, output)
		when (Pipe handle)
			writePipe(handle, output)
	end
end

Since “dynamic typing” is just the practice of giving every value the exact same type, which is a giant sum type, a dynamically typed construction would look like this:


def interactionDynamic(h)
	input = case h
		when is_a?Network
			readNetwork(h)
		when is_a?Pipe
			readPipe(h)
	end

	output = transform_input(input)

	case h
		when is_a?Network
			writeNetwork(h, output)
		when is_a?Pipe
			writePipe(h, output)
	end
end

This is basically the same, but it introduces a tradeoff. In the original, if we want to add another supported stream, we need to add a case to the sum type *and* cases to the function, and in return we are guaranteed that we will only ever be passed cases we can handle. With the dynamically typed version, we can just add cases to the function, but any value can be passed to us, and if we can’t handle it the code will just crash.

Neither of these solutions are very nice in this case. We really want abstracted operations:

def read(h)
	case h
		when is_a?Network
			readNetwork(h)
		when is_a?Pipe
			readPipe(h)
	end
end

def write(h,output)
	case h
		when is_a?Network
			writeNetwork(h, output)
		when is_a?Pipe
			writePipe(h, output)
	end
end

def interactionDynamicAbstract(h)
	write(h, transform_input(read(h)))
end

Much nicer! One issue here, though, is that these operations really are fundamental to Networks and Pipes, and we have defined them as part of our code. It is likely that other people using Networks and Pipes will end up writing similar code. What if we bundled up all the operations you could do with a Pipe right alongside the data?


def interactionDuckTyped(h)
	h.write(transform_input(h.read()))
end

Now the operations live inside the datatype. This has the added advantage that we don’t even need to write any code to support a new type of stream, as long as the stream comes packaged with a read and a write operation.

There are two issues here, though. One is very similar to the tradeoff we made for dynamic typing: the compiler cannot infer from what we’ve written which operations are necessary, and so cannot check until runtime that values being passed to our procedure actually implement those operations. The other problem is that the only thing we know about the operations for sure is their names. We cannot easily document that these operations should have certain properties, because the users of our function cannot easily see what operations we are using.

Can we fix these issues while keeping the extensibility and power? We could try bundling the operations up separately:


def interactionRecord(h, ops)
	ops[:write](h, transform_input(ops[:read](h)))
end

Now we know that these operations will be present, and we can document that the operations the user passes in must satisfy certain properties. This seems like a pretty good solution. We have, however, leaked the management of which operations to use back to our user. Can we keep this safety while regaining the convenience of having the stream type author specify the operations instead of our user?

This is exactly what typeclasses do. Consider:


typeclass Stream a where {
	void write(handle, bytestring);
	bytestring read(handle);
}

hastypeclass Stream Network where {
	write = ...;
	read = ...;
}

def interactionTypeclass(h)
	Stream::write(h, transform_input(Stream::read(h)))
end

Now we have a natural place to document the properties operations must have (the typeclass definition), and their relationships with each other. The compiler can look up the correct hastypeclass to figure out what code to run for each type, and while it does so it can check that the operations we need (the ones that are part of Stream) are in fact defined for the input type.

Safety, extensibility, and ease of use, all in one solution.

Writing a Simple OS Kernel — Part 7, Serial Port Driver


Last time we got IPC working, and used it to build a simple path-resolution module. This time we’re going to write a fully-functional serial port driver.

Generic Interrupts

When we got interrupts working, we sort of hacked the kernel part in to support the timer, which is all we were interested in at the time. Let’s make that a bit more generic. We’re going to need one more syscall, since interrupts have to be handled by the kernel. The simplest one we can implement just allows a process to wait for an interrupt to fire on a particular line:

void interrupt_wait(int intr);

.global interrupt_wait
interrupt_wait:
	push {r7}
	mov r7, #0x5
	svc 0
	pop {r7}
	bx lr

#define TASK_WAIT_INTR 3

case 0x5: /* interrupt_wait */
	/* Enable interrupt */
	*(PIC + VIC_INTENABLE) = tasks[current_task][2+0];
	/* Block task waiting for interrupt to happen */
	tasks[current_task][-1] = TASK_WAIT_INTR;
	break;

We are already enabling the timer interrupt when the kernel starts. Here we also enable the interrupt that the process is asking to wait on: if it were left disabled, the task would block forever, so this seems like a useful place to do it.

We also want to unblock tasks when the interrupt fires. Get rid of the -4 case that we have hardcoded for the timer and put this in:

default: /* Catch all interrupts */
	if((int)tasks[current_task][2+7] < 0) {
		unsigned int intr = (1 << -tasks[current_task][2+7]);

		if(intr == PIC_TIMER01) {
			/* Never disable timer. We need it for pre-emption */
			if(*(TIMER0 + TIMER_MIS)) { /* Timer0 went off */
				*(TIMER0 + TIMER_INTCLR) = 1; /* Clear interrupt */
			}
		} else {
			/* Disable interrupt, interrupt_wait re-enables */
			*(PIC + VIC_INTENCLEAR) = intr;
		}
		/* Unblock any waiting tasks
			XXX: nondeterministic unblock order
		*/
		for(i = 0; i < task_count; i++) {
			if(tasks[i][-1] == TASK_WAIT_INTR && tasks[i][2+0] == intr) {
				tasks[i][-1] = TASK_READY;
			}
		}
	}

You’ll notice we also need a new magic number for versatilepb.h:

#define VIC_INTENCLEAR 0x5 /* 0x14 bytes */

Code for this section on Github.

Output Driver

Let’s use this new ability to implement half of our driver: output. This will allow us to write to a specific fifo for output to the serial port instead of using bwputs.

A refresher of serial port related magic numbers:

/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0224i/Bbabegge.html */
#define UART0 ((volatile unsigned int*)0x101f1000)
/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0183g/I18381.html */
#define UARTFR 0x06 /* 0x18 bytes */
#define UARTIMSC 0x0E /* 0x38 bytes */
#define UARTFR_TXFF 0x20
#define UARTIMSC_TXIM 0x20

#define PIC_UART0 0x1000

Let’s make a generic driver that accepts parameters telling it what serial port to drive:

void serialout(volatile unsigned int* uart, unsigned int intr) {
	int fd;
	char c;

We need to create the fifo that processes send bytes on:

	mkfifo("/dev/tty0/out", 0);
	fd = open("/dev/tty0/out", 0);

We also need to tell the UART to trigger an interrupt when it becomes ready to send a byte:

	*(uart + UARTIMSC) |= UARTIMSC_TXIM;

And finally a loop to read bytes and send them to the UART:

	while(1) {
		read(fd, &c, 1);
		interrupt_wait(intr);
		*uart = c;
	}

This seems pretty reasonable, but there is a problem. The UART triggers an interrupt when it becomes ready to send a byte. If it is already ready (such as at boot time), the interrupt will not trigger. So we try this:

	while(1) {
		read(fd, &c, 1);
		*uart = c;
		interrupt_wait(intr);
	}

This is closer, but in both of these versions there is a race condition where the UART might not actually be ready anymore by the time we process the interrupt (depending on what other tasks are doing), so let’s add a check for that:

	while(1) {
		read(fd, &c, 1);
		if(!(*(uart + UARTFR) & UARTFR_TXFF)) {
			*uart = c;
		}
		interrupt_wait(intr);
	}

Now, however, if the UART is not ready, we will skip bytes. So we only want to read a new byte if we actually wrote the current one:

	int doread = 1;
	while(1) {
		if(doread) read(fd, &c, 1);
		doread = 0;
		if(!(*(uart + UARTFR) & UARTFR_TXFF)) {
			*uart = c;
			doread = 1;
		}
		interrupt_wait(intr);
	}

To test this out, let’s set up a new first:

void first(void) {
	int fd;

	if(!fork()) pathserver();
	if(!fork()) serialout(UART0, PIC_UART0);

	fd = open("/dev/tty0/out", 0);
	write(fd, "woo\n", sizeof("woo\n"));
	write(fd, "thar\n", sizeof("thar\n"));
	while(1);
}

Our current pre-emption speed will make this a bit of a pain to run, so you can set the clock value in main to something lower.

We can now print to the serial port by writing to a fifo!

Code for this section on Github.

Input Driver

Some new magic numbers:

#define UARTICR 0x11 /* 0x44 bytes */
#define UARTFR_RXFE 0x10
#define UARTIMSC_RXIM 0x10
#define UARTICR_RXIC 0x10
#define UARTICR_TXIC 0x20

The input driver will be another process that waits on the same interrupt. Unfortunately, this one interrupt can actually signal one of several conditions. If two of those conditions are true, and different processes are working on processing them, the interrupt will fire constantly and prevent any work from getting done. We solve this by clearing the more specific interrupt in the driver that handles it. For example, in serialout:

*(uart + UARTICR) = UARTICR_TXIC;

Next, start with some setup:

void serialin(volatile unsigned int* uart, unsigned int intr) {
	int fd;
	char c;
	mkfifo("/dev/tty0/in", 0);
	fd = open("/dev/tty0/in", 0);

	/* enable RX interrupt on UART */
	*(uart + UARTIMSC) |= UARTIMSC_RXIM;

	while(1) {

We want to wait on the interrupt, and then immediately clear the specific interrupt we handle, just like in the output driver:

	interrupt_wait(intr);
	*(uart + UARTICR) = UARTICR_RXIC;

And we want to make sure, not only that no race condition has changed the UART status, but that the interrupt was meant for us at all:

	if(!(*(uart + UARTFR) & UARTFR_RXFE)) {

And then just read the byte and put it in our fifo:

	c = *uart;
	write(fd, &c, 1);

This input server has a bit of a bug. If the fifo is ever full, it will block, and maybe miss input bytes from the serial port. Our processes are likely to be way faster than any serial port, but still, in a real kernel you would want to account for this by splitting the driver into two processes: one that waits on the interrupt, and one that buffers the bytes internally.

Code for this section on Github.

Echo Application

To tie all this up, we write a simple application that will echo back every character you type:

void echo(void) {
	int fdout, fdin;
	char c;
	fdout = open("/dev/tty0/out", 0);
	fdin = open("/dev/tty0/in", 0);

	while(1) {
		read(fdin, &c, 1);
		write(fdout, &c, 1);
	}
}

Code for this section on Github.

We’re done!

We now have a working serial port driver that respects the hardware by using interrupts instead of burning CPU time looping on flag checks. Remember, any interactions with the UART will affect these drivers, so using bwputs may have strange side effects now.

To be honest, this is about as far as I expected to get when I originally started this series. I could write a clock driver, to allow for sleep calls and similar, but you should be able to see how that might work. One big topic I have avoided thus far is enabling the MMU and doing virtual memory / memory protection. I may try to figure out how that works on this hardware, but it will require quite a bit of doing. Suggestions for other things you would like to see in this series go in the comments!

Code so far on Github.

Compiling Mustache Templates


Christopher Vollick and I were recently discussing ways to get more compile-time guarantees out of our web templates. We both love Mustache for its simplicity and its thin-view philosophy, so that was a natural place to start.

I decided to try writing a tool to convert Mustache templates to Haskell code, which I have done.

This allows the compiler to check a number of things for us including: ensuring we do not use any keys in the templates that are not defined in the model, ensuring that we do not use keys in the template in ways that are inconsistent with the datatype in the model, and ensuring that our partials and sections expect the context they will end up getting passed due to the structure of the templates.

It also gives us a speed boost. Compared to the other popular Haskell Mustache implementation, my current benchmarks show a significant speedup. This is to be expected, since GHC can perform optimisations that a runtime interpolator just can’t. Also, in order to be able to use arbitrary Haskell records as context, runtime implementations have to perform runtime checks using a union type or the Typeable class. My implementation can do all these checks at compile time.

People familiar with Mustache may be wondering how I handle recursive partials, since this is a feature Mustache touts as being possible because the templates are not compiled. I make a compromise in my code: partials are rendered out as toplevel functions, just like any other file, and as such they only inherit the context directly above them, instead of inheriting the entire scope all the way to the top. I rarely use values from higher scopes in my partials anyway, so I don’t think this is a very severe limitation.

Values that are displayed in the template must have an instance of Pretty from wl-pprint, and values that are used in sections or inverse sections must either be lists of records (which get used as the context for the section body) or any Monoid instance. If a value (== mempty), it is considered falsy. Note that if you need values to be able to be optional, but want the natural mempty of your datatype to be truthy, you can achieve this by wrapping it in a Maybe.

The requirement to be a Monoid is relaxed for Bool and a large list of numeric types, which get wrapped in the Any and Sum newtype wrappers by the code generator when they are detected in a record. This allows False and 0 to be treated as falsy for the majority of types where this would be interesting without needing to define orphan Monoid instances or wrap everything up in newtypes yourself.

There is example code on GitHub. Check it out, play with it, and let me know what you think!

Writing a Simple OS Kernel — Part 6, Inter-Process Communication


Last time we got hardware interrupts working, and used them to implement preemption for our multitasking. This time we’re going to get our user space processes talking to one another.

getpid

We’re going to need some way for replies to our messages to get sent back to a process. We can’t rely on whatever other mechanism we build for registering a new communications channel, because without an existing channel we cannot receive any communications at all, not even about a new channel. Luckily, there is a piece of information that comes with every process: its ID. In our case, the ID of a task is just its index in the array of stacks in our main.

So the normal drill for a new syscall in the case statement in main:

case 0x2: /* getpid */
	tasks[current_task][2+0] = current_task;
	break;

And in syscall.s:

.global getpid
getpid:
	push {r7}
	mov r7, #0x2
	svc 0
	pop {r7}
	bx lr

And in asm.h:

int getpid(void);

Removed the accidentally shared static variable. Do not use statics without a virtual memory system!

Code for this section on Github.

Putting processes to sleep

Currently our scheduler just round-robins through all tasks, but if we want a task to be able to wait for a message, we need some way for it to go to sleep while it waits. We will do this by having a new piece of metadata in our process space for “status”:

#define TASK_READY       0
#define TASK_WAIT_READ   1
#define TASK_WAIT_WRITE  2

On coming out of a task (right after the activate call in main) it is definitely ready:

tasks[current_task][-1] = TASK_READY;

Then replace the two lines at the end of the while loop in main with a new scheduler that skips over tasks that aren’t ready:

/* Select next TASK_READY task */
while(TASK_READY != tasks[current_task =
	(current_task+1 >= task_count ? 0 : current_task+1)][-1]);

Now if any syscall sets the status to something else, the process will sleep until woken up.

Code for this section on Github.

Ringbuffers

We’re going to do our communication between processes using a rudimentary implementation of UNIX-style pipes. The pipes will be implemented using ringbuffers. Why? Because ringbuffers are easy to use for FIFO-style communication without needing any dynamic memory allocation. Pipes are allowed to get “full” and luckily we can tell when a ringbuffer is full, so this fits. We’re going to want some generic ringbuffer utilities, and it won’t hurt to have versions specialised to the pipe buffers:

#define STACK_SIZE 1024 /* Size of task stacks in words */
#define TASK_LIMIT 2    /* Max number of tasks we can handle */
#define PIPE_BUF   512  /* Size of largest atomic pipe message */
#define PIPE_LIMIT (TASK_LIMIT*5)

struct pipe_ringbuffer {
	int start;
	int end;
	char data[PIPE_BUF];
};

#define RB_PUSH(rb, size, v) do { \
		(rb).data[(rb).end] = (v); \
		(rb).end++; \
		if((rb).end >= size) (rb).end = 0; \
	} while(0)

#define RB_POP(rb, size, v) do { \
		(v) = (rb).data[(rb).start]; \
		(rb).start++; \
		if((rb).start >= size) (rb).start = 0; \
	} while(0)

#define RB_LEN(rb, size) (((rb).end - (rb).start) + \
	(((rb).end < (rb).start) ? size : 0))

#define PIPE_PUSH(pipe, v) RB_PUSH((pipe), PIPE_BUF, (v))
#define PIPE_POP(pipe, v)  RB_POP((pipe), PIPE_BUF, (v))
#define PIPE_LEN(pipe)     (RB_LEN((pipe), PIPE_BUF))

You’ll note I’ve increased the stack size. We’re going to be handling more data, and with our current setup if you need more space than the stack size it’s just going to cause very strange and hard-to-find bugs. The pipe limit is expressed in terms of the number of tasks. This makes sense since as the number of tasks increases, it is likely that the number of channels we’ll need increases, especially since we’re going to reserve one per task to begin with.

Now we just need to allocate our pipes in main:

struct pipe_ringbuffer pipes[PIPE_LIMIT];

and initialize them all:

size_t i;
/* Initialize all pipes */
for(i = 0; i < PIPE_LIMIT; i++) {
	pipes[i].start = pipes[i].end = 0;
}

Code for this section on Github.

Reading and writing

Now for the meaty syscalls. Some boilerplate first:

int write(int fd, const void *buf, size_t count);
int read(int fd, void *buf, size_t count);

.global write
write:
	push {r7}
	mov r7, #0x3
	svc 0
	pop {r7}
	bx lr
.global read
read:
	push {r7}
	mov r7, #0x4
	svc 0
	pop {r7}
	bx lr
case 0x3: /* write */
	_write(tasks[current_task], tasks, task_count, pipes);
	break;
case 0x4: /* read */
	_read(tasks[current_task], tasks, task_count, pipes);
	break;

You’ll note that I’ve placed the bodies of these syscalls off in their own functions. That’s partly because they’re going to be much bigger than what we’ve seen before, but also because they actually need to be able to call each other.

So, definition before use:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);
void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);

These are a bit long and maybe non-obvious, so one piece at a time:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	task[-1] = TASK_READY;

In case there’s an error, make sure we don’t stay blocked forever.

/* If the fd is invalid, or trying to read too much  */
	if(task[2+0] >= PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

An out-of-range pipe index is obviously an error. And the ringbuffers can’t hold more than PIPE_BUF bytes, so it’s impossible to read more than that at once.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];
		if((size_t)PIPE_LEN(*pipe) < task[2+2]) {
			/* Trying to read more than there is: block */
			task[-1] = TASK_WAIT_READ;

Ah, our first actual blocking call! We get the ringbuffer for the pipe that is being read from, and if it has less bytes in it than the task wants to read, set the task into the TASK_WAIT_READ state.


		} else {
			size_t i;
			char *buf = (char*)task[2+1];
			/* Copy data into buf */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_POP(*pipe,buf[i]);
			}

We have a valid pipe with enough bytes in the buffer. Pop each byte off into the read buffer until we have read as many as were requested.


			/* Unblock any waiting writes
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_WRITE) {
					_write(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Loop over all the tasks, and any that were waiting to write, unblock them and try the write again. This is somewhat inefficient, since only tasks that were waiting to write to the pipe we just read from have any chance at all of being able to succeed (all the other pipes are just as full as they used to be), but it works.


void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	/* If the fd is invalid or the write would be non-atomic */
	if(task[2+0] >= PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

Again, an out-of-range pipe index is obviously an error. And because of the ringbuffers, you’ll never be able to write more than PIPE_BUF bytes into a pipe all at once. We want our writes to be atomic (because we have preemptive multitasking going on), so trying to write more than PIPE_BUF bytes needs to be an error as well.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];

		if((size_t)PIPE_BUF - PIPE_LEN(*pipe) < task[2+2]) {
			/* Trying to write more than we have space for: block */
			task[-1] = TASK_WAIT_WRITE;

If the write would put in more bytes than the pipe currently has space for, then block waiting for a read to make room for us.


		} else {
			size_t i;
			const char *buf = (const char*)task[2+1];
			/* Copy data into pipe */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_PUSH(*pipe,buf[i]);
			}

We have a valid write and there is enough space in the pipe. Push all the bytes from the write buffer into the pipe.


			/* Unblock any waiting reads
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_READ) {
					_read(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Same as before, this is inefficient. Only tasks waiting to read from pipes we just wrote to are really interested in waking up. But it doesn’t take too long to check, and it’s easier for now.

Code for this section on Github.

Pathserver

So, we have working read and write calls now, but in order for processes to communicate they’re going to need to agree on what pipe indices to use. A parent process can get its child’s PID (from fork) and write to it that way, maybe, but that’s not a generic solution.

We’re going to solve this problem by building a nameserver. The names in question are going to be pseudo-filesystem “paths”, so let’s call it a pathserver. We’ll need some string functions:

int strcmp(const char* a, const char* b) {
	while(*a && (*a == *b)) {
		a++;
		b++;
	}
	return (*a) - (*b);
}

size_t strlen(const char *s) {
	size_t r = 0;
	while(*s++) r++;
	return r;
}

And since everything is static sized for us, how long can a path be?

#define PATH_MAX 255 /* Longest absolute path */

Oh, and we need a way to determine what the pipe index of the pathserver itself is. Otherwise this whole concept is a nonstarter.

#define PATHSERVER_FD (TASK_LIMIT+3) /* File descriptor of pipe to pathserver */

We said we’re reserving one pipe for each task for replies, and let’s also reserve 0-2, since those file descriptor numbers are usually “magic” and we don’t want to have any mix-ups until we get real FD tables in place later. The pathserver will manage all other pipes, so it can assign itself the first one. Now we have a constant that tells us the pipe index of the pathserver.

We can use this information to define a protocol and pseudo-syscall for registering a new path:

int mkfifo(const char *pathname, int mode) {
	size_t plen = strlen(pathname)+1;
	char buf[4+4+PATH_MAX];
	(void)mode;

	*((unsigned int*)buf) = 0;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);

	/* XXX: no error handling */
	return 0;
}

Notice I said pseudo-syscall. It may act sort of like a syscall, but in fact it is built entirely based on primitives we already have! This is one of the huge benefits of a microkernel: once you get your basic functionality in place, most things can be built on top of what you already have, as wrappers.

The protocol here is a zero, followed by the length of the path we’re registering, followed by the characters of that path. Pretty simple. There’s no way for the pathserver to signal an error to us, because we don’t wait for a reply, but unless there are no more pipes this can’t fail, so we’ll just leave it for now.

int open(const char *pathname, int flags) {
	unsigned int replyfd = getpid() + 3;
	size_t plen = strlen(pathname)+1;
	unsigned int fd = -1;
	char buf[4+4+PATH_MAX];
	(void)flags;

	*((unsigned int*)buf) = replyfd;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);
	read(replyfd, &fd, 4);

	return fd;
}

The protocol here looks the same as before, only instead of 0 as the first word, we send our reserved pipe. That’s our PID, moved ahead by the three we’re reserving. Conveniently, this means no valid replyfd will ever be 0, so we can differentiate between mkfifo and open.

Now for the pathserver itself, piece by piece.

void pathserver(void) {
	char paths[PIPE_LIMIT - TASK_LIMIT - 3][PATH_MAX];
	int npaths = 0;
	int i = 0;
	unsigned int plen = 0;
	unsigned int replyfd = 0;
	char path[PATH_MAX];

	memcpy(paths[npaths++], "/sys/pathserver", sizeof("/sys/pathserver"));

Need to store the paths of the pipes. We’ll just use an array and a linear lookup for now. We also register the first pipe as being the pathserver.


	while(1) {
		read(PATHSERVER_FD, &replyfd, 4);
		read(PATHSERVER_FD, &plen, 4);
		read(PATHSERVER_FD, path, plen);

Loop forever, reading the three pieces of data out of our pipe. Recall that we wrote all of this data with a single write call. That’s important because we need all these bytes together, so the writes must be atomic. Once we’re reading, however, the bytes are already in a deterministic order, so we can read them out in pieces. This is especially good, because we wrote the length of the message as part of the message! So we certainly need to be able to read one piece at a time.


		if(!replyfd) { /* mkfifo */
			memcpy(paths[npaths++], path, plen);

Registering a new path is super easy. Should probably check for out-of-bounds here, but this works for now.


		} else { /* open */
			/* Search for path */
			for(i = 0; i < npaths; i++) {
				if(*paths[i] && strcmp(path, paths[i]) == 0) {
					i += 3; /* 0-2 are reserved */
					i += TASK_LIMIT; /* FDs reserved for tasks */
					write(replyfd, &i, 4);
					i = 0;
					break;
				}
			}

Linear search for the path. If we find it, adjust the index by the reserved indices and write the reply. We ignore empty paths, which leaves room for pipes the pathserver knows about but that have no name you can usefully search for. Notice also that we set i back to zero after writing a reply, because:


			if(i >= npaths) {
				i = -1; /* Error: not found */
				write(replyfd, &i, 4);
			}
		}
	}
}

If we shoot off the end, then i >= npaths, in which case we didn’t find it. Well, don’t let the caller just block forever waiting for a reply that’s not coming! Send them -1.

Now we’ll just tie this all together with some tasks that use this new functionality:

#define TASK_LIMIT 3

void otherguy() {
	int fd;
	unsigned int len;
	char buf[20];
	mkfifo("/proc/0", 0);
	fd = open("/proc/0", 0);
	while(1) {
		read(fd, &len, 4);
		read(fd, buf, len);
		bwputs(buf);
	}
}

void first(void) {
	int fd;

	if(!fork()) pathserver();
	if(!fork()) otherguy();

	fd = open("/proc/0", 0);
	while(1) {
		int len = sizeof("Ping\n");
		char buf[sizeof("Ping\n")+4];
		memcpy(buf, &len, 4);
		memcpy(buf+4, "Ping\n", len);
		write(fd, buf, len+4);
	}
}

We spin up the pathserver and the otherguy. The otherguy just waits for incoming strings (19 characters max, but it’s just a demo) and prints any that come in. Then our first task runs in a loop, forever sending Ping to the otherguy, where the strings get printed.

We just built a rudimentary serial port driver! It’s not using interrupts properly yet, but at least strings sent to the otherguy are printed atomically, because the otherguy “owns” the serial port and we don’t have all the tasks preempting each other in the middle of a write.

Code for this section on Github.

We’re done!

We now have all the machinery we need in order to send messages between processes. Next time, we’ll look at using all of this to build a more fully-functional serial port driver.

The code so far is on Github.