Singpolyma

Writing a Simple OS Kernel — Part 6, Inter-Process Communication

Posted on

Last time we got hardware interrupts working, and used them to implement preemption for our multitasking. This time we’re going to get our user space processes talking to one another.

getpid

We’re going to need some way for replies to our messages to get sent back to a process. We can’t use whatever other mechanisms we build for registering a new communications channel for this, because without it we cannot receive communications, even about a new communications channel. Luckily we have a piece of information that comes with every process: its ID. In our case, the ID of a task is going to just be its index in the array of stacks in our main.

So the normal drill for a new syscall in the case statement in main:

case 0x2: /* getpid */
	tasks[current_task][2+0] = current_task;
	break;

And in syscall.s:

.global getpid
getpid:
	push {r7}
	mov r7, #
	svc 0
	bx lr

And in asm.h:

int getpid(void);

Removed the accidentally shared static variable. Do not use statics without a virtual memory system!

Code for this section on Github.

Putting processes to sleep

Currently our scheduler just round-robins through all tasks, but if we want a task to be able to wait for a message, we need some way for it to go to sleep while it waits. We will do this by having a new piece of metadata in our process space for “status”:

# TASK_READY       0
# TASK_WAIT_READ   1
# TASK_WAIT_WRITE  2

On coming out of a task (right after the activate call in main) it is definitely ready:

tasks[current_task][-1] = TASK_READY;

Then replace the two lines at the end of the while loop in main with a new scheduler that skips over tasks that aren’t ready:

/* Select next TASK_READY task */
while(TASK_READY != tasks[current_task =
	(current_task+1 >= task_count ? 0 : current_task+1)][-1]);

Now if any syscall sets the status to something else, the process will sleep until woken up.

Code for this section on Github.

Ringbuffers

We’re going to do our communication between processes using a rudimentary implementation of UNIX-style pipes. The pipes will be implemented using ringbuffers. Why? Because ringbuffers are easy to use for FIFO-style communication without needing any dynamic memory allocation. Pipes are allowed to get “full” and luckily we can tell when a ringbuffer is full, so this fits. We’re going to want some generic ringbuffer utilities, and it won’t hurt to have versions specialised to the pipe buffers:

# STACK_SIZE 1024 /* Size of task stacks in words */
# TASK_LIMIT 2   /* Max number of tasks we can handle */
# PIPE_BUF   512 /* Size of largest atomic pipe message */
# PIPE_LIMIT (TASK_LIMIT*5)

struct pipe_ringbuffer {
	int start;
	int end;
	char data[PIPE_BUF];
};

# RB_PUSH(rb, size, v) do { \
		(rb).data[(rb).end] = (v); \
		(rb).end++; \
		if((rb).end > size) (rb).end = 0; \
	} while(0)

# RB_POP(rb, size, v) do { \
		(v) = (rb).data[(rb).start]; \
		(rb).start++; \
		if((rb).start > size) (rb).start = 0; \
	} while(0)

# RB_LEN(rb, size) (((rb).end - (rb).start) + \
	(((rb).end < (rb).start) ? size : 0))

# PIPE_PUSH(pipe, v) RB_PUSH((pipe), PIPE_BUF, (v))
# PIPE_POP(pipe, v)  RB_POP((pipe), PIPE_BUF, (v))
# PIPE_LEN(pipe)     (RB_LEN((pipe), PIPE_BUF))

You’ll note I’ve increased the stack size. We’re going to be handling more data, and with our current setup if you need more space than the stack size it’s just going to cause very strange and hard-to-find bugs. The pipe limit is expressed in terms of the number of tasks. This makes sense since as the number of tasks increases, it is likely that the number of channels we’ll need increases, especially since we’re going to reserve one per task to begin with.

Now we just need to allocate our pipes in main:

struct pipe_ringbuffer pipes[PIPE_LIMIT];

and initialize them all:

size_t i;
/* Initialize all pipes */
for(i = 0; i < PIPE_LIMIT; i++) {
	pipes[i].start = pipes[i].end = 0;
}

Code for this section on Github.

Reading and writing

Now for the meaty syscalls. Some boilerplate first:

int write(int fd, const void *buf, size_t count);
int read(int fd, void *buf, size_t count);

.global write
write:
	push {r7}
	mov r7, #
	svc 0
	bx lr
.global read
read:
	push {r7}
	mov r7, #
	svc 0
	bx lr
case 0x3: /* write */
	_write(tasks[current_task], tasks, task_count, pipes);
	break;
case 0x4: /* read */
	_read(tasks[current_task], tasks, task_count, pipes);
	break;

You’ll note that I’ve placed the bodies of these syscalls off in their own functions. That’s partly because they’re going to be much bigger than what we’ve seen before, but also because they actually need to be able to call each other.

So, definition before use:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);
void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes);

These are a bit long and maybe non-obvious, so one piece at a time:

void _read(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	task[-1] = TASK_READY;

In case there’s an error, make sure we don’t stay blocked forever.

/* If the fd is invalid, or trying to read too much  */
	if(task[2+0] > PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

An out-of-range pipe index is obviously an error. And the ringbuffers can’t hold more than PIPE_BUF bytes, so it’s impossible to read more than that at once.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];
		if((size_t)PIPE_LEN(*pipe) < task[2+2]) {
			/* Trying to read more than there is: block */
			task[-1] = TASK_WAIT_READ;

Ah, our first actual blocking call! We get the ringbuffer for the pipe that is being read from, and if it has less bytes in it than the task wants to read, set the task into the TASK_WAIT_READ state.


		} else {
			size_t i;
			char *buf = (char*)task[2+1];
			/* Copy data into buf */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_POP(*pipe,buf[i]);
			}

We have a valid pipe with enough bytes in the buffer. Pop each byte off into the read buffer until we have read as many as were requested.


			/* Unblock any waiting writes
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_WRITE) {
					_write(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Loop over all the tasks, and any that were waiting to write, unblock them and try the write again. This is somewhat inefficient, since only tasks that were waiting to write to the pipe we just read from have any chance at all of being able to succeed (all the other pipes are just as full as they used to be), but it works.


void _write(unsigned int *task, unsigned int **tasks, size_t task_count, struct pipe_ringbuffer *pipes) {
	/* If the fd is invalid or the write would be non-atomic */
	if(task[2+0] > PIPE_LIMIT || task[2+2] > PIPE_BUF) {
		task[2+0] = -1;

Again, an out-of-range pipe index is obviously an error. And because of the ringbuffers, you’ll never be able to write more than PIPE_BUF bytes into a pipe all at once. We want our writes to be atomic (because we have preemptive multitasking going on), so trying to write more than PIPE_BUF bytes needs to be an error as well.


	} else {
		struct pipe_ringbuffer *pipe = &pipes[task[2+0]];

		if((size_t)PIPE_BUF - PIPE_LEN(*pipe) < 
			task[2+2]) {
			/* Trying to write more than we have space for: block */
			task[-1] = TASK_WAIT_WRITE;

If the write would put in more bytes than the pipe currently has space for, then block waiting for a read to make room for us.


		} else {
			size_t i;
			const char *buf = (const char*)task[2+1];
			/* Copy data into pipe */
			for(i = 0; i < task[2+2]; i++) {
				PIPE_PUSH(*pipe,buf[i]);
			}

We have a valid write and there is enough space in the pipe. Push all the bytes from the write buffer into the pipe.


			/* Unblock any waiting reads
				XXX: nondeterministic unblock order
			*/
			for(i = 0; i < task_count; i++) {
				if(tasks[i][-1] == TASK_WAIT_READ) {
					_read(tasks[i], tasks, task_count, pipes);
				}
			}
		}
	}
}

Same as before, this is inefficient. Only tasks waiting to read from pipes we just wrote to are really interested in waking up. But it doesn’t take too long to check, and it’s easier for now.

Code for this section on Github.

Pathserver

So, we have working read and write calls now, but in order for processes to communicate they’re going to need to agree on what pipe indices to use. A parent process can get its child’s PID (from fork) and write to it that way, maybe, but that’s not a generic solution.

We’re going to solve this problem by building a nameserver. The names in question are going to be pseudo-filesystem “paths”, so let’s call it a pathserver. We’ll need some string functions:

int strcmp(const char* a, const char* b) {
	int r = 0;
	while(!r && *a && *b) r = (*a++) - (*b++);
	return (*a) - (*b);
}

size_t strlen(const char *s) {
	size_t r = 0;
	while(*s++) r++;
	return r;
}

And since everything is static sized for us, how long can a path be?

# PATH_MAX 255 /* Longest absolute path */

Oh, and we need a way to determine what the pipe index of the pathserver itself is. Otherwise this whole concept is a nonstarter.

# PATHSERVER_FD (TASK_LIMIT+3) /* File descriptor of pipe to pathserver */

We said we’re reserving one pipe for each task for replies, and let’s also reserve 0-2, since those file descriptor numbers are usually “magic” and we don’t want to have any mix-ups until we get real FD tables in place later. The pathserver will manage all other pipes, so it can assign itself the first one. Now we have a constant that tells us the pipe index of the pathserver.

We can use this information to define a protocol and pseudo-syscall for registering a new path:

int mkfifo(const char *pathname, int mode) {
	size_t plen = strlen(pathname)+1;
	char buf[4+4+PATH_MAX];
	(void)mode;

	*((unsigned int*)buf) = 0;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);

	/* XXX: no error handling */
	return 0;
}

Notice I said pseudo-syscall. It may act sort of like a syscall, but in fact it is built entirely based on primitives we already have! This is one of the huge benefits of a microkernel: once you get your basic functionality in place, most things can be built on top of what you already have, as wrappers.

The protocol here is a zero, followed by the length of the path we’re registering, followed by the characters of that path. Pretty simple. There’s no way for the pathserver to signal an error to us, because we don’t wait for a reply, but unless there are no more pipes this can’t fail, so we’ll just leave it for now.

int open(const char *pathname, int flags) {
	unsigned int replyfd = getpid() + 3;
	size_t plen = strlen(pathname)+1;
	unsigned int fd = -1;
	char buf[4+4+PATH_MAX];
	(void)flags;

	*((unsigned int*)buf) = replyfd;
	*((unsigned int*)(buf+4)) = plen;
	memcpy(buf+4+4, pathname, plen);
	write(PATHSERVER_FD, buf, 4+4+plen);
	read(replyfd, &fd, 4);

	return fd;
}

The protocol here looks the same as before, only instead of 0 as the first word, we send our reserved pipe. That’s our PID, moved ahead by the three we’re reserving. Conveniently, this means no valid replyfd will ever be 0, so we can differentiate between mkfifo and open.

Now for the pathserver itself, piece by piece.

void pathserver(void) {
	char paths[PIPE_LIMIT - TASK_LIMIT - 3][PATH_MAX];
	int npaths = 0;
	int i = 0;
	unsigned int plen = 0;
	unsigned int replyfd = 0;
	char path[PATH_MAX];

	memcpy(paths[npaths++], "/sys/pathserver", sizeof("/sys/pathserver"));

Need to store the paths of the pipes. We’ll just use an array and a linear lookup for now. We also register the first pipe as being the pathserver.


	while(1) {
		read(PATHSERVER_FD, &replyfd, 4);
		read(PATHSERVER_FD, &plen, 4);
		read(PATHSERVER_FD, path, plen);

Loop forever, reading the three pieces of data out of our pipe. Recall that we wrote all of this data with a single write call. That’s important because we need all these bytes together, so the writes must be atomic. Once we’re reading, however, the bytes are already in a deterministic order, so we can read them out in pieces. This is especially good, because we wrote the length of the message as part of the message! So we certainly need to be able to read one piece at a time.


		if(!replyfd) { /* mkfifo */
			memcpy(paths[npaths++], path, plen);

Registering a new path is super easy. Should probably check for out-of-bounds here, but this works for now.


		} else { /* open */
			/* Search for path */
			for(i = 0; i < npaths; i++) {
				if(*paths[i] && strcmp(path, paths[i]) == 0) {
					i += 3; /* 0-2 are reserved */
					i += TASK_LIMIT; /* FDs reserved for tasks */
					write(replyfd, &i, 4);
					i = 0;
					break;
				}
			}

Linear search for the path. If we find it, adjust the index by the reserved indices and write the reply. We ignore empty paths, to allow us the possibility of pipes without names that the pathserver knows about, but you can’t usefully search for. Notice also that we set i back to zero after writing a reply, because:


			if(i >= npaths) {
				i = -1; /* Error: not found */
				write(replyfd, &i, 4);
			}
		}
	}
}

If we shoot off the end, then i >= npaths, in which case we didn’t find it. Well, don’t let the caller just block forever waiting for a reply that’s not coming! Send them -1.

Now we’ll just tie this all together with some tasks that use this new functionality:

# TASK_LIMIT 3

void otherguy() {
	int fd;
	unsigned int len;
	char buf[20];
	mkfifo("/proc/0", 0);
	fd = open("/proc/0", 0);
	while(1) {
		read(fd, &len, 4);
		read(fd, buf, len);
		bwputs(buf);
	}
}

void first(void) {
	int fd;

	if(!fork()) pathserver();
	if(!fork()) otherguy();

	fd = open("/proc/0", 0);
	while(1) {
		int len = sizeof("Ping\n");
		char buf[sizeof("Ping\n")+4];
		memcpy(buf, &len, 4);
		memcpy(buf+4, "Ping\n", len);
		write(fd, buf, len+4);
	}
}

We spin up the pathserver and the otherguy. The otherguy just waits for incoming strings (19 characters max, but it’s just a demo) and prints any that come in. Then, our first task run in a loop forever sending Ping to the otherguy, and they get printed.

We just built a rudimentary serial port driver! It’s not using interrupts properly yet, but at least strings send to the otherguy are printed atomically, because the otherguy “owns” the serial port and we don’t have all the tasks preempting each other in the middle of a write.

Code for this section on Github.

We’re done!

We now have all the machinery we need in order to send messages between processes. Next time, we’ll look at using all of this to build a more fully-functional serial port driver.

The code so far is on Github.

How to force the default route to always use a specific interface on Ubuntu

Posted on

This post was written by Dave Vandervies (@dj3vande).

I think Ubuntu inherits the things that need to be configured for this to work directly from Debian; if that’s correct this information will be useful without modification for Debian systems. It may also be possible to configure dhclient on other Linuxes (and other *nixes) according to these notes; I don’t know whether the guts of dhclient-script come from Debian or from dhclient. In any case, looking at what dhclient-script does will be a good start.

So, I have a machine that’s attached to two networks.

Both networks have DHCP servers. Both DHCP servers give out router configuration. But one of those default routes is useless, so I want to force the system to always use the other network segment’s router as its default route.

(My use case is a virtual machine; one VM network is NATted to the outside world, and one is isolated on the host machine. This VM gets run on different host machines, which assign different network addresses to their virtual segments, but which interface is on which type of segment is fixed by the VM configuration. So here, unlike many multi-interface machines, it’s necessary to tie the default route to a particular *interface* and not a router address.)

It turns out that this isn’t hard to do, but it took a lot of googling (and several rounds of “Find something interesting, search on new keywords”) to find a solution that actually did what I needed; all of the obvious searches turn up information about how to set the default route to a fixed router IP, which only works when you’re always on the same network.

Before I get into the theory of operation, here’s the solution, for
impatient people:

cat << 'EOF' > /etc/dhcp/dhclient-enter-hooks.d/restrict-default-route
## Only the DHCP server talking to eth0 is allowed to give us a default
## route.  Other interfaces only get local-segment configuration.
case ${interface} in
  eth0)
    ;;
  *)
    unset new_routers
    ;;
esac
EOF

So, here’s how this works, since that might be useful for other complicated DHCP configuration tricks you want to do.

When dhclient needs to make system configuration changes, it does it by invoking /sbin/dhclient-script. The first nontrivial thing dhclient-script does is run the entry hooks in /etc/dhcp/dhclient-enter-hooks.d to allow local configuration rules to adjust things as necessary.

These hooks are shell script fragments sourced by the main script, so you can use them to edit the variables that make dhclient-script do things if you like. This is precisely what we want to do. $new_routers is the default gateway the DHCP server gave us; we only want to actually use it for the DHCP server eth0 is talking to, so if we’re on any other interface we unset it, to prevent dhclient-script from taking any action based on it.

You can use the same trick to ignore DNS settings (new_domain_name and new_domain_name_servers) if you want to ignore one network’s DNS servers; I didn’t need that in my configuration because the two subnets’ nameservers behave identically. (But note that if you need something more complicated than simply ignoring one of them, you’ll probably need more than just an entry hook.)

Heterogenous Collections in Haskell

Posted on

One of the things statically typed languages like Haskell and C often don’t have great support for, is collections that contain elements that are not all of the same type. In a dynamically typed language one can store any element of any type in a collection, because the type is finally determined later when the value is used at runtime. With a statically typed language, the compiler wants to determine all the types at compile time, and so it often wants to force every element to be of the same time (homogeneous collections).

Normally when this problem arises, I find that I can rethink the problem in a way such that the solution does not require heterogeneous collections, but recently I’ve been working with one that, while I can solve it another way, the most natural solution seems to call for this.

The Haskell Wiki has a couple of suggestions for doing this. The easiest one, is just to use Data.Dynamic to do dynamic typing from inside Haskell. The problem with this is that it is both slower (the types of all values in the collection have to be checked at runtime) and less safe (if a bad value gets put in the data structure, it can cause the application to crash).

Another solution suggested there is exitstentially qualified datatypes. These solve the problem in a way that is safe (as long as you only need to perform operations from some typeclass on the data) and efficient, and is analogous to an upcast in a statically-typed Object-Oriented language. However, it can be a bit complex, and is a language extension.

A solution that will work in Haskell98 and give most of the benefits of exitstentially qualified datatypes (with less power) is to package up operations to be performed with the datatype, instead of the data itself. Any functions that you will want to call (such as those which are members of the typeclass you’re interested in) can be partially applied to the data, resulting in a function that is the exact same type as an equivalent function partially applied to some other instance of the same typeclass. You can thus put records full of these functions into a list or other collection, and it will work because they are all the same datatype. Then instead of calling (func x otherparams) you can just call (func otherparams) and get exactly the same result.

An example:

class SomeClass a where
	somefun :: a -> Int -> Bool

instance SomeClass Int where
	somefun = (>)

instance SomeClass [a] where
	somefun = (>) . length

data SomeFunAble = SomeFunAble {
		somefun_ :: (Int -> Bool)
	}

someFunAble :: (SomeClass a) => a -> SomeFunAble
someFunAble v = SomeFunAble (somefun v)

somelist = [someFunAble "hello", someFunAble (12::Int)]

otherlist = map (`somefun_` 12) somelist

On Language Extensions (in Haskell and Elsewhere)

Posted on

This was originally an email to the Haskell web-devel list and was also posted on Reddit.

Somebody claiming to be yi huang wrote:

Why prefer code generation over template Haskell? Isn’t them essentially the same thing, and template haskell is performed automatically.

Also, from Reddit (nicolast):

I never understood why someone would want to avoid using language extensions which have been in GHC for at least some time. The only reason I can think of is: compatibility with other compilers. But is anyone ever going to compile/run a yesod-routes based application using something other than GHC?!

First off, yes, Template Haskell is very similar to code generation. There are a few reasons I would like to avoid it.

It’s a language extension. I try to avoid those in general, in every stanardized language I code in (C89, R5RS, Haskell98) for several reasons.

As nicolast said, compatibility with other compilers is a big one. When I get a piece of code from someone who assumed that “what MSVC does” or “what Racket does” is the same as “anyone can run this”, it makes it quite difficult to use my favourite implementations of those languages. I don’t want to make assumptions about other people’s environments, or what will be useful in the future. Maybe someone writes a Haskell interpreter that makes use in some context I haven’t even imagined much nicer. Who knows.

Additionally, any other static analysis/code processing tools (like, say, hlint) *also* needs to support whatever syntax extensions you’re using (semantics extensions may or may not apply here, depending on the nature of the tool). Requiring that every tool author support all my favourite extensions limits my tool options, and makes life harder for tool authors (since they cannot just look in one place for the spec and write to that anymore if they need to look up compilers’ extensions as well).

Will anyone ever compile/run/analyze a yesod-routes based application using something other than GHC/hlint? (Actually, does hlint support TH? It might.) What specifically about yesod-routes makes this less likely? What drew me to Yesod.Routes.Dispatch was its relative purity in terms of extensions/dependencies, etc.

Additionally, I find Template Haskell specifically (and some other language extensions, like Overlapping Instances) can make code harder to read (for me) and possibly harder to reason about. A code generator makes a file that I can read for comprehension, edit if I want to, etc.

Ok, that’s a bit of a long answer to a short question, but it sort of sums up my motivation vis extensions in general and TH in particular.

BitBrawl: Cleanup and Multiplayer

Posted on

I’ve pushed some stuff up to the BitBrawl repo. It’s really starting to come together!

First off: the art/data is in the repo now. I’m using the grass from the base assets for the background for now. I’ve also built a custom character from some of the art by Johannes Sjölund, and put the credits in my COPYING file. This sprite sheet has more animations than just “walk”, which will be very useful when I start work on combat.

I have also done some simple obvious things: players can no longer walk off the edge of the screen. The control configuration start screen now gives useful names for the keys, instead of internal names. Art and data are searched for in system folders, with a fallback to the current directory, instead of hardcoded paths. On you can find the headphones you have been looking for.

< img src=”https://singpolyma.net/wordpress/wp-content/uploads/2012/07/bitbrawl2-300 alt=”BitBrawl with 2 players” >

I’ve also upgraded the drawing/physics code to actually draw and control all the N players who join the game! So, while all they can do for now is walk around, you can actually add as many players as you want! (There’s a small bug in that if you add more players that can be shown across the screen, the menu will draw some of them off-screen. I’m not sure yet how I want to handle that.)