Singpolyma

Priority Continuum Onyx Unboxing and Assembly (Photos)

Posted on

Rust Factory Without Box (Trait Object)

Posted on

I’ve been playing around a lot with Rust recently and it’s quickly becoming my second-favourite programming language. One of the things I’ve been exploring is how Object Oriented design concepts might apply to it. For example, consider this code:

fn format_year(n: i32) -> String {
	if n == 0 {
		"0 is not a year".to_string()
	} else if n < 0 {
		format!("{} BC", -n)
	} else {
		format!("{} AD", n)
	}
}

While it’s probably overkill for an example this small, let’s go ahead and apply the “replace conditional with polymorphism” refactoring:

fn format_year(n: Box<dyn Year>) -> String {
	format!("{} {}", n.year(), n.era())
}

trait Year {
	fn year(&self) -> u32;
	fn era(&self) -> String;
}

// Inherent impl on the trait object type; call it as <dyn Year>::new(n).
impl dyn Year {
	fn new(n: i32) -> Box<dyn Year> {
		if n == 0 {
			Box::new(YearZero())
		} else if n < 0 {
			Box::new(YearBC(-n as u32))
		} else {
			Box::new(YearAD(n as u32))
		}
	}
}

struct YearZero();

impl Year for YearZero {
	fn year(&self) -> u32 { 0 }
	fn era(&self) -> String { "is not a year".to_string() }
}

struct YearBC(u32);

impl Year for YearBC {
	fn year(&self) -> u32 { self.0 }
	fn era(&self) -> String { "BC".to_string() }
}

struct YearAD(u32);

impl Year for YearAD {
	fn year(&self) -> u32 { self.0 }
	fn era(&self) -> String { "AD".to_string() }
}

This works, and it really does seem to mimic the way this kind of design looks in a class-based Object Oriented language. It has a major disadvantage, however: every object now lives on the heap, and the extra allocation and indirection is likely to cost performance. In some cases this can be avoided by using continuation-passing style (CPS), so that the trait objects can be borrowed references instead of boxes, but that’s both ugly and not always an option. Another design is to use an enum:

fn format_year(n: Year) -> String {
	format!("{} {}", n.year(), n.era())
}

enum Year {
	YearZero,
	YearBC(u32),
	YearAD(u32)
}

impl Year {
	fn new(n: i32) -> Year {
		if n == 0 {
			Year::YearZero
		} else if n < 0 {
			Year::YearBC(-n as u32)
		} else {
			Year::YearAD(n as u32)
		}
	}

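	fn year(&self) -> u32 {
		// Variants must be qualified (or imported); a bare `YearZero`
		// pattern would be a catch-all binding.
		match self {
			Year::YearZero => 0,
			Year::YearBC(y) => *y,
			Year::YearAD(y) => *y
		}
	}

	fn era(&self) -> String {
		match self {
			Year::YearZero => "is not a year".to_string(),
			Year::YearBC(_) => "BC".to_string(),
			Year::YearAD(_) => "AD".to_string()
		}
	}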
}

No more heap allocations! This is clearly analogous, but some might argue we haven’t actually “replaced the conditional” at all. We have, however, confined the conditionals to a place where the type only knows about itself, not about other things that might get passed in. Even if you accept adding match arms on self as “extension”, in open/closed terms adding a new case requires modifying at least the enum and the factory. The trait version only requires changing the factory.
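The difference between the two versions shows up when you ask the compiler for sizes. The sketch below uses `std::mem::size_of`. The enum is renamed `YearEnum` here so that it can sit next to the trait. Exact byte counts are compiler and platform details rather than language guarantees, so the program only prints them.

The enum has a single size that is known at compile time. `dyn Year` has no size at all, so it can only be handled behind a pointer that also carries a vtable.

```rust
use std::mem::size_of;

// Only the trait's existence matters for this check.
#[allow(dead_code)]
trait Year {
    fn year(&self) -> u32;
    fn era(&self) -> String;
}

// The pure-enum version (renamed to avoid clashing with the trait):
// every variant is known, so the compiler can reserve one size large
// enough for any of them.
#[allow(dead_code)]
enum YearEnum {
    YearZero,
    YearBC(u32),
    YearAD(u32),
}

fn main() {
    // Sized: a YearEnum can live directly on the stack.
    println!("YearEnum:      {} bytes", size_of::<YearEnum>());

    // `size_of::<dyn Year>()` would not compile, because `dyn Year` is
    // unsized. Pointers to it are "fat" (data pointer plus vtable
    // pointer), so they are twice as wide as a thin pointer like `&u32`.
    println!("&dyn Year:     {} bytes", size_of::<&dyn Year>());
    println!("Box<dyn Year>: {} bytes", size_of::<Box<dyn Year>>());
    println!("&u32:          {} bytes", size_of::<&u32>());
}
```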

What is it about the enum version that allows us to avoid the boxing? Well, an enum knows what all the possibilities are, and so the compiler can know the size that needs to be reserved to store any one of those. With the trait case, the compiler can’t know how big the infinite world of possibilities that might implement that trait could be, and so cannot know the size to be reserved: we have to defer that to runtime and use a box. However, the factory will always actually return only a known list of trait implementations… can we exploit that to know the size somehow? What if we create an enum of the structs from the trait version and have the factory return that?

enum YearEnum {
	YearZero(YearZero),
	YearBC(YearBC),
	YearAD(YearAD)
}

impl dyn Year {
	fn new(n: i32) -> YearEnum {
		if n == 0 {
			YearEnum::YearZero(YearZero())
		} else if n < 0 {
			YearEnum::YearBC(YearBC(-n as u32))
		} else {
			YearEnum::YearAD(YearAD(n as u32))
		}
	}
}

impl std::ops::Deref for YearEnum {
	type Target = dyn Year;

	fn deref(&self) -> &Self::Target {
		match self {
			YearEnum::YearZero(x) => x,
			YearEnum::YearBC(x) => x,
			YearEnum::YearAD(x) => x
		}
	}
}
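The following is a minimal, self-contained sketch of the hand-written enum and `Deref` in use, updated to modern `dyn` syntax. The `main` function and the sample year are illustrative additions, not part of the original code. Methods of the `Year` trait are called directly on the value the factory returns, and nothing is boxed.

```rust
use std::ops::Deref;

trait Year {
    fn year(&self) -> u32;
    fn era(&self) -> String;
}

struct YearZero();
struct YearBC(u32);
struct YearAD(u32);

impl Year for YearZero {
    fn year(&self) -> u32 { 0 }
    fn era(&self) -> String { "is not a year".to_string() }
}
impl Year for YearBC {
    fn year(&self) -> u32 { self.0 }
    fn era(&self) -> String { "BC".to_string() }
}
impl Year for YearAD {
    fn year(&self) -> u32 { self.0 }
    fn era(&self) -> String { "AD".to_string() }
}

// One variant per implementation the factory can return.
enum YearEnum {
    YearZero(YearZero),
    YearBC(YearBC),
    YearAD(YearAD),
}

// Inherent impl on the trait object type, called as `<dyn Year>::new`.
impl dyn Year {
    fn new(n: i32) -> YearEnum {
        if n == 0 {
            YearEnum::YearZero(YearZero())
        } else if n < 0 {
            YearEnum::YearBC(YearBC(-n as u32))
        } else {
            YearEnum::YearAD(YearAD(n as u32))
        }
    }
}

// Each arm's concrete reference is coerced to `&dyn Year`.
impl Deref for YearEnum {
    type Target = dyn Year;

    fn deref(&self) -> &Self::Target {
        match self {
            YearEnum::YearZero(x) => x,
            YearEnum::YearBC(x) => x,
            YearEnum::YearAD(x) => x,
        }
    }
}

fn main() {
    let y = <dyn Year>::new(2019);
    // Auto-deref finds `year` and `era` on the `dyn Year` target.
    println!("{} {}", y.year(), y.era()); // prints "2019 AD"
}
```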

The impl std::ops::Deref allows us to call any method in the Year trait on the enum returned from the factory, so the enum effectively acts as a trait object, but with no heap allocations! This seems like exactly what we want, but it’s a lot of boilerplate. Luckily, it’s very mechanical, so writing a macro to generate it is fairly easy (and I’ll throw in a bunch of other obvious trait implementations while we’re at it):

macro_rules! trait_enum {
	($trait:ident, $enum:ident, $( $item:ident ) , *) => {
		enum $enum {
			$(
				$item($item),
			)*
		}

		impl std::ops::Deref for $enum {
			type Target = dyn $trait;

			fn deref(&self) -> &Self::Target {
				match self {
					$(
						$enum::$item(x) => x,
					)*
				}
			}
		}

		impl From<$enum> for Box<dyn $trait> {
			fn from(input: $enum) -> Self {
				match input {
					$(
						$enum::$item(x) => Box::new(x),
					)*
				}
			}
		}

		impl<'a> From<&'a $enum> for &'a dyn $trait {
			fn from(input: &'a $enum) -> Self {
				&**input
			}
		}

		impl<'a> AsRef<dyn $trait + 'a> for $enum {
			fn as_ref(&self) -> &(dyn $trait + 'a) {
				&**self
			}
		}

		impl<'a> std::borrow::Borrow<dyn $trait + 'a> for $enum {
			fn borrow(&self) -> &(dyn $trait + 'a) {
				&**self
			}
		}

		$(
			impl From<$item> for $enum {
				fn from(input: $item) -> Self {
					$enum::$item(input)
				}
			}
		)*
	}
}

And now to repeat the first refactoring, but with the help of this new macro:

fn format_year<Y: Year + ?Sized>(n: &Y) -> String {
	format!("{} {}", n.year(), n.era())
}

trait Year {
	fn year(&self) -> u32;
	fn era(&self) -> String;
}

trait_enum!(Year, YearEnum, YearZero, YearBC, YearAD);

impl dyn Year {
	fn new(n: i32) -> YearEnum {
		if n == 0 {
			YearZero().into()
		} else if n < 0 {
			YearBC(-n as u32).into()
		} else {
			YearAD(n as u32).into()
		}
	}
}

struct YearZero();

impl Year for YearZero {
	fn year(&self) -> u32 { 0 }
	fn era(&self) -> String { "is not a year".to_string() }
}

struct YearBC(u32);

impl Year for YearBC {
	fn year(&self) -> u32 { self.0 }
	fn era(&self) -> String { "BC".to_string() }
}

struct YearAD(u32);

impl Year for YearAD {
	fn year(&self) -> u32 { self.0 }
	fn era(&self) -> String { "AD".to_string() }
}

We do still have two places that must be modified rather than extended (the macro invocation and the factory). All other code can be written without knowing about them, in the same style as code using a normal trait object. Normal trait objects can even be recovered using the various implementations the macro creates, or simply by writing &* on the enum. I benchmarked these three styles on a somewhat more complex example. This last one was actually the fastest, though only marginally faster than the pure-enum approach, and the boxed-trait-object style was more than three times slower.
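Below is a condensed, self-contained sketch of the final version, updated for modern `dyn` syntax. To stay short, it keeps only the `Deref`, `Box` conversion and per-item `From` parts of `trait_enum!`. The `main` function and the sample values are illustrative.

It shows the two ways of recovering an ordinary trait object mentioned above:

- `&*` gives a borrowed `&dyn Year` with no allocation.
- `.into()` gives an owned `Box<dyn Year>`, which does allocate.

```rust
use std::ops::Deref;

// Trimmed-down trait_enum!: Deref, Box conversion, and per-item From.
macro_rules! trait_enum {
    ($trait:ident, $enum:ident, $( $item:ident ),*) => {
        enum $enum {
            $( $item($item), )*
        }

        impl Deref for $enum {
            type Target = dyn $trait;
            fn deref(&self) -> &Self::Target {
                match self {
                    $( $enum::$item(x) => x, )*
                }
            }
        }

        impl From<$enum> for Box<dyn $trait> {
            fn from(input: $enum) -> Self {
                match input {
                    $( $enum::$item(x) => Box::new(x), )*
                }
            }
        }

        $(
            impl From<$item> for $enum {
                fn from(input: $item) -> Self {
                    $enum::$item(input)
                }
            }
        )*
    };
}

trait Year {
    fn year(&self) -> u32;
    fn era(&self) -> String;
}

struct YearZero();
struct YearBC(u32);
struct YearAD(u32);

impl Year for YearZero {
    fn year(&self) -> u32 { 0 }
    fn era(&self) -> String { "is not a year".to_string() }
}
impl Year for YearBC {
    fn year(&self) -> u32 { self.0 }
    fn era(&self) -> String { "BC".to_string() }
}
impl Year for YearAD {
    fn year(&self) -> u32 { self.0 }
    fn era(&self) -> String { "AD".to_string() }
}

trait_enum!(Year, YearEnum, YearZero, YearBC, YearAD);

impl dyn Year {
    fn new(n: i32) -> YearEnum {
        if n == 0 {
            YearZero().into()
        } else if n < 0 {
            YearBC(-n as u32).into()
        } else {
            YearAD(n as u32).into()
        }
    }
}

fn format_year<Y: Year + ?Sized>(n: &Y) -> String {
    format!("{} {}", n.year(), n.era())
}

fn main() {
    let e = <dyn Year>::new(-44);
    // Borrowed trait object via Deref: no allocation.
    let r: &dyn Year = &*e;
    println!("{}", format_year(r)); // prints "44 BC"
    // Owned trait object: this conversion does allocate.
    let b: Box<dyn Year> = e.into();
    println!("{}", format_year(&*b)); // prints "44 BC"
}
```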

So there you go, next time you ask yourself if you want the flexibility of a trait or the size guarantees and performance of an enum, maybe grab a macro and say: why not both!

Error Handling in Haskell

Posted on

When I first started learning Haskell, I learned about the Monad instance for Either and immediately got excited. Here, at long last, was a good solution to the error handling problem. When you want exception-like semantics, you can have them, and the rest of the time it’s just a normal value. Later, I learned that the Haskell standard also includes an exception mechanism for the IO type. I was horrified, and confused, but nothing could have prepared me for what I discovered next.

While the standard Haskell exception mechanism infects all of IO, it at least has a single, well-defined type for possible errors, with a small number of known cases to handle. GHC extends this with a dynamically typed exception system where any IO value may be hiding any number of unknown and unknowable exception types! Additionally, all manner of programmer errors in pure code (such as pattern match failures and integer division by zero) are thrown into IO when they get used in that context. On top of everything, so-called exceptions can appear that were not thrown by any code you can see but are external to you and your dependencies entirely. There are two classes of these: asynchronous exceptions thrown by the runtime due to a failure in the runtime itself (such as a HeapOverflow) and exceptions thrown due to some impossible-to-meet condition the runtime detects (such as detectable non-termination). Oh, I almost forgot: manually killing a thread or telling the process to exit are also modeled as “exceptions” by GHC.

Once the initial decision to have a dynamically typed exception system was made, everything that could conceivably use exception-like semantics was bolted onto it. What am I going to do, though? Write my own ecosystem and runtime that works how I would prefer? No, I’m going to find a way to make the best of the world I’m in.

When dealing with this situation, there are two separate and equally important things to consider: exception safety, and error handling. Exception safety describes the situation when you are, for example, acquiring and releasing resources (such as file handles). You want to be sure you release the resource, even if the exception system is going to abort your computation unceremoniously. The runtime can throw things at you at any time, so you can never know whether this will happen. That means you always need to wrap resource acquisition/release paths in some form of exception safety. There are a lot of complex issues here, but they’re not the subject of this post, so suffice it to say that the main pattern of interest for dealing with this is bracket.

Error handling is totally different. This is where you want to be able to recover from possible recoverable errors and do something sensible. Retry, save the task for later, alert the user that their request failed, read from cache when the network is down, whatever.

The first approach in this area that I liked was the errors package. It has many helpers for dealing with error values. In earlier versions it also had a helper that would exclude unrecoverable errors and convert the rest to error values. I liked this pattern, but wanted more. This is Haskell! I wanted to know, at a type level, when recoverable errors had already been handled. Of course, programmer errors in pure code and unrecoverable errors from the runtime are always possible, so we can’t say anything about them at the type level, but recoverable errors we could know something about. So I wrote a package and iterated on it a few times. It eventually became a dependency of the helper in the errors package that I had based my whole idea on. Until very recently, errors and unexceptionalio were the two ways I was aware of to handle recoverable errors (and only recoverable errors) reliably, and know at a type level that you had done so. Recently, errors changed the semantics of that helper to match its previously misleading documentation, so unexceptionalio now stands alone (to my knowledge) in this area.

In light of this new reality, I updated the package to make it much clearer (both in documentation and at a type level) what hole in the ecosystem it fills. I exposed the semantics in a few more ways so it can be useful even to people who don’t care about type-level error information. I also gave a name to the unrecoverable errors: things you might want to be safe from, or perhaps log and terminate a thread because of, but never recover from. For now, I call these PseudoException.

UnexceptionalIO (when used on GHC) now exposes four instances of Exception that you can use even if you have no use for the rest of the package: ExternalError (for things the runtime throws at you, asynchronously or not), ProgrammerError (for things raised from mistakes in pure code), PseudoException (includes the above and also requests for the process to exit), and SomeNonPseudoException (the negation of the above). All of these will work with the normal GHC catch mechanisms to allow you to easily separate these different uses for the exception system.

From there, the package builds up a type and typeclass with entry and exit points that ensure values in this type contain no SomeNonPseudoException. The type is only ever used in argument position. All return types are polymorphic in the typeclass, which is implemented for UIO and IO and for all monad transformers in the unexceptional-trans package. This means you can use the helpers without having to commit to UIO for your code. If you use them in a program that is still entirely based on IO, they will catch the exceptions and continue just fine in that context.

Finally, the latest version of the package also exposes some helpers for those cases where you do want to do something with PseudoException. For exception safety, of course, there is bracket, specialized to UIO. For knowing when a thread has terminated (for any reason, success or failure) there is forkFinally, specialized to UIO and PseudoException. And to make sure you don’t accidentally swallow any PseudoException when running a thread, there is fork. It ignores ThreadKilled (presumably you did that on purpose), but otherwise it rethrows any PseudoException that terminates the thread to the parent thread.

This is hardly the final word in error handling, but for me, this provides enough sanity that I can handle what I need to in different applications and express my errors at a type level when I want to.

Haskell2010 Dynamic Cast

Posted on

This post is not meant to be a suggestion that you should use this code for anything. I found the exploration educational and I’m sharing because I find the results interesting. This post is a Literate Haskell file.

Programmers often mean different things when they say “cast”. One thing they sometimes mean is being able to use a value of one type as another type, converting where possible.

We’ll use dynamic typing to allow us to check the conversions at runtime.

> module DynamicCast (DynamicCastable(..), dynamicCast, Opaque) where
> import Data.Dynamic
> import Data.Void
> import Control.Applicative
> import Control.Monad
> import Data.Typeable
> import Text.Read (readMaybe)
> import Data.Traversable (sequenceA)

But we don’t want to expose the dynamic typing outside of this module, in case people become confused and try to use a Dynamic they got from elsewhere. Really we’re just using the Dynamic as an opaque intermediate step.

> newtype Opaque = Opaque Dynamic

Types can define how they both enter and exit the intermediate representation. This allows casting existing types to new types, and also allows casting new types to existing types without changing the instances for those existing types.

> class (Typeable a) => DynamicCastable a where
> 	toOpaque :: a -> Opaque
> 	toOpaque = Opaque . toDyn
>
> 	fromOpaque :: Opaque -> Maybe a
> 	fromOpaque (Opaque dyn) = fromDynamic dyn

And finally the cast itself.

> dynamicCast :: (DynamicCastable a, DynamicCastable b) => a -> Maybe b
> dynamicCast = fromOpaque . toOpaque

Let’s see some examples.

We’ll say that Integer and simple lists (including String) represent themselves and define no specific conversions.

> instance DynamicCastable Integer
> instance (Typeable a) => DynamicCastable [a]

Int is represented however Integer represents itself.

Anything that can convert to Integer can convert to Int.

Any String that parses using read as an Int can also convert to an Int.

> instance DynamicCastable Int where
> 	toOpaque = toOpaque . toInteger
> 	fromOpaque o = fromInteger <$> fromOpaque o <|> (readMaybe =<< fromOpaque o)

And now

dynamicCast (1 :: Int) :: Maybe Integer
Just 1

dynamicCast (1 :: Integer) :: Maybe Int
Just 1

This is pretty obvious and boring, but perhaps it gives us confidence that this is going to work at all. Let’s try something fancier.

Void is the type with no inhabitants, so it can never be converted to.

> instance DynamicCastable Void where
> 	fromOpaque _ = Nothing

Either is represented as just the item it contains, and any item can be contained in an Either.

> instance (DynamicCastable a, DynamicCastable b) => DynamicCastable (Either a b) where
> 	toOpaque (Left x) = toOpaque x
> 	toOpaque (Right x) = toOpaque x
>
> 	fromOpaque o = Left <$> fromOpaque o <|> Right <$> fromOpaque o

And now

dynamicCast 1 :: Maybe (Either Int Void)
Just (Left 1)

dynamicCast 1 :: Maybe (Either Void Int)
Just (Right 1)

dynamicCast (Left 1 :: Either Int Void) :: Maybe Int
Just 1

Maybe is very similar: store Just as the unwrapped value, and store Nothing as Void.

> instance (DynamicCastable a) => DynamicCastable (Maybe a) where
> 	toOpaque (Just x) = toOpaque x
> 	toOpaque Nothing = toOpaque (undefined :: Void)
>
> 	fromOpaque = fmap Just . fromOpaque

dynamicCast (Left 1 :: Either Int Void) :: Maybe (Maybe Int)
Just (Just 1)

When casting the contents of a Functor, the possible failure also lives inside the Functor, so we need a wrapper.

> newtype FunctorCast f a = FunctorCast (f (Maybe a))

> mkFunctorCast :: (Functor f) => f a -> FunctorCast f a
> mkFunctorCast = FunctorCast . fmap Just

> runFunctorCast :: FunctorCast f a -> f (Maybe a)
> runFunctorCast (FunctorCast x) = x

> runTraversableFunctorCast :: (Traversable f) => Maybe (FunctorCast f a) -> Maybe (f a)
> runTraversableFunctorCast = join . fmap (sequenceA . runFunctorCast)

> instance (Functor f, DynamicCastable a, Typeable f) => DynamicCastable (FunctorCast f a) where
> 	toOpaque = Opaque . toDyn . fmap toOpaque . runFunctorCast
> 	fromOpaque (Opaque dyn) = FunctorCast . fmap fromOpaque <$> fromDynamic dyn

runTraversableFunctorCast $ dynamicCast (mkFunctorCast ["1"]) :: Maybe [Either Void Int]
Just [Right 1]

Thoughts After #ccsummit

Posted on

This year I again attended the Creative Commons Global Summit. There were many great sessions and participants both last year and this year, but I’ve had this growing feeling I can’t shake.

My emotions were best summed up by a conversation I had with a friend just after the conference had ended. “How was your conference?” he asked. “Oh, it was good.” I replied. “Lots of Free Culture?” he asked. There was an awkward pause, “Uh… not really.” “Yeah,” he said, “Free Culture is dead. We lost.”

That’s not to say the causes of Libraries, Museums, Open Access, Open Science, and Open Data are not good. It’s not to say the conference wasn’t full of many noble things that I support. But Free Culture is just not on anyone’s agenda.

I’ve been saying for a while that it seems like the Free Culture community has lost leadership and momentum, that there is no rally point, no meeting place. But the truth may be that this is just the symptom of having moved on.

I think there are a couple of reasons for this. One is that the Free Culture movement was always rooted in a desire to access *existing* cultural resources, not to replace them. This means that Free Culture advocates are more likely to spend effort advocating for Fair Dealing / Fair Use, exceptions to copyright, and “balance” rather than promoting Free Culture artists and works. There has also just been more success with education. No work of Free Culture (unless you stretch to include Wikipedia) has had any mainstream cultural impact. The Open Access and OER movements, by contrast, are changing the face of education around the world. It’s also just a function of students and others who were heavily involved maturing and losing the free time they had to spend on movement activities.

But what could it mean for Free Culture to be “dead”? According to last year’s State of the Commons report, there are on the order of 780 million Free Culture works on the Internet, and the number is growing daily. The licenses are alive and well. This report may over-count some things (due to user error, etc., when marking the license of a work), but it seems like there is a lot of Free Culture out there. I think, however, that the movement has stalled. Free Culture is just an option in a dropdown on a hosting platform now.

Some might see this as a sign of success. At one point, no one had heard of Free Culture or Creative Commons, now many major hosting platforms offer the licenses as an option. I think it depends on what you see as the goal, and what you think of as success.

For me, right now, short-term success is getting to the point where at least one body of Free Cultural work (needs to be more than a standalone work, probably, to have staying power) achieves mainstream cultural impact. Where I could mention this body of work to someone totally unrelated to the movement and they would have at least a halfway chance of having heard of it.

A while ago now, the main Vlogbrothers channel went CC-BY. This is certainly a body of work with a reasonable amount of success, so this is very good progress in this area.

However, my favourite project to support in this area remains Pepper & Carrot, a born-free, community-supported webcomic by David Revoy. Revoy is a Free Culture and Free Software supporter, and he really seems to understand why this sort of thing is necessary, and how to go about it. He not only embraces, but encourages and promotes all kinds of derivative works, both non-commercial and commercial in nature. I think this may be a real shot at having a somewhat-well-known, ongoing franchise with only loose central management. I will continue to do what I can, and continue to remain hopeful.

If you want to connect with like-minded Free Culture enthusiasts, I recommend joining the WIFO Forum.