It’s really great that we’re slowly seeing multiple microblogging platforms crop up. One of the problems with federating all our data across the web, however, is that there’s no good way to track what’s going on. No way to search. If all you care about is Twitter you can use search.twitter.com, and identi.ca and others similarly have their own single-site search, but where is the central search option?
Well, it turns out that it’s really not that hard to build one. Using the powerful libre Xapian full-text indexing library and PubSubHubbub, a microblog search engine doesn’t even need a crawler.
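To make that concrete, here is a rough sketch of what a PubSubHubbub subscriber callback that indexes pushed entries straight into Xapian could look like. This is not the actual µsearch implementation; Flask and feedparser are assumptions here, just to cover the HTTP and Atom-parsing pieces.

```python
# Rough sketch only: a PubSubHubbub subscriber callback that indexes
# pushed Atom entries into a Xapian database. Not the actual µsearch
# code; Flask and feedparser are assumptions for the HTTP/feed parts.
import json

import feedparser            # parses the Atom pushed by the hub
import xapian                # the full-text indexing library
from flask import Flask, request

app = Flask(__name__)
DB_PATH = "musearch.db"      # hypothetical database path

@app.route("/push", methods=["GET", "POST"])
def push():
    if request.method == "GET":
        # Hub verification of intent: echo back hub.challenge
        return request.args.get("hub.challenge", ""), 200

    # Content distribution: the hub POSTs the new entries as a feed
    feed = feedparser.parse(request.data)
    db = xapian.WritableDatabase(DB_PATH, xapian.DB_CREATE_OR_OPEN)
    tg = xapian.TermGenerator()
    tg.set_stemmer(xapian.Stem("en"))

    for entry in feed.entries:
        doc = xapian.Document()
        tg.set_document(doc)
        # Free-text content, plus prefixed copies of the author and
        # tags so they can be searched as separate fields
        tg.index_text(entry.get("title", ""))
        tg.index_text(entry.get("author", ""), 1, "A")
        for tag in entry.get("tags", []):
            tg.index_text(tag.get("term", ""), 1, "K")
        doc.set_data(json.dumps({
            "content": entry.get("title", ""),
            "author": entry.get("author", ""),
            "bookmark": entry.get("link", ""),
        }))
        # Use the permalink as a unique ID term so a re-push updates
        # the post rather than duplicating it
        idterm = "Q" + entry.get("link", "")
        doc.add_boolean_term(idterm)
        db.replace_document(idterm, doc)

    db.commit()
    return "", 204
```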
I’ve launched an early version of this over at µsearch.singpolyma.net, or the easier-to-type musearch.singpolyma.net. To seed content there is a script that polls identi.ca and rstat.us periodically (until they implement a PSHB firehose), but any microblog site can get itself added just by submitting its PSHB feeds with the box on the homepage.
I make no promises at this point about the quality or longevity of the data in the search engine. Please report any issues or features you would like to see.
Of course the source is available under the ISC license. Feel free to submit patches!
Query Syntax
Just typing words will search all the metadata and content of posts for those words. Putting words in quotes will search for a phrase. You can also prefix words or phrases with a field name to search only one bit of metadata (there’s an example sketch after the list). The currently indexed fields are:
- content
- category (like hashtags)
- in_reply_to
- bookmark (permalink)
- author
- to (mentioned users)
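On the implementation side, quoted phrases and field prefixes map naturally onto Xapian’s QueryParser. The sketch below is illustrative only: the field-to-prefix mappings and database path are assumptions, and it assumes the usual Xapian field:value form for prefixed terms.

```python
# Sketch of the query side, using Xapian's QueryParser so that
# field:value prefixes and quoted phrases work as described above.
# The prefix mappings and database path are illustrative assumptions.
import xapian

def search(querystring, db_path="musearch.db", count=10):
    db = xapian.Database(db_path)
    qp = xapian.QueryParser()
    qp.set_stemmer(xapian.Stem("en"))
    qp.set_database(db)
    # Map the user-visible field names onto internal term prefixes
    qp.add_prefix("author", "A")
    qp.add_prefix("category", "K")
    query = qp.parse_query(querystring)

    enquire = xapian.Enquire(db)
    enquire.set_query(query)
    return [m.document.get_data() for m in enquire.get_mset(0, count)]

# e.g. search('author:singpolyma category:xapian "search engine"')
```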
This is both a summary of, and an expansion on, discussions with Tantek Çelik and Kevin Marks at PubStandards LXVII about Microformats 2.
The Positive
- Optimizations for root-class-only when it makes sense. Some people really want this. I don’t care much either way, but it seems fine.
- Flat sets of properties are how I actually store some microformats in practice. Some things you want to be able to have more than one of (like addresses), but it seems the plan is to make all such cases embedded microformats objects. Seems fine.
The Meh
Changing the root class of hCard from vcard to hcard. This seems like unnecessary complexity and just adds yet another case to look for. Still, whatever, not a big deal. I’m not for it, though. It seems this is actually not part of the proposal at all; it’s just part of the thinking that leads to h-.
Changing all root class names to have an h- prefix is in basically the same boat. h- is no better than just h or any existing names anyway, but other than making parsing a bit more complex and authoring a bit uglier it’s not a big deal.
The Regressions
Update: After IMing with Tantek, it seems that a lot of the syntax changes are rooted in the fact that, useful or not, many people want an RDFa/microdata-style (XML/RDF/JSON-like) generic syntax, but all the ones being created are too complicated for web designers (who don’t want to use any attributes they are not already using). So the thinking is that we need to create such a syntax ourselves, since people are going to do it anyway, and that syntax needs to be designer-friendly so that designers don’t hate us.
- Simple “universal parsing” (i.e. prefixes on class names). Not useful, since universal parsing just gets you an internal representation that you later have to write vocabulary-specific code to understand anyway. Bad because it means microformats syntax would no longer be compatible with POSH (Plain Old Semantic HTML). This is a serious regression. I have always been a fan of microformats “paving the cowpaths”, that is, trying as much as possible to converge the semantic efforts authors are trying anyway so that we all use the same vocabularies.
- Typed class names. Similar problem to the first point, but I wanted to call out types specifically as a problem. The vocabulary specifies the type so this information is redundant.
- An advantage is that authors can tell which classes are “from microformats”. This is only an advantage if you think of microformats as competing with microdata and similar: that is, as encoding data in the page. I’ve always seen microformats as just suggestions for what semantics you ought to use in known cases where you would otherwise be making up your own markup.
- Vendor extensions are also an import from the bad world of arbitrary metadata.
It has been some time now since Lawrence Lessig released Code 2.0, the second edition of his book Code and Other Laws of Cyberspace. I have not yet read the book, however. Why not? Because it was not available in a suitable format.
The book is available from its website as a free PDF download. Unfortunately, PDF is a format designed to do one thing: lay things out for print. I do not want to print the book (if I wanted that, I would just buy a hard copy!) No, I want to read it on a screen. In this case, the screen on my phone (an N900 running FBReader). PDFs are laid out as a sequence of pages, sized for print. Any screen that is not at least the size of these pages ends up panning and scrolling in horrible ways. PDF readers also lack the sorts of features one wants out of an “ebook” (such as being able to automatically resume from where one left off).
Converting the PDF to a sensible format has proved to be very difficult, if not impossible, without a huge amount of manual work. Not worth it. Lessig, however, also hosts a copy of the book’s contents in a wiki. So, today I finally got around to writing a script to screen-scrape the wiki for content, and then massaged it a bit for an ideal eBook experience.
So, for anyone who wants it, I release here an eBook-friendly (X)HTML version of the book, licensed (as is the source) under a Creative Commons Attribution-ShareAlike 2.5 license. Download the eBook
Version 1.0 of the DiSo Actionstream plugin for WordPress has finally been released! The upgrade for this version may be a little rough, because the entire data storage model has changed since the last release. Let me know if you have any trouble!
For Debian users, there is a wordpress-diso-actionstream package in my APT repository.
The University of Waterloo has joined eduroam, the international WiFi access exchange program for educational institutions. I am very happy about this, since eduroam uses proper WPA2+RADIUS authentication instead of stupid captive portal stuff. Since I had to do some research to figure out how to connect from my N900, I thought I would share that here.
- Go to Settings > Internet connections > Connections > New
- Select WPA with EAP for security.
- EAP type: PEAP
- Certificate: none
- EAP method: EAP MSCHAPv2
- User name: user@uwaterloo.ca
- Password: Your Quest password
- Hit “Advanced”
- Go to the EAP tab
- Check the box “Use manual user name” and enter user@uwaterloo.ca again
The last three steps are the extra part. I got this information from this thread. I have no idea what the difference is between a “manual user name” and the username that is part of the login prompt. Maybe someone else knows?