Singpolyma

Archive of "Microformats"

Microformats 2

This is both a summary of, and an expansion on, discussions with Tantek Çelik and Kevin Marks at PubStandards LXVII about Microformats 2.

The Positive

  • Optimizations for root-class-only when it makes sense. Some people really want this. I don’t care much either way, but it seems fine.
  • Flat sets of properties are how I actually store some microformats in practice. Some things you want to be able to have more than one (like address), but it seems the plan is to make all such cases embedded microformats objects. Seems fine.

The Meh

Changing the root class of hCard from vcard to hcard. This seems like unnecessary complexity and just adds yet another case to look for. Still, whatever, not a big deal. I’m not for it, though. It seems this is actually not part of the proposal at all; it’s just part of the thinking that leads to the h- prefix.

Changing all root class names to have an h- prefix is in basically the same boat. h- is no better than just h or any existing names anyway, but other than making parsing a bit more complex and authoring a bit more ugly it’s not a big deal.

The Regressions

Update: After IMing with Tantek, it seems that a lot of the syntax changes are rooted in the fact that, useful or not, many people want a generic syntax along the lines of RDFa/microdata (XML/RDF/JSON). All the ones they are creating are too complicated for web designers (who don’t want to use any attributes they are not already using), so we need to create a syntax that does this, since people are going to do it anyway. That syntax needs to be designer-friendly, so that designers don’t hate us.

  • Simple “universal parsing” (ie: prefixes on class names). Not useful, since universal parsing just gets you an internal representation that you later have to write vocabulary-specific code to understand anyway. Bad because it means microformats syntax would no longer be compatible with POSH (Plain Old Semantic HTML). This is a serious regression. I have always been a fan of microformats “paving the cowpaths”, that is, trying as much as possible to just converge the semantic efforts authors are already making so that we all use the same vocabularies.
  • Typed class names. Similar problem to the first point, but I wanted to call out types specifically as a problem. The vocabulary specifies the type so this information is redundant.
  • An advantage is that authors can tell which classes are “from microformats”. This is only an advantage if you think of microformats as competing with microdata and similar: that is, encoding data in the page. I’ve always seen microformats as just suggestions for what semantics you ought to use in known cases when you would otherwise be making up your own markup.
  • Vendor extensions are also an import from the bad world of arbitrary metadata.

POSHstream, a POSHformat profile of activitystrea.ms

This is just a post to describe a proposal for a POSHformat that profiles the ActivityStreams ATOM extension.

It only really makes sense to use these fields in the context of hAtom, and the fields of hAtom are used in the same way as the equivalent ATOM fields are used by the ATOM extension. In other words, this is an “hAtom extension”.

Object Identifiers

Object Identifiers (for types and verbs) can be specified using descriptive text (such as “link” or “weblog entry”), but a less ambiguous IRI MAY be specified using the value class pattern. Descriptive text SHOULD conform to agreed-upon, or at least published, type names, similarly to how ActivityStreams Object Identifiers should be defined in published specs.

class=object-type

The content of the object-type field MUST be an Object Identifier.

This field specifies the type of the Object described by this entry. An entry MAY have more than one object-type or no object-type.

Activity feed processors SHOULD use the most specific Object Type that they understand within each entry. The processor MAY use other information to infer the Object Type if it cannot understand any of the Object Types given.

class=verb

The content of the verb field MUST be an Object Identifier.

This field defines the verb for this activity. An entry MAY have more than one verb but MUST have at least one.

Activity feed processors SHOULD use the most specific Verb that they understand within each entry. If none of the Activity’s Verbs are understood by the processor, the processor MAY use other information to infer the Verb, or the processor MAY use the content of the hAtom title, summary and/or content to obtain a sentence describing the Activity.

class=object

This class MUST appear along with one or more others that describe another POSHformat or microformat OR appear on a tag that inherently describes an external resource, such as <a>, <object>, or <img>. The object data is extracted according to this other format.

An additional object-type SHOULD be inferred from the other class(es), type attribute, and/or tag name present on this element.

class=target

The target field contains information about the target of the activity, for verbs that support a target. The target is the object that the action was done to. This field MUST appear along with one or more others that describe another POSHformat or microformat OR appear on a tag that inherently describes an external resource, such as <a>, <object>, or <img>, similarly to class=object.

The precise meaning of the target as relates to the activity depends on the verb in use, but its meaning is somewhat similar to the English preposition “to”. The target extension element MUST NOT be used for indirect objects that are not targets.

actor

The actor is taken from the class=author hCard present in the hAtom container. An additional class “actor” MAY be added to the hCard in keeping with the ATOM extension.

The ‘post’ Verb

This verb has IRI http://activitystrea.ms/schema/1.0/post and descriptive text “posted”.

If the language of the activity is not English, the IRI or descriptive text should be provided using the value class pattern.

This Verb describes the act of posting or publishing an Object on the web. The implication is that before this Activity occurred the Object was not posted, and after the Activity has occurred it is posted or published.

If an activity using this verb has a target element, the target object is the collection in which the item was posted.

An example


	<div class="hentry">
		<span class="author vcard">
			<span class="fn">Geraldine</span>
		</span>
		<span class="verb">posted</span>
		a <span class="object-type">photo</span>
		on PhotoPanic
		@ <span class="published">2008-11-02T15:29:00Z</span>
		<img class="object entry-title" alt="My Cat" src="/geraldine/photo1.jpg" />
	</div>
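
To make the example concrete, here is a rough sketch of how a processor might pull the POSHstream fields out of that markup. It is not a reference implementation: the names PoshstreamParser and parse_entry are made up for illustration, it assumes well-formed markup, it takes the object from a src, href, or data attribute (my assumption), and it skips the value class pattern and the inferred object-type entirely.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are not tracked on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}
# Fields whose value is the element's text content.
TEXT_FIELDS = {"fn", "verb", "object-type", "published"}
# Fields whose value is read from a URL attribute (an assumption of this sketch).
URL_FIELDS = {"object", "target"}


class PoshstreamParser(HTMLParser):
    """Hypothetical collector for POSHstream class names in one hentry."""

    def __init__(self):
        super().__init__()
        self.open_fields = []  # one set of text field names per open element
        self.fields = {}       # field name -> list of values

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        url = attrs.get("src") or attrs.get("href") or attrs.get("data")
        if url:
            for name in sorted(classes & URL_FIELDS):
                self.fields.setdefault(name, []).append(url)
        if tag in VOID_TAGS:
            return
        text_fields = classes & TEXT_FIELDS
        for name in text_fields:
            self.fields.setdefault(name, []).append("")
        self.open_fields.append(text_fields)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_fields:
            self.open_fields.pop()

    def handle_data(self, data):
        # Text belongs to every tracked field that is currently open.
        for text_fields in self.open_fields:
            for name in text_fields:
                self.fields[name][-1] += data


def parse_entry(html):
    parser = PoshstreamParser()
    parser.feed(html)
    parser.close()
    entry = {name: [v.strip() for v in values]
             for name, values in parser.fields.items()}
    if not entry.get("verb"):
        raise ValueError("an entry MUST have at least one verb")
    return entry


EXAMPLE = """
<div class="hentry">
  <span class="author vcard"><span class="fn">Geraldine</span></span>
  <span class="verb">posted</span>
  a <span class="object-type">photo</span>
  on PhotoPanic
  @ <span class="published">2008-11-02T15:29:00Z</span>
  <img class="object entry-title" alt="My Cat" src="/geraldine/photo1.jpg" />
</div>
"""

entry = parse_entry(EXAMPLE)
print(entry)
```

A real processor would still have to map the descriptive text (“posted”, “photo”) onto the Verbs and Object Types it understands, as described above.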

DiSo Gets Search

Tantek Çelik has purchased socialsearchme.com for this service! Thanks!

Never tweet about something you don’t want to go public.  I’ve been annoying my followers for some time now about my new social search engine.  Tantek then linked to it from his WordCamp SanFrancisco presentation.  Not that I’m upset at all.  I’m ecstatic that he thought it was worth linking to!  Still, a word to the cautious 😉

So how does this search engine work? What does it do? Basically, it’s an hCard search engine.  Unlike the Yahoo or Technorati Kitchen implementations, however, this search is focused on social networking and profiles.  If DiSo were Facebook, this could be the friend search functionality.  So instead of having the results be links to pages that contain matching hCards, the results are profiles with social networking data (including contacts) and names, etc.

One other key thing that is different here from pure hCard search is that I am only spidering representative hCards (with some small hacks for well-known sites like Twitter).  This means I don’t spider arbitrary hCard data; instead, I am only indexing profile pages.  I use both XFN parsing and the SGAPI to verify claims that two pages represent the same person, and then associate them.  Data from both pages goes into the index as if it were all on one page.  Only one page needs an hCard, since connections are made through rel=me and XFN.  This way, although my profile is on my main page and my contacts are at singpolyma.net/contacts, the search engine indexes them both.
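
The rel=me part of that check can be sketched roughly as follows. This is only an illustration, not the engine's actual code (the back-end is Ruby, and it also consults the SGAPI): the helper names, the example.com URLs, and the rule that both pages must link to each other are my assumptions, and the pages are supplied as an in-memory dict rather than fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RelMeParser(HTMLParser):
    """Collects absolute URLs of links carrying rel="me" (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.me_links = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "me" in rels and href:
            self.me_links.add(urljoin(self.base_url, href))


def rel_me_links(base_url, html):
    parser = RelMeParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.me_links


def same_person(url_a, url_b, pages):
    """True when each page claims the other via rel=me.

    pages maps URL -> HTML and stands in for a real crawler's fetches.
    """
    return (url_b in rel_me_links(url_a, pages[url_a])
            and url_a in rel_me_links(url_b, pages[url_b]))


# Made-up pages for illustration only.
pages = {
    "https://example.com/": '<a rel="me" href="/contacts">My contacts</a>',
    "https://example.com/contacts": (
        '<a rel="me" href="/">Home</a> '
        '<a rel="friend met" href="https://example.org/">A friend</a>'
    ),
    "https://example.org/": "<p>No links back.</p>",
}

print(same_person("https://example.com/", "https://example.com/contacts", pages))
print(same_person("https://example.com/contacts", "https://example.org/", pages))
```

Once two pages are judged to be the same person, their data can be merged into one index record, which is why only one of them needs to carry the hCard.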

To find new pages to index, I spider along XFN (and FOAF, since I also ask the SGAPI) to find pages likely to have the sort of data I’m looking for.  Interestingly enough, this means that social networks like Twitter, Pownce, and Digg, which support hCard and XFN, get almost completely indexed.  There are over 100,000 profiles in the index now, and I have only given it one manually: singpolyma.net.

I’m not entirely sure how the data will be useful yet, but I’m really excited about the possibilities.  I firmly believe in making XFN lists, static though they may be, come alive with potential through layers of functionality, be it through plugins, 3rd party services, or bookmarklets.

Speaking of bookmarklets, I have one.  Go to that page, add the bookmarklet, and visit my contacts page (or any other page with lots of XFN data).  Click it and watch that boring list of links and names turn into a more functional social-networking list.

The code has been released under an MIT-style license on my repository.  Front-end is PHP, back-end is Ruby.

DiSo : on our way to fixing your addressbook 😉

Extending Microformats: a Return to XOXO

I haven’t written about the XOXO microformat in some time, but some recent discussions caused me to dig into my archives to source a new post.  Microformats tend to follow the rule of only formalizing the most common of existing publishing patterns (the 80-20), meaning that some more “edge case” data cannot be represented.  Does this mean that this data is useless?  Not at all: but it is outside the realm of microformats, at least for now.  So we either need to invent something new, or extend what we have.

A Page from Recent History

This is not a new problem.  Every formalised standard is going to face those who feel that their bit of metadata should be included.  Take, as an example, the RSS 2.0 spec.  Core essentials of news feeds are present: title, description, date, etc.  Lots of metadata is missing though: author name, comment counts, comment feed URLs, and more.  People solved this problem in two very different ways: some extended, and some invented something new.

Extending RSS (or any XML format) is easy: create a namespace, add your elements, publish.  If a particular piece of metadata is popular it gets standardised in a spec’ed extension (dc:creator, slash:comments, wfw:commentRss, etc).  The benefit of this approach is that all existing parsers can still read your content.  If a parser doesn’t need your extra metadata, it can safely ignore it and present just the core content.  No new code needs to be written, and no new formats need to be learned for 80% of the applications.
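
As a small illustration of that point, the sketch below reads the same feed twice: once with a parser that only knows core RSS and once with a parser that also understands two of the namespaced extensions. The feed content and the function names core_items and extended_items are invented for the example; the namespace URIs are the published ones for Dublin Core and the Slash module.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed carrying two namespaced extension elements.
FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Hello</title>
      <link>http://example.com/hello</link>
      <dc:creator>Martha</dc:creator>
      <slash:comments>3</slash:comments>
    </item>
  </channel>
</rss>"""

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "slash": "http://purl.org/rss/1.0/modules/slash/",
}


def core_items(xml_text):
    """A parser that knows only core RSS: the extensions are simply ignored."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def extended_items(xml_text):
    """A parser that also understands dc:creator and slash:comments."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "creator": item.findtext("dc:creator", namespaces=NS),
            "comments": item.findtext("slash:comments", namespaces=NS),
        })
    return items


print(core_items(FEED))
print(extended_items(FEED))
```

Neither function had to change for the other to work, which is the whole appeal of extending rather than replacing.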

There was another group interested in solving this problem: the ATOM group.  They threw away all the existing formats (RSS 2.0 and RSS 1.0/RDF) and built something brand new from scratch to accommodate their data needs.  What was the result?  Feed aggregators everywhere had to write all-new code to handle this new, incompatible, and often more complicated case.  Time and effort were wasted both in code and user education (unlearn “What is RSS”, learn “What is ATOM” / “What are feeds”).  Once the standard hit a spec’ed form, what happened?  People began to use namespaces in ATOM as well, because for all the “better” it was, for some edge cases it just wasn’t “better” enough.

Back to XOXO

It seems the key is to be easily extensible, not to think of everything up front.  If microformats are going to make their way into lots of APIs and not just be used for better page scraping (Ma.gnolia does a good job of this), then extensibility is necessary.  Fortunately, XOXO provides an easy solution.  Check this out:

<ul>
<li class="vcard">
<dl>
<dt>fn</dt>
<dd class="fn">Martha</dd>
<dt>Anniversary</dt>
<dd>2005-02-04</dd>
</dl>
</li>
</ul>

An hCard parser can read that.  For a normal use case, no new code is needed.  An XOXO parser can read that, and if it knows about hCard will likely know what “fn” means.  The other data is there, though.  The parser has that data.  Minimal new code, and all the data can be used.  Cool or what?
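
Here is a rough sketch of those two readings of the same markup. It assumes the markup is well-formed XHTML so that an XML parser can load it, and the function names hcard_view and xoxo_view are made up for illustration; real hCard and XOXO parsers handle far more than this.

```python
import xml.etree.ElementTree as ET

MARKUP = """<ul>
<li class="vcard">
<dl>
<dt>fn</dt>
<dd class="fn">Martha</dd>
<dt>Anniversary</dt>
<dd>2005-02-04</dd>
</dl>
</li>
</ul>"""


def has_class(element, name):
    return name in element.get("class", "").split()


def hcard_view(root):
    """What an hCard-only parser sees: just the properties it knows (here, fn)."""
    cards = []
    for element in root.iter():
        if has_class(element, "vcard"):
            names = [d.text.strip() for d in element.iter()
                     if has_class(d, "fn") and d.text]
            cards.append({"fn": names})
    return cards


def xoxo_view(root):
    """What an XOXO-style parser sees: every dt/dd pair, known or not."""
    records = []
    for dl in root.iter("dl"):
        record, key = {}, None
        for child in dl:
            text = (child.text or "").strip()
            if child.tag == "dt":
                key = text
            elif child.tag == "dd" and key is not None:
                record[key] = text
                key = None
        records.append(record)
    return records


root = ET.fromstring(MARKUP)
print(hcard_view(root))
print(xoxo_view(root))
```

The hCard view drops the anniversary and the XOXO view keeps it, yet both come from one unchanged document.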

ActionStream 0.40 and DiSo Profile 0.25

I have updated two of my DiSo plugins: Profile and ActionStream.

The profile updates mostly involve some code cleanup, a page here documenting it, and a new API to add permissions options to the permissions page.

The ActionStream update is a bit more extensive:

  • Support for coComment
  • Code cleanup, of course
  • RSS2 output option, linked from the stream output (add &full for a different view)
  • Reportedly working in WP2.5 with a patch I accepted
  • Better Safari support
  • If you disable showing your service usernames they are also hidden in the collapsed items
  • Ability to set permissions on updates from each service (if wp-diso-profile0.25 is installed)