Technical Blog

Thinking About Aggregators

Posted on

So, I’ve been thinking for awhile about the aggregator experience I want to have for reading my blogs and microblogs. As I’ve increasingly been moving to my own infrastructure for both, my subscription experience has been evolving as well. Right now I have a rather hacky script polling the Twitter and APIs every 2 minutes, putting the content into my WordPress database, and exposing that information over a partial implementation of the Twitter API. I then use a ruby script that polls said Twitter API and sends the messages to me over XMPP (as well as allowing me to post to my self-hosted microblog). I read other feeds using newsbeuter.

This entire setup is a bit suboptimal. Why, for example, is my server polling and then storing content I am subscribed to? Isn’t that normally my aggregator’s job? I’ve realised, that the reason the model has evolved this way elsewhere (such as at is that in many microblogging services, the service acts as both publisher and aggregator, but this is sort of artificial. My server has no good reason to act as an aggregator, that function is not related to publishing my content.

I thus intend to build a better aggregator infrastructure. My current thought is to build some sort of modular system with data sources and sinks. The primary sources (to start) will be RSS/ATOM feeds and maybe Twitter compatible APIs. On top of this can be built a nice web-feed-reader infrastructure (like Bloglines was), which should support PubSubHubBub for its subscriptions (since it lives on a server asnyway), or an XMPP feed-delivery system (which just connects, delivers the content, and disconnects), or an ncurses or other GUI system for local use.

The system also needs to support replies (via Salmon or Trackback/Pingback) and also cross-posting of said replies (using Twitter-compatible or WordPress-compatible APIs). The public key that goes with the private key the aggregator is using to sign the Salmon slaps needs to be publicly discoverable somewhere (likely at your publishing point), but otherwise the aggregator doesn’t need any kind of public or always-on presence.

Leave a Response