Singpolyma

Give Raw Pages a Lift

Posted on

We’ve all seen it: the harsh white background, serif fonts, and window-border hugging indicative of unstyled (or very lightly styled) (X)HTML.  Today I discovered something: it takes very little CSS to take a basic HTML page and give it some flavour (and make it less painful on the eyes).  Just eight lines of CSS.

Not all pages have such CSS, however, and sometimes it might be nice to just hit a button and get some style.  So I created a bookmarklet.  Drag it to your bookmark bar, and next time you see an unstyled page, hit it.  Basic styles will be added instantly.

SocialSearchMe.com API

Posted on

First off, my social network search engine now has a domain thanks to Tantek!  It’s just a redirect forwarder for now, but much easier to remember!

I have been polishing some bits of the search engine and am pleased to report that it now has a complete API!

First off, the microformat API.  All data is marked up with hCard, thus allowing pages on the engine to double as API output.  This is the preferred method.  If you really must, a JSON(P) variant is available.

These are the endpoints for a standard search:

http://scrape.singpolyma.net/profile/?q=NAME

http://scrape.singpolyma.net/profile/search.js.php?q=NAME

These are the endpoints to search from the “point of view” of a particular person (specify a URL):

http://scrape.singpolyma.net/profile/?q=NAME&pov=URL

http://scrape.singpolyma.net/profile/search.js.php?q=NAME&pov=URL

To retrieve data about a specific user use:

http://scrape.singpolyma.net/profile/person.php?id=ID

http://scrape.singpolyma.net/profile/person.js.php?id=ID

http://scrape.singpolyma.net/profile/person.php?url=URL

http://scrape.singpolyma.net/profile/person.js.php?url=URL
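
As a rough illustration, the endpoints above can be assembled with a couple of small helpers.  This is only a sketch: the function names `search_url` and `person_url` and the `jsonp` flag are made up for the example, and nothing here says what the responses look like.

```python
from urllib.parse import urlencode

# Base path taken from the endpoints listed above.
BASE = "http://scrape.singpolyma.net/profile/"


def search_url(name, pov=None, jsonp=False):
    """Build a search URL; pov is the optional point-of-view profile URL."""
    params = {"q": name}
    if pov is not None:
        params["pov"] = pov
    path = "search.js.php" if jsonp else ""
    return BASE + path + "?" + urlencode(params)


def person_url(person_id=None, url=None, jsonp=False):
    """Build a person lookup URL from exactly one of an ID or a profile URL."""
    if (person_id is None) == (url is None):
        raise ValueError("pass exactly one of person_id or url")
    params = {"id": person_id} if person_id is not None else {"url": url}
    path = "person.js.php" if jsonp else "person.php"
    return BASE + path + "?" + urlencode(params)
```

Note that `urlencode` percent-encodes the `pov` and `url` values, which matters because they are themselves URLs.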

And that’s it!  This stuff powers my contacts page and the bookmarklet.

DiSo Gets Search

Posted on

Tantek Çelik has purchased socialsearchme.com for this service! Thanks!

Never tweet about something you don’t want to go public.  I’ve been annoying my followers for some time now about my new social search engine.  Tantek then linked to it from his WordCamp San Francisco presentation.  Not that I’m upset at all.  I’m ecstatic that he thought it was worth linking to!  Still, a word to the cautious 😉

So how does this search engine work? What does it do? Basically, it’s an hCard search engine.  Unlike the Yahoo or Technorati Kitchen implementations, however, this search is focused on social networking and profiles.  If DiSo were Facebook, this could be the friend search functionality.  So instead of having the results be links to pages that contain matching hCards, the results are profiles with social networking data (including contacts) and names, etc.

One other key thing that is different here from pure hCard search is that I am only spidering representative hCards (with some small hacks for well-known sites like Twitter).  This means I don’t spider arbitrary hCard data; instead, I am only indexing profile pages.  I use both XFN parsing and the SGAPI to verify claims that two pages represent the same person, and then associate them.  Data from both pages goes into the index as if it were all on one page.  Only one page needs an hCard, since connections are made through rel=me and XFN.  This way, although my profile is on my main page and my contacts are at singpolyma.net/contacts, the search engine indexes them both.
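
To make the "same person" association concrete, here is a minimal sketch.  It is not the engine's actual code: it assumes a claim counts as verified only when two pages link to each other with rel=me, it takes the links as an already-parsed dictionary, and it does not consult the SGAPI at all.  The function name `group_identities` is hypothetical.

```python
def group_identities(rel_me):
    """Group pages that appear to represent the same person.

    rel_me maps a page URL to the set of URLs it links to with rel=me.
    Assumption for this sketch: two pages are merged only when the
    rel=me claim is reciprocal.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for page, targets in rel_me.items():
        find(page)  # register the page even if nothing links back
        for target in targets:
            if page in rel_me.get(target, set()):
                parent[find(page)] = find(target)

    groups = {}
    for page in list(parent):
        groups.setdefault(find(page), set()).add(page)
    return sorted(sorted(group) for group in groups.values())
```

Each resulting group would then be indexed as if all of its data sat on one page.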

To find new pages to index, I spider along XFN (and FOAF, since I also ask the SGAPI) to find pages likely to have the sort of data I’m looking for.  Interestingly enough, this means that social networks like Twitter, Pownce, and Digg, which support hCard and XFN, get almost completely indexed.  There are over 100,000 profiles in the index now, and I have only given it one manually: singpolyma.net.
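
The spidering idea can be sketched as a breadth-first walk outward from a single seed.  This is an illustration under assumptions, not the real spider: `contacts_of` is a hypothetical callback standing in for whatever fetches a page and extracts its XFN (or FOAF) links, and `limit` is an invented safety cap.

```python
from collections import deque


def spider(seed, contacts_of, limit=1000):
    """Visit pages breadth-first, starting from one manually supplied seed.

    contacts_of(page) is a hypothetical callback returning the URLs that
    the page links to as contacts.  Returns pages in the order visited.
    """
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < limit:
        page = queue.popleft()
        visited.append(page)
        for contact in contacts_of(page):
            if contact not in seen:  # never queue the same page twice
                seen.add(contact)
                queue.append(contact)
    return visited
```

With a well-connected seed, every profile reachable through contact links ends up in the queue sooner or later, which is why one seed can be enough.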

I’m not entirely sure how the data will be useful yet, but I’m really excited about the possibilities.  I firmly believe in making XFN lists, static though they may be, come alive with potential through layers of functionality, be it through plugins, 3rd party services, or bookmarklets.

Speaking of bookmarklets, I have one.  Go to that page, add the bookmarklet, and visit my contacts page (or any other page with lots of XFN data).  Click it and watch that boring list of links and names turn into a more functional social-networking list.

The code has been released under an MIT-style license on my repository.  Front-end is PHP, back-end is Ruby.

DiSo : on our way to fixing your addressbook 😉

Messaging: What I Want

Posted on

I’ve blogged numerous times about XMPP, SMTP, and communications evolution on the web.  I’ve suggested what I want ultimately and snippets of how we might get there.  Here, I am going to outline just briefly what I consider “next steps”.  The big ones.  Get these done, and you will have made a *huge* stride in online messaging:

  1. Allow offline messages (type normal or chat) to be collected as “email”.  Gmail sort-of does this by presenting unseen offline messages in the web interface inbox.  I want IMAP access to these in the inbox and their archive.  Heck, store them in a Unix mailspool (have to store them somewhere anyway) and existing IMAP servers will just work for you!
  2. SMTP messages are type=normal.  If you store offline messages in a mailspool and run an SMTP server on that spool, you’re mostly done.  It might be good to offer real-time delivery of those messages to the user over XMPP as well, though.
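
To show how little is involved in the storage half of step 1, here is a minimal sketch that appends an offline message to an mbox-format spool using Python's standard `mailbox` module.  The function name `spool_offline_message` and its parameters are invented for the example, and mapping JIDs straight onto From/To headers is an assumption, not something any XMPP server is known to do.

```python
import mailbox
from email.message import EmailMessage


def spool_offline_message(spool_path, sender, recipient, body, subject=""):
    """Append one offline message to an mbox spool file.

    sender and recipient are bare JIDs, reused as email addresses
    (an assumption made for this sketch).
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    if subject:
        msg["Subject"] = subject
    msg.set_content(body)

    box = mailbox.mbox(spool_path)  # creates the file if it is missing
    box.lock()  # a mail server may be reading the same spool
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

Once messages are in the spool, an existing IMAP server pointed at that file can serve them without knowing anything about XMPP.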

That’s it! Sure, more can be done, but if you get the first one done I will be your biggest fan.  Do both and you’re well on your way to an evolution in how we deal with email (both from a user and a protocol perspective).  Yes, I’ve tried to build this.  I want to do it as an ejabberd module, but ejabberd is barely documented.  I’ll try again sometime if no one else does – maybe with ejabberd, maybe with someone else.

Boxbe AntiSPAM

Posted on

Today I received an email from Boxbe support telling me they had finally given users the option to turn off their “courtesy notification” system.  I couldn’t be happier!  I thought I’d take this post to share about my SPAM problems, and my solution.

The Problem

Gmail SPAM filtering is nice.  I may not have it forever, and don’t like to count on it, but it works very well.  Unfortunately I made the choice when I registered this domain name to set up a catch-all.  At first that was fine, but after over a year *@singpolyma.net was receiving so much SPAM, so fast, that even the Gmail SPAM filter couldn’t keep up.  I began to receive over 40 SPAM messages (sometimes over 200) per day, sometimes all at once!  I didn’t want to disable the catch-all though… that felt like the wrong solution.

The Right Solution

I decided the right solution was whitelisting.  Since most of the people I know don’t use PGP (yet) there is no way to guarantee the sender of the messages, but from a cursory glance over my SPAM box I decided that trusting the From: header would work for 99% of today’s SPAM.

I can’t set up a forwarder from a catch-all with Dreamhost, so I set it to be delivered into a mailbox.  I then created a “dummy” Gmail account to fetch this mail via POP3.  Bonus #1, Gmail filters all this mail as it comes in, catching a huge amount of the illegitimate messages (just not enough of them).  Set Gmail to forward all email to singpolyma@boxbe.com (more on that in a bit) and delete.  Using Gmail as an email pipe/filter really.

Then Boxbe.  Boxbe gives you a you@boxbe.com email address that you can forward mail to, it checks it against a whitelist, and sends it on if it matches.  Previously, if it did not match, they would reply with a “challenge” email.  This is annoying, broken, and sometimes embarrassing, so I am very pleased that they have now given people the option I wanted all along.  Disable all “courtesy notifications” and turn on the report of the queue, daily.  If I receive any mail from people not on my whitelist, I get an email from Boxbe once a day summarizing who tried to contact me.  I go and let through any legitimate new people.  Perfect.

Boxbe uses the password anti-pattern (although they’re working on fixing that, they say) to import your address book.  They have a CSV importer though.  Export from Gmail, import to Boxbe.  Set up some trusted domains (like *@uwaterloo.ca) and go.
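
The whitelisting logic itself is simple enough to sketch.  This is not Boxbe's implementation, just an illustration of the idea: trust the From: header, allow exact addresses and wildcard domains such as *@uwaterloo.ca, and hold everything else for a daily summary.  The names `is_whitelisted` and `sort_mail` are made up for the example.

```python
from email.utils import parseaddr
from fnmatch import fnmatchcase


def is_whitelisted(from_header, whitelist):
    """Check a From: header against addresses and wildcard patterns.

    The header is trusted as-is, so a forged From: would still pass.
    """
    _, address = parseaddr(from_header)
    address = address.lower()
    return any(fnmatchcase(address, pattern.lower()) for pattern in whitelist)


def sort_mail(from_headers, whitelist):
    """Split senders into delivered mail and a held queue for the digest."""
    delivered, held = [], []
    for header in from_headers:
        if is_whitelisted(header, whitelist):
            delivered.append(header)
        else:
            held.append(header)
    return delivered, held
```

For example, with a whitelist of `["friend@example.com", "*@uwaterloo.ca"]`, a message from `Alice <alice@uwaterloo.ca>` is delivered while one from an unknown sender lands in the held queue.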

I haven’t seen SPAM since, and have only once or twice had to go over and let through a message that got stopped.