Jan 19, 2012 | In search of serendipity and soulful data

Every Wednesday I have the privilege of attending HCI lunches here at Stanford, where a mishmash of scholars, professors, and students meet over lunch to watch someone’s research presentation (and proceed to tear it apart). It does result in some interesting discussions, yesterday being no exception, and I thought I’d share some thoughts on this elusive idea of “serendipity”.

Terry Winograd, one of the luminaries in the field, put it succinctly when he said that

serendipity is to data as intuitive is to a user interface

What is intuitive for you may not be to me, just as serendipity cannot be quantified per se as a quality of some data. In this particular context, serendipity remains one of Monica Lam’s holy grails – how might one sift through the masses of data we manufacture each day and reveal the ones that are particularly “serendipitous”? Jesse brought up the valid point that intuitiveness will likely not change as quickly as serendipity, which may make the latter even more elusive. Let us see. Merriam-Webster defines it as:

the faculty or phenomenon of finding valuable or agreeable things not sought for

Other sources inevitably include terms like “happy accident” or “pleasant surprise”. Hence the keywords there are

a) “something that is useful or creates some sense of pleasure or happiness”
b) “something that is unexpected”

We can definitely cater to b) with what we have: tf-idf, in its roughest sense, provides some measure of information quality relative to rarity. Algorithms such as LDA do a pretty neat job of surfacing themes and topics that are somewhat unexpected and which may provide some serendipity, but their results are hardly predictable. What may be the right approach, then, is to consider technologies that surface information that is an aggregate, but not a composition, of existing data. Where we currently see a garden full of different-colored apples and we rejoice at an algorithm’s capacity to say “there are many different apples, and this one may be of interest”, or “we can group your 100 apples into 7 different logical clusters”, we may benefit from the approach that says “this garden has too many apples”.
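To make the "rarity" part concrete, here is a rough sketch of tf-idf in plain Python. It is only an illustration, not anything presented at the lunch: the function name `tf_idf` and the toy apple documents are made up, and it uses the simplest variant (term frequency times log(N / document frequency)), which is one of several in common use.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # Frequent in this document, rare across documents => high score.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy "garden": every document mentions apples, only one mentions a plum.
docs = [
    "red apple green apple",
    "green apple yellow apple",
    "red apple apple plum",
]
scores = tf_idf(docs)
most_unexpected = max(scores[2], key=scores[2].get)
print(most_unexpected, scores[2])
```

A term that shows up in every document ("apple") scores zero no matter how often it is repeated, while the lone "plum" floats to the top. That covers "unexpected", but notice that nothing in the score says whether the plum is useful or pleasant, which is exactly the a) half that remains open.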

In that sense I feel the approach of creating serendipity by surfacing past data is inherently problematic. I do acknowledge that getting better access to our past can be interesting (and serendipitous), but it is hardly a worthwhile challenge. We may choose to omit certain aspects of our past history specifically because we don’t want to be reminded of them. How would a system be able to discern that? Is Facebook’s “People You May Know” feature one that uncovers “serendipity”? Are other people better arbiters of “soulful data” than algorithms will ever be?

Let me enumerate things that make me happy especially if this is something that can be provided to me online (without my explicit, active participation):

a) finding a long-lost friend
b) hearing good news about a close person’s success
c) reading a news article that is related to a current research topic or question
d) seeing videos of cats
e) finding a really sweet online deal that is within my budget and what I had been looking for

It’s clearer now why websites like Reddit and Facebook succeed: the narrow circles of individuals we associate with are likely to provide us with those happy discoveries (this, as opposed to the worries about the echo chamber; that latter concern is worthy of a whole different discussion).

Ultimately I think any system that attempts to pursue the notion of surfacing soulful data needs to be reminded of the fact that there needs to be a clear and transparent method for communicating how that data is derived and ultimately chosen for display. Also, there is a very fine line between letting people “search” for this data and providing it at the opportune moment. Maybe that is the key question: what is the opportune moment to provide someone with information? The most basic answer to that would be: “when they are searching for it”. We can definitely do better.

In short:

a) surface information that a user cannot or would rather not spend the effort to search for
b) choose carefully the moments you expose that information
c) be transparent about how that data is obtained

This entry was posted on Thursday, January 19th, 2012 at 7:23 pm, EST under the category of User Interface.