Archive for the ‘recommendation engines’ Category

Online systems are hostile to niche markets, even if they expand individual experience

Monday, March 30th, 2009

Whimsley points out that the Internet, and recommendation systems in particular, expands the choices available to each individual but potentially narrows the overall range of things that are likely (emphasis on the word “likely”) to be experienced. The focus is on recommendation algorithms, such as the one on Amazon.com:

You can see that [in the graph above] on the left, in Internet World, a few products were chosen a lot, especially the one centred on about (-0.2, -0.2). In Offline World there are many more medium-sized dots, showing that the consumption of products is more equal. In Internet World one product has “gone viral” and gets chosen over 1500 times out of the total of 3600, while 26 products languish in the obscurity of being sampled fewer than ten times. In Offline World no single product is chosen more than 10% of the time, and only 14 products are sampled fewer than ten times. In short, niche products do better in Offline World than in Internet World.

While each customer on average experiences more unique products in Internet World, the recommender system generates a correlation among the customers. To use a geographical analogy, in Internet World the customers see further, but they are all looking out from the same tall hilltop. In Offline World individual customers are standing on different, lower, hilltops. They may not see as far individually, but more of the ground is visible to someone. In Internet World, a lot of the ground cannot be seen by anyone because they are all standing on the same big hilltop.

The point here is similar to the point Clay Shirky made back in 2003:

A persistent theme among people writing about the social aspects of weblogging is to note (and usually lament) the rise of an A-list, a small set of webloggers who account for a majority of the traffic in the weblog world. This complaint follows a common pattern we’ve seen with MUDs, BBSes, and online communities like Echo and the WELL. A new social system starts, and seems delightfully free of the elitism and cliquishness of the existing systems. Then, as the new system grows, problems of scale set in. Not everyone can participate in every conversation. Not everyone gets to be heard. Some core group seems more connected than the rest of us, and so on.

Prior to recent theoretical work on social networks, the usual explanations invoked individual behaviors: some members of the community had sold out, the spirit of the early days was being diluted by the newcomers, et cetera. We now know that these explanations are wrong, or at least beside the point. What matters is this: Diversity plus freedom of choice creates inequality, and the greater the diversity, the more extreme the inequality.

Shirky is writing about weblogs, but what he says applies to books, CDs, videos – anything where people can influence the choices that other people make. The same dynamic underlies the point that Whimsley is making about recommendation engines. This part applies equally to weblogs or videos:

We also know that as the number of options rise, the curve becomes more extreme. This is a counter-intuitive finding – most of us would expect a rising number of choices to flatten the curve, but in fact, increasing the size of the system increases the gap between the #1 spot and the median spot.

Shirky offered a hypothetical example of how this works:

To see how freedom of choice could create such unequal distributions, consider a hypothetical population of a thousand people, each picking their 10 favorite blogs. One way to model such a system is simply to assume that each person has an equal chance of liking each blog. This distribution would be basically flat – most blogs will have the same number of people listing it as a favorite. A few blogs will be more popular than average and a few less, of course, but that will be statistical noise. The bulk of the blogs will be of average popularity, and the highs and lows will not be too far different from this average. In this model, neither the quality of the writing nor other people’s choices have any effect; there are no shared tastes, no preferred genres, no effects from marketing or recommendations from friends.

But people’s choices do affect one another. If we assume that any blog chosen by one user is more likely, by even a fractional amount, to be chosen by another user, the system changes dramatically. Alice, the first user, chooses her blogs unaffected by anyone else, but Bob has a slightly higher chance of liking Alice’s blogs than the others. When Bob is done, any blog that both he and Alice like has a higher chance of being picked by Carmen, and so on, with a small number of blogs becoming increasingly likely to be chosen in the future because they were chosen in the past.

Think of this positive feedback as a preference premium. The system assumes that later users come into an environment shaped by earlier users; the thousand-and-first user will not be selecting blogs at random, but will rather be affected, even if unconsciously, by the preference premiums built up in the system previously.
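
Shirky's thought experiment is easy to try out. What follows is only a rough sketch of it, not anything Shirky published: the number of blogs (200), the size of the premium (0.5), and the names simulate_picks and premium are all assumptions made up for illustration. Each user picks 10 distinct blogs, and a blog's chance of being picked grows with the number of earlier users who picked it.

```python
import random

def simulate_picks(num_users=1000, num_blogs=200, picks_per_user=10,
                   premium=0.0, seed=0):
    """Return, for each blog, how many users listed it as a favorite.

    premium is a hypothetical knob: every earlier pick of a blog adds
    this much to the blog's weight. premium=0 is the flat model in which
    nobody's choice affects anybody else's.
    """
    rng = random.Random(seed)
    counts = [0] * num_blogs
    blogs = range(num_blogs)
    for _ in range(num_users):
        # Weights reflect only the choices of earlier users.
        weights = [1.0 + premium * c for c in counts]
        chosen = set()
        while len(chosen) < picks_per_user:
            # Draw one blog at a time; repeats are ignored by the set.
            chosen.add(rng.choices(blogs, weights=weights)[0])
        for blog in chosen:
            counts[blog] += 1
    return counts

if __name__ == "__main__":
    flat = simulate_picks(premium=0.0)
    skewed = simulate_picks(premium=0.5)
    print("flat model:    most popular blog picked", max(flat), "times")
    print("premium model: most popular blog picked", max(skewed), "times")
```

Both runs hand out the same 10,000 picks. The only difference is whether earlier choices feed back into later ones, and that alone is enough to pull the top blog well away from the average.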

Netflix, recommendation engines, and the problem of Napoleon Dynamite

Thursday, November 27th, 2008

Kottke has some thoughts about the trouble programmers have coming up with an algorithm that can decide whether or not to recommend the movie Napoleon Dynamite:

The thing that all those kinds of movies have in common is that if you’re outside of the intended audience for a particular movie, you probably won’t get it. That means that if you hear about a movie that’s highly recommended within a certain group and you’re not in that group, you’re likely to hate it. In some ways, these are movies intended for a narrow audience, were highly regarded within that audience, tried to cross over into wider appeal, and really didn’t make it.

This problem has more than one dimension. There is no way to come up with a simple 1 to 5 rating system that works for everyone. Neither voting by the users nor any analysis of popularity with the majority will be able to solve this problem. Instead, a separate category should exist for highly contentious movies such as this one. I suspect that, just as Pandora has done for music, there should be a way to deduce what a movie viewer will like if you already know all of the movies that the viewer likes. But without a full inventory of every movie that the viewer likes and dislikes, there will be no way to guess whether they will like movies as odd as Napoleon Dynamite.
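
To make the point about a single 1 to 5 score concrete, here is a small sketch. The ratings are invented for illustration and are not Netflix data, and summarize is just a hypothetical helper name. Two movies can share the same average rating while one is mildly liked by everyone and the other splits its audience down the middle.

```python
from statistics import mean, pstdev

def summarize(ratings):
    """Return (average, spread) for a list of 1-to-5 star ratings."""
    return mean(ratings), pstdev(ratings)

# Invented ratings: everyone finds the first movie "okay",
# while the second is loved by half the viewers and hated by the rest.
consensus = [3, 3, 3, 3, 3, 3]
polarizing = [1, 1, 1, 5, 5, 5]

if __name__ == "__main__":
    for name, ratings in (("consensus", consensus), ("polarizing", polarizing)):
        average, spread = summarize(ratings)
        print(name, "average:", average, "spread:", spread)
```

The average alone cannot tell these two movies apart; the spread can at least flag which one is contentious, though it still says nothing about which group a given viewer belongs to.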

(I personally love Napoleon Dynamite.)