Mashable recently posted about an upcoming feature addition to Digg: a recommendation engine. After reading the post and watching Kevin's screencast on the new feature, I think it looks interesting. Improving the signal-to-noise ratio on Digg is definitely a good idea, and a recommendation engine is a step in the right direction. However, the system as Kevin describes it has a big hole: it's worthless to people with no digging history or friends.
This is a typical problem in recommendation systems, usually called the cold-start problem: how do you make the system useful to users for whom you have no data? If you look at usage patterns on user-generated content (UGC) websites, this isn't a small problem. On web sites I've worked on, fewer than 5% of users submit content and fewer than 10% vote on it. I have no idea what Digg's numbers are, but I'm sure they're not too far off. If so, the new recommendation system is completely useless for around 90% of the people who use Digg.
The news isn't all bad though. It's a short step from there to a recommendation system based on item similarity. That way a user with no history on Digg could come to a news item and click a link that says "find me more on Digg like this." The methods aren't much different from the user similarity measures. In fact, the users who vote to build their own recommendations are also giving the system the data it needs to calculate these item recommendations.
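To make the item similarity idea concrete, here is a minimal sketch. It is not how Digg does it (I don't know their internals); it assumes votes are simple binary diggs and uses cosine similarity between the sets of users who dugg each story. The vote data and the names invert, cosine, and more_like_this are all made up for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical vote data: user -> set of story ids that user dugg.
votes = {
    "alice": {"s1", "s2", "s3"},
    "bob": {"s1", "s2"},
    "carol": {"s2", "s3"},
    "dave": {"s4"},
}


def invert(votes):
    """Turn user -> stories into story -> users who dugg it."""
    voters = defaultdict(set)
    for user, stories in votes.items():
        for story in stories:
            voters[story].add(user)
    return voters


def cosine(a, b):
    """Cosine similarity between two voter sets (binary votes)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def more_like_this(story, votes, top_n=3):
    """Rank other stories by how much their voters overlap with this one's."""
    voters = invert(votes)
    target = voters.get(story, set())
    scored = [
        (cosine(target, users), other)
        for other, users in voters.items()
        if other != story
    ]
    # Drop stories with no shared voters, then sort by score (ties by id).
    scored = [(score, other) for score, other in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, round(score, 3)) for score, other in scored[:top_n]]


print(more_like_this("s1", votes))  # [('s2', 0.816), ('s3', 0.5)]
```

Note that the person asking for "more like this" never appears in the calculation. Only the story they are looking at matters, which is why this works for a visitor with no history at all.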
Another method is to collect passive data, such as a user clicking on a news link on Digg. The user isn't actively voting the article up or down, but the click is an implicit indication of interest in the article. You can then use this data to make a guess about what the user may like. This method still requires a little data, but it doesn't require the user to be an "active digger."
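A rough sketch of how passive data could feed a guess is below. Everything in it is an assumption on my part: the event log, the idea of weighting a click at 1 and a digg at 3, and the function names. It simply credits stories that other people showed interest in, scaled by how much their interests overlap with the stories this user clicked.

```python
from collections import defaultdict

# Assumed weights: an explicit digg counts for more than a passive click.
WEIGHTS = {"click": 1.0, "digg": 3.0}

# Hypothetical event log: (user, story id, kind of interaction).
events = [
    ("visitor", "s1", "click"),
    ("alice", "s1", "digg"),
    ("alice", "s2", "digg"),
    ("bob", "s1", "click"),
    ("bob", "s3", "click"),
]


def interest_scores(events):
    """Sum the weighted interactions into user -> story -> interest."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, story, kind in events:
        scores[user][story] += WEIGHTS[kind]
    return scores


def guess_for(user, events, top_n=3):
    """Guess stories a user may like from clicks alone."""
    scores = interest_scores(events)
    seen = scores.get(user, {})
    candidates = defaultdict(float)
    for other, theirs in scores.items():
        if other == user:
            continue
        # Overlap: shared interest in stories both users touched.
        overlap = sum(min(seen[s], theirs[s]) for s in seen if s in theirs)
        if overlap == 0:
            continue
        # Credit the other user's stories that this user has not seen yet.
        for story, weight in theirs.items():
            if story not in seen:
                candidates[story] += overlap * weight
    ranked = sorted(candidates.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


print(guess_for("visitor", events))  # [('s2', 3.0), ('s3', 1.0)]
```

A single click is enough to get a result here, while a user with no events at all still gets an empty list, which matches the "little bit of data" caveat.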
In any case, I think it will be interesting to see how they continue to develop the system. The information filtering space is one that will only become more important as the exponential growth of content continues.



