Monday, April 6, 2009

TweeFind: Twitter search engines by relevance?

I read this post on Mashable about a new Twitter search engine and I had to comment:

Google's ranking system is a mix of an IR (Information Retrieval) score, which measures how similar the search term is to the content, and PageRank. The problem here, as I see it, is that 140 characters means the IR score is not very illuminating, so we get back tons of results with, say, one word matching between the search term and the tweet. Once we realize this, we see that the scoring algorithm for "popular" Twitter users becomes the driving force behind pushing tweets up or down.

That is all fine, but because the content is so thin, flipping results up and down based on a user's aggregate behavior TELLS US NOTHING ABOUT THE GOODNESS/BADNESS of the tweet in question. So we get into the situation where a bad tweet by a good author gets promoted, and the results are not very good. My $0.02.
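To make the point concrete, here is a rough sketch of that kind of scoring. It is not Google's or TweeFind's actual algorithm. The term-overlap `ir_score`, the additive `combined_score`, the `author_score` field and the sample tweets are all made up for illustration.

```python
import re


def ir_score(query, text):
    """Fraction of query terms found in the text (a crude stand-in for an IR score)."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    text_terms = set(re.findall(r"\w+", text.lower()))
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def combined_score(query, tweet, author_weight=1.0):
    """Hypothetical ranking: content similarity plus a per-author popularity score."""
    return ir_score(query, tweet["text"]) + author_weight * tweet["author_score"]


# Two invented tweets that each match only one query word ("search").
tweets = [
    {"text": "search party tonight, who is in?", "author_score": 0.9},
    {"text": "comparing search engines for tweets", "author_score": 0.1},
]

query = "twitter search relevance"
ranked = sorted(tweets, key=lambda t: combined_score(query, t), reverse=True)

for tweet in ranked:
    print(round(combined_score(query, tweet), 3), tweet["text"])
```

Both tweets get the same content score because each matches a single word, so the author score alone decides the order, and the off-topic tweet from the popular author ends up on top.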


This reasoning is why I think the best way to evaluate Twitter posts, in a search context, is to constrain the search to posts that share URLs. That being said, the current search at http://search.twitter.com is great because it is not for finding data or "deep searching" a topic. I think it is perfect for realtime queries. This past weekend, when one of my favorite bands canceled a show in Minneapolis, I asked Twitter "gaslight anthem medical emergency" and got back fantastic results. What I won't ask Twitter search is "turbogears simpledb integration", because I am looking for deep information (at least not without advanced operators to limit results to tweets with URLs, and even then I am dubious that I would get decent results). So I guess, to me, searching Twitter is for realtime, "breadth" searches, and searching Google gives me depth. Of course, I could search re-searchr and get both :)
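Constraining to posts that share URLs could look something like the sketch below. The `has_url` and `tweets_with_urls` helpers and the simple `http(s)://` pattern are my own assumptions, not anything Twitter search exposes, and the pattern would miss links written without a scheme.

```python
import re

# Assumption: a link is anything that starts with http:// or https://.
URL_PATTERN = re.compile(r"https?://\S+")


def has_url(text):
    """Return True if the tweet text contains at least one http(s) link."""
    return URL_PATTERN.search(text) is not None


def tweets_with_urls(tweets):
    """Keep only the tweets that share a link."""
    return [text for text in tweets if has_url(text)]


# Invented sample tweets for illustration.
sample = [
    "turbogears and simpledb notes http://example.com/post",
    "anyone tried turbogears with simpledb?",
]

for text in tweets_with_urls(sample):
    print(text)
```

The filter would run before any ranking, so the search engine only scores tweets that point at deeper content.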
