Wednesday, April 15, 2009

#hashtags and @replies....

Recently I was intrigued by some Twitter posts by @garrickvanburen about whether #hashtags and @reply syntax on Twitter are still useful with the advent of Twitter search. The topic of #hashtags has been discussed by Robert Scoble on FriendFeed as well, where Scoble claimed that "hashtags are dead".

I disagree with the sentiment that #hashtags serve no purpose anymore, and I think part of the reason people consider them dead is a lack of understanding of how search algorithms work (or should work). The key to #hashtags being useful is that they represent a significant act by the user who added them to their Tweet. Adding a #hashtag is akin to saying "this post is about #thisTopic", and that is powerful fuel for a good Information Retrieval algorithm that takes advantage of it. I realize that not all #hashtags really describe what the post is about, but the majority do, and this data should be used.

During my Twitter exchange about this topic I made the following point:
A Tweet with a url and a #hashtag is directly translatable to a Delicious entry.
What I meant by this is that Delicious lets you tag your bookmarks into various categories and then search/sort along those dimensions. So if I write a Tweet that essentially shares a url (a very common use case) and then put a #hashtag on it, I have done the exact same thing. If this is powerful for Delicious, it is certainly powerful for Twitter. In fact, I would posit it is more powerful for Twitter, because Tweets are only 140 characters, which is not a lot of data from which a search algorithm can determine similarity and relevance.
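To make the Delicious comparison concrete, here is a minimal sketch of the translation. The function name `tweet_to_bookmark`, the dictionary keys, and the regular expressions are my own assumptions for illustration; this is not how Delicious or Twitter actually store anything, and the url pattern is deliberately naive (it would swallow trailing punctuation).

```python
import re

# Naive patterns, assumed for illustration only.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")


def tweet_to_bookmark(tweet):
    """Map a tweet onto a Delicious-style entry, or None if it shares no url."""
    urls = URL_RE.findall(tweet)
    if not urls:
        return None
    # Strip urls first so a "#fragment" inside a url is not mistaken for a tag.
    text_without_urls = URL_RE.sub(" ", tweet)
    tags = [tag.lower() for tag in HASHTAG_RE.findall(text_without_urls)]
    # Whatever is left after removing urls and hashtags acts as the description.
    description = " ".join(HASHTAG_RE.sub(" ", text_without_urls).split())
    return {"url": urls[0], "tags": tags, "description": description}


entry = tweet_to_bookmark(
    "Nice intro to search ranking http://example.com/ir #search #IR"
)
```

Here `entry` ends up with one url, the tags `search` and `ir`, and the remaining text as a description, which is the same shape of data a tagged bookmark carries.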

So how does a search algorithm take advantage of this data? A very simple way is to "boost" the relevance score for that field. Google does this based on HTML tags (well, I think they do, or at least did at some point), giving more weight to text inside a <title> tag than, say, text inside an <h4> tag. There are more complex ways of utilizing this data as well... but that is another post.
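As a rough sketch of what field boosting could look like for Tweets, the toy scorer below counts a query term that matches a #hashtag for more than one that only matches the plain text. The names `score` and `hashtag_boost` and the weight of 2.0 are assumptions I picked for illustration; a real engine would fold a boost like this into a proper IR score rather than simple term counting.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(query, tweet, hashtag_boost=2.0):
    """Toy relevance score: hashtag matches count more than body matches."""
    query_terms = set(tokenize(query))
    hashtags = {tag.lower() for tag in re.findall(r"#(\w+)", tweet)}
    # Score the body separately so hashtag terms are not counted twice.
    body_terms = set(tokenize(re.sub(r"#\w+", " ", tweet)))
    total = 0.0
    for term in query_terms:
        if term in hashtags:
            total += hashtag_boost  # the author explicitly tagged this topic
        elif term in body_terms:
            total += 1.0  # plain mention in the text
    return total


tagged = score("python", "Learning #python today")
untagged = score("python", "Learning python today")
```

With these weights the tagged Tweet scores higher than the untagged one for the same query, which is the whole point of the boost.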

Okay, so maybe I convinced you that #hashtags are not dead, but you are probably asking about @replies now. Well, the argument is the same, if a bit more convoluted. The @reply tags are also meta-data about the post: they indicate the intended audience for the post. Knowing the intended audience means that you can then use what you know about that user's aggregate activity to "boost" the Tweet up or down. One example of this would be using TunkRank to re-weight a Tweet based on who it was addressed to. This essentially gives extra weight to a Tweet going to someone who is an expert or a power-user.
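A hedged sketch of the @reply idea: assume we already have a per-user multiplier from some influence measure (TunkRank would be one candidate, but the numbers below are made up and nothing here computes TunkRank). The names `audience_multiplier`, `rerank`, and the `weights` table are hypothetical.

```python
import re

MENTION_RE = re.compile(r"@(\w+)")


def audience_multiplier(tweet, weights, default=1.0):
    """Return a multiplier based on who the tweet is addressed to.

    `weights` maps a lowercase username to a multiplier: above 1.0 pushes
    the tweet up, below 1.0 pushes it down. Unknown users get `default`.
    """
    recipients = [name.lower() for name in MENTION_RE.findall(tweet)]
    if not recipients:
        return default  # no explicit audience, leave the score alone
    # Assumption: with several recipients, the strongest one wins.
    return max(weights.get(name, default) for name in recipients)


def rerank(base_score, tweet, weights):
    """Scale a base relevance score by the audience multiplier."""
    return base_score * audience_multiplier(tweet, weights)


# Made-up multipliers for illustration only.
weights = {"expert": 1.5, "spammer": 0.5}
boosted = rerank(2.0, "@expert have you seen this ranking paper?", weights)
```

Taking the maximum across recipients is just one design choice; averaging would be equally defensible.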

@replies and #hashtags are important pieces of meta-data (some of the only pieces of meta-data we get in Tweets); they should and will be considered by advanced search/indexing algorithms in the ranking of Tweet search results. So my plea is: don't let @replies and #hashtags die. We would lose a useful system for collaborative filtering and intelligence, and make data-mining Twitter all the more difficult.

Monday, April 6, 2009

TweeFind: twitter search engines by relevance?

I read this post on Mashable about a new Twitter search engine and I had to comment:

Google's ranking system is a mix of an IR (Information Retrieval) ranking on the similarity of the search term to the content and PageRank. The problem here, as I see it, is that 140 characters means the IR score is not very illuminating, so we get back tons of results with, say, 1 word matching between the search term and the tweet. Once we realize this, we see that the scoring algorithm for "popular" twitter users becomes the driving force behind pushing tweets up or down.

That is all fine, but the fact that the content is so thin means that flipping results up and down based on a user's aggregate behavior TELLS US NOTHING ABOUT THE GOODNESS/BADNESS of the tweet in question. So we get into the situation where a bad tweet by a good author gets promoted, and the results are not very good. My $0.02.


This reasoning is why I think that the best way to evaluate Twitter posts, in a search context, is to constrain to posts that share urls. That being said, the current search at http://search.twitter.com is great because it is not for finding data or "deep searching" a topic. I think it is perfect for queries that are realtime. This past weekend, when one of my favorite bands canceled a show in Minneapolis, I asked Twitter "gaslight anthem medical emergency" and got back fantastic results. What I won't ask Twitter search (without advanced operators to limit to tweets with urls, and even then I am dubious that I would get decent results) is "turbogears simpledb integration", because I am looking for deep information. So I guess to me, searching Twitter is for realtime, "breadth" searches, and searching Google gives me depth. Of course I could search re-searchr and get both :)
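For what it is worth, constraining to posts that share urls is easy to sketch. The snippet below is only an illustration: `tweets_with_urls` is a hypothetical helper, the sample tweets are invented, and the url pattern is a naive assumption, not whatever operator Twitter search actually uses.

```python
import re

# Naive url pattern, assumed for illustration.
URL_RE = re.compile(r"https?://\S+")


def tweets_with_urls(tweets):
    """Keep only the tweets that share at least one url."""
    return [tweet for tweet in tweets if URL_RE.search(tweet)]


sample = [
    "gaslight anthem show canceled tonight, medical emergency",
    "turbogears simpledb integration writeup http://example.com/tg-simpledb",
]
deep_candidates = tweets_with_urls(sample)
```

Only the second sample tweet survives the filter, and it is the linked page, not the 140 characters, that a "depth" search would then have to evaluate.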