Technorati's Blog Finder

Technorati has come up with another good service called Blog Finder that lets you search for blogs by category or subject, as tagged by the blog owners. IMHO the tagging should have been a moderated affair; the current system will have all the drawbacks of tagging. This could do well for blogs that do not have categories, like the Blogger blogs. Until now Technorati (and I know this from using their BlogPostTag API) regarded the categories in blog systems such as WordPress as default tags. Now I am not influenced by the bashing Technorati has been getting lately, but frankly, I would still prefer Google to search for blogs on a subject. [Source]

Hindi in that Cloud

In my previous post I discussed how Tagcloud was unable to generate a tag cloud for Hindi blogdom, since the Yahoo Term Extraction API doesn't recognize non-English characters yet. I then decided to implement it myself. This is how I do it:

  1. Parse the Hindi blog group RSS feed
  2. Get the words, ignoring the very commonly used ones
  3. Insert the data (word and frequency of occurrence) into the database
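
For the curious, here is roughly what those three steps look like as a minimal Python sketch. This is not my actual code: the stop-word list, the `word_freq` table, the sample feed and the function names are all made up for illustration, and it assumes a plain RSS 2.0 feed with `item`, `title` and `description` elements.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Runs of Devanagari characters, leaving out the danda marks (U+0964, U+0965),
# which are punctuation rather than part of a word.
WORD_RE = re.compile(r"[\u0900-\u0963\u0966-\u097F]+")

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"है", "और", "का", "की", "के", "में", "से", "को", "यह"}

# Made-up sample feed standing in for the Hindi blog group RSS feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>हिंदी चिट्ठा</title><description>यह चिट्ठा हिंदी में है।</description></item>
<item><title>नया चिट्ठा</title><description>चिट्ठा और टैग</description></item>
</channel></rss>"""


def count_words(rss_xml):
    """Steps 1 and 2: parse the feed and count words, skipping stop words."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for item in root.iter("item"):
        for tag in ("title", "description"):
            text = item.findtext(tag) or ""
            for word in WORD_RE.findall(text):
                if word not in STOP_WORDS:
                    counts[word] += 1
    return counts


def store_counts(conn, counts):
    """Step 3: write each word and its frequency to the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_freq (word TEXT PRIMARY KEY, freq INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_freq (word, freq) VALUES (?, ?)",
        counts.items(),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
    store_counts(conn, count_words(SAMPLE_RSS))
    for row in conn.execute("SELECT word, freq FROM word_freq ORDER BY freq DESC"):
        print(row)
```

Matching on the Devanagari Unicode block is the simplest thing that works for Hindi; English words in the same feed are simply skipped by this pattern.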

The data from the database is available as XML and as this JSP (see the frame below for a glimpse). If you want to see the page in action, look for the “Kya bolte hum?” section at Chittha Vishwa.
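
To give an idea of the XML side, here is a small Python sketch that turns word counts into XML. The element and attribute names (`words`, `word`, `freq`) are my assumptions for this example and not necessarily the format of the real feed.

```python
import xml.etree.ElementTree as ET


def counts_to_xml(counts):
    """Serialize a {word: frequency} mapping, most frequent word first."""
    root = ET.Element("words")  # hypothetical root element name
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for word, freq in ordered:
        element = ET.SubElement(root, "word", freq=str(freq))
        element.text = word
    # encoding="unicode" returns a str and keeps the Hindi text readable
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(counts_to_xml({"टैग": 1, "चिट्ठा": 4}))
```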

Sad part is, the solution only considers single words and is not intelligent enough to pick out phrases. Perhaps this is why I was happy to notice that Technorati now provides a Blog Post Tags service, where a query returns the most frequently used tags in a blog. However, for some strange reason the query never works for me: I tried it for this blog as well as several other Hindi blogs, but the XML returned is always empty. At first I thought it only works for WordPress blogs or blogs that use Technorati Tags; however, for some blogs like this one it does work. A missive to Technorati did not fetch any reply; their blog post OTOH indicates that the service is “available only on request”. I hope they read this post and tell me what is happening. If their solution works, my tag cloud could perhaps be generated more efficiently.

TagCloud, if it could work with Hindi

Tagcloud seems interesting; it tells you the crux of the conversations in blogdom, pretty much like Technorati tags. There are 80+ Hindi blogs now, and I thought, why not sport one such cloud for these blogs at Chittha Vishwa? Alas, the effort failed. The blame falls on the Yahoo Term Extraction API, which as of now only recognizes English words. I tested here the speed with which terms are extracted from even large amounts of text. Try entering some Hindi text and, as you will see, it is unable to recognize the terms. The thing I like is the simplicity of REST. BTW, do have a look at the Tagcloud from my Hindi blog as well as this blog (at least it will get the English words from my Hindi blog) on the left sidebar.

While I intended to do it purely for fun, as did Desipundit, people are sceptical about the usefulness of these tags. While Tagcloud ranks these tags to prepare the cloud, Simon used it to extract terms for “automated tagging”, though the results are not guaranteed to be relevant.
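
On the ranking bit: I don't know how Tagcloud does it internally, but one common way to turn tag counts into a cloud is to scale font sizes linearly between a minimum and a maximum. A minimal Python sketch, where the function name and the pixel limits are just assumptions for illustration:

```python
def cloud_sizes(counts, min_px=12, max_px=32):
    """Map each tag's frequency to a font size between min_px and max_px."""
    if not counts:
        return {}
    lowest = min(counts.values())
    highest = max(counts.values())
    span = highest - lowest
    sizes = {}
    for tag, freq in counts.items():
        if span == 0:
            # Every tag is equally frequent, so use the smallest size for all.
            sizes[tag] = min_px
        else:
            scaled = min_px + (freq - lowest) * (max_px - min_px) / span
            sizes[tag] = round(scaled)
    return sizes


if __name__ == "__main__":
    print(cloud_sizes({"blog": 1, "hindi": 3, "tags": 5}))
```

The rarest tag gets the smallest font, the most frequent one gets the largest, and everything else falls in between.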