Hindi in that Cloud
In my previous post I discussed how Tagcloud was unable to generate a tag cloud for the Hindi blogdom, since the Yahoo term extraction API doesn't recognize non-English characters yet. I then decided to implement it myself. This is how I do it:
- Parse the Hindi blog group RSS Feed
- Extract the words, ignoring the most commonly used ones
- Insert the data (word and frequency of occurrence) into a database
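
For the curious, the three steps above could be sketched roughly as follows in Python. This is only an illustration, not my actual code: the stop-word list, the `word_freq` table name and the function names are all made up for the example, and it assumes a plain RSS 2.0 feed whose `item` elements carry `title` and `description` text.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative stop-word list (an assumption; a real one would be much longer).
STOP_WORDS = {"है", "और", "का", "की", "के", "में", "से", "को", "यह"}

# Runs of Devanagari characters, leaving out the danda marks and the digits.
WORD_RE = re.compile(r"[\u0900-\u0963\u0971-\u097F]+")


def extract_texts(rss_xml):
    """Step 1: parse the feed and collect title/description text per item."""
    root = ET.fromstring(rss_xml)
    texts = []
    for item in root.iter("item"):
        for tag in ("title", "description"):
            node = item.find(tag)
            if node is not None and node.text:
                texts.append(node.text)
    return texts


def count_words(texts, stop_words=STOP_WORDS):
    """Step 2: count Hindi words, skipping the very common ones."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in WORD_RE.findall(text) if w not in stop_words)
    return counts


def store_counts(conn, counts):
    """Step 3: save each word with its frequency (hypothetical table name)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_freq "
        "(word TEXT PRIMARY KEY, frequency INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_freq (word, frequency) VALUES (?, ?)",
        list(counts.items()),
    )
    conn.commit()
```

Fetching the feed itself (for instance with `urllib.request`) is left out; calling `store_counts(sqlite3.connect("tagcloud.db"), count_words(extract_texts(feed_xml)))` would then fill the table that the cloud is drawn from. Note that it splits on single words only, which is exactly the limitation described next.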
The sad part is that the solution only considers single words and is not intelligent enough to pick out phrases. Perhaps this is why I was happy to notice that Technorati now provides a Blog Post Tags service, where a query returns the most frequently used tags in a blog. However, for some strange reason the query never works out for me. I tried it for this blog as well as several other Hindi blogs, but the XML returned is always empty. At first I thought it only works for WordPress blogs or blogs that use Technorati Tags; however, for some blogs like this one it works. A missive to Technorati did not fetch any reply; their blog post, on the other hand, indicates that the service is “available only on request”. I hope they read this post and tell me what is happening. If their solution works, my tag cloud could perhaps be generated more efficiently.