Tuesday, January 22, 2008

Google's transliteration application for Indic languages

Compared with what Microsoft offered through www.bhashaindia.com, Google's transliteration application is much better. Considering that Microsoft's IME runs on the client, it could have done a lot better at detecting Tamil letters (the IME is available for many Indic languages, but Tamil happens to be one of the more difficult ones). For example, to get ன, you have to type the letter n followed by _ (underscore) within a second or so. With a small rule set and/or a dictionary, it could have picked the right letter from the context, as Google does.
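To give a feel for what I mean by a small rule set, here is a rough sketch in Python. It is not how Google or Microsoft actually do it; the function name pick_n and the two rules are my own simplification (word-initial n and n before t/th become ந, everything else becomes ன), and it ignores ண and plenty of exceptions.

```python
# Illustrative heuristic only; the rules below are a simplification
# and pick_n is a hypothetical helper, not part of any real IME.
DENTAL_N = "ந"
ALVEOLAR_N = "ன"


def pick_n(latin_word: str, i: int) -> str:
    """Pick a Tamil letter for the 'n' at index i of a romanized word."""
    if latin_word[i].lower() != "n":
        raise ValueError("character at index i is not an 'n'")
    # Rule 1: ன does not begin a word, so a word-initial n is ந.
    if i == 0:
        return DENTAL_N
    # Rule 2: n directly before t/th is the dental ந (as in 'vanthaan').
    if latin_word[i + 1 : i + 2].lower() == "t":
        return DENTAL_N
    # Default: treat every other n as ன.
    return ALVEOLAR_N
```

With rules like these, the n at the start of 'naan' and the n before th in 'vanthaan' come out as ந, while the final n in both words comes out as ன, with no underscore needed.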

Google does even better. To differentiate between kuril (short vowels) and nedil (long vowels), most IMEs expect users to type upper case. That is a fair expectation, but users are also in the habit of beginning a sentence with an upper-case letter, as in English. During transliteration, this causes a needless error and correction. With Google, typing either 'Oru' or 'oru' results in 'ஒரு', which is a common word. Google also shows a context menu with other possibilities like 'ஓரு' and 'ஒறு'.
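I do not know how Google implements this, but the behaviour can be imitated with a case-insensitive lookup plus a word list. The sketch below is only a guess at the idea: the CHUNKS table, the COMMON_WORDS set and the candidates function are all made up for this post and cover just the 'oru' example.

```python
from itertools import product

# Hypothetical, tiny tables for illustration; a real system would need
# a full romanization table and a proper dictionary.
CHUNKS = {
    "o": ["ஒ", "ஓ"],      # kuril first, nedil second
    "ru": ["ரு", "று"],
}
COMMON_WORDS = {"ஒரு"}


def candidates(typed: str) -> list[str]:
    """Return possible transliterations, known words first."""
    text = typed.lower()  # case is ignored, so 'Oru' == 'oru'
    options = []
    i = 0
    while i < len(text):
        # Prefer the longer romanized chunk when one matches.
        for size in (2, 1):
            chunk = text[i : i + size]
            if chunk in CHUNKS:
                options.append(CHUNKS[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r}")
    words = ["".join(parts) for parts in product(*options)]
    # Stable sort: dictionary words move to the front.
    return sorted(words, key=lambda w: w not in COMMON_WORDS)
```

Calling candidates('Oru') and candidates('oru') gives the same list, with 'ஒரு' on top and the other spellings after it, which is roughly what the context menu offers.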

More to come as I explore.
