September 25, 2003
A frequent question on forums is how to type in Indian languages on a computer. The standard answers are tools like I-Leap, Kiran font, Akruti etc. But I would recommend something better, which will be very helpful in the long run.
I had searched for how to type in Marathi, and came across a useful resource. It turns out that Windows 2000 and Windows XP both have built-in support for the "Indic locale", which means you can do all your Devnagari typing without any extra tools. But before that, some info about "encoding" ...
When you type those documents using any of the tools mentioned above, how exactly is the document stored on your computer? Plain ASCII (English) text is stored as one number per character. ASCII itself defines 7-bit codes (0 to 127), and each code is normally stored in one 8-bit byte. This representation, or encoding, is defined in the ASCII standard (American Standard Code for Information Interchange).
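As a small illustration of the ASCII encoding described above, here is a minimal Python sketch. The sample string "Ka" is just an example chosen for this note, not something from any of the tools mentioned.

```python
# Each ASCII character is stored as one small number (0 to 127).
text = "Ka"

# ord() gives the numeric code of a character.
codes = [ord(ch) for ch in text]
print(codes)  # [75, 97]

# Encoding to ASCII produces exactly one byte per character.
encoded = text.encode("ascii")
print(len(encoded))  # 2

# Every ASCII code fits in 7 bits.
fits_in_7_bits = all(code < 128 for code in codes)
print(fits_in_7_bits)  # True
```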
But when you type a Marathi document using the tools above, they use their own scheme to store the characters that you type, since ASCII offers no pre-defined way of saying that a particular combination of bits means the letter "Ka", for example.
This is where Unicode comes in ... it is a global standard that assigns a unique number (a "code point") to every character of every script. In the UTF-16 form that Windows uses, most characters, including all of Devnagari, take 2 bytes (16 bits, instead of the 8 used for ASCII text). It is designed in such a manner that the ASCII character set is automatically a subset of Unicode. Why is Unicode important? It defines a globally accepted encoding for all languages - thus the letters in a Marathi document are encoded as different zeros and ones from the letters in an English document. And the particular values are standard, anywhere in the world. This allows convenient interchange of information across the globe.
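To make this concrete, here is a minimal Python sketch that looks at the letter "Ka" as Unicode sees it. The choice of UTF-16 (little-endian) and UTF-8 as the encodings to print is an assumption made for illustration; both are standard Unicode encoding forms.

```python
import unicodedata

# The Devanagari letter "Ka" has the Unicode code point U+0915.
ka = "\u0915"
print(unicodedata.name(ka))  # DEVANAGARI LETTER KA
print(hex(ord(ka)))          # 0x915

# The same code point stored in two standard Unicode encoding forms.
utf16 = ka.encode("utf-16-le")  # 2 bytes
utf8 = ka.encode("utf-8")       # 3 bytes
print(utf16.hex())  # 1509
print(utf8.hex())   # e0a495

# ASCII is a subset of Unicode: an English letter keeps its ASCII value.
ascii_is_subset = "A".encode("utf-8") == "A".encode("ascii")
print(ascii_is_subset)  # True
```

Whichever form is used, the bytes are the same on every machine, which is what makes interchange possible.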
This standard way of storing data is technically called "encoding". The configuration of a system that allows editing in the context of a particular language is called "locale". This locale defines things such as character symbols, date format, currency format and a host of other things. The locale for Indian languages is called the "Indic locale" in case of Windows.
Windows has supported Unicode for a long time, and you can easily type Devnagari documents in Windows applications! The Indian govt also recently announced an official "Indianized" version of GNU/Linux, which is being developed by a division of CDAC (former NCST). In GNU/Linux, creating software that can work in any language is called "Internationalization" (I18N) and adapting it to a local language is called "Localization" (L10N). The localization effort has been going on for quite some time in India already, but it was hampered so far by lack of standards, because people refused to work with Unicode for reasons not really relevant here.
What does all this mean? It means that global standards and technologies are in place to allow easy interchange of information in all languages, not just English or Marathi! Now it's up to the users - when you create multilingual documents, insist on using Unicode encoding. That way, others will be able to use your documents, no matter what tools they use, or even what operating system they use!
I'll describe how to do this in terms of Windows. Those interested in GNU/Linux can contact the local LUGs ... there are a lot of efforts and projects in GNU/Linux that need help, but that will be out of context for this mail.
First of all, you have to enable the Indic locale on your Win2k system. For that you'll need the installation CD, and you'll have to log in as Administrator or a super-user. All you have to do is go to Regional Options, and enable the "Indic" language setting in the "General" tab. Also, in the "Input Locales" tab, enable whichever language you want to use for typing. The default keyboards are not very intuitive, but there's a solution for that. Once you have made the above changes and restarted the system (if asked to), your system is ready to recognise and work with Unicode documents in your selected languages.
Now two things remain - fonts, and key mappings. You will need Unicode fonts to take advantage of this standardization. One font that most people find acceptable is Arial Unicode MS. Remember, fonts are only collections of glyphs that give symbols their visual appearance. The appearance on the screen of the letter "Ka" is determined by what font you use. No matter what font you use for display, the letter "Ka" will be stored on your disk using exactly the same sequence of ones and zeros as defined by Unicode.
Last but not least - the keyboard. The default Marathi keyboard provided by Windows is rather unintuitive. I have no idea what it's based on, and I would rather have a "phonetic" keyboard - where you get "Ka" when you press the "K" key and so on ... On the net I found one great utility that provides this with full Unicode support - an editor called "Baraha". It can either be used as a standalone editor, or let you type in any application like Word, Notepad, etc. An interesting point is that since the Indic locale is built into Win2k, even Notepad can be used to type Marathi documents!
You can find Baraha at the following site... it's roughly a 5MB download but well worth the patience!
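To show what a "phonetic" key mapping means, here is a rough Python sketch. It is a toy written for this note, not Baraha's actual scheme: the key table, the function name phonetic_to_devanagari and the rule of using capital letters for long vowels are all assumptions, and independent vowels at the start of a word are not handled.

```python
# Hypothetical key table: Latin key -> Devanagari consonant.
CONSONANTS = {
    "k": "\u0915",  # Ka
    "g": "\u0917",  # Ga
    "t": "\u0924",  # Ta
    "n": "\u0928",  # Na
    "m": "\u092E",  # Ma
    "r": "\u0930",  # Ra
    "l": "\u0932",  # La
}

# Vowel signs (matras) that attach to the preceding consonant.
# "a" is the inherent vowel, so it adds nothing.
MATRAS = {"a": "", "A": "\u093E", "i": "\u093F", "I": "\u0940", "u": "\u0941"}

VIRAMA = "\u094D"  # joins two consonants into a conjunct


def phonetic_to_devanagari(keys):
    """Convert a string of typed keys to Devanagari using the toy tables."""
    out = []
    pending = False  # True while the last consonant has no vowel yet
    for key in keys:
        if key in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant directly after consonant
            out.append(CONSONANTS[key])
            pending = True
        elif key in MATRAS and pending:
            out.append(MATRAS[key])
            pending = False
        else:
            out.append(key)  # unknown keys pass through unchanged
            pending = False
    return "".join(out)


print(phonetic_to_devanagari("kamal"))  # कमल
print(phonetic_to_devanagari("kAm"))    # काम
```

Whatever the key layout, the output is ordinary Unicode text, so it can be pasted into any Unicode-aware application.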
One VERY important point that I must stress is that it's the Unicode support that is important. The choice of fonts and typing utilities is incidental. Whatever tools you choose, always insist that they have Unicode support, and make sure that your documents are saved in standard Unicode format only. That way, you'll be able to truly achieve cross-application, cross-platform interchange of information in Indian languages.
For more rather technical information (that even I haven't bothered to read), go to the Unicode homepage.
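As a quick check of what "saved in standard Unicode format" means in practice, here is a minimal Python sketch that writes a Marathi word to a file as UTF-8 and reads it back. The file name sample.txt and the use of a temporary directory are assumptions made for illustration.

```python
import os
import tempfile

# The word "Marathi" written in Devanagari (5 Unicode code points).
text = "\u092E\u0930\u093E\u0920\u0940"

# Save it with an explicit Unicode encoding (UTF-8 here).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Any tool that reads the file as UTF-8 gets the same text back.
with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()
print(round_trip == text)  # True

# On disk, each of these letters takes 3 bytes in UTF-8.
with open(path, "rb") as f:
    raw = f.read()
print(len(text), len(raw))  # 5 15
```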