by Michael S. Kaplan, published on 2010/07/09 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/07/09/10033379.aspx
So now I'm going to talk about something kind of new.
I've kind of mentioned it before, but really just in passing.
Basically I'm going to just talk a little bit about some Indic conjuncts that exist for languages like Hindi and Bengali.
Now I’ve talked about the Bengali NNA before.
Basically in Unicode it’s NA + VIRAMA + NA == NNA, or
ন + ্ + ন == ন্ন
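To make the storage side concrete, here is a small Python sketch that lists the code points actually stored for the NNA conjunct. Python is only my choice for illustration, and nothing here involves Uniscribe. The conjunct is not a character of its own: the shaping engine and the font turn these three code points into the ligature.

```python
import unicodedata

# The Bengali NNA conjunct is stored as three code points;
# the visible ligature is produced at render time by the font and shaper.
NA = "\u09A8"      # BENGALI LETTER NA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
nna = NA + VIRAMA + NA

for ch in nna:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```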
Now you can kind of see a little of the doubled NA in the conjunct itself, and it's all well and good that cursor movement will not put you in the middle of this conjunct.
That work is done by Uniscribe, which I believe sets the rules based on the information in the font and on its own knowledge of the script in the shaping engine.
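One way to picture why the cursor skips over the whole conjunct is to segment text into rough orthographic syllables, where a virama glues the following consonant onto the current cluster. The sketch below is a deliberate simplification under my own assumptions: it identifies viramas by character name and ignores ZWJ/ZWNJ and many script-specific rules. It is not what Uniscribe actually does, and the `is_virama` and `syllables` names are hypothetical.

```python
import unicodedata

def is_virama(ch: str) -> bool:
    # Assumption: viramas are recognized by their Unicode character name.
    return unicodedata.name(ch, "").endswith("SIGN VIRAMA")

def syllables(text: str) -> list[str]:
    """Split text into rough clusters so a conjunct is never split.

    Simplified rules: combining marks (categories Mn/Mc, which include
    the virama and vowel signs) attach to the preceding cluster, and a
    letter that follows a virama joins that cluster.
    """
    clusters = []
    for ch in text:
        joins_previous = bool(clusters) and (
            unicodedata.category(ch).startswith("M")
            or (is_virama(clusters[-1][-1]) and unicodedata.category(ch) == "Lo")
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(syllables("\u0939\u093F\u0928\u094D\u0926\u0940"))  # Hindi "हिन्दी"
```

A cursor that moves by these clusters would step over न्दी as a whole rather than stopping after the virama.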
Thankfully intelligent keyboards know that I can just type NNA and get that conjunct as an option rather than requiring me to add a virama (or should I say HASANT?) myself.
It certainly makes typing more natural!
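Here is a hedged sketch of the idea behind such a phonetic keyboard: the user types consonants back to back and the input method inserts the virama itself. The romanization table, the `transliterate` function, and the rule that "a" ends a cluster are all hypothetical, invented just for this illustration. Real Indic input methods are far more elaborate and offer the conjunct as one candidate among several rather than always producing it.

```python
# Hypothetical one-key-per-consonant romanization (illustration only).
CONSONANTS = {
    "k": "\u0995",  # BENGALI LETTER KA
    "m": "\u09AE",  # BENGALI LETTER MA
    "n": "\u09A8",  # BENGALI LETTER NA
    "r": "\u09B0",  # BENGALI LETTER RA
}
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA

def transliterate(keys: str) -> str:
    """Map Latin keystrokes to Bengali, inserting the virama automatically."""
    out = []
    prev_consonant = False
    for key in keys:
        if key == "a":
            # Inherent vowel: nothing is emitted, but it ends the cluster.
            prev_consonant = False
        elif key in CONSONANTS:
            if prev_consonant:
                out.append(VIRAMA)  # the user never types this
            out.append(CONSONANTS[key])
            prev_consonant = True
        else:
            raise ValueError(f"unmapped key: {key!r}")
    return "".join(out)

print(transliterate("nna"))  # ন্ন (NA + VIRAMA + NA)
```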
There is a lingering size issue I'll talk about another day....
Now Hindi is another language that has a lot of conjuncts in it, like:
DDA: द + ् + द == द्द
(which should obviously not be confused with DEVANAGARI LETTER DDA, U+0921, ड).
JNYA: ज + ् + ञ == ज्ञ
SHRA: श + ् + र == श्र
KSSA: क + ् + ष == क्ष
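Here are the same Hindi examples built in Python, assuming only the standard `unicodedata` module; the `conjunct` helper is just for illustration. The last check makes the point about DDA explicit: the DA + VIRAMA + DA sequence and U+0921 are different text, and Unicode normalization does not map one to the other.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

def conjunct(first: str, second: str) -> str:
    """Build the stored form of a two-consonant conjunct."""
    return first + VIRAMA + second

examples = {
    "DDA": conjunct("\u0926", "\u0926"),   # DA + DA
    "JNYA": conjunct("\u091C", "\u091E"),  # JA + NYA
    "SHRA": conjunct("\u0936", "\u0930"),  # SHA + RA
    "KSSA": conjunct("\u0915", "\u0937"),  # KA + SSA
}

for label, text in examples.items():
    names = ", ".join(unicodedata.name(ch) for ch in text)
    print(f"{label}: {text} = {names}")

# The DA + VIRAMA + DA conjunct is not the letter U+0921 DDA,
# and NFC normalization does not turn one into the other.
print(unicodedata.normalize("NFC", examples["DDA"]) == "\u0921")  # False
```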
Okay, you get the point. I can’t do them all, there are thousands in the Mangal font. And when I say thousands I mean on the high thousands side. If I had the unlocked font in front of me I'd give you an exact count.
So perhaps you can see what I was getting at here: in many cases the conjunct may bear little visual resemblance to the letters it is made up of.
I never thought about it, but to be able to read in Hindi you need to know lots of these conjuncts – that is like having to learn all the Kanji for Japanese!
Now this makes languages like Hindi very complicated (and Sanskrit even more complicated), indeed! Not because they are complicated languages (after all, Hindi and Urdu are roughly the same language, as I pointed out in Giving the people Urdu, we are!). But Urdu, unlike Hindi, doesn't have so many alternate forms of letters that happen to be next to each other.
And Unicode, by requiring viramas and control characters, adds some extra complexity of its own (since implementers took way too long to hide the details from regular users).
But it makes me wonder....
Could one have a simplified version of Hindi that just stuck to the letters and had no conjuncts?
Would it make sense and be readable?
Would it look childish?
Would everyone hate it?
Of course it wouldn't really help much -- this is the "full consonant, explicit virama, full consonant" case I discussed in Why my IUC31 talks were presented on Vista (even though running on a MacBook Pro), which requires extra control characters to get the right result (thus making this "simpler" form more complicated to produce).
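For reference, the Unicode Standard's Devanagari section describes how a ZWJ or ZWNJ placed after the virama requests a half form or an explicit virama instead of the default conjunct. Here is a small sketch of the three sequences for KA + SSA; the `variants` helper is hypothetical, and what you actually see still depends on the font.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER

def variants(first: str, second: str) -> dict[str, str]:
    """Return the three requested renderings of a two-consonant cluster."""
    return {
        # Default: the font shows a conjunct ligature if it has one.
        "conjunct": first + VIRAMA + second,
        # ZWJ after the virama asks for a half form of the first consonant.
        "half form": first + VIRAMA + ZWJ + second,
        # ZWNJ after the virama asks for a full consonant with a visible virama.
        "explicit virama": first + VIRAMA + ZWNJ + second,
    }

KA, SSA = "\u0915", "\u0937"
for label, text in variants(KA, SSA).items():
    codes = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label}: {codes}")
```

The "simpler" full-consonant form is the longest of the three, one code point more than the default conjunct.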
No one ever did describe the reasoning one would use to choose which form one wanted, though for the record I'll note that none of the intelligent input methods for Indic seem to allow it....
Maybe it would be easier to just use Urdu instead. ;-)
John Cowan on 9 Jul 2010 9:38 AM:
I'm looking at this posting in Firefox 3.6.6 and Internet Explorer 7.0.5730.13, both in Windows XP SP3. Alas, the IE version is badly garbled. The Bengali characters are all outlined white boxes, and the Devanagari shows up with the == displaced just to the right of the + signs, with the virama-plus-dotted-circle glyph *between* the two equals signs. Very grody.
On Firefox, which uses its own cross-platform rendering engine, the Bengali is in a slightly different font, and there's no ligature, but at least the NA-with-virama NA glyphs appear correctly, not as boxes. The Devanagari is perfect.
When I get home I'll look at it in Firefox and Chrome on Ubuntu. So far it's Windows +1, Internet Exploder -1.
Paul D. Hunt on 9 Jul 2010 1:30 PM:
Michael,
Of course it is possible to reform the Devanagari script so that no conjuncts are required, but I would argue that this would greatly impede the reader. Think of conjuncts as ligatures: in quality typography, you will often find the fi ligature standing in place of f and i. It is possible to render Hindi, or any of the languages that use the Devanagari script, with half (or halant) forms of the consonants, and in fact the Mangal font does this quite a bit of the time (perhaps the vast majority of the time). In fact, the last time I checked there were not thousands of conjuncts. In the version of the font that came with my Win XP laptop (version 1.20) there are 675 characters in total, of which perhaps ~80% are conjunct characters. However, in my opinion, many of the characters in Mangal are superfluous and will in all likelihood never be accessed. But I digress. If you can get your hands on a copy of Bapurao S. Naik's 'Typography of Devanagari', I would recommend reading the portions on script reform ideas that were proposed and implemented in an attempt to make Devanagari typesetting possible during the hot metal typesetting era. Also, I would discount your comparison of learning conjuncts for Indian scripts to learning Han ideographs, as Indian conjuncts are formed in highly predictable ways once one knows the rules of formation. Just as you perceive Urdu to be a simpler system (I am assuming because you are familiar with it), my perception is that the Brahmi-based scripts are a simpler system, because I am familiar with them. Spend a week learning how conjuncts are constructed and you'll have a much easier time reading them.
Michael S. Kaplan on 9 Jul 2010 4:36 PM:
John, I thought Google employees weren't supposed to be running Windows in their offices anymore? :-)
Maybe that was the problem -- everyone there was using platforms that were nine years old. I assume your Ubuntu is a more recent build....
But seriously, the Bengali situation in XP after SP2 was okay, but before that they had to make massive changes (some from Unicode changes, some from feedback), so I won't be too surprised if things didn't look *too* good there compared to now. Things look very nice on Windows 7/IE8, fwiw.
Michael S. Kaplan on 9 Jul 2010 5:51 PM:
Paul, I agree -- but the full form does exist, so the question of when the user would expect it remains an interesting one, right?
Pavanaja U B on 9 Jul 2010 9:37 PM:
<Now this makes languages like Hindi very complicated (and Sanskrit even more complicated)>
Actually, it is not the language but the script that looks complicated. The idea of keeping just the basic characters, with a "halant" at the end in place of conjuncts, is not very new. It was tried almost a century ago in Kannada, when letterpress printing was in its nascent stages: the very famous Kannada writer B M Shreekantayya tried it, but people did not accept it. The Kannada script is more complex than Devanagari because of vattus. Malayalam newspapers, on the other hand, imposed simplified Malayalam on their readers almost four decades ago and succeeded.