by Michael S. Kaplan, published on 2010/07/26 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/07/26/10036264.aspx
This blog does not mean I am taking up the cause of Bengali in Unicode in the same way I did with Tamil in Unicode a decade ago. I lack the passionate/interested contacts to sustain such a thing, and to be honest doubt they are out there (if they are, they are not nearly as vocal on the Internet!). I am simply making a point here....
So two of the things I mentioned at the World Classical Tamil Conference (well, actually at the co-located Tamil Internet 2010) were:
Now it is not as completely simple as, e.g., taking that chart I put up in the bottom half of Learn Tamil in 30 Days (or something like that), which ended up in Wikipedia thanks to Scott's efforts and in Unicode 5.1 thanks to Mark and me, substituting the word Tamil with the word Bengali, and redoing the characters to magically get a new chart.
Because there are differences between the two blocks that keep them from completely matching. This is definitely the case.
And some of the named sequences might be different. This might be the case.
And even if you had this big chart in place and you have all 32 consonants in the block and all 11 dependent vowels in the block in a big grid with nearly 400 cells altogether, there is still a piece missing.
There is the fact that there are thousands of conjunct consonants that combine up to four consonants with no vowels between them (I mean this in the language sense; in the Unicode sense "four consonants with no vowel" would be "four consonants with the first three followed by a Virama, or to be more accurate for Bengali a Hasant") and a chart of nearly 400 that ignores the other thousands is not all that complete a chart.
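To make the two pieces concrete, here is a minimal Python sketch of both the grid and the conjunct encoding. It rests on assumptions that are mine and not anything official: that the 32 consonants are the assigned letters in U+0995..U+09B9, that the 11 dependent vowels are the assigned vowel signs in U+09BE..U+09CC, and that "nearly 400" counts one extra column for the bare consonant. The names `assigned`, `syllable_grid`, and `conjunct` are made up for illustration.

```python
import unicodedata

HASANT = "\u09CD"  # BENGALI SIGN VIRAMA, called Hasant for Bengali


def assigned(first, last):
    """Return the assigned characters in an inclusive code point range."""
    return [
        chr(cp)
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) != "Cn"  # skip unassigned gaps
    ]


# Assumed ranges (not taken from the post); they match its counts of 32 and 11.
CONSONANTS = assigned(0x0995, 0x09B9)   # KA through HA
VOWEL_SIGNS = assigned(0x09BE, 0x09CC)  # VOWEL SIGN AA through VOWEL SIGN AU


def syllable_grid():
    """One row per consonant: the bare consonant, then consonant + each vowel sign."""
    return [[c] + [c + v for v in VOWEL_SIGNS] for c in CONSONANTS]


def conjunct(*consonants):
    """Encode a conjunct: every consonant except the last is followed by a Hasant."""
    return HASANT.join(consonants)


grid = syllable_grid()
cell_count = sum(len(row) for row in grid)

# KA + Hasant + SSA, a common two-consonant conjunct.
kssa = conjunct("\u0995", "\u09B7")

# Four arbitrary consonants, NOT a real conjunct; this only shows the
# "first three followed by a Hasant" code point layout.
four = conjunct(*CONSONANTS[:4])

print(len(CONSONANTS), len(VOWEL_SIGNS), cell_count)
print([f"U+{ord(ch):04X}" for ch in four])
```

The grid part is trivially enumerable, which is the point: the encoding of a conjunct is just as mechanical, but nothing in the code points tells you which of the possible sequences are real conjuncts or what shape a font will give them.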
At which point someone has some explaining to do about how they would expect all of this to be described.
You can look in that Wikipedia article on conjuncts and see lots of rules but they also have lots of exceptions. It is unclear (to me at least) whether there are algorithms that could be used to build the conjuncts but since some appear to have no connection in shape to the original consonants it is probably easier to just have the huge horking table.
Now there are movements to dump many of the conjuncts (e.g. the swachchha font movement) but they have limited traction.
So even though The script can make the language more complicated, the fact is that people are often quite fond of their complications. Not everyone will do stuff like experts chose to do in Want to hear about a cool new typographic convention? Khmer, and I'll tell you about it..., and not everyone wants to move from a primarily Top-to-Bottom/Right-to-Left language to a primarily Left-To-Right/Top-To-Bottom language just to get into computers, as happened for Japanese (in retrospect they made a good choice, as vertical support is still not all that great -- though some would claim that if they had stayed vertical, this effort would have been more vigorously supported).
So getting back to the less "language revolutionaries" thoughts of people wondering what they can do in Unicode.
The original topic, remember?
You (by which I mean They) are not asking the right question.
If your (by which I mean their) language is Hindi or Bengali or Oriya or Telugu or Marathi or Malayalam or Konkani¹ or Assamese or Punjabi or whatever, the question you (by which I mean they) have to ask is whether people are able to widely support your language -- and by widely I mean there are input methods and fonts and search tools and all of the platform pieces you need to easily work with the language -- and if there are blockers, figuring out how to unblock them.
The problems can be technical, they can be conceptual, there can be motivation to change or inertia against it.
But the key is not to say "look what Tamil got, how can I get something?" but to figure out what you (by which I mean they) need that can genuinely help and then to push to get that need taken care of in a way that fits in the framework of how the Internet and everything else gets work done.
Now obviously the fact that the Government of Bangladesh joined Unicode as an Institutional member on June 30th is an example of perhaps just such a movement. If it is, I hope it leads to productive conversations and contributions.
To be honest after I saw the announcement my first thought was that I wish I had time/resources to go up there and talk to some people while I am on the right side of the world for such talks to happen, but I really can't afford to go to all of the places I want to go while I am in India without people sponsoring some parts of the trip (i.e. plane tickets and hotel rooms) involving visits to them. And no one from Bangladesh was asking anyway.
So, we'll see what happens, I suppose.
If you're interested, I'll keep you all posted (and to be honest since I am interested I'll keep you posted anyway!).
1 - The spellcheck results for the word Konkani were quite disappointing. Blog spell check #fail, and I believe the only one of India's constitutional languages to not be in its dictionary: