Actionable bugs are the only ones that get fixed, in the long run

by Michael S. Kaplan, published on 2011/04/14 07:03 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/04/14/10153912.aspx

This time he is still going on about Tamil grammar and its scientific nature, and how UC (the Unicode Consortium) is destroying it.

Tamil grammar in one of its rules categorically states that if elongated vowel is
required add the appropriate vowel/s.

ie for example if kiii is required in Tamil k+ii+i > ௧ீி

when it comes to using dependent vowels (name is questionable) it should not be
restricted by dotted line.

For example a Tamil name, Eurotamilini is in use (similar to Indiraani,
lankeshwaran, Eelavar) and it can be written as it should be, as Grammar permits
it.

Denying this facility or suggesting complicated ways to represent it are wrong and
it should be reversed.

(Not I also noticed in Viki Hindi page, Hindi also using longer than long vowel i.)
So it looks like forcing dotted circle is wrong, wrong according to Grammar and
wrong compared to, for example English etc..

Can we deprecate the dotted circle?
Auai is a name. why attempting to block it.
varuuum is a typical Tamil word, why attempt to block it.

Sinnathurai

Peter Constable, colleague and brave man, jumped into the breach with a dose of reasonable inquiry. He knows what is going on, but he also knows that whether Srivas is right or wrong, Srivas never tends to raise issues in an actionable way. And so if there is something genuine, the answer must be teased out:

You don’t say what products are blocking k+ii+i.

But how would you expect k+ii+i to display? The two vowels overlap and become
somewhat illegible:

Re “Auai” and “varuuum”, what are the Unicode character sequences you suggest, and
how do you think they should be rendered?

Peter

These are not idle questions Peter raises here. One can praise the rendering of a generative model for shoving random diacritics onto Latin all one wants, but if "What do you get when you combine a base character with a buttload of diacritics?" proves anything, it proves that text based on randomly stringing stuff together is certain to look worse than text based on known expectations that engineers plan for.

This is kind of a strength that the contrasting Indic model provides -- it handles what it knows how to do, and it assumes the rest is invalid. If all your text is valid and the typography folks know the full set of valid combinations then you never have problems.

Now the contention that Srivas has here is that there is a missing case, which is (if true) useful information.

But he provides no images, no citations. If you want to see how bad text looks if you just shove random crap into the frame, see that buttload of diacritics blog. The information Peter was asking for is not optional.
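For what it is worth, the character sequence half of Peter's question is cheap to answer. Here is a minimal sketch in Python, assuming the word in question is encoded as the six code points listed in the comment (that is my reading of the text, not something Srivas confirmed). `describe` is a hypothetical helper name; `unicodedata` is the standard library module.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME (category)' line per code point in text."""
    lines = []
    for ch in text:
        # Fall back to a placeholder for code points with no assigned name.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append(f"U+{ord(ch):04X} {name} ({unicodedata.category(ch)})")
    return lines

# Assumed encoding of the word from the report:
# VA, RA, VOWEL SIGN UU, VOWEL SIGN U, MA, VIRAMA
word = "\u0BB5\u0BB0\u0BC2\u0BC1\u0BAE\u0BCD"

for line in describe(word):
    print(line)
```

Paste that list next to a screenshot of the rendering you get and a picture of the rendering you expect, and the person on the other end has something to act on.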

Hi Peter,

Some details before answering to your question.

When printing technology was introduced in India (by western missionaries, rather
like Unicode now), some changes were made to the way the extra long vowels were
represented.

the extra long vowels were/are being written normally in the form
consonant+long_dependent vowel + short-independent vowel

When pronouncing many make the mistake of splitting/breaking the extra long vowel.
(I have proof even within educational CDs/DVDs on this.)

Actually these are long vowels are supposed to be elongated in time not broken as
separate sounds.

There are occasions in which breaking as components also required. (This I found as
example in representing foreign names. for example,

The name: Kori Akiinoo is normally pronounced Kori AkIInoo, where the IIno supposed
to start a new word, but because it is a name, is written without space.

Anyway, in normal Tamil it is extra long vowel, bout components of different sounds.

We are using the correct form in special circumstances and continuing with the
incorrect method for normal use.

So to start, I have ways of correctly representing the correct requirements using
font technology. In basic form the glyphs do not need to overlap. these are only
rendering manipulations.

However the dotted circles create problems as not all combinations require complex
rendering, especially the extra long representation.

FYI:
The Tamil grammar rule that explains how to write the extra long mathrai/time (not
mathra) is as follows,

"If elongation is required, write by stacking/adding the appropriate vowel mathrai".

The product:
I;ve just tried the following in MS Word 2010 and it shows a dotted circle before
the third mathrai. I pates it below in yahoo mail and it also shows dotted circle.To
be sure, please use a font that actually has the dotted circly glyph.

வரூும்

Regards
Sinnathurai

Now notice that he did not respond with any of the information that is needed here. He might at least have noticed that himself (he was providing information "before answering" -- he was not answering).

At this point Shriramana Sharma, a colleague of mine from INFITT's WG02 replied:

Srivas, there is neither need nor justification to deprecate the
dotted circle. If implementations display a dotted circle where they
should not, it is a problem with those implementations and not with
the encoding. Peter has already informed me that the major software
producer he is part of has already taken efforts to ensure that all
conceivably meaningful sequences are meaningfully displayed. If there
is any lacuna in this aspect, please provide appropriate screenshots
and I'm sure Peter will be glad to look into it.

This highlights another issue, one perhaps only implied to people who know the situation: whose "fault" the problem is.

You see, Unicode isn't the one defining the use of the dotted circle for every errant case. Srivas's habit of blaming Unicode for being unscientific is something that everyone grows tired of, so I'm sure someone else would have lost patience soon if Shriramana Sharma had not. But perhaps they would not have been as eloquent, so let's call this one a win.

It is an implementation-specific problem, and if the right info is provided then the issue can be addressed.

Sharma,

The need is to write Tamil as in Grammar.

The grammar states, to obtain elongated matrai, not short, not long, but extra long,
stack as required.

The combining vowel is for indicating it is combining/elongating matrai.

The independent vowel indicates it does not kind of combine, but stands alone.

The users hence, wrongly assume it as standalone.

Further, I'm not sure if UC misunderstood the definition of Matrai.
or
Sanskrit mislearnt about the matrai.

Matrai does not directly mean combining vowels, but means timing. (both for
consonant and vowels.)

As far as Tamil is concerned, UC has a mis understanding about Mathrai and trying to
apply this may be the route cause of the error in decisions.

I know you are keen on Sanskrit.
Could you clarify the definition for timing/matrai and combining vowels in Sanskrit.
I can then querry UC, the definitions of the same.

The Tamil Grammar not only is correct, but also accurate in having the timing
definitions.
I want to know wether the problem created by UC or SK.

Sinnathurai

Well Srivas,

In Contemporary Tamil (you know the type I learned in School ) requires
that for Extra-long vowels, the corresponding short independent vowel sign
be added for each Mora.

கூஉ
கோஒ
காஅ
மீஇ

kūu
kōo
kāa
mīi

This is how things work in the mainstream Tamil. I was taught only this way
in School, and thats how Tamil books publish are being published around the
globe.

Please don't push your fringe interpretations of Tamil Grammar into Unicode
and expect the consortium to change the rules for your own whims and
fancies.

Tamil Grammar does not mandate the use of Dependant vowel signs for
extra-long vowels. Period.

Before attempting to educate UC about Tamil Grammar, I suppose you must read
them properly in first hand.

V

Now what is being underscored is the underlying truth about our man Srivas -- that he is inaccurate, and perhaps in some cases untruthful, much more often than not. The principle underlying the earlier responses was the assumption that there might be a valid issue that Srivas would help bring to light.

Even though everyone knew that if something actionable came out of the issue it would not be from Srivas, it would be from someone else who had the graphic, and the citations.

INFITT WG02 has made such changes in the past -- for example, proving the need to support the misnamed Aytham as a standalone character rather than as a combining one, since it is used in loan words as a de facto TAMIL LETTER FA. That was another case of a bogus dotted circle, though that one was a side effect of Unicode character properties and how Microsoft used them. Thus when the property value changed and Microsoft picked the change up, the dotted circle went away.
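If you want to see what property value your own platform reports for a character, here is a small sketch in Python. It assumes the property in question is the General Category, and that a renderer keying off it would treat anything in the mark categories (Mn, Mc, Me) as needing a base character. That is an illustration of the idea, not a description of what any Microsoft component actually does. `is_combining_mark` is a hypothetical helper name.

```python
import unicodedata

def is_combining_mark(ch):
    """True if the character's General Category is Mn, Mc, or Me."""
    return unicodedata.category(ch).startswith("M")

AYTHAM = "\u0B83"  # formal Unicode name: TAMIL SIGN VISARGA

# The category printed here reflects the Unicode version bundled with
# your Python build, so older and newer builds may disagree.
print(unicodedata.name(AYTHAM),
      unicodedata.category(AYTHAM),
      is_combining_mark(AYTHAM))
```

The same check on a dependent vowel sign such as U+0BC2 shows the kind of character that a category-based rule would refuse to display on its own.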

ACTIONABLE feedback, to the right people if possible. I mean, Srivas complaining on a Unicode list about a potential Microsoft issue isn't quite at the level of complaining to McDonald's about the Burger King Whopper you just had, but it is enough of a disconnect that one should endeavor to improve one's aim upon learning of the new target.

Please do not feed the trolls.

Thank you,

Management.

Seriously, the poster sounds like part-troll and part-kook [1]. Why give them more publicity?

[1] www.catb.org/.../kook.html

Yes, there is some of that. Though his [unreasonable] requests do lead to people wondering how to handle other [reasonable] ones. So sometimes covering the issue makes sense....
