Oriya vs. Odia?

by Michael S. Kaplan, published on 2010/11/18 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/11/18/10092981.aspx


It isn't Korea vs. Corea.

Or Chaudhuri vs. Chaudhary? for that matter.

It isn't Farsi vs. Persian, either.

And really it isn't Uighur vs. Uyghur, either.

It does have a lot more in common with Macao vs. Macau.

And Bangalore vs. Bengaluru is closer to the mark still.

Now Oriya has a name, in its own script:

ଓଡିଆ  

as I mentioned in O-O-O-Iced Win7 and an Oriya Cookie, They forever go together, What a Classic Combination....:

This particular English transliteration, like the Bangalore one, is widely described as a Britishism, a result of the British occupation.

This does tell part of, well okay most of, the story, though I have had some additional information suggested to me about it.

I can't speak to the veracity of the information, so hopefully readers who know will either confirm the logic or refute the claim.

Many of the British transliterations were based on the Sanskritized transliteration of names, so apparently while it is true that ଡ (U+0B21, aka ORIYA LETTER DDA) is generally pronounced more like the "d" in "dark" or "darn", the Devanagari counterpart ड (U+0921, aka DEVANAGARI LETTER DDA) is almost like the "d" in "drum", with perhaps a less "guttural" bent, where one has the tongue further back in the mouth and is exhaling less. So one has not a true "r" but an "r"-style mouth shaping.

If true, then a transliteration scheme trying to figure out what to do to differentiate

ଡ from ଢ from ଦ from ଧ

by which I mean

U+0B21 from U+0B22 from U+0B26 from U+0B27

by which I mean

ORIYA LETTER DDA from ORIYA LETTER DDHA from ORIYA LETTER DA from ORIYA LETTER DHA

might really cling to this difference as the way to set ORIYA LETTER DDA apart from the others.
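For readers who want to see the four letters next to their code points and formal names, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` and the `CODE_POINTS` list are made up for illustration; only the code points and the character names come from the text above.

```python
import unicodedata

# The four code points listed above, in the same order.
CODE_POINTS = [0x0B21, 0x0B22, 0x0B26, 0x0B27]


def describe(cp: int) -> str:
    """Format one code point as 'U+XXXX <glyph> <formal Unicode name>'."""
    ch = chr(cp)
    return f"U+{cp:04X} {ch} {unicodedata.name(ch)}"


# Build the lines first so they can be inspected as well as printed.
lines = [describe(cp) for cp in CODE_POINTS]
for line in lines:
    print(line)
```

The retroflex pair (DDA, DDHA) and the dental pair (DA, DHA) differ only in the doubled "D" of the formal name, which is the same distinction a transliteration scheme has to carry somehow.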

Now, as I said, I do not know enough about Oriya to say for sure, and although some casual web searching led to some mild confirmation of the pronunciation "logic" suggested here, I haven't found anything overwhelming to confirm or refute it.

Though in the end, even if it is true it still represents something a native speaker may not be such a huge fan of. Who wants to rehabilitate the incorrect opinions of someone who is wrong about your language? :-)

Now as articles like Orissa is now Odisha, Oriya is Odia point out, the government moved on this change with all due haste once it came up on the ballot.

As with Uyghur (from Uighur) and other such changes, companies like Microsoft will probably jump on it where they can (i.e. in the next version), though there are limits there. I mean, since Unicode can't change character names, all of those names will need to stay

Oriya Letter *

and such, which means that Microsoft couldn't change Character Map or the Word Insert Symbol... dialog's notion of these character names either since it depends on the names as they are....
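To make the constraint concrete, here is a rough sketch of what a tool could do if it wanted to show "Odia" to users while still keying everything off the fixed Unicode names. It uses Python's standard `unicodedata` module; the `display_name` function is purely hypothetical and does not describe how Character Map, Word, or any other Microsoft product actually works.

```python
import unicodedata

OLD_PREFIX = "ORIYA "
NEW_PREFIX = "ODIA "


def display_name(ch: str) -> str:
    """Hypothetical UI label: swap the script word, leave the rest alone."""
    formal = unicodedata.name(ch)
    if formal.startswith(OLD_PREFIX):
        return NEW_PREFIX + formal[len(OLD_PREFIX):]
    return formal


dda = "\u0b21"
formal = unicodedata.name(dda)   # the stable, formal Unicode name
label = display_name(dda)        # what a user-facing list might show

print(formal)
print(label)

# Programmatic lookup still has to go through the formal name.
assert unicodedata.lookup(formal) == dda
```

The point of the sketch is only that a friendlier label would have to be an extra layer on top; the name that software looks characters up by stays "ORIYA LETTER *".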

I imagine descriptions of things like the HTML meta tags like "ISCII Oriya ##charset=x-iscii-or" might also be slow to update given the places they are buried. And there may be some cases like that which stay that way for a long time. People will need to know about both names for a while.

But I support Odia here, even though I lose an "Oreo" pun out of the deal. Transliteration accuracy trumps my silly pun requirements any day (and the Oriya/Oreo one is on shaky ground anyway!).


Azarien on 20 Nov 2010 6:52 AM:

Saying that "Unicode can't change character names" is a bureaucratic point of view.

Saying that Microsoft can't change it because Unicode didn't is another bureaucratic point of view.

Michael S. Kaplan on 20 Nov 2010 9:09 AM:

Stability rules and consistency rules are more than just bureaucratic.... :-)

Michael S. Kaplan on 20 Nov 2010 9:20 AM:

And (from that point of view!) changing a longstanding name just because one didn't get to choose it? That too is a bureaucratic point of view....

The thing is, each decision is also a lot more than that.

Random832 on 23 Nov 2010 4:49 AM:

The problem is there's no adequate explanation of _why_ they can't change. The stability policy says "They are designed to be used programmatically and, therefore, must be stable.", but it's not clear what programmatic use they're suitable for [particularly what can't be done with "UXXXX" strings instead], given all the hyphens and spaces and huge length...

And not changing them makes them unsuitable for displaying to users and allowing users to search for, since they don't match user expectations on what the characters should be called.

Michael S. Kaplan on 23 Nov 2010 8:03 AM:

There are numerous explanations (e.g. Unicode Character Names) and the Stability Policy on Names (here), though perhaps aliases can be proposed by someone (explained here)....

Michael S. Kaplan on 23 Nov 2010 3:28 PM:

That first reference I listed points to an article by Andrew West that gives the big reason here. I'll quote it directly:

Since the merger between Unicode and ISO/IEC 10646 only two characters have ever changed their name, namely U+00C6 and U+00E6, which were originally called LATIN CAPITAL LETTER A E and LATIN SMALL LETTER A E in Unicode 1.0, then changed to LATIN CAPITAL LIGATURE AE and LATIN SMALL LIGATURE AE in Unicode 1.1 after the merger with ISO/IEC 10646, and finally changed to their current names LATIN CAPITAL LETTER AE and LATIN SMALL LETTER AE in Unicode 2.0. The latter change was due to representations by the Danish Standards Association who considered these two characters to be letters rather than ligatures; but this caused so much trouble and acrimony that the respective committees of Unicode and ISO/IEC 10646 resolved never again to make any name changes, regardless of the severity of the mistake or the triviality of the change required.



referenced by

2011/08/08 Bengali vs. Kamrupi?
