speaking with an accent, conceptually

by Michael S. Kaplan, published on 2011/02/18 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/02/18/10128879.aspx


Blogs like "When the roof got raised, and why" and "Number format and currency format are not always the same" and "Why does the percent stuff have so many restrictions?" (the former two talking about the growing pains involved in extending locale support as new languages brought new requirements years ago, and the latter talking about a limitation documented here that is architecturally fixed in Windows 7 and may one day get its data fixed if we are lucky) point out that NLS is a reactive business.

We have something out there, it turns out to not be enough, and so things are changed. Enhanced. Stretched. Modified.

Other times, it is silly to touch things at all. There are times when a language has a similar concept that is different enough that trying to "fix" it to work within existing support just makes no sense.

Like for one thing, consider LOCALE_S1159 and LOCALE_S2359, the per-locale AM and PM indicators.

In a language like Bangla (ref: Even in India, the language is actually known as Bangla (not Bengali)), we have the following set in the locale:

LOCALE_S1159          পুর্বাহ্ন

LOCALE_S2359          অপরাহ্ন

If you know Bangla you might see the problem here.

Let's look at these two words in the larger context in which they exist:

Time period   Word      When
Dawn          ভোর       03:00 to 06:59
Morning       সকাল      07:00 to 11:59
Noon          দুপুর      12:00 to 14:59
Afternoon     বিকেল     15:00 to 17:59
Evening       সন্ধ্যা     18:00 to 19:59
Night         রাত       20:00 to 02:59

This is a multi-part problem, of course.

Now in general terms, for someone in Bengal or a Bangla-speaking part of Assam or Bangladesh, a term from that table along with a time is the kind of thing one would want in a time format.

One would not generally do so much with AM or PM after the time in these places.
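To make the idea concrete, here is a minimal sketch of what "a term from that table along with a time" could look like in code. The hour ranges and words are copied from the table above; everything else is an assumption for illustration: the names `BANGLA_DAY_PERIODS`, `day_period` and `format_time` are hypothetical, the term-before-time word order is a guess, and the digits are left as ASCII rather than Bangla digits. It is not how NLS or any shipping API does it.

```python
from datetime import time

# (first hour, last hour, English gloss, Bangla term), taken from the table above.
BANGLA_DAY_PERIODS = [
    (3, 6, "Dawn", "ভোর"),
    (7, 11, "Morning", "সকাল"),
    (12, 14, "Noon", "দুপুর"),
    (15, 17, "Afternoon", "বিকেল"),
    (18, 19, "Evening", "সন্ধ্যা"),
    (20, 2, "Night", "রাত"),  # this range wraps past midnight
]


def day_period(hour):
    """Return (English gloss, Bangla term) for an hour in the range 0-23."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    for first, last, gloss, term in BANGLA_DAY_PERIODS:
        if first <= last:
            hit = first <= hour <= last
        else:
            # Wrapping range such as 20:00 to 02:59.
            hit = hour >= first or hour <= last
        if hit:
            return gloss, term
    raise AssertionError("table does not cover this hour")


def format_time(t):
    """Hypothetical format: day-period term followed by a 12-hour clock time."""
    _, term = day_period(t.hour)
    hour12 = t.hour % 12 or 12  # 0 and 12 both display as 12
    return f"{term} {hour12}:{t.minute:02d}"


if __name__ == "__main__":
    for h in (2, 5, 9, 13, 16, 19, 22):
        print(h, format_time(time(h, 30)))
```

Note that six terms need a lookup keyed on ranges of hours, which is exactly what a pair of AM/PM strings cannot express.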

I emailed with friend Omi Azad about it for a bit and he confirmed that the use of these terms would simply be more intuitive; forcing everyone into the 12 hour clock we use with these two less than perfect terms is far from ideal.

The folks in India and Bangladesh are not alone here, either -- Malay has a similar issue (they would use pg for the morning, tgh for 12 to 4pm, ptg for 4-7pm, and mlm for after 7pm) which has the same problem when it comes to adding it to our time format notions.

By its very nature this would be a much bigger change, making the architectural investments to support:

Here in the US we have such terms though I can't say I'd expect them in a formatted time string.

Even after confirming with Ben and Shihab and Omi and Goldie that some or all of these terms are used, it is still not entirely clear to me whether they would be expected in a long time format, or whether instead this conceptual jump is due to Bangla people moving to the nearest conceptual analogue that they have to our AM/PM and identifying it, since AM/PM wouldn't naturally occur to them if it isn't exactly how they would look at the world.

But since a similar construct is used in the US and other places, this new architecture would make sense, as would going out and trying to get all the data for it across all those locales.

Though obviously this would be pretty unlikely at this point.

Bengalis who wanted such a mechanism for time formatting are probably going to have to keep writing their own code, alongside a 24-hour clock.

Or go back in time 10-15 years and make the case then, of course.

Okay, let's assume that change is not going to be heading our way.

There is another problem with those AM/PM strings, one I kept running into while researching this in my elementary "learning Bengali" books. When I started describing my troubles, Omi pointed out an issue that appears to exist in our Bangla fonts. In his words:

হ ্ ন is currently হ্ন but has to be হ্ণ
হ ্ ণ is currently হ্ণ but has to be হ্ন

So when the font is fixed they will look like পুর্বাহ্ণ & অপরাহ্ণ

So the idea is that the HNA and HNNA conjuncts in the Bengali fonts are perhaps reversed?

If he's right that would explain the trouble I was having.

I was going to check with Goldie too, but she is in Mexico and asking her to be typing in Bengali script seems like a little much. I'll wait til she gets back to ask her....

In the meantime, I'm wondering how many people might be typing words the wrong way to get the right appearance, and how much that might muck around with search in the meantime.
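The search worry is easy to demonstrate. The sketch below (the helper names `describe` and `same_after_normalization` are made up for illustration) builds the two conjuncts from escapes so there is no doubt about which letter is which, and shows that they are different code point sequences that Unicode normalization does not unify. A word typed with the "wrong" letter to get the "right" look simply will not match a search for the correctly spelled word.

```python
import unicodedata

# The two conjuncts under discussion, spelled out as code points.
HNA = "\u09B9\u09CD\u09A8"   # HA + VIRAMA + NA
HNNA = "\u09B9\u09CD\u09A3"  # HA + VIRAMA + NNA


def describe(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def same_after_normalization(a, b, form="NFC"):
    """True if the two strings compare equal after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    for label, conjunct in (("HNA", HNA), ("HNNA", HNNA)):
        print(label, describe(conjunct))

    # A + PA + RA + AA-sign, followed by the HNNA conjunct.
    word = "\u0985\u09AA\u09B0\u09BE" + HNNA
    print("HNNA found in word:", HNNA in word)
    print("HNA found in word: ", HNA in word)
    print("equal after NFC:   ", same_after_normalization(HNA, HNNA))
```

Any fix for text already typed the wrong way would therefore need an explicit mapping between the two spellings, since normalization alone does not provide one.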

This had me thinking about an extensive discussion I had six years ago with someone from Ethiopia about the fact that they did not have time zones but they had a different notion that they used to describe time that amounted to something with many of the same effects, related to how they thought of time compared to when the sun was up (given that Ethiopia is reportedly the hottest place in the world year round I can easily imagine they would have such a mechanism!).

Maybe I'll ask Scott Hanselman if he has any thoughts about that issue.

And now I am wondering how much of the data in our locales is trying to map what people want onto an architecture that is imperfect at representing what people use -- causing our locales to kind of "speak with an accent" the way a person might speak with an accent because he is using the phonemes he grew up with while speaking a language with different phonemes....


Siyam on 18 Feb 2011 8:28 AM:

quote:

"হ ্ ন is currently হ্ন but has to be হ্ণ

হ ্ ণ is currently হ্ণ but has to be হ্ন

So when the font is fixed they will look like পুর্বাহ্ণ & অপরাহ্ণ"

I found no problem with HNA and HNNA conjuncts. Then the strings must be misspelled.

পূর্বাহ্ণ spells প ূ র ্ ব া হ ্ ণ

অপরাহ্ণ spells অ প র া হ ্ ণ

In Bangla grammar, the rules for 'ণ' say that ণ takes its place after হ (www.sachalayatan.com/.../35669).

হ ্ ণ should look like half of na hanging below of ha, and হ ্ ন should look like curve of na is flipped and merged with ha on right side.

I have tested it with Vrinda, Lohit Bengali and my own Kalpurush. Fonts are correct.

Michael S. Kaplan on 18 Feb 2011 9:05 AM:

So the AM/PM strings themselves are wrong?

Siyam on 18 Feb 2011 10:38 AM:

Yes. They translated these strings with the wrong spelling.

Mihai on 18 Feb 2011 5:49 PM:

Actually, the am/pm feels alien even to me.

For Romanian, most of the written content will use the 24h form, or nothing at all, relying on context. If we go out for a movie, 7 is definitely not 7 am :-)

When speaking, the time of the day is not specified, unless there is a chance of confusion. And if there is, then one would qualify it with "7 in the morning" or "7 in the evening" or "11 at day" or "11 at night", or "12 at noon". No abbreviation, because you don't do that in writing.

But the problem is: there are no clear limits between some of the times. 10 might be "in the evening" for some and "at night" for others, depending on your lifestyle :-)

Michael S. Kaplan on 18 Feb 2011 6:02 PM:

Ah, but Romanian we do right -- a 24-hour clock by default! :-)

Mihai on 19 Feb 2011 1:06 AM:

Sure, Windows does Romanian right :-)

It was just an example on how things that originate in one language don't map well to how things work in other languages (the concept of am/pm feels alien, because it is not used).

Same problem with something Mac and some UNIX/Linux boxes do: using "yesterday" or "tomorrow" instead of dates. Problem for Romanian is that if you use "tomorrow" (mâine), would also have to use the word for "the day after tomorrow" (poimâine) and "the day after-after tomorrow" (răspoimâine) because otherwise it feels weird to mix colloquial lingo with dry dates, computer style (imagine in English some software using "yesterday" and "today", but 02/20/2011 instead of "tomorrow").

Alex Cohn on 19 Feb 2011 2:23 PM:

I agree with Mihai: the whole concept of AM/PM is foreign for many locales. I must quote a song by a Russian rock group "Звуки Му" (Sounds of Mu):

I woke up at night, around 3 o'clock
And soon I realized: you left me.
So what? You left me, I do not care
Anyway, I will get drunk.
I woke up in the day, around 3 o'clock
And soon I realized: you left me.
So what? You left me, I do not care
Anyway, I will get drunk.
I woke up in the morning, around 3 o'clock
And soon I realized: you left me.
So what? You left me, I do not care
Anyway, I will get drunk.
I woke up in the evening, around 3 o'clock
And soon I realized: you left me.
So what? You left me, I do not care
Anyway, I will get drunk.

I remember that it took me some effort, when I studied English as a foreign language as a child, to comprehend this whole idea of AM/PM.

This same approach is natural for Hebrew, too. The language requires one to say "3 before morning" and "5 in the morning", "11 at noon", "6 in the evening", and "midnight", never "12 in the night". Naturally, as in the case of the Russian song above, there is a lot of freedom to play with the language. In hi-tech, it's a usual pun to say "the meeting is at 9 before morning" (תשע לפנות בוקר).

True, these locales have 24-hour clock by default... But for both languages speaking about five in the evening is very natural. I agree it's not easy to formalize the 12-hour clock: will 6 be "day" or "evening". But definitely the Hebrew locale on my Windows XP which uses untranslated "AM" and "PM" is better than the Russian that keeps both blank.

Mihai on 21 Feb 2011 9:26 AM:

"it took me some effort, ..., to comprehend this whole idea of AM/PM"

I still have to think twice to make sure about 12. Why 11am-12pm-1pm :-) ?

Why change am/pm between 11 and 12, and not between 12 and 1, when the numbers "wrap around"?

I would really expect 11am-12am-1pm. But maybe that's because I am an engineer :-)

Or the fact that in Romanian "pm" would be something like "after lunch" ("după masă" literally "after table") while 12 is "at lunch" ("la prânz"), so 1pm, 2pm, 3pm are 1/2/3 "după masă", but 12 is not.

You can also say "după amiază" instead of "după masă", with "amiază" meaning "the time of the day when the sun is at its maximum height", which probably maps a bit better to pm, but one would never say "12 după amiază" or "12 după masă".

