The real problem(s) with all of these console "fallback" discussions

by Michael S. Kaplan, published on 2010/02/15 10:46 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/02/15/9963784.aspx


It seems like these days, you can't swing a cat around here without hitting a bunch of people talking about the console and fallback on bidirectional languages (particularly Arabic and Hebrew).

This is something that people are all over for native console apps and managed console apps alike (even externally, to customers, in blogs like Dina's Developing Arabic applications should be easy! with posts like Console doesn’t display Arabic, or in blogs of mine like this one, where a managed code developer was asking about calls to SetThreadPreferredUILanguages(MUI_CONSOLE_FILTER, NULL, NULL)).

These various theoretically cat-injuring communications center around the MUI story for Arabic and Hebrew in most cases, and the importance of making sure the right fallback is used (which essentially amounts to either English or French) any time one launches a console application and needs a better source for resources since Arabic and Hebrew won't work there.

But there is a problem with all of this.

Well, two problems actually. I just thought of another problem.

The first one is that this is not only a problem with Arabic and Hebrew; it applies to lots of other complex script languages - from the code page based ones like Thai to the Unicode-only ones like Hindi/Laotian/Khmer/Bengali/etc. None of them work in the console and all of them require fallback.

Now, treating this as only an Arabic/Hebrew problem does not mean the suggested solution won't work for these other locales; it will. But by framing the question only in terms of those two, no one thinks about the problem with the others. And anyone who is starting with one of these other languages in mind will be lost.

I don't think this is a problem in Dina's case (her blog's name is Developing Arabic applications should be easy!, after all). But just about everyone else is guilty of ignoring the fact that this needs to be called out for a lot of other languages.

Now there is a simple scheme for getting the underlying data: a combination of calling GetLocaleInfoEx with the LOCALE_SCONSOLEFALLBACKNAME constant and then doing some code page logic to make sure that the underlying code page can support the language, as the documentation for LOCALE_SCONSOLEFALLBACKNAME describes but does not define explicitly:

Note In general, applications should not make direct use of LOCALE_SCONSOLEFALLBACKNAME data. To determine what language resources to use in a console window, an application should call either SetThreadUILanguage or SetThreadPreferredUILanguages. These functions use the console fallback data as a factor in choosing a language that is legible in the console, but it is not the sole determinant. In particular, the console is limited to displaying characters from a single code page. For example, el-GR for Greek (Greece) is a valid console language, but if the current console code page is Latin-1 (code page 1252) the console displays Greek text mostly as a series of character-not-found symbols.

If the language corresponding to this locale is supported in the console, the value is the same as that for LOCALE_SNAME, that is, the locale itself can be used for console display. However, the console cannot display languages that can be rendered only with Uniscribe. For example, the console cannot display Arabic or the various Indic languages. Therefore, the LOCALE_SCONSOLEFALLBACKNAME value for locales corresponding to these languages is different from the value for LOCALE_SNAME.

For predefined locales, if the fallback value is different from the value for the locale itself, the value for the neutral locale is used. A specific locale is associated with both a language and a country/region, while a neutral locale is associated with a language but is not associated with any country/region. For example, ar-SA falls back to "en", not to "en-US". This policy of using neutral locales is implemented consistently for predefined locales and is strongly recommended for custom locales. However, the policy is not enforced. For a custom locale, your application can use a specific locale instead of a neutral locale as a fallback.

Note None of the functions described in Calling the "Locale Name" Functions accept neutral locales as inputs. Thus LOCALE_SCONSOLEFALLBACKNAME data is of very limited use. In particular, neither GetLocaleInfo nor GetLocaleInfoEx accepts neutral locales as inputs.
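The code page half of that check is easy to sketch. The following Python snippet is only an illustration, not Win32 code: the helper name encodable_in_code_page is made up, Python's cpNNNN codecs stand in for the console code page, and a real application would query the current console code page (for example with GetConsoleOutputCP) rather than hard-coding it.

```python
def encodable_in_code_page(text: str, code_page: int) -> bool:
    """Return True if every character in text maps into the given code page.

    Hypothetical helper: Python's "cpNNNN" codecs are used as a stand-in
    for the Windows code page the console is currently set to.
    """
    try:
        text.encode(f"cp{code_page}")
    except UnicodeEncodeError:
        return False
    return True


greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"  # "Greek" written in Greek
arabic = "\u0639\u0631\u0628\u064a"                          # "Arabic" written in Arabic

print(encodable_in_code_page(greek, 1253))   # True: Greek code page
print(encodable_in_code_page(greek, 1252))   # False: Latin-1 code page
print(encodable_in_code_page(arabic, 1256))  # True: Arabic code page
```

Note that the Arabic check passes even though, per the documentation quoted above, the console cannot display Arabic. The code page test is necessary but not sufficient, which is why the fallback name has to be consulted as well.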

Now we get to the second problem: although this topic, which has big "do not use" info around it, has some of the best conceptual documentation on the rules for what works and what does not, no one is providing good code here to help. And most of the samples out there, including the ones others are using, are wrong anyway in small ways (since one can change the code page of a console with tools like chcp and fool the code).

And I just noticed what might be a third problem, this time in that MSDN topic. Aren't neutrals supported in GetLocaleInfo[Ex] in Windows 7?

Beyond that, there is a fourth problem that I just thought of.

None of the suggested functions or code samples or conceptual topics solve any real world problem.

Running all of these various solutions will keep you from, for example, loading Arabic resources in your console application -- which would never really happen anyway, since no one who knows anything about the console would ever pay someone to localize their application into Arabic!

This makes the code kind of pointless for almost all cases except maybe a few like Arabic (Morocco), which falls back to French if it is there; most others go to English, which is usually the ultimate fallback anyway.
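To make that concrete, here is a small Python sketch of the decision. Everything in it is an assumption for illustration: the function name and the little table are made up, the two table entries are just the examples mentioned in this post (ar-SA going to "en", ar-MA going to "fr"), and on Windows the real value would come from GetLocaleInfoEx with LOCALE_SCONSOLEFALLBACKNAME. It also skips matching a specific locale against its neutral parent.

```python
# Illustrative console fallback names only; not a real data source.
EXAMPLE_CONSOLE_FALLBACKS = {
    "ar-SA": "en",  # example from the documentation quoted in this post
    "ar-MA": "fr",  # example from this post
}


def pick_console_ui_language(locale_name, shipped_languages, ultimate_fallback="en"):
    """Choose the resource language a console app would end up loading.

    Hypothetical helper: a locale missing from the table is treated as
    being its own console fallback.
    """
    fallback = EXAMPLE_CONSOLE_FALLBACKS.get(locale_name, locale_name)
    if fallback in shipped_languages:
        return fallback
    return ultimate_fallback


print(pick_console_ui_language("ar-MA", {"en", "fr"}))  # fr
print(pick_console_ui_language("ar-SA", {"en", "fr"}))  # en
print(pick_console_ui_language("ar-MA", {"en"}))        # en
```

Only the first call does anything an English-only default would not have done already, and only because French resources happen to ship.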

But doing all this work for the sake of Morocco can be worthwhile as an exercise (or as a reality if you ship software there!).

But beyond that, since in the majority of these cases the user's default locale will be the same as their UI language, and since we tell people 'til the cows come home to always use the date formatting functions/methods/properties rather than rolling their own, most of those strings will print out the same characters the console was determined to not be able to support in the first place.

So you end up right back where you started, for Khmer and Arabic and Bengali and Lao and Hindi and Thai and Sinhalese and all the others.

Hell, every time the user locale fails that same "is it supported by the code page?" test, you will fail. Which could be the case for almost any language other than English.
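A rough Python sketch of what that looks like in practice follows. It is a simulation under stated assumptions, not console code: the helper name is made up, the Arabic month name is hard-coded instead of coming from a real date formatting call, code page 437 is just an example console code page, and encoding with errors="replace" only approximates the substitution the console ends up showing.

```python
def as_seen_in_code_page(text: str, code_page: int) -> str:
    """Approximate what survives when text is pushed through a code page.

    Hypothetical helper: characters with no mapping become '?'.
    """
    codec = f"cp{code_page}"
    return text.encode(codec, errors="replace").decode(codec)


# The UI string has already fallen back to English resources...
label = "Last modified: "
# ...but the date was formatted with the user locale (Arabic here).
# The month name is "February" in Arabic, written with escapes.
formatted_date = "15 \u0641\u0628\u0631\u0627\u064a\u0631 2010"

print(as_seen_in_code_page(label + formatted_date, 437))
# Last modified: 15 ?????? 2010
```

The resource fallback did its job and the output is still full of question marks, because nothing applied the same test to the locale used for formatting.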

Using a similar fallback logic for the user locale/CurrentCulture is not something that currently exists. And even if you write your own, you are hampered by the fact that the typical en-US fallback is such a bad match for the rest of the world (different decimal separators, day-month order issues, collation support, etc.) that you will likely create as much confusion without the question marks as with them.

Perhaps these are the fourth and fifth problems: the fourth is the lack of description of the above problem, and the fifth is the lack of any kind of solution....

But I will tell you one thing.

I'll be doing a presentation on console issues in the near future for some Windows developers and testers, and I promise you that the issues in this blog will be some of the main ones pointed out there, with some real suggested solutions to the full problem instead of just the half-solutions ignoring the twothreefourfive problems I mention here.

And also talking about the sixth problem: where PowerShell, particularly the graphical PowerShell, fits in here (and breaks the assumptions of almost every other solution that used to work some of the time!).

Perhaps the seventh problem is that no one is talking about that issue either.

I'll perhaps have some blogs after all of that is done, too. For the folks following here. :-)


Emeka on 26 Sep 2010 8:28 PM:

Some good info; but your blog is very hard to read.

Recommendation: revise and put a table of content.

Michael S. Kaplan on 26 Sep 2010 11:39 PM:

Nobody has to read it, of course. I'm not writing a book, it is a blog....


referenced by

2010/10/07 Myth busting in the console

2010/09/23 A confluence of circumstances leaves a stone unturned...

2010/06/27 Bugs hidden in plain sight, and commented that way too ANSWERS

2010/06/18 Bugs hidden in plain sight, and commented that way too

2010/05/07 Cunningly conquering communicated console caveats. Comprende, mon Capitán?

2010/04/07 Anyone who says the console can't do Unicode isn't as smart as they think they are
