I swear the Romanian bug is fixed; it was fixed 4.5 years ago!

by Michael S. Kaplan, published on 2010/12/13 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/12/13/10103924.aspx


It was a bit like that Latvian bug.

The one I described in "I swear the Latvian bug is fixed; it was fixed 4.5 years ago!"

Though I admit I get slightly less email about this other issue.

Its latest incarnation arrived in two mails the other day. First this one:

Dear Michael Kaplan,

I am writing to this email address because I've seen it mentioned in a previous blog entry <http://blogs.msdn.com/b/michkap/archive/2010/11/30/10098113.aspx>.

My question is mentioned in the subject of this email: *How to do accent insensitive comparison under Windows*?

My test case would be:
voila == voilá
garcon == garçon
kase == käse, or better kaese == käse
arsita == arșiță (Romanian s and t with comma below, and a with breve)
arsita == arşiţă (Romanian s and t with cedilla, and a with breve)
arşiţă == arșiță

I've hacked an MSDN C++ sample <http://pastebin.com/fG9APFDt>, but it doesn't quite work as expected.

After reading the documentation of Boost.Locale <http://cppcms.sourceforge.net/boost_locale/html/tutorial.html> I was left with the impression that C++ could work as expected (at least "garcon" should be equal to "garçon").

If it is possible in C++, how about the Romanian s and t with comma below characters? The C++ locale for Romanian on Windows is equivalent to ISO-8859-2.

I've been informed that Microsoft doesn't want to add support for ISO-8859-16 in their products, UTF-8 and Unicode being the future, but how can I use C++ to have "arsita" and "arșiță" as equal?

Thank you in advance.

Sincerely,
Cristian Adam

and then this other one:

Dear Michael Kaplan,

I have found the CompareString Win32 function and I've made a test application <http://pastebin.com/6mvkpzxx>.

The results on Windows XP are:

---------------------------
Comparison
---------------------------
voila == voilá
garcon == garçon
kase == käse
kase > kaese
arsita == arşiţă
arsita > arșiță
arşiţă > arșiță

---------------------------
OK
---------------------------

CompareString on Windows XP doesn't recognize the s and t comma below for Romanian language!!!

Should I file a bug report through Microsoft Connect?

Sincerely,
Cristian Adam.

Aha, now we know what the problem is.

First of all, you do not need to use that weird way to contact me; use the Contact link on THIS page in THIS blog!

And then, second of all....

XP is not going to be updated to support the Romanian S and T with comma below letters.

EVER.

There are updates to fonts to display them, and you can use MSKLC to create keyboards for them.

But if you want collation and casing to work, you have to either:

No XP solution to this problem is going to materialize that will give the right answer for accent insensitive comparisons.

You can search on this site for the Romanian issue with these letters because I have talked about it often in the past.

No, it is not worth the time of putting in a Connect bug -- because it can't be fixed downlevel. The fix already exists, and has existed for over half a decade....

But with all that said, the real problem here is not the one I have been talking about, either here or in that other blog about Latvian.

The "problem" is that there are lots of people who still love XP, and that every day those people notice a problem that they want fixed.

Hell, I have one machine (a Dell Latitude D820) that I use for many things. It runs like crap on Vista, even though Dell supports Vista on it, and Dell refuses to support Windows 7 on it -- no drivers, occasional blue screens, etc. So believe me, that one machine will be running XP for years to come.

But if your language is Latvian or Romanian, you have an even better reason to upgrade:

Because you want your @#%&*! language to work right.

That one machine of mine that @#%&*! Dell won't support on Windows 7? It is doing specific tasks at home that involve neither language. And I have lower expectations because I know it is running something that was mostly developed a decade ago.

By the way, that XP machine is running IE8 (I am not part of the 45% of China that is using @#%&*! IE6).

But as I said earlier in this post and many times in this blog, XP is not going to get updated collation, or casing....


Mihai on 13 Dec 2010 11:09 AM:

Actually, it does look like LINGUISTIC_IGNOREDIACRITIC has problems with Romanian.

Try:

wchar_t const szNoAccents[]   = { 0x0061, 0x0072, 0x0073, 0x0069, 0x0074, 0x0061, 0x0000 };
wchar_t const szWithComma[]   = { 0x0061, 0x0072, 0x0219, 0x0069, 0x021B, 0x0103, 0x0000 };

int result = CompareStringEx(
    L"ro-RO",
    LINGUISTIC_IGNOREDIACRITIC,
    szNoAccents, -1,
    szWithComma, -1,
    NULL, NULL, 0
);

Result: CSTR_LESS_THAN, should be CSTR_EQUAL.

Tried it on Windows 7.

Michael S. Kaplan on 13 Dec 2010 2:20 PM:

If you look at the Romanian data, we don't consider these to be diacritic differences -- the letters are given independent weight.

If you choose anything other than Romanian, you can get the comma below ignored....
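A toy model may make the distinction concrete. The sketch below is not Windows collation data or code: the weights and the names `Weight`, `genericTable`, `romanianLikeTable` and `compareWords` are all made up for illustration. It assumes a two-level key (primary weight, then diacritic weight) and shows why ignoring the diacritic level cannot merge letters that have their own primary weight.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy collation element. All weights are invented for illustration;
// they are NOT taken from any Windows sorting table.
struct Weight {
    int primary;
    int diacritic;
};

using Table = std::map<char32_t, Weight>;

// Base Latin letters needed for the test word "arsita".
Table baseLetters() {
    return {
        {U'a', {10, 0}}, {U'i', {20, 0}}, {U'r', {30, 0}},
        {U's', {40, 0}}, {U't', {50, 0}},
    };
}

// "Generic" table: the Romanian letters share the base letter's primary
// weight and differ only at the diacritic level.
Table genericTable() {
    Table t = baseLetters();
    t[U'\u0103'] = {10, 1};  // a with breve
    t[U'\u0219'] = {40, 1};  // s with comma below
    t[U'\u021B'] = {50, 1};  // t with comma below
    return t;
}

// "Romanian-like" table: each letter has its own primary weight and
// sorts right after its base letter.
Table romanianLikeTable() {
    Table t = baseLetters();
    t[U'\u0103'] = {11, 0};
    t[U'\u0219'] = {41, 0};
    t[U'\u021B'] = {51, 0};
    return t;
}

// Build a sort key: all primary weights, then (optionally) a separator
// and all diacritic weights. Throws std::out_of_range for letters that
// are missing from the table.
std::vector<int> sortKey(const Table& table, const std::u32string& s,
                         bool ignoreDiacritics) {
    std::vector<int> key;
    for (char32_t c : s) key.push_back(table.at(c).primary);
    if (!ignoreDiacritics) {
        key.push_back(-1);  // level separator, lower than any weight
        for (char32_t c : s) key.push_back(table.at(c).diacritic);
    }
    return key;
}

// Returns a negative value, zero, or a positive value, like strcmp.
int compareWords(const Table& table, const std::u32string& a,
                 const std::u32string& b, bool ignoreDiacritics) {
    const std::vector<int> ka = sortKey(table, a, ignoreDiacritics);
    const std::vector<int> kb = sortKey(table, b, ignoreDiacritics);
    if (ka < kb) return -1;
    if (kb < ka) return 1;
    return 0;
}

// Small self-check using the words from the comment above.
void checkWeights() {
    const std::u32string plain = U"arsita";
    const std::u32string comma = U"ar\u0219i\u021B\u0103";
    assert(compareWords(genericTable(), plain, comma, true) == 0);
    assert(compareWords(romanianLikeTable(), plain, comma, true) < 0);
}
```

In this model the "ignore diacritics" switch only drops the second level of the key, so it has nothing to remove when the difference already lives in the primary weights.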

Mihai on 13 Dec 2010 3:35 PM:

Hmmm...

I agree that these are technically different, stand-alone characters.

But support was missing for such a long time (and the ANSI applications on non-Romanian systems did not allow even for the cedilla form) that a lot of people find it handy to just do the searches without "accents"

Very handy (try searching for "resita" in Google or Bing and you will find "Reşiţa", nice)

Michael S. Kaplan on 13 Dec 2010 4:10 PM:

But we can't have it both ways if search/string compare and SORT use the same function -- otherwise the order will be wrong, and that is what the Romanian collation is built for. These letters must have primary weight (like the cedilla below characters used to), or dictionaries will be wrong.

Until they add an EqualString function, that is. :-)
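For what such a split might look like at the application level, here is a minimal sketch. It is purely hypothetical: `foldForSearch` and `equalForSearch` are invented names, not a Windows API, and the folding table covers only the lowercase Romanian letters discussed here. The idea is that search equality folds letters to their base form, while sorting would keep using the real collation untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: fold lowercase Romanian letters (comma-below and
// legacy cedilla forms alike) to their unaccented base letters.
// Uppercase letters and other languages are deliberately not handled.
char32_t foldForSearch(char32_t c) {
    switch (c) {
        case U'\u0103':  // a with breve
        case U'\u00E2':  // a with circumflex
            return U'a';
        case U'\u00EE':  // i with circumflex
            return U'i';
        case U'\u0219':  // s with comma below
        case U'\u015F':  // s with cedilla
            return U's';
        case U'\u021B':  // t with comma below
        case U'\u0163':  // t with cedilla
            return U't';
        default:
            return c;
    }
}

// Hypothetical search-only equality: true when both strings are equal
// after folding. It says nothing about ordering, so it is not a
// replacement for a sort comparison.
bool equalForSearch(const std::u32string& a, const std::u32string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (foldForSearch(a[i]) != foldForSearch(b[i])) return false;
    }
    return true;
}

// Small self-check with the test words from the original mail.
void checkSearchFold() {
    const std::u32string plain = U"arsita";
    const std::u32string comma = U"ar\u0219i\u021B\u0103";
    const std::u32string cedilla = U"ar\u015Fi\u0163\u0103";
    assert(equalForSearch(plain, comma));
    assert(equalForSearch(cedilla, comma));
    assert(!equalForSearch(plain, U"arsite"));
}
```

Because the folding lives in a separate function, dictionary order is unaffected; only the "is this a match?" question gets the relaxed answer.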

zedware1 on 13 Dec 2010 9:04 PM:

I am now using Google Chrome on Windows XP. No IE6 anymore.

Michael S. Kaplan on 13 Dec 2010 10:33 PM:

Ah, but you're still using XP....

Mihai on 14 Dec 2010 10:35 AM:

I agree that sort and search are different beasts, and most people don't even grok that.

Adding an extra API would make it quite confusing.

But by default sorting in most languages will be messed up if one uses any kind of *_IGNORE* flag.

Tae on 31 Jan 2013 7:52 AM:

What’s up with their censored swearing? Romanian s and t with comma bellow is unavailable in XP. You might want to replace “@#%&*!” with “fucked”.

