What's wrong with what FxCop does for globalization, Part 0.5 (a segue)

by Michael S. Kaplan, published on 2006/12/06 04:26 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/12/06/1221890.aspx

I thought I'd talk about the FxCop issue from a slightly different standpoint, and discuss something that has nothing to do with FxCop to give an example of my concerns.... 

If you look at Writing Culture-Safe Managed Code (a .NET Framework Deployment white paper), you'll see a good and typical picture of how a technically savvy person might approach supporting international code without really trying to delve too deep into it (for an example of what I mean, see the section entitled Other Countries, where a quick enumeration of cultures to "worry" about is given!).

Incidentally, you can be amused if you look at the section ironically entitled Incorrect Code Example, where you will see one of the earlier beliefs -- that CurrentCulture was evil for string comparisons but InvariantCulture was a good idea, something that this blog has gone to some trouble to debunk since that time.... :-)

Anyway, if you scroll down a bit, you will see a conversation about the Turkish I (a popular devil when one is trying to talk about culture-safe coding practices!). But the text, which names the Unicode code points for the dotted uppercase and dotless lowercase I (U+0130 and U+0131), actually (presumably unintentionally) shows the capital and small Y with acute (U+00DD and U+00FD).

What's up with that?

Well, if you look at the definitions for Windows code page 1252 and Windows code page 1254, you'll see part of the problem -- where 1254 defines the Turkic I additions, 1252 defines the Y with acute.
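To see the overlap concretely, here is a small Python sketch (not from the white paper or from any tool involved, just an illustration using Python's built-in `cp1252` and `cp1254` codecs). The byte values 0xDD and 0xFD are the positions where the two code pages disagree for these letters; the helper name `decode_both` is made up for this example.

```python
# The two byte positions where cp1252 has Y with acute and cp1254 has the Turkic I forms.
RAW = bytes([0xDD, 0xFD])


def decode_both(raw: bytes) -> dict:
    """Decode the same bytes under Windows-1252 and Windows-1254."""
    return {
        "cp1252": raw.decode("cp1252"),
        "cp1254": raw.decode("cp1254"),
    }


if __name__ == "__main__":
    for name, text in decode_both(RAW).items():
        # Show each decoded character as a U+XXXX code point.
        points = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(name, points)
```

The same two bytes come out as U+00DD and U+00FD under 1252, and as U+0130 and U+0131 under 1254.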

Of course that only tells part of the story. The page itself is encoded as UTF-8, so trying to change to either of these other two encodings will mess up the page.


So what is going on here?

The most likely problem is that some tool or application that produced the document did not save it as Unicode but instead as the Turkish code page, and then later some other tool, in converting it to Unicode, simply assumed it was cp 1252. The text is therefore corrupted at this point, with no clean way to fix it.
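A rough Python sketch of that guess follows. It is only a simulation of the scenario described above, not a reconstruction of the actual tools; the function names `corrupt` and `naive_repair` are hypothetical. It also shows one reading of why there is no clean fix: once the text is Unicode, reversing the mistake cannot tell a damaged İ from a Ý that was really meant.

```python
def corrupt(text: str) -> str:
    """Save as cp1254 (Turkish), then wrongly read the bytes back as cp1252."""
    return text.encode("cp1254").decode("cp1252")


def naive_repair(text: str) -> str:
    """Undo the mistake blindly: back to cp1252 bytes, re-read as cp1254."""
    return text.encode("cp1252").decode("cp1254")


original = "\u0130 and \u0131"   # dotted capital I and dotless small i
damaged = corrupt(original)      # now holds U+00DD and U+00FD instead

# The blind reversal does recover this particular string...
assert naive_repair(damaged) == original

# ...but it also rewrites a Y with acute that was never damaged,
# so it is not safe to apply to a document of unknown history.
genuine = "\u00dd"
assert naive_repair(genuine) == "\u0130"
```

In other words, the reversal only works if you already know which characters went through the bad conversion, which is exactly the information that was lost.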

The paper itself reminds me somewhat of that .NET Framework Developer's Guide: Custom Case Mappings and Sorting Rules topic I have discussed previously, in that neither one of them helps with international awareness; they are both written mostly from the standpoint of international mitigation, of how to protect your app from the world.

In my opinion, this is unfortunately the biggest problem in what FxCop does, the problem underlying the issues I was talking about in What's wrong with what FxCop does for globalization, Part 0. The final result that people seem to most often work toward after reading these pages or running this tool is to "culture proof" their code much more than any kind of attempt to properly support other cultures or enable an application to do so.

I am not going to blame FxCop here, as I think it is really the many surrounding documents and topics that are kind of directing the effort, as the examples I was giving here show....

For the next post of the series, I'll start moving into my suggested solutions. :-)


This post brought to you by ෞ (U+0DDE, a.k.a. SINHALA VOWEL SIGN KOMBUVA HAA GAYANUKITTA)

# dls on 6 Dec 2006 9:50 AM:

I just went back and caught Part 0 as well, and even though there might be better paths to knowledge, I wanted to ask if you'd consider a post covering resources for basic development practices for internationalization in .NET. Some of the tools are laid out well in the MSDN, but basic things seem difficult to find--how to successfully organize resource DLLs in moderate-sized projects, installation best practices. (Ignore the rest, it's frustration-prompted) Any practices that can make it easier to "go back and internationalize later" would have huge value since this usually seems to turn into the worst kind of whitebox modification.

# Aldo.NET on 7 Dec 2006 6:34 AM:

On dls' comment about "internationalize later" I would recommend at least some caution - sometimes it's easily doable but sometimes it requires redesigning the architecture of (if you are lucky) part of the application and that's awfully painful.

As a reply to Part 0 was suggesting - FxCop is "just" a tool and the good thing about it is that it highlights potential issues. Developers still need to judge case by case if each occurrence is a real issue or not, based on the context.

If you had an example of an FxCop rule that would be accurate 100% of the time, you would probably have discovered a bug in the classes/methods to which the rule applies.

Just my 2 eurocents

referenced by

2007/10/01 What's wrong with what FxCop does for globalization, Part 1
