An Encoding.DisplayName property would have made way too much sense

by Michael S. Kaplan, published on 2007/12/19 10:01 -05:00, original URI:

Sometimes when dealing with internationalization issues, questions come in where it is clear that the person asking is confused about the results being observed.

Now it is true that in some cases the confusion is entirely on their side and it is just a matter of pointing them to the right answer.

However, in many cases it is actually the fault of the underlying functionality or, to be more precise, the way it was exposed -- in a way that helped cause the confusion. Not a huge glaring bug, mind you -- just a divot in the design, where a decision made it a little bit harder to suss out what is happening.

Of course I am no longer on the team that owns the functionality in either managed or unmanaged code. And even when I was, I was not working in every area, and I was not some kind of architect who approved every design decision or, for that matter, even knew about every design decision.

I admit I know a couple of "important" people, but none in my own management chain, really¹.

But when a question comes in that really does boil down to one of those minor design flawlets, those inspirations for confusion, I am self-consciously aware of the fact that I was on the team that designed that, whether I was involved or not. A sort of shame of [former ]ownership and/or membership? :-)

Like the other day, when the question was:


I used the Encoding class. When I ran our test on different language, we got different behavior. I define the encoding as follows:

    encoding = new UnicodeEncoding(true, false);

When I get the EncodingName on CHT and KOR machine, their values (shown below) are different from those on other machines:

Language         encoding.EncodingName value
CHT              Unicode (位元組由大到小)
KOR              유니코드(Big-Endian)
Most machines    Unicode (Big-Endian)

Any idea for the difference? Is this expected?


The problem? Well, the problem is with the Encoding.EncodingName property, described in documentation as:

When overridden in a derived class, gets the human-readable description of the current encoding.

This is okay as far as it goes, but none of those derived classes like UnicodeEncoding document that they override the name, and the vague description above does not make it clear that it is actually the localized name -- localized into the language of the .NET Framework, the kind of thing usually called something more like Encoding.DisplayName.

Of course there is no Encoding.EnglishName either....
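
For anyone who just needs something that stays the same from machine to machine, here is a minimal sketch that puts the localized name next to properties that are not localized. The class name EncodingIdentity and the helper StableLabel are made up for illustration; only EncodingName, CodePage, and WebName come from the Encoding class itself, and the exact strings printed will depend on the framework version and its language.

```csharp
using System;
using System.Text;

static class EncodingIdentity
{
    // Hypothetical helper (not part of the framework): builds a label from
    // properties that are identifiers rather than display text.
    public static string StableLabel(Encoding encoding)
    {
        return encoding.CodePage + ":" + encoding.WebName;
    }

    static void Main()
    {
        // Big-endian UTF-16 with no byte order mark, as in the question above.
        Encoding encoding = new UnicodeEncoding(true, false);

        // Display text: on the .NET Framework this follows the framework's
        // language, so it can differ between CHT, KOR, and other machines.
        Console.WriteLine(encoding.EncodingName);

        // Identifier text: safe to compare, log, or persist.
        // The code page for big-endian UTF-16 is 1201.
        Console.WriteLine(StableLabel(encoding));
    }
}
```

The point of the split is simply that EncodingName is something to show to a person, while anything a test compares against should come from one of the identifier properties.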

So again, this is one of those cases where the interface does not lead one to an intuitive understanding of the design, and the documentation, while misleading, is not outright wrong enough to necessarily be high on the list of things to fix.

The application probably won't crash. But it just makes things seem inconsistent.



The fact that they aren't is kind of irrelevant, especially in this class. For most code pages², it went from a thin wrapper around Win32 in .NET 1.0/1.1 to a full-fledged implementation in .NET 2.0 and beyond. The wrapper could perhaps be forgiven for handling CPINFOEX.CodePageName (which is documented as being localized) as best it could, but the class never managed to outgrow the childish appearance it inherited, even when its innards were redone pretty much from scratch³.

So now it is just a footnote, an embarrassing inconsistency that will sometimes resemble EncodingInfo.DisplayName and other times EncodingInfo.Name, and that will probably continue to confuse people asking simple questions about what to call an instance....
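
To see which way a particular installation leans, one could walk the list returned by Encoding.GetEncodings() and compare each EncodingName against the two EncodingInfo properties. This is only a rough sketch: the class name EncodingNameComparison and the helper Classify are hypothetical, and the results are likely to vary with the framework version and its language.

```csharp
using System;
using System.Text;

static class EncodingNameComparison
{
    // Hypothetical helper: reports which EncodingInfo property, if either,
    // exactly matches the EncodingName of the encoding it describes.
    public static string Classify(EncodingInfo info)
    {
        string name = info.GetEncoding().EncodingName;
        if (name == info.DisplayName) return "DisplayName";
        if (name == info.Name) return "Name";
        return "neither";
    }

    static void Main()
    {
        // One line per encoding the framework reports: code page,
        // identifier name, and which property EncodingName matches.
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Console.WriteLine("{0,-6} {1,-24} {2}",
                info.CodePage, info.Name, Classify(info));
        }
    }
}
```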


1 - The only exception to this rule I know of is a VP who suggested at one point that I should not be working for Microsoft; ironically enough that VP no longer is? :-)
2 - It should be noted that UnicodeEncoding is not one of them and never was -- it was always independent of Win32 even though it wore the Win32 wrapper framework as part of its derivation from the Encoding class.
3 - In fairness, the EncodingInfo class was added in 2.0 which did strive for more consistency, though without a whole lot of documentation links from these other, less consistent properties....


This post brought to you by 유 (U+c720, aka HANGUL SYLLABLE IEUNG YU)

JM on 20 Dec 2007 4:38 AM:

Talk about making a mountain out of a molehill. Yes, the name is localized, yes, it's sloppy that they didn't document that. But that's it. You're devoting a whole post to that documentation omission, when it can hardly be called critical. Or even very important -- if you ask for the "human-readable" name, you shouldn't be surprised that the results could vary wildly. Asking a framework to just give you a name you can show the user allows for a lot of leeway. And, of course, you cannot use the resulting name as program input.

That's why Encoding.EnglishName makes no sense either. It's not supposed to be a name you can use across cultures, platforms, releases -- if you're looking for that, use Encoding.BodyName or Encoding.HeaderName or Encoding.WebName... OK, so this deserves no beauty prize either, and it's a good thing they came up with EncodingInfo. For maximum usability that ought to have a method allowing you to get the display name for any language you want. Other than that, though, I see this as a minor omission in the documentation.

Michael S. Kaplan on 20 Dec 2007 9:05 AM:

Actually, I was focusing on the inconsistency in the implementation.

Most of the rest of the globalization support has a specific name for the localized name. If that were correct, then one would not need documentation to make up for the lack.

Better documentation? Sure, I am a fan.

Better, more consistent, more intuitive design in the first place? I am an even bigger fan. Because even better than having someone save you from mistakes is not making the mistakes in the first place....
