No Welsh puns here. Ever. Not even a little one. Not at all...

by Michael S. Kaplan, published on 2009/08/24 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/08/24/9881848.aspx


I have the feeling that certain questions are going to keep getting asked.

Some in person, like "What's up with your Facebook profile pictures?"

Or others via email, like the one about people not feeling like a code page selection is right for a particular locale.

Just the other day, the question came in like this:

Hi,

We have been trying to use Welsh and found that the NLS table (http://msdn.microsoft.com/en-gb/goglobal/bb896001.aspx) maps Welsh to codepage 1252.

This has confused me because Welsh can contain Ux0174 (and others) that are not in 1252. (http://www.microsoft.com/globaldev/handson/user/welsh.mspx)

Is there going to be any change to this mapping to create a Welsh codepage, or an equivalent of Celtic?

Thanks

Stephen Cornes

Now admittedly this is not the same case as "On not looking at Uyghur through a Chinese prism", where the question was framed in a way that suggested ignoring a language's concerns and requirements in order to fulfill a particular market's. In those kinds of cases, it is easier to pitch Microsoft as the good guy, doing its best. And it doesn't feel like marketing spin to me.

Plus I was in the meetings where that one was decided, and a lot of careful and thoughtful conversations and even arguments happened. Only technical issues were raised; as far as I can recall, the "spin" issue didn't come up.

But the Welsh case is slightly different, since the desire is purely for the best language support. And it is a pretty direct, honest question.

Sure, code pages aren't enough; I've said so often enough before (like in this blog). But why wouldn't a better choice be made? Perhaps one including Ŵ (U+0174, aka LATIN CAPITAL LETTER W WITH CIRCUMFLEX).

In this case, there are several facts conspiring against Welsh:

  1. There is no better code page available to choose from with better coverage;
  2. We are no longer creating new code pages;
  3. We can't change the code page, ever. Never. Not even a little bit. Not at all.

This one feels like much more of a "saved by the bell" issue (the excuse sense, not the television show sense) because the policies block the action.

Whew?

Of course this is not always the case, and I'll give an example of that next time....

But for now if you use Unicode then all of the letters in Welsh are supported -- so once again I say just use Unicode!
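
To make the coverage gap concrete, here is a minimal sketch in Python (not part of the original discussion). It assumes Python's built-in `cp1252` codec is a fair stand-in for the Windows code page, and the helper names `encodable` and `missing_chars` are made up for illustration. It lists which of four Welsh circumflex letters have no mapping in a given encoding:

```python
# Four Welsh letters: W and Y with circumflex, upper and lower case
# (U+0174, U+0175, U+0176, U+0177). Written as escapes to stay ASCII-safe.
WELSH_SAMPLE = "\u0174\u0175\u0176\u0177"


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def missing_chars(text: str, encoding: str) -> list[str]:
    """List the code points in text that the encoding cannot represent."""
    return [f"U+{ord(ch):04X}" for ch in text if not encodable(ch, encoding)]


if __name__ == "__main__":
    # Compare the legacy code page with two Unicode encoding forms.
    for enc in ("cp1252", "utf-8", "utf-16-le"):
        print(enc, missing_chars(WELSH_SAMPLE, enc))
```

Under that assumption, all four letters come back as missing for `cp1252`, while the Unicode encodings report nothing missing.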

 


Daniel Cheng on 24 Aug 2009 11:07 PM:

> We can't change the code page, ever. Never. Not even a little bit. Not at all.

Okay, now tell me the history of CP950, CP951, HKGCCS and HKSCS.

John Cowan on 24 Aug 2009 11:14 PM:

ISO 8859-14 handles the Celtic languages, and Ŵ and ŵ are mapped to D0 and F0 respectively, replacing Ð and ð.  This would be code page 28604 if it existed, between 28603 = ISO 8859-13 and 28605 = ISO 8859-15.

For all the help that is.
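
The D0/F0 mapping can be checked outside Windows. The following is only a sketch in Python, relying on its standard `iso8859_1` and `iso8859_14` codecs (nothing here implies a Windows code page 28604 exists); the `compare` helper is a hypothetical name used for illustration:

```python
def compare(byte_value: int) -> tuple[str, str]:
    """Decode one byte as ISO 8859-1 and as ISO 8859-14, returning both results."""
    raw = bytes([byte_value])
    return raw.decode("iso8859_1"), raw.decode("iso8859_14")


if __name__ == "__main__":
    for value in (0xD0, 0xF0):
        latin1, celtic = compare(value)
        # Show the code points so the output does not depend on console fonts.
        print(f"0x{value:02X}: 8859-1 U+{ord(latin1):04X}, 8859-14 U+{ord(celtic):04X}")
```

The same byte decodes to Ð or ð under ISO 8859-1 and to Ŵ or ŵ under ISO 8859-14, which is exactly the kind of reinterpretation that makes code page changes risky.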

Michael Everson on 25 Aug 2009 4:18 AM:

See http://en.wikipedia.org/wiki/ISO/IEC_8859-14

pne on 25 Aug 2009 8:28 AM:

> We can't change the code page, ever. Never. Not even a little bit. Not at all.

That makes me wonder how the Euro symbol got retrofitted into some code pages, including (at least) 1252 -- what kind of approval process and arduous backwards-compatibility discussions were required before they could be changed.

Random832 on 25 Aug 2009 9:02 AM:

So was the "can't change the code page. ever." policy not yet in place when the Euro was added?

Of course, there are only five unused positions in CP1252 anyway.
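
One rough way to count those positions, sketched in Python: this assumes Python's `cp1252` codec, which treats unassigned bytes as decoding errors (Windows' own conversion routines may handle these bytes differently). The function name `undefined_bytes` is made up for illustration.

```python
def undefined_bytes(encoding: str) -> list[int]:
    """Return every single-byte value that the codec refuses to decode."""
    result = []
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            # No character is assigned to this byte in the codec's table.
            result.append(value)
    return result


if __name__ == "__main__":
    holes = undefined_bytes("cp1252")
    print(len(holes), [f"0x{v:02X}" for v in holes])
```

Even if every one of those slots were handed out, there would not be room for a full set of extra letters in both cases for more than a couple of languages.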

Michael S. Kaplan on 25 Aug 2009 4:17 PM:

Daniel, see this blog and this one where I describe the mistakes made here, which we have since learned from.

Michael S. Kaplan on 25 Aug 2009 4:21 PM:

Michael and John -- I was referring to no such code pages on Windows. There are code pages for Georgian out there if you look hard enough; Microsoft isn't adding them every time an ISO committee wakes up and tries to get people off needing Unicode....

Michael S. Kaplan on 25 Aug 2009 4:25 PM:

Hey Random832 and pne -- The thing with the Euro I mentioned here; it is in fact one of the reasons the policy EXISTS (the aftermath of that change and the problems it caused).

John Cowan on 12 Sep 2009 7:59 PM:

Anybody can devise a coded character set, and many have.  But *code pages* are from IBM and Microsoft and nobody else, so if one of you doesn't allocate 28604, then it remains unallocated.  But you could.

Michael S. Kaplan on 12 Sep 2009 10:21 PM:

Except that bleeding policy thing keeps rearing its head!

Yuhong Bao on 24 Jan 2011 11:02 PM:

The funny thing is that ISO 8859-14 is from 1998, around the time when the Euro sign was added.

