At long last, explaining the yen/won/backslash bug plausibly

by Michael S. Kaplan, published on 2013/10/31 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2013/10/31/10460718.aspx


As you all know, I was at IUC 37 last week.

While I was there, Google software engineer Jungshik Shin asked me the question that I have been asked many times before.

I've never liked the question, not because I dislike the question itself, but because no one likes my answer.

He gave me his card, but I can't find it. Hopefully he'll see this post, since it's really for him that I'm writing it. Can someone please tell him? Thanks...

It has come up so many times that people will start thinking I'm crazy to keep talking about it, you know?

Look, I'm a keyboard guy. I do keyboards.

If you want to see a reverse solidus aka backslash when looking at U+005c on a Korean or Japanese system, then you just aren't going to get what you want.

Enough people in Japan and Korea want it this way that it will stay this way forever, just about.

One can even claim it is not conformant to the Unicode Standard.

People have been making the claim for years. I even found occurrences in the archives -- even from Jungshik Shin himself!

But I just don't see this one changing.

I will now explain the real, technical reason that the mappings cannot ever change.

At long last, I'm gonna come clean.

I should have done it years ago, but you know what they say about better late than never....

You can blame it on two specific stability policies and one unavoidable but unfortunate decision (of Microsoft, not Unicode).

The policies are:

  1. You cannot change a code page once it is defined.
  2. You cannot change a "best fit" mapping once it is defined.

We have broken both rules in the past, and the consequences have always been dire.

The one unavoidable but unfortunate decision was the choice of which mapping was made the character and which was made the "best fit" mapping.

For the proofs, see:

WindowsBestFit/bestfit932.txt for Japanese:
0x5c    0x005c  ; \ Yen Sign
0x005c  0x005c  ; \ Yen Sign
0x00a5  0x005c  ; \ Yen Sign

WindowsBestFit/bestfit949.txt for Korean:
0x5c    0x005c  ; Won Sign
0x005c  0x005c  ; Won Sign
0x20a9  0x005c  ; Won Sign

For the three lines:

  1. The first line is the JIS or KSC to Unicode conversion;
  2. The second line is the Unicode to JIS or KSC conversion;
  3. The third line is the "best fit" mapping - terrible but changing it breaks all paths in non-Unicode apps. Windows wouldn't ever finish booting in Japanese or Korean!
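
To make the three lines concrete, here is a minimal sketch in Python. It is a toy model built only from the Japanese entries quoted above, not the real Windows conversion tables or any actual Windows API; the names TO_UNICODE, FROM_UNICODE, BEST_FIT, encode and decode are made up for illustration, and reading the second column as the single byte 0x5c is an assumption. It shows that a path typed with a yen sign and a path typed with a backslash come out as the same bytes, which is the behavior non-Unicode apps would be relying on.

```python
# Toy model of the three bestfit932.txt lines quoted above.
# Assumption: these dictionaries stand in for the real tables, which are far larger.
TO_UNICODE = {0x5C: 0x005C}    # line 1: code page byte -> Unicode
FROM_UNICODE = {0x005C: 0x5C}  # line 2: Unicode -> code page byte
BEST_FIT = {0x00A5: 0x5C}      # line 3: "best fit", U+00A5 YEN SIGN -> byte 0x5c


def encode(text: str) -> bytes:
    """Convert Unicode text to code page bytes (ASCII plus the entries above only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in FROM_UNICODE:
            out.append(FROM_UNICODE[cp])   # exact mapping
        elif cp in BEST_FIT:
            out.append(BEST_FIT[cp])       # lossy "best fit" mapping
        elif cp < 0x80:
            out.append(cp)                 # plain ASCII passes through
        else:
            out.append(ord("?"))           # anything else is outside this toy model
    return bytes(out)


def decode(data: bytes) -> str:
    """Convert code page bytes back to Unicode (single-byte range only)."""
    return "".join(chr(TO_UNICODE.get(b, b)) for b in data)


if __name__ == "__main__":
    with_backslash = "C:\\Windows"
    with_yen = "C:\u00a5Windows"
    # Both spellings collapse to the same byte sequence.
    print(encode(with_backslash) == encode(with_yen))
    # The yen sign does not survive the round trip; U+005C comes back instead.
    print(decode(encode("\u00a5")) == "\\")
```

In this model, dropping the BEST_FIT entry would turn the yen sign in "C:¥Windows" into "?", and the path would no longer resolve. That is the kind of breakage the third line is there to prevent.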

Now obviously we would never want to be unbootable. But those compat guarantees are pretty important, too...

Thus we are left here, unable to do better than this but wishing that we could...

So now you know, Jungshik Shin. Does that make it better, or worse? ;-)


Michael S. Kaplan on 31 Oct 2013 7:34 AM:

I should have done this years ago, but we weren't publishing the best fit mappings until more recently...

kinokijuf on 31 Oct 2013 9:13 AM:

This is a font issue. I have seen a Japanese font from Adobe IIRC that had a backslash that looked like a backslash, not like ¥.

Azarien on 1 Nov 2013 4:49 AM:

@kinokijuf: But you don't want to end up in a 100 Backslash Shop...

kinokijuf on 1 Nov 2013 1:51 PM:

@Azarien: The backslash was there all the time, just looking like a yen.

