I WON to talk about the YEN
by Michael S. Kaplan, published on 2005/11/01 10:15 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/11/01/487665.aspx
(terrible pun in the title, sorry!)
It was early last month that I shook my head about the REVERSE SOLIDUS and stated for all the world to hear that I'd rather call it the path separator.
And I have been a strong advocate for keeping everything in Unicode, pointing out that code page 932 and code page 949, while very useful for many purposes, are like poison for the YEN SIGN (U+00a5) and the WON SIGN (U+20a9), since any time those characters round-trip through one of those two code pages they come back as U+005c (REVERSE SOLIDUS).
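To make the "poison" concrete, here is a toy model of that round trip, written in Python only for illustration. It assumes a tiny hand-written best-fit table (the names encode_toy, decode_toy and round_trip are hypothetical); it is not the real code page 932/949 conversion data, just enough to show that once the currency sign becomes byte 0x5C, nothing can tell it apart from a backslash.

```python
# Toy model of the lossy round trip. ASSUMPTION: this tiny hand-written
# table only imitates the best-fit behavior for the two currency signs; it
# is not the real code page 932/949 conversion data.
YEN_SIGN = "\u00a5"
WON_SIGN = "\u20a9"

# Hypothetical best-fit entries: each currency sign falls back to byte 0x5C.
BEST_FIT = {
    932: {YEN_SIGN: 0x5C},
    949: {WON_SIGN: 0x5C},
}


def encode_toy(text: str, code_page: int) -> bytes:
    """Encode ASCII plus the modeled best-fit characters into single bytes."""
    table = BEST_FIT[code_page]
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        elif ch in table:
            out.append(table[ch])  # lossy: now indistinguishable from "\"
        else:
            raise ValueError(f"U+{ord(ch):04X} is outside this toy model")
    return bytes(out)


def decode_toy(data: bytes) -> str:
    """Decode the ASCII range; byte 0x5C always becomes U+005C."""
    return data.decode("ascii")


def round_trip(text: str, code_page: int) -> str:
    return decode_toy(encode_toy(text, code_page))
```

With this model, round_trip("\u00a5500", 932) gives back "\\500": the yen is gone, and the bytes carry no record that it was ever there.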
But you may have noticed that I palmed a card there when I told people to keep it in Unicode to solve the problem, since that advice ignores the fact that neither U+00a5 nor U+20a9 is on the keyboards used by Japanese or Korean users.
So while it is true that keeping text in Unicode will solve the problem, it only does so if you use the real YEN and WON in the first place; otherwise the missing keys mean the problem is already there before any code page is involved....
(Ben Bryant has been pouncing on me lately about this issue in the newsgroups, which he describes in the post he wrote entitled Phantom Currency Signs in Japan and Korea.)
Once you include that aspect, this is not a trivial problem to solve, especially since it will continue until and unless the keyboard issue is solved.
Now we have some of the tools in place to help here:
- As I pointed out here and here, both the Japanese and Korean collations on Windows treat the REVERSE SOLIDUS and the corresponding currency sign as equal, which assures us that people who type the actual currency signs will not be punished for it.
- On Japanese and Korean systems, the REVERSE SOLIDUS looks the same as the currency sign.
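The collation equivalence in the first point can be pictured with a small sketch. This is only an illustration of the idea, assuming a hypothetical fold_for_locale/equal_for_locale pair keyed by language; the real Windows collation (CompareString and friends) uses its own tables and many more rules.

```python
# Sketch of the equivalence idea: for Japanese and Korean, compare strings
# as if U+005C and the local currency sign were the same character.
# ASSUMPTION: hypothetical helpers, not how Windows collation is built.
EQUIVALENT_TO_BACKSLASH = {
    "ja": "\u00a5",  # YEN SIGN
    "ko": "\u20a9",  # WON SIGN
}


def fold_for_locale(text: str, language: str) -> str:
    """Map the locale's currency sign onto U+005C so both compare equal."""
    sign = EQUIVALENT_TO_BACKSLASH.get(language)
    return text.replace(sign, "\\") if sign else text


def equal_for_locale(a: str, b: str, language: str) -> bool:
    return fold_for_locale(a, language) == fold_for_locale(b, language)
```

Under this sketch "\\500" and "\u00a5500" compare equal for "ja" but not for other languages, which is the behavior that keeps users of the real YEN from being punished.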
Unfortunately, the PATH SEPARATOR is still a more crucial character on Windows than the currency sign -- imagine an OS with no paths! Would Windows even install? (probably not!).
Ben is right that making the YEN and the WON also act as path separators would have fixed the problem, but that suggestion is over a decade too late now. It is still a tempting, bordering on tantalizing, idea, since the number of true YEN and WON characters in the wild has to be small given that they are on no keyboards....
Until reality sets in and I remember that the NLS locale data properly assigns the true YEN and WON to the LOCALE_SCURRENCY fields for the Japanese and Korean locales, which means anyone who has ever used a program that calls GetCurrencyFormat might have them. And our group cannot punish users who use Unicode and the NLS API functions properly without seeming like monsters!
Even if we added the characters to the keyboard tomorrow, how would we explain to users that two characters that look the same should be used in two different circumstances?
Which may be why, as I pointed out in When is a backslash not a backslash?, users in Japan and Korea are often fatalistic and resigned about it.
Perhaps one of the reasons that there is no big push to solve the problem is that Japanese and Korean are not widely used as languages outside of their respective countries, so there is not enough of a market segment that would benefit from adding the extra keys to the keyboard. At least not enough to outweigh the confusion.
But now that I am thinking about it again, perhaps that segmentation of the market will provide a chance to solve the problem, too. Let me think on this one a bit more, I may have an idea....
This post brought to you by "¥" and "₩" (U+00a5 and U+20a9, a.k.a. YEN SIGN and WON SIGN)
# Stuart Ballard on 1 Nov 2005 11:31 AM:
Explain to me again why you can't just change the keyboard layout so that the key that (presumably) has the YEN or WON symbol on it actually inserts the correct character for that symbol?
I suppose the problem is that then there's no easy way to enter the path separator character and that's why making the YEN and WON also act as path separators is so tempting.
But do you really think that on a system where to most users the currency symbol is the path separator, users would be creating files with the currency symbol in the filename? A currency symbol that they can't even type directly? How common are files with "$" in the name in US systems even?
Can you even always assume that the currency symbol is a legal character in a filename? Since the only plausible culprit for such files would be programs using the currency symbol APIs, wouldn't such programs be broken if they assumed it was a permitted character?
I guess what I'm saying is that the behavior of any such existing file would be so utterly broken anyway - the user could never search for it, renaming it would have utterly bizarre behavior if you deleted the currency symbol and re-typed it, etc etc - that doing some kind of translation where the files got renamed to something that didn't include the currency symbol would be no more broken than it already is.
I'm assuming that Korean and Japanese users don't actually press a key that looks like "\" to enter their currency symbols...
# Ben Bryant on 1 Nov 2005 11:46 AM:
Thanks for your mention. I don't think there is a nice solution. I just think that in programming for the Japanese and Korean locales, programmers should simply be aware that if their text data contains money-related stuff as entered into their program via a Unicode edit box, OnKeyDown or existing Unicode text data, they may need to convert U+005c to U+00a5 or U+20a9. This is especially important if the text data may be used internationally.
Oh, and I don't want the yen and won to act as path separators -- that would not be a good thing -- I only posed that far-fetched and undesirable option to clarify the situation.
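Ben's conversion could be sketched roughly like this. The rule used here, that a U+005C directly followed by a digit is a currency sign, is purely an assumed heuristic, and restore_currency_signs is a hypothetical name; a path such as C:\2005\report.txt would defeat it, so it only makes sense on fields already known to hold amounts.

```python
import re

# Sketch of the U+005C -> U+00A5 / U+20A9 conversion for money fields.
# ASSUMPTION: the "backslash before a digit" heuristic is hypothetical and
# only safe on text you already know is monetary.
CURRENCY_SIGN = {"ja": "\u00a5", "ko": "\u20a9"}

_BACKSLASH_BEFORE_DIGIT = re.compile(r"\\(?=\d)")


def restore_currency_signs(text: str, language: str) -> str:
    """Replace U+005C followed by a digit with the locale's currency sign."""
    sign = CURRENCY_SIGN.get(language)
    if sign is None:
        return text
    return _BACKSLASH_BEFORE_DIGIT.sub(sign, text)
```

For example, restore_currency_signs("Total: \\1,500", "ja") yields "Total: \u00a51,500", while "C:\\temp" is left alone because no digit follows the backslash.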
# Maurits [MSFT] on 1 Nov 2005 12:31 PM:
Actually, I /would/ like to be able to type arbitrary filenames. Can't an encoding scheme be developed?
# Mihai on 1 Nov 2005 1:11 PM:
Although the problem is complicated, in real life I have not seen many Japanese users complaining.
The main reason is that most Japanese text uses the wide versions of Katakana (including the Windows UI starting with W95).
As a result, the users will use the wide Yen (U+FFE5) when they talk about currency and the reverse solidus (U+005C) for file paths.
Nobody uses U+00A5, because it is not on the keyboard.
Not 100% accurate as an algorithm, but in real life one can assume that U+005C is always the path separator.
This might (kind of) solve the Japanese problem, but we still have the Won. Anyone? :-)
# Michael S. Kaplan on 1 Nov 2005 1:33 PM:
Like I said, Mihai -- our NLS API functions like GetLocaleInfo and GetCurrencyFormat will use U+00a5 and U+20a9. Which is why that equivalence is so useful (even if Norman does not like it much!).
# Ben Bryant on 1 Nov 2005 2:09 PM:
Okay, thank you so much Mihai. So the keyboard produces U+FFE5 for the yen sign. That is the real world observation I was looking for and that solves the problem for all practical purposes. I assume there would be an equivalent in Korea because this makes a lot of sense. They use the wide characters instead of the half-width stuff when they are fully writing in their own language. For pathnames I assume there is a tradition of using the half-width characters going back to a lot of older more ASCII-bound software.
# Michael S. Kaplan on 1 Nov 2005 3:47 PM:
U+ffe6 is the FULLWIDTH WON SIGN. Though see my warning above about the locale data and collation equivalence....
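With five look-alike characters in play (U+005C, U+00A5, U+20A9, U+FFE5, U+FFE6), it helps to be able to see which ones a string really contains. A small diagnostic sketch using Python's standard unicodedata module; describe_lookalikes is a hypothetical helper name.

```python
import unicodedata

# The look-alike characters discussed in this thread.
LOOKALIKES = {"\u005c", "\u00a5", "\u20a9", "\uffe5", "\uffe6"}


def describe_lookalikes(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each look-alike character, in order of appearance."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in text
        if ch in LOOKALIKES
    ]
```

Run over a string containing all five, it reports REVERSE SOLIDUS, YEN SIGN, WON SIGN, FULLWIDTH YEN SIGN and FULLWIDTH WON SIGN, even though a Japanese or Korean font may draw several of them identically.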
# Michael S. Kaplan on 1 Nov 2005 3:49 PM:
Also, Ben, the rule about half vs. full width is not as clear cut as you are making it sound here.
I talked a little bit about this issue in the following post:
# Mihai on 1 Nov 2005 4:30 PM:
"the rule about half vs. full width is not as clear cut as you are making it sound here."
Indeed, it is not clear.
You depend on the user doing the right thing.
Pretty much like you depend on the English user to use ndash instead of minus.
Although in general the Japanese are careful about details and about doing "the right thing," the rule is not carved in stone.
Doing some statistics on existing text might confirm or infirm my empirical observation :-)
# Michael S. Kaplan on 1 Nov 2005 4:52 PM:
The reason that Microsoft Access uses the halfwidth characters is that the user community found the fullwidth characters to be ugly in property sheets. With reasons like THAT guiding the choices, the fact that major functionality is affected is a real problem if we want to use the fullwidth characters as the solution (which is why I flagged the issue)....
# Ben Bryant on 1 Nov 2005 5:21 PM:
Mihai - "confirm or infirm my empirical observation" I like it :)
Michael, if the NLS APIs and Access use halfwidth U+00a5 then I don't see a problem as long as the text stays in Unicode. The problem seems to me to be when the user cannot or does not differentiate yen from U+005c, especially in a Unicode edit box. If there is indeed a tradition of using the fullwidth yen sign in Japanese text referring to money, this problem is alleviated. I wish some Japanese users would chime in to confirm or infirm this.
# Michael S. Kaplan on 1 Nov 2005 6:11 PM:
If I validate by comparing what the user types to what is in the locale data, them not being equal is a big deal.
If I find the fullwidth to look ugly in my opinion and therefore do not use it, then that is also a big deal.
I am simply pointing out that it is not as easy as is being claimed here -- there are still issues/concerns that are equal to what you thought was a problem before, esp. since the Japanese and Korean keyboards do not have the fullwidth chars and cannot be changed (even though they can get to them via the IMEs if they want to).
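On the validation point, one partial mitigation (an assumption on my part, not something the NLS API does for you) is to compare under Unicode NFKC normalization, which folds U+FFE5 onto U+00A5 and U+FFE6 onto U+20A9. It does nothing for U+005C, so the keyboard problem from the post remains. same_currency_symbol is a hypothetical helper.

```python
import unicodedata

# Compare what the user typed with the locale's currency symbol after NFKC.
# NFKC maps FULLWIDTH YEN/WON SIGN to YEN/WON SIGN, but leaves U+005C alone.
def same_currency_symbol(typed: str, locale_symbol: str) -> bool:
    return (unicodedata.normalize("NFKC", typed)
            == unicodedata.normalize("NFKC", locale_symbol))
```

So a fullwidth yen typed through the IME would match locale data containing U+00A5, but a backslash typed from the keyboard still would not.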
# Nick Lamb on 1 Nov 2005 10:03 PM:
This post brought to you by "" and "" (U+00a5 and U+20a9, a.k.a. YEN SIGN and WON SIGN)
blogs.msdn.com playing up again? Or are the missing characters a sort of joke at your own expense?
Maybe I should try... "\¥₩/ /₩¥\"
# Michael S. Kaplan on 1 Nov 2005 10:07 PM:
Hmmm.... not sure what happened there, Nick.
Fixed now, I hope!
# Christian Kaiser on 3 Nov 2005 6:10 AM:
Well then just use the forward slash as path separator :)
Works in all cases (well, except UNC names. Sigh).
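As an illustration of the forward-slash suggestion, Python's pathlib.PureWindowsPath parses Windows-style paths on any OS and accepts "/" as well as "\" as a separator. Note that this is Python's own parsing, not the Win32 API, and the path is a made-up example.

```python
from pathlib import PureWindowsPath

# PureWindowsPath accepts "/" and renders the path with "\".
p = PureWindowsPath("C:/Users/example/report.txt")
print(str(p))   # C:\Users\example\report.txt
print(p.parts)  # ('C:\\', 'Users', 'example', 'report.txt')
```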