I'd rather call it the path separator

by Michael S. Kaplan, published on 2005/10/12 03:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/10/12/479561.aspx


I am talking about the reverse solidus, U+005c (also known as the backslash to some).

It is even known as the "whack" to others (when some people talk about UNC paths, they will say 'whack whack servername' when they mean \\servername).

I am sure you may know of other names.

Though if you are on a Windows system with a Korean system locale, you have another name for it -- Won sign (I talked about this a little bit here).

And if you are on a Windows system with a Japanese system locale, you have yet another name for it -- Yen sign (I hinted at this briefly here).

The reason?

Well, Larry Osterman talked back in June about why the DOS PATH character is '\'.

However, in the JIS and KSC standards, 0x5c is where the Yen and the Won are stored. And it is not like you could have a system without paths just because it has a Japanese or Korean configuration, right?

So in code page 932, 0x5c (YEN SIGN) has to have a round trip mapping to U+005c (REVERSE SOLIDUS), with a mere best fit mapping to U+00a5 (YEN SIGN).

And in code page 949, 0x5c (WON SIGN) has to have a round trip mapping to U+005c (REVERSE SOLIDUS), with a mere best fit mapping to U+20a9 (WON SIGN).

In both cases, the importance of money was overshadowed by the importance of a path separator (this is, by the way, yet another reason to be sure to always use Unicode!).
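The mapping described above can be poked at from Python, as a rough sketch. One loud assumption: this uses Python's own `cp932` and `cp949` codecs, which are not the Windows conversion tables and are not guaranteed to reproduce the Windows best fit behavior. The helper names `decode_5c` and `try_encode` are made up for this illustration. The round trip between byte 0x5c and U+005c should hold in both codecs; whether the currency code points encode at all depends on the codec, so the sketch reports the result rather than assuming one.

```python
# Sketch: how byte 0x5C behaves in Python's cp932 and cp949 codecs.
# Assumption: these are Python's codecs, not the Windows tables themselves,
# so Windows "best fit" mappings may not be reproduced here.

def decode_5c(codec: str) -> str:
    """Decode the single byte 0x5C with the named codec."""
    return b"\x5c".decode(codec)

def try_encode(ch: str, codec: str):
    """Return the encoded bytes, or None if the codec has no mapping for ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

# Pair each code page with the currency sign that shares the 0x5C slot.
for codec, currency in (("cp932", "\u00a5"), ("cp949", "\u20a9")):
    decoded = decode_5c(codec)
    print(codec, "0x5C ->", f"U+{ord(decoded):04X}")
    print(codec, "U+005C ->", "\\".encode(codec))
    print(codec, f"U+{ord(currency):04X} ->", try_encode(currency, codec))
```

If the last line for a code page prints `None`, that just means the Python codec is strict about that character; it says nothing about what Windows does with its best fit tables.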

This all came up yesterday when, in a comment to my post Show me the [small]money, Yoshihiro Kawabata mentioned that U+005c was also being accepted -- he even ran through every character in the BMP (the SQL he ran is available online!) to see what else works in SQL Server as a "currency sign". It looks like they did indeed add some more currency symbols in SQL Server 2005, although the Yukon documentation has not completely caught up with the list. The accepted symbols, with the Yukon additions marked by an asterisk, are:

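| Codepoint | Symbol | Name | Added in SQL 2005 |
|---|---|---|---|
| U+0024 | $ | DOLLAR SIGN | |
| U+005C | \ | REVERSE SOLIDUS | |
| U+00A2 | ¢ | CENT SIGN | * |
| U+00A3 | £ | POUND SIGN | |
| U+00A4 | ¤ | CURRENCY SIGN | |
| U+00A5 | ¥ | YEN SIGN | |
| U+09F2 | ৲ | BENGALI RUPEE MARK | |
| U+09F3 | ৳ | BENGALI RUPEE SIGN | |
| U+0E3F | ฿ | THAI CURRENCY SYMBOL BAHT | |
| U+17DB | ៛ | KHMER CURRENCY SYMBOL RIEL | * |
| U+20A0 | ₠ | EURO-CURRENCY SIGN | |
| U+20A1 | ₡ | COLON SIGN | |
| U+20A2 | ₢ | CRUZEIRO SIGN | |
| U+20A3 | ₣ | FRENCH FRANC SIGN | |
| U+20A4 | ₤ | LIRA SIGN | |
| U+20A5 | ₥ | MILL SIGN | * |
| U+20A6 | ₦ | NAIRA SIGN | |
| U+20A7 | ₧ | PESETA SIGN | |
| U+20A8 | ₨ | RUPEE SIGN | |
| U+20A9 | ₩ | WON SIGN | |
| U+20AA | ₪ | NEW SHEQEL SIGN | |
| U+20AB | ₫ | DONG SIGN | |
| U+20AC | € | EURO SIGN | |
| U+20AD | ₭ | KIP SIGN | |
| U+20AE | ₮ | TUGRIK SIGN | |
| U+20AF | ₯ | DRACHMA SIGN | |
| U+20B0 | ₰ | GERMAN PENNY SIGN | |
| U+20B1 | ₱ | PESO SIGN | |
| U+FDFC | ﷼ | RIAL SIGN | * |
| U+FE69 | ﹩ | SMALL DOLLAR SIGN | * |
| U+FF04 | ＄ | FULLWIDTH DOLLAR SIGN | * |
| U+FFE0 | ￠ | FULLWIDTH CENT SIGN | * |
| U+FFE1 | ￡ | FULLWIDTH POUND SIGN | * |
| U+FFE5 | ￥ | FULLWIDTH YEN SIGN | * |
| U+FFE6 | ￦ | FULLWIDTH WON SIGN | * |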

(Note that, as I talked about in Show me the [small]money, most of the currency symbols are still not on the list, despite the additions that have been made!)
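One way to see both points (U+005c is the odd one out on the list, and plenty of currency symbols are missing from it) is to compare the table against the Unicode general category Sc (Symbol, Currency). The following is only a sketch: the `LISTED` code points are transcribed by hand from the table above, the names `LISTED`, `is_currency`, `listed_not_sc`, and `sc_not_listed` are invented for this example, and the set of Sc characters depends on the Unicode version that ships with your Python, so it will not match the 2005 picture exactly.

```python
import unicodedata

# Code points from the table above (transcribed by hand for this sketch).
LISTED = (
    [0x0024, 0x005C, 0x00A2, 0x00A3, 0x00A4, 0x00A5,
     0x09F2, 0x09F3, 0x0E3F, 0x17DB]
    + list(range(0x20A0, 0x20B2))   # U+20A0 through U+20B1
    + [0xFDFC, 0xFE69, 0xFF04, 0xFFE0, 0xFFE1, 0xFFE5, 0xFFE6]
)

def is_currency(cp: int) -> bool:
    """True if the code point has general category Sc (Symbol, Currency)."""
    return unicodedata.category(chr(cp)) == "Sc"

# Listed code points that Unicode does not classify as currency symbols.
listed_not_sc = [cp for cp in LISTED if not is_currency(cp)]

# BMP currency symbols (per this Python's Unicode data) absent from the list.
sc_not_listed = [cp for cp in range(0x10000)
                 if is_currency(cp) and cp not in LISTED]

print("Listed but not Sc:", [f"U+{cp:04X}" for cp in listed_not_sc])
for cp in sc_not_listed:
    print(f"Sc but not listed: U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The first line of output should single out U+005C, which Unicode treats as punctuation rather than as a currency symbol.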

I am not going to get into why I think this special relationship between U+005c, U+00a5, and U+20a9 feels kind of disconnected and odd and awkward, because I understand (given the absolute historical identity and need of the path separator) why it has to be there.

And I do understand why it would be on SQL Server's magical list of things that are treated as currency symbols (after all, customers adding currency data from non-Unicode Korean and Japanese sources should not fail).

Given how we as a company try to act like the whole issue is not there a lot of the time, I even understand why it would never have been documented, since documenting it feels kind of awkward.

So I would rather call it a path separator on Windows, since that is true on every language configuration upon which Windows is run. It makes the whole situation a lot easier to deal with, conceptually....

(In the meantime, feel free to vote on Kawabata-san's issue related to the unlisted U+005c in the documentation at the MSDN Product Feedback Center!)

 

This post brought to you by "\" (U+005c, a.k.a. REVERSE SOLIDUS)


# Ben Bryant on 12 Oct 2005 10:19 AM:

I agree except for the potential confusion of "path separator" with the "forward slash" especially as URLs become the most commonly seen path-like things and in many cases even in Windows it seems you can use the forward slash for a path separator on the file system too. The backslash path separator seems more and more relegated to DOS and programmers, rather than for common usage. Then again you do see it if you display and use the path in Windows File Explorer. But anyway, it is a good post covering the issue well. Thanks

# Michael S. Kaplan on 12 Oct 2005 11:10 AM:

Of course, technically it is too late to change the actual name (and Unicode, which is meant for all platforms, cannot name something that is used that way on just one). But maybe if the issues can be sorted out here then at least they will be less confusing for the next person....

# Maurits [MSFT] on 12 Oct 2005 4:15 PM:

So if you're on a Japanese or Korean version of Windows, this means you can't create a folder or file with your currency symbol in the name... and if you do a directory listing, instead of backslashes you see currency symbols?

# Michael S. Kaplan on 12 Oct 2005 4:33 PM:

Actually, it is interesting -- you should be able to create the files if you use the non-U+005c code points, but you probably will not be able to access the files from non-Unicode apps with KOR/JPN system locales....

# Yoshihiro kawabata on 13 Oct 2005 5:43 AM:

Thank you for picking up this U+005c issue.
Yes, U+005c is a major issue in Unicode.
And this is the first time I have seen that the WON SIGN works the same way.

# Nick Lamb on 14 Oct 2005 8:46 AM:

Is it true then that U+005c looks different (as in, it looks like ¥) on Japanese Windows systems?

If so, isn't it key to GIFT's mission to deliver replacement fonts that fix this issue (shouldn't be hard, it's just one substitute glyph across maybe a dozen key fonts) ?

# Michael S. Kaplan on 14 Oct 2005 10:13 AM:

Hi Nick,

If it was considered a bug to fix, that would be true. But it is an intentional design decision, so there is nothing to fix.

You do not need a Japanese copy of Windows to see it. If you change your default system locale ("language for non-Unicode programs") to Japanese and reboot, you will see it as well in all of the paths in the cmd window, etc.

# Michael S. Kaplan on 14 Oct 2005 10:15 AM:

To be clear about GIFT's "mission" here, it is not some neo-imperialistic plan to tell Japanese and Korean users that their expectations built over the last few decades are wrong. It is to deliver that expectation to them....

# Massuda on 1 Nov 2005 11:21 AM:

I am Brazilian and I am very surprised with U+20A2... cruzeiro was the Brazilian currency 10-15 years ago!

Never noticed that it was part of unicode standard.

The curious thing is that I never saw that sign in my life, because cruzeiro sign was written "Cr$".

By the way, the Real (R$) has been the Brazilian currency since June 1994.
