U+0080 is not the Euro!

by Michael S. Kaplan, published on 2005/10/26 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/10/26/484481.aspx


Regular reader Maurits asked:

The "official" euro is U+20AC
http://www.fileformat.info/info/unicode/char/20ac/index.htm

But U+0080... while officially a "control character"... has a strikingly familiar image:
http://www.fileformat.info/info/unicode/char/0080/index.htm

What happened there?

Well, I cannot speak for someone else's web site, of course. :-)

If I had to guess, I would scour my brain and then recall the fact that in almost every single Windows code page the Euro is located at position 0x80.
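
To make the mix-up concrete, here is a minimal Python sketch. Python is not part of the original post, and the sketch assumes that Python's built-in cp1252 codec matches the Windows code page 1252 table. It contrasts decoding byte 0x80 through the code page with simply reusing the byte value as a Unicode code point, which is one plausible way a Euro glyph could end up associated with U+0080. The helper name describe is made up for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for one character; control characters have no name."""
    name = unicodedata.name(ch, "<control or unnamed>")
    return f"U+{ord(ch):04X} {name}"


raw = bytes([0x80])

# Decoding through Windows code page 1252 maps byte 0x80 to the Euro sign.
via_code_page = raw.decode("cp1252")

# Reusing the byte value directly as a code point gives the C1 control U+0080.
via_zero_extension = chr(raw[0])

print(describe(via_code_page))       # U+20AC EURO SIGN
print(describe(via_zero_extension))  # U+0080 <control or unnamed>
```

The two results differ only in how the byte was interpreted, so a font or tool that takes the second path while drawing glyphs from a Windows code page would show a Euro at U+0080.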

Bonus points for anyone who knows the two exceptions to this, and the manifestation of each of the exceptions -- WITHOUT looking at the code page tables. You are all on the honor system on this one!

So perhaps it was based on a font that, having nothing better to do with a control character at U+0080 just ended up shoving a Euro in there, much the way one tucks a dollar in an out of the way place for a rainy day? :-)

 

This post brought to you by "€" (U+20ac, a.k.a. EURO SIGN)


# Yoshihiro Kawabata on 26 Oct 2005 4:53 AM:

Hello,

I checked this topic on SQL Server 2000 / Windows XP Pro.

In SQL Server 2000 English Edition,
nchar U+20AC is converted to char 0x80,
and char 0x80 to nchar U+20AC,
as the following code shows:
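declare @a varchar(4)   -- code page string
declare @b nvarchar(4)  -- Unicode string
declare @c varchar(4)   -- code page string, for the round trip

set @a = char(0x80)  -- byte 0x80 in the database's default code page
set @b = @a          -- code page -> Unicode
set @c = @b          -- Unicode -> code page

-- show each value next to its numeric code
select
@a, convert(binary(2), ascii(@a)),
@b, convert(binary(2), unicode(@b)),
@c, convert(binary(2), ascii(@c))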

Also, char(0x80) is treated as a currency symbol:
select convert(money, nchar(0x80) + '1')

But SQL Server 2000 Japanese Edition does not convert to 0x80; the result is 0x3F (a question mark) instead,
and char(0x80) is not treated as a currency symbol.

FYI.

# Michael S. Kaplan on 26 Oct 2005 8:49 AM:

That is not just an FYI, Yoshihiro-san! That is one of the two code pages that do not have a Euro at 0x80! Code page 932....

It was unusual to me at the time, since they actually had the code point free and available. But I do understand now after realizing how bad it is to change code pages.

Good job!
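
For readers without SQL Server at hand, the difference between the two editions can be approximated in Python. This is only a sketch under assumptions: it relies on Python's built-in cp1252 and cp932 codecs rather than on SQL Server's own conversion, and errors="replace" stands in for whatever substitution SQL Server performs. The helper name to_code_page is made up for illustration.

```python
EURO = "\u20ac"


def to_code_page(text: str, code_page: str) -> bytes:
    """Encode text, substituting '?' for characters the code page lacks."""
    return text.encode(code_page, errors="replace")


# Code page 1252 has the Euro at 0x80.
print(to_code_page(EURO, "cp1252").hex())  # 80

# Code page 932 has no Euro, so the replacement character 0x3F ('?') appears.
print(to_code_page(EURO, "cp932").hex())   # 3f
```

The 0x3F in the second line is the same value reported above for the Japanese edition.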

# Yuhong Bao on 24 Jan 2011 11:13 PM:

"So perhaps it was based on a font that, having nothing better to do with a control character at U+0080 just ended up shoving a Euro in there, much the way one tucks a dollar in an out of the way place for a rainy day? :-)"

Well, the same problem is encountered with HTML character entities. Here is a clue: where do the first 256 characters of Unicode come from?


