It's not just ordinary script, dude -- it's superscript!

by Michael S. Kaplan, published on 2006/12/21 12:21 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/12/21/1341248.aspx

No need to involve the Private Use Area, since all the digits are available in superscript form! See:

As for why they are out of order (a common question that people ask), the first three (superscript one, two, and three) were inherited from ISO 8859-1, and the rest were added all at once to fill out the set. Characters cannot be moved, so the odd placement is just the way things are. Many people wonder whether some kind of conceptual ordering in tools like the Windows Character Map (charmap.exe) would make sense here, though in practice it might make things harder to find at times. Not to mention all the arguments about whether all the ones go together, or all the superscript things, or all the numbers, and so on.
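The scattered placement is easy to see by listing the code points. The following Python sketch is only an illustration: the names `SUPERSCRIPT_DIGITS` and `to_superscript` are made up for this example and are not part of any library.

```python
import unicodedata

# Code points of the ten superscript digits, indexed by digit value.
# 1, 2, and 3 sit in the Latin-1 range; the rest live in U+2070..U+2079.
SUPERSCRIPT_DIGITS = [
    0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074,
    0x2075, 0x2076, 0x2077, 0x2078, 0x2079,
]

def to_superscript(text: str) -> str:
    """Replace each ASCII digit in text with its superscript form (hypothetical helper)."""
    return "".join(
        chr(SUPERSCRIPT_DIGITS[int(ch)]) if ch in "0123456789" else ch
        for ch in text
    )

if __name__ == "__main__":
    # Print each digit value with its code point and Unicode character name.
    for value, cp in enumerate(SUPERSCRIPT_DIGITS):
        print(f"{value}: U+{cp:04X} {unicodedata.name(chr(cp))}")
    print(to_superscript("x2 + 10"))
```

Because the code points are not contiguous, a lookup table like this is needed; simple arithmetic on the code point of the ordinary digit will not work.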

For what it is worth, things went better with their "below the line" cousins the subscript numbers:

The original Unicode names of several of these numbers, prior to the merger with ISO 10646, were SUPERSCRIPT DIGIT * rather than just SUPERSCRIPT *. Since they are not really digits in the conventional sense (their Unicode general category is No rather than Nd), this was probably a good change coming out of the merger....
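The category difference can be checked with Python's standard `unicodedata` module. This is just a sketch; `describe` is a hypothetical helper name chosen for the example.

```python
import unicodedata

def describe(ch: str):
    """Return (name, general category, str.isdecimal result) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isdecimal())

if __name__ == "__main__":
    # Ordinary two, superscript two, subscript two.
    for ch in ("2", "\u00b2", "\u2082"):
        name, category, is_decimal = describe(ch)
        print(f"U+{ord(ch):04X} {name}: category={category}, isdecimal={is_decimal}")
```

Only the ordinary digit reports category Nd; the superscript and subscript forms report No, which is why they are not treated as decimal digits.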

As you may or may not see in the above, on different browsers and different platforms you will not always see the full set. Alternatively, some digits may not look like they are using the same style. This is because not all fonts cover all 20 characters, so you are seeing fonts being linked or substituted in.

Hi Michael (or Arya),

Don't mean to turn your blog into an Excel tutorial, but I'm a bit confused. How would one accomplish what you have described above? I can put a superscript "2" in Excel with the formula =char(178), which makes sense to me, as 0x00B2=178. And if I have the superscript 2 in a cell, I can use =code(<cell reference>) to get the value 178.

But what function (or failing that, a UDF) could I use to leverage the knowledge that, say, Unicode 2078 is superscript 8 to enter a superscript 8 into a cell?

I played around in charmap to see if I could figure it out, to no avail. I can find superscript 2 in several typefaces by entering 00B2 in the "Go to Unicode" field; however, entering 2078 takes me to the next highest character, U+207F (superscript Latin small letter N). Is this because the font I am currently looking at in charmap lacks some of the codepoints in your list above?

On a side note, I discovered that when I use the "Go to Unicode" field in charmap (XP version), it takes the focus to that character (which I expect), but it also makes that character the first one in the list. Or, to be more specific, it renders me unable to scroll up to any of the earlier codepoints (until I hit Reset). For example, if you select Arial from the dropdown list and then enter 0035, "5" is the first character in the list, and 0-4 and the characters above them (the first row) are unavailable. Similarly, if you enter FEFC, the scrollbar disappears entirely and you can only see the last two characters. Is this a bug in charmap, or is it just by design?

Thanks in advance for your help,

Rob

Dean,

Thanks for the info. Yes, I realize that I can copy to the clipboard from charmap. So if my end goal was just to get those characters into a cell I would do that straight away.

It was more a case that I had gotten to playing with the char function and wondered why it worked with some values and not others. I guess the answer lies in the fact that it only takes an 8-bit value as an argument.

Rob

Rob -- the CHAR() worksheet function only supports ANSI.

But you can use VBA and, for example, wrap the ChrW function as described in this article:

' Wraps VBA's ChrW so a worksheet formula can call it, e.g. =ChrWwrap(8312) for U+2078.
Public Function ChrWwrap(code As Variant) As Variant
    ChrWwrap = ChrW(code)
End Function

Screenshot:

http://tkfiles.storage.msn.com/x1pwXU7dy1y8pxYSTluSBHY_n9N25g8XrubvAoYuE87kBgTnzwIhacmN7UUxR0ZlgN4Qx2rzusXeJlDpQJRGof_rsF15AF9DD275NElGmTfg4A

First one: ordinary 1 (U+0031)

Second one: ordinary 1 with superscript flavor applied

Third one: superscript 1 (U+00B9)

Fourth one: superscript 1 with superscript flavor applied
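
For readers outside Excel, the 8-bit limit of CHAR() can be sketched in Python. This is only an illustration under stated assumptions: it assumes the ANSI code page is Windows-1252, and `ansi_char` and `fits_in_ansi` are hypothetical names, not Excel or VBA functions.

```python
# Assumption: "ANSI" here means Windows-1252 (Western European Windows).
ANSI = "cp1252"

def ansi_char(code: int) -> str:
    """Rough analogue of CHAR(): decode a single byte in the assumed ANSI code page."""
    return bytes([code]).decode(ANSI)

def fits_in_ansi(ch: str) -> bool:
    """True if the character can be encoded in the assumed ANSI code page."""
    try:
        ch.encode(ANSI)
        return True
    except UnicodeEncodeError:
        return False

if __name__ == "__main__":
    print(ansi_char(178) == "\u00b2")   # byte 178 is superscript two
    print(fits_in_ansi("\u2078"))       # superscript eight has no ANSI byte
    print(f"U+{ord(chr(0x2078)):04X}")  # chr() accepts any code point, much like ChrW
```

Superscript one, two, and three have single-byte values in that code page, while the others can only be reached through a Unicode-aware function.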

Where is the limit between actual character and layout? I mean a 2 is a 2, whether it appears written in small above the rest of the text or not. Isn't it?

What kind of decision leads to consider superscript 2 as a character different enough from plain vanilla 2 to make it a Unicode code point?

Is it simply some kind of backward compatibility because it appears in widely used codepages? The reason why these superscripts appear in older codepages might be different (and more pragmatic): 20 years ago, not all programs that needed to display common superscript digits such as 2 and 3 had graphical display capabilities. Making them characters in their own right therefore allowed programs to display such superscript digits easily.

Oh, wait! Didn't I just answer my own question?