When ALT+X seems to be failing...

by Michael S. Kaplan, published on 2006/05/29 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/05/29/609737.aspx

I have mentioned the ALT+X mechanism for entering Unicode code points in passing previously, e.g. in posts such as Typing in random Unicode code points.

Michael O'Henly asked via the Contacting Me... link:

Hi Michael...

I'm starting to learn Mandarin and want to be able to take notes using pinyin romanization in Word. Basically, I want to be able to easily create the following characters in a Word document.

ā á ǎ à ō ó ǒ ò ē é ě è ī í ǐ ì ū ú ǔ ù ǖ ǘ ǚ ǜ ü Ā Á Ǎ À Ō Ó Ǒ Ò Ē É Ě È

I think what I'm going to describe is a Word (2003 and 2007) bug, but I'd appreciate your opinion.

In a new document and using a unicode font, enter 01CE followed by Alt-X. You get "ǎ".
Enter 01DA followed by Alt-X, then 01CE followed by Alt-X. You get "ǚǎ". So far so good.
Enter "asdf" followed by 01CE followed by Alt-X. You get "asdf01CE". The substitution of the unicode character doesn't happen.

This behaviour _doesn't_ occur in WordPad -- only in Word.

Any thoughts about why this is happening and/or how I can get around it?
Thank you.

It is funny, I was over in building 36 a few weeks ago, while some of the folks from Office were giving a presentation to some customers about some of the international features in the next version of Office.

Chris Pratley was there, and although I probably hadn't seen him for a while, he actually helped me out recently when he pointed out a feature I did not know about in Office (2002 and later) that is behind the very behavior that Michael hit....

Believe it or not, it is by design!

The <ALT>+X behavior in Word that toggles between Unicode code points and characters only supported UTF-16 code units when it was first introduced, but starting with Office 2002 it will accept code points across the entire Unicode code space!
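To make the code unit versus code point distinction concrete, here is a minimal Python sketch. It is not anything Word itself runs; the function name `utf16_code_units` is made up for illustration, and U+1D11E (MUSICAL SYMBOL G CLEF) is just a convenient example of a character outside the Basic Multilingual Plane.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units for a code point as 4-digit hex strings."""
    data = chr(code_point).encode("utf-16-be")
    # Each UTF-16 code unit is two bytes; split the encoded bytes accordingly.
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]

# A BMP character is a single code unit, identical to its code point.
print(utf16_code_units(0x01CE))   # ['01CE']

# A supplementary character needs a surrogate pair in UTF-16.
print(utf16_code_units(0x1D11E))  # ['D834', 'DD1E']
```

An editor that only understands UTF-16 code units would need D834 and DD1E converted one at a time, while one that accepts code points can take 1D11E directly.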

In this particular case, because what was typed was asdf01ce, where the characters right before the intended 01ce (the d and the f) are themselves valid hexadecimal digits, and because you had not selected the four specific characters you wanted to be converted, Word could not really figure out what you wanted -- and thus did nothing.
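One way to see the ambiguity is to look at the run of hexadecimal digits that ends at the cursor. The sketch below is only an illustration: Word's actual heuristic is not documented here, and `trailing_hex_run` is a hypothetical helper, not a Word API.

```python
import string

MAX_CODE_POINT = 0x10FFFF  # highest code point in the Unicode code space

def trailing_hex_run(text):
    """Return the longest run of hexadecimal digits at the end of `text`."""
    start = len(text)
    while start > 0 and text[start - 1] in string.hexdigits:
        start -= 1
    return text[start:]

for typed in ("01CE", "asdf01CE"):
    run = trailing_hex_run(typed)
    in_range = int(run, 16) <= MAX_CODE_POINT
    print(typed, "->", run, "(valid code point)" if in_range else "(out of range)")
```

For 01CE the run is exactly the intended code point. For asdf01CE the run is df01CE, which is larger than U+10FFFF, so there is no single obvious answer for where the code point was meant to start.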

As a workaround, if you select just the last four characters of asdf01ce (the ones that represent the code point you want to convert) and then hit <ALT>+X, you will end up with the asdfǎ that you were looking for.
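The effect of an explicit selection can be sketched in a few lines of Python. This is an assumption-laden model of the behavior described above, not Word's implementation; `convert_selection` and its `start`/`end` offsets are hypothetical names standing in for the selected range.

```python
def convert_selection(text, start, end):
    """Replace text[start:end], read as a hex code point, with its character."""
    code_point = int(text[start:end], 16)
    return text[:start] + chr(code_point) + text[end:]

# Selecting only "01CE" removes any guessing about where the code point begins.
print(convert_selection("asdf01CE", 4, 8))  # asdfǎ
```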

The reason this does not happen in WordPad is twofold. First, the version of WordPad that ships with some versions of Windows¹ still only has support for UTF-16 code units, so you cannot repro the "problem" there. Second, the "feature" of guessing what you want to convert without a selection does not seem to work exactly the same way.

But if you select the code unit or code point, it will always convert for you!

1 - How far behind the Windows version of RichEdit is from the one in Office is a constant pain point that Office PARTNER SDE Murray Sargent and I have commiserated about in the past!

This post brought to you by ǎ (U+01ce, a.k.a. LATIN SMALL LETTER A WITH CARON)

# Tom Gewecke on 29 May 2006 9:04 PM:

UTF-16 does support the entire Unicode code space, so I assume you actually mean that originally only the BMP/Plane 0 range of characters could be accessed with this system.

# Michael S. Kaplan on 29 May 2006 10:10 PM:

Actually, I meant it a bit more literally -- it supports UTF-16 only -- thus you convert to and from surrogate code units. :-)

# Darrell on 20 May 2008 9:12 PM:

I dont know what Alt+z or alt +x means. How do I apply it to a picture. Will it let me open a picture up and tell me numbers of some sort?? Like gps cords if they were hidden inside the pictures image?

# Michael S. Kaplan on 20 May 2008 11:42 PM:

No, I am saying if you type the four numbers in Word, highlight them, and hit ALT+X it will convert to a character.

# Frank DePuy on 2 Mar 2011 6:17 AM:

I am just starting to use alt-x and found this interesting. I have a separate question. I am trying to use alt-x with unicodes in Access 2003. In particular 2104 for centerline, 2334 for counterbore, and 21a7 for depth. All three return a unknown symbol. Can you point me in the right direction. My e-mail is fdepuy@qualityswitch.com. Thank you.
