by Michael S. Kaplan, published on 2005/02/26 15:26 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/02/26/381020.aspx
The title of this post is a slightly sanitized version of an actual email title I received, from a developer named Nancy who says she has been a faithful reader of this blog since it started. She noticed that passing the MB_USEGLYPHCHARS flag usually has no effect, but occasionally makes a code page non-functional. Looking at a flag that either does nothing or makes the API do nothing, she wondered whether it should actually be renamed to MB_DONTSTARTNONEWONTBENONE.
I'll put in that request right away. :-)
Seriously, Nancy, I'll explain what this flag does now....
The MultiByteToWideChar API accepts a flag named MB_USEGLYPHCHARS. The documentation for this flag is quite vague, saying only:
"Use glyph characters instead of control characters."
Not very descriptive, is it? :-)
The glyph characters are an OEM-code-page-only contraption, with the same table in every one of them [1], even code pages for languages that cannot be used in the console, like Hebrew and Arabic. They are great for a little bit of legacy artwork in the console, I suppose. All of this stuff is just inherited from the DOS world, and these things are so rarely used that they are not exposed in the .NET Framework at all.
If a code page has such a "glyph" table and you specify the MB_USEGLYPHCHARS flag, then these are the only mappings you will get when converting to Unicode. Which is probably why people often don't use the flag -- getting no results back for data does not seem like a very useful exercise....
Here for the sake of completeness is the full "glyph" table:
That and "ASCII art" style things is about the best you can do in the console....
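To make "glyph characters instead of control characters" a little more concrete, here is a minimal Python sketch. It does not call MultiByteToWideChar. It assumes the glyph table matches the familiar CP437 graphics assignments for bytes 0x01 through 0x1F and 0x7F, which is an assumption and not data read out of Windows. The names GLYPH_TABLE, decode_plain and glyph_for are hypothetical and exist only for this illustration.

```python
import unicodedata

# Sketch only: these code points are the familiar CP437 "graphics" readings
# of bytes 0x01-0x1F, assumed here rather than read out of Windows.
_GLYPH_CODE_POINTS = [
    0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25D8,
    0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C, 0x25BA,
    0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8, 0x2191,
    0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
]

# Hypothetical name: byte value -> glyph character.
GLYPH_TABLE = {0x01 + i: chr(cp) for i, cp in enumerate(_GLYPH_CODE_POINTS)}
GLYPH_TABLE[0x7F] = chr(0x2302)  # HOUSE, assumed for the DEL byte


def decode_plain(data: bytes) -> str:
    """Ordinary CP437 decode: control bytes stay control characters."""
    return data.decode("cp437")


def glyph_for(byte: int):
    """Return the glyph assumed for this byte, or None if it has no entry."""
    return GLYPH_TABLE.get(byte)


if __name__ == "__main__":
    # Print byte value, code point and Unicode name (ASCII-only output).
    for byte in sorted(GLYPH_TABLE):
        ch = GLYPH_TABLE[byte]
        print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

For byte 0x0F, decode_plain returns the control character U+000F, while glyph_for returns U+263C. A byte such as 0x41 has no entry in this sketch, so glyph_for returns None for it.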
1 - The only exception is code page 864 ("Arabic - OEM") which has a completely broken table that gave me wildly different info when I called MultiByteToWideChar with the MB_USEGLYPHCHARS flag. Given that Arabic does not work well in the console, this probably does not cause a big loss in functionality....
This post brought to you by "☼" (U+263C, a.k.a. WHITE SUN WITH RAYS)
A proud member of the glyph chars collection for more years than Seattle has seen sun
Edd on 24 Jan 2009 8:08 AM:
First of all, I am very thankful for the time you have put in to explaining the ins and outs of these functions. Your posts have vanquished that uneasy am-I-really-using-this-correctly feeling I have had.
With that said, surely a better place for this information would be on the MSDN pages themselves!? Can you twist the nipples of the people responsible and make it happen?
It's depressing how often I look up a function on MSDN and then have to go elsewhere to answer all the questions that the so-called-explanation raises, simply to attain a basic level of correctness.
Mr Ziggy on 27 May 2012 3:41 PM:
Not for every one, just the MSDOS / IBM AT code pages based on CP 437, and the Arabic one wasn't, I believe... Since the PC was direct video mapped, the IBM people got cute and assigned these for the PC video boards' character generator, as output only extras, and for backwards compatibility the Console does the same. Input and disk stored texts were treated as what is normally listed as the OEM code pages, for the nationalities IBM customized the PCs to, but applications that did direct Int10 and the terminal output $-terminated MS-DOS calls could display these. The documentation for the OEM Code Pages at globaldev should reflect this, as it's a major difference from the ANSI pages in how they behave. Just a cent, MrZ