Unicode in the console (including STDIN)?

by Michael S. Kaplan, published on 2011/11/07 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/11/07/10234592.aspx

In an earlier topic you discussed how to do Unicode output to a console window at the C library level, by using _setmode.

It would be nice with a discussion also of input at the C level (which does not seem to work).

Most of the console samples I have given here use ReadConsoleW and WriteConsoleW because, as "Myth #8" in this blog points out, the CRT has had problems in some versions with STDIN, STDOUT, and STDERR.

The CRT is just a bug or two away from working here, with input being the final frontier.

At this point, assuming you want code that works in any circumstance (in any version), then I highly recommend you move to ReadConsoleW.
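As a rough illustration of that recommendation, here is a minimal sketch of reading one line with ReadConsoleW. The helper names `trim_line_ending` and `read_console_line` are made up for this example and are not from any of the samples in this blog. The sketch assumes STDIN really is a console (not redirected) and that the console is in its default line-input mode, where the returned text ends with a carriage return and line feed.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: drop any trailing '\r' / '\n' units from the first
   len units of buf, terminate the string, and return the new length.
   The caller must guarantee that buf[len] is a valid slot. */
size_t trim_line_ending(wchar_t *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0 && (buf[len - 1] == L'\n' || buf[len - 1] == L'\r'))
        len--;
    buf[len] = L'\0';
    return len;
}

#ifdef _WIN32
/* Hypothetical helper: read one line of UTF-16 text from the console.
   capacity is the size of buf in wchar_t units, terminator included.
   Returns the number of units stored, or -1 if the read fails (which is
   what happens when STDIN is not a console). */
long read_console_line(wchar_t *buf, size_t capacity)
{
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    DWORD got = 0;

    if (buf == NULL || capacity < 2 || in == NULL || in == INVALID_HANDLE_VALUE)
        return -1;

    /* Ask for at most capacity - 1 units so the terminator always fits. */
    if (!ReadConsoleW(in, buf, (DWORD)(capacity - 1), &got, NULL))
        return -1;

    return (long)trim_line_ending(buf, got);
}
#endif
```

Because the text arrives as UTF-16 straight from the console, no stream layer gets a chance to push it through a code page on the way in. A line longer than the buffer is not handled here; the remainder stays queued for the next read.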

There are just too many times that streams will "helpfully" convert text to some code page because they think it would be the best thing to do.

Better to use the method that's been working since NT 3.1 than to use something that has had a variety of problems since then that are only now being totally eradicated....

Links from other 'myths' in this blog will get you the "if something is redirected" code; handling redirection is the principal nominal benefit of using things like the CRT. And I love the work of colleagues like Philip Lucido (a former CRT owner who used to be incredibly helpful to me!), but this one area simply has passed the mark of trust, for me....
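For readers who do not want to chase the links, the redirection check can be sketched roughly as follows. This is one common approach and not necessarily the exact code from those other posts: treat STDIN as a console only if GetFileType reports a character device and GetConsoleMode succeeds on the handle. The names `input_source`, `classify_input`, and `classify_stdin` are invented for this sketch.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical result type for this sketch. */
typedef enum { SRC_CONSOLE, SRC_REDIRECTED } input_source;

/* Pure decision step, kept separate from the API calls: a handle counts as
   a real console only when it is a character device AND GetConsoleMode
   succeeded on it. Anything else (file, pipe, NUL device) is treated as
   redirected input. */
input_source classify_input(bool is_char_device, bool console_mode_ok)
{
    return (is_char_device && console_mode_ok) ? SRC_CONSOLE : SRC_REDIRECTED;
}

#ifdef _WIN32
/* Gather the two facts for STDIN from the Win32 API and classify them. */
input_source classify_stdin(void)
{
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    DWORD mode = 0;
    bool is_char = (GetFileType(in) == FILE_TYPE_CHAR);
    bool mode_ok = (GetConsoleMode(in, &mode) != 0);

    return classify_input(is_char, mode_ok);
}
#endif
```

With a result of SRC_CONSOLE the program can call ReadConsoleW; with SRC_REDIRECTED it would fall back to ReadFile and decide for itself how to interpret the bytes.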

And to think that all user-side complexity/brokenness could have been avoided if Microsoft had a time machine when they created NT, so they could use UTF-8 for everything. :(

Our Flux Capacitor is in beta during FY12 Q1, so maybe we can take care of that? :-)

Well, UTF-8 was invented in 1992.

@Yuhong: They're much closer than I thought, actually: according to Wikipedia, UTF-8 started development in early '92 and was publicly announced in January '93, while NT 3.1 was in development from '88 and released in July '93, which perhaps just makes it more disappointing that we were perhaps a couple of years off significantly less complex text handling in C on Windows :(.

NT wasn't going to take further delays to either gut existing "W" functions, or add a third set of functions for UTF-8 (I shudder at trying to convince DaveC of either choice there!). This new encoding was not an official part of the Unicode standard then, and we were implementing Unicode -- not potentially interesting RFCs....

What about allowing UTF-8 to be the ACP/OEMCP?

Asked and answered many times. By you, even....

Again, asked and answered -- we were Unicode focused, not RFC focused....
