You down with .OCP (Yeah you know me!)

by Michael S. Kaplan, published on 2010/01/19 07:46 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/01/19/9950377.aspx


It was actually just over a year ago when Michael Holtstrom asked a question over in the Suggestion Box:

Hi. Here's a little program. I know it's not Unicode, but the product I'm working on is 14 years old, so it's just too late for that.

#include <stdio.h>   // printf, gets
#include <locale.h>  // setlocale, LC_CTYPE

void info() {
  char line[1024];
  printf("\n Input via gets() ");
  gets(line);
  printf(" Echo via printf() %s\n",line);
}

int main(int argc, char** argv) {
  info();
  setlocale(LC_CTYPE,"");
  info();
  return 0;
}

So, on my DOS console, built with Visual Studio 98, this works just fine, but built with Visual Studio 2008 the characters no longer round-trip.

For example, after the setlocale call, ALT+252 shows SUPERSCRIPT LATIN SMALL LETTER N as expected from cp437. And the byte from gets is xFC as expected. But when you give xFC to printf, it displays as LATIN SMALL LETTER U WITH DIAERESIS as would be expected from cp1252.

Now I realize that I can work around this by using ReadConsole/WriteConsole instead, but isn't it a little insidious that on a completely default system, using basic calls like gets/printf/setlocale, simple I/O doesn't round-trip?

Maybe I'm missing something, but it seems like someone has intentionally gone out of their way to make me suffer.

I'd love to know why.

Thanks.

P.S. Why call setlocale? Because we always have, and they're scared of what will happen to the database drivers, etc., if we change it.

P.S. Why care about non-ASCII? Because many apps talk to our DB and all of Latin-1 is legal. We've already gone to a lot of trouble to avoid best-fitting when printing to the console, and the new behaviour destroys that.

Sorry it took me so long to get around to this one, Michael. There has been a lot going on....

Now there are shades of the "Anything still wrong is probably wrong for good...." issues here, as well as of the complex issues surrounding the CRT's setlocale.

As I've mentioned there and in other places, and as people have noted for a long time, the behavior of setlocale when called with the "" locale is complicated and seems to change from time to time due to both OS settings and CRT changes.

In this case, there was a concerted effort to take the implied meaning of setlocale's "" setting, which is documented as

Sets the locale to the default, which is the user-default ANSI code page obtained from the operating system.

and actually switch more of the CRT over to using the ANSI code page (ACP) rather than leaving it unchanged.
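To make the failure concrete: the console hands gets the byte 0xFC under the OEM code page (cp437), and the reported display is consistent with the output side now interpreting that byte as cp1252. A minimal C sketch with two hypothetical helpers, `decode_cp437` and `decode_cp1252`, each covering only a few bytes rather than the full code page tables, shows the same byte landing on two different characters:

```c
#include <stdint.h>

/* Hypothetical helper: map a cp437 (OEM) byte to a Unicode code point.
   Only ASCII and a few high bytes are covered; 0 means "not in this sketch". */
uint32_t decode_cp437(unsigned char b)
{
    if (b < 0x80)
        return b;                 /* ASCII is shared by both code pages */
    switch (b) {
    case 0x81: return 0x00FC;     /* LATIN SMALL LETTER U WITH DIAERESIS */
    case 0x9C: return 0x00A3;     /* POUND SIGN */
    case 0xE1: return 0x00DF;     /* LATIN SMALL LETTER SHARP S */
    case 0xFC: return 0x207F;     /* SUPERSCRIPT LATIN SMALL LETTER N */
    default:   return 0;
    }
}

/* Hypothetical helper: map a cp1252 (ANSI) byte to a Unicode code point.
   Bytes 0xA0-0xFF match Latin-1, so they map to the same value;
   0x80-0x9F hold cp1252-specific characters not covered here. */
uint32_t decode_cp1252(unsigned char b)
{
    if (b < 0x80 || b >= 0xA0)
        return b;
    return 0;
}
```

With these, `decode_cp437(0xFC)` gives U+207F while `decode_cp1252(0xFC)` gives U+00FC, and the cp437 byte that really means ü is 0x81, which is why the echo no longer matches what was typed.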

To make it work the old way in all versions, you can change

  setlocale(LC_ALL,"");

to

  setlocale(LC_ALL,".OCP");

though note that ".OCP" (the OEM code page, which is what the console uses by default) may also change more than was intended: it will fix the reported issue, but you could run into another problem with something changing that you didn't expect.
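As a defensive variant of that advice, a sketch could try ".OCP" first and fall back when the CRT does not recognize it. The helper name `set_oem_ctype_locale` is hypothetical, and the sketch assumes ".OCP" is understood by the MSVC CRT but may be rejected by other C libraries (setlocale then returns NULL). It also narrows the change to LC_CTYPE, which is the category the reader's program sets, rather than LC_ALL, as one way to limit the side effects just mentioned:

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select the OEM code page for LC_CTYPE only.
   ".OCP" is an MSVC CRT locale string; if a C library rejects it,
   fall back to the environment default and finally to "C". */
const char *set_oem_ctype_locale(void)
{
    static const char *const candidates[] = { ".OCP", "", "C" };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        const char *name = setlocale(LC_CTYPE, candidates[i]);
        if (name != NULL)
            return name;   /* locale now in effect; may be overwritten by later setlocale calls */
    }
    return NULL;           /* not expected: the "C" locale is always available */
}
```

Checking the returned name is a cheap way to see which locale the CRT actually chose, rather than assuming ".OCP" took effect.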

But the CRT, in many cases, is beholden to a standard that it at least passively tries to live up to, just as most C compilers come with a C runtime that tries to live up to that standard (possibly with their own extensions, as Microsoft's does).

So if a function is documented as being affected by the locale setting, then fixing cases where that effect is not happening simply makes the CRT more conformant, a change that many people feel is long overdue and are glad to see happen more with each version....

I have no strong feelings in either direction, but I will note that there is no way to become more conformant while retaining nonconformant behavior, such as ignoring the documented locale dependencies.

Now, I do think the inconsistencies that remain are still kind of weird, like the fact that puts (which behaves the same as printf) does not do the same thing as gets here. This seems like a bug, although usually it boils down to the fact that the implementations of these functions are not isolated from each other: a function that specifies no locale will, deep down, be calling one that does, so there has to be an actual locale in there, and then there you go.

But perhaps there is a way to dig in here a bit and treat the functions that deal with the console differently, having them use the settings attached to the console in which they are running, across the board. Since the console is also "locale"-based in a broad sense, such behavior would also be conformant....
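One way to approximate that idea in application code today, rather than inside the CRT, is to ask the console which code page it is using and build a code-page-only locale string from it. This is a sketch under assumptions: it relies on the Windows `GetConsoleOutputCP` API and on the MSVC CRT accepting ".<code page>" locale strings such as ".437". The helper names `format_cp_locale` and `use_console_code_page` are hypothetical, and the Windows-only part is compiled out on other platforms:

```c
#include <locale.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: build an MSVC-style code-page-only locale string,
   e.g. 437 -> ".437". Returns 0 on success, -1 if buf is too small. */
int format_cp_locale(unsigned int cp, char *buf, size_t len)
{
    int n = snprintf(buf, len, ".%u", cp);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: point LC_CTYPE at the console's output code page.
   Returns the new locale name, or NULL if it could not be applied. */
const char *use_console_code_page(void)
{
#ifdef _WIN32
    char name[16];
    UINT cp = GetConsoleOutputCP();   /* zero if the call fails */
    if (cp == 0 || format_cp_locale(cp, name, sizeof name) != 0)
        return NULL;
    return setlocale(LC_CTYPE, name); /* NULL if the CRT rejects this code page */
#else
    return NULL;                      /* no console code page to query here */
#endif
}
```

Keeping the string formatting separate from the Windows call makes the portable part easy to check, while the console query stays isolated behind `_WIN32`.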
