Custom code pages? Redux

by Michael S. Kaplan, published on 2007/05/19 12:39 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/05/19/2735895.aspx


Warning: what is being described in this post is documented but is not supported. Please keep this in mind before considering it!

People ask me all the time how to create custom code pages. Like regular reader Ivan Petrov, who has asked several times (e.g. Custom code pages?) how to take the world of WideCharToMultiByte and MultiByteToWideChar and make it their own....

Well, I'll start by saying that I am not a lawyer, so none of this is legal advice. Keep this in mind too.

But I'll point to three functions that are documented pretty much only in the interests of the Consent decree, that are used internally by Windows to support DLL-based code pages:

So what makes GB 18030 support on Win32 work is:

For a code page number, you have to be between 50000 and 59999 so Windows knows you are in the DLL-based code page range, and outside of anything like what is out there now (probably best to stay in the 599## range if you are asking me for advice, and if you aren't then you weren't going to listen anyway, right?).
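To make the numbering rule concrete, here is a minimal sketch of the range check described above. The constant and function names are hypothetical, treating both ends of the range as inclusive is an assumption, and the 59900 lower bound simply encodes the "599##" suggestion. It is an illustration only, not how Windows itself performs the check.

```python
# Hypothetical names; the bounds come from the text above.
DLL_CODE_PAGE_MIN = 50000   # assumed inclusive
DLL_CODE_PAGE_MAX = 59999   # assumed inclusive
SUGGESTED_CUSTOM_MIN = 59900  # the "599##" range suggested above


def is_dll_based_code_page(code_page: int) -> bool:
    """Return True if the number falls in the DLL-based code page range."""
    return DLL_CODE_PAGE_MIN <= code_page <= DLL_CODE_PAGE_MAX


def is_suggested_custom_code_page(code_page: int) -> bool:
    """Return True if the number falls in the suggested 599## sub-range."""
    return SUGGESTED_CUSTOM_MIN <= code_page <= DLL_CODE_PAGE_MAX


if __name__ == "__main__":
    # 54936 is the GB 18030 code page; 1252 is a table-based Windows code page.
    for cp in (1252, 54936, 59901):
        print(cp, is_dll_based_code_page(cp), is_suggested_custom_code_page(cp))
```

A number such as 54936 passes the first check but not the second, which is the point of the advice: it sits in the DLL-based range, but it is already taken.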

From there, you can have your own code page that will work on Win32 as well as you write your DLL and its functions (remembering that bugs were fixed between versions so you may see slightly different results, especially where the flags are concerned!).

It should be obvious, but just in case remember that interoperability between machines is out the window if you do not have the same code page set up on both of them. :-)
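The interoperability point can be illustrated with two built-in code pages standing in for a custom one: the same bytes come back as different text when the two sides disagree about the code page. This is a minimal sketch, assuming Python's bundled cp1252 and cp437 codecs as stand-ins. The helper name is hypothetical and nothing here touches the Win32 functions discussed above.

```python
def reinterpret(text: str, writer_code_page: str, reader_code_page: str) -> str:
    """Encode text with one code page, then decode the bytes with another."""
    raw = text.encode(writer_code_page)
    return raw.decode(reader_code_page)


original = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# Both machines agree on the code page: the text survives the round trip.
same = reinterpret(original, "cp1252", "cp1252")

# The reader assumes a different code page: the non-ASCII character changes.
mismatched = reinterpret(original, "cp1252", "cp437")

if __name__ == "__main__":
    print(same == original)
    print(mismatched == original)
```

With a custom code page the situation is worse than a mismatch, since a machine without the DLL has no definition of the code page at all.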

Now this is only part of the story of course -- there is really no good promise how (or even that) this will work with MLang, for example. And although it seems to work in .NET for versions 1.0 and 1.1 (if you stick to numbered code pages and conversions only), it will definitely not work in 2.0 and beyond at all and is completely unsupported and undocumented in all of those versions.

(The managed story is based on a side effect -- .NET used to use Windows to do its code page stuff, but it does not do so anymore!)

If you want to create custom code pages in .NET the supported way for 2.0 and beyond, see Shawn's Best Way to Make Your Own Encoding. It even links to a sample that he created, which is more than I will do here with the Win32 case. :-)

Anyway, this is something that a lot of people have figured out already so none of it should be too mysterious. I figure putting it all in one place might make people less likely to incorrectly speculate about how stuff works, if nothing else.

Summary:

  1. Use Unicode.
  2. If you can't use Unicode, then at least try and use one of the built-in code pages in Windows.
  3. If you can't do that, then you can use the info in this article to create your own (unsupported) code page.

Tip your servers. Enjoy the veal!

 

This post brought to you by ၌ (U+104c, a.k.a. MYANMAR SYMBOL LOCATIVE)


# Shawn Steele - MSFT on 24 May 2007 12:28 AM:

http://blogs.msdn.com/shawnste/archive/2007/03/17/hacking-code-pages-or-how-to-totally-hose-your-machine-and-your-data.aspx has more scaryness


referenced by

2010/01/12 On my "Vietnamese Plus" and "pseudo-Form V" constructs
