There's no "I" in IDN, part 1: If you're not Unicode, you're just wrong!

by Michael S. Kaplan, published on 2011/02/14 07:01 -05:00, original URI:

Internationalized domain names (IDNs): one of those times when we really are all in this together, and when "I don't have time to fix this" really isn't a good answer.

I figured I should talk about that for a bit....

So anyway, the question I got from a rather anxious developer via email the other day was:

I have a lot of code that depends on functions like getaddrinfo, getnameinfo, gethostbyname, and gethostbyaddr. How do I get them to support internationalized domain names?

The answer is both simple and complicated.

Complicated because the answer could (in theory) be very different depending on whether the server is on the intranet (where one would use UTF-8) or the Internet (where one would use Punycode).

And complicated because, in native code, there isn't much infrastructure for the system to figure out which case applies and which form to use (the managed story is a little better here, but it has its own pitfalls; I will cover those another day).
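To make the two forms concrete, here is a minimal sketch in portable C (not Windows code) of the RFC 3492 Punycode step for a single label: "münchen" would travel as UTF-8 bytes on an intranet, but as the ASCII label "xn--mnchen-3ya" on the Internet. The function names `punycode_encode` and `to_ace_label` are hypothetical, the input is assumed to be already-normalized Unicode code points, and the sketch deliberately skips the mapping and validation (case folding, normalization, prohibited characters) that real IDNA conversion performs first. On Windows, `IdnToAscii` is the function that does the full conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameters from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Digit values 0..25 map to 'a'..'z', 26..35 map to '0'..'9'. */
static char encode_digit(uint32_t d) {
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function, RFC 3492 section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime) {
    uint32_t k = 0;
    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper: encodes one label (Unicode code points) as Punycode,
   without the "xn--" prefix. Returns the output length, or -1 if the label is
   empty, longer than 63 code points, holds a value above U+10FFFF, or the
   output buffer is too small. The 63 cap also keeps delta far from overflow. */
int punycode_encode(const uint32_t *in, size_t in_len, char *out, size_t out_size) {
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS, h, b;
    size_t len = 0, i;
    assert(in != NULL && out != NULL);
    if (in_len == 0 || in_len > 63) return -1;
    /* Basic (ASCII) code points are copied through first, in order. */
    for (i = 0; i < in_len; i++) {
        if (in[i] > 0x10FFFF) return -1;
        if (in[i] < 0x80) {
            if (len + 1 >= out_size) return -1;
            out[len++] = (char)in[i];
        }
    }
    h = b = (uint32_t)len;
    if (b > 0) {
        if (len + 1 >= out_size) return -1;
        out[len++] = '-';
    }
    /* Then each non-basic code point, in increasing order of value. */
    while (h < in_len) {
        uint32_t m = UINT32_MAX;
        for (i = 0; i < in_len; i++)
            if (in[i] >= n && in[i] < m) m = in[i];
        delta += (m - n) * (h + 1);
        n = m;
        for (i = 0; i < in_len; i++) {
            if (in[i] < n) delta++;
            if (in[i] == n) {
                /* Write delta as a generalized variable-length integer. */
                uint32_t q = delta, k;
                for (k = BASE; ; k += BASE) {
                    uint32_t t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias);
                    if (q < t) break;
                    if (len + 1 >= out_size) return -1;
                    out[len++] = encode_digit(t + (q - t) % (BASE - t));
                    q = (q - t) / (BASE - t);
                }
                if (len + 1 >= out_size) return -1;
                out[len++] = encode_digit(q);
                bias = adapt(delta, h + 1, h == b);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    out[len] = '\0';
    return (int)len;
}

/* Hypothetical helper: the ACE form of a label. All-ASCII labels are copied
   as-is; anything else becomes "xn--" followed by its Punycode encoding. */
int to_ace_label(const uint32_t *in, size_t in_len, char *out, size_t out_size) {
    size_t i = 0;
    int n;
    while (i < in_len && in[i] < 0x80) i++;
    if (i == in_len) {
        if (in_len + 1 > out_size) return -1;
        for (i = 0; i < in_len; i++) out[i] = (char)in[i];
        out[in_len] = '\0';
        return (int)in_len;
    }
    if (out_size < 5) return -1;
    memcpy(out, "xn--", 4);
    n = punycode_encode(in, in_len, out + 4, out_size - 4);
    return n < 0 ? -1 : n + 4;
}
```

For example, feeding the code points of "bücher" to `to_ace_label` yields "xn--bcher-kva", while an all-ASCII label such as "www" passes through unchanged. The intranet case needs none of this: the same label is simply sent as its UTF-8 bytes.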

For now I'll just talk about the intranet story (the Internet story will be for another another day).

The most important step, one that is pretty much universally good design practice for many reasons but especially here, is to move off the non-Unicode functions like the ones our anxious developer named. If one has anything outside of ANSI (or even ASCII in some cases), the Unicode (or UTF-8) versions are required, as the following table points out:

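Function you shouldn't use    Function you should be using instead
DnsQuery_A                    DnsQuery_W (or DnsQuery_UTF8)
DnsValidateName_A             DnsValidateName_W (or DnsValidateName_UTF8)
DnsNameCompare_A              DnsNameCompare_W (or DnsNameCompare_UTF8)
getaddrinfo                   GetAddrInfoW
getnameinfo                   GetNameInfoW
gethostbyname                 GetAddrInfoW
gethostbyaddr                 GetNameInfoW
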
Now as luck would have it, deciding whether to use the "W" version of the function or the UTF-8 version (for the functions that support both) is pretty simple -- just use whichever format you have the text in already.

And as further luck would have it, for just about all of the functions on this list, replacing the call itself is easy and straightforward. Of course you may need to move the surrounding code to Unicode too, and it's important not to just convert the string from CP_ACP or whatnot right before the call (otherwise you haven't really fixed anything, since characters the ANSI code page can't represent are already gone!), but that's not too bad.
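Why converting from CP_ACP at the last moment doesn't help can be shown with a small portable simulation (again not Windows code). As an assumption for illustration, ISO-8859-1 stands in for the ANSI code page: it maps U+0000 through U+00FF onto the same byte values and can represent nothing else, and unmappable characters are replaced with '?' in place of whatever default character a real Unicode-to-ANSI conversion would use. The names `to_ansi`, `from_ansi`, and `survives_round_trip` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an ANSI code page: ISO-8859-1 maps U+0000..U+00FF
   to the same byte values and cannot represent anything else. Unmappable
   characters become '?', standing in for a default-character substitution. */
static void to_ansi(const uint32_t *wide, size_t len, unsigned char *narrow) {
    for (size_t i = 0; i < len; i++)
        narrow[i] = wide[i] <= 0xFF ? (unsigned char)wide[i] : (unsigned char)'?';
}

/* The "fix" applied too late: widen the narrow string back to Unicode. */
static void from_ansi(const unsigned char *narrow, size_t len, uint32_t *wide) {
    for (size_t i = 0; i < len; i++)
        wide[i] = narrow[i];
}

/* Returns 1 if a name survives the trip through the code page unchanged,
   0 if characters were lost (or the name is longer than the local buffers). */
int survives_round_trip(const uint32_t *name, size_t len) {
    unsigned char narrow[255];
    uint32_t back[255];
    assert(name != NULL);
    if (len > sizeof narrow) return 0;
    to_ansi(name, len, narrow);
    from_ansi(narrow, len, back);
    return memcmp(name, back, len * sizeof *name) == 0;
}
```

With these definitions "münchen" survives (every character fits in the stand-in code page), but "Москва" does not: each Cyrillic letter has already become '?' before the widening step ever runs, which is exactly why the text has to stay in Unicode from the start.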

You can think of this first step as the most obvious part of all of the work. I'll get into some of the more complicated aspects in the future, with maybe some additional fun details related to Active Directory to make things really interesting (that will be on yet another another day -- or with a topic like AD more than one other day!).

Now once you start getting into the EAI (email address internationalization) side, it gets both insanely simple and insanely complicated too. But eventually, on some other another other day (once again multiple other days, most likely), I'll hit that topic too.

Alex Cohn on 19 Feb 2011 2:29 PM:

Waiting impatiently for the internet part of the story (and, as the proverb goes, for the wisdom to know which to use in real-life situations).

Michael S. Kaplan on 19 Feb 2011 2:35 PM:

It'll get there, don't worry. This one is gonna cover a lot of ground. It'll take a bit of time....

