There's no "I" in IDN, part 1: If you're not Unicode, you're just wrong!

by Michael S. Kaplan, published on 2011/02/14 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/02/14/10128621.aspx


Internationalized domain names -- one of those times when we really are all in this together, a time when "I don't have time to fix this" really isn't a good answer.

I figured I should talk about that for a bit....

So anyway, the question I got from a rather anxious developer via email the other day was:

I have a lot of code that depends on functions like getaddrinfo, getnameinfo, gethostbyname, and gethostbyaddr. How do I get them to support internationalized domain names?

The answer is both simple and complicated.

Complicated because the answer could (in theory) be very different depending on whether the server is on the intranet (where one would use UTF-8) or the Internet (where one would use Punycode).

And complicated because there isn't a whole lot of infrastructure to have the system figure out which is which and which to use in native code (the managed story is a little better here but it has its own pitfalls; I will cover those another day).
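To make the UTF-8 versus Punycode distinction concrete, here is a minimal sketch of the same host name in both forms. The name "bücher.example" and the helper `wire_forms` are made up for illustration, and Python's built-in "idna" codec implements IDNA 2003, which is an assumption about the mapping rules rather than a statement about what any particular resolver does.

```python
# Illustrative only: one host name in the two byte forms discussed above.
# "bücher.example" is a hypothetical name; the "idna" codec follows IDNA 2003.

def wire_forms(hostname: str) -> dict[str, bytes]:
    """Return the UTF-8 and Punycode (ACE) byte forms of a host name."""
    return {
        "utf8": hostname.encode("utf-8"),      # what an intranet lookup might carry
        "punycode": hostname.encode("idna"),   # the ASCII-only form for the Internet
    }

forms = wire_forms("bücher.example")
print(forms["utf8"])      # b'b\xc3\xbccher.example'
print(forms["punycode"])  # b'xn--bcher-kva.example'
```

The two byte strings are not interchangeable, which is why it matters which kind of server the name is headed to.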

For now I'll just talk about the intranet story (the Internet story will be for another another day).

The most important step, one that is pretty much universally a good design practice for many reasons but especially here, is to move off the non-Unicode functions like the ones our anxious developer named. If one has anything outside of ANSI (or even ASCII in some cases), the Unicode (or UTF-8) versions are required here, as the following table points out:

| Function you shouldn't use | Function you should be using instead |
|---|---|
| DnsQuery_A | DnsQuery_W (or DnsQuery_UTF8) |
| DnsValidateName_A | DnsValidateName_W (or DnsValidateName_UTF8) |
| DnsNameCompare_A | DnsNameCompare_W (or DnsNameCompare_UTF8) |
| DnsHostnameToComputerNameA | DnsHostnameToComputerNameW |
| getaddrinfo | GetAddrInfoW |
| GetAddrInfoA | GetAddrInfoW |
| getnameinfo | GetNameInfoW |
| GetNameInfoA | GetNameInfoW |
| GetAddrInfoExA | GetAddrInfoExW |
| gethostbyname | GetAddrInfoW |
| gethostbyaddr | GetNameInfoW |
| WSAAsyncGetHostByName | GetAddrInfoW |
| WSAAsyncGetHostByAddr | GetNameInfoW |
| WSALookupServiceBeginA | WSALookupServiceBeginW |
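If you have a lot of code to audit, the table above can be turned into a quick text scan. The sketch below is only that: the helper name `find_legacy_calls` and the sample C snippet are hypothetical, and a plain regular-expression scan will also flag names that appear in comments or strings, so treat the hits as a to-do list rather than a verdict.

```python
import re

# Mapping copied from the table above: legacy function -> suggested replacement.
REPLACEMENTS = {
    "DnsQuery_A": "DnsQuery_W (or DnsQuery_UTF8)",
    "DnsValidateName_A": "DnsValidateName_W (or DnsValidateName_UTF8)",
    "DnsNameCompare_A": "DnsNameCompare_W (or DnsNameCompare_UTF8)",
    "DnsHostnameToComputerNameA": "DnsHostnameToComputerNameW",
    "getaddrinfo": "GetAddrInfoW",
    "GetAddrInfoA": "GetAddrInfoW",
    "getnameinfo": "GetNameInfoW",
    "GetNameInfoA": "GetNameInfoW",
    "GetAddrInfoExA": "GetAddrInfoExW",
    "gethostbyname": "GetAddrInfoW",
    "gethostbyaddr": "GetNameInfoW",
    "WSAAsyncGetHostByName": "GetAddrInfoW",
    "WSAAsyncGetHostByAddr": "GetNameInfoW",
    "WSALookupServiceBeginA": "WSALookupServiceBeginW",
}

# Case-sensitive match on whole identifiers, so "getaddrinfo" does not
# fire on "GetAddrInfoW".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, REPLACEMENTS)) + r")\b")

def find_legacy_calls(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, legacy name, suggested replacement) for each hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, REPLACEMENTS[name]))
    return hits

# Hypothetical C source to scan.
sample = '''\
struct hostent *h = gethostbyname(host);
int rc = GetAddrInfoW(whost, NULL, &hints, &result);
'''
for lineno, old, new in find_legacy_calls(sample):
    print(f"line {lineno}: {old} -> {new}")
```

Only the first sample line is reported; the second already uses the Unicode function.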

Now as luck would have it, deciding whether to use the "W" version of the function or the UTF-8 version (for the functions that support both) is pretty simple -- just use whichever format you have the text in already.

And as further luck would have it, for just about all of the functions on this list, the replacement is easy and straightforward for the call itself. Of course you may need to move the code to use Unicode, and it's important to not just convert it from the CP_ACP or whatnot right before the call (otherwise you haven't really fixed anything!), but that's not too bad.
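Here is a small sketch of why converting from the code page at the last moment fixes nothing. It is an approximation, not the Windows conversion itself: cp1252 stands in for a Western European CP_ACP, `errors="replace"` mimics the "?" substitution for unmappable characters, and the host names and the helper `through_code_page` are made up.

```python
# Assumption: cp1252 stands in for CP_ACP on a Western European system, and
# errors="replace" approximates substituting "?" for unmappable characters.
LEGACY_CODE_PAGE = "cp1252"

def through_code_page(hostname: str, code_page: str = LEGACY_CODE_PAGE) -> str:
    """Round-trip a name through a legacy code page, as a late conversion would."""
    return hostname.encode(code_page, errors="replace").decode(code_page)

# Hypothetical names: one that fits in the code page and one that does not.
for name in ("bücher.example", "東京.example"):
    result = through_code_page(name)
    status = "intact" if result == name else "damaged"
    print(f"{name} -> {result} ({status})")
```

The first name survives because "ü" happens to exist in cp1252, which is exactly how this kind of bug hides during testing; the second comes back as "??.example", and no "W" function can recover what was already lost.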

You can think of this first step as the most obvious part of all of the work. I'll get into some of the more complicated aspects in the future, with maybe some additional fun details related to Active Directory to make things really interesting (that will be on yet another another day -- or with a topic like AD more than one other day!).

Now once you start getting into the EAI side (i.e. the email side) it gets both insanely simple and insanely complicated too. But eventually, on some other another other day (once again multiple other days, most likely), I'll hit that topic too.


Alex Cohn on 19 Feb 2011 2:29 PM:

Waiting impatiently for the internet part of the story (and, as the proverb goes, for the wisdom to know which to use in real life situation).

Michael S. Kaplan on 19 Feb 2011 2:35 PM:

It'll get there, don't worry. This one is gonna cover a lot of ground. It'll take a bit of time....


