International Domain Names? The sign on the door says 'Gone Phishing'....

by Michael S. Kaplan, published on 2005/01/11 04:25 -05:00, original URI:

Someone pointed me to a note in Robert Hensing's blog entitled Miscreant hiding techniques: Would the real explorer.exe please stand up? And the relevance of 1979 when doing searches . . .

The first part of his post talked about a machine that had a file called ехplorer.exe, which looks just like explorer.exe, but it's not, since the first two letters of that first string are not LATIN SMALL LETTER E and LATIN SMALL LETTER X; they are CYRILLIC SMALL LETTER IE and CYRILLIC SMALL LETTER HA. Oops!
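A quick way to see what the eye cannot is to dump the code points. The Python sketch below compares the two names and prints the code point and Unicode character name of their first two letters. Python is just a convenient choice for illustration; nothing in the original post uses it.

```python
import unicodedata

real = "explorer.exe"
fake = "\u0435\u0445plorer.exe"  # CYRILLIC SMALL LETTER IE, CYRILLIC SMALL LETTER HA, then Latin


def first_two(name: str) -> list:
    """Describe the first two characters of a name by code point and Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in name[:2]]


print(real == fake)     # False
print(first_two(real))  # ['U+0065 LATIN SMALL LETTER E', 'U+0078 LATIN SMALL LETTER X']
print(first_two(fake))  # ['U+0435 CYRILLIC SMALL LETTER IE', 'U+0445 CYRILLIC SMALL LETTER HA']
```

Anything that works on code points (string comparison, hashing, a file system lookup) treats these as two unrelated names; only a human reading glyphs confuses them.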

Lucky for the customer that they did not have a Unicode font (Lucida Sans Unicode) or a default system locale like Russian that would use code page 1251 as the default system code page. Because if either of those things were true, even a dir command in CMD.EXE would not have shown a difference between them....

(I mention the thing about Lucida Sans Unicode since that's what mine is set to. To me it looks like explorer.exe rather than ??plorer.exe so my only clue of a problem would be that it sorts after the letter "z". Which I guess is a pretty big clue for someone who spends as much time thinking about collation as I do, but for normal people may only possibly reach the level of a call to PSS. Damn.)
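Both clues can be reproduced roughly in Python. This sketch assumes Python's `cp1251`/`cp1252` codecs with the `replace` error handler as a stand-in for how a non-Unicode console turns unrepresentable characters into `?` (Windows' own conversion can apply best-fit mappings, so details may differ). It also uses plain code point order rather than the Windows linguistic collation that `dir` and Explorer actually use; `zebra.exe` is a hypothetical file name added only to show where "z" falls.

```python
fake = "\u0435\u0445plorer.exe"  # Cyrillic IE and HA, then Latin "plorer.exe"

# Code page 1251 (Cyrillic) contains both letters, so the name round-trips intact
# and would display looking exactly like "explorer.exe".
cp1251 = fake.encode("cp1251")
assert cp1251.decode("cp1251") == fake

# Code page 1252 (Western European) has no Cyrillic; "replace" substitutes '?'.
cp1252 = fake.encode("cp1252", errors="replace")
print(cp1252)  # b'??plorer.exe'

# In code point order, U+0435 comes after 'z' (U+007A), so the fake name sorts last.
names = sorted(["explorer.exe", "zebra.exe", fake])
print(names[-1] == fake)  # True
```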

It made me think of a worry that Jesper Holmberg raised with me a few weeks ago:

What's your view on risks with international domain names? Isn't there a good chance it'll turn into Phish Heaven - how will my mom tell the difference between, www.wí and www.windοwsupdate.cοm?
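Under the hood, a non-ASCII label goes over DNS in an ASCII-compatible `xn--` form (Punycode, RFC 3492, applied through the IDNA rules of RFC 3490). Python's built-in `idna` codec implements that 2003-era IDNA, so it can show what a lookalike name really is. This sketch assumes the lookalike "o"s in the question are U+03BF GREEK SMALL LETTER OMICRON; the exact characters in the original message may differ.

```python
real = "www.windowsupdate.com"
spoof = "www.wind\u03bfwsupdate.c\u03bfm"  # assumed: Greek omicron in place of two Latin 'o's

real_ascii = real.encode("idna")
spoof_ascii = spoof.encode("idna")

print(real_ascii)   # b'www.windowsupdate.com'
print(spoof_ascii)  # the two non-ASCII labels come out with an 'xn--' prefix
```

Showing the `xn--` form instead of the Unicode form is one possible defense a browser could adopt, at the cost of making legitimate non-Latin names unreadable.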

Well, it is a bit outside my area of expertise, but it seems pretty risky to me: there are too many characters that look like other characters. I have no knowledge of what any browser plans to do about it. But treating it as an engineering problem to brainstorm, I can think of many ideas, enough that it seems like a great interview question? :-)

Seriously though, it is an interesting problem to contemplate solutions for, especially as the work on Internationalized Resource Identifiers (IRIs) is now a draft and will have an RFC number soon. The problems with phishing are not new, though; as Martin Duerst and Michel Suignard put it best in the draft text for the RFC:

Spoofing can occur because in the UCS, there are many characters that look very similar.  Details are discussed in Section 7.5.  Again, this is very similar to spoofing possibilities on US-ASCII, e.g. using 'br0ken' or '1ame' URIs.
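The quote's point, that `br0ken` versus `broken` is the same trick as Cyrillic versus Latin, suggests one mitigation: map every character to a representative lookalike and compare the results. The Python sketch below uses a hypothetical, deliberately tiny `CONFUSABLES` table; real confusables data (for example the tables published with Unicode Technical Standard #39) is far larger.

```python
# Hypothetical, deliberately tiny lookalike table, for illustration only.
CONFUSABLES = {
    "0": "o",       # DIGIT ZERO vs LATIN SMALL LETTER O ('br0ken')
    "1": "l",       # DIGIT ONE vs LATIN SMALL LETTER L ('1ame')
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(s: str) -> str:
    """Lowercase, then replace each character with its representative lookalike."""
    return "".join(CONFUSABLES.get(c, c) for c in s.lower())


def looks_alike(a: str, b: str) -> bool:
    """True if two different strings collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


print(looks_alike("br0ken", "broken"))                        # True
print(looks_alike("\u0435\u0445plorer.exe", "explorer.exe"))  # True
print(looks_alike("explorer.exe", "explore.exe"))             # False
```

The `a != b` check matters: identical strings are the same name, not a spoof. The catch is that the table has to be both large and correct, since any lookalike pair missing from it slips straight through.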

As the quote indicates, the problem is not new (it just has more surface area, with all of Unicode to draw upon!). Section 7.5 of the draft also lists many potential strategies for mitigating the risk. I feel quite good that when I tried this as a self-test, listing every idea I could think of before reading that section, I had covered most of what it lists (and thought of a few that were not really listed there).
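Another idea along these lines, offered as an illustration rather than a summary of what section 7.5 says, is to flag any label that mixes letters from more than one script. The sketch below approximates a letter's script by the first word of its Unicode character name, a crude hypothetical heuristic; a real implementation would use the Unicode Script property data instead.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Crudely guess each letter's script from the first word of its Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(domain: str) -> bool:
    """Flag a domain if any single label mixes letters from more than one script."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))


print(is_mixed_script("www.windowsupdate.com"))            # False
print(is_mixed_script("www.wind\u03bfwsupdate.c\u03bfm"))  # True (Latin + Greek)
print(is_mixed_script("\u0441\u043e\u043c"))               # False: all Cyrillic, yet looks like "com"
```

Note what it misses: a label spelled entirely in lookalike Cyrillic letters is single-script and passes, while legitimate Japanese text, which routinely mixes kanji and kana, would be flagged by this naive version.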

It will be interesting to see what the browsers do, given that people have been free to register international domain names for over a year and the RFC is not even out yet....

I may stop trusting links from most sites. Until a browser shows me that they will do it safely....


This post sponsored by "е" (U+0435, a.k.a. CYRILLIC SMALL LETTER IE)

# e on 11 Jan 2005 9:51 PM:

Type the URL manually.

# Mike Dimmick on 7 Feb 2005 3:18 PM:

Yup, you're clairvoyant:

# Michael Kaplan on 7 Feb 2005 3:24 PM:

Heh heh heh -- I did not see dates on that page, so I am not sure whether they came before me or not. I know I did not see their doc before I posted....

But maybe they got the idea from me? :-)


referenced by

2005/12/20 IDN hits the uber-client

2005/08/20 Mitigation tools for IDN security problems

2005/02/14 Almost no one on the Unicode List seems to "get" phishing

2005/01/12 Every character has a story #3: U+0138 (LATIN SMALL LETTER KRA)
