Almost no one on the Unicode List seems to "get" phishing

by Michael S. Kaplan, published on 2005/02/14 09:10 -08:00, original URI:

The Unicode List is up to its old fun and games again (well, actually it's the participants, not the list itself), and this time it is not about the Unicode BOM.

I talked a little about this problem in my earlier post, International Domain Names? The sign on the door says 'Gone Phishing'....

Then some people started really getting into it because a bunch of hackers "found" a homograph spoofing issue. They even registered an evil URL (www.pа -- the first "a" is U+0430, a CYRILLIC SMALL LETTER A) which in browsers that support the new IDN/punycode stuff becomes

Then those folks at the Unicode List weighed in (in a thread with 116 posts the last time I looked)....

The "solution" that many people have touted involves a list of common cross-script combinations that might be expected (like Kana and Kanji), and then showing the actual punycode names, since that way people could tell they were being spoofed.

Anyone else see the flaw here?

The feature is for international domain names. If it were just ASCII then a confusing string would indeed warn users that bad things were going to happen. But if we were all using ASCII we wouldn't need IDN in the first place, now would we?

Doesn't it make the whole feature suck just a little bit for its target users if they are left seeing weird crap every time they go to a site that uses their native language for the URL?
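To make the objection concrete, here is a minimal sketch using Python's built-in "idna" codec. Both labels are chosen for illustration and are not the elided URLs from this post: one is "paypal" with U+0430 standing in for the first "a", the other is an ordinary Russian word. The helper name `to_ascii` is made up for this example.

```python
# Illustrative labels only; neither is copied from a real registration.
spoofed = "p\u0430ypal"                  # U+0430 CYRILLIC SMALL LETTER A as the first "a"
legitimate = "\u0440\u0435\u043a\u0430"  # a plain Russian word, all Cyrillic


def to_ascii(label: str) -> str:
    """Return the ASCII-compatible (xn--) form of one domain label."""
    return label.encode("idna").decode("ascii")


for label in (spoofed, legitimate):
    # Both print as an opaque xn-- string; nothing marks one as the spoof.
    print(label, "->", to_ascii(label))
```

The spoof and the honest name both come out as xn-- strings, so a user who reads Russian is shown the same kind of gibberish for a legitimate site as for a malicious one.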

I almost weighed into the thread to point out the obvious problems in approach but I did not want to add to the noise (and most likely be drowned out by the people who point out that there is no way to make it secure and how IDN will bring down the internet). So I did not become post #117.

Oops, a few more while I was typing this, mine would have been #120. Sometimes in this post-Kitty Genovese era in which we all live, it is better to not get involved....


This post brought to you by "а" (U+0430, a.k.a. CYRILLIC SMALL LETTER A)
A letter that is feeling quite popular these days and which would like to point out that this site is not ВӀоgs.Мsdn.соm/miсhкар no matter what the URL looks like...

# matt on Monday, February 14, 2005 9:25 AM:

I have absolutely no desire to receive email from or do business with anyone outside of the US. Give me the ability to filter out anything that does not originate from a specific country or region and I'll be happy. This includes 3rd party web sites and visitors/traffic to my web site.

The average person simply does not need to be able to receive email from someone in Russia or Venezuela. By filtering everything that does not originate in the US, you greatly reduce your attack surface.

Give me a RELIABLE way to do this and the whole IDN thing becomes a non-issue.

# Michael Kaplan on Monday, February 14, 2005 9:28 AM:

Well, that is great for you. But how about the rest of the world?

"We are not alone anymore" (and I do not mean that in an X-Files sense!)

# Kristoffer Henriksson on Monday, February 14, 2005 9:31 AM:

That fixes nothing Matt; the fake paypal address could just as easily have been sent from inside the US.

All this shows is that IDN is poorly thought out and you need to turn off support for it in your browser for the foreseeable future.

# Frank Richter on Monday, February 14, 2005 9:55 AM:

Hm, AFAIK the entered IDN is normalized before being converted to Punycode (e.g. ß becomes ss)... couldn't mappings be added that map identical-looking characters to the same code point (e.g. the small Cyrillic a would get mapped to the plain old ASCII small a), for anti-phishing purposes?

# CN on Monday, February 14, 2005 9:57 AM:

I think we have two problems here:

a) "Cross-scripting", when individual letters/symbols are replaced by their homographs in another script to fit in. The current paypal address is an example of that.

b) When the complete domain name is represented in another script, while maintaining homography.

Something like the cross item list you mention could solve a -- limit which character groups may appear in one domain. www.<strangestuff>.com should be allowed, while<strangestuff> should not.

This would alleviate the problem a little. It does certainly not solve it, because of b and also because of
c) íìïîı, or for example â in languages where you're used to seeing å (and have a shitty non-ClearType, low resolution, small size display and simply don't look closely)
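A rough sketch of what such a check might look like, and of why it only covers case a. Python's standard library does not expose the Unicode Script property, so this takes the first word of each character's Unicode name as a stand-in for its script; that is a heuristic chosen for this example, not a real script lookup. The allow-list `ALLOWED_MIXES` and both function names are hypothetical.

```python
import unicodedata

# Hypothetical allow-list of script combinations that are normal in one label.
ALLOWED_MIXES = [frozenset({"CJK", "HIRAGANA", "KATAKANA"})]


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens are shared by every script
        # Heuristic: "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_suspicious_mix(label: str) -> bool:
    """True if the label mixes scripts in a way the allow-list does not cover."""
    found = scripts_in(label)
    if len(found) <= 1:
        return False
    return not any(found <= allowed for allowed in ALLOWED_MIXES)


print(is_suspicious_mix("p\u0430ypal"))                          # case a: Latin plus one Cyrillic letter
print(is_suspicious_mix("\u0440\u0430\u0443\u0440\u0430" + "1"))  # case b: all Cyrillic plus a digit
print(is_suspicious_mix("\u00ed\u00ec\u00ef\u00ee\u0131"))        # case c: all Latin
```

Only the first label is flagged. The whole-script spoof and the accented Latin letters are each a single script, so they pass, which is exactly the gap described in b and c.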

I don't see any easy ways out of this. One could speculate about a highlighting in the browser, in addition to the site security, of whether the site is in the browsing history. Hopefully, you're more careful if you log in for the first time and create your account or are setting up a new machine, while a phishing attack trying to get you to key in your account info would give a somehow distinctly different "experience" to the user.

After all, highlighting of sites that have already been visited has been around for ages for LINKS to said sites; adding it prominently in the browser UI could improve things.

Just some stray thoughts, I know this is far from both the Unicode list and your field of direct interest.

# Igor Tandetnik on Monday, February 14, 2005 10:29 AM:

"e.g. the small cyrillic a would get mapped to the plain old ASCII small a), for anti-phishing purposes?"

That would mean that I can't go and register река.com (река (pronounced re-'kah) means river in Russian) since (with all Latin characters) is already taken. That would create inconvenience for anybody trying to register an IDN, and also open up new possibilities for squatting (what if I register my IDN first, before you get the chance to register all-Latin domain name?)

# Igor Tandetnik on Monday, February 14, 2005 10:45 AM:

For a concrete example of point b, "paypal" can more or less convincingly be represented as "раура1" - that's five Cyrillic letters and a digit 'one'. With some fonts, you'll have a hard time telling the difference.

Here is the list of Cyrillic letters that look similar to some Latin ones - have fun finding sites that could be spoofed:
а е з к о р с у х
In addition to these (that look similar in both upper- and lowercase), a few letters only look right when capitalized:
Well, the last one is a bit of a stretch.

# forgive me on Monday, February 14, 2005 12:01 PM:

I am somewhat of a linguist but know nothing about character sets, Unicode etc, beyond the fact it exists.

That said, there is next to no reason for having different language character sets in the same URL.

Warning when you had mixed alphabets on the Firefox/Internet Explorer gold bar, and also prompting I guess, would solve 99.9% of the problem.

# Michael Kaplan on Monday, February 14, 2005 12:06 PM:

Well, there are many cases that are ok:

1) Japanese could reasonably have Hiragana, Katakana, and/or Kanji and could reasonably mix the first or the second with the third (or with each other in some cases).

2) A site about the talmud with a URL of www.תלמוד or Ελλας (both examples from a post by Mark Shoulson). Note that the internet requires some things be in latin, and the other Latin additions are not unreasonable.

# forgive me on Monday, February 14, 2005 12:34 PM:

OK, point taken.

But still, what is wrong with a gold bar warning in all such cases, with prompting by default with information (which can be turned off by advanced users)?

# forgive me on Monday, February 14, 2005 12:37 PM:

Sorry, obviously prompting is only necessary at secure sites.

# Gary on Monday, February 14, 2005 5:35 PM:

A potential approach?

# Michael Kaplan on Monday, February 14, 2005 5:50 PM:

Gary, this is the kind of cool, well thought out, reasonable approach to the problem that gets buried under all of the dumbass arguments that people have about the problem on big mailing lists.

Thanks for pointing it out, it's a very cool approach!

# FIREFOX on Monday, February 14, 2005 6:30 PM:

Breaking news.

Firefox will break IDN support by default. Those who want it will be able to easily enable it.


# Michael Kaplan on Monday, February 14, 2005 9:13 PM:

Hmmm.... this is in its own way just as bad though, isn't it? Going from one extreme to the other.

Unless this is a temporary thing until they enable functionality to do IDN right. Otherwise Firefox is just saying that if you want IDN you will be screwed over by phishing.

So Firefox either gets one thumb up for admitting they were overeager and working to slow down and do it right (to get two thumbs up you have to do all that the first time!) or two thumbs down if they are just panicking and turning off the functionality....

# Minh on Monday, February 14, 2005 10:58 PM:

> That said, there is next to no reason for
> having different language character sets in
> the same URL.

The Vietnamese alphabet is a mix of latin characters & non-standard characters.

Maybe domain names should carry a "code page". So your browser can display "(English)" and "(Cyrillic)"

# Steve Hurcombe on Tuesday, February 15, 2005 1:59 AM:

The problem is not going to be solved by technology because the problem is human perception. It's what the encoded words look like that makes this technique useful.

I propose that there's only one way to fix this problem and that's to have some form of verifiability of these domain names. Maybe finally we have to accept that domain names for commercial organisations have to go through a vetting process (digital certificates) while all other domains (for personal use??) remain on the old system.

Your browser would then be able to tell you that is a genuine commercial domain name, but has not been verified. I know that have a browser plug in for IE that does something like this.

I've some more ideas on phishing emails on my blog...which follow a similar line.

Best regards

# Jonathan Payne on Tuesday, February 15, 2005 7:22 AM:

I was wondering if this could be fixed by simply not allowing similar looking domain names to be registered. That way, if you registered, your registration would "lock-out" all similar looking variants. The domain registrar could maintain a list of potentially similar characters which could be updated as problems arise. Combined with some kind of complaints procedure for catching the domains that fall through the net, I would expect this to solve most issues.

# FIREFOX on Tuesday, February 15, 2005 7:32 AM:


this is a temporary measure to protect interim security while they code out a longer term solution.

See here

# Michael Kaplan on Tuesday, February 15, 2005 7:36 AM:

Jonathan -- that is a good first line of defense, under the justification of the "squatting" and other rules that exist today. But there will need to be other solutions too, since the phishers will find things faster than the registrations do....

Very cool FIREFOX -- one thumb up. :-)

# Centaur on Tuesday, February 15, 2005 9:14 AM:

I say — Down with IDN. The standard on DNS, STD0013 (RFC1034), defines domains as dot-separated labels consisting of digits, latin case-insensitive letters, and the hyphen, and that ought to be enough for everybody.

But if they insist on being able to shoot themselves in the foot, then let there be a LooksLike equivalence relation upon the set of Unicode characters, such that '1' LooksLike 'l' LooksLike 'I' and '0' LooksLike 'O' LooksLike CYRILLIC CAPITAL LETTER O LooksLike CYRILLIC SMALL LETTER O, and let domain names be unique with respect to said equivalence.
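A toy version of such a relation, purely as a sketch: each character is mapped to one representative of its look-alike class, and two labels are equivalent when the mapped strings match. The table below is tiny and hand-built from the characters mentioned in this thread; a real table would be far larger, and the names `LOOKS_LIKE`, `skeleton` and `looks_like` are made up for this example.

```python
# Hand-built, deliberately tiny look-alike table (an assumption for illustration).
LOOKS_LIKE = {
    "1": "l", "I": "l",                  # '1' ~ 'l' ~ 'I'
    "0": "o", "O": "o",                  # '0' ~ 'O'
    "\u041e": "o", "\u043e": "o",        # Cyrillic capital and small O
    "\u0430": "a", "\u0435": "e", "\u0437": "3", "\u043a": "k",
    "\u0440": "p", "\u0441": "c", "\u0443": "y", "\u0445": "x",
}


def skeleton(label: str) -> str:
    """Replace each character with the representative of its look-alike class."""
    # Table lookup comes first so that 'I' maps to 'l' before any case folding.
    return "".join(LOOKS_LIKE.get(ch, ch.lower()) for ch in label)


def looks_like(a: str, b: str) -> bool:
    """True if two labels collapse to the same skeleton."""
    return skeleton(a) == skeleton(b)


print(looks_like("paypal", "p\u0430ypal"))                          # one Cyrillic letter
print(looks_like("paypal", "\u0440\u0430\u0443\u0440\u0430" + "1"))  # all Cyrillic plus '1'
print(skeleton("\u0440\u0435\u043a\u0430"))                          # a real Russian word
```

Both spoofs collapse to "paypal". So does the cost raised earlier in the thread: the Russian word for river collapses to "peka", which means whoever registers one spelling first locks out the other.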

# Mike Williams on Tuesday, February 15, 2005 4:21 PM:

I suspect you could still phish by combining charset substitutions with changes in the domain extension - popular browsers are mainly fixated on the .com but if you are something like .ca, or, or hell a typo like then a huge number of users are just not going to notice.

I would suggest the browser do some fuzzy string matches (in the manner of a spell-check) to compare loaded sites with sites that you have visited before.
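Something along these lines, as a sketch only: Python's `difflib` stands in for whatever matcher a browser would really use, and the function name, the history list and the 0.8 cutoff are all assumptions picked for the example.

```python
import difflib


def suspicious_lookalikes(host: str, visited: list[str], cutoff: float = 0.8) -> list[str]:
    """Return visited hosts that `host` closely resembles without being equal to."""
    if host in visited:
        return []  # an exact match is a site the user really has been to
    return difflib.get_close_matches(host, visited, n=3, cutoff=cutoff)


history = ["paypal.com", "example.org"]  # hypothetical browsing history

print(suspicious_lookalikes("paypa1.com", history))       # digit one for the final "l"
print(suspicious_lookalikes("p\u0430ypal.com", history))  # Cyrillic first "a"
print(suspicious_lookalikes("paypal.com", history))       # the real thing
```

The first two calls report "paypal.com" as a near match and the third reports nothing. The cutoff is the hard part: set it too low and every short domain name resembles every other one.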

referenced by

2005/12/20 IDN hits the uber-client
