What version of Unicode do passwords support?

by Michael S. Kaplan, published on 2011/01/20 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/01/20/10118124.aspx

The other day, the question came in again.

And when I say the question, I mean THE question.

You know, the "What Unicode version do you support?" question.

Well, technically it was a slight variation, more of a "What version of Unicode is supported in passwords?", but clearly the same question is being asked. Someone administering password policy for users of Windows in their organization wants to know what version of Unicode is used in password validation.

Unfortunately, not every question that is reasonable to ask is necessarily one that has a reasonable answer.

Sure, to start with there is everything I pointed out in the "What Unicode version do you support?" blog back at the end of 2005.

But there is a bigger issue here.

Namely, no validation is done on text in the password to remove illegal characters and/or replace them with � (U+FFFD, aka REPLACEMENT CHARACTER).

Password handling doesn't care about illegal sequences or non-characters or any other of the rules that Unicode may have.
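
To make the distinction concrete, here is a minimal Python sketch (not Windows code) that contrasts a conversion that validates Unicode with one that treats the password as opaque 16-bit code units. The sample password and both function names are hypothetical, and Python's codecs are only a stand-in for whatever a given application does.

```python
# A password containing an unpaired high surrogate (U+D800). That is
# ill-formed as Unicode text, but it is still just one 16-bit code unit.
password = "pa\ud800ss"

def validating_bytes(text: str) -> bytes:
    # A strict encoder refuses ill-formed input.
    return text.encode("utf-16-le")

def opaque_bytes(text: str) -> bytes:
    # "surrogatepass" writes every code unit out unchanged, no validation.
    return text.encode("utf-16-le", errors="surrogatepass")

try:
    validating_bytes(password)
    rejected = False
except UnicodeEncodeError:
    rejected = True

raw = opaque_bytes(password)
print(rejected)   # True: the validating path rejects the password
print(raw.hex())  # 7000610000d873007300: the opaque path keeps all five code units
```

A system that takes the opaque route has nothing in it that depends on a Unicode version; one that takes the validating route does.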

So really, it almost never ever matters!

I know, everyone is tripping on the use of the word almost in the previous sentence. They either know the exception, think they know it, or know that I am about to trot some exception out....

Well, whichever one it is for you, strap in now please!

The exception is password filters.

First there is the password filter Microsoft provides, described in "Strong Password Enforcement and Passfilt.dll". Note the rules it runs under, particularly the character classification information:

Character categories and examples:

Uppercase letters of European languages (A through Z, with diacritic marks, Greek and Cyrillic characters). Examples: A, B, C, … Z

Lowercase letters of European languages (a through z, sharp-s, with diacritic marks, Greek and Cyrillic characters). Examples: a, b, c, … z

Base 10 digits (0 through 9). Examples: 0, 1, 2, … 9

Non-alphanumeric characters (special characters)

Any Unicode character that is categorized as an alphabetic character but is not uppercase or lowercase. This includes Unicode characters from Asian languages.

Note: A given character can satisfy only one category. The GetStringTypeW function is used to test whether each character in the password is uppercase, lowercase, or alphanumeric.
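
The one-category-per-character rule quoted above can be sketched roughly in Python. This is only an approximation under stated assumptions: it uses Python's unicodedata general categories rather than GetStringTypeW, it does not restrict upper and lower case to European scripts, and the names classify and categories_used are hypothetical.

```python
import unicodedata

def classify(ch: str) -> str:
    """Assign a single character to exactly one category (rough sketch)."""
    cat = unicodedata.category(ch)
    if cat == "Lu":
        return "uppercase"
    if cat == "Ll":
        return "lowercase"
    if ch in "0123456789":
        return "digit"
    if cat.startswith("L"):
        # Alphabetic but neither uppercase nor lowercase (e.g. CJK ideographs).
        return "other alphabetic"
    # Everything else is lumped together in this sketch.
    return "non-alphanumeric"

def categories_used(password: str) -> set:
    # Each character contributes to one category only.
    return {classify(ch) for ch in password}

print(sorted(categories_used("Pa5s!語")))
```

The answers come from whichever Unicode character database the classifier was built against, which is exactly where a version dependency can creep in.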

Okay, now obviously if you are using the built-in passfilt.dll then there are some Unicode version dependencies here, based on the version of Unicode used to supply data for GetStringTypeW. And it possibly does not account for different Unicode normalization forms.

And of course if you use other password filter providers instead of or in addition to this built-in one (or even create your own, ref: "Password Filter Programming Considerations"), it could contain rules with version dependencies as well.
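
As a rough illustration of both points, the following Python sketch uses the unicodedata module as a stand-in for the data behind GetStringTypeW (an assumption; the two are not the same database). It shows that such data is tied to a specific Unicode version, and that the same visible letter in a different normalization form becomes different code points with different classifications. It makes no claim about how passfilt.dll actually treats a combining mark.

```python
import unicodedata

# The Unicode version behind this Python build's character database.
print(unicodedata.unidata_version)

composed = "\u00c9"                                   # É as a single code point
decomposed = unicodedata.normalize("NFD", composed)   # E followed by U+0301

print(len(composed), len(decomposed))                 # 1 2
print([unicodedata.category(c) for c in composed])    # ['Lu']
print([unicodedata.category(c) for c in decomposed])  # ['Lu', 'Mn']
```

A filter that counts characters or categories without normalizing first could therefore score these two spellings of the same letter differently.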

I personally find these rules to be rather weird in some senses, since there are "Unicode-esque" tricks that could be used to make a password much harder to guess or divine, or even to get at via keyloggers in the case of specially designed keyboards and unusual Unicode characters, and yet all such complexities are given minimal weight under the default passfilt.dll algorithm.

In the past (prior to working for Microsoft) I have helped several customers create both unusual keyboards as alluded to above and password filter DLLs that treat many of the complexities of Unicode as a valid method of increasing password complexity. And Unicode is a great area to help provide such additional complexity and to increase the ability to ferret out common lookalikes that Unicode might provide.
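
One way a custom filter might ferret out lookalikes is sketched below in Python. This is not the approach used in any of the filters mentioned above. The LOOKALIKES table is a tiny hand-picked sample rather than any official confusables data, and skeleton and is_disguised are hypothetical names.

```python
import unicodedata

# Hand-picked sample of Cyrillic/Greek letters that resemble Latin ones.
# Hypothetical and far from complete.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}

def skeleton(text: str) -> str:
    # NFKC first folds compatibility forms such as fullwidth letters,
    # then the table maps known lookalikes onto their Latin twins.
    folded = unicodedata.normalize("NFKC", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in folded)

def is_disguised(password: str, common_word: str) -> bool:
    # True when the password differs from a common word only by lookalikes.
    return password != common_word and skeleton(password) == common_word

print(is_disguised("p\u0430ssword", "password"))  # True
print(is_disguised("password", "password"))       # False
```

Both the normalization data and any lookalike table change between Unicode versions, so a filter built this way inherits a version dependency of its own.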

Unicode version information has certainly bled into some of those (either dependent on or completely independent of the information GetStringTypeW provides).

But other than such things, passwords know no Unicode version....

Warning: the above applies to Windows; other applications may work by entirely different rules and when/if they do such rules tend to be STUPIDER in one or more senses.

Cheong on 20 Jan 2011 5:27 PM:

But... while the authentication providers don't care, the passwords themselves are different when hashed.

Say... if Windows stored the password hashed in UTF-16, and you pass UTF-8 in for hash comparison, wouldn't the result be different?

Michael S. Kaplan on 20 Jan 2011 9:21 PM:

If you use a conversion mechanism that validates whether the Unicode is "valid" then yes, this can happen. But Windows does not treat it as a string that is to be hashed in a given encoding....
