Normalize Wide Shut

by Michael S. Kaplan, published on 2007/08/17 03:46 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/08/17/4424917.aspx


(Apologies to Stanley Kubrick, of course!) 

It was almost the very first blog post I ever wrote, back at the end of November 2004, entitled Normalization and Microsoft -- whats the story?

In it I mentioned that during my time at Microsoft I really had heard of four different uses of the word normalization.

Well, the day before yesterday I heard of another.

It seems that the ManagementAgent class has a NormalizeString method which (according to MSDN):

The NormalizeString(String) method enables the user to normalize case and accent in a string according to the setting for that particular MA. By calling this function, the user can normalize the string according to the connected directory format during provisioning. As a result, when the management agent string is imported back to the connector space from the target directory, the string value imported will be the same as what was written to connector space at provisioning time, allowing confirmation of the export.

As long as I am being all quote happy I'll include the remarks:

Certain directories, such as RACF, TopSecret, or ACF2 change text strings that are imported into the directory to remove accents from text characters or to make the text all upper case. When data from the management agent is imported back into the connector space, the string in connector space is not the same as what was staged for export, since the directory will have modified the value. When you use this method, you can set case rules on the string, which makes the string all upper case, or accent rules, which removes accent characters from the string. Since normalization is carried out only in outbound synchronization, setting the initial value of an attribute in CS where the configuration specifies normalizing both case and accents would yield the following:

| MV | Data flow | CS uses NormalizeString | Data flow | Connected directory |
| café | ---> Initial attribute value (provisioning) Or EAF | String is normalized: café <--> CAFE | --> Export | CAFE (RACF Directory) |
| café | ---> | Exported data is re-imported: CAFE <--> CAFE | <-- Stage | CAFE (RACF Directory) |

This method is used on the Extensible Management Agent and the XML Management Agent. The management agents can be call-based or file-based.

Once you have created the management agent that contains this method, you must use the Identity Manager UI to set the options for how the string should be normalized. In Identity Manager, you will need to create a new management agent. In the Create Management Agent UI, on the Configure Connection Information page, the management agents can be set as call-based or file-based. You must select the Import and Export radio button as the step type. On the Configure Attributes page, there are two check boxes: Upper Case and Remove Accents. Select one or both options.
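To make the round trip described above concrete, here is a minimal sketch in Python. It is not the actual ManagementAgent.NormalizeString implementation; the function names, the two boolean options (mirroring the Upper Case and Remove Accents check boxes), and the use of Unicode decomposition to strip accents are all assumptions made for illustration.

```python
import unicodedata


def normalize_string(value, upper_case=True, remove_accents=True):
    """Hypothetical stand-in for the MA's case/accent normalization."""
    if remove_accents:
        # Decompose, drop the combining marks, then recompose what is left.
        decomposed = unicodedata.normalize("NFD", value)
        stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        value = unicodedata.normalize("NFC", stripped)
    if upper_case:
        value = value.upper()
    return value


def directory_store(value):
    """Hypothetical directory that upper-cases and strips accents on write,
    the way the remarks describe RACF, TopSecret, or ACF2 behaving."""
    return normalize_string(value, upper_case=True, remove_accents=True)


def export_confirmed(staged, reimported):
    """The export is confirmed only if the re-imported value matches
    what was staged in the connector space."""
    return staged == reimported


mv_value = "café"

# Without normalization, the staged value differs from what comes back.
unconfirmed = export_confirmed(mv_value, directory_store(mv_value))

# With normalization at provisioning time, the two sides agree.
staged = normalize_string(mv_value)
confirmed = export_confirmed(staged, directory_store(staged))

print(staged, unconfirmed, confirmed)
```

The point of the sketch is only the comparison at the end: if the connector space holds the already-normalized value, whatever the directory does to the string on write no longer causes a mismatch on re-import.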

It is actually a little-known fact, but if you call LCMapString or LCMapStringEx with nothing but NORM_IGNORENONSPACE, you can actually see it do a bit of diacritic stripping (it will not do as complete a job as the approaches my prior posts on this subject get into across all versions of the .NET Framework and Windows, but it was a step long before my blog existed). Hell, before .NET even existed as a concept. It never occurred to me to call it that way....

Anyway, there is also the Aux.NormalizeString method that is part of the Team Foundation Server SDK. With documentation that does not explain what it does, and that even goes so far as to say directly "This method supports the .NET Framework infrastructure and is not intended to be used directly from your code.", who knows what to expect!

So feel free to think of the Aux.NormalizeString method as the "mystery meat" of the normalization world, and don't spend too much time looking directly at it. :-)

But anyway, this fifth Microsoft meaning of "normalization" should be added to the list, along with the sixth method, which is a real mystery. I honestly didn't think I got the fifth one until I had read all of the text in the topic, and even after that I was not 100% sure I got it. If nothing else it will add to the challenge of searching for that NormalizeString function on MSDN. :-)

 

This post brought to you by ƕ (U+0195, a.k.a. LATIN SMALL LETTER HV)


# Doug on 17 Aug 2007 8:30 AM:

Perhaps "normal" is wishful thinking when it isn't applied to a minimal universal representation for the data.

"Normalize" just sounds more impressive than "cleanup" or "make valid."

# Michael S. Kaplan on 17 Aug 2007 10:38 AM:

I suppose we should be grateful and say hosannas that they did not use Disambiguate in the title? :-)

