Normalization as obfuscation in C#

by Michael S. Kaplan, published on 2005/02/07 12:32 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/02/07/368570.aspx


Take a look at the following code and let me know what you think of it (compiled with Whidbey Beta 2; note the preview of the exciting new StringInfo methods for dealing with text elements!):

namespace àáâãäå {
using System;
using System.Text;
using System.Globalization;

    class àáâãäå
    {
        [STAThread]
        static void Main(string[] args) {
            àáâãäå(); àáâãäå(); àáâãäå();
            àáâãäå(); àáâãäå(); àáâãäå(); àáâãäå();
        }
        static void àáâãäå(string àáâãäå) {
            StringBuilder àáâãäå = new StringBuilder();
            StringInfo àáâãäå =  new StringInfo(àáâãäå);

            àáâãäå.Append(àáâãäå.Normalize(NormalizationForm.FormC));
            àáâãäå.Append(": ");

            for(int àáâãäå=0; àáâãäå < àáâãäå.LengthInTextElements; àáâãäå++) {
                string àáâãäå = àáâãäå.SubstringByTextElements(àáâãäå, 1);
                if(àáâãäå.IsNormalized(NormalizationForm.FormC)) {
                    àáâãäå.Append("C");
                } else if(àáâãäå.IsNormalized(NormalizationForm.FormD)) {
                    àáâãäå.Append("D");
                } else {
                    àáâãäå.Append("_");
                }
            }
            Console.WriteLine(àáâãäå.ToString());
            return;
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
        static void àáâãäå() {
            àáâãäå.àáâãäå("àáâãäå");
        }
    }
}

It compiles, even though the namespace, the class name, every procedure (other than Main), and every variable all look like the same string.

The wonders of various combinations of the string "àáâãäå".
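
A minimal sketch of why this can work, assuming the look-alike names differ only in normalization form (the class, method, and variable names here are illustrative and not taken from the listing above). Two strings can render identically yet differ code unit by code unit, and they only compare equal ordinally once both are converted to the same form:

```csharp
using System;
using System.Text;

static class NormalizationDemo
{
    // True when the two strings match code unit by code unit.
    public static bool SameCodeUnits(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }

    // True when the two strings match after both are converted to Form C.
    public static bool SameAfterFormC(string a, string b)
    {
        return SameCodeUnits(a.Normalize(NormalizationForm.FormC),
                             b.Normalize(NormalizationForm.FormC));
    }

    static void Main()
    {
        // Escapes keep the two spellings distinct regardless of the editor used.
        string composed = "\u00E0";     // U+00E0 LATIN SMALL LETTER A WITH GRAVE
        string decomposed = "a\u0300";  // 'a' followed by U+0300 COMBINING GRAVE ACCENT

        Console.WriteLine(composed.Length);                       // 1
        Console.WriteLine(decomposed.Length);                     // 2
        Console.WriteLine(SameCodeUnits(composed, decomposed));   // False
        Console.WriteLine(SameAfterFormC(composed, decomposed));  // True
    }
}
```

With six letters that can each be spelled either way, there is plenty of room for names that look alike without being identical.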

(Interestingly, due to all the exciting work of someone from the VS team, the cursor moves over the text elements as letters and thus it was an interesting challenge getting this written!)

Do you think it makes the code less readable? :-)

Compile it on the command line:

c:\temp>csc àáâãäå.cs

and then run it to see the output:

c:\temp>àáâãäå
àáâaäå: CCCCCC
àáâaäå: DDDDDD
àáâaäå: DCCCCC
àáâaäå: DDCCCC
àáâaäå: DDDCCC
àáâaäå: DDDDCC
àáâaäå: DDDDDC

Interesting, no? :-)
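
One way such inputs might be produced, offered as a sketch and not as the method actually used for this post: decompose the first few letters to Form D and leave the rest in Form C. The helper names `MixForms` and `Classify` are hypothetical; `Classify` mirrors the per-text-element check in the listing.

```csharp
using System;
using System.Globalization;
using System.Text;

static class MixedFormDemo
{
    // Returns text with its first decomposedCount text elements in Form D
    // and the remaining text elements in Form C.
    public static string MixForms(string text, int decomposedCount)
    {
        StringInfo info = new StringInfo(text.Normalize(NormalizationForm.FormC));
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < info.LengthInTextElements; i++)
        {
            string element = info.SubstringByTextElements(i, 1);
            result.Append(i < decomposedCount
                ? element.Normalize(NormalizationForm.FormD)
                : element);
        }
        return result.ToString();
    }

    // Labels each text element C, D, or _ in the same way as the listing.
    public static string Classify(string text)
    {
        StringInfo info = new StringInfo(text);
        StringBuilder labels = new StringBuilder();
        for (int i = 0; i < info.LengthInTextElements; i++)
        {
            string element = info.SubstringByTextElements(i, 1);
            if (element.IsNormalized(NormalizationForm.FormC)) labels.Append('C');
            else if (element.IsNormalized(NormalizationForm.FormD)) labels.Append('D');
            else labels.Append('_');
        }
        return labels.ToString();
    }

    static void Main()
    {
        // The six letters from this post, written as escapes in Form C.
        string letters = "\u00E0\u00E1\u00E2\u00E3\u00E4\u00E5";
        for (int k = 0; k <= 6; k++)
        {
            Console.WriteLine(Classify(MixForms(letters, k)));
        }
    }
}
```

Run as written, the loop walks from the all-composed pattern to the all-decomposed one, so it covers the same seven patterns as the output above, though in a different order.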

 

This post brought to you by "à", "á", "â", "ã", "ä", and "å" (U+00e0, U+00e1, U+00e2, U+00e3, U+00e4, and U+00e5, a.k.a. LATIN SMALL LETTER A WITH GRAVE, LATIN SMALL LETTER A WITH ACUTE,  LATIN SMALL LETTER A WITH CIRCUMFLEX, LATIN SMALL LETTER A WITH TILDE, LATIN SMALL LETTER A WITH DIAERESIS, and LATIN SMALL LETTER A WITH RING ABOVE)
Well who did you think would be willing to sponsor this rubbish? :-)


# Brodie Thiesfield on 7 Feb 2005 10:57 AM:

It should be a compiler error if identifiers are the same after being normalized to a fixed type. Just to stop the sort of confusing bug where it LOOKS like you're using the correct identifier, but in reality it's a different one. This wouldn't happen when everyone uses the same editor for the code, but if some developers choose to use a different editor which has a different normalization mode to the other developer's editor...

# Michael Kaplan on 7 Feb 2005 11:00 AM:

Well, I will not disagree with you, but this is not an error in 1.0, 1.1, or 2.0 of .NET as shipped by Microsoft (if you have an application that does not call Whidbey-specific methods), so technically that would be a breaking change.

Though maybe FxCop can make it a warning. :-)
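
A rough sketch of what such a check could look like, written as a standalone helper and not as an actual FxCop rule (no FxCop API is used, and `FindCollisions` is a hypothetical name). It groups identifiers by their Form C spelling and reports any group that contains more than one distinct raw spelling:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class IdentifierCollisionCheck
{
    // Maps each Form C spelling to the distinct raw spellings that normalize to it,
    // keeping only the groups where two or more spellings collide.
    public static Dictionary<string, List<string>> FindCollisions(IEnumerable<string> identifiers)
    {
        var groups = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (string name in identifiers)
        {
            string key = name.Normalize(NormalizationForm.FormC);
            List<string> spellings;
            if (!groups.TryGetValue(key, out spellings))
            {
                spellings = new List<string>();
                groups.Add(key, spellings);
            }
            // List<string>.Contains compares ordinally, so look-alikes stay distinct.
            if (!spellings.Contains(name)) spellings.Add(name);
        }

        var collisions = new Dictionary<string, List<string>>(StringComparer.Ordinal);
        foreach (KeyValuePair<string, List<string>> group in groups)
        {
            if (group.Value.Count > 1) collisions.Add(group.Key, group.Value);
        }
        return collisions;
    }

    static void Main()
    {
        // Hypothetical identifier list: two look-alike spellings and one ordinary name.
        string[] names = { "\u00E0b", "a\u0300b", "count" };
        foreach (KeyValuePair<string, List<string>> c in FindCollisions(names))
        {
            Console.WriteLine("{0} has {1} distinct spellings", c.Key, c.Value.Count);
        }
    }
}
```

How the identifier list is gathered from source or metadata is left open here; the sketch only covers the comparison step.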

# CN on 7 Feb 2005 12:34 PM:

Somehow, it made me think of a line in a quite funny Swedish poem. (It's dialectal and a semi-phonetic representation.)

Å i åa ä e ö.

Why use anything but vowels if you don't have to?

("Och i ån är en ö." is a "normalized" version of that string. From there on, I leave it as an exercise to the reader.)

(Did you happen to be inspired by the IDN phishing post on Slashdot et al today? I even saw comments linking back to your blog where you talked about it in general terms.)

# Michael Kaplan on 7 Feb 2005 12:37 PM:

Nah, I had this one planned a week ago (that's when I originally wrote the code!). Though I would like to stay under *that* radar....

# Dean Harding on 7 Feb 2005 2:54 PM:

I think firefox is rendering the text incorrectly. When I first looked at this page in firefox I thought "what do you mean, they all look the same - no they don't". Then I switched to IE and had a look, and they all *do* look the same.

To compare:
IE: http://www.codeka.com/tmp/norm/ie.png
Firefox: http://www.codeka.com/tmp/norm/firefox.png

Notice how firefox doesn't seem to be composing the characters properly? I wonder why that is... perhaps if I could be bothered, I might submit a bug report (unless there already is one).

I'm always torn between IE and firefox... I switch to one for a while, but then I'll switch back, and then later switch again...

# Michael Kaplan on 7 Feb 2005 2:59 PM:

Yikes, poor support for Normalization Form D combining marks seems like a pretty bad bug to me!

Though even IE will mess it up for some fonts -- using Uniscribe can help but using fonts with good info helps even more....

Probably worth reporting the bug, though. Better international support is always worthwhile (I pointed out many Extension B bugs in Mozilla back in the day).

# Adi Oltean on 8 Feb 2005 12:06 AM:

How about using the romanian letter ă or Ă
(I hope that IE will get this right :-)

# Dean Harding on 8 Feb 2005 2:33 AM:

Filed bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=281483

If you'll note, I tried it on Mozilla when I got home, and it's even worse there (well, arguably so - at least Mozilla was consistent) because it didn't combine any of the Form D characters!

# bg on 8 Feb 2005 5:18 AM:

> Interesting, no? :-)

NO ;]

# Michael Kaplan on 8 Feb 2005 7:59 AM:

Adi -- worked in IE no problem. :-) I actually just chose the first bunch I saw; I could have included more (I probably would have to if I were building an obfuscator for a big application, as even with a factorial number of possibilities it would be a huge name map)!

Cool on the bug, Dean -- maybe they will even fix it (shame they do not use Uniscribe on Win32, but I guess I understand why they choose not to "sully" themselves that way). At least it works in IE!

bg -- sorry you didn't like it, but I gave up a long time ago trying to please everybody....

# Dean Harding on 8 Feb 2005 2:59 PM:

Apparently Uniscribe is in the works. My bug got attached as a dependency to https://bugzilla.mozilla.org/show_bug.cgi?id=218887 "Use Uniscribe APIs in GFX:Win and layout/editor" so it looks like that's the plan. Mind you, that one's over a year old, and not much progress in a while... so I'm not holding my breath for a quick fix.

