The vector of this spam is [apparently] indeterminate

by Michael S. Kaplan, published on 2008/05/11 15:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/05/11/8488020.aspx

I have talked in the past about my feelings regarding the REPLACEMENT CHARACTER in blogs such as The torrents of U+fffd (aka When security and conformance trump compatibility and reality).

And I have even mentioned in the blog A less intelligent strain of blog spam how spam attempts mostly seemed equal across blogs, though there were specific blogs that seemed to attract more of them. I finally showed in the blog Microsoft is giving this character nada weight but lotsa importance that one blog in particular (Getting the real (localized) name of the keyboard) seemed quite susceptible to spam containing lots of U+fffd.

Even though no other blog seemed to be.

Now generally it takes the splog/spam a bit of time to start hitting a post. It has to be live for at least a few weeks, and more commonly a few months, before it starts getting hit.

But suddenly my blog from this time yesterday (Why Bengali keyboards can't be found on XP 64 bit) broke all of those rules:

The first instance came just hours after it went live;
Like the spam sent to Getting the real (localized) name of the keyboard, the comments are made up almost entirely of U+fffd.

Here is a screenshot looking at them:

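Now I am curious what these two blogs (Getting the real (localized) name of the keyboard and Why Bengali keyboards can't be found on XP 64 bit) could possibly have in common with each other that is different from the other 2540-some blogs in Sorting it all Out.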

What makes them stand out in particular? What are the spam/splog sites targeting?

The sites they try to point to provide no patterns here, and at this point I believe they are not actually relevant.

Now I speak fluent notdef glyph as well as anyone, and to be honest better than most.

And I instinctively feel that the actual information that they were trying to encode, whether it was intended for phishing purposes or not, would provide some insight into what they were trying to do. And the loss of this information, due to the security fix that updated UTF-16, UTF-8 & UTF-32 to conform with Unicode 5.0's security concerns, means I will never be able to understand what the attempted attack vector was.
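To make the information loss concrete, here is a minimal sketch in Python rather than the .NET code the fix applies to. The two payloads are hypothetical (nothing is known about the real spam bytes), and it assumes the spam arrived as ill-formed UTF-8, which is only a guess. It shows that once a conformant decoder substitutes U+fffd, two different inputs become indistinguishable and the original bytes cannot be rebuilt from the text:

```python
# Hypothetical payloads: two different ill-formed UTF-8 byte sequences.
# 0xC0 0xAF is the classic overlong encoding of "/", and 0xC1 0xBF is an
# overlong encoding of U+007F; conformant decoders must reject both.
payload_a = b"name=\xc0\xaf"
payload_b = b"name=\xc1\xbf"


def decode_lossy(raw: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for anything ill-formed."""
    return raw.decode("utf-8", errors="replace")


text_a = decode_lossy(payload_a)
text_b = decode_lossy(payload_b)

# Both payloads collapse to the same string of replacement characters...
print(text_a == text_b)

# ...and re-encoding the text does not give back the bytes that were sent.
print(text_a.encode("utf-8") == payload_a)
```

Anything that only ever sees the decoded text, such as a comment moderation queue, has no way to tell which of the two payloads was sent.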

Obviously there is no problem catching this particular kind of spam; it has never gotten through at all. So it could hardly be said to be a very effective attack vector.

And with no more than 2-10 instances being sent to each blog per day, it is hardly the most common form of spam that fails to make it through the filters on Sorting it all Out.

But I am very curious about what the hell links up these two blogs to this particular mechanism and feel that the conformance changes have robbed me of an effective way to ever find out what the vector may be here!

Is this so crazy?

This post brought to you by � (U+fffd, a.k.a. REPLACEMENT CHARACTER)

