I've got [SCS]U under my skin....

by Michael S. Kaplan, published on 2009/06/03 09:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/06/03/9689748.aspx

Apologies to Frank for the title!

Just last night, old friend and compadre from the Unicode days¹ Doug Ewell asked me:

Hey Michael,

This should be an easy question, but I can't find anything about how to do this in C#.  (I know, should have used Live or Bing or whatever instead of Google.)

I'm talking about a System.Text.Encoder, of course, and I want to do it for SCSU.  And of course I want to do a decoder too. :)

Thanks for any pointers,

(brought to you by the byte 0x0E, which is the SCSU tag SQU)

I probably should explain how Doug has an unhealthy² affection for SCSU (UTS#6: A Standard Compression Scheme for Unicode) and perhaps to a lesser extent BOCU (UTN#6: BOCU-1 - MIME-Compatible Unicode Compression), some compression schemes for Unicode which in theory can do a better job than programs like WinZip because of their specific knowledge of properties of Unicode itself.

Of course, neither SCSU nor BOCU is your typical "encoding" since they are basically Unicode. But then again so are UTF-8, UTF-16, and UTF-32, and .NET has no trouble calling them encodings, so there should be no problems there.
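To make the "encoding" idea concrete, here is a minimal sketch of the simplest byte stream that should still count as valid SCSU under a plain reading of UTS#6: ASCII passes through as single bytes, and every other BMP character is quoted with the SQU tag (0x0E) followed by its high and low bytes. It does no actual compression (no windows, no Unicode mode), it rejects surrogates and supplementary characters, and the function names are hypothetical. Python is used only to keep it short; the same logic is what a GetBytes/GetChars pair would have to carry.

```python
SQU = 0x0E  # SCSU "Quote Unicode" tag: the next two bytes are one code unit (high, low)

# Bytes that stand for themselves in SCSU's initial single-byte mode.
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0D} | set(range(0x20, 0x80))


def scsu_quote_encode(text: str) -> bytes:
    """Encode BMP text as uncompressed SCSU (hypothetical helper, sketch only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            # Assumption: supplementary characters are out of scope for this sketch.
            raise ValueError("sketch handles non-surrogate BMP characters only")
        if cp in PASS_THROUGH:
            out.append(cp)                      # ASCII text and TAB/LF/CR/NUL as-is
        else:
            out += bytes((SQU, cp >> 8, cp & 0xFF))  # quote everything else
    return bytes(out)


def scsu_quote_decode(data: bytes) -> str:
    """Decode only the subset written by scsu_quote_encode (not a full SCSU decoder)."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == SQU:
            if i + 2 >= len(data):
                raise ValueError("truncated SQU sequence")
            chars.append(chr((data[i + 1] << 8) | data[i + 2]))
            i += 3
        elif b in PASS_THROUGH:
            chars.append(chr(b))
            i += 1
        else:
            raise ValueError(f"unsupported SCSU byte 0x{b:02X} in this sketch")
    return "".join(chars)


if __name__ == "__main__":
    encoded = scsu_quote_encode("A\u00e9")
    print(encoded.hex(" "))  # 41 0e 00 e9
```

A real implementation would earn its keep by choosing windows and modes so that most characters cost one byte; quoting everything, as here, costs three bytes per non-ASCII character and is only meant to show the shape of the format.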

Anyway, under the old theory about what it means when no samples exist for a technology, I'll point to a sample here from Shawn of an overriding of System.Text.Encoding (with fallback behavior) that in his words "just reverses a-z, A-Z & 0-9 in ASCII". With changes, this should allow one to implement the encoding work to support SCSU (or even BOCU!) if one wants.

I would do it, though there is the whole "involvement with Unicode" thing that I grappled with previously, and I think it's best for the actual working sample to be left for someone without the conflict.

Like Doug! :-)

Now this is not "implementing an Encoder and a Decoder", but since you can get text in and out of SCSU I think it is good enough. Perhaps there are nuances, though; I have never looked too closely at the differences. It should be enough to get started, at least. And if not, Doug will surely let me know!


1 - I have made my peace with the fact that the connection I have with Unicode, which really was confusing from a Microsoft standpoint since it was largely outside of the company's own efforts to maintain that relationship, is over. I take the Bulldog award as quite an ending achievement, like a pitcher pitching a "perfect game" right before retiring. And thus I can talk about "the Unicode days" as a distinct entity. :-)
2 - Not in the clinical sense, mind you. I am just having some fun at Doug's expense.


This blog brought to you by (U+eeee, a private use character known for its resemblance to 50's housewives being scared onto chairs by mice!)

Doug Ewell on 3 Jun 2009 11:55 AM:

BOCU-1: no, no.  It's encumbered.  I used to have an unhealthy affection for BOCU-1 until IBM slapped on the license restriction.  Now I wouldn't touch it.

Michael S. Kaplan on 3 Jun 2009 4:38 PM:

Fair enough, dude. For the record, I was one of the loud voices against giving it UTS status after that license was put on it (over and above the official Microsoft position, which was also against it), so I know what you mean!
