by Michael S. Kaplan, published on 2008/10/29 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/10/29/9021615.aspx
Regular reader Jan Kučera, in response to Stripping is an interesting job (aka On the meaning of meaningless, aka All Mn characters are non-spacing, but some are more non-spacing than others), asked in a comment:
Any way for the .NET Compact Framework people out there? (as the normalization does not seem to be available for them...)
Yes it is true, Unicode normalization support like System.String.Normalize does not exist in the Compact Framework.
By the way, I know Jan did not ask the question in the title!
But I suppose that is why they call it compact, in part -- because they didn't put everything in there. :-)
One can always do it oneself, I suppose -- Unicode Normalization can be implemented by anyone!
And as I mentioned almost four years ago in Normalization and Microsoft -- whats the story?, one can always just call FoldString and get a bunch of the functionality -- just not everything since the tables aren't as up to date.
Or one could find some third party library and call it. Anyone have a favorite?
All of this is a lot of work to perform a destructive job like stripping -- note that although I never judged Valerie for her choices vis-a-vis stripping and school, I do tend to judge code that strips out diacritics that actually might contain meaning. Negatively.
In other words, I may have provided the code to do it (code that to be frank is much better than this code, for several reasons that I'll get into some other time), but I still judge the people who use it.
This is the very combination of logic and meanness that can drive nice people insane -- good thing I'm not nice! :-)
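For anyone who wants to see what the job amounts to, independent of platform, here is a minimal sketch in Python (not Compact Framework code, and not the code from the earlier post): decompose canonically, drop everything in general category Mn, then recompose. The function name `strip_diacritics` is made up for illustration. It also shows why I judge: the stripped result can be a different letter altogether.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove non-spacing marks (category Mn) after canonical decomposition.

    Hypothetical helper for illustration only; this is a lossy operation.
    """
    # NFD splits precomposed characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep everything that is not a non-spacing mark.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left so the result is in NFC.
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(strip_diacritics("Kučera"))  # Kucera
    # KATAKANA LETTER PI decomposes to HI + a combining semi-voiced mark,
    # so stripping turns PI into HI, which is a different syllable.
    print(strip_diacritics("\u30d4") == "\u30d2")  # True
```

On a platform without a Normalize method, the decomposition step is the part that has to come from somewhere else (FoldString, a third party library, or your own tables); the filtering step is the easy part.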
This post brought to you by ピ (U+30d4, a.k.a. KATAKANA LETTER PI)
Jan Kučera on 29 Oct 2008 4:12 AM:
Hi, searching your past posts in this area I've already got the idea that FoldString might do the trick, plus checked the documentation that this one is on Windows CE for quite a time as well.
What's quite confusing is the documentation, since on desktop, it is located in Kernel32.dll, and the only hints in CE documentation are that the header is in Winnls.h and link library is Coreloc.lib... which is not as helpful as one might think - to a managed stranger thronged in a compact space. :)
So anybody knowing the dll I might use for PInvoke signature? That would be nice ending to this story...
Jan Kučera on 29 Oct 2008 4:33 AM:
PS. As for the judging, I am not a fan of this stripping either, but one must strip here to save some money on sms... - you can send two and fourth message for the same money if you consent to strip.
I have no idea why such space wasting encoding as UCS-2 was chosen here (yeah, probably to earn a lot of money), but if you establish a movement to support national text messages then I would be the first one who would join you! :)
Moreover, I don't think every religion is that much comfortable with stripping, either...
John Cowan on 29 Oct 2008 10:21 AM:
UCS-2 was the only Unicode encoding that existed back in 1985 when the standards for SMS were set. The first SMS message was sent in 1992, only a few months after Ken Thompson devised UTF-8. And making backwards-incompatible changes to all the world's cell phones is unthinkable at this point. So people texting in alphabetic languages that don't fit in the constraints of the GSM 03.38 encoding can either strip diacritics (sometimes with murderous results, as this blog has noted) or put up with the 70-character maximum of UCS-2 SMS.
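The cost of that choice can be put in rough numbers. The following is a sketch only, in Python, under stated assumptions: the alphabet tables are transcribed from the GSM 03.38 default alphabet and its extension table and should be checked against the specification; the 160 and 153 (7-bit) and 67 (UCS-2) limits for single and concatenated messages are not from this thread; and the function name `sms_segments` is made up for illustration. Real handsets and operators may differ, for instance through national shift tables.

```python
import math

# GSM 03.38 default alphabet (one septet each). Transcribed for illustration;
# verify against the specification before relying on it.
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table characters cost two septets (escape + character).
GSM_EXTENSION = set("^{}\\[~]|€")


def sms_segments(text: str) -> int:
    """Estimate how many SMS messages a text needs (hypothetical helper)."""
    septets = 0
    for ch in text:
        if ch in GSM_BASIC:
            septets += 1
        elif ch in GSM_EXTENSION:
            septets += 2
        else:
            # One character outside the 7-bit alphabet forces the whole
            # message into UCS-2: 70 units alone, 67 per concatenated part.
            units = len(text.encode("utf-16-be")) // 2
            return 1 if units <= 70 else math.ceil(units / 67)
    # Pure 7-bit text: 160 septets alone, 153 per concatenated part.
    return 1 if septets <= 160 else math.ceil(septets / 153)


if __name__ == "__main__":
    print(sms_segments("ř" * 100))  # 2 (UCS-2)
    print(sms_segments("r" * 100))  # 1 (7-bit, after stripping)
```

A 100-character message with a single ř in it costs two messages; the stripped version costs one.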
Michael S. Kaplan on 29 Oct 2008 11:33 AM:
Hey Jan --
The 3.0 eMbedded Visual Tools included a copy of WIN32API.TXT tailored for the CE platform. I can't grab a copy of it at the moment, but as articles like Paul Yao's (http://msdn.microsoft.com/magazine/cc301473.aspx) point out, it is largely a matter of replacing the DLL name with "coredll" and calling it a day....
For the stripping issue vis-a-vis religion, it is a choice and a judgment call, I think. Valerie was very well aware that working as a stripper would actually make her more than her physical therapy degree would provide, but she stopped doing it after she graduated since it was just a means to an end. Although we would vigorously discuss the issue at times, I never judged her for it, though as you point out some would...
Jan Kučera on 29 Oct 2008 4:09 PM:
John, thanks for the info, I was not aware of this timing. However, UCS-2 was also a backwards-incompatible change and I think the only real problem with any new change is, that it would lower profits of the operators, which is just unacceptable, no matter how right thing it might be.
(leaving aside that there are characters which do not fit into UCS-2 at all and they are still encoding...)
Michael S. Kaplan on 29 Oct 2008 4:43 PM:
Any implementation that has the fonts/rendering info will support supplementary characters just fine, so the UCS-2 is not a blocker (surrogate code units are for UCS-2 just random undefined code points).
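To make that concrete, here is a small Python sketch (the helper names `utf16_code_units` and `is_surrogate` are made up for illustration). A supplementary character becomes two 16-bit code units, both in the surrogate range D800..DFFF, which a strictly UCS-2 transport simply carries along as two units it has no particular opinion about.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units of text in UTF-16 (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    """True if a 16-bit code unit falls in the surrogate range."""
    return 0xD800 <= unit <= 0xDFFF


if __name__ == "__main__":
    # U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
    units = utf16_code_units("\U0001D11E")
    print([hex(u) for u in units])            # ['0xd834', '0xdd1e']
    print(all(is_surrogate(u) for u in units))  # True
    # A BMP character such as U+30D4 is a single, non-surrogate unit.
    print([hex(u) for u in utf16_code_units("\u30d4")])  # ['0x30d4']
```

The receiving end only needs to pair the units back up and have a font for the result.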
Jan Kučera on 30 Oct 2008 9:20 AM:
Actually if both of us are on Windows Mobile we could use UCS-2 transfer to hold our custom encoding/decoding... cool, I might try this one. :)
And thanks for the "coredll" tip, it works... now I can see stripping even on phone... not that quality as on desktop, you know, but good enough for my purposes! ;-)
PS. And I also wanted to add, that it's not me just nitpicking, we really do have phones here (and not that old at all!), which are not capable of receiving UCS-2 messages...
Michael S. Kaplan on 30 Oct 2008 1:58 PM:
Sounds like we have some hardware providers to take out back and beat the crap out of!