I agree, the best way to process Unicode input is indeed to make somebody else do it

by Michael S. Kaplan, published on 2007/10/26 10:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/10/26/5677876.aspx


I saw Raymond Chen's The best way to process Unicode input is to make somebody else do it and I wholeheartedly second that motion.

Wow, riffing on Raymond two posts in a row!

And there are lots of reasons for it -- not just the ones he talked about, but also others, like the problems that came up the next day in If you would wait till I *FINISHED* what I was trying to say, you punk... (aka Premature validation), where code that eagerly validated Unicode character input as it arrived failed to account for one of the fundamental issues with character input: supplementary characters.
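
To make the supplementary-character pitfall concrete, here is a minimal Python sketch. It assumes input arrives one UTF-16 code unit at a time (much as a Win32 window can receive the two halves of a surrogate pair as two separate character messages); the function and class names are hypothetical and are not any Windows API. A per-unit check rejects both halves of U+20000, while a validator that holds on to a high surrogate until its partner arrives accepts the complete character.

```python
from typing import List, Optional


def naive_validate_unit(unit: int) -> bool:
    """Hypothetical eager check: judge each UTF-16 code unit on its own."""
    # Surrogates are not characters by themselves, so this rejects both
    # halves of every supplementary character.
    return not (0xD800 <= unit <= 0xDFFF)


class BufferedValidator:
    """Hypothetical validator that waits for a complete code point."""

    def __init__(self) -> None:
        self.pending_high: Optional[int] = None  # high surrogate awaiting its low half

    def feed(self, unit: int) -> Optional[int]:
        """Return a full code point, None while waiting, or raise ValueError."""
        if self.pending_high is not None:
            high, self.pending_high = self.pending_high, None
            if 0xDC00 <= unit <= 0xDFFF:
                # Combine the pair into one supplementary code point.
                return 0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00)
            raise ValueError("high surrogate not followed by a low surrogate")
        if 0xD800 <= unit <= 0xDBFF:
            self.pending_high = unit  # first half: defer judgment
            return None
        if 0xDC00 <= unit <= 0xDFFF:
            raise ValueError("low surrogate without a preceding high surrogate")
        return unit  # BMP character: complete on its own


def utf16_units(text: str) -> List[int]:
    """Split a Python string into UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


units = utf16_units("\U00020000")  # a CJK Extension B ideograph outside the BMP
print([hex(u) for u in units])                  # ['0xd840', '0xdc00']
print([naive_validate_unit(u) for u in units])  # [False, False]

validator = BufferedValidator()
print([validator.feed(u) for u in units])       # [None, 131072]
```

The decoding arithmetic is not the point; what matters is where the judgment happens. The buffered version only validates once it has a whole character, which is exactly the kind of detail an edit control already handles for you.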

There are also the many issues that come up with complex scripts, where real problems are easy to introduce unless you leave that processing to the professionals and avoid doing character input processing in your own code entirely.
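
A similar trap shows up with complex scripts. In this minimal Python sketch (the filter is hypothetical, standing in for any per-character input check), keeping only code points that `str.isalpha()` accepts silently strips the virama and the dependent vowel sign from the Devanagari syllable क्षि, because those code points are combining marks (general categories Mn and Mc) rather than letters.

```python
import unicodedata


def naive_letter_filter(text: str) -> str:
    """Hypothetical input filter: keep only code points that str.isalpha() accepts."""
    return "".join(ch for ch in text if ch.isalpha())


syllable = "\u0915\u094D\u0937\u093F"  # क्षि: KA, VIRAMA, SSA, VOWEL SIGN I

for ch in syllable:
    # Show each code point, its general category, and whether the filter keeps it.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), ch.isalpha())

filtered = naive_letter_filter(syllable)
print([f"U+{ord(ch):04X}" for ch in filtered])  # ['U+0915', 'U+0937']
print(filtered == syllable)                      # False: the text was altered
```

Widening the check to also accept marks (general categories starting with "M") would keep this syllable intact, but each such fix only patches one case, which is the argument for leaving input processing to the controls that already handle it.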

Unless you happen to be writing your own edit or richedit control, in which case you have the job of learning and supporting these many scenarios, and I wish you luck (since you're going to need it!). :-)

As an aside, I don't think that first letter in Raymond's IME table is the one he meant to use, but both the one that was there and the one he probably meant to put there are co-sponsoring this post....

 

This post brought to you by ә and ə (U+04D9 and U+0259, a.k.a. CYRILLIC SMALL LETTER SCHWA and LATIN SMALL LETTER SCHWA)


# Raymond Chen - MSFT on 26 Oct 2007 11:21 AM:

That was actually intentional -- Windows XP doesn't install IPA fonts by default, but it does install Cyrillic fonts.

# Raymond Chen - MSFT on 26 Oct 2007 11:25 AM:

Oh wait, except that U+0259 isn't in the IPA block. I must have faked myself out.

# Michael S. Kaplan on 26 Oct 2007 11:26 AM:

I think it is in Tahoma and Microsoft Sans Serif, so between them it should be safe to look for on both XP and Win2000....

(This is not true of all IPA, but I think this particular one is also used in some languages?)

# Michael S. Kaplan on 26 Oct 2007 11:27 AM:

This one is used for both (if you look it up on the charts you'll see it in the Latin Extended block and listed in the IPA block -- so people looking for it don't assume it's missing!).

# Mihai on 26 Oct 2007 5:20 PM:

Actually, U+0259 is in the IPA Extensions block, and it has glyphs in Arial, Arial Unicode MS, Courier New, Microsoft Sans Serif, Tahoma, and Times New Roman.

# orcmid on 27 Oct 2007 3:52 PM:

Sigh.  You may now notice, quite correctly, that I read through my feed-reader backlog in most-recent first order.  

So yes, here is the place where some day I will have to worry about this stuff. Not that far away either. Sometime early in 2008 at best. Drat.

