Should considering UTF-16 be harmful be considered harmful?

by Michael S. Kaplan, published on 2012/04/27 07:01 -04:00, original URI:

Like many of the people I know, I find myself looking over at Stack Overflow and related sites periodically, sites like

I'm usually pleased to pop in.

I'll admit it is seldom relevant to me these days, since I don't do so much dev work, and the work I do is usually just in my particular area.

But it still can be interesting.

Once in a while, it is even in my area!

For example, there is a question about code page conversion that touches on my "Pseudo Form V Normalization" issues here, and as far as I can tell the problem was solved independently (the article cites a few of my blogs but does not seem to notice the Form V ones)....

Or take the other day, when I saw yet another pingback to my blog posts BACKSPACE vs. DELETE and I think MaxLength needs protection to assure safer text, the latter of which also includes a comment by regular reader Yuhong Bao pointing to the Stack Overflow article that today's blog is kind of about:

Should UTF-16 be considered harmful?
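Going by its title, the MaxLength post is about length limits counted in UTF-16 code units, where the classic failure is a cut that lands between the two halves of a surrogate pair. Here is a minimal Python sketch of a safer cut. It assumes the text is held as a list of UTF-16 code units, and the helper names `to_utf16_units`, `from_utf16_units`, and `truncate_utf16` are hypothetical illustrations, not any Windows API:

```python
def to_utf16_units(text: str) -> list[int]:
    """Encode text as a list of 16-bit UTF-16 code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_utf16_units(units: list[int]) -> str:
    """Decode a list of UTF-16 code units back into a string (strict decoding)."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def truncate_utf16(units: list[int], max_len: int) -> list[int]:
    """Cut to at most max_len code units without stranding a high surrogate."""
    if len(units) <= max_len:
        return units
    cut = max_len
    # If the last kept unit is a high surrogate, its low half is being dropped,
    # so back off one more unit and drop the whole pair instead.
    if cut > 0 and is_high_surrogate(units[cut - 1]):
        cut -= 1
    return units[:cut]


units = to_utf16_units("ab\U0001F600")  # 'a', 'b', then an emoji outside the BMP
print([hex(u) for u in units])          # ['0x61', '0x62', '0xd83d', '0xde00']
print(from_utf16_units(truncate_utf16(units, 3)))  # ab
```

A naive `units[:3]` would keep the lone high surrogate 0xD83D; backing off to the pair boundary is the simplest fix, though a real control may also want to avoid splitting combining sequences.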

At the time, my comment to Yuhong Bao's link was:

I find that article to be rather naive, alarmist, and biased, myself.

This roughly mirrors my current feelings on the subject. :-)

One can argue about how complicated UTF-8 is given its crazy character boundaries.

Or about how huge UTF-32 is, with empty space in every character, and how it routinely fools people who should know better into thinking it fixes the problems of UTF-16.

Or, as with that article, about whether UTF-16 is harmful.
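To make the UTF-8 and UTF-32 trade-offs in the last few paragraphs concrete, here is a small Python sketch. It shows how code point boundaries in UTF-8 have to be found by inspecting continuation bytes, how big each encoding form gets for plain ASCII, and why UTF-32 still does not give you one unit per user-perceived character. The function names are hypothetical, and the code assumes well-formed UTF-8 input:

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80


def char_start(data: bytes, i: int) -> int:
    """Back up from byte offset i to the lead byte of the code point containing it."""
    while i > 0 and is_continuation(data[i]):
        i -= 1
    return i


def boundaries(data: bytes) -> list[int]:
    """Byte offsets at which each code point starts."""
    return [i for i, b in enumerate(data) if not is_continuation(b)]


def encoded_sizes(text: str) -> dict[str, int]:
    """Byte count of text in each encoding form (the -le codecs emit no BOM)."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


sample = "a\u00e9\u20ac\U0001F600"  # 1, 2, 3, and 4 bytes in UTF-8, respectively
data = sample.encode("utf-8")
print(boundaries(data))        # [0, 1, 3, 6]
print(char_start(data, 8))     # 6
print(encoded_sizes("hello"))  # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}

# One user-perceived character, two code points: UTF-32 does not make this one unit.
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
print(len(decomposed.encode("utf-32-le")) // 4)  # 2
```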

I find my UCS-2 to UTF-16 series, with its mix of bug reports, best practices, and aspirational suggestions, to be a much more reasonable approach to improving your code.
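The heart of the UCS-2 to UTF-16 move is that one 16-bit unit is no longer always one character: code points above U+FFFF take a surrogate pair. Below is a minimal Python sketch of the pair arithmetic and of a surrogate-aware count. It treats text as a plain list of UTF-16 code units, and the function names are hypothetical illustrations rather than anything taken from the series:

```python
def encode_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000  # 20 bits: the high 10 go in the high surrogate, the low 10 in the low
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def decode_surrogates(high: int, low: int) -> int:
    """Recombine a well-formed surrogate pair into its code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def count_code_points(units: list[int]) -> int:
    """Count code points; UCS-2-era code would simply report len(units)."""
    count = 0
    i = 0
    while i < len(units):
        if (0xD800 <= units[i] <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2  # a well-formed pair is one code point
        else:
            i += 1  # a BMP character, or an unpaired surrogate counted on its own
        count += 1
    return count


print([hex(u) for u in encode_surrogates(0x1F600)])  # ['0xd83d', '0xde00']
print(hex(decode_surrogates(0xD83D, 0xDE00)))         # 0x1f600
print(count_code_points([0x61, 0xD83D, 0xDE00]))      # 2 (three code units)
```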

It is almost like one of those trains you can get off at any stop: you never have to ride it to the end if it has already taken you as far as you wanted to go.

Now contrast that with Should UTF-16 be considered harmful?, which is not really built to be helpful, even as it catalogs various problems.

By no means is it Stack Overflow at its best....

Now, there is a ton of useful content there, too.

Maybe if this article didn't keep sending me pingbacks to remind me it's there, I wouldn't feel the need to comment. :-)

Yuhong Bao on 27 Apr 2012 5:57 PM:

Recently the WHATWG Encoding Living Specification classed UTF-16 as legacy:

WndSks on 27 Apr 2012 8:24 PM:

Do you have an actual account so you can respond/help?

There is nothing like getting an answer from "the source". Here are two of them:

John Cowan on 28 Apr 2012 12:34 PM:

UTF-16 is legacy on the Web (less than 0.1% of all pages).  Internally, it is anything but.

Pavel Radzivilovsky on 2 May 2012 7:23 AM:

Dear Michael,

I really suggest you invest some time in reading www dot utf8everywhere dot org. I hope it has the potential to convince you. Following the discussion you mentioned, started by none other than the author of Boost.Locale, we compiled all the arguments and counter-arguments and addressed them in this document.

I'd really appreciate your opinion on that.


Michael S. Kaplan on 2 May 2012 8:16 AM:

Interesting -- and unrealistic....

Joshua on 3 May 2012 1:52 PM:

That page raises something I hadn't noticed before. No way to write *portable* programs that are Unicode aware.

Michael S. Kaplan on 3 May 2012 2:30 PM:

Again, interesting yet unrealistic to think it matters to even 0.05% of developers in the real world.

B. Bill on 3 May 2012 3:12 PM:

Interesting yet unrealistic to think your UCS-2 to UTF-16 matters to even 0.05% of developers in the real world. No one, and I repeat it, *NO ONE*, except those who write Unicode algorithms, or text rendering engines, should care about encodings. The only way to do this is to standardize on *one* encoding. And the more you resist, the more harm you do to the world.

Michael S. Kaplan on 3 May 2012 3:45 PM:

I wish you luck in your aspirations, but no way will we ever have just one encoding form or scheme.

Life is about dealing with things as they are; deprecating thousands of functions and dozens of programming languages affecting hundreds of millions of people is never gonna happen.

pavel on 27 May 2012 1:12 PM:


Since utf8everywhere is in the air and has 200 visitors per day on average, I suggest you take some time to address the claims more seriously. Maybe changing the title of your post would also help :)

