The temptation to channel Grumpy Code Reviewer can be almost overwhelming!

by Michael S. Kaplan, published on 2012/10/03 07:01 -04:00, original URI:

First, a few prior blogs so you can understand where I'm heading today:

  1. Vietnamese is a complex language on Windows
  2. On my "Vietnamese Plus" and "pseudo-Form V" constructs
  3. What's the difference between Tiếng Việt, Tiếng Việt, and Tiếng Việt? (other than the obvious, I mean)
  4. How would *you* define debacle?

While these blogs are placed correctly in a chronological sense, they may not be in an "understand where I'm coming from here" sense.

Basically, I'm talking about the Vietnamese keyboard, and how to get information from the data typed with it.

In blog 3 I described (with the help of John Cowan) the basis for why the "pseudo-Form V" (described in blog 2) exists, and how hard it would be to fix with a regular code page.

Blog 4 was just about why all of that had to happen.

But Vadim Berman added some comments in 2011-2012 to blog 1, which dates from 2005:

October 10, 2011

Hi Michael,
It's a very old post but still important. I was wondering whether there is some kind of workaround for WideCharToMultiByte in Vietnamese. Everything is OK except for the letters with double diacritics, like ệ or ữ.

What I don't understand is how it is different from Thai. Thai worked without any tweaking!

Is it possible to change the input and then recreate the characters that the conversion could not handle? I mean they exist in both cp1258 and Unicode, so this should be possible.

Best regards,

October 12, 2011

You might want to read the last comment here. There is a workaround for the broken conversion.

April 27, 2012

In the unlikely case that anyone still cares. here

The Stack Overflow link to code to "solve" the problem (which actually points back to Vadim Berman's own blog!) is only a partial solution, mind you.

You can see Vadim Berman's blog here.

It solved the problem of replacing WideCharToMultiByte(1258, ...) with its mjwc2mbstr function.

I didn't put my tester hat on to check results, but I put my program manager who used to be a dev hat on to do a bit of code review, and the code seems reasonable.

Though it doesn't do it in a very scalable way.

And if you needed to go back to Unicode, it doesn't handle MultiByteToWideChar(1258, ...) properly in its mjmb2wcstr function at all. Which means lots of question marks. :-(
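To make the two directions concrete, here is a minimal sketch of the general idea. It is not Vadim Berman's code and not the Win32 API. It assumes Python's built-in cp1258 codec as a stand-in for the code page, and the names TONE_MARKS, to_cp1258 and from_cp1258 are hypothetical, made up for illustration. Going to the code page, any letter it can't hold precomposed is split into a base letter it does have plus a separate tone mark; coming back, the result is recomposed with NFC.

```python
import unicodedata

# The five Vietnamese tone marks that code page 1258 stores as separate
# combining characters (grave, acute, tilde, hook above, dot below).
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}


def to_cp1258(text: str) -> bytes:
    """Hypothetical helper: encode to cp1258, splitting off tone marks as needed."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        try:
            # Letters the code page holds directly (for example U+00EA) go straight through.
            out.append(ch.encode("cp1258"))
            continue
        except UnicodeEncodeError:
            pass
        # Fully decompose, pull out the tone mark, and recompose what is left,
        # so U+1EC7 becomes U+00EA followed by U+0323.
        parts = unicodedata.normalize("NFD", ch)
        tones = "".join(c for c in parts if c in TONE_MARKS)
        base = unicodedata.normalize(
            "NFC", "".join(c for c in parts if c not in TONE_MARKS)
        )
        # Still raises UnicodeEncodeError if the character is simply not Vietnamese.
        out.append((base + tones).encode("cp1258"))
    return b"".join(out)


def from_cp1258(data: bytes) -> str:
    """Hypothetical helper: decode cp1258, then recompose base letter + tone mark."""
    return unicodedata.normalize("NFC", data.decode("cp1258"))


naive = "\u1ec7".encode("cp1258", errors="replace")   # the question mark case
fixed = to_cp1258("\u1ec7")                           # two bytes: letter, then tone
round_trip = from_cp1258(to_cp1258("Ti\u1ebfng Vi\u1ec7t"))
```

Whether the bytes this produces match what the Vietnamese keyboard or the "pseudo-Form V" would give you in every case is exactly the kind of thing you would want a tester hat on for.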

I tell you, the temptation to channel Grumpy Code Reviewer can be almost overwhelming!

As can the temptation to feel guilty enough about not doing the code work for so long (long enough that someone else felt annoyed enough to do it himself, incorrectly) that I'd go and write the solution myself.

I'll resist that fate, since it's code pages, and sublimate that guilt into honest keyboard work. :-)

Which reminds me of the time I recently violated the First Tester's Axiom for Vietnamese, and nothing was broken! Another day...

Nick on 3 Oct 2012 8:13 AM:

Raymond Chen as "Grumpy Code Reviewer," I love it.

Joshua on 5 Oct 2012 8:23 AM:

News Flash: If you're failing to do something right so long that multiple people publish replacements, you are in no position to complain when they do it wrong until you do go back and do it right.
