You can't always blame the IME....

by Michael S. Kaplan, published on 2011/09/09 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/09/09/10208356.aspx


Over in the Suggestion Box, Matthew Oakley asked:

Hi. I am using Ms Access as a front end to a mysql database for a web testing system. As far as I can tell, everything is set to use utf-8. However, when I type the korean character which is unicode c548 (using IME), it ends up stored in the database as ec95. When I type the same character in the webpage, it is stored in the same database as bec8. I have no idea what is going on! English is stored perfectly, ime seems to be creating a nightmare. Could you make a topic explaining what IME does in depth?

Interesting.

The character in question, U+C548, is HANGUL SYLLABLE AN, built from the jamo IEUNG, A, and NIEUN.

This one: 안

Now the bytes in UTF-8 for this character would be:

EC 95 88

Notice the first two bytes.

They look a lot like the two bytes Matthew said were ending up in the database.

I don't know about you, but I'm with Leroy Jethro Gibbs -- I don't believe in coincidences!
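If you want to check the bytes yourself, here is a minimal Python sketch. The first two prints cover what is described above. The third is an extra observation that is not in Matthew's question or my answer: it assumes the web page may be submitting in a legacy Korean encoding such as EUC-KR, which would explain the other value he saw. Treat it as a hypothesis to test, not a diagnosis.

```python
# U+C548, the character from the question.
ch = "\uc548"

# The full UTF-8 sequence for the character.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex())       # ec9588

# What is left if something downstream keeps only two bytes.
print(utf8_bytes[:2].hex())   # ec95

# Assumption (not from the post): the web page uses EUC-KR rather than UTF-8.
euc_kr_bytes = ch.encode("euc_kr")
print(euc_kr_bytes.hex())     # bec8
```

Both stored values can be reproduced without an IME anywhere in the picture, which points at encoding and storage rather than input.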

Now the IME works just fine: you can get the letter in question by typing D K S (IEUNG, A, NIEUN), which is interpreted as ᄋ ᅡ ᆫ and gets assembled the right way on any version of Access.
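The assembly step can be illustrated with a short Python sketch. This is not how the IME is implemented; it only shows that the three conjoining jamo compose to U+C548, once through NFC normalization and once through the Hangul syllable composition arithmetic from the Unicode Standard. The helper name `compose_syllable` is made up for this example.

```python
import unicodedata

# Conjoining jamo: CHOSEONG IEUNG, JUNGSEONG A, JONGSEONG NIEUN.
jamo = "\u110b\u1161\u11ab"

# NFC normalization composes the sequence into one precomposed syllable.
composed = unicodedata.normalize("NFC", jamo)
print(f"U+{ord(composed):04X}")   # U+C548

# Constants from the Unicode Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_syllable(lead: str, vowel: str, trail: str) -> str:
    """Hypothetical helper: combine one L, V, T jamo into a syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

print(compose_syllable(*jamo) == composed)   # True
```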

I just tried it here. 

But something else is at work here, something that is corrupting the text....

Maybe something along the lines of custom processing code?

Or the data layer? Access is not a paragon of UTF-8 support.

It isn't the IME....

