What to do with the Vietnamese keyboard on Windows?
by Michael S. Kaplan, published on 2005/11/11 04:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/11/11/491349.aspx
By the way, I am not implying we must do anything at all. It is probably important to start with that disclaimer. :-)
If you think back to previous posts I have done on Vietnamese (here and especially here), it is clear that the implementation has been inspired largely and maybe originally guided by Windows code page 1258.
- the issue with some characters not sorting properly is/will be resolved in Vista;
- using Unicode instead of the code page will take care of the rest of the limitations.
There is still the problem of the keyboard.
We'll start by looking at it. Here is the base state:
and here is the shifted state:
If you look at the top row, it contains five combining characters. It is not a coincidence that those five characters are the same ones as are in code page 1258:
VK_5 CC = U+0300 : COMBINING GRAVE ACCENT
VK_6 D2 = U+0309 : COMBINING HOOK ABOVE
VK_7 DE = U+0303 : COMBINING TILDE
VK_8 EC = U+0301 : COMBINING ACUTE ACCENT
VK_9 F2 = U+0323 : COMBINING DOT BELOW
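The correspondence in the list above can be checked with a few lines of Python. This is only a sketch: it assumes Python's built-in `cp1258` codec matches the Windows code page for these five bytes, and the `CP1258_COMBINING` table and `describe` helper are names made up for the illustration.

```python
import unicodedata

# The five cp1258 byte values from the list above, keyed by the
# virtual key that types them on the Vietnamese keyboard.
CP1258_COMBINING = {
    "VK_5": 0xCC,
    "VK_6": 0xD2,
    "VK_7": 0xDE,
    "VK_8": 0xEC,
    "VK_9": 0xF2,
}

def describe(byte_value):
    """Decode one cp1258 byte and return (U+XXXX, Unicode character name)."""
    ch = bytes([byte_value]).decode("cp1258")
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

for key, byte_value in CP1258_COMBINING.items():
    code_point, name = describe(byte_value)
    print(f"{key} {byte_value:02X} = {code_point} : {name}")
```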
Okay, so the model here is (on this keyboard) to allow and encourage text that is not really in any particular Unicode normalization form. Instead, it makes extensive use of intermediate forms that are neither Form C nor Form D (most of the keyboards that Microsoft ships produce text in Form C).
Vista does/will fix the issue with the various equivalent forms, so it will be less noticeable. But it does make things a little odd any time the data is being transferred around, or if the Win32 NLS API is not used to do comparisons.
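To make "intermediate form" concrete, here is a small Python sketch. It assumes the keyboard produces a precomposed ê followed by a combining tone mark (the sequence that comes up in the comments below); `normalization_status` is a hypothetical helper, not anything from Windows.

```python
import unicodedata

def normalization_status(text):
    """Return 'NFC', 'NFD', 'both', or 'neither' for the given string."""
    is_nfc = unicodedata.normalize("NFC", text) == text
    is_nfd = unicodedata.normalize("NFD", text) == text
    if is_nfc and is_nfd:
        return "both"
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "neither"

precomposed = "\u1EC3"          # one code point: e with circumflex and hook above
keyboard_form = "\u00EA\u0309"  # e with circumflex, then COMBINING HOOK ABOVE
decomposed = "e\u0302\u0309"    # e, combining circumflex, combining hook above

for label, text in [("precomposed", precomposed),
                    ("keyboard form", keyboard_form),
                    ("decomposed", decomposed)]:
    print(label, normalization_status(text))
```

The middle sequence is the interesting one: it is canonically equivalent to the other two, but it is what neither normalization form would produce.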
Of course it does nothing for the fact that not all of the characters that have been added to the sorting table for Vietnamese can be typed in with the Vietnamese keyboard.
And it does not allow for a more sophisticated model that supports chained dead keys so that one could type the two dead keys and the letter to get the various precomposed characters used in Vietnamese.
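As a rough illustration of what chained dead keys would mean, here is a Python sketch of the state involved. It is not how Windows keyboard layouts or MSKLC work internally; the `DEAD_KEYS` assignments, the `ChainedDeadKeys` class, and `type_keys` are all invented for the example, and composition is delegated to NFC normalization.

```python
import unicodedata

# Hypothetical dead-key assignments, chosen only for this sketch.
DEAD_KEYS = {
    "^": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "?": "\u0309",  # COMBINING HOOK ABOVE
    ".": "\u0323",  # COMBINING DOT BELOW
}

class ChainedDeadKeys:
    """Buffers dead keys until a base letter arrives, then emits one NFC string."""

    def __init__(self):
        self.pending = []

    def press(self, key):
        if key in DEAD_KEYS:
            self.pending.append(DEAD_KEYS[key])
            return ""  # a dead key produces no output on its own
        marks, self.pending = self.pending, []
        # Attach the buffered marks in typing order and let NFC compose them.
        return unicodedata.normalize("NFC", key + "".join(marks))

def type_keys(keys):
    """Feed a sequence of key presses through a fresh state machine."""
    state = ChainedDeadKeys()
    return "".join(state.press(k) for k in keys)

print(type_keys("^.e"))  # two dead keys, then the letter
```

Under these assumptions, circumflex, dot below, and e yield the single code point U+1EC7. The sketch applies marks in typing order, so a real design would still need to decide what happens when the tone mark is typed before the circumflex.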
Such a keyboard would not be as useful in legacy non-Unicode applications due to all the problems in and lack of compatibility with code page 1258, but it would be better for Unicode support, for the language, and for consistency with the other keyboards on Windows.
With all that said, I have fallen into the trap of coming up with a whole plan for a technical solution without knowing whether there are actual users who are running into problems with the existing support. If there are any native speakers of Vietnamese who have thoughts here, it would help prioritize this issue as opposed to other features that future versions might include. Including features like support for chained dead keys in MSKLC to make creating such keyboards easier!
What would people who need to type Vietnamese like to see?
This post brought to you by "ệ" (U+1ec7, a.k.a. LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW)
# Mihai on 11 Nov 2005 12:24 PM:
I could query some Vietnamese translators.
We constantly get MS Word files from them in NFC form (and then we have problems saving in 1258 :-), although the standard keyboard does not allow those characters to be input.
I suspect they use a 3rd party program or modified keyboard (maybe even with MSKLC :-)
Since a translator would probably not care if a character is NFC or intermediate form, I guess that application/keyboard has some other attractive features.
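The "problems saving in 1258" can be reproduced in Python, along with one possible workaround. This is a sketch under assumptions: it relies on Python's `cp1258` codec, and `to_cp1258_form` is a hypothetical helper that moves every tone mark out into a combining character, which is more aggressive than strictly necessary.

```python
import unicodedata

# The five combining tone marks that cp1258 can encode (listed in the post).
TONE_MARKS = {"\u0300", "\u0309", "\u0303", "\u0301", "\u0323"}

def to_cp1258_form(text):
    """Keep circumflex/breve/horn composed on the vowel; make tone marks combining."""
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        tones = [c for c in decomposed if c in TONE_MARKS]
        rest = "".join(c for c in decomposed if c not in TONE_MARKS)
        # Recompose everything except the tone, then append the tone mark.
        out.append(unicodedata.normalize("NFC", rest) + "".join(tones))
    return "".join(out)

nfc_word = "Vi\u1EC7t"  # NFC text, as a third-party tool might produce it
try:
    nfc_word.encode("cp1258")
except UnicodeEncodeError as err:
    print("cannot save as cp1258:", err.reason)

print(to_cp1258_form(nfc_word).encode("cp1258"))
```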
# Michael S. Kaplan on 11 Nov 2005 5:40 PM:
That would be cool. I'd love to know more about what we need to do. :-)
# Suzanne McCarthy on 11 Nov 2005 7:48 PM:
# Michael S. Kaplan on 11 Nov 2005 8:20 PM:
Well, it is not your fault if the combination of a bad keyboard and lots of bad user data is out there -- it is a no-win situation....
# Suzanne McCarthy on 11 Nov 2005 9:24 PM:
Surely the user data from the VietDic dictionary is from another and more acceptable encoding standard. Why doesn't the MS keyboard enable one to keyboard this encoding? Obviously VietDic is the one with the most hits and MS keyboard encoding has the fewest hits. It indicates to me that the MS keyboard is not being used much because it doesn't match the standard encoding. So why not change the keyboard to match the standard? Have I missed something? I find the MS Vietnamese keyboard unusable. I know it is not my fault - why would it be?
# Michael S. Kaplan on 12 Nov 2005 12:00 AM:
I was referring to where you talked about not knowing what you are doing.... When we talk about encodings, anything short of Unicode will be inadequate at representing text fully. That includes *all* of the other possible options, and cp1258, too. But in any case, individual standards that cannot represent other languages are an evolutionary blind alley -- as is deciding the best encoding for a language by measuring google hits! :-)
# Suzanne McCarthy on 12 Nov 2005 12:39 AM:
These are *all* unicode - what do you mean by 'short of Unicode'? What else would I be using?
The main difference for example 1 and 2 is this.
1. uses one character for the 'e' - U+1EC3 : LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
2. uses two characters in sequence - U+00EA : LATIN SMALL LETTER E WITH CIRCUMFLEX and U+0309 : COMBINING HOOK ABOVE
Aren't they both unicode? The other options are visibly different so they don't really count except to show the complexity involved.
Is there supposed to be one standard and who would decide that? The google hits are just to show that they are different really - I have no idea which encoding is best, just wondering if there is a process or entity which deals with this discrepancy?
# Mihai on 12 Nov 2005 1:43 AM:
Ok, got it!
Sounds like something serious. The thing is available free from the "Vietnamese Professionals Society" (http://www.vps.org).
Click DOWNLOADS, the software is VPSKeys 4.3.
You should click on "(latest version)", if you click on the software title nothing happens.
The description is "Vietnamese Keyboard Driver for Win95-98, ME, NT4, Win2000 and Win XP" (impressive coverage), and an English manual is available (great, because the UI is Vietnamese :-)
# Andrew Cunningham on 12 Nov 2005 3:44 AM:
In theory, applications and web services should normalize their data. The problem of the character sequences generated by the Windows Vietnamese layout vs third party solutions is more related to this lack of normalization. The users end up in a mess.
1) using a third party IME in MS Word (and using precomposed characters) and having the Vietnamese proofing tools installed ... spell checking doesn't work, the dictionary assumes the character sequences the Windows layout uses.
2) searching Google. Google doesn't normalize. Search results using Microsoft's keyboard will yield different results than a third party solution.
3) Searching BBC's Vietnamese site with the Windows layout will find more recent articles, and using a third party solution will find older articles. Assuming they haven't updated/fixed that since my last search.
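A minimal Python sketch of the kind of normalization a search service could apply to avoid these mismatches. It says nothing about what Google or the BBC actually do; `normalized_find` is a hypothetical helper that simply puts both the query and the document into NFC before matching.

```python
import unicodedata

def normalized_find(haystack, needle):
    """Return True if needle occurs in haystack once both are in NFC."""
    haystack_nfc = unicodedata.normalize("NFC", haystack)
    needle_nfc = unicodedata.normalize("NFC", needle)
    return needle_nfc in haystack_nfc

document = "ti\u1EBFng Vi\u1EC7t"  # stored with precomposed characters
query = "Vi\u00EA\u0323t"          # typed as e-circumflex + combining dot below

print(query in document)                 # raw code point comparison
print(normalized_find(document, query))  # comparison after normalization
```

The raw comparison misses the match and the normalized one finds it, which is the difference users see between the two input methods.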
Third party solutions tend to be more flexible about the location of the diacritic. Some software allows you to either type the diacritics after the base vowel or at the end of the word, depending on your preferences (this seems to mimic handwriting styles).
A similar problem is beginning to surface with African languages, as more keyboard layouts are generated using MSKLC. There have been a number of discussions in various fora regarding Nigerian languages when typing tones.
The ability to create keyboard layouts, using MSKLC, that generate NFC or NFD character sequences would be useful.
But in the long run, normalization issues aren't particularly a keyboard layout issue.
If this makes sense?
# Gabe on 12 Nov 2005 4:25 AM:
I don't know much about Vietnamese, so maybe I'm wrong to be surprised here. But why on earth could they possibly use backslash often enough to warrant two keys for it on opposite sides of the keyboard?
# Michael S. Kaplan on 12 Nov 2005 9:06 AM:
Hi Gabe --
That one way over on the left is the VK_OEM_102 key, which is not present on 101-key keyboards and is thus by convention usually duplicated elsewhere.
MSKLC even issues a warning in the validation phase if you do not have an assignment there duplicated elsewhere, for that very reason....
# Michael S. Kaplan on 12 Nov 2005 9:10 AM:
Hi Andrew --
Yep, makes sense. One of the changes in Vista is to pick up the different normalization forms (and all of the 'intermediate forms', as well) which will at least make comparison technologies make use of them.
Over time, this may well improve search for MSN, though probably it will not do much for Google's search. :-)
# Michael S. Kaplan on 12 Nov 2005 9:12 AM:
This is a terminology difference between us -- Unicode is a single encoding. When you talk about encodings, I assume you are talking about the endless attempts at 8-bit encodings for Vietnamese, from cp1258 to all of the VNI code pages, and so on.
# Michael S. Kaplan on 12 Nov 2005 9:23 AM:
Hi Mihai --
Thanks for the link, though it does look like they are hosting the Windows core fonts for download. :-(
But the Online help looks interesting, and it does seem to support many of the different 8-bit encodings as well as Unicode...
# Suz on 12 Nov 2005 3:56 PM:
Sorry about my terminology. I have learned something about how to view, identify, and display characters since last June but not talk about them properly. Just so you know, I have never knowingly used or talked about anything other than Unicode. Anyhow - lots of interesting ideas from this.
# Michael S. Kaplan on 12 Nov 2005 4:37 PM:
No worries, Suzanne -- note that I did not say you were wrong, I just said there is a terminology difference. I was just pointing out why I misunderstood your words, that's all. :-)
If you look at that link Mihai posted, the tool supports many of those different encoding systems....
In the end, I do agree with Andrew -- search engines need to normalize away the differences between these methods. And not only for theoretical conformance to Unicode and its principles of canonical equivalence, but also for the practical need to unite all of these different forms of text, especially in languages like Vietnamese where it is such a factor in existing text!
# Andrew on 16 Nov 2005 8:11 PM:
is there any documentation available at this stage on language and internationalization features in Vista?
# Michael S. Kaplan on 17 Nov 2005 8:34 AM:
There is some on this blog (more after Beta 2 is out!), and plenty in the Beta 1 Vista SDK docs (more in Beta 2 for that, also)....
# THANH TIN BUI on 20 Feb 2009 1:22 PM:
Dear Sir,
I would like to use Vietnamese script for writing letters. Would you kindly help me? Thank you, Sir.
Thanh Tin Bui
# TT on 29 Aug 2009 7:34 AM:
I prefer the VPS keyboard...
# Nancy on 20 Sep 2010 10:27 PM:
I understand, but is there any way I can see the Vietnamese keyboard on the screen?
# Michael S. Kaplan on 20 Sep 2010 11:50 PM:
The on-screen keyboard, perhaps?
# kieulien on 7 Sep 2012 10:13 AM:
How do I type numbers on the Vietnamese keyboard?
# Vy on 6 Oct 2012 10:25 PM:
I recently added the Vietnamese keyboard to my computer, but the issue I'm having is that I can't get the u and o letters with the comma on the top corner to show. Can you show me how I can get them to appear? Thank you.