by Michael S. Kaplan, published on 2005/02/06 08:03 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/02/06/367985.aspx
The other day, Ivan Petrov pointed out:
...maybe the BIGGEST one, is about the absence of many of the Cyrillic vowel letters with graves in Unicode, respectively in ANSI 1251 Codepage. There are defined only 2+2=4 (CAPITAL and SMALL letters with graves – #CYRILLIC CAPITAL LETTER IE WITH GRAVE, #CYRILLIC CAPITAL LETTER I WITH GRAVE, #CYRILLIC SMALL LETTER IE WITH GRAVE and #CYRILLIC SMALL LETTER I WITH GRAVE) in Unicode.
The whole list of the Cyrillic vowel letters must be:
#CYRILLIC CAPITAL LETTER A WITH GRAVE
#CYRILLIC CAPITAL LETTER IE WITH GRAVE
#CYRILLIC CAPITAL LETTER I WITH GRAVE
#CYRILLIC CAPITAL LETTER O WITH GRAVE
#CYRILLIC CAPITAL LETTER U WITH GRAVE
#CYRILLIC CAPITAL LETTER HARD SIGN WITH GRAVE
#CYRILLIC CAPITAL LETTER YERU WITH GRAVE (only for Russian language)
#CYRILLIC CAPITAL LETTER E WITH GRAVE (only for Russian language)
#CYRILLIC CAPITAL LETTER YU WITH GRAVE
#CYRILLIC CAPITAL LETTER YA WITH GRAVE
#CYRILLIC SMALL LETTER A WITH GRAVE
#CYRILLIC SMALL LETTER IE WITH GRAVE
#CYRILLIC SMALL LETTER I WITH GRAVE
#CYRILLIC SMALL LETTER O WITH GRAVE
#CYRILLIC SMALL LETTER U WITH GRAVE
#CYRILLIC SMALL LETTER HARD SIGN WITH GRAVE
#CYRILLIC SMALL LETTER YERU WITH GRAVE (only for Russian language)
#CYRILLIC SMALL LETTER E WITH GRAVE (only for Russian language)
#CYRILLIC SMALL LETTER YU WITH GRAVE
#CYRILLIC SMALL LETTER YA WITH GRAVE
So my third question is:
“What can be done about this problem?”
For more information, see:
http://titus.uni-frankfurt.de/unicode/unicsel/unicself.htm#Cyrillic
Well, when I look at the list, I can only think of one thing (well, one stream of things!) to say:
А̀ Ѐ Ѝ О̀ У̀
Ъ̀ Ы̀ Э̀ Ю̀ Я̀
а̀ ѐ ѝ о̀ у̀
ъ̀ ы̀ э̀ ю̀ я̀
or in Unicode code points....
0410 0300 0415 0300 0418 0300 041e 0300 0423 0300
042a 0300 042b 0300 042d 0300 042e 0300 042f 0300
0430 0300 0435 0300 0438 0300 043e 0300 0443 0300
044a 0300 044b 0300 044d 0300 044e 0300 044f 0300
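As a rough illustration, the code point listing above can be rebuilt from the character names in Ivan's list. This is only a sketch: it uses Python's standard `unicodedata` module, and the names `VOWELS`, `grave_sequence`, `hex_points`, and `SEQUENCES` are made up for this example.

```python
import unicodedata

# U+0300, the combining mark that follows every base letter in the listing.
GRAVE = unicodedata.lookup("COMBINING GRAVE ACCENT")

# The letter names from the requested list, in the same order.
VOWELS = ["A", "IE", "I", "O", "U", "HARD SIGN", "YERU", "E", "YU", "YA"]

def grave_sequence(case, letter):
    """Return the base letter (looked up by its Unicode name) plus U+0300."""
    base = unicodedata.lookup(f"CYRILLIC {case} LETTER {letter}")
    return base + GRAVE

def hex_points(text):
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04x}" for ch in text)

# One two-code-point sequence per requested character.
SEQUENCES = {
    (case, letter): grave_sequence(case, letter)
    for case in ("CAPITAL", "SMALL")
    for letter in VOWELS
}

for (case, letter), seq in SEQUENCES.items():
    print(f"{hex_points(seq)}  {seq}  CYRILLIC {case} LETTER {letter} WITH GRAVE")
```

Each of the twenty requested characters comes out as exactly two code points: the base letter and the combining grave.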
These characters already exist in Unicode, in the composite (decomposed) form: each one is a base letter followed by U+0300 (COMBINING GRAVE ACCENT). Note that they look better in some fonts than they do in others. Fixing that is mainly a matter of letting the font foundries that work to support these languages know that these particular characters need good font hints, so that they look good by design rather than "by accident" of the combining character guessing how best to work with the base characters.
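One way to see the difference between the four precomposed letters and the rest is to run the sequences through Unicode normalization. The sketch below assumes Python's standard `unicodedata` module. The helper name `nfc_points` and the choice of sample pairs are mine, purely for illustration.

```python
import unicodedata

def nfc_points(text):
    """Normalize to NFC and return the resulting code points as hex strings."""
    composed = unicodedata.normalize("NFC", text)
    return [f"{ord(ch):04X}" for ch in composed]

# Base letter + U+0300 pairs: IE and I have precomposed forms, A and YA do not.
SAMPLES = {
    "IE": "\u0415\u0300",
    "I": "\u0418\u0300",
    "A": "\u0410\u0300",
    "YA": "\u042f\u0300",
}

for name, pair in SAMPLES.items():
    print(f"CYRILLIC CAPITAL LETTER {name} WITH GRAVE -> {' '.join(nfc_points(pair))}")
```

IE and I with grave collapse to the single code points U+0400 and U+040D, while A and YA with grave stay as two code points. Either way the text is the same to any process that honors canonical equivalence.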
If you wanted to try to get them added to Unicode in the precomposed form, the submission process for new characters is very straightforward. However, as the proposal information clearly states:
So it would appear that these characters are unlikely to be separately encoded.
As for the request to add these code points to cp1251, I will deal with that in a separate post, perhaps later today (or sometime soon).
This post brought to you by "Ѡ" (U+0460, CYRILLIC CAPITAL LETTER OMEGA)