Even the characters with no weight can be given weight in their own special way

by Michael S. Kaplan, published on 2006/11/19 15:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/11/19/1104093.aspx


Coming soon to the European Union: Bulgaria and Romania!

These are important events, and they have led to updates of the Times New Roman and Arial fonts in the European Union Expansion Font Update (a download that requires Windows Genuine Advantage (WGA) validation).

I will pine for Romanian and Bulgarian versions of the download page (there is nothing there yet, but having those links in the blog will make it easier to check!).

One fun bit of text on the download page:

These characters are not accessible on the Windows XP keyboards for Bulgarian and Romanian. You may use Microsoft Keyboard Layout Creator (MSKLC) to define your own keyboard layout.

You can also access the characters via Character Map
Start>All Programs>Accessories>System Tools>Character Map

If you are using an Office Product you can use the application's Insert Symbol function to input the characters as well.

I think MSKLC might be approaching ubiquity! :-)

Now, the six code points in question are:

U+0218    Ș    LATIN CAPITAL LETTER S WITH COMMA BELOW

U+0219    ș    LATIN SMALL LETTER S WITH COMMA BELOW

U+021A    Ț    LATIN CAPITAL LETTER T WITH COMMA BELOW

U+021B    ț    LATIN SMALL LETTER T WITH COMMA BELOW

U+040D    Ѝ    CYRILLIC CAPITAL LETTER I WITH GRAVE

U+045D    ѝ    CYRILLIC SMALL LETTER I WITH GRAVE
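All six of these characters have canonical decompositions in Unicode (a base letter plus a combining mark), so one quick way to double-check the list is Python's standard `unicodedata` module. This is a minimal sketch. The `NEW_CHARS` list and the `describe` helper are hypothetical names chosen for illustration, and the output depends only on the Unicode Character Database bundled with Python.

```python
import unicodedata
from typing import List

# The six code points covered by the European Union Expansion Font Update.
NEW_CHARS: List[str] = ["\u0218", "\u0219", "\u021A", "\u021B", "\u040D", "\u045D"]

def describe(ch: str) -> str:
    """Return 'U+XXXX  NAME  = U+base + U+mark' for a single character."""
    code = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    # decomposition() returns space-separated hex code points, e.g. '0053 0326'
    parts = unicodedata.decomposition(ch).split()
    decomposed = " + ".join(f"U+{p}" for p in parts)
    return f"{code}  {name}  = {decomposed}"

if __name__ == "__main__":
    for ch in NEW_CHARS:
        print(describe(ch))
```

The Romanian letters decompose to S/s or T/t plus U+0326 COMBINING COMMA BELOW, and the Bulgarian ones decompose to И/и plus U+0300 COMBINING GRAVE ACCENT.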

Now, in the end the primary importance of fonts is in display, so from a cynical point of view one could say that this is just for show. But since it was apparently specifically requested by the Court of Justice in Luxembourg and the European Commission, it is obviously not just for show at all. It is considered an important gesture of support for the needs of what will be the newest members of the European Union.

I'll be covering the Romanian characters another day (I have an Every Character Has a Story post about the four Romanian "comma below" characters that I expect to post sometime this week). It goes into many of the interesting ISO, Unicode, encoding, and even keyboard issues that these characters raised. So stay tuned for more on this....

But the Bulgarian characters have a special place of their own here, since Bulgarian is the first CYRILLIC SCRIPT EU language and one of only two non-LATIN SCRIPT EU languages (the other is Greek). How the EU works to see this addition through will be interesting (support for Greek in the EU has had its share of technical issues over the years).

Now, although these issues are weighed quite heavily, the truth is that all six of these characters have no weight in collation, a factoid that affects all versions of Windows prior to Vista, as well as SQL Server, Access/Jet, and so on. Changing this in those products would have dire consequences for applications, forcing re-indexing and/or risking index corruption on a global scale, since indexes built under the old sort order would no longer match the new one. Vista does address this issue, however, which makes Vista the best release to give both Bulgaria and Romania the weight that they of course deserve.
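To make the "no weight" problem and the re-indexing risk concrete, here is a deliberately simplified Python toy model. The weight tables and the `sort_key`/`index_order` helpers are hypothetical. This is not how the Windows NLS, SQL Server, or Jet collation code is actually implemented. A character missing from the table simply drops out of the sort key, the way a character with no weight is effectively ignored in comparisons.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy collation, not the real Windows/SQL Server sort tables.
BASE = "abcdefghijklmnopqrstuvwxyz"

# "Old" table: ș (U+0219) and ț (U+021B) have no entry, i.e. no weight at all.
OLD_TABLE: Dict[str, int] = {ch: 10 * i for i, ch in enumerate(BASE, start=1)}

# "New" table: same letters, plus ș and ț weighted just after s and t.
NEW_TABLE: Dict[str, int] = dict(OLD_TABLE)
NEW_TABLE["\u0219"] = OLD_TABLE["s"] + 1
NEW_TABLE["\u021B"] = OLD_TABLE["t"] + 1

def sort_key(word: str, table: Dict[str, int]) -> Tuple[int, ...]:
    """Build a sort key; characters missing from the table contribute nothing."""
    return tuple(table[ch] for ch in word if ch in table)

def index_order(words: List[str], table: Dict[str, int]) -> List[str]:
    """Order words the way an index built with this table would store them."""
    return sorted(words, key=lambda w: sort_key(w, table))

if __name__ == "__main__":
    words = ["aur", "atlas", "a\u021Ba"]  # the last word is "ața" (with U+021B)
    print(index_order(words, OLD_TABLE))  # ['ața', 'atlas', 'aur']
    print(index_order(words, NEW_TABLE))  # ['atlas', 'ața', 'aur']
```

Under the old table, "ața" has the same key as "aa" and sorts first. Under the new one it moves after "atlas". An index stored in the old order is therefore out of order once the weights change, which is exactly why such a fix cannot quietly ship in an existing release.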

And along those lines, here is a screenshot of an RTM build of Vista with a huge number of MUI (Multilingual User Interface) and LIP (Language Interface Pack) languages installed:

[Screenshot: Vista RTM language list with many MUI and LIP languages installed]

Note the language selected at the top of the list and also the language highlighted farther down....

I suppose if one were feeling ornery enough, one could argue that UI language is just for show, too. But obviously it is more than that: it is a commitment to the native speakers of a language, a concrete effort to give people a better "local" experience. :-)

 

This post brought to you by Ѝ (U+040D, a.k.a. CYRILLIC CAPITAL LETTER I WITH GRAVE)


# Ivan Petrov on 21 Nov 2006 5:33 AM:

Hi Michael :-)

This is a good place to remind you of the following posts and comments related to the Bulgarian language:

http://blogs.msdn.com/michkap/archive/2005/09/12/463483.aspx#467936

http://blogs.msdn.com/michkap/archive/2006/03/28/561725.aspx

http://blogs.msdn.com/michkap/archive/2005/02/06/367985.aspx

or see

http://en.wikipedia.org/wiki/Bulgarian_language#Word_stress

Basically, I want to say again that written Bulgarian uses 8 stressed vowels with a grave accent, and when we take letter case (lowercase and UPPERCASE) into consideration, we have 16 stressed vowels with a grave accent!

So, the characters

U+040D    Ѝ    CYRILLIC CAPITAL LETTER I WITH GRAVE

U+045D    ѝ    CYRILLIC SMALL LETTER I WITH GRAVE

are only 2 (two) of these 16 stressed vowels with a grave accent!!!

For future font updates, I suggest contacting the Institute for the Bulgarian Language at the Bulgarian Academy of Sciences, the official regulatory body for the Bulgarian language, and especially its Department of Computational Linguistics at http://dcl.bas.bg/, regardless of the requests made by the Court of Justice in Luxembourg and the European Commission, because of miscoordination between the European Commission and the Institute for the Bulgarian Language.

Regards,

Ivan.

# Michael S. Kaplan on 21 Nov 2006 5:35 PM:

Every single one of those posts has already been given the only answer that will ever be given: Microsoft is not going to support characters that have not been added to Unicode, and thus far Unicode has not chosen to add the characters.

But the rest of what is desired already exists in Unicode and in the fonts.

# Ivan Petrov on 22 Nov 2006 4:30 AM:

Hi Michael,

Thank you for the response.

I think you did not understand me correctly!

With the above comment, I mean that it would be good if future Microsoft font updates made the Times New Roman and Arial fonts display the rest of the Bulgarian stressed vowels with a grave accent correctly, since, as you say:

1) "the primary importance of fonts is in display" (this post);

or

2) "These characters already exist in Unicode, in the composite (decomposed) form. Note that they look better in some fonts than they do in others -- which is mainly a matter of letting font foundries that work to support languages know that there is a need to make sure these particular characters have good font hints so that they will not look good "by accident" of the combining character guessing how best to work with the base characters" (http://blogs.msdn.com/michkap/archive/2005/02/06/367985.aspx).

So this is my point of view. I understood you clearly, and I am not asking for anything other than better rendering of these characters in future font updates.

Regards,

Ivan.

# Michael S. Kaplan on 22 Nov 2006 4:37 AM:

Okay, I understand a bit more now. Yes, this is a reasonable request: getting the display of these composite characters to be a little better. Though improved display may not be as high a priority for font updates on downlevel systems as what led to this particular update (characters that were entirely missing), of course....
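The difference between the two precomposed characters in this update and the other stressed vowels can be checked with Unicode normalization. Below is a minimal Python sketch using the standard `unicodedata.normalize` function. It assumes the eight vowels Ivan refers to are а, е, и, о, у, ъ, ю, я. The `VOWELS` string and the `stressed_forms`/`precomposed` helpers are hypothetical names for illustration. Each vowel is followed by U+0300 COMBINING GRAVE ACCENT and normalized to NFC, which collapses the pair into one code point only when Unicode encodes a precomposed character for it.

```python
import unicodedata
from typing import Dict, List

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT

# Assumed set of Bulgarian vowels: а е и о у ъ ю я (lowercase), then uppercase.
VOWELS = (
    "\u0430\u0435\u0438\u043E\u0443\u044A\u044E\u044F"
    "\u0410\u0415\u0418\u041E\u0423\u042A\u042E\u042F"
)

def stressed_forms(vowels: str = VOWELS) -> Dict[str, str]:
    """Map each vowel to the NFC-normalized form of vowel + combining grave."""
    return {v: unicodedata.normalize("NFC", v + GRAVE) for v in vowels}

def precomposed(vowels: str = VOWELS) -> List[str]:
    """Vowels whose stressed form collapses to one precomposed code point."""
    return [v for v, s in stressed_forms(vowels).items() if len(s) == 1]

if __name__ == "__main__":
    for v, s in stressed_forms().items():
        codes = " ".join(f"U+{ord(c):04X}" for c in s)
        print(f"{v} -> {codes}")
```

Only е and и (in both cases) come out as a single code point: U+0450/U+0400 and U+045D/U+040D. The other twelve stay as a base letter followed by U+0300, so how they look depends on how well the font positions the combining mark over each base letter.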

# Dirk on 23 Nov 2006 1:11 PM:

I think the português (Portugal) item is clipped by the one below. Look at the p and the g (or is it a q?). Is this a bug in the Vista combo box?

# Michael S. Kaplan on 23 Nov 2006 10:55 PM:

Hmmm.... maybe. OTOH it is a very small font size (some of the others are also unreadable, from my point of view). I think the problem is the font metrics and the font size used....


