When is a character not a character?

by Michael S. Kaplan, published on 2007/01/23 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/01/23/1511566.aspx


The question that appeared in my inbox included a point that seemed worth posting about....

We are also facing some other problems like, there are some characters in the Keyboard layout that are not there in the Gujarati Unicode Range. Please find the attached work doc having a sample character. Let me know, if there is any way to render a character that is not in particular language Unicode range.

The attachment contained the following character:

 

Taking a brief look at the Gujarati keyboard layout in the base, Shift, AltGr, AltGr+Shift, and Ctrl+Shift states:

It looks like what is on the VK_5 key in the Shift state. It is definitely not in the Unicode code chart for Gujarati as a single character, but Unicode is the core of Microsoft's support, so whatever that key produces has to be expressible in Unicode. So let's hover over it in MSKLC and see what is behind it:

So the answer to the question raised in the title (when is a character not a character?) is quite simple -- it is when it is a sequence. :-) The key emits three code points, and the font renders them as a single conjunct.
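To see the sequence for yourself, a minimal Python sketch like the one below lists what is inside the string. The code points are the ones given at the end of this post; the helper name `describe` is just something made up for the illustration.

```python
import unicodedata

# The conjunct from this post, written out as its three code points.
JNA = "\u0A9C\u0ACD\u0A9E"

def describe(text):
    """Return (code point, Unicode name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# One visible "character", three entries in the string.
print(len(JNA))
for code_point, name in describe(JNA):
    print(code_point, name)
```

So anything that counts or indexes by code point sees three items here, even though the user sees one.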

It is in the Gujarati script Wikipedia article as the compound consonant jña.

Perhaps this could even be a named sequence (à la UAX #34) in a future version of Unicode....
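For a feel of what a named sequence buys you, here is a small Python sketch. It assumes Python 3.3 or later, where `unicodedata.lookup` also resolves named sequences. The Latin name is one that is already in the Unicode data; the Gujarati-looking name is invented for this example, and the helper `lookup_sequence` is hypothetical too.

```python
import unicodedata

def lookup_sequence(name):
    """Return the string for a character name or named sequence, else None."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

# An existing named sequence: one name, two code points.
r_tilde = lookup_sequence("LATIN SMALL LETTER R WITH TILDE")
print([f"U+{ord(ch):04X}" for ch in r_tilde])

# Made-up name, purely illustrative; it is not expected to resolve.
print(lookup_sequence("GUJARATI CONJUNCT JNYA EXAMPLE"))
```

If a sequence like the one in this post were ever given an official name, the same lookup would presumably hand back all three code points at once.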

 

This post brought to you by જ્ઞ (U+0A9C U+0ACD U+0A9E, a.k.a. GUJARATI LETTER JA + GUJARATI SIGN VIRAMA + GUJARATI LETTER NYA, a.k.a. jña)


# Bart on 23 Jan 2007 6:07 AM:

Maybe you should write a post about how the concept of a character in the sense of wchar should be deprecated for uses other than data storage or maybe a code point. And maybe explain how to handle the kind of characters this article is about and what sets them apart from 'normal' strings. (And maybe how to recognize them so that you can still do things like ReplaceStr.)
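One rough way to approach the "how to recognize them" question is sketched below in Python. This is a heuristic only, not the UAX #29 text segmentation algorithm: it glues combining marks onto the preceding code point and lets a virama (canonical combining class 9) pull in the letter that follows. The names `clusters` and `replace_cluster` are invented for this sketch, and `replace_cluster` only handles a single-cluster search string.

```python
import unicodedata

VIRAMA_CCC = 9  # canonical combining class of virama-type signs

def clusters(text):
    """Split text into rough display units (a heuristic, not UAX #29)."""
    out = []
    after_virama = False
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category in ("Mn", "Mc", "Me")
        # A letter right after a virama joins the conjunct being built.
        joins_conjunct = after_virama and category.startswith("L")
        if out and (is_mark or joins_conjunct):
            out[-1] += ch
        else:
            out.append(ch)
        after_virama = unicodedata.combining(ch) == VIRAMA_CCC
    return out

def replace_cluster(text, old, new):
    """Replace old only where it forms a whole cluster on its own."""
    return "".join(new if unit == old else unit for unit in clusters(text))

jna = "\u0A9C\u0ACD\u0A9E"
print(clusters("a" + jna + "b"))

# A plain replace of JA tears the conjunct apart; the cluster-aware one does not.
print(jna.replace("\u0A9C", "x") == jna)
print(replace_cluster(jna, "\u0A9C", "x") == jna)
```

For real text processing, a proper segmentation library is the safer choice; the point here is just that a ReplaceStr-style operation should work on units like these, not on individual code points.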



referenced by

2007/01/24 Sometimes a WCHAR really *is* just a character....
