What do U+0223 and U+0657 have in common?
by Michael S. Kaplan, published on 2006/01/18 15:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/18/514314.aspx
Two somewhat random characters:
U+0223 "ȣ" LATIN SMALL LETTER OU
U+0657 "ٗ" ARABIC INVERTED DAMMA
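As a quick check on the two names above, here is a minimal sketch that looks both code points up in Python's standard `unicodedata` module. The helper name `describe` is made up for this illustration, and the result depends on the Unicode version bundled with your Python build (any Python 3 should know both characters).

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME (general category)' for one code point."""
    ch = chr(code_point)
    # unicodedata.name raises ValueError if the character has no name
    # in the Unicode database bundled with this Python build.
    return f"U+{code_point:04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for cp in (0x0223, 0x0657):
    print(describe(cp))
```

The general category in parentheses shows one way the two differ: U+0223 is reported as `Ll` (lowercase letter), while U+0657 is reported as `Mn` (nonspacing mark), which is why it renders on top of the preceding character.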
Let's see what they have in common:
1. They are both in Unicode.
2. Neither one is in the versions of the Tahoma font that have shipped.
3. Both of them are in the Tahoma that ships with Windows Vista.
4. I don't personally know of a language that uses either one, but I am sure there must be some.
5. Both of them came up in email questions sent to me more than two times in this last week from different people.
Of course, #5 is why they both kind of caught my eye. Well, that and a sixth and seventh point:
Vista keeps looking better and better, one character at a time! :-)
This post brought to you by "ȣ" and "ٗ" (U+0223 and U+0657, a.k.a. LATIN SMALL LETTER OU and ARABIC INVERTED DAMMA)
# Mihai on 19 Jan 2006 12:35 AM:
It seems U+0223 is used by Algonquin & Huron:
http://www.fileformat.info/info/unicode/char/0223/index.htm
"Vista keeps looking better and better, one character at a time!"
It is going to take a while to get to 0x10FFFF.
(I know, I know, not all of it is valid, the valid part is not completely allocated, etc. Just kidding :-)
And, in the end, "a trip of 10000 li begins with one step" :-)
# Ben Cooke on 19 Jan 2006 2:31 AM:
Interestingly, the small OU is in some font I have installed, but the inverted Damma is not. (I'm not sure which font; I have a couple of big fonts installed, and it's not in either of the two that usually end up "winning".)
I guess an obvious question is: what do these symbols actually mean?
# Michael S. Kaplan on 19 Jan 2006 7:40 AM:
I am sure they are both letters (as Mihai's notice of the annotation supports), just not letters commonly used in languages that Microsoft had worked to support previously.
If your question is how are they used in those languages, I am not sure.
# Roozbeh Pournader on 21 Jan 2006 8:23 AM:
Well, U+0657 is mostly used in Iran and Pakistan to mark an Arabic Waw (usually in the Quran), so the reader knows that it sounds like an [u:]. This matters because Waw may have many different readings, including a [w] sound, an [a:] sound, totally silent, ...
It's a rare character. That's why it has only been in Unicode since 4.0.
# Richard on 24 Jan 2006 3:52 AM:
# Denis Jacquerye on 31 Jan 2006 5:21 PM:
U+0223 is still in use. See the examples in the formal proposal from before it was included in Unicode: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n1741.pdf