Not a YES man, but maybe a YEH man

by Michael S. Kaplan, published on 2007/07/25

 People who know me at all will never claim that I am a Yes man. Keep that in mind for just a moment because I have a point here, I promise.

Way back in February 2006, after I posted Every character has a story #18: U+06cc and U+064a (ARABIC LETTER FARSI YEH and ARABIC LETTER YEH), reader Roozbeh Pournader commented:

It may be interesting to know that Pashto actually uses both of the Yehs orthographically. One is used for the [i] sound, the other for an [aj] sound.

It also has three other Yehs, but that's a different story.

I thought I'd be a YEH man and explore it for a bit.

If you look at the Pashto language topic in Wikipedia it lists the Pashto alphabet:

ا ب پ ت ټ ث ج ځ چ څ ح خ د ډ ذ ر ړ ز ژ ږ س ش ښ ص ض ط ظ ع غ ف ق ک ګ ل م ن ڼ ه ۀ و ؤ ى ئ ي ې ۍ

Those last five entries on the left (yes, the last five -- this is an RTL language!) do look decidedly YEH-ish.

The characters are:

U+0649    ARABIC LETTER ALEF MAKSURA (Pashto Yeh?)

U+0626    ARABIC LETTER YEH WITH HAMZA ABOVE (Pashto Verbal Yeh?)

U+064a    ARABIC LETTER YEH (Pashto Strong Yeh?)

U+06d0    ARABIC LETTER E (Pashto Soft Yeh?)

U+06cd    ARABIC LETTER YEH WITH TAIL (Pashto Feminine Yeh?)
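If you would rather not take my word for the names, you can ask the Unicode character database directly. Here is a minimal Python sketch that does so for the last five letters of the alphabet line quoted above, plus U+06cc for comparison. The names PASHTO_YEHS, FARSI_YEH and describe are just mine for illustration, and the escapes are my reading of that alphabet line:

```python
import unicodedata

# The last five letters of the alphabet line quoted above, written as
# escapes so the code points are unambiguous (an assumption: this is my
# reading of that line, in the order the alphabet gives them).
PASHTO_YEHS = "\u0649\u0626\u064a\u06d0\u06cd"

# U+06CC is the code point in question; it is not in that alphabet line.
FARSI_YEH = "\u06cc"


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Print one line per character: code point followed by its Unicode name.
for ch in PASHTO_YEHS + FARSI_YEH:
    print(describe(ch))
```

This only reports what Unicode calls each character; it says nothing about which of them Pashto writers actually use.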

It is vaguely possible that I am not the only one who has questions at this point. Though if you look at the two footnote articles, entitled Pashto Alphabet and Pashto Alphabet, you'll see that one does explicitly list U+06cc, and the other does not list code points but does list a Yeh, a Verbal Yeh, a Strong Yeh, a Soft Yeh, and a Feminine Yeh, which seem (to my eyes) to match the five characters above.

Then there are sites like this one that have text such as:

In the case of Pashto, Persian and Urdu the code U+06CC representing dotless or dotted Y (depending on position) has not been used, as it appears that browsers do not support it adequately. Instead the codes for dotless or dotted Y are used (U+0650 and U+0649, respectively), depending on position.

That text seems to be saying something that is mostly not true anymore for software, though it too gives a reason for U+06cc to not appear on these other lists.

The data in Vista actually uses the five code points above and does not use U+06cc, though the original sources for this were entirely different from the ones I have mentioned above.

So to be honest I really don't know if U+06cc is used by Pashto or not, at this point (if it is then it may not sort exactly where users expect it to).
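To give a feel for why sort position is not something you get for free, here is a small Python sketch that sorts the five Yeh-like letters plus U+06cc by plain code point value. To be clear, this is an ordinal sort and not the Windows collation data I am talking about; it is only meant to show that raw code point order does not match the alphabet order. The names ALPHABET_ORDER, FARSI_YEH and by_code_point are made up for the example:

```python
# Alphabet order of the five Yeh-like letters, as I read the alphabet line
# quoted earlier in the post (an assumption of this sketch).
ALPHABET_ORDER = ["\u0649", "\u0626", "\u064a", "\u06d0", "\u06cd"]

# The code point whose Pashto status is in doubt.
FARSI_YEH = "\u06cc"

# Plain code point (ordinal) sort. This is NOT the Windows collation
# discussed in the post, just Python's default string ordering.
by_code_point = sorted(ALPHABET_ORDER + [FARSI_YEH])

# Show the resulting order as code points.
print([f"U+{ord(ch):04X}" for ch in by_code_point])

# Compare against the alphabet order with U+06CC simply tacked on the end.
print(by_code_point == ALPHABET_ORDER + [FARSI_YEH])
```

An ordinal sort drops U+06cc between U+064a and U+06cd and also reshuffles the other five, so anything resembling the expected order has to come from collation data that knows about each letter.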

So does this post make me a YEH man? :-)


This post brought to you by ی (U+06cc, a.k.a. ARABIC LETTER FARSI YEH)

# Åke Persson on 25 Jul 2007 3:59 PM:

Please, consider this important link:

# Michael S. Kaplan on 25 Jul 2007 4:37 PM:

Well, there are actually several problems with that doc that I pointed out to the author but there wasn't much progress in getting the problems addressed. Example -- it was in part an attempt to consolidate preferred practices in several different languages, when the kind of users that MS wants to enable really don't all want to sacrifice those differences.

It is not like everyone in the country agreed here -- just some standards people. :)

