Every character has a story #35: ARABIC LETTER HAH WITH TWO DOTS VERTICAL ABOVE (U+0682)

by Michael S. Kaplan, published on 2011/12/14 07:14 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/12/14/10247653.aspx

So it started the other day.

On the Unicode List.

Andreas Priliop asked:

Arabic letter U+0682 shows two dots above. It has the cryptic remark "not used in modern Pashto". But was it ever used?

The new 2011 edition of German standard DIN 31635  "Romanization of the Arabic Alphabet"  http://www.beuth.de/en/standard/din-31635/140593750 shows the real archaic Pashto letter on page 22.

It has one dot above and one dot below, corresponding to Pashto U+0696 and U+069A.

This is also the form shown in "Lehrbuch des Pashto (Afghanisch)" von Manfred Lorenz.

/* The current form for [dz] is U+0681. */

What he didn't realize was this was not the first version of an annotation for this letter.

We then had some people weigh in, like Ken Whistler:

To understand where the "cryptic" remark came from, you need to know more about the history of the character in the standard.

U+0682 was encoded in Unicode 1.0. I don't have the material in hand right at the moment to track down its original source, but for these kinds of extensions to Arabic dating back to Unicode 1.0, it was most likely in some poorly resolved handwritten or photocopied source labelled "Pashto" but without much analysis.

However the research into the exact details turns out, in Unicode 1.0 the character was published with the note "Pashto".

On February 13, 2003, Roozbeh Pournader sent a note around with a number of comments of Arabic character extensions and annotations. Among those notes was the statement:

C6. For 0682: The comment is wrong. This is not used in modern Pashto (just rechecked with my Pashto dictionaries). I am back from Kabul doing a study of computer requirements of Pashto and didn't see this anywhere. I guess we should send a public email and ask if anybody knows what this is. [Just an alert. Don't do anything for now.]

Then on March 19, 2003, Roozbeh followed up with another note:

> 3. Comment for 0682: Remove 'Pashto'. This is not used in modern Pashto.
> Never. And not in loanwords. (May possibly be old Pashto.)

Based on that note, and with no further clarification provided by anyone on the issue, I and the other editors modified the annotation in the Unicode *4.0* names list, so that it read "not used in modern Pashto".

It has remained that way in the names list since that date.

If Andreas (or anyone else) has better information, that can certainly be submitted, and the editors can then work to further clarify any annotation for the character.

My own suspicion is that the original form from Unicode 1.0 may have been a hard-to-interpret glyph alternative for 0681. Note another note on the unicode email list from 2001, from Vladimir Ivanov. This note doesn't address 0682 specifically, but does raise questions about the exact nature and shape of the diacritic above the hah for dze in Pashto usage:


Date:     Fri, 8 Jun 2001 07:27:11 +0400

My Pashto informants call it "dI paxto alifbe", saying it has 10 extra
Letter "dze" is represented in Unicode by U+0681 "Arabic letter heh with hamza above", though the sign above heh is not exactly hamza. It is a zigzag-like sign of the same height as hamza, but they are well distinguished. My informants could not recall any special name for it.
If you use "heh with hamza above", people usually accept it as a substitute, saying that "computer is not able to build a real Pashto letter" (?!).
I could not find such a letter in Unicode. I would be glad to hear some comments on it.

Vladimir Ivanov


Some others weighed in, in small ways.

They all meant well, but with no new info to impart.

Finally, Ron weighed in with what looked like the answer:

I think I have an answer to a possible source of U+0682:

Grammar of the Pasto or Language of the Afghans, Compared with the Iranian and North-Indian Idioms. By Dr. Ernest Trumpp. London and  Tuebingen, 1873. (Available from Google Books)

Page 1 (Page 24 of the PDF download from Google Books):

"Only one consonant has been left indistinct, the media [U+0685] d (= dz), which is not distinguished from its tenuis [U+0685] t (= ts) by separate diacritical marks. We have endeavoured to supply this want by placing two dots above [U+062D], viz. [U+0682], as for a foreigner at any rate the non-distinction of the two sounds must prove very troublesome."

Indeed, some other 19th century grammars refer to Pashto [ts] and [dz] as distinct letters but typeset them identically with three dots above (that is, like U+0685). Here are two such examples:

A Grammar of the Pukkhto or Pukshto Language on a New and Improved System, by Henry Walter Bellew, London 1867 (see alphabet table on page 3, that is page 20 of the PDF download from Google Books).

A Grammar of the Pukhto, Pushto, or Language of the Afghans, by Lieutenant H. G. Raverty, Calcutta 1855 (see alphabet table on pages 3-4, that is pages 77-78 of the PDF download from Google Books).

So it appears that the character "Hah with two dots vertical above" was a 19th-century attempt to distinguish Pashto [ts] and [dz] for didactic purposes. The convention of writing [dz] using Hah with hamza above (U+0681) appears to have emerged later. There are still some unanswered questions.

- Why did a character from a 19th-century book get coded in Unicode? Did it ever receive wider use beyond Trumpp's book?

- Is the present hamza convention a development of the two vertical dots proposal, or are they unrelated? About a year ago I worked with several Afghan expatriates living in Southern California, and in handwriting they would typically join two diacritical dots as a squiggle rather than a line (which is more common in Arabic). One could see how two vertical dots might develop into a vertical squiggle and later into a hamza, especially given the note by Vladimir Ivanov cited below. But this is only a conjecture at this point.

Anyway, I hope to have contributed a few pieces towards solving the puzzle :-)


Now under ordinary circumstances, this would probably not have been encoded without a lot more effort.

But in those early days, a lot of stuff slipped in.

Including this letter, which may indeed have been an early, experimental linguistic innovation that was never ultimately picked up in the language.

It is now in most fonts that cover Arabic.
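If you want to poke at these characters yourself, here is a minimal Python sketch (not anything from the Unicode List thread) that looks up the formal names of the code points mentioned above, using the standard library's unicodedata module. The CODE_POINTS list and the describe helper are names made up for this illustration, and the results reflect whatever Unicode version your Python build bundles. The names list annotations, like "not used in modern Pashto", are not exposed by this module, so only the formal names appear.

```python
import unicodedata

# Code points mentioned in the discussion above (hypothetical list for this sketch).
CODE_POINTS = [0x062D, 0x0681, 0x0682, 0x0685, 0x0696, 0x069A]


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using Python's bundled Unicode data."""
    # The second argument is a fallback for code points that have no name.
    name = unicodedata.name(chr(cp), "<unnamed>")
    return f"U+{cp:04X} {name}"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        # Print the character too; whether it renders depends on your fonts.
        print(describe(cp), chr(cp))
```

Whether the two vertical dots actually show up on screen for U+0682 depends entirely on the font your terminal falls back to.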

And as we say, "every character has a story."

Some just need deeper digging than others!
