You think *your* characters have stories? Let me tell you a character story....

by Michael S. Kaplan, published on 2011/10/20 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/10/20/10228020.aspx


Today's blog's title should be read with a Groucho Marx accent....

Over on the Unicode List, Andreas Prilop asked:

There are three so-called "Yiddish digraphs" in Unicode:
  U+05F0   wawayim
  U+05F1   waw yod
  U+05F2   yodayim

What is specifically Yiddish about these digraphs?
They can be used in the same way in Hebrew.
But this isn't done. Why not?

http://he.wikipedia.org/wiki/%F8%E9%E9_%F7%E5%F8%F6%E5%E5%E9%E9%EC
http://he.wikipedia.org/wiki/%F8%D6_%F7%E5%F8%F6%D4%D6%EC

Why should Yiddish be written with special digraphs
but Hebrew with sequences of two letters?

But even in Yiddish, the digraphs are not really used:

http://yi.wikipedia.org/wiki/%F8%F2%F7%E9%E0%E5%E5%E9%F7
http://yi.wikipedia.org/wiki/%F8%F2%F7%E9%E0%D4%E9%F7


The Unicode Standard says:
| ... to distinguish the digraph double vav from an occurrence
| of a consonantal vav followed by a vocalic vav.

By that reasoning you would need an English digraph "sh"
to distinguish "sh" in "***" from "s-h" in ***hole. ;-)

Ah yes, the Yiddish digraphs!

Code point  Character  Name
U+05F0      װ          HEBREW LIGATURE YIDDISH DOUBLE VAV
U+05F1      ױ          HEBREW LIGATURE YIDDISH VAV YOD
U+05F2      ײ          HEBREW LIGATURE YIDDISH DOUBLE YOD
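The names above are the formal character names from the Unicode Character Database. If you want to check them yourself, here is a minimal sketch using Python's standard unicodedata module (the YIDDISH_DIGRAPHS list and the describe() helper are just illustrative names, not anything from the standard). It prints only the code point and the name, so it works even on a console that can't display Hebrew.

```python
import unicodedata

# The three "Yiddish digraph" code points (illustrative list, not a UCD API).
YIDDISH_DIGRAPHS = ["\u05F0", "\u05F1", "\u05F2"]

def describe(ch: str) -> str:
    """Return 'U+XXXX  NAME' for a single character, using the UCD name."""
    return f"U+{ord(ch):04X}  {unicodedata.name(ch)}"

for ch in YIDDISH_DIGRAPHS:
    print(describe(ch))
```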

Lots of people jumped in, and the consensus was along the lines of "I'm not sure, but I think it's legacy".

Thankfully the guy who should be writing the "Every Character Has a Story" book jumped in to add some surety:

On 10/19/2011 12:08 PM, Mark E. Shoulson wrote:
> I think the issue here is (probably) a matter of legacy encodings,
> though someone else would need to confirm that.

O.k., as self-appointed historian of the standard, I guess I need to be
the one to answer that. ;-)

The Yiddish digraphs were added to the basic set of Hebrew letters for
Unicode 1.0 on behalf of the Research Libraries Group, for compatibility
with their existing usage on the Research Libraries Information Network
(RLIN).

Digging very deep in the old mailbox, I located email from Joan Aliprand
of the Research Libraries Group, dating from July 11, 1991 confirming
this, and noting that "I pushed very hard for inclusion of the Yiddish
digraphs tsvey vovn and tsvey yudn."

It is my recollection that the 3rd digraph was added during the
discussion of the addition of those two.

At any rate, there is your legacy encoding source for these. Whether or not
the digraphs are used in *current* Yiddish data (or would even be
recommended for such use) is not relevant to reasons for the original
inclusion.

--Ken

And there we go -- every digraph has a story, too!

This blog is sponsored by our three Hebrew/Yiddish Digraph friends....


Philip Newton on 21 Oct 2011 2:50 AM:

I'm amused that they point to usage in Wikipedia (the encyclopædia that *anyone* can edit) as "proof" that a certain spelling is or isn't used "in Hebrew" or "in Yiddish" (with the implication of "in general in that language").

Random832 on 21 Oct 2011 10:10 AM:

Could it have something to do with the fact that the letters are narrow [and therefore more likely to be useful as a digraph on fixed-pitch typewriters or computer screens - particularly if this is a common letter pair] - compare "ij" [yes, this is a "letter" in Dutch, but: ] contrast "ch" [which is/was a 'letter' in Spanish but never got encoded as a single code point]

Once the visual distinction is there (it's not clear if there would be a visual distinction in handwriting or in proportional fonts - I don't know anything about the language or the script so I can't say), it's easy to invent a semantic distinction: Something like 'This combination of letters means a double consonant some places and not others, and it looks silly to have it grouped together when it's not the double consonant, so only use it there.'

And once there's a perceived semantic distinction, that leads to it getting encoded as a 'real' character [defined here as not having a compatibility decomposition, which it would if it were just a question of legacy character sets; compare "fi"] in Unicode.
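The decomposition point is easy to check with Python's standard unicodedata module. In this small sketch (the constant names are illustrative only), the "fi" ligature U+FB01 carries a compatibility decomposition, so NFKC folds it back to plain "fi". The Yiddish digraphs have no decomposition at all, so even NFKC leaves U+05F0 alone, and it never becomes equal to a sequence of two separate vavs.

```python
import unicodedata

DOUBLE_VAV = "\u05F0"     # HEBREW LIGATURE YIDDISH DOUBLE VAV
VAV_VAV = "\u05D5\u05D5"  # HEBREW LETTER VAV followed by HEBREW LETTER VAV
FI_LIGATURE = "\uFB01"    # LATIN SMALL LIGATURE FI

# The raw decomposition field from the UCD; an empty string means none.
print(repr(unicodedata.decomposition(FI_LIGATURE)))  # '<compat> 0066 0069'
print(repr(unicodedata.decomposition(DOUBLE_VAV)))   # ''

# NFKC applies compatibility decompositions, so the fi ligature folds away...
print(unicodedata.normalize("NFKC", FI_LIGATURE) == "fi")     # True
# ...but the double vav has nothing to fold, so it stays distinct from vav+vav.
print(unicodedata.normalize("NFKC", DOUBLE_VAV) == VAV_VAV)   # False
```

The same check on U+05F1 and U+05F2 also returns an empty decomposition, which fits the idea that all three were encoded as "real" characters rather than as compatibility forms.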

Incidentally, all three digraphs are present in the MARC-8 (RLIN) character set: lcweb2.loc.gov/.../32.html

Alex Cohn on 25 Oct 2011 7:50 PM:

Yes, it has everything to do with the history of Yiddish typewriter, see m.flickr.com

Random832 on 26 Oct 2011 7:15 AM:

Incidentally, how did/do typewriters for bidirectional scripts work? Does the carriage just advance to the right and you're expected to type numbers in reverse order?

