Kurdish with an Arabic Accent?

by Michael S. Kaplan, published on 2010/10/01 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/10/01/10070128.aspx

The title of this blog was inspired by a blog from a few years ago in Elizabeth Pyatt's A Linguist in the Wild blog entitled "Arabic with a Kurdish Accent?". However, this blog has nothing to do with accent and is really focused on issues of script and collation.

It was a little over 16 months ago in my "The Whey doesn't get a locale, either" blog that I responded to the "It is Time to Add Kurdish Culture to VS .NET Globalization" paper, which mentioned me and my work a bit.

I skipped over several of the (in my opinion) interesting linguistic issues related to Kurdish, with just a hint to the fact that there are multiple scripts involved in a theoretical locale list.

But one of the linguistic issues touches on the area of

and since these topics are all interesting to me, I thought it might make sense to talk about Kurdish and these issues a little.

I'll start with the two major branches of Kurdish:

  1. Kurmanji, which is generally written with the Latin script and is principally seen in Turkey and Syria;
  2. Sorani, which is generally written with the Arabic (some would say Perso-Arabic, though the lack of a Farsi Yeh in most letter lists gives me pause) script and is principally seen in Iraq and Iran.

I won't be talking about Yekgirtú in this blog, though its unification efforts are also kind of interesting, so I could talk about it some other time!

Now an interesting question comes up when one asks how to sort the language: if one is creating a custom culture, one must pick a sort order from the ones that already exist.

If you look at the "It is Time to Add Kurdish Culture to VS .NET Globalization" paper, you will see that Azad Ali and Frederick Kohun explicitly use the ar-IQ (Arabic - Iraq) culture to get their sorting from. Arabic sorting.

This is obviously not so very Perso-Arabic due to the specific differences in Persian when compared to Arabic. But if one looks at the letters used for Sorani, you may not get the same kind of "Persian", Perso-Arabic sense that languages like Persian and Urdu might give you.

Is this due to worry about support for some letters, or is it how people want it? That much is unclear, but we'll assume for now that the results are actually correct. So perhaps using ar-IQ rather than something like fa-IR might be okay in this context, if there are no letters that would be different.

But another problem quickly rears its head, one that the Yekgirtú alphabet I said I wasn't going to talk about helps to highlight.

You see, the Sorani alphabet for Kurdish has a feature that makes it somewhat unique among languages using the Arabic script.

That difference? Well, vowels are required, and the result becomes a true alphabet rather than an abjad.

This factor could lead one to want a sort that is more like the Latin script Kurmanji, because even though it is a different script there is a total mapping between letters in the two cases.
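
To make the "sort it like the Latin script" idea concrete, here is a minimal sketch that maps each Arabic-script letter to a Latin counterpart and sorts on the mapped string. The five-letter mapping is only an excerpt I picked for illustration, and the name sort_like_latin is made up for this example; a real table would have to cover the whole alphabet and give letters such as ç, ê and ş their own places.

```python
# Hypothetical excerpt of a Sorani-to-Latin letter mapping, keyed by code point.
# A real table would need to cover the whole alphabet.
ARABIC_TO_LATIN = {
    "\u0627": "a",  # ALEF
    "\u0628": "b",  # BEH
    "\u062A": "t",  # TEH
    "\u062F": "d",  # DAL
    "\u06D5": "e",  # AE
}


def latin_style_key(word):
    """Map each mapped letter to its Latin counterpart; leave other characters alone."""
    # Plain a-z compares in alphabet order, which is enough for this excerpt.
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in word)


def sort_like_latin(words):
    """Sort Arabic-script words as if they were written in the Latin script."""
    return sorted(words, key=latin_style_key)
```

With these assumptions, the single letters TEH, AE and DAL come back as DAL, AE, TEH (d, e, t), whereas a plain sorted() call on the same strings keeps AE last because of its code point.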

You can actually find examples of both sorts out there, though without a mechanized solution it is possible that this is more of an anomaly than anything else - a limitation of technology and of sketchy manual labor.

Now in the Latin script, Kurmanji does not have Turkic rules about either letters or casing. So perhaps the English sort or something like it is good enough for them -- or perhaps they are just using the thing that is easiest for them to get on computers, and if they had their choice they'd want something different.

Interestingly, both English and Arabic sit in the default collation table of Windows, so assuming one doesn't need the Arabic letters of Sorani in the Latin order of Kurmanji, then collation will simply work (if not, then one would have to work a little harder to get the right results).

It does seem odd to be placing the vowels in the "Arabic" places when they act like the vowels in Sorani Kurdish do, but perhaps that is indeed what people are used to at this point; perhaps it is what they want.

Note that the "Arabic" sort puts some of these letters in different places than the letter lists for Kurdish typically do. Which makes this question all the more complicated....
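
As a rough illustration of how a letter-list order can disagree with a table-driven one, here is a small sketch that sorts by position in an explicit letter list. The eight-letter excerpt and its order are assumptions on my part (published lists differ), the function names are invented for the example, and raw code point order is used only as a stand-in for comparison; it is not the Windows Arabic sort.

```python
# Hypothetical excerpt of a Sorani letter list, given as code points.
# Published letter lists differ, so treat this order as an assumption.
LETTER_LIST_EXCERPT = [
    "\u0644",  # LAM
    "\u06B5",  # LAM WITH SMALL V
    "\u0645",  # MEEM
    "\u0646",  # NOON
    "\u0648",  # WAW
    "\u06C6",  # OE
    "\u0647",  # HEH
    "\u06D5",  # AE
]


def make_sort_key(letters):
    """Build a key function that ranks characters by their position in `letters`."""
    rank = {ch: i for i, ch in enumerate(letters)}
    unknown = len(letters)  # characters outside the list sort after all listed ones

    def sort_key(word):
        # The code point is a tie-breaker so unlisted characters still order consistently.
        return [(rank.get(ch, unknown), ord(ch)) for ch in word]

    return sort_key


def sort_by_letter_list(words, letters=LETTER_LIST_EXCERPT):
    """Sort words by the letter list rather than by any built-in order."""
    return sorted(words, key=make_sort_key(letters))


def sort_by_code_point(words):
    """Stand-in for comparison only: raw Unicode order, not the Windows Arabic sort."""
    return sorted(words)
```

Under this assumed list, LAM WITH SMALL V lands right after LAM, while code point order pushes it past MEEM and NOON. That is the same kind of mismatch, just with a different comparison order.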

What bothers me most here is the thought that perhaps technology has led to changes in language done merely for the expedience of getting on computers. I don't know why exactly, but that bothers me. As if there is some issue in language and linguistics that technology should not sully....

