You've got to be kashidding me

by Michael S. Kaplan, published on 2007/01/12 06:01 -05:00, original URI:

I know that technically 'Kashidding' is not a word, but English is a productive language, yada yada yada, and it actually makes sense in the context of this post! 

The other day one of the developers on another team was looking at some of their text search functionality that had a "Match Kashida" option in it. Lots of folks have done it before -- from Internet Explorer to the XPS Viewer to Microsoft Word and so on....

But a Kashida is a really weird thing to match. Truly.

As Wikipedia explains:

Kashida is a type of justification used in some cursive scripts, particularly Arabic. In contrast to white-space justification, which increases the length of a line of text by expanding spaces between words or individual letters, kashida justification is accomplished by elongating characters at certain chosen points. Kashida justification can be combined with white-space justification to various extents.

Kashida can also refer to a character representing this elongation (also known as tatweel), or to one of a set of glyphs of varying lengths that are used to implement this elongation in a font. The Unicode standard assigns codepoint U+0640 as "Arabic Tatweel".

Now there are two ways to do Kashida type justification that can be combined as much as one might like:

  1. a font and/or shaping engine can do it in accordance with simple or complex rules as to which letters or character combinations or words;
  2. a Tatweel (U+0640) can be inserted in the text to give subtle (or perhaps not-so-subtle) hints as to where to do the stretching.
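To make the second option concrete, here is a minimal Python sketch of what an explicit Tatweel does to the underlying string. The helper name `insert_tatweel` and the sample word are purely illustrative assumptions, not anything taken from a product or shaping engine; the only fact relied upon is that U+0640 is an ordinary character sitting in the text.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, the explicit kashida character

def insert_tatweel(word: str, index: int, count: int = 1) -> str:
    """Hypothetical helper: insert `count` tatweels before position `index`."""
    return word[:index] + TATWEEL * count + word[index:]

# Sample word (an assumption for illustration): kaf, teh, alef, beh
plain = "\u0643\u062A\u0627\u0628"

# Ask for stretching between the kaf and the teh
stretched = insert_tatweel(plain, 1, count=3)

print(unicodedata.name(TATWEEL))     # the character's formal Unicode name
print(len(plain), len(stretched))    # the stretched string is three code points longer
print(plain == stretched)            # an ordinal comparison sees two different strings
```

The stretched string carries the same word, yet a plain ordinal comparison treats it as different text, which is where the trouble with matching begins.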

Perhaps people will immediately see a slight analogue to this behavior in the Latin script, used to perform a slightly different operation -- hyphenation to break long words between lines, which has that optional Unicode character the SOFT HYPHEN (U+00AD) that I have discussed previously....

(Now of course the analogy does not completely hold since Kashida-esque functionality can be much more artistic typographically, but nevertheless the two operations have much in common!)

But to be perfectly honest, the notion of requiring either an explicit "Match Kashida" option like the one Microsoft Word has, or a NORM_IGNOREKASHIDA flag (which VarCmp is documented as having, even though it is not in the help file and does not appear to exist in any header file after the old 16-bit olenls.h header), is quite flawed, just as adding a NORM_IGNOREHYPHENATION flag or a Word "Match Hyphenation" option would be.

Because most of the time, one cannot find a "Kashida" since there is no specific character to find, just as most Word hyphenation is done without a SOFT HYPHEN.

Perhaps there is value in a "FIND EXPLICIT KASHIDA" or "IGNORE EXPLICIT KASHIDA" option, though the latter should likely always be done (it is always done in Windows), and the former can be served just as well by a binary search, and even better by an explicit "stripping" function that gets rid of these types of characters with no semantic content. Attempting to match them in a find/search operation simply doesn't make a whole lot of sense.
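For what it is worth, the "stripping" idea and the "ignore explicit kashida" idea are both only a few lines of code. The following Python sketch is an illustration under stated assumptions: the function names are made up for this example, Python's `str.find` stands in for a binary (ordinal) search, and none of this claims to show how Windows or Word actually implement their behavior.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL

def strip_explicit_kashida(text: str) -> str:
    """Hypothetical stripping function: drop every explicit tatweel."""
    return text.replace(TATWEEL, "")

def find_ignoring_kashida(haystack: str, needle: str) -> int:
    """Ordinal search that skips tatweels on both sides.

    Returns the index in the ORIGINAL haystack where the match starts,
    or -1 if there is no match.
    """
    needle = strip_explicit_kashida(needle)
    if not needle:
        return 0  # nothing left to look for; mirror str.find("") behavior
    # Remember where each surviving character sat in the original text
    kept = [i for i, ch in enumerate(haystack) if ch != TATWEEL]
    stripped = "".join(haystack[i] for i in kept)
    pos = stripped.find(needle)
    return kept[pos] if pos >= 0 else -1

plain = "\u0643\u062A\u0627\u0628"               # kaf, teh, alef, beh
stretched = plain[:1] + TATWEEL * 3 + plain[1:]  # same word with explicit kashidas
document = "abc " + stretched

print(document.find(plain))                    # ordinal search misses the stretched word
print(find_ignoring_kashida(document, plain))  # kashida-insensitive search finds it
print(document.find(TATWEEL))                  # "find explicit kashida" is just an ordinal search
```

Note that neither function can say anything about stretching that a font or shaping engine applies on its own, since in that case there is no character in the text to find or to strip.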

Which is why, when I think about the people who build in such functionality, I wonder if they are kashidding me. :-)


This post brought to you by U+0640, a.k.a. ARABIC TATWEEL


