Being smart, by not trying to be clever

by Michael S. Kaplan, published on 2008/11/06 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/11/06/9046784.aspx


 

There are times that Microsoft Word is too smart for its own good.

The message I received via the Contact link was:

Dear Michael,

First, I would like to thank you for your great BLOG!  Well done!  I've spent many hours reading and studying various articles and have found all of them helpful AND entertaining as well.  And it's hard to do both of those things at once!

Although I've worked in the Java world for years, I'm new to Windows programming.  So please forgive my ignorance!

The app that I'm building involves using 2 new keyboard layouts.  So I built them using MSKLC.  Everything was fine except they did not quite work in Microsoft Word 2007 running on XP.  Most of the keyboard worked just fine.  In particular the Devanagari layout didn't quite work in Word although it works fine in Notepad, Wordpad, and Excel 2007!  It turns out that when I use a dead key to obtain some of the cerebral consonants in a glyph they don't combine!  And it only happens when the dead key letter is in second position!

Let me explain more clearly with an example.

Here is what is supposed to happen (see the Devanagari Unicode table):

Let + be short for the words "followed by" in the example below

0938 + 094d + 0925

This sort of thing works just fine in MOST cases.  The 2 letters are combined into a glyph & are properly rendered on-screen

However, if the last code point in the example is created by using a dead key (dead key + someKeystroke), the 2 letters do NOT combine into a new glyph as they do all the other times.  If you then move the cursor backwards on the screen to select the letters of the word, then the glyph fixes itself and properly renders!

So I've found a way around this by simply not using dead keys but it makes for a less convenient keyboard mapping.

What's happening here?  I saw an article where one of your readers said something about Rich Edit causing something like this.  It was a small and somewhat vague remark.  Could this be the case here?  I don't mean for you to spend a lot of time on this.  If you could just point me in the right direction to understand what is happening, I'd be grateful.

Thanks again Michael.  Tell your boss that I said you're doing a great job!

Dharma

Well that was quite a nice note to get!

I thank you Dharma, for all the kind words. I won't go so far as to say it's why I'm here, though I will say it makes the time here nicer!

I'll forward the note on to my manager as per that last sentence, though of course since this is a personal blog it is unknown how much it will impress, given that this isn't my actual job that you're talking about. But it can't hurt, at least. :-)

Now to the actual technical question....

In this example, we are taking a consonant

U+0938, aka DEVANAGARI LETTER SA

and combining it with another consonant

U+0925, aka DEVANAGARI LETTER THA

and making a conjunct with a bit of Unicode-esque Virama glue, so it will look like this:

स्थ U+0938 U+094d U+0925

Or maybe if you wanted to see a visible VIRAMA in there you'd add the ZWNJ, like so:

स्‌थ U+0938 U+094d U+200c U+0925

Anyway, you get the idea.
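
For anyone who wants to poke at those two sequences outside of Word, here is a small Python sketch. It is just an illustration, not anything Word or Uniscribe actually runs. It builds both strings from their code points and lists the Unicode character names using the standard unicodedata module. The names conjunct, visible_virama, and describe are made up for the example.

```python
import unicodedata

# The conjunct: SA + VIRAMA + THA
conjunct = "\u0938\u094d\u0925"
# The same letters with a ZWNJ after the virama, which keeps the virama visible
visible_virama = "\u0938\u094d\u200c\u0925"

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, text in (("conjunct", conjunct), ("visible virama", visible_virama)):
    print(label)
    for line in describe(text):
        print("   ", line)
```

The only difference between the two strings is the extra U+200C in the middle. Whether the conjunct form actually shows up on screen is up to the font and the shaping engine, which is exactly the part that goes wrong in this story.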

Now at the very beginning I talked about how there are times that Word is too smart for its own good.

This would be one of those times.

You see, Word tries its best to be helpful, something that folks have at times even made fun of. But even the most diehard of Microsoft critics will admit that Word really is trying to be helpful -- this is why Word's competitors try to do the same kind of thing!

But there are times when its attempts to pay attention to what you are doing, in this case how you are entering the text, actually seem to do more harm than good.

You could perhaps try the advice in "You're not the one out of sequence, and that's the Word" (turn off the sequence checking) to see if that helps, though I don't think it will here. This is actually Word interrupting its own Uniscribe layer since "there aren't any keyboards that would ever do this kind of thing" and it is saving some processing time by not trying to shape things that need no shaping.

Thus it isn't changing what you've input like the sequence checking can do in its artificial-intelligence-style-autocorrect attempts (you mentioned yourself that moving the cursor back over the letters and selecting them will "fix them up"). And actually saving, closing, and re-opening the document seems to fix the text up, as does scrolling the text way off the screen and then scrolling back to it.
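
To make the "not changing what you've input" point concrete, here is a toy Python model of a dead key. Everything in it is an assumption for illustration: the dead key character "^", the base key "t", and the DEAD_KEY_TABLE mapping are hypothetical and are not taken from Dharma's MSKLC layout, and this is not how Windows implements dead keys internally. It only shows that a dead-key pair and a direct keystroke can yield the identical code point sequence, so the difference would have to be in how the text gets rendered rather than in what gets stored.

```python
# Hypothetical dead-key table: (dead key, next keystroke) -> resulting character.
DEAD_KEY_TABLE = {("^", "t"): "\u0925"}  # assumed mapping to DEVANAGARI LETTER THA

def type_keys(keystrokes, dead_keys=("^",)):
    """Turn a list of keystrokes into text, resolving dead-key pairs."""
    out = []
    pending = None  # a dead key waiting for its base keystroke
    for key in keystrokes:
        if pending is not None:
            # If the pair is undefined, fall back to emitting both characters
            out.append(DEAD_KEY_TABLE.get((pending, key), pending + key))
            pending = None
        elif key in dead_keys:
            pending = key
        else:
            out.append(key)
    if pending is not None:
        out.append(pending)  # a trailing dead key with nothing after it
    return "".join(out)

direct = type_keys(["\u0938", "\u094d", "\u0925"])
via_dead_key = type_keys(["\u0938", "\u094d", "^", "t"])
print(direct == via_dead_key)
```

Both paths end up as U+0938 U+094d U+0925, which fits with the text looking right again once Word renders it without watching the keystrokes.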

In this case, Word's being so smart is mostly for its own benefit, not yours.

Which makes it all the more unfortunate that there isn't a way to give Word some hints here on what is going on.

Though the fact that it is not hurting the text being stored in the document gives you two potential workarounds, one of which seemed to work for me all the time and the other of which sometimes worked:

  1. You can ignore the problem and it will go away in the future when Word has to render the text when it is not watching you type it, or
  2. You can try to build in AutoCorrect sequences that replace the string with itself

The third workaround (don't do the dead key on the "shaping" character) is one implied in the original description, but since that changes how the keyboard itself is trying to do its work, it's not one I would suggest.

Perhaps if Word was not trying so hard to be clever, it could be smarter here?

Though obviously the Word folks would triage the importance of this bug by how common it is -- and of course this kind of keyboard would be pretty uncommon (sorry, Dharma!). But I'll forward it on, either way....


This blog brought to you by स (U+0938, aka DEVANAGARI LETTER SA)

