Confusing the nature of the confusion? a.k.a. A belated apology (and some additional thoughts)

by Michael S. Kaplan, published on 2007/09/04 02:46 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/09/03/4729217.aspx


I had the opportunity to hear from Mike (the one behind the Windows Speech Recognition language support in Windows Vista on speech @ microsoft) about some of the back-story behind the Vista speech recognition feature I was talking about in We're confusing internationalization and localization, AGAIN.

(It is important to point out an often forgotten issue -- when it comes to speech recognition, I am not just an employee of the company, I am also a customer!)

Now it is true that I am, on the whole, unhappy with the current by-design, intended-for-SKU-differentiation approach to user interface language (as I pointed out in Additional personal speculation on the Vista MUI SKU story).

And it is true that the speech recognition feature in Vista does pivot off of the UI language.

HOWEVER, this is not a case of 2 + 2 = 4, and the reason for that has to do with the intent of the team doing the work (and to some extent the timing).

Intent counts for a lot with me, so let me explain....

On the speech side, there are two very different features -- the "command" feature used to drive the application, and the "content" feature that simply adds text to the document.

They had received a great deal of feedback from both the time that speech was built into Office and the time it has been in the Tablet PC, and two of the consistent pieces of feedback they received are that:

Now given these two facts and with a desire to simplify the experience, tying speech recognition to the UI language of Windows was a sensible decision based on user feedback that I think is genuine.

The only unfortunate parts of the plan in the end are:

  1. the inconsistency of the input model;
  2. the confusion of internationalization and localization that amounts to a localizability problem;
  3. the fact that user interface language is a SKU differentiator, and thus an important part of the speech recognition feature is too (kind of violates the thoughts I have here, a little);
  4. the conceptual similarity between the post and that WinFS post, which also talked about exciting opportunities without even mentioning what had been lost thereby.

Now we all know I am a huge fan of better solutions for the underlying issues in #3, so no need to talk more about that here :-)

And #4 is, in the end, just a blog post. Not every one of them can cover everything. Later this month I'll have over 2000 of them here at SiaO, so I know something about this one...

But for what it's worth, looking at the harder-to-dismiss issues, I think #1 and #2 are actually solvable problems (as is the original problem that was "solved" with the UI language plan), given some careful thought about how best to expose the ability to change the speech recognition content language -- perhaps based on (for example) actual spoken commands to do the switching?

Definitely requires thought, but this is not unsolvable.
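To make the "commands to do the switching" idea concrete, here is a minimal sketch of how a recognizer might treat a language switch as a command while keeping dictation in a single active content language. Everything here is hypothetical -- the names (`ContentRecognizer`, `handle_utterance`, the installed-language table) are invented for illustration and have nothing to do with the actual Vista speech APIs:

```python
# Hypothetical sketch: routing a spoken "switch to <language>" utterance
# to a command handler instead of dictating it as content. Language tags
# are plain BCP 47-style strings; the table of installed languages is
# invented for the example.

INSTALLED_LANGUAGES = {"english": "en-US", "french": "fr-FR", "japanese": "ja-JP"}

class ContentRecognizer:
    def __init__(self, language="en-US"):
        self.language = language  # active content (dictation) language

    def handle_utterance(self, text):
        """Return ('command', msg) for a language switch, else ('content', (lang, text))."""
        words = text.lower().split()
        # Treat exactly "switch to <language>" as a command, not dictation.
        if len(words) == 3 and words[:2] == ["switch", "to"]:
            tag = INSTALLED_LANGUAGES.get(words[2])
            if tag is not None:
                self.language = tag
                return ("command", f"content language is now {tag}")
        # Everything else is content, dictated in the current language.
        return ("content", (self.language, text))
```

A session might then look like: `handle_utterance("switch to french")` flips the content language to fr-FR, and the next utterance is dictated under that language -- no trip through the UI language setting at all.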

As someone who actually uses speech recognition (albeit primarily for content, only occasionally for command, and perhaps a bit more often for language), I do think this is a solvable (and interesting!) technical problem, one that I would love to see people look into further. And it would be a great way to make speech recognition more compelling as a feature (along with those other trivial issues like improved accuracy, of course!)....

At this point, I'd like to apologize to the folks on the speech team.

Tying yourself to someone else for good reasons and having them (further on down the line) make less-than-great decisions that impact you is not the same as making those decisions yourself. And they didn't, so I am grateful to be able to take this opportunity to correct my initial mistaken impression here. :-)

 

This post brought to you by 𐐡 (U+10421, a.k.a. DESERET CAPITAL LETTER ER)


# MichaelGiagnocavo on 6 Sep 2007 12:23 AM:

So, let's keep putting pressure on whoever did make the stupid decisions? Anyway, all I see as a customer is that I have to have the UI in the language I want to speak content. Lame.

# Michael S. Kaplan on 6 Sep 2007 12:48 AM:

I agree with that, definitely. I just don't want to villainize the wrong people! :-)

# Shoshannah Forbes on 16 Sep 2007 5:27 AM:

Hi.

Your post made me wonder- can the speech recognition handle multi-lingual input in content mode, or would I be restricted to using a single language?

# Michael S. Kaplan on 16 Sep 2007 9:50 AM:

It is restricted. :-(

# Shoshannah Forbes on 16 Sep 2007 1:05 PM:

Oh well :(

come to think of it- it's interesting to notice that in sci-fi, speech recognition tends to be strictly monolingual. Hmmm...

# Michael S. Kaplan on 16 Sep 2007 1:10 PM:

Well we don't know that speech recognition is monolingual in sci-fi -- they only ever speak one language to the computer! :-)


referenced by

2007/11/15 Michael's Keyboard Laws for Developers, Part 1

2007/09/21 If you had gotten there first, you might have staked your claim too!
