by Michael S. Kaplan, published on 2004/12/15 03:14 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/15/309881.aspx
A few days ago, I pointed out that machine translation is not easy. I still believe in a lot of what I said there, but it turns out that I may have (unintentionally) palmed a card. I think it may be worth exploring the issue a bit.
The simplified definition I made for pragmatics is good as far as it goes (though I predicted it would make n linguists shudder, where n is greater than one). However, where I said
pragmatics refers to the implicit knowledge that the two people on opposite sides of a communication attempt have.
It would be better (more accurate) to say that
pragmatics refers to the knowledge outside the scope of the words being communicated by one person to the other.
It may seem like hair-splitting, but it's not. Think of the words "Elissa went to the movie. She said it was terrible." If one only looks at the second sentence, then all of the content of the first sentence conveys knowledge outside of its scope. That knowledge, while outside the scope of the second sentence, obviously allows one to understand much more about its context (who "she" is, and what "it" was). Pronouns in successive sentences that refer to knowledge contained in earlier sentences therefore give machine translation a way to understand more about the pragmatic content than my original definition allowed for, right?
Going back to my childhood, Schoolhouse Rock first taught me about pronouns: Albert Andreas Armadillo (via the voice of Jack Sheldon) taught me that although I could say
Now I have a friend named Rufus Xavier Sasparilla,
and I could say that Rufus found a kangaroo
That followed Rufus home
And now that kangaroo belongs
To Rufus Xavier Sasparilla
that it was in fact much easier to say "HE found a kangaroo that followed HIM home and now IT is HIS". While it is true that "saying all those nouns over and over can really wear you down," it is obvious that without the context of knowing that HIM == Rufus Xavier Sasparilla, the abbreviated sentence is hard to understand. A machine translation engine that is smart enough to work out those issues will be significantly better than one that is not.
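To make the Rufus example a bit more concrete, here is a deliberately naive sketch of pronoun resolution in Python. It is not how any particular machine translation engine works, and not a method described in this post. It assumes a tiny hand-built lexicon (the hypothetical `NOUNS` and `PRONOUNS` tables) and simply links each third-person pronoun to the most recent earlier noun that agrees with it in gender and number. The function names `tokenize` and `resolve_pronouns` are likewise made up for illustration.

```python
import re

# Hypothetical toy lexicon, hand-built for this illustration only:
# maps a lowercase noun to its (gender, number) features.
NOUNS = {
    "rufus": ("masc", "sg"),
    "elissa": ("fem", "sg"),
    "kangaroo": ("neut", "sg"),
    "movie": ("neut", "sg"),
}

# Third-person pronouns only; "I", "you", etc. are ignored.
PRONOUNS = {
    "he": ("masc", "sg"), "him": ("masc", "sg"), "his": ("masc", "sg"),
    "she": ("fem", "sg"), "her": ("fem", "sg"), "hers": ("fem", "sg"),
    "it": ("neut", "sg"), "its": ("neut", "sg"),
}


def tokenize(text):
    """Split text into alphabetic word tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z]+", text)


def resolve_pronouns(tokens):
    """Link each known pronoun to the most recent preceding noun that
    agrees with it in gender and number (a recency + agreement heuristic).

    Returns a list of (pronoun, antecedent) pairs in text order; pronouns
    with no agreeing antecedent are left out.
    """
    seen = []   # nouns encountered so far, as (token, features)
    links = []
    for tok in tokens:
        word = tok.lower()
        if word in NOUNS:
            seen.append((tok, NOUNS[word]))
        elif word in PRONOUNS:
            wanted = PRONOUNS[word]
            # Scan backwards so the closest agreeing noun wins.
            for noun, features in reversed(seen):
                if features == wanted:
                    links.append((tok, noun))
                    break
    return links


if __name__ == "__main__":
    text = ("I have a friend named Rufus. He found a kangaroo "
            "that followed him home and now it is his.")
    for pronoun, antecedent in resolve_pronouns(tokenize(text)):
        print(f"{pronoun} -> {antecedent}")
```

Run as a script, it prints He -> Rufus, him -> Rufus, it -> kangaroo and his -> Rufus. The agreement check is what keeps "him" from attaching to the nearer "kangaroo". It also shows the limits of this kind of trick: give the sentence two masculine nouns and pure recency will happily pick the wrong one, and nothing here touches knowledge that lives outside the text.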
Beyond pronouns, there are similar kinds of context that can give benefits comparable to, or even greater than, what a simple semantic translation provides. Such issues may even be discoverable and determinable by algorithm. But here is where the problems come in, and where the original difficulties I talked about come back. Many of these issues are not easily determined even by human readers (I recall my third grade teacher Mrs. Galan pointing out that sentences with too many pronouns could not be understood by her, let alone by anyone else!). Machines that can properly discern all of these pragmatic issues are a bit beyond much of the current art/science of machine translation.
Let's take another, more interesting example: Jack Winter's "How I Met My Wife," from the 23 July 1994 New Yorker:
It had been a rough day, so when I walked into the party I was very chalant, despite my efforts to appear gruntled and consolate.
I was furling my wieldy umbrella for the coat check when I saw her standing alone in a corner. She was a descript person, a woman in a state of total array. Her hair was kempt, her clothing shevelled, and she moved in a gainly way.
I wanted desperately to meet her, but I knew I'd have to make bones about it since I was travelling cognito. Beknownst to me, the hostess, whom I could see both hide and hair of, was very proper, so it would be skin off my nose if anything bad happened. And even though I had only swerving loyalty to her, my manners couldn't be peccable. Only toward and heard-of behavior would do.
Fortunately, the embarrassment that my maculate appearance might cause was evitable. There were two ways about it, but the chances that someone as flappable as I would be ept enough to become persona grata or a sung hero were slim. I was, after all, something to sneeze at, someone you could easily hold a candle to, someone who usually aroused bridled passion.
So I decided not to risk it. But then, all at once, for some apparent reason, she looked in my direction and smiled in a way that I could make heads and tails of.
I was plussed. It was concerting to see that she was communicado, and it nerved me that she was interested in a pareil like me, sight seen. Normally, I had a domitable spirit, but, being corrigible, I felt capacitated---as if this were something I was great shakes at---and forgot that I had succeeded in situations like this only a told number of times. So, after a terminable delay, I acted with mitigated gall and made my way through the ruly crowd with strong givings.
Nevertheless, since this was all new hat to me and I had no time to prepare a promptu speech, I was petuous. Wanting to make only called-for remarks, I started talking about the hors d'oeuvres, trying to abuse her of the notion that I was sipid, and perhaps even bunk a few myths about myself.
She responded well, and I was mayed that she considered me a savory character who was up to some good. She told me who she was. "What a perfect nomer," I said, advertently. The conversation became more and more choate, and we spoke at length to much avail. But I was defatigable, so I had to leave at a godly hour. I asked if she wanted to come with me. To my delight, she was committal. We left the party together and have been together ever since. I have given her my love, and she has requited it.
Why do we like this so much? Well, I do even if you don't. :-)
It's a fun trick, taking 74 words/phrases with one connotation and using their uncommon inverse forms to mean the opposite. Almost anyone old enough to understand the words will understand what is happening here, and in this case the line between semantic and pragmatic is quite blurry since every bit of it is buried in the text (other than the 74 terms, of course). Different linguists may well disagree on what is semantic and what is pragmatic here.
But in any case, imagine what a machine translation of this story would look like -- would it truly be able to capture any of the real intent of the story that is so obvious to all of the human readers of it? Although never stated explicitly, the intent is as clear as a Seattle day isn't. And that is a pragmatic intent.
Interestingly, automating this pragmatic aspect of machine translation requires a science that is also beyond us at the moment -- artificial intelligence. But that's a topic for another day....
referenced by
2006/10/30 Semper ubi sub ubi while doing translation badly
2006/08/09 Changing more than just the conversation
2005/06/19 Star Trek with linguistic pretensions
2005/03/13 IStemmer'ed the tide (or, Language-specific processing #2)