Saying all those nouns over and over again...

by Michael S. Kaplan, published on 2004/12/15 03:14 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2004/12/15/309881.aspx


A few days ago, I pointed out that machine translation is not easy. I still believe in a lot of what I said there, but it turns out that I may have (unintentionally) palmed a card. I think it may be worth exploring the issue a bit.

The simplified definition I made for pragmatics is good as far as it goes (though as predicted, I made n linguists shudder, where n is greater than one). However, where I said

pragmatics refers to the implicit knowledge that the two people on opposite sides of a communication attempt have.

It would be better (more accurate) to say that

pragmatics refers to the knowledge outside the scope of the words being communicated by one person to the other.

It may seem like hair-splitting, but it's not. Think of the words "Elissa went to the movie. She said it was terrible." If one looks only at the second sentence, then everything the first sentence conveys is knowledge outside its scope. That knowledge, while outside the scope of the second sentence, obviously allows one to understand much more about its context. Pronouns that refer back to knowledge contained in earlier sentences obviously give machine translation a way to recover more of the pragmatic content than the original definition allowed, right?

Going back to my childhood, Schoolhouse Rock first taught me about pronouns when Albert Andreas Armadillo (via the voice of Jack Sheldon) explained that although I could say

Now I have a friend named Rufus Xavier Sasparilla,
and I could say that Rufus found a kangaroo
That followed Rufus home
And now that kangaroo belongs
To Rufus Xavier Sasparilla

it was in fact much easier to say "HE found a kangaroo that followed HIM home and now IT is HIS". While it is true that "saying all those nouns over and over can really wear you down," it is obvious that without the context of knowing that HIM == Rufus Xavier Sasparilla, that abbreviated sentence becomes hard to understand. A machine translation engine that is smart enough to work out those issues will be significantly better than one that is not.
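To make the dependency on outside context concrete, here is a minimal sketch in Python. It assumes the antecedent of each pronoun has already been worked out and written down by hand in a table; the names `expand_pronouns` and `ANTECEDENTS` are made up for this illustration. A real engine would have to derive that table from the earlier sentences, which is exactly the hard part.

```python
import re

# Hand-written antecedent table (an assumption for this sketch): this is
# the knowledge that lives outside the abbreviated sentence itself.
ANTECEDENTS = {
    "he": "Rufus",
    "him": "Rufus",
    "it": "the kangaroo",
    "his": "Rufus's",
}

def expand_pronouns(sentence, antecedents):
    """Replace each word found in the antecedent table with its noun."""
    def swap(match):
        word = match.group(0)
        # Words that are not in the table are left untouched.
        return antecedents.get(word.lower(), word)
    return re.sub(r"[A-Za-z']+", swap, sentence)

short = "HE found a kangaroo that followed HIM home and now IT is HIS"
expanded = expand_pronouns(short, ANTECEDENTS)
print(expanded)
```

With an empty table the function hands back the sentence unchanged, which is more or less the position of a translator who only ever sees one sentence at a time.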

Similar examples of context exist beyond pronouns, and handling them can bring benefits comparable or even superior to those of a simple semantic translation. Such issues may even be discoverable and determinable by algorithm. But here is where the problems come in, and where the original difficulties I talked about come back. Many of these issues are not so easily determined even by human readers (I recall my third grade teacher Mrs. Galan pointing out that she could not understand sentences with too many pronouns, and neither could anyone else!). Machines that can properly discern all of these pragmatic issues are a bit beyond a lot of the current art/science in the field of machine translation.

Let's take another, more interesting example: Jack Winter's "How I Met My Wife," from the 23 July 1994 New Yorker:

It had been a rough day, so when I walked into the party I was very chalant, despite my efforts to appear gruntled and consolate.

I was furling my wieldy umbrella for the coat check when I saw her standing alone in a corner. She was a descript person, a woman in a state of total array. Her hair was kempt, her clothing shevelled, and she moved in a gainly way.

I wanted desperately to meet her, but I knew I'd have to make bones about it since I was travelling cognito. Beknownst to me, the hostess, whom I could see both hide and hair of, was very proper, so it would be skin off my nose if anything bad happened. And even though I had only swerving loyalty to her, my manners couldn't be peccable. Only toward and heard-of behavior would do.

Fortunately, the embarrassment that my maculate appearance might cause was evitable. There were two ways about it, but the chances that someone as flappable as I would be ept enough to become persona grata or a sung hero were slim. I was, after all, something to sneeze at, someone you could easily hold a candle to, someone who usually aroused bridled passion.

So I decided not to risk it. But then, all at once, for some apparent reason, she looked in my direction and smiled in a way that I could make heads and tails of.

I was plussed. It was concerting to see that she was communicado, and it nerved me that she was interested in a pareil like me, sight seen. Normally, I had a domitable spirit, but, being corrigible, I felt capacitated---as if this were something I was great shakes at---and forgot that I had succeeded in situations like this only a told number of times. So, after a terminable delay, I acted with mitigated gall and made my way through the ruly crowd with strong givings.

Nevertheless, since this was all new hat to me and I had no time to prepare a promptu speech, I was petuous. Wanting to make only called-for remarks, I started talking about the hors d'oeuvres, trying to abuse her of the notion that I was sipid, and perhaps even bunk a few myths about myself.

She responded well, and I was mayed that she considered me a savory character who was up to some good. She told me who she was. "What a perfect nomer," I said, advertently. The conversation became more and more choate, and we spoke at length to much avail. But I was defatigable, so I had to leave at a godly hour. I asked if she wanted to come with me. To my delight, she was committal. We left the party together and have been together ever since. I have given her my love, and she has requited it.

Why do we like this so much? Well, I do even if you don't. :-)

It's a fun trick, taking 74 words/phrases with one connotation and using the uncommon but inverse form to mean the opposite. Almost anyone old enough to understand the words will understand what is happening here, and in this case the line between semantic and pragmatic is quite blurry since every bit of it is buried in the text (other than the 74 terms, of course). Different linguists may well disagree on what is semantic and what is pragmatic here.
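The word-level half of the trick is mechanical enough to sketch. The Python below assumes a tiny hand-picked glossary of familiar negatives and a short list of prefixes; both are illustrative assumptions, as is the name `negated_source`, and none of it is the full set of 74 terms. It only recovers the common word that a stripped form came from.

```python
# Prefixes to try, in order (an assumed, incomplete list).
PREFIXES = ("un", "in", "im", "dis", "non")

# A hand-picked sample of familiar negatives (an assumption for this sketch).
KNOWN_NEGATIVES = {
    "unkempt", "disgruntled", "nonchalant", "ungainly", "impeccable",
}

def negated_source(word):
    """Return the familiar negated word that `word` was stripped from,
    or None if no prefix produces a word in the glossary."""
    for prefix in PREFIXES:
        candidate = prefix + word
        if candidate in KNOWN_NEGATIVES:
            return candidate
    return None

for word in ("kempt", "gruntled", "peccable", "umbrella"):
    print(word, "->", negated_source(word))
```

A glossary lookup like this says nothing about phrases such as "make bones about it," and it certainly does not notice that the whole story is a joke, so the pragmatic part is left exactly where it was.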

But in any case, imagine what a machine translation of this story would look like -- would it truly be able to capture any of the real intent of the story that is so obvious to all of the human readers of it? Although never stated explicitly, the intent is as clear as a Seattle day isn't. And that is a pragmatic intent.

Interestingly, the ability to automate this pragmatic aspect of machine translation requires a science that is also beyond us at the moment -- artificial intelligence. But that's a topic for another day....


# Zach Glazer on 15 Dec 2004 7:30 AM:

Interesting post. I had read that article from the New Yorker before, but still find it quite amusing. Seems like you have a quality informative site here. Good job.
We'll stock up on the limonata for your next visit.

-Zach

# Michael Kaplan on 15 Dec 2004 7:57 AM:

Uh oh, they are taking the gloves off now....

An endless supply of free Limonata? Is there room over the garage for me to move in? :-)

# Michael Giagnocavo on 18 Dec 2004 7:47 AM:

It gets worse... some languages don't use pronouns (or not much), and a lot more is inferred through context. Also, it's about much more than simply /understanding/ the text (knowing the subject of a verb); translation also has to convey the same voice. If you took the English and converted it to another language, leaving all the pronouns in, it'd sound very odd...

# Mike Dunn on 18 Dec 2004 10:44 AM:

One of my favorite linguistic terms is "cranberry morph" meaning a morpheme that is only used in combination with others, and can't be a word on its own. Such as cran (cranberry), kempt (unkempt), ruth (ruthless), promptu (impromptu), and so on.

# Michael Kaplan on 18 Dec 2004 11:01 AM:

Heh heh heh -- I had actually been staying on the side of understanding a language well enough to consider translating it.

You have hit on the most important next blockade -- that different languages are often missing the concepts that would make the translation job easy.

It is an additional huge problem with the New Yorker article; it is hard to imagine the actual intent being captured in another language -- that's like trying to capture puns!

# Alexandru Pojoga on 20 Dec 2004 3:13 AM:

Machine translation requires *nothing* short of artificial intelligence, otherwise it will always be relegated to niches, where all meanings are clear, and glossaries are complete and unambiguous.

Generalizing it to "everything" (including things like your New Yorker article) would mean an infinite glossary.

So it's like designing a chess program -- it's unrealistic to predict *all* games, so you'd better come up with AI.


