Must see: The successes and challenges of making low-data languages available in online automatic translation portals and software

by Michael S. Kaplan, published on 2010/08/16 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/08/16/10050409.aspx


Reader Stephen Nuchia mentioned over in the Suggestion Box yesterday:

I know this is short notice, but my friend Jeff Allen will be speaking at Microsoft Research on his work supporting "low data languages".  Sounds boring, unless you were one of the people trying to plan missions or direct rescuers to victims in Haiti this past January.  Tuesday the 17th, in building 99.

http://www.allenkeys2languages.org/2010/08/public-talk-on-low-data-languages-in-translation-speech-systemssoftware/

I thought this might be of interest to you and your readers.

Indeed, this is very interesting! :-)

I will shamelessly copy from the page linked above so that, if you are in Redmond and find languages interesting, you can see why you might be as interested as I am....

Presentation title: The successes and challenges of making low-data languages available in online automatic translation portals and software.

Date: Tues, August 17, 2010
Location: Microsoft, Redmond, Washington, USA
Room: Building 99, Room 1919
Time: 2:00PM-3:30PM
Speaker: Jeff Allen

Directions to building 99
http://research.microsoft.com/en-us/labs/redmond/visit.aspx

This is an open invitation public talk

Summary:
The majority of development work and deployment of machine translation (MT) technologies over the past several decades have been for international languages. Only a few projects for low-data/low-density/low resource/sparse-data/less-prevalent/lesser-commonly taught/minority languages have led to successful prototypes and products. There are a certain number of technical, logistical, social, educational and other factors which influence and impact the potential success of implementing systems for such languages. This talk will cover many of the lessons learned from previous projects, and some of the pitfalls to avoid. It will also demonstrate how the recent efforts for making Haitian Creole available for Haiti Disaster Relief had a certain level of success in record time because of the ability to build upon previous work. Yet, there were also obstacles which have been problematic and remain a concern for this language and for other less-prevalent languages.
Lastly, the discussion will mention some ways to enable proactive, forward thinking projects, using some bootstrapping methods, to reduce the risk of situations which can result from working in a primarily reactive mode.

The talk will also include speech recognition and speech synthesis technologies.

The area is a fascinating one, and one I have been involved with for years in a lesser way, through core underlying support like input and display. But this talk is the next logical step for emergent situations, and it also takes a wider view: not waiting until the sky is falling (my involvement in the urgent situations was much more reactive).

Now my problem is that I have a meeting from 3pm-4pm that I can't cancel and that I am running in a building on nearly the opposite end of the main campus, a good 5-10 minutes by shuttle (if the shuttle is on time). So I am going to hope it is being recorded or I'm gonna miss at least half of it. :-(

