Saturday, 15 July 2017

MIT Technology Review/George Anders: Language translation. Now Microsoft's making strides with Skype


MIT Technology Review

    Language translation. Now Microsoft’s making strides with Skype.

    by George Anders March 28, 2016

Katrina Rippel is a careful speaker who follows all the rules. Hao Chen is a more freewheeling conversationalist. And I’m a nonstop troublemaker, constantly blurting out whatever notions pass through my head. On a recent morning, the three of us met in cyberspace to find out how well (or poorly) we could communicate in a mixture of German, Mandarin, and English. Each of us spoke only our native language.

Donning headphones, we tapped into Skype Translator, a creation of Microsoft’s research team. (My chatting partners were part-time consultants to Microsoft, who happened to be thousands of miles away from my West Coast U.S. base.) When I asked Chen where he grew up, it didn’t faze me to hear him say: 我在中国的家乡在东北,辽宁省,鞍山市. A few seconds later, a friendly synthetic voice told me: “My hometown is in the northeast in China, Liaoning Province, Anshan.”

If only the rest of our exchanges had worked so smoothly. When Chen tried to explain his U.S. travels, Skype mishandled an ambiguous Mandarin noun, telling me he had journeyed to “the cadre of New York.” Only when Chen tried different phrasing was Skype willing to make it “the state of New York.” When I asked Rippel about her German hometown, Skype’s software, expecting me to speak English, not German, heard me say “dressed” instead of “Dresden.” So it created a gibberish sentence, in which the German word bekleidet appeared instead of her city name.

As such stumbles show, machine-based translation of everyday speech isn’t there yet, despite 30 years of trying. It’s our own fault, really. If we spoke with the clarity and precision of United Nations diplomats, then artificial-intelligence tools could decode everything according to well-established patterns. The more we rely on uncharted words or syntax to get our thoughts across, the harder it is for translation software to get everything right without extra help.

Even so, Microsoft, Google, Baidu, Facebook, IBM, and many others are vying for supremacy in this difficult field. Offering top-grade voice recognition and translation can become an attractive calling card that helps lock in customers for many other services. These range from Internet search to cloud computing, in which data storage and processing are provided via remote servers and an Internet connection.

Worldwide cloud and infrastructure spending topped $115 billion last year and is growing at a 28 percent annual rate, according to Synergy Research. Real-time translation can help competitors’ overall suites of cloud services stand out in what otherwise becomes a price-driven commodity business. For now, most translation services are available free of charge, but paid alternatives may emerge as global enterprises seek customized translation tools that work even better.

Microsoft, in particular, is exploring ways that corporate users can build greater capabilities atop the basic Skype Translator engine that Rippel, Chen, and I tested. One area of interest: helping customers load in thousands of specialized terms, reference documents, sample conversations, and quirky locutions ahead of time. That way, “Dresden”-type problems should be much less likely to occur.

Translation software works much better if it can tap into a hefty database of language patterns that particular speakers are likely to use, explains Microsoft research strategy director Vikram Dendi. Management consultants may use terms like “delta” and “granularity” in contexts that seem unthinkable to the rest of us. Industrial chemists may banter about more than a dozen different kinds of phthalates. And inside any big company, nicknames for projects, processes, and top executives are endless.

Offering top-grade voice recognition and translation can help lock in customers for many other services, ranging from Internet search to cloud computing.

Since 2011, Microsoft has allowed big customers to load their own glossaries or written materials into specialized text-translation databases. That’s meant to produce more reliable results than Microsoft’s basic Bing service provides, especially on dense technical material. More than 100,000 users have opted for customization, Dendi says. Light users may pay as little as $40 a month; heavy users such as Adobe and Twitter may pay far more.

Microsoft has tried a wide range of strategies to crack translation since the mid-1990s, when company founder Bill Gates predicted that speech recognition would be widely available within 10 years. Early approaches relied heavily on attempts to catalogue specific rules of grammar and usage. Starting in 2009, Microsoft broadened its emphasis. Statistical techniques have been paired with neural networks, a machine-learning system based on the structure and self-teaching nature of the human brain.

Currently, Microsoft uses five layers of neural nets to analyze speech, according to Peter Lee, head of the company’s research division. The lowest layers analyze sounds on a level that’s as rudimentary as the way image-analyzing software looks for edges and surfaces, without making any attempt to figure out what objects might be. As with many advanced machine-intelligence approaches, there’s some mystery to how it works, even to the researchers involved. “It has nothing to do with words or phonemes,” Lee says, referring to the sounds that distinguish one word from another. “I don’t think any of us understand exactly what the bottom layer is looking at. But it works surprisingly well.”

Microsoft’s researchers have also been making greater use of what’s known as “long short-term memory.” When recognizing speech or translating, neural nets make a series of guesses that keep getting revised as new information comes in. Occasionally, an expected pattern suddenly falters. In such cases, neural nets can do a better job of regrouping if they can revisit the assumptions that led to several words’ worth of guesses. Keeping a longer trail in the system’s short-term memory makes such retracing and subsequent corrections possible.

It takes at least 4,000 hours of spoken samples—and millions of words of text—to train Skype Translator’s neural nets in each new language. Arul Menezes, head of Microsoft’s machine-translation team, says he had expected difficulties in languages such as Arabic, where speakers’ accents can vary widely. But by collecting samples of enough different speakers’ voices, Menezes says, it’s been possible to develop Skype Translator’s “ear” for different intonations to the point that regional accents aren’t a problem. The same applies to the differences between male and female voices.

Other variations in everyday speech turn out to be trickier. The neural nets are exquisitely sensitive to differences in microphones. (Humans may be good at detecting the difference between static and speech, but that’s much harder for machines to master.) Pauses in speech are problematic, too. As Menezes notes, “People generally don’t pause at the end of a sentence. They pause elsewhere. Pauses end up being useless in detecting when a sentence begins or ends. You have to go by the words themselves.”

Sorting out the right translations for ambiguous words is a never-ending challenge, too, Menezes acknowledges. While speaking German, Rippel frequently uses the word Sie, which can mean she, you, or they, depending on the situation. Skype Translator gets it right about 80 percent of the time.

Similarly, Skype Translator stumbles slightly when Chen discusses family sizes in China. Regardless of government policies, Chen tells me, the sheer cost of parenting in China means that “a lot of people only want to give birth to a child.”

A few minutes later, sitting in Building 99 of Microsoft’s headquarters, Menezes and I review a transcript of the conversation. Menezes ruefully points to the child-raising exchange. “That should be ‘one child,’” he says. “But in Chinese, there’s no distinction between ‘one’ and ‘a.’ There’s a difference in English, but it has to be picked up entirely in context.”

“I don’t think professional translators are quaking in their boots yet at what we’re doing,” he adds, with a thin smile. “Their jobs are safe for quite a while.”

Rippel, a professional translator, isn’t nearly so critical. As long as users speak slowly and keep sentences short, she says, automated services such as Skype Translator can do a useful job of overcoming language barriers.

“It’s very important that this tool exists,” she says. “In current times, it’s more important than ever for people in all communities to be able to talk to one another.”

Hear more about artificial intelligence at EmTech MIT 2017.
Register now

Tagged

Skype, Microsoft, artificial intelligence, AI, AI Takes Off, Business Report, Business Impact, neural networks, machine translation
Credit

Illustration by James Graham; Data source: Microsoft
George Anders

George Anders Guest Contributor

George Anders is a writer based in Northern California. He shared in the 1997 Pulitzer Prize given to the Wall Street Journal for national reporting.
More from Business Impact
AI Takes Off

How technology advances are changing the economy and providing new opportunities in many industries.

    AI Hits the Mainstream

    More industries are looking for ways to use artificial intelligence. What will that mean for the technology’s future?

    by Nanette Byrnes
    Machine Learning for Everyone

    Recent advances are making machine learning useful outside the tech industry, says the leader of the Google Brain research group.

    by Tom Simonite
    How AI Is Feeding China’s Internet Dragon

    China’s biggest Internet company, Baidu, is pushing an ambitious effort to add artificial intelligence to its products.

    by Will Knight
    Man and Machine

    Despite many advances, AI still works best when paired with humans.

    by Robert D. Hof
    Siri for Business

    While Apple’s Siri and Microsoft’s Cortana get the attention, AI-powered assistants and software geared toward businesspeople are increasingly popular too.

    by Elizabeth Woyke

More from Business Impact
From Our Advertisers

    In partnership with Hewlett Packard Enterprise
    A Field Guide to Digital Transformation

    In partnership with Hewlett Packard Enterprise
    A Fireside Chat: Unlocking the Power of Hybrid, Flexible IT

    In partnership with Samsung
    Fight Scale with Scale

Want more award-winning journalism? Subscribe to Insider Plus.

    Insider Plus $79.95/year*

    Everything included in Insider Basic, plus ad-free web experience, select discounts to partner offerings and MIT Technology Review events
    Subscribe

       
    See details+

* Prices are for U.S. residents only

See international prices
The Download What's important in technology and innovation, delivered to you every day.

Follow us
Twitter Facebook RSS
MIT Technology Review

The mission of MIT Technology Review is to equip its audiences with the intelligence to understand a world shaped by technology.

Browse
International
Editions

    Company
    Your Account
    Customer Support
    More
    Policies

MIT Technology Review © 2017 v.|eiπ|
Post a Comment