How Machine Translation can help you most

10/26 Track1 15:00-16:30

Mike Dillinger

LinkedIn Manager, Taxonomies and Human Judgements
Mike Dillinger, PhD, is Manager of taxonomies and human judgements at LinkedIn. Before that, he was manager and computational linguist at eBay, an independent consultant for Fortune 500 companies, and Director of Linguistics at both Spoken Translation and Global Words. He wrote the widely circulated LISA Best Practices Guide: Implementing Machine Translation, published a wide range of articles about linguistics, semantics, and machine translation, contributed to the emerging standards OLIF and UNL, and was awarded two patents for translation technology. Dr. Dillinger has taught at more than a dozen universities in several countries and has been a visiting researcher on four continents.

Reported by: Richard Sadowsky (freelance translator)


I had not expected a talk on MT to pleasantly surprise me, but that is just what Mike Dillinger accomplished with his clear-headed view on machine translation. Professional translators like myself who have been working in the trenches of J-E translation for the last 30 years have seen how wrong and nonsensical MT output can be at times, so we are skeptical that MT will replace us anytime soon. We have also grown accustomed to presentations by researchers or vendors that overhype the capabilities of MT. Dillinger was up front about the limitations of MT, winning me over right away and focusing my attention on his win-win message of “use it and make it better.” Here is the gist of his presentation.

What is There to Lose?

While there does exist a fear that human translators will be replaced, MT has not, in fact, eliminated jobs. The use of MT at language service providers (LSPs) has exploded since 2010, especially for European languages. Translation industry revenues more than doubled from ~$19B in 2005 to ~$40B in 2016 (Common Sense Advisory data). Dillinger shared a 2012 quote from Arle Lommel, “Machine translation will displace only those humans who translate like machines.”

MT and Translators

Today’s choices are raw MT or raw MT plus partial or full post-editing, but post-editing is not a sustainable model. Translators hate it because they have to fix the same errors many times, so it is hard to convince translators to become post-editors. Translators also think that they are paid less for their work. In fact, if the MT is good enough, meaning that it gets it 60% right, MT can speed up the translation process, and translators or post-editors can earn a higher hourly wage. With an MT system, the translator’s job is very similar to working with a translation memory (TM): a good match can be accepted or tweaked, while a bad match needs either post-editing or translating from scratch, as does a non-match. Remember that the overall goal of the system is to reduce the work of the translator.
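
The accept / post-edit / translate-from-scratch decision described above can be sketched roughly as follows. This is a minimal illustration, not something shown in the talk: the thresholds and the names `best_match` and `triage` are hypothetical, and plain string similarity from Python's standard `difflib` stands in for whatever match score or quality estimate a real TM or MT tool would provide.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; real tools let you tune these per project.
ACCEPT_THRESHOLD = 0.95
EDIT_THRESHOLD = 0.70


def best_match(segment, memory):
    """Return (score, target) for the stored source segment closest to `segment`."""
    best_score, best_target = 0.0, None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def triage(segment, memory):
    """Decide how much human work a segment is likely to need."""
    score, target = best_match(segment, memory)
    if score >= ACCEPT_THRESHOLD:
        return "accept", target          # good match: accept or tweak
    if score >= EDIT_THRESHOLD:
        return "post-edit", target       # partial match: fix the suggestion
    return "translate from scratch", None  # bad match or no match


# Toy memory with a single English-to-Japanese pair.
memory = {"Press the power button.": "電源ボタンを押します。"}
print(triage("Press the power button.", memory)[0])
print(triage("Press the red power button.", memory)[0])
print(triage("12345", memory)[0])
```

The more segments that land in the first two buckets, the more the system reduces the translator's work.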

MT and Clients

On the client side, many clients do not understand why and how translation providers use MT. Some clients think they are being cheated. Others think they will get poor quality if MT is used; yet others think mistakenly that their source texts will be visible on the Internet. None of these is the case. The lesson is to not talk about MT to clients. How you manage your tools can be considered a trade secret. It is sufficient to address the clients’ primary interests: price, delivery time, and quality. Talking about MT with most clients creates problems!

Making MT Useful

General-purpose MT from big providers will not be good enough for specific projects, so it is essential that translation providers customize the MT system for each project or client. You cannot wait for generic third-party MT providers like Google or Microsoft Bing. The translation provider should establish its own in-house machine translation, as many large vendors already do.

Key point: The output of machine translation will be bad if you have no control, no ownership. It is best to build your own. The tools are free. You only need the people with know-how. The large LSPs in Europe and America hire two or three people to build a specific engine for specific projects, and they don’t charge the client. They save money by going faster. They may even save 50% of their cost and thus be able to offer a 30% discount. So, ownership of the tools and process is critical!
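
To see how a 50% cost saving could support a 30% discount, here is a small worked sketch. The figures are hypothetical and were not given in the talk: a job priced at 100 with an internal cost of 60. The helper name `price_after_mt` is also just illustrative.

```python
def price_after_mt(price, cost, cost_saving_pct, discount_pct):
    """Return (new_price, new_cost, new_margin) after MT savings and a client discount."""
    new_cost = cost * (100 - cost_saving_pct) / 100
    new_price = price * (100 - discount_pct) / 100
    return new_price, new_cost, new_price - new_cost


# Hypothetical job: price 100, cost 60, so the original margin is 40.
new_price, new_cost, new_margin = price_after_mt(100, 60, cost_saving_pct=50, discount_pct=30)
print(new_price, new_cost, new_margin)
```

Under these assumed figures, halving the cost to 30 and cutting the price to 70 leaves the provider's margin at 40, the same as before, while the client pays 30% less.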

MT Best Practices

You also need to know MT best practices. Improve your translation workflow (especially pre-processing) by identifying its weaknesses. This can be more effective than improving the MT itself, which accounts for only one part of the translation workflow. As everyone knows, garbage in (the source) results in garbage out (from MT). MT cannot fix everything!

When to Use MT?

Use MT when there’s no other choice (e.g., extremely large projects), and only when you can customize the MT engine for your project. Start with clients who have very large TMs. Be careful of source text variability. Rather than spending time choosing a system that you don’t own and have no control over, adopt an MT system that you can use and customize. Partial ownership means partial control. Build your own MT to have full control. It doesn’t have to be good at the beginning. It will improve. So, own the MT systems that you use.

Neural MT

Neural MT (NMT) is hypnotizing, but it is not magic and cannot work miracles. Progress is being made on NMT, but problems still remain. The output is more readable, but it still translates on the basis of the training corpus, not the real-world content that you need it for. The key strength of neural over statistical MT is that it looks at the entire sentence instead of just two or three words before and after the word it is translating. NMT therefore captures and leverages much more context. You still need to control the source and clean up the TMs used to train the system.

The Future: Adaptive MT

With MT systems today, the humans clean up the mess made by the machines, and we call that “post-editing”. The future will be AI systems under human control—systems that augment human expertise with machine intelligence. Today’s adaptive MT does not yet learn from rules, guidelines, or example documents. But once you have an MT system that instantly adjusts as you translate, you can leverage the capabilities of the machine (scale and speed) to augment the strengths of the human mind (handling nuance and variability). Adaptive MT is not yet available for everyone—Lilt offers it, and there are similar products (one to be available in Trados 2019)—but what I call “hybrid intelligence” translation appears to be the best way forward to create a better future for translators. Researchers need your help to build these systems for the future!

Q & A

Q:Isn’t it contradictory to say “Don’t tell the customer,” given that you need the client’s material to build the corpus? Our client doesn’t let us keep their texts for reasons of confidentiality. What can we do?
A:You need to negotiate. Tell them that you want to use their texts to analyze and improve your tools and processes (without mention of MT), so you can serve them better.

Q:About customization, what can small LSPs do if they don’t have technological resources?
A:Most big providers gather as much data as possible to build a baseline model (on all topics), plus a smaller model for each project. This is difficult and expensive for small companies, because they can’t get access to a large corpus. What you can do is access the baseline models of Microsoft and Google’s MT systems and customize them with your own data. That is the most practical way to start.

Q:Isn’t pre-editing hard because the source changes all the time?
A:It’s true that rewriting doesn’t always seem to improve MT output. In most cases, pre-editing is very useful, but sometimes it does not pay off when there is only a single target language. What you can do is make terminology more consistent, as well as numbers, dates, abbreviations, etc.
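
As a rough illustration of this kind of consistency pass, the sketch below normalizes term variants, dates, and abbreviations before text is sent to MT. It is only a sketch under stated assumptions: the glossary entries, the US-style MM/DD/YYYY input format, and the function names are all hypothetical, and a real project would use its own term base and style guide.

```python
import re

# Hypothetical house glossary: variant -> preferred term (case-sensitive).
TERM_VARIANTS = {
    "e-mail": "email",
    "web site": "website",
}

# Hypothetical abbreviation expansions.
ABBREVIATIONS = {
    "approx.": "approximately",
    "max.": "maximum",
}

# Assumes dates are written MM/DD/YYYY in the source.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def normalize_dates(text):
    """Rewrite MM/DD/YYYY dates as YYYY-MM-DD."""
    return DATE_PATTERN.sub(
        lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}",
        text,
    )


def replace_all(text, mapping):
    """Replace each variant that stands alone (not inside a longer word)."""
    for variant, preferred in mapping.items():
        pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
        text = re.sub(pattern, preferred, text)
    return text


def pre_edit(text):
    """Apply all consistency fixes to one source segment."""
    text = normalize_dates(text)
    text = replace_all(text, TERM_VARIANTS)
    text = replace_all(text, ABBREVIATIONS)
    return text


print(pre_edit("Send an e-mail by 3/5/2019, approx. 2 days before launch."))
# prints: Send an email by 2019-03-05, approximately 2 days before launch.
```

Fixes like these are independent of the target language, so they can be applied once to the source and reused across every language pair.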

Q:Which MT systems can be used by individual translators? Are they customizable?
A:Google Translator Toolkit and Microsoft Translator Hub are available and customizable. There is also the open source CAT tool MateCat, and there are paid products like Amazon Translate, Systran, and Kantan. The translator community needs to put pressure on companies and researchers by telling them what translators need.

Q:Isn’t it dangerous to use Google Translate because it can save your text and compromise confidentiality?
A:I don’t think there are any significant risks in practice, for many reasons. The short answer is that the databases where source documents end up are totally separate from the databases that contain web pages for display, so there’s no way for anyone to see the documents you submit for translation. Although in principle MT websites could store all the documents that they receive, they already have far too much data, and it is unlikely they would want to save more because of the great expense.