
The quality of a translation makes a dramatic difference in the reading experience. I have 4 or more copies of The Art of War, each by a different translator. Dante’s Inferno, the Iliad, Dostoyevsky, Beowulf, Confucius: some translations are unreadable and one or two are incredible. Automation is just not going to produce the really good translations. The only benefit of a technique like this will be to generate unlicensed translations of work that would not otherwise get translated, which isn’t even that big a deal because the communities already produce unlicensed translations of just about everything.

Edit: The other use is to help create better context aware data scrapers that can combine bi-modal information streams and add some body language understanding. I guess it will probably end up in automated surveillance tech like security cameras/mics etc if it works.



I can think of quite a few manga/anime/videogames where a machine translation would've been acceptable because I just needed a bare minimum understanding of what was going on. Think of any number of 8-bit games where the game is 95% hack-and-slash, but a random villager tells you before the last boss fight, "You must make sure to wear the enchanted amulet in order to pierce the dragon's scales!"

It's one throwaway line, but if the line is in Japanese, an American playing an imported ROM might spend an hour of frustration wondering why his sword does zero damage to the final boss.


The NES had plenty of real examples where poor translations made the game harder or made secrets obscure and hard to find without a guide. Castlevania 2 and Adventure of Link come to mind.

Actually, I'd say most text-heavy games translated from Japanese on the NES suffered from this problem, which made those games way more confusing than they should have been.


Sure...but I don't feel like we're comparing apples to apples here, since I explicitly mentioned text-light games and you brought up the most text-heavy, RPG-like entries in the respective CV and Zelda series (of that generation).

To boot, Simon's Quest also had the atrociously bad idea that NPCs in the game can lie to you and give intentionally incorrect information, making the translation effort that much more confusing.


If a low-quality machine translation is sufficient to make those games playable because they're text-light, I question whether that's actually any better than having a human who knows the language spend an hour swapping the text out instead. You'd be surprised how often machine TL will muck up things like menu options or item names. The appeal of automated TL makes way more sense if it means saving hundreds of hours of localization work (not that it does...)


Mostly because it would take more than an hour and requires an interested translator. I sure can't speak Japanese (yet), but I have enough technical chops to spend a few hours getting something workable out with this alone.

Then from there maybe I can trigger Cunningham's law and get the attention of someone who knows what they are doing. Sounds like a win-win for me.


"You must wear a desirable necklace to surpass the balances of the great lizard"


A bad translation can improve bad writing. By using unfamiliar phrasing and word choice it decontextualizes, allowing the player to imagine meanings and nuance where it isn't there.


I get what you're getting at, but I think it fundamentally misunderstands something.

If the writing is bad, there's nothing that can really "improve" it other than the original author cleaning it up with the help of an editor. A bad translation is effectively a new work, at best inspired by the original bad script. You could replace a bad Japanese script with a "good" English one - this has happened before - but at that point it's questionable whether any translation has happened at all, you're mostly writing new content inspired by the original work or adhering to broad constraints. What I'd say you're doing here is improving the experience of playing the game, but you haven't done anything meaningful to the writing or script.

In a few cases Western companies have licensed Japanese works and spliced them together with entirely new plots for overseas audiences - Robotech is one infamous example where arguably there was nothing wrong with the source material and the result wasn't just a liberal translation.


> If the writing is bad, there's nothing that can really "improve" it other than the original author cleaning it up with the help of an editor. A bad translation is effectively a new work, at best inspired by the original bad script. You could replace a bad Japanese script with a "good" English one - this has happened before - but at that point it's questionable whether any translation has happened at all, you're mostly writing new content inspired by the original work or adhering to broad constraints.

What distinction are you drawing between working with an editor vs working with a translator? Often it's a very similar process, and there are cases where something is cleaned up in a translation and then that gets incorporated back into the next edition in the original language.


Typically the translator is not working with the author and they're not involved particularly early in the writing/publication process. They often come to the work months or years later. There are certainly exceptions, though.


One very amusing example of that being http://winterson.com/2005/06/episode-iii-backstroke-of-west.... (there is a fandub too, watch it if you have the time...)


This could be good for games, but I think any manga/prose that's simplistic enough to be boiled down to purely functional phrases like "wear the amulet to fight the dragon" would probably not be worth a read...


“The shorter, simpler, or more obvious the sentence, the easier it must be” isn’t actually the case with translations.

UI strings being short usually means heavy hidden context lies in the visual elements, which just makes the mistakes more hilarious, like “Name: SQL Server, Province/Prefecture: Running” (because, you know, the equivalents of provinces are called “State” in American English...).

“Province” is more or less harmless, but “(has/is/is in/to/like to) Start(ed/ing)”-type errors due to missing context can make a UI unusable. Oh, and they’re un-spottable by non-speakers because they make sense when translated back to the original language.
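
To make the “State” mix-up concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `CATALOG` dictionary, the context labels, and both function names. It only assumes a catalog keyed by (context, source string), similar in spirit to gettext's message contexts.

```python
# Hypothetical catalog keyed by (context, source string).
# The context labels are made up for this illustration.
CATALOG = {
    ("address form", "State"): "都道府県",   # administrative region (prefecture)
    ("service status", "State"): "状態",      # current condition of a service
}


def translate(msgid, context=None):
    """Look up msgid under its context; fall back to the source string."""
    return CATALOG.get((context, msgid), msgid)


def translate_without_context(msgid):
    """Mimic a context-free lookup: take the first entry whose source text matches."""
    for (_, source), target in CATALOG.items():
        if source == msgid:
            return target
    return msgid


if __name__ == "__main__":
    # With context, the status column gets the "condition" sense.
    print(translate("State", context="service status"))
    # Without context, the same column gets the "prefecture" sense,
    # and translating it back to English still yields "State".
    print(translate_without_context("State"))
```

Both results map back to the same English word, which is why a non-speaker checking by round trip would not notice the wrong one.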


People read entire machine translated novels. It's not too uncommon with Chinese xianxia/xuanhuan novels. We're talking thousands of pages. It's not as incomprehensible as you might think. I didn't find it enjoyable though. I can certainly see a place for automatically translated manga. It's not a sellable product though.


> I didn't find it enjoyable though

Sure, yeah - I think we're on the same page. My post was a reference to enjoyability more than functionality.

I have definitely read novels that were long, yet rote and simplistic even in their native language. =)

But they were not works worth reading in any language IMO. They could probably be satisfactorily machine translated (with some human editing) but the result would not be enjoyable except for ultra diehards of the genre who are simply happy to be reading a work from that particular genre, quality of prose be damned.

Those enjoyable xianxia/xuanhuan you mention were either rote and boring in the first place, or they were wonderfully written and had the life crushed out of them by a machine translation that dispensed with all nuance.


The whole world, except English-speaking countries, has/had exactly this experience.


I think the sweet spot is machine-assisted translation.

If you can get the output to be 90-95% correct, you can then display the raw and the machine output side-by-side, and have a human make corrections inline. Instead of a team of four working around the clock for a day or two, maybe you could have a translation as fast as it takes to proofread it three or four times end-to-end.

Rev is the same idea in the speech transcription space -- they have humans listen to the audio and fix up a machine-generated transcription.
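
A rough sketch of that side-by-side workflow in Python, to show how little machinery the idea needs. All names here (`Segment`, `side_by_side`, `correction_rate`) are hypothetical and not taken from any real tool, and the raw text is a placeholder rather than real source material.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One line of text: the raw source, the machine output, an optional human fix."""
    source: str
    machine: str
    correction: Optional[str] = None

    @property
    def final(self) -> str:
        # Prefer the human correction when one exists.
        return self.correction if self.correction is not None else self.machine


def side_by_side(segments: List[Segment], width: int = 20) -> str:
    """Render the raw text next to the current translation, one numbered row per segment."""
    rows = [
        f"{i:>3} | {seg.source:<{width}} | {seg.final}"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(rows)


def correction_rate(segments: List[Segment]) -> float:
    """Fraction of segments where the human actually changed the machine output."""
    if not segments:
        return 0.0
    changed = sum(
        1 for s in segments
        if s.correction is not None and s.correction != s.machine
    )
    return changed / len(segments)


if __name__ == "__main__":
    page = [
        Segment("<raw line 1>", "You must wear a desirable necklace."),
        Segment("<raw line 2>", "Surpass the balances of the great lizard."),
    ]
    # The proofreader fixes the second line inline.
    page[1].correction = "Pierce the dragon's scales."
    print(side_by_side(page))
    print(f"corrected: {correction_rate(page):.0%}")
```

Tracking the correction rate is one way to check whether the machine output is really as close to done as hoped, or whether most lines end up rewritten anyway.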


Machine-assisted translation is deceptively bad, especially for fiction like manga. The machine TL might mix up the order of a sentence like 'bob verbed alice', fundamentally changing the meaning, alter context, or omit implied information. All of these errors will produce English sentences that appear sensible and valid and will get past most editors and quality checkers unless they're familiar with the source language or very familiar with the work. This sort of error even creeps into official human translations of anime/manga in some cases when the translators are working quickly without a good editor (i.e. simulcasts where they have tight deadlines and low budgets).

In practice, if you look at the fan translation community for manga, machine-assisted translation is not given much more respect than machine translation. They both produce bad results, and in many cases the people who normally welcome even a clumsy translation will reject machine TL and attempt to have it removed, because it often causes people to fundamentally misunderstand the work. The worst cases of machine-assisted TL in manga become infamous to the point of becoming shared memes - try googling "abaj" or "duwang" sometime.

For a very simple, pervasive example: Japanese frequently uses gender-neutral pronouns, and when translating to English you'll need to select the right gendered pronoun (or proper name) for each one, if you can. This is something a human can do pretty accurately if they have enough context and knowledge of the material, but it is nearly impossible for a computer to do it accurately without a ton of assistance. In a novel this would be an easier problem because all the necessary context is in the text instead of in the art and panel layouts. You'll note this arXiv paper intentionally cheats on the gender problem.


I've seen some people run visual novels through automated translations in aggregate like this: https://static.wikia.nocookie.net/muvluv/images/8/89/Be_a_go...

You can definitely get some of the gist in there, but some of the automated translations are just way off. And pretty much none of them result in good prose. In addition, none of the translations get the names fully correct, so you definitely need someone to go and fix it.


I think you underestimate the amount of work that's required when "proofreading": you need another translator who could have done the work themselves. And it's not like you can just exchange a few words here and there and consider it done. If a human has to make corrections they'll likely rewrite the entire sentence, and then the advantage of machine-assisted translation is pretty much gone. Overall you'll save time, but the amount of time saved is probably less than what we think.


I think I’ve come across a PSA about this scheme. The story was that unwary translators offer a discount on proofreading, so some clients send in a dummy machine translation to qualify the job as a “proofreading” task and get a full translation at proofreading prices.


You can’t get output to be “90-95% correct” from machine translation.

People think of language translation as some sort of same-dimension transformation, but it’s more like a re-projection that involves rotation in a higher dimension. Simple warping goes only so far, neural networks give some uncanny slurries, human artists add a lot of their own brush strokes, and it’s a lossy process both ways.

Speech transcription is much more straightforward because there is supposed to be a single corresponding literal expression for each segment of speech.


I am not sure if you have looked at the paper, but they only talk about context-aware translation.


I've read a few light novels (no text bubbles, little to no pictures) that were machine translated, and they've all been horrible to read through - you could barely make sense of the sentences, and the whole reading experience became frustrating rather than pleasurable.


I think the potential of this tool, like many industry tools, is to streamline the translation process, not completely automate it. I can see this speeding up the process and lightening the load on stuff like typesetters, even if the final translation isn't perfect.


> which isn’t even that big a deal because the communities already produce unlicensed translations of just about everything

I think those community translators will be very happy to have some of their work automated.



