
deaux 36 days ago | on: Our eighth generation TPUs: two chips for the agen...

There are no real benchmarks of how natural or idiomatic model output is across a multitude of languages. "Multilingual benchmarks" are usually something like "How good is it at a multiple-choice exam like the SAT in language X?" That measures something else entirely.

then there should be such a benchmark :)