Catching a Unicorn with GLTR: A tool to detect automatically generated text

snrji · on March 9, 2019

I wonder if these detections would become simply impossible once NLP researchers figure out how to integrate adversarial training (especially GANs) into their models.

gwern · on March 9, 2019

What OP is detecting is not a flaw in GPT-2 but in how GPT-2 is used to generate samples: it chooses only from the top k most likely words, so you can easily detect GPT-2 simply by running it and noting that no sampled word appears from below, say, the top-40 candidates, while human text will occasionally have a top-41 or a top-99 or a top-1000 word.
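A minimal sketch of that rank check, assuming you already have the model's next-token probabilities at each position. The helper names (`token_rank`, `looks_top_k_sampled`) and the example rank lists are hypothetical, not taken from GLTR or GPT-2, and passing the check only means the text is consistent with top-k sampling; it does not prove it.

```python
from typing import Sequence


def token_rank(probs: Sequence[float], token_id: int) -> int:
    """Rank of token_id under the model: 1 means most likely."""
    p = probs[token_id]
    # Count how many candidates the model considered strictly more likely.
    return 1 + sum(1 for q in probs if q > p)


def looks_top_k_sampled(ranks: Sequence[int], k: int = 40) -> bool:
    """True if no observed token falls below the model's top-k candidates."""
    return all(r <= k for r in ranks)


# Hypothetical per-token ranks for two short texts.
sampled_ranks = [1, 3, 12, 40, 7]   # never leaves the top 40
human_ranks = [1, 5, 41, 2, 999]    # occasionally picks a rare word

print(looks_top_k_sampled(sampled_ranks))  # True
print(looks_top_k_sampled(human_ranks))    # False
```

The longer the text, the more telling it is when no token ever falls outside the top k.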

If you fix that with a better sampling strategy like beam search or RL finetuning, it's unclear how well OP-like methods would work.

tibbon · on March 9, 2019

I'd love to see a Chrome plugin for this, to look at text on various sites and flag it if it's likely machine-written.

Animats · on March 9, 2019

That's neat, but many of the tests are using text generated with the same GPT-2 data set used for testing the text.

(Also, whatever generates their images of text blocks has what looks like interpreting Windows-1252 as Unicode.)

deathanatos · on March 9, 2019

> has what looks like interpreting Windows-1252 as Unicode

The other way around. The underlying data is UTF-8; it's being misread as either latin1 or Windows-1252. E.g., "Pérez",

  Original string: "Pérez"
  Encoded as UTF-8: b'P\xc3\xa9rez'
  Those bytes, erroneously decoded as latin1: "PÃ©rez"

https://en.wikipedia.org/wiki/Mojibake

These two (latin1 and Win-1252) are the most common mis-decodings. UTF-8 data is common. HTML/HTTP/the "text/*" mimetype default to latin1. Windows, for North American users, defaults to Win-1252 in places. (Both encodings are extremely similar.)
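The "Pérez" example above can be reproduced with Python's standard codecs. This is only a sketch of the general mis-decoding; I don't know what the tool that rendered those images actually does. The variable names are made up for illustration.

```python
original = "Pérez"

# Encode correctly as UTF-8: "é" becomes the two bytes 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)  # b'P\xc3\xa9rez'

# Mis-decode those bytes as latin1: each byte becomes its own character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # PÃ©rez

# Windows-1252 maps these particular bytes to the same characters.
garbled_cp1252 = utf8_bytes.decode("cp1252")

# The damage is reversible as long as no bytes were lost along the way.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # Pérez
```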

It used to be that if you typed "QuÃ©bec" into Google, it'd give you Québec, and wouldn't even issue a "did you mean?", just silently corrected it. Seems like that's not the case anymore. Back when it was, I wondered if the "machine" thought that QuÃ©bec was an alternative spelling. (The results for "QuÃ©bec" are still decently relevant.)

userbinator · on March 9, 2019

It sounds like a good idea for deterring spam, etc., until you realise that such techniques can also be used for very insidious forms of censorship. Imagine writing about some controversial topic/opinion and getting "you've been banned because your comment looks like it was written by a robot."

evgen · on March 9, 2019

If your comment is so banal as to trip this filter then maybe you haven't actually contributed anything to the discussion anyway, so what is the loss? I don't mean this to be as glib as it might sound, but as the old saying goes, 'opinions are like assholes, everyone has one' and if your proposed addition to the discussion is indistinguishable from auto-generated text there is little lost by removing it.

xkcd-sucks · on March 9, 2019

I think the implication is either that

1) The people running the filter are saying that in bad faith, they have actually trained it to catch "Bad Ideas" instead of autogenerated text

2) Someone trains a GAN to mimic your political opponents and spams sites with its output, so the bad political opinions themselves become heuristics for spam

evgen · on March 9, 2019

In the referenced paper the trigger for the filter is lack of novelty, not any specific content. In neither of the cases you present would you be able to demonstrate that a filter like this could be operated in a way that discriminates on content rather than on novelty, other than by shifting the corpus in a manner that would be obvious to everyone.

userbinator · on March 9, 2019

Just like with spam filters, the problem isn't when they work; it's the false positives that are really the concern, and what I've seen with all the machine-learning stuff these days is that they tend to fail in mysterious ways.