A lossless compression contest to encourage research in AI. It's lossless, I think just to standardize scoring, but I always thought a lossy version would be better for AI -- our memories are definitely lossy!
> A lossless compression contest to encourage research in AI. It's lossless, I think just to standardize scoring, but I always thought a lossy version would be better for AI -- our memories are definitely lossy!
Gwern posts about this when people say something like that on here, but I'll do it instead. Lossless encoding is just lossy encoding + error correction of some sort.
Hah. "error correction of some sort" is doing a lot of heavy lifting there. Bit-level correction is what I generally consider under the "error correction" umbrella, and we have no problem with that -- I'm curious what bit error rate would make text unreadable. In the context of compressing a large part of the English-language wiki, I think "lossy" also includes loss so significant that you couldn't reproduce the exact text. That's well beyond what we would generally consider "error correcting", but a human might still intuitively accept the result as equivalent. There's no way to quantify that objectively, hence lossless-only for the competition.
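On the bit-error-rate question, a quick toy experiment (all names here are made up for illustration): flip each bit of the ASCII bytes independently with some probability and see where the text stops being readable.

```python
import random

def flip_bits(text, error_rate, seed=0):
    """Flip each bit of the ASCII-encoded text independently
    with probability error_rate (seeded, so reproducible)."""
    rng = random.Random(seed)
    out = bytearray(text.encode("ascii"))
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < error_rate:
                out[i] ^= 1 << bit
    # bytes that are no longer valid ASCII show up as replacement chars
    return out.decode("ascii", errors="replace")

msg = "the quick brown fox jumps over the lazy dog"
for rate in (0.001, 0.01, 0.05):
    print(rate, flip_bits(msg, rate))
```

Somewhere around a few percent per bit, English text is still mostly guessable; well above that it falls apart.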
The way we learn exact sentences is usually by getting an intuitive sense and applying corrections.
For example, to memorize "Doggo woofs at kity", we first get the concept of "dog barks at cat". That compresses well because we intuitively know that dogs bark and cats are common targets. That's our lossy compression, and we could stop there, but it is only part of the story. It is not a "dog" but a "doggo", which fits the familiar tone, so a good compression algorithm will need only a few bits for that. Then there is the typo "kity" vs "kitty"; it takes a bit of extra space, but again, a good algorithm will recognize common typos and compress even that. So the entire path to lossless matters -- lossy is just stopping halfway.
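That "lossy base plus stored corrections" idea can be sketched in a few lines. This is a toy, not how real codecs work (they'd entropy-code the corrections): keep the intuitive guess, and store only the edits needed to recover the exact original.

```python
import difflib

def encode(original: str, lossy_guess: str):
    """Store only the edits that turn the lossy guess into the original."""
    sm = difflib.SequenceMatcher(a=lossy_guess, b=original)
    # keep only the non-matching opcodes as the "correction layer"
    return [(i1, i2, original[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def decode(lossy_guess: str, corrections) -> str:
    out, pos = [], 0
    for i1, i2, repl in corrections:
        out.append(lossy_guess[pos:i1])  # copy what the guess got right
        out.append(repl)                 # apply the stored correction
        pos = i2
    out.append(lossy_guess[pos:])
    return "".join(out)

guess = "dog barks at kitty"
original = "Doggo woofs at kity"
corrections = encode(original, guess)
assert decode(guess, corrections) == original
```

The better the lossy guess, the smaller the correction layer -- which is exactly the parent's point about lossless being lossy plus corrections.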
And if pure random noise remains, there is nothing you can do, but all algorithms are on an equal footing there. The key is to make whatever the algorithm considers incompressible noise as small as possible.
> AI is just compression, and compression is indistinguishable from AI
Almost. Compression and AI both revolve around information processing, but their core objectives diverge. Compression is focused on efficient representation, while AI is built for flexibility and the ability to navigate the unpredictable aspects of real-world data.
Compression learns a representation from the same data it encodes, like "testing on the training set". AI models have different training and test data. There are no surprises in compression.
Let's say AI is a not-so-smart JPEG that has more parts missing, with more guesswork when producing the restoration.
Compression is, most of the time, about finding the minimal grammar that unfolds to the same original material.
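A minimal sketch of that grammar idea, roughly in the spirit of Re-Pair (this is a naive toy, not an efficient implementation): repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal, recording a rule for each, then expand the rules to recover the original.

```python
from collections import Counter

def compress(text):
    """Naive Re-Pair-style grammar construction."""
    seq, rules, next_id = list(text), {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:          # no pair repeats: nothing left to factor out
            break
        nt = ("NT", next_id)   # fresh nonterminal symbol
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):    # replace every occurrence of the pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(seq, rules):
    out = []
    for s in seq:
        if isinstance(s, tuple):               # nonterminal: recurse
            out.extend(expand(list(rules[s]), rules))
        else:
            out.append(s)
    return out

text = "abababab"
seq, rules = compress(text)
assert "".join(expand(seq, rules)) == text     # lossless round trip
```

The compressed form is the start sequence plus the rule set; "unfolding" the grammar reproduces the input exactly.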
Interestingly, Fabrice Bellard found a way to use transformers for lossless compression, and it beats xz by a significant margin: https://bellard.org/nncp/nncp_v2.1.pdf. It uses the "deterministic mode of PyTorch" to make sure both directions behave identically, which I guess means it fixes the random choices during compression so decompression can reproduce them. Note: this paper is still on my to-read list.
A lot of current compression techniques use prediction followed by some set of correction data to fix mis-predictions. If the prediction is more accurate, you can have a smaller correction set.
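The prediction-plus-corrections scheme can be sketched with a toy character predictor (real codecs would arithmetic-code the predictor's probabilities instead of storing a literal correction list, and the model here is trained on the message itself purely for illustration):

```python
from collections import Counter, defaultdict

def train_predictor(text):
    """Map each character to its most frequent successor.
    Both encoder and decoder must share this exact model."""
    succ = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        succ[a][b] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in succ.items()}

def encode(text, predict):
    """Store the first char, the length, and a correction
    for every position where the predictor guesses wrong."""
    corrections = {i: text[i] for i in range(1, len(text))
                   if predict.get(text[i - 1]) != text[i]}
    return text[0], len(text), corrections

def decode(first, length, corrections, predict):
    out = [first]
    for i in range(1, length):
        # trust the prediction unless a correction was stored
        out.append(corrections[i] if i in corrections else predict[out[-1]])
    return "".join(out)

sample = "the cat sat on the mat"
model = train_predictor(sample)
packed = encode(sample, model)
assert decode(*packed, model) == sample
```

A better predictor means fewer corrections, so the correction set (the part you actually have to store) shrinks -- which is why stronger models compress better.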
But you're right that the predictor does need to be reproducible -- the output must match exactly between encoder and decoder. While I don't think this is a big focus for many right now, I don't see a fundamental reason why it couldn't be, though probably at the cost of some performance.
How does that make sense? Compression is deterministic (for same prompt, same output is algorithmically guaranteed). AI is only deterministic in corner cases.
AI is always deterministic. We add noise to the models to get "non-deterministic" results, but if the noise and the input are the same, the output is also the same.
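A minimal illustration of that point (a toy stand-in for sampling, not any real model's API): "random" sampling driven by an explicitly seeded RNG is perfectly repeatable.

```python
import random

def sample_with_noise(logits, seed):
    """Toy 'AI sampling': add seeded Gaussian noise to the scores
    and pick the argmax. Same logits + same seed -> same choice."""
    rng = random.Random(seed)
    noisy = [x + rng.gauss(0, 1) for x in logits]
    return noisy.index(max(noisy))

logits = [0.1, 2.0, 0.5]
# identical input and noise seed give an identical output, every time
assert sample_with_noise(logits, seed=42) == sample_with_noise(logits, seed=42)
```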
It's a bit more nuanced than that. Floating point arithmetic is not associative: "(A+B)+C" is not always equal to "A+(B+C)". Because of that, certain mathematical operations used in neural networks, such as parallel reductions, will yield slightly different results if you run them multiple times with the same arguments.
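The non-associativity is easy to demonstrate with doubles (these specific values are just a classic illustrative pick):

```python
a, b, c = 1e16, -1e16, 1.0

left  = (a + b) + c   # 0.0 + 1.0            -> 1.0
right = a + (b + c)   # b + c rounds to -1e16 -> 0.0
assert left != right

# the same effect makes reduction order matter:
xs = [1.0, 1e16, -1e16]
assert sum(xs) != sum(reversed(xs))
```

A parallel reduction that sums partial results in whatever order threads finish will hit exactly this: same inputs, slightly different outputs from run to run.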
There are some people working hard to provide the means to perform deterministic AI computations like these, but that will come with some performance losses, so I would guess that most AIs will continue to be (slightly) non-deterministic.