A lossless compression contest to encourage research in AI. It's lossless, I think just to standardize scoring, but I always thought a lossy version would be better for AI -- our memories are definitely lossy!
> A lossless compression contest to encourage research in AI. It's lossless, I think just to standardize scoring, but I always thought a lossy version would be better for AI -- our memories are definitely lossy!
Gwern posts about this when people say something like that on here, but I'll do it instead. Lossless encoding is just lossy encoding + error correction of some sort.
Hah. "error correction of some sort" is doing a lot of heavy lifting there. Bit-level correction is what I generally consider under the "error correction" umbrella, and we have no problem with that -- I'm curious what bit error rate would make text unreadable. In the context of compressing a large part of the English-language wiki, I think "lossy" also includes loss so significant that you couldn't reproduce the exact text. That's well beyond what we would generally consider "error correcting", but a human might still intuitively accept the result as equivalent. There's no way to quantify that objectively, hence lossless-only for the competition.
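On the bit-error-rate question, a quick toy experiment (all names here are made up for illustration): flip each bit of the ASCII bytes independently with some probability and see where the text stops being readable.

```python
import random

def flip_bits(text, error_rate, seed=0):
    """Flip each bit of the ASCII-encoded text independently
    with probability error_rate (seeded, so reproducible)."""
    rng = random.Random(seed)
    out = bytearray(text.encode("ascii"))
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < error_rate:
                out[i] ^= 1 << bit
    # bytes that are no longer valid ASCII show up as replacement chars
    return out.decode("ascii", errors="replace")

msg = "the quick brown fox jumps over the lazy dog"
for rate in (0.001, 0.01, 0.05):
    print(rate, flip_bits(msg, rate))
```

Somewhere around a few percent per bit, English text is still mostly guessable; well above that it falls apart.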
The way we learn exact sentences is usually by getting an intuitive sense and applying corrections.
For example, to memorize "Doggo woofs at kity", we first get the concept of "dog barks at cat". That compresses well because we intuitively know that dogs bark and cats are common targets. That's our lossy compression, and we could stop there, but it is only part of the story. It is not a "dog" but a "doggo", which fits the familiar tone, so a good compression algorithm will need only a few bits for that. Then there is the typo "kity" vs "kitty"; it takes a bit of extra space, but again, a good algorithm will recognize common typos and compress even that. So the entire path to lossless matters -- lossy is just stopping halfway.
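That "lossy base plus stored corrections" idea can be sketched in a few lines. This is a toy, not how real codecs work (they'd entropy-code the corrections): keep the intuitive guess, and store only the edits needed to recover the exact original.

```python
import difflib

def encode(original: str, lossy_guess: str):
    """Store only the edits that turn the lossy guess into the original."""
    sm = difflib.SequenceMatcher(a=lossy_guess, b=original)
    # keep only the non-matching opcodes as the "correction layer"
    return [(i1, i2, original[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def decode(lossy_guess: str, corrections) -> str:
    out, pos = [], 0
    for i1, i2, repl in corrections:
        out.append(lossy_guess[pos:i1])  # copy what the guess got right
        out.append(repl)                 # apply the stored correction
        pos = i2
    out.append(lossy_guess[pos:])
    return "".join(out)

guess = "dog barks at kitty"
original = "Doggo woofs at kity"
corrections = encode(original, guess)
assert decode(guess, corrections) == original
```

The better the lossy guess, the smaller the correction layer -- which is exactly the parent's point about lossless being lossy plus corrections.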
And if pure random noise remains, there is nothing you can do, but all algorithms are on an equal footing there. The key is to make whatever the algorithm considers incompressible noise as small as possible.
> AI is just compression, and compression is indistinguishable from AI
Almost. Compression and AI both revolve around information processing, but their core objectives diverge. Compression is focused on efficient representation, while AI is built for flexibility and the ability to navigate the unpredictable aspects of real-world data.
Compression learns a representation from the same data it encodes, like "testing on the training set". AI models have different training and test data. There are no surprises in compression.
Let's say AI is a not-so-smart JPEG that has more parts missing, with more guesswork when producing the restoration.
Compression is, most of the time, about finding the minimal grammar that unfolds to the same original material.
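A minimal sketch of that grammar idea, roughly in the spirit of Re-Pair (this is a naive toy, not an efficient implementation): repeatedly replace the most frequent adjacent pair of symbols with a fresh nonterminal, recording a rule for each, then expand the rules to recover the original.

```python
from collections import Counter

def compress(text):
    """Naive Re-Pair-style grammar construction."""
    seq, rules, next_id = list(text), {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:          # no pair repeats: nothing left to factor out
            break
        nt = ("NT", next_id)   # fresh nonterminal symbol
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):    # replace every occurrence of the pair
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(seq, rules):
    out = []
    for s in seq:
        if isinstance(s, tuple):               # nonterminal: recurse
            out.extend(expand(list(rules[s]), rules))
        else:
            out.append(s)
    return out

text = "abababab"
seq, rules = compress(text)
assert "".join(expand(seq, rules)) == text     # lossless round trip
```

The compressed form is the start sequence plus the rule set; "unfolding" the grammar reproduces the input exactly.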
Interestingly, Fabrice Bellard found a way to use transformers for lossless compression, and it beats xz by a significant margin: https://bellard.org/nncp/nncp_v2.1.pdf. It uses the "deterministic mode of PyTorch" to make sure both directions behave identically, which I guess means it fixes the random choices during compression so decompression can reproduce them. Note: this paper is still on my to-read list.
A lot of current compression techniques use prediction followed by some set of correction data to fix mis-predictions. If the prediction is more accurate, you can have a smaller correction set.
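The prediction-plus-corrections scheme can be sketched with a toy character predictor (real codecs would arithmetic-code the predictor's probabilities instead of storing a literal correction list, and the model here is trained on the message itself purely for illustration):

```python
from collections import Counter, defaultdict

def train_predictor(text):
    """Map each character to its most frequent successor.
    Both encoder and decoder must share this exact model."""
    succ = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        succ[a][b] += 1
    return {c: cnt.most_common(1)[0][0] for c, cnt in succ.items()}

def encode(text, predict):
    """Store the first char, the length, and a correction
    for every position where the predictor guesses wrong."""
    corrections = {i: text[i] for i in range(1, len(text))
                   if predict.get(text[i - 1]) != text[i]}
    return text[0], len(text), corrections

def decode(first, length, corrections, predict):
    out = [first]
    for i in range(1, length):
        # trust the prediction unless a correction was stored
        out.append(corrections[i] if i in corrections else predict[out[-1]])
    return "".join(out)

sample = "the cat sat on the mat"
model = train_predictor(sample)
packed = encode(sample, model)
assert decode(*packed, model) == sample
```

A better predictor means fewer corrections, so the correction set (the part you actually have to store) shrinks -- which is why stronger models compress better.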
But you're right that the predictor does need to be reproducible -- the output must match exactly between encoder and decoder. While I don't think this is a big focus for many right now, I don't see a fundamental reason why it couldn't be, though probably at the cost of some performance.
How does that make sense? Compression is deterministic (for same prompt, same output is algorithmically guaranteed). AI is only deterministic in corner cases.
AI is always deterministic. We add noise to the models to get "non-deterministic" results, but if the noise and the input are the same, the output is also the same.
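A minimal illustration of that point (a toy stand-in for sampling, not any real model's API): "random" sampling driven by an explicitly seeded RNG is perfectly repeatable.

```python
import random

def sample_with_noise(logits, seed):
    """Toy 'AI sampling': add seeded Gaussian noise to the scores
    and pick the argmax. Same logits + same seed -> same choice."""
    rng = random.Random(seed)
    noisy = [x + rng.gauss(0, 1) for x in logits]
    return noisy.index(max(noisy))

logits = [0.1, 2.0, 0.5]
# identical input and noise seed give an identical output, every time
assert sample_with_noise(logits, seed=42) == sample_with_noise(logits, seed=42)
```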
It's a bit more nuanced than that. Floating point arithmetic is not associative: "(A+B)+C" is not always equal to "A+(B+C)". Because of that, certain mathematical operations used in neural networks, such as parallel reductions, will yield slightly different results if you run them multiple times with the same arguments.
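The non-associativity is easy to demonstrate with doubles (these specific values are just a classic illustrative pick):

```python
a, b, c = 1e16, -1e16, 1.0

left  = (a + b) + c   # 0.0 + 1.0            -> 1.0
right = a + (b + c)   # b + c rounds to -1e16 -> 0.0
assert left != right

# the same effect makes reduction order matter:
xs = [1.0, 1e16, -1e16]
assert sum(xs) != sum(reversed(xs))
```

A parallel reduction that sums partial results in whatever order threads finish will hit exactly this: same inputs, slightly different outputs from run to run.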
There are some people working hard to provide the means to perform deterministic AI computations like these, but that will come with some performance losses, so I would guess that most AIs will continue to be (slightly) non-deterministic.