Tokenisation turns a continuous signal into a normalized discrete vocabulary: st...

prlin · on March 22, 2024

> Text is actually numbers

Text can be represented by numbers but they aren't the same datatype. They don't support the same operations (addition, subtraction, multiplication, etc).

lamename · on March 22, 2024

Interesting. Can you explain how this is superior and/or different from traditional DSP filters or other non-tokenization tricks in the signal processing field?

dist-epoch · on March 22, 2024

Traditional DSP filters still output a continuous signal. And it's a well-explored domain, hard to imagine any low-hanging fruit there.

My intuition is the following: transformers work really well for text, so we could try turning a time series into a "story" (limited vocabulary) and see what happens.

lamename · on March 22, 2024

Like this or something different?

https://github.com/gzerveas/mvts_transformer