You might be interested in the "compact GA", which only requires keeping the proportion of 1s (or 0s) at each bit position rather than storing a full population.
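
To make that concrete, here is a rough sketch of the compact GA update as I understand it, not a tuned or canonical implementation. The names `compact_ga`, `onemax`, `pop_size`, and `max_iters` are my own choices for illustration, and OneMax (count the 1s) is just a stand-in fitness function. The only state kept between iterations is one integer counter per bit position.

```python
import random

def compact_ga(fitness, n_bits, pop_size=100, max_iters=20000, seed=0):
    """Sketch of a compact GA: one counter per bit instead of a population.

    counts[i] / pop_size is the modelled proportion of 1s at bit i.
    """
    rng = random.Random(seed)
    counts = [pop_size // 2] * n_bits  # start at 50% ones everywhere

    def sample():
        # Draw one candidate bit string from the current proportions.
        return [1 if rng.random() < c / pop_size else 0 for c in counts]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # Nudge the proportion one step toward the winner's bit.
                counts[i] += 1 if winner[i] == 1 else -1
                counts[i] = max(0, min(pop_size, counts[i]))
        if all(c in (0, pop_size) for c in counts):
            break  # every position has converged to all-0 or all-1
    return [c / pop_size for c in counts]

def onemax(bits):
    # Toy fitness (an assumption for this example): number of 1 bits.
    return sum(bits)

probs = compact_ga(onemax, n_bits=16)
best = [1 if p >= 0.5 else 0 for p in probs]
```

Here `pop_size` plays the role of a simulated population size: a larger value means smaller steps and slower, steadier convergence. Memory use is `n_bits` counters regardless of that value.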

I suspect that, with some more engineering and attention from people doing ML, GA-style algorithms can be made just as memory-efficient as gradient methods, while giving better results and being more widely applicable.

Here's a post on this: http://pchiusano.github.io/2020-10-12/learning-without-a-gra...