You might be interested in the "compact GA" (cGA), which only requires keeping the proportion of 1s (or 0s) at each bit position instead of storing a full population.
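To make that concrete, here's a rough sketch of a compact GA as I understand it, not a reference implementation. The function name `compact_ga`, its parameters, and the OneMax-style usage are my own illustrative choices. The only state is one integer counter per bit; `counts[i] / pop_size` plays the role of the fraction of 1s at position i in a simulated population of size `pop_size`.

```python
import random

def compact_ga(fitness, n_bits, pop_size=50, max_iters=20000, seed=0):
    """Sketch of a compact GA; returns the per-bit probability of a 1.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # One counter per bit: how many of the pop_size simulated members
    # have a 1 at that position. Start undecided (about half).
    counts = [pop_size // 2] * n_bits

    def sample():
        # Draw one candidate bit string from the current per-bit proportions.
        return [1 if rng.random() < c / pop_size else 0 for c in counts]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Where the two candidates disagree, shift that bit's proportion
        # one step (1/pop_size) toward the winner's value.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                counts[i] += 1 if winner[i] == 1 else -1
                counts[i] = max(0, min(pop_size, counts[i]))
        # Stop once every bit has settled at all-0s or all-1s.
        if all(c in (0, pop_size) for c in counts):
            break

    return [c / pop_size for c in counts]
```

For example, `compact_ga(sum, n_bits=16)` runs it on the toy "count the 1s" objective. Memory use is one small integer per bit, no matter how large a population is being simulated.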
I suspect that with some more engineering and attention from people doing ML stuff, GA-style algorithms can be made just as memory-efficient as gradient methods, while giving better results and being more widely applicable.
Here's a post on this: http://pchiusano.github.io/2020-10-12/learning-without-a-gra...