I know there's real development and innovation here, but any time I hear about randomly or freely-wired neural networks, I can't help but be reminded of the "hacker koan":
In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied. "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes. "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty." At that moment, Sussman was enlightened.
I wonder if this gets at Chomsky's theory of an innate, built-in language faculty, and if so, whether it would be useful to intentionally pattern the initial arrangement of a neural network to match what biology has already laid out.
Haven't heard the term "freely wired" before, but FAIR released an exploration of randomly wired neural networks which seems conceptually similar. https://arxiv.org/abs/1904.01569
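For anyone curious what "randomly wired" means mechanically, here is a toy sketch, not the paper's actual architecture (which, as I recall, samples graphs with classical random-graph generators and uses convolutional nodes): just sample a random DAG and run a forward pass in topological order. All sizes and the sparsity value are made up for illustration.

```python
# Toy sketch of the "randomly wired" idea: sample a random DAG over N nodes,
# then evaluate nodes in topological order, each one aggregating its parents.
import numpy as np

rng = np.random.default_rng(0)
N, dim = 8, 16
# Random upper-triangular adjacency -> guaranteed acyclic (a DAG).
adj = np.triu(rng.random((N, N)) < 0.3, k=1)

# One weight matrix per node (hypothetical choice, purely illustrative).
weights = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(N)]

x = rng.standard_normal(dim)        # input fed to node 0
acts = [None] * N
acts[0] = x
for i in range(1, N):
    parents = np.nonzero(adj[:, i])[0]
    agg = sum(acts[p] for p in parents) if len(parents) else x
    acts[i] = np.maximum(0.0, weights[i] @ agg)   # ReLU node

print(acts[-1][:4])  # activations of the last node
```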
I don't see how that relates. You don't stick to a single random wiring; you only make the wiring random so that the space of possible wirings of the network is not constrained by the classical sequential paradigm.
The story is more about how a random set of weights WILL have preconceptions of how to play. If you look at the condition numbers or spectra of random normal matrices, they are very much not random.
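A quick numerical illustration of that point (a rough NumPy sketch; the matrix size is arbitrary): the singular values of an i.i.d. Gaussian matrix are highly structured. With entries scaled by 1/sqrt(n), the largest singular value concentrates near 2 across draws, while the condition number blows up, so "random weights" still impose strong structure.

```python
# Spectra of random Gaussian matrices are anything but arbitrary:
# the largest singular value concentrates, the condition number explodes.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
for trial in range(3):
    A = rng.standard_normal((n, n)) / np.sqrt(n)
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    print(f"trial {trial}: max sv = {s[0]:.3f}, condition number = {s[0]/s[-1]:.1e}")
```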
Back in engineering school, a million years ago, my partner and I used simulated annealing to “design” a digital circuit implementation in CMOS (a 32-bit CRC), minimizing various parameters like wire length to optimize its function. It worked shockingly well in simulation.
I am a huge fan of using randomized starting states and then allowing the computer to discover the best architecture. It produces, if nothing else, surprising results.
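For the curious, here is a toy version of that kind of annealing experiment. Everything in it (the netlist, the grid size, the cooling schedule) is invented for illustration, not the original CRC design; it just shows the accept/reject loop that makes simulated annealing work.

```python
# Toy simulated annealing: place gates on a grid to minimize total wire length.
import math, random

random.seed(0)
n_gates, grid = 16, 8
nets = [(random.randrange(n_gates), random.randrange(n_gates)) for _ in range(40)]
pos = {g: (random.randrange(grid), random.randrange(grid)) for g in range(n_gates)}

def wirelength(p):
    # Manhattan distance summed over all two-pin nets.
    return sum(abs(p[a][0] - p[b][0]) + abs(p[a][1] - p[b][1]) for a, b in nets)

T, cost = 10.0, wirelength(pos)
while T > 0.01:
    g = random.randrange(n_gates)
    old = pos[g]
    pos[g] = (random.randrange(grid), random.randrange(grid))   # random move
    new_cost = wirelength(pos)
    # Accept downhill moves always, uphill moves with Boltzmann probability.
    if new_cost < cost or random.random() < math.exp((cost - new_cost) / T):
        cost = new_cost
    else:
        pos[g] = old          # reject: undo the move
    T *= 0.999                # cool down

print("final total wire length:", cost)
```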
This is good; I think research in this direction will yield the next breakthrough in ML.
The current hierarchical feed-forward model of neural networks is what's limiting our advances. If you look at ResNet or DenseNet, their skip connections are hacks to bypass that limitation, and they bring great improvements. And if you look at the routing-by-agreement technique for capsule networks, it's clear that feeding information from later layers back to earlier layers improves things as well.
And even in our own visual cortex, it's only up to V1 that things are remotely hierarchical; after that it's a mess of looping wiring between areas.
We need to throw away the current neural network paradigm and adopt a new one that can express both feed-forward and feedback connections innately.
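For concreteness, a minimal sketch of the skip-connection idea mentioned above (ResNet-style), assuming PyTorch: the block adds its input back to its output, so features and gradients can bypass the stacked layers.

```python
# Minimal ResNet-style residual block: output = ReLU(x + F(x)).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        y = self.conv2(self.relu(self.conv1(x)))
        return self.relu(x + y)   # the skip connection: identity + transformed path

x = torch.randn(1, 32, 16, 16)
print(ResidualBlock(32)(x).shape)   # torch.Size([1, 32, 16, 16])
```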
It's not like that hasn't been tried before, though. Boltzmann machines have all-to-all connectivity and RNNs have feedback connections (and sometimes backward connections).
NNs have been studied since well before the 90s, and a lot has been tried already. I think one should also keep Sutton's bitter lesson in mind.
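As a concrete example of what "all-to-all connectivity" means here, a tiny Gibbs-sampling loop for a Boltzmann machine (random symmetric weights, no bias terms, purely illustrative):

```python
# Tiny Boltzmann machine: every unit is coupled to every other unit
# through a symmetric weight matrix, and we Gibbs-sample the binary states.
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.standard_normal((n, n))
W = (W + W.T) / 2          # symmetric coupling
np.fill_diagonal(W, 0.0)   # no self-connections
s = rng.integers(0, 2, n)  # binary unit states

for sweep in range(100):
    for i in range(n):
        # each unit sees every other unit through W (all-to-all wiring)
        p_on = 1.0 / (1.0 + np.exp(-W[i] @ s))
        s[i] = rng.random() < p_on

print("sampled state:", s)
```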
Convolutional neural networks were also tried and failed to fulfill their potential for 20 years (or 50, depending on how far back you go) until suddenly they didn't, and now they're ubiquitous. You cannot discard a whole area of research because one or a few implementations of a concept have so far failed to become competitive.
All-to-all connectivity is possibly the worst paradigm unless you want to do architecture search, so I wouldn't hold it up as an example of anything wrt this concept.
As for recurrent neural networks, to my knowledge they've only been used for recurrence in time or space, not recurrence between layers for a single sample (though I might be wrong), so they're not relevant to what I'm describing. There is, however, some work on skip connections (forward and backward) that takes inspiration from their gating mechanisms.
If freely wired neural networks are DAGs, I wonder how cyclic graphs of neurons behave, or even whether that model would be theoretically meaningful or computationally feasible.
Yep, they've existed for decades. Look up "RNN" (recurrent neural network) and "LSTM" (long short-term memory). They were the standard for neural-network-based time-series processing for a while, until they were recently supplanted by Transformers.
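A minimal usage sketch, assuming PyTorch: an LSTM carries a hidden state forward through time, so the "cycle" is unrolled over the time dimension rather than wired as a literal loop in the graph.

```python
# Minimal LSTM over a batch of sequences; the recurrence is over time steps.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 10, 8)      # batch of 4 sequences, 10 time steps, 8 features
out, (h, c) = lstm(x)          # out: per-step outputs, (h, c): final hidden/cell state
print(out.shape, h.shape)      # torch.Size([4, 10, 16]) torch.Size([1, 4, 16])
```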