To be fair, this whole Deep Learning renaissance was made possible and kicked of...

bcatanzaro · on Feb 17, 2017

Yes, I totally agree. Yann LeCun, Geoff Hinton, Jurgen Schmidhuber and others did unpopular work for a long time. And they deserve tons of credit for their perseverance which paid off.

Similarly, I think it's great that there are AI researchers working on techniques which are currently out of favor. It's important to have diversity of viewpoint.

What irritates me about neuromorphic computing is that much of the work I see publicized (including the work in this article) isn't being presented as basic research on a risky hypothesis. Instead it's presented as the future of AI, despite the current lack of any demonstrated utility, and the almost complete disconnect between the AI researchers building the future of AI and the neuromorphic community.

The burden of proof is always on the researcher to show utility, and if the neuromorphic computing community can do that, I'll be super excited! Until then, I'll be waiting for something measurable and concrete, and rolling my eyes at brain analogies.

Russell91 · on Feb 17, 2017

> Yes, I totally agree. Yann LeCun, Geoff Hinton, Jurgen Schmidhuber and others did unpopular work for a long time.

...

> Until then, I'll be ... rolling my eyes at brain analogies.

Maybe you don't realize this, but these guys made more brain analogies than you can count over the same period to which you attribute their greatness. Meanwhile, they were attacked year after year by state-of-the-art land grabbers saying the same things you just did.

> isn't being presented as basic research on a risky hypothesis.

It is basic research, but it's not a risky hypothesis. Existing neuromorphic computers achieve 10^14 ops/s at 20 W. Thats 5 Tops/Watt. The best GPUs currently achieve less than 200 Gops/Watt. Where is the risk in saying that a man-made neuromorphic chip can achieve more per dollar than a GPU. There is no risk, and suggesting that this field is somehow has too much risk for advances to be celebrated is absolutely crazy.

deepnotderp · on Feb 17, 2017

Non-neuromorphic (analog) deep learning chip startup here. We're forecasting AT LEAST ~50 TOPS/watt for inference.

Russell91 · on Feb 17, 2017

Sure - I guess it's productive for me to answer why this doesn't disagree with my comment. By the time you get the software to hook up that kind of low bit precision (READ: neuromorphic) compute performance with extreme communication-minimizing strategies (READ: neuromorphic), which will invariable require compute colocated, persistent storage (READ: neuromorphic) in any type of general AI application, you're not exactly making the argument that neuromorphic chips are a bad idea.

We literally have to start taking neuromorphic to mean some silly semantics like "exactly like the brain in every possible way" in order to disagree with it.

Edit: also, to ground this discussion, there are extremely concrete reason why current neural net architectures will NOT work with the above optimizations. That's the primary motivation for talking about "neuromorphic", or any other synonym you want to coin, as fundamentally different hardware. AI software ppl need to have a term for hardware of the future, which simply won't be capable of running AlexNet well at all, in the same way that a GPU can't run CPU code well. I think the term "neuromorphic" to describe this hardware is as productive as any.

p1esk · on Feb 17, 2017

Which existing neuromorphic computers achieve 10^14 ops/s at 20 W? If you compare them to GPUs, those "ops" better be FP32 or at least FP16.

Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".

Russell91 · on Feb 17, 2017

>Which existing neuromorphic computers achieve 10^14 ops/s at 20 W? If you compare them to GPUs, those "ops" better be FP32 or at least FP16.

The comparison is of 3 bit neuromorphic synaptic ops against FP8 pascal ops. That factor is important (as it means that the neuromorphic ops are less useful), but it turns out to be dwarfed by the answer to your second question:

> Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".

this is rather difficult to justify in this margin. But the idea is that proposals such as those above (50 Tops) tend to be optimistic on the efficiency of the raw compute ops. But these proposals really don't have much to say about the costs of communication (e.g. reading from memory, transmitting along wires, storing in registers, using buses, etc.). It turns out that if you don't have good ways to reduce these costs directly (and there are some, such as changing out registers for SRAMs, but nothing like the 100x speedup from analog computing), you have to just change the ratio of ops / bit*mm of communication per second. There are lots of easy ways to do that (e.g. just spin your ops over and over on the same data), but the real question is how to get useful intelligence out of your compute when it is data starved. This is an open question, and (sadly), very few ppl are working on it, compared to say low-bit-precision neural nets. But I predict this sentiment will be changed over the next few years.

Edit for below: no one is suggesting 50 Top/w hardware running alex net software to my knowledge (though would love to hear what they are proposing to run at that efficiency) . Nvidia among others are squeezing efficiency for cv applications with current software, but this comes at the cost of generality (it's unlike the communication tradeoffs they're making on that chip will make sense for generic AI research), and further improvements will rely on broader software changes, esp revolving around reduced communication. There are a lot of interesting ways to reduce communication without sacrificing performance, such as using smaller matrix sizes, which would reverse the state of the art trends.

p1esk · on Feb 17, 2017

Regarding your first answer, sounds like you're doing apples to oranges comparison here. What are those "synaptic ops"? Xavier board is announced to be capable of 30 Tops (INT8) at 30W, so even if your neuromorphic chip does 100 Tops at 20W, assuming for a second those ops are equivalent to INT3 operations, this makes them very similar in efficiency.

And you still haven't answered my second question: what is the reason the future neuromorphic chips won't be able to run current neural net architectures?

I'm not even sure what you are talking about at the end of your comment. The 50Tops/W figure was promised for an analog chip, designed to run modern DL algorithms. Sounds pretty reasonable, and I don't see how your arguments apply to it. Are you saying we can't build an analog chip for DL? Why does it have to be data starved?

deepnotderp · on Feb 17, 2017

Our hardware can run AlexNet...

Russell91 · on Feb 17, 2017

In an integrated system at 50 tops/watt? How are you going to even access memory at less than 20 fJ per op? Like, you're specifically trying to hide the catch here. If we were to take you at face value, we'd have to also believe that Nvidia is working on an energy optimized system that is 50x worse for no good reason.

For reference, reading 1 bit from a very small 1.5kbit sram, which is much cheaper than the register caches in a gpu, costs more than 25 fJ per bit you read.

deepnotderp · on Feb 17, 2017

So this is locked up in "secret sauce". But as a hint, the analog aspect can be exploited.

Russell91 · on Feb 17, 2017

Look, it sounds like your implying compute colocated storage in the analog properties of your system (which is exactly what a synaptic weight is btw), on top of using extremely low bit precision. So explicitly calling your system totally non-neuromorphic is a little deceiving. But even then I find this idea that you're going to be running the AlexNet communication protocol to pass around information in your system to be a little strange. If you're doing anything like passing digitized inputs through a fixed analog convolution then you're not going to beat the SRAM limit, which means that instead you have in mind keeping the data analog at all times, passing it through an increasing length of analog pipelines. Even if you get this working, I'm quite skeptical that by the time you have a complete system, you'll have reduce communication costs by even half the reduction you achieve in computation costs on a log scale. It's of course possible that I'm wrong there (and my entire argument hinges on the hypothesis that computation costs will fall faster than communication - which is true for CMOS but may be less true for optical), but this is really the only projection on which we disagree. If I'm right, then regardless of whether you can hit 50 Tops (or any value) on AlexNet, you'd be foolish not to reoptimize the architecture to reduce communication/compute ratios anyway.

p1esk · on Feb 18, 2017

Oh, I see what you meant now. Yes, when processing large amount of data (e.g. HD video) on an analog chip, DRAM to SRAM data transfer can potentially be a significant fraction of the overall energy consumption. However, if this becomes a bottleneck, you can grab the analog input signal directly (e.g. current from CCD), and this will reduce the communication costs dramatically (I don't have the numbers, but I believe Carver Mead built something called "Silicon Retina" in the 80s, so you can look it up).

Power consumption is not the only reason to switch to analog. Density and speed are just as important for AI applications.

deepnotderp · on Feb 18, 2017

I should clarify, once data enters the chip, we provide 50 tops/W. The transfer from dram is not included.