
Could somebody explain how the M1 can reach 900 GFlops or more? My knowledge is probably outdated, but the M1 CPU runs at 3.2 GHz with 8 cores and each instruction needs at least 1 clocktick, so I would say it could reach at most 25.6 GFlops? Similarly, the M1 GPU seems to have only 8 cores at 1.2 GHz.

Obviously something is wrong in my reasoning, where are all those GFlops coming from?



It runs on the GPU (as written in the title).

Also, this assumption is wrong too:

> and each instruction needs at least 1 clocktick

Modern[1] CPUs use a superscalar architecture[2], which allows them to execute more than one instruction per clock cycle (usually 4 or more[3]).

[1]: well, it's been the case for the past decades actually
[2]: https://kb.iu.edu/d/aett
[3]: https://stackoverflow.com/questions/37041009/what-is-the-max...


Thanks. As you say, it must be running on the GPU then, because even with 5 superscalar instructions per clocktick the M1 CPU wouldn't be anywhere near 900 GFlops. And besides, isn't the thing that makes the M1 so fast the fact that it's a RISC processor that is -not- superscalar?

The headline suggested this had anything to do with the specific qualities of the M1 but apparently this has nothing to do with the M1 CPU? Any modern GPU can easily reach 1 TFlop nowadays.


> And besides, isn't the thing that makes the M1 so fast the fact that it's a RISC processor that is -not- superscalar?

I don't know where you got the idea that RISC = “not superscalar”, but it's a wrong one. There are a lot of superscalar ARM CPUs out there [1].

> The headline suggested this had anything to do with the specific qualities of the M1 but apparently this has nothing to do with the M1 CPU?

The M1 is a SoC, with a CPU and a GPU (and other things) on the same chip.

> Any modern GPU can easily reach 1 TFlop nowadays.

Yep, but it's still quite a feat to do this on a low-power mobile SoC.

[1]: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures


Modern processors execute more than one instruction per cycle by executing several instructions in parallel on each core.

In sequential code, it's common that some instructions are independent of each other and can theoretically be executed in parallel. The amount of such parallelism is called ILP (Instruction Level Parallelism). Modern processors exploit ILP by detecting dependencies between instructions and executing independent instructions in parallel. These are superscalar processors.
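To make the dependency idea concrete, here is a minimal sketch of a toy scheduler. Everything in it is an assumption for illustration: the `schedule` and `ipc` helpers are hypothetical names, every instruction is assumed to take one cycle, only true (read-after-write) dependencies are tracked, and real cores are far more complicated than this.

```python
def schedule(instructions, issue_width):
    """Return the issue cycle of each instruction in a toy superscalar core.

    instructions: list of (dest, sources) tuples in program order.
    Assumptions (illustrative only): 1-cycle latency, only read-after-write
    dependencies matter, at most `issue_width` instructions issue per cycle.
    """
    ready_cycle = {}  # register name -> first cycle its value can be read
    issued = {}       # cycle -> how many instructions already issue in it
    cycles = []
    for dest, sources in instructions:
        # Earliest cycle in which all inputs are available.
        cycle = max((ready_cycle.get(s, 0) for s in sources), default=0)
        # Slide to a later cycle if this one is already full.
        while issued.get(cycle, 0) >= issue_width:
            cycle += 1
        issued[cycle] = issued.get(cycle, 0) + 1
        ready_cycle[dest] = cycle + 1
        cycles.append(cycle)
    return cycles


def ipc(cycles):
    """Instructions per cycle achieved by a schedule."""
    return len(cycles) / (max(cycles) + 1)


# Five instructions; the first three do not depend on each other.
program = [
    ("a", ["x", "y"]),  # a = x + y
    ("b", ["x", "z"]),  # b = x * z
    ("c", ["y", "z"]),  # c = y - z
    ("d", ["a", "b"]),  # d = a + b   (needs a and b)
    ("e", ["d", "c"]),  # e = d * c   (needs d and c)
]

for width in (1, 4):
    result = schedule(program, width)
    print(f"issue width {width}: cycles {result}, IPC {ipc(result):.2f}")
```

With an issue width of 1 the five instructions take five cycles. With a width of 4 the first three issue together and the whole thing takes three cycles, so the dependency chain (a, b -> d -> e) is what limits the speedup, not the number of execution units.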

In addition, some instruction set extensions add instructions that operate on several data elements at the same time. They are called SIMD extensions (Single Instruction, Multiple Data).
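Putting these factors together gives a rough back-of-the-envelope formula for theoretical peak throughput: cores x clock x instructions per cycle x SIMD lanes x flops per lane. The sketch below is only that, a sketch: `peak_gflops` is a hypothetical helper, and the per-core figures (4 SIMD instructions per cycle, 128 GPU ALUs per core, fused multiply-add counted as 2 flops) are assumptions picked for illustration, not numbers taken from this thread. The 8 cores, 3.2 GHz and 1.2 GHz come from the question above.

```python
def peak_gflops(cores, clock_ghz, instr_per_cycle=1, lanes=1, flops_per_lane=1):
    """Theoretical peak in GFlops; real code rarely gets close to this."""
    return cores * clock_ghz * instr_per_cycle * lanes * flops_per_lane


# The estimate from the question: 8 cores, 3.2 GHz, one flop per clocktick.
naive = peak_gflops(cores=8, clock_ghz=3.2)  # 25.6

# Same clock, but with assumed superscalar + SIMD + FMA factors:
# 4 big cores, 4 SIMD instructions per cycle, 4 float32 lanes in a
# 128-bit register, 2 flops per lane for a fused multiply-add.
cpu = peak_gflops(cores=4, clock_ghz=3.2, instr_per_cycle=4, lanes=4,
                  flops_per_lane=2)

# GPU: 8 cores at 1.2 GHz as in the question, with an assumed 128 ALUs
# per core, each doing one fused multiply-add (2 flops) per cycle.
gpu = peak_gflops(cores=8, clock_ghz=1.2, lanes=128, flops_per_lane=2)

print(f"naive CPU estimate: {naive:.1f} GFlops")
print(f"CPU with IPC/SIMD/FMA (assumed factors): {cpu:.1f} GFlops")
print(f"GPU (assumed 128 ALUs per core): {gpu:.1f} GFlops")
```

The point is not the exact numbers but the multipliers: "8 cores" on a GPU hides a lot of parallel lanes per core, which is how a figure like 900 GFlops becomes plausible on a chip clocked at only 1.2 GHz.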



