M1 Max has 400GB/s of memory bandwidth and a 4090 has 1TB/s of memory bandwidth,...

segfaultbuserr · on Dec 13, 2023

> M1 Max has 32 GPU cores and a 4090 has 16,000.

Apple M1 Max has 32 GPU cores; each core contains 16 Execution Units, and each EU has 8 ALUs (also called shaders), so overall there are 4096 shaders. Nvidia's AD102 die, used in the RTX 4090, contains 12 Graphics Processing Clusters; each GPC has 12 Streaming Multiprocessors, and each SM has 128 ALUs, for 18432 shaders on the full die. The 4090 itself ships with 128 of those 144 SMs enabled, which gives 16384 shaders.
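To make the arithmetic explicit, here is a rough sketch that multiplies out the hierarchy levels quoted above. The helper name `shader_count` and the variable names are made up for illustration; the per-level figures are the ones from this comment.

```python
from math import prod

def shader_count(*levels: int) -> int:
    """Multiply the unit counts at each level of a GPU hierarchy."""
    return prod(levels)

# Apple M1 Max: 32 GPU cores x 16 EUs per core x 8 ALUs per EU
m1_max_shaders = shader_count(32, 16, 8)

# Nvidia, full 12 x 12 x 128 hierarchy: GPCs x SMs per GPC x ALUs per SM
nvidia_shaders = shader_count(12, 12, 128)

print(m1_max_shaders)                   # 4096
print(nvidia_shaders)                   # 18432
print(nvidia_shaders / m1_max_shaders)  # 4.5
```

Counted this way the gap is about 4.5x, not the roughly 500x that "32 vs 16,000" suggests.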

A single shader is somewhat similar to a single lane of a vector ALU in a CPU. One can say that a single-core CPU with AVX-512 has 8 shaders, because it can process 8 FP64 values at the same time. Calling them "cores" (as in "CUDA core") is extremely misleading, which is why "shader" became the common name for a GPU's ALU. If Nvidia were in charge of marketing a 4-core x86-64 CPU, they would call it a CPU with 32 "AVX cores", because each core has 8-way SIMD.
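As a quick sketch of that counting rule (the function and variable names here are hypothetical, and it counts one vector unit per core, as the comment does):

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements one vector register holds, i.e. lanes per vector ALU."""
    return register_bits // element_bits

# AVX-512 with FP64 elements: 512 / 64 = 8 lanes
lanes_fp64 = simd_lanes(512, 64)

# "Marketing cores" for a 4-core CPU, counting one lane as one "core"
cpu_cores = 4
avx_cores = cpu_cores * lanes_fp64

print(lanes_fp64)  # 8
print(avx_cores)   # 32
```

Switching the element type to FP32 doubles the lane count, which is one reason these "core" numbers are so slippery.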

jrk · on Dec 13, 2023

Actually each of those x86 CPUs probably has at least two AVX FMA units, and can issue 16xFP32 FMAs per cycle – it’s at least “64 AVX cores”! :)

kimixa · on Dec 13, 2023

Doesn't Zen 4 have 2x 256-bit FADD and 2x 256-bit FMA, with AVX-512 ops double-pumping the ALUs (a good overview here [0])? If you count FADD as a single flop and FMA as 2, that's 48 "1 flop cores" per core.

I think it's got the same total FP ALU resources as Zen 3, which shows how register width and ALU resources can be completely decoupled.
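Spelling out where the 48 comes from, as a back-of-the-envelope sketch: the unit counts and widths are the ones given above, while the FP32 element width is my assumption (it is the width at which the total comes out to 48).

```python
VECTOR_BITS = 256  # width of each FADD / FMA unit, per the comment
FP32_BITS = 32     # assumed element width

lanes = VECTOR_BITS // FP32_BITS  # 8 FP32 lanes per unit

fadd_units = 2  # each lane does 1 flop per cycle
fma_units = 2   # each lane does 2 flops per cycle (multiply + add)

fadd_flops = fadd_units * lanes * 1
fma_flops = fma_units * lanes * 2
flops_per_cycle = fadd_flops + fma_flops

print(fadd_flops, fma_flops, flops_per_cycle)  # 16 32 48
```

Note that the 512-bit register width never appears in the calculation; only the 256-bit unit width does.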

[0] https://www.mersenneforum.org/showthread.php?p=614191

codedokode · on Dec 13, 2023

I think the 4090 has 16000 ALUs, not "cores" (let's call a component capable of executing instructions independently of the others a "core"). And the M1 Max probably has more than 1 ALU in every core; otherwise it would resemble an ancient GPU.

rsynnott · on Dec 13, 2023

Yeah; 'core' is a pretty meaningless term when it comes to GPUs, or at least it's meaningless outside the context of a particular architecture.

We may just be thankful that this particular bit of marketing never caught on for CPUs.

stonemetal12 · on Dec 13, 2023

Nvidia switched to marketing speak a long time ago when it came to the word "core". If we go with Nvidia's definition then M1 Max has 4096 cores, still behind the 4090, but the gap isn't as big as 32 to 16k.