Ah, sensible. Generally, with the TLB numbers and the memory latency you can get a good idea of the performance by interpolating between the two numbers for any TLB miss rate (from 0% to 100%).
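As a rough sketch of that interpolation (the function name and both latency figures here are made up for illustration, not measurements of any chip), assuming the cost scales linearly with the miss rate:

```python
def effective_latency_ns(hit_ns: float, miss_ns: float, miss_rate: float) -> float:
    """Linearly interpolate between the all-hit and all-miss latencies."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_ns + miss_rate * (miss_ns - hit_ns)


# Hypothetical numbers, for illustration only.
HIT_NS = 100.0   # measured latency when every access hits the TLB
MISS_NS = 140.0  # measured latency when every access misses the TLB

for rate in (0.0, 0.25, 0.5, 1.0):
    latency = effective_latency_ns(HIT_NS, MISS_NS, rate)
    print(f"{rate:.0%} TLB misses -> {latency:.1f} ns")
```

Plug in the two measured endpoints for a given machine and the miss rate of your workload to get a ballpark figure.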
The M1 Max also has a crazy number of memory channels: even the older M1 has 4 channels, the Pro has 8, and the Max has 16. So you can have many more cache misses in flight; this is part of why the M1 Max is 20% faster than the 5950X, even though the 5950X has twice as many cores.
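To see why misses in flight matter, here is a back-of-the-envelope sketch using Little's law (concurrency = throughput * latency). The 400GB/sec figure is the one quoted in this thread; the 100 ns latency and 128-byte line size are assumptions I picked for illustration, and the function name is my own:

```python
def misses_in_flight(bandwidth_bytes_per_s: float,
                     latency_s: float,
                     line_bytes: int) -> float:
    """Little's law: outstanding requests = request rate * latency.

    The request rate is the bandwidth divided by the bytes moved per miss.
    """
    requests_per_s = bandwidth_bytes_per_s / line_bytes
    return requests_per_s * latency_s


BANDWIDTH = 400e9   # bytes/sec, the figure quoted in the thread
LATENCY = 100e-9    # seconds, assumed for illustration
LINE_BYTES = 128    # bytes per cache line, assumed for illustration

needed = misses_in_flight(BANDWIDTH, LATENCY, LINE_BYTES)
print(f"~{needed:.0f} cache misses must be in flight to saturate the bus")
```

Under those assumptions you need a few hundred outstanding misses across the whole chip to use the bandwidth, which is why more channels and deeper miss queues help.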
I have not seen TLB size reports for the M1/M1 Pro/Max, but Anandtech has reported the following figures for the A14:
«The L1 TLB has been doubled from 128 pages to 256 pages, and the L2 TLB goes up from 2048 pages to 3072 pages. On today’s iPhones this is an absolutely overkill change as the page size is 16KB, which means that the L2 TLB covers 48MB which is well beyond the cache capacity of even the A14» [0].
With the A14's default page size of 16KB, that gives 4MB of L1 TLB coverage (16KB * 256) and 48MB of L2 TLB coverage (16KB * 3072). Off the top of my head, only POWER9/10 (and maybe POWER11) come close with such large TLBs. aarch64 page sizes come as fixed presets, e.g.:
- 4KB, 2MB, and 1GB
- 16KB, 32MB
- 64KB, 512MB
The OS can choose any of the three, and macOS defaults to the 16KB / 32MB preset, although I have not seen whether it can handle both the 16KB and 32MB page sizes simultaneously.
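The coverage arithmetic is just entries times page size. A quick sketch using the A14 entry counts from the Anandtech quote (the helper name is mine, and the 4KB and 64KB rows are only a what-if comparison with the same entry counts, not a claim about any real chip):

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_coverage_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory a TLB can map without missing: one page per entry."""
    return entries * page_size_bytes


L1_TLB_ENTRIES = 256    # A14 figure from the Anandtech quote
L2_TLB_ENTRIES = 3072   # A14 figure from the Anandtech quote

# Base page sizes of the three aarch64 presets; 16KB is the A14 default.
for page_kib in (4, 16, 64):
    page = page_kib * KIB
    l1 = tlb_coverage_bytes(L1_TLB_ENTRIES, page) // MIB
    l2 = tlb_coverage_bytes(L2_TLB_ENTRIES, page) // MIB
    print(f"{page_kib}KB pages: L1 TLB covers {l1}MB, L2 TLB covers {l2}MB")
```

With 16KB pages this reproduces the 4MB and 48MB figures above; the same entry counts would cover only a quarter of that with 4KB pages.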
A garbage-collected runtime that has been optimised for large TLBs or large page sizes can take full advantage of the increased TLB capacity. Azul's JVM comes to mind, with its C4 garbage collector having been heavily optimised for terabyte-scale Java memory workloads.
I am now very curious to see the M1 Max vivisection results that would reveal whether the TLB size in it is even larger (or not).
Honestly it's kinda crazy what's possible these days. Couple years ago you'd easily burn 30+ watts on just the memory chips to get this kind of bandwidth.
Indeed: 64GB RAM, 400GB/sec, a decent GPU, ML acceleration, 10 cores, etc., all in a small package that gives good battery life to a relatively thin laptop.
Here's hoping they put the same in the Mac mini. Anyone interested in a Linux port, join marcan's Patreon; I'm kicking in a few $ a month.