Under the "What About HyperThreading?" section the author states:
>"Ironically, a large motivation behind SMT is likely the need to improve single threaded performance. Doing so involves targeting higher performance per clock with wider and deeper cores. But scaling width and reordering capacity runs into increasingly diminishing returns"
Can someone explain what is meant here by "scaling width"? Is width the number of hardware threads on a chip? Also, does this directly relate to "reordering capacity"? If so, how?
Width here means the number of instructions that can be executed in parallel. Reordering capacity, also called instruction window depth, is orthogonal (although the area of the reorder buffer is proportional to width x depth).
I think what the parent is hinting at is that the ROB typically isn't a simple linear array of uops but a 2D array. Each line in the ROB normally holds a handful of instructions that can be issued without dependencies, and there are usually interesting constraints such as only one branch per line. The ROB size usually quoted in uops assumes perfect packing in every line, but it's hard to get at the specifics from microbenchmarks, because if the ROB engineers did a great job you hit other bottlenecks first.
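To make the packing point concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the line width of 4, the one-branch-per-line rule, and the greedy in-order packing are illustrative and not taken from any real core, and the dependency constraint is ignored entirely. `Uop`, `pack_into_rob_lines`, and `effective_capacity` are made-up names.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    # Only the property the toy model cares about; real uops carry far more.
    is_branch: bool = False


def pack_into_rob_lines(uops, line_width=4, max_branches_per_line=1):
    """Greedily pack uops, in program order, into fixed-width ROB lines.

    A new line starts when the current one is full, or when adding a
    branch would exceed the (assumed) per-line branch limit.
    """
    lines = []
    current = []
    branches = 0
    for uop in uops:
        line_full = len(current) == line_width
        too_many_branches = uop.is_branch and branches >= max_branches_per_line
        if current and (line_full or too_many_branches):
            lines.append(current)
            current, branches = [], 0
        current.append(uop)
        branches += uop.is_branch
    if current:
        lines.append(current)
    return lines


def effective_capacity(uops, rob_lines, line_width=4):
    """How many uops of the stream fit into the first `rob_lines` lines."""
    lines = pack_into_rob_lines(uops, line_width)
    return sum(len(line) for line in lines[:rob_lines])
```

With two lines of four slots, a branch-free stream fills all eight slots, while a stream made only of branches fits just two uops. That gap is the difference between the advertised size and what a given instruction mix actually gets in this model.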
Why is AVX-512 disabled on Golden Cove? Seems odd for them to work so hard on the architecture and these optimizations only to remove it. Is there a specific reason, such as power consumption or heat?
At the last minute they got strapped to efficiency cores (Gracemont) that didn't implement AVX-512, and the schedulers of OSes they cared about running couldn't handle heterogeneous core features. Because of that they probably didn't finish verification of AVX-512 on the Golden Cove cores.
I bought my i5-12500 (that has no Gracemont cores) just when it came out. Linux reported all kinds of AVX-512 extensions and I could transcode video with Handbrake using AVX-512 without any problems.
Later on Intel pushed a microcode update, and now my CPU doesn't have AVX-512 enabled anymore.
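For anyone who wants to check the same thing on their own machine, here is a small Python sketch that looks for `avx512*` feature flags and the loaded microcode revision. It assumes an x86 Linux system where `/proc/cpuinfo` has `flags` and `microcode` fields; the function names are made up for this example.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted avx512* feature flags found in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(flags)


def microcode_revisions(cpuinfo_text):
    """Return the sorted set of microcode revision strings reported per CPU."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return sorted(revisions)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        print("no /proc/cpuinfo here (not Linux?)")
    else:
        print("AVX-512 flags:", avx512_flags(text) or "none")
        print("microcode revision(s):", microcode_revisions(text) or "unknown")
```

Running it before and after a microcode or BIOS update should show whether the flags disappeared together with a change in the revision.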
Microcode updates are volatile: they are loaded on every boot rather than written permanently to the CPU, so you can roll back the microcode package in your distro and get the capability back. Some BIOSes also apply microcode; if you updated your BIOS, roll that back instead.
I updated my BIOS, and I'd rather have security fixes than AVX-512 that I don't really need, because all I do with this machine is surf the internet and watch movies. Thanks for the tip nevertheless!
I was just referring to OP who said that Intel probably didn't verify AVX-512 thoroughly, because it seems they did.
It's amazing to me that Intel can - without being held accountable, and for obvious strategic commercial gain - disable vital, working functionality in products they've already sold through the use of combined updates which are otherwise legitimately meant to address things about their products which are dangerously broken. Apply them and lose the function or be vulnerable and unstable. They can't keep getting away with this, and yet, they do.
I mean, most vehicle manufacturers don't advertise that their transmissions have a 'reverse' gear but it doesn't mean they can just remove it using a software update...
Because the 'efficiency' cores are actually optimised for silicon area efficiency (and not just low power usage, like is generally the case with things like ARM big.LITTLE setups), and AVX-512 native takes up a lot of silicon space on the chips Intel has implemented it for.
And because it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities, it's essentially not use-able.
> it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities
You would need a way for processes to signal the kernel that they have left a function that uses a higher instruction set and are thus available for moving to another core, plus a way for the kernel to signal processes to update their function pointers as appropriate for the new core they got moved to.
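A rough Python sketch of that two-way protocol, purely as an illustration. No such kernel interface exists as far as I know, so everything here is hypothetical: the `FeatureDispatch` class, the `wide_region` marker a process would use to say it is inside code that needs the wider instruction set, and the `on_migrate` callback standing in for the kernel's notification. The two sum functions are plain stand-ins for a baseline and an AVX-512 code path.

```python
from contextlib import contextmanager


def sum_baseline(values):
    # Stand-in for a code path that runs on every core.
    return sum(values)


def sum_wide(values):
    # Stand-in for a code path that would use AVX-512 on a capable core.
    return sum(values)


class FeatureDispatch:
    """Hypothetical per-thread state shared between process and scheduler."""

    def __init__(self, core_features):
        self.core_features = frozenset(core_features)
        self.in_wide_code = 0  # nesting depth of wide-only regions
        self._rebind()

    def _rebind(self):
        # "Update the function pointers" for the core we are running on.
        wide_ok = "avx512f" in self.core_features
        self.vector_sum = sum_wide if wide_ok else sum_baseline

    @contextmanager
    def wide_region(self):
        # Process -> kernel: do not move me to a narrower core right now.
        self.in_wide_code += 1
        try:
            yield
        finally:
            self.in_wide_code -= 1

    def may_migrate_to(self, target_features):
        # Kernel-side check before moving the thread.
        return self.in_wide_code == 0 or "avx512f" in target_features

    def on_migrate(self, target_features):
        # Kernel -> process: you were moved, rebind your dispatch table.
        if not self.may_migrate_to(target_features):
            raise RuntimeError("thread is inside a wide-only region")
        self.core_features = frozenset(target_features)
        self._rebind()
```

The hard part in practice would be doing this cheaply and without races, which the sketch does not attempt to model.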
> it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities
It is a software problem, and it could have been solved with software. It was simply assumed a priori that it couldn't be solved, so no one bothered to try.
This isn't a new decision. It came out a few years ago that the main and efficiency cores would end up with disparate instruction sets, and both Microsoft and Linux devs warned Intel about the difficulties that would present. Disabling the instructions came about as a response to that.
Or they talked with Microsoft, who refused to add it to anything but Windows 11. Or maybe they can't handle it because of some deficiency in their Linux ports.
The thing is it is possible and Intel knew for years that the E-cores wouldn't have AVX512.
> And because it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities, it's essentially not use-able.
Couldn't you implement the new instructions (even slowly) in microcode? That way they support the instructions with minimal silicon, even if it's as slow as or slower than other vector instructions.