Golden Cove’s Lopsided Vector Register File

bogomipz · on Dec 29, 2022

Under the "What About HyperThreading?" section the author states:

>"Ironically, a large motivation behind SMT is likely the need to improve single threaded performance. Doing so involves targeting higher performance per clock with wider and deeper cores. But scaling width and reordering capacity runs into increasingly diminishing returns"

Can someone explain what is meant here by "scaling width"? Is width the number of hardware threads on a chip? Also, does this directly relate to "reordering capacity"? If so, how?

wmf · on Dec 29, 2022

Width here means the number of instructions that can be executed in parallel. Reordering capacity, also called instruction window depth, is orthogonal (although the area of the reorder buffer is proportional to width x depth).

titzer · on Dec 29, 2022

> (although the area of the reorder buffer is proportional to width x depth).

No, the reorder buffer size is proportional to the number of μ-ops, it does not depend on the number of execution ports.

monocasa · on Dec 29, 2022

I think what the parent is hinting at is that the ROB isn't typically a simple linear array of uops, but instead generally a 2D array. Each line in a ROB normally holds a handful of instructions that can be issued without dependencies, and normally has interesting constraints like only one branch per line. The number of uops normally listed as the ROB size is generally what you get if you can achieve perfect packing per line, but it's hard to get at the specifics from microbenchmarks because you normally hit other bottlenecks first if the ROB engineers did a great job.
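A toy model of that packing effect, as a sketch only: the line count, line width and one-branch-per-line rule below are made-up illustrative parameters, not Golden Cove's actual ROB organization, and `rob_uops_held` is a hypothetical helper name.

```python
def rob_uops_held(uops, lines=4, width=4, max_branches_per_line=1):
    """Pack uops in program order into ROB lines and return how many fit.

    `uops` is a sequence of strings: "br" marks a branch, anything else
    is treated as a non-branch. All parameters are hypothetical.
    """
    held = 0
    line = 0        # index of the line currently being filled
    slots_used = 0  # uops already placed in the current line
    branches = 0    # branches already placed in the current line
    for u in uops:
        is_branch = (u == "br")
        # Open a new line when the current one is full, or when this
        # branch would exceed the per-line branch limit.
        if slots_used == width or (is_branch and branches == max_branches_per_line):
            line += 1
            slots_used = 0
            branches = 0
        if line == lines:
            break  # no lines left, the ROB is full
        slots_used += 1
        branches += is_branch
        held += 1
    return held


print(rob_uops_held(["alu"] * 100))  # 16: perfect packing, lines * width
print(rob_uops_held(["br"] * 100))   # 4: one branch per line wastes 3 slots each
```

With perfect packing the model holds the nominal 16 uops, while a branch-only stream holds just 4, which is why the advertised ROB size is better read as an upper bound.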

binarymax · on Dec 28, 2022

Why is AVX-512 disabled on golden cove? Seems odd for them to work so hard on the architecture and these optimizations only to remove it. Is there a specific reason, such as power consumption or heat?

monocasa · on Dec 28, 2022

At the last minute they got strapped to efficiency cores (Gracemont) that didn't implement AVX-512, and the schedulers of OSes they cared about running couldn't handle heterogeneous core features. Because of that they probably didn't finish verification of AVX-512 on the Golden Cove cores.

nosebear · on Dec 29, 2022

I bought my i5-12500 (that has no Gracemont cores) just when it came out. Linux reported all kinds of AVX-512 extensions and I could transcode video with Handbrake using AVX-512 without any problems.

Later on Intel pushed a microcode update, and now my CPU doesn't have AVX-512 enabled anymore.
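For anyone who wants to check what their own kernel reports, a minimal sketch: it parses the `flags` line of `/proc/cpuinfo` on x86 Linux and lists the entries that start with `avx512`. The helper name `avx512_flags` is made up for illustration.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # The first "flags" line is enough for a quick check.
        if sep and key.strip() == "flags":
            return sorted(f for f in value.split() if f.startswith("avx512"))
    return []


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print(avx512_flags(fh.read()) or "no AVX-512 flags reported")
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

Running it before and after a microcode or BIOS update would show whether the flags disappeared.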

dmitrygr · on Dec 29, 2022

Microcode updates are volatile: they are reloaded at every boot, so you can roll back the microcode package in your distro and get the capability back. Some BIOSes also apply microcode. If you updated your BIOS, roll that back instead.

nosebear · on Dec 29, 2022

I updated my BIOS, and I'd rather have security fixes than AVX-512 that I don't really need because all I do with this machine is surfing the internet and watching movies. Thanks for the tip nevertheless!

I was just referring to OP who said that Intel probably didn't verify AVX-512 thoroughly, because it seems they did.

justinjlynn · on Dec 29, 2022

It's amazing to me that Intel can - without being held accountable, and for obvious strategic commercial gain - disable vital, working functionality in products they've already sold through the use of combined updates which are otherwise legitimately meant to address things about their products which are dangerously broken. Apply them and lose the function or be vulnerable and unstable. They can't keep getting away with this, and yet, they do.

wmf · on Dec 30, 2022

In this case AVX-512 was never advertised so Intel feels justified in removing an "easter egg".

justinjlynn · on Jan 3, 2023

I mean, most vehicle manufacturers don't advertise that their transmissions have a 'reverse' gear but it doesn't mean they can just remove it using a software update...

berkut · on Dec 29, 2022

Because the 'efficiency' cores are actually optimised for silicon area efficiency (and not just low power usage, as is generally the case with things like ARM big.LITTLE setups), and native AVX-512 takes up a lot of silicon space on the chips Intel has implemented it for.

And because it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities, it's essentially not use-able.

pabs3 · on Dec 29, 2022

> it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities

You would need a way for processes to signal the kernel that they have left a function that uses a higher instruction set and are thus available for moving to another core, plus a way for the kernel to signal processes to update their function pointers as appropriate for the new core they got moved to.

https://wiki.debian.org/InstructionSelection
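A rough sketch of the second half of that idea, with everything in it hypothetical: the kernel notification is modelled as a plain method call, the feature names are placeholders, and `sum_wide` merely stands in for a path that would use wider instructions. The process keeps one function pointer per routine and re-resolves it when told that the core's capabilities have changed.

```python
def sum_scalar(xs):
    # Baseline path that would run on any core.
    total = 0
    for x in xs:
        total += x
    return total


def sum_wide(xs):
    # Stand-in for a path that would rely on wide vector instructions.
    return sum(xs)


class Dispatcher:
    """Holds the currently selected implementation of each routine."""

    def __init__(self, features):
        self.on_migrate(features)

    def on_migrate(self, features):
        # Hypothetical callback: invoked when the process lands on a core
        # with a different feature set, so the pointer is re-resolved.
        self.sum = sum_wide if "avx512f" in features else sum_scalar


d = Dispatcher({"avx2", "avx512f"})
print(d.sum.__name__)   # sum_wide
d.on_migrate({"avx2"})  # moved to a core without the wide instructions
print(d.sum.__name__)   # sum_scalar
```

The other half, telling the kernel that the process has left a wide-instruction function and can safely be moved, is not modelled here.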

moonchild · on Dec 29, 2022

> it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities

It is a software problem, and it could have been solved with software. It was simply assumed a priori that it couldn't be solved, so no one bothered to try.

Twirrim · on Dec 29, 2022

This isn't a new decision, it came out a few years ago that you'd end up with disparate instruction sets between the main and efficiency cores, and they got warned it would be a problem. Their disabling of the instructions to avoid it came about as a response. Both Microsoft and Linux devs warned them about the difficulties it'd present.

wmf · on Dec 29, 2022

Or they talked to Microsoft who said it would take years to add that feature.

undersuit · on Dec 29, 2022

Or they talked with Microsoft who refused to add it to anything but Windows 11. Or maybe they can't handle it because of some deficiency in their Linux ports.

The thing is it is possible and Intel knew for years that the E-cores wouldn't have AVX512.

moonchild · on Dec 29, 2022

Let it take years, then. Leaving it disabled in the bios by default is one thing, and a rather reasonable one at that. Fusing it off is quite another.

_kbh_ · on Dec 29, 2022

> And because it's not clear how OS schedulers could nicely switch on-the-fly between cores with different instruction set capabilities, it's essentially not use-able.

Couldn't you implement the new instructions (even slowly) in microcode, so the cores support the instructions with minimal silicon, even if it's as slow as or slower than other vector instructions?

monocasa · on Dec 29, 2022

You generally can't add new huge ISA extensions like that in microcode without hardware support.

_kbh_ · on Dec 29, 2022

Ah yeah, when I wrote that I forgot that AVX-512 brought in a tonne of new operations as well as a larger register size.

puffoflogic · on Dec 29, 2022

I tell you what, Intel does a lot of work to be #2. Their inevitable demise will be much more painful the longer it is delayed.