
ROCm multiplies in 4.5ms and the author multiplies in 2.8ms. The naive algorithm is 136ms. I don't think anyone at AMD is losing sleep over this; for a general-purpose library this isn't horrible performance. It could be better; hand-optimising for specific conditions often is. But as this blog post shows, optimising kernels is the sort of thing people can do for fun and blog about if they care. They don't need AMD to be involved.
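For context, the "naive algorithm" in that 136ms figure is just the textbook triple loop. A minimal CPU-side sketch (illustrative only; the post's actual kernels run on the GPU, and these names are mine, not the author's):

```python
# Naive O(n^3) matrix multiply: one accumulator per output element,
# no tiling and no data reuse. This is the baseline that optimised
# GEMM kernels beat by orders of magnitude.
def naive_matmul(a, b):
    n = len(a)        # rows of A
    k = len(a[0])     # cols of A == rows of B
    m = len(b[0])     # cols of B
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c
```

Every output element re-reads a full row of A and column of B from memory, which is why the speedups in the post come mostly from tiling and reusing loaded values rather than from doing fewer multiplies.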

The problem with ROCm isn't that it only half-utilises the hardware. The problem was that someone trying to write this blog post in 2020 would quite likely have had a heading somewhere around Kernel 0 describing how the software crashed or the kernel panicked when they tried to run the benchmarks. That's what happened to me when I attempted a conceptually similar exercise. I was wandering around HN posting comments about how there were no articles like this one for AMD hardware and musing about whether it was even technically possible.

This makes me wish I'd bought an RDNA3 card instead of an Nvidia one for my last purchase. Not that I really regret the choice: AMD are going to have to show they're interested in supporting consumer cards for a while longer before I trust them again, although they're on the right path.



AMD isn't losing sleep over the fact that J. Random Blogger is beating their GEMM by 60% on 4096x4096? What universe are you living in? This company is fighting for its life against CUDA, and you're telling me that their software stack being so bad it can't use a third of the hardware on the first and literally only thing people want it to do is somehow not a problem?


The point of a platform is for software engineers to provide key functionality independently. Your issue here is you don't understand why CUDA has been so dominant over the last decade - a ~50% software performance gap isn't that material when hardware capacity doubles every generation. If we've reached the point where J. Random Blogger can solve their own problems then the CUDA moat has quite possibly been broken.

If AMD were only one hardware generation behind Nvidia they'd be pretty competitive. People are happy using CPUs several generations behind the cutting edge. And it isn't even that bad, because anyone who particularly cares can optimise their own kernels and avoid rocBLAS.


Though people may use CPUs that are several years old, those CPUs generally weren't old at the moment they were bought, and the purchase came from comparing against the competition. The argument "my software will be faster when computers are faster" doesn't hold when the competition also benefits from Moore's law: nothing changes in relative terms, which is what matters, until you actually improve your slow software.

And while it may be possible to improve the software on the user's end, don't people base their decisions on benchmarks of existing (not potential) software? They find comparisons using the stock kernels, find AMD to be slower, and remain unaware that a 30% speedup might be available. Even if they stumbled on this article, would they trust themselves to pull it off, or simply go with the GPU that performs best with existing libraries?

These are machines sold for crunching numbers, they might as well crunch numbers as best they can...


At the risk of repeating myself, you're not anywhere close to grappling with how bad the situation has been on AMD cards. If they could consistently half-saturate the hardware they'd have a place in the AI revolution instead of being left out in the cold. The traditional achievement of an AMD card, in practice, is 0% hardware saturation, because when people tried to multiply matrices there was a good chance the system would crash.

That type of commercial logic isn't the deciding factor in the real world. 50% saturation with the option to fully saturate is amazing by AMD's standards, and they have much bigger problems than this affecting people's buying decisions. If they had been able to hit this standard in 2020, I would still be buying AMD.


Follow Anush on Twitter and give him feedback. He's actively listening.

https://x.com/AnushElangovan




