
ROCm multiplies in 4.5ms and the author multiplies in 2.8ms. The naive algorithm is 136ms. I don't think anyone at AMD is losing sleep over this; for a general-purpose library this isn't horrible performance. It could be better; hand-optimising for specific conditions often is. But as this blog post shows, optimising kernels is the sort of thing people can do for fun and blog about if they care. They don't need AMD to be involved.
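For context, the "naive algorithm" in that 136ms figure is just the textbook triple loop. A minimal CPU-side sketch (illustrative only; the post's actual kernels run on the GPU, and these names are mine, not the author's):

```python
# Naive O(n^3) matrix multiply: one accumulator per output element,
# no tiling and no data reuse. This is the baseline that optimised
# GEMM kernels beat by orders of magnitude.
def naive_matmul(a, b):
    n = len(a)        # rows of A
    k = len(a[0])     # cols of A == rows of B
    m = len(b[0])     # cols of B
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c
```

Every output element re-reads a full row of A and column of B from memory, which is why the speedups in the post come mostly from tiling and reusing loaded values rather than from doing fewer multiplies.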

The problem with ROCm isn't that it only half-utilises the hardware. The problem was that someone trying to write this blog post in 2020 would quite likely have had a heading somewhere around Kernel 0 describing how the software crashed or the kernel panicked when they tried to run the benchmarks. That's what happened to me when I attempted a conceptually similar exercise. I was wandering around HN posting comments about how there were no articles like this one for AMD hardware and musing about whether it was even technically possible.

This makes me wish I'd bought an RDNA3 card instead of an Nvidia one for my last purchase. Not that I really regret the choice: AMD are going to have to show they're interested in supporting consumer cards for a while longer before I trust them again, although they're on the right path.



AMD isn't losing sleep over the fact that J. Random Blogger is beating their GEMM by 60% on 4096x4096? What universe are you living in? This company is fighting for its life against CUDA, and you're telling me that their software stack being so bad it can't use a third of the hardware on the first and literally only thing people want it to do is somehow not a problem?


The point of a platform is for software engineers to provide key functionality independently. Your issue here is you don't understand why CUDA has been so dominant over the last decade - a ~50% software performance gap isn't that material when hardware capacity doubles every generation. If we've reached the point where J. Random Blogger can solve their own problems then the CUDA moat has quite possibly been broken.

If AMD were only one hardware generation behind Nvidia they'd be pretty competitive. People are happy using CPUs several generations behind the cutting edge. And it isn't even that bad, because anyone who particularly cares can optimise their own kernels and avoid rocBLAS.


Though people may use CPUs that are several years old, those CPUs generally weren't old at the moment they were bought, and the purchase came from comparing against the competition. The argument "my software will be faster when computers are faster" doesn't hold when the competition also benefits from Moore's law: nothing changes in relative terms, which is what matters, until you actually improve your slow software.

And while it may be possible to improve the software on the user's end, don't people base their decisions on benchmarks of existing (not potential) software? They find comparisons using the stock kernels, find AMD to be slower, and remain unaware that a 30% speedup might be available. Even if they stumbled on this article, would they trust themselves to pull it off, or simply go with the GPU that performs best with existing libraries?

These are machines sold for crunching numbers, they might as well crunch numbers as best they can...


At the risk of repeating myself, you're not anywhere close to grappling with how bad the situation has been on AMD cards. If they could consistently half-saturate the hardware they'd have a place in the AI revolution instead of being left out in the cold. The traditional achievement of an AMD card, in practice, is 0% hardware saturation, because when people tried to multiply matrices there was a good chance the system would crash.

That type of commercial logic isn't the deciding factor in the real world. 50% saturation with the option to fully saturate is amazing by AMD's standards, and they have much bigger problems than this affecting people's buying decisions. If they had been able to hit this standard in 2020, I would still be buying AMD.


Follow Anush on Twitter and give him feedback. He's actively listening.

https://x.com/AnushElangovan




