This is the thing about Nvidia. Even if some hardware beats them in a benchmark,...

jeroenhd · on Dec 13, 2023

The one being benchmarked here is heavily optimised for Apple Silicon. I think there are a few algorithms that Apple uses (like the one tagging faces on iPhones) that are heavily optimised for Apple's own hardware.

I think Apple's API would be as popular as CUDA if you could rent their chips at scale. They're quite efficient machines that don't need a lot of cooling, so I imagine the OPEX of keeping them running 24/7 in big cloud racks would be pretty low if they were optimised for server usage.

Apple seems to focus their efforts on bringing purpose-built LLMs to Apple machines. I can see why it makes sense (just like Google's attempts to bring Tensor cores to mobile) but there's not much practical use in this technology right now. Whisper is the first usable technology like this, but even my Android phone can live-transcribe speech into text as an accessibility feature, so I don't think Apple can sell Whisper as a product to end users.

jdminhbg · on Dec 13, 2023

> The one being benchmarked here is heavily optimised for Apple Silicon.

I don't think so, at least not in the sense of a hand-optimized CUDA implementation. This is just using the MLX API in the same way that you'd use CUDA via PyTorch or something.

mac-mc · on Dec 13, 2023

Apple would need to make rackmount versions of the machines with replaceable storage and maybe RAM, and would really need to beef up their headless management systems for the machines before they start becoming competitive.

Otherwise you need a whole bunch of custom Mac mini-style racks and management software, which really increases costs and lead times. If you don't believe me, look how expensive AWS macOS machines are compared to Linux ones with equivalent performance.

poyu · on Dec 13, 2023

They already make rackmount Mac Pros. But yeah, they need to up their game on the management software.

mac-mc · on Dec 18, 2023

Those are not at all efficient for a data center in terms of size and cost compared to equivalent Mac minis or Mac Studios on a tray. The rackmount format was made for music production and is very quiet for that reason.

rfoo · on Dec 13, 2023

> but I can't think of a single one for Apple Silicon.

The post here is exactly one for Apple Silicon. It compared a naive implementation in PyTorch, which may not even keep a 4090 busy (for smaller/not-that-compute-intensive models, having the entire computation driven by Python is... limiting, which is partly why torch.compile gives amazing improvements), to a purposely optimized one (optimized for both CPU and GPU efficiency) for Apple Silicon.

brucethemoose2 · on Dec 14, 2023

The PyTorch performance is awful though. You'd have to be kinda crazy to not use an optimized implementation.

MBCook · on Dec 13, 2023

I wouldn’t be surprised if a $2k top-of-the-line GPU matches or beats the built-in accelerator on a Mac. Even if the Mac were slightly faster, you could just stick multiple GPUs in a PC.

To me the news here is how well the Mac runs without needing that additional hardware/large power draw on this benchmark.

NorwegianDude · on Dec 14, 2023

The power draw is not impressive here. Sure, it's low, but if you account for performance/W then the GPU is much more efficient.
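
To make the performance/W point concrete, here is a rough sketch of the arithmetic in Python. The figures are made-up placeholders, not measurements from this benchmark, and the names `perf_per_watt` and `systems` are hypothetical. It only shows how a part that draws far more power can still come out ahead once throughput is divided by watts.

```python
def perf_per_watt(throughput, watts):
    """Return throughput divided by power draw (work per watt)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return throughput / watts


# Hypothetical placeholder figures, NOT taken from the benchmark in the post.
# throughput is in arbitrary "units of work per second".
systems = {
    "discrete GPU": {"throughput": 10.0, "watts": 400.0},
    "integrated SoC": {"throughput": 1.0, "watts": 50.0},
}

# Efficiency of each system under the assumed numbers.
efficiency = {name: perf_per_watt(**spec) for name, spec in systems.items()}

# The system with the highest work per watt.
most_efficient = max(efficiency, key=efficiency.get)

for name, value in efficiency.items():
    print(f"{name}: {value:.4f} units of work per second per watt")
print("most efficient under these assumptions:", most_efficient)
```

With these assumed numbers the discrete GPU draws eight times the power but does ten times the work, so it wins on efficiency. Plugging in real measured throughput and wall power is what would actually settle the argument.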