From the GPU compute perspective, Vulkan has a subset of the features that Metal has.
Metal uses a device programming language that is C++ based, has pointers... and uses AIR (an LLVM IR dialect) as the bytecode.
Vulkan would be a big downgrade in feature levels from that.
While it is very much possible (and done) to have a compliant OpenCL implementation over Metal or D3D12, such a thing isn't possible at all for Vulkan.
Metal is also a subset of Vulkan. As Raph Levien points out here [1], there are certain important algorithms (decoupled lookback) that can't be implemented at all on Metal.
> One controversial aspect of the original decoupled look-back algorithm is that it depends on a forward progress guarantee from the GPU
Apple GPUs aren't very amenable to implementing recursion or giving forward progress guarantees within a SIMD group by design. I'll have to check if a device-wide barrier type even exists on that TBDR...
It isn't like NVIDIA hardware (since Volta) where you have a separate instruction pointer for each (SIMT-but-not-quite) thread.
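To make the forward-progress point concrete, here is a rough CPU-side sketch of the decoupled look-back idea, using one `std::thread` per partition in place of a GPU workgroup. The function name `lookback_scan`, the flag names, and the packing of flag and value into one 64-bit word are assumptions made for illustration, not code from the paper or from any GPU API. The thing to notice is the spin loop: a partition waits until its predecessor has published something, which only terminates if that predecessor is guaranteed to keep running.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Status of a partition, stored in the high 32 bits of its state word.
enum : std::uint64_t { kNotReady = 0, kAggregate = 1, kPrefix = 2 };

// Pack flag and value together so one atomic load sees both consistently.
inline std::uint64_t pack(std::uint64_t flag, std::uint32_t value) {
    return (flag << 32) | value;
}

// Hypothetical helper: inclusive prefix sum computed in a single pass,
// with one thread per partition (a stand-in for one GPU workgroup).
std::vector<std::uint32_t> lookback_scan(const std::vector<std::uint32_t>& data,
                                         std::size_t partition_size) {
    assert(partition_size > 0);
    const std::size_t n = data.size();
    const std::size_t parts = (n + partition_size - 1) / partition_size;
    std::vector<std::atomic<std::uint64_t>> state(parts);
    for (auto& s : state) s.store(pack(kNotReady, 0), std::memory_order_relaxed);
    std::vector<std::uint32_t> out(n);

    auto worker = [&](std::size_t p) {
        const std::size_t begin = p * partition_size;
        const std::size_t end = std::min(n, begin + partition_size);

        // 1. Reduce the local partition and publish the result.
        std::uint32_t aggregate = 0;
        for (std::size_t i = begin; i < end; ++i) aggregate += data[i];
        state[p].store(pack(p == 0 ? kPrefix : kAggregate, aggregate),
                       std::memory_order_release);

        // 2. Look back over predecessors until an inclusive prefix is found.
        std::uint32_t exclusive = 0;
        for (std::size_t q = p; q-- > 0;) {
            std::uint64_t s;
            // This spin only ends if partition q makes forward progress.
            while (((s = state[q].load(std::memory_order_acquire)) >> 32) == kNotReady)
                std::this_thread::yield();
            exclusive += static_cast<std::uint32_t>(s);
            if ((s >> 32) == kPrefix) break;
        }
        if (p != 0)
            state[p].store(pack(kPrefix, exclusive + aggregate),
                           std::memory_order_release);

        // 3. Write the final scan for this partition.
        std::uint32_t running = exclusive;
        for (std::size_t i = begin; i < end; ++i) {
            running += data[i];
            out[i] = running;
        }
    };

    std::vector<std::thread> threads;
    threads.reserve(parts);
    for (std::size_t p = 0; p < parts; ++p) threads.emplace_back(worker, p);
    for (auto& t : threads) t.join();
    return out;
}
```

On a CPU the OS scheduler eventually runs every thread, so the spin always ends. On a GPU that does not guarantee forward progress between workgroups, the same loop can hang if the workgroup being waited on is never scheduled, which is the controversial dependency the quote refers to.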
Neither is a subset; they're mostly overlapping sets. For instance, Metal has features for tile-based rendering that Vulkan doesn't, at least as of a couple of years ago.
While this might work on your specific device, it is an undocumented interface and there is no guarantee at all for future compatibility.
If you do get a definitive answer from Apple you can share, please follow up here, as I would like to be able to cite it. I would be quite shocked if it's different than what I just said, though.
It's what is used for OpenCL on Metal (which is the impl present on M1) to provide the semantics there. AIR is a stable, forward-compatible bytecode. Will ask and see what Apple says...
edit: thinking about this, OpenCL doesn't actually need those semantics either
The OpenCL 1.2 barrier() function is threadgroup scope, same as threadgroup_barrier on Metal. OpenCL 2.0 introduced a proper barrier function (work_group_barrier), which takes a memory scope parameter, which can be memory_scope_device (all this is pretty similar to the Vulkan memory model, and at least some of the same people worked on both). I know of no way to reliably support those semantics on Metal.
Claiming OpenCL bitcode is "forward compatible" is a pretty strong claim considering that OpenCL has been deprecated for over 3 years, and the main thing you get when searching OpenCL docs on the Apple site is an exhortation to migrate to Metal. To the extent there's a forward compatibility guarantee for AIR, I'm sure it only applies to output generated by official Apple tools, and I'm pretty sure by now there's no way to get those to output a device-scope barrier.
"Subset" is not the right word here. Vulkan has pointers now, but (as was discussed in a recent thread), there are serious limitations compared with "real" C++. At the same time, Metal has its own limitations, not least of which is that it lacks acquire/release semantics on atomics and a device-scoped barrier.
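For anyone unsure what acquire/release buys you, here is a minimal C++ sketch of the message-passing pattern it enables. This is ordinary host-side C++ (`std::atomic`), not Metal or Vulkan shader code, and `message_pass` is a made-up name for illustration. The release store on the flag guarantees that the plain write to `payload` is visible to any thread whose acquire load sees the flag set; with only relaxed atomics there is no such guarantee without an additional fence.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: hand a value from one thread to another through
// plain memory, using an atomic flag to order the accesses.
int message_pass(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = value;                               // write the data
        ready.store(true, std::memory_order_release);  // then publish it
    });
    std::thread consumer([&] {
        // Acquire pairs with the release store above, so once the flag
        // is observed as true, the write to payload is visible too.
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Calling `message_pass(42)` returns 42. Single-pass algorithms such as decoupled look-back rely on exactly this kind of ordering between a flag and the data it guards.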
OpenCL is a little strange because older variants don't have advanced atomics (or subgroups), but do have pointers. I'd be curious to know what specific thing is available on DX12 and Metal but missing in Vulkan, especially because I'm not aware of any DX12 feature on the critical path for OpenCL that's missing from Vulkan (at least as an extension).
For OpenCL on DX12, the test suite doesn't pass yet. Every Khronos OpenCL 1.2 CTS test passes on at least one hardware driver, but no single driver passes them all. That is why CLon12 hasn't been submitted to Khronos's compliant products list yet.