Is insanely-fast-whisper fast enough to actually run on the CPU and still transcribe in real time? I see that none of these are running quantized models; they're all still fp16. Seems like there's more speed left to be found.
Edit: I see it doesn't support CPU inference yet. Should be interesting once it's added.
Insanely fast whisper mainly takes advantage of a GPU’s parallelization capabilities by increasing the batch size from 1 to N. I doubt it would meaningfully improve CPU performance unless you’re finding that running whisper sequentially leaves a lot of your CPU cores idle or underutilized. It may be more complicated if you have a matrix co-processor available, but I’m really not sure.
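To make the batch-size point concrete, here's a rough sketch of the idea in plain Python. It is not insanely-fast-whisper's actual code: the helper names (chunk_spans, make_batches, model_calls) are made up for illustration, and it ignores the overlap between chunks that a real pipeline would use. The only fact it leans on is that Whisper works on 30-second windows of audio.

```python
CHUNK_SECONDS = 30  # Whisper consumes audio in 30-second windows


def chunk_spans(duration_s, chunk_s=CHUNK_SECONDS):
    """Split [0, duration_s) into consecutive (start, end) spans of at most chunk_s seconds."""
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans


def make_batches(spans, batch_size):
    """Group spans into lists of up to batch_size; each list would be one batched model call."""
    return [spans[i:i + batch_size] for i in range(0, len(spans), batch_size)]


def model_calls(duration_s, batch_size):
    """Number of batched calls needed to cover duration_s seconds of audio."""
    return len(make_batches(chunk_spans(duration_s), batch_size))


if __name__ == "__main__":
    ten_minutes = 600
    for n in (1, 8, 24):
        print(f"batch size {n}: {model_calls(ten_minutes, n)} model call(s)")
```

The total amount of work is the same either way. Batching only helps if the hardware can process the N chunks of one call in parallel, which is why the gain on a GPU doesn't automatically carry over to a CPU that is already busy at batch size 1.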