Experiments With Whisper ASR: No Speedup On V100 GPU From #370

As we continue our ASR experiments with OpenAI's Whisper, we examined a recent patch (#370) that restores proper attention caching, which some users report yields up to a 30% speedup on certain GPU hardware. Repeatedly toggling between the patched and original code across a range of broadcasts on a V100 GPU on GCE, we found no measurable change in inference time, suggesting the performance gains from this patch may be limited to consumer GPU hardware.
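The A/B comparison described above amounts to timing the same workload under each build and comparing the results. A minimal sketch of such a harness in Python, where `benchmark`, `fake_transcribe`, and the clip names are hypothetical placeholders (the real runs would call Whisper's `model.transcribe` on audio files):

```python
import statistics
import time

def benchmark(transcribe_fn, audio_clips, repeats=5):
    """Run the full clip set `repeats` times and return median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for clip in audio_clips:
            transcribe_fn(clip)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Hypothetical stand-in for a transcription call; in the real experiment
# this would invoke the patched or unpatched Whisper model on each clip.
def fake_transcribe(clip):
    return clip.upper()

median_seconds = benchmark(fake_transcribe, ["clip_a", "clip_b"])
```

Using the median rather than the mean makes the comparison less sensitive to one-off slow runs (cold caches, background load), which matters when the effect being measured may be small.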
