Experiments With Whisper ASR: No Speedup On V100 GPU From #370

As we continue our ASR experiments with OpenAI's Whisper, we examined a recent patch (#370) that restores proper attention caching, which some users report yields up to a 30% speedup on certain GPU hardware. Repeatedly toggling between the patched and original code across a range of broadcasts on a V100 GPU on GCE, we found no measurable change in inference time, suggesting the performance gains from this patch may be limited to consumer GPU hardware.
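The A/B comparison described above amounts to timing the same workload under each build and comparing the results. A minimal sketch of such a harness in Python, where `benchmark`, `fake_transcribe`, and the clip names are hypothetical placeholders (the real runs would call Whisper's `model.transcribe` on audio files):

```python
import statistics
import time

def benchmark(transcribe_fn, audio_clips, repeats=5):
    """Run the full clip set `repeats` times and return median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for clip in audio_clips:
            transcribe_fn(clip)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Hypothetical stand-in for a transcription call; in the real experiment
# this would invoke the patched or unpatched Whisper model on each clip.
def fake_transcribe(clip):
    return clip.upper()

median_seconds = benchmark(fake_transcribe, ["clip_a", "clip_b"])
```

Using the median rather than the mean makes the comparison less sensitive to one-off slow runs (cold caches, background load), which matters when the effect being measured may be small.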
