Experiments With Whisper ASR: Model Parameters & Non-Determinism: temperature_increment_on_fallback

Across our experiments with OpenAI's Whisper ASR this week, its unprecedented fluency has been challenged by its high non-determinism: results vary substantially from run to run and are often interrupted by high levels of dropouts, repetition and hallucination.

Whisper's creators acknowledge these issues and in a previous Q&A noted that non-determinism "happens when the model is unsure about the output (according to the compression_ratio_threshold and logprob_threshold settings). The most common failure mode is that it falls into a repeat loop, where it likely triggers the compression_ratio_threshold. The default setting tries temperatures 0, 0.2, 0.4, 0.6, 0.8, 1.0 until it gives up, at which point it is less likely to be in a repeat loop but is also less likely to be correct."
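The escalation the creators describe can be sketched as a retry loop: decode at temperature 0, and if the output fails either quality check, retry at the next higher temperature. The sketch below is a simplified illustration, not Whisper's actual decoder; the decode callable is hypothetical, while the threshold defaults match the Whisper CLI's documented compression_ratio_threshold and logprob_threshold settings. A gzip compression ratio serves as the repeat-loop detector, since highly repetitive text compresses far better than normal speech.

```python
import zlib

# Default thresholds from the Whisper CLI
# (--compression_ratio_threshold, --logprob_threshold).
COMPRESSION_RATIO_THRESHOLD = 2.4
LOGPROB_THRESHOLD = -1.0
TEMPERATURES = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)

def gzip_compression_ratio(text: str) -> float:
    """Repetitive text compresses well, so a high ratio signals a repeat loop."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def decode_with_fallback(decode, segment):
    """Sketch of the temperature escalation: retry at higher temperatures until
    the output passes both quality checks or the ladder is exhausted.
    `decode` is a hypothetical callable returning (text, avg_logprob)."""
    text = ""
    for t in TEMPERATURES:
        text, avg_logprob = decode(segment, temperature=t)
        if (gzip_compression_ratio(text) <= COMPRESSION_RATIO_THRESHOLD
                and avg_logprob >= LOGPROB_THRESHOLD):
            break  # output looks sane; stop escalating
    return text
```

Note that when every temperature fails the checks, the last (highest-temperature) output is kept anyway, which is why the final result is "less likely to be in a repeat loop but is also less likely to be correct."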

Their recommended solution was to add "--temperature_increment_on_fallback None" as a CLI parameter to stabilize the output. To test its efficacy, we ran all four models in both transcription and translation tasks, with the results below. While it did eliminate non-determinism, it did so at the cost of losing more than half the broadcast, which was replaced with repeated text.
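This trade-off follows from how temperature interacts with decoding. The toy sketch below (not Whisper's actual decoder) illustrates why pinning the temperature to 0 makes output deterministic: greedy argmax always selects the same token from a given set of logits, while sampling at any positive temperature can differ from run to run; with fallback disabled, a repeat loop at temperature 0 is simply kept rather than retried.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits.
    temperature == 0 means greedy argmax (deterministic); otherwise sample
    from the temperature-scaled softmax (varies across runs/seeds)."""
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]
```

At temperature 0 every run yields the identical transcript, but there is no escape hatch when that single greedy path degenerates into repetition, which matches the behavior we observed above.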