Whisper Vs Chirp: The Hidden GPU Cost Of "Free" AI Models & Why Commercial Hosted Models Can Be Far Cheaper

The rapid proliferation of impressively capable "free" open source AI models, with one-click installations and simplified workflows, means companies increasingly begin their AI journeys through these no-cost tools. As entry points for exploring new capabilities they can be powerful, but companies often continue down the open source path under the assumption that it also offers a far lower long-term cost of ownership than commercially hosted solutions. They fixate on the zero licensing cost of open models versus the metered billing of hosted APIs, forgetting to consider the total cost of ownership (TCO) of open models: especially the cost of all those GPUs, not to mention the opportunity cost of the far lower scalability and speed of self-hosted models compared with production-grade hosted APIs.

The cloud model upended the notion of computational cost: CPUs became rentable by the second, permitting nearly infinite scalability in what came to be known as "cloud bursting." This has led developers to fixate increasingly on the licensing cost of software and presume that open source or no-charge software is far cheaper to operate, since hardware can simply be rented by the second to run it. Few actually run the numbers on the cost of running that "free" software on purchased local or rented cloud hardware. Worse, they run just a few files and extrapolate to their entire collections, assuming the cost and speed of running 10 files scale perfectly linearly to 10 million.

The reality is that GPU costs, whether purchased or cloud rented, are far from inconsequential at scale. Renting a V100 and its associated VM for 30 minutes to run a quick inference test might cost only a few dollars, bolstering the argument that a "free" piece of software will save vast amounts of money over a hosted API. Purchasing or renting tens of thousands of V100s and their corresponding VMs, however, costs vastly more than a few dollars, a reality companies often overlook until they begin to scale up.
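The gap between the test and the fleet is simple multiplication, which makes it easy to sketch. The hourly rate below is an assumed placeholder for a V100 VM, not a quote from any provider; substitute your own pricing:

```python
# Illustrative back-of-the-envelope GPU cost extrapolation.
# V100_VM_HOURLY_USD is an assumption, not an actual cloud price.

V100_VM_HOURLY_USD = 2.50  # assumed on-demand rate for a V100 + host VM


def rental_cost(num_gpus: int, hours: float,
                hourly_rate: float = V100_VM_HOURLY_USD) -> float:
    """Total on-demand cost of renting num_gpus GPU VMs for the given hours."""
    return num_gpus * hours * hourly_rate


# A quick 30-minute single-GPU test: pocket change.
test_cost = rental_cost(num_gpus=1, hours=0.5)  # $1.25

# The same rate applied to a 10,000-GPU fleet for one month: a very different bill.
fleet_cost = rental_cost(num_gpus=10_000, hours=24 * 30)  # $18,000,000

print(f"30-minute test: ${test_cost:,.2f}")
print(f"10,000 GPUs for a month: ${fleet_cost:,.2f}")
```

The arithmetic is trivial, which is precisely why it is so often skipped: the test bill looks like a rounding error, while the production bill is eight figures.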

Few developers consider the opportunity costs of free versus licensed software. Imagine a collection of files to be run through an AI ASR model. In scenario one, a free tool runs on rented or purchased GPUs and takes six months to complete the entire collection. In scenario two, a commercial hosted API processes the entire collection in a single afternoon. Even if the ultimate monetary cost of the two were identical, the former carries immense opportunity costs: six months of lost time during which the transcripts could have powered entirely new applications, not to mention the developer and administrator costs of overseeing those workflows and the underlying hardware.
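The wall-clock difference comes down to parallelism. As a hypothetical sketch (the archive size, transcription speed, and concurrency limits below are all assumptions, not benchmarks):

```python
# Illustrative wall-clock comparison for transcribing a large audio archive.
# All numbers are hypothetical assumptions for the sake of the arithmetic.

ARCHIVE_HOURS = 100_000  # assumed archive size: 100,000 hours of audio

# Scenario 1: a single self-hosted GPU transcribing at an assumed 5x real time.
gpu_speedup = 5
gpu_days = ARCHIVE_HOURS / gpu_speedup / 24  # ~833 days on one GPU

# Scenario 2: a hosted API fanning the same work out across an assumed
# 4,000 concurrent requests, each at the same 5x-real-time per stream.
parallel_streams = 4_000
api_hours = ARCHIVE_HOURS / (gpu_speedup * parallel_streams)  # 5 hours

print(f"Single GPU: {gpu_days:,.0f} days")
print(f"Hosted API: {api_hours:,.0f} hours")
```

Closing that gap with self-hosted hardware means buying or renting thousands of GPUs yourself, which circles back to the fleet-scale costs above.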

We explored this tradeoff in detail earlier this year by applying two large speech model (LSM) ASR systems, OpenAI's no-cost Whisper on a cloud VM and GCP's commercial hosted USM-based Chirp, to real-world content. The results were striking: for our own use case, self-hosting Whisper would cost $163,680 per year, not counting developer and administrator costs and assuming no model updates of any kind, while Chirp cost just $38,880 and provides transparent access to continual model updates.
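The shape of that comparison can be sketched generically: an always-on GPU footprint bills by the hour whether or not it is processing, while a metered API bills only per minute of audio. The rates and volumes below are illustrative placeholders, not the actual figures behind the totals above:

```python
# Hypothetical TCO comparison: self-hosted GPU inference vs. a metered hosted ASR API.
# All rates and volumes are illustrative assumptions; plug in real pricing.

HOURS_PER_YEAR = 24 * 365  # an always-on fleet bills around the clock


def self_hosted_annual_cost(gpu_vm_hourly_usd: float, num_vms: int) -> float:
    """Annual cost of keeping num_vms GPU VMs running continuously.
    Excludes developer/administrator time and model-update work."""
    return gpu_vm_hourly_usd * num_vms * HOURS_PER_YEAR


def hosted_api_annual_cost(per_minute_usd: float, audio_minutes: float) -> float:
    """Annual cost of a metered hosted ASR API: pay only for audio processed."""
    return per_minute_usd * audio_minutes


# Example: two always-on GPU VMs at an assumed $2.50/hr each...
self_hosted = self_hosted_annual_cost(2.50, num_vms=2)  # $43,800

# ...versus an assumed $0.016/min API rate over 2 million minutes of audio.
hosted = hosted_api_annual_cost(0.016, 2_000_000)  # $32,000

print(f"Self-hosted: ${self_hosted:,.0f}/yr vs hosted API: ${hosted:,.0f}/yr")
```

The key asymmetry is that the self-hosted line scales with hours of provisioned hardware, while the hosted line scales with hours of actual audio, so idle capacity is pure cost on one side and free on the other.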

This offers a critical reminder that when weighing the cost of "free" software, especially GPU-hungry AI tools, the cost of the underlying hardware, the supporting personnel, and above all the opportunity cost of extended processing times must be factored into the total cost of ownership: a previously sacrosanct discipline that appears to be fraying in the era of AI, just when it matters most.