At-Scale OCR Of Television News Experiments: OCR'ing 10 Billion Seconds Of Global TV News For Just $47.5K Vs $26.9M

In collaboration with the Internet Archive's Television News Archive, we have successfully OCR'd 4.2 million television news broadcasts from around the world, totaling 10.7 billion seconds of airtime (179.5M minutes / 2.99 million hours). Since we OCR each broadcast as a sequence of 1fps still frames (one frame extracted per second of airtime), this means we have OCR'd 10.7 billion total images through GCP's Cloud Vision API thus far. To dramatically reduce costs and improve performance, instead of OCR'ing each frame as its own image, we consolidate frames into image grid montages that leverage Cloud Vision's enhanced OCR performance to OCR many frames in a single request. The actual number of images (montage grids) submitted to Cloud Vision totals just 71.7 million: we have OCR'd 10.7 billion frames for the cost of OCR'ing 71.7M images, packing an average of roughly 150 video frames into each montage grid (10.7B / 71.7M ≈ 149).
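At its core, montaging is grid bookkeeping: frames are tiled into a fixed grid, and each OCR result is mapped back to the second of airtime it came from by its position in the grid. The Python below is a minimal sketch under assumptions the text does not state: a row-major grid of equal-sized tiles, montages filled strictly in time order, and each OCR'd text box attributed to the tile containing its centre point. `MontageLayout`, `montages_needed` and `frame_second` are hypothetical names, and the tile size and grid shape used in the example are illustrative, not the production dimensions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MontageLayout:
    """Hypothetical fixed grid: frames are tiled row-major (left to right, top to bottom)."""

    tile_width: int   # width of each frame tile, in pixels
    tile_height: int  # height of each frame tile, in pixels
    columns: int      # tiles per row
    rows: int         # tiles per column

    @property
    def capacity(self) -> int:
        """Number of 1fps frames (seconds of airtime) one montage holds."""
        return self.columns * self.rows

    def tile_origin(self, slot: int) -> tuple[int, int]:
        """Top-left pixel (x, y) at which the frame in `slot` is pasted."""
        if not 0 <= slot < self.capacity:
            raise IndexError(f"slot {slot} outside a grid of {self.capacity}")
        row, col = divmod(slot, self.columns)
        return col * self.tile_width, row * self.tile_height

    def slot_at(self, x: float, y: float) -> int:
        """Slot whose tile contains montage pixel (x, y)."""
        col, row = int(x // self.tile_width), int(y // self.tile_height)
        if not (0 <= col < self.columns and 0 <= row < self.rows):
            raise ValueError(f"point ({x}, {y}) lies outside the montage")
        return row * self.columns + col


def montages_needed(num_frames: int, layout: MontageLayout) -> int:
    """OCR requests needed for a broadcast: ceil(frames / capacity)."""
    return -(-num_frames // layout.capacity)


def frame_second(layout: MontageLayout, montage_index: int,
                 box: list[tuple[float, float]]) -> int:
    """Map an OCR'd text box (list of (x, y) vertices) found in a given montage
    back to the broadcast second it came from, using the box's centre point."""
    cx = sum(x for x, _ in box) / len(box)
    cy = sum(y for _, y in box) / len(box)
    return montage_index * layout.capacity + layout.slot_at(cx, cy)
```

For example, with a hypothetical 10×15 grid of 320×180 tiles, `layout.capacity` is 150, an hour of airtime (3,600 frames) needs `montages_needed(3600, layout) == 24` requests, and a text box centred at (465, 200) in the third montage (`montage_index=2`) maps back to second 311 of the broadcast.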

Our montage workflow does slightly reduce OCR accuracy, since it limits Cloud Vision's ability to recognize small background text, but it maintains near single-frame accuracy for the chyrons, "crawls" and other standard onscreen text elements that are the focus of our OCR efforts. In return, we achieve a massive cost reduction over a traditional OCR workflow. While Cloud Video OCR offers frame-level results compared to our 1fps temporal resolution, its cost of $0.15/min (with no discount for SD vs HD content) would have resulted in a cost of $26.94M to OCR this content. OCR'ing each frame individually through Cloud Vision costs $1.50/1000 images for the first 5M images, then $0.60/1000 images thereafter, yielding a total cost of $6.47M. It would also have required a Cloud Vision quota of several million QPM (queries per minute), and sustaining that rate given average API latencies at such scales would have required an astronomical amount of hardware, not to mention the open question of whether a multi-million QPM quota would even be attainable for a short-term project.
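The hardware problem follows from Little's law: the average number of requests in flight equals the request rate multiplied by the mean latency. The sketch below is illustrative only. The processing window and per-request latency are hypothetical parameters supplied by the caller, not figures from this project, and the helper names are our own. It also assumes the same latency for both approaches, which likely understates montage latency, since each montage request carries a much larger image.

```python
import math


def required_qpm(total_requests: int, window_minutes: float) -> float:
    """Sustained queries per minute needed to finish all requests within the window."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    return total_requests / window_minutes


def required_concurrency(qpm: float, mean_latency_s: float) -> int:
    """Little's law: requests in flight = arrival rate (per second) x mean latency (s)."""
    return math.ceil(qpm / 60.0 * mean_latency_s)


def compare(frames: int, frames_per_montage: float, window_minutes: float,
            mean_latency_s: float) -> dict[str, tuple[float, int]]:
    """(QPM, concurrent requests) for per-frame OCR versus montage OCR.

    Assumes (hypothetically) equal mean latency for both request types.
    """
    montages = math.ceil(frames / frames_per_montage)
    out = {}
    for name, n in (("per_frame", frames), ("montage", montages)):
        qpm = required_qpm(n, window_minutes)
        out[name] = (qpm, required_concurrency(qpm, mean_latency_s))
    return out
```

As a rough illustration, pushing 10.7 billion frames through per-frame OCR within a hypothetical 3-day window (4,320 minutes) works out to roughly 2.5 million QPM, whereas 71.7M montages need roughly 16,600 QPM over the same window. The number of requests in flight scales down by the same factor for any fixed latency.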

In contrast, our montaging workflow cost just $47,560 to OCR all 10.7 billion seconds of airtime to date, roughly 136 times cheaper than per-frame Cloud Vision OCR and roughly 566 times cheaper than Cloud Video OCR. This incredible cost reduction made archive-scale OCR over the entire TV News Archive feasible for the first time in its quarter-century history.
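The cost comparison is straightforward tiered arithmetic. This Python sketch encodes the prices quoted above ($0.15/min for video OCR; $1.50 per 1,000 images for the first 5M Cloud Vision images and $0.60 per 1,000 thereafter) and applies the tiers once across the entire run. That is a simplification: it ignores billing periods and any free allowance. It uses `Decimal` to avoid floating-point drift in currency amounts, and the function names are illustrative.

```python
from decimal import Decimal

# Prices as quoted in the text (USD).
VIDEO_USD_PER_MIN = Decimal("0.15")
VISION_TIER1_IMAGES = 5_000_000
VISION_TIER1_USD_PER_1K = Decimal("1.50")
VISION_TIER2_USD_PER_1K = Decimal("0.60")


def video_ocr_cost(seconds: int) -> Decimal:
    """Per-minute video OCR pricing applied to total airtime."""
    return Decimal(seconds) / 60 * VIDEO_USD_PER_MIN


def vision_ocr_cost(images: int) -> Decimal:
    """Two-tier per-image pricing, treating the whole run as one billing period."""
    tier1 = min(images, VISION_TIER1_IMAGES)
    tier2 = max(images - VISION_TIER1_IMAGES, 0)
    return (tier1 * VISION_TIER1_USD_PER_1K + tier2 * VISION_TIER2_USD_PER_1K) / 1000


def summarize(seconds: int, montages: int) -> dict[str, Decimal]:
    """Cost of each approach at 1fps (one image per second of airtime)."""
    return {
        "video_ocr": video_ocr_cost(seconds),
        "vision_per_frame": vision_ocr_cost(seconds),
        "vision_montage": vision_ocr_cost(montages),
    }


if __name__ == "__main__":
    # 179.5M minutes x 60 = 10.77B seconds; 71.7M montages (rounded).
    for name, usd in summarize(10_770_000_000, 71_700_000).items():
        print(f"{name:>16}: ${usd:,.0f}")
```

With these rounded inputs the sketch gives about $26.93M for video OCR, $6.47M for per-frame Cloud Vision and $47,520 for montages, matching the quoted figures to within the rounding of the inputs.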