How have television news outlets covered the price of gasoline over the past decade?
Traditionally, one would turn to the TV Explorer, which runs keyword searches over the station-provided, human-generated closed captioning. For CNN, MSNBC and Fox News, the results for the query ("gas prices" OR "gas price" OR "price of gas") look like this:
Turning to the closed captioning of the ABC, CBS and NBC evening news broadcasts since 2010, the results can be seen below (note that this graph begins in July 2010, whereas the one above begins in June 2009). The two spikes in April 2011 and March 2012 are clearly visible, though on the evening news programs both received equal attention, whereas on CNN, MSNBC and Fox News the 2012 spike is double that of 2011.
How closely does the machine-generated transcript created by Google's Cloud Video API match the human-typed transcript? The timeline below shows those results, which match almost perfectly, with the few exceptions largely being uncaptioned commercials or places where the human captioners missed words. Thus, at least for evening news broadcasts over the past decade, the machine-generated transcripts yield nearly identical results to those of human transcriptionists.
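One way to put a number on how closely two such timelines track each other is a simple Pearson correlation over the monthly totals. The sketch below is only illustrative: it assumes each timeline has already been exported as a mapping from a "YYYY-MM" month to a count, and the function names `pearson` and `compare_timelines` are hypothetical rather than part of any existing tool.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def compare_timelines(human, machine):
    """Align two {"YYYY-MM": count} timelines and correlate them.

    Months missing from one timeline are treated as zero mentions
    (an assumption made for this sketch).
    """
    months = sorted(set(human) | set(machine))
    xs = [human.get(m, 0) for m in months]
    ys = [machine.get(m, 0) for m in months]
    return pearson(xs, ys)
```

A value close to 1.0 would indicate that the two timelines rise and fall together, though it says nothing about which individual mentions differ.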
The most interesting finding, however, comes from comparing the OCR results with those of the spoken word transcripts. The timeline below shows the total number of seconds of airtime each month in which the keyphrases ("gas prices" OR "gas price" OR "price of gas") appeared in the onscreen text. This looks very different from the captioning results: the 2011 and 2012 captioning spikes disappear, replaced by a fairly steady stream of mentions over the past decade, with the largest spikes actually occurring last year. Interestingly, there have been no mentions since December of last year. It is important to note that this graph reflects only the literal text of the three keyphrases appearing anywhere onscreen. An infographic with an arrow pointing upwards and just the number "$3.29" won't be counted below, whereas in a spoken word transcript the newsreader would likely have mentioned "gas prices" somewhere in their narration.
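To make the literal-match caveat concrete, here is a minimal Python sketch of how seconds of airtime per month could be tallied from OCR output. It assumes a hypothetical input layout of one (ISO timestamp, onscreen text) pair per second of airtime; the names `PATTERN` and `seconds_by_month` are illustrative and do not describe how the actual timeline was built.

```python
import re
from collections import Counter

# The three keyphrases from the query, matched as literal text, case-insensitively.
# "gas prices?" covers both "gas price" and "gas prices".
PATTERN = re.compile(r"\b(?:gas prices?|price of gas)\b", re.IGNORECASE)

def seconds_by_month(frames):
    """Count matching seconds per month.

    frames: iterable of (iso_timestamp, onscreen_text) pairs, one per second
    of airtime (a hypothetical layout assumed for this sketch).
    """
    totals = Counter()
    for timestamp, text in frames:
        if PATTERN.search(text):
            totals[timestamp[:7]] += 1  # key on the "YYYY-MM" prefix
    return dict(totals)

sample = [
    ("2012-03-05T18:30:01", "GAS PRICES SOAR"),
    ("2012-03-05T18:30:02", "GAS PRICES SOAR"),
    ("2012-03-05T18:30:03", "$3.29"),  # a bare number is not a literal match
    ("2012-04-01T18:31:00", "The price of gas today"),
]
monthly = seconds_by_month(sample)
```

In this made-up sample, the second showing only "$3.29" contributes nothing, which is exactly the kind of onscreen graphic that a literal keyphrase search will miss.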
What does the enormous difference between the OCR and captioning results mean? This is a fascinating question, and it shows just how different the signals can be across the various modalities of television news. The OCR results show that explicit onscreen textual mentions of gas prices have been a steady feature of television news over the past decade, whereas the spoken word narration has rarely emphasized them.