As we continue to look at new ways to segment television shows into their component stories, it's worth noting the tremendous power of combining camera shot changes with onscreen OCR. Google's Cloud Video AI API segments a show into a list of specific "shots" in which the camera and scene are extremely similar, flagging each scene transition. By themselves, shot transitions do not necessarily imply a new "story," since a given story, such as a global update about Covid-19, may be made up of a multitude of shots strung together. However, given the transition traditions of television evening news, it is likely that all story transitions will be marked by shot changes, even if not every shot change indicates a story change.
Likewise, the onscreen text will typically remain similar from shot to shot within a story and change when a shot change introduces a new story.
To test this theory, it turns out we can do everything in a single SQL query in BigQuery. The query below takes ABC World News Tonight With David Muir on February 1, 2020, 4:00pm-4:30pm PST and filters the 1,800-second block of airtime down to just the 464 seconds containing shot changes (most of which occur during rapid-fire periods like quick-paced previews and commercial breaks). For each second of airtime that contained a shot change, it takes the complete OCR'd onscreen text during that second and compares it with the OCR'd text from the first second of the previous shot, computing the Levenshtein edit distance of the two strings.
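For readers unfamiliar with the metric, Levenshtein edit distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Here is a minimal Python sketch of that computation (the sample chyron strings are our own illustrations):

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]

# Near-identical chyrons score low; unrelated chyrons score high.
print(levenshtein("NEW CORONAVIRUS CASE", "NEW CORONAVIRUS CASES"))  # 1
print(levenshtein("NEW CORONAVIRUS CASE", "IOWA CAUCUSES"))
```

Within a story the onscreen text changes little from shot to shot, so a low distance suggests the same story is continuing, while a high distance suggests a transition.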
All of this is accomplished with a single query (using the "fhoffa.x.levenshtein" persistent UDF, courtesy of the incredible Felipe Hoffa):
SELECT showOffset, fhoffa.x.levenshtein(LastOCRText, OCRText) EditDist, LastOCRText, OCRText FROM (
  SELECT showOffset, LAG(OCRText) OVER (PARTITION BY iaShowId ORDER BY showOffset ASC) LastOCRText, OCRText
  FROM `gdelt-bq.gdeltv2.vgegv2_iatv`
  WHERE DATE(date) = "2020-02-02"
    AND iaShowId='KGO_20200202_000000_ABC_World_News_Tonight_With_David_Muir'
    AND numShotChanges>1
  ORDER BY date ASC
)
Running it yields this spreadsheet, in which we've manually annotated where each story begins so you can see how the transitions line up.
In reality, Levenshtein edit distance is not the best comparison mechanism here, since it heavily penalizes cases like "02.01.20 NEW CORONAVIRUS CASE" versus "02.01.20 NEW CORONAVIRUS CASE WORLD NEWS TONIGHT" in which one string is simply a subset of the other: counting those extra insertions is precisely what edit distance is designed to do. A better approach would be to break each string into words and compare the number of shared words, accounting for minimal substrings and so on. More advanced models could be implemented as UDFs or could leverage BigQuery's built-in machine learning capabilities.
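As one illustrative sketch of that word-level approach (the function and scoring here are our own, not part of the query above), an overlap coefficient over word sets scores a subset string as a perfect match rather than penalizing it:

```python
def word_overlap(a, b):
    """Fraction of the smaller string's words that also appear in the
    larger one, so a chyron that is a subset of another scores 1.0."""
    wa, wb = set(a.split()), set(b.split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))

# Edit distance punishes this pair heavily; word overlap recognizes
# them as the same story.
print(word_overlap("02.01.20 NEW CORONAVIRUS CASE",
                   "02.01.20 NEW CORONAVIRUS CASE WORLD NEWS TONIGHT"))  # 1.0
```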
Similarly, taking the onscreen text of the second when a transition occurred and comparing it to the onscreen text of the first second of the previous shot is not ideal, since a lot may have happened during that shot. A better approach would be to compare each second of airtime to the second before it and only examine the similarity scores for seconds that contain shot changes. This would simply require dropping the "numShotChanges>1" clause from the query above.
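The post-processing side of that approach can be sketched in Python with hypothetical per-second records (the field layout, sample chyron text, and similarity metric here are all invented for illustration):

```python
def similarity(a, b):
    # Simple word-set Jaccard score, standing in for whatever metric is used.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

# Hypothetical per-second records: (showOffset, numShotChanges, OCRText).
seconds = [
    (10, 0, "WORLD NEWS TONIGHT"),
    (11, 2, "WORLD NEWS TONIGHT CORONAVIRUS OUTBREAK"),
    (12, 0, "CORONAVIRUS OUTBREAK"),
    (13, 3, "IOWA CAUCUSES FINAL SPRINT"),
]

# Compare every second to the second immediately before it, but only keep
# the scores for seconds that actually contained a shot change.
candidates = [(cur[0], similarity(prev[2], cur[2]))
              for prev, cur in zip(seconds, seconds[1:])
              if cur[1] > 0]
print(candidates)  # [(11, 0.6), (13, 0.0)]
```

The low score at offset 13 flags a likely story transition, while the higher score at offset 11 suggests the shot change there stayed within the same story.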
In spite of these limitations, look more closely at the spreadsheet above and it's trivial as a human to spot the story transitions. In fact, using this spreadsheet, it takes less than a minute to segment the entire broadcast into its component stories, suggesting that simply combining shot changes and OCR could be enough to dramatically accelerate human broadcast annotation!