The GDELT Project

Comparing Human And Machine Transcripts: A List Of Common Differences

How do the automated television news transcripts generated by Google's Cloud Video API compare with the human-produced captioning viewers see? While we'll be exploring these differences at scale in more detail over the coming weeks, a closer look at the NBC Nightly News broadcast from the evening of March 7, 2011 offers a few hints about the kinds of systematic differences that emerge.
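For readers who want to try this kind of comparison themselves, here is a minimal sketch of one way to surface word-level differences between two transcripts of the same broadcast. It is not the method used for this analysis. It assumes both transcripts are already available as plain strings and relies only on Python's standard `difflib` module. The function names `tokenize` and `word_differences` are hypothetical, and the sample sentences are invented for illustration rather than taken from the broadcast.

```python
import difflib
import re


def tokenize(text):
    # Lowercase and keep only letters, digits and apostrophes, so that
    # differences in case and punctuation are not counted as differences.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_differences(human, machine):
    """Return (operation, human_words, machine_words) for each mismatch."""
    human_words = tokenize(human)
    machine_words = tokenize(machine)
    matcher = difflib.SequenceMatcher(
        a=human_words, b=machine_words, autojunk=False
    )
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete' or 'insert'
            differences.append(
                (tag, " ".join(human_words[i1:i2]), " ".join(machine_words[j1:j2]))
            )
    return differences


# Invented sample sentences (not from the actual broadcast).
human_caption = "GOOD EVENING. GAS PRICES ARE UP 33 CENTS"
machine_transcript = "Good evening, gas prices are up thirty three cents."

for difference in word_differences(human_caption, machine_transcript):
    print(difference)
```

On the invented sample, the only difference reported is the numeral "33" in the caption against the spelled-out "thirty three" in the machine transcript. Normalizing case and punctuation before comparing is a design choice: it keeps the output focused on differences in wording rather than formatting.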

Common sources of small-scale differences scattered throughout transcripts include:

Perhaps the two greatest sources of difference between the machine and human transcripts are the following classes: