Recognizing Text In Television News: OCR In The Video Era

The rapid advances in machine vision over the past half-decade have transformed the accuracy and capabilities of optical character recognition (OCR), in which machines identify text in both still and motion imagery and convert it to searchable text.

Take the example frame below from the Internet Archive's Television News Archive, which comes from ABC World News With Diane Sawyer on August 10, 2010 at 5:35PM PST. It shows green text over a yellow-and-white crosshatched background, with half of the text sitting over wavy red parallel lines that bleed through it, and an overall color saturation that blurs everything together.

Despite this near worst-case scenario for text recovery, Google's Cloud Video API recognized the text without a single error, correctly transcribing "RATE OF RECOVERY "MORE MODEST" THAN ANTICIPATED". It even correctly identified the "abc" text in the ABC logo at lower-right.

Even more powerfully, because the Video API operates at the resolution of each individual frame, it not only reports the precise start and end times of the text, expressed in nanoseconds, but also returns a bounding box around the text that tracks its zooming movement on screen. The annotation thus captures not just the text on the screen, but the fact that the text was zoomed in on.
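To make the idea of a per-frame bounding box concrete, here is a minimal Python sketch of how one might detect a zoom from such an annotation. It is an illustration under stated assumptions, not the API's own code: the dictionary layout (`segments`, `frames`, `timeOffset`, `rotatedBoundingBox`, `vertices` with normalized `x`/`y`) follows the JSON shape of the Video API's text annotations as we understand it, the sample values are invented rather than taken from the ABC broadcast, and the helper names `parse_offset`, `box_area`, `area_over_time` and `zoom_ratio` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation for one piece of on-screen text.
# The field layout is an assumption; every value here is invented.
SAMPLE_ANNOTATION: Dict = {
    "text": "RATE OF RECOVERY",
    "segments": [{
        "segment": {"startTimeOffset": "12.5s", "endTimeOffset": "15.0s"},
        "frames": [
            {"timeOffset": "12.5s", "rotatedBoundingBox": {"vertices": [
                {"x": 0.30, "y": 0.40}, {"x": 0.60, "y": 0.40},
                {"x": 0.60, "y": 0.50}, {"x": 0.30, "y": 0.50}]}},
            {"timeOffset": "15.0s", "rotatedBoundingBox": {"vertices": [
                {"x": 0.20, "y": 0.35}, {"x": 0.80, "y": 0.35},
                {"x": 0.80, "y": 0.55}, {"x": 0.20, "y": 0.55}]}},
        ],
    }],
}


def parse_offset(value: str) -> float:
    """Convert a duration string such as '12.5s' into seconds."""
    return float(value.rstrip("s"))


def box_area(vertices: List[Dict[str, float]]) -> float:
    """Area of a polygon in normalized coordinates (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i].get("x", 0.0), vertices[i].get("y", 0.0)
        nxt = vertices[(i + 1) % n]
        x2, y2 = nxt.get("x", 0.0), nxt.get("y", 0.0)
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_over_time(annotation: Dict) -> List[Tuple[float, float]]:
    """Return (time in seconds, box area) pairs, sorted by time."""
    series = []
    for segment in annotation["segments"]:
        for frame in segment["frames"]:
            t = parse_offset(frame["timeOffset"])
            area = box_area(frame["rotatedBoundingBox"]["vertices"])
            series.append((t, area))
    return sorted(series)


def zoom_ratio(annotation: Dict) -> float:
    """Ratio of the last box area to the first; above 1 suggests a zoom in."""
    series = area_over_time(annotation)
    return series[-1][1] / series[0][1]


if __name__ == "__main__":
    for t, area in area_over_time(SAMPLE_ANNOTATION):
        print(f"{t:6.2f}s  area={area:.4f}")
    print("zoom ratio:", round(zoom_ratio(SAMPLE_ANNOTATION), 2))
```

In this invented sample the box grows from 3% to 12% of the frame over 2.5 seconds, so the ratio is roughly 4. A real pipeline would likely want to smooth the series across many frames rather than compare only the first and last.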