Using Thumbnail Montages To Optimize AI-Based OCR Speed & Costs: Part 4

Continuing our experiments with optimizing video OCR costs, what impact does image resolution have on OCR accuracy? Assuming we take an input video, extract still frames at 1fps and OCR them one by one, at what point does reducing image resolution begin to hurt OCR quality? For typical chyron+crawl channels like CNN, resizing HD footage down to 500×500 pixels yields no measurable reduction in accuracy, while reducing further to 400×400 pixels measurably reduces it. For text-saturated business channels, even resizing to 500×500 pixels introduces a significant accuracy reduction, and text-heavy ideographic channels like Taiwanese television exhibit even more substantial reductions. These results are a reminder that OCR, out of all forms of visual analysis, requires the highest possible image resolution, and they reinforce that text-saturated channels and non-English news are especially susceptible to resolution reduction. Interestingly, when presented with the original full resolution video, Cloud Vision's OCR not only accurately recognizes vertical ideographic text, but even reconstructs it into inline horizontal text rather than returning it as isolated characters.

We'll use the following workflow:

time ffmpeg -i CNNW_20240903_230000_Erin_Burnett_OutFront.mp4 -vf "fps=1" ./1FPSFRAMES/OUT-%06d.jpg
convert ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-fullres.jpg -resize 500x500 ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-resize500.jpg
convert ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-fullres.jpg -resize 400x400 ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-resize400.jpg
convert ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-fullres.jpg -resize 300x300 ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-resize300.jpg
convert ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-fullres.jpg -resize 200x200 ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-resize200.jpg
convert ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-fullres.jpg -resize 100x100 ./CNNW_20240903_230000_Erin_Burnett_OutFront-1fpsf1-resize100.jpg
time gsutil -m -q cp "./IMAGE.jpg" gs://[YOURBUCKET]/
curl -s -H "Content-Type: application/json; charset=utf-8" -H "x-goog-user-project:[YOURPROJECTID]" -H "Authorization: Bearer $(gcloud auth print-access-token)" https://vision.googleapis.com/v1/images:annotate -d '{ "requests": [ { "image": { "source": { "gcsImageUri": "gs://[YOURBUCKET]/IMAGE.jpg" } }, "features": [ {"type":"TEXT_DETECTION"} ] } ] }' | jq -r '.responses[].fullTextAnnotation.text'
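
Since the same request is repeated for every resized frame, the JSON body of the curl call above can be generated per file rather than edited by hand. The following is only a minimal sketch: `ocr_request_body` is a hypothetical helper name, the `frame-resize<N>.jpg` object names are placeholders, and the bucket and object names are assumed to contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the images:annotate request body for one GCS object.
# Assumes the bucket and object names need no JSON escaping.
ocr_request_body() {
  local bucket=$1 object=$2
  printf '{"requests":[{"image":{"source":{"gcsImageUri":"gs://%s/%s"}},"features":[{"type":"TEXT_DETECTION"}]}]}\n' "$bucket" "$object"
}

# Print one request body per resized frame (placeholder object names).
for size in 500 400 300 200 100; do
  ocr_request_body "YOURBUCKET" "frame-resize${size}.jpg"
done
```

Each printed body could then be piped into the same curl command by replacing the inline `-d '...'` argument with `-d @-`, which tells curl to read the request data from standard input.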

Let's start with the original full resolution 1280×720 HD frame:

Running it through Cloud Vision's OCR yields the following, an effectively flawless transcription of the frame:

Via Skype\nGeneva, Switzerland\n12:59 AM\nNEW TONIGHT\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nLIVE\nCAN\n3:59 PM PT\nAHU AT ODDS AGAIN AFTER US PRESIDENT SAYS ISRAELI PM NOT DOING ENOUGH IN C SITUATION ROOM

What if we resize down to 500×500 pixels?

No difference in OCR results:

Via Skype\nGeneva, Switzerland\n12:59 AM\nNEW TONIGHT\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nLIVE\nCAN\n3:59 PM PT\nAHU AT ODDS AGAIN AFTER US PRESIDENT SAYS ISRAELI PM NOT DOING ENOUGH IN C SITUATION ROOM
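
One caveat when reading the "500×500" label: as far as we know, ImageMagick's `-resize 500x500` preserves aspect ratio by default and fits the frame inside a 500×500 box, so a 1280×720 frame would come out at roughly 500×281 rather than as a literal square. The sketch below does that arithmetic for each size used here. `fit_dims` is a hypothetical helper name, and its round-to-nearest rule is an assumption, so results may differ by a pixel from what ImageMagick actually writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: predict the dimensions of an image after it is fitted
# inside a BOX x BOX square with its aspect ratio preserved.
# Usage: fit_dims WIDTH HEIGHT BOX
fit_dims() {
  local w=$1 h=$2 box=$3
  if [ "$w" -ge "$h" ]; then
    # Width is the long edge: it becomes BOX, height scales (rounded to nearest).
    echo "${box}x$(( (h * box + w / 2) / w ))"
  else
    # Height is the long edge: it becomes BOX, width scales (rounded to nearest).
    echo "$(( (w * box + h / 2) / h ))x${box}"
  fi
}

# Predicted output sizes for the 1280x720 frame at each box size.
for box in 500 400 300 200 100; do
  echo "-resize ${box}x${box} -> $(fit_dims 1280 720 "$box")"
done
```

Under these assumptions the 500 and 400 cases work out to 500x281 and 400x225, which means the text in each resized frame is rendered with noticeably fewer vertical pixels than the nominal box size suggests.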

And 400 pixels?

Now we start to see OCR errors, including a spurious "1444" at the start and "CUATION" in place of "SITUATION":

1444\nVia Skype\nGeneva, Switzerland\n12:59 AM\nNEW TONIGHT\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nLIVE\nCNN\n3:59 PM PT\nAHU AT ODDS AGAIN AFTER US PRESIDENT SAYS ISRAELI PM NOT DOING ENOUGH IN CUATION ROOM

And 300 pixels?

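Here the errors multiply: the "Via Skype" and location lines disappear, and the crawl is garbled ("HUAT CODS") and loses its ending: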

12:59 AM\nNEW TONIGHT\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nHUAT CODS AGAIN AFTER US PRESIDENT SAYS ISRAELI PM NOT DOING ENOUGH IN C\nLIVE\nCAN

And 200 pixels?

Here we lose nearly everything other than the main chyron, with the entire crawl gone and the logo misread as "CHN":

NEW TONIGHT\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nLIVE\nCHN

And finally 100 pixels?

Here the API was unsurprisingly unable to extract any text at all, or even to recognize that the image contained text of any kind, though a human would also be hard-pressed to read any of the underlying text at this size.

How about a text-heavy business channel frame?

The original full-resolution 1280×720 HD frame yields a nearly flawless transcript, though the closeness of some words leads them to be merged together by the API:

Top News\nBloomberg Television\nCHINA'S AUGUST PMI\nGAUGE\nACTUAL\nSURVEY\nManufacturing\n49.1\n49.5\nNon-Manufacturing\n50.3\n50.1\nCaixin Manufacturing\n50.4\n50.0\nCHINA GROWTH HEADWINDS MOUNT AS\nFACTORY AND HOUSING DATA WORSEN\nAustralia PM's Approval Drops as Voters Face\nCost-of-Living Pain\nPage 2 Next In 0:04\n➤ On a two-party preferred basis, Labor is tied with the center-\nright Liberal National opposition at 50-50, a result that if\nreplicated at an election would likely force one of them to\ngovern with the support of minority parties.\n► Albanese's faltering approval ratings come as Reserve Bank\nofficials make clear that the key rate is likely to remain at\na 12-year high of 4.35% for the rest of this year.\nToday's Most Read Next In 0:16\nCrypto Firm OKX Hires Gracie Lin\nFrom Grab to Be Singapore CEO\nAustralia PM's Approval Drops as\nVoters Face Cost-of-Living Pain\nValue Partners Founder to Quit as\nChinese Broker Asserts Control\nNikkei May Add Ryohin Keikaku, Cut\nNippon Paper, Analysts Say\nNew World Shares Drop 14% After\nFirst Loss Warning in 20 Years\nTata's Fast Fashion Giant Is\nDefying India's Consumer Slowdown\nGold Steadies Aheadof US Jobs Data\nThat May Shape Fed Rate Path\nLater Today \"The Pulse\"\nUrsula Marchioni\nBlackRock International\nLimited, Mng Dir/Head:Investment & Portfolio SolutionsEMEA\n4AM NY | 9AM UK | 4PM HK\nCSI300\n3283.71\n-37.72 1.14%\nSENSEX (C)\n82365.77\n+231.16 0.28%\nTAIEX\n22266.62\n-1.47 0.01%\nTOPIX\n2712.56\n-0.07 0%\nNZX 50\n12471.61\n+23.93 0.19%\nSTI\n3455.08\n+12.15 0.35%\n11:01ET SEP 1, 2024

What about a 500×500 resized version?

Unlike our CNN example, the much denser text and more complex layout do lead to substantial changes:

Top News\nBloomberg Television\nCHINA'S AUGUST PMI\nGAUGE\nACTUAL\nSURVEY\nManufacturing\n491\n49.5\nNon-Manufacturing\n50.3\n501\nCaixin Manufacturing\n50.4\n50.0\nCHINA GROWTH HEADWINDS MOUNT AS\nFACTORY AND HOUSING DATA WORSEN\nAustralia PM's Approval Drops as Voters Face\nCost-of-Living Pain\nPage 2 Next in 0.04\nOn a two-party preferred basis, Labor is tied with the center-\nright Liberal National opposition at 50-50, a result that if\nreplicated at an election would likely forceone of them to\ngovern with the support of minority parties.\nAlbanese's faltering approval ratings come as Reserve Bank\nofficials make clear that the key rate is likely to remain at\na 12-year high of 4.35% for the rest of this year.\nToday's Most Read Next in 0.16\nCrypto Firm OKX Hires Gracie Lin\nFrom Grab to Be Singapore CEO\nAustraliaPM's Approval Drops as\nVoters Face Cost-of-Living Pain\nValue Partners Founder to Quit as\nChinese Broker Asserts Control\nNikkei May Add Ryohin Keikaku, Cut\nNippon Paper, Analysts Say\nNew World Shares Drop 14% After\nFirst Loss Warning in 20 Years\nTata's Fast Fashion Giant Is\nDefying India's Consumer Slowdown\nCS1300\n3283.71\n-37.72 114%\nSENSEX (C)\n82365.77\n+231.16 0.28%\nTAIEX\n22266.62\n-1.47 0.01%\nTOPIX\n2712.56\n-0.07 0%\nGold Steadies Ahead of US Jobs Data NZX 50\nThat May Shape Fed Rate Path\nLater Today \"The Pulse\"\nUrsula Marchioni\nBlackRock International\nLimited Mng Di Head Investment & Portfola Solutions EMEA\n4AM NY 9AM UK 4PM HK\n12471.61\n+23.93 0.19%\nSTI\n3455.08\n+12.15 0.35%\n1101ET SEP 12024

We can see there are 15 removals and 13 additions in total when we resize:
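
A comparable count can be produced from the command line by saving each transcript to a text file, one OCR line per row, and counting the removed and added lines reported by `diff`. This is only a sketch: `ocr_diff_counts` is a hypothetical helper name, and because the granularity behind the counts quoted above is not stated, a line-level diff like this one will not necessarily reproduce them. The demo uses just the first six lines of the PMI table from the two transcripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count lines removed from and added to an OCR transcript.
# Arguments: two text files with one OCR line per row (original, resized).
ocr_diff_counts() {
  local removed added
  # In default diff output, "<" marks lines only in the first file and ">"
  # marks lines only in the second. "|| true" keeps a zero count from being
  # treated as a failure when the script runs under "set -e".
  removed=$(diff "$1" "$2" | grep -c '^<' || true)
  added=$(diff "$1" "$2" | grep -c '^>' || true)
  echo "removed=${removed} added=${added}"
}

# Demo on the first lines of the PMI table from the two transcripts.
tmp=$(mktemp -d)
printf '%s\n' "Manufacturing" "49.1" "49.5" "Non-Manufacturing" "50.3" "50.1" > "$tmp/fullres.txt"
printf '%s\n' "Manufacturing" "491" "49.5" "Non-Manufacturing" "50.3" "501" > "$tmp/resize500.txt"
ocr_diff_counts "$tmp/fullres.txt" "$tmp/resize500.txt"   # prints: removed=2 added=2
rm -rf "$tmp"
```

Even on this small excerpt, both changes are dropped decimal points in numeric values, the kind of error that matters most on a business channel.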

What about a 400×400 resized version?

We can already see a number of differences:

Top News\nBloomberg Television\nCHINA'S AUGUST PM\nACTUAL\nSURVEY\nManufacturing\n431\n49.5\nNon-Manufacturing\n50:3\n501\nCaixin Manufacturing\n504\n500\nFACTORY AND HOUSING DATA WORSEN\nAustralia PM's Approval Drops as Voters Face\nCost-of-Living Pain\nPage 2 Next in 0.04\nOn a two-party preferred basis, Labor is tied with the center.\nright Liberal National opposition at 50-50, a result that if\nreplicated at an election would likely force one of them to\ngovern with the support ofminority parties.\nAlbanese's faltering approval ratings come as Reserve Bank\nofficials make clear that the key rate is likely to remain at\na 12-year high of 4.35% for the rest of this year.\nToday's Most Read Next in\nCrypto Firm OKX Hires Gracie Lin\nFrom Grab to Be Singapore CEO\nAustralia PM's Approval Drops as\nVoters Face Cost of Living Pain\nValue Partners Founder to Quitas\nChinese Broker Asserts Control\nNikkei May Add Ryohin Keikaku Cut\nNippon Paper, Analysts Say\nNew World Shares Drop 14% After\nFirst Loss Warning in 20 Years\nTata's Fast Fashion Giant Is\nDefying India's Consumer Slowdown\nGold Steadies Ahead of US Jobs Data\nThat May Shape Fed Rate Path\nLater Today \"The Pulse\nUrsula Marchioni\nBlackRock International\nUnited Mng De Headies&P SMA\n4AM NY 9AM UK 4PM K\nCSI 300\n3283.71\n-37.72 114%\nSENSEX(C)\n82365.77\n+231.16 0.20%\nTAIEX\n22266.62\n-147 0.01%\nTOPIX\n2712.56\n-0.07 0%\nNZX 50\n12471.61\n+23.93 0.19%\nSTI\n3455.08\n+12.15 0.35%\nOET SEP 2004

Though, surprisingly, other than a few missing words, the transcript is actually quite similar:

And finally, what about a non-English broadcast? Here is the full resolution 1272×716 HD frame (note the slightly non-standard resolution):

This yields the following. Note how Cloud Vision remarkably reassembles the vertical text into spans of horizontal text rather than returning it as a sequence of isolated, unrelated characters:

傷大\n在\n吳念庭 Nienting Wu\n5小時。\n6 台視新聞 HD\n02:45:07\n最新\n傷勢狀況目前都還好,雖然無法一一回覆,但都有限\n大家的關心訊息。\n其他球員\n再次感謝外界的關心人\n稍早22:59開完\n新 柯京華城貪案羈押庭 書\n開押\n完庭\n翻攝 吳念庭臉書\n台北\n25-32\n臉書發文報平安 吳念庭呼籲:別針對其他球\nTTVNEWS 警民口角\"閃燈\"查酒駕致恐慌症發作 控警執法過當

And a 500×500 resized version?

This yields the following, which differs substantially from the above, even in the very first few characters:

台視新聞 HD\n吳念庭 Nienting Wu\n5小時,\n02:45:07\n最新\n【稍柯\n傷勢狀況目前都還好,雖然無法一一回覆,但都有一款\n大家的關心訊息。\n在\n其他球員\n再次感謝外界的關心\n-稍早22:59開完 臉\n柯京華城貪案羈押庭\n台北\n翻攝 吳念庭臉書\n3. 臉書發文報平安 吳念庭呼籲:別針對其他球員\n25-32\nTTVNEWS 警民口角\"閃燈\"查酒駕致恐慌症發作 控警執法過當