The GDELT Project

Using Thumbnail Montages To Optimize AI-Based OCR Speed & Costs: Part 6 – Further Grid Layout Experiments

Yesterday's experiments on how montage layouts impact OCR accuracy suggested that there were few differences between vertical, horizontal and grid-based montage layouts. At the same time, the underlying test was a relatively trivial one, with a small amount of onscreen text in a fairly basic layout. What if we try a more complex test with four frames: one with a standard chyron and crawl, one with a crawl and set signage, one a text-saturated business image and the last a text-heavy Taiwanese broadcast with horizontal, vertical and overlapping text? Once again, we'll test 1×9 vertical, 2×2 grid and 9×1 horizontal layouts and compare their OCR results to stress test the impact of layout on OCR output.
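To make the three layouts concrete, here is a minimal sketch of the tile geometry, assuming every frame has been resized to the same dimensions. The function name `tile_offsets` and the 640×360 frame size are hypothetical choices for illustration, not the values used in these experiments.

```python
def tile_offsets(n_frames, cols, frame_w, frame_h):
    """Return the top-left (x, y) pixel offset of each frame in a montage.

    Frames are placed left to right, wrapping to a new row after `cols` tiles.
    cols=1 gives a single vertical column, cols=n_frames a single horizontal row.
    """
    return [((i % cols) * frame_w, (i // cols) * frame_h) for i in range(n_frames)]


# Four frames of a hypothetical 640x360 size in each layout style.
vertical = tile_offsets(4, cols=1, frame_w=640, frame_h=360)
grid = tile_offsets(4, cols=2, frame_w=640, frame_h=360)
horizontal = tile_offsets(4, cols=4, frame_w=640, frame_h=360)
```

Only in the single-column case do the frames differ solely in their Y offset, which becomes relevant when separating the OCR output by frame.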

In all three layouts, Cloud Vision correctly transcribes all of the text as individual words or character sequences with their respective bounding boxes. Only in the vertical layout, however, does it correctly reassemble all of those chunks back into cohesive frame-specific text, rather than intermixing text from unrelated frames. Custom post-processing logic could recreate Cloud Vision's reassembly logic to correct this, but it would have to take into account the complexities of differentiating between space-segmented and scriptio continua languages. In the end, a single-column vertical layout is the best overall configuration.

Here we can see the four images in the three layouts and their resulting Cloud Vision OCR results:

The 1×9 vertical montage yields:

Bloomberg Television\nCHINA'S AUGUST PMI\nGAUGE\nACTUAL SURVEY\nManufacturing\n49.1\n49.5\nNon-Manufacturing\n50.3\n50.1\nCaixin Manufacturing\n50.4\n50.0\nCHINA GROWTH HEADWINDS MOUNT AS\nFACTORY AND HOUSING DATA WORSEN\nAustralia PM's Approval Drops as Voters Face\nCost-of-Living Pain\nPage 2 Next In 0:04\n➤ On a two-party preferred basis, Labor is tied with the center-\nright Liberal National opposition at 50-50, a result that if\nreplicated at an election would likely force one of them to\ngovern with the support of minority parties.\n► Albanese's faltering approval ratings come as Reserve Bank\nofficials make clear that the key rate is likely to remain at\na 12-year high of 4.35% for the rest of this year.\nToday's Most Read Next In 0:16\nCrypto Firm OKX Hires Gracie Lin\nFrom Grab to Be Singapore CEO\nAustralia PM's Approval Drops as\nVoters Face Cost-of-Living Pain\nValue Partners Founder to Quit as\nChinese Broker Asserts Control\nNikkei May Add RyohinKeikaku, Cut\nNippon Paper, Analysts Say\nNew World Shares Drop 14% After\nFirst Loss Warning in 20 Years\nTata's Fast Fashion Giant Is\nDefying India's Consumer Slowdown\nGold Steadies Ahead of US Jobs Data\nThat May Shape Fed Rate Path\nLater Today \"The Pulse\"\nUrsula Marchioni\nBlackRock International\nLimited, Mng Dir/Head:Investment & Portfolio Solutions EMEA\n4AM NY 9AM UK | 4PM HK\nCSI 300\n3283.71\n-37.72 1.14%\nSENSEX (C)\n82365.77\n+231.16 0.28%\nTAIEX\n22266.62\n-1.47 0.01%\nTOPIX\n2712.56\n-0.07 0%\nNZX 50\n12471.61\n+23.93 0.19%\nSTI\n3455.08\n+12.150.35%\n11:01 ET SEP 1, 2024\nTop News

FRENCH PAINTIN\nVia Skype\nGeneva, Switzerland\n12:59 AM\nNEW TONIGHT\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nLIVE\nCNN\n3:59 PM PT\nAHU AT ODDS AGAIN AFTER US PRESIDENT SAYS ISRAELI PM NOT DOING ENOUGH IN C SITUATION ROOM

CM WASHINGTON\nTHE WITH\nSITUATION WOLF\nITZER\nLIVE\nCAN\nNAS 577.33\nCHAYES SONG \"HOLD ON, I'M COMING.\" THE DECISION CAME AFTER THE LATE R&B SI SITUATION ROOM

A 台視新聞HD\n吳念庭 Nienting Wu\n02:45:07.\n5小時\n1\n傷大\n大\n傷勢狀況目前都還好,雖然無法一一回覆,但都有\n大家的關心訊息。\n在\n其他球員\n再次感謝外界的關心人\n7最 稍早22:59開完\n完庭\n最新\n新 柯京華城貪案羈押庭 書\n翻攝 吳念庭臉書\n台北\n雅典\n20. 臉書發文報平安 吳念庭呼籲:別針對其他球員\n25-32\nTTVNEWS 警民口角\"閃燈\"查酒駕致恐慌症發作 控警執法過當

The 2×2 grid montage yields:

Top News\nBloomberg Television\nCHINA'S AUGUST PMI\nGAUGE\nACTUAL\nSURVEY\nManufacturing\n49.1\n49.5\nNon-Manufacturing\n50.3\n50.1\nCaixin Manufacturing\n50.4\n50.0\nCHINA GROWTH HEADWINDS MOUNT AS\nFACTORY AND HOUSING DATA WORSEN\nToday's Most Read Next In 0:16\nCrypto Firm OKX Hires Gracie Lin\nFrom Grab to Be Singapore CEO\nAustralia PM's Approval Drops as\nVoters Face Cost-of-Living Pain\nValuePartners Founder to Quit as\nChinese Broker Asserts Control\nNikkei May Add Ryohin Keikaku, Cut\nNippon Paper, Analysts Say\nNew World Shares Drop 14% After\nFirst Loss Warning in 20 Years\nTata's Fast Fashion Giant Is\nDefying India's Consumer Slowdown\nCSI 300\n3283.71\n-37.72 1.14%\nSENSEX (C)\n82365.77\n+231.16 0.28%\nTAIEX\n22266.62\n-1.47 0.01%\nTOPIX\n2712.56\n-0.07 0%\nAustralia PM's Approval Drops as Voters Face\nCost-of-Living Pain\nPage 2 Next In 0:04\n▸ On a two-party preferred basis, Labor is tied with the center-\nright Liberal National opposition at 50-50, a result that if\nreplicated at an election would likely force one of them to\ngovern with the support of minority parties.\n► Albanese's faltering approval ratings come as Reserve Bank\nofficials make clear that the key rate is likely to remain at\na 12-year high of 4.35% for the rest of this year.\nGold Steadies Ahead of US Jobs Data\nThat May Shape Fed Rate Path\nNZX 50\n12471.61\n+23.93 0.19%\nLater Today \"The Pulse\"\nUrsula Marchioni\nBlackRock International\nLimited, Mng Dir/Head:Investment & Portfolio Solutions EMEA\n4AM NY 9AM UK | 4PM HK\nSTI\n3455.08\n+12.15 0.35%\n11:01 ET SEP 1, 2024\n

CNN WASHINGTON\nFRENCH PAINTIN\nVia Skype\nGeneva, Switzerland\n12:59 AM\nNEW TONIGHT\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nLIVE\nCNN\n3:59 PM PT\nAHU AT ODDS AGAIN AFTER US PRESIDENT SAYS ISRAELI PM NOT DOING ENOUGH IN C SITUATION ROOM\n

台視新聞 HD\n吳念庭 Nienting Wu\n02:45:07.\n5小時。\n大\n傷勢狀況目前都還好,雖然無法一一回覆,但都有限\n大家的關心訊息。\n在\n其他球員\n再次感謝外界的關心人\n新 柯京華城貪案羈押庭\n開押\n完庭\n最新\n稍柯

THE WITH\nSITUATION WOLF\nITZER\n翻攝 吳念庭臉書\nLIVE\n台北\n雅典\n25-32\n臉書發文報平安 吳念庭呼籲:別針對其他球員\nCAN\nNAS 577.33\nCHAYES SONG \"HOLD ON, I'M COMING.\" THE DECISION CAME AFTER THE LATE R&B SI SITUATION ROOM\nTTVNEWS 警民口角\"閃燈\"查酒駕致恐慌症發作 控警執法過當

The 9×1 horizontal montage yields:

Top News\nBloomberg Television\nCHINA'S AUGUST PMI\nGAUGE\nManufacturing\nNon-Manufacturing\nACTUAL\nSURVEY\n49.1\n49.5\n50.3\n50.1\nCaixin Manufacturing\n50.4\n50.0\nCHINA GROWTH HEADWINDS MOUNT AS\nFACTORY AND HOUSING DATA WORSEN\nToday's Most Read Next In 0:16\nCrypto Firm OKX Hires Gracie Lin\nFrom Grab to Be Singapore CEO\nAustralia PM's Approval Drops as\nVoters Face Cost-of-Living Pain\nValuePartners Founder to Quit as\nChinese Broker Asserts Control\nNikkei May Add Ryohin Keikaku, Cut\nNippon Paper, Analysts Say\nNew World Shares Drop 14% After\nFirst Loss Warning in 20 Years\nTata's Fast Fashion Giant Is\nDefying India's Consumer Slowdown\nCSI 300\n3283.71\n-37.72 1.14%\nSENSEX (C)\n82365.77\n+231.16 0.28%\nTAIEX\n22266.62\n-1.47 0.01%\nTOPIX\n2712.56\n-0.07 0%\nVia Skype\nGeneva, Switzerland\n12:59 AM\nCM WASHINGTON\nGold Steadies Ahead of US Jobs Data\nThat May Shape Fed Rate Path\nNZX 50\n12471.61\n+23.93 0.19%\nNEW TONIGHT\nAustralia PM's Approval Drops as Voters Face\nCost-of-Living Pain\nPage 2 Next In 0:04\n➤ On a two-party preferred basis, Labor is tied with the center-\nright Liberal National opposition at 50-50, a result that if\nreplicated at an election would likely force one of them to\ngovern with the support of minority parties.\n► Albanese's faltering approval ratings come as Reserve Bank\nofficials make clear that the key rate is likely to remain at\na 12-year high of 4.35% for the rest of this year.\nLater Today \"The Pulse\"\nUrsula Marchioni\nSTI\n3455.08\nBlackRock International\nLimited, Mng Dir/Head:Investment & Portfolio Solutions EMEA\n4AM NY 9AM UK | 4PM HK\n+12.15 0.35%\n11:01 ET SEP 1, 2024

FRENCH PAINTIN\nWORLD HEALTH ORGANIZATION SAYS GAZA POLIO\nVACCINATION CAMPAIGN IS AHEAD OF TARGETS\nLIVE\nCAN\n3:59 PM PT\nAHU AT ODDS AGAIN AFTER US PRESIDENT SAYS ISRAELI PM NOT DOING ENOUGH IN C SITUATION ROOM

THE WITH\nSITUATION WOLF\nUTZER\nLIVE\n台北\nCAN\nNAS 577.33\nCHAYES SONG \"HOLD ON, I'M COMING.\" THE DECISION CAME AFTER THE LATE R&B SI SITUATION ROOM

開押\n完庭\n翻攝 吳念庭臉書\n2. 臉書發文報平安 吳念庭呼籲:別針對其他球員\n25-32\nTTVNEWS 警民口角\"閃燈\"查酒駕致恐慌症發作 控警執法過當\n傷大\n在\n吳念庭 Nienting Wu\n5小時。\n1%\n6 台視新聞HD\n02:45:07.\n傷勢狀況目前都還好,雖然無法一一回覆,但都有!\n大家的關心訊息。\n其他球員\n再次感謝外界的關心人\n最新\n最 稍早22:59開完\n新 柯京華城貪案羈押庭

You can see noticeable differences in the three OCR transcriptions, especially around intermixing of text from different frames into the same output, such as how the 2×2 grid blends the CNN Wolf Blitzer and Taiwanese frames together: "THE WITH\nSITUATION WOLF\nITZER\n翻攝 吳念庭臉書\nLIVE\n台北\n雅典\n25-32\n臉書發文報平安 吳念庭呼籲:別針對其他球員\nCAN\nNAS 577.33\nCHAYES SONG \"HOLD ON, I'M COMING.\" THE DECISION CAME AFTER THE LATE R&B SI SITUATION ROOM\nTTVNEWS 警民口角\"閃燈\"查酒駕致恐慌症發作 控警執法過當".

In reality, Cloud Vision transcribes an image as individual character sequences, each anywhere from a single character up to an entire word, along with a surrounding bounding box for each. The API helpfully offers an overall processed concatenation of these text chunks, in which it uses the bounding boxes to reassemble them into a single paragraph of text under the assumption that the input is a single underlying image rather than a montage like the ones we are using. Thus, the API actually transcribes each of the CNN Wolf Blitzer and Taiwanese broadcast text chunks correctly as individual annotations; it is only the summary text blob that incorrectly blends them together. A production OCR pipeline would ignore the summary blob and assemble the text from these individual character sequence annotations, but this adds substantial complications: words in English and other space-segmented languages should be assembled with spaces between them, while text in scriptio continua languages should be concatenated without spaces. Such a pipeline would have to recreate all of this logic.
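A minimal sketch of what that reassembly might involve is shown below. It does not call the Cloud Vision client; instead it assumes the annotations have already been flattened into plain dictionaries with hypothetical keys (`text`, `x_min`, `y_min`, `x_max`, `y_max`). The script detection is a rough heuristic covering only CJK ideographs and kana (other unspaced scripts such as Thai would need their own handling), and the reading order is a naive top-to-bottom, left-to-right sort.

```python
import unicodedata
from collections import defaultdict


def is_unspaced(token):
    """Rough heuristic: True if the token contains CJK ideographs or kana."""
    prefixes = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA")
    return any(unicodedata.name(ch, "").startswith(prefixes) for ch in token)


def frame_index(word, frame_w, frame_h, cols):
    """Assign a word to a montage tile using the center of its bounding box."""
    cx = (word["x_min"] + word["x_max"]) / 2
    cy = (word["y_min"] + word["y_max"]) / 2
    return int(cy // frame_h) * cols + int(cx // frame_w)


def join_tokens(tokens):
    """Join tokens with spaces, except between two adjacent unspaced-script tokens."""
    out, prev = "", None
    for tok in tokens:
        if prev is not None and not (is_unspaced(prev) and is_unspaced(tok)):
            out += " "
        out += tok
        prev = tok
    return out


def reassemble(words, frame_w, frame_h, cols):
    """Group word annotations by frame, then rebuild each frame's text."""
    frames = defaultdict(list)
    for w in words:
        frames[frame_index(w, frame_w, frame_h, cols)].append(w)
    result = {}
    for idx, ws in sorted(frames.items()):
        # Naive reading order; real text lines need tolerance for uneven baselines.
        ws.sort(key=lambda w: (w["y_min"], w["x_min"]))
        result[idx] = join_tokens([w["text"] for w in ws])
    return result


# Hypothetical annotations from a 2x2 montage of 100x100 frames.
words = [
    {"text": "CNN", "x_min": 50, "y_min": 10, "x_max": 80, "y_max": 20},
    {"text": "LIVE", "x_min": 10, "y_min": 10, "x_max": 40, "y_max": 20},
    {"text": "雅典", "x_min": 130, "y_min": 110, "x_max": 150, "y_max": 120},
    {"text": "台北", "x_min": 110, "y_min": 110, "x_max": 130, "y_max": 120},
]
per_frame = reassemble(words, frame_w=100, frame_h=100, cols=2)
```

Even this toy version has to decide on frame assignment, reading order and spacing rules, all of which the API's summary blob otherwise handles.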

In contrast, the vertical montage makes this process far easier: the downstream reconstruction logic need only follow the summary blob word by word and split it after a line break whenever the text has crossed the Y coordinate that marks the start of the next frame.
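The splitting step for the vertical case could look roughly like the following sketch. It assumes the word annotations arrive in the same order as the words in the summary blob and have been flattened into dictionaries with hypothetical keys (`text`, `y_min`, `y_max`), and that every frame in the column has the same height `frame_h`.

```python
def split_vertical(summary, words, frame_h):
    """Split a summary blob into per-frame text for a single-column montage.

    Walks the blob line by line, matching each line's words against the
    annotations in order, and files the line under the frame that the
    vertical center of its first matched word falls into.
    """
    frames = {}
    i = 0        # index of the next unconsumed word annotation
    current = 0  # frame currently being filled; only ever moves downward
    for line in summary.split("\n"):
        remaining = line
        first_y = None
        while i < len(words):
            stripped = remaining.lstrip()
            tok = words[i]["text"]
            if not stripped.startswith(tok):
                break  # this annotation belongs to a later line
            if first_y is None:
                first_y = (words[i]["y_min"] + words[i]["y_max"]) / 2
            remaining = stripped[len(tok):]
            i += 1
        if first_y is not None:
            current = max(current, int(first_y // frame_h))
        frames.setdefault(current, []).append(line)
    return {idx: "\n".join(lines) for idx, lines in frames.items()}


# Hypothetical two-frame column with 100-pixel-tall frames.
summary = "LIVE CNN\nSITUATION ROOM\n台北 雅典"
words = [
    {"text": "LIVE", "y_min": 10, "y_max": 20},
    {"text": "CNN", "y_min": 10, "y_max": 20},
    {"text": "SITUATION", "y_min": 80, "y_max": 90},
    {"text": "ROOM", "y_min": 80, "y_max": 90},
    {"text": "台北", "y_min": 110, "y_max": 120},
    {"text": "雅典", "y_min": 110, "y_max": 120},
]
by_frame = split_vertical(summary, words, frame_h=100)
```

Because the blob's own spacing and line breaks are kept as-is, no language-specific joining rules are needed here.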