Yesterday we explored the challenges of OCR'ing television news, in which a single change to one portion of an image can negatively impact the model's ability to reconstruct the individual text blocks into sequential reading-order text. How do Large Multimodal Models (LMMs) stack up?
The end result is that, as in our previous tests, ChatGPT 4o, Gemini Flash and Gemini 1.5 Pro are capable of superior text grouping, accurately merging related OCR blocks in most cases. At the same time, in at least one case, Gemini Flash replicated a grouping error identical to one made by Cloud Vision. Pricing is only modestly higher (about 26%), with Gemini 1.5 Pro costing $1,892 per million frames versus $1,500 per million frames for Cloud Vision API. In many cases, the LMMs yield superior grouping, correctly reconstructing the tabular nature of the underlying frame.
The problem is that the output is so unstable that sometimes it contains a correct rendition of the frame, other times it contains hallucinations and still other times it drops entire swaths of the image's text. Running the same image with the same prompt is just as likely to yield a correct transcription as it is to exclude much of the frame's text or critical details and/or include transcriptions of text that wasn't actually present in the image, with absolutely no indication of which output is the correct one. Once again, we find that LMMs have tremendous potential, but are hamstrung by their instability.
Let's start with the image that the classical OCR model performed well on:
ChatGPT 4o does an excellent job of OCR'ing the frame:
Bloomberg Television India Clears Steps to Curb Trading Rush in Stock Derivatives Page | Next in 0:29 India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. ---------------------------- CRYPTOCURRENCY - CRYP COINS XBT: 62,293.30 (-1,491.79, 2.32%) XET: 2,512.335 (-101.270, 3.93%) XLC: 65.063 (-2.335, 3.53%) XRP: 0.6142 (-0.0096, 1.54%) XBN: 328.09 (-15.85, 4.62%) XDH: 23,406 (-1,384, 5.63%) XEO: 0.4906 (-0.0383, 7.24%) XTH: 18.852 (-0.890, 4.53%) XMR: 148.885 (-5.570, 3.63%) XZC: 27.553 (-0.583, 2.13%) ---------------------------- Later Today "The Close" 3PM NY | 8PM UK | 3AM HK ---------------------------- Indices STOXX50 (C): 4,954.15 (-46.30, 0.93%) 12:29 ET OCT 1, 2024
However, a closer look shows that ChatGPT did not perform the OCR itself: it merely wrote Python code that calls Tesseract, which happened to work well on this specific frame:
from PIL import Image
import pytesseract

# Load the image to perform OCR
image_path = "/mnt/data/BLOOMBERG_20241001_163000_Bloomberg_Markets-frame-000005.jpg"
image = Image.open(image_path)

# Perform OCR on the image
ocr_text = pytesseract.image_to_string(image)
ocr_text

>'Bloomberg Television\n\nIndia Clears Steps to Curb Trading Rush in Stock\nDerivatives Page | Next in 0:29\n\n> India\'s securities regulator on Tuesday introduced new steps\nto limit a surge in equity derivatives trading in the nation\nafter growing retail participation took the speculative bets\nto the highest in the world.\n\n> The measures include limiting weekly options contracts to one\nper exchange, upfront collection of options premium from\ntraders, and raising the minimum contract size to at least 1.5\nmillion rupees, according to a circular published on the\nSecurities and Exchange Board of India\'s website.\n\nCRYPTOCURRENCY -CRYP\n\nCOINS\n\nXBT 2202 :\n4017. 282%\n\nXET 2512.335\n101.270 3.93%\n\nxLe 65.063\n2.335 3.53%\n\nXRP 0.6142\n-0.0096 1.54%\n\nXBN 328.09\n15.85 4.62%\n\nXDH 23,406\n-1.384 5.63%\n\nXEO 0.4906\n-0.0383 7.24%\n\nXTH 18.852\n0.890 4.53%\n\nXMR 148.885\n-5.570 3.63%\n\nxze 27.553\n-0.583 2.13%\n\nLater Today\n\n"The Close"\n\n3PM NY | 8PM UK | 3AM HK\n\n[STOXX50(C) |\n4954.15\n-46.30 0.93%\n\n1229ET OCT 1, 2024\n\n'
What if we ask it to perform OCR without using an external library?
OCR this image without using Python.
This yields the even more impressive:
Bloomberg Television Top News India Clears Steps to Curb Trading Rush in Stock Derivatives India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. Cryptocurrency - CRYP COINS Price Change % XBT 62,293.30 -1491.79 2.32% XET 2512.335 -101.270 3.93% XLC 65.083 -2.335 3.53% XRP 0.614 -0.0096 1.54% XBN 328.09 -15.85 4.62% XDH 23.406 -1.384 5.59% XEO 0.4906 -0.0383 7.24% XTH 18.852 -0.890 4.53% XMR 148.885 -5.570 3.62% XZC 27.553 -0.583 2.13% Major Indexes DOW: 42,184.34 (-145.81 / 0.34%) S&P 500: 5,709.94 (-52.54 / 0.91%) NASDAQ: 17,900.39 (-288.78 / 1.59%) SMI (C): 12,086.66 (-82.21 / 0.68%) OMX (C): 2,611.34 (-15.09 / 0.57%) STOXX50 (C): 4,954.15 (-46.30 / 0.93%) Later Today "The Close" 3 PM NY | 8 PM UK | 3 AM HK Time: 12:29 ET, October 1, 2024
In contrast, Gemini 1.5 Flash performs poorly, extracting just a fraction of the text and, worse, hallucinating the text it does output, yielding a rewritten summary instead of the original text:
Here's a summary of the provided text: Headline: India Clears Steps to Curb Trading Rush in Stock Derivatives Summary: India's securities regulator implemented new measures to curb a surge in equity derivatives trading. This action follows a significant increase in retail participation, pushing speculative bets to record highs globally. The measures involve limiting weekly options contracts to a single exchange, requiring upfront payment of premiums, and increasing the minimum contract size to at least 1.5 million rupees. This information is from a circular released by the Securities and Exchange Board of India.
What if we run a second time? Again it focuses on summarization over faithful transcription:
Here's a summary of the provided text: Headline: India Clears Steps to Curb Trading Rush in Stock Derivatives News Article: India's securities regulator implemented new measures to control the surge in equity derivatives trading. This is a response to increased retail participation, pushing speculative bets to record highs. The measures include: Limiting weekly options contracts to a single exchange. Requiring upfront payment of options premiums from traders. Increasing the minimum contract size to at least 1.5 million rupees. This information was released via a circular on the Securities and Exchange Board of India's website. The image also displays various financial market data, including cryptocurrency prices (XBT, XET, XLC, XRP, XBN, XDH, XEO, XTH, XMR, XZC), and major stock market indices (Dow, S&P 500, NASDAQ, SMI, OMX, STOXX50). The bottom right corner indicates the date and time as October 1, 2024. A segment marked "Later Today" with the words "The Close" suggests a news segment about the closing of the market.
What if we try a slightly adjusted prompt?
OCR the following image. Output the text as-is exactly as it appears in the image.
Fascinatingly, despite this image working perfectly under Cloud Vision, here Flash exhibits the same block-merging failure that Cloud Vision demonstrates on the second image:
Top News Bloomberg Television CRYPTOCURRENCY-CRYP DOW COINS 42184.34 -145.81 0.34% XBT 62293.30 -1491.79 2.32% XET 2512.335 S&P 500 -101.270 3.93% XLC 65.063 5709.94 -2.3353.53% -52.54 0.91% XRP 0.6142 -0.0096 1.54% NASDAQ XBN 328.09 17900.39 -15.85 4.62% -288.78 1.59% XDH 23.406 -1.384 5.63% XEO 0.4906 SMI (C) -0.0383 7.24% XTH 18.852 12086.66 -0.890 4.53% -82.21 0.68% XMR 148.885 -5.570 3.63% 27.553 -0.583 2.13% OMX (C) 2611.34 -15.09 0.57% Later Today India Clears Steps to Curb Trading Rush in Stock Derivatives Page 1 Next In 0:29 ► India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. ► The measures include limiting weekly options contracts to one exchange, upfront collection of options premium from XZC traders, and raising the minimum contract size to at least 1.5 "The Close" million rupees, according to a circular published on the Securities and Exchange Board of India's website. 3PM NY 8PM UK | 3AM HK 12:29 ET OCT 1, 2024 STOXX50 (C) 4954.15 -46.30 0.93%
What about Gemini 1.5 Pro using the original prompt? It changes the ordering of the text and once again hallucinates a summarized version of the text, rather than transcribing it verbatim:
CRYPTOCURRENCY - CRYP Coin Price Change % Change XBT 62293.30 -1491.79 2.32% XET 2512.335 -101.270 3.93% XLC 65.063 -2.335 3.53% XRP 0.6142 -0.0096 1.54% XBN 328.09 -15.85 4.62% XDH 23.406 -1.384 5.63% XEO 0.4906 -0.0383 7.24% XTH 18.852 -0.890 4.53% XMR 148.885 -5.570 3.63% XZC 27.553 -0.583 2.13% Major Indices DOW: 42184.34 (-145.81, 0.34%) S&P 500: 5709.94 (-52.54, 0.91%) NASDAQ: 17900.39 (-288.78, 1.59%) SMI (C): 12086.66 (-82.21, 0.68%) OMX (C): 2611.34 (-15.09, 0.57%) STOXX50 (C): 4954.15 (-46.30, 0.93%) Top News India Clears Steps to Curb Trading Rush in Stock Derivatives India's securities regulator introduced new steps to limit a surge in equity derivatives trading. Growing retail participation has led to increased speculative bets, reaching the highest levels globally. Measures include limiting weekly options contracts per exchange, upfront collection of options premiums, and raising the minimum contract size to 1.5 million rupees. "The Close" 3 PM NY | 8 PM UK | 3 AM HK 12:29 ET Oct 1, 2024
What if we ask it to reproduce the text as-is?
OCR the following image. Output the text as-is exactly as it appears in the image.
This time we get correct results. The response took 49 seconds to produce, consuming 280 input tokens and producing 570 output tokens. The price to OCR this single image using Gemini 1.5 Pro is roughly $0.00189, compared with $0.0015 for Cloud Vision API. Per million frames, this works out to $1,892 with Gemini 1.5 Pro and $1,500 for Cloud Vision API.
CRYPTOCURRENCY-CRYP COINS XBT 62293.30 -1491.79 2.32% XET 2512.335 -101.270 3.93% XLC 65.063 -2.335 3.53% XRP 0.6142 -0.0096 1.54% XBN 328.09 -15.85 4.62% XDH 23.406 -1.384 5.63% XEO 0.4906 -0.0383 7.24% XTH 18.852 -0.890 4.53% XMR 148.885 -5.570 3.63% XZC 27.553 -0.583 2.13% DOW 42184.34 -145.81 0.34% S&P 500 5709.94 -52.54 0.91% NASDAQ 17900.39 -288.78 1.59% SMI (C) 12086.66 -82.21 0.68% OMX (C) 2611.34 -15.09 0.57% STOXX50 (C) 4954.15 -46.30 0.93% Later Today "The Close" 3PM NY | 8PM UK | 3AM HK 12:29 ET OCT 1, 2024 Top News India Clears Steps to Curb Trading Rush in Stock Derivatives Page 1 Next In 0:29 ►India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. ►The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. Bloomberg Television
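The per-million figures quoted above are a straight scaling of the per-frame price. As a quick sanity check, the sketch below derives the per-frame cost and the relative premium from the two per-million prices. The constant and function names are ours for illustration, and actual costs will vary with each frame's token counts.

```python
# Per-million-frame prices (USD) as quoted in the text above.
GEMINI_PRO_PER_MILLION = 1892.0
CLOUD_VISION_PER_MILLION = 1500.0


def per_frame(per_million: float) -> float:
    """Convert a price per million frames into a price per single frame."""
    return per_million / 1_000_000


def premium_percent(price: float, baseline: float) -> float:
    """How much more expensive `price` is than `baseline`, in percent."""
    return (price / baseline - 1.0) * 100.0


gemini_frame = per_frame(GEMINI_PRO_PER_MILLION)
vision_frame = per_frame(CLOUD_VISION_PER_MILLION)
premium = premium_percent(GEMINI_PRO_PER_MILLION, CLOUD_VISION_PER_MILLION)

print(f"Gemini 1.5 Pro: ${gemini_frame:.6f} per frame")
print(f"Cloud Vision:   ${vision_frame:.6f} per frame")
print(f"Premium:        {premium:.1f}%")
```

In other words, on this frame Gemini 1.5 Pro costs about 26% more than Cloud Vision.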
What about the image that posed problems for our classical OCR model?
ChatGPT 4o yields strong results using the original prompt, though it hallucinates the phrase "Markets Overview", which does not appear in the image:
Bloomberg Television India Clears Steps to Curb Trading Rush in Stock Derivatives India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. --------------- CRYPTOCURRENCY - CRYP<GO> XBT: 62293.30 -1491.79 -2.32% XET: 2512.335 -101.270 -3.93% XLC: 65.063 -2.335 -3.53% XRP: 0.6142 -0.0096 -1.54% XBN: 328.00 -15.85 -4.62% XDH: 23.406 -1.384 -5.63% XEO: 0.4906 -0.0383 -7.24% XTH: 18.852 -0.890 -4.53% XMR: 148.885 -5.570 -3.61% XZC: 27.553 -0.583 -2.13% --------------- Markets Overview: DOW: 42183.71 -146.44 -0.35% S&P 500: 5709.86 -52.62 -0.91% NASDAQ: 17900.48 -288.69 -1.59% SMI (C): 12086.66 -82.21 -0.68% OMX (C): 2611.34 -15.09 -0.57% STOXX50 (C): 4954.15 -46.30 -0.93% --------------- Later Today "The Close" 3 PM NY | 8 PM UK | 3 AM HK 12:29 ET OCT 1, 2024
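Additions like the "Markets Overview" label can be flagged automatically if a second transcription of the same frame is available to cross-check against. The following is a minimal sketch under that assumption: it treats another transcription (for example from a classical OCR engine) as the reference and lists words in the LMM output that never appear in it. The function names and the short excerpts are illustrative only, and a reference that is itself wrong will produce false alarms.

```python
import re


def words(text: str) -> list[str]:
    """Lowercased alphabetic words only; numbers and punctuation are ignored."""
    return re.findall(r"[A-Za-z]+", text.lower())


def unsupported_words(candidate: str, reference: str) -> list[str]:
    """Words in `candidate` that never occur in `reference`, in first-seen order."""
    known = set(words(reference))
    seen = set()
    flagged = []
    for word in words(candidate):
        if word not in known and word not in seen:
            seen.add(word)
            flagged.append(word)
    return flagged


# Short excerpts adapted from the outputs in this post.
reference = "DOW 42183.71 -146.44 0.35% S&P 500 5709.86 -52.62 0.91%"
candidate = "Markets Overview: DOW: 42183.71 -146.44 -0.35% S&P 500: 5709.86 -52.62 -0.91%"

flagged = unsupported_words(candidate, reference)
print(flagged)
```

This only catches invented words. Altered numbers or signs would need a separate numeric comparison.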
Let's try this again using our new prompt. This eliminates the hallucinated text:
Bloomberg Television Top News India Clears Steps to Curb Trading Rush in Stock Derivatives ► India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. ► The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. CRYPTOCURRENCY - CRYP COINS XBT 62293.30 2.32% XET 2512.335 0.93% XLC 65.063 3.53% XRP 0.6142 1.54% XBN 328.00 4.62% XDH 23.406 5.63% XEO 0.4906 7.24% XTH 18.852 4.53% XMR 148.885 3.61% XZC 27.553 2.13% DOW: 42183.71 -146.44 0.35% S&P 500: 5709.86 -52.62 0.91% NASDAQ: 17900.48 -288.69 1.59% SMI (C): 12086.66 -82.21 0.68% OMX (C): 2611.34 -15.09 0.57% STOXX50 (C): 4954.15 -46.30 0.93% Later Today "The Close" 3PM NY | 8PM UK | 3AM HK 12:29 ET OCT 1, 2024
Let's run a second time with the new prompt to compare. This time, it drops all of the labels for the transcribed crypto numbers and excludes the entire markets column.
Bloomberg Television India Clears Steps to Curb Trading Rush in Stock Derivatives Page 1 Next In 0:28 India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. CRYPTOCURRENCY - CRYP 101.270 3.93% 65.063 -2.335 3.53% 0.6142 -0.0096 1.54% 328.00 15.85 4.62% 23,406 1.384 5.63% 0.4906 -0.0383 7.24% 18.852 -0.890 4.53% 148.885 -5.570 3.63% 27.553 70.583 2.13% Later Today "The Close" 3PM NY | 8PM UK | 3AM HK 12:29 ET OCT 1, 2024
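Because nothing in the output signals that content was dropped, one inexpensive safeguard is to check each transcription for labels we expect to see on this kind of frame. The sketch below assumes the expected ticker and index names are known ahead of time (here taken from the outputs in this post). The function name is hypothetical, and this catches only missing labels, not missing values.

```python
import re

# Labels that appear on this particular Bloomberg layout, per the outputs above.
EXPECTED_LABELS = [
    "XBT", "XET", "XLC", "XRP", "XBN", "XDH", "XEO", "XTH", "XMR", "XZC",
    "DOW", "S&P 500", "NASDAQ", "SMI", "OMX", "STOXX50",
]


def missing_labels(text: str, expected: list[str]) -> list[str]:
    """Return the expected labels that do not occur as whole tokens in `text`."""
    missing = []
    for label in expected:
        # Lookarounds keep "DOW" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
        if re.search(pattern, text) is None:
            missing.append(label)
    return missing


# Excerpt from the degraded second run shown above.
degraded = "CRYPTOCURRENCY - CRYP 101.270 3.93% 65.063 -2.335 3.53% 0.6142 -0.0096 1.54%"
dropped = missing_labels(degraded, EXPECTED_LABELS)
print(f"{len(dropped)} of {len(EXPECTED_LABELS)} expected labels missing: {dropped}")
```

A run that fails this check could be retried or routed to a classical OCR engine rather than trusted as-is.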
What if we run a third time? This time it reverts to writing Python code and using Tesseract. However, while Cloud Vision successfully recognized all of the text in the image and merely struggled with one sentence being separated from the rest, ChatGPT 4o's use of Tesseract produces results almost identical to the run above. It is certainly suspicious that ChatGPT 4o would yield the same problematic results, with identical errors, both on its own and when using Tesseract:
Bloomberg Television India Clears Steps to Curb Trading Rush in Stock Derivatives Page 1 Next In 0:28 India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. CRYPTOCURRENCY - CRYP 101.270 3.93% 65.063 -2.335 3.53% 0.6142 -0.0096 1.54% 328.00 15.85 4.62% 23,406 1.384 5.63% 0.4906 -0.0383 7.24% 18.852 -0.890 4.53% 148.885 -5.570 3.63% 27.553 70.583 2.13% Later Today "The Close" 3PM NY | 8PM UK | 3AM HK 1229ET OCT 1, 2024
And the Python code it used:
from PIL import Image
import pytesseract

# Load the image from the file path
image_path = '/mnt/data/BLOOMBERG_20241001_163000_Bloomberg_Markets-frame-000006.jpg'
image = Image.open(image_path)

# Perform OCR on the image
extracted_text = pytesseract.image_to_string(image)
extracted_text

>'Bloomberg Television\n\nIndia Clears Steps to Curb Trading Rush in Stock\nDerivatives Page 1 Next In 0:28\n\n> India\'s securities regulator on Tuesday introduced new steps\nto limit a surge in equity derivatives trading in the nation\nafter growing retail participation took the speculative bets\nto the highest in the world.\n\n> The measures include limiting weekly options contracts to one\nper exchange, upfront collection of options premium from\ntraders, and raising the minimum contract size to at least 1.5\nmillion rupees, according to a circular published on the\nSecurities and Exchange Board of India\'s website.\n\nCRYPTOCURRENCY -CRYP\n=\n\n101.270 3.93%\n\n65.063\n-2.335 3.53%\n\n0.6142\n-0.0096 1.54%\n\n328.00\n15.85 4.62%\n\n23,406\n1.384 5.63%\n0.4906\n-0.0383 7.24%\n18.852\n-0.890 4.53%\n\n148.885\n-5.570 3.63%\n\n27.553\n70.583 2.13%\n\nLater Today\n\n"The Close"\n\n3PM NY | 8PM UK| 3AM HK\n\n1229ET OCT 1,2024\n\n'
Gemini 1.5 Flash, using our new prompt:
OCR the following image. Output the text as-is exactly as it appears in the image.
Yields much better results this time:
Bloomberg Television Bloomber Top News India Clears Steps to Curb Trading Rush in Stock Derivatives Page 1 Next In 0:28 ► India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. ► The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. CRYPTOCURRENCY - CRYP COINS XBT 62293.30 -1491.79 2.32% XET 2512.335 -101.270 3.93% XLC 65.063 -2.335 3.53% XRP 0.6142 -0.0096 1.54% XBN 328.09 -15.85 4.62% XDH 23.406 -1.384 5.63% XEO 0.4906 -0.0383 7.24% XTH 18.852 -0.890 4.53% XMR 148.885 -5.570 3.63% XZC 27.553 -0.583 2.13% Later Today "The Close" 3PM NY | 8PM UK | 3AM HK DOW 42183.71 -146.44 0.35% S&P 500 5709.86 -52.62 0.91% NASDAQ 17900.48 -288.69 1.59% SMI (C) 12086.66 -82.21 0.68% OMX (C) 2611.34 -15.09 0.57% STOXX50 (C) 4954.15 -46.30 0.93% 12:29 ET OCT 1, 2024
Using the same prompt, Gemini 1.5 Pro yields strong results as well, though it blends the columns – for example, connecting the Dow's 42183.71 to the XBT crypto coin:
CRYPTOCURRENCY-CRYP DOW COINS 42183.71 XBT 62293.30 -146.44 0.35% -1491.79 2.32% XET 2512.335 S&P 500 -101.270 3.93% 5709.86 XLC 65.063 -52.62 0.91% -2.335 3.53% XRP 0.6142 NASDAQ -0.0096 1.54% 17900.48 XBN 328.09 -288.69 1.59% -15.85 4.62% XDH 23.406 -1.384 5.63% SMI (C) XEO 0.4906 12086.66 -0.0383 7.24% -82.21 0.68% XTH 18.852 -0.890 4.53% XMR 148.885 OMX (C) -5.570 3.63% 2611.34 XZC 27.553 -15.09 0.57% -0.583 2.13% Later Today India Clears Steps to Curb Trading Rush in Stock Derivatives Page 1 Next In 0:28 STOXX50 (C) 4954.15 -46.30 0.93% "The Close" ► India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. ► The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. 3PM NY | 8PM UK | 3AM HK 12:29 ET OCT 1, 2024
Interestingly, however, running a second time in a fresh session, it incorrectly interjects the page number between the title and the text:
CRYPTOCURRENCY-CRYP COINS XBT 62293.30 -1491.79 2.32% XET 2512.335 -101.270 3.93% XLC 65.063 -2.335 3.53% XRP 0.6142 -0.0096 1.54% XBN 328.09 -15.85 4.62% XDH 23.406 -1.384 5.63% XEO 0.4906 -0.0383 7.24% XTH 18.852 -0.890 4.53% XMR 148.885 -5.570 3.63% XZC 27.553 -0.583 2.13% DOW 42183.71 -146.44 0.35% S&P 500 5709.86 -52.62 0.91% NASDAQ 17900.48 -288.69 1.59% SMI (C) 12086.66 -82.21 0.68% OMX (C) 2611.34 -15.09 0.57% STOXX50 (C) 4954.15 -46.30 0.93% India Clears Steps to Curb Trading Rush in Stock Derivatives Page 1 Next In 0:28 ▸India's securities regulator on Tuesday introduced new steps to limit a surge in equity derivatives trading in the nation after growing retail participation took the speculative bets to the highest in the world. ▸The measures include limiting weekly options contracts to one per exchange, upfront collection of options premium from traders, and raising the minimum contract size to at least 1.5 million rupees, according to a circular published on the Securities and Exchange Board of India's website. Later Today "The Close" 3PM NY | 8PM UK | 3AM HK 12:29 ET OCT 1, 2024
In fact, paste the two outputs above into a diff checker and you'll see the massive differences across runs:
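The same comparison can be scripted rather than pasted into a web tool. The sketch below uses Python's standard `difflib` to compare two runs token by token. The two strings are just the opening tokens of the two Gemini 1.5 Pro outputs above, and the function name is ours for illustration.

```python
from difflib import SequenceMatcher


def token_diff(run_a: str, run_b: str):
    """Compare two transcriptions as whitespace-separated token sequences.

    Returns (similarity ratio, tokens only in run_a, tokens only in run_b).
    """
    tokens_a, tokens_b = run_a.split(), run_b.split()
    matcher = SequenceMatcher(None, tokens_a, tokens_b, autojunk=False)
    only_a, only_b = [], []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            only_a.extend(tokens_a[a_start:a_end])
        if tag in ("insert", "replace"):
            only_b.extend(tokens_b[b_start:b_end])
    return matcher.ratio(), only_a, only_b


# Opening tokens of the first and second Gemini 1.5 Pro runs shown above.
RUN_1 = "CRYPTOCURRENCY-CRYP DOW COINS 42183.71 XBT 62293.30 -146.44 0.35% -1491.79 2.32%"
RUN_2 = "CRYPTOCURRENCY-CRYP COINS XBT 62293.30 -1491.79 2.32%"

ratio, only_in_run_1, only_in_run_2 = token_diff(RUN_1, RUN_2)
print(f"similarity: {ratio:.2f}")
print("only in run 1:", only_in_run_1)
print("only in run 2:", only_in_run_2)
```

A ratio well below 1.0 across repeated runs of the same frame is a simple, if crude, signal of the instability described here. Note that it measures disagreement between runs, not which run is correct.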