As we continue ramping up our experiments applying GCP's Cloud Vision API OCR to television news, the findings and capabilities it reveals never cease to amaze us. For example, below is a frame from a Thai-language television news broadcast, featuring a speaker seated in front of a projection screen displaying a slide. The broadcast is more than a decade old and originally aired in SD at 640×480, which is extremely low resolution for the purposes of OCR. The image quality is also subpar: blurring, saturation and other color artifacts, and heavy JPEG compression block artifacts all make the text hard to read. On top of that, the text sits at a significant angle to the screen and is partially obscured by the speaker.
Remarkably, despite all of these challenges, Cloud Vision manages to transcribe a usable portion of the text, demonstrating just how powerful classical AI OCR is:
รวมสิรินธรเพื่อการฟื้นฟูสมรรถภา cing Rehabilitation Service ระหว่างวันที่ ๒๐ มีนาคม - แรมเซ็นทราศูนย์ราชการและ
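For readers who want to try something similar, here is a minimal sketch of how a frame like this might be submitted to the Cloud Vision REST endpoint (`images:annotate`). It only builds the JSON request body and pulls the text out of a response; authentication and the HTTP call itself are left out. The helper names `build_ocr_request` and `extract_text` are hypothetical, and the choice of the `TEXT_DETECTION` feature and the `th` language hint are our assumptions for illustration, not a record of the exact settings used for the result above.

```python
import base64

# Public REST endpoint for synchronous image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes, language_hints=("th",)):
    """Build the JSON body for a one-image OCR request (hypothetical helper).

    Assumptions: TEXT_DETECTION as the feature type and a Thai language
    hint. Both are illustrative choices, not settings taken from the post.
    """
    request = {
        # Image bytes are sent inline as base64 text.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION"}],
    }
    if language_hints:
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}


def extract_text(response):
    """Return the full transcription from a parsed response, or "" if none."""
    first = response.get("responses", [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")
```

The body returned by `build_ocr_request` would be sent as an authenticated POST to `VISION_ENDPOINT`, and the parsed JSON reply passed to `extract_text`.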
What if we ask ChatGPT 4o to transcribe the text, using the following prompt?
OCR the text in this image. Do not use Python. Transcribe the text exactly as-is in the image.
This yields the following hallucinated transcription, once again demonstrating why LMM OCR is at present unusable for production OCR tasks:
"บริการฟื้นฟูสมรรถภาพ" "Vocational Training Rehabilitation Services"