As we explore how deep learning video understanding systems like Google's Cloud Video API can help machines make sense of the visual world of television news, an interesting example comes from a July 14, 2010 broadcast of ABC World News with Diane Sawyer. A portion of the broadcast covered the failed Times Square bomber, including an excerpt of an Umar Media broadcast with Arabic text scrolling across the bottom of the screen, an example of which can be seen in the header of this post.
Google's Cloud Video API seamlessly OCR'd both the English and Arabic text frame by frame, yielding results like "العربية Umar Media abc NEWS .com العربية 10:08 طلاب أميركيين محتجزين لديها" (roughly, "American students held by them") and "لعربية Umar Media abc NEWS .com مقتل 5 جنود من قوات حلف شمالی" (roughly, "killing of 5 soldiers from North Atlantic [NATO] forces"). The API was never told to expect Arabic text in the broadcast; it automatically identified that a portion of the broadcast contained onscreen Arabic text and transparently OCR'd it alongside the English, enabling fully multilingual text search.
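To make this concrete, here is a minimal sketch of what such a text-detection request looks like with the `google-cloud-videointelligence` Python client, paired with a simple Unicode-range heuristic for flagging which OCR'd strings contain Arabic script. The cloud call requires GCP credentials and a real video URI, so it is wrapped in a function rather than run directly; the `detect_video_text` and `contains_arabic` names are our own, not part of the API.

```python
def detect_video_text(gcs_uri):
    """Run the Cloud Video Intelligence TEXT_DETECTION feature on a video.

    Requires the google-cloud-videointelligence package and GCP
    credentials, so the import is deferred to call time.
    """
    from google.cloud import videointelligence

    client = videointelligence.VideoIntelligenceServiceClient()
    operation = client.annotate_video(
        request={
            "features": [videointelligence.Feature.TEXT_DETECTION],
            "input_uri": gcs_uri,
        }
    )
    result = operation.result(timeout=600)
    # Each text annotation carries the recognized string plus the video
    # segments (with frame-level bounding boxes) where it appeared.
    return [t.text for t in result.annotation_results[0].text_annotations]


# Basic Unicode blocks covering Arabic script (our own heuristic,
# not something the API returns — it simply OCRs whatever it finds).
ARABIC_RANGES = ((0x0600, 0x06FF), (0x0750, 0x077F), (0x08A0, 0x08FF))


def contains_arabic(text):
    """Heuristic check for Arabic-script characters in an OCR'd string."""
    return any(lo <= ord(ch) <= hi for ch in text for (lo, hi) in ARABIC_RANGES)
```

A downstream indexing pipeline could use a check like `contains_arabic` to route OCR output to the appropriate language analyzer, since the API returns English and Arabic strings intermixed in the same annotation list.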
The Video API's OCR engine supports the same broad set of languages as Cloud Vision's text detection, from major world languages to many less widely spoken ones.
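Although the engine auto-detects languages, as the ABC broadcast example shows, the API also accepts optional BCP-47 language hints through the request's video context. A small sketch of assembling such a request as a plain dict (the bucket URI is a placeholder and the helper name is our own):

```python
def build_text_detection_request(input_uri, language_hints=None):
    """Assemble an annotate_video request dict for TEXT_DETECTION.

    language_hints is optional: the OCR engine auto-detects languages
    on its own, but hints can help with short or stylized onscreen text.
    """
    request = {"features": ["TEXT_DETECTION"], "input_uri": input_uri}
    if language_hints:
        request["video_context"] = {
            "text_detection_config": {"language_hints": language_hints}
        }
    return request


# Hypothetical URI; hint that the broadcast mixes English and Arabic.
req = build_text_detection_request("gs://my-bucket/broadcast.mp4", ["en", "ar"])
```

Omitting `language_hints` entirely, as in the broadcast discussed above, leaves language identification to the engine.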