The SRT closed captioning format is a widely used file format for storing machine-readable closed captioning transcripts of videos. A variety of tools exist to extract captioning from MPEG formats (such as cable and OTA feeds) and from ASR-derived or human-provided captioning on streaming video (such as applying yt-dlp to YouTube videos). For example, the "Television News Monitoring For All" workflow we unveiled this past August can readily download automatically generated SRT captioning files from many YouTube videos. Other captioning formats (such as TTXT) can be readily converted to SRT.
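To make the format concrete, here is a minimal Python sketch of reading an SRT file: each cue is a sequence number, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more lines of text, with cues separated by blank lines. The function name `parse_srt` and the sample captions are our own illustrative choices, and the sketch ignores less common features such as styling tags.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, caption_text) tuples."""
    cues = []
    # Normalize line endings, then split into cue blocks on blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = [ln.strip() for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                start = _seconds(*fields[:4])
                end = _seconds(*fields[4:])
                # Everything after the timing line is caption text.
                cues.append((start, end, " ".join(lines[i + 1:])))
                break
    return cues


# Hypothetical sample captions, used only for illustration.
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Good evening and welcome

2
00:00:04,500 --> 00:00:07,250
to tonight's broadcast.
"""

if __name__ == "__main__":
    for start, end, caption in parse_srt(SAMPLE):
        print(f"{start:8.3f} {end:8.3f} {caption}")
```

Keeping the start and end times alongside the text makes it possible to map anything an LLM says about the transcript back to a position in the video.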
To make it easier to experiment with using such content with LLMs, we have released a simple Perl script that accepts any SRT captioning file and runs it through Google's 32K PaLM model for summarization. The script can easily be adapted to any prompt or LLM vendor of your choice. We hope it makes it easier to perform at-scale LLM experiments.
For instructions on how to use it, see the technical details at the bottom of today's Russian TV news experiment.
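For readers who want to see the general shape of such a pipeline, the following is a rough Python sketch of the idea, not the released Perl script. It strips an SRT file down to plain text, wraps it in a summarization prompt, and hands the prompt to whatever model you choose. All names here (`srt_to_text`, `build_prompt`, `summarize_srt`) are hypothetical, the `llm` argument is a placeholder for any function that takes a prompt string and returns a string, and the `max_chars` limit is an arbitrary illustrative cap rather than a value derived from any model's token limit.

```python
import re

# Matches the start of an SRT timing line, e.g. "00:00:01,000 -->".
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->")


def srt_to_text(srt):
    """Strip cue numbers and timing lines, leaving only the spoken text.

    Caveat: a caption line made up solely of digits is treated as a cue
    number and dropped.
    """
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMING_LINE.match(line):
            continue
        # Skip a line identical to the previous one, since rolling
        # captions sometimes repeat text across consecutive cues.
        if kept and kept[-1] == line:
            continue
        kept.append(line)
    return " ".join(kept)


def build_prompt(transcript,
                 instruction="Summarize the following television news transcript.",
                 max_chars=100_000):
    """Combine an instruction with the transcript, truncated to max_chars.

    max_chars is an illustrative assumption, not a real model limit.
    """
    return f"{instruction}\n\nTRANSCRIPT:\n{transcript[:max_chars]}"


def summarize_srt(srt, llm, **prompt_options):
    """Run an SRT string through `llm`, any callable mapping str -> str."""
    return llm(build_prompt(srt_to_text(srt), **prompt_options))


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:04,000\nGood evening and welcome\n\n"
        "2\n00:00:04,500 --> 00:00:07,250\nto tonight's broadcast.\n"
    )
    # Stand-in for a real model call: simply echoes the prompt it receives.
    print(summarize_srt(sample, llm=lambda prompt: prompt))
```

Passing the model in as a plain callable keeps the SRT handling separate from any one vendor's client library, so swapping prompts or providers means changing only the `instruction` text or the function supplied as `llm`.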