GEN4: Building A Complete Near-Realtime Live Stream Video Analytics Platform In The Cloud In Just A Few Lines Of Code

Given the growing use of live streaming video across the world, from speeches by heads of state to news programming, how might we build processing pipelines that can ingest and process such content continuously, 24/7? Today we're going to look at a simple pipeline using youtube-dl and ffmpeg that performs the ingest, and how, with just a few additional steps, you can build a complete analytics platform for video live streams in the cloud!

Historically, ingesting, processing and analyzing video live streams required extensive infrastructure. Today, through incredible open source tools like youtube-dl and ffmpeg and the power of the cloud, we can build a complete live stream analytics platform with just a few lines of code.

To ingest a video live stream, all you need is the URL of the page containing the live stream (ideally the "embed" version of the live stream if you can find it, though for many platforms you can just use the overall hosting page). We use youtube-dl to ingest it as an endless stream and pipe the output to ffmpeg to shard it into self-contained, individually playable MP4 files.

First, we install the tools we need:

apt-get -y install ffmpeg
apt-get -y install youtube-dl
pip install --upgrade youtube_dl

We run the upgrade process for youtube-dl, since the version packaged with most Linux distributions is outdated. The extractor plugins for youtube-dl change continuously in response to changes on the underlying websites, so you'll want to update it regularly on your machine.
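
For example, a weekly cron entry along these lines (a minimal sketch; adjust the schedule and the path to pip for your environment) keeps the extractors current:

0 3 * * 0 pip install --upgrade youtube_dl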

Believe it or not, the entire process of ingesting the live stream, formatting it into MP4 and outputting to an individual sequence of timestamped MP4 files, one per minute, is accomplished with a single line:

youtube-dl -q -f best -o - "https://videohostingsite/pagecontainingthevideo" | ffmpeg -hide_banner -loglevel error -i - -c copy -flags +global_header -f segment -segment_time 60s -segment_atclocktime 1 -strftime 1 -segment_format_options movflags=+faststart -reset_timestamps 1 VIDEO_%Y%m%d%H%M00.mp4&

That's literally all the code you need! After a few seconds of initial negotiation, you will begin to see files like "VIDEO_20221108030800.mp4", "VIDEO_20221108030900.mp4" and "VIDEO_20221108031000.mp4" appear in the directory as the output is sharded to a new file precisely every minute, resulting in an endless stream of one-minute MP4 files.

While each one-minute file is being streamed to disk, it cannot be read, since the moov atom containing the mvhd header and trak atoms is not written until the end (and is then moved to the front via the faststart flag). This means we can't "see" the ingested video until the end of each minute, imposing substantial latency. In some applications this may not be problematic, and having a smaller number of sharded files (1,440 one-minute MP4 files per day) may be ideal. In applications where latency matters more, you could simply stream the output of ffmpeg directly to your downstream processor, or change "-segment_time 60s" to "-segment_time 1s" and "VIDEO_%Y%m%d%H%M00.mp4" to "VIDEO_%Y%m%d%H%M%S.mp4" to shard into one-second files, as in the sketch below (though this increases the risk of brittleness if the disk cannot sustain the additional IO and inode load).
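
For reference, the one-second variant of the pipeline (a sketch only, using the same placeholder URL as above) looks like:

youtube-dl -q -f best -o - "https://videohostingsite/pagecontainingthevideo" | ffmpeg -hide_banner -loglevel error -i - -c copy -flags +global_header -f segment -segment_time 1s -segment_atclocktime 1 -strftime 1 -segment_format_options movflags=+faststart -reset_timestamps 1 VIDEO_%Y%m%d%H%M%S.mp4&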

Even when capturing 1080p30 live streaming video, this pipeline consumes less than 0.05% CPU on a 1-core N1 VM, meaning you can run as many of these on a single machine as the underlying disk supports (or, better yet, write them all to a RAM disk in the VM).
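
If you go the RAM disk route, a minimal sketch looks like the following (the mount point and the 2GB size are hypothetical; size it to your stream bitrate and how long files linger before upload):

mkdir -p /videoramdisk
mount -t tmpfs -o size=2g tmpfs /videoramdisk

Then simply point the ffmpeg output pattern at /videoramdisk/VIDEO_%Y%m%d%H%M00.mp4 and the sharded files never touch physical disk.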

By default this downloads the highest-quality (highest resolution and/or bitrate) version of the video. To conserve disk space, you can select a lower-resolution format by running:

youtube-dl --list-formats "https://videohostingsite/pagecontainingthevideo"

And then changing "-f best" to "-f X" where X is the desired format listed by youtube-dl.

Some live streams may be restricted to specific geographic regions. These restrictions can be satisfied by running the ingest VM in a GCP region within the required geographic area or, in some cases, via a VPN, or by running the ingest command on a remotely hosted node that relays the data back to GCS, either through a proxy or via gsutil running locally on that node.

A simple script, written in any desired language and run via cronjob every X seconds, can upload the newly completed files to GCS via gsutil. In the pipeline above, for example, a script runs via cron every 60 seconds, copies the file from 2 minutes ago to GCS via gsutil (since the previous minute's file may still be wrapping up with the moov move to the front) and then deletes the local copy.
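
A minimal sketch of such an upload script, shown here as a shell script (the bucket name gs://yourbucket/ is a placeholder, and it assumes the VM's timezone matches the timestamps ffmpeg writes into the filenames):

#!/bin/sh
# Copy the fully written file from 2 minutes ago to GCS, then remove the local copy.
FILE="VIDEO_$(date -d '2 minutes ago' +%Y%m%d%H%M)00.mp4"
if [ -f "$FILE" ]; then
    gsutil -q cp "$FILE" gs://yourbucket/ && rm -f "$FILE"
fi

Invoked from cron once a minute, for example via "* * * * * cd /videoramdisk && ./upload.sh".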

Once in GCS, any process on any GCE VM can access and process the files. For example, in the pipeline above, a process can run via cronjob on a larger VM (including in a different region to take advantage of region-specific accelerators) and concatenate the one-minute MP4 files into contiguous 30-minute MP4 files that are more amenable to certain kinds of analyses. Alternatively, the one-minute files can be run through ASR, video analytic models and other kinds of AI models, running on VMs across the world or via hosted APIs, with their results streamed back into GCS, into BigQuery, into the Timeseries Insights API or into other tools for downstream analytics.
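
As one illustration, the concatenation step could be as simple as the following sketch, run after copying a half hour's worth of one-minute files down from GCS (the date and hour here are purely illustrative):

# Build a concat list covering minutes 00-29 of the 03:00 hour on 2022-11-08.
for f in VIDEO_2022110803[0-2][0-9]00.mp4; do echo "file '$f'"; done > list.txt
# Stitch them into a single contiguous 30-minute MP4 without re-encoding.
ffmpeg -f concat -safe 0 -i list.txt -c copy VIDEO_20221108_030000_30MIN.mp4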

In other words, simply by changing the segmentation time you can adjust the tradeoff between the number of files and latency, and any amount of downstream processing can be performed simply by connecting to the GCS path via Pub/Sub notifications or a fixed cron schedule.
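
For the Pub/Sub route, registering notifications on the bucket is itself a one-liner (the bucket and topic names here are placeholders), after which every newly uploaded minute of video publishes a message that any downstream worker can subscribe to:

gsutil notification create -t livestream-ingest -f json gs://yourbucket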

For example, using the ingest pipeline above, a simple Perl script can be run via cron every minute to copy the file from 2 minutes ago into GCS. Separately, a 2-core VM with an attached V100 GPU in another region runs a process via cron every minute that takes the file from 3 minutes ago (to afford headroom for the copy to/from GCS), runs it through a series of AI models and writes their outputs to GCS, with a notification to BigQuery to load the results into a table for further analysis.
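
A sketch of that downstream worker, shown here as a shell script for brevity (run_models.sh, the bucket, dataset and table names are all placeholders for whatever models and storage you use, and it assumes the destination BigQuery table already exists):

#!/bin/sh
# Fetch the one-minute file from 3 minutes ago, analyze it, and ship the results onward.
FILE="VIDEO_$(date -d '3 minutes ago' +%Y%m%d%H%M)00.mp4"
gsutil -q cp "gs://yourbucket/$FILE" . || exit 0
./run_models.sh "$FILE" > "$FILE.json"     # placeholder for your ASR/vision/AI models
gsutil -q cp "$FILE.json" gs://yourbucket/results/
bq load --source_format=NEWLINE_DELIMITED_JSON yourdataset.yourtable "gs://yourbucket/results/$FILE.json"
rm -f "$FILE" "$FILE.json"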

Thus, with just a few lines of code you can ingest and process any video livestream in near-realtime!