GEN4: Realtime Video Ingest & The Power Of Cloud Networking

Powering the new TV Visual Explorer requires ingesting 33 HD resolution channels in near-realtime every day. The channels are collected from across the world by the Internet Archive's Television News Archive and span satellite, terrestrial cable, IPTV, IPTV geographic relay and IPTV VPN origin sources. They must be streamed from the Archive through an ingest point for processing by the TV Visual Explorer infrastructure and other services like the TV AI Explorer.

Despite the substantial bandwidth required to ingest 33 HD resolution television channels in near-realtime, our primary ingest point is a single-core GCE VM located in the GCE data center nearest the Archive to minimize path latency. The VM has only a small boot disk, though it has extended memory to provide enhanced network buffering. All videos are ingested using GCS stream ingest, in which CURL fetches the video file and pipes its output directly to GSUTIL, which streams it to GCS without the file ever touching local disk. Each video is streamed to a temporary GCS path, verified against its checksum information and then moved to its permanent path, at which point the final video is announced to all downstream processes.
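The stream ingest flow described above can be sketched roughly as follows. This is only an illustrative outline under several assumptions, not the production code: the function names, the use of an MD5 digest computed in-flight, and the idea that the expected checksum is known ahead of time are all hypothetical, since the post does not say how the checksum comparison is actually performed. The only details taken from the text are that CURL output is piped into GSUTIL (here via "gsutil cp -", which reads from standard input), that the upload lands at a temporary path, and that it is moved to a permanent path after verification.

```python
import hashlib
import subprocess
from typing import BinaryIO, List

CHUNK_BYTES = 1 << 20  # copy in 1 MiB pieces so memory use stays flat


def curl_cmd(url: str) -> List[str]:
    # --fail makes curl exit non-zero on HTTP errors instead of saving an error page
    return ["curl", "--silent", "--show-error", "--fail", "--location", url]


def upload_cmd(tmp_uri: str) -> List[str]:
    # "-" as the source tells gsutil to stream the object from standard input
    return ["gsutil", "-q", "cp", "-", tmp_uri]


def move_cmd(tmp_uri: str, final_uri: str) -> List[str]:
    return ["gsutil", "-q", "mv", tmp_uri, final_uri]


def pump(src: BinaryIO, dst: BinaryIO) -> str:
    """Copy src to dst chunk by chunk and return the hex MD5 of the bytes copied."""
    digest = hashlib.md5()
    while chunk := src.read(CHUNK_BYTES):
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()


def ingest(url: str, tmp_uri: str, final_uri: str, expected_md5: str) -> None:
    """Stream one video to a temporary GCS path, verify it, then move it into place.

    expected_md5 is an assumed input; the real pipeline's checksum source is not
    described in the post.
    """
    fetch = subprocess.Popen(curl_cmd(url), stdout=subprocess.PIPE)
    upload = subprocess.Popen(upload_cmd(tmp_uri), stdin=subprocess.PIPE)
    actual_md5 = pump(fetch.stdout, upload.stdin)
    upload.stdin.close()  # signal end of stream so gsutil can finalize the object
    fetch_rc = fetch.wait()
    upload_rc = upload.wait()
    if fetch_rc != 0 or upload_rc != 0:
        raise RuntimeError(f"transfer failed: curl={fetch_rc} gsutil={upload_rc}")
    if actual_md5 != expected_md5:
        raise ValueError(f"checksum mismatch for {url}")
    # Only a verified object is ever visible at the permanent path
    subprocess.run(move_cmd(tmp_uri, final_uri), check=True)
```

A call might look like ingest("https://example.org/show.mp4", "gs://example-bucket/tmp/show.mp4", "gs://example-bucket/video/show.mp4", "<md5 from metadata>"), where the URL and bucket paths are placeholders. Because the video bytes only ever pass through a pipe, the small boot disk is never a constraint, and the temporary path ensures downstream processes never see a partial or corrupt file.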

The ability of a single one-core VM to ingest 33 HD channels each day stands as testimony to the incredible power of cloud networking and the infrastructure tooling that allows a single core (technically just a hyperthread, not a physical core) to ingest that much content each day without issue. It also shows how even extremely low-resource solutions can achieve immense results in the cloud.