The GDELT Project

New Television News Completion File

As the Television Explorer is increasingly used to understand the reaction to breaking news and evolving narratives, the unpredictable timing of its updates can make it difficult to estimate the precise cutoff for each station, meaning the point in time before which all aired content has been processed and is now searchable. In principle, the Internet Archive's Television News Archive uses a rolling 24-hour cutoff in which content becomes searchable approximately 24 hours after it has aired. In practice, factors such as server load and video resolution mean shows can sometimes take up to 72 hours to finish processing, especially longer multi-hour shows. While these cases are rare, they make it difficult to determine whether a given station is fully "caught up" to 24 hours ago.

To make it easier for users to know the status of the Television Explorer and other TV-related services, we are debuting today the new Television News Completion File. It is a simple ASCII file that tracks, in 30-minute increments over the last four days, the processing status of each station the Television News Archive currently monitors. This makes it trivial to see which stations still have shows being processed and which are updated to precisely 24 hours ago.

The file format is very simple, containing status entries for each half hour block for each currently monitored station over the last four days. It is a standard ASCII file with four columns (note there is no header row in the actual file):

The file is updated every 30 minutes around the clock and is served with the HTTP header "Cache-Control: private", which keeps shared caches and proxies from storing it so that downloads should yield the latest copy. Users should download it at 15 minutes and 45 minutes after each hour to get the latest version.
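As a minimal sketch of the download schedule described above, the following Python helper computes the next quarter-past or quarter-to mark after a given moment. The function name `next_download_time` is hypothetical and used only for illustration; the sketch does not perform the download itself.

```python
from datetime import datetime, timedelta

def next_download_time(now):
    """Return the next :15 or :45 mark strictly after `now`."""
    base = now.replace(second=0, microsecond=0)
    for minute in (15, 45):
        candidate = base.replace(minute=minute)
        if candidate > now:
            return candidate
    # Past :45 already, so roll over to :15 of the following hour.
    return base.replace(minute=15) + timedelta(hours=1)
```

A scheduler could sleep until the returned time, fetch the file, and then call the helper again for the next fetch.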

For most use cases, users should scan the file backwards for each station of interest: start at the slot 24 hours before the present and move back in time until at least 6 hours' worth of consecutive slots are marked either "DONE" or "DONE-NOCONTENT", with no empty slots between them. If a few earlier slots for that station are still empty, users can decide whether to keep working backwards until there are no empty slots or to accept a missing slot here or there in order to analyze events as close to 24 hours ago as possible.
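The backwards scan can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: it assumes the file has already been parsed, for a single station, into a dictionary mapping each slot's start time to its status string, with empty or missing entries meaning the slot is not yet processed. The names `find_cutoff`, `floor_to_slot`, and `statuses` are hypothetical, and `now` is assumed to be in the same timezone as the slot times.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=30)
DONE_STATUSES = {"DONE", "DONE-NOCONTENT"}

def floor_to_slot(ts):
    """Round a timestamp down to the start of its 30-minute slot."""
    return ts.replace(minute=0 if ts.minute < 30 else 30,
                      second=0, microsecond=0)

def find_cutoff(statuses, now, required_hours=6, lookback_days=4):
    """Return the most recent slot that ends an unbroken run of
    completed slots at least `required_hours` long, or None.

    `statuses` is an assumed pre-parsed mapping of slot start time
    to status string for one station.
    """
    needed = timedelta(hours=required_hours) // SLOT  # e.g. 12 slots
    slot = floor_to_slot(now - timedelta(hours=24))
    oldest = floor_to_slot(now - timedelta(days=lookback_days))
    run_end, run_len = None, 0
    while slot >= oldest:
        if statuses.get(slot, "") in DONE_STATUSES:
            if run_len == 0:
                run_end = slot  # newest slot of a fresh run
            run_len += 1
            if run_len >= needed:
                return run_end
        else:
            run_end, run_len = None, 0  # empty slot breaks the run
        slot -= SLOT
    return None
```

Lowering `required_hours` trades confidence for recency, which mirrors the choice between working backwards past every empty slot and accepting an occasional gap.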

VIEW THE COMPLETION FILE