Television Explorer: Near-Realtime Updates & Trending Analytics

We're excited to announce two major updates to the Television Explorer that debuted over this past weekend: realtime updates and our new trending analytics dashboard.

REALTIME UPDATES

Since its debut this past December, the Television Explorer has been configured to update just once a day (around 5AM UTC), with a rolling embargo window of 48 hours. That meant that if there was a major breaking news story this morning, you wouldn't be able to explore how the various television networks covered the story until two days from now.

Given that the rest of GDELT updates every 15 minutes (with GDELT 3 speeding that up to every 60 seconds), this has created an analytic tension in which our television tools have been more useful for historical research than for understanding contemporary events.

Thus it is with great excitement that we are able to announce that as of this past weekend the Television Explorer now updates every 15 minutes, just like the rest of the GDELT system. The one caveat is that due to the immense computational power required for all of the processing that the Internet Archive performs on each television show, it typically takes 2-12 hours from the end of a show until it is available to the Television Explorer for indexing. Some particularly lengthy CSPAN programs can take even longer to be ready for indexing.

This means that while the Television Explorer now updates every 15 minutes, it is important to understand that the most recent 24 hours reflects an incomplete view of monitored shows for that period. A new warning has been added to the results page to remind you when your results period includes the most recent 24 hours. What this means in practice is that you should use the most recent 24 hours only to get a general gist of evolving coverage, while restricting your actual analytic window to end prior to the most recent 24 hours.
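
As a rough illustration of that advice, the sketch below clips an analysis window so that it ends before the incomplete period. The function name `complete_window` and the 24 hour `lag` default are assumptions made for this example; they are not part of the Television Explorer itself.

```python
from datetime import datetime, timedelta, timezone

def complete_window(start, end, now=None, lag=timedelta(hours=24)):
    """Clip [start, end] so it ends before the incomplete most-recent period.

    `lag` is an assumed safety margin matching the 24 hour warning period.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - lag                 # everything after this may be incomplete
    clipped_end = min(end, cutoff)
    if clipped_end <= start:
        raise ValueError("window lies entirely inside the incomplete period")
    return start, clipped_end

# Example: a window running "up to now" is pulled back by 24 hours.
now = datetime(2017, 6, 10, 12, 0, tzinfo=timezone.utc)
start = datetime(2017, 6, 1, tzinfo=timezone.utc)
print(complete_window(start, now, now=now))
```

Coverage inside the clipped-off period can still be browsed for a general sense of an evolving story; it is only excluded from the window used for actual measurement.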

As always, remember that by default the Television Network dropdown is set to "National Networks", meaning you are searching just the six national networks monitored by the Internet Archive. You can switch to "Affiliate Networks" to search the various regional affiliates of ABC, CBS, NBC, PBS, etc.

There is also a new "Combined All Networks" option that searches across all US English-language television coverage monitored by the Internet Archive since 2009, allowing you to look across both national and regional affiliate stations at once. This is especially useful for breaking news events where you want to see all available coverage across all stations to more fully understand the evolving story. Because the Internet Archive monitored many of the affiliate stations only for select periods during the 2016 US presidential campaign, some of the 2009-present trends you will see in this graph reflect the Archive's changing source list more than actual trends, so this option is most useful with the Last 3 Months or Last 72 Hours time options.

We've heard loud and clear from so many of you how important realtime updates are for your work and we're excited to see how you're able to use this new capability!

TRENDING ANALYTICS DASHBOARD

To go with our new realtime updates we've added a new series of trending analytics mini dashboards to the Explorer's front page. These analytics are updated every 15 minutes and display top and trending topics, overall and by station, as well as the top trending phrases. At this time these analytics are computed only for the six national networks monitored by the Internet Archive (Bloomberg, CNBC, CNN, Fox Business, Fox News and MSNBC), while a few of the analyses add the London edition of BBC News.

We currently compute two major types of annotations for each broadcast on the networks above:

  • Topics. The Internet Archive annotates each television broadcast with a list of the major topics it discusses by running the show's closed captioning transcript through an adaptation of the Stanford Named Entity Recognizer. Unfortunately, at this time the Archive does not have a system in place that is able to identify and remove advertisements, meaning the extracted topics reflect a combination of genuine news coverage and paid advertisements. Thankfully most advertisements are not closed captioned and thus are invisible to the topic tagger, but occasionally you will see a new topic (such as a brand of toothpaste or a prescription drug) pop out of nowhere to become a top trending topic. This can happen when a flood of advertisements for that product airs intensely across multiple stations over several hours, after almost no advertisements for that product during the previous 24 hours. In such a case the topic list will correctly show that this topic is flooding the airwaves, since it cannot distinguish between news content and advertisements.
  • Phrases. In addition, GDELT breaks each transcript into complete sentences and computes the universe of complete 4-grams over those sentences. It then strips common stop words from the start and end of each 4-gram (converting "the white house is" to just "white house") and drops single-word ngrams to generate a final list of the top phrases 2-4 words in length that appeared in the broadcast. These reflect a combination of topics, entities, quotes and memes and offer a high-resolution look at the precise phrasing popular in that broadcast.
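
The phrase extraction described above can be sketched roughly as follows. This is a minimal illustration, not GDELT's actual code: the stop word list, the regex-based sentence splitter and tokenizer, and the function names are all assumptions made for this example, and sentences shorter than four words are assumed to yield no phrases.

```python
import re
from collections import Counter

# Hypothetical stop word list; the real list used by GDELT is not published here.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "to",
              "in", "and", "on", "at", "for"}

def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption for this sketch)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def phrases(text, n=4):
    """Count 2-4 word phrases derived from the 4-grams of each sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Strip stop words from the start and end of the 4-gram.
            while gram and gram[0] in STOP_WORDS:
                gram = gram[1:]
            while gram and gram[-1] in STOP_WORDS:
                gram = gram[:-1]
            # Drop anything reduced to a single word (or to nothing).
            if len(gram) >= 2:
                counts[" ".join(gram)] += 1
    return counts

print(phrases("The White House is."))
```

Note that this sketch counts every stripped 4-gram, so overlapping windows can count the same short phrase more than once; how the production system handles such overlaps is not stated above.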

We then use the annotations above to compute several analytic mini dashboards:

  • Trending Topics. This computes a histogram of all topics assigned to all broadcasts on the six national networks over the last 24 hours (computed as a rolling window to the present moment) and computes the same histogram for the preceding 24 hours. The preceding 24 hour histogram is subtracted from the current 24 hour rolling histogram to generate a list of topics that were mentioned more in national broadcasts over the last 24 hours than they were in the preceding 24 hours. This offers a powerful window into what the national television networks are focusing on right now. Note that you may see odd topics on this list, and person names are common, especially those of reporters or news anchors who work only on certain days of the week. In addition, remember that the most recent 24 hours is always incomplete, as it takes 2-12 hours for a show to be processed and ready for indexing. Thus, there can be a substantial lag of several hours before a breaking news event makes it into this display.
  • Trending Topics By Station. This is identical to the overall Trending Topics display, but produces the equivalent analysis for each station and adds BBC News. Thus, for CNN, the system computes a histogram of the top topics on all CNN broadcasts over the last 24 hours and compares that against the histogram of top topics of all CNN broadcasts from the preceding 24 hours to identify the top topics being focused on by CNN more now than yesterday.
  • Top Topics By Station. This is similar to the trending displays, but here it just computes the histogram of the top topics mentioned on all broadcasts on each network over the past 24 hours and does not compare that list to the previous 24 hours. This list will frequently be very similar to the Trending Topics By Station list, but with notable differences in long-standing topics. For example, if CNN is discussing allegations of Russian election influence heavily today, and also focused on the topic heavily yesterday, the Top Topics By Station display will mention Russia, while the Trending Topics By Station display above will likely exclude Russia, since it is not a trending/rising topic, but rather a sustained interest topic. In this way you can think of Trending Topics as surfacing topics that, from an absolute standpoint, may not be the most talked about on the station but are getting considerably more attention than they did yesterday, while Top Topics reflects the top focal points of the station's coverage, even if those were also the top focal points yesterday.
  • Top Phrases. Similar to the overall Trending Topics display, this dashboard uses the Phrases data from above and computes a histogram of the top 2-4 word phrases appearing in all broadcasts from the six national networks over the past 24 hours and compares it with the same histogram from the preceding 24 hours. The top phrases that appeared more frequently over the last 24 hours than in the preceding 24 hours are then displayed. This display is in some ways more powerful than the Topics displays in that it reflects a combination of the top trending people, organizations, locations, objects, activities, memes and quoted phrases.
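
The difference between the "top" and "trending" displays above can be sketched with two small functions. This is an illustration only: the function names, the use of raw count differences (rather than any normalization) and the example counts are assumptions made for this sketch, not details of the production dashboards.

```python
from collections import Counter

def top_topics(current, k=10):
    """Most-mentioned topics in the current 24 hour window."""
    return current.most_common(k)

def trending_topics(current, previous, k=10):
    """Topics mentioned more in the current window than in the preceding one."""
    diff = Counter(current)
    diff.subtract(previous)            # current count minus preceding count
    rising = [(topic, delta) for topic, delta in diff.items() if delta > 0]
    # Largest increase first; ties broken alphabetically for stable output.
    return sorted(rising, key=lambda td: (-td[1], td[0]))[:k]

# Made-up counts: Russia is heavily covered on both days, Hurricane is new.
last_24h = Counter({"Russia": 50, "Hurricane": 30, "Senate": 10})
prev_24h = Counter({"Russia": 55, "Senate": 12})

print(top_topics(last_24h))                 # Russia ranks first
print(trending_topics(last_24h, prev_24h))  # only Hurricane is rising
```

With these made-up numbers, a sustained topic such as Russia leads the top list but drops out of the trending list, which matches the behavior described for the two By Station displays. The same comparison applies to phrase histograms as well as topic histograms.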

We hope these new analytic dashboards offer a powerful new way of looking at the news, especially around the exploration of agenda setting and how the major networks differ in the topics and events they focus on.