Today we're tremendously excited to announce the new Top Trending Topics system for the Television Explorer. Each morning the system scans the full raw closed captioning of all monitored national stations and computes a massive ngram table over all of that material. After more than a week of intensive benchmarking, we've settled on 4-grams (four-word ngrams) as our window size for now, and we go a step further by removing common stop words from the start and end of each phrase. We then compare this ngram table to the one from the day before and generate a list of the phrases that appeared more often in broadcasts on the most recent monitoring day than on the day before.
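To make the ngram step concrete, here is a minimal Python sketch of building a 4-gram table with stop words trimmed from both ends of each phrase. It is an illustration only, not the production code: the `STOP_WORDS` set, the tokenizer regex, and the function names `trim_stop_words` and `ngram_table` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (the real list is not published here).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "was", "that"}


def trim_stop_words(words):
    """Drop stop words from the start and end of a phrase, keeping interior ones."""
    start, end = 0, len(words)
    while start < end and words[start] in STOP_WORDS:
        start += 1
    while end > start and words[end - 1] in STOP_WORDS:
        end -= 1
    return words[start:end]


def ngram_table(caption_text, n=4):
    """Count every n-word window of the captioning after trimming stop words."""
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", caption_text.lower())
    table = Counter()
    for i in range(len(words) - n + 1):
        phrase = trim_stop_words(words[i:i + n])
        if phrase:  # skip windows made up entirely of stop words
            table[" ".join(phrase)] += 1
    return table


table = ngram_table("The president of the United States spoke")
print(table.most_common())
```

Note that in this sketch a trimmed phrase can be shorter than four words, and the same short phrase can be counted from more than one overlapping window. Whether the real system deduplicates those cases is not something this example tries to answer.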
The final result is a list of the top phrases that were mentioned significantly more often than on the day before. This list is displayed only when you browse to the Television Explorer without running a search, offering you a set of potential starting points for exploring the archive. Once you run a search (or click on any of the trending topics to run one), the list is hidden to save screen space.
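As a rough illustration of the day-over-day comparison, the sketch below ranks phrases by how much more often they appeared today than yesterday. The scoring rule (a smoothed ratio of raw counts), the `min_count` cutoff, and the name `trending_phrases` are assumptions chosen for the example; the exact measure of "significantly more often" that the system uses is not spelled out here.

```python
from collections import Counter


def trending_phrases(today, yesterday, top_k=10, min_count=5, smoothing=1.0):
    """Return up to top_k phrases whose counts rose the most versus the day before."""
    scored = []
    for phrase, count in today.items():
        if count < min_count:
            continue  # assumed cutoff: ignore phrases too rare to be meaningful
        # Assumed score: add-one smoothed ratio, so brand-new phrases do not divide by zero.
        ratio = (count + smoothing) / (yesterday.get(phrase, 0) + smoothing)
        if ratio > 1.0:
            scored.append((ratio, phrase))
    # Highest ratio first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in scored[:top_k]]


# Made-up counts purely for demonstration.
today = Counter({"hurricane warning": 40, "white house": 50, "rare phrase": 2})
yesterday = Counter({"hurricane warning": 3, "white house": 48})
print(trending_phrases(today, yesterday))
```

This version compares raw counts, so a day with more total airtime would inflate every ratio. Normalizing each count by the day's total ngram volume is one possible refinement.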
Of course, keep in mind that there is a rolling 48-hour embargo on the television content monitored by the Internet Archive's Television News Archive, so the list always reflects what was trending two days ago. Also note that some advertisements are closed-captioned, so a large, sudden ad buy on a given day can cause phrases from that ad to trend.
While this first incarnation is quite primitive, we are extremely excited about the potential of this new system to help surface the stories dominating the airwaves. Over the coming months we will be exploring a number of vastly more powerful approaches to peering more deeply into the trends and narratives of television news!