Using Google Cloud’s Natural Language AI and its new Timeseries Insights API, we can perform realtime entity-level trend detection over global news media, looking for rapidly emerging stories. One of the most basic decisions is the time horizon over which to look for anomalies. For example, when the API is asked to evaluate the one-hour period from 4 to 5PM EST today against the past 72 hours, it does not return the Super Bowl as an anomaly, despite its massively surging media coverage. At first glance this might seem surprising, but the reason is simple.
Looking at the timeline below, mentions have been climbing roughly linearly, day over day, since February 7th. Thus, while the Super Bowl was mentioned quite a few times from 4 to 5PM today, its coverage has been rising steadily over the entire 72-hour comparison window (in fact, over the past 6 days). Measured against that baseline, the large number of mentions from 4 to 5PM is simply the continuation of an established trend and is not anomalous.
Had the analysis instead compared the past week against the past several months, the Super Bowl would have been returned as highly anomalous. But given the specific analysis period of 4 to 5PM today compared with the last three days, it was anything but unusual.
In contrast, the API reports "Dancing on Ice" (identified by the Natural Language API as an "event") as a highly anomalous entity over the same time period. You can see from the timeline below that it had only sporadic mentions until noon today, at which point it soared to a peak from 4 to 5PM. Thus, the near-vertical increase in mentions of Dancing on Ice from 4 to 5PM today was a substantial departure from its sporadic mentions over the previous three days.
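The contrast between the two entities can be illustrated with a toy detector. The sketch below is not the Timeseries Insights API's actual algorithm. It assumes a simple stand-in: fit a straight line to the hourly counts in the comparison window, extrapolate one hour ahead, and score the observed hour by how far it lands from that forecast. The function name `trend_anomaly_score` and all of the hourly counts are invented for illustration.

```python
from statistics import mean, pstdev


def trend_anomaly_score(history, observed):
    """Score `observed` against a linear forecast fitted to `history`.

    Returns the forecast error in units of the residual spread of the fit.
    This is an illustrative stand-in, not the API's real method.
    """
    n = len(history)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(history)

    # Ordinary least-squares line through the comparison window.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sxx
    intercept = y_bar - slope * x_bar

    # Spread of the history around its own trend line, floored at 1.0
    # so a perfectly regular series does not cause a division by ~0.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, history)]
    spread = max(pstdev(residuals), 1.0)

    forecast = intercept + slope * n  # expected count for the next hour
    return (observed - forecast) / spread


# Made-up data: 72 hourly counts that rise steadily, like the Super Bowl.
ramp_history = [100 + 10 * h for h in range(72)]
ramp_score = trend_anomaly_score(ramp_history, observed=100 + 10 * 72)

# Made-up data: sporadic mentions, then a sharp climb in the last few hours,
# like Dancing on Ice.
burst_history = [2 if h % 12 == 0 else 0 for h in range(68)] + [20, 60, 120, 200]
burst_score = trend_anomaly_score(burst_history, observed=300)

print(f"steady ramp score:   {ramp_score:.2f}")
print(f"sudden burst score:  {burst_score:.2f}")
```

Under these assumptions the steady ramp scores near zero, because the tested hour lands where the trend says it should, while the burst scores far above any reasonable threshold even though its raw count is much smaller than the ramp's.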
This demonstrates the power of the time horizon over which the API is asked to look. By adjusting the analysis window and the comparison period, you can tune the kinds of anomalies you look for, from brief, intense bursts to subtle, longer-term departures from the norm.
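As a rough sketch of what that tuning might look like in practice, the helper below builds two query bodies that differ only in their window and comparison period. The field names (`detectionTime`, `timeseriesParams.granularity`, `timeseriesParams.forecastHistory`, `forecastParams.noiseThreshold`, `slicingParams.dimensionNames`) reflect my understanding of the Timeseries Insights API's query request and should be checked against the API reference. The helper `build_query`, the dimension name `"Entity"`, the timestamp, the threshold value, and the 90-day history are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_query(detection_time, window, history, noise_threshold=4.0):
    """Build a query body (assumed field names; verify against the API docs).

    detection_time: start of the window being tested for anomalies
    window:         length of the tested window (timedelta)
    history:        length of the comparison period before it (timedelta)
    """
    return {
        "detectionTime": detection_time.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        # "Entity" is a hypothetical dimension name for this sketch.
        "slicingParams": {"dimensionNames": ["Entity"]},
        "timeseriesParams": {
            "granularity": f"{int(window.total_seconds())}s",
            "forecastHistory": f"{int(history.total_seconds())}s",
        },
        "forecastParams": {"noiseThreshold": noise_threshold},
    }


# Illustrative timestamp only: 4PM EST expressed in UTC on a placeholder date.
start = datetime(2022, 2, 13, 21, 0, tzinfo=timezone.utc)

# Brief, intense bursts: one hour tested against the previous 72 hours.
burst_query = build_query(start, timedelta(hours=1), timedelta(hours=72))

# Slower departures: one week tested against roughly three months.
slow_query = build_query(
    start - timedelta(days=7), timedelta(days=7), timedelta(days=90)
)

print(burst_query["timeseriesParams"])
print(slow_query["timeseriesParams"])
```

The first body corresponds to the hourly analysis described above. The second corresponds to the week-versus-months comparison under which a slow build-up like the Super Bowl's would be expected to surface.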