Yesterday we showed how Google's new Timeseries Insights API could be used to identify the first glimmers of tomorrow's biggest news stories from across the world. In those examples, we barely scratched the surface of the API's capabilities. One feature that is especially useful for understanding the underlying patterns of a given entity is the "returnTimeseries" option. When it is enabled, the API includes the entity's timeseries under each anomaly's entry in the results: how many matching events it actually saw in each time interval, and how many it projected it would see during the forecast period. Taking our final Notre Dame query, we only have to add "returnTimeseries: true" to the end of the query to enable this feature:
time curl -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://timeseriesinsights.googleapis.com/v1/projects/[YOURPROJECTID]/datasets/webnlp-201904:query \
  -d '{ "dimensionNames": ["EntityLOCATION"], "testedInterval": { "startTime": "2019-04-15T00:00:00Z", "length": "86400s" }, "forecastParams": { "holdout": 10, "minDensity": 5, "forecastHistory": "1209600s", "maxPositiveRelativeChange": 1.5, "maxNegativeRelativeChange": 1, "forecastExtraWeight": 100, "seasonalityHint": "DAILY" }, "returnNonAnomalies": false, "returnTimeseries": true }' > RESULTS.TXT
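For readers who prefer to script the query rather than run curl, here is a minimal Python sketch using only the standard library. It sends the same request body, written as strict JSON. The `build_query` and `run_query` names are our own hypothetical helpers, the sketch assumes `gcloud` is installed and authenticated, and the field names are simply copied from the curl command above, so they may need updating if the API's request format has changed.

```python
import json
import subprocess
import urllib.request

# Endpoint copied from the curl command above; replace [YOURPROJECTID] with your project ID.
ENDPOINT = ("https://timeseriesinsights.googleapis.com/v1/projects/"
            "[YOURPROJECTID]/datasets/webnlp-201904:query")


def build_query(start_time: str, return_timeseries: bool = True) -> dict:
    """Hypothetical helper: the same request body as the curl example, as a dict."""
    return {
        "dimensionNames": ["EntityLOCATION"],
        "testedInterval": {"startTime": start_time, "length": "86400s"},
        "forecastParams": {
            "holdout": 10,
            "minDensity": 5,
            "forecastHistory": "1209600s",
            "maxPositiveRelativeChange": 1.5,
            "maxNegativeRelativeChange": 1,
            "forecastExtraWeight": 100,
            "seasonalityHint": "DAILY",
        },
        "returnNonAnomalies": False,
        "returnTimeseries": return_timeseries,
    }


def run_query(body: dict) -> dict:
    """Hypothetical helper: POST the body using a gcloud access token (needs network)."""
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only print the body here; call run_query(...) once the project ID is filled in.
    print(json.dumps(build_query("2019-04-15T00:00:00Z"), indent=2))
```

Keeping the body as a Python dict and serializing it with `json.dumps` guarantees strictly valid JSON (quoted keys, no trailing commas), which avoids relying on a lenient parser on the server side.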
The result for Ile de la Cite becomes the following:
"anomalies": [ { "dimensions": [ { "name": "EntityLOCATION", "stringVal": "Ile de la Cite" } ], "result": { "holdoutErrors": {}, "trainingErrors": { "mdape": 1, "rmd": 1 }, "forecastStats": { "density": "23", "numAnomalies": 1 }, "history": { "point": [ { "time": "2019-04-02T00:00:00Z", "value": 1 }, { "time": "2019-04-14T00:00:00Z", "value": 1 }, { "time": "2019-04-15T00:00:00Z", "value": 440 } ] }, "testedIntervalActual": 440, "testedIntervalForecastLowerBound": -1, "testedIntervalForecastUpperBound": 1, "forecast": { "point": [ { "time": "2019-04-15T00:00:00Z" } ] } }, "status": {} },
Note how, in addition to the previously returned fields, the entry now includes "history" and "forecast"->"point" sections. The "history" section tells us that our dataset contains mentions of Ile de la Cite on April 2, 2019 (one mention), April 14, 2019 (one mention) and April 15, 2019 (440 mentions). The "forecast"->"point" entry for April 15 has no "value" field, which here means a forecast of zero (consistent with the forecast bounds of -1 to 1): based on this history, the API was expecting zero mentions on April 15 instead of the actual 440, making it a strong anomaly.
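Because the "history" and "forecast" points are plain time/value pairs, they are easy to pull out of RESULTS.TXT. The minimal Python sketch below assumes, as the Ile de la Cite forecast suggests, that a point with no "value" field means zero. The `points_to_series` helper is a hypothetical name of our own, and the JSON is an abridged copy of the entry above.

```python
import json

# Abridged copy of the Ile de la Cite entry from RESULTS.TXT (only the fields used here).
ENTRY = json.loads("""
{"result": {
  "history": {"point": [
    {"time": "2019-04-02T00:00:00Z", "value": 1},
    {"time": "2019-04-14T00:00:00Z", "value": 1},
    {"time": "2019-04-15T00:00:00Z", "value": 440}]},
  "forecast": {"point": [{"time": "2019-04-15T00:00:00Z"}]}}}
""")


def points_to_series(points: list) -> dict:
    """Map each point's timestamp to its value, treating an omitted "value" as 0."""
    return {p["time"]: float(p.get("value", 0)) for p in points}


history = points_to_series(ENTRY["result"]["history"]["point"])
forecast = points_to_series(ENTRY["result"]["forecast"].get("point", []))

for day, actual in history.items():
    print(day[:10], actual)
print("forecast for 2019-04-15:", forecast["2019-04-15T00:00:00Z"])
```

Days that are absent from "history" altogether (such as April 1 and April 3 for Ile de la Cite) simply do not appear in the resulting series; the sketch leaves them out rather than filling them with zeros.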
Similarly, here are the results for the Seine River:
{ "dimensions": [ { "name": "EntityLOCATION", "stringVal": "Seine River" } ], "result": { "holdoutErrors": { "mdape": 0.1428571428571429, "rmd": 0.1428571428571429 }, "trainingErrors": { "mdape": 0.84615384615384626, "rmd": 0.62459546925566334 }, "forecastStats": { "density": "85", "numAnomalies": 1 }, "history": { "point": [ { "time": "2019-04-01T00:00:00Z", "value": 1 }, { "time": "2019-04-03T00:00:00Z", "value": 2 }, { "time": "2019-04-04T00:00:00Z", "value": 5 }, { "time": "2019-04-06T00:00:00Z", "value": 3 }, { "time": "2019-04-07T00:00:00Z", "value": 6 }, { "time": "2019-04-08T00:00:00Z", "value": 5 }, { "time": "2019-04-09T00:00:00Z" }, { "time": "2019-04-10T00:00:00Z", "value": 8 }, { "time": "2019-04-11T00:00:00Z", "value": 9 }, { "time": "2019-04-12T00:00:00Z", "value": 16 }, { "time": "2019-04-13T00:00:00Z", "value": 3 }, { "time": "2019-04-14T00:00:00Z", "value": 8 }, { "time": "2019-04-15T00:00:00Z", "value": 586 } ] }, "testedIntervalActual": 586, "testedIntervalForecast": 9.3333333333333339, "testedIntervalForecastLowerBound": 8, "testedIntervalForecastUpperBound": 10.666666666666668, "forecast": { "point": [ { "time": "2019-04-15T00:00:00Z", "value": 9.3333333333333339 } ] } }, "status": {} },
This entity is seen regularly, a handful of times on most days in the two weeks leading up to the fire. The "forecast" section tells us that, based on this history, the API expected to see about 9 mentions on April 15, 2019 (9.33, with bounds of 8 to 10.67), while the "history" section tells us that it actually saw 586, marking this as a strong anomaly.
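The API decides what counts as an anomaly using its own parameters (such as maxPositiveRelativeChange), but the returned fields also make a quick sanity check easy. The minimal Python sketch below, built around a hypothetical `summarize` helper and values copied from the Seine River entry, tests whether the actual count falls outside the forecast bounds and how far it exceeds the upper bound. This simple bounds check is our own illustration, not necessarily the rule the API applies internally.

```python
# Values copied from the Seine River entry above.
result = {
    "testedIntervalActual": 586,
    "testedIntervalForecast": 9.3333333333333339,
    "testedIntervalForecastLowerBound": 8,
    "testedIntervalForecastUpperBound": 10.666666666666668,
}


def summarize(result: dict) -> dict:
    """Compare the tested interval's actual count with the forecast and its bounds.

    Omitted numeric fields are treated as 0, since the Ile de la Cite entry leaves
    out "testedIntervalForecast" entirely, presumably because that forecast is zero.
    """
    actual = float(result.get("testedIntervalActual", 0))
    forecast = float(result.get("testedIntervalForecast", 0))
    lower = float(result.get("testedIntervalForecastLowerBound", 0))
    upper = float(result.get("testedIntervalForecastUpperBound", 0))
    return {
        "actual": actual,
        "forecast": forecast,
        "outside_bounds": not (lower <= actual <= upper),
        # How many times larger the actual count is than the top of the forecast range.
        "times_upper_bound": actual / upper if upper > 0 else float("inf"),
    }


print(summarize(result))
```

For the Seine River, the actual count of 586 comes out at roughly 55 times the forecast's upper bound of 10.67.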
Finally, here are the results for Notre Dame itself:
{ "dimensions": [ { "name": "EntityLOCATION", "stringVal": "Notre Dame" } ], "result": { "holdoutErrors": { "mdape": 0.42857142857142855, "rmd": 0.42857142857142855 }, "trainingErrors": { "mdape": 0.19999999999999996, "rmd": 0.65055762081784374 }, "forecastStats": { "density": "100", "numAnomalies": 1 }, "history": { "point": [ { "time": "2019-04-01T00:00:00Z", "value": 3 }, { "time": "2019-04-02T00:00:00Z", "value": 1 }, { "time": "2019-04-03T00:00:00Z", "value": 2 }, { "time": "2019-04-04T00:00:00Z", "value": 2 }, { "time": "2019-04-05T00:00:00Z", "value": 4 }, { "time": "2019-04-06T00:00:00Z", "value": 41 }, { "time": "2019-04-07T00:00:00Z", "value": 3 }, { "time": "2019-04-08T00:00:00Z", "value": 5 }, { "time": "2019-04-09T00:00:00Z", "value": 7 }, { "time": "2019-04-10T00:00:00Z", "value": 5 }, { "time": "2019-04-11T00:00:00Z", "value": 8 }, { "time": "2019-04-12T00:00:00Z", "value": 5 }, { "time": "2019-04-13T00:00:00Z", "value": 8 }, { "time": "2019-04-14T00:00:00Z", "value": 10 }, { "time": "2019-04-15T00:00:00Z", "value": 790 } ] }, "testedIntervalActual": 790, "testedIntervalForecast": 7, "testedIntervalForecastLowerBound": 4, "testedIntervalForecastUpperBound": 10, "forecast": { "point": [ { "time": "2019-04-15T00:00:00Z", "value": 7 } ] } }, "status": {} },
Like the Seine River, Notre Dame is mentioned regularly in the news, a few times each day, including a small peak of 41 mentions on April 6, 2019. Based on this history, the API expected to see 7 mentions on April 15, while the history section tells us it was actually mentioned 790 times.
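Having the full history also makes it easy to put the tested day in context beyond the API's own forecast. The short Python sketch below finds the largest earlier spike (the April 6 peak) and compares April 15 against it. The `prior_peak` helper is a hypothetical name of our own, and the values are copied by hand from the Notre Dame entry above.

```python
# Notre Dame history copied from the entry above, as (day, mentions) pairs.
HISTORY = [
    ("2019-04-01", 3), ("2019-04-02", 1), ("2019-04-03", 2), ("2019-04-04", 2),
    ("2019-04-05", 4), ("2019-04-06", 41), ("2019-04-07", 3), ("2019-04-08", 5),
    ("2019-04-09", 7), ("2019-04-10", 5), ("2019-04-11", 8), ("2019-04-12", 5),
    ("2019-04-13", 8), ("2019-04-14", 10), ("2019-04-15", 790),
]


def prior_peak(history, tested_day):
    """Return the (day, value) pair with the most mentions strictly before tested_day.

    ISO dates compare correctly as strings, so no date parsing is needed.
    """
    earlier = [(day, value) for day, value in history if day < tested_day]
    return max(earlier, key=lambda point: point[1])


peak_day, peak_value = prior_peak(HISTORY, "2019-04-15")
tested_value = dict(HISTORY)["2019-04-15"]
print(f"earlier peak: {peak_value} on {peak_day}")
print(f"tested day is {tested_value / peak_value:.1f}x that peak")
```

For Notre Dame, April 15's 790 mentions come out at roughly 19 times the April 6 peak of 41, so even the largest earlier spike in the window is dwarfed by the day of the fire.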
The "returnTimeseries" option is a powerful tool for diving more deeply into your timeseries data and understanding the patterns the API sees.