Using Google's Cloud Inference API & Cloud Natural Language API To Create A 'Trending Topics' Timeline

Google’s Cloud Inference API offers an incredibly powerful lens through which to explore the innate hidden patterns of vast archives of documents and images annotated by Google’s Cloud AI APIs. Annotating hundreds of millions of textual articles or images through Google’s Cloud Natural Language or Cloud Vision APIs yields billions of seemingly disparate datapoints that Cloud Inference can readily bring together to unveil the hidden patterns within, interactively exploring their underlying trends in realtime.

Last month we showcased how 11 billion entity annotations stored in BigQuery could be seamlessly imported into the Inference API using its native BigQuery support and explored interactively. A common theme of those explorations was taking an existing topic of interest, such as a public figure like Robert Mueller or a company like SpaceX, and using the Inference API to interactively explore the hidden patterns in the topics with which it is most associated.

In short, we showed how to use the Inference API for guided interactive exploration where the subjects of interest are known and the Inference API is used to gain new understanding of their context as seen through billions of temporal relationships.

Instead of using the Inference API to explore known topics, what if we used it to surface the most significant topics over time, creating what amounts to a fully automated “trending topics” timeline service?

GDELT’s Global Entity Graph consists of more than 11 billion entity annotations compiled by Google’s Cloud Natural Language API from a small random sample of 100 million English-language global news articles published from 2016 to 2019. What might it look like to use the Inference API to iterate over a day of that graph, asking it to surface the most significant entities every 15 minutes throughout the day?

On April 15, 2019, the Notre Dame cathedral was heavily damaged by a sudden fire. What would that day have looked like through the eyes of the Inference API running every 15 minutes to surface the trends of the last quarter-hour?

The query below uses the special TED value to instruct the Inference API to examine a particular day, as expressed in days since January 1, 1970 (in this case “18001” indicates that April 15, 2019 is 18,001 days after January 1, 1970). The “restrictStartTime” and “restrictEndTime” parameters further narrow the query down to a precise 15-minute increment.

time curl -s -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://infer.googleapis.com/v1/projects/[YOURPROJECTID]/datasets/gdeltbq_geg_v1:query \
  -d'{
  "name": "gdeltbq_geg_v1",
  "queries": [{
    "query": {
      "type": "TYPE_AND",
        "children": [
          {
            "type": "TYPE_TERM",
            "term": {
              "name": "ted",
              "value": "18001"
            }
          }
        ],
     },
    "distributionConfigs": {
      "dataName": "EntityLOCATION",
      "maxResultEntries": 5,
      "bgprobExp": 0.3
    },
    "restrictStartTime": "2019-04-15T19:15:00+00:00",
    "restrictEndTime": "2019-04-15T19:29:59+00:00"
  }]
}' > RESULTS

In less than a second, the query above returns the five most significant locations in this 15-minute period: the Seine, Ile de la Cite, U.S., Notre Dame and Paris. Querying for the 15 minutes immediately prior yields U.S., New York, Weches, Texas and Ohio. Four of the five top locations changed from one window to the next, showing that something major happened in these 15 minutes.
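The day number and window bounds used in the query above can also be derived programmatically. The following is a minimal Python sketch under stated assumptions, not the script used for this post: the function names `ted_day` and `build_query` are hypothetical, and the request fields simply mirror those in the curl command above.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = date(1970, 1, 1)


def ted_day(day: date) -> int:
    """Days elapsed since January 1, 1970, as used for the "ted" term."""
    return (day - EPOCH).days


def build_query(window_start: datetime, data_name: str = "EntityLOCATION") -> dict:
    """Build a request body for the 15-minute window starting at window_start (UTC).

    The field names are copied from the curl example; nothing else is assumed
    about the API.
    """
    window_end = window_start + timedelta(minutes=14, seconds=59)
    return {
        "name": "gdeltbq_geg_v1",
        "queries": [{
            "query": {
                "type": "TYPE_AND",
                "children": [{
                    "type": "TYPE_TERM",
                    "term": {
                        "name": "ted",
                        "value": str(ted_day(window_start.date())),
                    },
                }],
            },
            "distributionConfigs": {
                "dataName": data_name,
                "maxResultEntries": 5,
                "bgprobExp": 0.3,
            },
            "restrictStartTime": window_start.isoformat(),
            "restrictEndTime": window_end.isoformat(),
        }],
    }


body = build_query(datetime(2019, 4, 15, 19, 15, tzinfo=timezone.utc))
print(body["queries"][0]["query"]["children"][0]["term"]["value"])
print(body["queries"][0]["restrictStartTime"], body["queries"][0]["restrictEndTime"])
```

Passing a different `data_name`, such as “EntityPERSON”, changes only the distribution being ranked; the time restriction stays the same.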

A simple Perl script iterated over the entire day in 15-minute increments, using the Inference API to surface the most significant locations in each window. Here is the actual chronology of top locations immediately before and after the Notre Dame fire:

4/15/2019 18:00 U.S., New York, Washington, Michigan, UK
4/15/2019 18:15 U.S., New York, Washington, Europe, UK
4/15/2019 18:30 U.S., New York, California, Shelby, Boston
4/15/2019 18:45 U.S., New York, Washington, Israel, Paris
4/15/2019 19:00 U.S., New York, Weches, Texas, Ohio
4/15/2019 19:15 Seine, Ile de la Cite, U.S., Notre Dame, Paris
4/15/2019 19:30 Ile de la Cite, U.S., Seine, Notre Dame, New York
4/15/2019 19:45 Seine, U.S., Notre Dame, Neka Brod, Washington
…
4/16/2019 23:15 U.S., Notre Dame, California, Grand Chute, New York
4/16/2019 23:30 U.S., Notre Dame, New York, France, Paris
4/16/2019 23:45 U.S., Notre Dame, California, New York, France
4/17/2019 0:00 Notre Dame, U.S., Columbine High School, New York, Hawksbill Crag
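The 15-minute iteration behind the chronology above can be sketched as follows. This is an illustrative Python version rather than the original Perl script, and the generator name `quarter_hour_windows` is hypothetical. It only produces the window boundaries; each pair would then be substituted into the “restrictStartTime” and “restrictEndTime” fields of the query shown earlier.

```python
from datetime import date, datetime, timedelta, timezone


def quarter_hour_windows(day: date):
    """Yield (start, end) UTC datetimes for each of the 96 windows in `day`."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    for i in range(96):  # 24 hours x 4 windows per hour
        window_start = midnight + timedelta(minutes=15 * i)
        # End one second before the next window so that windows never overlap.
        yield window_start, window_start + timedelta(minutes=14, seconds=59)


windows = list(quarter_hour_windows(date(2019, 4, 15)))
for window_start, window_end in windows[77:79]:
    print(window_start.isoformat(), "->", window_end.isoformat())
```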

Without any additional knowledge of the day’s events, it is clear that something major happened at Notre Dame and was trending globally by 19:15 UTC (9:15PM Paris time). Note that since the GDELT Global Entity Graph is a small daily sample of GDELT, there can be up to an hour between the time GDELT sees an article and when it is finally submitted to the Cloud Natural Language API for processing. That processing lag, together with the time it took news media around the world to begin covering the fire in depth, accounts for the delay between the first glimmers of the fire and its trending in the GEG dataset.
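One simple way to flag such a shift automatically is to measure how many of a window’s top entities were absent from the previous window. The sketch below is a heuristic added here for illustration and is not part of the Inference API; the function name `new_entities` is hypothetical, and the two lists are the 19:00 and 19:15 rows from the chronology above.

```python
def new_entities(previous, current):
    """Return the entities in `current` that were absent from `previous`, in order."""
    seen = set(previous)
    return [entity for entity in current if entity not in seen]


before = ["U.S.", "New York", "Weches", "Texas", "Ohio"]            # 19:00 UTC
after = ["Seine", "Ile de la Cite", "U.S.", "Notre Dame", "Paris"]  # 19:15 UTC

fresh = new_entities(before, after)
turnover = len(fresh) / len(after)  # fraction of the list that is newly arrived
print(fresh, turnover)
```

A high turnover in a single window is only a hint, since small local stories also enter the list briefly; requiring the new entities to persist across several windows would make the signal more robust.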

The Notre Dame fire continued trending through the rest of the day and through the entirety of the following day, illustrating its global significance.

By midnight UTC on the 17th (8PM EDT on the 16th), Columbine High School was also trending: an armed teenager, allegedly obsessed with the shootings there two decades earlier, had sparked a massive manhunt and was found dead. The Notre Dame cathedral remained in the trending locations list for much of April 17th, but faded through the day as other stories leapt to the forefront.

More local stories, like an art exhibition at the Boca Raton Museum of Art and a large amount of broken glass found on the beach on South Manitou Island, each briefly trended in the dataset as they received bursts of coverage, while earlier in the week Mojave Air and Space Port trended for nearly two hours as the Stratolaunch made its maiden flight.

Using the same query as above, but replacing “EntityLOCATION” with “EntityPERSON” yields the following table of the most significant people trending in the hours before and after the Paris fire:

4/15/2019 18:00 Bryan Boganowski, Bernie Sanders, Chad Day, Malcolm Stewart, Matthew Schweich
4/15/2019 18:15 Julie Ingelfinger, Christopher Ali, Bernie Sanders, Donald Trump, Nancy Pelosi
4/15/2019 18:30 Mossimo Giannulli, Donald Trump, Elijah Cummings, African-American, Olivia Jade Giannulli
4/15/2019 18:45 Christopher Ali, Goran Djupsund, Olivia Jade Giannulli, Tia Boatman Patterson, Lori Loughlin
4/15/2019 19:00 Victor Hugo, Anne Hidalgo, Donald Trump, Emmanuel Macron, Larry Gerbrandt
4/15/2019 19:15 Anne Hidalgo, William Matthewman, Cindy Yang, Robert Adler, Charles Lee
4/15/2019 19:30 Anne Hidalgo, Camille Pascal, Emmanuel Macron, Zakher, Donald Trump
4/15/2019 19:45 Camille Pascal, Anne Hidalgo, Emmanuel Macron, Victor Hugo, Donald Trump

Suddenly, around 19:00 UTC (9PM Paris time), famed French writer Victor Hugo, French President Emmanuel Macron and the Mayor of Paris Anne Hidalgo all trend in unison, joined by French writer and historian Camille Pascal by 19:30 UTC (9:30PM Paris time).

Combined with the trending locations, it is clear that something enormous had occurred in Paris involving the Notre Dame cathedral that was significant enough to warrant the mentioning of a cross-section of France’s political and scholarly leaders.

Similarly, William Barr and Robert Mueller trended sporadically throughout the early hours of the morning of April 18, but abruptly at noon UTC (8AM EDT) both names, along with many others related to the Mueller Report, began trending solidly for hours, reflecting the public release of the Mueller Report. Without any other knowledge of events that day, someone using the Inference API could have determined instantly that a major development had just occurred relating to the Mueller Report.

4/18/2019 11:00 Jean-Marc Fournier, Victor Hugo, Bach, William Barr, Shannon McNamara
4/18/2019 11:15 Price, James P. Clarke, Scott Thumma, Mark Chaves, Nancy Ammerman
4/18/2019 11:30 Victor Hugo, Bach, Emmanuel Macron, God, Jesus
4/18/2019 11:45 Peter Coy, Donald Trump, William Barr, David Lauren, Robert Mueller
4/18/2019 12:00 William Barr, Rod Rosenstein, Jerrold Nadler, Jean-Marc Fournier, Robert Mueller
4/18/2019 12:15 William Barr, Evatec, Robert Mueller, Doug Collins, Donald Trump
4/18/2019 12:30 Lorna Catling, Craig Gralley, Scott Thumma, Mark Chaves, Robert Mueller
4/18/2019 12:45 William Barr, Special Counsel, Rod Rosenstein, Donald Trump, Jean-Marc Fournier
4/18/2019 13:00 William Barr, Robert Mueller, Nancy Pelosi, Scott Thumma, Mark Chaves
4/18/2019 13:15 Nick Bidlack, Kristie Franz, William Barr, Jean-Marc Fournier, Jay Reeves
4/18/2019 13:30 William Barr, Donald Papcsy, Robert Mueller, Chuck Schumer, Nancy Pelosi
4/18/2019 13:45 Jean-Marc Fournier, Bach, Victor Hugo, Emmanuel Macron, Donald Trump
4/18/2019 14:00 William Barr, Jerrold Nadler, Robert Mueller, Nancy Pelosi, Rod Rosenstein
4/18/2019 14:15 William Barr, Nancy Pelosi, Jerrold Nadler, Rod Rosenstein, Chuck Schumer
4/18/2019 14:30 William Barr, Rod Rosenstein, Robert Mueller, Nancy Pelosi, Chuck Schumer
4/18/2019 14:45 William Barr, Captain America: The Winter Soldier, Marcela Isaza, Shang-Chi, Stephen McFeely
4/18/2019 15:00 William Barr, Rod Rosenstein, Robert Mueller, Jerrold Nadler, Nancy Pelosi
4/18/2019 15:15 William Barr, Rod Rosenstein, Annie Markowitz, Mark Reinfeld, Robert Mueller
4/18/2019 15:30 William Barr, Jerrold Nadler, Jason Boardman, Rod Rosenstein, Ruth Loos
4/18/2019 15:45 William Barr, Robert Mueller, Rod Rosenstein, Chuck Schumer, Donald Trump
4/18/2019 16:00 William Barr, Rod Rosenstein, Robert Mueller, Jerrold Nadler, Hillary Clinton
4/18/2019 16:15 William Barr, Robert Mueller, Jerrold Nadler, Rod Rosenstein, Chuck Schumer
4/18/2019 16:30 William Barr, Robert Mueller, Rod Rosenstein, Jerrold Nadler, Jean-Marc Fournier
4/18/2019 16:45 William Barr, Tom Blanton, Patrick Chauvet, Robert Mueller, Tim Weiner
4/18/2019 17:00 William Barr, Jean-Marc Fournier, Robert Mueller, Olivier Nusse, Pierre Cochereau
4/18/2019 17:15 William Barr, Jerrold Nadler, Robert Mueller, Louisiana, Rod Rosenstein
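Sustained trending of this kind can be separated from one-off appearances by counting how many consecutive windows each name appears in. The following is a minimal sketch of that idea, using three rows copied from the table above; the counting step is an illustration added here, not something the Inference API does itself.

```python
from collections import Counter

# Top-five people for three consecutive windows on 4/18/2019 (UTC), from the table above.
windows = {
    "11:45": ["Peter Coy", "Donald Trump", "William Barr", "David Lauren", "Robert Mueller"],
    "12:00": ["William Barr", "Rod Rosenstein", "Jerrold Nadler", "Jean-Marc Fournier", "Robert Mueller"],
    "12:15": ["William Barr", "Evatec", "Robert Mueller", "Doug Collins", "Donald Trump"],
}

# Count the number of windows in which each name appears.
counts = Counter(name for names in windows.values() for name in names)

# Names present in every window are the ones trending steadily.
persistent = sorted(name for name, n in counts.items() if n == len(windows))
print(persistent)
```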

While the Inference API’s true power lies in its ability to enable realtime sub-second interactive pattern exploration of vast temporal archives, the examples above demonstrate that it can also serve as a powerful and tunable “trending topics” service right out of the box without any additional coding.

An organization wishing to triage a realtime firehose of arriving documents could feed them all through the Cloud Natural Language API and load the annotations in realtime into the Inference API, running the query above every few minutes to surface the specific topics and entities trending in their firehose, all by just connecting Google’s APIs together.
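For such a recurring job, each run needs to know which 15-minute window has most recently finished. The following is a small Python sketch of that calculation only; the function name `latest_complete_window` is hypothetical, and the annotation and loading steps are deliberately left out because they depend on the organization’s own pipeline.

```python
from datetime import datetime, timedelta, timezone


def latest_complete_window(now: datetime):
    """Return (start, end) of the most recent fully elapsed 15-minute window, in UTC."""
    now = now.astimezone(timezone.utc)
    # Round down to the current quarter-hour boundary.
    boundary = now.replace(minute=now.minute - now.minute % 15, second=0, microsecond=0)
    return boundary - timedelta(minutes=15), boundary - timedelta(seconds=1)


start, end = latest_complete_window(datetime(2019, 4, 15, 19, 37, 12, tzinfo=timezone.utc))
print(start.isoformat(), "->", end.isoformat())
```

Because articles can arrive with some delay, a production job might query a slightly older window or re-run recent windows; that choice is left open here.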

In the end, we see once again just how powerful the combination of Google’s Cloud AI APIs and its Inference API is for enabling realtime unbounded exploration of the world around us.