The GDELT Project

Early Warning for Epidemic Outbreaks: GDELT Offers the Earliest Warning of Ebola Outbreak

Last month Harvard’s HealthMap initiative garnered quite a bit of press for detecting early mentions of the current Ebola outbreak in March 2014, “nine days before the World Health Organization formally announced the epidemic.” HealthMap monitored its first mentions of the outbreak on March 14, 2014, and issued its first alert on March 19, 2014. Coverage of this feat focused on how the site “scour[s] tens of thousands of social media sites … infectious-disease physicians’ social networks and other sources,” emphasizing in particular that “informal sources are helping paint a picture of what’s happening.”

Much of the coverage of HealthMap’s success has emphasized that its early warning was due to its monitoring of these informal and social media channels. For example, one blog stated: “So how did a computer algorithm pick up on the start of the outbreak before the WHO? As it turns out, some of the first health care workers to see Ebola in Guinea regularly blog about their work. As they began to write about treating patients with Ebola-like symptoms, a few people on social media mentioned the blog posts. And it didn't take long for HealthMap to detect these mentions.”

This got us wondering how GDELT fared at detecting the earliest glimmers of the outbreak, given that it currently monitors only global news media and does not penetrate the same kinds of social media and specialized medical alert networks that HealthMap monitors. Would just monitoring news media have offered any kind of advance warning? The answer, it turns out, is that GDELT actually beat HealthMap by a day.

The first formal public international warning of the impending epidemic came not from social media, but from traditional news media: an article in Xinhua’s French-language newswire titled “Guinée: une étrange fièvre fait 8 morts à Macenta” published late in the day EST on March 13, 2014. The article reports that “a disease whose nature has not yet been identified has killed 8 people in the prefecture of Macenta in south-eastern Guinea … it manifests itself as a hemorrhagic fever…” In turn, this newswire article was simply reporting on a formal announcement made earlier in the day by Dr. Sakoba Keita, director of the Division of Disease Prevention in the Guinea Department of Health. Broadcast nationally on state television, the announcement reported both the outbreak of the unknown hemorrhagic fever and the departure of a team of government medical personnel to the area to investigate it in more detail. The Government of Guinea also formally notified the WHO of the outbreak, meaning the WHO was aware of the outbreak from an early stage.

GDELT monitored this earliest mention of the outbreak as it was published, as well as the considerable surge of domestic coverage of the unknown outbreak the following day, March 14. Unfortunately, GDELT is currently able to translate only a portion of global news media each day, as it does not yet have the resources to translate all of it. As a result, this material was monitored and flagged, but GDELT was unable to fully process it with its Event and Global Knowledge Graph algorithms to identify the hemorrhagic references and send a formal alert to the public interface. The formal alert indicating both hemorrhagic fever and the possible diagnosis of Ebola was finally issued on March 20, one day after HealthMap issued its formal alert and still three days before the WHO issued its formal alert on the outbreak.
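The lead times quoted in this post are simple date differences, and a minimal sketch makes them easy to check. The dates below are the ones given in the text, with one exception: the WHO announcement date of March 23 is not stated directly and is inferred here from "nine days" after March 14 and "three days" after March 20. The names `TIMELINE` and `lead_days` are hypothetical, chosen only for this illustration.

```python
from datetime import date

# Dates as stated in the post. The WHO date is an assumption inferred
# from the "nine days" and "three days" figures, not quoted directly.
TIMELINE = {
    "gdelt_first_mention": date(2014, 3, 13),      # Xinhua article monitored by GDELT
    "healthmap_first_mention": date(2014, 3, 14),
    "healthmap_alert": date(2014, 3, 19),
    "gdelt_alert": date(2014, 3, 20),
    "who_announcement": date(2014, 3, 23),         # inferred, see note above
}


def lead_days(earlier: str, later: str) -> int:
    """Return how many days the `earlier` event preceded the `later` event."""
    return (TIMELINE[later] - TIMELINE[earlier]).days


if __name__ == "__main__":
    pairs = [
        ("gdelt_first_mention", "healthmap_first_mention"),
        ("healthmap_first_mention", "who_announcement"),
        ("healthmap_alert", "gdelt_alert"),
        ("gdelt_alert", "who_announcement"),
    ]
    for earlier, later in pairs:
        print(f"{earlier} -> {later}: {lead_days(earlier, later)} day(s)")
```

Under these assumed dates, the four pairs work out to one, nine, one, and three days respectively, matching the figures quoted above.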

There are several key findings here deserving of further exploration:

GDELT caught the earliest glimmers of the Ebola outbreak a day before other services. A day after that, it monitored a massive surge in coverage, indicative of rapidly spreading concern about the outbreak and suggestive of its growing significance, several days before other services issued their first alerts. Together, these suggest that GDELT has enormous potential for early warning of emerging outbreaks and pandemics. Further, as GDELT’s translation infrastructure continues to rapidly expand to translate all foreign language material it monitors throughout the globe in realtime, GDELT’s ability to flag the earliest indicators of emerging situations throughout the world will only continue to grow.

When we talk about “beating the news” or “beating the WHO” and alerting or forecasting future outbreaks or political instability, we have to think more carefully about whose news we’re “beating.” In the case of Ebola, contrary to much of the news coverage of HealthMap’s alerts, the project did not sift through millions of informal media posts and find a hidden pattern that it used to alert an unsuspecting Guinean Government, which then took action to investigate the situation. The Guinean Government was all too aware of the emerging outbreak and had in fact already formally alerted the WHO. Thus, flagging the outbreak nine or ten days ahead of the WHO did indeed “beat” the WHO to its official announcement to the general public, but certainly did not “beat” the Guinean Government. Similarly, when we talk about “beating the news,” the metric cited is traditionally beating the headlines of large national or international newspapers, while more often than not small local news sources in the affected region have been intensely covering the situation for days. In some regions of the world it is common practice to formally announce events like protests well in advance, advertising them on social media with the precise date, time, and street corner to meet up. Compiling these announcements into a spreadsheet each day is not “forecasting future protests,” nor is it even “beating the news.” It is merely filtering. That’s not to say that it is not extremely useful, and it could offer a transformational shift in American policymakers’ access to that information, but it is simply the equivalent of syndicating a press release, not divining deeply hidden patterns across millions of data points and issuing a forecast.

To put it in simpler terms, it represents the difference between republishing Apple's press release from last week announcing the details of its new Apple Watch, which it has formally released to the public, versus producing a forecast today of what Apple will be releasing as its next big new product next fall. The problem is that a formal national broadcast by the health ministry of a small African country about a disease outbreak will likely receive far less global attention than the latest Apple product. While that Apple product will appear in headlines in papers across the world, that health announcement from an African country likely won’t reach far beyond its own borders, meaning that we must reach deeper and deeper into local sources to access those kinds of alerts and synthesize them within the context of the evolving local, regional, and global information environments. Instead of trying to “beat” the international news, we should be focusing first on doing better at simply paying attention to the local news in the areas we’re interested in. That will get us far closer to our goals at far lower cost and, most critically, offers great opportunities to collaborate with the local communities that are the best information sources on local events. The goal of GDELT is to do precisely this: to offer a global informational base that supports increasingly local monitoring and feedback and increasingly high-resolution alerts and forecasts, by providing a computational platform for monitoring the world that goes to great lengths to access local sources throughout the globe.

We are incredibly excited to see how remarkably well GDELT fared in monitoring the earliest indicators of the current Ebola outbreak, as well as its ability to map the surrounding stability context of the outbreak as it continues to evolve. We are already beginning to explore further how GDELT can be used as an early warning indicator for disease outbreaks, and especially how its unique ability to combine that with socio-cultural indicators and realtime social instability can be used to more accurately monitor, assess, visualize, and forecast both the spread of disease and its impact on the stability of nations.