We are enormously excited to announce today the unveiling of the future of GDELT. GDELT 2.0 debuts what we believe is one of the largest and most ambitious platforms ever created for monitoring our global world. From realtime translation of the world’s news in 65 languages, to measurement of more than 2,300 emotions and themes from every article, to a massive inventory of the media of the non-Western world, GDELT 2.0 is poised to redefine how we understand and interact with our global world, transcending language barriers and reaching deeply into the reactions and emotional resonance of world events. In essence, within 15 minutes of GDELT monitoring a news report breaking anywhere in the world, it has translated it, processed it to identify all events, counts, quotes, people, organizations, locations, themes, emotions, relevant imagery, video, and embedded social media posts, placed it into global context, and made all of this available via a live open metadata firehose enabling open research on the planet itself.
GDELT 2.0 is an index over global society, an open dataset that attempts to make human society itself “computable,” leveraging the enormous power of Google Cloud to fundamentally reimagine how we study the human world in realtime at a planetary scale.
For more than half a century, the majority of work on understanding global society at scale has focused exclusively on English-language Western media. GDELT 2.0 redefines what it means to listen to the world, leveraging what we believe is one of the highest-resolution inventories today of the media systems of the non-Western world to give voice to the most remote corners of the world in near-realtime. This is coupled with what we believe is the largest realtime streaming translation of the news, translating 98.4% of non-English material monitored by GDELT 2.0 in 65 languages in realtime, allowing it to truly listen to the world irrespective of traditional language barriers. At the same time, sentiment mining has rapidly emerged over the past few years to become a key developing technology for studying global society. Yet, we must keep in mind that human emotion spans far more than just “positive” and “negative” and recognize that emotion reflects an incredibly rich tapestry that is deeply embedded in personal and cultural contexts. Towards this end, GDELT 2.0 is opening an entirely new chapter in the study of global emotion, bringing together 24 emotional measurement packages (see the list of available packages here, here, here, here, and here) that together assess more than 2,300 emotions and themes, including native measures for 15 languages. We believe this is the single largest deployment of sentiment analysis in the world, and we hope that bringing together so many emotional and thematic dimensions crossing so many languages and disciplines, and applying all of it in realtime to breaking news from across the planet, will spur an entirely new era in how we think about emotion and the ways in which it can help us better understand how we contextualize, interpret, respond to, and understand global events.
Below are just a few of the myriad new capabilities debuting today with the official release of GDELT 2.0. We can’t wait to see what they make possible.
- 15 Minute Updates. Access the world’s breaking events and reaction in near-realtime as both the GDELT Event and Global Knowledge Graph now update every 15 minutes.
- Realtime Translation of 65 Languages. GDELT 2.0 brings with it the public debut of GDELT Translingual, representing what we believe is the largest realtime streaming news machine translation deployment in the world: all global news that GDELT monitors in 65 languages, representing 98.4% of its daily non-English monitoring volume, is translated in realtime into English for processing through the entire GDELT Event and GKG/GCAM pipelines. GDELT Translingual is designed to allow GDELT to monitor the entire planet at full volume, creating the very first glimpses of a world without language barriers. A special emphasis on locations and names makes GDELT 2.0 likely the largest multilingual geocoding system in the world.
- Realtime Measurement of 2,300 Emotions and Themes. GDELT 2.0 also brings with it the debut of GDELT Global Content Analysis Measures (GCAM), representing what we believe is the largest deployment of sentiment analysis in the world: bringing together 24 emotional measurement packages that together assess more than 2,300 emotions and themes from every article in realtime, including multilingual dimensions that natively assess the emotions of 15 languages (Arabic, Basque, Catalan, Chinese, French, Galician, German, Hindi, Indonesian, Korean, Pashto, Portuguese, Russian, Spanish, and Urdu). GCAM is designed to enable unparalleled assessment of the emotional undercurrents and reaction at a planetary scale by bringing together an incredible array of dimensions, from LIWC’s “Anxiety” to Lexicoder’s “Positivity” to WordNet Affect’s “Smugness” to RID’s “Passivity”.
- High Resolution View of the Non-Western World. Over the last few months we’ve embarked upon an ambitious initiative to vastly expand GDELT’s knowledge of the media systems of the non-Western world. Working closely with governments, think tanks, academics, NGOs, and citizens on the ground throughout the world, we have been working country-by-country to try to build the highest resolution inventory possible of the media systems of the non-Western world. While we still have a long way to go and the fluidity of the world’s media ensures that this will be a perpetual task, we are incredibly excited by the ability of this high resolution inventory, coupled with GDELT Translingual’s ability to translate 98.4% of this material in realtime, to give voice to the most remote corners of the world in near-realtime.
- Relevant Imagery, Videos, and Social Embeds. A large fraction of the world’s news outlets now specify a hand-selected image for each article that represents the core focus of the article and appears when it is shared via social media. GDELT identifies this imagery in a wide array of formats including Open Graph, Twitter Cards, Google+, IMAGE_SRC, and SailThru formats, among others. In addition, GDELT also uses a set of highly specialized algorithms to analyze the article content itself to identify inline imagery that is likely to be highly relevant to the story, along with videos and embedded social media posts (such as embedded Tweets or YouTube or Vine videos), and compiles a list of them. This makes it possible to gain a unique ground-level view into emerging situations anywhere in the world, even in those areas with very little social media penetration, and to act as a kind of curated list of social posts in those areas with strong social use.
- Quotes, Names, and Amounts. The world’s news contains a wealth of information on food prices, aid promises, numbers of troops, tanks, and protesters, and nearly any other countable item. GDELT 2.0 now attempts to compile a list of all “amounts” expressed in each article to offer numeric context to global events. In parallel, a new Names engine augments the existing Person and Organization names engines by identifying an array of other kinds of proper names, such as named events (Orange Revolution / Umbrella Movement), occurrences like the World Cup, named dates like Holocaust Remembrance Day, on through named legislation like Iran Nuclear Weapon Free Act, Affordable Care Act and Rouge National Urban Park Initiative. Finally, GDELT also identifies attributable quotes from each article, making it possible to see the evolving language used by political leadership across the world.
- Tracking Event Discussion Progression. Under the previous version of GDELT, only the first URL mentioning a given event was recorded, even if the event was mentioned in a hundred separate articles. GDELT 2.0 adds a new “Mentions” table that records every mention of an event over time, along with the timestamp the article was published. This allows the progression of an event through the global media to be tracked, identifying outlets that tend to break certain kinds of events the earliest or which may break stories later but are more accurate in their reporting on those events. Combined with the 15 minute update resolution and GCAM, this also allows the emotional reaction and resonance of an event to be assessed as it sweeps through the world’s media.
- Over 100 New GKG Themes. There are more than 100 new themes in the GDELT Global Knowledge Graph, ranging from economic indicators like price gouging and the price of heating oil to infrastructure topics like the construction of new power generation capacity to social issues like marginalization and burning in effigy. The list of recognized infectious diseases, ethnic groups, and terrorism organizations has been considerably expanded, and more than 600 global humanitarian and development aid organizations have been added, along with global currencies and massive new taxonomies capturing global animals and plants to aid with tracking species migration and poaching.
- Source Geographic Background Knowledge. GDELT now assesses the geography of every outlet it monitors over time and estimates its physical location on earth, incorporating that information back into the geocoding process to maximize its ability to recognize the geography of local media (a small rural radio station likely assumes its listeners know what country it is based in and thus does not clarify every mention of a local location with the corresponding country name).
- Global Knowledge Graph Now in BigQuery. The GDELT Global Knowledge Graph is now available in Google BigQuery, allowing you to query and explore the GKG in realtime and to integrate it into queries of the Event dataset. In fact, the Event, Mentions, and GKG tables are now all in BigQuery and updated every 15 minutes, allowing you to leverage BigQuery’s enormous power to perform mass-scale analytics in near-realtime on our changing planet.
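To make the “Mentions” idea above a little more concrete, here is a minimal Python sketch of how one might measure how long each outlet takes to pick up an event after its first report. It is an illustration only: the record layout (`event_id`, `published`, `outlet`), the function name, and the sample outlets are hypothetical and are not the actual GDELT schema, which is defined in the codebooks.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mention_lags(mentions):
    """Group mentions by event and measure each outlet's lag behind the first report.

    `mentions` is an iterable of (event_id, published, outlet) tuples.
    These field names are hypothetical stand-ins, not GDELT's real schema.
    """
    by_event = defaultdict(list)
    for event_id, published, outlet in mentions:
        by_event[event_id].append((published, outlet))

    lags = {}
    for event_id, rows in by_event.items():
        rows.sort()  # earliest publication first
        first_seen = rows[0][0]
        lags[event_id] = [(outlet, published - first_seen) for published, outlet in rows]
    return lags


# Made-up sample data: two outlets report event 1, one outlet reports event 2.
sample = [
    (1, datetime(2015, 2, 19, 12, 30), "outlet-b.example"),
    (1, datetime(2015, 2, 19, 12, 0), "outlet-a.example"),
    (2, datetime(2015, 2, 19, 13, 15), "outlet-a.example"),
]
result = mention_lags(sample)
```

Aggregating these lags per outlet across many events would give a rough picture of which outlets tend to break certain kinds of stories first.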
Rest assured that the GDELT 1.0 data streams will be maintained at a minimum through the end of Spring 2015, so your existing applications will continue to work without modification. At present the GDELT 2.0 data streams only stretch back to late morning February 19, 2015, so those wishing to perform longitudinal analysis will still need to use GDELT 1.0 for historical analysis and GDELT 2.0 for realtime analysis – in late Spring 2015 we will be releasing the entire historical backfile back to 1979 in the GDELT 2.0 format.
We'll be releasing a new "Getting Started With GDELT" user guide in the next few days to walk you through the incredibly vast array of new capabilities in GDELT 2.0, but in the meantime, you can go ahead and jump right in to exploring GDELT 2.0 (keep in mind that the data files begin late morning February 19, 2015, so there is not currently a historical backfile).
- GDELT 2.0 Event Database Codebook.
- GDELT 2.0 Global Knowledge Graph Codebook (V2.1).
- GCAM Codebook.
- Access GDELT 2.0 in Google BigQuery: Events, Mentions, Global Knowledge Graph. (Updated every 15 minutes).
- Master CSV Data File List – English. (Updated every 15 minutes).
- Master CSV Data File List – GDELT Translingual. (Updated every 15 minutes).
- Last 15 Minutes CSV Data File List – English. (Updated every 15 minutes).
- Last 15 Minutes CSV Data File List – GDELT Translingual. (Updated every 15 minutes).
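Since everything above updates on a 15 minute cycle, a small helper for working out the most recent update boundaries can be handy when polling. The Python sketch below rounds a time down to the nearest 15 minute mark and lists the preceding intervals. The `YYYYMMDDHHMMSS` string format used at the end is an assumption made for illustration; the file lists above are the authoritative source for what is actually available.

```python
from datetime import datetime, timedelta


def floor_to_quarter_hour(moment):
    """Round a datetime down to the most recent 15-minute boundary."""
    return moment.replace(minute=moment.minute - moment.minute % 15, second=0, microsecond=0)


def recent_intervals(moment, count):
    """Return the start times of the last `count` 15-minute intervals, newest first."""
    latest = floor_to_quarter_hour(moment)
    return [latest - timedelta(minutes=15 * i) for i in range(count)]


# Example time chosen for illustration; the timestamp format is an assumption.
now = datetime(2015, 2, 19, 14, 37, 5)
stamps = [t.strftime("%Y%m%d%H%M%S") for t in recent_intervals(now, 3)]
```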