GDELT has long been used to understand environmental issues, from early work around climatic change to our well-known 2015 map of global wildlife crime for Foreign Policy magazine (direct link to the map) to being a finalist in USAID's Wildlife Crime competition. What are some of the ways you can use GDELT's datasets today to understand global environmental and wildlife issues from climate change to poaching?
Multilingual Narrative Analysis
GDELT provides a wide range of textual analytic datasets and APIs that can be used to understand the narrative space around environmental and wildlife issues.
- Interactive Keyword Search. At the simplest level, GDELT Summary can be used to keyword search the English machine translations of all worldwide online coverage GDELT has monitored in its core 65 languages, keyword search a decade of television news, and visually search a quarter year's worth of television news annotated through AI. Rich analytic visualizations are provided for all searches, and beside each visualization or report is a live link to the underlying results in CSV, JSON, JSONP, RSS, GeoJSON and other formats, allowing you to integrate them into scripted workflows for more advanced analytics. For example, see the transition from "climate change" to "climate crisis" or see imagery of polar bears give way to deserts.
- Ngrams. There are also unigram and bigram news ngram datasets going back to Jan. 1, 2019 in all 152 languages monitored by GDELT, as well as 1-5 gram ngram datasets for television news stretching back a decade. Both allow you to see how specific terms are shifting in usage over time.
- Entity + Linguistic Data. For more advanced users, there is a live-updating neural global entity graph compiled from daily online news in 11 languages through Google's Natural Language API, as well as a live-updating classical entity graph compiled through an HMM+grammar system, along with a neural entity graph over television news. A massive live-updating part-of-speech dataset provides insights into how word usage is changing. These datasets can be used for everything from massive graph visualizations, to understanding contextual entity relationships, to Q&A such as whether Donald Trump ever met Al Gore, to identifying contradictory and contested narratives such as whether climate change contributed to the Australian wildfires.
- Quoted Statements. Oftentimes you want to understand what is being said about a given topic across the world, such as tracking statements by government officials about environmental and wildlife initiatives. The Global Quotation Graph runs from January 1, 2020 through present and contains all quoted statements found in worldwide news coverage monitored by GDELT in all 152 languages it assesses! Here is an example using it to track mentions of hydroxychloroquine on a single day earlier this year.
- Frontpage Agenda Setting. Online publishing offers news outlets infinite publication space, meaning breaking news is added to coverage rather than displacing other coverage. News homepages/frontpages still reflect these editorial priorities, however, offering a global scale glimpse into news agenda setting in each country. The Global Frontpage Graph tracks nearly 200 billion frontpage links back to March 2018, allowing you to readily see how often environmental and wildlife issues rank frontpage treatment.
- Context API. The Context API is a powerful tool for tracking the context of a given narrative. While it currently searches only a rolling window of the last 24 hours, it allows you to keyword search GDELT's live monitoring stream and returns not just matching URLs, but the first matching snippet from each. These snippets lend context to each match for relevancy calculations, allow AI systems to interactively guide users to the best result, and assist in Q&A and contested narrative assessment.
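The ngram datasets described above lend themselves to a simple scripted comparison of competing phrases, such as the shift from "climate change" to "climate crisis". The following is a minimal Python sketch, assuming the ngram records have already been downloaded and parsed into (date, ngram, count) tuples. That tuple layout, the function name `phrase_share_by_day` and the sample counts are illustrative assumptions, not the datasets' actual schema or real figures.

```python
from collections import defaultdict

def phrase_share_by_day(records, phrase_a, phrase_b):
    """Return {date: share of phrase_a among (phrase_a + phrase_b) mentions}.

    `records` is an iterable of (date, ngram, count) tuples. This layout is
    an assumption for illustration, not the actual ngram file schema.
    """
    totals = defaultdict(lambda: [0, 0])  # date -> [count_a, count_b]
    a, b = phrase_a.lower(), phrase_b.lower()
    for date, ngram, count in records:
        key = ngram.lower()
        if key == a:
            totals[date][0] += count
        elif key == b:
            totals[date][1] += count
    shares = {}
    for date, (count_a, count_b) in sorted(totals.items()):
        denom = count_a + count_b
        if denom:  # skip days on which neither phrase appeared
            shares[date] = count_a / denom
    return shares

# Made-up sample counts, purely to show the calling convention.
sample = [
    ("2019-01-01", "climate change", 90),
    ("2019-01-01", "climate crisis", 10),
    ("2019-01-02", "climate change", 60),
    ("2019-01-02", "climate crisis", 40),
    ("2019-01-02", "polar bears", 5),
]
shares = phrase_share_by_day(sample, "climate crisis", "climate change")
for day, share in shares.items():
    print(day, round(share, 2))
```

Plotting the resulting shares over a longer date range would show whether one phrasing is displacing the other, rather than both simply rising with overall coverage volume.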
Mapping
Geography is a major emphasis of GDELT, which was among the first to debut mass-scale full-volume multilingual geocoding across all 65 languages in 2015. All coverage GDELT monitors in its core 65 languages is geocoded down to the resolution of a city landmark.
- Instant Maps With No Coding. For those who simply want to create instant rolling 7-day maps, GDELT Summary allows you to map any topic across the 65 languages GDELT live translates without writing a single line of code! Here's an example of mapping the climate protests last year. For more advanced users, the maps are actually rendered by the powerful GEO 2.0 API, which supports CSV, JSON, JSONP and GeoJSON output formats, allowing you to easily script more powerful applications around it. In fact, many organizations use a two-stage workflow: line analysts create and fine-tune maps, and when they have one that works well for their needs, they click the "GeoJSON" link at the bottom-right of the map and forward the URL to the organization's data science team. That team can then integrate it into their more advanced GIS systems, polling it daily to create a longitudinal map that extends beyond the last 7 days and brings to bear far more advanced GIS capabilities! You can also instantly bring these GeoJSON feeds into Carto to create beautiful publication-ready and mobile-friendly maps and combine them with other data layers.
- Advanced Multilingual Mapping. Creating our 2015 wildlife crime map required a fairly complex workflow involving multiple layers of SQL queries and Perl scripts. Later that year we demonstrated how to create the same map entirely in BigQuery using UDFs to eliminate the need for the external scripts, outputting the results directly to Carto. These approaches have the benefit that they leverage GDELT's mass machine translation across 65 languages to map a global perspective. Both approaches utilize the geographic data in the Global Knowledge Graph.
- Advanced English News Mapping. For those content with mapping just worldwide English language news coverage back to 2017, the Global Geographic Graph announced this past April makes it trivial to search more than 1.7 billion location mentions in global English news, keyword searching the surrounding snippets to map locations most closely associated with specific topics and even mapping how they have changed over time! With a single line of SQL you can map any topic, such as the geography of Angela Merkel. Using Carto's built-in BigQuery connector, you can seamlessly map a topic like Covid-19 by simply pasting the query directly into Carto! Read our recent tutorial or watch the video! You can also download the Global Geographic Graph directly as JSON files to use offline.
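The daily polling step of the two-stage workflow above, which extends a rolling 7-day map into a longitudinal one, might be sketched as follows in Python. This is a minimal sketch under stated assumptions: the feed is taken to be a standard GeoJSON FeatureCollection, `FEED_URL` is a placeholder for the "GeoJSON" link copied from your own map, and the function names, the `poll_date` property and the in-memory archive are hypothetical choices for illustration. The demonstration runs offline on made-up snapshots.

```python
import json
from urllib.request import urlopen

# Placeholder: paste the "GeoJSON" link copied from your own map here.
FEED_URL = "PASTE_GEOJSON_LINK_HERE"

def fetch_snapshot(url):
    """Download one GeoJSON snapshot (assumed to be a FeatureCollection)."""
    with urlopen(url) as response:
        return json.load(response)

def add_snapshot(archive, day, snapshot):
    """Store one day's features in `archive`, tagging each with its poll date.

    Re-polling the same day overwrites that day's entry, so the job is safe
    to rerun. The input snapshot is not modified.
    """
    features = []
    for feature in snapshot.get("features", []):
        tagged = dict(feature)
        props = dict(feature.get("properties") or {})
        props["poll_date"] = day  # hypothetical property name
        tagged["properties"] = props
        features.append(tagged)
    archive[day] = features
    return archive

def to_feature_collection(archive):
    """Flatten the archive into one longitudinal FeatureCollection."""
    merged = [f for day in sorted(archive) for f in archive[day]]
    return {"type": "FeatureCollection", "features": merged}

# Offline demonstration with made-up snapshots. A scheduled job would
# instead call add_snapshot(archive, today, fetch_snapshot(FEED_URL)).
archive = {}
day1 = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"name": "Nairobi"},
     "geometry": {"type": "Point", "coordinates": [36.82, -1.29]}}]}
day2 = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"name": "Sydney"},
     "geometry": {"type": "Point", "coordinates": [151.21, -33.87]}}]}
add_snapshot(archive, "2020-06-02", day2)
add_snapshot(archive, "2020-06-01", day1)
longitudinal = to_feature_collection(archive)
print(len(longitudinal["features"]))
```

Keying the archive by poll date keeps each day's snapshot separate, so a GIS layer built from the merged collection can be filtered or animated by `poll_date`. A production version would persist the archive to disk or a database rather than holding it in memory.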
Visual Analysis
GDELT has two major initiatives around visual understanding of the news: worldwide news imagery and television coverage, both leveraging Google's machine vision APIs to understand and catalog the coverage.
- Global News Imagery. Since 2016 GDELT's Visual Global Knowledge Graph has processed more than half a billion global news images, processing up to a million randomly selected images each day. This can be used to understand visual narratives, such as the shift from polar bears to desertification or even assess pollution by looking in the background of news imagery for litter on the ground or pollution in the sky.
- Television Coverage. GDELT's Visual Global Entity Graph catalogs American television evening news broadcasts on ABC, CBS and NBC back more than a decade, along with CNN since Jan. 25, 2020 and MSNBC, Fox News and BBC News London since April 2, 2020. Both airtime summary files and the raw API JSON output are available. This can be used to understand the changing visual narratives of environmental and wildlife coverage on television news and how it is being portrayed and framed.
Historical: Academic Literature, Human Rights & Books
For those interested in historical and academic perspectives, GDELT has several powerful datasets that can help shed light on the "why" to the "what" in today's news.
- Academic Literature. In 2014 we published one of the first at-scale analyses of JSTOR, DTIC and the Internet Archive's PDF holdings on Africa and the Middle East spanning the period 1950 to 2014 titled "Cultural Computing at Literature Scale: Encoding the Cultural Knowledge of Tens of Billions of Words of Academic Literature". The resulting 60-year dataset was released as a specialized AME Global Knowledge Graph. While regional in nature, this historical dataset offers powerful insights into the underpinnings of modern environmental and wildlife conflicts and can be directly linked with our contemporary news GKG.
- Human Rights. The Human Rights Global Knowledge Graph may also be of interest for specific research questions, encoding 110,552 documents: 65,731 from Amnesty International back to 1960, 5,255 from FIDH, 24,791 from Human Rights Watch, 709 from the International Criminal Court, 1,333 from the International Crisis Group, 5,976 from the United Nations Office of the High Commissioner for Human Rights, and 6,757 from the United States Department of State (comprising its Country Reports on Terrorism, Human Rights Reports, International Religious Freedom Reports, and Trafficking in Persons Reports series). Taken together, the reports, news releases, and alerts of these seven organizations offer a substantial cross-national view of human rights issues throughout the world, with many overlaps to environmental and wildlife topics.
- Historical Books. For those interested in historical perspectives, there is also a specialized set of Historical Book Global Knowledge Graphs that encode the public domain books of the Internet Archive and HathiTrust archives from 1800 to 2015, totaling more than 3.5 million volumes in all. This dataset offers powerful insights into how the academic and popular discourse around the environment and wildlife has changed over more than two centuries.
Historical: Book Imagery
Finally, for those interested in historical imagery, in collaboration with the Internet Archive, in 2014 we extracted all of the images from the Archive's public domain book collection spanning more than 600 million digitized pages dating back 500 years from over 1,000 libraries worldwide.
- Book Images Collection. The final collections include myriad volumes relating to the environment and wildlife, including the digitized portion of the Smithsonian Biodiversity Heritage Library's collections. You can read more about the collection, including downloading the complete archive, as well as browse and search the images on Flickr, including the Biodiversity Collection and "birds".