GDELT GEO 2.0 API Debuts!

We are immensely excited today to announce the debut of the first of the GDELT 2.0 APIs: the GDELT GEO 2.0 API! Two years in the making, this API takes everything we've learned from our original GeoJSON API, the lessons we've learned from more than a decade mapping the geography of text, and all of your feedback and requests, and weaves them into what we hope is a transformative new way of accessing the geography of worldwide news coverage.

Perhaps most importantly, the new API now allows you to create maps of any arbitrary keyword or phrase, rather than being limited to just the GKG topical themes of the previous GeoJSON API. Now you can enter any English keyword and instantly receive back a map of every location mentioned within a sentence or two of your keyword across everything GDELT has monitored in all 65 live machine translated languages over the last 24 hours, updated every 15 minutes. Now no matter what word or phrase you're interested in, you can map it and even combine multiple terms into basic boolean OR statements.

Instant Interactive Maps

We heard you loud and clear that the previous system, with its GeoJSON-only output, was too difficult for non-technical users to get started with. Towards this end the new API generates instant intuitive interactive maps at landmark, first order administrative and country levels and these maps can be embedded directly in your own website, updating every 15 minutes. No need to understand what GeoJSON is or how to build a map, just enter your keywords and get back an instant map that you can use as-is. At the same time, the API generates fully compliant GeoJSON files that you can bring right into online mapping platforms like Carto to generate rich geographic experiences where you can fully customize the map's appearance and integrate additional datasets and map layers to tell complex cartographic stories. When generating country and ADM1-level maps, the API even outputs the complete polygonal geometry of each country/ADM1 as part of the GeoJSON file, meaning you can import it instantly into Carto without any further work.

Search The World's News Imagery

It is also the first API to provide programmatic access to the GDELT Visual Knowledge Graph (VGKG), allowing you to leverage the results of some of the world's most advanced deep learning algorithms cataloging the world's news imagery each day to create realtime-updating maps of the visual narratives of global news content. Map global protests or natural disasters like flooding based not on textual mentions of a flood, but on actual ground truthed imagery emerging from the scene. Map happy joyful protests where everyone is smiling and cheerful or map angry or violent protests, all based on facial expressions. Search any snippet of text found in the images (from foreground labels to a protest sign in the background) in more than 80 languages, any metadata found in the EXIF and other fields in the image file, and the textual caption of the image from the page it was found on, then expand to include all of the captions used for the image everywhere it appeared on the web that Google Images could find.

Note that the integration of the VGKG imagery into the GEO API is currently extremely experimental and typically yields a greatly elevated false positive rate. At this time we simply associate every image in each article with every location mentioned anywhere in that article. While this results in a high false positive rate, at the end of the day if you are mapping flooding using "imagetag:flood", the resulting map will typically highlight countries undergoing active flooding conditions. We are actively working on refining our geolocation of imagery and recently began integrating the full textual captioning of each image. When combined with the deep learning-based image geolocation, available file metadata, Google Images reverse captioning search and other indicators, this should allow us to geolocate the majority of images to within a high degree of accuracy. We will be making a series of announcements over the next two months as we roll out these enhanced capabilities.
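
The association rule described above, in which every image in an article is paired with every location mentioned anywhere in that article, can be sketched in a few lines of Python. This is only an illustration of why the false positive rate is high, not GDELT's actual implementation: the `Article` type, its field names and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Article:
    # Hypothetical record type; the field names are assumptions for illustration.
    url: str
    images: list
    locations: list


def associate_images_with_locations(article):
    """Pair every image in the article with every location it mentions."""
    return list(product(article.images, article.locations))


# Made-up example: a flood photo in an article that also mentions a distant city.
article = Article(
    url="https://example.com/story",
    images=["flood.jpg"],
    locations=["Dhaka", "Geneva"],
)
pairs = associate_images_with_locations(article)
print(pairs)
```

In this made-up case the flood photo is attached to both cities, even though only one of them is plausibly where the photo was taken.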

Massive Complexity

Given the immense complexity of monitoring worldwide news media in nearly every country of the world across more than 100 languages, visually identifying and extracting the actual article text from the rest of the page, machine translating the incredible complexity and nuance of more than 65 languages in realtime and geocoding and disambiguating more than 9 million places on earth down to the level of a remote isolated hilltop, together with deep learning analysis of imagery reflecting the entirety of global human experience, you will almost always see at least some level of error in the results this API provides. This can range from one city being confused for another of the same name and context to a city name being mistranslated to an unrelated temporary breaking news inset at a different location being incorrectly inserted into the middle of an article. With anything this massive, you will always find some level of error when you dive deeply enough into the results, but overall the maps should accurately reflect the broad contours of the geography of your search.

 

QUICK START EXAMPLES

Here are some really simple examples to get you started using the API!

  • Point-level map of all locations mentioned near "trump". This is the most basic kind of map you can make and simply tallies up the locations mentioned most frequently over the last 24 hours in close proximity to the word "trump" (for Donald Trump) and displays them as dots on an interactive HTML map that displays in your browser. Each location is displayed as a blue dot and sized based on the number of times "trump" appeared near mentions of that location over the last 24 hours. Jenks ("natural breaks") clustering is used to determine dot sizes and a legend in the lower-right shows the numeric ranges for each dot size. Mousing over each location will show the name of that city or landmark and the number of times it was mentioned in a box at the top-right of the map. Click on any location to display a popup with up to 5 matching articles that contained "trump" near a mention of that city. (Note that you will often see fewer than 5 articles even if hundreds of matching articles were reported for that location. This happens when one or more of those articles mentioned "trump" multiple times near that city's name: the system returns only 5 matches for each location and duplicate mentions of the same article are then removed, leaving fewer than 5 distinct articles.)
  • Country-level map of all locations mentioned near "trump". This is the same map as above, but instead of displaying each individual location mentioned in proximity with the word "trump", locations are aggregated up to the country level. Working at a country level can make it much easier to understand macro-level geographic trends. Since some countries are mentioned far more often than others in the global media, raw counts are not displayed here – instead for each country, the number of times that country or any location within that country is mentioned in proximity to the word "trump" is tallied for the last 24 hours and then divided by the total number of times that country or any of its locations were mentioned overall and then multiplied by 100 to yield the percent of discussion about that country that had to do with "trump." As with the point-level map, you can mouse over any country to see the percent mentions and click on a country to see up to the top 10 articles mentioning "trump" in context with that country. The legend in the lower-right uses Jenks clustering to determine the best color scale.
  • First-order administrative division map of all locations mentioned near "trump". This is the same as the country-level map above, but instead of aggregating to the country level, it aggregates to the first-order administrative division (ADM1) (in the US each state is an ADM1, while in other countries it is an equivalent division). Note that this will typically yield a much sparser map than either the point or country-level maps, since a mention of just a country as a whole (like "France") will not appear on this map, while it will appear on the other two maps. This map is typically most useful when your analysis requires fine macro resolution over a smaller geographic area.
  • Source country map of mentions of "trump". The three maps above visualize the locations mentioned most frequently near the word "trump" across the world's news media. Any mention of "trump" in the vicinity of a mention of "Paris" in any news article anywhere in the world will be counted in those maps. However, often what you really want is not what locations are mentioned, but rather a breakdown of what each country in the world is saying about your keyword. This map colors each country in the world by how much of the domestic media monitored by GDELT from that country mentioned the word "trump" anywhere in the article. Clicking on each country will show up to 10 articles from that country's domestic press that mentioned "trump." (Note that despite extensive cross-referencing, sometimes a news outlet may be assigned to the wrong country – please let us know when you spot an outlet placed in the wrong country).
  • Source country map of mentions of "trump" (only articles with sharing images). This map is identical to the one above, except it only counts articles that provided a "social sharing" image to be displayed when the article is shared via social media. Many news outlets do not provide such social images for all articles and even those that provide social images may simply use the outlet's logo as the image, but in the general case most articles that contain a sharing image feature something related to the tenor and contents of the article, making this mode a useful way to survey the visual portrayal of a topic (in this case "trump").
  • Portuguese language coverage. This map visualizes the locations mentioned most frequently in the Portuguese-language coverage monitored by GDELT across all topics. As expected, the locations focused on most frequently in the Portuguese press are in Brazil, Portugal, the former Portuguese colony Angola and Western Europe.
  • BBC coverage. This map visualizes the locations mentioned most frequently in BBC's reporting, allowing you to understand the geographic emphases of each outlet.
  • Red Cross coverage. Here are two examples of finding mentions by country around the world of the Red Cross – the first example looks for textual mentions, while the second finds all images that either depict the Red Cross logo, mention it somewhere in the image file metadata, mention it in the caption of the image, or mention it in any caption associated with the image anywhere it appears on the open web.
  • Country-level map of all locations mentioned in articles containing images that were found anywhere on the web with a caption mentioning "election".  This map takes all images monitored by GDELT over the last 24 hours that Google's Cloud Vision API tagged as having the concept "election" appearing in the caption anywhere the image was found on the open web (via a Google Images reverse image search) and compiles the list of articles GDELT monitored containing that image and then compiles a list of all locations mentioned anywhere in those articles and creates a country-level map showing which countries appear most commonly in those articles. Note that this will have a fairly high false positive rate since it is simply mapping all locations found anywhere in articles containing election-related imagery. Image geographic searches are highly experimental for the GEO API and we are working on a number of fronts to greatly refine the accuracy of image geolocation, leveraging both the Cloud Vision API's ability to recognize the geographic location of an image and bringing to bear a wealth of image and document-level cues. In short, for the time being these image-based maps will be fairly inaccurate, but they offer a preview of where we are heading in bringing imagery to bear for mapping.
  • Flooding map. As with the map above, this has a fairly high false positive rate, but typically the countries colored the deepest red are undergoing current flooding situations, making this an easy way to map ground truth imagery of global flooding conditions.
  • Mapping violence in the media. This map offers an incredibly powerful and unique look at how violence is portrayed visually across the world's news media. For each country up to five images from the past 24 hours deemed to contain at least some level of violence and published in that country's domestic media are returned. This allows you to look at the intensity and kinds of violence that citizens of each country are exposed to on a daily basis. Please note that the imagery returned by this search can be extremely disturbing. Also remember that all images are classified 100% automatically by computer and so you will see mistakes.
  • Vehicles by country. This returns up to five images of ground-based transportation vehicles from the domestic press of each country, offering a glimpse at how vehicles and styles differ across the world.
  • Uncommon images by country. This uses a reverse Google Images search to count how many times Google Images has seen each image across the open web and displays up to five images from the domestic press of each country that have not appeared widely elsewhere on the web or in news coverage.
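
The percentage normalization used by the country-level maps above (mentions near the keyword, divided by all mentions of that country, multiplied by 100) can be illustrated with a short Python sketch. The function name and the counts are made up for illustration, and treating a country with no mentions as zero is an assumption rather than documented behavior.

```python
def percent_density(matching_mentions, total_mentions):
    """Percent of a country's mentions that occurred near the search term."""
    if total_mentions <= 0:
        return 0.0  # assumption: a country with no mentions scores zero
    return 100.0 * matching_mentions / total_mentions


# Made-up counts: (mentions near the keyword, all mentions of the country).
counts = {"US": (5000, 100000), "FR": (300, 2000)}
density = {cc: percent_density(hits, total) for cc, (hits, total) in counts.items()}
print(density)
```

With these invented numbers the second country has far fewer raw matches but a higher percentage, which is the effect the normalization is meant to surface.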

FULL DOCUMENTATION

The GDELT GEO 2.0 API is accessed via a simple URL with the following parameters.

  • QUERY. This contains your search query and supports keyword and keyphrase searches, OR statements and a variety of advanced operators.
    • "". Anything found inside of quote marks is treated as an exact phrase search. Thus, you can search for "Donald Trump" to find all matches of his name.
      • "donald trump"
    • (a OR b). You can specify a list of keywords to be boolean OR'd together by enclosing them in parentheses and placing the capitalized word "OR" between each keyword or phrase. Boolean OR blocks cannot be nested at this time. For example, to search for mentions of Clinton, Sanders or Trump, you would use "(clinton OR sanders OR trump)".
      • (clinton OR sanders OR trump)
    • -. You can place a minus sign in front of any operator, word or phrase to exclude it. For example "-sourcelang:spanish" would exclude Spanish language results from your search.
      • -sourcelang:spanish
    • Domain. Returns all coverage from the specified domain. Follow by a colon and the domain name of interest. Search for "domain:cnn.com" to return all coverage from CNN or "domain:arabic.cnn.com" to search only their Arabic online site.
      • domain:cnn.com
    • ImageFaceTone. Searches the average "tone" of human facial emotions in each image. Only human faces that appear large enough in the image to accurately gauge their facial emotion are considered, so large crowd photos where it is difficult to see the emotion of people's faces may not be scored accurately. The tone score of an average photograph typically ranges from +2 to -2. To search for photos where visible people appear to be sad, search "imagefacetone<-1.5". Only available in any of the "image" modes.
      • imagefacetone<-1.5
    • ImageNumFaces. This searches the total number of foreground human faces in the image. Typically only unobstructed human faces facing toward the camera and in the foreground of the image are counted – large crowd scenes will not be counted properly. Use this to identify images depicting a certain number of people in the foreground of the photo. You can search for "<" less than, ">" more than or "=" – searching "imagenumfaces=3" will identify images with three human faces, while "imagenumfaces>5" will return images with more than 5 human faces. Only available in any of the "image" modes.
      • imagenumfaces>3
    • ImageOCRMeta. This searches a combination of the results of OCR performed on the image in 80+ languages (to extract any text found in the image, including background text like storefronts and signage), all metadata embedded in the image file itself (EXIF, etc) and the textual caption provided for the image. To search for images of a specific event, such as "mobile congress" you would use this field, since that information would most likely either be found in signage in the background of the image, provided in the EXIF metadata in the image or listed in the caption under the image. The search parameter for this field must always be enclosed in quote marks, even when searching for a single word like "imageocrmeta:"zika"". Only available in any of the "image" modes.
      • imageocrmeta:"zika"
    • ImageTag. Every image processed by GDELT is assigned one or more topical tags from a universe of more than 10,000 objects and activities recognized by Google's algorithms. This is the primary and most accurate way of searching global news imagery monitored by GDELT, as these tags represent the ground truth of what is actually depicted in the image itself, whereas other fields like "imageocrmeta" and "imagewebtag" reflect metadata and caption information provided by others about the image. Always remember that these tags are assigned 100% by computer and thus you will always find some error in the results. You can find a list of all tags appearing in at least 100 images over the past year (Image Tag Lookup) – in addition the two special tags "safesearchviolence" and "safesearchmedical" can also be used. Searching for "imagetag:"safesearchviolence"" will return violent images, for example. Values must be enclosed in quote marks. Only available in any of the "image" modes.
      • imagetag:"safesearchviolence"
    • ImageWebCount. Every image processed by GDELT is run through the equivalent of a reverse Google Images search that searches the web to see if the image has ever appeared anywhere else on the web that Google has seen. Up to the first 200 web pages where the image has been seen are returned. This operator allows you to screen for popular versus novel images – searching for "imagewebcount<10" will search for relatively novel images while "imagewebcount>100" will return images that appear widely online. Note that this records only the number of pages that Google has seen the image on, not the number of sites, meaning that if, for example, CNN uses a single image widely in its reporting of a breaking news event and publishes many articles on the event with the same image, this count will be high for that image, even though it is a novel image. Only available in any of the "image" modes.
      • imagewebcount<10
    • ImageWebTag. Every image processed by GDELT is run through the equivalent of a reverse Google Images search that searches the web to see if the image has ever appeared anywhere else on the web that Google has seen. The system then takes every one of those appearances from across the web and looks at all of the textual captions appearing beside the image and compiles a list of the major topics used to describe the image across the web. This offers tremendous descriptive advantage in that you are essentially "crowdsourcing" the key topics of the image by looking at how it has been described across the web. Values must be enclosed in quote marks. Only available in any of the "image" modes. You can access a list of all tags appearing in at least 100 images (Image WebTag Lookup).
      • imagewebtag:"drone"
    • Location. Searches for a given word or phrase in the full formal name of the location. Thus, you can search for "location:"new york"" to search for all locations that contain "new york" in their name (though in this case "locationadm1:usny" would also yield the same result). Values must be enclosed in quote marks.
      • location:"new york"
    • LocationADM1. Returns all matches within the specified first order administrative division (ADM1). Search for "locationadm1:ustx" to return all matches from the state of Texas in the United States. Due to spelling variations you must specify the four-character ADM1 code rather than spelling it by name (ADM1 Lookup).
      • locationadm1:USTX
    • LocationCC. Returns all matches within the specified country. For countries with spaces in their names, type the full name without the spaces (like "locationcc:unitedarabemirates" or "locationcc:saudiarabia"). You can also use their 2-character FIPS country code (Country Lookup).
      • locationcc:france
    • Near. This returns all matches within a certain radius of a given point. You specify a particular latitude and longitude and then a radius in either miles or kilometers from that point and all matches within the resulting bounding box are returned. Note that while a radius is specified the actual search conducted is technically a bounding box, rather than a radial search. NOTE that for southern latitudes and western longitudes you should use negative values. By default "radius" is interpreted as miles, but you can append "km" to the end to specify kilometers. To search for all mentions of locations within 100 miles of Paris, France, you would use "near:48.8566,2.3522,100" or to use 100 kilometers instead, you would search for "near:48.8566,2.3522,100km".
      • near:48.8566,2.3522,100
    • SourceCountry. Searches for articles published in outlets located in a particular country. This allows you to narrow your scope to the press of a single country. For countries with spaces in their names, type the full name without the spaces (like "sourcecountry:unitedarabemirates" or "sourcecountry:saudiarabia"). You can also use their 2-character FIPS country code (Country Lookup).
      • sourcecountry:france
    • SourceLang. Searches for articles originally published in the given language. The GEO API currently only allows you to search the English translations of all coverage, but you can specify that you want to limit your search to articles published in a particular language. Using this operator by itself you can map all of the locations mentioned in a particular language across all topics to see the geographic focus of a given language. Search for "sourcelang:spanish" to return only Spanish language coverage. You can also specify its three-character language code. All 65 machine translated languages are supported (Languages Lookup).
      • sourcelang:spanish
    • Theme. Searches for any of the GDELT Global Knowledge Graph (GKG) Themes. GKG Themes offer a more powerful way of searching for complex topics, since they can include hundreds or even thousands of different phrases or names under a single heading. To search for coverage of terrorism, use "theme:terror". You can find a list of all themes that have appeared in at least 100 articles over the past two years (GKG Theme Lookup).
      • theme:TERROR
    • Tone. Allows you to filter for only articles above or below a particular tone score (ie more positive or more negative than a certain threshold). To use, specify either a greater than or less than sign and a positive or negative number (either an integer or floating point number). To find fairly positive articles, search for "tone>5" or to search for fairly negative articles, search for "tone<-5".
      • tone<-5
    • ToneAbs. The same as "Tone" but ignores the positive/negative sign and lets you simply search for high emotion or low emotion articles, regardless of whether they were happy or sad in tone. Thus, search for "toneabs<1" for fairly neutral articles or search for "toneabs>10" for fairly emotional articles.
      • toneabs>10
  • MODE. This controls the type of map that is generated. The modes that begin with "image" are restricted to image-related searches, while the other modes are restricted to keyword searches and basic document attribute searches. For example, to create a point map of a textual keyword or all coverage from a particular domain or a specific language, use "PointData" mode, while to search for all images depicting flooding, use "ImagePointData".
    • PointData. This is the default map mode and displays a dot at each location mentioned in proximity to your search term. It supports only textual searches and basic document attribute searches. All image-related search parameters (like "imagetag") are disabled in this mode and will return an error. Each location includes an HTML block listing up to 5 articles matching your search that appeared in proximity to that location.
    • ImagePointData. This is identical to PointData mode, but supports only the image-related search parameters and a few basic document parameters like source language. In short, if you want to create a map of a textual keyword or article attributes like language or tone use the "PointData" mode and if you want to search for images use the "ImagePointData" mode. Each location includes an HTML block listing up to 5 images matching your search.
    • PointHeatmap. If you just want to create a heatmap of the locations most closely associated with your search term, but don't need to return a list of the matching articles themselves, this mode returns up to 25,000 distinct matching locations. It trades off not returning the matching article list to be able to return a larger list of locations. This mode is only available with GeoJSON output format.
    • ImagePointHeatmap. The same as above, but for image searches.
    • PointAnimation. Similar to PointHeatmap in that it does not return the actual article list, but extends it by creating a series of heatmaps in 15 minute increments over the past 24 hours, allowing you to visualize the changing geography of a topic. This mode is only available with GeoJSON output format. The GeoJSON is optimized for one-click visualization using Carto's "Torque" animation mode.
    • ImagePointAnimation. The same as above, but for image searches.
    • Country. This map mode aggregates all locations to the country level. With PointData mode, often there are so many dots on the screen it can be hard to really get a sense of the macro country-level landscape of a search. This mode also performs normalization, dividing the number of times locations in each country were mentioned in context with your search by the total number of times that country's locations were mentioned overall and multiplying by 100 to yield a percent density. This normalizes for the fact that there are likely more articles mentioning the United States with respect to any major global issue simply because the US plays such a central role in global politics, so by normalizing the result volume, the underlying true geographic trends emerge.
    • ImageCountry. The same as above, but for image searches.
    • SourceCountry. This map mode reflects the country of origin of your search results, coloring each country by the percent of all content monitored from that country over the last 24 hours that contained your search term. Clicking on any country shows up to five matching articles from that country's press. This allows you to rapidly triage how the world is reporting on a particular issue and the specific framing and contextualization distinct to each nation's press.
    • ImageSourceCountry. The same as above, but for image searches.
    • ADM1. This performs the same role as "Country" mode above, but operates at the resolution of first order administrative divisions (ADM1s). This offers the benefits of geographic aggregation while offering finer resolution for analyses where it is important to understand the specific corner of a country that is being mentioned most frequently in context with your search. Note that this mode may display less of the world than either point or country modes, since a mention of "France" will appear in the latter modes, but ADM1 will only reflect city/landmark or ADM1 resolution hits.
    • ImageADM1. The same as above, but for image searches.
  • FORMAT. This controls what file format the results are displayed in and allows you to control whether only articles that contain social sharing images are shown.
    • HTML. This is the default mode and returns a fully interactive browser-based map. The title of each matching article is displayed in textual format in the location popup.
    • ImageHTML. This is the same as "HTML" mode, but filters for only articles that contain a social sharing image and displays results by showing a thumbnail image beside the title of each article in the location popup. This format is required for the "image" modes but can be optionally used with the other modes.
    • ImageHTMLShow. This enables a special "showcase" mode in which the search results are divided into a 5 degree grid and one image is selected for display from each grid cell and displayed in a popup with up to 100 images total displayed on the map. This is particularly useful for presentations and for rapidly triaging matching imagery from across the world since a sample of up to 100 images scattered across the world is shown all at once, rather than having to see the results one at a time by clicking on each location. This format is only available for pointdata, country, sourcecountry, imagesourcecountry, imagecountry and imagepointdata modes.
    • GeoJSON. This outputs the map as a fully compliant GeoJSON file, ready for import directly into Carto and other online mapping platforms. For country and ADM1 modes, all polygonal geometry needed to display the map is embedded in the file as a MultiPolygon, allowing you to import and display the map without any further work. For all modes other than "pointheatmap" and "pointanimation", the GeoJSON file will contain the HTML code necessary to display the popup article list for each location.
    • ImageGeoJSON. This is the GeoJSON equivalent of the ImageHTML format – only articles with a social sharing image are returned and the HTML popup code contains the thumbnail images for each article sized to display at 175×100 pixels with rounded edges.
  • TIMESPAN. By default all articles monitored by GDELT over the last 24 hours are searched, but you can narrow this range if you want to consider only the most recent articles, up to the last 15 minutes. You can specify the number of minutes back to search, from the last 15 minutes up to the last 1440 minutes (last 24 hours).
    • Any value between 15 and 1440. The number of minutes back to search.
  • MAXPOINTS. To conserve system resources, the API only returns up to a certain maximum number of results for each mapping mode. You can restrict the number of results even further to minimize the size of your maps. This parameter is only enabled for "point" modes and is ignored for all other modes.
    • Pointdata Mode: 1-1000 Locations. In pointdata mode, this parameter controls the number of distinct locations drawn on the map, with up to 5 results (this is not configurable) from each location displayed in pointdata mode.
    • PointHeatmap Mode: 1-25000 Locations. In pointheatmap mode, up to 25,000 distinct locations are returned.
    • PointAnimation Mode: 1-10000 Locations. In pointanimation mode, up to 10,000 distinct locations per timestep are returned.
  • GEORES. By default the API returns all locations, including cities, ADM1 and country-level mentions. Sometimes you want to filter for more precise geographic mentions, visualizing only city mentions and excluding country-level mentions, for example.
    • 0. This is the default and will display all locations.
    • 1. This excludes country mentions and displays ADM1 and city/landmark mentions.
    • 2. This excludes country and ADM1-level mentions and displays only city/landmark mentions.
  • SORTBY. By default results are sorted by relevance to your query. Sometimes you may wish to sort by date or tone instead.
    • Date. Sorts results by publication date, displaying the most recent articles first.
    • ToneDesc. Sorts results by tone, displaying the most positive articles first.
    • ToneAsc. Sorts results by tone, displaying the most negative articles first.
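
To show how the parameters above fit together into a single request URL, here is a minimal Python sketch that assembles one and checks the documented value ranges before sending anything. Treat it as a sketch under stated assumptions: the endpoint in `BASE_URL` and the lowercase parameter names are not given in this document, and `build_geo_url` is a hypothetical helper, not part of any GDELT library.

```python
from urllib.parse import quote, urlencode

# Assumption: the endpoint URL is not stated in the text above; replace as needed.
BASE_URL = "https://api.gdeltproject.org/api/v2/geo/geo"

# Documented MAXPOINTS ceilings for the point modes.
POINT_LIMITS = {"pointdata": 1000, "pointheatmap": 25000, "pointanimation": 10000}
# Modes documented as available only with GeoJSON output.
GEOJSON_ONLY_MODES = {"pointheatmap", "pointanimation"}
SORT_OPTIONS = {"date", "tonedesc", "toneasc"}


def build_geo_url(query, mode="PointData", fmt="HTML", timespan=None,
                  maxpoints=None, geores=None, sortby=None):
    """Assemble a request URL, rejecting values outside the documented ranges."""
    mode_key = mode.lower()
    if mode_key in GEOJSON_ONLY_MODES and fmt.lower() != "geojson":
        raise ValueError(f"{mode} mode is only available with GeoJSON output")
    params = {"query": query, "mode": mode, "format": fmt}
    if timespan is not None:
        if not 15 <= timespan <= 1440:
            raise ValueError("timespan must be between 15 and 1440 minutes")
        params["timespan"] = timespan
    # MAXPOINTS is documented as ignored outside the point modes, so it is
    # only validated and sent for the three modes with a documented ceiling.
    if maxpoints is not None and mode_key in POINT_LIMITS:
        if not 1 <= maxpoints <= POINT_LIMITS[mode_key]:
            raise ValueError(f"maxpoints must be 1-{POINT_LIMITS[mode_key]} for {mode}")
        params["maxpoints"] = maxpoints
    if geores is not None:
        if geores not in (0, 1, 2):
            raise ValueError("geores must be 0, 1 or 2")
        params["geores"] = geores
    if sortby is not None:
        if sortby.lower() not in SORT_OPTIONS:
            raise ValueError("sortby must be Date, ToneDesc or ToneAsc")
        params["sortby"] = sortby
    # quote_via=quote percent-encodes spaces as %20 rather than "+".
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = build_geo_url("(clinton OR sanders OR trump)", mode="Country",
                    fmt="GeoJSON", timespan=60)
print(url)
```

Validating locally is a design choice for this sketch: it surfaces out-of-range values such as a TIMESPAN below 15 minutes before a request is made, instead of relying on however the API itself responds.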