Today we mark a truly transformative moment in the history of GDELT. Since its public debut a year and a half ago, GDELT has grown at an almost unimaginable rate, passing the one-million-download mark in six months this past August and finding application across the globe. From its founding, the vision of the GDELT Project has lain not just in codifying physical events from the world’s news media, nor just in creating a contextual graph over the people, organizations, locations, and themes of the media, but in moving beyond these towards quantifying the extraordinary array of latent emotional and thematic signals subconsciously encoded in the world’s media each day.
The GDELT Project’s full name is the Global Database of Events, Language, and Tone (GDELT), and it is with incredible excitement that today we officially unveil the first glimpses of its Language and Tone portions: the Global Content Analysis Measures (GCAM), which will be rolled out over the coming weeks. In a nutshell, GCAM runs each news article monitored by GDELT through an array of leading content analysis tools to capture over 2,230 latent dimensions, reporting density and value scores for each. Using GCAM, you can assess the density of “Anxiety” speech via Linguistic Inquiry and Word Count (LIWC), “Positivity” via Lexicoder, “Smugness” via WordNet Affect, “Passivity” via the Regressive Imagery Dictionary, “Perception” via WordNet Lexical Categories, “Moral/Spiritual” via Forest Values, “Vanity” via Roget’s Thesaurus, “Goal” via the General Inquirer, and the list goes on and on!
In total, 18 content analysis systems totaling more than 2,230 dimensions are now run on each news article GDELT sees each day, and all of these scores will be available via the forthcoming daily GKG 2.0 updates. When GDELT transitions to 15-minute updates later this month, all of these dimensions will even be calculated across the world’s news monitored by GDELT in near-realtime! See the GCAM Master Codebook for a list of all available dimensions, and the Global Knowledge Graph 2.0 Codebook (scroll down to the GCAM field) for details on the GCAM field’s file format and how to work with it.
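As a rough illustration of working with the GCAM field, here is a minimal Python sketch. It assumes the field is a comma-delimited list of key:value pairs in which `wc` holds the article’s word count, keys beginning with `c` hold the number of matching words for a dimension, and keys beginning with `v` hold value scores. The GKG 2.0 Codebook is the authoritative description of the format; the sample string and the `parse_gcam` and `density` helpers are hypothetical.

```python
from typing import Dict


def parse_gcam(field: str) -> Dict[str, float]:
    """Split a GCAM string into a {dimension_id: value} dict.

    Assumption: the field is a comma-delimited list of "key:value" pairs,
    e.g. "wc:125,c2.21:4,v10.1:3.21" (an illustrative string, not real data).
    """
    scores: Dict[str, float] = {}
    for pair in field.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty fields or trailing commas
        key, _, value = pair.partition(":")  # split on the first colon only
        scores[key] = float(value)
    return scores


def density(scores: Dict[str, float], dimension: str) -> float:
    """Fraction of the article's words that matched one count dimension.

    Assumption: "wc" is the article word count and "c<dimension>" is the
    number of words matching that dimension.
    """
    word_count = scores.get("wc", 0.0)
    if word_count == 0:
        return 0.0
    return scores.get("c" + dimension, 0.0) / word_count


if __name__ == "__main__":
    sample = "wc:125,c2.21:4,c10.1:40,v10.1:3.21111111"  # hypothetical sample
    parsed = parse_gcam(sample)
    print(parsed["wc"], density(parsed, "2.21"))
```

Dividing by the word count turns raw match counts into densities that can be compared across articles of different lengths, while the `v` entries can be used as-is.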
We are incredibly excited to see what all of you are able to do with this remarkable new chapter in GDELT’s history!
Below you’ll find the complete list of dictionaries currently used by the GCAM system to process each news article. If you’ve developed content analysis tools or dictionaries that you’d be willing to make available for us to run over the world’s news each day, we’d love to hear from you! We’ll be making an announcement in the next two weeks when the new daily GKG 2.0 files with the GCAM encodings are available, so stay tuned!
Forest Values
Bengston, D., & Xu, Z. (1995). Changing national forest values: A content analysis. St. Paul, Minn.: North Central Forest Experiment Station, Forest Service, U.S. Dept. of Agriculture.
GDELT Global Knowledge Graph Themes
Kalev Hannes Leetaru. (2013). 'The GDELT Global Knowledge Graph (GKG)'. Available at http://gdeltproject.org/
General Inquirer V1.02 (Harvard IV-4 Psychosocial Dictionary / Namenwirth & Weber’s Lasswell Dictionary)
Philip J. Stone, Robert F. Bales, Zvi Namenwirth, & Daniel M. Ogilvie (1962). The General Inquirer: A computer system for content analysis and retrieval based on the sentence as a unit of information. Behavioral Science, 7(4), 484-498.
Lexicoder Sentiment Dictionary
Lori Young and Stuart Soroka. 2012. Affective News: The Automated Coding of Sentiment in Political Texts, Political Communication 29: 205-231. Available at http://lexicoder.com/
Lexicoder Topic Dictionaries
Albugh, Quinn, Julie Sevenans and Stuart Soroka. 2013. Lexicoder Topic Dictionaries, June 2013 versions, McGill University, Montreal, Canada. Available at http://lexicoder.com/
Linguistic Inquiry and Word Count (LIWC)
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count: LIWC [Computer software]. Austin, TX. Available at http://www.liwc.net/
Loughran and McDonald Financial Sentiment Dictionaries
Tim Loughran and Bill McDonald, 2011, “When Is a Liability Not a Liability?” Journal of Finance, V66, pp. 35-65.
Opinion Observer
Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
Regressive Imagery Dictionary
Martindale, C. (1987). Narrative pattern analysis: A quantitative method for inferring the symbolic meaning of narratives. In L. Halasz (Ed.), Literary discourse: Aspects of cognitive and social psychological approaches (pp. 167–181). Berlin: de Gruyter.
Roget’s Thesaurus 1911 Edition
Peter Mark Roget. (1911). Roget's Thesaurus of English Words and Phrases. New York: TY Crowell Company.
SentiWordNet 3.0
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. (2010). SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining. In Proceedings of LREC 2010.
SentiWords
Guerini M., Gatti L. & Turchi M. “Sentiment Analysis: How to Derive Prior Polarities from SentiWordNet”. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP'13), pp 1259-1269. Seattle, Washington, USA. 2013.
Subjectivity Lexicon
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Body Boundary Dictionary
Andrew Wilson. (2006). Development and application of a content analysis dictionary for body boundary research. Literary and Linguistic Computing, 21, 105-110.
WordNet Affect 1.0
Carlo Strapparava and Alessandro Valitutti. "WordNet-Affect: an Affective Extension of WordNet", in Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, May 2004, pp. 1083-1086.
WordNet Affect 1.1
Carlo Strapparava and Alessandro Valitutti. "WordNet-Affect: an Affective Extension of WordNet", in Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), Lisbon, May 2004, pp. 1083-1086.
WordNet Domains 3.2
Bernardo Magnini and Gabriela Cavaglià. "Integrating Subject Field Codes into WordNet". In Gavrilidou M., Carayannis G., Markantonatou S., Piperidis S. and Stainhaouer G. (Eds.) Proceedings of LREC-2000, Second International Conference on Language Resources and Evaluation, Athens, Greece, 31 May – 2 June, 2000, pp. 1413-1418.
WordNet 3.1 Lexical Categories
George A. Miller (1995). WordNet: A Lexical Database for English. Communications of the ACM Vol. 38, No. 11: 39-41.
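Many of the resources listed above can be thought of as word lists, some of which attach a numeric weight to each entry. The Python sketch below shows, under heavy simplifying assumptions, how such a list might produce the two kinds of scores GCAM reports: a count of matching words (from which a density follows) and an average value over the weighted matches. The toy dictionary and the `score_text` helper are hypothetical, and this is not GCAM’s actual implementation; real dictionaries such as LIWC also use wildcard stems and multi-word phrases, which this exact-token version ignores.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical toy dictionary (not taken from any resource listed above).
# A numeric entry contributes to both the match count and the value score;
# an entry mapped to None contributes to the match count only.
TOY_DICTIONARY: Dict[str, Optional[float]] = {
    "happy": 0.8,
    "glad": 0.6,
    "sad": -0.7,
    "worried": None,
}


def score_text(
    text: str, dictionary: Dict[str, Optional[float]]
) -> Tuple[int, int, Optional[float]]:
    """Return (word_count, match_count, mean_value) for one document.

    Tokenization is a simple lowercase regex split, and only exact
    single-word matches are counted (no stems, no phrases).
    """
    words = re.findall(r"[a-z']+", text.lower())
    matches = [w for w in words if w in dictionary]
    values = [dictionary[w] for w in matches if dictionary[w] is not None]
    mean_value = sum(values) / len(values) if values else None
    return len(words), len(matches), mean_value


if __name__ == "__main__":
    wc, hits, mean = score_text("I am happy, not sad, just worried.", TOY_DICTIONARY)
    print(wc, hits, hits / wc, mean)
```

In the example run, three of the seven words match, so the density is 3/7, while the value score averages only the two weighted matches ("happy" and "sad").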