A New Contextual Dataset For Exploring Climate Change Narratives: 6.3M English News URLs With Contextual Snippets 2015-2020

We are tremendously excited today to announce the fourth in our series of climate change narrative datasets, covering television, linguistics and the global narrative. This final dataset covers worldwide English-language online news coverage from 2015 to 2020 that mentions "climate change" OR "global warming" OR "climate crisis" OR "greenhouse gas" OR "greenhouse gases" OR "carbon tax", totaling 6.3 million articles. Each article includes the date GDELT saw it, the title of the article, the social media sharing image if provided (news outlets can specify an image to be shown for the article when it is shared online) and the URL of the article.

Most importantly, each match includes a short snippet showing the first instance in the article of one of the climate change phrases above, together with the 100 characters before and after it, truncated to the nearest word (if the 100th character before or after the phrase falls in the middle of a word, the window is shrunk to the closest full word). Note that in the majority of cases the first match in the article is selected, but due to the nature of the finite automaton used to generate the snippets, a later match may be chosen under certain circumstances if it allows for a larger context window.
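To make the windowing rule concrete, here is a minimal Python sketch of how a snippet like this could be produced. It is not the finite automaton GDELT uses: it always takes the first match, so it does not reproduce the cases where a later match is chosen. The names PHRASES, PATTERN and make_snippet are hypothetical, and case-insensitive matching is an assumption.

```python
import re

# Phrases from the dataset description. "greenhouse gases" is listed before
# "greenhouse gas" so the longer phrase wins when both could match.
PHRASES = [
    "climate change", "global warming", "climate crisis",
    "greenhouse gases", "greenhouse gas", "carbon tax",
]
# Assumption: matching is case-insensitive.
PATTERN = re.compile("|".join(re.escape(p) for p in PHRASES), re.IGNORECASE)


def make_snippet(text, window=100):
    """Return the first phrase match with up to `window` characters of
    context on each side, shrunk so it never cuts a word in half.
    Returns None if no phrase appears in the text."""
    match = PATTERN.search(text)
    if match is None:
        return None

    start = max(0, match.start() - window)
    end = min(len(text), match.end() + window)

    # Left edge inside a word: move right to the next whitespace.
    if start > 0 and not text[start - 1].isspace():
        while start < match.start() and not text[start].isspace():
            start += 1

    # Right edge inside a word: move left to the previous whitespace.
    if end < len(text) and not text[end].isspace():
        while end > match.end() and not text[end - 1].isspace():
            end -= 1

    return text[start:end].strip()


sentence = "The committee discussed global warming yesterday afternoon at length."
print(make_snippet(sentence, window=10))
print(make_snippet(sentence, window=8))
```

With a window of 10 the edges happen to land on word boundaries, so whole neighboring words are kept. With a window of 8 both edges fall mid-word, so the window shrinks inward until only complete words remain.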

Using a window of 100 characters before and after the match allows for brief non-consumptive snippets that show the context of the match. This makes it easier to tell whether the article's mention of climate change was cursory or central to the story and to the argument, evidence and context of the narrative within.

We're enormously excited to see what you're able to do with this incredible new dataset, from tracing the narratives of the global climate change debate to exploring its intersection with misinformation/disinformation and digital falsehoods.

Note that these files are encoded in UTF-8, but some spreadsheet software, including Microsoft Excel, can encounter problems loading accented and loanword characters.
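One common workaround, offered here as a suggestion rather than an official recommendation, is to prepend a UTF-8 byte order mark (BOM) to a copy of the file, since Excel generally uses the BOM to recognize UTF-8 text. The sketch below is a minimal Python example. The function name add_utf8_bom and the file names in the usage comment are hypothetical.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src, dst):
    """Copy `src` to `dst`, prepending a UTF-8 BOM if one is not already
    present. The rest of the file is copied byte for byte."""
    data = Path(src).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data
    Path(dst).write_bytes(data)


# Hypothetical usage; substitute the name of the file you downloaded:
# add_utf8_bom("climate_snippets.csv", "climate_snippets_excel.csv")
```

The function reads the whole file into memory, so for very large files a streaming copy would be a better fit.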