GDELT today encompasses more than 3 trillion datapoints spanning more than 200 years across 152 languages. Its datasets span text, imagery, and video, enabling fundamentally new kinds of multimodal analysis: nearly 200 billion frontpage links, 100 billion AI-annotated words, a billion textual articles, half a billion images totaling a quarter-trillion pixels, and ten years of television news broadcast annotations, among myriad others. At the same time, they pose novel methodological challenges, from blending structured visual with unstructured textual understanding to translating nanosecond, frame-level machine precision into the coarse human airtime metrics of traditional content analysis. How exactly does one derive meaning from datasets each totaling hundreds of billions of datapoints, and what do typical GDELT analytic workflows look like, from visual dashboards through advanced and even frontier cloud pipelines?
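As one small, hedged illustration of what the lightweight end of such a workflow can look like, the sketch below pulls a coverage-volume timeline from the GDELT DOC 2.0 API and prints the first few datapoints. The search term, timespan, and the exact JSON field names shown are illustrative assumptions rather than a prescription; the full parameter set is documented with the DOC 2.0 API itself.

```python
# Minimal sketch: query the GDELT DOC 2.0 API for a volume-over-time timeline.
# The query term, timespan, and JSON field names below are illustrative assumptions.
import json
import urllib.parse
import urllib.request

BASE = "https://api.gdeltproject.org/api/v2/doc/doc"

params = {
    "query": "climate change",   # illustrative search term
    "mode": "timelinevol",       # coverage volume over time
    "timespan": "3months",       # trailing three-month window
    "format": "json",
}

url = BASE + "?" + urllib.parse.urlencode(params)

with urllib.request.urlopen(url, timeout=60) as resp:
    payload = json.load(resp)

# Timeline modes return one or more series of (date, value) points;
# the keys accessed here reflect the JSON layout as an assumption.
for series in payload.get("timeline", []):
    points = series.get("data", [])
    print(series.get("series", "series"), f"({len(points)} points)")
    for point in points[:5]:
        print(" ", point.get("date"), point.get("value"))
```

From there, the same question can be scaled up to the full datasets through visual dashboards or cloud pipelines such as SQL queries over GDELT's public tables, which is the range of workflows the presentation covers.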
This new presentation, which will be available shortly, walks through GDELT's complete range of datasets, laying out examples of best practices and workflows and showing how we produce many of our own analyses each day.