We’re tremendously excited to announce that in January we will debut a paired keynote and half-day workshop covering everything from using GDELT to explore the world to a behind-the-scenes look at what it takes to mine the media at planetary scale using the modern cloud. The two are perfectly paired, with the keynote designed to appeal to non-technical and technical audiences alike from all disciplines and backgrounds, showcasing the incredible new insights we can gain into the world around us through data. The workshop offers a deep dive into all of the visualizations, analyses and insights from the keynote and more, showcasing both how to use GDELT and offering a broader look at how to perform these kinds of massive-scale media analyses leveraging all of the capabilities of the modern cloud. Like its paired keynote, the workshop covers a tremendous range of topics, from a tutorial on using GDELT’s non-technical tools on through deeply technical examinations of how some of its largest explorations are performed. Please contact Kalev directly at email@example.com for details.
KEYNOTE: Looking Back On 20 Years Of Data Mining The Web And Forward To Our AI-Powered Online Future
What happens when massive computing power brings together an ever-growing cross-section of the world’s information in realtime, from news media to social media, books to academic literature, the world’s libraries to the web itself, mass machine translates all of it from more than 100 languages and transforms this immense record of humanity into a living global catalog of our planet, connecting the world’s information into a single massive ever-evolving realtime network that allows us to peer into the very soul of global society? Today the GDELT Project (https://www.gdeltproject.org/) is one of the largest open datasets for understanding human society, totaling more than 3.2 trillion datapoints spanning 200 years and has become a global standard used by humanitarians, NGOs, scholars, journalists and even ordinary citizens to make sense of our chaotic and rapidly evolving world. From disaster response to countering wildlife crime, epidemic early warning to food security, estimating realtime global risk to mapping the global flow of ideas and narratives, GDELT explores how we can use data to form bridges that can help build empathy and expand our own limited horizons, breaking down linguistic, geographic and cultural barriers to let us see the world through the eyes of others and even forecast the future, capturing the realtime heartbeat of the planet we call home. From mining thousands of web pages on a single small server 23 years ago to exploring our humanity through trillions of datapoints spanning data centers in 12 countries today, we’ll take a journey through what it’s been like to conduct web-scale research over the past two decades, from the days when Mosaic ruled the web to today’s globalized cloud and what we’ve learned from all those studies about what makes us human.
Along the way we’ll look at how traditional machine learning and statistical models transforming billions of news articles into hundreds of millions of human events, tens of billions of hyperlinks and trillions of knowledge graph entries have been joined by deep learning approaches capable of translating half a billion images totaling a quarter-trillion pixels into 300 billion datapoints recording the objects, activities, locations, words and emotions through which we see the world around us. From the ability of the emerging world of deep learning to lend structure to content that has never before been computationally explorable, on through systems capable of asking questions of our data and understanding its deeper patterns entirely on their own, we are reaching a world in which the web is increasingly becoming accessible in ways we couldn’t dream of even a few years ago. Here’s what it looks like to conduct data analytics at a truly planetary scale, the incredible new insights we gain about the daily heartbeat of our global world, and how our AI-powered online future will help us make sense of our world in ways we could never have imagined.
WORKSHOP: Mining The Media At Planetary Scale: Exploring Our Global World Using The Modern Cloud
What can many tens of billions of hyperlinks, tens of billions of words of academic literature, billions of tweets, two billion news articles, half a billion photographs, millions of books and more than a million hours of television teach us about the world we live in, from global events to the textual and visual narratives through which we see our shared planet? How can analytic tools from mass machine translation to thousand-dimension sentiment mining, textual and visual geocoding to event, narrative and relationship extraction allow us to explore content in non-traditional ways? How can deep learning approaches allow us to move beyond text to examine our increasingly visual online world? Most importantly, once we’ve used these techniques to translate a pile of text or images into trillions of data points, how do we in turn transform those numbers into analyses and visualizations and ultimately into insights and findings? From the unexpected power of the creatively applied keyword search through the capability of tools like Google’s BigQuery to uncover patterns from the chaos of petabytes, how can we leverage the capabilities of the modern cloud for workflows from the basic through the pioneering?
This workshop will examine datasets, tools and workflows for understanding our world through the eyes of the news media, from simple turnkey tools that require no technical experience on through advanced workflows that harness the full capabilities of the modern cloud, offering a behind-the-scenes look at many of the analyses from the keynote. Most of the examples will focus on GDELT’s news-related datasets, APIs and tools, but we’ll also look at how we’ve analyzed large Twitter datasets (including the Decahose), half a century of academic literature (and the issues of normalizing across thousands of publishers), books (including transforming 600 million pages of books spanning 500 years into one of the world’s most unique art galleries) and television (including how to make TV “searchable”), covering questions from a wide array of disciplines. Instead of code samples, the workshop will focus on the higher-order questions of how to map complex questions onto massive datasets in creative and efficient ways that leverage the unique capabilities and characteristics of the public cloud, focusing on Google Cloud Platform. From simple exploratory analyses like comparing the dueling worlds of CNN and FOX, through aspirational questions at the heart of what makes us human like creating a map of global happiness, towards practical applications like asking whether we can forecast the economic and political stability of governments, there will be something for everyone here.
Attendees will come away with a deeper understanding of:
- How to conduct large-scale analyses and mining of media data with a special emphasis on news content, but covering examples from social media, academic literature, books and television, among others.
- How to translate complex questions into computational workflows ranging from simple keyword searches through deep learning approaches.
- How to expand traditional textual assessment to incorporate visual analysis using cloud deep learning tools like Google’s Cloud Vision API, Cloud Speech API and Cloud Video API.
- How to think about how methodologies like machine translation, sentiment analysis, geocoding, narrative coding, event and relation extraction and image codification can be used in creative ways to answer complex questions, as well as the nuances in applying them to diverse global content.
- Lessons learned and from-the-trenches insights from applying analytic workflows at massive scale, from the technical to the methodological, and especially the nuances of working with global content across so many languages, sources, geographic resolutions and non-traditional modalities.
- Approaches to managing, versioning and accessing datasets spanning trillions of data points where the underlying computational pipelines, taxonomies and inputs are constantly evolving.
- How to make massive research datasets available to global and highly diverse user communities, ranging from advanced data scientists with existing mass computing resources through ordinary citizens using limited-bandwidth mobile devices, including creating interfaces for researchers, journalists, humanitarians on the ground, policymakers and citizens, and the underlying technical architectures and design decisions required, from simply posting terabyte JSON files in cloud storage to custom cloud-based analytic websites to tools like Google’s BigQuery platform.