Kalev spoke at the Brookings Institution this afternoon, surveying GDELT's various datasets and especially its new collaboration with the Internet Archive's TV News Archive around treating global television news as data:
The GDELT Project is one of the largest open datasets for understanding global human society, totaling more than 8 trillion datapoints spanning 200 years in 152 languages. From mapping global conflict and modeling global narratives to providing the data behind one of the earliest alerts of the COVID-19 pandemic, from disaster response to countering wildlife crime, from epidemic early warning to food security, and from estimating realtime global risk to mapping the global flow of ideas and narratives, GDELT explores how data can let us see the world through the eyes of others and even forecast the future, capturing the realtime heartbeat of the planet we call home.
A new collaboration with the Internet Archive’s TV News Archive is exploring how scholars and journalists can better understand and visualize the Archive’s extraordinary public interest library of global television news: more than 5.2 million broadcasts totaling 3.4 million hours of airtime from over 100 channels across 50 countries and 5 continents, in 35 languages, over 20 years.
How can treating television news as data create fundamentally new opportunities for at-scale computational analysis of the global narrative landscape and enable new kinds of search and analytic tools that render the traditionally impenetrable linear format of video into a rich source of insights on human society? How can AI tools like OCR, object detection, embeddings, language understanding, knowledge graphs, transcription, translation and visual search make it possible to search television news in powerful new ways aligned with the needs of journalists, fact checkers and scholars? How can such tools connect television news to social media, online news and radio news, allowing narratives to be traced, and even visualized at scale, as they move across the media ecosystem? How can video be made “skimmable” and the hundreds of terabytes of annotations from such tools be turned into actionable insights usable by everyone from data scientists to journalists, fact checkers and even ordinary citizens?
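To make the embedding idea above concrete, here is a minimal sketch of the kind of cross-media narrative matching such tools enable, using the open source sentence-transformers library. The model name and the caption, social media and article snippets below are purely illustrative stand-ins, not GDELT's actual pipeline:

```python
# Minimal sketch: embedding-based narrative matching across media.
# Assumes the open source sentence-transformers library; the snippets
# below are hypothetical stand-ins for TV captions, posts and articles.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

tv_caption = "Officials claim the pipeline explosion was an act of sabotage."
candidates = [
    "Breaking: government blames sabotage for pipeline blast",      # social post
    "Pipeline operator cites equipment failure in initial report",  # online article
    "Local radio reports evacuation near the pipeline site",        # radio transcript
]

# Encode everything into a shared vector space and rank by cosine similarity.
query_vec = model.encode(tv_caption, convert_to_tensor=True)
cand_vecs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_vec, cand_vecs)[0]

for text, score in sorted(zip(candidates, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {text}")
```

The same approach scales from a handful of snippets to an approximate nearest neighbor index over an entire media ecosystem, which is what makes tracing a narrative's spread computationally tractable.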
The Archive’s new live Belarusian, Iranian, Russian and Ukrainian television news archives offer a fundamentally new approach to scholarship and investigative journalism on ongoing conflicts, especially the domestic governmental narratives surrounding them. New experiments with AI tools from ChatGPT to Whisper to CLIP showcase both the future potential and the very real present limitations of these tools, from hallucination to domain mismatch.
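As a concrete illustration of the kinds of experiments described above, here is a minimal sketch pairing Whisper transcription with CLIP visual search, assuming the open source openai-whisper and Hugging Face transformers packages. The clip and frame filenames are hypothetical placeholders, and this is a sketch of the technique rather than the Archive's or GDELT's production workflow:

```python
# Minimal sketch of the two model families mentioned above.
# Assumes the openai-whisper and transformers packages are installed
# (plus ffmpeg for Whisper); file paths are hypothetical placeholders.
import torch
import whisper
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# 1. Whisper: transcribe (and optionally translate) a broadcast clip.
asr = whisper.load_model("large-v2")
result = asr.transcribe("broadcast_clip.mp4", task="translate")  # -> English
print(result["text"])

# 2. CLIP: rank extracted video frames against a natural language query.
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frames = [Image.open(p) for p in ["frame_0001.jpg", "frame_0002.jpg"]]
inputs = clip_proc(text=["a news anchor at a studio desk"],
                   images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = clip_model(**inputs).logits_per_image  # shape: (frames, queries)
print(logits.squeeze(-1))  # higher score = better visual match
```

Sketches like this also surface the limitations noted above: Whisper can hallucinate fluent but wrong text on noisy or code-switched broadcast audio, and CLIP's training data only partially overlaps the visual vocabulary of television news chyrons and studio sets.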
Through an ever-growing landscape of non-consumptive algorithms, datasets and interfaces, from keyword search of closed captioning, automated transcription and translation, and OCR’d onscreen text, to experiments with cutting-edge visual search spanning logo detection, object detection and natural language descriptive search, to 3 billion “visual ngrams” totaling 1 quadrillion pixels, to making television “skimmable” through the Visual Explorer’s unique user interface, we are exploring how millions of hours of television can be transformed into unprecedented insight into the heartbeat of Planet Earth.
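For those who want to try the caption keyword search described above, here is a minimal sketch querying the GDELT 2.0 TV API from Python. The endpoint, query operators and JSON layout shown here follow the TV API documentation, which should be consulted for the authoritative and current list of parameters and output modes:

```python
# Minimal sketch: keyword search over closed captioning via the
# GDELT 2.0 TV API (backed by the Internet Archive's TV News Archive).
# Parameters follow the TV API documentation; check the current docs
# before relying on them, as operators and modes evolve over time.
import requests

API = "https://api.gdeltproject.org/api/v2/tv/tv"
params = {
    "query": '"climate change" market:"National"',  # caption keywords + market filter
    "mode": "timelinevol",   # % of monitored airtime matching the query over time
    "format": "json",
    "timespan": "1year",
}
resp = requests.get(API, params=params, timeout=30)
resp.raise_for_status()
data = resp.json()

# Per the documented timeline output: one series per station, each a
# list of (date, value) datapoints of matching airtime percentage.
for series in data.get("timeline", []):
    print(series["series"], len(series["data"]), "datapoints")
```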