The GDELT Project

Lessons Learned From Planetary Scale News Crawling


Running a global crawling and processing infrastructure that monitors news outlets in nearly every country in over 65 languages is an immense undertaking with an enormous number of moving parts, and it teaches us a great deal each day about the technical underpinnings of the global news landscape. Few open data projects operate at GDELT’s scale, and we receive a lot of interest in the lessons we’ve learned building it. We hope to begin publishing a regular series here on the GDELT Blog summarizing the experiences we’ve found most interesting, the lessons we think will be most useful to others, unusual behavior we’ve observed, trends we’re seeing, why we do things the way we do, and other advice that may be useful to the broader community.

As we gear up for the debut of GDELT 3, we thought it would be useful to summarize a few of the lessons that have most heavily informed its evolution:

 

For this inaugural post we listed just a few highlights from the vast body of lessons learned that have driven the networking architecture of GDELT 3’s crawling fleet. As we get this series off the ground, we’ll be posting many more lessons in much greater detail that we hope others will find useful!