The mission of Google Ideas is to “explore how technology can enable people to confront threats in the face of conflict, instability or repression … connect[ing] users, experts and engineers to conduct research and seed new technology-driven initiatives.” Through the support of Google Ideas, GDELT has an incredible new home on Google Cloud Platform, running in an enterprise-class data center that powers Google itself.
GDELT’s news monitoring and computational pipelines reside on Google’s Compute Engine virtual machine platform. Google’s unmatched global network speeds and the ability to spin up new disks, CPUs, and memory on demand have been instrumental in GDELT’s transition from daily updates to updates every 15 minutes, which will arrive shortly (more on this in an announcement coming in the next few weeks). The GDELT Analysis Service is housed in an 8-core, 64GB-RAM virtual machine that gives it the processing power and memory necessary to run highly sophisticated analyses, such as hosted Gephi-based network analysis and visualization.
The complete 100GB GDELT historical archive is stored in Google Cloud Storage, allowing the entire archive to be downloaded in just minutes. For the vast majority of GDELT’s user community, download bandwidth is now limited only by the speed of their own internet connection.
The GDELT Analysis Service uses Cloud Storage to deliver all of its analytical results and visualizations. Even with a high-bandwidth virtual machine, serving up web-based visualizations and large data files that may be shared widely or go viral would ordinarily require load balancing and careful monitoring of bandwidth. Instead, publishing analysis results via Google Cloud Storage means the GDELT Analysis Service can just copy files to the cloud and move on to the next analysis request – even if a file is shared widely, Google Cloud Storage handles the surge in downloads transparently. In essence, Google Cloud Storage acts as the ultimate storage reflector – a single virtual machine can upload files to be downloaded at effectively unlimited speeds by effectively unlimited numbers of people. This is truly transformative.
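As a rough illustration of this copy-and-move-on pattern, here is a minimal Python sketch. It is not the GDELT Analysis Service's actual code: it assumes the `google-cloud-storage` client library is installed and credentials are configured, and the function names, bucket name, and object names are hypothetical.

```python
from urllib.parse import quote


def public_url(bucket_name: str, object_name: str) -> str:
    """Return the public HTTPS URL for an object in a Cloud Storage bucket."""
    # quote() percent-encodes spaces and similar characters but keeps "/" intact.
    return f"https://storage.googleapis.com/{bucket_name}/{quote(object_name)}"


def publish_result(local_path: str, bucket_name: str, object_name: str) -> str:
    """Upload one finished analysis file and return the link to share.

    Assumption: the bucket already exists and its objects are publicly readable.
    """
    # Imported here so public_url() works even without the client library.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)
    # From here on, Cloud Storage serves every download; the VM is free again.
    return public_url(bucket_name, object_name)
```

Once `publish_result` returns, the virtual machine has no further role in serving the file, which is why a widely shared result does not affect the next analysis request.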
Google Cloud Storage even allows you to alias your cloud storage under your own domain name and make your files available as if you were hosting them on your own web server. In GDELT’s case, this allowed us to create “data.gdeltproject.org” to serve up all of GDELT’s large data files directly from Google Cloud Storage.
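In general, Cloud Storage supports this kind of alias by naming the bucket after the domain and pointing a DNS CNAME record for that domain at c.storage.googleapis.com. For users, the files then look like ordinary web downloads. The sketch below shows what fetching one daily file might look like in Python. The `/events/YYYYMMDD.export.CSV.zip` path layout and the function names are assumptions for illustration, so check the GDELT download pages for the actual file names.

```python
from datetime import date
import urllib.request

BASE_URL = "http://data.gdeltproject.org"


def daily_events_url(day: date) -> str:
    """Build the URL of one daily event file.

    Assumption: files follow the layout /events/YYYYMMDD.export.CSV.zip.
    """
    return f"{BASE_URL}/events/{day:%Y%m%d}.export.CSV.zip"


def download_daily_events(day: date, dest_path: str) -> str:
    """Download one daily file to dest_path and return that path."""
    urllib.request.urlretrieve(daily_events_url(day), dest_path)
    return dest_path
```

For example, `download_daily_events(date(2014, 5, 1), "20140501.zip")` would request the file for 1 May 2014 under the assumed layout.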
To cap it all off, the complete quarter-billion-record GDELT Event Database is now hosted as a public dataset in Google BigQuery, making it possible to query, interact with, explore, and analyze the entire dataset in near real time. BigQuery takes care of all of the data management and system administration tasks transparently, allowing us to focus on building GDELT, not running a cluster of database servers. You can query GDELT using regular expressions, Pearson correlations, and complex logic, and the BigQuery platform handles it all in stride, delivering your results in just a few seconds!
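To give a flavor of what such a query can look like, here is a hedged Python sketch that combines a regular-expression filter with a Pearson correlation. The table name `gdelt-bq.full.events` and the column names (`Year`, `Actor1Name`, `GoldsteinScale`, `NumMentions`) are assumptions not stated in this post, the query uses BigQuery's standard SQL dialect, and running it requires the `google-cloud-bigquery` client library plus a Google Cloud project with credentials.

```python
# Events per year for actors matching a regular expression, plus the Pearson
# correlation between each event's Goldstein score and its mention count.
# Assumption: table and column names as described in the paragraph above.
QUERY = """
SELECT
  Year,
  COUNT(*) AS events,
  CORR(GoldsteinScale, NumMentions) AS goldstein_vs_mentions
FROM `gdelt-bq.full.events`
WHERE Year >= @start_year
  AND REGEXP_CONTAINS(Actor1Name, @actor_pattern)
GROUP BY Year
ORDER BY Year
"""


def run_query(actor_pattern: str, start_year: int) -> list:
    """Run QUERY with the given parameters and return the result rows."""
    # Imported here so QUERY can be inspected without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("actor_pattern", "STRING", actor_pattern),
            bigquery.ScalarQueryParameter("start_year", "INT64", start_year),
        ]
    )
    return list(client.query(QUERY, job_config=job_config).result())
```

Passing the pattern and year as query parameters, rather than pasting them into the SQL string, keeps the query text fixed and avoids quoting problems. A call such as `run_query(r"^PRESIDENT", 2000)` would return one row per year.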
The incredible power of Google Cloud Platform is opening the door to so many new possibilities and heralds a whole new chapter in GDELT’s evolution, from the new GDELT Analysis Service’s instant visualizations, to GDELT’s forthcoming transition to 15-minute updates, to some incredible new extensions of GDELT that will be announced shortly. We are tremendously indebted to Google Ideas for their support of GDELT and are so excited to leverage the phenomenal power of Google Cloud Platform to allow GDELT to take the next pioneering steps towards computing on the entire planet!
It’s going to be an incredibly exciting 2014!