The GDELT Project

Behind The Scenes: Decoupling Ingest From Processing

Always remember that once data is ingested into the cloud, it can be made available anywhere at any scale. Thus, ingest should almost always be architecturally decoupled from processing when constructing cloud-scale infrastructure (absent cost and regulatory considerations). For GDELT itself, we ingest streams from our own globally distributed crawler fleets and from partners all across the world via the data center nearest to each source to maximize ingest rates. We then process those streams globally, taking advantage of the specific hardware platforms and APIs available in each region, which completely decouples ingest from processing.

This has a number of benefits: