GDELT 3.0: Unified Global Crawler Fleet

Historically, GDELT has operated multiple parallel crawler fleets distributed across Google Cloud Platform's global datacenters, each dedicated to a particular service, such as core web monitoring, the Global Difference Graph, the Visual Global Knowledge Graph and so on. Keeping the fleets separate made fleet management, code deployment and monitoring easier, since each system was isolated from the others.

At the same time, this separation led to inefficiencies: the fleets could not coordinate on common tasks or share and redistribute load during peak conditions or new deployments, and their different deployment systems made synchronized updates more difficult.

GDELT 3.0 uses a unified global crawler fleet in which every node is able to perform every task. Geographic, service and topology considerations guide the global management layers as they allocate and layer individual crawling subsystems across the fleet, moving them between nodes as needed. Lifecycle management, deployment oversight and tracing, and network placement are now handled by a management hypervisor local to each node, which coordinates with the global control and data planes, dispatch and storage servers, and a global storage fabric layered on top of Google Cloud Storage.
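
To make the idea of placing subsystems on a fleet of interchangeable nodes more concrete, here is a minimal illustrative sketch in Python. It is not GDELT's actual scheduler, whose design is not described here. The names `Node`, `Subsystem` and `allocate`, the per-node capacity limit and the "preferred region" rule are all hypothetical. The sketch assumes a simple greedy policy: place each subsystem on the least-loaded node in its preferred region, and fall back to any node with spare capacity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical fleet node; any node can host any subsystem."""
    name: str
    region: str
    capacity: int                          # assumed max subsystems per node
    assigned: List[str] = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.capacity - len(self.assigned)


@dataclass(frozen=True)
class Subsystem:
    """Hypothetical crawling subsystem with an optional geographic preference."""
    name: str
    preferred_region: Optional[str] = None


def allocate(subsystems: List[Subsystem], nodes: List[Node]) -> Dict[str, str]:
    """Greedy placement (an assumed policy, for illustration only).

    For each subsystem, prefer nodes in its preferred region; if none has
    spare capacity, fall back to any node that does. Within the chosen pool,
    pick the node with the most free capacity.
    """
    placement: Dict[str, str] = {}
    for sub in subsystems:
        candidates = [n for n in nodes if n.free > 0]
        if not candidates:
            raise RuntimeError(f"no capacity left for subsystem {sub.name!r}")
        local = [n for n in candidates if n.region == sub.preferred_region]
        pool = local or candidates         # fall back to the whole fleet
        target = max(pool, key=lambda n: n.free)
        target.assigned.append(sub.name)
        placement[sub.name] = target.name
    return placement
```

Calling `allocate` again with fresh `Node` objects after capacities or regions change yields a new placement, which is a crude stand-in for moving subsystems between nodes. A real management layer would also weigh network topology and live load, which this sketch leaves out.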

We'll be talking more about this new architecture in the coming weeks. It enables an incredible wealth of new capabilities that we'll be debuting soon!