The GDELT Project

Scaling GDELT For A New Era: Moving To Daemon Proxies For BigTable & GCS Using Agentic Gemini

Beneath its simple public APIs, datasets and interfaces, GDELT is powered by a massive globally distributed infrastructure that depends on BigTable and GCS for global storage and coordination. As GDELT's original monolithic pipelines have gradually been distributed into microservices, a growing bottleneck has been the CPU cost and latency of communicating with these two GCP services under our legacy architecture. For resilience and scalability, microservices are globally distributed and completely isolated, with dozens running per VM across multiple programming languages and runtime frameworks. GCS's RESTful JSON API has allowed us to maintain reasonable performance even as our operations per second have increased enormously, but the SSL overhead of all of that RESTful communication has increasingly eaten into the available CPU cycles on each VM. In contrast, the overhead of dozens upon dozens of microservices and hundreds of batch processes communicating with BigTable, many through utility proxies that provide BigTable access to languages lacking client libraries, has become a major resource drain, in some cases consuming a very large portion of each VM's available CPU time.

To address this, we used agentic Gemini 3.1 Pro to develop two new daemon proxies that centralize all BigTable and GCS traffic for each VM. Written in Go, these daemons were designed to be relentlessly efficient, robust and scalable. Each GCE VM runs copies of these two daemons, and every microservice and other process on that VM that communicates with BigTable or GCS now does so via these daemon proxies over localhost HTTP. Rather than streaming results over HTTP, all input and output are written to local RAMdisk files and only the filenames are passed to the daemon. For example, to write a batch of records to BigTable, a JSON request to the daemon might look like {"mode": "write", "inFile": "/dev/shm/tmp/recstowrite.jsonl", "table": "digtwin"} (one BigTable daemon is run per BigTable instance and threading is configurable at the daemon level). Given that all major programming languages natively support local HTTP requests, we can now trivially extend BigTable and GCS support to all of our processes, allowing them to leverage the full optimizations of GCP's native client libraries for both services.
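To make the request flow concrete, here is a minimal Go sketch of the client side of this pattern: a process stages its records on the RAMdisk, then hands only the filename to the local daemon over localhost HTTP. Only the JSON request shape comes from the example above; the daemon's port, endpoint path and response handling shown here are illustrative assumptions, not the actual GDELT protocol.

    // Minimal client-side sketch: stage a batch on RAMdisk, then pass only
    // the filename to the local BigTable daemon proxy over localhost HTTP.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "log"
        "net/http"
        "os"
    )

    // writeRequest mirrors the JSON request shown above: the records live in
    // a RAMdisk file and only the filename travels over localhost HTTP.
    type writeRequest struct {
        Mode   string `json:"mode"`
        InFile string `json:"inFile"`
        Table  string `json:"table"`
    }

    func main() {
        // Stage the batch on the RAMdisk so the daemon can read it directly.
        inFile := "/dev/shm/tmp/recstowrite.jsonl"
        if err := os.WriteFile(inFile, []byte(`{"rowkey":"example","value":"..."}`+"\n"), 0644); err != nil {
            log.Fatal(err)
        }

        body, _ := json.Marshal(writeRequest{Mode: "write", InFile: inFile, Table: "digtwin"})

        // The daemon listens on localhost only; port and path are placeholders.
        resp, err := http.Post("http://127.0.0.1:8080/bigtable", "application/json", bytes.NewReader(body))
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        fmt.Println("daemon responded:", resp.Status)
    }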

We are already seeing massive scalability and efficiency gains from these new daemons: a single connection pool can now be shared across the entire VM, and communication flows over long-lived gRPC connections rather than stateless RESTful APIs and single-use utilities.
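On the daemon side, the essential pattern is a single long-lived BigTable client whose underlying gRPC connection pool is created once at startup and reused by every request. The sketch below illustrates this with GCP's native Go client library; the project, instance, table, column family and handler details are illustrative placeholders, and the parsing of the JSON request and RAMdisk file is elided.

    // Daemon-side sketch: one BigTable client (and its gRPC connection pool)
    // created at startup and shared by all requests the daemon serves.
    package main

    import (
        "context"
        "log"
        "net/http"

        "cloud.google.com/go/bigtable"
    )

    var tbl *bigtable.Table // shared by all handlers; safe for concurrent use

    func main() {
        ctx := context.Background()

        // One long-lived client per daemon: this is the connection pool that
        // is now shared across the entire VM instead of per-process.
        client, err := bigtable.NewClient(ctx, "my-project", "my-instance")
        if err != nil {
            log.Fatal(err)
        }
        tbl = client.Open("digtwin")

        http.HandleFunc("/bigtable", handleWrite)
        log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
    }

    func handleWrite(w http.ResponseWriter, r *http.Request) {
        // Decoding the {"mode","inFile","table"} request and reading the
        // RAMdisk file are elided; apply a single illustrative mutation.
        mut := bigtable.NewMutation()
        mut.Set("cf", "value", bigtable.Now(), []byte("example"))
        if err := tbl.Apply(r.Context(), "example-rowkey", mut); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusOK)
    }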

Most powerfully, agentic Gemini 3 Pro was used to write benchmark daemons in each of the languages with supported client libraries, with the Go-based daemon emerging as the ultimate winner in balancing raw performance, robustness, long-term stability and resource efficiency.

In fact, this agentic development workflow has been so successful that we are now transitioning our streaming GCS writers to a purpose-built Go utility that is showing a 90% reduction in CPU overhead and a 60x+ reduction in wall-clock time for large streaming GCS writes on heavily loaded VMs compared with our previous architecture!
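As a rough illustration of the underlying pattern, the following Go sketch streams a large RAMdisk file to GCS through the native client library's long-lived writer. The bucket, object and chunk-size values are placeholders, and this shows only the general streaming approach, not the actual GDELT utility.

    // Streaming-upload sketch using GCP's native Go GCS client: a single
    // long-lived client and a chunked streaming writer, rather than a
    // single-use utility process per write.
    package main

    import (
        "context"
        "io"
        "log"
        "os"

        "cloud.google.com/go/storage"
    )

    func main() {
        ctx := context.Background()

        // One long-lived client reused across uploads.
        client, err := storage.NewClient(ctx)
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        src, err := os.Open("/dev/shm/tmp/bigoutput.jsonl")
        if err != nil {
            log.Fatal(err)
        }
        defer src.Close()

        // NewWriter streams the object over a single connection; ChunkSize
        // controls how much is buffered in memory per flush.
        w := client.Bucket("my-bucket").Object("bigoutput.jsonl").NewWriter(ctx)
        w.ChunkSize = 16 << 20 // 16 MiB, an illustrative value

        if _, err := io.Copy(w, src); err != nil {
            log.Fatal(err)
        }
        if err := w.Close(); err != nil {
            log.Fatal(err)
        }
        log.Println("upload complete")
    }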