The GDELT Project

GCP Tips & Tricks: Using The Cloud Monitoring API To Track AI API Usage In Realtime

Yesterday we discussed how we massively optimized our archive-scale OCR throughput by splitting montaging and OCR workloads. At the "archive scale" of GDELT, even the largest hyperscale API quotas are insufficient to process immense archives under tight time pressure. Keeping such workloads moving requires continuous supervision of global fleets of clusters, rebalancing API submission rates in realtime in response to the conditions of the entire processing environment second by second. With workloads that span the entire planet and every available region, how can coordination and orchestration systems precisely, dynamically and reactively stage-manage these global fleets? Enter the GCP Cloud Monitoring API.

The brief Perl script below offers a glimpse of a trivial monitoring script designed to be run on a GCE VM. It uses the Cloud Monitoring API to request per-minute counts of total API requests to the GCP Cloud Vision API over a rolling window of the past hour, authenticating with the VM's default service account token from the metadata server and relying on curl and jq. The actual response further breaks this down by API region, but here we simply sum everything into a global total, updated in near-realtime. Monitoring interfaces like this allow our global fleet orchestrators to dynamically spin up and down our OCR fleet in response to API latency, availability, quota exhaustion, transient errors and workload changes.

#!/usr/bin/perl

use strict;
use warnings;
use JSON::XS;

# Query the Cloud Monitoring API for the past hour of Cloud Vision API request counts,
# authenticating as the VM's default service account. Replace [YOURPROJECTID] with your project ID.
my $ret = `curl -s -H "Authorization: Bearer \$(curl -s "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google" | jq -r '.access_token')" "https://monitoring.googleapis.com/v3/projects/[YOURPROJECTID]/timeSeries?filter=metric.type=%22serviceruntime.googleapis.com/api/request_count%22%20AND%20resource.type=%22consumed_api%22%20AND%20resource.labels.service=%22vision.googleapis.com%22&interval.endTime=\$(date -u +%Y-%m-%dT%H:%M:%SZ)&interval.startTime=\$(date -u -d '60 min ago' +%Y-%m-%dT%H:%M:%SZ)"  | jq -r '{timeseries: [.timeSeries[].points[]]}'`;
my $ref = decode_json($ret);

# Sum the points from every returned time series into one global total per minute.
my %TIMELINE;
foreach my $point (@{$ref->{'timeseries'}}) {
    my ($timestamp) = $point->{'interval'}->{'startTime'} =~ /(\d\d\d\d\-\d\d\-\d\dT\d\d:\d\d)/;
    next unless defined $timestamp;
    $TIMELINE{$timestamp} += $point->{'value'}->{'int64Value'} + 0;
}

# Print the per-minute totals in chronological order.
foreach (sort keys %TIMELINE) {
    print "\t$_ => $TIMELINE{$_}\n";
}
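
The script above collapses every region into a single global total. As a rough sketch of how the per-region breakdown mentioned earlier could be preserved, the variant below groups the per-minute sums by region. It works on the raw API response, before the jq step that keeps only the points, and it assumes the region is exposed as the resource.labels.location label of each time series. The per_region_timeline helper and the sample JSON are hypothetical and invented purely for illustration, and JSON::PP is used only because it ships with Perl (JSON::XS offers the same decode_json call).

```perl
#!/usr/bin/perl

use strict;
use warnings;
use JSON::PP;    # core module; JSON::XS provides the same decode_json interface

# Sum request counts per region and per minute from a raw timeSeries.list response.
# Assumption: each time series carries its region in resource.labels.location.
sub per_region_timeline {
    my ($json) = @_;
    my $ref = decode_json($json);
    my %timeline;
    foreach my $series (@{ $ref->{'timeSeries'} || [] }) {
        my $region = $series->{'resource'}{'labels'}{'location'} // 'unknown';
        foreach my $point (@{ $series->{'points'} || [] }) {
            # Truncate the point's start time to the minute, as in the script above.
            my ($minute) = ($point->{'interval'}{'startTime'} // '') =~ /(\d{4}-\d\d-\d\dT\d\d:\d\d)/;
            next unless defined $minute;
            $timeline{$region}{$minute} += ($point->{'value'}{'int64Value'} // 0) + 0;
        }
    }
    return \%timeline;
}

# Hypothetical sample response, invented for illustration only.
my $sample = <<'JSON';
{"timeSeries":[
 {"resource":{"type":"consumed_api","labels":{"location":"us-central1"}},
  "points":[{"interval":{"startTime":"2024-01-01T00:01:00Z","endTime":"2024-01-01T00:02:00Z"},"value":{"int64Value":"5"}},
            {"interval":{"startTime":"2024-01-01T00:00:00Z","endTime":"2024-01-01T00:01:00Z"},"value":{"int64Value":"3"}}]},
 {"resource":{"type":"consumed_api","labels":{"location":"us-central1"}},
  "points":[{"interval":{"startTime":"2024-01-01T00:01:00Z","endTime":"2024-01-01T00:02:00Z"},"value":{"int64Value":"2"}}]},
 {"resource":{"type":"consumed_api","labels":{"location":"europe-west1"}},
  "points":[{"interval":{"startTime":"2024-01-01T00:01:00Z","endTime":"2024-01-01T00:02:00Z"},"value":{"int64Value":"4"}}]}
]}
JSON

my $timeline = per_region_timeline($sample);
foreach my $region (sort keys %$timeline) {
    print "$region\n";
    foreach my $minute (sort keys %{ $timeline->{$region} }) {
        print "\t$minute => $timeline->{$region}{$minute}\n";
    }
}
```

On the sample data, the two us-central1 series are merged, giving 3 requests at 00:00 and 7 at 00:01, while europe-west1 shows 4 at 00:01. In a live setting the same helper could be fed the output of the curl command above with the final jq filter removed, letting an orchestrator compare load across regions rather than only in aggregate.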