Using Our BigQuery + Bigtable + GCS Digital Twin To Track Historical Backfilling Progress

With our new BigQuery + Bigtable digital twin over our GCS archive, we can trivially compile ongoing inventories of our…

Experiments With CCExtractor Using Our BigQuery + Bigtable + GCS Digital Twin

In December 2020 we unveiled a massive new initiative in collaboration with the Internet Archive's TV News Archive to catalog…

Using Our BigQuery + Bigtable + GCS Digital Twin To Make Date-Based Random Samples For Content Analysis & Testing

A key concept in "content analysis" methodologies over large temporally diverse archives is the notion of time-based random samples: creating…
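
As a minimal sketch of the idea (not the method used in the post, which builds its samples with BigQuery over the digital twin), a date-stratified random sample can be drawn by grouping items by calendar day and sampling a fixed number from each day. The record layout, the function name `sample_per_day` and the example data are hypothetical.

```python
import random
from collections import defaultdict
from datetime import date


def sample_per_day(records, k, seed=0):
    """Draw up to k items at random from each calendar day.

    records: iterable of (item_id, day) pairs, where day is a datetime.date
    (a hypothetical layout chosen for illustration).
    Returns a dict mapping each day (in chronological order) to a sorted list
    of the sampled item_ids.
    """
    by_day = defaultdict(list)
    for item_id, day in records:
        by_day[day].append(item_id)

    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    sample = {}
    for day in sorted(by_day):
        items = by_day[day]
        # Sample without replacement, capped at the number of items that day.
        sample[day] = sorted(rng.sample(items, min(k, len(items))))
    return sample


# Example: 30 hypothetical broadcasts spread over three days, two drawn per day.
broadcasts = [(f"show{i}", date(2020, 1, 1 + i % 3)) for i in range(30)]
print(sample_per_day(broadcasts, k=2))
```

Sampling per day, rather than across the whole archive at once, keeps busy periods from crowding quieter ones out of the sample.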

Using Our BigQuery + Bigtable + GCS Digital Twin To Identify Missing Channels

One of the most powerful aspects of our BigQuery-analyzable, Bigtable-based GCS digital twin is the ability it gives us to…

Using Our BigQuery + Bigtable + GCS Digital Twin To Map The Status & Error Codes Of Analyzing A Quarter-Century Of The TV News Archive

Behind our ability to perform archive-scale analyses over the massive Internet Archive TV News Archive lies a powerful…

Plotting Cumulative Archival Growth Using Our BigQuery + Bigtable + GCS Digital Twin

On Monday, we explored how BigQuery can be combined with Bigtable to create a digital twin over a vast GCS…

Using BigQuery's Bigtable Connector To Analyze A Petabyte GCS Archive Digital Twin

Powering the TV, TV AI and Visual Explorers is a petabyte-scale GCS archive consisting of hundreds of millions of discrete…

Experiments With Generative Coding: Modernizing Legacy BigQuery Code & CodeGen Guardrails

Modern generative coding systems have garnered immense hype, frequently presented as drop-in replacements for human coders. Yet, the majority of…

Tracking Infections, Death & Vaccination Over The Covid-19 Pandemic Using NGrams & BigQuery

How can the Web News NGrams 3.0 dataset be used to extract and track trends in numeric quantities? For example,…

Creating A Daily Global Shortage Timeline Using Web NGrams 3.0 & BigQuery In One SQL Query

Earlier today we showed how to use Web NGrams 3.0 and BigQuery to track mentions of "shortages of" across English…
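
The core step of tracking "shortages of …" is finding what word follows the phrase. The sketch below shows that step over plain text strings, which stand in hypothetically for the Web NGrams 3.0 records the posts actually query in BigQuery; the function name `words_after` is also hypothetical.

```python
import re
from collections import Counter


def words_after(texts, phrase="shortages of"):
    """Count the single word that immediately follows `phrase` in each text.

    Matching is case-insensitive and the following word is lowercased.
    Plain strings are an illustrative stand-in for NGrams records.
    """
    # \b keeps "shortages" from matching inside a longer word.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\s+(\w+)", re.IGNORECASE)
    counts = Counter()
    for text in texts:
        for match in pattern.finditer(text):
            counts[match.group(1).lower()] += 1
    return counts


print(words_after([
    "Shortages of baby formula persist.",
    "New shortages of chips and shortages of Formula.",
]))
```

Grouping these counts by publication date would give the kind of daily shortage timeline the posts describe.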

Using Web NGrams 3.0 & BigQuery To Track "Shortages of …"

We've published a growing collection of tutorials on how to use the Web News NGrams 3.0 dataset for a range…

Commodities & Financial Early Warning Using Web NGrams + GCP Timeseries Insights API + Translate + BigQuery

On Friday, we combined GDELT's Web NGrams 3.0 dataset with GCP's Timeseries Insights API, Translate API and BigQuery to create…

Monkeypox & Disease Early Warning: Planetary-Scale Anomaly Detection With Web NGrams + GCP Timeseries Insights API + Translate + BigQuery

From capturing the first flickers of 2014's Ebola outbreak to powering one of the earliest alerts of the Covid-19 pandemic,…

Timeseries Insights API + BigQuery + Translate + Web NGrams = Monkeypox Early Warning Demo Coming This Week

Stay tuned for a really exciting new demo coming later this week using the GCP Timeseries Insights API, BigQuery, Google…

Performing At-Scale Entity Extraction Over The News Using BigQuery UDFs & Web NGrams 3.0

Earlier this week we showed how to write a simple Perl script to download the latest Web NGrams 3.0 dataset…

Computing Quadgrams At BigQuery Scale Through ML.NGRAMS

Many questions in computational linguistics require the computation of character sequences over vast corpora, requiring strong scalability and robust distributed…
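
To make concrete what a character quadgram is, here is a minimal single-machine sketch of the computation. It is not ML.NGRAMS, which distributes the work inside BigQuery. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=4):
    """Return every overlapping character n-gram of text (quadgrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def quadgram_counts(corpus, n=4):
    """Tally character n-grams across a list of documents."""
    counts = Counter()
    for doc in corpus:
        counts.update(char_ngrams(doc, n))
    return counts


print(char_ngrams("hello"))             # the two quadgrams of a five-letter word
print(quadgram_counts(["abcd", "abcde"]).most_common(2))
```

A document of length L yields L - 3 quadgrams. That is why the total grows quickly over large corpora and why a distributed engine is attractive.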

Experiments With Machine Translation: KWIC Through BigQuery's ML.NGRAMS

As we carefully construct the training and test corpora for our machine translation models, one tool we rely heavily upon…

Experiments With Machine Translation: From RAM Disks To BigQuery

At the core of all machine translation systems lie data. Vast archives of monolingual and bilingual training and testing data…

GSG Embeddings + GKG + BigQuery + Tensorflow Embedding Projector = Visualizing The Covid-19 Vaccine News Landscape

What would it look like to visualize a day of worldwide online news coverage about a given topic, using document-level…

Global Similarity Graph Document Embeddings & BigQuery UDFs: Semantic Multilingual Search Over The News

The new Global Similarity Graph Document Embeddings dataset uses the Universal Sentence Encoder V4 to compute document-level embeddings for each news…
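
Embedding-based search ranks documents by how close their vectors are to a query vector. Cosine similarity is one common measure for sentence-encoder embeddings. The sketch below is an assumption about the general technique, not the exact scoring in the posts' BigQuery UDFs. It uses toy 2-D vectors in place of real Universal Sentence Encoder output, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def more_like_this(query_vec, docs, top_k=3):
    """Return the top_k (doc_id, score) pairs most similar to query_vec.

    docs: list of (doc_id, vector) pairs.
    """
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in docs]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


toy_docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(more_like_this([1.0, 0.0], toy_docs, top_k=2))
```

Because a multilingual encoder maps text in different languages into one shared vector space, the same ranking works for cross-language queries.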

Using Global Similarity Graph Document Embeddings & BigQuery For "More Like This" Search: Cross Language Search

Earlier today we showed how the new Global Similarity Graph Document Embeddings dataset can be used to take an arbitrary…

Using Global Similarity Graph Document Embeddings & BigQuery For "More Like This" Search

Earlier today we announced the new Global Similarity Graph Document Embeddings dataset that uses the Universal Sentence Encoder V4 to…

Using BigQuery's UNNEST To Unroll Count-Based Datasets

Some applications like Google's Timeseries Insights API require that count-based datasets be unrolled since they examine discrete events. For example,…
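
Unrolling turns a row that says "this happened N times" into N separate event rows. The sketch below shows the transformation in Python with a hypothetical (timestamp, label, count) layout. In BigQuery SQL this pattern is commonly written by cross-joining each row with UNNEST(GENERATE_ARRAY(1, count)). Whether the post uses exactly that form is an assumption here.

```python
def unroll(rows):
    """Expand (timestamp, label, count) rows into one (timestamp, label) event per unit of count.

    The row layout is hypothetical; rows with a count of 0 produce no events.
    """
    events = []
    for ts, label, count in rows:
        events.extend([(ts, label)] * count)
    return events


counts = [("2021-06-01", "vaccine", 3), ("2021-06-02", "mask", 0)]
print(unroll(counts))
```

The output grows with the sum of all counts rather than the number of rows, which is worth checking before unrolling a large count-based dataset.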

A Daily Timeline Of Key Vaccine Topics In 2021 Through A TF-IDF BigQuery Analysis Of The Global Relationship Graph

What are the most significant words and phrases associated with vaccines by day thus far this year? To explore this…
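
One way to frame a daily TF-IDF analysis is to treat each day's vaccine-related text as a single "document." Terms that are frequent on one day but rare across other days then score highest. The sketch below assumes that framing and already-tokenized input. It uses the simple variant (term count / day length) × log(days / days containing the term). The post's actual Global Relationship Graph fields, tokenization and weighting are not shown here, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_by_day(tokens_by_day):
    """Score every term on every day with TF-IDF.

    tokens_by_day: dict mapping a day label to that day's list of tokens.
    Returns a dict mapping each day to {term: score}.
    """
    n_days = len(tokens_by_day)
    df = Counter()  # number of days on which each term appears
    for tokens in tokens_by_day.values():
        df.update(set(tokens))

    scores = {}
    for day, tokens in tokens_by_day.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores[day] = {
            term: (count / total) * math.log(n_days / df[term])
            for term, count in tf.items()
        }
    return scores


def top_terms(scores, k=3):
    """Return the k highest-scoring terms for each day."""
    return {day: sorted(terms, key=terms.get, reverse=True)[:k]
            for day, terms in scores.items()}


days = {"d1": ["vaccine", "pfizer", "vaccine"], "d2": ["vaccine", "moderna"]}
print(top_terms(tfidf_by_day(days), k=1))
```

A term that appears every day, such as "vaccine" in this toy input, gets an IDF of log(1) = 0. This is how TF-IDF filters out the topic word itself and surfaces what was distinctive about each day.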