Fact Checking Television: Using Universal Sentence Encoder Embeddings To Scan News For Fact Check Claims At Scale

Earlier today we unveiled the Global Similarity Graph Television News Sentence Embeddings, a massive new dataset of 189 million sentence-level Universal Sentence Encoder embeddings over television news, covering BBC News London (2017-present), CNN, MSNBC, Fox News and the ABC/CBS/NBC evening news broadcasts spanning more than a decade. How might we use this immense dataset to scan television news for known fact check claims?

Today you can use the Television Explorer to keyword search the closed captioning of these stations and selections from more than 150 others for exact keyword matches. For example, to find references to vaccines and microchips together, you can search for "(vaccine OR vaccines) (microchip OR microchips)". However, this will only return captioning clips that contain your exact keywords. A mention of "semiconductor tracking" in vaccines or "chipped vaccines" won't be returned. Moreover, using keyword searches to identify references to fact check claims requires distilling each fact check down to a set of representative keywords that fully encapsulate it, which may be difficult for more complex claims. Put another way, a typical fact check will summarize the claim it is investigating in a sentence or a few sentences of text – to keyword search for this claim requires taking those sentences of text and reducing them to a handful of searchable keywords.

In contrast, our embeddings dataset represents each sentence of closed captioning as a 512-dimension vector that attempts to represent its topical focus. To identify references to a known fact check claim, we can simply convert the entire sentence- or paragraph-long fact check summary verbatim into an embedding and compare its cosine similarity against every one of the sentence embeddings in our dataset to identify potential references to it. A production application would likely use locality-sensitive hashing or other approximate nearest neighbor (ANN) methods to avoid having to perform 189 million brute-force similarity comparisons, but for the purposes of this simple demonstration, we're going to use BigQuery to do a simple brute-force comparison.
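To make the brute-force comparison concrete, here is a minimal NumPy sketch of the idea: normalize the vectors, take dot products and keep the highest scores. The random `corpus` array and the `top_k_cosine` helper are illustrative stand-ins, not part of the dataset or its tooling.

```python
import numpy as np

def top_k_cosine(query, corpus, k=3):
    """Return (indices, scores) of the k corpus rows most similar to query."""
    # After scaling to unit length, a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q                    # one similarity score per corpus row
    order = np.argsort(-sims)[:k]   # highest similarity first
    return order, sims[order]

# Toy stand-ins for the real 512-dimension sentence embeddings.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 512))
query = corpus[42] + 0.1 * rng.normal(size=512)   # a noisy copy of row 42

idx, scores = top_k_cosine(query, corpus)
print(idx, scores)
```

The cost grows linearly with the number of rows, which is why an ANN index becomes attractive at the scale of the full dataset.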

Let's start with this Quick Take summary from a FactCheck.org fact check relating to the false claim that Covid-19 vaccines embed microchips in recipients for tracking:

  • "A video circulating on social media falsely claims that vaccines for COVID-19 have a microchip that “tracks the location of the patient.” The chip, which is not currently in use, would be attached to the end of a plastic vial and provide information only about the vaccine dose. It cannot track people." (FactCheck.org)

How could we search television news for mentions of this claim?

A traditional approach would be to take these three sentences and attempt to distill them down to a set of searchable keywords using statistical information about word usage like TF-IDF distributions to identify "statistically significant phrases." In this case that approach might yield a collection of statistically significant words and phrases like "video, social media, vaccines, COVID-19, microchip, tracks … location, location … patient, chip, plastic vial, vaccine dose, track people." Performing a giant "AND" search for all of these keywords yields no results, so you would have to search for them in combination. The problem is that not all of these keywords/phrases are related to the central claim of tracking. A search for "vaccine + plastic vial" might yield a number of results, but those are not central to the claim of microchip-based tracking. Similarly, a search for "social media + track people" is not related to the core claim. More advanced NLP techniques could help narrow which phrases are most central to the claim, but even those approaches may not be able to completely distill down the claim into a set of searchable keywords.
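As a rough illustration of that keyword-distillation step, the sketch below scores the words of the first claim sentence by TF-IDF against a tiny background corpus. The three background sentences, the tokenizer and the `tfidf_keywords` helper are all made up for this example; a real system would use a far larger reference corpus.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase words, keeping internal hyphens so "COVID-19" stays one token.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def tfidf_keywords(target, background, top_n=12):
    """Rank the target's terms by term frequency times inverse document frequency."""
    docs = [tokenize(d) for d in background] + [tokenize(target)]
    target_tokens = docs[-1]
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    tf = Counter(target_tokens)                               # term frequency
    scores = {
        term: (count / len(target_tokens)) * math.log(len(docs) / df[term])
        for term, count in tf.items()
    }
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Hypothetical background sentences standing in for general news text.
background = [
    "The senate voted on the budget bill that funds the government for a year.",
    "A storm is moving up the coast and forecasters have issued warnings for the weekend.",
    "Shares of the company fell after a report that sales have slowed.",
]
claim = ("A video circulating on social media falsely claims that vaccines for "
         "COVID-19 have a microchip that tracks the location of the patient.")

keywords = tfidf_keywords(claim, background)
print(keywords)
```

Common words such as "the" drop out, but terms like "video", "social" and "media" score just as highly as "microchip", which is exactly the difficulty described above.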

Enter the power of embeddings.

First, we take the three-sentence quick take summary above and convert it as-is into an embedding in the same Universal Sentence Encoder vector space. To do so, create a free new Colab notebook and run the following code:

#install and load libraries...
!pip install tensorflow_text
import tensorflow_hub as hub
import tensorflow as tf
import tensorflow_text as text # Needed for loading universal-sentence-encoder-cmlm/multilingual-preprocess
import numpy as np

#normalize each embedding to unit length...
def normalization(embeds):
    norms = np.linalg.norm(embeds, 2, axis=1, keepdims=True)
    return embeds/norms

#embed the verbatim fact check summary...
sent = tf.constant(["A video circulating on social media falsely claims that vaccines for COVID-19 have a microchip that “tracks the location of the patient.” The chip, which is not currently in use, would be attached to the end of a plastic vial and provide information only about the vaccine dose. It cannot track people."])
embed_use = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
sente = embed_use(sent)
sente = normalization(sente)
print(repr(sente))

This loads the necessary libraries and converts the verbatim three-sentence fact check claim summary as-is into the USE vector representation, printing a single 512-dimension array:

<tf.Tensor: shape=(1, 512), dtype=float32, numpy= array([[-6.76356838e-04, 3.35783325e-02, -6.67930841e-02, -7.14375153e-02, -3.08108889e-02, 1.07741086e-02, 2.11005341e-02, 6.49546608e-02, 4.30178866e-02, -6.82501197e-02, 8.06940496e-02, 4.91752811e-02, 6.29389361e-02, 1.66880507e-02, -5.67101641e-03, -3.70305106e-02, -7.97090977e-02, -7.23296637e-03, -7.27019385e-02, 3.61410975e-02, -1.81389842e-02, 3.56576918e-03, -7.38827288e-02, 4.35053706e-02, -3.35104694e-03, 6.37104064e-02, 1.92584172e-02, -5.36565781e-02, 3.81646678e-02, 3.99802737e-02, -5.32769971e-02, 8.15957189e-02, -2.47750692e-02, 4.34365049e-02, 7.18429685e-02, 7.98831061e-02, -8.14278647e-02, 7.30962753e-02, -3.90970185e-02, -2.42321957e-02, 8.85859481e-04, 1.59769729e-02, 1.73619445e-02, -6.09335937e-02, -5.77191003e-02, -5.14351763e-03, 4.95849364e-02, 5.45341708e-02, 3.67251299e-02, 5.23556210e-03, -7.26118013e-02, -1.82263218e-02, 2.83526741e-02, 7.12847263e-02, -7.51046464e-02, -3.59605625e-02, -4.63198870e-02, -5.76271117e-02, 3.94778773e-02, -7.19692186e-02, -1.35769062e-02, 6.82483837e-02, -3.86933982e-02, -4.68094014e-02, 3.57298478e-02, -6.87625632e-02, 3.24299969e-02, 6.28880039e-02, -7.18246251e-02, 3.15887854e-02, -5.05154385e-05, 4.15558480e-02, -5.05858241e-03, -2.23924946e-02, 6.13835901e-02, -3.54572162e-02, 4.43822816e-02, -4.83701378e-02, -5.33702224e-02, 4.93556038e-02, -8.73594370e-04, -7.18877092e-02, -2.08738167e-02, -2.28899792e-02, 1.76748831e-03, -3.10755782e-02, 1.93924904e-02, 5.02826758e-02, 4.10971418e-03, -5.37503473e-02, 5.57333715e-02, 2.84079444e-02, -1.86564196e-02, 2.70265304e-02, -5.42857102e-04, 8.57702456e-03, 6.09128438e-02, -5.45500219e-02, -4.38415771e-03, -4.50687436e-03, -5.93304411e-02, 5.69677241e-02, -5.57250157e-02, -5.81391938e-02, 6.88386261e-02, 7.11042359e-02, 6.56928271e-02, 1.66122485e-02, 6.30950481e-02, -4.72512022e-02, -6.91004544e-02, 1.08533464e-02, 3.22929490e-03, -9.46690096e-04, 6.15085438e-02, -4.10078615e-02, -1.52225317e-02, 
7.38443527e-03, -2.84125120e-03, 3.94875668e-02, -5.78187183e-02, -3.46644572e-03, 1.43317459e-02, -5.20273345e-03, -4.29149270e-02, -5.53261004e-02, -1.87431239e-02, -4.35798094e-02, -1.23266215e-02, 6.26963973e-02, 6.76322281e-02, 8.16316083e-02, 2.64641959e-02, -3.53941061e-02, -5.05655743e-02, -3.18637267e-02, -1.14524448e-02, 7.41548762e-02, -6.11535534e-02, -6.55739233e-02, 3.84859666e-02, 6.66911826e-02, -7.20713809e-02, 2.92808632e-03, 5.62359951e-02, 9.51226894e-03, 4.56910357e-02, -7.51297697e-02, 6.90657347e-02, 4.21055108e-02, -2.59263385e-02, 9.81741399e-03, 2.94160913e-03, -5.97751215e-02, 5.50655387e-02, 5.55065367e-03, 4.92441691e-02, 5.39373793e-02, 2.63481354e-03, 5.61112165e-02, -1.34488298e-02, 4.05395217e-02, -5.48825506e-03, -6.58439845e-02, -2.58194543e-02, -1.58555657e-02, 2.34762882e-03, -1.25716645e-02, -4.26669009e-02, 1.60640255e-02, 5.15688658e-02, -7.21045434e-02, -1.73300430e-02, -2.93727703e-02, 1.51059516e-02, 3.61871794e-02, -1.12395249e-02, 2.94390563e-02, 2.10165903e-02, -2.14187857e-02, -6.51167110e-02, -1.82202216e-02, 2.65326332e-02, 6.87541533e-03, 7.72116631e-02, -3.04351989e-02, -8.53389502e-03, 5.89442579e-03, 2.55114846e-02, -3.72634716e-02, 1.35741597e-02, -2.87334360e-02, 2.89946962e-02, -1.76235684e-03, 4.42778356e-02, 4.59584370e-02, -2.07500421e-02, -1.66762974e-02, 3.10814641e-02, -4.84270193e-02, 3.69234122e-02, 1.60416390e-03, 1.90665293e-02, -2.38827430e-02, 2.87359916e-02, -7.04021528e-02, 5.11950590e-02, 2.37841941e-02, 2.78002135e-02, 6.73737004e-02, 1.23101231e-02, -2.61966907e-03, 5.78176118e-02, 3.68967131e-02, -3.38435126e-03, 3.38038430e-02, 7.10640773e-02, -4.38991282e-03, -2.46944465e-03, 6.89607188e-02, -2.10304111e-02, 1.74455028e-02, 4.72253896e-02, 7.55667314e-02, 4.17576358e-02, 5.06599247e-02, -4.47778143e-02, -2.82419380e-02, -5.05971462e-02, -2.29050219e-02, -6.34119362e-02, -2.48881299e-02, -2.07891967e-02, -8.11395496e-02, -7.57279713e-03, 1.01827094e-02, 2.36734990e-02, -8.92031100e-03, 
-2.52159638e-03, -4.42645922e-02, 2.72163115e-02, -4.16662544e-02, 6.28380328e-02, -6.46699443e-02, 4.98163030e-02, 2.40474264e-03, -4.35062796e-02, 9.96236573e-04, 3.21699865e-03, -7.41114616e-02, 5.10127423e-03, 5.91132231e-03, -2.09343582e-02, 5.36168702e-02, 6.32562339e-02, -1.18424380e-02, -5.33314049e-02, 8.15369189e-02, -3.56774591e-02, -3.64913158e-02, -2.39417814e-02, -1.68477502e-02, 4.09753472e-02, -5.00184186e-02, -3.02095693e-02, -6.65327683e-02, 6.97435886e-02, 6.97659627e-02, 2.44959928e-02, -7.88502675e-03, -1.70990340e-02, -3.60420384e-02, -1.89642422e-02, -7.21183345e-02, -6.83112964e-02, 5.45631945e-02, 5.56440577e-02, -6.96792156e-02, 5.17817736e-02, -5.04019484e-03, -7.98536614e-02, -6.72034398e-02, 3.57697830e-02, 7.33052269e-02, -6.80490360e-02, -6.17038347e-02, -7.66119808e-02, 5.73239997e-02, -1.82283260e-02, -3.99673171e-02, 5.84224798e-02, -7.66021237e-02, 6.21817261e-02, -2.64632311e-02, -2.70551220e-02, -2.09122039e-02, 4.49912027e-02, 5.27962260e-02, -1.61876865e-02, 2.99768839e-02, -3.36280465e-02, -1.51605671e-03, -2.47574896e-02, 5.60405292e-03, 9.68514197e-03, -1.68982614e-02, -7.41703138e-02, 8.06703139e-03, -1.44717349e-02, -1.71089768e-02, 5.54176830e-02, 5.78010641e-02, -1.78876668e-02, -1.29997097e-02, 5.63468151e-02, 7.27057606e-02, -1.10625252e-02, 7.14442134e-03, 2.05701385e-02, 5.05811423e-02, 8.63553584e-03, 5.28230928e-02, 5.43508772e-03, -4.37706057e-03, -1.69210937e-02, 7.57706687e-02, 1.81356780e-02, 4.76261452e-02, 1.06395511e-02, -7.35997260e-02, 6.64972737e-02, 5.22572510e-02, 5.71820438e-02, 2.35127471e-02, 2.56717186e-02, -3.12523060e-02, 3.06110885e-02, -1.37786288e-03, -2.19957810e-02, -3.60827036e-02, 5.08690346e-03, -1.49257006e-02, 7.51743838e-02, 5.96603751e-03, -2.87195779e-02, -6.46486282e-02, -5.29822260e-02, -1.47496245e-03, -7.67807290e-02, 3.60531062e-02, 6.81242943e-02, 2.16352921e-02, -8.55341647e-03, -2.21430194e-02, -8.83253198e-03, 3.59478197e-03, -7.58751631e-02, 5.91248423e-02, 4.42272760e-02, 
-3.39478180e-02, 4.03287634e-02, -4.57744524e-02, 4.45390902e-02, 6.11837544e-02, -3.16450559e-02, -7.24180341e-02, 2.00887918e-02, -3.19629908e-02, -6.86090300e-03, 3.28799486e-02, -7.06641898e-02, 1.93985011e-02, -3.90757024e-02, -3.66524868e-02, 5.76053411e-02, 6.35175093e-04, 4.37529907e-02, -1.18877320e-02, 6.06463850e-02, -9.88359284e-03, 3.63793671e-02, -7.47962818e-02, -7.39435032e-02, 6.32128567e-02, 6.12870194e-02, -6.58575967e-02, 2.75015971e-03, -4.56172265e-02, 6.30888119e-02, 9.60739609e-03, -5.22800013e-02, -6.43881708e-02, 2.20250804e-02, 1.62106263e-03, -4.56735268e-02, -2.72764824e-02, 1.55690350e-02, -1.04821082e-02, 1.09128319e-02, 4.22615670e-02, 3.59000675e-02, 2.44700797e-02, 1.16510596e-02, -2.50982083e-02, 5.81165403e-02, -2.99764648e-02, 3.09661478e-02, 4.46595661e-02, -5.85869774e-02, -6.54702336e-02, -4.22021002e-02, -1.62350927e-02, -2.27494091e-02, 7.32957125e-02, 7.31576756e-02, -5.63123915e-03, -1.70655418e-02, -1.72184445e-02, 7.96012357e-02, 5.04064001e-02, 1.87336896e-02, 4.66014594e-02, 5.06016947e-02, 2.99742296e-02, 2.36040205e-02, -5.34015521e-02, 1.35052192e-03, 6.80805445e-02, 2.22724825e-02, -3.01939417e-02, -7.31360614e-02, -2.51521859e-02, 4.05842923e-02, 1.60862431e-02, -2.25230474e-02, 2.86010765e-02, 2.29199734e-02, 2.20593587e-02, -5.47554232e-02, 5.78941219e-02, 6.56009391e-02, -7.13857934e-02, -7.48909358e-03, 6.78754002e-02, -2.92496174e-03, -6.95068836e-02, 5.49913049e-02, 3.31136771e-02, -2.13340521e-02, -4.93099019e-02, -1.47924507e-02, -6.91533834e-02, -3.78118120e-02, -6.53230399e-02, -4.87434752e-02, 2.96516586e-02, -7.58804101e-03, -1.24682412e-02, -7.59115368e-02, -3.75635456e-03, 2.93405913e-02, -5.34483194e-02, -1.71132796e-02, 5.20518832e-02, -6.30412847e-02, -5.12841195e-02, -3.42662632e-02, -5.49382344e-02, -6.89640343e-02, 6.04783110e-02, -2.27603670e-02, 6.75819349e-03, 5.91140948e-02, -4.53736931e-02, 3.08122877e-02, -2.23298520e-02, -1.62059460e-02, 4.74171452e-02, -7.03755468e-02, -5.98351210e-02, 
4.70205843e-02, -5.80502860e-03, 2.21272446e-02, -7.57156089e-02, 4.97078523e-02, -3.15653495e-02, 4.92160209e-02, -3.86846699e-02, 2.09846185e-03, -5.62710315e-02, -2.08172686e-02, 7.27586523e-02, 3.23538706e-02, -1.12844165e-02, 4.76871207e-02, 7.68466992e-03, -1.23470398e-02, 2.14785617e-02, 2.89252382e-02, 3.02119087e-02, 2.10444834e-02, 2.13446151e-02, -3.27234976e-02, -6.14904426e-02, -6.49609463e-03, -8.24379921e-02, -2.68000960e-02, 7.73761328e-03, -4.13497761e-02, 1.23230414e-02, -6.00103587e-02, 1.27278063e-02]], dtype=float32)>
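If you would rather not reformat the printed tensor by hand before pasting it into SQL, a small helper can render the values as an array literal. This is only a sketch: `to_bigquery_array` is a hypothetical helper, and the four-value `vec` is a truncated stand-in for the full 512-value vector.

```python
def to_bigquery_array(values, per_line=3):
    """Render floats as a bracketed, comma-separated array literal."""
    parts = [repr(float(v)) for v in values]
    lines = [", ".join(parts[i:i + per_line]) for i in range(0, len(parts), per_line)]
    return "[" + ",\n".join(lines) + "]"

# First four values of the embedding printed above, for illustration only.
vec = [-6.76356838e-04, 3.35783325e-02, -6.67930841e-02, -7.14375153e-02]
literal = to_bigquery_array(vec)
print(literal)
```

In the notebook above, passing something like `np.asarray(sente)[0]` should produce the full literal, assuming `sente` is the normalized tensor from the earlier cell.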

Now we need to compute the cosine similarity of this vector against all of the sentence-level vectors in the Global Similarity Graph Television News Sentence Embeddings dataset to find potential references to it. Since this particular vaccine falsehood began trending later in 2020, we'll limit ourselves to examining broadcasts that aired from November 1, 2020 to present.

We then copy-paste the vector above into a simple BigQuery SQL query that uses a JavaScript UDF to compute cosine similarity:

CREATE TEMPORARY FUNCTION cossim(a ARRAY<FLOAT64>, b ARRAY<FLOAT64>)
RETURNS FLOAT64 LANGUAGE js AS '''
  // dot product of a and b divided by the product of their magnitudes
  var sumt=0, suma=0, sumb=0;
  for(var i=0;i<a.length;i++) {
    sumt += (a[i]*b[i]);
    suma += (a[i]*a[i]);
    sumb += (b[i]*b[i]);
  }
  suma = Math.sqrt(suma);
  sumb = Math.sqrt(sumb);
  return sumt/(suma*sumb);
''';

WITH query AS (
select [-6.76356838e-04, 3.35783325e-02, -6.67930841e-02,
-7.14375153e-02, -3.08108889e-02, 1.07741086e-02,
2.11005341e-02, 6.49546608e-02, 4.30178866e-02,
-6.82501197e-02, 8.06940496e-02, 4.91752811e-02,
6.29389361e-02, 1.66880507e-02, -5.67101641e-03,
-3.70305106e-02, -7.97090977e-02, -7.23296637e-03,
-7.27019385e-02, 3.61410975e-02, -1.81389842e-02,
3.56576918e-03, -7.38827288e-02, 4.35053706e-02,
-3.35104694e-03, 6.37104064e-02, 1.92584172e-02,
-5.36565781e-02, 3.81646678e-02, 3.99802737e-02,
-5.32769971e-02, 8.15957189e-02, -2.47750692e-02,
4.34365049e-02, 7.18429685e-02, 7.98831061e-02,
-8.14278647e-02, 7.30962753e-02, -3.90970185e-02,
-2.42321957e-02, 8.85859481e-04, 1.59769729e-02,
1.73619445e-02, -6.09335937e-02, -5.77191003e-02,
-5.14351763e-03, 4.95849364e-02, 5.45341708e-02,
3.67251299e-02, 5.23556210e-03, -7.26118013e-02,
-1.82263218e-02, 2.83526741e-02, 7.12847263e-02,
-7.51046464e-02, -3.59605625e-02, -4.63198870e-02,
-5.76271117e-02, 3.94778773e-02, -7.19692186e-02,
-1.35769062e-02, 6.82483837e-02, -3.86933982e-02,
-4.68094014e-02, 3.57298478e-02, -6.87625632e-02,
3.24299969e-02, 6.28880039e-02, -7.18246251e-02,
3.15887854e-02, -5.05154385e-05, 4.15558480e-02,
-5.05858241e-03, -2.23924946e-02, 6.13835901e-02,
-3.54572162e-02, 4.43822816e-02, -4.83701378e-02,
-5.33702224e-02, 4.93556038e-02, -8.73594370e-04,
-7.18877092e-02, -2.08738167e-02, -2.28899792e-02,
1.76748831e-03, -3.10755782e-02, 1.93924904e-02,
5.02826758e-02, 4.10971418e-03, -5.37503473e-02,
5.57333715e-02, 2.84079444e-02, -1.86564196e-02,
2.70265304e-02, -5.42857102e-04, 8.57702456e-03,
6.09128438e-02, -5.45500219e-02, -4.38415771e-03,
-4.50687436e-03, -5.93304411e-02, 5.69677241e-02,
-5.57250157e-02, -5.81391938e-02, 6.88386261e-02,
7.11042359e-02, 6.56928271e-02, 1.66122485e-02,
6.30950481e-02, -4.72512022e-02, -6.91004544e-02,
1.08533464e-02, 3.22929490e-03, -9.46690096e-04,
6.15085438e-02, -4.10078615e-02, -1.52225317e-02,
7.38443527e-03, -2.84125120e-03, 3.94875668e-02,
-5.78187183e-02, -3.46644572e-03, 1.43317459e-02,
-5.20273345e-03, -4.29149270e-02, -5.53261004e-02,
-1.87431239e-02, -4.35798094e-02, -1.23266215e-02,
6.26963973e-02, 6.76322281e-02, 8.16316083e-02,
2.64641959e-02, -3.53941061e-02, -5.05655743e-02,
-3.18637267e-02, -1.14524448e-02, 7.41548762e-02,
-6.11535534e-02, -6.55739233e-02, 3.84859666e-02,
6.66911826e-02, -7.20713809e-02, 2.92808632e-03,
5.62359951e-02, 9.51226894e-03, 4.56910357e-02,
-7.51297697e-02, 6.90657347e-02, 4.21055108e-02,
-2.59263385e-02, 9.81741399e-03, 2.94160913e-03,
-5.97751215e-02, 5.50655387e-02, 5.55065367e-03,
4.92441691e-02, 5.39373793e-02, 2.63481354e-03,
5.61112165e-02, -1.34488298e-02, 4.05395217e-02,
-5.48825506e-03, -6.58439845e-02, -2.58194543e-02,
-1.58555657e-02, 2.34762882e-03, -1.25716645e-02,
-4.26669009e-02, 1.60640255e-02, 5.15688658e-02,
-7.21045434e-02, -1.73300430e-02, -2.93727703e-02,
1.51059516e-02, 3.61871794e-02, -1.12395249e-02,
2.94390563e-02, 2.10165903e-02, -2.14187857e-02,
-6.51167110e-02, -1.82202216e-02, 2.65326332e-02,
6.87541533e-03, 7.72116631e-02, -3.04351989e-02,
-8.53389502e-03, 5.89442579e-03, 2.55114846e-02,
-3.72634716e-02, 1.35741597e-02, -2.87334360e-02,
2.89946962e-02, -1.76235684e-03, 4.42778356e-02,
4.59584370e-02, -2.07500421e-02, -1.66762974e-02,
3.10814641e-02, -4.84270193e-02, 3.69234122e-02,
1.60416390e-03, 1.90665293e-02, -2.38827430e-02,
2.87359916e-02, -7.04021528e-02, 5.11950590e-02,
2.37841941e-02, 2.78002135e-02, 6.73737004e-02,
1.23101231e-02, -2.61966907e-03, 5.78176118e-02,
3.68967131e-02, -3.38435126e-03, 3.38038430e-02,
7.10640773e-02, -4.38991282e-03, -2.46944465e-03,
6.89607188e-02, -2.10304111e-02, 1.74455028e-02,
4.72253896e-02, 7.55667314e-02, 4.17576358e-02,
5.06599247e-02, -4.47778143e-02, -2.82419380e-02,
-5.05971462e-02, -2.29050219e-02, -6.34119362e-02,
-2.48881299e-02, -2.07891967e-02, -8.11395496e-02,
-7.57279713e-03, 1.01827094e-02, 2.36734990e-02,
-8.92031100e-03, -2.52159638e-03, -4.42645922e-02,
2.72163115e-02, -4.16662544e-02, 6.28380328e-02,
-6.46699443e-02, 4.98163030e-02, 2.40474264e-03,
-4.35062796e-02, 9.96236573e-04, 3.21699865e-03,
-7.41114616e-02, 5.10127423e-03, 5.91132231e-03,
-2.09343582e-02, 5.36168702e-02, 6.32562339e-02,
-1.18424380e-02, -5.33314049e-02, 8.15369189e-02,
-3.56774591e-02, -3.64913158e-02, -2.39417814e-02,
-1.68477502e-02, 4.09753472e-02, -5.00184186e-02,
-3.02095693e-02, -6.65327683e-02, 6.97435886e-02,
6.97659627e-02, 2.44959928e-02, -7.88502675e-03,
-1.70990340e-02, -3.60420384e-02, -1.89642422e-02,
-7.21183345e-02, -6.83112964e-02, 5.45631945e-02,
5.56440577e-02, -6.96792156e-02, 5.17817736e-02,
-5.04019484e-03, -7.98536614e-02, -6.72034398e-02,
3.57697830e-02, 7.33052269e-02, -6.80490360e-02,
-6.17038347e-02, -7.66119808e-02, 5.73239997e-02,
-1.82283260e-02, -3.99673171e-02, 5.84224798e-02,
-7.66021237e-02, 6.21817261e-02, -2.64632311e-02,
-2.70551220e-02, -2.09122039e-02, 4.49912027e-02,
5.27962260e-02, -1.61876865e-02, 2.99768839e-02,
-3.36280465e-02, -1.51605671e-03, -2.47574896e-02,
5.60405292e-03, 9.68514197e-03, -1.68982614e-02,
-7.41703138e-02, 8.06703139e-03, -1.44717349e-02,
-1.71089768e-02, 5.54176830e-02, 5.78010641e-02,
-1.78876668e-02, -1.29997097e-02, 5.63468151e-02,
7.27057606e-02, -1.10625252e-02, 7.14442134e-03,
2.05701385e-02, 5.05811423e-02, 8.63553584e-03,
5.28230928e-02, 5.43508772e-03, -4.37706057e-03,
-1.69210937e-02, 7.57706687e-02, 1.81356780e-02,
4.76261452e-02, 1.06395511e-02, -7.35997260e-02,
6.64972737e-02, 5.22572510e-02, 5.71820438e-02,
2.35127471e-02, 2.56717186e-02, -3.12523060e-02,
3.06110885e-02, -1.37786288e-03, -2.19957810e-02,
-3.60827036e-02, 5.08690346e-03, -1.49257006e-02,
7.51743838e-02, 5.96603751e-03, -2.87195779e-02,
-6.46486282e-02, -5.29822260e-02, -1.47496245e-03,
-7.67807290e-02, 3.60531062e-02, 6.81242943e-02,
2.16352921e-02, -8.55341647e-03, -2.21430194e-02,
-8.83253198e-03, 3.59478197e-03, -7.58751631e-02,
5.91248423e-02, 4.42272760e-02, -3.39478180e-02,
4.03287634e-02, -4.57744524e-02, 4.45390902e-02,
6.11837544e-02, -3.16450559e-02, -7.24180341e-02,
2.00887918e-02, -3.19629908e-02, -6.86090300e-03,
3.28799486e-02, -7.06641898e-02, 1.93985011e-02,
-3.90757024e-02, -3.66524868e-02, 5.76053411e-02,
6.35175093e-04, 4.37529907e-02, -1.18877320e-02,
6.06463850e-02, -9.88359284e-03, 3.63793671e-02,
-7.47962818e-02, -7.39435032e-02, 6.32128567e-02,
6.12870194e-02, -6.58575967e-02, 2.75015971e-03,
-4.56172265e-02, 6.30888119e-02, 9.60739609e-03,
-5.22800013e-02, -6.43881708e-02, 2.20250804e-02,
1.62106263e-03, -4.56735268e-02, -2.72764824e-02,
1.55690350e-02, -1.04821082e-02, 1.09128319e-02,
4.22615670e-02, 3.59000675e-02, 2.44700797e-02,
1.16510596e-02, -2.50982083e-02, 5.81165403e-02,
-2.99764648e-02, 3.09661478e-02, 4.46595661e-02,
-5.85869774e-02, -6.54702336e-02, -4.22021002e-02,
-1.62350927e-02, -2.27494091e-02, 7.32957125e-02,
7.31576756e-02, -5.63123915e-03, -1.70655418e-02,
-1.72184445e-02, 7.96012357e-02, 5.04064001e-02,
1.87336896e-02, 4.66014594e-02, 5.06016947e-02,
2.99742296e-02, 2.36040205e-02, -5.34015521e-02,
1.35052192e-03, 6.80805445e-02, 2.22724825e-02,
-3.01939417e-02, -7.31360614e-02, -2.51521859e-02,
4.05842923e-02, 1.60862431e-02, -2.25230474e-02,
2.86010765e-02, 2.29199734e-02, 2.20593587e-02,
-5.47554232e-02, 5.78941219e-02, 6.56009391e-02,
-7.13857934e-02, -7.48909358e-03, 6.78754002e-02,
-2.92496174e-03, -6.95068836e-02, 5.49913049e-02,
3.31136771e-02, -2.13340521e-02, -4.93099019e-02,
-1.47924507e-02, -6.91533834e-02, -3.78118120e-02,
-6.53230399e-02, -4.87434752e-02, 2.96516586e-02,
-7.58804101e-03, -1.24682412e-02, -7.59115368e-02,
-3.75635456e-03, 2.93405913e-02, -5.34483194e-02,
-1.71132796e-02, 5.20518832e-02, -6.30412847e-02,
-5.12841195e-02, -3.42662632e-02, -5.49382344e-02,
-6.89640343e-02, 6.04783110e-02, -2.27603670e-02,
6.75819349e-03, 5.91140948e-02, -4.53736931e-02,
3.08122877e-02, -2.23298520e-02, -1.62059460e-02,
4.74171452e-02, -7.03755468e-02, -5.98351210e-02,
4.70205843e-02, -5.80502860e-03, 2.21272446e-02,
-7.57156089e-02, 4.97078523e-02, -3.15653495e-02,
4.92160209e-02, -3.86846699e-02, 2.09846185e-03,
-5.62710315e-02, -2.08172686e-02, 7.27586523e-02,
3.23538706e-02, -1.12844165e-02, 4.76871207e-02,
7.68466992e-03, -1.23470398e-02, 2.14785617e-02,
2.89252382e-02, 3.02119087e-02, 2.10444834e-02,
2.13446151e-02, -3.27234976e-02, -6.14904426e-02,
-6.49609463e-03, -8.24379921e-02, -2.68000960e-02,
7.73761328e-03, -4.13497761e-02, 1.23230414e-02,
-6.00103587e-02, 1.27278063e-02] as sentEmbed
)
SELECT cossim(doc.sentEmbed, query.sentEmbed) AS sim, date, lead, previewUrl
FROM `gdelt-bq.gdeltv2.gsg_iatvsentembed` doc, query
WHERE DATE(date) >= "2020-11-01"
ORDER BY sim DESC
LIMIT 10000

This query compares the fact check claim vector against all of the captioning sentences in our dataset since November 1, 2020, totaling more than 13.1 million sentences. BigQuery's massive scale really shines here as it completes all 13.1 million cosine similarity comparisons, sorts the results and outputs the top 10,000 most similar results in just 28 seconds from start to finish.
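For workloads where even a fast brute-force scan is too costly, the locality hashing idea mentioned earlier can be sketched with random hyperplanes: vectors that point in similar directions tend to fall on the same side of each plane and so share a bucket. Everything here (the random `corpus`, the 8-bit signature, the helper names) is an illustrative assumption rather than how the dataset is indexed.

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 512, 8

# Each random hyperplane contributes one bit of the bucket key.
planes = rng.normal(size=(n_bits, dim))

def signature(vec):
    """Hashable bucket key: which side of each hyperplane the vector falls on."""
    return ((planes @ vec) > 0).tobytes()

# Toy stand-ins for sentence embeddings, indexed once up front.
corpus = rng.normal(size=(2000, dim))
buckets = defaultdict(list)
for i, vec in enumerate(corpus):
    buckets[signature(vec)].append(i)

def candidates_for(query):
    """Row indices sharing the query's bucket; only these need exact scoring."""
    return buckets.get(signature(query), [])

# A near-duplicate usually, but not always, lands in the same bucket as its source.
query = corpus[7] + 0.01 * rng.normal(size=dim)
cands = candidates_for(query)
print(len(cands), "candidates instead of", len(corpus))
```

Real ANN systems use several such hash tables (or other index structures) to reduce the chance of missing a true neighbor, trading some recall for far fewer exact comparisons.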

You can see the top 20 most similar sentences in the table below. These include examples like the following:

The last sentence is particularly interesting in that it would have been missed by a traditional keyword search since it references a "nano chip" rather than a "microchip." A keyword search would not be able to recognize that a "nano chip" is likely the same as a "microchip" in this context, but the embedding model is able to see that the two are likely one and the same in this particular sentence and thus yields an embedding that is highly similar to our fact check claim.

Row | sim | date | lead | previewUrl
1 | 0.6979889356384344 | 2020-12-08 04:55:06 UTC | MANY PEOPLE | https://archive.org/details/FOXNEWSW_20201208_040000_Fox_News_at_Night_With_Shannon_Bream/start/3306
2 | 0.6853320074239302 | 2020-12-08 08:55:22 UTC | MANY PEOPLE | https://archive.org/details/FOXNEWSW_20201208_080000_Fox_News_at_Night_With_Shannon_Bream/start/3322
3 | 0.6629430437153419 | 2020-12-17 21:30:17 UTC | AMONG THE | https://archive.org/details/MSNBCW_20201217_210000_Deadline_White_House/start/1817
4 | 0.6557845795818997 | 2021-03-25 09:48:53 UTC | ANYTHING FROM | https://archive.org/details/CNNW_20210325_090000_Early_Start_With_Christine_Romans_and_Laura_Jarrett/start/2933
5 | 0.6542869766187697 | 2020-12-13 17:11:01 UTC | WE DON'T | https://archive.org/details/FOXNEWSW_20201213_170000_Americas_News_Headquarters/start/661
6 | 0.651444391945469 | 2021-07-13 23:52:32 UTC | WHO STILL | https://archive.org/details/CNNW_20210713_230000_Erin_Burnett_OutFront/start/3152
7 | 0.6475100945917306 | 2020-12-17 21:33:55 UTC | MORE REGULATED | https://archive.org/details/MSNBCW_20201217_210000_Deadline_White_House/start/2035
8 | 0.6444304289007284 | 2020-12-09 20:13:17 UTC | Along with | https://archive.org/details/BBCNEWS_20201209_200000_BBC_News/start/797
9 | 0.6444304289007284 | 2020-12-09 16:52:54 UTC | Along with | https://archive.org/details/BBCNEWS_20201209_140000_BBC_News/start/10374
10 | 0.6418026033632469 | 2021-06-09 20:49:30 UTC | THERE IS | https://archive.org/details/CNNW_20210609_200000_The_Lead_With_Jake_Tapper/start/2970
11 | 0.6393235478780447 | 2021-05-04 19:25:44 UTC | THEY'RE GOING | https://archive.org/details/CNNW_20210504_190000_CNN_Newsroom_With_Alisyn_Camerota_and_Victor_Blackwell/start/1544
12 | 0.6384793753449715 | 2021-07-19 22:02:17 UTC | OVER THE | https://archive.org/details/FOXNEWSW_20210719_220000_Special_Report_With_Bret_Baier/start/137
13 | 0.6333324162072822 | 2020-12-09 17:48:46 UTC | False claims | https://archive.org/details/BBCNEWS_20201209_170000_BBC_News/start/2926
14 | 0.6312473942503065 | 2020-12-17 15:31:31 UTC | IT'S ACTUALLY | https://archive.org/details/CNNW_20201217_150000_CNN_Newsroom_With_Poppy_Harlow_and_Jim_Sciutto/start/1891
15 | 0.6285368401936927 | 2020-12-23 22:48:40 UTC | WE'RE DOING | https://archive.org/details/MSNBCW_20201223_210000_Deadline_White_House/start/6520
16 | 0.6282257384988286 | 2020-12-03 15:14:25 UTC | From March | https://archive.org/details/BBCNEWS_20201203_140000_BBC_News/start/4465
17 | 0.6282257384988286 | 2020-12-03 14:51:40 UTC | From March | https://archive.org/details/BBCNEWS_20201203_140000_BBC_News/start/3100
18 | 0.6224919468230589 | 2021-02-03 13:02:52 UTC | unknown – | https://archive.org/details/BBCNEWS_20210203_130000_BBC_News_at_One/start/172
19 | 0.6220374958075189 | 2020-11-17 16:46:11 UTC | will it | https://archive.org/details/BBCNEWS_20201117_140000_BBC_News/start/9971
20 | 0.6219834389078501 | 2020-12-05 04:02:26 UTC | THEY'VE CLUED | https://archive.org/details/MSNBCW_20201205_040000_The_11th_Hour_With_Brian_Williams/start/146
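Some rows above share an identical similarity score on the same station (rows 8 and 9, and rows 16 and 17), which may indicate the same sentence being rebroadcast. A minimal post-processing sketch, assuming the station name is the first underscore-delimited token of the archive.org identifier and treating a repeated (station, score) pair as a duplicate, might look like this. Both assumptions are inferred from the URLs and scores shown here rather than from any documented format.

```python
from collections import Counter
from urllib.parse import urlparse

# (sim, previewUrl) pairs copied from rows 1, 8, 9 and 3 of the results.
rows = [
    (0.6979889356384344, "https://archive.org/details/FOXNEWSW_20201208_040000_Fox_News_at_Night_With_Shannon_Bream/start/3306"),
    (0.6444304289007284, "https://archive.org/details/BBCNEWS_20201209_200000_BBC_News/start/797"),
    (0.6444304289007284, "https://archive.org/details/BBCNEWS_20201209_140000_BBC_News/start/10374"),
    (0.6629430437153419, "https://archive.org/details/MSNBCW_20201217_210000_Deadline_White_House/start/1817"),
]

def station_of(preview_url):
    # Assumed path layout: /details/<STATION>_<date>_<time>_<show>/start/<seconds>
    identifier = urlparse(preview_url).path.split("/")[2]
    return identifier.split("_")[0]

def dedupe(rows):
    """Keep the first row for each (station, similarity) pair."""
    seen, unique = set(), []
    for sim, url in rows:
        key = (station_of(url), sim)
        if key not in seen:
            seen.add(key)
            unique.append((sim, url))
    return unique

unique = dedupe(rows)
per_station = Counter(station_of(url) for _, url in unique)
print(per_station)
```

Matching on the exact score is a blunt heuristic; comparing the caption text itself would be a more reliable way to detect repeats.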

We're tremendously excited to see what kinds of new applications this enables!