Using Television News Ngrams To Expand Query Search Terms & Catch Misspellings

This past April, Tara Reade's name surged into the media as she accused presidential candidate Joe Biden of sexual assault. Spot-checking results and testing various misspellings by hand suggested that "Tara Reid" was a common misspelling of her name, leading to a final query of "Tara Reade OR Tara Reid". How might the new Television News Ngram 2.0 Dataset help with systematically identifying misspellings and, more generally, with expanding query terms?

With the bigram version of the dataset in BigQuery, a single SQL query is enough to compile a list of name variants. It counts every bigram beginning with "tara re" that aired between April 24 and May 25, 2020:

SELECT NGRAM, COUNT(1) cnt
FROM `gdelt-bq.gdeltv2.iatv_2gramsv2`
WHERE DATE(TIMESTAMP) >= "2020-04-24"
  AND DATE(TIMESTAMP) <= "2020-05-25"
  AND NGRAM LIKE 'tara re%'
GROUP BY NGRAM
ORDER BY cnt DESC

Yielding the following list:

Ngram Count
tara reade 492
tara reid 140
tara reade's 130
tara reid's 16
tara reed 15
tara read 6
tara reader 4
tara reed's 3
tara reads 2
tara rea 1
tara rereade's 1
tara re 1
tara readest 1
tara reade.ourse 1
tara rei 1
tara readeys 1
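
The list above can be turned into an expanded query with a little post-processing. The following Python sketch is one possible approach, not part of the original workflow: it folds possessive forms ("tara reade's") into their base form, sums the counts, and keeps only variants at or above a cutoff. The function names, the `min_count` threshold of 10, and the title-casing of the output are all assumptions made for illustration.

```python
from collections import Counter

# Counts copied from the result list above.
NGRAM_COUNTS = {
    "tara reade": 492,
    "tara reid": 140,
    "tara reade's": 130,
    "tara reid's": 16,
    "tara reed": 15,
    "tara read": 6,
    "tara reader": 4,
    "tara reed's": 3,
    "tara reads": 2,
    "tara rea": 1,
    "tara rereade's": 1,
    "tara re": 1,
    "tara readest": 1,
    "tara reade.ourse": 1,
    "tara rei": 1,
    "tara readeys": 1,
}


def fold_possessives(counts):
    """Merge "name's" into "name" and sum their counts."""
    folded = Counter()
    for ngram, count in counts.items():
        base = ngram[:-2] if ngram.endswith("'s") else ngram
        folded[base] += count
    return folded


def build_or_query(counts, min_count=10):
    """Join variants seen at least min_count times into an OR query.

    min_count is an arbitrary illustrative cutoff, not a recommended value.
    """
    folded = fold_possessives(counts)
    kept = [ngram for ngram, count in folded.most_common() if count >= min_count]
    # Title-casing is cosmetic, to match the "Tara Reade OR Tara Reid" style.
    return " OR ".join(ngram.title() for ngram in kept)


query = build_or_query(NGRAM_COUNTS)
print(query)
```

With the counts above and the assumed cutoff of 10, this prints "Tara Reade OR Tara Reid OR Tara Reed". Raising the cutoff to 100 reproduces the original hand-built query of "Tara Reade OR Tara Reid". A frequency cutoff cannot tell a misspelling from a different person with a similar name, so the resulting list still needs human review.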