Examining The Onscreen Text Of Television News Advertising

Earlier today we demonstrated how to exclude advertising airtime when examining patterns in the OCR'd onscreen text that appears on television news.

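The same airtime classification can be inverted to look only at the ads by keeping just the airtime whose type is not 'NEWS'. Here are the top 20 words of at least three characters that appeared in the onscreen text of ads running on CNN on May 1, 2021: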

SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [1,1], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='CNN' and 
date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='CNN' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram cnt
the 10268
and 8305
com 7512
for 5925
with 3308
not 2930
cnn 2760
your 2704
800 2502
are 2200
may 2164
new 2096
live 1954
you 1927
all 1904
free 1735
well 1596
can 1431
apply 1390
health 1347

Here are the same results, but looking at two-word phrases:

SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [2,2], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='CNN' and 
date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='CNN' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram cnt
. com 7417
' s 2678
1 – 2011
800 – 1983
-800 1666
com / 1299
 and 1102
2:00 991
www . 864
apply . 826
in the 819
 or 764
of the 753
wellhealthsafety . 713
: 27 709
27 am 703
– safety 701
health – 682
 the 677
as directed 648

The prevalence of ".com" URLs, "1-800" phone numbers and medical references like "use as directed" is clearly visible.
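The spacing in entries like ". com" and "800 –" comes from how the query tokenizes the text: it lowercases the OCR output, pads every punctuation character with spaces, splits on single spaces, and joins adjacent tokens into n-grams. The following is a minimal Python sketch of that logic for experimenting locally, not the BigQuery implementation itself. The function names are hypothetical, and it assumes that splitting on a single space keeps empty tokens and that those empty tokens take part in n-grams, which would explain entries such as " and" that begin with a space.

```python
import unicodedata
from collections import Counter


def pad_punctuation(text):
    """Lowercase text and surround each Unicode punctuation character with spaces,
    approximating REGEXP_REPLACE(LOWER(OCRText), r'(\\pP)', r' \\1 ')."""
    out = []
    for ch in text.lower():
        # Unicode general categories starting with "P" are punctuation (Po, Pd, ...).
        if unicodedata.category(ch).startswith("P"):
            out.append(" " + ch + " ")
        else:
            out.append(ch)
    return "".join(out)


def ngrams(text, n):
    """Return all n-token phrases, joined by a single space."""
    # Splitting on a single space keeps empty tokens wherever two spaces are adjacent
    # (assumed here to match the behavior of SPLIT(..., ' ') in the query).
    tokens = pad_punctuation(text).split(" ")
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n, min_length=3):
    """Count n-grams across texts, keeping those with at least min_length characters
    (the equivalent of length(ngram) > 2 in the query)."""
    counts = Counter()
    for text in texts:
        counts.update(g for g in ngrams(text, n) if len(g) >= min_length)
    return counts


if __name__ == "__main__":
    print(ngrams("Visit CNN.com", 2))  # ['visit cnn', 'cnn .', '. com']
    print(count_ngrams(["Visit CNN.com", "free, and easy"], 2).most_common(5))
```

Because an empty token sits between a padded comma and the following word, "free, and" yields the two-word entry " and", with a leading space and no visible first word.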

Here are the same two-word results for MSNBC, showing a similar density of URLs but no 1-800 numbers. Interestingly, promotions for upcoming MSNBC programming, such as pandemic-related specials and personality-driven shows like The Rachel Maddow Show, dominate this day rather than commercial ads:

SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [2,2], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='MSNBC' and 
date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='MSNBC' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram cnt
live > 6731
' s 4469
. com 4277
s . 2966
. s 2890
u . 2877
coronavirus pandemic 2375
plan your 2181
your vaccine 2026
vaccine 1670
dr . 1247
10:00 1224
make a 1110
a plan 1104
scan now 1092
maddow show 1059
biden ' 1033
up your 1007
plan . 1004
8:00 1004

For Fox News, there is a mixture of .com URLs, 1-800 numbers and promotions for upcoming programming:

SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [2,2], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='FOXNEWS' and 
date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='FOXNEWS' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram cnt
fox news 6195
' s 4432
. com 3873
fox vnews 3176
v fox 2116
tucker carlson 1816
800 – 1769
v news 1710
/ fox 1621
vfox news 1588
news live 1555
et channel 1429
gutfeld ! 1379
live fox 1338
coming up 1266
news 1257
news channel 1221
cavuto live 1164
fox 1155
 fox 1147

We hope this inspires you to think of other creative ways to examine television news advertising!