Earlier today we demonstrated how to exclude advertising airtime when examining patterns in the OCR'd onscreen text that appears on television news.
Flipping that filter to keep only advertising airtime, here are the top 20 words of three or more characters that appeared in the onscreen text of ads running on CNN on May 1, 2021:
SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [1,1], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='CNN' and date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='CNN' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram | cnt |
the | 10268 |
and | 8305 |
com | 7512 |
for | 5925 |
with | 3308 |
not | 2930 |
cnn | 2760 |
your | 2704 |
800 | 2502 |
are | 2200 |
may | 2164 |
new | 2096 |
live | 1954 |
you | 1927 |
all | 1904 |
free | 1735 |
well | 1596 |
can | 1431 |
apply | 1390 |
health | 1347 |
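For readers who want to see the ad filter in isolation, here is a minimal Python sketch of what the `date in (SELECT ...)` clause in the query above appears to do: collect the timestamps that the airtime table labels as anything other than NEWS for a station, then keep only the OCR rows for that station at those timestamps. The row dictionaries, the sample timestamps and the "AD" type value are hypothetical stand-ins for illustration; only the `date`, `station`, `type` and `OCRText` field names come from the query itself.

```python
def ad_timestamps(airtime_rows, station):
    """Timestamps for `station` whose airtime type is anything except NEWS."""
    return {
        row["date"]
        for row in airtime_rows
        if row["station"] == station and row["type"] != "NEWS"
    }


def ad_ocr_text(ocr_rows, airtime_rows, station):
    """OCR text for `station`, restricted to that station's non-news timestamps."""
    keep = ad_timestamps(airtime_rows, station)
    return [
        row["OCRText"]
        for row in ocr_rows
        if row["station"] == station and row["date"] in keep
    ]


# Hypothetical sample rows; "AD" is an assumed label, not a documented value.
airtime = [
    {"date": "2021-05-01 10:00:00", "station": "CNN", "type": "NEWS"},
    {"date": "2021-05-01 10:00:01", "station": "CNN", "type": "AD"},
    {"date": "2021-05-01 10:00:01", "station": "MSNBC", "type": "NEWS"},
]
ocr = [
    {"date": "2021-05-01 10:00:00", "station": "CNN", "OCRText": "BREAKING NEWS"},
    {"date": "2021-05-01 10:00:01", "station": "CNN", "OCRText": "Call 1-800 today"},
    {"date": "2021-05-01 10:00:01", "station": "MSNBC", "OCRText": "LIVE > Headlines"},
]

cnn_ads = ad_ocr_text(ocr, airtime, "CNN")
msnbc_ads = ad_ocr_text(ocr, airtime, "MSNBC")
print(cnn_ads)
print(msnbc_ads)
```

In this sketch the station is matched in both lookups, so one channel's ad breaks are never applied to another channel's airtime.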
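Here are the same results, but looking at two-word phrases. Because the query pads every punctuation mark with spaces before splitting, punctuation marks count as words of their own, which is why entries like ". com" and "' s" appear: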
SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [2,2], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='CNN' and date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='CNN' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram | cnt |
. com | 7417 |
' s | 2678 |
1 – | 2011 |
800 – | 1983 |
-800 | 1666 |
com / | 1299 |
and | 1102 |
2:00 | 991 |
www . | 864 |
apply . | 826 |
in the | 819 |
or | 764 |
of the | 753 |
wellhealthsafety . | 713 |
: 27 | 709 |
27 am | 703 |
– safety | 701 |
health – | 682 |
the | 677 |
as directed | 648 |
The prevalence of ".com" URLs, "1-800" phone numbers and medical references like "use as directed" is clearly visible.
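To make it easier to see where entries like ". com", "' s" and the apparently single-word rows in the two-word table come from, here is a rough Python approximation of the query's tokenization. It is a sketch under assumptions rather than a reimplementation of BigQuery: `unicodedata` punctuation categories stand in for the `\pP` character class, the `ngrams` helper stands in for `ML.NGRAMS`, and the sample strings are invented.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Lowercase, pad each punctuation character with spaces, split on single spaces."""
    padded = "".join(
        f" {ch} " if unicodedata.category(ch).startswith("P") else ch
        for ch in text.lower()
    )
    # Splitting on a single space keeps empty strings wherever two spaces meet,
    # which is assumed to mirror SPLIT(..., ' ') in the query.
    return padded.split(" ")


def ngrams(tokens, n):
    """Join every run of n consecutive tokens with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(texts, n):
    """Count n-grams longer than two characters, like length(ngram) > 2."""
    counts = Counter()
    for text in texts:
        for gram in ngrams(tokenize(text), n):
            if len(gram) > 2:
                counts[gram] += 1
    return counts


print(tokenize("Visit WellHealthSafety.com"))
# ['visit', 'wellhealthsafety', '.', 'com']

print(ngrams(tokenize("Visit WellHealthSafety.com"), 2))
# ['visit wellhealthsafety', 'wellhealthsafety .', '. com']

# A comma followed by a space leaves an empty token behind, so one of the
# resulting "two-word" phrases is really a single word plus an empty string.
print(tokenize("fox, news"))
# ['fox', ',', '', 'news']
```

If the real functions behave the same way, a word that sits next to an empty token would show up in the two-word table as what looks like a single word, and the same word could appear twice depending on which side the empty token falls.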
Here are the same two-word results for MSNBC, which show a similar density of URLs but no 1-800 numbers. Interestingly, promotions for upcoming MSNBC programming, such as pandemic-related specials and personality-driven shows like The Rachel Maddow Show, dominate this day rather than commercial ads:
SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [2,2], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='MSNBC' and date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='MSNBC' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram | cnt |
live > | 6731 |
' s | 4469 |
. com | 4277 |
s . | 2966 |
. s | 2890 |
u . | 2877 |
coronavirus pandemic | 2375 |
plan your | 2181 |
your vaccine | 2026 |
vaccine | 1670 |
dr . | 1247 |
10:00 | 1224 |
make a | 1110 |
a plan | 1104 |
scan now | 1092 |
maddow show | 1059 |
biden ' | 1033 |
up your | 1007 |
plan . | 1004 |
8:00 | 1004 |
For Fox News there is a mixture of .com URLs, 1-800 numbers and advertising for upcoming programming:
SELECT ngram, count(1) cnt FROM `gdelt-bq.gdeltv2.vgegv2_iatv`, UNNEST(ML.NGRAMS(SPLIT(REGEXP_REPLACE(LOWER(OCRText), r'(\pP)', r' \1 '), ' '), [2,2], ' ')) as ngram WHERE length(ngram) > 2 and DATE(date) = "2021-05-01" and station='FOXNEWS' and date in (SELECT date FROM `gdelt-bq.gdeltv2.iatv_aif_vidtime` WHERE DATE(date) = "2021-05-01" and station='FOXNEWS' and type!='NEWS') group by ngram order by cnt desc limit 1000
ngram | cnt |
fox news | 6195 |
' s | 4432 |
. com | 3873 |
fox vnews | 3176 |
v fox | 2116 |
tucker carlson | 1816 |
800 – | 1769 |
v news | 1710 |
/ fox | 1621 |
vfox news | 1588 |
news live | 1555 |
et channel | 1429 |
gutfeld ! | 1379 |
live fox | 1338 |
coming up | 1266 |
news | 1257 |
news channel | 1221 |
cavuto live | 1164 |
fox | 1155 |
fox | 1147 |
We hope this inspires you to think of other creative ways to examine television news advertising!