Running The 'Trump Is' Analysis At Scale Using The TV News Ngrams 2.0

This past November we unveiled a fascinating look at how Donald Trump has been framed in the media since he first rode down the escalator of Trump Tower in June 2015. Using the TV API 2.0, we showed how to write a quick script that downloads the matching clips and compiles monthly ngram counts for each station. That approach still required a fair bit of work, so we're excited to show how you can now replicate that exact analysis using the new Television News Ngram 2.0 Dataset!

What if we want to know the top 10 "trump is" statements on CNN so far in 2020? It's just a simple query away!

SELECT NGRAM, COUNT(1) AS CNT
FROM `gdelt-bq.gdeltv2.iatv_4gramsv2`
WHERE DATE(TIMESTAMP) >= "2020-01-01" AND STATION = 'CNN' AND NGRAM LIKE 'trump is %'
GROUP BY NGRAM
ORDER BY CNT DESC
LIMIT 10

This yields the table below:

NGRAM                   CNT
trump is going to       76
trump is trying to      68
trump is the first      29
trump is expected to    29
trump is set to         23
trump is making it      18
trump is scheduled to   17
trump is on the         14
trump is about to       13
trump is not the        11
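For readers who want to sanity-check the logic offline, here is a minimal Python sketch of what the query computes. It assumes each table row counts as one ngram occurrence, which is how COUNT(1) treats it, and it models only the three columns the query touches. The sample rows and the top_ngrams() helper are hypothetical illustrations, not part of the dataset or any GDELT tooling.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical rows standing in for `gdelt-bq.gdeltv2.iatv_4gramsv2`;
# only the TIMESTAMP, STATION and NGRAM columns used by the query are modeled.
ROWS = [
    {"TIMESTAMP": datetime(2020, 1, 14, 20, 5), "STATION": "CNN", "NGRAM": "trump is going to"},
    {"TIMESTAMP": datetime(2020, 2, 3, 18, 30), "STATION": "CNN", "NGRAM": "trump is going to"},
    {"TIMESTAMP": datetime(2020, 2, 3, 18, 31), "STATION": "CNN", "NGRAM": "trump is trying to"},
    {"TIMESTAMP": datetime(2019, 12, 31, 23, 0), "STATION": "CNN", "NGRAM": "trump is going to"},
    {"TIMESTAMP": datetime(2020, 3, 9, 9, 0), "STATION": "MSNBC", "NGRAM": "trump is going to"},
    {"TIMESTAMP": datetime(2020, 3, 9, 9, 1), "STATION": "CNN", "NGRAM": "the president is"},
]


def top_ngrams(rows, station, prefix, since, limit=10):
    """Hypothetical helper mirroring the SQL: filter, count rows per NGRAM, keep the top `limit`."""
    counts = Counter(
        row["NGRAM"]
        for row in rows
        if row["TIMESTAMP"].date() >= since      # DATE(TIMESTAMP) >= "2020-01-01"
        and row["STATION"] == station            # STATION = 'CNN'
        and row["NGRAM"].startswith(prefix)      # NGRAM LIKE 'trump is %'
    )
    return counts.most_common(limit)             # ORDER BY CNT DESC LIMIT 10


print(top_ngrams(ROWS, "CNN", "trump is ", date(2020, 1, 1)))
# [('trump is going to', 2), ('trump is trying to', 1)]
```

One difference worth knowing: Counter.most_common() breaks ties by first appearance, whereas BigQuery makes no promise about the order of tied rows, so entries with equal counts, such as the two 29s in the table above, may come back in either order.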

What if we want to break the results down by month and report just the top 10 ngrams for each month? The SQL below does exactly that, using ARRAY_AGG() with ORDER BY and LIMIT to keep only the ten most frequent ngrams in each monthly group!

SELECT FORMAT_TIMESTAMP("%b %Y", DATEMONTH) AS MONTH,
  ARRAY_AGG(STRUCT(NGRAM, CNT) ORDER BY CNT DESC LIMIT 10) AS TOPCNN
FROM (
  SELECT TIMESTAMP_TRUNC(TIMESTAMP, MONTH) AS DATEMONTH, NGRAM, COUNT(1) AS CNT
  FROM `gdelt-bq.gdeltv2.iatv_4gramsv2`
  WHERE DATE(TIMESTAMP) >= "2020-01-01" AND STATION = 'CNN' AND NGRAM LIKE 'trump is %'
  GROUP BY DATEMONTH, NGRAM
)
GROUP BY DATEMONTH
ORDER BY DATEMONTH DESC
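The same idea extends to the monthly version. The Python sketch below, again over hypothetical rows, buckets each match by the first day of its month (the role TIMESTAMP_TRUNC plays), keeps a separate counter per month, and returns the months newest first, each paired with its top entries, which mirrors the shape that ARRAY_AGG(... ORDER BY CNT DESC LIMIT 10) produces. The top_ngrams_by_month() name is illustrative only.

```python
from collections import Counter, defaultdict
from datetime import date, datetime

# Hypothetical rows; only the columns used by the query are modeled.
ROWS = [
    {"TIMESTAMP": datetime(2020, 1, 14, 20, 5), "STATION": "CNN", "NGRAM": "trump is going to"},
    {"TIMESTAMP": datetime(2020, 1, 20, 7, 45), "STATION": "CNN", "NGRAM": "trump is set to"},
    {"TIMESTAMP": datetime(2020, 1, 21, 8, 0), "STATION": "CNN", "NGRAM": "trump is going to"},
    {"TIMESTAMP": datetime(2020, 2, 3, 18, 30), "STATION": "CNN", "NGRAM": "trump is trying to"},
    {"TIMESTAMP": datetime(2020, 2, 3, 18, 31), "STATION": "MSNBC", "NGRAM": "trump is going to"},
]


def top_ngrams_by_month(rows, station, prefix, since, limit=10):
    """Hypothetical helper: per-month top-`limit` ngrams, newest month first."""
    per_month = defaultdict(Counter)
    for row in rows:
        ts = row["TIMESTAMP"]
        if ts.date() >= since and row["STATION"] == station and row["NGRAM"].startswith(prefix):
            month_start = datetime(ts.year, ts.month, 1)  # TIMESTAMP_TRUNC(TIMESTAMP, MONTH)
            per_month[month_start][row["NGRAM"]] += 1     # COUNT(1) ... GROUP BY DATEMONTH, NGRAM
    return [
        # "%b %Y" matches FORMAT_TIMESTAMP; month abbreviations assume the default C locale.
        (month_start.strftime("%b %Y"), counts.most_common(limit))
        for month_start, counts in sorted(per_month.items(), key=lambda item: item[0], reverse=True)
    ]


for month, top in top_ngrams_by_month(ROWS, "CNN", "trump is ", date(2020, 1, 1)):
    print(month, top)
# Feb 2020 [('trump is trying to', 1)]
# Jan 2020 [('trump is going to', 2), ('trump is set to', 1)]
```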

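If you want to flatten the nested ARRAY_AGG() output into ordinary rows, one row per month and ngram, wrap the monthly query in a subquery and UNNEST() its array. The version below runs the same analysis for MSNBC: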

SELECT MONTH, ENTRY.NGRAM AS NGRAM, ENTRY.CNT AS CNT
FROM (
  SELECT DATEMONTH, FORMAT_TIMESTAMP("%b %Y", DATEMONTH) AS MONTH,
    ARRAY_AGG(STRUCT(NGRAM, CNT) ORDER BY CNT DESC LIMIT 10) AS TOPNGRAMS
  FROM (
    SELECT TIMESTAMP_TRUNC(TIMESTAMP, MONTH) AS DATEMONTH, NGRAM, COUNT(1) AS CNT
    FROM `gdelt-bq.gdeltv2.iatv_4gramsv2`
    WHERE DATE(TIMESTAMP) >= "2020-01-01" AND STATION = 'MSNBC' AND NGRAM LIKE 'trump is %'
    GROUP BY DATEMONTH, NGRAM
  )
  GROUP BY DATEMONTH
), UNNEST(TOPNGRAMS) AS ENTRY
ORDER BY DATEMONTH DESC, CNT DESC

This yields the table below. We're excited to see what you can do with this instant narrative analysis!

MONTH     NGRAM                          CNT
Jun 2020  trump is going to              33
Jun 2020  trump is trying to             16
Jun 2020  trump is expected to           15
Jun 2020  trump is the first             14
Jun 2020  trump is not well              8
Jun 2020  trump is about to              6
Jun 2020  trump is now trying            6
Jun 2020  trump is set to                6
Jun 2020  trump is denied a              6
Jun 2020  trump is in a                  6
May 2020  trump is trying to             21
May 2020  trump is going to              14
May 2020  trump is having a              6
May 2020  trump is not a                 5
May 2020  trump is set to                5
May 2020  trump is now saying            4
May 2020  trump is talking about         4
May 2020  trump is proud of              4
May 2020  trump is do an                 4
May 2020  trump is pushing hard          4
Apr 2020  trump is going to              17
Apr 2020  trump is trying to             15
Apr 2020  trump is rejecting the         4
Apr 2020  trump is treating life-saving  4
Apr 2020  trump is not the               4
Apr 2020  trump is not up                3
Apr 2020  trump is using his             3
Apr 2020  trump is threatening to        3
Apr 2020  trump is eager to              3
Apr 2020  trump is claiming incorrectly  3
Mar 2020  trump is going to              35
Mar 2020  trump is not good              5
Mar 2020  trump is why we                4
Mar 2020  trump is getting rid           4
Mar 2020  trump is why the               4
Mar 2020  trump is not in                4
Mar 2020  trump is why it's              4
Mar 2020  trump is good at               3
Mar 2020  trump is unliked by            3
Mar 2020  trump is why many              3
Feb 2020  trump is going to              45
Feb 2020  trump is trying to             13
Feb 2020  trump is not a                 7
Feb 2020  trump is the most              7
Feb 2020  trump is expected to           6
Feb 2020  trump is to deny               5
Feb 2020  trump is in the                5
Feb 2020  trump is not the               5
Feb 2020  trump is on the                5
Feb 2020  trump is very beatable         5
Jan 2020  trump is going to              33
Jan 2020  trump is set to                19
Jan 2020  trump is making it             10
Jan 2020  trump is removed from          7
Jan 2020  trump is obsessed with         6
Jan 2020  trump is a racist              6
Jan 2020  trump is not a                 6
Jan 2020  trump is trying to             6
Jan 2020  trump is the most              6
Jan 2020  trump is here to               5
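The UNNEST() step is essentially a flatten: the comma join pairs each month's row with every element of its own array. Here is a minimal Python sketch of that reshaping, using the top two June and May 2020 entries from the table above, with months labeled the way FORMAT_TIMESTAMP("%b %Y") formats them. The flatten_months() name is hypothetical.

```python
# Nested shape produced by the ARRAY_AGG() query: (month, [(ngram, count), ...]).
# Values are the top two June and May 2020 rows from the table above.
NESTED = [
    ("Jun 2020", [("trump is going to", 33), ("trump is trying to", 16)]),
    ("May 2020", [("trump is trying to", 21), ("trump is going to", 14)]),
]


def flatten_months(nested):
    """Hypothetical helper: emit one (month, ngram, count) row per array element, like `, UNNEST(...)`."""
    return [
        (month, ngram, count)
        for month, entries in nested      # outer row from the ARRAY_AGG() query
        for ngram, count in entries       # one output row per array element
    ]


for row in flatten_months(NESTED):
    print(*row, sep="\t")
```

As with BigQuery's comma (cross) join against UNNEST(), a month whose array happened to be empty would simply produce no output rows.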