Using The Global Embedded Metadata Graph To Explore Trends In JSON-LD

How can we use the new Global Embedded Metadata Graph to explore some of the trends in metadata usage in a day of online news coverage?

What kinds of JSON-LD blocks appear in news coverage? The following query returns a sample of 100 JSON-LD blocks from pages seen on December 1, 2020:

SELECT date, url, title, lang, rec
FROM `gdelt-bq.gdeltv2.gemg`, UNNEST(jsonld) rec
WHERE jsonld IS NOT NULL AND DATE(date) = '2020-12-01'
LIMIT 100

Here is what the results look like. The query flattens the results, so a page with multiple JSON-LD records appears as multiple rows (rows 2 and 3 below come from the same page):

date url title lang rec
1
2020-12-01 20:18:24 UTC
https://oregional.com.br/cidades/ministerio-da-educacao-lanca-jogo-virtual-para-ajudar-na-alfabetizacao-de-criancas/
Ministério da Educação Lança Jogo Virtual Para Ajudar na Alfabetização de Crianças
PORTUGUESE
{"@context":"https://schema.org","@graph":[{"@type":"Organization","@id":"https://oregional.com.br/#organization","name":"O Regional","url":"https://oregional.com.br/","sameAs":["/jornalcatanduva","/oregionalcatanduva","https://www.youtube.com/channel/UCEz5vtjshPDnUKtZYid0BKA","https://twitter.com/oregionalonline"],"logo":{"@type":"ImageObject","@id":"https://oregional.com.br/#logo","inLanguage":"pt-BR","url":"https://oregional.com.br/wp-content/uploads/2020/06/cropped-INCONECG.png","width":512,"height":512,"caption":"O Regional"},"image":{"@id":"https://oregional.com.br/#logo"}},{"@type":"WebSite","@id":"https://oregional.com.br/#website","url":"https://oregional.com.br/","name":"O REGIONAL","description":"Jornal Regional de Catanduva","publisher":{"@id":"https://oregional.com.br/#organization"},"potentialAction":[{"@type":"SearchAction","target":"https://oregional.com.br/?s={search_term_string}","query-input":"required name=search_term_string"}],"inLanguage":"pt-BR"},{"@type":"ImageObject","@id":"https://or…
2
2020-12-01 20:18:28 UTC
https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html
Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen
GERMAN
{"@context":"http://schema.org","@type":"NewsArticle","@id":"https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html#id","headline":"Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen","description":"Am Cyber Monday 2020 locken Online-Shops wieder mit Angeboten. Mit einfachen Tricks sichern Sie sich die besten Angebote bei Amazon, Saturn und Co.","mainEntityOfPage":"https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html","datePublished":"2020-11-30T20:52:00+01:00","dateModified":"2020-11-30T20:55:22+01:00","author":{"@type":"Person","name":["Karolin Schäfer"]},"publisher":{"@type":"Organization","name":"HNA.de","logo":{"@type":"ImageObject","url":"https://www.hna.de/static/hna-de/img/basis/responsive/logo.png"}},"image":["https://www.hna.de/bilder/2020/11/21/90110120/24374837-amazon-black-friday-2020-tipps-a…
3
2020-12-01 20:18:28 UTC
https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html
Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen
GERMAN
{"@context":"http://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","name":"HNA Startseite","position":1,"item":{"@type":"Thing","@id":"//www.hna.de/"}},{"@type":"ListItem","name":"Verbraucher","position":2,"item":{"@type":"Thing","@id":"//www.hna.de/verbraucher/"}},{"@type":"ListItem","name":"Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen","position":3,"item":{"@type":"Thing","@id":"//www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html"}}]}
4
2020-12-01 20:18:32 UTC
https://www.almasryalyoum.com/news/details/2101448
منسق «حياة كريمة»: إحلال وتجديد 38 منزلًا بقرية بهي الدين في سيوة
ARABIC
{ "@context": "http://schema.org", "@type": "NewsArticle", "headline": "منسق «حياة كريمة»: إحلال وتجديد 38 منزلًا بقرية بهي الدين في سيوة", "mainEntityOfPage": { "@type": "WebPage", "@id": "http://www.almasryalyoum.com/news/details/2101448" }, "datePublished": "12/1/2020 9:59:57 PM", "dateModified": "12/1/2020 9:59:57 PM", "author": { "@type": "Person", "name": "<a href='https://www.almasryalyoum.com/editor/details/1253'>علي الشوكي</a>" }, "publisher": { "@type":"Organization", "name": "المصري اليوم", "logo": { "@type": "ImageObject", "url": "https://www.almasryalyoum.com/content/inc/img/MobLogo.png" } }, "image": { "@type": "ImageObject", "url": "https://mediaaws.almasryalyoum.com/news/verylarge/2020/12/01/1382160_0.jpg", "width": "325", "height": "244" } }
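The flattening performed by UNNEST can be mimicked outside BigQuery. The following is a minimal Python sketch, not part of the original workflow: the `pages` list and its example.com URLs are hypothetical stand-ins for table rows, and `flatten` is an illustrative helper name. It emits one row per JSON-LD block, and a page with no blocks contributes no rows, which is also how the comma join with UNNEST behaves.

```python
import json

# Hypothetical stand-in for three table rows. Each page carries an array of
# JSON-LD strings, mirroring the repeated "jsonld" field queried above.
pages = [
    {"url": "https://example.com/a", "jsonld": [
        '{"@context": "http://schema.org", "@type": "NewsArticle", "headline": "A"}',
        '{"@context": "http://schema.org", "@type": "BreadcrumbList", "itemListElement": []}',
    ]},
    {"url": "https://example.com/b", "jsonld": [
        '{"@context": "http://schema.org", "@type": "NewsArticle", "headline": "B"}',
    ]},
    {"url": "https://example.com/c", "jsonld": []},
]


def flatten(pages):
    """Yield one (url, rec) pair per JSON-LD block, one output row per block."""
    for page in pages:
        # A missing or empty array yields nothing, so the page drops out.
        for rec in page.get("jsonld") or []:
            yield page["url"], rec


rows = list(flatten(pages))
for url, rec in rows:
    print(url, json.loads(rec)["@type"])
```

Page "a" appears twice in the output, once per block, just as the same article URL appears in consecutive rows of the results above.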

Each JSON-LD block is stored as a plain string, so how do we extract useful information from it? BigQuery's JSON_EXTRACT_SCALAR parses each JSON block on the fly and extracts a specific field:

SELECT date, url, title, lang, JSON_EXTRACT_SCALAR(rec, '$.description') description
FROM `gdelt-bq.gdeltv2.gemg`, UNNEST(jsonld) rec
WHERE jsonld IS NOT NULL AND JSON_EXTRACT_SCALAR(rec, '$.description') IS NOT NULL
LIMIT 100

You can see the results below. Note that the query above only extracts a "description" field found at the root level of a block. A "description" field nested deeper in the JSON-LD structure is not extracted; capturing it would require a more complex query that walks the entire structure of each block. Note also that multiple blocks in the same page sometimes repeat the same information, as in the last two rows below.

date url title lang description
1
2020-12-01 20:18:28 UTC
https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html
Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen
GERMAN
Am Cyber Monday 2020 locken Online-Shops wieder mit Angeboten. Mit einfachen Tricks sichern Sie sich die besten Angebote bei Amazon, Saturn und Co.
2
2020-12-01 20:18:44 UTC
https://baomoi.com/bi-mat-moi-ve-quai-vat-ho-loch-ness/c/37205520.epi
Bí mật mới về quái vật hồ Loch Ness
VIETNAMESE
Một nhà trinh thám đã nghỉ hưu người Anh cho biết, ông đã từng hai lần tận mắt nhìn thấy quái vật hồ Loch Ness.
3
2020-12-01 20:18:48 UTC
https://www.ekhokavkaza.com/a/30978754.html
Как не испортить жизнь в аду
RUSSIAN
«Это подлое убийство в очередной раз показало, что терроризм не признает не только законов цивилизованного общества, но и самых основных норм человеческой морали. В современных условиях сложной международной обстановки совершенное преступление подрывает мир и стабильность в регионе и несет…
4
2020-12-01 20:18:48 UTC
https://www.hd.se/2020-12-01/gantz-hotar-med-regeringskris-i-israel
Gantz hotar med regeringskris i Israel
SWEDISH
Försvarsminister Benny Gantz och hans Blåvita alliansen säger sig vara redo att kasta koalitionsregeringen med premiärminister Benjamin Netanyahu och hans Likudparti överbord och rösta för att upplösa Israels parlament knesset. Om hoten i slutänden verk
5
2020-12-01 20:18:52 UTC
https://www.mdzol.com/mdz-femme/2020/12/1/este-es-el-sillon-que-deberias-elegir-para-tu-casa-si-te-gusta-estar-comodo-122751.html
Este es el sillón que deberías elegir para tu casa, si te gusta estar cómodo
SPANISH
Si sos fan de estar en casa y ver series, pero además de eso estar cómodo así debe ser el sillón que tenes que tener.
6
2020-12-01 20:18:52 UTC
https://www.mdzol.com/politica/2020/12/1/cuantos-millones-le-prometio-invertir-ford-alberto-en-argentina-122870.html
¿Cuántos millones le prometió invertir Ford a Alberto en Argentina?
SPANISH
La millonaria inversión estará destinada a la fabricación de la próxima generación de la pick up Ranger en su planta de la localidad bonaerense de General Pacheco.
7
2020-12-01 20:18:56 UTC
https://www.varmatin.com/faits-divers/quatre-victimes-un-bebe-tue-un-conducteur-fou-ce-que-lon-sait-sur-le-drame-survenu-en-allemagne-611265
Quatre victimes, un bébé tué, un "conducteur fou"… Ce que l'on sait sur le drame survenu en Allemagne
FRENCH
Une "scène d'horreur": un "conducteur fou", ivre et souffrant de troubles psychiatriques, a percuté mardi des passants dans une zone piétonne à Trèves, dans le sud-ouest de l'Allemagne, tuant quatre personnes, dont un bébé, avant d'être interpellé.
8
2020-12-01 20:18:56 UTC
https://www.usinenouvelle.com/article/axereal-restructure-son-outil-industriel.N1035564
La coopérative Axéréal se restructure et pourrait supprimer 220 postes
FRENCH
Axéréal, première coopérative céréalière française, envisage de réduire le nombre de ses sites, notamment de ses silos, sur sa vaste zone de collecte. Axéréal veut supprimer 220 postes, un peu plus de 5 % de son effectif sur le territoire.
9
2020-12-01 20:18:56 UTC
https://www.rp.pl/Koronawirus-SARS-CoV-2/201209927-Grodzki-Nie-traktowalbym-slow-premiera-calkiem-powaznie.html
Grodzki: Nie traktowałbym słów premiera całkiem poważnie
POLISH
Opowiadanie o zwycięstwach z pandemią jest irytujące dla lekarzy. Polska to chyba jedyny kraj, który rozgrywa tę epidemię politycznie, a nie zdrowotnie – powiedział w "Faktach po Faktach" w TVN24 marszałek Senatu, prof. Tomasz Grodzki.
12
2020-12-01 20:18:56 UTC
https://abc7.com/politics/barr-no-evidence-of-fraud-thatd-change-election-outcome/8417189/
Attorney General Bill Barr: No evidence of fraud that'd change 2020 presidential election outcome
ENGLISH
Attorney General William Barr said Tuesday the Justice Department has not uncovered evidence of widespread voter fraud that would change the outcome of the 2020 presidential election.
13
2020-12-01 20:18:56 UTC
https://www.mprnews.org/story/2020/11/12/state-regulators-approve-line-3-permits-move-pipeline-closer-to-construction
State regulators approve Line 3 permits; move pipeline closer to construction
ENGLISH
An Enbridge spokesperson said only that the company would begin construction once it has all approvals in hand, but a union official whose members plan to work on the project said they expect construction to begin in the next month.
14
2020-12-01 20:18:56 UTC
https://www.chron.com/lottery/article/Winning-numbers-drawn-in-Numbers-Midday-game-15766817.php
Winning numbers drawn in 'Numbers Midday' game
ENGLISH
ALBANY, N.Y. (AP) _ The winning numbers in Tuesday afternoon's drawing of the New York…
15
2020-12-01 20:19:00 UTC
https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une
Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu
FRENCH
Samedi, les policiers ont interpellé un individu qui venait de rouer de coups sa compagne. Il a été aussi été trouvé en possession d’une arme à feu.
16
2020-12-01 20:19:00 UTC
https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une
Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu
FRENCH
Samedi, les policiers ont interpellé un individu qui venait de rouer de coups sa compagne. Il a été aussi été trouvé en possession d’une arme à feu.
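To illustrate what walking the entire structure of a block involves, here is a minimal Python sketch that runs outside BigQuery. The helper `find_values` is a hypothetical name, and the sample string is trimmed from the first JSON-LD block shown earlier, where "description" sits inside "@graph" rather than at the root. It is only one way to approach the problem, not a description of how the dataset itself is processed.

```python
import json


def find_values(node, key):
    """Recursively collect every value stored under `key` at any depth."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending: the value may itself contain nested matches.
            found.extend(find_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# Trimmed from the first sample block above; "description" is nested in "@graph".
rec = (
    '{"@context":"https://schema.org","@graph":['
    '{"@type":"Organization","name":"O Regional"},'
    '{"@type":"WebSite","name":"O REGIONAL",'
    '"description":"Jornal Regional de Catanduva"}]}'
)

block = json.loads(rec)
root_level = block.get("description")         # what a root-level lookup sees
anywhere = find_values(block, "description")  # what a full walk sees

print(root_level)  # None
print(anywhere)    # ['Jornal Regional de Catanduva']
```

The root-level lookup comes back empty for this block, which is why it never appears in the results above, while the recursive walk recovers the nested description.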

What about compiling the "@type" field of JSON-LD blocks that have a root-level "description" field? Because the field name contains an "@" symbol, it cannot be written in dot notation, so JSON_EXTRACT_SCALAR needs the bracket syntax "$['@type']" instead:

SELECT date, url, title, lang, JSON_EXTRACT_SCALAR(rec, "$['@type']") type
FROM `gdelt-bq.gdeltv2.gemg`, UNNEST(jsonld) rec
WHERE jsonld IS NOT NULL AND JSON_EXTRACT_SCALAR(rec, '$.description') IS NOT NULL
LIMIT 100

You can see that articles are being tagged as "NewsArticle", "Article", "WebPage" and even "Product". Note that the last two rows are the same article, whose page includes separate blocks typed "NewsArticle" and "Product" (the page actually includes considerably more detail across a number of Schema.org categories).

date url title lang type
1
2020-12-01 20:18:28 UTC
https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html
Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen
GERMAN
NewsArticle
2
2020-12-01 20:18:44 UTC
https://baomoi.com/bi-mat-moi-ve-quai-vat-ho-loch-ness/c/37205520.epi
Bí mật mới về quái vật hồ Loch Ness
VIETNAMESE
NewsArticle
3
2020-12-01 20:18:48 UTC
https://www.ekhokavkaza.com/a/30978754.html
Как не испортить жизнь в аду
RUSSIAN
NewsArticle
4
2020-12-01 20:18:48 UTC
https://www.hd.se/2020-12-01/gantz-hotar-med-regeringskris-i-israel
Gantz hotar med regeringskris i Israel
SWEDISH
NewsArticle
5
2020-12-01 20:18:52 UTC
https://www.mdzol.com/mdz-femme/2020/12/1/este-es-el-sillon-que-deberias-elegir-para-tu-casa-si-te-gusta-estar-comodo-122751.html
Este es el sillón que deberías elegir para tu casa, si te gusta estar cómodo
SPANISH
NewsArticle
6
2020-12-01 20:18:52 UTC
https://www.mdzol.com/politica/2020/12/1/cuantos-millones-le-prometio-invertir-ford-alberto-en-argentina-122870.html
¿Cuántos millones le prometió invertir Ford a Alberto en Argentina?
SPANISH
NewsArticle
7
2020-12-01 20:18:56 UTC
https://www.varmatin.com/faits-divers/quatre-victimes-un-bebe-tue-un-conducteur-fou-ce-que-lon-sait-sur-le-drame-survenu-en-allemagne-611265
Quatre victimes, un bébé tué, un "conducteur fou"… Ce que l'on sait sur le drame survenu en Allemagne
FRENCH
Article
8
2020-12-01 20:18:56 UTC
https://www.usinenouvelle.com/article/axereal-restructure-son-outil-industriel.N1035564
La coopérative Axéréal se restructure et pourrait supprimer 220 postes
FRENCH
NewsArticle
9
2020-12-01 20:18:56 UTC
https://www.rp.pl/Koronawirus-SARS-CoV-2/201209927-Grodzki-Nie-traktowalbym-slow-premiera-calkiem-powaznie.html
Grodzki: Nie traktowałbym słów premiera całkiem poważnie
POLISH
WebPage
10
2020-12-01 20:18:56 UTC
https://www.ammoland.com/2020/12/daily-deal-magpump-223-5-56-ar-15-magazine-loader-99-99-w-free-shipping/
Daily Deal: MagPump .223/5.56 AR-15 Magazine Loader $96.00 w/ Free Shipping
ENGLISH
Article
11
2020-12-01 20:18:56 UTC
https://www.ammoland.com/2020/12/daily-deal-magpump-223-5-56-ar-15-magazine-loader-99-99-w-free-shipping/
Daily Deal: MagPump .223/5.56 AR-15 Magazine Loader $96.00 w/ Free Shipping
ENGLISH
NewsArticle
12
2020-12-01 20:18:56 UTC
https://abc7.com/politics/barr-no-evidence-of-fraud-thatd-change-election-outcome/8417189/
Attorney General Bill Barr: No evidence of fraud that'd change 2020 presidential election outcome
ENGLISH
NewsArticle
13
2020-12-01 20:18:56 UTC
https://www.mprnews.org/story/2020/11/12/state-regulators-approve-line-3-permits-move-pipeline-closer-to-construction
State regulators approve Line 3 permits; move pipeline closer to construction
ENGLISH
NewsArticle
14
2020-12-01 20:18:56 UTC
https://www.chron.com/lottery/article/Winning-numbers-drawn-in-Numbers-Midday-game-15766817.php
Winning numbers drawn in 'Numbers Midday' game
ENGLISH
NewsArticle
15
2020-12-01 20:19:00 UTC
https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une
Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu
FRENCH
NewsArticle
16
2020-12-01 20:19:00 UTC
https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une
Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu
FRENCH
Product
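Once the blocks are in hand, the types can be tallied to see which ones dominate. The following Python sketch assumes the blocks have already been exported as strings; the `records` list is made-up sample data and `tally_types` is a hypothetical helper name. One difference from the query above is worth flagging: JSON-LD allows "@type" to be an array, and JSON_EXTRACT_SCALAR returns NULL for non-scalar values, so the query would show no type for such a block, whereas this sketch counts each listed type.

```python
import json
from collections import Counter

# Made-up sample blocks, standing in for exported "rec" strings.
records = [
    '{"@type": "NewsArticle", "description": "a"}',
    '{"@type": "NewsArticle", "description": "b"}',
    '{"@type": "Product", "description": "c"}',
    '{"@type": ["Article", "WebPage"], "description": "d"}',
    '{"@type": "BreadcrumbList"}',  # no root-level description: skipped
    'not valid json',               # malformed block: skipped
]


def tally_types(records):
    """Count root-level "@type" values of blocks with a root-level description."""
    counts = Counter()
    for rec in records:
        try:
            block = json.loads(rec)
        except json.JSONDecodeError:
            continue
        if not isinstance(block, dict) or block.get("description") is None:
            continue
        types = block.get("@type")
        if types is None:
            continue
        # "@type" may be a single string or an array of strings.
        if not isinstance(types, list):
            types = [types]
        counts.update(t for t in types if isinstance(t, str))
    return counts


counts = tally_types(records)
for name, n in counts.most_common():
    print(name, n)
```

On the sample data, "NewsArticle" comes out on top with two blocks, which echoes its prevalence in the results above.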

We're tremendously excited to see what you're able to do with this new dataset!