How can we use the new Global Embedded Metadata Graph to explore some of the trends in metadata usage in a day of online news coverage?
What are the kinds of JSON-LD blocks we see in news coverage? Here is a query that returns a selection of JSON-LD:
SELECT date, url, title, lang, rec
FROM `gdelt-bq.gdeltv2.gemg`, UNNEST(jsonld) rec
WHERE jsonld IS NOT NULL AND DATE(date) = '2020-12-01'
LIMIT 100
And here is what the results look like. The query above flattens the results, so a page with multiple JSON-LD records appears as multiple rows:
Row | date | url | title | lang | rec
---|---|---|---|---|---
1 | 2020-12-01 20:18:24 UTC | https://oregional.com.br/cidades/ministerio-da-educacao-lanca-jogo-virtual-para-ajudar-na-alfabetizacao-de-criancas/ | Ministério da Educação Lança Jogo Virtual Para Ajudar na Alfabetização de Crianças | PORTUGUESE | {"@context":"https://schema.org","@graph":[{"@type":"Organization","@id":"https://oregional.com.br/#organization","name":"O Regional","url":"https://oregional.com.br/","sameAs":["/jornalcatanduva","/oregionalcatanduva","https://www.youtube.com/channel/UCEz5vtjshPDnUKtZYid0BKA","https://twitter.com/oregionalonline"],"logo":{"@type":"ImageObject","@id":"https://oregional.com.br/#logo","inLanguage":"pt-BR","url":"https://oregional.com.br/wp-content/uploads/2020/06/cropped-INCONECG.png","width":512,"height":512,"caption":"O Regional"},"image":{"@id":"https://oregional.com.br/#logo"}},{"@type":"WebSite","@id":"https://oregional.com.br/#website","url":"https://oregional.com.br/","name":"O REGIONAL","description":"Jornal Regional de Catanduva","publisher":{"@id":"https://oregional.com.br/#organization"},"potentialAction":[{"@type":"SearchAction","target":"https://oregional.com.br/?s={search_term_string}","query-input":"required name=search_term_string"}],"inLanguage":"pt-BR"},{"@type":"ImageObject","@id":"https://or…
2 | 2020-12-01 20:18:28 UTC | https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html | Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen | GERMAN | {"@context":"http://schema.org","@type":"NewsArticle","@id":"https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html#id","headline":"Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen","description":"Am Cyber Monday 2020 locken Online-Shops wieder mit Angeboten. Mit einfachen Tricks sichern Sie sich die besten Angebote bei Amazon, Saturn und Co.","mainEntityOfPage":"https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html","datePublished":"2020-11-30T20:52:00+01:00","dateModified":"2020-11-30T20:55:22+01:00","author":{"@type":"Person","name":["Karolin Schäfer"]},"publisher":{"@type":"Organization","name":"HNA.de","logo":{"@type":"ImageObject","url":"https://www.hna.de/static/hna-de/img/basis/responsive/logo.png"}},"image":["https://www.hna.de/bilder/2020/11/21/90110120/24374837-amazon-black-friday-2020-tipps-a…
3 | 2020-12-01 20:18:28 UTC | https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html | Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen | GERMAN | {"@context":"http://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","name":"HNA Startseite","position":1,"item":{"@type":"Thing","@id":"//www.hna.de/"}},{"@type":"ListItem","name":"Verbraucher","position":2,"item":{"@type":"Thing","@id":"//www.hna.de/verbraucher/"}},{"@type":"ListItem","name":"Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen","position":3,"item":{"@type":"Thing","@id":"//www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html"}}]}
4 | 2020-12-01 20:18:32 UTC | https://www.almasryalyoum.com/news/details/2101448 | منسق «حياة كريمة»: إحلال وتجديد 38 منزلًا بقرية بهي الدين في سيوة | ARABIC | { "@context": "http://schema.org", "@type": "NewsArticle", "headline": "منسق «حياة كريمة»: إحلال وتجديد 38 منزلًا بقرية بهي الدين في سيوة", "mainEntityOfPage": { "@type": "WebPage", "@id": "http://www.almasryalyoum.com/news/details/2101448" }, "datePublished": "12/1/2020 9:59:57 PM", "dateModified": "12/1/2020 9:59:57 PM", "author": { "@type": "Person", "name": "<a href='https://www.almasryalyoum.com/editor/details/1253'>علي الشوكي</a>" }, "publisher": { "@type":"Organization", "name": "المصري اليوم", "logo": { "@type": "ImageObject", "url": "https://www.almasryalyoum.com/content/inc/img/MobLogo.png" } }, "image": { "@type": "ImageObject", "url": "https://mediaaws.almasryalyoum.com/news/verylarge/2020/12/01/1382160_0.jpg", "width": "325", "height": "244" } }
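The flattening described above can be mimicked outside BigQuery. Here is a minimal Python sketch, assuming the rows have been exported as dictionaries whose `jsonld` entry is a list of JSON-LD strings. The sample pages, the example.com URLs and the `flatten` helper are hypothetical and only illustrate the one-row-per-block behavior:

```python
import json

# Hypothetical sample shaped like exported rows: each page carries a list
# of JSON-LD strings (these URLs and blocks are made up for illustration).
pages = [
    {
        "url": "https://example.com/a",
        "jsonld": [
            '{"@type": "NewsArticle", "headline": "A"}',
            '{"@type": "BreadcrumbList", "itemListElement": []}',
        ],
    },
    {"url": "https://example.com/b", "jsonld": []},
]


def flatten(pages):
    """Yield one row per JSON-LD block, repeating the page-level fields."""
    for page in pages:
        # Pages with a missing or empty list contribute no rows at all.
        for rec in page.get("jsonld") or []:
            yield {"url": page["url"], "rec": rec}


rows = list(flatten(pages))
for row in rows:
    print(row["url"], json.loads(row["rec"])["@type"])
```

The page with two blocks yields two rows that share the same URL, which is why the same article can appear on consecutive rows of the results.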
Of course, the JSON-LD blocks are simple scalar strings, so how do we extract useful information out of them? Using BigQuery's JSON_EXTRACT_SCALAR, we can parse each JSON block on-the-fly to extract a specific field:
SELECT date, url, title, lang, JSON_EXTRACT_SCALAR(rec, '$.description') description
FROM `gdelt-bq.gdeltv2.gemg`, UNNEST(jsonld) rec
WHERE jsonld IS NOT NULL AND JSON_EXTRACT_SCALAR(rec, '$.description') IS NOT NULL
LIMIT 100
You can see the results below. Note that the query above requires the "description" field to be at the root level. A "description" field nested deeper in the JSON-LD block won't be extracted by this query; that would require a more complex query that walks the entire structure of each block. Note also that, as the results below show, multiple blocks in the same page sometimes repeat the same information.
Row | date | url | title | lang | description
---|---|---|---|---|---
1 | 2020-12-01 20:18:28 UTC | https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html | Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen | GERMAN | Am Cyber Monday 2020 locken Online-Shops wieder mit Angeboten. Mit einfachen Tricks sichern Sie sich die besten Angebote bei Amazon, Saturn und Co.
2 | 2020-12-01 20:18:44 UTC | https://baomoi.com/bi-mat-moi-ve-quai-vat-ho-loch-ness/c/37205520.epi | Bí mật mới về quái vật hồ Loch Ness | VIETNAMESE | Một nhà trinh thám đã nghỉ hưu người Anh cho biết, ông đã từng hai lần tận mắt nhìn thấy quái vật hồ Loch Ness.
3 | 2020-12-01 20:18:48 UTC | https://www.ekhokavkaza.com/a/30978754.html | Как не испортить жизнь в аду | RUSSIAN | «Это подлое убийство в очередной раз показало, что терроризм не признает не только законов цивилизованного общества, но и самых основных норм человеческой морали. В современных условиях сложной международной обстановки совершенное преступление подрывает мир и стабильность в регионе и несет…
4 | 2020-12-01 20:18:48 UTC | https://www.hd.se/2020-12-01/gantz-hotar-med-regeringskris-i-israel | Gantz hotar med regeringskris i Israel | SWEDISH | Försvarsminister Benny Gantz och hans Blåvita alliansen säger sig vara redo att kasta koalitionsregeringen med premiärminister Benjamin Netanyahu och hans Likudparti överbord och rösta för att upplösa Israels parlament knesset. Om hoten i slutänden verk
5 | 2020-12-01 20:18:52 UTC | https://www.mdzol.com/mdz-femme/2020/12/1/este-es-el-sillon-que-deberias-elegir-para-tu-casa-si-te-gusta-estar-comodo-122751.html | Este es el sillón que deberías elegir para tu casa, si te gusta estar cómodo | SPANISH | Si sos fan de estar en casa y ver series, pero además de eso estar cómodo así debe ser el sillón que tenes que tener.
6 | 2020-12-01 20:18:52 UTC | https://www.mdzol.com/politica/2020/12/1/cuantos-millones-le-prometio-invertir-ford-alberto-en-argentina-122870.html | ¿Cuántos millones le prometió invertir Ford a Alberto en Argentina? | SPANISH | La millonaria inversión estará destinada a la fabricación de la próxima generación de la pick up Ranger en su planta de la localidad bonaerense de General Pacheco.
7 | 2020-12-01 20:18:56 UTC | https://www.varmatin.com/faits-divers/quatre-victimes-un-bebe-tue-un-conducteur-fou-ce-que-lon-sait-sur-le-drame-survenu-en-allemagne-611265 | Quatre victimes, un bébé tué, un "conducteur fou"… Ce que l'on sait sur le drame survenu en Allemagne | FRENCH | Une "scène d'horreur": un "conducteur fou", ivre et souffrant de troubles psychiatriques, a percuté mardi des passants dans une zone piétonne à Trèves, dans le sud-ouest de l'Allemagne, tuant quatre personnes, dont un bébé, avant d'être interpellé.
8 | 2020-12-01 20:18:56 UTC | https://www.usinenouvelle.com/article/axereal-restructure-son-outil-industriel.N1035564 | La coopérative Axéréal se restructure et pourrait supprimer 220 postes | FRENCH | Ax&eacute;r&eacute;al, premi&egrave;re coop&eacute;rative c&eacute;r&eacute;ali&egrave;re fran&ccedil;aise, envisage de r&eacute;duire le nombre de ses sites, notamment de ses silos, sur sa vaste zone de collecte. Ax&eacute;r&eacute;al veut supprimer 220 postes, un peu plus de 5&nbsp;% de son effectif sur le territoire.
9 | 2020-12-01 20:18:56 UTC | https://www.rp.pl/Koronawirus-SARS-CoV-2/201209927-Grodzki-Nie-traktowalbym-slow-premiera-calkiem-powaznie.html | Grodzki: Nie traktowałbym słów premiera całkiem poważnie | POLISH | Opowiadanie o zwycięstwach z pandemią jest irytujące dla lekarzy. Polska to chyba jedyny kraj, który rozgrywa tę epidemię politycznie, a nie zdrowotnie – powiedział w "Faktach po Faktach" w TVN24 marszałek Senatu, prof. Tomasz Grodzki.
12 | 2020-12-01 20:18:56 UTC | https://abc7.com/politics/barr-no-evidence-of-fraud-thatd-change-election-outcome/8417189/ | Attorney General Bill Barr: No evidence of fraud that'd change 2020 presidential election outcome | ENGLISH | Attorney General William Barr said Tuesday the Justice Department has not uncovered evidence of widespread voter fraud that would change the outcome of the 2020 presidential election.
13 | 2020-12-01 20:18:56 UTC | https://www.mprnews.org/story/2020/11/12/state-regulators-approve-line-3-permits-move-pipeline-closer-to-construction | State regulators approve Line 3 permits; move pipeline closer to construction | ENGLISH | An Enbridge spokesperson said only that the company would begin construction once it has all approvals in hand, but a union official whose members plan to work on the project said they expect construction to begin in the next month.
14 | 2020-12-01 20:18:56 UTC | https://www.chron.com/lottery/article/Winning-numbers-drawn-in-Numbers-Midday-game-15766817.php | Winning numbers drawn in 'Numbers Midday' game | ENGLISH | ALBANY, N.Y. (AP) _ The winning numbers in Tuesday afternoon's drawing of the New York…
15 | 2020-12-01 20:19:00 UTC | https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une | Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu | FRENCH | Samedi, les policiers ont interpellé un individu qui venait de rouer de coups sa compagne. Il a été aussi été trouvé en possession d’une arme à feu.
16 | 2020-12-01 20:19:00 UTC | https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une | Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu | FRENCH | Samedi, les policiers ont interpellé un individu qui venait de rouer de coups sa compagne. Il a été aussi été trouvé en possession d’une arme à feu.
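The query above only sees a root-level "description". One way to reach nested ones, and to collapse the repeats noted earlier, is to walk each block after export rather than in SQL. This is a hedged Python sketch, not the approach used to produce the table above. The `find_values` and `descriptions` helpers are hypothetical names, and the sample block is an abbreviated, partly made-up block modeled on the "@graph" structure in the first results table:

```python
import json


def find_values(node, key):
    """Recursively collect every value stored under `key` at any depth."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: nested objects may hold the key as well.
            found.extend(find_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


def descriptions(rec):
    """Parse one JSON-LD string and return its distinct string descriptions."""
    try:
        doc = json.loads(rec)
    except (TypeError, ValueError):
        # Malformed or missing blocks are skipped rather than raising.
        return []
    seen = []
    for value in find_values(doc, "description"):
        if isinstance(value, str) and value not in seen:
            seen.append(value)
    return seen


# Abbreviated, partly hypothetical block: the description sits inside
# "@graph", so a root-level '$.description' path would not match it.
sample = (
    '{"@context":"https://schema.org","@graph":['
    '{"@type":"WebSite","description":"Jornal Regional de Catanduva"},'
    '{"@type":"WebPage","description":"Jornal Regional de Catanduva"}]}'
)
found = descriptions(sample)
print(found)
```

Deduplicating within a block is a design choice made for this sketch. Repeats across separate blocks of the same page, like the last two rows above, would need grouping by URL as well.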
What about compiling the "@type" field of JSON-LD blocks that have a root-level "description" field? Because of the "@" symbol, the field name has to be written in JSON_EXTRACT_SCALAR's bracket notation for non-standard field names: "$['@type']":
SELECT date, url, title, lang, JSON_EXTRACT_SCALAR(rec, "$['@type']") type
FROM `gdelt-bq.gdeltv2.gemg`, UNNEST(jsonld) rec
WHERE jsonld IS NOT NULL AND JSON_EXTRACT_SCALAR(rec, '$.description') IS NOT NULL
LIMIT 100
You can see that articles are being tagged as "NewsArticle", "Article", "WebPage" and even "Product". Note how the last two rows are the same article that includes "@type" entries for both "NewsArticle" and "Product" (the page actually includes considerably more detail across a number of Schema.org categories).
Row | date | url | title | lang | type
---|---|---|---|---|---
1 | 2020-12-01 20:18:28 UTC | https://www.hna.de/verbraucher/cyber-monday-black-friday-2020-amazon-saturn-tipps-sparen-media-markt-angebote-tricks-kassel-hna-zr-90110120.html | Tipps zum Cyber Monday 2020: Mit diesen Tricks auch nach Black Friday sparen | GERMAN | NewsArticle
2 | 2020-12-01 20:18:44 UTC | https://baomoi.com/bi-mat-moi-ve-quai-vat-ho-loch-ness/c/37205520.epi | Bí mật mới về quái vật hồ Loch Ness | VIETNAMESE | NewsArticle
3 | 2020-12-01 20:18:48 UTC | https://www.ekhokavkaza.com/a/30978754.html | Как не испортить жизнь в аду | RUSSIAN | NewsArticle
4 | 2020-12-01 20:18:48 UTC | https://www.hd.se/2020-12-01/gantz-hotar-med-regeringskris-i-israel | Gantz hotar med regeringskris i Israel | SWEDISH | NewsArticle
5 | 2020-12-01 20:18:52 UTC | https://www.mdzol.com/mdz-femme/2020/12/1/este-es-el-sillon-que-deberias-elegir-para-tu-casa-si-te-gusta-estar-comodo-122751.html | Este es el sillón que deberías elegir para tu casa, si te gusta estar cómodo | SPANISH | NewsArticle
6 | 2020-12-01 20:18:52 UTC | https://www.mdzol.com/politica/2020/12/1/cuantos-millones-le-prometio-invertir-ford-alberto-en-argentina-122870.html | ¿Cuántos millones le prometió invertir Ford a Alberto en Argentina? | SPANISH | NewsArticle
7 | 2020-12-01 20:18:56 UTC | https://www.varmatin.com/faits-divers/quatre-victimes-un-bebe-tue-un-conducteur-fou-ce-que-lon-sait-sur-le-drame-survenu-en-allemagne-611265 | Quatre victimes, un bébé tué, un "conducteur fou"… Ce que l'on sait sur le drame survenu en Allemagne | FRENCH | Article
8 | 2020-12-01 20:18:56 UTC | https://www.usinenouvelle.com/article/axereal-restructure-son-outil-industriel.N1035564 | La coopérative Axéréal se restructure et pourrait supprimer 220 postes | FRENCH | NewsArticle
9 | 2020-12-01 20:18:56 UTC | https://www.rp.pl/Koronawirus-SARS-CoV-2/201209927-Grodzki-Nie-traktowalbym-slow-premiera-calkiem-powaznie.html | Grodzki: Nie traktowałbym słów premiera całkiem poważnie | POLISH | WebPage
10 | 2020-12-01 20:18:56 UTC | https://www.ammoland.com/2020/12/daily-deal-magpump-223-5-56-ar-15-magazine-loader-99-99-w-free-shipping/ | Daily Deal: MagPump .223/5.56 AR-15 Magazine Loader $96.00 w/ Free Shipping | ENGLISH | Article
11 | 2020-12-01 20:18:56 UTC | https://www.ammoland.com/2020/12/daily-deal-magpump-223-5-56-ar-15-magazine-loader-99-99-w-free-shipping/ | Daily Deal: MagPump .223/5.56 AR-15 Magazine Loader $96.00 w/ Free Shipping | ENGLISH | NewsArticle
12 | 2020-12-01 20:18:56 UTC | https://abc7.com/politics/barr-no-evidence-of-fraud-thatd-change-election-outcome/8417189/ | Attorney General Bill Barr: No evidence of fraud that'd change 2020 presidential election outcome | ENGLISH | NewsArticle
13 | 2020-12-01 20:18:56 UTC | https://www.mprnews.org/story/2020/11/12/state-regulators-approve-line-3-permits-move-pipeline-closer-to-construction | State regulators approve Line 3 permits; move pipeline closer to construction | ENGLISH | NewsArticle
14 | 2020-12-01 20:18:56 UTC | https://www.chron.com/lottery/article/Winning-numbers-drawn-in-Numbers-Midday-game-15766817.php | Winning numbers drawn in 'Numbers Midday' game | ENGLISH | NewsArticle
15 | 2020-12-01 20:19:00 UTC | https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une | Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu | FRENCH | NewsArticle
16 | 2020-12-01 20:19:00 UTC | https://www.lavoixdunord.fr/901434/article/2020-12-01/maubeuge-il-roue-de-coups-sa-compagne-et-se-fait-interpeller-en-possession-d-une | Maubeuge: il roue de coups sa compagne et se fait interpeller en possession d'une arme à feu | FRENCH | Product
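To go from a listing of "@type" values to an actual tally, the exported blocks can be counted in a few lines. The sketch below is a rough illustration under stated assumptions: the `recs` list is made-up sample data, `block_types` is a hypothetical helper, and only the root-level "@type" is read, mirroring the query above. JSON-LD also allows "@type" to be an array, which a scalar extraction would not return, so the helper accepts both forms:

```python
import json
from collections import Counter


def block_types(rec):
    """Return the root-level @type value(s) of one JSON-LD string as a list."""
    try:
        doc = json.loads(rec)
    except (TypeError, ValueError):
        return []
    if not isinstance(doc, dict):
        return []
    value = doc.get("@type")
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        # JSON-LD permits several types on one node.
        return [v for v in value if isinstance(v, str)]
    return []


# Hypothetical sample blocks standing in for exported `rec` strings.
recs = [
    '{"@type": "NewsArticle", "description": "a"}',
    '{"@type": "Product", "description": "b"}',
    '{"@type": ["NewsArticle", "Article"], "description": "c"}',
    '{"@graph": [{"@type": "WebSite"}]}',
]

counts = Counter(t for rec in recs for t in block_types(rec))
for name, n in counts.most_common():
    print(name, n)
```

Blocks that keep their types inside "@graph", like the last sample, contribute nothing here; counting those would need a recursive walk of the block.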
We're tremendously excited to see what you're able to do with this new dataset!