Global Embedded Metadata Graph (GEMG): Site Name

Many websites include the formal proper name of the outlet in the "og:site_name" property of an HTML META tag. This allows clients to display the site's name as "New York Times" rather than just its domain name, "nytimes.com", for example. This field is captured in the GEMG and is widely used by news outlets:

SELECT count(distinct(url)) FROM `gdelt-bq.gdeltv2.gemg`, unnest(metatags) metatag WHERE (key='og:site_name') and DATE(date) >= '2021-11-01' and DATE(date) <= '2021-11-30'

Overall, 8,896,916 (71%) of news articles from November 2021 include this information. Instead of "og:site_name", some outlets include this information in their JSON-LD block as part of the "publisher" information:

SELECT COUNT(distinct(url)) from (
SELECT url FROM `gdelt-bq.gdeltv2.gemg`, unnest(metatags) metatag WHERE (key='og:site_name') and DATE(date) >= '2021-11-01' and DATE(date) <= '2021-11-30'
UNION ALL
SELECT url FROM `gdelt-bq.gdeltv2.gemg`, unnest(jsonld) block WHERE (block like '%"publisher"%') and DATE(date) >= '2021-11-01' and DATE(date) <= '2021-11-30'
)

Combining the JSON-LD publisher and og:site_name fields, a total of 9,727,443 (78%) of articles specify the full name of the outlet.
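
To make the two sources of the outlet name concrete, here is a minimal Python sketch of how a client might resolve a display name from a page's HTML: it prefers "og:site_name" and falls back to the "name" of the JSON-LD "publisher". This is only an illustration and not a description of how the GEMG itself extracts these fields. The function and class names (site_name, publisher_name, _MetadataParser) and the fallback order are assumptions made for the example, and nested structures such as "@graph" are not handled.

```python
import json
from html.parser import HTMLParser
from typing import List, Optional


class _MetadataParser(HTMLParser):
    """Collects the first og:site_name value and all raw JSON-LD blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.og_site_name: Optional[str] = None
        self.jsonld_blocks: List[str] = []
        self._in_jsonld = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and attr.get("property") == "og:site_name":
            # Keep only the first occurrence on the page.
            if self.og_site_name is None:
                self.og_site_name = attr.get("content")
        elif tag == "script" and attr.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.jsonld_blocks.append("".join(self._buffer))
            self._in_jsonld = False


def publisher_name(block: str) -> Optional[str]:
    """Return the top-level publisher name from one JSON-LD block, if any."""
    try:
        data = json.loads(block)
    except json.JSONDecodeError:
        return None
    items = data if isinstance(data, list) else [data]
    for item in items:
        if not isinstance(item, dict):
            continue
        publisher = item.get("publisher")
        if isinstance(publisher, dict) and isinstance(publisher.get("name"), str):
            return publisher["name"]
        if isinstance(publisher, str):
            return publisher
    return None


def site_name(html: str) -> Optional[str]:
    """Prefer og:site_name; otherwise fall back to the JSON-LD publisher."""
    parser = _MetadataParser()
    parser.feed(html)
    parser.close()
    if parser.og_site_name:
        return parser.og_site_name
    for block in parser.jsonld_blocks:
        name = publisher_name(block)
        if name:
            return name
    return None
```

Note that this sketch is stricter than the query above: the LIKE '%"publisher"%' filter counts any JSON-LD block that contains a "publisher" key anywhere, whereas the function only returns a value when a top-level publisher actually carries a name.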