Global Embedded Metadata Graph (GEMG) Adds LINK REL Icon Property

Many websites include a so-called "favicon" that acts as a miniature logo for the site. While this can simply be a file named "favicon.ico" placed in the root directory of a domain, which most desktop browsers will automatically check for, it is more properly declared in the HTML of the page with a <LINK REL="icon" HREF=""> tag. The older "shortcut icon" property is often used in its place, while pages catering to Apple devices typically include the Apple-proprietary "apple-touch-icon" property.
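To make the three kinds of declarations concrete, here is a minimal sketch of pulling them out of a page using Python's standard html.parser module. This is only an illustration, not the extraction code GEMG itself uses; the names ICON_RELS, IconLinkParser, extract_icons and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser

# rel values treated as icon declarations (an assumption for this sketch;
# it mirrors the three kinds of icon property described in the text).
ICON_RELS = {"icon", "shortcut icon", "apple-touch-icon"}


class IconLinkParser(HTMLParser):
    """Collects (rel, href) pairs from <link> tags that declare an icon."""

    def __init__(self):
        super().__init__()
        self.icons = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <LINK REL=...>
        # and <link rel=...> are handled identically.
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # Normalize case and whitespace in the rel value before comparing.
        rel = " ".join(attr_map.get("rel", "").lower().split())
        href = attr_map.get("href", "")
        if rel in ICON_RELS and href:
            self.icons.append((rel, href))


def extract_icons(html):
    """Return every icon declaration in the page, in document order."""
    parser = IconLinkParser()
    parser.feed(html)
    parser.close()
    return parser.icons


# Hypothetical page head with several icons plus one unrelated <link>.
SAMPLE_HTML = """
<html><head>
<LINK REL="icon" HREF="/favicon-32.png">
<link rel="shortcut icon" href="/favicon.ico">
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<link rel="stylesheet" href="/site.css">
</head><body></body></html>
"""

if __name__ == "__main__":
    for rel, href in extract_icons(SAMPLE_HTML):
        print(rel, href)
```

The sketch keeps every matching tag rather than picking a single "best" icon, since a page may legitimately declare several.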

These icons are particularly useful for display beside a site's name in search results, so as of today we are extracting them and including them in the GEMG record for each article going forward. Note that GEMG records will include ALL of the icon declarations found in an article's <LINK> tags. Because some sites specify a multitude of icons for various devices and resolutions, some pages may have numerous icons, all of which will now be included.

These newly extracted tags will be found in the META array of each GEMG record (the "metatags" field unnested in the BigQuery query below), with a "type" of "linktag" and a "key" representing the kind of icon it is, such as "icon", "shortcut icon" or "apple-touch-icon".
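As a rough illustration of working with these entries once a record is in hand, the sketch below groups a record's icon entries by kind. The record layout is an assumption: the "type" and "key" fields come from the description above, while the "metatags" list name (borrowed from the BigQuery column), the "value" field holding the icon URL, the non-icon entry, and the sample data are hypothetical.

```python
# Hypothetical GEMG-style record. Only "type" and "key" are described in
# the text; "metatags", "value" and the sample entries are assumptions.
SAMPLE_RECORD = {
    "url": "https://example.com/article",
    "metatags": [
        {"type": "linktag", "key": "icon", "value": "/favicon-16.png"},
        {"type": "linktag", "key": "icon", "value": "/favicon-32.png"},
        {"type": "linktag", "key": "apple-touch-icon", "value": "/apple-touch-icon.png"},
        # Placeholder for any non-icon entry that may share the array.
        {"type": "other", "key": "og:title", "value": "Example"},
    ],
}


def icons_by_kind(record):
    """Group the values of all "linktag" entries by their "key"."""
    grouped = {}
    for entry in record.get("metatags", []):
        if entry.get("type") == "linktag":
            grouped.setdefault(entry.get("key"), []).append(entry.get("value"))
    return grouped


if __name__ == "__main__":
    for kind, values in icons_by_kind(SAMPLE_RECORD).items():
        print(kind, values)
```

Grouping by key keeps every icon a page declares while still making it easy to prefer one kind over another when choosing which to display.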

For example, you can query for them via:

SELECT date, url, metatag
FROM `gdelt-bq.gdeltv2.gemg`, UNNEST(metatags) AS metatag
WHERE metatag.type = 'linktag' AND DATE(date) >= '2021-12-04'
ORDER BY date DESC
