The GDELT Project

Compiling A Histogram Of META Tags From The Global Embedded Metadata Graph

What are the most common attributes seen in <META> tags on the open web? Earlier today, with roughly 12 hours of data in the Global Embedded Metadata Graph, we used the query below to compile a histogram of every key/type combination that appeared in <META> tags in at least two articles during that period:

SELECT
  tag.key,
  tag.type,
  count(1) numtags,
  count(distinct url) numpages,
  count(distinct url) / (select count(1) from `gdelt-bq.gdeltv2.gemg`) * 100 percpages
FROM `gdelt-bq.gdeltv2.gemg`, unnest(metatags) tag
group by key, type
having numpages > 1
order by numpages desc
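For readers less familiar with BigQuery's unnest, the logic of the query can be sketched in plain Python. This is only an illustration under stated assumptions: the `pages` records, their URLs and the `meta_histogram` helper are hypothetical stand-ins, not the real GEMG data, and only the fields the query itself touches (url, metatags, key, type) are modeled.

```python
from collections import defaultdict

# Hypothetical sample records standing in for rows of the GEMG table.
pages = [
    {"url": "https://example.com/a", "metatags": [
        {"key": "og:title", "type": "property"},
        {"key": "fb:pages", "type": "property"},
        {"key": "fb:pages", "type": "property"},
    ]},
    {"url": "https://example.com/b", "metatags": [
        {"key": "og:title", "type": "property"},
        {"key": "viewport", "type": "name"},
    ]},
    {"url": "https://example.com/c", "metatags": [
        {"key": "fb:pages", "type": "property"},
    ]},
]

def meta_histogram(pages, min_pages=2):
    """Count tag occurrences and distinct pages per (key, type) pair."""
    numtags = defaultdict(int)   # total occurrences, like count(1)
    urls = defaultdict(set)      # distinct URLs, like count(distinct url)
    for page in pages:
        for tag in page["metatags"]:   # the "unnest(metatags)" step
            pair = (tag["key"], tag["type"])
            numtags[pair] += 1
            urls[pair].add(page["url"])
    total = len(pages)           # denominator: one record per table row
    rows = [
        {
            "key": key,
            "type": typ,
            "numtags": numtags[(key, typ)],
            "numpages": len(seen),
            "percpages": len(seen) / total * 100,
        }
        for (key, typ), seen in urls.items()
        if len(seen) >= min_pages    # equivalent to "having numpages > 1"
    ]
    rows.sort(key=lambda r: r["numpages"], reverse=True)
    return rows

histogram = meta_histogram(pages)
for row in histogram:
    print(row)
```

In this toy sample, "viewport" is dropped because it appears on only one page, while "fb:pages" ends up with more occurrences than pages because one page carries it twice.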

You can download the complete list of tags below:

You can see the results below, showing each unique key/type combination, the total number of times it appeared (numtags), the number of unique URLs it appeared in (numpages) and the percentage of all URLs it appeared in (percpages). Note especially the "fb:pages" tag, which has the highest total count of any tag (997,718 occurrences) but appeared in only 32.1% of all pages, ranking 19th by page count. This indicates that the tag is less common but typically appears multiple times when it is used, often connecting an article to multiple Facebook properties owned by its publisher. Comparing the numtags and numpages fields can identify tags that are often used multiple times on a page versus those that typically appear just once.
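As a rough sketch of that numtags versus numpages comparison, the snippet below divides one by the other for a handful of rows copied from the results table. The "tags per page" ratio and the variable names are our own illustration rather than a field of the dataset.

```python
# (key, numtags, numpages) for a few rows taken from the results table.
rows = [
    ("og:title", 967199, 932074),
    ("viewport", 981560, 918362),
    ("fb:app_id", 590514, 556001),
    ("robots", 577689, 521901),
    ("fb:pages", 997718, 369159),
]

# Average number of occurrences on the pages where each tag appears at all.
ranked = sorted(
    ((key, numtags / numpages) for key, numtags, numpages in rows),
    key=lambda item: item[1],
    reverse=True,
)

for key, ratio in ranked:
    print(f"{key:10s} {ratio:.2f}")
```

A ratio close to 1 suggests a tag that usually appears once per page, while "fb:pages" comes out at roughly 2.7 occurrences per page on which it is used.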

row  key                  type      numtags  numpages  percpages
1    og:title             property  967199   932074    81.14898536996472
2    viewport             name      981560   918362    79.95518006331208
3    og:image             property  986680   913519    79.53353485472698
4    og:url               property  942430   907916    79.04572190744177
5    description          name      925426   895687    77.98103075406844
6    og:type              property  919280   884355    76.99443494492407
7    og:description       property  914946   877743    76.41877561823304
8    og:site_name         property  787314   759980    66.16599744383578
9    twitter:card         name      666604   641937    55.88884168149637
10   fb:app_id            property  590514   556001    48.407011690794675
11   keywords             name      567404   546334    47.565375467092
12   robots               name      577689   521901    45.438169730697304
13   twitter:site         name      534826   519045    45.18951833368739
14   twitter:title        name      522238   505436    44.00468049688489
15   twitter:description  name      501609   486178    42.32802482334955
16   og:image:width       property  448497   431381    37.55724380025701
17   twitter:image        name      440069   424926    36.99525333537641
18   og:image:height      property  440744   424464    36.95503031527186
19   fb:pages             property  997718   369159    32.14002138262714
20   og:locale            property  353845   341809    29.758853417563703

We hope this offers a first glimpse into the world of <META> tags!