Certain kinds of linguistic filtering, such as stop word lists for search, can benefit from datasets of the most commonly used words in each part of speech. Our Cloud Natural Language API AI-tagged Web Part Of Speech Dataset today encompasses 25 billion records spanning 11 languages and is updated in real time, making it uniquely suited to horizon scanning of the underlying linguistic patterns of news language.
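To make the stop word use case concrete, here is a minimal Python sketch of turning a ranked (token, cnt) list into a stop word filter. The CSV layout, the sample rows and the function names `load_stop_words` and `strip_stop_words` are illustrative assumptions, not part of the dataset or its downloads.

```python
import csv
import io

# Illustrative stand-in for an exported (token, cnt) list.
# The tokens and counts below are made up for the example.
SAMPLE_CSV = """token,cnt
of,900
in,800
to,700
with,50
"""


def load_stop_words(csv_text, top_n):
    """Return the top_n most frequent tokens from token,cnt CSV text as a set."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row["cnt"]), reverse=True)
    return {row["token"].lower() for row in rows[:top_n]}


def strip_stop_words(text, stop_words):
    """Lowercase the text, split on whitespace and drop any stop words."""
    return [word for word in text.lower().split() if word not in stop_words]


stop_words = load_stop_words(SAMPLE_CSV, top_n=3)
print(strip_stop_words("Patterns of language in the news", stop_words))
# ['patterns', 'language', 'the', 'news']
```

A real search pipeline would likely tokenize more carefully than `split()`. The point here is only that a frequency-ranked list maps directly onto a stop word set.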
Using this dataset, compiling a list of the top English nouns is as simple as:
SELECT LOWER(token) token, COUNT(1) cnt
FROM `gdelt-bq.gdeltv2.web_pos`
WHERE DATE(dateTime) >= "2021-01-01" AND lang='en' AND posTag='NOUN'
GROUP BY 1 -- group on the lowercased token
HAVING cnt > 10000
ORDER BY cnt DESC
Download Top English Nouns January 1, 2021 – May 31, 2021.
Compiling a list of the top English verbs is as simple as:
SELECT LOWER(token) token, COUNT(1) cnt
FROM `gdelt-bq.gdeltv2.web_pos`
WHERE DATE(dateTime) >= "2021-01-01" AND lang='en' AND posTag='VERB'
GROUP BY 1 -- group on the lowercased token
HAVING cnt > 1000
ORDER BY cnt DESC
Download Top English Verbs January 1, 2021 – May 31, 2021.
Compiling a list of the top English adverbs is as simple as:
SELECT LOWER(token) token, COUNT(1) cnt
FROM `gdelt-bq.gdeltv2.web_pos`
WHERE DATE(dateTime) >= "2021-01-01" AND lang='en' AND posTag='ADV'
GROUP BY 1 -- group on the lowercased token
HAVING cnt > 1000
ORDER BY cnt DESC
Download Top English Adverbs January 1, 2021 – May 31, 2021.
Compiling a list of the top English adjectives is as simple as:
SELECT LOWER(token) token, COUNT(1) cnt
FROM `gdelt-bq.gdeltv2.web_pos`
WHERE DATE(dateTime) >= "2021-01-01" AND lang='en' AND posTag='ADJ'
GROUP BY 1 -- group on the lowercased token
HAVING cnt > 1000
ORDER BY cnt DESC
Download Top English Adjectives January 1, 2021 – May 31, 2021.
Compiling a list of the top English adpositions is as simple as:
SELECT LOWER(token) token, COUNT(1) cnt
FROM `gdelt-bq.gdeltv2.web_pos`
WHERE DATE(dateTime) >= "2021-01-01" AND lang='en' AND posTag='ADP'
GROUP BY 1 -- group on the lowercased token
HAVING cnt > 1000
ORDER BY cnt DESC
Download Top English Adpositions January 1, 2021 – May 31, 2021.
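The five queries above differ only in the posTag value and the count threshold, so they can be generated from one template. The following Python sketch builds the query text only and does not contact BigQuery. The helper name `build_top_tokens_query`, its parameters and the allow-list are hypothetical conveniences, and the template groups on the lowercased token via `GROUP BY 1`.

```python
from datetime import date

# Part-of-speech tags used by the queries in this post (not an exhaustive list).
ALLOWED_POS_TAGS = {"NOUN", "VERB", "ADV", "ADJ", "ADP"}

QUERY_TEMPLATE = """SELECT LOWER(token) token, COUNT(1) cnt
FROM `gdelt-bq.gdeltv2.web_pos`
WHERE DATE(dateTime) >= "{start_date}" AND lang='{lang}' AND posTag='{pos_tag}'
GROUP BY 1
HAVING cnt > {min_count}
ORDER BY cnt DESC"""


def build_top_tokens_query(pos_tag, min_count=1000, lang="en", start_date="2021-01-01"):
    """Return BigQuery SQL text for the most frequent tokens of one part of speech.

    Hypothetical helper: inputs are validated before being placed in the
    query text, because they are interpolated rather than bound as parameters.
    """
    if pos_tag not in ALLOWED_POS_TAGS:
        raise ValueError(f"unsupported part-of-speech tag: {pos_tag!r}")
    if not (lang.isascii() and lang.isalpha()):
        raise ValueError(f"unexpected language code: {lang!r}")
    # Parsing and re-serializing the date rejects anything that is not a date.
    normalized_date = date.fromisoformat(start_date).isoformat()
    return QUERY_TEMPLATE.format(
        start_date=normalized_date,
        lang=lang,
        pos_tag=pos_tag,
        min_count=int(min_count),
    )


# The noun query above uses a higher threshold than the other four.
print(build_top_tokens_query("NOUN", min_count=10000))
```

The resulting text can be pasted into the BigQuery console or passed to whichever client you already use.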
We hope this inspires your own explorations of the patterns of language.