Using The New Web News Ngram Datasets To Find The Top Estonian Words For 'Dollar'

With the new Web News Ngram (WEB-NGRAM) dataset, it becomes possible to explore the evolution and use of the world's languages at unprecedented resolution.

All it takes is a single SQL query and 7.6 seconds to search the 30 million words of Estonian news coverage that GDELT monitored from January to September 2019 and list the most common Estonian words beginning with "dollar", ranked by how often they appear in the news. The top 20 are shown here.

Rank  Word            Count
1     dollari         2427
2     dollarit        2157
3     dollarini       824
4     dollar          152
5     dollareid       145
6     dollariga       102
7     dollarile       73
8     dollarite       52
9     dollarist       42
10    dollarilise     37
11    dollaril        27
12    dollarilt       25
13    dollarites      24
14    dollarise       22
15    dollariline     15
16    dollarisse      10
17    dollaritesse    9
18    dollariindeks   7
19    dollarid        6
20    dollarisendini  6

TECHNICAL DETAILS

Here is the SQL query used to generate the table above. It returns up to 100 rows; the table shows the first 20.

SELECT NGRAM, SUM(COUNT) AS TOT
FROM `gdelt-bq.gdeltv2.web_1grams`
WHERE LANG = 'ESTONIAN' AND NGRAM LIKE 'dollar%'
GROUP BY NGRAM
ORDER BY TOT DESC
LIMIT 100
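
To see what the query is doing without a BigQuery account, the same filter, group and sort logic can be sketched locally. The example below is only an illustration: it uses Python's built-in sqlite3 module rather than BigQuery, the rows are hypothetical toy values and not GDELT data, and the function name top_prefix_words is made up for this sketch. It assumes only the three columns the query itself references (NGRAM, LANG, COUNT) and that a word can appear in several rows, which is why the counts are summed.

```python
import sqlite3

# Hypothetical toy rows (NGRAM, LANG, COUNT). These are NOT real GDELT values;
# they only mimic the three columns referenced by the query above.
ROWS = [
    ("dollari", "ESTONIAN", 3),
    ("dollari", "ESTONIAN", 2),   # same word in a second row, so counts must be summed
    ("dollarit", "ESTONIAN", 4),
    ("dollar", "ESTONIAN", 1),
    ("dollar", "ENGLISH", 50),    # excluded by the LANG filter
    ("euro", "ESTONIAN", 9),      # excluded by the prefix filter
]


def top_prefix_words(rows, lang, prefix, limit=100):
    """Return (rank, word, total) tuples for words in `lang` starting with `prefix`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE web_1grams (NGRAM TEXT, LANG TEXT, "COUNT" INTEGER)')
        conn.executemany("INSERT INTO web_1grams VALUES (?, ?, ?)", rows)
        # Same shape as the BigQuery query. Two differences to keep in mind:
        # SQLite's LIKE ignores ASCII case by default, and the secondary sort
        # on NGRAM is added here only to make ties deterministic.
        cursor = conn.execute(
            """
            SELECT NGRAM, SUM("COUNT") AS TOT
            FROM web_1grams
            WHERE LANG = ? AND NGRAM LIKE ?
            GROUP BY NGRAM
            ORDER BY TOT DESC, NGRAM ASC
            LIMIT ?
            """,
            (lang, prefix + "%", limit),
        )
        # The Rank column is simply the position in the sorted result.
        return [(rank, word, total) for rank, (word, total) in enumerate(cursor, start=1)]
    finally:
        conn.close()


if __name__ == "__main__":
    for rank, word, total in top_prefix_words(ROWS, "ESTONIAN", "dollar"):
        print(rank, word, total)
```

The rank is assigned from row position after sorting, so two words with the same total still receive consecutive ranks, as rows 19 and 20 of the table do.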