What would it look like to convert a year and a half of links from worldwide news homepages in 110 languages, totaling more than half a trillion words, into ngram datasets using just three SQL queries, an open source language detector, a single script, and the power of Google’s BigQuery platform?