Creating A New Generation Of Recommender Services: From Cohorts & Classification To Context & Trustworthiness

As we continue to explore how to help the world's citizenry access trustworthy information, one key area of research we will focus on this fall is recommender services for global news. Traditionally, such services have relied on personalized classification and cohort systems that "learn" a user's unique preferences over time and feed them an ever-more personalized stream of coverage. This personalization typically takes one of two forms, with myriad variations around each: cohort systems, which bin users into buckets of similar users and cross-recommend articles within each bucket, and fully individualized systems, which build a unique classifier for each user.
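To make the cohort approach concrete, here is a minimal sketch in Python. It assumes each user has already been reduced to per-topic reading counts and bins users by their single most-read topic, a deliberately crude stand-in for whatever clustering a real system would use. The function names and toy data are hypothetical. Articles are then cross-recommended within each bucket, ranked by how many cohort-mates have read them.

```python
from collections import Counter


def assign_cohorts(user_topics):
    """Bin each user into a cohort keyed by their most-read topic (a crude, hypothetical rule)."""
    # Ties are broken by max()'s first-seen order, i.e. dict insertion order.
    return {user: max(counts, key=counts.get) for user, counts in user_topics.items()}


def cohort_recommend(user, user_topics, history, top_n=3):
    """Recommend articles read by cohort-mates that `user` has not read yet."""
    user_to_cohort = assign_cohorts(user_topics)
    cohort = user_to_cohort[user]
    seen = set(history.get(user, []))
    counts = Counter()
    for other, other_cohort in user_to_cohort.items():
        if other == user or other_cohort != cohort:
            continue
        for article in history.get(other, []):
            if article not in seen:
                counts[article] += 1
    # Most widely read within the cohort first; ties broken by article id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


user_topics = {
    "alice": {"climate": 5, "sports": 1},
    "bob": {"climate": 3, "finance": 2},
    "carol": {"sports": 4},
    "dave": {"climate": 2},
}
history = {
    "alice": ["a1", "a2"],
    "bob": ["a2", "a3", "a4"],
    "carol": ["a5"],
    "dave": ["a3", "a6"],
}
print(cohort_recommend("alice", user_topics, history))  # ['a3', 'a4', 'a6']
```

Alice receives only what her climate cohort reads, which is exactly the narrowing effect that feeds echo chambers. A fully individualized system would replace the shared bucket with a classifier trained on each user's own history, but the feedback loop is much the same.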

Such systems almost inevitably reinforce echo chambers, directing cohorts of users to similar information and wrapping them in ever-more personalized reflections of their innate biases and beliefs.

Instead, we are especially interested in two concepts: context and trustworthiness.

Context reflects the recognition that not all users can be persuaded away from falsehoods and misinformation, but by helping them understand the context of the information they are consuming, we can at least show them where that information sits within the broader media ecosystem. For topics on which they have yet to form an opinion, such systems can help them understand the range of arguments being expressed in the media, perhaps intervening before they depart mainstream coverage for more extremist and misinformation-laden corners of the information landscape. One example is a "you are here" map showing where a given article's arguments are situated within the broader universe of coverage of that topic.
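One possible way to compute a "you are here" position is sketched below. It assumes every article on a topic has already been mapped to a fixed-length embedding vector (for example, from a document embedding dataset; the sketch does not depend on any particular format) and uses cosine similarity to the centroid of all coverage as a rough proxy for how mainstream an article is. The function names and the centroid-distance heuristic are assumptions for illustration, not an established method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def you_are_here(article_vec, coverage_vecs):
    """Place one article within all coverage of its topic (hypothetical heuristic).

    Returns (similarity to the coverage centroid,
             fraction of coverage lying strictly farther from the centroid).
    """
    dim = len(article_vec)
    centroid = [sum(v[i] for v in coverage_vecs) / len(coverage_vecs) for i in range(dim)]
    sim = cosine(article_vec, centroid)
    farther = sum(1 for v in coverage_vecs if cosine(v, centroid) < sim)
    return sim, farther / len(coverage_vecs)


# Toy 2-D "embeddings": three articles cluster together, one is an outlier.
coverage = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
print(you_are_here([0.0, 1.0], coverage))  # the outlier: nothing lies farther out
print(you_are_here([0.9, 0.1], coverage))  # near the center: 3 of 4 lie farther out
```

A fraction near zero means almost no coverage of the topic sits farther from the center, which is the kind of signal a "you are here" map could surface to a reader drifting toward the fringe. Real embeddings would have hundreds of dimensions, and a single centroid will not capture topics whose mainstream coverage is itself split into several camps.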

Trustworthiness is a far more complicated and fraught topic. The particular instantiation we will explore revolves around the idea of refutation. An outlet whose articles on a topic regularly stand the test of time might be considered "trustworthy" on that topic. In contrast, an outlet that constantly revises its coverage of a topic in ways that substantively change its meaning, or whose arguments are contradicted by the majority consensus of subsequent coverage, might be viewed as less trustworthy. This is an immensely complex area of research, and we are especially interested in how such concrete, quantitative metrics might help assess the accuracy and trustworthiness of media in place of the more subjective measures used today.
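As a rough illustration of how a refutation-based metric could be scored, the sketch below assumes the hard part is already done: each article carries two hypothetical boolean flags recording whether it was later substantively revised and whether subsequent consensus coverage contradicted it. An outlet's score on a topic is then the smoothed share of its articles that carry neither flag. The field names and the Laplace-style smoothing are our own assumptions, chosen so that an outlet with only one or two articles does not immediately score 0 or 1.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ArticleOutcome:
    outlet: str
    topic: str
    substantively_revised: bool     # hypothetical flag: later edits changed the article's meaning
    contradicted_by_consensus: bool  # hypothetical flag: later majority coverage refuted it


def trust_scores(outcomes, prior_weight=1.0):
    """Smoothed share of each outlet's articles per topic that 'stood the test of time'."""
    stood = defaultdict(int)
    total = defaultdict(int)
    for o in outcomes:
        key = (o.outlet, o.topic)
        total[key] += 1
        if not (o.substantively_revised or o.contradicted_by_consensus):
            stood[key] += 1
    # Laplace-style smoothing pulls outlets with few articles toward 0.5.
    return {key: (stood[key] + prior_weight) / (total[key] + 2 * prior_weight) for key in total}


outcomes = [
    ArticleOutcome("OutletA", "elections", False, False),
    ArticleOutcome("OutletA", "elections", False, False),
    ArticleOutcome("OutletA", "elections", False, False),
    ArticleOutcome("OutletB", "elections", True, False),
    ArticleOutcome("OutletB", "elections", False, True),
    ArticleOutcome("OutletB", "elections", False, False),
]
print(trust_scores(outcomes))
# {('OutletA', 'elections'): 0.8, ('OutletB', 'elections'): 0.4}
```

Scoring per (outlet, topic) pair rather than per outlet reflects the idea above that trustworthiness is topic-specific: the same outlet may hold up well on one subject and poorly on another.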

We encourage you to explore your own approaches to these questions. Two datasets of particular interest are the Web NGrams 3.0 and GSG Embedding datasets.