Identifying Conspiracy Theories At Their Earliest Stages

How might we build systems that can tractably scan global news coverage at massive scale for known falsehoods and conspiracy theories? Last year we showcased how the USEv4 embeddings we compute over television news transcripts can be combined with USEv4 embeddings of known fact checks to scan television news in realtime for claims related to those fact checks. The approach is simple and quite effective, and is even able to connect "nano chips" with "microchips." But how can we scale this up to a broader range of topics, especially those for which fact checks don't yet exist? In its ultimate form, how might we scan the world's news media in realtime for the earliest glimmers of tomorrow's most pressing and dangerous falsehoods and conspiracy theories?

Rather than trying to build all-encompassing "dangerous conspiracy theory" classifier models, a far more flexible and robust approach is to score each news article along a set of dimensions and then pass those scores to a second model that can be rapidly adjusted to weight the different dimensions according to the needs of the moment. For example, calls to action can be boosted as a signal during a period of physical violence, while sensitive topics can be boosted when they relate to breaking events, and so on.
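
As a rough illustration of that second stage, here is a minimal sketch that assumes each dimension has already been scored between 0 and 1 by some upstream model. The dimension names, the `priority_score` function, the weight values and the example scores are all hypothetical placeholders chosen for illustration, and a simple weighted average stands in for whatever second model a real system might use.

```python
# Hypothetical dimension names, loosely following the dimensions discussed in this post.
DIMENSIONS = (
    "fact_check_similarity",
    "emotional_language",
    "call_to_action",
    "sensitive_topic",
    "distance_from_mainstream",
    "conspiracy_similarity",
)


def priority_score(scores, weights):
    """Weighted average of per-dimension scores (each assumed to lie in [0, 1])."""
    total_weight = sum(weights.get(d, 0.0) for d in DIMENSIONS)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights.get(d, 0.0) * scores.get(d, 0.0) for d in DIMENSIONS)
    return weighted / total_weight


# Baseline: every dimension counts equally.
baseline_weights = {d: 1.0 for d in DIMENSIONS}

# During a period of physical violence, boost calls to action (illustrative value).
violence_weights = dict(baseline_weights, call_to_action=4.0)

# Made-up scores for a single article.
article_scores = {
    "fact_check_similarity": 0.2,
    "emotional_language": 0.6,
    "call_to_action": 0.9,
    "sensitive_topic": 0.5,
    "distance_from_mainstream": 0.4,
    "conspiracy_similarity": 0.4,
}

baseline = priority_score(article_scores, baseline_weights)
boosted = priority_score(article_scores, violence_weights)
print(f"baseline={baseline:.3f} boosted={boosted:.3f}")
```

Because only the weights change, the same per-article scores can be reranked immediately as circumstances shift, without retraining any of the upstream classifiers.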

Here are just a few examples of possible dimensions that could be used to score news articles for identifying conspiracy theories in realtime. No single dimension offers a guaranteed signal, and scores can be particularly difficult to assess in breaking stories that have few counterparts, but together these dimensions offer a powerful signal that fact checkers can use to prioritize such stories.

  • Highly similar to known fact check. Using the Global Similarity Graph (GSG) USEv4 embeddings and mentioned entities, the similarity of an article to known fact checks from major fact checking organizations can be computed, suggesting that at least the core topic of the article overlaps with known fact checked statements. We previously demonstrated how easily such an approach can yield high-quality results for television news.
  • Highly emotional language. News articles that use highly emotionally charged language can stir readers' emotions and increase their reaction to an article.
  • Call to action. Articles that include calls to action rather than clinical reporting can be problematic, even if the call appears only in the third person through a quoted participant, such as a call by an elected official for supporters to take up arms against their government.
  • Sensitive topics. Falsehoods involving certain topics, such as trust in the electoral system, underrepresented demographics and health information, can be especially harmful to society. A claim that Bigfoot was recently sighted nearby will likely cause far less harm to society than a claim that an election was stolen or that all members of a particular minority group are terrorists. Classifiers can be built for each of these categories, allowing each article to be scored along these dimensions.
  • Distance from mainstream. How far is an article from the mainstream of the news reporting on that topic or story?
  • Distance from other conspiracy theories. Examining the language, entities, emotion and claim structure of the article and comparing it to known conspiracy theories, does it "read" more like a conspiracy theory than other news articles?
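
Two of the dimensions above, similarity to known fact checks and distance from the mainstream, reduce to vector comparisons once articles have embeddings. The following is a minimal sketch under stated assumptions: the three-dimensional vectors are toy placeholders rather than real USEv4 embeddings, the function names are hypothetical, and using the centroid of mainstream coverage as the reference point is just one possible choice, not a description of how the GSG works.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def max_similarity(article_vec, reference_vecs):
    """Highest similarity between the article and any reference (e.g. fact check) vector."""
    return max((cosine_similarity(article_vec, r) for r in reference_vecs), default=0.0)


def centroid(vecs):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vecs)
    return [sum(column) / n for column in zip(*vecs)]


def distance_from_mainstream(article_vec, mainstream_vecs):
    """One minus cosine similarity to the centroid of mainstream coverage of a story."""
    return 1.0 - cosine_similarity(article_vec, centroid(mainstream_vecs))


# Toy placeholder vectors; real sentence embeddings have far more dimensions.
fact_check_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mainstream_vecs = [[0.0, 0.0, 1.0], [0.0, 0.2, 1.0]]
article_vec = [0.9, 0.1, 0.0]

fact_check_score = max_similarity(article_vec, fact_check_vecs)
mainstream_distance = distance_from_mainstream(article_vec, mainstream_vecs)
print(f"fact_check_score={fact_check_score:.3f} mainstream_distance={mainstream_distance:.3f}")
```

In this toy example the article sits close to one fact check and far from the mainstream centroid, so both scores come out high. The same `max_similarity` helper could be pointed at embeddings of known conspiracy theories to approximate the last dimension in the list.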

We'd love to see projects explore how these kinds of architectures might be used to scan the news in realtime for tomorrow's biggest falsehoods and most dangerous conspiracy theories!