The GDELT Project

Identifying Conspiracy Theories At Their Earliest Stages

How might we build systems capable of tractable, mass-scale scanning of global news coverage for known falsehoods and conspiracy theories? Last year we showcased how the USEv4 embeddings we compute over television news transcripts can be combined with USEv4 embeddings of known fact checks to scan television news in realtime for claims related to those fact checks. The approach is trivial to implement and quite effective, and is even able to connect "nano chips" with "microchips." But how can we scale this up to a broader range of topics, especially those for which fact checks don't yet exist? In its ultimate form, how might we scan the world's news media in realtime for the earliest glimmers of tomorrow's most pressing and dangerous falsehoods and conspiracy theories?
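As a rough illustration of the fact check matching described above, here is a minimal sketch that compares one transcript sentence embedding against a set of fact check embeddings using cosine similarity. The tiny three-number vectors are toy stand-ins for real USEv4 embeddings, and the function names, fact check identifiers and the 0.7 threshold are all hypothetical choices made for this example rather than values from our production pipeline.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_fact_checks(sentence_vec, fact_check_vecs, threshold=0.7):
    """Return (fact_check_id, similarity) pairs at or above threshold, best match first."""
    hits = []
    for fc_id, fc_vec in fact_check_vecs.items():
        sim = cosine_similarity(sentence_vec, fc_vec)
        if sim >= threshold:
            hits.append((fc_id, sim))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: in practice these
# would come from the same embedding model applied to both kinds of text).
fact_checks = {
    "vaccine-microchips": [0.9, 0.1, 0.0],
    "election-machines": [0.0, 0.2, 0.9],
}
transcript_sentence = [0.8, 0.2, 0.1]  # e.g. a sentence mentioning "nano chips"

matches = match_fact_checks(transcript_sentence, fact_checks)
for fc_id, sim in matches:
    print(f"{fc_id}: {sim:.3f}")
```

Because the comparison happens in embedding space rather than on exact words, a sentence can match a fact check even when the two share no keywords, which is what allows "nano chips" to surface a fact check about "microchips."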

Rather than trying to build all-encompassing "dangerous conspiracy theory" classifier models, a far more flexible and robust approach is to score each news article along a set of dimensions and then pass those scores to a second model that can be rapidly adjusted to weight the different dimensions according to the needs of the moment. For example, calls to action can be boosted as a signal during a period of physical violence. Similarly, sensitive topics related to breaking events can be boosted, and so on.
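To make the two-stage idea concrete, here is a minimal sketch in which the "second model" is nothing more than a weighted average whose weights can be changed on the fly. This is a deliberately simple stand-in, not a description of any deployed system: the dimension names, the scores and the weights are all hypothetical, and each dimension score is assumed to already be scaled to the range 0 to 1.

```python
def priority_score(dimension_scores, weights):
    """Weighted average of per-dimension scores; dimensions with no weight count as 0."""
    total_weight = sum(weights.get(dim, 0.0) for dim in dimension_scores)
    if total_weight == 0.0:
        return 0.0
    weighted_sum = sum(
        score * weights.get(dim, 0.0) for dim, score in dimension_scores.items()
    )
    return weighted_sum / total_weight


# Hypothetical first-stage output for one article (each score assumed in [0, 1]).
article = {
    "call_to_action": 0.9,
    "sensitive_topic": 0.2,
    "fact_check_similarity": 0.4,
}

# Equal weights as a baseline, then a reweighting that boosts calls to action,
# as one might during a period of physical violence.
baseline_weights = {
    "call_to_action": 1.0,
    "sensitive_topic": 1.0,
    "fact_check_similarity": 1.0,
}
unrest_weights = dict(baseline_weights, call_to_action=3.0)

baseline = priority_score(article, baseline_weights)
during_unrest = priority_score(article, unrest_weights)
print(f"baseline={baseline:.2f} during_unrest={during_unrest:.2f}")
```

The point of the split is that the expensive per-article dimension scores are computed once, while the cheap second stage can be reweighted at any moment without rescoring the underlying articles.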

Here are just a few examples of possible dimensions that could be used to score news articles for identifying conspiracy theories in realtime. No single dimension offers a guaranteed signal and scores can be particularly difficult to assess in breaking stories that have few counterparts, but together these dimensions offer a powerful signal for fact checkers to prioritize such stories.

We'd love to see projects explore how these kinds of architectures might be used to scan the news in realtime for tomorrow's biggest falsehoods and most dangerous conspiracy theories!