Lessons Learned From Building Global Platforms For Diverse User Communities: Applications, Communities & Needs

GDELT today finds use in nearly every industry and user community, spanning an immensely disparate world of applications, communities and user needs. One of the most complex aspects of building global platforms that must serve such a diverse landscape of constituencies lies in the myriad distinct and often competing needs of these communities. Below are a few top-level lessons we've learned over the years.

Applications

  • Physical vs Narrative. From an application standpoint, GDELT today is deployed for both physical-world applications (disease, conflict, migration, etc) and narrative applications (situational awareness, dis/misinformation, narrative evolution and mapping, inorganic state manipulation, etc). These represent very different needs. The former requires codified records capturing physical activity, while the latter requires "reading between the lines" and assessing complex, nuanced latent linguistic dimensions.
  • Tactical vs Strategic. GDELT finds use in both tactical (right now – "nowcasting") monitoring and strategic (over the horizon – "forecasting") use cases. The former requires rapid realtime monitoring and the ability to identify anomalies and synthesize immense volumes of material into concise actionable insights for decision makers. In contrast, forecasting requires combining this realtime dimension with vast historical archives from which longitudinal patterns can be identified and built into models that extend current observations into the future and adjust in realtime.
  • Observing vs Risk Assessment. Similar to the tactical vs strategic divide, observational applications are designed to monitor global events and simply report what is known. They primarily revolve around observation, detection, contextualization, conflict identification and summarization. In contrast, risk assessment seeks to understand the impact of observed trends on an organization's operations. For example, an observational application might catalog and map conflict, famine and disease outbreaks in a region. A risk assessment application would then apply those insights to help an aid organization decide where best to preposition its resources and how to evolve their deployment.
  • Nowcasting vs Forecasting. Nowcasting applications are primarily focused on reporting the world as it stands, whereas forecasting applications attempt to foresee where it is headed. This divide parallels the tactical vs strategic one, but with a greater focus on translating results into decision-making needs.

User Communities

  • Academics. Academic researchers tend to be interested in "grand challenge" questions that push the boundaries of the kinds of questions that can be asked of global society today. Their analytic timeframes tend to extend historically, with a greater interest in the past than in realtime present feeds. Despite the widespread availability of advanced computing resources on most university campuses, academic users tend to have highly constrained computational environments, often limited by the tooling and workflows they rely on, and are typically able to consume only extremely small slices of the data, placing an emphasis on platform-provided extracts. This creates a competing duality: their research questions require looking across very large volumes of data, but their analytic workflows can examine only very small slices. Academic projects range from urgent short-term requests tied to publication deadlines, grant submissions and tasked public interest projects, which require turnaround on the order of hours to days, to longer-horizon projects that still typically expect extracts within days to weeks. Academic research tends to be more "free form," undefined and abstract, asking questions like "how can we detect falsehoods online" or "how can we forecast future conflict" that can require considerable assistance in translating the underlying data needs to the various GDELT datasets. Much academic work takes the form of "basic research," but an increasing volume of "applied research" is appearing as granting agencies and journals in many countries and fields shift towards prioritizing applied research, or as academia in select fields returns to the post-WWII era's prioritization of applied public interest research. More recently, GDELT is gaining considerable traction in the algorithmic community as one of the largest open real-world multimodal datasets for advancing fields like graph analytics.
  • Corporations. Corporate users tend to have highly specific applied needs and often have the computational infrastructure to simply ingest GDELT at scale and extract the selections they need on their own. They tend to "self service," identifying and downloading the datasets they need with minimal assistance. However, as GDELT finds increasing use in state-of-the-art work that combines frontier research with tactical and strategic demands, this community requires greater assistance, both with methodological questions of how to ask certain kinds of questions or adapt datasets to specific use cases, and with technological workflow questions of how to perform terascale, petascale and exascale analytics. GDELT insights are frequently combined with both external and internal datasets, including multimodal datasets like remote sensing data and ground observations as well as internal data streams. Differing update intervals, collection strategies and analytic workflows must be reconciled when merging such datasets (a minimal sketch of this kind of temporal alignment appears after this list), with GDELT often forming a corporation's first foray into purely external datasets, meaning the integration process extends far beyond GDELT to the entire process of engaging external data.
  • Governments. Governmental users tend to be in fields like public health, forced human migration, climate resilience and similar disciplines with a strong applied emphasis and a significant need to translate findings into actionable insights for decision makers. As with corporate users, GDELT often forms the core of an organization's first large-data and/or external-data initiative, making it part of a larger education and assistance effort.
  • Journalists. Journalists require extremely concise insights on extremely short timeframes, typically measured in minutes to hours at most. Journalists just beginning with GDELT typically want results provided to them, while journalists who incorporate it into their workflows want fast self-service tools through which they can rapidly interrogate a topic to see whether there is a story there and to refine that story. Investigative journalists lie at the opposite end, often wanting custom-constructed terascale datasets, and they may require extensive assistance to get started and to understand how best to adapt their needs to the data.
  • NGOs. Non-governmental organizations like aid organizations tend to be very narrowly focused on realtime needs directly aligned with their organizational missions, placing a maximal premium on responsiveness. Smaller aid organizations, freed of the bureaucracy of government and larger organizations but lacking their vast resources, sometimes look to make realtime life-or-death decisions, requiring extreme attention to workflows, anomaly-detection and thresholding processes and the like.
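
To make the dataset-merging challenge above concrete, the sketch below aligns a 15-minute GDELT-derived signal with a daily internal dataset before joining them. The file names, column names and choice of pandas are illustrative assumptions, not a prescribed workflow.

```python
# Illustrative sketch: aligning a 15-minute GDELT-derived signal with a
# daily internal dataset before joining. File and column names here are
# hypothetical; any dataframe library would work similarly.
import pandas as pd

# GDELT-derived mention counts at 15-minute resolution (hypothetical extract).
gdelt = pd.read_csv("gdelt_mentions.csv", parse_dates=["datetime"]).set_index("datetime")

# Internal ground observations reported once per day (hypothetical file).
internal = pd.read_csv("field_reports.csv", parse_dates=["date"]).set_index("date")

# Downsample the faster feed to the slower feed's daily cadence...
daily_mentions = gdelt["mention_count"].resample("1D").sum()

# ...then join the two on the shared daily index.
merged = internal.join(daily_mentions, how="inner")
print(merged.head())
```

The same resample-then-join pattern generalizes to remote sensing scenes, ground observations and other feeds that arrive on their own cadences.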

User Needs

  • Non-Technical. Most first-time users simply want a simple non-technical interface to get a sense of the data, what it looks like and how to use it in their application. In many fields like journalism, this is the primary interface they want to use long-term. The primary focus here is on intuitive, mobile-optimized web interfaces that make it possible to rapidly and easily ask questions of datasets. Yet this also requires balancing fully automated, guided and expert interrogation, with different user communities (and often the same user in different contexts) requiring different interaction metaphors.
  • Technical. The most common technical interface point comes through GDELT's CSV/JSON/JSONP APIs and digested downloadable datasets. Applications typically begin with one of GDELT's APIs for MVPs, and many organizations find the APIs so well-suited to their needs that they simply wrap custom lightweight interfaces directly around them (a minimal API sketch appears after this list). More advanced and complex workflows either utilize the digested downloadable datasets or filter the realtime feeds. The focus is on small data and maximum flexibility. Datasets like the realtime Web NGrams 3.0 dataset are designed for this community, enabling them to perform bespoke analyses in realtime.
  • Extreme Technical. A small subset of users have the technical infrastructure and tooling to work with GDELT's datasets at scale. This community is typically ready to perform terascale, petascale and even exascale analyses, bringing to bear massive infrastructure, SOTA architectures, bleeding edge neural approaches, bespoke accelerators and cloud-scale workflows. They simply want to download entire datasets and analyze them at scale, requiring minimal assistance (a bulk-ingest sketch appears at the end of this list). Some need help with methodological considerations around using externally derived insights for the first time or working with the complexities of news, but this community typically just needs a pointer to a dataset and takes things from there. They are most interested in combining full historical backfiles with realtime updates. Rather than filtering datasets like Web NGrams 3.0 to relevant entries, they typically ingest the datasets whole and combine them in extremely complex and advanced hyperdimensional models.
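
As an illustration of the lightweight API-wrapping pattern described under "Technical" above, the sketch below queries the GDELT DOC 2.0 API for a recent article list. The endpoint and parameter names follow the public API documentation; the example query and result handling are illustrative assumptions.

```python
# Minimal sketch of wrapping the GDELT DOC 2.0 API; the query string and
# result handling here are illustrative, not a canonical client.
import json
import urllib.parse
import urllib.request

DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"

def recent_articles(query, timespan="24h", max_records=50):
    """Return metadata for recent worldwide news coverage matching `query`."""
    params = urllib.parse.urlencode({
        "query": query,
        "mode": "artlist",        # article list; other modes include timelinevol
        "format": "json",
        "timespan": timespan,     # how far back to search, e.g. "24h"
        "maxrecords": max_records,
    })
    with urllib.request.urlopen(f"{DOC_API}?{params}", timeout=30) as resp:
        payload = json.load(resp)
    # Each article entry includes its url, title, domain, language and seen date.
    return payload.get("articles", [])

if __name__ == "__main__":
    for art in recent_articles('"food security"')[:10]:
        print(art.get("seendate"), art.get("domain"), art.get("title"))
```

Wrapping a custom dashboard around a helper like this is often all an MVP requires.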
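
For the "Extreme Technical" community, ingestion usually starts with the full 15-minute update feed rather than an API. The sketch below polls GDELT 2.0's lastupdate.txt manifest and mirrors each new file locally; the manifest URL and its size/md5/url line format follow the public GDELT 2.0 distribution, while the polling loop and local layout are illustrative assumptions.

```python
# Minimal sketch of mirroring GDELT 2.0's 15-minute update feed. The
# download loop and destination directory are illustrative assumptions.
import time
import urllib.request
from pathlib import Path

MANIFEST = "http://data.gdeltproject.org/gdeltv2/lastupdate.txt"
DEST = Path("gdelt_raw")

def poll_once(seen):
    """Fetch the manifest and download any files not already mirrored."""
    with urllib.request.urlopen(MANIFEST, timeout=30) as resp:
        manifest = resp.read().decode("utf-8")
    for line in manifest.strip().splitlines():
        size, _md5, url = line.split()   # each line: <size> <md5> <url>
        if url in seen:
            continue
        target = DEST / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, target)
        seen.add(url)
        print(f"fetched {target.name} ({size} bytes)")

if __name__ == "__main__":
    DEST.mkdir(exist_ok=True)
    seen = set()
    while True:            # GDELT 2.0 publishes new files every 15 minutes
        poll_once(seen)
        time.sleep(60)     # re-check well within the 15-minute cadence
```

From there, the historical backfiles can be bulk-downloaded the same way and loaded alongside the realtime stream.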