This final project paper by Duke University graduate student Matt Dickenson examines using GDELT to automate production of the Militarized Interstate Disputes (MID) dataset, which the Correlates of War project has so far compiled by hand.
Can classification methods help to automate the production of political indicators in near real time? The Militarized Interstate Disputes (MID) dataset, produced by the Correlates of War project, has been widely used in political research and policy discussions over the past three decades. Despite its value for understanding conflict, MID data are coded by human coders in iterative batches that lag behind the present by several years. However, relying solely on human coders is neither necessary nor desirable. This project is the first stage in creating a pipeline that approximates the MID dataset using classification trees and daily event data (GDELT) at a substantial reduction in cost.