Knowledge Graphs Over Wikidata & Wikipedia: Conflicting Attributes

Wikipedia and Wikidata are often posited as ground truth datasets against which other data can be compared. The problem with this assumption is that both are riddled with outdated and conflicting information, especially across Wikipedia's many language editions. One approach to reconciling these conflicts in Wikidata's codified attributes is to parse those attributes from each entity's article text: rather than accepting the birth dates in Wikidata, for example, read them directly from the article text. It turns out that this, too, has its challenges.
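Reading an attribute from article text can start with something as simple as pattern-matching dates in an entity's lead sentence. The Python sketch below assumes the lead has already been fetched as plain text and that the first date it contains is the birth date. That heuristic and the function names (`first_date_in_text`, `text_conflicts_with_wikidata`) are hypothetical illustrations, not an established method. Only the two common English date styles are recognized.

```python
import re
from datetime import date
from typing import Optional

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_ALT = "|".join(MONTHS)

# Two common English Wikipedia date styles:
#   month-day-year: "February 9, 1874"
#   day-month-year: "9 February 1874"
MDY = re.compile(rf"\b({MONTH_ALT}) (\d{{1,2}}), (\d{{4}})\b")
DMY = re.compile(rf"\b(\d{{1,2}}) ({MONTH_ALT}) (\d{{4}})\b")


def _to_date(year: str, month: str, day: str) -> Optional[date]:
    """Build a date from matched strings, or None if it is not a valid calendar date."""
    try:
        return date(int(year), MONTHS.index(month) + 1, int(day))
    except ValueError:
        return None


def first_date_in_text(text: str) -> Optional[date]:
    """Return the valid date that appears first in `text`, or None if there is none.

    Hypothetical heuristic: the first date in an article's lead sentence is
    taken to be the birth date. This will not hold for every article.
    """
    found = []  # (character offset, date or None) pairs
    for m in MDY.finditer(text):
        month, day, year = m.groups()
        found.append((m.start(), _to_date(year, month, day)))
    for m in DMY.finditer(text):
        day, month, year = m.groups()
        found.append((m.start(), _to_date(year, month, day)))
    valid = [(pos, d) for pos, d in found if d is not None]
    return min(valid)[1] if valid else None


def text_conflicts_with_wikidata(lead_text: str, wikidata_birth: date) -> bool:
    """True if the article text yields a birth date that differs from Wikidata's value."""
    text_birth = first_date_in_text(lead_text)
    return text_birth is not None and text_birth != wikidata_birth
```

For a made-up lead such as "Jane Roe (born February 9, 1874) was a poet." checked against a Wikidata value of February 21, 1874, `text_conflicts_with_wikidata` returns `True`. A real pipeline would also need to handle other languages, approximate ("c.") dates, and leads that mention some other date first.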

Take the entry for Dmitry Iosifovich Gulia. His English-language entry lists his birth date as "February 9, 1874," yet Wikidata records his birth as February 21, 1874. Why the apparent contradiction, and where is Wikidata getting the February 21 date? It turns out that his Russian-language entry contains an infobox with the February 21 date codified.
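The twelve days separating the two dates match the offset between the Julian calendar, which the Russian Empire used until 1918, and the Gregorian calendar during the nineteenth century. The pages themselves are not quoted here as saying so, but one plausible reading is that the two editions record the same day in Old Style and New Style respectively. A reconciliation pipeline could test for that before declaring a true conflict. The sketch below, with hypothetical function names, converts a Julian-calendar date to Gregorian through the Julian Day Number using the widely used integer conversion formula. It only detects discrepancies consistent with this one explanation.

```python
from datetime import date

# Offset between a Julian Day Number and Python's proleptic Gregorian ordinal:
# date(1, 1, 1).toordinal() == 1 corresponds to JDN 1721426.
JDN_TO_ORDINAL = 1721425


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian ("Old Style") calendar."""
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # shift so the year starts in March
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to the equivalent Gregorian date."""
    return date.fromordinal(julian_to_jdn(year, month, day) - JDN_TO_ORDINAL)


def explained_by_old_style(text_date: date, wikidata_date: date) -> bool:
    """Hypothetical check: True if reading `text_date` as Old Style gives `wikidata_date`."""
    return julian_to_gregorian(text_date.year, text_date.month, text_date.day) == wikidata_date
```

Here `julian_to_gregorian(1874, 2, 9)` yields `date(1874, 2, 21)`, so a discrepancy like Gulia's would be flagged as a possible calendar difference rather than a factual disagreement. Whether that is the actual cause in any particular entry still has to be confirmed from the articles.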

Myriad such examples can be found throughout Wikipedia, especially when comparing English entries with those in languages connected to the person, location, event or topic in question. They offer a reminder of the challenges of using the English Wikipedia as a gateway to global events.