In the aftermath of any major event, from a sudden terror attack to an annual global conference, the world's news media capture and portray that event from a myriad of angles and narratives. Take a sudden-onset natural disaster like a major earthquake that produces extensive devastation. Imagery emerging from the affected area will likely comprise a mosaic of narrative threads. Each nation's news outlets will attempt to tie the disaster to its impact on, and connection to, their domestic readership. Countless images will feature beaming local volunteers deploying to the affected area or portraits of those whose lives were lost. Grim-faced politicians at podiums will make statements of solidarity, promise assistance, offer condolences, and potentially assure their own populations that such a disaster could never occur in their country. Images of those affected will range from families mourning the loss of loved ones to relieved families left untouched by the disaster. Aerial images offering context and magnitude will circulate alongside images from the ground documenting the damage in exquisite detail. Wide-angle photographs capturing the extent of the devastation will appear alongside carefully framed moments of a particular scene.
Together these threads make up the visual narrative that surrounds global coverage of an event like a natural disaster, and each offers a very different perspective and visual portrayal. From the standpoint of disaster response and aid, images of the damage itself are likely far more useful than those of politicians at podiums, while aerial images will likely be of use to different assessment teams than detailed images of a single building or damaged artifact. The surrounding text is often less useful in disaster situations, with image captions offering only the most basic descriptive detail in many cases.
Enter the Google Cloud Vision API. By assigning each image a set of topic and activity labels, along with the overtly expressed emotional state of any visible faces, we can instantly triage the blind firehose of imagery about a disaster into each of the categories above. Predefined collections of categories can be used to group imagery into well-defined roles, while automated textual clustering can be applied to the textual labels to group images into natural thematic and emotional threads. For example, all of the images collected in the first hour of an event can be labeled and the resulting category and emotional labels fed into a traditional textual topic clustering algorithm to characterize the primary visual narrative threads around the event. It's a whole new way of understanding visual narratives. We're starting a number of early experiments around this very concept and will be rolling out our preliminary results over the coming weeks, so stay tuned!
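To make that pipeline a bit more concrete, here is a minimal Python sketch of how such a workflow might be wired together, assuming the google-cloud-vision and scikit-learn packages are available and credentials are configured. It flattens each image's Vision API labels and face-emotion likelihoods into a short textual "document" and then clusters those documents into candidate narrative threads; the file names, emotion threshold, and cluster count are illustrative placeholders rather than details of our actual experiments.

```python
# Minimal sketch of the triage pipeline described above.
# Assumptions: google-cloud-vision and scikit-learn are installed and
# credentials are configured; file names and parameters are placeholders.
from google.cloud import vision
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

client = vision.ImageAnnotatorClient()
EMOTIONS = ("joy", "sorrow", "anger", "surprise")

def describe_image(path):
    """Flatten one image into a textual 'document' of labels and emotions."""
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())

    # Topic/activity labels, e.g. "rubble", "rescue", "news conference".
    labels = [label.description
              for label in client.label_detection(image=image).label_annotations]

    # Overtly expressed emotions on any detected faces (LIKELY or stronger).
    emotions = []
    for face in client.face_detection(image=image).face_annotations:
        for name in EMOTIONS:
            if getattr(face, name + "_likelihood") >= vision.Likelihood.LIKELY:
                emotions.append(name)

    return " ".join(labels + emotions)

# Hypothetical batch: all images collected in the first hour of the event.
image_paths = ["img_0001.jpg", "img_0002.jpg", "img_0003.jpg", "img_0004.jpg"]
documents = [describe_image(path) for path in image_paths]

# Cluster the label documents into candidate visual narrative threads.
tfidf = TfidfVectorizer().fit_transform(documents)
threads = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(tfidf)
for path, thread in zip(image_paths, threads):
    print(path, "-> narrative thread", thread)
```

The same label documents could just as easily be handed to whatever topic clustering algorithm is already applied to the accompanying news text, keeping the visual and textual narrative threads directly comparable.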