Following the template laid out in our BigQuery+UDF One Minute Maps, today we unveil a map that offers a glimpse into global violence against women and sexual violence. Each dot on the map represents a location with at least 25 articles discussing violence against women or sexual violence from February 2015 through October 17, 2016. Clicking on a location displays a selection of up to 50 articles mentioning that location in the context of these topics. Remember that this map is 100% automatically generated, and translation and processing errors mean you will always see some level of error in it. All mentions of violence against women and sexual violence are drawn exclusively from the language used in the underlying article, not from editorial decisions about what constitutes such behavior (an article must explicitly refer to something as sexual violence for it to be counted on the map). An absence of mentions does not necessarily imply that no sexual or gendered violence has occurred in that area; it means only that such violence has not been reported in the media, which could reflect censorship or other restrictions. In total, 3.3 billion location mentions across 402 million articles were scanned to identify locations mentioned in the context of sexual or gendered violence, taking around 120 seconds to process more than 500GB of data using Google BigQuery.
TECHNICAL DETAILS
Making this map followed the template of the One Minute Maps tutorial, with the UDF and SQL query slightly modified to return the FeatureID as part of the results, so that the map can be merged with other geographic datasets down the road. WARNING: this query will consume 500GB of your monthly quota.
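To illustrate why returning the FeatureID helps, here is a minimal sketch of merging the query results with another dataset keyed on the same identifier. The row values, the `externalByFeatureId` lookup, and the `mergeByFeatureId` helper are all hypothetical and exist only for illustration; FeatureID is treated as a string because that is how the UDF's output schema declares it.

```javascript
// Hypothetical rows in the shape returned by the query (all values are made up).
const mapRows = [
  { LocationName: 'Example City', NumArticles: 120, FeatureID: '-1000001' },
  { LocationName: 'Sample Town', NumArticles: 31, FeatureID: '-1000002' },
];

// Hypothetical external dataset keyed by the same FeatureID strings.
const externalByFeatureId = new Map([
  ['-1000001', { population: 500000 }],
]);

// Attach the matching external record to each row, or null when there is none.
function mergeByFeatureId(rows, lookup) {
  return rows.map((row) => {
    const extra = lookup.get(row.FeatureID);
    return Object.assign({}, row, { external: extra === undefined ? null : extra });
  });
}

const merged = mergeByFeatureId(mapRows, externalByFeatureId);
console.log(merged);
```

Joining on an identifier avoids matching on place names or on floating-point coordinates, both of which can differ between datasets.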
The SQL query was:
select a.LocationName LocationName, b.NumArticles NumArticles, a.Latitude Latitude, a.Longitude Longitude,
       a.FeatureID FeatureID, a.ArticleList ArticleList, a.SharingImage SharingImage
from (
  select max(LocationName) LocationName, Latitude, Longitude, min(FeatureID) FeatureID,
         count(distinct(DocumentIdentifier)) NumArticles,
         GROUP_CONCAT_UNQUOTED(UNIQUE(DocumentIdentifier),' ') ArticleList,
         max(SharingImage) SharingImage
  from (
    select DENSE_RANK() OVER (PARTITION BY Latitude, Longitude ORDER BY DocumentIdentifier DESC) Rank,
           LocationName, Latitude, Longitude, FeatureID,
           CONCAT('<a href="', DocumentIdentifier, '" target="blank">Article Link</a>') DocumentIdentifier,
           SharingImage
    from (
      SELECT LocationName, Latitude, Longitude, FeatureID, AssociatedThemes, DocumentIdentifier, SharingImage
      FROM GKGThemeListByLocation((
        SELECT V2Locations, V2Themes, DocumentIdentifier, SharingImage
        FROM [gdelt-bq:gdeltv2.gkg]
        where DocumentIdentifier like 'http%'
          AND (V2Themes like '%WB_742_%' or V2Themes like '%RAPE%')
      ))
    )
    where (AssociatedThemes like '%WB_742_%' or AssociatedThemes like '%RAPE%')
  )
  where Rank <= 50
  group by Latitude, Longitude
  order by NumArticles desc
) a
JOIN EACH (
  select Latitude, Longitude, count(distinct(DocumentIdentifier)) NumArticles
  from (
    select Latitude, Longitude, DocumentIdentifier
    from (
      SELECT LocationName, Latitude, Longitude, AssociatedThemes, DocumentIdentifier, SharingImage
      FROM GKGThemeListByLocation((
        SELECT V2Locations, V2Themes, DocumentIdentifier, SharingImage
        FROM [gdelt-bq:gdeltv2.gkg]
        where DocumentIdentifier like 'http%'
          AND (V2Themes like '%WB_742_%' or V2Themes like '%RAPE%')
      ))
    )
    where (AssociatedThemes like '%WB_742_%' or AssociatedThemes like '%RAPE%')
  )
  group by Latitude, Longitude
) b
ON a.Latitude = b.Latitude and a.Longitude = b.Longitude
where b.NumArticles >= 25 and abs(integer(a.FeatureID)) > 0
order by b.NumArticles DESC
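The query's aggregation can be mirrored in plain JavaScript as a rough sketch. Subquery `a` keeps at most 50 article links per coordinate pair, which also caps its own count at 50, and this appears to be why the full distinct-article count comes from subquery `b` before the 25-article threshold is applied. The `aggregateByLocation` function, its parameters, and the sample rows below are hypothetical and operate on an in-memory array rather than on BigQuery.

```javascript
// rows: [{LocationName, Latitude, Longitude, DocumentIdentifier}, ...]
// minArticles plays the role of "NumArticles >= 25"; maxLinks plays the role of "Rank <= 50".
function aggregateByLocation(rows, minArticles, maxLinks) {
  const groups = new Map();
  for (const row of rows) {
    const key = row.Latitude + ',' + row.Longitude; // group by Latitude, Longitude
    if (!groups.has(key)) {
      groups.set(key, {
        LocationName: row.LocationName,
        Latitude: row.Latitude,
        Longitude: row.Longitude,
        urls: new Set(), // distinct articles at this coordinate pair
      });
    }
    const group = groups.get(key);
    if (row.LocationName > group.LocationName) {
      group.LocationName = row.LocationName; // max(LocationName)
    }
    group.urls.add(row.DocumentIdentifier);
  }

  const results = [];
  for (const group of groups.values()) {
    const numArticles = group.urls.size; // full count, as in subquery b
    if (numArticles < minArticles) continue;
    // Highest-sorting URLs first, truncated to maxLinks, as in subquery a.
    const links = Array.from(group.urls).sort().reverse().slice(0, maxLinks);
    results.push({
      LocationName: group.LocationName,
      Latitude: group.Latitude,
      Longitude: group.Longitude,
      NumArticles: numArticles,
      ArticleList: links,
    });
  }
  results.sort((x, y) => y.NumArticles - x.NumArticles); // order by NumArticles desc
  return results;
}

// Made-up sample rows; small thresholds keep the example readable.
const sampleRows = [
  { LocationName: 'Example City', Latitude: '1.0', Longitude: '2.0', DocumentIdentifier: 'http://example.com/1' },
  { LocationName: 'Example City', Latitude: '1.0', Longitude: '2.0', DocumentIdentifier: 'http://example.com/2' },
  { LocationName: 'Example City', Latitude: '1.0', Longitude: '2.0', DocumentIdentifier: 'http://example.com/3' },
  { LocationName: 'Example City', Latitude: '1.0', Longitude: '2.0', DocumentIdentifier: 'http://example.com/1' },
  { LocationName: 'Sample Town', Latitude: '5.0', Longitude: '6.0', DocumentIdentifier: 'http://example.com/4' },
];
const aggregated = aggregateByLocation(sampleRows, 2, 2);
console.log(aggregated);
```

With a threshold of 2 and a link limit of 2, only Example City survives: it reports three distinct articles but lists just two links, the same split between count and list that the two subqueries produce.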
The UDF was:
// Substrings that mark a SharingImage as a site logo, social-media button, or other
// placeholder rather than a story image. They are compared against the lowercased URL,
// so every entry must itself be lowercase.
var BLOCKED_IMAGE_SUBSTRINGS = [
  'button', 'bttn', '.gif', 'template', 'default', 'logo', 'img_fb', 'facebook',
  'figaro', 'og-angop', 'gannett-cdn', 'no_preview', 'risingkashmir', 'townnews.com',
  '-square', 'apmobile', 'mynorthwest.com', 'the_tribune_sq', 'g-mtn.png',
  'story-thumb-large', 'top_stories_stopimg', 'fb_image', 'cubadebate-ipad',
  'jdsupra.com', 'nohotlinks', 'og_image_meridianstar', 'strat_n', 'skins', 'banner',
  'analytics', '.icon', '-icon', '/icon', 'blank.', 'fb.png', 'opengraph_default',
  'fai2.png', 'http-equiv', 'deramit.jpg', 'recomendovano.jpg', 'story-thumb-large.jpg',
  'no_image.jpg', '30001090.cms', 'pic_fb.jpg', 'mw-dr328_mw_soc_ns_20150801233302.png',
  'fft81_mf2574590.jpeg', 'udn_baby.png', 'news_no_image', 'nzhfbcover', 'no_headline',
  'lotus_img', 'tr_white_square', 'favicon', 'top_stories_thumb', '/0.jpg',
  'newog600x600', '/w.png', 'eldiaesfb', 'donga_icon', 'dot200.png', '_stopimg.jpg',
  'herald_h.jpg', 'squaresun_web', 'avatar.jpg', 'news.jpg', 'fb/wm.png',
  'share_noimage', 'twp-3000x1568.jpg'
];

function locationParserFun(row, emitFn) {
  // Maximum distance, in characters, between a location mention and a theme mention.
  var MAX_DISTANCE = 250;

  if (row.V2Locations === null || row.V2Themes === null) {
    return;
  }

  var locations = String(row.V2Locations).split(';');
  var themes = String(row.V2Themes).split(';');
  // emitFn({Location: locations[0], V2Themes: themes.join(';')})

  for (var location of locations) {
    var locationFields = location.split('#');
    if (locationFields.length < 2) {
      continue;
    }
    // The offset of the location mention is the last '#'-delimited field.
    var locationOffset = parseInt(locationFields[locationFields.length - 1], 10);

    // Collect every theme mentioned within MAX_DISTANCE of this location.
    var closeThemes = [];
    for (var theme of themes) {
      var themeFields = theme.split(',');
      if (themeFields.length < 2) {
        continue;
      }
      var themeName = themeFields[0];
      var themeOffset = parseInt(themeFields[1], 10);
      if ((locationOffset > themeOffset && locationOffset - themeOffset < MAX_DISTANCE) ||
          (locationOffset < themeOffset && themeOffset - locationOffset < MAX_DISTANCE)) {
        // closeThemes.push(themeName + ',' + themeOffset); // this emits the theme offsets if desired
        closeThemes.push(themeName);
      }
    }

    if (closeThemes.length > 0) {
      // output the final results if we had a set of affiliated themes...
      // clean up our SharingImage...
      var sharingimage = row.SharingImage;
      if (sharingimage != null) {
        var match = sharingimage.toLowerCase();
        // Discard overly long URLs and anything containing a blocked substring.
        var blocked = match.length > 1200;
        for (var i = 0; i < BLOCKED_IMAGE_SUBSTRINGS.length && !blocked; i++) {
          if (match.indexOf(BLOCKED_IMAGE_SUBSTRINGS[i]) > -1) {
            blocked = true;
          }
        }
        if (blocked) {
          sharingimage = '';
        }
        sharingimage = sharingimage.replace(/\\/g, '/');
        sharingimage = sharingimage.replace(/['"]/g, '');
        // Keep only URL-safe characters. The hyphen sits last in the class so that it is
        // a literal '-' rather than the start of a character range.
        sharingimage = sharingimage.replace(/[^A-Za-z0-9./%$&!?#():;_=-]/g, '');
      } else {
        sharingimage = '';
      }

      // output back to BigQuery...
      var result = {
        LocationName: locationFields[1],
        Latitude: locationFields[5],
        Longitude: locationFields[6],
        FeatureID: locationFields[7],
        AssociatedThemes: closeThemes.join(';'),
        DocumentIdentifier: row.DocumentIdentifier,
        SharingImage: sharingimage
      };
      emitFn(result);
    }
  }
}

bigquery.defineFunction(
  'GKGThemeListByLocation', // Name of the function exported to SQL
  ['V2Locations', 'V2Themes', 'DocumentIdentifier', 'SharingImage'], // Names of input columns
  [ // Output schema
    {'name': 'LocationName', 'type': 'string'},
    {'name': 'Latitude', 'type': 'string'},
    {'name': 'Longitude', 'type': 'string'},
    {'name': 'FeatureID', 'type': 'string'},
    {'name': 'AssociatedThemes', 'type': 'string'},
    {'name': 'DocumentIdentifier', 'type': 'string'},
    {'name': 'SharingImage', 'type': 'string'}
  ],
  locationParserFun // Reference to JavaScript UDF
);
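To experiment with the proximity logic outside BigQuery, the core of the UDF can be sketched as a standalone function. The `themesNearLocations` name, the sample strings, and the theme names in them are made up for illustration. The sketch assumes the same layout the UDF relies on: `;`-separated locations whose `#`-delimited fields end with an offset, and `;`-separated themes in `name,offset` form.

```javascript
// Return, for each location, the themes mentioned within maxDistance of it.
function themesNearLocations(v2Locations, v2Themes, maxDistance) {
  const results = [];
  const themes = String(v2Themes).split(';');
  for (const location of String(v2Locations).split(';')) {
    const fields = location.split('#');
    if (fields.length < 2) continue; // skip empty or malformed entries
    const locationOffset = parseInt(fields[fields.length - 1], 10);

    const closeThemes = [];
    for (const theme of themes) {
      const themeFields = theme.split(',');
      if (themeFields.length < 2) continue;
      const distance = Math.abs(locationOffset - parseInt(themeFields[1], 10));
      // As in the UDF, a theme at exactly the same offset is not counted.
      if (distance > 0 && distance < maxDistance) {
        closeThemes.push(themeFields[0]);
      }
    }

    if (closeThemes.length > 0) {
      results.push({ LocationName: fields[1], AssociatedThemes: closeThemes.join(';') });
    }
  }
  return results;
}

// Made-up input: two locations at offsets 100 and 900, three themes at 150, 700 and 880.
const sampleLocations =
  '1#Exampleland#XX#XX##10.0#20.0#-12345#100;1#Farland#YY#YY##30.0#40.0#-67890#900';
const sampleThemes = 'WB_742_EXAMPLE,150;TAX_OTHER,700;RAPE,880';

const nearby = themesNearLocations(sampleLocations, sampleThemes, 250);
console.log(nearby);
// → [ { LocationName: 'Exampleland', AssociatedThemes: 'WB_742_EXAMPLE' },
//     { LocationName: 'Farland', AssociatedThemes: 'TAX_OTHER;RAPE' } ]
```

Both locations are emitted here because each has at least one theme within 250 of it. The SQL query then keeps only rows whose AssociatedThemes contain `WB_742_` or `RAPE`, so a location that is only near unrelated themes would be dropped at that stage.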