GEN4: GDPR Geotargeting & US News Outlets: NPR

Last year we noted a sharp rise in HTTP 451 ("Unavailable For Legal Reasons") errors returned to our European Union-based crawlers by US news outlets that block European visitors in response to GDPR. In response to this growing trend, we completely reimagined the queuing and distribution fabrics of our GEN3 architecture, empowering our globally distributed crawler fleets to continually monitor for geotargeting artifacts, autonomously learn which outlets rely on such practices, and dynamically route traffic for those sites away from the targeted regions. The resulting architecture has performed beyond even our most optimistic projections, completely eliminating 451 errors.
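The routing behavior described above can be illustrated with a minimal Python sketch. It is not GDELT's implementation: the class name `GeoAwareRouter`, the region labels, and the rule that a single 451 excludes a region for an outlet are all hypothetical assumptions chosen for clarity.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class GeoAwareRouter:
    """Hypothetical sketch: route each outlet's fetches away from crawler
    regions where that outlet has answered with HTTP 451."""

    def __init__(self, regions, threshold=1):
        self.regions = list(regions)  # hypothetical labels, e.g. ["eu-west", "us-east"]
        self.threshold = threshold    # assumed number of 451s before a region is excluded
        # domain -> region -> count of 451 responses observed
        self.blocked = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _domain(url):
        return urlsplit(url).hostname or ""

    def record(self, url, region, status):
        """Feed back the HTTP status a crawler in `region` observed for `url`."""
        if status == 451:
            self.blocked[self._domain(url)][region] += 1

    def allowed_regions(self, url):
        """Regions that have not yet hit the 451 threshold for this outlet."""
        counts = self.blocked[self._domain(url)]
        return [r for r in self.regions if counts[r] < self.threshold]

    def choose_region(self, url, preferred):
        """Use the preferred region unless the outlet geoblocks it there."""
        allowed = self.allowed_regions(url)
        if preferred in allowed:
            return preferred
        # Fall back to the first unblocked region, or the preferred one if all are blocked.
        return allowed[0] if allowed else preferred
```

Because the learned state is keyed by hostname, a single 451 on one article is enough (under the assumed threshold) to move every later article from that outlet to a non-targeted region.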

While monitoring for global networking shifts related to the war in Ukraine, we have been spending a lot of time evaluating the traffic observation notifications our crawlers produce when they observe a behavioral anomaly in a website's responses. This led us to notice a different GDPR-related geotargeting practice that long predates the war but suggests further enhancements to our geotargeting detection systems: hard HTTP GDPR redirects.

Most modern websites handle GDPR notifications and privacy preference selection with JavaScript embedded in each page. However, a small number of older sites that have yet to be modernized appear to rely instead on hard HTTP redirects, sending browsers in the EU to a separate webpage, or even an entirely separate website, to set their privacy preferences.

Surprisingly, NPR.org is among these.

Any request for a news article on NPR's website from an EU IP address that does not yet have NPR's privacy cookies set is redirected to "https://choice.npr.org/index.html". For example, browsing to "https://www.npr.org/2022/03/10/1085448058/inflation-40-year-high-gas-prices-energy-russia-ukraine" from the EU redirects the browser to "https://choice.npr.org/index.html?origin=https://www.npr.org/2022/03/10/1085448058/inflation-40-year-high-gas-prices-energy-russia-ukraine", with the original article URL carried in the "origin" parameter.
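A crawler could plausibly flag this kind of hard consent redirect from the response status and Location header alone. The following minimal Python sketch is a hypothetical heuristic, not part of GDELT's system: it treats any 3xx response whose target differs from the requested URL, but carries the requested URL back in an `origin` query parameter, as a consent redirect. The `origin` parameter name comes from the NPR example; other outlets may use different parameter names, which this sketch does not attempt to handle.

```python
from typing import Optional
from urllib.parse import parse_qs, urljoin, urlsplit

# Parameter name taken from NPR's choice.npr.org redirect; assumed, not universal.
ORIGIN_PARAM = "origin"


def is_consent_redirect(requested_url: str, status: int, location: Optional[str]) -> bool:
    """Heuristic: does this response look like a hard GDPR consent redirect?"""
    # Only 3xx responses that carry a Location header are redirects.
    if not (300 <= status < 400) or not location:
        return False
    # Location may be relative; resolve it against the requested URL.
    target = urljoin(requested_url, location)
    if target == requested_url:
        return False
    # The consent page receives the original article URL in its query string
    # (presumably so it can send the browser back once preferences are set).
    origins = parse_qs(urlsplit(target).query).get(ORIGIN_PARAM, [])
    return requested_url in origins
```

Since `parse_qs` percent-decodes values, the check also matches if an outlet URL-encodes the origin parameter rather than passing it raw as NPR does.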

As GEN4 comes online with its advanced telemetry data, we will be conducting an extensive survey of global news media geotargeting practices, with an eye toward adding a new capability to our global distribution system: automatically detecting such hard geotargeting redirects and autonomously learning the best geographic placement for those sites within our global crawler footprint.