Last month we announced the new GDELT Article List and its accompanying RSS feed that lists every URL GDELT sees. This RSS feed was designed for organizations that wish to preserve global news content, such as web archives and other memory institutions, giving them access to GDELT's live URL firehose in real time so they can crawl and archive those URLs on their own.
As part of the Internet Archive's "No More 404" initiative, GDELT has long provided an additional set of RSS feeds that index the images, outlinks, social media embeds and mobile URLs of the articles it monitors. We're excited today to make those feeds widely available as part of the GDELT Article List RSS feed collection.
There are now five live GDELT Article List RSS feeds, each of which updates every 60 seconds with a rolling 15-minute window of URLs:
- All Articles. [feed.rss] This feed contains the primary URL of every news article GDELT monitors.
- All Mobile Edition URLs. [feed-mobile.rss] This feed contains the mobile-optimized or AMP URL of each article whose metadata links to one. Many sites are phasing out mobile-specific URLs in favor of newer approaches that dynamically adjust image and video resolutions on the fly rather than maintaining separate dedicated mobile websites, but where a site still offers such a URL and includes it in the article metadata, it will appear here.
- All Article Images. [feed-images.rss] This feed contains the URLs of all images found in the articles. Only imagery that is part of the article body is included; advertisements, insets, headers, footers and other page chrome are excluded. In web archiving applications, this feed is useful for ensuring that article-critical imagery is preserved.
- All Article Outlinks. [feed-links.rss] This feed contains the URLs of all outlinks found in the articles. Only links that are part of the article body will be included in this feed. In web archiving applications, this feed is useful for ensuring that first-order outlinks are preserved.
- All Social Media Embeds. [feed-social.rss] This feed contains social media embeds, such as images embedded from Twitter or videos embedded from YouTube or Vimeo.
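Because each feed refreshes every 60 seconds but covers a rolling 15-minute window, a crawler that polls once a minute will see most URLs many times over. The Python sketch below shows one way an archiving pipeline might handle that overlap. It is illustrative only: it assumes the feeds follow the standard RSS 2.0 layout with one `<item>` per URL and the URL in its `<link>` element, the names `extract_links`, `RollingDeduper` and `fetch_feed` are hypothetical, and the feed address is left as a parameter rather than filled in.

```python
import urllib.request
import xml.etree.ElementTree as ET
from collections import OrderedDict


def extract_links(rss_document):
    """Return the <link> text of every <item> in an RSS 2.0 document.

    Assumes the standard RSS 2.0 layout; accepts str or bytes.
    """
    root = ET.fromstring(rss_document)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


class RollingDeduper:
    """Remembers recently seen URLs so overlapping polls yield only new ones."""

    def __init__(self, max_size=100_000):
        self._seen = OrderedDict()  # insertion-ordered, oldest first
        self._max_size = max_size   # cap on remembered URLs to bound memory

    def new_only(self, urls):
        fresh = []
        for url in urls:
            if url in self._seen:
                continue
            self._seen[url] = True
            fresh.append(url)
            if len(self._seen) > self._max_size:
                self._seen.popitem(last=False)  # evict the oldest URL
        return fresh


def fetch_feed(feed_url, timeout=30):
    """Download one feed snapshot; feed_url is supplied by the caller."""
    with urllib.request.urlopen(feed_url, timeout=timeout) as response:
        return response.read()


# Two made-up snapshots standing in for consecutive polls of the same feed.
SAMPLE_POLL_1 = (
    '<rss version="2.0"><channel>'
    "<item><link>https://example.com/a</link></item>"
    "<item><link>https://example.com/b</link></item>"
    "</channel></rss>"
)
SAMPLE_POLL_2 = (
    '<rss version="2.0"><channel>'
    "<item><link>https://example.com/b</link></item>"
    "<item><link>https://example.com/c</link></item>"
    "</channel></rss>"
)

if __name__ == "__main__":
    deduper = RollingDeduper()
    for snapshot in (SAMPLE_POLL_1, SAMPLE_POLL_2):
        for url in deduper.new_only(extract_links(snapshot)):
            print("queue for archiving:", url)
```

In a live setting, the loop would call `fetch_feed` on the chosen feed roughly once a minute and hand each new URL to the crawler. The size cap on `RollingDeduper` is a design choice for this sketch: it keeps memory bounded, and it should be set comfortably above the number of URLs expected in a 15-minute window so that nothing is evicted while it can still reappear.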