How Much of the Internet Does The Wayback Machine Really Archive?

Kalev's latest piece for Forbes explores the Internet Archive's Wayback Machine, which turns 20 years old next year and holds over 450 billion web pages and 22 petabytes of data. The piece uses several techniques to peer inside the archive as a whole, exploring its holdings and finding that we need a much better understanding of how the Wayback Machine functions before it can be used for robust, reliable scholarly study of the evolution of the open web. The piece generated widespread conversation across the web archival community, including many mailing lists, blog posts, library posts, tweets, and even the Smithsonian's weekly roundup of archival news, and has been viewed more than 22,500 times in the year and a half since.

Read the Full Article.