Force Archive Websites To Pick Up Webpages With This Handy Tool

Website archive services such as the Internet Archive's Wayback Machine are incredibly useful when you need to see old versions of sites - either for nostalgia or because you're looking for a specific bit of information that has since been overwritten or deleted (such as a story you wrote for a former employer).


However, these services aren't perfect. There are times when an archive site might not make a snapshot of a site - typically, exactly when you need that snapshot most. Or perhaps someone has configured their site's robots.txt file to block archive services from performing their automatic crawls. No fun.

Thanks to a new tool from Motherboard, you can now attempt to archive the current version of a site across three different archive services at once: the Wayback Machine, Archive.is and Perma.cc (the last only if you've set up a free account with that service).

Installing Motherboard's archiving utility requires a little legwork, but it isn't too tricky. The mass_archive tool depends on Python's requests and archiveis modules; it also uses json, but that ships with Python's standard library, so there's nothing extra to install there. (Alas, this isn't just some simple executable or utility you can run.) The best way to get the two third-party modules is to install pip first, then use it to download both requests and archiveis, which are available on PyPI.
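With Python in place, the dependencies are a couple of commands away. A rough sketch of the install, assuming `python` is on your PATH (pip bootstrapping varies by operating system, so consult pip's own documentation if the first command fails):

```shell
# Bootstrap pip if you don't already have it (ships with modern Python).
python -m ensurepip --upgrade

# Install the two third-party dependencies; json needs no install,
# as it is part of Python's standard library.
python -m pip install requests archiveis
```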

You'll also need to grab the mass_archive.py script from the aforementioned GitHub project. Once you're ready, pull up a terminal on macOS or Linux and type this in (obviously, replacing example.com with the website you're looking to archive):

python mass_archive.py example.com

If you're running the script from an elevated command prompt on Windows, you can omit the initial "python" from that command.
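Under the hood, a script like this simply submits the same URL to each service's save mechanism. Here's a rough, hypothetical sketch of that fan-out - the function and endpoint details below are illustrative, not the actual mass_archive code, and the real script also handles responses, Perma.cc API keys and errors:

```python
def build_archive_requests(url):
    """Map each archive service to the request that would save `url`.

    This is a simplified sketch: each entry is (HTTP method, endpoint)
    plus form data where the service expects a POST.
    """
    return {
        # The Wayback Machine saves a page when you fetch /save/<url>.
        "wayback": ("GET", "https://web.archive.org/save/" + url, None),
        # Archive.is-style services accept a form POST with the target URL.
        "archive_is": ("POST", "https://archive.ph/submit/", {"url": url}),
        # Perma.cc's API requires an API key tied to your free account
        # (omitted here for brevity).
        "perma_cc": ("POST", "https://api.perma.cc/v1/archives/", {"url": url}),
    }

if __name__ == "__main__":
    # Show where one URL would be submitted across all three services.
    for service, (method, endpoint, _data) in build_archive_requests(
        "http://example.com"
    ).items():
        print(service, method, endpoint)
```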

