A Raspberry Pi and an old hard drive were gathering dust in my drawer until the Internet Archive hack made headlines. Now they’re the heart of my local web archiving system, preserving everything from favorite blog posts to open-source projects. In this article, I’ll walk you step by step through how I used ArchiveBox to create a private internet archive and gain independence in digital preservation.
The Internet Archive’s recent security breach hit the digital preservation community and all those who benefit from its work like a thunderbolt. On October 9th, hackers compromised the site and stole a massive user authentication database containing 31 million records.
What made matters worse was that this wasn’t the end of the Archive’s troubles. Just when they managed to restore some services by October 21st, hackers gained access to their Zendesk support system, demonstrating that the vulnerability ran deeper than initially thought.
Though the Archive has since resumed operations, its future remains uncertain because security breaches aren’t the only threat to digital preservation. A recent federal appeals court ruling dealt another significant blow to the Internet Archive, finding that its digital lending library wasn’t protected by the fair use doctrine, which means the Archive could be forced to remove a significant chunk of its content.
The implications are clear: the need for personal control over digital preservation has never been more apparent. The good news is that anyone can set up a private internet archive using a Raspberry Pi and ArchiveBox with ease.
If you’re ready to create your own private internet archive, then you’ll need some hardware.
First and foremost, you’ll need a Raspberry Pi. For the best experience, I highly recommend the latest Raspberry Pi 5 because its significantly improved performance means your archiving tasks will run smoother and faster, and you’ll have plenty of headroom for future expansion of your archive.
That said, don’t feel pressured if you already own a Raspberry Pi 4B with 4GB or 8GB of RAM. These models are perfectly capable of running a personal archive, and they actually have one interesting advantage over the Pi 5: hardware H.264 video encoding. This becomes particularly valuable if you plan to stream archived videos to your TV or other devices around your home.
Along with your Pi, you’ll need a few other items, including a microSD card for the operating system and an external hard drive to hold the archive.
Once you have all these items on hand, you’re ready to start setting up your self-hosted internet archive!
The first step is to get an operating system up and running on your Raspberry Pi. I personally recommend Raspberry Pi OS because, as the official OS for Raspberry Pi devices, it’s by far the most popular and supported option available. You can follow our Raspberry Pi OS installation guide if you don’t know how to put it on your microSD card.
And if you’re feeling adventurous, you might want to explore some of the alternative operating systems available for the Raspberry Pi.
Once you have the operating system installed, boot up your Pi and connect it to the internet (it doesn’t matter if you use a wired or wireless connection). Then launch Terminal and perform a system update with the command:
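A minimal sketch of that update step, assuming Raspberry Pi OS with its default apt package manager. The update_system name is hypothetical and only groups the two commands so they run in order:

```shell
# Hypothetical helper: refresh the package lists, then upgrade installed packages.
# Assumes a Debian-based system (such as Raspberry Pi OS) where apt and sudo exist.
update_system() {
    sudo apt update && sudo apt full-upgrade -y
}
```

Run update_system once after the first boot. If the upgrade touched the kernel or firmware, rebooting before you continue is a sensible precaution.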
When it comes to installing ArchiveBox, you have three options: Docker, an automatic setup script, or your system’s package manager. I strongly recommend going with Docker. Not only does it provide the smoothest installation and update experience, but it also gives you the best security isolation and includes all the dependencies right out of the box.
Unfortunately, Docker isn’t pre-installed on Raspberry Pi OS, so we’ll need to set that up first (don’t forget to also perform the post-installation steps).
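Once Docker is installed, it is worth confirming that it works without sudo before going further. The check below is a rough sketch, and the verify_docker name is hypothetical. It assumes the Compose v2 plugin, which is invoked as docker compose:

```shell
# Hypothetical helper: confirm Docker and the Compose plugin are usable.
verify_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed" >&2
        return 1
    fi
    docker --version          # Docker client version
    docker compose version    # Compose v2 plugin version
    # "docker info" talks to the daemon, so it fails if this user still
    # needs sudo (post-installation steps not applied or not yet active).
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed, but this user cannot reach the daemon" >&2
        return 1
    fi
    echo "docker is ready"
}
```

If verify_docker reports that the daemon is unreachable right after the post-installation steps, logging out and back in usually makes the new group membership take effect.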
With Docker successfully installed, we’re ready to move on to installing ArchiveBox itself, which is going to be much simpler thanks to all the groundwork we’ve laid.
To install ArchiveBox using Docker, first create a directory where all your archived content will be stored. This will be your archive folder on the Raspberry Pi, so choose a location with ample storage, such as your external hard drive (you can navigate to it using the cd command):
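The folder setup can be sketched as follows. ARCHIVE_ROOT is a placeholder variable introduced here for illustration. It falls back to a folder in your home directory, so set it to a path on your external drive if that is where the archive should live:

```shell
# ARCHIVE_ROOT is a placeholder; point it at your external drive's mount
# point (for example /mnt/external_drive/archivebox) if you have one.
ARCHIVE_ROOT="${ARCHIVE_ROOT:-$HOME/archivebox}"

# Create the archive folder plus the data subfolder ArchiveBox writes to.
mkdir -p "$ARCHIVE_ROOT/data"

# Work from inside the archive folder for the remaining steps.
cd "$ARCHIVE_ROOT"
pwd
```

Because mkdir -p creates missing parent folders and does nothing if the folder already exists, the snippet is safe to run more than once.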
Next, download the official Docker Compose configuration file that defines how ArchiveBox should run:
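One way to fetch the file is with curl. The URL below is the short link the ArchiveBox project publishes for its Compose file as far as I know, so treat it as an assumption and confirm it against the official documentation. The fetch_compose_file name is hypothetical:

```shell
# Hypothetical helper: download the Compose file into the current folder.
# -f fails on HTTP errors, -sS hides progress but keeps error messages,
# -L follows redirects, -o names the output file.
fetch_compose_file() {
    curl -fsSL 'https://docker-compose.archivebox.io' -o docker-compose.yml
}
```

Call fetch_compose_file from inside your archive folder so that “docker-compose.yml” ends up next to the data directory.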
This configuration file is important because it sets up all the necessary components, including the web server and scheduled tasks. If you want to store your archive on an external drive instead of the Pi’s SD card (which is recommended), you’ll need to edit the “docker-compose.yml” file to point to your mounted drive location.
To do so, open the configuration file using any text editor, such as nano:
Look for the volumes section under the archivebox service. By default, it looks something like this:
We need to change ./data to reflect the full path to our external drive’s data directory. For example, if your drive is mounted at /mnt/external_drive, modify the line to look like this:
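If you would rather script the change than edit by hand, the sketch below does the same substitution with sed. It assumes the default entry has the form - ./data:/data (host path, colon, container path), and the set_data_volume name is hypothetical:

```shell
# Hypothetical helper: point the data volume at an absolute host path.
# $1 = absolute host path for the data folder
# $2 = Compose file to edit (defaults to docker-compose.yml)
set_data_volume() {
    new_path="$1"
    compose_file="${2:-docker-compose.yml}"
    # Replace only the "./data:" host side of the volume mapping; a backup
    # of the original file is kept next to it with a .bak suffix.
    # Note: the path must not contain "|" or "&", which sed treats specially.
    sed -i.bak "s|- \./data:|- ${new_path}:|" "$compose_file"
}
```

With the example mount point, set_data_volume /mnt/external_drive/archivebox/data turns - ./data:/data into - /mnt/external_drive/archivebox/data:/data and leaves entries such as ./data/sonic untouched.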
This tells Docker to store all ArchiveBox data in the “archivebox/data” directory on your external drive instead of using a relative path. Using the absolute path is important because it ensures Docker can always find your archive data, even if you run commands from different directories.
While you’re at it, you can also add the PUID and PGID environment variables to match your Pi’s user account. Find your user ID and group ID by running id -u and id -g, then add them to the environment section:
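To avoid typing the numbers by hand, the snippet below prints both entries ready to paste. It assumes the environment section uses the list form (- KEY=value). The indentation is only a guess, so match it to the rest of your file:

```shell
# Build the two lines in the list form used by a Compose "environment:"
# section, filled in with the current user's numeric IDs.
env_lines=$(printf '            - PUID=%s\n            - PGID=%s\n' "$(id -u)" "$(id -g)")

# Show them so they can be copied into docker-compose.yml.
echo "$env_lines"
```

Run it as the same user who owns the archive folder, since those are the IDs ArchiveBox should use when writing files.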
Finally, comment out or remove the sonic (faster and better searching for large collections) and novnc (allows you to set up a profile with logins to the sites you want to archive) services. The configuration of these optional services is beyond the scope of this guide, so I recommend you follow the official documentation if you’re interested in them.
The minimal working configuration should look something like this:
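For comparison, here is an illustrative minimal file with only the archivebox service. It is a sketch, not the official configuration: the image tag, port mapping, drive path, and IDs of 1000 are all assumptions to replace with your own values. It is written under a separate, hypothetical file name so it cannot overwrite your real configuration:

```shell
# Write an illustrative minimal Compose file under a separate name so it
# cannot overwrite the docker-compose.yml you downloaded and edited.
# Assumptions: image tag, port mapping, drive path, and IDs of 1000.
cat > docker-compose.minimal.yml <<'EOF'
services:
    archivebox:
        image: archivebox/archivebox:latest
        ports:
            - "8000:8000"
        volumes:
            - /mnt/external_drive/archivebox/data:/data
        environment:
            - PUID=1000
            - PGID=1000
EOF

# Print the result for a side-by-side comparison with your edited file.
cat docker-compose.minimal.yml
```

Running docker compose -f docker-compose.minimal.yml config parses the file and prints the resolved configuration, which is a quick way to catch indentation mistakes.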
Save the file and exit the editor. Now initialize your archive and create an admin user to access the web interface:
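A sketch of those two steps, assuming the service is named archivebox in your Compose file and that you run the commands from the folder containing it. The init_archive name is hypothetical, and the exact flags may differ between ArchiveBox releases:

```shell
# Hypothetical helper: run both setup steps from the folder that
# contains docker-compose.yml.
init_archive() {
    # Create the index and folder structure inside the data directory.
    docker compose run archivebox init || return 1
    # Interactively prompt for the admin username, email, and password.
    docker compose run archivebox manage createsuperuser
}
```

The second command is interactive, so run init_archive in a terminal where you can type the credentials.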
Once the initialization is completed, you can start the ArchiveBox server:
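Starting and stopping the server can be sketched like this. The function names are hypothetical, and running in the background with -d is my choice here, not a requirement:

```shell
# Hypothetical helpers; run them from the folder with docker-compose.yml.
start_archive() {
    docker compose up -d    # start the services in the background
    docker compose ps       # list the running services and their ports
}

stop_archive() {
    # Stop and remove the containers; the data directory on disk is kept.
    docker compose down
}
```

Leaving out -d keeps the server in the foreground and shows its log output directly, which can help when troubleshooting the first start.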
You can now access your ArchiveBox instance by opening a web browser and navigating to http://localhost:8000. Try it now, and you should be greeted by the ArchiveBox web interface.
To customize ArchiveBox’s behavior, you don’t need to edit configuration files directly. Instead, use the config command to modify settings. For example, I always adjust timeouts and resource limits for better performance on the Raspberry Pi:
You can also disable submitting to archive.org to speed up archiving:
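The settings discussed above could be applied as in the sketch below. TIMEOUT, MEDIA_TIMEOUT, and SAVE_ARCHIVE_DOT_ORG are ArchiveBox configuration keys, but the values are only illustrative starting points and the tune_archivebox name is hypothetical:

```shell
# Hypothetical helper: apply a few Pi-friendly settings in one go.
tune_archivebox() {
    # Give slow pages more time before an extractor gives up (seconds).
    docker compose run archivebox config --set TIMEOUT=120
    # Cap how long audio and video downloads may run (seconds).
    docker compose run archivebox config --set MEDIA_TIMEOUT=1800
    # Skip submitting each URL to archive.org.
    docker compose run archivebox config --set SAVE_ARCHIVE_DOT_ORG=False
}
```

Each call changes a single key, so you can drop any line you don’t want or adjust the numbers to suit your connection and hardware.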
All settings are automatically saved in the ArchiveBox.conf file in your data directory, and you can view current settings anytime by running:
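Viewing the configuration might look like this. The helper names are hypothetical, and the second one relies on the --get option of the config command to look up a single key:

```shell
# Hypothetical helper: print every setting with its current value.
show_config() {
    docker compose run archivebox config
}

# Hypothetical helper: print one setting, for example: show_setting TIMEOUT
show_setting() {
    docker compose run archivebox config --get "$1"
}
```

Checking a key with show_setting right after changing it is an easy way to confirm the new value was saved.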
With the basic setup complete, you can start adding content to your archive. ArchiveBox supports multiple ways to add URLs. The most straightforward one is the web interface. You simply click the Add button, paste your URLs, and click the Add URLs and archive button.
In some situations, it can be more convenient to archive via the command line. For example, to archive a single webpage, you can run:
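As a sketch, a small wrapper around the add command could look like the following. The archive_url name is hypothetical, and the optional depth argument maps to ArchiveBox’s --depth option, which accepts 0 or 1:

```shell
# Hypothetical helper: archive one URL.
# $1 = URL to archive
# $2 = optional crawl depth (0 = only the page itself, 1 = also the pages it links to)
archive_url() {
    docker compose run archivebox add --depth="${2:-0}" "$1"
}
```

For example, archive_url 'https://example.com' saves just that page, while archive_url 'https://example.com' 1 also saves the pages it links to. Quoting the URL keeps the shell from interpreting characters such as & and ?.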
Or to archive an entire list of URLs from a text file:
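The list import can be sketched as follows, assuming a plain text file with one URL per line. The file name, the placeholder URLs, and the archive_list name are all illustrative:

```shell
# Build a sample list with one URL per line (these URLs are placeholders).
cat > urls.txt <<'EOF'
https://example.com
https://example.org
EOF

# Hypothetical helper: feed a list file to ArchiveBox on standard input.
# -T disables the pseudo-terminal so the redirected file reaches the container.
# $1 = list file (defaults to urls.txt)
archive_list() {
    docker compose run -T archivebox add < "${1:-urls.txt}"
}
```

Calling archive_list with no argument imports urls.txt from the current folder, and passing a different file name imports that list instead.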
Finally, you can import from various bookmark services, including Pocket, Pinboard, or Instapaper. Please check the official wiki for detailed instructions.
Remember that your archive is only as secure as the backups you maintain. To safeguard the content you’re trying to preserve, I highly recommend implementing a reliable backup strategy with the help of the best Linux backup software to guard against data loss, power failures, or accidental deletions.
Cover image and screenshots by David Morelo.
David Morelo is a professional content writer in the technology niche, covering everything from consumer products to emerging technologies and their cross-industry application. His interest in technology started at an early age and has only grown stronger over the years.
© 2024 Uqnic Network Pte Ltd.
All rights reserved.