Another day, another successful de-platforming campaign, but when the Internet Archive purges its eponymous cache of the offending texts, a different kind of slight on one's natural rights is taking place.
In the first instance, those who own or otherwise control the machines that serve up the apparition unmistakably have the moral and legal right to kick out a guest, though naturally subject to rights under an orderly eviction. In the second case, ostensibly professional archivist organizations are choosing what portions of online history are worth preserving and which should be banished to the flames, according to some opaque and invariably ideologically driven principle.
However, a happy fact about technology is that we can use it to free ourselves from total preoccupation with political or philosophical questions. At a certain point, one chooses to exercise prudence and construct a first attempt. They see what happens, make tweaks, and see what works. (More broadly, I may suggest discovering what aids in the cultivation of true virtues within and across online communities.)
In this stage of technological development, little is achieved by a mob insistent on burning books at the local library. Campaigns run by ACLU attorneys to suppress particular books on Amazon can have only a minimal and transitory effect. These are stochastic acts, and producing fear is their effect if not their intended purpose. If demoralization, or acceptance of defeat, follows, further strictures on access or production of texts can be formalized, and the process repeats.
For all this smoke and mirrors—and to be clear, these are quite dangerous; one is the chief cause of mortality in house fires, and the other may inflict life-threatening grievous wounds—any person reasonably competent with technology can assemble and share their own library today. We knew this as early as listservs, Usenet, Kazaa, BitTorrent, and all the rest. We know it with eBooks and home media servers.
But naturally, it's true also of webpages or texts. Many of you who read this article will spend more time reading blogs, forum threads (group blogs), and Substack articles (blogs) than all the rest, perhaps combined. And in the end, which is more likely: Will you find some anon's article purged from existence, or will Clint Eastwood films be forever lost?
So, let's depart from history and philosophy and pick up the technique. As I mentioned, it's a happy accident that – despite widespread over-engineering – the typical webpage is a few megabytes in size. And if this is cleaned up and reduced to text and essential images, as many "readability" apps do, the file size may be one-tenth of that. One can comfortably store a few thousand articles – providing a few months of continuous reading time – in the space it takes to store one movie. We won't face severe physical or digital constraints here.
Is your library personal, private, or public?
We are seeing a resurgence of decentralized or semi-autonomous communities on the web that resemble the early semi-commercial blogosphere. There are analogies between the webring and something like Substack, or more abstractly, the emergent associations of the podcasting and social media world. Whatever can be said, it is far more interesting than the Big Three, so we should count our blessings.
Whether you're looking to curate links and articles with a broader community of mutuals, collect webpages to share with a couple of friends, or stockpile articles for yourself, you have a few different options.
Save a page to disk
When it comes to saving web pages – that's right, just File and Save – you obtain something substantially incomplete. Many browsers won't preserve media, layout, or style from the original. Others may generate a bunch of files and attempt to represent locally some tiny slice of the web. Safari produces exports of the webarchive format, embedding all of a webpage's key assets into one file. Another option is to save webpages as PDFs or to use specialized programs to extract text.
You'll end up with a number of files, which you can organize sensibly into folders, and your operating system's search functionality is good enough for you to locate what you need. Maybe this is all you need.
Host text files
Plain text has its benefits, if you're not afraid of copy and paste. A savvy user can host them quite easily with any web server, or use static hosting like GitHub Pages to share them across the web.
Large archives house text-only manifestos, essays, and other delusions from creative networks predating the blogosphere: email chains, BBS, newsgroups, listservs, etc. Some newer primitives have entered the text file enthusiast space; today you can join a decentralized microblogging network driven by text or even run your own. Text files are primitive in creative computing, and you can develop your projects around hosting them on your own server or use them to get into artificial intelligence and natural language processing. Don't discount them.
Use an indie bookmarking site
If you're looking for something that goes beyond a lonely folder on your drive, involves less work, and you're comfortable delegating control to a small outfit, Pinboard is a simple paid bookmarking tool that will allow you to assemble a public or private collection of links. For a few extra bucks, they will also save copies of the sites you bookmark.
Other options include Pocket and Instapaper, among many many others. Many of these services blur the line with the quaint RSS feed reader, worthy of its own tools and article.
Launch your own web archive
Thanks to open-source software, you can launch your own bookmarking site with just a bit more work. Shiori is an excellent option here, and you can run it on your own infrastructure in minutes. If you're comfortable with Docker, this guide will help you get off and running and web-accessible.
You can also run it locally and try it out in a few steps. Download the Shiori binary for your platform from Github, move it to your applications folder, or update your system path.
Check out what you can do with Shiori on the command line, which includes importing URLs and exporting bookmark data, by running:
shiori -h
Then, run the web application locally with the command:
shiori serve
Once properly started, you can visit it locally at http://localhost:8080 , and log in with the default credentials:
username: shiori
password: gopher
This will give you a sense of the interface, and it's fully functional. However, since you're running this web application locally, it won't be accessible to others, and it won't be accessible from your browser when the program isn't running.
In practice, you should follow a similar procedure on your server to enable yourself and others to access this application and keep it running day and night.
When you log in, you'll be met with a simple UI displaying your collection of bookmarks:
You can add new URLs with a click and archive these pages by default, resulting in two copies of the webpage being saved to your server: A text copy processed to improve readability and an archive of HTML and assets replicating the original site.
Settings are kept fairly minimal. Most parameters for bookmarks, including title, excerpt, and tags, will be pulled in automatically, though they can be manually adjusted. The application has a list and tile view, and you can add other users to your site, granting read-only access if you like.
Developers can leverage Shiori for their own creations using its open APIs, and an open source browser extension for Firefox and Chrome is also available for convenient use.
From there, you can create and share accounts with friends or extend Shiori through its API to authenticate your mutuals and give them access. The web app is now yours to make your own.
Seek even more control?
ArchiveBox is also open-source software but enables greater configuration and is more suitable for larger collections shared across vast numbers of users. Think of it as powering less a library of archived bookmarks but your own Internet Archive or WayBack Machine. You can set it to continuously archive pages and feeds as HTML files, PDFs, image screenshots, and more, including embedded media. This is option is an excellent choice for power users and developers, and is well-documented.
- YouTube youtu.be