ArchiveBox

ArchiveBox looks like something I’m going to have to find the time to look into:

ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more)

You can use it to preserve access to websites you care about by storing them locally offline. ArchiveBox imports lists of URLs, renders the pages in a headless, authenticated, user-scriptable browser, and then archives the content in multiple redundant common formats (HTML, PDF, PNG, WARC) that will last long after the originals disappear off the internet. It automatically extracts assets and media from pages and saves them in easily-accessible folders, with out-of-the-box support for extracting git repositories, audio, video, subtitles, images, PDFs, and more.

I currently pay to have my Pinboard account archive pages I bookmark, but as a matter of principle I like the idea of having a toolset that lets me save and browse local copies of stuff.[1] At the moment I tend to grab stuff I think I’ll want to read and refer to later and either chuck the URL at Pinboard or use Evernote’s ability to grab a page’s content and file it away safely, but it’d be nice to have another option for getting at stuff that caught my eye.

[Via Four short links]

  1. I fully recognise that, in principle, once you publish stuff online it should be left at that same URL pretty much in perpetuity, but as someone who hasn’t yet got round to republishing huge swathes of the stuff I’ve published at Sore Eyes using different content management systems over the years, I’m the last person to try to shame anyone on this front. But then, the simple reality is that this site is at best a very low-profile weblog, so the vast majority of what I’ve published here consists of pointers to stuff that other people have published. Whatever tiny importance my links might have had in drawing attention to that content is ancient history now.
