If anyone is interested, I wrote a long blog post where I analyzed all the various ways of saving HTML pages into a single file, starting back in the 90s. It'll answer a lot of questions asked in this thread (MHTML, SingleFile, web archive, etc.)
I always ship single file pages whenever possible. My original reasoning for this was that you should be able to press view source and see everything. (It follows that pages should be reasonably small and readable.)
Hm, very interesting, especially for bookmarking/archiving.
simonw ·235 days ago
Wrote up a TIL about this with more details: https://til.simonwillison.net/chrome/headless
My own https://shot-scraper.datasette.io/ tool (which uses headless Playwright Chromium under the hood) has a command for this too:
But it's neat that you can do it with just Google Chrome installed and nothing else.
russellbeattie ·235 days ago
https://www.russellbeattie.com/notes/posts/the-decades-long-...
andai ·235 days ago
An unexpected side effect is that they are self contained. You can download pages, drag them onto a browser to use them offline, or reupload them.
I used to author the whole HTML file at once, but lately I am fond of TypeScript, and made a simple build system to let me write games in TS and have them built to one HTML file. (The sprites are base64 encoded.)
On that note, it seems (there is a proposal) that browsers will eventually get support for TypeScript syntax, at which point I won't need a compiler / build step anymore. (Sadly they won't do type checking, but hey... baby steps!)
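For anyone wondering what such a build step could look like, here is a minimal sketch. It is not the build system described above: the names `toBase64`, `Sprite`, `buildSingleFile` and the `SPRITES` global are all hypothetical, and it assumes the TypeScript has already been compiled to a JavaScript string by some other tool. It inlines the script and turns each sprite into a base64 `data:` URI so the result is one HTML file.

```typescript
// Hypothetical single-file bundler sketch; all names here are illustrative.
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/** Base64-encode raw bytes (standard alphabet, '=' padding). */
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = has1 ? bytes[i + 1] : 0;
    const b2 = has2 ? bytes[i + 2] : 0;
    out += B64.charAt(b0 >> 2);
    out += B64.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    out += has1 ? B64.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += has2 ? B64.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

interface Sprite {
  name: string;      // key the game code uses to look the sprite up
  mime: string;      // e.g. "image/png"
  bytes: Uint8Array; // raw image file contents
}

/** Minimal escaping for text placed inside <title>. */
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

/** Build one self-contained HTML document from compiled JS and sprites. */
function buildSingleFile(title: string, compiledJs: string, sprites: Sprite[]): string {
  const spriteTable: Record<string, string> = {};
  for (const s of sprites) {
    spriteTable[s.name] = `data:${s.mime};base64,${toBase64(s.bytes)}`;
  }
  // Expose the sprites to the game code as a global lookup table, then
  // neutralise any "</script" in the inlined source so it cannot close the tag early.
  const script = `const SPRITES = ${JSON.stringify(spriteTable)};\n${compiledJs}`
    .replace(/<\/(script)/gi, "<\\/$1");
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8">',
    `<title>${escapeHtml(title)}</title></head>`,
    '<body><canvas id="game"></canvas>',
    `<script>${script}</script>`,
    "</body></html>",
  ].join("\n");
}

// Usage: three arbitrary bytes stand in for a real PNG file.
const html = buildSingleFile(
  "Demo",
  'document.title = Object.keys(SPRITES).join(",");',
  [{ name: "dot", mime: "image/png", bytes: new Uint8Array([77, 97, 110]) }],
);
```

In a real build you would probably use the platform's own base64 routine instead of the hand-rolled one; it is written out here only so the sketch has no runtime dependencies.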
lopkeny12ko ·235 days ago
https://www.npmjs.com/package/single-file-cli
jchook ·235 days ago
I'm curious, why not use the MHTML standard for this?
- AFAIK data URIs have practical length limits that vary per browser. MHTML would enable bundling larger files such as video.
- MHTML would avoid transforming meaningful relative URLs into opaque data URIs in the HTML attributes.
- MHTML is supported by most major browsers in some way (either natively in Chrome or with an extension in Safari, etc).
- MIME defines a standard for putting pure binary data into document parts, so it could avoid the 33% size inflation from base64 encoding. That said, I do not know if the `binary` Content-Transfer-Encoding is widely supported.
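To put a number on the base64 point, here is a small arithmetic sketch. The function names are hypothetical. It assumes standard padded base64 (4 output characters per 3 input bytes) and, for the MIME case, lines of at most 76 characters that each end in CRLF, including the last one.

```typescript
/** Characters of padded base64 needed for `n` raw bytes. */
function base64Length(n: number): number {
  return 4 * Math.ceil(n / 3);
}

/** Same, plus a 2-byte CRLF after every line of up to `lineLength` characters. */
function mimeBase64Length(n: number, lineLength: number = 76): number {
  const chars = base64Length(n);
  const lines = Math.ceil(chars / lineLength);
  return chars + 2 * lines;
}

/** Fractional size increase of an encoded form over the raw size. */
function overhead(encoded: number, raw: number): number {
  return encoded / raw - 1;
}

const rawBytes = 3000000; // a 3 MB asset, chosen only for round numbers
const inDataUri = base64Length(rawBytes);       // 4000000
const inMimePart = mimeBase64Length(rawBytes);  // 4105264

console.log((overhead(inDataUri, rawBytes) * 100).toFixed(1) + "%");  // 33.3%
console.log((overhead(inMimePart, rawBytes) * 100).toFixed(1) + "%"); // 36.8%
```

So under these assumptions, base64 inside a MIME part costs slightly more than inside a data URI because of the line breaks. The saving mentioned above would only materialise if the parts were stored with the `binary` encoding instead.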