Include brotli #34

Open
v-python opened this issue Jun 6, 2020 · 38 comments

@v-python

v-python commented Jun 6, 2020

It would be good to support all the compression, or at least decompression, schemes supported by servers and browsers. This looks like a great step toward not having to write compression and decompression in JavaScript.

My use case would be to download a bunch of brotli files in an archive (somewhat like tar), split them into individual files (unarchived, but not yet decompressed) in the service worker cache, and return them to the browser as requested.
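
A minimal sketch of that flow, assuming a future 'br' token in DecompressionStream (the cache name is a placeholder, and the archive is assumed to already be split into individually cached, still-compressed responses):

```js
// Hypothetical service worker: serve individually cached,
// still-brotli-compressed responses, decompressing natively on the
// way out. Assumes DecompressionStream accepts a 'br' token.
self.addEventListener('fetch', (event) => {
  event.respondWith((async () => {
    const cache = await caches.open('br-assets'); // placeholder cache name
    const cached = await cache.match(event.request);
    if (!cached) return fetch(event.request);
    const body = cached.body.pipeThrough(new DecompressionStream('br'));
    return new Response(body, cached); // reuse status/headers of the cached entry
  })());
});
```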

@eustas

eustas commented Jul 31, 2020

Mentioned in https://github.com/WICG/compression/blob/master/explainer.md#future-work

@prajaybasu

This needs to be higher up the priority list.

In my case, I need to download a large binary/JSON data file, and Brotli offers significant savings compared to Zopfli. I cannot use a CDN because the file is either too large or the CDN's on-the-fly compression is not suitable for it.

Also, Brotli decoding already ships in most browsers for Content-Encoding: br.

@ricea
Collaborator

ricea commented Jan 22, 2021

@prajaybasu As far as I know, browsers are only shipping the decompression dictionary. Brotli has a separate dictionary needed for compression, which would significantly increase the size of the browser.

It's possible to add Brotli only to DecompressionStream and not to CompressionStream, but I feel it would be confusing.

So currently I am adopting a wait-and-see approach.

@v-python
Author

It will be more confusing to realize that one browser-supported decompression algorithm is not available, and to have to locate a JavaScript decompressor for it and figure out how to use it to work around the fact that it isn't built in. Especially when it is the best compression for the type of data being sent from the server.

@lucacasonato
Member

@ricea Deno would be interested in a brotli option for DecompressionStream and CompressionStream. Even if it is not immediately standardized due to browser pushback, could we at least agree on the label that brotli will use? That way an addition in Deno will not be incompatible with a future browser implementation.

The two real contenders would be br and brotli, I think. The former matches the value brotli has in Content-Encoding, but the latter is clearer to novices.

@ricea
Collaborator

ricea commented Feb 4, 2022

I've been matching the content-encoding header as a rule, so I'm inclined to choose br. Maybe we could have brotli as an alias? I'm not sure how I feel about that.

IIRC, Brotli has a number of customisable settings which we would ultimately want to expose through the API, but as with DEFLATE we can add them later.
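
For illustration, a sketch of the token in use, assuming br is adopted (the source stream is just a placeholder for real brotli bytes):

```js
// Sketch: the format token mirrors the Content-Encoding value 'br'.
const source = new Blob(['placeholder']).stream(); // stand-in for brotli data
const decompressed = source.pipeThrough(new DecompressionStream('br'));
// If a 'brotli' alias were added, it would behave identically:
// source.pipeThrough(new DecompressionStream('brotli'));
```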

@lucacasonato
Member

Sounds good. I'll throw up a tentative spec PR so we have something a little more formal.

@jasnell

jasnell commented Feb 8, 2022

Both Node.js and Cloudflare Workers would also be interested in a brotli option for these. Looking forward to the spec PR.

@devongovett

This would be great. I currently maintain brotli.js, a JavaScript-based brotli decompressor. Personally, I use it to decode WOFF2 fonts as part of fontkit, but it is very widely used by many others as well (1.8M downloads per week on npm). It works well, but the library itself is quite heavy in terms of JS payload size, so it would be much better if I could use the native brotli decoders built into browsers instead.
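
As a sketch of that swap, a helper could prefer the native decoder and fall back to the JS one. The 'br' token is assumed, and decodeBrotli is a hypothetical stand-in for brotli.js's decode function:

```js
// Sketch: prefer a native brotli decoder, fall back to a JS implementation.
// decodeBrotli is a hypothetical stand-in for brotli.js's decode function.
async function decompressBrotli(bytes) {
  try {
    const native = new Blob([bytes])
      .stream()
      .pipeThrough(new DecompressionStream('br')); // assumed future token
    return new Uint8Array(await new Response(native).arrayBuffer());
  } catch {
    // Constructor throws for unsupported format tokens, so we land here.
    return decodeBrotli(bytes); // JS fallback, e.g. from brotli.js
  }
}
```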

@eustas

eustas commented Jun 12, 2022

Browser-wise, it is better to add it as a wrapper around the native library, as is done for gzip. That way its footprint will be nearly zero, since the decoder library is already linked into the browser.
For a pure JS decoder (to enable polyfills) we also have a Java-to-JS-transpiled decoder. For the encoder, it would be nice to use the native library; if not, I agree that WASM is a viable option.

@stwrt
Contributor

stwrt commented Sep 8, 2022

I discussed with some colleagues (who also work on WebKit), and we generally support the idea of adding Brotli to the Compression spec.

@paralin

paralin commented May 10, 2024

I am decompressing some files using DecompressionStream, and being able to decompress brotli would save a lot of space. The browser already decodes Content-Encoding: br, so it should be possible to expose that decoder through DecompressionStream as well.
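
A sketch of what that could look like, assuming a 'br' token is added (the path is a placeholder; the server stores the file raw and sends no Content-Encoding header, so the browser does not also decode it at the network layer):

```js
// Sketch: decode a pre-compressed file in the page, assuming a future
// 'br' format token in DecompressionStream.
const res = await fetch('/data/records.json.br'); // hypothetical asset
const data = await new Response(
  res.body.pipeThrough(new DecompressionStream('br'))
).json();
```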

@annevk
Member

annevk commented Aug 14, 2024

@ricea @saschanaz @smaug---- what do you all think about adding Brotli now? WebKit would like to go for it. @lucacasonato still interested in writing a PR?

@ricea
Collaborator

ricea commented Aug 14, 2024

I'm coming around to the idea of shipping decompression only. Yes, it will confuse people but it provides value.

@annevk
Member

annevk commented Aug 14, 2024

@ricea we would like to add both compression and decompression.

@ricea
Collaborator

ricea commented Aug 15, 2024

Enabling brotli for compression is difficult for Blink because we don't currently ship the compression side of the library and it has a 190KB binary size cost just for the built-in dictionary. Adding anything over 16KB to Chromium requires a good justification.

It's been suggested that we could ship with the built-in dictionary disabled, but that would provide an unsatisfactory experience for our users.

If we can ship decompression first we can hopefully demonstrate demand that will justify the large binary size increase.

@Pomax

Pomax commented Sep 4, 2024

It should be fairly reasonable to argue that those 190KB are fine though? Even if it's just to enable client-side font generation: right now any website that needs a brotli encoder has to load half a megabyte of library, and thanks to per-site caching, users will download such a library at least once for each website that uses one. Realistically, thanks to modern bundling practices, they will download it multiple times for the same website, because folks end up bundling the encoder into their app bundles and then push out updated bundles.

Saving many megabytes in both bandwidth and JS load/parse time for all Blink and Blink-derivative users, by having Blink add 190KB to the binary, sounds like a fairly big win for everyone.

@paralin

paralin commented Sep 4, 2024

190KB seems well worth it if not entirely negligible.

Edit: I personally only care about decompression :)

@Timmmm

Timmmm commented Sep 7, 2024

it has a 190KB binary size cost just for the built-in dictionary

Surely the dictionary has to be present for decompression already?

@pkasting

pkasting commented Sep 7, 2024

it has a 190KB binary size cost just for the built-in dictionary

Surely the dictionary has to be present for decompression already?

#34 (comment)

@pkasting

pkasting commented Sep 7, 2024

190KB seems well worth it if not entirely negligible.

As noted, we define anything under 16KB as negligible. I spent several months working on changes to improve internal APIs and then did not land them because their binary size cost was 20KB. 190KB is a very large chunk. It's not so large that it could never be paid, but it's not something to be given away on a nice-to-have.

Shipping decompression first and re-evaluating would make the tradeoffs clearer.

@lxgr

lxgr commented Sep 8, 2024

Brotli has a separate dictionary needed for compression, which would significantly increase the size of the browser.

Isn't it the same dictionary used for both compression and decompression? If so, how would it make sense to include only decompression, but not compression?

The only large-ish encoder-only data I could find is the dictionary hash table (https://github.com/google/brotli/blob/master/c/enc/dictionary_hash.c), but that looks like it can be generated pretty easily from the dictionary itself, and otherwise also compresses pretty well itself (to <30 KB with gzip, for example).

So where is this 190KB number coming from? Or is decompressing/building the dictionary at install time or at runtime considered infeasible too?

@Timmmm

Timmmm commented Sep 8, 2024

Yeah, a dictionary is a dictionary, surely? Presumably the extra size for compression is because the dictionary is indexed in some way to improve compression speed? Couldn't that index generation be done at runtime, on demand?

@pmarreck

pmarreck commented Sep 8, 2024

@ricea :

Adding anything over 16KB to Chromium requires a good justification.

"There are likely petabytes of waste going across the wire today b/c someone was worried about < 200kb install size while also insisting that compression must be symmetrical lest it confuse ppl. Admittedly I'm reading this issue blind so I might be missing other context, but this feels very penny-wise, pound-foolish. " (from a Reddit comment in a thread about this)

@saschanaz
Member

Mozilla is open to adding decompression support only. There certainly can be developer confusion, but as long as it's feature-detectable we think it's manageable.

(let supportsBrotli = true; try { new DecompressionStream("br") } catch { supportsBrotli = false } is a bit too verbose; I wonder if we should have a .supports() function.)
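
A sketch of that detection pattern today, wrapped in a helper (the .supports() method mentioned above remains hypothetical):

```js
// Sketch: feature-detect a format by constructing a stream; the
// constructor throws a TypeError for format tokens it doesn't support.
function supportsDecompression(format) {
  try {
    new DecompressionStream(format);
    return true;
  } catch {
    return false;
  }
}

if (supportsDecompression('br')) {
  // use the native decoder
} else {
  // fall back to a JS or WASM polyfill
}
```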

@lxgr

lxgr commented Sep 10, 2024

Another thought: Is the encoding dictionary even mandatory for compression? As far as I can tell from the spec, only decoders need to include it.

If that's really the point of contention, presumably it should be possible to just ignore it, at the expense of slightly worse compression performance for some small files.

@ricea
Collaborator

ricea commented Sep 11, 2024

The built-in compression dictionary is not mandatory, but it makes a huge difference for inputs of a few kilobytes or less. Predictability is very important to developers, and if two browsers both claim to support Brotli, developers should be able to expect comparable levels of compression from them.

@pmeenan

pmeenan commented Sep 11, 2024

FWIW, the compression dictionary is mostly seeded with web-facing content that was expected to be compressed in the content-encoding use case (HTML, JS, CSS, etc.). It's not immediately obvious that the use cases for CompressionStream (at least within a browser) would benefit as much from having the default dictionary available (or, optionally, each browser could choose whether to include the dictionary based on the device it is being installed on).

It wouldn't necessarily need to be detectable or something the caller cares about; they'd just get (possibly) slightly less compression in some cases than in others, but the resulting stream would still be smaller than gzip and compatible with existing decoders.

Bonus points if the API is considered for adding compression/decompression options like compression level, large window (at which point brotli would be more appropriate than br, since br implies a specific window-size config), and user-provided dictionaries.

@ricea
Collaborator

ricea commented Sep 12, 2024

Bonus points if the API is considered for adding compression/decompression options like compression level, large window (at which point brotli would be more appropriate than br, since br implies a specific window-size config), and user-provided dictionaries.

I want to design the option bag separately, as many of the options will be common between different compression algorithms.
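
For concreteness, one hypothetical shape such a shared option bag might take; nothing below is specified, and every option name is illustrative only:

```js
// Purely illustrative: no option bag exists in the spec yet, and every
// option name below is hypothetical.
const dict = new Uint8Array(0); // placeholder user-provided dictionary
const cs = new CompressionStream('br', {
  level: 5,          // hypothetical: a knob common to all algorithms
  largeWindow: true, // hypothetical: brotli-specific setting
  dictionary: dict,  // hypothetical: custom dictionary bytes
});
```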

@pmeenan

pmeenan commented Sep 12, 2024

I want to design the option bag separately, as many of the options will be common between different compression algorithms.

Sorry, yes. I just wouldn't want to add brotli without also addressing the option bag at the same time, at a minimum for compression level; it doesn't make much sense to provide algorithm control without also providing some level of control over the settings.

@jesup

jesup commented Sep 27, 2024

Personally I'm in favor of including compression. However, we do have other forms of compression available. It did sound in the breakout session like there are some use cases (see above as well) that explicitly need brotli compression, and without it they will need to download a polyfill, which is expensive.

@jesup

jesup commented Sep 27, 2024

Reading over the notes, I think my position is now: let's get decompression for brotli (and zstd) in, and have a separate issue for adding encoder support, since there isn't consensus yet that it's needed and worth the cost. Google clearly has issues with the cost of including it currently. Perhaps we can get the cost down, as mentioned in the meeting.
https://www.w3.org/2024/09/25-brotli-irc

@nektro

nektro commented Sep 27, 2024

Bun would be interested in supporting this, since we already ship brotli support internally for other APIs.

@Timmmm

Timmmm commented Sep 28, 2024

I think it's still unclear what the costs are. That IRC chat mentioned 190KB for the dictionary hash (the dictionary itself is already present for decompression), and also maybe 400KB for code (but was that about zstd? not super clear). It would be worth answering:

  1. What is the code size increase due to the extra compression code (not including any dictionary hashes)?
  2. Can the dictionary hash be generated from the decompression dictionary at runtime? Is that acceptable?
  3. Can the dictionary hash be stored on disk and loaded on demand? @ricea said "Adding anything over 16KB to Chromium requires a good justification." but I'm not sure if he meant download size, disk space, code size or RAM usage.

I did have a look at the dictionary hash (which is only 64KB, by the way) to see how long it takes to generate at runtime. Unfortunately, it seems the code that generated it in the first place was never open-sourced; at least I couldn't find it. It's possibly the same as this, but I'm not sure: that code is only used if BROTLI_EXPERIMENTAL is defined.

@ricea
Collaborator

ricea commented Sep 28, 2024

You can see the breakdown here (drill into third_party/brotli): https://chrome-supersize.firebaseapp.com/viewer.html?load_url=https%3A%2F%2Fstorage.googleapis.com%2Fchromium-binary-size-trybot-results%2Fandroid-binary-size%2F2024%2F09%2F26%2F2036235%2Fsupersize_diff.sizediff#focus=60

It appears to contain about 280KB of static data and 220KB of code.

Creating the dictionary hash at runtime might be acceptable, as long as it doesn't take a ridiculously long time.

It's mainly disk space we're worried about, so loading on demand doesn't help.

To summarise the discussion at TPAC 2024, we are going to add Brotli to the standard, but Chrome still won't be able to ship it for a while. It appears that Safari is ready to ship and Firefox is also positive on shipping.

@Pomax

Pomax commented Sep 28, 2024

I have to admit I'm a little surprised that Google, who invented Brotli, would have reservations about including it. They invented it for a reason, and being compression for the web is literally it =)

(Not to mention the current Chrome footprint: adding 500KB to a 100+MB product, to save many times that over the lifetime of the browser, using your own technology that you designed for exactly this, seems like a product-manager no-brainer.)

@AlexanderOMara

Maybe a silly question, but: What if the Brotli compression dictionary was compressed with Brotli, which could be decompressed as needed using the decompression dictionary?

@ricea
Collaborator

ricea commented Oct 22, 2024

Maybe a silly question, but: What if the Brotli compression dictionary was compressed with Brotli, which could be decompressed as needed using the decompression dictionary?

I tried this once a long time ago, but the results were disappointing. I don't know why, as it does seem like it ought to work.
