Include brotli #34
This needs to be higher up in the priorities. In my case, I need to download a large binary/JSON data file, and Brotli's compression offers significant savings compared to Zopfli. I cannot use a CDN because the file is either too large or the CDN's on-the-fly compression is not suitable for the file. Also, Brotli is already included in most browsers as part of H2.
@prajaybasu As far as I know, browsers are only shipping the decompression dictionary. Brotli has a separate dictionary needed for compression, which would significantly increase the size of the browser. It's possible to add Brotli only to DecompressionStream and not to CompressionStream, but I feel it would be confusing. So currently I am adopting a wait-and-see approach.
It will be more confusing to realize that a browser-supported decompression algorithm is not available, have to locate a JavaScript decompressor for it, and figure out how to use it, just to work around the fact that it isn't built in. Especially when it is the best compression for the type of data being sent from the server.
@ricea Deno would be interested in a brotli option for CompressionStream and DecompressionStream. The two real contenders for brotli would be …
I've been matching the … IIRC, Brotli has a number of customisable settings which we would ultimately want to expose through the API, but as with DEFLATE we can add them later.
Sounds good. I'll throw up a tentative spec PR so we have something a little more formal.
Both Node.js and Cloudflare Workers would also be interested in a brotli option for these. Looking forward to the spec PR.
This would be great. I currently maintain brotli.js, a JavaScript-based brotli decompressor. Personally, I use it to decode WOFF2 fonts as part of fontkit, but it is very widely used (1.8M downloads per week on npm) by many others as well. It works well, but the library itself is quite heavy in terms of JS payload size, so it would be much better if I could use the native brotli decoders built into browsers instead.
Browser-wise it is better to add it as a native library wrapper, as is done for gzip. This way, its footprint will be nearly zero: the decoder library is already linked into the browser.
I discussed with some colleagues (who also work on WebKit), and we generally support the idea of adding Brotli to the Compression spec.
I am decompressing some files using DecompressionStream, and being able to decompress brotli would save a lot of space. The browser already decodes Content-Encoding: br compression, so it should be possible to also expose this decoder in DecompressionStream.
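To make the ask concrete, here's a minimal sketch of that flow, assuming a hypothetical 'brotli' format identifier gets added to DecompressionStream (the exact name has not been settled in this thread):

```js
// Hypothetical: 'brotli' as a DecompressionStream format is an assumption,
// not something the Compression Streams spec defines today.
const response = await fetch('/assets/data.json.br'); // pre-compressed, served without Content-Encoding
const stream = response.body.pipeThrough(new DecompressionStream('brotli'));
const data = JSON.parse(await new Response(stream).text());
```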
@ricea @saschanaz @smaug---- what do you all think about adding Brotli now? WebKit would like to go for it. @lucacasonato still interested in writing a PR?
I'm coming around to the idea of shipping decompression only. Yes, it will confuse people, but it provides value.
@ricea we would like to add both compression and decompression.
Enabling brotli for compression is difficult for Blink because we don't currently ship the compression side of the library, and it has a 190KB binary size cost just for the built-in dictionary. Adding anything over 16KB to Chromium requires a good justification. It's been suggested that we could ship with the built-in dictionary disabled, but that would provide an unsatisfactory experience for our users. If we can ship decompression first, we can hopefully demonstrate demand that will justify the large binary size increase.
It should be fairly reasonable to argue that those 190KB are fine though? Even if it's just to enable client-side font generation: right now any website that needs a brotli encoder has to load half a megabyte of library, which, thanks to per-site caching, means users will need to download those solutions at least once for each website that uses one. Realistically, thanks to modern bundling practices, they will download it multiple times for the same website, because folks end up bundling the encoder into their app bundles and then push out updated bundles. Saving many megabytes for all Blink and Blink-derivative users, both in bandwidth and JS load/parse time, by having Blink add 190KB to the binary sounds like a fairly big win for everyone.
190KB seems well worth it if not entirely negligible. Edit: I personally only care about decompression :)
Surely the dictionary has to be present for decompression already?
As noted, we define <16kB to be negligible. I spent several months working on changes to improve internal APIs that I then did not land because their binary size cost was 20k. 190k is a very large chunk. It's not so large it could never be paid, but it's not something to be just given away on a nice-to-have. Shipping decompression first and re-evaluating would make the tradeoffs clearer.
Isn't it the same dictionary used for both compression and decompression? If so, how would it make sense to include only decompression, but not compression? The only large-ish encoder-only data I could find is the dictionary hash table (https://github.com/google/brotli/blob/master/c/enc/dictionary_hash.c), but that looks like it can be generated pretty easily from the dictionary itself, and otherwise also compresses pretty well itself (to <30 KB with gzip, for example). So where is this 190 KB number coming from? Or is decompressing/building the dictionary at install or runtime considered infeasible too?
Yeah, a dictionary is a dictionary, surely? Presumably the extra size for compression is because it is indexed in some way to improve compression speed? Could that index generation not be done at runtime on demand?
@ricea:
"There are likely petabytes of waste going across the wire today b/c someone was worried about < 200kb install size while also insisting that compression must be symmetrical lest it confuse ppl. Admittedly I'm reading this issue blind so I might be missing other context, but this feels very penny-wise, pound-foolish." (from a Reddit comment in a thread about this)
Mozilla is open to adding decompression support only. There certainly can be developer confusion, but as long as it's feature detectable we think it's manageable.
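For illustration, a sketch of the kind of feature detection this relies on; the 'brotli' format name is an assumption, and the fallback library is just an example:

```js
// The DecompressionStream constructor throws a TypeError for unsupported
// formats, so support can be probed up front.
function supportsBrotliDecompression() {
  try {
    new DecompressionStream('brotli'); // hypothetical format name
    return true;
  } catch {
    return false;
  }
}

// Callers can then fall back to a JS/Wasm decoder (e.g. brotli.js) when needed.
```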
Another thought: Is the encoding dictionary even mandatory for compression? As far as I can tell from the spec, only decoders need to include it. If that's really the point of contention, presumably it should be possible to just ignore it, at the expense of slightly worse compression performance for some small files.
The built-in compression dictionary is not mandatory, but it makes a huge difference for input of a few kilobytes or less. Predictability is very important to developers, and if two browsers both claim to support Brotli, developers should be able to expect comparable levels of compression from them.
FWIW, the compression dictionary is mostly seeded with web-facing content that was expected to be compressed for the content-encoding use case (HTML, JS, CSS, etc). It's not immediately obvious that the use cases for CompressionStream (at least within a browser) would benefit as much from having the default dictionary available (or, optionally, each browser could choose to include the dictionary or not based on the device it is being installed on). It wouldn't necessarily need to be detectable or something that the caller cares about; they'd just get (possibly) slightly less compression in some cases than they might in others, but the resulting stream would still be smaller than gzip and compatible with existing decoders. Bonus points if the API is considered for adding support for compression/decompression options like compression level and large window (at which point …)
I want to design the option bag separately, as many of the options will be common between different compression algorithms.
Sorry, yes, I just wouldn't want to add brotli without also addressing the option bag at the same time. At a minimum for compression level: it doesn't make much sense to provide algorithm control without also providing some level of control over the settings.
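Purely as a strawman (none of these option names are specified anywhere), the kind of option bag being discussed might look like this:

```js
// Illustrative only: 'brotli', 'level' and 'largeWindow' are all hypothetical.
const compressor = new CompressionStream('brotli', {
  level: 5,          // compression level knob
  largeWindow: false // large-window mode mentioned above
});
```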
Personally I'm in favor of including compression. However, we do have other forms of compression available. It did sound in the breakout that there are some use cases (see above as well) that explicitly need brotli compression, and without it they will need to download a polyfill, which is expensive.
Reading over the notes, I think my position is now: … include or not. Perhaps: let's get decompression for brotli (and zstd) in, and have a separate issue for adding encoder support, since there isn't consensus yet that it's needed and worth the cost. Google clearly has concerns about the cost of including it currently. Perhaps we can get the cost down as mentioned in the meeting.
Bun would be interested in supporting this, since we already ship brotli support internally for other APIs.
I think it's still unclear what the costs are. That IRC chat mentioned 190kB for the dictionary hash (the dictionary itself is already present for decompression), and also 400kB maybe for code (but was that talking about zstd? Not super clear). It would be worth answering: …
I did have a look at the dictionary hash (which is only 64kB, btw) to see how long it takes to generate it at runtime. Unfortunately it seems like the code to generate it in the first place was never open sourced; at least I couldn't find it. It's possibly the same as this, but I dunno; that's only used if …
You can see the breakdown here (drill into third_party/brotli): https://chrome-supersize.firebaseapp.com/viewer.html?load_url=https%3A%2F%2Fstorage.googleapis.com%2Fchromium-binary-size-trybot-results%2Fandroid-binary-size%2F2024%2F09%2F26%2F2036235%2Fsupersize_diff.sizediff#focus=60 It appears to contain about 280KB of static data and 220KB of code. Creating the dictionary hash at runtime might be acceptable, as long as it doesn't take a ridiculously long time. It's mainly disk space we're worried about, so loading on demand doesn't help.
To summarise the discussion at TPAC 2024: we are going to add Brotli to the standard, but Chrome still won't be able to ship it for a while. It appears that Safari is ready to ship and Firefox is also positive on shipping.
I have to admit I'm a little surprised that Google, who invented Brotli, would have reservations about including it. They invented it for a reason, and being compression for the web is literally it =) (Not to mention the current Chrome footprint: 500KB extra to save many times that over the lifetime of the browser, added to a 100+MB product, when it's your own technology and this is the product you literally designed the thing for, seems like a product-manager no-brainer.)
Maybe a silly question, but: What if the Brotli compression dictionary was compressed with Brotli, which could be decompressed as needed using the decompression dictionary?
I tried this once a long time ago, but the results were disappointing. I don't know why, as it does seem like it ought to work.
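For anyone who wants to repeat that experiment, a minimal Node.js sketch; dictionary.bin is a hypothetical file containing the raw built-in dictionary bytes extracted from the brotli sources:

```js
const fs = require('node:fs');
const zlib = require('node:zlib');

// Hypothetical input: the raw built-in brotli dictionary bytes.
const dict = fs.readFileSync('dictionary.bin');

// Compress the dictionary with brotli (max quality) and with gzip for comparison.
const br = zlib.brotliCompressSync(dict, {
  params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 },
});
const gz = zlib.gzipSync(dict, { level: 9 });

console.log(`raw: ${dict.length}  brotli: ${br.length}  gzip: ${gz.length}`);
```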
It would be good to support all the compression, or at least decompression, schemes supported by servers and browsers. This looks like a great step forward toward not having to write compression and decompression in JavaScript.
My use case would be to download a bunch of brotli files in an archive (somewhat like tar), split them into individual files (unarchive, but don't decompress yet) in the service-worker cache, and return them to the browser as called for.
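A rough sketch of the lazy-decompression half of that use case, again assuming a hypothetical 'brotli' DecompressionStream format; splitting the archive into still-compressed cache entries at install time is left out:

```js
// Service worker: cache entries are individual, still-brotli-compressed files.
self.addEventListener('fetch', (event) => {
  event.respondWith((async () => {
    const cache = await caches.open('assets-v1');
    const cached = await cache.match(event.request);
    if (!cached) return fetch(event.request);
    // Decompress only when the file is actually requested.
    const body = cached.body.pipeThrough(new DecompressionStream('brotli')); // hypothetical format
    return new Response(body, { headers: cached.headers });
  })());
});
```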