-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
THRIFT-5828: reduce over-allocation in Go binary protocol #3057
base: master
Are you sure you want to change the base?
Conversation
buf := new(bytes.Buffer) | ||
_, err := io.CopyN(buf, trans, int64(size)) | ||
return buf.Bytes(), err | ||
const readLimit = 10 * 1024 * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the stated goal here is to optimize for ordinary cases (with size < readLimit), why not just do io.CopyN
with bytes.Buffer
when it's the edge case? that way we don't have to allocate for an 10MiB buffer upfront if the size is malformed, and if it's really a very huge payload we just pay the price for a bit more allocations (same as today).
(I would also consider making this limit configurable, but don't feel too strongly either way here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just
io.CopyN
[...]?
I have no strong opinions in either direction, that works fine for me! The reason I changed it was that io.CopyN
uses more memory than this approach. Looking at 40MB ask from the issue tracker, io.CopyN
allocates:
main_test.go:36: 134217366 B/op 21 allocs/op - ask: 41943040, data: 41943040
while this approach allocates:
main_test.go:44: 83886131 B/op 5 allocs/op - ask: 41943040, data: 41943040
But the downside, as you said, is that we will allocate more than available when we get a malformed message.
Let me know if I should change it to use io.CopyN
instead!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so there are 3 different cases:
- ordinary case (reasonable size, not malformed)
- legit very large payload (large size, not malformed)
- malformed payload with large size
and the choice here is between 2 and 3 (we already agreed to optimize for 1).
between 2 and 3 I would prefer to optimize for 3, based on the consequences of the opposite. if we optimize for 2 (your current approach), then whenever the code get a malformed payload they would need to allocate 10MiB up front, that can be a big risk for code running with tight resources so the consequence can be more severe (e.g. they have more risk to crash due to insufficient memory). with the CopyN
approach the consequence is just more allocations for legit cases, which is slightly slower and use more memory, but if the code is intended to handle legit very large payloads then we can already assume that it has more resource and those "wasted" memory will have a smaller consequences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I'll update the PR later today, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, PTAL
Due to a subtle caveat in buffer.Bytes the Go binary protocol method ReadBinary() always over-allocated. We now allocate the asked size directly up to 10 MiB, and then use io.CopyN to maintain the malformed message protection. The existing benchmarks show that we reduce allocations for small payloads. Ran the existing benchmarks in lib/go/thrift with and without this change: % go test -run X -bench . -benchmem > <file> % benchcmp -changed old.txt new.txt [...] benchmark old allocs new allocs delta BenchmarkSafeReadBytes/normal-20 5 2 -60.00% BenchmarkBinaryBinary_0-20 4 1 -75.00% BenchmarkBinaryBinary_1-20 4 1 -75.00% BenchmarkBinaryBinary_2-20 5 2 -60.00% BenchmarkCompactBinary0-20 4 1 -75.00% BenchmarkCompactBinary1-20 4 1 -75.00% BenchmarkCompactBinary2-20 5 2 -60.00% BenchmarkSerializer/baseline-20 8 5 -37.50% BenchmarkSerializer/plain-20 20 17 -15.00% BenchmarkSerializer/pool-20 8 5 -37.50% benchmark old bytes new bytes delta BenchmarkSafeReadBytes/normal-20 1656 160 -90.34% BenchmarkBinaryBinary_0-20 1608 160 -90.05% BenchmarkBinaryBinary_1-20 1608 160 -90.05% BenchmarkBinaryBinary_2-20 1634 184 -88.74% BenchmarkCompactBinary0-20 1608 160 -90.05% BenchmarkCompactBinary1-20 1608 160 -90.05% BenchmarkCompactBinary2-20 1634 184 -88.74% BenchmarkSerializer/baseline-20 1000 416 -58.40% BenchmarkSerializer/plain-20 3640 3056 -16.04% BenchmarkSerializer/pool-20 1002 417 -58.38%
The current state looks good to me, but I'm tagging @dcelasun, @jpkrohling, @yurishkuro who were involved in the discussion and review of the original issue of https://issues.apache.org/jira/browse/THRIFT-5322 and #2292 to take a look. My one (minor) concern is the hardcoded 10MiB limit. For malformed size > 10MiB we are falling back to the current approach which protects against over allocation, but if we have a malformed size that's < 10MiB but close, we are still doing the allocation up front. Does anyone use thrift go in a tight environment that will be a problem? I don't know. And if we do think that's a problem, what should we do to make it configurable? Add it as a new API to TConfiguration? or maybe make it something like "always 1/10 of |
Actually, reading https://nvd.nist.gov/vuln/detail/CVE-2019-11939 again, I think this implement is sill vulnerable to a very similar attack: bad actor can construct a very small message with a size of 10MiB-1, and the server will still be tricked to do those large allocations, and spamming those malicious messages can still result in denial of service. To mitigate against this attack, we need to make the read limit configurable. There are 3 ways to do it:
|
Thanks for investigating. I agree, in the THRIFT-5828 I wrote:
But to be more specific I'd prefer option 1 over the others. Reason being that the relation to |
+1 for a new entry in |
WDYT @jpkrohling, @yurishkuro? |
Due to a subtle caveat in buffer.Bytes the Go binary protocol method ReadBinary() always over-allocated.
We now allocate the asked size directly up to 10 MiB, and then maintain the malformed message protection by using a 10 MiB buffer for larger asks.
The existing benchmarks show that we reduce allocations for small payloads, and keep the edge case on a reasonable level at about 10MiB.
Ran the existing benchmarks in lib/go/thrift with and without this change:
[skip ci]
anywhere in the commit message to free up build resources.