THRIFT-5828: reduce over-allocation in Go binary protocol #3057

leitzler · 2024-10-25T08:28:10Z

Due to a subtle caveat in buffer.Bytes the Go binary protocol method ReadBinary() always over-allocated.

We now allocate the asked size directly up to 10 MiB, and then maintain the malformed message protection by using a 10 MiB buffer for larger asks.

The existing benchmarks show that we reduce allocations for small payloads, and keep the edge case on a reasonable level at about 10MiB.

Ran the existing benchmarks in lib/go/thrift with and without this change:

% go test -run X -bench . -benchmem > <file>

% benchcmp -changed old.txt new.txt
[...]
benchmark                                   old allocs     new allocs     delta
BenchmarkSafeReadBytes/normal-20            5              2              -60.00%
BenchmarkSafeReadBytes/max-askedSize-20     8              3              -62.50%
BenchmarkBinaryBinary_0-20                  4              1              -75.00%
BenchmarkBinaryBinary_1-20                  4              1              -75.00%
BenchmarkBinaryBinary_2-20                  5              2              -60.00%
BenchmarkCompactBinary0-20                  4              1              -75.00%
BenchmarkCompactBinary1-20                  4              1              -75.00%
BenchmarkCompactBinary2-20                  5              2              -60.00%
BenchmarkSerializer/baseline-20             8              5              -37.50%
BenchmarkSerializer/plain-20                20             17             -15.00%
BenchmarkSerializer/pool-20                 8              5              -37.50%

benchmark                                   old bytes     new bytes     delta
BenchmarkSafeReadBytes/normal-20            1656          160           -90.34%
BenchmarkSafeReadBytes/max-askedSize-20     15992         10489910      +65494.73%
BenchmarkBinaryBinary_0-20                  1608          160           -90.05%
BenchmarkBinaryBinary_1-20                  1608          160           -90.05%
BenchmarkBinaryBinary_2-20                  1634          184           -88.74%
BenchmarkCompactBinary0-20                  1608          160           -90.05%
BenchmarkCompactBinary1-20                  1608          160           -90.05%
BenchmarkCompactBinary2-20                  1634          184           -88.74%
BenchmarkSerializer/baseline-20             1000          416           -58.40%
BenchmarkSerializer/plain-20                3640          3056          -16.04%
BenchmarkSerializer/pool-20                 1002          417           -58.38%

Did you create an Apache Jira ticket? (Request account here, not required for trivial changes)
If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
Did you squash your changes to a single commit? (not required, but preferred)
Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

fishy · 2024-10-28T16:26:45Z

lib/go/thrift/binary_protocol.go

-	buf := new(bytes.Buffer)
-	_, err := io.CopyN(buf, trans, int64(size))
-	return buf.Bytes(), err
+	const readLimit = 10 * 1024 * 1024


since the stated goal here is to optimize for ordinary cases (with size < readLimit), why not just do io.CopyN with bytes.Buffer when it's the edge case? that way we don't have to allocate for an 10MiB buffer upfront if the size is malformed, and if it's really a very huge payload we just pay the price for a bit more allocations (same as today).

(I would also consider making this limit configurable, but don't feel too strongly either way here)

why not just io.CopyN [...]?

I have no strong opinions in either direction, that works fine for me! The reason I changed it was that io.CopyN uses more memory than this approach. Looking at 40MB ask from the issue tracker, io.CopyN allocates:

main_test.go:36: 134217366 B/op 21 allocs/op - ask: 41943040, data: 41943040

while this approach allocates:

main_test.go:44: 83886131 B/op 5 allocs/op - ask: 41943040, data: 41943040

But the downside, as you said, is that we will allocate more than available when we get a malformed message.
Let me know if I should change it to use io.CopyN instead!

so there are 3 different cases:

ordinary case (reasonable size, not malformed)

legit very large payload (large size, not malformed)

malformed payload with large size

and the choice here is between 2 and 3 (we already agreed to optimize for 1).

between 2 and 3 I would prefer to optimize for 3, based on the consequences of the opposite. if we optimize for 2 (your current approach), then whenever the code get a malformed payload they would need to allocate 10MiB up front, that can be a big risk for code running with tight resources so the consequence can be more severe (e.g. they have more risk to crash due to insufficient memory). with the CopyN approach the consequence is just more allocations for legit cases, which is slightly slower and use more memory, but if the code is intended to handle legit very large payloads then we can already assume that it has more resource and those "wasted" memory will have a smaller consequences.

👍 I'll update the PR later today, thanks!

Updated, PTAL

Due to a subtle caveat in buffer.Bytes the Go binary protocol method ReadBinary() always over-allocated. We now allocate the asked size directly up to 10 MiB, and then use io.CopyN to maintain the malformed message protection. The existing benchmarks show that we reduce allocations for small payloads. Ran the existing benchmarks in lib/go/thrift with and without this change: % go test -run X -bench . -benchmem > <file> % benchcmp -changed old.txt new.txt [...] benchmark old allocs new allocs delta BenchmarkSafeReadBytes/normal-20 5 2 -60.00% BenchmarkBinaryBinary_0-20 4 1 -75.00% BenchmarkBinaryBinary_1-20 4 1 -75.00% BenchmarkBinaryBinary_2-20 5 2 -60.00% BenchmarkCompactBinary0-20 4 1 -75.00% BenchmarkCompactBinary1-20 4 1 -75.00% BenchmarkCompactBinary2-20 5 2 -60.00% BenchmarkSerializer/baseline-20 8 5 -37.50% BenchmarkSerializer/plain-20 20 17 -15.00% BenchmarkSerializer/pool-20 8 5 -37.50% benchmark old bytes new bytes delta BenchmarkSafeReadBytes/normal-20 1656 160 -90.34% BenchmarkBinaryBinary_0-20 1608 160 -90.05% BenchmarkBinaryBinary_1-20 1608 160 -90.05% BenchmarkBinaryBinary_2-20 1634 184 -88.74% BenchmarkCompactBinary0-20 1608 160 -90.05% BenchmarkCompactBinary1-20 1608 160 -90.05% BenchmarkCompactBinary2-20 1634 184 -88.74% BenchmarkSerializer/baseline-20 1000 416 -58.40% BenchmarkSerializer/plain-20 3640 3056 -16.04% BenchmarkSerializer/pool-20 1002 417 -58.38%

fishy · 2024-10-28T20:55:04Z

The current state looks good to me, but I'm tagging @dcelasun, @jpkrohling, @yurishkuro who were involved in the discussion and review of the original issue of https://issues.apache.org/jira/browse/THRIFT-5322 and #2292 to take a look.

My one (minor) concern is the hardcoded 10MiB limit. For malformed size > 10MiB we are falling back to the current approach which protects against over allocation, but if we have a malformed size that's < 10MiB but close, we are still doing the allocation up front. Does anyone use thrift go in a tight environment that will be a problem? I don't know.

And if we do think that's a problem, what should we do to make it configurable? Add it as a new API to TConfiguration? or maybe make it something like "always 1/10 of MaxMessageSize configured via TConfiguration"?

fishy · 2024-10-28T21:14:32Z

Actually, reading https://nvd.nist.gov/vuln/detail/CVE-2019-11939 again, I think this implement is sill vulnerable to a very similar attack: bad actor can construct a very small message with a size of 10MiB-1, and the server will still be tricked to do those large allocations, and spamming those malicious messages can still result in denial of service.

To mitigate against this attack, we need to make the read limit configurable. There are 3 ways to do it:

Add a new config to TConfiguration for this
Just reuse the current MaxMessageSize configurable (which would mean that we can mostly revert THRIFT-5322: Guard against large string/binary lengths in Go #2292 I think?)
Define the read limit as something like 1/10 of configured MaxMessageSize, and that 1/10 can be either hardcoded or configurable (but if it's configurable that's mostly just 1).

leitzler · 2024-10-28T22:37:54Z

Thanks for investigating. I agree, in the THRIFT-5828 I wrote:

Ideally we would use TConfiguration that was added after PR2292 (in PR #2296), there is already a default max message size at 100MB for example. The max message size is already checked before reading so ReadBinary could be changed back to always allocating the full size.

But to be more specific I'd prefer option 1 over the others. Reason being that the relation to MaxMessageSize looks arbitrary to me (apart from the fact that MaxMessageSize will block it first if that value is lower).

dcelasun · 2024-10-29T11:20:44Z

+1 for a new entry in TConfiguration.

leitzler · 2024-11-11T08:33:07Z

WDYT @jpkrohling, @yurishkuro?

Jens-G added the golang patches related to go language label Oct 25, 2024

fishy self-requested a review October 28, 2024 15:52

fishy reviewed Oct 28, 2024

View reviewed changes

leitzler force-pushed the master branch from b29cea5 to 97648a2 Compare October 28, 2024 20:30

fishy requested a review from dcelasun October 28, 2024 20:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

THRIFT-5828: reduce over-allocation in Go binary protocol #3057

THRIFT-5828: reduce over-allocation in Go binary protocol #3057

leitzler commented Oct 25, 2024 •

edited

Loading

fishy Oct 28, 2024

leitzler Oct 28, 2024

fishy Oct 28, 2024

leitzler Oct 28, 2024

leitzler Oct 28, 2024

fishy commented Oct 28, 2024 •

edited

Loading

fishy commented Oct 28, 2024

leitzler commented Oct 28, 2024

dcelasun commented Oct 29, 2024

leitzler commented Nov 11, 2024

THRIFT-5828: reduce over-allocation in Go binary protocol #3057

Are you sure you want to change the base?

THRIFT-5828: reduce over-allocation in Go binary protocol #3057

Conversation

leitzler commented Oct 25, 2024 • edited Loading

fishy Oct 28, 2024

Choose a reason for hiding this comment

leitzler Oct 28, 2024

Choose a reason for hiding this comment

fishy Oct 28, 2024

Choose a reason for hiding this comment

leitzler Oct 28, 2024

Choose a reason for hiding this comment

leitzler Oct 28, 2024

Choose a reason for hiding this comment

fishy commented Oct 28, 2024 • edited Loading

fishy commented Oct 28, 2024

leitzler commented Oct 28, 2024

dcelasun commented Oct 29, 2024

leitzler commented Nov 11, 2024

leitzler commented Oct 25, 2024 •

edited

Loading

fishy commented Oct 28, 2024 •

edited

Loading