[#583][Sotw] Ensure resources are properly sent again if envoy unsubscribes then subscribes again to a resource #585

Draft: wants to merge 6 commits into base: main


valerian-roche
Contributor

This PR is part of the fix for the issue where resources are unsubscribed from and then subscribed to again.
It applies to both the simple and linear caches when using the sotw protocol.
Actionable cases would likely only impact sotw-ADS, as EDS outside of ADS does not share streams and will therefore usually not have multiple resources this way.

This PR is built on top of #584, as it relies on the client state being provided to the cache.

It also fixes a few bugs found during testing:

  • In the linear cache, the request set in the response was not the actual request. This broke some sotw parts that rely on it for some behavior (though it is not the bug described in the issue).
  • In the linear cache, when using sotw, if a request has multiple resources (at least three) and receives at least two subsequent updates of subsets of those resources, there is a deadlock, because the watch is leaked in the watches of resources that were requested but not yet updated. As this channel is no longer listened to after the first reply, the second update fills the buffer, and any subsequent cache update impacting a resource requested and not yet updated will deadlock.
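
The deadlock in the second bullet can be reproduced in miniature. The following is only a sketch under stated assumptions: `toyCache`, `watch`, `notify` and `simulate` are hypothetical names, not the linear cache code, and a blocked send is reported as `false` instead of actually hanging. It registers one watch under three resource names with a response channel of capacity one, and compares removing the watch only from the updated resource (the leak) with removing it from every resource it was registered under.

```go
package main

import "fmt"

// watch is a hypothetical stand-in for a sotw watch: one request covering
// several resource names, answered through a channel with a buffer of one.
type watch struct {
	names    []string
	response chan string
}

// toyCache maps each resource name to the watches waiting on it.
type toyCache struct {
	watches map[string]map[*watch]struct{}
}

func newToyCache() *toyCache {
	return &toyCache{watches: map[string]map[*watch]struct{}{}}
}

// addWatch registers the watch under every resource name it requested.
func (c *toyCache) addWatch(w *watch) {
	for _, n := range w.names {
		if c.watches[n] == nil {
			c.watches[n] = map[*watch]struct{}{}
		}
		c.watches[n][w] = struct{}{}
	}
}

// notify answers every watch on the updated resource. With cleanupAll set, a
// triggered watch is removed from all the resources it was registered under;
// without it, the watch stays behind under its other names (the leak).
// It returns false when a send would have blocked, which is the point where
// a real cache update would hang.
func (c *toyCache) notify(name string, cleanupAll bool) bool {
	ok := true
	for w := range c.watches[name] {
		select {
		case w.response <- name:
		default:
			ok = false // buffer already full and nobody is reading
		}
		if cleanupAll {
			for _, n := range w.names {
				delete(c.watches[n], w)
			}
		} else {
			delete(c.watches[name], w)
		}
	}
	return ok
}

// simulate registers one watch on three resources, then updates a, b and c.
// The server is assumed to read only the first response from the channel.
func simulate(cleanupAll bool) bool {
	c := newToyCache()
	w := &watch{names: []string{"a", "b", "c"}, response: make(chan string, 1)}
	c.addWatch(w)
	ok := c.notify("a", cleanupAll)
	<-w.response // the first reply is consumed, then the channel is abandoned
	ok = c.notify("b", cleanupAll) && ok
	ok = c.notify("c", cleanupAll) && ok
	return ok
}

func main() {
	fmt.Println("leaky cleanup never blocks:", simulate(false))
	fmt.Println("full cleanup never blocks:", simulate(true))
}
```

With the leaky cleanup, the update of `b` fills the abandoned buffer and the update of `c` is the one that would block, which matches the "at least three resources, at least two subsequent updates" condition above.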

It also updates a few other aspects in order to implement the fix:

  • Adds a staged resources map within the streamState. In sotw it stores resources returned but not yet ACKed. This behavior should be ported to delta, but that is out of scope here. It allows removing the bespoke lastResponse value, which was set only in sotw and lived in parallel to streamState.
  • Adds a new returned attribute in the rawResponse, providing the list of resources included in the opaque response. This is needed to know which resources are actually known to the client. The previous implementation, based on the request, opened edge cases, as there is no guarantee that a requested resource was ever returned.
  • Adds cache logging in linear_test.go (helping while writing tests)
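
A minimal sketch of the staged-resources idea from the list above, assuming a simplified per-type state. The type `streamState` and its methods `recordResponse` and `handleRequest` are hypothetical names for illustration, not the go-control-plane API. The ACK rule (matching nonce and version, no error) follows the check quoted later in the review; replacing the staged set on every new response is an assumption of this sketch, not something the PR confirms.

```go
package main

import "fmt"

// streamState is a hypothetical, simplified model of the per-type state.
type streamState struct {
	known   map[string]string // resource name -> version the client has ACKed
	staged  map[string]string // resource name -> version sent, pending ACK
	nonce   string            // nonce of the last response sent
	version string            // version of the last response sent
}

func newStreamState() *streamState {
	return &streamState{known: map[string]string{}, staged: map[string]string{}}
}

// recordResponse stages the resources that were actually returned in a
// response. Assumption: a new response replaces whatever was staged before.
func (s *streamState) recordResponse(nonce, version string, returned []string) {
	s.nonce, s.version = nonce, version
	s.staged = make(map[string]string, len(returned))
	for _, name := range returned {
		s.staged[name] = version
	}
}

// handleRequest promotes staged resources to known when the request ACKs the
// last response (same nonce, same version, no error). On a NACK the staged
// resources are dropped so they are not treated as known to the client.
// A request with a different nonce leaves the state untouched.
func (s *streamState) handleRequest(nonce, version string, hasError bool) bool {
	if nonce != s.nonce {
		return false
	}
	acked := !hasError && version == s.version
	if acked {
		for name, v := range s.staged {
			s.known[name] = v
		}
	}
	s.staged = map[string]string{}
	return acked
}

func main() {
	s := newStreamState()
	s.recordResponse("n1", "v1", []string{"a", "b"})
	acked := s.handleRequest("n1", "v1", false)
	fmt.Println("acked:", acked, "known:", s.known)
}
```

Tracking only what was returned, rather than what was requested, is what avoids treating a resource as known when it was never sent.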

Finally, the reasoning for fixing this through proper tracking of resources, rather than simply resending everything when in doubt, is explained in #584. It is even more applicable in the context of this issue (EDS through sotw-ADS), where a single client can track thousands of endpoints within hundreds of CLAs.

…e uniform between sotw and delta

Signed-off-by: Valerian Roche <[email protected]>
…voy unsubscribes then subscribes again to a resource

Fix potential deadlock in sotw-ads related to improper cleanup of watches in Linear cache when using delta in non-wildcard
Fix improper request set on sotw responses in Linear cache
Replaced lastResponse in sotw server by staged resources pending ACK

Signed-off-by: Valerian Roche <[email protected]>
Signed-off-by: Valerian Roche <[email protected]>
…nonce and version within streamState to properly track it with the pending resources

Signed-off-by: Valerian Roche <[email protected]>
response: responder,
})
// If we had an open watch, close it to make sure we don't end up sending a cache response while we update the state of the request
w.close()
Contributor Author

The race mentioned here is currently present in the code if we receive a new request while we are building a response in the cache (race on KnownResourceNames)

@github-actions

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Sep 25, 2022
@mattconzen

@valerian-roche Is this one still needed in upstream or should it remain stale/closed?

@github-actions github-actions bot removed the stale label Sep 25, 2022
@valerian-roche
Contributor Author

> @valerian-roche Is this one still needed in upstream or should it remain stale/closed?

Those PRs address issues which are still present. There has been no feedback on this PR or on #584, which is a prerequisite for it.
Maybe @alecholmez can provide some feedback on when it could be reviewed.

@github-actions

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale label Oct 26, 2022
@alecholmez alecholmez self-requested a review October 28, 2022 14:36
@alecholmez alecholmez self-assigned this Oct 28, 2022
Comment on lines +39 to +42
// ClientState provides additional data on the client knowledge for the type matching the request
// This allows proper implementation of stateful aspects of the protocol (e.g. returning only some updated resources)
// Though the methods may return mutable parts of the state for performance reasons,
// the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation
@atollena atollena Jan 4, 2024

I think this comment would benefit from emphasizing that this state is per resource type.

Suggested change
- // ClientState provides additional data on the client knowledge for the type matching the request
- // This allows proper implementation of stateful aspects of the protocol (e.g. returning only some updated resources)
- // Though the methods may return mutable parts of the state for performance reasons,
- // the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation
+ // ClientState stores the server view of the client state for a given resource type.
+ // This allows proper implementation of stateful aspects of the protocol (e.g. returning only some updated resources).
+ // Though the methods may return mutable parts of the state for performance reasons,
+ // the cache is expected to consider this state as immutable and thread safe between a watch creation and its cancellation.

Contributor Author

Updated on parent PR #584, not yet rebased on this branch

// The versions are:
// - delta protocol: version of the specific resource set in the response
// - sotw protocol: version of the global response when the resource was last ACKed
GetKnownResources() map[string]string
@atollena atollena Jan 4, 2024

Wouldn't GetAckedResources be more clear?

Contributor Author

Updated to this name. I have not pushed this branch yet, but I updated #584

Comment on lines +347 to +348
// Non wildcard case, we only reply resources that have effectively changed since the version set in the request
// This is used for instance in EDS

I think you must special case CDS and LDS to not fall into this case. Is this an existing bug?

From https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#four-variants:

The SotW approach was the original mechanism used by xDS, in which the client must specify all resource names it is interested in with each request, and for LDS and CDS resources, the server must return all resources that the client has subscribed to in each request. This means that if the client is already subscribing to 99 resources and wants to add an additional one, it must send a request with all 100 resource names, rather than just the one new one.

Contributor Author

Yes, this is a known limitation: linear does not currently send the full set of resources, and simple sends all resources even when it should only send the changed ones (e.g. endpoints).
My hope was to finish merging the protocols first, so that the fix does not have to be coded multiple times (even more so now with the sotw-ADS implementation).

Contributor Author

#540 mentions this issue in the case of Sotw-ads on linear cache for xds-grpc

state = stream.NewStreamState(len(req.ResourceNames) == 0, nil)
} else if nonce != state.LastResponseNonce() {
// The request does not match the last response we sent.
// The protocol is a bit unclear in this case, but currently we discard such request and wait

The protocol is very clear on this:

From https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol#resource-updates:

The management server should not send a DiscoveryResponse for any DiscoveryRequest that has a stale nonce. A nonce becomes stale following a newer nonce being presented to Envoy in a DiscoveryResponse.

This prevents the management server from sending a response to a request that had not taken the last sent response into consideration yet.

(By the way, comment on line 149 is quite misleading -- I don't think they can be reused across streams, not even across type for ADS)

Contributor Author

Agreed, we need to rework/rephrase this. In sotw the nonce behavior is critical and is currently improperly implemented in the control-plane. In delta we also basically ignore nonces currently, though they are only used for ACK/NACK there.

// The request does not match the last response we sent.
// The protocol is a bit unclear in this case, but currently we discard such request and wait
// for the client to acknowledge the previous response.
// This is unclear how this handles cases where a response would be missed as any subsequent request will be discarded.

Why would a response be missed by a client? Responses have order guarantees. Can this really happen? Does Envoy skip received versions when it sees that it has multiple responses ready to be processed? This seems like an awkward behavior, as it's quite difficult to even implement that with normal grpc clients...

Contributor Author

The current control-plane implementation tracks the nonce per type, even though the nonce is per stream. It's therefore likely to have misaligned values here. I need to do a full review of the version/nonce situation as I'm pretty sure it's really wrong in ads cases. In non-ads cases the issue does not appear as type watch and stream are 1:1

@atollena atollena Jan 8, 2024

The nonce is actually per type iiuc? That could be documented more clearly... The nonce exists to prevent a race involving individual resource types. I think making the nonce per ADS stream (and cross resource type) would not work well: requests would be considered stale based on other resource type responses, and the client would have to re-request.

There is actually one edge case to at least take into consideration that may challenge some of the comments I made in my previous review, explained here: envoyproxy/envoy#10363.

state, ok := streamStates[typeURL]
if !ok {
// We don't have a current state for this type, create one
state = stream.NewStreamState(len(req.ResourceNames) == 0, nil)

How about explicit wildcards? ("*")? Shouldn't wildcard be true in that case, too?

Contributor Author

Explicit wildcards are handled in the subscription tracking
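
For reference, a hedged sketch of what wildcard detection on the first request of a type could look like when the explicit form is included. `isWildcardFirstRequest` is a hypothetical helper, not the subscription tracking mentioned above, and it deliberately ignores later requests on the same stream, where an empty list no longer implies wildcard once specific names have been requested.

```go
package main

import "fmt"

// isWildcardFirstRequest reports whether the first request for a type asks
// for a wildcard subscription: either the legacy form (no resource names)
// or the explicit form (the name "*" is present).
func isWildcardFirstRequest(resourceNames []string) bool {
	if len(resourceNames) == 0 {
		return true
	}
	for _, name := range resourceNames {
		if name == "*" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isWildcardFirstRequest(nil))
	fmt.Println(isWildcardFirstRequest([]string{"*", "cluster-a"}))
	fmt.Println(isWildcardFirstRequest([]string{"cluster-a"}))
}
```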


// Check if the new request ACKs the previous response
// This is defined as nonce and versions are matching, as well as no error present
if state.LastResponseNonce() == nonce && state.LastResponseVersion() == req.VersionInfo && req.ErrorDetail == nil {

Do you need to check the nonce here?

  1. If this is an existing stream/type pair, then nonce not matching skips all the logic via continue at L180.
  2. If this is a new stream, then both request and response nonces are empty so they always match.

I'm asking because if versions were not just an integer generated per stream and per instance (e.g. if based on a hash of subscribed resources), I think it would be OK to consider the client up to date if it has the right version but an empty nonce (first request), as long as we can guarantee that the set of subscribed resources matches (this implies embedding the subscribed resource names into the version). Anyways, this is not how linear cache works today, so perhaps not so important.

Contributor Author

Regarding the nonce behavior: as mentioned above, we need to fully review it.
For the shortcut on subscribed resources, we can check, but envoy requires us to send a response even if the version does not change for some phases of load. If we only have the latest version and what was subscribed at that time, and envoy unsubscribed in between the streams and is now resubscribing, we would be missing a response.

Yes, so the list of subscribed revisions would need to be part of the version string.

> Envoy requires to send a response even if the version does not change for some phases of load

OK, so I guess my hope that we could avoid sending a response on stream open if it is already up to date is not something we can support, even in theory. Anyways, this is not the focus of this PR so I'll close this comment for now.

}

// Remove resources no longer subscribed from the stream state

Suggested change
- // Remove resources no longer subscribed from the stream state
+ // Remove resources no longer subscribed to from the stream state.

knownResources := state.GetKnownResources()
unsubscribedResources := getUnsubscribedResources(req.ResourceNames, knownResources)
for _, resourceName := range unsubscribedResources {
delete(knownResources, resourceName)

Nit: I find this pattern of using a getter and then mutating the streamState object directly from the field quite unintuitive.

Perhaps it would make sense to simply expose the fields from stream state directly? Or expose a "setter".

Contributor Author

I also don't like this pattern. Setter is a bit messy here, and map attributes create risks of users inserting in nil maps. I'll check how we can improve this
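
One possible shape for this, as a sketch only: keep the map private and expose the mutation as methods on the state. `streamState`, `RemoveUnsubscribed` and `SetKnown` are hypothetical names here, not the API of this PR, and the sketch ignores wildcard subscriptions.

```go
package main

import (
	"fmt"
	"sort"
)

// streamState is a hypothetical, reduced version of the per-type state.
type streamState struct {
	knownResources map[string]string // resource name -> version
}

// SetKnown records a resource as known, allocating the map on first use so
// that callers never write into a nil map.
func (s *streamState) SetKnown(name, version string) {
	if s.knownResources == nil {
		s.knownResources = map[string]string{}
	}
	s.knownResources[name] = version
}

// RemoveUnsubscribed drops every known resource that is not in subscribed
// and returns the removed names in sorted order.
func (s *streamState) RemoveUnsubscribed(subscribed []string) []string {
	keep := make(map[string]struct{}, len(subscribed))
	for _, name := range subscribed {
		keep[name] = struct{}{}
	}
	var removed []string
	for name := range s.knownResources {
		if _, ok := keep[name]; !ok {
			delete(s.knownResources, name)
			removed = append(removed, name)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	var s streamState
	s.SetKnown("a", "1")
	s.SetKnown("b", "1")
	s.SetKnown("c", "1")
	fmt.Println("removed:", s.RemoveUnsubscribed([]string{"a"}))
	fmt.Println("still known:", s.knownResources)
}
```

Because the next response is computed from the known set, a resource removed this way is sent again if the client later resubscribes to it.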

Comment on lines +231 to +232
// The protocol clearly states that if we send another response prior to an ACK
// the previous one is to be considered as discarded

Can you link to the protocol section that states that? I don't find the wording and I'm not sure this concern is valid.

Not having to resend resources that were already sent would make the code much simpler, as you would not need to track acknowledgement, afaik (no concept of "pending" resource).

Contributor Author

I think this comment was a mistake on my side.
With regard to tracking, if envoy NACKed a resource and we consider it ACKed, it might not be sent again. I'm not fully sure what happens in this sequence:

  • we send a response at version N
  • we send another response at version N+1, prior to envoy acknowledging version N
  • we consider that envoy already knows all resources at version N and don't resend them in N+1, while client subscriptions changed during this window

I'm afraid we could then have cases where resources are not re-added when they should be. I'd need more time to assess it fully though.

OK, could you validate envoy's behavior in the case you describe? gRPC will cache versions sent at version N, and only update those that changed at N+1, so nothing is lost. I think it'd be tricky to even implement something different on the client side, but perhaps Envoy does.
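
The client behavior described in this comment can be modeled as follows. This is a sketch of the described behavior only, not gRPC or Envoy code; `clientCache` and `apply` are hypothetical names.

```go
package main

import "fmt"

// clientCache is a hypothetical model of a client that keeps every resource
// it has received and only overwrites the ones present in a new response.
type clientCache struct {
	version   string
	resources map[string]string // resource name -> payload
}

func newClientCache() *clientCache {
	return &clientCache{resources: map[string]string{}}
}

// apply merges a response into the cache: resources included in the response
// are updated, resources absent from it are left as they were.
func (c *clientCache) apply(version string, updated map[string]string) {
	for name, payload := range updated {
		c.resources[name] = payload
	}
	c.version = version
}

func main() {
	c := newClientCache()
	c.apply("N", map[string]string{"a": "a@N", "b": "b@N"})
	c.apply("N+1", map[string]string{"b": "b@N+1"}) // only b changed
	fmt.Println(c.version, c.resources)
}
```

Under this model, a resource sent at version N and omitted at N+1 is still available to the client, which is the "nothing is lost" argument. Whether Envoy behaves the same way is the open question in this thread.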

@atollena

atollena commented Jan 4, 2024

Thanks for working on this, this is great.

This would fix #431, which makes go-control-plane pretty much incompatible with grpc clients, as those typically unsubscribe from resources after a channel reaches its idle timeout.

@carsonoid

> Thanks for working on this, this is great.
>
> This would fix #431 which makes go-control-plane pretty much incompatible with grpc clients, as those typically unsubscribe to resources after a channel reaches its idle timeout.

@atollena the patch here fixes the go client idle issue for me. Hopefully it can help you too.

@dfawley
Member

dfawley commented Jan 15, 2025

What is the status of this PR? What would it take to make it finally land? We just got bit by this again in some proposed testing we were going to add to gRPC to catch races creating and destroying gRPC channels using xDS, but can't until this is fixed.

@valerian-roche
Contributor Author

> What is the status of this PR? What would it take to make it finally land? We just got bit by this again in some proposed testing we were going to add to gRPC to catch races creating and destroying gRPC channels using xDS, but can't until this is fixed.

Hey Doug,
Within my company we have been using this setup for a while with grpc xDS clients (actually using the branch https://github.com/DataDog/go-control-plane/tree/dd/main).
This branch relies on an API breaking change in the Cache interface which, for most users, should not have any impact, but creates an issue with the single tag release we used to have.
Once the dust has settled on the module split I hope to be able to push those changes on the upstream main branch.
