Round-robin LB should be aware of and skip bad servers #1600
Since TransportSet today reconnects only if there is a request, the round-robin LB needs to make the TransportSets actively reconnect; otherwise a once-disconnected address wouldn't have a chance to be used again. This was supposed to be done through an explicit list of retained transports. @ejona86 came up with a better approach: do not explicitly call out "retained transports"; instead, let LoadBalancer listen to the "channel state" (#28) of TransportSets, and make a TransportSet reconnect as desired when it is disconnected. With this approach, TransportSet can keep its lazy connection behavior, and the LoadBalancer API would be simpler.

There are some discrepancies between TransportSet's state and the channel state semantics. Those discrepancies probably don't matter to LoadBalancer. It just needs to know when TransportSet enters the IDLE state, because that is where TransportSet stops reconnecting. However, I am a little concerned about implementing the channel state API while violating its transitioning rules.
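The listening scheme described above can be sketched as follows. This is a hypothetical illustration, not the real grpc-java API: `SubChannel`, `State`, and `requestConnection()` are stand-in names. The idea is simply that the LB watches state changes and, when a subchannel goes IDLE (where reconnecting stops), asks it to connect again so the address stays usable.

```java
// Hypothetical sketch: an LB-side listener that keeps an address in rotation
// by requesting a new connection whenever the subchannel goes IDLE.
// All names here are illustrative, not the actual grpc-java API.
class ReconnectingListener {
    enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE }

    /** Stand-in for the subchannel abstraction discussed in this thread. */
    interface SubChannel {
        void requestConnection();
    }

    private final SubChannel subChannel;

    ReconnectingListener(SubChannel subChannel) {
        this.subChannel = subChannel;
    }

    /** Called by the channel when the subchannel's state changes. */
    void onStateChange(State newState) {
        if (newState == State.IDLE) {
            // IDLE is where reconnecting stops, so ask for a connection
            // to give this address a chance to be used again.
            subChannel.requestConnection();
        }
    }
}
```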
@ejona86 and I brainstormed a few options. We found the following issues keep coming up:

**Channel state callback leakage**
Rejected solution: TransportManager has a …

**TransportSet leakage**
This is actually #2276, but it's highly related. This issue is particularly a problem in the GRPCLB affinity case, where the local LoadBalancer doesn't know all the servers, thus can't give TransportManager the list of servers to be kept.
Rejected solution: we thought about relying on ChannelImpl's idle mode, with TransportSet shutting itself down when going idle, to clean up unused TransportSets. However, …

**Reconnect back-off state being cleared**
Rejected solution: we always shut down the TransportSet when it becomes idle. It solves the problem of TransportSet leakage, but shutting down the TransportSet would clear the back-off state, so if the app keeps sending fail-fast RPCs to a going-away server, the resulting series of state transitions will end up hammering the server without any back-off.
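To make the back-off concern concrete, here is a minimal sketch of the kind of exponential back-off state being discussed. The class name and constants are illustrative, not grpc-java's actual implementation; the point is that this state lives inside the TransportSet, so shutting the TransportSet down resets the delay to the minimum and defeats the back-off.

```java
// Hypothetical sketch of reconnect back-off state. If this object is
// discarded when its TransportSet shuts down, the next TransportSet starts
// again from the minimum delay, hammering a persistently failing server.
class BackoffPolicy {
    private static final long INITIAL_MILLIS = 1_000;
    private static final long MAX_MILLIS = 120_000;
    private long nextMillis = INITIAL_MILLIS;

    /** Delay before the next reconnect attempt; doubles up to a cap. */
    long nextBackoffMillis() {
        long current = nextMillis;
        nextMillis = Math.min(nextMillis * 2, MAX_MILLIS);
        return current;
    }

    /** Called on a successful connection: start over from the minimum. */
    void reset() {
        nextMillis = INITIAL_MILLIS;
    }
}
```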
The solution we finally landed with:
**How this addresses the issues**

**Channel state callback leakage**
The callback is owned by TransportSet and will not leak as long as TransportSet doesn't leak.

**TransportSet leakage**
It becomes LoadBalancer's responsibility to shut down a SubChannel (thus a TransportSet) that is no longer used.

**Reconnect back-off state being cleared**
Because we never shut down a TransportSet when it is idle, the back-off state will be kept (unless LoadBalancer maliciously shuts down and re-creates a SubChannel frequently, which is not our concern).

**A once-bad server never gets used again by round-robin**
RoundRobinLoadBalancer will keep a list of all SubChannels and listen to their states. When it sees a SubChannel become TRANSIENT_ERROR, the LB will mark its EAG as unhealthy and skip it. Once the channel state semantics discrepancy in TransportSet is fixed, TRANSIENT_ERROR will always transition to CONNECTING, so TransportSet will always try to reconnect if a connection failed.
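The round-robin skipping described above can be sketched roughly as follows. This is not the real grpc-java API: `SubChannel` and `State` are simplified stand-ins, and real subchannel state arrives via callbacks rather than a `volatile` field. The sketch only shows the picking rule: iterate round-robin and skip any subchannel whose last observed state is not READY.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a round-robin picker that skips unhealthy
// subchannels. Names are illustrative, not the actual grpc-java API.
class RoundRobinPicker {
    enum State { READY, CONNECTING, TRANSIENT_FAILURE, IDLE }

    static class SubChannel {
        final String address;
        volatile State state;  // last state observed by the LB's listener

        SubChannel(String address, State state) {
            this.address = address;
            this.state = state;
        }
    }

    private final List<SubChannel> subChannels;
    private final AtomicInteger index = new AtomicInteger();

    RoundRobinPicker(List<SubChannel> subChannels) {
        this.subChannels = subChannels;
    }

    /** Returns the next READY subchannel, skipping unhealthy ones; null if none. */
    SubChannel pick() {
        int n = subChannels.size();
        for (int i = 0; i < n; i++) {
            SubChannel sc = subChannels.get(Math.floorMod(index.getAndIncrement(), n));
            if (sc.state == State.READY) {
                return sc;
            }
        }
        return null;  // no usable server right now
    }
}
```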
@lukaszx0 @jhump This API change may impact your work in a non-trivial way. I'd like to hear your opinions.
Thanks for the heads up @zhangkun83. I think this will actually be a better API for us. @lukaszx0 has done a lot of work integrating our name service with gRPC, but one last step is implementing heartbeats and better load balancing. The existing API is problematic for this.

We have cases where some traffic must transit through a virtual IP, for security reasons. Virtual IPs in our network generally map to hardware load balancers. This means that a client may only have a single address (e.g. host and port) but actually have multiple backends. With the previous API (only being able to target by actual address), there wasn't a clear way to distinguish multiple backends that transit through the same virtual IP address (for purposes of sending heartbeats, and also for load balancing, since we must distribute traffic across multiple connections that all go through the same address from the client's point of view). So we would likely have had to write the same connection management logic in our load balancer anyway, and potentially be unable to use the provided API.

We have this stuff implemented in Go, but their API for picking a connection accepts an address object which includes metadata from the name service (so it is much more like the proposed API).
@zhangkun83 thanks for the heads up. As Josh said, I think this might be a better API. However, I must admit that I still need to spend some more time reading the internals to fully understand the current design and its abstractions, so that I can better judge it and/or propose alternative solutions.
@jhump happy to hear that! Would it be OK if I just changed the API in place, instead of deprecating and then deleting the old methods over two releases? Keeping the old API is kind of painful.
RE: detecting a bad server, we have to live with the current channel-state API design, which doesn't provide a reliable signal of TRANSIENT_FAILURE. As a workaround, we will mark an address as bad if it has not been ready for some timeout.
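The timeout workaround described above can be sketched as follows. This is a hypothetical illustration, not the actual implementation; the class and method names are made up. An address is considered bad once it has gone longer than the timeout without being READY, and a READY observation clears the mark.

```java
// Hypothetical sketch of the "not ready for some timeout" workaround.
// Time is passed in explicitly to keep the logic deterministic and testable.
class AddressHealth {
    private final long timeoutMillis;
    private long lastReadyMillis;

    AddressHealth(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastReadyMillis = nowMillis;  // give the address the benefit of the doubt initially
    }

    /** Called whenever the subchannel for this address is observed READY. */
    void onReady(long nowMillis) {
        lastReadyMillis = nowMillis;
    }

    /** True if the address has not been READY within the timeout window. */
    boolean isBad(long nowMillis) {
        return nowMillis - lastReadyMillis > timeoutMillis;
    }
}
```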
Yes. We're quite flexible and this shouldn't be a problem for us.
For the record, while I don't know much about Kubernetes, it looks like people using it are also "affected" by this and by the use case @jhump described: the situation when there's an L4 load balancer in play.
All FYI (nothing really impacting the discussion): there is a grpc-wide L4 LB issue at grpc/grpc#7957. The specific case of having multiple transports to the same host was intended to be supported, but we had multiple ways of obtaining it. For instance, we could easily have an extra id on …

The C core auto-reuses transports across channels, so they have the same problem (but at an even higher level). They use a similar hack, where they add useless configuration that avoids sharing but otherwise has no effect.
There is another issue that will result from LoadBalancer actively shutting down TransportSets: a race between shutting down the transport (when RR addresses change) and calling …

EDIT: a possible solution is that …
Design doc for the new API and the related Channel impl refactor: |
This is the first step of a major refactor for the LoadBalancer-related part of Channel impl (grpc#1600). What's changed:
- TransportSet no longer has delayed transport, thus will not buffer streams when a READY real transport is absent.
- When there isn't a READY real transport, obtainActiveTransport() will return null.
- Overhauled Callback interface, with onStateChange() replacing the ad-hoc transport event methods.
- Call out channelExecutor, which is a serializing executor that runs the Callback.
The doc appears private. I requested access.
@jhspaybar: Due to policy restrictions, the doc cannot be published and I have to grant access to individuals upon request.
This is the first step of a major refactor for the LoadBalancer-related part of Channel impl (#1600). It forks TransportSet into InternalSubchannel and makes changes on it. What's changed:
- InternalSubchannel no longer has delayed transport, thus will not buffer streams when a READY real transport is absent.
- When there isn't a READY real transport, obtainActiveTransport() will return null.
- InternalSubchannel is no longer a ManagedChannel.
- Overhauled Callback interface, with onStateChange() replacing the ad-hoc transport event methods.
- Call out channelExecutor, which is a serializing executor that runs the Callback.

The first commit is an unmodified copy of the files that are being forked. Please review the second commit, which changes the forked files.
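For readers unfamiliar with the channelExecutor mentioned in the commit message: a serializing executor runs submitted tasks one at a time, in submission order, without a dedicated thread. The sketch below is a minimal, hypothetical illustration of that pattern (it is not grpc-java's actual implementation); tasks are drained on whichever caller's thread first finds the queue idle.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical sketch of a serializing executor: tasks may be submitted from
// any thread, but they run strictly one at a time, in submission order.
class SerializingExecutor implements Executor {
    private final Queue<Runnable> queue = new ArrayDeque<>();
    private boolean draining = false;

    @Override
    public void execute(Runnable task) {
        synchronized (this) {
            queue.add(task);
            if (draining) {
                return;  // an earlier caller is already draining the queue
            }
            draining = true;
        }
        drain();  // run tasks outside the lock so they may re-submit safely
    }

    private void drain() {
        while (true) {
            Runnable next;
            synchronized (this) {
                next = queue.poll();
                if (next == null) {
                    draining = false;
                    return;
                }
            }
            next.run();
        }
    }
}
```

A task submitted from within a running task is queued and runs after the current task completes, which is what makes such an executor useful for state-change callbacks: no callback ever observes another callback running concurrently or reentrantly.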
This is the second step of a major refactor for the LoadBalancer-related part of Channel impl (grpc#1600)
@zhangkun83 LBv2 has landed in master. I think we can close this?
Resolved by #2707
Original issue description:

The round-robin logic in `GrpclbLoadBalancer` should listen to transport life-cycle events (`handleTransportReady()` and `handleTransportShutdown()`) to learn which servers are not usable, so that it can skip them. The round-robin logic will also be shared with the simple round-robin `LoadBalancer` (which is yet to be implemented).