Channel-state API #28
Comments
As part of this issue, we are free to pull in the current lifecycle API and redesign/tweak it.
I am aware of users who want this feature.
I'll work on this next. Assuming the …
@buchgr, yes it is still current. I do know that LoadBalancing will come into play and I don't think we have the states defined for it, so it would take some thought.
@ejona86 thanks! So I guess I'll first learn about the load balancer and then check out the implementation in the C core to see if they took load balancing into account (yet); otherwise I'll come up with my own proposal that we can discuss, and update the document accordingly. I'll then implement this for Java. Sounds good?
@buchgr, sounds good, but I'm okay with both of those going in parallel. LB is still in progress, so I don't mind improving the state specification over time and calling them bug fixes.
Any update on the stats aspect of this? Right now, when we make a gRPC call, we don't get any feedback regarding the state of the underlying connection, making it hard to understand where delays are coming from.
@buchgr any updates? Are you still planning to work on this?
@lukaszx0 nope ... it's free to be picked up by someone :-)
I will note that the changes for fail fast made this easier to implement. It's mainly API work now.
@zhangkun83 any updates? When will this feature be released?
@zhangkun83 I was planning to find some time and work on this. I'll reach out to you and we can discuss.
@ejona86 and I talked about the channel state API, as #1600 will depend on it. @ejona86 wanted to not have the state be passed to the callback. Instead, the callback would need to call getState():

```java
class ManagedChannel {
  State getState();
  void notifyWhenStateChanged(State source, Runnable callback, boolean connect);
}
```

This has an issue with round-robin. The solution we decided on for #1600 depends on the RR LB being able to catch every transition into TRANSIENT_FAILURE. However, if an address always results in connection timeout, the TransportSet for it would switch back and forth between CONNECTING and TRANSIENT_FAILURE. Because the initial back-off delays, which determine the time spent in TRANSIENT_FAILURE, can be very short, while CONNECTING may be noticeably long, it's possible that the callback from the RR LB may get CONNECTING from getState(). I think we need to pass the state to the callback.
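To make the race concrete, here is a minimal sketch; `State`, `ChannelStateApi`, and `RaceDemo` are illustrative stand-ins rather than real grpc-java types, and the assumed semantics are that the callback fires once the state differs from `source`:

```java
// Hypothetical stand-ins for the proposal above; not real grpc-java types.
enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE, SHUTDOWN }

interface ChannelStateApi {
  State getState();
  void notifyWhenStateChanged(State source, Runnable callback, boolean connect);
}

class RaceDemo {
  // A watcher registered while the channel was CONNECTING. If the channel
  // bounces into TRANSIENT_FAILURE and back to CONNECTING before the callback
  // runs, getState() returns CONNECTING and the failure is never observed.
  static void watchForFailure(ChannelStateApi channel) {
    channel.notifyWhenStateChanged(State.CONNECTING, () -> {
      State now = channel.getState(); // may already be CONNECTING again
      if (now == State.TRANSIENT_FAILURE) {
        // mark the address bad -- but this branch can be skipped entirely
      }
      watchForFailure(channel); // re-register for the next change
    }, false);
  }
}
```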
I take it back. Passing the state to the callback is also flawed. Any state change between the callback being called and the callback calling notifyWhenStateChanged() again would be missed. Technically:

```java
class ManagedChannel {
  State getState();
  void notifyWhenStateIsNot(State unexpected, Runnable callback, boolean connect);
  void notifyWhenStateIs(State expected, Runnable callback, boolean connect);
}
```

or, more powerful:

```java
class ManagedChannel {
  State getState();
  void notifyWhenStateIsIn(EnumSet<State> expected, Runnable callback, boolean connect);
}
```
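As a usage sketch of the EnumSet variant (reusing the hypothetical `State` enum from the earlier sketch; `notifyWhenStateIsIn` is the proposal above, not an existing method), a caller could wait for the channel to become READY or fail without guessing which transition woke it up:

```java
import java.util.EnumSet;

// Reuses the hypothetical State enum from the earlier sketch.
interface EnumSetStateApi {
  State getState();
  void notifyWhenStateIsIn(EnumSet<State> expected, Runnable callback, boolean connect);
}

class ReadyWaiter {
  // One-shot wait: the callback fires only once the channel has entered one
  // of the expected states, so the caller knows why it was woken up.
  static void awaitReadyOrFailure(EnumSetStateApi channel) {
    channel.notifyWhenStateIsIn(
        EnumSet.of(State.READY, State.TRANSIENT_FAILURE, State.SHUTDOWN),
        () -> System.out.println("channel reached one of the expected states"),
        /* connect= */ true);
  }
}
```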
I skimmed through the gRPC connectivity semantics docs and read your proposals. Why is this a requirement? Is it mainly to keep API consistency with other languages? Also, while I think I like the unified …
No, I don't think it is. The state can change and then change back to the previous state, so some changes will appear spurious. I don't think we can use …
It is to handle a very strong race. Basically, at times the state may change very, very rapidly. The listener should not be notified of every state transition, because that could lead to a heavy queue forming. Instead, just providing the current state has no performance issues, and calling …
Kun, the connect arg should be on getState():

```java
class ManagedChannel {
  State getState(boolean tryToConnect);
  void notifyWhenStateChanged(State lastObserved, Runnable callback);
}
```
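A sketch of how a caller could track state with this shape (hypothetical stand-in types again): because each registration names the last state the caller observed, a registration made against an already stale state fires immediately, so rapid changes are coalesced rather than silently lost.

```java
import java.util.function.Consumer;

// Hypothetical stand-in mirroring the snippet above.
interface LastObservedStateApi {
  State getState(boolean tryToConnect);
  void notifyWhenStateChanged(State lastObserved, Runnable callback);
}

class StateWatcher {
  // Re-reads the state on every wake-up and re-registers against it. Rapid
  // intermediate states may be coalesced, but each callback reflects a real
  // change relative to what this watcher last saw, so no heavy queue forms.
  static void watch(LastObservedStateApi channel, Consumer<State> onState) {
    State current = channel.getState(false);
    onState.accept(current);
    channel.notifyWhenStateChanged(current, () -> watch(channel, onState));
  }
}
```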
Maybe these methods could be named in a way that doesn't imply what the state is when the callback is run, but instead implies what state triggers the callback. My issue with …
Had a conversation with @a11r and @ejona86. @a11r doesn't think we need an alternative API. We came up with a workaround for the RR LB: it can mark a connection as bad if it has not been ready within a timeout, probably the same as the maximum connection timeout. Then it won't rely on seeing TRANSIENT_FAILURE.
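A rough sketch of that workaround; the class, method names, and timer handling here are invented for illustration:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// The balancer arms a deadline per address when it starts connecting; if the
// address has not become READY by then, it is marked bad without the balancer
// ever needing to observe a (possibly very short) TRANSIENT_FAILURE state.
class ReadinessDeadline {
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();
  private volatile boolean bad;
  private ScheduledFuture<?> pending;

  // Arm the deadline when a connection attempt starts.
  synchronized void onConnecting(long timeoutMillis) {
    pending = timer.schedule(() -> bad = true, timeoutMillis, TimeUnit.MILLISECONDS);
  }

  // Clear everything once the subchannel reports READY.
  synchronized void onReady() {
    if (pending != null) {
      pending.cancel(false);
    }
    bad = false;
  }

  boolean isBad() {
    return bad;
  }
}
```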
I believe it has been resolved with #2286 and this issue can be closed.
I have filed #2292 to track the implementation in ManagedChannelImpl.
@robeden, thanks for pointing that out. I've created #4359 to track and #4360 to fix the links in the code. Because there wasn't a tracking issue, this did slip through the cracks. It should be stabilized. It'll be discussed in our next API review in two weeks (we just finished an API review discussion, and we do it every other week). |
At this moment, TCP connections are created lazily on the first call on a Channel, and if the TCP connection goes down it isn't reconnected until a subsequent call. However, some users will want the TCP connection to be created and maintained during the lifetime of the Channel.
This "constant connection" behavior does not make as much sense when accessing a third party service, as the service may purposefully be disconnecting idle clients, but is very reasonable in low-latency, intra-datacenter communication.
We need an API to choose between those behaviors and to export failure information about the Channel. All of this is bundled together for the moment under the name "health-checking API," but we can split it apart as it makes sense.
They are tied together for the moment because certain operations like "wait until Channel is healthy" assume that the channel will actively try to connect.
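As an illustration of the "constant connection" behavior, here is a minimal sketch assuming the getState(boolean)/notifyWhenStateChanged shape proposed in the comments above, reusing the hypothetical `LastObservedStateApi` and `State` from the earlier sketches:

```java
// Reuses the hypothetical LastObservedStateApi and State from earlier sketches.
class KeepConnected {
  // Passing true asks an IDLE channel to start connecting; re-arming on every
  // change keeps it reconnecting after failures for the Channel's lifetime.
  static void keepConnected(LastObservedStateApi channel) {
    State state = channel.getState(true);
    if (state != State.SHUTDOWN) {
      channel.notifyWhenStateChanged(state, () -> keepConnected(channel));
    }
  }
}
```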
Some notes from @louiscryan: …