Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: LoadBalancer2 and DelayedClientTransport.reprocess(). #2443

Merged
merged 12 commits into from
Dec 2, 2016

Conversation

zhangkun83
Copy link
Contributor

This is the second step of a major refactor for the LoadBalancer-related
part of Channel impl (#1600).

Please review the PR as an entirety.

Copy link
Contributor

@lukaszx0 lukaszx0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hate to bring up naming again, but I'd like to suggest renaming Picker to Selector. Maybe it's just me, but Picker always sounded weird for me in this context whereas Selector make it's more intuitive and gives me a better idea of what it'll be responsible for. Proposed new naming: SubchannelSelector, SelectedSubchanel, #selectSubchannel, #updateSubchannelSelector).

* <p>The LoadBalancer is responsible for closing unused Subchannels, and closing all
* Subchannels within {@link #shutdown}.
*/
public abstract Subchannel createSubChannel(EquivalentAddressGroup addrs, Attributes attrs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: createSubChannel => createSubchannel to make it consistent with other class names

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* Schedule a task to be run in the Channel Executor, which serializes the task with the
* callback methods on the {@link LoadBalancer2} interface.
*/
public abstract void runSeralized(Runnable task);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: runSerialized

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

/**
* No decision could be made. The RPC will stay buffered.
*/
public static PickResult withNothingYet() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe withNoSubchannel and also updated comment that the returned result is successful (Status.OK) ?

Copy link
Contributor Author

@zhangkun83 zhangkun83 Nov 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadBalancer implementations care more about what constructor method to call, than what getStatus() returns. I think getStatus() is sufficiently documented, and it's unnecessary to talk about the resulting status.

I was struggling in the naming of this method. The message that I wanted to convey is that "LoadBalancer cannot make a decision yet, thus nothing will happen to this RPC.". I don't think "withNoSubchannel" does the job better. Some alternatives are:

  • withNoResult()
  • stillPending()
  • pending()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Maybe withNoResult() would work better? I like pending() the most, but it's missing with and I'd like to to be consistent with other methods. Heh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to withNoResult().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

@zhangkun83
Copy link
Contributor Author

I don't find Picker to be less intuitive than Selector, maybe because that we use "Picker" in our internal LB interface :). Given that, I do prefer Picker because it's shorter.

@lukaszx0
Copy link
Contributor

lukaszx0 commented Nov 25, 2016 via email

public abstract EquivalentAddressGroup getAddresses();

/**
* The same attributes passed to {@link #createSubChannel}. LoadBalancer can use it to attach
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Helper#createSubchannel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* called again and a new picker replaces the old one. If {@link #updatePicker} has never been
* called, the channel will buffer all RPCs until a picker is provided.
*/
public abstract void updatePicker(SubchannelPicker picker);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to updateSubchannelPicker ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's necessary. updatePicker can't be mistaken for anything else in this context, given that the argument type already refers to "Subchannel".

* additional information here, e.g., the shard this Subchannel belongs to.
*/
public abstract Attributes getAttributes();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes are immutable which means that if we want to be able to update attributes on existing channel, we'll need setAttributes(Attributes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. You could use mutable values, e.g., AtomicReference in Attributes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah. That's fair.

Copy link
Contributor

@lukaszx0 lukaszx0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

*
* <h3>Channel Executor</h3>
*
* <p>Channel Executor is an internal executor of the channel, which is used to run serialize all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/run serialize/serialize/ or similar.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* <p>Channel Executor is an internal executor of the channel, which is used to run serialize all
* the callback methods on the {@link LoadBalancer2} interface, thus the balancer implementation
* doesn't need to worry about synchronization among them. However, the actual thread that Channel
* Executor is typically the network thread, thus following rules must be followed to prevent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is/is is/ or s/actual thread that/actual thread of the/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* synchronization primitives, blocking I/O, blocking RPCs, etc.</li>
*
* <li><strong>Avoid calling into other components with lock held</strong>. Channel Executor may
* run callbacks under a lock, e.g., the transport lock of OkHttp2. If your LoadBalancer has a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/OkHttp2/OkHttp/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


/**
* The main balancing logic. It <strong>must be thread-safe</strong>. Typically it should only
* synchronize on its own state, and void synchronzing with the LoadBalancer's state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/void/avoid/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


/**
* A decision to proceed the RPC on a Subchannel. The state of the Subchannel is supposed to be
* {@link ConnectivityState#READY}. However, a non-READY Subchannel will not fail the RPC,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was unclear why this "However" was stated. Maybe something like, "However, since such decisions are racy, a non-READY Subchannel..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Rephrased as suggested.

* <p>When a new picker is provided via {@link Helper#updatePicker}, the channel will apply the
* picker on all buffered RPCs, by calling {@link SubchannelPicker#pickSubchannel}.
*
* <p>The channel will hold the a picker and use it for all RPCs, until {@link #updatePicker} is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the a/the/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* <p>It maintains at most one physical connection (aka transport) for sending new RPCs, while
* also keeps track of previous transports that has been shut down but not terminated yet.
*
* <p>If there isn't an active transport yet, and the Subchannel is assigned to an RPC, it will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/the Subchannel is assigned to an RPC/an RPC is assigned to the Subchannel/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// To be implemented by ManagedChannelImpl2 and unit test
@Nullable
ClientTransport obtainActiveTransport(Subchannel subchannel) {
throw new UnsupportedOperationException("Not implemented");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems strange, but I'm willing to wait and see how it all fits together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I introduced a SubchannelImpl that adds obtainActiveTransport() to Subchannel.

}
pendingStreams.removeAll(toRemove);
if (pendingStreams.isEmpty()) {
pendingStreams = null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think pendingStreams should only be made null if the delayed transport is shut down and setTransport is called. Just leave the collection in-place, even if its empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since setTransport() will go away, and delayed transport will be kept open, it seems necessary to take opportunity to discard the hashmap, as it only grows in size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the code doesn't ever create a new hashmap. The next stream that needs to be pending will cause a NPE. If you want to re-initialize, you'll need to create the new hashmap here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... do you also need to handle transport terminated here, like in PendingStream.cancel?

Copy link
Contributor Author

@zhangkun83 zhangkun83 Dec 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Actually I missed out the shutdown process for the LBv2 design. Would the following shutdown sequence work?:

  1. ManagedChannelImpl.shutdown(): stop accepting new calls, shutdown delayed transport. All other functionalities continue working.
  2. Once the delayed transport is terminated, shutdown NameResolver and LoadBalancer.
  3. Once all subchannels and all OOB channels have terminated, ManagedChannelImpl will terminate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added termination handling, and fixed the NPE. Both covered by tests. PTAL.

}

synchronized (lock) {
if (pendingStreams == null || pendingStreams.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit worrying. If streams may be removed from pendingStreams, then that seems like we could "start" a stream more than once (maybe as failures).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added documentation to say that reprocess() cannot be called concurrently with itself or setTransport. However, between the two synchronized blocks, streams may be cancelled, and shutdown() may also be called, which will make pendingStreams empty and null respectively. Looks I have to keep this check.

Copy link
Contributor Author

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ejona86 PTAL

*
* <h3>Channel Executor</h3>
*
* <p>Channel Executor is an internal executor of the channel, which is used to run serialize all
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* <p>Channel Executor is an internal executor of the channel, which is used to run serialize all
* the callback methods on the {@link LoadBalancer2} interface, thus the balancer implementation
* doesn't need to worry about synchronization among them. However, the actual thread that Channel
* Executor is typically the network thread, thus following rules must be followed to prevent
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* synchronization primitives, blocking I/O, blocking RPCs, etc.</li>
*
* <li><strong>Avoid calling into other components with lock held</strong>. Channel Executor may
* run callbacks under a lock, e.g., the transport lock of OkHttp2. If your LoadBalancer has a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


/**
* The main balancing logic. It <strong>must be thread-safe</strong>. Typically it should only
* synchronize on its own state, and void synchronzing with the LoadBalancer's state.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


/**
* A decision to proceed the RPC on a Subchannel. The state of the Subchannel is supposed to be
* {@link ConnectivityState#READY}. However, a non-READY Subchannel will not fail the RPC,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Rephrased as suggested.

* <p>When a new picker is provided via {@link Helper#updatePicker}, the channel will apply the
* picker on all buffered RPCs, by calling {@link SubchannelPicker#pickSubchannel}.
*
* <p>The channel will hold the a picker and use it for all RPCs, until {@link #updatePicker} is
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* <p>It maintains at most one physical connection (aka transport) for sending new RPCs, while
* also keeps track of previous transports that has been shut down but not terminated yet.
*
* <p>If there isn't an active transport yet, and the Subchannel is assigned to an RPC, it will
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}
pendingStreams.removeAll(toRemove);
if (pendingStreams.isEmpty()) {
pendingStreams = null;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since setTransport() will go away, and delayed transport will be kept open, it seems necessary to take opportunity to discard the hashmap, as it only grows in size.

// To be implemented by ManagedChannelImpl2 and unit test
@Nullable
ClientTransport obtainActiveTransport(Subchannel subchannel) {
throw new UnsupportedOperationException("Not implemented");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I introduced a SubchannelImpl that adds obtainActiveTransport() to Subchannel.

}

synchronized (lock) {
if (pendingStreams == null || pendingStreams.isEmpty()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added documentation to say that reprocess() cannot be called concurrently with itself or setTransport. However, between the two synchronized blocks, streams may be cancelled, and shutdown() may also be called, which will make pendingStreams empty and null respectively. Looks I have to keep this check.

@zhangkun83 zhangkun83 merged commit 7ec8167 into grpc:master Dec 2, 2016
@zhangkun83 zhangkun83 deleted the lbv2_delayedtransport branch December 2, 2016 18:35
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants