Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JMX/RMI socket factory with timeout #298

Merged
merged 1 commit into from
Jun 10, 2020

Conversation

prognant
Copy link
Contributor

@prognant prognant commented May 25, 2020

Proper RMI/JMX socket factory with timeout for both SSL and non-SSL.
Should be a drop in replacement for actual code. I also suggest to remove Connection.connectWithTimeout because timeouts are enforced within the socket factory. I also tested the attach API and it seems that infinite freeze can not happen in that case.

rmi_connection_timeout & rmi_client_timeout have been retained to configure timeouts in the socket factory.

See code comment for a question about the mysterious attribute.remote.x.request.waiting.timeout property...

A piece of the code path that may block because of network timeout is located here it plays with different configurable timeout during the TCP connection sequence, it even overrides for certain steps the SO_TIMEOUT from the the socket factory in at least one place (Can be set with through the following property sun.rmi.transport.tcp.handshakeTimeout). But having the SO_TIMEOUT always set at the socket level ensure always having a configurable timeout on socket operation. The the initial connection timeout is managed differently to keep the ability to get a socket from any factory (used for SSL support at the moment). It shall be noticed that the connection timeout can be shortened but there is an upper bound for enforceable value. The upper bound depends on the OS settings (~1 min on average).

Timeout will then be logged as shown below :

2020-06-04 11:51:42,501 | INFO  | App | Could not initialize instance: jmx-10.121.132.56-7198: java.util.concurrent.ExecutionException: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: Exception creating connection to: 10.121.132.56; nested exception is:
        java.io.IOException: connect timed out: 10.121.132.56:7198]

For information the call stack while connecting, providing information about involved classes :

PlainSocketImpl.socketConnect(InetAddress,int,int)[native method] (/rt.jar/java.net/PlainSocketImpl.class:-1)
AbstractPlainSocketImpl.doConnect(InetAddress,int,int) (/rt.jar/java.net/AbstractPlainSocketImpl.class:350)
AbstractPlainSocketImpl.connectToAddress(InetAddress,int,int) (/rt.jar/java.net/AbstractPlainSocketImpl.class:206)
AbstractPlainSocketImpl.connect(SocketAddress,int) (/rt.jar/java.net/AbstractPlainSocketImpl.class:188)
SocksSocketImpl.connect(SocketAddress,int) (/rt.jar/java.net/SocksSocketImpl.class:392)
Socket.connect(SocketAddress,int) (/rt.jar/java.net/Socket.class:607)
Socket.connect(SocketAddress) (/rt.jar/java.net/Socket.class:556)
Socket.<init>(SocketAddress,SocketAddress,boolean) (/rt.jar/java.net/Socket.class:452)
Socket.<init>(String,int) (/rt.jar/java.net/Socket.class:229)
RMIDirectSocketFactory.createSocket(String,int) (/rt.jar/sun.rmi.transport.proxy/RMIDirectSocketFactory.class:40)
RMIMasterSocketFactory.createSocket(String,int) (/rt.jar/sun.rmi.transport.proxy/RMIMasterSocketFactory.class:148)
JmxfetchRmiClientSocketFactory.createSocket(String,int) (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/util/JmxfetchRmiClientSocketFactory.java:29)
TCPEndpoint.newSocket() (/rt.jar/sun.rmi.transport.tcp/TCPEndpoint.class:617)
TCPChannel.createConnection() (/rt.jar/sun.rmi.transport.tcp/TCPChannel.class:216)
TCPChannel.newConnection() (/rt.jar/sun.rmi.transport.tcp/TCPChannel.class:202)
UnicastRef.newCall(RemoteObject,Operation[],int,long) (/rt.jar/sun.rmi.server/UnicastRef.class:343)
RegistryImpl_Stub.lookup(String) (/rt.jar/sun.rmi.registry/RegistryImpl_Stub.class:116)
RegistryContext.lookup(Name) (/rt.jar/com.sun.jndi.rmi.registry/RegistryContext.class:132)
GenericURLContext.lookup(String) (/rt.jar/com.sun.jndi.toolkit.url/GenericURLContext.class:205)
InitialContext.lookup(String) (/rt.jar/javax.naming/InitialContext.class:417)
RMIConnector.findRMIServerJNDI(String,Map,boolean) (/rt.jar/javax.management.remote.rmi/RMIConnector.class:1955)
RMIConnector.findRMIServer(JMXServiceURL,Map) (/rt.jar/javax.management.remote.rmi/RMIConnector.class:1922)
RMIConnector.connect(Map) (/rt.jar/javax.management.remote.rmi/RMIConnector.class:287)
JMXConnectorFactory.connect(JMXServiceURL,Map) (/rt.jar/javax.management.remote/JMXConnectorFactory.class:270)
Connection.createConnection() (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/Connection.java:63)
RemoteConnection.<init>(Map) (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/RemoteConnection.java:126)
ConnectionFactory.createConnection(Map) (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/ConnectionFactory.java:38)
Instance.getConnection(Map,boolean) (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/Instance.java:371)
Instance.init(boolean) (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/Instance.java:384)
InstanceInitializingTask.call() (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/InstanceInitializingTask.java:15)
InstanceInitializingTask.call() (/Users/pierre.rognant/git/jmxfetch/src/main/java/org/datadog/jmxfetch/InstanceInitializingTask.java:3)
FutureTask.run() (/rt.jar/java.util.concurrent/FutureTask.class:266)
ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) (/rt.jar/java.util.concurrent/ThreadPoolExecutor.class:1149)
ThreadPoolExecutor$Worker.run() (/rt.jar/java.util.concurrent/ThreadPoolExecutor.class:624)
Thread.run() (/rt.jar/java.lang/Thread.class:748)

@prognant prognant added the wip label May 25, 2020
@prognant prognant added this to the Next milestone May 25, 2020
@@ -68,7 +61,7 @@ protected void createConnection() throws IOException {
this.env.put("attribute.remote.x.request.waiting.timeout", CONNECTION_TIMEOUT);
Copy link
Contributor Author

@prognant prognant May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short story : I think this is a no-op property.

Long story: I tried to find out where this property might be used. On the "official" oracle side I found something similar only in the the jmx spec with a minor variation (jmx.remore.x.request.waiting.timeout listed as a non-standard parameter). We were using this value a while ago (change), initially added with this change. So far the only implementation I found that seems to honour this parameter is weblogic, I did not find any clue about this property inside the JDK.

Copy link

@sirianni sirianni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, there might be a small change in behavior at connect-time but this looks much better. Great stuff.

public Socket createSocket(String host, int port) throws IOException {
Socket socket = factory.createSocket(host, port);
socket.setSoTimeout(timeoutMs);
socket.setSoLinger(false, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍰

@Override
public Socket createSocket(String host, int port) throws IOException {
Socket socket = factory.createSocket(host, port);
socket.setSoTimeout(timeoutMs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past we had two different timeout values:

  • sun.rmi.transport.proxy.connectTimeout (connect TO)
  • sun.rmi.transport.tcp.responseTimeout (read TO)

This socket-level, and more explicit SO_TIMEOUT is probably fine for all cases, but we may be potentially changing behavior on new connections by dropping the rmi_connection_timeout. It does feel like a bit of a redundant option, but I'm wondering if we didn't have to set that for some weird edge-case.

Also, I may be missing something!

Copy link
Contributor Author

@prognant prognant May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the usage of sun.rmi.transport.tcp.responseTimeout because it sets SO_TIMEOUT later in the whole thing lifecyle (as per 1 and 2) and some earlier operation would occur without having the SO_TIMEOUT set.

AFAIK the latter option sun.rmi.transport.proxy.connectTimeout should not have impact on our base case as far as my test went, we exit the socket factory here, before using the property value. I had a quick look in the code base history I did not find where we were using this property...

I think I will still try to rework the factory a little bit to have the timeout enforced during initial connection time. Because removing the mailbox made the code relying on the OS timeout behavior for the initial connection as the current factory sets the timeout after the initial connection (then I will retained the rmi_connection_timeout and come up with the exact same behaviour as before) WDYT ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@truthbk I did something similar to the mailbox code (:trollface:) thus we are getting back the rmi_connection_timeout, however it's fully located in the socket factory so any subsequent connection not going through RemoteConnection will still be subject to the timeout and I think all case are covered (i.e. SSL/non-SSL, initial connection, subsequent connection directly by the RMI subsystem, timeout during socket lifetime).
@sirianni it should bring an effective & generic solution to the issue #279.

@prognant prognant requested a review from truthbk June 4, 2020 13:13
@prognant prognant force-pushed the prognant/rmi-socket-factory-with-timeout branch from c18678b to 4f3c530 Compare June 4, 2020 13:18
Copy link
Member

@truthbk truthbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍

@prognant prognant merged commit 52f78f2 into master Jun 10, 2020
@prognant prognant deleted the prognant/rmi-socket-factory-with-timeout branch June 10, 2020 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants