Significant network throughput drop after upgrading from 1.27.0 to 1.31.0 (Apache v2) #820
I came up with some code that reproduces this issue. It wasn't easy to find an external server where I could dump some random content and measure average latency/throughput from the office, so I decided to mimic what our product actually does: pushing a blob to Docker Hub. The slowdown is already significant at our Google office.
The code below writes 40MB of data to Docker Hub. It requires an actual Docker Hub account. You can create one easily in a second. Or, I can share my password privately.
import com.google.api.client.http.*;
import com.google.api.client.http.apache.v2.ApacheHttpTransport;
import com.google.common.io.CharStreams;
import com.google.common.net.MediaType;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.*;
public class Main {
// HttpContent implementation that streams the requested number of megabytes of zero-filled data.
static class RandoContent implements HttpContent {
private final int megaBytes;
public RandoContent(int megaBytes) { this.megaBytes = megaBytes; }
@Override public long getLength() throws IOException { return (long) megaBytes * 1024 * 1024; }
@Override public String getType() { return MediaType.OCTET_STREAM.toString(); }
@Override public boolean retrySupported() { return true; }
@Override public void writeTo(OutputStream out) throws IOException {
byte[] rando = new byte[1024];
for (int i = 0; i < megaBytes * 1024; i++) out.write(rando);
out.flush();
}
}
public static void main(String[] args) throws IOException {
String username = "francium25";
String password = ...
String dockerHubRepo = "francium25/test";
String authToken = getAuthToken(dockerHubRepo, username, password); // don't show this in public
String uploadUrl = getUploadUrl(dockerHubRepo, authToken);
System.out.println("start timing " + ApacheHttpTransport.class.getPackage().getName() + "...");
long started = System.nanoTime();
new ApacheHttpTransport().createRequestFactory()
.buildPatchRequest(new GenericUrl(uploadUrl), new RandoContent(40))
.setHeaders(new HttpHeaders().setAuthorization("Bearer " + authToken))
.execute();
System.out.println("elapsed (s): " + (System.nanoTime() - started) / 100000000L / 10.);
}
private static String getAuthToken(String repository, String username, String password) throws IOException {
String authUrl = "https://auth.docker.io/token?service=registry.docker.io&scope=repository:" + repository + ":pull,push";
HttpResponse res = new ApacheHttpTransport().createRequestFactory()
.buildGetRequest(new GenericUrl(authUrl))
.setHeaders(new HttpHeaders().setBasicAuthentication(username, password))
.execute();
String json = CharStreams.toString(new InputStreamReader(res.getContent(), StandardCharsets.UTF_8));
Matcher m = Pattern.compile("\"token\"\\s*:\\s*\"([^\"]+)\"").matcher(json);
m.find();
return m.group(1); // don't show this in public
}
private static String getUploadUrl(String repository, String authToken) throws IOException {
String uploadInitUrl = "https://registry-1.docker.io/v2/" + repository + "/blobs/uploads/";
HttpResponse res =
new ApacheHttpTransport().createRequestFactory()
.buildPostRequest(new GenericUrl(uploadInitUrl), new RandoContent(0))
.setHeaders(new HttpHeaders().setAuthorization("Bearer " + authToken))
.execute();
return res.getHeaders().getLocation();
}
}
This might help with #820 @chanseokoh @chingor13
Do any of these look like likely causes? https://issues.apache.org/jira/projects/HTTPCORE/versions/12344865
1.27.0 used <project.httpclient.version>4.5.5</project.httpclient.version>, so there's been a lot of time for performance regressions to sneak in. We might need to do some HTTP debugging and packet inspection here. It might also be worthwhile to pin the versions of httpclient and httpcore to 4.5.5 in your own project and see if that fixes it.
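For reference, a rough pom.xml sketch of that pin might look like the following (assuming Maven; 4.4.9 is, to the best of my knowledge, the httpcore release that httpclient 4.5.5 was built against):

<!-- Sketch only: force the Apache HTTP libraries back to the versions used by 1.27.0. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.5</version>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpcore</artifactId>
      <version>4.4.9</version>
    </dependency>
  </dependencies>
</dependencyManagement>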
I did some quick tests.
Looks like it's not the Apache library, but some change on our side in 1.30.2.
One thing that comes to mind is #715. 1.30.1 wasn't configuring the values the code was supposed to set. But could fixing the bug so that those values are actually set have adversely affected performance?
I think this deserves priority. People (including us) can't and shouldn't use recent versions of this library, because performance has degraded to the point of being unusable.
It's worth listing out all the socket options that are being set, either directly or by default. Then we can run it by a TCP guru to see if any of them jump out as suspicious. If not, we can do more detailed testing and experiments. Based on your experiments so far, I'd guess that one or more of the TCP options, previously ignored, is now being set to a bad value. If we can identify the culprit, it should be a simple case of setting it to a good value for your use case.
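As a starting point, here is a minimal sketch (my own, not from the library) that prints the JVM/OS defaults for the usual suspects on a plain java.net.Socket connected to the Docker registry; those can then be compared against whatever values the transport actually sets:

import java.net.InetSocketAddress;
import java.net.Socket;

public class SocketDefaults {
  public static void main(String[] args) throws Exception {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress("registry-1.docker.io", 443), 10_000);
      // Default TCP options as seen by the JVM; the HTTP transport may override some of these.
      System.out.println("SO_SNDBUF    = " + socket.getSendBufferSize());
      System.out.println("SO_RCVBUF    = " + socket.getReceiveBufferSize());
      System.out.println("TCP_NODELAY  = " + socket.getTcpNoDelay());
      System.out.println("SO_KEEPALIVE = " + socket.getKeepAlive());
      System.out.println("SO_LINGER    = " + socket.getSoLinger());
      System.out.println("SO_TIMEOUT   = " + socket.getSoTimeout());
    }
  }
}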
Some of the defaults for TCP were set back in the dark ages of the 1980s, and don't make a lot of sense for today's networks...
In particular, try bumping the send and receive buffer sizes from 8K to 64K and see if that helps.
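An untested sketch of how that could be wired up, assuming Apache HttpClient 4.5's SocketConfig and the ApacheHttpTransport(HttpClient) constructor (the 64 KB value is just the suggestion above, not a verified fix):

import com.google.api.client.http.apache.v2.ApacheHttpTransport;
import org.apache.http.client.HttpClient;
import org.apache.http.config.SocketConfig;
import org.apache.http.impl.client.HttpClients;

public class BiggerBuffers {
  static ApacheHttpTransport transportWithLargerBuffers() {
    // Request 64 KB socket send/receive buffers instead of the 8 KB default.
    SocketConfig socketConfig =
        SocketConfig.custom().setSndBufSize(64 * 1024).setRcvBufSize(64 * 1024).build();
    HttpClient httpClient = HttpClients.custom().setDefaultSocketConfig(socketConfig).build();
    return new ApacheHttpTransport(httpClient);
  }
}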
I see an issue with your sample code. For large files you should use request/response entity streaming. |
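If I read that suggestion correctly, in google-http-client terms it would mean streaming the body with InputStreamContent rather than a hand-rolled HttpContent. A rough sketch (the path, size, and URL parameters are placeholders of mine):

import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpHeaders;
import com.google.api.client.http.InputStreamContent;
import com.google.api.client.http.apache.v2.ApacheHttpTransport;
import java.io.FileInputStream;
import java.io.IOException;

public class StreamingUpload {
  static void upload(String uploadUrl, String authToken, String path, long size) throws IOException {
    // Stream the file body; setting the length lets the client send Content-Length
    // instead of falling back to chunked transfer encoding.
    InputStreamContent content =
        new InputStreamContent("application/octet-stream", new FileInputStream(path));
    content.setLength(size);
    new ApacheHttpTransport()
        .createRequestFactory()
        .buildPatchRequest(new GenericUrl(uploadUrl), content)
        .setHeaders(new HttpHeaders().setAuthorization("Bearer " + authToken))
        .execute();
  }
}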
Fixed by #855.
I've filed several issues here, and I saw that 1.31.0 is the first release that contains all the fixes I needed. We finally managed to upgrade from 1.27.0 to 1.31.0 (Apache v2) and get rid of all deprecated usage.
However, after making a release of our product, we suddenly got numerous reports of slow network operations (slow writes at least), often slow to the point of hanging or taking a couple of hours. At first I thought these were isolated issues in their environments, but we soon realized the problem is universal and consistent. The throughput drop is already significant on the fast network at the office (for example, 9s vs 67s). The symptom gets dramatically worse when you are on a not-so-fast network or when an operation takes longer to complete (I will explain why later). For example, in my own testing at home, the same task takes ~45s with 1.27.0, but it hangs indefinitely and times out after 17 minutes with 1.31.0.
Our team members individually verified the performance drop. We were able to track the cause down to this library: downgrading to 1.27.0 resolves the issue, so for now we have decided to go back to 1.27.0. We don't know which component, at which level, is causing the issue.
However, we cannot hold off upgrading the library forever, so this issue must eventually be tracked down for us to move forward.
Here's one interesting observation from my own experiment, which I hope will shed some light:
When uploading ~40MB of data in a fresh single HTTP request, the throughput is always steadily >1MB/s with 1.27.0; the whole operation takes ~30 seconds or more. But with 1.31.0, the throughput starts lower, say ~800KB/s, and gradually wanes over time to 600KB/s, 400KB/s, ..., all the way down to absolute zero. No activity in the end. After 15 minutes or so, it throws
java.net.SocketException: Connection timed out (Write failed)