Authentication error after upgrading to 0.23.1 #2453
Comments
@lbergelson From a quick scan of google-auth-client-java, it seems there was a code change regarding Compute Engine credentials in version 0.7.1. Could you check whether you can reproduce this issue with auth 0.7.1? (I see you use Gradle, so I assume that simply declaring auth 0.7.1 as an explicit compile dependency should override auth 0.7.0.)
@neozwu I tried running with 0.22.0 but forcing the version of auth to 0.7.1, and was unable to reproduce the error that way. I also tried forcing 0.8.0 and wasn't able to reproduce the problem that way either. So either I'm building it wrong or it's not the auth library. From running
@neozwu I was able to resolve the 404 error that @lbergelson reported by building a custom version of the latest master of
So, it's very likely that the culprit is the It looks like they moved from using a domain name for the |
So it seems like the reason that I wasn't able to reproduce the issue by forcing 0.7.1 was that we're using the shaded jar, so the force statement wasn't actually replacing the dependencies. |
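A quick way to check which jar a class is actually served from, and therefore whether a forced dependency version is in play at all, is to print its code source. The following is a minimal sketch that uses only the JDK. The class name `ClassOrigin` is made up for illustration, and the class names to inspect (original or relocated) are assumptions you would need to read out of your own shaded jar.

```java
import java.net.URL;
import java.security.CodeSource;

final class ClassOrigin {
    /** Returns the jar or directory a class was loaded from, or a marker when unknown. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
            URL location = (source == null) ? null : source.getLocation();
            return (location == null) ? "(bootstrap or unknown)" : location.toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(could not be loaded)";
        }
    }

    public static void main(String[] args) {
        // Pass fully qualified class names as arguments, for example the
        // unshaded and the relocated name of the same credentials class.
        for (String name : args) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the application's classpath and pass the class names as arguments. If a relocated class resolves to the shaded jar, overriding the unshaded artifact has no effect on it.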
Any suggestions on how we could go about dealing with this issue? Does anyone think that the hardcoded IP address added in https://github.com/google/google-auth-library-java/pull/110/files could be the cause? |
Does anyone have any input on this one? It's important to us that we be able to upgrade. |
The GCE metadata server team specifically recommends using the IP address as a best practice to avoid DNS issues when contacting the metadata server. Note that all of the Apiary client libraries have been using the fixed IP address for quite some time and, to my knowledge, have not experienced this issue. The downside is that if the GCE metadata server team ever changed the IP address of the metadata server, it would break everyone who hardcoded the address.
Since you have a complicated setup, it would be best to first run a simple program that only invokes google-auth-library-java 0.7.1 from within the same environment, without any additional dependencies. That would exclude your network/host setup from the failure scenario. Alternatively, shell out from your application, use curl to contact the metadata server via the IP address directly, and log what you get back. If the much simpler application or the shelled-out request works, you have a Java dependency or configuration issue. Once the host/network setup is excluded, comparing the detailed Maven dependency trees of the working and broken versions could help. For every library that changed, try a strict override of that one dependency (keeping all other transitive dependencies the same) to narrow down which one causes the failure.
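The metadata server check described above can also be approximated in plain Java without any Google libraries. This is a rough sketch, not the auth library's implementation: it requests the default service account token from the metadata server at both the fixed IP address (169.254.169.254) and the DNS name (metadata.google.internal), sending the `Metadata-Flavor: Google` header that the server requires. The class name `MetadataProbe` and the two-second timeouts are arbitrary choices made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class MetadataProbe {
    // The two well-known addresses of the GCE metadata server.
    static final String[] HOSTS = {"169.254.169.254", "metadata.google.internal"};

    /** Builds the token URL for the default service account on the given host. */
    static String tokenUrl(String host) {
        return "http://" + host + "/computeMetadata/v1/instance/service-accounts/default/token";
    }

    /** Returns the HTTP status as text, or a description of the failure. */
    static String probe(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            // Requests without this header are rejected by the metadata server.
            conn.setRequestProperty("Metadata-Flavor", "Google");
            int status = conn.getResponseCode();
            conn.disconnect();
            return "HTTP " + status;
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        // Only the status is logged; the token itself is never printed.
        for (String host : HOSTS) {
            String url = tokenUrl(host);
            System.out.println(url + " -> " + probe(url));
        }
    }
}
```

On a GCE VM with a default service account, both addresses would be expected to answer with a success status. A 404 from only one of them, or only from inside the full application, would point at how the request URL is being built rather than at the network.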
I was able to reproduce the same exact error running gatk PrintReadsSpark on a fresh Dataproc cluster with no special configuration applied. This suggests cluster/firewall misconfiguration may not be the problem. My repro's very short:
That's great @jean-philippe-martin -- now we're getting somewhere! Can you test whether the error goes away if you build a custom |
Your repro package still seems like it brings in a bunch of dependencies via Spark/CloudStorageFilesystem. From your simple application, try shelling out and sending a curl request by using the raw IP address and the non IP address version:
I can confirm that shelling out from the app works as expected (and so does directly ssh'ing to the Dataproc cluster nodes). |
One difference (not sure if relevant) is that |
Can you dump the verbose version of your dependency tree? |
Sure, @lukecwik, here is what I have. Apologies, it's long.
I think I ran into this as well after updating and adding some dependencies. No problems before doing that. DIFF
Dependency Tree
Should also note I only get the 404 when running locally. Running the job from DataProc works fine. Rolling back from |
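For comparing a working and a broken build, as suggested earlier in the thread, a small helper can list which artifacts changed version. This is only a sketch under assumptions: it expects each build's dependencies as flat `group:artifact:version` lines (for instance, extracted by hand from a dependency tree dump), and the class name `DependencyDiff` and the coordinates in `main` are hypothetical placeholders, not the real dependencies from this issue.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class DependencyDiff {
    /** Maps "group:artifact" to its version for lines written as "group:artifact:version". */
    static Map<String, String> index(List<String> coordinates) {
        Map<String, String> versions = new TreeMap<>();
        for (String raw : coordinates) {
            String c = raw.trim();
            int cut = c.lastIndexOf(':');
            if (cut > 0) {
                versions.put(c.substring(0, cut), c.substring(cut + 1));
            }
        }
        return versions;
    }

    /** Lists artifacts whose version differs between builds or that exist in only one. */
    static Map<String, String> changed(List<String> working, List<String> broken) {
        Map<String, String> before = index(working);
        Map<String, String> after = index(broken);
        Set<String> keys = new TreeSet<>(before.keySet());
        keys.addAll(after.keySet());
        Map<String, String> result = new TreeMap<>();
        for (String key : keys) {
            String oldVersion = before.getOrDefault(key, "absent");
            String newVersion = after.getOrDefault(key, "absent");
            if (!oldVersion.equals(newVersion)) {
                result.put(key, oldVersion + " -> " + newVersion);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates, for illustration only.
        List<String> working = List.of("com.example:lib-a:1.0", "com.example:lib-b:2.0");
        List<String> broken = List.of("com.example:lib-a:1.1", "com.example:lib-b:2.0");
        changed(working, broken).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Each entry in the result is a candidate for a single-dependency override while everything else stays fixed.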
We also ran into authentication issues. The issue was exposed when updating to
I was able to isolate the commit that broke it, if it's any help: Maybe it's better for me to post that as an issue on that repo? |
@nicktrav Would you mind copying over your comment to a new issue on the google-api-java-client repo? |
@hzyi-google I tried creating a cluster with the initialization action:
The error that @droazen saw is gone, but the 404 error has returned.
Ack. |
@lbergelson Could you provide an up-to-date build config for your project and the dependency tree output? Also, you could try creating a cluster with the latest GCS and BQ connector versions, 1.9.0 and 0.13.0 respectively.
Looking into provided repro: https://github.com/jean-philippe-martin/nio-auth-repro |
@medb @jean-philippe-martin Thank you both. |
This is not a Dataproc specific issue, it is reproducible on vanilla GCE VM too. Presumably issue is caused by bug in shading configuration in newer versions of the Issue is not reproducible locally, because locally it uses client authentication, not a service account. Fix is here: jean-philippe-martin/nio-auth-repro#1 |
I don't understand what the issue with the shaded jar is. We're unable to use the unshaded jar due to dependency conflicts. Is there anyone who understands what the issue around the shading is? I discovered one bug in the shaded jar, but fixing it doesn't solve this issue. (#3540) One thing of note: if I ssh into the Dataproc master node and run the job there using spark-submit instead of passing it through Dataproc, I don't encounter the error.
@medb How do you reproduce on a vanilla GCE vm? We have not had any trouble running jobs on GCE with new versions of the library other than on dataproc. It would be useful if we could trigger it outside of dataproc to debug it more easily. |
I was using this repro app to run it locally, on a GCE VM, and on a Dataproc cluster. Locally (on my workstation) it always works, but only because it does not use a service account for authentication; on the GCE VM and on Dataproc it uses a service account.
@medb you're right, I was able to reproduce on a vanilla GCP VM. I started with an Ubuntu 16.04 image with "Allow full access to all Cloud APIs" enabled, then installed openJDK on it and ran the jar - it failed as before, complaining about potential misconfigured scopes. I also checked the jar that doesn't use the shaded version of NIO, and this worked. Just to check, I then typed "gcloud auth application-default login", and ran it again. This time, it worked beautifully. So it looks like there's something about the combination of default service account and shaded NIO jar that causes issues when running on Cloud. |
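The difference between the local and cloud runs comes down to which credential source Application Default Credentials picks up first. The sketch below is a deliberately simplified model of that lookup order (environment variable, then the file written by `gcloud auth application-default login`, then the metadata server). It is not the auth library's code: it skips other environment-specific checks the real library may perform, and it ignores any gcloud configuration directory overrides. The class name `AdcSource` and the returned labels are made up for illustration, and the file path shown applies to Linux and macOS.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class AdcSource {
    /** Simplified decision: which credential source would be found first. */
    static String resolve(String envPath, boolean wellKnownFileExists) {
        if (envPath != null && !envPath.isEmpty()) {
            return "file from GOOGLE_APPLICATION_CREDENTIALS";
        }
        if (wellKnownFileExists) {
            return "gcloud user credentials";
        }
        // Nothing local was found, so the metadata server is the fallback.
        return "metadata server (service account)";
    }

    /** Location written by "gcloud auth application-default login" on Linux and macOS. */
    static Path wellKnownFile() {
        return Paths.get(System.getProperty("user.home"),
                ".config", "gcloud", "application_default_credentials.json");
    }

    public static void main(String[] args) {
        String env = System.getenv("GOOGLE_APPLICATION_CREDENTIALS");
        System.out.println(resolve(env, Files.exists(wellKnownFile())));
    }
}
```

On a workstation where the gcloud login has been run, this reports user credentials, so the metadata server is never contacted. On a fresh VM it falls through to the metadata server, which matches where the failure showed up.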
I had an interesting result playing with the way the shading is done for the NIO jar. The current NIO jar uses a When used in my repro app, the resulting jar results in the test passing - both in a GCE VM and when used via Dataproc. This is very promising. The next step is to try this with GATK itself and see if that solves the problem. |
Great catch, this should be it! Usually |
@jean-philippe-martin That's great news! If you open a PR here, we'd be happy to try out the patch with GATK on our end. |
@jean-philippe-martin I tested your branch using GATK, and it appears to completely resolve this issue! |
@droazen what wonderful news! That's great! |
I was able to identify the root cause, it's https://issues.apache.org/jira/browse/MSHADE-156 |
…f of our custom fork The google-cloud-java maintainers have merged a fix for the longstanding issue googleapis/google-cloud-java#2453 that prevented us from running on a modern version of the library, and forced us to run off of a fork. This PR updates us to the latest release, which incorporates the fix. Resolves #3591 Resolves #3500 Resolves #4986
I've confirmed that this issue is resolved with the latest release (0.59.0), so this can finally be closed! Thanks to everyone for their assistance over the past year! |
…f of our custom fork (#5135) The google-cloud-java maintainers have merged a fix for the longstanding issue googleapis/google-cloud-java#2453 that prevented us from running on a modern version of the library, and forced us to run off of a fork. This PR updates us to the latest release, which incorporates the fix. Resolves #3591 Resolves #3500 Resolves #4986
This long nightmare is over, but I forgot to close the issue. Closing it. Thank you everyone. |
We've started seeing an authentication error in our project after we upgraded to 0.23.1; the issue also seems to be present in 0.24.0. Reverting to 0.22.0 solves the issue.
We see the following 404 error when running a Spark application that uses NIO to access GCS files:
Looking at the dependency updates in this project, it seems like one of the auth libraries was updated to version 0.8.0. Could that be causing the issue?
Is there some new configuration setting we should be using in our gcloud project? Any help would be appreciated.