jenkins spark tests failing with 404 #3591

Closed
lbergelson opened this issue Sep 19, 2017 · 17 comments · Fixed by #5135

Comments

@lbergelson
Member

The Jenkins Spark tests are failing with the error below. This seems to have been introduced in #3576.

code:      0
message:   Error code 404 trying to get security access token from Compute Engine metadata for the default service account. This may be because the virtual machine instance does not have permission scopes specified.
reason:    null
location:  null
retryable: false
com.google.cloud.storage.StorageException: Error code 404 trying to get security access token from Compute Engine metadata for the default service account. This may be because the virtual machine instance does not have permission scopes specified.
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:189)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:339)
	at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:197)
	at com.google.cloud.storage.StorageImpl$5.call(StorageImpl.java:194)
	at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:91)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:54)
	at com.google.cloud.storage.StorageImpl.get(StorageImpl.java:194)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.checkAccess(CloudStorageFileSystemProvider.java:614)
	at java.nio.file.Files.exists(Files.java:2385)
	at htsjdk.samtools.util.IOUtil.assertFileIsReadable(IOUtil.java:346)
	at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:206)
	at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:162)
	at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:118)
	at org.broadinstitute.hellbender.engine.ReadsDataSource.<init>(ReadsDataSource.java:87)
	at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSource.getHeader(ReadsSparkSource.java:182)
	at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeReads(GATKSparkTool.java:390)
	at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.initializeToolInputs(GATKSparkTool.java:370)
	at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:360)
	at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
	at org.broadinstitute.hellbender.Main.main(Main.java:233)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Error code 404 trying to get security access token from Compute Engine metadata for the default service account. This may be because the virtual machine instance does not have permission scopes specified.
	at shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials.refreshAccessToken(ComputeEngineCredentials.java:137)
	at shaded.cloud_nio.com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:160)
	at shaded.cloud_nio.com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:146)
	at shaded.cloud_nio.com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
	at com.google.cloud.http.HttpTransportOptions$1.initialize(HttpTransportOptions.java:157)
	at shaded.cloud_nio.com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:300)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:337)
	... 32 more
ERROR: (gcloud.dataproc.jobs.submit.spark) Job [cb87810a-0133-42b3-a954-363b62adce39] entered state [ERROR] while waiting for [DONE].
@lbergelson
Member Author

@davidbernick I opened this issue to track the problem.

@lbergelson
Member Author

lbergelson commented Sep 19, 2017

@mwalker174 is encountering the same problem in the wild. He reports that it goes away if you set the environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY.

He's seeing the warning message:

16:55:09.480 WARN  SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly

which should only appear during tests, so something is strange.

@lbergelson
Member Author

As far as I can tell, getting that error message means that BaseTest is being loaded at runtime and running its static initializer block, which calls SparkContextFactory.enableTestSparkContext();

lbergelson added a commit that referenced this issue Sep 19, 2017
This reverts commit b47838c.
This commit introduced a major issue for spark #3591.
@cmnbroad
Collaborator

cmnbroad commented Sep 20, 2017

I think the message isn't coming from BaseTest; it's coming from a static block in SparkContextFactory:

at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.getGcsHadoopAdapterTestProperties(SparkContextFactory.java:68)
at org.broadinstitute.hellbender.engine.spark.SparkContextFactory.<clinit>(SparkContextFactory.java:59)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineArgumentCollection.<init>(SparkCommandLineArgumentCollection.java:20)
at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.<init>(SparkCommandLineProgram.java:30)
at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.<init>(GATKSparkTool.java:64)
at org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark.<init>(PrintReadsSpark.java:19)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.broadinstitute.hellbender.Main.extractCommandLineProgram(Main.java:285)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:150)
at org.broadinstitute.hellbender.Main.main(Main.java:233)

@droazen
Contributor

droazen commented Sep 20, 2017

@cmnbroad @lbergelson Looks like SparkContextFactory.DEFAULT_TEST_PROPERTIES is currently initialized statically at class-loading time, resulting in a call to getGcsHadoopAdapterTestProperties() even when we're not running the test suite.
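
One possible way to avoid the class-loading-time initialization described above is the lazy holder idiom, sketched below. This is only an illustration and not the actual GATK code: `LazyTestProperties`, `loadTestProperties()`, the `lookups` counter, and the `test.project` key are hypothetical stand-ins; only the environment variable name comes from this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a factory class that owns test-only properties.
final class LazyTestProperties {
    // Counts how often the lookup ran; exists only to make the laziness observable.
    static int lookups = 0;

    // Hypothetical replacement for an environment-dependent lookup such as
    // getGcsHadoopAdapterTestProperties(). The "test.project" key is made up.
    static Map<String, String> loadTestProperties() {
        lookups++;
        Map<String, String> props = new HashMap<>();
        String project = System.getenv("HELLBENDER_TEST_PROJECT");
        if (project != null) {
            props.put("test.project", project);
        }
        return Collections.unmodifiableMap(props);
    }

    // The JVM initializes this nested class only when it is first referenced,
    // so loading LazyTestProperties itself does not trigger the lookup.
    private static final class Holder {
        static final Map<String, String> DEFAULT_TEST_PROPERTIES = loadTestProperties();
    }

    // Test code calls this; non-test code paths never touch Holder.
    static Map<String, String> getDefaultTestProperties() {
        return Holder.DEFAULT_TEST_PROPERTIES;
    }
}
```

With this shape, merely constructing a tool that references the outer class would not run the lookup or log its warning; the lookup runs once, on the first call to `getDefaultTestProperties()`.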

@droazen
Contributor

droazen commented Sep 20, 2017

@jean-philippe-martin Can you comment on this one? It looks like google-cloud-java recently bumped their google-auth-library-credentials and google-auth-library-oauth2-http dependencies to 0.8.0 -- was there some change that would require us to modify our authentication-related code in GATK, and/or the permissions setup in our Google Cloud project, that could explain the error:

Error code 404 trying to get security access token from Compute Engine metadata for the default service account. This may be because the virtual machine instance does not have permission scopes specified.

@mwalker174
Contributor

I was just talking with @vruano. The error might come from improper permissions/roles being set up on the cluster by default.

@lbergelson
Member Author

It's weird that it worked before though if roles aren't set up right. It seems like security issues shouldn't be solved by asking people to upgrade their client software so that it can deny them permission.

@droazen
Contributor

droazen commented Sep 20, 2017

It seems plausible to me, though, that the Google auth library may have been patched to perform checks that it wasn't performing previously. Maybe our project permissions have always been mis-configured :)

@jean-philippe-martin
Contributor

Sorry guys, I have no special insight on this. Do you have a command line so I can try to reproduce locally?

@mwalker174
Contributor

So this seems to only happen when trying to access a bucket from a job on dataproc. For example, the following throws the error:

./gatk-launch PathSeqFilterSpark -I gs://bucket/in.bam -O gs://bucket/out.bam -- --sparkRunner GCS --cluster my-cluster

but the following does not:

./gatk-launch PathSeqFilterSpark -I hdfs://bams/in.bam -O hdfs://bams/out.bam -- --sparkRunner GCS --cluster my-cluster

This happens even if I launch the cluster "gcloud dataproc clusters create ... --scope cloud-platform", which is supposed to grant full storage permissions. I believe this is equivalent to checking the "Allow API access to all Google Cloud Services" box if you launch a cluster through the web console.

Also explicitly adding the service account as a "storage legacy bucket owner" does not seem to help.

@jean-philippe-martin
Contributor

jean-philippe-martin commented Sep 21, 2017

OK so just following along; the problem appears related to the Google Cloud Storage Connector and its configuration. When running on Cloud we need to ask for the https://www.googleapis.com/auth/devstorage.read_write scope, as described in the install docs. But you're right that https://www.googleapis.com/auth/cloud-platform should imply that so it should work...

The command line argument is --scopes (plural) and not --scope, but that's probably not the issue; the tool would have complained if you had actually typed --scope in there.

Perhaps the code is trying to do the non-cloud setup and that's what's making it not work on cloud?

@lbergelson
Member Author

This may be related to #3491, although that one predates this by quite a bit.

@jamesemery
Collaborator

jamesemery commented Sep 21, 2017

Apparently related, just running IndexFeatureFile on my machine results in several stack traces:

Sep 21, 2017 4:10:53 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
WARNING: Failed to detect whether we are running on Google Compute Engine.
java.net.ConnectException: Host is down (connect failed)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
	at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
	at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
	at shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials.runningOnComputeEngine(ComputeEngineCredentials.java:176)
	at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.tryGetComputeCredentials(DefaultCredentialsProvider.java:270)
	at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentialsUnsynchronized(DefaultCredentialsProvider.java:194)
	at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentials(DefaultCredentialsProvider.java:112)
	at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:113)
	at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:86)
	at com.google.cloud.ServiceOptions.defaultCredentials(ServiceOptions.java:277)
	at com.google.cloud.ServiceOptions.<init>(ServiceOptions.java:252)
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:82)
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:30)
	at com.google.cloud.storage.StorageOptions$Builder.build(StorageOptions.java:77)
	at org.broadinstitute.hellbender.utils.gcs.BucketUtils.setGlobalNIODefaultOptions(BucketUtils.java:361)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:155)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
	at org.broadinstitute.hellbender.Main.main(Main.java:233)

and

WARNING: Failed to detect whether we are running on Google Compute Engine.
java.net.ConnectException: Host is down (connect failed)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
	at sun.net.www.http.HttpClient.New(HttpClient.java:339)
	at sun.net.www.http.HttpClient.New(HttpClient.java:357)
	at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
	at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
	at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
	at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
	at shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials.runningOnComputeEngine(ComputeEngineCredentials.java:176)
	at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.tryGetComputeCredentials(DefaultCredentialsProvider.java:270)
	at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentialsUnsynchronized(DefaultCredentialsProvider.java:194)
	at shaded.cloud_nio.com.google.auth.oauth2.DefaultCredentialsProvider.getDefaultCredentials(DefaultCredentialsProvider.java:112)
	at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:113)
	at shaded.cloud_nio.com.google.auth.oauth2.GoogleCredentials.getApplicationDefault(GoogleCredentials.java:86)
	at com.google.cloud.ServiceOptions.defaultCredentials(ServiceOptions.java:277)
	at com.google.cloud.ServiceOptions.<init>(ServiceOptions.java:252)
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:82)
	at com.google.cloud.storage.StorageOptions.<init>(StorageOptions.java:30)
	at com.google.cloud.storage.StorageOptions$Builder.build(StorageOptions.java:77)
	at org.broadinstitute.hellbender.utils.gcs.BucketUtils.setGlobalNIODefaultOptions(BucketUtils.java:361)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:155)
	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:131)
	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:152)
	at org.broadinstitute.hellbender.Main.main(Main.java:233)

I was able to fix the issue by setting the environment variable NO_GCE_CHECK=true in my shell, though.
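
For illustration, here is a minimal sketch of how an environment variable like this could short-circuit a network probe so that no connection attempt (and no stack trace) happens off-cloud. It is an assumption about the general pattern, not the google-auth-library implementation: `GceCheckGuard`, `shouldProbe`, and the `probe` callback are hypothetical names, and the environment is passed in as a map so the behavior can be exercised without changing the real shell environment.

```java
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical guard around an expensive "are we on Compute Engine?" probe.
final class GceCheckGuard {
    // Variable name as reported in this thread.
    static final String NO_GCE_CHECK = "NO_GCE_CHECK";

    // Probe unless the variable is set to "true" (case-insensitive).
    // Boolean.parseBoolean(null) is false, so an unset variable means "probe".
    static boolean shouldProbe(Map<String, String> env) {
        return !Boolean.parseBoolean(env.get(NO_GCE_CHECK));
    }

    // Returns false without invoking the probe when the check is disabled;
    // otherwise returns whatever the probe reports.
    static boolean runningOnComputeEngine(Map<String, String> env, BooleanSupplier probe) {
        if (!shouldProbe(env)) {
            return false;
        }
        return probe.getAsBoolean();
    }
}
```

In real use the map would be `System.getenv()` and the probe would be the metadata-server request that produced the `ConnectException` traces above.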

@droazen
Contributor

droazen commented Sep 21, 2017

@mwalker174 @lbergelson @jean-philippe-martin I was able to fix the 404 error by building a custom version of the latest master of google-cloud-java with the following patch:

diff --git a/pom.xml b/pom.xml
index 0a77a625b0..e0884bbf2d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -131,10 +131,10 @@
     <api-client.version>1.22.0</api-client.version>
 
     <api-common.version>1.1.0</api-common.version>
-    <gax.version>1.8.1</gax.version>
-    <gax-grpc.version>0.25.1</gax-grpc.version>
+    <gax.version>1.8.0</gax.version>
+    <gax-grpc.version>0.25.0</gax-grpc.version>
     <generatedProto.version>0.1.19</generatedProto.version>
-    <google.auth.version>0.8.0</google.auth.version>
+    <google.auth.version>0.7.0</google.auth.version>
     <grpc.version>1.6.1</grpc.version>
     <guava.version>20.0</guava.version>
     <http-client.version>1.22.0</http-client.version>

So it's likely the google.auth.version bump that introduced the error -- in particular, the change described in https://github.com/google/google-auth-library-java/releases/tag/v0.7.1 and implemented in googleapis/google-auth-library-java#110

@droazen
Contributor

droazen commented Sep 21, 2017

I've updated googleapis/google-cloud-java#2453 with this result -- we'll see what they say.

@droazen droazen added this to the Engine-4.1 milestone Jan 16, 2018
@droazen droazen modified the milestones: Engine-4.1, Engine-1Q2018 Feb 5, 2018
@droazen droazen modified the milestones: Engine-1Q2018, Engine-2Q2018 Apr 6, 2018
@droazen
Contributor

droazen commented Aug 24, 2018

Fixed in google-cloud-java 0.59.0

droazen added a commit that referenced this issue Aug 24, 2018
…f of our custom fork

The google-cloud-java maintainers have merged a fix for the longstanding issue
googleapis/google-cloud-java#2453 that prevented us
from running on a modern version of the library, and forced us to run off of a fork.
This PR updates us to the latest release, which incorporates the fix.

Resolves #3591
Resolves #3500
Resolves #4986
droazen added a commit that referenced this issue Aug 24, 2018
…f of our custom fork (#5135)

The google-cloud-java maintainers have merged a fix for the longstanding issue
googleapis/google-cloud-java#2453 that prevented us
from running on a modern version of the library, and forced us to run off of a fork.
This PR updates us to the latest release, which incorporates the fix.

Resolves #3591
Resolves #3500
Resolves #4986