Issues submitting Dataflow jobs with v1.23.0 #1073

Closed
nicktrav opened this issue Oct 26, 2017 · 2 comments
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release.

Comments

@nicktrav

Porting this over from googleapis/google-cloud-java#2453.

We're running into some authentication issues on the latest version (1.23.0). Here's what we see when trying to interface with GCS via some Beam / Dataflow jobs, for example:

Exception in thread "main" java.lang.RuntimeException: Error while staging packages
	at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:322)
	at org.apache.beam.runners.dataflow.util.PackageUtil.stageClasspathElements(PackageUtil.java:263)
	at org.apache.beam.runners.dataflow.util.GcsStager.stageFiles(GcsStager.java:65)
	at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:503)
	at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:153)
	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:295)
	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:281)
Caused by: java.lang.RuntimeException: Could not stage classpath element: /Users/nickt/Development/java/dist/realtime.jar
	at org.apache.beam.runners.dataflow.util.PackageUtil.stageOnePackage(PackageUtil.java:247)
	at org.apache.beam.runners.dataflow.util.PackageUtil.access$100(PackageUtil.java:65)
	at org.apache.beam.runners.dataflow.util.PackageUtil$2.run(PackageUtil.java:312)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
	at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
	at org.apache.beam.runners.dataflow.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Error executing batch GCS request
	at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:603)
	at org.apache.beam.sdk.util.GcsUtil.getObjects(GcsUtil.java:342)
	at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.matchNonGlobs(GcsFileSystem.java:217)
	at org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystem.match(GcsFileSystem.java:86)
	at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:111)
	at org.apache.beam.sdk.io.FileSystems.matchSingleFileSpec(FileSystems.java:141)
	at org.apache.beam.runners.dataflow.util.PackageUtil.stageOnePackage(PackageUtil.java:202)
	... 9 more
Caused by: java.util.concurrent.ExecutionException: com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
	at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:500)
	at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:479)
	at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:76)
	at org.apache.beam.sdk.util.GcsUtil.executeBatches(GcsUtil.java:595)
	... 15 more
Caused by: com.google.api.client.http.HttpResponseException: 404 Not Found
Not Found
	at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1070)
	at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:241)
	at org.apache.beam.sdk.util.GcsUtil$3.call(GcsUtil.java:588)
	at org.apache.beam.sdk.util.GcsUtil$3.call(GcsUtil.java:586)
	at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
	at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
	at org.apache.beam.sdks.java.extensions.google.cloud.platform.core.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
	... 3 more

I was able to isolate the commit that broke it, if it's any help: 22e7683. Concretely, we can submit jobs just fine prior to this SHA.

Reverting to 1.22.0 fixes the issue in our case.

@mattwhisenhunt mattwhisenhunt added the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Oct 27, 2017
@frew

frew commented Nov 13, 2017

#1074 has the same root cause as this issue. As @nicktrav helpfully pointed out, 22e7683 by @ethanbao and @neozwu breaks clients of the library that don't call setBatchPath(), since the batch path now defaults to null instead of "batch" as it did previously. Was this breakage intentional? Wouldn't it be strictly better for the Builder to default to the old behavior?
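To make the failure mode concrete, here is a hypothetical Java sketch (not the library's actual code) of how an unset batch path can produce a bad batch endpoint when it is concatenated onto the root URL, next to the "batch" fallback suggested in the comment. The class name `BatchUrlDemo`, its methods, and the root URL `https://www.googleapis.com/` are assumptions for illustration. It is also an assumption that 1.23.0 builds the URL exactly this way, although a request to a nonexistent path would be consistent with the 404 in the trace.

```java
/**
 * Hypothetical illustration only: not the google-api-java-client source.
 * Shows how a null batch path turns into a bogus URL, and a defensive default.
 */
public class BatchUrlDemo {

    // Assumed root URL for illustration; real clients get this from their Builder.
    static final String ROOT_URL = "https://www.googleapis.com/";

    // The pre-22e7683 default that the comment suggests restoring.
    static final String DEFAULT_BATCH_PATH = "batch";

    /** Naive construction: Java string concatenation renders a null String as "null". */
    public static String naiveBatchUrl(String rootUrl, String batchPath) {
        return rootUrl + batchPath;
    }

    /** Defensive construction: fall back to "batch" when no path was set. */
    public static String defaultedBatchUrl(String rootUrl, String batchPath) {
        String path = (batchPath == null || batchPath.isEmpty()) ? DEFAULT_BATCH_PATH : batchPath;
        return rootUrl + path;
    }

    public static void main(String[] args) {
        String unset = null; // the client never called setBatchPath()
        System.out.println(naiveBatchUrl(ROOT_URL, unset));     // https://www.googleapis.com/null
        System.out.println(defaultedBatchUrl(ROOT_URL, unset)); // https://www.googleapis.com/batch
    }
}
```

Putting the default in the Builder, rather than requiring every caller to set the path explicitly, keeps existing clients that never called `setBatchPath()` working unchanged.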

@moandcompany

moandcompany commented Jan 6, 2018

Tagging this issue with a reference to DataflowJavaSDK #607.

5 participants