Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

result output parameter of script templates occasionally become empty #11330

Closed
3 tasks done
boiledfroginthewell opened this issue Jul 11, 2023 · 6 comments · Fixed by #11368
Closed
3 tasks done

result output parameter of script templates occasionally become empty #11330

boiledfroginthewell opened this issue Jul 11, 2023 · 6 comments · Fixed by #11368
Labels
area/templates/script P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important type/bug

Comments

@boiledfroginthewell
Copy link
Contributor

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issues exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

The attached workflow runs successfully most of the time but it randomly fails once in 40 to 100 runs. When it failed, the value of {{steps.echo.outputs.result}} was empty even if the echo step successufully finished.

The outputs and archived logs in the UI are also empty.

image
image
image

But kubernetes logs show that the script correctly displayed messages.

$ kubectl logs test-output-4q6zg-echo-336682105
time="2023-07-10T12:09:37.409Z" level=info msg="capturing logs" argo=true
hello
time="2023-07-10T12:09:37.412Z" level=info msg="sub-process exited" argo=true error="<nil>"

Version

v3.4.8

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: test-output-
spec:
  podGC:
    strategy: OnWorkflowSuccess
  ttlStrategy:
    secondsAfterCompletion: 1
    secondsAfterSuccess: 1
    secondsAfterFailure: 3600

  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: echo
            template: echo
        - - name: verify
            template: verify
            arguments:
              parameters:
                - name: message
                  value: "{{steps.echo.outputs.result}}"

    - name: echo
      script:
        image: busybox:1.35
        command: [sh]
        source: |
          echo hello
        # env:
        #   - name: ARGO_DEBUG_PAUSE_AFTER
        #     value: "true"
        resources:
          limits:
            cpu: "100m"
            memory: "100Mi"

    - name: verify
      inputs:
        parameters:
          - name: message
      script:
        image: busybox:1.35
        command: [sh]
        source: |
          if [ "{{inputs.parameters.message}}" != "hello" ]; then
            echo "steps.echo.outputs.result is empty: {{inputs.parameters.message}}" >&2
            exit 1
          fi

Logs from the workflow controller

n/a

Logs from in your workflow's wait container

time="2023-07-10T12:10:15.735Z" level=info msg="No output parameters"
time="2023-07-10T12:10:15.735Z" level=info msg="No output artifacts"
time="2023-07-10T12:10:15.736Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: test-output-4q6zg/test-output-4q6zg-verify-664085946/main.log"
time="2023-07-10T12:10:15.736Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2023-07-10T12:10:15.736Z" level=info msg="Saving file to s3" bucket=my-bucket endpoint="minio:9000" key=test-output-4q6zg/test-output-4q6zg-verify-664085946/main.log path=/tmp/argo/outputs/logs/main.log
time="2023-07-10T12:10:16.169Z" level=info msg="Save artifact" artifactName=main-logs duration=433.751568ms error="<nil>" key=test-output-4q6zg/test-output-4q6zg-verify-664085946/main.log
time="2023-07-10T12:10:16.169Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2023-07-10T12:10:16.169Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2023-07-10T12:10:16.183Z" level=info msg="Alloc=8474 TotalAlloc=16633 Sys=27773 NumGC=5 Goroutines=10"
time="2023-07-10T12:10:16.183Z" level=info msg="Deadline monitor stopped"
time="2023-07-10T12:09:37.829Z" level=info msg="Capturing script output"
time="2023-07-10T12:09:37.829Z" level=info msg="No output parameters"
time="2023-07-10T12:09:37.829Z" level=info msg="No output artifacts"
time="2023-07-10T12:09:37.830Z" level=info msg="S3 Save path: /tmp/argo/outputs/logs/main.log, key: test-output-4q6zg/test-output-4q6zg-echo-336682105/main.log"
time="2023-07-10T12:09:37.830Z" level=info msg="Creating minio client using static credentials" endpoint="minio:9000"
time="2023-07-10T12:09:37.830Z" level=info msg="Saving file to s3" bucket=my-bucket endpoint="minio:9000" key=test-output-4q6zg/test-output-4q6zg-echo-336682105/main.log path=/tmp/argo/outputs/logs/main.log
time="2023-07-10T12:09:37.997Z" level=info msg="Save artifact" artifactName=main-logs duration=167.537273ms error="<nil>" key=test-output-4q6zg/test-output-4q6zg-echo-336682105/main.log
time="2023-07-10T12:09:37.997Z" level=info msg="not deleting local artifact" localArtPath=/tmp/argo/outputs/logs/main.log
time="2023-07-10T12:09:37.997Z" level=info msg="Successfully saved file: /tmp/argo/outputs/logs/main.log"
time="2023-07-10T12:09:38.049Z" level=info msg="Alloc=8546 TotalAlloc=16632 Sys=23677 NumGC=5 Goroutines=10"
@boiledfroginthewell
Copy link
Contributor Author

I found that this can be reproduced by directly running argoexec command. This is somewhat easier to reproduce.

When I run the following command for a long time, /var/run/argo/ctr/main/stdout file becomes empty.
It seems that this happens when the stdout/combined files are closed before cmd.Start() finishes writing texts.

$ docker run \
    --rm -it \
    --cpus 0.5 \
    -e ARGO_CONTAINER_NAME=main \
    -e  ARGO_INCLUDE_SCRIPT_OUTPUT=true \
    -v ./argoexec:/var/run/argo/argoexec \
    busybox:latest \
    sh
$ echo '{}' > /var/run/argo/template
$ while :; do
    rm -f /var/run/argo/ctr/main/*
    < /dev/null /var/run/argo/argoexec emissary --loglevel info --log-format text -- echo hello
    ls -l /var/run/argo/ctr/main
    [ ! -s /var/run/argo/ctr/main/stdout ] && break
done

INFO[2023-07-10T05:17:44.676Z] capturing logs                                argo=true
hello
INFO[2023-07-10T05:17:45.715Z] sub-process exited                            argo=true error="<nil>"
total 12
-rw-rw-rw-    1 root     root             6 Jul 10 05:17 combined
-rw-r--r--    1 root     root             1 Jul 10 05:17 exitcode
-rw-rw-rw-    1 root     root             6 Jul 10 05:17 stdout
INFO[2023-07-10T05:17:47.280Z] capturing logs                                argo=true
hello
INFO[2023-07-10T05:17:47.354Z] sub-process exited                            argo=true error="<nil>"total 4
-rw-rw-rw-    1 root     root             0 Jul 10 05:17 combined
-rw-r--r--    1 root     root             1 Jul 10 05:17 exitcode
-rw-rw-rw-    1 root     root             0 Jul 10 05:17 stdout

@tooptoop4
Copy link
Contributor

i was also affected by this issue!

@caelan-io
Copy link
Member

Thanks for providing detailed repo instructions. @tooptoop4 - can you please upvote the issue to help prioritize it if you're also affected?

@caelan-io caelan-io added area/templates/script P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important labels Jul 14, 2023
@ajoskowski
Copy link

Any workaround here before new version is released?

terrytangyuan pushed a commit that referenced this issue Aug 10, 2023
… (#11368)

Signed-off-by: boiledfroginthewell <boiledfroginthewell@users.noreply.github.com>
terrytangyuan pushed a commit that referenced this issue Aug 11, 2023
… (#11368)

Signed-off-by: boiledfroginthewell <boiledfroginthewell@users.noreply.github.com>
cbuchli added a commit to helio/argo-workflows that referenced this issue Oct 4, 2023
@imliuda
Copy link
Contributor

imliuda commented Nov 13, 2023

actually, we don't need this line:

cmd.WaitDelay = 100 * time.Millisecond

according this link: golang/go#50436 , if we did't set the WaitDelay, it will just wait util EOF, the key is this line:

_ = cmd.Wait()

so, i think it is better to remove the WaitDelay, as if we set it, golang will do wait, and when it is timeout, golang will close the pipes, which may leading a partial content, because it is hard to decide how long to wait.

although i have merged the commit, but this issue can still happen, outputs only has a partial part, not the whole content.

@agilgur5
Copy link

agilgur5 commented Nov 13, 2023

actually, we don't need this line:

and when it is timeout, golang will close the pipes

The PR #11368 specifies that WaitDelay was intentionally added, as otherwise it could run indefinitely. So a "choose your poison" between the two.

It may make sense to make that delay configurable if that becomes a problem though. If you are having partial content issues, please file a new issue about that.

@argoproj argoproj locked as resolved and limited conversation to collaborators Apr 17, 2024
dpadhiar pushed a commit to dpadhiar/argo-workflows that referenced this issue May 9, 2024
…proj#11330 (argoproj#11368)

Signed-off-by: boiledfroginthewell <boiledfroginthewell@users.noreply.github.com>
Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/templates/script P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important type/bug
Projects
None yet
6 participants