Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scheduler: livenessProbe redundant exec leads to false-positive result #21994

Closed
2 tasks done
Wilderone opened this issue Mar 4, 2022 · 4 comments
Closed
2 tasks done
Labels
area:helm-chart Airflow Helm Chart kind:bug This is a clearly a bug

Comments

@Wilderone
Copy link
Contributor

Official Helm Chart version

1.4.0 (latest released)

Apache Airflow version

2.2.4 (latest released)

Kubernetes Version

v1.21.5-eks-bc4871b

Helm Chart configuration

No response

Docker Image customisations

No response

What happened

Current livenessProbe in scheduler returns 0 exit code even if execution failed, this leads to container won't restart if it necessary.
I see two ways to fix this:

  1. Simply remove - exec from livenessProbe
  2. Move livenessProbe to values for customisation by client

What you expected to happen

Restart container if livenessProve failed

How to reproduce

Fastest way - change livenessProbe to:

- sh
- c
- exec
- cat /123

You will see no errors in pod describe

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Wilderone Wilderone added area:helm-chart Airflow Helm Chart kind:bug This is a clearly a bug labels Mar 4, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Mar 4, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk potiuk added invalid and removed invalid labels Mar 7, 2022
@potiuk
Copy link
Member

potiuk commented Mar 7, 2022

Hey @ephraimbuddy @jedcunningham - I think that one might be important enough to cancel 1.5.0 of chart and make a release candidate 2. I just checked it and I think this is is indeed wrong what we have.

The current liveness probe is:

            exec:
              command:
                - sh
                - -c
                - exec
                - |
                  CONNECTION_CHECK_MAX_COUNT=0 /entrypoint python -Wignore -c "
                  import os
                  os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                  os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

                  from airflow.jobs.scheduler_job import SchedulerJob
                  from airflow.utils.db import create_session
                  from airflow.utils.net import get_hostname
                  import sys

                  with create_session() as session:
                      job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                          SchedulerJob.latest_heartbeat.desc()).limit(1).first()

                  sys.exit(0 if job.is_alive() else 1)
                  "

But it should be IMHO:

            exec:
              command:
                - sh
                - -c
                - | 
                  CONNECTION_CHECK_MAX_COUNT=0 /entrypoint python -Wignore -c "
                  import os
                  os.environ['AIRFLOW__CORE__LOGGING_LEVEL'] = 'ERROR'
                  os.environ['AIRFLOW__LOGGING__LOGGING_LEVEL'] = 'ERROR'

                  from airflow.jobs.scheduler_job import SchedulerJob
                  from airflow.utils.db import create_session
                  from airflow.utils.net import get_hostname
                  import sys

                  with create_session() as session:
                      job = session.query(SchedulerJob).filter_by(hostname=get_hostname()).order_by(
                          SchedulerJob.latest_heartbeat.desc()).limit(1).first()

                  sys.exit(0 if job.is_alive() else 1)
                  "

@ephraimbuddy
Copy link
Contributor

Hey @ephraimbuddy @jedcunningham - I think that one might be important enough to cancel 1.5.0 of chart and make a release candidate 2. I just checked it and I think this is is indeed wrong what we have.

Yeah. I support having an RC 2.

@jedcunningham
Copy link
Member

This is fixed by #22041.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:helm-chart Airflow Helm Chart kind:bug This is a clearly a bug
Projects
None yet
Development

No branches or pull requests

4 participants