-
Notifications
You must be signed in to change notification settings - Fork 14.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tasks marked as "UP_FOR_RESCHEDULE" get stuck in Executor.running and never reschedule #25728
Comments
Thanks for opening your first issue here! Be sure to follow the issue template! |
It appears this was never an issue before 2.3.0 because the CeleryExecutor implemented it's own trigger_tasks logic, until this PR landed: #23016 |
Enabling Debug logs I see something very interesting, on Airflow 2.1.3 I see this debug message " Equivalently I see the "running task instances" debug message often go down to 0 in Airflow 2.1.3 but in Airflow 2.3.3 I never see this debug message go down to 0. @potiuk @malthe sorry to ping you directly but I'm really starting to think this is a bug in the change to celery executor rather than just our environment being broken. Are there any hints you can give that would help us better pin down what the problem might be? In the mean time I am going to try and see if I can reproduce the issue at home so I can post a reproducing example here that others can follow. |
It would be useful to have logs to see what exactly is going on. There's more background on the original change in this issue: #21316. |
Thanks I already read through this, I'll see what I can do about the logs (they're big so I will need to cut down to the relevant part and also I'd need to get management sign off, if I'm able to reproduce outside my company then it will make the process a lot simpler). |
Looks like it was our fault! It seems the issue was that our scheduler celery results backend was pointing to a different database than our worker celery results backend 🤦♂️. Thanks for responding earlier, sorry it was on our side. |
Apache Airflow version
2.3.3
What happened
Upon upgrading from Airflow 2.1.3 to Airflow 2.3.3 we have an issue with our sensors that have mode='reschedule'. Using
TimeSensor
as example:[2022-08-15 00:01:11,027] {base_executor.py:215} ERROR - could not queue task TaskInstanceKey(dag_id='TestDAG', task_id='testTASK', run_id='scheduled__2022-08-12T04:00:00+00:00', try_number=1, map_index=-1) (still running after 4 attempts)
Looking at the relevant code (https://github.com/apache/airflow/blob/2.3.3/airflow/executors/base_executor.py#L215) it seems that the Task Key was never removed from
self.running
after it initially rescheduled itself.What you think should happen instead
Rescheduled tasks should reschedule
How to reproduce
Operating System
Fedora 29
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
The symptoms of this discussion sounds the same, but no one has replied on it yet: #25651
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: