KubernetesJobWatcher no longer inherits from Process #11017

Merged
merged 1 commit into apache:master from kubernetes-job-watcher-seperate on Sep 18, 2020

Conversation

dimberman
Contributor

multiprocessing.Process is set up in a very unfortunate manner
that pretty much makes it impossible to test a class that inherits from
Process or use any of its internal functions. For this reason we decided
to separate the actual process-based functionality into a class member.
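For context, the change moves from inheriting `multiprocessing.Process` to holding the process as a class member. Below is a minimal sketch of that composition pattern, using hypothetical class and attribute names rather than the actual Airflow `KubernetesJobWatcher` code:

```python
import multiprocessing
from typing import Optional


class Watcher:
    """Hypothetical watcher; names here are illustrative, not the Airflow classes."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self.process: Optional[multiprocessing.Process] = None

    def run(self) -> None:
        # The watch loop is now a plain method, so unit tests can call or
        # mock it directly without spawning a real OS process.
        print(f"watching namespace {self.namespace}")

    def start(self) -> None:
        # Process-based behaviour lives in a class member rather than in a
        # base class (composition instead of inheriting from Process).
        process = multiprocessing.Process(target=self.run, daemon=True)
        process.start()
        self.process = process


if __name__ == "__main__":
    watcher = Watcher("default")
    watcher.start()
    watcher.process.join()
```

In tests, `Watcher.run` can be invoked or patched directly, while `start()` still launches the real subprocess in production code.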


@boring-cyborg boring-cyborg bot added the area:Scheduler and k8s labels Sep 18, 2020
@dimberman dimberman requested review from potiuk, ashb and kaxil September 18, 2020 17:50
@dimberman dimberman merged commit 1539bd0 into apache:master Sep 18, 2020
@dimberman dimberman deleted the kubernetes-job-watcher-seperate branch September 18, 2020 18:33
dimberman added a commit that referenced this pull request Sep 21, 2020
dimberman added a commit that referenced this pull request Sep 21, 2020
potiuk added a commit to PolideaInternal/airflow that referenced this pull request Sep 22, 2020
We introduced deletion of the old artifacts as this was a
suspected culprit of Kubernetes Job failures. It turned out
eventually that those Kubernetes Job failures were caused by
the apache#11017 change, but it's good to do housekeeping of the
artifacts anyway.

The delete workflow action introduced in a hurry had three problems:

* it runs for every fork if they sync master. This is a bit
  too invasive

* it fails continuously after 10 - 30 minutes every time
  as we have too many old artifacts to delete (GitHub has
  a 90-day retention policy, so we likely have tens of
  thousands of artifacts to delete)

* it runs every hour and it causes occasional API rate limit
  exhaustion (because we have too many artifacts to loop through)

This PR introduces filtering by repo, changes the frequency
of deletion to 4 times a day, and adds a script that we are
running manually to delete the excessive artifacts now. Eventually,
when the number of artifacts goes down, the regular job should only
have to delete a few hundred artifacts appearing within each
6-hour window, and it should stop failing.
potiuk added a commit that referenced this pull request Sep 22, 2020
We introduced deletion of the old artifacts as this was
the suspected culprit of Kubernetes Job failures. It turned out
eventually that those Kubernetes Job failures were caused by
the #11017 change, but it's good to do housekeeping of the
artifacts anyway.

The delete workflow action introduced in a hurry had three problems:

* it runs for every fork if they sync master. This is a bit
  too invasive

* it fails continuously after 10 - 30 minutes every time
  as we have too many old artifacts to delete (GitHub has
  a 90-day retention policy, so we likely have tens of
  thousands of artifacts to delete)

* it runs every hour and it causes occasional API rate limit
  exhaustion (because we have too many artifacts to loop through)

This PR introduces filtering by repo and changes the frequency
of deletion to 4 times a day. A back-of-the-envelope calculation
caps each of the 4 daily runs at roughly 2500 artifacts to delete,
so we have a low risk of reaching the 5000 API calls/hr rate limit.
It also adds a script that we are running manually to delete the
excessive artifacts now. Eventually, when the number of artifacts
goes down, the regular job should only have to delete a few hundred
artifacts appearing within each 6-hour window under normal
circumstances, and it should stop failing then.
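The fix described in the commit messages above boils down to three things: restrict the workflow to the main repo, run it 4 times a day, and clear the existing backlog with a manual script. As a rough illustration only (this is not the script added by the commit; the repository name, token handling, and the 2500-per-run cap are placeholder assumptions), a cleanup loop against the public GitHub Actions artifacts API might look like:

```python
import os

import requests

# Illustrative sketch of an artifact-cleanup loop; not the script from the
# commit above. Repository and token are placeholders taken from env vars.
REPO = os.environ.get("GITHUB_REPOSITORY", "apache/airflow")
TOKEN = os.environ["GITHUB_TOKEN"]
HEADERS = {
    "Authorization": f"token {TOKEN}",
    "Accept": "application/vnd.github.v3+json",
}
API = f"https://api.github.com/repos/{REPO}/actions/artifacts"


def delete_artifacts(max_deletions: int = 2500) -> int:
    """Delete up to max_deletions listed artifacts and return the count."""
    deleted = 0
    while deleted < max_deletions:
        # Always re-fetch the first page; deletions shift remaining artifacts forward.
        page = requests.get(API, headers=HEADERS, params={"per_page": 100}).json()
        artifacts = page.get("artifacts", [])
        if not artifacts:
            break
        for artifact in artifacts:
            if deleted >= max_deletions:
                break
            # One DELETE call per artifact; count the attempt either way.
            requests.delete(f"{API}/{artifact['id']}", headers=HEADERS)
            deleted += 1
    return deleted


if __name__ == "__main__":
    print(f"deleted {delete_artifacts()} artifacts")
```

With the 2500-per-run cap, a single run spends roughly 2500 delete calls plus a few dozen list calls, which stays comfortably below the 5000 requests/hour limit mentioned in the commit message.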
@kaxil kaxil added the provider:cncf-kubernetes label and removed the area:k8s label Nov 18, 2020