Improves deletion of old artifacts. #11079
Conversation
We introduced deletion of the old artifacts because it was a suspected culprit of Kubernetes Job failures. It eventually turned out that those failures were caused by the apache#11017 change, but it is good to do housekeeping of the artifacts anyway. The delete workflow action, introduced in a hurry, had three problems:

* it runs for every fork whenever it syncs master, which is a bit too invasive;
* it fails continuously after 10-30 minutes every time, because we have too many old artifacts to delete (GitHub has a 90-day retention policy, so we likely have tens of thousands of artifacts to delete);
* it runs every hour and occasionally causes API rate-limit exhaustion (because we have too many artifacts to loop through).

This PR introduces filtering by repo, changes the deletion frequency to four times a day, and adds a script that we are running manually to delete the excessive artifacts now. Eventually, when the number of artifacts goes down, the regular job should delete maybe a few hundred artifacts appearing within each 6-hour window, and it should stop failing.
@@ -1,11 +1,12 @@
 name: 'Delete old artifacts'
 on:
   schedule:
-    - cron: '0 * * * *' # every hour
+    - cron: '27 */6 * * *' # run every 6 hours
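The PR description also mentions filtering by repo so that forks syncing master do not run the job. A common way to express that guard in a GitHub Actions workflow looks like the following sketch (the job name is illustrative; the PR's actual job definition is not shown in this excerpt):

```yaml
jobs:
  delete-artifacts:
    # Skip the job on forks: only run in the canonical repository.
    if: github.repository == 'apache/airflow'
    runs-on: ubuntu-latest
```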
I would say "every day" would be good enough, WDYT?
There is a bit of a problem: we cannot rate-limit the action, and we have occasionally exhausted the limit already.
I did some back-of-the-envelope calculations:
The limit is 5,000 API calls/hour, so if we have more than ~3,000 artifacts to delete, we get dangerously close to exhausting our API calls within a single run.
We have at most ~200 builds a day with at most ~50 artifacts each (assuming an intensive period and an increasing number of artifacts), which means up to 10,000 artifacts to delete per day. Running the job 4 times a day leaves ~2,500 artifacts to delete in each run.
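The arithmetic above can be checked with a few lines of shell (the build and artifact counts are the rough upper bounds assumed in the comment, not measured values):

```shell
#!/usr/bin/env bash
# Back-of-the-envelope check of the deletion volume per run.
builds_per_day=200        # assumed upper bound on builds per day
artifacts_per_build=50    # assumed upper bound on artifacts per build
runs_per_day=4            # the new schedule: one run every 6 hours

total_per_day=$(( builds_per_day * artifacts_per_build ))
per_run=$(( total_per_day / runs_per_day ))

echo "artifacts/day: ${total_per_day}"   # artifacts/day: 10000
echo "deletions/run: ${per_run}"         # deletions/run: 2500
```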
BTW, I am running the deletion now using the script, at ~1,000 deletions/hour (I also hit the rate limit with my personal token). With those assumptions and the 90-day retention period, it will take ~10 days to delete all the old artifacts: assuming we have 30% of the 10,000/day, that is 3,000/day * 90 days = 270,000 artifacts; at 1,000 deletions/hour that is 270 hours, i.e. 270 / 24 ~ 10 days :).
BTW, I think my calculations were a bit too pessimistic. It seems 24 hours were enough to delete the artifacts. The job started to succeed, and it takes ~5 minutes when it runs. Also, my script now often finishes after some 30-50 artifacts, so it seems we got to a "clean state" much faster than I thought.
Yep. Now, when I run "delete" and no jobs finish in the meantime, there are no artifacts to delete :) @dimberman -> artifacts are cleaned up now.
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
I know we could have written it in Python/octokit, but I found a bash script that worked out of the box. I improved it slightly, refactored it according to the Google Shell Style Guide, and it seems to do its job (I am running it now in the background).
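The 84-line script itself is not shown in this excerpt. As a rough illustration of the technique it describes, a minimal artifact-deletion loop over the GitHub REST API might look like the sketch below. This is an assumption-laden sketch, not the script from this PR: it assumes the `gh` CLI is installed and authenticated, and the repo slug and retention window are illustrative values. The network-touching part is behind a `delete` argument so the helper logic can be exercised on its own.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the actual script from this PR): delete
# workflow artifacts older than a retention threshold via the GitHub
# REST API. Assumes the `gh` CLI; REPO and DAYS are illustrative.
set -euo pipefail

REPO="apache/airflow"   # assumption: target repository slug
DAYS=7                  # assumption: retention window in days
CUTOFF=$(( $(date +%s) - DAYS * 24 * 3600 ))

# True when an ISO-8601 timestamp is older than the cutoff.
is_expired() {
  local created_epoch
  created_epoch=$(date -u -d "$1" +%s)
  (( created_epoch < CUTOFF ))
}

# Page through the repo's artifacts and delete the expired ones.
# Each listing page and each deletion is a separate API call, which is
# why a large backlog runs into the 5,000 calls/hour rate limit.
delete_expired_artifacts() {
  gh api --paginate "repos/${REPO}/actions/artifacts" \
    --jq '.artifacts[] | [.id, .created_at] | @tsv' |
  while IFS=$'\t' read -r id created_at; do
    if is_expired "${created_at}"; then
      echo "Deleting artifact ${id} (created ${created_at})"
      gh api -X DELETE "repos/${REPO}/actions/artifacts/${id}"
    fi
  done
}

# Only touch the network when explicitly asked to.
if [[ "${1:-}" == "delete" ]]; then
  delete_expired_artifacts
fi
```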
Raw logs show "2020-09-22T12:02:38.4767818Z ##[error]Process completed with exit code 137." (the worker seems to have been killed, likely OOM). Merging.