Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scheduled task to clean up old files from workspace #16247

Merged
merged 7 commits into from
Sep 2, 2022

Conversation

gosusnp
Copy link
Contributor

@gosusnp gosusnp commented Sep 1, 2022

What

We need a job to clean up old files from the workspace to keep it size under control.

Closes #15567

How

Add a scheduled micronaut task that cleans up old data.
We are currently using TEMPORAL_HISTORY_RETENTION_IN_DAYS as configuration for older files.
This should be improved once we have the temporal history clean up in place.

Tests

Local validation from a running docker instance.
All files older than 30days have been removed, workspace went from 450Mb to 45Mb.

@gosusnp gosusnp temporarily deployed to more-secrets September 1, 2022 23:00 Inactive
@@ -2,7 +2,7 @@ ARG JDK_VERSION=19-slim-bullseye
ARG JDK_IMAGE=openjdk:${JDK_VERSION}
FROM ${JDK_IMAGE} AS cron

ARG VERSION=0.40.0-alpha
ARG VERSION=0.40.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious if we know why this didn't get bumped by @evantahler's PR. Do we need to add airbyte-cron to a script somewhere to ensure it gets updated?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gosusnp found the problem! .bumpversion.cfg is what the publish command uses to know which files to bump. This wasn't in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that should be the issue.

.map(Path::toFile)
.filter(f -> new AgeFileFilter(oldestAllowed).accept(f))
.forEach(file -> {
log.info("Deleting file: " + file.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this should probably be lowered to debug, as I suspect it will cause log file spam when we actually delete stuff. Perhaps replace it with a counter and a log message to indicate how many files were cleaned at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@jdpgrailsdev jdpgrailsdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@gosusnp gosusnp temporarily deployed to more-secrets September 2, 2022 17:09 Inactive
@gosusnp gosusnp temporarily deployed to more-secrets September 2, 2022 18:26 Inactive
@gosusnp gosusnp merged commit bf48791 into master Sep 2, 2022
@gosusnp gosusnp deleted the gosusnp/15567-logs_clean_up_task branch September 2, 2022 20:21
letiescanciano added a commit that referenced this pull request Sep 5, 2022
* master: (47 commits)
  Add email to identify users analytics call (#16327)
  🎉 Source Amazon Ads: improve `config.start_date` validation (#16191)
  Add comments about intermediate state emission (#16262)
  MySQL Source : Standardize spec.json for DB connectors that support log-based CDC replication (#16216)
  MSSQL Source : Standardize spec.json for DB connectors that support log-based CDC replication (#16215)
  Hide a bunch more destination with potential unsecure API access (#16320)
  Skip unit tests when run-tests is false (#16267)
  Hide Destination connections from UI (#16310)
  Add scheduled task to clean up old files from workspace (#16247)
  Source Google Analytics v4: Re-name google analytics connector (#16306)
  🐛 Source Facebook Marketing: remove "end_date" from config if empty value (re-implement #16096) (#16222)
  Fix github action syntax (#16277)
  Re-name google analytics cionnectors (#16287)
  Bump Airbyte version from 0.40.3 to 0.40.4 (#16275)
  Hide ES and Redis destination connectors from Cloud (#16276)
  15700  add tests for PokeAPI (#15701)
  Add ProtocolVersion to StandardDefs (#16237)
  🪟 🔧 🧹 Migrate attempt `bytesSynced` to `totalStats.bytesEmitted` and cleanup `AttemptDetails` component (#16126)
  Improve behavior of password input field (#16011)
  Improve airbyte-metrics support in the Helm chart (#16166)
  ...
robbinhan pushed a commit to robbinhan/airbyte that referenced this pull request Sep 29, 2022
)

* Add airbyte-cron to bumpversion

* Update airbyte-cron version to current

* Add workspace clean up job

* Add missing env var to docker-compose

* Update file deletion logging
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
)

* Add airbyte-cron to bumpversion

* Update airbyte-cron version to current

* Add workspace clean up job

* Add missing env var to docker-compose

* Update file deletion logging
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide a way to clean up old data (logs)
3 participants