Provide a way to clean up old data (logs) #15567

Closed
gosusnp opened this issue Aug 11, 2022 · 36 comments · Fixed by #16247

Comments

@gosusnp
Contributor

gosusnp commented Aug 11, 2022

Tell us about the problem you're trying to solve

We no longer clean up old data, which can cause disk space issues.

The cleanup was part of the old scheduler; it has been removed temporarily and is planned to be migrated (#11869).
As a result, until that is done, we no longer offer a way to programmatically clean up space.

Describe the solution you’d like

When we deprecated the code, we kept around the former clean up code.
Could we package it as a script to manually clean up as a workaround for now?

@marcosmarxm
Member

Zendesk ticket #1884 has been linked to this issue.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-08-12 at 14:04:

It looks like this feature was removed along with the Scheduler. A GitHub issue was created (#15567); I'll get back to you with any updates. For now, the workaround is to clean up the logs yourself.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-08-12 at 14:17:

Thanks, that’s excellent info! Will keep an eye on that issue. In the meantime I’ve done exactly that (a systemd timer and a cleanup script, but it’s a bit too hacky).

Could we package it as a script to manually clean up as a workaround for now?

Really appreciate this suggestion in the ticket!

[Discourse post]

@evantahler
Contributor

This should be addressed by #15218

@marcosmarxm
Member

Zendesk ticket #2096 has been linked to this issue.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-08-30 at 16:23:

Sorry to hear that, Arash. It looks like there is a problem after the removal of the airbyte-scheduler service.
Check issue: #15567

A workaround for now is to clean the data folder so it doesn't consume so much space. I'll also ask the engineering team to take a look and see what can be done.

@evantahler evantahler changed the title Provide a way to clean up old data Provide a way to clean up old data (dup of https://github.com/airbytehq/airbyte/issues/15218) Aug 30, 2022
@evantahler
Contributor

This is part of #15218

@evantahler evantahler changed the title Provide a way to clean up old data (dup of https://github.com/airbytehq/airbyte/issues/15218) Provide a way to clean up old data (logs) Sep 1, 2022
@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-01 at 17:19:

There isn't an easy script to run for this today, Gergely. We need to wait for a complete solution for now :(

@gosusnp
Contributor Author

gosusnp commented Sep 2, 2022

@marcosmarxm, #16247 re-adds a job that removes old files from the workspace.

Currently, we remove any files older than 30 days by default.
This retention period can be configured by setting the TEMPORAL_HISTORY_RETENTION_IN_DAYS variable.
We should still document this behavior properly and improve how this is configured, but this should mitigate the space issues users are seeing.
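
To illustrate the behavior described above (files older than a retention window are removed, 30 days by default, overridable through TEMPORAL_HISTORY_RETENTION_IN_DAYS), here is a minimal Python sketch. It is not Airbyte's implementation; the function names are hypothetical, and the workspace path you pass in is an assumption about your own deployment.

```python
import os
import time
from pathlib import Path


def retention_days(env=os.environ, default=30):
    """Read the retention window; fall back to the default when unset or empty."""
    raw = env.get("TEMPORAL_HISTORY_RETENTION_IN_DAYS", "").strip()
    return int(raw) if raw else default


def remove_old_files(root, days, now=None):
    """Delete regular files under root whose mtime is older than `days` days.

    Returns the list of deleted paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A call such as `remove_old_files("/path/to/workspace", retention_days())` would apply the configured window; try it on a copy of the data first, since the deletion is not reversible.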

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-04 at 06:07:

@marcosmarxm Do you have any timeframe for fixing this issue? I think it’s very important.
Thanks

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-05 at 16:47:

@arashlayeghi I’ve had the same issue. I manually delete logs that are older than 7 days every week or so. Pain in the ass, though I suppose I could set up a cron job. Would be great if this feature worked correctly in the Airbyte deployment.

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-05 at 17:41:

Hello, here is a status update on this issue.
The solution was merged but hasn't been published in the latest version yet; version v0.40.5 will probably include those changes.

You can check the discussion here: #16247

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-05 at 17:42:

Hello, an update here: the solution was merged.
But only the next version, v0.40.5, will include the changes; another update will probably come at the end of this week.

Please check PR #16247

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-13 at 05:22:

Thanks, Marcos,
I updated it to v0.40.5 now. Do I need to do something or does it automatically clean up the storage when needed?

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-13 at 16:16:

Airbyte version 0.45.5 reimplemented the feature; you can change the default value by setting the variable TEMPORAL_HISTORY_RETENTION_IN_DAYS.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-13 at 16:22:

You can 

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-14 at 07:20:

Thanks, will give that a try!

Just for reference, as I wasn’t sure reading your comment, I see that the current default value is 30 days, that’s good to know.

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-14 at 15:05:

You're correct.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-19 at 15:38:

Hello Marcos,

This is our .env file containing TEMPORAL_HISTORY_RETENTION_IN_DAYS=1 at the end.
I ran docker-compose up -d more than 34 hours ago but it seems the storage is getting full without cleaning up. What is wrong with my deployment?

env.txt (3.6 KB)

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-09-21 at 14:05:

Did you update to the latest version of Airbyte?

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-11-15 at 09:19:

Hi @marcosmarxm
Following your instructions above, I set the variable TEMPORAL_HISTORY_RETENTION_IN_DAYS=7 in .env.
I ran docker compose up -d more than 7 days ago, but the storage is still filling up without any cleanup.

Environment

  • Airbyte version: 0.40.18
  • OS Version / Instance: AWS EC2
  • Deployment: Docker compose
  • Step where error happened: Deploy with docker compose up
cat .env
# This file only contains Docker relevant variables.
#
# Variables with defaults have been omitted to avoid duplication of defaults.
# The only exception to the non-default rule are env vars related to scaling.
#
# See https://github.com/airbytehq/airbyte/blob/master/airbyte-config/config-models/src/main/java/io/airbyte/config/Configs.java
# for the latest environment variables.
#
# # Contributors - please organise this env file according to the above linked file.

# SHARED
VERSION=0.40.18

# When using the airbyte-db via default docker image
CONFIG_ROOT=/data
DATA_DOCKER_MOUNT=airbyte_data
DB_DOCKER_MOUNT=airbyte_db

# Workspace storage for running jobs (logs, etc)
WORKSPACE_ROOT=/tmp/workspace
WORKSPACE_DOCKER_MOUNT=airbyte_workspace

# Local mount to access local files from filesystem
# todo (cgardens) - when we are mount raw directories instead of named volumes, *_DOCKER_MOUNT must
# be the same as *_ROOT.
# Issue: #578
LOCAL_ROOT=/tmp/airbyte_local
LOCAL_DOCKER_MOUNT=/tmp/airbyte_local
# todo (cgardens) - hack to handle behavior change in docker compose. *_PARENT directories MUST
# already exist on the host filesystem and MUST be parents of *_ROOT.
# Issue: #577
HACK_LOCAL_ROOT_PARENT=/tmp

# Proxy Configuration
# Set to empty values, e.g. "" to disable basic auth
BASIC_AUTH_USERNAME=airbyte
BASIC_AUTH_PASSWORD=password

# DATABASE
# Airbyte Internal Job Database, see https://docs.airbyte.io/operator-guides/configuring-airbyte-db
DATABASE_USER=docker
DATABASE_PASSWORD=docker
DATABASE_HOST=db
DATABASE_PORT=5432
DATABASE_DB=airbyte
# translate manually DATABASE_URL=jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT}/${DATABASE_DB} (do not include the username or password here)
DATABASE_URL=jdbc:postgresql://db:5432/airbyte
JOBS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=0.29.15.001

# Airbyte Internal Config Database, defaults to Job Database if empty. Explicitly left empty to mute docker compose warnings.
CONFIG_DATABASE_USER=
CONFIG_DATABASE_PASSWORD=
CONFIG_DATABASE_URL=
CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION=0.35.15.001

# AIRBYTE SERVICES
TEMPORAL_HOST=airbyte-temporal:7233
INTERNAL_API_HOST=airbyte-server:8001
#CONNECTOR_BUILDER_API_HOST=airbyte-connector-builder-server:80 #FIXME: Uncomment this when enabling the connector-builder
WEBAPP_URL=http://localhost:8000/
# Although not present as an env var, required for webapp configuration.
API_URL=/api/v1/

# JOBS
# Relevant to scaling.
SYNC_JOB_MAX_ATTEMPTS=3
SYNC_JOB_MAX_TIMEOUT_DAYS=3
JOB_MAIN_CONTAINER_CPU_REQUEST=
JOB_MAIN_CONTAINER_CPU_LIMIT=
JOB_MAIN_CONTAINER_MEMORY_REQUEST=
JOB_MAIN_CONTAINER_MEMORY_LIMIT=

NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_LIMIT=
NORMALIZATION_JOB_MAIN_CONTAINER_MEMORY_REQUEST=
NORMALIZATION_JOB_MAIN_CONTAINER_CPU_LIMIT=
NORMALIZATION_JOB_MAIN_CONTAINER_CPU_REQUEST=

# LOGGING/MONITORING/TRACKING
TRACKING_STRATEGY=segment
JOB_ERROR_REPORTING_STRATEGY=logging
# Although not present as an env var, expected by Log4J configuration.
LOG_LEVEL=INFO

# APPLICATIONS
# Worker
WORKERS_MICRONAUT_ENVIRONMENTS=control-plane
# Cron
CRON_MICRONAUT_ENVIRONMENTS=control-plane
# Relevant to scaling.
MAX_SYNC_WORKERS=5
MAX_SPEC_WORKERS=5
MAX_CHECK_WORKERS=5
MAX_DISCOVER_WORKERS=5
# Temporal Activity configuration
ACTIVITY_MAX_ATTEMPT=
ACTIVITY_INITIAL_DELAY_BETWEEN_ATTEMPTS_SECONDS=
ACTIVITY_MAX_DELAY_BETWEEN_ATTEMPTS_SECONDS=
WORKFLOW_FAILURE_RESTART_DELAY_SECONDS=
TEMPORAL_HISTORY_RETENTION_IN_DAYS=7

# FEATURE FLAGS
AUTO_DISABLE_FAILING_CONNECTIONS=false
FORCE_MIGRATE_SECRET_STORE=false

# MONITORING FLAGS
# Accepted values are datadog and otel (open telemetry)
METRIC_CLIENT=
# Useful only when metric client is set to be otel. Must start with http:// or https://.
OTEL_COLLECTOR_ENDPOINT="http://host.docker.internal:4317"

USE_STREAM_CAPABLE_STATE=true

ls -lah airbyte_workspace
..............
drwxr-xr-x    3 root root   4096 Nov  4 03:39 8290/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8291/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8292/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8293/
drwxr-xr-x    3 root root   4096 Nov  4 04:02 8294/
drwxr-xr-x    3 root root   4096 Nov  4 04:03 8295/
drwxr-xr-x    3 root root   4096 Nov  4 04:09 8296/
drwxr-xr-x    3 root root   4096 Nov  4 04:30 8297/
drwxr-xr-x    3 root root   4096 Nov  4 04:32 8298/
drwxr-xr-x    3 root root   4096 Nov  4 04:32 8299/
..............

[Discourse post]
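
One way to check whether the cleanup is actually running is to look for job directories older than the configured retention. The following is a minimal Python sketch, assuming the workspace is readable at a path like the airbyte_workspace directory in the listing above and that job directories have numeric names as in that listing. `stale_job_dirs` is a hypothetical helper, not part of Airbyte.

```python
import time
from pathlib import Path


def stale_job_dirs(workspace, days, now=None):
    """Return numeric-named subdirectories of workspace last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = [
        p for p in Path(workspace).iterdir()
        if p.is_dir() and p.name.isdigit() and p.stat().st_mtime < cutoff
    ]
    # Sort by job number so the output lines up with `ls`.
    return sorted(stale, key=lambda p: int(p.name))
```

If `stale_job_dirs("airbyte_workspace", 7)` keeps returning entries well after the retention window has passed, that suggests the cleanup is not removing them. Note that a directory's mtime only changes when entries are added to or removed from it, so this is a rough signal rather than an exact age of the files inside.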

@attaxia
Contributor

attaxia commented Nov 16, 2022

I can confirm that setting TEMPORAL_HISTORY_RETENTION_IN_DAYS does not have the expected effect. I also checked the logs and don't see any entries indicating the cleanup has started.

@marcosmarxm
Member

marcosmarxm commented Nov 21, 2022

@attaxia can you open a new issue to discuss the problem? Please give the complete info about your env and Airbyte platform version.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-11-21 at 20:23:

Sorry for the delay here, Arash and Quân. I'll need to take a deeper look during the week.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-12-01 at 20:06:

It seems for me the issue lies in /var/lib/docker, which I also can’t seem to access without sudo permissions.

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-12-02 at 18:13:

Are you having storage issues on the latest version, Lucas?

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-12-05 at 05:43:

Hey @marcosmarxm I bumped the disk to 60 GB and am going to set the variable TEMPORAL_HISTORY_RETENTION_IN_DAYS=7 to see if that solves it. I was migrating a 700M-row table and that was taking up all 30 GB of the previous disk.

I resolved the issue by doubling disk size and turning off all other connections while I ran the backfill on the 700M table. It would be great to know a better solution but disk is cheap.

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2022-12-05 at 16:52:

Lucas, it would be better to open a new GitHub issue to track what is happening in your case.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2023-01-16 at 14:51:

@marcosmarxm Any word on a fix for this yet?

I’ve attempted to set TEMPORAL_HISTORY_RETENTION_IN_DAYS=7 but this has had no effect.

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2023-01-16 at 14:52:

Typo in the version number here, 0.45.5 does not exist yet.

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2023-01-17 at 09:07:

Hello Billy, PR #20317 re-enabled the log rotation in Airbyte version 0.40.26. Can you check that you're using this version or later?
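
For anyone checking this across several deployments, the version comparison can be scripted. This is a small Python sketch that compares a dotted version string against 0.40.26, the version named in this comment. The helper names are hypothetical, and it assumes plain numeric versions such as 0.40.22 or v0.40.28.

```python
def parse_version(text):
    """Turn '0.40.26' or 'v0.40.26' into the tuple (0, 40, 26)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def has_file_cleaner_fix(version, minimum="0.40.26"):
    """True when `version` is at or above the minimum version."""
    return parse_version(version) >= parse_version(minimum)
```

Comparing integer tuples avoids the trap of comparing the strings directly, where "0.40.9" would sort after "0.40.26".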

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2023-01-17 at 09:28:

Hey @marcosmarxm, I can confirm I’m using version 0.40.22.
I will upgrade to the latest (v0.40.28) and report back.

[Discourse post]

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2023-01-17 at 10:06:

Please update to version 0.40.26; see PR #20317.

@marcosmarxm
Member

Comment made from Zendesk by Marcos Marx on 2023-01-17 at 11:17:

marcosmarxm:

Hello Billy, the PR “Airbyte Cron: renable schedule to file cleaner” (Pull Request #20317, airbytehq/airbyte) re-enabled the log rotation in Airbyte version 0.40.26. Can you check that you’re using this version or later?

Upgrading from 0.40.22 to 0.40.28 has resolved the issue where setting TEMPORAL_HISTORY_RETENTION_IN_DAYS= had no effect.

Thanks @marcosmarxm

[Discourse post]

@CamPen21

Hey Guys, question, my EC2 instance got full due to airbyte logs. I hadn't set the TEMPORAL_HISTORY_RETENTION_IN_DAYS option.
How can I manually free up space safely?
The stack won't start because the DB doesn't have any more space.
Thank you.
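
The thread does not settle which files are safe to delete, so a cautious first step is to measure where the space is going before removing anything. Below is a minimal, read-only Python sketch that reports the largest subdirectories under a given root. The helper names are hypothetical, and the root you pass (for example the host path of the workspace volume) is an assumption about your own setup.

```python
import os
from pathlib import Path


def dir_size(path):
    """Total size in bytes of the files under path (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(root, top=10):
    """Return up to `top` (size_bytes, path) pairs for subdirectories of root, largest first."""
    sizes = [(dir_size(p), p) for p in Path(root).iterdir() if p.is_dir()]
    return sorted(sizes, key=lambda pair: pair[0], reverse=True)[:top]
```

Nothing here deletes data; it only shows which job directories dominate disk usage, which can then be reviewed by hand.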
