Update Datahub to 0.15 #1280

Open
MatMoore opened this issue Jan 24, 2025 · 2 comments

MatMoore commented Jan 24, 2025

This needs testing on dev first.

Releases - https://github.com/acryldata/datahub/releases

Looks as though the PR has been included in RC6 - acryldata/datahub@v1.0.0rc5...v1.0.0rc6

Upgrade Docs

Currently we're on 0.14.1, but now 0.15 is available.

This includes a new DataHubGc source which can clean up unwanted DataProcessInstance records.

In our case

  • DataProcessInstance metadata before 13th Jan is unreliable for dbt, due to us ingesting run results from the "deploy docs" job. So it may be worth setting dataprocess_cleanup.retention_days temporarily to clean this up.
  • We should configure a keep_last_n setting so that we don't run out of disk space
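
As a rough sketch of the two settings above (not a tested config): the snippet below works out a retention_days value that would put everything before 13th Jan outside the retention window, and builds a recipe fragment as a plain dict. The source type name `datahub-gc`, the nesting of keep_last_n under dataprocess_cleanup, the run date and the keep_last_n value of 5 are all assumptions to check against the 0.15 docs. The sink section is left out.

```python
import json
from datetime import date


def retention_days_for_cutoff(cutoff: date, today: date) -> int:
    """Number of days between the cutoff and the run date (never negative)."""
    return max((today - cutoff).days, 0)


def build_gc_recipe(retention_days: int, keep_last_n: int) -> dict:
    """Build a recipe fragment for the GC source as a plain dict."""
    return {
        "source": {
            "type": "datahub-gc",  # assumed source type name
            "config": {
                "dataprocess_cleanup": {
                    "retention_days": retention_days,
                    # Assumed to sit alongside retention_days.
                    "keep_last_n": keep_last_n,
                }
            },
        }
    }


cutoff = date(2025, 1, 13)    # dbt DataProcessInstance data before this is unreliable
run_date = date(2025, 2, 17)  # hypothetical date of the cleanup run
recipe = build_gc_recipe(
    retention_days_for_cutoff(cutoff, run_date),
    keep_last_n=5,  # hypothetical value
)
print(json.dumps(recipe, indent=2))
```

Once the one-off cleanup has run, retention_days would be set back to whatever we want long term.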
github-project-automation bot moved this to Todo 📝 in Data Catalogue Jan 24, 2025
mitchdawson1982 moved this from Todo 📝 to In Progress 🚀 in Data Catalogue Feb 10, 2025
mitchdawson1982 self-assigned this Feb 10, 2025
mitchdawson1982 moved this from In Progress 🚀 to Review 🛂 in Data Catalogue Feb 17, 2025

mitchdawson1982 commented Feb 17, 2025

Current status -

  • Dev is running the latest version, v0.15.0.1
  • Ran ingestion with the updated CLI version; the cadet-based ingestions currently fail with the error below.

[2025-02-17 22:02:08,456] WARNING {root:243} - No top level tags found in database metadata file for workforce_stats
[2025-02-17 22:02:08,456] WARNING {root:243} - No top level tags found in database metadata file for workforce_stats
[2025-02-17 22:02:08,456] WARNING {root:243} - No top level tags found in database metadata file for bold_sm_spells
[2025-02-17 22:02:08,456] WARNING {root:243} - No top level tags found in database metadata file for sentences
[2025-02-17 22:02:08,456] WARNING {root:243} - No top level tags found in database metadata file for sentences
[2025-02-17 22:02:08,456] WARNING {root:243} - No top level tags found in database metadata file for derived_oasys_data_first
[2025-02-17 22:02:08,457] WARNING {root:243} - No top level tags found in database metadata file for derived_oasys_data_first
[2025-02-17 22:02:08,765] INFO {datahub.ingestion.run.pipeline:574} - Processing commit request for DatahubIngestionCheckpointingProvider. Commit policy = CommitPolicy.ALWAYS, has_errors=False, has_warnings=False
[2025-02-17 22:02:08,765] WARNING {datahub.ingestion.source.state_provider.datahub_ingestion_checkpointing_provider:99} - No state available to commit for DatahubIngestionCheckpointingProvider
[2025-02-17 22:02:08,770] INFO {datahub.ingestion.run.pipeline:594} - Successfully committed changes for DatahubIngestionCheckpointingProvider.
[2025-02-17 22:02:47,734] ERROR {datahub.ingestion.run.pipeline:78} - failed to write record with workunit urn:li:container:4d0d7fb5585c0a94e535cce7725324dc-containerProperties with ('Unable to emit metadata to DataHub GMS: com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries', 'status': 500, 'urn': 'urn:li:container:4d0d7fb5585c0a94e535cce7725324dc', 'workunit_id': 'urn:li:container:4d0d7fb5585c0a94e535cce7725324dc-containerProperties'}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'message': 'com.datahub.util.exception.RetryLimitReached: Failed to add after 3 retries', 'status': 500, 'urn': 'urn:li:container:4d0d7fb5585c0a94e535cce7725324dc', 'workunit_id': 'urn:li:container:4d0d7fb5585c0a94e535cce7725324dc-containerProperties'}

Possible issue - datahub-project/datahub#12202
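
To see how widespread the failures are in a full run, something like the following could tally log levels and pull out the failing workunit ids. This is only a sketch: the line format is inferred from the output pasted above, the helper name `summarise` is made up, and the sample ERROR line is shortened (the exception payload is cut off).

```python
import re
from collections import Counter

# Matches lines shaped like: [timestamp] LEVEL {logger:lineno} - message
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+) \{(?P<logger>[^}]+)\} - (?P<msg>.*)$")
# Pulls the workunit id out of a "failed to write record" message.
WORKUNIT_RE = re.compile(r"failed to write record with workunit (urn:li:\S+)")


def summarise(lines):
    """Return (count of lines per log level, workunit ids that failed to write)."""
    levels = Counter()
    failed = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a log line
        levels[match["level"]] += 1
        if match["level"] == "ERROR":
            workunit = WORKUNIT_RE.search(match["msg"])
            if workunit:
                failed.append(workunit.group(1))
    return levels, failed


# Two lines taken from the run above; the ERROR line is truncated after "with".
sample = [
    "[2025-02-17 22:02:08,456] WARNING {root:243} - No top level tags found in database metadata file for sentences",
    "[2025-02-17 22:02:47,734] ERROR {datahub.ingestion.run.pipeline:78} - failed to write record with workunit "
    "urn:li:container:4d0d7fb5585c0a94e535cce7725324dc-containerProperties with",
]
levels, failed = summarise(sample)
print(dict(levels))
print(failed)
```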

mitchdawson1982 moved this from Review 🛂 to Blocked 🚫 in Data Catalogue Feb 19, 2025
mitchdawson1982 commented

The PR for the issue has been merged, but there is no release yet.
