Refactor bulk_save_to_db #42245
Merged
Conversation
This function collects information from DAG objects and creates or updates database rows for them. However, it handles a lot of information, reads many objects, and touches many models, which makes it hard to read. A new module has been introduced in airflow.dag_processing.collection to encapsulate the logic previously in bulk_save_to_db. Some loops are broken into multiple loops so that no single loop does too much (cramming everything into one loop produces many long-lived variables that reduce readability). Aside from that, not much changes; the work is mostly split into separate steps to make things clearer.
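The loop-splitting idea described above can be sketched as follows. This is a hypothetical illustration of the pattern, not Airflow's actual code: `DagStub`, `_update_dag_rows`, and `_update_tag_rows` are invented names, and the dictionaries stand in for ORM rows.

```python
# Sketch of the refactoring pattern: instead of one loop that updates
# several models at once, each concern gets its own pass over the DAGs,
# so no long-lived variables leak between steps. Names are illustrative.
from dataclasses import dataclass, field


@dataclass
class DagStub:
    dag_id: str
    tags: list[str] = field(default_factory=list)


def _update_dag_rows(dags):
    # First pass: create/update the DAG rows themselves.
    return {dag.dag_id: {"dag_id": dag.dag_id} for dag in dags}


def _update_tag_rows(dags):
    # Second pass: handle tags separately from the DAG-row pass.
    return {dag.dag_id: sorted(dag.tags) for dag in dags}


def bulk_save_to_db(dags):
    # Each step is a small, self-contained loop over the same input.
    rows = _update_dag_rows(dags)
    tags = _update_tag_rows(dags)
    return rows, tags
```

Each pass can then be read, tested, and modified independently, which is the readability gain the description is after.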
The dataset manager already does this.
This saves a few calls when no dataset/alias references were found in any DAGs.
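The optimization mentioned above is a guard clause: return early when no DAG declared any dataset or alias references, so no database round-trips happen at all. A minimal sketch, with invented names (`save_dataset_references` and the `db_calls` list simulate session calls; this is not Airflow's actual API):

```python
def save_dataset_references(refs, db_calls):
    # Guard clause: when no dataset/alias references were found in any
    # DAG, return early and make no database calls (hypothetical sketch).
    if not refs:
        return
    for ref in refs:
        # Each append stands in for one session call (e.g. a merge).
        db_calls.append(ref)
```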
Note to self: Submit another PR to refactor
Also, maybe we need a new method in DatasetManager for creating aliases. (Or even more for other things?)
Here we gooooooo
Lee-W reviewed Sep 19, 2024
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>
Lee-W approved these changes Sep 20, 2024
joaopamaral pushed a commit to joaopamaral/airflow that referenced this pull request Oct 21, 2024
Co-authored-by: Ephraim Anierobi <splendidzigy24@gmail.com>