Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIP-69: Airflow Core adjustments for introduction of Edge Executor #41730

Merged
merged 1 commit into from
Oct 18, 2024

Conversation

jscheffl
Copy link
Contributor

@jscheffl jscheffl commented Aug 25, 2024

This PR is a companion to PR #41729 and splits the core adjustments from PoC in PR #40224 and "Mothership" PR #40900.

The PR carries documentation and a small extension to the variables allowing the core to "know" EdgeExecutor. Therefore it is not really a feature but mainly a doc-only change and would propose to add this also to Airflow 2.10 line. Feedback welcome.

Note: Build of docs depends on PR for Provider Package being merged first. (Reference missing)

Note After the discussion the PR is adjusted now from "Remote Executor" to "Edge Executor"

Copy link
Contributor

@o-nikolas o-nikolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am -0 on these changes. I will not block them if the community decides this is best, but I think changing a naming paradigm that has been in the community for years is not the best approach if we proceed with the user in mind. I think it's incumbent on the new feature to settle on a name that does not clash with existing names. This way users don't have the whiplash of a new paradigm to learn at the same time as relearning the old and getting confused between the two. This name change will cause unnecessary confusion, that is simply not needed. I think distributed is a good name and works perfectly well for the new feature being delivered.

@potiuk
Copy link
Member

potiuk commented Aug 27, 2024

I am -0 on these changes. I will not block them if the community decides this is best, but I think changing a naming paradigm that has been in the community for years is not the best approach if we proceed with the user in mind. I think it's incumbent on the new feature to settle on a name that does not clash with existing names. This way users don't have the whiplash of a new paradigm to learn at the same time as relearning the old and getting confused between the two. This name change will cause unnecessary confusion, that is simply not needed. I think distributed is a good name and works perfectly well for the new feature being delivered.

Agrew with @vincbeck. Naming is hard but I'd rather call the new executor "Distributed" rather than the reuse "old" name for it. Both namings are equally good IMHO and it's up to us to define them - so in this case it is the question of either adding new term, or reusing old term for something else and adding the new term for smth else. Sounds it has a lot potential for confusion.;

@jscheffl
Copy link
Contributor Author

I am -0 on these changes. I will not block them if the community decides this is best, but I think changing a naming paradigm that has been in the community for years is not the best approach if we proceed with the user in mind. I think it's incumbent on the new feature to settle on a name that does not clash with existing names. This way users don't have the whiplash of a new paradigm to learn at the same time as relearning the old and getting confused between the two. This name change will cause unnecessary confusion, that is simply not needed. I think distributed is a good name and works perfectly well for the new feature being delivered.

Agrew with @vincbeck. Naming is hard but I'd rather call the new executor "Distributed" rather than the reuse "old" name for it. Both namings are equally good IMHO and it's up to us to define them - so in this case it is the question of either adding new term, or reusing old term for something else and adding the new term for smth else. Sounds it has a lot potential for confusion.;

I'm rather for the term remote as it is shorter and makes CLI and docs shorter. But not fixed to the name. I can make a bulk search&replace. Shall I call a vot eon the name on the DevList?

@potiuk
Copy link
Member

potiuk commented Aug 27, 2024

At least discuss - yes. would be good. We did that in recent past.

@jscheffl
Copy link
Contributor Author

jscheffl commented Aug 27, 2024

I'm rather for the term remote as it is shorter and makes CLI and docs shorter. But not fixed to the name. I can make a bulk search&replace. Shall I call a vot eon the name on the DevList?

--> Happy discussion! https://lists.apache.org/thread/br1jfoc8p1wjzk74c09srjgr29spytfy

@jscheffl jscheffl force-pushed the feature/aip-69-remote-executor-core branch from c82b1a0 to 98a5b2a Compare August 29, 2024 22:27
@jscheffl jscheffl changed the title AIP-69: Airflow Core adjustments for introduction of Remote Executor AIP-69: Airflow Core adjustments for introduction of Edge Executor Aug 29, 2024
@potiuk
Copy link
Member

potiuk commented Sep 1, 2024

Needs to be cherry-picked to v2-10-test I guess

@potiuk
Copy link
Member

potiuk commented Sep 1, 2024

(and tests are failing too)

@ephraimbuddy
Copy link
Contributor

I am -0 on these changes. I will not block them if the community decides this is best, but I think changing a naming paradigm that has been in the community for years is not the best approach if we proceed with the user in mind. I think it's incumbent on the new feature to settle on a name that does not clash with existing names. This way users don't have the whiplash of a new paradigm to learn at the same time as relearning the old and getting confused between the two. This name change will cause unnecessary confusion, that is simply not needed. I think distributed is a good name and works perfectly well for the new feature being delivered.

+1 on this; I think the distributed would be better

@potiuk
Copy link
Member

potiuk commented Sep 3, 2024

Hey @jscheffl -> I wanted to make an attempt to remove AIP-44 completely from main/ Airflow 3 - which means likely that we will have to do something about those changes for Edge Executor in main.

Do you think the provider can be impletemented and built in main even if we remove all AIP-44-related things from main?

@jscheffl
Copy link
Contributor Author

jscheffl commented Sep 4, 2024

Hey @jscheffl -> I wanted to make an attempt to remove AIP-44 completely from main/ Airflow 3 - which means likely that we will have to do something about those changes for Edge Executor in main.

Do you think the provider can be impletemented and built in main even if we remove all AIP-44-related things from main?

Hi @potiuk AIP-44 is the base for the Executor. If you want to wipe the codebase and slim it down (freshest legacy) then we would need to filter the provider from testing on main - and just keep testing the provider against 2.10 compat. There might be multiple options like (1) filtering it in the list in breeze, (2) Implement a pytest.skip based on the airflow version string.

I would favor (of course) if we keep AIP-44 at least for a moment until AIP-72 matures a bit to make it usable. I would assume once AIP-72 is available as a first MVP we can change to code in a way that provider loads modules from v2 compat for Airflow 2 and "new AIP-72" compatible bindings if Airflow 3 (Similar like we did it in the past with Pydantic 1/2 switch).

How much pressure do you feel to remove AIP-44?
I assume your question is also for a full cleaning of all AIP-44 traces? Some "pieces" could be removed early like the internal API apiserver - but of course the @internal_api decorator logic must be sitting in the core and on the exposed API functions.

@potiuk
Copy link
Member

potiuk commented Sep 5, 2024

How much pressure do you feel to remove AIP-44?

I personaly don't feel much pressure - other than all the "peer" presure where other deprecations being actively removed as we speak.

Also I am going for almost 3 weeks holidays after the Summit, so it's less for me and more for all the work that is going to happen around AIP-72 especially while I am away.

There is quite some overlap - a lot of code for AIP-72 is going to involve refactorings and rewrites of pretty much the same parts as AIP-44 touched - it will be on a higher level than AIP-44 did. While AIP-44 went very "low level" - and replaced and extracted individual DB transactions from the code, AIP-72 is going to implement a way higher "abstraction" as regular API calls, where the operations - task and daf file parsing, and callbacks - will be wrapped in higher-level API calls.

But I imagine a lot of the code/ implementation will be about pretty much the same code (and your concerns about Edge worker switching to using the APIs of AIP-72 are only confirming that)

I thought that removing AIP-44 code first (and keeping tests running) would remove quite a lot of noise from AIP-72 work, so I think it might be worth doing before.

But ultimately, it's up to @ashb POC (which we have not seen yet) and details how the implementation of AIP-72 will be done (Replace? Copy? Running old DB functionality and new code in parallel? Duplicating the code first in separate projects before proceeding with AIP-72 implementation).

So I think the question "Should we remove AIP-44 first or is it ok to leave it for now" is more a question on how AIP-72 process will look like - and as such only @ashb can answer it properly I think.

@ashb
Copy link
Member

ashb commented Sep 5, 2024

Certainly leaving AIP-44 in main for now makes total sense.

Once I'm back from the Summit I plan on turning my isolated POC code into draft PRs etc for proper review, and then we can decide, but I do think AIP-44 etc should be removed before Airflow 3 hits beta.

@jscheffl
Copy link
Contributor Author

jscheffl commented Sep 5, 2024

Certainly leaving AIP-44 in main for now makes total sense.

Once I'm back from the Summit I plan on turning my isolated POC code into draft PRs etc for proper review, and then we can decide, but I do think AIP-44 etc should be removed before Airflow 3 hits beta.

@potiuk @ashb Thanks for the feedback. So if there is no immediate pressure and also some time that would be great. Looking forward for the first AIP-72 impressions and also once migrating over I'd offer to contribute to AIP-44 removal/cleaning. Of course BEFORE any Airflow 3 Beta this should be made.

@jscheffl jscheffl force-pushed the feature/aip-69-remote-executor-core branch from 98a5b2a to bd5fb2c Compare September 5, 2024 22:13
@jscheffl jscheffl force-pushed the feature/aip-69-remote-executor-core branch from bd5fb2c to 1f9eace Compare September 18, 2024 21:22
@jscheffl jscheffl force-pushed the feature/aip-69-remote-executor-core branch 2 times, most recently from 85887c1 to 3028164 Compare October 10, 2024 19:45
@jscheffl jscheffl force-pushed the feature/aip-69-remote-executor-core branch from 3028164 to 3727953 Compare October 17, 2024 21:50
@jscheffl jscheffl force-pushed the feature/aip-69-remote-executor-core branch from 3727953 to a8c6c9d Compare October 18, 2024 04:56
@jscheffl jscheffl merged commit 2531338 into apache:main Oct 18, 2024
41 checks passed
jscheffl added a commit to jscheffl/airflow that referenced this pull request Oct 19, 2024
harjeevanmaan pushed a commit to harjeevanmaan/airflow that referenced this pull request Oct 23, 2024
PaulKobow7536 pushed a commit to PaulKobow7536/airflow that referenced this pull request Oct 24, 2024
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-69 Edge Executor area:Executors-core LocalExecutor & SequentialExecutor kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants