Make TES task scheduling activities more scalable #138
Labels
enhancement
New feature or request
Scalability
Enable users to scale TES workloads
TES Priority: P1
Groomed to a Priority 1 issue
Comments
This was referenced May 17, 2023
Merged
This was referenced Aug 22, 2023
@ngambani We'll know once I've scale-tested it.
With the node runner, we now have events
Problem:
The task update loop (currently the single largest workload in TES) scales exponentially, based on measurements.
Solution:
The task update loop should scale linearly. See below for the suggested solution.
Describe alternatives you've considered
See microsoft/CromwellOnAzure#497. This issue will replace that issue.
Additional context
Currently, the Scheduler service passes each active TesTask to BatchScheduler, which makes various API calls to determine the state of that task in Batch. It then uses that state to perform operations on behalf of the task and/or alter the task's state. This is done serially, one task at a time.
The proposed solution is instead to perform the following operations in the following order (assignment of responsibilities TBD):
Examples of step 2 include creating pools/jobs (in auto-scale mode, since they are sharable) and retrieving image container credentials. How operations are combined is TBD.
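One possible way to combine sharable operations is sketched below in Python, purely as an illustration, since the actual means of combining them is TBD. The idea shown is to group tasks by a shared key (for example, the container registry whose credentials they need) and run the operation once per distinct key instead of once per task. `Task`, its fields, `run_shared_once`, and `fake_get_credentials` are all hypothetical names invented for this sketch; none of them are TES or Azure Batch APIs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable, List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a TesTask; the field names are assumptions."""
    task_id: str
    pool_key: str   # identifies a sharable (auto-scale) pool
    registry: str   # container registry whose credentials the task needs


def run_shared_once(
    tasks: Iterable[Task],
    key_of: Callable[[Task], Hashable],
    operation: Callable[[Hashable], object],
) -> Dict[str, object]:
    """Run `operation` once per distinct key; map each task id to the shared result."""
    groups: Dict[Hashable, List[Task]] = defaultdict(list)
    for task in tasks:
        groups[key_of(task)].append(task)

    results: Dict[str, object] = {}
    for key, members in groups.items():
        shared = operation(key)  # one call per distinct key, not one per task
        for task in members:
            results[task.task_id] = shared
    return results


# Demo with a fake credential lookup that records how often it is called.
calls: List[str] = []


def fake_get_credentials(registry: str) -> str:
    calls.append(registry)
    return f"token-for-{registry}"


tasks = [
    Task("t1", "pool-a", "reg1"),
    Task("t2", "pool-a", "reg1"),
    Task("t3", "pool-b", "reg2"),
]
creds = run_shared_once(tasks, lambda t: t.registry, fake_get_credentials)
```

With three tasks spread over two registries, the fake lookup runs twice rather than three times. The same helper could be keyed on `pool_key` to create each sharable pool once. Whether grouping like this fits the eventual assignment of responsibilities is an open question.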