Commit 4e33b9e

Merge pull request #796 from hzxuzhonghu/automated-cherry-pick-of-#782-origin-release-0.4
Automated cherry pick of #782: Support scale up and down
2 parents 0f7f8fb + 2aa0049 commit 4e33b9e

2 files changed, +103 -0 lines changed


docs/design/job-scale-up-down.md

# Volcano Job scale up and down

@hzxuzhonghu; April 24, 2020

## Motivation

Currently, Volcano does not support Job updates: `Job.Spec` cannot be modified on the fly.
However, many users want to run ML training jobs elastically. For example, ModelArts wants to adjust a Job's replicas dynamically according to the cluster's idle capacity, in order to make the most efficient use of GPU cards.

As a first step, before more intelligent elasticity is supported, I propose supporting dynamic scale up/down of Volcano jobs.

## Design

Before describing the design, let's recall how a Job is currently initialized.

### Job Initialization

When a Volcano job is created, the job controller does the following to run and manage all of its tasks:

1. All the plugins execute their `OnJobAdd` callbacks to create the service, the hosts configmap, etc.

2. Create the PVCs for the job.

3. Create the PodGroup for the job.

4. Execute the plugins' `OnPodCreate` callbacks to set pod-related envs, mount the hostfile, etc.

5. Call the kube-apiserver to create as many pods as the job's replicas.

All the above steps run in `syncJob`, which is called when external events happen; here, the event is the creation of the Job.
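
The ordering of these five steps can be illustrated with a small Go sketch. It is a simplified model, not Volcano's controller code. `Job`, `jobPlugin`, `tracePlugin` and `syncJobSketch` are hypothetical names. The PVC and PodGroup steps only append to a trace instead of calling the kube-apiserver. Pod names are simplified to `<job>-<index>` for a single task.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down stand-in for vcbatch.Job.
type Job struct {
	Name     string
	Replicas int
}

// jobPlugin models the two plugin callbacks used during initialization.
type jobPlugin interface {
	OnJobAdd(job *Job) error
	OnPodCreate(podName string, job *Job) error
}

// tracePlugin records every callback so the call order can be inspected.
type tracePlugin struct{ trace *[]string }

func (p tracePlugin) OnJobAdd(job *Job) error {
	*p.trace = append(*p.trace, "OnJobAdd")
	return nil
}

func (p tracePlugin) OnPodCreate(podName string, job *Job) error {
	*p.trace = append(*p.trace, "OnPodCreate:"+podName)
	return nil
}

// syncJobSketch follows steps 1-5: plugins' OnJobAdd, PVC, PodGroup,
// then one pod per replica, calling every plugin's OnPodCreate first.
func syncJobSketch(job *Job, plugins []jobPlugin, trace *[]string) error {
	for _, p := range plugins {
		if err := p.OnJobAdd(job); err != nil {
			return err
		}
	}
	*trace = append(*trace, "createPVC")      // step 2 (stubbed)
	*trace = append(*trace, "createPodGroup") // step 3 (stubbed)
	for i := 0; i < job.Replicas; i++ {
		pod := fmt.Sprintf("%s-%d", job.Name, i)
		for _, p := range plugins {
			if err := p.OnPodCreate(pod, job); err != nil {
				return err
			}
		}
		*trace = append(*trace, "createPod:"+pod) // step 5 (stubbed)
	}
	return nil
}
```

For a job with two replicas, the trace contains `OnJobAdd`, `createPVC` and `createPodGroup`, followed by one `OnPodCreate`/`createPod` pair per replica. This matches the order of the list above.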

### Volcano Job Scale Up/Down

Scaling a Job up or down amounts to reconciling the resources the job owns, such as the PVC, PodGroup, Service and hosts ConfigMap.
The procedure is therefore similar to [Job Initialization](#job-initialization).

The differences are:

1. Job plugins' callbacks: only the `svc` plugin needs to update the configmap that lists the job's tasks.

2. Pods are created when scaling up.

3. Pods are deleted when scaling down.

However, the initialization only runs when the job has not started yet.
We therefore need a way to know whether a scale up/down event triggered the current round of sync.

I propose adding a new event, `JobUpdatedEvent`, to indicate that the job has been updated (here we only care about scale up/down).
Accordingly, a new action, `UpdateJobAction`, runs the `UpdateJob` function. The overall workflow is:

![workflow](images/Job-scale-up-down.PNG)
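
One way to tell a scale event apart from other updates is to compare the old and new spec in the Job update handler. `JobUpdatedEvent` is then enqueued only when `minAvailable` or a task's `replicas` changed.

The sketch below uses trimmed-down stand-in types. `OutOfSyncEvent` and `SyncJobAction` are placeholder names for the existing non-scale path. `isScaleUpdate`, `eventForUpdate` and `actionFor` are hypothetical helpers.

```go
package main

// Event and Action are hypothetical stand-ins for the controller's bus types.
type Event string
type Action string

const (
	OutOfSyncEvent  Event  = "OutOfSync"  // placeholder for the existing path
	JobUpdatedEvent Event  = "JobUpdated" // proposed in this design
	SyncJobAction   Action = "SyncJob"    // placeholder for the existing path
	UpdateJobAction Action = "UpdateJob"  // proposed in this design
)

// TaskSpec and JobSpec keep only the fields that scale up/down may change.
type TaskSpec struct {
	Name     string
	Replicas int32
}

type JobSpec struct {
	MinAvailable int32
	Tasks        []TaskSpec
}

// isScaleUpdate reports whether minAvailable or any task's replicas changed.
// Tasks are matched by name; adding or removing tasks is assumed to be
// rejected earlier by the admission webhook.
func isScaleUpdate(oldSpec, newSpec *JobSpec) bool {
	if oldSpec.MinAvailable != newSpec.MinAvailable {
		return true
	}
	oldReplicas := make(map[string]int32, len(oldSpec.Tasks))
	for _, t := range oldSpec.Tasks {
		oldReplicas[t.Name] = t.Replicas
	}
	for _, t := range newSpec.Tasks {
		if oldReplicas[t.Name] != t.Replicas {
			return true
		}
	}
	return false
}

// eventForUpdate picks the event to enqueue when a Job update is observed.
func eventForUpdate(oldSpec, newSpec *JobSpec) Event {
	if isScaleUpdate(oldSpec, newSpec) {
		return JobUpdatedEvent
	}
	return OutOfSyncEvent
}

// actionFor maps an event to the controller action that handles it.
func actionFor(e Event) Action {
	if e == JobUpdatedEvent {
		return UpdateJobAction
	}
	return SyncJobAction
}
```

Keying the comparison by task name avoids depending on the order of the task list.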

To scale up/down on the fly, Volcano is responsible for notifying the existing pods of the current status, including the hosts of all the pods.
This is done by plugins. To distinguish it from the initialization phase, a new `OnJobUpdate` callback is introduced.
It reconciles all the associated configs of the job. Currently, the `svc` plugin updates the configmap containing the hosts of all the pods.

**NOTE**:

1. Users who want to be aware of the current training workers should watch `/etc/volcano` to get the up-to-date hosts files.

2. The env vars `VC_{task name}_HOSTS` and `VC_{task name}_NUM` of existing pods cannot be mutated on the fly, so be careful not to rely on them.

```go
type PluginInterface interface {
	// Name returns the unique name of the plugin.
	Name() string

	// OnPodCreate is called for every pod in createJobPod.
	OnPodCreate(pod *v1.Pod, job *vcbatch.Job) error

	// OnJobAdd is called once in syncJob.
	OnJobAdd(job *vcbatch.Job) error

	// OnJobDelete is called once in killJob.
	OnJobDelete(job *vcbatch.Job) error

	// OnJobUpdate is called in UpdateJob to reconcile the job's associated
	// configs (for example, the hosts configmap) after scale up/down.
	OnJobUpdate(job *vcbatch.Job) error
}
```
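
To make the `OnJobUpdate` contract concrete, here is a hypothetical svc-like plugin that rebuilds the hosts data whenever the job is scaled. It is illustrative only and simplifies the real setup in several ways:

- It uses stand-in `Job`/`Task` types instead of `vcbatch.Job`.
- It stores the data in a plain map instead of writing a ConfigMap.
- It implements only `Name` and `OnJobUpdate` from `PluginInterface`.
- It assumes one `<task>.host` key per task, holding names of the form `<job>-<task>-<index>.<job>`. The real `svc` plugin may name things differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Task and Job are hypothetical stand-ins for the vcbatch types.
type Task struct {
	Name     string
	Replicas int
}

type Job struct {
	Name  string
	Tasks []Task
}

// hostsPlugin is a hypothetical svc-like plugin. configMap stands in for
// the data of the hosts ConfigMap the real plugin would write.
type hostsPlugin struct {
	configMap map[string]string
}

func (p *hostsPlugin) Name() string { return "svc-sketch" }

// OnJobUpdate regenerates one "<task>.host" entry per task, listing one
// assumed pod DNS name per current replica: <job>-<task>-<index>.<job>.
func (p *hostsPlugin) OnJobUpdate(job *Job) error {
	data := make(map[string]string, len(job.Tasks))
	for _, t := range job.Tasks {
		hosts := make([]string, 0, t.Replicas)
		for i := 0; i < t.Replicas; i++ {
			hosts = append(hosts, fmt.Sprintf("%s-%s-%d.%s", job.Name, t.Name, i, job.Name))
		}
		data[t.Name+".host"] = strings.Join(hosts, "\n")
	}
	p.configMap = data // replace wholesale so removed pods disappear too
	return nil
}
```

Rebuilding the whole map from the current spec, rather than patching it incrementally, makes the callback idempotent. Calling it again after a scale down drops the removed hosts automatically.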

`UpdateJob` is much like the current `syncJob`, and its workflow is:

1. All the plugins execute their `OnJobUpdate` callbacks to update the envs, the service and the hosts configmap.

2. Create the PVCs for the job if necessary.

3. Update the PodGroup for the job if necessary.

4. Execute the plugins' `OnPodCreate` callbacks for the pods to be created, to set pod-related envs, mount the hostfile, etc.

5. Call the kube-apiserver to create or delete pods so that their number equals the job's replicas.

**Note**: when scaling down, pods are deleted from the highest index to the lowest. However, this order is not guaranteed, as Kubernetes is an eventually consistent system.
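
Step 5 and the deletion order in the note can be sketched as a pure function. It diffs the existing pod indices of one task against the desired replica count. This assumes pods are identified by a 0-based index, as in `<job>-<task>-<index>`. `scalePlan` is a hypothetical helper, and the actual create/delete calls to the kube-apiserver are left out.

```go
package main

import "sort"

// scalePlan returns which pod indices to create and which to delete so
// that exactly the indices 0..replicas-1 exist. Deletions are ordered from
// the highest index to the lowest, the intended (best-effort) order.
func scalePlan(existing []int, replicas int) (toCreate, toDelete []int) {
	present := make(map[int]bool, len(existing))
	for _, idx := range existing {
		if present[idx] {
			continue // ignore duplicate entries
		}
		present[idx] = true
		if idx >= replicas {
			toDelete = append(toDelete, idx)
		}
	}
	for i := 0; i < replicas; i++ {
		if !present[i] {
			toCreate = append(toCreate, i)
		}
	}
	sort.Sort(sort.Reverse(sort.IntSlice(toDelete)))
	return toCreate, toDelete
}
```

For example, scaling from four pods to two returns no creations and deletes index 3 before index 2. The API server handles deletions asynchronously, so callers should still treat this order as best effort.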

### Admission webhook

The admission webhook should prevent invalid mutations of the Job spec on the fly. In this proposal, only `replicas` and `minAvailable` may be updated; any other spec change is rejected.
An update is also rejected if the total number of replicas is less than `minAvailable`.

`minAvailable` must be greater than zero, because we depend on it to maintain the job status.
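
The three admission rules can be expressed as one validation function. The sketch uses hypothetical stand-in types, with `Queue` and `Template` representing the immutable fields. It compares copies of the old and new spec with the mutable fields zeroed, using `reflect.DeepEqual`. A real webhook would operate on `vcbatch.Job` and might use apimachinery's `equality.Semantic.DeepEqual` instead.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// TaskSpec and JobSpec are hypothetical stand-ins for the vcbatch types.
// Queue and Template represent fields that must not change on the fly.
type TaskSpec struct {
	Name     string
	Replicas int32
	Template string
}

type JobSpec struct {
	MinAvailable int32
	Queue        string
	Tasks        []TaskSpec
}

// withoutScalableFields returns a copy of spec with minAvailable and every
// task's replicas zeroed. Tasks is copied so the caller's spec is untouched.
func withoutScalableFields(spec JobSpec) JobSpec {
	spec.MinAvailable = 0
	tasks := make([]TaskSpec, len(spec.Tasks))
	copy(tasks, spec.Tasks)
	for i := range tasks {
		tasks[i].Replicas = 0
	}
	spec.Tasks = tasks
	return spec
}

// validateJobUpdate rejects an update unless minAvailable > 0, only
// replicas/minAvailable changed, and total replicas >= minAvailable.
func validateJobUpdate(oldSpec, newSpec *JobSpec) error {
	if newSpec.MinAvailable <= 0 {
		return errors.New("minAvailable must be greater than zero")
	}
	if !reflect.DeepEqual(withoutScalableFields(*oldSpec), withoutScalableFields(*newSpec)) {
		return errors.New("only replicas and minAvailable may be updated")
	}
	var total int32
	for _, t := range newSpec.Tasks {
		total += t.Replicas
	}
	if total < newSpec.MinAvailable {
		return fmt.Errorf("total replicas %d is less than minAvailable %d", total, newSpec.MinAvailable)
	}
	return nil
}
```

The mutable fields are zeroed on copies before comparing. As a result, any other change makes the comparison fail, including adding, removing or renaming a task.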
