Commit 2090f83

Added Queue design doc.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
1 parent dba21aa commit 2090f83

1 file changed: +113 -0 lines changed

docs/design/queue.md

@@ -0,0 +1,113 @@
# Queue

[@k82cn](http://github.com/k82cn); April 17, 2019

## Motivation

`Queue` was introduced in [kube-batch](http://github.com/kubernetes-sigs/kube-batch) a long time ago as an internal feature, under which all jobs are submitted to the same queue, named `default`. As more and more users want to share resources with each other through queues, this proposal covers the primary queue features needed to achieve that.

## Function Specification

A queue is cluster-scoped, so users from different namespaces can share resources within a `Queue`. The following section defines the queue API.

### API

```go
// metav1 refers to "k8s.io/apimachinery/pkg/apis/meta/v1".
type Queue struct {
	metav1.TypeMeta `json:",inline"`

	metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`

	// Specification of the desired behavior of a queue
	// +optional
	Spec QueueSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"`

	// Current status of Queue
	// +optional
	Status QueueStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"`
}

type QueueSpec struct {
	// The weight of the queue, used to share resources among queues.
	Weight int32 `json:"weight,omitempty" protobuf:"bytes,1,opt,name=weight"`
}

type QueueStatus struct {
	// The number of jobs in Unknown status
	Unknown int32 `json:"unknown,omitempty" protobuf:"bytes,1,opt,name=unknown"`
	// The number of jobs in Running status
	Running int32 `json:"running,omitempty" protobuf:"bytes,2,opt,name=running"`
	// The number of jobs in Pending status
	Pending int32 `json:"pending,omitempty" protobuf:"bytes,3,opt,name=pending"`
	// The number of jobs in Completed status
	Completed int32 `json:"completed,omitempty" protobuf:"bytes,4,opt,name=completed"`
	// The number of jobs in Failed status
	Failed int32 `json:"failed,omitempty" protobuf:"bytes,5,opt,name=failed"`
	// The number of jobs in Aborted status
	Aborted int32 `json:"aborted,omitempty" protobuf:"bytes,6,opt,name=aborted"`
}
```

### QueueController

The `QueueController` manages the lifecycle of a queue:

1. It watches `PodGroup`/`Job` objects to track their status.
2. If a `Queue` is deleted, it also deletes all related `PodGroup`/`Job` objects in that queue.

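The cascading delete in step 2 can be sketched as a simple selection over known `PodGroup`s. This is only an illustration: `podGroupRef` and `podGroupsToDelete` are hypothetical names, and a real controller would work against the API server rather than an in-memory slice.

```go
package main

import "fmt"

// podGroupRef is a hypothetical, minimal reference to a PodGroup:
// where it lives and which queue it was submitted to.
type podGroupRef struct {
	Namespace string
	Name      string
	Queue     string
}

// podGroupsToDelete returns every PodGroup that belongs to the deleted queue.
// Because a queue is cluster-scoped, matches may come from any namespace.
func podGroupsToDelete(all []podGroupRef, deletedQueue string) []podGroupRef {
	var selected []podGroupRef
	for _, pg := range all {
		if pg.Queue == deletedQueue {
			selected = append(selected, pg)
		}
	}
	return selected
}

func main() {
	all := []podGroupRef{
		{Namespace: "team-a", Name: "pg1", Queue: "myqueue"},
		{Namespace: "team-b", Name: "pg2", Queue: "myqueue"},
		{Namespace: "team-b", Name: "pg3", Queue: "default"},
	}
	for _, pg := range podGroupsToDelete(all, "myqueue") {
		fmt.Printf("delete %s/%s\n", pg.Namespace, pg.Name)
	}
}
```
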
### Admission Controller

The admission controller checks the queue of a `PodGroup`/`Job` at creation time:

1. If the queue does not exist, the creation is rejected.
2. If the queue is releasing, the creation is also rejected.

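The two admission rules above can be sketched as a single check. This proposal does not define how a "releasing" queue is represented, so the `queueState` type, its values, and `validateQueue` are assumptions made for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// queueState is a hypothetical representation of a queue's lifecycle state.
type queueState string

const (
	stateOpen      queueState = "Open"
	stateReleasing queueState = "Releasing"
)

var (
	errQueueNotFound  = errors.New("queue does not exist")
	errQueueReleasing = errors.New("queue is releasing")
)

// validateQueue applies the two admission rules to the queue named by a new
// PodGroup/Job. It returns nil when the creation may proceed.
func validateQueue(queues map[string]queueState, name string) error {
	state, ok := queues[name]
	if !ok {
		return errQueueNotFound
	}
	if state == stateReleasing {
		return errQueueReleasing
	}
	return nil
}

func main() {
	queues := map[string]queueState{"default": stateOpen, "old": stateReleasing}
	for _, name := range []string{"default", "old", "missing"} {
		fmt.Printf("%s: %v\n", name, validateQueue(queues, name))
	}
}
```
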
### Feature Interaction

#### Customized Job/PodGroup

If a `PodGroup` is created by a customized controller, the `QueueController` counts that `PodGroup` under the `Unknown` status, because a `PodGroup` only carries the scheduling specification and does not include the customized job's status.

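As a rough sketch of this counting rule, the following aggregates `PodGroup`s into per-queue counters and falls back to `Unknown` when no recognizable job status is available. The `podGroupInfo` type and the phase strings are assumptions for illustration; only the counter names come from `QueueStatus` in the API section.

```go
package main

import "fmt"

// QueueStatus mirrors the counters defined in the API section.
type QueueStatus struct {
	Unknown, Running, Pending, Completed, Failed, Aborted int32
}

// podGroupInfo is hypothetical: the queue a PodGroup belongs to, plus the
// phase of its owning job. JobPhase is empty when the owner is a customized
// controller whose status the QueueController cannot see.
type podGroupInfo struct {
	Queue    string
	JobPhase string
}

// countByQueue tallies PodGroups per queue; anything without a recognized
// job phase is counted as Unknown.
func countByQueue(groups []podGroupInfo) map[string]QueueStatus {
	result := make(map[string]QueueStatus)
	for _, g := range groups {
		s := result[g.Queue]
		switch g.JobPhase {
		case "Running":
			s.Running++
		case "Pending":
			s.Pending++
		case "Completed":
			s.Completed++
		case "Failed":
			s.Failed++
		case "Aborted":
			s.Aborted++
		default:
			s.Unknown++
		}
		result[g.Queue] = s
	}
	return result
}

func main() {
	status := countByQueue([]podGroupInfo{
		{Queue: "myqueue", JobPhase: "Running"},
		{Queue: "myqueue", JobPhase: "Pending"},
		{Queue: "myqueue"}, // created by a customized controller
	})
	fmt.Printf("%+v\n", status["myqueue"])
}
```
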
#### cli

The command line is also enhanced for operations engineers. Three sub-commands are introduced as follows:

__create__:

The `create` command creates a queue with a weight; for example, the following command creates a queue named `myqueue` with weight 10.

```shell
$ vkctl queue create --name myqueue --weight 10
```

__view__:

The `view` command shows the details of a queue, e.g. its creation time; the following command shows the details of queue `myqueue`.

```shell
$ vkctl queue view myqueue
```

__list__:

The `list` command shows all queues available to the current user.

```shell
$ vkctl queue list
Name      Weight  Total  Pending  Running  ...
myqueue   10      10     5        5
```

#### Scheduler

* Proportion plugin:

The proportion plugin shares resources between `Queue`s by weight. The deserved resource of a queue is `(weight/total-weight) * total-resource`. When allocating resources, the plugin does not give a queue more than its deserved resources.

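A minimal sketch of the deserved-resource formula, assuming a single scalar resource (for example milli-CPU) and integer division. The names `queueWeight`, `deservedShares`, and `canAllocate` are hypothetical and are not taken from the plugin itself.

```go
package main

import "fmt"

// queueWeight is a hypothetical, simplified view of a Queue: name and weight.
type queueWeight struct {
	Name   string
	Weight int32
}

// deservedShares computes (weight/total-weight) * total-resource per queue.
func deservedShares(queues []queueWeight, total int64) map[string]int64 {
	var totalWeight int64
	for _, q := range queues {
		totalWeight += int64(q.Weight)
	}
	shares := make(map[string]int64, len(queues))
	if totalWeight == 0 {
		return shares // no weights, nothing to share
	}
	for _, q := range queues {
		shares[q.Name] = total * int64(q.Weight) / totalWeight
	}
	return shares
}

// canAllocate reports whether a request keeps the queue within its deserved share.
func canAllocate(allocated, request, deserved int64) bool {
	return allocated+request <= deserved
}

func main() {
	queues := []queueWeight{{Name: "q1", Weight: 1}, {Name: "q2", Weight: 3}}
	shares := deservedShares(queues, 8000)
	fmt.Println(shares["q1"], shares["q2"])            // 2000 6000
	fmt.Println(canAllocate(1500, 1000, shares["q1"])) // false
}
```

With weights 1 and 3 over 8000 units, `q1` deserves 2000 and `q2` deserves 6000, so a 1000-unit request on top of 1500 already allocated in `q1` is refused.
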
* Reclaim action:

The `reclaim` action goes through all queues and reclaims resources from other queues according to the return value of `ReclaimableFn`; the time complexity is `O(n^2)`. In `ReclaimableFn`, both `proportion` and `gang` take effect: 1. `proportion` makes sure the queue will not be under-used after the reclaim, 2. `gang` makes sure the job will not be reclaimed if its `minAvailable` > 1.

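One plausible reading of the reclaim walk is sketched below. Everything here is an assumption for illustration: the `queue` and `task` types, the rule that only queues below their deserved share act as reclaimers, and the two predicates, which simply restate the `proportion` and `gang` conditions from the paragraph above.

```go
package main

import "fmt"

// task and queue are hypothetical, single-resource simplifications.
type task struct {
	Job          string
	Request      int64
	MinAvailable int32
}

type queue struct {
	Name      string
	Deserved  int64
	Allocated int64
	Tasks     []task
}

// proportionOK: the victim queue must not drop below its deserved share.
func proportionOK(victim queue, t task) bool {
	return victim.Allocated-t.Request >= victim.Deserved
}

// gangOK: per the text, a job with minAvailable > 1 is not reclaimed.
func gangOK(t task) bool {
	return t.MinAvailable <= 1
}

// reclaimCandidates visits every ordered pair of queues (the O(n^2) walk)
// and lists, per reclaiming queue, the jobs whose tasks pass both checks.
func reclaimCandidates(queues []queue) map[string][]string {
	out := make(map[string][]string)
	for i, reclaimer := range queues {
		if reclaimer.Allocated >= reclaimer.Deserved {
			continue // assumption: only under-used queues reclaim
		}
		for j, victim := range queues {
			if i == j {
				continue
			}
			for _, t := range victim.Tasks {
				if proportionOK(victim, t) && gangOK(t) {
					out[reclaimer.Name] = append(out[reclaimer.Name], t.Job)
				}
			}
		}
	}
	return out
}

func main() {
	queues := []queue{
		{Name: "a", Deserved: 4, Allocated: 2},
		{Name: "b", Deserved: 4, Allocated: 6, Tasks: []task{
			{Job: "j1", Request: 1, MinAvailable: 1},
			{Job: "j2", Request: 1, MinAvailable: 3},
		}},
	}
	fmt.Println(reclaimCandidates(queues)["a"]) // [j1]
}
```
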
* Backfill action:

When the `allocate` action assigns resources to each queue, there is a case ([kube-batch#492](<https://github.com/kubernetes-sigs/kube-batch/issues/492>)) in which resources may sit unnecessarily idle because of the `proportion` plugin: two queues each have one pending job, and the deserved resources of each queue cannot meet the requirement of its job. In such a case, the `backfill` action ignores the deserved guarantee of the queue and fills idle resources as much as possible. This introduces another potential problem, in which an incoming smaller job is blocked; that case will be handled by the reserved resources of each queue in another project.
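
The idle-resource case can be reproduced with a small worked example. This is a sketch under simplifying assumptions (one scalar resource, jobs placed in order); `pendingJob` and `place` are hypothetical names, and the `ignoreDeserved` flag stands in for the difference between `allocate` and `backfill`.

```go
package main

import "fmt"

// pendingJob is a hypothetical pending job with a single scalar request.
type pendingJob struct {
	Queue   string
	Request int64
}

// place tries to place jobs in order. With ignoreDeserved=false it honors the
// per-queue deserved cap (allocate); with true it only checks idle resources
// (backfill). It returns the placed jobs and the resources left idle.
func place(jobs []pendingJob, deserved map[string]int64, idle int64, ignoreDeserved bool) ([]pendingJob, int64) {
	allocated := make(map[string]int64)
	var placed []pendingJob
	for _, j := range jobs {
		if j.Request > idle {
			continue
		}
		if !ignoreDeserved && allocated[j.Queue]+j.Request > deserved[j.Queue] {
			continue
		}
		allocated[j.Queue] += j.Request
		idle -= j.Request
		placed = append(placed, j)
	}
	return placed, idle
}

func main() {
	jobs := []pendingJob{{Queue: "q1", Request: 6}, {Queue: "q2", Request: 6}}
	deserved := map[string]int64{"q1": 5, "q2": 5}

	placed, idle := place(jobs, deserved, 10, false)
	fmt.Println(len(placed), idle) // 0 10

	placed, idle = place(jobs, deserved, 10, true)
	fmt.Println(len(placed), idle) // 1 4
}
```

With 10 units split evenly, neither 6-unit job fits its 5-unit share, so all 10 units stay idle; ignoring the cap lets one job run. Only 4 units then remain, so a later job in `q2` that fits within its 5-unit share may still be blocked, which is the follow-up case mentioned above.
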
