|
| 1 | +# Queue |
| 2 | + |
| 3 | +[@k82cn](http://github.com/k82cn); April 17, 2019 |
| 4 | + |
| 5 | +## Motivation |
| 6 | + |
| 7 | +`Queue` was introduced in [kube-batch](http://github.com/kubernetes-sigs/kube-batch) long ago as an internal feature: all jobs are submitted to the same queue, named `default`. As more and more users want to share resources with each other through queues, this proposal covers the primary queue features needed to achieve that. |
| 8 | + |
| 9 | +## Function Specification |
| 10 | + |
| 11 | +The queue is cluster level, so users from different namespaces can share resources within a `Queue`. The following section defines the API of the queue. |
| 12 | + |
| 13 | +### API |
| 14 | + |
| 15 | +```go |
| 16 | +type Queue struct { |
| 17 | + metav1.TypeMeta `json:",inline"` |
| 18 | + |
| 19 | + metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` |
| 20 | + |
| 21 | + // Specification of the desired behavior of a queue |
| 22 | + // +optional |
| 23 | + Spec QueueSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"` |
| 24 | + |
| 25 | + // Current status of Queue |
| 26 | + // +optional |
| 27 | + Status QueueStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` |
| 28 | +} |
| 29 | + |
| 30 | +type QueueSpec struct { |
| 31 | + // The weight of the queue, used to share resources among queues. |
| 32 | + Weight int32 `json:"weight,omitempty" protobuf:"bytes,1,opt,name=weight"` |
| 33 | +} |
| 34 | + |
| 35 | +type QueueStatus struct { |
| 36 | + // The number of jobs in Unknown status |
| 37 | + Unknown int32 `json:"unknown,omitempty" protobuf:"bytes,1,opt,name=unknown"` |
| 38 | + // The number of jobs in Running status |
| 39 | + Running int32 `json:"running,omitempty" protobuf:"bytes,2,opt,name=running"` |
| 40 | + // The number of jobs in Pending status |
| 41 | + Pending int32 `json:"pending,omitempty" protobuf:"bytes,3,opt,name=pending"` |
| 42 | + // The number of jobs in Completed status |
| 43 | + Completed int32 `json:"completed,omitempty" protobuf:"bytes,4,opt,name=completed"` |
| 44 | + // The number of jobs in Failed status |
| 45 | + Failed int32 `json:"failed,omitempty" protobuf:"bytes,5,opt,name=failed"` |
| 46 | + // The number of jobs in Aborted status |
| 47 | + Aborted int32 `json:"aborted,omitempty" protobuf:"bytes,6,opt,name=aborted"` |
| 48 | +} |
| 49 | +``` |
| 50 | + |
| 51 | +### QueueController |
| 52 | + |
| 53 | +The `QueueController` manages the lifecycle of a queue: |
| 54 | + |
| 55 | +1. It watches `PodGroup`/`Job` objects for status changes |
| 56 | +2. If a `Queue` is deleted, it also deletes all related `PodGroup`/`Job` objects in the queue |
| 57 | + |
| 58 | +### Admission Controller |
| 59 | + |
| 60 | +The admission controller checks the queue of a `PodGroup`/`Job` on creation: |
| 61 | + |
| 62 | +1. If the queue does not exist, the creation is rejected |
| 63 | +2. If the queue is releasing, the creation is also rejected |
| 64 | + |
| 65 | +### Feature Interaction |
| 66 | + |
| 67 | +#### Customized Job/PodGroup |
| 68 | + |
| 69 | +If a `PodGroup` is created by a customized controller, the `QueueController` counts it under the `Unknown` status, because a `PodGroup` focuses on the scheduling specification, which does not include the customized job's status. |
| 70 | + |
| 71 | +#### cli |
| 72 | + |
| 73 | +The command line is also enhanced for operators. Three sub-commands are introduced as follows: |
| 74 | + |
| 75 | +__create__: |
| 76 | + |
| 77 | +The `create` command creates a queue with a weight; for example, the following command creates a queue named `myqueue` with weight 10. |
| 78 | + |
| 79 | +```shell |
| 80 | +$ vkctl queue create --name myqueue --weight 10 |
| 81 | +``` |
| 82 | + |
| 83 | +__view__: |
| 84 | + |
| 85 | +The `view` command shows the details of a queue, e.g. its creation time; the following command shows the details of queue `myqueue`: |
| 86 | + |
| 87 | +```shell |
| 88 | +$ vkctl queue view myqueue |
| 89 | +``` |
| 90 | + |
| 91 | +__list__: |
| 92 | + |
| 93 | +The `list` command shows all queues available to the current user: |
| 94 | + |
| 95 | +```shell |
| 96 | +$ vkctl queue list |
| 97 | +Name     Weight  Total  Pending  Running  ... |
| 98 | +myqueue  10      10     5        5 |
| 99 | +``` |
| 100 | + |
| 101 | +#### Scheduler |
| 102 | + |
| 103 | +* Proportion plugin: |
| 104 | + |
| 105 | + The proportion plugin shares resources between `Queue`s by weight. The deserved resource of a queue is `(weight/total-weight) * total-resource`. When allocating resources, the plugin does not allocate a queue more than its deserved resources. |
| 106 | + |
| 107 | +* Reclaim action: |
| 108 | + |
| 109 | + The `reclaim` action goes through all queues and reclaims resources from other queues based on the return value of `ReclaimableFn`; the time complexity is `O(n^2)`. In `ReclaimableFn`, both `proportion` and `gang` take effect: 1. `proportion` makes sure a queue will not be under-used after the reclaim; 2. `gang` makes sure a job will not be reclaimed if its `minAvailable` > 1. |
| 110 | + |
| 111 | +* Backfill action: |
| 112 | + |
| 113 | + When the `allocate` action assigns resources to each queue, there is a case ([kube-batch#492](<https://github.com/kubernetes-sigs/kube-batch/issues/492>)) where resources may sit idle unnecessarily because of the `proportion` plugin: two queues each have one pending job, and the deserved resources of each queue cannot meet the requirement of its job. In such a case, the `backfill` action ignores the deserved guarantee of the queues and fills idle resources as much as possible. This introduces another potential issue, namely that an incoming smaller job may be blocked; this case will be handled by reserved resources of each queue in another project. |
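
The deserved-resource rule of the proportion plugin, `(weight/total-weight) * total-resource`, can be illustrated with a minimal sketch. This is not the plugin's actual code: the `queue` type, `deservedShares`, and `canAllocate` are hypothetical names, and a single integer resource (e.g. CPU millicores) is assumed in place of the scheduler's multi-dimensional resource type.

```go
package main

import "fmt"

// queue is a hypothetical, simplified stand-in for the Queue API object.
type queue struct {
	name   string
	weight int32
}

// deservedShares splits total across queues in proportion to their weight:
// deserved = (weight / total-weight) * total-resource.
// Integer division is used, so any remainder is left unassigned.
func deservedShares(queues []queue, total int64) map[string]int64 {
	shares := make(map[string]int64, len(queues))
	var totalWeight int64
	for _, q := range queues {
		totalWeight += int64(q.weight)
	}
	if totalWeight == 0 {
		return shares // no weighted queues, nothing to hand out
	}
	for _, q := range queues {
		shares[q.name] = total * int64(q.weight) / totalWeight
	}
	return shares
}

// canAllocate reports whether a request still fits within the deserved
// share of a queue (assumed check: allocated + request <= deserved).
func canAllocate(allocated, request, deserved int64) bool {
	return allocated+request <= deserved
}

func main() {
	queues := []queue{{"q1", 10}, {"q2", 30}}
	shares := deservedShares(queues, 8000)
	fmt.Println(shares["q1"], shares["q2"])            // 2000 6000
	fmt.Println(canAllocate(1500, 1000, shares["q1"])) // false
}
```

With weights 10 and 30 over 8000 units, the queues deserve 2000 and 6000 units respectively; a 1000-unit request in `q1` is refused once 1500 units are already allocated there, which is exactly the situation the `backfill` action is meant to relieve when resources would otherwise stay idle.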