What's Changed
- scheduler: refine reservation msg if matched filtered out by other plugins by @saintube in #2085
- Add ZiMengSheng into approvers alias by @ZiMengSheng in #2087
- scheduler: fix reservation plugin clone concurrent read write by @buptcozy in #2084
- scheduler: fix elastic quota TestPlugin_OnPodDelete random panic by @buptcozy in #2083
- scheduler: change reservation event message format by @zwzhang0107 in #2090
- scheduler: add node info in reservation level event by @zwzhang0107 in #2094
- scheduler: support consume reserved numa resource by @ZiMengSheng in #2080
- scheduler: add new pod estimate with loadaware plugin by @zwForrest in #1992
- koord-descheduler: limit the total number of pod evictions by @zwForrest in #2091
- chore: fix e2e configs after the load-aware scheduling updated by @saintube in #2101
- koord-descheduler: balance prod pods between nodes in LowNodeLoad by @zwForrest in #2066
- chore: fix low node load test after prod thresholds added by @songtao98 in #2103
- koord-scheduler: Pod updates should not update the timestamp by @zwForrest in #2100
- koord-descheduler: fix object limiter by moving it to reconciler by @songtao98 in #2088
- koord-descheduler: enhance LowNodeLoad scorer by @LY-today in #2092
- koord-descheduler: support evicting all bare pods in migration controller by @songtao98 in #2102
- koordlet: update resctrl qos proposal by @kangclzjc in #2079
- koordlet: add taskids in statesinformer by @kangclzjc in #2057
- koord-manager: fix the prom metrics handler by @saintube in #2107
- koord-descheduler: fix descheduler log by @zwForrest in #2114
- scheduler: support pod request exact match reservation by @ZiMengSheng in #2121
- scheduler: reduce cycleState overhead for reservation by @saintube in #2120
- scheduler: fix reserveNUMAResource bug by @ZiMengSheng in #2124
- koordlet: add allpods reconciler func by @kangclzjc in #2112
- scheduler: reduce deviceShare memory overhead by @ZiMengSheng in #2126
- webhook: add quota evaluate webhook by @shaloulcy in #2129
- koordlet: upgrade nri to 0.6.1 by @kangclzjc in #2132
- koord-manager: support manually configured default cpu normalization settings by @yangfeiyu20102011 in #2128
- koordlet: tc plugin for netqos by @lucming in #1920
- chore: fix koordlet dockerfile by @saintube in #2134
- koord-descheduler: Fix threshold initialization for prod. by @zwForrest in #2130
- scheduler: support device topology strategy by @ZiMengSheng in #2133
- scheduler: optimize when only 1 reservation/node by @ZiMengSheng in #2136
- koord-descheduler: add migration object limiter for namespace by @songtao98 in #2068
- webhook: revise elasticquota mutating when validating disabled by @saintube in #2135
- koord-descheduler: support max migrating globally limitation by @songtao98 in #2143
- scheduler: skip unnecessary Filter&Score plugin by @ZiMengSheng in #2138
- all: support priority and preemption policy transformer by @saintube in #2137
- scheduler: elastic quota ignore terminating pod by @shaloulcy in #2141
- koord-descheduler: fix maxMigratingGlobally arg json tag by @songtao98 in #2144
- scheduler: support multi gpu share by @AdrianMachao in #2127
- scheduler: return framework.Unschedulable when cpuPolicy not satisfied by @ZiMengSheng in #2146
- koord-manager: fix invalid assignments in the resource amplification … by @yangfeiyu20102011 in #2149
- koordlet: add some information to improve log readability by @yangfeiyu20102011 in #2153
- ci: add disk GC for E2E jobs by @saintube in #2157
- scheduler: fix panic when NUMANode equals -1 by @ZiMengSheng in #2151
- scheduler: fix too many ut log by @ZiMengSheng in #2154
- scheduler: support scheduler config v1 by @AdrianMachao in #2167
- scheduler: add coscheduling preEnqueue by @AdrianMachao in #2155
- koordlet: add record events by @kangclzjc in #2162
- koordlet: fix unsafe conversion in net_cls by @saintube in #2173
- scheduler: assure only gpu will allocate by topology by @ZiMengSheng in #2178
- scheduler: move gang OnceSatified to gangGroupInfo by @buptcozy in #2176
- scheduler: add reservation preemption by @saintube in #2139
- koord-descheduler: update k8s descheduler to 0.28.0 by @songtao98 in #2156
- scheduler: support reservation ignored and nodenumaresource preemption by @saintube in #2163
- koord-manager: support resource amplification config, cpu, memory and other resource by @yangfeiyu20102011 in #2172
- scheduler: fix pod update when pod creation is before quota creation by @shaloulcy in #2177
- scheduler: permit other plugin to set numaAffinity by @ZiMengSheng in #2182
- koord-descheduler: LowNodeLoad check if evicted pod can cause new node over utilized by @songtao98 in #2142
- Update OWNERS_ALIASES by @hormes in #2145
- koord-descheduler: fixes namespace object limiter by @songtao98 in #2160
- scheduler: support amd.com/gpu by @ZiMengSheng in #2174
- scheduler: elastic quota ignore terminating pod immediately by @TaoYang526 in #2180
- apis: add reservation name, taints and tolerations in ReservationAffinity by @zwzhang0107 in #2186
- scheduler: fix panic when nominatingInfo is nil by @ZiMengSheng in #2189
- apis: add protobuf for reservation by @zwzhang0107 in #2192
- scheduler: fix ElasticQuota state's Clone by @saintube in #2190
- koordlet: fix wrong msg in calculateBESuppressCPU by @yangfeiyu20102011 in #2193
- chores(deps): Add support for depguard rules in golangci-lint by @dongjiang1989 in #1965
- koord-descheduler: fix migration controller max unavailable computing algorithm by @songtao98 in #2196
- scheduler: support devices of the same node gpuMem not equal by @ZiMengSheng in #2199
- scheduler: support pod preemption from numa awareless reservation by @ZiMengSheng in #2204
- scheduler: optimize numa affinity store by @ZiMengSheng in #2209
- apis: fix type of MinResources in podgroup by @zwzhang0107 in #2212
- koordlet: Add resctrl runtime hook for pod level by @kangclzjc in #2123
- koord-manager: nodeslo-controller enqueue request when node labels updated by @chengjoey in #2201
- scheduler: optimize performance on transformer extension and Skip status by @saintube in #2211
- koord-manager: add unallocated resource into mid resource. by @tan90github in #2152
- scheduler: support reservation name, taints and tolerations by @saintube in #2207
- scheduler: recover gang check in preFilter by @ZiMengSheng in #2217
- apis: GPU Partition related by @ZiMengSheng in #2219
- scheduler: optimize GPU allocate logic by @ZiMengSheng in #2221
- scheduler: support amd by @ZiMengSheng in #2223
- scheduler: support reserving pods resources by @saintube in #2224
- scheduler: allocate tolerate numa-meaning-less device by @ZiMengSheng in #2226
- scheduler: fix partition binpack disorder by @ZiMengSheng in #2227
- koord-descheduler: fix prod pod excessive eviction when node recover to normal by @JBinin in #2225
- scheduler: fix gpu shared bug by @ZiMengSheng in #2228
- scheduler: support secondary device well planned by @ZiMengSheng in #2229
- scheduler: fix shared gpu pod allocated minor of -1 by @ZiMengSheng in #2230
- scheduler: improve gang log by @googs1025 in #2009
- scheduler: consider pod requests when gpu&RDMA joint allocate by @ZiMengSheng in #2233
- scheduler: optimize reservation perf with lazy restoring by @saintube in #2241
- util: fix incorrect comparison for resource.go#LessThanOrEqualCompletely by @TaoYang526 in #2235
- scheduler: support besteffort policy by @ZiMengSheng in #2243
- scheduler: fix scaling factor 100 for burstable pod by @ZiMengSheng in #2242
- gpu: setting default gpu partition policy by @ZiMengSheng in #2245
- koordlet: add ReadMemoryUsage method to improve generality of cgroup reader interface by @yangfeiyu20102011 in #2251
- koord-manager: refactor oversale resource calculate logic. by @tan90github in #2240
- util: add defaulting to blkio qos to improve robustness by @zqzten in #2238
- webhook: Support setting quota admission to zero for special usage scenarios, such as temporarily pausing pod submissions. by @TaoYang526 in #2237
- scheduler: add downgrade strategy for empty 'aggregated' on cold koor… by @clay-wangzhi in #2239
- koord-manager: mv slo-controller metrics to util by @zwzhang0107 in #2257
- scheduler: fix deviceshare with reservation-ignored pods by @saintube in #2252
- scheduler: setFailedPlugin when plugin transformer failed by @ZiMengSheng in #2262
- koordlet: export host application cpu and memory usage for prometheus by @yangfeiyu20102011 in #2259
- scheduler: update dingtalk QR Code by @ZiMengSheng in #2277
- scheduler: support allocating from reservation when no resource matched by @saintube in #2279
- koordlet: add GetNodeMetricSpec in statesInformer interface by @j4ckstraw in #2278
- api: supply id for minor meaningless device by @ferris-cx in #2250
- koordlet: remove useless stateInformer.impl StateInformer interface by @j4ckstraw in #2286
- Upgrade ginkgo to v2.11.0 by @nce3xin in #2284
- koordlet: support change pod CPUQOS by annotations by @j4ckstraw in #2206
- koordlet: rdma device inject by @ferris-cx in #2285
- koord-manager: consider NodeReserved when calculate mid resource. by @tan90github in #2253
- koordlet: fix reconciler description mismatch. by @tan90github in #2282
- manager: sync rdma resource to node by @ferris-cx in #2249
- koordlet: supply rdma devices by @ZiMengSheng in #2276
- koordlet: add metrics about be used cpu and node used memory by @leason00 in #2283
- scheduler: add log for cpu/mem numaaware allocate by @ZiMengSheng in #2293
- koordlet: fix GeviceNumbers compatibility in mac by @ZiMengSheng in #2295
- koordlet: collect metric for host application memory usage with page cache by @j4ckstraw in #2273
- slo-controller: fix mid resource calculate formula by @lijunxin559 in #2291
- gpu: support strict gpu share with hami by @ZiMengSheng in #2272
- scheduler: make error more clear when numa-aware by @ZiMengSheng in #2305
- scheduler: fix singleNUMANodeExclusive not clone by @ZiMengSheng in #2309
- scheduler: fix UT failure caused by map iteration order by @ZiMengSheng in #2311
- scheduler: refactor reset quota by @shaloulcy in #2301
- scheduler: extend schedulerMonitor by @saintube in #2314
- scheduler: register pod delete event for resource plugin by @ZiMengSheng in #2308
- scheduler: delete quota inplace by @shaloulcy in #2312
- scheduler: add quota inplace by @shaloulcy in #2313
- scheduler: revise ReservationFilterPlugin and fix preempting pods resources by @saintube in #2315
- scheduler: fix ReplaceQuotas ut by @shaloulcy in #2318
- scheduler: make topology manager aware device preference by @ZiMengSheng in #2316
- scheduler: fix quota check panic by @shaloulcy in #2321
- scheduler: fix reservation reserved updated by @saintube in #2322
- scheduler: export GPUPartitionIndexOfNVIDIAHopper by @ZiMengSheng in #2324
- koordlet: fix prodReclaimablePredictor result to avoid influence of o… by @lijunxin559 in #2325
- feat: add NodeResourcesFitPlus and ScarceResourceAvoidance plugin by @LY-today in #2302
- koord-manager: add metrics for webhook by @nce3xin in #2330
- scheduler: fix monitor terminating pod by @saintube in #2331
- koordlet: pod resources proxy by @ferris-cx in #2300
- scheduler: some minor optimizations for reservation by @saintube in #2336
- scheduler: move custom informers for extension by @saintube in #2338
- webhook: support elasticquota enable update resource key by @lijunxin559 in #2323
- docs: pod resources proxy by @ferris-cx in #2292
- koordlet: fix podresources not found by @ZiMengSheng in #2344
- koordlet: fix UT for podresources by @ZiMengSheng in #2345
New Contributors
- @LY-today made their first contribution in #2092
- @AdrianMachao made their first contribution in #2127
- @TaoYang526 made their first contribution in #2180
- @dongjiang1989 made their first contribution in #1965
- @chengjoey made their first contribution in #2201
- @JBinin made their first contribution in #2225
- @clay-wangzhi made their first contribution in #2239
- @ferris-cx made their first contribution in #2250
- @nce3xin made their first contribution in #2284
- @lijunxin559 made their first contribution in #2291
Full Changelog: v1.5.0...v1.6.0