Knative Serving release v0.9.0
Meta
This is “Serving v1” RC2
There is discussion ongoing within the community about how we will message and document that Serving (within constraints) is ready for production workloads, and how we coordinate this with the rest of Knative, which is not yet there.
v1 API
The v1 API shape and endpoint are available starting in this release. Due to potential minimum version constraints, this release can be deployed either with just the v1alpha1 endpoint or with all endpoints (v1alpha1, v1beta1, and v1) enabled. The v1 API shape is usable through all endpoints.
To use the v1beta1 or v1 endpoints, a minimum Kubernetes version of 1.14 is required (1.13.10 also had the fix backported). The minimum required Kubernetes version will become 1.14 in the next release of Knative.
autoscaling.knative.dev/minScale now only applies to routable revisions
We have changed the behavior of minScale to only apply to Revisions that are referenced by a Route. This addresses a long-standing pain point where users used minScale, but Revisions would stick around until garbage collected, which takes at least 10 hours.
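The new rule can be modeled in a few lines. This is an illustrative sketch only, not the autoscaler's actual code; the function name `effective_min_scale` and its parameters are hypothetical.

```python
def effective_min_scale(min_scale: int, routable: bool) -> int:
    """Return the replica floor the autoscaler should honor.

    Hypothetical helper: models the rule that minScale only applies
    while some Route references the Revision.
    """
    if min_scale < 0:
        raise ValueError("min_scale must be non-negative")
    return min_scale if routable else 0


# A Revision with minScale=3 keeps its 3 pods only while it is routable.
routed = effective_min_scale(3, routable=True)
unrouted = effective_min_scale(3, routable=False)
```

Under this model, a Revision that loses its last Route reference can scale to zero immediately instead of holding its minimum until garbage collection.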
Cold Start improvements
We have made some improvements to our cold-start latency, which should result in a small net improvement across the board, but also notably improve:
- Cold-starts that are sequenced (e.g. front-end calls back-end and both cold-start)
- Events with responses (e.g. passing events back to the broker with each hop cold starting)
- The long tail of cold-start latency (this should now be reliably under 10s for small container images)
Autoscaling
Cold Start Improvements #4902 and #3885 (thanks @greghaynes)
The Activator will now send requests directly to the pods when the ClusterIP is not yet ready, providing us with ~200ms latency from the time the pod is ready to the time we send the first request, compared to up to 10s before.
This also fixes a problem where cold start was subject to the iptables-min-sync-period of the kubelet (10s on GKE), which created a relatively high floor for cold start times under certain circumstances.
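The routing decision described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the Activator's implementation: the function `pick_destinations`, its parameters, and the example addresses are all hypothetical.

```python
from typing import List


def pick_destinations(
    cluster_ip: str, cluster_ip_ready: bool, ready_pod_ips: List[str]
) -> List[str]:
    """Choose where a buffered request may be sent.

    Hypothetical model: prefer the ClusterIP once it is programmed,
    otherwise fall back to the pods that already report ready.
    """
    if cluster_ip_ready:
        return [cluster_ip]
    return list(ready_pod_ips)


# The pod is ready, but the ClusterIP has not been programmed yet.
early = pick_destinations("10.0.0.5:80", False, ["10.4.1.7:8012"])
# Later, once the ClusterIP is usable.
later = pick_destinations("10.0.0.5:80", True, ["10.4.1.7:8012"])
```

The point of the fallback is that the first request no longer waits for the ClusterIP to become usable.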
RPS autoscaling #3416 (thanks @yanweiguo and @taragu)
It is now possible to drive autoscaling not only by concurrency but also by an RPS/QPS/OPS metric, which is a better fit for short, lightweight requests (@yanweiguo)
Report RPS metrics (@taragu)
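To see why a rate-based metric suits short requests, consider a back-of-the-envelope sizing calculation. The sketch below is not the autoscaler's algorithm (which also involves averaging windows and panic mode); `desired_pods` and the numbers used are illustrative assumptions.

```python
import math


def desired_pods(observed: float, target_per_pod: float) -> int:
    """Pods needed so that each pod stays at or below its target."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    return max(0, math.ceil(observed / target_per_pod))


# 950 requests per second against a target of 100 RPS per pod.
by_rps = desired_pods(observed=950.0, target_per_pod=100.0)

# The same traffic with 20 ms requests averages about 950 * 0.02 = 19
# in-flight requests (Little's law), so a concurrency target of 10
# per pod asks for far fewer pods.
by_concurrency = desired_pods(observed=950.0 * 0.02, target_per_pod=10.0)
```

With very short requests, average concurrency stays low even at high request rates, so a concurrency target can understate the load that an RPS target captures directly.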
minScale only applies to routable revisions #4183 (thanks @tanzeeb)
Previously, Revisions kept their minScale instances running even when they were no longer routable.
Added Reachability concept to the PodAutoscaler.
Continuous benchmarks are live at https://mako.dev (thanks @mattmoor, @srinivashegde86, @Fredy-Z, @vagababov)
Autoscaler scaledown rate #4993 (thanks @vagababov)
The rate at which the autoscaler scales down revisions can now be limited to a rate configured in config-autoscaler.
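A minimal sketch of what such a limit could look like, assuming it is expressed as a ratio between the current and the next pod count. The function and the parameter name `max_scale_down_rate` are hypothetical here; check config-autoscaler for the actual key and its semantics.

```python
import math


def limit_scale_down(current: int, desired: int, max_scale_down_rate: float) -> int:
    """Clamp a scale-down so one step shrinks by at most the given ratio.

    Assumption: a rate of 2.0 means the pod count may at most halve per
    autoscaler decision. Scale-ups pass through unchanged.
    """
    if max_scale_down_rate <= 1.0:
        raise ValueError("max_scale_down_rate must be greater than 1")
    floor_pods = math.floor(current / max_scale_down_rate)
    return max(desired, floor_pods)


# Traffic vanishes: the autoscaler wants 1 pod, but steps down gradually.
step1 = limit_scale_down(current=20, desired=1, max_scale_down_rate=2.0)
step2 = limit_scale_down(current=step1, desired=1, max_scale_down_rate=2.0)
# A mild scale-down that is already within the limit is left unchanged.
unchanged = limit_scale_down(current=20, desired=15, max_scale_down_rate=2.0)
```

A gradual step-down of this kind trades some idle capacity for protection against a brief dip in traffic removing most of a revision's pods at once.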
Various bug fixes/improvements:
- AutoScaler did not update metric service #5291 (@vagababov)
- SKS goes to Serve mode after Autoscaler restart #5327 (@vagababov)
- Activator scale down problems #5364 (@mattmoor and @yanweiguo)
- Target burst capacity (TBC) now defaults to 200 (thanks @vagababov)
- PodAutoscaler (PA) now exports desired/actual Pods in its Status (thanks @vagababov)
- Code cleanups, test stability, etc. (@markusthoemmes, @taragu, @savitaashture, etc.)
Core API
v1 API #5483, #5259, #5337, #5439, #5559 (thanks @dgerd, @mattmoor)
The v1 API shape and endpoint are available starting in this release. See the "Meta" section for more details.
Validate system annotations #4995 (thanks @shashwathi)
Webhook validation now ensures that serving.knative.dev annotations have appropriate values.
Revisions now have the serving.knative.dev/route label #5048 (thanks @mattmoor)
Revisions are now labeled by the referencing Route to enable querying.
Revision GC refactored into its own reconciler #4876 (thanks @taragu)
Revision garbage collection now runs in its own reconciler, separately from Configuration reconciliation.
Surface Deployment failures to Revision status #5077 (thanks @jonjohnsonjr)
DeploymentProgressing and DeploymentReplicaFailure information is propagated up to Revision status. An event is no longer emitted when the deployment times out.
Validate VolumeSources and VolumeProjections #5128 (thanks @markusthoemmes)
We now validate the KeyToPath items in the webhook to ensure that both Key and Path are specified. This prevents potential pod deployment problems.
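The check amounts to rejecting any item that lacks either field. A rough sketch of that idea, using plain dictionaries in place of the real Kubernetes types; the function name and the error message format are hypothetical and do not reproduce the webhook's output.

```python
from typing import Dict, List


def validate_key_to_path(items: List[Dict[str, str]]) -> List[str]:
    """Return one error per missing or empty `key` / `path` field."""
    errors: List[str] = []
    for i, item in enumerate(items):
        if not item.get("key"):
            errors.append(f"items[{i}].key: missing field")
        if not item.get("path"):
            errors.append(f"items[{i}].path: missing field")
    return errors


# The second item forgets its path, so it is flagged before any pod is created.
errors = validate_key_to_path(
    [
        {"key": "tls.crt", "path": "cert.pem"},
        {"key": "tls.key"},
    ]
)
```

Catching this at admission time surfaces the mistake on the resource the user submitted, rather than later as a failing pod.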
ContainerConcurrency default is now configurable #5099 (thanks @taragu, @Zyqsempai)
The default ContainerConcurrency is now configured through the config-defaults ConfigMap. Unspecified values will receive the default value, and explicit zero values will receive 'unlimited' concurrency.
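The distinction between "unspecified" and "explicitly zero" can be illustrated as follows. This is a sketch of the defaulting rule as described, not the webhook's code; `resolve_container_concurrency` and `configured_default` are hypothetical names.

```python
from typing import Optional

UNLIMITED = 0  # an explicit zero means no per-container concurrency limit


def resolve_container_concurrency(
    specified: Optional[int], configured_default: int
) -> int:
    """Apply the cluster-wide default only when the field was left unset."""
    if specified is None:
        return configured_default
    if specified < 0:
        raise ValueError("containerConcurrency must be non-negative")
    return specified


# Assume the operator configured a default of 50 (illustrative value).
unset = resolve_container_concurrency(None, configured_default=50)
explicit_zero = resolve_container_concurrency(0, configured_default=50)
explicit_ten = resolve_container_concurrency(10, configured_default=50)
```

Keeping `None` and `0` distinct is what lets an operator set a default without taking away a user's ability to request unlimited concurrency.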
Apply Route's labels to the child Ingress #5467 (thanks @nak3)
Labels on the Route will be propagated to the Ingress owned by the Route.
Jitter global resyncs to improve performance at scale #5275 (thanks @mattmoor)
Global resyncs no longer enqueue all objects at once. This prevents latency spikes in reconciliation time and improves the performance of larger clusters.
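The general technique is to spread enqueues over a time window instead of issuing them all at once. The sketch below shows that idea only; the function `jittered_delays`, the 60-second window, and the key format are illustrative assumptions, not values taken from the Knative code.

```python
import random
from typing import Dict, Iterable


def jittered_delays(
    keys: Iterable[str], window_seconds: float, rng: random.Random
) -> Dict[str, float]:
    """Assign each object key a random enqueue delay within the window."""
    return {key: rng.uniform(0.0, window_seconds) for key in keys}


# Spread 1000 objects across a 60-second window rather than one burst.
rng = random.Random(42)  # seeded only to make the example repeatable
keys = [f"default/revision-{i}" for i in range(1000)]
delays = jittered_delays(keys, window_seconds=60.0, rng=rng)
```

Each object is still resynced once per period, but the work queue receives a steady trickle rather than a spike.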
Improved error messages for readiness probes #5385 (thanks @nak3)
Bug Fixes:
- Fix Revisions stuck in updating when scaled-to-zero #5106 (thanks @tanzeeb)
- Fix Service reconcile when using named Revisions #5547 (thanks @dgerd)
- Skip copying kubectl.kubernetes.io/last-applied-configuration annotation #5202 (thanks @skaslev)
- Image repository credentials now work for image pulling #5477 (thanks @jonjohnsonjr)
- Error earlier if using invalid autoscaling annotations #5412 (thanks @savitaashture)
- Fix potential NPE in Route reconciler #5333 (thanks @mjaow)
- Fix timeoutSeconds=0 to set default timeout #5224 (thanks @nak3)
- Consistent update for Ingress ObservedGeneration #5250 (thanks @taragu)
Test Improvements:
- Fix cgroup test for non-default CPU periods #5322 (thanks @duglin)
- Improve Revision unit test coverage #5248 (thanks @savitaashture)
Networking
Cold start improvement
The Activator sends requests directly to Pods #3885 #4902 (thanks @greghaynes)
Disable and remove ClusterIngress resources #5024 (thanks @wtam)
Various bug fixes
- Prober ignores Gateways that can’t be probed #5129 (thanks @JRBANCEL)
- Make port name in Gateway unique by adding namespace prefix #5324 (thanks @nak3)
- Activator to handle graceful shutdown correctly #5364 (thanks @mattmoor)
- Route cluster-local visibility should take precedence over placeholder Services #5411 (thanks @tcnghia)
Monitoring
- Upgrade Grafana image to official release 6.3.3 #5288 (thanks @yanweiguo)
- Remove addonmanager labels from monitoring.yaml #5235 (thanks @yanweiguo)
- Make reconciler dashboard a generic one #5247 (thanks @sayanh)
- Report RPS for autoscaler metrics #5238 (thanks @taragu)
- Remove shadowed logging package #5132 (thanks @markusthoemmes)
- Profiling support #5083 (thanks @mgencur)
- Update log level to run tests on debug level #5071 (thanks @taragu)
- Report Activator request concurrency to metrics backend #4931 (thanks @yanweiguo)
- Add the grafana metric for the excess burst capacity #4820 (thanks @vagababov)
- Export webhook metrics to prometheus #4707 (thanks @anniefu)