Releases: Cloudzero/cloudzero-charts
1.1.0-beta-1
1.1.0-beta-1 (2025-03-18)
Initial (beta) release of the new CloudZero Aggregator.
Upgrade Steps
- Upgrade with:
helm upgrade --install -n cloudzero-agent cloudzero-beta -f configuration-example.yaml
See the beta installation instructions for further detail
Bug Fixes
- Update nodeSelector settings: The nodeSelector is now available for the initCertJob and initBackfillJob jobs.
- nodeSelector, tolerations, and affinity settings moved: These settings have now moved to the insightsController.server section.
Improvements
- CloudZero Aggregator: The CloudZero Aggregator (affectionately known as "The Gator") is a new component that sits between the CloudZero Agent and the CloudZero Platform. The Gator aggregates metrics into a local cache before sending them in larger batches to the CloudZero Platform. This provides substantial improvements in reliability, performance, disaster recovery, user-friendliness, and more.
- Reduce scrape interval: The scrape interval was previously set to every 2 minutes; it has been reduced to every 1 minute.
1.0.2
Release 1.0.2 (2025-03-18)
This release fixes an issue with Helm chart templating and improves the sampling rate of the Prometheus agent.
Upgrade Steps
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.2
Bug Fixes
- Node Scheduling Settings Fixed: Fixes an issue in which the initCertJob did not have the option to set nodeSelector, affinity, or tolerations. Additionally, these settings can now be set for each initialization Job individually.
- Values File Documentation Fixed: Fixes an issue in which the node scheduling settings for the insightsController were indented to the wrong level.
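As an illustration of the per-Job node scheduling settings described above, an override file might look something like the sketch below. The top-level placement of initCertJob and initBackfillJob, and the toleration key and value, are assumptions made for the example; check the chart's values.yaml for the authoritative layout.

```yaml
# Hypothetical override values; key nesting is assumed, not taken from the chart.
initCertJob:
  nodeSelector:
    kubernetes.io/os: linux        # well-known Kubernetes node label
  tolerations:
    - key: dedicated               # hypothetical taint key
      operator: Equal
      value: monitoring            # hypothetical taint value
      effect: NoSchedule
initBackfillJob:
  nodeSelector:
    kubernetes.io/os: linux
```

Because each initialization Job is configured separately, the two Jobs can be pinned to different node pools if needed.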
Improvements
- Default Scrape Interval Set to 60s: The default scrape_interval setting used by the internal Prometheus agent is updated from 120s to 60s. This improvement makes it more likely that the agent captures usage information for short-lived pods.
1.0.1
Release 1.0.1 (2025-03-02)
This release fixes two issues relating to template rendering and TLS certificate generation, and adds documentation for Istio-enabled clusters. It also includes other fixes and changes around Prometheus metrics, logging, and SQLite.
Upgrade Steps
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.1
Bug Fixes
- Webhook Resource Names Trimmed Appropriately: Fixes an issue in which the name used by webhook resources adds a suffix after trimming, which can potentially allow resource names that violate Kubernetes naming rules.
- Certificate Generation Runs For All Webhook Configuration Changes: Fixes an issue in which the TLS certificate generation initialization Job does not run if a ValidatingWebhookConfiguration is created after initial installation.
- Invalid Prometheus Metric Label Name: Fixes an issue where supplying an invalid label name to a Prometheus metric causes a panic.
- Utilization of Default Kubernetes Logger: Removes the last use of the default Kubernetes logger, which caused logging levels defined in the configuration to be ignored.
Improvements
- Shorter TTL for init-cert Job: The init-cert Job is now cleaned up after 5 seconds, so that repeated installations regenerate certificates as needed.
- Improvements to SQLite Testing: The SQLite connection string was edited for improved clarity, and a concurrency test was added.
- Various Logging Changes: Some logging messages were downgraded from info to debug.
1.0.0-rc4
Release 1.0.0-rc4 (2025-02-16)
This release makes improvements to the certificate initialization Job so that more invalid states can be rectified. Additionally, annotations can now be added to initialization Jobs. Expiration of both initialization Jobs is now configurable.
Upgrade Steps
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc4
See upgrades.md for full documentation of upgrade behavior as it relates to initialization Jobs.
Improvements
- Certificate Initialization Job Checks For More Invalid Conditions: The certificate initialization job now checks for certificates with invalid SAN settings, mismatches between webhook configurations, and mismatches between the webhook caBundle value and the ca.crt value in the TLS secret.
- Automatic Job Cleanup Configuration: TTL for both initialization Jobs is now configurable, and defaults to 180 seconds.
- Initialization Job Annotation Support: Both initialization Jobs allow the user to set annotations. This was specifically added to make management via ArgoCD easier, as ArgoCD will consider expired Jobs to be OutOfSync with the release source. See upgrades.md for details on recommended annotations.
1.0.0
Release 1.0.0 (2025-02-17)
This release introduces native Kubernetes Labels and Annotations support to the CloudZero platform. You can now identify Kubernetes dimensions based on the Labels and Annotations used in your Kubernetes deployments.
New Features
- Kubernetes Labels and Annotations: Enhance your ability to categorize and manage resources by leveraging Labels and Annotations directly within the CloudZero platform.
Configuration Changes
To take advantage of these new features, update your Helm chart configuration as outlined below.
Example example-override-values.yaml file:
# -- UNCHANGED: Cloud Service Provider Account ID
# This must be a string - even if it is a number in your system.
# Adding a new line here is an easy workaround.
cloudAccountId: |-
  null
# -- UNCHANGED: The Cluster name
clusterName: null
# -- UNCHANGED: The Cloud Service Provider Region
region: null
# -- UNCHANGED: CloudZero API key. Required if existingSecretName is null.
apiKey: null
# -- UNCHANGED: If set, the agent will use the API key in this Secret to authenticate with CloudZero.
existingSecretName: null
# -- NEW: Flag to deploy the Jetstack.io "cert-manager". Most environments will already have this deployed,
# so set this to "false" if applicable. Otherwise, setting this to "true" is a quick way to get started.
# See the README for more information.
cert-manager:
  # -- DEFAULT: enabled.
  enabled: true  # true | false
# -- NEW: Service Account used for the Insights Controller
# The account is required. If you already have an existing account, set the name in the field below.
serviceAccount:
  # -- DEFAULT: create the service account.
  create: true  # true | false
  name: ""
  annotations: {}
# -- NEW: Label and Annotation Configuration
insightsController:
  # -- By default, a ValidatingAdmissionWebhook will be deployed to record all created labels and annotations.
  enabled: true  # true | false
  labels:
    # -- DEFAULT: enabled.
    enabled: true  # true | false
    # -- This value MUST be set to a list of regular expressions used to gather labels from pods,
    # deployments, statefulsets, daemonsets, cronjobs, jobs, nodes, and namespaces.
    patterns:
      # List of Go-style regular expressions used to filter desired labels.
      # Caution: The CloudZero system has a limit of 300 labels and annotations,
      # so it is advisable to provide a specific list of required labels.
      - '.*'
  annotations:
    # -- DEFAULT: disabled.
    enabled: false  # true | false
    patterns:
      # List of Go-style regular expressions used to filter desired annotations.
      # Caution: The CloudZero system has a limit of 300 labels and annotations,
      # so it is advisable to provide a specific list of required annotations.
      - '.*'
Upgrade Instructions
If you have an existing CloudZero Agent deployment, follow these steps to upgrade:
- Define the values.yaml Override Configuration: Ensure your values.yaml override configuration includes the new settings outlined above. Note that some existing values may no longer be necessary.
- Update the Helm Chart Repository:
helm repo add cloudzero https://cloudzero.github.io/cloudzero-charts
helm repo update
- Upgrade the Deployment:
helm upgrade --install <YOUR_RELEASE_NAME> -n <YOUR_NAMESPACE> cloudzero -f override-values.yaml
Replace <YOUR_RELEASE_NAME> with the name you used to release the chart into your environment. Replace <YOUR_NAMESPACE> with the namespace you used for your deployment.
Deprecations and Breaking Changes
- node-exporter Deprecation: The node-exporter has been deprecated and is no longer used.
- External kube-state-metrics Deprecation: External kube-state-metrics has been deprecated. We now deploy an instance within the CloudZero Agent deployment named cloudzero-state-metrics, which is not discoverable by other monitoring platforms and ensures the necessary configuration is defined for telemetry collection requirements. If you host the images in a private image repository, you can override the following in the values.yaml file:
kubeStateMetrics:
  image:
    registry: registry.k8s.io
    repository: kube-state-metrics/kube-state-metrics
- API Key Management Argument Relocation:
  - API key management arguments have moved to the global section.
  - Previously, you could pass an apiKey or existingSecretName argument directly to the chart.
  - These arguments should now be passed as global.apiKey and global.existingSecretName, respectively.
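As a rough sketch of the relocated API key arguments, an override file would nest them under the global section along these lines. The Secret name shown is a hypothetical placeholder, and only one of the two fields is normally needed.

```yaml
# Sketch of the relocated arguments; the Secret name is a hypothetical placeholder.
global:
  # Either supply the key directly ...
  apiKey: null
  # ... or reference an existing Secret that holds it.
  existingSecretName: my-cloudzero-api-key
```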
Security Scan Results
Image | Scanner | Scan Date | Critical | High | Medium | Low | Negligible |
---|---|---|---|---|---|---|---|
ghcr.io/cloudzero/cloudzero-insights-controller/cloudzero-insights-controller:0.1.0 | Grype | 2024-12-23 | 0 | 0 | 0 | 0 | 0 |
ghcr.io/cloudzero/cloudzero-agent-validator/cloudzero-agent-validator:0.10.0 | Grype | 2024-12-23 | 0 | 0 | 0 | 0 | 0 |
1.0.0-rc3
Release 1.0.0-rc3 (2025-02-13)
This release makes improvements to the upgrade process as it relates to management of the initialization Jobs.
Upgrade Steps
This upgrade should be force installed: users managing the release with helm directly should include the --force flag when upgrading. Alternatively, uninstall and reinstall the Helm release. Users managing the release with tools such as ArgoCD should choose an upgrade strategy that does a full replacement.
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc3 --force
See upgrades.md for full documentation of upgrade behavior as it relates to initialization Jobs.
Improvements
- Certificate Initialization Job Runs Every Upgrade: The certificate initialization job now runs on every upgrade and does a better job of ensuring that the certificate is generated correctly and is being used. This means that the --force flag used in the helm upgrade command will always create a new certificate. Running helm upgrade without --force will not regenerate the certificate.
- Automatic Job Cleanup: Both initialization jobs are now automatically cleaned up after a period of time, which ensures that Jobs are rerun when appropriate.
- Certificate Initialization Job ClusterRole: The certificate initialization job now has a dedicated ClusterRole, ClusterRoleBinding, and ServiceAccount. This is done to separate required permissions and only grant PATCH permission to a very narrow resource scope.
1.0.0-rc2
Release 1.0.0-rc2 (2025-02-12)
This release fixes an issue in which the internal TLS certificate could create a SAN field with an incorrect service address.
Upgrade Steps
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc2
Bug Fixes
- SAN Field Properly Formatted: Previously, users installing the agent in a non-default namespace who were also using the internal TLS certificate generation may have run into an issue in which the certificate is improperly generated. The template now takes the release namespace into account.
1.0.0-rc1
Release 1.0.0-rc1 (2025-01-23)
This release contains several improvements from 1.0.0-beta-10:
- The name of the initialization Job that gathers information about existing state of a cluster now includes the version of the chart and the image tag used in the Pod.
- The initScrapeJob field is deprecated in favor of initBackfillJob. However, this is not a breaking change; initScrapeJob can still be used without issue.
- The server.agentMode boolean argument is now provided.
- Improvements are made to the resource consumption of the agent-server pod.
- Metrics from the agent-server pod are made available for monitoring.
Upgrade Steps
Optionally rename the initScrapeJob field in any override files to initBackfillJob. initBackfillJob is the preferred field, but configurations using initScrapeJob will still work.
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc1
Improvements
- Initialization Job Name Changes With Releases: It was previously possible to have failures in release upgrades if the container image used in the Job changed. This is because the image field in a Job spec is immutable. To prevent this, a new Job is created every time the Helm chart version is changed and/or when the image used in the Job is changed. This also ensures that changes to the underlying insights-controller application will be used in the new backfill of existing cluster state data.
- Clarified Field Names: The Job used for gathering existing cluster data was previously controlled via a field named initScrapeJob. This is an overloaded term given that this chart also uses the term "scrape job" in the context of Prometheus. This has caused some confusion, so the field is now renamed to initBackfillJob. initScrapeJob is still usable, and values from initScrapeJob are merged with initBackfillJob with the latter having precedence.
- Easier Debugging: The server.agentMode field can be toggled to false; by default it is set to true so that the Prometheus server runs in agent mode to keep resource usage manageable. Setting the field to false takes the Prometheus server out of agent mode. This is helpful for debugging issues with the Prometheus agent-server.
- Resource Consumption Reduction: The Prometheus scrape job used to gather metrics from the insights-controller pods now restricts the metrics scraped to ones explicitly set in the values.yaml. This means that the internal TSDB must hold less data.
- Improved Observability: The agent-server now scrapes itself for metrics and exports them for monitoring by the CloudZero platform. This means that issues within a cluster can be detected much sooner and with greater visibility into the cause of the issue.
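For the debugging toggle described in the list above, a minimal override sketch might look like this. It assumes server is a top-level section of the chart values, as the dotted field name suggests.

```yaml
# Minimal sketch: take the Prometheus server out of agent mode for debugging.
# Assumes "server" is a top-level key in the chart values.
server:
  agentMode: false   # default is true (agent mode)
```

Leaving the field at its default of true is presumably preferable outside of debugging sessions, since agent mode is what keeps resource usage manageable.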
1.0.0-beta-10
Release 1.0.0-beta-10 (2025-01-17)
This release adds logic to ensure that the static target used in the env-validator and in the Prometheus configuration always matches the internal Service created by the kube-state-metrics subchart.
Upgrade Steps
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero-beta/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-beta-10
Improvements
- Static Target and KSM Service Always Match: Both the env-validator and the Prometheus agent require an address for a kube-state-metrics Service. By default, the Service name generated by the kube-state-metrics subchart matches the target value generated by the chart. However, if the user overrides the name of the kube-state-metrics Service using kubeStateMetrics.fullnameOverride, there can be a mismatch between the names. This change attempts to mirror the logic used by the internal kube-state-metrics chart so that the target and Service names will match regardless of user input.
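The override case mentioned above could be sketched as follows. The Service name is a hypothetical placeholder chosen for the example.

```yaml
# Sketch of overriding the kube-state-metrics Service name.
# "custom-ksm" is a hypothetical name used only for illustration.
kubeStateMetrics:
  fullnameOverride: custom-ksm
```

With this release, the static target used by the env-validator and the Prometheus configuration should follow the overridden name rather than the default one.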
1.0.0-beta-9
Release 1.0.0-beta-9 (2025-01-15)
This release adds the ability to set the log level via the insightsController.server.logging.level field. Additionally, the interval in which data is written to the CloudZero platform and the timeout for writing data are configurable via insightsController.server.send_interval and insightsController.server.send_timeout, respectively. The default timeout is increased from 10s to 1m.
The kube-state-metrics subchart section now explicitly includes container image information. This introduces no functional changes; it is intended to make it clearer to the user which images will be used and from where they will be pulled.
Upgrade Steps
Upgrade using the following command:
helm upgrade --install <RELEASE_NAME> cloudzero-beta/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-beta-9
Bug Fixes
- KSM Address: Fixes an issue in which the internal kube-state-metrics service address can be templated incorrectly.
Improvements
- More Configurable Server Settings: The log level, remote write interval, and remote write timeout are now configurable in the chart values. See the insightsController.server section in the values.yaml for more details.
- Default Setting for Send Timeout: The default remote write timeout is increased to 1m, which allows for backfilling data from larger clusters.
- Container Image Information Added: The values passed to the internal kube-state-metrics subchart now explicitly set the container image registry, repository, and tag information for the purposes of documentation.
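The new server settings could be combined in an override file roughly as shown below. The field paths come from the notes above; the log level value and the send interval value are assumptions for illustration, and only the 1m timeout is the documented default.

```yaml
# Sketch of the configurable server settings introduced in this release.
insightsController:
  server:
    logging:
      level: debug       # assumed value; see values.yaml for accepted levels
    send_interval: 1m    # assumed value, using the same duration format as the timeout
    send_timeout: 1m     # default as of this release (previously 10s)
```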