Releases: Cloudzero/cloudzero-charts

1.1.0-beta-1

19 Mar 00:33

1.1.0-beta-1 (2025-03-18)

Initial (beta) release of the new CloudZero Aggregator.

Upgrade Steps

  • Upgrade with:
helm upgrade --install -n cloudzero-agent cloudzero-beta -f configuration-example.yaml

See the beta installation instructions for further detail.

Bug Fixes

  • Update nodeSelector settings: The nodeSelector is now available for the initCertJob and initBackfillJob jobs.
  • nodeSelector, tolerations, and affinity settings moved: These settings have now moved to the insightsController.server section.
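
As a rough illustration of where these settings now live, a values override might look like the sketch below. The insightsController.server path comes from the note above; placing initCertJob and initBackfillJob at the top level of the values file, and every label, taint, and architecture value shown, are assumptions for illustration only.

```yaml
# Sketch only: scheduling values are placeholders, not chart defaults.
insightsController:
  server:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
      - key: dedicated            # hypothetical taint key
        operator: Equal
        value: monitoring         # hypothetical taint value
        effect: NoSchedule
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: kubernetes.io/arch
                  operator: In
                  values:
                    - amd64

# Assumed to be top-level keys; check the chart's values.yaml.
initCertJob:
  nodeSelector:
    kubernetes.io/os: linux
initBackfillJob:
  nodeSelector:
    kubernetes.io/os: linux
```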

Improvements

  • CloudZero Aggregator: The CloudZero Aggregator (affectionately known as "The Gator") is a new component that sits between the CloudZero Agent and the CloudZero Platform. The Gator aggregates metrics into a local cache before sending them in larger batches to the CloudZero Platform. This provides substantial improvements in reliability, performance, disaster recovery, user-friendliness, and more.
  • Reduce scrape interval: The scrape interval was previously set to every 2 minutes; it has been reduced to every 1 minute.

1.0.2

18 Mar 20:51

Release 1.0.2 (2025-03-18)

This release fixes an issue with Helm chart templating and improves the sampling rate of the Prometheus agent.

Upgrade Steps

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.2

Bug Fixes

  • Node Scheduling Settings Fixed: Fixes an issue in which the initCertJob did not have the option to set nodeSelector, affinity, or tolerations. Additionally, these settings can now be set for each initialization Job individually.
  • Values File Documentation Fixed: Fixes an issue in which the node scheduling settings for the insightsController were indented to the wrong level.

Improvements

  • Default Scrape Interval Set to 60s: The default scrape_interval setting used by the internal Prometheus agent is updated from 120s to 60s. This improvement makes it more likely that the agent captures usage information for short-lived pods.
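
For reference, this is how a 60-second interval appears in a standard Prometheus configuration file. How the chart templates this value is not shown in these notes, so treat the fragment as a generic Prometheus sketch rather than the chart's rendered output.

```yaml
# Generic Prometheus configuration fragment (not the chart's rendered file).
global:
  scrape_interval: 60s   # previously 120s
```

With a 120s interval, a pod that lives for roughly 90 seconds can start and finish between two scrapes; at 60s it should overlap at least one.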

1.0.1

11 Mar 17:18

Release 1.0.1 (2025-03-02)

This release fixes two issues relating to template rendering and TLS certificate generation, and adds documentation for Istio-enabled clusters. It also includes bug fixes for Prometheus metrics, logging, and SQLite.

Upgrade Steps

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.1

Bug Fixes

  • Webhook Resource Names Trimmed Appropriately: Fixes an issue in which the name used by webhook resources adds a suffix after trimming, which can potentially allow resource names that violate Kubernetes naming rules.
  • Certificate Generation Runs For All Webhook Configuration Changes: Fixes an issue in which the TLS certificate generation initialization Job does not run if a ValidatingWebhookConfiguration is created after initial installation.
  • Invalid Prometheus Metric Label Name: Fixes an issue where supplying an invalid label name to a Prometheus metric causes a panic.
  • Utilization of Default Kubernetes Logger: Removes the last use of the default Kubernetes logger, which caused logging levels defined in the configuration to be ignored.

Improvements

  • Shorter TTL for init-cert Job: The init-cert Job is now cleaned up after 5 seconds, so that repeated installations regenerate certificates as needed.
  • Improvements to SQLite Testing: The SQLite connection string was edited for improved clarity, and a concurrency test was added.
  • Various Logging Changes: Some logging messages were downgraded from info to debug.

1.0.0-rc4

17 Feb 13:53

Release 1.0.0-rc4 (2025-02-16)

This release makes improvements to the certificate initialization Job so that more invalid states can be rectified. Additionally, annotations can now be added to initialization Jobs. Expiration of both initialization Jobs is now configurable.

Upgrade Steps

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc4

See upgrades.md for full documentation of upgrade behavior as it relates to initialization Jobs.

Improvements

  • Certificate Initialization Job Checks For More Invalid Conditions: The certificate initialization job now checks for certificates with invalid SAN settings, mismatches between webhook configurations, and mismatches between the webhook caBundle value and the ca.crt value in the TLS secret.

  • Automatic Job Cleanup Configuration: TTL for both initialization Jobs is now configurable, and defaults to 180 seconds.

  • Initialization Job Annotation Support: Both initialization Jobs allow the user to set annotations. This was specifically added to make management via ArgoCD easier, as ArgoCD will consider expired Jobs to be OutOfSync with the release source. See upgrades.md for details on recommended annotations.
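
In plain Kubernetes terms, Job cleanup after a fixed period is normally expressed with the Job spec field ttlSecondsAfterFinished, and annotations sit under metadata. The manifest below is a hand-written sketch of what a rendered initialization Job could look like under that assumption. The Job name, annotation key, and image are hypothetical, and the annotations actually recommended for ArgoCD are documented in upgrades.md rather than reproduced here.

```yaml
# Sketch of a rendered Job; names, annotation, and image are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-init-cert
  annotations:
    example.com/managed-by: platform-team   # placeholder annotation
spec:
  ttlSecondsAfterFinished: 180   # matches the stated 180-second default
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: init-cert
          image: example.com/init-cert:latest
```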

1.0.0

17 Feb 15:50

Release 1.0.0 (2025-02-17)

This release introduces native Kubernetes Labels and Annotations support to the CloudZero platform. You can now identify Kubernetes dimensions based on the Labels and Annotations used in your Kubernetes deployments.

New Features

  • Kubernetes Labels and Annotations: Enhance your ability to categorize and manage resources by leveraging Labels and Annotations directly within the CloudZero platform.

Configuration Changes

To take advantage of these new features, update your Helm chart configuration as outlined below.

Example example-override-values.yaml File:

# -- UNCHANGED: Cloud Service Provider Account ID
#    This must be a string - even if it is a number in your system.
#    Adding a new line here is an easy workaround.
cloudAccountId: |-
  null

# -- UNCHANGED: The Cluster name
clusterName: null

# -- UNCHANGED: The Cloud Service Provider Region
region: null

# -- UNCHANGED: CloudZero API key. Required if existingSecretName is null.
apiKey: null

# -- UNCHANGED: If set, the agent will use the API key in this Secret to authenticate with CloudZero.
existingSecretName: null

# -- NEW: Flag to deploy the Jetstack.io "cert-manager". Most environments will already have this deployed,
#    so set this to "false" if applicable. Otherwise, setting this to "true" is a quick way to get started.
#    See the README for more information.
cert-manager:
  # -- DEFAULT: enabled.
  enabled: true  # true | false

# -- NEW: Service Account used for the Insights Controller
#    The account is required. If you already have an existing account, set the name in the field below.
serviceAccount:
  # -- DEFAULT: create the service account.
  create: true  # true | false
  name: ""
  annotations: {}

# -- NEW: Label and Annotation Configuration
insightsController:
  # -- By default, a ValidatingAdmissionWebhook will be deployed to record all created labels and annotations.
  enabled: true  # true | false
  labels:
    # -- DEFAULT: enabled.
    enabled: true  # true | false
    # -- This value MUST be set to a list of regular expressions used to gather labels from pods,
    #    deployments, statefulsets, daemonsets, cronjobs, jobs, nodes, and namespaces.
    patterns:
      # List of Go-style regular expressions used to filter desired labels.
      # Caution: The CloudZero system has a limit of 300 labels and annotations,
      # so it is advisable to provide a specific list of required labels.
      - '.*'
  annotations:
    # -- DEFAULT: disabled.
    enabled: false  # true | false
    patterns:
      # List of Go-style regular expressions used to filter desired annotations.
      # Caution: The CloudZero system has a limit of 300 labels and annotations,
      # so it is advisable to provide a specific list of required annotations.
      - '.*'
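
Because of the 300 label and annotation limit noted in the comments, a narrower pattern list is usually safer than '.*'. The fragment below is a sketch of that idea; the specific label keys are examples only and should be replaced with the keys your workloads actually use.

```yaml
insightsController:
  labels:
    enabled: true
    patterns:
      # Anchored Go-style regular expressions; each matches one label key.
      - '^app\.kubernetes\.io/name$'
      - '^app\.kubernetes\.io/component$'
      - '^team$'   # hypothetical custom label
```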

Upgrade Instructions

If you have an existing CloudZero Agent deployment, follow these steps to upgrade:

  1. Define the values.yaml Override Configuration:

    Ensure your values.yaml override configuration includes the new settings outlined above. Note that some existing values may no longer be necessary.

  2. Update the Helm Chart Repository:

    helm repo add cloudzero https://cloudzero.github.io/cloudzero-charts
    helm repo update

  3. Upgrade the Deployment:

    helm upgrade --install <YOUR_RELEASE_NAME> cloudzero/cloudzero-agent -n <YOUR_NAMESPACE> -f override-values.yaml

    Replace <YOUR_RELEASE_NAME> with the name you used to release the chart into your environment.

    Replace <YOUR_NAMESPACE> with the namespace you used for your deployment.

Deprecations and Breaking Changes

  1. node-exporter Deprecation:

    The node-exporter has been deprecated and is no longer used.

  2. External kube-state-metrics Deprecation:

    External kube-state-metrics has been deprecated. We now deploy an instance within the CloudZero Agent deployment named cloudzero-state-metrics, which is not discoverable by other monitoring platforms and ensures the necessary configuration is defined for telemetry collection requirements. If you host the images in a private image repository, you can override the following in the values.yaml file:

    kubeStateMetrics:
      image:
        registry: registry.k8s.io
        repository: kube-state-metrics/kube-state-metrics

  3. API Key Management Argument Relocation:

    • API key management arguments have moved to the global section.
    • Previously, you could pass an apiKey or existingSecretName argument directly to the chart.
    • These arguments should now be passed as global.apiKey and global.existingSecretName, respectively.
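
A minimal sketch of the relocation in a values override file follows. The Secret name is a hypothetical placeholder, and only one of the two keys would normally be set.

```yaml
# Before 1.0.0 (top-level arguments):
# apiKey: <YOUR_API_KEY>
# existingSecretName: <YOUR_SECRET_NAME>

# From 1.0.0 (under global):
global:
  apiKey: null
  existingSecretName: my-cloudzero-secret   # hypothetical Secret name
```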

Security Scan Results

Image | Scanner | Scan Date | Critical | High | Medium | Low | Negligible
ghcr.io/cloudzero/cloudzero-insights-controller/cloudzero-insights-controller:0.1.0 | Grype | 2024-12-23 | 0 | 0 | 0 | 0 | 0
ghcr.io/cloudzero/cloudzero-agent-validator/cloudzero-agent-validator:0.10.0 | Grype | 2024-12-23 | 0 | 0 | 0 | 0 | 0

1.0.0-rc3

14 Feb 17:26

Release 1.0.0-rc3 (2025-02-13)

This release makes improvements to the upgrade process as it relates to management of the initialization Jobs.

Upgrade Steps

This upgrade should be force installed: users managing the release with helm directly should include the --force flag when upgrading, or alternatively uninstall and reinstall the helm release. Users managing the release with tools such as ArgoCD should choose an upgrade strategy that does a full replacement.

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc3 --force

See upgrades.md for full documentation of upgrade behavior as it relates to initialization Jobs.

Improvements

  • Certificate Initialization Job Runs Every Upgrade: The certificate initialization job now runs on every upgrade and does a better job of ensuring that the certificate is generated correctly and is being used. This means that the --force flag used in the helm upgrade command will always create a new certificate. Running helm upgrade without --force will not regenerate the certificate.

  • Automatic Job Cleanup: Both initialization jobs are now automatically cleaned up after a period of time, which ensures that Jobs are rerun when appropriate.

  • Certificate Initialization Job ClusterRole: The certificate initialization job now has a dedicated ClusterRole, ClusterRoleBinding, and ServiceAccount. This is done to separate required permissions and only grant PATCH permission to a very narrow resource scope.
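
A narrowly scoped PATCH permission of this kind is typically written with resourceNames in a ClusterRole rule. The sketch below assumes the Job patches a single ValidatingWebhookConfiguration; the role name and webhook name are hypothetical, and the chart's actual rules may differ.

```yaml
# Sketch only: names are hypothetical, and the target resource is an assumption.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: example-init-cert
rules:
  - apiGroups: ["admissionregistration.k8s.io"]
    resources: ["validatingwebhookconfigurations"]
    resourceNames: ["example-webhook"]   # limits the rule to one named object
    verbs: ["get", "patch"]
```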

1.0.0-rc2

12 Feb 16:07

Release 1.0.0-rc2 (2025-02-12)

This release fixes an issue in which the internal TLS certificate could create a SAN field with an incorrect service address.

Upgrade Steps

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc2

Bug Fixes

  • SAN Field Properly Formatted: Previously, users installing the agent in a non-default namespace who were also using the internal TLS certificate generation may have run into an issue in which the certificate is improperly generated. The template now takes the release namespace into account.

1.0.0-rc1

29 Jan 20:55

Release 1.0.0-rc1 (2025-01-23)

This release contains several improvements from 1.0.0-beta-10:

  • The name of the initialization Job that gathers information about the existing state of a cluster now includes the chart version and the image tag used in the Pod.
  • The initScrapeJob field is deprecated in favor of initBackfillJob. However, this is not a breaking change; initScrapeJob can still be used without issue.
  • The server.agentMode boolean argument is now provided.
  • Improvements are made to the resource consumption of the agent-server pod.
  • Metrics from the agent-server pod are made available for monitoring.

Upgrade Steps

Optionally rename the initScrapeJob field in any override files to initBackfillJob. initBackfillJob is the preferred field, but configurations using initScrapeJob will still work.
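
The rename in an override file can be sketched as follows. The nested key someSetting is a hypothetical stand-in for whatever settings your file already carries under initScrapeJob.

```yaml
# Before (still accepted):
# initScrapeJob:
#   someSetting: someValue

# After (preferred):
initBackfillJob:
  someSetting: someValue   # hypothetical key, moved over unchanged
```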

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc1

Improvements

  • Initialization Job Name Changes With Releases: It was previously possible to have failures in release upgrades if the container image used in the Job changed. This is because the image field in a Job spec is immutable. To prevent this, a new Job is created every time the Helm chart version is changed and/or when the image used in the Job is changed. This also ensures that changes to the underlying insights-controller application will be used in the new backfill of existing cluster state data.

  • Clarified Field Names: The Job used for gathering existing cluster data was previously controlled via a field named initScrapeJob. This is an overloaded term given that this chart also uses the term "scrape job" in the context of Prometheus. This has caused some confusion, so the field is now renamed to initBackfillJob. initScrapeJob is still usable, and values from initScrapeJob are merged with initBackfillJob with the latter having precedence.

  • Easier Debugging: The server.agentMode field can be toggled to false; by default it is set to true so that the Prometheus server runs in agent mode to keep resource usage manageable. Setting the field to false takes the Prometheus server out of agent mode. This is helpful for debugging issues with the Prometheus agent-server.

  • Resource Consumption Reduction: The Prometheus scrape job used to gather metrics from the insights-controller pods now restricts the metrics scraped to ones explicitly set in the values.yaml. This means that the internal TSDB must hold less data.

  • Improved Observability: The agent-server now scrapes itself for metrics and exports them for monitoring by the CloudZero platform. This means that issues within a cluster can be detected much sooner and with greater visibility into the cause of the issue.
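
For the debugging case described under Easier Debugging, the override is a single boolean. This is a minimal sketch based on the field name given above.

```yaml
server:
  agentMode: false   # default is true; false takes Prometheus out of agent mode
```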

1.0.0-beta-10

17 Jan 17:36

Release 1.0.0-beta-10 (2025-01-17)

This release adds logic to ensure that the static target used in the env-validator and in the Prometheus configuration always matches the internal Service created by the kube-state-metrics subchart.

Upgrade Steps

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero-beta/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-beta-10

Improvements

  • Static Target and KSM Service Always Match: Both the env-validator and the Prometheus agent require an address for a kube-state-metrics Service. By default, the kube-state-metrics subchart generates a Service name that matches the target value generated by the chart.

    However, if the user overrides the name of the kube-state-metrics Service using kubeStateMetrics.fullnameOverride, the two names can diverge. This change attempts to mirror the logic used by the internal kube-state-metrics chart so that the target and Service names match regardless of user input.
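
The override that previously could cause a mismatch looks like the sketch below; the Service name is a hypothetical example.

```yaml
kubeStateMetrics:
  fullnameOverride: my-custom-ksm   # hypothetical Service name
```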

1.0.0-beta-9

15 Jan 22:32

Release 1.0.0-beta-9 (2025-01-15)

This release adds the ability to set the log level via the insightsController.server.logging.level field. Additionally, the interval at which data is written to the CloudZero platform and the timeout for writing data are configurable via insightsController.server.send_interval and insightsController.server.send_timeout, respectively. The default timeout is increased from 10s to 1m.

The kube-state-metrics subchart section now explicitly includes container image information. This introduces no functional changes; it is intended to make it clearer to the user which images will be used and from where they will be pulled.

Upgrade Steps

Upgrade using the following command:

helm upgrade --install <RELEASE_NAME> cloudzero-beta/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-beta-9

Bug Fixes

  • KSM Address: Fixes an issue in which the internal kube-state-metrics service address can be templated incorrectly.

Improvements

  • More Configurable Server Settings: The log level, remote write interval, and remote write timeout are now configurable in the chart values. See the insightsController.server section in the values.yaml for more details.
  • Default Setting for Send Timeout: The default remote write timeout is increased to 1m, which allows for backfilling data from larger clusters.
  • Container Image Information Added: The values passed to the internal kube-state-metrics subchart now explicitly set the container image registry, repository, and tag information for the purposes of documentation.
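
A values override using the new server settings might look like the sketch below. The field paths come from the release summary; the log level and send interval values are assumptions for illustration, since only the 1m timeout default is stated in these notes.

```yaml
insightsController:
  server:
    logging:
      level: debug      # assumed value; check values.yaml for accepted levels
    send_interval: 1m   # hypothetical value; the default is not stated here
    send_timeout: 1m    # the stated default
```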