Implement Kubernetes Metadata Extension #1583

Open
wants to merge 8 commits into
base: feature-custom-metrics-entity
from

Conversation

musa-asad
Contributor

@musa-asad musa-asad commented Mar 6, 2025

Description of the issue

To support the Explore related feature in CloudWatch, the CloudWatch Agent sends an "Entity", which includes relevant metadata to correlate metrics or logs between resources (e.g., an EKS cluster) and services (e.g., a Java application). When the CloudWatch Agent runs in a Kubernetes cluster, we need to collect the namespace, workload name, and node name to populate the "Entity".

However, we currently get Kubernetes metadata only when Application Signals is enabled. For OTLP custom metrics, if Application Signals isn't configured, we have no way to fetch Kubernetes metadata. To close this gap, we must migrate the logic the Application Signals Processor uses to get Kubernetes metadata into a global extension that other pipelines can use.

Description of changes

  • Implement Kubernetes Metadata Extension (extension/k8smetadata)
    • Add README.md to document how the extension works.
    • Add bare-bones config.go and factory.go files so the extension can be used in translation for the OTEL configuration.
    • Add extension.go, which sets up an EndpointSlice watcher and Pod IP → {Workload, Namespace, Node} mappings. The GetPodMetadata() method returns the respective mapping for a given Pod IP.
  • Replicate endpointslicewatcher.go and kubernetes_utils.go to the internal/k8sCommon/k8sclient directory and adjust Pod IP → Workload@Namespace logic to Pod IP → {Workload, Namespace, Node}. Unused functionality was removed.
  • Update Entity Processor to Use Extension (awsentity)
    • Implement getPodMeta() function in processor.go, which invokes the Kubernetes Metadata Extension to receive {Workload, Namespace, Node} from a given Pod IP.
    • Update internal/k8sattributescraper/k8sattributescraper.go to use the Kubernetes Metadata Extension to populate workload, namespace, and node instead of resource attributes if those values are present.
  • Add Kubernetes Metadata Extension to Translation for the OTLP Pipeline (translator/translate/otel)
    • Update sample yaml files to include new extension.
    • Create extension/k8smetadata/translator.go to reference the extension in translation.
    • Add translators.Extensions.Set(k8smetadata.NewTranslator()) to pipeline/host/translator.go for OTLP pipeline in Kubernetes contexts.
  • Update service/defaultcomponents/components.go to include "k8smetadata".
  • Add unit tests for new functionality.
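
The Pod IP lookup described in the list above can be sketched roughly as follows. This is a minimal illustration, assuming the watcher keeps its mappings in a sync.Map keyed by Pod IP; the names PodMetadata, podStore, and ipToPodMetadata are hypothetical stand-ins and are not taken from the extension's code.

```go
package main

import (
	"fmt"
	"sync"
)

// PodMetadata is a hypothetical stand-in for the {Workload, Namespace, Node} mapping.
type PodMetadata struct {
	Workload  string
	Namespace string
	Node      string
}

// podStore is a hypothetical holder for the Pod IP -> PodMetadata mappings
// that an EndpointSlice watcher would keep up to date.
type podStore struct {
	ipToPodMetadata sync.Map // key: Pod IP (string), value: PodMetadata
}

// GetPodMetadata returns the metadata recorded for ip, or a zero value
// when the IP is empty, unknown, or stored with an unexpected type.
func (s *podStore) GetPodMetadata(ip string) PodMetadata {
	if ip == "" {
		return PodMetadata{}
	}
	v, ok := s.ipToPodMetadata.Load(ip)
	if !ok {
		return PodMetadata{}
	}
	pm, ok := v.(PodMetadata)
	if !ok {
		return PodMetadata{}
	}
	return pm
}

func main() {
	s := &podStore{}
	// In the real extension the watcher would populate this; here it is done by hand.
	s.ipToPodMetadata.Store("10.0.0.5", PodMetadata{Workload: "sample-app", Namespace: "default", Node: "node-a"})

	fmt.Printf("%+v\n", s.GetPodMetadata("10.0.0.5")) // known IP
	fmt.Printf("%+v\n", s.GetPodMetadata("10.0.0.9")) // unknown IP yields the zero value
}
```

Returning a zero value rather than an error keeps the caller simple: the entity processor can treat empty fields as "not available" and fall back to other sources.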

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  1. Created an EKS cluster and deployed the Amazon CloudWatch Observability EKS add-on.
  2. Set up the sample application by following https://aws-otel.github.io/docs/getting-started/adot-eks-add-on/sample-app.
    • Removed resource attributes.
    • Changed OTEL_EXPORTER_OTLP_ENDPOINT to http://cloudwatch-agent.amazon-cloudwatch:4317.
  3. Built the agent image from the k8s-metadata-extension branch by running make docker-build-amd64 and changed the image in the AmazonCloudWatchAgent custom resource.
    • Added the debug exporter to the OTLP pipeline for testing.

Agent Config:

    {
      "agent": {
        "debug": true
      },
      "logs": {
        "metrics_collected": {
          "otlp": {
            "grpc_endpoint": "0.0.0.0:4317",
            "http_endpoint": "0.0.0.0:4318"
          }
        }
      }
    }

Kubernetes Metadata Extension

Logs:

2025-03-10T06:26:12Z I! {"caller":"k8smetadata/extension.go:70","msg":"Starting EndpointSliceWatcher Run()","kind":"extension","name":"k8smetadata"}                                                                                                                               
2025-03-10T06:26:12Z I! {"caller":"k8smetadata/extension.go:72","msg":"Waiting for EndpointSlice cache to sync...","kind":"extension","name":"k8smetadata"}                                                                                                                        
I0310 06:26:12.913113       1 shared_informer.go:313] Waiting for caches to sync for endpointSliceWatcher
I0310 06:26:13.013709       1 shared_informer.go:320] Caches are synced for endpointSliceWatcher
2025-03-10T06:26:13Z I! {"caller":"k8sclient/endpointslicewatcher.go:83","msg":"endpointSliceWatcher: Cache synced","kind":"extension","name":"k8smetadata"}
2025-03-10T06:26:13Z I! {"caller":"k8smetadata/extension.go:74","msg":"EndpointSlice cache synced, extension fully started","kind":"extension","name":"k8smetadata"}
2025-03-10T04:57:35Z D! {"caller":"k8sclient/endpointslicewatcher.go:111","msg":"Processing endpoint","kind":"extension","name":"k8smetadata","podName":"sample-app-786d6c49b4-j4vvx","namespace":"default","nodeName":"ip-XXX-XX-XX-XX.us-west-2.compute.internal"}
2025-03-10T04:57:35Z D! {"caller":"k8smetadata/extension.go:93","msg":"GetPodMetadata: found metadata","kind":"extension","name":"k8smetadata","ip":"XXX.XX.XX.XXX","workload":"sample-app","namespace":"default","node":"ip-XXX-XX-XX-XX.us-west-2.compute.internal"}

Debug Exporter Output:
Entity Fields:

com.amazonaws.cloudwatch.entity.internal.type: Str(Service)
com.amazonaws.cloudwatch.entity.internal.service.name: Str(sample-app)
com.amazonaws.cloudwatch.entity.internal.deployment.environment: Str(eks:entity-cluster-2/default)
com.amazonaws.cloudwatch.entity.internal.platform.type: Str(AWS::EKS)
com.amazonaws.cloudwatch.entity.internal.k8s.cluster.name: Str(entity-cluster-2)
com.amazonaws.cloudwatch.entity.internal.k8s.namespace.name: Str(default)
com.amazonaws.cloudwatch.entity.internal.k8s.workload.name: Str(sample-app)
com.amazonaws.cloudwatch.entity.internal.k8s.node.name: Str(ip-XXX-XX-XX-XX.us-west-2.compute.internal)
com.amazonaws.cloudwatch.entity.internal.instance.id: Str(i-006ac48ffed131779)

Requirements

Before committing the code, please complete the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@musa-asad musa-asad changed the base branch from main to feature-custom-metrics-entity March 6, 2025 05:16
@musa-asad musa-asad force-pushed the k8s-metadata-extension branch from a5ea567 to 512f3f3 Compare March 6, 2025 05:17
@musa-asad musa-asad closed this Mar 6, 2025
@musa-asad musa-asad force-pushed the k8s-metadata-extension branch from 512f3f3 to 20824f2 Compare March 6, 2025 05:20
@musa-asad musa-asad reopened this Mar 6, 2025
@musa-asad musa-asad self-assigned this Mar 6, 2025
Co-authored-by: Ping Xiang <>
@musa-asad musa-asad force-pushed the k8s-metadata-extension branch 3 times, most recently from e9a7edf to 801cd88 Compare March 10, 2025 06:02
@musa-asad musa-asad marked this pull request as ready for review March 10, 2025 06:30
@musa-asad musa-asad requested a review from a team as a code owner March 10, 2025 06:30
musa-asad and others added 2 commits March 11, 2025 20:49
Co-authored-by: Lisa Guo <lguo25@gmail.com>
Contributor

@nathalapooja nathalapooja left a comment

Overall looks good. I added a few comments; please also check the following:

  1. The overview says the 'k8s-attr' branch was used instead of the PR branch.
  2. Let's do a stress test to make sure we are not throttling the k8s API.

return k8sclient.PodMetadata{}
}
metadata := pm.(k8sclient.PodMetadata)
e.logger.Debug("GetPodMetadata: found metadata",
Contributor

The agent should not log entity key attributes or attributes.

Contributor Author

We already log them as part of the EMF exporter, so I assumed that was okay, but that's a good point. We should not log them in either case.

}

const testIP = "1.2.3.4"
expected := k8sclient.PodMetadata{
Contributor

Is it possible for podMetadata to have only the workload (or any single field) populated while the remaining fields are empty?

Contributor Author

As in, the namespace and node name are empty, but the workload is present? It's possible, but I don't believe it would happen. I can add test coverage for it, though.

service: "nginx-service",
expected: "nginx-123456",
},
{
Contributor

Write a test for the fallback to the full pod name.
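
One way such a fallback could be exercised is sketched below. It assumes a hypothetical inferWorkloadName helper that strips generated suffixes from a pod name and returns the full pod name when no pattern matches. The regular expressions are illustrative guesses at Deployment, StatefulSet, and DaemonSet pod naming; they are not the patterns used in this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical naming heuristics, tried in order from most to least specific.
var (
	// <workload>-<replicaset hash>-<5 character pod suffix>
	deploymentPodRe = regexp.MustCompile(`^(.+)-[a-z0-9]{6,10}-[a-z0-9]{5}$`)
	// <workload>-<ordinal>
	statefulSetPodRe = regexp.MustCompile(`^(.+)-\d+$`)
	// <workload>-<5 character suffix>
	daemonSetPodRe = regexp.MustCompile(`^(.+)-[a-z0-9]{5}$`)
)

// inferWorkloadName returns the captured workload prefix of the first
// matching pattern, or the full pod name when nothing matches.
func inferWorkloadName(podName string) string {
	for _, re := range []*regexp.Regexp{deploymentPodRe, statefulSetPodRe, daemonSetPodRe} {
		if m := re.FindStringSubmatch(podName); m != nil {
			return m[1]
		}
	}
	return podName // fallback: full pod name
}

func main() {
	// Table-driven check, including the fallback case the review asks for.
	cases := []struct{ pod, want string }{
		{"sample-app-786d6c49b4-j4vvx", "sample-app"}, // Deployment-style name
		{"web-0", "web"},                      // StatefulSet-style name
		{"fluent-bit-x7k2p", "fluent-bit"},    // DaemonSet-style name
		{"standalone-pod", "standalone-pod"},  // no pattern matches: full pod name
	}
	for _, c := range cases {
		got := inferWorkloadName(c.pod)
		fmt.Printf("%s -> %s (match: %t)\n", c.pod, got, got == c.want)
	}
}
```

Heuristics like these can misfire (a bare pod named my-nginx would lose its last segment under the five-character rule), which is why an explicit fallback test is worth having.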

@@ -176,7 +193,7 @@ func (p *awsEntityProcessor) processMetrics(_ context.Context, md pmetric.Metric
 			}
 		}
 		if p.config.KubernetesMode != "" {
-			p.k8sscraper.Scrape(rm.At(i).Resource())
+			p.k8sscraper.Scrape(rm.At(i).Resource(), getPodMeta())
Contributor

Add additional unit tests that validate the entity fields populated from podMeta.
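
A rough sketch of what such a test might cover, assuming the scraper prefers values from the extension and falls back to resource attributes field by field. Plain maps stand in for the collector's resource type to keep the example self-contained. The names PodMeta and resolvePodMeta are hypothetical, and using k8s.deployment.name as the workload attribute is a simplifying assumption.

```go
package main

import "fmt"

// PodMeta is a hypothetical stand-in for the metadata returned by the extension.
type PodMeta struct {
	Workload  string
	Namespace string
	Node      string
}

// firstNonEmpty returns the first non-empty value, or "" if all are empty.
func firstNonEmpty(vals ...string) string {
	for _, v := range vals {
		if v != "" {
			return v
		}
	}
	return ""
}

// resolvePodMeta prefers each field from the extension metadata and falls
// back to the corresponding resource attribute when that field is empty.
func resolvePodMeta(attrs map[string]string, pm PodMeta) PodMeta {
	return PodMeta{
		Workload:  firstNonEmpty(pm.Workload, attrs["k8s.deployment.name"]),
		Namespace: firstNonEmpty(pm.Namespace, attrs["k8s.namespace.name"]),
		Node:      firstNonEmpty(pm.Node, attrs["k8s.node.name"]),
	}
}

func main() {
	attrs := map[string]string{
		"k8s.deployment.name": "attr-app",
		"k8s.namespace.name":  "attr-ns",
		"k8s.node.name":       "attr-node",
	}

	// Full metadata from the extension wins over every resource attribute.
	fmt.Printf("%+v\n", resolvePodMeta(attrs, PodMeta{Workload: "sample-app", Namespace: "default", Node: "node-a"}))

	// Partial metadata: only the workload is set, so the rest comes from attributes.
	fmt.Printf("%+v\n", resolvePodMeta(attrs, PodMeta{Workload: "sample-app"}))

	// No metadata from the extension: resource attributes are used as-is.
	fmt.Printf("%+v\n", resolvePodMeta(attrs, PodMeta{}))
}
```

The partial case also covers the earlier question about podMetadata carrying only one populated field.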

3 participants