new: troubleshooting module - All scenarios #1182

Open
wants to merge 137 commits into
base: main

arcegacardenas
Contributor

What this PR does / why we need it:

This is a new module that helps troubleshoot common EKS scenarios.

Which issue(s) this PR fixes:

Fixes #906

Quality checks

  • My content adheres to the style guidelines
  • I ran make test module="<module>" and it was successful (see https://github.com/aws-samples/eks-workshop-v2/blob/main/docs/automated_tests.md). Because of an STS token expiration issue, I ran each lab within the module individually in order to simulate the scenarios.
  • The PR has a meaningful title and description of the changes that will be included in the workshop release notes

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

arcegacardenas and others added 30 commits July 19, 2024 16:03
…es#1023)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…ples#1024)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…les#1041)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
…samples#1026)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
robisoh88 and others added 29 commits February 7, 2025 12:12
…bsite guide for similar structure and language with eksworkshopv2.
… troubleshooting-module-all-scenarios-robisoh88
…egacardenas/eks-workshop-v2 into troubleshooting-module-all-scenarios
…aml port section for troubleshooting workernode scenario three
- Modified the cluster name reference from var.addon_context.eks_cluster_id to var.eks_cluster_id
- Modified the entire three module guide and its written format. Made minor changes to commands so the output is more efficient and less verbose.
- Updated the ssm.sh script path in the guide to an absolute path
- Fixed the cleanup script so that it also deletes new-cni-nodegroup
- Identified an issue where the aws-node pod was not in the Pending state. The cause was the node type, so the instance type for the new managed node group was changed from m5.large to c5.large.
- Tested and confirmed that all pass 'make test module'

- Reviewed and modified DNS module guides to fit format better
- Reviewed and modified Pod Module guides to fit format better
- Tested against 'pre-commit run --show-diff-on-failure --color=always --all-files'
Testing troubleshooting worker node module

- Updated metrics-server.yaml to change the deployment port from 4443 to 10250. The old port was preventing the metrics server pods from running properly.

Testing troubleshooting Pod modules

- Updated Step 1 of the guide to include the pod label, so it only outputs the efs-app pod.
- Updated the Step 2 describe pod command to grep 'efs'. This step exports the pod name, and previously it was exporting multiple pods from the default namespace, which caused the scripts in the other steps to fail.
- Modified the Step 5 get output, since it was still pointing to a variable that had been reset by this time. Added the pod label to select the correct pod.
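
The pod selection problem described in the steps above can be illustrated with a small sketch. This is not the workshop's script: the pod names and the label key and value (app=efs-app) are hypothetical, and the pod list is plain data standing in for the default namespace rather than real kubectl output. It only shows why an unfiltered listing yields several names while a label match yields the single efs-app pod.

```python
# Hypothetical pod records standing in for the default namespace.
# Names and labels are illustrative assumptions, not taken from the workshop.
PODS = [
    {"name": "efs-app-7c9d8f-abcde", "labels": {"app": "efs-app"}},
    {"name": "ui-5b6c7d-fghij", "labels": {"app": "ui"}},
    {"name": "catalog-8e9f0a-klmno", "labels": {"app": "catalog"}},
]


def all_names(pods):
    """Unfiltered listing: returns every pod name in the namespace."""
    return [pod["name"] for pod in pods]


def select_by_label(pods, key, value):
    """Label-based listing: returns only pods whose label `key` equals `value`."""
    return [pod["name"] for pod in pods if pod["labels"].get(key) == value]


if __name__ == "__main__":
    # Exporting the unfiltered result would put several names in one variable.
    print(all_names(PODS))
    # Filtering by label leaves exactly one name to export.
    print(select_by_label(PODS, "app", "efs-app"))
```

A later step that expects exactly one pod name works with the filtered result and breaks with the unfiltered one, which matches the failure described above.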

Testing troubleshooting scenario for DNS

- Updated the prepare environment script on the main page from 'prepare-environment troubleshooting/dns' to 'prepare-environment'. This scenario does not use Terraform scripts and needs the environment to be in the original prepare-environment state.
- Updated the kubectl get command in Step 3 of the first scenario to use the deployment name coredns instead of CoreDNS. The old name was causing a deployment not found message.
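
As background for the coredns fix above, Kubernetes resource names of this kind follow DNS subdomain rules: lowercase alphanumerics, '-' and '.', starting and ending with an alphanumeric, and at most 253 characters. The sketch below is a minimal check under that rule. The function name is hypothetical and this is not part of the workshop. It shows that a mixed-case name such as CoreDNS can never match an existing deployment.

```python
import re

# DNS subdomain pattern used for most Kubernetes object names:
# dot-separated labels of lowercase alphanumerics and '-', each label
# starting and ending with an alphanumeric character.
DNS_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_valid_resource_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is a valid DNS subdomain name."""
    return len(name) <= 253 and DNS_SUBDOMAIN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("coredns", "CoreDNS"):
        print(candidate, is_valid_resource_name(candidate))
```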
Development

Successfully merging this pull request may close these issues.

new: Troubleshooting Lab
9 participants