Cloudflare - DNS Challenge Broken #7540

Closed
jkossis opened this issue Jan 31, 2025 · 23 comments · Fixed by #7549
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jkossis

jkossis commented Jan 31, 2025

Describe the bug:
Cloudflare no longer returns zone information in individual DNS records. This breaks cert-manager's cleanup step when it goes to delete the TXT record here.

Of note, while the deprecation shows last November, I just noticed this breaking yesterday. So I imagine they just recently went through with the deprecation on their end.

Expected behaviour:
Deletion of the txt record should be successful, leading to a successful certificate generation.

Steps to reproduce the bug:
Attempt to generate a certificate using cloudflare as the dns challenge provider.

Anything else we need to know?:
As is, generating certificates using cloudflare as the dns challenge provider is broken.

Environment details:

  • Kubernetes version: v1.31.1
  • Cloud-provider/provisioner: N/A
  • cert-manager version: v1.16.3
  • Install method: helm

/kind bug

@cert-manager-prow cert-manager-prow bot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 31, 2025
@jkossis
Author

jkossis commented Jan 31, 2025

Of note, we could keep cloudFlareRecord struct as is, and just patch the returned record with the zoneID from here.
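
For readers following along, here is a minimal Go sketch of that idea. It is not the cert-manager source or the code from any PR: only the struct name cloudFlareRecord comes from this thread, while the field set and the helpers patchZoneID and deletePath are hypothetical names chosen for illustration. It shows how a record decoded without zone_id yields the malformed path, and how filling in the zone ID that was used for the lookup repairs it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudFlareRecord is named after the struct mentioned in the issue.
// The fields listed here are an assumption made for this sketch.
type cloudFlareRecord struct {
	ID      string `json:"id,omitempty"`
	Name    string `json:"name"`
	Type    string `json:"type"`
	Content string `json:"content"`
	ZoneID  string `json:"zone_id,omitempty"`
}

// patchZoneID (hypothetical helper) fills in the zone ID that was used to
// look the record up whenever the API response did not include one.
func patchZoneID(rec cloudFlareRecord, zoneID string) cloudFlareRecord {
	if rec.ZoneID == "" {
		rec.ZoneID = zoneID
	}
	return rec
}

// deletePath (hypothetical helper) builds the path used for the DELETE call.
func deletePath(rec cloudFlareRecord) string {
	return fmt.Sprintf("/zones/%s/dns_records/%s", rec.ZoneID, rec.ID)
}

func main() {
	// A record as returned without the zone_id field.
	body := []byte(`{"id":"abc123","name":"_acme-challenge.example.com","type":"TXT","content":"token"}`)

	var rec cloudFlareRecord
	if err := json.Unmarshal(body, &rec); err != nil {
		panic(err)
	}
	fmt.Println(deletePath(rec)) // /zones//dns_records/abc123

	rec = patchZoneID(rec, "zone789")
	fmt.Println(deletePath(rec)) // /zones/zone789/dns_records/abc123
}
```

The struct stays unchanged, so callers that read ZoneID keep working whether or not the API sends the field.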

@onedr0p

onedr0p commented Feb 1, 2025

I have noticed this on my end as well:

logger="cert-manager.controller" E0201 09:57:32.685127       1 sync.go:132] "error cleaning up challenge" err=<while querying the Cloudflare API for DELETE "/zones//dns_records/xxx"
          Error: 7003: Could not route to /client/v4/zones/dns_records/xxx, perhaps your object identifier is invalid?> logger="cert-manager.controller" resource_name="my-domain-tld-production-1-2685090930-326339775" resource_namespace="cert-manager" resource_kind="Challenge" resource_version="v1" dnsName="my-domain.tld" type="DNS-01"
E0201 09:57:32.685461       1 controller.go:157] "re-queuing item due to error processing" err=<while querying the Cloudflare API for DELETE "/zones//dns_records/xxx"
          Error: 7003: Could not route to /client/v4/zones/dns_records/xxx, perhaps your object identifier is invalid?

The certificate status is "Issuing certificate as Secret does not exist".
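
The empty segment in "/zones//dns_records/xxx" suggests the request path was built with an empty zone ID. As a rough illustration only (this is not how cert-manager is written, and recordPath and errMissingZoneID are hypothetical names), a guard like the following Go sketch would surface the problem as a descriptive error instead of a 7003 routing failure from the API:

```go
package main

import (
	"errors"
	"fmt"
)

// errMissingZoneID is a hypothetical sentinel error for this sketch.
var errMissingZoneID = errors.New("zone ID is empty; refusing to build a DNS record path")

// recordPath builds the API path for a single DNS record and rejects
// empty identifiers before any request is sent.
func recordPath(zoneID, recordID string) (string, error) {
	if zoneID == "" {
		return "", errMissingZoneID
	}
	if recordID == "" {
		return "", errors.New("record ID is empty")
	}
	return fmt.Sprintf("/zones/%s/dns_records/%s", zoneID, recordID), nil
}

func main() {
	// With an empty zone ID the caller gets a clear error up front.
	if _, err := recordPath("", "xxx"); err != nil {
		fmt.Println("error:", err)
	}

	// With both identifiers present the path is well formed.
	path, err := recordPath("zone789", "xxx")
	if err != nil {
		panic(err)
	}
	fmt.Println(path) // /zones/zone789/dns_records/xxx
}
```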

@onedr0p

onedr0p commented Feb 1, 2025

After November 30th, 2024, Cloudflare will stop including the zone_id and zone_name fields on individual DNS records in API responses. These fields are currently ignored when sent to the API as part of a request body, so no changes to request bodies are required.

It only took them 2 months after the date they gave to actually update their API. 😢

@Gu35t09

Gu35t09 commented Feb 2, 2025

I've had the same issue in my testing environment.

I was able to at least get the certificate request working by logging in to Cloudflare and manually modifying the new TXT record, adding "" (so for example "recordadccd...").

It still doesn't delete the record, but it's still possible to generate a certificate if needed.

@sandweel

sandweel commented Feb 3, 2025

The DNS record is successfully created and verified using the Token or Global API key, but it cannot be deleted. Since the cleanup process cannot be completed, the certificate issuance has failed.

Environment details:

Kubernetes version: v1.24.3
cert-manager version: v1.6.1

Minikube version: v1.35.0
cert-manager version: v1.16.3

Both have the same Cleanup error:

Status:
  Presented:   true
  Processing:  true
  Reason:      while querying the Cloudflare API for DELETE "/zones//dns_records/3acebb4c72640773f23144cdaa91842c" 
                Error: 7003: Could not route to /client/v4/zones/dns_records/3acebb4c72640773f23144cdaa91842c, perhaps your object identifier is invalid?
  State:       valid
Events:
  Type     Reason          Age                 From                     Message
  ----     ------          ----                ----                     -------
  Normal   Started         3m6s                cert-manager-challenges  Challenge scheduled for processing
  Normal   Presented       2m57s               cert-manager-challenges  Presented challenge using DNS-01 challenge mechanism
  Normal   DomainVerified  106s                cert-manager-challenges  Domain "testsslmanual2.cloud.aw3.dev" verified with "DNS-01" validation
  Warning  CleanUpError    25s (x5 over 102s)  cert-manager-challenges  Error cleaning up challenge: while querying the Cloudflare API for DELETE "/zones//dns_records/3acebb4c72640773f23144cdaa91842c

LukeCarrier added a commit to LukeCarrier/cert-manager that referenced this issue Feb 3, 2025
Cloudflare have stopped including zone IDs in their record responses
now, 2 months after they said they did and with their trademark zero
effort in outreach to consumers of their API. Ensure that findTxtRecord
returns a record struct with the zone ID set regardless.

Fixes cert-manager#7540

Signed-off-by: Luke Carrier <luke@carrier.family>
@dev-ago

dev-ago commented Feb 4, 2025

A small workaround that worked for us yesterday was to manually delete the _acme-challenge TXT record that cert-manager created in Cloudflare.

@nielsNocore

nielsNocore commented Feb 4, 2025

@dev-ago

A small workaround that worked for us yesterday was to manually delete the _acme-challenge TXT record that cert-manager created in Cloudflare.

Also worked for us, good workaround for now

@uofirob

uofirob commented Feb 4, 2025

When you delete that txt record, how long should it take for the Kubernetes cluster to stop throwing the error? Do I need to force a refresh with a specific command? I tried resetting my cluster, but it just re-created the TXT entry in cloudflare.

Update: it seems to have finally gone through. Thanks for the workaround!

@LukeCarrier
Contributor

Of note, we could keep cloudFlareRecord struct as is, and just patch the returned record with the zoneID from here.

This is the approach I took in #7549 👍

@0x2b3bfa0

0x2b3bfa0 commented Feb 7, 2025

I was in a rush and ended up writing a hacky script to delete the records:

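# Send an authenticated request to the Cloudflare v4 API.
# Usage: request METHOD PATH
function request() {
  curl --silent "https://api.cloudflare.com/client/v4$2" \
    --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
    --header "Content-Type: application/json" \
    --request "$1"
}

# Collect the IDs of the records that cert-manager failed to delete, taken from its logs.
records=$(
  kubectl --namespace cert-manager logs deployment/cert-manager |
  perl -ne '/DELETE "\/zones\/\/dns_records\/([[:xdigit:]]+)"/ and print "$1\n"' |
  sort --unique
)

# The logs do not say which zone a record belongs to, so try every record in every zone.
request GET /zones |
jq --raw-output '.result[].id' |
while read -r zone; do
  for record in $records; do
    request DELETE "/zones/$zone/dns_records/$record"
  done
done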
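
The log-scraping step of the script above can also be sketched in Go, for anyone who would rather not depend on perl. This is an illustrative equivalent under the same assumption the script makes (that failed deletions appear in the logs as DELETE "/zones//dns_records/<hex id>"); the function name recordIDs is hypothetical, and the sample log text in main is abbreviated from the output posted earlier in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// deleteRe matches the malformed DELETE path seen in the cert-manager logs
// and captures the hexadecimal record ID, like the perl one-liner does.
var deleteRe = regexp.MustCompile(`DELETE "/zones//dns_records/([[:xdigit:]]+)"`)

// recordIDs returns the unique record IDs found in the given log text, sorted.
func recordIDs(logs string) []string {
	seen := map[string]bool{}
	var ids []string
	for _, m := range deleteRe.FindAllStringSubmatch(logs, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			ids = append(ids, m[1])
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Two log entries that refer to the same record.
	logs := `"error cleaning up challenge" err=<while querying the Cloudflare API for DELETE "/zones//dns_records/3acebb4c72640773f23144cdaa91842c"
"re-queuing item due to error processing" err=<while querying the Cloudflare API for DELETE "/zones//dns_records/3acebb4c72640773f23144cdaa91842c"`

	fmt.Println(recordIDs(logs)) // [3acebb4c72640773f23144cdaa91842c]
}
```

Redacted IDs such as "xxx" are not hexadecimal and are therefore skipped, which matches the behaviour of the original pattern.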

@alextricity25

alextricity25 commented Feb 11, 2025

This is affecting all versions of cert-manager, correct? We are running 1.15.3 and are starting to notice that our Challenges are never successfully going through. I don't get a specific error as others are reporting (1 sync.go:132] "error cleaning up challenge" err=<while querying the Cloudflare API for DELETE "/zones//dns_records/xxx")

I tried upgrading to 1.17 on a different environment, and I am getting the same error as others are seeing where the record isn't being cleaned up and the certificate is not being created.

In my case, on my environments which still have 1.15.3, the Challenge is stuck with the message "Presented challenge using DNS-01 challenge mechanism". It sounds like there must be a difference between 1.15.3 and 1.16.3 that is causing this issue to surface differently.

On that note, would the fix for this be backported to 1.15, and would it require all users of the Cloudflare DNS challenge mechanism to upgrade?

@cjdaniel

This is affecting all versions of cert-manager, correct? We are running 1.15.3

Cannot speak to all versions, but we are running 1.15.1 and are affected.

@onedr0p

onedr0p commented Feb 11, 2025

Does anyone here, or the company they work for, pay for cert-manager support, or use Venafi / CyberArk and have support through them, and could get this issue prioritized? 😄

Pretty crazy this has been broken for two weeks without a word from the maintainers.

@alextricity25

When you delete that txt record, how long should it take for the Kubernetes cluster to stop throwing the error? Do I need to force a refresh with a specific command? I tried resetting my cluster, but it just re-created the TXT entry in cloudflare.
Update: it seems to have finally gone through. Thanks for the workaround!

@uofirob How long did it take? Deleting the TXT record is doing nothing for me 😐

@uofirob

uofirob commented Feb 12, 2025 via email

@whoiscnu

Hello team,

We are facing this issue with all cert-manager versions. We first observed it on Feb 3rd, when a certificate renewal did not complete and remained in renewing status.

Error cleaning up challenge: while querying the Cloudflare API for DELETE "/zones//dns_records/XXXXXXX" Error: 7003: Could not route to /client/v4/zones/dns_records/XXXXXXX, perhaps your object identifier is invalid?

@SgtCoDFish
Member

Thanks all for raising this, I'll take a look and try to get a fix deployed.

Quick note: the only versions where this would be patched would be the currently supported releases: 1.17, 1.16 and 1.12 LTS - I mention that because I've seen a few mentions of 1.15.x in this issue and we won't do a patch release for 1.15 since it's now EOL!

cert-manager-bot pushed a commit to cert-manager-bot/cert-manager that referenced this issue Feb 12, 2025
Cloudflare have stopped including zone IDs in their record responses
now, 2 months after they said they did and with their trademark zero
effort in outreach to consumers of their API. Ensure that findTxtRecord
returns a record struct with the zone ID set regardless.

Fixes cert-manager#7540

Signed-off-by: Luke Carrier <luke@carrier.family>
SgtCoDFish pushed a commit to SgtCoDFish/cert-manager that referenced this issue Feb 12, 2025
Cloudflare have stopped including zone IDs in their record responses
now, 2 months after they said they did and with their trademark zero
effort in outreach to consumers of their API. Ensure that findTxtRecord
returns a record struct with the zone ID set regardless.

Fixes cert-manager#7540

Manually fixed up to apply cleanly, also includes
dfba339 cherry picked from master

Signed-off-by: Luke Carrier <luke@carrier.family>
Signed-off-by: Ashley Davis <SgtCoDFish@users.noreply.github.com>
@SgtCoDFish
Member

Reopening as this isn't fixed until releases are published!

@SgtCoDFish SgtCoDFish reopened this Feb 12, 2025
@SgtCoDFish
Member

SgtCoDFish commented Feb 13, 2025

First release is published, notifying here for anyone that wants to fix ASAP: https://github.com/cert-manager/cert-manager/releases/tag/v1.17.1

I tested this on my own site (which conveniently happens to use cert-manager + Cloudflare) and it worked as expected.

I'll edit this message when I've done 1.16 (EDIT: v1.16.4 is done!) and 1.12, although 1.12 will be slower. Once they're done, I'll close this issue.

That said: I'd recommend anyone on 1.12 using Cloudflare DNS to update to a newer version since newer versions contain other improvements to the Cloudflare DNS solver which you almost certainly want!

@epollia

epollia commented Feb 13, 2025

Can you tell me when we can expect an update for version 1.12 on quay.io/repository/jetstack/cert-manager-controller?

@SgtCoDFish
Member

We don't give definitive dates for any releases, but I'm hoping to do the release either tomorrow or Monday. It takes a little more time to release v1.12 because of how it's structured, and there's another PR (#7570) I want to land before I start the release.

As in my previous message though: I'd strongly recommend updating to v1.16.4 (which is now released) or v1.17.1 if using the Cloudflare DNS-01 solver - obviously v1.12 is still supported for now, but we don't backport everything and there are other improvements in the newer versions that you'd probably want. We have a full guide on upgrading from 1.12 -> 1.16 on the website.

@SgtCoDFish
Member

cert-manager v1.12.16 and v1.16.4 are now live with the fix included. Please test it out!

Given the nature of the issue, it's possible that there might need to be some manual cleanup of the DNS records before it works, but I'm not in a position to be able to test or confirm that. Hopefully, though, this should be enough to fix the issue!

Thanks again to everyone involved, I'll close this now!
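
To check a cluster's version against the fixed releases named in this thread (v1.17.1, v1.16.4 and v1.12.16), something like the following Go sketch could be used. The helper hasFix and the table fixedPatch are hypothetical names; the sketch deliberately returns an error for any release line not named in the thread rather than guessing whether it carries the fix.

```go
package main

import (
	"fmt"
	"strings"
)

// fixedPatch maps a 1.x minor release line to the first patch release that
// contains the fix, using only the versions named in this thread.
var fixedPatch = map[int]int{12: 16, 16: 4, 17: 1}

// hasFix reports whether the given cert-manager version (with or without a
// leading "v") includes the fix. Release lines not named in the thread
// produce an error.
func hasFix(version string) (bool, error) {
	var major, minor, patch int
	trimmed := strings.TrimPrefix(version, "v")
	if _, err := fmt.Sscanf(trimmed, "%d.%d.%d", &major, &minor, &patch); err != nil {
		return false, fmt.Errorf("parse %q: %w", version, err)
	}
	if major != 1 {
		return false, fmt.Errorf("unexpected major version in %q", version)
	}
	firstFixed, ok := fixedPatch[minor]
	if !ok {
		return false, fmt.Errorf("release line 1.%d is not named in the thread", minor)
	}
	return patch >= firstFixed, nil
}

func main() {
	for _, v := range []string{"v1.16.3", "v1.16.4", "v1.17.1", "1.12.16", "1.15.3"} {
		ok, err := hasFix(v)
		if err != nil {
			fmt.Printf("%s: %v\n", v, err)
			continue
		}
		fmt.Printf("%s: fix included = %t\n", v, ok)
	}
}
```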

@alextricity25

@SgtCoDFish Yes, I needed to clean up the TXT _acme-challenge records. Otherwise the challenge would fail with "unexpected non-ACME API error" err="context deadline exceeded". This is especially true for anyone with wildcard domains, since the TXT record is not namespaced upon creation (i.e. _acme-challenge vs. _acme-challenge.<my-subdomain>).
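
The wildcard point can be illustrated with a short Go sketch. It follows the ACME DNS-01 convention of prefixing the validated domain with _acme-challenge and dropping any leading "*." label; it is not taken from cert-manager, and challengeRecordName is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// challengeRecordName returns the TXT record name used for a DNS-01
// challenge on the given DNS name. A wildcard name is validated at its
// base domain, so the leading "*." is dropped.
func challengeRecordName(dnsName string) string {
	base := strings.TrimPrefix(dnsName, "*.")
	base = strings.TrimSuffix(base, ".")
	return "_acme-challenge." + base
}

func main() {
	for _, name := range []string{"example.com", "*.example.com", "app.example.com"} {
		fmt.Printf("%-18s -> %s\n", name, challengeRecordName(name))
	}
}
```

Here "example.com" and "*.example.com" map to the same record name, so a stale record left behind by a failed cleanup sits exactly where the next wildcard challenge will be written.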
