it contains Tasks that don't exist: Couldn't retrieve Task "" #6408

Closed
jihwong opened this issue Mar 21, 2023 · 2 comments · Fixed by #6424
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jihwong

jihwong commented Mar 21, 2023

Expected Behavior

The PipelineRun is expected to run normally.

Actual Behavior

Re-running the PipelineRun succeeds; this error only occurs occasionally.

Looking at the webhook logs: creating a ResolutionRequest normally produces three "knative.dev/operation":"CREATE" records, but cluster-564cb716f353f3b29acf2eeae11d0d07 has six:

#k logs -n tekton-pipelines tekton-pipelines-webhook-696cb8f894-kvwg8 |grep 6c189aca7abcb406496e8aa9bbb5fd5d|grep CREATE|wc -l
3

#k logs -n tekton-pipelines tekton-pipelines-webhook-696cb8f894-kvwg8 |grep 564cb716f353f3b29acf2eeae11d0d07 |grep CREATE|wc -l
6

ResolutionRequests (there are usually three records, but the error occurred while creating the second one):

#k get resolutionrequests -n NAMESPACE |grep XXX
cluster-564cb716f353f3b29acf2eeae11d0d07   PipelineRun   XXX   True                 2023-03-20T15:26:16Z   2023-03-20T15:26:16Z
cluster-5ed0564a1b690a9c48c1da7294932a56   PipelineRun   XXX   True                 2023-03-20T15:26:15Z   2023-03-20T15:26:15Z

pipelinerun.status content

status:
  completionTime: "2023-03-20T15:26:16Z"
  conditions:
  - lastTransitionTime: "2023-03-20T15:26:16Z"
    message: 'Pipeline NAMESPACE/PIPELINERUN_NAME can''t be Run; it contains
      Tasks that don''t exist: Couldn''t retrieve Task "": error requesting remote
      resource: resolutionrequests.resolution.tekton.dev "cluster-564cb716f353f3b29acf2eeae11d0d07"
      already exists'
    reason: CouldntGetTask
    status: "False"
    type: Succeeded

Additional Info

  • Tekton Pipeline version:
    v0.41.0
@jihwong added the kind/bug label Mar 21, 2023
@jihwong closed this as not planned Mar 21, 2023
@jihwong reopened this Mar 21, 2023
@l-qing
Member

l-qing commented Mar 22, 2023

Yes, this is a bug. The relevant code is here:

// Submit constructs a ResolutionRequest object and submits it to the
// kubernetes cluster, returning any errors experienced while doing so.
// If ResolutionRequest is succeeded then it returns the resolved data.
func (r *CRDRequester) Submit(ctx context.Context, resolver ResolverName, req Request) (ResolvedResource, error) {
	rr, _ := r.lister.ResolutionRequests(req.Namespace()).Get(req.Name())
	if rr == nil {
		if err := r.createResolutionRequest(ctx, resolver, req); err != nil {
			return nil, err
		}
		return nil, resolutioncommon.ErrRequestInProgress
	}

I think we can ignore the "already exists" error and wait for the next reconciliation, for example:

func (r *CRDRequester) Submit(ctx context.Context, resolver ResolverName, req Request) (ResolvedResource, error) {
	rr, err := r.lister.ResolutionRequests(req.Namespace()).Get(req.Name())
	if rr == nil {
		if err := r.createResolutionRequest(ctx, resolver, req); err != nil && !apierrors.IsAlreadyExists(err) {
			return nil, err
		}
		return nil, resolutioncommon.ErrorRequestInProgress
	}

In my environment, this change avoids the error. I'm not sure whether it introduces any other issues.
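
For completeness, the IsAlreadyExists check above relies on the apimachinery errors helpers; assuming the apierrors alias used in the snippet, the import would look like:

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)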

l-qing added a commit to l-qing/pipeline that referenced this issue Mar 22, 2023
fix tektoncd#6408

When submitting quickly, the creation may fail because the cache is not
updated. We can assume that is in progress, and the next reconcile will
handle it based on the actual situation.
@jihwong
Author

jihwong commented Mar 23, 2023

Thanks! I've applied this change in my environment and will run it for a while to see if it works properly.

l-qing added a commit to l-qing/pipeline that referenced this issue Mar 24, 2023
fix tektoncd#6408

When the time interval between two reconciliations of the
owner (TaskRun, PipelineRun) of a ResolutionRequest is short,
it may cause the second reconciliation to fail when triggering
a Submit because the informer cache may not have been updated yet.

In this case, we can assume that it is in progress, and the next
reconciliation will handle it based on the actual situation.
tekton-robot pushed a commit that referenced this issue Mar 27, 2023
fix #6408

When the time interval between two reconciliations of the
owner (TaskRun, PipelineRun) of a ResolutionRequest is short,
it may cause the second reconciliation to fail when triggering
a Submit because the informer cache may not have been updated yet.

In this case, we can assume that it is in progress, and the next
reconciliation will handle it based on the actual situation.
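
To make the race in that commit message concrete, here is a minimal sketch (plain Go, not the Tekton code itself; the helper names submit, get and create are hypothetical) of how a stale informer cache can turn a Create into an AlreadyExists error, and why treating that error as "in progress" is safe:

package main

import (
	"errors"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// errInProgress stands in for resolutioncommon's "request in progress" error.
var errInProgress = errors.New("resolution request in progress")

// submit mimics the shape of CRDRequester.Submit: get is the (possibly stale)
// lister lookup, create is the call against the API server.
func submit(get func() interface{}, create func() error) (interface{}, error) {
	if obj := get(); obj != nil {
		return obj, nil
	}
	// Ignore AlreadyExists: the object was created by a previous reconcile and
	// simply has not shown up in the informer cache yet.
	if err := create(); err != nil && !apierrors.IsAlreadyExists(err) {
		return nil, err
	}
	return nil, errInProgress
}

func main() {
	gr := schema.GroupResource{Group: "resolution.tekton.dev", Resource: "resolutionrequests"}
	staleGet := func() interface{} { return nil } // cache has not caught up yet
	conflictingCreate := func() error {
		return apierrors.NewAlreadyExists(gr, "cluster-564cb716f353f3b29acf2eeae11d0d07")
	}

	// Before the fix this surfaced as "already exists" and failed the PipelineRun;
	// with the fix it is reported as in progress and retried on the next reconcile.
	if _, err := submit(staleGet, conflictingCreate); errors.Is(err, errInProgress) {
		fmt.Println("treated as in progress; the next reconcile will pick it up")
	}
}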