
Duplicates deletion error #6217

Closed · kokhanevych-macpaw opened this issue Apr 29, 2022 · 17 comments

@kokhanevych-macpaw (Contributor)

Bug description
I found an issue with duplicate deletion. After enabling this feature, I occasionally started seeing this error during report uploads:
{"message":"Internal server error, check logs for details"}
In the Django logs I found the underlying error:

ERROR [dojo.api_v2.exception_handler:32] insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"
DETAIL: Key (finding_id)=(742929) is not present in table "dojo_finding".
django.db.utils.IntegrityError: insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"

As I understand it, DD tried to reference a finding that had already been deleted as a duplicate.

@valentijnscholten commented on Slack:
"The job to delete duplicates runs every minute, so if you have an import that runs longer than that, there may be a race condition that newly imported findings which are duplicates of existing findings are deleted before the import is complete. I can't remember exactly what the transactional model is in django, but probably not everything runs in a transaction so duplicates become visible to the job that deletes duplicates before the import is completed."

Steps to reproduce
Steps to reproduce the behavior:

  1. Create a project.
  2. Enable Deduplicate findings and Delete duplicates features in settings.
  3. Import a report with a lot of findings (more than 100).
  4. Import a few reports with duplicates to the project.
  5. After some imports you will see this error in the logs:
    {"message":"Internal server error, check logs for details"}

Expected behavior
The report should be imported successfully.

Deployment method (select with an X)

  • Docker Compose
  • [X] Kubernetes
  • GoDojo

Environment information

  • DefectDojo version 2.9.1

Logs

ERROR [dojo.api_v2.exception_handler:32] insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"
DETAIL: Key (finding_id)=(742929) is not present in table "dojo_finding".
django.db.utils.IntegrityError: insert or update on table "dojo_test_import_finding_action" violates foreign key constraint "dojo_test_import_fin_finding_id_28fe8e2d_fk_dojo_find"
@bgoareguer (Contributor)

I would really like to activate duplicate deletion, but I guess it would be safer to fix this issue first. Is there any ongoing investigation into it?

@sameeraksc

We also have the same issue. Any plan to fix it soon? :-(

@sameeraksc

@Maffooch

@MathRig commented Mar 21, 2023

Same for us, but on version 2.19.2.

@fhoeborn (Contributor)

There is a possible solution in the linked draft pull request; it is currently having trouble with the JIRA integration's unit tests, which is holding it back from leaving its draft state. Feel free to take a look - maybe you have an idea for that situation.

@quirinziessler (Contributor) commented Jun 16, 2023

Is there any update on this, or a plan for when it will be fixed?
I think I have the same issue with the newest version :/

@BoBeR182

This also affects uploading large trivy.json results via a pipeline if the import takes over 1 minute.

@BoBeR182

A hot fix of changing the following settings allowed me to upload the 5 MB Trivy scan:

DD_ASYNC_FINDING_IMPORT=True
DD_ASYNC_FINDING_IMPORT_CHUNK_SIZE=100

@MRagulin commented Jun 24, 2024

I faced the same problem. The main reason is a race condition. It has already been discussed, and @fhoeborn offered a pull request to fix it, but it has still not been merged:
#7444

@mickeyYA commented Jan 9, 2025

Any updates regarding this?

@valentijnscholten (Member) commented Jan 9, 2025

The PR is quite a big change in transactional behaviour, and it is hard to predict what the impact will be on all the different ways people use Defect Dojo imports.
A quick fix could be to make the 1-minute setting configurable. Currently the duplicate-delete job runs every minute. If you change that to 5 minutes, it will drastically reduce the chance of this race condition occurring for small imports.
We could change the default to 5 minutes and/or make it configurable via a system setting or environment variable, along the lines of the sketch below.
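A minimal sketch of what that could look like, assuming the job is scheduled through Celery beat; the schedule key, task path, and environment variable name here are illustrative assumptions rather than the actual setting names:

import os
from datetime import timedelta

# Hypothetical: DD_DUPE_DELETE_INTERVAL_MINUTES is not an existing setting,
# and the task path is assumed to point at the duplicate-delete task.
CELERY_BEAT_SCHEDULE = {
    "dedupe-delete": {
        "task": "dojo.tasks.async_dupe_delete",
        "schedule": timedelta(
            minutes=int(os.environ.get("DD_DUPE_DELETE_INTERVAL_MINUTES", "5"))
        ),
    },
}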

It would also be an option to create the Test_Import_Finding_Action objects instantly during import instead of after the import process has completed.

@valentijnscholten (Member)

@Maffooch Do you think some suppressions like #11549 could help here as well?

@valentijnscholten (Member)

The history is created using bulk_create:

# Define all of the respective import finding actions for the test import object
test_import_finding_action_list = []
for finding in closed_findings:
    logger.debug(f"preparing Test_Import_Finding_Action for closed finding: {finding.id}")
    test_import_finding_action_list.append(Test_Import_Finding_Action(
        test_import=test_import,
        finding=finding,
        action=IMPORT_CLOSED_FINDING,
    ))
for finding in new_findings:
    logger.debug(f"preparing Test_Import_Finding_Action for created finding: {finding.id}")
    test_import_finding_action_list.append(Test_Import_Finding_Action(
        test_import=test_import,
        finding=finding,
        action=IMPORT_CREATED_FINDING,
    ))
for finding in reactivated_findings:
    logger.debug(f"preparing Test_Import_Finding_Action for reactivated finding: {finding.id}")
    test_import_finding_action_list.append(Test_Import_Finding_Action(
        test_import=test_import,
        finding=finding,
        action=IMPORT_REACTIVATED_FINDING,
    ))
for finding in untouched_findings:
    logger.debug(f"preparing Test_Import_Finding_Action for untouched finding: {finding.id}")
    test_import_finding_action_list.append(Test_Import_Finding_Action(
        test_import=test_import,
        finding=finding,
        action=IMPORT_UNTOUCHED_FINDING,
    ))
# Bulk create all the defined objects
Test_Import_Finding_Action.objects.bulk_create(test_import_finding_action_list)

The with suppress(...) construct will not be helpful here, as the error happens inside Postgres while executing the query, so the query as a whole will have failed. There is no way to tell Postgres to ignore the error and continue inserting records for the findings that still exist. The ignore_conflicts parameter in Django adds an ON CONFLICT DO NOTHING clause to the query, but that only covers duplicate key violations and similar conflicts; it has no effect on foreign key violations like the one we are running into here.

The only "correct" solution is to indeed go with transactions, but as stated before that impact is or might be quite big so needs further consideration and testing.

For now the safest way would be to create the import history records one-by-one instead of using bulk_create. That way we can catch errors for each record but continue with the other records. Alternatively, we could create the import history record for each finding immediately after creating (or closing/reopening) the finding itself.
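A minimal sketch of the one-by-one variant, reusing the names from the snippet above; only the per-record error handling is new:

from django.db import IntegrityError

for action in test_import_finding_action_list:
    try:
        action.save()
    except IntegrityError:
        # The finding was deleted (e.g. by the duplicate-delete job) between
        # the import and the history creation; skip this record and continue.
        logger.warning("finding %s no longer exists, skipping its Test_Import_Finding_Action", action.finding_id)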

What do you think @Maffooch ?

@Maffooch (Contributor)

Great analysis on this one. I think your proposal (creating the import history records one-by-one instead of using bulk_create, so errors can be caught per record) would be the safe bet for a baby-step approach. If all goes well there, we could further explore the idea of creating finding actions right after the action on the finding is made.

@valentijnscholten (Member)

I have created multiple PRs.

Option A: Use try-catch to catch the IntegrityError -> #11739
I tried reproducing the error in a unit test, and even by running a local instance with some extra code to delete a finding after importing it, but Django has some safeguards in place during bulk_create and even during create/save. So the PR just catches the IntegrityError and hopes for the best. While creating the PR I realized that the code running after the import history creation, where tags are added to findings and to endpoints of findings, might also be affected. In theory the race condition could happen there as well, so I added the same try-catch there.
I am not sure this is a sustainable solution, as we would have to keep it in mind for any future processing added at this point. The root cause is that deduplication is triggered before the history is created, before tags are added, etc. And this PR makes it look like we don't know what we are doing, with all this try-catch happening. In rough terms the pattern is the sketch below.
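A sketch of the approach as described, not the actual PR code:

from django.db import IntegrityError

try:
    Test_Import_Finding_Action.objects.bulk_create(test_import_finding_action_list)
except IntegrityError:
    # A finding in the batch was deleted mid-import by the duplicate-delete
    # job; the whole INSERT fails, so the import history for this batch is
    # lost, but the import itself can continue.
    logger.warning("skipping import history, a finding was deleted concurrently")

Because the single bulk INSERT fails as a whole, catching the error drops the history for the entire batch, which is part of why this option feels unsatisfying.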

Option B: Inline the creation of history and tag objects -> #11740
I tried to see what the code would look like if we moved the import history creation before the dedupe that is triggered via finding.save(). Although it feels like less clean code, because I had to move things around a bit, it at least looks like we know what we're doing. I guess this just removes the premature optimization we had in place, with the history being created in bulk instead of on the fly. The ordering change is sketched below.
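Roughly, the inlined ordering would look like this (a sketch of the idea with hypothetical loop names; it assumes finding.save() accepts a dedupe_option flag as in dojo.models, and that the foreign key from Test_Import_Finding_Action to Finding cascades on delete):

for parsed in parsed_findings:
    finding = Finding(test=test, **parsed)
    finding.save(dedupe_option=False)  # persist without triggering dedupe yet
    # Record the import history while the finding is guaranteed to exist.
    Test_Import_Finding_Action.objects.create(
        test_import=test_import,
        finding=finding,
        action=IMPORT_CREATED_FINDING,
    )
    finding.save()  # now trigger deduplication; if the duplicate-delete job
    # later removes the finding, the cascade removes the history record too,
    # instead of raising an IntegrityError at insert time.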

Option C: Postpone dedupe and JIRA pushing -> #11741
I wanted to see what the code would look like if we truly did all importing and processing first, before doing deduplication and before pushing to JIRA. This PR has those changes. To be honest, I didn't test it, and it might not even pass the unit tests with the JIRA recording and all. This PR would be the most "fun" option to pursue from a personal/developer perspective, but it might not be the best way to spend my/our time. If we spend time on a bigger refactoring like this, I think it should be on introducing transactions and making Dojo work correctly with those.

The PRs are all in draft to get some feedback from other Defect Dojo developers, including @Maffooch.

I'm a bit torn between Option A and Option B.

@valentijnscholten (Member) commented Feb 15, 2025

Option A was merged into bugfix and should be out in 2.43.3.

@kokhanevych-macpaw @bgoareguer @sameeraksc @MathRig @fhoeborn @quirinziessler @MRagulin

@mickeyYA

Thank you for addressing this.
