Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix8913 #8915

Merged
merged 3 commits into from
Feb 12, 2025
Merged

Fix8913 #8915

merged 3 commits into from
Feb 12, 2025

Conversation

aspiringmind-code
Copy link
Contributor

See Issue #8913
The check for task['status'] == Completed should happen after the checkpublication condition
Tested {'TN': '250206_133625:crabint1_crab_20250206_143624', 'testResult': 'TestPassed', 'dbStatus': 'SUBMITTED', 'combinedStatus': 'COMPLETED', 'jobsPerStatus': {'finished': 1, 'probe': 5, 'rescheduled': 0}, 'publication (fail/tdb/DBS)': '0/0/0'}

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 33 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-CRABServer-PR-test/2371/artifact/artifacts/PullRequestReport.html

@belforte
Copy link
Member

belforte commented Feb 6, 2025

at first sight I think that moving the status check is a mistake. We should talk about this, or you can point to a test w/o it.

@aspiringmind-code
Copy link
Contributor Author

at first sight I think that moving the status check is a mistake. We should talk about this, or you can point to a test w/o it.

A test without the movement of the block continues to be stuck at "TestRunning": See 250206_105544:crabint1_crab_20250206_115542 in here

@belforte
Copy link
Member

belforte commented Feb 6, 2025

if publication is not completed, TestRunning is the desired outcome.

@belforte
Copy link
Member

belforte commented Feb 6, 2025

OK. Now I am lost in CMS/Atlas build system presentation. Will try to figure out when I have time. Given that current test is broken, feel free to merge.

for job in task['jobs'].keys():
if '-' in job:
jobsToPublish -= 1
# Remove probe jobs (job id of X-Y kind) from count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to see how this code gives different result to previous one. It should still give -4 for jobsToPublish. It is simply a bit/ easier to read.
BTW please let's stick to snakeCase convention.

if published_in_transfersdb == jobsToPublish and published_in_dbs == jobsToPublish:
result = 'TestPassed'
elif failedPublications:
#result = 'TestFailed'
needToResubmitPublication = True
else:
result = 'TestRunning'
if task['status'] == 'COMPLETED':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this simply discards whatever publication count was done before and says "passed" if status is COMPLETED, regardless of publication.
Or am I lost indentations ?

@belforte
Copy link
Member

belforte commented Feb 6, 2025

I looked a bit more and I think that this PR is wrong. I suggest to disable publication check in the pipeline yaml file to make pipeline succeed and run the test scrip interactively making sure we cover all use cases. Failing probes, failing processing, failing publication ....

@aspiringmind-code
Copy link
Contributor Author

I think i understand the problem now. Since we already subtract probe_jobs to get total_jobs in here, by having it again, we are effectively subtracting twice. You're right the movement of the task['status'] after the checkpublication block is wrong. All we need to do is just use simple total_jobs as jobstoPublish instead of double subtraction of probe_jobs.

@belforte
Copy link
Member

belforte commented Feb 6, 2025

maybe there's a cleaner way to detect how many jobs should be checked ! Setting aside proble job only once. But I am now taken until tomorrow afternoon. Basically this is "code that was written w/o knowing that automatic splitting exists" which was then adapted "hack by hack based on the failure of the day" :-(

@cmsdmwmbot
Copy link

Jenkins results:

  • Python3 Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 32 comments to review
  • Pycodestyle check: succeeded
    • 19 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-CRABServer-PR-test/2372/artifact/artifacts/PullRequestReport.html

@aspiringmind-code
Copy link
Contributor Author

@belforte Is it ok if I merge this for now?

@belforte belforte self-requested a review February 12, 2025 08:22
Copy link
Member

@belforte belforte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@belforte
Copy link
Member

yeah.. sorry for delay

@belforte belforte merged commit 7063eb9 into dmwm:master Feb 12, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants