deferred job submission fails after #8893 #8908
Comments
PreJob needs to find
yet it is not in the DAGJob.jdl logged in the TW temp directory:
crab3@crab-dev-tw01:/data/srv/tmp/_250131_100639:belforte_crab_20250131_110635rz9qfzbv$ grep CRAB_Job DAGJob.jdl
My.CRAB_JobSW = "CMSSW_13_3_0"
My.CRAB_JobArch = "el8_amd64_gcc12"
My.CRAB_JobCount = 30
could this be

using
But now PreJob fails

these lines:
crab3@crab-dev-tw01:/data/srv/tmp/_250131_105201:belforte_crab_20250131_115157toscgred$ grep CRAB_TaskSubmitTime *jdl
DAGJob.jdl:CRAB_TaskSubmitTime = 1738320746
subdag.jdl:CRAB_TaskSubmitTime = 1738320746
define (unused) variables, not classAds
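The distinction above can be checked mechanically. In an HTCondor submit file, a custom job classAd attribute is written as `+Name = value` or `My.Name = value`, while a bare `Name = value` that is not a submit command only defines a variable. Below is a minimal sketch of such a check; the helper name `split_jdl` and the sample text are made up for illustration and are not part of CRABServer.

```python
import re

# "+Name = value" or "My.Name = value" (prefix is case-insensitive):
# the two submit-file spellings that create a custom job classAd attribute.
_CLASSAD_RE = re.compile(r"^\s*(?:\+|my\.)([A-Za-z_]\w*)\s*=\s*(.+?)\s*$", re.IGNORECASE)
# A bare "Name = value" line. Note this bucket also catches ordinary submit
# commands (executable, arguments, ...), so it needs a human look afterwards.
_PLAIN_RE = re.compile(r"^\s*([A-Za-z_]\w*)\s*=\s*(.+?)\s*$")


def split_jdl(text):
    """Return (classads, plain) dicts for the assignments found in JDL text."""
    classads, plain = {}, {}
    for line in text.splitlines():
        m = _CLASSAD_RE.match(line)
        if m:
            classads[m.group(1)] = m.group(2)
            continue
        m = _PLAIN_RE.match(line)
        if m:
            plain[m.group(1)] = m.group(2)
    return classads, plain


# Hypothetical sample mixing both spellings seen in this issue.
sample = """\
My.CRAB_JobSW = "CMSSW_13_3_0"
My.CRAB_JobCount = 30
CRAB_TaskSubmitTime = 1738320746
"""
classads, plain = split_jdl(sample)
print(sorted(classads))  # ['CRAB_JobCount', 'CRAB_JobSW']
print(sorted(plain))     # ['CRAB_TaskSubmitTime']
```

Anything that lands in `plain` and is expected by PreJob as a classAd is a candidate for the missing prefix.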
bug.
after that PreJobs run and say things like
but the DEFER is gone from the DAG file:
[crabtw@vocms059 cluster10093032.proc0.subproc0]$ cat RunJobs.dag|grep PRE|grep -v SKIP|cut -d' ' -f 1-9|head
#SCRIPT PRE FinalCleanup dag_bootstrap.sh FINAL $DAG_STATUS $FAILED_COUNT cmsweb-test2.cern.ch:8443
SCRIPT PRE Job1 dag_bootstrap.sh PREJOB $RETRY 1
SCRIPT PRE Job2 dag_bootstrap.sh PREJOB $RETRY 2
SCRIPT PRE Job3 dag_bootstrap.sh PREJOB $RETRY 3
SCRIPT PRE Job4 dag_bootstrap.sh PREJOB $RETRY 4
SCRIPT PRE Job5 dag_bootstrap.sh PREJOB $RETRY 5
SCRIPT PRE Job6 dag_bootstrap.sh PREJOB $RETRY 6
SCRIPT PRE Job7 dag_bootstrap.sh PREJOB $RETRY 7
SCRIPT PRE Job8 dag_bootstrap.sh PREJOB $RETRY 8
That led to PRE being retried 3 times, always failing, and DAGMan aborted.
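A regression like the one in the listing above could be caught by scanning the generated DAG file before submission. DAGMan's syntax places the optional `DEFER status time` clause between `SCRIPT` and `PRE`, so a PRE line without it can be flagged. The following is only a sketch: the function name `pre_scripts_without_defer` is hypothetical, and the `DEFER 4 300` values in the sample are placeholders, not the values CRAB actually uses.

```python
def pre_scripts_without_defer(dag_text):
    """Return node names whose 'SCRIPT ... PRE' line has no DEFER clause."""
    missing = []
    for line in dag_text.splitlines():
        tokens = line.split()
        # Skip blank lines, comments ('#SCRIPT ...') and non-SCRIPT commands.
        if not tokens or tokens[0].upper() != "SCRIPT":
            continue
        upper = [t.upper() for t in tokens]
        # The first PRE/POST/HOLD keyword tells which kind of script this is.
        kind_idx = next(
            (i for i, t in enumerate(upper) if t in ("PRE", "POST", "HOLD")), None
        )
        if kind_idx is None or upper[kind_idx] != "PRE":
            continue
        # Options such as DEFER sit between SCRIPT and the PRE keyword.
        has_defer = "DEFER" in upper[1:kind_idx]
        if not has_defer and kind_idx + 1 < len(tokens):
            missing.append(tokens[kind_idx + 1])
    return missing


# Hypothetical DAG fragment: Job1 lost its DEFER clause, Job2 still has one.
sample_dag = """\
#SCRIPT PRE FinalCleanup dag_bootstrap.sh FINAL
SCRIPT PRE Job1 dag_bootstrap.sh PREJOB $RETRY 1
SCRIPT DEFER 4 300 PRE Job2 dag_bootstrap.sh PREJOB $RETRY 2
SCRIPT POST Job1 dag_bootstrap.sh POSTJOB
"""
print(pre_scripts_without_defer(sample_dag))  # ['Job1']
```

Without the DEFER clause DAGMan treats a non-zero PRE exit as a plain failure, which matches the retry-then-abort behaviour reported here.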
pfff..
works with +.CRAB_JobReleaseTimeout but not with My.CRAB_JobReleaseTimeout. I have no idea why they used such a strict check.
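The actual check is not shown in this thread, so the following is not a reconstruction of it. It is only a sketch of what a more tolerant lookup could look like, accepting both the `+Name` and `My.Name` spellings of a custom job attribute. The helper name `find_custom_attr` is hypothetical.

```python
import re


def find_custom_attr(jdl_text, name):
    """Return the raw value of a custom job attribute, or None if absent.

    Accepts both '+name = value' and 'My.name = value'. Matching is
    case-insensitive, since classAd attribute names are case-insensitive.
    """
    pattern = re.compile(
        r"^\s*(?:\+|my\.)" + re.escape(name) + r"\s*=\s*(.+?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    m = pattern.search(jdl_text)
    return m.group(1) if m else None


# Both spellings resolve to the same value; a bare variable does not count.
print(find_custom_attr("+CRAB_JobReleaseTimeout=60\n", "CRAB_JobReleaseTimeout"))
print(find_custom_attr("My.CRAB_JobReleaseTimeout = 60\n", "CRAB_JobReleaseTimeout"))
print(find_custom_attr("CRAB_JobReleaseTimeout = 60\n", "CRAB_JobReleaseTimeout"))
```

The first two calls print `60` and the third prints `None`, because a line without either prefix defines a submit variable rather than a classAd.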
after fixing all of that, things look OK.

that was using

I also used above task to try

rats..
CRABServer/src/python/TaskWorker/Actions/DagmanCreator.py, lines 522 to 526 in ba25203

my bad. I need to do

All OK now!

After #8893 all PreJob scripts run at the same time at the beginning and all jobs are submitted, even if
+CRAB_JobReleaseTimeout=60
is correctly propagated to the scheduler in Job.submit and the DAG file has