cleanup TaskType names #8488

Closed
3 tasks done
Tracked by #8337
belforte opened this issue Jun 11, 2024 · 1 comment

Comments

@belforte
Member

belforte commented Jun 11, 2024

in the context of Modernize HTC calls
see #8337 (comment)

  • rename TaskType ad to CRAB_DAGType
  • rename ROOT type to MAIN, so that dagman jobs will have CRAB_DAGType in ["MAIN", "PROCESSING", "TAIL"]. "normal" tasks will only have "MAIN". "automatic" will use "MAIN" for running probes, followed by "PROCESSING" and up to 3 "TAIL".
    But I am still not happy here. ROOT is more descriptive for a DAG which can be either the full thing standalone or the probe step for automatic splitting. Maybe "BASE"?
    I will start with renaming the classAd; at least it will make it easier later on to identify where it is used and possibly change the value.

BEWARE: these changes will not be backward compatible! I.e. crab kill and crab resubmit will not work on tasks which have TaskType but not CRAB_DAGType.

  • add a backward-compatibility layer (to be removed after 2 months in production)
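
For the query side, one possible shape of such a compatibility layer is a constraint that accepts either ad. The sketch below is only an illustration: the helper names `quote` and `compat_dag_constraint` are hypothetical, `quote` is a simplified stand-in for `classad.quote` so that the example runs without the HTCondor bindings, and the new value `MAIN` is just one of the candidates discussed above.

```python
def quote(value):
    # Simplified stand-in for classad.quote (assumption): escape backslashes
    # and double quotes, then wrap the result in double quotes.
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


def compat_dag_constraint(request_name, new_values, old_values):
    """Build a constraint matching DAG jobs tagged with either the new or the old ad.

    The =?= operator always evaluates to TRUE or FALSE, so a job lacking one
    of the two ads simply fails that half of the test.
    """
    new_part = " || ".join(f"CRAB_DAGType =?= {quote(v)}" for v in new_values)
    old_part = " || ".join(f"TaskType =?= {quote(v)}" for v in old_values)
    return f"CRAB_ReqName =?= {quote(request_name)} && ({new_part} || {old_part})"


# Hypothetical request name, for illustration only.
print(compat_dag_constraint("240611_120000:user_crab_example", ["MAIN"], ["ROOT"]))
```

Once the compatibility window is over, the `old_values` half can be dropped without touching the callers.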

Note: ROOT is assigned at initial submission (so it is the same for everybody) in DagmanSubmitter. Later on PreDag uses the stage name when it submits the various subdags. The stage name is set in DagmanCreator and used in PreDag and PreJob.

List of possible values. Note that CRAB_DAGType is a classAd of a job! It simply takes a non-trivial value when the job runs in the Scheduler universe to execute a DAGMan.

| kind of job | universe | CRAB_DAGType | stage name |
|---|---|---|---|
| main Dagman w/o automatic splitting | scheduler (7) | BASE | conventional |
| main Dagman with automatic splitting | scheduler (7) | BASE | probe |
| grid job | vanilla (5) | Job | not used |
| task_process | local (12) | undefined | not used |
| processing subDag | scheduler (7) | PROCESSING | processing |
| tail subDag | scheduler (7) | TAIL | tail |
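
Read as code, the scheduler-universe rows of the table above amount to a small mapping from stage name to ad value. A minimal sketch, assuming the `BASE` spelling used in the table (the final name is still open) and a hypothetical helper name `dag_type_for_stage`:

```python
# Stages run by the main DAGMan; both get the same ad value in the table.
MAIN_DAG_STAGES = ("conventional", "probe")


def dag_type_for_stage(stage, main_value="BASE"):
    """Return the CRAB_DAGType value for a DAG stage name (sketch, not CRAB code)."""
    if stage in MAIN_DAG_STAGES:
        return main_value
    if stage in ("processing", "tail"):
        # subDAGs: the ad value is the upper-cased stage name
        return stage.upper()
    raise ValueError(f"unknown stage: {stage!r}")


for stage in ("conventional", "probe", "processing", "tail"):
    print(stage, "->", dag_type_for_stage(stage))
```

Keeping the main-DAG value as a parameter means a later switch between MAIN and BASE would be a one-line change.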

Possible alternative names for ROOT: MAIN, BASE

TaskType ad is defined in

jobJDL["+TaskType"] = classad.quote("ROOT") # we want the ad value to be "ROOT", not ROOT

and, in case of automatic splitting, in

'-append', '+TaskType = "{0}"'.format(stage.upper()), subdag])

the stage in PreDAG is passed as argument when PreDAG is called during Dag execution:

SCRIPT DEFER 4 300 PRE Job{count}SubJobs dag_bootstrap.sh PREDAG {stage} {completion} {count}
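
To see what DAGMan ends up with, the template line can be expanded by hand. The values below are made up for illustration; only the template text comes from the line above.

```python
SCRIPT_TEMPLATE = (
    "SCRIPT DEFER 4 300 PRE Job{count}SubJobs "
    "dag_bootstrap.sh PREDAG {stage} {completion} {count}"
)

# Illustrative values: a tail subDAG with count 1 and a completion threshold of 50 jobs.
line = SCRIPT_TEMPLATE.format(count=1, stage="tail", completion=50)
print(line)

# PreDAG receives the stage as the argument right after the PREDAG keyword.
words = line.split()
stage_arg = words[words.index("PREDAG") + 1]
print(stage_arg)
```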

and the code which manages the creation of the various DAGs is the somewhat cryptic

## Write down the DAG as needed by DAGMan.
restHostForSchedd = kwargs['task']['resthost']
dag = DAG_HEADER.format(nodestate='.{0}'.format(parent) if parent else ('.0' if stage == 'processing' else ''),
                        resthost=restHostForSchedd)
if stage == 'probe':
    dagSpecs = dagSpecs[:getattr(self.config.TaskWorker, 'numAutomaticProbes', 5)]
for dagSpec in dagSpecs:
    dag += DAG_FRAGMENT.format(**dagSpec)
if stage in ('probe', 'processing'):
    # default for probe DAG: only one processing DAG after 100% of the probe jobs have completed
    subdagCompletions = [100]
    nextStage = {'probe': 'processing', 'processing': 'tail'}[stage]
    if stage == 'processing' and len(dagSpecs) > getattr(self.config.TaskWorker, 'minAutomaticTailSize', 100):
        subdagCompletions = getattr(self.config.TaskWorker, 'minAutomaticTailTriggers', [50, 80, 100])
    for n, percent in enumerate(subdagCompletions, 0 if stage == 'probe' else 1):
        subdagSpec = {
            'count': n,
            'stage': nextStage,
            'completion': (len(dagSpecs) * percent) // 100
        }
        dag += SUBDAG_FRAGMENT.format(**subdagSpec)
        subdag = "RunJobs{count}.subdag".format(**subdagSpec)
        with open(subdag, "w", encoding='utf-8') as fd:
            fd.write("")
        subdags.append(subdag)
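
The subDAG bookkeeping in the snippet above may be easier to follow when isolated as a pure function. The sketch below is a paraphrase, not the CRAB implementation: `plan_subdags` and its parameter names are hypothetical, the defaults mirror the `getattr` fallbacks shown above, and `n_jobs` stands for `len(dagSpecs)` (for the probe stage, after truncation to the number of probes).

```python
def plan_subdags(stage, n_jobs, min_tail_size=100, tail_triggers=(50, 80, 100)):
    """Return the subDAG specs that a DAG of the given stage would schedule."""
    if stage not in ("probe", "processing"):
        # conventional and tail DAGs do not spawn further subDAGs
        return []
    next_stage = {"probe": "processing", "processing": "tail"}[stage]
    # default: a single subDAG once 100% of the jobs have completed
    completions = [100]
    if stage == "processing" and n_jobs > min_tail_size:
        # large processing DAGs trigger several tail subDAGs along the way
        completions = list(tail_triggers)
    first_count = 0 if stage == "probe" else 1
    return [
        {"count": n, "stage": next_stage, "completion": (n_jobs * percent) // 100}
        for n, percent in enumerate(completions, first_count)
    ]


for spec in plan_subdags("processing", 200):
    print(spec)
```

With 200 processing jobs this yields three tail subDAGs, triggered after 100, 160 and 200 completed jobs, which is the "up to 3 TAIL" case mentioned at the top of the issue.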

TaskType ad is also used in

function perform_condorq {
    DAG_INFO=$(condor_q -constr 'CRAB_ReqName =?= "'$REQUEST_NAME'" && stringListMember(TaskType, "ROOT PROCESSING TAIL", " ")' -af ClusterId JobStatus EnteredCurrentStatus)
    TIME_OF_LAST_QUERY=$(date +"%s")
    log "HTCondor query of DAG status done on $(date '+%Y/%m/%d %H:%M:%S %Z')"
}

tailconst = "TaskType =?= \"TAIL\" && CRAB_ReqName =?= %s" % classad.quote(ad.get("CRAB_ReqName"))

rootConst = f"TaskType =?= \"ROOT\" && CRAB_ReqName =?= {classad.quote(workflow)}"

task_ads = schedd.query('TaskType =?= "ROOT" && CRAB_HC =!= "True"', QUERY_ATTRS)

const = f'CRAB_ReqName =?= {classad.quote(self.workflow)} && TaskType=?="Job"'

rootConst = f'stringListMember(TaskType, "ROOT PROCESSING TAIL", " ") && CRAB_ReqName =?= {classad.quote(self.workflow)}'
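
Since the ad name is scattered across Python and shell code, a quick scan can help check that no occurrence is missed once the rename starts. A minimal sketch using only the standard library; the function name `find_ad_usages` and the default file extensions are assumptions, not part of CRAB.

```python
import os
import re


def find_ad_usages(root, ad_name="TaskType", extensions=(".py", ".sh")):
    """List (path, line number, stripped line) for every line mentioning ad_name."""
    # Match the name only as a whole identifier, so e.g. MyTaskTypeX is skipped
    # while "+TaskType" and TaskType=?= are found.
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(ad_name) + r"(?![A-Za-z0-9_])"
    )
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fd:
                for lineno, line in enumerate(fd, 1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits
```

Calling `find_ad_usages(".")` from the top of a checkout returns the matches as a list, which can then be printed or diffed against the places listed above.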

@belforte
Member Author

do not put priority very high, since in the end it is quite some work, with changes needed in a lot of places, for "small" gain
