use a Submit object, not a classAd (dagAd) as argument to schedd.submit #8336

Closed
Tracked by #8337
belforte opened this issue Apr 15, 2024 · 4 comments

Comments

@belforte

An htcondor.Submit object should be used as the first argument to schedd.submit in

clusterId = schedd.submit(dagAd, 1, True, resultAds)

as documented in the submit paragraph of "Interacting with schedulers";
see also #8333 (comment).

Using the submit-file syntax makes the description simpler and clearer, and leaves the conversion to a classAd to HTCondor.

This is not trivial, since we need to make sure that all submission ads are handled properly; submitting a few test tasks is not enough to validate the change.

Hopefully DagmanSubmitter will be easier to read after this.

@belforte belforte mentioned this issue Apr 15, 2024
13 tasks
@belforte belforte changed the title from "use a Submit object, not a classAd (dagAd) as arg go schedd.submit" to "use a Submit object, not a classAd (dagAd) as argument to schedd.submit" Apr 15, 2024
@belforte belforte self-assigned this Apr 15, 2024
@belforte

belforte commented May 29, 2024

Harder than I thought :-(

After some struggle I managed to submit a dagman to the scheduler. By design it sits in HOLD waiting for the spool() command, as in the current code

clusterId = schedd.submit(dagAd, 1, True, resultAds)
schedd.spool(resultAds)

But I can't find a way to make schedd.spool() work. The documentation appears inconsistent with reality.
I have sent a mail to support.

mail text

Hi. I am trying to move to new way of submitting in python bindings
https://htcondor.readthedocs.io/en/latest/apis/python-bindings/api/htcondor.html#interacting-with-schedulers

But am facing an unexpected obstacle which I do not know how to solve.
Can you help ?

I am using bindings from HTCondor 23.7.2

Since in context of CMS CRAB we submit to remote schedulers,
I have to use

submitResult = schedd.submit(submit_object,...)
schedd.spool(the correct arg !)

So far we have been using the old format
clusterId = schedd.submit(classAd, returnAd)
schedd.spool(returnAd)

I have converted my code, which used to pass a classAd object,
to create a Submit object instead, and I think I am getting there since
I get a SubmitResult object with the expected content
(Pdb) type(submitResult)
<class 'htcondor.htcondor.SubmitResult'>
(Pdb) submitResult.cluster()
104893441
(Pdb) submitResult.first_proc()
0
(Pdb) submitResult.num_procs()
1
(Pdb) type(submitResult.clusterad())
<class 'classad.classad.ClassAd'>

But I can't find a proper argument for schedd.spool()
the documentation says:

spool(ad_list) → None :

Spools the files specified in a list of job ClassAds to the condor_schedd.

Parameters

    ad_list (list[ClassAds]) – A list of job descriptions; typically, this is the list returned by the jobs() method on the submit result object.
Raises

    RuntimeError – if there are any errors.

Note the :
typically, this is the list returned by the jobs() method on the submit result object.

But the submit result object does not have a jobs() method !!

I tried
schedd.spool(submitResult.clusterad()) , but (abbreviated)
Boost error: did not match C++ signature:

I tried with a list, but
(Pdb) schedd.spool([submitResult.clusterad()])
*** htcondor.HTCondorIOError: DCSchedd::spoolJobFiles:1:Job ad 0 did not have a proc id
(Pdb)

submitResult.clusterad() has a 'ClusterId' attribute but indeed no ProcId, Proc, or "proc id"
(Pdb) submitResult.clusterad()['clusterid']
104893441
(Pdb) submitResult.clusterad()['procid']
*** KeyError: 'procid'
(Pdb) submitResult.clusterad()['proc']
*** KeyError: 'proc'
(Pdb) submitResult.clusterad()['proc id']
*** KeyError: 'proc id'
(Pdb)

I am out of ideas

Thanks
Stefano

Here's ToddM's answer

> But submitResult object does not have a jobs() method

    I'd always assumed that the documentation meant "the Submit object's jobs() method", but I'd never actually tested it; it turns out to be just a little more complicated than that.  Full example that seems to work for me follows.  (Job will go on hold because it doesn't produce the specified output, but that makes it easier to check if everything got spooled correctly.)  Let me know if this works for you.

-- ToddM


#!/usr/bin/env python3

import htcondor

collector = htcondor.Collector()
location = collector.locate(htcondor.DaemonTypes.Schedd, "azaphrael.org" )
schedd = htcondor.Schedd(location)

submit = htcondor.Submit("""
    universe = vanilla
    executable = /bin/sleep
    arguments = 1
    transfer_executable = false
    should_transfer_files = true

    transfer_input_files = /tmp/input.txt
    transfer_output_files = output.txt

    queue 1
""")


result = schedd.submit(
    submit,
    spool=True
)

wtaf = submit.jobs(
    count=result.num_procs(),
    clusterid=result.cluster(),
)
schedd.spool(list(wtaf))
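
For reference, the pattern in ToddM's example can be wrapped in a small helper. This is only a sketch: the name `submit_and_spool` is made up here, and it relies only on the calls that appear in the example above (`schedd.submit(submit, spool=True)`, `submit.jobs(count=..., clusterid=...)` and `schedd.spool(list)`). It takes the schedd and the Submit object as arguments, so it does not import htcondor itself.

```python
def submit_and_spool(schedd, submit):
    """Submit with spool=True, then spool the input files of the new cluster.

    `schedd` is expected to behave like htcondor.Schedd and `submit` like
    htcondor.Submit (hypothetical helper, not part of the bindings).
    """
    # With spool=True the job is left on hold until spool() is called.
    result = schedd.submit(submit, spool=True)

    # The SubmitResult has no jobs() method; the per-proc job ads come from
    # the Submit object, told which cluster and how many procs were created.
    job_ads = list(
        submit.jobs(
            count=result.num_procs(),
            clusterid=result.cluster(),
        )
    )

    # spool() wants a list of job ads, each carrying a proc id.
    schedd.spool(job_ads)
    return result.cluster()
```

The point of going through `submit.jobs()` is that the ads it yields carry a proc id, which is what the cluster ad from `submitResult.clusterad()` was missing in the failed attempt above.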

@belforte

current state of the art is https://github.com/belforte/CRABServer/tree/add-user-policy-to-tape-recall-8354

By the way, #8333 needs to be done at the same time as this.

@belforte

belforte commented May 29, 2024

Back to mundane things: a bunch of things have different names in JDL vs. classAd (e.g. TransferInput vs. transfer_input_files), and there are custom ads to which I did not initially add the + because they lack the telling initial CRAB_ (e.g. MaxWallTimeMinsProbe and similar ones used for automatic splitting).

HTC ads which need a different name in JDL:

/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'TransferInput = InputFiles.tar.gz, subdag.ad' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'Err = /data/srv/tmp/_240529_160122:belforte_crab_20240529_18011856z6jrw8/request.err' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'Out = /data/srv/tmp/_240529_160122:belforte_crab_20240529_18011856z6jrw8/request.out' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'OtherJobRemoveRequirements = DAGManJobId =?= ClusterId' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'TransferOutput = RunJobs.dag.dagman.out, RunJobs.dag.rescue.001' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)

CRAB private ads which need the leading + sign. The first is a schedd ad coming from the collector via HTCondorLocator, not a job ad! (And by the way it is always null for us; maybe it was needed once upon a time.)

/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'RemoteCondorSetup = ' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'MaxWallTimeMinsTail = 225' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'MaxWallTimeMinsProbe = 60' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'MaxWallTimeMinsRun = 30' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/htcondor/_lock.py:70: UserWarning: the line 'MaxWallTimeMins = 30' was unused by Submit object. Is it a typo?
  rv = func(*args, **kwargs)
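
The two kinds of warnings above suggest a small key-translation step before building the Submit object. The following is a minimal sketch, not the code that went into CRABServer: the names `RENAMED`, `CUSTOM` and `to_submit_key` are invented for illustration. The TransferInput pair comes from this comment; the other three renames are the usual submit commands for those job-ad attributes and should be checked against the HTCondor manual before relying on them.

```python
# Job-ad attribute names that have a dedicated submit-file keyword.
# Only the first pair is named in this issue; the rest are assumptions
# based on the standard condor_submit commands.
RENAMED = {
    "TransferInput": "transfer_input_files",
    "TransferOutput": "transfer_output_files",
    "Err": "error",
    "Out": "output",
}

# Custom attributes without the CRAB_ prefix, taken from the warnings above.
CUSTOM = {
    "MaxWallTimeMins",
    "MaxWallTimeMinsRun",
    "MaxWallTimeMinsProbe",
    "MaxWallTimeMinsTail",
}


def to_submit_key(name):
    """Return the key to use in a submit description for a classAd name."""
    if name in RENAMED:
        return RENAMED[name]
    # Custom ads must carry a leading '+' or the Submit object ignores them.
    if name in CUSTOM or name.startswith("CRAB_"):
        return "+" + name
    return name


def translate_keys(ad_items):
    """Apply to_submit_key to every key of a dict of attribute -> value."""
    return {to_submit_key(key): value for key, value in ad_items.items()}
```

Note that this only renames keys. Values of '+' attributes are classAd expressions, so string values still need to be quoted by the caller.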

belforte added a commit to belforte/CRABServer that referenced this issue May 30, 2024
belforte added a commit to belforte/CRABServer that referenced this issue May 30, 2024
@belforte

belforte commented May 31, 2024

Submission works now, but automatic splitting fails to submit the subdag with

Submitting job(s)ERROR: unable to read proxy file
ERROR: condor_submit failed; aborting.

fixed.

belforte added a commit to belforte/CRABServer that referenced this issue May 31, 2024
belforte added a commit to belforte/CRABServer that referenced this issue May 31, 2024
belforte added a commit to belforte/CRABServer that referenced this issue Jun 5, 2024
belforte added a commit that referenced this issue Jun 5, 2024
* Revert "Revert "Run test jobs on crab sched 903 (#8472)" (#8474)"

This reverts commit 0665454.

* Revert "Run test jobs on crab sched 903 (#8472)"

This reverts commit c5ec3ef.

* Revert "ensure proxyfile in RestInfoForFileTransfers.json is a filename w/o path. Fix #8464 (#8467)"

This reverts commit 9ced4fd.

* Revert "workaround for #8456 (#8466)"

This reverts commit 602f8d6.

* Revert "Update makeTests.py: collector param does not allow port #. Simply put FNAL first"

This reverts commit 7ac2b90.

* Revert "Update makeTests.py: add collector port for ITB"

This reverts commit f6c01eb.

* Revert "do not set RequestCpus in task submission JDL. Fix #8456 (#8457)"

This reverts commit 198e2d3.

* Revert "pass string, not bytes, to htcondor.param Fix #8450 (#8452)"

This reverts commit 856d1ef.

* Revert "schedd.xquery is deprecated. Use schedd.query. Fix #8447 (#8449)"

This reverts commit b129645.

* Revert "new format of schedd.submit)/spool() fix #8336 fix #8333 (#8448)"

This reverts commit 806226a.

* Revert "do not indicate unused args in FTS calls. Fix #8460 (#8475)"

This reverts commit 20d4f90.
belforte added a commit to belforte/CRABServer that referenced this issue Jun 28, 2024
…m#8448)

* new format of schedd.submit)/spool() fix dmwm#8336 fix dmwm#8333

* pylint
belforte added a commit to belforte/CRABServer that referenced this issue Jun 28, 2024