Job arrays or job scripts? #66

Open
TomNicholas opened this issue Nov 12, 2018 · 1 comment

TomNicholas (Contributor) commented Nov 12, 2018

Currently experi uses the job array functionality to submit multiple jobs, but I want to question whether this is the best thing to do.

Creating a single batch script that specifies a job array means you don't have access to the {variables} in any of the scheduler options, so you can't vary (for example) the processor count (and hence potentially the resolution), or the output and error files.
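
For concreteness, here is a minimal sketch (SLURM syntax; `simulate` and the values are made up) of why this bites: the `#SBATCH` directives are parsed once at submission, so every task in the array gets identical settings.

```bash
#!/bin/bash
#SBATCH --array=0-9
# The directives below apply identically to every task: there is no way to
# write --ntasks={nprocs} here and have experi vary it per task.
#SBATCH --ntasks=4
#SBATCH --output=sim.out

# "simulate" stands in for the real executable; only the command line
# can vary, via the scheduler's array index.
simulate --input "input_${SLURM_ARRAY_TASK_ID}.yml"
```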

If experi instead created a template for a single non-array job and copied specific instances of that template into individual run directories, it could be much more flexible. This would also be better for reproducibility, because the exact options used would be stored in the directory containing the output. It would help when one job needs to be re-run or restarted (which can easily happen due to numerical instabilities), because the job file for that simulation would be stand-alone.
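
A sketch of what one such copied instance might look like (the file layout, values, and `simulate` command are hypothetical): every {variable} has been baked into a standalone script that can be resubmitted on its own.

```bash
#!/bin/bash
#SBATCH --ntasks=8               # per-job value substituted from {nprocs}
#SBATCH --output=run_01/sim.out  # output stored alongside the job file

# Hypothetical run_01/job.sh: if this simulation hits a numerical
# instability, re-running it is simply "sbatch run_01/job.sh".
simulate --nprocs 8 --resolution 256
```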

Wrapping all the commands in a bash array also makes debugging harder, as in #65.

Finally, it seems that some job schedulers don't have an option to submit arrays at all (see LoadLeveler here), which would mean they could never be supported by experi.

I suppose one advantage of the job array system is the ability to control all the jobs together using commands like scancel (or whatever the PBS equivalents are), but I'm not sure this outweighs the disadvantages of the very limited array system that PBS and SLURM currently implement.
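
In SLURM, for instance, a whole array is controlled through a single job ID (12345 below is made up), whereas independent jobs need one command each:

```bash
scancel 12345     # cancel every task in array job 12345
scancel 12345_7   # cancel only task 7 of the array
# With N separate jobs you would need N scancel calls (qdel for PBS),
# or a shell loop over the recorded job IDs.
```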

malramsay64 (Owner) commented

The last time I asked myself this question was before I had started developing experi, so it is a good thing to revisit it now.

So, easy things first:

> Finally, it seems that some job schedulers don't have an option to submit arrays at all (see LoadLeveler here), which would mean they could never be supported by experi.

Experi supports PBS because that is the scheduler on the systems I use, and I am working on SLURM support since its absence is the most common criticism of the package. I currently have no intention of adding support for other schedulers, although I'm happy for someone else to do that work and submit a pull request. I'm not particularly worried about not supporting LoadLeveler.

> Wrapping all the commands in a bash array also makes debugging harder, as in #65.

That was my mistake in not checking that the functionality actually worked properly; the kind of error you encountered shouldn't occur. Another type of error, which I would expect to be more common, is mis-specifying the variables. Having the array of commands in a single file, where they can easily be compared, makes spotting that kind of error easier. The generated files can be checked using experi --dry-run.
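
As a sketch of the kind of generated file being compared (PBS Pro syntax; the commands are illustrative, not experi's exact output), every expanded command sits on its own line, so a mis-specified variable stands out immediately:

```bash
#!/bin/bash
#PBS -J 0-2

# One fully expanded command per array index; easy to eyeball for mistakes.
COMMANDS=(
    "simulate --temperature 0.50"
    "simulate --temperature 1.00"
    "simulate --temperature 1.50"
)
${COMMANDS[$PBS_ARRAY_INDEX]}
```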

> If experi instead created a template for a single non-array job, ... flexible ... reproducibility ... one job needs to be re-run.

Summarising your paragraph: I agree that all the attributes you posit for single jobs are excellent ones to strive for, although I don't know that they are all exclusive to single files.

On the reproducibility front, I am using the experiment.yml file as my reference (see my Crystal Melting project), defining the exact specification of each different simulation. Though this is reproducibility at a much higher level than that of a single computation.

For re-running a single job I have introduced the --use-dependencies command line option, which is both a terrible name (open to suggestions) and poorly documented. It checks for the existence of each command's creates files and generates a subset of the full array containing only the commands whose files don't yet exist.
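
As an illustration of the idea only (a sketch, not experi's actual implementation; the commands and file names are invented), the filtering amounts to keeping each command whose creates target is missing:

```bash
#!/bin/bash
# Map each command to the file it creates (bash 4+ associative array).
declare -A creates=(
    ["simulate --temperature 0.50"]="output_T0.50.h5"
    ["simulate --temperature 1.00"]="output_T1.00.h5"
    ["simulate --temperature 1.50"]="output_T1.50.h5"
)

subset=()
for cmd in "${!creates[@]}"; do
    # Keep the command only if its output file does not already exist.
    [[ -e ${creates[$cmd]} ]] || subset+=("$cmd")
done
printf '%s\n' "${subset[@]}"  # the reduced set of commands to resubmit
```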

> Creating a single batch script that specifies a job array means you don't have access to the {variables} in any of the scheduler options, so you can't vary (for example) the processor count (and hence potentially the resolution), or the output and error files.

This is definitely a deficiency of array jobs; the 'solution' is to over-provision, specifying resources for the most demanding jobs and accepting wastage on the smaller ones (for example, requesting 16 cores for every task because the largest one needs them).

The way I see it, for small experiments with tens of jobs there is not really a benefit to using array jobs; where there are hundreds or thousands of variable combinations, I think there is a real benefit.
