Job arrays or job scripts? #66
Comments
The last time I asked myself this question was before I had started developing experi, so it is a good thing to revisit it now. Easy things first:
Experi supports PBS because that is the scheduler on the systems I use, and I am working on SLURM support since the lack of it is the most common criticism of the package. I currently have no intention of adding support for other schedulers, although I'm happy for someone else to do that work and submit a pull request. I'm not particularly worried about not supporting LoadLeveler.
That was my mistake in not checking that the functionality actually worked properly; the kind of error you encountered shouldn't occur. Another type of error I would expect to be more common is an error in specifying the variables as you would like. Having an array of commands in a single file which can be easily compared makes spotting that kind of error easier. The generated files can be checked using
Summarising your paragraph, I agree that all the reasons you posit for single job scripts are excellent attributes to strive for, although I don't know if they are all exclusive to single files. On the reproducibility front I am using the
For a single job being re-run I have introduced the
This is definitely a deficiency of array jobs, with the 'solution' being to over-provision: specifying resources for the most demanding jobs and accepting wastage on the smaller ones. The way I see it, for small experiments with tens of jobs there is not really a benefit to using array jobs; where there are hundreds or thousands of variable combinations I think there is a real benefit.
Currently experi uses the job array functionality to submit multiple jobs, but I want to question whether this is the best thing to do.
Creating a single batch script specifying a job array means that you don't have access to the {variables} in any of the scheduler options, so you can't vary (for example) the processor count (and hence potentially the resolution), or the output and error files.
If experi instead created a template for a single non-array job, then copied specific instances of this template into individual run directories, it could be much more flexible. This would also be better for reproducibility, because the exact options used would be stored in the directory containing the output. It would help when one job needs to be re-run or restarted (which can easily happen due to numerical instabilities), because the job file for that simulation would be stand-alone.
Wrapping all the commands in a bash array also makes debugging harder, as in #65.
Finally, it seems that some job schedulers don't have an option to submit arrays (see LoadLeveler here), which would mean they could never be supported by experi.
I suppose that some of the advantages of using the job array system are the ability to control the jobs together using commands like scancel (or whatever the PBS equivalents are), but I'm not sure this outweighs the disadvantages of the very limited array system which PBS and SLURM currently have implemented.
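To make that trade-off concrete, here is a small sketch of the control granularity each scheme gives you (the job ids are made up; the scancel syntax, including the JOBID_TASK form for a single array element, is standard SLURM):

```python
def cancel_array(array_id: int, task: int = None) -> str:
    """One command controls the whole array, or a single task of it."""
    if task is None:
        return f"scancel {array_id}"          # cancel every task at once
    return f"scancel {array_id}_{task}"       # cancel just one element

def cancel_individual(job_ids: list) -> list:
    """Separately submitted jobs need one command per tracked job id."""
    return [f"scancel {job_id}" for job_id in job_ids]
```

So the array loses nothing on single-task control, but with individual jobs you have to record every id yourself to act on the set.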