Try scheduling as much as available #4528
Conversation
lgtm
Previously, it would only schedule about ~1,500 tasks unless the regions were totally full. Now we will schedule up to 15K tasks. We also take into account Batch's queueing (there could be other reasons for queueing besides CPU quota, though there shouldn't be) and tasks that were already scheduled but not yet sent to Batch or preprocessed, so we don't overload the queue.
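A rough sketch of the budget computation this implies (the names below are assumptions based on the diff snippets in this PR, not the exact code):

```python
def cpus_to_schedule(quota_cpus, used_cpus, waiting_tasks, cpus_per_task=1):
  """How many CPUs' worth of fuzz tasks to schedule this run."""
  available_cpus = max(quota_cpus - used_cpus, 0)
  # Subtract work already queued or scheduled so the queue isn't overloaded.
  return max(available_cpus - waiting_tasks * cpus_per_task, 0)
```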
```python
  # TODO(metzman): This doesn't distinguish between fuzz and non-fuzz
  # tasks (nor preemptible and non-preemptible CPUs). Fix this.
  waiting_tasks = sum(
      batch.count_queued_or_scheduled_tasks(project, region)
```
One option to simulate these behaviors is https://simpy.readthedocs.io/ (see also https://brooker.co.za/blog/2022/04/11/simulation.html). It is hard to imagine what these policies imply from the description alone.
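To make that concrete, here is a minimal SimPy sketch of such a simulation: a cron tops the queue up to a fixed capacity while counting tasks that are still queued or running. All constants are illustrative, not ClusterFuzz's real values.

```python
import simpy

CAPACITY = 100        # max tasks we are willing to have in flight
CRON_INTERVAL = 5     # minutes between scheduler runs
QUEUE_DELAY = 2       # minutes between publishing and reaching the queue
TASK_DURATION = 60    # minutes a task occupies a CPU

def task(env, in_flight):
  """One scheduled task: published, queued, runs, then frees capacity."""
  yield env.timeout(QUEUE_DELAY)
  yield env.timeout(TASK_DURATION)
  in_flight['count'] -= 1

def scheduler(env, in_flight):
  """Cron that fills remaining capacity, counting queued/running tasks."""
  while True:
    to_schedule = max(CAPACITY - in_flight['count'], 0)
    print(f'{env.now:4.0f}m: scheduling {to_schedule} tasks '
          f'({in_flight["count"]} already in flight)')
    in_flight['count'] += to_schedule
    for _ in range(to_schedule):
      env.process(task(env, in_flight))
    yield env.timeout(CRON_INTERVAL)

env = simpy.Environment()
env.process(scheduler(env, {'count': 0}))
env.run(until=240)  # simulate four hours
```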
```python
  logs.info(f'Soon committed CPUs: {soon_commited_cpus}')
  available_cpus = sum(
      get_available_cpus_for_region(project, region) for region in regions)
  available_cpus = max(available_cpus - soon_commited_cpus, 0)
```
Now that we take the queue size into account, we can go back to running this very frequently, right?
Hmmm... I had the opposite thought: because we can schedule so many more at once, there's no need to run it so often. There can be a slight delay between publishing and reaching the queue, so an interval above 5 minutes probably makes the most sense.
lgtm. Offered a design-time option to anticipate the real-world behavior of these batch scheduling policies.
This reverts commit d222215.
The preprocess count for fuzz tasks went to zero after #4564 got deployed, reverting. #4528 is also being reverted because it introduced the following error into the fuzz task scheduler, which caused fuzz tasks to stop being scheduled:

```
Traceback (most recent call last):
  File "/mnt/scratch0/clusterfuzz/src/python/bot/startup/run_cron.py", line 68, in <module>
    sys.exit(main())
             ^^^^^^
  File "/mnt/scratch0/clusterfuzz/src/python/bot/startup/run_cron.py", line 64, in main
    return 0 if task_module.main() else 1
                ^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/cron/schedule_fuzz.py", line 304, in main
    return schedule_fuzz_tasks()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/cron/schedule_fuzz.py", line 284, in schedule_fuzz_tasks
    available_cpus = get_available_cpus(project, regions)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/scratch0/clusterfuzz/src/clusterfuzz/_internal/cron/schedule_fuzz.py", line 247, in get_available_cpus
    result = pool.starmap_async(  # pylint: disable=no-member
             ^^^^^^^^^^^^^^^^^^
AttributeError: 'ProcessPoolExecutor' object has no attribute 'starmap_async'
```
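For context on the AttributeError: `starmap_async` is a `multiprocessing.Pool` method; `concurrent.futures.ProcessPoolExecutor` has no such attribute. A minimal illustration of the mismatch and one possible equivalent (`fetch_cpus` is a hypothetical stand-in, not the real ClusterFuzz helper):

```python
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool

def fetch_cpus(project, region):
  return 0  # stand-in for a per-region quota lookup

if __name__ == '__main__':
  args = [('my-project', 'us-central1'), ('my-project', 'us-east1')]

  # multiprocessing.Pool supports starmap_async:
  with Pool() as pool:
    counts = pool.starmap_async(fetch_cpus, args).get()

  # ProcessPoolExecutor does not. Its map() takes parallel iterables,
  # so the argument tuples have to be unzipped first:
  with ProcessPoolExecutor() as executor:
    counts = list(executor.map(fetch_cpus, *zip(*args)))
```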
Instead of trying to schedule a small number of fuzz tasks every 2 minutes and hoping this leads to fuzzing at full capacity, just schedule almost the full amount at once.