-
Notifications
You must be signed in to change notification settings - Fork 7
Drop needs_reboot conditional in _check_status(...)
#11
Conversation
Notes: * _check_status needs C901 because it is too complex with the addition of `juju-reboot`. It should be simplified in a later commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @NucciTheBoss!
Let’s talk about this further before it gets merged. |
Sounds good @jamesbeedy! Is there anything in particular about this proposed change that you would like to discuss? My main reasoning for adding this automatic reboot feature is that the "Machine needs reboot" conditional is interfering with my deployment scripts on clouds where the Juju base images are not as update to date with the latest Ubuntu packages. I can get around the block by manually rebooting the machines when this conditional is encountered, but the commands are cloud-specific. This makes it difficult for when you are writing a bootstrapping script intended to be used with multiple clouds. |
Hey @NucciTheBoss , The reasons we have avoided
There have been ideas discussed concerning a tighter charm code integration with slurm, where the slurmd charm code would be responsible for draining the node and then when no jobs were running and the node was drained, authorize a reboot (but this still did not account for We also have discussed triggering a reboot from an action ... but then the action never completes and we end up in a strange situation there too. |
Possibly we just remove checking if a machine needs reboot .... thoughts? |
Yeah... triggering reboots and then regaining control is always a pain. I'm certain we could find a "clever" way to trigger reboots using an action since def _on_reboot_action(self, _: ActionEvent) -> None:
RebootRequiredEvent().emit()
def _on_reboot_event(self, _: RebootRequiredEvent) -> None:
subprocess.run(["juju-reboot"]) This way you escape the inconvenient situation where Juju won't return the result of your reboot action and then the hook schedules the reboot.
I would prefer this 1000 times over scheduling a reboot/blocking deployment for an update 😅. If you all are alright with this, I can update my flock of pull requests to focus on removing these conditionals. If we still want to signal to the user that the machine requires a reboot, we could keep the needs reboot conditional but change it to just throwing a warning message in |
100% lets drop it 👊 |
Sweet! I'll get to work on removing the code then. We should discuss the issues with GPU and Infiniband drivers at the next sync. |
juju-reboot
to automatically restart slurmdbd units_check_status(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Description
This pull request adds
juju-reboot
to the_check_status(...)
function so that the machine will automatically restart if it needs security updates applied. Usingjuju-reboot
helps with automatic deployments as a human-operator is not required to manually intervene when the slurmdbd charm is first deployed.How was the code tested?
I tested this charm on a private OpenStack instance with ubuntu@22.04 base images that needed security updates to be deployed.
Checklist