
Slurmd runs the installation hook twice #85

Closed
jedel1043 opened this issue Mar 3, 2025 · 0 comments · Fixed by #86
Labels
C-slurm Component: Slurm

Comments

@jedel1043
Collaborator

Bug Description

The Slurmd charm runs the installation hook twice, which unnecessarily lengthens the installation phase.
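The log below shows why this happens: the first `install` run triggers a machine reboot, and after the reboot Juju re-queues the `install` hook, which then repeats the expensive setup steps (NHC build, RDMA packages, GPU detection). One common mitigation — a hedged sketch only, not the charm's actual implementation — is to guard the expensive steps behind a completion marker so the re-run becomes a cheap no-op:

```python
# Illustrative sketch (NOT the slurmd charm's code): an idempotency guard
# for an install hook. A sentinel file records that the expensive install
# steps already completed, so the "install" hook that Juju re-queues after
# the mid-install reboot skips the work instead of repeating it.
import tempfile
from pathlib import Path


def install_hook(sentinel: Path, log: list) -> None:
    """Run the expensive install steps only once per machine."""
    if sentinel.exists():
        log.append("skip")      # re-run after reboot: nothing left to do
        return
    log.append("install")       # first run: NHC, RDMA, GPU detection, ...
    sentinel.touch()            # mark completion only after all steps succeed


# Simulate the two hook invocations seen in the log.
with tempfile.TemporaryDirectory() as d:
    calls: list = []
    marker = Path(d) / "install-done"
    install_hook(marker, calls)  # first run: does the work
    install_hook(marker, calls)  # queued re-run: skipped
```

Writing the sentinel only after all steps succeed matters: if the install fails partway, the marker is absent and Juju's hook retry repeats the work as intended.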

To Reproduce

juju deploy slurmd --channel latest/edge --base ubuntu@24.04

Environment

➜  ~ juju --version
3.6.3-genericlinux-amd64
➜  ~ lxd --version
5.21.3 LTS

Relevant log output

controller-0: 10:33:47 INFO juju.worker.pruner.statushistory pruner config: max age: 336h0m0s, max collection size 5120M for hpc (b31a5bac-a547-4267-8756-1511d7c9452c)
controller-0: 10:33:47 INFO juju.worker.provisioner entering provisioner task loop; using provisioner pool with 16 workers
controller-0: 10:33:47 INFO juju.worker.provisioner provisioning in zones: [jedel-Canonical]
controller-0: 10:46:00 INFO juju.worker.provisioner provisioning in zones: [jedel-Canonical]
controller-0: 10:46:00 INFO juju.worker.provisioner found machine pending provisioning id:0, details:0
controller-0: 10:46:02 INFO juju.worker.provisioner trying machine 0 StartInstance in availability zone jedel-Canonical
controller-0: 10:46:04 INFO juju.worker.provisioner started machine 0 as instance juju-c9452c-0 with hardware "arch=amd64 cores=0 mem=0M virt-type=container", network config [], volumes [], volume attachments map[], subnets to zones [], lxd profiles []
controller-0: 10:46:04 INFO juju.worker.instancemutater.environ no changes necessary to machine-0 lxd profiles ([default juju-hpc])
controller-0: 10:46:05 INFO juju.worker.instancepoller machine "0" (instance ID "juju-c9452c-0") instance status changed from {"running" "Container started"} to {"running" "Running"}
controller-0: 10:46:11 INFO juju.worker.instancepoller machine "0" (instance ID "juju-c9452c-0") has new addresses: [local-cloud:10.165.206.139@alpha]
machine-0: 10:46:57 INFO juju.cmd running jujud [3.4.3 6524e5d928d904ace31f6f81820a895bf37415b7 gc go1.21.10]
machine-0: 10:46:57 DEBUG juju.cmd   args: []string{"/var/lib/juju/tools/machine-0/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "0", "--debug"}
machine-0: 10:46:57 DEBUG juju.utils setting GOMAXPROCS to 32
machine-0: 10:46:57 DEBUG juju.agent read agent config, format "2.0"
machine-0: 10:46:57 INFO juju.worker.upgradesteps upgrade steps for 3.4.3 have already been run.
machine-0: 10:46:57 DEBUG juju.cmd.jujud start "engine"
machine-0: 10:46:57 INFO juju.cmd.jujud start "engine"
machine-0: 10:46:57 DEBUG juju.worker.dependency "syslog" manifold worker started at 2025-03-03 16:46:57.67568516 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-steps-gate" manifold worker started at 2025-03-03 16:46:57.676064154 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-check-gate" manifold worker started at 2025-03-03 16:46:57.676435224 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "termination-signal-handler" manifold worker started at 2025-03-03 16:46:57.676582151 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-steps-flag" manifold worker started at 2025-03-03 16:46:57.676836691 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "charmhub-http-client" manifold worker started at 2025-03-03 16:46:57.676960805 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-check-flag" manifold worker started at 2025-03-03 16:46:57.677546308 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "clock" manifold worker started at 2025-03-03 16:46:57.677799887 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "agent" manifold worker started at 2025-03-03 16:46:57.678121032 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "api-config-watcher" manifold worker started at 2025-03-03 16:46:57.678426648 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.introspection introspection worker listening on "@jujud-machine-0"
machine-0: 10:46:57 DEBUG juju.cmd.jujud "engine" started
machine-0: 10:46:57 DEBUG juju.worker.introspection stats worker now serving
machine-0: 10:46:57 DEBUG juju.worker.apicaller connecting with old password
machine-0: 10:46:57 DEBUG juju.worker.dependency "state-config-watcher" manifold worker started at 2025-03-03 16:46:57.688878352 +0000 UTC
machine-0: 10:46:57 DEBUG juju.api successfully dialed "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:46:57 INFO juju.api connection established to "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:46:57 INFO juju.worker.apicaller [b31a5b] "machine-0" successfully connected to "10.165.206.70:17070"
machine-0: 10:46:57 DEBUG juju.worker.apicaller changing password...
machine-0: 10:46:57 DEBUG juju.worker.dependency "is-not-controller-flag" manifold worker started at 2025-03-03 16:46:57.700118714 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "is-controller-flag" manifold worker started at 2025-03-03 16:46:57.700149161 +0000 UTC
machine-0: 10:46:57 INFO juju.worker.apicaller [b31a5b] password changed for "machine-0"
machine-0: 10:46:57 DEBUG juju.api RPC connection died
machine-0: 10:46:57 DEBUG juju.worker.dependency "api-caller" manifold worker stopped: restart immediately
machine-0: 10:46:57 DEBUG juju.worker.apicaller connecting with current password
machine-0: 10:46:57 DEBUG juju.api successfully dialed "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:46:57 INFO juju.api connection established to "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:46:57 INFO juju.worker.apicaller [b31a5b] "machine-0" successfully connected to "10.165.206.70:17070"
machine-0: 10:46:57 DEBUG juju.worker.dependency "api-caller" manifold worker started at 2025-03-03 16:46:57.720330368 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrader" manifold worker started at 2025-03-03 16:46:57.729595635 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-steps-runner" manifold worker started at 2025-03-03 16:46:57.73049995 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-steps-runner" manifold worker completed successfully
machine-0: 10:46:57 DEBUG juju.worker.dependency "migration-inactive-flag" manifold worker started at 2025-03-03 16:46:57.730828299 +0000 UTC
machine-0: 10:46:57 INFO juju.worker.upgrader desired agent binary version: 3.4.3
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-check-flag" manifold worker stopped: gate unlocked
stack trace:
github.com/juju/juju/worker/gate.init:95: gate unlocked
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-check-flag" manifold worker started at 2025-03-03 16:46:57.75380146 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "valid-credential-flag" manifold worker started at 2025-03-03 16:46:57.75653853 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "migration-fortress" manifold worker started at 2025-03-03 16:46:57.764769327 +0000 UTC
machine-0: 10:46:57 INFO juju.worker.deployer new context: units "", stopped ""
machine-0: 10:46:57 DEBUG juju.worker.dependency "deployer" manifold worker started at 2025-03-03 16:46:57.765895981 +0000 UTC
machine-0: 10:46:57 INFO juju.worker.deployer checking unit "slurmd/0"
machine-0: 10:46:57 INFO juju.worker.deployer deploying unit "slurmd/0"
machine-0: 10:46:57 DEBUG juju.worker.dependency "migration-minion" manifold worker started at 2025-03-03 16:46:57.775753044 +0000 UTC
machine-0: 10:46:57 INFO juju.worker.deployer creating new agent config for "slurmd/0"
machine-0: 10:46:57 DEBUG juju.agent read agent config, format "2.0"
machine-0: 10:46:57 INFO juju.worker.deployer starting workers for "slurmd/0"
machine-0: 10:46:57 DEBUG juju.worker.deployer start "slurmd/0"
machine-0: 10:46:57 INFO juju.worker.deployer start "slurmd/0"
machine-0: 10:46:57 DEBUG juju.worker.deployer created rotating log file "/var/log/juju/unit-slurmd-0.log" with max size 100 MB and max backups 2
machine-0: 10:46:57 DEBUG juju.worker.introspection introspection worker listening on "@jujud-unit-slurmd-0"
machine-0: 10:46:57 DEBUG juju.worker.deployer "slurmd/0" started
machine-0: 10:46:57 DEBUG juju.worker.introspection stats worker now serving
machine-0: 10:46:57 INFO juju.worker.migrationminion migration migration phase is now: NONE
machine-0: 10:46:57 DEBUG juju.worker.machinesetup Starting machine setup requiring an API connection
machine-0: 10:46:57 DEBUG juju.worker.logger initial log config: "<root>=DEBUG"
machine-0: 10:46:57 DEBUG juju.worker.fanconfigurer Processing new fan config
machine-0: 10:46:57 DEBUG juju.worker.dependency "ssh-authkeys-updater" manifold worker started at 2025-03-03 16:46:57.786825068 +0000 UTC
machine-0: 10:46:57 INFO juju.worker.logger logger worker started
machine-0: 10:46:57 DEBUG juju.worker.dependency "host-key-reporter" manifold worker started at 2025-03-03 16:46:57.786941888 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "state-converter" manifold worker started at 2025-03-03 16:46:57.786988436 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "disk-manager" manifold worker started at 2025-03-03 16:46:57.78700645 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "log-sender" manifold worker started at 2025-03-03 16:46:57.787021048 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "machine-action-runner" manifold worker started at 2025-03-03 16:46:57.78702786 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "upgrade-series" manifold worker started at 2025-03-03 16:46:57.787032479 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "logging-config-updater" manifold worker started at 2025-03-03 16:46:57.78703802 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "reboot-executor" manifold worker started at 2025-03-03 16:46:57.787042207 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "api-address-updater" manifold worker started at 2025-03-03 16:46:57.787048088 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "storage-provisioner" manifold worker started at 2025-03-03 16:46:57.787052507 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "proxy-config-updater" manifold worker started at 2025-03-03 16:46:57.787056665 +0000 UTC
machine-0: 10:46:57 DEBUG juju.worker.dependency "agent-config-updater" manifold worker stopped: resource permanently unavailable
stack trace:
resource permanently unavailable
github.com/juju/juju/worker/fortress.Occupy:63:
github.com/juju/juju/cmd/jujud/agent/engine.Housing.Decorate.occupyStart.func1:93:
machine-0: 10:46:57 DEBUG juju.utils.ssh reading authorised keys file /home/ubuntu/.ssh/authorized_keys
machine-0: 10:46:57 DEBUG juju.utils.ssh reading authorised keys file /home/ubuntu/.ssh/authorized_keys
machine-0: 10:46:57 DEBUG juju.utils.ssh writing authorised keys file /home/ubuntu/.ssh/authorized_keys
machine-0: 10:46:57 DEBUG juju.worker.logger reconfiguring logging from "<root>=DEBUG" to "<root>=INFO"
machine-0: 10:46:57 ERROR juju.worker.dependency "lxd-container-provisioner" manifold worker returned unexpected error: container types not yet available
machine-0: 10:46:57 ERROR juju.worker.dependency "broker-tracker" manifold worker returned unexpected error: no container types determined
machine-0: 10:46:57 ERROR juju.worker.dependency "kvm-container-provisioner" manifold worker returned unexpected error: container types not yet available
machine-0: 10:46:57 INFO juju.api connection established to "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:46:57 INFO juju.worker.authenticationworker "machine-0" key updater worker started
machine-0: 10:46:57 INFO juju.worker.upgradeseries no series upgrade lock present
machine-0: 10:46:57 INFO juju.worker.machiner setting addresses for "machine-0" to [local-machine:127.0.0.1 local-cloud:10.165.206.139 local-machine:::1]
machine-0: 10:46:57 WARNING juju.worker.machinesetup determining kvm support: INFO: /dev/kvm does not exist
HINT:   sudo modprobe kvm_amd
modprobe: FATAL: Module msr not found in directory /lib/modules/6.8.0-54-generic
: exit status 1
no kvm containers possible
machine-0: 10:46:57 INFO juju.worker.machiner "machine-0" started
machine-0: 10:46:57 INFO juju.api connection established to "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
unit-slurmd-0: 10:46:57 INFO juju Starting unit workers for "slurmd/0"
unit-slurmd-0: 10:46:57 INFO juju.worker.apicaller [b31a5b] "unit-slurmd-0" successfully connected to "10.165.206.70:17070"
unit-slurmd-0: 10:46:57 INFO juju.worker.apicaller [b31a5b] password changed for "unit-slurmd-0"
unit-slurmd-0: 10:46:57 INFO juju.worker.apicaller [b31a5b] "unit-slurmd-0" successfully connected to "10.165.206.70:17070"
unit-slurmd-0: 10:46:57 INFO juju.worker.upgrader no waiter, upgrader is done
unit-slurmd-0: 10:46:57 INFO juju.worker.migrationminion migration migration phase is now: NONE
unit-slurmd-0: 10:46:57 INFO juju.worker.logger logger worker started
machine-0: 10:46:57 INFO juju.worker.leadership slurmd/0 promoted to leadership of slurmd
unit-slurmd-0: 10:46:57 ERROR juju.worker.meterstatus error running "meter-status-changed": charm missing from disk
machine-0: 10:46:57 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-slurmd-0
machine-0: 10:46:57 INFO juju.agent.tools was a symlink, now looking at /var/lib/juju/tools/3.4.3-ubuntu-amd64
unit-slurmd-0: 10:46:57 INFO juju.worker.uniter unit "slurmd/0" started
unit-slurmd-0: 10:46:57 INFO juju.worker.uniter resuming charm install
unit-slurmd-0: 10:46:57 INFO juju.worker.uniter.charm downloading ch:amd64/noble/slurmd-115 from API server
machine-0: 10:46:57 INFO juju.downloader downloading from ch:amd64/noble/slurmd-115
machine-0: 10:46:57 INFO juju.downloader download complete ("ch:amd64/noble/slurmd-115")
machine-0: 10:46:57 INFO juju.downloader download verified ("ch:amd64/noble/slurmd-115")
machine-0: 10:47:01 INFO juju.worker.kvmprovisioner machine-0 does not support kvm container
machine-0: 10:47:01 INFO juju.container.lxd Availability zone will be empty for this container manager
machine-0: 10:47:01 INFO juju.container unused config option: "container-networking-method" -> "local"
unit-slurmd-0: 10:47:03 INFO juju.worker.uniter hooks are retried true
unit-slurmd-0: 10:47:04 INFO juju.worker.uniter.storage initial storage attachments ready
unit-slurmd-0: 10:47:04 INFO juju.worker.uniter found queued "install" hook
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install Restarting services...
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install Service restarts being deferred:
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install  /etc/needrestart/restart.d/dbus.service
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install  systemctl restart systemd-logind.service
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install  systemctl restart unattended-upgrades.service
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install No containers need to be restarted.
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install No user sessions are running outdated binaries.
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install
unit-slurmd-0: 10:47:05 WARNING unit.slurmd/0.install No VM guests are running outdated hypervisor (qemu) binaries on this host.
unit-slurmd-0: 10:47:06 INFO unit.slurmd/0.juju-log Running legacy hooks/install.
unit-slurmd-0: 10:47:06 INFO unit.slurmd/0.juju-log rebooting unit slurmd/0
unit-slurmd-0: 10:47:06 INFO juju.worker.uniter.context trying to kill context process 7491
unit-slurmd-0: 10:47:06 INFO juju.worker.uniter.context waiting for context process 7491 to die
unit-slurmd-0: 10:47:06 INFO juju.worker.uniter.context kill returned: os: process already finished
unit-slurmd-0: 10:47:06 INFO juju.worker.uniter.context assuming already killed
unit-slurmd-0: 10:47:07 INFO juju.worker.uniter.operation ran "install" hook (via hook dispatching script: dispatch)
unit-slurmd-0: 10:47:07 INFO juju.worker.uniter unit "slurmd/0" shutting down: machine needs to reboot
machine-0: 10:47:07 INFO juju.worker.deployer stopped "slurmd/0", err: machine needs to reboot
machine-0: 10:47:07 ERROR juju.worker.deployer fatal error "slurmd/0": machine needs to reboot
machine-0: 10:47:28 INFO juju.cmd running jujud [3.4.3 6524e5d928d904ace31f6f81820a895bf37415b7 gc go1.21.10]
machine-0: 10:47:28 DEBUG juju.cmd   args: []string{"/var/lib/juju/tools/machine-0/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "0", "--debug"}
machine-0: 10:47:28 DEBUG juju.utils setting GOMAXPROCS to 32
machine-0: 10:47:28 DEBUG juju.agent read agent config, format "2.0"
machine-0: 10:47:28 INFO juju.agent.setup setting logging config to "<root>=INFO"
machine-0: 10:47:28 INFO juju.worker.upgradesteps upgrade steps for 3.4.3 have already been run.
machine-0: 10:47:28 INFO juju.cmd.jujud start "engine"
machine-0: 10:47:28 INFO juju.api connection established to "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:47:28 INFO juju.worker.apicaller [b31a5b] "machine-0" successfully connected to "10.165.206.70:17070"
machine-0: 10:47:28 INFO juju.worker.upgrader desired agent binary version: 3.4.3
machine-0: 10:47:28 INFO juju.worker.deployer new context: units "slurmd/0", stopped ""
machine-0: 10:47:28 INFO juju.worker.deployer creating new agent config for "slurmd/0"
machine-0: 10:47:28 INFO juju.worker.deployer starting workers for "slurmd/0"
machine-0: 10:47:28 INFO juju.worker.deployer start "slurmd/0"
machine-0: 10:47:28 INFO juju.worker.deployer checking unit "slurmd/0"
machine-0: 10:47:28 INFO juju.worker.deployer checking unit "slurmd/0"
machine-0: 10:47:28 INFO juju.api connection established to "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:47:28 INFO juju.worker.migrationminion migration migration phase is now: NONE
machine-0: 10:47:28 INFO juju.worker.logger logger worker started
machine-0: 10:47:28 INFO juju.api connection established to "wss://10.165.206.70:17070/model/b31a5bac-a547-4267-8756-1511d7c9452c/api"
machine-0: 10:47:28 INFO juju.worker.kvmprovisioner machine-0 does not support kvm container
machine-0: 10:47:28 INFO juju.worker.upgradeseries no series upgrade lock present
machine-0: 10:47:28 INFO juju.worker.machiner setting addresses for "machine-0" to [local-machine:127.0.0.1 local-cloud:10.165.206.139 local-machine:::1]
machine-0: 10:47:28 INFO juju.container.lxd Availability zone will be empty for this container manager
machine-0: 10:47:28 INFO juju.container unused config option: "container-networking-method" -> "local"
unit-slurmd-0: 10:47:28 INFO juju Starting unit workers for "slurmd/0"
unit-slurmd-0: 10:47:28 INFO juju.worker.apicaller [b31a5b] "unit-slurmd-0" successfully connected to "10.165.206.70:17070"
unit-slurmd-0: 10:47:28 INFO juju.worker.apicaller [b31a5b] "unit-slurmd-0" successfully connected to "10.165.206.70:17070"
unit-slurmd-0: 10:47:28 INFO juju.worker.upgrader no waiter, upgrader is done
machine-0: 10:47:28 INFO juju.worker.machiner "machine-0" started
machine-0: 10:47:28 WARNING juju.worker.machinesetup determining kvm support: INFO: /dev/kvm does not exist
HINT:   sudo modprobe kvm_amd
modprobe: FATAL: Module msr not found in directory /lib/modules/6.8.0-54-generic
: exit status 1
no kvm containers possible
unit-slurmd-0: 10:47:28 INFO juju.worker.migrationminion migration migration phase is now: NONE
unit-slurmd-0: 10:47:28 INFO juju.worker.logger logger worker started
machine-0: 10:47:28 INFO juju.worker.authenticationworker "machine-0" key updater worker started
machine-0: 10:47:28 INFO juju.worker.leadership slurmd/0 promoted to leadership of slurmd
machine-0: 10:47:28 INFO juju.agent.tools ensure jujuc symlinks in /var/lib/juju/tools/unit-slurmd-0
machine-0: 10:47:28 INFO juju.agent.tools was a symlink, now looking at /var/lib/juju/tools/3.4.3-ubuntu-amd64
unit-slurmd-0: 10:47:28 INFO juju.worker.uniter unit "slurmd/0" started
unit-slurmd-0: 10:47:28 INFO juju.worker.uniter hooks are retried true
unit-slurmd-0: 10:47:28 INFO juju.worker.uniter.storage initial storage attachments ready
unit-slurmd-0: 10:47:28 INFO juju.worker.uniter found queued "install" hook
unit-slurmd-0: 10:47:28 INFO unit.slurmd/0.juju-log Running legacy hooks/install.
unit-slurmd-0: 10:47:28 INFO unit.slurmd/0.juju-log parsed 4 apt package repositories from /etc/apt/sources.list.d/ubuntu.sources
unit-slurmd-0: 10:47:28 INFO unit.slurmd/0.juju-log ['add-apt-repository', '--yes', '--sourceslist=deb https://ppa.launchpadcontent.net/ubuntu-hpc/experimental/ubuntu noble main', '--no-update']
unit-slurmd-0: 10:48:19 INFO unit.slurmd/0.juju-log installing node health check (nhc)
unit-slurmd-0: 10:48:19 INFO unit.slurmd/0.juju-log extracting nhc tarball
unit-slurmd-0: 10:48:19 INFO unit.slurmd/0.juju-log building nhc with autotools
unit-slurmd-0: 10:48:22 INFO unit.slurmd/0.juju-log testing nhc build
unit-slurmd-0: 10:48:25 INFO unit.slurmd/0.juju-log installing nhc
unit-slurmd-0: 10:48:25 INFO unit.slurmd/0.juju-log installing RDMA packages: ['rdma-core', 'infiniband-diags']
unit-slurmd-0: 10:48:29 INFO unit.slurmd/0.juju-log enabling OpenMPI UCX transport in /etc/openmpi/openmpi-mca-params.conf
unit-slurmd-0: 10:48:29 INFO unit.slurmd/0.juju-log detecting GPUs and installing drivers
unit-slurmd-0: 10:48:32 INFO unit.slurmd/0.juju-log no GPU drivers requiring installation
unit-slurmd-0: 10:48:32 WARNING unit.slurmd/0.install Created symlink /etc/systemd/system/multi-user.target.wants/juju-slurmd-0-systemd-notices.service → /etc/systemd/system/juju-slurmd-0-systemd-notices.service.
unit-slurmd-0: 10:48:33 INFO unit.slurmd/0.juju-log parsed 2 apt package repositories from /etc/apt/sources.list.d/archive_uri-https_ppa_launchpadcontent_net_ubuntu-hpc_experimental_ubuntu-noble.list
unit-slurmd-0: 10:48:33 INFO unit.slurmd/0.juju-log parsed 4 apt package repositories from /etc/apt/sources.list.d/ubuntu.sources
unit-slurmd-0: 10:48:33 INFO unit.slurmd/0.juju-log ['add-apt-repository', '--yes', '--sourceslist=deb https://ppa.launchpadcontent.net/ubuntu-hpc/experimental/ubuntu noble main', '--no-update']
unit-slurmd-0: 10:48:35 INFO unit.slurmd/0.juju-log installing node health check (nhc)
unit-slurmd-0: 10:48:35 INFO unit.slurmd/0.juju-log extracting nhc tarball
unit-slurmd-0: 10:48:35 INFO unit.slurmd/0.juju-log building nhc with autotools
unit-slurmd-0: 10:48:38 INFO unit.slurmd/0.juju-log testing nhc build
unit-slurmd-0: 10:48:41 INFO unit.slurmd/0.juju-log installing nhc
unit-slurmd-0: 10:48:41 INFO unit.slurmd/0.juju-log installing RDMA packages: ['rdma-core', 'infiniband-diags']
unit-slurmd-0: 10:48:41 INFO unit.slurmd/0.juju-log enabling OpenMPI UCX transport in /etc/openmpi/openmpi-mca-params.conf
unit-slurmd-0: 10:48:41 INFO unit.slurmd/0.juju-log detecting GPUs and installing drivers
unit-slurmd-0: 10:48:42 INFO unit.slurmd/0.juju-log no GPU drivers requiring installation
unit-slurmd-0: 10:48:43 INFO juju.worker.uniter.operation ran "install" hook (via hook dispatching script: dispatch)
unit-slurmd-0: 10:48:43 INFO juju.worker.uniter found queued "leader-elected" hook
unit-slurmd-0: 10:48:43 INFO juju.worker.uniter.operation ran "leader-elected" hook (via hook dispatching script: dispatch)
unit-slurmd-0: 10:48:44 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-slurmd-0: 10:48:44 INFO juju.worker.uniter found queued "start" hook
unit-slurmd-0: 10:48:44 INFO unit.slurmd/0.juju-log Running legacy hooks/start.
unit-slurmd-0: 10:48:44 INFO juju.worker.uniter.operation ran "start" hook (via hook dispatching script: dispatch)

Additional context

No response
