Slurm version 24.05.4 has been released, including a fix for a recently discovered security issue in the new stepmgr subsystem. A mistake in authentication handling in stepmgr could permit an attacker to execute processes under other users' jobs. The issue is limited to jobs running with --stepmgr, or to systems that have enabled stepmgr globally through "SlurmctldParameters=enable_stepmgr" in their configuration.
The stepmgr feature was introduced in Slurm 24.05 this year. It offloads job step management from the Slurm controller. This suits systems with heavy step usage because it improves job concurrency and reduces RPC congestion on the controller. You can configure it per account, per partition, or globally.
Sites that have this configuration in place and are running a vulnerable release (Slurm 24.05.x before 24.05.4) should update to version 24.05.4.
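As a rough aid for administrators, the sketch below checks the two conditions described above: a 24.05.x release older than 24.05.4, and a slurm.conf that sets "enable_stepmgr" in SlurmctldParameters. The helper names `is_vulnerable_version` and `stepmgr_enabled_globally` are hypothetical. The sketch also assumes the version string looks like the output of `scontrol --version` (e.g. "slurm 24.05.3"). It does not follow `Include` directives, and it cannot detect individual jobs submitted with --stepmgr, so treat it as a starting point rather than a complete audit.

```python
import re


def is_vulnerable_version(version: str) -> bool:
    """Return True for Slurm 24.05.x releases before 24.05.4 (hypothetical helper)."""
    # Accept strings such as "24.05.3" or "slurm 24.05.3".
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, micro = (int(g) for g in match.groups())
    # stepmgr only exists in the 24.05 series; the fix landed in 24.05.4.
    return (major, minor) == (24, 5) and micro < 4


def stepmgr_enabled_globally(slurm_conf_text: str) -> bool:
    """Return True if SlurmctldParameters lists enable_stepmgr (hypothetical helper).

    Assumption: only the given text is scanned; Include files are ignored.
    """
    for raw_line in slurm_conf_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop comments
        # A slurm.conf line may hold several whitespace-separated key=value pairs.
        for token in line.split():
            key, sep, value = token.partition("=")
            # slurm.conf parameter names are case-insensitive.
            if sep and key.lower() == "slurmctldparameters":
                options = [opt.strip().lower() for opt in value.split(",")]
                if "enable_stepmgr" in options:
                    return True
    return False


if __name__ == "__main__":
    conf = "ClusterName=demo\nSlurmctldParameters=enable_stepmgr\n"
    exposed = is_vulnerable_version("slurm 24.05.3") and stepmgr_enabled_globally(conf)
    print("update to 24.05.4 recommended" if exposed else "no global stepmgr exposure found")
```

A commented-out line such as "#SlurmctldParameters=enable_stepmgr" is ignored. A release like 24.05.4 or 23.11.x is reported as not vulnerable.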
Details about this vulnerability can be found here: https://nvd.nist.gov/vuln/detail/CVE-2024-48936