SLURM (Simple Linux Utility for Resource Management) is a queue management system. It was developed at Lawrence Livermore National Laboratory and currently runs some of the largest compute clusters in the world.
SLURM is similar in many ways to most other queue systems. You write a batch script, then submit it to the queue manager. The queue manager then schedules your job to run on the queue (or partition, in SLURM parlance) that you designate. Below we outline how to submit jobs to SLURM, how SLURM decides when to schedule your job, and how to monitor progress.
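As a rough illustration of the batch script mentioned above, here is a minimal sketch. The partition name `general` and the output file pattern are assumptions chosen for illustration; substitute the partitions and limits that exist on your cluster. The `#SBATCH` lines are ordinary shell comments that `sbatch` reads as job options.

```shell
#!/bin/bash
# Hypothetical runscript.sh: a minimal serial batch job.
#SBATCH -p general            # partition (queue) to run in; "general" is an assumed name
#SBATCH -t 10                 # run time limit in minutes
#SBATCH --mem 1000            # memory request in MB
#SBATCH -o runscript_%j.out   # standard output file; %j expands to the job ID

# SLURM sets SLURM_JOB_ID inside a running job; fall back to a marker when run by hand.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_id} started"
```

Saved as `runscript.sh`, a script like this would be submitted with `sbatch runscript.sh`. Because the directives are comments, the same file can also be run directly with `bash` for a quick syntax check.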
SLURM has a number of valuable features compared to other job management systems:
Summary of SLURM commands
The table below shows a summary of SLURM commands. These commands are described in more detail below along with links to the SLURM doc site.
Task | SLURM | SLURM Example |
---|---|---|
Submit a batch serial job | sbatch | sbatch runscript.sh |
Run a script interactively | srun | srun --pty -p interact -t 10 --mem 1000 /bin/bash /bin/hostname |
Kill a job | scancel | scancel 999999 |
View status of queues | squeue | squeue -u akitzmiller |
Check current job by id | sacct | sacct -j 999999 |
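The commands in the table are often used together. The sketch below strings them into one possible workflow: submit, check the queue, then check the job record. It assumes a script named `runscript.sh` exists in the current directory, and it skips the SLURM calls entirely when `sbatch` is not installed; the `status` variable is only an illustrative helper.

```shell
#!/bin/bash
# Sketch of a submit-and-monitor workflow using the commands from the table.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups).
    job_id=$(sbatch --parsable runscript.sh)
    job_id=${job_id%%;*}          # drop the optional ";cluster" suffix

    if [ -n "$job_id" ]; then
        squeue -j "$job_id"       # is the job pending or running?
        sacct -j "$job_id"        # accounting record for the job
        # scancel "$job_id"       # uncomment to kill the job
        status="submitted"
    else
        status="submit-failed"
    fi
else
    status="slurm-unavailable"
    echo "sbatch not found; run this on a SLURM login node."
fi
```

Capturing the job ID at submission time avoids having to look it up later with `squeue -u` before calling `sacct` or `scancel`.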