SLURM basics: Revision

SLURM is a queue management system and stands for Simple Linux Utility for Resource Management. SLURM was developed at the Lawrence Livermore National Lab and currently runs some of the largest compute clusters in the world.

SLURM is similar in many ways to most other queue systems. You write a batch script and then submit it to the queue manager, which schedules your job to run on the queue (a partition, in SLURM parlance) that you designate. Below we outline how to submit jobs to SLURM, how SLURM decides when to schedule your job, and how to monitor its progress.
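As an illustration, here is a minimal sketch of a serial batch script. The partition name general, the job name and the output file pattern are placeholders (assumptions, not values from any particular cluster); substitute the partition names your site defines. Because #SBATCH lines are ordinary comments to bash, the same file also runs outside SLURM, which is handy for a quick check of the script body.

```shell
#!/bin/bash
#SBATCH --job-name=hello_slurm       # name shown by squeue
#SBATCH --partition=general          # hypothetical partition; use one from your cluster
#SBATCH --time=00:05:00              # wall-clock limit (HH:MM:SS)
#SBATCH --ntasks=1                   # a single serial task
#SBATCH --mem=1000                   # memory for the job, in MB
#SBATCH --output=hello_%j.out        # %j expands to the job ID

# All #SBATCH lines must come before the first command: sbatch stops
# reading directives at the first executable line of the script.

# SLURM exports SLURM_JOB_ID inside the job; default it when run by plain bash.
job_id="${SLURM_JOB_ID:-not-under-slurm}"
echo "Job ${job_id} running on node $(uname -n)"
```

Saved as runscript.sh, it would be submitted with sbatch runscript.sh; SLURM replies with the new job ID, and the job's output lands in hello_<jobid>.out.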

SLURM has a number of valuable features compared to other job management systems:

  • Kill and Requeue: SLURM handles kill and requeue (preemption) better than many other systems. It waits for the preempted jobs to be cleared before scheduling the high-priority job, and it can preempt on memory as well as on core count.
  • Memory: Memory requests are sacrosanct in SLURM. The amount of memory you request for a job is guaranteed to be available to it. No other job can infringe on that memory, and your job cannot exceed the amount you requested.
  • Accounting Tools: SLURM has a back-end database that stores historical information about the cluster. Users can query it to find out how many resources they have used.
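To make the memory and accounting points concrete, the sketch below requests memory with --mem and reads it back from SLURM_MEM_PER_NODE, the environment variable sbatch sets from that option (in megabytes). The job_usage helper is a hypothetical name; it simply wraps sacct with a few standard format fields (JobID, JobName, Elapsed, ReqMem, MaxRSS, State).

```shell
#!/bin/bash
#SBATCH --job-name=mem_demo
#SBATCH --time=00:10:00
#SBATCH --mem=4000                   # reserve 4000 MB for the job

# sbatch exports the --mem request as SLURM_MEM_PER_NODE (in MB).
# Outside SLURM the variable is unset, so default it to 0.
mem_mb="${SLURM_MEM_PER_NODE:-0}"
if (( mem_mb > 0 )); then
  echo "SLURM reserved ${mem_mb} MB for this job"
else
  echo "Not running under SLURM: no memory reservation"
fi

# Hypothetical helper: ask the accounting database what a job used.
# MaxRSS (peak resident memory) is reported per job step, e.g. 999999.batch.
job_usage() {
  sacct -j "$1" --format=JobID,JobName,Elapsed,ReqMem,MaxRSS,State
}
```

After a job finishes, running job_usage 999999 compares the requested memory with what the job actually used, which helps size the --mem request for the next run.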

Summary of SLURM commands

The table below summarizes the most common SLURM commands. These commands are described in more detail below, along with links to the SLURM documentation site.

Task                        Command   Example
Submit a batch serial job   sbatch    sbatch runscript.sh
Run a script interactively  srun      srun --pty -p interact -t 10 --mem 1000 /bin/bash
Kill a job                  scancel   scancel 999999
View status of queues       squeue    squeue -u akitzmiller
Check current job by ID     sacct     sacct -j 999999
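The commands in the table are often chained together. Below is a minimal sketch, assuming bash; the helper names parse_job_id, submit and job_status are hypothetical. It relies on sbatch --parsable, which prints only the job ID (followed by ;cluster on multi-cluster setups) instead of the usual "Submitted batch job" message.

```shell
#!/bin/bash
# Hypothetical helpers tying together sbatch, squeue and sacct.

# sbatch --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
parse_job_id() {
  echo "${1%%;*}"
}

# Submit a batch script and print its job ID (fails if sbatch fails).
submit() {
  local out
  out=$(sbatch --parsable "$1") || return 1
  parse_job_id "$out"
}

# squeue lists the job only while it is pending or running;
# sacct also covers jobs that have already finished.
job_status() {
  squeue -j "$1"
  sacct -j "$1"
}
```

For example, jobid=$(submit runscript.sh) followed by job_status "$jobid" shows the job while it waits or runs, and scancel "$jobid" kills it if needed.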