Statistics SLURM Cluster

What has changed?

The Linux servers in the Statistics department have historically been managed as independent units, which does not allow for easy load balancing of work: some nodes end up with a large number of processes while others sit idle. To help address this issue, we are implementing a queue system.

Systems that will be part of the Statistics queue:

  • linux10.stat.iastate.edu
  • linux11.stat.iastate.edu
  • thirteen01.stat.iastate.edu
  • thirteen02.stat.iastate.edu
  • thirteen03.stat.iastate.edu
  • thirteen04.stat.iastate.edu
  • thirteen05.stat.iastate.edu
  • impact1.stat.iastate.edu
  • impact2.stat.iastate.edu
  • impact3.stat.iastate.edu
  • impact4.stat.iastate.edu

Once a machine has been added to the queue system, you will need to submit your jobs through the workload manager (Slurm) in order to run them on it.

What is SLURM?

Our cluster consists of many compute nodes, and many users submit many jobs to it at the same time. We therefore need a mechanism to distribute the jobs across the nodes in a reasonable fashion, and Slurm is the one we now use. Slurm (Simple Linux Utility for Resource Management) is a highly configurable open source workload and resource manager designed for Linux clusters of all sizes. Its key features are:

  • extensive scheduling options including advanced reservations,
  • suspend/resume for supporting binaries,
  • scheduler backfill,
  • fair-share scheduling, and
  • preemptive scheduling for critical jobs.

Slurm provides similar functionality to Torque, but some commands differ between the two. For example, to see a list of all jobs on the cluster under Moab/Torque you would issue the qstat command, whereas the Slurm equivalent is the squeue command.

How to access the SLURM queue for Stat

Log in to the machine smaster.stat.iastate.edu from any SSH client. Here is a tutorial for SSH Terminal Access.

How to submit jobs?

A basic job submission workflow can be found at http://info.brightcomputing.com/Blog/bid/174099/Slurm-101-Basic-Slurm-Usage-for-Linux-Clusters.

You can write your submission script in either PBS or Slurm format to queue a job for execution via SLURM. You can find some example submission scripts on Research IT's SLURM basics page.
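
As a rough illustration of the Slurm format, a minimal submission script might look like the sketch below. The job name, output file name, and resource values are placeholders chosen for this example, not cluster requirements. The #SBATCH lines are read by Slurm at submission time and are ordinary comments to the shell.

```shell
#!/bin/bash
# Hypothetical example job; all values below are illustrative assumptions.
# Name shown in the queue listing:
#SBATCH --job-name=hello
# Request a single core:
#SBATCH --ntasks=1
# Wall-time limit (HH:MM:SS):
#SBATCH --time=00:10:00
# Output file; %j is replaced by the job ID:
#SBATCH --output=hello-%j.out

# Everything below runs on the compute node Slurm assigns to the job.
node=$(uname -n)
echo "Job started on ${node}"

# Stand-in for the real work: replace with your own program.
result=$((6 * 7))
echo "result=${result}"
```

Saved as myscript, a script like this would be submitted from smaster with sbatch myscript.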

Some useful commands (PBS version):

  1. qsub myscript: Submits the job. Your job is assigned to a queue on the basis of the resources requested.
  2. qstat -q: Shows the status of all the queues and the current queue structure.

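Some useful commands (Slurm version):

  1. sbatch myscript: Submits the batch script myscript to the queue; this is the Slurm counterpart of qsub. (srun myscript also runs a job, but it occupies your terminal until the job finishes.) Your job is scheduled on the basis of the resources requested.
  2. squeue: Lists the jobs in the queue and their status. To see the queue structure itself (the Slurm counterpart of qstat -q), use sinfo.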
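
A typical submit-and-check session on smaster could be sketched as follows. The submit helper and the script name myscript are illustrative; sbatch --parsable, squeue -j, squeue -u, and sinfo are standard Slurm options, but their output depends on how the cluster is configured. The guard simply skips the commands on a machine where Slurm is not installed.

```shell
#!/bin/bash
# Submit a batch script and print only its job ID.
# "submit" is a hypothetical helper name used for this sketch.
submit() {
    # --parsable prints "jobid" or "jobid;cluster"; keep the first field.
    sbatch --parsable "$1" | cut -d ';' -f 1
}

# Run the Slurm commands only where they are installed (e.g. on smaster).
if command -v sbatch >/dev/null 2>&1; then
    jobid=$(submit myscript)
    squeue -j "$jobid"    # status of the job just submitted
    squeue -u "$USER"     # all of your own jobs
    sinfo                 # partitions (queues) and node states
else
    echo "Slurm commands not found; run this on smaster."
fi
```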

A helpful comparison cheat sheet is available at http://www.schedmd.com/slurmdocs/rosetta.pdf.

Queue structure:

The queue structure has been created to be as flexible as possible given the hardware available and the type of jobs typically run in the department. It should allow small jobs to always get through the queue in a short period of time, without waiting behind larger jobs. It also allows large jobs of unknown length to run without being interrupted by an arbitrary wall-time limit.

Default RAM is 3G per core requested. You may request more RAM in your job submission script if needed.
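
To make the default concrete: a job that requests 4 cores gets 4 x 3G = 12G unless it asks for more. One way to ask for more is Slurm's --mem-per-cpu option, as in this sketch. The core count and the 6G figure are arbitrary example values.

```shell
#!/bin/bash
# Illustrative values only: 4 cores with 6G each instead of the 3G default.
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=6G

# SLURM_NTASKS is set by Slurm inside the job; fall back to 4 elsewhere.
cores="${SLURM_NTASKS:-4}"
default_gb=$((cores * 3))
requested_gb=$((cores * 6))
echo "default: ${default_gb}G, requested: ${requested_gb}G"
```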

  • Short:
    • Up to 30 minutes
    • All cores on all machines are available, with 8 cores guaranteed available only for short jobs
  • Medium:
    • Up to 8 hours
    • All cores on all but one machine are available (max of 12 cores / node)
  • Long:
    • Up to 48 hours
    • All cores on 4 machines available (max of 12 cores / node) 
  • Unlimited:
    • Unlimited runtime
    • All cores on 4 machines available (max of 12 cores / node)
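
The time limits above can be turned into a small helper that picks the shortest queue a job fits in. This is only a sketch: the function name pick_queue is made up, and the lowercase partition names short, medium, long, and unlimited are assumed from the headings above. The names actually configured on the cluster (sinfo lists them) may differ.

```shell
#!/bin/bash
# Map an expected runtime in minutes to a queue name.
# Partition names are assumptions based on the list above.
pick_queue() {
    minutes="$1"
    if [ "$minutes" -le 30 ]; then
        echo "short"        # up to 30 minutes
    elif [ "$minutes" -le 480 ]; then
        echo "medium"       # up to 8 hours
    elif [ "$minutes" -le 2880 ]; then
        echo "long"         # up to 48 hours
    else
        echo "unlimited"    # no runtime limit
    fi
}

# Example: a job expected to take about 90 minutes.
queue=$(pick_queue 90)
echo "sbatch --partition=${queue} --time=01:30:00 myscript"
```

The last line only prints the submission command so that it can be checked before it is run.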

File Storage on STAT cluster

Home folder

Each user has a home directory, /home/<username>, which is mounted from a central storage server called 'shome'. This server holds the common home directories for the STAT cluster. Your home directory can be accessed from all of the compute nodes in the cluster (linux*, impact*, thirteen*, etc.). Your home directory is where you should put your job submission scripts, programs, and input/output files. This directory is not backed up. When your job is done running, you should copy the results you want to save to your work folder or another location that is secure and backed up.

Work folder

Within your home folder, there is a folder called 'work'. This folder lives on the EMC Isilon (aka MyFiles), and is within the STAT department folder. This folder is backed up and is served by redundant servers. You should not run jobs from this folder, or reference files in this folder as input or output to your jobs. Use this folder to store your raw data and final results. You can copy between this folder and your home directory while on smaster before submitting your job.
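
The copy-in, run, copy-out pattern described above might look like this on smaster. The stage helper and the directory names (rawdata, myproject, results) are hypothetical; only the ~/work location comes from the description above.

```shell
#!/bin/bash
# Copy a file or directory into a destination directory, creating it if needed.
# "stage" is a hypothetical helper name for this sketch.
stage() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" && cp -r "$src" "$dest"/
}

# Typical use on smaster (hypothetical paths, shown as comments):
#   stage "$HOME/work/rawdata" "$HOME/myproject"     # copy input out of work
#   cd "$HOME/myproject" && sbatch myscript          # run from the home directory
#   stage "$HOME/myproject/results" "$HOME/work"     # save final results to work
```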

Where to find help

For more details, please refer to the official SLURM documentation. You can also find some examples on Research IT's SLURM basics page.

If you have any further questions, please contact stat-tech@iastate.edu or researchit@iastate.edu.
