Running jobs with PBS on Linux clusters


Running Jobs

There are three job queues to choose from:

  • single - Used for jobs that will only execute on a single node, i.e. nodes=1:ppn<=8. Currently, this queue is only enabled for the 24 GB nodes, and has a limit of 168 hours (7 days) of wallclock time.

  • workq - Used for jobs that will use at least one node, i.e. nodes>=1:ppn=8. Currently, this queue is only enabled for the 24 GB nodes, and has a limit of 72 hours (3 days) of wallclock time.

  • bigmem - Used for jobs that want to use the 48 and 96 GB nodes. This queue has a limit of 48 hours (2 days) of wallclock time.
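
The wallclock limits above can be checked before submitting. The following is a minimal sketch, not a site-provided tool: the function names queue_limit_hours and walltime_ok are hypothetical, and the limits are simply copied from the list above, so they need updating if the queue configuration changes.

```shell
#!/bin/bash
# Wallclock limit in hours for each queue, copied from the queue list above.
queue_limit_hours() {
  case "$1" in
    single) echo 168 ;;
    workq)  echo 72 ;;
    bigmem) echo 48 ;;
    *)      return 1 ;;   # unknown queue name
  esac
}

# Succeeds (exit status 0) if a walltime in HH:MM:SS form fits within the
# limit of the given queue. Returns 2 for an unknown queue.
walltime_ok() {
  limit=$(queue_limit_hours "$1") || return 2
  # awk reads fields such as "08" as decimal numbers, which avoids the
  # octal pitfall of shell arithmetic.
  echo "$2" | awk -F: -v limit="$limit" \
    '{ exit !($1 * 3600 + $2 * 60 + $3 <= limit * 3600) }'
}
```

For example, walltime_ok workq 72:00:00 succeeds, while walltime_ok bigmem 60:00:00 fails because it exceeds the 48 hour limit.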

Single Queue Job Script Template

$ cat ~/script

#!/bin/bash
#PBS -q single
#PBS -l nodes=1:ppn=1
#PBS -l walltime=HH:MM:SS
#PBS -o desired_output_file_name
#PBS -N NAME_OF_JOB

/path/to/your/executable

Work Queue Job Script Template

$ cat ~/script

#!/bin/bash
#PBS -q workq
#PBS -l nodes=1:ppn=8
#PBS -l walltime=HH:MM:SS
#PBS -o desired_output_file_name
#PBS -j oe
#PBS -N NAME_OF_JOB

# MPI jobs would execute:
#   mpirun -np 8 -machinefile $PBS_NODEFILE /path/to/your/executable
# OpenMP jobs would execute:
#   export OMP_NUM_THREADS=8; /path/to/your/executable
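
When a job requests more than one node, the process count no longer equals 8. A minimal sketch of deriving it from the node file follows. It assumes that $PBS_NODEFILE lists one line per allocated core, which is the usual behavior for nodes=N:ppn=M requests but should be confirmed on the cluster. The function names count_procs and count_nodes are hypothetical.

```shell
#!/bin/bash
# Number of lines in the node file; assumed to be one per allocated core.
count_procs() {
  wc -l < "$1" | tr -d ' '
}

# Number of distinct node names in the node file.
count_nodes() {
  sort -u "$1" | wc -l | tr -d ' '
}

# Inside a job script this could be used as:
#   NPROCS=$(count_procs "$PBS_NODEFILE")
#   mpirun -np "$NPROCS" -machinefile "$PBS_NODEFILE" /path/to/your/executable
```

With a request of nodes=2:ppn=8, and under the assumption above, count_procs would report 16 and count_nodes would report 2.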

Bigmem Queue Job Script Template

$ cat ~/script

#!/bin/bash
#PBS -q bigmem
# Request a 48 GB node; use mem96 instead to request a 96 GB node
#PBS -l nodes=1:ppn=8:mem48
#PBS -l walltime=HH:MM:SS
#PBS -o desired_output_file_name
#PBS -j oe
#PBS -N NAME_OF_JOB

# MPI jobs would execute:
#    mpirun -np 8 -machinefile $PBS_NODEFILE /path/to/your/executable
# OpenMP jobs would execute:
#    export OMP_NUM_THREADS=8; /path/to/your/executable

Submit the job by executing:

$ qsub script
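
On success, qsub prints the identifier of the new job on standard output, so it can be captured for use with the monitoring commands. The following is a small sketch. The function name submit_job is hypothetical, and the example identifier format (a number followed by a server name) is an assumption about how the scheduler is configured.

```shell
#!/bin/bash
# Submit a job script and print the numeric part of the job identifier.
submit_job() {
  # qsub writes the job identifier (for example 12345.servername) to stdout.
  jobid=$(qsub "$1") || return 1
  # Strip everything from the first dot onward.
  echo "${jobid%%.*}"
}

# Usage on a login node:
#   id=$(submit_job ~/script)
#   checkjob "$id"
```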

Monitoring Jobs

The following commands can be used to view or modify the queue:

  • qdel jobid - deletes a PBS job in the queue.

  • qstat - shows the status of your jobs and of other users' jobs in the queue. It can also show other details about a job, such as the number of nodes it requests and the name of the queue it is in.

  • showq - displays job information within the batch system.

  • showstart jobid - gives an estimated starting time for your job.

  • checkjob jobid - displays detailed job state information.
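
These commands can be combined in a simple polling loop that waits for a job to leave the queue. This is a rough sketch rather than a recommended tool: the function name wait_for_job is hypothetical, and it assumes that qstat exits with a non-zero status once the job is no longer known to the server. Some configurations keep completed jobs visible for a while, in which case the loop ends later than the job itself.

```shell
#!/bin/bash
# Poll qstat until the job is no longer known to the server.
# $1 = job id, $2 = seconds between polls (defaults to 60).
wait_for_job() {
  while qstat "$1" > /dev/null 2>&1; do
    sleep "${2:-60}"
  done
}

# Usage on a login node:
#   wait_for_job 12345 120
#   echo "job 12345 has left the queue"
```

A long polling interval keeps the load on the scheduler low.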

 

4/15/2020 2:59:51 PM