Texas Tech University

Array Jobs

A common problem in an HPC environment is how best to run the same script or application many times with only small changes between runs. For example, you may have 100 data sets that you wish to run through a single application on the cluster. The naive solution is to manually or programmatically generate 100 shell scripts and then submit them all to the queue. This wastes both your time and the head node's resources.

An alternative solution exists on the HPCC clusters - array jobs. The advantages of using array jobs are:

  1. You put considerably less burden on the head nodes.
  2. You only have to write a single shell script.
  3. You don't have to worry about generating, submitting or deleting hundreds or thousands of shell scripts.
  4. If you submit an array job and then realize you made a mistake, you only have one job id to delete (qdel), instead of trying to remove hundreds of jobs.

The fact is, there are absolutely no disadvantages to using array jobs. Submitting an array job to do 1000 computations is equivalent to submitting 1000 separate scripts, but is much less work for you and the system.

Keep in mind that this guide assumes you already know how to write submission scripts and submit normal jobs to the HPCC clusters. If you don't feel confident performing those tasks, please read the Job Submission Guide first.

Table of Contents

  1. Writing and Submitting an Array Job
    1. Scheduler Parameters for Array Jobs
    2. Environment Variables for Array Jobs
  2. Quanah Array Job Submission Tutorial
  3. Hrothgar Array Job Submission Tutorial

Writing and Submitting an Array Job

The process for writing and submitting an array job is nearly identical to the process used to write and submit standard jobs on the HPCC clusters - see the Job Submission Guide. The primary difference is the addition of a few new environment variables and one new parameter in your job submission script.

Scheduler Parameters for Array Jobs

Similar to submitting a standard job, every array job that runs requires a submission script that contains the parameters for the job scheduler as well as the commands you wish to run on the cluster. Running an array job requires you to add an additional parameter (-t) to the submission script.  The -t parameter will look as follows:

#$ -t <start_num>-<end_num>:<increment>

Such that

  • <start_num>
    • Is an integer that is >0
    • Defines the task id for the first task in the array job.
  • <end_num>
    • Is an integer that is ><start_num>
    • Defines the task id for the last task in the array job.
  • <increment>
    • Is an integer that is >0
    • Defines the increment between task ids.

For example:

#$ -t 1-37:6

This sets the first task id to 1 and then runs every 6th task id that is <=37. In this case that would be tasks 1, 7, 13, 19, 25, 31 and 37.

Note: You are not required to provide an <increment> value. If one is not provided it will default to 1.
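
To preview which task ids a given -t range produces before submitting, the same start, end and increment values can be passed to the standard seq command. This is only a local sketch: the helper name list_task_ids is hypothetical and is not part of the scheduler.

```shell
#!/bin/bash
# Hypothetical helper (not a scheduler command): list the task ids that
# "#$ -t <start_num>-<end_num>:<increment>" would generate.
list_task_ids() {
    # $1 = start, $2 = end, $3 = increment (defaults to 1, as with -t)
    seq "$1" "${3:-1}" "$2" | paste -sd' ' -
}

list_task_ids 1 37 6   # same range as "#$ -t 1-37:6"; prints: 1 7 13 19 25 31 37
list_task_ids 1 4      # no increment given; prints: 1 2 3 4
```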

 

Environment Variables for Array Jobs

When writing a submission script, you may have noticed the use of some environment variables that exist only once the job has been submitted. These include variables such as $JOB_ID and $JOB_NAME, which correspond to the ID given to the job by the scheduler and the name defined by the -N option, respectively. For array jobs, the scheduler also defines a few additional variables that your script can use. These variables are described below:

  • $SGE_TASK_ID
    • When the code in your submission script is executed on a compute node, this will be the task ID associated with the currently running task.
  • $SGE_TASK_FIRST
    • This environment variable contains the integer value defined by <start_num> in your -t parameter.
  • $SGE_TASK_LAST
    • This environment variable contains the integer value defined by <end_num> in your -t parameter.
  • $SGE_TASK_STEPSIZE
    • This environment variable contains the integer value defined by <increment> in your -t parameter.
    • This value will default to 1 if you do not specify an increment.
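
A typical use of these variables is to let each task pick its own input. The sketch below assumes a hypothetical text file, datasets.txt, with one input name per line, and selects the line whose number matches $SGE_TASK_ID. The file name, its contents and the helper input_for_task are placeholders for illustration, and the fallback task id only exists so the snippet can be run by hand outside the scheduler.

```shell
#!/bin/bash
# Hypothetical helper: print line number <task_id> of a list file.
input_for_task() {
    # $1 = task id, $2 = list file with one input per line
    sed -n "${1}p" "$2"
}

# Placeholder list, created here only so the sketch is self-contained.
printf '%s\n' sampleA.dat sampleB.dat sampleC.dat > datasets.txt

# The scheduler sets SGE_TASK_ID inside an array job; the fallback of 1
# is an assumption for running this sketch by hand.
task_id=${SGE_TASK_ID:-1}
input_file=$(input_for_task "$task_id" datasets.txt)
echo "Task $task_id would process: $input_file"
```

With a list of 100 data sets, the matching scheduler parameter would be "#$ -t 1-100", so that every line of the list is handled by exactly one task.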

 

Quanah Array Job Submission Tutorial

Submitting an array job on Quanah can be done using the following steps:

Step 1. Log on to quanah.hpcc.ttu.edu using your eRaider account and password.

Step 2. Create a new directory then copy the tutorial job script for Quanah to your new folder.

mkdir arrayTest
cd arrayTest/
cp /lustre/work/examples/quanah/array.sh .
ls

You now have a copy of the array job submission script file for this particular tutorial. To read this file we can use the command cat array.sh which will print out the contents of this script.

quanah:/arrayTest$ cat array.sh
#!/bin/sh
#$ -V
#$ -cwd
#$ -S /bin/bash
#$ -N ArrayTestJob
#$ -q omni
#$ -pe fill 1
#$ -P quanah
#$ -t 1-37:6
#The variable $SGE_TASK_ID is the ID for this task.
#The variable $SGE_TASK_FIRST is the ID for the first task.
#The variable $SGE_TASK_LAST is the ID for the last task.
if [[ $SGE_TASK_ID == $SGE_TASK_FIRST ]]; then
position="first"
elif [[ $SGE_TASK_ID == $SGE_TASK_LAST ]]; then
position="last"
else
position="neither"
fi
#The variable $SGE_TASK_STEPSIZE tells you the size of each "step". Default is 1.
echo "Job ID: $JOB_ID"
echo "Task ID: $SGE_TASK_ID"
echo "ID increment (step): $SGE_TASK_STEPSIZE"
echo "First or last: $position"

This script will request to run 7 tasks (task ids: 1, 7, 13, 19, 25, 31 and 37) and for each task the node will print out the job ID, task ID, ID increment (step) and whether the task was the first task, last task or neither.

Note: Unlike previous submission scripts, array job scripts should NOT include the -e and -o options. The output and error files are named automatically for each array task using the format: <job_name>.o<job_id>.<task_id> and <job_name>.e<job_id>.<task_id>.
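
As a sketch of that naming format, the loop below prints the file names the tutorial job would be expected to produce. The job ID 123456 is a made-up placeholder, since the scheduler assigns the real one at submission, and the helper name expected_outputs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the .o and .e file names for each task id.
expected_outputs() {
    # $1 = job name, $2 = job ID, remaining arguments = task ids
    name=$1
    id=$2
    shift 2
    for task in "$@"; do
        echo "${name}.o${id}.${task}"
        echo "${name}.e${id}.${task}"
    done
}

# Job name from "#$ -N ArrayTestJob", task ids from "#$ -t 1-37:6".
# 123456 is a placeholder job ID.
expected_outputs ArrayTestJob 123456 $(seq 1 6 37)
```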

Step 3. We will now submit the array job to the Quanah cluster using the qsub command.

qsub array.sh

Step 4. Once your job has completed, list the directory and view some of the newly created output files. For each task ID, you will see the following files:

  • ArrayTestJob.o<job_id>.<task_id>
    • This file will contain the output for your task.
  • ArrayTestJob.e<job_id>.<task_id>
    • This file will contain any errors generated by your task.
  • ArrayTestJob.po<job_id>.<task_id>
    • This file will contain information about the node that this task in your array job was scheduled to.
  • ArrayTestJob.pe<job_id>.<task_id>
    • This file will contain errors from the node that this task in your array job was scheduled to.
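
With many tasks it is usually easier to scan for problems than to open every file. The sketch below lists only the non-empty error files for a given job name, using the standard find command. The helper name failed_task_logs is hypothetical, and an empty error file does not by itself prove that a task succeeded.

```shell
#!/bin/bash
# Hypothetical helper: list the error files of a job that are not empty,
# i.e. the tasks that wrote something to standard error.
# Assumes the <job_name>.e<job_id>.<task_id> naming in the current directory.
failed_task_logs() {
    # $1 = job name
    find . -maxdepth 1 -type f -name "${1}.e*" -size +0c | sort
}

failed_task_logs ArrayTestJob
```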

Step 5. Edit the array.sh file and change the line "#$ -t 1-37:6" to "#$ -t 1-6". Now repeat steps 3 and 4 and see what has changed. Notice that the array job now ran 6 array tasks with IDs 1, 2, 3, 4, 5 and 6. Keep in mind that if you do not explicitly set an increment, it defaults to 1.

Step 6. Congratulations, you have now successfully set up and run an array job on Quanah!

 

Hrothgar Array Job Submission Tutorial

Submitting an array job on Hrothgar can be done using the following steps:

Step 1. Log on to hrothgar.hpcc.ttu.edu using your eRaider account and password.

Step 2. Create a new directory then copy the tutorial job script for Hrothgar to your new folder.

mkdir arrayTest
cd arrayTest/
cp /lustre/work/examples/hrothgar/array.sh .
ls

You now have a copy of the array job submission script file for this particular tutorial. To read this file we can use the command cat array.sh which will print out the contents of this script.

hrothgar:/arrayTest$ cat array.sh
#!/bin/sh
#$ -V
#$ -cwd
#$ -S /bin/bash
#$ -N ArrayTestJob
#$ -q west
#$ -pe west 12
#$ -P hrothgar
#$ -t 1-37:6
#The variable $SGE_TASK_ID is the ID for this task.
#The variable $SGE_TASK_FIRST is the ID for the first task.
#The variable $SGE_TASK_LAST is the ID for the last task.
if [[ $SGE_TASK_ID == $SGE_TASK_FIRST ]]; then
position="first"
elif [[ $SGE_TASK_ID == $SGE_TASK_LAST ]]; then
position="last"
else
position="neither"
fi

#The variable $SGE_TASK_STEPSIZE tells you the size of each "step". Default is 1.
echo "Job ID: $JOB_ID"
echo "Task ID: $SGE_TASK_ID"
echo "ID increment (step): $SGE_TASK_STEPSIZE"
echo "First or last: $position"

This script will request to run 7 tasks (task ids: 1, 7, 13, 19, 25, 31 and 37) and for each task the node will print out the job ID, task ID, ID increment (step) and whether the task was the first task, last task or neither.
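
The step size is also useful when each task should handle a block of work items rather than a single one. The sketch below is an illustration and not part of the tutorial script: each task processes the items from its own task id up to the item just before the next task id. The helper task_chunk, the variable total_items and the fallback values are all assumptions, chosen so the snippet can be run by hand outside the scheduler.

```shell
#!/bin/bash
# Hypothetical helper: print the range of items handled by one task.
task_chunk() {
    # $1 = task id, $2 = step size, $3 = total number of items
    chunk_last=$(( $1 + $2 - 1 ))
    # The final chunk may be shorter than the step size.
    if [ "$chunk_last" -gt "$3" ]; then
        chunk_last=$3
    fi
    echo "$1-$chunk_last"
}

# Inside an array job the scheduler provides these variables; the fallbacks
# are assumptions matching "#$ -t 1-37:6" for a run by hand.
task_id=${SGE_TASK_ID:-37}
step=${SGE_TASK_STEPSIZE:-6}
total_items=40   # hypothetical number of work items

echo "Task $task_id handles items $(task_chunk "$task_id" "$step" "$total_items")"
```

With these example values, the 7 tasks would cover items 1-6, 7-12 and so on, and the last task (id 37) would stop at item 40 rather than 42.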

Note: Unlike previous submission scripts, array job scripts should NOT include the -e and -o options. The output and error files are named automatically for each array task using the format: <job_name>.o<job_id>.<task_id> and <job_name>.e<job_id>.<task_id>.

Step 3. We will now submit the array job to the Hrothgar cluster using the qsub command.

qsub array.sh

Step 4. Once your job has completed, list the directory and view some of the newly created output files. For each task ID, you will see the following files:

  • ArrayTestJob.o<job_id>.<task_id>
    • This file will contain the output for your task.
  • ArrayTestJob.e<job_id>.<task_id>
    • This file will contain any errors generated by your task.
  • ArrayTestJob.po<job_id>.<task_id>
    • This file will contain information about the node that this task in your array job was scheduled to.
  • ArrayTestJob.pe<job_id>.<task_id>
    • This file will contain errors from the node that this task in your array job was scheduled to.
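
When the results of all tasks need to be read together, note that a plain wildcard lists the files in alphabetical order, which puts task 13 before task 7. The sketch below prints the output files of one job in numeric task order instead. The helper name outputs_in_task_order and the job ID 123456 are placeholders, and the sort assumes the job name itself contains no dots.

```shell
#!/bin/bash
# Hypothetical helper: print the .o files of one job in numeric task order.
# Assumes names of the form <job_name>.o<job_id>.<task_id> in the current directory.
outputs_in_task_order() {
    # $1 = job name, $2 = job ID
    for f in "${1}.o${2}."*; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done | sort -t . -k 3,3n   # third dot-separated field is the task id
}

# 123456 is a placeholder job ID; use the ID reported by qsub.
outputs_in_task_order ArrayTestJob 123456
```

The resulting list can then be piped to another command, for example "xargs cat", to combine the per-task outputs into a single stream.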

Step 5. Edit the array.sh file and change the line "#$ -t 1-37:6" to "#$ -t 1-6". Now repeat steps 3 and 4 and see what has changed. Notice that the array job now ran 6 array tasks with IDs 1, 2, 3, 4, 5 and 6. Keep in mind that if you do not explicitly set an increment, it defaults to 1.

Step 6. Congratulations, you have now successfully set up and run an array job on Hrothgar!

High Performance Computing Center