LoadLeveler - IBM Tivoli Workload Scheduler (Outdated)
Warning
This page is deprecated.
Job Submission
To submit a job to LoadLeveler, a job file needs to be created first. This job file is then passed to the command:
llsubmit [llsubmit_options] <job_file>
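For example, if your job file is saved under the hypothetical name my_job.ll, the submission is simply:
# llsubmit my_job.ll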
Job File Examples
Serial Batch Jobs
An example job file may look like this:
#@ job_name = my_job
#@ output = $(job_name).$(jobid).out
#@ error = $(job_name).$(jobid).err
#@ class = short
#@ group = triton-ww | triton-ipf | triton-ism | triton-et
#@ wall_clock_limit = 00:30:00
#@ resources = ConsumableMemory(1 gb)
#@ environment = COPY_ALL
#@ notification = complete
#@ notify_user = your_email@address
#@ queue
./my_serial_program
This example requests a serial job with a runtime of 30 minutes and an overall memory requirement of 1 GB. There are four groups available; be sure to choose the one matching your project. When the job completes, an email will be sent that includes details about resource usage.
MPI Parallel Batch Jobs
An example job file may look like this:
#@ job_name = my_job
#@ output = $(job_name).$(jobid).out
#@ error = $(job_name).$(jobid).err
#@ job_type = parallel
#@ node = 2
#@ tasks_per_node = 8
#@ class = short
#@ group = triton-ww | triton-ipf | triton-ism | triton-et
#@ wall_clock_limit = 00:30:00
#@ resources = ConsumableMemory(1 gb)
#@ environment = COPY_ALL
#@ notification = complete
#@ notify_user = your_email@address
#@ queue
mpirun -x OMP_NUM_THREADS=1 -x LD_LIBRARY_PATH -np 16 ./my_mpi_program
This example requests a parallel job with 16 processes (2 nodes, 8 tasks per node), a runtime of 30 minutes, and a memory requirement of 1 GB per task, i.e. an overall memory requirement of 8 GB per node. Please keep in mind that each node on Triton only provides 45 GB. Choosing the correct group is also important and necessary. The `-x` option of `mpirun` exports the specified environment variables to all MPI processes.

`OMP_NUM_THREADS=1`
: If you are using multithreaded libraries like MKL, you should always set the number of threads explicitly so that the nodes are not overloaded. Otherwise you will experience severe performance problems.

`LD_LIBRARY_PATH`
: If your program is linked against shared libraries (like MKL) which are not standard system libraries, you must export this variable to the MPI processes.

When the job completes, an email will be sent that includes details about resource usage.
Before submitting MPI jobs, ensure that the appropriate MPI module is loaded, e.g. issue:
# module load openmpi
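A minimal sketch of the complete workflow, assuming the MPI job file above is saved under the hypothetical name mpi_job.ll:
module load openmpi   # make mpirun and the MPI libraries available
llsubmit mpi_job.ll   # submit the job file shown above
llq -u $USER          # check the job status (see Job Monitoring below)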
Hybrid MPI+OpenMP Parallel Batch Jobs
An example job file may look like this:
#@ job_name = my_job
#@ output = $(job_name).$(jobid).out
#@ error = $(job_name).$(jobid).err
#@ job_type = parallel
#@ node = 4
#@ tasks_per_node = 8
#@ class = short
#@ group = triton-ww | triton-ipf | triton-ism | triton-et
#@ wall_clock_limit = 00:30:00
#@ resources = ConsumableMemory(1 gb)
#@ environment = COPY_ALL
#@ notification = complete
#@ notify_user = your_email@address
#@ queue
mpirun -x OMP_NUM_THREADS=8 -x LD_LIBRARY_PATH -np 4 --bynode ./my_hybrid_program
This example requests a parallel job with 32 slots (4 nodes, 8 tasks per node), a runtime of 30 minutes, and a memory requirement of 1 GB per task, i.e. an overall memory requirement of 8 GB per node. Please keep in mind that each node on Triton only provides 45 GB. Choosing the correct group is also important and necessary. The `mpirun` command starts 4 MPI processes (`--bynode` forces one process per node); `OMP_NUM_THREADS` is set to 8, so that 8 threads are started per MPI rank. When the job completes, an email will be sent that includes details about resource usage.
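The layout arithmetic behind this example, summarized as a sketch (all values taken from the job file above):
# LoadLeveler reservation: 4 nodes x 8 tasks/node = 32 slots
# mpirun:                  4 MPI ranks, one per node (--bynode)
# OpenMP:                  8 threads per rank (OMP_NUM_THREADS=8)
# cores in use:            4 ranks x 8 threads = 32 = reserved slots
# memory per node:         8 tasks x 1 GB (ConsumableMemory) = 8 GB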
Job File Keywords
Keyword | Valid values | Description |
---|---|---|
`notification` | `always`, `error`, `start`, `never`, `complete` | When to send a notification email. |
`notify_user` | valid email address | Notification email address. |
`output` | file name | File for stdout of the job. |
`error` | file name | File for stderr of the job. |
`job_type` | `parallel`, `serial` | Job type, default is `serial`. |
`node` | 1 - 64 | Number of nodes requested (parallel jobs only). |
`tasks_per_node` | 1 - 8 | Number of processors per node requested (parallel jobs only). |
`class` | see `llclass` | Job queue. |
`group` | `triton-ww`, `triton-ipf`, `triton-ism`, `triton-et` | Project group; choose the one matching your project. |
`wall_clock_limit` | HH:MM:SS | Run time limit of the job. |
`resources` | `name(count) ... name(count)` | Quantities of the consumable resources consumed by each task of a job step. |
Further Information: [Full description of keywords](http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl35.admin.doc/am2ug_jobkey.html).
Submit a Job without a Job File
Submission of a job without a job file can be done by the command:
llsub [llsub_options] <command>
This command is not part of the IBM LoadLeveler software but was developed at ZIH. The job file is created in the background from the given command line options and then passed to llsubmit, which submits the job to LoadLeveler (see above).
Important options are:
Option | Default | Description |
---|---|---|
`-J <name>` | `llsub` | Specifies the name of the job. You can name the job using any combination of letters, numbers, or both. The job name only appears in the long reports of the `llq`, `llstatus`, and `llsummary` commands. |
`-n` | `1` | Specifies the total number of tasks of a parallel job you want to run on all available nodes. |
`-T` | not specified | Specifies the maximum number of OpenMP threads to use per process by setting the environment variable `OMP_NUM_THREADS` to the given number. |
`-o, -oo <filename>` | `<jobname>.<hostname>.<jobid>.out` | Specifies the name of the file to use as standard output (stdout) when your job step runs. |
`-e, -oe <filename>` | `<jobname>.<hostname>.<jobid>.err` | Specifies the name of the file to use as standard error (stderr) when your job step runs. |
`-I` | not specified | Submits an interactive job and sends the job's standard output (or standard error) to the terminal. |
`-q <name>` | non-interactive: `short`; interactive (n=1): `interactive`; interactive (n>1): `interactive_par` | Specifies the name of a job class defined locally in your cluster. You can use the `llclass` command to find out information on job classes. |
`-x` | not specified | Puts the node running your job into exclusive execution mode, i.e. your job runs by itself on a node. It is dispatched only to a node with no other jobs running, and LoadLeveler does not send any other jobs to the node until the job completes. |
`-hosts <number>` | automatically | Specifies the number of nodes requested by a job step. This option corresponds to the `bsub` option `-R "span[hosts=number]"`. |
`-ptile <number>` | automatically | Specifies the number of tasks per node requested by a job step. This option corresponds to the `bsub` option `-R "span[ptile=number]"`. |
`-mem <size>` | not specified | Specifies the memory requirement of the job on a single node, in MB. This option corresponds to the `bsub` option `-R "rusage[mem=size]"`. |
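As a sketch of how these options combine, the following call roughly mirrors the serial job file from above (program name and memory value are illustrative):
llsub -J my_job -q short -mem 1024 ./my_serial_program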
The option `-H` prints the list of all available command line options.
Here is an example for an MPI job:
llsub -T 1 -n 16 -e err.txt -o out.txt mpirun -x LD_LIBRARY_PATH -np 16 ./my_program
Interactive Jobs
Interactive jobs can be submitted with the command:
llsub -I -q <interactive> <command>
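For example, a sketch that runs a (hypothetical) program in the interactive class listed by `llclass`:
# llsub -I -q interactive ./my_program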
LoadLeveler Runtime Environment Variables
LoadLeveler runtime variables provide information about the job from within the job script, for example:
#@ job_name = my_job
#@ output = $(job_name).$(jobid).out
#@ error = $(job_name).$(jobid).err
#@ job_type = parallel
#@ node = 2
#@ tasks_per_node = 8
#@ class = short
#@ wall_clock_limit = 00:30:00
#@ resources = ConsumableMemory(1 gb)
#@ environment = COPY_ALL
#@ notification = complete
#@ notify_user = your_email@address
#@ queue
echo $LOADL_PROCESSOR_LIST
echo $LOADL_STEP_ID
echo $LOADL_JOB_NAME
mpirun -np 16 ./my_mpi_program
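One common use of these variables, shown as a sketch: LOADL_PROCESSOR_LIST contains a blank-separated list of the allocated hostnames, which can be turned into a host file for tools that expect one hostname per line (the output file name is hypothetical):
# build a host file with one allocated hostname per line
for host in $LOADL_PROCESSOR_LIST; do
    echo $host
done > hosts.$LOADL_STEP_ID.txt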
Further Information: [Full description of variables](http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.loadl35.admin.doc/am2ug_envvars.html).
Job Queues
The `llclass` command provides information about each queue. Example output:
Name MaxJobCPU MaxProcCPU Free Max Description
d+hh:mm:ss d+hh:mm:ss Slots Slots
--------------- -------------- -------------- ----- ----- ---------------------
interactive undefined undefined 32 32 interactive, exclusive shared nodes, max. 12h runtime
triton_ism undefined undefined 8 80 exclusive, serial + parallel queue, nodes shared, unlimited runtime
openend undefined undefined 272 384 serial + parallel queue, nodes shared, unlimited runtime
long undefined undefined 272 384 serial + parallel queue, nodes shared, max. 7 days runtime
medium undefined undefined 272 384 serial + parallel queue, nodes shared, max. 3 days runtime
short undefined undefined 272 384 serial + parallel queue, nodes shared, max. 4 hours runtime
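A long listing with the details of a single class can be obtained as follows, e.g. for the `short` class:
# llclass -l short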
Job Monitoring
All Jobs in the Queue
# llq
All of One's Own Jobs
# llq -u username
Details About Why A Job Has Not Yet Started
# llq -s job-id
The key information is located at the end of the output, and will look similar to the following:
==================== EVALUATIONS FOR JOB STEP l1f1n01.4604.0 ====================
The class of this job step is "workq".
Total number of available initiators of this class on all machines in the cluster: 0
Minimum number of initiators of this class required by job step: 4
The number of available initiators of this class is not sufficient for this job step.
Not enough resources to start now.
Not enough resources for this step as backfill.
Or it will tell you the estimated start time:
==================== EVALUATIONS FOR JOB STEP l1f1n01.8207.0 ====================
The class of this job step is "checkpt".
Total number of available initiators of this class on all machines in the cluster: 8
Minimum number of initiators of this class required by job step: 32
The number of available initiators of this class is not sufficient for this job step.
Not enough resources to start now.
This step is top-dog.
Considered at: Fri Jul 13 12:12:04 2007
Will start by: Tue Jul 17 18:10:32 2007
Generate a Long Listing Rather Than the Standard One
# llq -l job-id
This command will give you detailed job information.
Job Status States
State | Short | Description |
---|---|---|
Canceled | CA | The job has been canceled by the llcancel command. |
Completed | C | The job has completed. |
Complete Pending | CP | The job is completing; some tasks are already finished. |
Deferred | D | The job will not be assigned until a specified date. The start date may have been specified by the user in the Job Command file or it may have been set by LoadLeveler because a parallel job could not obtain enough machines to run the job. |
Idle | I | The job is being considered to run on a machine though no machine has been selected yet. |
NotQueued | NQ | The job is not being considered to run. A job may enter this state due to an error in the command file or because LoadLeveler cannot obtain information that it needs to act on the request. |
Not Run | NR | The job will never run because a stated dependency in the Job Command file evaluated to be false. |
Pending | P | The job is in the process of starting on one or more machines. The request to start the job has been sent but has not yet been acknowledged. |
Rejected | X | The job did not start because there was a mismatch between the requirements of your job and the resources on the target machine, or because the user does not have a valid ID on the target machine. |
Reject Pending | XP | The job is in the process of being rejected. |
Removed | RM | The job was canceled by either LoadLeveler or the owner of the job. |
Remove Pending | RP | The job is in the process of being removed. |
Running | R | The job is running. |
Starting | ST | The job is starting. |
Submission Error | SX | The job cannot start due to a submission error. Please notify the administration team if you encounter this error. |
System Hold | S | The job has been put on hold by a system administrator. |
System User Hold | HS | Both the user and a system administrator have put the job on hold. |
Terminated | TX | The job was terminated, presumably by means beyond LoadLeveler's control. Please notify the administration team if you encounter this error. |
User Hold | H | The job has been put on hold by the owner. |
Vacated | V | The started job did not complete. The job will be scheduled again provided that the job may be rescheduled. |
Vacate Pending | VP | The job is in the process of vacating. |
Cancel a Job
A Particular Job
# llcancel job-id
All of One's Jobs
# llcancel -u username
Job History and Usage Summaries
On each cluster there exists a file that contains the history of all jobs run under LoadLeveler. This file is `/var/loadl/archive/history.archive` and may be queried using the `llsummary` command.
An example of usage would be as follows:
# llsummary -u estrabd /var/loadl/archive/history.archive
And the output would look something like:
Name Jobs Steps Job Cpu Starter Cpu Leverage
estrabd 118 128 07:55:57 00:00:45 634.6
TOTAL 118 128 07:55:57 00:00:45 634.6
Class Jobs Steps Job Cpu Starter Cpu Leverage
checkpt 13 23 03:09:32 00:00:18 631.8
interactive 105 105 04:46:24 00:00:26 660.9
TOTAL 118 128 07:55:57 00:00:45 634.6
Group Jobs Steps Job Cpu Starter Cpu Leverage
No_Group 118 128 07:55:57 00:00:45 634.6
TOTAL 118 128 07:55:57 00:00:45 634.6
Account Jobs Steps Job Cpu Starter Cpu Leverage
NONE 118 128 07:55:57 00:00:45 634.6
TOTAL 118 128 07:55:57 00:00:45 634.6
The `llsummary` tool has a lot of options, which are discussed in its man pages.
Check Status of Each Node
# llstatus
And the output would look something like:
root@triton[0]:~# llstatus
Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys
n01 Avail 0 0 Idle 0 0.00 2403 AMD64 Linux2
n02 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n03 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n04 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n05 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n06 Avail 0 0 Idle 0 0.71 9999 AMD64 Linux2
n07 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n08 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n09 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n10 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n11 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n12 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n13 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n14 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n15 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n16 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n17 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n18 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n19 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n20 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n21 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n22 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n23 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n24 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n25 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n26 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n27 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n28 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n29 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n30 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n31 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n32 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n33 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n34 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n35 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n36 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n37 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n38 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n39 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n40 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n41 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n42 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n43 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n44 Avail 0 0 Idle 0 0.01 9999 AMD64 Linux2
n45 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n46 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n47 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n48 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n49 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n50 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n51 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n52 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n53 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n54 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n55 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n56 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n57 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n58 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n59 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n60 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n61 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n62 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n63 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
n64 Avail 0 0 Idle 0 0.00 9999 AMD64 Linux2
triton Avail 0 0 Idle 0 0.00 585 AMD64 Linux2
AMD64/Linux2 65 machines 0 jobs 0 running tasks
Total Machines 65 machines 0 jobs 0 running tasks
The Central Manager is defined on triton
The BACKFILL scheduler is in use
All machines on the machine_list are present.
Detailed status information for a specific node:
# llstatus -l n54
Further information: IBM Documentation (see version 3.5)