For simple parallel jobs you can use the lstools commands to start parts of the job on other hosts. Because the lstools commands handle signals transparently, LSF Batch can suspend and resume all components of your job without additional programming.
The simplest parallel job runs an identical copy of the executable on every host. The lsgrun command takes a list of host names and runs the specified task on each host. The lsgrun -p option specifies that the task should be run in parallel on each host. The example below submits a job that uses lsgrun to run myjob on all the selected batch hosts in parallel:
% bsub -n 10 'lsgrun -p -m "$LSB_HOSTS" myjob'
Job <3856> is submitted to default queue <normal>.
For more complicated jobs, you can write a shell script that runs lsrun in the background to start each component.
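A minimal sketch of such a script, assuming a POSIX shell: the function below pairs each command string with one host from LSB_HOSTS, starts it in the background through lsrun -m, and then waits for all of them. The function name run_components and the LAUNCHER override (handy for a dry run with echo) are illustrative assumptions, not part of LSF.

```shell
#!/bin/sh
# Sketch only. run_components and LAUNCHER are hypothetical names.
# LSB_HOSTS is set by LSF Batch to the hosts allocated to the job.

run_components() {
    # Each argument is one component's command line, e.g. "part_a in.1".
    launcher=${LAUNCHER:-lsrun -m}
    for host in $LSB_HOSTS; do
        [ $# -gt 0 ] || break      # more hosts than components: stop early
        # $launcher and $1 are unquoted on purpose so they split into words.
        $launcher "$host" $1 &
        shift
    done
    # Report components that had no host left to run on.
    [ $# -eq 0 ] || echo "warning: $# component(s) not started" >&2
    wait                           # block until every started component exits
}
```

A job script built this way would end with a call such as run_components "part_a in.1" "part_b in.2" and be submitted with bsub -n 2. Setting LAUNCHER=echo prints the commands instead of running them.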
Using LSF Make to Run Parallel Batch Jobs
For parallel jobs that have a variety of different components to run, you can use LSF Make. Create a makefile that lists all the components of your batch job and then submit the LSF Make command to LSF Batch. The following example shows a bsub command and Makefile for a simple parallel job.
% bsub -n 4 lsmake -f Parjob.makefile
Job <3858> is submitted to default queue <normal>.
% cat Parjob.makefile
# Makefile to run example parallel job using lsbatch and LSF Make
all: part1 part2 part3 part4

part1 part2 part3:
	myjob data.$@

part4:
	myjob2 data.part1 data.part2 data.part3
The batch job has four components. The first three components run the myjob command on the data.part1, data.part2 and data.part3 files. The fourth component runs the myjob2 command on all three data files. There are no dependencies between the components, so LSF Make runs them in parallel.
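When the number of independent components varies, a makefile of this shape can be generated rather than written by hand. The following is a rough sketch under that assumption. The function name gen_parjob_makefile is hypothetical, and the myjob, myjob2 and data.partN names are simply carried over from the example above.

```shell
#!/bin/sh
# Sketch only. gen_parjob_makefile is a hypothetical helper, not an LSF tool.
# Usage: gen_parjob_makefile N > Parjob.makefile
# Prints a makefile with N "myjob" targets plus one final "myjob2" target.

gen_parjob_makefile() {
    n=$1
    parts=""
    data=""
    i=1
    while [ "$i" -le "$n" ]; do
        parts="$parts part$i"
        data="$data data.part$i"
        i=$((i + 1))
    done
    final=$((n + 1))
    # "all" lists every target so that make can schedule them together.
    printf 'all:%s part%s\n\n' "$parts" "$final"
    # Recipe lines must start with a tab (\t); $@ is the make target name.
    printf '%s:\n\tmyjob data.$@\n\n' "${parts# }"
    printf 'part%s:\n\tmyjob2%s\n' "$final" "$data"
}
```

Calling gen_parjob_makefile 3 reproduces the four-target layout of the Parjob.makefile example.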
Submitting PVM Jobs to LSF Batch
PVM is a parallel programming system distributed by Oak Ridge National Laboratory. PVM programs are controlled by a file, the PVM hosts file, that contains host names and other information. The pvmjob shell script supplied with LSF can be used to run PVM programs as parallel LSF Batch jobs. The pvmjob script reads the LSF Batch environment variables, sets up the PVM hosts file and then runs the PVM job. If your PVM job needs special options in the hosts file, you can modify the pvmjob script. For example, if the command line to run your PVM job is:
% myjob data1 -o out1
the following command submits this job to LSF Batch to run on 10 hosts:
% bsub -n 10 pvmjob myjob data1 -o out1
Other parallel programming packages can be supported in the same way. The p4job shell script runs jobs that use the P4 parallel programming library. Other packages can be handled by creating similar scripts.
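As a starting point for such a script, the sketch below shows the general pattern: read LSB_HOSTS, write a hosts file, run the job, and clean up. It is not the pvmjob or p4job script shipped with LSF. The function names and the HOSTS_FILE environment variable are hypothetical, and a real package will expect its own file format and its own way of locating the file.

```shell
#!/bin/sh
# Sketch only. write_hosts_file, run_with_hosts_file and HOSTS_FILE are
# hypothetical names chosen for illustration.

# Write each distinct host in LSB_HOSTS to the given file, one per line,
# in first-seen order. LSB_HOSTS may list the same host more than once.
write_hosts_file() {
    file=$1
    : > "$file"
    for host in $LSB_HOSTS; do
        # -F: fixed string, -x: whole-line match, -q: no output
        if ! grep -Fqx -- "$host" "$file"; then
            printf '%s\n' "$host" >> "$file"
        fi
    done
}

# Create the hosts file, run the given command with HOSTS_FILE pointing
# at it, then remove the file and return the command's exit status.
run_with_hosts_file() {
    hosts_file=${TMPDIR:-/tmp}/lsf_hosts.$$
    write_hosts_file "$hosts_file"
    status=0
    HOSTS_FILE=$hosts_file "$@" || status=$?
    rm -f "$hosts_file"
    return "$status"
}
```

A wrapper script would finish with run_with_hosts_file "$@" so that it can be placed in front of the job's command line on the bsub command, in the same position as pvmjob above.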
Submitting MPI Jobs to LSF Batch
The Message Passing Interface (MPI) is a portable library that supports parallel programming. LSF supports MPICH, a joint implementation of MPI by Argonne National Laboratory and Mississippi State University. This version supports both TCP/IP and IBM’s Message Passing Library (MPL) communication protocols.
LSF provides an mpijob shell script that you can use to submit MPI jobs to LSF Batch. The mpijob script writes the hosts allocated to the job by the LSF Batch system to a file and supplies the file as an option to MPICH’s mpirun command. The syntax of the mpijob command is:
mpijob option mpirun program arguments
Here, option is one of the following:
-tcp Write the LSF Batch hosts to a PROCGROUP file, supply the -p4pg procgroup_file option to the mpirun command, and use the TCP/IP protocol. This is the default.
-mpl Write the LSF Batch hosts to a MACHINE file, supply the -machinefile machine_file option to the mpirun command, and use the MPL on an SP-2 system.
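The option handling described above can be pictured with a short dry-run sketch. It only prints the mpirun command line that would result and does not write the hosts file or start anything. The function name build_mpirun_command and its file-name argument are hypothetical, and this is not the mpijob script itself.

```shell
#!/bin/sh
# Sketch only. build_mpirun_command is a hypothetical helper.
# Usage: build_mpirun_command HOSTS_FILE [-tcp|-mpl] mpirun program [args...]
# Prints the mpirun command line with the matching file option inserted.

build_mpirun_command() {
    file=$1
    shift
    case $1 in
        -tcp) flag=-p4pg; shift ;;          # PROCGROUP file, TCP/IP
        -mpl) flag=-machinefile; shift ;;   # MACHINE file, MPL on SP-2
        -*)   echo "unknown option: $1" >&2; return 1 ;;
        *)    flag=-p4pg ;;                 # no option given: TCP/IP default
    esac
    runner=$1                               # normally "mpirun"
    shift
    echo "$runner $flag $file $*"
}
```

For instance, build_mpirun_command hosts.m -mpl mpirun myjob prints an mpirun command line carrying -machinefile hosts.m ahead of the program name.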
The following examples show how to use mpijob to submit MPI jobs to LSF Batch. To submit a job requesting four hosts and using the default TCP/IP protocol, use:
% bsub -n 4 mpijob mpirun myjob
Note
Before you can submit a job to a particular pool of IBM SP-2 nodes, an LSF administrator must install the SP-2 ELIM. The SP-2 ELIM provides the pool number and lock status of each node.
To submit the same job to run on four nodes in pool 1 on an IBM SP-2 system using MPL, use:
% bsub -n 4 -R "pool == 1" mpijob -mpl mpirun myjob
To submit the same job to run on four nodes in pool 1 that are not locked (dedicated to using the High Performance Switch) on an SP-2 system using MPL, use:
% bsub -n 4 -q mpiq -R "pool == 1 && lock == 0" mpijob -mpl mpirun myjob
Note
Before you can submit a job using the IBM SP-2 High Performance Switch in dedicated mode, an LSF administrator must set up a queue for automatic requeue on job failure. The job queue will automatically requeue a job that failed because an SP-2 node was locked after LSF Batch selected the node but before the job was dispatched.