
Some batch jobs require resources that LSF does not directly support. For example, a batch job may need to reserve a tape drive or check for the availability of a software license.

The -E pre_exec_command option to the bsub command specifies an arbitrary command to run before starting the batch job. When LSF Batch finds a suitable host on which to run a job, the pre-execution command is executed on that host. If the pre-execution command runs successfully, the batch job is started.

An alternative to using the -E pre_exec_command option is for the LSF administrator to set up a queue-level pre-execution command. See ‘Queue-Level Pre-/Post-Execution Commands’ on page 224 of the LSF Batch Administrator’s Guide for more information.

By default, the pre-execution command runs under the same user ID, environment, and home and working directories as the batch job. For queue-level pre-execution commands, you can specify a different user ID by defining the LSB_PRE_POST_EXEC_USER variable. If the pre-execution command is not in your normal execution path, you must specify its full path name. For parallel batch jobs, the pre-execution command runs on the first selected host. The pre-execution command returns information to LSF Batch through its exit status: if it exits with a non-zero status, the batch job is not dispatched.



Submitting Batch Jobs



The job goes back to the PEND state, and LSF Batch tries to dispatch another job to that host. The next time LSF Batch tries to dispatch jobs, this process is repeated.
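The dispatch decision described above can be modeled in a few lines of shell. This is a rough illustration, not how LSF Batch is implemented: the helper name try_dispatch is hypothetical, it runs the pre-execution command string with sh -c, starts the job command only on a zero exit status, and otherwise prints PEND and returns non-zero.

```shell
# Rough model (not LSF code) of the dispatch step on the selected host.
# Usage (hypothetical helper): try_dispatch PRE_EXEC_STRING JOB_COMMAND [ARGS...]
try_dispatch() {
    pre_exec=$1
    shift
    if sh -c "$pre_exec"; then
        "$@"            # exit status 0: the batch job is started
    else
        echo PEND       # non-zero exit: job is not dispatched, back to PEND
        return 1
    fi
}
```

For example, try_dispatch 'exit 3' ./myjob prints PEND and never runs ./myjob, just as LSF Batch leaves the job pending when the pre-execution command fails.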

LSF Batch assumes that the pre-execution command runs without side effects. For example, if the pre-execution command reserves a software license or other resource, you must take care not to reserve the same resource more than once for the same batch job.
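One way to keep a pre-execution command safe to repeat is to make its reservation idempotent. The sketch below assumes a hypothetical site convention in which each software license is a lock directory under LICENSE_DIR, and it keys ownership on LSB_JOBID, the job ID that LSF places in the job's environment (assumed here to be visible to the pre-execution command). A second run for the same job reuses the license it already holds instead of taking another one.

```shell
# Hypothetical license reservation for a pre-execution command (sketch only).
# Assumptions: licenses are lock directories LICENSE_DIR/slot1..slotN,
# N = LICENSE_COUNT, and the caller passes the batch job ID.
LICENSE_DIR=${LICENSE_DIR:-/var/tmp/licenses}

reserve_license() {
    job_id=$1
    mkdir -p "$LICENSE_DIR" || return 1
    # Already holding a license from an earlier attempt? Reuse it.
    for slot in "$LICENSE_DIR"/slot*; do
        [ -f "$slot/owner" ] || continue
        if [ "$(cat "$slot/owner")" = "$job_id" ]; then
            return 0
        fi
    done
    # Otherwise claim a free slot. mkdir is atomic, so two jobs
    # cannot create the same slot directory.
    i=1
    while [ "$i" -le "${LICENSE_COUNT:-2}" ]; do
        if mkdir "$LICENSE_DIR/slot$i" 2>/dev/null; then
            echo "$job_id" > "$LICENSE_DIR/slot$i/owner"
            return 0
        fi
        i=$((i + 1))
    done
    return 1    # no license free: non-zero exit keeps the job pending
}
```

A pre-execution script built on this would end with reserve_license "$LSB_JOBID", so that the script's exit status tells LSF Batch whether to dispatch the job. Releasing the license after the job finishes is not shown.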

The following example shows a batch job that requires a tape drive. The tapeCheck program is a site-specific program that exits with status zero if the specified tape drive is ready, and one otherwise:

% bsub -E "/usr/local/bin/tapeCheck /dev/rmt0l" myjob
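The tapeCheck program is site-specific and not part of LSF. As a hypothetical stand-in, the function below only confirms that the device node exists, is a character device, and is readable by the job's user; a real site check would also query the drive itself. Its return value follows the convention above: zero lets the job start, non-zero keeps it pending.

```shell
# Hypothetical stand-in for the site-specific tapeCheck program (sketch).
# Only checks the device node; it does not talk to the tape drive.
tape_check() {
    dev=$1
    if [ -c "$dev" ] && [ -r "$dev" ]; then
        return 0    # "ready": LSF Batch may start the job
    fi
    return 1        # not ready: the job is not dispatched
}
```

Installed as a script, it would end with tape_check "$1" so that the function's status becomes the script's exit status.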

Job Dependencies

Some batch jobs depend on the results of other jobs. For example, a series of jobs could process input data, run a simulation, generate images based on the simulation output, and finally, record the images on a high-resolution film output device. Each step can be performed only after the previous step completes, and all subsequent steps must be aborted if any step fails.

The -w depend_cond option to the bsub command specifies a dependency condition, which is a logical expression based on the execution states of preceding batch jobs. When the depend_cond expression evaluates to TRUE, the batch job can be started. Complex conditions can be written using the logical operators ‘&&’ (AND), ‘||’ (OR), ‘!’ (NOT) and parentheses ‘()’.

If any of the batch jobs named in the dependency condition is not found, bsub fails and the job is not submitted.

Inter-job dependency scheduling can be based on a specific job exit status, so that a suitable recovery job can be initiated for specific types of job failure. The exit condition in the dependency string (specified with the -w option of bsub) can be triggered by particular exit codes of the job being depended on. Relational operators can be used when a job needs to be triggered by a range of exit codes.
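The film-pipeline example above can be sketched as a chain of bsub submissions. The job names, the step scripts, and the meaning of exit code 3 are all hypothetical; BSUB defaults to the real bsub command and can be set to another command for a dry run. Each later step uses done() on its predecessor, so if a step fails, the steps after it never start; the last submission illustrates an exit-code-triggered recovery job.

```shell
# Sketch: submit the four-step pipeline as dependent batch jobs.
# Job names and scripts are hypothetical; BSUB overrides bsub for dry runs.
submit_pipeline() {
    b=${BSUB:-bsub}
    # Submit in order so each job named in a -w condition already exists.
    "$b" -J prep   ./prep_input.sh                        || return 1
    "$b" -J sim    -w "done(prep)"   ./run_simulation.sh  || return 1
    "$b" -J render -w "done(sim)"    ./make_images.sh     || return 1
    "$b" -J film   -w "done(render)" ./record_film.sh     || return 1
    # Hypothetical recovery job: starts only if the simulation exits
    # with code 3 (assumed here to mean "restart from checkpoint").
    "$b" -J resume -w "exit(sim, 3)" ./resume_simulation.sh
}
```

Submitting in order matters because bsub rejects a job whose dependency condition names a job that cannot be found.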

LSF Batch User’s Guide



If the expression string contains a space character, a logical operator, or parentheses, the string must be enclosed in single or double quotes (' or ") to prevent the shell from interpreting the special characters.

Batch jobs are identified by job ID number or job name. The job ID number is displayed by the bsub command when the job is submitted. The job name is a string specified by the -J job_name option.

In job dependency expressions, numeric job names must be enclosed in quotes. A numeric job name should therefore be doubly quoted, for example -w "'210'", because the UNIX shell treats -w "210" the same as -w 210.
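To see what actually reaches bsub once the shell has processed the quotes, the sketch below prints each argument in brackets on its own line. show_args is a hypothetical helper standing in for bsub; the expressions are taken from this section.

```shell
# Hypothetical helper: print every argument the shell passes, one per line.
show_args() { printf '[%s]\n' "$@"; }

# Quoted, the expression with spaces, && and () arrives as ONE argument.
show_args -w "done(312) && (started(Job2)||exit(Job3))"
# [-w]
# [done(312) && (started(Job2)||exit(Job3))]

# Doubly quoted: the shell strips only the outer quotes, so bsub
# still sees '210' and can treat it as a job name.
show_args -w "'210'"
# [-w]
# ['210']

# Singly quoted: the quotes are gone and 210 looks like a job ID.
show_args -w "210"
# [-w]
# [210]
```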

Job names refer to jobs submitted by the same user. If more than one of your jobs has the same name, the condition is tested on the last job submitted with that name. A wildcard character ‘*’ can be specified at the end of a job name to indicate all jobs matching the name. For example, jobA* matches jobA, jobA1, jobA_test, jobA.log, and so on. There must be at least one match.
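The trailing-wildcard rule amounts to a prefix match. The sketch below mimics the documented behavior with a shell case pattern; it is not LSF's implementation, and match_job_names is a hypothetical helper that is given the candidate job names directly (the "last job with that name" rule is not modeled).

```shell
# Sketch: list which job names a pattern like jobA* refers to.
# Returns non-zero when nothing matches ("at least one match" rule).
match_job_names() {
    pattern=$1
    shift
    prefix=${pattern%\*}            # drop a trailing '*', if any
    found=0
    for name in "$@"; do
        if [ "$pattern" = "$prefix" ]; then
            [ "$name" = "$pattern" ] || continue    # no wildcard: exact name
        else
            case $name in
                "$prefix"*) ;;                      # quoted prefix, then glob
                *) continue ;;
            esac
        fi
        echo "$name"
        found=1
    done
    [ "$found" -eq 1 ]
}
```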

The conditions that can be tested are:

started({jobID | jobName})

If the specified batch job has started running or has run to completion, the condition is TRUE; that is, the job is not in the PEND or PSUSP state, and also is not currently running the pre-execution command if the bsub -E option was specified.

done({jobID | jobName})

If the specified batch job has completed successfully and is in the DONE state, the condition is TRUE. Otherwise, it is FALSE.

exit({jobID | jobName})

If the specified batch job has terminated abnormally and is in the EXIT state, the condition is TRUE. Otherwise, it is FALSE.

exit({jobID | jobName}, [op] code)

If the specified job has terminated with the exit code specified by code, or with an exit code satisfying the relationship expressed by op code, the condition is TRUE. Otherwise, it is FALSE. When a batch job is killed while pending, it is assigned a special exit code of 512.

The op variable may be any of the relational operators ‘>’, ‘>=’, ‘<’, ‘<=’, ‘==’, ‘!=’. The code variable is numeric, representing a job exit code.

ended({jobID | jobName})

If the specified batch job has finished (either in the EXIT or DONE state), the condition is TRUE. Otherwise, it is FALSE.

{jobID | jobName}

Specifying only jobID or jobName is equivalent to done({jobID | jobName}). If the specified batch job has completed successfully and is in the DONE state, the condition is TRUE. Otherwise, it is FALSE.
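The conditions above reduce to simple tests on a job's state and exit code. The shell function below is a hypothetical model of those rules: dep_cond and its argument order are invented for illustration, and LSF evaluates the real expressions itself. It assumes that exit() with a code, like plain exit(), can be TRUE only for a job in the EXIT state, and that an omitted op means equality.

```shell
# Model (not LSF code) of the dependency conditions.
# dep_cond COND STATE [EXITCODE [OP] CODE]
#   COND   started | done | exit | ended   (a bare job ID/name means done)
#   STATE  PEND, PSUSP, RUN, USUSP, SSUSP, DONE or EXIT
# Returns 0 for TRUE, 1 for FALSE, 2 for bad input.
# Not modeled: started() while the pre-execution command is still running.
dep_cond() {
    cond=$1 state=$2
    case $cond in
        started) [ "$state" != PEND ] && [ "$state" != PSUSP ] ;;
        done)    [ "$state" = DONE ] ;;
        ended)   [ "$state" = DONE ] || [ "$state" = EXIT ] ;;
        exit)
            # Assumption: exit() with or without a code requires EXIT.
            [ "$state" = EXIT ] || return 1
            [ "$#" -ge 4 ] || return 0          # plain exit(): state decides
            actual=$3
            if [ "$#" -ge 5 ]; then op=$4 code=$5; else op='==' code=$4; fi
            case $op in
                '>')  [ "$actual" -gt "$code" ] ;;
                '>=') [ "$actual" -ge "$code" ] ;;
                '<')  [ "$actual" -lt "$code" ] ;;
                '<=') [ "$actual" -le "$code" ] ;;
                '==') [ "$actual" -eq "$code" ] ;;
                '!=') [ "$actual" -ne "$code" ] ;;
                *)    return 2 ;;
            esac ;;
        *) return 2 ;;
    esac
}
```

For instance, dep_cond exit EXIT 25 '<' 30 succeeds, matching a condition of the form exit(job, < 30) for a job that terminated with exit code 25, and a job killed while pending would be passed as EXIT with code 512.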

Job Dependency Examples

done(312) && (started(Job2)||exit(Job3))

The submitted job will not start until job 312 has completed successfully, and either the job named Job2 has started or the job named Job3 has terminated abnormally.

1532 || jobName2 || ended(jobName3*)

The submitted job will not start until either job 1532 has completed successfully, the job named jobName2 has completed successfully, or all jobs with names beginning with jobName3 have finished.

exit (34334, 12)

The submitted job will not start until job 34334 finishes with an exit code of 12.

exit (myjob, < 30)

The submitted job will not start until myjob finishes with an exit code lower than 30.

Note

If you require more extensive dependencies, for example, calendar or event
