First, make sure that the Sun Grid Engine daemons are running. To look for the cod_qmaster, cod_schedd and cod_commd daemons on the master machine,
log in to the master host and execute the UNIX command ps -ax if the master host
runs a BSD-based UNIX, or ps -ef if the master host's UNIX is SYSV-based. Parse
through the output of ps and look for the string cod_qmaster (and likewise cod_schedd and cod_commd). If you do not find
lines like the following (BSD case):
    14673 p1 S < 2:12 /usr/CODINE/bin/sun4/cod_commd
    14676 p1 S < 4:47 /usr/CODINE/bin/sun4/cod_qmaster
    14678 p1 S < 9:22 /usr/CODINE/bin/sun4/cod_schedd
or (in the SYSV case) like:
    root 439 1 0 Jun 22 ? 3:37 /usr/CODINE/bin/sgi/cod_commd
    root 442 1 0 Jun 22 ? 3:37 /usr/CODINE/bin/sgi/cod_qmaster
    root 446 1 0 Jun 22 ? 3:37 /usr/CODINE/bin/sgi/cod_schedd
then one or more of the Sun Grid Engine daemons required on the master host are not running on this machine (check the file
<codine_root>/<cell>/common/act_qmaster_name to verify that you really are on the
master host). You can try to restart the daemons by hand; section "Sun Grid Engine Daemons and Hosts" on page 56 describes how to proceed.
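The master-host check above can be scripted. The following is a minimal POSIX shell sketch and is not part of Sun Grid Engine. The helper names `short_name`, `is_master_host`, `missing_daemons` and `ps_listing` are hypothetical. The sketch assumes that `CODINE_ROOT` points at <codine_root> and that the cell is named `default` unless `CELL` is set. It first compares the local host name with the one recorded in act_qmaster_name, then reports which of the three master daemons are absent from the ps output.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with Sun Grid Engine.

# Print the part of a host name before the first dot.
short_name() { echo "${1%%.*}"; }

# Succeed if the local host is the one named in act_qmaster_name.
# Assumes CODINE_ROOT is set; the cell defaults to "default" unless CELL is set.
is_master_host() (
    f="$CODINE_ROOT/${CELL:-default}/common/act_qmaster_name"
    [ -r "$f" ] || { echo "cannot read $f" >&2; exit 2; }
    master=$(head -n 1 "$f")
    [ "$(short_name "$master")" = "$(short_name "$(uname -n)")" ]
)

# Read a ps listing on stdin; print each daemon named in the arguments
# whose binary path ("/<daemon>" followed by a blank or end of line)
# does not appear. Works for both the BSD and SYSV layouts shown above.
missing_daemons() (
    listing=$(cat)
    for d in "$@"; do
        printf '%s\n' "$listing" | grep -Eq "/$d( |\$)" || echo "$d"
    done
)

# Simplification: try SYSV-style "ps -ef" first, fall back to BSD "ps -ax".
ps_listing() { ps -ef 2>/dev/null || ps -ax; }

# Usage on the master host:
#   is_master_host || echo "act_qmaster_name names another host" >&2
#   ps_listing | missing_daemons cod_commd cod_qmaster cod_schedd
```

An empty result from `missing_daemons` means all three daemons were found. Anything it prints is a daemon to restart by hand.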
    % qconf -ah admin_host_name[,...]
    % qconf -as submit_host_name[,...]
Chapter 2 Installation and Administration Guide 53
To look for the daemons required on the execution machines, log in to each execution host on which the Sun Grid Engine execution host installation procedure was run. Again, execute ps and look for the string cod_execd in the output. If you do not
find lines like the following (BSD case):
    14685 p1 S < 1:13 /usr/CODINE/bin/sun4/cod_commd
    14688 p1 S < 4:27 /usr/CODINE/bin/sun4/cod_execd
or (in the SYSV case) like:
    root 169 1 0 Jun 22 ? 2:04 /usr/CODINE/bin/sgi/cod_commd
    root 171 1 0 Jun 22 ? 7:11 /usr/CODINE/bin/sgi/cod_execd
then one or more of the daemons required on the execution host are not running. Again, section "Sun Grid Engine Daemons and Hosts" on page 56 describes how to restart the daemons by hand.
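Logging in to every execution host becomes tedious on larger clusters. The guide does not describe the following alternative: a sketch that runs ps on each execution host through rsh from the master host and reports any host on which cod_commd or cod_execd is missing. It assumes that rsh to each host works without a password (the guide later uses the same rsh access for its date test) and that `ps -ef` is valid on every remote host. The function `check_exec_hosts` and the host names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical sweep over execution hosts; not part of Sun Grid Engine.
# Assumes password-less rsh to each host and SYSV-style "ps -ef" on the
# remote side (substitute "ps -ax" for BSD execution hosts).

# Print "<host>: missing <daemon>" for each required daemon not found.
check_exec_hosts() (
    for host in "$@"; do
        # If rsh itself fails, its error text lands here and both daemons
        # are reported missing for that host.
        listing=$(rsh "$host" ps -ef 2>&1)
        for d in cod_commd cod_execd; do
            # Match the daemon's binary path followed by a blank or end of line.
            printf '%s\n' "$listing" | grep -Eq "/$d( |\$)" \
                || echo "$host: missing $d"
        done
    done
)

# Example (hypothetical host names):
#   check_exec_hosts exec1 exec2 exec3
```

Silence means every listed host runs both daemons. A reported host still has to be fixed locally, as section "Sun Grid Engine Daemons and Hosts" describes.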
If all the necessary daemons are running on the master and execution hosts, the Sun Grid Engine system should be operational. You can check whether Sun Grid Engine accepts commands by simply typing:

    % qconf -sconf

from the command line when logged into either the master host or another administrative host (do not forget to add the directory where you installed the Sun Grid Engine binaries to your standard search path). This qconf command
displays the current global cluster configuration (see section "Cluster Configuration" on page 70).
If this command fails, the most likely cause is that either your CODINE_ROOT environment
variable is set incorrectly or qconf cannot contact the cod_commd associated
with cod_qmaster. In this case, check whether the script files
<codine_root>/<cell>/common/settings.csh or
<codine_root>/<cell>/common/settings.sh set the environment variable COMMD_PORT. If so, make sure that COMMD_PORT
is set to that particular value before you try the above command again. If
COMMD_PORT is not set in the settings files, the services database (e.g. /etc/services or the NIS services map) on the machine where you executed the
command must provide a cod_commd entry. If this is not the case, add such
an entry to the machine's services database and give it the same value as is configured on the Sun Grid Engine master host. Then retry the qconf command.
Before you start submitting batch scripts to the Sun Grid Engine system, check whether your site's standard and your personal shell resource files (.cshrc,
.profile or .kshrc) contain problematic commands such as stty (batch jobs do not
54 Sun Grid Engine July 2001
have a terminal connection by default and, therefore, calls to stty will result in an
error). An easy way to check this is to log in to the master host and execute the command:

    % rsh an_exec_host date

Here an_exec_host is one of the already installed execution hosts you are going to use (if your login and/or home directories differ from host to host, check all execution hosts). The rsh command should produce output very similar to that of the date command executed locally on the master host. If there are any additional lines
containing error messages, their causes must be removed before a batch job can run successfully.
For all command interpreters, you can check for an actual terminal connection before you execute a command such as stty. The following is a Bourne/Korn shell example of
how to do this:

    tty -s
    if [ $? = 0 ]; then
        stty erase ^H
    fi

The C shell syntax is very similar:

    tty -s
    if ( $status == 0 ) then
        stty erase ^H
    endif

Note: The leading tty -s is an exception, as it causes no problems with batch
execution.
Now you are ready to submit batch jobs. First, try submitting one of the example scripts contained in the directory <codine_root>/examples/jobs. To
submit them, just use the command:

    % qsub script_path

and use the Sun Grid Engine qstat command to monitor the job's behavior (refer
to the Sun Grid Engine User's Guide for more information about submitting and monitoring batch jobs). As soon as the job has finished execution, check your home directory for the redirected stdout/stderr files <script_name>.o<job_id> and
<script_name>.e<job_id>, where <job_id> is a unique integer number assigned
consecutively to each job.
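The output-file naming scheme described above can be checked with a small helper once qstat no longer lists the job. The sketch below is hypothetical: `check_job_output` is not a Sun Grid Engine command. It takes the script name and the job ID reported at submission and looks in $HOME unless a directory is given. It assumes the output files use the script's base name.

```shell
#!/bin/sh
# Hypothetical helper: report the redirected output files of a finished job.
# Arguments: script name, job ID, optional directory (defaults to $HOME).
# Relies only on the naming scheme <script_name>.o<job_id> / <script_name>.e<job_id>.
check_job_output() (
    name=$(basename "$1")   # assumption: files are named after the script's base name
    id=$2
    dir=${3:-$HOME}
    rc=0
    for kind in o e; do     # .o = stdout, .e = stderr
        f="$dir/$name.$kind$id"
        if [ -f "$f" ]; then
            echo "found $f"
        else
            echo "missing $f"
            rc=1
        fi
    done
    exit $rc
)

# Usage, after qstat no longer shows the job:
#   check_job_output <codine_root>/examples/jobs/<script> <job_id>
```

A non-zero exit status means at least one of the two files is absent. An empty .e file is still reported as found, so inspect its contents separately.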
In case of problems, see section "Trouble Shooting" on page 158.
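The qconf failure case described earlier involves two checks: whether the settings file sets COMMD_PORT, and otherwise whether the services database has a cod_commd entry. These can be combined in one diagnostic. The following is a sketch under stated assumptions:

- It reads settings.sh only, not settings.csh.
- It expects an assignment of the form COMMD_PORT=<number>.
- It consults a services file (default /etc/services) but not the NIS services map. On NIS clients, `ypcat services` lists that map.

The function name `commd_port_source` is hypothetical. It does not compare the port with the master host's value; that comparison remains a manual step.

```shell
#!/bin/sh
# Hypothetical diagnostic; not part of Sun Grid Engine.
# Arguments: path to settings.sh, optional services file (default /etc/services).
# Prints where the cod_commd port comes from, or fails if neither source sets it.
commd_port_source() (
    settings=$1
    services=${2:-/etc/services}
    # Assumption: settings.sh assigns the port as COMMD_PORT=<number>.
    port=$(sed -n 's/^[[:space:]]*COMMD_PORT=\([0-9][0-9]*\).*/\1/p' "$settings" 2>/dev/null | head -n 1)
    if [ -n "$port" ]; then
        echo "settings: COMMD_PORT=$port (export it before running qconf)"
        exit 0
    fi
    # Otherwise the services database must provide a cod_commd entry
    # such as "cod_commd  <port>/tcp".
    port=$(awk '$1 == "cod_commd" { split($2, p, "/"); print p[1]; exit }' "$services" 2>/dev/null)
    if [ -n "$port" ]; then
        echo "services: cod_commd=$port"
        exit 0
    fi
    echo "no COMMD_PORT in $settings and no cod_commd entry in $services" >&2
    exit 1
)

# Example:
#   commd_port_source "$CODINE_ROOT/<cell>/common/settings.sh"
```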