The Configured Nodes list box displays those nodes currently classified as "configured" and listed in the cluster config file with the node tag. This is the list you will typically be referring to most often when using beosetup, as these nodes are the ones that are formally treated as members of the cluster.

The columns of information that can be displayed for each node are described below. You can sort and reverse-sort the node list by the contents of any column, by simply clicking on the column heading. You can also specify which of the columns to display through the beosetup Preferences dialog box.

ID

This column displays the node’s assigned number in the cluster. The node numbers are assigned based on the node’s position in the cluster config file (the first node is given #0, the second is given #1...). While the numbers may appear somewhat arbitrary, they become very important once you consider what it might take to perform maintenance activities on a particular node in a 128 node cluster. If beosetup shows an error on node #56, how do you find it? A description of why beosetup treats these numbers as important is found in the section on numbering your cluster’s nodes.

Ethernet Hardware (MAC) Address

This column displays the node’s Ethernet hardware (MAC) address. A node’s address is stored with its associated entry in the cluster configuration file. As described below, a pop-up menu is available in this list box with an option to modify a node’s Ethernet hardware address.
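The relationship between a node's position in the config file, its ID, and its stored hardware address can be sketched roughly as follows. This is only an illustration: the exact line format (`node` followed by a MAC address), the sample addresses, and the `parse_nodes` helper are assumptions made for the example, not a description of how beosetup itself reads the file.

```python
# Hypothetical config excerpt; the "node <MAC>" line format is an assumption.
SAMPLE_CONFIG = """\
# comment lines and unrelated settings are skipped by this sketch
node 00:50:45:AA:BB:01
node 00:50:45:AA:BB:02
node 00:50:45:AA:BB:03
"""


def parse_nodes(config_text):
    """Return (node_id, mac) pairs; IDs follow the order of the node lines."""
    nodes = []
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "node":
            # The ID is simply the entry's position among the node lines.
            nodes.append((len(nodes), fields[1].lower()))
    return nodes


for node_id, mac in parse_nodes(SAMPLE_CONFIG):
    print(f"node #{node_id}: {mac}")
```

Because the ID comes purely from position, reordering the entries in such a file would renumber the nodes, which is why the order matters.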

IP Address

This column displays the node’s IP address on the cluster’s internal network interface. This number is computed from the node number and the starting address as defined by the cluster’s IP address range. For a given row in the list (i.e., for a given node number), the IP address will remain the same until the IP address range is modified. The IP address range can be changed using the Network Properties panel of the Configuration dialog box.
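As a minimal sketch of that computation, assuming the address is simply the start of the range plus the node number (the text does not spell out the exact rule), the mapping could look like this. The `node_ip` function and the sample range start are hypothetical.

```python
import ipaddress


def node_ip(range_start, node_number):
    """Assumed rule: first address of the range plus the node number."""
    return ipaddress.IPv4Address(range_start) + node_number


# Example with a made-up range starting at 192.168.1.100.
print(node_ip("192.168.1.100", 0))   # 192.168.1.100
print(node_ip("192.168.1.100", 56))  # 192.168.1.156
```

Under this assumption, a node keeps its address for as long as both its number and the range start stay the same, which matches the behavior described above.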

State

This column displays the current state of the node as reported by BProc, the Beowulf unified process space software. The possible values for a node’s state are as follows:

down — node is not communicating with the master

unavailable — node has been marked unavailable or off-line

error — node encountered an error during boot

up — node is operating normally and is on-line

boot — node has started, but not yet completed booting

Node states are also indicated by colored highlights. You can specify a color for each node state in the beosetup Preferences dialog box.

Keep in mind that these states only indicate the condition of the node as reported by the BProc daemons. In turn, the BProc daemons are only capable of determining a node’s state based on the messages communicated (or not communicated, as the case may be) between a node and the master. For example, just because a node’s state is not listed as error doesn’t mean there isn’t some undetected hardware problem occurring within the node. These states are only indicative of how the BProc software sees the node. A pop-up menu is available in this list box with options for taking nodes on-line and off-line.

User

This column displays the user currently assigned as owner of the node. The value can be a user name, a numeric user id, or the Scyld-defined all-encompassing user "any." The user permission setting for a node is analogous to the user ownership feature of files in Linux and controls who is allowed to use the node. As described below, a pop-up menu is available in this list box with an option for changing a node’s user. A node’s user data is stored with its associated entry in the cluster configuration file.

Group

This column displays the group currently assigned as owner of the node. The value can be a group name, a numeric group id, or the Scyld-defined all-encompassing group "any." The group permission setting for a node is analogous to the group ownership feature of files in Linux and controls who is allowed to use the node. As described below, a pop-up menu is available in this list box with an option for changing a node’s group. A node’s group data is stored with its associated entry in the cluster configuration file.

Mode

This column displays the execute permissions, which are "user", "group" or "all".
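To make the interplay of User, Group, and Mode more concrete, here is a small sketch of how such a check might work. The rule it encodes is an assumption based on the file-ownership analogy above ("all" admits everyone, "group" admits the owning user or members of the owning group, "user" admits only the owning user); the `NodePerms` type and `may_execute` function are hypothetical and are not part of BProc or beosetup.

```python
from dataclasses import dataclass

ANY = "any"  # the Scyld-defined wildcard owner described in the text


@dataclass
class NodePerms:
    user: str   # owning user name, or "any"
    group: str  # owning group name, or "any"
    mode: str   # "user", "group" or "all"


def may_execute(perms, user, groups):
    """Assumed semantics, modeled on Linux file ownership (not confirmed)."""
    user_ok = perms.user in (ANY, user)
    group_ok = perms.group == ANY or perms.group in groups
    if perms.mode == "all":
        return True
    if perms.mode == "group":
        return user_ok or group_ok
    if perms.mode == "user":
        return user_ok
    raise ValueError(f"unknown mode: {perms.mode!r}")


# Made-up users and groups for illustration.
node = NodePerms(user="alice", group="hpc", mode="group")
print(may_execute(node, "alice", ["staff"]))  # owner of the node
print(may_execute(node, "bob", ["hpc"]))      # member of the owning group
print(may_execute(node, "carol", ["staff"]))  # neither owner nor member
```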

Numbering Your Cluster Nodes

Depending on the size of your cluster, you may or may not pay a great deal of attention to how the individual nodes are numbered. After all, if your cluster is just a handful of commodity PCs sitting on some shelves, what’s the big deal? Even a single rack of 8, 16, or 32 nodes may not seem like such a headache. But what about the lab with clusters containing 128 nodes or more? How would you figure out what box to examine if you only knew that node #37 was having problems? Luckily, beosetup was designed to make managing clusters of all sizes easy by paying careful attention to how nodes are numbered and, once they are set up, to maintaining those numbers.

When first setting up your cluster, it’s recommended that you power up your compute nodes one-by-one in some logical order relative to how your hardware is arranged. In this way, for example, you’ll know that the first box at the top of the rack is node #0, the second box from the top is node #1...and so on. Once you have all nodes in your cluster powered on and numbered (and presumably labeled), beosetup will maintain the node numbering of all the up nodes in your cluster as other nodes in the cluster are inserted or deleted. An example helps illustrate this.

Let’s say your cluster contains 3 compute nodes. Node #0 is currently powered off and down, while nodes #1 and #2 are up and running. If you delete node #0, its address is changed to off and it’s left in the list as a placeholder so nodes #1 and #2 maintain their current node numbers. As nodes are inserted, moved around or deleted, beosetup takes care to ensure the assigned node numbers for all up nodes are maintained. Refer to specific operations described below for further details on what effect they have on a cluster’s node numbers.
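The placeholder behavior in the 3-node example can be sketched as follows. This is a simplified model rather than beosetup's actual implementation: the node list is represented as a plain Python list of addresses indexed by node number, and the `delete_node` helper and sample addresses are hypothetical.

```python
OFF = "off"  # placeholder value described in the text


def delete_node(macs, node_id):
    """Return a copy where the deleted node's address becomes 'off'.

    The entry stays in place, so every other node keeps its index (its ID).
    """
    updated = list(macs)
    updated[node_id] = OFF
    return updated


# Node #0 is down; nodes #1 and #2 are up (made-up addresses).
macs = ["00:50:45:aa:bb:01", "00:50:45:aa:bb:02", "00:50:45:aa:bb:03"]
after = delete_node(macs, 0)

for node_id, mac in enumerate(after):
    print(f"node #{node_id}: {mac}")
```

Removing the entry outright would instead shift the later entries down by one, renumbering nodes #1 and #2, which is exactly what the placeholder avoids.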
