
General Theory

The back end of a CLARiiON array uses many of the same principles as the front end with respect to data flow. Read the following section in a general context, then apply that knowledge to the back-end loop.

If you start by looking at what a storage area network is, you find that it is a collection of Fibre Channel or iSCSI nodes that usually communicate with each other over some type of medium, such as fiber optic or copper wire. A node is a member of the Fibre Channel network that is given a physical and logical connection to the network by a physical port on the switch.

Every node requires the use of specific drivers to access this network. For example, on a host, one has to install an HBA and the corresponding drivers to implement FCP or iSCSI. These drivers are responsible for translating fibre channel or IP commands into something the host can understand (such as SCSI commands) and vice versa.

The Fibre Channel nodes communicate with each other using a device such as a Fabric Switch. The primary function of a fabric switch is to provide a physical connection and logical routing of data frames between the attached devices. A switch also provides fabric services to the nodes attached to it to allow them to communicate with each other within a fabric. In the absence of a switch, arbitrated-loop or point-to-point communication is in use (discussed later).

A SAN usually consists of one or more of the following components

Fibre Channel (FC) is used as the transport protocol in most SAN implementations. It is a serial protocol that can use either copper or optics as the physical medium. Earlier implementations of FC used copper, while most modern implementations use fiber optic cables. The CLARiiON storage system front end uses optical media, while the back end uses copper media.

CLARiiON Backend Arbitrated Loop

The ANSI Fibre Channel Standard defines three topologies: Point-to-point, Switched fabric (FC-SW) and Arbitrated loop (FC-AL). The following describes each of these. The CLARiiON back end operates on the Arbitrated loop (FC-AL) topology with some differences dependent upon the type of disk enclosure in use.

Backend data flow

How does this relate to the backend of a CLARiiON Storage System?

Data flow through each enclosure type

FC-series data flow

The backend loop data flow for the FC4700, FC5700, FC5300, and FC5500 is shown only to illustrate the difference from the newer CX/CX3-series arrays. The principles for identifying a failing device are the same, but the data flow is different.

CX data flow

The CX-series data flow with DAE-2

Using the information in the above two diagrams, you can find the failing device. For a standard DAE2 enclosure, identifying the failing device is fairly straightforward. In the simplest troubleshooting approach, the device reporting the highest number of errors is just the messenger; the problem device is normally the one just prior to (upstream of) the device reporting the errors.
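As a rough illustration of the "messenger" rule for a standard DAE2 loop, the sketch below takes the devices in data-flow order, finds the one reporting the most errors, and returns the device just before it. The device names and error counts are hypothetical inputs; the real data-flow order has to be worked out from the enclosure type, and the rule does not transfer directly to ATA or Ultrapoint enclosures.

```python
from typing import Optional, Sequence, Tuple


def likely_failing_device(loop: Sequence[Tuple[str, int]]) -> Optional[str]:
    """Return the device just upstream of the one reporting the most errors.

    `loop` holds hypothetical (device_name, error_count) pairs listed in
    data-flow order. An arbitrated loop is a ring, so the device upstream of
    the first entry is the last entry.
    """
    if not loop or all(count == 0 for _, count in loop):
        return None  # nothing is reporting errors
    # The device with the highest count is usually only the messenger.
    messenger = max(range(len(loop)), key=lambda i: loop[i][1])
    return loop[(messenger - 1) % len(loop)][0]


# Hypothetical bus: SP-A port followed by three DAE2 enclosures.
bus = [("SP-A", 0), ("Encl 0", 0), ("Encl 1", 2), ("Encl 2", 57)]
print(likely_failing_device(bus))  # Encl 1
```

The modulo on the index reflects that the loop is a ring: if the SP port itself reports the most errors, the last enclosure on the loop is the upstream suspect.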

The consideration of ‘upstream’ and ‘downstream’ devices has to take into account the type of disk array enclosure in use.

The above two examples cover the older FC-series DAE and the original CX-series DAE2 enclosures. The next section covers the ATA disk enclosure.

ATA (Advanced Technology Attachment) Disk Enclosures

The ATA disk enclosure is the same DAE2-type enclosure used with FC drives in the CX-series arrays; it is not available on the FC-series arrays. The difference is that the ATA enclosure uses PATA drives, each with a paddle card that converts it to a serial interface so that it can use the same midplane structure. In addition, the LCC (Link Control Card) is replaced with a more intelligent BCC (Bridge Control Card), codenamed Klondike. For marketing purposes the BCC is also called an LCC, so be clear about which type of ‘LCC’ you are working on when requesting assistance.

The firmware (FRUMON code) that resides on the BCC is codenamed Yukon.

ATA Disk Ownership

Upon power-up, each LCC validates the data path to every disk drive to set the initial ownership values in the FRUMON drive bypass register. FRUMON initiates the validation request by first reading the drive present register.

This information is passed to an LCC redundant-controller process that coordinates the validation of all drives from both controllers. Once all paths have been validated and the internal database is updated, the corresponding FC AL_PA is enabled and commands can be processed for that disk drive.

Ownership of each disk path is established after validation completes, with every other disk drive ‘owned’ by the local controller. During operation, the LCC attempts to change ownership if a majority of the commands for a disk are received on the controller that does not ‘own’ it; this minimizes command latency while maximizing the bandwidth of the controller. The check is performed every five minutes and is triggered when more than 50% of the I/Os traverse the midplane.
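The ownership rules above can be sketched as two small functions: one that alternates initial ownership between the two controllers, and one that applies the five-minute, more-than-50% re-evaluation. This is only a model of the described policy, not FRUMON code. The controller labels "A"/"B" and the choice of even slots for "A" are assumptions; the text says only that every other drive is owned by the local controller.

```python
from typing import Dict


def initial_owner(slot: int) -> str:
    """Alternate initial ownership between the two LCCs.

    Assumption: even slots go to controller "A" and odd slots to "B".
    """
    return "A" if slot % 2 == 0 else "B"


def reevaluate_owner(owner: str, commands_received: Dict[str, int]) -> str:
    """Run once per five-minute interval for one drive.

    commands_received maps controller -> commands for this drive received on
    that controller during the interval. Commands received on the peer had
    to cross the midplane; if they exceed 50%, ownership moves to the peer.
    """
    peer = "B" if owner == "A" else "A"
    total = sum(commands_received.values())
    if total and commands_received.get(peer, 0) / total > 0.5:
        return peer
    return owner


print([initial_owner(s) for s in range(4)])       # ['A', 'B', 'A', 'B']
print(reevaluate_owner("A", {"A": 40, "B": 60}))  # B
print(reevaluate_owner("A", {"A": 50, "B": 50}))  # A
```

An exact 50/50 split leaves ownership unchanged because the threshold is strictly greater than 50%.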

Because SATA is a point-to-point connection, the LCC uses a dedicated inter-controller link between the controllers for dynamic dual-path capability. The FC inter-controller link is designed to behave much like a second host port on each controller, except that it must also act as an initiator to send commands and messages to the peer controller.

Dynamic dual path also allows controller ‘ownership’ of a SATA disk drive, which simplifies management of the data path and avoids cache-coherency issues because all data for a given disk drive is managed by a single controller.

When troubleshooting an ATA enclosure, it is important to understand that the ‘owning’ BCC for a drive exhibiting issues may be the peer BCC. Starting with R19 Base Software, you can view the BCC event logs. Because of the nature of these logs, they should be gathered and provided for review as part of the normal analysis process. Power cycling or resetting the BCC clears the event log, so obtain this log before any recovery activity.

To accomplish this, run an FCLI command to retrieve these logs. FCLI is a powerful tool, so use caution when executing commands. FLARE R19 includes a new command that retrieves the ATA BCC (aka LCC) event log and includes it in the SPcollect information. This command was added to improve diagnosis of some of the issues seen with ATA (Klondike) enclosures; in many cases it is the only means of diagnosing ATA BCC behavior.

Since each ATA enclosure contains two BCCs, it is important to collect logs from both to allow proper diagnosis.

Retrieve the ATA BCC event logs before making any changes, and certainly before resetting, reseating, or power-cycling the ATA BCCs, because the event logs are cleared upon reset. The retrieval command must be run before executing SPcollect. It retrieves the log from the ATA BCC or BCCs attached to the SP on which it is executed, so always collect the logs from both ATA BCCs in an ATA enclosure (that is, execute the log retrieval command from the FCLI prompt on both SPs).

- Use EMCRemote to connect to one of the storage processors (SPs).

- From the command line, enter:

flarecons d f a (when connected to SP-A)
flarecons d f b (when connected to SP-B)

- If you can determine which enclosure or ATA BCC on the particular backend bus is operating incorrectly, retrieve the logs from it using the following command (executed on each SP):

lccgetlog -e <enclosure_number>

Example: lccgetlog -e rmb1

- If you cannot determine which enclosure or ATA LCC is the problem, or you suspect that multiple enclosures are misbehaving, use the following command (executed on each SP) to retrieve event log information from all ATA LCCs:

lccgetlog -all

- Disconnect from FCLI mode: CTRL-C

- Note: To exit FCLI mode, you must use CTRL-C. If you type "Quit" while in FCLI mode, the SP returns to serial mode, and the only way to get back into FCLI mode is to break into debugger mode and restart FLARE, or to reseat the SP.

- Repeat the procedure for the other storage processor.

- After gathering all the logs with this command, run SPcollect as usual (emc60493).
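Because the order of these steps matters (both SPs, logs before SPcollect, CTRL-C rather than Quit), a hypothetical checklist helper can print them in sequence. The function name and step wording are illustrative only; the actual commands in the list are the ones given in the procedure above, and the sketch does not connect to anything.

```python
from typing import List, Optional


def bcc_log_steps(enclosure: Optional[str] = None) -> List[str]:
    """List the ATA BCC log-collection steps for both SPs, in order.

    `enclosure` is the suspect enclosure if known; otherwise the logs are
    pulled from all ATA LCCs with `lccgetlog -all`.
    """
    getlog = f"lccgetlog -e {enclosure}" if enclosure else "lccgetlog -all"
    steps: List[str] = []
    for sp in ("a", "b"):
        steps += [
            f"EMCRemote to SP-{sp.upper()}",
            f"flarecons d f {sp}",
            getlog,
            "CTRL-C to leave FCLI (do not type Quit)",
        ]
    # The logs must be retrieved before SPcollect runs.
    steps.append("run SPcollect as usual")
    return steps


for step in bcc_log_steps():
    print(step)
```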

Note that the command takes about 1-2 seconds to retrieve the log. If the retrieval command fails, an “LCC Yukon Getlog Failed for Encl:” message appears in the event log along with the enclosure number. A failure usually means the ATA BCC is hung or otherwise unable to return the logs. If the customer situation allows, reseat the ATA BCC or power-cycle the enclosure to clear the hang. Reseating the BCC or resetting the chassis clears the logs, so further log retrieval is not possible.
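When reviewing a long event log, a simple scan for the failure message quickly lists the enclosures whose BCC did not return its log. This is a sketch under an assumption: that the enclosure number follows "Encl:" as a decimal integer. The exact layout of the event log line is not given here, so adjust the pattern to the real output.

```python
import re
from typing import Iterable, List

# Assumption: the enclosure number follows "Encl:" as a decimal integer.
GETLOG_FAILED = re.compile(r"LCC Yukon Getlog Failed for Encl:\s*(\d+)")


def failed_getlog_enclosures(event_log: Iterable[str]) -> List[int]:
    """Return the enclosures whose BCC log retrieval failed (likely hung BCCs)."""
    found = set()
    for line in event_log:
        match = GETLOG_FAILED.search(line)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


sample = [  # hypothetical event log lines
    "hypothetical prefix LCC Yukon Getlog Failed for Encl: 2",
    "unrelated event",
]
print(failed_getlog_enclosures(sample))  # [2]
```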

ATA BCC LEDs

There are five LED housings mounted at the air dam so that their LEDs are visible through holes in the air dam. Two are for cable connect status, two are for loop ID, and one is for fault and power indication. All LEDs are green except for the fault LED, which is amber.

Cable on Loop LED

When on, these LEDs indicate that there is a valid Fibre Channel signal on the receive side of the cable and that the cable is configured onto the Fibre Channel loop. The cable is taken off the loop if there is no valid Fibre Channel signal on the receive side of the cable, or if FRUMON has been told to take the cable off the loop. A single error is not sufficient to take the cable off the loop: the Fibre Channel protocol is robust enough to handle errors that occur on the loop, which prevents short error bursts from removing enclosures from the loop. Short error bursts can happen for many reasons, such as hot insertion and removal of FRUs.

Loop ID LEDs

Loop ID LEDs indicate which backend loop the enclosure is on. There are eight LEDs, four in each of the two LED housings. Each housing is a single LED wide and four LEDs high.

Power and Fault LEDs

The power and fault LEDs share the same housing and have built-in current-limiting resistors. When lit, the power LED indicates that 5 volts is active and that the board has been inserted far enough for the short pins (used in the board-inserted loop) to make contact. When lit, the fault LED indicates that something is wrong, that the board needs to be replaced, or that the board has not finished power-up testing and configuration.

Ultrapoint (Stiletto) Disk Array Enclosure – DAE2P/DAE3P Fibre Channel Data Path

Fibre Channel data flow in and out of every port is managed by a Cut-Through Switch (CTS). The CTS controls the Fibre Channel cable ports and the Fibre Channel disk drive ports. The LCC (codename Stiletto) architecture supports a total of 15 drives. The CTS routes I/O traffic to the appropriate drive port using the destination address of the data. This provides point-to-point connectivity from the host port of the switch to the respective drive port, so traffic is not routed through the preceding disk drives.
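The difference between classic loop forwarding and CTS routing can be shown with a toy model: on an arbitrated loop a frame passes every port ahead of its destination, while the CTS looks up the destination address and connects only the matching drive port. The slot names and addresses below are made up for illustration (they are not real AL_PA assignments), and the model ignores arbitration and return traffic.

```python
from typing import Dict, List


def loop_path(ports: List[str], dest: str) -> List[str]:
    """Classic arbitrated loop: a frame passes every port ahead of its destination."""
    return ports[: ports.index(dest) + 1]


def cts_path(route_table: Dict[int, str], dest_addr: int) -> List[str]:
    """Cut-through switch: look up the destination address and connect only
    that drive port; every other drive stays bypassed."""
    return [route_table[dest_addr]]


# 15 drive slots; the addresses are made up, not real AL_PA assignments.
slots = [f"slot{i}" for i in range(15)]
routes = {0x10 + i: name for i, name in enumerate(slots)}

print(loop_path(slots, "slot3"))  # ['slot0', 'slot1', 'slot2', 'slot3']
print(cts_path(routes, 0x13))     # ['slot3']
```

The practical consequence for troubleshooting is that, with a CTS, a drive only sits in the data path for I/O addressed to it.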

Fibre Channel data flow in and out of the cable ports is managed by the CTS. Each loop's inputs and outputs consist of a Primary port and an Expansion port: the Primary port provides connectivity either to a host (SP) or to a previous chassis, and the Expansion port provides connectivity to a downstream chassis. A downstream or upstream chassis can be a DAE, a DPE, or a host (SP).

The input of each cable port has a link-monitoring capability that monitors loss of link, link errors, and link-level violations. Both digital and analog loss-of-link signals are provided.

Detected link errors include 8B/10B code violations, disparity errors, and Fibre Channel frame CRC32 errors. Fibre Channel link-level violations include link anomalies such as loss of sync and comma density violations.
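Disparity errors come from the running-disparity rule of the 8B/10B transmission code. The sketch below is a simplified software checker, not the LCC's retimer logic: it tracks running disparity per 6b and 4b sub-block and counts sub-blocks whose disparity conflicts with the current running disparity. It does not verify that each code group is in the 8B/10B code table, so other code violations are not detected. The K28.5 comma character is used as the test pattern.

```python
from typing import List, Tuple


def _step(rd: int, bits: str, neutral_pos: str, neutral_neg: str) -> Tuple[int, bool]:
    """Apply one sub-block to the running disparity (rd is -1 or +1).

    Returns the new running disparity and whether this sub-block is a
    disparity error (a +2/-2 sub-block with the same sign as rd, or a
    disparity outside the valid range).
    """
    d = bits.count("1") - bits.count("0")
    error = abs(d) > 2 or (d == 2 and rd > 0) or (d == -2 and rd < 0)
    if d > 0 or bits == neutral_pos:
        rd = 1
    elif d < 0 or bits == neutral_neg:
        rd = -1
    return rd, error


def count_disparity_errors(code_groups: List[str], rd: int = -1) -> int:
    """Count disparity errors in 10-bit code groups given in transmission
    order (abcdei fghj, no space), starting from running disparity rd."""
    errors = 0
    for group in code_groups:
        rd, err6 = _step(rd, group[:6], "000111", "111000")
        rd, err4 = _step(rd, group[6:], "0011", "1100")
        errors += int(err6) + int(err4)
    return errors


K28_5_NEG, K28_5_POS = "0011111010", "1100000101"  # comma character, RD- / RD+
print(count_disparity_errors([K28_5_NEG, K28_5_POS]))  # 0
print(count_disparity_errors([K28_5_NEG, K28_5_NEG]))  # 1
```

Sending the RD- form of K28.5 twice in a row is flagged because the second 6b sub-block raises disparity while the running disparity is already positive.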

Using the link input monitoring capability, the Fibre Channel data entering the Primary and Expansion ports is selected to route through the LCC when cables carrying valid Fibre Channel data are connected to those ports.

The diagram shows a bit more detail of how the CTS works. When an I/O is destined for a drive, that drive and only that drive is brought onto the backend bus.

All other devices are bypassed for the I/O on the data path. This eliminates the possibility that a single drive

How to troubleshoot an Ultrapoint backend bus using the ‘counters’

An FCLI command called lccgetstats is currently available. It retrieves and displays the counters.

The counters have limited functionality, but the information they provide is very important and useful when troubleshooting a Stiletto backend bus. The command can also be used on a Stiletto LCC installed in a mixed FC or ATA backend environment. The commands to use are:

fcli> lccgetstats -b #
fcli> lccgetstats -display

The following example is from a CX500 running R19 Base Software:

• EMCRemote into SPB and bring up flarecons on SPB (flarecons d f b)

• Execute the lccgetstats commands. The help screen follows.

fcli> lccgetstats
For the lccgetstats, more parameters are required.
Usage: lccgetstats -h
       lccgetstats <operation>
Help options:
Operations:
    -b        Get information for the specified bus
    -e        Get information for specified enclosure
    -display  Send the retrieved information to fcli console.

• Note: First issue the retrieval command, then issue a second command to display the results.

• In R19 code, the HELP screen is incorrect regarding command usage. It is corrected in current code releases.

• Issue the commands as shown. Backend BUS0 and BUS1 each have two Stiletto enclosures attached.

02/24/2006 19:36:49
fcli> lccgetstats -b 1
lccgetstats request sent.

fcli> lccgetstats -display

Enclosure 8, PRI_PORT, 02/24/2006 17:06:11
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x0 (Notice there is no status.)
Retimer and Monitor LCV Error Count: 0x0 (Notice there are no error counts.)
Retimer and Monitor CRC Error Count: 0x0
Expansion A: 0x0
Expansion B: 0x0
Expansion C: 0x0
Expansion D: 0x34

Enclosure 8, EXP_PORT, 02/24/2006 17:06:11
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x0 (Notice there is no status.)
Retimer and Monitor LCV Error Count: 0x0 (Notice there are no error counts.)
Retimer and Monitor CRC Error Count: 0x0
Expansion A: 0x0
Expansion B: 0x0
Expansion C: 0x0
Expansion D: 0x15

Enclosure 9, PRI_PORT, 02/24/2006 17:06:12
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x0 (Notice there is no status.)
Retimer and Monitor LCV Error Count: 0x0 (Notice there are no error counts.)
Retimer and Monitor CRC Error Count: 0x0
Expansion A: 0x0
Expansion B: 0x0
Expansion C: 0x0
Expansion D: 0x34

Enclosure 9, EXP_PORT, 02/24/2006 17:06:12
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x222 (Notice the status change.)
Retimer and Monitor LCV Error Count: 0xffff (Error count ‘ffff’: no cable on the EXP port.)
Retimer and Monitor CRC Error Count: 0x0
Expansion A: 0x0
Expansion B: 0x0
Expansion C: 0x0
Expansion D: 0x15

• Status 0x222 (see the table later in the document) indicates ‘DLOL’ (digital loss-of-link signal), ‘CDV’ (no characters seen within the frame clock period), and ‘LCV’ (rate errors have exceeded the frame clock period threshold). You will see this status when a cable is not connected, or is improperly connected, to the port.

• We pull the LCC cable from the expansion port of ENC 0_BUS1.

• Execute the lccgetstats commands:

02/24/2006 19:55:45
fcli> lccgetstats -b 1
lccgetstats request sent.

02/24/2006 19:55:45
fcli> lccgetstats -display

Enclosure 8, PRI_PORT, 02/24/2006 19:55:37
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x0 (Notice there is no status.)
Retimer and Monitor LCV Error Count: 0x0 (Notice there are no error counts.)
Retimer and Monitor CRC Error Count: 0x0

Enclosure 8, EXP_PORT, 02/24/2006 19:55:37
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x222 (Notice the status has changed.)
Retimer and Monitor LCV Error Count: 0xffff (Notice there are now error counts.)
Retimer and Monitor CRC Error Count: 0x0

• Notice that Enclosure 9 is no longer displayed, because it is no longer connected. The Expansion A through D counter lines have been removed from this example because those counters are not currently in use.

• Reconnect Enclosure 9 and execute the lccgetstats command. The previous buffer is shown first, followed by the new output. Each time you run the command, the counters are reset to zero, and the display shows the ‘previous’ results followed by the ‘current’ command results.

• Run the lccgetstats command several times to see whether errors are incrementing. On a bad bus, the first run of the command may show hundreds of errors.

• Execute the command again to clear the counters (you can display them if you wish). Then wait 5-10 minutes and rerun the command to see if the errors have incremented.

• Do this a few times to see whether errors continue. As with any backend bus troubleshooting effort, I/O is required, and it may take minutes or hours for errors to be generated. Suggestion: run a Background Verify on a LUN on the backend bus to generate I/O and help isolate the problem on that bus.
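The polling method above can be summarized in a few lines: because each lccgetstats run resets the counters, every reading after the first is already a delta, and the first reading only serves as a baseline. The sketch below assumes the LCV and CRC values have already been read off the display into a dictionary (parsing the console text is not shown), and it treats an LCV of 0xffff as "no link" only because that is what the example shows for an unconnected EXP port.

```python
from typing import Dict, List, Tuple

Port = Tuple[int, str]                  # (enclosure, "PRI_PORT" or "EXP_PORT")
Reading = Dict[Port, Tuple[int, int]]   # port -> (LCV count, CRC count)

NO_LINK_LCV = 0xFFFF  # value shown in the example when no cable is on the port


def assess(readings: List[Reading]) -> Dict[Port, str]:
    """Classify ports from successive lccgetstats readings.

    Each run resets the counters, so every reading is a delta since the
    previous one. The first reading may contain old, accumulated errors and
    is used only as a baseline.
    """
    verdict: Dict[Port, str] = {}
    for reading in readings[1:]:
        for port, (lcv, crc) in reading.items():
            if lcv == NO_LINK_LCV:
                verdict[port] = "no link: check the cable"
            elif lcv or crc:
                verdict.setdefault(port, "errors incrementing")
    return verdict


readings = [  # hypothetical readings taken 5-10 minutes apart
    {(8, "PRI_PORT"): (300, 4)},
    {(8, "PRI_PORT"): (0, 0), (9, "EXP_PORT"): (0xFFFF, 0)},
    {(8, "PRI_PORT"): (3, 0)},
]
print(assess(readings))
```

A port that stays at zero after the baseline is not reported at all, which matches the idea of looking only for errors that keep incrementing under I/O.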

The following are descriptions of the registers returned in the lccgetstats output.

Retimer and Monitor Configuration: 0x38

Indicates how the monitor is configured. Not important for field usage in reviewing lccgetstats values.

Retimer and Monitor Status: 0x###

Retimer and Monitor LCV Error Count: 0x#

Line code violations are occurrences of either a Bipolar Violation (BPV) or an Excessive Zeroes (EXZ) error event. A BPV is a pulse of the same polarity as the previous pulse. An EXZ is an occurrence of more than fifteen contiguous zeroes. This counter counts all the line code violations/disparity errors detected by the 8B/10B decoder.
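The BPV and EXZ definitions above come from bipolar line coding. The sketch below applies them literally to a sequence of pulse polarities (+1, -1, or 0 for no pulse) to show what each event means; it does not model the LCC's 8B/10B decoder, which is what actually feeds this counter.

```python
from typing import Iterable, Tuple


def count_bpv_exz(pulses: Iterable[int]) -> Tuple[int, int]:
    """Count BPV and EXZ events in a sequence of pulses (+1, -1, or 0).

    BPV: a pulse with the same polarity as the previous pulse.
    EXZ: a run of more than fifteen contiguous zeroes (counted once per run).
    """
    bpv = exz = 0
    previous = 0  # polarity of the last non-zero pulse; 0 means none seen yet
    zero_run = 0
    for p in pulses:
        if p == 0:
            zero_run += 1
            if zero_run == 16:  # the run has just exceeded fifteen zeroes
                exz += 1
            continue
        zero_run = 0
        if p == previous:
            bpv += 1
        previous = p
    return bpv, exz


print(count_bpv_exz([1, -1, 1, 1]))          # (1, 0)
print(count_bpv_exz([1] + [0] * 16 + [-1]))  # (0, 1)
```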

Retimer and Monitor CRC Error Count: 0x#

Counts Cyclic Redundancy Check (CRC) errors found during retiming of the FC signal; a CRC is used to verify the integrity of a data block. The counter includes all the CRC32 errors detected by the CRC checker.
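To show what a CRC32 error means, the sketch below appends a CRC-32 to a payload on the "sender" side and recomputes it on the "receiver" side; any mismatch is counted as a CRC error. Python's zlib.crc32 uses the same CRC-32 generator polynomial as IEEE 802.3, which Fibre Channel also uses, but the frame layout and the little-endian placement of the CRC bytes here are simplifications, not the on-the-wire FC frame format.

```python
import zlib
from typing import Iterable


def append_crc(payload: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 (little-endian, a simplification)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def crc_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC over the payload and compare."""
    if len(frame) < 4:
        return False
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == received


def count_crc_errors(frames: Iterable[bytes]) -> int:
    return sum(1 for f in frames if not crc_ok(f))


good = append_crc(b"example payload")
bad = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit in transit
print(count_crc_errors([good, bad]))      # 1
```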

Expansion A: 0x# “Unused counter for future FRUMON error counts. Ignore any values”

Expansion B: 0x# “Unused counter for future FRUMON error counts. Ignore any values”

Expansion C: 0x# “Unused counter for future FRUMON error counts. Ignore any values”

Expansion D: 0x# “Unused counter for future FRUMON error counts. Ignore any values”

How to interpret the output of the Ultrapoint counters

The output of lccgetstats is not difficult to read once you understand the order of data flow. Because data flow is monitored first by the ‘outbound’ counters and then by the ‘inbound’ counters, it helps to rearrange the displayed output. Using the earlier example, you can list the counters in data-path order, as follows:

Enclosure 8, PRI_PORT, 02/24/2006 17:06:11
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x0
Retimer and Monitor LCV Error Count: 0x0
Retimer and Monitor CRC Error Count: 0x0
Expansion A: 0x0
Expansion B: 0x0
Expansion C: 0x0
Expansion D: 0x34

Enclosure 9, PRI_PORT, 02/24/2006 17:06:12
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x0
Retimer and Monitor LCV Error Count: 0x0
Retimer and Monitor CRC Error Count: 0x0
Expansion A: 0x0
Expansion B: 0x0
Expansion C: 0x0
Expansion D: 0x34

Enclosure 9, EXP_PORT, 02/24/2006 17:06:12
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x222
Retimer and Monitor LCV Error Count: 0xffff
Retimer and Monitor CRC Error Count: 0x0
Expansion A: 0x0
Expansion B: 0x0
Expansion C: 0x0
Expansion D: 0x15

Enclosure 8, EXP_PORT, 02/24/2006 17:06:11
Retimer and Monitor Configuration: 0x38
Retimer and Monitor Status: 0x0
Retimer and Monitor LCV Error Count: 0x0
