
The symptom-directed diagnosis fault isolation tools use a knowledge base of fault isolation rules to determine how to analyze the data inside the error log entry. The fault isolation rules were designed by reliability engineering experts who understand the behavior of the machine when it fails.

There are two basic types of fault isolation rules: single-event and multiple-event. Single-event rules are used for analyzing single error events (i.e., one error log entry). Multiple-event rules are used for analyzing multiple error events that occur over a specified interval of time.

Single Event Fault Isolation Rules

There are several categories of single-event fault isolation rules. These rules are derived from the on-line error detection designed into the VAX 9000 system.

Primary Syndrome Fault Isolation Rules

Primary syndromes are the error latches that detect and report error events. Each error latch stores the result of an on-line error detector, and each error detector covers a section of logic in the system. By mapping this logic to the physical partition (i.e., field replaceable units), the values of the set error latches can be used for a first-pass fault isolation. In many instances, this analysis alone is sufficient to determine the faulty field replaceable unit.
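As a rough illustration of first-pass isolation, the Python sketch below maps each error latch to the field replaceable units (FRUs) containing the logic its detector covers, and takes the union over the latches that are set. The latch names, FRU assignments, and the `LATCH_COVERAGE` table are hypothetical; the real VAX 9000 mapping lived in the fault isolation matrices, whose format is not shown here.

```python
# Hypothetical mapping from error latch to the FRUs holding the logic its
# on-line error detector covers (not the real VAX 9000 partitioning).
LATCH_COVERAGE = {
    "PARITY_ERROR": {"FRU 1", "FRU 2", "FRU 3"},
    "ECC_ERROR": {"FRU 4"},
}


def primary_callout(set_latches):
    """First-pass callout: every FRU covered by any latch that is set."""
    callout = set()
    for latch in set_latches:
        # Unknown latches contribute nothing rather than raising.
        callout |= LATCH_COVERAGE.get(latch, set())
    return callout


print(sorted(primary_callout(["ECC_ERROR"])))  # ['FRU 4']
```

A single-FRU result is the case where the first pass alone suffices; a multi-FRU result such as the `PARITY_ERROR` callout is what the secondary syndrome, propagation, and intersection rules then narrow.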

Secondary Syndrome Fault Isolation Rules

In some instances, the fault isolation provided by the primary syndromes may not localize the fault sufficiently. For example, if the primary syndrome field replaceable unit callout results in more than one field replaceable unit having a significant possibility of failure, then secondary syndromes must be used to reduce the callout. Secondary syndromes are key machine states, other than error latches, that are stored in the error log entry. Examples of secondary syndromes include multiplexer select lines, memory address values, and other path-sensitive control signals. These signal states are used to determine the specific path that was sensitized when an error occurred. The nonsensitized path(s) can then be removed from the callout. An example of how secondary syndromes are used for fault isolation is shown in Figure 4.
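The refinement step can be sketched in Python for a MUX-select secondary syndrome in the spirit of Figure 4. The sketch assumes, hypothetically, that MUX input 0 is driven from FRU 1, input 1 from FRU 2, and that the MUX and parity checker sit on FRU 3; the function name and tables are illustrative, not taken from the VAX 9000 rule files.

```python
from typing import Optional, Set

# Hypothetical topology: MUX input 0 from FRU 1, input 1 from FRU 2,
# MUX and parity checker on FRU 3.
ALL_FRUS = {"FRU 1", "FRU 2", "FRU 3"}
SENSITIZED_PATH = {0: {"FRU 1", "FRU 3"}, 1: {"FRU 2", "FRU 3"}}


def refine_with_mux_select(parity_error: bool, mux_select: Optional[int]) -> Set[str]:
    """Drop FRUs on the nonsensitized MUX path from the primary callout."""
    if not parity_error:
        return set()
    if mux_select is None:
        # Select line not captured in the error log entry: no refinement.
        return set(ALL_FRUS)
    return ALL_FRUS & SENSITIZED_PATH[mux_select]


print(sorted(refine_with_mux_select(True, 0)))  # ['FRU 1', 'FRU 3']
```

The design point is that the select value is only useful because it was captured in the error log entry; without it, the callout falls back to every FRU on either path.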

Fault Propagation Rules

Sometimes a single error event can trigger multiple error detectors because of fault propagation or domain intersection.

Fault propagation occurs when a fault in a given error domain (i.e., the propagation source) propagates into other error domains (i.e., the propagation destinations). To identify the real source of the error, the possible fault propagation paths must be found and the precedence of the error detectors in each propagation path must be identified. When multiple error latches are set, the propagation rules can then be applied to eliminate all propagation destinations for each propagation source in the callout. An example of fault propagation is shown in Figure 5.

[Figure 4 diagram: parity generators on FRU 1 and FRU 2 feed a MUX on FRU 3; a parity checker after the MUX sets the PARITY_ERROR latch, and MUX_SELECT is captured as a secondary syndrome.]

PARITY_ERROR   MUX_SELECT   CALLOUT
1              UNKNOWN      FRU 1, FRU 2, FRU 3
1              0            FRU 1, FRU 3
1              1            FRU 2, FRU 3

Figure 4 Secondary Syndrome Example: MUX Select Used for Fault Isolation Refinement

Hierarchical Fault Detection and Isolation Strategy for the VAX 9000 System
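A minimal Python sketch of applying propagation rules, assuming a hypothetical propagation graph in which an error detected by PARITY_ERROR_1 can propagate far enough to also set PARITY_ERROR_2. Any set latch that is a direct propagation destination of another set latch is treated as a consequence rather than a source and is removed before the callout is formed. The graph is assumed acyclic, and the latch-to-FRU table is illustrative only.

```python
# Hypothetical propagation graph: source latch -> latches it can also set.
PROPAGATES_TO = {"PARITY_ERROR_1": {"PARITY_ERROR_2"}}
# Hypothetical latch -> FRU mapping.
LATCH_FRUS = {"PARITY_ERROR_1": {"FRU 1"}, "PARITY_ERROR_2": {"FRU 2"}}


def propagation_sources(set_latches):
    """Keep only latches not explained as a destination of another set latch."""
    latches = set(set_latches)
    destinations = set()
    for source in latches:
        destinations |= PROPAGATES_TO.get(source, set()) & latches
    return latches - destinations


def callout(set_latches, use_propagation=True):
    latches = propagation_sources(set_latches) if use_propagation else set(set_latches)
    return set().union(*(LATCH_FRUS[latch] for latch in latches))


both = {"PARITY_ERROR_1", "PARITY_ERROR_2"}
print(sorted(callout(both, use_propagation=False)))  # ['FRU 1', 'FRU 2']
print(sorted(callout(both)))                         # ['FRU 1']
```

Because each set latch removes its own direct destinations, a chain such as A to B to C with all three latches set collapses to A alone.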

Domain Intersection Rules

Domain intersection results when two or more error detectors cover a common piece of logic. This information is used to refine the callout when multiple error latches are set in the VAX 9000 system, as shown in Figure 6.
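Domain intersection can be sketched in Python as a set intersection over the coverage of every set latch. The coverage sets below are hypothetical: two checkers share FRU 1, and each also covers one other FRU. Falling back to the union when the domains share nothing is an assumption of this sketch (for example, to avoid an empty callout if two independent faults occur), not a documented VAX 9000 rule.

```python
# Hypothetical detector coverage (FRUs whose logic each latch's checker sees).
COVERAGE = {
    "PARITY_ERROR_1": {"FRU 1", "FRU 2"},
    "PARITY_ERROR_2": {"FRU 1", "FRU 3"},
}


def intersect_domains(set_latches):
    domains = [COVERAGE[latch] for latch in set_latches]
    if not domains:
        return set()
    common = set.intersection(*domains)
    # Assumed fallback: if the domains are disjoint, keep the union.
    return common or set.union(*domains)


print(sorted(intersect_domains(["PARITY_ERROR_1", "PARITY_ERROR_2"])))  # ['FRU 1']
print(sorted(intersect_domains(["PARITY_ERROR_1"])))  # ['FRU 1', 'FRU 2']
```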

Multiple Event Fault Isolation Rules

Multiple-event rules attempt to correlate separate error events to find a common problem. This type of analysis is beneficial when an intermittent or transient problem is not diagnosed sufficiently by single-event, symptom-directed diagnosis rules.

For example, if a logic fault were analyzed with single-event, symptom-directed diagnosis rules, the analysis could conclude that an intermittent logic fault had occurred and would result in a callout of the faulty field replaceable unit. However, multiple-event rules include checking for certain environmental deviations in close proximity to a logic fault. In this case, multiple-event analysis would attempt to correlate the logic fault with the environmental deviations to determine whether the fault is transient in nature. If it is, a callout is not required.
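A minimal Python sketch of the environmental check, assuming each logged event carries a timestamp and a hypothetical kind label, and that "close proximity" means within a fixed time window. The 60-second window and the `Event` type are placeholders; the actual correlation criteria were defined in the multiple-event rule files.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time_s: float  # seconds since some reference point
    kind: str      # hypothetical labels: "logic" or "environment"


WINDOW_S = 60.0  # hypothetical proximity window


def callout_required(logic_event, events, window_s=WINDOW_S):
    """False if an environmental deviation falls near the logic fault (treat as transient)."""
    for event in events:
        if event.kind == "environment" and abs(event.time_s - logic_event.time_s) <= window_s:
            return False
    return True


fault = Event(100.0, "logic")
print(callout_required(fault, [Event(130.0, "environment")]))  # False
print(callout_required(fault, [Event(500.0, "environment")]))  # True
```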

Multiple-event rules can also be used to reinforce the callout refinement provided by secondary syndromes, fault propagation, and domain intersection. For example, in a VAX 9000 system that repeatedly generates identical or similar error log entries, multiple-event analysis can correlate these entries to a single intermittent fault. It can indicate which secondary syndrome path is most likely to be sensitized and which error domain is most likely to detect the error first. In this case, multiple-event analysis can view these events as a single problem rather than seeing each error log entry in isolation.

[Figure 5 diagram: a parity generator and logic carrying data bits 00-08 across FRU 1 and FRU 2, checked by two parity checkers that set error latches PARITY_ERROR_1 and PARITY_ERROR_2.]

CALLOUT WITH PARITY_ERROR_1 AND PARITY_ERROR_2 SET
No propagation information     FRU 1, FRU 2
With propagation information   FRU 1

Figure 5 Fault Propagation Example

[Figure 6 diagram: a 00-07 parity generator and logic on FRU 1, FRU 2, and FRU 3, checked by two parity checkers whose error latches, PARITY_ERROR_1 and PARITY_ERROR_2, cover overlapping logic; a table gives the callout for each combination of latch values.]

Figure 6 Domain Intersection Example
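One way to sketch this correlation in Python: group similar entries by a signature, then vote on the secondary syndrome path and the first detector across the group. The tuple layout, the signature strings, and majority voting are assumptions for illustration; the source does not specify how VAX 9000 multiple-event rules weighted the evidence.

```python
from collections import Counter


def correlate(entries):
    """entries: (signature, mux_select, first_latch) tuples from error log entries."""
    if not entries:
        return None
    signature, occurrences = Counter(e[0] for e in entries).most_common(1)[0]
    group = [e for e in entries if e[0] == signature]
    likely_path = Counter(e[1] for e in group).most_common(1)[0][0]
    likely_first = Counter(e[2] for e in group).most_common(1)[0][0]
    return {
        "signature": signature,
        "occurrences": occurrences,
        "likely_path": likely_path,
        "likely_first_detector": likely_first,
    }


log = [
    ("PARITY", 0, "PARITY_ERROR_1"),
    ("PARITY", 0, "PARITY_ERROR_1"),
    ("PARITY", 1, "PARITY_ERROR_2"),
    ("ECC", 1, "ECC_ERROR"),
]
print(correlate(log))
```

Three similar entries are reported as one problem, with the path and detector that dominated the group rather than whichever one a single entry happened to show.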

CAD Tools and Processes

To ensure that the VAX 9000 symptom-directed diagnosis fault coverage and isolation goals were achieved, CAD tools were needed to measure the quality of the on-line error detection in the design. Tools also were needed to help develop symptom-directed diagnosis fault isolation rules and to facilitate the conversion of these rules into a format that could be used by the fault isolation software.

Some of the significant symptom-directed diagnosis CAD tools that were developed and used for the VAX 9000 system are discussed below.

Hardware Isolation Domain Evaluator

The hardware isolation domain evaluator (HIDE) CAD tool was developed to provide symptom-directed diagnosis fault coverage and isolation information to the VAX 9000 logic designers. HIDE also can generate simple symptom-directed diagnosis fault isolation rules for use in the system fault isolation matrices.

One of the goals for HIDE was to provide early feedback to logic designers on the quality of on-line error detection in their designs. Early feedback gave designers time to make design changes if coverage or isolation goals were not achieved. Further, the information provided by HIDE helped designers select locations for error detectors and gave them quick feedback on the implications of detector placement and design changes.
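The kind of feedback HIDE gave designers can be approximated in Python with two measures: the fraction of logic covered by at least one error detector, and the set of FRUs each detector would call out. The sketch is hypothetical throughout: the design model, gate-count weighting, and two-FRU isolation threshold are assumptions, not HIDE's actual inputs or metrics.

```python
from collections import defaultdict

# Hypothetical design model: block -> (FRU, gate count, detectors covering it).
BLOCKS = {
    "parity_gen_0": ("FRU 1", 120, {"PARITY_ERROR"}),
    "parity_gen_1": ("FRU 2", 120, {"PARITY_ERROR"}),
    "mux": ("FRU 3", 60, {"PARITY_ERROR"}),
    "control": ("FRU 3", 200, set()),
}


def coverage(blocks):
    """Gate-weighted fraction of logic covered by at least one detector."""
    total = sum(gates for _, gates, _ in blocks.values())
    covered = sum(gates for _, gates, dets in blocks.values() if dets)
    return covered / total


def detector_callouts(blocks):
    """FRUs each detector would call out when its latch is set."""
    callouts = defaultdict(set)
    for fru, _, dets in blocks.values():
        for det in dets:
            callouts[det].add(fru)
    return dict(callouts)


def isolation_warnings(blocks, max_frus=2):
    """Detectors whose callout spans more FRUs than the (assumed) goal."""
    return {d: f for d, f in detector_callouts(blocks).items() if len(f) > max_frus}


print(f"coverage = {coverage(BLOCKS):.0%}")  # coverage = 60%
print(isolation_warnings(BLOCKS))
```

A designer seeing the uncovered `control` block or a three-FRU callout could add or move a detector and rerun the check, which is the quick design-change loop the text describes.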

Symptom Diagnosis Information Language

The symptom-directed diagnosis fault isolation rules for the VAX 9000 system were coded into a set of system fault isolation matrix files, called symptom diagnosis information files. Symptom diagnosis information is a language designed to express both single-event and multiple-event, symptom-directed diagnosis fault isolation rules in an objective and consistent manner.

In earlier VAX systems, new fault isolation tools were needed for each new computer system. In the VAX 9000 system, the symptom diagnosis information language provides a general-purpose means to specify symptom-directed diagnosis fault isolation rules. The files are used as the rule base for the symptom-directed diagnosis fault isolation tools, so the tools themselves can be reused for future computer system designs.
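The benefit of a rule language is that the engine stays fixed while the rules change. The Python sketch below uses an invented one-line-per-rule syntax (it is not the symptom diagnosis information language, whose syntax is not given here) and a generic matcher that returns the callout of the first rule whose syndrome conditions all match. Ordering rules from most to least specific is an assumption of this sketch.

```python
# Invented rule syntax for illustration only:
#   <syndrome>=<value> ... -> <FRU> ...
RULES_TEXT = """
PARITY_ERROR=1 MUX_SELECT=0 -> FRU1 FRU3
PARITY_ERROR=1 MUX_SELECT=1 -> FRU2 FRU3
PARITY_ERROR=1 -> FRU1 FRU2 FRU3
"""


def parse_rules(text):
    rules = []
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        conditions = {}
        for term in lhs.split():
            name, value = term.split("=")
            conditions[name] = int(value)
        rules.append((conditions, rhs.split()))
    return rules


def diagnose(rules, entry):
    """Generic engine: the first rule whose conditions all match the entry wins."""
    for conditions, frus in rules:
        if all(entry.get(name) == value for name, value in conditions.items()):
            return frus
    return []


rules = parse_rules(RULES_TEXT)
print(diagnose(rules, {"PARITY_ERROR": 1, "MUX_SELECT": 0}))  # ['FRU1', 'FRU3']
print(diagnose(rules, {"PARITY_ERROR": 1}))                   # ['FRU1', 'FRU2', 'FRU3']
```

Supporting a new machine in this model means writing a new `RULES_TEXT`; `parse_rules` and `diagnose` are untouched.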


On-line Fault Isolation Software

The VAX 9000 system contains on-line symptom-directed diagnosis software that automatically diagnoses faults as they occur. The software produces an isolation callout of the possible faulty field replaceable units that is automatically received by Digital customer service centers through a symptom-directed diagnosis reporting process. This process is designed to minimize the repair time for VAX 9000 systems. It automatically notifies Digital of problems and provides a repair plan to Customer Services before personnel are sent to the customer's site.

Service Processor Diagnostic

The VAX 9000 service processor unit contains a symptom-directed diagnosis fault isolation process that performs single-event analysis. This process runs in the background waiting for error log entries. When an error log entry is generated, the process analyzes the error log entry and produces an encoded callout of possible faulty field replaceable units.
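The background behavior described above can be sketched with Python's standard `queue` and `threading` modules: a worker blocks until an entry arrives, analyzes it, and records the callout. The `None` shutdown sentinel and the stand-in analysis function (here just `sorted`) are hypothetical; the service processor software is not described at this level of detail.

```python
import queue
import threading


def background_analyzer(entries, analyze, callouts):
    """Wait for error log entries and record one callout per entry."""
    while True:
        entry = entries.get()  # blocks until an entry is available
        if entry is None:      # hypothetical shutdown sentinel
            break
        callouts.append(analyze(entry))


entries = queue.Queue()
callouts = []
worker = threading.Thread(target=background_analyzer, args=(entries, sorted, callouts))
worker.start()
entries.put({"FRU 2", "FRU 1"})  # stand-in entry; sorted() plays the analyzer
entries.put(None)
worker.join()
print(callouts)  # [['FRU 1', 'FRU 2']]
```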

The symptom-directed diagnosis fault isolation algorithm is performed by a general-purpose diagnostic engine. This engine uses a binary version of the symptom diagnosis information file, i.e., a binary-coded matrix, as the rule base for its analysis. The diagnostic engine can analyze any error log entry that has a valid corresponding binary-coded matrix file.
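To make the binary-coded matrix idea concrete, the Python sketch below packs each rule as three 32-bit words (a syndrome mask, the required syndrome value, and an FRU bitmap) with the standard `struct` module, and a generic engine looks up the matrix by error log entry type. The row layout, bit assignments, and entry type names are invented for illustration; the VAX 9000 file format is not described in this article.

```python
import struct

ROW = struct.Struct("<III")  # hypothetical row: mask, required value, FRU bitmap


def encode_matrix(rows):
    return b"".join(ROW.pack(*row) for row in rows)


def decode_matrix(blob):
    return [ROW.unpack_from(blob, offset) for offset in range(0, len(blob), ROW.size)]


# Hypothetical bits: syndrome bit 0 = PARITY_ERROR, bit 1 = MUX_SELECT;
# FRU bitmap bit 0 = FRU 1, bit 1 = FRU 2, bit 2 = FRU 3.
MATRICES = {
    "CPU_PARITY": encode_matrix([
        (0b11, 0b01, 0b101),  # parity error, select 0 -> FRU 1, FRU 3
        (0b11, 0b11, 0b110),  # parity error, select 1 -> FRU 2, FRU 3
    ]),
}


def analyze(entry_type, syndrome_bits):
    """Return an FRU bitmap, 0 if no row matches, or None if no matrix exists."""
    blob = MATRICES.get(entry_type)
    if blob is None:
        return None
    for mask, value, frus in decode_matrix(blob):
        if (syndrome_bits & mask) == value:
            return frus
    return 0


print(analyze("CPU_PARITY", 0b01))  # 5 (FRU 1 and FRU 3)
print(analyze("MEMORY", 0b01))      # None
```

Returning `None` for an entry type with no matrix mirrors the constraint in the text: the engine is general, but it can only analyze entries for which a valid matrix exists.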

In addition to the encoded callout, the single-event fault isolation process produces status information from each error event that is used for multiple-event analysis.

VAXsimPLUS

The VAXsimPLUS tool runs on the VAX 9000 CPU and performs symptom-directed diagnosis multiple-event analysis. The tool analyzes information generated by the single-event, symptom-directed diagnosis process using multiple-event, binary-coded matrix files. The VAXsimPLUS tool uses the same general-purpose diagnostic engine as the single-event, symptom-directed diagnosis process. The output of the VAXsimPLUS tool is a syndrome entry that collapses several error events into a single error analysis theory.
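A hypothetical collapsing policy, sketched in Python: if every event's single-event callout shares some FRUs, report those; otherwise report the FRUs named most often. This policy is an assumption for illustration and is not claimed to be how VAXsimPLUS forms its error analysis theory.

```python
from collections import Counter


def collapse(callouts):
    """Collapse per-event FRU callouts into one theory (hypothetical policy)."""
    if not callouts:
        return None
    common = set.intersection(*map(set, callouts))
    if common:
        theory = sorted(common)
    else:
        # No FRU appears in every callout: fall back to the most frequent ones.
        counts = Counter(fru for c in callouts for fru in set(c))
        top = max(counts.values())
        theory = sorted(fru for fru, n in counts.items() if n == top)
    return {"theory": theory, "events": len(callouts)}


print(collapse([{"FRU 1", "FRU 3"}, {"FRU 1", "FRU 2"}]))
```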

Summary

A complete test and diagnosis strategy for a large computer system, such as the VAX 9000 system, requires off-line testing and its on-line counterpart, symptom-directed diagnosis. Off-line testing provides a hierarchical mechanism for testing each component before it is assembled into the next level. In off-line testing, the use of the scan system provides high coverage and accurate fault isolation. Scan testing also has proven effective during all phases of the VAX 9000 system product development: design, manufacturing, prototype debug, and customer support.

Digital Technical Journal Vol. 2 No. 4 Fall 1990

Symptom-directed diagnosis is a sophisticated tool that provides detection and isolation of intermittent faults. Intermittent faults have been a significant problem in the past because of the difficulty of re-creating the conditions that lead to such faults. Symptom-directed diagnosis solves the problem of intermittent faults by analyzing symptom information generated by on-line error handlers rather than by attempting to re-create the fault. Thus, the use of symptom-directed diagnosis provides greater machine availability for the VAX 9000 system.

Acknowledgments

The implementation of the VAX 9000 fault detection and isolation strategy would have been impossible if not for the perseverance and dedication to high quality shown by the following people: Jeff Barry, Dominic Carr, Steve Conway, Ed Crowley, Betty Daley, Tony Dancona, Dave D'Antonio, Chris Demos, Sue DesMarais, Paul Dormitzer, Rick Dusek, Mike Evans, Skip Gaede, Mike Gavronsky, Philippe Girard, Matt Goldman, Francis Gravel, Chris Joseph, Dale Keck, Tom Krehel, Charlie Kretz, Butch Leitz, Helen Lenane, Paul Leveille, Keith Mayhue, Chris McCabe, Robert Nobrega, Mike Newman, Paul Paternoster, Brian Rost, Dan Schullman, Scott Sitterly, Norm Sozio, Tamar Wexler, Tom Winter, Ted Wojcik, Richard Wood, Eugene Xia, and the members of the MCU tester and MCA test development teams.
