5 DIAGNÓSTICO TÉCNICO DE LOS SISTEMAS ACTUALES DE
5.1 SISTEMA DE ACUEDUCTO
5.1.13 Recomendaciones del Plan Maestro
Of the various modern approaches, perhaps the most prominent is the Dynamic Fault Tree (DFT)8 methodology (Manian et al., 1998), a gate-based temporal fault tree methodology designed for quantitative analysis of dynamic systems. DFTs make it possible to analyse fault-tolerant computer systems using Markov chains (Norris, 1997). Markov models can represent the sequence-dependent behaviour typically found in such systems, but they are large and cumbersome, and producing them by hand is often tedious. Generating them automatically from fault trees avoids many of the disadvantages of manually producing Markov models, while also granting fault trees the ability to analyse sequence-dependent failure behaviour. DFTs therefore allow for the analysis of systems with more complex interrelationships between events, including functional dependencies, standby components, and event sequences.
DFTs are a comprehensive attempt at a solution, as evidenced by their inclusion in the newer
Fault Tree Handbook for Aerospace Applications (Vesely et al., 2002). They have been
incorporated into automatic safety analysis tools or methodologies such as DIFTree (Dugan et
al., 1997) and its successor, Galileo (Sullivan et al., 1999; Dugan et al., 1999; Manian et al.,
1999); they have also been put to a variety of uses, such as dependability analysis (Meshkat et
al., 2002), formal models (Coppit et al., 2000), expert systems (Assaf & Dugan, 2003), analysis
of software (Dugan & Assaf, 2001), sensitivity analysis (Ou & Dugan, 2000), phased-mission systems (Xing & Dugan, 2002), common-cause analysis (Tang & Dugan, 2004), and linkage with event trees (Xu & Dugan, 2004). A formal compositional semantics using Markov chains can also be found in Boudali et al. (2007a, b).
DFTs represent temporal information by defining two main special-purpose 'temporal' gates, described below.
7 Assumes that PAND is not inclusive of simultaneous occurrence of events. If X PAND Y includes the possibility of X and Y occurring at the same time, then X < X = X and the Idempotent law holds.
8 The capitalised term 'Dynamic Fault Tree' and the acronym 'DFT' are used exclusively in this thesis to refer to the dynamic fault tree methodology of Dugan et al. and should not be confused with other dynamic FTA approaches or 'dynamic'/temporal fault trees in general, which are referred to only in lower case.
Figure 16 – Functional Dependency (FDEP) gate
Functional Dependency (FDEP) gates allow DFTs to model situations where one component A depends on another component B for its operation. If component B fails, then component A will also fail; the failure of component B is then the trigger event for the failure of component A. FDEP gates have a single trigger event and multiple dependent child events; the occurrence of the trigger event causes the occurrence of all the children. If any of the children occurs by itself, this does not affect the other children or the trigger event. This type of gate is very useful for modelling networked components, or components connected by a central bus, where the failure of the interconnection will cause the failure of the individual components. FDEP gates also allow fault trees to model interdependencies that would otherwise cause loops in the fault tree.
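The propagation rule of an FDEP gate can be illustrated with a minimal Python sketch. The function name `apply_fdep` and the event names are hypothetical and chosen only for illustration; they are not taken from any DFT tool.

```python
def apply_fdep(failed, trigger, dependents):
    """Propagate one FDEP gate over a set of failed events.

    If the trigger has failed, every dependent event is forced to fail.
    A dependent failing on its own changes nothing else.
    """
    failed = set(failed)
    if trigger in failed:
        failed |= set(dependents)
    return failed


# Hypothetical example: two nodes connected by a central bus.
dependents = ["node_a", "node_b"]

# The bus (trigger) fails, so both nodes fail with it.
print(sorted(apply_fdep({"bus"}, "bus", dependents)))

# One node fails by itself: the bus and the other node are unaffected.
print(sorted(apply_fdep({"node_a"}, "bus", dependents)))
```

The asymmetry is the point of the gate: failure flows from the trigger to the children, never from a child to the trigger or to its siblings.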
Figure 17 – Generic Spare Gate
Spare gates allow DFTs to model secondary backup components that are not activated until needed. These are difficult to model in ordinary fault trees because they do not fail until they have been activated, i.e. after the primary has failed. The inputs to spare gates are all basic events. The first (left-most) is the primary component, and any further inputs are secondary components, which are activated in order (so if there are three secondaries, each will be activated when the previous one fails). The SPARE gate itself is only true when the primary and all the secondaries have failed. There are usually three varieties of SPARE gates: hot spares, which are always on but only provide function when the primary fails; warm spares, which are kept in a state of reduced readiness until needed; and cold spares, which are kept deactivated until required. Hot spares will have higher failure rates than cold spares because they are always active, and the failure rates of warm spares will fall in between those of hot and cold spares. This is modelled by a dormancy factor that scales the failure rate λ of a spare while it is dormant: a hot spare has a factor close to 1.0 (a failure rate close to λ), a cold spare has a factor close to 0, and a warm spare lies somewhere in between, e.g. a factor of 0.5 (a failure rate of 0.5λ). Spare gates can also share secondary backup components, i.e. one secondary can be the backup for two primaries. In this way, spare gates can also model a common pool of backup components.
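The effect of the dormancy factor can be illustrated with a small worked example. The sketch below assumes a spare gate with one primary and one spare, both with the same exponential failure rate when active, perfect switching, and no repair; these assumptions and the name `spare_gate_unreliability` are illustrative and not part of the DFT definition.

```python
import math


def spare_gate_unreliability(lam, alpha, t):
    """Probability that primary and spare have both failed by time t.

    lam   -- failure rate of an active component (assumed identical for both)
    alpha -- dormancy factor: the dormant spare fails at rate alpha * lam
    """
    # Both still working: the primary (rate lam) and the dormant spare
    # (rate alpha * lam) must both survive.
    p_both_ok = math.exp(-(1 + alpha) * lam * t)
    # Exactly one component left, running at the full rate lam.
    if alpha == 0:
        p_one_left = lam * t * math.exp(-lam * t)
    else:
        p_one_left = ((1 + alpha) / alpha) * math.exp(-lam * t) * (
            1 - math.exp(-alpha * lam * t)
        )
    return 1 - p_both_ok - p_one_left


lam, t = 1e-3, 1000.0
for name, alpha in [("cold", 0.0), ("warm", 0.5), ("hot", 1.0)]:
    print(name, round(spare_gate_unreliability(lam, alpha, t), 4))
```

With a factor of 1.0 the result reduces to the ordinary two-input AND of independent events, and with a factor of 0 it reduces to the two-stage Erlang case, so the warm spare falls between the two as described above.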
DFTs also make use of a version of the Priority-AND gate (which, as mentioned earlier, is true if its inputs all occur in a specific order) and sometimes also include separate sequence or SEQ gates, which impose a left-to-right sequence on input events (and can thus be viewed as a specialised version of a PAND gate). The potential ambiguities of the PAND gate are not often addressed in the DFT methodology, though a more formal definition of the DFT gates is given in Coppit et al. (2000) (and discussed in Section 6.2.1).
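One of the ambiguities of the PAND gate, namely whether simultaneous inputs count as being "in order", can be made concrete with a short sketch. The function `pand` and its `inclusive` flag are hypothetical names introduced here for illustration; neither interpretation is claimed to be the one used by any particular DFT tool.

```python
def pand(times, inclusive=False):
    """Evaluate a Priority-AND gate from input failure times.

    times     -- failure times in left-to-right input order; None means
                 the input has not failed
    inclusive -- if True, simultaneous failures count as being in order
    """
    if any(t is None for t in times):
        return False  # like AND, every input must have occurred
    pairs = list(zip(times, times[1:]))
    if inclusive:
        return all(a <= b for a, b in pairs)
    return all(a < b for a, b in pairs)


print(pand([1.0, 2.0]))                  # X then Y
print(pand([2.0, 1.0]))                  # Y then X
print(pand([1.0, 1.0]))                  # simultaneous, strict reading
print(pand([1.0, 1.0], inclusive=True))  # simultaneous, inclusive reading
```

The two readings only differ when inputs fail at exactly the same time, which is why the choice matters for repeated events such as X PAND X.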
A DFT using any of these gates can then be analysed using Markov chains; any fault tree model with exponential-like distributions can be solved quantitatively as a Markov chain, though in practice the computational complexity may be prohibitive for large models. Markov chains are constructed by starting from the fully operational state and considering the effect of every possible component failure in that state. The resulting child states are then expanded in the same way until all possible states, both failed and operational, have been taken into account. From these states, and the failure rates of each component, quantitative analysis is possible.
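The construction procedure can be sketched as a breadth-first expansion of states. In this illustrative sketch a state is the ordered tuple of components that have failed so far (so that order-sensitive gates can be evaluated), failed states are treated as absorbing, and the names `build_markov_chain` and `x_pand_y` are hypothetical; real tools use more compact state encodings.

```python
from collections import deque


def build_markov_chain(rates, is_failed):
    """Enumerate states and transitions of a failure-only Markov chain.

    rates     -- dict mapping component name to failure rate
    is_failed -- predicate on a state (ordered tuple of failed components)
    Returns a dict: state -> list of (next_state, rate).
    """
    transitions = {}
    queue = deque([()])  # start with no component failed
    while queue:
        state = queue.popleft()
        if state in transitions:
            continue
        transitions[state] = []
        if is_failed(state):
            continue  # system-failed states are absorbing
        for component, rate in rates.items():
            if component in state:
                continue  # already failed
            child = state + (component,)
            transitions[state].append((child, rate))
            queue.append(child)
    return transitions


def x_pand_y(state):
    """Top event: X fails strictly before Y."""
    return "X" in state and "Y" in state and state.index("X") < state.index("Y")


chain = build_markov_chain({"X": 1e-3, "Y": 2e-3}, x_pand_y)
for state, outgoing in chain.items():
    print(state, "->", outgoing)
```

Even for two components the chain has five states, one of which (Y failing before X) is a dead end that never causes the top event; this is the sequence information that a static fault tree cannot express.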
Markov chains, and by extension DFTs, have various advantages and disadvantages. It is easier to use a fault tree than to use Markov chains directly, and DFT tools like Galileo handle this automatically, converting to and from Markov chains as necessary. DFTs also allow the analysis of dynamic system elements, like redundant components and functional dependencies. However, solving Markov chains is slow, and for a large tree with many dynamic modules this could become a significant problem. The reason for this poor performance is that the state space of a Markov chain grows almost exponentially with the number of basic events; virtually every basic event must be considered against every other basic event. Even for moderate trees, the state space could be massive. The problem becomes much worse in cases where dynamic gates share common events, i.e. the same event is an input to more than one gate; for this reason, tools like Galileo often disallow repeated/shared events. It is also difficult to perform other types of analysis (e.g. qualitative analysis) using Markov chains. Identifying the weak points in the system, rather than just determining an overall reliability, is even more expensive with Markov chains.
To help solve this problem, DFTs use the modularisation method by Rauzy & Dutuit, explained in section 2.2.4, to break up the fault tree into independent modules that can be analysed separately (Gulati & Dugan, 1997; another technique can be found in Huang & Chang, 2006). This has two benefits. Firstly, it reduces the complexity of the fault tree as a whole; three sub-trees each with 1000 states are more manageable than one whole tree with 1000 × 1000 × 1000 states. Secondly, it enables separate analysis of different modules using different methods. Therefore, a module containing only static elements can be analysed with the much faster BDD approach (Sinnamon & Andrews, 1996), whereas a module containing dynamic/temporal elements can be analysed with Markov chains. In this way, DFTs combine the speed of static methods with the added expressive power of dynamic methods. However, this approach is only effective in fault trees containing several independent modules; if there are none, or if dynamic gates occur near the top of the tree, then the advantages of modularisation are lost.
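The arithmetic behind the first benefit can be checked with a few lines of Python. The sketch assumes, purely for illustration, that each basic event is binary and that order is ignored, so a module with n events has up to 2^n states; ten events per module then give roughly the 1000 states used in the example above.

```python
import math


def markov_states(n_events):
    """Upper bound on states for n binary events (order ignored)."""
    return 2 ** n_events


modules = [10, 10, 10]  # hypothetical: three independent modules
per_module = [markov_states(n) for n in modules]

analysed_separately = sum(per_module)     # solve each module on its own
analysed_as_one = math.prod(per_module)   # one chain for the whole tree

print(per_module)
print(analysed_separately)
print(analysed_as_one)
```

Separate analysis grows with the sum of the module sizes, whereas a single chain grows with their product, which is why modularisation only pays off when independent modules actually exist.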
Another solution to this problem has been put forward by Amari et al. (2003), who use conditional probabilities to perform quantitative analysis of a DFT without first converting it into a Markov chain. This is particularly useful if the top node of the tree is a dynamic gate (which would render the modularisation ineffective), but it is potentially not as effective in cases where the tree is mostly static, as in that case traditional methods may be more efficient for most modules.
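To give a flavour of what a direct, Markov-free quantification looks like, the sketch below evaluates a two-input PAND gate in closed form. It is not the algorithm of Amari et al.; it is only the standard result for two independent, exponentially distributed events, obtained by integrating the density of X against the probability that Y fails afterwards but before time t. The function name `pand_probability` is hypothetical.

```python
import math


def pand_probability(lam_x, lam_y, t):
    """P(X fails before Y, and both have failed by time t).

    Assumes independent exponential failure times with rates lam_x, lam_y.
    Closed form of: integral_0^t f_X(x) * (F_Y(t) - F_Y(x)) dx
    """
    s = lam_x + lam_y
    return (lam_x / s) * (1 - math.exp(-s * t)) - math.exp(-lam_y * t) * (
        1 - math.exp(-lam_x * t)
    )


lam_x, lam_y, t = 1e-3, 2e-3, 500.0
p_xy = pand_probability(lam_x, lam_y, t)  # X before Y
p_yx = pand_probability(lam_y, lam_x, t)  # Y before X
p_and = (1 - math.exp(-lam_x * t)) * (1 - math.exp(-lam_y * t))

# The two orderings partition the ordinary AND event.
print(p_xy, p_yx, p_xy + p_yx, p_and)
```

The check that the two orderings sum to the AND probability is a useful sanity test for any direct PAND formula under these assumptions.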
There has also been some work on performing qualitative analysis using DFTs (Tang and Dugan, 2004). The goal of the method is to obtain the minimal cut sequences from the DFT, which are ordered minimal cut sets, i.e. the events must occur in a specific order. Although the minimal cut sequences can be extracted from the resulting Markov model, this is an expensive operation. The proposed alternative uses a variation on the BDD method instead. The method consists of first separating the timing constraints – i.e. the temporal information – from the logical constraints. For example, a PAND has the logical constraint that all of its inputs must occur and the timing constraint that all of its inputs must occur in a specific sequence. Once this has been done, the dynamic gates can all be replaced with static equivalents (e.g. PANDs are replaced by ANDs). The resulting tree can then be used to produce a ZBDD (Zero-suppressed BDD – a minimised version of the BDD), which can be minimised to produce minimal cut sets in the normal way; BDDs are much quicker than Markov chains for this purpose. Once the minimal cut sets have been obtained, they are expanded to include the timing constraints removed earlier to form the minimal cut sequences (if appropriate).
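The three steps (separate the timing constraints, reduce the static tree, restore the constraints) can be sketched as follows. This is a simplified illustration, not the method of Tang and Dugan: it uses plain set expansion instead of a ZBDD, assumes that PAND inputs are basic events, and reattaches an ordering to any minimal cut set that contains all the inputs of that PAND. The tree encoding and all function names are hypothetical.

```python
from itertools import product


def cut_sets(node, constraints):
    """Cut sets of a tree with PAND treated as AND.

    A node is either a basic event name (str) or (gate, [children]).
    Orderings removed from PAND gates are appended to `constraints`.
    """
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child, constraints) for child in children]
    if gate == "OR":
        return [cs for sets in child_sets for cs in sets]
    if gate == "PAND":
        constraints.append(tuple(children))  # remember the required order
    # AND (and PAND as its static equivalent): combine one set per child
    return [frozenset().union(*combo) for combo in product(*child_sets)]


def minimise(sets):
    """Drop any cut set that strictly contains another one."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


def cut_sequences(tree):
    """Minimal cut sets with their timing constraints restored."""
    constraints = []
    minimal = minimise(cut_sets(tree, constraints))
    result = []
    for cs in minimal:
        orderings = [c for c in constraints if set(c) <= cs]
        result.append((tuple(sorted(cs)), orderings))
    return sorted(result)


tree = ("OR", [("PAND", ["A", "B"]),
               ("AND", ["A", "C"]),
               ("AND", ["A", "B", "D"])])
print(cut_sequences(tree))
```

Because the ordering is only reattached at the end, this sketch cannot tell whether a restored constraint is redundant or contradictory for a given cut set; it simply carries it along.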
The problem with this method is that it explicitly separates the temporal information from the fault tree during the reduction process. As a result, any redundancies or contradictions can only be identified and removed once the temporal information has been restored at the end, potentially missing the opportunity to do so earlier in the process. Furthermore, it is not clear whether any such reduction is done even at the final stage; if it is not, possible logical reductions or contradictions could be missed, resulting in inaccuracies in the quantitative analysis.
Liu et al. (2007) propose another method of combined qualitative and quantitative analysis of DFTs called CSSA (Cut Sequence Set Algorithm). In CSSA, the DFT is broken down into a set of 'cut sequences' called sequential failure expressions (SFEs), which are ordered lists of events separated by the sequential failure symbol, →. For example, X → Y is an SFE in which X fails first and then Y fails. The CSS (Cut Sequence Set) is the collection of all SFEs that represent the fault tree. AND gates are converted into SFEs by enumerating all possible sequences, of which there are n! for a gate with n inputs; thus an AND gate with three inputs, e.g. X.Y.Z, would yield 6 SFEs: X → Y → Z, X → Z → Y, Y → X → Z, Y → Z → X, Z → Y → X, and Z → X → Y. PAND gates indicate a single SFE directly, e.g. X PAND Y is the same as X → Y. FDEP gates are represented as (E1 AND E2) OR E3, where E1, E2, and E3 are SFEs representing the trigger event, the triggered events, and any non-triggered events respectively. Finally, SPARE gates are represented by specific SFEs that link the failure of the primary to the failure of the secondary. Once the CSS has been generated, it can be quantified using conditional probability formulae. Therefore, the CSSA method avoids the use of Markov chains entirely.
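The conversion of AND and PAND gates into SFEs can be sketched in a few lines. The sequential failure symbol is written here as the ASCII string `->`, which is an assumed rendering, and the function names are hypothetical; the sketch covers only these two gate types and does not attempt the FDEP or SPARE rules.

```python
from itertools import permutations

SEQ = " -> "  # assumed ASCII rendering of the sequential failure symbol


def and_gate_sfes(events):
    """An AND gate yields one SFE per possible ordering of its inputs."""
    return [SEQ.join(order) for order in permutations(events)]


def pand_gate_sfes(events):
    """A PAND gate yields exactly one SFE: the left-to-right order."""
    return [SEQ.join(events)]


sfes = and_gate_sfes(["X", "Y", "Z"])
print(len(sfes))
for sfe in sfes:
    print(sfe)
print(pand_gate_sfes(["X", "Y"]))
```

The n! growth of the AND case is visible immediately: three inputs give 6 SFEs, four give 24, and so on, which bounds how large a static AND gate CSSA can expand in practice.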