
The availability of WSN systems is becoming increasingly important because a growing number of WSN applications depend on continuous monitoring and call for fault-tolerant design. Availability is closely related to reliability, and is defined in ITU-T Recommendation E.800 [ITU-T, 2008] as “the ability of a system to be in a state to perform a function or an operation at a given instant of time, or at any instant of time within a given time interval, assuming that the external resources, if required, are provided.” The main difference between reliability and availability is that reliability refers to failure-free operation during an interval, while availability refers to failure-free operation at a given instant of time [Trivedi, 2002b], usually the time when a device or system is first accessed to provide a required function or service. Availability may further be categorised as:

1. Instantaneous Availability or Point Availability A(t) of a component (or a system) is defined as the probability that the component or system is properly functioning at time t [Trivedi, 2002b], [Ever, 2007], and may be described mathematically as:

$$A(t) = R(t) + \int_0^t R(t - x)\, m(x)\, dx \qquad (3.6)$$

where R(t) is the probability of having no failure in the interval (0, t) and m(x) is the repair density. The equation shows that the system is available at time t either if no failure occurs in the interval (0, t), or if failures occur but a repair is completed at some time x before t and no further failure occurs in the remaining interval (x, t) [Trivedi, 2002a].

In the absence of repair or replacement, the availability A(t) is simply equal to the reliability R(t) of the component.

2. Limiting Availability, or steady-state availability (A), is the limiting value of A(t) as t → ∞. From the literature [Ever, 2007], it may be expressed mathematically as:

$$\lim_{t \to \infty} A(t) = A = \frac{1/\xi}{1/\xi + 1/\eta} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \qquad (3.7)$$

where ξ and η are the failure and repair rates respectively, and 1/ξ and 1/η are the Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR) respectively.

3. Interval (Average) Availability, defined as the expected fraction of time the system is up in a given interval (0, t), may be given by:

$$A_I(t) = \frac{1}{t} \int_0^t A(x)\, dx \qquad (3.8)$$

In [Trivedi, 2001], it is explained that the three availabilities are related as given in equation 3.9:

$$\lim_{t \to \infty} A_I(t) = \lim_{t \to \infty} A(t) = \frac{\eta}{\eta + \xi} \qquad (3.9)$$
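As a worked illustration of the three availability measures, the following Python sketch assumes exponentially distributed failure and repair times. This assumption is not made in the text above; under it, the point availability of a single repairable component has the well-known closed form A(t) = η/(ξ + η) + ξ/(ξ + η) · exp(−(ξ + η)t). The function names and the numerical integration scheme are illustrative choices only.

```python
import math

def point_availability(t, xi, eta):
    """A(t) for a two-state (up/down) component with exponential
    failure rate xi and repair rate eta, starting in the up state.
    ASSUMPTION: exponential failure and repair times."""
    total = xi + eta
    return eta / total + (xi / total) * math.exp(-total * t)

def steady_state_availability(xi, eta):
    """Equation 3.7: A = MTTF / (MTTF + MTTR)."""
    mttf, mttr = 1.0 / xi, 1.0 / eta
    return mttf / (mttf + mttr)

def interval_availability(t, xi, eta, steps=10_000):
    """Equation 3.8, evaluated numerically with the trapezoidal rule."""
    h = t / steps
    area = 0.5 * (point_availability(0.0, xi, eta)
                  + point_availability(t, xi, eta))
    area += sum(point_availability(i * h, xi, eta) for i in range(1, steps))
    return area * h / t
```

For large t, both `point_availability` and `interval_availability` approach `steady_state_availability`, which is the relationship stated in equation 3.9.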

In order to study system reliability and availability, three model types are identified: combinatorial, state space and hierarchical models [Trivedi, 2001], [Ever, 2007]. Among combinatorial models, three types are commonly used: reliability block diagrams, reliability graphs and fault trees. These are similar in that they capture the conditions that make a system fail in terms of the structural relationships between the system components. Reliability block diagrams (RBD), implemented in series, parallel or k-out-of-n configurations, represent the logical structure of a system with regard to how the reliability of its components affects the system reliability. An RBD can be used to model availability if the repair and failure times are all independent. The assumption of independence and series-parallel structure allows very fast computation of reliability and availability measures. However, many system models in practice do not follow the series-parallel structure. The Symbolic Hierarchical Automated Reliability/Performance Evaluator (SHARPE) software package, developed by Sahner and Trivedi in 1986, allows easy specification and solution of such models [Trivedi and Malhotra, 1993], [Trivedi, 2001].
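The series, parallel and k-out-of-n structures of a reliability block diagram reduce to simple products under the independence assumption. The Python sketch below shows the standard formulas. The function names are hypothetical, and the k-out-of-n case additionally assumes identical blocks. The same functions apply to availability if block availabilities are passed in place of reliabilities.

```python
from math import comb

def series(reliabilities):
    """All blocks must work: product of the block reliabilities."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result

def parallel(reliabilities):
    """At least one block must work: 1 - product of unreliabilities."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1.0 - r
    return 1.0 - all_fail

def k_out_of_n(k, n, r):
    """At least k of n identical, independent blocks must work."""
    return sum(comb(n, i) * r**i * (1.0 - r)**(n - i)
               for i in range(k, n + 1))
```

For example, a hypothetical sink node of reliability 0.99 in series with a 2-out-of-3 cluster of sensor nodes of reliability 0.9 would be evaluated as `series([0.99, k_out_of_n(2, 3, 0.9)])`.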

Reliability graph models consist of a set of nodes and edges (directed arcs), where the edges represent components that can fail or structural relationships between the components [Trivedi, 2001]. The graph contains one node with no incoming edges, the source, and one node with no outgoing edges, the sink (also called the destination or terminal node). The arcs are assigned failure distributions. A system represented by a reliability graph fails when there is no path from the source to the sink. As in reliability block diagrams, the edges can be assigned failure probabilities, failure rates, or unavailability values or functions. A reliability graph is equivalent to a non-series-parallel reliability block diagram. In the reliability graph, the components are the arcs, while in the block diagram, the components are the boxes. The non-series-parallel block diagram cannot be directly analysed by (or even specified for) SHARPE, but the reliability graph can. The price for more generality is the increased complexity of solution.
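To make the source-to-sink failure criterion concrete, the following Python sketch evaluates a small reliability graph by brute-force enumeration of all arc states. This is not the algorithm used by SHARPE; it is a minimal illustration that assumes independent arcs with fixed working probabilities. The bridge network and its node names are hypothetical.

```python
from itertools import product

def has_path(up_arcs, source, sink):
    """Depth-first search over the arcs that are currently working."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == sink:
            return True
        for tail, head in up_arcs:
            if tail == node and head not in seen:
                seen.add(head)
                stack.append(head)
    return False

def graph_reliability(arcs, source, sink):
    """arcs maps (tail, head) -> probability that the arc is working.
    Sums the probability of every up/down combination that still
    leaves a source-to-sink path (2**len(arcs) combinations)."""
    keys = list(arcs)
    total = 0.0
    for states in product((True, False), repeat=len(keys)):
        prob = 1.0
        for key, up in zip(keys, states):
            prob *= arcs[key] if up else 1.0 - arcs[key]
        working = [key for key, up in zip(keys, states) if up]
        if has_path(working, source, sink):
            total += prob
    return total

# Hypothetical bridge network: not reducible to series-parallel form.
bridge = {("s", "a"): 0.9, ("s", "b"): 0.9, ("a", "b"): 0.9,
          ("a", "t"): 0.9, ("b", "t"): 0.9}
```

Calling `graph_reliability(bridge, "s", "t")` gives approximately 0.97119. The exponential number of combinations enumerated here illustrates the increased solution complexity noted above.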

A fault tree is a pictorial representation of the sequence of events or conditions that must be met for a failure to occur [Sahner et al., 1996], [Sathaye et al., 2000]. It uses AND, OR, and k-of-n logic gates to represent the combination of events in a tree-like structure. In order to represent situations where one failure event propagates failures along multiple paths in the fault tree, fault trees can have repeated nodes. There exist several efficient algorithms for solving fault trees [Sathaye et al., 2000]. Examples include algorithms for series-parallel systems (for fault trees without repeated components); a multiple variable inversion (MVI) algorithm, called the LT algorithm, for obtaining the sum of disjoint products (SDP) from the mincut set [Muppala and Trivedi, 1992]; and the factoring/conditioning algorithm, which works by factoring a fault tree with repeated nodes into a set of fault trees without repeated nodes [Sathaye et al., 2000], [Satyanarayana and Prabhakar, 1978]. In [Doyle and Dugan, 1995], [Doyle et al., 1995], it is shown that binary decision diagram (BDD)-based algorithms can be used to solve very large fault trees.
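The factoring/conditioning idea can be sketched in a few lines of Python. The code below is a simplified illustration rather than the algorithm of any cited work: it supports only AND and OR gates, represents a tree as nested tuples (a hypothetical encoding), and assumes independent basic events. Conditioning on each repeated event removes the dependence that the repetition introduces between branches.

```python
def gate_probability(node, probs):
    """Top-event probability of a fault tree WITHOUT repeated events.
    A node is either a basic-event name or (gate, child, child, ...)."""
    if isinstance(node, str):
        return probs[node]
    gate, *children = node
    child_probs = [gate_probability(child, probs) for child in children]
    if gate == "AND":
        result = 1.0
        for p in child_probs:
            result *= p
        return result
    if gate == "OR":
        none_occur = 1.0
        for p in child_probs:
            none_occur *= 1.0 - p
        return 1.0 - none_occur
    raise ValueError(f"unknown gate: {gate}")

def factored_probability(node, probs, repeated):
    """Condition on each repeated event in turn (factoring): once an
    event is fixed to 'occurred' (1.0) or 'not occurred' (0.0), it no
    longer introduces dependence between branches."""
    if not repeated:
        return gate_probability(node, probs)
    event, *rest = repeated
    p = probs[event]
    occurred = factored_probability(node, {**probs, event: 1.0}, rest)
    absent = factored_probability(node, {**probs, event: 0.0}, rest)
    return p * occurred + (1.0 - p) * absent

# Hypothetical tree in which basic event C appears under both AND gates.
tree = ("OR", ("AND", "A", "C"), ("AND", "B", "C"))
probs = {"A": 0.1, "B": 0.1, "C": 0.2}
```

Here `factored_probability(tree, probs, ["C"])` gives approximately 0.038, whereas applying `gate_probability` directly, which wrongly treats the two occurrences of C as independent, gives approximately 0.0396.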

In previous studies [Sathaye et al., 2000], [Trivedi, 2002b], it is noted that reliability block diagrams, reliability graphs and fault trees cannot easily handle more complex situations such as failure/repair dependencies and shared repair facilities. State space representations have successfully been used to model such complex systems. A state space model describes the system under study as a set of states and the transitions between them. Gracefully degrading systems may be able to survive the failure of one or more active components and continue to provide service at a reduced level. Some commonly used techniques for modelling gracefully degradable systems include Markov chains, Markov reward models (MRM), stochastic reward nets and Petri nets [Trivedi, 2001], [Sathaye et al., 2000].

The advantage of the non-state-space models described above is that they are efficient to specify and solve. However, the solution of these models assumes that the components are independent. For instance, in a block diagram, fault tree or reliability graph, the components must be completely independent of one another in their failure and repair behaviour. A failure in one component cannot affect the operation of another component, and components cannot share a repair facility. Markov models provide the ability to model systems that violate these assumptions, but at the cost of a state space explosion. A system having n components may require up to 2^n states in a Markov chain representation [Trivedi, 2001].
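A shared repair facility is a simple case that a Markov model can capture but a block diagram cannot. The Python sketch below is a minimal illustration under stated assumptions: n identical components with exponential failure rate ξ each, and a single repair facility with exponential rate η that serves one component at a time. Because the components are identical, the 2^n component states can be lumped into n + 1 states (the number of failed components), which form a birth-death chain. The function names are hypothetical.

```python
def shared_repair_distribution(n, xi, eta):
    """Steady-state probabilities pi[k] of k failed components for n
    identical components (failure rate xi each) and ONE repair facility
    (rate eta). Birth-death chain: pi[k+1] = pi[k] * (n - k) * xi / eta."""
    weights = [1.0]
    for k in range(n):
        weights.append(weights[-1] * (n - k) * xi / eta)
    norm = sum(weights)
    return [w / norm for w in weights]

def parallel_availability_shared(n, xi, eta):
    """Parallel system: up while at least one component is up."""
    return 1.0 - shared_repair_distribution(n, xi, eta)[n]

def parallel_availability_independent(n, xi, eta):
    """Same system under the non-state-space independence assumption
    (every component effectively has its own repair facility)."""
    return 1.0 - (xi / (xi + eta)) ** n
```

With n = 2, ξ = 0.1 and η = 1, the shared-repair model gives an availability of about 0.9836, while the independence assumption gives about 0.9917, so ignoring the shared facility overstates availability in this example.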

Trivedi identifies two ways of dealing with the state space explosion problem: tolerance and avoidance [Trivedi, 2001]. Tolerance of a large, complex model must apply to the specification, storage and solution of the model. If the storage and solution problems can be solved, the specification problem can be solved by using more concise (and simpler) model specifications that can be automatically transformed into Markov models. Complex models can be avoided by using hierarchical model composition [Trivedi, 2002b]. The ability of SHARPE to combine results from different kinds of models also makes it possible to use state-space methods for those parts of a system that require them, and non-state-space methods for the more well-behaved parts of the system.
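Hierarchical composition can be illustrated with a two-level sketch in Python. This is not SHARPE's interface; it is a hypothetical example in which small Markov submodels are solved separately and their steady-state availabilities are passed up to a series block diagram. It assumes exponential failure and repair times within each submodel and independence between submodels.

```python
def two_state_availability(xi, eta):
    """Lower-level Markov submodel: a single repairable component."""
    return eta / (xi + eta)

def duplex_shared_repair_availability(xi, eta):
    """Lower-level Markov submodel: two identical components sharing
    one repair facility. States are the number of failed components
    (0, 1, 2); the pair is up unless both have failed."""
    w0 = 1.0
    w1 = w0 * 2.0 * xi / eta
    w2 = w1 * xi / eta
    return 1.0 - w2 / (w0 + w1 + w2)

def system_availability(submodel_availabilities):
    """Upper-level series block diagram over the submodel results.
    ASSUMPTION: the submodels are independent of one another."""
    result = 1.0
    for a in submodel_availabilities:
        result *= a
    return result
```

For example, a hypothetical sink node in series with a redundant sensor pair could be evaluated as `system_availability([two_state_availability(0.01, 1.0), duplex_shared_repair_availability(0.1, 1.0)])`. Only the pair, whose components interact through the shared repair facility, needs a state-space model.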

In practical system design, a pure availability model may not be enough for gracefully degrading communication and computer systems: such models tend to be very conservative because they do not explicitly consider the different levels of performance of the system states. A composite model of both availability and performance is therefore necessary as the system degrades over time. A more realistic analysis method was introduced in [Beaudry, 1978], and a conceptual framework of performability was introduced by Meyer [Meyer, 1980]. This modelling approach is very useful for systems that degrade and experience periods of breakdown and failure.
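One common way to combine availability and performance is to attach a reward rate to each state of a Markov model and compute the expected steady-state reward. The Python sketch below illustrates this with made-up numbers; it is not taken from the cited works. The state probabilities and reward values are hypothetical.

```python
def expected_reward(state_probs, rewards):
    """Steady-state expected reward rate: sum of pi[i] * r[i]."""
    if len(state_probs) != len(rewards):
        raise ValueError("one reward per state is required")
    return sum(p * r for p, r in zip(state_probs, rewards))

# Hypothetical 3-state degradable system: both units up, one up, none up.
state_probs = [0.90, 0.08, 0.02]

# Reward 1 for every operational state reproduces pure availability.
availability = expected_reward(state_probs, [1.0, 1.0, 0.0])

# Reward equal to the relative capacity accounts for degraded service.
performance = expected_reward(state_probs, [1.0, 0.5, 0.0])
```

With these values the availability is 0.98 and the capacity-weighted measure is 0.94. The difference arises entirely from the degraded state, which a pure availability model treats the same as the fully operational one.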
