The documentation of internal faults is similar to documenting unsafe external interactions, as described in Section 4.3.3. The differences are that a) faults, rather than manifestations, are paired with successor dangers, b) process variable names and values are not documented, c) mechanisms for design-time detection are considered, and d) additional runtime handling techniques are available. The first difference is self-explanatory, since this activity is focused on faults rather than manifestations. Note, though, that in T-SAFE, error sources are used instead of error paths, since faults occur without being triggered by an error event’s arrival (see lines 16 and 17 of Figure 4.14). The second difference is also straightforward: the process model of a component is updated entirely using information arriving from other components; if the behavior of those components is not considered (as is the case in Activity 2) then a component’s process model can be ignored. The final two differences merit more discussion, though.
Specifying Design-Time Detections
While all classes of dangers may be detectable at runtime, some faults can be detected while a system is still being designed. These are dangers that come about due to problems in an element’s development, and there are several techniques that can be used to detect them. Avižienis et al. identify five techniques, broken down into two categories [3]:
Dynamic Verification: These techniques involve executing the system.
– Symbolic Execution: Symbolically executing a system involves using symbols rather than concrete values. As system execution progresses, the symbols become increasingly constrained by the statements that make up the current execution path. The path-specific collection of constraints is referred to as a path condition, and can be tested at any point to determine if the path is viable or if it violates certain analyst-specified properties.
– Testing: Perhaps the most common form of detecting problems at design time, testing consists of simply providing some known inputs to a system and then verifying that it behaves as expected. While testing can be specialized in a number of ways (by domain, stage of development, etc.) and is in general quite flexible, it lacks the analytic power of the other verification techniques. In particular, arguments for completeness of test coverage can be difficult to make.
Static Verification: These techniques do not involve system execution, but rather analysis proceeds on either models or descriptions of a system.
– Model Checking: This involves building a model of a system in some modeling language, and then verifying certain properties about that model. One challenge with this approach is ensuring that the model of a system aligns with the system as it is built.
– Static Analysis: This is an umbrella term for any of a number of techniques which examine static descriptions of a system. Many static analyses are built into compiler toolchains to catch relatively simple coding errors.
– Theorem Proving: A more sophisticated technique, theorem proving requires stating and proving claims about a system. Those statements need to be checked not only for their provability, but also, like the models used in model checking, for their adherence to the system’s actual behavior.
Documenting which strategy an analyst thinks is best for detecting design-time problems requires both specifying the detection technique (if applicable) and providing a short narrative description. For example, the best approach to ensure that the PCA Interlock’s app logic is free from built-in bugs (fault 1 in Table 4.1) might be formal verification, since it is a relatively small piece of software that is vitally important. In M-SAFE, the analyst would write something like “Formal Verification: Since the app logic should be relatively simple but needs a very high level of assurance, the core algorithms should be formally verified.” This should be entered in the Design-time Detection section of the worksheet, which is F38 in Figure 4.15.
Like Activity 1’s specification of runtime error detection and handling, specification of design-time fault detection in T-SAFE is slightly more involved. As in Activity 1, the documentation of detectable design-time faults involves specifying an event, in the component’s error behavior declaration, which is associated with the fault’s type; see line 25 of Figure 4.14, which specifies that the PumpDeteriorates event is associated with the Deterioration fault type. The analyst would use the DesignTimeFaultDetection property specified on lines 5-8 of Figure 4.21 to associate a detection mechanism and narrative explanation with the fault’s occurrence.
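As a rough sketch of what such an association might look like, the fragment below applies the DesignTimeFaultDetection record to the PumpDeteriorates event using standard AADL property-association syntax. The choice of Testing and the explanation string are placeholders introduced here for illustration, not values taken from the PCA Interlock model, and the fragment assumes it sits in the properties section of the component's EMV2 annex subclause.

```aadl
-- Sketch only: assumed to appear inside the component's EMV2 annex subclause.
properties
  MAP_Error_Properties::DesignTimeFaultDetection => [
    -- Assumed choice; any literal from line 6 of Figure 4.21 could be used.
    FaultDetectionApproach => Testing;
    -- Placeholder text; the analyst supplies the actual narrative.
    Explanation => "Placeholder rationale supplied by the analyst.";
  ] applies to PumpDeteriorates;
```

The record's two fields mirror the two pieces of information an M-SAFE analyst enters in the Design-time Detection cell: the technique and the short narrative.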
Additional Run-Time Handling Techniques
Faults which are detectable at runtime are documented similarly to detectable errors, except that an analyst can declare an additional technique for rectifying the underlying fault. Avižienis et al. explain four such “fault handling” techniques [3]:
Diagnosis: This technique involves identifying and recording the cause of the problem, and can be difficult to automate.
Isolation: This involves physically or logically “excluding the components,” which will make the fault dormant (i.e., extant but harmless).
Reconfiguration: This involves the swapping in of spare components or the reassignment of “tasks among non-failed components.” Both this technique and the previous one overlap somewhat with compensation-based error handling approaches, since they rely on redundancy.
Reinitialization: This technique involves resetting (i.e., “rebooting”) the system, in the hopes that it is in a valid state after having been restarted.
1 property set MAP_Error_Properties is
2
3   -- Other properties removed for space
4
5   DesignTimeFaultDetection : record (
6     FaultDetectionApproach : enumeration (StaticAnalysis, TheoremProving,
      ↪ ModelChecking, SymbolicExecution, Testing);
7     Explanation : aadlstring;
8   ) applies to (all);
9
10   RuntimeFaultHandling : record (
11     FaultHandlingApproach : enumeration (Diagnosis, Isolation,
       ↪ Reconfiguration, Reinitialization);
12     ErrorHandlingApproach : MAP_Error_Properties::ErrorHandlingApproachType;
13     Explanation : aadlstring;
14   ) applies to (all);
15
16   EliminatedFaults : record (
17     FaultTypes : list of reference({emv2}** error type);
18     Explanation : aadlstring;
19   ) applies to (all);
20
21 end MAP_Error_Properties;
Figure 4.21: Property types used in Activity 2 of SAFE
When addressing a particular fault, an analyst may decide that either fault or error handling techniques (i.e., those discussed in Section 4.3.3) would be individually preferable, or that their combined use would be best. In M-SAFE these approaches are documented using columns H and I in the “Internally Caused Dangers” section of the worksheet in Figure 4.15.
Consider, for example, a situation where the clinician misunderstands the patient’s health and provides an overly-strong prescription. Since this clinician isn’t modeled (at the current level of abstraction), this problem would be considered a fault. Its solution might be two-fold: First, the app should have a carefully-designed user interface (UI) that would make such mistakes difficult to commit; this is itself a topic of study, see e.g., [98]. Second, training for use of the PCA Interlock should be periodically re-performed to account for any incorrect adaptations clinicians may mistakenly make; Leveson writes that “Systems and organizations continually experience change as adaptations are made in response to local pressures and short-term productivity and cost goals” [30]. An example of M-SAFE’s documentation of this fault handling technique is shown in cell I39 of Figure 4.16.
Documentation in T-SAFE is similar to that of the detection mechanisms specified previously, except that the RuntimeFaultHandling property (which is specified in lines 10-14 of Figure 4.21) is used. Like the detection documentation, the handling documentation would be applied to the component error behavior events associated with the fault’s type.
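A minimal sketch of such a handling annotation follows, again using standard AADL property-association syntax. Pairing PumpDeteriorates with Diagnosis is an assumption made purely for illustration, as is the placeholder explanation. The ErrorHandlingApproach field is left unset because the literals of ErrorHandlingApproachType are not shown in Figure 4.21; this assumes the tooling accepts a record value with an unset field.

```aadl
-- Sketch only: assumed to appear inside the component's EMV2 annex subclause.
properties
  MAP_Error_Properties::RuntimeFaultHandling => [
    -- Assumed choice; any literal from line 11 of Figure 4.21 could be used.
    FaultHandlingApproach => Diagnosis;
    -- ErrorHandlingApproach is intentionally omitted here: its literals are
    -- defined by ErrorHandlingApproachType, which is not reproduced in the figure.
    -- Placeholder text; the analyst supplies the actual narrative.
    Explanation => "Placeholder rationale supplied by the analyst.";
  ] applies to PumpDeteriorates;
```

Setting both FaultHandlingApproach and ErrorHandlingApproach in the same record would correspond to the combined use of fault and error handling that M-SAFE captures in columns H and I.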