Problem Statement: The goal of the Software Health Management element is to develop the tools and techniques needed to enable the detection, diagnosis, prognosis, and mitigation of errors and related adverse events caused or contributed to by software systems in aircraft. While this bears many similarities to health management of physical systems, there are important differences that must be taken into consideration. The most important consideration is that all software faults are design errors. Software does not fail in the physical sense. However, aircraft software is inherently coupled with physical systems, and many faults in aircraft software are triggered by interactions with physical phenomena. Thus, software health can only be assessed in the context of the larger system in which the software is embedded. Unfortunately, there is little reliable data concerning software failures. Specifically (ref. 1, p. 39):

The lack of systematic reporting of significant software failures is a serious problem that hinders evaluation of the risks and costs of software failure and measurement of the effectiveness of new policies or interventions.

This suggests an inherent difficulty in detecting and diagnosing software faults. Furthermore, Avizienis et al. (ref. 2) observe that certain software faults are “recognized as faults only after […] a failure has ensued.” The first indication of a software fault may therefore be a catastrophic system failure. A canonical example is the failure of the maiden Ariane 5 flight, Flight 501 (ref. 3), which stemmed from a sequence of seemingly reasonable design decisions: inertial reference software reused from Ariane 4 performed an unprotected conversion of a 64-bit floating-point value to a 16-bit signed integer, a conversion that was safe for Ariane 4 trajectories but overflowed on the Ariane 5 trajectory. This highlights another observation in ref. 1 (p. 40): “by far the largest class of problems arises from errors made in the eliciting, recording, and analysis of requirements.”

There are examples in the literature that can be considered software health management techniques. Sha (ref. 4) outlines an architectural mitigation strategy based on run-time monitors coupled with simple, safe, but otherwise sub-optimal alternative solutions. Goldberg and Horvath (ref. 5) advocate adapting ARINC 653 Health Monitoring mechanisms to support monitoring of software using formal models of expected behavior. Castelli et al. (ref. 6) document a proactive approach to a class of aging-related software faults. In this context, aging refers to the run-time degradation of software integrity due to resource exhaustion, data corruption, or accumulation of numerical errors. Their strategy centers on periodically refreshing or restoring the software state to eliminate these deleterious effects. This is a reasonable strategy for systems with long periods of continuous operation, and it is worth considering for ground systems. For airborne software systems, however, flight duration is rarely more than ten hours, and these systems are already restored to a known good state before every flight.
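Sha's strategy is commonly associated with the Simplex architecture, in which a decision module hands control to a simple, well-understood controller whenever a run-time monitor flags the complex controller. The Python sketch below assumes a scalar control loop and uses a bound on command magnitude as a stand-in for the monitor; an actual Simplex decision module would instead check that the system state remains recoverable by the safe controller. `MonitoredController`, `cmd_limit`, and the two example controllers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical controller signature: maps a measured (scalar) state to a command.
Controller = Callable[[float], float]


@dataclass
class MonitoredController:
    """Guard a complex controller with a run-time monitor and a simple, safe fallback.

    Illustrative sketch only: the names and the command-magnitude check are
    assumptions, not an interface taken from ref. 4.
    """
    complex_ctrl: Controller   # high-performance but hard-to-verify logic
    safe_ctrl: Controller      # simple, well-understood, sub-optimal logic
    cmd_limit: float           # assumed envelope on command magnitude
    tripped: bool = False      # latched once the monitor detects a violation

    def step(self, state: float) -> float:
        if not self.tripped:
            try:
                cmd = self.complex_ctrl(state)
            except Exception:
                cmd = None  # an exception counts as a monitor violation
            # abs(NaN) <= limit is False, so NaN commands also trip the monitor.
            if cmd is not None and abs(cmd) <= self.cmd_limit:
                return cmd
            self.tripped = True  # switch to the safe alternative and stay there
        return self.safe_ctrl(state)


# Usage with two hypothetical proportional controllers.
mc = MonitoredController(lambda x: -5.0 * x, lambda x: -0.5 * x, cmd_limit=10.0)
print(mc.step(1.0))  # -5.0: complex command is within the envelope
print(mc.step(3.0))  # -1.5: -15.0 exceeds the envelope, so the safe controller answers
print(mc.step(1.0))  # -0.5: the monitor stays latched on the safe controller
```

Latching on the safe controller is a design choice: it avoids oscillating between the two controllers, at the cost of remaining on the sub-optimal path until some external recovery action (not modeled here) resets `tripped`.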

Airborne systems may, however, suffer from another form of software aging. Parnas (ref. 7) identifies two contributing factors: (1) not modifying software in response to evolving needs, and (2) modifying software in response to evolving needs. Although mechanisms are in place to manage changes to fielded software systems, change management remains an area with the potential for either introducing new software defects or unmasking existing ones.

Research Approach: A central recommendation of the National Academies (ref. 1) is that dependable software systems should be developed with explicit claims and evidence to substantiate those claims, augmented with expertise in developing that class of systems. In light of these recommendations, research will be focused on developing a framework for (software) health management that involves:

• Explicit claims of system (and subsystem) requirements including assumptions about the application domain and environment in which the system is to operate;

• Evidence that software satisfies these explicit claims under the stated domain assumptions;

• Architectural principles, enforced by hardware mechanisms, that ensure that software behavior dependencies are traceable; and

• Mechanisms for correctly composing software systems from trusted components within the constraints imposed by the architectural principles.

To realize this framework, we propose exploring software health management in the context of system-level dependability cases. Dependability cases are a mechanism recommended by Jackson et al. (ref. 1) for managing the explicit claims and evidence that support system dependability claims. The central idea behind this approach is that any observed (sub)system behavior that is inconsistent with any explicit (sub)claim in a dependability case is evidence that either the system or its associated dependability case is flawed. In either case, we have reason to doubt the dependability of the system. Initial tasks will focus on detection and mitigation techniques, with the expectation that more robust detection capabilities will lay a foundation for future investigations into diagnosis and prognosis.
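A minimal sketch of that central idea, assuming each (sub)claim can be written as an assume/guarantee pair of predicates over one sampled observation. The `Claim` and `check` names, the signal names, and the numeric limits are all hypothetical, chosen only to show how an inconsistency points at either the system or the dependability case.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Observation = Dict[str, float]            # hypothetical: one sample of named signals
Predicate = Callable[[Observation], bool]


@dataclass(frozen=True)
class Claim:
    """One explicit (sub)claim from a dependability case, in assume/guarantee form."""
    name: str
    assumption: Predicate  # stated domain/environment assumption
    guarantee: Predicate   # behavior the software is claimed to exhibit


def check(claims: List[Claim], obs: Observation) -> List[str]:
    """Return one finding per claim that the observation is inconsistent with.

    A violated assumption means the case's model of the environment did not hold;
    a violated guarantee under a valid assumption points at the system itself.
    Either way, the finding is evidence against the system's dependability.
    """
    findings = []
    for c in claims:
        if not c.assumption(obs):
            findings.append(f"{c.name}: assumption violated (dependability case suspect)")
        elif not c.guarantee(obs):
            findings.append(f"{c.name}: guarantee violated (system suspect)")
    return findings


claims = [
    Claim(
        name="pitch-command-bounded",
        # Hypothetical assumption: sensed airspeed stays within 40..350 kt.
        assumption=lambda o: 40.0 <= o["airspeed_kt"] <= 350.0,
        # Hypothetical guarantee: commanded pitch never exceeds 15 degrees.
        guarantee=lambda o: abs(o["pitch_cmd_deg"]) <= 15.0,
    ),
]

print(check(claims, {"airspeed_kt": 250.0, "pitch_cmd_deg": 5.0}))   # []
print(check(claims, {"airspeed_kt": 250.0, "pitch_cmd_deg": 22.0}))  # guarantee violated
print(check(claims, {"airspeed_kt": 420.0, "pitch_cmd_deg": 5.0}))   # assumption violated
```

Checking the assumption before the guarantee keeps the two kinds of finding distinct: when the environment is outside the stated domain, the guarantee was never claimed to hold, so its failure alone says nothing about the software.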

Another objective is to gain a better understanding of the software failure mechanisms relevant to aircraft systems. Fault taxonomies already exist (refs. 2, 8); we will determine which classification scheme is appropriate for aircraft systems. Nikora (ref. 9) is currently analyzing historical software fault data from several robotic space exploration missions, using the classification suggested in ref. 8. A similarly focused study of aircraft software failures is recommended.
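As an illustration of how a taxonomy might be applied to an anomaly catalog, the sketch below encodes the three fault classes described by Grottke and Trivedi (ref. 8): Bohrbugs, Mandelbugs, and aging-related bugs (a subtype of Mandelbugs), together with a simplified reading of the mitigations that paper associates with each. The `classify` function is a hypothetical triage heuristic based on two observable properties of an anomaly record; it is an assumption for illustration, not a procedure taken from ref. 8 or ref. 9.

```python
from enum import Enum


class FaultClass(Enum):
    """Fault classes described by Grottke and Trivedi (ref. 8)."""
    BOHRBUG = "Bohrbug"              # manifests consistently under well-defined conditions
    MANDELBUG = "Mandelbug"          # complex activation/propagation; failures look nondeterministic
    AGING_RELATED = "aging-related"  # Mandelbug subtype whose effects accumulate with run time


# Simplified reading of the mitigations associated with each class in ref. 8.
MITIGATIONS = {
    FaultClass.BOHRBUG: ("remove during development", "design diversity"),
    FaultClass.MANDELBUG: ("retry", "restart", "fail over to a replica"),
    FaultClass.AGING_RELATED: ("rejuvenate",),
}


def classify(reproducible: bool, worsens_with_uptime: bool) -> FaultClass:
    """Hypothetical triage heuristic for tagging a cataloged anomaly.

    reproducible: the failure recurs whenever the recorded inputs are replayed.
    worsens_with_uptime: the failure rate or degradation grows with run time.
    """
    if worsens_with_uptime:
        return FaultClass.AGING_RELATED
    if reproducible:
        return FaultClass.BOHRBUG
    return FaultClass.MANDELBUG


# Usage: a hypothetical anomaly that cannot be reproduced by replaying inputs.
fault = classify(reproducible=False, worsens_with_uptime=False)
print(fault.value, MITIGATIONS[fault])  # Mandelbug ('retry', 'restart', 'fail over to a replica')
```

The uptime test comes first because aging-related bugs are themselves Mandelbugs; checking reproducibility first would never assign the more specific class to a reproducible aging failure.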

References

1. Jackson, D.; et al.: Software for Dependable Systems: Sufficient Evidence? National Academies Press, 2007.

2. Avizienis, A.; et al.: Basic Concepts and Taxonomy of Dependable and Secure Computing. IEEE Trans. on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, January–March 2004.

3. Lions, J. L.; et al.: Ariane 5 Flight 501 Failure, Report by the Inquiry Board. July 1996. (Retrieved from http://sunnyday.mit.edu/accidents/Ariane5accidentreport.html)

4. Sha, L.: Using Simplicity to Control Complexity. IEEE Software, July/August 2001.

5. Goldberg, A.; and Horvath, G.: Software Fault Protection with ARINC 653. IEEE Aerospace Conference, March 2007.

6. Castelli, V.; et al.: Proactive Management of Software Aging. IBM J. Res. & Dev., vol. 45, no. 2, March 2001.

7. Parnas, D. L.: Software Aging. Proceedings of the 16th International Conference on Software Engineering, pp. 279–287, 1994.

8. Grottke, M.; and Trivedi, K.: Fighting Bugs: Remove, Retry, Replicate, and Rejuvenate. IEEE Computer, pp. 107–109, February 2007.

9. Nikora, A.: Classifying Software Faults to Improve Fault Detection Effectiveness. NASA OSMA Software Assurance Symposium, September 2007. (Retrieved from http://sarpresults.ivv.nasa.gov/ViewResearch/130.jsp)

IVHM 2.4 Software Health Management

2.4.5.1 Initiate a state-of-the-art survey of software health management concepts and technologies (WAYPOINT)
Year: FY08Q4
Outcome: Findings will be collected in a document and submitted to a peer-reviewed conference. (FY08Q4)

2.4.5.2 Framework for accumulating evidence that the observed behavior of a software system, including both inputs and outputs, is consistent with its expected behavior
Years: FY09Q4, FY10Q3, FY11Q2
Dependencies: 2.4.5.1, 2.4.5.2, 2.4.5.3
Metrics:
i) Perform a study to catalog historical aircraft software anomalies, including representative anomalies uncovered during pre-deployment verification and validation activities as well as those discovered post-deployment. From this catalog, a set of working metrics will be derived for developing an evidence base. (FY09Q4)
ii) Instrument a relevant aircraft software/hardware instantiation to capture a minimum of two representative anomalies identified in the metrics established in (i). (FY10Q3)
iii) Collect data from the instrumented system and conduct a peer review of the framework with the multi-agency High Confidence Software and Systems (HCSS) Coordinating Group of the NITRD; document the case study analysis results and report them to an appropriate peer-reviewed journal. (FY11Q2)
Metric Rationale: The collected data will provide an evidence base that certain properties, especially certain extrinsic properties, are consistent with (1) explicit assumptions regarding system specifications and (2) explicit assumptions about the physical devices with which the software interacts. Data will be provided to the Dashlink website for dissemination.

2.4.5.3 Classification of software malfunctions for which (1) recovery is guaranteed and (2) recovery is not guaranteed (WAYPOINT)
Year: FY10Q4
Dependencies: 2.4.5.1
Outcome: Delivery of a safety-critical software malfunction taxonomy that identifies the classes of software malfunctions suitable for in-flight recovery, distinguishing malfunctions for which effective mitigation strategies are guaranteed from those for which recovery cannot be guaranteed. (FY10Q4)

2.4.5.4 Evaluation of integrated adaptive reconfiguration of safety-critical aircraft software
Year: FY12Q4
Metrics:
i) Identify and document, via RTIP, a suitable experiment on a realistic testbed. Experiment documentation will include TBD evaluation metrics. The suitability of the experiment/demonstration will be assessed in the context of the HCSS peer review; the experiment and metrics will show 100% traceability to these challenge problems. (FY12Q4)
ii) Demonstrate TBD performance (RTIP-specified) of adaptive reconfiguration of safety-critical aircraft software. (FY12Q4)
Metric Rationale: The study will include the identification of outstanding issues and an evaluation of their potential impact on the deployment of these systems. The capability demonstration will provide confidence that the mitigation techniques are relevant to onboard IVHM systems.
