Three common approaches to evaluating the performance of TEWA-related systems were identified in the literature review conducted for this study: prototype evaluation in conjunction with end-users, single scenario evaluation and batch-simulations. The limitations and advantages of each of these approaches are described in this section within the context of a TEWA system, with specific reference to how, and when, each method should be applied during a system’s life-cycle.
8.4.1 Prototype Evaluation in Conjunction with End-Users
The preferred method (in terms of building confidence) of testing whether a TEWA DSS adheres to the system specifications is to conduct live flight tests [92]. The complex nature of a TEWA system, together with the high cost of such an approach, however, makes it impractical to conduct flight tests merely for the purposes of system evaluation during the early stages of system development. System designers are therefore forced to rely on M&S to perform trade-off studies and sensitivity studies for the purposes of system evaluation [186].
Furthermore, flight tests alone do not provide insight into scenarios that were not actually tested [65]. It is not practical to test all the possible engagement scenarios with full-scale flight tests. Moreover, it would not be feasible to test enough replications of a specific engagement scenario live in order to determine the system’s performance with statistical significance. Because of the confidential nature of this research area, historical data on flight tests, from which system
designers can gain insight into the performance characteristics of existing TEWA systems, are very rare.
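The replication argument above can be made concrete with a standard normal-approximation bound. The sketch below assumes, purely for illustration, that the performance measure of interest is a success probability $p$ estimated from $n$ independent replications of the same engagement scenario; the symbols $n$, $p$, $e$ and $z$ are introduced here and do not appear in the cited studies.

```latex
% Replications needed to estimate a success probability p to within a
% margin of error e at the confidence level associated with z:
\[
  n \;\geq\; \frac{z^{2}\, p\,(1-p)}{e^{2}}
\]
% Worked example with the conservative choice p = 0.5, e = 0.05 and
% z = 1.96 (approximately 95% confidence):
\[
  n \;\geq\; \frac{(1.96)^{2}\,(0.5)(0.5)}{(0.05)^{2}} \;=\; 384.16
  \quad\Longrightarrow\quad n = 385
\]
```

Under these assumptions, several hundred replications of a single scenario would be required, which illustrates why live testing alone cannot deliver statistically significant performance estimates.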
Even so, in order to fully understand the success of a DSS, the end-users must be included in the performance evaluation stage. The insights provided by these end-users (the operators of the system) are crucial in developing a highly efficient system. The prototype evaluation of a DSS is therefore an attractive evaluation technique during the final stages of system development. Testing a system in conjunction with the intended end-users, in an environment similar to the actual operational environment, should provide an accurate indication of whether the system is indeed useful, and should identify weaknesses and strengths in a bid to further improve the system. As mentioned in §8.2, this evolutionary development is, indeed, a characteristic property of an SoS.
This prototype evaluation method has been applied successfully by Smith et al. [184] as part of the TADMUS programme. The end-users were divided into two groups: one group had access to the DSS, which aided the operators in the TE process, while the other group performed without it. A total of 15 teams were created, of which eight had access to the TADMUS DSS. The same TEWA scenario was presented to both groups. At critical times during the scenario, the teams were asked to provide a prioritised list of the most threatening targets, and their decisions were evaluated against the “correct” answers as calculated by the TADMUS system algorithms. The results of the study indicated that the operators who had access to the DSS performed significantly better at detecting deceptive threats.
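The comparison of a team's prioritised threat list with a reference list may be sketched as follows. This is a minimal illustration only: the two measures (top-k overlap and a Kendall rank correlation without ties), the function names and the threat labels are assumptions made here, and are not the scoring procedure reported in [184].

```python
from itertools import combinations


def top_k_overlap(team_ranking, reference_ranking, k):
    """Fraction of the reference top-k threats that also appear in the team's top-k."""
    team_top = set(team_ranking[:k])
    reference_top = set(reference_ranking[:k])
    return len(team_top & reference_top) / k


def kendall_tau(team_ranking, reference_ranking):
    """Kendall rank correlation over the threats present in both lists (no ties)."""
    common = [t for t in reference_ranking if t in team_ranking]
    team_pos = {t: i for i, t in enumerate(team_ranking)}
    concordant = discordant = 0
    # `common` is in reference order, so a pair (a, b) is concordant when
    # the team also placed a ahead of b.
    for a, b in combinations(common, 2):
        if team_pos[a] < team_pos[b]:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical threat labels; the reference list plays the role of the
# "correct" answer and the team list that of an operator team's response.
reference = ["T3", "T1", "T4", "T2", "T5"]
team = ["T1", "T3", "T4", "T5", "T2"]

overlap = top_k_overlap(team, reference, k=3)
tau = kendall_tau(team, reference)
print(f"top-3 overlap = {overlap:.2f}, Kendall tau = {tau:.2f}")
```

Scores of this kind, collected per team, could then be compared between the group with access to the DSS and the group without it.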
The problem associated with practical implementation is that this method requires many resources and significant time to generate useful results [184, 186]. The success of this approach also depends on the HMI implemented, even more so than on effective algorithms [67]. Furthermore, a suitable high-stress, dynamic scenario must be created; otherwise the results will not be valid for real situations. System designers are therefore “forced” to utilise M&S tools in the evaluation of the performance of TEWA systems, especially during the early phases of a system life-cycle.
8.4.2 Single Scenario Evaluation
Another approach towards DSS performance evaluation involves the construction of a single scenario. The algorithms may then be implemented in the scenario and the results evaluated in that context. Simulations need to be run for numerous iterations in order to derive probability estimates associated with certain events, as stated in §8.2. In this way, it is possible to achieve a rudimentary understanding of the algorithms’ performance and to identify opportunities for improving them. The results of such an approach can be validated by showing the responses to domain experts and determining whether the outputs correspond to their intuition [218].
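The derivation of a probability estimate from repeated runs of one scenario may be sketched as follows. The function `simulate_engagement` is a hypothetical stand-in for a single stochastic simulation run, and the value 0.7 inside it is arbitrary; the Wilson score interval is one common choice of confidence interval for a proportion and is an assumption made here, not a method prescribed in the text.

```python
import math
import random


def simulate_engagement(rng):
    """Hypothetical stand-in for one stochastic run of the scenario.

    Returns True when the event of interest occurs. The 0.7 is an
    arbitrary illustrative value, not a figure from any study.
    """
    return rng.random() < 0.7


def estimate_probability(n_runs, seed=0, z=1.96):
    """Estimate an event probability from n_runs replications.

    Returns the point estimate and a Wilson score interval
    (z = 1.96 corresponds to roughly 95% confidence).
    """
    rng = random.Random(seed)
    hits = sum(simulate_engagement(rng) for _ in range(n_runs))
    p_hat = hits / n_runs
    denom = 1 + z**2 / n_runs
    centre = (p_hat + z**2 / (2 * n_runs)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_runs + z**2 / (4 * n_runs**2)) / denom
    return p_hat, (centre - half, centre + half)


for n in (10, 100, 1000):
    p_hat, (lower, upper) = estimate_probability(n)
    print(f"n={n:5d}  p_hat={p_hat:.3f}  95% CI=({lower:.3f}, {upper:.3f})")
```

The interval narrows as the number of iterations grows, which is the practical reason for running a single scenario many times before drawing conclusions from it.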
When applying this approach, it must still be determined whether the outcomes of the simulation provide the expected and, to a certain extent, realistic results. One method of gaining such confidence is to generate scenarios for which the correct solutions are known (e.g. a historical scenario). Proving that the system functions as expected in known circumstances is helpful and, in fact, desirable, but it does not prove that the system will generate accurate, reliable outputs in all circumstances. Although a single simulation output may match a known solution, this does not necessarily prove the successful functioning of the system. To ascertain that the system will work in all situations, a large number of different simulations must be executed and their outcomes validated. The outcomes may be validated by comparing the results with expert opinions (i.e. by presenting the same situational problem executed by the simulation to military experts and comparing their answers to those of the simulation).
This single scenario approach may also be used to test the performance of different algorithm combinations designed to achieve the same goal (e.g. different combinations of TE and WA algorithms). For instance, numerous algorithms may be applied to the same scenario and the results compared. Johansson and Falkman [95] followed this approach by developing two algorithms and comparing their results in an identical scenario. Not only were the results analysed, but also the ability of the algorithms to adapt to changes, such as missing or incomplete information, and abrupt threat value changes [95].
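A comparison of this kind may be sketched as a small harness that applies several algorithms to an identical scenario and to a degraded copy of it. The two ranking rules below are deliberately simple placeholders invented for this illustration (they are not the algorithms of [95]), and the scenario values, field names and default speed are likewise assumptions.

```python
def rank_by_distance(threats):
    """Placeholder TE rule: closer threats rank as more threatening."""
    return [t["id"] for t in sorted(threats, key=lambda t: t["distance_km"])]


def rank_by_time_to_reach(threats, default_speed_kmh=800.0):
    """Placeholder TE rule: a shorter time to reach the asset ranks higher.

    A missing speed (None) falls back to an assumed default value.
    """
    def time_h(t):
        speed = t["speed_kmh"] if t["speed_kmh"] is not None else default_speed_kmh
        return t["distance_km"] / speed
    return [t["id"] for t in sorted(threats, key=time_h)]


def changed_positions(before, after):
    """Number of rank positions at which two orderings differ."""
    return sum(a != b for a, b in zip(before, after))


# One hypothetical scenario, shared by both algorithms.
scenario = [
    {"id": "T1", "distance_km": 40.0, "speed_kmh": 600.0},
    {"id": "T2", "distance_km": 60.0, "speed_kmh": 1200.0},
    {"id": "T3", "distance_km": 25.0, "speed_kmh": 300.0},
]
# The same scenario with one speed measurement missing.
degraded = [dict(t, speed_kmh=None) if t["id"] == "T2" else dict(t) for t in scenario]

for name, algorithm in [("distance", rank_by_distance), ("time", rank_by_time_to_reach)]:
    nominal = algorithm(scenario)
    perturbed = algorithm(degraded)
    print(name, nominal, perturbed, changed_positions(nominal, perturbed))
```

Because both algorithms receive exactly the same inputs, any difference in their output, or in their sensitivity to the missing measurement, can be attributed to the algorithms rather than to the scenario.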
The single scenario method should require fewer resources than the aforementioned prototype evaluation method. This approach is therefore more suitable during the earlier iterative design stages of a project, especially when the system under development does not yet exist. When applying this approach, it is important that the results are carefully considered and that the limitations of the specific test scenarios are understood, since the results only show how the system performs in the particular scenario tested. Therefore, instead of evaluating whether the system can be used in generic situations, this approach merely clarifies specific strengths and limitations of the models (and implemented algorithms) and highlights particular characteristics of the system. As such, care should be taken when selecting the scenario(s) for which the tests will be conducted.
8.4.3 Batch-simulations
The use of batch-simulations entails executing a large number of scenarios so as to be able to perform statistical analyses on the conditions under which the algorithms perform well. This method builds upon the single-scenario approach, since each scenario would need to be run a sufficient number of times in order to account for SoS effects. In order to apply this method, performance measures have to be specified for use in the ensuing statistical analysis, since it is typically not possible for a human analyst to analyse all the scenarios separately by considering the outcomes individually.
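The structure of such a batch study may be sketched as follows: every scenario condition is replicated a number of times, a single performance measure is recorded per run, and only summary statistics are inspected. The function `run_scenario`, the performance measure (fraction of threats neutralised) and the way performance degrades with the number of threats are all invented for this illustration.

```python
import random
import statistics
from collections import defaultdict


def run_scenario(n_threats, rng):
    """Hypothetical stand-in for one simulated scenario.

    Returns a performance measure in [0, 1] (here, the fraction of
    threats neutralised). The degradation with threat count is an
    assumption made for the illustration only.
    """
    p_success = max(0.0, 0.9 - 0.05 * n_threats)
    return sum(rng.random() < p_success for _ in range(n_threats)) / n_threats


def run_batch(threat_counts, replications, seed=0):
    """Run every scenario condition `replications` times and summarise."""
    rng = random.Random(seed)
    results = defaultdict(list)
    for n_threats in threat_counts:
        for _ in range(replications):
            results[n_threats].append(run_scenario(n_threats, rng))
    # Mean and sample standard deviation of the measure per condition.
    return {
        n: (statistics.mean(values), statistics.stdev(values))
        for n, values in results.items()
    }


summary = run_batch(threat_counts=(2, 6, 10), replications=200)
for n_threats, (mean, stdev) in summary.items():
    print(f"{n_threats:2d} threats: mean={mean:.3f}, stdev={stdev:.3f}")
```

The analyst then works with the per-condition summaries rather than with the 600 individual outcomes, which is the role the pre-specified performance measures play in this approach.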
A problem experienced in many batch-simulation studies in the open literature in which TE or WA is evaluated⁷ is that the majority disregard the complexities defining the CPM problem, as mentioned in §6.1. Most studies use a combination of straight-path threats with no reference to realistic tracks [135], homogeneous WSs (with constant SSHP and no different types of WSs) [125], simplified WA constraint set-ups and randomly generated scenarios [93, 205]. Consequently, it is difficult both to draw significant conclusions regarding the functioning of the TEWA system and to compare the results of different studies, as motivated in §6.2. Because of the many possible formulations of this problem (TE and WA combined), the focus can only be on evaluating the effectiveness of the implemented set of algorithms with certain types of data-sets (scenario set-up, WS properties, etc.).
Because of the scarcity of realistic scenario-related information, this method is difficult to apply in an academic setting. Batch-simulations should only be performed if a high level of confidence has been reached with respect to the effectiveness of the algorithms. As such, a realistic single-scenario approach, in which algorithms can be tested sufficiently, should be seen as a prerequisite for applying this approach.
⁷ Studies in which a TEWA system with interoperable TE and WA subsystems is evaluated could not be found. The focus is generally on one of the two subsystems. Where WA is evaluated, a rudimentary TE approach (a single TE measure, typically distance) is implemented, and vice versa [135, 215].