Signature-based IDS, as commonly implemented, have a number of issues. Of primary importance is the conflict between coverage and performance (i.e., precision, recall, accuracy, and specificity). Some form of cost analysis (e.g., of computing, latency, hardware, and training costs) is generally performed in order to choose the right IDS technology for a given network. Many modern IDS also rely on labor-intensive tuning and signature refinement to match particular network characteristics and known host vulnerabilities. Many of the resulting trade-offs yield sub-optimal detection but decrease the cost of the IDS substantially. Fan et al. describe cost metrics in terms of operational costs, attacker-induced damage, and incident response, citing the need to consider cost within the development and deployment of IDS [39]. Others have cleverly incorporated cost assessment into decision support systems to propose or enact response actions [86], IDS reconfiguration, and dynamic performance tuning [58].
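The four performance measures named above follow the standard confusion-matrix definitions. The sketch below is illustrative only: the function name and the sample counts are assumptions, not values taken from any cited study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute standard detection metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    def ratio(numerator, denominator):
        # Guard against empty classes rather than raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),      # alerts that were real attacks
        "recall": ratio(tp, tp + fn),         # real attacks that raised alerts
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "specificity": ratio(tn, tn + fp),    # benign events left unflagged
    }


# Illustrative counts only.
metrics = detection_metrics(tp=80, fp=20, tn=880, fn=20)
```

With these counts, accuracy is high even though one alert in five is false. That gap is one reason a single measure rarely captures the coverage and performance conflict.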
A commonly cited statement in the literature is that the number of alerts generated by modern IDS is overwhelming to human operators and essentially untenable for any type of manual analysis [58, 92]. Every false positive increases the analyst's burden without improving the analyst's abilities. This statement is often cited as a failing of anomaly detection schemes and as a justification for eliminating "unimportant" signatures from misuse detection systems. It has also spurred the industry to create Security Information and Event Management (SIEM) systems, which aggregate, correlate, predict, and display events and their impact [58]. While necessary for the sanity of the human analyst, a SIEM system does not address the issue of improving IDS coverage and can mask performance issues in the underlying detection signatures. Due to the unresolved human-factors and cognitive issues, the research community has been essentially unmotivated to significantly expand the coverage of IDS. The number of signatures has remained relatively stagnant for nearly a decade. If the aggregation and correlation issues are ever adequately resolved, the broader research community may return to identifying a wider variety of events using signature-based approaches.
Careful management of the signature set in order to achieve desired performance goals may also mask performance issues, and can actually decrease the usefulness of the IDS by removing contextual knowns from the stream of true positives displayed to an analyst. In a perfectly "tuned" system, one might expect only the highest-priority events to be displayed and all other events to be discarded (or at least hidden). It is possible that removing true positives up front in the detector is necessary only because such systems do not use predictive mechanisms to "bootstrap" their own performance. It is important to note that the predictive adaptations employed in many prior research studies have generally been applied globally to the input data, rather than to individual signatures or to sets of signatures grouped by equivalence classes of future events.
A 2008 study by Yu et al. demonstrated a system that tunes the detection model on the fly according to feedback from the system operator when false predictions are made. This adaptive anomaly detection approach throttles alarm output and tunes the detection model [92]. As such, these predictions are not directly seen by the detection model, nor do they directly affect it.
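The idea of operator-driven tuning can be illustrated with a deliberately simple stand-in. The sketch below is not the method of Yu et al.: it assumes a hypothetical detector that raises an alert when an anomaly score in [0, 1] reaches a threshold, and it nudges that threshold whenever the operator reports a false prediction. The class name, step size, and scoring scale are all assumptions.

```python
class FeedbackTunedDetector:
    """Toy threshold detector adjusted by operator feedback (hypothetical)."""

    def __init__(self, threshold=0.5, step=0.1):
        self.threshold = threshold  # scores at or above this raise an alert
        self.step = step            # size of each tuning adjustment

    def is_alert(self, score):
        return score >= self.threshold

    def feedback(self, score, truly_malicious):
        """Adjust the threshold only when the prediction was wrong."""
        predicted = self.is_alert(score)
        if predicted and not truly_malicious:
            # False positive: become less sensitive.
            self.threshold = min(1.0, self.threshold + self.step)
        elif not predicted and truly_malicious:
            # False negative: become more sensitive.
            self.threshold = max(0.0, self.threshold - self.step)


detector = FeedbackTunedDetector(threshold=0.5, step=0.1)
detector.feedback(score=0.55, truly_malicious=False)  # operator flags a false alarm
```

After the false alarm is reported, the same score of 0.55 no longer raises an alert. Correct predictions leave the threshold unchanged.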
Another way to think about tuning procedures such as those demonstrated by Yu et al. is that they assist the detector in eliminating false positives. For any set of signatures whose false positive rate is already zero, however, no additional gains can be achieved. The procedure improves the performance of a signature set only by increasing its accuracy. If false negatives are being incurred due to packet loss, the approach as described would not directly apply. Nonetheless, these results are significant and demonstrate useful methods for improving IDS performance. The general approach could also be easily modified to adapt the detection model so as to decrease packet loss, and thus false negatives.
Another type of performance adaptation is the one proposed and implemented by Wenke Lee, Wei Fan, et al., who describe methods for performing cost-sensitive adaptation of detection models [39, 56, 58]. Their primary contribution is to show that cost-based adaptive reconfiguration is a viable approach to performance optimization. The primary cost measures considered within their approach are taxonomic prioritization based on event type, damage cost, response cost, and operational cost. Fan, Lee, et al. aim to construct a general-purpose cost model and produce rough categorizations of each measure for purposes of system evaluation. Their basic approach is to reconfigure the IDS as a whole based on in situ measurement of performance issues. Particularly clever is Lee's use of injected events to determine when the IDS is dropping events. Lee proposes a set of cost objective functions for evaluating and optimizing detection performance [56]. Our research efforts serve to greatly expand their operational cost measure by exploring concepts such as wasted information and anticipatory performance optimizations.
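The injected-event idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Lee's implementation: it supposes that each injected probe event carries a unique identifier, that the identifiers the IDS actually reported can be collected, and that a hypothetical tolerance (`max_drop_rate`) decides when reconfiguration is warranted.

```python
def estimate_drop_rate(injected_ids, observed_ids):
    """Fraction of injected probe events that the IDS failed to report."""
    injected = set(injected_ids)
    if not injected:
        return 0.0  # nothing injected, so nothing can be inferred
    missed = injected - set(observed_ids)
    return len(missed) / len(injected)


def needs_reconfiguration(injected_ids, observed_ids, max_drop_rate=0.05):
    """Return True when the estimated drop rate exceeds the tolerance.

    max_drop_rate is a hypothetical tuning parameter.
    """
    return estimate_drop_rate(injected_ids, observed_ids) > max_drop_rate


# Ten probes injected, two never reported by the detector.
drop_rate = estimate_drop_rate(range(10), [0, 1, 2, 3, 4, 5, 6, 7])
```

Because the probes are known in advance, missed probes give a direct in situ estimate of how many real events may also be going unprocessed.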
Lee's approach is quite general and captures the fundamentals of performance adaptation and optimization. By formalizing each of the cost factors involved, Lee has created a relatively straightforward value optimization problem. This provides a useful global optimization for cases where the IDS is overloaded. However, during normal operation, and with respect to anticipatory and probabilistic refinements alike, the IDS wastes work on events that are easy to predict or that are likely to occur many times during a single packet stream.
Similar to the probabilistic signature activation approach, Lee's threshold-based reconfiguration is performed at the last possible time, when the system is failing. Such an optimization can only be used to improve IDS coverage during periods of high load. Further, the optimization problem is equivalent to the Knapsack problem, which is NP-complete and difficult to recompute online. Depending upon the number of features being considered, the cost of performing the optimization could be too high for it to be used online. Although online optimization over the lifetime of the IDS instantiation is not considered in his analysis, it is clear that an extension might allow for parametric optimizations based on current event and detection engine statistics.
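The connection to the Knapsack problem can be made concrete with a small sketch. It assumes, purely for illustration, that each analysis task has an integer processing cost and a numeric value and that the IDS has a fixed processing budget. The task names and numbers are hypothetical, and the dynamic program shown is the textbook 0/1 knapsack solution rather than Lee's formulation.

```python
def select_tasks(tasks, budget):
    """Choose analysis tasks maximizing total value within a cost budget.

    tasks  -- list of (name, cost, value) with positive integer costs
    budget -- integer processing budget
    Returns (best_value, chosen_names). Classic 0/1 knapsack DP.
    """
    # best[b] holds the best (value, names) achievable with capacity b.
    best = [(0, []) for _ in range(budget + 1)]
    for name, cost, value in tasks:
        # Iterate capacities downward so each task is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + value
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + [name])
    return best[budget]


# Hypothetical signature groups: (name, cost, value).
tasks = [("http", 2, 3), ("dns", 3, 4), ("smtp", 4, 5), ("ftp", 5, 6)]
best_value, chosen = select_tasks(tasks, budget=5)
```

The running time grows with the number of tasks multiplied by the budget, which hints at why recomputing such a selection online becomes expensive as the number of features grows.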
The anticipatory approaches outlined in this dissertation differ significantly from Lee's. Such optimizations can be performed at any time, irrespective of system load. Within Lee's approach, the optimization is delayed until the system is beginning to lose fidelity. It is also performed as a global adaptation over the entire input set, which is likely to be sub-optimal for subsets of packet data. Instead, we consider an approach in which detections for each individual connection (or set of related connections) are individually considered and improved.
In order for anticipatory approaches to be useful, IDS decision procedures must be constantly re-evaluating the most likely subsets of the decision procedure for all individual connections. Lee’s approach adds a more comprehensive weighting regarding the relative importance of various signatures, producing an intelligent prioritization of event output, but this is optimal only on average. This is essentially the same problem inherent to global optimization of decision tree-based decision procedures. Lee is attempting to guarantee (based on numerous factors) that all events above a given threshold of importance will always be processed. When a performance threshold is met where the IDS is losing information, the system can be dynamically reconfigured to prioritize input data processing features by eliminating analysis tasks.
In the context of an anticipatory approach, Lee provides useful global optimizations, especially for the cases where the IDS is overloaded. However, during normal running of the IDS, Lee’s approach is still wasting information gained due to processing of signatures for particular packets for which predictors might indicate irrelevance.
In addition to useful optimization schemes, Lee et al. also provide a wealth of arguments for the necessity of adaptive systems, but their goals are significantly different. Lee intends to ensure that, when the IDS is overloaded, optimal decisions are made about what to keep and what to throw away. He assumes that the analysis tasks already represent the maximum coverage possible.
Related to the work by Lee, Fan, et al. is a 2003 paper by Balepin et al., which presents a single host-based detection and response paradigm. Their key contribution is a more comprehensive response cost model. The problems that they intend to solve are: a) ensuring that responses do not cause more harm than good; and b) ensuring that responses are not launched unnecessarily due to false positives or contraindicating factors [17]. By taking potential response actions into consideration, they have provided a useful generalization of the cost models presented by earlier researchers.
When we discuss anticipation within the context of IDS, what we are really after is anticipatory response. While our primary research goals have focused on anticipatory optimizations, the research community has been principally focused on the ability of systems to respond appropriately to detected threats. There has long been controversy in the design and deployment of automated response mechanisms, particularly when human decision making is removed from the loop. Nonetheless, it has become apparent that automated response is necessary, and many systems incorporate automated response mechanisms. The simplest and most common types are those which make minor adjustments to prevent future attacks. These often adjust or stop network traffic flows (by dynamically changing firewall rules or dropping connections). Other systems may dynamically adjust security policies, for example by adjusting security domains within the software environment of a single host [85].
In 2009, Strasburg et al. refined the cost-sensitive detection concepts to better describe response systems [86]. They considered three factors: response operational cost, response goodness, and response impact on the system. While some factors used in their cost assessment methodology were subjective, even a subjective measure of cost can inform whether or not any response should be taken. If the cost of a response outweighs its benefits, then alternatives must be sought or the planned response abandoned. Cost-driven optimizations such as those by Strasburg et al. can both enable better decision making by automated systems and allow more efficient use of limited computing resources.
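The decision rule described above, abandoning a response whose cost outweighs its benefit, can be sketched as follows. The three fields mirror the factors named in the text, but the way they are combined here (goodness minus the sum of the two costs) is an assumption made for illustration and is not the formula of Strasburg et al. All names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Response:
    name: str
    operational_cost: float  # resources needed to carry out the response
    goodness: float          # expected benefit against the detected threat
    system_impact: float     # collateral harm to legitimate use

    def net_benefit(self) -> float:
        # Assumed combination rule: benefit minus total cost.
        return self.goodness - (self.operational_cost + self.system_impact)


def choose_response(candidates: Iterable[Response]) -> Optional[Response]:
    """Return the candidate with the highest positive net benefit, else None."""
    best = max(candidates, key=lambda r: r.net_benefit(), default=None)
    if best is None or best.net_benefit() <= 0:
        return None  # every option costs more than it is worth
    return best


block_ip = Response("block_ip", operational_cost=1, goodness=5, system_impact=2)
shutdown = Response("shutdown_host", operational_cost=2, goodness=6, system_impact=7)
selected = choose_response([block_ip, shutdown])
```

Here the less disruptive response is selected. If shutting down the host were the only option, no response would be taken at all.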
More recent work by Barlet-Ros et al., while not directly related to intrusion detection, is relevant due to its use of predictive approaches for managing the limited resources of network monitoring systems [18]. Their goal is to maintain bounds on the accuracy of a network monitoring system by proactively shedding excess load. Similar to the PacketWrangler approach, they treat the monitoring software as a black box. Their purpose is dynamic load shedding, however, whereas PacketWrangler was intended for performance optimization even when systems are not overburdened. Their approach is also novel in that prior knowledge of cost models is not necessary.