This research leads to some future work, which is discussed in the following subsections.

8.2.1 Alternative clustering methods

As we have seen in section 8.1, finding the best clustering method for enhancing anomaly detection was outside the scope of this research. Existing anomaly detection solutions might benefit from narrowing down which specific clustering methods (and under which parameters and assumptions) result in the best detection rates. This could include clustering methods entirely different from the four discussed in this work. Another interesting mode of clustering to investigate is fuzzy clustering, in which a single host can belong to multiple clusters at the same time, each with a certain probability.
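To illustrate what such soft assignments could look like, the following minimal Python sketch computes fuzzy c-means style membership degrees of a single host for a fixed set of cluster centres. The function name, the two-dimensional feature vectors and the fuzzifier m = 2 are assumptions made for illustration only; they are not part of the method evaluated in this work.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Return the membership degree of `point` in each cluster.

    Uses the standard fuzzy c-means membership formula with fuzzifier
    m > 1. The returned degrees are non-negative and sum to 1.
    """
    dists = [math.dist(point, c) for c in centroids]

    # A point that coincides with one or more centroids belongs fully
    # (and equally) to those clusters; this avoids a division by zero.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]

    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical host feature vector and two hypothetical cluster centres.
host = (1.0, 0.0)
centres = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships(host, centres)
```

In this example the host lies three times closer to the first centre than to the second, so most of its membership falls in the first cluster while a small share remains in the second.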

8.2.2 Different data sources

While this research primarily concentrates on flow data, it is worthwhile to investigate the trade-off between the various data types. As discussed in section 5.1.2, there are clear differences in the level of information they provide. Since it remains unclear how the data source impacts the detection rates, it should be explored further whether choosing a different data source would improve them.

8.2.3 IPv6 compatibility

In the current approach, it is assumed that the network data is carried over IPv4. For the sustainability of the presented approach, it should be investigated what implications the future deployment of IPv6 has with regard to this type of anomaly detection.

Although the flow data format will not require great change, the IPv6 protocol has some notable differences from its predecessor. As a result of protocol changes and the introduction of new functionality, assumptions that hold for IPv4 may no longer hold for IPv6. For instance, the assumption that an IP address can reliably be mapped to the same host is no longer trivial under IPv6, in which new alternatives to DHCP and the introduction of privacy extensions may make tracking machines harder.

Additionally, a transition to IPv6 might change the behaviour of APTs. The steps taken on the internal network are likely to change, which might give opportunities to both sides, making intrusions either easier or harder to detect. Although there has been research into this area [47], further research on the impact on internal network traffic specifically is recommended.

8.2.4 Broader application

The presented anomaly detection algorithm for internal network traffic is just an instance of the more general concept of applying clustering as a ‘smart’ intermediary step. In future work, the application of this concept in a broader context should be investigated.

For instance, it could be examined how clustering could enhance similar concepts (e.g. user behaviour analytics), how it can be applied to different data sources (e.g. external network traffic data) and domains (e.g. host-level monitoring), or even outside the intrusion detection domain (anomaly detection in the broadest sense).

A Proof-of-concept implementation

An important part of this research involved the development of a proof-of-concept of the anomaly detection system that is proposed in this thesis. This enabled us to make statements about the performance, efficiency and usability of the proposed methods. The proof-of-concept is based on the method as described in chapter 5 and is used for the evaluation in chapter 6.

This appendix briefly describes the structure of the proof-of-concept, the functionality of the modules and some of the inner workings. The proof-of-concept is written in the Python programming language and consists of four main modules: generation, preprocessing, clustering and analysis.

To get access to the full source code (including code annotations and usage examples) of this proof-of-concept, please contact the author.

A.1 Generating network traffic

The generation module can generate PCAP files according to a client-server model. The number of servers and the number of clients can be specified, after which network data for that number of hosts is simulated. This module relies on ns-3, a free, open-source, discrete-event network simulator for Internet systems, targeted at research use.

Each client connects at least once to a number of servers, determined randomly based on a Pareto distribution. Subsequently, the number of flows, the number of bytes per flow, the duration per flow and the interval between flows are also randomly generated according to statistical distributions. The number of flows follows a Pareto distribution, whereas the others are generated according to a Weibull distribution.
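The sampling scheme described above can be sketched in a few lines of Python using only the standard library. This is a simplified illustration and not the code of the proof-of-concept: the function name, the shape and scale parameters and the cap on the number of flows are all assumed values, since the actual parameters are not stated here.

```python
import random


def sample_client(rng, num_servers, max_flows=1000):
    """Draw a hypothetical traffic profile for one client.

    All distribution parameters below are illustrative assumptions.
    """
    # Pareto samples are >= 1, so every client contacts at least one server.
    n_contacted = min(num_servers, int(rng.paretovariate(1.5)))
    servers = rng.sample(range(num_servers), n_contacted)

    profile = []
    for server in servers:
        # Number of flows: Pareto distributed, capped to tame the heavy tail.
        n_flows = min(max_flows, int(rng.paretovariate(1.2)))
        flows = [
            {
                # weibullvariate(scale, shape); the values are assumptions.
                "bytes": max(1, int(rng.weibullvariate(10_000, 0.8))),
                "duration_s": rng.weibullvariate(5.0, 1.0),
                "interval_s": rng.weibullvariate(30.0, 1.0),
            }
            for _ in range(n_flows)
        ]
        profile.append({"server": server, "flows": flows})
    return profile


rng = random.Random(42)  # fixed seed for reproducible runs
profile = sample_client(rng, num_servers=5)
```

Passing an explicit `random.Random` instance keeps the generated network reproducible, which is convenient when the same synthetic traffic has to be regenerated for a later analysis.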

The module configures ns-3 to generate traffic according to this infrastructure. Furthermore, the module aggregates the data produced by ns-3. The result is a single PCAP file containing the generated network traffic. The packet payloads are empty, i.e. they consist of null bytes.
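How the aggregation is implemented is not described here. As one possible approach, the sketch below merges several capture files into a single file ordered by timestamp. It assumes classic little-endian PCAP files with microsecond timestamps, identical link types and records that are already time-ordered within each input; the function names are hypothetical.

```python
import heapq
import struct

# Classic PCAP layout (little-endian, microsecond timestamps assumed).
GLOBAL_HDR = struct.Struct("<IHHiIII")  # magic, version, zone, sigfigs, snaplen, linktype
RECORD_HDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len, orig_len
PCAP_MAGIC = 0xA1B2C3D4


def read_records(path):
    """Yield ((ts_sec, ts_usec), raw_record) for every packet in `path`."""
    with open(path, "rb") as fh:
        magic = GLOBAL_HDR.unpack(fh.read(GLOBAL_HDR.size))[0]
        if magic != PCAP_MAGIC:
            raise ValueError(f"unsupported capture format: {path}")
        while True:
            header = fh.read(RECORD_HDR.size)
            if len(header) < RECORD_HDR.size:
                break  # end of file
            ts_sec, ts_usec, incl_len, _orig_len = RECORD_HDR.unpack(header)
            yield (ts_sec, ts_usec), header + fh.read(incl_len)


def merge_pcaps(paths, out_path):
    """Write all packets from `paths` to `out_path`, ordered by timestamp."""
    # Reuse the global header of the first input for the merged file.
    with open(paths[0], "rb") as fh:
        global_header = fh.read(GLOBAL_HDR.size)

    streams = [read_records(p) for p in paths]
    with open(out_path, "wb") as out:
        out.write(global_header)
        # heapq.merge lazily interleaves the already-sorted input streams.
        for _timestamp, record in heapq.merge(*streams, key=lambda item: item[0]):
            out.write(record)
```

Because the records are streamed rather than loaded into memory, this approach stays practical for large simulated captures.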

This module is not essential to the proof-of-concept, but it helps to generate internal network traffic, with the option of tuning the parameters to alter the network configuration. This may be useful for the analysis carried out in the following modules.
