
ML is positioned as an effective DDoS detection and mitigation tool, owing to its ability to learn from observed traffic and make predictions on previously unseen data. Many ML-based approaches have been tested, both supervised and unsupervised. Supervised approaches are the most prevalent in recent DDoS detection systems, leveraging statistical features of flows to identify anomalous traffic on the network.

Chen (2009) investigated a statistical model, where the inbound Synchronise Arrival Rate (SAR) was compared to the normal distribution of traffic flows originating on the Internet toward their campus network.

Feature         Description
N_ICMP          Percentage of ICMP packets
N_UDP           Percentage of UDP packets
N_TCP           Percentage of TCP packets
N_TCPSYN        Percentage of SYN flags in TCP packets
N_TCPSYNACK     Percentage of SYN+ACK flags in TCP packets
N_TCPACK        Percentage of ACK flags in TCP packets
A_AVGHEADER     The average size of packet headers
A_AVGDATASIZE   The average size of data packets

Table 5.1: Features extracted from flows between time intervals, used to construct the neural network input data in the method proposed by Jalili et al. (2005).

The method identified an attack using a two-step process. First, the current SAR was compared with the normal, expected SAR distribution. If no significant difference was noted, the ratio of Synchronize (SYN) to ACK packets was checked against the normal distribution. Chen (2009) stated that the method incurred low computational overhead as it simply counted the SYN and ACK packets received, without storing or tracking the entire three-way handshake. The efficacy of the method was tested and validated through experimental simulations, where results against a private data set indicated low false positive and false negative rates with short detection times. According to Chen (2009), the method was capable of detecting even subtle attacks, because these still cause the SAR and SYN-to-ACK ratio to deviate from the norm. While Chen (2009) did not provide a summary of the accuracy achieved, the published results showed reduced computational complexity compared to two similar methods: the SYN arrival method, where a heuristic threshold is configured for SYN packet counts over set intervals, and the SYN-FIN method, where TCP flags for the entire flow are monitored. The ability of the method to detect both high and low bandwidth DDoS attacks makes it suitable for identifying TCP Connection Attacks (Section 5.2.1.1), which is critical as TCP is the most important traffic on the Internet, targeted by 90 percent of DDoS attacks (Moore et al., 2006).
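The two-step check attributed to Chen (2009) can be illustrated with a minimal Python sketch. It is not the author's implementation: the `NormalProfile` class, the z-score test, the threshold of three standard deviations and all numeric values are assumptions made for illustration. The sketch only mirrors the idea of counting SYN and ACK packets per interval and comparing two derived statistics against a learned normal profile.

```python
from dataclasses import dataclass


@dataclass
class NormalProfile:
    """Mean and standard deviation learned from attack-free traffic (hypothetical)."""
    mean: float
    std: float

    def is_anomalous(self, value: float, z_threshold: float = 3.0) -> bool:
        # Flag values more than z_threshold standard deviations from the mean.
        return abs(value - self.mean) > z_threshold * self.std


def detect_attack(syn_count: int, ack_count: int, interval_s: float,
                  sar_profile: NormalProfile, ratio_profile: NormalProfile) -> bool:
    """Step 1: test the SYN arrival rate. Step 2: test the SYN-to-ACK ratio."""
    sar = syn_count / interval_s
    if sar_profile.is_anomalous(sar):
        return True
    # Step 2 only runs when the arrival rate itself looks normal.
    ratio = syn_count / max(ack_count, 1)
    return ratio_profile.is_anomalous(ratio)
```

Only two counters per interval are needed, which is consistent with the low overhead described above; a low-rate attack that leaves the SAR within its normal range can still be caught by the ratio test in the second step.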

Feature   Description
Time      Time the connection was first observed (start time)
SrcIP     Source IP address of the flow
DstIP     Destination IP address of the flow
SrcPort   Source port of the flow
DstPort   Destination port of the flow
Protocol  IP protocol of the flow

Table 5.2: Features extracted from the public, anonymous data sets used by the DDoS detection method proposed by Bhaya and Manaa (2014).

Jalili et al. (2005) trained supervised Artificial Neural Networks (ANN) to detect flooding attacks, where an attacker targets a network with semi-normal packets. Semi-normal packets are correctly formed data packets exchanged within a flow that occur at an abnormal rate or contain an abnormal payload. The statistical features a flow exhibits change under DDoS attack, deviating from the normal distribution of features (Jalili et al., 2005). Jalili et al. (2005) also stated that these features are time-based and may exhibit divergent profiles at different times, such as day and night. For this reason, they divided the provided normal and attack traffic samples into short time intervals, forming temporal data sets. These data sets included all packets and their associated timestamps. The statistical features extracted from these traces form the training data for a pre-processor neural network. The authors implemented this ANN for DDoS detection in conjunction with their Unsupervised Neural Network based Intrusion Detector (UNNID) method (Amini and Jalili, 2004), a method that implements Adaptive Resonance Theory (ART) (Grossberg, 2013). Combined, these methods form the Statistical Pre-Processor and Unsupervised Neural Network-based Intrusion Detector (SPUNNID) method. The features used by Jalili et al. (2005) to describe flows at each interval are listed in Table 5.1. To evaluate the method, the authors recorded real network traffic, which they replayed in a simulated environment while DDoS traffic was introduced by a traffic generator. Evaluation results revealed that, with a time interval not exceeding two minutes, the method was able to differentiate between normal and attack traffic in 94.9 percent of cases.
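The interval-based feature extraction described for Jalili et al. (2005) can be sketched in Python as follows. The `Packet` record, the function names and the dictionary keys are hypothetical. The sketch also assumes that "percentage of SYN flags" means the share of TCP packets carrying exactly that flag combination, which the source does not specify.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    timestamp: float      # seconds since the start of the trace
    protocol: str         # "TCP", "UDP" or "ICMP"
    flags: frozenset      # TCP flags, e.g. frozenset({"SYN", "ACK"})
    header_len: int       # bytes
    payload_len: int      # bytes


def split_into_intervals(packets, interval_s):
    """Group packets into consecutive windows of interval_s seconds."""
    windows = defaultdict(list)
    for pkt in packets:
        windows[int(pkt.timestamp // interval_s)].append(pkt)
    return [windows[k] for k in sorted(windows)]


def pct(part, whole):
    """Percentage of part in whole, or 0.0 for an empty whole."""
    return 100.0 * part / whole if whole else 0.0


def extract_features(window):
    """Compute one feature vector (named after Table 5.1) for a window."""
    n = len(window)
    tcp = [p for p in window if p.protocol == "TCP"]
    return {
        "N_ICMP": pct(sum(p.protocol == "ICMP" for p in window), n),
        "N_UDP": pct(sum(p.protocol == "UDP" for p in window), n),
        "N_TCP": pct(len(tcp), n),
        "N_TCPSYN": pct(sum(p.flags == {"SYN"} for p in tcp), len(tcp)),
        "N_TCPSYNACK": pct(sum(p.flags == {"SYN", "ACK"} for p in tcp), len(tcp)),
        "N_TCPACK": pct(sum(p.flags == {"ACK"} for p in tcp), len(tcp)),
        "A_AVGHEADER": sum(p.header_len for p in window) / n if n else 0.0,
        "A_AVGDATASIZE": sum(p.payload_len for p in window) / n if n else 0.0,
    }
```

Calling `extract_features` on each window returned by `split_into_intervals(packets, 120)` would yield one input vector per two-minute interval, the upper bound on interval length reported in the evaluation.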

Bhaya and Manaa (2014) asserted that data mining techniques (Witten and Frank, 2005) were capable of successfully distinguishing normal traffic from attack traffic with good accuracy. The authors proposed a hybrid approach for detecting and analysing DDoS attacks in real world traffic, tested using the CAIDA UCSD “DDoS Attack 2007” data set (http://www.caida.org/data/passive/ddos-20070804_dataset.xml) for attack traffic and traces from the CAIDA Anonymized Internet Traces 2008 data set (http://www.caida.org/data/passive/passive_2008_dataset.xml) for normal traffic. The data sets contained anonymised flow traces, where the payload of each packet had been removed. The method proposed by Bhaya and Manaa (2014) used attributes inferred from IP packet headers only. These attributes are described in Table 5.2. The authors extracted 2,000,000 packets from each data set and, using Shannon’s Entropy (Shannon, 2001) and min-max normalisation, transformed the data to suitable input vectors for their k-means clustering algorithm.
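The two preprocessing steps named above, Shannon entropy and min-max normalisation, are standard and can be sketched in a few lines of Python. How Bhaya and Manaa (2014) windowed the packets and which attributes they applied entropy to is not stated here, so the function names and the idea of passing in a list of attribute values (for example, the source IP addresses seen in one window) are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """H = -sum(p_i * log2(p_i)) over the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def min_max_normalise(xs):
    """Rescale xs linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]
```

Entropy is low when a few values dominate and high when values are spread evenly, and normalising puts every feature on the same [0, 1] scale, which matters for a distance-based algorithm such as k-means.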

                       Prediction outcome
                       p                n                total
actual value   p′      True Positive    False Negative   P′
               n′      False Positive   True Negative    N′
               total   P                N

Table 5.3: A confusion matrix for DDoS detection, where True Positive denotes the number of packets correctly identified as malicious, False Positive the number of normal packets incorrectly identified as malicious (attack), False Negative the number of attack packets incorrectly identified as normal traffic, and True Negative the number of normal packets correctly identified.

Bhaya and Manaa (2014) evaluated the success of their method using a confusion matrix (Stehman, 1997) (Table 5.3), where success was determined by Equations 5.1, 5.2 and 5.3.

\[ \text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5.1} \]

\[ \text{detection rate} = \frac{TP}{TP + FP} \tag{5.2} \]

\[ \text{false alarm} = \frac{FP}{FP + TN} \tag{5.3} \]
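Equations 5.1 to 5.3 reduce to simple arithmetic on the four confusion-matrix counts. The Python sketch below follows the equations exactly as written in this section; the function names and the example counts are hypothetical and are not results from Bhaya and Manaa (2014).

```python
def accuracy(tp, tn, fp, fn):
    """Equation 5.1: share of all packets that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def detection_rate(tp, fp):
    """Equation 5.2 as written in this section: TP / (TP + FP)."""
    return tp / (tp + fp)


def false_alarm(fp, tn):
    """Equation 5.3: share of normal packets flagged as attack."""
    return fp / (fp + tn)


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 90, 95, 5, 10
print(accuracy(tp, tn, fp, fn))   # 185 / 200
print(detection_rate(tp, fp))     # 90 / 95
print(false_alarm(fp, tn))        # 5 / 100
```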

The results published by Bhaya and Manaa (2014) show their centroid-based rules method achieved better results than a simple centroid-based method (Oh and Lee, 2003). Testing showed that an accuracy of 99.77 percent was possible using the centroid-based rules method on the CAIDA data set, whereas the comparable centroid-based method achieved 97.67 percent.

Yassin et al. (2013) proposed a hybrid machine-learning approach to anomaly detection on networks using K-means Clustering (KMC) and a Naïve Bayes Classifier (NBC), called “KMC+NBC”. The authors argued that NBC alone produced a high false alarm rate, whereas, when combined with KMC, the same accuracy and detection rates were noted with a significantly reduced false alarm rate. The authors evaluated their method using the ISCX 2012 benchmark data set, which contains over 1,512,000 packets and describes network traffic over a seven-day period using 20 distinct attributes (Table 5.4). More information about this data set is available in Shiravi et al. (2012). Yassin et al. (2013) showed that, with the KMC+NBC method, 99 percent accuracy and a 98.8 percent detection rate were achievable, compared to NBC alone, which scored 88.2 percent and 85.0 percent respectively.

Juma et al. (2014) extended the work of Yassin et al. (2013), replacing the K-means and NBC algorithms with X-means clustering (Pelleg et al., 2000) and a Random Forest Classifier (Ho, 1995) respectively. Juma et al. (2014) stated that this configuration was better suited to reducing false alarms than similar methods. The method was also evaluated using the ISCX 2012 IDS data set, where average accuracy reached 99.96 percent, with detection rates of 99.99 percent and false alarm rates of 0.02 percent. Yassin et al. (2013) and Juma et al. (2014) demonstrated that using clustering algorithms such as K-means and X-means as first-phase classifiers greatly enhanced the accuracy and detection rates for anomalous traffic in IP flow trace data sets, while reducing the number of false positives.
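The general idea of a clustering phase followed by a supervised classifier can be illustrated with a small, self-contained Python sketch. It is not the KMC+NBC pipeline of Yassin et al. (2013): the source does not say how the cluster output is passed to the classifier, so appending the cluster index as an extra feature is an assumption, as are the class and function names, the toy data and the simplified centroid seeding.

```python
import math
from collections import defaultdict


def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, members in groups.items():
            centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes, written out for illustration."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small constant keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            prior, means, variances = self.stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(row, means, variances))
        return max(self.stats, key=log_posterior)


def fit_hybrid(X, y, k=2):
    """Phase 1: cluster. Phase 2: classify rows augmented with their cluster index."""
    centroids = kmeans(X, k)
    augmented = [list(row) + [nearest(row, centroids)] for row in X]
    return centroids, TinyGaussianNB().fit(augmented, y)


def predict_hybrid(model, row):
    centroids, nb = model
    return nb.predict(list(row) + [nearest(row, centroids)])


# Toy two-feature flows, for illustration only.
X = [(1.0, 1.0), (10.0, 10.0), (1.2, 0.9), (9.8, 10.1), (0.9, 1.1), (10.2, 9.9)]
y = ["normal", "attack", "normal", "attack", "normal", "attack"]
model = fit_hybrid(X, y)
print(predict_hybrid(model, (1.1, 1.0)), predict_hybrid(model, (9.9, 10.0)))
```

The clustering phase groups similar flows before any labels are used, and the classifier then works on data that already carries that grouping, which is the structure the two studies above credit for the lower false alarm rates.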

The APIC method adopts a similar architecture, where an EA guides a clustering algorithm using descriptive feature sets, prior to producing unique classifiers to identify future instances of each cluster. The following section describes experiments where the tests devised by Yassin et al. (2013) and Juma et al. (2014) were implemented using the APIC method.

Feature                     Description                                        Data Type
appName                     The name of the application protocol               String
totalSourceBytes            Total bytes from the source                        Integer
totalDestinationBytes       Total bytes from the destination                   Integer
totalSourcePackets          Total packets from the source                      Integer
totalDestinationPackets     Total packets from the destination                 Integer
sourcePayloadAsBase64       First source packet payload (Base64 Encoded)       String
sourcePayloadAsUTF          First source packet payload (UTF Encoded)          String
destinationPayloadAsBase64  First destination packet payload (Base64 Encoded)  String
destinationPayloadAsUTF     First destination packet payload (UTF Encoded)     String
direction                   Direction of flow                                  String
sourceTCPFlags              TCP flags observed in source packets               String
destinationTCPFlags         TCP flags observed in destination packets          String
source                      Source IP address                                  String
protocolName                Transport layer protocol                           String
sourcePort                  Source port of the flow                            Integer
destination                 Destination IP address                             String
destinationPort             Destination port of the flow                       Integer
startDateTime               Start time of the flow                             Integer
endDateTime                 End time of the flow                               Integer
tag                         Indicator of “normal” or “attack” traffic          String

Table 5.4: Features included in the ISCX 2012 IDS labelled data set.