The anomaly IDS is trained for five weeks to learn the normal network traffic of IIT Kharagpur. The model describes the target network with a vector of 25 network attributes. The IDS is also trained for more than three weeks to learn the network behaviour under intrusion. The intrusions are simulated in the network using the MIT-DARPA 1999 data set. The training data contains a total of 4396 vector data points for normal traffic and 2120 vector data points for intrusive traffic. The training period covers different types of days (working days, Saturdays and non-working days). The network profile generated from the training data contains 168 vector data points, one for each hour of the week (24 hours × 7 days). The same training and test data are used for all three techniques discussed earlier.
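The hour-of-week profile described above can be sketched as follows. This is a minimal sketch under stated assumptions: the aggregation rule is not given here, so each of the 168 slots is assumed to hold the column-wise mean of the training vectors observed in that hour, and slots are assumed to run from Monday 00:00 (slot 0) to Sunday 23:00 (slot 167). The names `slot_index` and `build_profile` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def slot_index(ts: datetime) -> int:
    """Hour-of-week slot in 0..167 (assumed numbering: Monday 00:00 -> 0)."""
    return ts.weekday() * 24 + ts.hour


def build_profile(samples):
    """Average the attribute vectors observed in each hour-of-week slot.

    samples: iterable of (timestamp, vector) pairs; all vectors share one
    length (25 attributes in the thesis). Per-slot averaging is an
    assumption of this sketch. Returns {slot: mean_vector}.
    """
    buckets = defaultdict(list)
    for ts, vec in samples:
        buckets[slot_index(ts)].append(vec)
    # Column-wise mean of all vectors that fell into the same slot
    return {slot: [mean(col) for col in zip(*vecs)]
            for slot, vecs in buckets.items()}


# Two Monday 10:xx observations from different weeks share slot 10
# (hypothetical values, not thesis data)
samples = [(datetime(2008, 1, 7, 10, 15), [2.0, 4.0]),
           (datetime(2008, 1, 14, 10, 40), [4.0, 8.0])]
profile = build_profile(samples)
print(profile)  # {10: [3.0, 6.0]}
```

Keying on the hour of the week rather than the hour of the day lets working days, Saturdays and non-working days keep separate baselines.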
About MIT-DARPA IDS Evaluation
In 1998, the Information Systems Technology Group of Lincoln Laboratory at MIT, in conjunction with the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA), began work on a standard for evaluating network IDSs. Developing this evaluation required consistent and repeatable network traffic. The traffic was modelled on a study of four months of data from Hanscom Air Force Base and approximately 50 other bases. Using that data, the group generated and simulated network traffic while introducing attacks, probes and intrusions into it. Both training and test data were simulated, and two types of traffic were published: training data, in which the attacks are labelled from the start, and a second set in which the attacks are not explicitly described. The Week 1 and Week 3 data sets contain attack-free traffic, while Week 2 contains training data with attacks. Weeks 4 and 5 are the test data, containing network attacks embedded in normal background traffic. The test data sets contain four categories of simulated attacks:
DoS: denial of service (e.g. SYN flood)
R2L: unauthorized access from a remote machine (e.g. password guessing)
U2R: unauthorized access to superuser or root functions (e.g. buffer overflow attacks)
Probing: surveillance and other probing for vulnerabilities (e.g. port scanning)
A more complete discussion is available on the MIT Lincoln Laboratory site [22].
Table 1 gives the values obtained from Hotelling's multivariate statistic and the Bayesian classifier for normal and intrusive network traffic.
Sample | Hotelling's Statistic: Normal | Hotelling's Statistic: Intrusive | Bayesian Classifier: Normal | Bayesian Classifier: Intrusive
1 | 7.74E+09 | 1.32E+17 | 3.07E+08 | 6.59E+16
2 | 7.60E+08 | 9.07E+16 | 1.48E+07 | 4.54E+16
3 | 5.60E+08 | 6.26E+16 | 1.32E+07 | 3.13E+16
4 | 4.49E+08 | 6.05E+16 | 1.07E+07 | 3.02E+16
5 | 1.59E+08 | 4.35E+16 | 1.04E+07 | 2.18E+16
6 | 8.84E+07 | 2.97E+16 | 1.03E+07 | 1.48E+16
7 | 5.10E+07 | 2.60E+16 | 6.70E+06 | 1.30E+16
8 | 4.50E+07 | 2.37E+16 | 6.52E+06 | 1.19E+16
9 | 2.95E+07 | 1.95E+16 | 2.88E+06 | 9.77E+15
10 | 2.46E+07 | 1.57E+16 | 2.74E+06 | 7.85E+15
11 | 2.09E+07 | 1.09E+16 | 1.71E+06 | 5.44E+15
12 | 1.93E+07 | 9.58E+15 | 2.16E+05 | 4.79E+15
13 | 1.36E+07 | 9.34E+15 | 2.60E+05 | 4.67E+15
14 | 1.34E+07 | 6.34E+15 | 7.19E+05 | 3.17E+15
15 | 1.17E+07 | 5.19E+15 | 1.29E+06 | 2.59E+15
16 | 8.36E+06 | 5.12E+15 | 1.40E+06 | 2.56E+15
17 | 7.88E+06 | 3.79E+15 | 1.41E+06 | 1.89E+15
18 | 6.27E+06 | 2.64E+15 | 1.59E+06 | 1.32E+15
19 | 5.67E+06 | 2.29E+15 | 1.63E+06 | 1.15E+15
20 | 4.85E+06 | 2.28E+15 | 2.42E+06 | 1.14E+15
21 | 3.26E+06 | 3.32E+14 | 2.84E+06 | 1.66E+14
22 | 3.18E+06 | 2.67E+14 | 3.13E+06 | 1.34E+14
23 | 2.82E+06 | 2.67E+14 | 3.94E+06 | 1.33E+14
24 | 2.80E+06 | 2.12E+14 | 4.18E+06 | 1.06E+14
25 | 2.59E+06 | 1.65E+14 | 5.85E+06 | 8.25E+13
26 | 1.44E+06 | 1.08E+14 | 6.70E+06 | 5.39E+13
27 | 5.20E+05 | 7.73E+13 | 6.82E+06 | 3.87E+13
Table 1: Typical values obtained for normal and intrusive network traffic with the Hotelling's and Bayesian discriminant functions
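The Hotelling's values in Table 1 measure how far an observation lies from the learned normal profile. A minimal sketch of the standard statistic, T² = (x − x̄)ᵀ S⁻¹ (x − x̄), where x̄ and S are the mean vector and covariance matrix of the normal training vectors, follows. It assumes NumPy, the helper names are hypothetical, and it does not reproduce the thesis's own attribute scaling, which drives the magnitudes in Table 1.

```python
import numpy as np


def fit_normal_profile(train):
    """Mean vector and sample covariance of normal-traffic vectors.

    train: array-like of shape (n_samples, n_attributes), e.g. 25 attributes.
    """
    data = np.asarray(train, dtype=float)
    mean_vec = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # unbiased estimate (n - 1 denominator)
    return mean_vec, cov


def hotelling_t2(x, mean_vec, cov):
    """Hotelling's T^2 = (x - mean)^T S^-1 (x - mean) for one observation."""
    d = np.asarray(x, dtype=float) - mean_vec
    # Solve S z = d instead of forming S^-1 explicitly (numerically safer)
    z = np.linalg.solve(cov, d)
    return float(d @ z)


# Toy two-attribute training set (hypothetical data, not from the thesis)
train = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
mu, S = fit_normal_profile(train)
print(hotelling_t2([2.0, 0.0], mu, S))
```

With 25 attributes, some of which may be constant in real traffic, S can be singular. Replacing the solve with `np.linalg.pinv(cov) @ d` is one common workaround.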
By manually analysing a large set of values obtained with the Hotelling's and Bayesian discriminators, it was found that the following ranges best separate normal activities from intrusive ones.
Hotelling's Technique: On average, the values for normal activities lie between 1.00E+06 and 5.00E+07, while the values for intrusive activities lie above 9.00E+07.
Bayesian Technique: On average, the values for normal activities lie between
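The Hotelling's threshold ranges stated above can be turned into a simple labelling rule. This sketch makes two assumptions that the text does not state: values below the normal range, which lie even closer to the profile, are also treated as normal, and values in the gap between 5.00E+07 and 9.00E+07 are reported as "uncertain". The name `label_hotelling` is hypothetical.

```python
# Empirical Hotelling's T^2 ranges reported in the text
NORMAL_UPPER = 5.00e07     # normal traffic: roughly 1.00E+06 to 5.00E+07
INTRUSIVE_LOWER = 9.00e07  # intrusive traffic: above 9.00E+07


def label_hotelling(t2: float) -> str:
    """Label one Hotelling's T^2 value using the empirical ranges.

    Assumptions: values below the normal range are also labelled normal,
    and values between the two ranges are labelled "uncertain" rather
    than forced into either class.
    """
    if t2 > INTRUSIVE_LOWER:
        return "intrusive"
    if t2 <= NORMAL_UPPER:
        return "normal"
    return "uncertain"


print(label_hotelling(2.46e7))   # normal     (Table 1, row 10, normal)
print(label_hotelling(1.57e16))  # intrusive  (Table 1, row 10, intrusive)
print(label_hotelling(7.0e7))    # uncertain  (falls in the gap)
```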
4.3. Comparative Results
Attack Name | Tools/Data Set Used | Count | Detected: Probabilistic (Bayesian Classifier) | Detected: Statistical (Hotelling's Hypothesis) | Detected: Statistical (Mean ± 2*SD)
ping flood | ping tool | 15 | 15 | 15 | 15
DoS attack | ddos open source tool | 5 | 5 | 5 | 5
TCP RST attack | neti open source code | 5 | 5 | 5 | 5
TCP SYN flood attack | neti open source code | 7 | 7 | 7 | 6
UDP attack | neti open source code | 10 | 10 | 10 | 10
Xmas scan | nmap tool | 5 | 5 | 4 | 4
NTinfoscan | MIT-DARPA 1999 data set | 1 | 0 | 0 | 0
pod | MIT-DARPA 1999 data set | 2 | 2 | 2 | 2
back | MIT-DARPA 1999 data set | 2 | 0 | 0 | 0
httptunnel | MIT-DARPA 1999 data set | 2 | 0 | 0 | 0
land | MIT-DARPA 1999 data set | 2 | 2 | 2 | 2
secret | MIT-DARPA 1999 data set | 3 | 0 | 0 | 0
portsweep | MIT-DARPA 1999 data set | 3 | 3 | 3 | 2
eject | MIT-DARPA 1999 data set | 3 | 0 | 0 | 0
mailbomb | MIT-DARPA 1999 data set | 2 | 2 | 2 | 2
ipsweep | MIT-DARPA 1999 data set | 3 | 3 | 2 | 2
satan | MIT-DARPA 1999 data set | 2 | 1 | 1 | 1
neptune | MIT-DARPA 1999 data set | 2 | 2 | 2 | 2
Total | | 74 | 62 | 60 | 58
Detection Accuracy (%) | | | 83.78 | 81.08 | 78.38
Total Alerts Generated | | | 65 | 64 | 67
No. of Attacks Missed | | | 12 | 16 | 20
False Negative Rate (%) | | | 16.22 | 21.62 | 27.03
Positive Prediction Rate (%) | | | 95.40 | 90.63 | 78.30
Table 2: Comparative results of the experiments
Table 3 shows the results obtained by Daniel Barbará et al. using pseudo-Bayes estimators [18].
Table 3: Experimental results on the MIT-LL DARPA 1999 data set
Source: http://www.cs.ubc.ca/local/reading/proceedings/siam_datamining2001/pdf/sdm01_29.pdf
4.4. Discussion
The experiments show that, of the three techniques discussed in this project, the Bayesian classification method gives the best detection rate and the fewest false positives. The Bayesian method achieves a detection accuracy of ≈84% with a false positive rate of 4.6%. Hotelling's statistical method gives a hit rate of ≈81% at a false positive rate of 6.2%, while the statistical moments (mean and standard deviation) model yields a hit rate of ≈78% with a false positive rate of 13%. A comparison with previous work also indicates that the Bayesian approach is the superior technique.
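The detection accuracy and false positive figures quoted above can be reproduced from the counts in Table 2. In this minimal sketch, detection accuracy is the number of detected attacks divided by the 74 injected attacks, and the false positive rate is assumed to be the fraction of alerts that did not correspond to a detected attack. The formula is not stated in the text, but it gives 4.62%, 6.25% and 13.43%, close to the reported 4.6%, 6.2% and 13%. `RESULTS` and `metrics` are hypothetical names.

```python
# Counts from Table 2: attacks detected and alerts raised per technique
TOTAL_ATTACKS = 74
RESULTS = {
    "Bayesian classifier": {"detected": 62, "alerts": 65},
    "Hotelling's T^2": {"detected": 60, "alerts": 64},
    "Mean +/- 2 SD": {"detected": 58, "alerts": 67},
}


def metrics(detected: int, alerts: int, total: int = TOTAL_ATTACKS):
    """Return (detection accuracy %, false positive rate %), rounded to 2 dp.

    Assumption: every alert that did not match a detected attack is a
    false positive, so FP rate = (alerts - detected) / alerts.
    """
    accuracy = 100.0 * detected / total
    fp_rate = 100.0 * (alerts - detected) / alerts
    return round(accuracy, 2), round(fp_rate, 2)


for name, r in RESULTS.items():
    acc, fpr = metrics(r["detected"], r["alerts"])
    print(f"{name}: accuracy {acc:.2f}%, FP rate {fpr:.2f}%")
```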
In summary, the results show that the approach followed in this thesis is effective and efficient for detecting network-based attacks. It is also observed that the multivariate statistical techniques are more effective than the univariate technique; in particular, the Bayesian technique shows promising potential for future IDS research.
CHAPTER 5
5.1. Conclusion
Network intrusion detection systems have a major role to play in safeguarding network resources against various kinds of attacks. With the advent of new vulnerabilities and increasingly sophisticated attacks, new intrusion detection techniques have evolved. The main objective of such research is to increase detection accuracy while keeping the false positive rate low.
As stated earlier, signature-based techniques are effective but have obvious shortcomings, such as the failure to detect novel attacks and an ever-growing signature database. A viable alternative is to analyse the behaviour of the network as a whole and build a model from the observations. Anomaly-based detection has therefore attracted wide interest from researchers, since it provides a baseline for developing promising techniques.
Anomaly-based detection complements signature-based detection and helps identify novel attacks that cause anomalies in network traffic. The major concerns with this method are identifying appropriate network features to characterize the network and build a behavioural model, and the false positive rate, which may increase sharply if the IDS is not trained sufficiently on the target network.
This project discussed the design and development of an "Anomaly based Intrusion Detection System" built on top of an existing open source, signature-based network IDS, Snort, so that both analysis techniques are available in a single package.
The anomaly-based component of the IDS is trained at the Computer and Informatics Centre of the Indian Institute of Technology (IIT), Kharagpur, where IIT network traffic is sniffed using a port-mirrored switch at the gateway. The IDS is trained on the IIT network for more than a month to learn the normal traffic pattern. It is also exposed to intrusive traffic for more than three weeks in a simulated environment, by replaying the MIT-DARPA 1999 intrusion detection training data sets.
The thesis presented three techniques for detecting anomaly-based intrusions at the network level. Statistics-based anomaly detection techniques use statistical properties and statistical tests to determine whether the "observed behaviour" deviates significantly from the "expected behaviour". The first technique is based on a univariate statistical model using the mean and variance. The second uses Hotelling's multivariate method, while the third uses Bayesian classification to discriminate attacks from normal activity.
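The first, univariate technique checks each attribute independently against the range mean ± k·SD learned from normal traffic, with k = 2 as in the "Mean ± 2*SD" column of Table 2. In this minimal sketch, flagging an observation whenever any attribute falls outside its range is an assumption, and the names `fit_univariate` and `out_of_range` are hypothetical.

```python
from statistics import mean, stdev


def fit_univariate(train):
    """Per-attribute (mean, sample standard deviation) from normal rows."""
    return [(mean(col), stdev(col)) for col in zip(*train)]


def out_of_range(x, params, k=2.0):
    """Indices of attributes lying outside mean +/- k*SD (k = 2 in Table 2).

    Assumption of this sketch: the observation is treated as anomalous
    whenever the returned list is non-empty.
    """
    return [i for i, (value, (m, s)) in enumerate(zip(x, params))
            if abs(value - m) > k * s]


# Hypothetical two-attribute training data (not thesis data)
params = fit_univariate([[10, 1], [12, 3], [14, 5]])
print(out_of_range([17, 4], params))  # [0]: only the first attribute deviates
```

Because each attribute is tested on its own, correlations between attributes are ignored. The multivariate Hotelling's and Bayesian techniques take those correlations into account.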
All three techniques were evaluated with the DARPA 1999 IDS evaluation data sets and their results compared. The Bayesian approach proved to be a better solution than Hotelling's multivariate technique and the method of statistical moments.
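The two-class Bayesian decision that performed best can be illustrated with a Gaussian naive Bayes classifier. This is a swapped-in technique chosen for illustration and is not necessarily the author's exact discriminant. Class priors are set from the 4396 normal and 2120 intrusive training vectors mentioned earlier. The names `fit_class`, `log_likelihood` and `classify` are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_class(rows, var_floor=1e-9):
    """Per-attribute Gaussian (mean, variance) for one class of vectors."""
    # The floor keeps constant attributes from causing division by zero
    return [(mean(col), max(pvariance(col), var_floor)) for col in zip(*rows)]


def log_likelihood(x, params):
    """Sum of per-attribute Gaussian log densities (naive independence)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
               for v, (m, var) in zip(x, params))


def classify(x, normal_params, intrusive_params,
             n_normal=4396, n_intrusive=2120):
    """Return the class with the larger log posterior.

    Priors default to the training counts quoted in the text.
    """
    total = n_normal + n_intrusive
    score_normal = math.log(n_normal / total) + log_likelihood(x, normal_params)
    score_intrusive = (math.log(n_intrusive / total)
                       + log_likelihood(x, intrusive_params))
    return "normal" if score_normal >= score_intrusive else "intrusive"


# Hypothetical two-attribute training data for illustration only
normal = fit_class([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]])
intrusive = fit_class([[10.0, 20.0], [11.0, 19.0], [9.0, 21.0]])
print(classify([1.1, 2.0], normal, intrusive))    # normal
print(classify([10.5, 20.5], normal, intrusive))  # intrusive
```

Working in log space avoids the underflow that multiplying 25 small densities would cause.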
At present, the work only identifies events and classifies them into normal and attack classes. It can be extended to detect attacks and classify them into multiple attack classes. Dynamic updating of the anomaly model using Bayesian networks can also be considered as a future enhancement. Other analysis techniques, such as hidden Markov models (HMMs) and fuzzy logic, could also be tried as alternatives for anomaly detection.
BIBLIOGRAPHY
[1]. R. Coolen, “Intrusion Detection: Generics and State of the Art”, RTO Technical Report 49, http://www.tno.nl/instit/fel/div2/resources/rto-tr-049-ids.pdf
[2]. J. P. Anderson, “Computer Security Threat Monitoring and Surveillance”, Technical Report, April 1980, http://csrc.nist.gov/publications/history/ande80.pdf
[3]. Martin Roesch, “Snort Documents”, http://www.snort.org/docs/
[4]. Net Optics, Inc. “White Paper: Deploying Network Taps with Intrusion Detection Systems”,
http://www.netoptics.com/products/downloads.asp?PageID=150&Section=res
[5]. Jack Koziol, “Intrusion Detection with Snort”, Pearson Publications, 2003
[6]. Basic Analysis and Security Engine project, http://base.secureideas.net/
[7]. White papers on “Basic Analysis and Security Engine” (BASE),
http://whitepapers.techrepublic.com.com/abstract.aspx?docid=266711
[8]. Q. Zhao, J. Sun, S. Zhang, “A hybrid and hierarchical NIDS paradigm utilizing naïve Bayes
classifier”, Canadian conference on Electrical and Computer Engineering, 2004,
http://ieeexplore.ieee.org/iel5/9317/29618/01344977.pdf?tp=&isnumber=&arnumber=1344977
[9]. Javitz HS, Valdes A. “The NIDES Statistical Component: Description and Justification”, Technical Report A010, SRI International, Menlo Park, CA, March 1994.
http://www.cs.ucdavis.edu/~wu/ecs236/papers/hw2_NIDES-STA-description.pdf
[10]. Javitz HS, Valdes A. “The SRI statistical anomaly detector”, Proceedings of the 1991 IEEE Symposium on Research in Security and Privacy, May 1991
http://ieeexplore.ieee.org/iel2/349/3628/00130799.pdf?tp=&isnumber=&arnumber=130799
[11]. V. Paxson, “Bro: A System for Detecting Network Intruders in Real-Time”, Computer
Networks, 1999, http://bro-ids.org/publications.html
[12]. D. Barbará, S. Jajodia, N. Wu and B. Speegle, “The ADAM project”,
http://www.isse.gmu.edu/dbarbara/adam.html
[13]. Nong Ye and Qiang Chen, “An anomaly detection technique based on a chi-square statistic
for detecting intrusions into information systems”, Quality and Reliability Engineering
International, 17:105--112, 2001, http://citeseer.ist.psu.edu/ye01anomaly.html
[14]. Ye, N., Li, X., Chen, Q., Emran, S. M., and Xu, M. “Probabilistic Techniques for Intrusion
Detection Based on Computer Audit Data”, IEEE Transactions on Systems, Man and
Cybernetics, vol. 31(4), pp. 266-274, July 2001,
http://ieeexplore.ieee.org/iel5/3468/20237/00935043.pdf?tp=&isnumber=&arnumber=935043
[15]. A. Qayyum, M. H. Islam, and M. Jamil, “Taxonomy of Statistical Based Anomaly Detection
Techniques for Intrusion Detection”, IEEE International Conference on Emerging
Technologies, September 17-18, 2005
http://ieeexplore.ieee.org/iel5/10430/33125/01558893.pdf?tp=&isnumber=&arnumber=1558893
[16]. M. Mahoney and P. Chan, “PHAD: Packet header anomaly detection for identifying hostile
network traffic”, Technical Report CS-2001-4, Florida Tech., April
2001, http://citeseer.ist.psu.edu/mahoney01phad.html
[17]. M. Mahoney and P. Chan, “Learning models of network traffic for detecting novel attacks”, Technical Report, Florida Tech, 2002, http://cs.fit.edu/~mmahoney/paper5.pdf
[18]. D. Barbara, N. Wu and S. Jajodia, “Detecting Novel Network Intrusions using Bayes
Estimators”, Proceedings of the 1st SIAM International Conference on Data Mining,
2001, http://www.cs.ubc.ca/local/reading/proceedings/siam_datamining2001/pdf/sdm0129.pdf
[19]. Jack Koziol, “Intrusion Detection with Snort”, Pearson publications, 2003
[20]. R. Dan Reid and Nada R. Sanders, “Operations Management”, 3rd edition, Wiley, 2007
[21]. P. Cisar, S. M Cisar, “Quality Control in Function of Statistical Anomaly Detection in Intrusion
Detection Systems”, SISY 2006 - 4th Serbian-Hungarian Joint Symposium on Intelligent
Systems, www.bmf.hu/conferences/sisy2006/19_Cisar.pdf
[22]. DARPA Intrusion Detection Evaluation, Data Sets and Documentation, 1999
http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/docs/detections_1999.html
[23]. Giorgio Giacinto, Fabio Roli, Luca Didaci, “Fusion of multiple classifiers for intrusion
detection in computer networks”. Pattern Recognition Letters 24(12): 1795-1803 (2003)
http://www.diee.unica.it/informatica/en/publications/papers-prag/IDS-Journal-01.pdf
[24]. R. Puttini, Z. Marrakchi, and L. Me. “Bayesian Classification Model for Real Time Intrusion
Detection”, in 22nd International Workshop on Bayesian Inference and Maximum
Entropy Methods in Science and Engineering, 2002.