Support Vector Machine and Random Forest Modeling for Intrusion Detection System (IDS)

Hasan, Md. Al Mehedi and Nasser, Mohammed and Pal, Biprodip and Ahmad, Shamim (2014) Support Vector Machine and Random Forest Modeling for Intrusion Detection System (IDS). Journal of Intelligent Learning Systems and Applications, 06 (01). pp. 45-52. ISSN 2150-8402

Abstract

Building a successful Intrusion Detection System (IDS) is a complicated problem because of its nonlinearity and because network traffic data streams carry many quantitative and qualitative features. To address this problem, several types of intrusion detection methods have been proposed, with different levels of accuracy. The choice of an effective and robust method for IDS is therefore a very important topic in information security. In this work, we built two classification models: one based on Support Vector Machines (SVM) and the other on Random Forests (RF). Experimental results show that both classifiers are effective. SVM is slightly more accurate but more expensive in terms of time. RF produces similar accuracy much faster when its modeling parameters are given. These classifiers can contribute to an IDS as one source of analysis and increase its accuracy. In this paper, the KDD’99 dataset is used to find out which of the two is the better intrusion detector for this dataset. Statistical analysis of the KDD’99 dataset revealed important issues that strongly affect the performance of the evaluated systems and result in a very poor evaluation of anomaly detection approaches. The most important deficiency of the KDD’99 dataset is its huge number of redundant records. To solve these issues, we developed a new dataset, KDD99Train+ and KDD99Test+, which contains no redundant records in either the train set or the test set, so the classifiers will not be biased towards more frequent records. The numbers of records in the train and test sets are now reasonable, which makes it affordable to run the experiments on the complete set without the need to randomly select a small portion. The findings of this paper will help in using SVM and RF in a more meaningful way to maximize the performance rate and minimize the false negative rate.
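
The two ideas the abstract leans on, removing redundant records and measuring the false negative rate, can be illustrated with a minimal Python sketch. The abstract does not say how KDD99Train+ and KDD99Test+ were actually built, so the sketch assumes that "redundant" means an exact duplicate of an earlier record. The function names `drop_redundant` and `false_negative_rate`, the label `"normal"`, and the toy records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def drop_redundant(records):
    """Remove exact duplicate records, keeping first-seen order.

    Assumption: a record is redundant when every field, including the
    label, matches an earlier record.
    """
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def false_negative_rate(y_true, y_pred, normal_label="normal"):
    """Fraction of true attack records that were predicted as normal."""
    attacks = [(t, p) for t, p in zip(y_true, y_pred) if t != normal_label]
    if not attacks:
        return 0.0
    missed = sum(1 for _, p in attacks if p == normal_label)
    return missed / len(attacks)


# Toy records (duration, protocol, service, label); not real KDD'99 data.
train = [
    (0, "tcp", "http", "normal"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (2, "tcp", "ftp", "normal"),
]

train_unique = drop_redundant(train)

# Label counts before and after: the repeated attack record no longer
# dominates the training set once duplicates are removed.
before = Counter(rec[-1] for rec in train)
after = Counter(rec[-1] for rec in train_unique)

# Hypothetical predictions: one of three attack records is missed.
y_true = ["smurf", "normal", "smurf", "neptune"]
y_pred = ["smurf", "normal", "normal", "neptune"]
fnr = false_negative_rate(y_true, y_pred)
```

In this toy set the repeated attack record makes up three of five rows before deduplication and one of three afterwards, which is the kind of frequency bias the abstract describes.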

Item Type: Article
Subjects: STM Digital Library > Medical Science
Depositing User: Unnamed user with email support@stmdigitallib.com
Date Deposited: 09 Feb 2023 08:02
Last Modified: 01 Aug 2024 08:33
URI: http://archive.scholarstm.com/id/eprint/322
