These are the instructions for this assignment: Attached please find the paper for your course term project. This is the baseline paper.
You need to read more papers regarding this subject (from references and the internet).
Then you write your term project by focusing on:
(1) What are the issues? (2) What are the techniques/approaches used until now to attack the problems identified in part (1)? (3) Your critique of the current approaches (or shortcomings of the current solutions). (4) Summary and conclusions. You can also include possible future work in this area.
NOTE: I do not want a summary of the baseline paper. Here I submit the file with the title of the project. Please put all the references that you used. Thanks
XXX-X-XXXX-XXXX-X/XX/$XX.00 ©20XX IEEE

DDoS Attack Detection Based on Simple ANN with SMOTE for IoT Environment

Yan Naung Soe, Dept. of Electrical and Information Engineering, Universitas Gadjah Mada, Yogyakarta, Indonesia, [email protected]
Paulus Insap Santosa, Dept. of Electrical and Information Engineering, Universitas Gadjah Mada, Yogyakarta, Indonesia, [email protected]
Rudy Hartanto, Dept. of Electrical and Information Engineering, Universitas Gadjah Mada, Yogyakarta, Indonesia, [email protected]

Abstract— As the IoT era has developed rapidly in recent years, attackers increasingly target IoT environments. They use IoT devices as bots to attack a target organization, and these devices are easily infected by IoT malware because their resource constraints prevent them from running powerful security mechanisms. One very dangerous IoT malware, Mirai, launched DDoS attacks against targeted organizations via infected IoT devices. Even though many security mechanisms have been implemented for IoT devices, an effective detection system for IoT environments is still needed. Our detection system uses a public dataset to detect this kind of attack with a machine learning technique: a simple Artificial Neural Network (ANN) architecture. Although we used a modern botnet attack dataset, Bot-IoT, to detect DDoS attacks, one important issue must be overcome: the imbalanced data problem, because this dataset has a small amount of benign data and a large amount of attack data. We used SMOTE (Synthetic Minority Over-sampling Technique) to solve the imbalanced data problem when implementing a machine learning-based DDoS detection system. Our results indicate that the proposed approach can effectively detect DDoS attacks in the IoT environment.

Keywords—DDoS Attack, ANN, IoT, SMOTE

I. INTRODUCTION
IoT (Internet of Things) has been developing rapidly in recent years, and it makes our daily activities more convenient. According to the Cisco report [1], mobile traffic will reach up to 49 exabytes per month by 2021. However, cyber-attacks increasingly target these devices because of their resource constraints. It is challenging to implement effective security mechanisms on these devices because they have limited computational power and memory. Cyber-attacks on IoT devices are growing, and malware attacks on smart devices grew as much as threefold in 2017 [2].
One of the common attack challenges for IoT is the Hijacked Devices Conscripted into Botnets attack [3]. These botnets launch distributed denial-of-service (DDoS) attacks against the target host via infected IoT devices, which are controlled by a command-and-control (C&C) server. A typical IoT malware, called Mirai [4], was reported in 2016. It infected about 2.5 million IoT devices and launched DDoS attacks. Therefore, an effective mechanism is needed to detect this kind of attack and protect IoT devices and related networks.

There are mainly two kinds of detection systems: misuse-based and anomaly-based. Public detection systems, like Snort [5] and Suricata [6], are misuse-based detection systems. Although these systems are popular for detecting cyber-attacks, they mainly focus on traditional networks. Moreover, misuse-based detection systems can be circumvented by attackers because they rely on the signatures of previous attacks. Anomaly-based systems compare incoming traffic patterns against benign traffic data to detect attacks. These systems can be effective at detecting unknown attacks, but they are difficult to implement in the IoT environment because of the different nature of IoT devices.

We used a machine learning technique to implement the attack detection system because it can help detect variants of attack signatures. We also need to select a training dataset to build our machine learning-based detection architecture. Although there were other machine learning-based detection works, they used the KDDCUP 99 [7] and KDD-NSL [8] datasets. These datasets contain no modern attack data and are not intended for IoT networks. Therefore, we used a modern dataset, called Bot-IoT [9], which was captured from botnet attacks in an IoT environment.
However, this dataset has very little benign traffic data, just 477 records, while more than 1.9 million attack (DDoS) traffic records are available. Therefore, we adopted the data re-sampling technique SMOTE to solve the data imbalance problem. We also used a neural network architecture to implement the attack detection system. We used only a single hidden layer and a single output node to keep the detection system lightweight.

The paper is organized as follows: In section II, we discuss previous attack detection systems. In section III, we present the background methodology, that is, the methods and techniques we adopted. The proposed system is described in section IV, and the evaluation results are discussed in section V. Finally, section VI concludes the paper.

II. RELATED WORK
Public IDS systems, such as Snort [5] and Suricata [6], are signature-based detection systems; they need updated attack signatures/rules to detect new kinds of attacks. Therefore, signature-based/misuse-based detection systems can't detect zero-day or unknown attacks.

Authorized licensed use limited to: Florida Atlantic University. Downloaded on January 26,2021 at 22:06:22 UTC from IEEE Xplore. Restrictions apply.

Previous machine learning-based detection works [10], [11] used the popular IDS dataset KDDCUP99, and some other works [12], [13] used a variant of this dataset, KDD NSL. However, these datasets have no IoT attack traffic data. Therefore, they cannot be used for botnet attack detection in the IoT environment. Other research [14]–[16] also addressed DDoS attack detection. J. Chen et al. [14] proposed a DDoS attack detection method based on abnormal network behavior in the big data environment. L. Zhou et al. [16] proposed an entropy-based detection measurement on the distribution of the packet size interval for detecting DDoS attacks. Y. Xu et al.
[15] proposed a method to locate potential DDoS victims and attackers on the SDN network. However, these works mainly focused on attacks in traditional networks.

III. BACKGROUND METHODOLOGY
We used an Artificial Neural Network (ANN) and the Synthetic Minority Over-sampling Technique (SMOTE) to implement the DDoS attack detection.

A. Artificial Neural Network
An Artificial Neural Network (ANN) is a general, practical method for learning real-valued, discrete-valued, and vector-valued functions [17]. It is useful for classification and clustering, and it can be implemented in a supervised or unsupervised manner. An ANN is able to perform classification and even discover new trends or patterns in data. A basic ANN is composed of three layers: input, hidden, and output. Each layer has a number of nodes, which are connected from the input layer to the hidden layer and from the hidden layer to the output layer. Those connections carry the weights between nodes. The typical neural network process flow is shown in Fig. 1.

Fig. 1. Artificial neural network architecture

There are mainly two phases in a neural network architecture: forward and backward. The training phase of the proposed system needs both the forward and the backward phase. However, only the forward phase is needed in the attack detection phase. The backward phase is essential for obtaining the optimal weight vector so that the network can detect future attacks effectively. In obtaining the optimal weight vector, the activation function plays a critical role. Although many activation functions can be applied, we chose the sigmoid activation function because it is the most suitable for our proposed system. The sigmoid function is shown in Fig. 2, and it is calculated by equation (1).

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

Fig. 2. Sigmoid activation function

B. Synthetic Minority Over-sampling Technique (SMOTE)
This is an efficient technique for balancing the data between normal and abnormal behaviors. Misclassification can occur if the number of records in each class differs significantly. Moreover, a classification model built on imbalanced data can degrade the performance and stability of the detection system. This technique was proposed by N. V. Chawla et al. [18]. It randomly chooses neighbors from the k nearest neighbors, depending upon the amount of data to be resampled. The procedure is as follows: (1) take the difference between the feature vector and its nearest neighbor, (2) multiply this difference by a random number between 0 and 1, and (3) add it to the feature vector under consideration. The process flow of SMOTE is shown in Fig. 3.

Fig. 3. A process flow of SMOTE

C. Normalization
We applied max-min normalization to map the data into the range between 0 and 1. Max-min normalization is given in equation (2). It is required to bring the input values into a comparable range, and it guarantees that all features' values share the same scale.

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{2}$$

D. Python Libraries
We mainly used the Python libraries scikit-learn and imblearn to implement our proposed system. Python is a very useful language in the data science and cybersecurity era.

Imblearn [19]: It is a Python library that supports data sampling techniques. It is also compatible with another Python library, scikit-learn. We used this library to implement the SMOTE data re-sampling, which mainly supports machine learning.
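To make the background methods concrete, the following is a minimal, library-independent sketch of the three SMOTE steps, the max-min normalization, and the sigmoid activation described in this section. It is not the imblearn implementation and not the authors' code: the function names (`min_max_normalize`, `sigmoid`, `k_nearest`, `smote`), the Euclidean distance metric, the default `k=5`, and the handling of constant features are all assumptions made for illustration.

```python
import math
import random


def min_max_normalize(column):
    """Max-min normalization: map a list of feature values into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant feature: the formula would divide by zero, so map every
        # value to 0.0 (this fallback is an assumption, not from the paper).
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def sigmoid(x):
    """Sigmoid activation 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def k_nearest(sample, others, k):
    """Return the k vectors in `others` closest to `sample` (Euclidean)."""
    return sorted(others, key=lambda o: math.dist(sample, o))[:k]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        sample = rng.choice(minority)
        others = [m for m in minority if m is not sample]
        # Randomly choose one of the k nearest neighbors.
        neighbor = rng.choice(k_nearest(sample, others, k))
        # Step (2): random number between 0 and 1.
        gap = rng.random()
        # Steps (1) and (3): take the difference to the neighbor, scale it,
        # and add it to the feature vector under consideration.
        synthetic.append(
            [s + gap * (n - s) for s, n in zip(sample, neighbor)]
        )
    return synthetic
```

For instance, calling `smote(benign_rows, n_new=100, k=5)` on a list of normalized benign feature vectors would return 100 interpolated rows, each lying on the line segment between an existing benign record and one of its nearest neighbors. In practice a maintained library routine would normally be preferred over a hand-written version like this one.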
Scikit-learn [20]: It can also be used for splitting the dataset into a training set and a testing set. We used this library