Evaluation of UDP-Based DDoS Attack Detection by Neural Network Classifier with Convex Optimization and Activation Functions

ABSTRACT


INTRODUCTION
The alarming rise of cyberattacks targeting internet-connected devices has become a pressing concern, with Distributed Denial-of-Service (DDoS) attacks being a major culprit. As detailed in [1, 2], DDoS attacks aim to cripple a specific system or server by overwhelming it with a deluge of malicious traffic originating from a multitude of compromised devices. This onslaught can render the victim completely inaccessible (service failure) or significantly slow it down (service degradation). The consequences of such an attack can be severe, leading to financial losses, server outages, and immense pressure on IT staff to restore normal operations.
There are two primary types of DDoS attacks, reflection-based and exploitation-based; the key difference lies in how they target vulnerabilities. In a reflection-based DDoS attack, attackers exploit weaknesses in third-party servers, tricking them into sending massive responses by forging the source address in request packets so that they appear to originate from the victim's IP address. This bombardment of responses overwhelms the victim's system, causing a denial of service. The exploited services can run over application-layer or transport-layer protocols used for basic communication, and because the victim is flooded with legitimate responses from the wrong sources, the true attacker is hard to identify. Exploitation-based DDoS attacks instead target specific weaknesses in the victim's own system or software, which can likewise lie in application-layer or transport-layer protocols; as with reflection attacks, the attacker hides behind seemingly legitimate communication, making identification difficult. This research emphasizes the detection of UDP-based DDoS attacks, specifically targeting NTP, TFTP, UDP flood, and UDP-Lag attacks.
A UDP flood [3] constitutes an exploitation-based DDoS attack, where an extensive volume of UDP packets is directed towards a specific server with the intent to inundate its processing and response capacities.
The UDP-Lag attack [4] is a sneaky trick used by some gamers to slow down their opponents. It disrupts the connection between a player and the game server, giving the attacker an unfair advantage. There are two ways to launch a UDP-Lag attack: a lag switch or bandwidth-hogging software. A lag switch is a special piece of hardware that disrupts the flow of data between the player and the game server, like a faulty on/off switch for the internet connection. Bandwidth-hogging software is a program that consumes a large share of the bandwidth on the attacker's network; with less bandwidth available for other users, their connections slow down.
The Network Time Protocol (NTP) [5] acts like a universal clock for computers on the internet, keeping them synchronized. Attackers exploit publicly available NTP servers, which are designed to respond to small requests with a much larger chunk of data. The attacker sends a tiny request to an NTP server with the victim's IP address spoofed as the source; the tricked server then sends a huge response back to the victim, overloading their system with useless traffic.
The Trivial File Transfer Protocol (TFTP) [6] facilitates the transfer of firmware and configuration files among networked devices.
The primary challenges [7] in DDoS detection are early detection, low computational cost, and detection accuracy. If a DDoS attack is not detected early, its consequences can be financially devastating and can damage an organization's reputation. Conventional DDoS detection methods, such as data mining and statistical techniques, detect attacks neither early enough nor accurately enough. This research employs quantitative methods to assess classification evaluation metrics for detecting UDP-based DDoS attacks using a Multilayer Perceptron. Three optimization methods are evaluated, each paired with four different activation functions. The study also tackles DDoS attack detection by finding a special set of uncorrelated features, i.e., features that act independently of each other, much like the different colored lights on a traffic light each tell a different part of the story. Three techniques (Pearson, Spearman, and Kendall) are used to identify these independent features, like viewing the traffic light from three different angles. Taking the features identified by all three methods yields a smaller set of highly reliable features, like the intersection of the three views giving the clearest picture. This approach leverages a focused set of informative features to achieve swift and accurate DDoS attack detection, and the more concise data selection reduces overall resource requirements.
In this section, we introduced UDP-based DDoS attacks and outlined the objectives of this study. Section 2 presents the proposed method for detecting DDoS attacks and covers all of its essential steps: the overall approach (framework), which lays out the components that work together; the step-by-step process (algorithm), which breaks the method into clear, sequential steps, like a recipe for DDoS attack detection; data preparation (preprocessing), which ensures the data is in the best shape for the method to work effectively; the workhorse (Multilayer Perceptron), the classification tool used to identify DDoS attacks; and fine-tuning (activation functions and optimization methods), which explores different ways to adjust the Multilayer Perceptron to achieve the best possible detection performance. The evaluation metrics for classification, along with the results and discussions based on experimental findings, are detailed in Section 3. Lastly, Section 4 summarizes the key findings and takeaways from this investigation into DDoS attack detection.

METHODOLOGY
The framework of the proposed model is illustrated in Figure 1.

Dataset
This study utilizes the CICDDoS2019 dataset [8], a comprehensive resource compiled by the Canadian Institute for Cybersecurity at the University of New Brunswick. It comprises eleven distinct DDoS attack datasets provided in PCAP file format; these PCAP files were converted into CSV format using CICFlowMeter. This study uses the UDP, UDP-Lag, NTP, and TFTP UDP-based DDoS attack datasets. Each dataset contains 87 network traffic features and millions of records. Experiments were also performed on a customized UDP-based DDoS attack dataset.

Preprocessing
Before feeding data into a machine learning model, some preparation, called preprocessing [9], is needed; it improves the accuracy and efficiency of the model. Preprocessing makes the data suitable for the model by eliminating socket features that vary across networks and by addressing missing and infinite values to clean the data. The class labels for normal traffic (benign) and attack traffic are then encoded into a form the model understands: normal traffic becomes "0" and attack traffic becomes "1", which simplifies things for the model. Finally, all feature values are brought onto a similar scale; standardizing the features ensures they all contribute equally to the model's analysis.
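A minimal sketch of these preprocessing steps follows; the toy flow table and its column names are illustrative only, not the actual CICDDoS2019 schema:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy flow records; real CICDDoS2019 CSVs contain 87 features.
df = pd.DataFrame({
    "Source IP": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],  # socket feature
    "Flow Duration": [120.0, np.inf, 95.0, 110.0],
    "Flow IAT Min": [3.0, 7.0, np.nan, 5.0],
    "Label": ["BENIGN", "UDP", "UDP", "UDP"],
})

# 1) Drop socket features that vary across networks.
df = df.drop(columns=["Source IP"])

# 2) Replace infinities with NaN, then drop rows with missing values.
df = df.replace([np.inf, -np.inf], np.nan).dropna()

# 3) Encode labels: benign -> 0, attack -> 1.
df["Label"] = (df["Label"] != "BENIGN").astype(int)

# 4) Standardize features to zero mean and unit variance.
X = df.drop(columns=["Label"]).to_numpy(dtype=float)
X = StandardScaler().fit_transform(X)
```

After these steps, every surviving record is fully numeric, labeled 0/1, and standardized, which is the form the MLP expects.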

Feature selection
In this study, relevant features [10] are selected to enhance model performance, using variance-threshold and correlation methods. Features that provide no useful information are removed first: those with no variation (constant) and those that almost never change (quasi-constant). If two features are highly correlated (meaning they move together), only one is kept. To find such related features, three different methods (Pearson, Spearman, and Kendall) are used, and features with a correlation score very close to +1 or -1 are removed, since these values indicate very strong positive or negative relationships, respectively.
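This two-stage selection can be sketched as follows; the feature names, the synthetic data, and the 0.01/0.95 thresholds are illustrative assumptions, not the values used in the study:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "constant": np.zeros(n),                  # no variation: should be removed
    "flow_duration": rng.normal(size=n),
    "fwd_packets": rng.normal(size=n),
})
df["total_bytes"] = 3.0 * df["flow_duration"]  # perfectly correlated copy

# Stage 1: drop constant / quasi-constant features.
vt = VarianceThreshold(threshold=0.01)
vt.fit(df)
df = df[df.columns[vt.get_support()]]

# Stage 2: for each correlation method, drop one of each highly
# correlated pair (|r| close to 1), then intersect the three results.
survivors = []
for method in ("pearson", "spearman", "kendall"):
    corr = df.corr(method=method).abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
    survivors.append(set(df.columns) - set(to_drop))

common = set.intersection(*survivors)   # features uncorrelated under all three methods
```

Here `total_bytes` is removed by all three methods (it is a scaled copy of `flow_duration`), so `common` contains only the genuinely independent features.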
To measure how closely related two features are, the Pearson correlation coefficient [11] is determined by Eq. (1):

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}} \tag{1}$$

where $x_i$, $y_i$ are the sample values of the x and y features, $\bar{x}$ and $\bar{y}$ are the mean values of the x and y features, and $r$ is the correlation coefficient.

The Spearman correlation coefficient [12], similar to Pearson's, measures how related two features are, but it focuses on the order or ranking of the data points rather than their actual values. It is determined by Eq. (2):

$$\rho = 1 - \frac{6 \sum_{i} d_i^2}{n(n^2 - 1)} \tag{2}$$

where $n$ is the total number of observations, $d_i$ is the difference between the ranks of the corresponding values, and $\rho$ is the correlation coefficient.

The Kendall correlation coefficient [13] assesses the strength and direction of the relationship between two features by considering how often the order of one variable changes in the same (positive correlation) or opposite (negative correlation) direction as the other. It is determined by Eq. (3):

$$\tau = \frac{N_c - N_d}{\frac{1}{2}\, n(n - 1)} \tag{3}$$

where $N_c$ and $N_d$ are the numbers of concordant and discordant pairs, respectively, and $\tau$ is the correlation coefficient.
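The three coefficients can be computed directly from their definitions. The numpy sketch below mirrors Eqs. (1)-(3); the toy data is illustrative, and the rank computation assumes no ties:

```python
import numpy as np

def pearson(x, y):
    # r = sum((xi - x_bar)(yi - y_bar)) / sqrt(sum((xi - x_bar)^2) * sum((yi - y_bar)^2))
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum())

def spearman(x, y):
    # rho = 1 - 6 * sum(di^2) / (n (n^2 - 1)), di = rank difference
    rx = x.argsort().argsort()   # ranks (valid only when there are no ties)
    ry = y.argsort().argsort()
    d = rx - ry
    n = len(x)
    return 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

def kendall(x, y):
    # tau = (Nc - Nd) / (n (n - 1) / 2), counting concordant/discordant pairs
    n = len(x)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            if s > 0:
                nc += 1
            elif s < 0:
                nd += 1
    return (nc - nd) / (n * (n - 1) / 2)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 7.0, 9.0])   # monotonically increasing with x
```

Because y increases monotonically with x, Spearman and Kendall both report a perfect rank relationship, while Pearson is slightly below 1 (the relationship is monotonic but not exactly linear).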

Multilayer perceptron
Imagine a powerful learning machine called a Multilayer Perceptron (MLP) [14]. It works like a complex web of interconnected processing units, similar to the network of neurons in a brain. These processing units are called neurons, and they are grouped into layers. Each neuron in one layer can connect to any neuron in the next layer, allowing for complex information flow. The first layer (input layer) receives raw data from the features we're interested in. The final layer (output layer) acts like a decision maker, with one neuron for tasks like predicting a single value or classifying data into two categories (such as normal vs. attack traffic); for more complex classifications with multiple categories, there would be multiple output neurons (one for each category). Layers between the input and output layers are called hidden layers. These layers play a critical role in learning patterns from the data, but how they work is determined by the training process itself, making them quite adaptable.
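A toy forward pass illustrates this layered flow of information. The layer sizes and random weights below are illustrative assumptions only (a real MLP learns its weights during training):

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny MLP: 10 input features -> 8 hidden neurons -> 1 output neuron.
W1, b1 = rng.normal(size=(10, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)

def forward(X):
    h = relu(X @ W1 + b1)        # hidden layer extracts intermediate patterns
    return sigmoid(h @ W2 + b2)  # output in (0, 1), read as P(attack)

X = rng.normal(size=(5, 10))     # 5 flow records, 10 features each
p = forward(X)                   # one attack probability per record
```

The single output neuron suffices here because the task is binary (benign vs. attack); a multi-class variant would simply widen the output layer.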
Neurons in our Multilayer Perceptron (MLP) [15] are like tiny decision-makers. But how do they decide when to fire up and contribute to the final output? That's where activation functions come in. Activation functions act like gatekeepers, determining whether a neuron should send its signal onward based on the information it receives. This study explores different activation functions, namely linear, logistic, Tanh, and ReLU, to see which ones work best for our MLP in detecting DDoS attacks.
The linear activation function is also called the identity function [16]. In this approach, a neuron's output directly reflects its input. Mathematically: $f(x) = x$.

The logistic activation function [17] squeezes information between 0 and 1: as the input increases, the output approaches 1.0, while decreasing input values lead to an output closer to 0.0. This function is commonly utilized in models requiring probabilistic predictions. Mathematically: $f(x) = \frac{1}{1 + e^{-x}}$.

The Tanh (hyperbolic tangent) activation function [18] processes any real value as input and yields an output ranging between -1 and 1: as the input increases, the output tends towards 1.0, and for decreasing input values it tends towards -1.0. Mathematically: $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.

The ReLU (Rectified Linear Unit) activation function [19] is a nonlinear function: when the input is positive, it directly outputs the input; otherwise, it outputs zero. Mathematically: $f(x) = \max(0, x)$.

This research investigates how different optimization techniques can improve a neural network's ability to detect DDoS attacks. Optimization, in this context, refers to fine-tuning the network's internal settings to achieve the best possible performance. The study compares three specific optimization methods, SGD, L-BFGS, and ADAM [20], used in conjunction with a multilayer perceptron classifier to see which one yields the most effective DDoS attack detection system.
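The four activation functions described above can be written in a few lines of numpy:

```python
import numpy as np

def identity(x):   # linear: output equals input
    return x

def logistic(x):   # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):       # squashes any real input into (-1, 1)
    return np.tanh(x)

def relu(x):       # passes positives through, zeroes out negatives
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
```

Evaluating each function on the same inputs makes the ranges concrete: the identity leaves values unchanged, logistic maps 0 to 0.5, tanh maps 0 to 0, and ReLU clips the negative value to zero.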
This study examines SGD [21] as a potential optimization method for DDoS attack detection using a neural network.SGD is known for its efficiency in training large machine learning models, particularly those involving linear classifiers and convex loss functions.It's also praised for its ease of implementation.However, SGD requires careful adjustment of internal settings (hyperparameters) and the number of training cycles to achieve optimal results.Additionally, its performance can be affected by the way data features are scaled.
The LBFGS method [22] is like a smart assistant helping us train our neural network. It belongs to a group of optimization methods called "quasi-Newton" methods, designed to operate within constrained computer memory resources, and it is particularly well-suited to problems characterized by a high number of features. Unlike SGD, LBFGS does not necessitate extensive hyperparameter tuning; however, it does consume more memory and typically requires a greater number of iterations than SGD.

This research also explores ADAM as an optimization technique for the neural network used in DDoS attack detection. ADAM combines momentum, a method that helps the network navigate training challenges, with Root Mean Squared Propagation (RMSP), which tackles issues like vanishing learning rates. Praised for its fast learning and efficiency, ADAM is often the default choice for training Multilayer Perceptron (MLP) classifiers. However, it requires more computational resources than the other optimization methods.
This study conducts experiments using the Python programming language along with libraries such as sklearn.neural_network, pandas, and numpy for the MLP classification algorithms. Visualization of the ROC-AUC curves is facilitated by the matplotlib and seaborn libraries.
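As a rough illustration of this experimental grid, the sketch below trains scikit-learn's MLPClassifier with each of the three solvers and four activation functions. The synthetic two-class data stands in for preprocessed flow features and is an assumption of this sketch, not the CICDDoS2019 data:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Tiny, well-separated stand-in for preprocessed flow features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),   # class 0: benign
               rng.normal(5.0, 1.0, size=(50, 4))])  # class 1: attack
y = np.array([0] * 50 + [1] * 50)
X = StandardScaler().fit_transform(X)

# Train one MLP per (solver, activation) combination, as in the study.
results = {}
for solver in ("lbfgs", "sgd", "adam"):
    for activation in ("identity", "logistic", "tanh", "relu"):
        clf = MLPClassifier(hidden_layer_sizes=(16,), solver=solver,
                            activation=activation, max_iter=1000,
                            random_state=0)
        clf.fit(X, y)
        results[(solver, activation)] = clf.score(X, y)
```

The `results` dictionary then holds one training accuracy per configuration; in the study itself, the analogous scores come from the held-out and cross-validated evaluations reported in the tables.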

RESULTS AND DISCUSSIONS
This study investigates how to optimize an MLP classifier for accurate and efficient detection of UDP-based DDoS attacks. We explore different optimization methods and activation functions to achieve high attack detection accuracy while minimizing data processing and execution time, and we examine the impact of reducing features in the input data on both accuracy and processing speed. To speed up how quickly the model can identify attacks, we preprocess the data by streamlining the features, reducing the number of data points the model needs to analyze without sacrificing accuracy; decreasing data computation time in turn leads to a significant reduction in execution time. Figure 1 outlines the proposed approach to balancing accuracy and efficiency: a model that reduces the amount of data the system needs to analyze while maintaining its ability to accurately detect DDoS attacks. The datasets containing UDP-based DDoS attacks come from the CICDDoS2019 collection, which includes a variety of both TCP- and UDP-based DDoS attack types. To see how well the model performs, we execute experiments using datasets containing different types of UDP-based DDoS attacks: UDP flood, UDP-Lag, NTP, and TFTP. For a broader test, this study constructed a custom dataset combining data from all four UDP attack types (UDP flood, UDP-Lag, NTP, TFTP) to simulate a more realistic scenario with mixed threats.
This section analyzes the results. First, we examine the impact of removing features with little variation (constant and quasi-constant) using variance thresholding. Next, we explore how well features relate to each other (correlation) using three methods: Pearson, Spearman, and Kendall, and we identify features that are consistently uncorrelated across all three. Finally, we evaluate the performance of the algorithms based on standard classification metrics. The feature selection method used to improve efficiency proceeds as follows: after initial data cleaning, features with minimal variation were removed to reduce the overall number analyzed; the three correlation techniques (Pearson, Spearman, and Kendall) were then applied to the UDP-based DDoS datasets, each identifying its own set of uncorrelated features; finally, features that consistently appeared as uncorrelated across all three methods were retained. This approach prioritizes highly informative features, reducing the computational burden through a smaller feature set. Table 1 summarizes the number of commonly identified uncorrelated features across the datasets. Interestingly, features like "Flow IAT Min," "Flow Duration," and various traffic volume metrics consistently emerged as uncorrelated across multiple attack types (NTP, TFTP, UDP, and UDP-Lag). Finally, we evaluated the effectiveness of Multilayer Perceptron (MLP) classifiers with different optimization methods and activation functions on all datasets (UDP, UDP-Lag, NTP, TFTP, and the bespoke dataset) using these identified uncorrelated feature sets.

Evaluation metrics of classification algorithms
This study uses the following evaluation metrics for the results of UDP-based DDoS attack detection.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN} \qquad F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$\text{Log loss} = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \,\right]$$

In the given context, N represents the total number of observations, p denotes the predicted probability, and y signifies the actual value.
Accuracy specifies the proportion of correct classifications out of all classification results. Precision specifies how many of the classifier's positive predictions are correctly classified. Recall measures the proportion of actual positive classes that are correctly classified by the classifier. The F1 score is the harmonic mean of precision and recall. Log loss quantifies the disparity between the actual classification outcomes and the model's predicted probabilities. Furthermore, the results are assessed using K-fold cross-validation and the ROC-AUC score. In K-fold cross-validation, the dataset is divided into k smaller subsets; each model is trained on k-1 folds and tested on the remaining fold, repeating this process until all folds have been used for testing. The ROC curve plots the true positive rate against the false positive rate across varying classification thresholds; the ROC-AUC score is the area under this curve, ranging from 0 to 1, where 1 denotes the best score and values near 0 indicate poor model performance.
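These metrics can all be computed with sklearn.metrics. The toy labels and probabilities below are hypothetical inputs for illustration, not results from this study:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, log_loss, roc_auc_score)

y_true = np.array([0, 0, 1, 1])            # 0 = benign, 1 = attack
y_prob = np.array([0.1, 0.6, 0.8, 0.9])    # model's predicted P(attack)
y_pred = (y_prob >= 0.5).astype(int)       # hard labels: [0, 1, 1, 1]

acc  = accuracy_score(y_true, y_pred)      # (TP+TN)/(TP+TN+FP+FN) = 3/4
prec = precision_score(y_true, y_pred)     # TP/(TP+FP) = 2/3
rec  = recall_score(y_true, y_pred)        # TP/(TP+FN) = 2/2
f1   = f1_score(y_true, y_pred)            # harmonic mean of prec and rec
ll   = log_loss(y_true, y_prob)            # penalizes confident mistakes
auc  = roc_auc_score(y_true, y_prob)       # area under the ROC curve
```

Note that log loss and ROC-AUC are computed from the predicted probabilities, while accuracy, precision, recall, and F1 use the thresholded hard labels.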

Results and Discussion on UDP dataset
For UDP-flood attack detection, the MLP model utilizing the ADAM optimization approach and Tanh activation function achieves the highest overall accuracy and K-fold cross-validation accuracy, both with very low standard deviation. Table 2 illustrates the accuracy results for identifying UDP-flood DDoS attacks. MLP with the LBFGS optimization method yields higher K-fold cross-validation accuracy than overall accuracy, whereas MLP models with the SGD and ADAM optimization methods consistently produce identical accuracy values for both overall and K-fold cross-validation. For smaller datasets, LBFGS optimization demonstrates respectable performance, while the SGD and ADAM optimization methods exhibit robust results irrespective of dataset size. Additionally, the ReLU activation function yields consistent accuracy values for both overall and K-fold cross-validation. MLP models with activation functions generally outperform those without (e.g., identity activation).

Results and Discussion on UDP-Lag DDoS attack dataset
For detecting UDP-Lag DDoS attacks, the MLP model utilizing the ADAM optimization approach and Tanh activation function exhibits the highest overall accuracy and K-fold cross-validation accuracy, accompanied by very low standard deviation; however, these two accuracy values are not identical. Conversely, MLP employing the LBFGS optimization approach and logistic activation function demonstrates subpar overall accuracy but improved K-fold cross-validation accuracy in UDP-Lag DDoS attack detection. The accuracy results for the UDP-Lag DDoS attack dataset are presented in Table 5.
The precision, recall, and F1-score metrics for MLP with various optimization techniques and activation functions are provided in Table 6 for UDP-Lag attack detection, utilizing the common uncorrelated feature subset. Across all optimization techniques, MLP with ADAM optimization consistently achieves the highest precision, recall, and F1-score values, regardless of the presence of activation functions, on the UDP-Lag dataset. Notably, for the UDP-Lag dataset, the combination of ADAM optimization and logistic activation function yields superior specificity values, while SGD optimization with logistic and ReLU activation functions results in zero specificity values.
Log-loss and ROC values of MLP employing various optimization techniques and activation functions for UDP-Lag attack detection are outlined in Table 7. MLP utilizing the ADAM optimization method and tanh activation function outperforms the others in terms of log-loss values on the UDP-Lag dataset, whereas MLP with the LBFGS optimization method and logistic activation function exhibits inferior log-loss values. Additionally, MLP with the ADAM optimization method and tanh activation function achieves superior ROC-AUC scores compared to the other configurations on the UDP-Lag dataset. ROC curves illustrating the performance of MLP classification algorithms with LBFGS, SGD, and ADAM optimization methods, along with different activation functions, for UDP-Lag attack detection are depicted in Figures 5-7.

Results and Discussion on NTP DDoS attack dataset
In the detection of NTP DDoS attacks, the MLP model achieves the highest overall accuracy with the LBFGS optimization approach and ReLU activation function, while it achieves the best K-fold cross-validation accuracy with the tanh activation function. The accuracy results for detecting NTP DDoS attacks are presented in Table 8.

The experiments are conducted on Google Colaboratory (Colab), a cloud-based platform providing 25 GB of RAM and a Tensor Processing Unit (TPU) to accelerate the training process. Additionally, the CICFlowMeter network traffic flow generator tool played a crucial role in converting raw network traffic data stored in PCAP files into a more usable format (CSV files) for analysis by the MLP.

Figure 2. ROC-curves of the MLP with LBFGS optimization method with different activation functions on UDP dataset

Figure 3. ROC-curves of the MLP with SGD optimization method with different activation functions on UDP dataset

Table 3 showcases the performance of Multilayer Perceptron (MLP) classifiers for UDP flood attack detection using the identified common uncorrelated features. The table presents precision, recall, and F1-score metrics for various optimization techniques and activation functions employed with the MLP model. Notably, MLP consistently delivers strong performance across all optimization methods, with or without activation functions, when detecting UDP flood DDoS attacks. Interestingly, the combination of ADAM optimization and tanh activation function yielded the best specificity (the ability to correctly identify normal traffic) for the UDP flood attack dataset. Conversely, LBFGS optimization without an activation function resulted in lower specificity than the other configurations.

Figure 4. ROC-curves of the MLP with ADAM optimization method with different activation functions on UDP dataset

Log loss and ROC values of MLP, employing various optimization techniques and activation functions, are presented in Table 4 for detecting UDP flood attacks using the common uncorrelated feature subset. Among the activation functions, MLP with the tanh activation function demonstrates the best log-loss values across all optimization methods on the UDP dataset. Furthermore, MLP with the ADAM optimization method and tanh activation function outperforms the others in terms of log-loss values on the UDP-flood dataset and yields superior ROC-AUC scores compared to other configurations. The ROC curves of MLP classification algorithms with LBFGS, SGD, and ADAM optimization methods, along with different activation functions, for UDP flood attack detection are depicted in Figures 2-4.

Figure 11. ROC-curves of the MLP with LBFGS optimization method with different activation functions on TFTP attack
Figure 12. ROC-curves of the MLP with SGD optimization method with different activation functions on TFTP attack
Figure 13. ROC-curves of the MLP with ADAM optimization method with different activation functions on TFTP attack

Table 1. The count of common uncorrelated features across the datasets

Table 2. The accuracy of the overall model and K-fold cross-validation, presented as percentages with standard deviation, for MLP utilizing various optimization techniques and activation functions on UDP flood attack with the common uncorrelated feature subset

Table 3. Classification evaluation metrics of MLP employing diverse optimization techniques and activation functions for detecting UDP flood attacks, utilizing the common uncorrelated feature subset

Table 5. Overall model accuracy and K-fold cross-validation accuracy score (with standard deviation) in % of the MLP with different optimization techniques and different activation functions on UDP-Lag attack using the common uncorrelated feature subset

Table 6. Classification evaluation metrics of the MLP with different optimization techniques and different activation functions on UDP-Lag attack using the common uncorrelated feature subset

Table 7. ROC-AUC score and log-loss value of the MLP with different optimization techniques and different activation functions on UDP-Lag attack using the common uncorrelated feature subset

The precision, recall, and F1-score metrics for MLP employing various optimization techniques and activation functions on the NTP attack, utilizing the common uncorrelated feature subset, are illustrated in Table 9. MLP consistently achieves superior precision, recall, and F1-score values across all optimization techniques, regardless of the presence of activation functions, for detecting NTP DDoS attacks. Notably, on the NTP DDoS attack dataset, MLP with SGD optimization without an activation function yields the best specificity value, while ADAM optimization with the tanh activation function demonstrates poorer specificity than the others.

Log-loss and ROC values of MLP utilizing various optimization techniques and activation functions for NTP attack detection are outlined in Table 10. MLP employing the LBFGS optimization method and ReLU activation function outperforms the others in terms of log-loss scores on the NTP dataset and also achieves a superior ROC-AUC score compared to other configurations. The ROC curves depicting the performance of MLP classification algorithms with LBFGS, SGD, and ADAM optimization methods, along with different activation functions, for NTP attack detection are illustrated in Figures 8-10.

Results and Discussion on TFTP DDoS attack dataset
In detecting TFTP DDoS attacks, the MLP model achieves the highest overall accuracy with the LBFGS optimization approach and ReLU activation function, while it achieves the highest K-fold cross-validation accuracy using the tanh activation function. The accuracy results for detecting TFTP attacks are presented in Table 11.

Table 8. Overall model accuracy and K-fold cross-validation accuracy score (with standard deviation) in % of the MLP with different optimization techniques and different activation functions on NTP attack using the common uncorrelated feature subset

Table 9. Classification evaluation metrics of the MLP with different optimization techniques and different activation functions on NTP attack using the common uncorrelated feature subset

Table 10. ROC-AUC score and log-loss value of the MLP with different optimization techniques and different activation functions on NTP attack using the common uncorrelated feature subset

MLP consistently achieves superior precision, recall, and F1-score values across all optimization techniques, regardless of the presence of activation functions, for detecting TFTP attacks (Table 12). Notably, on the TFTP dataset, MLP with the combination of LBFGS optimization and tanh activation function yields better specificity values, while SGD optimization with the logistic activation function results in poorer specificity than the others. Log-loss and ROC scores of MLP utilizing various optimization techniques and activation functions for TFTP attack detection are presented in Table 13. MLP employing the LBFGS optimization method and ReLU activation function outperforms the others in terms of log-loss values on the TFTP dataset, while MLP with the ADAM optimization method and ReLU activation function achieves a superior ROC-AUC score compared to other configurations. The ROC curves illustrating the performance of MLP classification algorithms with LBFGS, SGD, and ADAM optimization methods, along with different activation functions, for TFTP attack detection are depicted in Figures 11-13.

Table 11. Overall model accuracy and K-fold cross-validation accuracy score (with standard deviation) in % of the MLP with different optimization techniques and different activation functions on TFTP attack using the common uncorrelated feature subset

Table 12. Classification evaluation metrics of the MLP with different optimization techniques and different activation functions on TFTP attack using the common uncorrelated feature subset

Table 13. ROC-AUC score and log-loss value of the MLP with different optimization techniques and different activation functions on TFTP attack using the common uncorrelated feature subset