Traffic Management Enhancement: A Competitive Machine Learning System for Traffic Condition Classification

Surya Michrandi Nasution*, Reza Rendian Septiawan, Rifqi Muhammad Fikri, Burhanuddin Dirgantoro


Corresponding Author Email: michrandi@telkomuniversity.ac.id

Page: 553-567 | DOI: https://doi.org/10.18280/ijtdi.080407

Received: 7 September 2024 | Revised: 10 November 2024 | Accepted: 18 November 2024 | Available online: 26 December 2024

© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

In big cities, traffic congestion is a prevalent issue. To decide how to manage traffic and alleviate congestion, traffic regulators, who supervise traffic flow, must analyze current conditions. Classifying traffic conditions from road information is a critical step that impacts these decisions. Traffic conditions can be categorized using a variety of techniques, each with benefits and drawbacks of its own. Recently, the rapid development of machine learning techniques has accelerated their use in a variety of sectors, including intelligent transportation systems (ITS). In this study, a competitive machine learning system is introduced to support the decision-making process in ITS, specifically in traffic condition classification. The proposed system operates in two stages: first, identifying the best model configuration from various machine learning methods, and second, deciding through a voting system based on the selected models. The proposed system employs six machine learning methods, each with 4-5 variations in model configurations. The methods tested include Neural Networks, k-Nearest Neighbor, Logistic Regression, Bayesian Networks, Decision Trees, and Random Forests, with individual accuracy rates of 66.2%, 70.5%, 44.4%, 46.1%, 72.2%, and 72.6%, respectively. The models that achieved the highest performance for each method proceed to a voting system, both non-weighted and weighted. The experimental results indicate that the non-weighted system achieved an accuracy of 68.6% to 69.3%, while the weighted system reached 71.9% to 72.5%. The findings show that the proposed competitive machine learning system offers a viable solution for classifying traffic conditions with promising results, especially for implementation in Bandung City, Indonesia.

Keywords: 

intelligent transportation system, competitive machine learning system, traffic condition classification, machine learning, voting system

1. Introduction

Recently, the rapid development of machine learning techniques has accelerated their use in a variety of sectors, including intelligent transportation systems (ITS) [1]. ITS is one of the key ways to advance technology in the transportation sector. It can be applied at both microscopic [2] and macroscopic [3] levels. In the microscopic view, ITS tends to measure the in-car situation, including the vehicle's headway [4], direction [5], speed [6], traffic signs [7], pedestrian detection [8], distance calculation [9], etc. Put simply, the microscopic view lets the technology understand detailed information about the vehicles on the road. On the other hand, in the macroscopic view, the technology allows control of the road infrastructure, such as traffic management systems [10], shortest route calculation [11], road closure information [12], balancing vehicle emissions [13], and any other situation that impacts community needs.

Commonly, ITS implementation in a macroscopic view can be done by applying machine learning methods to define the traffic situation based on historical information [14]. Machine learning is able to classify traffic conditions based on its knowledge of historical data. Classification itself is defined as a process of grouping a dataset with similar conditions into a specific category [11]. By using previously collected training data, machine learning is able to understand the current situation and find the category most similar to situations in the past. Machine learning can also be implemented as a prediction system to forecast future conditions [15].

Machine learning methods themselves have not stopped developing; however, almost all of them evolved from basic methods such as Neural Networks (NN) [16], Decision Trees [17], k-Nearest Neighbor (kNN) [18], Bayesian Networks [19], Logistic Regression [20], Random Forest [21], etc. A knowledge-growing system was developed by Husni et al., who modified NN, Decision Trees, and Bayesian Networks to adaptively learn from previous information [15]. Nasution et al. [22] also developed a semi-ensemble learning system supported by several models that use NN as their basic method.

Every machine learning model delivers different results depending on the data used to train and test it. This contradicts papers that declare their model uses the best machine learning method; in fact, the performance of a model depends on the data used in the model [23]. This means such statements can be falsified, since the quality of a machine learning model is influenced by data quality. Given this condition, this paper proposes a competitive machine learning system that lets the classification system choose the best methods based on the performances of the several models implemented in the system.

The competitive learning system is implemented using the traffic condition dataset for Bandung City collected by Nasution et al. in 2023 [24]. The data were gathered using two methods: (1) direct calculation, by applying an object detection method to public Closed-Circuit Television (CCTV) footage, and (2) indirect calculation, by gathering traffic information from TOMTOM; the traffic condition was then measured using several formulations. The datasets from these collection methods were joined into a comprehensive dataset so that it can be used to categorize traffic density in Bandung, the second largest metropolitan area in Indonesia after Jakarta, which suffers from extreme traffic congestion [25].

Bandung's unique urban layout and tourism-driven traffic surges present issues that set it apart from other big cities. The city's dense layout, which includes narrow streets and historic districts, limits the opportunity for road construction and alternate routing. In contrast to cities with more expansive and flexible road networks, such as Jakarta or Surabaya, Bandung's infrastructure is unable to handle unexpected surges in traffic, particularly during weekends and holidays when travel is at its highest. Accurate and real-time traffic classification is crucial for efficient city planning because of the evolving traffic pattern, which necessitates a flexible traffic management strategy that can respond to these periodic, high-intensity congestion situations.

In general, traffic information needs several key parameters, such as time (day, rush hour, etc.) and weather data (weather, temperature, etc.). Using these parameters, this research tries to classify the traffic situation with several machine learning models and to find the best configuration for each implemented machine learning algorithm. Once the best model for each machine learning method is defined, the classification stage continues by counting the votes for each category produced by the models to determine the final result.

This paper proposes a breakthrough by creating a multi-level classification system that uses a competitive concept able to (1) compare machine learning models with various configurations and (2) implement a voting system to define the final classification result. With this concept, the learning system's performance is expected to be better than that of a classification system using a single common machine learning method. In terms of intelligent transport systems, this work has significance since it is the only one that addresses traffic condition classification through the use of the competitive machine learning approach. More novel traffic prediction models may also be included in the system to increase performance. Additionally, some models could be modified to be integrated into the voting system, thereby increasing the scalability and adaptability of traffic prediction.

This paper is organized as follows. Section 2 will discuss the literature review that is related to the research. The proposed systems will be discussed in Section 3, followed by the simulation result and discussion in Section 4. Finally, Section 5 will provide the conclusion of this research.

2. Literature Review

A machine learning model can be used to solve the traffic condition classification problem by supplying basic input to the system. Many researchers have used machine learning to classify traffic conditions [15] or other traffic-related situations [26, 27] based on historical events. Classification models that are frequently used include NN, kNN, Bayesian Networks, Logistic Regression, Decision Trees, and Random Forests. This section covers the literature review of the machine learning models employed in the course of the research.

2.1 Neural Networks (NN)

The multilayer perceptron (MLP) is an NN algorithm that includes an input layer, one or more hidden layers, and an output layer [28]. The neurons in these layers produce outputs by applying an activation function to the weighted sum of the inputs. To identify complex patterns and non-linear relationships in the data, the hidden layers are essential.

Eq. (1) is used to compute the input to the hidden layer, Z, in a single-hidden-layer MLP. The input features are represented by X, whereas the weight matrix between the input and hidden layers, W1, indicates the relevance of each feature. In the following layer, the inputs are multiplied by the set of weights assigned to each neuron. The bias b, on the other hand, shifts the input to the activation function.

$Z^1=W^1 X+b^1$        (1)

Next, as Eq. (2) illustrates, Z1 is the input used to activate the hidden layer (H). Additionally, as shown by Eq. (3), H serves as the input for the following hidden layer, Z2. Activation functions (σ), such as ReLU, sigmoid, and tanh, transform the weighted sum of inputs nonlinearly. Without an activation function, a NN would essentially be a linear model, and its capacity to resolve complex problems involving highly non-linear data relationships would be limited.

$H=\sigma\left(W^1 X+b^1\right)$          (2)

$Z^2=W^2 H+b^2$        (3)

The network can learn increasingly complex representations by adding one or more hidden layers. In order to capture non-linear relationships, the hidden layers perform non-linear adjustments to the input data. When there are several hidden layers, the output of the l-th hidden layer can be broadly expressed in Eq. (4).

$H^l=\sigma\left(W^l H^{l-1}+b^l\right)$       (4)

NN models generally vary based on how the number of layers is configured. Additionally, the use of hyperparameters such as the activation function, solver, regularization term, and learning rate represents the variances in configuration.
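To make Eqs. (1)-(4) concrete, the following is a minimal sketch of a forward pass through an MLP written with NumPy. The function names and the softmax output layer are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def relu(z):
    # Example activation function sigma; tanh or sigmoid could be used instead
    return np.maximum(0.0, z)

def mlp_forward(x, layers):
    """Forward pass through an MLP.

    x      : input feature vector X
    layers : list of (W, b) pairs; all but the last pair are hidden layers.
    """
    h = x
    for W, b in layers[:-1]:
        z = W @ h + b        # Eq. (1)/(3): Z^l = W^l H^(l-1) + b^l
        h = relu(z)          # Eq. (2)/(4): H^l = sigma(Z^l)
    W_out, b_out = layers[-1]
    z_out = W_out @ h + b_out
    # Softmax turns the final scores into class probabilities
    # (an assumption, since the output activation is not specified in the text).
    e = np.exp(z_out - z_out.max())
    return e / e.sum()
```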

2.2 k-Nearest Neighbor (kNN)

kNN classifies data by calculating the distance between the new data (recent occurrences) and the centroid of each grouped class, which is different from the method used in NN [29, 30]. This method categorizes the output data according to how similar it is to the previously computed training data output, taking into account the training process that was conducted. The distance between the new data point ($x$) that needs to be classified and each data point in the training set must first be computed. The literature review indicates that there are several ways to calculate this distance. The most utilized distance metric is the Euclidean distance [30], which is represented mathematically in Eq. (5). The Euclidean distance emphasizes the shortest path along a straight line, as illustrated in Figure 1. As a result, it works best for continuous data with the same scale and smooth changes between data points.

$d(x, y)=\sqrt{\sum_{i=1}^n\left(x_i-y_i\right)^2}$       (5)

Figure 1. Distance measurement using Euclidean distance

The Manhattan distance [31] is defined as the distance between two places along right-angled axes. As shown in Figure 2, the distance between A and B is measured based on the summation of the difference between two points on each axis. It is commonly compared to the distance you would walk in a city with a grid layout. It is the total of the absolute differences between their locations. Therefore, the Manhattan distance is useful when dealing with high-dimensional data or when it is necessary to total the differences between dimensions separately. The formula for Manhattan distance is presented in Eq. (6).

$d(x, y)=\sum_{i=1}^n\left|x_i-y_i\right|$           (6)

Figure 2. Distance measurement using Manhattan distance

The Manhattan and Euclidean distances are both generalized to form the Minkowski distance [32]. It can be seen in Figure 3 that the distance between information is measured by using the combination of previous distance measurement methods. It adds a parameter called p that allows for fine-tuning and selects the metric to be utilized. The formula for Minkowski distance is presented in Eq. (7).

$d(x, y)=\left(\sum_{i=1}^n\left|x_i-y_i\right|^p\right)^{\frac{1}{p}}$       (7)

Figure 3. Distance measurement using Minkowski distance

The Chebyshev distance [33], often known as the maximum distance, takes into account the greatest difference between any two places’ coordinates. Figure 4 shows the illustration of calculating the distance between information using Chebyshev. It calculates the greatest absolute difference in every dimension by using Eq. (8).

$d(x, y)=\max _i\left|x_i-y_i\right|$        (8)

Figure 4. Distance measurement using Chebyshev distance

The particular dataset and classification problem must be taken into consideration while selecting the distance metric. The weights, algorithm, and leaf size are the hyperparameters that affect its performance. Weights affect whether neighbors are taken into account uniformly or based on proximity when making predictions. Meanwhile, the algorithm influences the speed and efficiency of the search process. Lastly, when employing tree-based algorithms, the leaf size regulates the trade-off between search speed and accuracy.
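As an illustration of Eqs. (5)-(8), the four distance metrics can be written directly in NumPy; the function names below are illustrative, and the sample vectors are hypothetical.

```python
import numpy as np

def euclidean(x, y):            # Eq. (5)
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):            # Eq. (6)
    return np.sum(np.abs(x - y))

def minkowski(x, y, p=3):       # Eq. (7); p = 1 gives Manhattan, p = 2 gives Euclidean
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def chebyshev(x, y):            # Eq. (8)
    return np.max(np.abs(x - y))

x = np.array([0.83, 0.0, 0.75])
y = np.array([0.50, 0.0, 0.62])
print(euclidean(x, y), manhattan(x, y), minkowski(x, y), chebyshev(x, y))
```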

2.3 Logistic Regression

A statistical technique known as Logistic Regression models a categorical, typically binary, dependent variable [34]. It estimates the likelihood that an input falls into a specific category. To predict the likelihood of the positive class, Logistic Regression combines the sigmoid function and the linear regression equation. The probability of the positive class is defined by Eq. (9).

$P(y=1 \mid x)=\frac{1}{1+e^{-\left(W^T X+b\right)}}$          (9)

The penalty, solver, and regularization strength (C) parameters in Logistic Regression determine how the model responds to overfitting, how weights are adjusted, and how it strikes a compromise between prediction accuracy and model complexity. The penalty parameter (L1, L2, ElasticNet) defines the type of regularization performed on the model.

The optimization algorithm that finds the best-fitting weights and bias is determined by the solver parameter. In terms of the parameter C, a smaller C produces a simpler model with fewer large coefficients, which may help avoid overfitting. A greater C makes it possible for the model to match the training data more closely, which runs the risk of overfitting but may boost accuracy on the training set.

2.4 Bayesian Network

Bayesian Network is a directed acyclic graph (DAG)-based probabilistic graphical model that represents variables and their conditional dependencies. In a Bayesian Network, every node represents a variable, and direct dependencies between nodes are indicated by edges. It is helpful for inferring outcomes, making decisions in complicated systems, and reasoning under uncertainty since it applies Bayes’s Theorem to calculate the likelihood of events.

In this literature research, there are five distribution models used for classifying the traffic situation, namely, (1) Gaussian, (2) Multinomial, (3) Bernoulli, (4) Complement, and (5) Categorical Naive Bayes. Gaussian Naive Bayes assumes that the continuous features follow a Gaussian distribution. It is easy to implement and works well with small continuous datasets due to its simplicity. However, if the features are not normally distributed, the performance may degrade. The Gaussian likelihood of a feature value (xi), given a class (y) with mean ($\mu_y$) and variance ($\sigma_y^2$), is modeled as shown in Eq. (10).

$P\left(x_i \mid y\right)=\frac{1}{\sqrt{2 \pi \sigma_y^2}} \exp \left(-\frac{\left(x_i-\mu_y\right)^2}{2 \sigma_y^2}\right)$     (10)

Multinomial Naive Bayes is commonly used when the features represent counts (discrete non-negative integers), such as word frequencies in text classification. Since it requires discrete counts, it is unable to handle continuous variables directly without a preprocessing stage. The Multinomial likelihood, where xi is the count of the i-th feature and θy,i is the probability of observing feature i in class y, is modeled as shown in Eq. (11).

$P\left(x_i \mid y\right)=\frac{\theta_{y, i}^{x_i}}{x_{i}!}$           (11)

The Bernoulli Naive Bayes assumes binary features that take the value 0 or 1. It is commonly used for text classification, but it focuses on whether a word appears in a document rather than how often it appears. The Bernoulli likelihood of a binary feature value $x_i$ is modeled as shown in Eq. (12).

$P\left(x_i \mid y\right)=\theta_{y, i}^{x_i}\left(1-\theta_{y, i}\right)^{\left(1-x_i\right)}$           (12)

Complement Naive Bayes is designed to handle imbalanced data in each class. It computes the likelihood based on the complement of the data for each class. It is slightly more complicated to implement and computationally heavier than standard Multinomial Naive Bayes. The probability of a feature, given the complement of the class y, is modeled as shown in Eq. (13).

$P\left(x_i \mid \bar{y}\right)=\frac{\theta_{\bar{y}, i}^{x_i}}{x_{i}!}$     (13)

Categorical Naive Bayes assumes that features are categorical rather than numerical. It is suited to datasets whose features are discrete categories. The likelihood of each feature xi, where θy,i,v is the probability of feature xi taking value $v$ in class $y$, is calculated using Eq. (14).

$P\left(x_i \mid y=v\right)=\theta_{y, i, v}$      (14)

Each variant of Naive Bayes has its specific strengths based on the type of data, making it versatile for different classification tasks. The comparison of the 5-distribution models is displayed in the following Table 1.

Table 1. Bayesian distribution model comparison

| Distribution | Data Type | Use Case | Advantages | Disadvantages |
|---|---|---|---|---|
| Gaussian (GNB) | Continuous (numerical) | Classification with continuous features | Works well with continuous data, simple to use | Assumes normal distribution |
| Multinomial (MNB) | Discrete counts (integers) | Text or document classification | Effective for text data, handles high dimensions | Requires discrete count data |
| Bernoulli (BNB) | Binary (0 or 1) | Binary text/feature classification | Ideal for binary data, simple to implement | Limited to binary features |
| Complement (CNB) | Discrete counts, imbalanced | Text classification with imbalanced classes | Works well with imbalanced datasets | Assumes independence of features |
| Categorical (CatNB) | Categorical (discrete) | Data with finite discrete categories | Naturally handles categorical features | Cannot handle continuous data |

2.5 Decision Tree

The Decision Tree algorithm is a popular supervised machine learning technique that may be used for both classification and regression tasks [35]. It functions by progressively dividing a dataset into smaller and smaller subsets while building a corresponding Decision Tree. The product is a tree with decision nodes and leaf nodes. The criterion is a statistic used in Decision Trees to assess the quality of a split at each node. This metric aids in choosing the optimal feature and value to split the data on to produce homogeneous groups. The objective is to optimize the split's effectiveness, which improves the model's predictive capacity. Previous research works have identified two common criteria: Gini and Entropy.

Gini $=1-\sum_{i=1}^C\left(p_i\right)^2$              (15)

Entropy $=-\sum_{i=1}^C p_i \log _2\left(p_i\right)$           (16)

The Gini index, shown in Eq. (15), measures how effectively a split divides the classes; the lower the Gini value, the better the classification performance. The other criterion, Entropy, quantifies the amount of information obtained by splitting the dataset [35]; Eq. (16) is used to formulate the Entropy of each feature in the dataset. The lower the entropy after a split (i.e., the higher the information gain), the better the split. Here, pi is the probability of a data point belonging to class i, and C is the number of classes.
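A short sketch of Eqs. (15) and (16), computing both criteria from the class proportions at a node; the helper names are illustrative.

```python
import numpy as np

def gini(p):
    """Eq. (15): p is an array of class probabilities p_i at a node."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """Eq. (16): 0 * log2(0) is treated as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A pure node versus an evenly mixed three-class node
print(gini([1.0, 0.0, 0.0]), entropy([1.0, 0.0, 0.0]))   # 0.0, 0.0
print(gini([1/3, 1/3, 1/3]), entropy([1/3, 1/3, 1/3]))   # ~0.667, ~1.585
```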

2.6 Random Forest

Random Forest is categorized as an ensemble learning algorithm that combines multiple Decision Trees to improve the performance of a machine learning model. It operates by building numerous Decision Trees during training and combining their results (typically by averaging or majority voting) to produce a final output. For classification, given $N$ training samples $\left\{\left(x_i, y_i\right)\right\}_{i=1}^N$, where xi is a real-valued feature vector and yi is the class label, a Random Forest builds T Decision Trees, and each Decision Tree Tj outputs a predicted class $\hat{y}_i^{(j)}$ for a sample xi. The final predicted class $\hat{y}_i$ of the Random Forest is the mode (majority vote) of the individual tree predictions, measured using Eq. (17).

$\hat{y}_i=\operatorname{mode}\left(\hat{y}_i^{(1)}, \hat{y}_i^{(2)}, \ldots, \hat{y}_i^{(T)}\right)$          (17)

Random Forest works by building multiple Decision Trees, introducing randomness through bootstrapped sampling and random feature selection, and then combining the predictions from individual trees to make the final decision. Its performance depends on tuning hyperparameters like the number of trees, maximum depth, and feature selection.

The advancement of machine learning models resulted in increasingly complicated approaches to traffic prediction. Although these models increased prediction accuracy, they were frequently constrained by their great sensitivity to variations in traffic patterns and dependence on feature engineering. The weather, time of day, season, special events, and other factors all have an impact on the extremely variable traffic conditions in Bandung. As a result, the majority of traffic prediction models may not be completely adaptable to be used in Bandung. This study will experiment with several models and their configurations in competitive machine learning.

3. Proposed Method

In order to implement the competitive learning system concept, several machine learning methods are applied with various configurations. The goal is to find the best model and its configuration for each machine learning method. This research aims to classify the traffic conditions in Bandung City, Indonesia, using the proposed system shown in Figure 5. As seen in the figure, there are at least six common methods that could be used to classify road traffic. In the first stage of the proposed system, each method has several models with different configurations. Each model is trained and built using the training data. Then, a model is selected based on a performance comparison. Once the best model for each method is defined, the system enters the final stage of the competitive learning system by gathering the classification results from all selected models and finding the majority class, calculated using the testing data.

Figure 5. Proposed system

Referring to the literature review conducted in the previous section, the most common classification methods are NN [16], Decision Trees [17], kNN [18], Bayesian Networks [19], Logistic Regression [20], Random Forest [21], etc. The models that will be built are formed from these methods with various configurations. The configurations include the training approach, such as the number of hidden layers in NN, distance metrics in kNN, criterion calculation in the Decision Tree, etc.

As mentioned earlier, each model will be built based on various configurations. The model is trained by using a dataset that is divided into training and testing data. The training data itself will be divided into two sections, namely, training data and validation data. The competition of selecting the best model for each method will be done by comparing the performances of the model when tested using validation data.

In the final stage of this system, the chosen models classify the testing data. Each result is stored and counted by the system. The final classification is determined by finding the category with the maximum number of votes.

3.1 Dataset

The dataset used in this research covers 265 road segments on the main roads of Bandung City, Indonesia. The dataset is spread across 265 files, each already categorized for a specific road segment. To simplify the process, the dataset was modified by recompiling the files into one dataset and adding the origin and destination information for each road segment that appears in the original dataset. The observation area of the dataset is shown in Figure 6.

According to Nasution et al. [24], the dataset was collected in 2020. The features of the dataset are days (D), rush hour (RH), weather information (We), temperature (T), humidity (H), and traffic situation or density (TD). Additional features added to the latest dataset are the origination point (Ori) and the destination point (Dest). The modification made in this study merges the 265 road-segment files into a single dataset. Table 2 shows a sample of the dataset that will be used to classify the traffic situation in Bandung.

Figure 6. The observation area

Table 2. Sample dataset

| Ori | Dest | D | RH | We | T | H | TD |
|---|---|---|---|---|---|---|---|
| 43 | 55 | 0.83 | 0 | 0.75 | 0.73 | 0.23 | 1 |
| 52 | 51 | 0.5 | 0 | 0.50 | 0.79 | 0.04 | 0 |
| 58 | 12 | 0.33 | 0 | 0.50 | 0.76 | 0.55 | 2 |
| 45 | 64 | 0 | 1 | 0.62 | 0.76 | 0.55 | 0 |
| 33 | 32 | 0.33 | 0 | 0.50 | 0.76 | 0.55 | 2 |

The pre-processing stage is conducted on the modified dataset in order to reduce the classification error in training, validation, and testing. After the data finish the pre-processing stage, the dataset is shuffled to balance the class populations across the training, validation, and testing data. The dataset consists of 575,578 road records, which are split into 70% for training, 10% for validation, and the remaining 20% for testing.
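A minimal sketch of this shuffling and 70/10/20 split with scikit-learn (which the paper states it uses); the file path is hypothetical, the column names follow Table 2, and the stratified shuffling is an assumption about how the class balance is preserved.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("bandung_traffic.csv")        # hypothetical path to the merged dataset
X = df[["Ori", "Dest", "D", "RH", "We", "T", "H"]]
y = df["TD"]                                   # traffic density class

# Hold out 20% as test data, then take 1/8 of the remainder (10% overall) as validation.
X_trval, X_test, y_trval, y_test = train_test_split(
    X, y, test_size=0.20, shuffle=True, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_trval, y_trval, test_size=0.125, shuffle=True, stratify=y_trval, random_state=42)
```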

3.2 Machine learning model

3.2.1 Implementation of NN model

In this study, several configurations could be applied to the NN, as Gurcan et al. [36] did in 2021 by using one hidden layer consisting of 100 neurons. The hidden layer is activated using ReLU and optimized using the Adam optimizer. Finally, to get better classification results, the training process is set to 200 iterations with a learning rate of 0.001.

Other researchers implement other NN configurations [36] using more than one hidden layer with various numbers of neurons. He and Chen [28] used three hidden layers with 50 neurons in each layer. Gu et al. [37] also used three hidden layers, but with more neurons than He and Chen: 200, 100, and 50 neurons in the first, second, and third layers, respectively. As another configuration, Yu and Zhu [38] used 128, 64, and 32 neurons for the three hidden layers. These models were created using more than 300 iterations and various learning rates, from 0.001 to 0.01.

Anowar and Sadaoui [39] also developed an NN model that is slightly similar to that of Gurcan et al. [36], but it is built with 50 more neurons (150) and a 10-times larger learning rate (0.01). Anowar and Sadaoui [39] use an adaptive learning rate, so the model is able to adjust the learning rate as needed.

According to the literature review, the common activation functions used are ReLU, tanh, and sigmoid. Because of its simplicity and capacity to prevent vanishing gradients, the ReLU activation function has become the standard option for NN. Meanwhile, the common optimizers used in NN are Adam (A) and Stochastic Gradient Descent (SGD). In terms of the solver, SGD is slower but more generalizable than Adam, which converges quickly. The types of learning rate (LR) are categorized as Constant (C), Adaptive (Ad), and InvScaling (I). The configurations utilized in various research applying NN-based classification are shown in Table 3. Column 'LR Init.' in Table 3 shows the initialization of the learning rate, while column 'Iter.' shows the number of iterations.

Table 3. NN model’s configuration

| Study | HL | AF | Solver | LR | LR Init. | Iter. |
|---|---|---|---|---|---|---|
| [36] | 100 | ReLU | A | C | 0.001 | 200 |
| [28] | 50, 50, 50 | tanh | SGD | Ad | 0.01 | 300 |
| [37] | 200, 100, 50 | ReLU | A | I | 0.005 | 500 |
| [38] | 128, 64, 32 | ReLU | A | Ad | 0.001 | 300 |
| [39] | 150 | ReLU | SGD | Ad | 0.01 | 200 |
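For illustration, the configurations in Table 3 could be expressed with scikit-learn's MLPClassifier (which the paper states it uses) roughly as follows; the mapping of the table's abbreviations to hyperparameter values and the candidate names are assumptions made here for readability.

```python
from sklearn.neural_network import MLPClassifier

nn_candidates = {
    "NN-1 [36]": MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                               solver="adam", learning_rate="constant",
                               learning_rate_init=0.001, max_iter=200),
    "NN-2 [28]": MLPClassifier(hidden_layer_sizes=(50, 50, 50), activation="tanh",
                               solver="sgd", learning_rate="adaptive",
                               learning_rate_init=0.01, max_iter=300),
    "NN-3 [37]": MLPClassifier(hidden_layer_sizes=(200, 100, 50), activation="relu",
                               solver="adam", learning_rate="invscaling",
                               learning_rate_init=0.005, max_iter=500),
    "NN-4 [38]": MLPClassifier(hidden_layer_sizes=(128, 64, 32), activation="relu",
                               solver="adam", learning_rate="adaptive",
                               learning_rate_init=0.001, max_iter=300),
    "NN-5 [39]": MLPClassifier(hidden_layer_sizes=(150,), activation="relu",
                               solver="sgd", learning_rate="adaptive",
                               learning_rate_init=0.01, max_iter=200),
}
```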

3.2.2 Implementation of kNN model

As explained in the previous section, kNN works based on the distance between the centroid of each category and the data to be classified. The method counts the number of category members closest to the data point; the category with the maximum count is claimed as the result's category. It is common to use an odd number of nearest neighbors, such as 3, 5, 7, or 9. However, in the configurations compared here, the numbers of neighbors are set to 1, 5, 7, 10, and 15. Every kNN distance metric has benefits of its own. Euclidean Distance is appropriate for geometric or visual data, while Manhattan Distance is best suited for grid data, such as cities or transportation lines with block patterns. Minkowski Distance is adaptable to various forms of multidimensional data since it can be configured with a parameter to select the distance measurement. For grid-based applications, Chebyshev Distance is particularly helpful in identifying notable shifts in a single dimension among multi-dimensional data. The data structure and analysis objectives determine which metric is best, which might have an impact on the kNN model's accuracy and performance.

Table 4. kNN model’s configuration

| Study | N_Neighbors | Weights | Algorithm | Leaf Size | Metric |
|---|---|---|---|---|---|
| [29] | 5 | Distance | Ball_tree | 40 | Minkowski |
| [31] | 10 | Uniform | Kd_tree | 20 | Manhattan |
| [34] | 1 | Distance | Brute | - | Euclidean |
| [32] | 7 | Distance | Auto | 30 | Euclidean |
| [40] | 15 | Distance | Auto | 30 | Chebyshev |

Table 4 shows the various kNN configurations that were used in this research. As seen in the table, the distance metrics used to classify the data are Minkowski, Manhattan, Euclidean, and Chebyshev.
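A minimal sketch of how the Table 4 configurations might look as scikit-learn KNeighborsClassifier objects; the dictionary keys and the exact hyperparameter mapping are illustrative assumptions.

```python
from sklearn.neighbors import KNeighborsClassifier

knn_candidates = {
    "kNN-1 [29]": KNeighborsClassifier(n_neighbors=5, weights="distance",
                                       algorithm="ball_tree", leaf_size=40,
                                       metric="minkowski"),
    "kNN-2 [31]": KNeighborsClassifier(n_neighbors=10, weights="uniform",
                                       algorithm="kd_tree", leaf_size=20,
                                       metric="manhattan"),
    "kNN-3 [34]": KNeighborsClassifier(n_neighbors=1, weights="distance",
                                       algorithm="brute", metric="euclidean"),
    "kNN-4 [32]": KNeighborsClassifier(n_neighbors=7, weights="distance",
                                       algorithm="auto", leaf_size=30,
                                       metric="euclidean"),
    "kNN-5 [40]": KNeighborsClassifier(n_neighbors=15, weights="distance",
                                       algorithm="auto", leaf_size=30,
                                       metric="chebyshev"),
}
```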

3.2.3 Implementation of Logistic Regression model

In this method, the configurations differ in their penalty regularization, which is categorized as L1, L2, and ElasticNet, as previously explained. L1 regularization (Lasso) effectively selects features by adding an absolute penalty to coefficients, driving some to zero and promoting sparsity. This is effective when only a few variables are expected to be more significant, such as traffic situation or density. When several predictors (weather, temperature, days, humidity, rush hour, etc.) contribute, L2 regularization (Ridge) distributes the penalty more evenly and applies a squared penalty, decreasing coefficients towards zero but usually not to exactly zero. Elastic Net is especially helpful for datasets with highly correlated features, as it can improve stability and predictive performance by choosing groups of correlated features together. It does this by balancing sparsity and shrinkage through combining L1 and L2 penalties. Each compared model runs for 100 to 500 iterations before it is ready to be used as a classification system. The common methods for solving the regression are Limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS), Coordinate Descent (Liblinear), and Stochastic Average Gradient Augmented (SAGA). Table 5 shows the comparison between the configurations for Logistic Regression.

Table 5. Logistic Regression model’s configuration

| Study | Penalty | Solver | C | Iterations |
|---|---|---|---|---|
| [41] | L2 | Lbfgs | 1.0 | 100 |
| [42] | L1 | Liblinear | 0.5 | 100 |
| [43] | Elasticnet | Saga | 1.0 | 200 |
| [44] | L2 | Lbfgs | 1.0 | 100 |
| [45] | L2 | Saga | 0.1 | 500 |
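A corresponding sketch of the Table 5 configurations with scikit-learn's LogisticRegression; note that the elastic-net model needs an l1_ratio, which is not given in Table 5, so the value 0.5 below is an assumption.

```python
from sklearn.linear_model import LogisticRegression

lr_candidates = {
    "LR-1 [41]": LogisticRegression(penalty="l2", solver="lbfgs", C=1.0, max_iter=100),
    "LR-2 [42]": LogisticRegression(penalty="l1", solver="liblinear", C=0.5, max_iter=100),
    # l1_ratio=0.5 is an assumption; Table 5 does not specify it
    "LR-3 [43]": LogisticRegression(penalty="elasticnet", solver="saga", C=1.0,
                                    l1_ratio=0.5, max_iter=200),
    "LR-4 [44]": LogisticRegression(penalty="l2", solver="lbfgs", C=1.0, max_iter=100),
    "LR-5 [45]": LogisticRegression(penalty="l2", solver="saga", C=0.1, max_iter=500),
}
```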

3.2.4 Implementation of Bayesian Network model

The implementation of Bayesian Networks will follow the variants explained in the previous section, namely: Gaussian (GNB) [46], Multinomial (MNB) [47], Bernoulli (BNB) [48], Complement (CNB) [49], and Categorical (CatNB) [50]. Gaussian Naive Bayes is frequently used when features have a normal distribution, as in real-valued attributes like sensor data, and it performs well with continuous data. Multinomial Naive Bayes is perfect for the classification of texts (e.g., word frequency counts) because it works well with count-based data. Bernoulli Naive Bayes is frequently employed in document classification problems using binary word-presence characteristics because it is built for binary or Boolean data. Complement Naive Bayes works well with unbalanced classes and frequently increases text classification accuracy. Categorical Naive Bayes is used for problems where features are not numerical, such as demographics or nominal survey replies, and performs best for categorical features with numerous classes. Inferencing using Bayesian Networks with various configurations means the system concludes the classification using different formulas, as explained in the literature.
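The five variants map directly onto scikit-learn's Naive Bayes classifiers; the sketch below uses default hyperparameters, which is an assumption, and the dictionary keys are illustrative.

```python
from sklearn.naive_bayes import (GaussianNB, MultinomialNB, BernoulliNB,
                                 ComplementNB, CategoricalNB)

nb_candidates = {
    "BN-1 (GNB) [46]":   GaussianNB(),
    "BN-2 (MNB) [47]":   MultinomialNB(),    # expects non-negative, count-like features
    "BN-3 (BNB) [48]":   BernoulliNB(),      # binarizes features around a threshold
    "BN-4 (CNB) [49]":   ComplementNB(),     # geared towards imbalanced classes
    "BN-5 (CatNB) [50]": CategoricalNB(),    # expects integer-encoded categorical features
}
```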

3.2.5 Implementation of Decision Tree model

The Decision Tree is the most common method for classification systems and commonly delivers great performance on many problems. In order to build a Decision Tree, the criterion (C) used to evaluate each split is calculated using either the Gini Index or Entropy [35]. Gini Impurity, which ranges from 0 (pure) to 0.5 (maximally impure), determines the probability of incorrectly classifying a randomly selected element based on the node's class distribution. In contrast, entropy measures the uncertainty within a node and ranges from 0 (pure) to log(n), where n represents the number of classes. In general, the outcomes of both criteria are comparable: Entropy may be marginally more sensitive to shifts in the class distribution, while Gini is easier to compute and has a tendency to split more conservatively. Other important parameters required to construct the tree are the maximum tree depth (DoT), the minimum samples per split and per leaf (SpL), the maximum features and leaf nodes (FLN), the minimum impurity decrease (ID), and the class weight (Wc).

As an additional parameter, the Decision Tree is allowed to use a fixed random state (RS) to ensure the consistency of the classification results. Table 6 shows the configurations that were compared when the trained models were built.

Table 6. Decision Tree model’s configuration

| Study | C | DoT | SpL | FLN | ID | Wc | RS |
|---|---|---|---|---|---|---|---|
| [35] | Gini | - | 2, 1 | - | 0.0 | - | 42 |
| [51] | Entropy | 10 | 4, 2 | Sqrt, 20 | 0.01 | Balanced | - |
| [52] | Gini | 20 | 10, 5 | Log2, 50 | 0.0 | - | 42 |
| [53] | Entropy | 15 | 5, 3 | 0.5, - | 0.005 | - | - |
| [54] | Gini | - | 2, 1 | - | 0.01 | Balanced | - |
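A sketch of the Table 6 configurations with scikit-learn's DecisionTreeClassifier. Mapping SpL to (min_samples_split, min_samples_leaf), FLN to (max_features, max_leaf_nodes), and ID to min_impurity_decrease is an interpretation of the table's abbreviations, not something stated explicitly in the text.

```python
from sklearn.tree import DecisionTreeClassifier

dt_candidates = {
    "DT-1 [35]": DecisionTreeClassifier(criterion="gini", min_samples_split=2,
                                        min_samples_leaf=1, min_impurity_decrease=0.0,
                                        random_state=42),
    "DT-2 [51]": DecisionTreeClassifier(criterion="entropy", max_depth=10,
                                        min_samples_split=4, min_samples_leaf=2,
                                        max_features="sqrt", max_leaf_nodes=20,
                                        min_impurity_decrease=0.01,
                                        class_weight="balanced"),
    "DT-3 [52]": DecisionTreeClassifier(criterion="gini", max_depth=20,
                                        min_samples_split=10, min_samples_leaf=5,
                                        max_features="log2", max_leaf_nodes=50,
                                        random_state=42),
    "DT-4 [53]": DecisionTreeClassifier(criterion="entropy", max_depth=15,
                                        min_samples_split=5, min_samples_leaf=3,
                                        max_features=0.5, min_impurity_decrease=0.005),
    "DT-5 [54]": DecisionTreeClassifier(criterion="gini", min_samples_split=2,
                                        min_samples_leaf=1, min_impurity_decrease=0.01,
                                        class_weight="balanced"),
}
```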

3.2.6 Implementation of Random Forest model

Theoretically, a Random Forest is generated from multiple Decision Trees that combine their results to arrive at the final result. Consequently, the parameters used in the Random Forest are almost the same as those of the Decision Tree: the maximum depth (DoT), maximum features (FLN), minimum samples per leaf (SpL), and random state (RS). In this method, the most important parameter is the number of estimators (nE), which defines the number of Decision Trees used to generate the final result. The configurations utilized in various research works are shown in Table 7.

Table 7. Random Forest model’s configuration

| Study | nE | DoT | FLN | SpL | RS |
|---|---|---|---|---|---|
| [55] | 500 | 10 | sqrt | 5 | 42 |
| [56] | 200 | None | None | 2 | 42 |
| [46] | 300 | None | log2 | None | 42 |
| [57] | 1000 | 15 | 0.3 | None | 42 |
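A sketch of the Table 7 configurations with scikit-learn's RandomForestClassifier; as with the Decision Tree, the mapping of abbreviations to hyperparameters is an assumption, and "None" entries are simply left at their defaults.

```python
from sklearn.ensemble import RandomForestClassifier

rf_candidates = {
    "RF-1 [55]": RandomForestClassifier(n_estimators=500, max_depth=10,
                                        max_features="sqrt", min_samples_leaf=5,
                                        random_state=42),
    "RF-2 [56]": RandomForestClassifier(n_estimators=200, max_depth=None,
                                        max_features=None, min_samples_leaf=2,
                                        random_state=42),
    "RF-3 [46]": RandomForestClassifier(n_estimators=300, max_depth=None,
                                        max_features="log2", random_state=42),
    "RF-4 [57]": RandomForestClassifier(n_estimators=1000, max_depth=15,
                                        max_features=0.3, random_state=42),
}
```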

3.3 Competitive machine learning system

3.3.1 Performance measurement between methods

The implementation of each classification model discussed in the previous section is carried out at this stage. The performance of each model is measured in order to conclude which model shows the best performance among the others. It is measured on the validation data that is split from the training data. Thus, the performances (accuracy or other metrics) are compared, and the model with the best performance is selected as the representative of the method.

The implementation and testing are conducted on a computer with an M3 Pro processor and 18 GB of DDR5 memory. The competitive machine learning system is built with Python 3 and supported by the scikit-learn library. Figure 7 illustrates the first stage of the system proposed in this research. As seen in the figure, the classification models are compared against each other. Later, in the final stage of the system, the voting schemes are applied to decide the final classification.
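The first-stage competition can be sketched as a simple loop that trains every candidate of a method and keeps the one with the best validation accuracy; the dictionaries and variable names continue the earlier illustrative sketches and are assumptions, not the authors' exact code.

```python
def select_best(candidates, X_train, y_train, X_val, y_val):
    """Train every configuration of one method and keep the best on validation data."""
    best_name, best_model, best_acc = None, None, -1.0
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        acc = model.score(X_val, y_val)          # validation accuracy
        if acc > best_acc:
            best_name, best_model, best_acc = name, model, acc
    return best_name, best_model, best_acc

selected = {}
for method, candidates in [("NN", nn_candidates), ("kNN", knn_candidates),
                           ("LR", lr_candidates), ("BN", nb_candidates),
                           ("DT", dt_candidates), ("RF", rf_candidates)]:
    selected[method] = select_best(candidates, X_train, y_train, X_val, y_val)
```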

The performance metrics measured in this research cover the accuracy, precision, recall, F1-Score, Matthews Correlation Coefficient (MCC), training time (first stage), and testing time (final stage). Eqs. (18)-(22) show the formulations for accuracy, precision, recall, F1-Score, and MCC, respectively. These formulas require the values of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

$Accuracy =\frac{T P+T N}{T P+T N+F P+F N}$        (18)

$Precision =\frac{T P}{T P+F P}$         (19)

Recall $=\frac{T P}{T P+F N}$         (20)

$F 1-$ Score $=\frac{2 \times \text { Precision } \times \text { Recall }}{\text { Precision }+ \text { Recall }}$          (21)

$M C C=\frac{(T P \times T N)-(F P \times F N)}{\sqrt{(T P+F P)(T P+F N)(T N+F P)(T N+F N)}}$          (22)
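These metrics can be computed with scikit-learn's metrics module; for the multi-class traffic labels, a weighted average over classes is assumed below, since the averaging scheme is not stated explicitly in the text.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef)

def evaluate(model, X_val, y_val):
    y_pred = model.predict(X_val)
    return {
        "accuracy":  accuracy_score(y_val, y_pred),                       # Eq. (18)
        "precision": precision_score(y_val, y_pred, average="weighted"),  # Eq. (19)
        "recall":    recall_score(y_val, y_pred, average="weighted"),     # Eq. (20)
        "f1":        f1_score(y_val, y_pred, average="weighted"),         # Eq. (21)
        "mcc":       matthews_corrcoef(y_val, y_pred),                    # Eq. (22)
    }
```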

Figure 7. Competitive machine learning system

3.3.2 Majority voting system

According to the result from the first stage of the competitive machine learning system, the chosen models from each method are prepared to classify the brand-new data test. Every model must deliver the classification result and report it to the voting system, which is the final stage of the competitive machine learning system. In the implementation of the voting system, there are two concepts to decide the final result from the system, namely: (1) non-weighted and (2) weighted voting system. Overall, the final stage of the system is illustrated in Figure 8.

Figure 8. Voting scheme in competitive machine learning system

In the non-weighted system, each of the N classification models has an equal impact on the final classification result (Y) for each traffic condition category (j) in the dataset. When the classification from a chosen model (ci) equals a specific category (vj), the system records the vote using the Kronecker delta function δ(ci, vj), and it continues to accumulate votes whenever results match a category. The number of votes is calculated using Eq. (23), where the delta equals 1 when ci = vj and zero otherwise.

$Y_{\left(v_j\right)}=\sum_{i=1}^N \delta\left(c_i, v_j\right)$      (23)

On the other hand, in the weighted voting system each model has a different impact on the final result. In this method, every model has a weight (wi) taken from its validation accuracy. This affects how the result is concluded from the votes in each category. Eq. (24) is used to measure the weighted votes; the result is defined not by the number of voters but by the total weight of similar classification results.

$Y_{\left(v_j\right)}=\sum_{i=1}^N w_i \times \delta\left(c_i, v_j\right)$             (24)

Once all data have been tested, the system decides its conclusion (cfin) by finding the category with the maximum number of votes using Eq. (25).

$c_{f i n}=\underset{j}{\arg \max } Y_{\left(v_j\right)}$              (25)

Based on these measurements, this learning system is able to decide the final classification result from various machine learning methods.
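A minimal sketch of Eqs. (23)-(25): for one test sample, each selected model casts a vote, and the final class is the one with the largest (optionally weighted) tally. Passing weights=None gives the non-weighted scheme, while passing the validation accuracies gives the weighted scheme; the function and variable names are illustrative.

```python
import numpy as np

def majority_vote(sample, models, weights=None):
    """models: list of fitted classifiers; weights: per-model weights w_i, or None."""
    votes = np.array([m.predict(sample.reshape(1, -1))[0] for m in models])
    if weights is None:
        weights = np.ones(len(models))             # Eq. (23): every model counts equally
    weights = np.asarray(weights, dtype=float)
    tally = {}
    for c in np.unique(votes):
        tally[c] = np.sum(weights * (votes == c))  # Eq. (24): sum of w_i * delta(c_i, v_j)
    return max(tally, key=tally.get)               # Eq. (25): arg max over categories

# Non-weighted vs. weighted decision for one test record
# (weights taken from the validation accuracies of the six chosen models):
# c_nw = majority_vote(x_row, chosen_models)
# c_w  = majority_vote(x_row, chosen_models,
#                      weights=[0.662, 0.705, 0.444, 0.461, 0.722, 0.726])
```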

4. Results and Discussion

In this section, the implementation results of competitive learning for all stages are discussed. First, the selection of the best model for each method is discussed, followed by the final classification decision using the voting system.

4.1 Machine learning performance

4.1.1 Performance of Neural Networks

In this section, the performance measurement for NN is conducted. The models are implemented based on the configurations explained previously. In general, NN needs a longer training time than the other methods; as shown in Table 8, the training time for the NN models ranges from 85.35 s to 866.714 s. The training time itself is influenced by the configuration. Even though it needs a longer time to build the model, it still delivers 61%-66% for accuracy, precision, recall, and F1-Score.

Table 8. Performance of NN models

| Model | Acc. (%) | Prec. (%) | Rec. (%) | F1-Score (%) | MCC | TTrain (s) |
|---|---|---|---|---|---|---|
| 1 | 61.127 | 61.247 | 61.127 | 60.89 | 0.483 | 85.35 |
| 2 | 66.134 | 66.606 | 66.134 | 66.232 | 0.549 | 673 |
| 3 | 64.911 | 65.094 | 64.911 | 64.804 | 0.533 | 866.714 |
| 4 | 65.034 | 65.148 | 65.034 | 64.996 | 0.534 | 321.552 |
| 5 | 62.048 | 62.014 | 62.048 | 61.901 | 0.494 | 156.302 |

The average MCC value of the models is around 0.5 points, meaning the models are categorized as good but not perfect, since there is still a probability of misclassification. Figure 9 illustrates the NN performance measurements. The major shortcoming of this method is the training time it takes to build a model compared with the other methods.

Figure 9. Performance measurements for NN

4.1.2 Performance of k-Nearest Neighbor

This method delivers slightly better performance results than the NN in classifying using the traffic condition dataset. It has a performance range of 64% to 70% for accuracy, precision, recall, and F1-Score. Table 9 shows the detailed testing result for the kNN with various configurations.

In terms of training time, this method takes only 1 to 18 seconds to build the various kNN models. Figure 10 illustrates the performance comparison of the kNN models developed with the various configurations. Meanwhile, the MCC values range from 0.53 to 0.607 points; their interpretation is similar to that of the previous method, although kNN performs slightly better than the NN.

Figure 10. Performance measurements for kNN

Table 9. Performance of kNN models

| Model | Acc. (%) | Prec. (%) | Rec. (%) | F1-Score (%) | MCC | TTrain (s) |
|---|---|---|---|---|---|---|
| 1 | 69.446 | 69.673 | 69.446 | 69.41 | 0.593 | 18.029 |
| 2 | 69.003 | 69.345 | 69.003 | 69.014 | 0.588 | 4.319 |
| 3 | 64.775 | 64.818 | 64.775 | 64.792 | 0.53 | 10.675 |
| 4 | 69.993 | 70.206 | 69.993 | 69.968 | 0.601 | 3.326 |
| 5 | 70.509 | 70.726 | 70.509 | 70.502 | 0.607 | 1.54 |

4.1.3 Performance of Logistic Regression

This method requires significant effort to classify the test data, yet its performance is below that of the previous methods. It only reaches 44% for accuracy and recall, around 43%-44% for precision, and only 43% for the harmonic mean (F1-Score). Table 10 provides detailed experimental results for the models built using Logistic Regression, while Figure 11 illustrates the primary performance outcomes in graphical form.

Table 10. Performance of Logistic Regression models

| Model | Acc. (%) | Prec. (%) | Rec. (%) | F1-Score (%) | MCC | TTrain (s) |
|---|---|---|---|---|---|---|
| 1 | 44.248 | 43.813 | 44.248 | 43.336 | 0.258 | 2.334 |
| 2 | 44.384 | 44.036 | 44.384 | 43.432 | 0.26 | 8.957 |
| 3 | 44.303 | 43.892 | 44.303 | 43.432 | 0.258 | 3.412 |
| 4 | 44.076 | 43.785 | 44.076 | 43.515 | 0.255 | 2.557 |
| 5 | 44.294 | 43.883 | 44.294 | 43.424 | 0.258 | 2.313 |

Figure 11. Performance measurements for Logistic Regression

The main advantage of this method is that training the data and building the model takes less than 10 seconds. However, the faster processing time does not imply greater performance, since this method's limitations are its linearity assumptions, sensitivity to outliers, and poor handling of complex or high-dimensional data, making it less effective than advanced models for non-linear or large-scale problems. The MCC only reaches around 0.25 points, which categorizes this as a low-performance method, since there were many mistakes in the classification results. Overall, this method is still feasible to use in the system.

4.1.4 Performance of Bayesian Networks

The Bayesian Networks are able to build a model in less than a second, even faster than generating the Logistic Regression model. Commonly, this method classifies better when the training data have a balanced distribution and less variation, which is the main limitation of this method.

The MCC values of the Bayesian Networks are similar to those of the previous method, at around 0.22 to 0.29 points. Table 11 shows the detailed training and validation results of the various Bayesian Network configurations. Figure 12 shows the comparison of accuracy, precision, recall, and F1-Score for each configuration. As seen in the figure, the first model dominates the others, especially in precision and F1-Score.

Table 11. Performance of Bayesian Networks models

| Model | Acc. (%) | Prec. (%) | Rec. (%) | F1-Score (%) | MCC | TTrain (s) |
|---|---|---|---|---|---|---|
| 1 | 46.085 | 45.911 | 46.085 | 43.143 | 0.29 | 0.137 |
| 2 | 39.735 | 30.255 | 39.735 | 27.267 | 0.233 | 0.149 |
| 3 | 39.713 | 33.8 | 39.713 | 26.83 | 0.234 | 0.173 |
| 4 | 39.919 | 31.697 | 39.919 | 28.238 | 0.234 | 0.167 |
| 5 | 41.672 | 42.233 | 41.672 | 39.792 | 0.228 | 0.227 |

Figure 12. Performance measurements for Bayesian Networks

4.1.5 Performance of Decision Tree

The Decision Tree is famous for its performance, which often appears above average. In addition, the high performance is supported by the short processing time needed to build the tree. The disadvantage of this method is that it tends to deliver overfitting results; however, with the model configurations in this research, the performances do not show any overfitting. The accuracy, precision, recall, F1-Score, and MCC are around 42%-72%, 37%-72%, 42%-72%, 36%-72%, and 0.24-0.632, respectively.

Figure 13. Performance measurements for DT

As for the MCC values, several configurations put the model in a bad condition, with only 0.24 points, whereas another model reaches an MCC of 0.632, showing that the best-performing configuration is categorized as good. Table 12 and Figure 13 show the experimental results of the Decision Tree. According to the results, it can be seen that the first model is the best and will represent the Decision Tree in the final stage of the competitive machine learning system.

Table 12. Performance of Decision Tree models

| Model | Acc. (%) | Prec. (%) | Rec. (%) | F1-Score (%) | MCC | TTrain (s) |
|---|---|---|---|---|---|---|
| 1 | 72.28 | 72.687 | 72.28 | 72.287 | 0.632 | 0.574 |
| 2 | 42.564 | 45.009 | 42.564 | 42.105 | 0.24 | 0.218 |
| 3 | 54.674 | 54.84 | 54.674 | 53.558 | 0.402 | 0.175 |
| 4 | 50.468 | 51.629 | 50.468 | 44.879 | 0.359 | 0.214 |
| 5 | 44.804 | 37.369 | 44.804 | 36.597 | 0.292 | 0.255 |

4.1.6 Performance of Random Forest

At a glance, this method delivers better results compared to other methods when classifying the traffic condition. On average, the accuracy, precision, recall, and F1-Score are higher than 65% for each model configuration. Table 13 shows the Random Forest performance for four different configurations. The training time that Random Forest takes is categorized as acceptable. Figure 14 shows the comparison of the main parameters of Random Forest’s models.

The MCC values for the Random Forest range from 0.538 to 0.636 points, the best among the methods. Unfortunately, this method takes slightly longer to build; the training time is around 35 to 145 seconds. This happens because Random Forest is a modification of the Decision Tree, so it needs to build several trees before deciding the final result.

Table 13. Performance of Random Forest models

| Model | Acc. (%) | Prec. (%) | Rec. (%) | F1-Score (%) | MCC | TTrain (s) |
|---|---|---|---|---|---|---|
| 1 | 65.255 | 65.447 | 65.255 | 65.067 | 0.538 | 56.549 |
| 2 | 72.601 | 72.997 | 72.601 | 72.595 | 0.636 | 35.023 |
| 3 | 72.527 | 72.856 | 72.527 | 72.507 | 0.635 | 54.944 |
| 4 | 71.008 | 71.177 | 71.008 | 70.981 | 0.614 | 145.444 |

Figure 14. Performance measurements for RF

4.2 Model’s configuration selection

Based on the models built in the previous section, the model with the best performance (accuracy) is used as the representative of each method. According to the accuracies already acquired, the machine learning models that will be used in the voting system are (1) NN model 2; (2) kNN model 5; (3) Logistic Regression model 2; (4) Bayesian Networks model 1; (5) Decision Tree model 1; and (6) Random Forest model 2. Table 14 shows the compiled models that represent each machine learning method.

Table 14. Performance of methods for competitive machine learning system

| Model | Acc. (%) | Prec. (%) | Rec. (%) | F1-Score (%) | MCC | TTrain (s) |
|---|---|---|---|---|---|---|
| NN-2 | 66.2 | 66.6 | 66.2 | 66.3 | 0.55 | 673 |
| KNN-5 | 70.5 | 70.7 | 70.5 | 70.5 | 0.607 | 1.54 |
| LR-2 | 44.4 | 44 | 44.4 | 43.4 | 0.26 | 8.957 |
| BN-1 | 46.1 | 45.9 | 46.1 | 43.1 | 0.29 | 0.137 |
| DT-1 | 72.2 | 72.6 | 72.2 | 72.2 | 0.632 | 0.574 |
| RF-2 | 72.6 | 73 | 72.6 | 72.6 | 0.636 | 35.023 |

4.3 Majority voting system performance results

In this section, the voting system is implemented, analyzed, and discussed. As mentioned earlier, two voting schemes are tried to decide the final classification result. In the experiment, test sets of 1000, 2000, 5000, and 10000 records are used.

4.3.1 Non-weighted voting system

At first, the non-weighted voting system was implemented to improve the classification system's performance. As shown in Table 15, the accuracy, precision, recall, and F1-Score are 68.6%-69.3%, 68.9%-69.6%, 68.6%-69.3%, and 68%-69.3%, respectively. These values improve on the performances of NN, Logistic Regression, and Bayesian Networks, even though they are slightly lower than those of kNN, Decision Tree, and Random Forest. Figure 15 shows the performance comparison of the non-weighted voting system for various numbers of test data.

The MCC value of the voting result is around 0.58-0.59 points, roughly twice the MCC of the Logistic Regression and Bayesian Networks models. It can be concluded that the non-weighted voting system can be used as a classification method since it has good performance.

Figure 15. Classification results based on non-weighted voting system

From the point of view of processing time, the voting system needs 165.619 seconds to classify 1000 records, meaning a single record is classified in 0.165619 seconds. The other test sizes take 0.16131 s, 0.1708618 s, and 0.173875 s per record for 2000, 5000, and 10000 records, respectively. The average processing time for the non-weighted voting system is 0.1679185 s per record.

Table 15. Performance of competitive machine learning system using non-weighted voting

| Testing | Acc (%) | Prec (%) | Rec (%) | F1-Score (%) | MCC | TTesting (s) |
|---|---|---|---|---|---|---|
| 1000 | 68.6 | 68.9 | 68.6 | 68.6 | 0.582 | 165.619 |
| 2000 | 68.9 | 69 | 68.8 | 68 | 0.585 | 322.636 |
| 5000 | 68.7 | 69 | 68.7 | 68.7 | 0.583 | 854.309 |
| 10000 | 69.3 | 69.6 | 69.3 | 69.3 | 0.591 | 1738.75 |

4.3.2 Weighted voting system

Based on each method's performance, it seems unfair for every method to have the same impact on the voting system. In this section, the weighted voting system is implemented. The weight used in this system is taken from the accuracy of each method. The weights are then summed whenever several methods vote for the same category in each test.

Table 16 shows the weighted voting system results on the test data. As seen in the table, when the system classifies 1000 records from the dataset, the accuracy, precision, recall, and F1-Score reach more than 71.9%, with an MCC value of 0.627 points. These results are 3.1 points better in accuracy and 0.04 points better in MCC than the non-weighted version. Overall, the performance of this method is better than that of the other methods discussed in this paper. Figure 16 shows the performances for the various numbers of test data.

Table 16. Performance of competitive machine learning system using weighted voting

| Testing | Acc (%) | Prec (%) | Rec (%) | F1-Score (%) | MCC | TTesting (s) |
|---|---|---|---|---|---|---|
| 1000 | 71.9 | 72.5 | 71.9 | 71.9 | 0.627 | 166.27 |
| 2000 | 72.5 | 72.8 | 72.4 | 72.4 | 0.634 | 330.056 |
| 5000 | 72 | 72.5 | 72 | 72 | 0.628 | 831.303 |
| 10000 | 72.3 | 72.7 | 72.3 | 72.4 | 0.632 | 1629.126 |

Figure 16. Classification results based on weighted voting system

The classification results came after 166.27, 330.056, 831.303, and 1629.126 seconds for 1000, 2000, 5000, and 10000 test records, meaning a single record takes 0.16627, 0.165028, 0.1662606, and 0.1629126 seconds to process, respectively. On average, it takes 0.1651178 seconds to classify a traffic situation.

Based on the implementation, testing, and discussion above, the competitive machine learning system is able to classify the traffic situation with good performance. It may be categorized as a non-perfect system, since there are still classification mistakes. The methods used in the voting system have accuracies of only around 60%-72%, and some reach only around 40%. These voting methods are able to define the class of the test data even though several of the underlying methods are categorized as low-performance. The non-weighted voting system can enhance the methods' performance, but the weighted voting system does it better. In the final stage of the competitive machine learning system, the weighted voting system is therefore preferred over the non-weighted one.

The findings in the proposed system must be validated by traffic regulators before it can be applied in the real world. According to the performance testing, the system’s accuracy is acceptable since it is neither categorized as overfitting nor underfitting. For broader applications, this traffic classification system could be adapted as a predictive tool capable of forecasting traffic conditions several minutes in advance. This feature could support traffic management systems, suggest tourist itineraries based on departure time, or benefit other traffic-related applications.

This research was conducted in Bandung City, Indonesia, where the challenge lies in the relatively limited variety of road types compared to larger cities with numerous alleyways. Despite this, the proposed system can be effectively applied to predict traffic congestion in cities with similar characteristics, provided that an appropriate dataset is available for analysis.

5. Conclusion and Suggestions

This research proposes a competitive machine learning system that applies machine learning models to classify the traffic situation. There are 29 model configurations that spread into six machine learning methods, namely NN, kNN, Logistic Regression, Bayesian Networks, Decision Tree, and Random Forest. The best model from each method – with accuracy of each model 66.2%, 70.5%, 44.4%, 46.1%, 72.2%, and 72.6%, respectively – is taken to be used in the voting system stage, which consists of two different schemes: non-weighted and weighted system.

Based on the experimental results, the non-weighted system delivers a classification result in 0.1679185 seconds per sample with an accuracy between 68.6% and 69.3%. The weighted system delivers its result 0.0028007 seconds faster, with an accuracy between 71.9% and 72.5%, which is also better than the non-weighted system, and its MCC is 0.04 points higher. Based on these performance results, the weighted voting system is preferred over the non-weighted one for this proposed learning system.

For future research, implementing the competitive machine learning system on distributed computing devices could significantly reduce classification time by spreading the computational load across multiple machines. Testing the model in cities with different traffic patterns would help assess its adaptability and reveal any adjustments needed for new environments. Integrating real-time data from additional sources, such as anonymized GPS traces, IoT devices, or even satellite data feeds, could further improve the system's accuracy and responsiveness, making it especially valuable for dynamic traffic management. Finally, scaling the model to larger cities with complex road networks, with a focus on processing speed and computational efficiency, is another interesting topic for further study.

Funding

This paper was supported by The Ministry of Education, Culture, Research, and Technology, Republic of Indonesia (Kementerian Pendidikan, Kebudayaan, Riset, dan Teknologi; Kemdikbudristek) via DRTPM 2024 (Decision Letter No.: 0459/E5/PG.02.00/2024), with main Contract No.: 106/E5/PG.02.00.PL/2024 and derivative Contract No.: 43/SP2H/RT-MONO/LL4/2024; 060/LIT07/PPM-LIT/2024.

References

[1] Stodola, J., Stodola, P., Furch, J. (2022). Intelligent transport systems. In Proceedings of 3rd International Conference CNDGS’2022, pp. 41-49. https://doi.org/10.47459/cndcgs.2022.5

[2] Grumert, E., Ma, X.L., Tapani, A. (2015). Analysis of a cooperative variable speed limit system using microscopic traffic simulation. Transportation Research Part C: Emerging Technologies, 52: 173-186. https://doi.org/10.1016/j.trc.2014.11.004

[3] Bose, A., Ioannou, P. (2003). Mixed manual/semi-automated traffic: A macroscopic analysis. Transportation Research Part C: Emerging Technologies, 11(6): 439-462. https://doi.org/10.1016/j.trc.2002.04.001

[4] Lu, M. (2019). Cooperative intelligent transport systems. Institution of Engineering and Technology. https://doi.org/10.1049/PBTR025E

[5] Dimitrakopoulos, G., Uden, L., Varlamis, I. (2020). The Future of Intelligent Transport Systems. Elsevier. https://doi.org/10.1016/C2018-0-02715-2

[6] Hina, M.D., Soukane, A., Ramdane-Cherif, A. (2022). Computational intelligence in intelligent transportation systems: An overview. In: Tomar, R., Hina, M.D., Zitouni, R., Ramdane-Cherif, A. (eds) Innovative Trends in Computational Intelligence. EAI/Springer Innovations in Communication and Computing. Springer, Cham, pp. 27-43. https://doi.org/10.1007/978-3-030-78284-9_2

[7] Sabirov, A.I., Katasev, A.S., Dagaeva, M.V. (2021). A neural network model for traffic signs recognition in intelligent transport systems. Computer Research and Modeling, 13(2): 429-435. https://doi.org/10.20537/2076-7633-2021-13-2-429-435

[8] Khalifa, A.B., Alouani, I., Mahjoub, M.A., Rivenq, A. (2020). A novel multi-view pedestrian detection database for collaborative intelligent transportation systems. Future Generation Computer Systems, 113: 506-527. https://doi.org/10.1016/j.future.2020.07.025

[9] Lilhore, U.K., Imoize, A.L., Li, C.T., Simaiya, S., Pani, S.K., Goyal, N., Kumar, A., Lee, C.C. (2022). Design and implementation of an ML and IoT based adaptive traffic-management system for smart cities. Sensors, 22(8): 2908. https://doi.org/10.3390/s22082908

[10] Huang, K., Jiang, C., Li, P., Shan, A., Wan, J., Qin, W.H. (2022). A systematic framework for urban smart transportation towards traffic management and parking. Electronic Research Archive, 30(11): 4191-4208. https://doi.org/10.3934/era.2022212

[11] Gholamhosseinian, A., Seitz, J. (2021). Vehicle classification in intelligent transport systems: An overview, methods and software perspective. IEEE Open Journal of Intelligent Transportation Systems, 2: 173-194. https://doi.org/10.1109/OJITS.2021.3096756

[12] Sujatha, A., Suguna, R., Jothilakshmi, R., Rani, K.P., Mujawar, R.Y., Prabagaran, S. (2023). Traffic congestion detection and alternative route provision using machine learning and IoT-based surveillance. Journal of Machine and Computing, 3(4): 475-485. https://doi.org/10.53759/7669/jmc202303039

[13] Rauniyar, A., Berge, T., Kuijpers, A., Litzinger, P., Peeters, B., Gils, E.V., Kirchhoff, N., Hakegard, J.E. (2023). NEMO: Real-time noise and exhaust emissions monitoring for sustainable and intelligent transportation systems. IEEE Sensor Journal, 23(20): 25497-25517. https://doi.org/10.1109/JSEN.2023.3312861

[14] Poole, A., Kotsialos, A. (2016). Swarm intelligence algorithms for macroscopic traffic flow model validation with automatic assignment of fundamental diagrams. Applied Soft Computing Journal, 38: 134-150. https://doi.org/10.1016/j.asoc.2015.09.011

[15] Husni, E., Nasution, S.M., Kuspriyanto, Yusuf, R. (2020). Predicting traffic conditions using knowledge-growing Bayes classifier. IEEE Access, 8: 191510-191518. https://doi.org/10.1109/ACCESS.2020.3032230

[16] Marquina-Araujo, J.J., Cotrina-Teatino, M.A., Cruz-Galvez, J.A., Noriega-Vidal, E.M., Vega-Gonzalez, J.A. (2024). Application of Autoencoders Neural Network and K-Means clustering for the definition of geostatistical estimation domains. Mathematical Modelling of Engineering Problems, 11(5): 1207-1218. https://doi.org/10.18280/mmep.110509

[17] Sarkar, D., Bali, R., Sharma, T. (2018). Practical Machine Learning with Python. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3207-1

[18] Mante, J., Kolhe, K. (2024). Ensemble of tree classifiers for improved DDoS attack detection in the Internet of Things. Mathematical Modelling of Engineering Problems, 11(9): 2355-2367. https://doi.org/10.18280/mmep.110909

[19] Ng, A. (2018). Machine Learning Yearning: Technical Strategy for AI Engineers, in the Era of Deep Learning. https://www.dbooks.org/machine-learning-yearning-1501/.

[20] James, G., Witten, D., Hastie, T., Tibshirani, R. (2017). An Introduction to Statistical Learning with Applications in R. Springer New York, NY. https://doi.org/10.1007/978-1-0716-1418-1

[21] Hastie, T., Tibshirani, R., Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer New York, NY. https://doi.org/10.1007/978-0-387-84858-7

[22] Nasution, S.M., Husni, E., Yusuf, R., Kuspriyanto. (2020). Semi-ensemble learning using neural network for classifying traffic condition. In 2020 International Conference on Information Technology Systems and Innovation (ICITSI), Bandung, Indonesia, pp. 443-448. https://doi.org/10.1109/ICITSI50517.2020.9264956

[23] Olugbade, S., Ojo, S., Imoize, A.L., Isabona, J., Alaba, M.O. (2022). A review of artificial intelligence and machine learning for incident detectors in road transport systems. Mathematical and Computational Applications, 27(5): 77. https://doi.org/10.3390/mca27050077

[24] Nasution, S.M., Husni, E., Kuspriyanto, K., Yusuf, R. (2023). Heterogeneous traffic condition dataset collection for creating road capacity value. Big Data and Cognitive Computing, 7(1): 40. https://doi.org/10.3390/bdcc7010040

[25] Farda, M., Balijepalli, C. (2018). Exploring the effectiveness of demand management policy in reducing traffic congestion and environmental pollution: Car-free day and odd-even plate measures for Bandung city in Indonesia. Case Studies on Transport Policy, 6(4): 577-590. https://doi.org/10.1016/j.cstp.2018.07.008

[26] Gururaj, H.L., Janhavi, V., Tanuja, U., Flamini, F., Soundarya, B.C., Ravi, V. (2022). Predicting traffic accidents and their injury severities using machine learning techniques. International Journal of Transport Development and Integration, 6(4): 363-377. https://doi.org/10.2495/TDI-V6-N4-363-377

[27] Ibrahim, N.A., Subramaniam, A., Walker, P., Jabar, S.N., Rahman, S.A. (2023). Development and prediction of Kuala Terengganu driving cycle via long short-term memory recurrent neural network. International Journal of Transport Development and Integration, 7(2): 105-111. https://doi.org/10.18280/ijtdi.070205

[28] He, X., Chen, Y.S. (2021). Modifications of the multi-layer perceptron for hyperspectral image classification. Remote Sensing, 13(17): 3547. https://doi.org/10.3390/rs13173547

[29] Mullick, S.S., Datta, S., Das, S. (2018). Adaptive learning-based k-nearest neighbor classifiers with resilience to class imbalance. IEEE Transactions on Neural Networks and Learning Systems, 29(11): 5713-5725. https://doi.org/10.1109/TNNLS.2018.2812279

[30] Nasution, I.S., Delima, D.P., Zaidiyah, Z., Fadhil, R. (2022). A low cost electronic nose system for classification of Gayo arabica coffee roasting levels using stepwise linear discriminant and K-nearest neighbor. Mathematical Modelling of Engineering Problems, 9(5): 1271-1276. https://doi.org/10.18280/mmep.090514

[31] Nurwanto, F., Ardiyanto, I., Wibirama, S. (2016). Light sport exercise detection based on smartwatch and smartphone using k-nearest neighbor and dynamic time warping algorithm. In 2016 8th International Conference on Information Technology and Electrical Engineering (ICITEE), Yogyakarta, Indonesia, pp. 1-5. https://doi.org/10.1109/ICITEED.2016.7863299

[32] Gallego, A.J., Calvo-Zaragoza, J., Valero-Mas, J.J., Rico-Juan, J.R. (2018). Clustering-based k-nearest neighbor classification for large-scale data with neural codes representation. Pattern Recognition, 74: 531-543. https://doi.org/10.1016/j.patcog.2017.09.038

[33] Iwasaki, M., Miyazaki, D. (2018). Optimization of indexing based on k-nearest neighbor graph for proximity search in high-dimensional data. arXiv preprint arXiv:1810.07355. https://doi.org/10.48550/arXiv.1810.07355

[34] Nababan, A.A., Sutarman, Zarlis, M., Nababan, E.B. (2024). Multiclass logistic regression classification with PCA for imbalanced medical datasets. Mathematical Modelling of Engineering Problems, 11(9): 2377-2387. https://doi.org/10.18280/mmep.110911

[35] Priyanka, Kumar, D. (2020). Decision tree classifier: A detailed survey. International Journal of Information and Decision Sciences, 12(3): 246-269. https://doi.org/10.1504/ijids.2020.108141

[36] Gurcan, O.F., Beyca, O.F., Dogan, O. (2021). A comprehensive study of machine learning methods on diabetic retinopathy classification. International Journal of Computational Intelligence Systems, 14(1): 1132-1141. https://doi.org/10.2991/IJCIS.D.210316.001

[37] Gu, B., Liu, G.D., Zhang, Y.F., Geng, X., Huang, H. (2021). Optimizing large-scale hyperparameters via automated learning algorithm. arXiv preprint arXiv:2102.09026. https://doi.org/10.48550/arXiv.2102.09026

[38] Yu, T., Zhu, H. (2020). Hyper-parameter optimization: A review of algorithms and applications. arXiv preprint arXiv:2003.05689. https://doi.org/10.48550/arXiv.2003.05689

[39] Anowar, F., Sadaoui, S. (2021). Incremental learning framework for real-world fraud detection environment. Computational Intelligence, 37(1): 635-656. https://doi.org/10.1111/coin.12434

[40] Liu, W., Chawla, S. (2011). Class confidence weighted KNN algorithms for imbalanced data sets. In Advances in Knowledge Discovery and Data Mining: 15th Pacific-Asia Conference, PAKDD 2011, Proceedings, Part II 15, Shenzhen, China, pp. 345-356. https://doi.org/10.1007/978-3-642-20847-8_29

[41] Behunkou, U.I., Kovalyov, M.Y. (2023). Loan classification using logistic regression. Informatics, 20(1): 55-74. https://doi.org/10.37661/1816-0301-2023-20-1-55-74

[42] Kayabol, K. (2020). Approximate sparse multinomial logistic regression for classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2): 490-493. https://doi.org/10.1109/TPAMI.2019.2904062

[43] Solimun, Fernandes, A.A.R. (2024). Ensemble bagging discriminant and logistic regression in classification analysis. New Mathematics and Natural Computation, 1-21. https://doi.org/10.1142/S1793005725500061

[44] Mutis, M., Beyaztas, U., Simsek, G.G., Shang, H.L. (2023). A robust scalar-on-function logistic regression for classification. Communications in Statistics - Theory and Methods, 52(23): 8538-8554. https://doi.org/10.1080/03610926.2022.2065018

[45] Kirasich, K., Smith, T., Sadler, B. (2018). Random forest vs logistic regression: Binary classification for heterogeneous datasets. SMU Data Science Review, 1(3): 9.

[46] Dani, Y., Ginting, M.A. (2024). Comparison of iris dataset classification with Gaussian naïve Bayes and decision tree algorithms. International Journal of Electrical and Computer Engineering, 14(2): 1959-1968. https://doi.org/10.11591/ijece.v14i2.pp1959-1968

[47] Chebil, W., Wedyan, M., Alazab, M., Alturki, R., Elshaweesh, O. (2023). Improving semantic information retrieval using multinomial naive Bayes classifier and Bayesian networks. Information, 14(5): 272. https://doi.org/10.3390/info14050272

[48] Singh, G., Kumar, B., Gaur, L., Tyagi, A. (2019). Comparison between multinomial and Bernoulli naïve Bayes for text classification. In 2019 International Conference on Automation, Computational and Technology Management (ICACTM), London, UK, pp. 593-596. https://doi.org/10.1109/ICACTM.2019.8776800

[49] Subarkah, P., Damayanti, W.R., Permana, R.A. (2022). Comparison of correlated algorithm accuracy naive bayes classifier and naive bayes classifier for classification of heart failure. ILKOM Jurnal Ilmiah, 14(2): 120-125. https://doi.org/10.33096/ilkom.v14i2.1148.120-125

[50] Abbas, A., Jaiswal, M., Agarwal, S., Jha, P., Siddiqui, T.J. (2024). Performance based comparative analysis of naïve Bayes variants for text classification. In Data Science and Communication. Springer Nature, Singapore, pp. 295-310. https://doi.org/10.1007/978-981-99-5435-3_20

[51] Chaabane, I., Guermazi, R., Hammami, M. (2020). Enhancing techniques for learning decision trees from imbalanced data. Advances in Data Analysis and Classification, 14: 677-745. https://doi.org/10.1007/s11634-019-00354-x

[52] Abdulqader, H.A., Abdulazeez, A.M. (2024). A review on decision tree algorithm in healthcare applications. The Indonesian Journal of Computer Science, 13(3): 3863-3881. https://doi.org/10.33022/ijcs.v13i3.4026

[53] Gepp, A., Kumar, K. (2015). Predicting financial distress: A comparison of survival analysis and decision tree techniques. Procedia Computer Science, 54: 396-404. https://doi.org/10.1016/j.procs.2015.06.046

[54] Jena, M., Dehuri, S. (2020). Decision tree for classification and regression: A state-of-the-art review. Informatica, 44: 405-420. https://doi.org/10.31449/INF.V44I4.3023

[55] Biau, G., Scornet, E. (2016). A random forest guided tour. Test, 25: 197-227. https://doi.org/10.1007/s11749-016-0481-7

[56] Schonlau, M., Zou, R.Y. (2020). The random forest algorithm for statistical learning. The Stata Journal, 20(1): 3-29. https://doi.org/10.1177/1536867X20909688

[57] Speiser, J.L., Miller, M.E., Tooze, J., Ip, E. (2019). A comparison of random forest variable selection methods for classification prediction modeling. Expert Systems with Applications, 134: 93-101. https://doi.org/10.1016/j.eswa.2019.05.028