Comparative Analysis of Machine Learning and Autoregressive Models for Forecasting Economic Growth: A Case Study

Malika Messaoudi* Houari Khouidmi

Faculty of Economics, Business and Management Sciences, Hassiba Benbouali University of Chlef, Chlef 0218, Algeria

Faculty of Technology, Hassiba Benbouali University of Chlef, Chlef 0218, Algeria

Corresponding Author Email: m.messaoudi@univ-chlef.dz

Page: 3049-3061 | DOI: https://doi.org/10.18280/ijsdp.190820

Received: 10 April 2024 | Revised: 22 June 2024 | Accepted: 10 July 2024 | Available online: 29 August 2024

© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

This study presents a comparative analysis of machine learning models, specifically gradient boosting machine (GBM) and random forest (RF), against the traditional vector autoregressive (VAR) model for forecasting economic growth in Algeria. By utilizing a dataset comprising key macroeconomic indicators—Gross Domestic Product (GDP), money supply (M), and inflation (I)—we aim to evaluate the predictive accuracy and robustness of these models. Our findings indicate that the RF model outperforms both GBM and VAR in terms of accuracy and reliability, providing a valuable understanding of the economic dynamics of Algeria. These results highlight the potential of advanced machine learning techniques in improving economic forecasting and informing policy decisions in emerging economies.

Keywords: 

machine learning (ML), economic forecasting, RF, GBM, VAR, Algerian economy, macroeconomic indicators

1. Introduction

Economic forecasting, particularly the prediction of GDP growth, is of paramount importance for policymakers, investors, and businesses that need to make informed decisions and mitigate risks in an ever-changing economic landscape [1]. Algeria has an economy heavily dependent on oil and gas exports, making it vulnerable to global oil price fluctuations. Despite its resource wealth, Algeria faces challenges such as high unemployment, inflation, and the need for economic diversification [2].

Traditional econometric models have long been the cornerstone of economic forecasting; however, the emergence of ML techniques has opened new avenues for improving the accuracy and reliability of economic predictions. ML techniques have emerged as potent ensemble learning methods, garnering broad acclaim across diverse domains [3-6], notably within economics [5, 7]. Through ongoing refinement efforts by researchers and practitioners, ML algorithms have evolved to tackle many of the challenges encountered in predictive modeling, such as avoiding overfitting, ensuring model interpretability, and enhancing computational efficiency. Because they can handle varied data types, capture nonlinear relationships, and process large-scale datasets, ML algorithms have found extensive use in domains such as finance, healthcare, and marketing. Researchers have leveraged ML to improve the accuracy and timeliness of economic predictions, enabling policymakers and businesses to anticipate changes in economic conditions and formulate proactive strategies.

In the context of economic modeling, ML techniques have shown remarkable versatility and effectiveness in capturing the complex relationships and nonlinearities inherent in economic data. Their applications range from nowcasting GDP to predicting financial market trends and identifying key drivers of economic performance [8].

GBM can be traced back to the pioneering work of Jerome Friedman in the late 1990s [9], a significant milestone in machine learning, in which he introduced the concept of boosting weak learners into strong ones through gradient descent optimization. This approach iteratively minimizes the errors of the previous models, thereby improving predictive performance. Since its inception, GBM has seen significant advancements, becoming one of the most powerful and widely used ML algorithms. In contrast, RF, introduced by Breiman [10], constructs a multitude of decision trees independently and combines their predictions through an averaging or voting mechanism. This ensemble method leverages the diversity of individual trees to reduce overfitting and improve model robustness. RF has gained popularity due to its simplicity, scalability, and resistance to overfitting, making it suitable for a wide range of machine learning tasks. Moreover, RF's ability to handle high-dimensional data, missing values, and categorical variables further enhances its utility in practical applications. Despite its simplicity, RF consistently delivers competitive performance and remains a staple algorithm in the machine learning toolkit.

1.1 Literature review

Economic forecasting in Algeria has traditionally relied on conventional econometric models, which have proven instrumental in analyzing relationships among economic variables. However, these models are constrained by their difficulty in capturing non-linear interactions prevalent in Algeria's volatile economy, heavily influenced by fluctuating oil prices. Moreover, these models often require assumptions such as data stationarity and linearity, conditions that may not always hold true in practice.

Several studies have attempted to address these challenges. Simkins [11] explored the effectiveness of imposing business cycle restrictions on VAR models, showing potential improvements in forecasting accuracy by aligning model behaviors with historical business cycle patterns. This approach is particularly relevant for economies like Algeria characterized by volatility. Touitou et al. [12] investigated the impact of exchange rates on Algeria's economic growth, revealing intricate macroeconomic interactions necessitating their inclusion in forecasting models for more accurate economic projections.

Fekir and Bouras [13] focused on financial development's influence on Algerian economic growth, advocating for the integration of financial variables into forecasting models to better capture economic dynamics. Their study underscores the potential benefits of combining financial development indicators with advanced machine learning techniques to enhance forecast accuracy. Haouas et al. [14] used growth accounting frameworks to analyze Algeria's economic growth sources, emphasizing the role of labor growth over capital accumulation and productivity gains, and highlighting policy areas for long-term economic improvement.

In addition, Ayad et al. [15] explored the causal relationship between government expenditure and economic growth in Algeria, employing rigorous statistical tests to uncover nuanced links between fiscal policy and economic activity. Meanwhile, Messaoudi [16] assessed the impacts of fiscal and monetary policies on Algerian economic growth from 1980 to 2022, stressing the importance of robust data collection and processing techniques to enhance forecast precision.

The application of ML techniques in economic forecasting has garnered considerable attention in recent years, with a growing body of literature exploring its potential and limitations. Several studies have demonstrated the effectiveness of ML models, including neural networks, support vector machines, and ensemble methods, in predicting various economic indicators such as GDP, inflation rate, money supply, unemployment, etc.

Periklis et al. [17] address the critical issue of forecasting the unemployment rate in the Euro Area with machine learning. The authors employ three machine learning methodologies: decision trees (DT), RF, and support vector machines (SVM), alongside an elastic-net logistic regression (logit) model from the field of econometrics. Similarly, Katris [18] applied a diverse set of forecasting techniques, including both time series and machine learning approaches, to achieve robust and accurate predictions of unemployment rates. The work presented by Sermpinis et al. [19] focuses on a hybrid machine learning technique called genetic support vector regression (GSVR) for forecasting inflation and unemployment. GSVR combines support vector regression (SVR) with genetic algorithms (GA) for parameter optimization.

Regarding GDP growth forecasting, many researchers have applied machine learning methodologies, highlighting the advantages of ensemble methods in handling nonlinearities and capturing complex patterns in economic data, and reporting superior performance compared to traditional forecasting models [20-23]. Yoon [20] investigated the prediction of real GDP growth using gradient boosting and random forest approaches, showcasing their effectiveness in capturing the intricate dynamics of economic variables. Their research employed a novel feature engineering approach and model architecture optimization to improve the accuracy of GDP growth predictions. Velidi [21] examined the application of deep learning techniques such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks for forecasting GDP growth, offering novel methodologies for economic analysis and prediction. Ghosh and Ranjan [22] explored a machine learning approach to GDP nowcasting, focusing on emerging markets, and offering insights into improving real-time GDP prediction using novel methodologies. By incorporating real-time economic data and leveraging the flexibility of ML, Shams et al. [23] proposed a PC-LSTM-RNN model for predicting GDP in urban profiling areas, contributing innovative methods for GDP forecasting in specific urban contexts, and demonstrated significant improvements in forecasting accuracy compared to baseline models.

Recent studies have started exploring the potential of ML models in economic forecasting for Algeria. For example, Hamiane et al. [24] conducted a comparative analysis of LSTM, ARIMA, and hybrid models for forecasting future GDP, finding that hybrid models could significantly improve forecasting accuracy by leveraging the strengths of both traditional and ML approaches. Similarly, Sahed et al. [25] used an Adaptive Neuro-Fuzzy Inference System (ANFIS) to forecast Algerian GDP, demonstrating its superiority over traditional models in handling non-linear data patterns and providing more accurate predictions.

While authors in references [24, 25] demonstrated the potential of ML models, their studies also highlighted issues such as the need for extensive computational resources and the 'black-box' nature of ML models, which can limit their interpretability. This limitation is significant for policymakers who require transparent and interpretable models to make informed decisions. Furthermore, these studies focused primarily on GDP forecasting and did not explore other critical macroeconomic indicators such as inflation and money supply. Our study contributes to this literature by employing advanced data preprocessing methods, such as imputation of missing values and nonlinear transformation, to ensure the reliability and accuracy of our economic forecasts.

Despite these advancements, challenges remain in adopting ML techniques for economic forecasting, including data availability, model interpretability, and robustness to structural breaks. Nonetheless, the growing body of empirical evidence suggests that ML-based approaches hold promise in improving the accuracy and reliability of economic predictions, thereby contributing to more informed decision-making processes in both public and private sectors.

Our study addresses gaps in Algerian economic forecasting through several key contributions. Firstly, we broaden the scope by incorporating multiple essential macroeconomic indicators—GDP, money supply (M), and inflation (I)—to provide a more holistic view of the Algerian economy. Secondly, we enhance interpretability by integrating feature importance analysis into our machine learning models (GBM and RF), thereby making the results more accessible and actionable for policymakers. Thirdly, advanced data preprocessing techniques are implemented to address challenges such as missing data and non-linearity, ensuring the robustness of our models. These efforts contribute significantly to the literature on Algerian economic forecasting by demonstrating the superior predictive performance of ML models over traditional VAR models, improving interpretability through feature analysis, and offering comprehensive insights crucial for informed policy-making. Ultimately, our findings underscore the potential of advanced ML techniques to enhance the accuracy and reliability of economic forecasts in emerging economies like Algeria, providing valuable tools for policymakers to foster economic stability and growth.

1.2 Motivation

This research contributes to the development of computational economics by demonstrating practical applications of artificial intelligence and data-driven methodologies for macroeconomic analysis and forecasting. In recent years, the advent of ML techniques has offered a promising alternative for enhancing economic forecasting capabilities. ML techniques, with their ability to handle complex, high-dimensional data and capture nonlinear relationships that may exist in economic indicators, present a valuable opportunity to improve the accuracy and reliability of economic predictions.

Economic nowcasting and forecasting is a cornerstone of decision-making in various domains, ranging from financial markets to policy formulation. Historically, autoregressive (AR) models have been a popular choice for time series forecasting, relying on past observations to predict future trends [26]. However, the rise of machine learning techniques has sparked interest in exploring alternative approaches [27]. ML models offer the advantage of capturing complex nonlinear relationships and interactions within the data, making them well-suited for forecasting tasks where traditional linear models may fall short.

In contrast, VAR models assume that the future value of a variable depends linearly on its past values. While VAR models are simple and interpretable, they may struggle to capture the nonlinear dynamics present in many economic time series. This raises questions about the efficacy of VAR models compared to more advanced machine learning techniques. Through a comparative analysis of ML models and the VAR model for economic forecasting, evaluating their performance across several metrics on real-world data, this study aims to provide insights into the relative strengths and weaknesses of each approach.

Figure 1. Proposed modeling technique

To assess the effectiveness of the proposed approach, the performance of the vector autoregressive (VAR) model is compared with that of two ML models, Gradient Boosting Machine (GBM) and Random Forest (RF). Furthermore, to provide a deeper understanding of the strengths and limitations of the proposed approach, sensitivity analysis is conducted to evaluate the robustness of the ML models to changes in hyperparameters (Figure 1).

This study focuses on Algeria's economic growth due to the urgent need for effective economic forecasting tools to inform policy-making. By applying advanced machine learning models to forecast GDP, money supply, and inflation, we aim to provide more accurate and reliable methods that capture the complexities of Algeria's economy better than traditional models.

2. Methodology

In this section, we provide a detailed description of the implementation process for applying the machine learning algorithms (Gradient Boosting Machine, GBM, and Random Forest, RF) to forecast GDP growth based on traditional economic indicators and economic scenarios data.

2.1 Description

2.1.1 Reasons for choosing GBM and RF models

The selection of GBM and RF models for this study is driven by their proven effectiveness and distinct advantages in handling complex, high-dimensional datasets typical of economic forecasting. These models have been extensively validated in various forecasting domains, demonstrating superior performance compared to traditional econometric models.

2.1.2 Advantages of GBM and RF in economic forecasting

GBM is renowned for its high predictive accuracy due to its iterative boosting process, which focuses on correcting the errors of previous models. This results in a powerful ensemble that performs exceptionally well in forecasting tasks. Additionally, GBM handles non-linearity and interactions effectively, capturing complex relationships within economic data without requiring explicit specification. RF, on the other hand, leverages ensemble learning by constructing multiple decision trees and aggregating their predictions, resulting in a model that is less prone to overfitting and robust to noise and outliers. Both GBM and RF provide measures of variable importance, allowing us to understand which variables are most significant in predicting economic outcomes. This feature enhances the interpretability of the models and their utility for policymakers, providing valuable insights into the key drivers of economic indicators.

2.1.3 Applicability to the Algerian economy

The Algerian economy is characterized by volatility and structural changes, driven by fluctuations in oil prices and other external factors. GBM and RF's robustness and adaptability make them ideal for such a dynamic environment. Their ability to handle large, high-dimensional datasets and capture non-linearities translates into more accurate and reliable economic forecasts, essential for effective policy-making. Furthermore, both models can process and integrate various types of data, including time-series data, which is important for economic forecasting. The superior performance of these models, as demonstrated in our results, ensures that our study leverages their strengths to improve the accuracy and reliability of economic forecasts for Algeria, addressing the specific challenges and requirements of forecasting in an emerging economy.

2.2 Implementation process

In the initial stage, depicted in Figure 2, the machine learning workflow begins with feature selection. Through a meticulous examination of various variables impacting GDP growth, statistical techniques are employed to identify the most significant features. These selected features are then designated as input variables for the model. Following feature selection, the dataset undergoes a pivotal step of data splitting using cross-validation methods. This process divides the dataset into training and testing sets, enabling the fine-tuning of hyperparameters and providing a robust assessment of the model's performance.
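The splitting step can be sketched as follows. The text does not state which cross-validation scheme is used, so the expanding-window scheme below is an assumption, chosen because it keeps every test observation later in time than the observations used for training. The function name and its parameters are hypothetical.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Each fold trains on all observations before its test block, so the
    model never sees data from the future of its test set.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        train_idx = list(range(0, start))
        test_idx = list(range(start, start + test_size))
        yield train_idx, test_idx


# Example: 12 quarterly observations, 3 folds of 2 quarters each.
folds = list(expanding_window_splits(n_samples=12, n_splits=3, test_size=2))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {test_idx}")
```

Hyperparameters would then be tuned on the training part of each fold and scored on the corresponding test part.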

Figure 2. Machine learning workflow

2.2.1 GBM technique

GBM works by combining multiple weak learners (typically decision trees) sequentially, where each subsequent learner corrects the errors of the previous one (Figure 3). GBM iteratively fits new trees to the residuals of the model, gradually reducing the errors and improving the overall prediction accuracy.

Figure 3. GBM modeling technique

2.2.2 RF technique

RF is a versatile and powerful machine learning algorithm. It belongs to the ensemble learning family and operates by constructing multiple decision trees during training. Each decision tree is built using a random subset of the features and data points from the original dataset (Figure 4).

Figure 4. RF modeling technique

2.2.3 Data preparation

Investigating the relationship between Algeria's money supply, inflation, and GDP growth offers critical insights into the effectiveness of the country's monetary policy and its broader implications for economic stability and growth. Algeria's central bank plays a pivotal role in managing inflationary pressures through its control over the money supply. Analyzing how changes in the money supply influence inflation rates and, subsequently, GDP growth provides valuable indicators of the central bank's policy effectiveness. By assessing the central bank's ability to maintain price stability while fostering sustainable economic growth, policymakers and researchers can gain a deeper understanding of Algeria's macroeconomic environment. Moreover, this analysis can highlight potential challenges and risks, such as overheating or recessionary pressures, and inform policy decisions aimed at promoting long-term economic stability and prosperity. The investigation leverages a comprehensive dataset comprising three key macroeconomic variables (GDP, inflation rate, and money supply), measured quarterly from March 1964 through March 2022, sourced from data published by the World Bank. This dataset provides a reliable foundation for analyzing the dynamics of Algeria's monetary policy and economic performance over the specified period, enabling rigorous examination of the interplay between monetary policy instruments, inflationary trends, and GDP fluctuations.

Before proceeding with modeling, it is imperative to ensure that the collected historical data undergoes thorough cleaning and preprocessing. Any missing values should be handled appropriately, and outliers may need to be addressed to maintain the integrity of the dataset. Additionally, standardizing or normalizing the features is recommended so that they are on a similar scale, which can facilitate the convergence and performance of the fitted model. We begin by collecting the historical data (money supply and inflation rate), denoted by M and I respectively, as well as the economic scenarios data (M_DIFF and I_DIFF). These datasets are combined into a feature matrix X, where each row represents a historical observation and each column represents a feature.

$X=\left[\begin{array}{cccc}M_1 & I_1 & \mathrm{M\_DIFF}_1 & \mathrm{I\_DIFF}_1 \\ M_2 & I_2 & \mathrm{M\_DIFF}_2 & \mathrm{I\_DIFF}_2 \\ \vdots & \vdots & \vdots & \vdots \\ M_N & I_N & \mathrm{M\_DIFF}_N & \mathrm{I\_DIFF}_N\end{array}\right]$          (1)

where,

Money supply (M): The percentage of money supply growth in its broad sense.

Inflation rate (I): The rate at which the general level of prices for goods and services is rising.

M_DIFF: The difference in money supply scenario compared to a baseline scenario (average historical data).

I_DIFF: The difference between the inflation rate and the central bank's target rate scenario, compared to a baseline scenario.
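As a minimal sketch of how the feature matrix in Eq. (1) could be assembled, consider the snippet below. The numbers are invented for illustration and are not the study's data. Two assumptions are made: the baseline for M is taken to be its historical average, and the baseline for I is taken to be a fixed target rate whose value (`INFLATION_TARGET`) is hypothetical.

```python
import numpy as np

# Hypothetical quarterly values, for illustration only (not the paper's data).
M = np.array([12.0, 15.0, 9.0, 18.0, 11.0])   # money supply growth, %
I = np.array([4.0, 6.5, 3.0, 8.0, 5.5])       # inflation rate, %

# Assumption: the baseline for M is its historical average.
M_DIFF = M - M.mean()

# Assumption: the baseline for I is a fixed target rate (hypothetical value).
INFLATION_TARGET = 4.0
I_DIFF = I - INFLATION_TARGET

# Eq. (1): one row per observation, one column per feature.
X = np.column_stack([M, I, M_DIFF, I_DIFF])
print(X.shape)
print(X)
```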

Figure 5. Proposed macroeconomic time series variables

As Figure 5 shows, all series appear nonstationary and have very different scales. Because estimating the model with nonstationary variables is problematic, each series must first be transformed appropriately. Figure 6 shows that the transformed series now appear stationary, although they remain on different scales.

To fit a VAR model, all variables must be stationary. To check this, we perform the Augmented Dickey-Fuller (ADF) test for unit-root nonstationarity. The ADF test results (Appendix Table 1) show that the null hypothesis of a unit root is not rejected for GDP, inflation rate (I), and money supply (M). In contrast, the test rejects the null hypothesis for all transformed series (GDP growth, I_DIFF, and M_DIFF), with all tests statistically significant (p < 0.001), which allows us to include these series in the VAR model. The results also indicate that there is no evidence of significant inertia or positive memory in the GDP growth series for lags 1 to 3 (Appendix Table 2): the p-values of the AR coefficients for these lags suggest that GDP growth in the previous three periods does not significantly influence current GDP growth. For the fourth lag, however, the coefficient is positive and statistically significant, suggesting some influence of GDP growth from four periods earlier on current GDP growth.
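The logic of the unit-root check can be illustrated with a simplified sketch. The code below runs the plain Dickey-Fuller regression (a constant, no lagged-difference terms) on synthetic data and compares the t-statistic with the asymptotic 5% critical value for that case. This is not the full ADF procedure used in the study, which adds lagged differences and reports p-values; in practice a statistics library would be used. The function name and the simulated series are assumptions made for illustration.

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_{t-1} + e_t.

    Plain Dickey-Fuller regression with a constant and no
    lagged-difference (augmentation) terms.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    Z = np.column_stack([np.ones(len(dy)), y[:-1]])   # [constant, lagged level]
    coef, *_ = np.linalg.lstsq(Z, dy, rcond=None)
    resid = dy - Z @ coef
    sigma2 = resid @ resid / (len(dy) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
level = np.cumsum(rng.normal(size=300))   # synthetic random walk (has a unit root)
diffed = np.diff(level)                   # its first difference (white noise)

CRITICAL_5PCT = -2.86   # asymptotic 5% Dickey-Fuller critical value, constant-only case
for name, series in [("level", level), ("first difference", diffed)]:
    stat = dickey_fuller_tstat(series)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "cannot reject unit root"
    print(f"{name}: t = {stat:.2f} -> {verdict}")
```

A strongly negative statistic for the differenced series mirrors the pattern reported above, where the null hypothesis is rejected only after transformation.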

Figure 6. Transformed macroeconomic time series variables

2.4 Modeling method

2.4.1 VAR models

In the context of a VAR model for GDP nowcasting, the economic indicators (M and I) act as predictors (exogenous variables) that influence GDP dynamics over time. The VAR model then attempts to capture the relationship between past values of GDP and the current values of these economic indicators in order to forecast future GDP values. The vector autoregressive model is a foundational time series technique, represented by Eq. (2):

$y_t=\Phi_1 y_{t-1}+\Phi_2 y_{t-2}+\ldots+\Phi_p y_{t-p}+\beta_1 x_{1, t}+\beta_2 x_{2, t}+\ldots+\beta_k x_{k, t}+\varepsilon_t$          (2)

where,

$y_t$ represents the value of the time series at time t (GDP at time t).

$\Phi_1, \Phi_2, \ldots, \Phi_p$ are the autoregressive coefficients corresponding to the lagged values (determined based on the log-likelihood indicator).

p is the lag order of the autoregressive model (determined based on the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC)).

$x_{1, t}, x_{2, t}, \ldots, x_{k, t}$ represent the values of the economic indicators at time t.

$\varepsilon_t$ is a white noise error term at time t.

$\beta_1, \beta_2, \ldots, \beta_k$ are the coefficients representing the impact of the economic indicators on $y_t$.

To forecast future values of the endogenous variables (such as GDP) using a VAR model, the estimated coefficients from the model along with future values of the exogenous variables (such as economic indicators) must be used. The forecast equation for a VAR(p) model can be expressed as:

$\begin{aligned} \hat{y}_{t+h}= & \hat{\Phi}_1 \hat{y}_{t+h-1}+\hat{\Phi}_2 \hat{y}_{t+h-2}+\ldots+\hat{\Phi}_p \hat{y}_{t+h-p} \\ & +\hat{\beta}_1 \hat{x}_{1, t+h}+\hat{\beta}_2 \hat{x}_{2, t+h}+\ldots+\hat{\beta}_k \hat{x}_{k, t+h}+\hat{\varepsilon}_{t+h}\end{aligned}$          (3)

where,

$\hat{y}_{t+h}$ represents the forecasted value of the time series (future GDP) at time t+h.

$\widehat{\Phi}_1, \widehat{\Phi}_2, \ldots, \widehat{\Phi}_p$ are the estimated coefficient matrices from the VAR model.

$\hat{x}_{1, t+h}, \hat{x}_{2, t+h}, \ldots, \hat{x}_{k, t+h}$ represent the forecasted values of the economic indicators at time t+h.

$\hat{\varepsilon}_{t+h}$ represents the forecasted error terms at time t+h.

$\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ are the estimated coefficients representing the impact of the economic indicators on $\hat{y}_{t+h}$.
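The recursive structure of the forecast equation can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the estimation procedure of the study: it fits a VAR(p) by ordinary least squares, adds an intercept, omits the exogenous terms $\beta_j x_{j,t}$ for brevity, and sets the future error term to zero (its expected value under the white-noise assumption). The function names `fit_var` and `forecast_var` and the synthetic, noise-free data are hypothetical.

```python
import numpy as np

def fit_var(Y, p):
    """Least-squares fit of y_t = c + Phi_1 y_{t-1} + ... + Phi_p y_{t-p} + e_t.

    Y has shape (T, k). Returns the intercept c (k,) and a list of p
    coefficient matrices, each of shape (k, k).
    """
    T, k = Y.shape
    # The row for time t holds [1, y_{t-1}, ..., y_{t-p}].
    Z = np.column_stack(
        [np.ones(T - p)] + [Y[p - lag:T - lag] for lag in range(1, p + 1)]
    )
    B, *_ = np.linalg.lstsq(Z, Y[p:], rcond=None)    # shape (1 + k*p, k)
    c = B[0]
    Phi = [B[1 + (lag - 1) * k: 1 + lag * k].T for lag in range(1, p + 1)]
    return c, Phi

def forecast_var(Y, c, Phi, steps):
    """Iterate the fitted equation forward, feeding forecasts back in as lags."""
    p = len(Phi)
    history = [row for row in Y[-p:]]
    out = []
    for _ in range(steps):
        # Phi[lag] multiplies the value (lag + 1) periods back.
        y_next = c + sum(Phi[lag] @ history[-1 - lag] for lag in range(p))
        history.append(y_next)
        out.append(y_next)
    return np.array(out)

# Synthetic noise-free VAR(1), used only to check that the fit recovers A.
A = np.array([[0.5, 0.1],
              [0.0, 0.8]])
Y = np.empty((40, 2))
Y[0] = [1.0, 1.0]
for t in range(1, 40):
    Y[t] = A @ Y[t - 1]

c_hat, Phi_hat = fit_var(Y, p=1)
print(np.round(Phi_hat[0], 3))
print(forecast_var(Y, c_hat, Phi_hat, steps=2))
```

For horizons beyond one step, each forecast depends on earlier forecasts rather than on observed values, which is why forecast uncertainty grows with the horizon h.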

2.4.2 GBM models

The GBM algorithm is employed as the primary modeling technique in our approach. During model training, various hyperparameters such as the number of trees, learning rate, and maximum tree depth are experimented with to optimize model performance. Regularization techniques, including shrinkage (learning rate) and feature subsampling, are employed to prevent overfitting and enhance the generalization ability of the model.

Our modeling method utilizes a gradient boosting machine (GBM) algorithm, which sequentially fits multiple weak learners (decision trees) to the residuals of the model (Figure 3). The final prediction is the sum of the predictions from all trees, weighted by their respective coefficients. Mathematically, the prediction of a GBM model can be formulated by Eq. (4):

$\hat{Y}=\sum_{i=1}^N \beta_i h_i(X)$          (4)

where, $\hat{Y}$ represents the predicted GDP growth, $N$ is the number of weak learners (decision trees) in the ensemble, $X$ denotes the input features (Eq. (1)), $h_i(X)$ are individual regression trees (prediction of the $i^{\text {th }}$ weak learner), and $\beta_i$ are the corresponding tree weights.
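To make the weighted sum in Eq. (4) concrete, the following sketch builds a small boosted ensemble from scratch. It is an illustration under assumptions rather than the configuration used in the study: the weak learners are single-split stumps, the loss is squared error, the first learner is a constant equal to the training mean with weight 1, and every later weight $\beta_i$ equals a fixed learning rate. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split regression stump (over all features) for targets r."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    _, j, thr, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= thr, lv, rv)

def fit_gbm(X, y, n_trees=50, learning_rate=0.1):
    """Return (weights, learners) so that the prediction is sum_i w_i * h_i(X)."""
    y_mean = y.mean()
    learners = [lambda Z: np.full(len(Z), y_mean)]   # constant baseline learner
    weights = [1.0]
    pred = learners[0](X)
    for _ in range(n_trees):
        residual = y - pred            # negative gradient of the squared-error loss
        h = fit_stump(X, residual)
        learners.append(h)
        weights.append(learning_rate)  # shrinkage plays the role of beta_i
        pred = pred + learning_rate * h(X)
    return weights, learners

def predict_gbm(weights, learners, X):
    # Eq. (4): weighted sum of the weak learners' predictions.
    return sum(w * h(X) for w, h in zip(weights, learners))

# Synthetic nonlinear data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

for n in (1, 10, 50):
    w, h = fit_gbm(X, y, n_trees=n)
    mse = np.mean((y - predict_gbm(w, h, X)) ** 2)
    print(f"{n:3d} trees: training MSE = {mse:.4f}")
```

The training error falls as trees are added because each stump is fitted to what the current ensemble still gets wrong; the learning rate controls how much of each correction is kept.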

For forecasting future GDP growth, the economic scenarios data for the forecasted period is aligned with the format and structure of the historical data used during model training. The trained GBM model is then applied to predict GDP growth for the upcoming periods. Additionally, uncertainty associated with the forecasted GDP growth is evaluated by generating prediction intervals or confidence intervals, providing valuable insights into the range of possible outcomes.

For forecasting future GDP growth, we extend the feature matrix X to include the economic scenarios data for the forecasted period. The trained GBM models are then applied to predict GDP growth for the upcoming periods. The equation for forecasting can be interpreted as follows:

$\hat{Y}_{{future }}=\sum_{i=1}^N \beta_i h_i\left(X_{{future }}\right)$          (5)

where, $\hat{Y}_{{future}}$ is the forecasted GDP growth for the future periods, $h_i\left(X_{\text {future }}\right)$ denotes the prediction made by the ith weak learner for the future feature matrix ${X}_{{future}}$.

2.4.3 RF models

Random Forest is an ensemble learning method that operates by constructing a multitude of decision trees during training and, for regression, outputting the average of the predictions of the individual trees. Each decision tree is built using a random subset of the training data and a random subset of the features. The prediction of a Random Forest model can be formulated by Eq. (6):

$\hat{Y}=\frac{1}{N} \sum_{i=1}^N f_i(X)$          (6)

where, $\hat{Y}$ represents the predicted outcome (predicted GDP growth), $N$ is the number of decision trees in the forest, and $f_i(X)$ represents the prediction of the i-th decision tree.
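The averaging in Eq. (6) can be sketched with a compact from-scratch forest. This is a minimal illustration under assumptions, not the model configuration of the study: each tree is grown on a bootstrap sample, each split considers a random subset of features (one third of them, at least one, a common rule of thumb for regression), and tree depth is capped. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def build_tree(X, y, depth, n_feat, rng):
    """Grow a regression tree; each split considers a random subset of features."""
    if depth == 0 or len(y) < 4:
        return float(y.mean())
    best = None
    for j in rng.choice(X.shape[1], size=n_feat, replace=False):
        for thr in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= thr
            # variance * count = sum of squared errors within each side
            sse = y[left].var() * left.sum() + y[~left].var() * (~left).sum()
            if best is None or sse < best[0]:
                best = (sse, j, thr)
    if best is None:                       # no valid split (all values identical)
        return float(y.mean())
    _, j, thr = best
    left = X[:, j] <= thr
    return (j, thr,
            build_tree(X[left], y[left], depth - 1, n_feat, rng),
            build_tree(X[~left], y[~left], depth - 1, n_feat, rng))

def predict_tree(node, X):
    if not isinstance(node, tuple):        # leaf: constant prediction
        return np.full(len(X), node)
    j, thr, left_node, right_node = node
    out = np.empty(len(X))
    mask = X[:, j] <= thr
    out[mask] = predict_tree(left_node, X[mask])
    out[~mask] = predict_tree(right_node, X[~mask])
    return out

def fit_forest(X, y, n_trees=30, depth=4, seed=0):
    rng = np.random.default_rng(seed)
    n, k = X.shape
    n_feat = max(1, k // 3)                # assumption: rule-of-thumb subset size
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
        trees.append(build_tree(X[idx], y[idx], depth, n_feat, rng))
    return trees

def predict_forest(trees, X):
    # Eq. (6): average of the individual trees' predictions.
    return np.mean([predict_tree(t, X) for t in trees], axis=0)

# Synthetic nonlinear data, for illustration only.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

trees = fit_forest(X, y)
y_hat = predict_forest(trees, X)
print("training MSE:", np.mean((y - y_hat) ** 2))
```

Unlike boosting, the trees here are grown independently of one another, so the ensemble reduces variance by averaging rather than by sequentially correcting errors.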

To forecast future GDP growth, the trained RF models are applied to the feature matrix for the upcoming periods. The equation for forecasting can be interpreted as follows:

$\hat{Y}_{{future }}=\frac{1}{N} \sum_{i=1}^N f_i\left(X_{{future}}\right)$          (7)

where, $\hat{Y}_{{future}}$ represents the predicted outcome (forecasted GDP growth) and $f_i\left(X_{{future }}\right)$ represents the prediction of the i-th decision tree for the data $X_{{future }}$.

2.5 Models evaluation

To assess the performance of the VAR, GBM, and RF models, we calculate several evaluation metrics: the mean squared error (MSE), the mean absolute error (MAE), and the coefficient of determination ($R^2$). These metrics are defined as follows:

MSE:

$M S E=\frac{1}{N} \sum_{i=1}^N\left(Y_i-\hat{Y}_i\right)^2$          (8)

MAE:

$M A E=\frac{1}{N} \sum_{i=1}^N\left|Y_i-\hat{Y}_i\right|$          (9)

R2:

$R^2=1-\frac{\sum_{i=1}^N\left(Y_i-\hat{Y}_i\right)^2}{\sum_{i=1}^N\left(Y_i-\bar{Y}\right)^2}$          (10)

where, $N$ is the number of observations, $Y_i$ denotes the observed (actual) GDP growth, $\hat{Y}_i$ represents the predicted GDP growth considering both input features $X$ and economic scenarios (M_DIFF and I_DIFF), and $\bar{Y}$ is the mean of the observed values $Y_i$.
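As a small worked example, the three metrics can be computed directly from their definitions, with $R^2$ in its standard sums-of-squares form. The observed and predicted values below are invented for illustration and are not results from the study.

```python
def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted growth rates (not the paper's data).
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(mse(y_obs, y_pred))        # 0.125
print(mae(y_obs, y_pred))        # 0.25
print(r_squared(y_obs, y_pred))  # 0.9
```

Here the residuals are -0.5, 0, 0.5, and 0, so the squared errors sum to 0.5 and the total sum of squares around the mean of 2.5 is 5, giving $R^2 = 1 - 0.5/5 = 0.9$.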

3. Results and Analysis

In this section, we outline the outcomes of comparing the performance of the vector autoregressive (VAR) model with established machine learning forecasting techniques, namely the GBM and RF models. Additionally, we discuss the importance of conducting sensitivity analysis to evaluate the robustness of the machine learning models to variations in input features and hyperparameters.

3.1 Comparative analysis

A comprehensive comparative analysis is conducted to assess the performance of the ML models against the alternative forecasting technique. Specifically, the accuracy, precision, and generalization ability of the ML models are compared with those of the VAR model. Key metrics such as mean absolute error, mean squared error, and R-squared values are evaluated to provide insights into the strengths and limitations of each approach.

The fitted VAR model, depicted in Figure 7, incorporates GDP growth, the first difference of the inflation rate, and the money supply time series as endogenous variables with 1 to 4 AR lags. The model exhibits reduced variability compared to the observed data, as evidenced by the fitted values not reaching the extremes observed in the dataset. This finding suggests that while the model performs adequately for minor to moderate fluctuations in GDP, it produces larger errors when confronted with more pronounced changes in GDP. Accordingly, an examination of the residuals and their autocorrelation reveals heavy-tailed distributions, as detailed in Appendix Table 1. These outcomes imply that the model may be more dependable for forecasting within expected parameter ranges, yet less reliable when analyzing extreme tail behavior in the data. This limitation underscores the need for more sophisticated modeling approaches, such as machine learning techniques.

Figure 7. Predicted GDP growth with VAR model

Figure 8 illustrates the fitted GBM model, while Figure 9 represents the RF model. These machine learning models predict GDP growth more accurately than the VAR model. The GBM model has a mean absolute error of 0.8175, a mean squared error of 4.1606, and an R-squared value of 0.9362. The RF model achieves a mean absolute error of 0.7900, a mean squared error of 3.9025, and an R-squared value of 0.9444. By contrast, the VAR model gives a mean absolute error of 2.5322, a mean squared error of 43.6876, and an R-squared value of 0.1924. These metrics, shown in Table 1, indicate that both machine learning models substantially outperform the VAR model in predictive accuracy and explanatory power, with lower MAE and MSE and higher R-squared values.

Figure 8. Predicted GDP growth and training error evolution with GBM model

Figure 9. Predicted GDP growth and training error evolution with RF model

Table 1. Performance metrics comparison

| Modeling Method | Mean Absolute Error (MAE)(1) | Mean Squared Error (MSE)(2) | R-Squared(3) |
|---|---|---|---|
| VAR | 2.5322 | 43.6876 | 0.1924 |
| GBM | 0.8175 | 4.1606 | 0.9362 |
| RF | 0.7900 | 3.9025 | 0.9444 |

Notes: (1), (2): The MAE and MSE represent the average absolute and squared differences between the predicted and actual values of GDP growth. Lower values indicate better accuracy; here, both the GBM and RF models outperform the VAR model, with RF achieving the lowest MAE and MSE. (3): R-squared indicates the proportion of variance in the GDP growth data explained by the model. Higher R-squared values indicate a better fit. In this comparison, both the GBM and RF models exhibit much higher R-squared values than the VAR model, indicating that they capture a larger portion of the variability in the data.
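As a quick arithmetic check on Table 1, the sketch below computes each model's error reduction relative to the VAR baseline, together with the root mean squared error (RMSE, the square root of MSE, expressed in the same units as GDP growth). Only the values from Table 1 are used. The dictionary layout and variable names are illustrative, and RMSE is a derived quantity that the study does not report.

```python
import math

# Values copied from Table 1 (MAE, MSE, R-squared).
table1 = {
    "VAR": {"mae": 2.5322, "mse": 43.6876, "r2": 0.1924},
    "GBM": {"mae": 0.8175, "mse": 4.1606, "r2": 0.9362},
    "RF": {"mae": 0.7900, "mse": 3.9025, "r2": 0.9444},
}

baseline = table1["VAR"]
summary = {}
for name, m in table1.items():
    summary[name] = {
        # Fraction by which the error falls relative to the VAR baseline
        "mae_reduction": 1.0 - m["mae"] / baseline["mae"],
        "mse_reduction": 1.0 - m["mse"] / baseline["mse"],
        # RMSE is in the same units as the target variable
        "rmse": math.sqrt(m["mse"]),
    }

best_by_mae = min(table1, key=lambda k: table1[k]["mae"])

for name, s in summary.items():
    print(
        f"{name}: MAE reduction {s['mae_reduction']:.1%}, "
        f"MSE reduction {s['mse_reduction']:.1%}, RMSE {s['rmse']:.3f}"
    )
print("Lowest MAE:", best_by_mae)
```

Under these figures, both ML models cut the VAR model's MAE by roughly two thirds and its MSE by about 90%, while the gap between GBM and RF is small by comparison.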

Figure 10 combines the results from Figures 7, 8, and 9 and zooms in on a specific period (1983 to 1987) to provide a more detailed view of the predicted GDP growth trajectory. This visualization allows a closer examination of the predicted values and their alignment with the observed data points, offering insight into the accuracy and precision of the RF model's predictions.

Figure 10. Predicted GDP growth with VAR, GBM, and RF models

Figure 11 shows the errors of the predicted GDP growth for the VAR, GBM, and RF methods, allowing a direct comparison of their forecasting performance. The magnitude and distribution of the errors across different forecasting horizons indicate the relative strengths and weaknesses of each method in capturing and predicting GDP growth dynamics.

Figure 11. Predicted GDP growth errors of VAR, GBM, and RF models

The forecasted results obtained using the VAR method, depicted in Figure 12, offer insights into the anticipated GDP growth trajectory for the next six years. Despite its classical approach, the VAR model may encounter challenges in capturing complex relationships and dynamic patterns present in the data, especially those influenced by the money supply and inflation rate scenario. Consequently, while the forecasted values serve as a baseline projection, they may lack the precision and robustness offered by more advanced machine learning techniques in scenarios where the money supply and inflation rate play significant roles in shaping economic outcomes.

Figure 12. Forecasted GDP/GDP growth using VAR model

The forecasts generated using the GBM method, illustrated in Figure 13, provide a more nuanced perspective on GDP growth, taking into account the money supply and inflation rate scenario. Gradient boosting can capture intricate patterns and nonlinear relationships in the data, including those driven by changes in the money supply and inflation rate. As a result, the GBM forecasts may be more accurate and reliable, particularly in scenarios involving complex economic dynamics and uncertainty in the money supply and inflation rate.

Figure 13. Forecasted GDP/GDP growth using GBM model

Figure 14. Forecasted GDP/GDP growth using RF model

Finally, the forecasts obtained using the RF method, shown in Figure 14, underscore the strong predictive performance of this machine learning approach under the money supply and inflation rate scenario. The RF model captures the underlying patterns and variations in the data, including those driven by changes in the money supply and inflation rate, and thereby provides accurate and reliable forecasts. With its ensemble learning framework and robustness to overfitting, the RF method emerges as the preferred choice for forecasting GDP growth, outperforming both the VAR and GBM methods, especially when the dynamic influences of the money supply and inflation rate on economic outcomes are considered.

Overall, the results indicate that the RF model outperformed the other techniques (VAR, GBM) in terms of forecasting accuracy. It achieved the lowest mean absolute error (MAE) and mean squared error (MSE), and the highest R-squared value. The RF model demonstrated its effectiveness in capturing complex relationships between the economic indicators (money supply, inflation rate) and GDP growth. However, the performance of machine learning models may vary depending on the dataset, feature selection, hyperparameter tuning, and other factors. While the RF model showed promising results in our study, further research is needed to explore its robustness across different economic scenarios and time periods.

3.2 Sensitivity analysis

This section conducts a sensitivity analysis by systematically varying the hyperparameters—number of trees, learning rate, and maximum tree depth—of the GBM and RF models. It calculates the evaluation metrics (MAE, MSE, R-squared) for each combination of hyperparameters and identifies the optimal settings to minimize MAE, minimize MSE, and maximize R-squared. The provided evaluation metrics offer valuable insights into the comparative performance of different forecasting techniques.

Figures 15 and 16 illustrate the sensitivity analysis results of the GBM and RF models, respectively. These surface plots show how changes in the hyperparameters (number of trees, learning rate, and maximum tree depth) affect the performance metrics of each model. The contours and gradients of the surfaces indicate that the RF model generally outperforms the GBM model, a conclusion supported by the optimal hyperparameters selected for each model across the evaluation metrics (Table 2). For example, when minimizing MAE, the RF model achieved a lower value (0.7672) than the GBM model (0.7917). Similarly, when minimizing MSE, the RF model obtained a lower value (3.4599) than the GBM model (3.6983). When maximizing R-squared, the RF model achieved a slightly higher value (0.9450) than the GBM model (0.9433). These findings suggest that the RF model is more robust across hyperparameter configurations than the GBM model, and they highlight the effectiveness of ensemble-based techniques such as random forest for economic forecasting tasks.

In conclusion, GBM demonstrated several strengths, including high predictive accuracy and the ability to capture complex, non-linear relationships within the economic data (see Table 1 and Figures 8, 10, and 13). This capability is particularly beneficial given the volatile and dynamic nature of Algeria's economy. Additionally, GBM provides valuable insights through feature importance, helping policymakers understand the key drivers of economic growth. Its robustness to overfitting, achieved through techniques such as shrinkage and subsampling, ensures reliable performance on unseen data. However, GBM's iterative nature increases computational demands, making it less efficient with very large datasets or limited computational resources. Furthermore, GBM's performance is highly sensitive to hyperparameter settings, requiring careful tuning to achieve optimal results (see Table 2).

RF also has notable strengths, such as robustness to noise and outliers due to its ensemble approach, which minimizes individual tree errors (see Table 1 and Figures 9, 10, and 14). This robustness is particularly useful for handling economic data that may contain irregular shocks. RF's generalization capability, achieved by averaging the results of multiple decision trees, reduces the risk of overfitting, providing reliable predictions across different data subsets. It is also easier to use, with fewer hyperparameters and less sensitivity to their settings compared to GBM. However, RF's interpretability is generally lower than GBM, as it lacks the clear, additive model structure of boosting. While RF is less computationally demanding than GBM, it can still be intensive, especially with a high number of trees and large datasets (see Table 2).

The differences in model performance can be attributed to the unique characteristics of Algeria's economic data. Algeria's economy is heavily influenced by external factors such as oil prices, leading to high volatility and structural breaks in the data. GBM's strength in handling non-linear relationships and interactions is particularly advantageous in this context, allowing it to capture the complex dynamics of the economy more effectively than RF (see Figures 7, 11, and 15). However, RF's robustness to noise and outliers also proves beneficial given the presence of irregular economic shocks. The generalization capability of RF ensures that it remains a reliable choice for making stable forecasts despite the inherent volatility of the Algerian economy (see Table 1 and Figures 11, 12, and 16).

Both models have demonstrated their utility in economic forecasting, with their respective strengths offering complementary benefits. GBM's high predictive accuracy and detailed feature importance provide deep insights, while RF's robustness and ease of use offer practical advantages. The choice between these models ultimately depends on the specific requirements of the forecasting task and the available computational resources. By understanding the strengths and weaknesses of each model, policymakers can make informed decisions when selecting appropriate forecasting tools. This ensures that they are well-equipped to navigate the complexities of Algeria's economic landscape.

Figure 15. Sensitivity analysis result of GBM model

Figure 16. Sensitivity analysis result of RF model

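Table 2. Optimal hyperparameters selection for evaluation metrics comparison

| Objective | Model | Metric value | Number of Trees | Learning Rate | Max Tree Depth |
|---|---|---|---|---|---|
| Minimizing (MAE)(1) | GBM | 0.7917 | 150 | 0.20 | 5 |
| Minimizing (MAE)(1) | RF | 0.7672 | 50 | 0.10 | 3 |
| Minimizing (MSE)(2) | GBM | 3.6983 | 150 | 0.10 | 7 |
| Minimizing (MSE)(2) | RF | 3.4599 | 50 | 0.10 | 3 |
| Maximizing R-squared(3) | GBM | 0.9450 | 50 | 0.10 | 5 |
| Maximizing R-squared(3) | RF | 0.9433 | 50 | 0.01 | 3 |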

Notes: (1), (2): These hyperparameters are the best combinations for minimizing the absolute and squared differences between the predicted and actual values of GDP growth. Lower MAE and MSE values indicate that the model's predictions are, on average, closer to the true values. (3): These hyperparameters maximize the coefficient of determination (R-squared), which indicates the proportion of variance in the GDP growth data explained by the model. Higher R-squared values indicate that the model captures a larger portion of the variability in the data.

4. Conclusions

In this study, we applied machine learning techniques to forecasting GDP growth from economic indicators and compared the performance of the VAR, GBM, and RF models in predicting GDP growth from historical economic data. The comparative and sensitivity analyses provide insight into the effectiveness and versatility of the ML models for this task. By systematically evaluating their performance against an established technique and assessing their robustness to variations in hyperparameter settings, we provide an assessment of the suitability of machine learning models for economic forecasting applications. These findings contribute to the understanding of forecasting methodologies and can inform economic analysis and policy formulation by helping policymakers, economists, and analysts make more informed decisions and better navigate the complexities of the global economy.

Based on the findings of this study, several recommendations can be made for applying these forecasting models to the Algerian economy. First, policymakers should consider employing GBM for scenarios requiring high predictive accuracy and detailed insights into the key drivers of economic growth. This can be particularly useful for strategic planning and policy formulation. Second, RF should be used in situations where robustness to noise and outliers is crucial, such as in the presence of irregular economic shocks or when dealing with highly volatile data. Third, the choice of model should take into account the specific requirements of the forecasting task, including the availability of computational resources and the need for interpretability versus predictive power. Lastly, continuous monitoring and validation of the models should be implemented to ensure their reliability and accuracy over time, adapting to any structural changes in the economy. By leveraging the strengths of both GBM and RF, Algerian policymakers can enhance their economic forecasting capabilities, leading to more informed and effective decision-making.

Despite the promising results, this study has several limitations. First, the analysis was confined to a limited set of economic variables, which may not fully capture the multifaceted nature of Algeria's economic dynamics. Future research should consider incorporating a broader range of economic indicators, such as employment rates and foreign direct investment, to provide a more comprehensive analysis. Second, the data used in this study were sourced from a single database, which might introduce biases or overlook other critical data sources. Expanding the data sources to include international databases and more granular local data could enhance the robustness of the forecasts. Third, while the GBM and RF models were chosen for their robustness and accuracy, exploring other advanced machine learning techniques such as neural networks or hybrid models could provide additional insights and potentially improve forecasting performance. Finally, longitudinal studies that evaluate the performance of these models over extended periods and under varying economic conditions would be valuable in assessing their long-term applicability and reliability.

Acknowledgment

The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this paper. Furthermore, no conflict of interest exists.

Nomenclature

| Abbreviation | Meaning |
|---|---|
| GDP | Gross Domestic Product |
| M, I | Money Supply, Inflation |
| VAR | Vector Autoregressive |
| ML | Machine Learning |
| GBM | Gradient Boosting Machine |
| RF | Random Forest |
| DT | Decision Trees |
| SVM | Support Vector Machines |
| SVR | Support Vector Regression |

Appendix

Appendix Table 1. ADF table

| | h | P Value | Stat | C Value | Lags | Alpha | Model | Test |
|---|---|---|---|---|---|---|---|---|
| Test1 | True | 0.001 | -15.513 | -1.9421 | 0 | 0.05 | {'AR'} | {'T1'} |
| Test2 | True | 0.001 | -10.703 | -1.9421 | 1 | 0.05 | {'AR'} | {'T1'} |
| Test3 | True | 0.001 | -8.7858 | -1.9421 | 2 | 0.05 | {'AR'} | {'T1'} |
| Test4 | True | 0.001 | -9.9026 | -1.9421 | 3 | 0.05 | {'AR'} | {'T1'} |
| Test5 | True | 0.001 | -8.3068 | -1.9421 | 4 | 0.05 | {'AR'} | {'T1'} |

AR-Stationary 3-Dimensional VAR (4) Model

Effective Sample Size: 231

Number of Estimated Parameters: 39

LogLikelihood: -7885.44

AIC: 15848.9

BIC: 15983.1
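The summary figures above are internally consistent, which can be checked with a short calculation. The sketch assumes the standard definitions `AIC = 2k - 2 log L` and `BIC = k ln(n) - 2 log L`, and that a 3-variable VAR(4) with a constant has 3 + 4 x 3 x 3 = 39 coefficients. These formulas are not stated in the source, and the variable names are illustrative.

```python
import math

n_series = 3               # GDP growth, money supply, inflation
n_lags = 4                 # VAR(4)
n_obs = 231                # effective sample size reported above
log_likelihood = -7885.44  # reported log-likelihood

# One constant per equation plus one (n_series x n_series) matrix per lag.
n_params = n_series + n_lags * n_series * n_series

aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

print(n_params)       # 39
print(round(aic, 1))  # 15848.9
print(round(bic, 1))  # 15983.1
```

The recomputed values match the reported parameter count, AIC, and BIC to the precision shown, which suggests that the information criteria count only the constants and lag coefficients.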

Appendix Table 2. Summary of estimated model parameters

| Parameter | Value | Standard Error | T Statistic | P Value |
|---|---|---|---|---|
| Constant(1) | 1.0926 | 0.55812 | 1.9576 | 0.050281 |
| Constant(2) | 0 | 2.0601e+10 | 0 | 1 |
| Constant(3) | 0.11735 | 0.55785 | 0.21036 | 0.83339 |
| AR{1}(1,1) | -0.0268 | 0.062262 | -0.43054 | 0.66681 |
| … | … | … | … | … |
| AR{2}(1,1) | -0.0274 | 0.062279 | -0.44097 | 0.65924 |
| … | … | … | … | … |
| AR{3}(1,1) | -0.0235 | 0.062283 | -0.37832 | 0.7052 |
| … | … | … | … | … |
| AR{4}(1,1) | 0.44418 | 0.062262 | 7.1341 | 9.7428e-13 |

Implementation Details of ML Algorithms (Sensitivity Analysis of RF and GBM Models)

1. Gradient Boosting Machine (GBM):

  1. Hyperparameters and Rationale:
    • Learning Rate: Set to 0.01, 0.1, and 0.2 to balance learning speed and model complexity.
    • Number of Trees: Varies between 50, 100, and 150 to explore different ensemble sizes.
    • Max Depth: Explores depths of 3, 5, and 7 to control model complexity and overfitting.
  2. Hyperparameter Tuning:
    • A grid search was conducted over:
      • Learning rate (0.01, 0.1, 0.2),
      • Number of trees (50, 100, 150),
      • Max depth (3, 5, 7).
    • Cross-validation involved a 5-fold split to ensure robust evaluation.

2. Random Forest (RF):

  1. Hyperparameters and Rationale:
    • Number of Trees: Explores 50, 100, and 150 to balance between model performance and computational efficiency.
    • Max Features: Uses 'NumPredictorsToSample', which corresponds to 'sqrt' in RF, limiting the number of features considered per split.
    • Min Samples Split and Leaf: Default settings (not explicitly set in code) are used to allow the model to make informed decisions without excessive pruning.
  2. Hyperparameter Tuning:
    • A grid search was conducted over:
      • Number of trees (50, 100, 150),
      • Max features ('auto', 'sqrt', 'log2').
    • Cross-validation (5-fold) ensured robust evaluation and optimal hyperparameter selection.

3. Validation Methods:

  1. Cross-Validation:
    • Both GBM and RF models underwent 5-fold cross-validation:
      • Data was split into 5 subsets (Data partitioning technique).
      • Each model was trained on 4 subsets and validated on the remaining subset.
      • This process was repeated 5 times to cover all data points.
  • 5-fold cross-validation helps in mitigating overfitting and provides a reliable estimate of model performance on unseen data.
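The grid search and cross-validation procedure described above can be sketched with the Python standard library alone. The sketch only enumerates the hyperparameter grids listed in this appendix and builds 5-fold index splits; no model is trained. The helper names `expand_grid` and `k_fold_indices`, the use of contiguous (unshuffled) folds, and the sample size of 231 (borrowed from the VAR summary) are assumptions made for illustration, since the source does not describe how the folds were formed.

```python
from itertools import product

# Hyperparameter grids as listed in this appendix.
gbm_grid = {
    "learning_rate": [0.01, 0.1, 0.2],
    "n_trees": [50, 100, 150],
    "max_depth": [3, 5, 7],
}
rf_grid = {
    "n_trees": [50, 100, 150],
    "max_features": ["auto", "sqrt", "log2"],
}


def expand_grid(grid):
    """Return every combination of the grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Yields (train_indices, validation_indices) pairs, one per fold.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


gbm_candidates = expand_grid(gbm_grid)  # 3 x 3 x 3 combinations
rf_candidates = expand_grid(rf_grid)    # 3 x 3 combinations
folds = list(k_fold_indices(231, k=5))  # sample size assumed for illustration

# A full search would fit each candidate on every training fold and average
# the validation MAE, MSE, and R-squared; that step is omitted here.
print(len(gbm_candidates), len(rf_candidates), [len(v) for _, v in folds])
```

With 5 folds, these grids imply 27 x 5 = 135 model fits for GBM and 9 x 5 = 45 for RF, which is one reason the GBM search is the more expensive of the two.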
References

[1] Ali, A., Fatima, N., Ali, B.J.A.R., Husain, F. (2023). Imports, exports and growth of gross domestic product (GDP)-A relational variability analysis. International Journal of Sustainable Development and Planning, 18(6): 1681-1690. https://doi.org/10.18280/ijsdp.180604

[2] World Bank. (2024). Algeria Economic Update. Investing in Data for Diversified Growth. Retrieved from World Bank.

[3] Mahmoudi, O., Bouami, M.F., Badri, M. (2022). Arabic language modeling based on supervised machine learning. Revue d'Intelligence Artificielle, 36(3): 467-473. https://doi.org/10.18280/ria.360315

[4] Reddy, D.M.S., Neerugatti, U.R. (2023). A comparative analysis of machine learning models for crop recommendation in India. Revue d'Intelligence Artificielle, 37(4): 1081-1090. https://doi.org/10.18280/ria.370430

[5] Talib, M.M., Croock, M.S. (2024). Optimizing energy consumption in buildings: Intelligent power management through machine learning. Mathematical Modelling of Engineering Problems, 11(3): 765-772. https://doi.org/10.18280/mmep.110321

[6] Nasrallah, H.S., Stepanyan, I.V., Nassrullah, K.S., Florez, N.J.M., AL-Khafaji, I.M.A., Zidoun, A.M., Sekhar, R., Shah, P., Parihar, S. (2024). Elevating mobile robotics: Pioneering applications of artificial intelligence and machine learning. Revue d'Intelligence Artificielle, 38(1): 351-363. https://doi.org/10.18280/ria.380137

[7] Baker, M.R., Mahmood, Z.N., Shaker, E.H. (2022). Ensemble learning with supervised machine learning models to predict credit card fraud transactions. Revue d'Intelligence Artificielle, 36(4): 509-518. https://doi.org/10.18280/ria.360401

[8] Richardson, A., van Florenstein Mulder, T., Vehbi, T. (2021). Nowcasting GDP using machine-learning algorithms: A real-time assessment. International Journal of Forecasting, 37(2): 941-948. https://doi.org/10.1016/j.ijforecast.2020.10.005

[9] Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): 1189-1232.

[10] Breiman, L. (2001). Random forests. Machine Learning, 45: 5-32. https://doi.org/10.1023/A:1010933404324

[11] Simkins, S. (1995). Forecasting with vector autoregressive (VAR) models subject to business cycle restrictions. International Journal of Forecasting, 11(4): 569-583. https://doi.org/10.1016/0169-2070(95)00616-8

[12] Touitou, M., Laib, Y., Boudeghdegh, A. (2019). The impact of exchange rate on economic growth in Algeria. In CBU International Conference Proceedings, Prague, Czech Republic, pp. 323-330.

[13] Fekir, H., Bouras, N. (2022). The impact of financial development on economic growth: Empirical evidence from Algeria. Beam Journal of Economic Studies, 6(2): 689-700.

[14] Haouas, A., Ochi, A., Labidi, M.A. (2024). Sources of Algeria's economic growth, 1979-2019: Augmented growth accounting framework and growth regression method. Regional Science Policy & Practice, 16(3): 12448. https://doi.org/10.1111/rsp3.12448

[15] Ayad, H., Hassoun, S.E.S., Mostefa, B. (2020). Causality between government expenditure and economic growth in Algeria: Explosive behavior tests and frequency domain spectral causality. Economic Computation & Economic Cybernetics Studies & Research, 54(2): 315-332. https://doi.org/10.24818/18423264/54.2.20.19

[16] Messaoudi, M. (2023). Predicting the impact of fiscal and monetary policy on economic growth in Algeria during the period 1980-2022. Journal of Al Mayadine Al Iktissadia, 6(1): 487-508.

[17] Gogas, P., Papadimitriou, T., Sofianos, E. (2022). Forecasting unemployment in the euro area with machine learning. Journal of Forecasting, 41(3): 551-566. https://doi.org/10.1002/for.2824

[18] Katris, C. (2019). Prediction of unemployment rates with time series and machine learning techniques. Computational Economics, 55(2): 673-706. https://doi.org/10.1007/s10614-019-09908-9

[19] Sermpinis, G., Stasinakis, C., Theofilatos, K., Karathanasopoulos, A. (2014). Inflation and unemployment forecasting with genetic support vector regression. Journal of Forecasting, 33(6): 471-487. https://doi.org/10.1002/for.2296

[20] Yoon, J. (2021). Forecasting of real GDP growth using machine learning models: Gradient boosting and random forest approach. Computational Economics, 57(1): 247-265. https://doi.org/10.1007/s10614-020-10054-w

[21] Velidi, G. (2022). GDP prediction for countries using machine learning models. Journal of Emerging Strategies in New Economics, 1(1): 41-49. https://doi.org/10.56801/jesne.v1i1.5

[22] Ghosh, S., Ranjan, A. (2023). A machine learning approach to GDP nowcasting: An emerging market experience. Bulletin of Monetary Economics and Banking, 26: 33-54. https://doi.org/10.59091/1410-8046.2055

[23] Shams, M.Y., Tarek, Z., El-kenawy, E.S.M., Eid, M.M., Elshewey, A.M. (2024). Predicting Gross Domestic Product (GDP) using a PC-LSTM-RNN model in urban profiling areas. Computational Urban Science, 4(1): 3. https://doi.org/10.1007/s43762-024-00116-2

[24] Hamiane, S., Ghanou, Y., Khalifi, H., Telmem, M. (2024). Comparative analysis of LSTM, ARIMA, and hybrid models for forecasting future GDP. Ingénierie des Systèmes d’Information, 29(3): 853-861. https://doi.org/10.18280/isi.290306

[25] Sahed, A., Kahoui, H., Mekidiche, M. (2023). Forecasting Algerian GDP using adaptive neuro-fuzzy inference system during the period 1990-2019. Journal of Studies in Economics and Growth, 7(2): 81-106. 

[26] Box, G.E.P., Jenkins, G.M., Reinsel, G.C., Ljung, G.M. (2009). Time Series Analysis: Forecasting and Control. John Wiley & Sons.

[27] Hastie, T., Tibshirani, R., Friedman, J.H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.