Optimizing Lettuce Crop Yield Prediction in an Indoor Aeroponic Vertical Farming System Using IoT-Integrated Machine Learning Regression Models

ABSTRACT


INTRODUCTION
The growing population, climate change, and food constraints have led to increasing interest in alternative farming methods such as hydroponics and aeroponics. These methods offer year-round harvests, weather protection, easy transportation, support for various crop cultivars, and disease-free practices, making them crucial for addressing global food security concerns. Aeroponics, a soilless method with an innovative tower structure, has shown crop yield improvements ranging from 7% to 65%, accelerated crop maturation rates, and optimized water, pesticide, and fertilizer consumption when compared to traditional farming techniques [1,2].
Soil-free cultivation uses hydroponic or aeroponic systems to grow plants without soil. Hydroponics involves submerging roots in nutrient solutions, while aeroponics aerosolizes the solution. These systems offer a controlled environment and easy nutrient manipulation, making them ideal for genetic studies and screening mutant phenotypes. Aeroponic systems are more efficient because they suspend the roots in a mist of fine particles, which improves oxygen exposure [3][4][5].
Artificial Intelligence (AI) has shown potential in fields like healthcare, robotics, and meteorology, and it can likewise improve crop yield prediction. It can enhance efficiency and accuracy in agricultural yield prediction by optimizing parameters like light exposure, nutrient supply, and temperature [2,6,7,8]. In aeroponic systems, AI techniques such as machine learning algorithms play a vital role, especially in data analysis, real-time growth monitoring, resource management, and predictive modeling.
This study stresses the importance of accurately predicting aeroponic crop yields in modern farming. Accurate prediction lets farmers use advanced machine learning algorithms to make the best use of their resources, devise effective farming strategies, and cut down on losses [9,10]. Accurate yield prediction in aeroponic systems is crucial for food production and resource management. It optimizes factors like yield, crop appearance, nutritional content, quality, and taste while minimizing the use of resources such as nutrients, water, and energy, which in turn lowers costs.
Real-time aeroponic systems require improved decision-making processes and more accurate yield prediction models to address the aforementioned factors, which are elaborated on in this research work. The remainder of this document comprises four sections: Section 2 provides an overview of existing works, while Section 3 details the methods used to collect and analyze data, including the implementation of machine learning models and interpretability techniques. Section 4 presents the findings and their implications, while Section 5 summarizes key takeaways and potential areas for further research.

SURVEY OF LITERATURE
The literature survey on lettuce yield predictions in aeroponic vertical farming systems using machine learning regression algorithms is a critical examination of precision agriculture research. The survey focuses on lettuce cultivation in aeroponic vertical farming and aims to identify trends, methodologies, and key findings in predictive modeling for yield outcomes. This exploration not only establishes the theoretical foundation for future research but also contributes insights for developing robust predictive models tailored to the unique challenges and opportunities presented by aeroponic vertical farming in lettuce cultivation.
Nutrient sensors measure the plant environment and transmit the data over wireless networks to determine the nutrients necessary for plant growth, such as nitrogen, phosphorus, and potassium, which are crucial for vertical or closed crop cultivation [11].
The study introduces the Lettuce Crop Development Monitoring-Boost (LCGM-Boost) regression model, which improves lettuce crop monitoring and predicts yield in aeroponic vertical farming systems. The model considers pH, EC, PPM, turbidity, and temperature parameters. It shows robustness against outliers, superior prediction accuracy, and reduced error rates. This model is suitable for automating lettuce crop growing settings and predicting yield [12].
Aeroponics, a soilless farming technique, has been significantly transformed by technology, offering environmental control, automated nutrient delivery, and plant health monitoring. The most common technologies are sensing technology and Industry 4.0, which offer sustainability and time efficiency. However, technical complexity and power dependency pose challenges. The Technology Adoption and Integration in Sustainable Agriculture (TAISA) model assesses technology integration in sustainable agriculture systems. Asia leads in technology integration, with Indonesia being the most studied country. As technology advances, careful consideration of benefits and limitations will lead to more efficient, productive, and resilient aeroponic cultivation systems [13].
The study assesses the use of Support Vector Regression (SVR) in estimating crop yields using the LCGMS Regression model, revealing environmental factors affecting crop growth. It suggests that future research should focus on improving evaluation indices and data features for evidence-based decision-making, food security, and sustainable agricultural practices [14].
The authors have developed a meta-heuristic optimization technique for diagnosing heart disease using sound waves. The method uses Particle Swarm Optimization, the Firefly approach, and the Cuckoo Search Algorithm (CSA) to find the most optimal feature vector. The approach is evaluated on the Pascal dataset, which is divided into separate sets for testing and training. Machine learning methods like Random Forest, K-Nearest Neighbors, Support Vector Machines, and Naive Bayes are used. The model achieved the highest classification accuracy of 90.32% using CSA and Naive Bayes [15].
The article suggests using shape curvature and multi-feature fusion for weed identification in crops. Shape curvature is useful for shape-based identification, while texture features provide discriminatory information. Combining both is advantageous. The SVM classifier outperformed other classifiers with 99.33% classification accuracy, potentially benefiting autonomous weed management systems by reducing false negative rates [16].
The study presents a high-throughput architecture for detecting anomalies in streaming data using an Apache Kafka-powered model. The RF algorithm achieves average accuracy, precision, recall, F-score, and computation time values of 98.6%, 91.8%, 90.4%, 91.09%, and 38.5 ms, respectively. However, it exhibits over-fitting tendencies when dealing with small-sized data. The architecture's ability to channel data without loss and its consistent accuracy make it feasible for real-life applications [17].
A machine learning framework has been developed to assess students' satisfaction with online admissions counseling. The framework uses a Decision Tree Classifier without SMOTE and SVC-linear with SMOTE to estimate satisfaction rates. The Decision Tree Classifier without SMOTE achieved an accuracy of 48%, and SVC-linear with SMOTE achieved 88%, allowing for the optimization of students' choices based on their strengths, weaknesses, and related parameters [18].
Franchetti et al. [19] used 3D plant modeling and deep segmentation techniques to forecast plant growth for basil phenotyping using plant height, leaf area, and leaf weight as features; the accuracy was moderate. In another article, the authors used random forest and SVM for predicting rosette phenotyping with plant leaves as the feature [20]. The LSSVM machine learning framework was proposed to detect water stress in the wheat crop, with the plant leaf used as an essential feature [21]. Techniques like Self-Organizing Maps (SOM), hierarchical clustering, and the k-means algorithm were utilized for lettuce crop growth prediction with the plant leaf as the extracted feature and achieved higher accuracy rates [22]. Data visualization and logistic regression approaches were used for analyzing the distribution of a lettuce crop dataset and produced average error rates while predicting the lettuce yield [23]. In Mamatha and Kavitha [24], K-nearest neighbors was implemented for predicting the yield of leafy vegetables; it used plant growth as the feature vector and produced a higher prediction accuracy. Reinforcement learning has been adopted for the phenotyping of chili, bean, potato, and onion crops with a prediction accuracy of 83.563%; this work extracted plant leaves as the observed features for learning purposes [25]. The authors of [26] analyzed the effectiveness of the random forest regression model in predicting the aeroponic lettuce crop yield.
From these previous studies, it is inferred that most authors have applied integrated IoT and ML algorithms. Hence, this manuscript compares those ML algorithms, along with their specific advantages, to determine which model is better for predicting aeroponic lettuce crop yield.

AEROPONIC LETTUCE YIELD PREDICTION
This section deals with the prediction of the growth stages and harvesting of Lactuca sativa, the botanical name of the lettuce crop. Yield prediction usually involves two different methodologies: 1) a manual approach and 2) a technology-driven approach. Both techniques are explained in brief in the following subsections. To increase lettuce crop production through vertical aeroponic systems, this research examines a twofold methodology that blends conventional techniques (manual or traditional) with advanced technology (technology-driven applications) to accurately predict yields. Given the distinctive features of aeroponic systems, it is crucial to adopt an integrated approach that combines tried-and-true agricultural practices with state-of-the-art tools to achieve optimal results.

Predicting the yield of aeroponic lettuce: manual and technology-driven methods
The basic comparison of yield prediction using the manual approach and the technology-driven approach is presented in Table 1.

Table 1. Manual versus technology-driven yield prediction

Parameter: Data Collection
Manual Approach: Conventional methods involve collecting data through visual evaluations of plant health, nutrient availability, and growth patterns, allowing researchers to identify and document key factors for qualitative analysis.
Technology-Driven Approach: Sensor technology such as Internet of Things devices and environmental sensors provides real-time quantitative data on crop growth variables, ensuring continuous observation and increased accuracy in information gathering.

Parameter: Model Development
Manual Approach: Prediction models are enhanced by the addition of experts' subject knowledge. In building models, leaf color, size, and general health of plants are taken into account along with information gathered manually.
Technology-Driven Approach: ML algorithms such as ensemble techniques, neural networks, or regression models process sensor data to identify complex patterns, providing lettuce yield estimates within a quantitative framework.

Parameter: Performance Evaluation
Manual Approach: Agricultural specialists draw on their extensive expertise to conduct a qualitative analysis of the models' effectiveness, usefulness, and applicability.
Technology-Driven Approach: ML models' accuracy and efficacy are evaluated using quantitative measures like mean squared error and R-squared values, providing a basis for identifying reliable prediction algorithms.

Machine learning in lettuce yield prediction
With their advanced analytical ability to cope with the complexity of agricultural systems, machine learning (ML) algorithms have become a potent tool in the prediction of lettuce crop yields. In this work, we have utilized different machine learning regression models that have a strong impact on the yield prediction of the lettuce crop.

Utilized machine learning models
Linear Regression. Linear regression is a fundamental and interpretable machine learning regression model that predicts numerical values using a linear equation. In an aeroponic lettuce crop yield prediction system, the model estimates the linear relationship between one input variable and the output variable. It is mathematically represented as:
$$y = mx + b \tag{1}$$
where y is the dependent variable (crop yield), x is the independent variable (input parameter), m is the slope, and b is the intercept term.
Multiple Linear Regression Model. Linear regression models are simple approaches for finding the relationship between two variables, the input and the output variable. For more complex relationships, multiple linear regression (MLR) models are used to find the relationship between multiple input variables and the output variable, i.e., multiple independent variables are used to estimate the outcome of a single dependent variable. This regression analysis has two main uses: 1) to determine the dependent variable based on multiple independent variables and 2) to determine how strong the relationship is between the variables.
Multiple linear regression is often used when forecasting more complex relationships. In an aeroponic lettuce crop yield prediction system, multiple regression models can make effective predictions on new and unseen data. The coefficients of the feature variables are determined, which allows growers to make informed decisions about crop behavior and yields. Equation 2 is the mathematical representation of the MLR:
$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \tag{2}$$
where y is the dependent variable (crop yield), [x1, x2, …, xn] are the independent variables (input parameters), and [b0, b1, b2, …, bn] are the coefficients.
Support Vector Regression. Support Vector Regression (SVR) is a supervised machine learning algorithm that works similarly to the SVM algorithm. The model fits a hyperplane to the data points so as to minimize the error between the actual and predicted values. In an aeroponic lettuce yield prediction system, the dependent variable (lettuce yield) is predicted from the independent variables, which are the environmental factors for growing the lettuce crop, with the help of kernel functions that transform non-linear relationships into linear problems. This allows SVR to handle the complex relationship between the environmental factors and the yield. SVR allows hyper-parameter tuning, which improves the accuracy of the prediction model so that it better fits the dataset. Like other regression models, SVR can be iteratively improved by incorporating new and unseen data.
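The mathematical formulation of the SVR objective function involves defining a hyperplane that captures the relationship between the input parameters and the output (yield). Given n data points with input parameters Xi and corresponding outputs (yields) yi, where i = 1, 2, 3, …, n, the SVR objective function can be written in two forms, as represented below.
i) In the case of a linear kernel:
$$\min_{w,\,b,\,\zeta,\,\zeta^*} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*) \quad \text{subject to} \quad y_i - (w^{\top} X_i + b) \le \varepsilon + \zeta_i, \;\; (w^{\top} X_i + b) - y_i \le \varepsilon + \zeta_i^*, \;\; \zeta_i, \zeta_i^* \ge 0 \tag{3}$$
ii) In the case of a non-linear kernel:
$$\min_{w,\,b,\,\zeta,\,\zeta^*} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\zeta_i + \zeta_i^*) \quad \text{subject to} \quad y_i - (w^{\top} \phi(X_i) + b) \le \varepsilon + \zeta_i, \;\; (w^{\top} \phi(X_i) + b) - y_i \le \varepsilon + \zeta_i^*, \;\; \zeta_i, \zeta_i^* \ge 0 \tag{4}$$
where ϕ(Xi) is the transformation of Xi into a high-dimensional space.
In these equations, w and b are the parameters to be learned from the training data, ε is the width of the margin within which deviations are not penalized, ζi and ζi* are slack variables allowing for deviations from the actual output, and C is a regularization parameter controlling the trade-off between model simplicity and accuracy.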
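To make the role of C and the slack variables concrete, the sketch below evaluates the linear-kernel SVR objective for one fixed candidate hyperplane. It relies on the standard fact that, for fixed parameters, the smallest feasible slack of a sample is its epsilon-insensitive loss max(0, |yi − prediction| − ε). The function name, the toy readings, and the values of C and epsilon are assumptions made for illustration; none of them come from the study.

```python
def svr_objective(w, b, X, y, C=1.0, epsilon=0.1):
    """Linear-kernel SVR objective: 0.5 * ||w||^2 + C * (sum of slack values).

    For a fixed (w, b), the smallest feasible slack of a sample equals its
    epsilon-insensitive loss max(0, |y_i - (w . x_i + b)| - epsilon).
    """
    margin_term = 0.5 * sum(wj * wj for wj in w)
    slack_total = 0.0
    for xi, yi in zip(X, y):
        prediction = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slack_total += max(0.0, abs(yi - prediction) - epsilon)
    return margin_term + C * slack_total


# Hypothetical samples (pH, temperature) -> yield; not the study's data.
X = [(6.0, 22.0), (6.5, 24.0), (5.8, 21.0)]
y = [100.0, 110.0, 95.0]

# A larger C makes deviations beyond epsilon more expensive.
for C in (0.1, 1.0, 10.0):
    print(C, svr_objective(w=(10.0, 2.0), b=-5.0, X=X, y=y, C=C, epsilon=0.5))
```

In practice a solver searches over w and b; this sketch only scores one candidate, which is enough to see how C trades model simplicity against accuracy.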
Random Forest Regression. The Random Forest (RF) is a collection of multiple decision trees whose outputs are combined for prediction. It is an ensemble learning approach that combines the output of multiple weak learners to improve the accuracy and robustness of the model. Each decision tree works with a random subset of features, which promotes diversity and improves the chances of better predictions. It is capable of handling missing values without requiring external preprocessing techniques. The model can also handle larger datasets effectively. In an aeroponic vertical farming system, the RF captures the complex interactions between the dependent and the independent features. One of the main advantages of RF regression is that it mitigates the overfitting problem thanks to the randomness in the feature selection. With the help of feature importance, growers can gain insights into the input parameters that have the most significant impact on the lettuce yield.
It is represented as the average of the individual tree predictions, as given below:
$$\hat{y}(X) = \frac{1}{N} \sum_{i=1}^{N} F_i(X) \tag{5}$$
where ŷ(X) is the predicted output (yield) for the given set of input parameters X, N is the number of trees in the random forest, and Fi(X) is the prediction output from the i-th decision tree.
Here, each tree Fi(X) is constructed based on a random subset of features at each split. The final prediction is the average of these individual tree predictions.
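The averaging step can be sketched with a deliberately simplified forest. In the code below each "tree" is a one-split stump trained on a bootstrap sample and a single randomly chosen feature, which is an assumption made to keep the example short; a real random forest grows deeper trees and samples features at every split. The function names and the toy readings are hypothetical.

```python
import random
from statistics import fmean


def fit_stump(rows, targets, feature):
    """Fit a one-split regression tree (stump) on a single feature."""
    threshold = fmean([row[feature] for row in rows])
    left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
    right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
    overall = fmean(targets)
    left_value = fmean(left) if left else overall
    right_value = fmean(right) if right else overall
    return lambda row: left_value if row[feature] <= threshold else right_value


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_targets = [targets[i] for i in idx]
        feature = rng.randrange(n_features)
        forest.append(fit_stump(sample_rows, sample_targets, feature))
    return forest


def forest_predict(forest, row):
    """Average of the individual tree predictions: (1/N) * sum of F_i(X)."""
    return fmean([tree(row) for tree in forest])


# Hypothetical readings (pH, temperature) and yields; not the study's data.
rows = [(5.8, 21.0), (6.0, 22.0), (6.2, 23.0), (6.5, 24.0)]
targets = [90.0, 100.0, 105.0, 115.0]
forest = fit_forest(rows, targets)
print(forest_predict(forest, (6.1, 22.5)))
```

Because every stump outputs a mean of training targets, the averaged prediction always stays within the range of the observed yields.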
XGBoost Regression. The Extreme Gradient Boosting (XGBoost) model is a powerful machine learning algorithm that excels in real-world prediction tasks. It uses a decision-tree-based ensemble model to reduce errors and improve accuracy. The learning rate controls the contribution of each decision tree, affecting the overall model's accuracy. The model is effective in aeroponic lettuce crop yield prediction, handling missing values, non-linearities, and complex relationships, and preventing overfitting. It also provides feature importance, identifying the influential environmental factors and supporting resource allocation and decision-making by growers. The model learns patterns and predicts outcomes effectively with new data.
Assuming a dataset with n observations and m features, and predicting a continuous output variable y based on the input features X, the XGBoost regression model is given by:
$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i) \tag{6}$$
where ŷi is the predicted output for observation i, ϕ(xi) is the ensemble prediction for observation i, fk(xi) is the prediction of the k-th regression tree, and K is the number of trees.
The individual regression tree prediction fk(xi) is determined by the path that observation i takes down the tree:
$$f_k(x_i) = w_{q(i,k)} \tag{7}$$
where w_{q(i,k)} is the weight associated with the terminal node q(i,k) that observation i reaches in the k-th regression tree. Hence, the overall objective function for the XGBoost regression model is the sum of the training loss and the regularization term:
$$\mathrm{Obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{8}$$
where θ represents the parameters of the model, l(yi, ŷi) represents the training loss of observation i, and Ω(fk) is the regularization term for the k-th regression tree. Note that the training loss is often the MSE for regression trees.
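The additive structure of the ensemble can be illustrated with plain gradient boosting under squared loss. This is not XGBoost itself: the sketch omits the regularization term and the second-order approximation that XGBoost uses, and it fits one-split stumps on a single feature. The function names, the learning rate, and the toy data are assumptions for illustration only.

```python
from statistics import fmean


def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature by squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = fmean(left), fmean(right)
        error = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv


def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Additive model: prediction = base + learning_rate * sum_k f_k(x)."""
    base = fmean(ys)
    trees = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        # Each new tree is fitted to the residuals of the current ensemble.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + learning_rate * tree(x) for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return fmean([(y - model(x)) ** 2 for x, y in zip(xs, ys)])


# Hypothetical growth days and yields; not the study's data.
days = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yields = [2.0, 4.1, 6.2, 7.9, 10.1, 12.0]
for rounds in (1, 5, 20):
    print(rounds, mse(boost(days, yields, n_rounds=rounds), days, yields))
```

Running the loop shows the training error shrinking as more trees are added, which is the behaviour the learning rate moderates.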

Systematic representation of lettuce yield prediction
The systematic representation, or workflow diagram, is shown in Figure 1. It is a collection of modules that describe the stepwise implementation of the proposed system. In other words, it encapsulates the workflow in a clear graphical illustration of the implementation procedure. It improves communication and provides an easy understanding of the underlying mechanism.

Figure 1. Lettuce crop yield prediction system
From Figure 1, it is clear that the implementation procedure starts with data collection and proceeds through a series of processes towards yield prediction as the outcome. The various processes are described in detail below.

Data collection and data visualization
The first step in the implementation procedure is data collection. Here, sensors such as a pH sensor, EC sensor, temperature sensor, total dissolved salts (TDS) sensor, turbidity sensor, humidity sensor, and light sensor were deployed in the aeroponic lettuce growth tower. The data were collected from the growth tower at regular intervals; sample data are shown in Figure 2. To make the data distributions easy to understand, data visualization techniques such as bar charts (a univariate data representation technique), a correlogram (a bivariate data analysis technique), and the Andrews curve were implemented using Python packages.
In Figure 2 (b-j), the parameters are represented individually with the help of bar plots.
The correlogram of the input dataset highlights the correlation between the input variables. In Figure 2(l), the considered lettuce growth parameters are only weakly correlated with one another. This indicates that the parameters are largely independent of each other, i.e., one cultivation parameter will not affect another, which is necessary for efficient lettuce growth and yield prediction.

Data preprocessing
One of the most important steps in a machine learning implementation is the pre-processing of the dataset for efficient prediction output. In the aeroponic lettuce crop yield prediction system, outliers are the major cause of higher error rates and low prediction accuracy. Hence, an outlier-removal mechanism is incorporated so that the regression models can predict effectively. The dataset size before pre-processing is reported as the old shape, and the size after pre-processing as the new shape of the dataset.
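The outlier-removal step might look like the sketch below. The text does not say which rule was used, so the interquartile-range (Tukey fence) rule with a factor of 1.5 is an assumption, as are the function names and the toy readings. The printed shapes mirror the old shape and new shape reported for the dataset.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for one column."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def remove_outliers(rows, k=1.5):
    """Drop every row with at least one value outside its column's fences."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(column, k) for column in columns]
    return [
        row for row in rows
        if all(low <= value <= high for value, (low, high) in zip(row, bounds))
    ]


# Hypothetical readings (pH, temperature in deg C); the last pH value is faulty.
readings = [(6.0, 22.1), (6.1, 22.4), (5.9, 21.8), (6.2, 22.0),
            (6.0, 22.3), (6.1, 21.9), (5.8, 22.2), (12.5, 22.0)]
cleaned = remove_outliers(readings)
print("old shape:", (len(readings), len(readings[0])))
print("new shape:", (len(cleaned), len(cleaned[0])))
```

Dropping whole rows keeps every remaining sample complete across all sensors, at the cost of discarding valid readings that share a row with a faulty one.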

Dataset splitting
Once the data is collected and pre-processed, it must be partitioned, or split, before it is fed into the ML model. For an efficient implementation of a classification or regression model, the data has to be split into two parts, training data and testing data, as shown in Figure 4.

Machine learning implementation: model training and model testing
The actual work of the implementation phase begins now. A structured methodology is used to train and evaluate machine learning regression models for predicting aeroponic lettuce crop yields. The collected, analyzed, and pre-processed datasets were fed into all four machine learning models for training. Once the models are trained, the testing phase follows. The test dataset is supplied to the trained machine learning models to test their performance. The testing scores were recorded, and based on the results, hyper-parameter tuning was carried out to improve the results further. The results produced by the models are described in detail in the results and discussions section.
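The partitioning step can be sketched as a shuffled hold-out split. The 80/20 ratio, the fixed seed, the function name, and the toy samples are assumptions for illustration; the split ratio used in the study is not stated in the text.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical samples: (growth day, yield); not the study's data.
samples = [(day, 2.0 * day + 1.0) for day in range(1, 51)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set made only of the most recent readings, and fixing the seed keeps the split reproducible across model comparisons.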

System requirements
The system requirements for carrying out the result analysis were Anaconda Navigator, Jupyter Notebook with the Python programming language, and a desktop or laptop computer with sufficient local storage.

RESULTS AND DISCUSSIONS
This section describes the performance of the different machine learning models in detail. The best model was chosen based on the error rates and the prediction accuracy produced by the model, i.e., how accurately the regression model predicts the yield of the lettuce crop in the aeroponic environment.

Evaluation of the ML models using the performance metrics along with performance analysis
Performance metrics are the basis for assessing machine learning regression models: they compare the predicted output with the actual values and quantify the accuracy of the predictions. The most commonly used evaluation metrics in lettuce yield prediction analysis are listed below.

Mean squared error (MSE)
It is the average of the squared differences between the predicted values (xi) and the actual values (yi), taken over the n samples. It penalizes larger errors more heavily.
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2 \tag{9}$$
The MSE scores of the implemented models are given in Table 2 and Figure 5. The regression models produce different error rates, and linear regression performs worse than the other regression algorithms.

Root mean squared error (RMSE)
It is the square root of the MSE. It provides a measure of the average magnitude of the errors in the predicted values, in the same units as the response variable.

$$RMSE = \sqrt{MSE} \tag{10}$$
The RMSE scores of the implemented models are given in Table 3 and Figure 6.
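A small worked example shows why the squared-error metrics penalize large errors more heavily. The two prediction sets below have the same total absolute error, but one concentrates it in a single sample. The function names and the yield values are hypothetical and serve only as illustration.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((y - x) ** 2 for y, x in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical yields; both prediction sets miss by 8 units in total.
actual = [100.0, 110.0, 120.0, 130.0]
small_errors = [102.0, 108.0, 122.0, 128.0]     # every prediction off by 2
one_large_error = [100.0, 110.0, 120.0, 138.0]  # one prediction off by 8

print(mse(actual, small_errors), rmse(actual, small_errors))        # 4.0 2.0
print(mse(actual, one_large_error), rmse(actual, one_large_error))  # 16.0 4.0
```

The single large miss quadruples the MSE and doubles the RMSE even though the summed absolute error is unchanged.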
The XGBoost regression algorithm produced the lowest RMSE score of all the regression algorithms. The random forest regression algorithm comes next, with an error rate lower than that of the remaining five regression algorithms. The highest RMSE score is produced by the linear regression model.
The MedAPE scores of the implemented models are given in Table 6 and Figure 9, and the RMSLE scores are given in Table 7 and Figure 10.

R-squared metrics (coefficient of determination)
The R-squared metric represents the proportion of the variance in the dependent variable that is predictable from the independent variables. Usually, this metric ranges between 0 and 1. A higher R-squared value indicates a better fit of the model to the data. The R-squared scores of the implemented models are given in Table 8 and Figure 11.
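The R-squared value can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses that standard definition; the function name and the yield values are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - x) ** 2 for y, x in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields; not the study's data.
actual = [100.0, 110.0, 120.0, 130.0]
print(r_squared(actual, [102.0, 108.0, 122.0, 128.0]))  # close fit
print(r_squared(actual, [115.0, 115.0, 115.0, 115.0]))  # always predicts the mean
```

A model that always predicts the mean of the actual values scores 0, and a model that fits worse than that scores below 0, which is why the 0 to 1 range holds only for reasonably fitted models.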

Comparative analysis of the performance metrics
This sub-section presents the performance analysis of all the utilized ML models. The results are depicted in Table 9 and Figures 12 and 13.

Figure 12. Performance graph of the utilized regression models
All these models are compared individually using only the dataset collected for this work. Based on the results (refer to Table 9, Figure 12, and Figure 13) produced by linear regression, the Support Vector Regressor with its sigmoid, linear, radial basis function (RBF), and polynomial kernels, random forest, and XGBoost regression, there is a consistent improvement in predictive performance and a corresponding decrease in the error metrics across these models. Notably, the XGBoost regression model achieves the highest R-squared value, 0.89.

Prediction graphs
Prediction graphs showcase the predictive performance of supervised and unsupervised machine learning classification and regression models by depicting the relationship between the original (actual) values and the predicted values. These graphs are used to perform a comprehensive analysis of various prediction algorithms and to depict the efficacy of each algorithm separately. They not only highlight the individual strengths of each model but also contribute valuable insights into the applicability of each model for predicting the complex relationships between the variables or parameters within the dataset. In this research work, the linear regression prediction graph in Figure 14 (a) highlights the linear relationship between the actual values and the predicted values. From Figure 14, it is clear that the prediction accuracy gradually increases across the support vector kernels, from sigmoid through RBF and linear to the polynomial kernel. Each kernel exhibits a distinctive pattern, representing an average fit to the dataset and improving the model's ability to capture non-linearities. The random forest and XGBoost regressors come next and showcase remarkable accuracy, illustrating their robustness to outliers and noise by capturing the complex relationships within the dataset.

Choosing the best model using the training and validation loss curves
In simple terms, both the training loss curve and the validation loss curve are crucial in machine learning regression because they show how well the ML model generalizes to unseen data, i.e., the model should produce the same quality of output (the predicted lettuce crop yield, in our case) on an unseen dataset from the external environment as it does on the data seen during training.

CONCLUSION AND FUTURE SCOPE
In summary, the purpose of the study was to optimize lettuce crop growth by integrating precision agriculture practices with intelligent techniques. The results comprehensively analyzed the performance of several machine learning regression models in the context of a vertical aeroponic farming system to make accurate predictions of lettuce production. Through in-depth analysis and comparison of models such as linear regression, support vector regression, and random forest regression, we have gained valuable insights into their effectiveness in handling the complex interactions between environmental variables such as pH, EC, temperature, total dissolved salts (TDS), turbidity, humidity, light, and growth in days that are inherent to aeroponic cultivation.
According to the findings of our research, XGBoost surpasses the others in terms of error rates, accuracy, and predictive power, demonstrating its potential as an excellent option for the prediction of lettuce production in aeroponic vertical farming. However, agricultural systems have multiple aspects, and the most appropriate model may change depending on the environmental conditions.
This research work contributes to the expanding body of knowledge in the field of precision agriculture, specifically aeroponic indoor farming systems, by providing practical recommendations on the application of machine learning regression models to the problem of maximizing the output of lettuce grown in aeroponic conditions. The research work enhances crop prediction in vertical farming systems, paving the way for future research and technology interventions to improve agricultural practices, reduce environmental impact, and enhance crop production. It also encourages competition in crop markets by incorporating diversification and crop rotation strategies, minimizing resource usage and promoting short-term growth while minimizing pests, diseases, and climatic variability.
The future development of the Aeroponic Lettuce Yield Prediction System is focused on enhancing its accuracy and reducing errors. This involves investigating various factors such as environmental conditions, nutrient levels, and plant growth patterns. In addition, the team plans to employ advanced machine learning techniques like ensemble learning and data augmentation to optimize model performance. Real-time sensor data integration and the use of pre-trained models are also part of the roadmap to further boost prediction capabilities. To make the system easy to use for farmers and operators, an intuitive interface with clear visualizations and actionable insights will be implemented.
Figure 2. Sample dataset and dataset visualization techniques: (a) Sample dataset; (b) pH data distribution in the dataset; (c) TDS data distribution in the dataset; (d) Temperature data distribution in the dataset; (e) EC data distribution in the dataset; (f) Turbidity data distribution in the dataset; (g) Humidity data distribution in the dataset; (h) Light data distribution in the dataset; (i) Growth data distribution in the dataset; (j) Yield data distribution in the dataset; (k) Andrews curve of the pH dataset; (l) Correlogram (correlation) of the input dataset
Figure 3. Data preprocessing: (a) After preprocessing the dataset; (b) Boxplot representation after preprocessing
The boxplot represents the dataset after pre-processing. The x-axis shows the features of the lettuce growth dataset collected from the aeroponic vertical farming tower, indexed 0 to 8 (pH to yield), as highlighted in Figure 3 (a) and (b).

Figure 4. Dataset splitting

Figure 5. Graph for MSE score

Figure 6. Graph for RMSE score of the utilized models

Mean absolute error (MAE)
It computes the average of the absolute differences between the predicted (xi) and the actual values (yi), providing a measure of the average magnitude of the errors.
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - x_i \rvert \tag{11}$$
The MAE scores are highlighted in Table 4 and Figure 7.

Figure 8. MAPE scores of the utilized models

Median absolute percentage error (MedAPE)
It is the median of the absolute percentage errors, making it less sensitive to outliers than MAPE.
$$MedAPE = \mathrm{median}\left( \left\lvert \frac{y_i - x_i}{y_i} \right\rvert \right) \times 100 \tag{13}$$
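The difference in outlier sensitivity between the mean and the median of the absolute percentage errors can be seen in a short example. The function names and the values below are hypothetical; the last prediction is deliberately far off.

```python
from statistics import fmean, median


def absolute_percentage_errors(actual, predicted):
    """Per-sample absolute error as a percentage of the actual value."""
    return [abs((y - x) / y) * 100.0 for y, x in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error."""
    return fmean(absolute_percentage_errors(actual, predicted))


def medape(actual, predicted):
    """Median absolute percentage error."""
    return median(absolute_percentage_errors(actual, predicted))


# Hypothetical yields; the last prediction is an outlier.
actual = [100.0, 100.0, 100.0, 100.0, 100.0]
predicted = [98.0, 101.0, 102.0, 99.0, 150.0]

print(mape(actual, predicted))    # pulled up by the single 50% error
print(medape(actual, predicted))  # stays near the typical 1-2% error
```

Both metrics are undefined when an actual value is zero, so they suit targets such as yield that are strictly positive.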

Figure 10. RMSLE scores of the utilized models

Figure 13. Accuracy of the utilized models

Figure 14. Prediction graphs of the utilized models

Figure 15. Training and validation loss

Table 4. MAE scores
Figure 7. MAE scores of the utilized models

Mean absolute percentage error (MAPE)
The MAPE expresses the errors as a percentage of the actual values, providing a relative measure of accuracy.
$$MAPE = \frac{100}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - x_i}{y_i} \right\rvert \tag{12}$$
Table 5 and Figure 8 present the MAPE scores obtained by the models.

Table 6. MedAPE scores
Figure 9. MedAPE scores of the utilized models

Root mean square logarithmic error (RMSLE)
It measures the average difference between the logarithms of the predicted (xi) and the actual values (yi). It is particularly useful when the target variable has a wide range.
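A minimal sketch of the RMSLE follows. It assumes the common log(1 + value) convention, which keeps the metric defined when a value is zero; the text does not state which variant the study used. The function name and the numbers are hypothetical.

```python
import math


def rmsle(actual, predicted):
    """Root mean squared difference between log(1 + predicted) and log(1 + actual)."""
    squared_log_errors = [
        (math.log1p(x) - math.log1p(y)) ** 2 for y, x in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(actual))


# Hypothetical yields spanning a wide range; each prediction is 10% too high.
actual = [10.0, 1000.0]
predicted = [11.0, 1100.0]
print(rmsle(actual, predicted))
```

Because the errors are taken on a logarithmic scale, a 10% overestimate contributes a similar amount whether the true value is 10 or 1000, which is why the metric suits targets with a wide range.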

Table 8. R-squared scores
Figure 11. R-squared scores of the utilized models

Table 9. Consolidated evaluation metrics of the ML models