Predictive Modeling of Daily Evapotranspiration in Arid Regions Using Artificial Neural Networks

ABSTRACT


INTRODUCTION
Evapotranspiration (ET) is the second-most significant component of the hydrological cycle. It is defined as the process in which water leaves the Earth's surface and plant bodies as vapor and then reaches the atmosphere. It is considered one of the most important quantities in water balance calculations for arid and semi-arid environments [1,2]. It must be accurately estimated for several purposes, including climate research, water balance studies, and water management in various agricultural activities [3][4][5].
Researchers have devoted considerable effort to studies on water resources management. A large number of these studies focus on the parameters of the hydrological cycle, mainly the water losses that result from evapotranspiration in arid and semi-arid regions [6]. Since direct measurement of the rate of evapotranspiration is challenging and the process is imperceptible, a number of models have been introduced to estimate it from meteorological data collected under various climatic conditions [7]. The physically based Penman-Monteith (PM) equation, which requires full meteorological data, is the only recommended approach for ET estimation [8,9]. For many years, one of the main issues in hydrology was how to measure transpiration from various types of plants and evaporation from open water surfaces. The method that was used for measuring evaporation is called X, and it might not be available at certain climate stations. Even where it is available, evaporation measurements can occasionally be unreliable because of factors such as improper maintenance or algae growth within the tank [10]. Most studies indicate that the interrelationships that govern evapotranspiration and the factors affecting it are unclear and non-linear [11]. The capability of artificial intelligence to mimic human brain functions gives it the ability to handle complex problems and situations [7,11,12].
Due to the importance of the evapotranspiration process, especially in dry areas where water resources are scarce, this study emerged as an attempt to create a predictive mathematical model using artificial neural networks and to devise a predictive equation, which is lacking in the study area, for estimating the water losses resulting from evapotranspiration. The climate variables that control the evapotranspiration process were identified from the weather station in the Kafr El-Sheikh zone and used as inputs to the predictive model, in addition to the reference evapotranspiration values calculated through the Penman-Monteith (PM) equation, to develop a predictive model for evapotranspiration (ET). Seven different ANN models were trained and tested using the datasets. These models include various combinations of five diurnal meteorological variables: precipitation (P), wind speed (u), maximum and minimum air temperature (Tmax and Tmin), and dew point temperature (Tdw). The optimization method was a feedforward multilayer artificial neural network. To create the final structure, the tan-sigmoid transfer function with the structure 1-5-6 was used. The study showed that adding the time indicator as an input to all groups greatly improved the prediction of ETo values, and that including rainfall as a variable had no effect on the network's performance.
Patel et al. [14] predicted weekly reference evapotranspiration (ETo) in the Sikkim region, India, using the multiple linear regression (MLR) method and an artificial neural network (ANN). The Penman-Monteith method was used to estimate ETo from daily meteorological data for the period 1985-2009. To develop the ANN and MLR models, several different sets of meteorological parameters were created from the data set. To examine the efficiency of the developed models, several performance indicators were used, such as the coefficient of determination (R²), root mean square error (RMSE), mean absolute relative error (MAPE), mean absolute error (MAE), and Nash-Sutcliffe efficiency (NSC). This study concluded that using a smaller number of meteorological variables and additional parameters increases the prediction efficiency of ANN and MLR models.
Üneş et al. [5] estimated daily evapotranspiration (ET) using the Penman-Monteith equation and the classical Hargreaves-Samani and Turc equations, and compared the results with an artificial neural network (ANN) model at Lake Hartwell, USA. Relative humidity (RH), wind speed (U), solar radiation (SR), average daily air temperature (T), and maximum and minimum daily air temperatures (Tmax, Tmin) were used. An artificial neural network with feed-forward and feedback was used to build the model. The comparison of the empirical equations with the ANN model showed that the ANN model performs better than the other models in estimating daily ET.
Ali and Faraj [10] used the Penman-Monteith equation to estimate total daily evapotranspiration by calculating the water lost through daily evaporation, using data from 22 climate stations in Iraq for the period from 2004 to 2013. According to the results, the correlation coefficient is strong. The study suggested a technique that uses the computed evaporation value to fill in missing evaporation data for climate stations in Iraq. The distribution of transpiration and evaporation in Iraq is illustrated in GIS maps. Pakhale et al. [15] estimated the daily evapotranspiration (ET) of reference grass crops using ANNs. Performance indicators were used to compare the ANNs with the traditional Penman-Monteith model used for calculating ET. The input parameters and the ANN model structure were chosen using the MARS tool and a trial-and-error methodology, according to the best error statistics. Different ANN structures, including BPNN, RBFNN, and GRNN, were employed. The results indicate that the BPNN architecture is suitable for predicting the evapotranspiration of reference crops. Later, Aghelpour et al. [16] used machine learning techniques to estimate daily evapotranspiration by an indirect method. Seven daily meteorological variables were collected from three meteorological stations from 2003 to 2016. To estimate the evapotranspiration, the Group Method of Data Handling (GMDH), Generalized Regression Neural Network (GRNN), Multilayer Perceptron (MLP), and Radial Basis Function (RBF) techniques were utilized. The GMDH model was found to produce the best estimates compared with the other models.
On the other hand, Bidabadi et al. [17] introduced a new artificial intelligence model to predict evapotranspiration in an arid region of Iran. To achieve this goal, they employed different techniques: ANN-gray wolf optimization (ANN-GWO), the adaptive neuro-fuzzy inference system (ANFIS), and the artificial neural network (ANN). They found that ANFIS produces the best estimates of ET.
Recently, Aly et al. [18] used super learning techniques to estimate evapotranspiration. This technique is used when data are lacking. Four machine learning models were employed in the study: K-Nearest Neighbor, AdaBoost Regression, Support Vector Regressor, and Extra Tree Regressor. The model was found to perform well in predicting ET.
Despite the good estimates produced by the models mentioned above, they are not applicable in the region of the current study, which has different climatic conditions and limited meteorological information.
The objective of the current study is to demonstrate the feasibility of ANN techniques for modeling daily reference ET from climatic parameters in an arid area, Ramadi city in Iraq.

STUDY AREA AND DATA COLLECTION
Ramadi city, within Anbar province, is located in the western part of Iraq. It was chosen as the study area, as shown in Figure 1. Little rainfall, high temperatures, low relative humidity, and a significant difference between day and night temperatures are the most important features of the climate of this area. The maximum temperature may exceed 50℃ during summer, while it may decline to 9℃ during winter. The average annual evaporation may reach 3000 mm. The dryness coefficient, which represents the evaporation divided by the rainfall, ranges from 25 to 35 [19]. In the present study, the climatic data for the period from 23/11/2020 to 1/10/2022 were utilized to predict the evapotranspiration. These data were collected from the digital meteorological station installed at the Upper Euphrates Basin Developing Centre at the University of Anbar, Ramadi, Iraq. The station, which lies at an altitude of 50 m above mean sea level, is located at 33.43° N latitude and 43.33° E longitude. The daily climatic data include evapotranspiration, wind speed, minimum temperature, maximum temperature, average temperature, relative humidity, and solar radiation.

ARTIFICIAL NEURAL NETWORKS (ANNs)
The design of ANNs was inspired by the brain's neural system. The majority of ANN models include three layers: the input layer, which provides data to the model; the hidden layer, which detects the characteristics of the input parameters; and the output layer, which reflects the model's response to specific inputs. In the present study, two artificial intelligence methods were used for modelling the evapotranspiration in an arid region. The introduced models are simple to apply, produce results quickly, and predict with significant accuracy in the field of hydrology.

Radial basis function neural network (RBFNN)
Powell [20] was the first to introduce RBFNNs, for solving the interpolation problem in a multidimensional space with as many centers as data points. RBFNNs are based on the concept of function approximation. An interesting feature of RBFNNs is their fast linear learning algorithm, which allows the network to model complicated non-linear mappings. RBFNNs are currently being investigated in numerical analysis and used in machine learning. The Euclidean distance from the point being evaluated to the center of each neuron is computed, and a radial basis function (RBF) is applied to that distance to determine the influence of each neuron. As shown in Figure 2, a radial basis function neural network is a three-layer network consisting of an input layer, a hidden layer, and an output layer.

Figure 2. General architecture of a RBFNN [20]
The input layer can include several predictor variables, which are connected to the independent neurons in the hidden layer. Multiple input vectors are propagated from the input layer to the hidden layer. The hidden layer includes a number of radial basis function units ($h$) with Gaussian kernels and a bias ($b_k$). The j-th Gaussian function is determined by a center ($c_j$) and a width ($\sigma_j$) [21]. The RBFNN calculates the Euclidean distance between the input vector ($x$) and the center of the Gaussian function ($c_j$), and then the nonlinear transformation is conducted in the hidden layer as presented in Eq. (1) below:

$$h_j(x) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2\sigma_j^2}\right) \tag{1}$$

where $h_j(x)$ represents the output of the j-th neuron in the hidden layer. The linear operation of the output layer is shown in the following equation:

$$y_k(x) = \sum_{j=1}^{h} w_{kj}\, h_j(x) + b_k$$

where $y_k(x)$ represents the k-th output unit for the input vector $x$, $w_{kj}$ represents the weight of the connection that links the j-th hidden layer unit and the k-th output unit, and $b_k$ represents the bias. Thus, weights ($w_1, w_2, \ldots, w_h$) are associated with each neuron in the output layer. The output of each neuron in the hidden layer is multiplied by the weight associated with that neuron and passed to the summation, which adds up the weighted values and presents the sum as the network's output. The bias enters the output layer as the weight $b_k$ applied to a constant input.
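The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the common Gaussian form exp(-||x - c_j||² / (2σ_j²)) for the hidden units and a single output unit; the function names and all numeric values are hypothetical and are not taken from the study.

```python
import math

def rbf_hidden(x, centers, widths):
    """Gaussian activations h_j(x) = exp(-||x - c_j||^2 / (2 * sigma_j^2))."""
    activations = []
    for c, sigma in zip(centers, widths):
        # Squared Euclidean distance between the input and the j-th center
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        activations.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return activations

def rbfnn_output(x, centers, widths, weights, bias):
    """Linear output layer: y = sum_j w_j * h_j(x) + b (one output unit)."""
    h = rbf_hidden(x, centers, widths)
    return sum(w * hj for w, hj in zip(weights, h)) + bias

# Illustrative values only (assumed, not from the paper)
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 0.5]
weights = [2.0, -1.0]
bias = 0.1

y = rbfnn_output([0.0, 0.0], centers, widths, weights, bias)
print(y)
```

Because the output layer is linear in the weights, the weights can be fitted by linear least squares once the centers and widths are fixed, which is the fast linear learning step mentioned in the text.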

Generalized regression neural network
The generalized regression neural network (GRNN) is a variant of the radial basis neural network. It was proposed by Specht in 1991 [22]. It can be used for prediction, classification, and regression. GRNN is an enhanced approach built on nonparametric regression. In a GRNN, each training sample serves as the mean of a radial basis neuron.
GRNN achieves high estimation accuracy with single-pass learning, and therefore it does not require backpropagation [23]. The GRNN can be mathematically represented by the following equation:

$$Y(x) = \frac{\sum_{k} y_k\, K(x, x_k)}{\sum_{k} K(x, x_k)}$$

where $Y(x)$ is the predicted value for input $x$, $K(x, x_k)$ is the radial basis function kernel (Gaussian kernel), and $y_k$ is the activation weight for the pattern layer neuron $k$.
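The single-pass nature of the GRNN is easiest to see in code: prediction is a kernel-weighted average of the stored training targets. The sketch below assumes a Gaussian kernel K(x, x_k) = exp(-||x - x_k||² / (2σ²)) with one shared smoothing parameter sigma; the function name, the sigma value, and the toy data are hypothetical.

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """Kernel-weighted average of training targets (no backpropagation)."""
    num = 0.0
    den = 0.0
    for xk, yk in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xk))
        k = math.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian kernel K(x, x_k)
        num += yk * k
        den += k
    return num / den

# Toy training patterns (assumed for illustration)
train_x = [[0.0], [1.0], [2.0]]
train_y = [1.0, 3.0, 5.0]

pred_mid = grnn_predict([1.0], train_x, train_y, sigma=0.5)
print(pred_mid)
```

The only quantity to tune is sigma: small values make the prediction follow the nearest training sample, while large values pull it toward the mean of all targets.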

METHODOLOGY
RBFNN and GRNN models were used to predict the evapotranspiration in Ramadi city. MATLAB R2020a software was employed to analyze the evapotranspiration data. The maximum temperature, minimum temperature, average temperature, humidity, wind speed, and solar radiation were used as inputs to the ANN. The hidden layer of the RBFNN has six neurons, while the hidden layer of the GRNN model has one neuron. The structure of the neural network model was found by trial and error. Figure 3 illustrates the architecture of the ANN model. Many transfer functions can be used for ANN modelling. By trial and error, the best transfer function for the present model was TANH (hyperbolic tangent) for both input and output layers.
The Levenberg-Marquardt backpropagation (trainlm) technique, which is recommended for most cases [24], was utilized in the present paper. To obtain the best generalization, the dataset should be divided into three parts: training, validation, and test. In the present study, the training set represents 70% of the data, and the validation and test sets each represent 15%.
The training dataset is used to train the model by reducing the error on this dataset during training. The validation dataset is used to evaluate the performance of the neural network on patterns that were not used in training. The test dataset is used to assess the overall performance of the trained and validated network [25].
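The 70/15/15 division can be sketched as follows. This is an illustration only: it assumes the records are shuffled randomly before splitting, which the text does not state, and the function name, the seed, and the placeholder record indices are hypothetical.

```python
import random

def split_dataset(records, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle and split records into training, validation, and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, about 15%
    return train, val, test

days = list(range(540))  # placeholder indices for daily records
train, val, test = split_dataset(days)
print(len(train), len(val), len(test))
```

Keeping the test subset untouched until the very end is what makes its error an unbiased estimate of generalization.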
The dimensions of the variables must be removed so that each variable receives equal attention during training, which is achieved by scaling all variables before training. The scaling of the variables depends on the output transfer function. Scaled input and output data will have unit standard deviation and zero mean [26]. The data of the present model were scaled based on the output transfer function (TANH), whose range is [-1, 1], by using the following equation:

$$X_n = \frac{2\,(X - X_{\min})}{X_{\max} - X_{\min}} - 1$$

where $X_n$ is the scaled value, $X$ is the raw value, $X_{\max}$ is the maximum value, and $X_{\min}$ is the minimum value. After the modeling is completed, performance evaluation criteria are used to check the model's accuracy on the validation data. The accuracy is assessed by estimating the relation between the observed and the predicted data. In the present study, four well-known metrics, RMSE, MAE, RE, and R², were used to assess the performance of the two models.
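The scaling step described above can be illustrated with a short sketch. It assumes that the data are mapped linearly onto [-1, 1], the output range of the hyperbolic tangent, using the minimum and maximum of each variable; the function names are hypothetical, and the sample values are the minimum, mean, and maximum ET reported in the descriptive analysis.

```python
def scale_to_tanh_range(values):
    """Min-max scale to [-1, 1]: Xn = 2 * (X - Xmin) / (Xmax - Xmin) - 1."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("cannot scale a constant series")
    scaled = [2.0 * (x - x_min) / span - 1.0 for x in values]
    return scaled, x_min, x_max

def unscale(scaled, x_min, x_max):
    """Invert the scaling to recover values in the original units."""
    return [(xn + 1.0) / 2.0 * (x_max - x_min) + x_min for xn in scaled]

et = [0.33, 5.46, 13.8]  # ET in mm: minimum, mean, maximum
scaled, lo, hi = scale_to_tanh_range(et)
restored = unscale(scaled, lo, hi)
print(scaled)
```

The minimum and maximum should be taken from the training data and reused for the validation and test data, so that network outputs can be converted back to millimetres with the same constants.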
One of the most important criteria for selecting models is performance on the test samples. The test set is the set of data used for evaluating the final performance of a trained model. It serves as an unbiased measure of how well the model generalizes to unseen data, assessing its generalization capabilities in real-world scenarios. By keeping the test set separate throughout the development process, we obtain a reliable benchmark of the model's performance.
The test dataset also helps gauge the trained model's ability to handle new data.Since it represents unseen data that the model has never encountered before, evaluating the model fit on the test set provides an unbiased metric for its practical applicability.This assessment enables us to determine if the trained model has successfully learned relevant patterns and can make accurate predictions beyond the training and validation contexts.

Descriptive analysis
The fundamental characteristics of the datasets in the current study, such as the minimum, maximum, mean, and standard deviation for the meteorological parameters' dataset, were described using descriptive statistics.
The ET data ranged from 0.33 mm to 13.8 mm with a mean of 5.46 mm. The maximum temperature ranged from 9.10℃ to 48.10℃ with a mean of 26.95℃, while the minimum temperature ranged from -1.7℃ to 33.60℃ with a mean of 14.31℃.

Correlation matrix
A correlation matrix is a statistical tool used for assessing the relationship between each pair of variables in a dataset. Each cell in the matrix contains a correlation coefficient. A coefficient of (1) indicates a perfect positive association, a value of (0) indicates no linear relationship, and a value of (-1) indicates a perfect negative association between the variables.
Table 2 shows a strong positive correlation between ET and Solar_Rad, Max_Temperature, Min_Temperature and a negative relationship between ET and relative humidity.
Meanwhile, there was a moderate relationship between ET and WS. Furthermore, the table also shows that there are strong relationships between independent variables, such as the relationship between Max_Temp, Min_Temp and Avar_Temp. In addition, there are weak relationships between the temperatures and wind speed.
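For readers who wish to reproduce a matrix such as Table 2, Pearson's coefficient can be computed as sketched below. The function names are hypothetical, and the toy columns are made-up values chosen only to show a perfect positive and a perfect negative relationship; they are not the station data.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Nested dict of pairwise coefficients for named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up toy values (not measurements from the study)
data = {
    "ET": [2.0, 4.0, 6.0, 8.0],
    "Solar_Rad": [10.0, 20.0, 30.0, 40.0],
    "RH": [80.0, 60.0, 40.0, 20.0],
}
matrix = correlation_matrix(data)
print(matrix["ET"]["Solar_Rad"], matrix["ET"]["RH"])
```

Strong correlations among the inputs themselves, such as those between the three temperature variables, indicate partly redundant information, which is worth keeping in mind when interpreting input importance.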

Validation of model accuracy
The model's performance must be evaluated to determine whether it can estimate evapotranspiration accurately. Statistical indicators can be used to compare the model output against the actual records of evapotranspiration. In the present model, the mean absolute error (MAE), root mean square error (RMSE), relative error (RE), and coefficient of determination (R²) were used. These indicators were calculated between the model output and the measured values. Table 3 presents the statistical indicators for the present model. The values in Table 3, together with the coefficient of determination, show that the model has high accuracy, with a very strong positive relationship between measured and predicted values. For all statistical indicators in Table 3, the GRNN model outperformed the RBFNN model in the prediction of ET. This is clear from the values of RMSE, MAE, RE, and R² for the two models. Below are the equations that were used to calculate the statistical indicators [27].
where Mi is the measured value and Pi is the predicted value.
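The four indicators can be sketched in code using their common textbook definitions. These forms are assumptions rather than the exact expressions of [27]: MAE and RMSE are averaged over all records, RE is taken here as the mean absolute error relative to the measured value in percent, and R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sample values are invented.

```python
import math

def mae(measured, predicted):
    """Mean absolute error: mean of |Mi - Pi|."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)

def rmse(measured, predicted):
    """Root mean square error: sqrt of mean of (Mi - Pi)^2."""
    return math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    )

def relative_error_percent(measured, predicted):
    """Mean of |Mi - Pi| / |Mi|, in percent (assumed form of RE)."""
    return 100.0 * sum(
        abs(m - p) / abs(m) for m, p in zip(measured, predicted)
    ) / len(measured)

def r_squared(measured, predicted):
    """Coefficient of determination as 1 - SSE / SST (assumed form)."""
    mean_m = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - sse / sst

# Invented sample values in mm/day
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
print(mae(measured, predicted), rmse(measured, predicted))
print(relative_error_percent(measured, predicted), r_squared(measured, predicted))
```

Lower MAE, RMSE, and RE and a higher R² indicate a better model, which is the basis of the comparison between GRNN and RBFNN in Table 3.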

The measured and predicted relationship
The data recorded at the meteorological station over about 540 days were employed in the present model. These data were used to verify the ability of the ANN model to predict daily evapotranspiration. The agreement between the observed values and the values predicted by the ANN depends on the number of hidden layer nodes and on the transfer functions. The best models were selected by trial and error, through continued training, to obtain the highest coefficient of determination (R²) in the test stage with the smallest error, indicating that the best prediction performance had been achieved. By trial and error, the optimal structure of the GRNN and RBFNN models was one hidden layer and three hidden layers, respectively. The R² value between measured and predicted values was 98.3% and 97% for the GRNN and RBFNN models, respectively. Figure 4 and Figure 5 illustrate the scatter plots between the predicted and measured values for the two aforementioned models.
The GRNN and RBFNN models with six climatic inputs were used to predict the evapotranspiration. The results demonstrated that both models were able to predict ET with high accuracy. A nonlinear equation has been found that can be used to predict daily evapotranspiration. This equation represents the first model for determining the evapotranspiration in the study area. The new equation is very useful for calculating the evapotranspiration losses in Ramadi city, which contributes to the management of water resources in the city.
The relation between measured and predicted values for both models is shown in Figure 6 and Figure 7. In these figures, close agreement between measured ET and predicted ET can be noted for both models.

Sensitivity analysis
To investigate the input elements and find the element that has the greatest effect on evapotranspiration, the modified Garson algorithm has been used. This method determines the relative importance of each input element of a network by partitioning the network's connection weights [28]. The algorithm is given by the equation below.

$$Q_{ik} = \frac{\sum_{j}\left(\dfrac{|w_{ij}|}{\sum_{r=1}^{N}|w_{rj}|}\,|w_{jk}|\right)}{\sum_{m=1}^{N}\sum_{j}\left(\dfrac{|w_{mj}|}{\sum_{r=1}^{N}|w_{rj}|}\,|w_{jk}|\right)} \times 100$$

where,
- $Q_{ik}$ is the percent of input variable effect on output.
- $w_{ij}$ is the weight of the connection between input neuron i and hidden neuron j.
- $w_{jk}$ is the weight of the connection between hidden neuron j and output neuron k.
- $\sum_{r=1}^{N}|w_{rj}|$ is the weighted connection sum between the N input neurons and the hidden neuron j.
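The weight-partitioning idea behind Garson's method can be sketched as follows. This is a minimal illustration assuming one hidden layer, a single output neuron, and absolute weight values; the function name and the small weight matrices are hypothetical and are not the trained weights of the present model.

```python
def garson_importance(w_input_hidden, w_hidden_output):
    """Relative importance (%) of each input from connection weights.

    w_input_hidden[i][j]: weight between input neuron i and hidden neuron j.
    w_hidden_output[j]:   weight between hidden neuron j and the output neuron.
    """
    n_inputs = len(w_input_hidden)
    n_hidden = len(w_hidden_output)
    contribution = [0.0] * n_inputs
    for j in range(n_hidden):
        # Sum of absolute weights from all N inputs into hidden neuron j
        col_sum = sum(abs(w_input_hidden[i][j]) for i in range(n_inputs))
        for i in range(n_inputs):
            share = abs(w_input_hidden[i][j]) / col_sum
            contribution[i] += share * abs(w_hidden_output[j])
    total = sum(contribution)
    return [100.0 * c / total for c in contribution]

# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output
w_ih = [[1.0, 2.0],
        [3.0, 2.0]]
w_ho = [1.0, 1.0]

importance = garson_importance(w_ih, w_ho)
print(importance)
```

Because absolute values are used, the result ranks inputs by the magnitude of their influence and says nothing about whether an input raises or lowers the output.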
Figure 8 demonstrates the sensitivity analysis and the relevance of the input elements on the evapotranspiration losses.

Figure 8. Importance of input factors on evapotranspiration
Figure 8 illustrates that the most important input factors in determining the evapotranspiration losses are solar radiation and wind speed, whose importance exceeds 60%. Humidity (17%), followed by maximum temperature (13%), has the second most significant effect on the evapotranspiration losses. The sensitivity test of the model shows that solar radiation and wind speed clearly affect the evapotranspiration losses: a slight increase in solar radiation and wind speed can lead to an increase in evapotranspiration. On the other hand, the average and minimum temperatures have a negligible impact on the evapotranspiration. The results of the sensitivity analysis between the evapotranspiration losses and the six aforementioned parameters can be expected to vary from one region to another, according to the conditions of the study area.

ANN model equations
Due to the importance of the evapotranspiration process in water resource management, especially in arid regions where water resources are scarce, this study developed a predictive mathematical model using the artificial neural network technique and devised a predictive equation, which is lacking in the study area, for estimating the water losses resulting from evapotranspiration.

CONCLUSION
The artificial neural network has a high capability for predicting evapotranspiration. The developed equation can be employed to predict the daily evapotranspiration losses in arid regions. According to the sensitivity analysis, solar radiation and wind speed have the most influence on evapotranspiration, followed by humidity and maximum temperature. On the other hand, the average and minimum temperatures have an unnoticeable effect on the evapotranspiration losses. According to the statistical criteria and the coefficient of determination, the GRNN model outperforms the RBFNN model in predicting evapotranspiration, which makes the GRNN model a very useful tool for calculating evapotranspiration. The neural network technique employed in this paper can be widely used to improve water resources management. By understanding the elements of the hydrological cycle, more effective and practical decisions can be made in managing available resources and reducing evapotranspiration. The importance of the current study comes from the fact that the study area lacks data and scientific research in this field and has only a few weather stations.

Figure 1. The area under study

Figure 6. The measured and predicted evapotranspiration for GRNN model

Table 1 depicts the descriptive statistics of the meteorological parameters.

Table 1. Descriptive analysis for meteorological parameters dataset

Table 2. Pearson's correlation coefficient between ET and other meteorological parameters

Table 3. The statistical indicators for the model