© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Water pollution, particularly high ammonia concentrations, poses a significant threat to aquatic life, especially fish. Previous attempts to classify water pollution levels using traditional methods have often faced limitations in handling complex attribute relationships. To address these challenges, this study proposes a novel approach utilizing Gated Linear Networks (GLNs). By considering key water quality parameters such as pH, dissolved oxygen, temperature, turbidity, and ammonia, GLNs offer improved computational efficiency and model interpretability compared to nonlinear models. Our experimental results demonstrate that the proposed GLN-based model surpasses traditional methods in accurately classifying water contamination levels. This advancement has significant implications for water quality monitoring and management, contributing to the preservation of fish habitats and the overall health of aquatic ecosystems.
Keywords: GLNs, ammonia, water quality, deep learning
Water quality is a key indicator of the state of an aquatic ecosystem [1] and of the health standards of the people who depend on it. One prime concern in this category is ammonia, a very dangerous water pollutant [2] that is lethal to water sources and is known to be toxic to fish and other forms of aquatic life [3-6]. Elevated ammonia concentrations impair fish metabolism, growth, and development [7] and, in severe cases, can eliminate entire fish populations from a water body. Monitoring ammonia concentration [8] is therefore an essential activity in the utilization and management of water resources [9] and in environmental sustainability [10-12]. Traditional monitoring approaches have clear limitations: they offer no near-real-time tracking of pollution, and personnel must visit sampling areas, take samples, transport them to a laboratory, and wait for results [13]. Although smart devices now make it possible to deploy sensors and acquire data continuously, modeling the acquired data to classify the degree of water contamination remains a challenge.
Existing methods [14-25] for water quality classification have relied on standard machine learning techniques such as Support Vector Machines (SVMs) and Random Forests (RFs). However, where the association between water quality parameters [26-29] and the degree of contamination is nonlinear, such models sometimes go wrong. Deep learning architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have also been applied, but they do not fully address the challenges in existing water quality assessment methods, such as limitations in handling high-dimensional data, real-time adaptability, and accuracy under dynamic environmental conditions [30-38]. These shortcomings underscore the necessity for advanced methods such as the proposed GLN model, which leverages deep learning to overcome these issues. CNNs are useful particularly for extracting spatial representations from image data, but they are less suited to tabular water quality data [36-38], while RNNs, developed for sequence data, have limitations in capturing the interdependence of water quality parameters. To avoid such limitations, this research proposes a new method that utilizes Gated Linear Networks (GLNs) to classify the level of water contamination from key quality factors: pH, dissolved oxygen level, temperature, turbidity, and ammonia level. GLNs are likely to outperform conventional models because they combine linear models with nonlinear gating mechanisms. This study therefore leverages the strengths of GLNs to ensure sufficient categorization of water pollution and to improve the utilization and conservation of water resources.
Currently, an increasing number of publications address water quality assessment and prediction, applying ML models of different levels of complexity depending on the problem. Among the major classical techniques, Support Vector Machines and Random Forests have been used to estimate water quality parameters. Palani et al. [9] applied ANNs to marine water quality, while Castrillo and García employed multi-factor linear regression (MLR) and Random Forest (RF) models for riverine water quality. Zambrano et al. [8] applied RF, MLR, and ANN models in the assessment of water quality in fish farming reservoirs. These studies offer excellent examples of how such models can be applied to water quality prediction; however, while they capture the simpler patterns present in the considered data sets rather effectively, their ability to capture the most challenging patterns is limited at best. Deep learning has also been applied to the analysis of water quality. Convolutional Neural Networks (CNNs), primarily used in imaging, have been used for water quality prediction; as demonstrated in previous studies [11, 12], most CNN-based models focused on spectrum analysis, achieved using Fourier analysis. However, when water quality data take the common form of time series, the applicability of CNNs can be greatly restricted.
Temporal dependencies in water quality databases have been addressed using Recurrent Neural Networks (RNNs), especially Long Short-Term Memory (LSTM) networks. For cage-cultured environments, forecasts have been developed using LSTM deep neural networks [13], and IoT applications for monitoring water quality in cage-culturing environments have been explored [14]. However, questions remain regarding the temporal behavior of these RNNs, their sophistication, and their computational cost, which are major limitations of these studies.
Models combining deep learning architectures in one way or another have also been proposed to increase the reliability of predictions. Ahmed et al. [15] used gradient boosting and MLP from among the ensemble methods and neural networks listed in their study. Other works with hybrid models have shown promising results. For example, nine-layer MLP models combined with KNN imputation have been utilized, demonstrating that these models can operate optimally even with gaps in the data [16]. It has also been noted that while industrialized aquaculture may benefit from certain techniques such as IG, SVM remains a suitable choice in some contexts, and other traditional machine learning methods are widely used across various fields [17]. Further machine learning approaches for dissolved oxygen include Support Vector Machines (SVMs), nonlinear modelling using Multi-Layer Perceptrons (MLPs), and analysis of the dynamic pattern of dissolved oxygen with Long Short-Term Memory (LSTM) networks [18].
Hybrid modelling has also proved very efficient in capturing the necessary aspects of water quality data, and recent enhancements have produced further models designed to improve predictive proficiency. For instance, da Silva and colleagues proposed a toxicity-warning sensor for water quality monitoring based on an LCA model and ML, combining physical and ML-based models. Chen et al. [21] designed intelligent variable-flow equipment for water quality and noted that the efficiency of the control system is a key factor in water quality. Yang et al. [22] employed a deep learning network based on CNN, GRU, and an attention mechanism for predicting the water quality of recirculating aquaculture systems (RAS), showing that combining more than one deep learning component is useful for improvement. Integrated models such as ANN-WT-LSTM have been demonstrated to enhance predictive ability in applications like the Jinjiang River [23], and improved models using wavelet decomposition combined with W-ARIMA and GRU neural networks have been proposed for water quality analysis in Beijing [24]. Chen et al. [25] applied LSTM together with attention-based long short-term memory (AT-LSTM) to the water quality of the Burnett River in Australia, focusing mainly on attention over the data. The study [26] on Kalman-filtered LSTM with attention shows that applying the Kalman filter in connection with deep learning enhanced prediction on Haimen Bay data. Farzana et al. [27] used XGBoost and GRU to analyze the Toowoomba reservoirs, addressing the question of the best water management strategy under changing climate conditions.
These works confirm the rich variety of ML techniques, and combinations of them, applied to water quality prediction, and they show that the choice of method should differ by water body and by the goal or objective of monitoring.
This research aims to use the proposed GLNs to categorize the contamination degree of water from water quality indices (Figure 1). GLNs can serve as a strong alternative to purely linear models because their gating mechanisms introduce nonlinearity. At the same time, heavier nonlinear architectures such as CNNs and RNNs can be excessive for this problem, whereas GLNs can track the relationships within the data with a sufficient number of layers while keeping computational demands modest. Because the gates filter information, the GLN model retains information specific to the water quality patterns.
Figure 1. Proposed GLN architecture
GLNs utilize a combination of gated mechanisms and linear transformations to efficiently learn complex relationships in data. The model includes layers with gated units that adaptively control the flow of information, ensuring robust feature learning. Key parameters, such as the number of gated units in each layer and activation functions, were optimized to achieve high accuracy while maintaining computational efficiency.
Initialize Parameters
Input vector: The input vector Xt is the data received by the GRU cell at time step t. Depending on the task, Xt may be a word embedding in NLP, a time-series observation, or a vector of extracted features; here it holds the water quality measurements at time step t.
Hidden state: The hidden state ht is a vector that contains the memory of the GRU at time step t; it is computed from the current input Xt and the previous hidden state ht−1.
ht−1 is the hidden state from the previous time step, carrying forward the information accumulated up to time t−1 that conditions the update at time t.
Weight matrices: The GRU uses several weight matrices to transform and combine the input data Xt and the previous hidden state ht−1.
Update gate: Wz and Uz
Reset gate: Wr and Ur
Candidate hidden state: Wh and Uh
Bias terms: Bias terms are added to the computations within the GRU to allow each unit to have a trainable offset. They help in shifting the activation functions, enabling better learning.
bz, br, bh
Compute the update gate: The update gate determines how much of the previous hidden state ht−1 is carried forward to the new hidden state ht. It defines the trade-off between memorizing prior inputs and accumulating new knowledge from the current input Xt.
zt=σ(WzXt+Uzht−1+bz)
where, σ is the sigmoid activation function.
Compute the reset gate: The reset gate decides to what extent the previous hidden state ht−1 should be forgotten. It regulates the impact of past information on the current candidate hidden state.
rt=σ(WrXt+Urht−1+br)
Compute the candidate hidden state: The candidate hidden state is a proposed new hidden state at time t, computed from the current input and the reset-scaled previous hidden state.
˜ht=tanh(WhXt+Uh(rt⊙ht−1)+bh)
where, ⊙ stands for element wise multiplication.
Compute the final hidden state: The final hidden state is obtained by interpolating between the previous hidden state and the candidate hidden state using the update gate.
ht=(1−zt)⊙ht−1+zt⊙˜ht
Output the final hidden state: The final hidden state ht acts as the output of the GRU for the current time step. It summarizes the information from the current input Xt and the previous hidden state ht−1.
Algorithm 1: Lightweight GLN model for water contamination classification

1. Initialization:
   Wz, Uz, bz = initialize parameters for update gate
   Wr, Ur, br = initialize parameters for reset gate
   Wh, Uh, bh = initialize parameters for candidate state
2. Given T time steps and input data X of shape (T, input_dim): T = X.shape[0]
3. Initialize hidden state: h_prev = initialize_hidden_state()
4. Loop over each time step:
   for t in range(T):
       z_t = sigmoid(Wz @ X[t] + Uz @ h_prev + bz)
       r_t = sigmoid(Wr @ X[t] + Ur @ h_prev + br)
       h_hat_t = tanh(Wh @ X[t] + Uh @ (r_t * h_prev) + bh)
       h_t = (1 - z_t) * h_prev + z_t * h_hat_t
       h_prev = h_t
5. Output h_t
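The gated recurrence above can be sketched in NumPy as follows. This is a minimal illustration of the forward pass only; the initialization scales and hidden dimension are assumptions for demonstration, not the paper's settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(input_dim, hidden_dim, rng):
    # One (W, U, b) triple per gate: update (z), reset (r), candidate (h)
    def triple():
        return (rng.normal(0, 0.1, (hidden_dim, input_dim)),
                rng.normal(0, 0.1, (hidden_dim, hidden_dim)),
                np.zeros(hidden_dim))
    return {"z": triple(), "r": triple(), "h": triple()}

def gated_forward(X, params, h0=None):
    """Run the gated recurrent update over a (T, input_dim) sequence."""
    hidden_dim = params["z"][0].shape[0]
    h_prev = np.zeros(hidden_dim) if h0 is None else h0
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wh, Uh, bh = params["h"]
    for t in range(X.shape[0]):
        z_t = sigmoid(Wz @ X[t] + Uz @ h_prev + bz)            # update gate
        r_t = sigmoid(Wr @ X[t] + Ur @ h_prev + br)            # reset gate
        h_hat = np.tanh(Wh @ X[t] + Uh @ (r_t * h_prev) + bh)  # candidate state
        h_prev = (1 - z_t) * h_prev + z_t * h_hat              # final state
    return h_prev
```

For classification, the final hidden state would then be passed through an output layer; that head is omitted here since the algorithm listing ends at the hidden-state output.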
The hyperparameters of the GLN model, such as the learning rate, batch size, and optimizer, have been carefully tuned to achieve optimal performance. A learning rate of 0.001 was selected to ensure a balance between convergence speed and stability. The batch size was set to 32, enabling efficient processing of data while maintaining stable gradient updates. The Adam optimizer was chosen for its effectiveness in minimizing loss and handling sparse gradients.
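The tuned settings can be made concrete with a minimal NumPy sketch of the Adam update rule and batch-size-32 iteration. This is a generic illustration of the optimizer described above, not the authors' training code.

```python
import numpy as np

class Adam:
    """Minimal Adam update rule with the paper's learning rate (0.001)."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m, self.v, self.t = None, None, 0

    def step(self, param, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        # Exponential moving averages of the gradient and its square
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias correction
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

def iterate_minibatches(X, y, batch_size=32):
    """Yield mini-batches of 32 samples, matching the paper's batch size."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]
```

Adam's bias-corrected moment estimates are what make it effective with sparse gradients, as noted above.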
The Results and Discussion section of this paper dwells on the performance evaluation of the proposed GLN model in recognizing water contamination. When solving this problem involving multivariate water quality data, the model was found to be more efficient than the other models. These conclusions affirm that the developed approach can be insightful for real-time monitoring of water sources and hence lay a foundation for further preservation of the environment.
The study used a data set from catfish ponds in a freshwater aquaculture system. Data acquisition was built around an ESP32 microcontroller with an array of sensors that monitored several water quality parameters every 5 seconds. The sensors were a Dallas Instruments DS18B20 temperature sensor for water temperature, a DFRobot turbidity sensor for water turbidity, a DFRobot dissolved oxygen sensor for the amount of dissolved oxygen in the water, and a DFRobot pH sensor V2 for pH level; an MQ-137 ammonia sensor measured ammonia and an MQ-135 sensor measured nitrate. Additional data collection was supported by the Lacuna Award for Agriculture in Sub-Saharan Africa 2020 from the Meridian Institute based in Colorado, USA. Water quality analysis of twelve aquaponic catfish ponds was conducted from June to mid-October 2021; there were 6 sensors for each pond, and each unit produced over 170,000 data records. Data were collected, processed, and classified periodically with a view to managing bias and variability. This large quantity of data provides a sound basis for benchmarking the proposed model against existing models for the categorization of water quality in aquaculture systems.
The preprocessing performed on data have been detailed to ensure clarity and reproducibility. Missing values were addressed by imputing the mean, numerical features were standardized to achieve a zero mean and unit variance, and duplicate entries were removed to maintain data consistency. These preprocessing steps significantly improved the quality of the input data and contributed to the enhanced performance of the GLN model.
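These three steps can be reproduced with a short pandas sketch; the column names are hypothetical placeholders for the study's features, not the data set's actual schema.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, feature_cols) -> pd.DataFrame:
    """Deduplicate, mean-impute, and standardize numerical sensor features."""
    out = df.drop_duplicates().copy()                # remove duplicate entries
    for col in feature_cols:
        out[col] = out[col].fillna(out[col].mean())  # impute missing with column mean
        # Standardize to zero mean and unit variance
        out[col] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    return out
```

Usage would look like `preprocess(df, ["ph", "dissolved_oxygen", "temperature", "turbidity", "ammonia"])`, after which every feature column is centered and scaled.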
The accuracy of the proposed GLN and the existing models is depicted in Figure 2. From its initial value of 0.92, exposure to one hundred epochs raised GLN's accuracy to 0.992, making it higher than the other models. AT-LSTM has also been precise, its performance improving to an accuracy of 0.979, indicating that the developed models have relatively high predictive capability.
AODEGRU and DSTCNN also improved over training, having started from nearly equal accuracies, and finished with only marginal differences at 0.951 and 0.946, respectively. All the models work, but the graph shows that generalization is highest for GLN.
Figure 2. Accuracy
Figure 3. Precision
The precision metrics in Figure 3 show that the proposed GLN has a high capability for selecting positive instances, with precision increasing from 0.91 to 0.984. The final precision of AT-LSTM stands at 0.963, whereas AODEGRU and DSTCNN score lower, reflecting that they flag more false positives than GLN. The steady improvement in precision also increases the reliability of GLN's forecasts.
Figure 4 illustrates the recall of the proposed GLN against the existing models. In the final epoch GLN reaches a maximum of 0.992, showing how reliably the model acquires true positives during the test or validation phase. Next is AT-LSTM, with a final recall of 0.969, meaning it identifies positives well within its group, though not as well as GLN. AODEGRU and DSTCNN peak at recalls of approximately 0.946 and 0.947, respectively, proving that these models work well, although they are slightly more likely to miss some positive examples than the GLN model, as evidenced by its higher scores.
Figure 5 depicts the F1 scores; it shows that GLN attains the highest value, up to about 98.8%. These favourable scores indicate that in execution it balances sensitivity and precision, so neither dominates the other. AT-LSTM's F1 score of 0.965 is also high, but still not as good as GLN's in this respect. DSTCNN and AODEGRU have comparatively lower F1 scores, indicating a gap between their precision and recall that may be a setback in situations where both are crucial.
Figure 4. Recall
Figure 5. F1-Score
Figure 6 shows the loss values, indicating how well the models reduce error during training. GLN's loss reduction was the steepest among all the models, falling from 0.133 to 0.079, suggesting that learning and optimisation occurred throughout all phases. While AT-LSTM, AODEGRU, and DSTCNN also attained smaller losses over training, they did not reach a similar level, showing that errors are minimized better by GLN and thereby improving its ability to make more accurate predictions.
Figure 6. Loss
Confusion matrix analysis is very important in deep learning because it is used to assess the performance of classification models. It affords a direct comparison between the model's predictions and the actual ground-truth labels, with a clear view of the zones where the model performs excellently or poorly. The confusion matrices of the proposed and existing models for water contamination analysis are shown in Figure 7. Water quality parameters such as dissolved oxygen, pH, temperature, and ammonia directly affect the fish in aquaculture farms; outbreaks of disease, stress, and deaths may originate from poor water quality, hence the importance of regularly examining water contaminants. Previous models such as DSTCNN, AODEGRU, and AT-LSTM have also been employed to predict water quality, but each of them has its own disadvantages.
For instance, DSTCNN is limited in modelling temporal dependencies and therefore confuses water contamination under different conditions, giving a higher error rate. AODEGRU provides a better result than DSTCNN but suffers with noisy and high-dimensional data, which hampers its performance, especially on non-contaminated water. AT-LSTM provides better outcome prediction overall, but it overfits, meaning it is less likely to generalize to areas with dramatically different aquatic conditions.
The proposed GLN model does not have such problems: it is accurate and flexible for real-time monitoring of water quality. Its confusion matrix demonstrates fewer errors, showing its efficiency in differentiating between contaminated and uncontaminated water, which is crucial for taking immediate corrective action in aquaculture. The GLN model also improves computational efficiency and latency, which matters in dynamic environments where fast response is crucial. Because it adapts to constantly changing water patterns and supports better long-term expectations, it is a better long-term tool for ensuring fish health and the profitability of aquaculture than the traditional models.
(a) DSTCNN
(b) AODEGRU
(c) AT-LSTM
(d) GLN
Figure 7. Confusion matrix analysis of proposed and existing models in water contamination classification
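The per-class counts in a confusion matrix map directly to the metrics reported above. A small self-contained sketch, using toy label arrays (not the paper's actual outputs) with 1 denoting contaminated water, is:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary labels (1 = contaminated)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))  # contaminated, correctly flagged
    tn = np.sum((y_pred == 0) & (y_true == 0))  # clean, correctly passed
    fp = np.sum((y_pred == 1) & (y_true == 0))  # clean, wrongly flagged
    fn = np.sum((y_pred == 0) & (y_true == 1))  # contaminated, missed
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
```

Fewer off-diagonal counts (fp and fn) in a model's confusion matrix translate directly into the higher precision and recall observed for GLN.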
Table 1. 10-fold cross validation
| Epoch | Model | Accuracy | Precision | Recall | F1-Score | Loss |
|---|---|---|---|---|---|---|
| 10 | DSTCNN | 0.89 | 0.88 | 0.89 | 0.887 | 0.154 |
| 10 | AODEGRU | 0.91 | 0.91 | 0.92 | 0.91 | 0.149 |
| 10 | AT-LSTM | 0.88 | 0.91 | 0.92 | 0.91 | 0.144 |
| 10 | GLN (Proposed) | 0.92 | 0.91 | 0.93 | 0.92 | 0.133 |
| 20 | DSTCNN | 0.905 | 0.887 | 0.907 | 0.905 | 0.153 |
| 20 | AODEGRU | 0.915 | 0.911 | 0.923 | 0.917 | 0.142 |
| 20 | AT-LSTM | 0.93 | 0.92 | 0.93 | 0.92 | 0.132 |
| 20 | GLN (Proposed) | 0.93 | 0.925 | 0.935 | 0.93 | 0.122 |
| 30 | DSTCNN | 0.912 | 0.909 | 0.918 | 0.913 | 0.15 |
| 30 | AODEGRU | 0.922 | 0.915 | 0.916 | 0.924 | 0.143 |
| 30 | AT-LSTM | 0.93 | 0.92 | 0.93 | 0.92 | 0.13 |
| 30 | GLN (Proposed) | 0.942 | 0.936 | 0.944 | 0.94 | 0.112 |
| 40 | DSTCNN | 0.922 | 0.917 | 0.924 | 0.921 | 0.147 |
| 40 | AODEGRU | 0.931 | 0.918 | 0.931 | 0.927 | 0.136 |
| 40 | AT-LSTM | 0.95 | 0.93 | 0.94 | 0.93 | 0.123 |
| 40 | GLN (Proposed) | 0.954 | 0.946 | 0.954 | 0.95 | 0.109 |
| 50 | DSTCNN | 0.924 | 0.921 | 0.926 | 0.923 | 0.146 |
| 50 | AODEGRU | 0.936 | 0.929 | 0.934 | 0.931 | 0.131 |
| 50 | AT-LSTM | 0.95 | 0.94 | 0.94 | 0.94 | 0.121 |
| 50 | GLN (Proposed) | 0.964 | 0.955 | 0.963 | 0.96 | 0.101 |
| 60 | DSTCNN | 0.93 | 0.924 | 0.932 | 0.927 | 0.135 |
| 60 | AODEGRU | 0.94 | 0.934 | 0.944 | 0.937 | 0.118 |
| 60 | AT-LSTM | 0.96 | 0.95 | 0.96 | 0.95 | 0.112 |
| 60 | GLN (Proposed) | 0.974 | 0.966 | 0.975 | 0.97 | 0.098 |
| 70 | DSTCNN | 0.938 | 0.929 | 0.937 | 0.933 | 0.132 |
| 70 | AODEGRU | 0.947 | 0.936 | 0.945 | 0.94 | 0.117 |
| 70 | AT-LSTM | 0.967 | 0.954 | 0.959 | 0.956 | 0.108 |
| 70 | GLN (Proposed) | 0.976 | 0.967 | 0.977 | 0.972 | 0.089 |
| 80 | DSTCNN | 0.945 | 0.935 | 0.946 | 0.939 | 0.123 |
| 80 | AODEGRU | 0.949 | 0.941 | 0.943 | 0.942 | 0.111 |
| 80 | AT-LSTM | 0.971 | 0.962 | 0.967 | 0.963 | 0.101 |
| 80 | GLN (Proposed) | 0.982 | 0.973 | 0.981 | 0.977 | 0.087 |
| 90 | DSTCNN | 0.942 | 0.939 | 0.943 | 0.941 | 0.122 |
| 90 | AODEGRU | 0.951 | 0.947 | 0.939 | 0.944 | 0.11 |
| 90 | AT-LSTM | 0.977 | 0.964 | 0.968 | 0.966 | 0.099 |
| 90 | GLN (Proposed) | 0.986 | 0.978 | 0.987 | 0.982 | 0.081 |
| 100 | DSTCNN | 0.946 | 0.943 | 0.947 | 0.944 | 0.117 |
| 100 | AODEGRU | 0.934 | 0.941 | 0.946 | 0.931 | 0.109 |
| 100 | AT-LSTM | 0.979 | 0.963 | 0.969 | 0.965 | 0.097 |
| 100 | GLN (Proposed) | 0.992 | 0.984 | 0.992 | 0.988 | 0.079 |
Cross validation, shown in Table 1, is one of the methods used when evaluating a model; to address overfitting in deep learning, 10-fold cross validation is used here. It entails dividing the data randomly into 10 folds, using nine folds to train the model and the remaining fold to test it, with each fold taking a turn as the test set. By averaging the results across these ten splits, 10-fold cross validation provides a far better approximation of the generalization ability of the model than a single train/test split. This helps to minimize the overfitting that sets in when a model is tuned to the training data and then predicts poorly on unseen test data. Regarding the models' performance, it is clear from Table 1 that the GLN model achieves the highest performance among all the models under consideration: its accuracy rose from 92% early in training to 99.2%. Across the folds, GLN not only improves but also identifies contaminated water samples with the least error, reaching a precision of about 98.4% and a recall of 99.2%. It therefore offers accurate detection without missing a large number of contaminated samples, making it the best model for contamination detection. AT-LSTM follows with a notable 97.9% accuracy and precision and recall of 96.3% and 96.9%, respectively; as a result, it is also well suited to analyzing water contamination levels.
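The 10-fold protocol described above can be sketched in plain NumPy. Here a trivial threshold classifier and synthetic data stand in for the GLN model and the aquaculture data set, purely to show the fold mechanics.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Shuffle sample indices and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

# Hypothetical stand-in data: 5 water quality features, binary contamination label
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))       # pH, DO, temperature, turbidity, ammonia
y = (X[:, 4] > 0).astype(int)       # toy label driven by the ammonia column

folds = kfold_indices(len(X), 10, rng)
scores = []
for i in range(10):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(10) if j != i])
    # Toy classifier: threshold the ammonia feature at the training-set mean
    thr = X[train_idx, 4].mean()
    y_pred = (X[test_idx, 4] > thr).astype(int)
    scores.append(np.mean(y_pred == y[test_idx]))
mean_acc = float(np.mean(scores))   # averaged over the ten splits
```

Averaging `scores` over the ten held-out folds is what yields the generalization estimates reported in Table 1.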
The performances of the other models are moderate: AODEGRU obtained a maximum accuracy of about 95.1%, while DSTCNN was slightly lower at about 94.6%. The results show that both models function well, but they possess slightly lower accuracy in identifying contaminated samples and slightly higher loss compared with GLN and AT-LSTM.
Table 2 shows how different sensor sampling frequencies (1 Hz, 5 Hz, and 10 Hz) affect the GLN model. As the sampling frequency increases, accuracy, precision, recall, and F1-score all improve, showing that a higher frequency gives more detailed data for better model prediction.
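As an illustration of how sampling frequency can be varied during preprocessing, timestamped readings (hypothetical values below, at one sample per 5 seconds as in this study's acquisition setup) can be resampled with pandas:

```python
import numpy as np
import pandas as pd

# One hour of hypothetical ammonia readings, one sample every 5 seconds
idx = pd.date_range("2021-06-01", periods=720, freq="5s")
readings = pd.DataFrame(
    {"ammonia": np.random.default_rng(1).normal(0.5, 0.1, 720)}, index=idx)

# Downsample to one averaged reading per minute to emulate a lower sampling rate
per_minute = readings.resample("1min").mean()
```

Running the same training pipeline on streams resampled to different rates is one way to produce the kind of frequency comparison shown in Table 2.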
Table 3 compares the performance of the GLN model in the presence of sensor data noise (5% and 10%) and missing data (5% and 10%). The results demonstrate the model's robustness: performance declines only slightly with noise or missing data and remains reasonable and reliable.
Table 4 compares the computational efficiency of three existing models—DSTCNN, AODEGRU, and AT-LSTM—with the proposed GLN model. The number of parameters indicates the computational complexity of each model, while the runtime in seconds highlights the time required for processing. The proposed GLN model has the lowest number of parameters (1.8M), demonstrating its lightweight architecture compared to the other models. This reduced complexity contributes to its significantly faster runtime (12.8 seconds), making it more efficient while maintaining or exceeding the accuracy of existing models. These results showcase the computational advantages of the proposed GLN model.
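For intuition on how such parameter counts arise, note that the gated cell described earlier has three (W, U, b) triples. A generic count, with illustrative dimensions rather than the paper's reported architecture, is:

```python
def gated_cell_params(input_dim, hidden_dim):
    """Parameters of one gated cell: three (W, U, b) triples (update, reset, candidate)."""
    per_gate = (hidden_dim * input_dim     # W: input-to-hidden weights
                + hidden_dim * hidden_dim  # U: hidden-to-hidden weights
                + hidden_dim)              # b: bias
    return 3 * per_gate

# e.g., 5 input features (pH, DO, temperature, turbidity, ammonia), 256 hidden units
n = gated_cell_params(5, 256)
```

Because the hidden-to-hidden term dominates, total model size grows roughly quadratically with the hidden dimension, which is why a lighter architecture can cut the parameter count so sharply.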
Table 2. Performance of proposed model with different sampling frequencies
| Experiment | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Sampling Frequency (1 Hz) | 89.5 | 88.9 | 90.1 | 89.5 |
| Sampling Frequency (5 Hz) | 92.1 | 91.8 | 92.5 | 92.1 |
| Sampling Frequency (10 Hz) | 93.8 | 93.2 | 94.0 | 93.6 |
Table 3. Performance of proposed model with noise and missing data

| Experiment | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Sensor Data Noise (5%) | 90.3 | 89.7 | 91.2 | 90.4 |
| Sensor Data Noise (10%) | 87.9 | 86.7 | 88.2 | 87.4 |
| Missing Data (5%) | 91.2 | 90.5 | 92.0 | 91.2 |
| Missing Data (10%) | 88.4 | 87.8 | 89.1 | 88.3 |
Table 4. Model complexity and run time comparison
| Model | Number of Parameters | Runtime (Seconds) |
|---|---|---|
| DSTCNN | 2.5 M | 15.2 |
| AODEGRU | 3.1 M | 18.7 |
| AT-LSTM | 4.8 M | 22.5 |
| Proposed GLN | 1.8 M | 12.8 |
High concentrations of ammonia pose a threat to the health of aquatic systems and fish, which is an important aspect considered in the present work. In that regard, using the GLN model to categorize water pollution based on the necessary water quality parameters, as suggested in this research, enhances precision and reliability in determining water quality. The GLN model's performance is higher than that of traditional models: it is very robust in recognizing and categorizing contamination levels, attaining superior average precision (0.984) while holding a high recall rate (0.992). This balance of sensitivity and specificity, reflected in the F1-score, makes the model useful in practice, where both false negatives and false positives should be eliminated. Furthermore, the loss was cut from much higher earlier levels to 0.079, indicating that the model is well optimized and efficient. Besides being conceptually consistent, this also means that GLN can be used to meet real-time water quality control objectives. For water resource management, it could improve the existing approaches by providing a more adequate and timely assessment of pollution degrees that might negatively impact ecology or fish.
It must also be pointed out that this research holds very concrete practical implications. For instance, in environmental monitoring stations, applying the GLN model means that water quality tests can be carried out 24/7. Quick actions such as restoration measures can then be taken when the system flags contamination early enough, protecting the lives of aquatic animals and, ultimately, human beings. Additionally, GLN predictions appear to be easily interpreted: environmental scientists and policy makers do not have to work hard to understand them and can therefore trust models of this type, increasing the probability of their adoption into regulations.
[1] Amin, M., Musdalifah, L., Ali, M. (2020). Growth performances of Nile Tilapia, Oreochromis niloticus, reared in recirculating aquaculture and active suspension systems. IOP Conference Series: Earth and Environmental Science, 441(1): 012135. https://doi.org/10.1088/1755-1315/441/1/012135
[2] Arepalli, P.G., Naik, K.J., Amgoth, J. (2024). An IoT based water quality classification framework for aqua-ponds through water and environmental variables using CGTFN model. International Journal of Environmental Research, 18(4): 73. https://doi.org/10.1007/s41742-024-00625-2
[3] Arepalli, P.G., Naik, K.J. (2024). An IoT based smart water quality assessment framework for aqua-ponds management using Dilated Spatial-temporal Convolution Neural Network (DSTCNN). Aquacultural Engineering, 104: 102373. https://doi.org/10.1016/j.aquaeng.2023.102373
[4] Gibtan, A., Getahun, A., Mengistou, S. (2008). Effect of stocking density on the growth performance and yield of Nile tilapia [Oreochromis niloticus (L., 1758)] in a cage culture system in Lake Kuriftu, Ethiopia. Aquaculture Research, 39(13): 1450-1460.
[5] Daudpota, A.M., Kalhoro, I.B., Shah, S.A., Kalhoro, H., Abbas, G. (2014). Effect of stocking densities on growth, production and survival rate of red tilapia in hapa at fish hatchery Chilya Thatta, Sindh, Pakistan. Journal of Fisheries, 2(3): 180-186. https://doi.org/10.17017/jfish..v2i3.2014.45
[6] Arepalli, P.G., Naik, K.J., Rout, J.K. (2024). Aquaculture water quality classification with sparse attention transformers: Leveraging water and environmental parameters. In Proceedings of the 2024 13th International Conference on Software and Computer Applications, Bali Island, Indonesia, pp. 318-325. https://doi.org/10.1145/3651781.3651829
[7] Ani, J.S., Manyala, J.O., Masese, F.O., Fitzsimmons, K. (2022). Effect of stocking density on growth performance of monosex Nile Tilapia (Oreochromis niloticus) in the aquaponic system integrated with lettuce (Lactuca sativa). Aquaculture and Fisheries, 7(3): 328-335. https://doi.org/10.1016/j.aaf.2021.03.002
[8] Zambrano, A.F., Giraldo, L.F., Quimbayo, J., Medina, B., Castillo, E. (2021). Machine learning for manually-measured water quality prediction in fish farming. Plos One, 16(8): e0256380. https://doi.org/10.1371/journal.pone.0256380
[9] Palani, S., Liong, S.Y., Tkalich, P. (2008). An ANN application for water quality forecasting. Marine Pollution Bulletin, 56(9): 1586-1597. https://doi.org/10.1016/j.marpolbul.2008.05.021
[10] Castrillo, M., García, Á.L. (2020). Estimation of high frequency nutrient concentrations from water quality surrogates using machine learning methods. Water Research, 172: 115490. https://doi.org/10.1016/j.watres.2020.115490
[11] Anand, M.V., Sohitha, C., Saraswathi, G.N., Lavanya, G.V. (2023). Water quality prediction using CNN. Journal of Physics: Conference Series, 2484(1): 012051. https://doi.org/10.1088/1742-6596/2484/1/012051
[12] Ye, B., Cao, X., Liu, H., Wang, Y., Tang, B., Chen, C., Chen, Q. (2022). Water chemical oxygen demand prediction model based on the CNN and ultraviolet-visible spectroscopy. Frontiers in Environmental Science, 10: 1027693. https://doi.org/10.3389/fenvs.2022.1027693
[13] Hu, Z., Zhang, Y., Zhao, Y., Xie, M., Zhong, J., Tu, Z., Liu, J. (2019). A water quality prediction method based on the deep LSTM network considering correlation in smart mariculture. Sensors, 19(6): 1420. https://doi.org/10.3390/s19061420
[14] Liu, P., Wang, J., Sangaiah, A.K., Xie, Y., Yin, X. (2019). Analysis and prediction of water quality using LSTM deep neural networks in IoT environment. Sustainability, 11(7): 2058. https://doi.org/10.3390/su11072058
[15] Ahmed, U., Mumtaz, R., Anwar, H., Shah, A.A., Irfan, R., García-Nieto, J. (2019). Efficient water quality prediction using supervised machine learning. Water, 11(11): 2210. https://doi.org/10.3390/w11112210
[16] Juna, A., Umer, M., Sadiq, S., Karamti, H., Eshmawi, A.A., Mohamed, A., Ashraf, I. (2022). Water quality prediction using KNN imputer and multilayer perceptron. Water, 14(17): 2592. https://doi.org/10.3390/w14172592
[17] Li, T., Lu, J., Wu, J., Zhang, Z., Chen, L. (2022). Predicting aquaculture water quality using machine learning approaches. Water, 14(18): 2836. https://doi.org/10.3390/w14182836
[18] Wang, X., Li, Y., Qiao, Q., Tavares, A., Liang, Y. (2023). Water quality prediction based on machine learning and comprehensive weighting methods. Entropy, 25(8): 1186. https://doi.org/10.3390/e25081186
[19] Cojbasic, S., Dmitrasinovic, S., Kostic, M., Turk Sekulic, M., Radonic, J., Dodig, A., Stojkovic, M. (2023). Application of machine learning in river water quality management: A review. Water Science & Technology, 88(9): 2297-2308. https://doi.org/10.2166/wst.2023.331
[20] Da Silva, L.F., Yang, Z., Pires, N.M., Dong, T., Teien, H.C., Storebakken, T., Salbu, B. (2018). Monitoring aquaculture water quality: Design of an early warning sensor with Aliivibrio fischeri and predictive models. Sensors, 18(9): 2848. https://doi.org/10.3390/s18092848
[21] Chen, F., Du, Y., Qiu, T., Xu, Z., et al. (2021). Design of an intelligent variable-flow recirculating aquaculture system based on machine learning methods. Applied Sciences, 11(14): 6546. https://doi.org/10.3390/app11146546
[22] Yang, J., Jia, L., Guo, Z., Shen, Y., et al. (2023). Prediction and control of water quality in Recirculating Aquaculture System based on hybrid neural network. Engineering Applications of Artificial Intelligence, 121: 106002. https://doi.org/10.1016/j.engappai.2023.106002
[23] Wu, J., Wang, Z. (2022). A hybrid model for water quality prediction based on an artificial neural network, wavelet transform, and long short-term memory. Water, 14(4): 610. https://doi.org/10.3390/w14040610
[24] Zhou, S., Song, C., Zhang, J., Chang, W., Hou, W., Yang, L. (2022). A hybrid prediction framework for water quality with integrated W-ARIMA-GRU and LightGBM methods. Water, 14(9): 1322. https://doi.org/10.3390/w14091322
[25] Chen, H., Yang, J., Fu, X., Zheng, Q., et al. (2022). Water quality prediction based on LSTM and attention mechanism: A case study of the Burnett River, Australia. Sustainability, 14(20): 13231. https://doi.org/10.3390/su142013231
[26] Cai, H., Zhang, C., Xu, J., Wang, F., Xiao, L., Huang, S., Zhang, Y. (2023). Water quality prediction based on the KF-LSTM encoder-decoder network: A case study with missing data collection. Water, 15(14): 2542. https://doi.org/10.3390/w15142542
[27] Farzana, S.Z., Paudyal, D.R., Chadalavada, S., Alam, M.J. (2023). Prediction of water quality in reservoirs: A comparative assessment of machine learning and deep learning approaches in the case of Toowoomba, Queensland, Australia. Geosciences, 13(10): 293. https://doi.org/10.3390/geosciences13100293
[28] Arepalli, P.G., Khetavath, J.N. (2023). An IoT framework for quality analysis of aquatic water data using time-series convolutional neural network. Environmental Science and Pollution Research, 30(60): 125275-125294. https://doi.org/10.1007/s11356-023-27922-1
[29] Kolding, J., Haug, L., Stefansson, S. (2008). Effect of ambient oxygen on growth and reproduction in Nile tilapia (Oreochromis niloticus). Canadian Journal of Fisheries and Aquatic Sciences, 65(7): 1413-1424. https://doi.org/10.1139/F08-059
[30] Arepalli, P.G., Naik, K.J. (2024). A deep learning-enabled IoT framework for early hypoxia detection in aqua water using light weight spatially shared attention-LSTM network. The Journal of Supercomputing, 80(2): 2718-2747. https://doi.org/10.1007/s11227-023-05580-x
[31] Arepalli, P.G., Naik, K.J. (2023). An IoT-based water contamination analysis for aquaculture using lightweight multi-headed GRU model. Environmental Monitoring and Assessment, 195(12): 1516. https://doi.org/10.1007/s10661-023-12126-4
[32] Lawson, T.B. (Ed.). (1994). Fundamentals of Aquacultural Engineering. Springer Science & Business Media.
[33] El-Sherif, M.S., El-Feky, A.M.I. (2009). Performance of Nile tilapia (Oreochromis niloticus) fingerlings. I. Effect of pH. International Journal of Agriculture and Biology, 11(3): 297-300.
[34] Hargreaves, J.A., Tucker, C.S. (2004). Managing Ammonia in Fish Ponds (Vol. 4603). Stoneville: Southern Regional Aquaculture Center.
[35] Stone, N.M., Thomforde, H.K. (2004). Understanding Your Fish Pond Water Analysis Report. Cooperative Extension Program, University of Arkansas at Pine Bluff, US Department of Agriculture and County Governments Cooperating.
[36] Boyd, C.E., Tucker, C.S. (2012). Pond Aquaculture Water Quality Management. Springer Science & Business Media.
[37] Boyd, C.E. (1982). Water Quality Management for Pond Fish Culture. Elsevier: Amsterdam, The Netherlands.
[38] Arepalli, P.G., Naik, K.J. (2024). Water contamination analysis in IoT enabled aquaculture using deep learning based AODEGRU. Ecological Informatics, 79: 102405. https://doi.org/10.1016/j.ecoinf.2023.102405