Evaluating the Adaptability of Deep Learning-Based Multi-feature Sonar Image Detection Algorithms

ABSTRACT


INTRODUCTION
Sonar images are formed from sound waves and have a wide range of applications. Robust target detection in sonar images is essential for underwater exploration, navigation, and mapping, and is valuable for gathering information about the underwater environment [1,2]. Sonar can image through turbulence and small particles, but the target in a sonar image blends into the background, is affected by strong noise, and is susceptible to environmental interference [3]. Therefore, it is necessary to extract target information through feature detection and then classify and recognize the target. Traditional sonar image processing methods mainly include filtering, threshold segmentation, and morphological operations. However, these methods have many problems, such as sensitivity to noise and insensitivity to target shapes. Although experts have conducted in-depth research on traditional sonar image processing methods, these methods have not yet found major application in underwater sonar imaging [4]. Deep learning techniques have been studied in depth in recent years as they continue to evolve. Deep learning is a method of learning from massive data that can automatically extract features from images and detect and classify various types of objects.
How to effectively improve the detection accuracy and stability of sonar images is one of the current research hotspots.
For example, in response to current problems in sonar detection, Wang et al. [5] proposed a new quantization-based frog jump detection method that builds on previous research. On this basis, they established a mathematical model for segmentation, distribution, and noise entropy in frog jump detection, enabling quantitative analysis and verification of the detection results. In addition, Sung et al. [6] introduced a method that uses generative adversarial networks to simulate real sonar images. Experiments verified that this algorithm can achieve target detection from multiple angles and scenes without the need for sea trials and is highly robust. In response to current problems in underwater acoustic signal processing, Abu and Diamant [7] introduced a constant false alarm rate detection algorithm, referred to as K-distribution, and conducted experiments on 270 real sonar images acquired in real environments. The outcomes indicated that the proposed method performed well. In addition, Abu and Diamant [8] proposed a new unsupervised statistical learning method that has high resolution and is not distance dependent. This method can effectively solve the problem of target recognition. These studies provide a useful reference for the detection of multiple targets in sonar images.
As deep learning technology continues to develop, it has made significant progress in image processing, especially in target detection, where it has become a mainstream approach. For example, to address the problem that a nearby target appears distorted in shape, which makes target detection difficult, Sung et al. [9] proposed a CNN-based target detection and elimination method. This result can be extended to other target detection algorithms to enhance the reliability of target detection. Abbas and Celebi [10] developed a new dermal classification system by integrating multiple visual features and deep neural networks, which can extract new aggregates of visual features and descriptors in perceptual color spaces. To achieve efficient and accurate monitoring of subsea pipelines, Xiong et al. [11] proposed an integrated method for real-time automatic monitoring, evaluation, and positioning of subsea pipeline status based on 3D real-time sonar. This method can improve the real-time monitoring of subsea pipelines, avoid manual intervention, and ensure the efficiency and accuracy of status monitoring. In underwater unmanned detection systems, automatic recognition of underwater targets is an important technology. To enhance the accuracy of automatic underwater target recognition, Jin et al. [12] studied a CNN-based sonar image target recognition method that combines salient region segmentation with conical pool fusion, based on the characteristics of underwater acoustic signals, to enhance the efficiency and accuracy of underwater acoustic signal processing, providing a new approach for this field. These studies are relevant to the adaptability evaluation of multi-feature detection algorithms in sonar images, but they have not been analyzed based on the current situation.
Current sonar image feature fusion methods mostly use manually designed features, so they are often not comprehensive or accurate enough. This article uses a deep learning-based CNN algorithm for feature extraction from sonar images and combines multiple feature fusion methods to perform multi-feature detection on sonar images, so as to achieve high detection and classification performance. The VGG-16 CNN model and the weighted feature fusion method were selected to establish a complete sonar image multi-feature detection model. In the experiment, the classification and detection capabilities of CNN and other algorithms were tested using images from a public sonar image dataset. In these tests, the accuracy of sonar image recognition using CNN remained above 90% in all scenarios, and the CNN algorithm achieved significant results in multi-feature detection and classification of sonar images.

Data preprocessing
Sonar imaging forms an image from the sound energy that a sonar transmitter emits into water or another medium and the echo signal that the sonar receiver picks up. It has extensive applications in underwater exploration, marine scientific research, aerospace, and other fields [13]. Sonar imaging technology has broad application prospects in medical imaging and ocean exploration. The ability to detect changes in underwater scenes has many applications, including environmental monitoring, strategic maritime waterway monitoring, and naval landmine countermeasures [14]. The limitations of the hydroacoustic channel and existing hydroacoustic communication technologies make sonar imaging highly susceptible to a variety of typical distortions [15]. Currently, sonar imaging faces issues such as a low signal-to-noise ratio and poor imaging quality, so preprocessing is needed to enhance its imaging quality. In the preprocessing of sonar images, clutter removal, signal enhancement, and Doppler suppression are important steps, which are explained below.

Clutter removal
In the sonar imaging process, there are some interference signals unrelated to the target. Electronic clutter, scattering clutter, and acoustic clutter are the most common types. Eliminating the impact of clutter on sonar imaging can effectively improve imaging quality.

Signal enhancement
In the sonar imaging process, the target signal is relatively weak, so it needs to be strengthened. In image processing, methods such as grayscale stretching, contrast enhancement, and histogram equalization are commonly used to enhance signals. Among them, grayscale stretching expands the dynamic range of the image through a linear mapping of the pixel value range of the input image, improving the contrast and sharpness of the image.
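The two enhancement operations named above can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a list of rows of integers; the function names `stretch_gray` and `equalize_hist` are illustrative and do not come from the original work.

```python
def stretch_gray(image, out_min=0, out_max=255):
    """Grayscale stretching: linearly map the input range onto [out_min, out_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image, nothing to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


def equalize_hist(image, levels=256):
    """Histogram equalization using the cumulative distribution of gray levels.

    Pixel values are assumed to be integers in [0, levels).
    """
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # Cumulative histogram (unnormalized CDF).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # only one gray level present
        return [row[:] for row in image]
    # Lookup table that spreads the occupied gray levels over the full range.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[p] for p in row] for row in image]


# Tiny low-contrast tile used only for illustration.
tile = [[100, 110], [120, 130]]
stretched = stretch_gray(tile)
equalized = equalize_hist(tile)
```

Stretching only rescales the existing range, whereas equalization redistributes gray levels according to how often they occur, so the two can give different results on images with skewed histograms.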

Doppler suppression
The Doppler effect is the change in echo frequency caused by the motion of the target relative to the sonar during acoustic reflection, which causes strip artifacts and throttling phenomena in sonar imaging. To eliminate Doppler interference, various techniques, such as Doppler filtering and averaging filtering, can be used.
Preprocessing sonar images is an essential step. Techniques such as clutter removal, signal enhancement, and Doppler suppression are adopted to improve the quality and accuracy of sonar imaging, laying a solid foundation for subsequent data processing and analysis.

Multiple feature target extraction
Multi-feature detection of sonar images refers to processing and analyzing sonar images, extracting the various kinds of feature information they contain, and detecting this information to recognize and locate target objects. The reflection intensity and time of sound waves on the surface of an object, as well as the shape, texture, and color of the object surface, are the basic characteristics of the object surface. The basic idea of multi-feature detection is to extract and analyze information such as intensity, time, shape, texture, and color during the reflection of sonar signals to achieve target recognition and localization. The novelty of the multi-feature extraction approach is that it improves target classification performance by integrating the representation learning capability of various features and exploiting the neighborhood multi-oriented correlation [16]. Therefore, effective analysis and processing of these features can help improve the accuracy and reliability of sonar image processing. In this paper, a CNN-based sonar image feature extraction method is selected, and the image texture features are extracted through the special structure of CNN.
In this article, the VGG-16 CNN model is used to extract multi-level features of sonar images for analysis. VGG-16 is a typical CNN with 16 weight layers (13 convolutional layers and 3 fully connected layers) followed by a softmax layer. Its characteristics are a relatively simple network structure, deep layers, a consistent convolutional kernel size, and an increasing number of channels in the feature maps, which give it good generalization ability and high recognition accuracy. In the first two levels of VGG-16, based on the basic theory of CNN, the network is improved with the dropout algorithm and the L2 regularization algorithm to effectively suppress overfitting. For the first layer of VGG-16, the convolution operation is expressed by Formula (1), in which i and j represent the size of the image and the output term represents the output of the first layer, that is, the result of the convolutional layer. The maximization operation of the pooling layer is expressed by Formula (2). After the convolution and pooling operations in the first few layers, a feature map of a certain size is obtained. The feature map is transformed into a high-dimensional vector as input to the fully connected layer; this transformation is expressed by Formula (3), in which n represents the dimension of the feature map. The activation function used by VGG-16 at the output is softmax, given by Formula (4), and the loss function of VGG-16 is the cross-entropy loss, given by Formula (5).
In the process of extracting sonar image features, the pre-trained VGG-16 model is first loaded and used as a feature extractor. The sonar images are converted into appropriate formats, and preprocessing steps are applied to each sonar image. Subsequently, the VGG-16 model is used to extract the features of each sonar image. The first few convolutional layers of the model extract lower-level features, such as edges and textures, while deeper layers extract higher-level features, such as the shape and contour of the object. Each extracted image feature vector is standardized and normalized to ensure that all vectors have the same scale and distribution. Finally, dimensionality reduction techniques are applied to each image feature vector, and other features are extracted from each sonar image, namely grayscale histograms, texture features, and shape features. All extracted sonar image features are combined into a feature vector for sonar image classification and fusion. Figure 1 shows the grayscale histograms of sonar images before and after equalization.
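The layer operations that Formulas (1) to (5) refer to can be illustrated in their standard textbook forms. The sketch below is an assumption about those forms, not a reproduction of the paper's formulas: it uses a single channel, no padding, a ReLU after the convolution, and 2 × 2 max pooling, and all function names are illustrative.

```python
import math


def conv2d_valid(image, kernel, bias=0.0):
    """Single-channel convolution as used in CNNs (cross-correlation), then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = bias
            for m in range(kh):
                for n in range(kw):
                    s += image[i + m][j + n] * kernel[m][n]
            row.append(max(0.0, s))  # ReLU activation
        out.append(row)
    return out


def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling; a trailing odd row or column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]


def flatten(fmap):
    """Turn a 2D feature map into a vector for the fully connected layer."""
    return [v for row in fmap for v in row]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Cross-entropy loss for one sample with a one-hot target."""
    return -math.log(probs[target_index])


# Illustrative forward pass on a made-up 6x6 patch with a vertical-edge kernel.
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
features = flatten(max_pool_2x2(conv2d_valid(patch, edge_kernel)))
probs = softmax(features)
loss = cross_entropy(probs, target_index=0)
```

In the actual VGG-16 network the convolutions are multi-channel 3 × 3 kernels with padding, and a pre-trained model would normally be loaded from a deep learning framework rather than written by hand; the sketch only shows what each operation computes.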

Sonar image multi feature fusion and classification
A sonar detector is an instrument that uses acoustic principles to detect underwater targets. Autonomous underwater detectors are important tools for underwater exploration, as they can enter dangerous places and spare humans the risk [17]. However, in actual testing, factors such as the shape and size of the tested object give the echo signal a complex waveform, which introduces errors into the measurement. Therefore, when conducting sonar sounder measurements, it is necessary to detect and fuse multiple features of sonar images and to analyze and process them accurately to avoid measurement errors. To further improve the classification effect, this article adopts a weighted feature fusion method to fuse features at different levels.

Figure 1. Grayscale histograms of sonar images before and after equalization
The detection rate of multi-feature fusion algorithms is higher than that of algorithms such as single-basis detection [18,19]. Weighted feature fusion is a feature fusion method in machine learning that weights and aggregates different features to obtain a more comprehensive feature representation. The method first assigns a weight to each feature to represent its contribution and importance to the model. Next, the weighted feature values of each data point are calculated and merged into a new comprehensive feature value. This new integrated feature vector can be used in tasks such as model training, classification, and regression. The weighted feature fusion is calculated by Formulas (6) and (7), which involve the normalized weight and the initialization weight coefficient of each feature.
When sonar features are combined for the sonar image classification task, CNN is first used to extract image features. For each data point, the image features are assigned different weights, and the image feature vectors are multiplied by their corresponding weights to obtain weighted feature vectors. Then, the weighted feature vectors are combined to obtain a new comprehensive feature vector. Finally, this comprehensive feature vector is used as input to train a sonar image classification model. Compared with the most advanced deep learning-based methods, the multi-feature fusion method can greatly reduce the number of learned parameters and the training time [20]. The steps of sonar image multi-feature fusion and classification based on deep learning CNN are shown in Figure 2.
The framework in Figure 2 includes data collection, feature extraction, feature weighting, feature fusion, model establishment, and model evaluation. First, underwater sonar signal data are collected, including characteristics such as echo amplitude, echo delay, echo morphology, and echo frequency. Then, the deep learning CNN algorithm extracts different features from the sonar signal data, for example using the signal-to-noise ratio to calculate echo amplitude and the reflection delay to calculate echo delay. Features are then selected, and each feature is assigned a weight according to the characteristics of sonar signals and practical application requirements. During feature fusion, the features are combined linearly or nonlinearly according to their weights to obtain a comprehensive feature that describes the sonar signal. Based on the comprehensive features, a CNN model for sonar signal processing suitable for practical applications is established. Finally, the established CNN model is evaluated and optimized to verify its accuracy and stability. The established model has been applied to sonar detection, recognition, and positioning in practical underwater environments.
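The weighting and linear fusion steps can be illustrated with a minimal sketch. It assumes one common scheme, in which each initial weight coefficient is divided by the sum of all coefficients to obtain a normalized weight and the fused feature is the weighted sum of equally sized feature vectors; the exact form of Formulas (6) and (7) may differ, and the function names and sample values are hypothetical.

```python
def normalize_weights(initial_weights):
    """Scale non-negative initial weight coefficients so that they sum to 1."""
    total = sum(initial_weights)
    if total <= 0:
        raise ValueError("initial weights must have a positive sum")
    return [w / total for w in initial_weights]


def fuse_features(feature_vectors, initial_weights):
    """Linear fusion: weighted sum of feature vectors of equal length."""
    if len(feature_vectors) != len(initial_weights):
        raise ValueError("one weight is required per feature vector")
    length = len(feature_vectors[0])
    if any(len(vec) != length for vec in feature_vectors):
        raise ValueError("feature vectors must have the same length")
    weights = normalize_weights(initial_weights)
    return [sum(w * vec[k] for w, vec in zip(weights, feature_vectors))
            for k in range(length)]


# Hypothetical 2-dimensional features from two extractors, weighted 1:3.
texture_feature = [1.0, 2.0]
shape_feature = [3.0, 4.0]
fused = fuse_features([texture_feature, shape_feature], [1.0, 3.0])
```

When the feature vectors have different lengths, a typical alternative is to scale each vector by its normalized weight and concatenate the results instead of summing them.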

Adaptability assessment
This article uses cross-validation to evaluate the adaptability of the algorithm. Cross-validation is a common method in statistics, mainly used to evaluate the predictive ability of models. Its basic idea is to divide the dataset into several parts, use each part in turn for testing while training on the remaining parts, and finally average the results. In each round, the dataset is therefore divided into two parts: the training set and the test set. The model is trained on the training set, and the test set is used to evaluate its performance. By repeating this process multiple times, the performance indicators of the algorithm under test are obtained and used to evaluate the algorithm. The evaluation indicators can include detection accuracy, recognition accuracy, false detection rate, stability, and others. By analyzing the experimental results, the applicability of different algorithms in different scenarios is determined, and their advantages and disadvantages are identified to provide a reference for further improving the algorithm. Finally, the experimental results are compiled into a report, and the experimental methods, dataset, and result analysis are documented for reference and reproduction by other researchers. The following is an experimental analysis.
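The cross-validation procedure described above can be sketched as follows. This is a generic k-fold illustration, not the authors' evaluation code: `train_fn` and `predict_fn` are placeholder callables standing in for whatever model is being assessed, and accuracy is used as the example indicator.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def cross_validate(samples, labels, k, train_fn, predict_fn, seed=0):
    """Return the mean test accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i]
                      for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Placeholder "model": a threshold halfway between the two class means.
def train_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


toy_x = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mean_accuracy = cross_validate(toy_x, toy_y, 5, train_threshold, predict_threshold)
```

Every sample is used for testing exactly once, so the averaged score reflects performance on data the model did not see during training in that round.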

EXPERIMENT ON SONAR IMAGE FEATURE DETECTION AND CLASSIFICATION BASED ON CNN
In sonar image processing, feature extraction and fusion are two of the key steps that directly affect the recognition accuracy of sonar images. This project uses CNN for feature extraction and fusion of sonar images, and adaptability evaluation experiments are conducted on this method. The sonar image dataset used in this article is a publicly available dataset that includes 1,000 sonar images containing targets of various shapes and sizes. The image size is 256 × 256, and the pixel values are normalized to the range [0, 1]. The targets in the dataset differ in shape, size, orientation, depth, reflectivity, and other properties, as illustrated in Figure 3.

Figure 3. Sonar image dataset
To conduct adaptability evaluation experiments for sonar image feature fusion, the steps for evaluating the adaptability of sonar image multi-feature detection algorithms are as follows:
Step 1: The CNN model of the sonar image is established. The CNN model is built using the deep learning framework, and the weights and biases are optimized through backpropagation to achieve automatic feature extraction and fusion.
Step 2: The CNN model is tested. The CNN model is compared with traditional feature extraction methods to test the performance of feature extraction and fusion in the CNN model.
Step 3: The adaptability of the CNN model is evaluated. Different datasets are used for training and testing to evaluate the adaptability and generalization ability of CNN models.
Before feature extraction, the data needs to be preprocessed, including image scaling, normalization, noise reduction, and other steps, to improve the robustness and accuracy of the algorithm. Figure 4 illustrates the sonar image after noise reduction.

Figure 4. Sonar image after noise reduction
Figure 4 compares two methods for denoising sonar images, median filtering and mean filtering, and shows that mean filtering gives the better denoising effect. In the feature fusion experiment, two feature images are fused, and the fused feature images are used as input to the CNN classifier. Figure 5 shows the fusion results. Different feature fusion methods have different advantages. To test the CNN-based feature extraction and fusion method in this article, the classification accuracy of geometric features, grayscale features, color features, texture features, and multi-feature fusion was tested. Multi-feature fusion is the fusion of the first four types of features. The six sonar data samples mentioned above were tested separately, and the test results are illustrated in Figure 6.
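The median and mean filters used for noise reduction can be sketched as below. This is a minimal illustration that assumes a 3 × 3 window and edge replication at the image border; the window size and border handling used in the experiment are not stated, and the function names are illustrative.

```python
from statistics import mean, median


def filter_3x3(image, reducer):
    """Replace each pixel by reducer(3x3 neighbourhood), replicating edges."""
    height, width = len(image), len(image[0])
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            window = [image[min(max(i + di, 0), height - 1)]
                           [min(max(j + dj, 0), width - 1)]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            row.append(reducer(window))
        out.append(row)
    return out


def median_filter(image):
    """Median filtering: robust to isolated outlier pixels."""
    return filter_3x3(image, median)


def mean_filter(image):
    """Mean filtering: averages the neighbourhood, smoothing all noise evenly."""
    return filter_3x3(image, mean)


# Made-up 3x3 tile with one bright speckle in the centre.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
by_median = median_filter(noisy)
by_mean = mean_filter(noisy)
```

The median filter removes an isolated speckle outright, while the mean filter spreads it over the neighbourhood; which one performs better on a given sonar image depends on the noise present.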
Figure 6 shows the classification accuracy of sonar images using five feature extraction methods: geometric features, grayscale features, color features, texture features, and multi-feature fusion. Six sonar images were tested, with the sonar image datasets numbered 1-6. From the data in Figures 6a, 6b, 6c, 6d, and 6e, it can be seen that the classification accuracy of the geometric feature method was between 62% and 68%, and that of the grayscale feature method was between 58% and 62%. The classification accuracy of the color feature method was between 72% and 78%, and that of the texture feature method was between 69% and 74%. The classification accuracy of the multi-feature fusion method was between 86% and 91%. The multi-feature fusion method therefore gave the best classification results for sonar images and significantly improved classification accuracy. The sonar image contains different acoustic signals and reflectivity, and the vehicle carrying the sonar can obtain the direction of the sonar signal when acquiring the sonar image. This article is based on a CNN model, which automatically obtains signals emitted by sonar and predicts their direction to obtain more accurate sonar image features. To evaluate the performance of CNN, the predictions of linear regression, decision tree, random forest, and CNN are tested, as shown in Figure 7.
Figure 7 tests the ability of four algorithms, namely linear regression, decision tree, random forest, and CNN, to predict sonar signals. The red curve in the figure represents the signal predicted by the algorithm, while the black curve represents the actual sonar signal collected by the vehicle. From the data in Figures 7a, 7b, 7c, and 7d, it can be seen that the error of the sonar signal trend curve predicted by linear regression is relatively large. In both the early and late stages of the sonar signal, the prediction deviates from the actual sonar trend. The same holds for decision tree and random forest: there is a certain gap between the predicted signal and the actual signal, and the accuracy needs to be improved. However, the prediction curve of CNN is relatively consistent with the direction of the real sonar signal, with small errors and high accuracy. As the path of the sonar signal changes, the path predicted by CNN stays close to the real path and is almost always consistent with it, closer to the true value than the other algorithms, indicating that the deep learning CNN algorithm has better convergence.
The data in sonar images is complex, and detecting sonar signals in different scenes is a relatively complex task. Moreover, sonar images contain elements of different categories. Accurate detection of different categories of sonar elements requires high-performance algorithms, and traditional algorithms are prone to missed detections. The detection quantity and error quantity of the CNN algorithm were tested on sonar images collected in different scenarios. Table 1 shows the results of CNN's detection of sonar images.
Table 1 shows the number of sonar images detected by the CNN algorithm and the number of detection errors for sonar images collected in different scenarios. From the data in Table 1, it can be seen that the difference between the number of sonar images detected by CNN and the correct number is small, and that the number of detection errors stays below 15, indicating that different scenes cause little interference. To demonstrate the accuracy of CNN in detecting sonar images, its recognition accuracy and detection error rate were compared with those of traditional detection algorithms, as illustrated in Figure 8.
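The two indicators compared next can be computed from per-scenario counts of the kind reported in Table 1. The sketch below assumes one plausible pair of definitions, since the text does not define them: recognition accuracy as correct detections divided by the true number of images, and detection error rate as erroneous detections divided by all detections. The counts are made-up placeholders, not values from Table 1.

```python
def detection_stats(n_true, n_detected, n_errors):
    """Return (recognition_accuracy, detection_error_rate) for one scenario.

    n_true     -- number of images that should be detected
    n_detected -- number of images the algorithm reported
    n_errors   -- number of reported detections that were wrong
    """
    if n_true <= 0 or n_detected <= 0:
        raise ValueError("counts must be positive")
    if not 0 <= n_errors <= n_detected:
        raise ValueError("n_errors must lie between 0 and n_detected")
    correct = n_detected - n_errors
    recognition_accuracy = correct / n_true
    detection_error_rate = n_errors / n_detected
    return recognition_accuracy, detection_error_rate


# Placeholder counts for three hypothetical scenarios.
scenarios = {
    "scenario_1": (200, 198, 6),
    "scenario_2": (150, 149, 4),
    "scenario_3": (180, 176, 9),
}
results = {name: detection_stats(*counts) for name, counts in scenarios.items()}
```

Reporting both numbers per scenario, as in Figure 8, separates missed targets (which lower accuracy) from false detections (which raise the error rate).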
Figure 8 compares the recognition accuracy and detection error rates of the two algorithms on sonar images collected in six scenarios. From the data in Figures 8a, 8b, 8c, and 8d, it can be seen that the recognition accuracy of the traditional algorithm stayed below 90% and that its error rate reached over 15%. In contrast, the recognition accuracy of CNN was mostly close to 100% and was above 90% in all scenarios, and its highest detection error rate was only 5.5%. Therefore, the deep learning-based CNN model detects sonar images more accurately and performs well in sonar target detection and image classification across different scenes.
By combining multiple feature detection methods, the accuracy and robustness of the algorithm have been improved. The experimental results showed that CNN performed well in detecting and classifying different sonar images, and its classification accuracy and recognition accuracy were the best among the tested algorithms. By improving the accuracy and efficiency of sonar image processing technology, rapid and accurate recognition and positioning of target objects in the ocean can be achieved, providing solid support for maritime traffic safety and marine resource development.

Figure 8. Comparison of recognition accuracy and detection error rates between traditional algorithms and CNN algorithms (a. Recognition accuracy of traditional algorithms; b. Recognition accuracy of CNN; c. Traditional algorithm detection error rate; d. CNN detection error rate)

Table 1. CNN algorithm for detecting sonar images in different scenarios