CNN Models Using Chest X-Ray Images for COVID-19 Detection: A Survey

ABSTRACT


INTRODUCTION
COVID-19 (coronavirus disease 2019) is a contagious disease that was first reported in Wuhan, China, in December 2019 and has since spread all over the world. It is caused by a virus that the International Committee on Taxonomy of Viruses (ICTV) named SARS-CoV-2 [1]. There are various methods to identify patients infected with COVID-19. The most widely used reference test is Reverse Transcription Polymerase Chain Reaction (RT-PCR). In this method, the viral RNA is first reverse-transcribed into complementary DNA. A short target sequence of this DNA is then copied many times so that it becomes easier to detect. The test is positive when this amplification is detected (using a fluorophore-labeled probe). RT-PCR tests have been widely used to detect COVID-19 and have played a crucial role in the pandemic response. However, they have certain limitations, particularly in terms of accuracy and scalability [2].
The accuracy of RT-PCR tests can vary depending on when the test is administered during the infection. They are generally more sensitive in the early stages of infection, when viral loads are higher. They may produce false negatives in later stages or if the sample quality is suboptimal. RT-PCR tests can also yield false positives, especially if there is contamination during sample collection or processing. For these reasons, an RT-PCR result is not completely reliable and may need a second test to confirm the diagnosis. RT-PCR testing is also time-consuming, and results can take more than a day. It is resource-intensive and relatively costly to administer, especially when large-scale testing is required. Finally, it does not scale easily because it requires specialized laboratory equipment operated by trained technicians. Scaling up testing capacity during surges in cases is therefore challenging and leads to delays in testing and reporting.
Other methods that are useful for detecting COVID-19 include clinical examination, pathological tests, and radiography. Because chest imaging is easily accessible, many clinicians prefer to use it for the diagnosis of COVID-19. Inui et al. [3] discussed the role of chest imaging in assessing complications, disease progression, and prognosis of COVID-19. The Cochrane COVID-19 Diagnostic Test Accuracy Group [4] also considered chest imaging a good diagnostic tool for detecting and managing COVID-19 pneumonia. Across different studies, the group evaluated the diagnostic accuracy of chest imaging in people of any age with suspected COVID-19. These studies assessed chest CT, chest X-Ray, or lung ultrasound for the diagnosis of COVID-19, using a reference standard that included RT-PCR.
Among medical imaging modalities, chest X-Ray is generally preferred for several reasons. Chest X-Ray images are widely available and can be easily taken in most healthcare settings. They are less expensive than other imaging techniques such as computed tomography (CT). In the emergency context of COVID-19, X-Ray images are very beneficial because they provide results faster than RT-PCR tests. They also expose the patient to less radiation than CT scans. X-Ray images can also be useful for monitoring the progress of the disease and assessing the response to treatment. Cozzi et al. [5] investigated the principal radiological features of COVID-19 by describing the most important chest X-Ray findings in a selected cohort of patients. They also correlated the radiological appearance with the RT-PCR results and with the patients' outcomes. Diagnosis can be accelerated further with automatic tools that provide rapid results. AI techniques have proven effective at different stages: medical image acquisition, segmentation, and finally the diagnosis of COVID-19. To illustrate the latest advances in medical imaging and radiology in the fight against COVID-19, Shi et al. [6] described the integration of AI with X-Rays, which are widely used in frontline hospitals.
Huang et al. [7] discussed the challenges and perspectives of machine learning (ML) and deep learning (DL) for the detection of COVID-19. They report that an AI model can be as accurate as experienced physicians in diagnosing COVID-19.
In general, deep learning has shown massive potential for healthcare applications, particularly for medical data analysis and diagnosis through medical image processing [8]. Deep learning models learn complex features and patterns directly from images. Because of this ability, they have shown promising results in detecting COVID-19 from X-Ray images, especially convolutional neural networks (CNNs). They also have the potential to help professionals diagnose the disease accurately [9].
However, some challenges need to be addressed. First, little public data is available for training and evaluating COVID-19 detection models, which limits the ability to build accurate models. Second, X-Ray image quality varies, and not all images are clear or well-defined, which leads to classification issues. Third, generalization is a challenge. A model trained only on chest X-Ray images may have limited ability to identify the virus in other parts of the body.
Consequently, the main intention behind this survey is to provide an overview of the CNNs that have contributed to tackling these challenges. We have organized the reviewed CNN models into three main categories. The first focuses on transfer learning from pre-trained CNN models. In the second, the model is developed from scratch and is called a custom CNN model. The last category combines the two previous ones to constitute the hybrid CNN model. For each study, we summarize the dataset size, the number of classes, the model architecture, and the performance evaluation criteria (accuracy, sensitivity, and specificity).
The remainder of this paper is organized as follows. Section 2 presents related work and classifies CNN models into three categories: 1) transfer learning from pre-trained models, 2) custom models, and 3) hybrid models. Section 3 introduces preliminary notions on image classification, datasets, and CNN model architecture. Finally, Section 4 discusses the related works and concludes the paper.

RELATED WORK
Deep learning has produced highly accurate solutions for medical image segmentation and classification in the health sector [10][11][12][13]. Many deep learning techniques based on chest X-Ray images have been developed to detect COVID-19 [14]. In the work of Jain et al. [15], CNN models perform both feature extraction and classification on the chest X-Rays of patients suspected of being infected with the virus. CNNs extract relevant features with convolutional layers that apply filters to the images. Pooling layers then reduce the dimensionality of the feature maps, and fully connected layers perform classification by generating class probabilities. During training, the model's weights are optimized to minimize a loss function and regularized to prevent overfitting. Transfer learning is another popular technique used with CNNs to improve feature extraction. It takes models that have already learned relevant features from large datasets and fine-tunes them on a specific task. This speeds up training and can improve the model's accuracy.
Different CNN-based approaches have been proposed for X-Ray image classification. Some focus on transfer learning from pre-trained CNN models. Others are developed from scratch (custom CNN models). Still others combine these two approaches in what we call the "hybrid CNN model". In this section, we present an overview of related work, organized into three categories: 1) transfer learning from pre-trained models, 2) custom models, and 3) hybrid models.

Transfer learning from pre-trained CNN models
A pre-trained model is a model that has been trained on a large dataset to extract high-level features from images, such as VGG, Inception, or ResNet. Using a pre-trained model is justified for several reasons. First, training large models on sizeable datasets requires expensive computing power. Second, training large models can be slow, sometimes taking several weeks. Finally, starting from a pre-trained model saves time, helps the network generalize, and accelerates convergence. Transfer learning is based on this principle: a pre-trained model is adapted to a new dataset or task by fine-tuning it.
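The transfer learning principle can be illustrated with a minimal, dependency-free Python sketch. It assumes the simplest variant, often called feature extraction. The pre-trained backbone is frozen, and only a new classification head is trained. The backbone here (`FROZEN_W`, `backbone`) is a hypothetical stand-in with hand-picked weights, not a real VGG, Inception, or ResNet model. The head is a logistic-regression classifier trained with gradient descent on the binary cross-entropy loss plus a small L2 penalty. Full fine-tuning would also update some or all of the backbone weights.

```python
import math

# Hypothetical stand-in for a pre-trained backbone. Its weights are frozen:
# nothing below ever modifies FROZEN_W. A real backbone would be, e.g., a VGG
# or ResNet feature extractor trained on a large dataset.
FROZEN_W = [[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.5]]


def backbone(x):
    """Frozen feature extractor: fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, lr=0.5, l2=1e-3, epochs=200):
    """Train only a new logistic-regression head on the frozen features."""
    feats = [backbone(x) for x in samples]  # computed once: the backbone is frozen
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            # Gradient step on the head only; l2 * wi is the weight-decay term.
            w = [wi - lr * (err * fi + l2 * wi) for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Probability of the positive class for a raw input x."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, backbone(x))) + b)


# Toy usage with made-up inputs (1 = positive class, 0 = negative class).
w, b = train_head([[1.0, 0.5, 0.0], [0.0, 1.0, 1.0]], [1, 0])
print(round(predict(w, b, [1.0, 0.5, 0.0]), 3), round(predict(w, b, [0.0, 1.0, 1.0]), 3))
```

Freezing the backbone means its features are computed only once. This is why transfer learning of this kind is much cheaper than training the whole network.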
In this section, we give an overview of related studies that focus on transfer learning (TL) from pre-trained CNN models with an X-Ray dataset.
The deep convolutional neural network (DCNN) developed in [18] integrates three ideas: traditional convolutional layers, parallel convolutional layers, and residual connections. The proposed model achieved an F1-score of 86.6% when trained from scratch. It reached 89.4% with transfer learning from a domain different from that of the target dataset, and 97.6% with transfer learning from the same domain as the target dataset. However, a crucial issue is how closely the source data used for transfer learning match the target dataset in terms of data features, size, and characteristics. It has also been shown that transfer learning from different domains has little effect on performance, as lightweight models trained from scratch perform close to transferred standard models [19].
Table 1 lists recent research and compares the performance of the models using several metrics, such as accuracy, sensitivity, and specificity.
For each study, the first column of the table gives the pre-trained model that achieved the best accuracy among those compared.

Custom CNN models
Custom CNN models are deep learning architectures designed from scratch to solve a specific problem or task. They are built from a series of layers tailored to the specific requirements of the problem at hand. Compared to pre-trained models, custom CNN models give much more control over the architecture's structure, allowing more flexibility and customization. In several contributions, authors presented their own CNN models without depending on pre-trained models. Table 2 presents some of these studies, which aim to classify chest X-Ray images as COVID-19 or other classes.

Hybrid CNN models
Hybrid CNNs are models that combine predesigned CNNs with other machine learning techniques, such as a Support Vector Machine (SVM). The idea is simply to add new blocks to the pre-trained CNN model. For example, in the study [45], COVID-19 detection from chest X-Rays is performed with three popular CNN models, each extended with two additional blocks. Table 3 presents a brief overview of significant hybrid CNN models.

CNNS MATERIALS AND TECHNIQUES
Deep learning techniques have the potential to model large sets of complex data. Building and training accurate and efficient deep learning models relies on several key concepts. First, the neural network architecture is the foundation of a deep learning model. It is composed of layers of interconnected artificial neurons that process the input data and generate output predictions. Its design is crucial to the model's performance. Other aspects must also be considered carefully. Appropriate activation functions must be selected to introduce non-linearity into the network's output. Regularization techniques reduce the model's variance by introducing restrictions that prevent it from overfitting the training data. Finally, the accuracy and efficiency of a deep learning model can differ significantly depending on the selected hyperparameters. Hyperparameter tuning is a critical step in fine-tuning the model's performance, but it is also challenging and time-consuming.
CNN is one of the most widely used models in deep learning because it automatically extracts the appropriate features from the input data without human intervention. CNNs cover a wide range of applications, such as computer vision [54], speech processing [55], and face recognition [56]. CNNs are commonly used for image classification and have been successfully applied to medical image classification tasks. They use filter kernels to extract relevant features from images and stack multiple layers to classify them.
This section discusses the basic building blocks of a CNN model that classifies COVID-19 from X-Ray images. It first introduces the important characteristics of an X-Ray image. It then explains why a large dataset of X-Ray images is needed and describes different techniques for preprocessing the data. Next, it presents the basic CNN architecture for image classification and the different categories of CNN models, summarizing popular CNN models in each category. It then describes the metrics used to evaluate how well a CNN model performs once built. It ends with the general process of image classification using CNN models.

X-Ray characteristics
X-rays are a form of electromagnetic radiation frequently used in medicine to image several organs, including the bones and the lungs [57]. Chest X-Ray techniques allow noninvasive disease diagnosis. Different chest tissues attenuate X-rays differently, and this difference produces a two-dimensional contrast image, commonly known as an X-Ray. A patient's chest X-Rays are often taken from a variety of positions and angles relative to the X-ray source and the detector panel. Chest radiography is one of the simplest and most basic detection methods, and it is widely accessible and inexpensive. Chest X-Ray can play an important role in diagnosing patients suspected of having SARS-CoV-2. Figure 1 gives three examples of chest X-Ray images of infected persons and three of normal persons. COVID-19 is identified by the presence of white patchy shadows in the lungs [58].

Dataset
To combat the COVID-19 pandemic, open-source data and methods are needed so that the global scientific community can collaborate on verifiable and transparent research [59,60]. As a result, a set of electronic medical repositories has been developed to help researchers locate the appropriate resources. In our context, we are interested in public chest X-Ray datasets. Several datasets have been developed, such as PADCHEST [61], which was reported by radiologists at San Juan Hospital in Spain. The COVID chest X-Ray project [62] aims to collect X-ray images showing COVID-19 from online and other sources [63]. Most of the available X-Ray datasets for COVID-19 classification are collected from the Kaggle and GitHub platforms, such as Covid-19 radiography database [64], COVID 19-image-dataset [65], and Covid-chest X-ray-dataset [66]. However, these datasets contain a limited number of lung images with COVID-19 infection. This is not enough to train a deep learning model, which may overfit the data. Therefore, existing works often train on a hybrid dataset that combines different repositories [67].
Once the dataset is determined, it is split into different sets. Splitting helps detect and avoid overfitting, because the model is evaluated on data it has not seen during training. In deep learning, overfitting occurs when the network's learning capacity is so large that it learns spurious features of the training data instead of meaningful patterns. The original dataset is usually divided into three sets: training, validation, and test sets. The model is trained on the training set. The validation set contains examples used to tune the parameters of the learning process. The final model is evaluated on the test set, which is kept separate from the training and validation sets. Generally, no rule specifies how to partition the data. The split may depend on the number of predictors in a predictive model or on the size of the original data pool. Each dataset split should contain subfolders, and their number depends on the number of output classes (binary, ternary, etc.). For example, in binary classification, each split contains two subfolders: Covid and normal.
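A split that follows this per-class subfolder layout can be sketched in plain Python. This minimal illustration assumes a stratified split, in which each class is divided separately so that class proportions are preserved in every set. The 70/15/15 percentages and the file names are hypothetical. In practice, the lists would come from the class subfolders on disk.

```python
import random


def split_dataset(files_by_class, train_pct=70, val_pct=15, seed=42):
    """Stratified train/validation/test split.

    files_by_class maps a class name (e.g. "covid", "normal") to its file names.
    Returns {split: {class: [files]}}, mirroring split/class subfolders.
    Integer percentages keep the set sizes exact; the test set gets the rest.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": {}, "val": {}, "test": {}}
    for label, files in files_by_class.items():
        files = sorted(files)  # copy in a deterministic order before shuffling
        rng.shuffle(files)
        n_train = len(files) * train_pct // 100
        n_val = len(files) * val_pct // 100
        splits["train"][label] = files[:n_train]
        splits["val"][label] = files[n_train:n_train + n_val]
        splits["test"][label] = files[n_train + n_val:]
    return splits


# Hypothetical file lists for a binary (COVID vs. normal) task.
data = {
    "covid": [f"covid_{i:03d}.png" for i in range(20)],
    "normal": [f"normal_{i:03d}.png" for i in range(40)],
}
splits = split_dataset(data)
for name, per_class in splits.items():
    print(name, {label: len(files) for label, files in per_class.items()})
```

Splitting each class separately keeps the minority COVID-19 class represented in the validation and test sets. A purely random split over the pooled files does not guarantee this.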

Preprocessing techniques
Preprocessing techniques transform data before it is passed to the machine learning or deep learning algorithm. Training a CNN on unprocessed images can lead to poor classification results, often because of overfitting. Overfitting is considered a major problem in deep learning. To reduce it, an important step is added during preprocessing: data augmentation. Data augmentation artificially increases the amount of data by generating new data points from existing data [68]. Common preprocessing and data augmentation techniques include:
Resizing: To standardize the dataset, all images are resized to a fixed size, such as 224×224 or 299×299.
Flipping or Rotating: To increase the sample size of the datasets. Horizontal and vertical flipping are mainly used.
Scaling or Cropping: To reduce redundancy, because not all parts of the images need to be used.
Brightness or Intensity: To increase or decrease the image brightness.
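The operations listed above can be sketched on a grayscale image stored as a list of rows of 8-bit pixel values. This minimal, dependency-free illustration is not the pipeline of any reviewed study. Real projects typically rely on an image library. Arbitrary-angle rotations require interpolation, so only a 90° rotation is shown. Nearest-neighbour resizing is an assumption chosen for simplicity.

```python
def resize_nearest(img, out_h, out_w):
    """Resize to a fixed size (e.g. 224x224) with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def hflip(img):
    """Horizontal flip: mirror each row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Vertical flip: reverse the order of the rows (copying each row)."""
    return [row[:] for row in reversed(img)]


def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def center_crop(img, h, w):
    """Keep the central h x w region and discard the borders."""
    top, left = (len(img) - h) // 2, (len(img[0]) - w) // 2
    return [row[left:left + w] for row in img[top:top + h]]


def adjust_brightness(img, delta):
    """Add delta to every pixel, clipping to the valid 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


img = [[0, 50],
       [100, 250]]
print(hflip(img))                  # [[50, 0], [250, 100]]
print(adjust_brightness(img, 10))  # [[10, 60], [110, 255]]
```

Each function returns a new image, so several augmented variants can be generated from one original X-Ray.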
Some works [69][70][71][72] use traditional image preprocessing techniques, such as contrast stretching, histogram equalization, smoothing, sharpening, adaptive Wiener filtering, histogram enhancement, and color space conversion. Overall, the preprocessing step is essential for improving data quality and building accurate AI models.
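Two of these traditional techniques, contrast stretching and histogram equalization, take only a few lines for an 8-bit grayscale image stored as a list of rows. The sketch uses the standard textbook formulations: linear min-max stretching, and equalization through the cumulative histogram. The exact variants used in the cited works may differ.

```python
def contrast_stretch(img, levels=256):
    """Linearly map the darkest pixel to 0 and the brightest to levels - 1."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * (levels - 1) // (hi - lo) for p in row] for row in img]


def equalize_histogram(img, levels=256):
    """Spread intensities using the cumulative histogram (CDF) as a lookup table."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF value of the darkest pixel
    if cdf_min == len(flat):  # single intensity: leave unchanged
        return [row[:] for row in img]
    lut = [round((c - cdf_min) * (levels - 1) / (len(flat) - cdf_min)) for c in cdf]
    return [[lut[p] for p in row] for row in img]


img = [[50, 50],
       [100, 200]]
print(contrast_stretch(img))    # [[0, 0], [85, 255]]
print(equalize_histogram(img))  # [[0, 0], [128, 255]]
```

Contrast stretching preserves the relative spacing of intensities. Histogram equalization redistributes them according to how often each intensity occurs, which can reveal faint lung opacities in low-contrast images.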

Basic CNN architecture for classification
The structure of CNNs is inspired by the neurons of the human or animal brain, and CNNs work in a similar way to conventional neural networks [73]. A CNN consists of a series of connected layers of three types: convolutional layers, pooling layers, and fully connected layers. A CNN architecture is defined by how these layers are combined [74]. The convolutional layer applies the convolution operation to the input data (image) to produce a series of output feature maps.
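The convolution operation can be sketched for a single-channel image and a single kernel, with no padding ("valid" mode). As in most deep learning frameworks, the kernel is not flipped. Strictly speaking, this is cross-correlation, which the CNN literature calls convolution. The vertical-edge kernel in the example is hand-picked for illustration. In a CNN, kernel weights are learned during training.

```python
def conv2d(image, kernel, stride=1):
    """Valid 2-D convolution (cross-correlation) of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # Output size: (input - kernel) // stride + 1 along each axis (no padding).
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the kernel-sized window anchored at (i, j).
            row.append(sum(image[i * stride + u][j * stride + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out


# Bright left region, dark right region: a vertical edge.
image = [[10, 10, 10, 0, 0]] * 4
edge_kernel = [[1, 0, -1]] * 3
print(conv2d(image, edge_kernel))  # [[0, 30, 30], [0, 30, 30]]
```

The output is zero where the image is uniform and large where the window straddles the edge. The resulting feature map therefore records where the detected feature occurs.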
Pooling layers reduce the spatial size of the feature maps produced by the convolution without losing important features. This reduces the number of parameters and computations in the network, which helps the network identify patterns in the data more quickly and accurately.
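Max pooling, the most common pooling operation, keeps the largest value in each window. A minimal sketch with the usual 2×2 window and stride 2 (assumed here), which halves each spatial dimension:

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling over size x size windows moved by `stride` pixels."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[i * stride + u][j * stride + v]
                 for u in range(size) for v in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


fmap = [[1, 3, 2, 0],
        [4, 6, 1, 1],
        [0, 2, 9, 8],
        [1, 1, 7, 5]]
print(max_pool2d(fmap))  # [[6, 2], [2, 9]]
```

The 16 values shrink to 4, and the strongest response in each 2×2 region is retained.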
The last type of layer is the fully connected (FC) layer. FC layers are usually positioned before the output layer and form the final layers of a CNN. After feature extraction, the fully connected layers map the extracted features to the different classes to classify the input (see Figure 2 [75]).
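A fully connected layer can be sketched in two steps. The final feature maps are flattened into one vector, and then a weighted sum plus a bias is computed for each class. The weights below are hypothetical values chosen to keep the arithmetic easy to follow. In a trained CNN they are learned, and the resulting class scores are usually passed to a softmax to obtain probabilities.

```python
def flatten(fmaps):
    """Concatenate a list of 2-D feature maps into one feature vector."""
    return [v for fmap in fmaps for row in fmap for v in row]


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


fmaps = [[[1, 2], [3, 4]]]          # one 2x2 feature map
x = flatten(fmaps)                  # [1, 2, 3, 4]
weights = [[1, 1, 1, 1],            # hypothetical weights for class 0
           [0, 0, 0, 2]]            # hypothetical weights for class 1
scores = dense(x, weights, [0, -1])
print(scores)  # [10, 7]
```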

Figure 2. CNN architecture
A complementary layer is usually added to the CNN architecture: the dropout layer. During training, it randomly drops a fraction of the neurons feeding the next layer and leaves the others unchanged. A dropped neuron does not participate in forward propagation or backpropagation during training. It remains part of the full network, which is used to make predictions during testing. Each neuron's output is computed by an activation function. The activation function decides whether, and how strongly, the neuron is activated, that is, whether its input to the network is significant. ReLU (Rectified Linear Unit), Softmax, Sigmoid, and Tanh (hyperbolic tangent) are the most used activation functions.
ReLU: It outputs the input unchanged when it is positive and zero otherwise, i.e., ReLU(x) = max(0, x). Its main advantage over the other functions is its lower computational cost. Because negative inputs produce zero, not all neurons are activated at the same time.
Softmax: It is usually applied as the final activation function of a neural network. It normalizes the network's output into a probability distribution over the set of predicted output classes.
Sigmoid: It is used for binary classification in CNN models. Its input is a real number, and its output is limited to between 0 and 1.
Tanh: It has the same S shape as the sigmoid function (tanh(x) = 2·sigmoid(2x) - 1). Its input is a real number, but its output is limited to between -1 and 1.
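The dropout layer and the four activation functions can be written directly from their definitions. This sketch assumes "inverted" dropout, a common implementation choice. Surviving activations are scaled by 1/(1 - rate) during training, so the full network can be used unchanged at prediction time. The sigmoid and softmax use standard numerical-stability tricks that do not change their results.

```python
import math
import random


def dropout(activations, rate, training, rng=random):
    """Inverted dropout on a list of activations.

    Training: each value is zeroed with probability `rate`; survivors are
    scaled by 1 / (1 - rate). Inference: values pass through unchanged,
    so the full network is used for prediction.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def relu(x):
    return max(0.0, x)  # negative inputs become 0, positive inputs pass through


def sigmoid(x):
    """Output in (0, 1); two branches avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    return math.tanh(x)  # output in (-1, 1); equals 2 * sigmoid(2x) - 1


def softmax(scores):
    """Turn class scores into a probability distribution."""
    m = max(scores)  # subtracting the max avoids overflow; result unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


train_out = dropout([1.0] * 10, rate=0.5, training=True, rng=random.Random(0))
test_out = dropout([1.0] * 10, rate=0.5, training=False)
print(train_out)              # each value is either 0.0 or 2.0
print(test_out)               # all 1.0
print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(sigmoid(0.0))           # 0.5
print(softmax([10.0, 7.0]))   # two probabilities that sum to 1
```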

Spatial Exploitation (SE):
Spatial filters have been used to improve performance. A spatial filter typically consists of a small matrix of weights (usually 3×3 or 5×5) applied to each pixel of the image. The weights in the filter determine the strength of the feature detected in the image. The size of the filter also affects the convolution operation. Small filters are usually used to extract fine-grained information, while large filters are used to extract coarse-grained information.
Multi-path (MP): Shortcut connections, or multiple pathways, let information flow between layers by bypassing some intermediate layers [77]. This cross-layer connectivity divides the network into sections. These pathways propagate the gradient to lower layers, which helps resolve the vanishing gradient problem.
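A simplified scalar model, assumed here for illustration, shows why shortcut connections help against vanishing gradients. Consider a stack of layers in which each layer's local derivative is a single number. Without shortcuts, the chain rule multiplies these derivatives. With an identity shortcut (y = f(x) + x), each factor becomes f'(x) + 1. The product therefore cannot collapse toward zero when the individual derivatives are small and non-negative.

```python
import math


def plain_chain_grad(layer_derivs):
    """d(output)/d(input) for stacked layers y = f(x): product of f'(x)."""
    return math.prod(layer_derivs)


def residual_chain_grad(layer_derivs):
    """Same stack with identity shortcuts y = f(x) + x: product of f'(x) + 1."""
    return math.prod(d + 1.0 for d in layer_derivs)


derivs = [0.1] * 10  # ten layers, each with a small local derivative
print(plain_chain_grad(derivs))     # about 1e-10: the gradient vanishes
print(residual_chain_grad(derivs))  # about 2.59 (= 1.1 ** 10)
```

Real networks work with Jacobian matrices rather than scalars, but the identity term added by the shortcut plays the same role.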
Depth (Dp): Network depth is a key factor in a network's performance and its ability to learn [78][79][80][81]. Deep networks can represent certain classes of functions more efficiently than shallow ones.
Breadth (B): Besides depth, width is a key factor in the network's learning process. It has been shown that neural networks with ReLU activation functions must be wide enough to keep the universal approximation property, even as their depth increases [82]. The main issue is that a set of layers may fail to learn features even when depth is increased. Furthermore, if the maximum width of a network is not greater than the input dimension, then no deep network of that width can approximate every continuous function on a compact set [83]. Consequently, to address this issue, researchers focused on wide and thin designs rather than deep and narrow ones.
Dimension (Dm): Separable convolutions (also called depth-wise separable convolutions) were introduced to make standard convolutions more efficient. A separable convolution uses a depth-wise convolution to encode spatial information within each channel. It then uses a point-wise (1×1) convolution to combine information across channels.
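Counting weights (biases ignored) shows the efficiency gain of depth-wise separable convolutions. Take a k×k kernel with C_in input channels and C_out output channels. A standard convolution needs k·k·C_in·C_out weights. The separable version needs k·k·C_in for the depth-wise step plus C_in·C_out for the point-wise step. The layer sizes in the example are arbitrary.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depth-wise (one k x k filter per input channel) + point-wise (1x1) weights."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


k, c_in, c_out = 3, 32, 64
print(standard_conv_params(k, c_in, c_out))   # 18432
print(separable_conv_params(k, c_in, c_out))  # 2336 (288 + 2048)
```

For this layer, the separable version uses roughly eight times fewer weights.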
Channel boosting (CB): Channel boosting, i.e., boosting the input channel dimension, adds auxiliary learners to the CNN to enhance the network's representation [84]. The input representation strongly influences network learning. Network performance can suffer if the input lacks diversity and class-discriminative information.

Feature-map exploitation (FE):
The selection of feature maps can have a significant impact on network generalization. Different feature extraction steps produce diverse types of features. However, an excessive number of features may introduce noise, leading the network to overfit [85].
Attention (A): In CNNs, attention is used to improve representations and overcome computational limitations. It also enables a CNN to distinguish objects even in complex scenes and busy backgrounds.
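One widely used form of attention in CNNs is channel attention with a squeeze-and-excitation (SE) block. Each channel is first summarized by its average value (squeeze). A small two-layer network then turns these summaries into per-channel weights between 0 and 1 (excitation). Finally, each channel is rescaled by its weight. The sketch is simplified (no biases) and uses hypothetical weights `w1` and `w2` that a real network would learn.

```python
import math


def se_block(fmaps, w1, w2):
    """Simplified squeeze-and-excitation over a list of 2-D channels.

    w1: reduced_dim x channels weights, w2: channels x reduced_dim weights.
    Returns the rescaled channels and the per-channel attention weights.
    """
    # Squeeze: global average pooling, one number per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmaps]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h)))) for row in w2]
    # Scale: multiply every value of channel c by its weight s[c].
    scaled = [[[v * s_c for v in row] for row in ch] for ch, s_c in zip(fmaps, s)]
    return scaled, s


fmaps = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
         [[4.0, 4.0], [4.0, 4.0]]]   # channel 1
w1 = [[0.5, 0.5]]                    # hypothetical: 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]                 # hypothetical: 1 hidden unit -> 2 channels
scaled, weights = se_block(fmaps, w1, w2)
print([round(s, 3) for s in weights])  # [0.924, 0.076]
```

Here the block keeps channel 0 almost intact and strongly suppresses channel 1. This is how attention lets the network emphasize informative channels.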
Table 4 lists some CNN models for each of the categories cited above [33,41].

Evaluation metrics
The metric formulas are based on the terms TP, TN, FP, and FN. TP (true positive) is the number of cases correctly predicted as positive. TN (true negative) is the number of cases correctly predicted as negative. FP (false positive) is the number of cases incorrectly predicted as positive. FN (false negative) is the number of cases incorrectly predicted as negative.
Accuracy measures the proportion of all cases, positive and negative, that are correctly predicted:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Sensitivity (also called recall) measures a model's ability to correctly identify positive cases:
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity is the proportion of negative cases that are correctly predicted:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
Precision measures the proportion of positive predictions that are correct:
$$\text{Precision} = \frac{TP}{TP + FP}$$
This paper does not discuss other evaluation tools, such as the confusion matrix and the area under the curve (AUC).
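The accuracy, sensitivity, specificity, and precision metrics can be computed directly from the four confusion counts TP, TN, FP, and FN. The sketch also includes the F1-score, the harmonic mean of precision and sensitivity, since some of the reviewed studies report it. The counts in the example are invented for illustration, and all denominators are assumed to be non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision and F1 from confusion counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall: share of actual positives found
    specificity = tn / (tn + fp)   # share of actual negatives correctly rejected
    precision = tp / (tp + fp)     # share of positive predictions that are right
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}


# Invented example: 100 COVID-19 and 100 normal test images.
m = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print({name: round(value, 3) for name, value in m.items()})
# {'accuracy': 0.85, 'sensitivity': 0.9, 'specificity': 0.8, 'precision': 0.818, 'f1': 0.857}
```

In COVID-19 screening, sensitivity is often watched closely, because each false negative is an infected patient the model misses.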

Process of image classification using CNN models
Image classification classifies, or predicts the category of, a particular object in an image. Its main goal is to accurately identify the features in an image. In this study, we summarize the CNN-based image classification process in 7 main steps (see Figure 3). The learning stage includes steps 1 to 5, and the inference stage consists of steps 6 and 7.
Step 1: Choosing the dataset; a large training dataset is essential;
Step 2: Preprocessing the dataset using different techniques;
Step 3: Constructing the CNN model;
Step 4: Training and testing the model on the dataset chosen in step 1;
Step 5: Evaluating the classification performance of the CNN model using metrics such as accuracy, specificity, F1-score, and sensitivity;
Step 6: Preprocessing the X-Ray image to be classified, to enhance it and reduce noise or unwanted details;
Step 7: Classifying the input image with the CNN.

DISCUSSION AND CONCLUSION
Many works in the literature demonstrate that CNN models are a powerful method for classifying COVID-19 from X-Ray images. We have reviewed significant research papers published since the emergence of the virus. Based on the literature and our analysis of studies in this field, we have classified CNN models into three categories. The first focuses on transfer learning from pre-trained CNN models. In the second, the CNN model is developed from scratch and is called a custom CNN model. The last category combines the two previous ones to constitute the hybrid CNN model. For each model in each category, we have summarized the dataset size, the number of classes, the model architecture, and the performance evaluation criteria (accuracy, sensitivity, and specificity). Deep learning models need a large amount of data, so dataset size is a critical factor in the classification performance of a CNN model. Most CNN models have achieved high performance for COVID-19 classification. Most studies also apply various preprocessing techniques to enhance data quality.
We further observed that models based on transfer learning from pre-trained models are the most widely used category and perform well, for multiple reasons. A pre-trained model that has learned general features can improve the model's accuracy on a new dataset. Transfer learning also decreases training time by reusing the knowledge acquired by a pre-trained model. Since the model has already learned general features, it needs less time to learn new features and patterns in the new domain. Fewer computational resources are needed to train a new model and achieve comparable results. Finally, transfer learning allows models to be scaled more easily to new applications, platforms, and tasks, allowing more efficient use of resources and increased flexibility. That said, the models developed in the two other categories also represent significant efforts.
Custom CNN models can provide high accuracy if they are designed correctly and trained on large datasets.Still, they tend to require more computational power and time to train and optimize compared to pre-trained models and transfer learning techniques.
Hybrid CNN models can be challenging to design and optimize, and finding the optimal combination and tuning the hyperparameters can be time-consuming. However, these models can achieve better performance and generalization than both transfer learning and custom CNN models.
The models proposed in the three categories share the same goal: helping to reduce the impact of the ongoing pandemic by improving diagnostic accuracy and supporting research efforts.
We noted in our survey that CNN models for COVID-19 detection acquire data from a wide range of sources (labs, hospitals, universities, research centers, etc.). Developing a health data collection platform is therefore a necessity, because it would improve the storage, sharing, and analysis of health data among researchers in several fields. Through real-time data acquisition and machine learning, researchers would have access to large, diverse, and high-quality datasets. Such a platform could help them uncover new patterns, create novel therapies, interventions, and prevention methods, and make informed decisions. Better medical research of this kind can reduce the burden of illness and improve patient outcomes.
We also noticed that the proposed models are not yet applied in practice in hospitals or clinics, even though most of this research aims to help radiologists make better decisions in tackling COVID-19. CNNs have potential for early detection and diagnosis. They have also been used to monitor areas to ensure that people maintain social distancing. CNNs can also accelerate vaccine development through computational models that predict which parts of the virus are most likely to induce an immune response. This helps researchers design and optimize vaccine candidates that can potentially protect against COVID-19.
Finally, we aim with this paper to provide important insights for CNN models based on X-Ray images to help researchers develop a standardized, reliable, and comprehensive system for COVID-19 diagnosis and control that can scale up globally and help mitigate the impact of the pandemic on public health and the economy.
In [45], the three CNN models are VGG19, MobileNet, and InceptionV3, and the two added mechanisms are ConvLSTM and the squeeze-and-excitation (SE) block. ConvLSTM (Convolutional Long Short-Term Memory) is a layer that encodes the spatial dependencies among the feature maps received from the CNN's last convolutional layer. It also improves the model's image representation capability. The SE block is then added to assign weights to significant local features. These two mechanisms improve the classification strength of the CNN models, giving VGG19 + ConvLSTM + SE block, InceptionV3 + ConvLSTM + SE block, and MobileNet + ConvLSTM + SE block. The accuracies obtained with VGG19, InceptionV3, and MobileNet alone are 96.01%, 95.22%, and 96.98%, respectively. After adding the blocks, they are 97.88%, 97.23%, and 97.80%, respectively.

Figure 1. Examples of chest X-Ray images of normal vs. infected persons

Figure 3. Process of Image Classification using CNN. Step 1: Choosing the Dataset, a large training dataset is essential; Step 2: Preprocessing the dataset using different techniques; Step 3: Construction of the CNN model; Step 4: Training and testing the model on the chosen dataset (in step 1); Step 5: The classification performance of a CNN algorithm is evaluated using metrics such as accuracy, specificity, F1 score, and sensitivity; Step 6: Preprocessing the X-Ray image to be classified to enhance it and reduce noise or unwanted details; Step 7: CNN classification of the introduced image.

Table 1. Overview of models based on TL from pre-trained CNN models

Table 2. Custom CNN models

Table 3. Hybrid CNN models

Table 4. Summary of popular CNN Models in each category