Optoelectronic Retinal Images for the Prediction of Diabetic Macular Edema Based on a Hybrid Deep Transfer Learning Technique

ABSTRACT


INTRODUCTION
Diabetes is a major health threat affecting up to 7.2% of the population worldwide, a figure projected to reach 650 million by the end of 2040 [1,2]. Among diabetics, one third develop Diabetic Retinopathy (DR), whose most complicated stage is Diabetic Macular Edema (DME). DME typically manifests when the retinal vessels are impacted by the accumulation of fluid [3,4], causing vision loss. It affects nearly 2.8% of the population, a share estimated to rise to as much as 10% of the global population. DME affects around 26.7 million individuals, and this figure is anticipated to reach almost 50 million by 2025 [5-8].
Despite effective screening for the early diagnosis of DME in developed countries, avoiding the false prediction of DME has always been a challenge for diagnosticians. Because of the limited number of ophthalmologists available in developing nations, it is difficult to keep up with the constantly growing number of DME cases [9,10]. In developing nations, providing appropriate and timely treatment at an affordable cost is a further problem for the healthcare industry. Under these circumstances, automated diagnosis frameworks can lower diagnostic costs and reduce the workload of ophthalmologists. Such systems can also mitigate the shortage of ophthalmologists by restricting referrals to only those cases that require immediate evaluation. To reduce DME cases, it is essential to reduce the time and effort that ophthalmologists spend on diagnosis.
Propelled by the above challenges, several imaging diagnosis systems have been developed based on optoelectronic retinal images using machine and deep learning algorithms [11-20]. A two-stage method is used to identify and classify the severity of DME using colour fundus images [11]. A supervised learning technique carries out the DME detection, while the feature extraction strategy captures global characteristics and differentiates the DME and normal images.
A unique model is discussed in the study of Lee et al. [12] to accomplish automated image analysis by combining deep neural networks with machine learning. Optical Coherence Tomography (OCT) provides deep and rich data when combined with labels produced from the electronic medical record. The diagnosis of DR based on Convolutional Neural Networks (CNN) is discussed in the study of Perdomo et al. [13]. It combines images of the eye fundus with the location of exudates for the automated classification of DME. A deep CNN for the classification of DR is discussed in the study of He et al. [14]. The classification of DR, DME, and multi-label cases is carried out by three different CNNs, independent of one another. The features of all CNNs are fused for effective classification.
A neural network system based on recurrent attention mechanisms is described in the study of Shaikh et al. [15], which helps reduce the processing overhead required when executing convolution filter operations on high-resolution images. Two different medical imaging tasks are employed for classification: brain tumor classification using magnetic resonance imaging and prediction of the severity of DME using fundus images.
A novel cross-disease attention module (AM) is developed in the study of Li et al. [16] to classify DME and DR. This is accomplished by investigating the intrinsic link between the diseases using image-level supervision. The disease-specific AM allows for the selective learning of relevant characteristics for each disease, and their internal relationships are captured using the disease-dependent AM.
An efficient framework to correctly locate and classify disease lesions is discussed in the study of Nazir et al. [17]. In contrast to current DR and DME classification methods, the system can successfully classify low-intensity and noisy images by extracting representative key points from such images. Retinal fundus and OCT images are used to create an automated framework to classify DR and normal cases in the study of Hassan et al. [18]. It utilizes deep ensemble learning, where a deep CNN identifies the input OCT and fundus images. Subsequently, the second layer extracts the essential feature descriptors required for the classification.
The automatic detection of AMD and DME is described in the study of Kaymak and Serener [19] using a deep learning technique. It classifies the input image into wet or dry AMD, DME, and healthy. The effectiveness of the Iowa Detection Programme for automated DR detection using publicly available fundus images is discussed in the study of Abràmoff et al. [20].
These automated diagnosis frameworks can decrease costs and workloads, as well as mitigate the shortage of ophthalmologists. These methods also play a pivotal role in reducing DME cases and the clinician's effort. However, they need improvement in terms of achieving accurate segmentation and more subtle feature extraction, along with high-speed classification layers, so that they can act as potential tools for high-certainty DME diagnosis systems used by ophthalmologists across the globe.
Motivated by this challenge, the HET-EYE-NETS model, which automatically analyses fundus images, is developed for the prediction of DME. Pre-processing is performed on the images in three stages: morphological filtering, pixel intensity testing, and image enhancement. The IDRiD and MESSIDOR database images are enhanced using augmentation techniques to address the class imbalance problem, hence improving the HET-EYE-NETS model's performance. After pre-processing, the images are fed to a three-stage pipeline architecture. In the first stage, the DME region is segmented from the fundus images; in the second stage, image features are extracted; and finally, the DME is detected using high-speed classification layers.
The HET-EYE-NETS model is built entirely on the principle of transfer learning ensembled with highly accurate Extreme Learning Machines (ELM). It is the first model of its kind used for the early diagnosis of DME from optoelectronic retinal images.

MATERIALS AND METHODS
The HET-EYE-NETS model is a hybrid transfer learning-based system for DME classification. It consists of three important modules: data pre-processing, U-Net based macular segmentation, and classification by ELM. Figure 1 shows the architecture of HET-EYE-NETS using optoelectronic retinal images for the prediction of DME.

Image datasets
To build robust and accurate classification models, it is important to have a uniform distribution of the datasets. Class imbalance is a common problem in image/signal datasets. It significantly impacts the model's performance when training the network and often creates an overfitting problem for smaller datasets. To address this issue, data augmentation is employed, in which each image undergoes a series of transformations such as flips, jittering, scaling, and rotation, producing a uniform arrangement of data to train the network. The proposed HET-EYE-NETS model uses two publicly available datasets, IDRiD [21] and MESSIDOR [22]. The specifications of these databases are given in Table 1 and Table 2, respectively.
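The oversampling step above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's exact augmentation pipeline; the transformation set (flips, 90-degree rotations, a small intensity jitter) and the helper names `augment` and `balance_class` are assumptions for the sketch:

```python
import numpy as np

def augment(image, rng):
    """Apply one random flip/rotation/jitter combination to a 2-D image in [0, 1]."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                       # horizontal flip
    if rng.random() < 0.5:
        image = np.flipud(image)                       # vertical flip
    image = np.rot90(image, k=rng.integers(0, 4))      # random 90-degree rotation
    # Small brightness jitter, clipped back into the valid range.
    image = np.clip(image + rng.uniform(-0.05, 0.05), 0.0, 1.0)
    return image

def balance_class(images, target_count, seed=0):
    """Oversample a minority class up to target_count via random augmentation."""
    rng = np.random.default_rng(seed)
    out = list(images)
    while len(out) < target_count:
        base = images[rng.integers(0, len(images))]    # pick a source image at random
        out.append(augment(base, rng))
    return out
```

Each augmented copy differs from its source by at least an intensity shift, so the oversampled class is not mere duplication.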

Data preparation
Pre-processing plays an important role in enhancing the accuracy of the training model by minimizing background noise. It removes the different noise levels in the images, thus making the data more consistent for training. Since the image datasets mentioned above have different resolutions, a pre-processing technique is adopted to create standardized datasets. The proposed model has three pre-processing techniques. First, a morphological filtering technique, which filters background noise, is employed. In the second stage, pixel intensity testing is applied to the fundus images to remove inconsistent and noisy pixels. Finally, image histogram methods are adopted to enhance the image quality.
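The third stage ("image histogram methods") can be illustrated with plain histogram equalization on an 8-bit image. The paper does not specify its exact enhancement method, so this numpy sketch is only one plausible instance:

```python
import numpy as np

def equalize_histogram(image):
    """Contrast enhancement by histogram equalization of an 8-bit (uint8) image."""
    hist = np.bincount(image.ravel(), minlength=256)   # per-level pixel counts
    cdf = hist.cumsum()                                # cumulative distribution
    cdf_min = cdf[np.nonzero(cdf)][0]                  # first non-zero CDF value
    if cdf[-1] == cdf_min:                             # constant image: nothing to do
        return image.copy()
    # Map each grey level through the normalized cumulative distribution.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[image]
```

A low-contrast fundus image whose grey levels occupy only part of the 0-255 range is stretched to the full dynamic range, which is the effect the enhancement stage relies on.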

HET-EYE-NETS
The proposed methodology works in three pipelined stages. Macular segmentation is the first stage, followed by feature-map extraction and classification of the different macular grades. This research proposes ensemble layers of transfer learning based on CNNs for effective segmentation and feature extraction.

Transfer learning mechanism
Various studies have established that transfer learning-based CNNs are better than traditional CNNs trained from scratch. Transfer learning is used in image classification [23], skin cancer diagnosis [24], brain cancer diagnosis [25], and lung cancer diagnosis [26]. Training from scratch suffers from computational overhead and complexity when larger datasets are involved. In the medical field, expert annotation is also expensive. To overcome these problems, transfer learning is adopted to train CNNs effectively. In transfer learning, a CNN first learns features in one setting and reuses them in another task. For effective segmentation and feature extraction, the proposed methodology uses an ensemble of U-Net and AlexNet.
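The core idea, reusing features learned on a source task and fitting only a new task-specific head, can be shown with a toy numpy sketch. Here the "pretrained" feature weights are simply assumed as given and kept frozen, while only a linear output layer is re-fit by least squares; the real system fine-tunes full CNNs (U-Net, AlexNet), so this is an illustration of the principle only:

```python
import numpy as np

def transfer_head(pretrained_w, x_train, y_train):
    """Reuse a frozen 'pretrained' feature layer and fit only a new linear head.

    pretrained_w : weights learned on a source task (assumed given, stays fixed)
    Returns the weights of the re-fitted output layer for the target task.
    """
    features = np.tanh(x_train @ pretrained_w)  # frozen non-linear feature extractor
    # Only the task-specific head is trained, here in closed form.
    head, *_ = np.linalg.lstsq(features, y_train, rcond=None)
    return head
```

Because the feature extractor is fixed, only the small head is optimized, which is what makes transfer learning cheap compared with training the whole network from scratch.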

Macular U-Net segmentation
In the HET-EYE-NETS model, the U-Net network segments the macula from the eye images. U-Net captures local and global characteristics using an encoder-decoder architecture. The encoder gathers contextual information, whereas the decoder provides precise localization. This enables hierarchical feature learning, and the skip connections in U-Net help preserve fine-grained details. U-Net has a lower parameter count than other deep learning networks. Due to its popularity and effectiveness in various computer vision applications, the proposed system employs U-Net for macular segmentation.
The working of U-Net can be divided into two components. The first is the contracting path, which employs a conventional CNN architecture. Every block in the contracting path comprises two consecutive 3×3 convolution filters, which are then followed by a Rectified Linear Unit (ReLU) and a pooling layer. This pattern is repeated several times to enhance the effectiveness of training. The unique characteristic of this framework is the expansive path, in which 2×2 up-convolutions are employed to up-sample the feature maps. The feature maps from the contracting path are then cropped and merged onto the up-sampled feature maps, followed by two consecutive 3×3 convolution filters with ReLU activation. Finally, 1×1 convolution filters reduce the feature maps to the desired number of channels, and the segmentation results are generated. Cropping is used to eliminate extraneous contextual information and to segment the objects from the surrounding overlapping area. In this work, U-Net is used to separate the macular regions. It is advantageous for image segmentation with smaller datasets and effectively mitigates the issue of overfitting. Figure 2 shows the U-Net framework used for macular segmentation.

The inputs to the feature extraction layers are the segmented images from the U-Net framework. In the second stage, AlexNet extracts the feature maps from the segmented images. AlexNet's success in the ImageNet challenge marked a significant breakthrough in the deep learning revolution. Its parallel processing capabilities allow the model to be trained more efficiently than other architectures. Model generalization is also improved by AlexNet's local response normalization across local groups of neurons and its incorporation of the ReLU function. It comprises 5 convolutional layers and 3 fully connected layers. The primary subtleties of each layer in the network are shown in Figure 3.
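Because each contracting-path block applies two unpadded 3×3 convolutions (each trimming 2 pixels) followed by 2×2 pooling (halving the size), the spatial sizes along that path are easy to trace. A bookkeeping sketch, using the 572-pixel input and depth of the original U-Net configuration rather than figures from this paper:

```python
def unet_contracting_shapes(size, depth=4):
    """Trace spatial sizes down the U-Net contracting path.

    Each block: two unpadded 3x3 convolutions (each trims 2 pixels),
    then 2x2 max pooling (halves the size). Returns the post-convolution
    size of each block, i.e. the size of the skip-connection feature map.
    """
    shapes = []
    for _ in range(depth):
        size = size - 2 - 2      # two 3x3 convolutions, no padding
        shapes.append(size)      # this is what the skip connection carries
        size //= 2               # 2x2 max pooling
    return shapes
```

Running it for a 572-pixel input reproduces the 568/280/136/64 skip-connection sizes of the original U-Net, which is why the expansive path must crop the contracting-path maps before merging.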

Feedforward classification layers
In the third stage, the extracted feature maps are used to train the model to classify the different grades of the macular images. The proposed methodology uses ELM [27] for the high-speed and accurate classification of the different grades. ELM employs a single hidden layer that does not necessarily need to be tuned. It utilizes a kernel function to achieve high precision, resulting in improved performance. ELM has low training error and improved approximation capability, and it is mainly used in classification tasks due to its auto-tuning of biases, weights, and non-zero activation functions.
In ELM, the hidden layer's neurons use a differentiable activation function (for instance, the sigmoid function), while the activation function of the output layer remains linear. The hidden-layer weights, including the biases, do not need to be tuned and are assigned randomly. The hidden nodes are significant here, yet their parameters need not be adjusted and may be generated in advance, that is, before handling the training set data. The working principle of ELM is discussed in the study of Wang et al. [28]. A single-hidden-layer ELM is defined in Eq. (1):

$$f(x) = h(x)\,\beta \qquad (1)$$

where $x$ is the input feature vector. The hidden-layer output $h(x)$ and the output weight vector $\beta$ are defined in Eq. (2) and Eq. (3), respectively:

$$h(x) = [h_1(x), h_2(x), \dots, h_L(x)] \qquad (2)$$

$$\beta = [\beta_1, \beta_2, \dots, \beta_L]^{T} \qquad (3)$$

To determine the ELM's target vector $T$, the hidden-layer output in Eq. (3) is rewritten over all training samples as Eq. (4):

$$H\beta = T \qquad (4)$$

Eq. (5) represents the minimal non-linear least-squares problem solved by the ELM:

$$\min_{\beta}\, \lVert H\beta - T \rVert \qquad (5)$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$, and Eq. (5) can be rewritten as Eq. (6):

$$\beta = H^{\dagger} T \qquad (6)$$

Hence, the output function can be defined in Eq. (7):

$$f(x) = h(x)\,\beta = h(x)\,H^{\dagger} T \qquad (7)$$

The different grades of images are classified based on Eq. (7). At the output layer, a sigmoid function is used to classify DME images. The ELM parameters for the HET-EYE-NETS architecture are shown in Table 3. These hyperparameters are applied to train the ELM model to achieve optimized prediction of DME. The hidden-layer output matrix $H$ is generated using the randomly selected biases and weights along with the activation functions, and the matrix $\beta$ is computed from the training data; $f(x) = h(x)\beta$ then gives the classification of DME.

RESULTS AND DISCUSSION
To evaluate the HET-EYE-NETS model's performance, standard performance measures have been employed. The performance metrics, namely precision, recall, F1-score, specificity, and accuracy, are calculated using the mathematical expressions presented in Table 4, where TN denotes True Negatives, TP True Positives, FN False Negatives, and FP False Positives. Within the scope of this investigation, 5-fold cross-validation is used to generalize the HET-EYE-NETS model's performance and to evaluate the classification measures. Both datasets are partitioned into five equal-sized sets, while ensuring that each set represents independent data by preserving random seeds across the iterations. Four partitions are employed for training, and the remaining partition is used for testing. These five steps are iterated for both datasets, and the average classification performance is evaluated.
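The Table 4 measures follow directly from the four confusion-matrix counts. A minimal sketch (any counts plugged in are for illustration only, not results from this study):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the Table 4 measures from confusion-matrix counts.

    tp/tn/fp/fn: true positives, true negatives, false positives, false negatives.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}
```

For a per-grade evaluation such as Tables 7-12, the counts are taken one class at a time (that grade as positive, all other grades as negative) and the per-class results are then averaged over the folds.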

Due to the imbalance in both datasets, data augmentation is utilized to balance the number of images in each category. As the Grade-1 category contains fewer than 100 images, data augmentation with different rotation and flipping processes increases this to 600 images per category. The complete algorithm was implemented using the TensorFlow 2.1 backend with Keras libraries, running on a PC with an i9 CPU operating at 3.4 GHz, 16 GB RAM, and an NVIDIA TITAN GPU. This study is based on the three-stage HET-EYE-NETS, which uses ELMs as the key classification mechanism. The HET-EYE-NETS model is found to have uniform performance in classifying DME images into different grades of severity. Tables 5 and 6 show the performance metrics calculated for the HET-EYE-NETS model using the different datasets. Figure 4 shows the accuracy and loss curves of the U-Net framework used for macular segmentation on the IDRiD dataset.
To demonstrate the excellence of the HET-EYE-NETS system, its performance is compared with different learning models using the different datasets. Tables 7-12 present the comparative analysis between the HET-EYE-NETS system and existing algorithms using the IDRiD and MESSIDOR datasets to classify the severities of DME images.
Tables 7-12 show that the proposed algorithm delivers uniform and high performance when handling the different datasets. HET-EYE-NETS has also outperformed the other existing algorithms in classifying the severity levels of DME. The inclusion of data pre-processing techniques and the three-stage working mechanism enables the proposed algorithm to exhibit performance superior to the other algorithms, even greater than the hybrid learning model DMENETS.

Figure 2 .
Figure 2. U-Net framework used for macular segmentation

Figure 3 .
Figure 3. AlexNet framework used for feature extraction

Table 1 .
Specification of images in the IDRiD database

Table 2 .
Specification of images in the MESSIDOR database

Table 5 .
Performance metrics of the HET-EYE-NETS model on the IDRiD datasets

Table 6 .
Performance metrics of the HET-EYE-NETS model on the MESSIDOR datasets

Figure 4. Macular segmentation's accuracy and loss curves of the U-Net frameworks

Table 7 .
Performance comparison of different algorithms using IDRiD datasets for Grade 0 detection

Table 8 .
Performance comparison of different algorithms using IDRiD datasets for Grade 1 detection

Table 9 .
Performance comparison of different algorithms using IDRiD datasets for Grade 2 detection

Table 10 .
Performance comparison of different algorithms using MESSIDOR images for normal detection

Table 11 .
Performance comparison of different algorithms using MESSIDOR images for Grade 1 detection