© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Healthcare has historically relied on medicinal plants, and research worldwide continues to assess their efficacy, resulting in the development of plant-based medications. This study proposes improving medicinal plant classification performance using transfer learning models. To address the limitations of small datasets, we employed StyleGAN2-ADA for data augmentation and compared it with conventional augmentation techniques. We also developed a challenging dataset, named the Iraqi Medicinal Plant Dataset, comprising 438 images from ten medicinal plant species: Hibiscus rosa-sinensis, Brassica oleracea var. italica, Ricinus communis, Lactuca sativa, Datura innoxia, Capparis spinosa, Hibiscus sabdariffa, Senna sulfurea, Matricaria chamomilla, and Silybum marianum. The dataset was primarily collected from the Research Unit for Palms Lab at the College of Agricultural Engineering Sciences, University of Baghdad. We then applied six pre-trained deep convolutional neural network architectures, Bit_Sr50x1, ResNet_V1_152, Inception_V3, MobileNetV2_130_224, Inception_ResNet_V2, and Nasnet_Large, to both the conventionally augmented and GAN-augmented datasets. Integrating StyleGAN2-ADA-augmented data led to significant improvements in F1-scores, outperforming conventional augmentation methods by margins of 3.56% to 19.42%, depending on the architecture. These results highlight the effectiveness of combining GAN-based data augmentation with deep learning architectures to improve classification performance on small and complex datasets.
transfer learning, generative adversarial networks, plant recognition, deep learning, Bit_Sr50x1
Iraq's rich medicinal plant diversity, deeply rooted in Sumerian, Babylonian, and Assyrian traditions, presents classification challenges due to undocumented species, morphological similarities, and environmental variations. For instance, the diverse landscapes of Sulaymaniyah support a unique traditional healthcare system that relies heavily on oral knowledge. However, habitat destruction, overharvesting, and a lack of scientific validation threaten these valuable resources [1].
Medicinal plants are plants, or parts of plants, used for therapeutic and nutritional purposes; since the beginning of human existence, plant materials with medicinal qualities have been used extensively to cure human illnesses [2]. Conserving biodiversity is essential because several plant species face extinction risk, and conventional medicinal systems depend extensively on a wide variety of plants, providing an alternative to synthetic pharmaceuticals and fostering wellness. Notwithstanding the importance of these plants, records for medicinal herbs are not easily accessible [3]. Given their vital role in healthcare and biodiversity, medicinal plants have been the focus of extensive research. However, classifying and identifying these species remains a complex and time-consuming task, even for experienced botanists. This study aims to document medicinal plants to support conservation efforts and facilitate future phytochemical research.
Consequently, a vision-based method such as a neural network may assist scientists and the general public in identifying plant species with higher speed and precision [4], especially since neural networks have shown significant progress in multiple fields. They have been used for object detection [5], the design and implementation of monitoring robotic systems [6], malware detection [7], predictive maintenance [8], and many applications in the medical field, including diagnosing diseases such as COVID-19 [9]. However, deep learning models are data-intensive, requiring significant amounts of training data for optimal efficacy, and a limited dataset leads to overfitting. To address this issue, various approaches such as dropout, data augmentation, and semi-supervised learning have been implemented [10]. On the other hand, amassing a substantial training dataset requires significant time and effort, especially since collecting plant samples is difficult due to changing environmental conditions and the extensive diversity of plants. Samples must be obtained under diverse weather conditions and growth stages, encompassing differences in shape, size, and color [11].
Recently, Generative Adversarial Networks (GANs), introduced by Ian Goodfellow in 2014 [12], have been progressively utilized for the production of image data. Research demonstrates that GANs have superior capabilities for producing training image data compared to conventional data augmentation techniques in image processing [13]. GANs are a form of deep learning model comprising two main components: the generative network and the discriminative network. The generator creates novel synthetic data instances that mimic the training data, while the discriminator seeks to distinguish between real data and fake data produced by the generator [14].
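As formulated in the original work [12], this adversarial setup can be summarized by the two-player minimax objective, where $D(x)$ is the discriminator's estimate that a sample $x$ is real and $G(z)$ maps a noise vector $z$ to a synthetic sample:

$\min _G \max _D V(D, G)=\mathbb{E}_{x \sim p_{\text {data }}(x)}[\log D(x)]+\mathbb{E}_{z \sim p_z(z)}[\log (1-D(G(z)))]$

The generator is trained to minimize this value while the discriminator is trained to maximize it.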
Collecting a sufficiently large image dataset for a given application is difficult due to constraints related to subject type, image quality, geographical position, time period, privacy, copyright status, and other factors. The challenges are intensified in applications requiring a new, distinct dataset: obtaining, processing, and distributing the images needed to train an effective, high-quality, high-resolution GAN [12] is a cost-intensive operation. This restricts the wider use of generative models in domains such as medicine. Since a significant amount of real training data is still required to train GAN models that generate samples of appropriate quality, in 2020 NVIDIA released StyleGAN2-ADA [15], which significantly advances the effective training of GANs with limited data. This was achieved through an Adaptive Discriminator Augmentation (ADA) mechanism, which stabilizes training in limited-data scenarios and thereby minimizes the common overfitting problems associated with the discriminator [16].
This study employed the StyleGAN2-ADA model to generate additional samples for the ten-species medicinal plant dataset. We then compared the classification results of six transfer learning models when using conventional augmentation techniques versus StyleGAN2-ADA. Figure 1 shows real data alongside data generated by StyleGAN2-ADA for the ten medicinal plant species.
Figure 1. The results of StyleGAN2-ADA on the Iraqi Medicinal Plant Dataset, (a) Real images, (b) StyleGAN2-ADA
This study made the following significant contributions:
The rest of the study is organized as follows: Section 2 reviews the most recent related work in the field. Section 3 explains the methodology in detail. Section 4 presents all experiments and results. Section 5 concludes the study.
Since the initial development of GANs, researchers have suggested numerous GAN variants to improve the performance of deep learning models on plant classification. Sachar and Kumar [17] employed Deep Convolutional Generative Adversarial Networks (DCGAN) on a manually collected dataset of 30 leaves per species from five medicinal plant species, as a solution to the limited quantity of images, which restricts a convolutional neural network's ability to learn features, thereby amplifying the dataset's variance. The authors conducted a comparative analysis of deep learning models, specifically VGG16, ResNet50, and DenseNet121, as classifiers. The DenseNet architecture produced the best mean accuracy of 97.51% across the five folds.
Deshmukh [4] examined four deep learning classifiers, Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs), and Multilayer Perceptrons (MLPs), on an optimized dataset of medicinal plant leaves comprising 15 distinct Indian species and totalling 82,500 images, with approximately 5,500 images per species. The Wasserstein GAN (WGAN) achieved 96.77%, while the multilayer perceptron classifier exhibited an accuracy of 99.01%. Convolutional neural networks achieved an accuracy of 98.3%, and recurrent neural networks reached 97%.
Swathika et al. [3] applied Conditional Generative Adversarial Networks (CGAN) for data synthesis on a medicinal plant dataset consisting of 50 classes with approximately 5,000 images in total, and tested them using a model that combines the best features of three pretrained deep learning models, MobileNetV2, ResNet-50, and Xception, achieving 98.6% accuracy.
An improved conditional Deep Convolutional Generative Adversarial Network was applied to stressed strawberry leaves grouped into seven small datasets [18]. The largest sample size was 100 and the smallest was 6. A Residual Network (ResNet) classifier was trained before and after data augmentation, and the results showed an improvement in accuracy of 6.9%. The study also showed that the minimal sample size achieving effective data augmentation could be as low as 20.
A generative adversarial network powered by a channel attention strategy (CA-GAN) was proposed to generate realistic synthetic weed data [19]. Two datasets were used to evaluate the model's performance: the Institute for Sustainable Agro-ecosystem Services (ISAS) dataset, which includes five common summer weeds in Japan, and the public segmented Plant Seedling Dataset (sPSD), which includes nine common broadleaf weeds from agricultural land. The proposed CA-GAN produced a synthetic dataset with a recognition accuracy of 93.46% on the ISAS dataset and 82.63% on the sPSD.
Diffusion models were employed to generate images of weeds, enhancing weed detection [20]. Experiments were conducted on two publicly available multi-class weed datasets, CottonWeedID10 and DeepWeeds. Incorporating synthetic weed images enhanced the weed categorization accuracy of deep learning models (VGG16, Inception-v3, and ResNet50) by 1.17%, yielding a testing accuracy above 94%. Table 1 summarizes the best results from recent related works that explored various generative adversarial networks (GANs) and, in one case, diffusion models to improve plant classification. These works addressed different dataset scenarios, including limited multiclass datasets, large multiclass datasets, and even single-class limited datasets.
Table 1. Literature survey
Reference | Year | Augmentation | No. of Classes | No. of Images | Classification Accuracy
Sachar and Kumar [17] | 2023 | DCGAN | 5 | 150 | 97.51%
Deshmukh [4] | 2024 | WGANs | 15 | 82,500 | 96.77%
Swathika et al. [3] | 2024 | cGAN | 50 | 6000 | 97.96%
Zhu et al. [18] | 2024 | cDCGAN | 1 | 100 | Accuracy gain of 6.9%
Li et al. [19] | 2024 | CA-GAN | 9 / 5 | 3097 / 2997 | 82.63% / 93.46%
Chen et al. [20] | 2024 | Diffusion models | 25 | - | 94%
In contrast, the Iraqi Medicinal Plant Dataset posed a greater challenge as it consists of a limited number of multiclass images. Despite this, we surpassed the accuracies achieved in the related works by employing the StyleGAN2-ADA model, which effectively addressed the challenges of our dataset and improved classification performance.
The main objective of this paper is to improve the performance of medicinal plant identification, especially on limited or unbalanced datasets. To achieve this goal, multiple stages were carried out, as shown in the framework in Figure 2.
Figure 2. Overview of the proposed methodology
The process starts with forming a dataset of medicinal plants in Iraq by photographing the plants. Since plants have specific growth stages and seasons in the different regions of Iraq, from north to south and east to west, obtaining a sufficient number of images requires considerable time and effort. Therefore, additional images were collected from reliable flora websites. The collected images were then verified by a botanist to ensure that every image belongs to the correct class and is labelled with the correct scientific name. All images were resized to 256 × 256 pixels and converted to JPG format. The next stage, augmentation, is the most important for improving classifier accuracy and mitigating overfitting. Two types of augmentation were applied: conventional augmentation, implemented as a Python script performing rotation, translation, and zoom operations, and GAN-based augmentation using the StyleGAN2-ADA model. After being trained on the dataset, StyleGAN2-ADA generates good-quality images, with quality depending on the size of the training set. The results of both augmentation types were then classified by pretrained deep learning models.
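As an illustration of the conventional augmentation step, the sketch below shows how rotation, translation, and zoom could be applied with TensorFlow/Keras; the library choice and the specific transformation ranges are assumptions, not the authors' exact script.

```python
import tensorflow as tf

# Illustrative conventional augmentation pipeline (rotation, translation, zoom).
# The transformation ranges below are assumed for demonstration only.
conventional_augmenter = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.1),          # rotate by up to +/-10% of a full turn
    tf.keras.layers.RandomTranslation(0.1, 0.1),  # shift up to 10% vertically/horizontally
    tf.keras.layers.RandomZoom(0.2),              # zoom in/out by up to 20%
])

def load_augmented_dataset(image_dir, image_size=(256, 256), batch_size=16):
    """Load images resized to 256x256 and apply the augmentations on the fly."""
    ds = tf.keras.utils.image_dataset_from_directory(
        image_dir, image_size=image_size, batch_size=batch_size)
    return ds.map(
        lambda images, labels: (conventional_augmenter(images, training=True), labels),
        num_parallel_calls=tf.data.AUTOTUNE)
```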
3.1 Dataset description
This dataset contributes to the worldwide effort to document and study medicinal plants that support traditional medicinal systems internationally. A detailed description is provided in Table 2. Given the widespread distribution of medicinal plants across various provinces in Iraq, some of which are difficult to access, the ten species with the highest number of available images were selected to enhance accessibility and dataset quality. The collection has 438 photos across 10 species of medicinal plants: Hibiscus rosa-sinensis, Brassica oleracea var. italica, Ricinus communis, Lactuca sativa, Datura innoxia, Capparis spinosa, Hibiscus sabdariffa, Senna sulfurea, Matricaria chamomilla, and Silybum marianum. The majority of these images were independently obtained using an iPhone 13 camera at the Research Unit for Palms Lab, College of Agricultural Engineering Sciences, University of Baghdad. The remaining data were gathered from botanical image repositories such as Plant of The World, and all images were scaled to 256 × 256 pixels.
Table 2. The Iraqi Medicinal Plant Dataset samples
The Scientific Name | No. of Samples
Hibiscus rosa-sinensis | 59
Brassica oleracea var. italica | 57
Ricinus communis | 56
Lactuca sativa | 55
Datura innoxia | 42
Capparis spinosa | 37
Hibiscus sabdariffa | 35
Senna sulfurea | 34
Matricaria chamomilla | 32
Silybum marianum | 31
Total | 438
3.2 Proposed model
The proposed model includes two main deep learning components. The first component is a GAN, used in the preprocessing phase, while a deep transfer learning model is used in the training, validation, and testing phases. The proposed GAN/deep transfer learning framework enhances image data preprocessing and augmentation using GANs, which boosts the dataset's quality and diversity. This data is fed into a CNN model for training, with performance assessed through validation and testing phases. The approach ensures robust model training and evaluation, improving generalizability and accuracy in image-based tasks.
3.2.1 Training StyleGAN2-ADA
The StyleGAN2-ADA algorithm is an enhanced version of StyleGAN2, with ADA signifying Adaptive Discriminator Augmentation. Its primary objective is to address the potential overfitting of the discriminator in StyleGAN2 when the training dataset is limited [21]. Such overfitting may prevent the generator from successfully adjusting the model parameters, resulting in reduced training efficiency.
The up-sampling network in Figure 3 takes a latent input $z$, often representing a style image, which is normalized, encoded, and non-linearly transformed into a latent space $w$ using a multi-layer perceptron (MLP) with 8 fully connected layers. The generator network starts from a constant input vector with a mean of zero and a standard deviation of one, which is employed in the first layer and acts as the input for the first Adaptive Instance Normalization (AdaIN) operation. The styles from the encoded latent space $w$ are applied via the AdaIN operation at the corresponding resolutions of the generator network, so that the output of each convolutional layer is dictated by the style imparted by the AdaIN operation. Gaussian-distributed noise is introduced per pixel after the corresponding convolutional layers to mitigate distortions that may manifest in the image, such as water-droplet artifacts, in regions such as wrinkles and hair on human faces; this method incorporates stochastic variation into the produced images. The discriminator, shown in Figure 4, is a down-sampling network that aligns the image dimensions with the final output of the generator network. The image fed into the discriminator is first transformed with the prescribed augmentation operations, and these input images are then down-sampled to the corresponding resolutions of the generator network using residual or skip connections. The discriminator network performs binary classification, differentiating between real and generated images [22].
Figure 3. StyleGAN2-ADA generator network architecture [21]
Figure 4. StyleGAN2-ADA discriminator network architecture [21]
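For reference, the AdaIN operation mentioned above follows the standard style-based formulation, in which each feature map $x_i$ is normalized and then re-scaled and re-biased by parameters $(y_{s,i}, y_{b,i})$ derived from the latent code $w$:

$\operatorname{AdaIN}\left(x_i, y\right)=y_{s, i} \frac{x_i-\mu\left(x_i\right)}{\sigma\left(x_i\right)}+y_{b, i}$

where $\mu(x_i)$ and $\sigma(x_i)$ are the per-channel mean and standard deviation of the feature map.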
Training StyleGAN2-ADA on the Iraqi Medicinal Plant Dataset took 18 hours, 11 minutes, and 27 seconds on Google Colaboratory Pro+ equipped with 52 GB of RAM and a P100 GPU. Training produced a model (.pkl) file after a designated training period, with a new (.pkl) file created every 100 epochs. Once training reached results at 256×256 pixels, the process was terminated and the final (.pkl) file was saved and used. The initial training iterations appeared irregular as they shifted from the original photos to a new set; the later iterations differed significantly from the initial ones, and images were then generated.
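A minimal sketch of how synthetic images could be sampled from such a (.pkl) snapshot is shown below, assuming the NVlabs stylegan2-ada-pytorch implementation (its dnnlib/torch_utils modules must be importable for the pickle to load); the file names are illustrative.

```python
import pickle
import PIL.Image
import torch

# Load the trained generator from the saved snapshot (file name is illustrative).
with open('network-snapshot-final.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()   # exponential-moving-average generator

z = torch.randn([1, G.z_dim]).cuda()      # random latent code
c = None                                  # class label (None for an unconditional model)
img = G(z, c)                             # NCHW float32 image with values in [-1, 1]

# Convert to uint8 HWC and save one synthetic sample as a PNG.
img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8)
PIL.Image.fromarray(img[0].cpu().numpy(), 'RGB').save('synthetic_sample.png')
```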
Even though the Iraqi medicinal plant dataset is challenging because it is limited and unbalanced, the ten-class dataset nearly doubled in size to 856 medicinal plant images. The classes with the highest number of images yielded better-quality generated images. Table 3 displays the StyleGAN2-ADA parameters: the model was trained with images at a resolution of 256×256, using a batch size of 16 and a learning rate of 0.0025. The training progressed over 416 ticks, corresponding to 1,664 kimg (1,664,000 images processed). These settings balance model stability and computational efficiency, allowing the generator and discriminator to learn effectively at this resolution.
Table 3. Parameters for StyleGAN2-ADA
Parameters | Values
Tick/Epoch | 416
Kimg | 1664
Learning Rate | 0.0025
Resolution | 256×256
Batch | 16
3.2.2 Classifiers
In the second component of the methodology, this study employs the same transfer learning models that were utilized in a previous study [23]. The selected deep learning models differ in architecture, efficiency, and feature extraction capabilities, making them suitable for various aspects of medicinal plant classification. Bit_Sr50x1 leverages strong transfer learning from large datasets, ensuring good generalization with minimal fine-tuning. ResNet_V1_152 and Inception_ResNet_V2 offer deep feature extraction and improved learning through residual connections, making them ideal for fine-grained plant classification. Inception_V3 balances accuracy and computational efficiency by capturing multi-scale features, while MobileNetV2_130_224 is optimized for real-time classification on resource-limited devices. Finally, Nasnet_Large, designed via Neural Architecture Search, dynamically optimizes performance, making it well-suited for complex plant datasets. This diverse selection enables a comprehensive evaluation of plant classification under different constraints and requirements.
(1) Bit_Sr50x1
A pre-trained deep learning model [24]. The acronym "Bit_s" denotes the Big Transfer family-s, while "r50x1" indicates ResNet-50x1, implying that the architecture is a ResNet-50 pretrained on ImageNet-1k. Bit_s-r50x1 is recognized for its outstanding performance across a wide range of image classification tasks, a direct result of its exhaustive pre-training on large datasets. It was selected for its capacity to generalize effectively to new tasks with minimal fine-tuning.
(2) Resnet_V1_152
A deep convolutional neural network design known as ResNet_v1_152, an abbreviation for Residual Network version 1 with 152 layers [25]. It is a member of the ResNet family, which was created to address the degradation problem of very deep neural networks by implementing residual learning. Its exceptional performance in deep learning tasks is attributed to its residual connections, which effectively mitigate the vanishing gradient problem.
(3) Inception_V3
Inception-v3 is a convolutional neural network (CNN) architecture that belongs to the Inception family. Inception-v3 [26] has been widely employed in several computer vision applications, including object recognition, image segmentation, and image classification. It has exhibited superior performance on benchmarks such as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
(4) MobileNetV2_130_224
MobileNetV2_130_224 is a special variant of the MobileNetV2 architecture developed for efficient image classification [27]. The "130" is the width multiplier, which regulates the number of channels in each network layer, and "224" represents the input resolution. It is effective and lightweight in resource-constrained environments, such as IoT, mobile, and edge computing platforms. Depthwise separable convolutions, inverted residual blocks, and skip connections make the MobileNetV2_130_224 architecture efficient and accurate. This lightweight architecture was included to evaluate plant recognition models on resource-limited devices.
(5) Inception_ResNet_V2
The Inception-ResNet-v2 deep convolutional neural network architecture combines the advantages of the Inception module with residual connections. This advanced deep neural network reaches superior performance across multiple computer vision tasks, such as object detection, image segmentation, and image classification [28]. Inception-ResNet-v2 shares the same architecture as Inception-ResNet-v1 but varies in its foundational elements.
(6) Nasnet_Large
NASNet-A is a convolutional neural network architecture designed for image classification, in which the configuration of the convolutional layers has been determined by Neural Architecture Search (NAS). NASNets, as described in studies [29, 30], are available in multiple sizes, including Nasnet_Large, the NASNet-A variant for ImageNet, which starts with 168 convolutional filters and utilizes 18 Normal Cells. Because Nasnet_Large is developed automatically by NAS, which optimizes the network configuration for improved performance, it was incorporated to assess whether this automated design approach might surpass hand-designed models.
The classifiers were trained with a batch size of 16, a learning rate of 0.001, and a validation split of 20% over 10 epochs, with fine-tuning enabled. The Categorical Crossentropy loss function, incorporating label smoothing (0.1), and the SGD optimizer with a momentum of 0.9 ensured stable and effective model optimization. This consistent setup allowed for a robust comparison across models under both conventional and StyleGAN2-ADA-augmented datasets.
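The sketch below illustrates this shared training configuration for one of the backbones (Bit_Sr50x1) using TensorFlow Hub; the module handle and the 256 × 256 input size are assumptions, since the exact hub URLs are not specified in the text.

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 10  # ten medicinal plant species

# Backbone loaded from TF Hub with fine-tuning enabled (handle is an assumed example).
backbone = hub.KerasLayer("https://tfhub.dev/google/bit/s-r50x1/1", trainable=True)

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(256, 256, 3)),
    backbone,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1),
    metrics=["accuracy"],
)

# Training with the settings reported above (20% of the data held out for validation).
# model.fit(x_train, y_train, batch_size=16, epochs=10, validation_split=0.2)
```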
4.1 Experimental environment
All experiments were executed on Google Colab Pro+ using Python 3.10.12, accessed from a PC equipped with a 12th Gen Intel® Core™ i7 processor and 16 GB of RAM.
4.2 Performance evaluation metrics
The performance of deep learning models is assessed using accuracy, precision, recall, and F1-score to ensure a comprehensive evaluation of medicinal plant classification. These metrics provide insights into the model’s predictive capabilities, particularly in handling class imbalances and ensuring reliable classification. Accuracy serves as a general performance indicator but may be insufficient for imbalanced datasets. Precision measures the proportion of correctly classified positive instances, minimizing false positives and reducing misidentifications in ethnobotanical studies. Recall evaluates the model’s ability to identify all relevant instances, preventing the omission of medicinal plant species. The F1-score balances precision and recall, offering a robust measure when false positives and false negatives must be considered [31]. The mathematical formulations of these metrics are provided in Eqs. (1)-(4).
Accuracy $=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$ (1)
Recall $=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$ (2)
Precision $=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$ (3)
F1_score $=\frac{2 * \text { Precision } * \text { Recall }}{\text { Precision }+ \text { Recall }}$ (4)
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
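As a quick illustration of Eqs. (1)-(4), the snippet below computes the four metrics for a single class from hypothetical confusion-matrix counts (the numbers are invented for demonstration):

```python
# Hypothetical one-vs-rest confusion-matrix counts for a single plant class.
TP, TN, FP, FN = 90, 340, 5, 3

accuracy = (TP + TN) / (TP + TN + FP + FN)                 # Eq. (1)
recall = TP / (TP + FN)                                    # Eq. (2)
precision = TP / (TP + FP)                                 # Eq. (3)
f1_score = 2 * precision * recall / (precision + recall)   # Eq. (4)

print(f"accuracy={accuracy:.4f}, recall={recall:.4f}, "
      f"precision={precision:.4f}, F1={f1_score:.4f}")
```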
The performance of six transfer learning models was evaluated on the Iraqi Medicinal Plant Dataset, first enhanced with conventional data augmentation techniques, including rotation, translation, and zoom operations; Table 4 displays these results. Second, performance was evaluated using StyleGAN2-ADA augmentation, with the results shown in Table 5. The models were compared based on accuracy, validation accuracy, loss, validation loss, and additional metrics such as precision, recall, and F1-score.
Table 4. Performance analysis of classification method with conventional augmentation
Transfer Learning Model | Accuracy (%) | Validation Accuracy (%) | Loss | Validation Loss | Precision (%) | Recall (%)
Bit_Sr50x1 | 99.71 | 93.75 | 0.5505 | 0.7894 | 94.07 | 93.10
Resnet_V1_152 | 97.67 | 81.25 | 0.9032 | 1.1957 | 83.54 | 81.60
Inception_V3 | 96.79 | 88.75 | 0.9173 | 1.0685 | 91.25 | 88.50
MobileNetV2_130_224 | 95.63 | 77.50 | 0.8098 | 1.3111 | 84.37 | 75.86
Inception_ResNet_V2 | 93.29 | 86.25 | 1.2708 | 1.8002 | 86.65 | 85.05
Nasnet_Large | 91.84 | 90.00 | 2.0168 | 2.0976 | 93.65 | 90.80
Table 5. Performance analysis of classification method with StyleGAN2_ADA augmentation
Transfer Learning Model | Accuracy (%) | Validation Accuracy (%) | Loss | Validation Loss | Precision (%) | Recall (%) | F1-score (%)
Bit_Sr50x1 | 100 | 96.43 | 0.5363 | 0.6096 | 96.59 | 96.49 | 96.47
Resnet_V1_152 | 98.38 | 94.64 | 0.8708 | 0.9453 | 95.27 | 94.73 | 94.79
Inception_V3 | 97.93 | 94.64 | 0.8566 | 0.9028 | 94.99 | 94.76 | 94.76
MobileNetV2_130_224 | 96.45 | 91.07 | 0.7607 | 0.9072 | 92.12 | 91.22 | 91.30
Inception_ResNet_V2 | 96.16 | 95.24 | 1.1779 | 1.2602 | 95.54 | 95.32 | 95.26
Nasnet_Large | 95.27 | 94.64 | 1.9202 | 1.9104 | 94.93 | 94.73 | 94.72
Using conventional augmentation techniques (rotation, translation, and zoom), the classifiers achieved moderate performance, with Bit_Sr50x1 leading with a training accuracy of 99.71% and a validation accuracy of 93.75%, alongside a balanced F1-score of 93.15%. Inception_V3 also performed well, with a validation accuracy of 88.75% and an F1-score of 89.27%, indicating good generalization. However, models such as MobileNetV2_130_224 and ResNet_V1_152 exhibited significant overfitting, with validation accuracy dropping to 77.50% and 81.25%, respectively, despite higher training accuracy. Similarly, Nasnet_Large demonstrated a competitive validation accuracy of 90.00% but suffered from high loss values, highlighting the limitations of conventional augmentation in handling a small and challenging dataset.
In contrast, the incorporation of synthetic data generated using StyleGAN2-ADA markedly enhanced the performance of all models. Bit_Sr50x1 achieved a perfect training accuracy of 100% and an improved validation accuracy of 96.43%, with balanced metrics (F1-score of 96.47%) and reduced losses (0.5363 training and 0.6096 validation). Inception_ResNet_V2 also showed significant improvement, achieving a validation accuracy of 95.24% and an F1-score of 95.26%, while ResNet_V1_152 and Inception_V3 reached validation accuracies of 94.64% with F1-scores of 94.79% and 94.76%, respectively. Even MobileNetV2_130_224 improved substantially, with validation accuracy increasing to 91.07%, and Nasnet_Large reached a validation accuracy of 94.64%, overcoming earlier challenges. These results highlight the ability of StyleGAN2-ADA to generate high-quality synthetic data, effectively mitigating overfitting, improving generalization, and significantly enhancing classification performance compared to conventional augmentation methods.
These results indicate that all models experienced an increase in validation accuracy after applying StyleGAN2-ADA. The highest improvement ratios were observed in ResNet_V1_152 (+13.39%) and MobileNetV2_130_224 (+13.57%). Among all models, Bit_Sr50x1 achieved the highest validation accuracy (96.43%) and reached 100% training accuracy, highlighting its superior performance. Additionally, all models exhibited a decrease in validation loss, demonstrating that StyleGAN2-ADA enhanced model stability and learning efficiency. The most significant reduction in validation loss was observed in Inception_ResNet_V2, which dropped from 1.8002 to 1.2602, overcoming previous challenges with high loss values. Overall, Bit_Sr50x1 emerged as the most robust model, achieving the highest validation accuracy (96.43%) and F1-score (96.47%).
Figures 5-10 illustrate the loss and accuracy plots for the six models: Bit_Sr50x1, ResNet_V1_152, Inception_V3, MobileNetV2_130_224, Inception_ResNet_V2, and NasNet_Large respectively, comparing their performance with conventional augmentation and StyleGAN2-ADA augmentation.
Figure 5. (a, c) Accuracy and loss of the Bit_Sr50x1 model with conventional augmentation; (b, d) accuracy and loss with StyleGAN2-ADA
Figure 6. (a, c) Accuracy and loss of the Resnet_V1_152 model with conventional augmentation; (b, d) accuracy and loss with StyleGAN2-ADA
Figure 7. (a, c) Accuracy and loss of the Inception_V3 model with conventional augmentation; (b, d) accuracy and loss with StyleGAN2-ADA
Figure 8. (a, c) Accuracy and loss of the MobileNetV2_130_224 model with conventional augmentation; (b, d) accuracy and loss with StyleGAN2-ADA
Figure 9. (a, c) Accuracy and loss of the Inception_ResNet_V2 model with conventional augmentation; (b, d) accuracy and loss with StyleGAN2-ADA
Figure 10. (a, c) Accuracy and loss of the Nasnet_Large model with conventional augmentation; (b, d) accuracy and loss with StyleGAN2-ADA
StyleGAN2-ADA augmentation improved model performance by generating high-quality synthetic images, enhancing data diversity, and mitigating class imbalance. The added variations in texture, lighting, and morphology enriched feature representation, leading to better generalization. This augmentation reduced overfitting, particularly benefiting underrepresented classes. As a result, models trained with StyleGAN2-ADA-augmented datasets achieved higher accuracy, recall, and F1-score, demonstrating its effectiveness in medicinal plant classification. In a comparison with the state-of-the-art studies displayed in Table 6, the proposed approach demonstrated superior performance in medicinal plant classification. Sachar and Kumar [17] employed a DCGAN model on a dataset of five classes with 30 images per class, achieving an accuracy of 97.51%. Deshmukh [4] utilized WGANs on a dataset of 15 classes with 5,500 images per class, reporting a recall and precision of 92% and 96%, respectively. Swathika et al. [3] applied a cGAN for 50 classes with 100 images per class, achieving an accuracy of 93.06%, with a recall of 93.13% and a precision of 94.32%. In comparison, the proposed approach using StyleGAN2-ADA for data augmentation achieved significantly higher results. The Bit_Sr50x1 model reached 100% accuracy, with a recall of 96.49%, an F1-score of 96.47%, and a precision of 96.59%, outperforming the state-of-the-art methods. Similarly, ResNet_V1_152 and Inception_V3 achieved high accuracies of 98.38% and 97.93%, respectively, with balanced recall, precision, and F1-scores above 94%. Even the models with lower performance, MobileNetV2_130_224 and Nasnet_Large, still achieved impressive accuracies of 96.45% and 95.27%, respectively. These results highlight the effectiveness of the proposed approach in leveraging StyleGAN2-ADA to enhance dataset diversity and improve classification performance, surpassing all previously reported methods.
Table 6. Performance comparison with state-of-the-art classification methods
Reference | GAN Model | No. of Classes | No. of Images / Class | Accuracy (%) | Recall (%) | F1-Score (%) | Precision (%)
Sachar and Kumar [17] | DCGAN | 5 | 30 | 97.51 | - | - | -
Deshmukh [4] | WGANs | 15 | 5500 | - | 92 | 92 | 96
Swathika et al. [3] | cGAN | 50 | 100 | 93.06 | 93.13 | - | 94.32
Proposed Approach (Bit_Sr50x1) | StyleGAN2_ADA | 10 | +30 | 100 | 96.49 | 96.47 | -
Proposed Approach (Resnet_V1_152) | StyleGAN2_ADA | 10 | +30 | 98.38 | 94.73 | 94.79 | -
Proposed Approach (Inception_V3) | StyleGAN2_ADA | 10 | +30 | 97.93 | 94.76 | 94.76 | -
Proposed Approach (MobileNetV2_130_224) | StyleGAN2_ADA | 10 | +30 | 96.45 | 91.22 | 91.30 | -
Proposed Approach (Inception_ResNet_V2) | StyleGAN2_ADA | 10 | +30 | 96.16 | 95.32 | 95.26 | -
Proposed Approach (Nasnet_Large) | StyleGAN2_ADA | 10 | +30 | 95.27 | 94.73 | 94.72 | -
While this study provides valuable insights into medicinal plant classification in Iraq, there are a few limitations to consider. First, the relatively small dataset size may limit the model's ability to fully capture the diversity of medicinal plant species. Although the most representative species were selected, expanding the dataset could improve model robustness and accuracy. Second, limitations in image capturing techniques may affect dataset quality. Variations in image resolution, lighting conditions, and angles could introduce inconsistencies, potentially impacting the model's generalization capabilities. Future studies could benefit from standardizing image capture methods to enhance dataset consistency and reliability. These limitations highlight the need for further research to validate the models and improve the dataset, enabling more robust and generalized prediction.
In this study, we introduced an innovative approach to address the challenge of limited datasets in medicinal plant classification by utilizing StyleGAN2-ADA for synthetic data augmentation. The results highlight the effectiveness of augmenting the dataset with synthetic images, leading to significant improvements in the performance and generalization capabilities of several state-of-the-art transfer learning models. Notably, the Bit_Sr50x1 model achieved an impressive 96.47% F1-score, with models such as ResNet_V1_152 and Inception_V3 also demonstrating substantial gains over traditional augmentation methods. This approach not only mitigates overfitting but also strengthens model robustness, showcasing StyleGAN2-ADA's potential as a valuable tool for improving plant classification with limited data.
The novelty of this work lies in its application of advanced synthetic augmentation techniques to a domain with historically limited image datasets, providing a meaningful step toward more accurate and efficient automated plant recognition systems.
Future research could expand on this approach by exploring the extension of StyleGAN2-ADA to larger, more complex datasets, including a wider range of plant species. Further investigations into the combination of synthetic data generation with other deep learning techniques may open new possibilities for enhanced plant classification in diverse environmental and geographical contexts.
We sincerely thank the plant taxonomist, Dr. Ali Haloob, for his invaluable assistance in verifying the identification and nomenclature of the medicinal plants in this study and for directing us to reliable sources for plant imagery. We also wish to express our appreciation to the Iraqi National Herbarium and the College of Agricultural Engineering at the University of Baghdad for granting access to their facilities and fields for photographing medicinal plants.
GAN | Generative Adversarial Network
ADA | Adaptive Discriminator Augmentation
DCGAN | Deep Convolutional Generative Adversarial Network
CNN | Convolutional Neural Network
RNN | Recurrent Neural Network
MLP | Multilayer Perceptron
ResNet | Residual Network
AdaIN | Adaptive Instance Normalization
Bit_s | Big Transfer family-s
NAS | Neural Architecture Search
TP | True Positive
FP | False Positive
TN | True Negative
FN | False Negative
[1] Ahmed, H.M. (2016). Ethnopharmacobotanical study on the medicinal plants used by herbalists in Sulaymaniyah Province, Kurdistan, Iraq. Journal of Ethnobiology and Ethnomedicine, 12: 1-17. https://doi.org/10.1186/s13002-016-0081-3
[2] Anyamele, T., Onwuegbuchu, P.N., Ugbogu, E.A., Ibe, C. (2023). Phytochemical composition, bioactive properties, and toxicological profile of Tetrapleura tetraptera. Bioorganic Chemistry, 131: 106288. https://doi.org/10.1016/j.bioorg.2022.106288
[3] Swathika, P., Ajithar, A., Hishaamudin, Z., Nitheeswaran, E. (2024). Medi-Plant: A deep learning approach for medicinal plant classification with pix2pix generative adversarial network. Research Square. https://doi.org/10.21203/rs.3.rs-4245022/v1
[4] Deshmukh, M. (2024). Deep learning for the classification and recognition of medicinal plant species. Indian Journal of Science and Technology, 17(11): 1070-1077. https://doi.org/10.17485/1ST/v17i11.3099
[5] Oleiwi, B.K., Kadhim, M.R. (2022). Real time embedded system for object detection using deep learning. AIP Conference Proceedings, 2415(1). https://doi.org/10.1063/5.0093469
[6] Al-Tameemi, M.I., Hasan, A.A., Oleiwi, B.K. (2023). Design and implementation monitoring robotic system based on you only look once model using deep learning technique. IAES International Journal of Artificial Intelligence, 12(1): 106. https://doi.org/10.11591/ijai.v12.i1
[7] Nasser, A.R., Hasan, A.M., Humaidi, A.J. (2024). DL-AMDet: Deep learning-based malware detector for android. Intelligent Systems with Applications, 21: 200318. https://doi.org/10.1016/j.iswa.2023.200318
[8] Nasser, A., Al-Khazraji, H. (2022). A hybrid of convolutional neural network and long short-term memory network approach to predictive maintenance. International Journal of Electrical and Computer Engineering (IJECE), 12(1): 721-730. https://doi.org/10.11591/ijece.v12i1
[9] Alwawi, B.K.O.C., Abood, L.H. (2021). Convolution neural network and histogram equalization for COVID-19 diagnosis system. Indonesian Journal of Electrical Engineering and Computer Science, 24(1): 420-427. https://doi.org/10.11591/ijeecs.v24.i1
[10] Hong, C., Cha, B., Kim, B., Oh, T.H. (2023). Enhancing classification accuracy on limited data via unconditional GAN. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, pp. 1057-1065. https://doi.org/10.1109/ICCVW60793.2023.00113
[11] Fawakherji, M., Suriani, V., Nardi, D., Bloisi, D.D. (2024). Shape and style GAN-based multispectral data augmentation for crop/weed segmentation in precision farming. Crop Protection, 184: 106848. https://doi.org/10.1016/j.cropro.2024.106848
[12] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27. https://doi.org/10.1145/3422622
[13] Wada, K., Chakraborty, B. (2021). Performance study of image data augmentation by generative adversarial networks. In 2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, pp. 1022-1026. https://doi.org/10.1109/IEMCON53756.2021.9623117
[14] Kola, R. (2019). Generation of synthetic plant images using deep learning architecture. Master thesis. Blekinge Institute of Technology, Sweden.
[15] Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., Aila, T. (2020). Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems, 33: 12104-12114.
[16] Peres, R.S., Azevedo, M., Araújo, S.O., Guedes, M., Miranda, F., Barata, J. (2021). Generative adversarial networks for data augmentation in structural adhesive inspection. Applied Sciences, 11(7): 3086. https://doi.org/10.3390/app11073086
[17] Sachar, S., Kumar, A. (2022). DCGAN-based deep learning approach for medicinal leaf identification. Journal of Information and Optimization Sciences, 44(4): 717-723. https://doi.org/10.47974/JIOS-1270
[18] Zhu, F., Wang, J., Lv, P., Qiao, X., He, M., He, Y., Zhao, Z. (2024). Generating labeled samples based on improved cDCGAN for hyperspectral data augmentation: A case study of drought stress identification of strawberry leaves. Computers and Electronics in Agriculture, 225: 109250. https://doi.org/10.1016/j.compag.2024.109250
[19] Li, T., Asai, M., Kato, Y., Fukano, Y., Guo, W. (2024). Channel attention GAN-based synthetic weed generation for precise weed identification. Plant Phenomics, 6: 0122. https://doi.org/10.34133/plantphenomics.0122
[20] Chen, D., Qi, X., Zheng, Y., Lu, Y., Huang, Y., Li, Z. (2024). Synthetic data augmentation by diffusion probabilistic models to enhance weed recognition. Computers and Electronics in Agriculture, 216: 108517. https://doi.org/10.1016/j.compag.2023.108517
[21] Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.N. (2017). Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, pp. 5907-5915. https://doi.org/10.1109/ICCV.2017.629
[22] Ufuah, D., Thomas, G., Balocco, S., Manickavasagan, A. (2022). A data augmentation approach using style-based generative adversarial networks for date fruit classification. Applied Engineering in Agriculture, 38(6): 975-982. https://doi.org/10.13031/aea.15107
[23] Sahib, K.A., Oleiwi, B.K., Nasser, A.R. (2024). Medicinal plants recognition using deep transfer learning models. International Journal of Design & Nature and Ecodynamics, 19(5): 1501-1510. https://doi.org/10.18280/ijdne.190504
[24] Kolesnikov, A., Beyer, L., Zhai, X., Puigcerver, J., Yung, J., Gelly, S., Houlsby, N. (2020). Big transfer (bit): General visual representation learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August, pp. 491-507. https://doi.org/10.1007/978-3-030-58558-7_29
[25] He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 770-778. https://doi.org/10.48550/arXiv.1512.03385
[26] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 2818-2826. https://doi.org/10.1109/CVPR.2016.308
[27] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 4510-4520. https://doi.org/10.48550/arXiv.1801.04381
[28] Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A. (2017, February). Inception-v4, inception-resnet and the impact of residual connections on learning. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11231
[29] Zoph, B., Le, Q.V. (2016). Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. https://doi.org/10.48550/arXiv.1611.01578
[30] Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V. (2018). Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 8697-8710. https://doi.org/10.48550/arXiv.1707.07012
[31] Hajam, M.A., Arif, T., Khanday, AM.U.D., Neshat, M. (2023). An effective ensemble convolutional learning model with fine-tuning for medicinal plant leaf identification. Information, 14(11): 618. https://doi.org/10.3390/info14110618