© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Sustainable farming practices are essential to balancing environmental health and food productivity. Corn (Zea mays L.) is a primary food crop whose productivity depends heavily on stalk health, so accurate monitoring of corn stalk conditions is crucial for optimizing yields. With advances in artificial intelligence, Convolutional Neural Network (CNN) methods have emerged as effective approaches to plant image classification. This study applies the ResNet-101 architecture to corn stalk images in two color spaces (RGB and HSV), with and without data augmentation to balance the unevenly sized classes. The model was trained with a learning rate of 0.001, a batch size of 8, 10 epochs, and the Adam optimizer. ResNet-101 uses shortcut connections to mitigate the vanishing gradient problem, which allows very deep models to be trained without a loss of accuracy. The model can recognize visual features in corn stalk images, including physical damage, texture, and color changes. The classification process involves pre-processing to enhance image quality, followed by feature extraction using ResNet-101. The study analyzed five classes: healthy corn and four diseases (Erwinia carotovora, Pythium, Stenocarpella, and Gibberella). The results show that, for RGB images, ResNet-101 performed best with data augmentation, reaching 89.47% accuracy, while for HSV images it performed best without augmentation, reaching 90.67% accuracy.
food productivity, corn stalk disease, classification, ResNet-101
Corn (Zea mays L.) is one of the important food crops in the world, including in Indonesia. Besides being a main food source, corn is also used as animal feed and industrial raw material [1-3]. Corn productivity is highly dependent on plant health, including the corn stalk. A healthy stalk can support optimal growth, while stalks infected by diseases or damaged will affect both the quality and quantity of the harvest. Therefore, accurate monitoring and classification of corn stalk conditions is very important in supporting efforts to increase agricultural productivity.
Technological advancement, particularly in artificial intelligence and digital image processing, has opened new opportunities in agricultural management [4]. One approach that is increasingly being used for plant condition monitoring is the Convolutional Neural Network (CNN) method. CNN is a type of neural network architecture designed for image data processing and has proven to be effective in various image classification tasks [5, 6]. In this context, CNN is used to analyze images of corn stalks to detect the health condition of the plants.
ResNet-101, a 101-layer member of the residual network (ResNet) family of CNN architectures, is a relevant choice for corn stalk classification. ResNet was developed to address the vanishing gradient problem that often occurs when training deep neural networks [7]. By using shortcut connections, ResNet enables the training of very deep models without negatively impacting accuracy [8]. With its 101 layers, ResNet-101 is considered capable of handling the complexity of corn stalk images, which vary in shape, color, and texture. Compared with conventional methods, this architecture can better recognize essential visual features of corn stalk images: patterns of physical damage, changes in stalk texture due to disease, and color alterations caused by infection can be identified automatically by the network without direct human inspection.
ResNet-101 remains widely used as a problem-solving method across various sectors. For example, a study conducted in 2021 used the ResNet-101 model to classify acute lymphoblastic leukemia from microscopic images [9]. By combining ResNet-101 with hyperparameter optimization, the study achieved an average accuracy of 82.09% after tuning. Research in 2022 also used ResNet-101 to improve the quality and accuracy of skin cancer classification [10]. Using the public ISIC-2018 dataset, the model achieved an accuracy of 96.03%, precision of 95.40%, recall of 96.05%, and an AUC of 0.98, demonstrating strong capability in detecting skin cancer lesions.
Color spaces (RGB and HSV) provide richer visual information to help detect disease features accurately. Augmentation ensures that the model can cope with variations in real-world conditions and still recognize diseases consistently. This combination improves model generalization, results in higher classification accuracy, and makes the model more reliable across scenarios [11].
Based on these previous studies, this research proposes using ResNet-101 with a combination of color spaces and augmentation for classifying corn stalks. Classifying corn stalks with the ResNet-101 CNN involves several key steps. First, images of corn stalks are collected and pre-processed to improve image quality, including contrast adjustment, noise removal, and cropping to focus on the stalk. Next, the processed images are fed into the ResNet-101 model for feature extraction. Through multiple convolutional layers and residual blocks, the model learns visual patterns from the training data and uses these patterns to predict the category of the stalk, that is, whether it is healthy or diseased. One of the challenges in applying this combined method is the need for a large, high-quality dataset: the deep ResNet-101 model requires diverse data to learn effectively and produce accurate predictions. This research therefore also includes an augmentation process to enlarge the dataset. By combining ResNet-101 with color space selection and augmentation, the research is expected to produce a corn stalk classification model that recognizes the health condition of the plants with high accuracy. Fast and accurate detection is crucial for taking prompt preventive action, thereby minimizing potential losses due to disease or damage to the stalks.
This research focuses on classifying diseases in corn stalks using digital images. One of the key elements of data mining is classification. Data mining is the process of using mathematical techniques, artificial intelligence, statistics, and machine learning to extract, identify, and analyze different types of data in order to find patterns within them [11, 12]. Knowledge discovery from databases is another name for data mining. Data mining can be classified into a number of categories based on its functions, including description, prediction, estimation, classification, clustering, and association [13]. In this research, the classification of digital images involves three main stages, starting with dataset collection, followed by the training process, and finally, the evaluation process to determine how well the trained model performs.
2.1 Color Space
RGB (Red, Green, Blue)
The RGB color space is a color system that uses three basic components: red, green, and blue, to produce a variety of other colors [14, 15]. Each color in this space is determined by the combination of intensities of the three components. By adjusting the intensity of each RGB component within a specific range, such as from 0 to 255 in an 8-bit representation, different colors can be created. For example, the color white is produced with full intensity from all three components (255, 255, 255), while the color black is produced with zero intensity (0, 0, 0) [16].
In digital image classification, the RGB color space is often used due to its ability to represent colors in a way that closely aligns with human perception. Each digital image typically consists of three color channels (red, green, and blue), each storing the intensity of that color for every pixel. The classification process uses this information to identify and categorize objects or patterns present in the image, so classification algorithms working in the RGB color space can analyze color features to assist identification. In this research, for example, the classification model is trained to recognize the color characteristics that distinguish the categories from one another. The RGB color space provides a strong foundation for feature extraction and image analysis because it captures and stores detailed color information, which is crucial for achieving high classification accuracy. Figure 1 shows a corn stalk image in the RGB color space.
Figure 1. RGB corn stalk image
HSV (Hue, Saturation, Value)
The HSV color space is a color representation system that consists of three main components: Hue (H), Saturation (S), and Value (V) [17]. Hue refers to the base color perceived, such as red, blue, or green, and is measured in degrees from 0 to 360 on the color wheel [18]. Saturation describes the intensity or purity of a color, where high values indicate bright and pure colors, while low values indicate more muted or grayish colors [18]. Value, or brightness, measures how light or dark a color is, with a value of 0 indicating black and a full value representing the brightest form of that color [18, 19].
In the context of digital image classification, the HSV color space has the advantage of separating color information from brightness information. This allows the classification model to focus on the Hue component to recognize specific colors without being affected by variations in lighting or brightness. This is particularly useful when dealing with varying or inconsistent lighting, as the Saturation and Value components enable the system to manage differences in intensity and color saturation [20].
Figure 2. HSV corn stalk image
The use of the HSV color space in digital image classification allows algorithms to recognize and classify objects by color more accurately, because HSV makes it easier to separate and identify colors even when lighting conditions vary. HSV thus helps improve the performance of image classification systems by making them more robust against changes in lighting and color saturation. Figure 2 shows a corn stalk image in the HSV color space.
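The separation of color from brightness described above can be checked on a single pixel. The following minimal sketch uses Python's standard `colorsys` module; the helper name `rgb8_to_hsv` and the sample pixel values are illustrative assumptions, not part of this study's pipeline. It converts an 8-bit RGB pixel to hue (in degrees), saturation, and value, then repeats the conversion for the same pixel at half the brightness.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert an 8-bit RGB pixel to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Hypothetical brownish pixel and the same pixel at half the brightness.
bright = rgb8_to_hsv(200, 100, 50)
dark = rgb8_to_hsv(100, 50, 25)

print("bright pixel (H, S, V):", bright)
print("dark pixel   (H, S, V):", dark)
```

In this toy case the hue and saturation of the two pixels agree up to floating-point rounding, and only the value component differs, which is the property that makes HSV attractive under uneven lighting.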
2.2 CNN model: ResNet-101
ResNet-101 is one of the Residual Network (ResNet) architectures designed to address issues in training deep neural networks [21]. ResNet was introduced by Kaiming He and his team in 2015 [22] and is built on the concept of residual learning, which addresses the vanishing gradient problem commonly encountered in very deep networks. ResNet-101 is a 101-layer version that includes various modern optimizations to improve performance in image recognition.
ResNet is designed so that deep networks can be trained effectively without a decline in accuracy as the number of layers increases. The architecture relies on residual blocks, which feature shortcut connections that allow gradients to pass through several layers without losing important information.
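The effect of shortcut connections on gradient flow can be illustrated with a deliberately simplified scalar model; this is a toy sketch, not the study's network. It assumes every layer has the same small local derivative (0.01, a hypothetical value) and compares the factor by which a gradient is scaled when it is propagated back through 33 layers, the number of residual blocks in ResNet-101 (3 + 4 + 23 + 3). A plain layer y = f(x) contributes the factor f'(x), while a residual layer y = f(x) + x contributes f'(x) + 1.

```python
def backprop_factor(local_grads, shortcut):
    """Multiply the per-layer derivative factors along a chain of scalar layers.

    Plain layer:    y = f(x)      -> factor f'(x)
    Residual layer: y = f(x) + x  -> factor f'(x) + 1
    """
    factor = 1.0
    for g in local_grads:
        factor *= (g + 1.0) if shortcut else g
    return factor


depth = 33                     # residual blocks in ResNet-101: 3 + 4 + 23 + 3
local_grads = [0.01] * depth   # hypothetical small derivative in every layer

plain = backprop_factor(local_grads, shortcut=False)
residual = backprop_factor(local_grads, shortcut=True)

print(f"plain chain:    {plain:.3e}")
print(f"residual chain: {residual:.3f}")
```

Under these assumptions the plain chain shrinks the gradient to a vanishingly small number, while the identity term in each residual layer keeps the overall factor close to one.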
Table 1 summarizes the ResNet-101 architecture, which can be described in terms of five main components [23].
The first component is the initial convolution layer, which consists of Conv1 and pooling. Conv1 is a standard convolution layer with a kernel size of 7×7, stride 2, and padding 3; it extracts basic features from the input image. Its output is passed to a max pooling layer with a kernel size of 3×3 and stride 2, which reduces the spatial dimensions and speeds up computation.
The second component is the set of residual blocks. ResNet-101 has four main stages, each with a different number of residual blocks. Stage 1 (Conv2_x) has 3 residual blocks. Each block consists of three layers: two 1×1 convolutions (to reduce and restore dimensions) with one 3×3 convolution in between. In addition, a shortcut connection in each block connects the block input to the output of the convolutions, allowing information to be passed directly. Stage 2 (Conv3_x) has 4 residual blocks with a similar architecture; the first convolution in each block reduces the dimensionality of the features. Stage 3 (Conv4_x) has 23 residual blocks, making it the deepest part of the network. At this depth, the network is able to learn complex features such as texture patterns, object shapes, and other distinctive characteristics of the image. The last stage, Stage 4 (Conv5_x), has 3 residual blocks. This is the final part of the convolutional network before the features are sent to the fully connected layer, and its residual blocks learn the final high-level features of the image, which helps improve classification accuracy.
The third component is the shortcut connections: each residual block uses a shortcut connection that lets information pass directly from input to output. This helps overcome the vanishing gradient problem and makes it easier for the network to learn even as the number of layers increases. The fourth component is global average pooling. This layer averages each feature map into a single value, producing a one-dimensional feature vector; it reduces the dimensionality of the data while retaining the most important information before the output is sent to the fully connected layer. The last component is the fully connected layer with SoftMax. The fully connected layer maps the learned features to class scores, and SoftMax converts these scores into probabilities from which the final class of the input is determined.
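The last two components described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two 2×2 feature maps and the five class scores are made-up values, the class order simply follows the dataset table, and the fully connected layer that would turn pooled features into scores is omitted.

```python
import math

# Assumed class order; the index-to-class mapping is hypothetical.
CLASSES = ["Healthy", "Erwinia carotovora", "Pythium", "Stenocarpella", "Gibberella"]


def global_average_pool(feature_maps):
    """Average each 2-D feature map down to a single value per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two toy 2x2 feature maps (a real ResNet-101 produces 2048 maps of size 7x7).
pooled = global_average_pool([[[1.0, 3.0], [5.0, 7.0]],
                              [[0.0, 0.0], [2.0, 2.0]]])

# Hypothetical class scores, standing in for the fully connected layer output.
scores = [0.2, 1.5, 0.1, 3.0, 0.4]
probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]

print("pooled features:", pooled)
print("predicted class:", predicted)
```

The predicted class is simply the one with the highest probability, which is how the final label is read off the SoftMax output.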
Table 1. ResNet-101 architecture
| Layers | Output Size | ResNet-101 |
| --- | --- | --- |
| Conv 1 | 112×112 | 7×7 conv, stride 2 |
| Conv 2_x | 56×56 | 3×3 max pool, stride 2; [1×1, 64; 3×3, 64; 1×1, 256] × 3 |
| Conv 3_x | 28×28 | [1×1, 128; 3×3, 128; 1×1, 512] × 4 |
| Conv 4_x | 14×14 | [1×1, 256; 3×3, 256; 1×1, 1024] × 23 |
| Conv 5_x | 7×7 | [1×1, 512; 3×3, 512; 1×1, 2048] × 3 |
| | 1×1 | Average pool, 1000-d fc, SoftMax |
| FLOPs | | 7.6×10^9 |
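The figures in Table 1 can be cross-checked with simple arithmetic. The sketch below counts the weighted layers (one initial convolution, three convolutions per residual block, one fully connected layer) and traces the spatial size of a 224×224 input through the network using the usual output-size formula floor((n + 2p − k) / s) + 1. The padding of 1 for the max pooling layer and the single stride-2 3×3 convolution at the start of each later stage are assumptions taken from common ResNet implementations; they are not stated in the text.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


blocks_per_stage = {"Conv2_x": 3, "Conv3_x": 4, "Conv4_x": 23, "Conv5_x": 3}
convs_per_block = 3  # 1x1, 3x3, 1x1

# Conv1 + all bottleneck convolutions + the final fully connected layer.
weighted_layers = 1 + convs_per_block * sum(blocks_per_stage.values()) + 1

sizes = {}
size = conv_output_size(224, kernel=7, stride=2, padding=3)   # Conv1
sizes["Conv1"] = size
size = conv_output_size(size, kernel=3, stride=2, padding=1)  # max pool (padding assumed)
sizes["Conv2_x"] = size
for stage in ("Conv3_x", "Conv4_x", "Conv5_x"):
    # Assumption: each later stage halves the resolution once with a stride-2 3x3 conv.
    size = conv_output_size(size, kernel=3, stride=2, padding=1)
    sizes[stage] = size

print("weighted layers:", weighted_layers)
print("output sizes:", sizes)
```

With these assumptions the layer count and the per-stage output sizes agree with the values listed in Table 1.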
2.3 Confusion matrix
A confusion matrix is one way to evaluate a machine learning classification model: it shows how many samples the model classifies correctly and incorrectly for each class. Several performance evaluation metrics, including accuracy, precision, recall (sensitivity), and F1-score, can be computed from the confusion matrix [24]. Table 2 shows the confusion matrix used as the reference for these calculations.
Table 2. Confusion matrix
| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |
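The following formulas can be used to determine the accuracy, precision, recall, and F1-score values based on the confusion matrix [25].
1. Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \quad (1)$$
2. Precision
$$\text{Precision} = \frac{TP}{TP + FP} \times 100\% \quad (2)$$
3. Recall
$$\text{Recall} = \frac{TP}{TP + FN} \times 100\% \quad (3)$$
4. F1-score
$$\text{F1-score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \quad (4)$$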
where,
TP: The number of instances where the model correctly predicted the positive class,
TN: The number of instances where the model correctly predicted the negative class,
FN: The number of instances where the model incorrectly predicted the negative class when the actual class was positive,
FP: The number of instances where the model incorrectly predicted the positive class when the actual class was negative.
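As a worked example of these definitions, the sketch below computes the four metrics from the counts TP, TN, FP, and FN, using the standard form of recall, TP / (TP + FN). The function name and the counts are hypothetical and are not results from this study; in a five-class problem, one class would be treated as positive and the remaining classes as negative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1-score as percentages."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    values = {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
    return {name: round(100 * value, 2) for name, value in values.items()}


# Hypothetical counts for one class treated as "positive".
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

The guards against empty denominators are a design choice for the sketch: they return 0 when a class is never predicted or never present instead of raising an error.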
3.1 Corn stalk dataset
The data processed in this classification are digital images of corn stalks, comprising 750 samples across five classes. The details of each class are presented in Table 3.
Table 3. Corn stalk image dataset
| No. | Class | Number of images |
| --- | --- | --- |
| 1 | Healthy | 90 |
| 2 | Erwinia carotovora | 102 |
| 3 | Pythium | 131 |
| 4 | Stenocarpella | 210 |
| 5 | Gibberella | 217 |
| | Total | 750 |
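The class sizes in Table 3 are noticeably unbalanced. The short sketch below quantifies this and computes how many additional images each class needs to reach the 250-per-class target that this study uses for augmentation. The variable names are illustrative; the counts and the target come from the paper.

```python
class_counts = {
    "Healthy": 90,
    "Erwinia carotovora": 102,
    "Pythium": 131,
    "Stenocarpella": 210,
    "Gibberella": 217,
}
target_per_class = 250  # per-class size after augmentation, as stated in the study

# Number of augmented images that must be generated for each class.
extra_needed = {name: target_per_class - n for name, n in class_counts.items()}

total_before = sum(class_counts.values())
total_after = target_per_class * len(class_counts)
imbalance_ratio = max(class_counts.values()) / min(class_counts.values())

print("extra images per class:", extra_needed)
print("total before / after:", total_before, "/", total_after)
print(f"largest-to-smallest class ratio: {imbalance_ratio:.2f}")
```

The smallest class (Healthy) needs the most synthetic images, so a larger share of its augmented set is derived from relatively few originals.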
3.2 Analysis
This section describes how the research was carried out using the techniques described above. Figure 3 shows the steps in the classification process.
As illustrated in Figure 3, the classification system consists of four main components: input, pre-processing, process, and output.
The first component is the input of RGB and HSV image data of corn stalk disease. The images were taken in the Madura Islands, Indonesia, using a 48-megapixel camera at an image size of 4,000×3,000 pixels. The collected images are the starting point of the classification process and provide the information the model needs to learn the patterns or features that distinguish the classes. With labeled data, the model can be trained to recognize new data and assign it to the appropriate class. The dataset used in this study consists of 750 corn stalk images in five classes: Erwinia carotovora, Pythium, Stenocarpella, Gibberella, and healthy.
The second component is pre-processing. The images are resized to 224×224 pixels, and each image is normalized so that its pixel values fall within a fixed range, which accelerates convergence during model training. The data are then augmented to 250 images per class to equalize the class sizes, giving 1,250 images in total. Augmentation is carried out by rotation, flipping, contrast adjustment, zooming, and cropping. The dataset is divided into 80% training data and 20% testing data, and the training data are further divided into training and validation sets.
The third component is the training and testing process. The prepared dataset is used to train a classification model with the ResNet-101 architecture and the Adam optimizer. The resulting model is then tested to evaluate how well it recognizes new images, with performance measured using a confusion matrix.
The fourth component is the output of the classification process, the core procedure that produces a trained model able to assign each image to one of the predetermined target classes.
Figure 3. System design for corn stalk disease classification
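Some of the pre-processing and augmentation operations named above (normalization, flipping, and rotation) can be sketched on a tiny image stored as a nested list. This is an illustration only: the 2×3 single-channel image is made up, the [0, 1] target range is an assumption because the paper does not state the range it uses, and a real pipeline would apply an image library to three-channel 224×224 images.

```python
def normalize(image, max_value=255):
    """Scale 8-bit pixel values to the range [0, 1] (assumed target range)."""
    return [[pixel / max_value for pixel in row] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


# Hypothetical 2x3 single-channel image with 8-bit intensities.
image = [[0, 51, 102],
         [153, 204, 255]]

flipped = flip_horizontal(image)
rotated = rotate_90_clockwise(image)
scaled = normalize(image)

print("flipped:", flipped)
print("rotated:", rotated)
print("scaled: ", scaled)
```

Flipping and rotation only rearrange pixels, so they leave the color values untouched, whereas contrast adjustment changes the values themselves.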
This section presents and analyzes the performance of the ResNet-101 model in classifying corn stalks. The classification model was built using a dataset of 750 samples with five target classes. The parameters used for modelling with ResNet-101 are as follows:
Learning rate (Lr) = 0.001
Batch size = 8
Optimizer = Adam
Number of epochs = 10
With these parameters, the model training process was conducted using both the augmented dataset and the non-augmented dataset. The results obtained are displayed in Table 4 and Table 5.
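To make the role of the learning rate and the Adam optimizer concrete, the following sketch applies the Adam update rule to a single scalar parameter. It is a toy illustration, not the training code used in this study: the objective f(θ) = θ² is made up, and the values β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the commonly used defaults, assumed here because the paper reports only the learning rate of 0.001.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# A single update on the toy objective f(theta) = theta**2 (gradient 2 * theta).
first_theta, _, _ = adam_step(1.0, 2.0, 0.0, 0.0, 1)

# Ten consecutive update steps (optimizer steps, not training epochs).
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 11):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)

print("after 1 step:  ", first_theta)
print("after 10 steps:", theta)
```

Because Adam normalizes the gradient by its running magnitude, each early step moves the parameter by roughly the learning rate, regardless of how large the raw gradient is.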
Table 4. Performance results of ResNet-101 with augmentation
| Performance Measurement | RGB | HSV |
| --- | --- | --- |
| Accuracy (%) | 89.47 | 67.11 |
| Recall (%) | 89.60 | 70.12 |
| Precision (%) | 89.72 | 81.14 |
| F1 Score (%) | 89.66 | 75.23 |
| Time (s) | 10,914.47 | 9,661.28 |
Table 5. Performance results of ResNet-101 without augmentation
| Performance Measurement | RGB | HSV |
| --- | --- | --- |
| Accuracy (%) | 74.67 | 90.67 |
| Recall (%) | 72.58 | 90.92 |
| Precision (%) | 75.87 | 92.22 |
| F1 Score (%) | 74.19 | 92.57 |
| Time (s) | 4,390.29 | 4,250.59 |
Based on Tables 4 and 5, without augmentation the ResNet-101 model with RGB images achieved an accuracy of 74.67%, with precision at 75.87%, recall at 72.58%, and an F1-Score of 74.19%. Although these results are reasonable, they indicate that without data augmentation the model has limitations in recognizing the actual variations in image patterns. With augmentation, the accuracy increased significantly to 89.47%. This improvement demonstrates that data augmentation added variation that helps the model learn more patterns, thereby enhancing its generalization and classification accuracy.
Meanwhile, the use of HSV image data yielded the best results without augmentation, achieving an accuracy of 90.67%, precision of 92.22%, recall of 90.92%, and an F1-Score of 92.57%. This indicates that the conversion to HSV helps in recognizing color features more distinctly, which are important indicators in identifying corn stalk diseases. However, when augmentation was applied to the HSV images, the accuracy dropped to 67.11%, which is lower than all other tests. This decrease may be due to the augmentation introducing unnatural color variations, making it difficult for the model to recognize relevant patterns. This highlights the need for augmentation to be tailored to the type of color format used to achieve accurate and effective results.
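The comparison discussed above can be summarized numerically. The sketch below takes the accuracy and time values reported in Tables 4 and 5 and computes, for each color space, the change in accuracy (in percentage points) and the ratio of the reported times with and without augmentation. The dictionary layout and function name are illustrative choices.

```python
# Accuracy (%) and time (s) as reported in Tables 4 and 5.
results = {
    ("RGB", "augmented"): {"accuracy": 89.47, "time_s": 10914.47},
    ("HSV", "augmented"): {"accuracy": 67.11, "time_s": 9661.28},
    ("RGB", "original"): {"accuracy": 74.67, "time_s": 4390.29},
    ("HSV", "original"): {"accuracy": 90.67, "time_s": 4250.59},
}


def augmentation_effect(color_space):
    """Accuracy change (percentage points) and time ratio due to augmentation."""
    aug = results[(color_space, "augmented")]
    orig = results[(color_space, "original")]
    return {
        "accuracy_change_points": round(aug["accuracy"] - orig["accuracy"], 2),
        "time_ratio": round(aug["time_s"] / orig["time_s"], 2),
    }


rgb_effect = augmentation_effect("RGB")
hsv_effect = augmentation_effect("HSV")
best_setting = max(results, key=lambda key: results[key]["accuracy"])

print("RGB:", rgb_effect)
print("HSV:", hsv_effect)
print("best setting:", best_setting)
```

Augmentation moves accuracy in opposite directions for the two color spaces, while the reported time more than doubles in both cases.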
This study classified corn stalk diseases using the ResNet-101 CNN architecture with the Adam optimizer on a dataset of 750 images in five target classes, represented in two color spaces, HSV and RGB. In the initial experiment without augmentation, the model obtained an accuracy of 74.67% on RGB images and 90.67% on HSV images. These results indicate that converting to HSV helps improve accuracy when no augmentation is applied.
After augmentation to 250 images per class (1,250 images in total), the accuracy on RGB images increased significantly to 89.47%, while on HSV images it decreased to 67.11%. This shows that augmentation of RGB images helps the model recognize more patterns, while excessive augmentation of HSV images can interfere with the training process.
As a recommendation for further research, augmentation techniques such as mixup or cutout could be developed to expand the range of possible patterns without adding unnecessary noise. Classification with other CNN architectures could also be explored to identify corn stalk diseases and produce a more accurate classification model.
Our gratitude goes to Trunojoyo University, Madura, for providing the opportunity for researchers to complete this research. This research article is supported by the 2024 UTM DIPA Fund for Penelitian Kolaborasi Nasional.
[1] Rachmad, A., Fuad, M., Rochman, E.M.S. (2023). Convolutional neural network-based classification model of corn leaf disease. Mathematical Modelling of Engineering Problems, 10(2): 530-536. https://doi.org/10.18280/mmep.100220
[2] Ansori, N., Rachmad, A., Rochman, E.M.S., Fauzan, B.H., Asmara, Y.P. (2024). Corn stalk disease classification using random forest combination of extraction features. Communications in Mathematical Biology and Neuroscience, 2024: 19. https://doi.org/10.28919/cmbn/8404
[3] Putro, S.S., Syakur, M.A., Rochman, E.M.S., Rachmad, A. (2022). Comparison of backpropagation and ERNN methods in predicting corn production. Communications in Mathematical Biology and Neuroscience, 2022: 10. https://doi.org/10.28919/cmbn/7082
[4] Méndez-Zambrano, P.V., Tierra Pérez, L.P., Ureta Valdez, R.E., Flores Orozco, Á.P. (2023). Technological innovations for agricultural production from an environmental perspective: A review. Sustainability, 15(22): 16100. https://doi.org/10.3390/su152216100
[5] Purwono, P., Ma'arif, A., Rahmaniar, W., Fathurrahman, H.I.K., Frisky, A.Z.K., Ul Haq, Q.M. (2022). Understanding of convolutional neural network (CNN): A review. International Journal of Robotics and Control Systems, 2(4): 739-748. http://doi.org/10.31763/ijrcs.v2i4.888
[6] Maurício, J., Domingues, I., Bernardino, J. (2023). Comparing vision transformers and convolutional neural networks for image classification: A literature review. Applied Sciences, 13(9): 5521. https://doi.org/10.3390/app13095521
[7] Yallappa, M.S., Bharamagoudar, G.R. (2023). Classification of knee X-ray images by severity of osteoarthritis using skip connection based ResNet-101. International Journal of Intelligent Engineering & Systems, 16(5): 738-747. https://doi.org/10.22266/ijies2023.1031.62
[8] Nagpal, P., Bhinge, S.A., Shitole, A. (2022). A comparative analysis of ResNet architectures. In 2022 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), pp. 1-8. https://doi.org/10.1109/SMARTGENCON56628.2022.10083966
[9] Chen, Y.M., Chou, F.I., Ho, W.H., Tsai, J.T. (2021). Classifying microscopic images as acute lymphoblastic leukemia by Resnet ensemble model and Taguchi method. BMC Bioinformatics, 22(Suppl 5): 615. https://doi.org/10.1186/s12859-022-04558-5
[10] Chaturvedi, S.S., Tembhurne, J.V., Diwan, T. (2020). A multi-class skin cancer classification using deep convolutional neural networks. Multimedia Tools and Applications, 79(39): 28477-28498. https://doi.org/10.1007/s11042-020-09388-2
[11] Rachmad, A., Chamidah, N., Rulaningtyas, R. (2020). Mycobacterium tuberculosis identification based on colour feature extraction using expert system. Annals of Biology, 36(2): 196-202.
[12] Rochman, E.M.S., Suprajitno, H., Kamilah, I., Rachmad, A., Santosa, I. (2023). Tuberculosis classification using random forest with K-prototype as a method to overcome missing value. Communications in Mathematical Biology and Neuroscience, 2023: 81. https://doi.org/10.28919/cmbn/7873
[13] Nisia, T.G., Rajesh, S. (2021). Extraction of high-level and low-level feature for classification of image using Ridgelet and CNN based image classification. Journal of Physics: Conference Series, 1911(1): 1-6. https://doi.org/10.1088/1742-6596/1911/1/012019
[14] Shishmanova, S., Rinaldi, A. (2018). RGB color wheel intended to create color harmony compositions in modern art and design. International Journal of Science and Engineering, 4(4): 45-57. https://doi.org/10.53555/eijse.v4i4.163
[15] Mary, G.G., Rani, M.M.S. (2016). A study on secret image hiding in diverse color spaces. International Journal of Advanced Research in Computer and Communication Engineering, 5(5): 779-783. https://doi.org/10.17148/IJARCCE.2016.55191
[16] Setiawan, W., Rochman, E.M.S., Satoto, B.D., Rachmad, A. (2022). Machine learning and deep learning for maize leaf disease classification: A review. Journal of Physics: Conference Series, 2406(1): 012019. https://doi.org/10.1088/1742-6596/2406/1/012019
[17] Chernov, V., Alander, J., Bochko, V. (2015). Integer-based accurate conversion between RGB and HSV color spaces. Computers & Electrical Engineering, 46: 328-337. https://doi.org/10.1016/j.compeleceng.2015.08.005
[18] Kurniastuti, I., Yuliati, E.N.I., Yudianto, F., Wulan, T.D. (2022). Determination of hue saturation value (HSV) color feature in kidney histology image. Journal of Physics: Conference Series, 2157(1): 012020. https://doi.org/10.1088/1742-6596/2157/1/012020
[19] Islami, F. (2021). Implementation of HSV-based thresholding method for iris detection. Journal of Computer Networks, Architecture and High Performance Computing, 3(1): 97-104. https://doi.org/10.47709/cnahpc.v3i1.939
[20] Kartika, D.S.Y., Herumurti, D., Yuniarti, A. (2018). Butterfly image classification using color quantization method on HSV color space and local binary pattern. IPTEK Journal of Proceedings Series, 1: 78-82. https://doi.org/10.12962/J23546026.Y2018I1.3512
[21] Hasanah, S.A., Pravitasari, A.A., Abdullah, A.S., Yulita, I.N., Asnawi, M.H. (2023). A deep learning review of ResNet architecture for lung disease identification in CXR image. Applied Sciences, 13(24): 13111. https://doi.org/10.3390/app132413111
[22] He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778.
[23] Albahli, S., Nazir, T. (2022). AI-CenterNet CXR: An artificial intelligence (AI) enabled system for localization and classification of chest X-ray disease. Frontiers in Medicine, 9: 955765. https://doi.org/10.3389/fmed.2022.955765
[24] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S. (2022). On evaluation metrics for medical applications of artificial intelligence. Scientific Reports, 12(1): 5979. https://doi.org/10.1038/s41598-022-09954-8
[25] Devi, M.U., Babu, R. (2019). Categorizing the age group and measuring accuracy of fuzzy model. International Journal of Electronics and Communication Engineering and Technology, 10(5): 36-46. https://ssrn.com/abstract=3554313