Weed Detection and Localization in Soybean Crops Using YOLOv4 Deep Learning Model

ABSTRACT


INTRODUCTION
Soybean is a widely grown edible oilseed and a protein-rich food for humans and animals. Animals consume it as soybean meal, and humans use it as oil. According to Soy Stats, Brazil is the world's major soybean producer, and soybean accounts for around 25% of edible oil. Improving the quality and quantity of soybean requires removing weeds. Weeds compete with soybean plants for essential resources such as water, nutrients and sunlight, which reduces crop yield. Weeds also increase the risk of disease and pests and interfere with harvest and post-harvest processes, thus increasing production cost. An accurate and efficient weed detection model is therefore needed to optimize crop yield and quality, minimize herbicide usage and production costs, promote sustainable and eco-friendly farming practices, and enable precision weed management.
At present, weeds are managed by spraying herbicides, which has harmful environmental effects. Locating the weeds precisely and spraying herbicide only at those locations reduces these adverse effects. However, weeds and soybean are similar in color and shape, and the intra- and inter-species variability of weeds in features such as shape, size, color and texture is also very low. Accurate and robust detection of weeds therefore remains a challenging task.
To address this issue, various technologies and methods have been developed for detecting weeds in soybean fields. Early methods relied on visual inspection of the field by farmers, who identified weeds by their appearance and removed them manually. This is labor-intensive and time-consuming, and is not practical for large fields. Later, feature-based methods were used, considering color, histograms, texture descriptors and shape features. In recent years, machine learning algorithms such as support vector machines (SVM) and K-nearest neighbors (K-NN) have been used for classification. These methods are limited in their ability to learn and adapt to variations in lighting conditions, viewpoint and background clutter.
Deep learning models such as convolutional neural networks (CNNs) can handle complex and diverse datasets effectively. In object detection, popular models such as the single shot multibox detector (SSD), the region-based convolutional neural network (R-CNN) and You Only Look Once (YOLO) are widely used to localize multiple objects. To improve the detection accuracy and increase the robustness of the model, Contrast Limited Adaptive Histogram Equalization (CLAHE) is used for preprocessing.
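As a rough illustration of the contrast-limiting idea behind CLAHE, the sketch below clips the histogram of a single grayscale tile, redistributes the clipped excess uniformly, and equalizes with the resulting cumulative distribution. It is a simplification and not the preprocessing pipeline used in this work: full CLAHE additionally divides the image into tiles and interpolates between the tile mappings. The function name, the absolute `clip_limit` count and the example tile are assumptions made for illustration.

```python
def clip_limited_equalize(tile, clip_limit=4, levels=256):
    """Equalize one grayscale tile (list of rows of ints in [0, levels))
    after clipping its histogram at clip_limit counts per bin."""
    hist = [0] * levels
    for row in tile:
        for p in row:
            hist[p] += 1

    # Clip each bin and spread the clipped excess uniformly over all bins.
    excess = sum(max(h - clip_limit, 0) for h in hist)
    hist = [min(h, clip_limit) + excess / levels for h in hist]

    # Cumulative distribution of the clipped histogram, scaled to the output range.
    total = sum(hist)
    lut, running = [], 0.0
    for h in hist:
        running += h
        lut.append(round(running / total * (levels - 1)))

    return [[lut[p] for p in row] for row in tile]


# Hypothetical low-contrast tile with gray levels 100..103.
tile = [[100, 100, 101, 101],
        [100, 100, 101, 101],
        [102, 102, 103, 103],
        [102, 102, 103, 103]]
unclipped = clip_limited_equalize(tile, clip_limit=16)  # limit never reached
clipped = clip_limited_equalize(tile, clip_limit=1)     # strong contrast limiting
```

A lower clip limit stretches the contrast less aggressively, which is the mechanism that keeps noise in near-uniform regions from being over-amplified.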
The objectives of this paper are as follows:
1. Applying CLAHE for preprocessing the images in the dataset.
This paper is organized into five sections. Section 2 reviews the literature on earlier traditional feature-based algorithms and deep learning algorithms used in weed/crop detection. Section 3 presents data acquisition, labeling, and the implementation of YOLOv4 with its loss function. Section 4 presents the simulation results along with a performance analysis of different pre-trained networks, R-CNN, SSD and YOLOv4. Finally, Section 5 concludes with some directions for future work.

RELATED WORK
Weeds are a major concern in crop production because they reduce crop yield. About 30% of crop yield worldwide is affected by weeds [1]. Presently, weed control is done by spraying herbicides over the whole field instead of precisely on the weeds. As a result, food products contain herbicide residue, which is harmful, and crop yield may also be affected [2]. Hence there is a need to develop efficient weed control methods for harvesting crops with good yield. Precise weed identification and localization is a challenging task in the development of automated weed control methods.
Early methods used computer vision to classify and detect objects. Different feature extraction methods were used to process weed images and extract their features [3]. With computer vision techniques, features such as color, shape and texture are used to distinguish soybean crop from weeds [4]. However, selecting significant features suited to the application is very difficult, and feature extraction is time-consuming. To improve image classification performance, machine learning techniques have gained attention in recent years [5]. However, machine learning techniques require feature extraction before a classifier such as a support vector machine is trained [6][7][8]. Researchers used color and texture features to discriminate soybean crops from weeds: RGB and HSV color spaces, Gray Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP) features were used to train a Support Vector Machine (SVM) classifier, yielding an accuracy of about 96%. Other researchers applied classifiers such as KNN, Random Forest and SVM to unmanned aerial vehicle images to classify weed and soybean, with an accuracy of 91.34% [9]. Machine learning methods have gained significant attention due to their ability to make predictions, classify, and extract valuable insights from large datasets. Their disadvantages are that feature extraction takes a long time and that a huge dataset is required for training and decision making. A high-quality, clean and well-structured dataset is also essential.
In recent years, deep learning methods have been widely used in image classification applications. In deep learning models, the features are learned automatically from the raw data. Compared with classical machine learning models, deep learning models generally perform better. In particular, Convolutional Neural Networks (CNN) perform better than other machine learning models at classifying images [10]. Convolution layers make feature selection and automatic feature extraction possible in deep learning. With the availability of high-speed computational systems with large memory capacity, researchers started using deep learning networks in several fields, including agriculture [11][12][13]. Researchers used ConvNets for weed detection in soybean crop, yielding an accuracy of 98% [14]. CNNs and other architectures such as MobileNetV2 and ResNet50 are widely used for weed detection. Researchers compared MobileNetV2, ResNet50 and custom CNN models for real-time weed detection and recorded an accuracy of 97.7% with a custom CNN model [15].
For weed removal or control, finding the location of a weed is as important as detecting it. Classification and localization are thus the two aspects to be considered, and different deep learning models have been proposed for detection and localization. R-CNN uses region proposals to localize the objects within an image. Improved models such as Faster R-CNN and Mask R-CNN were suggested later; these use anchor boxes to locate the object and then predict its category [16]. These methods therefore have two stages, viz., localization and prediction. YOLO [17] and SSD [18], on the other hand, are single-stage object detection methods, which perform a one-pass regression of class probabilities and bounding box locations. YOLO and its versions, viz., YOLOv1, YOLOv2, YOLOv3, YOLOv4, etc., are used in different object detection applications [19,20]: YOLOv2 for medical face mask detection [21], YOLOv3 with Darknet-53 for target detection [22], YOLOv4 for human detection [23], the YOLO-sesame model for weed detection [24], etc. Different researchers have applied YOLO and its versions to weed detection. Deep learning based on YOLOv2 is used for weed detection in romaine lettuce crop [25], and YOLOv4 is used for weed detection in carrot fields by Ying et al. [16].
For weed detection and localization in soybean fields, this work investigates the YOLOv4 model. For comparison, the state-of-the-art models R-CNN and SSD are investigated, and various performance metrics are evaluated. Additionally, different pre-trained networks, viz., Darknet19, Mobilenetv2, VGG19, Resnet18, Inceptionv3 and Densenet201, are investigated for classification of weed and soybean crop. The experimental results help enable precise targeting of weed control measures, avoiding herbicides and reducing the environmental impact.
The dataset used in this work is publicly available on the Kaggle website. The dataset, used by Santos Ferreira, consists of images captured by drone and contains soybean, grass, broadleaf and soil images. From this dataset, grass and broadleaf are considered as weed and soybean as crop. The dataset is split into three parts: sixty percent for training, ten percent for validation and the remaining thirty percent for testing. Sample input images are shown in Figure 1.
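The 60/10/30 split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the shuffling, the fixed seed, the function name and the placeholder file names are assumptions, and the sketch does not stratify by class.

```python
import random


def split_dataset(items, train_pct=60, val_pct=10, seed=0):
    """Shuffle items and split them into train / validation / test parts.
    Whatever remains after the train and validation parts goes to test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded shuffle for a reproducible split
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the dataset images.
images = [f"img_{i:04d}.png" for i in range(100)]
train, val, test = split_dataset(images)
```

With 100 images this yields 60 training, 10 validation and 30 test images, and every image lands in exactly one part.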

YOLOv4 architecture
YOLOv4 is used for accurate and efficient object detection; its architecture is shown in Figure 2. PANet helps the model aggregate features from different layers and detect objects at different scales and aspect ratios. SAM focuses on relevant regions to enhance feature fusion. The head network is responsible for predicting bounding boxes and class probabilities; it has three detection sub-heads, each designed for objects at a different scale. Each prediction includes the bounding box coordinates (x, y, width and height), an object score and class probabilities. FPN combines multi-scale features, which enhances the model's ability to detect objects of various sizes while maintaining good accuracy. Spatial pyramid pooling (SPP) improves the network's ability to detect objects at different spatial resolutions and allows the model to focus on both small and large objects in the image. The SPP max-pooling kernel sizes are 1×1, 5×5, 9×9 and 13×13, with a stride of 1.
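The SPP step described above can be sketched on a single 2D feature map as follows. The kernel sizes and stride come from the text; the size-preserving padding (out-of-range positions are simply ignored), the function names and the toy feature map are assumptions made for illustration.

```python
def max_pool_same(fmap, k):
    """Max-pool a 2D map with a k x k window and stride 1, ignoring
    out-of-range positions so the output keeps the input size (k odd)."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(max(window))
        out.append(row)
    return out


def spp(fmap, kernel_sizes=(1, 5, 9, 13)):
    """Return one pooled map per kernel size; in a network these maps
    would be concatenated along the channel axis."""
    return [max_pool_same(fmap, k) for k in kernel_sizes]


# Toy feature map with a single strong activation in the center.
fmap = [[0, 0, 0],
        [0, 9, 0],
        [0, 0, 0]]
pooled = spp(fmap)
```

Because every branch uses stride 1 and keeps the spatial size, the pooled maps can be stacked directly; the 1×1 branch passes the input through unchanged, while larger kernels spread each activation over a wider receptive field.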

YOLOv4 loss function
In the YOLOv4 object detection model, the loss function is composed of three components, viz., classification loss, localization loss and confidence loss. These are used to train the model to accurately detect objects in an image. Classification loss measures how well the model classifies objects within each grid cell. It is computed only if an object is present in the grid cell. If no object is present in the cell, the loss is calculated from the confidence score, which should be close to zero. The localization loss, or regression loss, measures how well the model predicts the bounding box coordinates of each object in the image. Confidence loss, on the other hand, measures how well the model predicts the confidence score, which indicates the likelihood that an object exists within a grid cell. These three loss functions are combined to form the final loss function used for training YOLOv4.
Classification loss is evaluated as given by Eq. (1). Here, $\mathbb{1}_{i}^{obj} = 1$ if an object is present in cell $i$ and 0 otherwise, and $\hat{p}_i(c)$ is the conditional class probability of class $c$. Localization loss is evaluated using Eq. (2). Here, $\mathbb{1}_{ij}^{obj} = 1$ if the $j$-th bounding box of cell $i$ is responsible for detecting the object and 0 otherwise.

Figure 2. Architecture of YOLOv4
Confidence loss is obtained by Eq. (3), where $\hat{C}_{ij}$ is the confidence score of box $j$ in cell $i$, and $\mathbb{1}_{ij}^{obj} = 1$ if an object is present in the $j$-th bounding box of cell $i$ and 0 otherwise.
When no object is detected, the confidence loss is obtained using Eq. (4), where $\hat{C}_{ij}$ is the confidence score of the $j$-th box of cell $i$.
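The four loss terms can be sketched in code under an explicit assumption: the sketch below uses the sum-squared-error form of the original YOLO formulation, which matches the indicator and score definitions given above but is not confirmed to be the exact form of Eqs. (1)-(4). The weights `lambda_coord = 5` and `lambda_noobj = 0.5` are the values from the original YOLO formulation, and the function name and input layout are hypothetical. Reference YOLOv4 implementations commonly use an IoU-based box loss instead of the squared-error localization term.

```python
import math


def yolo_sum_squared_loss(cells, lambda_coord=5.0, lambda_noobj=0.5):
    """cells: one dict per grid cell with
         'obj'          : 1 if an object is present in the cell, else 0
         'p', 'p_hat'   : true / predicted class probabilities
         'boxes'        : list of dicts with 'resp' (1 if the box is
                          responsible for the object), 'box' and 'box_hat'
                          as (x, y, w, h), and 'C', 'C_hat' confidences.
    Returns (classification, localization, confidence, no_object) losses."""
    cls = loc = conf = noobj = 0.0
    for cell in cells:
        if cell["obj"]:  # classification loss only where an object exists
            cls += sum((p - q) ** 2 for p, q in zip(cell["p"], cell["p_hat"]))
        for b in cell["boxes"]:
            if b["resp"]:
                x, y, w, h = b["box"]
                xh, yh, wh, hh = b["box_hat"]
                loc += (x - xh) ** 2 + (y - yh) ** 2
                # Square roots damp the influence of large boxes.
                loc += (math.sqrt(w) - math.sqrt(wh)) ** 2
                loc += (math.sqrt(h) - math.sqrt(hh)) ** 2
                conf += (b["C"] - b["C_hat"]) ** 2
            else:  # boxes without an object should predict low confidence
                noobj += (b["C"] - b["C_hat"]) ** 2
    return cls, lambda_coord * loc, conf, lambda_noobj * noobj


# Hypothetical single-cell example with one responsible box and one empty box.
cell = {
    "obj": 1, "p": [1.0, 0.0], "p_hat": [0.5, 0.5],
    "boxes": [
        {"resp": 1, "box": (0.5, 0.5, 0.25, 0.25),
         "box_hat": (0.5, 0.5, 0.25, 0.25), "C": 1.0, "C_hat": 0.5},
        {"resp": 0, "box": (0, 0, 0, 0),
         "box_hat": (0.1, 0.1, 0.1, 0.1), "C": 0.0, "C_hat": 0.5},
    ],
}
losses = yolo_sum_squared_loss([cell])
total_loss = sum(losses)
```

Summing the four returned terms gives the combined training loss for the cells passed in.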

EXPERIMENTAL RESULTS
In this work, different pre-trained models, viz., Darknet19, Mobilenetv2, VGG19, Resnet18, Inceptionv3 and Densenet201, are applied to the dataset. Figure 3(a) shows the 2×2 confusion matrix, Figure 3(b) shows the confusion matrix for Densenet201, and Table 1 compares the performance of these networks using standard metrics, viz., accuracy, recall, precision, F1-score, etc. Table 2 compares the performance of R-CNN and SSD with YOLOv4 using the standard metrics accuracy, recall and mAP. YOLOv4 yielded the best performance with a batch size of 64, 50 epochs and a learning rate of 0.001. Overfitting was observed with more than 50 epochs and underfitting with fewer. Detection outputs are shown in Figure 4. System specifications: Intel(R) Core(TM) i5-1035G1 CPU @ 1.00GHz processor, 8.00 GB RAM.
The standard performance metrics used in deep learning based object detection and classification are calculated as follows.

Recall = TP / (TP + FN).
Recall is also called sensitivity or true positive rate. It measures the ability of the model to identify all relevant instances in the dataset.
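As a minimal sketch, the metrics named in this section can be computed from the four entries of a 2×2 confusion matrix: true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The function name is hypothetical, and the counts in the example are illustrative values, not results from this work.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from 2x2 confusion-matrix counts.
    Ratios with an empty denominator are reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # sensitivity / true positive rate
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, tn + fp),  # false positive rate
    }


# Illustrative counts only.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

The F1-score is the harmonic mean of precision and recall, so it is high only when both are high.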

CONCLUSIONS
Different state-of-the-art deep learning models, viz., Darknet19, Mobilenetv2, VGG19, Resnet18, Inceptionv3 and Densenet201, are trained and tested for classification of weed and soybean crop. Densenet201 outperformed the other deep learning networks and previously reported results. For detection and localization, the YOLOv4, R-CNN and SSD networks are trained and tested. YOLOv4 achieved better overall performance than R-CNN and SSD, and accurately detected two types of weeds, viz., broadleaf and grass. This work has a few limitations. Firstly, all the models are tested only with a publicly available dataset. Secondly, for detection and localization, the images considered have crop and weeds with wide spacing. In future, images are to be acquired from soybean fields at different stages of crop growth and with a greater number of weed types to create a custom dataset. Further, detection of weeds in very closely spaced fields is to be studied.

Figure 1. Sample images in dataset

As shown in Figure 2, YOLOv4 has three parts, viz., a backbone, a neck and a head network. The backbone is based on CSPDarknet53 (Cross Stage Partial Darknet53) and extracts hierarchical features from the input image. "Cross stage" refers to connecting information across different stages of the network, whereas "partial" implies that not all the layers or stages are used. CSPDarknet53 combines the Darknet framework with the CSP architecture, using 53 layers in the network. The neck uses a path aggregation network (PANet) and a Spatial Attention Module (SAM).
Figure 4(a), (b) and (c) show the output images with bounding boxes and the corresponding confidence scores. Figures 5-7 show the loss and accuracy curves for 25, 50 and 100 epochs.

Table 1. Classification performance with different pre-trained models
6. False positive rate (FPR) = FP / (TN + FP). It measures the proportion of negative instances that are incorrectly classified as positive.

Table 2. Detection performance of R-CNN, SSD and YOLOv4