CN115588117B - Method and system for detecting diaphorina citri based on YOLOv5s-BC - Google Patents


Info

Publication number
CN115588117B
Authority
CN
China
Prior art keywords
data set
layer
yolov5s
module
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211264285.0A
Other languages
Chinese (zh)
Other versions
CN115588117A (en)
Inventor
吕石磊
柯尊柏
李震
薛秀云
姜晟
周旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University filed Critical South China Agricultural University
Priority to CN202211264285.0A priority Critical patent/CN115588117B/en
Publication of CN115588117A publication Critical patent/CN115588117A/en
Application granted granted Critical
Publication of CN115588117B publication Critical patent/CN115588117B/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a system for detecting diaphorina citri based on YOLOv5s-BC. The method comprises the following steps: constructing a diaphorina citri detection data set and preprocessing the data to obtain a data set to be detected; constructing a YOLOv5s-BC model; and inputting the data set to be detected into the YOLOv5s-BC model for detection to obtain a detection result. The method realizes accurate detection of diaphorina citri in the defocused state, and the method and the system can be widely applied in the technical field of diaphorina citri detection.

Description

Method and system for detecting diaphorina citri based on YOLOv5s-BC
Technical Field
The invention relates to the technical field of diaphorina citri detection, and in particular to a diaphorina citri detection method and system based on YOLOv5s-BC.
Background
Citrus Huanglongbing (HLB, citrus greening disease) is a destructive disease caused by infection of the phloem of citrus plants with a gram-negative bacterium. Detection of the citrus psyllid, the vector of HLB, currently falls into two categories: field diagnosis and laboratory biochemical analysis. Field diagnosis relies mainly on manual identification; it is simple, easy to operate and requires no equipment, but it depends too heavily on the subjective judgment of the inspector, so detection accuracy is difficult to guarantee. Laboratory biochemical analysis involves a relatively complex procedure and a long detection cycle, and places high demands on the professional knowledge and experience of the inspector, so current detection techniques are difficult to popularise and apply directly in citrus orchard production. Moreover, existing target detection algorithms achieve low accuracy on the small diaphorina citri target and perform poorly on out-of-focus targets. In terms of data quality, the data can be divided into out-of-focus data, near-view data and far-view data; the out-of-focus and far-view data suffer severe loss of target features, which makes detection comparatively difficult.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a method and a system for detecting diaphorina citri based on YOLOv5s-BC, which can realize accurate detection of diaphorina citri in a defocused state.
The first technical scheme adopted by the invention is as follows: a method for detecting diaphorina citri based on YOLOv5s-BC, comprising the following steps:
constructing a diaphorina citri detection data set and preprocessing data to obtain a data set to be detected;
constructing a YOLOv5s-BC model;
and inputting the data set to be detected into a YOLOv5s-BC model for detection, and obtaining a detection result.
Further, the step of constructing a diaphorina citri detection data set and preprocessing data to obtain a data set to be detected specifically includes:
obtaining a diaphorina citri sample image;
selecting a preset number of diaphorina citri sample images to perform random cutting treatment to obtain cut sample images;
splicing the cut sample images according to a preset sequence to obtain spliced sample images;
and synthesizing the spliced sample image and the diaphorina citri sample image to obtain a data set to be detected.
Further, the YOLOv5s-BC model specifically includes an input module, a backbone network module, a neck module, and a prediction module, wherein:
the backbone network module comprises a slice layer, a BottleneckCSP_C layer, a convolution layer, an SE-Net attention layer and a pooling layer, wherein the BottleneckCSP_C layer consists of a convolution module, a bottleneck layer, a Conv2d convolution layer, BatchNorm2d and SiLU activation functions, and the SE-Net attention layer is connected with the convolution layer and the bottleneck layer respectively;
the neck module comprises an FPN layer and a PAN layer;
the prediction module includes a convolutional layer.
Further, the step of inputting the data set to be detected to the YOLOv5s-BC model for detection to obtain a detection result specifically includes:
inputting the data set to be detected into a YOLOv5s-BC model;
based on an input module of the YOLOv5s-BC model, carrying out data enhancement processing on a data set to be detected to obtain an enhanced data set;
the method comprises the steps of performing feature extraction processing on an enhanced data set based on a backbone network module of a YOLOv5s-BC model to obtain a feature data set;
performing feature fusion processing on the feature data set based on a neck module of the YOLOv5s-BC model to obtain a fused data set;
and performing feature screening processing on the fused data set based on the prediction module of the YOLOv5s-BC model to obtain a detection result.
Further, the step of performing feature extraction processing on the enhanced data set based on the backbone network module of the YOLOv5s-BC model to obtain a feature data set specifically includes:
inputting the enhanced data set to a backbone network module of the YOLOv5s-BC model;
slicing the enhanced data set based on a slicing layer of the backbone network module to obtain a sliced data set;
based on a BottleneckCSP_C layer of the backbone network module, performing dimension reduction processing on the sliced data set to obtain a dimension reduced data set;
based on a convolution layer of the backbone network module, carrying out convolution operation on the data set after dimension reduction to obtain an operated data set;
based on the SE-Net attention layer of the backbone network module, squeezing and exciting the operated data set to obtain a data set with global characteristics;
and carrying out average pooling calculation on the data set with the global characteristics based on the pooling layer of the backbone network module to obtain the characteristic data set.
Further, based on the neck module of the YOLOv5s-BC model, the step of performing feature fusion processing on the feature data set to obtain a fused data set, specifically includes:
inputting the feature dataset into a neck module of the YOLOv5s-BC model;
based on the FPN layer of the neck module, performing up-sampling operation on the characteristic data set to obtain a sampled data set;
and performing shallow-feature aggregation and fusion processing on the sampled data set based on the PAN layer of the neck module to obtain a fused data set.
Further, the method also comprises calculating evaluation indexes on the detection result, wherein the evaluation indexes comprise precision, recall and mean average precision.
The second technical scheme adopted by the invention is as follows: a YOLOv5s-BC based diaphorina citri detection system, comprising:
the preprocessing module is used for constructing a diaphorina citri detection data set and preprocessing data to obtain a data set to be detected;
the construction module is used for constructing a YOLOv5s-BC model;
the detection module is used for inputting the data set to be detected into the YOLOv5s-BC model for detection, and obtaining a detection result.
The method and the system have the following beneficial effects: the invention applies data enhancement to the acquired diaphorina citri images and then improves the original YOLOv5s model to construct the YOLOv5s-BC model. The BottleneckCSP_C module replaces the BottleneckCSP module of the original YOLOv5s model to enlarge the receptive field, and an SE-Net attention module is added to the backbone network to further strengthen the model's ability to capture the shallow features of diaphorina citri. The enhanced diaphorina citri data are input into the improved YOLOv5s-BC model for detection, which realizes accurate detection of diaphorina citri in the defocused state.
Drawings
FIG. 1 is a flow chart of the steps of the YOLOv5s-BC based diaphorina citri detection method of the present invention;
FIG. 2 is a block diagram of the YOLOv5s-BC based diaphorina citri detection system of the present invention;
FIG. 3 is a schematic diagram of the structure of a YOLOv5s-BC model obtained by modifying the YOLOv5s model;
FIG. 4 is a schematic illustration of the slicing principle operation of the Focus module of the backbone network of the present invention;
FIG. 5 is a schematic diagram of the structure of the BottleneckCSP module of the original YOLOv5s model;
FIG. 6 is a schematic diagram of the structure of a BottleneckCSP_C module of the invention;
FIG. 7 is a schematic diagram of a comparison of model backbone network modules before and after modification;
FIG. 8 is a schematic diagram of the structure of the SE-Net attention module of the present invention;
FIG. 9 is a graph of out-of-focus data samples obtained by the camera of the present invention;
FIG. 10 is a diagram of a sample of close-up data acquired by a camera of the present invention;
FIG. 11 is a view of a sample of perspective data obtained by a camera of the present invention;
FIG. 12 is a schematic diagram of the Mosaic data enhancement in an embodiment of the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Referring to fig. 1, the present invention provides a YOLOv5s-BC based method for detecting diaphorina citri, comprising the steps of:
s1, constructing a diaphorina citri detection data set and preprocessing data to obtain a data set to be detected;
specifically, referring to fig. 12, young shoots, tender leaves and other citrus psyllids of fruit trees are photographed by a micro-single digital camera and a handheld photographic device, the pictures are uniformly adjusted to 900 x 900 pictures after being transmitted into a model, and Mosaic data enhancement is introduced during model training, before each iteration starts, a deep learning frame not only reads images from a training set, but also generates new images through the Mosaic data enhancement, then the newly generated images and the read images are combined into training samples, the training samples are input into the model for training, 4 images are randomly selected from the training set of citrus psyllids through the Mosaic data enhancement, the 4 selected images are randomly cut respectively, the cut images are spliced in sequence to obtain 1 new image, and the resolution of the Mosaic data enhancement generated images is consistent with that of the 4 training set images.
S2, constructing a YOLOv5s-BC model;
specifically, referring to fig. 3, a YOLOv5s model is improved to construct a YOLOv5s-BC model, wherein compared with YOLOv4, the YOLOv5s model has the characteristics of smaller mean weight file and shorter training time and reasoning speed on the basis of less reduction of average detection precision; the structure of YOLOv5s is divided into four parts of an input end, a Backbone network (Backbone), a Neck (ck) and a Prediction (Prediction), wherein the input end is used for training an enhanced photo input model.
S21, improving the backbone network module of YOLOv5s;
s211, a Focus module;
specifically, referring to fig. 4, for the existing YOLOv5s model, the backbone network mainly includes a slicing structure (Focus), a bottleeck csp, a convolution module (Conv), and a spatial pyramid pooling module (SPP), and the slicing operation is performed on the picture based on the Focus module of the YOLOv5s-BC model, specifically, a value is taken at every other pixel in a picture, similar to the adjacent downsampling, so that four pictures are taken, the four pictures are complementary, and have a long length, but no information is lost, so W, H information is concentrated into a channel space, the input channel is expanded by 4 times, that is, the spliced picture becomes 12 channels relative to the original RGB three-channel mode, and finally, the obtained new picture is subjected to the convolution operation, and finally, the double downsampling feature map under the condition of no information loss is obtained. In a specific example, the original 900×900×3 image is input into a Focus structure, and is firstly changed into a 450×450×12 feature map by slicing operation, and then is subjected to a convolution operation to finally become a 450×450×32 feature map.
S212, improvement of the BottleneckCSP module;
specifically, referring to fig. 5 and 6, the bottlebeckcsp is an important component block of feature extraction in YOLOv5s algorithm, consisting of a number of convolution blocks (Conv), bottleneck layers (bottleck), convolution layers (Conv 2 d), battnorm 2d, and SiLU activation functions. The convolution module is formed by adding a Conv2d convolution layer with a BN (Batch Normalizations) layer and adding a SiLU activation function; the bottleneck layer is a convolution of 1*1 followed by a convolution of 3*3, where the convolution of 1*1 halves the number of channels, the convolution of 3*3 doubles the number of channels, and then the input is added, set to reduce the number of parameters. The method comprises the steps of reducing the dimension of data, then carrying out convolution of a conventional convolution kernel, and finally carrying out dimension-increasing on PW data, wherein the Bottleneck CSP aims to reduce the calculation bottleneck and the memory cost while enhancing the learning capability of a network, and adopts a structure of a lightweight network, so that the detection of a small target group is unfriendly.
S213, improvement of a Backbone module (Backbone);
specifically, referring to fig. 7, the present invention performs an attention module parameter analysis experiment based on the diaphorina citri target in an experimental analysis section, aiming at the problem that YOLOv5s is not ideal for small target group detection performance and the omission condition is serious. Introducing a SE-Net channel attention module at an eighth layer and a tenth layer in the backspace respectively so as to guide an algorithm to focus on a small target of diaphorina citri, wherein the SE-Net overall structure is shown in fig. 8, fsq is a Squeeze operation, fex represents an Excitation operation, and Fscan represents a scaling operation; firstly, performing a Squeeze operation, compressing feature vectors (256, 512, 1024) of three channel dimensions output by an original Yolov5s network detection layer by using global pooling, obtaining global information among features of each channel, changing a feature map U (HW C) into a scalar of 1C, then establishing a correlation model among channels by two fully connected layers, performing nonlinear transformation on the channel features, outputting weight information of the channel C, adding a ReLu activation function between the two fully connected layers to increase nonlinearity among the channels, performing weight normalization operation by using a Sigmoid function, finally weighting the features among the channels by using a scaling operation, multiplying the weight among the channels with the features of the original feature map to obtain a brand-new channel weight, and improving the weight specific gravity among small target channels by SE-Net, so that a guiding algorithm attaches more importance to the relevant feature information of the small targets, reinforcing training the features, and further improving the detection performance of the algorithm on the long-distance small targets.
S22, a Neck (Neck) module;
specifically, data output by the Backbone module are transmitted to the Neck module, the FPN is in an up-sampling mode to obtain a spliced feature map, shallow features are polymerized from bottom to top through a PAN network structure, image features of all layers are fully integrated, a Neck is a feature fusion network, a structure that the FPN is combined with the PAN is adopted, a conventional FPN layer is combined with a feature pyramid from bottom to top, extracted semantic features are fused with position features, and a trunk layer is fused with a detection layer, so that the model is better fused with multi-scale features to obtain richer feature information.
S23, a Prediction (Prediction) module;
specifically, finally, for the data transmission output by the Neck (negk) module, the Prediction module uses CIOU_Loss as a Loss function, uses non-maximum suppression to screen out redundant target frames, predicts image features and finds the best detection position, and outputs a vector with the class probability of the target object, the object score and the position of the object bounding box. The detection network consists of three detection layers, the feature maps with different sizes are used for detecting target objects with different sizes, each detection layer outputs corresponding vectors, and finally, a prediction boundary box and a category of the target in the original image are generated and marked.
S3, inputting the data set to be detected into the YOLOv5s-BC model for detection and obtaining a detection result.
Specifically, precision (P), recall (R) and mean average precision (mAP) are used as the relevant indexes for evaluating algorithm performance. Precision measures the accuracy of the algorithm's detections, while recall evaluates their comprehensiveness. The single-class average precision (AP) is the area enclosed by the precision-recall curve and the coordinate axes, and the mAP value is obtained by summing the AP values of the individual classes and dividing by the number of classes. The mAP is generally computed at IoU = 0.5, i.e. mAP@0.5, where IoU (intersection over union) is an important quantity in the calculation of mAP. The specific formulas are as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \int_0^1 P(r)\,dr$$
$$mAP = \frac{1}{k}\sum_{i=1}^{k} AP_i$$
$$IoU = \frac{|A \cap B|}{|A \cup B|}$$
in the above formula, A represents a prediction frame, and B represents a trueThe box, TP, positive target is predicted as positive, FP as false positive, false negative target as positive, FN as false negative, positive target as negative, P (r) as smooth curve of precision and recall rate, and the integral operation is to calculate the area occupied by the smooth curve, k as class number, AP i The accuracy of the i-th category is represented, wherein i is a sequence number, and the number of categories in the text is 1.
Further, a specific experiment is carried out. The operating system of the experiment is Windows 10, the processor is an Intel(R) Core(TM) i9-10900K CPU @ 3.70 GHz, the memory is 32 GB, the GPU is a GeForce RTX 3080, and the deep learning framework is PyTorch with Python 3.6.5;
in order to obtain diaphorina citri samples under different natural illumination intensities, samples were collected in 3 time periods, namely 8:00-9:00, 12:00-13:00 and 16:00-17:00. A mirrorless digital camera and a handheld photographic device were used to photograph adult diaphorina citri on the new shoots, tender leaves and other parts of fruit trees at a shooting distance of 50-100 cm. On this basis, 97 diaphorina citri images taken by citrus specialists or fruit farmers in various places throughout the country were added, giving 660 images of diaphorina citri in orchards under natural conditions. In terms of resolution, the data fall into 900×900 and 600×600 (pixel) formats; in terms of data quality, the data can be divided into out-of-focus data, near-view data and far-view data, with the out-of-focus and far-view data suffering severe loss of target features. The captured samples are shown in figs. 9, 10 and 11;
further, the number of images per incoming algorithm was 2, the number of training rounds was 100, and the training and test image resolutions were set to 900 x 900. The self-adaptive momentum estimation method is adopted to update parameters, the weight attenuation is set to be 5 multiplied by 10 < -4 >, the momentum factor is set to be 0.937, the learning rate attenuation adopts a mode of combining preheating training and cosine annealing, and the formula is as follows:
$$\eta_t = \begin{cases} \eta_{max}\left(\omega + (1-\omega)\dfrac{T_{cur}}{T_{warm}}\right), & T_{cur} \le T_{warm} \\ \eta_{min} + \dfrac{1}{2}\left(\eta_{max}-\eta_{min}\right)\left(1+\cos\left(\dfrac{T_{cur}-T_{warm}}{T_{cos}}\,\pi\right)\right), & T_{cur} > T_{warm} \end{cases}$$
in the above, T t Representing the total number of training wheels, taking T t =100,T warm Representing the total number of wheels of preheating training, taking T warm =3,T cos Representing training total wheel number by cosine annealing method, taking T cos =85,T cur Expressed as the current training round number, the training per one period is automatically increased by 1, eta t Learning rate, eta, representing the number of training rounds at present max Represents the maximum learning rate and takes eta max =2.5×10 -4 ,η min Represents the minimum learning rate and takes eta min =2.5×10 -6 ω represents the preheat training decay rate, taking ω=0.1;
further, in order to analyze the influence of the proposed improved scheme of adding the SE-Net attention module and the Bottleneck CSP_C module into the YOLOv5s algorithm on the performance of the original algorithm, an ablation experiment is performed to verify the superiority of the mixed use of the modules, and experimental data are shown in the following table 1:
table 1 comparison table for ablation experiments
(Table 1 is reproduced as an image in the original publication; its key figures are quoted in the text below.)
The partitioned data are trained on separately, and the average of the resulting indexes is taken as the final measure; the original YOLOv5s without any improvement serves as the baseline, and "+" denotes a combined-module improvement;
from an examination of Table 1, the raw Yolov5s P, R and mAP were 88.20%, 83.31% and 91.02%, respectively. Based on the method, basically, each index after improvement is improved to a certain extent, an SE-Net attention module is respectively introduced into an eighth layer and a tenth layer in a backbond structure, the Bottleneck CSP structure in the backbond and the Neck is improved, mAP respectively reaches 92.40%, 91.49% and 92.67%, and the mAP respectively increases by 1.38%, 0.47% and 1.65% relative to the standard. If the SE-Net attention module is introduced into the eighth layer and the tenth layer in the back-bone structure at the same time and the Bottleneck CSP structure is improved, mAP reaches 93.43%, and the mAP, P and R are improved by 2.41%, 1.31% and 4.22% respectively relative to the standard although P is reduced relative to the single introduction of the SE-Net attention module and the single improvement of the Bottleneck CSP structure in the back-bone structure. Further verifying the feasibility of inserting SE-Net attention modules at the eighth and tenth layers and improving the improvement scheme of the Bottleneck CSP structure, the improved YOLOv5s-BC structure is shown in the following table:
TABLE 2 network structure of yolov5s-BC
(Table 2 is reproduced as an image in the original publication.)
In order to further measure the algorithm's performance in detecting diaphorina citri, a consistent data-partitioning strategy and experimental setup were adopted on the data set collected herein, and the algorithm was trained for 100 rounds with the same experimental parameters as the existing target detection algorithms YOLOv5s, YOLOv5m, YOLOv5x, YOLOv5l, YOLOv3, YOLOv3-tiny and Faster RCNN. The experimental results are shown in Table 3:
table 3 performance comparison data table of the inventive algorithm and existing target detection algorithm
Model P/% R/% mAP@0.5/% mAP@0.5:0.95/%
YOLOv5s 88.20 83.31 91.02 49.63
YOLOv5m 91.31 84.02 89.03 46.57
YOLOv5x 92.31 84.02 91.93 47.15
YOLOv5l 96.54 83.02 92.21 46.67
YOLOv3 85.33 81.01 85.21 42.65
YOLOv3-tiny 86.21 88.02 89.52 43.23
Faster RCNN 87.11 88.38 89.94 44.63
YOLOv5s-BC 89.51 87.53 93.43 47.02
Compared with the region-proposal-based target detection algorithm Faster RCNN, R decreases by 0.85%, but P, mAP@0.5 and mAP@0.5:0.95 improve by 2.40%, 3.49% and 2.39% respectively; compared with the regression-based target detection algorithm YOLOv3, P, R, mAP@0.5 and mAP@0.5:0.95 improve by 4.18%, 6.52%, 8.22% and 4.37% respectively; compared with the regression-based target detection algorithm YOLOv3-tiny, R decreases by 0.49%, but P, mAP@0.5 and mAP@0.5:0.95 improve by 3.30%, 3.91% and 3.79% respectively; compared with the YOLOv5s algorithm before improvement, although mAP@0.5:0.95 decreases by 2.61%, P, R and mAP@0.5 improve by 1.31%, 4.22% and 2.41% respectively. Here mAP@0.5 is the mAP at IoU = 0.5, and mAP@0.5:0.95 is the average of the mAP values at IoU thresholds from 0.5 to 0.95 in steps of 0.05.
Referring to fig. 2, the YOLOv5s-BC based diaphorina citri detection system comprises:
the preprocessing module is used for constructing a diaphorina citri detection data set and preprocessing data to obtain a data set to be detected;
the construction module is used for constructing a YOLOv5s-BC model;
the detection module is used for inputting the data set to be detected into the YOLOv5s-BC model for detection, and obtaining a detection result.
The content in the method embodiment is applicable to the system embodiment, the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the achieved beneficial effects are the same as those of the method embodiment.
While the preferred embodiment of the present invention has been described in detail, the invention is not limited to the embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the invention, and these modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (6)

1. A method for detecting diaphorina citri based on YOLOv5s-BC, characterized by comprising the following steps:
constructing a diaphorina citri detection data set and preprocessing data to obtain a data set to be detected;
constructing a YOLOv5s-BC model;
the YOLOv5s-BC model specifically comprises an input module, a backbone network module, a neck module and a prediction module, wherein;
the backbone network module comprises a slice layer, a BottleneckCSP_C layer, a convolution layer, an SE-Net attention layer and a pooling layer, wherein the BottleneckCSP_C layer consists of a convolution module, a bottleneck layer, a Conv2d convolution layer, BatchNorm2d and SiLU activation functions, and the SE-Net attention layers are inserted between the seventh convolution layer and the eighth pooling layer in the backbone network module and after the ninth BottleneckCSP_C layer, that is, the SE-Net attention layers are connected in series with the convolution layer, the pooling layer and the BottleneckCSP_C layer in the backbone network module;
the neck module comprises an FPN layer and a PAN layer;
the prediction module comprises a convolution layer;
inputting the data set to be detected into a YOLOv5s-BC model;
based on an input module of the YOLOv5s-BC model, carrying out data enhancement processing on a data set to be detected to obtain an enhanced data set;
the method comprises the steps of performing feature extraction processing on an enhanced data set based on a backbone network module of a YOLOv5s-BC model to obtain a feature data set;
performing feature fusion processing on the feature data set based on a neck module of the YOLOv5s-BC model to obtain a fused data set;
and performing feature screening processing on the fused data set based on the prediction module of the YOLOv5s-BC model to obtain a detection result.
2. The YOLOv5s-BC based diaphorina citri detection method according to claim 1, wherein the step of constructing the diaphorina citri detection data set and performing data preprocessing to obtain the data set to be detected specifically comprises:
obtaining a diaphorina citri sample image;
selecting a preset number of diaphorina citri sample images to perform random cutting treatment to obtain cut sample images;
splicing the cut sample images according to a preset sequence to obtain spliced sample images;
and synthesizing the spliced sample image and the diaphorina citri sample image to obtain a data set to be detected.
3. The YOLOv5s-BC based diaphorina citri detection method according to claim 2, wherein the step of performing feature extraction processing on the enhanced data set based on the backbone network module of the YOLOv5s-BC model to obtain the feature data set specifically comprises:
inputting the enhanced data set to a backbone network module of the YOLOv5s-BC model;
slicing the enhanced data set based on a slicing layer of the backbone network module to obtain a sliced data set;
based on a BottleneckCSP_C layer of the backbone network module, performing dimension reduction processing on the sliced data set to obtain a dimension reduced data set;
based on a convolution layer of the backbone network module, carrying out convolution operation on the data set after dimension reduction to obtain an operated data set;
based on the SE-Net attention layer of the backbone network module, squeezing and exciting the operated data set to obtain a data set with global characteristics;
and carrying out average pooling calculation on the data set with the global characteristics based on the pooling layer of the backbone network module to obtain the characteristic data set.
4. The YOLOv5s-BC based diaphorina citri detection method according to claim 3, wherein the step of performing feature fusion processing on the feature data set based on the neck module of the YOLOv5s-BC model to obtain the fused data set specifically comprises:
inputting the feature dataset into a neck module of the YOLOv5s-BC model;
based on the FPN layer of the neck module, performing up-sampling operation on the characteristic data set to obtain a sampled data set;
and performing shallow-feature aggregation and fusion processing on the sampled data set based on the PAN layer of the neck module to obtain a fused data set.
5. The YOLOv5s-BC based diaphorina citri detection method according to claim 4, further comprising calculating evaluation indexes on the detection result, wherein the evaluation indexes comprise precision, recall and mean average precision.
6. A YOLOv5s-BC based diaphorina citri detection system, comprising the following modules:
the preprocessing module is used for constructing a diaphorina citri detection data set and preprocessing data to obtain a data set to be detected;
the construction module is used for constructing a YOLOv5s-BC model;
the YOLOv5s-BC model specifically comprises an input module, a backbone network module, a neck module and a prediction module, wherein;
the backbone network module comprises a slice layer, a BottleneckCSP_C layer, a convolution layer, an SE-Net attention layer and a pooling layer, wherein the BottleneckCSP_C layer consists of a convolution module, a bottleneck layer, a Conv2d convolution layer, BatchNorm2d and SiLU activation functions, and the SE-Net attention layers are inserted between the seventh convolution layer and the eighth pooling layer in the backbone network module and after the ninth BottleneckCSP_C layer, that is, the SE-Net attention layers are connected in series with the convolution layer, the pooling layer and the BottleneckCSP_C layer in the backbone network module;
the neck module comprises an FPN layer and a PAN layer;
the prediction module comprises a convolution layer;
the detection module is used for inputting the data set to be detected into the YOLOv5s-BC model for detection to obtain a detection result; based on an input module of the YOLOv5s-BC model, carrying out data enhancement processing on a data set to be detected to obtain an enhanced data set;
the method comprises the steps of performing feature extraction processing on an enhanced data set based on a backbone network module of a YOLOv5s-BC model to obtain a feature data set;
performing feature fusion processing on the feature data set based on a neck module of the YOLOv5s-BC model to obtain a fused data set;
and performing feature screening processing on the fused data set based on the prediction module of the YOLOv5s-BC model to obtain a detection result.
CN202211264285.0A 2022-10-17 2022-10-17 Method and system for detecting diaphorina citri based on YOLOv5s-BC Active CN115588117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211264285.0A CN115588117B (en) 2022-10-17 2022-10-17 Method and system for detecting diaphorina citri based on YOLOv5s-BC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211264285.0A CN115588117B (en) 2022-10-17 2022-10-17 Method and system for detecting diaphorina citri based on YOLOv5s-BC

Publications (2)

Publication Number Publication Date
CN115588117A (en) 2023-01-10
CN115588117B (en) 2023-06-09

Family

ID=84780676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211264285.0A Active CN115588117B (en) 2022-10-17 2022-10-17 Method and system for detecting diaphorina citri based on YOLOv5s-BC

Country Status (1)

Country Link
CN (1) CN115588117B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189115A (en) * 2023-04-24 2023-05-30 青岛创新奇智科技集团股份有限公司 Vehicle type recognition method, electronic device and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712516A (en) * 2021-01-08 2021-04-27 哈尔滨市科佳通用机电股份有限公司 High-speed rail bottom rubber strip fault detection method and system based on YOLOv5
CN114708231A (en) * 2022-04-11 2022-07-05 常州大学 Sugarcane aphid target detection method based on light-weight YOLO v5
CN115661628A (en) * 2022-10-28 2023-01-31 桂林理工大学 Fish detection method based on improved YOLOv5S model

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884064B (en) * 2021-03-12 2022-07-29 迪比(重庆)智能科技研究院有限公司 Target detection and identification method based on neural network
CN113312999B (en) * 2021-05-19 2023-07-07 华南农业大学 High-precision detection method and device for diaphorina citri in natural orchard scene
CN113763471A (en) * 2021-08-27 2021-12-07 之江实验室 Visual-based bullet hole detection method and system
CN114049346B (en) * 2021-11-26 2024-03-26 赣南师范大学 Citrus psyllid detection and identification method based on cutting YOLOv3-SPP3
CN114821102A (en) * 2022-04-18 2022-07-29 中南民族大学 Intensive citrus quantity detection method, equipment, storage medium and device


Also Published As

Publication number Publication date
CN115588117A (en) 2023-01-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant