CN115761305A - Lightweight network design method for pest and disease identification application - Google Patents

Lightweight network design method for pest and disease identification application

Info

Publication number
CN115761305A
CN115761305A (application number CN202211303885.3A)
Authority
CN
China
Prior art keywords
network
yolov5
loss
module
pest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211303885.3A
Other languages
Chinese (zh)
Inventor
周省邦
赵戈
李传起
刘书田
陈东
曾倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning Normal University
Original Assignee
Nanning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanning Normal University filed Critical Nanning Normal University
Priority to CN202211303885.3A priority Critical patent/CN115761305A/en
Publication of CN115761305A publication Critical patent/CN115761305A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of pest and disease prediction, in particular to a lightweight network design method for pest and disease identification applications, which proposes a Light-YOLOv5 network model on the basis of the original YOLOv5s network. First, the Ghost convolution idea is introduced to replace part of the convolution modules in the original YOLOv5 network, reducing the number of network parameters. Second, a lightweight CA attention mechanism is separately embedded into the 9th layer of the network and fused with the GhostC3 module to form the GhostC3CA module, enhancing the feature extraction and fusion capability of the network model. Finally, the original CIOU loss function is replaced with SIOU to accelerate model convergence and make the gradient-descent direction during training more accurate. With its accuracy remaining close to the original's, the Light-YOLOv5 network model saves computing resources and is better suited to deployment on mobile devices where computing resources are scarce.

Description

Lightweight network design method for pest and disease identification application
Technical Field
The invention relates to the technical field of pest and disease prediction, in particular to a lightweight network design method for pest and disease identification applications.
Background
Crop diseases and insect pests are among the most complex, variable and hard-to-overcome factors restricting crop growth, and are a major cause of agricultural production and economic losses worldwide. Losses can be reduced, and the yield of agricultural products improved, only by discovering pests and diseases in time and diagnosing and treating them correctly. Traditional pest and disease identification relies mostly on diagnosis by agricultural experts; it depends on subjective expert experience and suffers from low efficiency, large error and high cost.
With the development of image processing technology, crop pest and disease identification based on machine learning has become possible, but features still need to be extracted manually: the operation is cumbersome, abstract features are hard to extract, and accuracy is difficult to raise to an application-ready level. Subsequently, with the development of Graphics Processing Units (GPUs), deep learning techniques relying on powerful GPU processing capability have advanced rapidly and are widely used in agricultural pest and disease identification. Compared with traditional identification methods and machine-learning-based methods, deep learning achieves higher accuracy in crop pest and disease identification, but for most deep-learning network models this accuracy comes at the cost of greater model complexity: they heavily consume high-performance GPUs and incur huge computational overhead, which severely limits their use on low-cost, low-compute mobile terminals.
Existing methods for building lightweight deep-learning network models still suffer from the technical problems of low recognition accuracy, complex operation, and large parameter and computation counts.
Disclosure of Invention
The invention aims to provide a lightweight network design method for pest and disease identification applications, so as to solve the technical problems of low model recognition accuracy, complex operation, and large parameter and computation counts that existing lightweight deep-learning network models face in pest and disease prediction.
In order to achieve this purpose, the invention provides a lightweight network design method for pest and disease identification applications, which comprises the following steps:
selecting the original YOLOv5 network and introducing the GhostModule structure;
inserting a CA attention mechanism;
replacing the loss function from CIOU to SIOU to generate the Light-YOLOv5 network model;
and training the Light-YOLOv5 network model with a pest and disease image set, then predicting pests and diseases with the trained Light-YOLOv5 network model.
The process of introducing the GhostModule structure comprises the following steps:
replacing the Bottleneck in the C3 module with GhostBottleneck, and replacing the remaining CBS composite convolution modules in the C3 module with GhostCBS, to form a complete GhostC3 module;
replacing the SPP module with GhostSPP;
and replacing all the CBS modules responsible for downsampling with GhostCBS.
Wherein, the step of inserting the CA attention mechanism comprises the following steps:
embedding coordinate information;
and generating coordinate information.
Wherein the CA attention mechanism is separately embedded in the 9th layer of the Backbone network and is fused with the GhostC3 module.
In the process of replacing the loss function from CIOU to SIOU, the SIOU loss function considers the distance loss, the aspect-ratio loss and the IOU loss between the Ground Truth box and the prediction box included in the CIOU loss function, and additionally considers the loss of the Ground Truth box and the prediction box in the direction.
The invention provides a lightweight network design method for pest and disease identification applications, which proposes a Light-YOLOv5 network model on the basis of the original YOLOv5s network. First, the Ghost convolution idea is introduced to replace part of the convolution modules in the original YOLOv5 network, reducing the number of network parameters. Second, a lightweight CA attention mechanism is separately embedded into the 9th layer of the network and fused with the GhostC3 module to form the GhostC3CA module, enhancing the feature extraction and fusion capability of the network model. Finally, the original CIOU loss function is replaced with SIOU to accelerate model convergence and make the gradient-descent direction during training more accurate. With its accuracy remaining close to the original's, the Light-YOLOv5 network model saves computing resources and is better suited to deployment on mobile devices where computing resources are scarce.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic structural diagram of a Ghost-series network module in the lightweight network design method for pest identification application of the present invention.
Fig. 2 is a schematic diagram of the Light-YOLOv5 network structure in the lightweight network design method for pest identification application of the present invention.
Fig. 3 is a graph comparing the normal convolution with the Ghost convolution.
Fig. 4 is a schematic diagram of the Coordinate Attention (CA) module structure of the present invention.
FIG. 5 is a schematic representation of the snow pea cabbage caterpillar (Pieris rapae) image set employed in an embodiment of the present invention.
FIG. 6 is a schematic representation of the seedless Orah citrus canker image set employed in a specific embodiment of the present invention.
FIG. 7 is a diagram illustrating a pattern for enhancing data of two sets of images according to an embodiment of the present invention.
Fig. 8 is a schematic diagram illustrating a result of the mosaic data enhancement processing according to the embodiment of the present invention.
FIG. 9 is a schematic flow chart of an experimental procedure according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating the convergence of a model on a training set and a validation set in an embodiment of the present invention.
FIG. 11 is a graphical representation of performance results for the cabbage caterpillar test set in a specific embodiment of the present invention.
FIG. 12 is a graphical representation of performance statistics for the seedless Orah citrus canker test set in a specific embodiment of the invention.
FIG. 13 is a graph showing statistical performance of three network models on the cabbage caterpillar test set in an embodiment of the present invention.
FIG. 14 is a graphical representation of statistical performance of three network models on the seedless Orah citrus canker test set in an embodiment of the invention.
Fig. 15 is a diagram illustrating comparison of recognition results of three network models according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
Referring to fig. 1 to 4, the invention provides a lightweight network design method for pest identification application, comprising the following steps:
S1: selecting the original YOLOv5 network and introducing the GhostModule structure;
S2: inserting the CA attention mechanism;
S3: replacing the loss function from CIOU to SIOU to generate the Light-YOLOv5 network model;
S4: training the Light-YOLOv5 network model with a pest and disease image set, and predicting pests and diseases with the trained Light-YOLOv5 network model.
The invention is further explained by combining background knowledge and implementation steps as follows:
YOLOv5 is a single-stage detection algorithm proposed by Glenn Jocher in 2020. According to its two scaling factors, network depth and width, it comes in four network models of increasing complexity: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. Weighing the two factors of recognition accuracy and light weight, the invention selects the YOLOv5s network model for the following research.
The GhostModule idea is first introduced into the YOLOv5s network model: the Bottleneck in the C3 module is replaced with GhostBottleneck and the remaining CBS composite convolution modules in the C3 module with GhostCBS, forming a complete GhostC3 module; the SPP module is replaced with GhostSPP; and all the CBS modules responsible for downsampling in the original network are replaced with GhostCBS. The Ghost family of network components is shown in fig. 1. The Coordinate Attention (CA) mechanism is then introduced. Weighing factors such as network light weight and recognition accuracy, the CA module is embedded into the 9th layer of the Backbone network; in addition, the CA module is fused with the GhostC3 module, further improving the feature extraction and fusion capability of the network model. The resulting Light-YOLOv5 network model structure is shown in fig. 2, and the detailed structure parameters are listed in table 1.
TABLE 1 Light-YOLOv5 network architecture details
(Table 1 appears as an image in the original publication.)
In table 1, "layers" indicates the layer number, "from" indicates which layer the current layer's input comes from, "params" indicates the number of parameters, and "module" indicates the module name; "-1" denotes the previous layer, "Concat" denotes the splicing operation, and "n" denotes the number of times the module is repeated.
As shown in fig. 2, the input picture is resized to 640 × 640 × 3 and fed into the Light-YOLOv5 network; image features are extracted by the Backbone network, the extracted features are fused in the Neck network, and the Head network produces prediction results at three different scales: 80 × 80 × 255, 40 × 40 × 255 and 20 × 20 × 255.
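These shapes follow directly from the three detection strides of YOLOv5; a minimal sketch of the arithmetic follows, assuming the stock 3-anchor, 80-class COCO head (which is where the 255-channel figure comes from — a 2-class head as in this application would instead have 3 × (5 + 2) = 21 channels):

```python
# Minimal sketch: input size, detection strides and head grid sizes in YOLOv5.
img_size = 640
num_anchors, num_classes = 3, 80            # stock YOLOv5/COCO values (assumed)
channels = num_anchors * (5 + num_classes)  # 5 = box coords (4) + objectness (1)
for stride in (8, 16, 32):                  # the three YOLOv5 detection strides
    grid = img_size // stride
    print(f"stride {stride:>2}: {grid} x {grid} x {channels}")
# stride  8: 80 x 80 x 255
# stride 16: 40 x 40 x 255
# stride 32: 20 x 20 x 255
```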
YOLOv5s v5.0 uses CIOU as its loss function, with the formula:

L_CIOU = 1 − IOU + ρ²(b, b^gt)/c² + αv,  where v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))² and α = v/((1 − IOU) + v)

Here IOU is the ratio of the intersection to the union of the prediction box and the Ground Truth box, ρ²(b, b^gt) is the squared distance between the center points of the two boxes, c is the diagonal length of the smallest region that simultaneously contains the prediction box and the Ground Truth box, and w, h and w^gt, h^gt denote the width and height of the prediction box and of the Ground Truth box respectively.
To make the improved Light-YOLOv5 network perform even better, the loss function is changed to SIOU, which accelerates the convergence of the network model and makes the gradient-descent direction more accurate.
Specifically, three important definitions in the improvement step are as follows:
1. GhostModule structure
GhostNet is a cheap and efficient neural-network architecture proposed by Han K. et al. in 2020. It reduces the computational cost of ordinary convolution layers while maintaining similar recognition performance, is highly modular and portable, and is effectively plug-and-play. Fig. 3 compares ordinary convolution with Ghost convolution, where (a) shows the structure of ordinary convolution and (b) that of Ghost convolution. The core idea of GhostNet is to split the ordinary convolution operation into two steps according to the relations between feature maps: first, a small number of feature maps are generated by ordinary convolution; second, cheap linear operations are applied to these feature maps to enhance the features, and finally the two groups of feature maps are concatenated on the channel dimension. Running the intrinsic feature maps in parallel with their linear transformations preserves the intrinsic feature maps and lets the generated feature maps match a given number of output channels.
The ordinary convolution is computed as in equation (1):

Y = X * f + b    (1)

where * is the convolution operation and b is the bias term. For a given input X ∈ R^(c×h×w), the output is Y ∈ R^(n×h′×w′), where c is the number of input channels, h, w and h′, w′ are the height and width of the input and output data respectively, and f ∈ R^(c×k×k×n) denotes n convolution kernels of size k × k × c. The FLOPs of an ordinary convolution can therefore be expressed as n × h′ × w′ × c × k × k.
The Ghost convolution instead proceeds in two steps, as shown in equations (2) and (3):

Y′ = X * f′    (2)

y_ij = Φ_(i,j)(y′_i),  i = 1, …, m,  j = 1, …, s    (3)

In equation (2), the input X is passed through m (m < n) ordinary convolutions with kernel size k × k × c, producing a small set of intrinsic feature maps Y′ ∈ R^(m×h′×w′). Cheap linear operations are then applied to these m feature maps, each generating s feature maps, so that n = m × s feature maps are produced as the output of the Ghost module, as shown in equation (3). In equation (3), Φ_(i,j) denotes the j-th linear operation (including the identity mapping that preserves the intrinsic feature maps) applied to the i-th feature map y′_i generated by the ordinary convolution in the first step.
Assuming that each linear transformation has kernel size d × d, the theoretical speed-up ratio of replacing ordinary convolution with Ghost convolution is given by equation (4):

r_s = (n·h′·w′·c·k·k) / ((n/s)·h′·w′·c·k·k + (s−1)·(n/s)·h′·w′·d·d) ≈ (s·c)/(s + c − 1) ≈ s    (4)

Simplifying equation (4) shows that the computation of ordinary convolution is approximately s times that of Ghost convolution, and the parameter count can likewise be shown to be about s times larger.
The invention reduces the dimension of the convolution output in the GhostCBS module to half of the original, then applies a cheaper linear transformation — depth-wise convolution — to the feature maps output by the GhostCBS module inside GhostBottleneck. The depth-wise convolution removes correlation between channels, so each channel feature depends only on itself; this both simulates the way redundant features are generated and markedly reduces the parameter and computation counts.
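As a concrete illustration of the two-step Ghost convolution described above, a minimal PyTorch sketch follows. The module name, activation, kernel sizes and the ratio s = 2 are illustrative assumptions, not the patent's exact GhostCBS/GhostBottleneck definitions:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost convolution sketch: an ordinary convolution produces m intrinsic
    feature maps, a cheap depth-wise convolution derives the remaining "ghost"
    maps, and the two groups are concatenated on the channel dimension."""
    def __init__(self, c_in, c_out, k=1, dw_k=3, s=2):
        super().__init__()
        m = c_out // s  # intrinsic maps (m < n); with s = 2, ghosts = m as well
        self.primary = nn.Sequential(  # step 1: ordinary convolution, Eq. (2)
            nn.Conv2d(c_in, m, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(m),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(    # step 2: cheap linear ops, Eq. (3)
            nn.Conv2d(m, c_out - m, dw_k, padding=dw_k // 2, groups=m, bias=False),
            nn.BatchNorm2d(c_out - m),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)                          # intrinsic feature maps
        return torch.cat([y, self.cheap(y)], dim=1)  # intrinsic + ghost maps

x = torch.randn(1, 64, 80, 80)
print(GhostModule(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```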
2. CA attention mechanism
Coordinate Attention (CA) is a lightweight attention mechanism published at CVPR by Hou Q. et al. in 2021. It can be flexibly inserted into classic mobile networks with almost no computational overhead, lets the network capture information over a wider area, and outperforms attention mechanisms such as Squeeze-and-Excitation (SE).
The CA attention mechanism is carried out in two steps — Coordinate Information Embedding and Coordinate Attention Generation — whose structure is shown in fig. 4.
(1) Coordinate Information Embedding: to enable the CA attention module to capture long-range spatial interactions with precise positional information, the global average pooling of equation (5) is decomposed into a pair of 1D feature-encoding operations. For a given input X, each channel is first encoded along the horizontal and vertical directions using pooling kernels of size (H, 1) and (1, W) respectively. The outputs of the c-th channel at height h and at width w are given by equations (6) and (7) respectively:

z_c = (1/(H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} x_c(i, j)    (5)

z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i)    (6)

z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w)    (7)
(2) Coordinate Attention Generation: to exploit the global receptive field obtained by the transformations above and encode precise positional information, the two embeddings produced by the coordinate information embedding step are concatenated and the channels compressed with a convolution transform, reducing the channel count from C to C/r (r controls the compression ratio), as in equation (8):

f = δ(F_1([z^h, z^w]))    (8)

In equation (8), F_1 denotes a 1 × 1 convolution transform, [·,·] denotes the Concat operation, and δ denotes a non-linear activation function.
Further, f is decomposed into two separate tensors f^h and f^w, and 1 × 1 convolutions restore the channel count to match the residual input, as shown in equations (9) and (10):

g^h = σ(F_h(f^h))    (9)

g^w = σ(F_w(f^w))    (10)

In equations (9) and (10), F_h and F_w denote 1 × 1 convolution transforms and σ denotes the Sigmoid function.
Finally, the CA attention module combines the two steps of Coordinate Information Embedding and Coordinate Attention Generation, producing the output Y as in equation (11):

y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)    (11)
In the invention, the CA attention mechanism is separately embedded into the 9th layer of the Backbone network and fused with the GhostC3 module, strengthening the network's ability to learn expressive features.
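For reference, a minimal PyTorch sketch of the Coordinate Attention block of equations (5)-(11) follows; the reduction ratio r = 32 and the Hardswish non-linearity for δ are assumptions borrowed from the CA authors' reference code:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate Attention sketch: pool along H and W separately, share a
    1x1 bottleneck, then re-weight the input with two directional gates."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)            # C/r compressed channels
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (H, 1) pooling, Eq. (6)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (1, W) pooling, Eq. (7)
        self.conv1 = nn.Conv2d(channels, mid, 1)       # F_1 in Eq. (8)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()                      # delta in Eq. (8)
        self.conv_h = nn.Conv2d(mid, channels, 1)      # F_h in Eq. (9)
        self.conv_w = nn.Conv2d(mid, channels, 1)      # F_w in Eq. (10)

    def forward(self, x):
        n, c, h, w = x.shape
        zh = self.pool_h(x)                       # N x C x H x 1
        zw = self.pool_w(x).permute(0, 1, 3, 2)   # N x C x W x 1
        f = self.act(self.bn(self.conv1(torch.cat([zh, zw], dim=2))))  # Eq. (8)
        fh, fw = f.split([h, w], dim=2)           # split back into f^h and f^w
        gh = self.conv_h(fh).sigmoid()                      # Eq. (9)
        gw = self.conv_w(fw.permute(0, 1, 3, 2)).sigmoid()  # Eq. (10)
        return x * gh * gw                # Eq. (11), broadcast over W and H

print(CoordAtt(128)(torch.randn(1, 128, 40, 40)).shape)  # (1, 128, 40, 40)
```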
3. SIOU loss function
SIOU LOSS is a recent loss function in the object-detection field, proposed by Gevorgyan Z. in May 2022. Besides the distance loss, aspect-ratio loss and IOU loss between the Ground Truth box and the prediction box already considered by earlier loss functions (GIOU, CIOU, etc.), it also considers the loss of the Ground Truth box and the prediction box in the direction, which alleviates the slow convergence caused by the prediction box "wandering around" during training. The SIOU loss consists of four cost functions — angle cost (L_ang), distance cost (L_dis), shape cost (L_sha) and IoU cost (L_IOU) — and is computed as in equation (12), with the angle cost folded into the distance cost:

L_SIOU = 1 − IOU + (Δ + Ω)/2    (12)

The angle cost Λ, distance cost Δ, shape cost Ω and IOU are given by equations (13)-(16):

L_ang = Λ = 1 − 2 sin²(arcsin(c_h/σ) − π/4)    (13)

L_dis = Δ = Σ_{t=x,y} (1 − e^(−γρ_t)),  γ = 2 − Λ    (14)

L_sha = Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ    (15)

L_IOU: IOU = |B ∩ B^GT| / |B ∪ B^GT|    (16)

In equation (13), σ is the distance between the center points of the two boxes and c_h is their difference in height, where (b_cx, b_cy) denote the horizontal and vertical coordinates of the prediction-box center and (b_cx^gt, b_cy^gt) those of the Ground Truth box center. In equation (14), ρ_x = ((b_cx^gt − b_cx)/c_w)² and ρ_y = ((b_cy^gt − b_cy)/c_h)², where c_w and c_h denote the width and height of the smallest box enclosing the prediction box and the Ground Truth box. In equation (15), ω_w = |w − w^gt|/max(w, w^gt) and ω_h = |h − h^gt|/max(h, h^gt), where w, h are the width and height of the prediction box, w^gt, h^gt those of the Ground Truth box, and θ is a parameter controlling how much attention is paid to the shape cost. In equation (16), B denotes the prediction box and B^GT the Ground Truth box.
In the method, SIOU replaces the original CIOU so that the loss of the Ground Truth box and the prediction box in the direction is taken into account; the gradient-descent direction during training is therefore more accurate, improving both the convergence speed of the model and the accuracy of inference.
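A minimal PyTorch sketch of the SIOU loss of equations (12)-(16) for corner-format boxes follows; θ = 4 and the small-epsilon guards are illustrative assumptions:

```python
import math
import torch

def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    """SIOU loss sketch. pred, gt: (N, 4) tensors of (x1, y1, x2, y2) boxes."""
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2

    # IoU cost, Eq. (16)
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(0)
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # smallest enclosing box (c_w, c_h of Eq. (14))
    cw = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    ch = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])

    # angle cost, Eq. (13): sigma = center distance, sin(alpha) = |dy| / sigma
    sigma = torch.sqrt((cx2 - cx1) ** 2 + (cy2 - cy1) ** 2) + eps
    sin_alpha = (torch.abs(cy2 - cy1) / sigma).clamp(-1, 1)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # distance cost, Eq. (14), with gamma = 2 - Lambda
    gamma = 2 - angle
    rho_x = ((cx2 - cx1) / (cw + eps)) ** 2
    rho_y = ((cy2 - cy1) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape cost, Eq. (15)
    ww = torch.abs(w1 - w2) / (torch.max(w1, w2) + eps)
    wh = torch.abs(h1 - h2) / (torch.max(h1, h2) + eps)
    shape = (1 - torch.exp(-ww)) ** theta + (1 - torch.exp(-wh)) ** theta

    return 1 - iou + (dist + shape) / 2  # Eq. (12)
```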
Furthermore, the invention also provides a specific embodiment for experimental verification:
the invention selects the green pea cabbage worm and the anucleate wok orange canker as research objects. The image of the Pieris rapae is collected from a Pieris rapae town of Wuming district Gong province, nanning city of Zhuang nationality in Guangxi, and the weather is cloudy on the same day. Adopting a digital camera to shoot 441 Dutch bean cabbage caterpillar pictures from different angles, wherein the resolution is 5184 multiplied by 3456; the image of the anucleate Or citrus canker is collected from the Yiwogan plantation in the Wuming district of Nanning City of Zhuang nationality in Guangxi, and the weather is sunny. 200 Dutch bean and cabbage caterpillar pictures are shot by adopting a Huawei P40 mobile phone from different angles, and the resolution is 4096 multiplied by 3072. The two pests have been identified by agricultural plant protection experts as shown in fig. 5 and fig. 6, where fig. 5 is the cabbage caterpillar and fig. 6 is the anucleate vorax citri canker.
Further, the specific preprocessing steps for the cabbage caterpillar pictures were: (1) collect the original pictures, (2) apply histogram equalization, expanding the 441 cabbage caterpillar pictures to 881. For the seedless Orah citrus canker pictures, the specific preprocessing steps were: (1) collect the original pictures, (2) crop and flip, (3) apply histogram equalization, expanding the 200 citrus canker pictures to 800. Specific cases are shown in fig. 7.
Specifically, figs. 7 (a)-(b) show data enhancement of the cabbage caterpillar images: (a) original picture, (b) original picture + histogram equalization. Figs. 7 (c)-(h) show data enhancement of the seedless Orah citrus canker images: (c) original picture, (d) original picture + histogram equalization, (e) original picture + crop, (f) crop + flip, (g) crop + histogram equalization, (h) crop + flip + histogram equalization.
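A minimal OpenCV sketch of such an enhancement pipeline follows; equalizing only the luma channel in YCrCb space is one common recipe, and the file name is hypothetical, since the patent does not specify its exact procedure:

```python
import cv2

def equalize_color(img_bgr):
    """Histogram equalization for a colour image: equalize the luma channel
    in YCrCb space so the hues are preserved."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

img = cv2.imread("canker_0001.jpg")       # hypothetical file name
flipped = cv2.flip(img, 1)                # horizontal flip
variants = [
    img,                                  # original picture
    equalize_color(img),                  # original + histogram equalization
    flipped,                              # flip (cropping omitted for brevity)
    equalize_color(flipped),              # flip + histogram equalization
]
```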
Data annotation was then performed on the pictures with LabelImg software, generating labels in Pascal VOC format stored as xml files. This produced a data set of 1681 pest and disease images for the training, validation and testing of the subsequent models.
Besides the data-enhancement modes above, mosaic data enhancement was adopted during training to further expand the training set. The mosaic method was proposed by Alexey Bochkovskiy et al. in the 2020 YOLOv4 paper; its main idea is to randomly crop 4 pictures from the data set and splice them into one picture. This not only expands the data set but also enriches the backgrounds of the detected objects, and it works well for small-target recognition. A mosaic-enhanced picture is shown in fig. 8.
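A simplified mosaic sketch follows. It pastes random crops of four images into fixed quadrants and omits the bounding-box remapping; the YOLOv4/YOLOv5 implementations additionally randomize the mosaic center point:

```python
import random
import cv2
import numpy as np

def mosaic4(images, out_size=640):
    """Mosaic augmentation sketch: splice random crops of 4 data-set images
    into the quadrants of a single out_size x out_size canvas."""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey fill
    corners = [(0, 0), (0, half), (half, 0), (half, half)]  # quadrant origins
    for img, (y, x) in zip(images, corners):
        h, w = img.shape[:2]
        if h < half or w < half:                   # upscale small sources first
            img = cv2.resize(img, (max(w, half), max(h, half)))
            h, w = img.shape[:2]
        y0 = random.randint(0, h - half)           # random crop origin
        x0 = random.randint(0, w - half)
        canvas[y:y + half, x:x + half] = img[y0:y0 + half, x0:x0 + half]
    return canvas
```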
The experimental platform was a Linux server. On the software side, the operating system was Ubuntu 20.04, PyCharm was used as the program-editing environment, and a PyTorch 1.9.0 runtime was built in an Anaconda environment with Python 3.8 as the interpreter and CUDA version 11.2. On the hardware side, the processor was an Intel i9-11900K, the graphics card an RTX3090-24G, and the memory 64G. The experimental setup is shown in table 2.
TABLE 2 Experimental equipment configuration parameters
Operating system: Ubuntu 20.04
Graphics card: RTX3090-24G
Memory: 64G
Programming language: Python 3.8.1
Deep learning framework: PyTorch 1.9.0
GPU acceleration environment: CUDA 11.2
Processor: Intel i9-11900K
To quantify the performance of the Light-YOLOv5 network model on the test set, six evaluation indexes are used — Precision (P), Recall (R), Average Precision (AP), mean Average Precision (mAP), parameter count (Params) and floating-point operations (GFLOPs) — defined in formulas (17)-(22):
P = TP / (TP + FP)    (17)

R = TP / (TP + FN)    (18)

AP = ∫₀¹ P(R) dR    (19)

mAP = (1/N) Σ_{i=1}^{N} AP_i    (20)

Params = C_in × C_out × k²    (21)

FLOPs ≈ 2 × H × W × Params = 2 × H × W × C_in × C_out × k²,  1 GFLOPs = 10⁹ FLOPs    (22)
wherein, precision (P) represents the proportion of the predicted pair in the positive sample predicted by the model, namely Precision; recall (R) represents the predicted fraction of the actual positive sample, i.e., the Recall ratio; the Average Precision (AP) is the Average value of P in any one category, the quality of the evaluation model in the category can be approximately calculated as the area enclosed by a P-R curve and a coordinate axis in practical application, the Mean Average Precision (mAP) represents the Average value of the AP in all categories, and the quality of the evaluation model in all categories is obtained; parameters (Params) represent Parameters contained in the model, directly determine the size of a model file and influence the occupation amount of the memory during reasoning; the Floating Point Operation Quantity (GFLOPs) represents Floating Point calculation times required in reasoning, determines complexity of calculation time, and is 1GFLOPs =1 × 10 9 FLOPs。
In the equations (17) - (20), TP is True Positive, and means that it is determined as a Positive sample, and is actuallyIs also a positive sample; TN is True Negative, which means that it is determined as Negative sample, and actually is also Negative sample; FP is False Positive, which means that the sample is judged to be Positive, but actually is negative; FN is False Negative, indicating that it is judged to be a Negative sample, but actually a positive sample. In the formulae (21) and (22), H, w, C in Respectively the height, width and channel number of the input feature map, k is the convolution kernel size, C out Is the number of output channels.
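A minimal NumPy sketch of how P, R and AP in equations (17)-(20) can be computed from confidence-ranked detections follows (all-point interpolation of the P-R curve is an assumption; YOLOv5's own evaluation code uses a closely related numerical integration):

```python
import numpy as np

def precision_recall(scores, is_tp, n_gt):
    """P/R points for one class: detections sorted by confidence; is_tp flags
    each detection as a TP (IoU above threshold with an unmatched GT box)."""
    order = np.argsort(-np.asarray(scores))
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    return tp / (tp + fp), tp / n_gt              # Eqs. (17) and (18)

def average_precision(precision, recall):
    """Area under the P-R curve (all-point interpolation), Eq. (19)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]      # enforce a monotone envelope
    idx = np.nonzero(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Eq. (20): mAP is the mean of the per-class APs (two classes in this data set).
```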
The specific training process is as follows: first, the data set is divided into training, validation and test sets at a ratio of 8:1:1; the input pictures are then adaptively padded and scaled to 640 × 640; finally, they are fed into the lightweight Light-YOLOv5 network model for training.
During training the Batch-size is set to 16 (about 85 batches per epoch) and 500 epochs are iterated with an initial learning rate of 0.01. The model is validated on the validation set once per epoch, and the resulting prediction data are fed back to the model to correct the gradient-descent direction during training; after training, the optimal model weights are saved for model testing. Performance statistics are then gathered on the test set with the optimal weights, yielding the optimal pest and disease recognition model, Light-YOLOv5. Finally, the original YOLOv5s and YOLOX_s from the YOLO series are trained on the same data set and their test results compared. The flow chart of the experiment is shown in fig. 9.
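A minimal sketch of the 8:1:1 division follows; the fixed seed and pre-sort are assumptions added for reproducibility, since the patent does not state how the split was randomized:

```python
import random

def split_8_1_1(image_paths, seed=0):
    """Shuffle and divide the 1681-image data set into train/val/test at 8:1:1."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (paths[:n_train],                       # training set
            paths[n_train:n_train + n_val],        # validation set
            paths[n_train + n_val:])               # test set
```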
After the model training iterated for 500 epochs, the convergence behavior on the training and validation sets was saved and the loss-function curves drawn, as shown in fig. 10. They comprise: the bounding-box loss (Bounding Box Loss, or Localization Loss), the objectness loss (Objectness Loss, also called Confidence Loss), and the classification loss (Classification Loss). The bounding-box loss represents the error between the prediction box and the Ground Truth box; the smaller it is, the more accurate the prediction box. The objectness loss measures the probability that an object exists in a region of interest, i.e. that the prediction is correct; the smaller it is, the more accurate the detection. The classification loss measures errors in predicting the class of a given object; the smaller it is, the more accurate the classification.
In fig. 10, the first three plots show the convergence of the training-set loss functions and the last three that of the validation-set loss functions; "Box" and "val Box" denote the Bounding Box Loss on the training set and validation set respectively, "Objectness" and "val Objectness" the Objectness Loss, and "Classification" and "val Classification" the Classification Loss. The loss functions decrease steadily during training: over the first 80 epochs they drop sharply while the P, R and mAP indexes rise rapidly; from epoch 80 to about 450 the decrease gradually slows, as does the rise of P, R and mAP; by about epoch 450 the loss functions stabilize and no longer fall, P, R and mAP no longer increase, and the model has converged to its optimal state.
The remaining, untrained test set was then input into the Light-YOLOv5 model for testing. The test results were compared with the original YOLOv5 and YOLOX_s, as shown in figs. 11-14.
Figure 11 shows the performance results on the cabbage caterpillar test set. Compared with the original YOLOv5 model, the Light-YOLOv5 network model improves P by 1.4%, R by 4.2% and AP by 1.8%; compared with the YOLOX_s model, P decreases by 1.2% while R improves by 1.4% and AP by 7.3%.
Figure 12 shows the performance statistics on the seedless Orah citrus canker test set. Compared with the original YOLOv5 model, the Light-YOLOv5 network model's P decreases by 4.1%, R improves by 1.1% and AP decreases by 2.2%; compared with the YOLOX_s model, P decreases by 2.4% while R improves by 1.1% and AP by 3.4%.
Figures 13 and 14 show the statistical performance of the Light-YOLOv5, original YOLOv5 and YOLOX_s network models on the test sets of the two categories, cabbage caterpillar and seedless Orah citrus canker. Compared with the original YOLOv5 and YOLOX_s, the Light-YOLOv5 model's Params decrease by 59.2% and 67.4% respectively, its GFLOPs by 51% and 70.5%, its mAP@0.5 by 0.2% and 5.4%, and its mAP@0.5:0.95 by 8.5% and 1.4%.
In conclusion, the Light-YOLOv5 proposed by the invention performs best with the IOU threshold set to 0.5: the AP for cabbage caterpillar and seedless Orah citrus canker recognition reaches 97% and 93.4% respectively, and mAP@0.5 reaches 95.2%, while Params and GFLOPs are reduced by more than 50% with AP and mAP kept at a comparable level. This saves computing resources, facilitates deployment on edge-computing devices, and allows two Light-YOLOv5 models to run in parallel under the same computing power.
Further, to verify the actual classification effect of Light-YOLOv5, the original YOLOv5 and YOLOX_s, the three networks were tested on the cabbage caterpillar and the seedless Orah citrus canker; the recognition results are shown in fig. 15.
Specifically, fig. 15 is a schematic comparison of the recognition results of the three network models, where (a)-(c) target the cabbage caterpillar: (a) Light-YOLOv5, (b) original YOLOv5, (c) YOLOX_s; and (d)-(f) target the seedless Orah citrus canker: (d) Light-YOLOv5, (e) original YOLOv5, (f) YOLOX_s.
As shown in fig. 15, the recognition effects of the three networks are similar: all can effectively detect the cabbage caterpillar and the seedless Orah citrus canker. In practical application scenarios, as long as the network model can effectively detect the type and location of a pest or disease, the problem can be diagnosed and treated correctly. Given that all three models effectively detect correct results, the parameter and floating-point operation counts matter more, so the Light-YOLOv5 network model outperforms the original YOLOv5 and YOLOX_s.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (5)

1. A lightweight network design method for pest and disease identification applications, characterized by comprising the following steps:
selecting the original YOLOv5 network and introducing the GhostModule structure;
inserting the CA attention mechanism;
replacing the loss function from CIOU to SIOU to generate the Light-YOLOv5 network model;
and training the Light-YOLOv5 network model with a pest and disease image set, then predicting pests and diseases with the trained Light-YOLOv5 network model.
2. A pest identification application-oriented lightweight network design method according to claim 1,
the process of introducing the GhostModule structure comprises the following steps:
replacing the Bottleneck in the C3 module with GhostBottleneck, and replacing the remaining CBS composite convolution modules in the C3 module with GhostCBS, to form a complete GhostC3 module;
replacing the SPP module with GhostSPP;
and replacing all the CBS modules responsible for downsampling with GhostCBS.
3. A pest identification application-oriented lightweight network design method according to claim 2, wherein,
a step of inserting a CA attention mechanism, comprising the steps of:
embedding coordinate information;
and generating coordinate information.
4. A pest identification application-oriented lightweight network design method according to claim 3, wherein,
the CA attention mechanism is embedded in the layer 9 of the Backbone network separately and is fused with the GhostC3 module.
5. A lightweight network design method for pest identification application according to claim 4,
in the process of replacing the loss function from CIOU to SIOU, the SIOU loss function considers the distance loss, the aspect-ratio loss and the IOU loss between the Ground Truth box and the prediction box included in the CIOU loss function, and additionally considers the loss of the Ground Truth box and the prediction box in the direction.
CN202211303885.3A 2022-10-24 2022-10-24 Lightweight network design method for pest and disease identification application Pending CN115761305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211303885.3A CN115761305A (en) 2022-10-24 2022-10-24 Lightweight network design method for pest and disease identification application

Publications (1)

Publication Number Publication Date
CN115761305A true CN115761305A (en) 2023-03-07

Family

ID=85353043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211303885.3A Pending CN115761305A (en) 2022-10-24 2022-10-24 Lightweight network design method for pest and disease identification application

Country Status (1)

Country Link
CN (1) CN115761305A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination