CN115761305A - Lightweight network design method for pest and disease identification application - Google Patents

Lightweight network design method for pest and disease identification application

Info

Publication number
CN115761305A
CN115761305A (application number CN202211303885.3A)
Authority
CN
China
Prior art keywords
network
yolov5
loss
module
pest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211303885.3A
Other languages
Chinese (zh)
Inventor
周省邦
赵戈
李传起
刘书田
陈东
曾倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning Normal University
Original Assignee
Nanning Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanning Normal University filed Critical Nanning Normal University
Priority to CN202211303885.3A priority Critical patent/CN115761305A/en
Publication of CN115761305A publication Critical patent/CN115761305A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of pest and disease prediction, in particular to a lightweight network design method for pest and disease identification applications, which proposes a Light-YOLOv5 network model on the basis of the original YOLOv5s network. First, the Ghost convolution idea is introduced to replace part of the convolution modules in the original YOLOv5 network, reducing the number of network parameters. Second, a lightweight CA attention mechanism is separately embedded into the 9th layer of the network and fused with the GhostC3 module to form the GhostC3CA module, enhancing the feature extraction and fusion capability of the network model. Finally, the original CIOU loss function is replaced with SIOU to accelerate model convergence and make the gradient-descent direction during training more accurate. With its accuracy remaining close to the original's, the Light-YOLOv5 network model saves computing resources and is better suited to deployment on mobile devices where computing resources are scarce.

Description

Lightweight network design method for pest and disease identification application
Technical Field
The invention relates to the technical field of pest and disease prediction, in particular to a lightweight network design method for pest and disease identification applications.
Background
Crop diseases and insect pests are among the most complex, variable and hard-to-overcome factors restricting crop growth, and are a major cause of agricultural production and economic losses worldwide. Losses can be reduced, and the yield of agricultural products improved, only by discovering pests and diseases in time and diagnosing and treating them correctly. Traditional pest and disease identification relies mostly on diagnosis by agricultural experts; it depends on subjective expert experience and suffers from low efficiency, large error and high cost.
With the development of image processing technology, crop pest and disease identification based on machine learning has become possible, but features still need to be extracted manually: the operation is cumbersome, abstract features are hard to extract, and accuracy is difficult to raise to an application-ready level. Subsequently, with the development of Graphics Processing Units (GPUs), deep learning techniques relying on powerful GPU processing capability have advanced rapidly and are widely used in agricultural pest and disease identification. Compared with traditional identification methods and machine-learning-based methods, deep learning achieves higher accuracy in crop pest and disease identification, but for most deep-learning network models this accuracy comes at the cost of greater model complexity: they heavily consume high-performance GPUs and incur huge computational overhead, which severely limits their use on low-cost, low-compute mobile terminals.
Existing methods for building lightweight deep-learning network models still suffer from the technical problems of low recognition accuracy, complex operation, and large parameter and computation counts.
Disclosure of Invention
The invention aims to provide a lightweight network design method for pest and disease identification applications, so as to solve the technical problems of low model recognition accuracy, complex operation, and large parameter and computation counts that existing lightweight deep-learning network models face in pest and disease prediction.
In order to achieve this purpose, the invention provides a lightweight network design method for pest and disease identification applications, which comprises the following steps:
selecting the original YOLOv5 network and introducing the GhostModule structure;
inserting a CA attention mechanism;
replacing the loss function from CIOU to SIOU to generate the Light-YOLOv5 network model;
and training the Light-YOLOv5 network model with a pest and disease image set, then predicting pests and diseases with the trained Light-YOLOv5 network model.
The process of introducing the GhostModule structure comprises the following steps:
replacing the Bottleneck in the C3 module with GhostBottleneck, and replacing the remaining CBS composite convolution modules in the C3 module with GhostCBS, to form a complete GhostC3 module;
replacing the SPP module with GhostSPP;
and replacing all the CBS modules responsible for downsampling with GhostCBS.
Wherein, the step of inserting the CA attention mechanism comprises the following steps:
embedding coordinate information;
and generating coordinate information.
Wherein the CA attention mechanism is separately embedded in the 9th layer of the Backbone network and is fused with the GhostC3 module.
In the process of replacing the loss function from CIOU to SIOU, the SIOU loss function considers the distance loss, the aspect-ratio loss and the IOU loss between the Ground Truth box and the prediction box included in the CIOU loss function, and additionally considers the loss of the Ground Truth box and the prediction box in the direction.
The invention provides a lightweight network design method for pest and disease identification applications, which proposes a Light-YOLOv5 network model on the basis of the original YOLOv5s network. First, the Ghost convolution idea is introduced to replace part of the convolution modules in the original YOLOv5 network, reducing the number of network parameters. Second, a lightweight CA attention mechanism is separately embedded into the 9th layer of the network and fused with the GhostC3 module to form the GhostC3CA module, enhancing the feature extraction and fusion capability of the network model. Finally, the original CIOU loss function is replaced with SIOU to accelerate model convergence and make the gradient-descent direction during training more accurate. With its accuracy remaining close to the original's, the Light-YOLOv5 network model saves computing resources and is better suited to deployment on mobile devices where computing resources are scarce.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic structural diagram of a Ghost-series network module in the lightweight network design method for pest identification application of the present invention.
Fig. 2 is a schematic diagram of the Light-YOLOv5 network structure in the lightweight network design method for pest identification application of the present invention.
Fig. 3 is a graph comparing the normal convolution with the Ghost convolution.
Fig. 4 is a schematic diagram of the Coordinate Attention (CA) module structure of the present invention.
FIG. 5 is a schematic representation of the snow pea cabbage caterpillar (Pieris rapae) image set employed in an embodiment of the present invention.
FIG. 6 is a schematic representation of the seedless Orah citrus canker image set employed in a specific embodiment of the present invention.
FIG. 7 is a diagram illustrating a pattern for enhancing data of two sets of images according to an embodiment of the present invention.
Fig. 8 is a schematic diagram illustrating a result of the mosaic data enhancement processing according to the embodiment of the present invention.
FIG. 9 is a schematic flow chart of an experimental procedure according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating the convergence of a model on a training set and a validation set in an embodiment of the present invention.
FIG. 11 is a graphical representation of performance results for the cabbage caterpillar test set in a specific embodiment of the present invention.
FIG. 12 is a graphical representation of performance statistics for the seedless Orah citrus canker test set in a specific embodiment of the invention.
FIG. 13 is a graph showing statistical performance of three network models on the cabbage caterpillar test set in an embodiment of the present invention.
FIG. 14 is a graphical representation of statistical performance of three network models on the seedless Orah citrus canker test set in an embodiment of the invention.
Fig. 15 is a diagram illustrating comparison of recognition results of three network models according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
Referring to fig. 1 to 4, the invention provides a lightweight network design method for pest identification application, comprising the following steps:
S1: selecting the original YOLOv5 network and introducing the GhostModule structure;
S2: inserting the CA attention mechanism;
S3: replacing the loss function from CIOU to SIOU to generate the Light-YOLOv5 network model;
S4: training the Light-YOLOv5 network model with a pest and disease image set, and predicting pests and diseases with the trained Light-YOLOv5 network model.
The invention is further explained by combining background knowledge and implementation steps as follows:
YOLOv5 is a single-stage detection algorithm proposed by Glenn Jocher in 2020. According to its two scaling factors, network depth and width, it comes in four network models of increasing complexity: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. Weighing the two factors of recognition accuracy and light weight, the invention selects the YOLOv5s network model for the following research.
The GhostModule idea is first introduced into the YOLOv5s network model: the Bottleneck in the C3 module is replaced with GhostBottleneck and the remaining CBS composite convolution modules in the C3 module with GhostCBS, forming a complete GhostC3 module; the SPP module is replaced with GhostSPP; and all the CBS modules responsible for downsampling in the original network are replaced with GhostCBS. The Ghost family of network components is shown in fig. 1. The Coordinate Attention (CA) mechanism is then introduced. Weighing factors such as network light weight and recognition accuracy, the CA module is embedded into the 9th layer of the Backbone network; in addition, the CA module is fused with the GhostC3 module, further improving the feature extraction and fusion capability of the network model. The resulting Light-YOLOv5 network model structure is shown in fig. 2, and the detailed structure parameters are listed in table 1.
TABLE 1 Light-YOLOv5 network architecture details
(Table 1 appears as an image in the original publication.)
In table 1, "layers" indicates the layer number, "from" indicates which layer the current layer's input comes from, "params" indicates the number of parameters, and "module" indicates the module name; "-1" denotes the previous layer, "Concat" denotes the splicing operation, and "n" denotes the number of times the module is repeated.
As shown in fig. 2, the input picture is resized to 640 × 640 × 3 and fed into the Light-YOLOv5 network; image features are extracted by the Backbone network, the extracted features are fused in the Neck network, and the Head network produces prediction results at three different scales: 80 × 80 × 255, 40 × 40 × 255 and 20 × 20 × 255.
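These shapes follow directly from the three detection strides of YOLOv5; a minimal sketch of the arithmetic follows, assuming the stock 3-anchor, 80-class COCO head (which is where the 255-channel figure comes from — a 2-class head as in this application would instead have 3 × (5 + 2) = 21 channels):

```python
# Minimal sketch: input size, detection strides and head grid sizes in YOLOv5.
img_size = 640
num_anchors, num_classes = 3, 80            # stock YOLOv5/COCO values (assumed)
channels = num_anchors * (5 + num_classes)  # 5 = box coords (4) + objectness (1)
for stride in (8, 16, 32):                  # the three YOLOv5 detection strides
    grid = img_size // stride
    print(f"stride {stride:>2}: {grid} x {grid} x {channels}")
# stride  8: 80 x 80 x 255
# stride 16: 40 x 40 x 255
# stride 32: 20 x 20 x 255
```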
YOLOv5s v5.0 uses CIOU as its loss function, with the formula:

L_CIOU = 1 − IOU + ρ²(b, b^gt)/c² + αv,  where v = (4/π²)·(arctan(w^gt/h^gt) − arctan(w/h))² and α = v/((1 − IOU) + v)

Here IOU is the ratio of the intersection to the union of the prediction box and the Ground Truth box, ρ²(b, b^gt) is the squared distance between the center points of the two boxes, c is the diagonal length of the smallest region that simultaneously contains the prediction box and the Ground Truth box, and w, h and w^gt, h^gt denote the width and height of the prediction box and of the Ground Truth box respectively.
To make the improved Light-YOLOv5 network perform even better, the loss function is changed to SIOU, which accelerates the convergence of the network model and makes the gradient-descent direction more accurate.
Specifically, three important definitions in the improvement step are as follows:
1. GhostModule structure
GhostNet is a cheap and efficient neural-network architecture proposed by Han K. et al. in 2020. It reduces the computational cost of ordinary convolution layers while maintaining similar recognition performance, is highly modular and portable, and is effectively plug-and-play. Fig. 3 compares ordinary convolution with Ghost convolution, where (a) shows the structure of ordinary convolution and (b) that of Ghost convolution. The core idea of GhostNet is to split the ordinary convolution operation into two steps according to the relations between feature maps: first, a small number of feature maps are generated by ordinary convolution; second, cheap linear operations are applied to these feature maps to enhance the features, and finally the two groups of feature maps are concatenated on the channel dimension. Running the intrinsic feature maps in parallel with their linear transformations preserves the intrinsic feature maps and lets the generated feature maps match a given number of output channels.
The ordinary convolution is computed as in equation (1):

Y = X * f + b    (1)

where * is the convolution operation and b is the bias term. For a given input X ∈ R^(c×h×w), the output is Y ∈ R^(n×h′×w′), where c is the number of input channels, h, w and h′, w′ are the height and width of the input and output data respectively, and f ∈ R^(c×k×k×n) denotes n convolution kernels of size k × k × c. The FLOPs of an ordinary convolution can therefore be expressed as n × h′ × w′ × c × k × k.
The Ghost convolution instead proceeds in two steps, as shown in equations (2) and (3):

Y′ = X * f′    (2)

y_ij = Φ_(i,j)(y′_i),  i = 1, …, m,  j = 1, …, s    (3)

In equation (2), the input X is passed through m (m < n) ordinary convolutions with kernel size k × k × c, producing a small set of intrinsic feature maps Y′ ∈ R^(m×h′×w′). Cheap linear operations are then applied to these m feature maps, each generating s feature maps, so that n = m × s feature maps are produced as the output of the Ghost module, as shown in equation (3). In equation (3), Φ_(i,j) denotes the j-th linear operation (including the identity mapping that preserves the intrinsic feature maps) applied to the i-th feature map y′_i generated by the ordinary convolution in the first step.
Assuming that each linear transformation has kernel size d × d, the theoretical speed-up ratio of replacing ordinary convolution with Ghost convolution is given by equation (4):

r_s = (n·h′·w′·c·k·k) / ((n/s)·h′·w′·c·k·k + (s−1)·(n/s)·h′·w′·d·d) ≈ (s·c)/(s + c − 1) ≈ s    (4)

Simplifying equation (4) shows that the computation of ordinary convolution is approximately s times that of Ghost convolution, and the parameter count can likewise be shown to be about s times larger.
The invention reduces the dimension of the convolution output in the GhostCBS module to half of the original, then applies a cheaper linear transformation — depth-wise convolution — to the feature maps output by the GhostCBS module inside GhostBottleneck. The depth-wise convolution removes correlation between channels, so each channel feature depends only on itself; this both simulates the way redundant features are generated and markedly reduces the parameter and computation counts.
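As a concrete illustration of the two-step Ghost convolution described above, a minimal PyTorch sketch follows. The module name, activation, kernel sizes and the ratio s = 2 are illustrative assumptions, not the patent's exact GhostCBS/GhostBottleneck definitions:

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Ghost convolution sketch: an ordinary convolution produces m intrinsic
    feature maps, a cheap depth-wise convolution derives the remaining "ghost"
    maps, and the two groups are concatenated on the channel dimension."""
    def __init__(self, c_in, c_out, k=1, dw_k=3, s=2):
        super().__init__()
        m = c_out // s  # intrinsic maps (m < n); with s = 2, ghosts = m as well
        self.primary = nn.Sequential(  # step 1: ordinary convolution, Eq. (2)
            nn.Conv2d(c_in, m, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(m),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(    # step 2: cheap linear ops, Eq. (3)
            nn.Conv2d(m, c_out - m, dw_k, padding=dw_k // 2, groups=m, bias=False),
            nn.BatchNorm2d(c_out - m),
            nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)                          # intrinsic feature maps
        return torch.cat([y, self.cheap(y)], dim=1)  # intrinsic + ghost maps

x = torch.randn(1, 64, 80, 80)
print(GhostModule(64, 128)(x).shape)  # torch.Size([1, 128, 80, 80])
```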
2. CA attention mechanism
Coordinate Attention (CA) is a lightweight attention mechanism published at CVPR by Hou Q. et al. in 2021. It can be flexibly inserted into classic mobile networks with almost no computational overhead, lets the network capture information over a wider area, and outperforms attention mechanisms such as Squeeze-and-Excitation (SE).
The CA attention mechanism is carried out in two steps — Coordinate Information Embedding and Coordinate Attention Generation — whose structure is shown in fig. 4.
(1) Coordinate Information Embedding: to enable the CA attention module to capture long-range spatial interactions with precise positional information, the global average pooling of equation (5) is decomposed into a pair of 1D feature-encoding operations. For a given input X, each channel is first encoded along the horizontal and vertical directions using pooling kernels of size (H, 1) and (1, W) respectively. The outputs of the c-th channel at height h and at width w are given by equations (6) and (7) respectively:

z_c = (1/(H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} x_c(i, j)    (5)

z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i)    (6)

z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w)    (7)
(2) Coordinate Attention Generation: to exploit the global receptive field obtained by the transformations above and encode precise positional information, the two embeddings produced by the coordinate information embedding step are concatenated and the channels compressed with a convolution transform, reducing the channel count from C to C/r (r controls the compression ratio), as in equation (8):

f = δ(F_1([z^h, z^w]))    (8)

In equation (8), F_1 denotes a 1 × 1 convolution transform, [·,·] denotes the Concat operation, and δ denotes a non-linear activation function.
Further, f is decomposed into two separate tensors f^h and f^w, and 1 × 1 convolutions restore the channel count to match the residual input, as shown in equations (9) and (10):

g^h = σ(F_h(f^h))    (9)

g^w = σ(F_w(f^w))    (10)

In equations (9) and (10), F_h and F_w denote 1 × 1 convolution transforms and σ denotes the Sigmoid function.
Finally, the CA attention module combines the two steps of Coordinate Information Embedding and Coordinate Attention Generation, producing the output Y as in equation (11):

y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j)    (11)
In the invention, the CA attention mechanism is separately embedded into the 9th layer of the Backbone network and fused with the GhostC3 module, strengthening the network's ability to learn expressive features.
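For reference, a minimal PyTorch sketch of the Coordinate Attention block of equations (5)-(11) follows; the reduction ratio r = 32 and the Hardswish non-linearity for δ are assumptions borrowed from the CA authors' reference code:

```python
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    """Coordinate Attention sketch: pool along H and W separately, share a
    1x1 bottleneck, then re-weight the input with two directional gates."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)            # C/r compressed channels
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (H, 1) pooling, Eq. (6)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (1, W) pooling, Eq. (7)
        self.conv1 = nn.Conv2d(channels, mid, 1)       # F_1 in Eq. (8)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()                      # delta in Eq. (8)
        self.conv_h = nn.Conv2d(mid, channels, 1)      # F_h in Eq. (9)
        self.conv_w = nn.Conv2d(mid, channels, 1)      # F_w in Eq. (10)

    def forward(self, x):
        n, c, h, w = x.shape
        zh = self.pool_h(x)                       # N x C x H x 1
        zw = self.pool_w(x).permute(0, 1, 3, 2)   # N x C x W x 1
        f = self.act(self.bn(self.conv1(torch.cat([zh, zw], dim=2))))  # Eq. (8)
        fh, fw = f.split([h, w], dim=2)           # split back into f^h and f^w
        gh = self.conv_h(fh).sigmoid()                      # Eq. (9)
        gw = self.conv_w(fw.permute(0, 1, 3, 2)).sigmoid()  # Eq. (10)
        return x * gh * gw                # Eq. (11), broadcast over W and H

print(CoordAtt(128)(torch.randn(1, 128, 40, 40)).shape)  # (1, 128, 40, 40)
```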
3. SIOU loss function
SIOU LOSS is a recent loss function in the object-detection field, proposed by Gevorgyan Z. in May 2022. Besides the distance loss, aspect-ratio loss and IOU loss between the Ground Truth box and the prediction box already considered by earlier loss functions (GIOU, CIOU, etc.), it also considers the loss of the Ground Truth box and the prediction box in the direction, which alleviates the slow convergence caused by the prediction box "wandering around" during training. The SIOU loss consists of four cost functions — angle cost (L_ang), distance cost (L_dis), shape cost (L_sha) and IoU cost (L_IOU) — and is computed as in equation (12), with the angle cost folded into the distance cost:

L_SIOU = 1 − IOU + (Δ + Ω)/2    (12)

The angle cost Λ, distance cost Δ, shape cost Ω and IOU are given by equations (13)-(16):

L_ang = Λ = 1 − 2 sin²(arcsin(c_h/σ) − π/4)    (13)

L_dis = Δ = Σ_{t=x,y} (1 − e^(−γρ_t)),  γ = 2 − Λ    (14)

L_sha = Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ    (15)

L_IOU: IOU = |B ∩ B^GT| / |B ∪ B^GT|    (16)

In equation (13), σ is the distance between the center points of the two boxes and c_h is their difference in height, where (b_cx, b_cy) denote the horizontal and vertical coordinates of the prediction-box center and (b_cx^gt, b_cy^gt) those of the Ground Truth box center. In equation (14), ρ_x = ((b_cx^gt − b_cx)/c_w)² and ρ_y = ((b_cy^gt − b_cy)/c_h)², where c_w and c_h denote the width and height of the smallest box enclosing the prediction box and the Ground Truth box. In equation (15), ω_w = |w − w^gt|/max(w, w^gt) and ω_h = |h − h^gt|/max(h, h^gt), where w, h are the width and height of the prediction box, w^gt, h^gt those of the Ground Truth box, and θ is a parameter controlling how much attention is paid to the shape cost. In equation (16), B denotes the prediction box and B^GT the Ground Truth box.
In the method, SIOU replaces the original CIOU so that the loss of the Ground Truth box and the prediction box in the direction is taken into account; the gradient-descent direction during training is therefore more accurate, improving both the convergence speed of the model and the accuracy of inference.
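A minimal PyTorch sketch of the SIOU loss of equations (12)-(16) for corner-format boxes follows; θ = 4 and the small-epsilon guards are illustrative assumptions:

```python
import math
import torch

def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    """SIOU loss sketch. pred, gt: (N, 4) tensors of (x1, y1, x2, y2) boxes."""
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2

    # IoU cost, Eq. (16)
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(0)
    inter = iw * ih
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # smallest enclosing box (c_w, c_h of Eq. (14))
    cw = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    ch = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])

    # angle cost, Eq. (13): sigma = center distance, sin(alpha) = |dy| / sigma
    sigma = torch.sqrt((cx2 - cx1) ** 2 + (cy2 - cy1) ** 2) + eps
    sin_alpha = (torch.abs(cy2 - cy1) / sigma).clamp(-1, 1)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # distance cost, Eq. (14), with gamma = 2 - Lambda
    gamma = 2 - angle
    rho_x = ((cx2 - cx1) / (cw + eps)) ** 2
    rho_y = ((cy2 - cy1) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape cost, Eq. (15)
    ww = torch.abs(w1 - w2) / (torch.max(w1, w2) + eps)
    wh = torch.abs(h1 - h2) / (torch.max(h1, h2) + eps)
    shape = (1 - torch.exp(-ww)) ** theta + (1 - torch.exp(-wh)) ** theta

    return 1 - iou + (dist + shape) / 2  # Eq. (12)
```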
Furthermore, the invention also provides a specific embodiment for experimental verification:
the invention selects the green pea cabbage worm and the anucleate wok orange canker as research objects. The image of the Pieris rapae is collected from a Pieris rapae town of Wuming district Gong province, nanning city of Zhuang nationality in Guangxi, and the weather is cloudy on the same day. Adopting a digital camera to shoot 441 Dutch bean cabbage caterpillar pictures from different angles, wherein the resolution is 5184 multiplied by 3456; the image of the anucleate Or citrus canker is collected from the Yiwogan plantation in the Wuming district of Nanning City of Zhuang nationality in Guangxi, and the weather is sunny. 200 Dutch bean and cabbage caterpillar pictures are shot by adopting a Huawei P40 mobile phone from different angles, and the resolution is 4096 multiplied by 3072. The two pests have been identified by agricultural plant protection experts as shown in fig. 5 and fig. 6, where fig. 5 is the cabbage caterpillar and fig. 6 is the anucleate vorax citri canker.
Further, the specific preprocessing steps for the cabbage caterpillar pictures were: (1) collect the original pictures, (2) apply histogram equalization, expanding the 441 cabbage caterpillar pictures to 881. For the seedless Orah citrus canker pictures, the specific preprocessing steps were: (1) collect the original pictures, (2) crop and flip, (3) apply histogram equalization, expanding the 200 citrus canker pictures to 800. Specific cases are shown in fig. 7.
Specifically, figs. 7 (a)-(b) show data enhancement of the cabbage caterpillar images: (a) original picture, (b) original picture + histogram equalization. Figs. 7 (c)-(h) show data enhancement of the seedless Orah citrus canker images: (c) original picture, (d) original picture + histogram equalization, (e) original picture + crop, (f) crop + flip, (g) crop + histogram equalization, (h) crop + flip + histogram equalization.
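A minimal OpenCV sketch of such an enhancement pipeline follows; equalizing only the luma channel in YCrCb space is one common recipe, and the file name is hypothetical, since the patent does not specify its exact procedure:

```python
import cv2

def equalize_color(img_bgr):
    """Histogram equalization for a colour image: equalize the luma channel
    in YCrCb space so the hues are preserved."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

img = cv2.imread("canker_0001.jpg")       # hypothetical file name
flipped = cv2.flip(img, 1)                # horizontal flip
variants = [
    img,                                  # original picture
    equalize_color(img),                  # original + histogram equalization
    flipped,                              # flip (cropping omitted for brevity)
    equalize_color(flipped),              # flip + histogram equalization
]
```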
Data annotation was then performed on the pictures with LabelImg software, generating labels in Pascal VOC format stored as xml files. This produced a data set of 1681 pest and disease images for the training, validation and testing of the subsequent models.
Besides the data-enhancement modes above, mosaic data enhancement was adopted during training to further expand the training set. The mosaic method was proposed by Alexey Bochkovskiy et al. in the 2020 YOLOv4 paper; its main idea is to randomly crop 4 pictures from the data set and splice them into one picture. This not only expands the data set but also enriches the backgrounds of the detected objects, and it works well for small-target recognition. A mosaic-enhanced picture is shown in fig. 8.
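A simplified mosaic sketch follows. It pastes random crops of four images into fixed quadrants and omits the bounding-box remapping; the YOLOv4/YOLOv5 implementations additionally randomize the mosaic center point:

```python
import random
import cv2
import numpy as np

def mosaic4(images, out_size=640):
    """Mosaic augmentation sketch: splice random crops of 4 data-set images
    into the quadrants of a single out_size x out_size canvas."""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey fill
    corners = [(0, 0), (0, half), (half, 0), (half, half)]  # quadrant origins
    for img, (y, x) in zip(images, corners):
        h, w = img.shape[:2]
        if h < half or w < half:                   # upscale small sources first
            img = cv2.resize(img, (max(w, half), max(h, half)))
            h, w = img.shape[:2]
        y0 = random.randint(0, h - half)           # random crop origin
        x0 = random.randint(0, w - half)
        canvas[y:y + half, x:x + half] = img[y0:y0 + half, x0:x0 + half]
    return canvas
```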
The experimental platform was a Linux server. On the software side, the operating system was Ubuntu 20.04, PyCharm was used as the program-editing environment, and a PyTorch 1.9.0 runtime was built in an Anaconda environment with Python 3.8 as the interpreter and CUDA version 11.2. On the hardware side, the processor was an Intel i9-11900K, the graphics card an RTX3090-24G, and the memory 64G. The experimental setup is shown in table 2.
TABLE 2 Experimental equipment configuration parameters
Operating system: Ubuntu 20.04
Graphics card: RTX3090-24G
Memory: 64G
Programming language: Python 3.8.1
Deep learning framework: PyTorch 1.9.0
GPU acceleration environment: CUDA 11.2
Processor: Intel i9-11900K
To quantify the performance of the Light-YOLOv5 network model on the test set, six evaluation indexes are used — Precision (P), Recall (R), Average Precision (AP), mean Average Precision (mAP), parameter count (Params) and floating-point operations (GFLOPs) — defined in formulas (17)-(22):
P = TP / (TP + FP)    (17)

R = TP / (TP + FN)    (18)

AP = ∫₀¹ P(R) dR    (19)

mAP = (1/N) Σ_{i=1}^{N} AP_i    (20)

Params = C_in × C_out × k²    (21)

FLOPs ≈ 2 × H × W × Params = 2 × H × W × C_in × C_out × k²,  1 GFLOPs = 10⁹ FLOPs    (22)
wherein, precision (P) represents the proportion of the predicted pair in the positive sample predicted by the model, namely Precision; recall (R) represents the predicted fraction of the actual positive sample, i.e., the Recall ratio; the Average Precision (AP) is the Average value of P in any one category, the quality of the evaluation model in the category can be approximately calculated as the area enclosed by a P-R curve and a coordinate axis in practical application, the Mean Average Precision (mAP) represents the Average value of the AP in all categories, and the quality of the evaluation model in all categories is obtained; parameters (Params) represent Parameters contained in the model, directly determine the size of a model file and influence the occupation amount of the memory during reasoning; the Floating Point Operation Quantity (GFLOPs) represents Floating Point calculation times required in reasoning, determines complexity of calculation time, and is 1GFLOPs =1 × 10 9 FLOPs。
In the equations (17) - (20), TP is True Positive, and means that it is determined as a Positive sample, and is actuallyIs also a positive sample; TN is True Negative, which means that it is determined as Negative sample, and actually is also Negative sample; FP is False Positive, which means that the sample is judged to be Positive, but actually is negative; FN is False Negative, indicating that it is judged to be a Negative sample, but actually a positive sample. In the formulae (21) and (22), H, w, C in Respectively the height, width and channel number of the input feature map, k is the convolution kernel size, C out Is the number of output channels.
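A minimal NumPy sketch of how P, R and AP in equations (17)-(20) can be computed from confidence-ranked detections follows (all-point interpolation of the P-R curve is an assumption; YOLOv5's own evaluation code uses a closely related numerical integration):

```python
import numpy as np

def precision_recall(scores, is_tp, n_gt):
    """P/R points for one class: detections sorted by confidence; is_tp flags
    each detection as a TP (IoU above threshold with an unmatched GT box)."""
    order = np.argsort(-np.asarray(scores))
    tp = np.cumsum(np.asarray(is_tp, dtype=float)[order])
    fp = np.cumsum(1.0 - np.asarray(is_tp, dtype=float)[order])
    return tp / (tp + fp), tp / n_gt              # Eqs. (17) and (18)

def average_precision(precision, recall):
    """Area under the P-R curve (all-point interpolation), Eq. (19)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]      # enforce a monotone envelope
    idx = np.nonzero(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Eq. (20): mAP is the mean of the per-class APs (two classes in this data set).
```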
The specific training process is as follows: first, the data set is divided into training, validation and test sets at a ratio of 8:1:1; the input pictures are then adaptively padded and scaled to 640 × 640; finally, they are fed into the lightweight Light-YOLOv5 network model for training.
During training the Batch-size is set to 16 (about 85 batches per epoch) and 500 epochs are iterated with an initial learning rate of 0.01. The model is validated on the validation set once per epoch, and the resulting prediction data are fed back to the model to correct the gradient-descent direction during training; after training, the optimal model weights are saved for model testing. Performance statistics are then gathered on the test set with the optimal weights, yielding the optimal pest and disease recognition model, Light-YOLOv5. Finally, the original YOLOv5s and YOLOX_s from the YOLO series are trained on the same data set and their test results compared. The flow chart of the experiment is shown in fig. 9.
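A minimal sketch of the 8:1:1 division follows; the fixed seed and pre-sort are assumptions added for reproducibility, since the patent does not state how the split was randomized:

```python
import random

def split_8_1_1(image_paths, seed=0):
    """Shuffle and divide the 1681-image data set into train/val/test at 8:1:1."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (paths[:n_train],                       # training set
            paths[n_train:n_train + n_val],        # validation set
            paths[n_train + n_val:])               # test set
```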
After the model training iterated for 500 epochs, the convergence behavior on the training and validation sets was saved and the loss-function curves drawn, as shown in fig. 10. They comprise: the bounding-box loss (Bounding Box Loss, or Localization Loss), the objectness loss (Objectness Loss, also called Confidence Loss), and the classification loss (Classification Loss). The bounding-box loss represents the error between the prediction box and the Ground Truth box; the smaller it is, the more accurate the prediction box. The objectness loss measures the probability that an object exists in a region of interest, i.e. that the prediction is correct; the smaller it is, the more accurate the detection. The classification loss measures errors in predicting the class of a given object; the smaller it is, the more accurate the classification.
In fig. 10, the first three plots show the convergence of the training-set loss functions and the last three that of the validation-set loss functions; "Box" and "val Box" denote the Bounding Box Loss on the training set and validation set respectively, "Objectness" and "val Objectness" the Objectness Loss, and "Classification" and "val Classification" the Classification Loss. The loss functions decrease steadily during training: over the first 80 epochs they drop sharply while the P, R and mAP indexes rise rapidly; from epoch 80 to about 450 the decrease gradually slows, as does the rise of P, R and mAP; by about epoch 450 the loss functions stabilize and no longer fall, P, R and mAP no longer increase, and the model has converged to its optimal state.
The remaining, untrained test set was then input into the Light-YOLOv5 model for testing. The test results were compared with the original YOLOv5 and YOLOX_s, as shown in figs. 11-14.
Figure 11 shows the performance results on the cabbage caterpillar test set. Compared with the original YOLOv5 model, the Light-YOLOv5 network model improves P by 1.4%, R by 4.2% and AP by 1.8%; compared with the YOLOX_s model, P decreases by 1.2% while R improves by 1.4% and AP by 7.3%.
Figure 12 shows the performance statistics on the seedless Orah citrus canker test set. Compared with the original YOLOv5 model, the Light-YOLOv5 network model's P decreases by 4.1%, R improves by 1.1% and AP decreases by 2.2%; compared with the YOLOX_s model, P decreases by 2.4% while R improves by 1.1% and AP by 3.4%.
Figures 13 and 14 show the statistical performance of the Light-YOLOv5, original YOLOv5 and YOLOX_s network models on the test sets of the two categories, cabbage caterpillar and seedless Orah citrus canker. Compared with the original YOLOv5 and YOLOX_s, the Light-YOLOv5 model's Params decrease by 59.2% and 67.4% respectively, its GFLOPs by 51% and 70.5%, its mAP@0.5 by 0.2% and 5.4%, and its mAP@0.5:0.95 by 8.5% and 1.4%.
In conclusion, the Light-YOLOv5 proposed by the invention performs best with the IOU threshold set to 0.5: the AP for cabbage caterpillar and seedless Orah citrus canker recognition reaches 97% and 93.4% respectively, and mAP@0.5 reaches 95.2%, while Params and GFLOPs are reduced by more than 50% with AP and mAP kept at a comparable level. This saves computing resources, facilitates deployment on edge-computing devices, and allows two Light-YOLOv5 models to run in parallel under the same computing power.
Further, to verify the actual classification effect of Light-YOLOv5, the original YOLOv5 and YOLOX_s, the three networks were tested on the cabbage caterpillar and the seedless Orah citrus canker; the recognition results are shown in fig. 15.
Specifically, fig. 15 is a schematic comparison of the recognition results of the three network models, where (a)-(c) target the cabbage caterpillar: (a) Light-YOLOv5, (b) original YOLOv5, (c) YOLOX_s; and (d)-(f) target the seedless Orah citrus canker: (d) Light-YOLOv5, (e) original YOLOv5, (f) YOLOX_s.
As shown in fig. 15, the recognition effects of the three networks are similar: all can effectively detect the cabbage caterpillar and the seedless Orah citrus canker. In practical application scenarios, as long as the network model can effectively detect the type and location of a pest or disease, the problem can be diagnosed and treated correctly. Given that all three models effectively detect correct results, the parameter and floating-point operation counts matter more, so the Light-YOLOv5 network model outperforms the original YOLOv5 and YOLOX_s.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (5)

1. A lightweight network design method for pest and disease identification applications, characterized by comprising the following steps:
selecting the original YOLOv5 network and introducing the GhostModule structure;
inserting the CA attention mechanism;
replacing the loss function from CIOU to SIOU to generate the Light-YOLOv5 network model;
and training the Light-YOLOv5 network model with a pest and disease image set, then predicting pests and diseases with the trained Light-YOLOv5 network model.
2. A pest identification application-oriented lightweight network design method according to claim 1,
the process of introducing the GhostModule structure comprises the following steps:
replacing the Bottleneck in the C3 module with GhostBottleneck, and replacing the remaining CBS composite convolution modules in the C3 module with GhostCBS, to form a complete GhostC3 module;
replacing the SPP module with GhostSPP;
and replacing all the CBS modules responsible for downsampling with GhostCBS.
3. A pest identification application-oriented lightweight network design method according to claim 2, wherein,
a step of inserting a CA attention mechanism, comprising the steps of:
embedding coordinate information;
and generating coordinate information.
4. A pest identification application-oriented lightweight network design method according to claim 3, wherein,
the CA attention mechanism is embedded in the layer 9 of the Backbone network separately and is fused with the GhostC3 module.
5. A lightweight network design method for pest identification application according to claim 4,
in the process of replacing the loss function from CIOU to SIOU, the SIOU loss function considers the distance loss, the aspect-ratio loss and the IOU loss between the Ground Truth box and the prediction box included in the CIOU loss function, and additionally considers the loss of the Ground Truth box and the prediction box in the direction.
CN202211303885.3A 2022-10-24 2022-10-24 Lightweight network design method for pest and disease identification application Pending CN115761305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211303885.3A CN115761305A (en) 2022-10-24 2022-10-24 Lightweight network design method for pest and disease identification application

Publications (1)

Publication Number Publication Date
CN115761305A true CN115761305A (en) 2023-03-07

Family

ID=85353043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211303885.3A Pending CN115761305A (en) 2022-10-24 2022-10-24 Lightweight network design method for pest and disease identification application

Country Status (1)

Country Link
CN (1) CN115761305A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination