CN114708231A - Sugarcane aphid target detection method based on light-weight YOLO v5 - Google Patents

Sugarcane aphid target detection method based on light-weight YOLO v5

Info

Publication number
CN114708231A
CN114708231A
Authority
CN
China
Prior art keywords
yolo
light
module
convolution
sugarcane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210371938.9A
Other languages
Chinese (zh)
Inventor
徐伟悦
徐涛
王炎
陈伟
张储
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202210371938.9A priority Critical patent/CN114708231A/en
Publication of CN114708231A publication Critical patent/CN114708231A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing

Abstract

The invention relates to the technical field of target detection, in particular to a sugarcane aphid target detection method based on light-weight YOLO v5, which comprises the following steps: S1: collecting sugarcane aphid images; S2: preprocessing the images; S3: making a data set; S4: lightweight improvement of the YOLO v5 model; S5: training on the input data set to obtain a lightweight model. The method uses the Stem module and the Inverted Residual module of ShuffleNet V2 and reduces the neck network width to lighten the YOLO v5 model, while deleting the large-scale detection layer and adding a micro-scale detection layer, so that the network better fits the detection of dense small sugarcane aphid targets; compared with the original YOLO v5 model, the method offers high recognition accuracy, high recognition speed and low memory occupancy, favoring deployment of the model on mobile terminals.

Description

Sugarcane aphid target detection method based on light-weight YOLO v5
Technical Field
The invention relates to the technical field of target detection, in particular to a sugarcane aphid target detection method based on light-weight YOLO v5.
Background
Sorghum is frequently attacked by sugarcane aphids during its growth, which undermines its healthy development; outbreaks occur regularly and cause large economic losses in sorghum yield every year. Sugarcane aphids on sorghum leaves therefore need to be monitored quickly and accurately so that reasonable prevention measures can be taken and economic losses reduced. Traditional sugarcane aphid detection still relies mainly on field workers entering the plots to observe the distribution of aphids on the leaves, judging the infestation grade by manual counting and diagnosing the pest situation of the region. This approach entails a heavy workload and low efficiency, and cannot rapidly and accurately reflect or predict pest outbreaks.
Early pest identification methods relied mainly on pest features such as color, texture and morphology, fed into machine learning methods such as decision trees, gradient boosting and support vector machines. These methods place strict requirements on pest characteristics in the data set, such as body size and color; their recognition results are not stable, their detection rate is low, and they depend on controlled conditions rather than the actual natural farmland environment.
With the growth of computing resources, deep learning has progressed rapidly, particularly in the image domain, providing a technical basis for lightweight farmland pest detection. As deep learning technology has matured, excellent detection models such as SSD, Faster R-CNN and YOLO have emerged in the field of target detection. These high-performing target detection models extract features through convolutional neural networks and generally localize targets based on anchor boxes. Recently, anchor-free target detection algorithms such as CenterNet have also appeared, but when such prior-box-free algorithms are applied to very small target objects such as pests, their performance is unsatisfactory.
Existing pest detection methods are typically trained on images taken under stable lighting or a single environmental condition (e.g., controlled illumination, greenhouses and pest traps) as the data set. For example, Mamdouh et al. [1] detected olive fruit fly images captured on yellow traps using an improved YOLOv4, achieving an average precision of 96.68%; Hong et al. [2] used Faster R-CNN to detect images of the pine pest Matsucoccus thunbergianae taken under indoor four-way LED illumination, achieving average precision between 80.2% and 89.8%. Although the pest recognition accuracy of these methods is high, when such models face pest detection against unstructured farmland backgrounds and complex natural illumination, they suffer from low accuracy and poor robustness, especially in recognizing and counting extremely small pest targets; meanwhile, their large parameter counts demand hardware with substantial computing resources, making large-scale application to agricultural pest detection difficult.
To address the problems of numerous parameters, complex networks and heavy computation in existing pest target detection models, the invention provides a lightweight sugarcane aphid target detection algorithm. The model can be deployed on low-cost mobile devices, is suited to recognizing sugarcane aphids under the variable natural illumination of farmland, and ultimately enables large-scale, intelligent application of a sugarcane aphid detection system.
[1] Mamdouh N, Khattab A. YOLO-based deep learning framework for olive fruit fly detection and counting[J]. IEEE Access, 2021, 9: 84252-84262.
[2] Hong S J, Nam I, Kim S Y, et al. Automatic pest counting from pheromone trap images using deep learning object detectors for Matsucoccus thunbergianae monitoring[J]. Insects, 2021, 12(4): 342.
Disclosure of Invention
Aiming at the shortcomings of existing algorithms, the invention provides a sugarcane aphid detection method based on light-weight YOLO v5 that is suited to the variable natural illumination of farmland; the method offers high recognition accuracy, high recognition speed and low memory occupancy, favoring deployment on mobile terminal platforms.
The technical scheme adopted by the invention is as follows: a sugarcane aphid target detection method based on light-weight YOLO v5 comprises the following steps:
s1: collecting sugarcane aphid images;
further, a camera is used to collect images of grain sorghum leaves bearing sugarcane aphids as experimental image data; the acquired images include four illumination conditions: weak light, strong light, direct light and diffuse light, matching the variable natural illumination of farmland;
s2: preprocessing an image;
further, each acquired original image is divided into several sub-images of fixed pixel size to serve as the experimental data set; because sugarcane aphids are extremely small relative to the original image, the original image is segmented so that these small targets can be processed;
s3: making a data set;
further, the sugarcane aphids in each image of the data set are manually labeled with LabelImg, using the minimum bounding rectangle of each target as the ground-truth box; 80% of the images in the prepared data set are randomly selected as training data for the sugarcane aphid detection algorithm, and the remaining 20% serve as test data for testing and verifying the sugarcane aphid detection method provided by the invention;
s4: lightweight improvement of the YOLO v5 model;
further, the method comprises the following steps:
s41, replacing the Focus module in the backbone network with the Stem module;
the Stem module is an economical and efficient module that effectively improves feature expression capability without adding extra computation; its concrete structure first applies a 3 × 3 convolution with stride 2 for fast dimensionality reduction, then uses a two-branch structure, where one branch applies a 1 × 1 convolution followed by a 3 × 3 convolution with stride 2 and the other branch applies max pooling; finally the two branches are joined by Concat and passed through a 1 × 1 convolution, so that an image is downsampled 4 × after passing through the Stem module;
the pairing of the stride-2 3 × 3 convolution with a max-pooling branch draws on the advantage of combined pooling (which applies max pooling and mean pooling in parallel), allowing the Stem module to enrich the feature layers;
s42, replacing the C3 module, the Conv module and the SPP module in the backbone network with the Inverted Residual module of ShuffleNet V2;
the Inverted Residual module of ShuffleNet V2 combines two units, and whether downsampling is performed is controlled by the input stride parameter; its concrete structure is as follows:
when the stride is 1, the Inverted Residual module extracts features with repeated 1 × 1 convolutions and 3 × 3 depthwise convolutions, and uses shortcut connections to increase network depth; a Channel split operation added before each shortcut connection divides the input feature channels into two parts, which effectively improves the computational efficiency of the convolutional neural network and reduces the parameter count; after the two branches are joined by Concat, a Channel shuffle module scrambles and rearranges the channels to mix features, improving the feature expression capability and detection accuracy of the convolutional neural network model; the application of Channel split and Channel shuffle lowers the model's computational complexity and memory occupancy and greatly improves its computational efficiency;
when the stride is 2, the Inverted Residual module uses a downsampling unit; Channel split is no longer applied, and both branches operate directly on the original input, each applying a 3 × 3 depthwise convolution with stride 2 for downsampling; one branch appends a 1 × 1 convolution afterwards, while the other adds a 1 × 1 convolution both before and after; after the two branches are joined by Concat, the spatial size of the feature map is halved while the number of channels doubles; finally a Channel shuffle module scrambles and rearranges the channels to mix features, improving the feature expression capability and detection accuracy of the convolutional neural network model;
in depthwise convolution, each convolution kernel is responsible for a single channel and each channel is convolved by only one kernel; that is, every channel of the input layer undergoes its own independent convolution.
S43: reducing the neck network width;
the C3 and Conv modules of the YOLO v5 neck network have many convolution kernels and therefore generate feature maps with many channels, which occupy more cache, slow down operation and add to the computation of the whole network; to lighten the model, the number of convolution kernels of the neck-network C3 and Conv modules in the YOLO v5 configuration file is reduced to 128, which cuts the number of feature-map channels, narrows the network, reduces the model's parameter count and computation, and speeds up inference;
s44: improving the detection scale;
when the input image size is 640 × 640, the 3 detection scales of YOLO v5 are 20 × 20, 40 × 40 and 80 × 80; the 20 × 20 scale, whose cells each correspond to a 32 × 32 pixel receptive field, mainly detects large objects; the 40 × 40 scale, with 16 × 16 pixel receptive fields, mainly detects medium objects; and the 80 × 80 scale, with 8 × 8 pixel receptive fields, mainly detects small objects;
sugarcane aphids photographed under natural illumination are scattered over every position and corner of the leaves, and aphid forms from different growth stages coexist on a leaf, so the aphids mapped into the image vary in size; dense images in particular contain many aphid targets; because a single aphid to be detected occupies only a tiny proportion of the input image, repeated downsampling causes the tiny-target feature information extracted by the convolutional layers to be lost during training, so that small-bodied aphid larvae cannot be recognized; a dense image may contain aphid targets at medium, small, micro and other scales; the small-scale detection layer of YOLO v5 is poorly suited to the tiny aphid larvae, while the large-scale detection layer contributes little to aphid detection; to cope with complex dense scenes under natural illumination, the method modifies the three detection scales of YOLO v5: the 40 × 40 and 80 × 80 scales for medium and small targets are retained, the 20 × 20 large-target scale is deleted, and a 160 × 160 micro-scale detection layer is added; the micro-scale detection layer fuses the shallow spatial features extracted after the Stem module's 4 × downsampling of the input image with deep semantic features to generate its feature map; the new micro-scale detection layer forms a wider and finer detection network structure, better suited to detecting the tiny, dense sugarcane aphids in images under the natural illumination scene of farmland, so that the network better matches the targets to be detected;
s5: training on the input data set to obtain a lightweight model.
Using the PyTorch 1.7.0 deep learning framework, the training image size is set to 640 × 640; the learning rate, batch size, number of iterations and number of classes are set to 0.001, 16, 500 and 1 respectively; the data set is trained with YOLO v5s, the smallest of the four YOLO v5 models; the data set pictures are augmented with the Mosaic enhancement method of YOLO v5: Mosaic data enhancement splices 4 random pictures by random scaling, random cropping and random arrangement, then scales the spliced picture to the set input size and feeds it into the model as a new sample; this enriches the position distribution of targets and to some extent enlarges small targets, improving the model's generalization ability while raising training efficiency; training on the input data set yields a sugarcane aphid target detection model based on light-weight YOLO v5.
The invention has the beneficial effects that:
according to the sugarcane aphid detection method based on light-weight YOLO v5, the Stem module and the Inverted Residual module of ShuffleNet V2 are used and the neck network width is reduced to lighten the YOLO v5 model, while the large-scale detection layer is deleted and a micro-scale detection layer is added, so that the network better fits the detection of dense small sugarcane aphid targets; compared with the original YOLO v5 model, the method offers high recognition accuracy, high recognition speed and low memory occupancy, favoring deployment of the model on mobile terminals.
Drawings
FIG. 1 is a general flow diagram of a sugarcane aphid target detection method based on light weight YOLO v 5;
FIG. 2 is a diagram of a sugarcane aphid target detection model based on YOLO v 5;
FIG. 3 is a diagram of a lightweight YOLO v 5-based sugarcane aphid target detection model structure;
FIG. 4 is a diagram of a Stem module architecture;
FIG. 5 is a block diagram of the Inverted Residual module of ShuffleNet V2;
FIG. 6 is a schematic diagram of a Channel shuffle rearrangement process;
FIG. 7 is a diagram of a neck network structure for the improved model;
FIG. 8 is a comparison of detection layer improvements;
FIG. 9 shows example training images after Mosaic data enhancement;
FIG. 10 is a schematic diagram of the detection of sugarcane aphid image targets based on the lightweight YOLO v5 model.
Detailed Description
The invention will be further described with reference to the accompanying drawings and embodiments; the drawings are simplified schematics that illustrate the basic structure of the invention, and therefore show only the structures relevant to the invention.
As shown in fig. 1, a sugarcane aphid target detection method based on light-weight YOLO v5 comprises the following steps:
s1: collecting sugarcane aphid images;
the images used for the experiments are images of grain sorghum leaves bearing sugarcane aphids, taken with a camera under the natural illumination of the field (image resolution 2448 × 3264); the vertical distance between the camera and the leaf is about 0.2 m; the acquired images include four illumination conditions: weak light (illumination intensity below 20 kLux), strong light (illumination intensity above 30 kLux), direct light (taken between noon and afternoon on sunny days with sun angles above 40°) and diffuse light (taken at illumination intensity between 20 kLux and 30 kLux).
S2: preprocessing an image;
because the original image resolution is high, a sugarcane aphid occupies an extremely small proportion of the original image, and repeated downsampling would cause the aphid target feature information extracted by the convolutional layers to be lost during training, leaving targets unrecognized; each collected sugarcane aphid image is therefore cut into 4 × 4 sub-images with a resolution of 612 × 816, and 300 clearly photographed sub-images containing aphid targets were randomly selected from them as the data set of this study, containing 2443 sugarcane aphid targets in total.
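By way of illustration only, a minimal sketch of this tiling step is given below, assuming the Pillow library; the tile_image helper and the file naming are hypothetical, not part of the patented method:

from pathlib import Path
from PIL import Image

def tile_image(src: Path, dst_dir: Path, rows: int = 4, cols: int = 4) -> None:
    # cut one 2448 x 3264 capture into a 4 x 4 grid of 612 x 816 sub-images
    img = Image.open(src)
    w, h = img.size
    tile_w, tile_h = w // cols, h // rows  # 612 x 816 at the stated resolution
    dst_dir.mkdir(parents=True, exist_ok=True)
    for r in range(rows):
        for c in range(cols):
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            img.crop(box).save(dst_dir / f"{src.stem}_r{r}c{c}.jpg")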
S3: making a data set;
the sugarcane aphids in each image of the data set are manually labeled with LabelImg, using the minimum bounding rectangle of each target as the ground-truth box; 80% of the prepared data set (240 images, 2230 aphid samples) is randomly selected as training data for the sugarcane aphid detection algorithm, and the remaining 20% (60 images, 213 aphid samples) serve as test data for testing and verifying the detection method provided by the invention;
s4: lightweight improvement of the YOLO v5 model;
as shown in fig. 2, the YOLO v5 model uses C3 modules in the backbone network to extract image features, adopts the FPN (feature pyramid network) + PAN (path aggregation network) structure as the neck network to better fuse the extracted features, and uses detection heads at three scales (large, medium and small) to form the head network that detects target objects;
the basic building blocks of the target detection network based on YOLO v5 are as follows:
a Focus module: slices the input image to expand the input channels while retaining complete image information, so that full downsampling information is preserved for subsequent, sufficient feature extraction;
a Conv module: consists of a convolution layer, a batch normalization layer and a SiLU activation function, and is used to extract image features;
a C3 module: the main module for learning residual features; structurally it splits into two branches, one passing through a stack of Bottlenecks and 3 standard convolution layers, the other through a single basic convolution module; the two branches are joined by a Concat operation and the features are output through a basic convolution module;
an SPP (spatial pyramid pooling) module: fuses multi-scale features using max pooling;
as shown in fig. 3, which depicts the light-weight model structure based on YOLO v5, the steps of the lightweight improvement of the YOLO v5 model are as follows:
s41, replacing the Focus module in the backbone network with the Stem module, wherein the specific content is as follows:
the Stem module is an economical and efficient module that effectively improves feature expression capability without adding extra computation. As shown in FIG. 4, its concrete structure first applies a 3 × 3 convolution with stride 2 for fast dimensionality reduction, then uses a two-branch structure, where one branch applies a 1 × 1 convolution followed by a 3 × 3 convolution with stride 2 and the other branch applies max pooling. This part closely resembles combined pooling (which applies max pooling and mean pooling in parallel); drawing on the respective advantages of the stride-2 3 × 3 convolution and max pooling enriches the feature layers. Finally the two branches are joined by Concat and output through a 1 × 1 convolution; an image is downsampled 4 × after passing through the Stem module;
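By way of illustration, a minimal PyTorch sketch of the Stem structure just described follows; the channel widths and the ConvBNSiLU helper are assumptions for the example, not values fixed by the patent:

import torch
import torch.nn as nn

class ConvBNSiLU(nn.Sequential):
    # convolution + batch normalization + SiLU, the basic Conv pattern of YOLO v5
    def __init__(self, c_in, c_out, k, s):
        super().__init__(
            nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(inplace=True),
        )

class Stem(nn.Module):
    def __init__(self, c_in=3, c_mid=32, c_out=64):
        super().__init__()
        self.conv1 = ConvBNSiLU(c_in, c_mid, k=3, s=2)      # fast 2x reduction
        self.branch1 = nn.Sequential(                       # 1x1 conv + stride-2 3x3 conv
            ConvBNSiLU(c_mid, c_mid // 2, k=1, s=1),
            ConvBNSiLU(c_mid // 2, c_mid, k=3, s=2),
        )
        self.branch2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # max-pooling branch
        self.fuse = ConvBNSiLU(2 * c_mid, c_out, k=1, s=1)  # 1x1 output convolution

    def forward(self, x):
        x = self.conv1(x)
        out = torch.cat([self.branch1(x), self.branch2(x)], dim=1)  # Concat
        return self.fuse(out)  # overall 4x downsampling

With a 640 × 640 input, Stem()(torch.randn(1, 3, 640, 640)) yields a 160 × 160 feature map, i.e. the 4 × downsampling stated above.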
s42, replacing the C3 module, the Conv module and the SPP module in the backbone network with the Inverted Residual module of ShuffleNet V2;
ShuffleNet V2 is a lightweight neural network designed for mobile devices; it improves on ShuffleNet V1 according to four design guidelines for lightweight, efficient networks, namely: using 1 × 1 convolutions instead of 1 × 1 group convolutions; introducing a new operation, Channel split, at the beginning of the module; and replacing addition with concatenation in the shortcut connection;
the Inverted Residual module of ShuffleNet V2 combines two units, and whether downsampling is performed is controlled by the incoming stride parameter; its concrete structure is shown in fig. 5:
as shown in fig. 5a, when the stride is 1, the Inverted Residual module extracts features with repeated 1 × 1 convolutions and 3 × 3 depthwise convolutions, and uses shortcut connections to increase network depth; a Channel split operation added before each shortcut connection divides the input feature channels into two parts, which effectively improves the computational efficiency of the convolutional neural network and reduces the parameter count. As shown in fig. 6, after the two branches are joined by Concat, a Channel shuffle module scrambles and rearranges the channels to mix features, improving the feature expression capability and detection accuracy of the model. The application of Channel split and Channel shuffle lowers the model's computational complexity and memory occupancy and greatly improves its computational efficiency;
as shown in fig. 5b, when the stride is 2, the Inverted Residual module uses a downsampling unit; Channel split is not applied, and both branches operate directly on the original input, each applying a 3 × 3 depthwise convolution with stride 2 for downsampling; one branch appends a 1 × 1 convolution afterwards, while the other adds a 1 × 1 convolution both before and after. After the two branches are joined by Concat, the spatial size of the feature map is halved while the number of channels doubles; finally a Channel shuffle module scrambles and rearranges the channels to mix features, improving the feature expression capability and detection accuracy of the model;
in depthwise convolution, each convolution kernel is responsible for a single channel and each channel is convolved by only one kernel; that is, every channel of the input layer undergoes its own independent convolution.
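For illustration, a sketch of the two Inverted Residual units and the Channel shuffle operation follows, patterned on the description above; channel widths are assumptions, and ReLU is used as in ShuffleNet V2:

import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # scramble and rearrange channels to mix the features of the two branches
    b, c, h, w = x.size()
    x = x.view(b, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class InvertedResidual(nn.Module):
    def __init__(self, c_in, c_out, stride):
        super().__init__()
        self.stride = stride
        branch_c = c_out // 2
        if stride == 2:  # downsampling unit: no Channel split, both branches see the input
            self.branch1 = nn.Sequential(
                nn.Conv2d(c_in, c_in, 3, 2, 1, groups=c_in, bias=False),  # 3x3 depthwise, stride 2
                nn.BatchNorm2d(c_in),
                nn.Conv2d(c_in, branch_c, 1, bias=False),                 # 1x1 conv appended after
                nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
            )
        c2 = c_in if stride == 2 else c_in // 2
        self.branch2 = nn.Sequential(
            nn.Conv2d(c2, branch_c, 1, bias=False),                       # 1x1 conv before
            nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
            nn.Conv2d(branch_c, branch_c, 3, stride, 1, groups=branch_c, bias=False),  # depthwise
            nn.BatchNorm2d(branch_c),
            nn.Conv2d(branch_c, branch_c, 1, bias=False),                 # 1x1 conv after
            nn.BatchNorm2d(branch_c), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        if self.stride == 1:
            x1, x2 = x.chunk(2, dim=1)                      # Channel split
            out = torch.cat([x1, self.branch2(x2)], dim=1)  # shortcut branch kept as-is
        else:
            out = torch.cat([self.branch1(x), self.branch2(x)], dim=1)  # spatial size halved
        return channel_shuffle(out)                         # Channel shuffle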
S43: reducing the neck network width;
as shown in fig. 7, the improved model proposed by the invention retains the FPN (feature pyramid network) + PAN (path aggregation network) structure of YOLO v5 as the neck network; the C3 and Conv modules of the neck network have many convolution kernels and therefore generate feature maps with many channels, which occupy more cache, slow down operation and add to the computation of the whole network; to lighten the model, the number of convolution kernels of the neck-network C3 and Conv modules in the configuration file is reduced to 128, which cuts the number of feature-map channels, narrows the network, reduces the model's parameter count and computation, and speeds up inference.
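A back-of-the-envelope count shows why halving the width helps; the 256-channel starting width here is an assumed example, not a figure from the patent:

def conv_params(c_in, c_out, k=3):
    # weights of a k x k convolution, ignoring bias and batch-norm parameters
    return c_in * c_out * k * k

wide = conv_params(256, 256)    # 589,824 weights
narrow = conv_params(128, 128)  # 147,456 weights
print(narrow / wide)            # 0.25 - halving both widths quarters the weights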
S44: the detection scale is improved, and the specific content is as follows:
when the input image size is 640 × 640, the 3 detection scales of YOLO v5 are 20 × 20, 40 × 40 and 80 × 80; the 20 × 20 scale, whose cells each correspond to a 32 × 32 pixel receptive field, mainly detects large objects; the 40 × 40 scale, with 16 × 16 pixel receptive fields, mainly detects medium objects; and the 80 × 80 scale, with 8 × 8 pixel receptive fields, mainly detects small objects;
sugarcane aphids photographed under natural illumination are scattered over every position and corner of the leaves, and aphid forms from different growth stages coexist on a leaf, so the aphids mapped into the image vary in size; dense images in particular contain many aphid targets. Because a single aphid to be detected occupies only a tiny proportion of the input image, repeated downsampling causes the tiny-target feature information extracted by the convolutional layers to be lost during training, so that small-bodied aphid larvae cannot be recognized; a dense image may contain aphid targets at medium, small, micro and other scales. The small-scale detection layer of YOLO v5 is poorly suited to the tiny aphid larvae, while the large-scale detection layer contributes little to aphid detection. To cope with complex dense scenes under natural illumination, as shown in fig. 8, the invention modifies the three detection scales of YOLO v5: the 40 × 40 and 80 × 80 scales for medium and small targets are retained, the original 20 × 20 large-target scale is deleted, and a 160 × 160 micro-scale detection layer is added. The micro-scale detection layer fuses the shallow spatial features extracted after the Stem module's 4 × downsampling of the input image with deep semantic features to generate its feature map. The new micro-scale detection layer forms a wider and finer detection network structure, better suited to detecting the tiny, dense sugarcane aphids in images taken under natural illumination, so that the network better matches the targets to be detected.
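The grid sizes follow directly from the network strides at a 640 × 640 input, as the small check below shows; strides 4, 8 and 16 correspond to the 160 × 160, 80 × 80 and 40 × 40 layers, while the deleted 20 × 20 layer corresponded to stride 32:

input_size = 640
for stride in (4, 8, 16):                # improved head; the original used 8, 16, 32
    print(stride, input_size // stride)  # -> (4, 160), (8, 80), (16, 40)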
S5: training on the input data set to obtain a lightweight model;
using the PyTorch 1.7.0 deep learning framework, the training image size is set to 640 × 640; the learning rate, batch size, number of iterations and number of classes are set to 0.001, 16, 500 and 1 respectively; the data set is trained with YOLO v5s, the smallest of the four YOLO v5 models; as shown in fig. 9, the data set pictures are augmented with the Mosaic enhancement method of YOLO v5: Mosaic data enhancement splices 4 random pictures by random scaling, random cropping and random arrangement, then scales the spliced picture to the set input size and feeds it into the model as a new sample. Mosaic augmentation enriches the position distribution of targets and to some extent enlarges small targets, improving the model's generalization ability while raising training efficiency. Training on the input data set yields a sugarcane aphid target detection model based on light-weight YOLO v5;
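A simplified Mosaic sketch is shown below; it splices four random images into a 2 × 2 canvas with random zoom. The real YOLO v5 Mosaic also remaps the box labels and randomizes the splice centre, which is omitted here for brevity (OpenCV and NumPy assumed):

import random
import numpy as np
import cv2

def simple_mosaic(images, out_size=640):
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for i, img in enumerate(random.sample(images, 4)):  # 4 random pictures
        scale = random.uniform(0.5, 1.5)                # random zoom
        img = cv2.resize(img, None, fx=scale, fy=scale)
        img = img[:half, :half]                         # crop to quadrant size
        h, w = img.shape[:2]
        r, c = (i // 2) * half, (i % 2) * half          # 2 x 2 arrangement
        canvas[r:r + h, c:c + w] = img
    return canvas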
the precision (P), recall (R) and mean average precision (mAP) are adopted as the indexes for evaluating model performance; precision measures how accurate the model's detections are, recall measures how complete they are; the average precision (AP) of a single class is computed by integrating the area enclosed by the precision-recall curve and the coordinate axes; mAP is obtained by summing the AP values of all classes and dividing by the number of classes, and is normally computed at IoU = 0.5, i.e. mAP@0.5, where IoU (intersection over union) is an important quantity in the computation of mAP. The specific formulas are as follows:
$$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|} \tag{1}$$

$$P = \frac{TP}{TP + FP} \tag{2}$$

$$R = \frac{TP}{TP + FN} \tag{3}$$

$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R \tag{4}$$

$$mAP = \frac{1}{C}\sum_{i=1}^{C} AP_i \tag{5}$$
in formula (1), A and B are the prediction box and the ground-truth box respectively; the numerator is the intersection of the two boxes and the denominator is their union. In formulas (2) and (3), TP (true positive) is a positive target correctly predicted as positive; FP (false positive) is a negative target mispredicted as positive; FN (false negative) is a positive target mispredicted as negative. In formula (4), P(R) is the smoothed precision-recall curve, and integrating it gives the area under the smoothed curve. In formula (5), C is the number of classes and AP_i is the average precision of the i-th class; the number of classes in the present invention is 1.
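Equations (1)-(3) translate directly into code; the small sketch below is illustrative (boxes given as (x1, y1, x2, y2) corner coordinates), while AP and mAP (Eqs. (4)-(5)) additionally require ranking detections by confidence and integrating the smoothed precision-recall curve:

def iou(a, b):
    # intersection over union of two axis-aligned boxes, Eq. (1)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision(tp, fp):
    return tp / (tp + fp)   # Eq. (2)

def recall(tp, fn):
    return tp / (tp + fn)   # Eq. (3)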
The experimental results in Table 1 show that the total parameter count of the lightweight model is reduced by 95.13% compared with the original model, the model storage footprint is only 1.03 MB, the inference time is reduced by 61.75%, and the mean average precision (mAP) is improved by 0.9%;
TABLE 1 improved model vs. original model
The improvement scheme of the invention thus raises the accuracy and detection speed of the model while keeping it lightweight; FIG. 10 shows detection results of the light-weight YOLO v5 model on sugarcane aphid images, where the model performs well even on dense aphid images; the sugarcane aphid target detection method suited to the natural illumination of farmland therefore offers high recognition accuracy, high recognition speed and low memory occupancy, favoring deployment of the model on mobile terminals.
In light of the foregoing description of the preferred embodiments, many modifications and variations will occur to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the invention is not limited to the content of the specification and must be determined according to the scope of the claims.

Claims (10)

1. A sugarcane aphid target detection method based on light-weight YOLO v5, characterized by comprising the following steps:
s1: collecting sugarcane aphid images;
s2: preprocessing an image;
s3: making a data set;
s4: lightweight improvement of the YOLO v5 model;
s5: training on the input data set to obtain a lightweight model.
2. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 1, wherein step S1 comprises: the acquired images include four illumination conditions: weak light, strong light, direct light and diffuse light.
3. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 1, wherein step S2 comprises: the acquired original image is divided into a plurality of sub-images of fixed pixel size.
4. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 1, wherein step S3 comprises: and (3) manually labeling the sugarcane aphids of each image in the data set by using LabelImg, wherein the minimum circumscribed rectangle of the target is used as a real frame during labeling.
5. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 1, wherein step S4 comprises:
s41: replacing a Focus module in the backbone network by using a Stem module;
s42: replacing the C3 module, the Conv module and the SPP module in the backbone network with the Inverted Residual module of ShuffleNet V2;
s43: reducing the neck network width;
s44: and improving the detection scale.
6. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 5, wherein step S41 comprises:
the Stem module structure first uses a 3 × 3 convolution with stride 2; then a two-branch structure, where one branch uses a 1 × 1 convolution and a 3 × 3 convolution with stride 2 and the other branch uses max pooling; finally the two branches are joined by Concat and output through a 1 × 1 convolution, so that an image is downsampled 4 × after passing through the Stem module.
7. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 5, wherein step S42 comprises:
when the stride is 1, the Inverted Residual module extracts features with repeated 1 × 1 convolutions and 3 × 3 depthwise convolutions, and uses shortcut connections to increase network depth; a Channel split operation added before each shortcut connection divides the input feature channels into two parts; the two branches are joined by Concat and passed through a Channel shuffle module, which scrambles and rearranges the feature channels to mix features;
when the stride is 2, the Inverted Residual module uses a downsampling unit in which each branch applies a 3 × 3 depthwise convolution with stride 2 for downsampling, one branch appending a 1 × 1 convolution afterwards and the other adding a 1 × 1 convolution both before and after, the two branches being joined by Concat; finally a Channel shuffle module scrambles and rearranges the feature channels to mix features;
in depthwise convolution, each convolution kernel is responsible for a single channel, and each channel is convolved by only one kernel.
8. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 5, wherein step S43 comprises: the number of convolution kernels of the C3 and Conv modules in the neck network portion of the YOLO v5 configuration file is set to 128.
9. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 5, wherein step S44 comprises: the 3 detection scales of the light-weight YOLO v5 are 40 × 40, 80 × 80 and 160 × 160 respectively; the added 160 × 160 micro-scale detection layer generates its feature map by fusing the shallow spatial features extracted after the Stem module downsamples the input image 4 × with deep semantic features.
10. The sugarcane aphid target detection method based on light-weight YOLO v5 as claimed in claim 1, wherein step S5 comprises: using the PyTorch 1.7.0 deep learning framework, setting the training image size to 640 × 640, and setting the learning rate, batch size, number of iterations and number of classes to 0.001, 16, 500 and 1 respectively; and augmenting the data set pictures with the Mosaic enhancement method.
CN202210371938.9A 2022-04-11 2022-04-11 Sugarcane aphid target detection method based on light-weight YOLO v5 Pending CN114708231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210371938.9A CN114708231A (en) 2022-04-11 2022-04-11 Sugarcane aphid target detection method based on light-weight YOLO v5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210371938.9A CN114708231A (en) 2022-04-11 2022-04-11 Sugarcane aphid target detection method based on light-weight YOLO v5

Publications (1)

Publication Number Publication Date
CN114708231A true CN114708231A (en) 2022-07-05

Family

ID=82173010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210371938.9A Pending CN114708231A (en) 2022-04-11 2022-04-11 Sugarcane aphid target detection method based on light-weight YOLO v5

Country Status (1)

Country Link
CN (1) CN114708231A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272412A (en) * 2022-08-02 2022-11-01 电子科技大学重庆微电子产业技术研究院 Low, small and slow target detection method and tracking system based on edge calculation
CN115272412B (en) * 2022-08-02 2023-09-26 电子科技大学重庆微电子产业技术研究院 Edge calculation-based low-small slow target detection method and tracking system
CN115330759A (en) * 2022-10-12 2022-11-11 浙江霖研精密科技有限公司 Method and device for calculating distance loss based on Hausdorff distance
CN115330759B (en) * 2022-10-12 2023-03-10 浙江霖研精密科技有限公司 Method and device for calculating distance loss based on Hausdorff distance
CN115588117A (en) * 2022-10-17 2023-01-10 华南农业大学 Citrus psylla detection method and system based on YOLOv5s-BC
CN115588117B (en) * 2022-10-17 2023-06-09 华南农业大学 Method and system for detecting diaphorina citri based on YOLOv5s-BC
CN115546187A (en) * 2022-10-28 2022-12-30 北京市农林科学院 Agricultural pest and disease detection method and device based on YOLO v5
CN116012718A (en) * 2023-02-15 2023-04-25 黑龙江科技大学 Method, system, electronic equipment and computer storage medium for detecting field pests
CN116012718B (en) * 2023-02-15 2023-10-27 黑龙江科技大学 Method, system, electronic equipment and computer storage medium for detecting field pests
CN116485796A (en) * 2023-06-19 2023-07-25 闽都创新实验室 Pest detection method, pest detection device, electronic equipment and storage medium
CN116485796B (en) * 2023-06-19 2023-09-08 闽都创新实验室 Pest detection method, pest detection device, electronic equipment and storage medium
CN117115640A (en) * 2023-07-04 2023-11-24 北京市农林科学院 Improved YOLOv 8-based pest and disease damage target detection method, device and equipment

Similar Documents

Publication Publication Date Title
CN114708231A (en) Sugarcane aphid target detection method based on light-weight YOLO v5
CN107016405B (en) A kind of pest image classification method based on classification prediction convolutional neural networks
CN110533084A (en) A kind of multiscale target detection method based on from attention mechanism
CN113392775B (en) Sugarcane seedling automatic identification and counting method based on deep neural network
CN108805070A (en) A kind of deep learning pedestrian detection method based on built-in terminal
CN110223349A (en) A kind of picking independent positioning method
CN115272828B (en) Intensive target detection model training method based on attention mechanism
CN113538390B (en) Quick identification method for shaddock diseases and insect pests
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN109376676A (en) Highway engineering site operation personnel safety method for early warning based on unmanned aerial vehicle platform
CN111400536A (en) Low-cost tomato leaf disease identification method based on lightweight deep neural network
CN114092487A (en) Target fruit instance segmentation method and system
Xiao et al. Citrus greening disease recognition algorithm based on classification network using TRL-GAN
CN112711985B (en) Fruit identification method and device based on improved SOLO network and robot
Ärje et al. Automatic flower detection and classification system using a light-weight convolutional neural network
CN113744226A (en) Intelligent agricultural pest identification and positioning method and system
CN116721060A (en) Method for detecting red palm weevil beetles based on attention mechanism
CN116740337A (en) Safflower picking point identification positioning method and safflower picking system
CN113435425B (en) Wild animal emergence and emergence detection method based on recursive multi-feature fusion
CN114463741A (en) Litchi disease and insect pest identification method based on deep learning
Liu et al. Research on object detection algorithm for small object of pests based on YOLOv3
CN113763196A (en) Orchard yield measuring system based on improved YOLOv3
CN114049554A (en) Lawn obstacle detection method based on lightweight YOLOv5s model
Xu et al. Application of an improved lightweight YOLOv5 model to wheat stripe rust detection
CN110826647A (en) Method and system for automatically detecting foreign matter appearance of power equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination