CN111652247A - Diptera insect identification method based on deep convolutional neural network

Diptera insect identification method based on deep convolutional neural network

Info

Publication number
CN111652247A
Authority
CN
China
Prior art keywords
diptera
neural network
feature map
convolutional neural
feature
Prior art date
Legal status
Pending
Application number
CN202010471036.3A
Other languages
Chinese (zh)
Inventor
陈彦彤
王俊生
张献中
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date
Filing date
Publication date
Application filed by Dalian Maritime University
Priority to CN202010471036.3A
Publication of CN111652247A
Legal status: Pending


Classifications

    • G06V 10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06N 3/045 - Combinations of networks
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 2207/20081 - Training; Learning
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G06V 2201/07 - Target detection

Abstract

The invention provides a Diptera insect identification method based on a deep convolutional neural network, comprising the following steps: collecting Diptera insect images and building a data set from them; performing data enhancement on the data set; constructing an improved RetinaNet target detection model; setting training parameters and training the RetinaNet target detection model on the data set; and classifying and localizing the Diptera insects in the test-set images with the trained target detection model. The method adopts a RetinaNet target detection model, adds an improved convolutional block attention module and improves the feature pyramid network, so that identification no longer consumes a large amount of manpower and material resources, the dependence on manually designed features is removed, and the operation is simple.

Description

Diptera insect identification method based on deep convolutional neural network
Technical Field
The invention relates to the field of insect species identification, in particular to a diptera insect identification method based on a deep convolutional neural network.
Background
Diptera is the fourth largest order of the class Insecta, after Coleoptera, Lepidoptera and Hymenoptera. Dipteran insects undergo complete metamorphosis and have only one pair of wings; their bodies are generally short and wide or slender, cylindrical or nearly spherical, and the body length rarely exceeds 25 mm. Their feeding habits are broad and diverse and can be roughly classified as phytophagous, saprophagous or coprophagous, predatory, and parasitic. Dipteran insects are closely related to human life: some species transmit pathogens such as bacteria, parasites and viruses between humans and animals, while the larvae of seed flies, leaf miners, fruit flies, gall midges and the like are important agricultural pests. Therefore, effective identification of dipteran species is of great significance for the healthy growth of humans and animals and for the prevention and control of agricultural and forestry pests and diseases.
Dipteran insects are numerous in species, highly similar in appearance, and difficult to distinguish. The traditional identification method relies on observing features such as shape, color and texture, but it consumes a great deal of manpower and time, and identification accuracy drops when people work for long periods. Traditional machine-learning identification methods usually combine feature extraction with a classifier, but the feature parameters must be selected manually, so they depend on hand-designed features, which affects the accuracy and stability of the identification results. With the rapid development of convolutional neural networks, insect species can be identified by deep learning. Deep-learning algorithms for target detection fall mainly into two-stage and single-stage algorithms: two-stage target detection algorithms are time-consuming at detection, while single-stage target detection algorithms are fast but less accurate at identifying different targets with similar features, in particular dipteran insects whose shape, color and texture are alike.
Disclosure of Invention
In accordance with the above-mentioned technical problem, there is provided a method for identifying Diptera insects based on a deep convolutional neural network, comprising the following steps:
step S1: collecting Diptera insect images and making a data set of the Diptera insect images;
step S2: performing data enhancement on the data set, dividing the enhanced data set into a training set, a validation set and a test set in a ratio of 8:1:1 with an equal number of images for each Diptera insect species, and labeling the Diptera insects in the data set with the labelImg image annotation tool to generate annotation files in XML format;
the data enhancement of the data set comprises: rotating the data-set images by 90 degrees, zooming the data-set images by 20 percent, and applying local blurring;
step S3: constructing an improved RetinaNet target detection model;
step S4: setting training parameters, and training the RetinaNet target detection model with the data set;
step S5: classifying and localizing the Diptera insects in the test-set images based on the trained target detection model.
Compared with the prior art, the invention has the following advantages:
the invention provides a diptera insect identification method based on a deep convolutional neural network, which does not need to consume a large amount of manpower and material resources, solves the problem of dependence on manual design characteristics, and is simple in operation of an image acquisition method. The method adopts a RetinaNet target detection model, takes a ResNeXt network as a feature extraction network, adds an improved attention module in the feature extraction network, enhances the expression capability of a convolutional neural network, effectively improves the information flow in the network, improves a feature pyramid network FPN, and solves the problem of low identification accuracy rate of the diptera insects caused by similarity of various features such as shape, color, texture and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a method for identifying diptera insects based on a deep convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a camera according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a RetinaNet target detection model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the connection of a channel and a spatial attention module according to an embodiment of the present invention;
FIG. 5 is a schematic view of a channel attention module according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a spatial attention module according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, the present invention provides a method for identifying Diptera insects based on a deep convolutional neural network, comprising the following steps:
Step S1: acquiring Diptera insect images and making a data set of the Diptera insect images.
Step S2: performing data enhancement on the data set, dividing the enhanced data set into a training set, a validation set and a test set in a ratio of 8:1:1 with an equal number of images for each Diptera insect species, and labeling the Diptera insects in the data set with the labelImg image annotation tool to generate annotation files in XML format;
the data enhancement of the data set comprises: rotating the data-set images by 90 degrees, zooming the data-set images by 20 percent, and applying local blurring.
Step S3: constructing an improved RetinaNet target detection model.
Step S4: setting training parameters, and training the RetinaNet target detection model with the data set.
Step S5: classifying and localizing the Diptera insects in the test-set images based on the trained target detection model.
In the embodiment of the present invention, as a preferred implementation manner, in step S1, the collected diptera insect images are laboratory sample images, as shown in fig. 2, which is a schematic diagram of a photographing apparatus according to an embodiment of the present invention, each diptera insect sample is photographed by a camera in vertical and horizontal directions at 45 ° intervals, and the number of each diptera insect image is equal.
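For step S2, the following is a minimal illustrative sketch of the three data-enhancement operations described above (90-degree rotation, 20% zoom, and local blurring). The patent publishes no code, so the OpenCV-based implementation and the function names are assumptions.

```python
# Illustrative sketch only: mimics the three augmentations described in step S2
# (90-degree rotation, 20% zoom, local blurring) with OpenCV and NumPy.
import random
import cv2
import numpy as np

def rotate_90(img: np.ndarray) -> np.ndarray:
    """Rotate the image by 90 degrees."""
    return cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)

def zoom_20_percent(img: np.ndarray) -> np.ndarray:
    """Scale the image by 20% (factor 1.2) and crop back to the original size."""
    h, w = img.shape[:2]
    scaled = cv2.resize(img, (int(w * 1.2), int(h * 1.2)), interpolation=cv2.INTER_LINEAR)
    top, left = (scaled.shape[0] - h) // 2, (scaled.shape[1] - w) // 2
    return scaled[top:top + h, left:left + w]

def local_blur(img: np.ndarray, patch: int = 80) -> np.ndarray:
    """Blur one randomly chosen square region to simulate local defocus."""
    out = img.copy()
    h, w = out.shape[:2]
    y = random.randint(0, max(h - patch, 1))
    x = random.randint(0, max(w - patch, 1))
    out[y:y + patch, x:x + patch] = cv2.GaussianBlur(out[y:y + patch, x:x + patch], (15, 15), 0)
    return out
```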
In this embodiment of the present invention, the step S3 further includes the following steps:
step S31: using a ResNeXt network as the feature extraction network;
step S32: adding an improved convolutional block attention module to the feature extraction network;
step S33: improving the feature pyramid network FPN, and using small fully convolutional networks FCN as the classification subnet and the regression subnet;
step S34: adopting a Focal Loss function as the classification loss function and a KL Loss function as the bounding-box regression loss function.
In step S31 described in the embodiment of the present invention, as shown in fig. 3, which is a schematic diagram of the RetinaNet target detection model according to the embodiment of the present invention, the feature extraction network adopts a ResNeXt network. The ResNeXt network uses a layer-stacking strategy similar to the VGG and ResNet networks, applies the split-transform-merge idea in a simple and extensible way, introduces the idea of increasing cardinality, and gives every branch the same topology.
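The following is a minimal sketch of a ResNeXt bottleneck block (not the patented network itself), illustrating the split-transform-merge idea: the 3×3 convolution is split into "cardinality" parallel branches with identical topology, implemented here as a grouped convolution in PyTorch.

```python
# A minimal ResNeXt bottleneck block: 1x1 reduce, grouped 3x3 (the parallel
# branches), 1x1 expand, plus a residual shortcut.
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, cardinality: int = 32, width: int = 4, stride: int = 1):
        super().__init__()
        mid = cardinality * width                      # e.g. 32 groups x 4 channels = 128
        self.conv1 = nn.Conv2d(in_ch, mid, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid)
        self.conv2 = nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                               groups=cardinality, bias=False)   # grouped conv = parallel branches
        self.bn2 = nn.BatchNorm2d(mid)
        self.conv3 = nn.Conv2d(mid, out_ch, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:             # projection shortcut when shapes differ
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + self.shortcut(x))       # merge: residual addition
```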
In step S32, fig. 4 is a schematic diagram of the connection between the channel and spatial attention modules according to the embodiment of the present invention. The specific process is as follows: given an intermediate feature map F ∈ R^(C×H×W) as input, a one-dimensional channel attention map M_C(F) is obtained by the channel attention module; M_C(F) is multiplied element by element with the feature map F to obtain a new feature map F'; the spatial attention module then generates a two-dimensional spatial attention map M_S(F'), which is multiplied element by element with F' to obtain the final output feature map F''.
Specifically, as shown in fig. 5, which is a schematic diagram of the channel attention module according to an embodiment of the present invention, the specific process of the improved channel attention module is as follows: the input feature map F ∈ R^(C×H×W) is first aggregated along its spatial dimensions by average-pooling, max-pooling and mixed-pooling to obtain three channel descriptors, which enter a shared network composed of a multilayer perceptron (MLP) with one hidden layer, generating three attention vectors of dimension C×1×1, i.e. the three channel attention vectors. To reduce the number of parameters, the activation size of the hidden layer is set to R^(C/r×1×1) with a ReLU activation function, where r denotes the reduction ratio; the output layer restores the dimension to C so that the vectors have the same number of channels as the feature map. The three vectors are summed at corresponding positions, and finally a channel attention map M_C(F) of dimension C×1×1 is generated through a sigmoid function. M_C(F) is expressed as follows:

M_C(F) = δ(MLP(AvgPool(F)) + MLP(MaxPool(F)) + MLP(MixPool(F))) = δ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)) + W_1(W_0(F_mix^c)))

wherein W_0 and W_1 are the weights of the MLP, δ is the sigmoid function, and F_avg^c, F_max^c and F_mix^c are the average-pooled, max-pooled and mixed-pooled feature maps of F, respectively.
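A minimal PyTorch sketch of the improved channel attention described above is given below: three global pooling branches (average, max, and mixed) feed a shared MLP with reduction ratio r, and the three outputs are summed and passed through a sigmoid. Because the exact definition of mixed pooling is not reproduced here, the sketch simply takes the mean of the average- and max-pooled descriptors; this is an assumption.

```python
# Channel attention with three pooling branches and a shared MLP (C -> C/r -> C).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP implemented with 1x1 convs
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = F.adaptive_avg_pool2d(x, 1)              # C x 1 x 1 descriptor
        mx = F.adaptive_max_pool2d(x, 1)
        mixed = 0.5 * (avg + mx)                       # assumed definition of mixed pooling
        attn = torch.sigmoid(self.mlp(avg) + self.mlp(mx) + self.mlp(mixed))
        return x * attn                                # re-weight the input feature map: F'
```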
Specifically, as shown in fig. 6, which is a schematic diagram of the spatial attention module according to an embodiment of the present invention, the specific process of the improved spatial attention module is as follows: the channel attention map M_C(F) is multiplied element by element with the feature map F to obtain an optimized feature map F' carrying channel attention; average-pooling, max-pooling and mixed-pooling are then applied along the channel axis to generate three feature maps of the same dimensions, which are concatenated and passed through a convolution with a 7×7 kernel, and finally a two-dimensional spatial attention map M_S(F') is generated through a sigmoid function. M_S(F') is expressed as follows:

M_S(F') = δ(f^(7×7)([F'_avg^s; F'_max^s; F'_mix^s]))

wherein f^(7×7) denotes the convolution operation with a 7×7 kernel, δ is the sigmoid function, and F'_avg^s, F'_max^s and F'_mix^s are the average-pooled, max-pooled and mixed-pooled feature maps of F', respectively.
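Likewise, a minimal PyTorch sketch of the improved spatial attention is given below: per-pixel average, max and mixed pooling along the channel axis are concatenated and passed through a 7×7 convolution and a sigmoid; the mixed pooling is again assumed to be the mean of the average- and max-pooled maps.

```python
# Spatial attention: three 1xHxW maps are concatenated and reduced to one map
# by a 7x7 convolution followed by a sigmoid.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(3, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)        # 1 x H x W
        mx, _ = torch.max(x, dim=1, keepdim=True)
        mixed = 0.5 * (avg + mx)                        # assumed mixed pooling
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx, mixed], dim=1)))
        return x * attn                                 # final output feature map F''
```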
In step S33 described in this embodiment of the present invention, the improved feature pyramid network constructs a cross-pyramid hierarchical fusion structure on the basis of the FPN. In ResNeXt, the outputs of conv3, conv4 and conv5 are denoted C3, C4 and C5. P6 is obtained from C5 through a 3×3 convolutional layer with stride 2; P7 is obtained from P6 through a ReLU function followed by a 3×3 convolutional layer with stride 2. The feature map obtained by 4× upsampling of P7 is fused element-wise with the feature map obtained from C5 through a 1×1 convolutional layer, and P5 is then obtained through a 3×3 convolutional layer. The feature map obtained by 4× upsampling of P6 is fused element-wise with the feature map obtained from C4 through a 1×1 convolutional layer and the feature map obtained by 2× upsampling of P5, and P4 is then obtained through a 3×3 convolutional layer. The feature map obtained by 4× upsampling of P5 is fused element-wise with the feature map obtained from C3 through a 1×1 convolutional layer and the feature map obtained by 2× upsampling of P4, and P3 is then obtained through a 3×3 convolutional layer.
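The cross-pyramid fusion described above can be sketched as follows. The channel counts of C3/C4/C5 (512, 1024, 2048, ResNeXt-50 style), the 256 output channels and the nearest-neighbor upsampling are assumptions, and input sizes divisible by 128 are assumed so that the fused feature maps align.

```python
# Cross-pyramid FPN sketch: P6/P7 from C5, then P5, P4, P3 built by fusing
# laterals with 4x- and 2x-upsampled higher pyramid levels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossPyramidFPN(nn.Module):
    def __init__(self, c3_ch=512, c4_ch=1024, c5_ch=2048, out_ch=256):
        super().__init__()
        self.lat3 = nn.Conv2d(c3_ch, out_ch, 1)                       # 1x1 lateral connections
        self.lat4 = nn.Conv2d(c4_ch, out_ch, 1)
        self.lat5 = nn.Conv2d(c5_ch, out_ch, 1)
        self.out3 = nn.Conv2d(out_ch, out_ch, 3, padding=1)           # 3x3 smoothing convs
        self.out4 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.out5 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.p6 = nn.Conv2d(c5_ch, out_ch, 3, stride=2, padding=1)    # P6 from C5
        self.p7 = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)   # P7 from ReLU(P6)

    @staticmethod
    def up(x, factor):
        return F.interpolate(x, scale_factor=factor, mode="nearest")

    def forward(self, c3, c4, c5):
        p6 = self.p6(c5)
        p7 = self.p7(F.relu(p6))
        p5 = self.out5(self.up(p7, 4) + self.lat5(c5))
        p4 = self.out4(self.up(p6, 4) + self.lat4(c4) + self.up(p5, 2))
        p3 = self.out3(self.up(p5, 4) + self.lat3(c3) + self.up(p4, 2))
        return p3, p4, p5, p6, p7
```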
In step S34, the Focal Loss function used as the classification loss function for identifying dipteran insects is calculated as follows:

L_cls = FL(p_i);

FL(p_i) = -α_i (1 - p_i)^γ log(p_i);

wherein L_cls is the classification loss function, p_i is the predicted probability for the target of class i, α_i is a weight parameter that adjusts the proportion of positive and negative samples, and γ is a focusing parameter that adjusts the down-weighting of easily classified samples;
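A minimal sketch of the Focal Loss given above, in the binary per-anchor, per-class form used by RetinaNet, with α = 0.25 and γ = 2 as in the embodiment; the tensor layout is an assumption.

```python
# Focal Loss: down-weights easy examples via (1 - p_t)^gamma and balances
# positives/negatives via alpha.
import torch

def focal_loss(pred_logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """pred_logits, targets: tensors of shape (num_anchors, num_classes), targets in {0, 1}."""
    p = torch.sigmoid(pred_logits)
    p_t = torch.where(targets == 1, p, 1 - p)                  # probability of the true class
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    loss = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-6))
    return loss.sum()
```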
The KL loss function is used as the bounding-box regression loss function for dipteran insect identification and is calculated as follows:
L_reg = KL;

L_reg = (e^(-β)/2)(x_g - x_e)^2 + β/2, for |x_g - x_e| ≤ 1;

wherein L_reg is the bounding-box regression loss function, x_g is the ground-truth bounding-box position, x_e is the predicted bounding-box position, and σ is the standard deviation of the localization; to avoid gradient explosion, the model predicts β = log(σ^2). When |x_g - x_e| > 1, the calculation formula is:

L_reg = e^(-β)(|x_g - x_e| - 1/2) + β/2.
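A minimal sketch of the KL regression loss given above: the network predicts both a box coordinate x_e and β = log(σ^2), and the loss switches to the smooth-L1-like branch when |x_g - x_e| > 1. The PyTorch implementation is an assumption.

```python
# KL (uncertainty-aware) bounding-box regression loss with a smooth-L1-like
# branch for large errors.
import torch

def kl_loss(x_e: torch.Tensor, beta: torch.Tensor, x_g: torch.Tensor) -> torch.Tensor:
    """x_e: predicted coordinates, beta: predicted log-variance, x_g: ground-truth coordinates."""
    diff = torch.abs(x_g - x_e)
    small = 0.5 * torch.exp(-beta) * diff ** 2 + 0.5 * beta     # case |x_g - x_e| <= 1
    large = torch.exp(-beta) * (diff - 0.5) + 0.5 * beta        # case |x_g - x_e| > 1
    return torch.where(diff <= 1.0, small, large).sum()
```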
in an embodiment of the present invention, the step S4 includes the following steps:
step S41: a pre-trained weight model on the ImageNet dataset was used as the RetinaNet initialization weight model.
Step S42: in the training process, a random gradient descent algorithm is adopted to optimize a training model, the initial learning rate is 0.001, the weight attenuation is 0.0001, the momentum factor is 0.9, the epoch is initialized to 81, iteration is performed to the 45 th epoch, the learning rate is reduced to 10% of the previous stage, iteration is performed to the 65 th epoch, the learning rate is reduced to 10% of the previous stage, and the number of training images in each batch is 32.
Step S43: when the intersection ratio IoU of the anchor frame anchorage and the real mark frame is more than 0.5, the anchorage is marked as a positive sample, when IoU of the anchorage and the real mark frame is less than 0.3, the anchorage is marked as a negative sample, when IoU of the anchorage and the real mark frame is between 0.3 and 0.5, the anchorage is ignored during training, the non-maximum suppression is set to be 0.7, redundant detection frames are removed, and an object detection position is found.
In step S34 according to the embodiment of the present invention, the Focal Loss function uses a weight parameter α of 0.25 and a focusing parameter γ of 2.
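A minimal sketch of the anchor assignment rule described in step S43 above; the (x1, y1, x2, y2) box format and the PyTorch implementation are assumptions.

```python
# Anchor labeling: IoU > 0.5 -> positive (1), IoU < 0.3 -> negative (0),
# otherwise ignored (-1), matching the thresholds of step S43.
import torch

def box_iou(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU between anchors a (N, 4) and ground-truth boxes b (M, 4)."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])     # intersection top-left
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])     # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def assign_anchors(anchors: torch.Tensor, gt_boxes: torch.Tensor,
                   pos_thr: float = 0.5, neg_thr: float = 0.3) -> torch.Tensor:
    """Return per-anchor labels: 1 = positive, 0 = negative, -1 = ignored."""
    max_iou, _ = box_iou(anchors, gt_boxes).max(dim=1)
    labels = torch.full((anchors.shape[0],), -1, dtype=torch.long)
    labels[max_iou < neg_thr] = 0
    labels[max_iou > pos_thr] = 1
    return labels
```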
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A Diptera insect identification method based on a deep convolutional neural network, characterized by comprising the following steps:
S1: collecting Diptera insect images and making a data set of the Diptera insect images;
S2: performing data enhancement on the data set, dividing the enhanced data set into a training set, a validation set and a test set in a ratio of 8:1:1 with an equal number of images for each Diptera insect species, and labeling the Diptera insects in the data set with the labelImg image annotation tool to generate annotation files in XML format;
the data enhancement of the data set comprises: rotating the data-set images by 90 degrees, zooming the data-set images by 20 percent, and applying local blurring;
S3: constructing an improved RetinaNet target detection model;
S4: setting training parameters, and training the RetinaNet target detection model with the data set;
S5: classifying and localizing the Diptera insects in the test-set images based on the trained target detection model.
2. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 1, wherein
in step S1, the collected Diptera insect images are laboratory sample images, each Diptera insect sample is photographed by a camera in the vertical and horizontal directions at 45° intervals, and the number of images of each Diptera insect species is equal.
3. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 1, wherein
the step S3 comprises the steps of:
S31: using a ResNeXt network as the feature extraction network;
S32: adding an improved convolutional block attention module to the feature extraction network;
S33: improving the feature pyramid network FPN, and using small fully convolutional networks FCN as the classification subnet and the regression subnet;
S34: adopting a Focal Loss function as the classification loss function and a KL Loss function as the bounding-box regression loss function.
4. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 3, wherein
the improved channel attention module operates as follows: the input feature map F ∈ R^(C×H×W) is first aggregated along its spatial dimensions by average-pooling, max-pooling and mixed-pooling to obtain three channel descriptors, which enter a shared network composed of a multilayer perceptron (MLP) with one hidden layer, generating three attention vectors of dimension C×1×1, i.e. the three channel attention vectors; to reduce the number of parameters, the activation size of the hidden layer is set to R^(C/r×1×1); the output layer restores the dimension to C so that the vectors have the same number of channels as the feature map; the three vectors are summed at corresponding positions, and finally a channel attention map M_C(F) of dimension C×1×1 is generated through a sigmoid function, M_C(F) being expressed as follows:

M_C(F) = δ(MLP(AvgPool(F)) + MLP(MaxPool(F)) + MLP(MixPool(F))) = δ(W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)) + W_1(W_0(F_mix^c)))

wherein W_0 and W_1 are the weights of the MLP, δ is the sigmoid function, and F_avg^c, F_max^c and F_mix^c are the average-pooled, max-pooled and mixed-pooled feature maps of F, respectively.
5. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 4, wherein
the channel attention map M_C(F) is multiplied element by element with the feature map F to obtain an optimized feature map F' carrying channel attention; average-pooling, max-pooling and mixed-pooling are then applied to generate three feature maps of the same dimensions, which are concatenated and passed through a convolution with a 7×7 kernel, and finally a two-dimensional spatial attention map M_S(F') is generated through a sigmoid function, M_S(F') being expressed as follows:

M_S(F') = δ(f^(7×7)([F'_avg^s; F'_max^s; F'_mix^s]))

wherein f^(7×7) denotes the convolution operation with a 7×7 kernel, δ is the sigmoid function, and F'_avg^s, F'_max^s and F'_mix^s are the average-pooled, max-pooled and mixed-pooled feature maps of F', respectively.
6. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 3, wherein
in step S33, the improved feature pyramid network constructs a cross-pyramid hierarchical fusion structure on the basis of the original FPN; in ResNeXt, the outputs of conv3, conv4 and conv5 are denoted C3, C4 and C5; P6 is obtained from C5 through a 3×3 convolutional layer with stride 2; P7 is obtained from P6 through a ReLU function followed by a 3×3 convolutional layer with stride 2; the feature map obtained by 4× upsampling of P7 is fused element-wise with the feature map obtained from C5 through a 1×1 convolutional layer, and P5 is then obtained through a 3×3 convolutional layer; the feature map obtained by 4× upsampling of P6 is fused element-wise with the feature map obtained from C4 through a 1×1 convolutional layer and the feature map obtained by 2× upsampling of P5, and P4 is then obtained through a 3×3 convolutional layer; the feature map obtained by 4× upsampling of P5 is fused element-wise with the feature map obtained from C3 through a 1×1 convolutional layer and the feature map obtained by 2× upsampling of P4, and P3 is then obtained through a 3×3 convolutional layer.
7. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 3, wherein
in step S34, the Focal Loss function is calculated as follows:

L_cls = FL(p_i);

FL(p_i) = -α_i (1 - p_i)^γ log(p_i);

wherein L_cls is the classification loss function, p_i is the predicted probability for the target of class i, α_i is a weight parameter that adjusts the proportion of positive and negative samples, and γ is a focusing parameter that adjusts the down-weighting of easily classified samples;
the KL loss function is calculated as follows:

L_reg = KL;

L_reg = (e^(-β)/2)(x_g - x_e)^2 + β/2, for |x_g - x_e| ≤ 1;

wherein L_reg is the bounding-box regression loss function, x_g is the ground-truth bounding-box position, x_e is the predicted bounding-box position, and σ is the standard deviation of the localization; to avoid gradient explosion, the model predicts β = log(σ^2);
when |x_g - x_e| > 1, the calculation formula is:

L_reg = e^(-β)(|x_g - x_e| - 1/2) + β/2.
8. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 1, wherein
the step S4 comprises the steps of:
S41: taking a weight model pre-trained on the ImageNet data set as the RetinaNet initialization weight model;
S42: during training, a stochastic gradient descent algorithm is used to optimize the model; the initial learning rate is 0.001, the weight decay is 0.0001, the momentum factor is 0.9, and the number of epochs is set to 81; at the 45th epoch the learning rate is reduced to 10% of its previous value, and at the 65th epoch it is reduced to 10% of its previous value again; the number of training images in each batch is 32;
S43: when the intersection-over-union (IoU) between an anchor box and a ground-truth box is greater than 0.5, the anchor is labeled as a positive sample; when the IoU is less than 0.3, the anchor is labeled as a negative sample; anchors whose IoU lies between 0.3 and 0.5 are ignored during training; the non-maximum suppression threshold is set to 0.7 to remove redundant detection boxes and locate the detected objects.
9. The method for identifying Diptera insects based on a deep convolutional neural network according to claim 3 or 7, wherein the Focal Loss function uses a weight parameter α of 0.25 and a focusing parameter γ of 2.
CN202010471036.3A 2020-05-28 2020-05-28 Diptera insect identification method based on deep convolutional neural network Pending CN111652247A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010471036.3A CN111652247A (en) 2020-05-28 2020-05-28 Diptera insect identification method based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010471036.3A CN111652247A (en) 2020-05-28 2020-05-28 Diptera insect identification method based on deep convolutional neural network

Publications (1)

Publication Number Publication Date
CN111652247A true CN111652247A (en) 2020-09-11

Family

ID=72349785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010471036.3A Pending CN111652247A (en) 2020-05-28 2020-05-28 Diptera insect identification method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN111652247A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020047738A1 (en) * 2018-09-04 2020-03-12 安徽中科智能感知大数据产业技术研究院有限责任公司 Automatic pest counting method based on combination of multi-scale feature fusion network and positioning model
CN111104898A (en) * 2019-12-18 2020-05-05 武汉大学 Image scene classification method and device based on target semantics and attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YANTONG CHEN et al.: "Research on Recognition of Fly Species Based on Improved RetinaNet and CBAM", IEEE *
曾招鑫 et al.: "Segmentation method for microscopic images of Caenorhabditis elegans based on deep learning" (in Chinese), Journal of Computer Applications *
李静 et al.: "Image recognition of corn borer pest damage based on an optimized convolutional neural network" (in Chinese), Journal of South China Agricultural University *
江洋 et al.: "Research on flame detection based on the RetinaNet deep learning model" (in Chinese), Natural Science Journal of Hainan University *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508863B (en) * 2020-11-20 2023-07-18 华南理工大学 Target detection method based on RGB image and MSR image double channels
CN112508863A (en) * 2020-11-20 2021-03-16 华南理工大学 Target detection method based on RGB image and MSR image dual channels
CN112529003A (en) * 2020-12-09 2021-03-19 安徽工业大学 Instrument panel digital identification method based on fast-RCNN
CN112784779A (en) * 2021-01-28 2021-05-11 武汉大学 Remote sensing image scene classification method based on feature pyramid multilevel feature fusion
CN113177486A (en) * 2021-04-30 2021-07-27 重庆师范大学 Dragonfly order insect identification method based on regional suggestion network
CN113177486B (en) * 2021-04-30 2022-06-03 重庆师范大学 Dragonfly order insect identification method based on regional suggestion network
CN113239825A (en) * 2021-05-19 2021-08-10 四川中烟工业有限责任公司 High-precision tobacco beetle detection method in complex scene
CN113327239A (en) * 2021-06-10 2021-08-31 温州大学 Small sample target detection method for attention-enhancing area generation network
CN113989211A (en) * 2021-10-22 2022-01-28 广东海洋大学 Method and device for counting worm eggs
CN113989211B (en) * 2021-10-22 2024-04-02 广东海洋大学 Insect egg counting method and device
CN115100517A (en) * 2022-06-08 2022-09-23 北京市农林科学院信息技术研究中心 Method and device for identifying insects in field
CN115100517B (en) * 2022-06-08 2023-10-24 北京市农林科学院信息技术研究中心 Method and device for identifying insects in field
CN114743023A (en) * 2022-06-14 2022-07-12 安徽大学 Wheat spider image detection method based on RetinaNet model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination