CN111178120B - Pest image detection method based on crop identification cascading technology - Google Patents


Info

Publication number: CN111178120B (granted from application CN201811586713.5A; earlier published as CN111178120A)
Authority: CN (China)
Prior art keywords: pest, network, image, crop, projection
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Original language: Chinese (zh)
Inventors: 陈天娇, 王儒敬, 王方元, 谢成军, 刘万才, 张洁, 李瑞, 陈红波, 董伟, 胡海瀛
Original and current assignee: Hefei Institutes of Physical Science of CAS
Application filed by Hefei Institutes of Physical Science of CAS; priority to CN201811586713.5A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture


Abstract

The invention relates to a pest image detection method based on a crop identification cascading technology, which addresses the drawback of the prior art that cross-correlation among different pest categories degrades pest detection results. The invention comprises the following steps: acquiring basic data images; constructing and training a multi-layer perception information recognition network; constructing and training multi-projection detection models; acquiring the pest image to be detected; and detecting the pest image. The invention provides a two-stage cascaded pest detection method based on mobile vision that scales to large, multi-species pest data; evaluated on a newly established large-scale dataset of field pests of grain crops, it outperforms conventional state-of-the-art object detection methods.

Description

Pest image detection method based on crop identification cascading technology
Technical Field
The invention relates to the technical field of pest image detection, and in particular to a pest image detection method based on a crop identification cascading technology.
Background
In the prior art, automatic field pest detection and identification using mobile vision technology is a hot topic in modern intelligent agriculture, but it faces serious challenges, including the complexity of the field environment, the detection of tiny pests, and the classification of many pest species. Although recent deep-learning-based mobile vision techniques have had some success in overcoming these problems, a key issue remains: for data covering a large variety of pests, class imbalance significantly reduces detection and recognition accuracy.
In plain terms, a conventional deep-learning pest image recognition method trains on a large number of pest images covering many pest species, such as rice aphids and wheat spiders. However, from the data source of the training samples, it is impossible for the pests of different crops to have similar numbers of training images, and from a field-collection perspective, sample sets of similar size cannot be obtained; in practice, pest species associated with one crop have many training samples while those associated with another crop have few. As a result, when a conventional deep-learning detection model is trained on pest images, pests with many samples are trained effectively and can be detected and identified reliably, while pests with few samples cannot be trained effectively: during detection they are easily misidentified as other pest species or cannot be localized at all, and robustness is poor.
Therefore, how to avoid cross-correlation between pest categories so as to identify pest images accurately has become an urgent technical problem to be solved.
Disclosure of Invention
The invention aims to overcome the defect in the prior art that cross-correlation among different pest categories degrades pest detection results, and provides a pest image detection method based on a crop identification cascading technology to solve this problem.
In order to achieve the above object, the technical scheme of the present invention is as follows:
a pest image detection method based on crop identification cascading technology comprises the following steps:
acquiring basic data images: acquiring a pest image basic data set and defining labels for it, wherein the label categories are the classified pest images, geographic information, time information and environment information;
constructing and training a multi-layer perception information recognition network: constructing a multi-layer perception information recognition network that uses the various kinds of perception information to classify the crop on which the pests depend, and training it;
constructing and training multi-projection detection models: constructing and training one multi-projection detection model per crop category;
acquiring the image of the pests to be detected;
detecting the pest image: first inputting the pest image to be detected into the multi-layer perception information recognition network to recognize the crop category on which the pests depend; then inputting the image into the multi-projection detection model corresponding to that crop to detect and count the pests.
The construction and training of the multi-layer perception information recognition network comprises the following steps:
the multi-layer perception information recognition network is set to comprise four ResNet-50 convolutional neural networks and a decision network, the four ResNet-50 convolutional neural networks being used respectively to classify the crop type on which the pests in the image depend, the geographic information of the image acquisition, the time information and the environment information;
the basic data images and the acquired kinds of perception information are retrieved from the pest image basic data set, the perception information is encoded into corresponding labels, and these are input into the multi-layer perception information recognition network for training;
the four ResNet-50 convolutional neural networks of the multi-layer perception information recognition network identify the crop category label, geographic information category label, time information category label and environment information category label of a basic data image, which are combined into an image feature vector;
the image feature vector is sent into the decision network, which identifies the crop type on which the pests in the pest image depend; the architecture of the decision network comprises a fully connected layer and a nonlinear activation function.
The construction and training of the multi-projection detection models comprises the following steps:
the pest image is taken as the input of a residual network, the low-level feature map output by the residual network is taken as the input of the projection network, and the pest image basic data set is classified according to crop category;
one multi-projection detection model is constructed and trained per crop category.
The construction and training of each multi-projection detection model comprises the following steps:
setting a number of projection convolution blocks;
the initial input of the projection convolution blocks is the low-level feature map of the deep residual network, and the input of each projection convolution block is sent to a convolution layer with a 3×3 convolution kernel to extract pest features, followed by batch normalization and a ReLU function;
the features of the multi-projection convolutional network and the deep residual convolutional network are integrated into super-resolution features: the final result generated by the multi-projection convolutional network is combined with the last layer Res5 of the deep residual convolutional network ResNet-50 to generate the super-resolution features, and the weight of the low-level convolutions is increased through the several projection convolution blocks;
the super-resolution features are fed to the FPN (feature pyramid network), trained by the FPN together with the other residual blocks, and the RPN (region proposal network) is then used to generate a number of candidate boxes for field pest detection, for classification and regression.
The crop type is wheat, rice or corn, and the corresponding multi-projection detection model is a wheat multi-projection detection model, a rice multi-projection detection model or a corn multi-projection detection model.
The detection of the pest image comprises the following steps:
inputting the pest image to be detected into the multi-layer perception information recognition network, which detects the crop type on which the pests in the image depend;
inputting the pest image to be detected into the multi-projection detection model corresponding to that crop category, generating the super-resolution features through the multi-projection detection model, fusing the super-resolution features with the other multi-layer features through the FPN (feature pyramid network), and finally using the RPN network to generate a number of pest detection candidate boxes for classification and regression, so as to detect and count the pests.
The method further comprises parameter fine-tuning of the multi-projection detection model, with the following specific steps:
using a model trained on the ImageNet image set as the pre-training model of the multi-projection detection model;
fine-tuning the multi-projection detection model over many iterations using all pest images of the categories related to the current crop;
fine-tuning the multi-projection detection model over a small number of iterations using the pest images of the categories related to all crops.
Advantageous effects
Compared with the prior art, the pest image detection method based on the crop identification cascading technology first extracts the various kinds of perception information of an image as prior-knowledge labels, so as to build a multi-perception-information recognition network that initially classifies the pest image into a crop category; it then detects and trains crop-related pest images with a multi-projection pest detection model, combining pest feature information from the low-level convolution layers with that of the high-level convolution layers to generate super-resolution features; finally, it improves the effectiveness of field pest detection with an attention mechanism and data augmentation.
The invention provides a two-stage cascaded pest detection method based on mobile vision that scales to large, multi-species pest data; evaluated on a newly established large-scale dataset of field pests of grain crops, it outperforms conventional state-of-the-art object detection methods.
Drawings
FIG. 1 is a process sequence diagram of the present invention;
FIG. 2a is an original pest image;
FIG. 2b is a feature map of the pest image detected using the method of the present invention;
FIG. 2c is a feature map of the pest image detected using the prior-art Scale-specific method;
FIG. 2d is a feature map of the pest image detected using the prior-art ResNet-50 method;
FIG. 2e is a feature map of the pest image detected using the prior-art VGG-16 method.
Detailed Description
For a further understanding and appreciation of the structural features and advantages of the invention, reference is made to the following detailed description of presently preferred embodiments, taken in conjunction with the accompanying drawings:
By collecting and analyzing our data sets, we find that certain pests appear on specific crops and are associated with distinct time, location and environmental information (such as temperature); it is therefore worthwhile to introduce this perception information into detection.
In addition, we observe that most pests in the images are isolated and small in size. In such images, pest position information is easily lost after the high-level convolutions of state-of-the-art deep learning object detectors, while field pest features are difficult to extract from the shallow convolutions alone. A field pest detection architecture is therefore proposed that compensates for the limitations of relying on only high-level or only low-level feature information.
The multi-projection detection model detects field pests on a specific crop: an image first passes through the multi-perception information network to determine the crop type, and then the projection detection model for that specific crop determines the number and positions of the pests.
As shown in FIG. 1, the pest image detection method based on the crop identification cascading technology of the invention comprises the following steps:
first, a basic data image is acquired. And acquiring a pest image basic data set, and performing label definition on the pest image basic data set, wherein label categories are classified into pest images, geographic information, time information and environment information.
Although there are presently some pest data sets acquired in laboratory environments, such as butterfly data sets and bee data sets. However, the model trained by the data sets cannot be well generalized to pest application in the field natural environment, and a special-task field important pest database based on grain crops can be constructed for solving the detection and counting tasks of pests under natural conditions, wherein the special-task field important pest database comprises 17192 field pest images and 76595 pest labels.
The images were randomly divided into 10 parts, 9 of which were used as training sets and the remainder as test sets from the experimental verification point of view. The data of the pest database are shown in table 1. All images are acquired by intelligent acquisition equipment which is independently developed, when the images are acquired, parameters of a CCD camera are set to be 4mm focal length, the aperture is f/3.3, the image size is 1440 x 1080, a large number of pest images are collected through the intelligent acquisition equipment, and temperature, humidity and geographic position information of the pest images are recorded through sensor equipment. The initial model may be first trained using few artificial tags (about 1k-2 k) during the marking of the data, then more pest images may be automatically marked using the initial trained model, and the automatically marked results corrected manually. Thus, continuous iterative training and correction can improve the performance of the model and save more human resources and costs than full manual labeling.
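The 9:1 random split described above can be sketched in plain Python; the seed and function name are illustrative:

```python
import random

def split_nine_to_one(image_ids, seed=42):
    """Randomly divide the images into 10 parts; 9 for training, 1 for testing."""
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    fold = len(ids) // 10
    return ids[fold:], ids[:fold]   # (train, test)

# 17192 images, as in the database described above.
train, test = split_nine_to_one(range(17192))
```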
Table 1 pest database sample display table
(Table 1 is reproduced as an image in the original patent.)
Second, the multi-layer perception information recognition network is constructed and trained.
Based on the various kinds of perception information, a multi-layer perception information recognition network is constructed and trained to classify the crop on which the pests depend.
A field pest image photographed by the CCD carries various kinds of perception information, including geographic location information, time information and environmental information. Compared with classifying crops using the original images alone, making full use of this perception information classifies the field pest images more effectively.
Therefore, training uses the different kinds of perception information together with the original image: each kind of perception information is encoded into a label of the image, and a corresponding classification network is constructed for each kind. For the different perception information labels of one image, a corresponding CNN model extracts the specific perception information during training, so that a single pest image receives several labels corresponding to its perception information. A base ResNet-50 convolutional neural network is used to extract features; one CNN model (shown in red in the figure) coarsely classifies the pest image, and the other CNN models (shown in other colors) respectively identify the image's geographic information, time information and environmental information (temperature). The final image classification result is obtained by concatenating the perception information extracted by each CNN model into a feature vector and feeding it to a decision network, which combines the multiple perception information labels. Briefly, the architecture of the decision network comprises a fully connected layer and a nonlinear activation function. Note that the multi-perception information network cannot guarantee that every field pest image is classified correctly, so all field pest images must be used to fine-tune each specific multi-projection detection model, reducing the influence of image misclassification and improving system performance and robustness.
The specific steps for constructing and training the multi-layer perception information identification network are as follows:
(1) The multi-layer perception information recognition network is set to comprise four ResNet-50 convolutional neural networks and a decision network, the four ResNet-50 convolutional neural networks being used respectively to classify the crop type on which the pests in the image depend, the geographic information of the image acquisition, the time information and the environmental information.
(2) The basic data images and the acquired kinds of perception information are retrieved from the pest image basic data set, the perception information is encoded into corresponding labels, and these are input into the multi-layer perception information recognition network for training.
(3) The four ResNet-50 convolutional neural networks of the multi-layer perception information recognition network identify the crop category label, geographic information category label, time information category label and environment information category label of a basic data image, which are combined into an image feature vector.
(4) The image feature vector is sent into the decision network, which identifies the crop type on which the pests in the pest image depend; the architecture of the decision network comprises a fully connected layer and a nonlinear activation function.
Third, the multi-projection detection models are constructed and trained, one per crop category.
After the crop category of a pest image is determined, the positions and categories of the pests must be computed. However, for most CNN models, such as ResNet and Inception, the target objects are too small to remain visible on the higher-level convolution feature maps (for example, a 32 × 32 or 64 × 64 object in the original image occupies only one pixel there), because the resolution of the feature map is reduced to 1/32 or 1/64 of the original image after the higher-level convolution layers. Because shallow semantic information is weak on its own, combining it with deep semantic information, and thereby strengthening the shallow information, is of great significance. A multi-projection detection model is therefore proposed to improve field pest detection and reduce the influence of vanishing features. Unlike the ResNet, Inception and VGG detection methods, we introduce multiple projection convolution blocks to increase the weight of the low-level convolutions.
The features of the shallow and deep convolution layers are integrated into super-resolution features by introducing several projection convolution layers. Unlike the residual network, which is deep, the multi-projection convolutional network is shallow; its final result is combined with the last layer Res5 of the residual network ResNet-50 to generate the super-resolution features. The input of each projection convolution block is sent to a convolution layer with a 3 × 3 kernel to extract pest features, followed by batch normalization and ReLU, which greatly speed up convergence and avoid the risk of exploding gradients. The outputs of the final projection convolution block and the residual blocks are fed to the FPN and trained together with the other residual blocks, after which the RPN is used to generate many candidate boxes for field pest detection, for classification and regression.
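The fusion into super-resolution features can be illustrated as follows. The patent text does not specify the exact fusion operator, so channel-wise concatenation after nearest-neighbour upsampling of the Res5 output is assumed here purely for illustration, with toy tensor shapes:

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def super_resolution_features(proj_out, res5_out):
    """Combine the shallow multi-projection output with the deep Res5 output.
    Channel-wise concatenation after upsampling is an assumption, not the
    patent's stated operator."""
    factor = proj_out.shape[1] // res5_out.shape[1]
    res5_up = upsample_nearest(res5_out, factor)
    return np.concatenate([proj_out, res5_up], axis=0)

proj = np.ones((8, 32, 32))      # shallow, high-resolution projection features
res5 = np.ones((16, 4, 4))       # deep, low-resolution Res5 features (1/8 size here)
sr = super_resolution_features(proj, res5)
```

The combined tensor keeps the spatial resolution of the shallow branch while carrying the deep branch's channels, which is the intent of the "super-resolution feature" described above.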
The specific steps for constructing and training the multi-projection detection model are as follows:
(1) The pest image is taken as the input of the residual network, the low-level feature map output by the residual network is taken as the input of the projection network, and the pest image basic data set is classified according to crop category. The number of categories is determined by the actual application; for example, if the only crop categories in practice are wheat, rice and corn, the identified crop category will likewise be wheat, rice or corn.
(2) The multi-projection detection models are constructed and trained according to the number of crop categories; continuing the example above, the models to be constructed and trained are a wheat multi-projection detection model, a rice multi-projection detection model and a corn multi-projection detection model. The wheat multi-projection detection model detects wheat spiders and armyworms, and the rice multi-projection detection model detects rice planthoppers. By using separate multi-projection detection models for the different crop categories, the influence on training of sample imbalance among the pest categories of different crops is avoided. The construction and training process is the same for every multi-projection detection model, and for each model it comprises the following steps:
A1) A number of projection convolution blocks are set.
The initial input of the projection convolution blocks is the low-level feature map of the deep residual network, and the input of each projection convolution block is sent to a convolution layer with a 3 × 3 kernel to extract pest features, followed by batch normalization and a ReLU function.
Here, the initial input of the projection convolution blocks is the low-level feature map of the deep residual network. Unlike the residual network, which is deep, the multi-projection convolutional network is shallow. The input of each projection convolution block is sent to a convolution layer with a 3 × 3 kernel to extract pest features, followed by batch normalization and ReLU, which greatly accelerate convergence and avoid the risk of exploding gradients. The multi-projection convolutional network has no pooling layer, so the resulting features retain their receptive fields.
A2) The features of the multi-projection convolutional network and the deep residual convolutional network are integrated into super-resolution features: the final result of the multi-projection convolutional network is combined with the last layer Res5 of the deep residual convolutional network ResNet-50 to generate the super-resolution features, and the weight of the low-level convolutions is increased through the several projection convolution blocks.
A3) The super-resolution features are fed to the FPN network, trained by the FPN together with the other residual blocks, and the RPN network is then used to generate a number of candidate boxes for field pest detection, for classification and regression.
During training, an ImageNet pre-trained model is first used for the initial parameters, and the model is then fine-tuned with pest images of the specific crop, i.e. one detection model is fine-tuned per crop category. To minimize the influence of misclassification and improve the robustness of the system, all images except those of the specific crop are also used to fine-tune each crop-specific detection model for a small number of iterations. The parameter fine-tuning of the multi-projection detection model comprises the following specific steps:
B1) Use a model trained on the ImageNet image set as the pre-training model of the multi-projection detection model;
B2) Fine-tune the multi-projection detection model over many iterations using all pest images of the categories related to the current crop;
B3) Fine-tune the multi-projection detection model over a small number of iterations using the pest images of the categories related to all crops.
The meaning of the parameter fine-tuning of the multi-projection detection model is as follows: in practical application, if the multi-layer perception information recognition network misidentifies the crop type in an image, for example identifying rice as wheat, the image is sent to the wheat multi-projection detection model, which is trained mainly for wheat spiders, and the rice planthoppers in the image are then likely to be missed. To avoid such errors, the wheat multi-projection detection model should retain some detection capability for rice planthoppers; it is therefore fine-tuned over many iterations on wheat spider images and over a small number of iterations on rice planthopper images.
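The asymmetric fine-tuning described above — many passes over the current crop's pest images, few passes over the other crops' images — can be sketched as a simple schedule builder; the epoch counts, names and data structure are illustrative assumptions:

```python
def fine_tune_schedule(images_by_crop, current_crop, major_epochs=10, minor_epochs=1):
    """Build the fine-tuning plan for one crop-specific detection model:
    many passes over the current crop's pest images, a small number of
    passes over every other crop's images (epoch counts are illustrative)."""
    plan = []
    for crop, images in images_by_crop.items():
        epochs = major_epochs if crop == current_crop else minor_epochs
        plan.extend(images * epochs)
    return plan

# Toy data: the wheat model sees wheat pests often, rice pests rarely.
data = {"wheat": ["wheat spider", "armyworm"], "rice": ["rice planthopper"]}
plan = fine_tune_schedule(data, "wheat")
```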
Fourth, the pest image to be detected is acquired, so that detection and identification can be performed on it.
Fifth, the pest image is detected. The pest image to be detected is first input into the multi-layer perception information recognition network to identify the crop on which the pests depend (e.g. rice); the image is then input into the multi-projection detection model corresponding to that specific crop (e.g. the rice model, detecting rice planthoppers) to detect and count the pests. The specific steps are as follows:
(1) The pest image to be detected is input into the multi-layer perception information recognition network, which detects the crop type on which the pests in the image depend.
(2) The pest image to be detected is input into the multi-projection detection model corresponding to that crop category; the super-resolution features are generated by the multi-projection detection model, then fused with the other multi-layer features through the FPN (feature pyramid network), and finally the RPN network generates a number of pest detection candidate boxes for classification and regression, so as to detect and count the pests.
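The two-stage cascade at inference time can be sketched with stub callables standing in for the trained networks; everything here is illustrative, the real classifier and detectors being the networks described above:

```python
def cascade_detect(image, crop_classifier, detectors):
    """Two-stage cascade: first identify the crop, then dispatch the image
    to that crop's multi-projection detector."""
    crop = crop_classifier(image)
    return crop, detectors[crop](image)

# Stub stand-ins for the trained networks (illustrative only);
# a detector returns (pest name, bounding box) pairs.
classifier = lambda img: "rice"
detectors = {
    "rice": lambda img: [("rice planthopper", (10, 12, 40, 44))],
    "wheat": lambda img: [],
}
crop, boxes = cascade_detect("field.jpg", classifier, detectors)
```

The dispatch table makes explicit why misclassification in stage one matters: the image only ever reaches the detector of the crop chosen by the classifier, which is what the fine-tuning step above mitigates.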
The verification results for the method of the present invention are shown below:
the detection results of the method of the invention on armyworms (wheat), rice planthoppers (rice) and wheat spiders (wheat) are shown in table 2.
TABLE 2 comparison of the recognition rates of the method of the present invention and the prior art method
[Table 2 is reproduced as an image in the original publication; the figures are not recoverable here.]
As can be seen from Table 2, the pest recognition rate of the method of the present invention is higher than that of current state-of-the-art detection methods: scale-specific detection (from P. Hu and D. Ramanan, Finding Tiny Faces, CVPR 2017), VGG-16 and ResNet-50.
As shown in Figs. 2a, 2b, 2c, 2d and 2e, the first row shows the original images, and the second and third rows show enlarged views of the detection details in the original images of the first row. It can be seen that the pests are activated more distinctly in the feature maps produced by the present method, which is more conducive to detection and counting.
Here, the present invention learns the spatial information of pests from the lower convolutional layers of ResNet-50. To verify the usefulness of the spatial information in different convolutional layers, experimental results from four different residual blocks are reported (ResNet-50 has only five residual blocks). As shown in Table 3, fusing different layers improves the method's detection performance on armyworm, rice planthopper and wheat spider, demonstrating that recursing over the spatial information in the low-level convolutional layers is important for detection. A further observation is that spatial information from shallower convolutional layers is more useful than that from deeper ones: if the multi-projection convolutional network were introduced at the third or fourth residual block, the pests' spatial information would already have been corrupted by several residual blocks.
Table 3 Comparison of recognition rates with different residual network layers as the projection convolutional network input
[Table 3 is reproduced as an image in the original publication; the figures are not recoverable here.]
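As a rough illustration of one projection convolution block operating on a low-level feature map (a 3×3 convolution followed by normalization and a ReLU, per the construction steps in the claims), here is a single-channel NumPy sketch. The naive convolution and per-map normalization are deliberate simplifications of the batch-normalized multi-channel layers; all function names are hypothetical.

```python
import numpy as np

def relu(x):
    """Rectified linear unit applied element-wise."""
    return np.maximum(x, 0.0)

def conv3x3_same(feat, kernel):
    """Naive 3x3 'same'-padded convolution over a single-channel 2-D feature map."""
    h, w = feat.shape
    padded = np.pad(feat, 1)            # zero-pad one pixel on every side
    out = np.zeros_like(feat)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def projection_block(feat, kernel):
    """One projection block: 3x3 convolution, normalization, then ReLU.

    The per-map standardization here stands in for batch normalization.
    """
    x = conv3x3_same(feat, kernel)
    x = (x - x.mean()) / (x.std() + 1e-5)
    return relu(x)
```

Stacking several such blocks over the low-level residual feature map, then combining the result with Res5, is the essence of the super-resolution feature construction described in the claims.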
The verification results for the parameter fine-tuning step of the multi-projection detection model are shown in Table 4, which compares the recognition rates with a small amount of fine-tuning on all crop pictures (with multi-projection detection model parameter fine-tuning) and without fine-tuning (without multi-projection detection model parameter fine-tuning). As can be seen from Table 4, the pest recognition rate improves correspondingly after the parameters of the multi-projection detection model are fine-tuned.
Table 4 Comparison of recognition rates before and after parameter fine-tuning of the multi-projection detection model
[Table 4 is reproduced as an image in the original publication; the figures are not recoverable here.]
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and descriptions above merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (4)

1. The pest image detection method based on the crop identification cascading technology is characterized by comprising the following steps of:
11) Acquiring a basic data image: acquiring a pest image basic data set and defining labels for it, the label categories being the pest classes of the images, geographic information, time information and environment information;
12) Constructing and training a multi-layer perception information recognition network: constructing, based on multiple kinds of perception information, a multi-layer perception information recognition network for classifying the crops on which pests depend, and training the network;
the construction and training of the multi-layer perception information identification network comprises the following steps:
121) Setting up a multi-layer perception information recognition network comprising four ResNet-50 convolutional neural networks and a decision network, the four ResNet-50 convolutional neural networks respectively classifying the crop type on which the pests in the image depend, the geographic information of image acquisition, the time information and the environment information;
122) Extracting the basic data images and the acquired multiple kinds of perception information from the pest image basic data set, encoding the multiple kinds of perception information into corresponding labels, and inputting them into the multi-layer perception information recognition network for training;
123) The four ResNet-50 convolutional neural networks of the multi-layer perception information recognition network identify the crop category label, geographic information category label, time information category label and environment information category label of the basic data image, which are combined into an image feature vector;
124) Feeding the image feature vector into the decision network to identify the crop category on which the pests in the pest image depend, the architecture of the decision network comprising a fully connected layer and a nonlinear activation function;
13) Constructing and training multi-projection detection models: constructing and training a multi-projection detection model for each crop category; the construction and training of the multi-projection detection models comprises the following steps:
131) Taking the pest image as the input of a residual network, taking the low-level feature map output by the residual network as the input of the projection network, and classifying the pest image basic data set according to crop category;
132) Constructing and training the multi-projection detection models according to the number of crop categories; this comprises the following steps:
1321) Setting a plurality of projection convolution blocks;
The initial input of the projection convolution blocks is the low-level feature map of the depth residual network; the input of each projection convolution block is fed into a convolutional layer with a 3×3 kernel to extract pest features, followed by batch normalization and a ReLU function;
1322) Integrating the features of the multi-projection convolutional network and the depth residual convolutional network into a super-resolution feature: the final output of the multi-projection convolutional network is combined with the last stage Res5 of the depth residual convolutional network ResNet-50 to generate the super-resolution feature, and the weight of the low-level convolution is increased through the plurality of projection convolution blocks;
1323) Feeding the super-resolution feature to the FPN network, training it together with the other residual blocks through the FPN network, and then using the RPN network to generate a number of candidate boxes for field pest detection for classification and regression;
14) Obtaining the pest image to be detected;
15) Detecting the pest image: first inputting the pest image to be detected into the multi-layer perception information recognition network to identify the crop category on which the pests depend; then inputting the image into the multi-projection detection model corresponding to that crop to detect and count the pests.
2. The pest image detection method based on the crop identification cascade technology according to claim 1, characterized by comprising the steps of: the crop type is wheat, rice or corn, and the multi-projection detection model is a wheat multi-projection detection model, a rice multi-projection detection model or a corn multi-projection detection model.
3. The pest image detection method based on the crop identification cascade technology according to claim 1, wherein the detection of the pest image includes the steps of:
31) Inputting the pest image to be detected into the multi-layer perception information recognition network, which detects the crop type on which the pests in the image depend;
32) Inputting the pest image to be detected into the multi-projection detection model corresponding to that crop category, generating a super-resolution feature through the multi-projection detection model, fusing the super-resolution feature with other multi-layer features through the FPN network, and finally using the RPN network to generate a number of pest detection candidate boxes for classification and regression, so as to detect and count the pests.
4. The pest image detection method based on the crop identification cascade technology according to claim 1, characterized in that: the method further comprises parameter fine-tuning of the multi-projection detection model, with the following specific steps:
41) Using a model trained on the ImageNet image set as the pre-training model of the multi-projection detection model;
42) Iteratively fine-tuning the multi-projection detection model with many iterations using all pest images of the categories related to the current crop;
43) Iteratively fine-tuning the multi-projection detection model with fewer iterations using all pest images of the categories related to all crops.
CN201811586713.5A 2018-12-25 2018-12-25 Pest image detection method based on crop identification cascading technology Active CN111178120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811586713.5A CN111178120B (en) 2018-12-25 2018-12-25 Pest image detection method based on crop identification cascading technology


Publications (2)

Publication Number Publication Date
CN111178120A CN111178120A (en) 2020-05-19
CN111178120B true CN111178120B (en) 2023-04-21

Family

ID=70653745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811586713.5A Active CN111178120B (en) 2018-12-25 2018-12-25 Pest image detection method based on crop identification cascading technology

Country Status (1)

Country Link
CN (1) CN111178120B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507319A (en) * 2020-07-01 2020-08-07 南京信息工程大学 Crop disease identification method based on deep fusion convolution network model
CN111930988A (en) * 2020-08-12 2020-11-13 柳丰 Method for rapidly identifying spider species by using computer image recognition technology
CN112232343B (en) * 2020-09-03 2023-11-21 国家粮食和物资储备局科学研究院 Grain mildew grain identification neural network and identification method
CN112508012B (en) * 2020-12-01 2022-09-06 浙大宁波理工学院 Orchard pest intelligent positioning and identifying method suitable for small target sample
CN112686862A (en) * 2020-12-30 2021-04-20 浙江托普云农科技股份有限公司 Pest identification and counting method, system and device and readable storage medium
CN112598663B (en) * 2020-12-30 2022-10-04 河南工业大学 Grain pest detection method and device based on visual saliency
CN112818982B (en) * 2021-01-19 2022-09-09 中国科学院合肥物质科学研究院 Agricultural pest image detection method based on depth feature autocorrelation activation
CN112966758B (en) * 2021-03-12 2022-04-15 中化现代农业有限公司 Crop disease, insect and weed identification method, device and system and storage medium
CN117557914B (en) * 2024-01-08 2024-04-02 成都大学 Crop pest identification method based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016165082A1 (en) * 2015-04-15 2016-10-20 中国科学院自动化研究所 Image stego-detection method based on deep learning
CN106845401A (en) * 2017-01-20 2017-06-13 中国科学院合肥物质科学研究院 A kind of insect image-recognizing method based on many spatial convoluted neutral nets
CN107016405A (en) * 2017-02-24 2017-08-04 中国科学院合肥物质科学研究院 A kind of insect image classification method based on classification prediction convolutional neural networks
CN108648191A (en) * 2018-05-17 2018-10-12 吉林大学 Pest image-recognizing method based on Bayes's width residual error neural network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pest recognition algorithm based on deep learning and sparse representation; Zhang Miaohui et al.; Journal of Henan University (Natural Science Edition), No. 2; full text *
Application of deep learning to feature extraction and classification of stored-grain pests; Cheng Xi et al.; Journal of West Anhui University, No. 5; full text *

Also Published As

Publication number Publication date
CN111178120A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN111178120B (en) Pest image detection method based on crop identification cascading technology
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN108537136B (en) Pedestrian re-identification method based on attitude normalization image generation
CN111611874B (en) Face mask wearing detection method based on ResNet and Canny
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN108960404B (en) Image-based crowd counting method and device
CN113076994B (en) Open-set domain self-adaptive image classification method and system
CN106897681A (en) A kind of remote sensing images comparative analysis method and system
CN102385592B (en) Image concept detection method and device
CN114998220B (en) Tongue image detection and positioning method based on improved Tiny-YOLO v4 natural environment
CN110717554A (en) Image recognition method, electronic device, and storage medium
US11694428B1 (en) Method for detecting Ophiocephalus argus cantor under intra-class occulusion based on cross-scale layered feature fusion
CN112766218B (en) Cross-domain pedestrian re-recognition method and device based on asymmetric combined teaching network
CN113420745B (en) Image-based target identification method, system, storage medium and terminal equipment
CN109635634A (en) A kind of pedestrian based on stochastic linear interpolation identifies data enhancement methods again
CN110956080A (en) Image processing method and device, electronic equipment and storage medium
CN110569780A (en) high-precision face recognition method based on deep transfer learning
CN110599463A (en) Tongue image detection and positioning algorithm based on lightweight cascade neural network
CN110827265A (en) Image anomaly detection method based on deep learning
CN110751226A (en) Crowd counting model training method and device and storage medium
CN113435355A (en) Multi-target cow identity identification method and system
CN111444816A (en) Multi-scale dense pedestrian detection method based on fast RCNN
CN112183504A (en) Video registration method and device based on non-contact palm vein image
CN117115614B (en) Object identification method, device, equipment and storage medium for outdoor image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant