CN116597312A

CN116597312A - Crop leaf disease and pest identification method based on small sample image semantic segmentation

Info

Publication number: CN116597312A
Application number: CN202310577254.9A
Authority: CN
Inventors: 李雅琴; 王丹丹; 袁操; 曾山
Original assignee: Wuhan Polytechnic University
Current assignee: Wuhan Polytechnic University
Priority date: 2023-05-22
Filing date: 2023-05-22
Publication date: 2023-08-15

Abstract

The invention discloses a method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation, which is applied to the technical field of identifying crop leaf diseases and insect pests and comprises the following steps: collecting images of crop disease and insect pest leaves, dividing the images into two groups of images of a support set and a query set, and carrying out image segmentation labeling on the images in the support set to obtain a support set with labels and a query set without labels; carrying out feature extraction on images in a support set and a query set by adopting a pre-trained deep learning network model to obtain a feature map of the support set and a feature map of the query set; using a dense comparison module to carry out dense comparison on the support set feature map and the query set feature map so as to obtain a preliminary segmentation image of the query set; and optimizing the preliminary segmentation image of the query set by using an iterative optimization module to obtain an accurate segmentation image of the query set. The method can realize the identification of crop leaf diseases and insect pests by using a very small number of marked images.

Description

Crop leaf disease and pest identification method based on small sample image semantic segmentation

Technical Field

The invention relates to the technical field of crop leaf disease and insect pest identification, in particular to a method for identifying crop leaf disease and insect pest based on small sample image semantic segmentation.

Background

Small sample size is a significant challenge in agricultural image segmentation, because acquiring large amounts of annotated data for training is often impractical and time consuming. However, accurate image segmentation is critical for various agricultural applications such as crop monitoring, yield estimation and disease diagnosis. Conventional supervised deep learning based segmentation methods can achieve good results when trained on large amounts of labeled data. However, training these networks on agricultural images is often difficult for two reasons: first, a sufficient amount of expert-annotated data is lacking, because such annotation requires not only knowledge of the relevant lesions but also cost and time; second, however many lesion segmentation classes are already covered, it is impractical to train a new, specific model for every unseen class.

In recent years, the development of deep learning has provided a new approach to plant disease and pest detection. Deep learning based plant disease and pest detection methods mainly comprise convolutional neural network (CNN) based methods, recurrent neural network (RNN) based methods, image segmentation based methods and the like. Convolutional neural networks are widely applied to the segmentation of plant diseases and insect pests and have achieved remarkable performance improvements. However, acquiring large amounts of labeled data for training is often impractical and time consuming, and convolutional neural network models are prone to overfitting on small sample data, which reduces the accuracy of the classification and segmentation results.

To solve this problem, researchers have begun to improve the performance of plant disease and pest segmentation with small sample (few-shot) learning, which has become a promising approach to agricultural image segmentation. For a class it has never seen, a small sample model can learn the features of that class from only a few annotated examples, which allows the model to segment new targets effectively using limited labeled data, without retraining on large amounts of labeled data. If small sample learning is applied to agricultural images, the user can effectively segment rare or new lesions using only a small number of labeled samples. Small sample learning methods comprise data augmentation, meta-learning, transfer learning and the like. Data augmentation expands the scale and diversity of the training data set by performing operations such as transformation, cropping, rotation and noise addition on the original images, thereby alleviating overfitting. Meta-learning improves the generalization ability of a model on new tasks by pre-training a generic model or optimizer on a large-scale dataset and then fine-tuning or updating it on a small sample dataset. Transfer learning reduces the amount of training data required in the target domain by using knowledge already learned in the source domain to assist in learning related tasks in the target domain.

Therefore, providing a method for identifying crop leaf diseases and insect pests based on semantic segmentation of small sample images, so as to solve the problems of the prior art, is a problem to be solved by those skilled in the art.

Disclosure of Invention

In view of the above, the invention provides a method for identifying crop leaf diseases and insect pests based on semantic segmentation of small sample images, and by using the method, the identification of crop leaf diseases and insect pests can be realized by using a very small number of labeling images.

In order to achieve the above object, the present invention provides the following technical solutions:

A crop leaf disease and insect pest identification method based on small sample image semantic segmentation comprises the following steps:

S1, acquiring images of crop disease and insect pest leaves, dividing the images into two groups, a support set and a query set, and carrying out image segmentation labeling on the images in the support set to obtain a support set with labels and a query set without labels;

S2, carrying out feature extraction on the images in the support set and the query set by adopting a pre-trained deep learning network model to obtain a feature map of the support set and a feature map of the query set;

S3, performing dense comparison on the support set feature map and the query set feature map by using a dense comparison module to obtain a preliminary segmentation image of the query set;

S4, optimizing the preliminary segmentation image of the query set by using an iterative optimization module to obtain an accurate segmentation image of the query set.

Optionally, in S1, the support set image is segmented by a Canny edge detector and the segmented regions are manually labeled.

Optionally, in S2, feature extraction is performed on the support set and query set images using the deep learning network model ResNet.

Optionally, the network in ResNet is represented by the following mathematical model:

(1) Convolution layer: for the input image I and the convolution kernel K, the convolution operation can be expressed as:

$$C(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)$$

where C(i, j) is the element in the i-th row and j-th column of the output feature map;

(2) Pooling layer: for the input feature map F, the pooling operation can be expressed as:

$$P(i, j) = \max_{0 \le a < m,\ 0 \le b < n} F(i \cdot s + a,\ j \cdot s + b)$$

where P(i, j) is the element in the i-th row and j-th column of the output feature map, s is the pooling stride, and m and n are the height and width of the pooling window;

(3) Residual module: for the input feature map X and the residual function F, the residual module can be expressed as:

$$Y = F(X) + X$$

where Y is the output feature map and F(X) is a nonlinear function composed of convolution layers and an activation function;

(4) Global average pooling layer: for the input feature map G, the global average pooling layer can be expressed as:

$$v_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} G(i, j, c)$$

where $v_c$ is the c-th element of the output vector, H and W are the height and width of the input feature map, and c is the channel index.

Optionally, in S3, the pest category of the query set is calculated by comparing the similarity between the support set feature map and the query set feature map.

Optionally, in the dense comparison module of S3, the cosine distance is used as the measure of similarity between feature maps; for two vectors $A(x_1, x_2, \dots, x_n)$ and $B(y_1, y_2, \dots, y_n)$ in space, the cosine distance is:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where $\cdot$ represents the dot product and $\|A\|$ represents the modulus (length) of the vector A, which is calculated as follows:

$$\|A\| = \sqrt{\sum_{i=1}^{n} x_i^2}$$

Optionally, in S3, when the support set contains multiple images, a more accurate support set feature map is obtained by using an attention mechanism module; the attention mechanism module consists of two convolution blocks, the first having 256 3 × 3 filters followed by a 3 × 3 max pooling layer; the second convolution block has a 3 × 3 convolution layer followed by a global average pooling layer; the output of the attention branch is λ, representing the weight of each support sample.

Optionally, the SoftMax function is used to normalize the weights of all the support samples so that their sum is 1, and the specific calculation formula is as follows:

$$\hat{\lambda}_i = \frac{e^{\lambda_i}}{\sum_{j=1}^{k} e^{\lambda_j}}$$

where $\lambda_i$ is the attention output for the i-th support sample and k is the number of support samples.

Compared with the prior art, the invention discloses a crop leaf disease and pest identification method based on small sample image semantic segmentation, which has the following beneficial effects:

(1) Step 1 uses an edge detection algorithm based on the Canny operator: the image is smoothed with a Gaussian filter, the gradient strength and direction of each pixel in the image are calculated, non-maximum suppression is applied to the gradient amplitude, and a double-threshold algorithm is used to determine the real and potential edges. The Canny operator is insensitive to noise, can detect genuine weak edges, and is a very effective edge detection method.

(2) Step 2 performs feature extraction on the images in the support set and the query set with a ResNet deep learning network. Compared with a traditional CNN, ResNet introduces the concept of residual blocks, which add a block's input to its output through skip connections, thereby alleviating the problem of vanishing or exploding gradients. As a result, ResNet can build very deep feature extraction networks and thus mine more image features.

(3) Step 3 compares the similarity of the support set feature map and the query set feature map by using the cosine distance to obtain a preliminary segmentation image of the query set. The cosine distance (cosine similarity) measures the similarity between two vectors by the cosine of the angle between them. It is usually used in positive space and thus gives a value between 0 and 1. It represents a relative difference in direction, irrespective of absolute differences in value. Its advantage is that it performs better than the Euclidean distance on high-dimensional data.

(4) Step 4 takes the preliminary segmentation image of the query set from step 3 and optimizes its edges by using an iterative residual connection method to obtain an accurate segmentation image of the query set.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a method for identifying crop leaf diseases and insect pests based on semantic segmentation of small sample images.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

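Referring to Fig. 1, the invention discloses a crop leaf disease and pest identification method based on small sample image semantic segmentation, which comprises the following steps:

S1, acquiring images of crop disease and insect pest leaves, dividing the images into two groups, a support set and a query set, and carrying out image segmentation labeling on the images in the support set to obtain a support set with labels and a query set without labels;

S2, carrying out feature extraction on the images in the support set and the query set by adopting a pre-trained deep learning network model to obtain a feature map of the support set and a feature map of the query set;

S3, performing dense comparison on the support set feature map and the query set feature map by using a dense comparison module to obtain a preliminary segmentation image of the query set;

S4, optimizing the preliminary segmentation image of the query set by using an iterative optimization module to obtain an accurate segmentation image of the query set.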

Further, S1 is specifically:

Since the disease and pest regions of the support set images need to be labeled, these regions must first be separated from their surroundings by using an image segmentation algorithm. Edge-based image segmentation algorithms achieve segmentation by detecting the edges between different regions in an image, typically using first or second derivatives to find where the pixel gray values change sharply, and then connecting the edge points to form closed regions. Edge-based image segmentation algorithms include the Canny edge detector, the Sobel operator, the Laplacian operator, the watershed algorithm and the like.

The Canny operator is a multi-stage edge detection operator: the image is smoothed with a Gaussian filter, the gradient strength and direction of each pixel in the image are calculated, non-maximum suppression is applied to the gradient amplitude, and a double-threshold algorithm is used to determine the real and potential edges. The Canny operator is insensitive to noise, can detect genuine weak edges, and is a very effective edge detection method. The main steps of the Canny edge detector are as follows:

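Gaussian smoothing filtering: the image is convolved with a Gaussian filter kernel generated by the equation:

$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$

where x and y are the offsets from the center point of the kernel and σ is the standard deviation.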
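As an illustration of the Gaussian smoothing step, the following minimal sketch builds a normalized Gaussian kernel and applies it to a grayscale image stored as a list of rows. The function names `gaussian_kernel` and `smooth`, the renormalization of the sampled kernel, and the border handling (replicating edge pixels) are assumptions made for this example and are not specified in the description.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample a 2-D Gaussian on a size x size grid centered at (0, 0).

    The sampled values are renormalized so they sum to 1 (an assumption of
    this sketch), which keeps the overall image brightness unchanged.
    """
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    half = size // 2
    kernel = [
        [
            math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            / (2.0 * math.pi * sigma * sigma)
            for x in range(-half, half + 1)
        ]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def smooth(image, kernel):
    """Filter a 2-D image with the kernel, replicating border pixels.

    The Gaussian kernel is symmetric, so correlation and convolution
    give the same result here.
    """
    h, w = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(-half, half + 1):
                for b in range(-half, half + 1):
                    r = min(max(i + a, 0), h - 1)  # clamp row index
                    c = min(max(j + b, 0), w - 1)  # clamp column index
                    acc += image[r][c] * kernel[a + half][b + half]
            out[i][j] = acc
    return out
```

A larger σ spreads the weights further from the center and therefore removes more noise, at the cost of blurring weak edges.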

Calculating gradient amplitude and direction: the gradient amplitude G and the direction θ of each pixel in the image are calculated by using a first-order finite difference operator (such as the Sobel operator), and the calculation formulas are as follows:

$$G = \sqrt{G_x^2 + G_y^2}$$

$$\theta = \arctan(G_y / G_x)$$

where $G_x$ and $G_y$ are the gradient values in the X and Y directions, which can be obtained by convolving the image with the Sobel operator.

Let an n × n window in the image be A and the Sobel operators be $S_x$ and $S_y$, both with the same dimensions as A. Then:

$$G_x = S_x * A, \quad G_y = S_y * A$$

where $*$ is the convolution operator.
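The gradient computation can be sketched as follows, assuming the common 3 × 3 Sobel kernels. The names `SOBEL_X`, `SOBEL_Y` and `sobel_gradients` are illustrative. Two further assumptions: the kernels are applied as a cross-correlation (the usual image-processing convention; a true convolution flips the kernel, which only changes the sign of the gradient), and `math.atan2` is used in place of arctan(G_y / G_x) to avoid division by zero.

```python
import math

# Common 3 x 3 Sobel kernels (assumed; the text does not list the values).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradients(image):
    """Return (magnitude, direction) maps for a 2-D grayscale image.

    Border pixels are replicated so the output has the same size as the input.
    Directions are in radians, measured with x along columns and y along rows.
    """
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    theta = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = gy = 0.0
            for a in range(3):
                for b in range(3):
                    r = min(max(i + a - 1, 0), h - 1)
                    c = min(max(j + b - 1, 0), w - 1)
                    gx += SOBEL_X[a][b] * image[r][c]
                    gy += SOBEL_Y[a][b] * image[r][c]
            mag[i][j] = math.hypot(gx, gy)    # sqrt(gx^2 + gy^2)
            theta[i][j] = math.atan2(gy, gx)  # safe form of arctan(gy / gx)
    return mag, theta
```

For an image with a vertical step edge, the magnitude is large at the step and the direction is close to 0, meaning the gradient points across the edge along the X axis.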

Non-maximum suppression: the local maxima of the gradient amplitude are determined along the gradient direction, and non-maximum points are suppressed to obtain a thinned edge image. The specific steps are as follows:

1) The gradient intensity of the current pixel is compared with two pixels along the positive and negative gradient directions.

2) If the gradient intensity of the current pixel is maximum compared to the other two pixels, the pixel point remains as an edge point, otherwise the pixel point will be suppressed.
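The two steps above can be sketched as follows. The function name `non_max_suppression` is illustrative, and the sketch makes three assumptions not stated in the text: the gradient direction is quantized to four axes (0°, 45°, 90°, 135°), neighbors outside the image count as 0, and ties are kept.

```python
import math


def non_max_suppression(mag, theta):
    """Keep a pixel only if its magnitude is a local maximum along its gradient axis.

    mag and theta are 2-D lists of gradient magnitude and direction (radians),
    with x along columns and y along rows.
    """
    h, w = len(mag), len(mag[0])

    def at(r, c):
        # Pixels outside the image are treated as having zero gradient.
        return mag[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            angle = math.degrees(theta[i][j]) % 180.0
            if angle < 22.5 or angle >= 157.5:
                di, dj = 0, 1    # gradient along the row (horizontal axis)
            elif angle < 67.5:
                di, dj = 1, 1    # 45-degree diagonal
            elif angle < 112.5:
                di, dj = 1, 0    # gradient along the column (vertical axis)
            else:
                di, dj = 1, -1   # 135-degree diagonal
            ahead = at(i + di, j + dj)
            behind = at(i - di, j - dj)
            if mag[i][j] >= ahead and mag[i][j] >= behind:
                out[i][j] = mag[i][j]
    return out
```

The result is an edge map that is roughly one pixel wide, which is what the subsequent double-threshold step operates on.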

Double threshold detection: the edge image is screened by using two thresholds (a high threshold T1 and a low threshold T2); strong edge points are retained, points below the low threshold are excluded, and weak edge points lying between the two thresholds are kept as candidates for connection, so as to obtain the final edge image. The specific steps are as follows:

1) If the gradient value of the edge pixel is higher than the high threshold T1, marking the edge pixel as a strong edge pixel;

2) If the gradient value of the edge pixel is less than the high threshold T1 and greater than the low threshold T2, it is marked as a weak edge pixel;

3) If the gradient value of the edge pixel is smaller than the low threshold T2, it is suppressed.

Suppressing isolated low threshold points: each weak edge pixel and its 8 neighborhood pixels are examined, and the weak edge pixel is retained as a true edge as long as at least one of its neighbors is a strong edge pixel. The specific steps are as follows:

1) Traversing all weak edge pixels;

2) For each weak edge pixel, checking whether 8 neighborhood pixels thereof have strong edge pixels;

3) If so, reserving the weak edge pixels; if not, the weak edge pixels are suppressed.
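The double-threshold classification and the suppression of isolated weak pixels described above can be sketched together. The function name `hysteresis_threshold` is illustrative. The text does not say how values exactly equal to a threshold are treated, so this sketch assumes that a value equal to T1 is weak and a value equal to T2 is suppressed.

```python
def hysteresis_threshold(mag, high, low):
    """Return a boolean edge map from a gradient-magnitude map.

    Strong pixels (mag > high) are always edges. Weak pixels
    (low < mag <= high) are edges only if one of their 8 neighbors is strong.
    All other pixels are suppressed.
    """
    h, w = len(mag), len(mag[0])
    strong = [[mag[i][j] > high for j in range(w)] for i in range(h)]
    weak = [[low < mag[i][j] <= high for j in range(w)] for i in range(h)]
    edges = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if strong[i][j]:
                edges[i][j] = True
            elif weak[i][j]:
                # Scan the 3 x 3 neighborhood, clipped at the image border.
                edges[i][j] = any(
                    strong[r][c]
                    for r in range(max(i - 1, 0), min(i + 2, h))
                    for c in range(max(j - 1, 0), min(j + 2, w))
                )
    return edges
```

For example, with the single row `[0, 5, 9, 5, 0, 5]`, `high=8` and `low=3`, the two weak pixels next to the strong pixel are kept and the isolated weak pixel at the end is suppressed.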

Further, S2 is specifically:

In the field of image recognition, deep learning is a very effective tool, and convolutional neural networks perform particularly well in feature extraction. A convolutional neural network convolves the pixels of an image with different convolution kernels, and different kernels correspond to features of different dimensions. Through the convolutional neural network, latent features that are not directly visible in the image can be extracted, so that the model can classify specific images by their image features. On this basis, ResNet introduces the concept of residual blocks, which add a block's input to its output through skip connections, thereby alleviating the problem of vanishing or exploding gradients. As a result, ResNet can build very deep feature extraction networks and thus mine more image features. The specific feature extraction structure of ResNet varies with depth, and can generally be divided into the following parts:

First layer: a 7 × 7 convolutional layer with stride 2 and 64 output channels, followed by a 3 × 3 max pooling layer with stride 2.

Second layer: a plurality of residual modules, each comprising two 3 × 3 convolution layers, with 64 output channels and stride 1.

Third layer: a plurality of residual modules, each comprising three convolution layers (1 × 1, 3 × 3 and 1 × 1), with 128 output channels and stride 2.

Fourth layer: a plurality of residual modules, each comprising three convolution layers (1 × 1, 3 × 3 and 1 × 1), with 256 output channels and stride 2.

Fifth layer: a plurality of residual modules, each comprising three convolution layers (1 × 1, 3 × 3 and 1 × 1), with 512 output channels and stride 2.

Sixth layer: a global average pooling layer that converts the feature map of the last layer into a one-dimensional vector.

In the second to fifth layers, ResNet-18 has 2, 2, 2, 2 residual modules, ResNet-34 has 3, 4, 6, 3 residual modules, and ResNet-50 has 3, 4, 6, 3 residual modules. Specifically, the operations of the network can be represented by the following mathematical model:

Convolution layer: for the input image I and the convolution kernel K, the convolution operation can be expressed as:

$$C(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)$$

where C(i, j) is the element in the i-th row and j-th column of the output feature map.

Pooling layer: for the input feature map F, the pooling operation can be expressed as:

$$P(i, j) = \max_{0 \le a < m,\ 0 \le b < n} F(i \cdot s + a,\ j \cdot s + b)$$

where P(i, j) is the element in the i-th row and j-th column of the output feature map, s is the pooling stride, and m and n are the height and width of the pooling window.

Residual module: for the input feature map X and the residual function F, the residual module can be expressed as:

$$Y = F(X) + X$$

where Y is the output feature map and F(X) is a nonlinear function consisting of convolution layers and an activation function.

Global average pooling layer: for the input feature map G, the global average pooling layer can be expressed as:

$$v_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} G(i, j, c)$$

where $v_c$ is the c-th element of the output vector, H and W are the height and width of the input feature map, and c is the channel index.
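The four operations of the mathematical model above (convolution, pooling, residual addition and global average pooling) can be illustrated on small nested lists. This is a minimal sketch and not a ResNet implementation: the function names are illustrative, max pooling is assumed for the pooling layer, no padding is used, and the feature map passed to `global_average_pool` is stored channel-first as a list of H × W channels.

```python
def conv2d(image, kernel):
    """C(i, j) = sum over m, n of I(i + m, j + n) * K(m, n), without padding."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n]
                for m in range(kh) for n in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def max_pool(fmap, m, n, s):
    """Maximum over each m x n window, moving with stride s."""
    oh = (len(fmap) - m) // s + 1
    ow = (len(fmap[0]) - n) // s + 1
    return [
        [
            max(fmap[i * s + a][j * s + b] for a in range(m) for b in range(n))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def residual(x, f):
    """Y = F(X) + X; f must return a map with the same shape as x."""
    fx = f(x)
    return [[a + b for a, b in zip(row_f, row_x)] for row_f, row_x in zip(fx, x)]


def global_average_pool(channels):
    """v_c = mean of all H x W values in channel c."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in channels
    ]
```

In `residual`, the skip connection is the `+ b` term: even if `f` contributes almost nothing, the input still passes through unchanged, which is why very deep stacks of such blocks remain trainable.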

Further, S3 is specifically:

In the feature extraction stage, the invention uses the same feature extraction module to extract the features of the support set and the query set, obtaining the feature map of the support set and the feature map of the query set. The invention then uses a dense comparison module to perform a dense comparison of the support set feature map and the query set feature map. Because the support set consists of images labeled by image segmentation, the disease and pest category of each support set image is known, and the disease and pest category of the query set can be calculated by comparing the similarity between the feature map of the support set and the feature map of the query set. In particular, in the dense comparison module of the present invention, the cosine distance is used as the measure of similarity between feature maps.

When comparing the similarity between features, the Euclidean distance is typically chosen as the comparison function. The Euclidean distance is the actual distance between two points in n-dimensional space, that is, the straight-line distance between the two points calculated by the Pythagorean theorem. For two vectors $A(x_1, x_2, \dots, x_n)$ and $B(y_1, y_2, \dots, y_n)$, the Euclidean distance can be expressed as:

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Its advantages are that it is simple and easy to implement. Its disadvantage is that it is not scale-invariant, which means that the calculated distance may be skewed by the units of the individual elements. Typically, the data needs to be normalized before this distance metric can be used. Furthermore, as the dimension of the data increases, the Euclidean distance becomes less useful.

Unlike the Euclidean distance, the cosine distance (cosine similarity) measures the similarity between two vectors by the cosine of the angle between them. It is usually used in positive space and thus gives a value between 0 and 1. It represents a relative difference in direction, irrespective of absolute differences in value. Its advantage is that it performs better than the Euclidean distance on high-dimensional data. Its disadvantage is that it does not consider the magnitude of the vectors, only their direction. For the two vectors A and B above, the cosine distance is:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}, \quad \|A\| = \sqrt{\sum_{i=1}^{n} x_i^2}$$
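The difference between the two measures can be seen in a short sketch. The function names are illustrative. `dense_compare` is a hypothetical simplification in which the support set is summarized by a single feature vector (`prototype`) that is compared with the feature vector at every spatial position of the query feature map; the description does not specify the comparison at this level of detail. Returning 0.0 for a zero-length vector is also an assumption of this sketch.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; depends on direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)


def dense_compare(query_features, prototype):
    """Similarity map: one cosine score per spatial position.

    query_features is an H x W grid of feature vectors; prototype is a
    single support feature vector (hypothetical simplification).
    """
    return [
        [cosine_similarity(vec, prototype) for vec in row]
        for row in query_features
    ]
```

For instance, `[1, 2, 3]` and `[2, 4, 6]` point in the same direction, so their cosine similarity is 1 even though their Euclidean distance is √14. This is the scale insensitivity that the paragraph above describes.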

Further, S4 is specifically:

First, the output feature map of the dense comparison module and the prediction mask of the previous iteration are taken as inputs and combined through a residual connection, whose formula is as follows:

$$M_t = x + F(x, y_{t-1})$$

where x is the output feature of the dense comparison module, $y_{t-1}$ is the prediction mask of the previous iteration step, $M_t$ is the output of the residual block, and the function F(·) is the concatenation of the feature x and the prediction mask $y_{t-1}$.

Then, two ordinary residual blocks and an atrous spatial pyramid pooling (ASPP) module are used to capture multi-scale information and output a finer prediction mask.

And finally, taking the prediction mask as the input of the next iteration, and repeating the process until the preset iteration times are reached.
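The control flow of this iterative optimization can be sketched as a simple loop. Everything learned is replaced by placeholders here: `fuse` is a hypothetical stand-in for F, assumed to map the concatenation of x and the previous mask back to the shape of x so that the residual sum is defined, and `predict` is a hypothetical stand-in for the residual blocks and ASPP module that produce the next mask. Features and masks are flat lists of numbers to keep the example short.

```python
import math


def refine(x, initial_mask, fuse, predict, num_iters):
    """Iteratively refine a mask with M_t = x + F(x, y_{t-1}).

    x            : output feature of the dense comparison module (list of floats)
    initial_mask : prediction mask used for the first iteration
    fuse         : placeholder for F; takes the concatenation of x and the
                   previous mask and returns a list with the same length as x
    predict      : placeholder for the layers that turn M_t into a new mask
    num_iters    : preset number of iterations
    """
    y = initial_mask
    for _ in range(num_iters):
        f_out = fuse(x + y)  # x + y is list concatenation here
        m_t = [xi + fi for xi, fi in zip(x, f_out)]  # residual connection
        y = predict(m_t)     # the new mask feeds the next iteration
    return y


def toy_fuse(z):
    # Toy placeholder: shift each feature by (previous mask value - 0.5).
    n = len(z) // 2
    return [z[n + i] - 0.5 for i in range(n)]


def toy_predict(m):
    # Toy placeholder: sigmoid turns features into mask probabilities.
    return [1.0 / (1.0 + math.exp(-v)) for v in m]


mask = refine([2.0, -2.0], [0.0, 0.0], toy_fuse, toy_predict, num_iters=3)
```

Because x is added back in every iteration, each pass starts from the original comparison features and only uses the previous mask as a correction.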

Specifically, in step S3, if the support set contains multiple images, a more accurate support set feature map is obtained by using the attention mechanism module, as follows:

When the support set contains more than one image, the present invention employs an attention mechanism to fuse the comparison results of the different support samples. Simply averaging the comparison results of the different support samples, or combining them with a logical OR, is not reasonable, because doing so ignores the differences among the support samples and their differing relevance to the query sample. Thus, the present invention adds an attention branch to the dense comparison module of step S3 for calculating the weight of each support sample. The attention branch consists of two convolution blocks, the first having 256 3 × 3 filters followed by a 3 × 3 max pooling layer; the second convolution block has a 3 × 3 convolution layer followed by a global average pooling layer. The output of the attention branch is λ, representing the weight of each support sample. The weights of all support samples are then normalized with the SoftMax function so that their sum is 1, according to the following formula:

$$\hat{\lambda}_i = \frac{e^{\lambda_i}}{\sum_{j=1}^{k} e^{\lambda_j}}$$

where $\lambda_i$ is the attention output for the i-th support sample and k is the number of support samples.

Finally, the features of the different support samples are weighted by the normalized weights and summed to obtain the final output features.
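The normalization and weighted sum can be sketched as follows. The function names are illustrative, each support sample's features are represented as a flat list of floats, and subtracting the maximum score before exponentiation is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    """Normalize raw attention outputs so they are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def fuse_support_features(features, lambdas):
    """Weighted sum of per-sample feature vectors.

    features : one feature vector per support sample, all the same length
    lambdas  : raw attention output (lambda) for each support sample
    """
    weights = softmax(lambdas)
    dim = len(features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
```

With equal attention outputs the fusion reduces to a plain average; a support sample with a larger λ contributes proportionally more to the fused feature.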

In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may be referred to one another. The device disclosed in the embodiments corresponds to the method disclosed in the embodiments, so its description is relatively brief, and the relevant details can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

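1. A method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation, characterized by comprising the following steps:

S1, acquiring images of crop disease and insect pest leaves, dividing the images into two groups, a support set and a query set, and carrying out image segmentation labeling on the images in the support set to obtain a support set with labels and a query set without labels;

S2, carrying out feature extraction on the images in the support set and the query set by adopting a pre-trained deep learning network model to obtain a feature map of the support set and a feature map of the query set;

S3, performing dense comparison on the support set feature map and the query set feature map by using a dense comparison module to obtain a preliminary segmentation image of the query set;

S4, optimizing the preliminary segmentation image of the query set by using an iterative optimization module to obtain an accurate segmentation image of the query set.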

2. The method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation according to claim 1, wherein,

in S1, the support set images are segmented by a Canny edge detector, and the segmented regions are labeled manually.

3. The method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation according to claim 1, wherein,

in S2, feature extraction is performed on the support set and query set images using the deep learning network model ResNet.

4. The method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation according to claim 3, wherein,

the network in ResNet is represented by the following mathematical model:

$$Y = F(X) + X$$

5. The method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation according to claim 1, wherein,

in S3, the disease and pest category of the query set is calculated by comparing the similarity between the feature map of the support set and the feature map of the query set.

6. The method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation according to claim 5, wherein,

in the dense comparison module of S3, the cosine distance is used as the measure of similarity between feature maps, and for two vectors A and B in space, the cosine distance is:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

7. The method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation according to claim 5, wherein,

in S3, when the support set contains a plurality of images, a more accurate support set feature map is obtained by using an attention mechanism module; the attention mechanism module consists of two convolution blocks, the first having 256 3 × 3 filters followed by a 3 × 3 max pooling layer; the second convolution block has a 3 × 3 convolution layer followed by a global average pooling layer; the output of the attention branch is λ, representing the weight of each support sample.

8. The method for identifying crop leaf diseases and insect pests based on small sample image semantic segmentation according to claim 7, wherein,

in S3, the weights of all support samples are normalized using the SoftMax function so that their sum is 1, and the specific calculation formula is as follows:

$$\hat{\lambda}_i = \frac{e^{\lambda_i}}{\sum_{j=1}^{k} e^{\lambda_j}}$$

where $\lambda_i$ is the attention output for the i-th support sample and k is the number of support samples.