CN116758415A - Lightweight pest identification method based on two-dimensional discrete wavelet transformation


Info

Publication number
CN116758415A
CN116758415A
Authority
CN
China
Prior art keywords
discrete wavelet
dimensional discrete
network
wavelet transform
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310616466.3A
Other languages
Chinese (zh)
Inventor
李晖
谭廷俊
胡欣仪
唐栩燃
罗伟
赵雪如
李超然
赵泽华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202310616466.3A
Publication of CN116758415A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/188 Vegetation
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/48 Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention relates to a lightweight pest identification method based on the two-dimensional discrete wavelet transform (2D-DWT), which comprises the following steps: a two-dimensional discrete wavelet transform module (2D-DWTM) performs spatial multi-scale feature fusion and downsampling on an input image, extracting features while leaving the image scale unchanged; a residual attention module (RAM) raises the network's attention to important features while alleviating the vanishing-gradient problem, thereby improving the network's representation capability; global average pooling (avgpool) compresses the feature channels, reducing the size and complexity of the feature map and improving the generalization ability of the model; finally, a fully connected layer (Linear) serves as the classification module to complete classification. By adopting multi-scale feature fusion and an attention-weighted fusion mechanism, the invention effectively reduces the interference of complex backgrounds with pest classification and improves classification accuracy. At the same time, by optimizing the memory footprint of the model, efficient classification is achieved, industrial deployment is facilitated, and an important contribution is made to the field of agricultural protection.

Description

Lightweight pest identification method based on two-dimensional discrete wavelet transformation
Technical Field
The invention belongs to the field of deep learning and relates to a lightweight pest identification method based on the two-dimensional discrete wavelet transform.
Background Art
With the continuing intensification of the greenhouse effect, agricultural and forestry pests and diseases are increasing, and reductions in grain yield are becoming more and more common. Solving this problem requires careful prevention and control measures. At present, pests are generally identified manually, but this approach is costly, inefficient, and labor-intensive. Therefore, an efficient, low-cost automatic pest identification algorithm needs to be researched to reduce production costs and improve agricultural efficiency.
Traditional automatic pest identification algorithms generally adopt machine learning techniques and proceed in three steps: image preprocessing, feature extraction, and classification. In the preprocessing stage, the algorithm enhances salient regions in the image and removes background noise. In the feature extraction stage, it extracts features such as the color, texture, and shape of the pest image. Finally, it classifies the pest image using a support vector machine (Support Vector Machine, SVM), AdaBoost, an artificial neural network (Artificial Neural Network, ANN), or a similar classifier. However, such algorithms suffer from low accuracy, poor robustness, and excessive dependence on hand-crafted features in the feature extraction process.
Deep learning technology is now widely applied to automatic pest identification, where methods based on convolutional neural networks have become the mainstream. To achieve accurate identification of pests against complex backgrounds, researchers have proceeded along the following three lines. The first is pest identification based on visual saliency features: salient regions of the input image are extracted and highlighted, and a convolutional neural network then performs feature extraction and classification. However, because pest images often contain complex background interference such as color and texture, conventional saliency algorithms have difficulty extracting high-level semantic information. The second is pest identification combined with an attention mechanism, which adds channel or spatial attention to the convolutional neural network to strengthen its feature extraction capability. However, the attention mechanism increases the number of model parameters, the spatial relationships between features are handled insufficiently, and a convolutional neural network with a small receptive field cannot extract high-level semantic information, so a good balance between accuracy and speed cannot be achieved. The third is pest identification with model ensembles: different models are trained, and their structures and weights are then integrated into a new pest identification model with higher accuracy. However, because this approach takes accuracy as its only criterion, the number of model parameters is large and the training cost is high; training and deployment often require high-performance computing equipment, which is a significant limitation in weak-signal environments such as farmland and mountainous areas.
Disclosure of Invention
In view of the above, an object of the present invention is to exploit the characteristics of the two-dimensional discrete wavelet transform: it enables lossless feature extraction from an image, reduces the spatial size of the feature map so as to lower the amount of computation, and expands the receptive field faster than convolution does. In addition, for pest identification in complex environments, a residual attention module is added, which raises the network's attention to key features while alleviating the vanishing-gradient problem.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A lightweight pest identification method based on the two-dimensional discrete wavelet transform comprises the following steps:
S1: performing spatial multi-scale feature fusion and downsampling on an input image using a two-dimensional discrete wavelet transform module, and performing feature extraction while the image scale remains unchanged;
S2: using a residual attention module to raise the network's attention to important features while alleviating the vanishing-gradient problem of the network, thereby improving the representation capability of the network;
S3: compressing the feature channels using global average pooling, reducing the size and complexity of the feature map, and improving the generalization ability of the model;
S4: finally, using the fully connected layer as a classification module to complete classification.
Further, in step S1, the specific operation of the two-dimensional discrete wavelet transform module is as follows:
S1.1: extracting learnable, spatially invariant image features through a convolution layer;
S1.2: performing spatial token mixing and downsampling using the two-dimensional discrete wavelet transform, so as to perform scale-invariant feature extraction;
S1.3: performing channel mixing using a learnable multi-layer perceptron;
S1.4: restoring the spatial resolution of the feature map using a transposed convolution layer;
S1.5: finally, adjusting the output to the same format as the input using a batch normalization operation, and concatenating it with the input for output, thereby alleviating the vanishing-gradient problem of the network.
Further, in step S2, the residual attention module is used to alleviate the vanishing-gradient problem of the network and raise the network's attention to important features, thereby improving the representation capability of the network, with the expression:
x_out = RAM(x_in)
where x_in denotes the input, x_out denotes the output, and RAM(·) denotes the residual attention module.
Further, step S3 specifically comprises: compressing the feature channels using global average pooling, reducing the size and complexity of the feature map, and improving the generalization ability of the model, with the expression:
x_out = avgpool(x_in)
where avgpool(·) denotes global average pooling.
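As a minimal NumPy sketch of this step, global average pooling collapses each H×W feature map to a single scalar per channel (the toy feature tensor below is illustrative):

```python
import numpy as np

def global_avg_pool(x):
    """Collapse each H×W feature map to a single scalar per channel,
    turning an (H, W, C) tensor into a length-C vector."""
    return x.mean(axis=(0, 1))

feat = np.ones((8, 8, 3))   # toy feature map, H = W = 8, C = 3
feat[:, :, 1] *= 2.0        # make channel 1 distinguishable
v = global_avg_pool(feat)
print(v)  # [1. 2. 1.]
```

Because every spatial position is averaged away, the output size depends only on the channel count, which is what makes the downstream classifier small.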
Further, step S4 specifically comprises: using the fully connected layer as a classification module to complete classification, with the expression:
Pre = Linear(x_in)
where Linear(·) denotes the classification module; the processed one-dimensional vector is passed to the fully connected layer to obtain the final prediction result.
The invention has the beneficial effects that:
(1) The invention applies the two-dimensional discrete wavelet transform to agricultural pest identification. The transform performs lossless feature extraction on the image; it can effectively learn strong image priors such as translation invariance, scale invariance and edge sparsity; and it reduces the spatial size of the feature map, which lowers the memory and time required for the forward and backward passes, reduces the memory footprint of the model, and facilitates industrial deployment.
(2) For pest images with complex backgrounds, the added residual attention module performs weighted fusion of the important features in the image, strengthening the attention paid to them and the generalization ability of the model. This effectively reduces the influence of complex backgrounds on pest classification accuracy, improves classification accuracy, achieves accurate classification of pest species, and contributes to the field of agricultural protection.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
To make the purposes, technical solutions and advantages of the present invention more apparent, the invention is described in detail below with reference to the accompanying drawings, wherein:
FIG. 1 is a flow chart of a lightweight pest identification method based on two-dimensional discrete wavelet transform according to the present invention;
FIG. 2 is a diagram of a lightweight pest identification network framework based on two-dimensional discrete wavelet transform according to the present invention;
FIG. 3 is a schematic diagram of a two-dimensional discrete wavelet transform module according to the present invention;
FIG. 4 is a schematic diagram of a residual attention module of the present invention;
FIG. 5 is a comparison of the parameters and recognition accuracy of the present invention with other lightweight models;
Detailed Description
The advantages and effects of the present invention will be readily apparent to those skilled in the art from the following description of specific embodiments. The invention may also be implemented or applied through other, different embodiments, and various modifications or changes may be made to the details of this description for different aspects and applications without departing from the spirit of the invention. It is noted that the figures provided below are only illustrative of the basic idea of the invention, and features may be combined with each other so long as they do not conflict.
The figures are provided only to illustrate examples of the invention, not to limit it. Components in the figures may be omitted, scaled down or enlarged to better illustrate the embodiments, and do not represent actual product dimensions. Some well-known structures, and their descriptions, may be omitted, as they will be understood by those skilled in the art.
The same reference numbers correspond to the same or similar elements. In the description of the present invention, terms such as "upper", "lower", "left", "right", "front" and "rear", where used, are merely for convenience and simplicity of description and do not necessarily mean that the apparatus or element referred to has a particular orientation or is constructed and operated in a particular orientation. Accordingly, terms of positional relationship in the drawings are merely illustrative and are not to be construed as limiting the invention; their specific meaning can be understood by those skilled in the art according to the actual circumstances.
FIG. 1 is a flowchart of the lightweight pest identification method based on two-dimensional discrete wavelet transform according to the present invention. As further described below with reference to FIG. 1, the method mainly comprises the following steps:
Step 1: perform spatial multi-scale feature fusion and downsampling on the input image using the two-dimensional discrete wavelet transform module, and perform feature extraction while the image scale remains unchanged;
Step 2: use the residual attention module to raise the network's attention to important features while alleviating the vanishing-gradient problem of the network, thereby improving the representation capability of the network;
Step 3: compress the feature channels using global average pooling, reducing the size and complexity of the feature map and improving the generalization ability of the model;
Step 4: finally, use the fully connected layer as the classification module to complete classification.
FIG. 2 shows the overall structure of the constructed network. The input image is x_in, with size H×W×C, where C is the number of input channels, H is the height of the image, and W is the width of the image. In the figure, n denotes the number of input images and f denotes the number of channels fed to the two-dimensional discrete wavelet transform module. First, the input image passes through a convolution layer to produce a trainable feature map F_r1 of size H×W×f, which is fed into the stacked two-dimensional discrete wavelet transform modules for further feature extraction and parameter reduction, capturing image features while reducing the amount of computation. To further enhance the network's attention to key features, the feature map F_r2 produced by the two-dimensional discrete wavelet transform module, still of size H×W×f, is fed into the residual attention module. Through the stacking of trunk-branch residual blocks and the weighted fusion of the mask branch, this module further raises the attention paid to key features and outputs a feature map F_r3 of size H×W×f. A global average pooling layer then compresses F_r3 to reduce the size and complexity of the feature map. Finally, the compressed feature map F_r4 is classified by the fully connected layer to obtain the prediction probabilities for the pest image.
FIG. 3 is a block diagram of the two-dimensional discrete wavelet transform module of the lightweight pest identification method based on two-dimensional discrete wavelet transform according to the present invention; its structural principle is further described below with reference to FIG. 3. As can be seen from the figure, the module contains 5 layers in total. First comes the convolution layer: using a trainable convolution before the wavelet transform is a key design aspect of the architecture, since it allows the network to extract feature maps adapted to the selected wavelet basis functions. The input tensor X_in has size H×W×C; after passing through the convolution layer, the number of channels is reduced by a factor of four, giving X_0 of size H×W×C/4, namely:
X_0 = c(X_in, ε)
where X_in is the input tensor, X_0 is the output tensor, c(·) denotes the convolution operation, and ε denotes the corresponding set of trainable parameters. The second layer applies the two-dimensional discrete wavelet transform to the tensor X_0, computing one approximation sub-band and three detail sub-bands so as to reduce the spatial resolution; the output tensor X_1 has size H/2×W/2×C/4, i.e. after the two-dimensional discrete wavelet transform both H and W are halved while the number of channels is unchanged. Then, through an aggregation operation, the four sub-bands obtained by the transform are fused with a multi-layer perceptron so that the number of channels of the feature tensor X_2 is restored, giving size H/2×W/2×C. Finally, the spatial resolution of the feature tensor is restored and output through a transposed convolution layer that restores the feature mapping, followed by batch normalization, so that the output feature tensor X_3 has size H×W×C.
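The sub-band computation of the second layer can be illustrated with a one-level Haar transform. This is a minimal NumPy sketch: the patent does not specify the wavelet basis, so Haar is assumed here, with a simple averaging normalization rather than the conventional 1/√2 scaling:

```python
import numpy as np

def haar_dwt2d(x):
    """One-level 2-D Haar DWT: split a single-channel map into the
    approximation sub-band (LL) and three detail sub-bands (LH, HL, HH),
    each with half the spatial resolution of the input."""
    # Low-pass (average) and high-pass (difference) filtering over row pairs.
    a = (x[0::2, :] + x[1::2, :]) / 2.0
    d = (x[0::2, :] - x[1::2, :]) / 2.0
    # Repeat over column pairs to obtain the four sub-bands.
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)     # toy 4×4 single-channel "image"
ll, lh, hl, hh = haar_dwt2d(img)
print(ll.shape)  # (2, 2): H and W are both halved, as stated in the text
```

Because the averages and differences together retain all the input values, the transform is invertible, which is the lossless property the description relies on.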
Fig. 4 is a schematic diagram of the residual attention module of the present invention. The input is first processed by a single-layer residual module, giving a feature map of size H×W×C, where H is the height of the image, W is the width of the image, and C is the number of channels. It is then split into a trunk branch and a mask branch for processing. The trunk branch performs three residual operations, and the feature size remains unchanged at H×W×C. In the mask branch, the feature map is first reduced to half its size by max-pooling downsampling, giving size H/2×W/2×C; two residual operations are then applied, and bilinear-interpolation upsampling restores the original size H×W×C. Finally, after a series of normalization and convolution operations, a tensor with values between 0 and 1 and size H×W×C is obtained. This tensor is multiplied with the trunk-branch tensor and added to it; through this weighted fusion, the effect of enhancing important features is achieved.
In general, the module consists of two key components: a trunk branch and a mask branch. The trunk branch is responsible for extracting image features, while the mask branch learns an attention mask that soft-weights the output features. Specifically, the input of the module is split into two different paths after residual-layer processing. One path goes directly to the next layer; the other is weighted by the attention mechanism, multiplied with the first path, and added to it before entering the next layer, namely:
H_{i,c}(x) = (1 + M_{i,c}(x)) × F_{i,c}(x)
where x is the input, i ranges over all spatial locations, and c is the channel index (c ∈ {1, ..., C}). M(x) is the output of the mask branch, and F(x) is the original feature of the trunk branch.
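The soft weighting above can be sketched directly as a minimal NumPy illustration of H(x) = (1 + M(x)) × F(x); the feature and mask values below are hypothetical:

```python
import numpy as np

def residual_attention(features, mask):
    """Soft attention weighting H(x) = (1 + M(x)) * F(x): a mask value
    of 0 leaves the trunk feature untouched, so the mask branch can
    only emphasize features, never erase the identity signal."""
    return (1.0 + mask) * features

f = np.array([[1.0, 2.0], [3.0, 4.0]])    # trunk-branch features F(x)
m = np.array([[0.0, 0.5], [1.0, 0.25]])   # hypothetical mask M(x) in [0, 1]
h = residual_attention(f, m)
print(h[1, 0])  # 6.0
```

The `1 +` term is the residual part of the design: even a degenerate all-zero mask reduces the module to an identity mapping of the trunk features, which is why stacking it does not aggravate gradient vanishing.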
As can be seen from the simulation results in FIG. 5, the model designed by the invention achieves high recognition accuracy while remaining lightweight.
The above-described embodiments are merely illustrative of the technical solutions of the present invention, and should not be construed as limiting the present invention. While the invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (5)

1. A lightweight pest identification method based on the two-dimensional discrete wavelet transform, characterized in that the method comprises the following steps:
S1: performing spatial multi-scale feature fusion and downsampling on an input image using a two-dimensional discrete wavelet transform module, and performing feature extraction while the image scale remains unchanged;
S2: using a residual attention module to raise the network's attention to important features while alleviating the vanishing-gradient problem of the network, thereby improving the representation capability of the network;
S3: compressing the feature channels using global average pooling, reducing the size and complexity of the feature map, and improving the generalization ability of the model;
S4: finally, using the fully connected layer as a classification module to complete classification.
2. The lightweight pest identification method based on the two-dimensional discrete wavelet transform according to claim 1, characterized in that in step S1 the specific operation of the two-dimensional discrete wavelet transform module is as follows:
S1.1: extracting learnable, spatially invariant image features through a convolution layer;
S1.2: performing spatial token mixing and downsampling using the two-dimensional discrete wavelet transform, so as to perform scale-invariant feature extraction;
S1.3: performing channel mixing using a learnable multi-layer perceptron;
S1.4: restoring the spatial resolution of the feature map using a transposed convolution layer;
S1.5: finally, adjusting the output to the same format as the input using a batch normalization operation, and concatenating it with the input for output, thereby alleviating the vanishing-gradient problem of the network.
3. The lightweight pest identification method based on the two-dimensional discrete wavelet transform according to claim 1, characterized in that in step S2 the residual attention module is used to alleviate the vanishing-gradient problem of the network and raise the network's attention to important features, thereby improving the representation capability of the network, with the expression:
x_out = RAM(x_in)
where x_in denotes the input, x_out denotes the output, and RAM(·) denotes the residual attention module.
4. The lightweight pest identification method based on the two-dimensional discrete wavelet transform according to claim 1, characterized in that step S3 specifically comprises: compressing the feature channels using global average pooling, reducing the size and complexity of the feature map, and improving the generalization ability of the model, with the expression:
x_out = avgpool(x_in)
where avgpool(·) denotes global average pooling.
5. The lightweight pest identification method based on the two-dimensional discrete wavelet transform according to claim 1, characterized in that step S4 specifically comprises: using the fully connected layer as a classification module to complete classification, with the expression:
Pre = Linear(x_in)
where Linear(·) denotes the classification module; the processed one-dimensional vector is passed to the fully connected layer to obtain the final prediction result.
CN202310616466.3A 2023-05-29 2023-05-29 Lightweight pest identification method based on two-dimensional discrete wavelet transformation Pending CN116758415A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310616466.3A CN116758415A (en) 2023-05-29 2023-05-29 Lightweight pest identification method based on two-dimensional discrete wavelet transformation


Publications (1)

Publication Number Publication Date
CN116758415A (en) 2023-09-15

Family

ID=87954336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310616466.3A Pending CN116758415A (en) 2023-05-29 2023-05-29 Lightweight pest identification method based on two-dimensional discrete wavelet transformation

Country Status (1)

Country Link
CN (1) CN116758415A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290762A (en) * 2023-10-11 2023-12-26 北京邮电大学 Insect pest falling-in identification method, type identification method, device, insect trap and system
CN117290762B (en) * 2023-10-11 2024-04-02 北京邮电大学 Insect pest falling-in identification method, type identification method, device, insect trap and system


Legal Events

Date Code Title Description
PB01 Publication