CN113239947B - Pest image classification method based on fine-grained classification technology - Google Patents

Pest image classification method based on fine-grained classification technology

Info

Publication number
CN113239947B
CN113239947B (application CN202110264082.0A)
Authority
CN
China
Prior art keywords
pest
classification
model
fine
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110264082.0A
Other languages
Chinese (zh)
Other versions
CN113239947A (en)
Inventor
钱蓉
董伟
程泽凯
朱静波
夏皖
孔娟娟
刘桂民
张萌
李闰枚
王忠培
管博伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Economy And Information Research Of Anhui Academy Of Agricultural Sciences
Original Assignee
Agricultural Economy And Information Research Of Anhui Academy Of Agricultural Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Economy And Information Research Of Anhui Academy Of Agricultural Sciences filed Critical Agricultural Economy And Information Research Of Anhui Academy Of Agricultural Sciences
Priority to CN202110264082.0A priority Critical patent/CN113239947B/en
Publication of CN113239947A publication Critical patent/CN113239947A/en
Application granted granted Critical
Publication of CN113239947B publication Critical patent/CN113239947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention relates to a pest image classification method based on fine-grained classification technology, which overcomes the poor fine-grained pest identification of the prior art. The method comprises the following steps: acquiring training images; constructing a pest identification network; training the pest identification network; acquiring an image of the pest to be identified; and obtaining a pest identification result. By combining feature filtering and fusion with a purpose-designed loss function, the invention achieves the highest performance, is applicable to both similar (fine-grained) pests and coarse-grained pest classification, and obtains ideal results. It also keeps attention on the target even when the background is very complex or the pest's color and shape are close to the background, so that targets can be identified accurately, further widening the number of pest categories that can be classified automatically.

Description

Pest image classification method based on fine-grained classification technology
Technical Field
The invention relates to a pest image identification method, in particular to a pest image classification method based on a fine-grained classification technology.
Background
Conventional image classification refers to coarse-grained classification, such as distinguishing cats from dogs. Fine-grained classification refers to distinguishing sub-categories within the same coarse class, such as different breeds of dogs or different species of birds. The differences between sub-categories are subtle, while individuals within the same sub-category differ markedly in posture, motion, appearance and so on, so a fine-grained image classification task requires the classification model to extract fine feature information about the target object.
In recent years, fine-grained image classification (FGVC) has attracted more and more attention and has wide application in practical scenes that require fine classification. Extracting local and global features from target parts and fusing them is the classic approach to fine-grained classification. (Berg and Belhumeur 2013) obtain features of different positions for classification by means of manually annotated part locations, while some methods (Zhang and Donahue 2014; Krause and Jin 2015; Huang and Xu 2016; Zhang and Xu 2016; Lam and Mahassei 2017; Wei and Xie 2018; Liu and Xie 2020) perform more accurate semantic segmentation and feature fusion by searching for the best positions, obtaining better fine-grained part-feature representations. Some scholars (Simon and Rodner 2015; Zhang and Wei 2016; He and Peng 2017; Ge and Lin 2019; Wang and Wang 2020; Huang and Li 2020; and others) extract part features with weakly supervised or unsupervised methods, reducing the cost of manual labeling. Still other researchers (Zhang and Xiong 2016; Wang and Morariu 2018) locate potential target parts with the help of deep convolution filters, without additional part annotation; methods incorporating the attention mechanism (Xiao and Xu 2015; Fu and Zheng 2017; Zheng and Fu 2017; Sun and Yuan 2018; Zheng and Fu 2019) have also been proposed, in which more relevant part features are extracted through attention. (Ji and Wen 2020) incorporates an attention mechanism into a binary neural-tree network, learning the target representation from coarse to fine and focusing on capturing discriminative features. In addition, Zhuang and Wang learn the differing parts between similar objects by means of contrastive learning, and gate the resulting difference features into the common features for classification. The key to this line of work is extracting better discriminative part features and fusing them with the global features to achieve a better classification result.
Unlike part-focused methods, methods based on end-to-end feature coding focus on extracting higher-order representations of features and their interactive relationships. (Lin 2015) applied a bilinear CNN model to fine-grained classification with good results, which sparked strong interest in end-to-end methods. Afterwards, Gao 2016 proposed replacing the full bilinear representation with a low-dimensional compact bilinear-pooled representation to address the excessive dimensionality of bilinear features. Building on bilinear pooling, Cui and Zhou 2017 proposed a general pooling framework that captures higher-order interactions of features in kernel form. To further reduce the amount of bilinear computation, Kong and Fowles 2017 proposed a classifier co-decomposition method that compresses the model by decomposing the set of bilinear classifiers into a common factor and compact per-class terms. Zheng and Fu 2019 proposed a Deep Bilinear Transform (DBT) block that divides the input channels evenly into several semantic groups and computes the pairwise interactions within each group to represent the bilinear transformation, greatly reducing the computational cost. Furthermore, Yu and Zhao 2018 proposed a cross-layer bilinear pooling framework that integrates multiple cross-layer bilinear features to capture inter-layer part-feature relationships and enhance their representation capability. Replacing classical first-order pooling in convolutional neural networks with global covariance pooling has yielded impressive improvements, but typically requires longer training. Li and Xie 2018 provided an iterative matrix square-root normalization method that accelerates end-to-end training of networks based on global covariance pooling. Engin and Wang 2018 proposed end-to-end training that jointly learns local descriptors and pools the representation by replacing the covariance matrix with a kernel matrix. In addition, Cai and Zuo 2017 et al. represent the activations of hierarchical convolutions as local representations at different scales, capture high-order statistics of convolution activations with a polynomial-kernel-based predictor, and model part interactions to obtain higher-order intra-layer and inter-layer feature relationships. Gao and Han 2020 designed a Channel Interaction Network (CIN) to model intra-image and inter-image channel interactions, exploring channel correlations within an image so that the model can learn complementary features from correlated channels and thereby obtain stronger fine-grained features. The end-to-end approach is simpler and more effective; in this work, extra shallow features are added to the deep features learned by the model through deconvolution blocks, fusing smaller-scale detail features into the final representation and providing the model with more fine-grained information for classification.
In fine-grained recognition, inter-class separation is more difficult than in a conventional image classification task, and some methods improve fine-grained classification by improving the loss function. (Wang et al. 2016) used a triplet loss to achieve better inter-class separation, but triplet loss increases the computational cost of training. (Dubey and Gupta 2018) applied the maximum-entropy principle and proposed a maximum-entropy loss usable for fine-grained classification. (Dubey and Gupta 2018) then reduced overfitting during training by deliberately introducing perturbations into the model's output activations. Sun and Cholakkal 2020 et al. designed a "gradient-boosting" loss function that accelerates model convergence by focusing only on the confusable classes of each sample. However, the above methods neglect that during training the confidence-score distributions of simple samples and of samples confused with similar classes differ greatly: some difficult samples interfered with by similar classes are predicted correctly, yet their confidence-score distribution shows that the model's confidence in them is insufficient, which leads to a low recognition rate and poor classification.
Disclosure of Invention
The invention aims to overcome the poor fine-grained pest identification of the prior art, and provides a pest image classification method based on a fine-grained classification technology to solve this problem.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a pest image classification method based on a fine-grained classification technology comprises the following steps:
11) acquisition of training images: acquiring a pest image data set to be trained and preprocessing it;
12) constructing a pest classification model: constructing a pest classification model based on the ResNet18 network and the cross-entropy loss function, denoted the DB_RN18 model;
13) training the pest classification model: training the DB_RN18 model with the pest image data set to be trained, designing a loss function that is more sensitive to the classification results of the DB_RN18 model, and completing the training in an end-to-end manner;
14) acquiring an image of the pest to be identified: acquiring a pest image to be identified and preprocessing it;
15) obtaining a pest classification result: inputting the preprocessed pest image into the trained DB_RN18 model to obtain the classification result; a minimal inference sketch is given after this list.
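For illustration, a minimal sketch of steps 14) and 15) in PyTorch is given below: preprocessing one pest image and querying an already trained DB_RN18 model. The 224×224 input size, the normalization statistics and the classify_pest helper are assumptions introduced here for illustration; the patent text does not specify them.

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                      # assumed input resolution
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics (assumption)
                         std=[0.229, 0.224, 0.225]),
])

def classify_pest(image_path, model, class_names):
    """Run one preprocessed pest image through a trained DB_RN18 model."""
    model.eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)  # 1 x C x H x W
    with torch.no_grad():
        scores = model(x)                               # prediction scores s over L classes
    return class_names[scores.argmax(dim=1).item()]
```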
The construction of the pest classification model comprises the following steps:
21) setting up a ResNet18 network and adding two DeconvBlock modules between the last ResBlock layer and the classification layer of the ResNet18 network to construct the DB_RN18 classification model (a structural sketch is given after this list);
22) adding Channel attention and Spatial attention mechanisms in both DeconvBlock modules;
23) constructing a loss function suitable for fine-grained image classification based on the cross-entropy loss function: a loss function sensitive to the confidence scores of the fine-grained recognition model is designed; within the same batch size, three cases are distinguished, namely samples predicted correctly whose confidence is stable, samples predicted correctly whose confidence changes dynamically, and samples predicted incorrectly, and a different loss term rewards or penalizes the model in each case, with the degree of punishment gradually weakening as training deepens.
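A minimal structural sketch of the DB_RN18 model described in steps 21) and 22) is given below, assuming a PyTorch implementation: a ResNet18 backbone whose last ResBlock output passes through two DeconvBlock modules before the classification layer. Which backbone stage supplies the shallow map to each DeconvBlock, and the channel widths shown, are assumptions; the DeconvBlock itself is sketched later in the detailed description.

```python
import torch.nn as nn
from torchvision.models import resnet18

class DBRN18(nn.Module):
    def __init__(self, num_classes, deconv_block_cls):
        super().__init__()
        b = resnet18(weights=None)
        self.stem = nn.Sequential(b.conv1, b.bn1, b.relu, b.maxpool)
        self.layer1, self.layer2 = b.layer1, b.layer2   # shallower ResBlocks
        self.layer3, self.layer4 = b.layer3, b.layer4   # deeper ResBlocks
        # two DeconvBlocks between the last ResBlock and the classifier;
        # the (deep, shallow) channel pairing is an assumption
        self.db1 = deconv_block_cls(deep_ch=512, shallow_ch=256)
        self.db2 = deconv_block_cls(deep_ch=512, shallow_ch=128)
        self.gap = nn.AdaptiveAvgPool2d(1)              # GAP
        self.dropout = nn.Dropout(0.5)                  # dropout between GAP and Dense
        self.fc = nn.Linear(512, num_classes)           # Dense classification layer

    def forward(self, x):
        x = self.stem(x)
        o1 = self.layer1(x)
        o2 = self.layer2(o1)
        o3 = self.layer3(o2)
        op = self.layer4(o3)          # Op: output of the last ResBlock
        f = self.db1(op, o3)          # fuse Op with the ResBlock output of matching scale
        f = self.db2(f, o2)           # second DeconvBlock, one level shallower
        f = self.gap(f).flatten(1)
        return self.fc(self.dropout(f))
```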
The training of the pest classification model comprises the following steps:
31) inputting the pest image data set to be trained into the ResNet18 network for training;
32) applying Channel Attention and Spatial Attention to the feature map of the last convolutional layer of the ResNet18 network to obtain an Attention map containing its deep feature information and to enlarge the receptive field of the model;
33) applying deconvolution to the feature map of the last convolutional layer of ResNet18 to enlarge the size of the convolutional feature map;
34) fusing the Attention map with the deconvolution feature map and extracting the fused feature-map information with two convolution layers; that is, the output Op of the last ResBlock and the output Oss of the earlier ResBlock of matching output scale are used as the inputs of the deconvolution block, expressed as:
Op = {Op_n : n ∈ [1, N]}, Op ∈ R^(N×H×W)
Oss = {Oss_m : m ∈ [1, M]}, Oss ∈ R^(M×H′×W′)
wherein N and M respectively represent the number of channels output by the last ResBlock and by the ResBlock of matching output scale, H and W represent the height and width of the output feature map, and H′ = 2×H, W′ = 2×W;
35) training with the loss function designed for fine-grained image classification;
351) samples the model predicts incorrectly are penalized: a penalty factor α is set for each misclassified sample; as the network trains deeper and errors become less frequent, the value of the penalty factor, and hence the degree of punishment, gradually decreases. Its expression uses the mapping F_α(α_0) = (α_0 - 1)^2, where N_bs is the size of the current batch, s_n is the vector of prediction scores of the nth sample over all labels, s_n^al is the confidence score predicted for the true label al of the nth sample in the batch, and N_l is the number of labels (the original equation images giving α, s^al and s are not reproduced in this text). α_0 expresses the accuracy within the current batch and is computed from N_al, the number of samples whose highest-scoring predicted label is the true label, i.e. the number of correctly predicted samples in the batch;
352) samples that the model predicts correctly but with a low confidence score are rewarded: a reward factor β is set, which separates the confidence-score interval of correct but low-confidence samples from that of misclassified samples, so that the model can handle the precise identification of fine-grained pests. Here F_β() is a mapping function and β_0 is the ratio of the second-highest score to the highest score among all label scores of a correctly predicted sample, i.e. the largest score among all labels other than the true label divided by the true-label score (the original equation images giving β and F_β are not reproduced in this text). When β_0 is greater than 0.5, the penalty imposed on the model grows as β_0 increases;
353) correctly predicted samples with an unstable confidence score are handled with a reward factor γ, which normalizes the confidence score of the correct sample so that it tends to become stable. Here F_γ() is a mapping function and γ_0 expresses the degree to which the confidence of the true label of a correctly predicted sample reaches the maximum confidence score in the current batch (the original equation images giving γ, F_γ and γ_0 are not reproduced in this text). In these expressions s_n^al is the confidence score of the true label al when the nth sample in the batch is predicted correctly, and it is set to 0 when the prediction is wrong;
354) the reward-and-punishment multiple ω is expressed as ω = α + β + γ. A value a is given, with a ∈ [0, 1); when the accuracy within the same batch reaches this set value, the model is additionally penalized, the cross-entropy loss of each sample being increased by the additional multiple ω (the original loss-formula image is not reproduced in this text);
wherein the hyperparameter a is set to its optimal value through a sequential model-based optimization algorithm; a simple illustration of selecting a follows.
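The text only states that this threshold is tuned with a sequential model-based optimization algorithm. As a plain illustration (not the optimization procedure itself), the value could be chosen by scoring candidate values on a validation set; evaluate_model is a hypothetical helper returning validation accuracy for a model trained with the given a.

```python
def select_threshold(evaluate_model, candidates=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Return the candidate threshold a with the highest validation accuracy."""
    scored = [(a, evaluate_model(a)) for a in candidates]   # train/evaluate per candidate
    return max(scored, key=lambda item: item[1])[0]
```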
Advantageous effects
Compared with the prior art, the pest image classification method based on fine-grained classification technology achieves the highest performance by combining feature filtering and fusion with the designed loss function, is applicable to both similar (fine-grained) pests and coarse-grained pest classification, and obtains ideal results. It also keeps attention on the target even when the background is very complex or the pest's color and shape are close to the background, so that targets can be identified accurately, further widening the number of pest categories that can be classified automatically.
The method uses deconvolution blocks to introduce shallow information into the deep layers of the model and filters background features through attention to strengthen the attended target. S3-Loss makes the model more sensitive to the low-confidence phenomenon so that similar or difficult samples are separated; a corresponding loss-function form is selected according to the confidence-score distribution of each sample, making the model more sensitive to similar samples and achieving the goal of separating them.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
So that the above-recited features of the present invention can be clearly and readily understood, a more particular description of the invention, briefly summarized above, is given below with reference to embodiments, some of which are illustrated in the appended drawings, wherein:
as shown in FIG. 1, the pest image classification method based on the fine-grained classification technology comprises the following steps:
the first step of training image acquisition: and acquiring a pest image data set to be trained and preprocessing the pest image data set.
Secondly, constructing a pest classification model: a pest identification network is constructed based on the ResNet18 network. Compared with the Baseline network, the main improvements here include: (a) between the last resplock of the original ResNet18 network and the classification layer, two deconvblocks are added. The two deconvolution blocks promote the scale of the feature map, introduce shallow features, add more related detailed information to the feature map, weaken background features by using the Attention in the DeconvBlock, and help the model to more accurately extract the features of the target; (b) and S3-Loss predicts the fraction according to the sample given by the model, and the design operator imposes different punishments on the similar sample and the difficult sample to finely adjust the CE-Loss of the sample.
The DeconvBlock enlarges the size of the feature map output by the last layer of ResNet18 and fuses features from different layers, preparing to add finer features to the deep network. In a DeconvBlock, the feature map Op output by the last ResBlock of the Backbone passes successively through a Deconv layer and a Conv layer, enlarging the scale of the feature map; in the upper branch, a 1×1 convolution lifts the channel number of the shallow feature map and Attention is applied to keep the useful features and reduce background interference; the results of the two branches are then added to obtain the fused feature map Odc = {Odc_n : n ∈ [1, N]}, Odc ∈ R^(N×H′×W′).
The specific calculation is as follows:
Odc = Op′ + Ossa′
Op′ = Cn(Dcn(Op))
Ossa′ = A(Oss′)
Oss′ = Cn_1×1(Oss)
where Op′ denotes the up-scaled feature map Op, Cn and Dcn represent convolution and deconvolution operations respectively, A denotes the Attention operation applied to a feature map, Cn_1×1 is the 1×1 convolution that boosts the number of channels from M to N, and Ossa′ is the channel-lifted feature map Oss′ after the Attention operation; Ossa′ contains the strengthened shallow features of the target. The specific calculation inside the Attention is as follows:
W_ca = GAP(Oss′) + GMP(Oss′)
W_sa = Cn_7×7([Mean(Oss′), Max(Oss′)])
(the original equation image showing how W_ca and W_sa are applied to Oss′ is not reproduced in this text)
We use the channel-based weights W_ca and the space-based weights W_sa to obtain as many effective details of the target as possible from the shallow feature map, where GAP denotes global average pooling, GMP denotes global maximum pooling, and Mean and Max denote the operations of taking the mean feature map and the maximum feature map of Oss′ along the channel dimension; these two feature maps are used as the input of a convolution layer with a single 7×7 kernel, finally producing the space-based weight W_sa.
Residual connection: we further extract the fused features using two convolution layers and apply a residual connection to obtain the output of the whole deconvolution block, Odb = {Odb_n : n ∈ [1, N]}, Odb ∈ R^(N×H′×W′):
Odb = Odc + Odc′
Odc′ = Cn(Cn(Odc)) (2)
where Odc′ represents the feature map after Odc passes through the two convolutional layers.
The output Odb of the last DeconvBlock enters the classifier, finally producing the model's prediction score s = {s_l : l ∈ [1, L]}, s ∈ R^L, where L represents the number of labels:
s = Dense(GAP(Odb)) (3)
where GAP denotes global average pooling, Dense denotes a fully-connected layer, and a dropout of 0.5 is applied between the pooling layer and the fully-connected layer.
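A hedged PyTorch sketch of one DeconvBlock as described above follows: the lower branch enlarges Op with Deconv followed by Conv, the upper branch lifts Oss to N channels with a 1×1 convolution and applies the channel and spatial Attention, the two branches are added to give Odc, and two further convolutions with a residual connection give Odb. Kernel sizes, the sigmoid gating, and normalization details are assumptions not given in the text.

```python
import torch
import torch.nn as nn

class DeconvBlock(nn.Module):
    def __init__(self, deep_ch, shallow_ch):
        super().__init__()
        # lower branch: Op' = Cn(Dcn(Op)), doubling H and W
        self.deconv = nn.ConvTranspose2d(deep_ch, deep_ch, kernel_size=2, stride=2)
        self.conv = nn.Conv2d(deep_ch, deep_ch, kernel_size=3, padding=1)
        # upper branch: Oss' = 1x1 convolution lifting M -> N channels
        self.lift = nn.Conv2d(shallow_ch, deep_ch, kernel_size=1)
        # spatial attention: channel-wise Mean/Max maps through a single 7x7 conv
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        # residual refinement: Odc' = Cn(Cn(Odc))
        self.refine = nn.Sequential(
            nn.Conv2d(deep_ch, deep_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(deep_ch, deep_ch, 3, padding=1))

    def attention(self, oss_lifted):
        # channel weight W_ca = GAP(Oss') + GMP(Oss'), one value per channel
        w_ca = (oss_lifted.mean(dim=(2, 3), keepdim=True)
                + oss_lifted.amax(dim=(2, 3), keepdim=True)).sigmoid()
        x = oss_lifted * w_ca
        # spatial weight W_sa from the Mean and Max maps along the channel dimension
        w_sa = self.spatial(torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)).sigmoid()
        return x * w_sa                        # Ossa': strengthened shallow features

    def forward(self, op, oss):
        op_up = self.conv(self.deconv(op))     # Op'
        ossa = self.attention(self.lift(oss))  # Ossa'
        odc = op_up + ossa                     # fused map Odc
        return odc + self.refine(odc)          # Odb = Odc + Odc'
```

This DeconvBlock signature matches the deconv_block_cls parameter assumed in the earlier DB_RN18 skeleton.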
The construction of the pest identification network comprises the following specific steps:
(1) set up a ResNet18 network and add two DeconvBlock modules between the last ResBlock layer and the classification layer of the ResNet18 network to construct the DB_RN18 classification model;
(2) add Channel attention and Spatial attention mechanisms in both DeconvBlock modules;
(3) construct a loss function suitable for fine-grained image classification based on the cross-entropy loss function: a loss function sensitive to the confidence scores of the fine-grained recognition model is designed; within the same batch size, three cases are distinguished, namely samples predicted correctly whose confidence is stable, samples predicted correctly whose confidence changes dynamically, and samples predicted incorrectly, and a different loss term rewards or penalizes the model in each case, with the degree of punishment gradually weakening as training deepens.
Thirdly, the pest classification model is trained: the pest identification network is trained with the pest image data set to be trained.
In the design of the loss function, we design an S3-Loss that is sensitive to the confidence scores of the model. For each sample, three additional weights are calculated, covering the cases of correct and incorrect prediction, and are summed to obtain ω, the additional multiple by which the original CE-Loss of the sample is increased, thereby imposing extra punishment on the model. Let the prediction score of the model be denoted S; each time before S3-Loss is computed, we translate S from the interval [min(S), max(S)] to [υ, max(S) - min(S) + υ], i.e. S is shifted by -min(S) + υ, where υ = 1e-12 (a small sketch of this shift follows). To punish incorrect predictions to different degrees during training, we penalize the model's prediction errors severely in the early stage of training, but the extra punishment becomes lighter and lighter as training deepens. The specific steps are as follows:
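A small sketch of this score shift, under the assumption that the minimum is taken over the whole batch of scores, is:

```python
import torch

def shift_scores(scores, upsilon=1e-12):
    # translate S from [min(S), max(S)] to [upsilon, max(S) - min(S) + upsilon]
    return scores - scores.min() + upsilon
```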
(1) Input the pest image data set to be trained into the ResNet18 network for training.
(2) Apply Channel Attention and Spatial Attention to the feature map of the last convolutional layer of the ResNet18 network to obtain an Attention map containing its deep feature information, and enlarge the receptive field of the model.
(3) Apply deconvolution to the feature map of the last convolutional layer of ResNet18 to enlarge the size of the convolutional feature map.
(4) Fuse the Attention map with the deconvolution feature map and extract the fused feature-map information with two convolution layers; class activation maps verify that the feature information obtained by this model is richer than that obtained by the traditional method. That is, the output Op of the last ResBlock and the output Oss of the earlier ResBlock of matching output scale are used as the inputs of the deconvolution block, expressed as:
Op = {Op_n : n ∈ [1, N]}, Op ∈ R^(N×H×W)
Oss = {Oss_m : m ∈ [1, M]}, Oss ∈ R^(M×H′×W′)
where N and M denote the number of channels output by the last ResBlock and by the ResBlock of matching output scale respectively, H and W denote the height and width of the output feature map, and H′ = 2×H, W′ = 2×W.
(5) Design a loss function sensitive to the confidence scores of the fine-grained recognition model: within the same batch, three cases are distinguished, namely samples predicted correctly whose confidence is stable, samples predicted correctly whose confidence changes dynamically, and samples predicted incorrectly, and a different loss term rewards or penalizes the model in each case, with the degree of punishment gradually weakening as training deepens;
A1) samples the model predicts incorrectly are penalized: a penalty factor α is set for each misclassified sample; as the network trains deeper and errors become less frequent, the value of the penalty factor, and hence the degree of punishment, gradually decreases. Its expression uses the mapping F_α(α_0) = (α_0 - 1)^2, where N_bs denotes the size of the current batch, s_n^al denotes the confidence score predicted for the true label al of the nth sample in the batch, s_n denotes the prediction scores of the nth sample over all labels, and N_l denotes the number of labels (the original equation images giving α, s^al and s are not reproduced in this text). α_0 expresses the accuracy within the current batch and is computed from N_al, the number of samples whose highest-scoring predicted label is the true label, i.e. the number of correctly predicted samples in the batch;
A2) samples that the model predicts correctly but with a low confidence score are rewarded: a reward factor β is set, which separates the confidence-score interval of correct but low-confidence samples from that of misclassified samples, so that the model can handle the precise identification of fine-grained pests. Here β_0 represents the ratio of the second-highest score to the highest score among all label scores of a correctly predicted sample, i.e. the largest score among all labels other than the true label divided by the true-label score (the original equation images giving β and β_0 are not reproduced in this text). When β_0 is above 0.5, the penalty imposed on the model grows as β_0 increases;
A3) correctly predicted samples with an unstable confidence score are handled with a reward factor γ, which normalizes the confidence score of the correct sample so that it tends to become stable. Here γ_0 expresses the degree to which the confidence of the true label of a correctly predicted sample reaches the maximum confidence score in the current batch (the original equation images giving γ and γ_0 are not reproduced in this text). In these expressions s_n^al is the confidence score of the true label al when the nth sample in the batch is predicted correctly, and it is set to 0 when the prediction is wrong;
A4) the reward-and-punishment multiple ω is expressed as ω = α + β + γ. A value a is given, with a ∈ [0, 1); when the accuracy within the same batch reaches this set value, the model is additionally penalized, the cross-entropy loss of each sample being increased by the additional multiple ω (the original loss-formula image is not reproduced in this text).
The hyperparameter a is determined by a sequential model-based optimization algorithm; experimental analysis shows that model accuracy is best when a = 0.8. A hedged sketch of the overall S3-Loss computation follows.
Fourthly, the image of the pest to be identified is obtained: acquire the pest image to be identified and preprocess it.
Fifthly, the pest identification result is obtained: input the preprocessed pest image into the trained DB_RN18 model to obtain the classification result.
In order to verify the effectiveness of the algorithm, three network models were reproduced under the same framework and experimental environment: the Baseline network RN18, the classic BLCNN algorithm, and RN18_DB. The BLCNN algorithm is a typical end-to-end training-based method, and RN18_DB was the first in the field of fine-grained image classification to use peak suppression together with a gradient-boosting loss function, achieving a remarkable classification effect. These three models are compared with the method provided by the invention on the fine-grained pest data set ArgFIP20 and the agricultural pest data set AgrIP138; the detailed experimental results are shown in Table 1.
TABLE 1 comparison of the results of the three models and the method of the present invention
(the table of results is provided as an image in the original publication and is not reproduced here)
Table 1 reports the results of the four algorithms trained from scratch on ArgFIP20 and ArgIP138. In general, the ResNet18-based algorithms are superior to VGG16; the proposed method is 2.74% higher than the Baseline algorithm and improves on the latest algorithm in the fine-grained field by 1.11%.
Analysis of the experimental results shows that the proposed method achieves ideal effects on both the ArgFIP20 data set and ArgIP138. On ArgFIP20, the peak-suppression approach forces the model to automatically find more discriminative regions; such an adaptive method without part annotation performs better on pest data sets with varied categories and inconsistent target position and structure, and the automatic identification accuracy reaches 92.25%. By introducing the deconvolution layers, the invention lets the model learn more detailed information, filters background features with the attention mechanism, strengthens the target-related features, and can accurately capture the detailed features of the target body. On ArgIP138, the classification accuracy of the invention reaches 98.23%, and the recognition effect is better than that of the other three models; RN18_DB, however, acquires finer features in a random manner, and in this case its classification is not effectively improved compared with Baseline, its recognition accuracy being lower than Baseline's.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are merely illustrative of the principles of the invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (1)

1. A pest image classification method based on a fine-grained classification technology is characterized by comprising the following steps:
11) acquisition of training images: acquiring a pest image data set to be trained and preprocessing the pest image data set;
12) constructing a pest classification model: constructing a pest classification model based on a ResNet18 network and a cross entropy loss function, and marking as a DB_RN18 model;
the construction of the pest classification model comprises the following steps:
121) setting a ResNet18 network, and adding two DeconvBlock modules between the last ResBlock layer and a classification layer of the ResNet18 network to construct a DB_RN18 classification model;
122) adding Channel attention and Spatial attention mechanisms in both DeconvBlock modules;
123) constructing a loss function suitable for fine-grained image classification based on the cross entropy loss function: a loss function sensitive to the confidence scores of the fine-grained image recognition model is set; within the same batch size, three cases are distinguished, namely samples predicted correctly whose confidence is stable, samples predicted correctly whose confidence changes dynamically, and samples predicted incorrectly, and a different loss function rewards or penalizes the model in each case, with the degree of punishment gradually weakening as training deepens;
13) training a pest classification model: training the DB_RN18 model by using the pest image data set to be trained, designing a loss function which is more sensitive to the classification result of the DB_RN18 model, and completing the training in an end-to-end mode;
the training of the pest classification model comprises the following steps:
131) inputting a pest image data set to be trained into a ResNet18 network for training;
132) carrying out Channel Attention and Spatial Attention processing on the output characteristic diagram of the ResNet18 network upper-branch ResBlock to obtain an Attention map containing deep characteristic information of the convolutional layer characteristic diagram and enlarge the receptive field of the model;
133) performing deconvolution processing on the feature map of the last convolutional layer of the ResNet18, and expanding the size of the convolutional feature map;
134) fusing the Attention map and the deconvolution feature map information, and extracting the fused feature map information by using double convolution layers; namely, the output Op of the last ResBlock and the output Oss of the upper-branch ResBlock are used as the input of the deconvolution block, and the expression is as follows:
Op = {Op_n : n ∈ [1, N]}, Op ∈ R^(N×H×W)
Oss = {Oss_n′ : n′ ∈ [1, M]}, Oss ∈ R^(M×H′×W′)
wherein N and M respectively represent the number of channels output by the last ResBlock and the upper-branch ResBlock, H and W represent the height and width of the output characteristic diagram, and H′ = 2×H, W′ = 2×W;
135) training a loss function of fine-grained image classification;
14) obtaining an image of the pest to be identified: acquiring a pest image to be identified and preprocessing the pest image;
15) obtaining a pest classification result: inputting the preprocessed pest image into the trained DB_RN18 model to obtain a classification result.
CN202110264082.0A 2021-03-10 2021-03-10 Pest image classification method based on fine-grained classification technology Active CN113239947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110264082.0A CN113239947B (en) 2021-03-10 2021-03-10 Pest image classification method based on fine-grained classification technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110264082.0A CN113239947B (en) 2021-03-10 2021-03-10 Pest image classification method based on fine-grained classification technology

Publications (2)

Publication Number Publication Date
CN113239947A CN113239947A (en) 2021-08-10
CN113239947B true CN113239947B (en) 2022-09-23

Family

ID=77130207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110264082.0A Active CN113239947B (en) 2021-03-10 2021-03-10 Pest image classification method based on fine-grained classification technology

Country Status (1)

Country Link
CN (1) CN113239947B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016405A (en) * 2017-02-24 2017-08-04 中国科学院合肥物质科学研究院 A kind of insect image classification method based on classification prediction convolutional neural networks
CN109344883A (en) * 2018-09-13 2019-02-15 西京学院 Fruit tree diseases and pests recognition methods under a kind of complex background based on empty convolution
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device
CN111985370A (en) * 2020-08-10 2020-11-24 华南农业大学 Crop pest and disease fine-grained identification method based on improved mixed attention module
CN112241762A (en) * 2020-10-19 2021-01-19 吉林大学 Fine-grained identification method for pest and disease damage image classification

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171257B (en) * 2017-12-01 2019-11-26 百度在线网络技术(北京)有限公司 Fine granularity image recognition model training and recognition methods, device and storage medium
US11361470B2 (en) * 2019-05-09 2022-06-14 Sri International Semantically-aware image-based visual localization
CN111259982B (en) * 2020-02-13 2023-05-12 苏州大学 Attention mechanism-based premature infant retina image classification method and device
CN111582225B (en) * 2020-05-19 2023-06-20 长沙理工大学 Remote sensing image scene classification method and device
CN112163465B (en) * 2020-09-11 2022-04-22 华南理工大学 Fine-grained image classification method, fine-grained image classification system, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016405A (en) * 2017-02-24 2017-08-04 中国科学院合肥物质科学研究院 A kind of insect image classification method based on classification prediction convolutional neural networks
CN109344883A (en) * 2018-09-13 2019-02-15 西京学院 Fruit tree diseases and pests recognition methods under a kind of complex background based on empty convolution
CN111985370A (en) * 2020-08-10 2020-11-24 华南农业大学 Crop pest and disease fine-grained identification method based on improved mixed attention module
CN111898709A (en) * 2020-09-30 2020-11-06 中国人民解放军国防科技大学 Image classification method and device
CN112241762A (en) * 2020-10-19 2021-01-19 吉林大学 Fine-grained identification method for pest and disease damage image classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Automatic segmentation algorithm for vegetable lepidopteran pest images based on saliency detection; Qian Rong et al.; Journal of Fujian Agriculture and Forestry University (Natural Science Edition); 2019-12-31; Vol. 48, No. 3; 398-404 *

Also Published As

Publication number Publication date
CN113239947A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN112308158B (en) Multi-source field self-adaptive model and method based on partial feature alignment
CN108875674B (en) Driver behavior identification method based on multi-column fusion convolutional neural network
Li et al. Selective kernel networks
CN108734208B (en) Multi-source heterogeneous data fusion system based on multi-mode deep migration learning mechanism
CN111325165B (en) Urban remote sensing image scene classification method considering spatial relationship information
CN102314614B (en) Image semantics classification method based on class-shared multiple kernel learning (MKL)
CN113239784B (en) Pedestrian re-identification system and method based on space sequence feature learning
CN106919920A (en) Scene recognition method based on convolution feature and spatial vision bag of words
CN108446589B (en) Face recognition method based on low-rank decomposition and auxiliary dictionary in complex environment
CN111738303B (en) Long-tail distribution image recognition method based on hierarchical learning
CN108345850A (en) The scene text detection method of the territorial classification of stroke feature transformation and deep learning based on super-pixel
CN112232151B (en) Iterative polymerization neural network high-resolution remote sensing scene classification method embedded with attention mechanism
CN104680173A (en) Scene classification method for remote sensing images
CN109886161A (en) A kind of road traffic index identification method based on possibility cluster and convolutional neural networks
CN104809469A (en) Indoor scene image classification method facing service robot
CN110503613A (en) Based on the empty convolutional neural networks of cascade towards removing rain based on single image method
CN108537121A (en) The adaptive remote sensing scene classification method of environment parament and image information fusion
CN111161244B (en) Industrial product surface defect detection method based on FCN + FC-WXGboost
CN112862015A (en) Paper classification method and system based on hypergraph neural network
CN106709419A (en) Video human behavior recognition method based on significant trajectory spatial information
CN110414587A (en) Depth convolutional neural networks training method and system based on progressive learning
CN109165698A (en) A kind of image classification recognition methods and its storage medium towards wisdom traffic
CN108388904B (en) Dimensionality reduction method based on convolutional neural network and covariance tensor matrix
CN113052254A (en) Multi-attention ghost residual fusion classification model and classification method thereof
CN113807176A (en) Small sample video behavior identification method based on multi-knowledge fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant