CN115690541A - Deep learning training method for improving recognition accuracy of small sample and small target - Google Patents


Info

Publication number
CN115690541A
CN115690541A (application CN202211357500.1A)
Authority
CN
China
Prior art keywords
small
samples
sample
feature
class
Prior art date
Legal status
Pending
Application number
CN202211357500.1A
Other languages
Chinese (zh)
Inventor
王祎
徐振宇
刘怡光
楼旭东
房景鑫
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202211357500.1A
Publication of CN115690541A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning training method for improving the recognition accuracy of small samples and small targets, and relates to the technical field of small-sample and small-target recognition. A feature enhancement module fusing a dual attention mechanism is proposed for the problem that the backgrounds of different small-sample and small-target classes are excessively similar during training; secondly, for the overfitting that prediction may produce under the single-sample condition, a Gaussian-distribution-based feature generation module is proposed to improve generalization ability; finally, three typical small sample training methods are unified into a two-stage training model that incorporates the proposed modules. These ideas and improvements are applied for the first time to the traditional pest classification data set IP102, where the recognition accuracy improves by 2.11 to 6.87 percentage points over the baseline methods.

Description

Deep learning training method for improving the recognition accuracy of small samples and small targets
Technical Field
The invention relates to the technical field of small-sample and small-target recognition, and in particular to a deep learning training method for improving the recognition accuracy of small samples and small targets.
Background
Agricultural problems concern the entire population, and the large variety of pests poses great challenges to grain production and crop safety, so safe and efficient identification of agricultural pests is particularly important. Meanwhile, image recognition technology based on deep learning has made rapid progress: various improved convolutional neural networks and Transformer mechanisms outperform humans in certain specific scenarios, and, weighing economy against efficiency, researchers have made many active attempts to apply machine vision methods to agricultural pest recognition.
In research on recognizing small samples and small targets in complex farmland ecology, Wu Xiang used a 12-layer convolutional neural network on 128 × 128 color input images in 2016, reaching 76.7% recognition accuracy on 10 pest classes; Cheng et al. recognized pests against complex farmland backgrounds in 2017 with a deep residual module, reaching 99.67% accuracy; in 2018 a lightweight model suitable for mobile deployment was proposed, reaching 93.5% recognition accuracy while preserving a degree of efficiency; in 2019 transfer learning was applied to agricultural pest recognition, and after tuning various hyperparameters the accuracy on the three pest classes studied reached 95% or more; in 2020 NANNI et al. fused a saliency method with a convolutional neural network on the more challenging IP102 data set, reaching 61.93% recognition accuracy; in 2021 HRIDOY et al. ran comparison experiments with different convolutional neural networks on early okra pest images, where MobileNetV2 performed best with 98% recognition accuracy. Although existing research achieves good results, its recognition accuracy depends heavily on the data set, and it is helpless when faced with new, untrained classes.
To further improve the recognition accuracy of agricultural pest recognition under small-sample and small-target conditions, three most typical small sample training methods are currently available: the matching network, the prototype network and the graph neural network. Each has advantages on data with different characteristics, but these three typical small sample training methods still have the following shortcomings:
1. The three typical network small sample training methods attend only to whether the metric is appropriate, and ignore whether the critical feature information of each pest category is lost when image information is mapped into the feature space; if a pest category lacks a unique feature identifier, the classification effect suffers.
2. Even when the mapped feature vectors retain enough category information for each pest image, the three typical network small sample training methods use only one picture per pest class for classification guidance in the testing stage; if that picture's features are unclear or some non-critical information is too strong, the classification result is guided wrongly, affecting the accuracy on pest features.
3. Image recognition techniques based on deep learning must be trained on a large number of labeled samples before deployment, yet in real scenes target-domain samples may be very scarce. Existing small sample training methods thus face possible overfitting when predicting in small-sample scenes, and possible confusion of small targets with the background during training.
Disclosure of Invention
The invention aims to provide a deep learning training method for improving the recognition accuracy of small samples and small targets. It provides a universal two-stage training model that fuses the existing mainstream methods and enhances their performance; a feature enhancement module fusing a dual attention mechanism is proposed for the possible confusion of small targets with the background during training. Secondly, a Gaussian-distribution-based feature generation module is proposed for the overfitting that prediction may produce in small-sample scenes, so as to improve generalization ability. Finally, these improvements are unified into the two-stage training model, achieving improvements ranging from 2.11% to 6.87% on three typical small sample recognition methods.
In order to achieve the technical purpose and achieve the technical effect, the invention is realized by the following technical scheme:
The deep learning training method for improving the recognition accuracy of small samples and small targets divides the traditional small sample learning training method into a two-stage training model: in the training stage, a dual attention mechanism is introduced to strengthen the discriminative information features of different small samples and small targets; in the verification stage, a Gaussian-distribution-based feature generation module generates related samples by means of the rich small-sample and small-target information of the base classes, correcting the biased distribution caused by the scarcity of test samples.
Furthermore, the dual attention mechanism is fused from a spatial attention mechanism and a channel attention mechanism, and is integrated into the feature extractor, trained with the traditional small sample learning training method, that extracts the key information of the small-sample and small-target categories, so as to enhance their key information features;
furthermore, in the verification stage the key information features of the small samples and small targets are assumed to be Gaussian distributed; the mean and variance of the base class small-sample and small-target classes in the feature space are computed with the training-stage key information feature extraction network, labeled samples are generated based on the nearest neighbor algorithm and the features of the classes most similar to the new small-sample and small-target classes, and a classifier is then trained on the generated labeled samples combined with the new-class small-sample and small-target support set;
further, the steps for fusing the channel attention mechanism and the spatial attention mechanism into the backbone network are as follows:
Step 1: the channel attention mechanism is obtained by processing the new small-sample and small-target feature data with average-pooling downsampling and max-pooling downsampling; its expression is:
ω₁ = σ(MLP(α(F)) + MLP(δ(F)))
where F ∈ R^{C×H×W} is the feature map of the selected intermediate layer, and C, H and W denote its depth, height and width, respectively; α and δ denote average-pooling downsampling and max-pooling downsampling, after which the feature map has size C × 1 × 1. The shared multilayer perceptron can be written as:
MLP(·) = W₁ ReLU(W₀(·))
where W₀ and W₁ are shared convolution layers used to change the number of channels of the feature map;
the complete channel attention operation is:
ω₁ = σ(W₁ ReLU(W₀ α(F)) + W₁ ReLU(W₀ δ(F)))
where ReLU(·) and σ(·) denote the ReLU and sigmoid activation functions, respectively;
Step 2: the spatial attention mechanism fuses the results of average-pooling downsampling and max-pooling downsampling and then applies a convolution layer to obtain the spatial weight features, formalized as:
ω₂ = σ(W^{7×7}([α(F); δ(F)]))
Step 3: the channel attention of Step 1 and the spatial attention of Step 2 are fused into a dual attention mechanism over the backbone network; the fusion process is:
F′ = ω₁ ⊗ F
F″ = ω₂ ⊗ F′
where ω₁ and ω₂ denote the channel attention and spatial attention weights mentioned above, and ⊗ denotes element-wise multiplication.
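As a concrete illustration, the following is a minimal PyTorch sketch of the dual attention described in Steps 1 to 3; the module names, the channel-reduction ratio and the 7 × 7 kernel default are illustrative assumptions, not values fixed by the patent.

```python
# Minimal PyTorch sketch of the dual attention (Steps 1-3 above).
# Module and variable names are illustrative, not from the patent.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP realized as 1x1 convolutions: W0 (reduce) and W1 (restore)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W1
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(f.mean(dim=(2, 3), keepdim=True))   # alpha(F): C x 1 x 1
        mx = self.mlp(f.amax(dim=(2, 3), keepdim=True))    # delta(F): C x 1 x 1
        return torch.sigmoid(avg + mx)                     # omega_1

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # W_7x7: convolution over the concatenated pooled maps
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = f.mean(dim=1, keepdim=True)                  # 1 x H x W
        mx = f.amax(dim=1, keepdim=True)                   # 1 x H x W
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # omega_2

class DualAttention(nn.Module):
    """F' = omega_1 (x) F, then F'' = omega_2 (x) F'."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f = self.ca(f) * f      # element-wise multiplication by channel weights
        return self.sa(f) * f   # element-wise multiplication by spatial weights
```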
Further, the detailed steps of the verification stage are as follows:
Step 1: compute base class feature information, where the mean μᵢ and variance Σᵢ of the i-th class are:
μᵢ = (1/nᵢ) Σⱼ xⱼ
Σᵢ = (1/(nᵢ − 1)) Σⱼ (xⱼ − μᵢ)(xⱼ − μᵢ)ᵀ
where i indexes the sample classes in the support set and j indexes the samples within the corresponding class; nᵢ is the total number of samples in base class i, and xⱼ is the feature vector of the j-th sample in base class i;
Step 2: generate the most relevant class features. Euclidean distance is used as the affinity measure to find the most similar base classes, and the most relevant class feature information is generated from their means and variances:
N_d = { ‖μᵢ − x̃‖² | i = 1, …, n_base }
where x̃ is the feature-space vector of the labeled sample of the new class, and N_d is the set of Euclidean distances between x̃ and all base classes; the t elements with the smallest distance are selected as the supplementary class set N_t:
N_t = { i | ‖μᵢ − x̃‖² ∈ top-t(N_d) }
From the elements of the supplementary class set, a mean and variance closely related to the new class can be generated:
μ′ = (Σ_{i∈N_t} μᵢ + x̃) / (t + 1),  Σ′ = (1/t) Σ_{i∈N_t} Σᵢ
Step 3: train a linear classifier on the generated feature samples. Denote the generated feature set F_y = {(μ′₁, Σ′₁), …, (μ′ₖ, Σ′ₖ)}, where μ′ᵢ, Σ′ᵢ are the mean and variance of the calibrated distribution generated from the feature vector of the class-i images in the support set; the known class mean and variance are randomly sampled to generate a supplementary sample set G_y:
G_y = { (x, y) | x ∼ N(μ′_y, Σ′_y) }
where (x, y) is a supplementary sample generated for the i-th class; the generated samples are mixed with the original samples as enhanced samples, and a linear classifier is trained by minimizing the cross-entropy loss:
ℓ = E_{(x,y)} [ −log Pr(y | x; θ) ]
where y ranges over the classes of each training task and θ are the trainable parameters; the query set of the new class is then predicted by the linear classifier and the accuracy is computed.
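For concreteness, the following is a hedged NumPy sketch of Steps 1 and 2 above (base class statistics and distribution calibration); the function names and the per-class feature-matrix layout are assumptions for illustration.

```python
# Hedged NumPy sketch of Steps 1-2 above (statistics and calibration).
# Function names and the data layout (features per base class) are assumptions.
import numpy as np

def base_class_stats(base_features: dict[int, np.ndarray]):
    """Per-class mean and covariance of base-class feature vectors (Step 1)."""
    stats = {}
    for cls, x in base_features.items():          # x: (n_i, d) feature matrix
        mu = x.mean(axis=0)
        sigma = np.cov(x, rowvar=False)           # unbiased: divides by n_i - 1
        stats[cls] = (mu, sigma)
    return stats

def calibrate(x_tilde: np.ndarray, stats: dict, t: int = 2):
    """Calibrated (mu', Sigma') for one new-class support feature x_tilde (Step 2)."""
    # Euclidean distances from x_tilde to every base-class mean (the set N_d)
    dists = {cls: np.linalg.norm(mu - x_tilde) for cls, (mu, _) in stats.items()}
    # The t nearest base classes form the supplementary set N_t
    nearest = sorted(dists, key=dists.get)[:t]
    mu_prime = (sum(stats[c][0] for c in nearest) + x_tilde) / (t + 1)
    sigma_prime = sum(stats[c][1] for c in nearest) / t
    return mu_prime, sigma_prime
```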
The small sample training method is applied to agricultural pest recognition training.
The invention has the beneficial effects that:
1. In the training stage, a spatial attention mechanism and a channel attention mechanism are introduced to strengthen the discriminative information of different pest classes; in the verification stage, base class information reinforces the features of the new-class support-set samples to avoid overfitting; combining the two greatly improves the model's recognition accuracy over the baseline methods. With this method, when a new pest species such as the African locust invades, only a single expert calibration is needed, after which the easily confused pest species of the region can be recognized and a targeted control plan carried out.
2. The invention applies the research ideas of the small sample field to the large-scale open agricultural pest data set IP102 for experiment and improvement; whereas traditional image classification methods cannot recognize a newly appearing category without a large number of training samples, the proposed method recognizes it well from a single guide sample.
3. The method integrates the traditional typical network small sample training methods into a two-stage training mode: the first stage introduces spatial and channel attention mechanisms so that the network extracts features better, and the second stage, to prevent network overfitting, corrects the new class distribution by means of the base class data set with a Gaussian sampling method; integrating the three typical network small sample training methods in this way greatly improves recognition precision.
Of course, it is not necessary for any product to practice the invention to achieve all of the above-described advantages at the same time.
Drawings
FIG. 1 is a schematic diagram of a 5way-1shot test task;
FIG. 2 is a diagram of the overall network architecture;
FIG. 3 is a flow chart of the dual attention mechanism;
fig. 4 is a flowchart according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The deep learning training method for improving the recognition accuracy of small samples and small targets divides the traditional small sample learning training method into two stages: in the training stage, a dual attention mechanism is introduced to strengthen the discriminative information features of different small samples and small targets; in the verification stage, a Gaussian-distribution-based feature generation module generates related samples by means of the rich small-sample and small-target information of the base classes, correcting the biased distribution caused by the scarcity of test samples.
The invention is illustrated below with reference to specific examples:
example 1
The embodiment provides a small sample training method based on deep learning, which is based on the recognition processes of three traditional typical small sample learning training methods.
In this embodiment, when the three conventional typical small sample training methods are used to define the agricultural pest problem with small samples and small targets, the training target of small sample learning is a set of labeled agricultural pest images S = {(x₁, y₁), (x₂, y₂), …, (xᵢ, yᵢ)}, where xᵢ ∈ R^{d×d} is the image feature of a pest in the farmland ecosystem and yᵢ ∈ C is the sample label. S is manually divided into base classes S_b and new classes S_n such that
S_b ∩ S_n = ∅ and S_b ∪ S_n = S.
For each S_I, I ∈ {b, n}, there are a Support set and a Query set, D_I^s and D_I^q. The aim is to train on S_b so that the network acquires generalization ability on S_n: within S_n, the support set D_n^s guides each sample of the query set D_n^q to the correct pest category, where D_n^s ∩ D_n^q = ∅.
To measure the generalization ability of agricultural pest models, a K_w-way N_s-shot problem is typically defined, namely: given new agricultural pest classes, each task randomly draws K_w categories, each category providing only N_s labeled samples as the support set, after which the newly appearing unlabeled samples are classified and the accuracy is computed. When N_s = 1 this is single-sample (one-shot) learning, that is, each category in the support set provides only one labeled sample. The task setup is shown in fig. 1.
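The episode construction just described can be sketched as follows; the function name, the query-set size and the dataset layout are illustrative assumptions.

```python
# Hedged sketch of sampling one K-way N-shot episode (fig. 1); names are illustrative.
import random

def sample_episode(dataset: dict[int, list], k_way: int = 5, n_shot: int = 1, n_query: int = 15):
    """dataset maps a class label to its list of samples (e.g. image tensors)."""
    support, query = [], []
    for cls in random.sample(list(dataset), k_way):       # draw K_w new classes
        items = random.sample(dataset[cls], n_shot + n_query)
        support += [(x, cls) for x in items[:n_shot]]     # N_s labeled samples per class
        query += [(x, cls) for x in items[n_shot:]]       # unlabeled at test time
    return support, query
```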
The first method: the matching network, based on metric learning. The idea of the matching network is that, given a support set S of agricultural pest samples, for a new test sample x̂ the probability that x̂ belongs to each class yᵢ is computed separately, and the class with the maximum label probability is taken as the final prediction. The model is generally expressed as:
ŷ = Σᵢ₌₁ᵏ a(x̂, xᵢ) yᵢ
where k is the number of categories and the probability is computed for every class of the support set; a is the attention kernel that the network is trained to learn, generally expressed as:
a(x̂, xᵢ) = exp(d(f(x̂), g(xᵢ))) / Σⱼ₌₁ᵏ exp(d(f(x̂), g(xⱼ)))
where d is the distance measure between the query set and the support set (the original paper uses cosine similarity), and f and g are embedding networks whose purpose is to map the query set and the support set into the agricultural pest feature spaces.
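A minimal sketch of this prediction rule, assuming pre-computed embeddings f(x̂) and g(xᵢ) and a cosine kernel as in the original paper; the function name is illustrative.

```python
# Hedged PyTorch sketch of matching-network prediction with a cosine kernel.
import torch
import torch.nn.functional as F

def matching_predict(f_query, g_support, support_labels, num_classes: int):
    """f_query: (d,); g_support: (k, d) embedded support set; support_labels: (k,)."""
    sims = F.cosine_similarity(f_query.unsqueeze(0), g_support)   # d(f(x), g(x_i))
    attn = torch.softmax(sims, dim=0)                             # a(x, x_i)
    one_hot = F.one_hot(support_labels, num_classes).float()      # y_i
    return attn @ one_hot                                         # class probabilities
```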
The second method: the prototype network, based on metric learning. The prototype network assumes that for each class of agricultural pest samples there exists a prototype position in the feature space, with samples of the same class distributed around it. During training, the data are therefore mapped into the feature space of the agricultural pest images by an embedding function, and the vectorized mean is taken as the prototype of each class. Formally:
c_k = (1/|S_k|) Σ_{(xᵢ,yᵢ)∈S_k} f_φ(xᵢ)
where c_k is the prototype of each class and f_φ is the embedding model whose purpose is to convert images into the feature space and improve computational efficiency.
At test time, the class of a query sample is obtained by applying softmax to its distances to each class prototype, namely:
p_φ(y = k | x) = exp(−d(f_φ(x), c_k)) / Σ_{k′} exp(−d(f_φ(x), c_{k′}))
where d is a distance metric function measuring the distance between the query set and the class prototypes.
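A minimal sketch of prototype computation and softmax classification over embedded support and query sets; the function name and the Euclidean metric default are illustrative.

```python
# Hedged PyTorch sketch of prototype computation and classification.
import torch

def proto_classify(z_support, support_labels, z_query, num_classes: int):
    """z_support: (k, d) embeddings; support_labels: (k,); z_query: (q, d)."""
    protos = torch.stack([z_support[support_labels == c].mean(0)
                          for c in range(num_classes)])            # c_k per class
    dists = torch.cdist(z_query, protos)                           # Euclidean d(., .)
    return torch.softmax(-dists, dim=1)                            # p(y = k | x)
```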
The third method: the graph neural network, based on metric learning. The graph neural network method performs image classification by propagating label information from labeled agricultural pest samples to unlabeled ones. First, the initial nodes of the graph neural network are constructed:
vᵢ⁽⁰⁾ = (f_φ(xᵢ); h(lᵢ))
where f_φ(xᵢ) is the vector representation of the agricultural pest sample in the feature space and h(lᵢ) is the one-hot vector of its label; the two are concatenated as the initial nodes of the graph.
Next, the adjacency matrix of the graph is obtained:
Ãᵢ,ⱼ = MLP_θ̃(|vᵢ − vⱼ|)
that is, the absolute difference between nodes i and j is passed through a multilayer perceptron to give the adjacency matrix of the graph, where MLP is the multilayer perceptron and θ̃ are its learnable parameters.
Finally, the overall graph model is obtained and message propagation is performed:
v⁽ˡ⁺¹⁾ = ρ( Σ_{B∈𝓑} B v⁽ˡ⁾ θ_B⁽ˡ⁾ )
where 𝓑 is the set of adjacency matrices and θ_B⁽ˡ⁾ are trainable parameters; each layer takes the current node vectors and their adjacency matrices and updates them into the node vectors of the next layer, and ρ(·) is an activation function introduced to improve generalization ability. In this way, label information is propagated to the unlabeled agricultural pest samples over multiple iterations.
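A hedged sketch of one such graph layer, combining the learned adjacency with one step of message propagation; the hidden width and the softmax row-normalization of the adjacency are illustrative assumptions.

```python
# Hedged PyTorch sketch of one graph layer: learned adjacency + message passing.
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, dim_in: int, dim_out: int, hidden: int = 64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(dim_in, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 1))        # MLP over |v_i - v_j|
        self.theta = nn.Linear(dim_in, dim_out)                    # trainable theta

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        diff = (v.unsqueeze(1) - v.unsqueeze(0)).abs()             # (n, n, d)
        a = torch.softmax(self.edge_mlp(diff).squeeze(-1), dim=1)  # adjacency A_ij
        return torch.relu(self.theta(a @ v))                       # rho(A v theta)
```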
Example 2
The embodiment provides the two-stage training steps that fuse the different feature enhancement modules: a dual attention mechanism is introduced in the first stage to enhance the key feature information of each agricultural pest class during feature extraction, and a Gaussian-distribution-based feature generation module is adopted in the second stage, generating related samples from the abundant base class agricultural pest samples to correct the biased distribution caused by the scarcity of test samples. The overall network architecture is shown in fig. 2.
In the first stage (training stage), a strong feature extractor (e.g. ResNet) that can accurately extract the key information of different pest categories is trained based on a traditional small sample learning method; spatial attention and channel attention mechanisms are introduced at this stage so that the feature extraction network can recognize slight differences between categories under various backgrounds and focus on the unique feature information of each pest category.
In the second stage (verification stage), since the new agricultural pest classes have only a limited number of labeled samples, direct classification with conventional methods is likely to overfit. The features are therefore assumed to follow a Gaussian distribution: the class information of each class is directly related to its mean, which represents the class prototype, and its variance, which represents the intra-class variation, so the distribution of a new class can be estimated more accurately by means of the base class distributions. Accordingly, in the second stage the mean and variance of the base classes in the feature space are computed with the stage-one feature extraction network, a large number of labeled samples are generated from the base classes most similar to the new class following the idea of the nearest neighbor algorithm, and finally a better classifier is trained on the generated samples combined with the new-class support set.
When spatial attention and channel attention are combined in the feature extraction network, the attention mechanism lets the network focus on key regions; for example, the key difference between a long-horned grasshopper and a common grasshopper is the ratio of antenna length to body length. There are generally two forms: the spatial attention mechanism attends to difference information in the image dimensions, while the channel attention mechanism attends to difference information in the channel dimension.
For agricultural pests, the image dimensions carry information such as the size, angle and position of different pests, while the channel dimension carries information such as color and texture; both are very important for distinguishing pests, so the feature map generated at each stage is enhanced by combining the spatial and channel attention mechanisms. Whether ConvNet or ResNet is used as the feature extraction network, the dual attention mechanism is introduced at the skip-connection stage to balance accuracy and efficiency; implementation details are shown in the dual attention flow chart of FIG. 3.
The fusion steps of the dual attention mechanism are as follows:
Step 1: channel attention processing. As the agricultural pest image is mapped into the feature space, the image information is continuously compressed by multiple convolution layers, and different channels of each layer attend to different information; the channel attention mechanism answers which channel information the network should attend to. One channel attention mechanism comprises the following two parts:
ω₁ = σ(MLP(α(F)) + MLP(δ(F)))
where F ∈ R^{C×H×W} is the feature map of the selected intermediate layer, and C, H and W denote its depth, height and width, respectively; α and δ denote average-pooling downsampling and max-pooling downsampling, after which the feature map has size C × 1 × 1. The shared multilayer perceptron can be written as:
MLP(·) = W₁ ReLU(W₀(·))
where W₀ and W₁ are shared convolution layers used to change the number of channels of the feature map.
The complete channel attention operation is:
ω₁ = σ(W₁ ReLU(W₀ α(F)) + W₁ ReLU(W₀ δ(F)))
where ReLU(·) and σ(·) denote the ReLU and sigmoid activation functions, respectively.
Step 2: the spatial attention mechanism answers which key regions the network should attend to. Similar to channel attention, it fuses the results of average-pooling downsampling and max-pooling downsampling and then applies a convolution layer to obtain the spatial weight features, formally expressed as:
ω₂ = σ(W^{7×7}([α(F); δ(F)]))
Step 3: backbone network information fusion. The output of each stage passes through the dual attention mechanism: the channel attention mechanism converts the original feature map into a C × 1 × 1 vector, which is multiplied with the feature map to learn the channel weight information; the spatial attention mechanism converts it into a 1 × H × W feature map to learn the weight of each spatial location.
The dual attention mechanism can be expressed as the following process:
F′ = ω₁ ⊗ F
F″ = ω₂ ⊗ F′
where ω₁ and ω₂ denote the channel attention and spatial attention weights mentioned above, and ⊗ denotes element-wise multiplication. A complete dual attention mechanism takes the output of the channel attention as the input of the spatial attention, and the final attention result is obtained by this composition.
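To show where the module sits, here is a hedged sketch of a residual block with the dual attention inserted at the skip-connection stage; it reuses the DualAttention module sketched earlier, and applying the attention on the residual branch before the skip addition is an illustrative choice, not a detail fixed by the patent.

```python
# Hedged sketch of inserting the dual attention at a ResNet skip connection.
# Reuses the DualAttention module from the earlier sketch; names are illustrative.
import torch.nn as nn

class AttnBasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.attn = DualAttention(channels)   # defined in the earlier sketch
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Attention is applied to the residual branch before the skip addition
        return self.relu(x + self.attn(self.body(x)))
```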
Example 3
The embodiment provides the detailed steps of the Gaussian-distribution-based feature generation module. Each class of agricultural pests has only one guide picture in the testing stage; the biased distribution formed by such insufficient samples easily overfits the trained model, so the abundant samples in the base class data must be used sensibly to prevent overfitting. Although new agricultural pests are unseen classes, they are still insect species, and their image features are highly similar to the base class images, so the biased distribution can be calibrated with the mean and variance of similar samples in the training set to prevent overfitting. The specific steps are as follows:
Step 1: computing the feature information of the base class agricultural pests. The original images are mapped into the feature space with the stage-one feature extraction network, which effectively reduces the data dimensionality and extracts features rapidly. Because data samples are abundant in the base class agricultural pests, the statistics of each pest category in the feature space are easily obtained; the mean μᵢ and variance Σᵢ of the i-th class are:
μᵢ = (1/nᵢ) Σⱼ xⱼ
Σᵢ = (1/(nᵢ − 1)) Σⱼ (xⱼ − μᵢ)(xⱼ − μᵢ)ᵀ
where i indexes the sample classes in the support set and j indexes the samples within the corresponding class; nᵢ is the total number of samples in base class i, and xⱼ is the feature vector of the j-th sample in base class i.
Step 2: generating the most relevant class features. The premise of feature migration and calibration is that the agricultural pest samples are highly similar, so that the difference between means and variances in the feature space is small; the base class agricultural pest data cover many classes and cannot all be used for sample generation, so the pest class information most closely related to the new class must be found:
N_d = { ‖μᵢ − x̃‖² | i = 1, …, n_base }
where x̃ is the feature-space vector of the labeled sample of the new class and N_d is the set of Euclidean distances between x̃ and all base classes; the t elements with the smallest distance are selected as the supplementary class set N_t:
N_t = { i | ‖μᵢ − x̃‖² ∈ top-t(N_d) }
The mean and variance most closely related to the new class can be generated from the elements of the supplementary class set:
μ′ = (Σ_{i∈N_t} μᵢ + x̃) / (t + 1),  Σ′ = (1/t) Σ_{i∈N_t} Σᵢ
Step 3: training the linear classifier on the generated feature samples. For each test task, the mean and variance are computed as above and samples are generated to correct the distribution of the agricultural pest samples in the support set; the generated feature set is called F_y,
F_y = {(μ′₁, Σ′₁), …, (μ′ₖ, Σ′ₖ)}, where μ′ᵢ, Σ′ᵢ are the mean and variance of the calibrated distribution generated from the feature vectors of the class-i images in the support set.
With the class mean and variance known, supplementary samples are generated by random sampling; let the generated sample set be G_y:
G_y = { (x, y) | x ∼ N(μ′_y, Σ′_y) }
where (x, y) is the supplementary sample generated for the i-th class. The generated agricultural pest samples are mixed with the original agricultural pest samples as enhanced samples, and a linear classifier is trained by minimizing the cross-entropy loss:
ℓ = E_{(x,y)} [ −log Pr(y | x; θ) ]
where y ranges over the classes of each training task and θ are the trainable parameters; after training, the trained linear classifier predicts the query set of the new agricultural pest classes and the accuracy is computed.
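A hedged sketch of Step 3, sampling from the calibrated Gaussians and fitting a logistic-regression classifier (which minimizes cross-entropy); it builds on the calibrate() helper sketched earlier, and the per-class sample count is an illustrative assumption.

```python
# Hedged sketch of Step 3: sample from the calibrated Gaussians and train a
# linear classifier; builds on calibrate() from the earlier sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_calibrated_classifier(support, stats, per_class: int = 750, t: int = 2):
    """support: list of (feature_vector, label) pairs, one per new class."""
    xs, ys = [], []
    for x_tilde, y in support:
        mu_p, sigma_p = calibrate(x_tilde, stats, t)
        gen = np.random.multivariate_normal(mu_p, sigma_p, size=per_class)
        xs.append(np.vstack([gen, x_tilde[None]]))    # generated + original sample
        ys.extend([y] * (per_class + 1))
    clf = LogisticRegression(max_iter=1000)           # minimizes cross-entropy
    clf.fit(np.vstack(xs), np.array(ys))
    return clf
```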
Example 4
This example provides comparative experiments between the small sample training method of the invention and conventional training methods. The experiments use the IP102 large-scale agricultural pest data set, which contains more than 75000 images covering 102 classes of common pests found in the wild, with 737 samples per class on average; the data set is unevenly distributed and exhibits a long-tail effect.
For the traditional image recognition task, a typical partitioning of the original data set is adopted, i.e. training set : test set : validation set = 6 : 2 : 2; for the small sample image recognition task, to prevent class overlap between the validation set and the training set, the program randomly selects sixty classes as the training set, and the validation and test sets each take 21 of the remaining classes.
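A minimal sketch of the disjoint 60/21/21 class split described above; the function name and seeding are illustrative.

```python
# Hedged sketch of the disjoint class split used for the few-shot task (60/21/21).
import random

def split_classes(all_classes: list[int], seed: int = 0):
    rng = random.Random(seed)
    shuffled = rng.sample(all_classes, len(all_classes))
    return shuffled[:60], shuffled[60:81], shuffled[81:102]  # train / val / test classes
```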
The conventional methods perform as follows:
Traditional image training methods mainly comprise hand-crafted feature extraction and deep learning feature extraction. The former mainly includes methods such as SIFT and HOG, which perform well on low-level semantic features such as edges, colors and textures, while the latter is superior on high-level semantic information. SIFT and HOG are adopted for feature extraction, with SVM and KNN as classifiers for evaluating the results; the experimental results are shown in Table 1.
TABLE 1 conventional image classification results
(Table 1 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that deep learning feature extraction is generally better than traditional hand-crafted feature extraction; the best result uses ResNet as the backbone with KNN for classifying agricultural pests, yet the average accuracy is only 43.7%, which indirectly reflects the recognition difficulty of the IP102 data set.
The comparative experiment results are as follows:
for the mainstream and better effect three methods respectively use 5-way1-shot as comparison experiments, a classifier adopts logistic regression, each test is performed on the same test set for 10000 times in a circulating experiment, average accuracy is taken as a final result, because the original paper proposes that Conv4 is mostly used as a feature extraction network, but recent research finds that Resnet10 can give consideration to speed and precision better, in order to verify the effectiveness of the method, the comparison experiments are performed by respectively combining the two feature extraction networks, and the experiment results are shown in the following table 2:
table 2 shows the method for improving the effect
(Table 2 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that the proposed method is clearly effective: across the different methods the recognition accuracy is enhanced by at most 6.87% and at least 2.11%. The best-performing GNN network combined with the proposed feature enhancement modules reaches a classification accuracy of 46.06% on the IP102 data set.
The traditional methods predict over all 102 categories using the full data set, while the single-sample method draws only one picture per class over 5 categories, so the recognition accuracies are not directly comparable numerically. For practical application, however, the single-sample agricultural pest method is clearly more valuable: for unknown categories it is not necessary to recognize all pest categories, and recognition from only a single expert-labeled sample is more meaningful.
Performance on other datasets:
the main public data set of the current small sample learning training method is a mini-ImageNet data set, the data set is composed of 60000 images extracted from a large agricultural pest image classification data set ImageNet, 100 classes are classified in total, each class comprises 600 natural images with the size of 84 × 84, the data set is divided according to a mainstream method, and the training set, the verification set and the test set respectively account for 64,16 and 20 classes. The best performance of ProtopicalNet and GnNet under the conductive and Inductive methods is selected for experiments, and the improvement results are shown in Table 3:
TABLE 3 Mini-Imagenet test results
(Table 3 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that although ProtoNet and GNN already perform well relative to other methods, fusing the proposed feature enhancement modules still yields a good further gain, indicating that the method has a certain universality.
Ablation experiment:
to verify the two modules proposed: under the actions of the dual attention mechanism module and the feature generation module, the GNN model which best performs on the two data sets is used as a basic framework, and an ablation experiment is performed on the two data sets by using Resnet10 as a backbone, and the experimental results are shown in table 4:
TABLE 4 ablation experiment
(Table 4 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that the attention mechanism alone brings only a modest improvement on the two data sets, 1.04% and 0.55% respectively; adding the feature generation module brings an obvious improvement; and applying both together is markedly effective, improving accuracy by 5.87% and 3.95% respectively, which also demonstrates the theoretical effectiveness of the method.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. They are not exhaustive and do not limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A deep learning training method for improving the recognition accuracy of small samples and small targets, characterized in that the traditional small sample learning training method is divided into two stages: in the training stage, a dual attention mechanism is introduced to strengthen the discriminative information features of different small samples and small targets; in the verification stage, a Gaussian-distribution-based feature generation module generates related samples by means of the base classes.
2. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 1, characterized in that the dual attention mechanism is fused from a spatial attention mechanism and a channel attention mechanism, and is integrated into the feature extractor, trained with the traditional small sample learning training method, that extracts key information, so as to enhance the key information features.
3. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 1, characterized in that in the verification stage the key information features of the small samples and small targets are assumed to be Gaussian distributed; the mean and variance of the base class small-sample and small-target classes in the feature space are computed with the training-stage key information feature extraction network, labeled samples are generated based on the nearest neighbor algorithm and the features of the classes most similar to the new small-sample and small-target classes, and a classifier is then trained on the generated labeled samples combined with the new-class small-sample and small-target support set.
4. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 2, characterized in that the steps for fusing the channel attention mechanism and the spatial attention mechanism into the backbone network are as follows:
Step 1: the channel attention mechanism is obtained by processing the new small-sample and small-target feature data with average-pooling downsampling and max-pooling downsampling; its expression is:
ω₁ = σ(MLP(α(F)) + MLP(δ(F)))
where F ∈ R^{C×H×W} is the feature map of the selected intermediate layer, and C, H and W denote its depth, height and width, respectively; α and δ denote average-pooling downsampling and max-pooling downsampling, after which the feature map has size C × 1 × 1. The shared multilayer perceptron can be written as:
MLP(·) = W₁ ReLU(W₀(·))
where W₀ and W₁ are shared convolution layers used to change the number of channels of the feature map;
the complete channel attention operation is:
ω₁ = σ(W₁ ReLU(W₀ α(F)) + W₁ ReLU(W₀ δ(F)))
where ReLU(·) and σ(·) denote the ReLU and sigmoid activation functions, respectively;
Step 2: the spatial attention mechanism fuses the results of average-pooling downsampling and max-pooling downsampling and then applies a convolution layer to obtain the spatial weight features, formalized as:
ω₂ = σ(W^{7×7}([α(F); δ(F)]))
Step 3: the channel attention of Step 1 and the spatial attention of Step 2 are fused into a dual attention mechanism over the backbone network; the fusion process is:
F′ = ω₁ ⊗ F
F″ = ω₂ ⊗ F′
where ω₁ and ω₂ denote the channel attention and spatial attention weights mentioned above, and ⊗ denotes element-wise multiplication.
5. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 3, characterized in that the detailed steps of the verification stage are as follows:
Step 1: compute base class feature information, where the mean μᵢ and variance Σᵢ of the i-th class are:
μᵢ = (1/nᵢ) Σⱼ xⱼ
Σᵢ = (1/(nᵢ − 1)) Σⱼ (xⱼ − μᵢ)(xⱼ − μᵢ)ᵀ
where i indexes the sample classes in the support set and j indexes the samples within the corresponding class; nᵢ is the total number of samples in base class i, and xⱼ is the feature vector of the j-th sample in base class i;
Step 2: generate the most relevant class features. Euclidean distance is used as the affinity measure to find the most similar base classes, and the most relevant class feature information is generated from their means and variances:
N_d = { ‖μᵢ − x̃‖² | i = 1, …, n_base }
where x̃ is the feature-space vector of the labeled sample of the new class and N_d is the set of Euclidean distances between x̃ and all base classes; the t elements with the smallest distance are selected as the supplementary class set N_t:
N_t = { i | ‖μᵢ − x̃‖² ∈ top-t(N_d) }
The mean and variance closely related to the new class can be generated from the elements of the supplementary class set:
μ′ = (Σ_{i∈N_t} μᵢ + x̃) / (t + 1),  Σ′ = (1/t) Σ_{i∈N_t} Σᵢ
Step 3: train a linear classifier on the generated feature samples. Denote the generated feature set F_y = {(μ′₁, Σ′₁), …, (μ′ₖ, Σ′ₖ)}, where μ′ᵢ, Σ′ᵢ are the mean and variance of the calibrated distribution generated from the feature vector of the class-i images in the support set; the known class mean and variance are randomly sampled to generate a supplementary sample set G_y:
G_y = { (x, y) | x ∼ N(μ′_y, Σ′_y) }
where (x, y) is a supplementary sample generated for the i-th class; the generated samples are mixed with the original samples as enhanced samples, and a linear classifier is trained by minimizing the cross-entropy loss:
ℓ = E_{(x,y)} [ −log Pr(y | x; θ) ]
where y ranges over the classes of each training task and θ are the trainable parameters; the query set of the new class is then predicted by the linear classifier and the accuracy is computed.
CN202211357500.1A 2022-11-01 2022-11-01 Deep learning training method for improving recognition accuracy of small sample and small target Pending CN115690541A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211357500.1A CN115690541A (en) 2022-11-01 2022-11-01 Deep learning training method for improving recognition accuracy of small sample and small target


Publications (1)

Publication Number Publication Date
CN115690541A 2023-02-03

Family

ID=85048324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211357500.1A Pending CN115690541A (en) 2022-11-01 2022-11-01 Deep learning training method for improving recognition accuracy of small sample and small target

Country Status (1)

Country Link
CN (1) CN115690541A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310894A (en) * 2023-02-22 2023-06-23 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN116310894B (en) * 2023-02-22 2024-04-16 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN115861847A (en) * 2023-02-24 2023-03-28 耕宇牧星(北京)空间科技有限公司 Intelligent auxiliary marking method for visible light remote sensing image target
CN115861847B (en) * 2023-02-24 2023-05-05 耕宇牧星(北京)空间科技有限公司 Intelligent auxiliary labeling method for visible light remote sensing image target
CN116432089A (en) * 2023-05-15 2023-07-14 厦门星拉科技有限公司 Electric power internet of things inspection system and method
CN117151342A (en) * 2023-10-24 2023-12-01 广东省农业科学院植物保护研究所 Litchi insect pest identification and resistance detection method, litchi insect pest identification and resistance detection system and storage medium
CN117151342B (en) * 2023-10-24 2024-01-26 广东省农业科学院植物保护研究所 Litchi insect pest identification and resistance detection method, litchi insect pest identification and resistance detection system and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination