CN115690541A - Deep learning training method for improving recognition accuracy of small sample and small target - Google Patents


Info

Publication number
CN115690541A
CN115690541A (application CN202211357500.1A)
Authority
CN
China
Prior art keywords
small
samples
sample
feature
class
Prior art date
Legal status
Pending
Application number
CN202211357500.1A
Other languages
Chinese (zh)
Inventor
王祎
徐振宇
刘怡光
楼旭东
房景鑫
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202211357500.1A
Publication of CN115690541A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning training method for improving the recognition accuracy of small samples and small targets, and relates to the technical field of small-sample and small-target recognition. A feature enhancement module fusing a dual attention mechanism is proposed for the problem that the backgrounds of different small-sample and small-target classes are excessively similar during training; secondly, for the overfitting that prediction may produce under the single-sample condition, a Gaussian-distribution-based feature generation module is proposed to improve generalization ability; finally, three typical small sample training methods are unified into a two-stage training model that incorporates the proposed modules. These ideas and improvements are applied for the first time to the traditional pest classification data set IP102, where the recognition accuracy improves by 2.11 to 6.87 percentage points over the baseline methods.

Description

Deep learning training method for improving the recognition accuracy of small samples and small targets
Technical Field
The invention relates to the technical field of small-sample and small-target recognition, and in particular to a deep learning training method for improving the recognition accuracy of small samples and small targets.
Background
Agricultural problems concern the entire population, and the large variety of pests poses great challenges to grain production and crop safety, so safe and efficient identification of agricultural pests is particularly important. Meanwhile, image recognition technology based on deep learning has made rapid progress: various improved convolutional neural networks and Transformer mechanisms outperform humans in certain specific scenarios, and, weighing economy against efficiency, researchers have made many active attempts to apply machine vision methods to agricultural pest recognition.
In research on recognizing small samples and small targets in complex farmland ecology, Wu Xiang used a 12-layer convolutional neural network on 128 × 128 color input images in 2016, reaching 76.7% recognition accuracy on 10 pest classes; Cheng et al. recognized pests against complex farmland backgrounds in 2017 with a deep residual module, reaching 99.67% accuracy; in 2018 a lightweight model suitable for mobile deployment was proposed, reaching 93.5% recognition accuracy while preserving a degree of efficiency; in 2019 transfer learning was applied to agricultural pest recognition, and after tuning various hyperparameters the accuracy on the three pest classes studied reached 95% or more; in 2020 NANNI et al. fused a saliency method with a convolutional neural network on the more challenging IP102 data set, reaching 61.93% recognition accuracy; in 2021 HRIDOY et al. ran comparison experiments with different convolutional neural networks on early okra pest images, where MobileNetV2 performed best with 98% recognition accuracy. Although existing research achieves good results, its recognition accuracy depends heavily on the data set, and it is helpless when faced with new, untrained classes.
To further improve the recognition accuracy of agricultural pest recognition under small-sample and small-target conditions, three most typical small sample training methods are currently available: the matching network, the prototype network and the graph neural network. Each has advantages on data with different characteristics, but these three typical small sample training methods still have the following shortcomings:
1. The three typical network small sample training methods attend only to whether the metric is appropriate, and ignore whether the critical feature information of each pest category is lost when image information is mapped into the feature space; if a pest category lacks a unique feature identifier, the classification effect suffers.
2. Even when the mapped feature vectors retain enough category information for each pest image, the three typical network small sample training methods use only one picture per pest class for classification guidance in the testing stage; if that picture's features are unclear or some non-critical information is too strong, the classification result is guided wrongly, affecting the accuracy on pest features.
3. Image recognition techniques based on deep learning must be trained on a large number of labeled samples before deployment, yet in real scenes target-domain samples may be very scarce. Existing small sample training methods thus face possible overfitting when predicting in small-sample scenes, and possible confusion of small targets with the background during training.
Disclosure of Invention
The invention aims to provide a deep learning training method for improving the recognition accuracy of small samples and small targets. It provides a universal two-stage training model that fuses the existing mainstream methods and enhances their performance; a feature enhancement module fusing a dual attention mechanism is proposed for the possible confusion of small targets with the background during training. Secondly, a Gaussian-distribution-based feature generation module is proposed for the overfitting that prediction may produce in small-sample scenes, so as to improve generalization ability. Finally, these improvements are unified into the two-stage training model, achieving improvements ranging from 2.11% to 6.87% on three typical small sample recognition methods.
In order to achieve the technical purpose and achieve the technical effect, the invention is realized by the following technical scheme:
The deep learning training method for improving the recognition accuracy of small samples and small targets divides the traditional small sample learning training method into a two-stage training model: in the training stage, a dual attention mechanism is introduced to strengthen the discriminative information features of different small samples and small targets; in the verification stage, a Gaussian-distribution-based feature generation module generates related samples by means of the rich small-sample and small-target information of the base classes, correcting the biased distribution caused by the scarcity of test samples.
Furthermore, the dual attention mechanism is fused from a spatial attention mechanism and a channel attention mechanism, and is integrated into the feature extractor, trained with the traditional small sample learning training method, that extracts the key information of the small-sample and small-target categories, so as to enhance their key information features;
furthermore, in the verification stage the key information features of the small samples and small targets are assumed to be Gaussian distributed; the mean and variance of the base class small-sample and small-target classes in the feature space are computed with the training-stage key information feature extraction network, labeled samples are generated based on the nearest neighbor algorithm and the features of the classes most similar to the new small-sample and small-target classes, and a classifier is then trained on the generated labeled samples combined with the new-class small-sample and small-target support set;
further, the steps for fusing the channel attention mechanism and the spatial attention mechanism into the backbone network are as follows:
Step 1: the channel attention mechanism is obtained by processing the new small-sample and small-target feature data with average-pooling downsampling and max-pooling downsampling; its expression is:
ω₁ = σ(MLP(α(F)) + MLP(δ(F)))
where F ∈ R^{C×H×W} is the feature map of the selected intermediate layer, and C, H and W denote its depth, height and width, respectively; α and δ denote average-pooling downsampling and max-pooling downsampling, after which the feature map has size C × 1 × 1. The shared multilayer perceptron can be written as:
MLP(·) = W₁ ReLU(W₀(·))
where W₀ and W₁ are shared convolution layers used to change the number of channels of the feature map;
the complete channel attention operation is:
ω₁ = σ(W₁ ReLU(W₀ α(F)) + W₁ ReLU(W₀ δ(F)))
where ReLU(·) and σ(·) denote the ReLU and sigmoid activation functions, respectively;
Step 2: the spatial attention mechanism fuses the results of average-pooling downsampling and max-pooling downsampling and then applies a convolution layer to obtain the spatial weight features, formalized as:
ω₂ = σ(W^{7×7}([α(F); δ(F)]))
Step 3: the channel attention of Step 1 and the spatial attention of Step 2 are fused into a dual attention mechanism over the backbone network; the fusion process is:
F′ = ω₁ ⊗ F
F″ = ω₂ ⊗ F′
where ω₁ and ω₂ denote the channel attention and spatial attention weights mentioned above, and ⊗ denotes element-wise multiplication.
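As a concrete illustration, the following is a minimal PyTorch sketch of the dual attention described in Steps 1 to 3; the module names, the channel-reduction ratio and the 7 × 7 kernel default are illustrative assumptions, not values fixed by the patent.

```python
# Minimal PyTorch sketch of the dual attention (Steps 1-3 above).
# Module and variable names are illustrative, not from the patent.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP realized as 1x1 convolutions: W0 (reduce) and W1 (restore)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),  # W0
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),  # W1
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(f.mean(dim=(2, 3), keepdim=True))   # alpha(F): C x 1 x 1
        mx = self.mlp(f.amax(dim=(2, 3), keepdim=True))    # delta(F): C x 1 x 1
        return torch.sigmoid(avg + mx)                     # omega_1

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # W_7x7: convolution over the concatenated pooled maps
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        avg = f.mean(dim=1, keepdim=True)                  # 1 x H x W
        mx = f.amax(dim=1, keepdim=True)                   # 1 x H x W
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # omega_2

class DualAttention(nn.Module):
    """F' = omega_1 (x) F, then F'' = omega_2 (x) F'."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f = self.ca(f) * f      # element-wise multiplication by channel weights
        return self.sa(f) * f   # element-wise multiplication by spatial weights
```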
Further, the detailed steps of the verification stage are as follows:
Step 1: compute base class feature information, where the mean μᵢ and variance Σᵢ of the i-th class are:
μᵢ = (1/nᵢ) Σⱼ xⱼ
Σᵢ = (1/(nᵢ − 1)) Σⱼ (xⱼ − μᵢ)(xⱼ − μᵢ)ᵀ
where i indexes the sample classes in the support set and j indexes the samples within the corresponding class; nᵢ is the total number of samples in base class i, and xⱼ is the feature vector of the j-th sample in base class i;
Step 2: generate the most relevant class features. Euclidean distance is used as the affinity measure to find the most similar base classes, and the most relevant class feature information is generated from their means and variances:
N_d = { ‖μᵢ − x̃‖² | i = 1, …, n_base }
where x̃ is the feature-space vector of the labeled sample of the new class, and N_d is the set of Euclidean distances between x̃ and all base classes; the t elements with the smallest distance are selected as the supplementary class set N_t:
N_t = { i | ‖μᵢ − x̃‖² ∈ top-t(N_d) }
From the elements of the supplementary class set, a mean and variance closely related to the new class can be generated:
μ′ = (Σ_{i∈N_t} μᵢ + x̃) / (t + 1),  Σ′ = (1/t) Σ_{i∈N_t} Σᵢ
Step 3: train a linear classifier on the generated feature samples. Denote the generated feature set F_y = {(μ′₁, Σ′₁), …, (μ′ₖ, Σ′ₖ)}, where μ′ᵢ, Σ′ᵢ are the mean and variance of the calibrated distribution generated from the feature vector of the class-i images in the support set; the known class mean and variance are randomly sampled to generate a supplementary sample set G_y:
G_y = { (x, y) | x ∼ N(μ′_y, Σ′_y) }
where (x, y) is a supplementary sample generated for the i-th class; the generated samples are mixed with the original samples as enhanced samples, and a linear classifier is trained by minimizing the cross-entropy loss:
ℓ = E_{(x,y)} [ −log Pr(y | x; θ) ]
where y ranges over the classes of each training task and θ are the trainable parameters; the query set of the new class is then predicted by the linear classifier and the accuracy is computed.
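For concreteness, the following is a hedged NumPy sketch of Steps 1 and 2 above (base class statistics and distribution calibration); the function names and the per-class feature-matrix layout are assumptions for illustration.

```python
# Hedged NumPy sketch of Steps 1-2 above (statistics and calibration).
# Function names and the data layout (features per base class) are assumptions.
import numpy as np

def base_class_stats(base_features: dict[int, np.ndarray]):
    """Per-class mean and covariance of base-class feature vectors (Step 1)."""
    stats = {}
    for cls, x in base_features.items():          # x: (n_i, d) feature matrix
        mu = x.mean(axis=0)
        sigma = np.cov(x, rowvar=False)           # unbiased: divides by n_i - 1
        stats[cls] = (mu, sigma)
    return stats

def calibrate(x_tilde: np.ndarray, stats: dict, t: int = 2):
    """Calibrated (mu', Sigma') for one new-class support feature x_tilde (Step 2)."""
    # Euclidean distances from x_tilde to every base-class mean (the set N_d)
    dists = {cls: np.linalg.norm(mu - x_tilde) for cls, (mu, _) in stats.items()}
    # The t nearest base classes form the supplementary set N_t
    nearest = sorted(dists, key=dists.get)[:t]
    mu_prime = (sum(stats[c][0] for c in nearest) + x_tilde) / (t + 1)
    sigma_prime = sum(stats[c][1] for c in nearest) / t
    return mu_prime, sigma_prime
```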
The small sample training method is applied to agricultural pest recognition training.
The invention has the beneficial effects that:
1. In the training stage, a spatial attention mechanism and a channel attention mechanism are introduced to strengthen the discriminative information of different pest classes; in the verification stage, base class information reinforces the features of the new-class support-set samples to avoid overfitting; combining the two greatly improves the model's recognition accuracy over the baseline methods. With this method, when a new pest species such as the African locust invades, only a single expert calibration is needed, after which the easily confused pest species of the region can be recognized and a targeted control plan carried out.
2. The invention applies the research ideas of the small sample field to the large-scale open agricultural pest data set IP102 for experiment and improvement; whereas traditional image classification methods cannot recognize a newly appearing category without a large number of training samples, the proposed method recognizes it well from a single guide sample.
3. The method integrates the traditional typical network small sample training methods into a two-stage training mode: the first stage introduces spatial and channel attention mechanisms so that the network extracts features better, and the second stage, to prevent network overfitting, corrects the new class distribution by means of the base class data set with a Gaussian sampling method; integrating the three typical network small sample training methods in this way greatly improves recognition precision.
Of course, it is not necessary for any product to practice the invention to achieve all of the above-described advantages at the same time.
Drawings
FIG. 1 is a schematic diagram of a 5way-1shot test task;
FIG. 2 is a diagram of the overall network architecture;
FIG. 3 is a flow chart of the dual attention mechanism;
fig. 4 is a flowchart according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The deep learning training method for improving the recognition accuracy of small samples and small targets divides the traditional small sample learning training method into two stages: in the training stage, a dual attention mechanism is introduced to strengthen the discriminative information features of different small samples and small targets; in the verification stage, a Gaussian-distribution-based feature generation module generates related samples by means of the rich small-sample and small-target information of the base classes, correcting the biased distribution caused by the scarcity of test samples.
The invention is illustrated below with reference to specific examples:
example 1
The embodiment provides a small sample training method based on deep learning, which is based on the recognition processes of three traditional typical small sample learning training methods.
In this embodiment, when the three conventional typical small sample training methods are used to define the agricultural pest problem with small samples and small targets, the training target of small sample learning is a set of labeled agricultural pest images S = {(x₁, y₁), (x₂, y₂), …, (xᵢ, yᵢ)}, where xᵢ ∈ R^{d×d} is the image feature of a pest in the farmland ecosystem and yᵢ ∈ C is the sample label. S is manually divided into base classes S_b and new classes S_n such that
S_b ∩ S_n = ∅ and S_b ∪ S_n = S.
For each S_I, I ∈ {b, n}, there are a Support set and a Query set, D_I^s and D_I^q. The aim is to train on S_b so that the network acquires generalization ability on S_n: within S_n, the support set D_n^s guides each sample of the query set D_n^q to the correct pest category, where D_n^s ∩ D_n^q = ∅.
To measure the generalization ability of agricultural pest models, a K_w-way N_s-shot problem is typically defined, namely: given new agricultural pest classes, each task randomly draws K_w categories, each category providing only N_s labeled samples as the support set, after which the newly appearing unlabeled samples are classified and the accuracy is computed. When N_s = 1 this is single-sample (one-shot) learning, that is, each category in the support set provides only one labeled sample. The task setup is shown in fig. 1.
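The episode construction just described can be sketched as follows; the function name, the query-set size and the dataset layout are illustrative assumptions.

```python
# Hedged sketch of sampling one K-way N-shot episode (fig. 1); names are illustrative.
import random

def sample_episode(dataset: dict[int, list], k_way: int = 5, n_shot: int = 1, n_query: int = 15):
    """dataset maps a class label to its list of samples (e.g. image tensors)."""
    support, query = [], []
    for cls in random.sample(list(dataset), k_way):       # draw K_w new classes
        items = random.sample(dataset[cls], n_shot + n_query)
        support += [(x, cls) for x in items[:n_shot]]     # N_s labeled samples per class
        query += [(x, cls) for x in items[n_shot:]]       # unlabeled at test time
    return support, query
```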
The first method: the matching network, based on metric learning. The idea of the matching network is that, given a support set S of agricultural pest samples, for a new test sample x̂ the probability that x̂ belongs to each class yᵢ is computed separately, and the class with the maximum label probability is taken as the final prediction. The model is generally expressed as:
ŷ = Σᵢ₌₁ᵏ a(x̂, xᵢ) yᵢ
where k is the number of categories and the probability is computed for every class of the support set; a is the attention kernel that the network is trained to learn, generally expressed as:
a(x̂, xᵢ) = exp(d(f(x̂), g(xᵢ))) / Σⱼ₌₁ᵏ exp(d(f(x̂), g(xⱼ)))
where d is the distance measure between the query set and the support set (the original paper uses cosine similarity), and f and g are embedding networks whose purpose is to map the query set and the support set into the agricultural pest feature spaces.
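A minimal sketch of this prediction rule, assuming pre-computed embeddings f(x̂) and g(xᵢ) and a cosine kernel as in the original paper; the function name is illustrative.

```python
# Hedged PyTorch sketch of matching-network prediction with a cosine kernel.
import torch
import torch.nn.functional as F

def matching_predict(f_query, g_support, support_labels, num_classes: int):
    """f_query: (d,); g_support: (k, d) embedded support set; support_labels: (k,)."""
    sims = F.cosine_similarity(f_query.unsqueeze(0), g_support)   # d(f(x), g(x_i))
    attn = torch.softmax(sims, dim=0)                             # a(x, x_i)
    one_hot = F.one_hot(support_labels, num_classes).float()      # y_i
    return attn @ one_hot                                         # class probabilities
```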
The second method: the prototype network, based on metric learning. The prototype network assumes that for each class of agricultural pest samples there exists a prototype position in the feature space, with samples of the same class distributed around it. During training, the data are therefore mapped into the feature space of the agricultural pest images by an embedding function, and the vectorized mean is taken as the prototype of each class. Formally:
c_k = (1/|S_k|) Σ_{(xᵢ,yᵢ)∈S_k} f_φ(xᵢ)
where c_k is the prototype of each class and f_φ is the embedding model whose purpose is to convert images into the feature space and improve computational efficiency.
At test time, the class of a query sample is obtained by applying softmax to its distances to each class prototype, namely:
p_φ(y = k | x) = exp(−d(f_φ(x), c_k)) / Σ_{k′} exp(−d(f_φ(x), c_{k′}))
where d is a distance metric function measuring the distance between the query set and the class prototypes.
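A minimal sketch of prototype computation and softmax classification over embedded support and query sets; the function name and the Euclidean metric default are illustrative.

```python
# Hedged PyTorch sketch of prototype computation and classification.
import torch

def proto_classify(z_support, support_labels, z_query, num_classes: int):
    """z_support: (k, d) embeddings; support_labels: (k,); z_query: (q, d)."""
    protos = torch.stack([z_support[support_labels == c].mean(0)
                          for c in range(num_classes)])            # c_k per class
    dists = torch.cdist(z_query, protos)                           # Euclidean d(., .)
    return torch.softmax(-dists, dim=1)                            # p(y = k | x)
```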
The third method: the graph neural network, based on metric learning. The graph neural network method performs image classification by propagating label information from labeled agricultural pest samples to unlabeled ones. First, the initial nodes of the graph neural network are constructed:
vᵢ⁽⁰⁾ = (f_φ(xᵢ); h(lᵢ))
where f_φ(xᵢ) is the vector representation of the agricultural pest sample in the feature space and h(lᵢ) is the one-hot vector of its label; the two are concatenated as the initial nodes of the graph.
Next, the adjacency matrix of the graph is obtained:
Ãᵢ,ⱼ = MLP_θ̃(|vᵢ − vⱼ|)
that is, the absolute difference between nodes i and j is passed through a multilayer perceptron to give the adjacency matrix of the graph, where MLP is the multilayer perceptron and θ̃ are its learnable parameters.
Finally, the overall graph model is obtained and message propagation is performed:
v⁽ˡ⁺¹⁾ = ρ( Σ_{B∈𝓑} B v⁽ˡ⁾ θ_B⁽ˡ⁾ )
where 𝓑 is the set of adjacency matrices and θ_B⁽ˡ⁾ are trainable parameters; each layer takes the current node vectors and their adjacency matrices and updates them into the node vectors of the next layer, and ρ(·) is an activation function introduced to improve generalization ability. In this way, label information is propagated to the unlabeled agricultural pest samples over multiple iterations.
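A hedged sketch of one such graph layer, combining the learned adjacency with one step of message propagation; the hidden width and the softmax row-normalization of the adjacency are illustrative assumptions.

```python
# Hedged PyTorch sketch of one graph layer: learned adjacency + message passing.
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, dim_in: int, dim_out: int, hidden: int = 64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(dim_in, hidden), nn.ReLU(),
                                      nn.Linear(hidden, 1))        # MLP over |v_i - v_j|
        self.theta = nn.Linear(dim_in, dim_out)                    # trainable theta

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        diff = (v.unsqueeze(1) - v.unsqueeze(0)).abs()             # (n, n, d)
        a = torch.softmax(self.edge_mlp(diff).squeeze(-1), dim=1)  # adjacency A_ij
        return torch.relu(self.theta(a @ v))                       # rho(A v theta)
```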
Example 2
The embodiment provides the two-stage training steps that fuse the different feature enhancement modules: a dual attention mechanism is introduced in the first stage to enhance the key feature information of each agricultural pest class during feature extraction, and a Gaussian-distribution-based feature generation module is adopted in the second stage, generating related samples from the abundant base class agricultural pest samples to correct the biased distribution caused by the scarcity of test samples. The overall network architecture is shown in fig. 2.
In the first stage (training stage), a strong feature extractor (e.g. ResNet) that can accurately extract the key information of different pest categories is trained based on a traditional small sample learning method; spatial attention and channel attention mechanisms are introduced at this stage so that the feature extraction network can recognize slight differences between categories under various backgrounds and focus on the unique feature information of each pest category.
In the second stage (verification stage), since the new agricultural pest classes have only a limited number of labeled samples, direct classification with conventional methods is likely to overfit. The features are therefore assumed to follow a Gaussian distribution: the class information of each class is directly related to its mean, which represents the class prototype, and its variance, which represents the intra-class variation, so the distribution of a new class can be estimated more accurately by means of the base class distributions. Accordingly, in the second stage the mean and variance of the base classes in the feature space are computed with the stage-one feature extraction network, a large number of labeled samples are generated from the base classes most similar to the new class following the idea of the nearest neighbor algorithm, and finally a better classifier is trained on the generated samples combined with the new-class support set.
When spatial attention and channel attention are combined in the feature extraction network, the attention mechanism lets the network focus on key regions; for example, the key difference between a long-horned grasshopper and a common grasshopper is the ratio of antenna length to body length. There are generally two forms: the spatial attention mechanism attends to difference information in the image dimensions, while the channel attention mechanism attends to difference information in the channel dimension.
For agricultural pests, the image dimensions carry information such as the size, angle and position of different pests, while the channel dimension carries information such as color and texture; both are very important for distinguishing pests, so the feature map generated at each stage is enhanced by combining the spatial and channel attention mechanisms. Whether ConvNet or ResNet is used as the feature extraction network, the dual attention mechanism is introduced at the skip-connection stage to balance accuracy and efficiency; implementation details are shown in the dual attention flow chart of FIG. 3.
The fusion steps of the dual attention mechanism are as follows:
Step 1: channel attention processing. As the agricultural pest image is mapped into the feature space, the image information is continuously compressed by multiple convolution layers, and different channels of each layer attend to different information; the channel attention mechanism answers which channel information the network should attend to. One channel attention mechanism comprises the following two parts:
ω₁ = σ(MLP(α(F)) + MLP(δ(F)))
where F ∈ R^{C×H×W} is the feature map of the selected intermediate layer, and C, H and W denote its depth, height and width, respectively; α and δ denote average-pooling downsampling and max-pooling downsampling, after which the feature map has size C × 1 × 1. The shared multilayer perceptron can be written as:
MLP(·) = W₁ ReLU(W₀(·))
where W₀ and W₁ are shared convolution layers used to change the number of channels of the feature map.
The complete channel attention operation is:
ω₁ = σ(W₁ ReLU(W₀ α(F)) + W₁ ReLU(W₀ δ(F)))
where ReLU(·) and σ(·) denote the ReLU and sigmoid activation functions, respectively.
Step 2: the spatial attention mechanism answers which key regions the network should attend to. Similar to channel attention, it fuses the results of average-pooling downsampling and max-pooling downsampling and then applies a convolution layer to obtain the spatial weight features, formally expressed as:
ω₂ = σ(W^{7×7}([α(F); δ(F)]))
Step 3: backbone network information fusion. The output of each stage passes through the dual attention mechanism: the channel attention mechanism converts the original feature map into a C × 1 × 1 vector, which is multiplied with the feature map to learn the channel weight information; the spatial attention mechanism converts it into a 1 × H × W feature map to learn the weight of each spatial location.
The dual attention mechanism can be expressed as the following process:
F′ = ω₁ ⊗ F
F″ = ω₂ ⊗ F′
where ω₁ and ω₂ denote the channel attention and spatial attention weights mentioned above, and ⊗ denotes element-wise multiplication. A complete dual attention mechanism takes the output of the channel attention as the input of the spatial attention, and the final attention result is obtained by this composition.
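To show where the module sits, here is a hedged sketch of a residual block with the dual attention inserted at the skip-connection stage; it reuses the DualAttention module sketched earlier, and applying the attention on the residual branch before the skip addition is an illustrative choice, not a detail fixed by the patent.

```python
# Hedged sketch of inserting the dual attention at a ResNet skip connection.
# Reuses the DualAttention module from the earlier sketch; names are illustrative.
import torch.nn as nn

class AttnBasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels))
        self.attn = DualAttention(channels)   # defined in the earlier sketch
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Attention is applied to the residual branch before the skip addition
        return self.relu(x + self.attn(self.body(x)))
```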
Example 3
The embodiment provides the detailed steps of the Gaussian-distribution-based feature generation module. Each class of agricultural pests has only one guide picture in the testing stage; the biased distribution formed by such insufficient samples easily overfits the trained model, so the abundant samples in the base class data must be used sensibly to prevent overfitting. Although new agricultural pests are unseen classes, they are still insect species, and their image features are highly similar to the base class images, so the biased distribution can be calibrated with the mean and variance of similar samples in the training set to prevent overfitting. The specific steps are as follows:
Step 1: computing the feature information of the base class agricultural pests. The original images are mapped into the feature space with the stage-one feature extraction network, which effectively reduces the data dimensionality and extracts features rapidly. Because data samples are abundant in the base class agricultural pests, the statistics of each pest category in the feature space are easily obtained; the mean μᵢ and variance Σᵢ of the i-th class are:
μᵢ = (1/nᵢ) Σⱼ xⱼ
Σᵢ = (1/(nᵢ − 1)) Σⱼ (xⱼ − μᵢ)(xⱼ − μᵢ)ᵀ
where i indexes the sample classes in the support set and j indexes the samples within the corresponding class; nᵢ is the total number of samples in base class i, and xⱼ is the feature vector of the j-th sample in base class i.
Step 2: generating the most relevant class features. The premise of feature migration and calibration is that the agricultural pest samples are highly similar, so that the difference between means and variances in the feature space is small; the base class agricultural pest data cover many classes and cannot all be used for sample generation, so the pest class information most closely related to the new class must be found:
N_d = { ‖μᵢ − x̃‖² | i = 1, …, n_base }
where x̃ is the feature-space vector of the labeled sample of the new class and N_d is the set of Euclidean distances between x̃ and all base classes; the t elements with the smallest distance are selected as the supplementary class set N_t:
N_t = { i | ‖μᵢ − x̃‖² ∈ top-t(N_d) }
The mean and variance most closely related to the new class can be generated from the elements of the supplementary class set:
μ′ = (Σ_{i∈N_t} μᵢ + x̃) / (t + 1),  Σ′ = (1/t) Σ_{i∈N_t} Σᵢ
Step 3: training the linear classifier on the generated feature samples. For each test task, the mean and variance are computed as above and samples are generated to correct the distribution of the agricultural pest samples in the support set; the generated feature set is called F_y,
F_y = {(μ′₁, Σ′₁), …, (μ′ₖ, Σ′ₖ)}, where μ′ᵢ, Σ′ᵢ are the mean and variance of the calibrated distribution generated from the feature vectors of the class-i images in the support set.
With the class mean and variance known, supplementary samples are generated by random sampling; let the generated sample set be G_y:
G_y = { (x, y) | x ∼ N(μ′_y, Σ′_y) }
where (x, y) is the supplementary sample generated for the i-th class. The generated agricultural pest samples are mixed with the original agricultural pest samples as enhanced samples, and a linear classifier is trained by minimizing the cross-entropy loss:
ℓ = E_{(x,y)} [ −log Pr(y | x; θ) ]
where y ranges over the classes of each training task and θ are the trainable parameters; after training, the trained linear classifier predicts the query set of the new agricultural pest classes and the accuracy is computed.
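A hedged sketch of Step 3, sampling from the calibrated Gaussians and fitting a logistic-regression classifier (which minimizes cross-entropy); it builds on the calibrate() helper sketched earlier, and the per-class sample count is an illustrative assumption.

```python
# Hedged sketch of Step 3: sample from the calibrated Gaussians and train a
# linear classifier; builds on calibrate() from the earlier sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_calibrated_classifier(support, stats, per_class: int = 750, t: int = 2):
    """support: list of (feature_vector, label) pairs, one per new class."""
    xs, ys = [], []
    for x_tilde, y in support:
        mu_p, sigma_p = calibrate(x_tilde, stats, t)
        gen = np.random.multivariate_normal(mu_p, sigma_p, size=per_class)
        xs.append(np.vstack([gen, x_tilde[None]]))    # generated + original sample
        ys.extend([y] * (per_class + 1))
    clf = LogisticRegression(max_iter=1000)           # minimizes cross-entropy
    clf.fit(np.vstack(xs), np.array(ys))
    return clf
```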
Example 4
This example provides comparative experiments between the small sample training method of the invention and conventional training methods. The experiments use the IP102 large-scale agricultural pest data set, which contains more than 75000 images covering 102 classes of common pests found in the wild, with 737 samples per class on average; the data set is unevenly distributed and exhibits a long-tail effect.
For the traditional image recognition task, a typical partitioning of the original data set is adopted, i.e. training set : test set : validation set = 6 : 2 : 2; for the small sample image recognition task, to prevent class overlap between the validation set and the training set, the program randomly selects sixty classes as the training set, and the validation and test sets each take 21 of the remaining classes.
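A minimal sketch of the disjoint 60/21/21 class split described above; the function name and seeding are illustrative.

```python
# Hedged sketch of the disjoint class split used for the few-shot task (60/21/21).
import random

def split_classes(all_classes: list[int], seed: int = 0):
    rng = random.Random(seed)
    shuffled = rng.sample(all_classes, len(all_classes))
    return shuffled[:60], shuffled[60:81], shuffled[81:102]  # train / val / test classes
```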
The conventional methods perform as follows:
Traditional image training methods mainly comprise hand-crafted feature extraction and deep learning feature extraction. The former mainly includes methods such as SIFT and HOG, which perform well on low-level semantic features such as edges, colors and textures, while the latter is superior on high-level semantic information. SIFT and HOG are adopted for feature extraction, with SVM and KNN as classifiers for evaluating the results; the experimental results are shown in Table 1.
TABLE 1 conventional image classification results
(Table 1 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that deep learning feature extraction is generally better than traditional hand-crafted feature extraction; the best result uses ResNet as the backbone with KNN for classifying agricultural pests, yet the average accuracy is only 43.7%, which indirectly reflects the recognition difficulty of the IP102 data set.
The comparative experiment results are as follows:
for the mainstream and better effect three methods respectively use 5-way1-shot as comparison experiments, a classifier adopts logistic regression, each test is performed on the same test set for 10000 times in a circulating experiment, average accuracy is taken as a final result, because the original paper proposes that Conv4 is mostly used as a feature extraction network, but recent research finds that Resnet10 can give consideration to speed and precision better, in order to verify the effectiveness of the method, the comparison experiments are performed by respectively combining the two feature extraction networks, and the experiment results are shown in the following table 2:
table 2 shows the method for improving the effect
(Table 2 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that the proposed method is clearly effective: across the different methods the recognition accuracy is enhanced by at most 6.87% and at least 2.11%. The best-performing GNN network combined with the proposed feature enhancement modules reaches a classification accuracy of 46.06% on the IP102 data set.
The traditional methods predict over all 102 categories using the full data set, while the single-sample method draws only one picture per class over 5 categories, so the recognition accuracies are not directly comparable numerically. For practical application, however, the single-sample agricultural pest method is clearly more valuable: for unknown categories it is not necessary to recognize all pest categories, and recognition from only a single expert-labeled sample is more meaningful.
Performance on other datasets:
the main public data set of the current small sample learning training method is a mini-ImageNet data set, the data set is composed of 60000 images extracted from a large agricultural pest image classification data set ImageNet, 100 classes are classified in total, each class comprises 600 natural images with the size of 84 × 84, the data set is divided according to a mainstream method, and the training set, the verification set and the test set respectively account for 64,16 and 20 classes. The best performance of ProtopicalNet and GnNet under the conductive and Inductive methods is selected for experiments, and the improvement results are shown in Table 3:
TABLE 3 Mini-Imagenet test results
(Table 3 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that although ProtoNet and GNN already perform well relative to other methods, fusing the proposed feature enhancement modules still yields a good further gain, indicating that the method has a certain universality.
Ablation experiment:
to verify the two modules proposed: under the actions of the dual attention mechanism module and the feature generation module, the GNN model which best performs on the two data sets is used as a basic framework, and an ablation experiment is performed on the two data sets by using Resnet10 as a backbone, and the experimental results are shown in table 4:
TABLE 4 ablation experiment
(Table 4 is rendered as an image in the original publication; its values are not reproduced here.)
The experimental results show that the attention mechanism alone brings only a modest improvement on the two data sets, 1.04% and 0.55% respectively; adding the feature generation module brings an obvious improvement; and applying both together is markedly effective, improving accuracy by 5.87% and 3.95% respectively, which also demonstrates the theoretical effectiveness of the method.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. They are not exhaustive and do not limit the invention to the precise forms disclosed; obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A deep learning training method for improving the recognition accuracy of small samples and small targets, characterized in that the traditional small sample learning training method is divided into two stages: in the training stage, a dual attention mechanism is introduced to strengthen the discriminative information features of different small samples and small targets; in the verification stage, a Gaussian-distribution-based feature generation module generates related samples by means of the base classes.
2. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 1, characterized in that the dual attention mechanism is fused from a spatial attention mechanism and a channel attention mechanism, and is integrated into the feature extractor, trained with the traditional small sample learning training method, that extracts key information, so as to enhance the key information features.
3. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 1, characterized in that in the verification stage the key information features of the small samples and small targets are assumed to be Gaussian distributed; the mean and variance of the base class small-sample and small-target classes in the feature space are computed with the training-stage key information feature extraction network, labeled samples are generated based on the nearest neighbor algorithm and the features of the classes most similar to the new small-sample and small-target classes, and a classifier is then trained on the generated labeled samples combined with the new-class small-sample and small-target support set.
4. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 2, characterized in that the steps for fusing the channel attention mechanism and the spatial attention mechanism into the backbone network are as follows:
Step 1: the channel attention mechanism is obtained by processing the new small-sample and small-target feature data with average-pooling downsampling and max-pooling downsampling; its expression is:
ω₁ = σ(MLP(α(F)) + MLP(δ(F)))
where F ∈ R^{C×H×W} is the feature map of the selected intermediate layer, and C, H and W denote its depth, height and width, respectively; α and δ denote average-pooling downsampling and max-pooling downsampling, after which the feature map has size C × 1 × 1. The shared multilayer perceptron can be written as:
MLP(·) = W₁ ReLU(W₀(·))
where W₀ and W₁ are shared convolution layers used to change the number of channels of the feature map;
the complete channel attention operation is:
ω₁ = σ(W₁ ReLU(W₀ α(F)) + W₁ ReLU(W₀ δ(F)))
where ReLU(·) and σ(·) denote the ReLU and sigmoid activation functions, respectively;
Step 2: the spatial attention mechanism fuses the results of average-pooling downsampling and max-pooling downsampling and then applies a convolution layer to obtain the spatial weight features, formalized as:
ω₂ = σ(W^{7×7}([α(F); δ(F)]))
Step 3: the channel attention of Step 1 and the spatial attention of Step 2 are fused into a dual attention mechanism over the backbone network; the fusion process is:
F′ = ω₁ ⊗ F
F″ = ω₂ ⊗ F′
where ω₁ and ω₂ denote the channel attention and spatial attention weights mentioned above, and ⊗ denotes element-wise multiplication.
5. The deep learning training method for improving the recognition accuracy of small samples and small targets according to claim 3, characterized in that the detailed steps of the verification stage are as follows:
Step 1: compute base class feature information, where the mean μᵢ and variance Σᵢ of the i-th class are:
μᵢ = (1/nᵢ) Σⱼ xⱼ
Σᵢ = (1/(nᵢ − 1)) Σⱼ (xⱼ − μᵢ)(xⱼ − μᵢ)ᵀ
where i indexes the sample classes in the support set and j indexes the samples within the corresponding class; nᵢ is the total number of samples in base class i, and xⱼ is the feature vector of the j-th sample in base class i;
Step 2: generate the most relevant class features. Euclidean distance is used as the affinity measure to find the most similar base classes, and the most relevant class feature information is generated from their means and variances:
N_d = { ‖μᵢ − x̃‖² | i = 1, …, n_base }
where x̃ is the feature-space vector of the labeled sample of the new class and N_d is the set of Euclidean distances between x̃ and all base classes; the t elements with the smallest distance are selected as the supplementary class set N_t:
N_t = { i | ‖μᵢ − x̃‖² ∈ top-t(N_d) }
The mean and variance closely related to the new class can be generated from the elements of the supplementary class set:
μ′ = (Σ_{i∈N_t} μᵢ + x̃) / (t + 1),  Σ′ = (1/t) Σ_{i∈N_t} Σᵢ
Step 3: train a linear classifier on the generated feature samples. Denote the generated feature set F_y = {(μ′₁, Σ′₁), …, (μ′ₖ, Σ′ₖ)}, where μ′ᵢ, Σ′ᵢ are the mean and variance of the calibrated distribution generated from the feature vector of the class-i images in the support set; the known class mean and variance are randomly sampled to generate a supplementary sample set G_y:
G_y = { (x, y) | x ∼ N(μ′_y, Σ′_y) }
where (x, y) is a supplementary sample generated for the i-th class; the generated samples are mixed with the original samples as enhanced samples, and a linear classifier is trained by minimizing the cross-entropy loss:
ℓ = E_{(x,y)} [ −log Pr(y | x; θ) ]
where y ranges over the classes of each training task and θ are the trainable parameters; the query set of the new class is then predicted by the linear classifier and the accuracy is computed.
CN202211357500.1A 2022-11-01 2022-11-01 Deep learning training method for improving recognition accuracy of small sample and small target Pending CN115690541A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211357500.1A CN115690541A (en) 2022-11-01 2022-11-01 Deep learning training method for improving recognition accuracy of small sample and small target


Publications (1)

Publication Number Publication Date
CN115690541A 2023-02-03

Family

ID=85048324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211357500.1A Pending CN115690541A (en) 2022-11-01 2022-11-01 Deep learning training method for improving recognition accuracy of small sample and small target

Country Status (1)

Country Link
CN (1) CN115690541A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310894A (en) * 2023-02-22 2023-06-23 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN116310894B (en) * 2023-02-22 2024-04-16 中交第二公路勘察设计研究院有限公司 Unmanned aerial vehicle remote sensing-based intelligent recognition method for small-sample and small-target Tibetan antelope
CN115861847A (en) * 2023-02-24 2023-03-28 耕宇牧星(北京)空间科技有限公司 Intelligent auxiliary marking method for visible light remote sensing image target
CN115861847B (en) * 2023-02-24 2023-05-05 耕宇牧星(北京)空间科技有限公司 Intelligent auxiliary labeling method for visible light remote sensing image target
CN116432089A (en) * 2023-05-15 2023-07-14 厦门星拉科技有限公司 Electric power internet of things inspection system and method
CN117151342A (en) * 2023-10-24 2023-12-01 广东省农业科学院植物保护研究所 Litchi insect pest identification and resistance detection method, litchi insect pest identification and resistance detection system and storage medium
CN117151342B (en) * 2023-10-24 2024-01-26 广东省农业科学院植物保护研究所 Litchi insect pest identification and resistance detection method, litchi insect pest identification and resistance detection system and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination