CN116665039A

CN116665039A - Small sample target identification method based on two-stage causal intervention

Info

Publication number: CN116665039A
Application number: CN202310423390.2A
Authority: CN
Inventors: 耿杰; 马伟晨; 蒋雯
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2023-04-19
Filing date: 2023-04-19
Publication date: 2023-08-29

Abstract

The application discloses a small sample target identification method based on two-stage causal intervention, comprising the following steps: step one, constructing a small sample target recognition deep network model and pre-training it with a causal intervention method; step two, extracting confounding factors and target features with the pre-trained target recognition deep network model; step three, performing causal intervention on the target features of the support set; step four, calculating and correcting the distance matrix and optimizing the network model; and step five, obtaining the final small sample target recognition network model. The application uses causal intervention to optimize the model pre-training process of transfer-based small sample target recognition, weakens the spurious association between confounding factors and target features, fully considers the alignment between confounding factors and target features under different distributions, and applies the counterfactual to the base-domain target features. This facilitates distance measurement between target features of different domains and improves the classification accuracy and robustness of target recognition under small sample conditions.

Description

Small sample target identification method based on two-stage causal intervention

Technical Field

The application belongs to the field of intelligent interpretation of remote sensing images, and particularly relates to a small sample target identification method based on two-stage causal intervention.

Background

With the rapid development of machine learning technology, convolutional neural networks (CNNs) have become an extremely popular and indispensable architecture for target classification. Their main advantage over traditional methods is that their data-driven mechanism avoids the complexity of hand-designing feature extractors. In addition, techniques such as local connectivity, weight sharing, pooling, and multi-layer stacking enable deeper networks and hierarchical feature representations, and have brought great progress in target feature extraction and classification.

However, training a deep CNN requires a large number of high-quality training samples to prevent the model from overfitting at inference time. In practice, collecting and labeling image data can be expensive and difficult, so researchers often cannot obtain large-scale training sets; in some cases only a few or a few dozen labeled samples are available. In this case, the performance of a deep CNN degrades significantly, which gives rise to the small sample target recognition problem.

Generally, small sample target recognition methods fall into two main categories: meta-learning methods and transfer learning methods. Meta-learning comprises two stages, meta-training and meta-testing, in which training samples are grouped into batches in the form of tasks, the model is trained on a support set, and inference is carried out on a query set. Meta-learning can be further divided into optimization-based and metric-based meta-learning. Optimization-based methods aim to obtain a model with strong generalization ability in the meta-training stage, fine-tune it with a small number of labeled support-set samples in the meta-testing stage, and then classify the unlabeled query-set samples. Metric-based methods map samples into a high-dimensional space and classify them by measuring the distance between sample features in that space, using either a common distance formula or a learnable distance metric; they achieve better target recognition results than optimization-based methods. However, in image target recognition, factors such as strong background scattering interference cause large feature variations among targets of the same class. In this case, features learned on the training set are hard to generalize to the test set, and feature vectors of the same class exhibit different structures in the feature space. Because metric-based methods depend heavily on the similarity between unlabeled and labeled samples, their performance degrades severely when the sample relationships are expressed inaccurately.

Some recent transfer-based small sample learning methods achieve more accurate classification than complex meta-learning algorithms. Transfer learning first performs multi-class pre-training on the base-class data and then fine-tunes on the scarce new-class data. Some of these methods add contrastive learning and self-supervised distillation in the pre-training stage to obtain a better pre-trained model, whose classification performance exceeds that of many metric-based meta-learning methods.

However, these pre-training methods do not consider which features the model should learn during pre-training and which it should not. As a result, when the new-domain distribution differs greatly from the base-domain distribution, the pre-trained model is affected by the spurious associations established by the base-domain confounding factors, and its recognition performance drops severely under the influence of the new-domain confounding factors. These methods do not adequately eliminate the spurious correlations introduced by confounding factors, so improving the robustness of small sample target recognition remains an important problem to be solved.

Disclosure of Invention

In view of the deficiencies of the prior art, the application provides a small sample target recognition method based on two-stage causal intervention. The method has a simple structure and a reasonable design. It uses causal intervention to optimize the model pre-training process of transfer-based small sample target recognition, weakens the spurious association between confounding factors and target features, fully considers the alignment between confounding factors and target features under different distributions, applies the counterfactual to the base-domain target features, facilitates distance measurement between target features of different domains, and improves the classification accuracy and robustness of target recognition under small sample conditions.

To solve the above technical problem, the application adopts the following technical scheme: a small sample target identification method based on two-stage causal intervention, comprising the following steps:

Step one, constructing an image target recognition deep network model, and pre-training the network model with a base-class data set:

Step 101, modifying the objective function of model training by using variational backdoor adjustment;

Step 102, acquiring the data input and modeling the confounding factor C in the data;

Step 103, approximating the background distribution by using the pseudo-variational posterior probability;

Step 104, training and optimizing the deep network model by using the post-intervention probability distribution [formula not reproduced], according to the data input and the confounding factor;

Step two, extracting the confounding factors and target features of the support set and the query set respectively by using the pre-trained target recognition deep network: the confounding factor feature expression of the support set and the query set is [formula not reproduced], the target feature expression is [formula not reproduced], and the target feature expression corrected by the confounding factor is [formula not reproduced];

Step three, performing causal intervention on the target features of the support set data:

Step 301, obtaining the confounding factors C_s and C_q of the support set and the query set from step two, and generating a new confounding factor C_q′ based on the support-set confounding factor;

Step 302, applying the counterfactual to the target features of the support set by using the new confounding factor C_q′, which can be expressed as [formula not reproduced], where c_i ~ C_q′;

Step four, calculating and correcting the distance matrix, and optimizing the network model after obtaining the classification result:

Step 401, calculating the N class prototypes after the counterfactual, [formula not reproduced], where M denotes the number of feature vectors in the i-th class; using the cosine distance [formula not reproduced] to measure the distance between the N class prototypes cls_i and the K query-set target features F, where <·,·> denotes the vector inner product, a distance matrix D_1 of dimension N × K is obtained;

Step 402, calculating the N class prototypes without the counterfactual, [formula not reproduced], where M denotes the number of feature vectors in the i-th class; using the cosine distance [formula not reproduced] to measure the distance between the N class prototypes cls_i and the K query-set target features F_c not corrected by the confounding factor, where <·,·> denotes the vector inner product, a distance matrix D_2 of dimension N × K is obtained;

Step 403, obtaining the final corrected distance matrix D_out according to the formula D_out = λD_1 + (1−λ)D_2, and then obtaining the target recognition result of the query set;

Step 404, calculating the cross-entropy loss according to the target recognition result, and updating and optimizing the deep network model parameters.

Step five, obtaining the final small sample target recognition network model.
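Steps 401 to 403 can be sketched in NumPy as follows. This is a minimal illustration and not the application's implementation: the prototype is assumed to be the mean of the M feature vectors of a class, the cosine distance is assumed to be one minus the cosine similarity, the query is assigned to the nearest prototype, and random arrays stand in for the features produced by the network. All function and variable names are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, n_classes):
    """Mean feature vector per class (assumed form of the prototype cls_i)."""
    return np.stack([features[labels == i].mean(axis=0) for i in range(n_classes)])

def cosine_distance_matrix(prototypes, queries, eps=1e-12):
    """N x K matrix of cosine distances, assumed here to be 1 - cosine similarity."""
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + eps)
    return 1.0 - p @ q.T

def fused_distance(d1, d2, lam=0.5):
    """D_out = lam * D_1 + (1 - lam) * D_2, as in step 403."""
    return lam * d1 + (1.0 - lam) * d2

rng = np.random.default_rng(0)
N, M, K, dim = 3, 5, 4, 8                    # classes, shots per class, queries, feature size
labels = np.repeat(np.arange(N), M)
support_cf = rng.normal(size=(N * M, dim))   # stand-in: support features after the counterfactual
support_raw = rng.normal(size=(N * M, dim))  # stand-in: support features without the counterfactual
query_f = rng.normal(size=(K, dim))          # stand-in: query features F
query_fc = rng.normal(size=(K, dim))         # stand-in: query features F_c

d1 = cosine_distance_matrix(class_prototypes(support_cf, labels, N), query_f)
d2 = cosine_distance_matrix(class_prototypes(support_raw, labels, N), query_fc)
d_out = fused_distance(d1, d2, lam=0.7)
pred = d_out.argmin(axis=0)                  # assumed decision rule: nearest prototype per query
```

The weight λ trades off the two views: λ = 1 uses only the counterfactual prototypes, and λ = 0 uses only the uncorrected ones.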

In the above small sample target identification method based on two-stage causal intervention, the specific method of step 101 includes:

Step 1011, constructing a causal graph: C→X, C→Y, X→Y, where C denotes the confounding factor, X denotes the input picture, and Y denotes the prediction result for the picture;

Step 1012, cutting off the relation between C and X through the do operator, and obtaining the post-intervention model output through the backdoor adjustment formula [formula not reproduced], where [formula not reproduced];

Step 1013, using variational inference, introducing a new distribution Q(C|X) as the probability estimate of the hidden variable given the picture input, and changing the objective function into the evidence lower bound of the model output, namely [formula not reproduced].
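For orientation, the textbook forms of the two quantities named in steps 1012 and 1013 are given below. They are an assumption, since the application's own formulas are not reproduced in this text and may differ in detail. The first line is the standard backdoor adjustment for a confounder C that influences both X and Y; the second is the evidence lower bound obtained by introducing Q(C|X) and applying Jensen's inequality, with P(C) denoting the prior over confounding factors.

```latex
P\bigl(Y \mid do(X)\bigr) = \sum_{c} P(Y \mid X, c)\, P(c)

\log P\bigl(Y \mid do(X)\bigr)
  \;\ge\; \mathbb{E}_{Q(C \mid X)}\bigl[\log P(Y \mid X, C)\bigr]
  \;-\; D_{KL}\bigl(Q(C \mid X)\,\|\,P(C)\bigr)
```

Maximizing the right-hand side trains the predictor while keeping Q(C|X) close to the prior, which is why an estimate of the prior is needed in step 103.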

In the above small sample target identification method based on two-stage causal intervention, the specific method of step 102 includes:

Step 1021, randomly initializing the feature mapping matrix W_i and the scaling matrix a_i;

Step 1022, calculating the association score between the input picture X and the confounding factor c_i: s_i = a_i · Tanh(W_i · X);

Step 1023, obtaining, according to the association score, [formula not reproduced].
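The score of step 1022 can be sketched in NumPy as follows. The input is assumed to be a flattened feature vector, a_i is treated as a vector so that s_i is a scalar, and the scores are normalized with a softmax to give a distribution over confounding factors. The softmax is an assumption, because the formula of step 1023 is not reproduced here; all names and sizes are hypothetical.

```python
import numpy as np

def association_scores(x, W, a):
    """s_i = a_i . tanh(W_i x) for each of the n confounding factors."""
    hidden = np.tanh(np.einsum("nhd,d->nh", W, x))  # shape (n, h)
    return np.einsum("nh,nh->n", a, hidden)          # shape (n,)

def softmax(s):
    """Numerically stable softmax over a 1-D score vector."""
    z = np.exp(s - s.max())
    return z / z.sum()

rng = np.random.default_rng(0)
n, h, d = 4, 16, 32                        # confounders, hidden size, input size (illustrative)
W = rng.normal(scale=0.1, size=(n, h, d))  # randomly initialized mapping matrices W_i (step 1021)
a = rng.normal(scale=0.1, size=(n, h))     # randomly initialized scaling vectors a_i (step 1021)
x = rng.normal(size=d)                     # stand-in for the input picture X

q = softmax(association_scores(x, W, a))   # assumed form of the result of step 1023
```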

In the above small sample target identification method based on two-stage causal intervention, the specific method of step 103 includes:

Step 1031, randomly generating R pseudo picture samples X′;

Step 1032, repeating step 1022 and step 1023; the approximate estimation formula of the background distribution is [formula not reproduced].
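One plausible reading of steps 1031 and 1032 is a Monte Carlo estimate: the distribution over confounding factors is averaged over R randomly generated pseudo samples. The sketch below follows that reading. It assumes that the "background distribution" is the prior over confounding factors, that the per-sample distribution is a softmax over the scores of step 1022, and that Gaussian noise can stand in for the pseudo pictures; none of this is confirmed by the text.

```python
import numpy as np

def posterior(x, W, a):
    """Softmax over scores s_i = a_i . tanh(W_i x); the softmax is an assumption."""
    scores = np.einsum("nh,nh->n", a, np.tanh(np.einsum("nhd,d->nh", W, x)))
    z = np.exp(scores - scores.max())
    return z / z.sum()

def approximate_prior(W, a, num_pseudo, dim, rng):
    """Average the per-sample distribution over randomly generated pseudo samples X'."""
    pseudo = rng.normal(size=(num_pseudo, dim))  # Gaussian noise stands in for pseudo pictures
    return np.mean([posterior(x, W, a) for x in pseudo], axis=0)

rng = np.random.default_rng(0)
n, h, d, R = 4, 16, 32, 100                # illustrative sizes; R pseudo samples
W = rng.normal(scale=0.1, size=(n, h, d))
a = rng.normal(scale=0.1, size=(n, h))
prior = approximate_prior(W, a, R, d, rng)
```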

In the above small sample target identification method based on two-stage causal intervention, the post-intervention model output expression in step 104 is [formula not reproduced], where φ_i denotes the encoder, θ_i denotes the parameters of the encoder, and c_i is obtained from Q(C|X) in step 102.

In the above small sample target identification method based on two-stage causal intervention, the specific method for generating the confounding factor in step 301 includes:

Step 3011, setting up a confounding factor generation network g(·) and initializing its parameters;

Step 3012, inputting the support-set confounding factor C_s to obtain the newly generated confounding factor C_q′ = g(C_s);

Step 3013, calculating the KL divergence D_KL{C_q || C_q′} between C_q and C_q′ to optimize the parameters of the confounding factor generation network.
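Steps 3011 to 3013 can be illustrated with a toy generator. The sketch assumes that C_s and C_q can be represented as discrete probability vectors and uses a hypothetical linear generator followed by a softmax in place of the network g(·), whose architecture the text does not specify. The update is plain gradient descent on D_KL(C_q || C_q′).

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D vector."""
    z = np.exp(v - v.max())
    return z / z.sum()

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete probability vectors."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def fit_generator(c_s, c_q, steps=200, lr=0.5, seed=0):
    """Fit a hypothetical linear generator g(c) = softmax(G c) by gradient descent."""
    rng = np.random.default_rng(seed)
    G = rng.normal(scale=0.1, size=(c_q.size, c_s.size))  # step 3011: initialize parameters
    losses = []
    for _ in range(steps):
        c_q_new = softmax(G @ c_s)                         # step 3012: C_q' = g(C_s)
        losses.append(kl_divergence(c_q, c_q_new))         # step 3013: D_KL(C_q || C_q')
        # Gradient of D_KL(c_q || softmax(z)) with respect to the logits z is softmax(z) - c_q.
        G -= lr * np.outer(c_q_new - c_q, c_s)
    return G, losses

rng = np.random.default_rng(1)
c_s = softmax(rng.normal(size=6))  # stand-in for the support-set confounding factor C_s
c_q = softmax(rng.normal(size=6))  # stand-in for the query-set confounding factor C_q
G, losses = fit_generator(c_s, c_q)
```

In this toy setting the loss decreases over the iterations, so the generated C_q′ moves toward C_q.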

Compared with the prior art, the application has the following advantages:

1. The application has a simple structure and a reasonable design, and is convenient to implement, use, and operate.

2. The application uses causal intervention to change the objective function of the pre-training stage, weakening the spurious association between confounding factors and target features and yielding an optimized target recognition deep network model. This provides an initial improvement in the model's robustness to small sample data sets under different distributions and lays a foundation for the counterfactual on the target features.

3. The application fully considers the alignment between confounding factors and target features under different distributions, generates the confounding factors of the new domain d_2 based on the confounding factors of the base domain d_1, and applies the counterfactual to the base-domain target features, which facilitates distance measurement between target features of different domains and improves the classification accuracy and robustness of target recognition under small sample conditions.

In conclusion, the method has a simple structure and a reasonable design. It optimizes the model pre-training process of transfer-based small sample target recognition, weakens the spurious association between confounding factors and target features, fully considers the alignment between confounding factors and target features under different distributions, applies the counterfactual to the base-domain target features, facilitates distance measurement between target features of different domains, and improves the classification accuracy and robustness of target recognition under small sample conditions. The technical scheme of the application is described in further detail below through the drawings and the embodiments.

Drawings

FIG. 1 is a flow chart of the method of the present application.

Detailed Description

The method of the present application will be described in further detail with reference to the accompanying drawings.

It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the application described herein may be capable of being practiced otherwise than as specifically illustrated and described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Spatially relative terms, such as "above … …," "above … …," "upper surface at … …," "above," and the like, may be used herein for ease of description to describe one device or feature's spatial location relative to another device or feature as illustrated in the figures. It will be understood that the spatially relative terms are intended to encompass different orientations in use or operation in addition to the orientation depicted in the figures. For example, if the device in the figures is turned over, elements described as "above" or "over" other devices or structures would then be oriented "below" or "beneath" the other devices or structures. Thus, the exemplary term "above … …" may include both orientations of "above … …" and "below … …". The device may also be positioned in other different ways (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein interpreted accordingly.

As shown in FIG. 1, the small sample target identification method based on two-stage causal intervention of the application comprises the following steps:

Step 102, acquiring the data input and modeling the confounding factor C in the data;

Step 402, calculating the N class prototypes without the counterfactual, [formula not reproduced], where M denotes the number of feature vectors in the i-th class; using the cosine distance [formula not reproduced] to measure the distance between the N class prototypes cls_i and the K query-set target features F_c not corrected by the confounding factor, where <·,·> denotes the vector inner product, a distance matrix D_2 of dimension N × K is obtained;

Step five, obtaining the final small sample target recognition network model.

Step 1023, obtaining, according to the association score, [formula not reproduced];

Step 1031, randomly generating R pseudo picture samples X′;

Step 1032, repeating step 1022 and step 1023; the approximate estimation formula of the background distribution is [formula not reproduced].

The foregoing is merely an embodiment of the present application, and the present application is not limited thereto, and any simple modification, variation and equivalent structural changes made to the foregoing embodiment according to the technical matter of the present application still fall within the scope of the technical solution of the present application.

Claims

1. A small sample target identification method based on two-stage causal intervention, characterized by comprising the following steps:

Step 102, acquiring the data input and modeling the confounding factor C in the data;

Step 302, applying the counterfactual to the target features of the support set by using the new confounding factor C_q′, which can be expressed as [formula not reproduced];

Step five, obtaining the final small sample target recognition network model.

2. The two-stage causal intervention based small sample target identification method of claim 1, wherein: the specific method of step 101 includes:

3. The two-stage causal intervention based small sample target identification method of claim 1, wherein: the specific method of step 102 includes:

Step 1023, obtaining, according to the association score, [formula not reproduced].

4. The two-stage causal intervention based small sample target identification method of claim 1, wherein: the specific method of step 103 includes:

Step 1031, randomly generating R pseudo picture samples X′;

Step 1032, repeating step 1022 and step 1023; the approximate estimation formula of the background distribution is [formula not reproduced].

5. The two-stage causal intervention based small sample target identification method of claim 1, wherein: the post-intervention model output expression in step 104 is [formula not reproduced], where φ_i denotes the encoder, θ_i denotes the parameters of the encoder, and c_i is obtained from Q(C|X) in step 102.

6. The two-stage causal intervention based small sample target identification method of claim 1, wherein: the specific method for generating the confounding factor in step 301 includes: