CN113112005A - Domain self-adaption method based on attention mechanism - Google Patents


Info

Publication number
CN113112005A
Authority
CN
China
Prior art keywords
domain
sample
attention
loss function
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110456916.8A
Other languages
Chinese (zh)
Inventor
何克磊
季雯
霍静
许悦
高阳
张峻峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110456916.8A priority Critical patent/CN113112005A/en
Publication of CN113112005A publication Critical patent/CN113112005A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a domain adaptation method based on an attention mechanism. Based on a first sample of a first source domain of a first domain and a second sample of a first target domain of the first domain of an unsupervised domain adaptation model, a converter of the model produces a third sample of a second source domain of a second domain and a fourth sample of a second target domain of the second domain. A conversion prediction result of the unsupervised domain adaptation model is then obtained through a neural network model and an attention obtaining mechanism, and a prediction loss is obtained through a loss function model. Attention-based cross-domain alignment is realized by minimizing an attention-based intra-domain consistency function, which improves the performance of the unsupervised domain adaptation model, the efficiency of the model, and the accuracy of cross-domain adaptation, and makes it possible to locate discriminative regions with high sensitivity to small variations in those regions.

Description

Domain self-adaption method based on attention mechanism
Technical Field
The invention relates to the field of deep neural networks, and in particular to a domain adaptation method based on an attention mechanism.
Background
Training deep neural networks usually requires manually labeled data sets as supervisory information. However, manually labeling data sets is time-consuming and laborious. One possible approach is to reuse already-labeled data from a related task domain. However, the data distributions of different task domains often differ greatly, and model performance degrades heavily when a model is migrated. Therefore, unsupervised domain adaptation methods have received much attention in recent years.
The classical unsupervised domain-adaptive learning scenario is defined as follows: when the source domain has a large number of labeled samples and the target domain has no or only a few labeled samples, we hope that a model trained on the source domain generalizes well to the target domain. This forms a transfer learning method in which the model is trained on information-rich source-domain samples and adapts well to the sample distribution of the target domain.
Current unsupervised domain adaptation methods can be briefly divided into two categories:
(1) Methods based on non-adversarial learning
Such methods typically use a metric to measure and minimize the difference between high-order features of the source and target domains. For example, using the Maximum Mean Discrepancy (MMD) as a criterion: after choosing a kernel function, both source-domain and target-domain features are mapped into a reproducing kernel Hilbert space, where their distribution distance is narrowed so as to extract domain-invariant features. For simple distributions this approach can align the two domains well, but for complex distributions it often performs poorly.
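As an illustration of this category (not part of the invention), a minimal PyTorch sketch of a biased MMD² estimate with a single RBF kernel might look as follows; the bandwidth and feature shapes are placeholders:

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased MMD^2 estimate between two feature batches using a
    single RBF (Gaussian) kernel with bandwidth sigma."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances, then the RBF kernel.
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    k_xx = kernel(x, x).mean()
    k_yy = kernel(y, y).mean()
    k_xy = kernel(x, y).mean()
    return k_xx + k_yy - 2 * k_xy

# Usage: minimizing this term pulls source and target features together.
src_feat = torch.randn(32, 256)   # source-domain features (illustrative)
tgt_feat = torch.randn(32, 256)   # target-domain features (illustrative)
loss_mmd = rbf_mmd2(src_feat, tgt_feat)
```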
(2) Methods based on adversarial learning
Such methods are typically used when the source and target domain distributions differ widely. Adversarial-learning-based approaches minimize the difference between the source and target domains by learning the domain distributions.
For example, Ganin et al. propose the Domain-Adversarial Neural Network (DANN), which contains not only a loss for label prediction but also a loss for domain classification, so that the network learns both the classification objective and domain information. The Maximum Classifier Discrepancy (MCD) method proposed by Saito et al. establishes two classifiers trained to classify source-domain samples; for target samples, the two classifiers must have different task-specific decision boundaries. Lee et al. propose Drop to Adapt (DTA), which uses adversarial dropout to learn robust, discriminative features by enforcing the cluster assumption.
However, the performance of these methods still leaves room for improvement, because they only align the two domain distributions directly and do not exploit the latent distribution information within each domain. Moreover, these methods use only the global features of the two domains during alignment; global feature maps contain a large amount of redundant information and are computationally inefficient. From this perspective, traditional consistency learning that aligns the two domain distributions based on global features is not efficient. For fine-grained classification tasks in particular, discriminative regions often decide the output prediction, while the remaining parts may convey redundant noise.
Disclosure of Invention
The invention aims to provide a domain adaptation method based on an attention mechanism that focuses more on discriminative regions and improves the performance of the domain adaptation model.
In order to solve this technical problem, the invention provides a domain adaptation method based on an attention mechanism, comprising the following steps:
S1, based on a first sample of a first source domain of a first domain and a second sample of a first target domain of the first domain of an unsupervised domain adaptation model, obtaining, through a converter of the unsupervised domain adaptation model, a third sample of a second source domain of a second domain and a fourth sample of a second target domain of the second domain;
S2, based on the first sample, the second sample, the third sample and the fourth sample, obtaining a conversion prediction result of the unsupervised domain adaptation model through a neural network model and an attention obtaining mechanism;
S3, based on the conversion prediction result, obtaining the prediction loss through a loss function model, and realizing attention-based cross-domain alignment by minimizing an attention-based intra-domain consistency function, thereby improving the performance of the unsupervised domain adaptation model.
Preferably, the loss function model includes an input loss function model, an intra-domain loss function model, and an inter-domain loss function model;
the input loss function model is used for obtaining input loss when the first sample and the second sample respectively pass through the converter;
the intra-domain loss function model is used for obtaining a first intra-domain loss between the first sample and the third sample after the attention obtaining mechanism and a second intra-domain loss between the second sample and the fourth sample;
the inter-domain loss function model is used to obtain a first inter-domain loss between the first sample and the third sample, and a second inter-domain loss between the second sample and the fourth sample.
Preferably, the first labels of the first and third samples are identical;
the second labels of the second and fourth samples are identical.
Preferably, the input loss function model includes, but is not limited to, a cross entropy loss function.
Preferably, the inter-domain loss function model comprises an output loss function model;
based on the consistency of output prediction results of the first target domain and the second target domain, an output loss function model is constructed for limiting the consistency of semantic information before and after conversion;
and constructing an inter-domain loss function model by adding a regularization term based on the output loss function model and the input loss function model.
Preferably, the input loss function model comprises a source domain input loss function model and a target domain input loss function model;
the source domain input loss function model is used for obtaining a first input loss of a first sample;
the target domain input loss function model is used to obtain a second input loss for the second sample.
Preferably, the method for constructing the intra-domain loss function model comprises the following steps:
S2.1, based on the first sample, the second sample, the third sample and the fourth sample, obtaining, through the neural network model, a first feature map corresponding to the first sample, a second feature map corresponding to the second sample, a third feature map corresponding to the third sample and a fourth feature map corresponding to the fourth sample;
S2.2, based on the attention obtaining mechanism, converting the first feature map, the second feature map, the third feature map and the fourth feature map into a corresponding first attention map, second attention map, third attention map and fourth attention map, respectively;
S2.3, constructing a first intra-domain loss function by setting a first discriminator based on the first attention map and the fourth attention map;
S2.4, constructing a second intra-domain loss function by setting a second discriminator based on the second attention map and the third attention map;
S2.5, constructing the attention-consistency-based intra-domain loss function model from the first intra-domain loss function and the second intra-domain loss function in an adversarial learning setting.
Preferably, the first intra-domain loss function is a first intra-domain attention consistency loss function based on the source domain;
the second intra-domain penalty function is a second intra-domain attention consistency penalty function based on the target domain.
Preferably, the expression of the attention-consistency-based intra-domain loss function model L_{intra} is:

L_{intra} = -E[ Y' · log D_{A_s}(A) + (1 - Y') · log(1 - D_{A_s}(A)) ] - E[ Y' · log D_{A_t}(A) + (1 - Y') · log(1 - D_{A_t}(A)) ]

where A denotes an attention map (A ∈ {A_s, A_{t→s}} for D_{A_s}, and A ∈ {A_t, A_{s→t}} for D_{A_t}), s denotes the source domain, t denotes the target domain, D_{A_s} denotes the first discriminator, D_{A_t} denotes the second discriminator, A_{t→s} is the third attention map, A_t denotes the second attention map, and Y' is a label indicating the domain of origin of the sample, with the same dimensions as the attention map A.
Preferably, the neural network model comprises at least a Resnet50 neural network model;
the attention computing mechanism includes at least a CAM attention computing mechanism.
The invention discloses the following technical effects:
compared with the existing method, the method adds the attention-based domain adaptation, improves the efficiency of the model and the accuracy of the cross-domain adaptation, can locate the distinguishing regions, has high sensitivity to the tiny changes of the regions, and gives less attention to the irrelevant regions, so the accuracy is higher.
The discriminator allows the network to pay more attention to areas of inconsistency between the two domains to eliminate subtle differences.
Based on the two points, the method is particularly suitable for tasks with high fine granularity, and the accuracy is greatly improved.
The invention can realize automatic batch labeling of the label-free data set by utilizing the existing data set, and does not depend on specific human priori knowledge, thereby having good generalization.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the invention discloses a domain adaptation method based on an attention mechanism, comprising the following steps:
S1, based on a first sample of a first source domain of a first domain and a second sample of a first target domain of the first domain of an unsupervised domain adaptation model, obtaining, through a converter of the unsupervised domain adaptation model, a third sample of a second source domain of a second domain and a fourth sample of a second target domain of the second domain;
S2, based on the first sample, the second sample, the third sample and the fourth sample, obtaining a conversion prediction result of the unsupervised domain adaptation model through a neural network model and an attention obtaining mechanism;
S3, based on the conversion prediction result, obtaining the prediction loss through a loss function model, and realizing attention-based cross-domain alignment by minimizing an attention-based intra-domain consistency function, thereby improving the performance of the unsupervised domain adaptation model.
The loss function model comprises an input loss function model, an intra-domain loss function model and an inter-domain loss function model; the input loss function model is used for obtaining input loss when the first sample and the second sample respectively pass through the converter; the intra-domain loss function model is used for obtaining a first intra-domain loss between the first sample and the third sample after the attention obtaining mechanism and a second intra-domain loss between the second sample and the fourth sample; the inter-domain loss function model is used to obtain a first inter-domain loss between the first sample and the third sample, and a second inter-domain loss between the second sample and the fourth sample.
The first label of the first and third samples is identical; the second label of the second and fourth samples is identical.
The input loss function model includes, but is not limited to, a cross entropy loss function.
The inter-domain loss function model comprises an output loss function model; based on the consistency of output prediction results of the first target domain and the second target domain, an output loss function model is constructed for limiting the consistency of semantic information before and after conversion; and constructing an inter-domain loss function model by adding a regularization term based on the output loss function model and the input loss function model.
The input loss function model comprises a source domain input loss function model and a target domain input loss function model; the source domain input loss function model is used for obtaining a first input loss of a first sample; the target domain input loss function model is used to obtain a second input loss for the second sample.
The construction method of the intra-domain loss function model comprises the following steps:
S2.1, based on the first sample, the second sample, the third sample and the fourth sample, obtaining, through the neural network model, a first feature map corresponding to the first sample, a second feature map corresponding to the second sample, a third feature map corresponding to the third sample and a fourth feature map corresponding to the fourth sample;
S2.2, based on the attention obtaining mechanism, converting the first feature map, the second feature map, the third feature map and the fourth feature map into a corresponding first attention map, second attention map, third attention map and fourth attention map, respectively;
S2.3, constructing a first intra-domain loss function by setting a first discriminator based on the first attention map and the fourth attention map;
S2.4, constructing a second intra-domain loss function by setting a second discriminator based on the second attention map and the third attention map;
S2.5, constructing the attention-consistency-based intra-domain loss function model from the first intra-domain loss function and the second intra-domain loss function in an adversarial learning setting.
The first intra-domain loss function is a first intra-domain attention consistency loss function based on the source domain; the second intra-domain penalty function is a second intra-domain attention consistency penalty function based on the target domain.
The expression of the attention-consistency-based intra-domain loss function model L_{intra} is:

L_{intra} = -E[ Y' · log D_{A_s}(A) + (1 - Y') · log(1 - D_{A_s}(A)) ] - E[ Y' · log D_{A_t}(A) + (1 - Y') · log(1 - D_{A_t}(A)) ]

where A denotes an attention map (A ∈ {A_s, A_{t→s}} for D_{A_s}, and A ∈ {A_t, A_{s→t}} for D_{A_t}), s denotes the source domain, t denotes the target domain, D_{A_s} denotes the first discriminator, D_{A_t} denotes the second discriminator, A_{t→s} is the third attention map, A_t denotes the second attention map, and Y' is a label indicating the domain of origin of the sample, with the same dimensions as the attention map A.
The neural network model comprises at least a Resnet50 neural network model;
the attention computing mechanism includes at least a CAM attention computing mechanism.
Example 1: attention, feature and output based domain adaptation
1. Let the source domain be S and the target domain be T. Taking a single set of sample inputs as an example, randomly extract one sample from each of S and T, denoted X_s and X_t, as input. A converter is used to generate a sample of the other domain from each, denoted X_{s→t} and X_{t→s} respectively. The converter comprises a generator and a discriminator, with losses L_G and L_D.
The converted sample keeps the label of the original sample, because the converter only changes style-like knowledge of the sample while essentially preserving semantic consistency; that is, the labels of X_s and X_{s→t} are consistent, and the labels of X_t and X_{t→s} are consistent.
2. The samples are processed and input into a neural network such as Resnet50 to obtain prediction results.
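As a sketch of such a backbone (an illustrative assumption; the patent only names Resnet50 as one possible network), a torchvision ResNet-50 with its final fully connected layer resized to the task's classes might look as follows:

```python
import torch
import torchvision

# Backbone: ResNet-50 with its FC head replaced for the task at hand.
backbone = torchvision.models.resnet50(weights=None)
num_classes = 31  # hypothetical; depends on the data set
backbone.fc = torch.nn.Linear(backbone.fc.in_features, num_classes)

x = torch.randn(4, 3, 224, 224)   # a batch of original or converted samples
logits = backbone(x)              # prediction results used by the losses below
```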
3. Compute the prediction losses for inputs X_s and X_{s→t} using a loss function; taking cross-entropy loss (CE) as an example:

L_{CE}^{s} = E_{(x_s, y_s) ~ S} [ CE(C(x_s), y_s) ]

L_{CE}^{s→t} = E_{(x_{s→t}, y_s) ~ T'} [ CE(C(x_{s→t}), y_s) ]

where C is the classifier in the neural network, and S' and T' are the distributions of the images generated by the conversion (X_{t→s} ~ S', X_{s→t} ~ T').
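A minimal PyTorch sketch of step 3, assuming a classifier that returns logits; the converted batch X_{s→t} reuses the source labels y_s, as stated in step 1:

```python
import torch.nn.functional as F

def prediction_losses(classifier, x_s, x_s2t, y_s):
    """Cross-entropy prediction losses for a source batch X_s and its
    converted counterpart X_{s->t}; both reuse the source labels y_s."""
    loss_s = F.cross_entropy(classifier(x_s), y_s)
    loss_s2t = F.cross_entropy(classifier(x_s2t), y_s)
    return loss_s, loss_s2t
```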
4. Based on step 1, the output predictions for X_{t→s} and X_t should be consistent, so this embodiment proposes an output consistency loss to limit semantic consistency before and after conversion, written, for example, as a squared difference between the two predictions:

L_{out} = E_{x_t ~ T} [ || C(x_t) - C(x_{t→s}) ||^2 ]

5. Inter-domain consistency mainly comprises label-prediction consistency for cross-domain samples; adding λ as a regularization weight, the inter-domain consistency loss function is:

L_{inter} = L_{CE}^{s} + L_{CE}^{s→t} + λ · L_{out}
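A sketch of steps 4-5, reusing prediction_losses from the sketch above; writing the output consistency term as an MSE between softmax predictions is an assumption, since the embodiment does not fix the distance:

```python
import torch.nn.functional as F

def inter_domain_loss(classifier, x_s, x_s2t, y_s, x_t, x_t2s, lam=0.1):
    """L_inter = L_CE^s + L_CE^{s->t} + lambda * L_out, with the output
    consistency term taken as an MSE between softmax predictions."""
    loss_s, loss_s2t = prediction_losses(classifier, x_s, x_s2t, y_s)
    # Output consistency between a target sample and its source-style version.
    p_t = F.softmax(classifier(x_t), dim=1)
    p_t2s = F.softmax(classifier(x_t2s), dim=1)
    loss_out = F.mse_loss(p_t, p_t2s)
    return loss_s + loss_s2t + lam * loss_out
```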
6. The inter-domain consistency loss function is used for parameter updating and model optimization during training; in this process the source domain and the target domain generally share the parameters of the neural network.
7. During training, the inter-domain loss function is minimized so as to reduce the feature distribution difference between the source and target domains, thereby constraining the semantic and style consistency of source- and target-domain samples and realizing output-based domain adaptation.
8. For fine-grained tasks, output-only domain adaptation lacks sensitivity to fine regions.
9. Methods using only output consistency lack constraints on samples within each domain, i.e., between X_s and X_{t→s}, and between X_t and X_{s→t}.
Based on points 8 and 9, the invention combines feature-based and attention-based domain adaptation to improve the accuracy of classification tasks. The specific method is as follows:
10. The converted samples and the original samples are passed through a feature extractor (which can be realized by a neural network model such as Resnet50) to obtain feature maps F_s, F_{t→s}, F_t, F_{s→t}.
11. An adversarial learning strategy is used so that the feature distributions of the source domain and the target domain are aligned.
12. To realize the feature-based domain adaptation, two discriminators D_{F_s} and D_{F_t} are provided to discriminate the domain distribution of a feature map (i.e., the domain of the sample from which the feature map was produced).
The specific implementation is as follows:
13. In the adversarial learning setting, the discriminator loss is a binary cross-entropy over the feature maps:

L_{D_F}(F, Y) = -E[ Y · log D_F(F) + (1 - Y) · log(1 - D_F(F)) ]

where Y is a label indicating the domain of origin of the sample, with the same dimensions as the feature map F.
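A sketch of this discriminator loss, assuming a discriminator that outputs per-location probabilities in (0, 1) and a constant domain-origin label broadcast to the map's shape:

```python
import torch
import torch.nn.functional as F

def domain_bce(disc, fmap, is_source):
    """Binary cross-entropy between the discriminator's per-location
    domain prediction for a map and a constant domain-origin label
    (1 = genuine domain sample, 0 = converted sample)."""
    pred = disc(fmap)  # expected in (0, 1), e.g. (N, 1, H, W) after a sigmoid
    target = torch.full_like(pred, 1.0 if is_source else 0.0)
    return F.binary_cross_entropy(pred, target)
```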
14. Since the two pairs of cross-domain features are jointly optimized, the resulting consistency loss is the accumulation of the errors contributed by the two feature pairs.
15. In the discrimination process, the source-domain discriminator attempts to align the feature distributions of F_s and F_{t→s}, so the loss function of the source domain is defined as:

L_{F_s} = L_{D_{F_s}}(F_s, 1) + L_{D_{F_s}}(F_{t→s}, 0)

Similarly, the feature alignment loss of the target domain may be defined as:

L_{F_t} = L_{D_{F_t}}(F_t, 1) + L_{D_{F_t}}(F_{s→t}, 0)
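Using the domain_bce helper sketched above, the two alignment terms of step 15 can be written as:

```python
def feature_alignment_losses(d_fs, d_ft, f_s, f_t2s, f_t, f_s2t):
    """L_{F_s}: D_{F_s} separates F_s from F_{t->s};
    L_{F_t}: D_{F_t} separates F_t from F_{s->t}."""
    loss_fs = domain_bce(d_fs, f_s, True) + domain_bce(d_fs, f_t2s, False)
    loss_ft = domain_bce(d_ft, f_t, True) + domain_bce(d_ft, f_s2t, False)
    return loss_fs, loss_ft
```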
16. Since the feature map F conveys global information about the sample, it contains some redundancy. Giving more attention to discriminative features makes tasks such as classification prediction simpler and more accurate.
17. In particular, for fine-grained tasks, even a small change between the source domain and the target domain may strongly affect the model prediction.
18. Based on points 16 and 17, the invention adds attention alignment between source-domain samples and target-domain samples converted to the source domain, i.e., it aligns the attention maps of X_s and X_{t→s}.
19. The feature maps F_s, F_{t→s}, F_t, F_{s→t} obtained by passing the converted and original samples through the feature extractor (realizable with a neural network model such as Resnet50) are further passed through an attention computing mechanism such as CAM to obtain attention maps A_s, A_{t→s}, A_t, A_{s→t}.
Taking an original source-domain sample and the CAM attention computing mechanism as an example, the attention map is

A_s = Σ_j w_j^i · F'_{s,j}

where F'_{s,j} is the j-th channel of the output feature map F'_s (F'_s is the pre-GAP feature), and w_j^i is the j-th weight of the fully connected layer that predicts class i.
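A sketch of this CAM computation for a batch; the [0, 1] normalization at the end is a common practice and an assumption, not something the text specifies:

```python
import torch

def cam(pre_gap_feat, fc_weight, class_idx):
    """Class activation map: weight the channels F'_{s,j} of the pre-GAP
    feature by the FC weights w_j^i of class i, then normalize to [0, 1]."""
    # pre_gap_feat: (N, C, H, W); fc_weight: (num_classes, C); class_idx: (N,)
    w = fc_weight[class_idx]                           # (N, C)
    a = torch.einsum('nchw,nc->nhw', pre_gap_feat, w)  # weighted channel sum
    a = torch.relu(a)
    a = a - a.amin(dim=(1, 2), keepdim=True)
    return a / (a.amax(dim=(1, 2), keepdim=True) + 1e-8)
```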
20. To improve the performance of the unsupervised domain adaptation method, on top of the output- and feature-based domain adaptation above, the invention implements a new domain adaptation method based on output, features and the attention mechanism, together with the related loss functions.
21. To realize the attention-based part of the domain adaptation method, two discriminators D_{A_s} and D_{A_t} are provided to discriminate the domain distribution of an attention map (i.e., the domain of the sample from which the attention map was produced).
D_{A_s} is applied to A_s and A_{t→s}; D_{A_t} is applied to A_t and A_{s→t}.
22. The attention consistency loss function is defined analogously as:

L_{D_A}(A, Y') = -E[ Y' · log D_A(A) + (1 - Y') · log(1 - D_A(A)) ]

where Y' is a label indicating the domain of origin of the sample, with the same dimensions as the attention map A.
23. The attention-based intra-domain consistency loss function of the source domain is:

L_{A_s} = L_{D_{A_s}}(A_s, 1) + L_{D_{A_s}}(A_{t→s}, 0)

24. The attention-based intra-domain consistency loss function of the target domain is:

L_{A_t} = L_{D_{A_t}}(A_t, 1) + L_{D_{A_t}}(A_{s→t}, 0)

Based on the above, D_{A_s} aligns the domain distributions of A_s and A_{t→s}.
25. Based on the above, the total intra-domain consistency loss function is:

L_{intra} = L_{A_s} + L_{A_t}

26. Based on the above, the overall loss function combines the terms defined above:

L = L_G + L_D + L_{inter} + L_{F_s} + L_{F_t} + L_{A_s} + L_{A_t}
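A schematic composition of the overall objective as reconstructed above; the equal weighting is an assumption, and in practice training alternates discriminator updates with generator/backbone updates:

```python
def overall_loss(l_g, l_d, l_inter, l_fs, l_ft, l_as, l_at):
    """Schematic sum of converter, inter-domain, feature-alignment and
    attention-alignment terms; each term may carry its own weight."""
    return l_g + l_d + l_inter + l_fs + l_ft + l_as + l_at
```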
in the embodiment of the invention, neural networks such as Resnet50 and the like can be used as a backbone network for feature extraction, samples are output through the neural networks, and semantic information consistency of cross-domain samples is kept by minimizing a loss function for the output. The method comprises the steps that a sample is processed through a feature extractor to obtain a feature graph to provide global information, the feature graph is processed through an attention calculation mechanism such as CAM and the like to obtain an attention graph to provide local information, and through continuous iteration and parameter optimization, a feature extraction network extracts features with consistency when the attention graph is distributed on a source domain and a target domain, so that good performance can be obtained on the target domain by utilizing a model with supervision information training of the source domain. And through an attention mechanism, the model focuses more on a local area with identification, and the prediction performance of the model is improved. With the improvement of the model performance, the accuracy of the attention mechanism is continuously improved, and a part with more identification significance is extracted. Cross-domain attention-based mechanism alignment is achieved by minimizing an attention-based intra-domain coherence function. Through the methods, the invention effectively improves the performance of the unsupervised domain adaptive model, namely the domain adaptation based on the output, the characteristics and the attention mechanism.
The above solution can be applied to many specific tasks; the following merely gives some exemplary application directions: labeling unlabeled data sets for machine learning model training. For example, for classifying cartoon face features, labeled cartoon face data are currently scarce, while large labeled real-face sample data sets already exist. The method can use a real-face sample data set as the source domain to predict cartoon face features in batches.
It should be noted that like reference numbers and letters refer to like items in the figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions of some technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A domain adaptation method based on an attention mechanism, characterized by comprising the following steps:
S1, based on a first sample of a first source domain of a first domain and a second sample of a first target domain of the first domain of an unsupervised domain adaptation model, obtaining, through a converter of the unsupervised domain adaptation model, a third sample of a second source domain of a second domain and a fourth sample of a second target domain of the second domain;
S2, based on the first sample, the second sample, the third sample and the fourth sample, obtaining a conversion prediction result of the unsupervised domain adaptation model through a neural network model and an attention obtaining mechanism;
S3, based on the conversion prediction result, obtaining the prediction loss through a loss function model, and realizing attention-based cross-domain alignment by minimizing an attention-based intra-domain consistency function, thereby improving the performance of the unsupervised domain adaptation model.
2. The attention mechanism-based domain adaptation method of claim 1,
the loss function model comprises an input loss function model, an intra-domain loss function model and an inter-domain loss function model;
the input loss function model is used for obtaining the input loss of the first sample and the second sample when the first sample and the second sample respectively pass through the converter;
the intra-domain loss function model is used for obtaining a first intra-domain loss between the first sample and the third sample and a second intra-domain loss between the second sample and the fourth sample after being processed by the attention obtaining mechanism;
the inter-domain loss function model is used to obtain a first inter-domain loss between the first sample and the third sample and a second inter-domain loss between the second sample and the fourth sample.
3. The attention mechanism-based domain adaptation method of claim 2,
the first labels of the first and third samples are identical;
the second labels of the second and fourth samples are identical.
4. The attention mechanism-based domain adaptation method of claim 2,
the input loss function model includes, but is not limited to, a cross entropy loss function.
5. The attention mechanism based domain adaptation method of claim 3,
the inter-domain loss function model comprises an output loss function model;
based on the consistency of the output prediction results of the first target domain and the second target domain, constructing the output loss function model for limiting the consistency of semantic information before and after conversion;
and building the inter-domain loss function model by adding a regularization term based on the output loss function model and the input loss function model.
6. The attention mechanism-based domain adaptation method of claim 5,
the input loss function model comprises a source domain input loss function model and a target domain input loss function model;
the source domain input loss function model is used for obtaining a first input loss of the first sample;
the target domain input loss function model is used to obtain a second input loss of the second sample.
7. The attention mechanism-based domain adaptation method of claim 2,
the method for constructing the intra-domain loss function model comprises the following steps:
s2.1, based on the first sample, the second sample, the third sample and the fourth sample, obtaining a first feature map corresponding to the first sample, a second feature map corresponding to the second sample, a third feature map corresponding to the third sample and a fourth feature map corresponding to the fourth sample through the neural network model;
s2.2, converting the first feature map, the second feature map, the third feature map and the fourth feature map into a first attention map corresponding to the first feature map, a second attention map corresponding to the second feature map, a third attention map corresponding to the third feature map and a fourth attention map corresponding to the fourth feature map respectively based on an attention obtaining mechanism;
s2.3, constructing a first intra-domain loss function by setting a first discriminator based on the first attention map and the fourth attention map;
S2.4, constructing a second intra-domain loss function by setting a second discriminator based on the second attention map and the third attention map;
S2.5, constructing the attention-consistency-based intra-domain loss function model from the first intra-domain loss function and the second intra-domain loss function in an adversarial learning setting.
8. The attention mechanism based domain adaptation method of claim 7,
the first intra-domain loss function is a first intra-domain attention consistency loss function based on a source domain;
the second intra-domain loss function is a second intra-domain attention consistency loss function based on the target domain.
9. The attention mechanism based domain adaptation method of claim 8,
the expression of the attention-consistency-based intra-domain loss function model L_{intra} is:

L_{intra} = -E[ Y' · log D_{A_s}(A) + (1 - Y') · log(1 - D_{A_s}(A)) ] - E[ Y' · log D_{A_t}(A) + (1 - Y') · log(1 - D_{A_t}(A)) ]

where A denotes an attention map (A ∈ {A_s, A_{t→s}} for D_{A_s}, and A ∈ {A_t, A_{s→t}} for D_{A_t}), s denotes the source domain, t denotes the target domain, D_{A_s} denotes the first discriminator, D_{A_t} denotes the second discriminator, A_{t→s} is the third attention map, A_t denotes the second attention map, and Y' is a label indicating the domain of origin of the sample, with the same dimensions as the attention map A.
10. The attention mechanism-based domain adaptation method of claim 1,
the neural network model comprises at least a Resnet50 neural network model;
the attention computing mechanism includes at least a CAM attention computing mechanism.
CN202110456916.8A 2021-04-27 2021-04-27 Domain self-adaption method based on attention mechanism Pending CN113112005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110456916.8A CN113112005A (en) 2021-04-27 2021-04-27 Domain self-adaption method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110456916.8A CN113112005A (en) 2021-04-27 2021-04-27 Domain self-adaption method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN113112005A 2021-07-13

Family

ID=76720138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110456916.8A Pending CN113112005A (en) 2021-04-27 2021-04-27 Domain self-adaption method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN113112005A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268266A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Surveillance system for recognition in unlabeled videos with domain adversarial learning and knowledge distillation
US20210117737A1 (en) * 2019-10-18 2021-04-22 Korea University Research And Business Foundation Earthquake event classification method using attention-based convolutional neural network, recording medium and device for performing the method
CN111242157A (en) * 2019-11-22 2020-06-05 北京理工大学 Unsupervised domain self-adaption method combining deep attention feature and conditional opposition
CN111160462A (en) * 2019-12-30 2020-05-15 浙江大学 Unsupervised personalized human activity recognition method based on multi-sensor data alignment
CN111814854A (en) * 2020-06-28 2020-10-23 北京交通大学 Target re-identification method adaptive to unsupervised domain
CN111832514A (en) * 2020-07-21 2020-10-27 内蒙古科技大学 Unsupervised pedestrian re-identification method and unsupervised pedestrian re-identification device based on soft multiple labels
CN111860678A (en) * 2020-07-29 2020-10-30 中国矿业大学 Unsupervised cross-domain pedestrian re-identification method based on clustering
CN112183581A (en) * 2020-09-07 2021-01-05 华南理工大学 Semi-supervised mechanical fault diagnosis method based on self-adaptive migration neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
季雯: "Research on Unsupervised Domain Adaptation Methods and Cross-Modal Visual Applications" (in Chinese), China Master's Theses Full-text Database, 15 February 2022 *

Similar Documents

Publication Publication Date Title
Li et al. Cost-sensitive and hybrid-attribute measure multi-decision tree over imbalanced data sets
Giusto et al. Identifying business cycle turning points in real time with vector quantization
CN111339990B (en) Face recognition system and method based on dynamic update of face features
CN110084610B (en) Network transaction fraud detection system based on twin neural network
US5572597A (en) Fingerprint classification system
CN111414461A (en) Intelligent question-answering method and system fusing knowledge base and user modeling
CN113326731A (en) Cross-domain pedestrian re-identification algorithm based on momentum network guidance
CN113269647B (en) Graph-based transaction abnormity associated user detection method
CN113361627A (en) Label perception collaborative training method for graph neural network
CN114255371A (en) Small sample image classification method based on component supervision network
Pang et al. Improving deep forest by screening
Li et al. Self-weighted unsupervised LDA
CN112668633B (en) Adaptive graph migration learning method based on fine granularity field
CN112395901A (en) Improved face detection, positioning and recognition method in complex environment
CN113112005A (en) Domain self-adaption method based on attention mechanism
CN115098681A (en) Open service intention detection method based on supervised contrast learning
CN110942089B (en) Multi-level decision-based keystroke recognition method
Wang et al. TIToK: A solution for bi-imbalanced unsupervised domain adaptation
Zhou et al. Difficult Novel Class Detection in Semisupervised Streaming Data
Amalia et al. The Application of Modified K-Nearest Neighbor Algorithm for Classification of Groundwater Quality Based on Image Processing and pH, TDS, and Temperature Sensors
CN116561591B (en) Training method for semantic feature extraction model of scientific and technological literature, feature extraction method and device
Zhang et al. Underwater target recognition method based on domain adaptation
CN114093445B (en) Patient screening marking method based on partial multi-marking learning
Ramazan Classification of Historical Anatolian Coins with Machine Learning Algorithms
US20240161930A1 (en) Prediction method using static and dynamic data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination