CN110135579A - Unsupervised domain adaptation method, system and medium based on adversarial learning
- Publication number: CN110135579A
- Application number: CN201910276847.5A
- Authority: CN (China)
- Prior art keywords: domain, image, network, class, probability
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06N3/044 Recurrent networks, e.g. Hopfield networks
- G06N3/045 Combinations of networks
- G06N3/088 Non-supervised learning, e.g. competitive learning
(All within G PHYSICS; G06 COMPUTING; G06N Computing arrangements based on specific computational models; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture; G06N3/08 Learning methods.)
Abstract
The present invention provides an unsupervised domain adaptation method, system and medium based on adversarial learning, comprising: a feature extraction step: for the images of the source domain and the target domain, extracting the features of the images with a feature extraction network, obtaining the image features of the source domain and the image features of the target domain; a class prediction step: predicting, from the obtained image features of the source domain and the target domain, the probability that an image belongs to each class, obtaining the class prediction probabilities; a domain discrimination step: predicting, from the obtained image features of the source domain and the target domain, the probability that an image feature comes from the source domain or the target domain through a domain discrimination network, obtaining the domain prediction probabilities. The present invention can extract features that are domain-invariant and strongly discriminative from the images of the source domain and the target domain, thereby achieving unsupervised domain adaptation.
Description
Technical Field
The present invention relates to methods in the field of computer vision and image processing, and in particular to an unsupervised domain adaptation method, system, and medium based on adversarial learning.
Background
Deep neural network models have received increasing attention because of their superior performance in many areas. Training a deep neural network model often requires a large amount of labeled data, yet collecting a large-scale annotated data set for each new task costs considerable time and labor. Fortunately, data for related tasks can often be found in other domains, and using these auxiliary data may reduce the dependency of the current task on annotating a new data set. However, owing to differences in data collection methods and the like, the data distributions of two different domains tend to differ. Because of this "domain shift", the performance of a model trained on one domain drops sharply when it is tested directly on the other domain. As a branch of transfer learning, domain adaptation aims precisely at learning in the presence of the "domain shift" that exists between different domains.
According to the number of labeled samples in the target domain, domain adaptation can be divided into supervised, semi-supervised and unsupervised domain adaptation. In unsupervised domain adaptation, the target domain has no labels at all, and a model that performs well on the target domain must be trained from labeled source-domain data and unlabeled target-domain data. Currently, many unsupervised domain adaptation methods attempt to align the statistical distributions of the source domain and the target domain through various mechanisms, such as maximum mean discrepancy, correlation alignment, KL divergence, and the like. Recently, some work has aligned the data distributions in an adversarial-learning manner, by extracting features that a domain discriminator cannot tell apart. Usually, two different feature extraction networks are used for the source domain and the target domain, where the feature extraction network of the source domain is trained in advance and remains fixed during learning. In addition, some domain adaptation methods based on image transformation have been developed: they convert images of the source domain into images of the target domain while retaining the labeling information of the source-domain images, and then train a classification network on the converted images for classifying images of the target domain.
Patent document CN107958286A (application number: 201711183073.9) discloses a deep transfer learning method for a domain-adaptive network, which determines the value of the loss function of the domain-adaptive network according to the distribution difference, the classification error rate and the degree of mismatch corresponding to each task-related layer, and updates the parameters of the domain-adaptive network based on that value, so that the network adapts to the target domain. However, that patent document does not use an adversarial learning method and does not consider the importance of the discriminative power of the features for domain adaptation.
Disclosure of Invention
In view of the above defects in the prior art, the present invention aims to provide an unsupervised domain adaptation method, system and medium based on adversarial learning.
The invention provides an unsupervised domain adaptation method based on adversarial learning, which comprises the following steps (a minimal code sketch of the three networks involved follows this list):
a feature extraction step: extracting the features of the images of the source domain and the target domain with a feature extraction network, obtaining the image features of the source domain and the image features of the target domain;
a class prediction step: predicting, from the obtained image features of the source domain and the target domain, the probability that an image belongs to each class, obtaining the class prediction probabilities;
a domain discrimination step: predicting, from the obtained image features of the source domain and the target domain, the probability that an image feature comes from the source domain or the target domain through a domain discrimination network, obtaining the domain prediction probabilities;
an adversarial learning step: designing a loss function over the obtained domain prediction probabilities, and making the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features;
a feature discriminability enhancement step: for the obtained image features of the source domain, improving the discriminability of the features with a center loss function;
a conditional probability alignment step: performing conditional probability alignment on the obtained image features of the source domain and the target domain according to the obtained class prediction probabilities.
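The following is a minimal PyTorch sketch of the three networks named in the steps above: the shared feature extraction network E, the classification network C (a single fully connected layer, i.e. the N x K matrix W), and the domain discrimination network D. The concrete layer sizes, the feature dimension and the class count are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """E: shared deep convolutional encoder mapping images from either
    domain to N-dimensional features."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class Classifier(nn.Module):
    """C: one fully connected layer whose weight is the N x K matrix W;
    the softmax layer is applied inside the loss or at prediction time."""
    def __init__(self, feat_dim=256, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes, bias=False)

    def forward(self, feat):
        return self.fc(feat)  # logits W^T E(x_i)

class DomainDiscriminator(nn.Module):
    """D: fully connected layers ending in one logit; the sigmoid of the
    logit is the probability that a feature comes from the source domain."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, feat):
        return self.net(feat).squeeze(1)  # raw logit D(E(x_i))
```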
Preferably, the feature extraction step:
inputting the images of the source domain and the images of the target domain into the feature extraction network, and extracting the image features of the source domain and the image features of the target domain;
the feature extraction network is a deep convolutional neural network;
the source-domain images and the target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images carry no label information.
Preferably, the class prediction step:
predicting the probability of each class with a classification network consisting of a fully connected layer and a softmax layer, according to the obtained image features of the source domain and the target domain, to obtain the class prediction probabilities;
the class refers to the category of the object contained in the image;
the class prediction step includes:
a probability calculation step: denote the feature extraction network by E, so that E(x_i) is the feature of image x_i extracted by the feature extraction network, of dimension N; denote by C the classification network formed by the fully connected layer; with K classes in total, the parameter of the fully connected layer is an N x K matrix, denoted W, and the output of the fully connected layer is:

C(E(x_i)) = W^T E(x_i)

wherein,
E(x_i) is the feature of image x_i extracted by the feature extraction network;
C(E(x_i)) is the output of the fully connected layer of the classification network C for the input E(x_i);
the superscript T denotes transpose;
W is the N x K parameter matrix of the fully connected layer;
the softmax layer then converts the output of the fully connected layer into the probability of image x_i for each class, where the probability that image x_i belongs to class k is:

P_k(x_i) = \frac{e^{[W^T E(x_i)]_k}}{\sum_{j=1}^{K} e^{[W^T E(x_i)]_j}}

wherein,
P_k(x_i) is the probability that image x_i belongs to class k;
e is the natural constant;
[W^T E(x_i)]_k is the value of the k-th dimension of W^T E(x_i);
after calculating the probability of each class, the predicted class of image x_i, i.e. the class with the highest predicted probability, is obtained as:

\hat{y}_i = \arg\max_k P_k(x_i)

wherein,
\hat{y}_i is the class prediction label of image x_i;
a classification network learning step: for the labeled source-domain images, comparing the per-class probabilities of image x_i obtained in the probability calculation step with the corresponding label, the following classification network loss function is calculated:

\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} H(P(x_i), y_i)

wherein,
L_s is the class prediction loss function;
\min_{\theta_E, \theta_C} expresses the training target of the network: optimizing the parameters \theta_E of the feature extraction network and \theta_C of the class prediction network to minimize the class prediction loss;
\theta_E are the parameters of the feature extraction network;
\theta_C are the parameters of the class prediction network;
(X_s, Y_s) is the distribution of source-domain images and labels;
P(x_i) is the vector of probabilities of image x_i for each class;
x_i is a source-domain image;
y_i is the class label, in one-hot form, i.e. the label of the k-th class is a K-dimensional vector whose k-th dimension is 1 and whose other dimensions are 0;
H is the cross-entropy function;
the classification network C and the feature extraction network E are learned according to the obtained classification network loss function, yielding the learned classification network C and feature extraction network E, and the flow returns to the probability calculation step to continue.
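As a concrete illustration of the probability calculation and the classification loss above, the following hedged PyTorch sketch uses the Classifier from the earlier sketch; note that F.cross_entropy takes integer class indices rather than one-hot vectors and fuses the softmax layer with the cross-entropy H.

```python
import torch
import torch.nn.functional as F

def classification_loss(C, features, labels):
    """L_s: cross-entropy between softmax(W^T E(x_i)) and the label y_i.
    `labels` holds integer class indices, equivalent to the one-hot form."""
    logits = C(features)                    # W^T E(x_i), shape (batch, K)
    return F.cross_entropy(logits, labels)  # softmax + H in one call

def predict(C, features):
    """\hat{y}_i: the class with the highest softmax probability P_k(x_i)."""
    probs = F.softmax(C(features), dim=1)
    return probs.argmax(dim=1)
```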
Preferably, the domain discrimination step:
the feature of image x_i extracted by the feature extraction network E is E(x_i), of dimension N; D denotes the domain discrimination network composed of fully connected layers, whose output D(E(x_i)) has dimension 1 and is converted into the interval [0, 1] by the sigmoid function h; h(D(E(x_i))) is the probability that image x_i comes from the source domain, and 1 - h(D(E(x_i))) is then the probability that the image comes from the target domain, where the sigmoid function is:

h(z) = \frac{1}{1 + e^{-z}}

wherein,
h(D(E(x_i))) is the probability that image x_i comes from the source domain;
D(E(x_i)) is the output of the domain discrimination network composed of fully connected layers.
The adversarial learning step:
according to the obtained domain prediction probabilities, the feature extraction network and the domain discrimination network learn adversarially under an adversarial objective function: the domain discrimination network tries to distinguish images of the source domain from images of the target domain as well as possible, while the feature extraction network tries to extract domain-invariant features, thereby confusing the domain discrimination network into misjudgment, that is, making the domain discrimination network unable to tell whether an image feature comes from the source domain or the target domain;
the domain-invariant features refer to the image semantic information shared by the source domain and the target domain;
the adversarial objective function means that the feature extraction network minimizes, and the domain discrimination network maximizes, the following adversarial loss function:

\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h(D(E(x_i))) + \mathbb{E}_{x_i \sim X_t} \log(1 - h(D(E(x_i))))

wherein,
L_{adv} is the adversarial loss function;
\min_{\theta_E} \max_{\theta_D} expresses that the optimization target of the feature extraction network E is to minimize the adversarial loss function, while the optimization target of the domain discrimination network D is to maximize it;
X_s is the set of source-domain samples;
X_t is the set of target-domain samples;
\theta_D are the parameters of the domain discrimination network.
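Below is a sketch of the min-max objective above, realized as the usual alternating updates (a gradient reversal layer would be an equivalent alternative; the patent only states the objective). E and D are the networks from the earlier sketch; for the feature-extractor step the sketch uses the common non-saturating surrogate of training E against inverted domain labels.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(E, D, x_src, x_tgt):
    """D maximizes L_adv: h(D(E(x))) is pushed toward 1 on source features
    and toward 0 on target features. Features are detached so that only
    theta_D receives gradients in this step."""
    d_src = D(E(x_src).detach())
    d_tgt = D(E(x_tgt).detach())
    return (F.binary_cross_entropy_with_logits(d_src, torch.ones_like(d_src))
          + F.binary_cross_entropy_with_logits(d_tgt, torch.zeros_like(d_tgt)))

def feature_adversarial_loss(E, D, x_src, x_tgt):
    """E minimizes L_adv; here via inverted labels, so that E is rewarded
    for features the domain discrimination network misjudges."""
    d_src = D(E(x_src))
    d_tgt = D(E(x_tgt))
    return (F.binary_cross_entropy_with_logits(d_src, torch.zeros_like(d_src))
          + F.binary_cross_entropy_with_logits(d_tgt, torch.ones_like(d_tgt)))
```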
Preferably, the feature discriminability enhancement step:
setting a class center point for each class, and adding a center loss function so that the image features of the source domain are drawn close to the center point of their corresponding class, reducing the samples scattered in the regions between classes and improving the discriminability of the extracted features;
the center loss function: by calculating the Euclidean distance between each feature and the center point of its corresponding class, the following center loss function is obtained:

\min_{\theta_E} L_{cs} = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} \| E(x_i) - c_{y_i} \|_2^2

wherein,
\min_{\theta_E} expresses the training target of the network: optimizing the parameters \theta_E of the feature extraction network to minimize the center loss of the source domain;
L_{cs} is the center loss function of the source domain, which involves the feature extraction network E;
c_{y_i} is the class center point of class y_i;
at the first iteration, the center point of each class is initialized with the data of the current batch; afterwards, the center points are updated as:

c_k^{t+1} = c_k^t - \gamma \Delta c_k^t

wherein,
c_k^{t+1} is the center point of the k-th class at the (t+1)-th iteration;
c_k^t is the center point of the k-th class at the t-th iteration;
\gamma is the update rate of the class center points;
k = 1, ..., K, with K the number of classes;
the update quantity \Delta c_k^t is calculated as:

\Delta c_k^t = \frac{1}{N_k} \sum_{x_i \in B_t} I(y_i = k) (c_k^t - E(x_i))

wherein,
B_t is the batch of data at the t-th iteration;
I(\cdot) is the indicator function: I(y_i = k) = 1 when y_i = k holds, and 0 otherwise;
N_k is the number of class-k samples in the batch.
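A sketch of the source-domain center loss and the batch-wise center update above; `centers` is assumed to be a (K, N) tensor initialized from the class means of the first batch, and gamma = 0.5 is an illustrative update rate, not a value from the patent.

```python
import torch

def source_center_loss(features, labels, centers):
    """L_cs: squared Euclidean distance between each source feature E(x_i)
    and the center c_{y_i} of its labeled class, averaged over the batch."""
    return ((features - centers[labels]) ** 2).sum(dim=1).mean()

@torch.no_grad()
def update_centers(features, labels, centers, gamma=0.5):
    """c_k^{t+1} = c_k^t - gamma * Delta c_k^t, where Delta c_k^t averages
    (c_k^t - E(x_i)) over the N_k class-k samples of the current batch."""
    for k in labels.unique():
        class_feats = features[labels == k]             # the N_k samples
        delta = (centers[k] - class_feats).mean(dim=0)  # Delta c_k^t
        centers[k] -= gamma * delta
```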
Preferably, the conditional probability alignment step:
according to the obtained class prediction labels \hat{y}_i of the target-domain images, a center loss function of the target domain is designed and minimized so as to align the class-conditional probabilities P(X|Y) of the source-domain and target-domain images; the target-domain images are thereby drawn close to their corresponding class centers, aligning the distributions of the two domains and giving the unlabeled target-domain image features discriminative power;
the minimization of the center loss function of the target domain is expressed as:

\min_{\theta_E} L_{ct} = \mathbb{E}_{x_i \sim \Phi(X_t)} \| E(x_i) - c_{\hat{y}_i} \|_2^2

wherein,
\min_{\theta_E} expresses the optimization target of the network: optimizing the parameters \theta_E of the feature extraction network to minimize the center loss of the target domain;
L_{ct} is the center loss function of the target domain, obtained by calculating the Euclidean distance between each target-domain sample feature and its corresponding class center point;
c_{\hat{y}_i} is the class center point of class \hat{y}_i;
\Phi(X_t) is the subset of the target domain whose samples satisfy the following condition:

\Phi(X_t) = \{ x_i \in X_t : \max(p(x_i)) > T \}

wherein,
p(x_i) is a K-dimensional vector whose k-th dimension is the probability that sample x_i belongs to class k;
T is a threshold: a predicted label is trusted only when its probability is greater than this threshold.
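A sketch of the conditional probability alignment step above: target images are pseudo-labeled with the class prediction, filtered by the confidence threshold T (the value 0.9 is an illustrative assumption; the patent leaves T unspecified), and pulled toward the centers of their predicted classes.

```python
import torch
import torch.nn.functional as F

def target_center_loss(E, C, centers, x_tgt, T=0.9):
    """L_ct over Phi(X_t): keep only target samples whose highest predicted
    probability exceeds T, then penalize their squared distance to the
    center of the predicted class \hat{y}_i."""
    feats = E(x_tgt)
    probs = F.softmax(C(feats), dim=1)   # p(x_i), K-dimensional
    conf, pseudo = probs.max(dim=1)      # max(p(x_i)) and \hat{y}_i
    keep = conf > T                      # membership in Phi(X_t)
    if not keep.any():
        return feats.new_zeros(())       # no trusted label in this batch
    return ((feats[keep] - centers[pseudo[keep]]) ** 2).sum(dim=1).mean()
```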
The invention provides an unsupervised domain adaptation system based on adversarial learning, which comprises the following components:
a feature extraction module: extracts the features of the images of the source domain and the target domain with a feature extraction network, obtaining the image features of the source domain and the image features of the target domain;
a class prediction module: predicts, from the obtained image features of the source domain and the target domain, the probability that an image belongs to each class, obtaining the class prediction probabilities;
a domain discrimination module: predicts, from the obtained image features of the source domain and the target domain, the probability that an image feature comes from the source domain or the target domain through a domain discrimination network, obtaining the domain prediction probabilities;
an adversarial learning module: designs a loss function over the obtained domain prediction probabilities, and makes the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features;
a feature discriminability enhancement module: for the obtained image features of the source domain, improves the discriminability of the features with a center loss function;
a conditional probability alignment module: performs conditional probability alignment on the obtained image features of the source domain and the target domain according to the obtained class prediction probabilities.
Preferably, the feature extraction module:
inputs the images of the source domain and the images of the target domain into the feature extraction network, and extracts the image features of the source domain and the image features of the target domain;
the feature extraction network is a deep convolutional neural network;
the source-domain images and the target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images carry no label information;
the class prediction module:
predicts the probability of each class with a classification network consisting of a fully connected layer and a softmax layer, according to the obtained image features of the source domain and the target domain, to obtain the class prediction probabilities;
the class refers to the category of the object contained in the image;
the class prediction module includes:
a probability calculation module: denote the feature extraction network by E, so that E(x_i) is the feature of image x_i extracted by the feature extraction network, of dimension N; denote by C the classification network formed by the fully connected layer; with K classes in total, the parameter of the fully connected layer is an N x K matrix, denoted W, and the output of the fully connected layer is:

C(E(x_i)) = W^T E(x_i)

wherein,
E(x_i) is the feature of image x_i extracted by the feature extraction network;
C(E(x_i)) is the output of the fully connected layer of the classification network C for the input E(x_i);
the superscript T denotes transpose;
W is the N x K parameter matrix of the fully connected layer;
the softmax layer then converts the output of the fully connected layer into the probability of image x_i for each class, where the probability that image x_i belongs to class k is:

P_k(x_i) = \frac{e^{[W^T E(x_i)]_k}}{\sum_{j=1}^{K} e^{[W^T E(x_i)]_j}}

wherein,
P_k(x_i) is the probability that image x_i belongs to class k;
e is the natural constant;
[W^T E(x_i)]_k is the value of the k-th dimension of W^T E(x_i);
after calculating the probability of each class, the predicted class of image x_i, i.e. the class with the highest predicted probability, is obtained as:

\hat{y}_i = \arg\max_k P_k(x_i)

wherein,
\hat{y}_i is the class prediction label of image x_i;
the classification network learning module: for the labeled source-domain images, comparing the per-class probabilities of image x_i obtained by the probability calculation module with the corresponding label, the following classification network loss function is calculated:

\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} H(P(x_i), y_i)

wherein,
L_s is the class prediction loss function;
\min_{\theta_E, \theta_C} expresses the training target of the network: optimizing the parameters \theta_E of the feature extraction network and \theta_C of the class prediction network to minimize the class prediction loss;
\theta_E are the parameters of the feature extraction network;
\theta_C are the parameters of the class prediction network;
(X_s, Y_s) is the distribution of source-domain images and labels;
P(x_i) is the vector of probabilities of image x_i for each class;
x_i is a source-domain image;
y_i is the class label, in one-hot form, i.e. the label of the k-th class is a K-dimensional vector whose k-th dimension is 1 and whose other dimensions are 0;
H is the cross-entropy function;
the classification network C and the feature extraction network E are learned according to the obtained classification network loss function, yielding the learned classification network C and feature extraction network E, and control returns to the probability calculation module to continue.
Preferably, the domain discrimination module:
the feature of image x_i extracted by the feature extraction network E is E(x_i), of dimension N; D denotes the domain discrimination network composed of fully connected layers, whose output D(E(x_i)) has dimension 1 and is converted into the interval [0, 1] by the sigmoid function h; h(D(E(x_i))) is the probability that image x_i comes from the source domain, and 1 - h(D(E(x_i))) is then the probability that the image comes from the target domain, where the sigmoid function is:

h(z) = \frac{1}{1 + e^{-z}}

wherein,
h(D(E(x_i))) is the probability that image x_i comes from the source domain;
D(E(x_i)) is the output of the domain discrimination network composed of fully connected layers;
the adversarial learning module:
according to the obtained domain prediction probabilities, the feature extraction network and the domain discrimination network learn adversarially under an adversarial objective function: the domain discrimination network tries to distinguish images of the source domain from images of the target domain as well as possible, while the feature extraction network tries to extract domain-invariant features, thereby confusing the domain discrimination network into misjudgment, that is, making the domain discrimination network unable to tell whether an image feature comes from the source domain or the target domain;
the domain-invariant features refer to the image semantic information shared by the source domain and the target domain;
the adversarial objective function means that the feature extraction network minimizes, and the domain discrimination network maximizes, the following adversarial loss function:

\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h(D(E(x_i))) + \mathbb{E}_{x_i \sim X_t} \log(1 - h(D(E(x_i))))

wherein,
L_{adv} is the adversarial loss function;
\min_{\theta_E} \max_{\theta_D} expresses that the optimization target of the feature extraction network E is to minimize the adversarial loss function, while the optimization target of the domain discrimination network D is to maximize it;
X_s is the set of source-domain samples;
X_t is the set of target-domain samples;
\theta_D are the parameters of the domain discrimination network;
the feature discrimination force improving module:
setting a class center point for each class, and adding a center loss function to enable the image characteristics of the source field to be close to the center point of the corresponding class, so that samples scattered in inter-class areas are reduced, and the discrimination of the extracted characteristics is improved;
the center loss function: by calculating the euclidean distance of each feature from the center point of the corresponding class, the following center loss function can be obtained:
wherein,
a central loss function representing a minimized source domain;
Lcsa central loss function representing the source domain, which is associated with the feature extraction network E;
the training targets representing the network are: optimizing a parameter θ of a feature extraction networkEMinimizing the central loss of the source domain;
is class yiA class center point of (1);
during the first iteration, the center points of each category are initialized by using the data of the current batch, and then the center points are updated in the following way:
wherein,
is the center point of the kth class at the t +1 th iteration;
representing the center point of the kth category at the time of the t-th iteration;
γ is the update rate of the class center point;
k is the number of categories;
representing a center pointThe calculation formula of (2) is as follows:
wherein,
Btis the batch data at the t-th iteration;
i (.) represents the indicator function when yiWhen k is true, I (y)iK) 1, otherwise 0;
Nkis the number of class k samples in the batch;
the conditional probability alignment module:
predicting labels according to the obtained categories of the target domain imagesDesigning a central loss function of a minimized target field to align the class conditional probabilities P (X | Y) of the images of the source field and the target field, so that the images of the target field are close to the corresponding class centers, thereby aligning the distribution of the two fields and enabling the image characteristics of the target field without annotation to have discriminative power;
the minimization of the central loss function of the target domain is expressed as follows:
wherein,
a central loss function representing a minimization target area;
Lctrepresenting a center loss function of the target field obtained by calculating Euclidean distances between the sample characteristics of the target field and the corresponding class center points;
the optimization objectives for the representation network are: optimizing a parameter θ of a feature extraction networkEMinimizing the central loss of the target area;
representing categoriesA class center point of (1);
Φ(Xt) Is a subset of the target domain, where the samples satisfy the following condition:
Φ(Xt)={xi∈Xtand max(p(xi))>T}
wherein,
p(xi) Represents a K-dimensional vector whose K-th dimension represents a sample xiProbability of being class k;
t represents a threshold value, and a predictive tag is only trusted if its probability is greater than this threshold value.
According to the present invention, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above unsupervised domain adaptation methods based on adversarial learning.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention can extract features that are domain-invariant and strongly discriminative from the images of the source domain and the target domain, thereby achieving unsupervised domain adaptation.
2. By sharing the same feature extraction network between the two domains and making the domain discrimination network unable to judge the origin of the features, the invention forces the feature extraction network to extract the features shared between the two domains, namely the semantic information of the images, while ignoring the information specific to each domain. Since the extracted features are domain-invariant, the class prediction network trained on source-domain images can also be applied to the target domain, thereby realizing domain adaptation. By introducing the center loss function, the features of same-class images in the source domain become more clustered and compact, which improves the discriminability of the features. Beyond the marginal probability distribution, the invention also considers the conditional probability distributions of the two domains; by aligning them, the features of the target domain also retain good discriminative power, which improves the class prediction accuracy of the model on the target domain. In addition, because the same feature extraction network is shared, the image features of the two domains are learned jointly, and at test time it is unnecessary to know whether an image comes from the source domain or the target domain.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is a schematic flow chart of the method provided by preferred embodiment 2 of the present invention.
Fig. 2 is a schematic diagram of the principle of the system provided by preferred embodiment 2 of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention; all of these fall within the scope of protection of the present invention.
The invention provides an unsupervised domain adaptation method based on adversarial learning, which comprises the following steps:
a feature extraction step: extracting the features of the images of the source domain and the target domain with a feature extraction network, obtaining the image features of the source domain and the image features of the target domain;
a class prediction step: predicting, from the obtained image features of the source domain and the target domain, the probability that an image belongs to each class, obtaining the class prediction probabilities;
a domain discrimination step: predicting, from the obtained image features of the source domain and the target domain, the probability that an image feature comes from the source domain or the target domain through a domain discrimination network, obtaining the domain prediction probabilities;
an adversarial learning step: designing a loss function over the obtained domain prediction probabilities, and making the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features;
a feature discriminability enhancement step: for the obtained image features of the source domain, improving the discriminability of the features with a center loss function;
a conditional probability alignment step: performing conditional probability alignment on the obtained image features of the source domain and the target domain according to the obtained class prediction probabilities.
Specifically, the feature extraction step:
inputting the images of the source domain and the images of the target domain into the feature extraction network, and extracting the image features of the source domain and the image features of the target domain;
the feature extraction network is a deep convolutional neural network;
the source-domain images and the target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images carry no label information.
Specifically, the class prediction step:
predicting the probability of each class with a classification network consisting of a fully connected layer and a softmax layer, according to the obtained image features of the source domain and the target domain, to obtain the class prediction probabilities;
the class refers to the category of the object contained in the image;
the class prediction step includes:
a probability calculation step: denote the feature extraction network by E, so that E(x_i) is the feature of image x_i extracted by the feature extraction network, of dimension N; denote by C the classification network formed by the fully connected layer; with K classes in total, the parameter of the fully connected layer is an N x K matrix, denoted W, and the output of the fully connected layer is:

C(E(x_i)) = W^T E(x_i)

wherein,
E(x_i) is the feature of image x_i extracted by the feature extraction network;
C(E(x_i)) is the output of the fully connected layer of the classification network C for the input E(x_i);
the superscript T denotes transpose;
W is the N x K parameter matrix of the fully connected layer;
the softmax layer then converts the output of the fully connected layer into the probability of image x_i for each class, where the probability that image x_i belongs to class k is:

P_k(x_i) = \frac{e^{[W^T E(x_i)]_k}}{\sum_{j=1}^{K} e^{[W^T E(x_i)]_j}}

wherein,
P_k(x_i) is the probability that image x_i belongs to class k;
e is the natural constant;
[W^T E(x_i)]_k is the value of the k-th dimension of W^T E(x_i);
after calculating the probability of each class, the predicted class of image x_i, i.e. the class with the highest predicted probability, is obtained as:

\hat{y}_i = \arg\max_k P_k(x_i)

wherein,
\hat{y}_i is the class prediction label of image x_i;
a classification network learning step: for the labeled source-domain images, comparing the per-class probabilities of image x_i obtained in the probability calculation step with the corresponding label, the following classification network loss function is calculated:

\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} H(P(x_i), y_i)

wherein,
L_s is the class prediction loss function;
\min_{\theta_E, \theta_C} expresses the training target of the network: optimizing the parameters \theta_E of the feature extraction network and \theta_C of the class prediction network to minimize the class prediction loss;
\theta_E are the parameters of the feature extraction network;
\theta_C are the parameters of the class prediction network;
(X_s, Y_s) is the distribution of source-domain images and labels;
P(x_i) is the vector of probabilities of image x_i for each class;
x_i is a source-domain image;
y_i is the class label, in one-hot form, i.e. the label of the k-th class is a K-dimensional vector whose k-th dimension is 1 and whose other dimensions are 0;
H is the cross-entropy function;
the classification network C and the feature extraction network E are learned according to the obtained classification network loss function, yielding the learned classification network C and feature extraction network E, and the flow returns to the probability calculation step to continue.
Specifically, the domain discrimination step:
the feature of image x_i extracted by the feature extraction network E is E(x_i), of dimension N; D denotes the domain discrimination network composed of fully connected layers, whose output D(E(x_i)) has dimension 1 and is converted into the interval [0, 1] by the sigmoid function h; h(D(E(x_i))) is the probability that image x_i comes from the source domain, and 1 - h(D(E(x_i))) is then the probability that the image comes from the target domain, where the sigmoid function is:

h(z) = \frac{1}{1 + e^{-z}}

wherein,
h(D(E(x_i))) is the probability that image x_i comes from the source domain;
D(E(x_i)) is the output of the domain discrimination network composed of fully connected layers.
The adversarial learning step:
according to the obtained domain prediction probabilities, the feature extraction network and the domain discrimination network learn adversarially under an adversarial objective function: the domain discrimination network tries to distinguish images of the source domain from images of the target domain as well as possible, while the feature extraction network tries to extract domain-invariant features, thereby confusing the domain discrimination network into misjudgment, that is, making the domain discrimination network unable to tell whether an image feature comes from the source domain or the target domain;
the domain-invariant features refer to the image semantic information shared by the source domain and the target domain;
the adversarial objective function means that the feature extraction network minimizes, and the domain discrimination network maximizes, the following adversarial loss function:

\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h(D(E(x_i))) + \mathbb{E}_{x_i \sim X_t} \log(1 - h(D(E(x_i))))

wherein,
L_{adv} is the adversarial loss function;
\min_{\theta_E} \max_{\theta_D} expresses that the optimization target of the feature extraction network E is to minimize the adversarial loss function, while the optimization target of the domain discrimination network D is to maximize it;
X_s is the set of source-domain samples;
X_t is the set of target-domain samples;
\theta_D are the parameters of the domain discrimination network.
Specifically, the feature discriminability enhancement step:
setting a class center point for each class, and adding a center loss function so that the image features of the source domain are drawn close to the center point of their corresponding class, reducing the samples scattered in the regions between classes and improving the discriminability of the extracted features;
the center loss function: by calculating the Euclidean distance between each feature and the center point of its corresponding class, the following center loss function is obtained:

\min_{\theta_E} L_{cs} = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} \| E(x_i) - c_{y_i} \|_2^2

wherein,
\min_{\theta_E} expresses the training target of the network: optimizing the parameters \theta_E of the feature extraction network to minimize the center loss of the source domain;
L_{cs} is the center loss function of the source domain, which involves the feature extraction network E;
c_{y_i} is the class center point of class y_i;
at the first iteration, the center point of each class is initialized with the data of the current batch; afterwards, the center points are updated as:

c_k^{t+1} = c_k^t - \gamma \Delta c_k^t

wherein,
c_k^{t+1} is the center point of the k-th class at the (t+1)-th iteration;
c_k^t is the center point of the k-th class at the t-th iteration;
\gamma is the update rate of the class center points;
k = 1, ..., K, with K the number of classes;
the update quantity \Delta c_k^t is calculated as:

\Delta c_k^t = \frac{1}{N_k} \sum_{x_i \in B_t} I(y_i = k) (c_k^t - E(x_i))

wherein,
B_t is the batch of data at the t-th iteration;
I(\cdot) is the indicator function: I(y_i = k) = 1 when y_i = k holds, and 0 otherwise;
N_k is the number of class-k samples in the batch.
Specifically, the conditional probability alignment step:
according to the obtained class prediction labels \hat{y}_i of the target-domain images, a center loss function of the target domain is designed and minimized so as to align the class-conditional probabilities P(X|Y) of the source-domain and target-domain images; the target-domain images are thereby drawn close to their corresponding class centers, aligning the distributions of the two domains and giving the unlabeled target-domain image features discriminative power;
the minimization of the center loss function of the target domain is expressed as:

\min_{\theta_E} L_{ct} = \mathbb{E}_{x_i \sim \Phi(X_t)} \| E(x_i) - c_{\hat{y}_i} \|_2^2

wherein,
\min_{\theta_E} expresses the optimization target of the network: optimizing the parameters \theta_E of the feature extraction network to minimize the center loss of the target domain;
L_{ct} is the center loss function of the target domain, obtained by calculating the Euclidean distance between each target-domain sample feature and its corresponding class center point;
c_{\hat{y}_i} is the class center point of class \hat{y}_i;
\Phi(X_t) is the subset of the target domain whose samples satisfy the following condition:

\Phi(X_t) = \{ x_i \in X_t : \max(p(x_i)) > T \}

wherein,
p(x_i) is a K-dimensional vector whose k-th dimension is the probability that sample x_i belongs to class k;
T is a threshold: a predicted label is trusted only when its probability is greater than this threshold.
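Tying the steps of this detailed description together, the following sketch shows one training iteration combining the classification, adversarial and center losses, reusing the helper functions from the earlier sketches. The loss weights lam_adv and lam_c, the optimizer and the learning rate are hypothetical choices; the patent does not specify how the individual losses are weighted or scheduled.

```python
import torch
import torch.nn.functional as F

# E, C, D and the (K, N) tensor `centers` are assumed to exist as in the
# earlier sketches; the SGD settings are illustrative.
opt_d = torch.optim.SGD(D.parameters(), lr=1e-3)
opt_ec = torch.optim.SGD(list(E.parameters()) + list(C.parameters()), lr=1e-3)

def train_step(x_src, y_src, x_tgt, centers, lam_adv=0.1, lam_c=0.01):
    # 1) Domain discrimination network update (the max part of the min-max).
    opt_d.zero_grad()
    discriminator_loss(E, D, x_src, x_tgt).backward()
    opt_d.step()

    # 2) Feature extraction and classification network update:
    #    L_s + lam_adv * L_adv + lam_c * (L_cs + L_ct).
    opt_ec.zero_grad()
    f_src = E(x_src)
    loss = (F.cross_entropy(C(f_src), y_src)
            + lam_adv * feature_adversarial_loss(E, D, x_src, x_tgt)
            + lam_c * (source_center_loss(f_src, y_src, centers)
                       + target_center_loss(E, C, centers, x_tgt)))
    loss.backward()
    opt_ec.step()

    # 3) Move the class center points using the current source batch.
    update_centers(f_src.detach(), y_src, centers)
```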
The unsupervised domain adaptation system based on adversarial learning provided by the invention can be realized through the step flow of the unsupervised domain adaptation method based on adversarial learning. Those skilled in the art can understand the unsupervised domain adaptation method based on adversarial learning as a preferred example of the unsupervised domain adaptation system based on adversarial learning.
The invention provides an unsupervised domain adaptation system based on adversarial learning, which comprises the following components:
a feature extraction module: extracts the features of the images of the source domain and the target domain with a feature extraction network, obtaining the image features of the source domain and the image features of the target domain;
a class prediction module: predicts, from the obtained image features of the source domain and the target domain, the probability that an image belongs to each class, obtaining the class prediction probabilities;
a domain discrimination module: predicts, from the obtained image features of the source domain and the target domain, the probability that an image feature comes from the source domain or the target domain through a domain discrimination network, obtaining the domain prediction probabilities;
an adversarial learning module: designs a loss function over the obtained domain prediction probabilities, and makes the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features;
a feature discriminability enhancement module: for the obtained image features of the source domain, improves the discriminability of the features with a center loss function;
a conditional probability alignment module: performs conditional probability alignment on the obtained image features of the source domain and the target domain according to the obtained class prediction probabilities.
Specifically, the feature extraction module:
inputs the images of the source domain and the images of the target domain into the feature extraction network, and extracts the image features of the source domain and the image features of the target domain;
the feature extraction network is a deep convolutional neural network;
the source-domain images and the target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images carry no label information;
the class prediction module:
predicts the probability of each class with a classification network consisting of a fully connected layer and a softmax layer, according to the obtained image features of the source domain and the target domain, to obtain the class prediction probabilities;
the class refers to the category of the object contained in the image;
the class prediction module includes:
a probability calculation module: denote the feature extraction network by E, so that E(x_i) is the feature of image x_i extracted by the feature extraction network, of dimension N; denote by C the classification network formed by the fully connected layer; with K classes in total, the parameter of the fully connected layer is an N x K matrix, denoted W, and the output of the fully connected layer is:

C(E(x_i)) = W^T E(x_i)

wherein,
E(x_i) is the feature of image x_i extracted by the feature extraction network;
C(E(x_i)) is the output of the fully connected layer of the classification network C for the input E(x_i);
the superscript T denotes transpose;
W is the N x K parameter matrix of the fully connected layer;
the softmax layer then converts the output of the fully connected layer into the probability of image x_i for each class, where the probability that image x_i belongs to class k is:

P_k(x_i) = \frac{e^{[W^T E(x_i)]_k}}{\sum_{j=1}^{K} e^{[W^T E(x_i)]_j}}

wherein,
P_k(x_i) is the probability that image x_i belongs to class k;
e is the natural constant;
[W^T E(x_i)]_k is the value of the k-th dimension of W^T E(x_i);
after calculating the probability of each class, the predicted class of image x_i, i.e. the class with the highest predicted probability, is obtained as:

\hat{y}_i = \arg\max_k P_k(x_i)

wherein,
\hat{y}_i is the class prediction label of image x_i;
the classification network learning module: for the labeled source-domain images, comparing the per-class probabilities of image x_i obtained by the probability calculation module with the corresponding label, the following classification network loss function is calculated:

\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} H(P(x_i), y_i)

wherein,
L_s is the class prediction loss function;
\min_{\theta_E, \theta_C} expresses the training target of the network: optimizing the parameters \theta_E of the feature extraction network and \theta_C of the class prediction network to minimize the class prediction loss;
\theta_E are the parameters of the feature extraction network;
\theta_C are the parameters of the class prediction network;
(X_s, Y_s) is the distribution of source-domain images and labels;
P(x_i) is the vector of probabilities of image x_i for each class;
x_i is a source-domain image;
y_i is the class label, in one-hot form, i.e. the label of the k-th class is a K-dimensional vector whose k-th dimension is 1 and whose other dimensions are 0;
H is the cross-entropy function;
the classification network C and the feature extraction network E are learned according to the obtained classification network loss function, yielding the learned classification network C and feature extraction network E, and control returns to the probability calculation module to continue.
Specifically, the domain discrimination module:
the feature of image x_i extracted by the feature extraction network E is E(x_i), of dimension N; D denotes the domain discrimination network composed of fully connected layers, whose output D(E(x_i)) has dimension 1 and is converted into the interval [0, 1] by the sigmoid function h; h(D(E(x_i))) is the probability that image x_i comes from the source domain, and 1 - h(D(E(x_i))) is then the probability that the image comes from the target domain, where the sigmoid function is:

h(z) = \frac{1}{1 + e^{-z}}

wherein,
h(D(E(x_i))) is the probability that image x_i comes from the source domain;
D(E(x_i)) is the output of the domain discrimination network composed of fully connected layers;
the adversarial learning module:
according to the obtained domain prediction probabilities, the feature extraction network and the domain discrimination network learn adversarially under an adversarial objective function: the domain discrimination network tries to distinguish images of the source domain from images of the target domain as well as possible, while the feature extraction network tries to extract domain-invariant features, thereby confusing the domain discrimination network into misjudgment, that is, making the domain discrimination network unable to tell whether an image feature comes from the source domain or the target domain;
the domain-invariant features refer to the image semantic information shared by the source domain and the target domain;
the adversarial objective function means that the feature extraction network minimizes, and the domain discrimination network maximizes, the following adversarial loss function:

\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h(D(E(x_i))) + \mathbb{E}_{x_i \sim X_t} \log(1 - h(D(E(x_i))))

wherein,
L_{adv} is the adversarial loss function;
\min_{\theta_E} \max_{\theta_D} expresses that the optimization target of the feature extraction network E is to minimize the adversarial loss function, while the optimization target of the domain discrimination network D is to maximize it;
X_s is the set of source-domain samples;
X_t is the set of target-domain samples;
\theta_D are the parameters of the domain discrimination network;
the feature discrimination force improving module:
setting a class center point for each class, and adding a center loss function to enable the image characteristics of the source field to be close to the center point of the corresponding class, so that samples scattered in inter-class areas are reduced, and the discrimination of the extracted characteristics is improved;
the center loss function: by calculating the euclidean distance of each feature from the center point of the corresponding class, the following center loss function can be obtained:
wherein,
a central loss function representing a minimized source domain;
Lcsa central loss function representing the source domain, which is associated with the feature extraction network E;
the training targets representing the network are: optimizing a parameter θ of a feature extraction networkEMinimizing the central loss of the source domain;
is class yiA class center point of (1);
during the first iteration, the center points of each category are initialized by using the data of the current batch, and then the center points are updated in the following way:
wherein,
is the center point of the kth class at the t +1 th iteration;
representing the center point of the kth category at the time of the t-th iteration;
γ is the update rate of the class center point;
k is the number of categories;
representing a center pointThe calculation formula of (2) is as follows:
wherein,
Btis the batch data at the t-th iteration;
i (.) represents the indicator function when yiWhen k is true, I (y)iK) 1, otherwise 0;
Nkis the number of class k samples in the batch;
the conditional probability alignment module:
predicting labels according to the obtained categories of the target domain imagesDesigning a central loss function of a minimized target field to align the class conditional probabilities P (X | Y) of the images of the source field and the target field, so that the images of the target field are close to the corresponding class centers, thereby aligning the distribution of the two fields and enabling the image characteristics of the target field without annotation to have discriminative power;
the minimization of the central loss function of the target domain is expressed as follows:
wherein,
a central loss function representing a minimization target area;
Lctrepresenting a center loss function of the target field obtained by calculating Euclidean distances between the sample characteristics of the target field and the corresponding class center points;
the optimization objectives for the representation network are: optimizing a parameter θ of a feature extraction networkEMinimizing the central loss of the target area;
representing categoriesA class center point of (1);
Φ(Xt) Is a subset of the target domain, where the samples satisfy the following condition:
Φ(Xt)={xi∈Xtand max(p(xi))>T}
wherein,
p(xi) Represents a K-dimensional vector whose K-th dimension represents a sample xiProbability of being class k;
t represents a threshold value, and a predictive tag is only trusted if its probability is greater than this threshold value.
According to the present invention, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above unsupervised domain adaptation methods based on adversarial learning.
The present invention will be described more specifically below with reference to preferred examples.
Preferred example 1:
an unsupervised domain adaptation method based on domain-invariant adversarial learning comprises the following steps:
a feature extraction step: extracting features of the source-domain and target-domain images with a deep convolutional neural network to obtain the source-domain image features and the target-domain image features;
a category prediction step: predicting, from the obtained source-domain and target-domain image features, the probability of each category with a classification network composed of a fully connected layer and a softmax layer; a category here refers to the kind of object contained in the image, such as a cat, a dog, or a car;
a domain discrimination step: predicting, for the source-domain and target-domain image features obtained in the feature extraction step, the probability that the features come from the source domain or the target domain through a domain discrimination network;
an adversarial learning step: designing a loss function on the domain prediction probability obtained in the domain discrimination step and making the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features, i.e., image semantic information shared by the source and target domains;
a feature discrimination boosting step: improving the discriminative power of the source-domain image features obtained in the feature extraction step with a central loss function;
a conditional probability alignment step: performing conditional probability alignment on the source-domain and target-domain image features obtained in the feature extraction step.
The feature extraction step, wherein:
using a deep convolutional neural network model, the source-domain and target-domain images are input into a shared feature extraction network, which extracts the source-domain and target-domain image features, i.e., the semantic information shared by the two domains, while ignoring domain-related information;
the source-domain and target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images have no label information.
The class prediction step, wherein: class prediction is performed on the source-domain and target-domain images with a classification network composed of a fully connected layer and a softmax layer.
The category prediction step specifically comprises the following steps:
we denote the feature extraction network by E; $E(x_i)$ represents the feature of image $x_i$ extracted by the feature extraction network, of dimensionality N; C represents the classification network formed by the fully connected layer. Assuming a total of K classes, the parameters of the fully connected layer form an N×K matrix, denoted W, and the output of the fully connected layer is:

$$C(E(x_i)) = W^{\top} E(x_i)$$

The output of the fully connected layer is converted by the softmax layer into the probability of image $x_i$ belonging to each class, where the probability that image $x_i$ belongs to class k is:

$$P_k(x_i) = \frac{e^{[W^{\top} E(x_i)]_k}}{\sum_{j=1}^{K} e^{[W^{\top} E(x_i)]_j}}$$

wherein, $[W^{\top} E(x_i)]_k$ is the value of the k-th dimension of $W^{\top} E(x_i)$. After computing the probability of the image belonging to each class, we obtain the prediction label of image $x_i$, i.e., the class with the highest prediction probability:

$$\hat{y}_i = \arg\max_{k} P_k(x_i)$$
for the labelled source-domain images, comparing the probabilities obtained in the class prediction step with the corresponding labels, the following loss function can be calculated (this loss function drives the learning of the classification network, so that the model learns to predict the class of an image):

$$\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)}\, H\big(P(x_i),\, y_i\big)$$

wherein, $L_s$ represents the class prediction loss function;
$\min_{\theta_E, \theta_C}$ represents the training target of the network: optimize the parameter $\theta_E$ of the feature extraction network and the parameter $\theta_C$ of the class prediction network to minimize the class prediction loss;
$\theta_E$ represents the parameters of the feature extraction network;
$\theta_C$ represents the parameters of the class prediction network;
$(X_s, Y_s)$ represents the distribution of source-domain images and labels;
$x_i$ represents an image of the source domain;
$y_i$ represents the class label, in one-hot form, i.e., the label of the k-th class is a K-dimensional vector whose k-th dimension is 1 and all other dimensions are 0;
H denotes the cross-entropy function.
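A minimal PyTorch-style sketch of this class prediction step and its loss follows; the dimensions `N`, `K` and the helper name `class_prediction_loss` are illustrative assumptions (the patent does not fix them):

```python
import torch.nn as nn
import torch.nn.functional as F

N, K = 256, 10                          # feature dim and class count (assumed)
C = nn.Linear(N, K, bias=False)         # fully connected layer; weight acts as W^T

def class_prediction_loss(feats_s, labels_s):
    """Cross-entropy loss L_s on labelled source-domain features E(x_i)."""
    logits = C(feats_s)                 # W^T E(x_i), shape (B, K)
    probs = F.softmax(logits, dim=1)    # P(x_i): per-class probabilities
    pred = probs.argmax(dim=1)          # predicted class \hat{y}_i
    # F.cross_entropy applies log-softmax internally, i.e. H(P(x_i), y_i)
    return F.cross_entropy(logits, labels_s), pred
```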
The domain discrimination step, wherein: the domain discrimination network is composed of fully connected layers (here, three fully connected layers whose learned parameters are their own, and which are therefore distinct from the classification network above); it outputs the probability that a feature belongs to the source domain or the target domain.
The domain discrimination step is specifically as follows:
suppose the feature of image $x_i$ extracted by the feature extraction network E is $E(x_i)$, of dimensionality N; D represents the domain discrimination network composed of 3 fully connected layers, whose output is $D(E(x_i))$. Since there are 2 domains in total (the source domain and the target domain), we let $D(E(x_i))$ have dimension 1 and convert it to the interval [0, 1] with the sigmoid function h; $h(D(E(x_i)))$ then represents the probability that image $x_i$ comes from the source domain, and $1 - h(D(E(x_i)))$ the probability that the image comes from the target domain. The sigmoid function can be expressed as:

$$h(z) = \frac{1}{1 + e^{-z}}$$
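A minimal sketch of such a 3-layer fully connected domain discrimination network in PyTorch; the hidden width and the class name are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DomainDiscriminator(nn.Module):
    """3-layer fully connected domain discrimination network D.

    Outputs a single logit per feature; the sigmoid h maps it to the
    probability that the feature comes from the source domain.
    """
    def __init__(self, n_features=256, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),          # dimension-1 output D(E(x_i))
        )

    def forward(self, feats):
        # h(D(E(x_i))) in [0, 1]
        return torch.sigmoid(self.net(feats)).squeeze(1)
```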
The adversarial learning step, wherein: the feature extraction network and the domain discrimination network play a min-max game (the feature extraction network minimizes the adversarial loss function while the domain discrimination network maximizes it): the domain discrimination network tries to distinguish source-domain images from target-domain images, while the feature extraction network tries to confuse the domain discrimination network by extracting domain-invariant features, so that the domain discrimination network cannot tell which domain a feature comes from.
The adversarial learning step is specifically as follows:
the feature extraction network and the domain discrimination network play a min-max game, with the following objective function:

$$\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h\big(D(E(x_i))\big) + \mathbb{E}_{x_i \sim X_t} \log\Big(1 - h\big(D(E(x_i))\big)\Big)$$

wherein,
$L_{adv}$ represents the adversarial loss function;
$\min_{\theta_E} \max_{\theta_D}$ indicates that the optimization target of the feature extraction network E is to minimize the adversarial loss function, while the optimization target of the domain discrimination network D is to maximize it;
$X_s$ represents the set of source-domain samples;
$X_t$ represents the set of target-domain samples;
$\theta_E$ is the parameter of the feature extraction network;
$\theta_D$ is the parameter of the domain discrimination network;
$h(D(E(x_i)))$ is the probability that image $x_i$ comes from the source domain.
Through such adversarial learning, the domain discrimination network tries to make the predicted source-domain probability as large as possible for source-domain images and as small as possible for target-domain images; the feature extraction network acts in the opposite direction, aiming to extract domain-invariant features so as to confuse the domain discrimination network into misjudging.
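One common way to realize this min-max game is to alternate updates of the two networks, as in the hedged PyTorch-style sketch below; the function name, the optimizers, and the small `eps` added for numerical stability are illustrative assumptions (a gradient-reversal layer would be an equivalent realization):

```python
import torch

def adversarial_step(E, D, opt_E, opt_D, x_src, x_tgt, eps=1e-8):
    """One alternating min-max update; E and D are the networks above."""
    # --- max over theta_D: the discriminator separates the two domains ---
    p_src = D(E(x_src).detach())        # h(D(E(x_i))) for source images
    p_tgt = D(E(x_tgt).detach())        # ... and for target images
    loss_D = -(torch.log(p_src + eps).mean()
               + torch.log(1.0 - p_tgt + eps).mean())   # ascend L_adv
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- min over theta_E: the features confuse the discriminator ---
    p_src = D(E(x_src))
    p_tgt = D(E(x_tgt))
    loss_E = (torch.log(p_src + eps).mean()
              + torch.log(1.0 - p_tgt + eps).mean())    # descend L_adv
    opt_E.zero_grad(); loss_E.backward(); opt_E.step()
    return loss_D.item(), loss_E.item()
```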
The feature discrimination boosting step, wherein: assuming each class has a class center point, adding a center loss function draws the source-domain image features close to the center point of their corresponding class, thereby reducing the number of samples scattered in inter-class regions and improving the discriminative power of the extracted features.
The feature discrimination boosting step is specifically as follows:
assuming each class has a class center point, after obtaining the features of the source-domain samples, the following center loss function can be obtained by computing the Euclidean distance between each feature and the corresponding class center point:

$$\min_{\theta_E} L_{cs} = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} \left\| E(x_i) - c_{y_i} \right\|_2^2$$

wherein,
$\min_{\theta_E} L_{cs}$ represents the minimized central loss function of the source domain;
$L_{cs}$ represents the central loss function of the source domain, which is associated with the feature extraction network E;
the training target of the network is: optimize the parameter $\theta_E$ of the feature extraction network to minimize the central loss of the source domain;
$c_{y_i}$ is the class center point of class $y_i$.
Usually, the center point of each class would be calculated from the features of all images of that class in the entire dataset; since the batch-wise training of the network makes computation over the entire dataset infeasible, the class center points are instead updated continuously during training by the iterative scheme described below. During the first iteration, the center point of each class is initialized with the data of the current batch; the center points are then updated as follows:

$$c_k^{t+1} = (1 - \gamma)\, c_k^{t} + \gamma\, \tilde{c}_k^{\,t}, \qquad k = 1, \dots, K$$

wherein,
$c_k^{t+1}$ is the center point of the k-th class at the (t+1)-th iteration;
$c_k^{t}$ represents the center point of the k-th class at the t-th iteration;
$\gamma$ is the update rate of the class center points;
$K$ is the number of classes;
$\tilde{c}_k^{\,t}$ represents the center point of class k computed from the current batch; its calculation formula is:

$$\tilde{c}_k^{\,t} = \frac{1}{N_k} \sum_{x_i \in B_t} I(y_i = k)\, E(x_i)$$

wherein,
$B_t$ is the batch data at the t-th iteration;
$I(\cdot)$ represents the indicator function: $I(y_i = k) = 1$ when $y_i = k$ is true, and 0 otherwise;
$N_k$ is the number of class-k samples in the batch.
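As a usage illustration of the initialization-then-update scheme just described, reusing the hypothetical `update_centers` helper sketched earlier (the names `K`, `N`, `features`, `labels` are assumed to be defined as in that sketch):

```python
import torch

centers = torch.zeros(K, N)                  # one center per class, c_k
for k in range(K):
    mask = labels == k
    if mask.any():
        centers[k] = features[mask].mean(0)  # first iteration: init from batch
# subsequent iterations: EMA update with rate gamma
centers = update_centers(centers, features, labels, gamma=0.3)
```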
The conditional probability alignment step, wherein: since the target-domain images lack labels, it is difficult to directly align the conditional probabilities P(Y|X) of the two domains; we therefore approximately align them by aligning the class-conditional probabilities P(X|Y) of the two domains. Using the category prediction label $\hat{y}_i$ of each target-domain image obtained in the category prediction step, the class-conditional probabilities P(X|Y) of the source-domain and target-domain images are aligned so that the target-domain images are drawn close to their corresponding class centers, better aligning the distributions of the two domains and giving the unlabeled target-domain image features better discriminative power.
The conditional probability alignment step is specifically as follows:
besides the marginal probability distributions, the conditional probability distributions of the two domains usually also differ, i.e., P(Y|X) differs. Since the target domain carries no label information, directly aligning the conditional probability distributions of the two domains is difficult; this is therefore approximated by aligning the class-conditional distributions P(X|Y). Using the classes of the target-domain images obtained in the class prediction step, the following loss function is designed to align P(X|Y):

$$\min_{\theta_E} L_{ct} = \mathbb{E}_{x_i \sim \Phi(X_t)} \left\| E(x_i) - c_{\hat{y}_i} \right\|_2^2$$

wherein,
$\min_{\theta_E} L_{ct}$ represents the minimized central loss function of the target domain;
$L_{ct}$ represents the central loss function of the target domain, obtained by computing the Euclidean distance between each target-domain sample feature and its corresponding class center point;
the optimization objective of the network is: optimize the parameter $\theta_E$ of the feature extraction network to minimize the central loss of the target domain;
$\hat{y}_i$ is the prediction label of sample $x_i$;
$\Phi(X_t)$ is the subset of the target domain whose samples satisfy the following condition:

$$\Phi(X_t) = \{\, x_i \in X_t \mid \max(p(x_i)) > T \,\}$$

wherein, $p(x_i)$ is a K-dimensional vector whose k-th dimension represents the probability that sample $x_i$ belongs to class k; T is a threshold, and a predicted label is trusted only if its probability is greater than this threshold.
Preferred example 2:
As shown in fig. 1, which is a flowchart of an embodiment of the unsupervised domain adaptation method based on domain-invariant adversarial learning according to the present invention: the method processes the source-domain and target-domain images into source-domain and target-domain image features through the feature extraction step, predicts the classes of the source-domain and target-domain images using the category prediction step, and makes the extracted features domain-invariant through the domain discrimination step and the adversarial learning step, so that the class prediction network optimized on the source-domain images can also be applied to the target domain, realizing domain adaptation. Through the feature discrimination boosting step, the features of same-class images in the source domain become more aggregated and compact, improving the discriminative power of the features. In addition, the conditional probability alignment step aligns the conditional probabilities of the two domains, so that the target-domain features also keep good discriminative power, improving the category prediction accuracy on the target domain.
By sharing the same feature extraction network between the two domains and making the domain discrimination network unable to judge the source of a feature, the invention forces the feature extraction network to extract the features shared between the two domains, i.e., the semantic information of the images, while ignoring the information specific to each domain. Because the same feature extraction network is shared, the image features of the two domains are learned jointly, and at test time it is not necessary to know whether an image comes from the source domain or the target domain.
Specifically, with reference to fig. 1, the method comprises the steps of:
a feature extraction step: extracting features of the source-domain and target-domain images with a deep convolutional neural network to obtain the source-domain image features and the target-domain image features;
a category prediction step: predicting, with a classification network composed of fully connected layers, the probability that the source-domain and target-domain image features obtained in the feature extraction step belong to each category;
a domain discrimination step: predicting, for the source-domain and target-domain image features obtained in the feature extraction step, the probability that the features come from the source domain or the target domain through a domain discrimination network;
an adversarial learning step: designing a loss function on the domain prediction probability obtained in the domain discrimination step and making the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features, i.e., image semantic information shared by the source and target domains;
a feature discrimination boosting step: improving the discriminative power of the source-domain image features obtained in the feature extraction step with a central loss function;
a conditional probability alignment step: performing conditional probability alignment on the source-domain and target-domain image features obtained in the feature extraction step.
Corresponding to the method, the invention also provides an embodiment of an unsupervised domain adaptation system based on domain-invariant adversarial learning, comprising:
a feature extraction module: extracting features of the source-domain and target-domain images with a deep convolutional neural network to obtain the source-domain image features and the target-domain image features;
a category prediction module: predicting, with a classification network composed of fully connected layers, the probability that the source-domain and target-domain image features obtained by the feature extraction module belong to each category;
a domain discrimination module: predicting, for the source-domain and target-domain image features obtained by the feature extraction module, the probability that the features come from the source domain or the target domain through a domain discrimination network;
an adversarial learning module: designing a loss function on the domain prediction probability obtained by the domain discrimination module and making the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features, i.e., image semantic information shared by the source and target domains;
a feature discrimination boosting module: improving the discriminative power of the source-domain image features obtained by the feature extraction module with a central loss function;
a conditional probability alignment module: performing conditional probability alignment on the source-domain and target-domain image features obtained by the feature extraction module.
The technical features realized by each module of the unsupervised domain adaptation system based on domain-invariant adversarial learning can be the same as the technical features realized by the corresponding steps in the unsupervised domain adaptation method based on domain-invariant adversarial learning.
Specific implementations of the above steps and modules are described in detail below to facilitate understanding of the technical solutions of the present invention.
In some embodiments of the present invention, the feature extraction step, wherein: using a deep convolutional neural network model, the source-domain and target-domain images are input into a shared feature extraction network, which extracts domain-invariant features, i.e., the semantic information shared by the source and target domains, while ignoring domain-related information. The source-domain and target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images have no label information.
In some embodiments of the present invention, the category predicting step, wherein: and performing class prediction on the image of the source field and the image of the target field by utilizing a classification network formed by full connection layers.
In some embodiments of the present invention, the domain identifying step includes: the domain discrimination network is a network composed of full connection layers and can output the probability that the features belong to the source domain and the target domain.
In some embodiments of the invention, the adversarial learning step, wherein: the feature extraction network and the domain discrimination network play a min-max game: the domain discrimination network tries to distinguish source-domain images from target-domain images, while the feature extraction network tries to confuse the domain discrimination network by extracting domain-invariant features, so that the domain discrimination network cannot tell which domain a feature comes from.
In some embodiments of the present invention, the feature determination force increasing step includes: assuming that each class has a class center point, by adding a center loss function, the image features of the source field will be close to the center point of the corresponding class, thereby reducing samples scattered in the inter-class area and improving the discrimination of the extracted features.
In some embodiments of the present invention, the conditional probability aligning step includes: and aligning the class conditional probabilities P (X | Y) of the images of the source field and the target field by using the class labels obtained in the class prediction step, so that the images of the target field are close to the corresponding class center, and the image characteristics of the unmarked target field have better discrimination.
Specifically, the domain adaptation system network framework, composed of the feature extraction module, the category prediction module, the domain discrimination module, the adversarial learning module, the feature discrimination boosting module and the conditional probability alignment module, is shown in fig. 2; the whole system framework can be trained end to end.
In the system framework of the embodiment shown in fig. 2, the source-domain and target-domain images are input to the feature extraction module, which outputs the features of the source-domain and target-domain images; the feature extraction module is a down-sampling module composed of a series of convolutional layers (plus batch-norm and ReLU layers), and existing network structures such as AlexNet, VGG or ResNet can be used. The features of the source-domain images are input into the category prediction module to predict the classes of the images, yielding the following loss function:

$$\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)}\, H\big(C(E(x_i)),\, y_i\big)$$

wherein, $\theta_E$ is the parameter of the feature extraction network, $\theta_C$ is the parameter of the class prediction network, $(X_s, Y_s)$ represents the distribution of source-domain images and labels, $x_i$ represents an image of the source domain, $y_i$ is its class label, E represents the feature extraction network, C represents the class prediction network, and H represents the cross-entropy function.
In order to extract domain-invariant features, as shown in fig. 2, the extracted source-domain and target-domain image features pass through the domain discrimination module; the domain discrimination network is composed of fully connected layers and outputs the probability that a feature belongs to the source domain or the target domain. The feature extraction network and the domain discrimination network play a min-max game: the domain discrimination network tries to distinguish source-domain images from target-domain images, while the feature extraction network tries to confuse it by extracting domain-invariant features, so that the domain discrimination network cannot tell which domain a feature comes from. The specific objective function is as follows:

$$\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h\big(D(E(x_i))\big) + \mathbb{E}_{x_i \sim X_t} \log\Big(1 - h\big(D(E(x_i))\big)\Big)$$

wherein, $\theta_E$ is the parameter of the feature extraction network, $\theta_D$ is the parameter of the domain discrimination network, and $h(D(E(x_i)))$ is the probability that image $x_i$ comes from the source domain. Through this objective, the domain discrimination network tries to make the predicted source-domain probability as large as possible for source-domain images and as small as possible for target-domain images; the feature extraction network acts in the opposite direction, aiming to extract domain-invariant features so as to confuse the domain discrimination network into misjudging.
In order to make the extracted source-domain features more discriminative, as shown in fig. 2, the extracted source-domain image features pass through the feature discrimination boosting module. Assuming each class has a class center, the features of source-domain images need to be close to the center of their corresponding class so as to reduce the number of samples in inter-class regions; the following center loss function is therefore defined:

$$\min_{\theta_E} L_{cs} = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} \left\| E(x_i) - c_{y_i} \right\|_2^2$$

wherein, $c_{y_i}$ is the class center point of class $y_i$. Usually, the center point of each class would be calculated from the features of all images of that class in the entire dataset; since the batch-wise training of the network makes computation over the entire dataset infeasible, the class center points are updated continuously during training by the following iterative scheme. At the first iteration, the center point of each class is initialized with the data of the current batch, and the center points are then updated as:

$$c_k^{t+1} = (1 - \gamma)\, c_k^{t} + \gamma\, \tilde{c}_k^{\,t}, \qquad k = 1, \dots, K$$

wherein, $c_k^{t+1}$ is the center point of the k-th class at the (t+1)-th iteration, $\gamma$ is the update rate of the class center points, K is the number of classes, and the batch center $\tilde{c}_k^{\,t}$ is computed as:

$$\tilde{c}_k^{\,t} = \frac{1}{N_k} \sum_{x_i \in B_t} I(y_i = k)\, E(x_i)$$

wherein, $B_t$ is the batch data at the t-th iteration; $I(\cdot)$ represents the indicator function, with $I(y_i = k) = 1$ when $y_i = k$ is true and 0 otherwise; $N_k$ is the number of class-k samples in the batch.
through the center loss function, the features of the images of the same category obtained by the feature extraction module can be gathered together, so that the discrimination of the features is increased.
In addition to the marginal probability distributions, the conditional probability distributions of the two domains also differ, i.e., P(Y|X) differs. Since the target domain has no label information, it is difficult to align the conditional probability distributions of the two domains directly, so they are approximately aligned through the class-conditional distributions P(X|Y). Using the classes of the target-domain images obtained in the category prediction step, the following loss function is designed:

$$\min_{\theta_E} L_{ct} = \mathbb{E}_{x_i \sim \Phi(X_t)} \left\| E(x_i) - c_{\hat{y}_i} \right\|_2^2$$

wherein, $\hat{y}_i$ is the prediction label of sample $x_i$, and $\Phi(X_t)$ is the subset of the target domain whose samples satisfy the following condition:

$$\Phi(X_t) = \{\, x_i \in X_t \mid \max(p(x_i)) > T \,\}$$

wherein, $p(x_i)$ is a K-dimensional vector whose k-th dimension represents the probability that sample $x_i$ belongs to class k; T is a threshold, and a predicted label is trusted only if its probability is greater than this threshold.
In summary, by sharing the same feature extraction network between the two domains and making the domain discrimination network unable to judge the source of a feature, the feature extraction network is forced to extract the features shared between the two domains, i.e., the semantic information of the images, while ignoring the information specific to each domain. Since the extracted features are domain-invariant, the class prediction network trained on source-domain images can also be applied to the target domain, realizing domain adaptation. The introduction of the central loss function makes the features of same-class images in the source domain more aggregated and compact, improving the discriminative power of the features. In addition, aligning the conditional probability distributions of the two domains lets the target-domain features keep good discriminative power, improving the category prediction accuracy on the target domain. Finally, because the same feature extraction network is shared, the image features of the two domains are learned jointly, and at test time it is not necessary to know whether an image comes from the source domain or the target domain.
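Pulling the pieces together, the following hedged PyTorch-style sketch shows one joint training step over the four objectives, reusing the hypothetical helpers sketched earlier (`target_center_loss`, `update_centers`, the networks E, C, D); the loss weights `lam_adv`, `lam_cs`, `lam_ct` are hypothetical hyper-parameters, since the patent does not specify how the loss terms are balanced, and the discriminator's own maximization step, shown earlier, would alternate with this one:

```python
import torch
import torch.nn.functional as F

def train_step(E, C, D, centers, opt, x_src, y_src, x_tgt,
               lam_adv=0.1, lam_cs=0.01, lam_ct=0.01, T=0.9):
    """One joint step over L_s, L_adv, L_cs and L_ct for E and C."""
    f_src, f_tgt = E(x_src), E(x_tgt)

    loss_cls = F.cross_entropy(C(f_src), y_src)               # L_s
    p_src, p_tgt = D(f_src), D(f_tgt)
    loss_adv = (torch.log(p_src + 1e-8).mean()
                + torch.log(1.0 - p_tgt + 1e-8).mean())       # L_adv, minimized by E
    loss_cs = ((f_src - centers[y_src]) ** 2).sum(1).mean()   # L_cs, source centers
    probs_t = F.softmax(C(f_tgt), dim=1)                      # p(x_i) on the target
    loss_ct = target_center_loss(f_tgt, probs_t, centers, T)  # L_ct over Phi(X_t)

    loss = loss_cls + lam_adv * loss_adv + lam_cs * loss_cs + lam_ct * loss_ct
    opt.zero_grad(); loss.backward(); opt.step()
    return update_centers(centers, f_src.detach(), y_src)     # refresh c_k
```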
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus and modules thereof provided by the present invention purely as computer-readable program code, the same functions can be implemented entirely by logically programming the method steps, so that the systems, apparatus and modules are realized in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. The systems, apparatus and modules provided by the present invention can therefore be regarded as hardware components; the modules they contain for realizing various programs can also be regarded as structures within the hardware components, and modules for realizing various functions can be regarded both as software programs implementing the methods and as structures within the hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (10)
1. An unsupervised domain adaptation method based on adversarial learning, comprising:
a feature extraction step: extracting features of the source-domain and target-domain images with a feature extraction network to obtain the source-domain image features and the target-domain image features;
a category prediction step: predicting, from the obtained source-domain and target-domain image features, the probability that each image belongs to each category, obtaining the category prediction probability;
a domain discrimination step: predicting, from the obtained source-domain and target-domain image features, the probability that the features come from the source domain or the target domain through a domain discrimination network, obtaining the domain prediction probability;
an adversarial learning step: designing a loss function on the obtained domain prediction probability and making the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features;
a feature discrimination boosting step: improving the discriminative power of the obtained source-domain image features with a central loss function;
a conditional probability alignment step: performing conditional probability alignment on the obtained source-domain and target-domain image features according to the obtained category prediction probability.
2. The unsupervised domain adaptation method based on adversarial learning according to claim 1, characterized in that said feature extraction step:
inputs the source-domain and target-domain images into the feature extraction network and extracts the source-domain image features and the target-domain image features;
the feature extraction network is a deep convolutional neural network;
the source-domain and target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images have no label information.
3. The unsupervised domain adaptation method based on adversarial learning according to claim 2, characterized in that said class prediction step:
predicts, from the obtained source-domain and target-domain image features, the probability of each category with a classification network composed of a fully connected layer and a softmax layer, obtaining the category prediction probability;
a category refers to the kind of object contained in the image;
the category prediction step comprises:
a probability calculation step: denote the feature extraction network by E; $E(x_i)$ represents the feature of image $x_i$ extracted by the feature extraction network, of dimensionality N; C represents the classification network formed by the fully connected layer; assuming a total of K classes, the parameters of the fully connected layer form an N×K matrix, denoted W, and the output of the fully connected layer is:

$$C(E(x_i)) = W^{\top} E(x_i)$$
wherein,
$E(x_i)$ represents the feature of image $x_i$ extracted by the feature extraction network;
$C(E(x_i))$ represents the output obtained after the input $E(x_i)$ passes through the fully connected layer of the classification network C;
the superscript ⊤ denotes transposition;
W represents the N×K parameter matrix of the fully connected layer;
the output of the fully connected layer is converted by the softmax layer into the probability of image $x_i$ belonging to each class, where the probability that image $x_i$ belongs to class k is:

$$P_k(x_i) = \frac{e^{[W^{\top} E(x_i)]_k}}{\sum_{j=1}^{K} e^{[W^{\top} E(x_i)]_j}}$$

wherein,
$P_k(x_i)$ represents the probability that image $x_i$ belongs to class k;
e represents the natural constant;
$[W^{\top} E(x_i)]_k$ is the value of the k-th dimension of $W^{\top} E(x_i)$;
after calculating the probability of the image belonging to each category, the prediction label of image $x_i$, i.e., the class with the highest prediction probability, is obtained as follows:

$$\hat{y}_i = \arg\max_{k} P_k(x_i)$$

wherein,
$\hat{y}_i$ represents the category prediction label of image $x_i$;
a classification network learning step: for the labelled source-domain images, comparing the per-class probabilities of image $x_i$ obtained in the probability calculation step with the corresponding labels, the following classification network loss function can be calculated:

$$\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)}\, H\big(P(x_i),\, y_i\big)$$

wherein, $L_s$ represents the class prediction loss function;
$\min_{\theta_E, \theta_C}$ represents the training target of the network: optimize the parameter $\theta_E$ of the feature extraction network and the parameter $\theta_C$ of the class prediction network to minimize the class prediction loss;
$\theta_E$ represents the parameters of the feature extraction network;
$\theta_C$ represents the parameters of the class prediction network;
$(X_s, Y_s)$ represents the distribution of source-domain images and labels;
$P(x_i)$ represents the probabilities of image $x_i$ for each category;
$x_i$ represents an image of the source domain;
$y_i$ represents the class label, in one-hot form, i.e., the label of the k-th class is a K-dimensional vector whose k-th dimension is 1 and all other dimensions are 0;
H represents the cross-entropy function;
the classification network C and the feature extraction network E are learned according to the obtained classification network loss function, yielding the learned classification network C and feature extraction network E, and the procedure returns to the probability calculation step to continue execution.
4. The unsupervised domain adaptation method based on adversarial learning according to claim 3, characterized in that said domain discrimination step:
the feature of image $x_i$ extracted by the feature extraction network E is $E(x_i)$, of dimensionality N; D represents the domain discrimination network composed of fully connected layers, whose output is $D(E(x_i))$; let $D(E(x_i))$ have dimension 1 and convert it to the interval [0, 1] with the sigmoid function h; $h(D(E(x_i)))$ represents the probability that image $x_i$ comes from the source domain, and $1 - h(D(E(x_i)))$ represents the probability that the image comes from the target domain, where the sigmoid function can be expressed as:

$$h(z) = \frac{1}{1 + e^{-z}}$$
wherein,
$h(D(E(x_i)))$ represents the probability that image $x_i$ comes from the source domain;
$D(E(x_i))$ represents the output of the domain discrimination network composed of fully connected layers.
The adversarial learning step:
according to the obtained domain prediction probability, an adversarial learning objective function is adopted to make the feature extraction network and the domain discrimination network learn adversarially: the domain discrimination network tries to distinguish source-domain images from target-domain images, while the feature extraction network extracts domain-invariant features so as to confuse the domain discrimination network and cause it to misjudge, so that the domain discrimination network cannot tell whether an image feature comes from the source domain or the target domain;
the domain-invariant features refer to image semantic information shared by the source and target domains;
the adversarial learning objective function means that the feature extraction network minimizes the adversarial loss function while the domain discrimination network maximizes it, as follows:

$$\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h\big(D(E(x_i))\big) + \mathbb{E}_{x_i \sim X_t} \log\Big(1 - h\big(D(E(x_i))\big)\Big)$$

wherein,
$L_{adv}$ represents the adversarial loss function;
$\min_{\theta_E} \max_{\theta_D}$ indicates that the optimization target of the feature extraction network E is to minimize the adversarial loss function, while the optimization target of the domain discrimination network D is to maximize it;
$X_s$ represents the set of source-domain samples;
$X_t$ represents the set of target-domain samples;
$\theta_D$ is the parameter of the domain discrimination network.
5. The unsupervised domain adaptation method based on adversarial learning according to claim 4, characterized in that said feature discrimination boosting step:
sets a class center point for each class and adds a center loss function so that the source-domain image features are drawn close to the center point of their corresponding class, reducing the number of samples scattered in inter-class regions and improving the discriminative power of the extracted features;
the center loss function: by calculating the Euclidean distance between each feature and the corresponding class center point, the following center loss function can be obtained:

$$\min_{\theta_E} L_{cs} = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} \left\| E(x_i) - c_{y_i} \right\|_2^2$$

wherein,
$\min_{\theta_E} L_{cs}$ represents the minimized central loss function of the source domain;
$L_{cs}$ represents the central loss function of the source domain, which is associated with the feature extraction network E;
the training target of the network is: optimize the parameter $\theta_E$ of the feature extraction network to minimize the central loss of the source domain;
$c_{y_i}$ is the class center point of class $y_i$;
during the first iteration, the center point of each class is initialized with the data of the current batch; the center points are then updated as follows:

$$c_k^{t+1} = (1 - \gamma)\, c_k^{t} + \gamma\, \tilde{c}_k^{\,t}, \qquad k = 1, \dots, K$$

wherein,
$c_k^{t+1}$ is the center point of the k-th class at the (t+1)-th iteration;
$c_k^{t}$ represents the center point of the k-th class at the t-th iteration;
$\gamma$ is the update rate of the class center points;
$K$ is the number of classes;
$\tilde{c}_k^{\,t}$ represents the center point of class k computed from the current batch; its calculation formula is:

$$\tilde{c}_k^{\,t} = \frac{1}{N_k} \sum_{x_i \in B_t} I(y_i = k)\, E(x_i)$$

wherein,
$B_t$ is the batch data at the t-th iteration;
$I(\cdot)$ represents the indicator function: $I(y_i = k) = 1$ when $y_i = k$ is true, and 0 otherwise;
$N_k$ is the number of class-k samples in the batch.
6. The unsupervised domain adaptation method based on adversarial learning according to claim 5, characterized in that said conditional probability alignment step:
according to the obtained category prediction labels $\hat{y}_i$ of the target-domain images, a central loss function of the target domain is designed and minimized so as to align the class-conditional probabilities P(X|Y) of the source-domain and target-domain images; the target-domain images are thereby drawn close to their corresponding class centers, aligning the distributions of the two domains and giving the unlabeled target-domain image features discriminative power;
the minimized central loss function of the target domain is expressed as follows:

$$\min_{\theta_E} L_{ct} = \mathbb{E}_{x_i \sim \Phi(X_t)} \left\| E(x_i) - c_{\hat{y}_i} \right\|_2^2$$

wherein,
$\min_{\theta_E} L_{ct}$ represents the minimized central loss function of the target domain;
$L_{ct}$ represents the central loss function of the target domain, obtained by computing the Euclidean distance between each target-domain sample feature and its corresponding class center point;
the optimization objective of the network is: optimize the parameter $\theta_E$ of the feature extraction network to minimize the central loss of the target domain;
$c_{\hat{y}_i}$ represents the class center point of category $\hat{y}_i$;
$\Phi(X_t)$ is the subset of the target domain whose samples satisfy the following condition:

$$\Phi(X_t) = \{\, x_i \in X_t \mid \max(p(x_i)) > T \,\}$$
wherein,
$p(x_i)$ represents a K-dimensional vector whose k-th dimension represents the probability that sample $x_i$ belongs to class k;
$T$ represents a threshold: a predicted label is trusted only if its probability is greater than this threshold.
7. An unsupervised domain adaptation system based on adversarial learning, comprising:
a feature extraction module: extracting features of the source-domain and target-domain images with a feature extraction network to obtain the source-domain image features and the target-domain image features;
a category prediction module: predicting, from the obtained source-domain and target-domain image features, the probability that each image belongs to each category, obtaining the category prediction probability;
a domain discrimination module: predicting, from the obtained source-domain and target-domain image features, the probability that the features come from the source domain or the target domain through a domain discrimination network, obtaining the domain prediction probability;
an adversarial learning module: designing a loss function on the obtained domain prediction probability and making the feature extraction network and the domain discrimination network learn adversarially, so that the feature extraction network can extract domain-invariant features;
a feature discrimination boosting module: improving the discriminative power of the obtained source-domain image features with a central loss function;
a conditional probability alignment module: performing conditional probability alignment on the obtained source-domain and target-domain image features according to the obtained category prediction probability.
8. The unsupervised domain adaptation system based on adversarial learning according to claim 7, characterized in that the feature extraction module:
inputs the source-domain and target-domain images into the feature extraction network and extracts the source-domain image features and the target-domain image features;
the feature extraction network is a deep convolutional neural network;
the source-domain and target-domain images come from two differently distributed image sets for the same classification task; the source-domain images carry corresponding labels, while the target-domain images have no label information;
the category prediction module:
predicts, from the obtained source-domain and target-domain image features, the probability of each category with a classification network composed of a fully connected layer and a softmax layer, obtaining the category prediction probability;
a category refers to the kind of object contained in the image;
the category prediction module comprises:
a probability calculation module: denote the feature extraction network by E; $E(x_i)$ represents the feature of image $x_i$ extracted by the feature extraction network, of dimensionality N; C represents the classification network formed by the fully connected layer; assuming a total of K classes, the parameters of the fully connected layer form an N×K matrix, denoted W, and the output of the fully connected layer is:

$$C(E(x_i)) = W^{\top} E(x_i)$$
wherein,
$E(x_i)$ represents the feature of image $x_i$ extracted by the feature extraction network;
$C(E(x_i))$ represents the output obtained after the input $E(x_i)$ passes through the fully connected layer of the classification network C;
the superscript ⊤ denotes transposition;
W represents the N×K parameter matrix of the fully connected layer;
the output of the fully connected layer is converted by the softmax layer into the probability of image $x_i$ belonging to each class, where the probability that image $x_i$ belongs to class k is:

$$P_k(x_i) = \frac{e^{[W^{\top} E(x_i)]_k}}{\sum_{j=1}^{K} e^{[W^{\top} E(x_i)]_j}}$$

wherein,
$P_k(x_i)$ represents the probability that image $x_i$ belongs to class k;
e represents the natural constant;
$[W^{\top} E(x_i)]_k$ is the value of the k-th dimension of $W^{\top} E(x_i)$;
after calculating the probability of the image belonging to each category, the prediction label of image $x_i$, i.e., the class with the highest prediction probability, is obtained as follows:

$$\hat{y}_i = \arg\max_{k} P_k(x_i)$$

wherein,
$\hat{y}_i$ represents the category prediction label of image $x_i$;
a classification network learning module: for the labelled source-domain images, comparing the per-class probabilities of image $x_i$ obtained by the probability calculation module with the corresponding labels, the following classification network loss function can be calculated:

$$\min_{\theta_E, \theta_C} L_s = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)}\, H\big(P(x_i),\, y_i\big)$$

wherein, $L_s$ represents the class prediction loss function;
$\min_{\theta_E, \theta_C}$ represents the training target of the network: optimize the parameter $\theta_E$ of the feature extraction network and the parameter $\theta_C$ of the class prediction network to minimize the class prediction loss;
$\theta_E$ represents the parameters of the feature extraction network;
$\theta_C$ represents the parameters of the class prediction network;
$(X_s, Y_s)$ represents the distribution of source-domain images and labels;
$P(x_i)$ represents the probabilities of image $x_i$ for each category;
$x_i$ represents an image of the source domain;
$y_i$ represents the class label, in one-hot form, i.e., the label of the k-th class is a K-dimensional vector whose k-th dimension is 1 and all other dimensions are 0;
H represents the cross-entropy function;
the classification network C and the feature extraction network E are learned according to the obtained classification network loss function, yielding the learned classification network C and feature extraction network E, and the procedure returns to the probability calculation module to continue execution.
9. The unsupervised domain adaptation system based on adversarial learning according to claim 8, characterized in that the domain discrimination module:
the feature of image $x_i$ extracted by the feature extraction network E is $E(x_i)$, of dimensionality N; D represents the domain discrimination network composed of fully connected layers, whose output is $D(E(x_i))$; let $D(E(x_i))$ have dimension 1 and convert it to the interval [0, 1] with the sigmoid function h; $h(D(E(x_i)))$ represents the probability that image $x_i$ comes from the source domain, and $1 - h(D(E(x_i)))$ represents the probability that the image comes from the target domain, where the sigmoid function can be expressed as:

$$h(z) = \frac{1}{1 + e^{-z}}$$
wherein,
$h(D(E(x_i)))$ represents the probability that image $x_i$ comes from the source domain;
$D(E(x_i))$ represents the output of the domain discrimination network composed of fully connected layers;
the adversarial learning module:
according to the obtained domain prediction probability, an adversarial learning objective function is adopted to make the feature extraction network and the domain discrimination network learn adversarially: the domain discrimination network tries to distinguish source-domain images from target-domain images, while the feature extraction network extracts domain-invariant features so as to confuse the domain discrimination network and cause it to misjudge, so that the domain discrimination network cannot tell whether an image feature comes from the source domain or the target domain;
the domain-invariant features refer to image semantic information shared by the source and target domains;
the adversarial learning objective function means that the feature extraction network minimizes the adversarial loss function while the domain discrimination network maximizes it, as follows:

$$\min_{\theta_E} \max_{\theta_D} L_{adv} = \mathbb{E}_{x_i \sim X_s} \log h\big(D(E(x_i))\big) + \mathbb{E}_{x_i \sim X_t} \log\Big(1 - h\big(D(E(x_i))\big)\Big)$$

wherein,
$L_{adv}$ represents the adversarial loss function;
$\min_{\theta_E} \max_{\theta_D}$ indicates that the optimization target of the feature extraction network E is to minimize the adversarial loss function, while the optimization target of the domain discrimination network D is to maximize it;
$X_s$ represents the set of source-domain samples;
$X_t$ represents the set of target-domain samples;
$\theta_D$ is the parameter of the domain discrimination network;
the feature discrimination boosting module:
sets a class center point for each class and adds a center loss function so that the source-domain image features are drawn close to the center point of their corresponding class, reducing the number of samples scattered in inter-class regions and improving the discriminative power of the extracted features;
the center loss function: by calculating the Euclidean distance between each feature and the corresponding class center point, the following center loss function can be obtained:

$$\min_{\theta_E} L_{cs} = \mathbb{E}_{(x_i, y_i) \sim (X_s, Y_s)} \left\| E(x_i) - c_{y_i} \right\|_2^2$$

wherein,
$\min_{\theta_E} L_{cs}$ represents the minimized central loss function of the source domain;
$L_{cs}$ represents the central loss function of the source domain, which is associated with the feature extraction network E;
the training target of the network is: optimize the parameter $\theta_E$ of the feature extraction network to minimize the central loss of the source domain;
$c_{y_i}$ is the class center point of class $y_i$;
during the first iteration, the center point of each class is initialized with the data of the current batch; the center points are then updated as follows:

$$c_k^{t+1} = (1 - \gamma)\, c_k^{t} + \gamma\, \tilde{c}_k^{\,t}, \qquad k = 1, \dots, K$$

wherein,
$c_k^{t+1}$ is the center point of the k-th class at the (t+1)-th iteration;
$c_k^{t}$ represents the center point of the k-th class at the t-th iteration;
$\gamma$ is the update rate of the class center points;
$K$ is the number of classes;
$\tilde{c}_k^{\,t}$ represents the center point of class k computed from the current batch; its calculation formula is:

$$\tilde{c}_k^{\,t} = \frac{1}{N_k} \sum_{x_i \in B_t} I(y_i = k)\, E(x_i)$$

wherein,
$B_t$ is the batch data at the t-th iteration;
$I(\cdot)$ represents the indicator function: $I(y_i = k) = 1$ when $y_i = k$ is true, and 0 otherwise;
$N_k$ is the number of class-k samples in the batch;
the conditional probability alignment module:
according to the obtained category prediction labels $\hat{y}_i$ of the target-domain images, a central loss function of the target domain is designed and minimized so as to align the class-conditional probabilities P(X|Y) of the source-domain and target-domain images; the target-domain images are thereby drawn close to their corresponding class centers, aligning the distributions of the two domains and giving the unlabeled target-domain image features discriminative power;
the minimized central loss function of the target domain is expressed as follows:

$$\min_{\theta_E} L_{ct} = \mathbb{E}_{x_i \sim \Phi(X_t)} \left\| E(x_i) - c_{\hat{y}_i} \right\|_2^2$$

wherein,
$\min_{\theta_E} L_{ct}$ represents the minimized central loss function of the target domain;
$L_{ct}$ represents the central loss function of the target domain, obtained by computing the Euclidean distance between each target-domain sample feature and its corresponding class center point;
the optimization objective of the network is: optimize the parameter $\theta_E$ of the feature extraction network to minimize the central loss of the target domain;
$c_{\hat{y}_i}$ represents the class center point of category $\hat{y}_i$;
$\Phi(X_t)$ is the subset of the target domain whose samples satisfy the following condition:

$$\Phi(X_t) = \{\, x_i \in X_t \mid \max(p(x_i)) > T \,\}$$
wherein,
$p(x_i)$ represents a K-dimensional vector whose k-th dimension represents the probability that sample $x_i$ belongs to class k;
$T$ represents a threshold: a predicted label is trusted only if its probability is greater than this threshold.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the unsupervised domain adaptation method based on adversarial learning of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910276847.5A CN110135579A (en) | 2019-04-08 | 2019-04-08 | Unsupervised field adaptive method, system and medium based on confrontation study |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910276847.5A CN110135579A (en) | 2019-04-08 | 2019-04-08 | Unsupervised field adaptive method, system and medium based on confrontation study |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110135579A true CN110135579A (en) | 2019-08-16 |
Family
ID=67569292
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910276847.5A Pending CN110135579A (en) | 2019-04-08 | 2019-04-08 | Unsupervised field adaptive method, system and medium based on confrontation study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135579A (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717426A (en) * | 2019-09-27 | 2020-01-21 | 卓尔智联(武汉)研究院有限公司 | Garbage classification method based on domain adaptive learning, electronic equipment and storage medium |
CN110837850A (en) * | 2019-10-23 | 2020-02-25 | 浙江大学 | Unsupervised domain adaptation method based on counterstudy loss function |
CN111091127A (en) * | 2019-12-16 | 2020-05-01 | 腾讯科技(深圳)有限公司 | Image detection method, network model training method and related device |
CN111179254A (en) * | 2019-12-31 | 2020-05-19 | 复旦大学 | Domain-adaptive medical image segmentation method based on feature function and counterstudy |
CN111259941A (en) * | 2020-01-10 | 2020-06-09 | 中国科学院计算技术研究所 | Cross-domain image classification method and system based on fine-grained domain self-adaption |
CN111368690A (en) * | 2020-02-28 | 2020-07-03 | 珠海大横琴科技发展有限公司 | Deep learning-based video image ship detection method and system under influence of sea waves |
CN111680622A (en) * | 2020-06-05 | 2020-09-18 | 上海一由科技有限公司 | Identity recognition method based on fostering environment |
CN111738315A (en) * | 2020-06-10 | 2020-10-02 | 西安电子科技大学 | Image classification method based on countermeasure fusion multi-source transfer learning |
CN112150407A (en) * | 2019-10-30 | 2020-12-29 | 重庆大学 | Deep learning detection method and system for inclusion defect of aerospace composite material of small sample |
CN112149689A (en) * | 2020-09-28 | 2020-12-29 | 上海交通大学 | Unsupervised domain adaptation method and system based on target domain self-supervised learning |
CN112232293A (en) * | 2020-11-09 | 2021-01-15 | 腾讯科技(深圳)有限公司 | Image processing model training method, image processing method and related equipment |
CN112270367A (en) * | 2020-11-05 | 2021-01-26 | 四川大学 | Semantic information-based method for enhancing robustness of deep learning model |
CN112308158A (en) * | 2020-11-05 | 2021-02-02 | 电子科技大学 | Multi-source field self-adaptive model and method based on partial feature alignment |
CN112446239A (en) * | 2019-08-29 | 2021-03-05 | 株式会社理光 | Neural network training and target detection method, device and storage medium |
CN112990387A (en) * | 2021-05-17 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Model optimization method, related device and storage medium |
CN113065662A (en) * | 2020-01-02 | 2021-07-02 | 阿里巴巴集团控股有限公司 | Data processing method, self-learning system and electronic equipment |
CN113190733A (en) * | 2021-04-27 | 2021-07-30 | 中国科学院计算技术研究所 | Network event popularity prediction method and system based on multiple platforms |
CN113298189A (en) * | 2021-06-30 | 2021-08-24 | 广东工业大学 | Cross-domain image classification method based on unsupervised domain self-adaption |
CN113344044A (en) * | 2021-05-21 | 2021-09-03 | 北京工业大学 | Cross-species medical image classification method based on domain self-adaptation |
CN113361467A (en) * | 2021-06-30 | 2021-09-07 | 电子科技大学 | License plate recognition method based on field adaptation |
CN113688867A (en) * | 2021-07-20 | 2021-11-23 | 广东工业大学 | Cross-domain image classification method |
CN113887534A (en) * | 2021-12-03 | 2022-01-04 | 腾讯科技(深圳)有限公司 | Determination method of object detection model and related device |
WO2022193628A1 (en) * | 2021-03-15 | 2022-09-22 | 华南理工大学 | Colon lesion intelligent recognition method and system based on unsupervised transfer picture classification, and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108009633A (en) * | 2017-12-15 | 2018-05-08 | 清华大学 | A kind of Multi net voting towards cross-cutting intellectual analysis resists learning method and system |
CN108053030A (en) * | 2017-12-15 | 2018-05-18 | 清华大学 | A kind of transfer learning method and system of Opening field |
CN108921281A (en) * | 2018-05-08 | 2018-11-30 | 中国矿业大学 | A kind of field adaptation method based on depth network and countermeasure techniques |
-
2019
- 2019-04-08 CN CN201910276847.5A patent/CN110135579A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108009633A (en) * | 2017-12-15 | 2018-05-08 | 清华大学 | A kind of Multi net voting towards cross-cutting intellectual analysis resists learning method and system |
CN108053030A (en) * | 2017-12-15 | 2018-05-18 | 清华大学 | A kind of transfer learning method and system of Opening field |
CN108921281A (en) * | 2018-05-08 | 2018-11-30 | 中国矿业大学 | A kind of field adaptation method based on depth network and countermeasure techniques |
Non-Patent Citations (1)
Title |
---|
YEXUN ZHANG 等: "Domain-Invariant Adversarial Learning for Unsupervised Domain Adaption", 《ARXIV:1811.12751V1 [CS.CV]》 * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112446239A (en) * | 2019-08-29 | 2021-03-05 | 株式会社理光 | Neural network training and target detection method, device and storage medium |
CN110717426A (en) * | 2019-09-27 | 2020-01-21 | 卓尔智联(武汉)研究院有限公司 | Garbage classification method based on domain adaptive learning, electronic equipment and storage medium |
CN110837850A (en) * | 2019-10-23 | 2020-02-25 | 浙江大学 | Unsupervised domain adaptation method based on adversarial learning loss function
CN110837850B (en) * | 2019-10-23 | 2022-06-21 | 浙江大学 | Unsupervised domain adaptation method based on adversarial learning loss function
CN112150407A (en) * | 2019-10-30 | 2020-12-29 | 重庆大学 | Deep learning detection method and system for inclusion defects in small-sample aerospace composite materials
CN112150407B (en) * | 2019-10-30 | 2022-09-30 | 重庆大学 | Deep learning detection method and system for inclusion defects in small-sample aerospace composite materials
CN111091127A (en) * | 2019-12-16 | 2020-05-01 | 腾讯科技(深圳)有限公司 | Image detection method, network model training method and related device |
CN111179254A (en) * | 2019-12-31 | 2020-05-19 | 复旦大学 | Domain-adaptive medical image segmentation method based on feature functions and adversarial learning
CN111179254B (en) * | 2019-12-31 | 2023-05-30 | 复旦大学 | Domain-adaptive medical image segmentation method based on feature functions and adversarial learning
CN113065662A (en) * | 2020-01-02 | 2021-07-02 | 阿里巴巴集团控股有限公司 | Data processing method, self-learning system and electronic equipment |
CN111259941A (en) * | 2020-01-10 | 2020-06-09 | 中国科学院计算技术研究所 | Cross-domain image classification method and system based on fine-grained domain adaptation
CN111259941B (en) * | 2020-01-10 | 2023-09-26 | 中国科学院计算技术研究所 | Cross-domain image classification method and system based on fine-grained domain adaptation
CN111368690A (en) * | 2020-02-28 | 2020-07-03 | 珠海大横琴科技发展有限公司 | Deep learning-based video image ship detection method and system under influence of sea waves |
CN111680622B (en) * | 2020-06-05 | 2023-08-01 | 上海一由科技有限公司 | Identity recognition method based on fostering environment
CN111680622A (en) * | 2020-06-05 | 2020-09-18 | 上海一由科技有限公司 | Identity recognition method based on fostering environment
CN111738315A (en) * | 2020-06-10 | 2020-10-02 | 西安电子科技大学 | Image classification method based on countermeasure fusion multi-source transfer learning |
CN112149689A (en) * | 2020-09-28 | 2020-12-29 | 上海交通大学 | Unsupervised domain adaptation method and system based on target domain self-supervised learning |
CN112149689B (en) * | 2020-09-28 | 2022-12-09 | 上海交通大学 | Unsupervised domain adaptation method and system based on target domain self-supervised learning |
CN112308158A (en) * | 2020-11-05 | 2021-02-02 | 电子科技大学 | Multi-source domain adaptation model and method based on partial feature alignment
CN112308158B (en) * | 2020-11-05 | 2021-09-24 | 电子科技大学 | Multi-source domain adaptation model and method based on partial feature alignment
CN112270367A (en) * | 2020-11-05 | 2021-01-26 | 四川大学 | Semantic information-based method for enhancing robustness of deep learning model |
CN112232293A (en) * | 2020-11-09 | 2021-01-15 | 腾讯科技(深圳)有限公司 | Image processing model training method, image processing method and related equipment |
WO2022193628A1 (en) * | 2021-03-15 | 2022-09-22 | 华南理工大学 | Intelligent colon lesion recognition method and system based on unsupervised transfer image classification, and medium
CN113190733B (en) * | 2021-04-27 | 2023-09-12 | 中国科学院计算技术研究所 | Network event popularity prediction method and system based on multiple platforms |
CN113190733A (en) * | 2021-04-27 | 2021-07-30 | 中国科学院计算技术研究所 | Network event popularity prediction method and system based on multiple platforms |
CN112990387A (en) * | 2021-05-17 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Model optimization method, related device and storage medium |
CN112990387B (en) * | 2021-05-17 | 2021-07-20 | 腾讯科技(深圳)有限公司 | Model optimization method, related device and storage medium |
CN113344044A (en) * | 2021-05-21 | 2021-09-03 | 北京工业大学 | Cross-species medical image classification method based on domain adaptation
CN113344044B (en) * | 2021-05-21 | 2024-05-28 | 北京工业大学 | Cross-species medical image classification method based on domain adaptation
CN113298189A (en) * | 2021-06-30 | 2021-08-24 | 广东工业大学 | Cross-domain image classification method based on unsupervised domain adaptation
CN113298189B (en) * | 2021-06-30 | 2023-07-07 | 广东工业大学 | Cross-domain image classification method based on unsupervised domain adaptation
CN113361467A (en) * | 2021-06-30 | 2021-09-07 | 电子科技大学 | License plate recognition method based on domain adaptation
CN113688867A (en) * | 2021-07-20 | 2021-11-23 | 广东工业大学 | Cross-domain image classification method |
CN113887534A (en) * | 2021-12-03 | 2022-01-04 | 腾讯科技(深圳)有限公司 | Determination method of object detection model and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110135579A (en) | Unsupervised field adaptive method, system and medium based on confrontation study | |
Winkens et al. | Contrastive training for improved out-of-distribution detection | |
CN113076994B (en) | Open-set domain-adaptive image classification method and system |
Dara et al. | Clustering unlabeled data with SOMs improves classification of labeled real-world data | |
Tarawneh et al. | Invoice classification using deep features and machine learning techniques | |
CN113657425A (en) | Multi-label image classification method based on multi-scale and cross-modal attention mechanism | |
Feng et al. | Transductive multi-instance multi-label learning algorithm with application to automatic image annotation | |
Tax et al. | Bag dissimilarities for multiple instance learning | |
CN111898704B (en) | Method and device for clustering content samples | |
Xu et al. | Adaptively denoising proposal collection for weakly supervised object localization | |
CN112734037A (en) | Memory-guidance-based weakly supervised learning method, computer device and storage medium | |
CN115797701A (en) | Target classification method and device, electronic equipment and storage medium | |
Jiang et al. | Dynamic proposal sampling for weakly supervised object detection | |
CN111191033A (en) | Open set classification method based on classification utility | |
Aljundi et al. | Identifying wrongly predicted samples: A method for active learning | |
CN113177554A (en) | Thyroid nodule identification and segmentation method, system, storage medium and equipment | |
CN115565001A (en) | Active learning method based on maximum mean discrepancy adversarial learning |
CN112257787B (en) | Semi-supervised image classification method based on a generative dual-conditional adversarial network structure |
Meena Deshpande | License plate detection and recognition using yolo v4 | |
Lu et al. | Large Class Separation is not what you need for Relational Reasoning-based OOD Detection | |
CN113158878A (en) | Heterogeneous transfer fault diagnosis method, system and model based on subspace |
Jiang et al. | Learning from noisy labels with noise modeling network | |
Li et al. | Learning common and label-specific features for multi-label classification with missing labels | |
Feng et al. | Adaptive all-season image tag ranking by saliency-driven image pre-classification | |
WO2024207311A1 (en) | Method and apparatus for quantifying sample difficulty based on pre-trained models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |