CN117152563A - Training method and device for hybrid target domain adaptive model and computer equipment - Google Patents

Training method and device for hybrid target domain adaptive model and computer equipment

Info

Publication number
CN117152563A
CN117152563A
Authority
CN
China
Prior art keywords
domain
target domain
features
samples
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311337554.6A
Other languages
Chinese (zh)
Other versions
CN117152563B (en)
Inventor
Lu Yuwu (陆玉武)
Hu Xue (胡雪)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN202311337554.6A priority Critical patent/CN117152563B/en
Publication of CN117152563A publication Critical patent/CN117152563A/en
Application granted granted Critical
Publication of CN117152563B publication Critical patent/CN117152563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/094Adversarial learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application relate to the technical field of machine learning and provide a training method, a training apparatus, and computer equipment for a hybrid target domain adaptive model. The method comprises the following steps: acquiring a source domain and a target domain, wherein the source domain comprises a plurality of marked samples and the target domain is obtained by mixing a plurality of sub-target domains, each comprising a plurality of unmarked samples; extracting sample features with a feature extractor, wherein the sample features comprise source domain features of the marked samples, target domain features of the unmarked samples, and fusion features of the source domain and target domain samples; calculating the model loss based on the source domain features, the target domain features, and the fusion features; if the loss function of the hybrid target domain adaptive model is determined to have converged based on the model loss, determining that training of the model is complete; if the loss function has not converged, updating the parameters of the feature extractor, the classifier, and the domain discriminator, and continuing to train the model until the loss function converges. This method improves the performance of the model.

Description

Training method and device for hybrid target domain adaptive model and computer equipment
Technical Field
The application belongs to the technical field of machine learning, and particularly relates to a training method and apparatus for a hybrid target domain adaptive model and to computer equipment.
Background
In deep learning, models are mostly trained on abundant labeled data. In practice, however, manually annotating large amounts of data is expensive and time-consuming, which greatly limits the feasibility of deep learning. Furthermore, conventional deep learning generalizes poorly to a new data set due to the domain bias between data sets. That is, a model trained on one labeled dataset often cannot be used directly on another, unlabeled dataset.
To address these issues, a domain adaptation (Domain Adaptation) method may be used to perform transfer learning, so that a model trained to high accuracy on source domain data can be applied to a target domain with little labeled data, reducing the labor cost of labeling data.
However, existing domain adaptation methods mainly align the entire image across the source and target domains. Not all parts of an image are transferable, and forced alignment of non-transferable areas, such as the background, may lead to negative transfer. Negative transfer can hinder training of the model on the target domain and thereby reduce its performance; for example, the trained model may fail to classify accurately when performing classification tasks.
Disclosure of Invention
In view of this, the embodiment of the application provides a training method, a training device and a computer device for a hybrid target domain adaptive model, which are used for improving the classification accuracy of the model.
A first aspect of an embodiment of the present application provides a training method for a hybrid target domain adaptive model, where the model includes a feature extractor, a classifier, and a domain discriminator, and the method includes:
acquiring a source domain and a target domain, wherein the source domain comprises a plurality of marked samples with real tag information, the target domain comprises a plurality of sub-target domains, each sub-target domain comprises a plurality of unmarked samples, and the target domain is obtained by mixing the unmarked samples of the plurality of sub-target domains;
extracting sample features respectively with the feature extractor, wherein the sample features comprise source domain features of the marked samples, target domain features of the unmarked samples, and fusion features of the samples in the source domain and the samples in the target domain;
calculating a model loss based on the source domain features, the target domain features, and the fusion features, wherein the model loss comprises a source supervised classification loss of the hybrid target domain adaptive model, a domain adversarial loss corresponding to the domain discriminator, and a prediction distribution difference loss corresponding to the classifier and the feature extractor;
if it is determined, based on the model loss, that the loss function of the hybrid target domain adaptive model has converged, determining that training of the hybrid target domain adaptive model is complete;
if the loss function has not converged, updating the parameters of the feature extractor, the classifier, and the domain discriminator, and continuing to train the hybrid target domain adaptive model until the loss function converges.
A second aspect of an embodiment of the present application provides a training apparatus for a hybrid target domain adaptive model, the model including a feature extractor, a classifier, and a domain discriminator, the apparatus comprising:
a data acquisition module, configured to acquire a source domain and a target domain, wherein the source domain comprises a plurality of marked samples with real tag information, the target domain comprises a plurality of sub-target domains, each sub-target domain comprises a plurality of unmarked samples, and the target domain is obtained by mixing the unmarked samples of the plurality of sub-target domains;
a feature extraction module for extracting sample features respectively by a feature extractor, wherein the sample features comprise source domain features of the marked samples, target domain features of the unmarked samples and fusion features of the samples in the source domain and the target domain;
a loss calculation module, configured to calculate the model loss based on the source domain features, the target domain features, and the fusion features, where the model loss includes a source supervised classification loss of the hybrid target domain adaptive model, a domain adversarial loss corresponding to the domain discriminator, and a prediction distribution difference loss corresponding to the classifier and the feature extractor;
a first judging module, configured to determine that training of the hybrid target domain adaptive model is complete if it is determined, based on the model loss, that the loss function of the model has converged;
and a second judging module, configured to update the parameters of the feature extractor, the classifier, and the domain discriminator if the loss function has not converged, and to continue training the hybrid target domain adaptive model until the loss function converges.
A third aspect of an embodiment of the present application provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method according to the first aspect as described above when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as described in the first aspect above.
A fifth aspect of an embodiment of the application provides a computer program product for, when run on a computer device, causing the computer device to perform the method of the first aspect described above.
Compared with the prior art, the embodiment of the application has the following advantages:
The hybrid target domain adaptive model in the embodiments of the application can comprise a feature extractor, a classifier, and a domain discriminator. When the model is trained, training data can be obtained comprising a source domain and a target domain, wherein the source domain can comprise a plurality of marked samples with real tag information, and the target domain can comprise a plurality of unmarked samples coming from a plurality of sub-target domains mixed together. Sample features can be extracted with the feature extractor, and the extracted sample features can include source domain features of the marked samples, target domain features of the unmarked samples, and fusion features of the marked and unmarked samples. Based on the extracted sample features, the model loss can be calculated, so that whether training of the model is complete can be determined from the model loss. When it is determined from the model loss that the loss function has converged, model training can be deemed complete. When it is determined that the loss function has not converged, the parameters of the feature extractor, the classifier, and the domain discriminator can be adjusted, and training of the hybrid target domain adaptive model continues until the loss function converges. In the embodiments of the application, feature fusion of the marked and unmarked samples during training makes the sample features contain the semantic information of the source domain and the style of the target domain; training with these fusion features improves the recognition capability of the classifier and the pseudo-label precision on the target domain, so the trained model achieves higher classification accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following briefly introduces the drawings needed for describing the embodiments or the prior art.
FIG. 1 is a schematic diagram illustrating a comparison of multi-target domain adaptation and hybrid target domain adaptation according to an embodiment of the present application;
FIG. 2 is a schematic view of a feature visualization of hybrid target domain adaptation in a mixed feature space provided by an embodiment of the present application;
FIG. 3 is a schematic step flow diagram of a training method of a hybrid target domain adaptive model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a hybrid target domain adaptive model according to an embodiment of the present application;
FIG. 5 is a graph comparing the effects of different methods provided by embodiments of the present application on discriminative semantic emphasis;
FIG. 6 is a graph showing the comparison of effects of feature clustering based on different models provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a training device for a hybrid target domain adaptive model according to an embodiment of the present application;
fig. 8 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In recent years, deep learning has made remarkable progress in many applications such as image classification, object recognition, and natural language processing. Most of these algorithms are trained on abundant labeled data. In practice, however, manually annotating large amounts of data is expensive and time-consuming, which greatly limits the feasibility of deep learning. Furthermore, conventional deep learning generalizes poorly to a new data set due to domain bias. To solve these problems, unsupervised domain adaptation (UDA: Unsupervised Domain Adaptation) has been widely studied. UDA migrates knowledge from label-rich source domains to unlabeled target domains to solve the domain shift problem.
Most UDA studies focus on adaptation from one or more source domains to a single target domain (STDA: Single-Target Domain Adaptation). However, unlabeled data may come from a number of different domains, making multi-target domain adaptation (MTDA: Multi-Target Domain Adaptation) a research hotspot. Most such work, however, requires the construction of complex network models. In fact, target samples from multiple distributions will typically be mixed with each other, so the blended target domain (BTDA) setting is a more realistic transfer scenario.
Domain adaptation learning (Domain Adaptation Learning) may utilize source domain knowledge associated with a target domain to assist learning on the target domain. Blended-target domain adaptation (BTDA) is one type of domain adaptation learning, motivated by the fact that in the real world the target domains may differ and the distribution of each target class may differ. Current research on BTDA is mostly based on direct adaptation from the source domain to the target domain, which leaves a large domain gap. In addition, it is inevitably affected by irrelevant semantic information, resulting in negative transfer.
BTDA is more challenging than MTDA for the following reasons: the mixed feature space suffers from severe domain shift and category shift, making it difficult for samples to form good cluster structures, and the clusters of one class may overlap with the clusters of another class. Fig. 1 is a schematic comparison of multi-target domain adaptation and hybrid target domain adaptation according to an embodiment of the present application: as shown in fig. 1, the target domains in MTDA are separated from each other, whereas the target domains in BTDA are mixed. Fig. 2 is a schematic feature visualization of hybrid target domain adaptation in the mixed feature space, where the numbers represent categories and samples of the same category may lie in different domains; as shown in fig. 2, a class-1 cluster in one domain may overlap a class-6 cluster in another domain. Hybrid target domain adaptation is therefore more complex than multi-target domain adaptation.
Based on this, the present application proposes a Semantic Dual-adversarial Network (SDN: Semantic Dual-adversarial Network for Blended-target Domain Adaptation) method that can effectively reduce domain differences without using domain labels. Specifically, the method aligns category distributions by extending the output of the domain discriminator to the number of categories; in this way, the domain discriminator and the feature extractor are trained adversarially. To suppress irrelevant semantic information, the classifier is drawn into a min-max game: the classifier strives to maximize the difference between predicted distributions, while the feature extractor strives to minimize it. In this process, irrelevant semantic information is suppressed and the primary semantic information is emphasized. In addition, the application introduces a random-ratio feature fusion scheme to enhance the source domain so that it carries the texture and style of the blended target domain, reducing the domain gap. As a result, the feature distributions of the same class from different domains form good clusters.
The technical scheme of the application is described below through specific examples.
Referring to fig. 3, a schematic step flow diagram of a training method of a hybrid target domain adaptive model according to an embodiment of the present application may specifically include the following steps:
S301, acquiring a source domain and a target domain, wherein the source domain comprises a plurality of marked samples with real tag information, the target domain comprises a plurality of sub-target domains, each sub-target domain comprises a plurality of unmarked samples, and the target domain is obtained by mixing the unmarked samples of the sub-target domains.
The method provided by the embodiments of the application can be applied to computer devices such as mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); the embodiments of the application do not limit the specific type of the computer device.
The hybrid target domain adaptive model can be applied to cross-domain classification, for example cross-domain target recognition and cross-domain image classification; the trained model may be used to perform image classification tasks. The model may include a feature extractor for extracting features of the sample data, a classifier, and a domain discriminator. The feature extractor may be a neural network, for example ResNet-50. The features extracted by the feature extractor are passed to the classifier and the domain discriminator, so that the classifier can classify according to the features, and the domain discriminator can judge whether a sample belongs to the target domain or the source domain based on the features. In addition, in the present application, after training, the domain discriminator can also discriminate category information.
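For illustration, a minimal PyTorch sketch of such a three-part model is given below. The module names, the split point between low-level and high-level layers, and the discriminator architecture are assumptions of this sketch rather than specifics from the patent; only the ResNet-50 backbone, the classifier, and a domain discriminator whose output is widened to the number of categories follow the text.

```python
import torch
import torch.nn as nn
from torchvision import models

class SDNModel(nn.Module):
    """Illustrative three-part architecture: feature extractor F,
    classifier C, and a domain discriminator D_c with one output
    per class (all module names are this sketch's own)."""

    def __init__(self, num_classes: int, feat_dim: int = 2048):
        super().__init__()
        backbone = models.resnet50(weights="IMAGENET1K_V2")
        # Low-level layers: texture/style statistics are taken here.
        self.low_level = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu,
            backbone.maxpool, backbone.layer1)
        # High-level layers: semantic features for classification.
        self.high_level = nn.Sequential(
            backbone.layer2, backbone.layer3, backbone.layer4,
            backbone.avgpool, nn.Flatten())
        self.classifier = nn.Linear(feat_dim, num_classes)
        # Discriminator output widened to num_classes sigmoids so each
        # logit can align one class distribution across domains.
        self.domain_disc = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes), nn.Sigmoid())

    def forward(self, x):
        z = self.low_level(x)      # low-level feature map (B, C, H, W)
        feat = self.high_level(z)  # pooled feature vector (B, 2048)
        return z, feat, self.classifier(feat), self.domain_disc(feat)
```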
The source domain and target domain may be data sets composed of training data for model training. The data in the source domain contains real tag information. The target domain may be a blended target domain, i.e. a mixture of a plurality of different data sets: it may include a plurality of sub-target domains, each containing a plurality of unmarked samples, and the unmarked samples of the sub-target domains are mixed to obtain the target domain. In the embodiments of the application, the source domain containing $n_s$ marked samples may be defined as $\mathcal{S}=\{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, and the $M$ target domains as $\{\mathcal{T}_m\}_{m=1}^{M}$, where $\mathcal{T}_m=\{x_j^{t_m}\}_{j=1}^{n_m}$ contains $n_m$ unmarked samples. The $M$ target domains are mixed together to form the blended target domain $\mathcal{T}$.
The data in the source domain and the target domain may be images, for example. The image in the source domain may have a corresponding label, while the image in the target domain may be unlabeled.
S302, respectively extracting sample features by a feature extractor, wherein the sample features comprise source domain features of the marked samples, target domain features of the unmarked samples and fusion features of the samples in the source domain and the target domain.
In general, feature fusion is a data enhancement method that improves model performance and neural network robustness. Studies have shown that low-level features mainly represent the texture, structure, and style of an image. Prior knowledge from the first few layers of the convolutional neural network is therefore used to extract the texture and style information of the target domain and mix it with the source domain at a random ratio. The resulting sample features then contain the semantic information of the source domain and the style of the target domain. Model training based on this fusion scheme improves the recognition capability of the classifier and the accuracy of the pseudo labels on the target domain.
Illustratively, the low-level feature map may be represented as $z \in \mathbb{R}^{C\times H\times W}$, where $C$ denotes the channel dimension and $H$ and $W$ denote the height and width. First, the channel-wise mean and variance of the source domain and target domain samples are calculated separately:

$$\mu(z) = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} z_{:,h,w},\qquad \sigma^2(z) = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(z_{:,h,w}-\mu(z)\bigr)^2$$

Then, a mixed mean $\mu_{st}$ and mixed standard deviation $\sigma_{st}$ can be obtained using a random number $\alpha$ between 0 and 1:

$$\mu_{st} = \alpha\,\mu(z_s) + (1-\alpha)\,\mu(z_t),\qquad \sigma_{st} = \alpha\,\sigma(z_s) + (1-\alpha)\,\sigma(z_t)$$

Finally, enhanced source features with the target style can be obtained as:

$$z_{st} = \sigma_{st}\,\frac{z_s-\mu(z_s)}{\sigma(z_s)+\epsilon} + \mu_{st}$$

where $\epsilon$ may be set to $10^{-5}$ to prevent the denominator from being 0.
The enhanced source domain serves as a bridge to the target domain for knowledge transfer. Fusing the source features with the target low-level features improves the final recognition effect.
Based on this, when performing feature extraction, the computer device may extract the low-level source domain features of the marked samples and the low-level target domain features of the unmarked samples, respectively, through the low-level network of the feature extractor; fuse the low-level source domain features and the low-level target domain features to obtain low-level fusion features; and pass the low-level source domain features, the low-level target domain features, and the low-level fusion features through the high-level network of the feature extractor to obtain the source domain features, the target domain features, and the fusion features. The low-level fusion features are obtained by the formula above:

$$z_{st} = \sigma_{st}\,\frac{z_s-\mu(z_s)}{\sigma(z_s)+\epsilon} + \mu_{st}$$

wherein $z_{st}$ is the low-level fusion feature, $z_s$ the low-level source domain feature, $z_t$ the low-level target domain feature, the statistics $\mu(\cdot)$ and $\sigma(\cdot)$ are computed over the $H\times W$ spatial positions of the sample image ($H$ the height, $W$ the width), and $\alpha$ is a random number.
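The following is a minimal PyTorch sketch of this random-ratio fusion, assuming batched (B, C, H, W) low-level feature maps of equal batch size and one α drawn per sample; the function name and these batching details are this sketch's assumptions.

```python
import torch

def fuse_low_level_features(z_s: torch.Tensor, z_t: torch.Tensor,
                            eps: float = 1e-5) -> torch.Tensor:
    """Random-ratio style fusion of low-level feature maps (B, C, H, W):
    source content is re-normalized with statistics mixed between the
    source and target samples, following the mu/sigma mixing above."""
    mu_s = z_s.mean(dim=(2, 3), keepdim=True)
    sig_s = z_s.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    mu_t = z_t.mean(dim=(2, 3), keepdim=True)
    sig_t = z_t.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    alpha = torch.rand(z_s.size(0), 1, 1, 1, device=z_s.device)  # in (0, 1)
    mu_st = alpha * mu_s + (1 - alpha) * mu_t      # mixed mean
    sig_st = alpha * sig_s + (1 - alpha) * sig_t   # mixed std
    # Enhanced source features carrying the blended-target style.
    return sig_st * (z_s - mu_s) / sig_s + mu_st
```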
S303, calculating the model loss based on the source domain features, the target domain features, and the fusion features, wherein the model loss comprises the source supervised classification loss of the hybrid target domain adaptive model, the domain adversarial loss corresponding to the domain discriminator, and the prediction distribution difference loss corresponding to the classifier and the feature extractor.
Existing domain adaptation methods mainly align the whole image across the source and target domains. However, not all parts of an image are transferable, and forced alignment of non-transferable areas, such as the background, may result in negative transfer. It is therefore necessary to locate the irrelevant areas that lead to misclassification and suppress their features. The application proposes discriminative semantic adversarial learning, which focuses more on the discriminative features in the image and suppresses the features of wrongly predicted categories. Specifically, the classifier and the feature extractor may be trained adversarially. On one hand, the classifier increases the feature weight of the wrong class by maximizing the predicted distribution difference between pairs of samples of the same class. On the other hand, the feature extractor is trained to suppress the features of the wrong class and emphasize the discriminative feature map regions by minimizing that difference. Through this adversarial training process, the scheme can suppress irrelevant semantic information and carry out purer knowledge transfer.
Specifically, the source domain features, the target domain features, and the fusion features can be input into the classifier respectively to obtain a first prediction label corresponding to the source domain features, a second prediction label corresponding to the target domain features, and a third prediction label corresponding to the fusion features. The source supervised classification loss is calculated based on the first and third prediction labels by the following formula:

$$L_{CE} = \frac{1}{n_s}\sum_{i=1}^{n_s} \ell_{ce}\bigl(C(F(x_i^s)),\,y_i^s\bigr) + \frac{1}{n_s}\sum_{j=1}^{n_s} \ell_{ce}\bigl(C(f(z_{st}^j)),\,y_j^s\bigr)$$

wherein $L_{CE}$ is the source supervised classification loss, $n_s$ the number of samples in the source domain, $\ell_{ce}$ the cross-entropy loss function, $C$ the classifier, $F$ the feature extractor, $x_i^s$ the $i$-th marked sample, $y_i^s$ its real label preset for the source domain sample, $f$ the deep (high-level) network of the feature extractor, $z_{st}^j$ the $j$-th fusion feature, and $y_j^s$ its corresponding real label.
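A short PyTorch sketch of this loss, assuming the classifier logits for the source samples and for the fused features have already been computed; the argument names are this sketch's own.

```python
import torch.nn.functional as F_nn

def source_supervised_loss(cls_logits_src, cls_logits_fused, labels_src):
    """Cross-entropy on (a) labeled source samples and (b) fused
    source-target features, which inherit the source sample's label."""
    return (F_nn.cross_entropy(cls_logits_src, labels_src)
            + F_nn.cross_entropy(cls_logits_fused, labels_src))
```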
The domain adversarial loss may then be calculated based on the first and second prediction labels by the following formula:

$$L_D = -\frac{1}{n_s}\sum_{i=1}^{n_s} (y_i^s)^{\top}\log D_c\bigl(F(x_i^s)\bigr) - \frac{1}{n_t}\sum_{j=1}^{n_t} (\hat{y}_j^t)^{\top}\log\Bigl(1 - D_c\bigl(F(x_j^t)\bigr)\Bigr)$$

wherein $L_D$ is the domain adversarial loss, $n_s$ the number of samples in the source domain, $D_c$ the domain discriminator output for category $c$, $F$ the feature extractor, $x_i^s$ the $i$-th marked sample, $y_i^s$ its real (one-hot) label, $n_t$ the number of samples in the target domain, $x_j^t$ the $j$-th unmarked sample, and $\hat{y}_j^t$ its hybrid label, i.e. a mixture of the soft pseudo label and the one-hot pseudo label of the unmarked sample.
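A hedged PyTorch sketch of this class-conditional domain adversarial loss, assuming the discriminator emits per-class sigmoid outputs and that the target hybrid labels are available (see the uncertainty-guided pseudo-label mechanism described later):

```python
import torch

def domain_adversarial_loss(d_src, d_tgt, onehot_src, hybrid_tgt,
                            eps: float = 1e-8):
    """Class-conditional adversarial loss sketch: d_src/d_tgt are the
    per-class sigmoid outputs of the discriminator (B, num_classes);
    each output is gated by the (pseudo-)label so that only the
    sample's own class contributes, as in a per-class GAN."""
    loss_src = -(onehot_src * torch.log(d_src + eps)).sum(dim=1).mean()
    loss_tgt = -(hybrid_tgt * torch.log(1.0 - d_tgt + eps)).sum(dim=1).mean()
    return loss_src + loss_tgt
```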
The prediction distribution difference loss may be calculated based on the first and second prediction labels by the following formula:

$$L_{JS} = \frac{1}{N_{s,s}}\sum_{y_i^s=y_j^s} \mathrm{JS}\bigl(C(F(x_i^s)),\,C(F(x_j^s))\bigr) + \frac{T}{N_{s,t}}\sum_{y_i^s=\hat{y}_j^t} \mathrm{JS}\bigl(C(F(x_i^s)),\,C(F(x_j^t))\bigr)$$

wherein $L_{JS}$ is the prediction distribution difference loss, $\mathrm{JS}$ the Jensen-Shannon divergence, $T$ a harmonic parameter, $N_{s,s}$ the number of sample pairs satisfying $y_i^s=y_j^s$ (the real labels of two source domain samples are identical), and $N_{s,t}$ the number of sample pairs satisfying $y_i^s=\hat{y}_j^t$ (the real label of a source domain sample is identical to the predicted label of a target domain sample).
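A sketch of the JS-divergence term in PyTorch; the pairing of same-class samples is assumed to have been done by the caller, and treating T as a simple weighting factor is this sketch's assumption.

```python
import torch

def js_divergence(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence between two batches of predicted
    class distributions (each row sums to 1); one value per pair."""
    m = 0.5 * (p + q)
    def kl(a, b):
        return (a * (a.clamp_min(1e-8).log()
                     - b.clamp_min(1e-8).log())).sum(dim=1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def prediction_gap_loss(ss_pairs, st_pairs, T: float = 1.0) -> torch.Tensor:
    """L_JS sketch: ss_pairs and st_pairs are (p, q) tuples holding the
    predicted distributions of matched same-class source-source and
    source-target pairs; T weights the source-target term."""
    p_ss, q_ss = ss_pairs
    p_st, q_st = st_pairs
    return (js_divergence(p_ss, q_ss).mean()
            + T * js_divergence(p_st, q_st).mean())
```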
The loss function of the hybrid target domain adaptive model may be set to:

$$L(F, C, D_c) = L_{CE} + L_D + \beta\, L_{JS}$$

wherein $L_{CE}$ is the source supervised classification loss, $L_D$ the domain adversarial loss, $L_{JS}$ the prediction distribution difference loss, $F$ the feature extractor, $C$ the classifier, $D_c$ the domain discriminator, and $\beta$ a positive weight parameter for $L_{JS}$. As described above, $F$ and $D_c$ are optimized adversarially over $L_D$, and $F$ and $C$ are optimized adversarially over $L_{JS}$.
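One way to realize both min-max games in a single backward pass is a gradient reversal layer; the sketch below reuses the helper functions from the previous sketches. The gradient-reversal realization, the single shared optimizer, and the omission of same-class pair selection are simplifications of this sketch, not the patent's prescribed procedure.

```python
import torch
import torch.nn.functional as F_nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, gradient scaled
    by -lamb in the backward pass, so the module before it maximizes
    what the module after it minimizes."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

def train_step(model, optimizer, x_s, y_s, x_t, hybrid_tgt,
               beta=0.1, lamb=1.0):
    """One training iteration; source/target batches assumed equal size."""
    z_s, f_s, logits_s, _ = model(x_s)
    z_t, f_t, _, _ = model(x_t)

    # Fused features keep the source labels (fuse_low_level_features above).
    z_st = fuse_low_level_features(z_s, z_t)
    logits_st = model.classifier(model.high_level(z_st))
    l_ce = (F_nn.cross_entropy(logits_s, y_s)
            + F_nn.cross_entropy(logits_st, y_s))

    # F vs D_c game: gradients entering the discriminator are reversed,
    # so D_c minimizes L_D while F maximizes it (confuses D_c).
    d_s = model.domain_disc(grad_reverse(f_s, lamb))
    d_t = model.domain_disc(grad_reverse(f_t, lamb))
    onehot_s = F_nn.one_hot(y_s, logits_s.size(1)).float()
    l_d = domain_adversarial_loss(d_s, d_t, onehot_s, hybrid_tgt)

    # F vs C game: the minus sign makes C ascend on the JS gap, and the
    # reversal flips it again so F descends on the gap. Same-class pair
    # selection is omitted here for brevity.
    p_s = model.classifier(grad_reverse(f_s, lamb)).softmax(dim=1)
    p_t = model.classifier(grad_reverse(f_t, lamb)).softmax(dim=1)
    l_js = js_divergence(p_s, p_t).mean()

    loss = l_ce + l_d - beta * l_js
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```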
S304, if it is determined, based on the model loss, that the loss function of the hybrid target domain adaptive model has converged, it is determined that training of the hybrid target domain adaptive model is complete.
If the model loss is smaller than a preset threshold, the loss function can be determined to have converged, and model training can accordingly be judged complete.
S305, if the loss function has not converged, the parameters of the feature extractor, the classifier, and the domain discriminator are updated, and training of the hybrid target domain adaptive model continues until the loss function converges.
In one possible implementation, the marked samples in the source domain are images with real tag information, the unmarked samples in the target domain are unmarked images, and the trained hybrid target domain adaptive model is used to classify the images.
Referring to fig. 4, a schematic diagram of the framework of the hybrid target domain adaptive model according to an embodiment of the present application is shown. As shown in fig. 4, the framework is composed of a feature extractor, a class classifier, and a domain discriminator with class information. The model is trained using source samples with real labels and unlabeled mixed target samples. The feature extractor and the domain discriminator perform adversarial training to align the category distributions and solve the domain shift problem, while the feature extractor and the classifier perform adversarial training to emphasize discriminative semantic information, suppress irrelevant semantic regions, and perform purer knowledge transfer.
Based on this framework of the hybrid target domain adaptive model, the scheme of the application provides a novel method for solving the image classification task of BTDA. The method comprises discriminative semantic adversarial learning, a random-ratio feature fusion scheme, and explicit class distribution alignment assisted by low-uncertainty pseudo labels.
In the BTDA scenario, the source domain containing $n_s$ marked samples is defined as $\mathcal{S}=\{(x_i^s, y_i^s)\}_{i=1}^{n_s}$, and the $M$ target domains are defined as $\{\mathcal{T}_m\}_{m=1}^{M}$, where $\mathcal{T}_m=\{x_j^{t_m}\}_{j=1}^{n_m}$ contains $n_m$ unmarked samples. The $M$ target domains are mixed together to form the blended target domain $\mathcal{T}$. Here $x_i^s$ and $y_i^s$ denote the $i$-th image and its corresponding label, and $n_s$ and $n_m$ denote the numbers of samples in $\mathcal{S}$ and $\mathcal{T}_m$, respectively. The data distributions of the source domain and the target domains differ from each other, which is one of the main obstacles in the BTDA problem. In the following, the feature extractor is denoted $F(\cdot)$, the classifier $C(\cdot)$, and the domain discriminator $D_c(\cdot)$. The goal of the scheme is to transfer knowledge from the source domain $\mathcal{S}$ to the blended target domain $\mathcal{T}$ and accurately predict the labels of the unlabeled target samples.
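As an aside, mixing the M sub-target datasets into one blended target domain can be sketched in PyTorch as follows; the dataset paths and transforms are placeholders, and shuffling discards the sub-domain identity, as the BTDA setting requires.

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

# Placeholder paths and transforms for the sub-target domains.
tf = transforms.Compose([transforms.Resize((224, 224)),
                         transforms.ToTensor()])
sub_targets = [datasets.ImageFolder(path, transform=tf)
               for path in ("data/clipart", "data/product", "data/real_world")]
# Concatenating and shuffling yields the single blended target domain T,
# whose samples are treated as unlabeled (labels are ignored in training).
blended_target = ConcatDataset(sub_targets)
target_loader = DataLoader(blended_target, batch_size=32, shuffle=True)
```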
Existing domain adaptation methods mainly align the whole image across the source and target domains. However, not all parts of an image are transferable, and forced alignment of non-transferable areas, such as the background, may lead to negative transfer. It is therefore necessary to locate the irrelevant areas that lead to misclassification and suppress their features. The scheme of the application proposes discriminative semantic adversarial learning, which focuses more on the discriminative features in the image and suppresses the features of mispredicted categories. Specifically, the classifier and the feature extractor undergo adversarial training: on one hand, the classifier increases the feature weight of the wrong class by maximizing the predicted distribution difference between pairs of samples of the same class; on the other hand, the feature extractor is trained to suppress the features of the wrong class and emphasize the discriminative feature map regions by minimizing that difference. Through this adversarial training process, the scheme can suppress irrelevant semantic information and carry out purer knowledge transfer.
The class activation map (CAM) for a particular class shows which parts of the image a convolutional neural network (CNN) uses to distinguish that class. For a given image, let $a_k(u,v)$ denote the activation of unit $k$ of the last convolutional layer at spatial location $(u,v)$. After global average pooling, the result for unit $k$ is

$$F_k = \frac{1}{HW}\sum_{u,v} a_k(u,v)$$

where $H$ and $W$ are the feature map sizes. Thus, for category $c$, the logit output is

$$z_c = \sum_k w_k^c\, F_k$$

where $w_k^c$ is the weight of class $c$ for unit $k$; in effect, $w_k^c$ represents the importance of $F_k$ for category $c$. The prediction score for category $c$ is then $p_c = \exp(z_c)/\sum_{c'}\exp(z_{c'})$. Substituting $F_k$ into $z_c$, it is possible to obtain:

$$z_c = \sum_k w_k^c \frac{1}{HW}\sum_{u,v} a_k(u,v) = \frac{1}{HW}\sum_{u,v}\sum_k w_k^c\, a_k(u,v)$$

Defining $M_c$ as the class-$c$ activation map, then

$$M_c(u,v) = \sum_k w_k^c\, a_k(u,v), \qquad z_c = \frac{1}{HW}\sum_{u,v} M_c(u,v)$$

Thus $M_c(u,v)$ directly shows the importance of the activation at $(u,v)$ for classifying the image into category $c$. By simply upsampling the class activation map to the size of the original input image, the image regions most relevant to a particular class can be found. It can thus be seen that the prediction score $p_c$ depends on the corresponding class activation map, and the CAM can locate class-specific image regions. This motivates the approach of finding irrelevant semantic regions and suppressing their features, which is discussed in the next subsection.
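A one-function PyTorch sketch of computing M_c from the last convolutional activations and the classifier weights; the shapes and names are this sketch's assumptions.

```python
import torch

def class_activation_map(a: torch.Tensor, w: torch.Tensor, c: int) -> torch.Tensor:
    """M_c(u, v) = sum_k w_k^c * a_k(u, v) for one image.
    a: last conv activations, shape (K, H, W);
    w: classifier weight matrix, shape (num_classes, K).
    The caller upsamples the returned (H, W) map to the input size."""
    return torch.einsum("k,khw->hw", w[c], a)
```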
The scheme uses two types of sample pairs: source-source pairs, where both images come from the source domain, and source-target pairs, where one image comes from the source domain and the other from the target domain. Within each pair, the images belong to the same class. To increase the weight of the irrelevant regions, the classifier maximizes the prediction distribution difference. Since the sample pairs belong to the same class and the prediction scores for that class are all high, maximizing the prediction distribution difference of a pair increases the weights of incorrect classes. For example, consider two pictures, one a cat with flowers and the other a cat with a hairtail fish, with predicted distributions [0.03, 0.81, 0.16] and [0.21, 0.7, 0.09], where the elements represent "fish", "cat", and "flower" in turn. When the distribution difference is maximized, the former image increases the score of "flower" and the latter increases the score of "fish"; "flower" and "fish" are semantics irrelevant to the prediction of "cat", so the weight of the wrong classes increases. The loss for maximizing the prediction distribution difference is defined as follows:

$$\max_{C}\; L_{JS} = \frac{1}{N_{s,s}}\sum_{y_i^s=y_j^s}\mathrm{JS}\bigl(C(F(x_i^s)),\,C(F(x_j^s))\bigr) + \frac{T}{N_{s,t}}\sum_{y_i^s=\hat{y}_j^t}\mathrm{JS}\bigl(C(F(x_i^s)),\,C(F(x_j^t))\bigr)$$

The Jensen-Shannon (JS) divergence is used to measure the distribution difference between a pair of predictions. Here $L_{JS}$ is the prediction distribution difference loss, $T$ a harmonic parameter, $N_{s,s}$ and $N_{s,t}$ the numbers of pairs satisfying $y_i^s=y_j^s$ (the real labels of two source domain samples are identical) and $y_i^s=\hat{y}_j^t$ (the real label of a source domain sample is identical to the predicted label of a target domain sample), respectively, and $C$ is the classifier.
In the classifier training above, the scheme increases the weight of irrelevant semantics and thereby finds the misclassified regions. These irrelevant regions then need to be suppressed, and the main features emphasized, by training the feature extractor. The feature extractor is trained to emphasize the discriminative regions by minimizing the prediction distribution difference of the sample pairs, i.e. $\min_{F} L_{JS}$.
Through this adversarial training method, purer knowledge transfer can be performed, interference from irrelevant semantic information is avoided, and classification accuracy is greatly improved.
In general, feature fusion is a data enhancement method that improves model performance and neural network robustness. Studies have shown that low-level features mainly represent the texture, structure, and style of an image. The scheme utilizes the prior knowledge of the first few layers of the convolutional neural network to extract the texture and style information of the target domain and mixes it with the source domain at a random ratio. The resulting samples then contain the semantic information of the source domain and the style of the target domain. This fusion scheme improves the discrimination capability of the classifier and the pseudo-label precision on the target domain during model training.
Formally, the low-level feature map is represented as $z\in\mathbb{R}^{C\times H\times W}$, where $C$ denotes the channel dimension and $H$ and $W$ the height and width. First, the mean and variance of the source domain and target domain samples are calculated separately:

$$\mu(z)=\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} z_{:,h,w},\qquad \sigma^2(z)=\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(z_{:,h,w}-\mu(z)\bigr)^2$$

The mixed mean $\mu_{st}$ and mixed standard deviation $\sigma_{st}$ are then obtained using a random number $\alpha$ between 0 and 1:

$$\mu_{st}=\alpha\,\mu(z_s)+(1-\alpha)\,\mu(z_t),\qquad \sigma_{st}=\alpha\,\sigma(z_s)+(1-\alpha)\,\sigma(z_t)$$

Finally, the enhanced source features with the target style are obtained as:

$$z_{st}=\sigma_{st}\,\frac{z_s-\mu(z_s)}{\sigma(z_s)+\epsilon}+\mu_{st}$$

where $\epsilon$ is set to $10^{-5}$ to prevent the denominator from being 0.
Fusing the source features with the target low-level features improves the final recognition effect. Furthermore, the method does not require generating specific images, making it efficient in practice.
In previous adversarial-learning-based domain adaptation, the last layer of the domain discriminator has only one output per sample, called a logit, followed by a sigmoid function that produces the probability that the sample belongs to the source or the target. Similar to the paper "Mutual Conditional Blended-Target Domain Adaptation", the present approach extends the number of outputs of the domain discriminator to $c$, where $c$ is the number of categories. Each logit works independently, like a GAN (generative adversarial network), to reduce the conditional distribution differences between specific classes of $\mathcal{S}$ and $\mathcal{T}$. To make each logit correspond to a category, the prediction generated by the domain discriminator $D_c$ is multiplied by the corresponding one-hot label $y_i\in\{0,1\}^c$, and only the corresponding logit is activated to compute the adversarial loss. The domain adversarial loss is defined as follows:

$$L_D = -\frac{1}{n_s}\sum_{i=1}^{n_s}(y_i^s)^{\top}\log D_c\bigl(F(x_i^s)\bigr) - \frac{1}{n_t}\sum_{j=1}^{n_t}(\hat{y}_j^t)^{\top}\log\Bigl(1-D_c\bigl(F(x_j^t)\bigr)\Bigr)$$

wherein $y_i^s$ is the one-hot label of the $i$-th source sample, and $\hat{y}_j^t$ is the hybrid label mixing the soft and one-hot pseudo labels of the target data.
Target pseudo labels obtained from classifier predictions may be unreliable and adversely affect distribution alignment. An uncertainty-guided filtering mechanism is therefore adopted: low-uncertainty predictions are converted into one-hot pseudo labels, while high-uncertainty predictions are used directly as soft pseudo labels. Entropy is used as the uncertainty measure, and a prediction is considered reliable if its uncertainty is less than a threshold $\gamma$. The uncertainty measure and the hybrid label are as follows:

$$u(x_j^t) = -\sum_{c=1}^{C} p_c^j \log p_c^j, \qquad \hat{y}_j^t = \begin{cases}\mathrm{onehot}\bigl(\arg\max_c p_c^j\bigr), & u(x_j^t) < \gamma\\ p^j, & \text{otherwise}\end{cases}$$

where $p^j = C(F(x_j^t))$ is the predicted distribution of the $j$-th target sample.
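A PyTorch sketch of this uncertainty-guided filtering, assuming softmax predictions for a batch of target samples; the function name is this sketch's own.

```python
import torch

def hybrid_pseudo_labels(p_tgt: torch.Tensor, gamma: float) -> torch.Tensor:
    """p_tgt: softmax predictions for target samples, shape (B, C).
    Low-entropy (confident) rows become one-hot pseudo labels; the
    rest are kept as soft pseudo labels."""
    entropy = -(p_tgt * p_tgt.clamp_min(1e-8).log()).sum(dim=1)
    onehot = torch.zeros_like(p_tgt).scatter_(
        1, p_tgt.argmax(dim=1, keepdim=True), 1.0)
    confident = (entropy < gamma).float().unsqueeze(1)
    return confident * onehot + (1.0 - confident) * p_tgt
```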
during training, the domain discriminator gradually changes from indistinguishable to distinguishable classes, each log may represent a class distribution P (z|y). Some previous approaches actually align marginal distributions, but the approach of the present approach aligns category distributions. Therefore, the scheme of the application can align the category distribution and reduce the inter-domain difference, so that the mixed data can form good clusters.
It should be noted that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
In order to illustrate the beneficial effects of the embodiments of the present application, the effects of the embodiments of the present application may be compared with other embodiments.
The method of the scheme was evaluated on the following three standard BTDA benchmark datasets.
Office-31: comprises three different domains: Amazon (A), Webcam (W), and DSLR (D). Each domain contains 31 common classes.
Office-Home: includes images from 65 categories across 4 different domains: Artistic (Ar), Product (Pr), Clipart (Cl), and Real-World (Rw).
ImageCLEF-DA: includes four different domains, each with images in 12 categories: Bing (B), ImageNet ILSVRC 2012 (I), Caltech-256 (C), and Pascal VOC 2012 (P).
For fair comparison, ResNet-50 is used as the backbone network for all datasets. All experiments were performed using PyTorch running on a GPU (GeForce RTX 4090). A mini-batch SGD optimizer with momentum 0.9 is adopted for network optimization. One subset of each dataset is selected as the source domain, and the remaining subsets are mixed together as the blended target domain. The accuracy on each sub-target domain is calculated, and the evaluation metric is the average of all the accuracies.
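A sketch of the optimizer setup consistent with this description: momentum 0.9 and mini-batch SGD follow the text, while the learning rate and weight decay below are placeholders of this sketch.

```python
import torch

model = SDNModel(num_classes=65)  # e.g. Office-Home; see the model sketch above
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-4)
```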
The classification results on Office-31, Office-Home, and ImageCLEF-DA are shown in Tables 1 and 2. The "Avg" column is the average value.
TABLE 1
The results of the Office-31 evaluation using ResNet-50 are shown in Table 1. The average accuracy of the proposed method is 91.6%, a significant performance improvement over the baseline method DANN. Notably, the most significant accuracy improvement is observed on task A→W/D, where the method achieves a gain of 16.5%. In addition, the method is 2% higher than the most advanced BTDA methods (such as MCDA) on Office-31.
As Table 1 also shows, the scheme achieves better results on Office-Home. For example, the method achieves an accuracy gain of about 4.7% over the latest BTDA method MCDA. Its average accuracy is 75.8%, which is 17.9% higher than the baseline method DANN. This substantial improvement demonstrates the effectiveness and superiority of the method. Furthermore, even compared with methods that use domain labels, such as MTDA, DLC, and DCGCT, the method still leads them by 11.7% on Office-Home.
TABLE 2
Table 2: accuracy (%) of BTDA on ImageCLEF-DA (ResNet-50). Each field in the table represents a source, and the remaining fields are blended as targets. Precision is the average precision of all target fields in the hybrid target. * Representing the results generated by the publicly published code. Table 2 shows the classification accuracy of the ResNet-50 based ImageCLEF-DA dataset. In the task C-B/I/P, the classification accuracy is slightly lower than that of the current most advanced method HTA. However, among the other three tasks, a large number of experimental results reached the highest value, and the average accuracy on ImageCLEF-DA was also the highest comparison method. Referring to fig. 5, a visual effect diagram for semantic discrimination based on a hybrid target domain adaptive model according to an embodiment of the present application is shown. As shown in fig. 5, semantic information located by different methods is visually displayed. Original represents an Original picture, MCDA is the latest method for comparison, SDN is the method proposed by the present application, and as can be seen in fig. 5, the actual labels of the Original pictures in the prediction result domain of the method proposed by the present application are the same. The scheme of the application has better effect.
To verify the discriminative semantic emphasis method of the scheme, feature maps sampled from the Office-Home dataset were randomly visualized. It can be seen from the figure that the proposed method helps the adaptive model concentrate on the feature map regions with discriminative semantic information. These results intuitively demonstrate that SDN can successfully address the semantic emphasis of key parts in image classification.
Referring to fig. 6, a comparison of the feature clustering visualizations of different models according to the embodiment of the present application is shown. FIG. 6 visualizes the feature distributions of task A→D/W using t-SNE. For ResNet-50, the target domain features are scattered in a cluttered manner around poorly formed source domain feature clusters. The DANN method also fails to form good clusters of all domain features in the mixed feature space. In contrast, SDN builds compact clusters of all domain features. These results show that the proposed method is very effective for BTDA tasks.
Referring to fig. 7, a schematic diagram of a training apparatus for a hybrid target domain adaptive model according to an embodiment of the present application may specifically include a data acquisition module 71, a feature extraction module 72, a loss calculation module 73, a first judgment module 74, and a second judgment module 75, where:
a data acquisition module 71, configured to acquire a source domain and a target domain, where the source domain includes a plurality of marked samples with real tag information, and the target domain includes a plurality of unmarked samples;
a feature extraction module 72 for extracting sample features by a feature extractor, respectively, the sample features including source domain features of the marked samples, target domain features of the unmarked samples, and fusion features of the samples in the source domain and the samples in the target domain;
a loss calculation module 73, configured to calculate the model loss based on the source domain features, the target domain features, and the fusion features, where the model loss includes the source supervised classification loss of the hybrid target domain adaptive model, the domain adversarial loss corresponding to the domain discriminator, and the prediction distribution difference loss corresponding to the classifier and the feature extractor;
a first judging module 74, configured to determine that the training of the hybrid target domain adaptive model is completed if it is determined that the loss function of the hybrid target domain adaptive model converges based on the model loss;
and a second judging module 75, configured to update parameters of the feature extractor, the classifier, and the domain discriminator if the loss function does not converge, and continue training the hybrid target domain adaptive model until the loss function converges.
In one possible implementation, the feature extraction module 72 includes:
a low-level feature extraction sub-module, configured to extract, through a low-level network of the feature extractor, low-level features of the samples in the source domain and low-level features of the samples in the target domain, respectively;
a low-level feature fusion sub-module, configured to fuse the low-level features of the sample in the source domain with the low-level features of the sample in the target domain to obtain low-level fusion features;
and a high-level feature extraction sub-module, configured to pass the low-level source domain features, the low-level target domain features, and the low-level fusion features through the high-level network of the feature extractor to obtain the source domain features, the target domain features, and the fusion features.
In one possible implementation, the low-level source domain features and the low-level target domain features are fused by the following formula to obtain the low-level fusion features:

$$z_{st}=\sigma_{st}\,\frac{z_s-\mu(z_s)}{\sigma(z_s)+\epsilon}+\mu_{st}$$

wherein $z_{st}$ is the low-level fusion feature, $z_s$ the low-level source domain feature, $z_t$ the low-level target domain feature, the statistics are computed over the $H\times W$ spatial positions of the sample image ($H$ the height, $W$ the width), and $\alpha$ is the random number used to mix the means and standard deviations.
In one possible implementation, the loss calculation module 73 includes:
the prediction tag calculation sub-module is used for respectively inputting the source domain feature, the target domain feature and the fusion feature into the classifier to obtain a first prediction tag corresponding to the source domain feature, a second prediction tag corresponding to the target domain feature and a third prediction tag corresponding to the fusion feature;
a source supervision classification loss calculation sub-module for calculating the source supervision classification loss based on the first prediction tag and the third prediction tag;
A domain counter loss calculation sub-module for calculating the domain counter loss based on the first predictive tag and the second predictive tag;
and the prediction distribution difference loss calculation sub-module is used for calculating the prediction distribution difference loss based on the first prediction tag and the second prediction tag.
In one possible implementation, the source supervised classification loss is calculated based on the first and third prediction labels by:

$$L_{CE} = \frac{1}{n_s}\sum_{i=1}^{n_s}\ell_{ce}\bigl(C(F(x_i^s)),\,y_i^s\bigr) + \frac{1}{n_s}\sum_{j=1}^{n_s}\ell_{ce}\bigl(C(f(z_{st}^j)),\,y_j^s\bigr)$$

wherein $L_{CE}$ is the source supervised classification loss, $n_s$ the number of samples in the source domain, $\ell_{ce}$ the cross-entropy loss function, $C$ the classifier, $F$ the feature extractor, $x_i^s$ the $i$-th marked sample with real label $y_i^s$, $f$ the deep network of the feature extractor, $z_{st}^j$ the $j$-th fusion feature, and $y_j^s$ its corresponding real label.
In one possible implementation, the domain adversarial loss is calculated based on the first and second prediction labels by:

$$L_D = -\frac{1}{n_s}\sum_{i=1}^{n_s}(y_i^s)^{\top}\log D_c\bigl(F(x_i^s)\bigr) - \frac{1}{n_t}\sum_{j=1}^{n_t}(\hat{y}_j^t)^{\top}\log\Bigl(1-D_c\bigl(F(x_j^t)\bigr)\Bigr)$$

wherein $L_D$ is the domain adversarial loss, $n_s$ the number of samples in the source domain, $D_c$ the domain discriminator output for category $c$, $F$ the feature extractor, $x_i^s$ the $i$-th marked sample with real label $y_i^s$, $n_t$ the number of samples in the target domain, $x_j^t$ the $j$-th unmarked sample, and $\hat{y}_j^t$ its hybrid label.
In one possible implementation, the prediction distribution difference loss is calculated based on the first and second prediction labels by:

$$L_{JS} = \frac{1}{N_{s,s}}\sum_{y_i^s=y_j^s}\mathrm{JS}\bigl(C(F(x_i^s)),\,C(F(x_j^s))\bigr) + \frac{T}{N_{s,t}}\sum_{y_i^s=\hat{y}_j^t}\mathrm{JS}\bigl(C(F(x_i^s)),\,C(F(x_j^t))\bigr)$$

wherein $L_{JS}$ is the prediction distribution difference loss, $\mathrm{JS}$ the Jensen-Shannon divergence function, $T$ a harmonic parameter, $N_{s,s}$ the number of pairs satisfying $y_i^s=y_j^s$ (the real labels of two source domain samples are identical), and $N_{s,t}$ the number of pairs satisfying $y_i^s=\hat{y}_j^t$ (the real label of a source domain sample is identical to the predicted label of a target domain sample).
In one possible implementation, the loss function is:

$$L(F, C, D) = L_{CE} + L_D + \beta\,L_{JS}$$

wherein $L_{CE}$ is the source supervised classification loss, $L_D$ the domain adversarial loss, $L_{JS}$ the prediction distribution difference loss, $F$ the feature extractor, $C$ the classifier, $D$ the domain discriminator, and $\beta$ a positive weight parameter for $L_{JS}$.
In one possible implementation, the marked samples in the source domain are images with real tag information, the unmarked samples in the target domain are unmarked images, and the trained hybrid target domain adaptive model is used for classifying images.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference should be made to the description of the method embodiments.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 800 of this embodiment includes: at least one processor 80 (only one is shown in fig. 8), a memory 81, and a computer program 82 stored in the memory 81 and executable on the at least one processor 80, wherein the processor 80 implements the steps in any of the method embodiments described above when executing the computer program 82.
The computer device 800 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The computer device may include, but is not limited to, the processor 80 and the memory 81. It will be appreciated by those skilled in the art that fig. 8 is merely an example of the computer device 800 and does not limit it; the computer device 800 may include more or fewer components than shown, combine certain components, or use different components, and may further include, for example, input/output devices and network access devices.
The processor 80 may be a central processing unit (Central Processing Unit, CPU), and may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 81 may, in some embodiments, be an internal storage unit of the computer device 800, such as a hard disk or memory of the computer device 800. In other embodiments, the memory 81 may also be an external storage device of the computer device 800, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device 800. Further, the memory 81 may include both an internal storage unit and an external storage device of the computer device 800. The memory 81 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 81 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the various method embodiments described above.

Embodiments of the present application also provide a computer program product which, when run on a computer device, causes the computer device to perform the steps of the respective method embodiments described above.
The above embodiments are merely intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of training a hybrid target domain adaptive model, the hybrid target domain adaptive model comprising a feature extractor, a classifier, and a domain discriminator, the method comprising:
acquiring a source domain and a target domain, wherein the source domain comprises a plurality of marked samples with real label information, the target domain comprises a plurality of sub-target domains, each sub-target domain comprises a plurality of unmarked samples, and the unmarked samples of the sub-target domains are mixed to obtain the target domain;
extracting sample features by the feature extractor, wherein the sample features comprise source domain features of the marked samples, target domain features of the unmarked samples, and fusion features of the samples in the source domain and the samples in the target domain;
calculating model losses based on the source domain features, the target domain features and the fusion features, wherein the model losses comprise a source supervised classification loss of the hybrid target domain adaptive model, a domain adversarial loss corresponding to the domain discriminator, and a prediction distribution difference loss corresponding to the classifier and the feature extractor;
if it is determined, based on the model losses, that the loss function of the hybrid target domain adaptive model converges, determining that training of the hybrid target domain adaptive model is completed;
if the loss function does not converge, updating parameters of the feature extractor, the classifier and the domain discriminator, and continuing to train the hybrid target domain adaptive model until the loss function converges.
2. The method of claim 1, wherein the extracting, by the feature extractor, the source domain features corresponding to the marked samples, the target domain features corresponding to the unmarked samples, and the fusion features of the samples in the source domain and the samples in the target domain comprises:
extracting low-level features of the samples in the source domain and low-level features of the samples in the target domain respectively through a low-level network of the feature extractor;
fusing the low-level features of the samples in the source domain and the low-level features of the samples in the target domain to obtain low-level fusion features;
and inputting the low-level features of the samples in the source domain, the low-level features of the samples in the target domain and the low-level fusion features into a high-level network of the feature extractor respectively, to obtain the source domain features, the target domain features and the fusion features.
3. The method of claim 2, wherein the low-level features of the samples in the source domain are fused with the low-level features of the samples in the target domain to obtain the low-level fusion features by the following formula:
$$z_{st} = \alpha z_s + (1-\alpha)\, z_t, \qquad z_s,\, z_t \in \mathbb{R}^{C \times H \times W}$$

wherein $z_{st}$ is the low-level fusion feature, $z_s$ is the low-level source domain feature, $z_t$ is the low-level target domain feature, $H$ is the height of the sample image, $W$ is the width of the sample image, and $\alpha$ is a random number.
4. The method of claim 1, wherein the calculating the source supervised classification loss of the hybrid target domain adaptive model, the domain adversarial loss corresponding to the domain discriminator, and the prediction distribution difference loss corresponding to the classifier and the feature extractor based on the source domain features, the target domain features and the fusion features comprises:
inputting the source domain features, the target domain features and the fusion features into the classifier respectively, to obtain a first prediction label corresponding to the source domain features, a second prediction label corresponding to the target domain features and a third prediction label corresponding to the fusion features;
calculating the source supervised classification loss based on the first prediction label and the third prediction label;
calculating the domain adversarial loss based on the first prediction label and the second prediction label; and
calculating the prediction distribution difference loss based on the first prediction label and the second prediction label.
5. The method of claim 4, wherein the source supervised classification loss is calculated based on the first prediction label and the third prediction label by the following formula:
$$L_{CE} = \frac{1}{n_s}\sum_{i=1}^{n_s} l_{ce}\big(\hat{y}_i^s,\, y_i^s\big) + \frac{1}{n_s}\sum_{j=1}^{n_s} l_{ce}\big(C\big(f(z_{st}^j)\big),\, y_j^{st}\big), \qquad \hat{y}_i^s = C\big(F(x_i^s)\big)$$

wherein $L_{CE}$ is the source supervised classification loss, $n_s$ is the number of samples of the source domain, $l_{ce}$ is the cross-entropy loss function, $C$ is the classifier, $F$ is the feature extractor, $x_i^s$ is the $i$-th marked sample, $\hat{y}_i^s$ is the first prediction label corresponding to $x_i^s$, $f$ is the high-level network of the feature extractor, $z_{st}^j$ is the $j$-th fusion feature, and $y_j^{st}$ is the real label corresponding to $z_{st}^j$.
6. The method of claim 4, wherein the domain adversarial loss is calculated based on the first prediction label and the second prediction label by the following formula:
$$L_D = -\frac{1}{n_s}\sum_{i=1}^{n_s} \log D_{y_i^s}\big(F(x_i^s)\big) - \frac{1}{n_t}\sum_{j=1}^{n_t} \log\Big(1 - D_{\hat{y}_j^t}\big(F(x_j^t)\big)\Big)$$

wherein $L_D$ is the domain adversarial loss, $n_s$ is the number of samples of the source domain, $D_c$ is the discrimination output of the domain discriminator for class $c$, $F$ is the feature extractor, $x_i^s$ is the $i$-th marked sample, $y_i^s$ is the real label corresponding to $x_i^s$, $n_t$ is the number of samples of the target domain, $x_j^t$ is the $j$-th unmarked sample, and $\hat{y}_j^t$ is the mixed label of $x_j^t$.
7. The method of claim 4, wherein the prediction distribution difference loss is calculated based on the first prediction label and the second prediction label by the following formula:
$$L_{JS} = \frac{1}{N_{s,s}} \sum_{y_i^s = y_j^s} \mathrm{JS}\big(p_i^s \,\|\, p_j^s\big) + \frac{1}{N_{s,t}} \sum_{y_i^s = \hat{y}_j^t} \mathrm{JS}\big(p_i^s \,\|\, p_j^t\big), \qquad p = \mathrm{softmax}\big(C(F(x))/T\big)$$

wherein $L_{JS}$ is the prediction distribution difference loss, $\mathrm{JS}$ is the Jensen–Shannon divergence function, $T$ is the harmonic (temperature) parameter used to soften the predicted distributions $p$, $N_{s,s}$ is the number of sample pairs satisfying $y_i^s = y_j^s$, i.e. the true labels of the two source domain samples are identical, and $N_{s,t}$ is the number of sample pairs satisfying $y_i^s = \hat{y}_j^t$, i.e. the true label of a source domain sample is identical to the predicted label of a target domain sample.
8. The method of claim 4, wherein the loss function is:
$$\min_{F,\,C}\,\max_{D}\;\; L_{CE} - L_D + \beta L_{JS}$$

wherein $L_{CE}$ is the source supervised classification loss, $L_D$ is the domain adversarial loss, $L_{JS}$ is the prediction distribution difference loss, $F$ is the feature extractor, $C$ is the classifier, $D$ is the domain discriminator, and $\beta$ is the positive weight parameter of $L_{JS}$.
9. The method of claim 1, wherein the marked samples in the source domain are images with real label information, the unmarked samples in the target domain are unmarked images, and the trained hybrid target domain adaptive model is used to classify images.
10. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-9 when executing the computer program.
CN202311337554.6A 2023-10-16 2023-10-16 Training method and device for hybrid target domain adaptive model and computer equipment Active CN117152563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311337554.6A CN117152563B (en) 2023-10-16 2023-10-16 Training method and device for hybrid target domain adaptive model and computer equipment

Publications (2)

Publication Number Publication Date
CN117152563A true CN117152563A (en) 2023-12-01
CN117152563B CN117152563B (en) 2024-05-14

Family

ID=88898930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311337554.6A Active CN117152563B (en) 2023-10-16 2023-10-16 Training method and device for hybrid target domain adaptive model and computer equipment

Country Status (1)

Country Link
CN (1) CN117152563B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738315A (en) * 2020-06-10 2020-10-02 西安电子科技大学 Image classification method based on countermeasure fusion multi-source transfer learning
US20210012198A1 (en) * 2018-05-31 2021-01-14 Huawei Technologies Co., Ltd. Method for training deep neural network and apparatus
CN113298189A (en) * 2021-06-30 2021-08-24 广东工业大学 Cross-domain image classification method based on unsupervised domain self-adaption
CN114492574A (en) * 2021-12-22 2022-05-13 中国矿业大学 Pseudo label loss unsupervised countermeasure domain adaptive picture classification method based on Gaussian uniform mixing model
CN114781647A (en) * 2022-04-11 2022-07-22 南京信息工程大学 Unsupervised domain adaptation method for distinguishing simple samples from difficult samples
CN115393599A (en) * 2021-05-21 2022-11-25 北京沃东天骏信息技术有限公司 Method, device, electronic equipment and medium for constructing image semantic segmentation model and image processing
US20220383052A1 (en) * 2021-05-18 2022-12-01 Zhejiang University Unsupervised domain adaptation method, device, system and storage medium of semantic segmentation based on uniform clustering
CN115439788A (en) * 2022-09-11 2022-12-06 复旦大学 Domain self-adaptive method for migrating video model from source domain to target domain
CN116227578A (en) * 2022-12-13 2023-06-06 浙江工业大学 Unsupervised domain adaptation method for passive domain data
CN116263785A (en) * 2022-11-16 2023-06-16 中移(苏州)软件技术有限公司 Training method, classification method and device of cross-domain text classification model
CN116468991A (en) * 2023-02-24 2023-07-21 西安电子科技大学 Incremental-like non-supervision domain self-adaptive image recognition method based on progressive calibration
CN116824216A (en) * 2023-05-22 2023-09-29 南京信息工程大学 Passive unsupervised domain adaptive image classification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI Y, et al., "Network architecture search for domain adaptation", arXiv, 31 December 2020 *
YAN K, et al., "Deep transfer learning for cross-species plant disease diagnosis adapting mixed subdomains", IEEE/ACM Transactions on Computational Biology and Bioinformatics, 31 December 2021 *

Also Published As

Publication number Publication date
CN117152563B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
US10762373B2 (en) Image recognition method and device
CN110837836A (en) Semi-supervised semantic segmentation method based on maximized confidence
GB2547313A (en) Accurate tag relevance prediction for image search
WO2019080411A1 (en) Electrical apparatus, facial image clustering search method, and computer readable storage medium
US20210056362A1 (en) Negative sampling algorithm for enhanced image classification
US20210174104A1 (en) Finger vein comparison method, computer equipment, and storage medium
JP2022014776A (en) Activity detection device, activity detection system, and activity detection method
Li et al. Lcnn: Low-level feature embedded cnn for salient object detection
Wang et al. S 3 d: scalable pedestrian detection via score scale surface discrimination
Liu et al. Robust salient object detection for RGB images
Karaoglu et al. Detect2rank: Combining object detectors using learning to rank
Ma et al. Location-aware box reasoning for anchor-based single-shot object detection
Lin et al. Hierarchical representation via message propagation for robust model fitting
CN115391588B (en) Fine adjustment method and image-text retrieval method of visual language pre-training model
WO2020135054A1 (en) Method, device and apparatus for video recommendation and storage medium
CN117152563B (en) Training method and device for hybrid target domain adaptive model and computer equipment
CN112424784A (en) Systems, methods, and computer-readable media for improved table identification using neural networks
Pereira et al. Assessing active learning strategies to improve the quality control of the soybean seed vigor
CN112989869B (en) Optimization method, device, equipment and storage medium of face quality detection model
Ma et al. Depth-guided progressive network for object detection
CN114219047B (en) Heterogeneous domain self-adaption method, device and equipment based on pseudo label screening
Tian et al. Structure-aware semantic-aligned network for universal cross-domain retrieval
Cai et al. Semantic and Correlation Disentangled Graph Convolutions for Multilabel Image Recognition
Ma et al. Capsule-Based Regression Tracking via Background Inpainting
En et al. Human-like delicate region erasing strategy for weakly supervised detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant