CN114528913A - Model migration method, device, equipment and medium based on trust and consistency

Info

Publication number: CN114528913A
Application number: CN202210023290.6A
Authority: CN (China)
Prior art keywords: model, source domain, sample, trust, consistency
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 陈辉 (Chen Hui), 丁贵广 (Ding Guiguang)
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Application filed by Tsinghua University

Classifications

    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Neural network architectures; combinations of networks
    • G06N3/088 Learning methods; non-supervised learning, e.g. competitive learning


Abstract

The application relates to a model migration method, apparatus, device, and medium based on trust and consistency. The method includes the following steps: based on preset labeled source domain data, using a convolutional neural network as a feature extractor to extract features of source domain images; performing label prediction with a source domain classification layer and performing training optimization with a cross-entropy loss function to obtain a pre-trained source domain model; and, based on the pre-trained source domain model and unlabeled target domain data, performing model adaptive learning with a dual classification network, optimized by a trust- and consistency-based mechanism, to obtain an adaptively learned source domain model. This solves the problem of domain adaptation when source domain data are missing: given only a source domain model and unlabeled target domain data, target domain adaptive learning is performed through the model migration method, realizing unsupervised learning and significantly improving the adaptability of the model.

Description

Model migration method, device, equipment and medium based on trust and consistency
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a model migration method, apparatus, device, and medium based on trust and consistency.
Background
Deep-learning-based artificial intelligence technology often suffers performance degradation in practical applications because of information deviation: the target data encountered in deployment and the source domain data used for model training are not identically distributed. To address this problem, academia and industry have studied Unsupervised Domain Adaptation (UDA) methods, which aim to transfer the knowledge of source domain data to the learning process on target domain data. Methods in the related art basically assume that, during domain-adaptive learning, the model can access a large amount of labeled source domain data, so the labeled source domain data and the unlabeled target domain data are used for model training simultaneously, and the resulting model is tested on the unlabeled target domain data.
Unsupervised domain adaptation methods in the related art can be roughly divided into three categories: discrepancy-based, reconstruction-based, and adversarial. Discrepancy-based methods optimize a measurement function for the discrepancy between the source and target data distributions; popular examples include Maximum Mean Discrepancy (MMD), higher-order central moment discrepancy, contrastive domain discrepancy, and Wasserstein metrics. Reconstruction-based methods typically introduce an auxiliary reconstruction task to learn a representation shared by the two domains; domain-specific reconstruction and cycle consistency have further been proposed to improve adaptation performance. Adversarial methods employ generative adversarial networks to optimize the distance between the different data distributions.
While these methods work well, they all assume that the labeled source data can be accessed during domain adaptation. In practical applications, however, this assumption cannot always be satisfied because of data privacy or resource constraints. Much industrial data, such as medical diagnostic records, product defect records, and user behavior information, is typically kept for internal use only. Furthermore, on terminal devices (e.g., cameras), storage and computing resources are often very limited, which also makes large-scale data access and model training very difficult.
That is, transferability and adaptability are the two key issues that domain adaptation needs to solve. Transferability requires that the trained model transfer the knowledge of the source domain data as fully as possible during learning on the target domain; adaptability aims to let the model perceive the specific information distribution of the target domain data and flexibly adjust its parameters to meet that learning objective. However, most existing unsupervised domain adaptation methods that do not depend on source domain data aim only to reduce the distance between the source and target information distributions, thereby enhancing the transferability of the model. In practical applications, target domain data and source domain data differ markedly in texture, color, and even background; merely reducing the distance between the target and source domains weakens the model's ability to learn target-specific information and reduces its adaptability.
Disclosure of Invention
The application provides a model migration method, apparatus, device, and medium based on trust and consistency, aiming to solve the problem of domain adaptation when source domain data are missing: given a source domain model and unlabeled target domain data, target domain adaptive learning is performed through the model migration method, realizing unsupervised learning and significantly improving the adaptability of the model.
An embodiment of a first aspect of the present application provides a model migration method based on trust and consistency, including the following steps:
based on preset labeled source domain data, using a convolutional neural network as a feature extractor to extract features of source domain images;
performing label prediction with a source domain classification layer, wherein the source domain classification layer consists of a fully connected layer and a weight normalization layer, and performing training optimization with a cross-entropy loss function to obtain a pre-trained source domain model; and
based on the pre-trained source domain model and unlabeled target domain data, performing model adaptive learning with a dual classification network, and performing training optimization with a trust- and consistency-based mechanism to obtain an adaptively learned source domain model.
Optionally, performing training optimization with the trust- and consistency-based mechanism to obtain the adaptively learned source domain model includes:
when performing model migration on the target domain data, inputting samples from the target domain data into the model to obtain probability distributions;
selecting the label with the maximum probability as the pseudo label of the corresponding sample, and measuring the model's trust in the pseudo label with entropy;
ranking all samples in the target domain data by entropy from small to large to obtain trusted samples, and training the network with the trusted samples and their corresponding pseudo labels to obtain the adaptively learned source domain model.
Optionally, before performing model adaptive learning with the dual classification network, the method further includes:
constructing the dual classification network, wherein the dual classification network comprises a feature extractor and a dual classification head; during training, the parameters of the source classifier are fixed, while the feature extractor and the target classifier are updated by stochastic gradient descent.
Optionally, performing training optimization with the trust- and consistency-based mechanism includes:
extracting features of unlabeled samples in the target domain data, and obtaining a first distribution prediction result and a second distribution prediction result from a preset first classifier and a preset second classifier;
calculating the information entropy of the model's predicted distributions based on the first and second distribution prediction results to obtain the confidence of the model's predictions.
Optionally, performing training optimization with the trust- and consistency-based mechanism includes:
randomly rotating a trusted sample by a preset angle to obtain a new trusted sample;
inputting the trusted sample and the new trusted sample into the dual classification network to obtain their respective features and predicted distribution results;
using preset loss functions to keep the features and probability distributions of the trusted sample and the new trusted sample consistent;
predicting the rotation angle of the new trusted sample relative to the trusted sample with a preset classification layer, and calculating the rotation-angle prediction loss.
An embodiment of a second aspect of the present application provides a model migration apparatus based on trust and consistency, including:
an extraction module, configured to use a convolutional neural network as a feature extractor to extract features of source domain images based on preset labeled source domain data;
an optimization module, configured to perform label prediction with a source domain classification layer, wherein the source domain classification layer consists of a fully connected layer and a weight normalization layer, and to perform training optimization with a cross-entropy loss function to obtain a pre-trained source domain model; and
an acquisition module, configured to perform model adaptive learning with a dual classification network based on the pre-trained source domain model and unlabeled target domain data, and to perform training optimization with a trust- and consistency-based mechanism to obtain an adaptively learned source domain model.
Optionally, the optimization module is specifically configured to:
when performing model migration on the target domain data, input samples from the target domain data into the model to obtain probability distributions;
select the label with the maximum probability as the pseudo label of the corresponding sample, and measure the model's trust in the pseudo label with entropy;
rank all samples in the target domain data by entropy from small to large to obtain trusted samples, and train the network with the trusted samples and their corresponding pseudo labels to obtain the adaptively learned source domain model.
Optionally, before performing model adaptive learning with the dual classification network, the acquisition module is further configured to:
construct the dual classification network, wherein the dual classification network comprises a feature extractor and a dual classification head; during training, the parameters of the source classifier are fixed, while the feature extractor and the target classifier are updated by stochastic gradient descent.
Optionally, the optimization module is specifically configured to:
extract features of unlabeled samples in the target domain data, and obtain a first distribution prediction result and a second distribution prediction result from a preset first classifier and a preset second classifier;
calculate the information entropy of the model's predicted distributions based on the first and second distribution prediction results to obtain the confidence of the model's predictions.
Optionally, the optimization module is specifically configured to:
randomly rotate a trusted sample by a preset angle to obtain a new trusted sample;
input the trusted sample and the new trusted sample into the dual classification network to obtain their respective features and predicted distribution results;
use preset loss functions to keep the features and probability distributions of the trusted sample and the new trusted sample consistent;
predict the rotation angle of the new trusted sample relative to the trusted sample with a preset classification layer, and calculate the rotation-angle prediction loss.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the trust and consistency based model migration method as described in the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium storing computer instructions for causing a computer to perform the trust and consistency-based model migration method according to the above embodiments.
Therefore, in the method, based on preset labeled source domain data, a convolutional neural network is used as a feature extractor to extract features of source domain images; a source domain classification layer is used for label prediction, and a cross-entropy loss function is used for training optimization to obtain a pre-trained source domain model; then, based on the pre-trained source domain model and unlabeled target domain data, a dual classification network is used for model adaptive learning, and a trust and consistency mechanism is used for training optimization to obtain an adaptively learned source domain model. This solves the problem of domain adaptation when source domain data are missing: given only a source domain model and unlabeled target domain data, target domain adaptive learning is performed through the model migration method, realizing unsupervised learning and significantly improving the adaptability of the model.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a trust and consistency-based model migration method according to an embodiment of the present application;
FIG. 2 is an exemplary diagram of a dual classification network framework according to one embodiment of the present application;
FIG. 3 is a graphical illustration of a comparison of performance on a VisDA dataset according to one embodiment of the present application;
FIG. 4 is an exemplary diagram of a trust and consistency based model migration apparatus according to an embodiment of the present application;
fig. 5 is an exemplary diagram of an electronic device according to an embodiment of the application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The trust and consistency-based model migration method, apparatus, device, and medium according to embodiments of the present application are described below with reference to the accompanying drawings. In the method, based on preset labeled source domain data, a convolutional neural network is used as a feature extractor to extract features of source domain images; a source domain classification layer is used for label prediction, and a cross-entropy loss function is used for training optimization to obtain a pre-trained source domain model; then, based on the pre-trained source domain model and unlabeled target domain data, a dual classification network is used for model adaptive learning, and a trust and consistency mechanism is used for training optimization to obtain an adaptively learned source domain model. This solves the problem of domain adaptation when source domain data are missing: given only a source domain model and unlabeled target domain data, target domain adaptive learning is performed through the model migration method, realizing unsupervised learning and significantly improving the adaptability of the model.
Specifically, fig. 1 is a schematic flowchart of a trust and consistency-based model migration method according to an embodiment of the present application.
As shown in FIG. 1, the model migration method based on trust and consistency comprises the following steps:
in step S101, based on the preset labeled source domain data, a convolutional neural network is used as a feature extractor to extract features of the source domain image.
In step S102, a source domain classification layer is used for label prediction, wherein the source domain classification layer consists of a fully connected layer and a weight normalization layer, and a cross-entropy loss function is used for training optimization to obtain a pre-trained source domain model.
In particular, the embodiments of the present application are given a labeled source domain dataset $\mathcal{D}_s$ and a label set $Y$; each sample $x_s \in \mathcal{D}_s$ has a label $y_s \in Y$. First, a neural network model $\theta_s = \{f_s, h_s\}$ is constructed, where $f_s$ is a feature extractor whose parameters are initialized with a convolutional neural network pre-trained on ImageNet, and $h_s$ is a classifier initialized randomly. During training, $f_s$ is used to extract the features of $x_s$, which $h_s$ then classifies; the output of the classifier is $p_s(x_s) = h_s(f_s(x_s))$. Finally, the model $\theta_s$ is trained with a cross-entropy loss:

$$\mathcal{L}_{src} = -\mathbb{E}_{(x_s, y_s)} \log p_s^{y_s}(x_s)$$

where $p_s^{y_s}(x_s)$ denotes the probability $P(y_s \mid x_s)$ that sample $x_s$ has label $y_s$, i.e., the probability value assigned to $y_s$ in the classifier output $p_s(x_s)$, and $\mathbb{E}$ is the expectation.
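As a minimal sketch of this pre-training stage (PyTorch is assumed; class and attribute names such as `SourceModel` and `backbone` are illustrative, not taken from the patent):

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SourceModel(nn.Module):
    """theta_s = {f_s, h_s}: ImageNet-initialized feature extractor plus classifier."""
    def __init__(self, num_classes: int):
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(resnet.children())[:-1], nn.Flatten())  # f_s
        # h_s: fully connected layer with weight normalization, randomly initialized.
        self.classifier = nn.utils.weight_norm(nn.Linear(2048, num_classes))

    def forward(self, x):
        return self.classifier(self.backbone(x))   # p_s(x_s) = h_s(f_s(x_s))

def pretrain_step(model, optimizer, x_s, y_s):
    """One cross-entropy training step (L_src) on a batch of labeled source data."""
    loss = F.cross_entropy(model(x_s), y_s)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```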
In step S103, based on the pre-trained source domain model and the unlabeled target domain data, model adaptive learning is performed using a dual classification network, and training optimization is performed using a trust- and consistency-based mechanism to obtain an adaptively learned source domain model.
As shown in fig. 2, the dual classification network may adopt a convolutional neural network as the feature extractor and then classify the extracted features with a dual classification head. The dual classification head contains two different classification layers, each consisting of a fully connected layer and a weight normalization layer. When the dual classification network is initialized, its feature extractor parameters are initialized with the feature extractor parameters of the source domain model; one classification layer of the dual classification head is initialized with the source domain classification layer parameters of the source domain model and is called the source classifier, while the other classification layer is initialized with random floating-point values and is called the target classifier. During model migration on the target domain data, the parameters of the source classifier are not updated, while the parameters of the target classifier are updated by stochastic gradient descent.
Optionally, in some embodiments, before performing model adaptive learning using the dual classification network, the method further includes: constructing the dual classification network, wherein the dual classification network comprises a feature extractor and a dual classification head; during training, the parameters of the source classifier are fixed, while the feature extractor and the target classifier are updated by stochastic gradient descent.
It should be understood that the dual classification network $\theta_t$ constructed by the embodiments of the present application comprises a feature extractor $f_t$ and a dual classification head. The parameters of the feature extractor are initialized with the feature extractor of the model pre-trained on the source domain, i.e., $f_t = f_s$. The dual classification head comprises two classification layers, a source domain classification layer $h_s'$ and a target domain classification layer $h_t$, where the parameters of the source domain classification layer are initialized with the classification layer of the model pre-trained on the source domain, i.e., $h_s' = h_s$, and the target domain classification layer $h_t$ is initialized with random values. The resulting dual classification network is $\theta_t = \{f_t, h_s, h_t\}$. During training, the parameters of the source classifier $h_s$ are fixed, while the feature extractor $f_t$ and the target classifier $h_t$ are updated by stochastic gradient descent.
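A hedged sketch of this dual classification network, reusing the `SourceModel` from the previous example (names are again illustrative assumptions):

```python
import torch.nn as nn

class DualClassificationNetwork(nn.Module):
    """f_t plus a dual classification head {h_s, h_t}, initialized from a source model."""
    def __init__(self, source_model, num_classes: int, feat_dim: int = 2048):
        super().__init__()
        # f_t is initialized from the source feature extractor f_s.
        self.backbone = source_model.backbone
        # h_s: copied from the source classifier; frozen during adaptation.
        self.source_head = source_model.classifier
        for p in self.source_head.parameters():
            p.requires_grad = False
        # h_t: randomly initialized target classifier, updated by SGD.
        self.target_head = nn.utils.weight_norm(nn.Linear(feat_dim, num_classes))

    def forward(self, x):
        g = self.backbone(x)                       # features f_t(x)
        return g, self.source_head(g), self.target_head(g)
```

Only the `backbone` and `target_head` parameters would then be passed to the optimizer, matching the description above.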
Optionally, in some embodiments, performing training optimization with the trust- and consistency-based mechanism to obtain the adaptively learned source domain model includes: when performing model migration on the target domain data, inputting samples from the target domain data into the model to obtain probability distributions; selecting the label with the maximum probability as the pseudo label of the corresponding sample, and measuring the model's trust in the pseudo label with entropy; and ranking all samples in the target domain data by entropy from small to large to obtain trusted samples, and training the network with these trusted samples and their pseudo labels to obtain the adaptively learned source domain model.
Optionally, in some embodiments, performing training optimization with the trust- and consistency-based mechanism includes: extracting features of unlabeled samples in the target domain data and obtaining a first distribution prediction result and a second distribution prediction result from a preset first classifier and a preset second classifier; and calculating the information entropy of the model's predicted distributions based on the two prediction results to obtain the confidence of the model's predictions.
Specifically, during training, for an unlabeled target domain sample $x_t \in \mathcal{D}_t$, the embodiments of the present application use the feature extractor $f_t$ to extract the features of $x_t$ and then feed them into the two classifiers $h_s$ and $h_t$, obtaining two predicted distributions, i.e., $p_s(x_t) = h_s(f_t(x_t))$ and $p_t(x_t) = h_t(f_t(x_t))$. Next, the information entropy of each predicted distribution is calculated to represent the confidence of the model's prediction:

$$H_s(x_t) = -\sum_{y_t \in Y} p_s^{y_t}(x_t) \log p_s^{y_t}(x_t)$$

$$H_t(x_t) = -\sum_{y_t \in Y} p_t^{y_t}(x_t) \log p_t^{y_t}(x_t)$$

where $p_s^{y_t}(x_t)$ denotes the probability $P(y_t \mid x_t)$ that sample $x_t$ has label $y_t$, i.e., the probability value assigned to $y_t$ in the output $p_s(x_t)$ of classifier $h_s$; similarly, $p_t^{y_t}(x_t)$ denotes the same probability taken from the output $p_t(x_t)$ of classifier $h_t$.
Further, for any sample $x_t \in \mathcal{D}_t$, the embodiments of the present application obtain its prediction confidence, here computed with the target classifier $h_t$, i.e., $H_t(x_t)$. The samples are then ranked by this entropy from small to large, and the samples ranked in the top r% are selected to form the trusted sample set $\mathcal{D}_c$. For any trusted sample $\hat{x}_t \in \mathcal{D}_c$, the outputs of the model are $p_s(\hat{x}_t)$ and $p_t(\hat{x}_t)$, and the label with the maximum probability value is used as its pseudo label $y_t'$, i.e.:

$$y_t' = \arg\max_{y} \; p_t^{y}(\hat{x}_t)$$

$p_s(\hat{x}_t)$ may also be used here; experimental results show that the difference between the two is not great.
Finally, the embodiments of the present application may use only the samples $\hat{x}_t$ in the trusted sample set $\mathcal{D}_c$ and the resulting pseudo labels $y_t'$ to calculate the loss values of the two classifiers:

$$\mathcal{L}_s = -\mathbb{E}_{\hat{x}_t \in \mathcal{D}_c} \log p_s^{y_t'}(\hat{x}_t)$$

$$\mathcal{L}_t = -\mathbb{E}_{\hat{x}_t \in \mathcal{D}_c} \log p_t^{y_t'}(\hat{x}_t)$$

where $p_s^{y_t'}(\hat{x}_t)$ denotes the probability that sample $\hat{x}_t$ has label $y_t'$, i.e., the probability value assigned to $y_t'$ in the output of classifier $h_s$; similarly, $p_t^{y_t'}(\hat{x}_t)$ denotes the same probability taken from the output of classifier $h_t$. $\mathbb{E}$ is the expectation.
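The trusted-sample selection and the pseudo-label losses can be sketched as follows, assuming the `DualClassificationNetwork` above and a keep ratio `r` (r = 0.8 mirrors the top-80% choice mentioned later in the text); the function names are illustrative:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_trusted(net, loader, r: float = 0.8, device: str = "cuda"):
    """Rank target samples by the entropy H_t of the target classifier's prediction
    (smaller entropy = more trusted), keep the top r fraction, assign argmax pseudo labels."""
    net.eval()
    xs, ents, labels = [], [], []
    for x_t in loader:                                   # unlabeled target batches
        x_t = x_t.to(device)
        _, _, logits_t = net(x_t)
        p_t = F.softmax(logits_t, dim=1)
        ents.append(-(p_t * torch.log(p_t + 1e-8)).sum(dim=1).cpu())  # H_t(x_t)
        labels.append(p_t.argmax(dim=1).cpu())           # pseudo label y_t'
        xs.append(x_t.cpu())
    xs, ents, labels = torch.cat(xs), torch.cat(ents), torch.cat(labels)
    keep = ents.argsort()[: int(r * len(ents))]          # smallest entropy first
    return xs[keep], labels[keep]

def classifier_losses(net, x_hat, y_pseudo):
    """Cross-entropy losses L_s and L_t on a batch of trusted samples."""
    _, logits_s, logits_t = net(x_hat)
    return F.cross_entropy(logits_s, y_pseudo), F.cross_entropy(logits_t, y_pseudo)
```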
Optionally, in some embodiments, performing training optimization with the trust- and consistency-based mechanism includes: randomly rotating a trusted sample by a preset angle to obtain a new trusted sample; inputting the trusted sample and the new trusted sample into the dual classification network to obtain their respective features and predicted distributions; using preset loss functions to keep the features and probability distributions of the trusted sample and the new trusted sample consistent; and predicting the rotation angle of the new trusted sample relative to the trusted sample with a preset classification layer, and calculating the rotation-angle prediction loss.
Specifically, the embodiments of the present application may employ a self-supervised approach to enhance feature learning. For a given trusted sample $\hat{x}_t$, another version $\tilde{x}_t$ is obtained by randomly rotating it by a certain angle, which may be any one of 0, 90, 180, and 270 degrees. Then, $\hat{x}_t$ and $\tilde{x}_t$ are input into the dual classification network to obtain their respective features $\hat{g}_t = f_t(\hat{x}_t)$ and $\tilde{g}_t = f_t(\tilde{x}_t)$ and predicted distributions $p_s(\hat{x}_t)$, $p_t(\hat{x}_t)$, $p_s(\tilde{x}_t)$, and $p_t(\tilde{x}_t)$. Next, the following loss functions are used to force the features extracted by the model and the probability distributions to remain consistent between $\hat{x}_t$ and $\tilde{x}_t$:

$$\mathcal{L}_{fc} = -\mathbb{E}_{\hat{x}_t}\log\frac{\exp(\hat{g}_t^{\top}\tilde{g}_t)}{\sum_{j}\exp(\hat{g}_t^{\top}\tilde{g}_j)}$$

$$\mathcal{L}_{pc} = -\mathbb{E}_{\hat{x}_t}\left[\log p_s^{y_t'}(\tilde{x}_t) + \log p_t^{y_t'}(\tilde{x}_t)\right]$$

where, in the contrastive loss $\mathcal{L}_{fc}$, the rotated version $\tilde{g}_t$ of the same picture serves as the positive example of $\hat{g}_t$, while the rotated versions $\tilde{g}_j$ of other pictures serve as negative examples. In the consistency loss $\mathcal{L}_{pc}$, $p_s^{y_t'}(\tilde{x}_t)$ denotes the probability that the rotated sample $\tilde{x}_t$ has label $y_t'$, i.e., the probability value assigned to $y_t'$ in the output of classifier $h_s$; similarly, $p_t^{y_t'}(\tilde{x}_t)$ is the corresponding probability from the output of classifier $h_t$. $\mathbb{E}$ is the expectation.
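A sketch of these two consistency losses under the same assumptions (the feature normalization and batch-negative choice are illustrative design decisions, not confirmed details of the patent):

```python
import torch
import torch.nn.functional as F

def rotate_batch(x):
    """Rotate each image in a BCHW batch by a random multiple of 90 degrees."""
    k = torch.randint(0, 4, (x.size(0),))
    x_rot = torch.stack([torch.rot90(img, int(ki), dims=(1, 2)) for img, ki in zip(x, k)])
    return x_rot, k                                      # k in {0,1,2,3} is the angle id

def consistency_losses(net, x_hat, y_pseudo):
    """Contrastive feature consistency plus pseudo-label prediction consistency."""
    x_rot, _ = rotate_batch(x_hat)
    g_hat, _, _ = net(x_hat)
    g_rot, logits_s_rot, logits_t_rot = net(x_rot)
    # Contrastive: the rotated view of the same image is the positive,
    # rotated views of the other images in the batch are negatives.
    sim = F.normalize(g_hat, dim=1) @ F.normalize(g_rot, dim=1).t()
    targets = torch.arange(x_hat.size(0), device=x_hat.device)
    loss_feat = F.cross_entropy(sim, targets)            # L_fc
    # Prediction consistency: the rotated view keeps the same pseudo label.
    loss_prob = F.cross_entropy(logits_s_rot, y_pseudo) + \
                F.cross_entropy(logits_t_rot, y_pseudo)  # L_pc
    return loss_feat, loss_prob
```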
In addition, the embodiments of the present application may use an additional classification layer $h_r$ to predict the rotation angle of $\tilde{x}_t$ relative to $\hat{x}_t$. The predicted distribution of the classification layer $h_r$ is:

$$p_r = h_r([\hat{g}_t, \tilde{g}_t])$$

where $[a, b]$ indicates connecting the two vectors together to form a new vector. The relative rotation angle prediction loss is then calculated:

$$\mathcal{L}_{rot} = -\mathbb{E}_{\hat{x}_t} \log p_r^{y_r}([\hat{g}_t, \tilde{g}_t])$$

where $y_r$ is the relative rotation angle, taking one of the values 0 to 3, corresponding to 0, 90, 180, and 270 degrees, respectively, and $p_r^{y_r}([\hat{g}_t, \tilde{g}_t])$ is the probability value assigned to $y_r$ in the output of classifier $h_r$.
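A minimal sketch of the rotation-prediction head $h_r$, with the feature dimension again an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotationHead(nn.Module):
    """h_r: predicts the relative rotation (4 classes) from concatenated features."""
    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        self.fc = nn.Linear(2 * feat_dim, 4)

    def forward(self, g_hat, g_rot):
        # p_r = h_r([g_hat, g_rot]): concatenation followed by a linear layer.
        return self.fc(torch.cat([g_hat, g_rot], dim=1))

def rotation_loss(head, g_hat, g_rot, y_rot):
    """Cross-entropy between the predicted and actual relative rotation id (0-3)."""
    return F.cross_entropy(head(g_hat, g_rot), y_rot)
```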
Further, in order to encourage the model to output a confident probability distribution, for any sample $x_t \in \mathcal{D}_t$ an information entropy loss function is calculated:

$$\mathcal{L}_{ent} = \mathbb{E}_{x_t}\left[H_s(x_t) + H_t(x_t)\right]$$

with $H_s(x_t) = -\mathrm{sum}\left(p_s(x_t)\log p_s(x_t)\right)$ and $H_t(x_t) = -\mathrm{sum}\left(p_t(x_t)\log p_t(x_t)\right)$, where the sum function denotes the addition of all elements of a vector.
Finally, all of the above loss functions are integrated, and balance factors are introduced to adjust the gradient contribution of each loss:

$$\mathcal{L} = \mathcal{L}_s + \mathcal{L}_t + \lambda_{fc}\mathcal{L}_{fc} + \lambda_{pc}\mathcal{L}_{pc} + \lambda_{rot}\mathcal{L}_{rot} + \lambda_{ent}\mathcal{L}_{ent}$$

where $\lambda_{fc}$, $\lambda_{pc}$, $\lambda_{rot}$, and $\lambda_{ent}$ are the balance factors.
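Putting the pieces together, one adaptation step over a batch of trusted samples might look as follows; the helper functions are the sketches above, and the balance factor values are assumptions:

```python
import torch.nn.functional as F

def adaptation_step(net, rot_head, optimizer, x_hat, y_pseudo,
                    lambdas=(1.0, 1.0, 1.0, 1.0)):
    """One optimization step combining all losses, reusing the helpers sketched above."""
    l_fc, l_pc, l_rot, l_ent = lambdas
    loss_s, loss_t = classifier_losses(net, x_hat, y_pseudo)
    loss_fc, loss_pc = consistency_losses(net, x_hat, y_pseudo)
    x_rot, y_rot = rotate_batch(x_hat)                   # a fresh random rotation
    g_hat, logits_s, logits_t = net(x_hat)
    g_rot, _, _ = net(x_rot)
    loss_rot = rotation_loss(rot_head, g_hat, g_rot, y_rot.to(x_hat.device))
    # Entropy term (here over the current batch; the text computes it on any sample x_t).
    ent = -(F.softmax(logits_s, 1) * F.log_softmax(logits_s, 1)).sum(1).mean() \
          - (F.softmax(logits_t, 1) * F.log_softmax(logits_t, 1)).sum(1).mean()
    loss = loss_s + loss_t + l_fc * loss_fc + l_pc * loss_pc \
           + l_rot * loss_rot + l_ent * ent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```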
in summary, the model migration method based on trust and consistency in the embodiment of the present application adopts a dual classification network, and introduces a trust and consistency mechanism to train and optimize the dual classification network, wherein a basic framework of the model migration method includes two stages, namely model pre-training on a source domain and model adaptive learning on a target domain, wherein:
model pre-training on the source domain is: giving labeled source domain data, using a convolutional neural network as a feature extractor, providing features of a source domain image, and then using a source domain classification layer for label prediction, wherein the source domain classification layer consists of a full connection layer and a weight specification layer; and (4) performing training optimization by using a cross entropy loss function to obtain a source domain model.
Model adaptive learning on the target domain is: given a pre-trained source domain model and unlabeled target domain data, model adaptive learning is performed using a dual classification network, and training optimization is performed using a trust and consistency-based mechanism.
The trust-based optimization mechanism specifically comprises the following steps: when model migration is performed on target domain data, firstly, a target domain sample is input into a model to obtain probability distribution, then a label with the maximum probability is selected as a pseudo label of the sample, in addition, the trust degree of the model on the pseudo label is measured by using entropy, and the smaller the entropy is, the more the model trusts the label. We rank all target samples from small to large according to entropy, choose the top 80% of the samples as trustworthy samples, and then train the network with these samples and their pseudo-labels.
The optimization mechanism based on consistency is specifically as follows: for each picture sample on the target domain, we randomly rotate an angle, which can be any number of 0 degrees, 90 degrees, 180 degrees and 270 degrees, to get another picture. The two pictures present the same information from different angles, but the features and probability distributions obtained after the two pictures pass through the dual classification network are different, so that the embodiment of the application uses contrast loss to approximate the feature distance of the two pictures and uses cross entropy loss to approximate the probability distributions of the two pictures, thereby ensuring that the two pictures keep consistency on the features and the prediction results.
Therefore, the model migration method based on trust and consistency in the embodiment of the application can enhance the migration and the adaptability of the model, and is greatly helpful for the feature representation learning of the sample, as can be seen from fig. 3, the embodiment of the application can significantly improve the performance of unsupervised field self-adaptation irrelevant to source domain data, and on a common standard evaluation set of VisDA, compared with a leading method SHOT, the performance can be improved by 2.0% under the same condition, even if compared with the SHOT + + using an additional training method, the application also has the improvement of 0.2%, and the effectiveness is fully proved.
According to the model migration method based on trust and consistency, provided by the embodiment of the application, the characteristics of the source domain image can be provided by using a convolutional neural network as a characteristic extractor based on preset labeled source domain data, and label prediction is performed by using a source domain classification layer, wherein the source domain classification layer consists of a full connection layer and a weight specification layer, training optimization is performed by using a cross entropy loss function to obtain a pre-trained source domain model, model self-adaptive learning is performed by using a dual classification network based on the pre-trained source domain model and unlabeled target domain data, and training optimization is performed by using a mechanism based on trust and consistency to obtain a self-adaptively learned source domain model. Therefore, the problem of field self-adaption under the condition of source domain data missing is solved, namely, the source domain model and the target domain data without labels are given, and the target domain self-adaption learning is carried out through a model migration method, so that the unsupervised learning is realized, and the self-adaption capability of the model is obviously improved.
Next, a trust and consistency-based model migration apparatus proposed according to an embodiment of the present application is described with reference to the drawings.
FIG. 4 is a block diagram of a trust and consistency based model migration apparatus according to an embodiment of the present application.
As shown in fig. 4, the trust and consistency-based model migration apparatus 10 includes: an extraction module 100, an optimization module 200 and an acquisition module 300.
The extraction module 100 is configured to use a convolutional neural network as a feature extractor to extract features of source domain images based on preset labeled source domain data;
the optimization module 200 is configured to perform label prediction with a source domain classification layer, wherein the source domain classification layer consists of a fully connected layer and a weight normalization layer, and to perform training optimization with a cross-entropy loss function to obtain a pre-trained source domain model; and
the acquisition module 300 is configured to perform model adaptive learning with a dual classification network based on the pre-trained source domain model and unlabeled target domain data, and to perform training optimization with a trust- and consistency-based mechanism to obtain an adaptively learned source domain model.
Optionally, the optimization module 200 is specifically configured to:
when performing model migration on the target domain data, input samples from the target domain data into the model to obtain probability distributions;
select the label with the maximum probability as the pseudo label of the corresponding sample, and measure the model's trust in the pseudo label with entropy;
rank all samples in the target domain data by entropy from small to large to obtain trusted samples, and train the network with the trusted samples and their corresponding pseudo labels to obtain the adaptively learned source domain model.
Optionally, before performing model adaptive learning with the dual classification network, the acquisition module 300 is further configured to:
construct the dual classification network, wherein the dual classification network comprises a feature extractor and a dual classification head; during training, the parameters of the source classifier are fixed, while the feature extractor and the target classifier are updated by stochastic gradient descent.
Optionally, the optimization module 200 is specifically configured to:
extract features of unlabeled samples in the target domain data, and obtain a first distribution prediction result and a second distribution prediction result from a preset first classifier and a preset second classifier;
calculate the information entropy of the model's predicted distributions based on the first and second distribution prediction results to obtain the confidence of the model's predictions.
Optionally, the optimization module 200 is specifically configured to:
randomly rotate a trusted sample by a preset angle to obtain a new trusted sample;
input the trusted sample and the new trusted sample into the dual classification network to obtain their respective features and predicted distribution results;
use preset loss functions to keep the features and probability distributions of the trusted sample and the new trusted sample consistent;
predict the rotation angle of the new trusted sample relative to the trusted sample with a preset classification layer, and calculate the rotation-angle prediction loss.
It should be noted that the foregoing explanation of the embodiment of the trust and consistency-based model migration method also applies to the trust and consistency-based model migration apparatus of this embodiment, and details are not repeated here.
According to the model migration apparatus based on trust and consistency provided by the embodiments of the present application, based on preset labeled source domain data, a convolutional neural network can be used as a feature extractor to extract features of source domain images, and label prediction can be performed with a source domain classification layer consisting of a fully connected layer and a weight normalization layer; training optimization with a cross-entropy loss function yields a pre-trained source domain model. Then, based on the pre-trained source domain model and unlabeled target domain data, model adaptive learning is performed with a dual classification network, and training optimization is performed with the trust- and consistency-based mechanism to obtain an adaptively learned source domain model. This solves the problem of domain adaptation when source domain data are missing: given only a source domain model and unlabeled target domain data, target domain adaptive learning is performed through the model migration method, realizing unsupervised learning and significantly improving the adaptability of the model.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 501, processor 502, and computer programs stored on memory 501 and executable on processor 502.
The processor 502, when executing a program, implements the trust and consistency based model migration method provided in the embodiments described above.
Further, the electronic device further includes:
a communication interface 503 for communication between the memory 501 and the processor 502.
A memory 501 for storing computer programs that can be run on the processor 502.
The memory 501 may comprise high-speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
If the memory 501, the processor 502 and the communication interface 503 are implemented independently, the communication interface 503, the memory 501 and the processor 502 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Alternatively, in practical implementation, if the memory 501, the processor 502 and the communication interface 503 are integrated on a chip, the memory 501, the processor 502 and the communication interface 503 may complete communication with each other through an internal interface.
The processor 502 may be a Central Processing Unit (CPU), an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
The present embodiments also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the trust and consistency based model migration method as described above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions, and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (12)

1. A model migration method based on trust and consistency, characterized by comprising the following steps:
based on preset labeled source domain data, using a convolutional neural network as a feature extractor to extract features of source domain images;
performing label prediction with a source domain classification layer, wherein the source domain classification layer consists of a fully connected layer and a weight normalization layer, and performing training optimization with a cross-entropy loss function to obtain a pre-trained source domain model; and
based on the pre-trained source domain model and unlabeled target domain data, performing model adaptive learning with a dual classification network, and performing training optimization with a trust- and consistency-based mechanism to obtain an adaptively learned source domain model.
2. The method of claim 1, wherein performing training optimization with the trust- and consistency-based mechanism to obtain the adaptively learned source domain model comprises:
when performing model migration on the target domain data, inputting samples from the target domain data into the model to obtain probability distributions;
selecting the label with the maximum probability as the pseudo label of the corresponding sample, and measuring the model's trust in the pseudo label with entropy;
ranking all samples in the target domain data by entropy from small to large to obtain trusted samples, and training the network with the trusted samples and their corresponding pseudo labels to obtain the adaptively learned source domain model.
3. The method of claim 2, further comprising, before performing model adaptive learning with the dual classification network:
constructing the dual classification network, wherein the dual classification network comprises a feature extractor and a dual classification head, the parameters of the source classifier are fixed during training, and the feature extractor and the target classifier are updated by stochastic gradient descent.
4. The method of claim 3, wherein performing training optimization with the trust- and consistency-based mechanism comprises:
extracting features of unlabeled samples in the target domain data, and obtaining a first distribution prediction result and a second distribution prediction result from a preset first classifier and a preset second classifier; and
calculating the information entropy of the model's predicted distributions based on the first and second distribution prediction results to obtain the confidence of the model's predictions.
5. The method of claim 3 or 4, wherein performing training optimization with the trust- and consistency-based mechanism comprises:
randomly rotating a trusted sample by a preset angle to obtain a new trusted sample;
inputting the trusted sample and the new trusted sample into the dual classification network to obtain their respective features and predicted distribution results;
using preset loss functions to keep the features and probability distributions of the trusted sample and the new trusted sample consistent; and
predicting the rotation angle of the new trusted sample relative to the trusted sample with a preset classification layer, and calculating the rotation-angle prediction loss.
6. A model migration apparatus based on trust and consistency, comprising:
the extraction module is used for extracting the characteristics of the source domain image by using a convolutional neural network as a characteristic extractor based on preset labeled source domain data;
the optimization module is used for predicting the label by using a source domain classification layer, wherein the source domain classification layer consists of a full connection layer and a weight specification layer, and a cross entropy loss function is used for training and optimizing to obtain the pre-trained source domain model; and
and the acquisition module is used for carrying out model self-adaptive learning by using a dual classification network based on the pre-trained source domain model and the non-labeled target domain data, and carrying out training optimization by using a trust and consistency-based mechanism to obtain a source domain model after self-adaptive learning.
7. The apparatus of claim 6, wherein the optimization module is specifically configured to:
when model migration is carried out on the target domain data, inputting samples in the target domain data into a model to obtain probability distribution;
selecting a label with the maximum probability as a pseudo label of a corresponding sample, and measuring the trust degree of a model to the pseudo label by using entropy;
and sequencing all samples in the target domain data from small to large according to the entropy generated by the trust degree to obtain trusted samples, and training the network by using the trusted samples and corresponding pseudo labels to obtain the source domain model after the self-adaptive learning.
8. The apparatus of claim 7, wherein prior to performing model adaptive learning with the dual classification network, the acquisition module is further configured to:
construct the dual classification network, wherein the dual classification network comprises a feature extractor and dual classification heads, parameters of the source classifier are fixed during training, and the feature extractor and the target classifier are updated through stochastic gradient descent.
9. The apparatus of claim 8, wherein the optimization module is specifically configured to:
extract features of the unlabeled samples in the target domain data, and obtain a first prediction distribution and a second prediction distribution based on a preset first classifier and a preset second classifier;
and calculate the information entropy of the model's prediction distribution based on the first prediction distribution and the second prediction distribution, so as to obtain the trust degree of the model prediction.
10. The apparatus of claim 8 or 9, wherein the optimization module is specifically configured to:
randomly rotate the trusted sample by a preset angle to obtain a new trusted sample;
input the trusted sample and the new trusted sample into the dual classification network, and obtain the features and prediction distributions of the trusted sample and the new trusted sample, respectively;
use a preset loss function to make the features and probability distributions of the trusted sample and the new trusted sample consistent;
and predict the rotation angle of the new trusted sample relative to the trusted sample by using a preset classification layer, and calculate the prediction loss of the rotation angle.
11. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the trust and consistency-based model migration method of any one of claims 1 to 5.
12. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the trust and consistency-based model migration method of any one of claims 1 to 5.
CN202210023290.6A 2022-01-10 2022-01-10 Model migration method, device, equipment and medium based on trust and consistency Pending CN114528913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210023290.6A CN114528913A (en) 2022-01-10 2022-01-10 Model migration method, device, equipment and medium based on trust and consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210023290.6A CN114528913A (en) 2022-01-10 2022-01-10 Model migration method, device, equipment and medium based on trust and consistency

Publications (1)

Publication Number Publication Date
CN114528913A true CN114528913A (en) 2022-05-24

Family

ID=81620948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210023290.6A Pending CN114528913A (en) 2022-01-10 2022-01-10 Model migration method, device, equipment and medium based on trust and consistency

Country Status (1)

Country Link
CN (1) CN114528913A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115186773A (en) * 2022-09-13 2022-10-14 杭州涿溪脑与智能研究所 Passive active field self-adaptive model training method and device
CN115186773B (en) * 2022-09-13 2022-12-09 杭州涿溪脑与智能研究所 Passive active field adaptive model training method and device
CN116468959A (en) * 2023-06-15 2023-07-21 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium
CN116468959B (en) * 2023-06-15 2023-09-08 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112990280B (en) Class increment classification method, system, device and medium for image big data
WO2021140426A1 (en) Uncertainty guided semi-supervised neural network training for image classification
CN111666993A (en) Medical image sample screening method and device, computer equipment and storage medium
CN114528913A (en) Model migration method, device, equipment and medium based on trust and consistency
US20220092407A1 (en) Transfer learning with machine learning systems
CN110738235B (en) Pulmonary tuberculosis judging method, device, computer equipment and storage medium
CN111583199B (en) Sample image labeling method, device, computer equipment and storage medium
CN114330588A (en) Picture classification method, picture classification model training method and related device
WO2022121544A1 (en) Normalizing oct image data
CN114663687A (en) Model training method, target recognition method, device, equipment and storage medium
CN112733724B (en) Relativity relationship verification method and device based on discrimination sample meta-digger
Struski et al. ProMIL: Probabilistic multiple instance learning for medical imaging
CN115661502A (en) Image processing method, electronic device, and storage medium
CN113222053A (en) Malicious software family classification method, system and medium based on RGB image and Stacking multi-model fusion
Wu et al. Practical and efficient model extraction of sentiment analysis APIs
CN108364067B (en) Deep learning method based on data segmentation and robot system
CN112446231A (en) Pedestrian crossing detection method and device, computer equipment and storage medium
US20220180200A1 (en) Unsupervised domain adaptation using joint loss and model parameter search
Sirhan et al. Multilabel CNN model for asphalt distress classification
CN116992937A (en) Neural network model restoration method and related equipment
Ali et al. Diagnosing COVID-19 Lung Inflammation Using Machine Learning Algorithms: A Comparative Study
Mercy Rajaselvi Beaulah et al. Categorization of images using autoencoder hashing and training of intra bin classifiers for image classification and annotation
Gallée et al. Interpretable Medical Image Classification Using Prototype Learning and Privileged Information
JP2020181265A (en) Information processing device, system, information processing method, and program
CN115631391B (en) Image selection method and device based on deep active learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination