CN114998602A - Domain adaptive learning method and system based on low confidence sample contrast loss - Google Patents


Info

Publication number
CN114998602A
CN114998602A (application CN202210942337.9A)
Authority
CN
China
Prior art keywords
image
enhanced view
domain
features
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210942337.9A
Other languages
Chinese (zh)
Other versions
CN114998602B (en)
Inventor
王子磊
张燚鑫
贺伟男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202210942337.9A
Publication of CN114998602A
Application granted
Publication of CN114998602B
Legal status: Active
Anticipated expiration

Classifications

    • All classifications fall under G (Physics), G06 (Computing; Calculating or Counting), G06V (Image or Video Recognition or Understanding), G06V10/00 (Arrangements for image or video recognition or understanding):
    • G06V10/40 Extraction of image or video features
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/764 Recognition or understanding using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V10/82 Recognition or understanding using neural networks

Abstract

The invention discloses a domain adaptive learning method and system based on low-confidence sample contrast loss. Using contrast learning, the method fully exploits low-confidence target-domain samples on top of existing domain adaptation methods that use only high-confidence target-domain samples, preventing the image classification model from achieving a suboptimal domain transfer effect by being biased toward target-domain samples that resemble the source domain. In contrast learning, the original image features are re-represented, so that task-specific semantic information is encoded better. In addition, cross-domain mixing is applied to low-confidence samples such that the target-domain low-confidence samples dominate the mixture, which reduces the domain difference and lets the image classification model better learn domain-invariant features. Overall, by exploiting low-confidence samples, the method and system improve the accuracy of both unsupervised and semi-supervised domain adaptive image classification.

Description

Domain adaptive learning method and system based on low confidence sample contrast loss
Technical Field
The invention relates to the field of image classification, in particular to a domain adaptive learning method and system based on low confidence sample contrast loss.
Background
In recent years, deep neural networks have proved highly effective on various machine learning problems; however, their superior performance largely relies on large, high-quality labeled datasets, and the high time and labor costs make manually labeling such datasets impractical. Due to domain shift, traditional deep learning methods also do not generalize well to new datasets. In this regard, domain adaptation uses knowledge learned on a source domain with many labeled samples to assist model learning on a related but unlabeled target domain, saving labeling cost by reducing the domain shift. Domain adaptation can be divided into unsupervised and semi-supervised domain adaptation according to whether the target-domain samples carry labels.
A common approach to handling domain shift is to make the model learn domain-invariant features. Existing domain adaptation methods are typically based on inter-domain difference metrics or on adversarial training. Chinese patent application CN113011456A, "unsupervised domain adaptation method based on class adaptive model for image classification", builds a domain-transferable encoder from a self-attention module and a cross-attention module to achieve intra-domain and inter-domain alignment, and builds a class-adaptive decoder to reduce domain differences through class prototype learning and alignment. Chinese patent application CN113011523A, "unsupervised depth domain adaptation method based on distributed countermeasure", performs feature distribution matching on the fully connected layers of a classifier, uses MK-MMD (multi-kernel maximum mean discrepancy) to measure the feature distribution difference between the domains, and builds two fully connected networks after the convolutional layers as domain discriminators for domain adversarial training. Chinese patent application CN113673555A uses a neural network model to extract features of the pictures in a dataset, uses a clustering algorithm and a memory to store source-domain and target-domain features class by class, and trains the neural network with the similarity between the source-domain and target-domain memory distributions as a constraint.
Chinese patent application CN113283489A, "classification method for semi-supervised domain adaptive learning based on joint distribution matching", measures the difference between the source and target sample distributions with a preset kernel-based algorithm and draws the joint distributions of the target and source domains closer. Chinese patent application CN113378632A, "unsupervised domain adaptive pedestrian re-identification algorithm based on pseudo label optimization", uses an auxiliary classifier structure and computes the KL divergence (relative entropy) between the class prediction vectors output by the auxiliary and main classifier structures to obtain more reliable pseudo labels. Chinese patent application CN113610105A, "unsupervised domain adaptive image classification method based on dynamic weighted learning and meta learning", optimizes the network model parameters by weighting the samples, dynamically adjusting the weights of the domain alignment loss and the classification loss, and computing both losses through meta learning, thereby promoting optimization consistency between the domain alignment task and the classification task.
However, existing domain adaptation methods, on the one hand, do not explore the inherent structure of the unlabeled target domain; on the other hand, they use various criteria to screen out high-confidence samples while completely ignoring low-confidence samples. Without the ignored low-confidence samples, the retained data cannot reflect the structure of the real target-domain distribution, so the image classification model is biased toward high-confidence samples and its classification accuracy after domain adaptive learning is poor.
Disclosure of Invention
The invention aims to provide a domain adaptive learning method and system based on low confidence sample contrast loss, which are used for performing contrast learning by using low confidence samples and are beneficial to improving the classification accuracy of an image classification model after domain adaptive learning.
The purpose of the invention is realized by the following technical scheme:
a domain adaptive learning method based on low confidence sample contrast loss comprises the following steps:
screening out a low-confidence sample set from the target domain image set according to a set threshold;
for each low-confidence sample image, obtaining two different enhanced view images, namely a first enhanced view image and a second enhanced view image, by using a data enhancement method; randomly selecting a source domain sample image from a source domain image set and obtaining two different enhanced view images, namely a third enhanced view image and a fourth enhanced view image, by using the data enhancement method;
mixing the first enhanced view image and the third enhanced view image to form a query image, inputting the query image into a first image classification model, and performing image feature extraction and re-representation through the first image classification model to obtain a first re-representation feature; inputting the second enhanced view image and the fourth enhanced view image into a second image classification model, and performing image feature extraction and re-representation through the second image classification model to obtain the corresponding re-representation features; blending the re-representation feature of the second enhanced view image with that of the fourth enhanced view image to form a blended re-representation feature;
and taking the first re-representation feature as a query feature, taking all the rest re-representation features as comparison features, constructing a comparison loss by using the difference between the query feature and each comparison feature, and constructing a total loss function by combining the basic loss of the first image classification model to train the first image classification model.
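The exact form of the contrast loss is not fixed by the steps above. As a hedged sketch, one common instantiation is an InfoNCE-style loss in which the query feature is pulled toward one designated contrast feature (here assumed to be the blended re-representation feature of the same sample) and pushed away from the remaining ones; the function name, the positive/negative pairing, and the temperature value are illustrative assumptions, not the patent's definitive formulation:

```python
import numpy as np

def contrastive_loss(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss: pull query toward the positive feature,
    push it away from negative features.

    query:     (d,)   re-representation feature of the mixed query image
    positive:  (d,)   assumed positive (e.g. the blended re-representation feature)
    negatives: (n, d) remaining re-representation features used for contrast
    """
    logits = np.concatenate([[query @ positive], negatives @ query]) / temperature
    logits -= logits.max()  # shift for numerical stability
    # negative log-probability of the positive among all contrast features
    return -(logits[0] - np.log(np.exp(logits).sum()))
```

The loss is near zero when the query matches the positive and is dissimilar to all negatives, and grows as a negative feature becomes more similar to the query than the positive is.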
A low confidence sample contrast loss based domain adaptive learning system, comprising:
the low confidence sample set generating unit is used for screening out a low confidence sample set from the target domain image set according to a set threshold;
an enhanced view image generation unit, configured to, for each low confidence sample image, obtain two different enhanced view images, referred to as a first enhanced view image and a second enhanced view image, using a data enhancement method, randomly select a source domain sample image from a source domain image set, and obtain two different enhanced view images, referred to as a third enhanced view image and a fourth enhanced view image, using the data enhancement method;
the re-representation feature acquisition unit is used for mixing the first enhanced view image and the third enhanced view image to form a query image, inputting the query image into a first image classification model, and performing image feature extraction and re-representation through the first image classification model to obtain a first re-representation feature; inputting the second enhanced view image and the fourth enhanced view image into a second image classification model, and performing image feature extraction and re-representation through the second image classification model to obtain the corresponding re-representation features; blending the re-representation feature of the second enhanced view image with that of the fourth enhanced view image to form a blended re-representation feature;
and the total loss function construction and model training unit is used for constructing a comparison loss by using the difference between the query feature and each comparison feature and training the first image classification model by combining the basic loss construction total loss function of the first image classification model by using the first re-representation feature as a query feature and all the rest re-representation features as comparison features.
A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.
A readable storage medium, storing a computer program which, when executed by a processor, implements the aforementioned method.
The technical scheme provided by the invention shows that: (1) by using contrast learning, low-confidence target-domain samples are fully exploited on top of existing domain adaptation methods that use only high-confidence target-domain samples, preventing the image classification model from achieving a suboptimal domain transfer effect by being biased toward target-domain samples that resemble the source domain; (2) in contrast learning, the original image features are re-represented, so that task-specific semantic information is encoded better; (3) cross-domain mixing is applied to low-confidence samples such that the target-domain low-confidence samples dominate the mixture, which reduces the domain difference and lets the image classification model better learn domain-invariant features. Overall, by exploiting low-confidence samples, the method and system improve the accuracy of both unsupervised and semi-supervised domain adaptive image classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram of a domain adaptive learning method based on low confidence sample contrast loss according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the average similarity between the same type of samples and different types of samples according to an embodiment of the present invention;
FIG. 3 is a flow chart of the calculation of contrast loss according to the embodiment of the present invention;
FIG. 4 is a process diagram of re-representation of features provided by an embodiment of the present invention;
fig. 5 is a flowchart of calculating cross entropy loss of KLD regularization terms and high confidence samples according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a domain adaptive learning system based on low confidence sample contrast loss according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
First, terms that may be used herein are explained as follows: the terms "comprising," "including," "containing," "having," or other similar terms of meaning should be construed as non-exclusive inclusions. For example: including a feature (e.g., material, component, ingredient, carrier, formulation, material, dimension, part, component, mechanism, device, process, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product, or article of manufacture), is to be construed as including not only the particular feature explicitly listed but also other features not explicitly listed as such which are known in the art.
The invention discloses a domain adaptive learning scheme utilizing low-confidence sample contrast loss, which aims to solve the problem of limited accuracy of the existing domain adaptive image classification method and is applicable to unsupervised domain adaptation (namely training data in a target domain are unlabeled) and semi-supervised domain adaptation (namely the training data in the target domain comprise a small part of labeled data and a large part of unlabeled data). The following describes a domain adaptive learning scheme based on low confidence sample contrast loss according to the present invention in detail. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to a person skilled in the art. Those not specifically mentioned in the examples of the present invention were carried out according to the conventional conditions in the art or conditions suggested by the manufacturer.
Example one
The embodiment of the invention provides a domain adaptive learning method based on low confidence sample contrast loss, which mainly comprises the following steps as shown in figure 1:
step 1, screening a low-confidence sample set from a target domain image set according to a set threshold value.
In the embodiment of the invention, sample images with low confidence are used for contrast learning. A sample image is judged to have low confidence if the maximum value of its output probability is smaller than a given threshold $\tau$; specifically, the output probability of the second image classification model is used.
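A minimal sketch of this screening step (the function name and the threshold value 0.95 are illustrative; the patent only specifies "a set threshold" on the maximum output probability of the second image classification model):

```python
import numpy as np

def split_by_confidence(probs, tau=0.95):
    """Split target-domain samples by the max of the model's output probability.

    probs: (n, k) softmax outputs of the second (teacher) image classification model.
    Returns boolean masks (low_conf, high_conf); tau is an assumed value.
    """
    conf = probs.max(axis=1)        # maximum class probability per sample
    low = conf < tau                # low confidence: below the set threshold
    return low, ~low
```

The low-confidence mask selects the sample set used for the contrast loss; the high-confidence mask corresponds to the samples used by the basic (pseudo-label) losses.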
Through preliminary experiments, the invention finds that the average similarity $s_{same}$ between low-confidence sample images belonging to the same class is low, while the average similarity $s_{diff}$ between low-confidence sample images belonging to different classes is higher, as shown in fig. 2. The two average similarities are defined as:

$$s_{same} = \mathbb{E}_{x_i, x_j \sim \mathcal{D},\, y_i = y_j}\left[ f_i^{T} f_j \right]$$

$$s_{diff} = \mathbb{E}_{x_i, x_j \sim \mathcal{D},\, y_i \neq y_j}\left[ f_i^{T} f_j \right]$$

wherein $x_i$ and $x_j$ represent two low-confidence sample images screened from the target domain image set, $y_i$ and $y_j$ are their class labels ($y_i = y_j$ meaning the two images belong to the same class and $y_i \neq y_j$ to different classes), and $f_i$ and $f_j$ are their image features; $\mathbb{E}$ is the mathematical expectation and $T$ denotes the transpose. The sampling set $\mathcal{D}$ can be $\mathcal{D}_t$ (the unsupervised target domain), $\mathcal{D}_t^{l}$ or $\mathcal{D}_t^{h}$ (the superscripts indicating low and high confidence, respectively). From this result, it is reasonable to apply the contrast loss only to low-confidence samples, since this reduces the adverse effect of samples of the same class being treated as negatives in the contrast loss.
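The two expectations above can be estimated empirically over a batch of features. A small sketch (function name illustrative; features are assumed L2-normalised so the inner product is the cosine similarity):

```python
import numpy as np

def average_similarities(feats, labels):
    """Mean pairwise inner-product similarity within and across classes.

    feats:  (n, d) L2-normalised image features of low-confidence samples
    labels: (n,)   class labels (pseudo or ground-truth)
    Returns (s_same, s_diff), the empirical versions of the two expectations.
    """
    sims = feats @ feats.T                         # all pairwise similarities
    same = labels[:, None] == labels[None, :]      # same-class indicator
    off_diag = ~np.eye(len(labels), dtype=bool)    # exclude self-pairs
    s_same = sims[same & off_diag].mean()
    s_diff = sims[~same].mean()
    return s_same, s_diff
```

On real data the observation reported above corresponds to s_same being low while s_diff is comparatively high for the low-confidence subset.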
Step 2, for each low-confidence sample image of the target domain, obtaining two different enhanced view images, called the first enhanced view image and the second enhanced view image, using a data enhancement method; randomly selecting a source domain sample image from the source domain image set and obtaining two different enhanced view images, called the third enhanced view image and the fourth enhanced view image, using the data enhancement method.
This step mainly processes each low-confidence sample image (a target-domain image) and a source-domain sample image to obtain different enhanced views, which serve as the basic data for low-confidence sample contrast learning; for the data enhancement method itself, reference can be made to conventional techniques, which are not detailed here.
Step 3, mixing the first enhanced view image and the third enhanced view image to form a query image, inputting the query image into a first image classification model, and performing image feature extraction and re-representation through the first image classification model to obtain a first re-representation feature; inputting the second enhanced view image and the fourth enhanced view image into a second image classification model, and performing image feature extraction and re-representation through the second image classification model to obtain the corresponding re-representation features; blending the re-representation feature of the second enhanced view image with that of the fourth enhanced view image to form a blended re-representation feature.
This step is mainly to obtain re-representation characteristics of each image.
Because the existing contrast learning process only considers the feature-space structure of the target domain and ignores the domain difference, the embodiment of the invention proposes cross-domain hybrid contrast learning for learning domain-invariant features: the first enhanced view image and the third enhanced view image are mixed to serve as the query image. Moreover, the query image, the second enhanced view image and the fourth enhanced view image are processed by the two image classification models respectively to obtain the corresponding re-representation features, so that task-specific semantic information can be encoded better. Furthermore, the two re-representation features produced by the second image classification model are blended.
Step 4, taking the first re-representation feature as the query feature and all the remaining re-representation features as contrast features, constructing a contrast loss from the difference between the query feature and each contrast feature, and combining it with the basic loss of the first image classification model to construct a total loss function for training the first image classification model.
This step constructs the cross-domain hybrid contrast loss based on the structure built in the previous step, and trains the first image classification model in combination with the basic loss function.
In the embodiment of the invention, the first image classification model and the second image classification model have the same structure and respectively comprise a feature extractor, a re-representation module and a classifier. The model training in the embodiment of the invention mainly updates the parameters of the first image classification model, and then generates the parameters of the second image classification model by using Exponential Moving Average (EMA) according to the parameters of the first image classification model. The implementation of the feature extractor and the classifier can refer to the conventional technology, and the present invention is not described in detail.
In order to show the technical solutions and effects provided by the present invention more clearly, the domain adaptive learning method based on low-confidence sample contrast loss is described in detail below with specific embodiments. Since the overall domain adaptive learning involves two losses, namely the contrast loss and the basic loss, the calculation of the two losses is described first, followed by the total loss function. It should be noted that the specific model structures, frame formats, specific parameter values, etc. mentioned below are exemplary and not limiting.
First, the contrast loss.
1. Model structure.
As shown in fig. 3, the main flow of contrast learning with low-confidence samples is as follows. The left part shows the relevant images; the image content is only an example. A teacher-student architecture is adopted: the first image classification model acts as the student model and the second image classification model as the teacher model. As described earlier, the structures of the two image classification models are identical, but the parameters of the teacher model are generated by an exponential moving average (EMA) of the student model's parameters, with the decay coefficient set, for example, to 0.99. In addition, the input of the classifier in each image classification model is the original features (the calculation method is described later), and its output is mainly used for calculating the basic loss, so the classifier is not shown in fig. 3.
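A minimal sketch of the EMA update of the teacher (second) model from the student (first) model, with the decay coefficient 0.99 suggested in the text (the function name and the plain-float parameter dict are illustrative; a real implementation would iterate over the model's tensors):

```python
def ema_update(student_params, teacher_params, decay=0.99):
    """Exponential-moving-average update of the teacher model's parameters.

    student_params, teacher_params: dicts mapping parameter names to values
    (plain floats here for the sketch). The teacher moves a small step
    (1 - decay) toward the student on each training iteration.
    """
    for name, value in student_params.items():
        teacher_params[name] = decay * teacher_params[name] + (1 - decay) * value
    return teacher_params
```

Because the teacher is never updated by gradients directly, it provides a slowly changing, stable target for the contrast features.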
2. Contrast learning process.
The invention provides a contrast learning method fused with cross-domain mixing (Mixup). Its starting point is that low-confidence samples in the target domain have low similarity to source-domain samples, which is why they are difficult to classify correctly, while the existing contrast learning process only considers the feature-space structure of the target domain and ignores the domain difference. The invention therefore further provides cross-domain hybrid contrast learning for learning domain-invariant features. The contrast learning process is shown in fig. 3 and mainly includes:
Taking the $i$-th low-confidence sample image as an example, its first and second enhanced view images are denoted $x_i^{t,1}$ and $x_i^{t,2}$; the selected source domain sample image is denoted $x_j^{s}$, and its third and fourth enhanced view images are denoted $x_j^{s,1}$ and $x_j^{s,2}$.

The first enhanced view image $x_i^{t,1}$ and the third enhanced view image $x_j^{s,1}$ are mixed to serve as the query image. To ensure that the low-confidence target-domain sample is dominant in the mixture, the invention applies the max function to the mixing coefficient $\lambda$ and uses $\max(\lambda, 1-\lambda)$ as the new mixing coefficient. The cross-domain mixing is expressed as:

$$\lambda \sim Beta(\alpha, \alpha)$$
$$\lambda' = \max(\lambda,\, 1-\lambda)$$
$$x_i^{q} = \lambda'\, x_i^{t,1} + (1-\lambda')\, x_j^{s,1}$$

wherein $\lambda$ is the mixing coefficient, $\alpha$ is the parameter of the Beta distribution, and $x_i^{q}$ is the resulting query image.
The query image is input to the first image classification model, while the second enhanced view image $x_i^{t,2}$ and the fourth enhanced view image $x_j^{s,2}$ are input to the second image classification model. The right side of fig. 3 shows the processing flow: in each image classification model, feature extraction is first performed by the feature extractor, and the result is then processed by an L2-norm normalization function (L2 Norm) to obtain the corresponding image features, expressed as:

$$f_i^{q} = Norm\big(F(x_i^{q})\big)$$
$$f_i^{t,2} = Norm\big(F'(x_i^{t,2})\big)$$
$$f_j^{s,2} = Norm\big(F'(x_j^{s,2})\big)$$

wherein $F$ is the feature extractor in the first image classification model, $F'$ is the feature extractor in the second image classification model, $Norm(\cdot)$ is the L2-norm normalization function, and $f_i^{q}$, $f_i^{t,2}$ and $f_j^{s,2}$ are the normalized image features of the query image, the second enhanced view image and the fourth enhanced view image, respectively.
The image features obtained above are original features. To better encode task-specific semantic information, the present invention uses the classifier weights as class prototypes and treats them as a set of new coordinates with which the original features are re-represented. Fig. 4 shows the process of feature re-representation, expressed as:
$$\phi(z) = \mathrm{softmax}(C^{\mathrm{T}} z / \tau)$$
$$\phi'(z) = \mathrm{softmax}(C'^{\mathrm{T}} z / \tau)$$
where $C$ is the weight of the classifier in the first image classification model (not updated by the contrast loss) and $\mathrm{softmax}(\cdot)$ represents the softmax function; $z$ here stands for any of the image features above; $C'$ is the weight of the classifier in the second image classification model; T is the transpose symbol; $\tau$ is the temperature coefficient of the re-representation; $\phi(\cdot)$ and $\phi'(\cdot)$ are the functions used to re-represent the image features.
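The re-representation step can be sketched as follows. A minimal PyTorch illustration, assuming L2-normalized inputs; the function name and the default `tau` are illustrative.

```python
import torch

def re_represent(z: torch.Tensor, prototypes: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Re-represent features in the coordinate system of class prototypes.

    z:          (batch, dim) L2-normalized image features
    prototypes: (num_classes, dim) classifier weight rows used as class prototypes
    tau:        re-representation temperature

    Returns (batch, num_classes) re-representation features: a softmax over
    the similarities between each feature and every class prototype.
    """
    logits = z @ prototypes.t() / tau  # C^T z scaled by temperature
    return torch.softmax(logits, dim=-1)
```

Because the output lives on the probability simplex over classes, the subsequent contrast loss compares samples in a task-aware coordinate system rather than in the raw feature space.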
Substituting the query-image feature $z$ into the first expression, and the image features $z_i^{t,2}$ and $z^{s,2}$ into the second expression, yields the corresponding re-representation features $\hat{z}$, $\hat{z}_i^{t,2}$ and $\hat{z}^{s,2}$, namely:
$$\hat{z} = \phi(z)$$
$$\hat{z}_i^{t,2} = \phi'(z_i^{t,2})$$
$$\hat{z}^{s,2} = \phi'(z^{s,2})$$
where $\hat{z}$ is the first re-representation feature; $\hat{z}_i^{t,2}$ is the re-representation feature corresponding to the image feature $z_i^{t,2}$; $z^{s,2}$ is the image feature obtained by normalizing the feature of the fourth enhanced view image with the L2-norm normalization function, and $\hat{z}^{s,2}$ is its corresponding re-representation feature.
After the re-representation features $\hat{z}_i^{t,2}$ and $\hat{z}^{s,2}$ are obtained, they are mixed, expressed as:
$$\hat{z}_{mix} = \lambda'\, \hat{z}_i^{t,2} + (1-\lambda')\, \hat{z}^{s,2}$$
where $\hat{z}_{mix}$ is the mixed re-representation feature and $\lambda'$ is the mixing coefficient used when the first enhanced view image was mixed with the third enhanced view image.
In the embodiment of the invention, $\hat{z}$ serves as the query feature, all re-representation features other than $\hat{z}$ serve as contrast features, and the contrast loss of the cross-domain mixture is constructed, expressed as:
$$\mathcal{L}_{ctr} = -\log \frac{S^{+}}{S^{+} + \sum_{\hat{z}^{-} \in M} \exp(\mathrm{sim}(\hat{z}, \hat{z}^{-}))}, \qquad S^{+} = \exp(\mathrm{sim}(\hat{z}, \hat{z}_{mix})) + \exp(\mathrm{sim}(\hat{z}, \hat{z}_i^{t,2})) + \exp(\mathrm{sim}(\hat{z}, \hat{z}^{s,2}))$$
where $\hat{z}$ is the query feature of the query image; $\hat{z}_{mix}$ is the mixed re-representation feature; $\hat{z}_i^{t,2}$ is the re-representation feature corresponding to the second enhanced view image; $\hat{z}^{s,2}$ is the re-representation feature corresponding to the fourth enhanced view image; $\hat{z}^{-}$ ranges over the memory bank $M$, which stores the re-representation features, obtained by the second image classification model, of the second enhanced view images of the other low-confidence sample images; $\mathrm{sim}(\cdot,\cdot)$ is the cosine similarity function, expressed as:
$$\mathrm{sim}(u, v) = \frac{u^{\mathrm{T}} v}{\lVert u \rVert \, \lVert v \rVert}$$
where $u$ and $v$ represent the two features of the cosine similarity function $\mathrm{sim}(\cdot,\cdot)$.
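One plausible realization of this contrast loss for a single query is sketched below. The positives (mixed, second-view and fourth-view re-representations) are pooled in the numerator and the memory-bank features serve as negatives; the patent's exact weighting is not recoverable from the text, so this is an InfoNCE-style reading, not the definitive formula.

```python
import torch
import torch.nn.functional as F

def contrast_loss(query, positives, memory_bank):
    """InfoNCE-style contrast loss for a single query.

    query:       (dim,) re-representation feature of the mixed query image
    positives:   list of (dim,) re-representation features treated as positives
    memory_bank: (M, dim) negative re-representation features

    Uses cosine similarity, matching the sim(.,.) of the text.
    """
    pos = torch.stack([F.cosine_similarity(query, p, dim=0) for p in positives]).exp().sum()
    neg = F.cosine_similarity(query.unsqueeze(0), memory_bank, dim=1).exp().sum()
    return -torch.log(pos / (pos + neg))
```

The loss pulls the query toward its own mixed and unmixed views while pushing it away from other low-confidence samples stored in the bank.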
The core technical points of the present invention can be summarized from three levels: (1) contrast learning is performed using low confidence samples. (2) The input features of contrast loss need to be re-represented. (3) The cross-domain Mixup technology is merged on the basis of contrast learning.
Moreover, the method mainly obtains the following beneficial effects: (1) On top of existing domain adaptation methods that use only high-confidence target-domain samples, low-confidence target-domain samples are fully exploited, preventing the suboptimal domain-transfer effect that arises when the model is biased toward target-domain samples that resemble the source domain. (2) The classifier weights are used to re-represent the original features rather than being used directly, so that task-specific semantic information is better encoded. (3) Cross-domain mixing is applied to low-confidence samples, with the low-confidence samples dominant in the mixture, which reduces the domain gap and lets the model better learn domain-invariant features. In general, the method exploits low-confidence samples and improves the accuracy of both unsupervised and semi-supervised domain-adaptive image classification.
Second, the base loss.
To complete the optimization function, the associated base losses are introduced below. First, there is a cross-entropy loss $L_{ce}$ on the labeled samples and a loss $L_{align}$ for cross-domain feature alignment. On this basis, a semi-supervised learning algorithm based on the pseudo-label technique (FixMatch) is added to strengthen the learning process on high-confidence samples, which improves prediction consistency and provides reliable pseudo labels; at the same time a KLD (Kullback-Leibler divergence) regularization term $L_{kld}$ on high-confidence samples is introduced, together with the cross-entropy loss $L_{fm}$ of high-confidence samples after using FixMatch. Thus, the base loss is expressed as:
$$\mathcal{L}_{base} = \mathbb{E}_{x_l \in D_l}\, L_{ce}(x_l) + w_1\, \mathbb{E}_{x \in D}\, L_{align}(x) + \mathbb{E}_{x_h \in D_h}\, L_{fm}(x_h) + w_2\, \mathbb{E}_{x_h \in D_h}\, L_{kld}(x_h)$$
where $D_l$ is the labeled image set and $x_l$ a single labeled image; the labeled image set $D_l$ corresponds to the source domain in unsupervised domain adaptation, and to the source domain plus the labeled part of the target domain in semi-supervised domain adaptation (i.e., it contains all labeled images in the source domain image set and the target domain image set); $D$ is the union of $D_l$ and the target domain image set $D_t$, and $x$ is a single image of $D$; $D_h$ denotes the high-confidence sample set and $x_h$ a single high-confidence sample image; the high-confidence sample set is the set formed by the images remaining in the target domain image set after the low-confidence sample set is removed — specifically, the target-domain sample images whose maximum output probability from the second image classification model is greater than the threshold $\tau_0$; $w_1$ is the weight coefficient of the cross-domain feature alignment loss $L_{align}$, and $w_2$ is the weight coefficient of the KLD regularization term $L_{kld}$ on high-confidence samples.
Here, $L_{align}$ may be a loss calculated by any common domain adaptation method (e.g., the domain-discrepancy metric loss MMD, a domain adversarial loss, etc.); the present invention does not specifically limit it.
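As one concrete instance of such an alignment loss, a linear-kernel Maximum Mean Discrepancy can be sketched as below. This is only an illustrative choice among the options the text lists, not the patent's prescribed loss.

```python
import torch

def linear_mmd(source_feats: torch.Tensor, target_feats: torch.Tensor) -> torch.Tensor:
    """Linear-kernel Maximum Mean Discrepancy between two feature batches:
    the squared Euclidean distance between the per-domain mean features.

    source_feats, target_feats: (batch, dim) feature batches.
    """
    delta = source_feats.mean(dim=0) - target_feats.mean(dim=0)
    return delta.pow(2).sum()
```

Minimizing this pulls the mean feature of the target batch toward that of the source batch, which is the simplest form of cross-domain feature alignment.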
The cross-entropy loss $L_{ce}$ is also a conventional loss, of the form:
$$L_{ce}(x_l) = -\sum_{k=1}^{K} \mathbb{1}[k = y_l] \log p_k(x_l)$$
where $p_k(x_l)$ denotes the probability, output by the classifier of the first image classification model, that the labeled image $x_l$ belongs to class $k$; $y_l$ is the category label of the labeled image $x_l$; and $K$ is the number of categories. With the cosine-similarity-based classifier, $p_k(x_l)$ is expressed as:
$$p_k(x_l) = \frac{\exp(w_k^{\mathrm{T}} z_l / T_c)}{\sum_{j=1}^{K} \exp(w_j^{\mathrm{T}} z_l / T_c)}$$
where $w_k$ is the normalized classifier weight of class $k$, $z_l$ is the normalized feature of $x_l$, and $T_c$ is the temperature parameter of the classifier (set to a fixed value in the embodiment).
The KLD regularization term $L_{kld}$ on high-confidence samples and the cross-entropy loss $L_{fm}$ of high-confidence samples after using FixMatch are computed by a FixMatch model with a regularization term; the calculation process, shown in fig. 5, includes:
Let $\omega(x_h)$ and $\Omega(x_h)$ respectively denote two different enhanced view images of a single high-confidence sample image $x_h$ from the high-confidence sample set $D_h$ (the former a weakly enhanced view image, the latter a strongly enhanced view image).
$\omega(x_h)$ is input to the second image classification model (upper half of fig. 5); a second classification result is obtained through feature extraction and classification, and a pseudo label $\hat{y}$ is constructed. $\Omega(x_h)$ is input to the first image classification model (lower half of fig. 5); a first classification result is obtained through feature extraction and classification, the KLD regularization term $L_{kld}$ on high-confidence samples is calculated using the first classification result, and the cross-entropy loss $L_{fm}$ of high-confidence samples after using FixMatch is calculated using the first classification result and the corresponding pseudo label.
The KLD regularization term $L_{kld}$ on high-confidence samples and the cross-entropy loss $L_{fm}$ of high-confidence samples after using FixMatch are expressed as:
$$L_{kld}(x_h) = \mathbb{1}\big[\max_k p'_k(\omega(x_h)) > \tau_0\big] \sum_{j=1}^{K} \frac{1}{K} \log \frac{1/K}{p_j(\Omega(x_h))}$$
$$L_{fm}(x_h) = -\,\mathbb{1}\big[\max_k p'_k(\omega(x_h)) > \tau_0\big] \log p_{\hat{y}}(\Omega(x_h))$$
where $\mathbb{1}[\cdot]$ is the indicator function; $K$ denotes the number of categories; $p_j(\Omega(x_h))$ denotes the probability that the class output by the classifier of the first image classification model for the strongly enhanced view image $\Omega(x_h)$ is $j$, and $p_{\hat{y}}(\Omega(x_h))$ the probability that this class is $\hat{y}$; the pseudo label $\hat{y}$ is the class label corresponding to the maximum probability in the second classification result; and $\mathbb{1}[\max_k p'_k(\omega(x_h)) > \tau_0]$ indicates that the maximum probability predicted by the second image classification model is greater than the threshold $\tau_0$.
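The confidence-gated FixMatch pair of losses can be sketched as below. This is a minimal illustration: the threshold default is illustrative, and the KLD term is written as KL(uniform || student prediction), one common form of the regularizer, since the patent's exact expression is reconstructed from context.

```python
import torch
import torch.nn.functional as F

def fixmatch_losses(teacher_probs_weak, student_logits_strong, threshold=0.95):
    """Confidence-gated pseudo-label cross entropy plus a KLD regularizer.

    teacher_probs_weak:    (batch, K) teacher probabilities on the weak view
    student_logits_strong: (batch, K) student logits on the strong view
    threshold:             confidence gate on the teacher's max probability
    """
    conf, pseudo = teacher_probs_weak.max(dim=-1)
    mask = (conf > threshold).float()  # keep only high-confidence samples
    log_p = F.log_softmax(student_logits_strong, dim=-1)
    # Cross entropy of the strong view against the teacher's pseudo label.
    ce = -(mask * log_p.gather(-1, pseudo.unsqueeze(-1)).squeeze(-1)).mean()
    # KL(uniform || student prediction), gated by the same mask.
    k = log_p.size(-1)
    uniform = torch.full_like(log_p, 1.0 / k)
    kld = (mask * (uniform * (uniform.log() - log_p)).sum(dim=-1)).mean()
    return ce, kld
```

Both terms vanish for samples below the confidence gate, so only reliable pseudo labels drive the student.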
Third, the total loss function.
In the embodiment of the present invention, the total loss function is constructed by integrating the above contrast loss and base loss, expressed as:
$$\mathcal{L} = \mathcal{L}_{base} + \beta\, \mathbb{E}_{x \in D_s \cup D_{lc}}\, \mathcal{L}_{ctr}(x)$$
where $\mathcal{L}_{base}$ is the base loss, $\mathcal{L}_{ctr}$ is the contrast loss, $\beta$ is the weight coefficient of the contrast loss, and $\mathbb{E}$ is the mathematical expectation symbol; $D_s \cup D_{lc}$ is the union of the source domain image set $D_s$ and the low-confidence sample set $D_{lc}$, and $x$ is a single image of this union.
Based on the above scheme, an integrated training and testing procedure is introduced below; the main steps include:
Step 1, prepare the labeled training data set of the source domain, the training set of the target domain, and the test set. For the training set images of the source domain and the target domain, two enhancements are constructed online: a strong enhancement and a weak enhancement, where the strong enhancement adopts a random data enhancement method (RandAugment) and the weak enhancement adopts common random cropping and random horizontal flipping. After image processing, the image is scaled to 224 × 224 and then numerically normalized. The images obtained by strong and weak enhancement are the two different enhanced view images mentioned above; specifically, the first and third enhanced view images are constructed with the strong enhancement, and the second and fourth enhanced view images with the weak enhancement.
Step 2, establish the contrast learning method based on low-confidence samples using the PyTorch deep learning framework. The model consists of a teacher model and a student model with identical structure and initialization parameters; the student model is updated by gradient back-propagation, while the teacher model is an exponential moving average of the student model's parameters. The model structure adopts common image classification models, such as ResNet34 or ResNet50, with the classifier changed to a cosine-similarity-based computation. During contrast learning, an additional memory bank stores the features generated from the processed low-confidence target-domain samples; its capacity is 512, it is updated first-in first-out, and it is updated after the iteration of each batch of samples finishes.
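The two bookkeeping pieces of step 2 — the FIFO memory bank and the EMA teacher update — can be sketched as follows. Class and function names are illustrative; the capacity of 512 is taken from the described embodiment, while the momentum value is an assumption.

```python
import torch
from collections import deque

class FeatureMemoryBank:
    """First-in-first-out store of re-represented target-domain features
    (capacity 512 in the described embodiment)."""
    def __init__(self, capacity: int = 512):
        self.queue = deque(maxlen=capacity)  # oldest entries drop out first

    def update(self, feats: torch.Tensor) -> None:
        # Detach so stored features never carry gradient history.
        for f in feats.detach():
            self.queue.append(f)

    def tensor(self) -> torch.Tensor:
        return torch.stack(list(self.queue))

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, momentum: float = 0.999) -> None:
    """Teacher parameters track an exponential moving average of the student's."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)
```

In use, `bank.update(...)` is called once per batch after the student step, and `ema_update(teacher, student)` is called after each optimizer update, matching the first-in-first-out and moving-average behaviour described above.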
Step 3, input the source domain images to the student model and output prediction probabilities; perform supervised learning using the source-domain annotations, and compute the alignment loss $L_{align}$ using the training data of the source and target domains.
Step 4, for target-domain images, input the weakly enhanced image to the teacher model and the strongly enhanced image to the student model. Output the prediction probability and, according to the given threshold $\tau_0$, determine whether the input sample is a high-confidence sample (the maximum probability predicted by the teacher model is greater than the threshold); if so, following the FixMatch learning scheme, generate a pseudo label with the weakly enhanced image and use it to supervise the strongly enhanced image.
Step 5, mix the different enhanced images of a low-confidence sample with the enhanced versions of a randomly sampled source domain image, and input the mixed and enhanced samples to the student model and the teacher model, respectively. The output intermediate features pass through the re-representation module to generate new features. Take the first re-representation feature $\hat{z}$ as the query feature, and construct positive sample pairs for contrast learning with the mixed re-representation feature, the re-representation feature corresponding to the second enhanced view image, and the re-representation feature corresponding to the fourth enhanced view image, specifically the pairs $(\hat{z}, \hat{z}_{mix})$, $(\hat{z}, \hat{z}_i^{t,2})$ and $(\hat{z}, \hat{z}^{s,2})$; then construct the contrast learning loss with the features stored in the memory bank $M$ as negative samples, and update the student model.
Step 6, update the memory bank with the re-represented features of the target-domain low-confidence samples.
Step 7, accumulate the loss functions of steps 3 and 5, minimize them by the back-propagation algorithm and a gradient descent strategy to update the student model's weights, and update the teacher model's parameters from the student model's parameters.
Step 8, input the test data set and compute the classification accuracy of the student model.
Example two
The invention further provides a domain adaptive learning system based on low-confidence sample contrast loss, which is implemented mainly based on the method provided by the first embodiment, as shown in fig. 6, the system mainly includes:
the low confidence sample set generating unit is used for screening out a low confidence sample set from the target domain image set according to a set threshold;
the enhanced view image generation unit is used for obtaining, for each low-confidence sample image, two different enhanced view images, namely a first enhanced view image and a second enhanced view image, by using a data enhancement method; and for randomly selecting a source domain sample image from the source domain image set and obtaining two different enhanced view images, namely a third enhanced view image and a fourth enhanced view image, by using the data enhancement method;
the re-representation feature acquisition unit is used for mixing the first enhanced view image and the third enhanced view image to form a query image, inputting the query image into a first image classification model, and performing image feature extraction and re-representation through the first image classification model to obtain a first re-representation feature; inputting the second enhanced view image and the fourth enhanced view image into a second image classification model, and respectively performing image feature extraction and re-representation through the second image classification model to obtain corresponding re-representation features; and mixing the re-representation feature corresponding to the second enhanced view image with the re-representation feature corresponding to the fourth enhanced view image to form a mixed re-representation feature;
and the total loss function construction and model training unit is used for taking the first re-representation feature as the query feature and all remaining re-representation features as contrast features, constructing a contrast loss from the differences between the query feature and each contrast feature, and combining it with the base loss of the first image classification model to construct a total loss function for training the first image classification model.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to perform all or part of the above described functions.
EXAMPLE III
The present invention also provides a processing apparatus, as shown in fig. 7, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, a processor, a memory, an input device and an output device are connected through a bus.
In the embodiment of the present invention, the specific types of the memory, the input device, and the output device are not limited; for example:
the input device can be a touch screen, an image acquisition device, a physical button or a mouse and the like;
the output device may be a display terminal;
the Memory may be a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as a disk Memory.
Example four
The present invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium in the embodiment of the present invention may be provided in the foregoing processing device as a computer readable storage medium, for example, as a memory in the processing device. The readable storage medium may be various media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A domain adaptive learning method based on low confidence sample contrast loss is characterized by comprising the following steps:
screening out a low-confidence sample set from the target domain image set according to a set threshold;
for each low confidence sample image, obtaining two different enhanced view images, namely a first enhanced view image and a second enhanced view image, by using a data enhancement method, randomly selecting a source domain sample image from a source domain image set, and obtaining two different enhanced view images, namely a third enhanced view image and a fourth enhanced view image, by using the data enhancement method;
mixing the first enhanced view image and the third enhanced view image to form a query image, inputting the query image into a first image classification model, and performing image feature extraction and re-representation through the first image classification model to obtain a first re-representation feature; respectively inputting the second enhanced view image and the fourth enhanced view image into a second image classification model, and respectively performing image feature extraction and re-representation through the second image classification model to obtain corresponding re-representation features; and mixing the re-representation feature corresponding to the second enhanced view image with the re-representation feature corresponding to the fourth enhanced view image to form a mixed re-representation feature;
and taking the first re-representation feature as a query feature, taking all the rest re-representation features as comparison features, constructing a comparison loss by using the difference between the query feature and each comparison feature, and constructing a total loss function by combining the basic loss of the first image classification model to train the first image classification model.
2. The method of claim 1, wherein the first enhanced view image and the third enhanced view image are mixed in a manner represented by:
$$\lambda \sim \mathrm{Beta}(\alpha, \alpha)$$
$$\lambda' = \max(\lambda, 1-\lambda)$$
$$\tilde{x}_i = \lambda'\, x_i^{t,1} + (1-\lambda')\, x^{s,1}$$
wherein $\lambda$ is the mixing coefficient, $\alpha$ is the parameter of the Beta distribution, and $\lambda'$ is the new mixing coefficient obtained by the max function; $x_i^{t,1}$ is the first enhanced view image corresponding to the i-th low-confidence sample image, $x^{s,1}$ is the third enhanced view image corresponding to the source domain sample image $x^s$, and $\tilde{x}_i$ is the query image obtained by the mixing.
3. The method of claim 1, wherein the process of inputting the query image into the first image classification model and performing image feature extraction and re-representation through the first image classification model to obtain the first re-representation feature; respectively inputting the second enhanced view image and the fourth enhanced view image into the second image classification model and respectively performing image feature extraction and re-representation through the second image classification model to obtain the corresponding re-representation features; and mixing the re-representation feature corresponding to the second enhanced view image with the re-representation feature corresponding to the fourth enhanced view image to form the mixed re-representation feature is represented as:
$$\hat{z} = \phi(\mathrm{Norm}(F(\tilde{x}_i)))$$
$$\hat{z}_i^{t,2} = \phi'(\mathrm{Norm}(F'(x_i^{t,2})))$$
$$\hat{z}^{s,2} = \phi'(\mathrm{Norm}(F'(x^{s,2})))$$
$$\hat{z}_{mix} = \lambda'\, \hat{z}_i^{t,2} + (1-\lambda')\, \hat{z}^{s,2}$$
wherein $F$ is the feature extractor in the first image classification model and $F'$ is the feature extractor in the second image classification model; $\mathrm{Norm}(\cdot)$ is the L2-norm normalization function, and $\phi(\cdot)$, $\phi'(\cdot)$ are the functions for re-representing image features; $\tilde{x}_i$ is the query image, $\mathrm{Norm}(F(\tilde{x}_i))$ is the image feature obtained after normalizing the feature of the query image with the L2-norm normalization function, and $\hat{z}$ is the first re-representation feature; $x_i^{t,2}$ is the second enhanced view image corresponding to the i-th low-confidence sample image, $\mathrm{Norm}(F'(x_i^{t,2}))$ is the image feature obtained by normalizing its feature, and $\hat{z}_i^{t,2}$ is the corresponding re-representation feature; $x^{s,2}$ is the fourth enhanced view image corresponding to the source domain sample image $x^s$, $\mathrm{Norm}(F'(x^{s,2}))$ is the image feature obtained by normalizing its feature, and $\hat{z}^{s,2}$ is the corresponding re-representation feature; $\hat{z}_{mix}$ is the mixed re-representation feature and $\lambda'$ is the mixing coefficient used when the first enhanced view image was mixed with the third enhanced view image.
4. The method of claim 3, wherein the functions $\phi(\cdot)$ and $\phi'(\cdot)$ for re-representing image features are represented as:
$$\phi(z) = \mathrm{softmax}(C^{\mathrm{T}} z / \tau)$$
$$\phi'(z) = \mathrm{softmax}(C'^{\mathrm{T}} z / \tau)$$
wherein $C$ is the weight of the classifier in the first image classification model, $C'$ is the weight of the classifier in the second image classification model, $\mathrm{softmax}(\cdot)$ represents the softmax function, T is the transpose symbol, and $\tau$ is the temperature coefficient of the re-representation.
5. The method of claim 1 or 3, wherein the contrast loss constructed from the differences between the query feature and each contrast feature is:
$$\mathcal{L}_{ctr} = -\log \frac{S^{+}}{S^{+} + \sum_{\hat{z}^{-} \in M} \exp(\mathrm{sim}(\hat{z}, \hat{z}^{-}))}, \qquad S^{+} = \exp(\mathrm{sim}(\hat{z}, \hat{z}_{mix})) + \exp(\mathrm{sim}(\hat{z}, \hat{z}_i^{t,2})) + \exp(\mathrm{sim}(\hat{z}, \hat{z}^{s,2}))$$
wherein $\hat{z}$ is the query feature of the query image, $\hat{z}_{mix}$ is the mixed re-representation feature, $\hat{z}_i^{t,2}$ is the re-representation feature corresponding to the second enhanced view image, $\hat{z}^{s,2}$ is the re-representation feature corresponding to the fourth enhanced view image, $\hat{z}^{-}$ is a re-representation feature stored in the memory bank $M$, which stores the re-representation features, obtained by the second image classification model, of the second enhanced view images of the other low-confidence sample images, and $\mathrm{sim}(\cdot,\cdot)$ is the cosine similarity function.
6. The method of claim 1, wherein the total loss function is expressed as:

L_total = L_base + λ_con · E_{u∈U}[ L_con(u) ]

where L_base is the base loss, L_con is the contrast loss, λ_con is the weight coefficient of the contrast loss, and E is the mathematical expectation symbol; U is the union of the source domain image set S and the low-confidence sample set T_lc, and u is a single image in U;

the base loss includes: the cross-entropy loss L_ce on annotated images, the cross-domain alignment feature loss L_align, the KLD regularization term L_kld on high-confidence samples, and the cross-entropy loss L_fix of high-confidence samples after using FixMatch, where FixMatch denotes a semi-supervised learning algorithm based on pseudo-label techniques; the base loss is expressed as:

L_base = E_{(x,y)∈D_l}[ L_ce(x, y) ] + λ_align · E_{x′∈D_u}[ L_align(x′) ] + λ_kld · E_{x_h∈T_hc}[ L_kld(x_h) ] + E_{x_h∈T_hc}[ L_fix(x_h) ]

where D_l is the annotated image set and (x, y) is a single annotated image; the annotated image set D_l includes all labeled images in the source domain image set and the target domain image set; D_u is the union of D_l and the target domain image set T, and x′ is a single image in D_u; T_hc denotes the high-confidence sample set and x_h is a single high-confidence sample image; the high-confidence sample set is the set formed by the images remaining in the target domain image set after removing the low-confidence sample set; λ_align is the weight coefficient of the cross-domain alignment feature loss L_align, and λ_kld is the weight coefficient of the KLD regularization term L_kld.
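The composition of the total objective in claim 6 can be sketched as a plain weighted sum of the component losses. The helper below shows only the arithmetic combination; the default weight values are illustrative assumptions, not values taken from the patent.

```python
def total_loss(l_ce, l_align, l_kld, l_fix, l_con,
               lam_align=0.1, lam_kld=0.1, lam_con=1.0):
    """Combine the base-loss terms (cross-entropy, cross-domain alignment,
    KLD regularization, FixMatch cross-entropy) with the weighted contrast
    loss. Weight defaults are placeholders for illustration."""
    l_base = l_ce + lam_align * l_align + lam_kld * l_kld + l_fix
    return l_base + lam_con * l_con
```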
7. The method of claim 6, wherein the KLD regularization term L_kld on high-confidence samples and the cross-entropy loss L_fix of high-confidence samples after using FixMatch are calculated as follows:

define x_h′ and x_h″ as two different enhanced view images of a single high-confidence sample image x_h from the high-confidence sample set T_hc; x_h′ is input into the second image classification model, a second classification result is obtained through feature extraction and classification, and a pseudo label ŷ is constructed from it; x_h″ is input into the first image classification model, a first classification result is obtained through feature extraction and classification, and the KLD regularization term L_kld on high-confidence samples is calculated using the first classification result; further, the cross-entropy loss L_fix of high-confidence samples after using FixMatch is calculated using the first classification result and the corresponding pseudo label; the KLD regularization term L_kld and the cross-entropy loss L_fix are expressed as:

L_kld = Σ_{j=1}^{C} (1/C) · log( (1/C) / p_j(x_h″) )

L_fix = −1( max_j p_j^{(2)}(x_h′) > τ ) · log p_ŷ(x_h″)

where 1(·) is the indicator function; C denotes the number of categories; p_j(x_h″) denotes the probability, output by the classifier in the first image classification model, that the class of the enhanced view image x_h″ is j; p_ŷ(x_h″) denotes the probability, output by that classifier, that the class of x_h″ is the pseudo label ŷ; the pseudo label ŷ is the class label corresponding to the maximum probability in the second classification result; and 1( max_j p_j^{(2)}(x_h′) > τ ) indicates that the maximum predicted probability of the second image classification model is greater than the threshold τ.
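The two terms in claim 7 can be sketched directly: a FixMatch-style thresholded cross-entropy against the pseudo label, and a KL-divergence regularizer between the uniform distribution and the model's prediction. The exact KLD direction and the threshold value below are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def kld_uniform(p):
    # KL(U || p): regularizes the predicted distribution p toward uniform
    C = len(p)
    u = 1.0 / C
    return float(np.sum(u * np.log(u / p)))

def fixmatch_ce(p_weak, p_strong, tau=0.95):
    # Pseudo label = argmax of the second model's (weak-view) prediction;
    # the loss is active only when its confidence exceeds the threshold tau.
    y_hat = int(np.argmax(p_weak))
    if p_weak[y_hat] <= tau:
        return 0.0
    return float(-np.log(p_strong[y_hat]))
```

`p_weak` plays the role of the second classification result on x_h′ and `p_strong` the first classification result on x_h″; when the weak view is confident enough, the strong view is pushed toward the pseudo label.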
8. A domain adaptive learning system based on low confidence sample contrast loss, implemented based on the method of any one of claims 1 to 7, the system comprising:
a low-confidence sample set generating unit, configured to screen a low-confidence sample set out of the target domain image set according to a set threshold;
an enhanced view image generation unit, configured to obtain, by a data enhancement method, two different enhanced view images, namely a first enhanced view image and a second enhanced view image, from each low-confidence sample image, and to randomly select a source domain sample image from the source domain image set and obtain, by the data enhancement method, two different enhanced view images, namely a third enhanced view image and a fourth enhanced view image;
a re-representation feature obtaining unit, configured to mix the first enhanced view image and the third enhanced view image to obtain a query image, input the query image into a first image classification model, and perform image feature extraction and re-representation through the first image classification model to obtain a first re-representation feature; to input the second enhanced view image and the fourth enhanced view image into a second image classification model, and perform image feature extraction and re-representation through the second image classification model to obtain the corresponding re-representation features; and to blend the first re-representation feature with the re-representation feature corresponding to the fourth enhanced view image to form a blended re-representation feature;
and a total loss function construction and model training unit, configured to take the first re-representation feature as a query feature and all remaining re-representation features as comparison features, construct a contrast loss from the difference between the query feature and each comparison feature, and combine it with the base loss of the first image classification model to construct a total loss function for training the first image classification model.
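The screening performed by the low-confidence sample set generating unit can be sketched as a split of the target-domain predictions by maximum predicted probability. The helper name and the default threshold are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def split_by_confidence(probs, threshold=0.8):
    """Screen the target-domain set: images whose maximum predicted
    probability falls below the set threshold form the low-confidence
    sample set; the remainder form the high-confidence sample set."""
    low, high = [], []
    for i, p in enumerate(probs):
        (low if float(np.max(p)) < threshold else high).append(i)
    return low, high
```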
9. A processing device, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A readable storage medium, storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any of claims 1 to 7.
CN202210942337.9A 2022-08-08 2022-08-08 Domain adaptive learning method and system based on low confidence sample contrast loss Active CN114998602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210942337.9A CN114998602B (en) 2022-08-08 2022-08-08 Domain adaptive learning method and system based on low confidence sample contrast loss

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210942337.9A CN114998602B (en) 2022-08-08 2022-08-08 Domain adaptive learning method and system based on low confidence sample contrast loss

Publications (2)

Publication Number Publication Date
CN114998602A true CN114998602A (en) 2022-09-02
CN114998602B CN114998602B (en) 2022-12-30

Family

ID=83023178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210942337.9A Active CN114998602B (en) 2022-08-08 2022-08-08 Domain adaptive learning method and system based on low confidence sample contrast loss

Country Status (1)

Country Link
CN (1) CN114998602B (en)


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256561A (en) * 2017-12-29 2018-07-06 中山大学 A kind of multi-source domain adaptive migration method and system based on confrontation study
WO2019228358A1 (en) * 2018-05-31 2019-12-05 华为技术有限公司 Deep neural network training method and apparatus
WO2021097055A1 (en) * 2019-11-14 2021-05-20 Nec Laboratories America, Inc. Domain adaptation for semantic segmentation via exploiting weak labels
CN113436197A (en) * 2021-06-07 2021-09-24 华东师范大学 Domain-adaptive unsupervised image segmentation method based on generation of confrontation and class feature distribution
CN113435546A (en) * 2021-08-26 2021-09-24 广东众聚人工智能科技有限公司 Migratable image recognition method and system based on differentiation confidence level
CN113553906A (en) * 2021-06-16 2021-10-26 之江实验室 Method for discriminating unsupervised cross-domain pedestrian re-identification based on class center domain alignment
US20210334664A1 (en) * 2020-04-24 2021-10-28 Adobe Inc. Domain Adaptation for Machine Learning Models
EP3940632A1 (en) * 2019-03-14 2022-01-19 Navier Inc. Image processing learning program, image processing program, image processing device, and image processing system
CN114283287A (en) * 2022-03-09 2022-04-05 南京航空航天大学 Robust field adaptive image learning method based on self-training noise label correction
CN114332568A (en) * 2022-03-16 2022-04-12 中国科学技术大学 Training method, system, equipment and storage medium of domain adaptive image classification network
CN114492574A (en) * 2021-12-22 2022-05-13 中国矿业大学 Pseudo label loss unsupervised countermeasure domain adaptive picture classification method based on Gaussian uniform mixing model
CN114842267A (en) * 2022-05-23 2022-08-02 南京邮电大学 Image classification method and system based on label noise domain self-adaption


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALEJO M ET AL: "Unconstrained Ear Recognition through Domain Adaptive Deep Learning Models of Convolutional Neural Network", 《INTERNATIONAL JOURNAL OF RECENT TECHNOLOGY AND ENGINEERING》 *
ZILEI WANG ET AL: "Image Classification via Object-aware Holistic Superpixel Selection", 《IEEE TRANSACTION ON IMAGE PROCESSING (TIP)》 *
WU ZIRUI ET AL: "Unsupervised Domain Adaptation Algorithm Oriented to Feature Generation", 《JOURNAL OF UNIVERSITY OF ELECTRONIC SCIENCE AND TECHNOLOGY OF CHINA》 *
WANG XIAOMING ET AL: "Content-Structure-Preserving Image Style Transfer Method", 《COMPUTER ENGINEERING AND APPLICATIONS》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452897A (en) * 2023-06-16 2023-07-18 中国科学技术大学 Cross-domain small sample classification method, system, equipment and storage medium
CN116452897B (en) * 2023-06-16 2023-10-20 中国科学技术大学 Cross-domain small sample classification method, system, equipment and storage medium
CN116543237A (en) * 2023-06-27 2023-08-04 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Image classification method, system, equipment and medium for non-supervision domain adaptation of passive domain
CN116543237B (en) * 2023-06-27 2023-11-28 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Image classification method, system, equipment and medium for non-supervision domain adaptation of passive domain
CN117253097A (en) * 2023-11-20 2023-12-19 中国科学技术大学 Semi-supervision domain adaptive image classification method, system, equipment and storage medium
CN117253097B (en) * 2023-11-20 2024-02-23 中国科学技术大学 Semi-supervision domain adaptive image classification method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN114998602B (en) 2022-12-30

Similar Documents

Publication Publication Date Title
Dong et al. Peco: Perceptual codebook for bert pre-training of vision transformers
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN111126488B (en) Dual-attention-based image recognition method
CN114998602B (en) Domain adaptive learning method and system based on low confidence sample contrast loss
JP5373536B2 (en) Modeling an image as a mixture of multiple image models
Lin et al. A post-processing method for detecting unknown intent of dialogue system via pre-trained deep neural network classifier
CN114332568B (en) Training method, system, equipment and storage medium of domain adaptive image classification network
CN109344884A (en) The method and device of media information classification method, training picture classification model
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN111741330A (en) Video content evaluation method and device, storage medium and computer equipment
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN111241992B (en) Face recognition model construction method, recognition method, device, equipment and storage medium
CN114038055A (en) Image generation method based on contrast learning and generation countermeasure network
CN114329034A (en) Image text matching discrimination method and system based on fine-grained semantic feature difference
CN115563327A (en) Zero sample cross-modal retrieval method based on Transformer network selective distillation
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN115270752A (en) Template sentence evaluation method based on multilevel comparison learning
CN113255832B (en) Method for identifying long tail distribution of double-branch multi-center
CN114722892A (en) Continuous learning method and device based on machine learning
CN110795410A (en) Multi-field text classification method
CN113792659A (en) Document identification method and device and electronic equipment
CN113762005A (en) Method, device, equipment and medium for training feature selection model and classifying objects
CN114357221B (en) Self-supervision active learning method based on image classification
Sarang Thinking Data Science: A Data Science Practitioner’s Guide
CN113344069B (en) Image classification method for unsupervised visual representation learning based on multi-dimensional relation alignment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant