CN117253097A - Semi-supervision domain adaptive image classification method, system, equipment and storage medium - Google Patents

Semi-supervised domain adaptive image classification method, system, device and storage medium

Info

Publication number
CN117253097A
CN117253097A (application number CN202311541384.3A)
Authority
CN
China
Prior art keywords
image
enhanced
loss
classification
target domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311541384.3A
Other languages
Chinese (zh)
Other versions
CN117253097B (en)
Inventor
王子磊
凃科宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202311541384.3A priority Critical patent/CN117253097B/en
Publication of CN117253097A publication Critical patent/CN117253097A/en
Application granted granted Critical
Publication of CN117253097B publication Critical patent/CN117253097B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/7753Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a semi-supervised domain adaptive image classification method, system, device and storage medium, which form a unified scheme. A sensitivity-weighted feature contrast loss and a probability-space contrast loss are used: on top of the original feature contrast learning method, a contrast loss is additionally constructed in probability space, so that target domain features are aligned across multiple dimensions, the feature extractor and the classifier are optimized synchronously, and the suboptimal domain transfer caused by the classifier being biased toward source domain samples is prevented. Moreover, a sensitivity score guides the network to attend to challenging target domain samples, emphasizing the few difficult samples that are classified poorly. Meanwhile, pseudo labels supervise the probability-space contrast loss, avoiding semantic conflicts so that the network learns better domain-invariant features. Overall, the invention uses joint contrast learning, combined with sensitivity scores and class awareness, to improve the accuracy of semi-supervised domain adaptive image classification.

Description

Semi-supervised domain adaptive image classification method, system, device and storage medium
Technical Field
The present invention relates to the field of image classification, and in particular to a semi-supervised domain adaptive image classification method, system, device, and storage medium.
Background
With the development of deep learning, deep neural networks play an important role in solving machine learning problems; however, their strong performance depends heavily on large, high-quality annotated datasets, and obtaining accurately hand-annotated datasets requires significant time and labor. Moreover, because of the domain shift problem, deep neural network models cannot generalize effectively to new data distributions. Domain adaptation therefore uses knowledge learned from abundant annotated source domain data to assist learning on a target domain that is related to the source domain but lacks annotations; avoiding learning the target domain from scratch effectively reduces training time and learning cost. Semi-supervised domain adaptation has access to a small number of annotated target domain samples, a large number of unlabeled target domain samples, and a large number of annotated source domain samples. A common way to realize domain transfer is to make the model learn domain-invariant features and pull the source domain and target domain distributions closer together.
In Chinese patent application publication No. CN112541580A, "a semi-supervised domain adaptive method based on active adversarial learning", the most valuable target data are labeled by an active learning method, and a multi-class discriminator is used to alleviate the distribution difference between labeled target domain samples and source domain samples; domain adversarial training with a domain-confusion discriminator mitigates domain differences. In Chinese invention patent grant No. CN113011456B, "an unsupervised domain adaptation method for image classification based on a class-adaptive model", a domain-transferable encoder is built from a self-attention module and a cross-attention module, realizing intra-domain and inter-domain alignment; a class-adaptive decoder is built to reduce domain differences through class prototype learning and alignment. In Chinese patent publication No. CN113283489B, "a classification method based on semi-supervised domain adaptation learning with joint distribution matching", a model trained on source domain data is transferred to the target domain to process target domain data based on kernel-method theory; the difference between the source and target domain sample distributions is measured, and the joint distributions of the target and source domains are pulled closer. In Chinese patent application No. CN114529900A, "a semi-supervised domain adaptive semantic segmentation method and system based on feature prototypes", the deep neural network comprises an encoder, a classifier, a decoder and a style transfer network; feature prototypes obtained from the classifier's intermediate-layer features are used to align the images converted by the encoder and the style transfer network respectively, improving feature-level alignment; constraining the feature extraction process improves the generalization of the model and achieves a better segmentation effect on the target domain data distribution. In Chinese patent application No. CN114781647A, "an unsupervised domain adaptation method distinguishing easy and hard samples", target domain samples are divided into easy and hard samples by their entropy, and a source domain classifier assigns pseudo labels to the easy samples; class centers are computed from the source domain labels and the easy-sample pseudo labels of the target domain to optimize inter-domain distribution alignment and instance-level contrastive alignment, reducing inter-domain and intra-domain differences. In Chinese patent grant No. CN114998602B, "a domain adaptation learning method and system based on a contrastive loss over low-confidence samples", low-confidence target domain samples are fully exploited on top of existing domain adaptation methods that use only high-confidence target domain samples, applying cross-domain mixing to the low-confidence samples; meanwhile, in contrastive learning, the original image features are re-encoded per task to obtain task-specific semantic information, reducing domain differences and preventing the suboptimal domain transfer caused by the image classification model being biased toward target domain samples close to the source domain.
Most of the above methods use a source-domain-trained encoder and classifier to obtain class prototypes for optimizing the alignment of target domain sample features. As a result, the features the image classification model extracts from target domain samples carry a source domain style, the content and style of the target domain samples themselves are ignored, and the accuracy of semi-supervised domain adaptive image classification suffers.
In view of this, the present invention has been made.
Disclosure of Invention
The invention aims to provide a semi-supervised domain adaptive image classification method, a system, equipment and a storage medium, which can improve the accuracy of semi-supervised domain adaptive image classification.
The invention aims at realizing the following technical scheme:
a semi-supervised domain adaptive image classification method, comprising:
acquiring training data, comprising: processing each unlabeled target domain image in an unlabeled target domain image set with two different data enhancement modes, wherein the enhanced image obtained by the first data enhancement mode is called the first enhanced image, and the two enhanced images obtained by different processing means within the second data enhancement mode are called the second enhanced image and the third enhanced image;
Constructing an image classification network based on joint contrast learning, comprising: a feature extractor, a classifier and a linear mapping network;
training the image classification network, wherein the training process comprises: inputting the three enhanced images corresponding to an unlabeled target domain image into the image classification network, extracting image features through the feature extractor and obtaining probability vectors through the classifier; generating a pseudo label from the probability vector of the first enhanced image, and calculating a first classification loss in combination with the probability vector of the second enhanced image from the same unlabeled target domain image; based on the pseudo labels, screening out the positive and negative samples corresponding to each enhanced image in a class-aware manner, and calculating a probability-space contrast loss from the probability vectors corresponding to the positive and negative samples; linearly mapping the image features of the second and third enhanced images through the linear mapping network to calculate sensitivity scores, and calculating a sensitivity-weighted feature contrast loss in combination with the sensitivity scores; calculating a second classification loss using the annotation information of the labeled source domain and target domain image sets; combining the first classification loss, the second classification loss, the probability-space contrast loss and the sensitivity-weighted feature contrast loss into a total loss function, and training the image classification network with the total loss function;
After training, classifying the input images by using the trained image classification network.
A semi-supervised domain adaptive image classification system, comprising:
the data acquisition and processing unit is used for acquiring training data, comprising: processing each unlabeled target domain image in an unlabeled target domain image set with two different data enhancement modes, wherein the enhanced image obtained by the first data enhancement mode is called the first enhanced image, and the two enhanced images obtained by different processing means within the second data enhancement mode are called the second enhanced image and the third enhanced image;
a network construction unit, configured to construct an image classification network based on joint contrast learning, including: a feature extractor, a classifier and a linear mapping network;
the network training unit is used for training the image classification network, and the training process comprises: inputting the three enhanced images corresponding to an unlabeled target domain image into the image classification network, extracting image features through the feature extractor and obtaining probability vectors through the classifier; generating a pseudo label from the probability vector of the first enhanced image, and calculating a first classification loss in combination with the probability vector of the second enhanced image from the same unlabeled target domain image; based on the pseudo labels, screening out the positive and negative samples corresponding to each enhanced image in a class-aware manner, and calculating a probability-space contrast loss from the probability vectors corresponding to the positive and negative samples; linearly mapping the image features of the second and third enhanced images to calculate sensitivity scores, and calculating a sensitivity-weighted feature contrast loss in combination with the sensitivity scores; calculating a second classification loss using the annotation information of the labeled source domain and target domain image sets; combining the first classification loss, the second classification loss, the probability-space contrast loss and the sensitivity-weighted feature contrast loss into a total loss function, and training the image classification network with the total loss function;
And the image classification unit is used for classifying the input images by using the trained image classification network after training.
A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.
A readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
According to the technical scheme provided by the invention: (1) a joint contrast loss, namely the sensitivity-weighted feature contrast loss and the probability-space contrast loss, constructs a contrast loss in probability space on top of the original feature contrast learning method, aligning target domain features across multiple dimensions and synchronizing the optimization of the feature extractor and the classifier, which prevents the suboptimal domain transfer caused by the classifier being biased toward source domain samples; (2) sensitivity scores guide the network to attend to challenging target domain samples, focusing on the few difficult samples that would otherwise be classified poorly; (3) high-quality pseudo labels supervise the probability-space contrast loss, avoiding semantic conflicts so that the network learns better domain-invariant features. Overall, the invention uses joint contrast learning, combined with sensitivity scores and class awareness, to improve the accuracy of semi-supervised domain adaptive image classification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a semi-supervised domain adaptive image classification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a semi-supervised domain adaptive training mode according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training framework of an image classification network based on joint contrast learning according to an embodiment of the present invention;
FIG. 4 is a framework flowchart of the FixMatch method provided by an embodiment of the invention;
FIG. 5 is a schematic diagram of a semi-supervised domain adaptive image classification system according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a processing apparatus according to an embodiment of the present invention;
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The terms that may be used herein will first be described as follows:
the terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.
The following describes in detail a method, system, device, and storage medium for semi-supervised domain adaptive image classification. Details not described in the embodiments of the present invention belong to the prior art known to those skilled in the art. Where specific conditions are not noted in the examples, they follow conditions conventional in the art or suggested by the manufacturer.
Example 1
The embodiment of the invention provides a semi-supervised domain adaptive image classification method, as shown in fig. 1, which mainly comprises the following steps:
Step 1, training data are obtained, and enhancement processing is carried out.
In the embodiment of the invention, the training data mainly comprises: a labeled source domain and target domain image set, and a label-free target domain image set.
In the embodiment of the invention, two different data enhancement modes are adopted. The labeled source domain and target domain image sets are processed with the first data enhancement mode to obtain corresponding enhanced images. Each unlabeled target domain image in the unlabeled target domain image set is processed with both data enhancement modes: the enhanced image obtained by the first data enhancement mode is called the first enhanced image, and the two enhanced images obtained by different processing means within the second data enhancement mode are called the second enhanced image and the third enhanced image.
And 2, constructing an image classification network based on joint contrast learning.
In the embodiment of the invention, the constructed network mainly comprises a feature extractor, a classifier, and a linear mapping network; the linear mapping network is mainly used to assist training and is removed after training is finished.
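A minimal sketch of this three-part network in PyTorch is shown below. The backbone, feature dimension, projection dimension, and class count are all illustrative assumptions; the patent does not fix a concrete architecture.

```python
import torch
import torch.nn as nn

class JointContrastNet(nn.Module):
    """Sketch: feature extractor F, classifier C, and a linear mapping
    (projection) head P that is used only during training."""
    def __init__(self, in_dim=3 * 32 * 32, feat_dim=128, proj_dim=32, num_classes=10):
        super().__init__()
        # stand-in backbone; a real implementation would use a CNN
        self.F = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.C = nn.Linear(feat_dim, num_classes)   # classifier head
        self.P = nn.Linear(feat_dim, proj_dim)      # projection head (training only)

    def forward(self, x):
        f = self.F(x)                               # image features
        p = torch.softmax(self.C(f), dim=1)         # normalized probability vector
        z = nn.functional.normalize(self.P(f), dim=1)  # low-dimensional projection
        return f, p, z

net = JointContrastNet()
feats, probs, proj = net(torch.randn(4, 3, 32, 32))
print(feats.shape, probs.shape, proj.shape)
```

At inference time only `F` and `C` would be kept, matching the removal of the linear mapping network after training.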
It should be noted that steps 1 and 2 have no fixed execution order; they may be executed synchronously or sequentially in either order.
And step 3, training the image classification network.
In the embodiment of the invention, the training process comprises: inputting the three enhanced images corresponding to an unlabeled target domain image into the image classification network, extracting image features through the feature extractor and obtaining probability vectors through the classifier; generating a pseudo label from the probability vector of the first enhanced image, and calculating a first classification loss in combination with the probability vector of the second enhanced image from the same unlabeled target domain image; based on the pseudo labels, screening out the positive and negative samples corresponding to each enhanced image in a class-aware manner, and calculating a probability-space contrast loss from the probability vectors corresponding to the positive and negative samples; linearly mapping the image features of the second and third enhanced images through the linear mapping network to calculate sensitivity scores, and calculating a sensitivity-weighted feature contrast loss in combination with the sensitivity scores; calculating a second classification loss using the annotation information of the labeled source domain and target domain image sets; combining the first classification loss, the second classification loss, the probability-space contrast loss and the sensitivity-weighted feature contrast loss into a total loss function, and training the image classification network with the total loss function.
(1) A first classification penalty.
For a single unlabeled target domain image, a pseudo label is generated from the probability vector of the first enhanced image output by the classifier and used to supervise the probability vectors of the second and third enhanced images, performing supervised FixMatch (consistency matching) learning.
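A FixMatch-style sketch of this first classification loss is given below: the weakly enhanced view provides the pseudo label, which supervises a strongly enhanced view. The confidence threshold `tau` is a common FixMatch default, not a value stated in the patent.

```python
import torch
import torch.nn.functional as F

def first_classification_loss(p_weak, logits_strong, tau=0.95):
    """Pseudo-label the weakly enhanced view; supervise a strongly
    enhanced view with cross-entropy, keeping only confident labels."""
    with torch.no_grad():
        conf, pseudo = p_weak.max(dim=1)       # pseudo label + its confidence
        mask = (conf >= tau).float()           # keep only confident pseudo labels
    ce = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (mask * ce).sum() / mask.sum().clamp(min=1.0)

p_weak = torch.tensor([[0.98, 0.01, 0.01],     # confident -> contributes
                       [0.40, 0.30, 0.30]])    # below tau -> masked out
logits_strong = torch.tensor([[6.0, 0.0, 0.0],
                              [0.0, 6.0, 0.0]])
loss = first_classification_loss(p_weak, logits_strong)
print(loss.item())
```

Here only the first sample passes the threshold, and its strong-view prediction already agrees with the pseudo label, so the loss is near zero.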
(2) Probability space contrast loss.
Semantic comparison is performed with the pseudo labels to screen out the positive samples corresponding to each enhanced image. Taking any enhanced image as the anchor sample, its positive samples comprise: the first or second enhanced image that comes from the same unlabeled target domain image as the anchor sample, and the enhanced images whose pseudo labels belong to the same class as the anchor sample's; the remaining enhanced images are its negative samples. The probability-space contrast loss is then calculated from the probability vectors corresponding to the positive and negative samples.
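The class-aware selection of positives and the contrast over probability vectors can be sketched as an InfoNCE-style loss. This is a stand-in under stated assumptions (the patent's exact formula is given later in the description and may differ): views sharing a source image or a pseudo label are positives, everything else is a negative.

```python
import torch

def prob_space_contrast(probs, pseudo, img_id, temp=0.5):
    """Class-aware contrast in probability space (sketch).
    probs:  (N, C) classifier probability vectors of all views in a batch
    pseudo: (N,)   pseudo labels; img_id: (N,) source image of each view."""
    n = probs.size(0)
    eye = torch.eye(n, dtype=torch.bool)
    sim = probs @ probs.t() / temp                       # similarity in probability space
    same_img = img_id.unsqueeze(0) == img_id.unsqueeze(1)
    same_cls = pseudo.unsqueeze(0) == pseudo.unsqueeze(1)
    pos = (same_img | same_cls) & ~eye                   # class-aware positive mask
    sim = sim.masked_fill(eye, float("-inf"))            # a view is never its own pair
    log_p = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_p = log_p.masked_fill(~pos, 0.0)                 # keep positive terms only
    return (-log_p.sum(dim=1) / pos.sum(dim=1).clamp(min=1)).mean()

probs = torch.softmax(torch.randn(6, 5), dim=1)          # 6 views, 5 classes
pseudo = torch.tensor([0, 0, 1, 1, 2, 2])                # pseudo labels
img_id = torch.tensor([0, 0, 1, 1, 2, 2])                # source image of each view
loss = prob_space_contrast(probs, pseudo, img_id)
print(loss.item())
```

Because the loss is built on the classifier's probability vectors rather than raw features, its gradient reaches the classifier weights, which is the motivation the description gives for this branch.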
(3) Sensitivity weighted feature contrast loss.
The image features of the second and third enhanced images are mapped into low-dimensional features by the linear mapping network, from which a sensitivity score is calculated to guide the model toward challenging target domain samples; the existing feature contrast loss is weighted by the sensitivity score to obtain the sensitivity-weighted feature contrast loss.
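A sketch of the sensitivity weighting follows. Here the sensitivity score is assumed to be one minus the cosine similarity of a sample's two projected views, so samples whose views disagree (challenging samples) receive larger weight; the patent's exact score definition may differ.

```python
import torch
import torch.nn.functional as F

def sensitivity_weighted_contrast(z1, z2, temp=0.5):
    """z1/z2: L2-normalized low-dim projections of the second and third
    enhanced views. Weight a SimCLR-style pairwise contrast per sample."""
    n = z1.size(0)
    with torch.no_grad():
        sens = 1.0 - (z1 * z2).sum(dim=1)            # assumed sensitivity score
        w = n * sens / sens.sum().clamp(min=1e-8)    # normalized per-sample weights
    z = torch.cat([z1, z2], dim=0)
    sim = (z @ z.t() / temp).fill_diagonal_(float("-inf"))
    target = torch.cat([torch.arange(n) + n, torch.arange(n)])  # sibling view index
    ce = F.cross_entropy(sim, target, reduction="none")
    per_sample = 0.5 * (ce[:n] + ce[n:])             # average both directions
    return (w * per_sample).mean()

z1 = F.normalize(torch.randn(8, 16), dim=1)
z2 = F.normalize(torch.randn(8, 16), dim=1)
loss = sensitivity_weighted_contrast(z1, z2)
print(loss.item())
```

The score is computed under `no_grad`, so it only re-weights the contrast loss and does not itself receive gradients.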
(4) The second classification is lost.
As described in step 1, the labeled source domain and target domain image sets are processed with the first data enhancement mode to obtain corresponding enhanced images, which are input into the image classification network; image features are extracted by the feature extractor and probability vectors obtained by the classifier, and the second classification loss is then calculated in combination with the corresponding annotation information.
The above losses are weighted and summed to obtain the total loss function; the image classification network is trained on this total loss, updating the parameters of the feature extractor, classifier, and linear mapping network until the total loss function is minimized.
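The weighted sum can be sketched as below. The trade-off weights `lambda_*` are assumptions for illustration; the patent only specifies that the four losses are combined by weighted summation. Stand-in scalar parameters are used so the gradient flow can be checked.

```python
import torch

# pretend loss values carrying gradients (stand-ins for the four losses)
losses = torch.nn.Parameter(torch.ones(4))
l_cls_labeled, l_cls_pseudo, l_prob, l_feat = losses
lambda_u, lambda_pc, lambda_fc = 1.0, 0.5, 0.5   # assumed trade-off weights

total = (l_cls_labeled                  # second classification loss (labeled data)
         + lambda_u * l_cls_pseudo      # first classification loss (pseudo labels)
         + lambda_pc * l_prob           # probability-space contrast loss
         + lambda_fc * l_feat)          # sensitivity-weighted feature contrast loss
total.backward()                        # an optimizer step would follow
print(losses.grad.tolist())
```

The gradient of each stand-in equals its weight, confirming every branch (and hence the extractor, classifier, and mapping network behind it) receives an update signal.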
And 4, classifying the input images by using the trained image classification network.
In the embodiment of the invention, after training is finished the linear mapping network is removed; for an input image, the feature extractor of the trained image classification network extracts image features, the classifier produces a probability vector, and the index of the largest element of the probability vector is the classification result. Here, the input image is typically an unlabeled target domain image.
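Inference then reduces to an argmax over the classifier's probability vector, as sketched here with illustrative stand-in modules for the trained extractor and classifier:

```python
import torch
import torch.nn as nn

# tiny stand-ins for the trained feature extractor and classifier;
# the projection head is deliberately absent (removed after training)
extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())
classifier = nn.Linear(16, 5)

@torch.no_grad()
def classify(x):
    probs = torch.softmax(classifier(extractor(x)), dim=1)
    return probs.argmax(dim=1)      # index of the largest element

preds = classify(torch.randn(2, 3, 8, 8))
print(preds)
```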
The scheme provided by the embodiment of the invention mainly has the following advantages: (1) probability-space contrastive learning and feature contrastive learning are combined, constructing a contrast loss in probability space on top of the original feature contrast learning method, aligning target domain features across multiple dimensions and synchronizing the optimization of the feature extractor and the classifier, which prevents the suboptimal domain transfer caused by the classifier being biased toward source domain samples; (2) in the contrastive learning used by the invention, sensitivity weights guide the model to attend to challenging target domain samples, emphasizing the few difficult samples with poor classification results; (3) high-quality pseudo labels supervise the probability contrast loss, avoiding semantic conflicts so that the model learns better domain-invariant features. Overall, the invention uses joint contrast learning, combined with sensitivity weights and class awareness, to improve the accuracy of semi-supervised domain adaptive image classification.
In order to more clearly demonstrate the technical scheme and the technical effects provided by the invention, the method provided by the embodiment of the invention is described in detail below by using specific embodiments.
1. Overall overview of the scheme.
The semi-supervised domain adaptive image classification method based on joint contrast loss mainly addresses the limited accuracy of existing semi-supervised domain adaptive image classification schemes. Current schemes generally align the source and target domain distributions using features of a single dimension, minimizing the distance between the two domains. However, since features of one particular dimension cannot cover the distribution of target domain samples over the whole region, the aligned distribution is neither complete nor comprehensive. In addition, in existing schemes the features participating in training are usually the output of the feature extractor, i.e. feature vectors, so the optimization of the classifier is little influenced by the target domain. Because the classifier is significantly biased toward source domain samples, the network lacks the ability to judge target domain samples. As shown in fig. 2, the invention uses a joint contrast loss, namely the sensitivity-weighted feature contrast loss and the probability-space contrast loss, and the similarity of positive sample pairs in contrastive learning gives samples different degrees of attention; that is, the invention comprises the two parts marked by solid and dotted arrows in fig. 2, where the solid-arrow part uses the output of the classifier, i.e. the probability vector. By deeply mining the information in the target domain data, the learning of features invariant across target and source domain samples is promoted, and the image classification accuracy of the deep neural network model on the target domain is improved.
2. Constructing and training the network model.
1. Constructing the network model.
The embodiment of the invention constructs an image classification network based on joint contrast learning, which mainly comprises: a feature extractor F, a classifier C, and a nonlinear mapping network P. The classification result predicted by the network is ŷ = arg max σ(W_C^T f(x)), where f(x) is the image feature extracted from the image x by the feature extractor F, W_C is the classifier weight, T is the transpose symbol, and σ is the softmax function (normalized exponential function), i.e. the classifier outputs a normalized probability vector; the arg max function selects the index of the element with the largest value in the probability vector as the classification result ŷ.
2. Training the network model.
The main core innovations in the scheme provided by the embodiment of the invention can be summarized in two aspects: one is the class-aware probability space contrast; the other is the sensitivity-weighted feature contrast. The overall structure is shown in fig. 3. The two innovations are described in detail below in turn, followed by the overall optimization scheme. Let D_s, D_t and D_u respectively denote the labeled source domain image set, the labeled target domain image set and the unlabeled target domain image set. Referring to the original contrast learning paradigm, data enhancement is performed on each unlabeled target domain image x. The first type of data enhancement is denoted A(·); a first enhanced image (enhanced view) x^w = A(x) is obtained through it. The second type of data enhancement is denoted B(·); B_1(·) and B_2(·) are two different processing means included in the second type of data enhancement, which yield a second enhanced image x^{s1} = B_1(x) and a third enhanced image x^{s2} = B_2(x).
In the embodiment of the present invention, the first type of data enhancement is weak data enhancement, including basic image enhancement methods such as cropping and rotation; the second type of data enhancement is strong data enhancement, which may be implemented, for example, by a data enhancement algorithm such as RandAugment. Exemplary strong data enhancements include color transformation, sharpening, erasure, and so on; for example, the second enhanced image and the third enhanced image shown in fig. 3 are obtained by erasure and color transformation, respectively.
(1) The probability of class perception is spatially compared.
Through preliminary experiments, it was found that the classifier weight W of a semi-supervised domain adaptive network optimized with an ordinary feature contrast loss exhibits a clear source domain preference. Based on this result, introducing target domain samples into the optimization of the classifier, thereby correcting the classifier's preference for the source domain, is a reasonable direction of improvement. The invention adopts the probability vectors of target domain samples to construct the contrast loss and synchronously optimizes the feature extractor, the classifier and the nonlinear mapping network, so that all three acquire characteristics matching the real target domain distribution. In general, the contrast loss on the probability vector in the present invention is calculated as:

L_q = L_ctr(q(x̃))

wherein L_ctr(·) denotes the contrast loss function, the enhanced image x̃ is the first enhanced image or the second enhanced image corresponding to the unlabeled target domain image x, q(x̃) is the probability vector output by the classifier for x̃, and L_q is the contrast loss on the probability vector.
In the embodiment of the invention, the contrast loss function L_ctr can be implemented in the InfoNCE (mutual information contrast loss) form:

L_ctr(q) = -log [ exp(q·q⁺/τ) / ( exp(q·q⁺/τ) + Σ_{m=1}^{M} exp(q·q⁻_m/τ) ) ]

wherein q is the probability vector corresponding to an anchor sample, an anchor sample being any enhanced image, e.g. the enhanced image x̃ mentioned above; q⁺ is the probability vector of the related positive sample, q⁻_m is the probability vector of the m-th negative sample participating in the calculation, τ is the hyper-parameter controlling uniformity when calculating the contrast loss, M is the number of negative samples, m is the index of a negative sample, and exp is the exponential function with the natural constant e as the base. The probability vectors of the respective samples participate in the above probability space comparison. In contrast learning, by pulling an anchor sample towards its positive sample (which has higher similarity with the anchor) and pushing the anchor's features apart from its negative samples (which have weaker relevance to the anchor), representative image features are learned.
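A minimal NumPy sketch of the InfoNCE form above (the function name, toy vectors and the temperature default are illustrative assumptions):

```python
import numpy as np

def info_nce(q, q_pos, q_negs, tau=0.1):
    # InfoNCE contrast loss for a single anchor probability vector q.
    # q_pos: the positive sample's vector; q_negs: (M, d) negative vectors.
    pos = np.exp(q @ q_pos / tau)
    neg = np.exp(q_negs @ q / tau).sum()
    return float(-np.log(pos / (pos + neg)))

anchor = np.array([0.9, 0.1])
similar_pos = np.array([0.8, 0.2])      # behaves like a true positive view
dissimilar_pos = np.array([0.1, 0.9])   # a "hard" positive
negs = np.array([[0.2, 0.8], [0.5, 0.5]])
loss_easy = info_nce(anchor, similar_pos, negs)
loss_hard = info_nce(anchor, dissimilar_pos, negs)
```

As expected, the loss is small when the anchor and positive already agree and large when they disagree, which is what drives the pulling/pushing behaviour described above.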
In the above formula, a positive pair is generally composed of two different enhanced images of the same image sample, i.e. the first enhanced image and the second enhanced image of the same unlabeled target domain image are positive samples of each other, while all other first and second enhanced images form the negative sample set. However, due to the lack of reliable class labels, the negative samples inevitably contain data belonging to the same class as the positive sample; this is the semantic conflict phenomenon. Because of this limitation of the selection approach, data that should be modeled as positive samples is rejected as negative, misleading the training of the deep neural network with respect to class information. For this reason, positive and negative samples corresponding to each enhanced image are screened in a class-aware manner: semantic comparison is performed using pseudo labels to screen out the positive samples corresponding to each enhanced image. Regarding any enhanced image as an anchor sample, its positive samples comprise: the first enhanced image or second enhanced image from the same unlabeled target domain image as the anchor sample, and the enhanced images whose pseudo labels belong to the same category as that of the anchor sample; the remainder serve as negative samples of the anchor sample.
In this section, "any enhanced image" refers to any first enhanced image or second enhanced image. The "first enhanced image or second enhanced image from the same unlabeled target domain image as the anchor sample" means: when the current anchor sample is a first enhanced image, its positive sample is the second enhanced image from the same unlabeled target domain image; when the current anchor sample is a second enhanced image, its positive sample is the first enhanced image from the same unlabeled target domain image.
The specific embodiment is as follows. The first enhanced image and second enhanced image of each unlabeled target domain image are respectively input into the network; because the first enhanced image changes the original information very little, its predicted value is more reliable. A pseudo label is assigned from the probability vector of the first enhanced image through the network and then screened: when the maximum predicted probability in the probability vector is greater than a threshold τ₀, the sample is treated as a high-confidence sample and, together with the second enhanced image from the same unlabeled target domain image, is added to the labeled target domain image set. Positive samples are then judged according to the category information of the labeled target domain image set: for a data-enhanced sample x^w with pseudo label ŷ, (1) all first enhanced images and second enhanced images derived from the same unlabeled target domain image, and (2) all target domain samples whose predicted category is ŷ, are the related positive samples. Here the target domain samples mainly refer to second enhanced images, and a second enhanced image also uses the pseudo label of the first enhanced image from the same unlabeled target domain image as its image class label. The positive samples in both cases (1) and (2) can be judged by whether the pseudo labels are equal: if equal, positive samples; if unequal, negative samples. Finally, the probability space contrast loss is calculated by combining the probability vectors corresponding to the positive and negative samples:

L_pc(q_a) = -(1/N_pos) Σ_{i=1}^{N} 1(ŷ_i = ŷ_a) · log [ exp(q_a·q_i/τ) / Σ_{j=1}^{N} exp(q_a·q_j/τ) ]

wherein L_pc(q_a) is the probability space contrast loss calculated with the probability vector q_a corresponding to the anchor sample; q_i is the probability vector of any enhanced image i (first or second enhanced image) other than the anchor sample, ŷ_i denotes the label corresponding to q_i, N is the number of enhanced images in a batch excluding the anchor sample, N_pos is the number of positive samples among them, and τ is the hyper-parameter controlling uniformity when calculating the contrast loss; 1(·) is the judging function, taking the value 1 if the condition in the brackets is satisfied and 0 otherwise; exp is the exponential function with the natural constant e as the base; ŷ_a denotes the pseudo label corresponding to the anchor sample, and when the anchor sample is a second enhanced image, its pseudo label is that of the first enhanced image from the same unlabeled target domain image.
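The class-aware screening plus probability-space contrast can be sketched in NumPy as follows (the function name, toy probability vectors and the averaging over positives are illustrative assumptions about the loss form):

```python
import numpy as np

def prob_space_contrast(q_a, y_a, Q, Y, tau=0.1):
    # q_a: (C,) anchor probability vector; y_a: its pseudo label.
    # Q: (N, C) probability vectors of the other enhanced images in the
    # batch; Y: (N,) their pseudo labels. Images sharing the anchor's
    # pseudo label count as positives (class-aware screening), so
    # same-class images are not wrongly pushed away as negatives.
    sims = np.exp(Q @ q_a / tau)
    pos_mask = (Y == y_a)
    if not pos_mask.any():
        return 0.0
    # average the -log terms over the positive set
    return float(-np.mean(np.log(sims[pos_mask] / sims.sum())))

q_a = np.array([0.9, 0.1])        # anchor, pseudo label 0
Q = np.array([[0.8, 0.2],         # same pseudo label -> positive
              [0.1, 0.9]])        # different label -> negative
Y = np.array([0, 1])
loss = prob_space_contrast(q_a, 0, Q, Y)
```

The loss shrinks as same-class probability vectors cluster together and grows when a labeled positive drifts away from the anchor.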
(2) Sensitivity weighted feature contrast.
In the embodiment of the invention, sensitivity is judged according to the positive-pair similarity in the contrast loss. Under the consistency assumption, different data enhancements of the same image should exhibit similar semantic information after feature extraction. If mutually contradictory semantic information appears, the extracted features do not satisfy the content representativeness required for subsequent classification. Therefore, the network should study this part of the samples with greater emphasis, so as to avoid classification weaknesses of the network during migration caused by difficult samples.
This part mainly involves the second enhanced image and third enhanced image of each unlabeled target domain image. After the corresponding image features are obtained through the feature extractor, they are linearly mapped by the mapping network, projecting the high-dimensional image features into low-dimensional features z; the sensitivity score is then calculated as:

s(x̃) = 1 - exp(z·z⁺/τ_s) / ( exp(z·z⁺/τ_s) + Σ_{j=1}^{k} exp(z·z⁻_j/τ_s) )

wherein s(x̃) denotes the sensitivity score calculated with the linearly mapped image feature z of the enhanced image x̃, the enhanced image x̃ being the second or third enhanced image of an unlabeled target domain image x; z⁺ is the linearly mapped image feature of the positive sample of x̃, z⁻_j is the linearly mapped image feature of the j-th negative sample of x̃, and k is the number of negative samples. The positive sample of x̃ is the second or third enhanced image from the same unlabeled target domain image (i.e. the second and third enhanced images from the same unlabeled target domain image are positive samples of each other), the negative samples of x̃ are the other enhanced images in the same batch except the positive sample, and τ_s is the hyper-parameter controlling uniformity when calculating the sensitivity score.
The less similar an enhanced image is to its positive sample, the higher the sensitivity score and the greater the learning strength the network should apply; the sensitivity score thus corrects the network's attention to the sample. The invention therefore improves the original feature contrast loss L_fc as:

L_wfc(x̃) = s(x̃) · L_fc(z)

wherein L_fc(z) denotes the feature contrast loss calculated with the linearly mapped image feature z of the enhanced image x̃, and L_wfc denotes the sensitivity-weighted feature contrast loss. Since the feature contrast loss L_fc can be calculated according to existing methods, a detailed description is omitted.
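A NumPy sketch of the sensitivity weighting (the exact score form, the function names and toy features are assumptions for illustration; here L_fc is instantiated as InfoNCE on the mapped features):

```python
import numpy as np

def sensitivity_score(z, z_pos, z_negs, tau=0.1):
    # Higher when the positive pair is less similar, so "hard" samples
    # receive more attention. Assumed form: 1 minus the positive pair's
    # softmax share among positive and negative similarities.
    pos = np.exp(z @ z_pos / tau)
    neg = np.exp(z_negs @ z / tau).sum()
    return float(1.0 - pos / (pos + neg))

def weighted_feature_contrast(z, z_pos, z_negs, tau=0.1):
    # Sensitivity-weighted feature contrast loss: s(z) * L_fc(z),
    # with L_fc taken as the InfoNCE loss on the mapped features.
    pos = np.exp(z @ z_pos / tau)
    neg = np.exp(z_negs @ z / tau).sum()
    base = -np.log(pos / (pos + neg))   # original feature contrast loss
    return sensitivity_score(z, z_pos, z_negs, tau) * float(base)

z = np.array([1.0, 0.0])                # low-dimensional mapped feature
easy_pos = np.array([0.9, 0.1])         # consistent positive view
hard_pos = np.array([0.1, 0.9])         # contradictory positive view
negs = np.array([[0.0, 1.0]])
s_easy = sensitivity_score(z, easy_pos, negs)
s_hard = sensitivity_score(z, hard_pos, negs)
```

A consistent pair yields a score near 0 (little extra attention); a contradictory pair yields a score approaching 1, amplifying its contrast loss.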
Through the feature representation of the deep mining difficult samples, the fuzzy samples with the features at certain category boundaries are guided to gather towards the category center, so that a better image classification effect on a target domain is realized.
In the embodiment of the invention, the high dimension and the low dimension are relative concepts, and mainly aim to reduce the dimension of the input features for the feature mapping network.
(3) A base loss function.
In the embodiment of the invention, the basic loss function comprises: the FixMatch loss trained with high-confidence enhanced samples (the first classification loss), and the cross entropy loss L_ce on labeled samples (the second classification loss).
In the embodiment of the invention, a pseudo label ŷ = arg max q^w is generated from the probability vector q^w of the first enhanced image, and the first classification loss is calculated with the probability vector q^s of the second enhanced image from the same unlabeled target domain image:

L_u = 1(max(q^w) > τ₀) · H(ŷ, q^s)

wherein L_u is the first classification loss and τ₀ is the threshold used in the first classification loss; q^s denotes the probability vector of the second enhanced image, and H indicates the cross entropy loss; 1(max(q^w) > τ₀) outputs 1 when the maximum predicted probability value of the probability vector q^w is greater than τ₀, and 0 otherwise.
Fig. 4 presents the framework flow of the FixMatch method above.
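The thresholded pseudo-labeling step can be sketched per sample as follows (the function name, toy probability vectors and the 0.95 default are illustrative assumptions):

```python
import numpy as np

def fixmatch_loss(q_weak, q_strong, threshold=0.95):
    # If the weak view's max probability clears the threshold, its argmax
    # becomes a pseudo label and cross entropy is applied to the strong
    # view's probability vector; otherwise the sample contributes nothing.
    if q_weak.max() <= threshold:
        return 0.0
    pseudo = int(np.argmax(q_weak))
    return float(-np.log(q_strong[pseudo]))

confident = fixmatch_loss(np.array([0.98, 0.02]), np.array([0.7, 0.3]))
uncertain = fixmatch_loss(np.array([0.60, 0.40]), np.array([0.7, 0.3]))
```

Only the high-confidence weak prediction produces a training signal; low-confidence samples are masked out by the indicator.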
In the embodiment of the present invention, the second classification loss is calculated as follows: the labeled source domain and target domain image sets are processed with the first type of data enhancement to obtain the corresponding enhanced images, which are input into the image classification network; image features are extracted by the feature extractor, probability vectors are obtained by the classifier, and the second classification loss is calculated with the corresponding label information:

L_ce = -(1/u) Σ_{k=1}^{u} y_k · log q_k

wherein the labeled images comprise the labeled source domain and target domain image sets; q_k is the probability vector of the k-th labeled image, k is the index of a labeled image in a batch, u is the number of labeled images, and y_k is the category label vector of the k-th labeled image.
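A minimal NumPy sketch of this batch-mean cross entropy (function name and toy values are illustrative; integer labels replace the one-hot label vectors for brevity):

```python
import numpy as np

def labeled_cross_entropy(Q, Y):
    # Q: (u, C) probability vectors of the u labeled images in a batch;
    # Y: (u,) integer class labels. Mean cross entropy over the batch,
    # i.e. the average of -log of each image's true-class probability.
    u = len(Y)
    return float(-np.mean(np.log(Q[np.arange(u), Y])))

Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
Y = np.array([0, 1])
loss = labeled_cross_entropy(Q, Y)
```

Perfect predictions give a loss of 0; the loss grows as the probability assigned to the true class falls.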
Finally, the base loss is the sum of the two classification losses:

L_base = E[L_ce] + E[L_u]

wherein E is the mathematical expectation symbol and L_base is the base loss.
(4) An integral optimization scheme.
The overall optimization scheme of the invention combines parts (1)-(3) above, expressed as:

L = L_base + λ₁ L_pc + λ₂ L_wfc

wherein L denotes the total loss function, and λ₁ and λ₂ are scaling factors used to control the importance of the corresponding losses.
Based on the total loss function, the network is trained, parameters in the network are optimized, and the process involved in the stage can be realized by referring to the conventional technology, so that the description is omitted.
3. And (3) reasoning stage.
After training, removing the linear mapping network, and obtaining a probability vector after the image to be classified passes through the feature extractor and the classifier, wherein the element index with the largest value in the probability vector is the classification result.
In order to intuitively present the complete flow of the present invention, a specific example of steps is provided below.
Step S1, prepare the labeled source domain training data set D_s, the labeled target domain training data set D_t, the unlabeled target domain training set D_u, and a test set. For D_s and D_t, one set of weak data enhancement (consisting of basic image enhancement methods such as cropping and rotation) is performed. For D_u, two sets of strong data enhancement and one set of weak data enhancement are performed. After data enhancement, all images are scaled to a uniform size (e.g. 224 x 224) and numerically normalized. The test set comprises labeled target domain images and is mainly used for testing the classification accuracy after training.
Step S2, construct the image classification network based on joint contrast learning using the PyTorch (an open-source Python machine learning library) framework. A pre-trained deep neural network such as the common ResNet34 (34-layer deep residual network) or AlexNet (a convolutional neural network) is used as the feature extractor; the classifier is built from a fully connected layer and a normalization layer, with randomly initialized parameters; the mapping network consists of a nonlinear layer (e.g. a rectified linear unit layer) and a fully connected layer, with randomly initialized parameters. The feature extractors and classifiers in the upper and lower branches share weights, connected in the manner of fig. 3.
Step S3, randomly sample source domain and target domain samples; enhance the data as in step S1, splice them into a whole tensor, and input it into the feature extractor of the image classification network to obtain high-dimensional features. The weakly enhanced image features of D_s and D_t, together with the weakly and strongly enhanced image features of D_u, flow into classifier C; the two sets of strongly enhanced high-dimensional features of D_u all enter the mapping network P.
Step S4, the classifier outputs probability vectors for the weakly enhanced images (first enhanced images) of the D_s and D_t samples, and with the available label information, the supervised cross entropy loss (i.e. the second classification loss) is constructed according to the formula provided above.
Step S5, output the classification results of the unlabeled target domain weakly enhanced images (first enhanced images) and the probability vectors of the strongly enhanced images (second and third enhanced images), supervise the strongly enhanced images using the classification results of the weakly enhanced images as pseudo labels, perform supervised FixMatch learning according to the formula introduced above, and calculate the first classification loss.
Step S6, perform semantic comparison using the pseudo labels of the unlabeled target domain weakly enhanced images obtained in the previous step, and screen out the positive and negative sample sets of each anchor sample. Using the probability vectors output by classifier C, construct the probability space contrast loss based on the formula described above.
Step S7, the feature mapping network maps the high-dimensional features of the two sets of strongly enhanced images of the unlabeled target domain to a lower dimension; the sensitivity score is calculated with the paired low-dimensional features, the original feature contrast loss is constructed, and the sensitivity score is multiplied by the original contrast loss to obtain the sensitivity-weighted feature contrast loss.
Step S8, the losses of steps S4, S5, S6 and S7 are weighted and summed, and the loss function is minimized through the back propagation algorithm and a gradient descent strategy, synchronously updating the weights of the feature extractor, classifier and mapping network.
And S9, removing the feature mapping network, inputting a test data set, obtaining a classification result by using the feature extractor and the classifier, and calculating the accuracy of network classification according to the classification result.
From the description of the above embodiments, it will be apparent to those skilled in the art that the above embodiments may be implemented in software, or may be implemented by means of software plus a necessary general hardware platform. With such understanding, the technical solutions of the foregoing embodiments may be embodied in a software product, where the software product may be stored in a nonvolatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and include several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the methods of the embodiments of the present invention.
Example two
The present invention also provides a semi-supervised domain adaptive image classification system, which is mainly used for implementing the method provided in the foregoing embodiment, as shown in fig. 5, and the system mainly includes:
the data acquisition and processing unit is used for acquiring training data and comprises the following steps: the method comprises the steps of processing an unlabeled target domain image in an unlabeled target domain image set by adopting two different data enhancement modes, wherein an enhancement image obtained by a first data enhancement mode is called a first enhancement image, and two enhancement images are obtained by different processing means in a second data enhancement mode and are called a second enhancement image and a third enhancement image;
A network construction unit, configured to construct an image classification network based on joint contrast learning, including: a feature extractor, a classifier and a linear mapping network;
the network training unit is used for training the image classification network, and the training process comprises the following steps: inputting three enhanced images corresponding to the non-labeling target domain image into an image classification network, extracting image features through a feature extractor, and obtaining probability vectors through a classifier; generating a pseudo tag by using the probability vector of the first enhanced image, and calculating a first classification loss by combining the probability vector of the second enhanced image from the same non-labeling target domain image; based on the pseudo labels, positive and negative samples corresponding to each enhanced image are screened out by adopting a category perception mode, and probability space contrast loss is calculated by combining probability vectors corresponding to the positive and negative samples; the image features of the second enhanced image and the third enhanced image are respectively subjected to linear mapping, then the sensitivity score is calculated, and the feature contrast loss of sensitivity weighting is calculated by combining the sensitivity score; calculating second classification loss by using the marked information in the marked source domain and target domain image sets; the first classification loss, the second classification loss, the probability space comparison loss and the sensitivity weighted characteristic comparison loss are synthesized to obtain a total loss function, and the total loss function is utilized to train an image classification network;
And the image classification unit is used for classifying the input images by using the trained image classification network after training.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the system is divided into different functional modules to perform all or part of the functions described above.
Example III
The present invention also provides a processing apparatus, as shown in fig. 6, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.
Further, the processing device further comprises at least one input device and at least one output device; in the processing device, the processor, the memory, the input device and the output device are connected through buses.
In the embodiment of the invention, the specific types of the memory, the input device and the output device are not limited; for example:
The input device can be a touch screen, an image acquisition device, a physical key or a mouse and the like;
the output device may be a display terminal;
the memory may be random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as disk memory.
Example IV
The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.
The readable storage medium according to the embodiment of the present invention may be provided as a computer readable storage medium in the aforementioned processing apparatus, for example, as a memory in the processing apparatus. The readable storage medium may be any of various media capable of storing a program code, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, and an optical disk.
The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (10)

1. A semi-supervised domain adaptive image classification method, comprising:
acquiring training data, comprising: the method comprises the steps of processing an unlabeled target domain image in an unlabeled target domain image set by adopting two different data enhancement modes, wherein an enhancement image obtained by a first data enhancement mode is called a first enhancement image, and two enhancement images are obtained by different processing means in a second data enhancement mode and are called a second enhancement image and a third enhancement image;
constructing an image classification network based on joint contrast learning, comprising: a feature extractor, a classifier and a linear mapping network;
training the image classification network, wherein the training process comprises the following steps: inputting three enhanced images corresponding to the non-labeling target domain image into an image classification network, extracting image features through a feature extractor, and obtaining probability vectors through a classifier; generating a pseudo tag by using the probability vector of the first enhanced image, and calculating a first classification loss by combining the probability vector of the second enhanced image from the same non-labeling target domain image; based on the pseudo labels, positive and negative samples corresponding to each enhanced image are screened out by adopting a category perception mode, and probability space contrast loss is calculated by combining probability vectors corresponding to the positive and negative samples; the image features of the second enhanced image and the third enhanced image are respectively subjected to linear mapping through a linear mapping network to calculate sensitivity scores, and the sensitivity weighted feature contrast loss is calculated by combining the sensitivity scores; calculating second classification loss by using the marked information in the marked source domain and target domain image sets; the first classification loss, the second classification loss, the probability space comparison loss and the sensitivity weighted characteristic comparison loss are synthesized to obtain a total loss function, and the total loss function is utilized to train an image classification network;
After training, classifying the input images by using the trained image classification network.
2. The method of claim 1, wherein generating a pseudo tag using probability vectors of a first enhanced image, and calculating a first classification penalty in combination with probability vectors of a second enhanced image from the same unlabeled target domain image is expressed as:
wherein,for the first classification loss, the symbol +.>Refers to the threshold value used in the first classification loss,/-for>Probability vector representing the second enhanced image, +.>Generating a pseudo tag for a probability vector of the first enhanced image, H indicating cross entropy loss; />Is a judging function of->Finger probability vector->The maximum predicted probability value of (2) is greater than +.>When outputting 1, otherwise outputting 0.
3. The method for classifying the semi-supervised domain adaptive image as recited in claim 1, wherein the screening positive and negative samples corresponding to each enhanced image by using a class perception method based on pseudo labels comprises:
performing semantic comparison using pseudo labels and screening out the positive samples corresponding to each enhanced image; regarding any enhanced image as an anchor sample, its positive samples comprise: the first enhanced image or second enhanced image from the same unlabeled target domain image as the anchor sample, and the enhanced images whose pseudo labels belong to the same category as that of the anchor sample; the rest are taken as negative samples of the anchor sample; wherein any enhanced image refers to any first enhanced image or second enhanced image.
4. A semi-supervised domain adaptive image classification method as defined in claim 1 or 3, wherein the probability spatial contrast loss calculated in combination with the probability vectors corresponding to the positive and negative samples is expressed as:
wherein,for using the probability vector corresponding to the anchor sample +.>The calculated probability space contrast loss, and the anchor point sample is any enhanced image; />Is the probability vector of all enhanced images i except the anchor sample, < >>Representation->Corresponding labels, N being the number of enhanced images removed from a batch except for anchor samples, +.>Is the hyper-parameter for controlling uniformity when calculating contrast loss; />Is a judging function, if the condition in the brackets is satisfied, the value is 1, otherwise, the value is 0; exp is an exponential function with a natural constant e as a base; />And representing the pseudo tag corresponding to the anchor point sample, wherein when the anchor point sample is the second enhanced image, the corresponding pseudo tag is the pseudo tag of the first enhanced image from the same label-free target domain image.
5. The method for classifying semi-supervised domain adaptive image as recited in claim 1, wherein the sensitivity score calculated after the image features of the second enhanced image and the third enhanced image are respectively linearly mapped by the linear mapping network is expressed as:
s(x̃) = 1 - exp(z·z⁺/τ_s) / ( exp(z·z⁺/τ_s) + Σ_{j=1}^{k} exp(z·z⁻_j/τ_s) )

wherein s(x̃) denotes the sensitivity score calculated with the linearly mapped image feature z of the enhanced image x̃, the enhanced image x̃ being the second or third enhanced image of an unlabeled target domain image x; z⁺ is the linearly mapped image feature of the positive sample of x̃, z⁻_j is the linearly mapped image feature of the j-th negative sample of x̃, and k is the number of negative samples; the positive sample of x̃ is the second or third enhanced image from the same unlabeled target domain image, i.e. the second and third enhanced images from the same unlabeled target domain image are positive samples of each other; the negative samples of x̃ are the other enhanced images in the same batch except the positive sample, and τ_s is the hyper-parameter controlling uniformity when calculating the sensitivity score.
6. The method of claim 5, wherein the sensitivity-weighted feature contrast loss calculated in combination with the sensitivity score is:

L_fcl(z_u) = -log s(z_u),  L_wfcl = (1/N) Σ_u s(z_u) · L_fcl(z_u)

wherein L_fcl(z_u) represents the feature contrast loss calculated using the linearly mapped image feature z_u of the enhanced image u, and L_wfcl represents the sensitivity-weighted feature contrast loss, obtained by weighting each per-sample feature contrast loss by its sensitivity score and averaging over the N enhanced images in the batch.
7. The method of claim 1, wherein calculating the second classification loss using the labeling information in the labeled source domain and target domain image sets comprises:
obtaining, via the first data enhancement mode, the corresponding enhanced image of each image in the labeled source domain and target domain image sets; inputting the enhanced images into the image classification network, extracting image features through the feature extractor, and obtaining probability vectors through the classifier; and calculating the second classification loss in combination with the corresponding labeling information.
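The second classification loss is a standard supervised cross-entropy on the classifier's probability vectors for the labeled enhanced images; a minimal NumPy sketch (function and argument names are illustrative):

```python
import numpy as np

def second_classification_loss(probs, labels):
    """Supervised cross-entropy on labeled source/target images (sketch).

    probs: (B, C) classifier probability vectors for the enhanced labeled images.
    labels: (B,) ground-truth class indices from the labeling information.
    """
    eps = 1e-12  # guard against log(0)
    # negative log-probability of the true class, averaged over the batch
    return float(-np.log(probs[np.arange(len(labels)), labels] + eps).mean())
```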
8. A semi-supervised domain adaptive image classification system, comprising:
the data acquisition and processing unit, used for acquiring training data, including: processing each unlabeled target domain image in the unlabeled target domain image set with two different data enhancement modes, wherein the enhanced image obtained by the first data enhancement mode is called the first enhanced image, and the second data enhancement mode produces two enhanced images through different processing means, called the second enhanced image and the third enhanced image;
a network construction unit, configured to construct an image classification network based on joint contrast learning, including: a feature extractor, a classifier and a linear mapping network;
the network training unit, used for training the image classification network, the training process including: inputting the three enhanced images corresponding to an unlabeled target domain image into the image classification network, extracting image features through the feature extractor, and obtaining probability vectors through the classifier; generating a pseudo label using the probability vector of the first enhanced image, and calculating the first classification loss in combination with the probability vector of the second enhanced image from the same unlabeled target domain image; based on the pseudo labels, screening out the positive and negative samples corresponding to each enhanced image in a category-aware manner, and calculating the probability space contrast loss in combination with the probability vectors of the positive and negative samples; linearly mapping the image features of the second enhanced image and the third enhanced image respectively, then calculating the sensitivity scores, and calculating the sensitivity-weighted feature contrast loss in combination with the sensitivity scores; calculating the second classification loss using the labeling information in the labeled source domain and target domain image sets; and combining the first classification loss, the second classification loss, the probability space contrast loss and the sensitivity-weighted feature contrast loss into a total loss function, and training the image classification network with the total loss function;
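The final combination step can be sketched as a weighted sum of the four losses; the loss weights w_pcl and w_fcl are illustrative assumptions, since this excerpt only states that the four losses are combined into a total loss function:

```python
def total_loss(l_cls1, l_cls2, l_pcl, l_wfcl, w_pcl=1.0, w_fcl=1.0):
    """Total training objective (sketch).

    l_cls1: first classification loss (pseudo-label consistency on unlabeled images).
    l_cls2: second classification loss (supervised loss on labeled images).
    l_pcl:  probability space contrast loss.
    l_wfcl: sensitivity-weighted feature contrast loss.
    w_pcl, w_fcl: illustrative balancing weights for the contrast terms.
    """
    return l_cls1 + l_cls2 + w_pcl * l_pcl + w_fcl * l_wfcl
```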
And the image classification unit is used for classifying the input images by using the trained image classification network after training.
9. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A readable storage medium storing a computer program, which when executed by a processor implements the method of any one of claims 1-7.
CN202311541384.3A 2023-11-20 2023-11-20 Semi-supervision domain adaptive image classification method, system, equipment and storage medium Active CN117253097B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311541384.3A CN117253097B (en) 2023-11-20 2023-11-20 Semi-supervision domain adaptive image classification method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117253097A true CN117253097A (en) 2023-12-19
CN117253097B CN117253097B (en) 2024-02-23

Family

ID=89137301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311541384.3A Active CN117253097B (en) 2023-11-20 2023-11-20 Semi-supervision domain adaptive image classification method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117253097B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210150281A1 (en) * 2019-11-14 2021-05-20 Nec Laboratories America, Inc. Domain adaptation for semantic segmentation via exploiting weak labels
CN113378904A (en) * 2021-06-01 2021-09-10 电子科技大学 Image classification method based on anti-domain adaptive network
US20220138495A1 (en) * 2020-11-05 2022-05-05 University Of Electronic Science And Technology Of China Model and method for multi-source domain adaptation by aligning partial features
CN114943879A (en) * 2022-07-22 2022-08-26 中国科学院空天信息创新研究院 SAR target recognition method based on domain-adaptive semi-supervised learning
CN114998602A (en) * 2022-08-08 2022-09-02 中国科学技术大学 Domain adaptive learning method and system based on low confidence sample contrast loss
US20230154167A1 (en) * 2021-11-15 2023-05-18 Nec Laboratories America, Inc. Source-free cross domain detection method with strong data augmentation and self-trained mean teacher modeling
CN116229080A (en) * 2023-05-08 2023-06-06 中国科学技术大学 Semi-supervised domain adaptive image semantic segmentation method, system, equipment and storage medium
CN116452854A (en) * 2023-03-20 2023-07-18 华南理工大学 Adaptive image classification method based on width learning and random sensitivity
CN117058534A (en) * 2023-07-10 2023-11-14 华东师范大学 Small sample remote sensing image target detection method based on meta-knowledge adaptive migration network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JICHANG LI et al.: "Inter-domain mixup for semi-supervised domain adaptation", Pattern Recognition, pages 1-13 *
LI Runze et al.: "A domain adaptation method for object detection with feature mutual exclusion", Computer Engineering and Applications, pages 1-12 *

Also Published As

Publication number Publication date
CN117253097B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
Zhou et al. Domain adaptive ensemble learning
Wang et al. M3: Multimodal memory modelling for video captioning
Yan et al. Image classification by cross-media active learning with privileged information
CN113128369B (en) Lightweight network facial expression recognition method fusing balance loss
CN114332568B (en) Training method, system, equipment and storage medium of domain adaptive image classification network
Marques et al. Ant genera identification using an ensemble of convolutional neural networks
CN114998602B (en) Domain adaptive learning method and system based on low confidence sample contrast loss
CN110826639B (en) Zero sample image classification method trained by full data
CN111522908A (en) Multi-label text classification method based on BiGRU and attention mechanism
CN114329034A (en) Image text matching discrimination method and system based on fine-grained semantic feature difference
CN117690178B (en) Face image recognition method and system based on computer vision
CN113239159A (en) Cross-modal retrieval method of videos and texts based on relational inference network
Gat et al. Latent space explanation by intervention
CN116452895A (en) Small sample image classification method, device and medium based on multi-mode symmetrical enhancement
CN115130591A (en) Cross supervision-based multi-mode data classification method and device
Alhady et al. Butterfly species recognition using artificial neural network
CN115422518A (en) Text verification code identification method based on data-free knowledge distillation
Huo et al. A Study of Artificial Intelligence‐Based Poster Layout Design in Visual Communication
Jin et al. Pseudo-labeling and meta reweighting learning for image aesthetic quality assessment
CN110795410A (en) Multi-field text classification method
Chen et al. A unified framework for generative data augmentation: A comprehensive survey
Mao et al. Bone age assessment method based on fine-grained image classification using multiple regions of interest
CN116541507A (en) Visual question-answering method and system based on dynamic semantic graph neural network
CN117253097B (en) Semi-supervision domain adaptive image classification method, system, equipment and storage medium
Yu et al. Bag of Tricks and a Strong Baseline for FGVC.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant