CN116543237B

CN116543237B - Image classification method, system, equipment and medium for non-supervision domain adaptation of passive domain

Info

Publication number: CN116543237B
Application number: CN202310762911.7A
Authority: CN
Inventors: 王子磊; 张燚鑫; 贺伟男
Original assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Current assignee: Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date: 2023-06-27
Filing date: 2023-06-27
Publication date: 2023-11-28
Anticipated expiration: 2043-06-27
Also published as: CN116543237A

Abstract

The invention discloses an image classification method, system, device and medium for passive domain unsupervised domain adaptation, which are mutually corresponding schemes, wherein: the cross-domain-consistent semantic similarity is explicitly modeled during training of the source domain model; semantic similarity is cross-domain robust knowledge, and its explicit modeling and transfer can effectively enhance generalization performance on the target domain; the semantic similarity matrix is then used during target domain training to calculate the contrast losses, which optimizes the classifier in the model while accurately expressing the fine positional relationships of samples in the feature space.

Description

Image classification method, system, equipment and medium for non-supervision domain adaptation of passive domain

Technical Field

The present invention relates to the technical field of image classification, and in particular to an image classification method, system, device and medium for passive domain unsupervised domain adaptation.

Background

In recent years, deep neural networks have proven effective for a wide range of machine learning problems; however, their superior performance depends largely on large, high-quality annotated datasets. The high time and labor costs make manual labeling of datasets impractical, and traditional deep learning methods do not generalize well to new datasets because of domain shift. Domain adaptation addresses this by using knowledge learned on a source domain with many annotated samples to facilitate model learning on a target domain that is related to the source domain but lacks annotations, saving annotation cost by reducing the domain shift. Conventional domain adaptation methods assume that source domain data and target domain data are available at the same time. In practice, however, because of data privacy and similar concerns, the source domain data can often only be used to train the source domain model and cannot be shared for target domain training, so training on the target domain uses only the source domain model and not the source domain data. This problem is called passive domain adaptation (Source-free Domain Adaptation).

Domain adaptation methods that have access to source data generally reduce the cross-domain distribution discrepancy by means of inter-domain discrepancy metrics or domain adversarial learning. They require source domain data and target domain data to be present simultaneously and are therefore not applicable in the passive domain adaptation setting. Existing schemes for that setting include the following.

For vehicle re-identification, Chinese patent application CN114332787A trains a generator from a source domain model and target domain data using a relationship-preserving consistency loss and a knowledge distillation loss, with the aim of generating pseudo target samples in the source domain style, and then fine-tunes the model on these pseudo target samples to improve its performance. Instead of using source domain data, the method uses the source domain knowledge learned by the source domain model as guidance, thereby promoting the transfer of the target domain data style toward the source domain data style.

The Chinese patent with grant publication number CN115186773B, "a passive active field self-adaptive model training method and device", introduces active learning to find the target samples carrying the most information for labeling, so that the labeled samples are most beneficial to classification in the target domain.

The Chinese patent application with publication number CN114639021A, "a training method and a target dividing method of a target detection model", focuses on passive domain adaptation for object detection and proposes training with a teacher-student framework in which the student model is updated using knowledge distillation and a weight regularization strategy.

The Chinese patent application with publication number CN114528913A, "trust and consistency-based model migration method, device, equipment and medium", uses a dual classification network for model adaptation during target domain training and performs training optimization with a mechanism based on trust and consistency.

The Chinese patent with grant publication number CN115546567B, "an unsupervised field adaptation classification method, system, equipment and storage medium", proposes a neighbor alignment loss, a regularization loss, a dispersion loss, a cross-view alignment loss and a cross-view neighbor alignment loss, and iteratively improves the image classification capability of the target domain model.

The Chinese patent application with publication number CN115019106A, a robust unsupervised domain adaptive image classification method and device based on adversarial distillation, combines knowledge distillation and adversarial training during target domain training, and effectively improves classification performance and model robustness on target domain adversarial samples while maintaining classification performance on target domain natural samples.

For the passive domain adaptation task, the schemes listed above mainly rely on pseudo-labels, clustering or neighbor information for self-supervised learning of the model. They are negatively affected by label noise and source domain model bias, and they mine cross-domain-consistent knowledge insufficiently.

Disclosure of Invention

The invention aims to provide an image classification method, system, equipment and medium for non-supervision domain adaptation of a passive domain, which can realize explicit modeling and migration of semantic similarity of cross-domain robustness, effectively enhance generalization performance on a target domain, realize learning of distinguishing characteristics of the target domain and further improve image classification performance of the target domain.

The object of the invention is achieved by the following technical solution:

an image classification method for non-supervision domain adaptation of a passive domain, comprising:

explicit modeling is carried out on the semantic similarity consistent with the cross domain in the training process of the source domain model, and a semantic similarity matrix representing the category relation is obtained;

constructing a teacher model and a student model based on the trained source domain model; subjecting each target domain image to enhancement processing in different modes and inputting the results into the teacher model and the student model respectively to obtain corresponding prediction probabilities; taking the prediction probabilities obtained by the student model as query samples, and constructing positive and negative samples in a plurality of different modes from the prediction probabilities obtained by the student model or the teacher model; calculating the corresponding contrast losses in combination with the semantic similarity matrix; updating the parameters of the student model by integrating all the contrast losses; and updating the parameters of the teacher model based on the updated student model parameters;

and inputting the image to be classified into the trained student model to obtain an image classification result.

A passive domain unsupervised domain adapted image classification system comprising:

the semantic similarity matrix acquisition unit is used for carrying out explicit modeling on the semantic similarity consistent with the cross domain in the source domain model training process to obtain a semantic similarity matrix representing the category relation;

the model construction and training unit is used for constructing a teacher model and a student model based on the trained source domain model, each target domain image is respectively input into the teacher model and the student model after being subjected to enhancement processing in different modes to obtain corresponding prediction probabilities, the prediction probabilities obtained by the student model are used as query samples, positive and negative samples are constructed by combining the prediction probabilities obtained by the student model or the teacher model in a plurality of different modes, the semantic similarity matrix is combined to calculate corresponding contrast loss, the parameters of the student model are updated by integrating all the contrast loss, and the parameters of the teacher model are updated based on the updated student model parameters;

and the image classification unit is used for inputting the images to be classified into the trained student model to obtain an image classification result.

A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.

A readable storage medium storing a computer program which, when executed by a processor, implements the method described above.

According to the technical scheme provided by the invention, explicit modeling is performed on the semantic similarity consistent with the cross-domain in the training process of the source domain model, the semantic similarity is knowledge of cross-domain robustness, explicit modeling and migration on the semantic similarity can effectively enhance generalization performance on a target domain, and robustness on pseudo tag noise can be realized by using a source domain semantic similarity matrix; then, the semantic similarity matrix is combined in the training process of the target domain for calculating the contrast loss, so that optimization of the classifier in the model can be realized, and meanwhile, the fine position relation of the sample in the feature space is accurately expressed.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of an image classification method for non-supervision domain adaptation of a passive domain according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of source domain model training and cross-domain robust knowledge extraction provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of a target domain training process based on cross-domain robust knowledge according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of an image classification system for non-supervision domain adaptation of a passive domain according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

The terms that may be used herein will first be described as follows:

the terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.

The image classification method, system, equipment and medium for non-supervision domain adaptation of a passive domain provided by the invention are described in detail below. Matters not described in detail in the embodiments of the present invention belong to the prior art known to those skilled in the art. Where specific conditions are not noted in the embodiments, they follow the conditions conventional in the art or suggested by the manufacturer. Reagents or apparatus used in the embodiments for which no manufacturer is indicated are conventional, commercially available products.

Example 1

The embodiment of the invention provides an image classification method for non-supervision domain adaptation of a passive domain which, as shown in Fig. 1, comprises the following steps:

Step 1, explicitly modeling the cross-domain-consistent semantic similarity during training of the source domain model to obtain a semantic similarity matrix representing the category relations.

In the embodiment of the invention, the source domain model is trained by utilizing the source domain image data, the semantic similarity consistent with the cross domain is explicitly modeled in the training process, the joint optimization of the semantic similarity matrix and the classifier parameters in the source domain model is realized by regularization constraint based on trace norms, and the loss in the training process comprises classification loss and regularization constraint based on trace norms; after training, a semantic similarity matrix representing the relation among the categories is obtained, and each element in the semantic similarity matrix represents the similarity among different categories.

Step 2, constructing a teacher model and a student model based on the trained source domain model, and optimizing the student model and the teacher model using the target domain images in combination with the semantic similarity matrix.

In the embodiment of the invention, after being subjected to enhancement processing in different modes, each target domain image is respectively input into a teacher model and a student model to obtain corresponding prediction probabilities, the prediction probabilities obtained by the student model are used as query samples, positive and negative samples are constructed by combining the prediction probabilities obtained by the student model or the teacher model in a plurality of different modes, corresponding contrast loss is calculated by combining the semantic similarity matrix, parameters of the student model are updated by integrating all the contrast loss, and the parameters of the teacher model are updated based on the updated parameters of the student model.

In the embodiment of the invention, a sample similarity calculation method of fusion semantic similarity is adopted, and the sample similarity between a query sample and a positive sample and the sample similarity between the query sample and a negative sample are calculated respectively by combining the semantic similarity matrix; then, a contrast loss is calculated based on the sample similarity between the query sample and the positive sample, and the sample similarity between the query sample and the negative sample.

In the embodiment of the invention, three types of contrast losses are mainly calculated: noise robust contrast loss based on pseudo-labels, contrast loss based on single samples, and contrast loss based on neighbor samples.

(1) High-confidence target domain images are screened out according to the prediction probability output by the teacher model. The prediction probability obtained by the student model for the enhanced version of a screened-out target domain image is taken as the query sample, and the prediction obtained by the teacher model for the enhanced version of the same target domain image provides the pseudo-label. The one-hot probability vector corresponding to the pseudo-label is taken as the positive sample, and the one-hot probability vectors corresponding to all categories other than the pseudo-label category are taken as negative samples. The pseudo-label-based noise-robust contrast loss is then calculated from the query sample, the positive sample and the negative samples in combination with the semantic similarity matrix.

(2) The same target domain image is subjected to enhancement processing in different modes, and the corresponding prediction probabilities are obtained through the student model and the teacher model respectively. The prediction probability obtained by the student model is taken as the query sample, the prediction probability obtained by the teacher model is taken as the positive sample, and the prediction probabilities obtained by the teacher model for the enhanced versions of other target domain images are taken as negative samples. The single-sample-based contrast loss is calculated from the query sample, the positive sample and the negative samples in combination with the semantic similarity matrix.

(3) Each target domain image is subjected to enhancement processing and input into the student model, and the corresponding prediction probability is obtained through feature extraction and classification prediction. For the current target domain image, its k nearest neighbor samples are found in the feature space according to the similarity between features. The prediction probability obtained by the student model for the current target domain image is taken as the query sample, the prediction probabilities obtained by the student model for the enhanced k nearest neighbor samples are taken as positive samples, and the prediction probabilities obtained by the student model for the enhanced versions of the target domain images other than the current image and its k nearest neighbors are taken as negative samples. The neighbor-sample-based contrast loss is calculated from the query sample, the positive samples and the negative samples in combination with the semantic similarity matrix, where k is a hyperparameter.

Step 3, inputting the image to be classified into the trained student model to obtain an image classification result.

The scheme provided by the embodiment of the invention mainly has the following advantages:

(1) Semantic similarity is a cross-domain robust knowledge, and explicit modeling and migration of semantic similarity can effectively enhance generalization performance on a target domain.

(2) Compared with the feature similarity, the sample similarity calculation method based on the probability space fusion semantic similarity can realize the optimization of the classifier, and simultaneously accurately express the fine position relation of the sample in the feature space.

(3) The sample similarity fused with semantic similarity is used in three contrast losses, realizing the learning of target domain discriminative features.

In general, the method extracts the semantic similarity matrix during source domain training, uses it for sample similarity measurement during target domain learning, and applies that measurement in three contrast losses, so that the target domain distribution is mined effectively, the discrimination capability of the model in the target domain is enhanced, and the accuracy of target domain image classification is improved.

In order to clearly show the technical scheme and the technical effects, the invention is introduced from the three parts of the whole principle, source domain learning and target domain learning.

1. Overview of the overall principle.

In order to solve the problem that the current passive domain adaptation method is insufficient in cross-domain consistent knowledge mining, the invention discloses a passive domain adaptation learning method integrating semantic similarity. The current method mainly relies on pseudo tags, clusters or neighbor information to perform self-supervision learning on the model, but is negatively affected by tag noise and source domain model deviation, and insufficient knowledge mining for cross-domain consistency is achieved. The invention provides an image classification method for non-supervision domain adaptation of a passive domain, which can realize explicit modeling and migration of semantic similarity of cross-domain robustness, can effectively enhance generalization performance on a target domain and realize learning of discrimination characteristics of the target domain.

The core innovation can be summarized in three points: 1. The semantic similarity matrix is explicitly constructed and trained during source domain model training. 2. A cross-domain robust probability-space sample similarity calculation method uses the semantic similarity matrix as weighting coefficients. 3. Based on the cross-domain robust sample similarity, three contrast losses are constructed to realize the learning of target domain discriminative features.

2. Source domain learning.

In the embodiment of the invention, source domain learning mainly refers to training the source domain model with source domain data. Adding learnable parameters during source domain model training is an effective way of modeling semantic similarity. To this end, a semantic similarity matrix A of the source domain is constructed, and a regularization constraint based on the trace norm is used to jointly optimize the semantic similarity matrix A and the classifier parameters. The semantic similarity matrix A is a positive semi-definite matrix in which each element represents the degree of similarity between two categories; through it, more robust and accurate category relations can be learned. The semantic similarity matrix A is passed to the target domain together with the source domain model and remains fixed during target domain training.

As shown in Fig. 2, the source domain model generally comprises a feature extraction network and a classifier. The loss function used in training the source domain model comprises a classification loss (the supervised cross-entropy loss) and a regularization constraint based on the trace norm, expressed as:

；

wherein,for trace norm symbols, T is a transposed symbol.

3. Target domain learning.

In the embodiment of the invention, a target domain model is built based on a trained source domain model, the target domain model is trained by combining a target domain image and a semantic similarity matrix, a cross-domain robust probability space sample similarity calculation method is designed in the training process, and three contrast losses are built based on cross-domain robust sample similarity (based on cross-domain robust knowledge).

1. Constructing the target domain model.

In the embodiment of the invention, a teacher model and a student model are constructed based on the trained source domain model, and the teacher model and the student model are target domain models, specifically: and constructing a teacher model and a student model which have the same structure as the source domain model, and initializing parameters of the teacher model and the student model by using the trained parameters of the source domain model.
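The initialization described above, together with the later teacher update, can be sketched as follows. This is a minimal illustration under stated assumptions: the text only says that the teacher parameters are updated based on the updated student parameters, so the exponential-moving-average rule, the `momentum` value, the function names and the flattened parameter dictionary are all hypothetical choices, not details taken from the patent.

```python
import copy
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def init_teacher_student(source_params: Params) -> Tuple[Params, Params]:
    """Create teacher and student as independent copies of the trained source model."""
    return copy.deepcopy(source_params), copy.deepcopy(source_params)


def update_teacher(teacher: Params, student: Params, momentum: float = 0.999) -> None:
    """In-place update. ASSUMED rule (EMA): teacher <- m * teacher + (1 - m) * student."""
    for name, student_values in student.items():
        teacher[name] = [
            momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher[name], student_values)
        ]


# Hypothetical source-model parameters, flattened to lists of floats.
source_params = {"classifier.weight": [1.0, -2.0], "classifier.bias": [0.5]}
teacher, student = init_teacher_student(source_params)

# Pretend one optimizer step changed the student, then refresh the teacher.
student["classifier.weight"] = [2.0, -2.0]
update_teacher(teacher, student, momentum=0.9)
```

Copying the parameters keeps the teacher, the student and the source model independent, so that updating one does not silently modify the others.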

2. A cross-domain robust sample similarity measurement method.

For any two target domain images $x_m$ and $x_n$, the obtained prediction probabilities are $p_m$ and $p_n$, respectively. A cross-domain robust pairwise sample similarity measure is provided. The calculation uses the outer product of the two probability vectors, each term of which represents the similarity of the two samples in the corresponding pair of categories, and uses the corresponding element of the source domain semantic similarity matrix A as its weight. The specific calculation is:

$$s_{\mathrm{robust}}(p_m, p_n) = \sum_{i=1}^{K} \sum_{j=1}^{K} A_{ij}\,(p_m \otimes p_n)_{ij}$$

where $\otimes$ denotes the outer product operation, K denotes the number of categories, and i and j denote two categories; $A_{ij}$ denotes the element in row i and column j of the semantic similarity matrix A, i.e. the similarity between category i and category j; $(p_m \otimes p_n)_{ij}$ denotes the element in row i and column j of the rank-one matrix obtained by the outer product of $p_m$ and $p_n$, i.e. the similarity between category i in $p_m$ and category j in $p_n$; and $s_{\mathrm{robust}}(p_m, p_n)$ denotes the similarity of $p_m$ and $p_n$.

In the embodiment of the invention, the similarity calculation takes the form of a probability outer product and introduces no hyperparameters that need tuning, so it is simpler and more efficient than other calculation approaches.

In addition, compared with the traditional cosine similarity based on the characteristics, the sample similarity calculation method provided by the invention depends on the probability output by the classifier, so that the parameters of the classifier can be optimized simultaneously, and the consistency between the characteristics and the weights of the classifier can be maintained; meanwhile, the semantic similarity matrix A of the source domain is used as the weight, and the semantic similarity matrix A expresses the position relation of the category weight in the feature space, so that the relation of the sample in the feature space can be expressed more accurately.
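The similarity measure of this subsection, a sum over all category pairs of the outer-product entry weighted by the matching entry of A, can be sketched in a few lines. The sketch uses plain Python lists to stay dependency-free; the function name `robust_similarity` and the 3-class demo matrix are hypothetical and chosen only for illustration.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def robust_similarity(p: Sequence[float], q: Sequence[float], A: Matrix) -> float:
    """Sum A[i][j] * p[i] * q[j] over all category pairs (i, j)."""
    K = len(A)
    if len(p) != K or len(q) != K or any(len(row) != K for row in A):
        raise ValueError("p and q must have length K and A must be K x K")
    return sum(A[i][j] * p[i] * q[j] for i in range(K) for j in range(K))


# Hypothetical 3-class matrix: categories 0 and 1 are semantically close.
A_demo = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
p_demo = [0.8, 0.1, 0.1]
q_near = [0.1, 0.8, 0.1]  # mass on the category related to p's top category
q_far = [0.1, 0.1, 0.8]   # mass on an unrelated category

sim_near = robust_similarity(p_demo, q_near, A_demo)
sim_far = robust_similarity(p_demo, q_far, A_demo)
```

With this demo matrix, a sample that concentrates on a semantically related category scores higher than one that concentrates on an unrelated category, even though neither shares the query's top category. When A is the identity matrix the measure reduces to the ordinary dot product of the two probability vectors.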

3. Contrast loss based on cross-domain robust knowledge.

To make use of the cross-domain robust sample similarity provided above, a unified form of contrast loss is established. The purpose of the contrast loss is to construct positive and negative samples for a given query sample such that the similarity between the query sample and the positive sample is large and the similarity between the query sample and the negative samples is small, which makes the model discriminative. The similarity between the query sample and the positive and negative samples is calculated with the sample similarity measure that incorporates the semantic similarity matrix.

The unified contrast loss is calculated as follows. A target domain image $x$ is first subjected to strong data enhancement and then input into the student model, and the obtained prediction probability $p$ is taken as the query sample. At the same time, a positive sample $k^{+}$ and a memory bank B holding negative samples $k^{-}$ are constructed. The purpose of the contrast loss is to bring $p$ close to $k^{+}$ while pushing the query sample $p$ away from the sample probabilities stored in the memory bank B. When the contrast loss is calculated, the sample similarity between the query sample and the positive sample and the sample similarity between the query sample and each negative sample are first calculated in combination with the semantic similarity matrix, expressed as:

$$s_{\mathrm{robust}}(p, q) = \sum_{i=1}^{K} \sum_{j=1}^{K} A_{ij}\,(p \otimes q)_{ij}$$

where $\otimes$ denotes the outer product operation, K denotes the number of categories, and i and j denote two categories; $A_{ij}$ denotes the element in row i and column j of the semantic similarity matrix A, i.e. the similarity between category i and category j; $s_{\mathrm{robust}}(p, q)$ denotes the similarity of query sample p to sample q; and $(p \otimes q)_{ij}$ denotes the similarity between category i in query sample p and category j in sample q. Sample q refers to the positive sample $k^{+}$ or a negative sample $k^{-}$, i.e. the sample similarity between query sample p and positive sample $k^{+}$ is $s_{\mathrm{robust}}(p, k^{+})$, and the sample similarity between query sample p and negative sample $k^{-}$ is $s_{\mathrm{robust}}(p, k^{-})$.

Then, a contrast loss is calculated based on the sample similarity between the query sample and the positive sample, and the sample similarity between the query sample and the negative samples, expressed as:

$$\phi(p, q) = \exp\!\left(\frac{s_{\mathrm{robust}}(p, q)}{\tau}\right), \qquad L = -\log \frac{\phi(p, k^{+})}{\phi(p, k^{+}) + \sum_{k^{-} \in B} \phi(p, k^{-})}$$

where $\phi(p, q)$ denotes the intermediate parameter calculated from the similarity $s_{\mathrm{robust}}(p, q)$ of query sample p and sample q; $\phi(p, k^{+})$ denotes the intermediate parameter calculated when sample q is the positive sample $k^{+}$, and $\phi(p, k^{-})$ denotes the intermediate parameter calculated when sample q is a negative sample $k^{-}$; L denotes the contrast loss calculated from the query sample and the corresponding positive and negative samples; B denotes the memory bank storing the negative samples, whose size and content are determined by the specific form of the contrast loss; exp denotes the exponential function with the natural constant e as its base; and $\tau$ is the temperature coefficient, generally taken as 0.07.
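The unified contrast loss can be sketched as below. This is a hedged illustration, not the patent's exact formula: it assumes the standard InfoNCE-style form, in which the exponentiated positive similarity is divided by the sum of the exponentiated positive and negative similarities, and it assumes the default temperature 0.07 mentioned in the text. The function names are hypothetical, and the log-sum-exp trick is added only for numerical stability.

```python
import math
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def robust_similarity(p: Vector, q: Vector, A: Matrix) -> float:
    """Semantic-similarity-weighted similarity: sum of A[i][j] * p[i] * q[j]."""
    return sum(A[i][j] * p[i] * q[j] for i in range(len(p)) for j in range(len(q)))


def contrast_loss(query: Vector, positive: Vector, negatives: Sequence[Vector],
                  A: Matrix, tau: float = 0.07) -> float:
    """ASSUMED InfoNCE-style loss: -log(phi_pos / (phi_pos + sum of phi_neg))."""
    pos_logit = robust_similarity(query, positive, A) / tau
    logits = [pos_logit] + [robust_similarity(query, k, A) / tau for k in negatives]
    # Stable log-sum-exp: subtract the maximum logit before exponentiating.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - pos_logit


# Hypothetical 3-class setting with an identity similarity matrix.
A_demo = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.05, 0.05]

loss_matching = contrast_loss(query, [1.0, 0.0, 0.0],
                              [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], A_demo)
loss_mismatched = contrast_loss(query, [0.0, 1.0, 0.0],
                                [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], A_demo)
```

The loss is small when the query agrees with its positive sample and large when the query agrees with a negative sample instead, which is the behavior the contrast loss is meant to enforce.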

As shown in Fig. 3, based on the unified form of contrast loss described above, three specific contrast losses are constructed, which differ in how the positive and negative samples are selected.

(1) Noise robust contrast loss based on pseudo tags.

The query sample p is selected by high-confidence screening. The weakly enhanced image corresponding to a target domain image is input into the teacher model, and the maximum prediction probability is compared with a preset probability threshold (generally a value in [0.90, 0.98]). If the maximum prediction probability is greater than the threshold, the target domain image is a high-confidence sample: the prediction probability obtained by inputting the strongly enhanced image corresponding to that target domain image into the student model is taken as the query sample, and the category corresponding to the maximum probability is the pseudo-label of the image. The positive sample is the one-hot probability vector corresponding to the pseudo-label, and the memory bank B holding the negative samples consists of the one-hot probability vectors of all categories other than the pseudo-label category. After the query sample and the positive and negative samples are obtained, the pseudo-label-based noise-robust contrast loss $L_1$ can be calculated using the unified contrast loss described above. Because of the domain difference between the source domain and the target domain, the original pseudo-labels are always noisy; robustness to pseudo-label noise is achieved here by using the source domain semantic similarity matrix.

Those skilled in the art will appreciate that there may be a variety of ways for image enhancement, for example, weak enhancement and strong enhancement, while strong enhancement encompasses a variety of ways. For each target domain image, a random weak enhancement mode is adopted for processing, so that a corresponding weak enhancement image can be obtained, and two strong enhancement images can be obtained by adopting random different strong enhancement modes for processing; the weak enhanced image and one strong enhanced image are input to the teacher model, and the other strong enhanced image is input to the student model.

As will be understood by those skilled in the art, a probability vector in one-hot form is a vector in which exactly one element is 1 and the rest are 0. Assuming that the class corresponding to the maximum probability is the 1st class and the number of classes is 5, the positive sample can be expressed as (1,0,0,0,0), and the negative samples are: (0,1,0,0,0), (0,0,1,0,0), (0,0,0,1,0), (0,0,0,0,1).
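By way of illustration only, the following minimal sketch shows how such one-hot positive and negative samples could enter a contrast loss. It assumes that the sample similarity is the bilinear form sum over i, j of A[i][j]·p[i]·q[j] on the semantic similarity matrix A, and that the loss takes the common InfoNCE form with a temperature; the function names, the toy matrix A and the temperature value 0.1 are hypothetical and are not taken from the embodiment.

```python
import math

def one_hot(index, num_classes):
    """Probability vector with a single 1 at position `index`."""
    return [1.0 if c == index else 0.0 for c in range(num_classes)]

def pseudo_label_samples(pseudo_label, num_classes):
    """Positive: one-hot of the pseudo label. Negatives: one-hots of all other classes."""
    positive = one_hot(pseudo_label, num_classes)
    negatives = [one_hot(c, num_classes)
                 for c in range(num_classes) if c != pseudo_label]
    return positive, negatives

def similarity(p, q, A):
    """Assumed similarity: sum_ij A[i][j] * p[i] * q[j]."""
    return sum(A[i][j] * p[i] * q[j]
               for i in range(len(p)) for j in range(len(q)))

def contrast_loss(query, positive, negatives, A, temperature=0.1):
    """Assumed InfoNCE-style loss over one positive and a bank of negatives."""
    pos = math.exp(similarity(query, positive, A) / temperature)
    neg = sum(math.exp(similarity(query, n, A) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

num_classes = 5
# Toy semantic similarity matrix: identity plus a mild similarity
# between class 0 and class 1 (illustrative values only).
A = [[1.0 if i == j else 0.0 for j in range(num_classes)]
     for i in range(num_classes)]
A[0][1] = A[1][0] = 0.5

positive, negatives = pseudo_label_samples(0, num_classes)
query = [0.7, 0.2, 0.05, 0.03, 0.02]  # student prediction for the strong view
loss = contrast_loss(query, positive, negatives, A)
```

In this sketch, probability mass that the query places on a class similar to the pseudo-label class (class 1 here) still contributes to the positive similarity through A, which is one way a semantic similarity matrix can soften the effect of a noisy pseudo label.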

(2) Contrast loss based on a single sample.

The query sample p can be the prediction probability generated by the student model from a strong enhanced image of any target domain image, and the positive sample is the prediction probability generated by the teacher model from the other strong enhanced image of the same target domain image. The memory bank B storing the negative samples can consist of the prediction probabilities generated by the teacher model from the strong enhanced images of randomly sampled target domain images other than that of the query sample; in actual training, the other target domain images in the same training batch can be selected. After the query sample and the positive and negative samples are obtained, the contrast loss L₂ based on a single sample can be calculated in the unified form of contrast loss calculation described above.

(3) Contrast loss based on neighbor samples.

The query sample p is the prediction probability generated by the student model from the strong enhanced image of any target domain image. The positive samples are obtained from the feature cosine similarity calculated in the feature space: specifically, the different target domain images are input into the student model after enhancement processing to obtain the corresponding features, and for the current target domain image its k neighbor samples are found from the similarity between features (a feature-based k-nearest-neighbor search), where k is a hyperparameter that is usually chosen according to the data scale. The prediction probability obtained by the student model from the strong enhanced image of the current target domain image is taken as the query sample p, and the prediction probabilities obtained by the student model from the strong enhanced images of the k neighbor samples (namely, the k target domain images with the highest feature similarity) are taken as the positive samples. The memory bank B storing the negative samples can consist of the prediction probabilities generated by the student model from the strong enhanced images of randomly sampled target domain images other than the query sample and the positive samples; in actual training, the other target domain images in the same training batch can be selected. After the query sample and the positive and negative samples are obtained, the contrast loss L₃ based on neighbor samples can be calculated in the unified form of contrast loss calculation described above.
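The neighbor search described above can be sketched as follows. This is only an illustrative brute-force version over a small list of feature vectors; the function names and the toy features are hypothetical, and a practical implementation would typically use a batched or indexed search.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def k_nearest_neighbors(features, query_index, k):
    """Indices of the k samples most similar to features[query_index], excluding itself."""
    scored = [(cosine_similarity(features[query_index], f), idx)
              for idx, f in enumerate(features) if idx != query_index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy student-model features for four target domain images.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
neighbors = k_nearest_neighbors(features, query_index=0, k=2)
```

The prediction probabilities stored for the returned indices would then serve as the positive samples of the query.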

The three contrast losses are combined to construct the overall loss function of the target domain in the following form:

L_target = L₁ + L₂ + L₃;

where L_target is the overall loss function of the target domain.

In the embodiment of the invention, the three contrast losses are combined for training, so that multi-level discriminative feature learning is realized across global semantic categories, the local neighbor structure and multiple views of a single sample; this improves the learning effect and, in turn, the target domain image classification performance.

The gradient obtained from the overall loss function of the target domain is used to update the parameters of the student model, and the parameters of the teacher model are then updated by exponential moving average (EMA). The updated parameters of the student model comprise the parameters of the feature extraction network and the parameters of the classifier; the parameters of the classifier are shared by the student model and the teacher model, and the parameters of the feature extraction network in the teacher model are updated in EMA fashion from the parameters of the feature extraction network in the student model.
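A minimal sketch of an EMA parameter update is given below for illustration. Parameters are represented as plain lists keyed by name; the function name `ema_update` and the momentum value are assumptions and are not specified in the embodiment.

```python
def ema_update(teacher_params, student_params, momentum=0.99):
    """Return teacher parameters after one EMA step:
    teacher <- momentum * teacher + (1 - momentum) * student (element-wise)."""
    return {
        name: [momentum * t + (1.0 - momentum) * s
               for t, s in zip(teacher_params[name], student_params[name])]
        for name in teacher_params
    }

# Toy feature-extractor parameters (illustrative values only).
teacher = {"backbone.weight": [1.0, 2.0]}
student = {"backbone.weight": [0.0, 4.0]}
teacher = ema_update(teacher, student, momentum=0.9)
```

A momentum close to 1 makes the teacher change slowly, so its predictions are smoother over training iterations than those of the student.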

The above is the main content of the passive domain unsupervised domain adaptive image classification method provided in the embodiments of the present invention, and for convenience of understanding, the following provides an overall example.

Step S1, a labeled source-domain training dataset and an image classification model pre-trained on the ImageNet dataset are prepared; the model may be based on a CNN (convolutional neural network) (e.g. ResNet-50) or on a Transformer (e.g. ViT-Base). The source-domain training images undergo ordinary random cropping and random horizontal flipping; after this processing, each image is scaled to a specified size (e.g. 224×224) and then numerically normalized. The processed source domain images are input into the image classification model, supervised training is performed using the cross-entropy loss and the trace-norm-based regularization constraint, and the source domain model and the semantic similarity matrix are obtained when training is finished.

Step S2, the models used in the target domain training process are constructed based on the source domain model; specifically, a teacher model and a student model with the same structure as the source domain model are constructed, and both are initialized with the parameters of the source domain model.

Step S3, unlabeled image data of the target domain are prepared; each target domain image undergoes one random weak enhancement to obtain a weak enhanced image and two random strong enhancements to obtain two different strong enhanced images. The weak enhancement may use random cropping, random horizontal flipping and the like, and the strong enhancement refers to the RandAugment procedure adopted in the semi-supervised learning method FixMatch. The target domain images are randomly shuffled and every 64 target domain images form a batch; the 3 enhanced images corresponding to each target domain image are treated as one sample, so each batch actually contains 64×3 = 192 images.
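The batch organization of step S3 can be illustrated with the following sketch, which only handles image indices; the function name `make_batches`, the fixed seed and the dataset size of 128 images are hypothetical.

```python
import random

def make_batches(num_images, batch_size=64, seed=0):
    """Shuffle image indices and group them into batches of `batch_size`."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)  # random shuffling of the target domain images
    return [indices[start:start + batch_size]
            for start in range(0, num_images, batch_size)]

VIEWS_PER_IMAGE = 3  # one weak view and two strong views per target domain image

batches = make_batches(num_images=128)
images_per_batch = len(batches[0]) * VIEWS_PER_IMAGE
```

With a batch size of 64 and three views per image, `images_per_batch` is 192, matching the count given in step S3.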

Step S4, based on the target domain batch data, the pseudo-label-based noise-robust contrast loss fused with semantic similarity is calculated. Specifically, for each target domain image in the batch data, its weak enhanced image is input into the teacher model to obtain a prediction probability, and the pseudo label is obtained from the class with the maximum probability; if the maximum value of the prediction probability is greater than the preset probability threshold, the sample is a high-confidence sample. The pseudo-label-based contrast loss is calculated only for high-confidence samples: the two strong enhanced images of a high-confidence sample are input into the student model and the teacher model respectively to obtain prediction probabilities, the prediction probability of the student model is used as the query sample, the one-hot probability corresponding to the pseudo label is used as the positive sample, and the one-hot probabilities of the classes other than the pseudo label are used as the negative samples, from which the pseudo-label-based noise-robust contrast loss is calculated.
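The high-confidence screening in step S4 can be sketched as below. The function name `screen_high_confidence` and the toy probabilities are hypothetical; the threshold 0.95 is simply one value within the range [0.90, 0.98] mentioned earlier.

```python
def screen_high_confidence(teacher_probs, threshold=0.95):
    """Return (sample_index, pseudo_label) pairs for samples whose maximum
    teacher probability on the weak view exceeds `threshold`."""
    selected = []
    for idx, probs in enumerate(teacher_probs):
        confidence = max(probs)
        if confidence > threshold:
            selected.append((idx, probs.index(confidence)))
    return selected

# Teacher predictions on the weak enhanced images of three target domain images.
teacher_probs = [
    [0.97, 0.01, 0.02],  # confident, pseudo label 0
    [0.50, 0.30, 0.20],  # not confident, skipped
    [0.02, 0.96, 0.02],  # confident, pseudo label 1
]
selected = screen_high_confidence(teacher_probs, threshold=0.95)
```

Only the selected samples would contribute to the pseudo-label-based loss; the remaining samples still take part in the other two contrast losses.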

Step S5, based on the target domain batch data, the single-sample-based contrast loss fused with semantic similarity is calculated. Specifically, for each sample in the batch data, one of its strong enhanced images is input into the student model and the resulting prediction probability is used as the query sample, the other strong enhanced image is input into the teacher model and the resulting prediction probability is used as the positive sample, and the prediction probabilities output by the teacher model for the strong enhanced images of the other samples in the same batch are used as the negative samples, from which the contrast loss based on a single sample is calculated.
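The assignment of query, positive and negative samples within a batch in step S5 can be sketched as follows. The function name `single_sample_triplets` is hypothetical, and the model outputs are stand-in probability lists rather than real predictions.

```python
def single_sample_triplets(student_probs, teacher_probs):
    """For sample i: query = student output on one strong view,
    positive = teacher output on the other strong view of the same image,
    negatives = teacher outputs for all other images in the batch."""
    triplets = []
    for i, query in enumerate(student_probs):
        positive = teacher_probs[i]
        negatives = [t for j, t in enumerate(teacher_probs) if j != i]
        triplets.append((query, positive, negatives))
    return triplets

# Stand-in predictions for a batch of three target domain images.
student_probs = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
teacher_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
triplets = single_sample_triplets(student_probs, teacher_probs)
```

Each triplet would then be passed to the unified contrast loss calculation together with the semantic similarity matrix.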

Step S6, based on the target domain batch data, the neighbor-information-based contrast loss fused with semantic similarity is calculated. Specifically, a sample feature-probability library is constructed, which stores the features generated by the student model from the strong enhanced images of all samples together with the corresponding prediction probabilities, and which is updated each time batch data arrive. Each sample in the batch data is first input into the student model to obtain its feature and prediction probability, and the prediction probability is used as the query sample; the feature is then used to search the sample feature library for the k nearest samples, the prediction probabilities corresponding to these k neighbor samples are used as the positive samples, and the probabilities output by the student model for the strong enhanced images of the other samples in the same batch are used as the negative samples, from which the contrast loss based on neighbor information is calculated.

Step S7, the three losses calculated in steps S4 to S6 are summed, the loss function is minimized through the back-propagation algorithm and a gradient descent strategy to update the parameters of the student model, and the parameters of the teacher model are updated from the parameters of the student model by exponential moving average.

Step S8, the test dataset of the target domain is input: each image is scaled to the specified size and numerically normalized, the processed image is input into the student model to obtain a prediction probability, the category corresponding to the maximum probability value is taken as the image classification result, and the accuracy of the image classification results is calculated.
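The final prediction and accuracy computation of step S8 can be illustrated as follows; the function names and the toy probabilities and labels are hypothetical.

```python
def predict(probs):
    """Index of the class with the maximum predicted probability."""
    return max(range(len(probs)), key=lambda c: probs[c])

def accuracy(all_probs, labels):
    """Fraction of test images whose predicted class equals the ground-truth label."""
    correct = sum(1 for probs, label in zip(all_probs, labels)
                  if predict(probs) == label)
    return correct / len(labels)

# Stand-in student predictions for three test images and their labels.
test_probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
test_labels = [1, 0, 0]
test_accuracy = accuracy(test_probs, test_labels)
```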

From the description of the above embodiments, it will be apparent to those skilled in the art that the above embodiments may be implemented in software, or by means of software plus a necessary general-purpose hardware platform. With such understanding, the technical solutions of the foregoing embodiments may be embodied in a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and which includes several instructions for causing a computer device (such as a personal computer, a server or a network device) to perform the methods of the embodiments of the present invention.

Example II

The present invention also provides a passive domain unsupervised domain adaptive image classification system, which is mainly used for implementing the method provided in the foregoing embodiment, as shown in fig. 4, and the system mainly includes:

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the system is divided into different functional modules to perform all or part of the functions described above.

Example III

The present invention also provides a processing apparatus, as shown in fig. 5, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.

Further, the processing device further comprises at least one input device and at least one output device; in the processing device, the processor, the memory, the input device and the output device are connected through buses.

In the embodiment of the invention, the specific types of the memory, the input device and the output device are not limited; for example:

the input device may be a touch screen, an image acquisition device, a physical key, a mouse or the like;

the output device may be a display terminal;

the memory may be random access memory (RAM) or non-volatile memory, such as disk memory.

Example IV

The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.

The readable storage medium according to the embodiment of the present invention may be provided as a computer readable storage medium in the aforementioned processing apparatus, for example, as a memory in the processing apparatus. The readable storage medium may be any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk or an optical disk.

The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. An image classification method for passive domain unsupervised domain adaptation, comprising:

explicit modeling of cross-domain consistent semantic similarity in a source domain model training process to obtain a semantic similarity matrix representing category relationships, comprising: training the source domain model by utilizing the source domain image data, carrying out explicit modeling on the semantic similarity consistent with the cross domain in the training process, and realizing joint optimization of a semantic similarity matrix and classifier parameters in the source domain model by regularization constraint based on trace norms, wherein the loss in the training process comprises classification loss and regularization constraint based on the trace norms; after training, obtaining a semantic similarity matrix between the categories, wherein each element in the semantic similarity matrix represents the similarity between different categories;

Inputting the images to be classified into a trained student model to obtain an image classification result;

the step of constructing positive and negative samples by combining the prediction probabilities obtained by the student model or the teacher model in a plurality of different modes by using the prediction probabilities obtained by the student model as query samples, and calculating corresponding contrast loss by combining the semantic similarity matrix comprises the following steps:

screening out corresponding target domain images according to the prediction probabilities output by the teacher model; taking the prediction probabilities obtained through the student model after the screened target domain images are subjected to enhancement processing as query samples; taking as pseudo labels the prediction probabilities obtained through the teacher model after the target domain images corresponding to the query samples are subjected to enhancement processing; taking the one-hot probability vectors corresponding to the pseudo labels as positive samples, and the one-hot probability vectors corresponding to the categories other than the categories corresponding to the pseudo labels as negative samples; and calculating the pseudo-label-based noise-robust contrast loss by using the query samples, the positive samples and the negative samples in combination with the semantic similarity matrix;

the method comprises the steps of carrying out enhancement processing on the same target domain image in different modes, respectively obtaining corresponding prediction probabilities through a student model and a teacher model, taking the prediction probabilities obtained by the student model as a query sample, taking the prediction probabilities obtained by the teacher model as a positive sample, taking the prediction probabilities obtained by the teacher model after enhancement processing on other target domain images as a negative sample, utilizing the query sample, the positive sample and the negative sample, and calculating the comparison loss based on a single sample by combining the semantic similarity matrix;

Each target domain image is subjected to enhancement processing and then is respectively input into a student model, and corresponding prediction probability is obtained through feature extraction and classification prediction; for a current target domain image, a k neighbor sample is found in a feature space according to the similarity between features, the prediction probability obtained by a student model corresponding to the current target domain image is used as a query sample, the prediction probability obtained by the student model after enhancement processing of the k neighbor sample is used as a positive sample, the prediction probability obtained by the student model after enhancement processing of other target domain images except the current target domain image and the k neighbor sample is used as a negative sample, and the query sample, the positive sample and the negative sample are utilized and the semantic similarity matrix is combined to calculate the comparison loss based on the neighbor sample; where k is a hyper-parameter.

2. The method of image classification for passive domain unsupervised domain adaptation according to claim 1, wherein the means for calculating contrast loss comprises:

combining the semantic similarity matrix, and respectively calculating sample similarity between a query sample and a positive sample and sample similarity between the query sample and a negative sample;

the contrast loss is calculated based on sample similarity between the query sample and the positive sample, and sample similarity between the query sample and the negative sample.

3. A method of image classification for passive domain unsupervised domain adaptation as claimed in claim 2, characterized in that,

the sample similarity between the query sample and the positive sample, and the sample similarity between the query sample and the negative sample, are calculated as:

$$ s_{robust}(p, q) = \sum_{i=1}^{K} \sum_{j=1}^{K} A_{ij} \, (p \otimes q)_{ij} ; $$

wherein $\otimes$ represents the outer product operation, K represents the number of categories, and i and j represent two categories; $A_{ij}$ represents the element in the ith row and jth column of the semantic similarity matrix A, namely the similarity between category i and category j; $s_{robust}(p, q)$ represents the similarity between query sample p and sample q, and $(p \otimes q)_{ij}$ represents the similarity between category i in query sample p and category j in sample q; sample q refers to either the positive sample or a negative sample, that is, the sample similarity between query sample p and the positive sample and the sample similarity between query sample p and a negative sample are both calculated in this way;

the corresponding contrast loss is then calculated, expressed as:

；

wherein the intermediate parameter is calculated from the similarity s_robust(p, q) between query sample p and sample q, once with sample q being the positive sample and once with sample q being a negative sample; L represents the contrast loss calculated using the query sample and the corresponding positive and negative samples; B represents the memory bank holding the negative samples; exp represents the exponential function with the natural constant e as its base; and a temperature coefficient is used in the calculation.

4. A passive domain unsupervised domain adapted image classification system comprising:

the semantic similarity matrix acquisition unit is used for carrying out explicit modeling on the semantic similarity consistent with the cross domain in the source domain model training process to obtain a semantic similarity matrix representing the category relation, and comprises the following steps: training the source domain model by utilizing the source domain image data, carrying out explicit modeling on the semantic similarity consistent with the cross domain in the training process, and realizing joint optimization of a semantic similarity matrix and classifier parameters in the source domain model by regularization constraint based on trace norms, wherein the loss in the training process comprises classification loss and regularization constraint based on the trace norms; after training, obtaining a semantic similarity matrix between the categories, wherein each element in the semantic similarity matrix represents the similarity between different categories;

The image classification unit is used for inputting the images to be classified into the trained student model to obtain an image classification result;

5. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-3.

6. A readable storage medium storing a computer program, which when executed by a processor implements the method of any one of claims 1-3.