CN113920382B - Cross-domain image classification method based on class consistency structured learning and related device

Info

Publication number: CN113920382B (granted publication of application CN113920382A)
Application number: CN202111530728.1A
Authority: CN (China)
Legal status: Active
Prior art keywords: matrix, target domain, label, domain, updated
Inventors: 陆玉武, 罗幸萍, 林德伟
Original and current assignee: Shenzhen University
Application filed by Shenzhen University; priority to CN202111530728.1A
Other languages: Chinese (zh)

Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24143 Pattern recognition: classification techniques based on distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]

Abstract

The embodiments of the present application disclose a cross-domain image classification method and device based on class consistency structured learning, an electronic device, and a computer-readable storage medium. The method comprises the following steps: obtaining initialized pseudo labels for the target domain samples from a first image classifier trained on source domain data; initializing a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain based on the initialized pseudo labels and the source domain labels; updating the projection matrix using the initialized matrices; performing projection learning with the updated projection matrix to update the initialized pseudo labels; and finally, when the number of cycles reaches a preset number, taking the updated pseudo labels as the final classification results of the target domain sample images. By learning a class-wise Laplacian matrix from the source domain to the target domain, the consistency and continuity of intra-class samples are improved, and the classification performance of the model on target domain samples is improved accordingly.

Description

Cross-domain image classification method based on class consistency structured learning and related device
Technical Field
The application belongs to the technical field of machine learning, and particularly relates to a cross-domain image classification method and device based on class consistency structured learning, electronic equipment and a computer-readable storage medium.
Background
Unsupervised domain adaptation refers to a machine learning approach that trains, on a labeled source domain, a model to be applied to an unlabeled target domain.
Data distribution differences (marginal distribution differences and conditional distribution differences) may exist between the labeled source domain data set and the unlabeled target domain data set; therefore, when a model trained on the source domain is applied to the target domain, its performance may degrade significantly ("overfitting" to the source domain). To alleviate the data distribution difference between the source domain and the target domain, conventional unsupervised domain adaptation methods may adopt feature-adaptation-based approaches, such as Transfer Component Analysis (TCA) and Joint Distribution Adaptation (JDA), or instance-weighting-based approaches, such as Transfer Joint Matching (TJM) and Coupled Knowledge Transfer (CKET).
To overcome the data distribution difference between the source domain and the target domain, most current unsupervised domain adaptation methods for image classification introduce marginal distribution matching and conditional distribution matching. However, after the sample data undergoes marginal and conditional distribution matching, sample points of the same class from different domains become scattered and loosely distributed; that is, intra-class samples across domains are sparsely distributed and lack consistency and continuity. Such scattered, sparse same-class sample clusters greatly reduce the classification performance of the model on target domain samples.
Disclosure of Invention
The embodiments of the present application provide a cross-domain image classification method and device based on class consistency structured learning, an electronic device, and a computer-readable storage medium, which can solve the problem that existing unsupervised domain adaptation methods yield low classification performance on target domain samples due to the poor consistency and continuity of intra-class samples.
In a first aspect, an embodiment of the present application provides a cross-domain image classification method based on class consistency structured learning, including:
acquiring a source domain data set and a target domain data set, wherein the source domain data set comprises source domain sample images and labels of the source domain sample images, and the target domain data set comprises target domain sample images;
obtaining an initialization pseudo label of each target domain sample image based on a first image classifier trained by using a source domain data set;
projecting the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, performing initialization according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain;
updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix;
performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
classifying the projected target domain sample data based on a second image classifier trained by using the projected source domain sample data with the label to obtain a pseudo label of the projected target domain sample data;
when the number of cycles has not reached a preset number, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and, based on the updated graph matrix, the updated class weight matrix and the updated Laplacian matrix of the target domain, returning to the step of updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix;
when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix comprises:

obtaining the updated projection matrix by the formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$, wherein $\beta$ is a hyper-parameter;

wherein $X = [X_s, X_t] \in R^{m \times n}$, $X_s$ is the source domain data set, $X_t$ is the target domain data set, $P$ is the projection matrix, $I_m \in R^{m \times m}$ is the identity matrix of dimension $m$, $R^{m \times m}$ denotes the real space of dimension $m \times m$, $R^{m \times n}$ denotes the real space of dimension $m \times n$, and $m$ and $n$ denote space dimensions;

$\Phi = \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_d) \in R^{d \times d}$ is a diagonal matrix whose diagonal elements are Lagrange multipliers;

$\Omega$ combines the graph matrix $G$, the class weight matrices $M_c$ and the Laplacian matrices of the two domains, weighted by the hyper-parameters $\alpha$, $\eta$ and $\gamma$; its exact expression, like the other expressions noted below, is available only as an image formula in the original publication; $M_c$ is the class weight matrix (image formula);

$v^{(i)}$ is the set of target domain sample points containing the first $\lceil \delta n_t \rceil$ nearest neighbor points of $z_{t,i}$, wherein $\delta \in [0,1]$ is a preset adjacency factor and $\lceil \cdot \rceil$ is the round-up (ceiling) symbol;

$y(z_{t,i}) = y(z_{t,j})$ indicates that the class labels of the sample points $z_{t,i}$ and $z_{t,j}$ are the same, wherein $y(z_{t,i})$ denotes the class label of sample point $z_{t,i}$, $y(z_{t,j})$ denotes the class label of sample point $z_{t,j}$, and $z_{t,j}$ is a neighborhood point of $z_{t,i}$;

$\alpha_i$ and $\alpha_j$ denote the weight coefficients of the samples in the target domain; $Z_t$ denotes the form of the target domain data in the projection space, $Z_t = P^T X_t$, and $z_{t,i}$ denotes the $i$-th datum in $Z_t$;

$x_{s,i}$ and $x_{t,j}$ are a source domain sample point and a target domain sample point, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the source domain sample points and target domain sample points of class $c$;

$L_s$ is the Laplacian matrix of the source domain and $L_t$ is the Laplacian matrix of the target domain (image formulas); $G$ is the graph matrix, whose block expressions (image formulas) involve $R^{n_s \times n_t}$, the real space of dimension $n_s \times n_t$, wherein $n_s$ and $n_t$ denote space dimensions;

$z_i = P^T x_i$ and $z_j = P^T x_j$ are projected sample points, and $y(z_i)$ and $y(z_j)$ are the labels of the projected sample points;

$u^{(i)}$ is the target domain sample point set of the $k$ nearest neighbors of the $i$-th projected source domain sample point (image formula), wherein $k$ is the preset number of nearest neighbors; $c \in \{1, 2, \ldots, C\}$, wherein $C$ is the number of classes common to the source domain and the target domain; $n_s^{(c)}$ and $n_t^{(c)}$ are the numbers of class-$c$ samples in the source domain and the target domain, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the sample points of class $c$ in the source domain and the target domain, respectively; and $n = n_s + n_t$;

$H = I_n - \frac{1}{n} 1_{n \times n}$, wherein $I_n$ and $I_d$ are identity matrices of dimensions $n$ and $d$, respectively, and $1_{n \times n}$ is a square matrix whose elements are all 1.
Therefore, by learning the class-wise Laplacian matrix from the source domain to the target domain, the consistency and continuity of intra-class samples across domains are improved, which further improves the classification performance of the model on target domain samples and promotes knowledge transfer from the source domain to the target domain.
In some possible implementations of the first aspect, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain comprises: calculating the updated graph matrix according to the labels and the pseudo labels by the graph matrix formula, i.e. equation (2) in the detailed description below (available only as an image formula in the original); calculating the updated class weight matrix according to the labels and the pseudo labels by the class weight matrix formula, i.e. equation (7) below; and calculating the updated Laplacian matrix of the target domain according to the labels and the pseudo labels by the target-domain Laplacian formula, i.e. equation (8) below.
In some possible implementations of the first aspect, classifying the projected target domain sample data based on a second image classifier trained using the labeled projected source domain sample data to obtain the pseudo labels of the projected target domain sample data includes:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
and classifying the projected target domain sample data by using a second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is a pseudo label.
In some possible implementations of the first aspect, obtaining an initialization pseudo label for each target domain sample image based on a first image classifier trained using a source domain dataset includes:
training an image classifier by using a source domain data set to obtain a trained first image classifier;
and classifying the sample images of each target domain by using the first image classifier to obtain a second classification result of the sample images of each target domain, wherein the second classification result is an initialization pseudo label.
In a second aspect, an embodiment of the present application provides a cross-domain image classification device based on class consistency structured learning, including:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a source domain data set and a target domain data set, the source domain data set comprises a source domain sample image and a label of the source domain sample image, and the target domain data set comprises a target domain sample image;
the pseudo label initialization module is used for obtaining initialization pseudo labels of all target domain sample images based on a first image classifier trained by using a source domain data set;
the initialization module is used for projecting the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, performing initialization according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain;
the projection matrix updating module is used for updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix;
the projection module is used for performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
the pseudo label updating module is used for classifying the projected target domain sample data based on a second image classifier trained by using the projected source domain sample data with the label to obtain a pseudo label of the projected target domain sample data;
the circulation module is used for, when the number of cycles has not reached a preset number, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and, based on the updated graph matrix, the updated class weight matrix and the updated Laplacian matrix of the target domain, returning to the operation of updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix; and for, when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein, the projection matrix updating module is specifically configured to:
obtain the updated projection matrix by the formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$, wherein $\beta$ is a hyper-parameter;

wherein $X = [X_s, X_t] \in R^{m \times n}$, $X_s$ is the source domain data set, $X_t$ is the target domain data set, $P$ is the projection matrix, $I_m \in R^{m \times m}$ is the identity matrix of dimension $m$, $R^{m \times m}$ denotes the real space of dimension $m \times m$, $R^{m \times n}$ denotes the real space of dimension $m \times n$, and $m$ and $n$ denote space dimensions;

$\Phi = \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_d) \in R^{d \times d}$ is a diagonal matrix whose diagonal elements are Lagrange multipliers;

$\Omega$ combines the graph matrix $G$, the class weight matrices $M_c$ and the Laplacian matrices of the two domains, weighted by the hyper-parameters $\alpha$, $\eta$ and $\gamma$; its exact expression, like the other expressions noted below, is available only as an image formula in the original publication; $M_c$ is the class weight matrix (image formula);

$v^{(i)}$ is the set of target domain sample points containing the first $\lceil \delta n_t \rceil$ nearest neighbor points of $z_{t,i}$, wherein $\delta \in [0,1]$ is a preset adjacency factor and $\lceil \cdot \rceil$ is the round-up (ceiling) symbol;

$y(z_{t,i}) = y(z_{t,j})$ indicates that the class labels of the sample points $z_{t,i}$ and $z_{t,j}$ are the same, wherein $y(z_{t,i})$ denotes the class label of sample point $z_{t,i}$, $y(z_{t,j})$ denotes the class label of sample point $z_{t,j}$, and $z_{t,j}$ is a neighborhood point of $z_{t,i}$;

$\alpha_i$ and $\alpha_j$ denote the weight coefficients of the samples in the target domain; $Z_t$ denotes the form of the target domain data in the projection space, $Z_t = P^T X_t$, and $z_{t,i}$ denotes the $i$-th datum in $Z_t$;

$x_{s,i}$ and $x_{t,j}$ are a source domain sample point and a target domain sample point, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the source domain sample points and target domain sample points of class $c$;

$L_s$ is the Laplacian matrix of the source domain and $L_t$ is the Laplacian matrix of the target domain (image formulas); $G$ is the graph matrix, whose block expressions (image formulas) involve $R^{n_s \times n_t}$, the real space of dimension $n_s \times n_t$, wherein $n_s$ and $n_t$ denote space dimensions;

$z_i = P^T x_i$ and $z_j = P^T x_j$ are projected sample points, and $y(z_i)$ and $y(z_j)$ are the labels of the projected sample points;

$u^{(i)}$ is the target domain sample point set of the $k$ nearest neighbors of the $i$-th projected source domain sample point (image formula), wherein $k$ is the preset number of nearest neighbors; $c \in \{1, 2, \ldots, C\}$, wherein $C$ is the number of classes common to the source domain and the target domain; $n_s^{(c)}$ and $n_t^{(c)}$ are the numbers of class-$c$ samples in the source domain and the target domain, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the sample points of class $c$ in the source domain and the target domain, respectively; and $n = n_s + n_t$;

$H = I_n - \frac{1}{n} 1_{n \times n}$, wherein $I_n$ and $I_d$ are identity matrices of dimensions $n$ and $d$, respectively, and $1_{n \times n}$ is a square matrix whose elements are all 1.
In some possible implementations of the second aspect, the circulation module is specifically configured to: calculate the updated graph matrix according to the labels and the pseudo labels by the graph matrix formula, i.e. equation (2) in the detailed description (available only as an image formula in the original); calculate the updated class weight matrix according to the labels and the pseudo labels by the class weight matrix formula, i.e. equation (7); and calculate the updated Laplacian matrix of the target domain according to the labels and the pseudo labels by the target-domain Laplacian formula, i.e. equation (8).
In some possible implementations of the second aspect, the pseudo tag updating module is specifically configured to:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
and classifying the projected target domain sample data by using a second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is a pseudo label.
In some possible implementations of the second aspect, the pseudo tag initialization module is specifically configured to:
training an image classifier by using a source domain data set to obtain a trained first image classifier;
and classifying the sample images of each target domain by using the first image classifier to obtain a second classification result of the sample images of each target domain, wherein the second classification result is an initialization pseudo label.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method according to any one of the first aspect is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program is executed by a processor to implement the method according to any one of the above first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on an electronic device, causes the electronic device to perform the method of any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
Fig. 1 is a schematic flowchart of a cross-domain image classification method based on class consistency structured learning according to an embodiment of the present application;
fig. 2 is another schematic flowchart of the cross-domain image classification method based on class consistency structured learning according to an embodiment of the present application;
fig. 3 is a schematic block diagram of the structure of a cross-domain image classification device based on class consistency structured learning according to an embodiment of the present application;
fig. 4 is a block diagram schematically illustrating a structure of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [described condition or event]" or "in response to detecting [described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
At present, most unsupervised domain adaptation methods do not consider the consistency and continuity of same-class samples across domains, so after convergence through multiple rounds of projection learning, inter-class distribution boundaries become blurred and intra-class sample distributions become sparse.
To improve the consistency and continuity of intra-class samples, the embodiments of the present application provide a cross-domain image classification method based on class consistency structured learning (Cross-Domain Class-Wise Structure Learning, CCSL), which promotes knowledge transfer from the source domain to the target domain by learning a class-wise Laplacian matrix from the source domain to the target domain on the basis of the CKET algorithm.
In addition, considering that the pseudo label learned by the target domain has certain uncertainty, the embodiment of the application provides a pseudo label credibility mechanism based on the sample distribution characteristics so as to improve the credibility of the pseudo label of the target domain and further reduce the risk of negative knowledge migration from the source domain to the target domain.
Referring to fig. 1, a flowchart of a cross-domain image classification method based on class consistency structured learning according to an embodiment of the present application is shown, where the method includes the following steps:
step S101, a source domain data set and a target domain data set are obtained, wherein the source domain data set comprises source domain sample images and labels of the source domain sample images, and the target domain data set comprises target domain sample images.
It is to be understood that the source domain data set is labeled data and the target domain data set is unlabeled data. The label of a source domain sample image characterizes the class to which the image belongs.
Illustratively, the source domain data set and the target domain data set may use existing public data sets, such as Office-Caltech (SURF), COIL20 and PIE. Office-Caltech (SURF) and COIL20 are data sets for object recognition, and PIE is a data set for face pose recognition.
Taking the Office-Caltech (SURF) data set as an example, Office-Caltech (SURF) includes four sub data sets: C (Caltech), A (Amazon), W (Webcam) and D (DSLR). Each sub data set comprises a different number of pictures, and the four sub data sets share 10 common categories. One of the four sub data sets is randomly selected as the source domain data set, and one of the remaining three is selected as the target domain data set. For example, the Caltech sub data set is selected as the source domain data set and the Amazon sub data set as the target domain data set. The Caltech sub data set comprises 1123 pictures and the Amazon sub data set comprises 958 pictures; each picture is compressed and its features are extracted into an 800-dimensional column vector, finally yielding the source domain data matrix $X_s \in R^{800 \times 1123}$ and the target domain data matrix $X_t \in R^{800 \times 958}$. The label information of the source domain is the true category of each picture, giving the source domain sample labels $Y_s \in R^{1123 \times 1}$; each element $y_s$ of $Y_s$ represents the label of one sample, and since there are ten categories, $y_s \in \{1, 2, \ldots, 10\}$.
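As a concrete illustration of the data shapes just described, the following sketch assembles placeholder matrices with the stated dimensions; the random arrays merely stand in for real SURF features, since loading the actual Office-Caltech data is outside the scope of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder matrices with the shapes stated above (Caltech -> Amazon task).
Xs = rng.random((800, 1123))              # source: 800-dim SURF column vectors, 1123 pictures
Xt = rng.random((800, 958))               # target: 958 pictures
Ys = rng.integers(1, 11, size=(1123, 1))  # source labels, y_s in {1, ..., 10}

X = np.hstack([Xs, Xt])                   # X = [Xs, Xt] in R^{800 x 2081}
```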
In addition, some relevant parameters need to be acquired. The relevant parameters may include α, β, η, γ, δ, projection subspace dimension d, number of neighbors k, and number of iterations T. Illustratively, k is 5 and T is 10.
After the relevant parameters are obtained, the optimal values of the parameters α, β and γ may be found by grid search; illustratively, η is set to 0.5.
In specific applications, the corresponding parameter values can be input according to the selected source domain and target domain data sets. For example, for Office-Caltech10 the subspace dimension is d = 10 and δ = 0.1; for COIL20, d = 20 and δ = 0.05; and for PIE, d = 100 and δ = 0.25.
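A sketch of the grid search mentioned above; the candidate grids, the callables and the scoring criterion are illustrative assumptions (the patent does not specify them), since with an unlabeled target domain one must score candidate settings by a proxy criterion:

```python
from itertools import product

def grid_search(run_ccsl, score, eta=0.5):
    """Try each (alpha, beta, gamma) combination and keep the best-scoring one.
    `run_ccsl` runs the method with the given hyper-parameters; `score`
    evaluates its output. Both are assumed callables, and the grids below
    are illustrative, not taken from the patent."""
    grids = {
        "alpha": [0.01, 0.1, 1.0, 10.0],
        "beta":  [0.01, 0.1, 1.0, 10.0],
        "gamma": [0.01, 0.1, 1.0, 10.0],
    }
    best, best_params = float("-inf"), None
    for a, b, g in product(grids["alpha"], grids["beta"], grids["gamma"]):
        s = score(run_ccsl(alpha=a, beta=b, gamma=g, eta=eta))
        if s > best:
            best, best_params = s, (a, b, g)
    return best_params
```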
Step S102, obtaining initialization pseudo labels of all target domain sample images based on a first image classifier trained by using a source domain data set.
After obtaining the target domain data set, the source domain data set and the relevant parameters, the pseudo labels $y_{t,i}$ ($1 \le i \le n_t$) of the target domain sample images may be initialized in the original space, where the original space is defined relative to the projection space. Illustratively, the initialized pseudo labels of the target domain sample images may be obtained as follows: an image classifier, for example a K-nearest-neighbor classifier, is trained using the labeled source domain sample images to obtain a trained first image classifier. After the trained first image classifier is obtained, each target domain sample image is classified using the first image classifier, and the classification result of each target domain sample image is taken as its initialized pseudo label. The classification result characterizes the class to which the image belongs.
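A minimal sketch of this initialization step, assuming scikit-learn is available and samples are stored as columns as in the earlier sketch; the function name is illustrative, and k = 5 follows the example value given above:

```python
from sklearn.neighbors import KNeighborsClassifier

def init_pseudo_labels(Xs, Ys, Xt, k=5):
    """Train a K-nearest-neighbor classifier on the labeled source samples and
    use it to assign an initial pseudo label to every target domain sample."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(Xs.T, Ys.ravel())   # first image classifier, trained on the source domain
    return clf.predict(Xt.T)    # initialized pseudo labels y_{t,i}, 1 <= i <= n_t

Yt_pseudo = init_pseudo_labels(Xs, Ys, Xt)
```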
Step S103, projecting the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, initializing according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain.

In specific applications, each sample image in the source domain data set and each sample image in the target domain data set is projected to the same common space, obtaining the source domain sample point corresponding to each source domain sample image and the target domain sample point corresponding to each target domain sample image. That is, the source domain sample points are the sample points at which the source domain sample images are projected into the common space, and the target domain sample points are the sample points at which the target domain sample images are projected into the common space.

It will be appreciated that the source domain sample points in the common space have corresponding labels, and the label of a source domain sample point in the common space is the same as the label of the source domain sample image in the original space. Similarly, each target domain sample point corresponds to a pseudo label, which is the same as the pseudo label of the target domain sample image in the original space. On this basis, after projection, the class to which each source domain sample point belongs can be determined from the labels, the class to which each target domain sample point belongs can be determined from the pseudo labels, and the target domain sample points and source domain sample points belonging to the same class can thus be determined.
From the target domain sample points and the source domain sample points that belong to the same class, an initialized projection matrix $P = I_m \in R^{m \times m}$ can be obtained. Then, according to the initialized projection matrix, the labels, the pseudo labels and related information, the initialized graph matrix $G$ is calculated by the graph matrix formula, i.e. equation (2) below (available only as an image formula in the original). That formula is obtained as follows:

First, to enhance the classification performance on the target domain images by enhancing the compactness of same-class sample clusters, the following mathematical expression can be obtained, denoted equation (1) (image formula), in which $P \in R^{m \times d}$; $c \in \{1, 2, \ldots, C\}$, wherein $C$ is the number of classes common to the source domain and the target domain; $n_s^{(c)}$ and $n_t^{(c)}$ are the numbers of class-$c$ samples in the source domain and the target domain, respectively; and $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the sample points of class $c$ in the source domain and the target domain, respectively.

To reassign the weights of the target domain samples based on the sample distribution characteristics, the weight of each target domain sample is determined by the following trust mechanism: when a projected target domain sample point lies within the first $k$ nearest neighbor points of a projected source domain sample point, it is assigned 1; otherwise it is assigned 0 (the formal expression is an image formula). Here $u^{(i)}$ is the set of the $k$ nearest neighbors of the $i$-th projected source domain sample point, and $k$ is the preset number of nearest neighbors.

Based on this, equation (1) can be rewritten as equation (2) (image formula), wherein $X = [X_s, X_t] \in R^{m \times n}$ is the original source and target domain data, $X_s$ is the source domain data set, $X_t$ is the target domain data set, and $n = n_s + n_t$; $z_i = P^T x_i$ and $z_j = P^T x_j$ are projected sample points, and $y(z_i)$ and $y(z_j)$ are the label values of those sample points; the two remaining terms are diagonal matrices (image formulas).
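A small sketch of the trust mechanism just described, under the reading that a target sample receives weight 1 if it appears among the k nearest target-domain neighbors of at least one projected source point; the exact membership rule is an assumption, since the patent's own expression is available only as an image:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def trust_weights(Zs, Zt, k=5):
    """Assumed trust mechanism: alpha_j = 1 if projected target point z_{t,j}
    lies within the k nearest target-domain neighbors u^{(i)} of some projected
    source point z_{s,i}, else 0. Zs and Zt hold projected samples as columns."""
    nn = NearestNeighbors(n_neighbors=k).fit(Zt.T)  # search among target points
    _, idx = nn.kneighbors(Zs.T)                    # neighbor indices, shape (n_s, k)
    alpha = np.zeros(Zt.shape[1])
    alpha[np.unique(idx)] = 1.0                     # targets appearing in some u^{(i)}
    return alpha
```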
After the initialized projection matrix is obtained, the initialized class weight matrix $M_c$ is calculated from the initialized projection matrix, the labels, the pseudo labels and related information by the class weight matrix formula, i.e. equation (7) below (available only as an image formula in the original).

Likewise, based on the initialized projection matrix, the labels, the pseudo labels and related information, the initialized Laplacian matrix $L_s$ of the source domain and the initialized Laplacian matrix $L_t$ of the target domain are calculated by the Laplacian formula, i.e. equation (8) below (image formula). Equations (7) and (8) are obtained as follows:
First, to achieve global matching of the source domain and the target domain, marginal distribution matching and conditional distribution matching are introduced; the specific expression is equation (3) (image formula), in which $x_i$ and $x_j$ denote sample points indexed over the source domain and the target domain; $x_{s,i}$ and $x_{t,j}$ are a source domain sample point and a target domain sample point, respectively; and $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are a source domain sample point and a target domain sample point of class $c$, respectively.
Then, local manifold learning is added in the source domain and the target domain respectively; the specific expression is equation (4) (image formula).

To ensure a more compact distribution of same-class sample points, the weight coefficients of the target domain samples are reassigned (the reassignment formula is an image formula), wherein $\delta \in [0,1]$ is a preset adjacency factor, $\lceil \cdot \rceil$ is the round-up (ceiling) symbol, and $v^{(i)}$ is the set of target domain sample points containing the first $\lceil \delta n_t \rceil$ nearest neighbor points of $z_{t,i}$.

Based on this, the weight coefficient of each target domain sample point may be defined accordingly (the definition is an image formula).
Up to this point, equations (3) and (4) above can be rewritten as equations (7) and (8), respectively (both image formulas). In equation (7), $1 \le c \le C$, and $M_0$ is assigned as in equation (3) above. In equation (8), the target-domain graph assigns a connection to sample pairs satisfying $y(x_{t,i}) = y(x_{t,j}) \wedge \alpha_i \alpha_j = 1$ (the full expression is an image formula); $L_s$ and $L_t$ are the Laplacian matrices of the source domain and the target domain, respectively, each built from a graph weight matrix and the corresponding diagonal matrix (image formulas), and $L_s$ is solved in a manner similar to $L_t$.

Through the above steps, the initialized target domain pseudo labels, the initialized graph matrix $G$, the initialized class weight matrix $M_c$, the initialized Laplacian matrix $L_s$ of the source domain and the initialized Laplacian matrix $L_t$ of the target domain can all be obtained.
Step S104, updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix.
After the initialized graph matrix $G$, class weight matrix $M_c$, source-domain Laplacian matrix $L_s$ and target-domain Laplacian matrix $L_t$ are obtained, the updated projection matrix $P$ is calculated by the formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$.

Here $I_m \in R^{m \times m}$ is the identity matrix of dimension $m$, and $\Phi = \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_d) \in R^{d \times d}$ is a diagonal matrix whose diagonal elements are Lagrange multipliers. $\Omega$ combines equations (2), (7) and (8) through the hyper-parameters $\alpha$, $\eta$ and $\gamma$; its exact expression is given only as an image formula in the original.

The formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$ is obtained as follows:

First, equations (2), (7) and (8) above are combined and a regularization term $\beta\|P\|_F^2$ is added, giving the final objective function of CCSL (typeset as an image in the original):

$$\min_P \ \mathrm{tr}\left(P^T X \Omega X^T P\right) + \beta\|P\|_F^2, \quad \mathrm{s.t.}\ P^T X H X^T P = I_d \qquad (9)$$

where the constraint term is derived by Principal Component Analysis (PCA) to maximize the data variance, $H = I_n - \frac{1}{n} 1_{n \times n}$ is the centering matrix, $I_n$ and $I_d$ are identity matrices of dimensions $n$ and $d$ respectively, $1_{n \times n}$ is a square matrix whose elements are all 1, and $\alpha$, $\beta$, $\eta$, $\gamma$ are four preset hyper-parameters.

Equation (9) is a nonlinear function, so it is solved using Lagrange multipliers; the Lagrangian is

$$L(P, \Phi) = \mathrm{tr}\left(P^T (X \Omega X^T + \beta I_m) P\right) + \mathrm{tr}\left((I_d - P^T X H X^T P)\Phi\right) \qquad (10)$$

Finally, taking the derivative of equation (10) with respect to $P$ and setting it to 0 yields:

$$(X\Omega X^T + \beta I_m)P = XHX^T P\Phi \qquad (11)$$
and selecting the first d minimum eigenvectors to form a projection matrix P.
That is, based on the above equation (11), the matrix G and the matrix M are determinedcSource domain laplacian matrix LsAnd the Laplace matrix L of the target domaintAnd calculating a projection matrix P, wherein the projection matrix P is the updated projection matrix.
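A sketch of this update step as a generalized eigenvalue problem, assuming Omega and H have already been assembled as dense numpy arrays; the function name and the small ridge added to the right-hand side are illustrative assumptions, not part of the patent:

```python
import numpy as np
from scipy.linalg import eigh

def update_projection(X, Omega, H, beta, d):
    """Solve (X Omega X^T + beta I_m) P = X H X^T P Phi, i.e. equation (11),
    keeping the eigenvectors of the d smallest eigenvalues as P."""
    m = X.shape[0]
    A = X @ Omega @ X.T + beta * np.eye(m)  # left-hand side of equation (11)
    B = X @ H @ X.T + 1e-9 * np.eye(m)      # right-hand side; tiny ridge since XHX^T can be singular
    # scipy's generalized symmetric solver returns eigenvalues in ascending order.
    _, P = eigh(A, B, subset_by_index=[0, d - 1])
    return P                                # P in R^{m x d}

# Step S105 then projects both domains with the learned P:
# Zs, Zt = P.T @ Xs, P.T @ Xt
```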
Step S105, performing projection learning on the source domain data set and the target domain data set using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data.
After the updated projection matrix $P$ is obtained according to equation (11) above, each source domain sample image in the source domain data set and each target domain sample image in the target domain data set are projected to the same common space using the updated projection matrix $P$, obtaining the projected source domain sample data $Z_s = P^T X_s$ and the projected target domain sample data $Z_t = P^T X_t$.
Step S106, classifying the projected target domain sample data based on a second image classifier trained using the labeled projected source domain sample data, to obtain the pseudo labels of the projected target domain sample data.
In some embodiments, the labeled projected source domain sample data $Z_s = P^T X_s$ is used to train an image classifier, for example a K-nearest-neighbor classifier, to obtain the trained second image classifier. Then, the projected target domain sample data $Z_t = P^T X_t$ is classified using the second image classifier to obtain the classification result of each projected target domain sample datum, and the classification result is taken as the pseudo label of the projected target domain sample data. That is, the initialized pseudo labels are updated to obtain the updated target domain pseudo labels.
Step S107, judging whether the number of cycles reaches the preset number: if not, proceeding to step S108; if so, proceeding to step S109.
In specific applications, steps S104 to S108 are executed in a loop; that is, the projection matrix is continuously updated, and the updated projection matrix is used to update the pseudo labels of the target domain. If, after step S106 is executed, the cumulative number of cycles is greater than or equal to the preset number, the process proceeds to step S109: the pseudo labels calculated in step S106 of the current cycle are taken as the final image classification results, and the updated projection matrix obtained in step S104 of the current cycle is output. Otherwise, if the cumulative number of cycles has not reached the preset number, the process proceeds to step S108 and then returns to step S104.
Step S108, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and returning to step S104.
In specific applications, according to the labels, the pseudo labels and related information, the graph matrix $G$ is updated according to equation (2) above, the class weight matrix $M_c$ according to equation (7) above, and the Laplacian matrix $L_t$ of the target domain according to equation (8) above. The projection matrix is then updated according to the Laplacian matrix $L_s$ of the source domain together with the updated $G$, $M_c$ and $L_t$ obtained in step S108, yielding the updated projection matrix. Next, projection learning is performed on the data in the source domain data set and the target domain data set using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data; an image classifier is trained using the projected source domain sample data to obtain a trained image classifier; and the projected target domain sample data is classified using the trained image classifier to obtain the pseudo labels of the projected target domain sample data. Finally, it is judged whether the current cumulative number of cycles reaches the preset number: if so, the pseudo labels are taken as the final image classification results; if not, the matrix $G$, the matrices $M_c$ and the Laplacian matrix $L_t$ of the target domain are updated again. The cycle repeats until the number of cycles reaches the preset number.
Step S109, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label.
It is to be understood that the pseudo label of each target domain sample image characterizes the category to which that image belongs. In other embodiments, the projection matrix P may be output in addition to the pseudo labels.
It should be noted that, while learning the projection matrix, CKET redistributes the weights of the target domain samples according to the sparsity of their distribution in the projection space. On the basis of CKET, the class-wise Laplacian matrix from the source domain to the target domain is learned, which further improves the consistency and continuity of intra-class samples across domains and thereby promotes knowledge transfer from the source domain to the target domain.
Therefore, by learning the class-wise Laplacian matrix from the source domain to the target domain, the consistency and continuity of intra-class samples across domains are improved, which further improves the classification performance of the model on target domain samples and promotes knowledge transfer from the source domain to the target domain.
To better describe the CCSL method provided in the embodiments of the present application, it is further explained with reference to the other schematic flow diagram of the cross-domain image classification method based on class consistency structured learning shown in fig. 2. Here, the image classifier is specifically a K-nearest-neighbor classifier, and the preset number of cycles is 10.
As shown in fig. 2, the cycle counter is initialized to t = 1, and the corresponding values of the parameters α, β, η, γ, δ, the projection subspace dimension d and the number of neighbor points k are input according to the data sets; that is, the parameter values are chosen according to the selected source domain data set and target domain data set.
The K-nearest-neighbor classifier is then trained using the source domain data set to obtain the trained K-nearest-neighbor classifier.
Next, each sample in the target domain data set is classified using the trained K-nearest-neighbor classifier to initialize the pseudo label of each target domain sample image. The graph matrix $G$ is calculated according to equation (2) above, the class weight matrix $M_c$ according to equation (7) above, and the Laplacian matrix $L_t$ of the target domain and the Laplacian matrix $L_s$ of the source domain according to equation (8) above, thereby initializing $G$, $M_c$, $L_t$ and $L_s$.
Then, the projection matrix $P$ is calculated according to equation (11) above to update the initialized projection matrix, obtaining the updated projection matrix. Projection learning is performed on the source domain data and the target domain data using the updated projection matrix, obtaining the projected source domain data $Z_s = P^T X_s$ and the projected target domain data $Z_t = P^T X_t$. The projected source domain data $Z_s$ is then used to train the K-nearest-neighbor classifier, and the trained classifier classifies the projected target domain data $Z_t$; the classification results are taken as the target domain pseudo labels.
It is then judged whether the number of cycles t is less than or equal to 10. If so, the process returns to recalculating the matrix $G$ according to equation (2), the matrices $M_c$ according to equation (7), and the Laplacian matrix $L_t$ of the target domain and the Laplacian matrix $L_s$ of the source domain according to equation (8). If not, the target domain pseudo labels and the projection matrix $P$ are output, the target domain pseudo labels being the final image classification results.
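Pulling the preceding steps together, the overall iteration shown in fig. 2 can be sketched as follows. It reuses init_pseudo_labels and update_projection from the earlier sketches, while build_graph, build_class_weights, build_laplacians and combine_omega stand in for equations (2), (7), (8) and the image-only definition of Ω; all four are assumed placeholders, not functions from the patent:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def ccsl(Xs, Ys, Xt, beta=0.1, d=10, k=5, T=10):
    """Hedged sketch of the CCSL loop: initialize pseudo labels, then
    alternate projection updates (eq. 11) with pseudo-label updates."""
    X = np.hstack([Xs, Xt])
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix from the PCA constraint
    Yt = init_pseudo_labels(Xs, Ys, Xt, k)           # step S102
    for _ in range(T):                               # steps S104 to S108
        G = build_graph(X, Ys, Yt)                   # equation (2), assumed helper
        M = build_class_weights(X, Ys, Yt)           # equation (7), assumed helper
        Ls, Lt = build_laplacians(Xs, Xt, Ys, Yt)    # equation (8), assumed helper
        Omega = combine_omega(G, M, Ls, Lt)          # weighted by alpha, eta, gamma (image-only formula)
        P = update_projection(X, Omega, H, beta, d)  # equation (11)
        Zs, Zt = P.T @ Xs, P.T @ Xt                  # step S105: projection learning
        clf = KNeighborsClassifier(n_neighbors=k).fit(Zs.T, Ys.ravel())
        Yt = clf.predict(Zt.T)                       # step S106: updated pseudo labels
    return Yt, P                                     # step S109: final classification result
```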
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the method for classifying cross-domain images based on class consistency structure learning described in the foregoing embodiments, fig. 3 shows a block diagram of a cross-domain image classification device based on class consistency structure learning provided in an embodiment of the present application, and for convenience of explanation, only the relevant parts of the embodiment of the present application are shown.
Referring to fig. 3, the apparatus includes:
an obtaining module 31, configured to obtain a source domain data set and a target domain data set, where the source domain data set includes a source domain sample image and a label of the source domain sample image, and the target domain data set includes a target domain sample image;
a pseudo label initialization module 32, configured to obtain an initialized pseudo label of each target domain sample image based on a first image classifier trained using the source domain data set;
the initialization module 33 is configured to project the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, perform initialization according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain;
a projection matrix updating module 34, configured to update the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain, so as to obtain an updated projection matrix;
the projection module 35 is configured to perform projection learning on the source domain data set and the target domain data set by using the updated projection matrix, and obtain projected source domain sample data and projected target domain sample data;
a pseudo label updating module 36, configured to classify the projected target domain sample data based on a second image classifier trained using the projected source domain sample data with a label, and obtain a pseudo label of the projected target domain sample data;
the circulation module 37 is configured to, when the number of cycles has not reached the preset number, update the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and, based on the updated graph matrix, the updated class weight matrix and the updated Laplacian matrix of the target domain, return to the operation of updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix; and, when the number of cycles reaches the preset number, to obtain the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein the projection matrix updating module is specifically configured to: obtain the updated projection matrix by the formula (XΩX^T + βI_m)P = XHX^T PΦ, where β is a hyper-parameter;
wherein X = [X_s, X_t] ∈ R^(m×n), X_s is the source domain data set, X_t is the target domain data set, P is the projection matrix, I_m ∈ R^(m×m) is an identity matrix of dimension m, R^(m×m) denotes the real number space of dimension m×m, R^(m×n) denotes the real number space of dimension m×n, and m and n respectively denote space dimensions; Φ = diag(Φ_1, Φ_2, ..., Φ_d) ∈ R^(d×d) is a diagonal matrix whose diagonal elements are Lagrange multipliers;
the defining formulas of Ω, of the category weight matrix M_c, and of several of the auxiliary quantities below survive only as formula images in the source and are not reproduced here; α, η and γ are all hyper-parameters;
v^(i) is the set of target domain sample points containing the first ⌈·⌉ nearest neighbour points of z_t,i, where the count inside the ceiling appears only as an image (plausibly ⌈δk⌉ given the definitions below); δ ∈ [0,1] is a preset adjacency factor, and ⌈·⌉ is the round-up (ceiling) symbol;
a further image formula encodes whether two sample points carry the same category label, with y(·) denoting the category label of a sample point and the paired point drawn from its neighbourhood points;
α_i, α_j denote the weight coefficient of each sample in the target domain; Z_t denotes the form of the target domain data in the projection space, Z_t = P^T X_t, and z_t,i denotes the i-th data in Z_t;
a source domain sample point and a target domain sample point, and the source domain sample points and target domain sample points of category c, are denoted by symbols that appear only as images;
L_s is the Laplacian matrix of the source domain and L_t is the Laplacian matrix of the target domain; G is the graph matrix, whose entries are given by an image formula over R^(n_s×n_t), the real number space of dimension n_s×n_t, where n_s and n_t denote space dimensions;
z_i = P^T x_i and z_j = P^T x_j are projected sample points, and y(z_i), y(z_j) are the labels of the projected sample points;
u^(i) is the set of target domain sample points containing the k nearest neighbour points of a given target domain point (the anchor symbol appears only as an image; plausibly z_t,i), k is the preset number of nearest neighbours, c ∈ {1, 2, ..., C}, and C is the number of categories common to the source domain and the target domain;
the numbers of samples of category c in the source domain and the target domain, and the sample points of category c in the source domain and the target domain, are likewise denoted by image symbols; n = n_s + n_t;
I_n and I_d are identity matrices of dimensions n and d respectively, and 1_(n×n) is a square matrix whose elements are all 1; the image formula introducing these symbols plausibly defines the centering matrix H = I_n − (1/n)·1_(n×n).
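Although Ω and H survive only as images, the update equation above has the standard form of a generalized eigenvalue problem: Φ collects the Lagrange multipliers as eigenvalues, and the columns of P are the corresponding eigenvectors. The following is a minimal sketch of this step, assuming Ω and H have already been assembled; the function name, the use of scipy, and the smallest-eigenvalue convention are assumptions, not part of the patent:

    import numpy as np
    from scipy.linalg import eig

    def update_projection_matrix(X, Omega, H, beta, d):
        # Solve (X Omega X^T + beta*I_m) P = X H X^T P Phi for P,
        # i.e. the generalized eigenproblem A p = phi * B p.
        # Omega and H are assumed precomputed; their defining formulas
        # appear only as images in the source document.
        m = X.shape[0]
        A = X @ Omega @ X.T + beta * np.eye(m)   # left-hand side; beta*I keeps it well-posed
        B = X @ H @ X.T                          # right-hand side
        phi, vecs = eig(A, B)                    # generalized eigenvalues and eigenvectors
        order = np.argsort(phi.real)             # smallest multipliers first (a JDA-style convention; an assumption)
        return vecs[:, order[:d]].real           # d projection directions as columns of P

Whether the smallest or largest d eigenvalues are kept depends on whether the underlying trace objective is minimised or maximised, which the text reproduced here does not state.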
In some possible implementations, the loop module is specifically configured to: calculate the updated graph matrix according to the label and the pseudo label; calculate the updated category weight matrix according to the label and the pseudo label; and calculate the updated Laplacian matrix of the target domain according to the label and the pseudo label. The three update formulas survive only as formula images in the source and are not reproduced here.
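Even with those formulas unavailable, the overall alternation that the loop module drives follows directly from the module descriptions. The sketch below is a Python skeleton in which update_G, update_Mc, update_Lt, and assemble_Omega_H are hypothetical placeholder callables standing in for the image-only formulas, and update_projection_matrix is the sketch given earlier:

    import numpy as np

    def run_class_consistency_loop(Xs, ys, Xt, Ls, n_cycles, beta, d,
                                   update_G, update_Mc, update_Lt,
                                   assemble_Omega_H, fit_classifier):
        # Alternating loop sketched from the module descriptions above.
        # update_G / update_Mc / update_Lt / assemble_Omega_H are hypothetical
        # placeholders for formulas that survive only as images; Ls is the
        # source-domain Laplacian, fixed after initialization.
        predict = fit_classifier(Xs.T, ys)           # first image classifier
        pseudo = predict(Xt.T)                       # initialization pseudo labels
        X = np.hstack([Xs, Xt])                      # X = [Xs, Xt], samples as columns
        for _ in range(n_cycles):                    # the preset number of cycles
            G = update_G(ys, pseudo)                 # updated graph matrix
            Mc = update_Mc(ys, pseudo)               # updated category weight matrix
            Lt = update_Lt(ys, pseudo)               # updated target-domain Laplacian
            Omega, H = assemble_Omega_H(G, Mc, Ls, Lt)
            P = update_projection_matrix(X, Omega, H, beta, d)   # sketch above
            Zs, Zt = P.T @ Xs, P.T @ Xt              # projected source / target data
            predict = fit_classifier(Zs.T, ys)       # second image classifier
            pseudo = predict(Zt.T)                   # refreshed pseudo labels
        return pseudo                                # final classification result

Each pass refreshes the pseudo labels with a classifier trained on the newly projected source data, which is what allows the class-wise structure terms to tighten from one cycle to the next.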
In some possible implementations, the pseudo label update module is specifically configured to: train an image classifier using the projected source domain sample data with the label to obtain a second image classifier; and classify the projected target domain sample data using the second image classifier to obtain a first classification result of each projected target domain sample data, the first classification result being a pseudo label.
In some possible implementations, the pseudo label initialization module is specifically configured to: train an image classifier using the source domain data set to obtain the trained first image classifier; and classify each target domain sample image using the first image classifier to obtain a second classification result of each target domain sample image, the second classification result being an initialization pseudo label.
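Both classifier stages follow the same train-then-label pattern; only the feature space changes (raw features for the first image classifier, projected features for the second). A minimal sketch follows, assuming a 1-nearest-neighbour classifier — the patent does not fix a particular classifier, so that choice and the sklearn dependency are illustrative:

    from sklearn.neighbors import KNeighborsClassifier

    def fit_classifier(features, labels):
        # Train an image classifier and return its labelling function.
        # 1-NN is an illustrative stand-in for "an image classifier".
        clf = KNeighborsClassifier(n_neighbors=1).fit(features, labels)
        return clf.predict

    # First image classifier: raw source features -> initialization pseudo labels.
    #   predict = fit_classifier(Xs.T, ys); pseudo0 = predict(Xt.T)
    # Second image classifier: projected source data P^T Xs -> refreshed pseudo
    # labels for the projected target data P^T Xt.
    #   predict = fit_classifier((P.T @ Xs).T, ys); pseudo = predict((P.T @ Xt).T)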
It should be noted that the information interaction between the above devices/units, their execution processes, their specific functions, and their technical effects are based on the same concept as the method embodiments of the present application; for details, reference may be made to the method embodiment section, and they are not described here again.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device 4 of this embodiment includes: at least one processor 40 (only one shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40, the processor 40 implementing the steps in any of the various cross-domain image classification method embodiments described above when executing the computer program 42.
The electronic device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The electronic device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation of it; the device may include more or fewer components than shown, or combine certain components, or use different components, such as an input-output device or a network access device.
The processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may in some embodiments be an internal storage unit of the electronic device 4, such as a hard disk or a memory of the electronic device 4. In other embodiments, the memory 41 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the electronic device 4. The memory 41 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides an electronic device, including: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on an electronic device, causes the electronic device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. For example, the above-described apparatus/electronic device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A cross-domain image classification method based on class consistency structured learning is characterized by comprising the following steps:
obtaining a source domain data set and a target domain data set, wherein the source domain data set comprises a source domain sample image and a label of the source domain sample image, and the target domain data set comprises a target domain sample image;
obtaining an initialized pseudo label of each target domain sample image based on a first image classifier trained by using the source domain data set;
projecting the source domain data set and the target domain data set to the same public space to obtain source domain sample points and target domain sample points, and initializing according to the source domain sample points and the target domain sample points of the same type based on the initialized pseudo labels and the labels to obtain a projection matrix, a graph matrix, a type weight matrix, a Laplace matrix of a source domain and a Laplace matrix of a target domain;
updating the projection matrix according to the graph matrix, the class weight matrix, the Laplace matrix of the source domain and the Laplace matrix of the target domain to obtain an updated projection matrix;
performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
classifying the projected target domain sample data based on a second image classifier trained by using the projected source domain sample data with the label to obtain a pseudo label of the projected target domain sample data;
when the number of cycles does not reach the preset number, respectively updating the graph matrix, the category weight matrix, and the Laplacian matrix of the target domain according to the label and the pseudo label to obtain an updated graph matrix, an updated category weight matrix, and an updated Laplacian matrix of the target domain, and returning to the step of updating the projection matrix according to the graph matrix, the category weight matrix, the Laplacian matrix of the source domain, and the Laplacian matrix of the target domain, using the updated graph matrix, the updated category weight matrix, and the updated Laplacian matrix of the target domain, to obtain an updated projection matrix;
when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain, and the Laplacian matrix of the target domain to obtain an updated projection matrix comprises: obtaining the updated projection matrix by the formula (XΩX^T + βI_m)P = XHX^T PΦ, where β is a hyper-parameter;
wherein X = [X_s, X_t] ∈ R^(m×n), X_s is the source domain data set, X_t is the target domain data set, P is the projection matrix, I_m ∈ R^(m×m) is an identity matrix of dimension m, R^(m×m) denotes the real number space of dimension m×m, R^(m×n) denotes the real number space of dimension m×n, and m and n respectively denote space dimensions;
Φ = diag(Φ_1, Φ_2, ..., Φ_d) ∈ R^(d×d) is a diagonal matrix whose diagonal elements are Lagrange multipliers;
the defining formulas of Ω, of the class weight matrix M_c, and of several of the auxiliary quantities below survive only as formula images in the source and are not reproduced here; α, η and γ are all hyper-parameters;
v^(i) is the set of target domain sample points containing the first ⌈·⌉ nearest neighbour points of z_t,i, where the count inside the ceiling appears only as an image (plausibly ⌈δk⌉ given the definitions below); δ ∈ [0,1] is a preset adjacency factor, and ⌈·⌉ is the round-up (ceiling) symbol;
a further image formula encodes whether two sample points carry the same category label, with y(·) denoting the category label of a sample point and the paired point drawn from its neighbourhood points;
α_i, α_j denote the weight coefficient of each sample in the target domain; Z_t denotes the form of the target domain data in the projection space, Z_t = P^T X_t, and z_t,i denotes the i-th data in Z_t;
a source domain sample point and a target domain sample point, and the source domain sample points and target domain sample points of category c, are denoted by symbols that appear only as images;
L_s is the Laplacian matrix of the source domain and L_t is the Laplacian matrix of the target domain; G is the graph matrix, whose entries are given by an image formula over R^(n_s×n_t), the real number space of dimension n_s×n_t, where n_s and n_t denote space dimensions;
z_i = P^T x_i and z_j = P^T x_j are projected sample points, and y(z_i), y(z_j) are the labels of the projected sample points;
u^(i) is the set of target domain sample points containing the k nearest neighbour points of a given target domain point (the anchor symbol appears only as an image; plausibly z_t,i), k is the preset number of nearest neighbours, c ∈ {1, 2, ..., C}, and C is the number of categories common to the source domain and the target domain;
the numbers of samples of category c in the source domain and the target domain, and the sample points of category c in the source domain and the target domain, are likewise denoted by image symbols; n = n_s + n_t;
I_n and I_d are identity matrices of dimensions n and d respectively, and 1_(n×n) is a square matrix whose elements are all 1; the image formula introducing these symbols plausibly defines the centering matrix H = I_n − (1/n)·1_(n×n).
2. The method of claim 1, wherein updating the graph matrix, the class weight matrix, and the target domain laplacian matrix according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix, and an updated target domain laplacian matrix respectively comprises:
calculating the updated graph matrix according to the label and the pseudo label;
calculating the updated category weight matrix according to the label and the pseudo label; and
calculating the updated Laplacian matrix of the target domain according to the label and the pseudo label;
wherein the three update formulas survive only as formula images in the source and are not reproduced here.
3. The method of claim 1, wherein classifying the projected target domain sample data based on a second image classifier trained using the projected source domain sample data with the label to obtain a pseudo label for the projected target domain sample data comprises:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
classifying the projected target domain sample data by using the second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is the pseudo label.
4. The method of claim 1, wherein obtaining an initialized pseudo label for each of the target domain sample images based on a first image classifier trained using the source domain dataset comprises:
training an image classifier by using the source domain data set to obtain the trained first image classifier;
classifying each target domain sample image by using the first image classifier to obtain a second classification result of each target domain sample image, wherein the second classification result is the initialized pseudo label.
5. A cross-domain image classification device based on class consistency structured learning is characterized by comprising the following components:
an obtaining module, configured to obtain a source domain data set and a target domain data set, where the source domain data set includes a source domain sample image and a label of the source domain sample image, and the target domain data set includes a target domain sample image;
a pseudo label initialization module, configured to obtain an initialized pseudo label of each target domain sample image based on a first image classifier trained using the source domain data set;
the initialization module is used for projecting the source domain data set and the target domain data set to the same public space to obtain source domain sample points and target domain sample points, and initializing according to the source domain sample points and the target domain sample points of the same type based on the initialized pseudo labels and the labels to obtain a projection matrix, a graph matrix, a type weight matrix, a Laplace matrix of a source domain and a Laplace matrix of a target domain;
the projection matrix updating module is used for updating the projection matrix according to the graph matrix, the category weight matrix, the Laplace matrix of the source domain and the Laplace matrix of the target domain to obtain an updated projection matrix;
the projection module is used for performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
a pseudo label updating module, configured to classify the projected target domain sample data based on a second image classifier trained using the projected source domain sample data with the label, and obtain a pseudo label of the projected target domain sample data;
the circulation module is used for: when the number of cycles does not reach a preset number, respectively updating the graph matrix, the category weight matrix, and the Laplacian matrix of the target domain according to the label and the pseudo label to obtain an updated graph matrix, an updated category weight matrix, and an updated Laplacian matrix of the target domain, and returning to the step of updating the projection matrix according to the graph matrix, the category weight matrix, the Laplacian matrix of the source domain, and the Laplacian matrix of the target domain, using the updated graph matrix, the updated category weight matrix, and the updated Laplacian matrix of the target domain, to obtain an updated projection matrix; when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein the projection matrix updating module is specifically configured to:
obtain the updated projection matrix by the formula (XΩX^T + βI_m)P = XHX^T PΦ, where β is a hyper-parameter;
wherein X = [X_s, X_t] ∈ R^(m×n), X_s is the source domain data set, X_t is the target domain data set, P is the projection matrix, I_m ∈ R^(m×m) is an identity matrix of dimension m, R^(m×m) denotes the real number space of dimension m×m, R^(m×n) denotes the real number space of dimension m×n, and m and n respectively denote space dimensions; Φ = diag(Φ_1, Φ_2, ..., Φ_d) ∈ R^(d×d) is a diagonal matrix whose diagonal elements are Lagrange multipliers;
the defining formulas of Ω, of the class weight matrix M_c, and of several of the auxiliary quantities below survive only as formula images in the source and are not reproduced here; α, η and γ are all hyper-parameters;
v^(i) is the set of target domain sample points containing the first ⌈·⌉ nearest neighbour points of z_t,i, where the count inside the ceiling appears only as an image (plausibly ⌈δk⌉ given the definitions below); δ ∈ [0,1] is a preset adjacency factor, and ⌈·⌉ is the round-up (ceiling) symbol;
a further image formula encodes whether two sample points carry the same category label, with y(·) denoting the category label of a sample point and the paired point drawn from its neighbourhood points;
α_i, α_j denote the weight coefficient of each sample in the target domain; Z_t denotes the form of the target domain data in the projection space, Z_t = P^T X_t, and z_t,i denotes the i-th data in Z_t;
a source domain sample point and a target domain sample point, and the source domain sample points and target domain sample points of category c, are denoted by symbols that appear only as images;
L_s is the Laplacian matrix of the source domain and L_t is the Laplacian matrix of the target domain; G is the graph matrix, whose entries are given by an image formula over R^(n_s×n_t), the real number space of dimension n_s×n_t, where n_s and n_t denote space dimensions;
z_i = P^T x_i and z_j = P^T x_j are projected sample points, and y(z_i), y(z_j) are the labels of the projected sample points;
u^(i) is the set of target domain sample points containing the k nearest neighbour points of a given target domain point (the anchor symbol appears only as an image; plausibly z_t,i), k is the preset number of nearest neighbours, c ∈ {1, 2, ..., C}, and C is the number of categories common to the source domain and the target domain;
the numbers of samples of category c in the source domain and the target domain, and the sample points of category c in the source domain and the target domain, are likewise denoted by image symbols; n = n_s + n_t;
I_n and I_d are identity matrices of dimensions n and d respectively, and 1_(n×n) is a square matrix whose elements are all 1; the image formula introducing these symbols plausibly defines the centering matrix H = I_n − (1/n)·1_(n×n).
6. The apparatus of claim 5, wherein the circulation module is specifically configured to:
calculate the updated graph matrix according to the label and the pseudo label;
calculate the updated category weight matrix according to the label and the pseudo label; and
calculate the updated Laplacian matrix of the target domain according to the label and the pseudo label;
wherein the three update formulas survive only as formula images in the source and are not reproduced here.
7. The apparatus of claim 5, wherein the pseudo tag update module is specifically configured to:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
classifying the projected target domain sample data by using the second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is the pseudo label.
8. The apparatus of claim 5, wherein the pseudo tag initialization module is specifically configured to:
training an image classifier by using the source domain data set to obtain the trained first image classifier;
classifying each target domain sample image by using the first image classifier to obtain a second classification result of each target domain sample image, wherein the second classification result is the initialized pseudo label.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202111530728.1A 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device Active CN113920382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111530728.1A CN113920382B (en) 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111530728.1A CN113920382B (en) 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device

Publications (2)

Publication Number Publication Date
CN113920382A CN113920382A (en) 2022-01-11
CN113920382B true CN113920382B (en) 2022-03-15

Family

ID=79248895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111530728.1A Active CN113920382B (en) 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device

Country Status (1)

Country Link
CN (1) CN113920382B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219047B (en) * 2022-02-18 2022-05-10 深圳大学 Heterogeneous domain self-adaption method, device and equipment based on pseudo label screening
CN117237857B (en) * 2023-11-13 2024-02-09 腾讯科技(深圳)有限公司 Video understanding task execution method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348081A (en) * 2020-11-05 2021-02-09 平安科技(深圳)有限公司 Transfer learning method for image classification, related device and storage medium
CN113420775A (en) * 2021-03-31 2021-09-21 中国矿业大学 Image classification method under extremely small quantity of training samples based on adaptive subdomain field adaptation of non-linearity

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320387A1 (en) * 2010-06-28 2011-12-29 International Business Machines Corporation Graph-based transfer learning
US10956817B2 (en) * 2018-04-18 2021-03-23 Element Ai Inc. Unsupervised domain adaptation with similarity learning for images
CN109299676A (en) * 2018-09-07 2019-02-01 电子科技大学 A kind of visual pursuit method of combining classification and domain adaptation
KR20200075344A (en) * 2018-12-18 2020-06-26 삼성전자주식회사 Detector, method of object detection, learning apparatus, and learning method for domain transformation
CN110348579B (en) * 2019-05-28 2023-08-29 北京理工大学 Domain self-adaptive migration feature method and system
CN111340021B (en) * 2020-02-20 2022-07-15 中国科学技术大学 Unsupervised domain adaptive target detection method based on center alignment and relation significance
CN111783831B (en) * 2020-05-29 2022-08-05 河海大学 Complex image accurate classification method based on multi-source multi-label shared subspace learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348081A (en) * 2020-11-05 2021-02-09 平安科技(深圳)有限公司 Transfer learning method for image classification, related device and storage medium
CN113420775A (en) * 2021-03-31 2021-09-21 中国矿业大学 Image classification method under extremely small quantity of training samples based on adaptive subdomain field adaptation of non-linearity

Also Published As

Publication number Publication date
CN113920382A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN110033026B (en) Target detection method, device and equipment for continuous small sample images
CN113920382B (en) Cross-domain image classification method based on class consistency structured learning and related device
JP5848833B2 (en) Method and system for comparing images
EP2668618A1 (en) Method and system for comparing images
JP2015504215A5 (en)
CN108985190B (en) Target identification method and device, electronic equipment and storage medium
Shetty et al. Segmentation and labeling of documents using conditional random fields
Wang et al. Optimal transport for label-efficient visible-infrared person re-identification
CN111223128A (en) Target tracking method, device, equipment and storage medium
Son et al. Spectral clustering with brainstorming process for multi-view data
CN114169381A (en) Image annotation method and device, terminal equipment and storage medium
CN111104941B (en) Image direction correction method and device and electronic equipment
Xu et al. Graphical modeling for multi-source domain adaptation
Zhu et al. Multi-view multi-sparsity kernel reconstruction for multi-class image classification
US11941792B2 (en) Machine learning-based analysis of computing device images included in requests to service computing devices
CN114444565A (en) Image tampering detection method, terminal device and storage medium
Ghalyan Estimation of ergodicity limits of bag-of-words modeling for guaranteed stochastic convergence
CN112364916A (en) Image classification method based on transfer learning, related equipment and storage medium
JP2019086979A (en) Information processing device, information processing method, and program
Wang et al. Bayesian denoising hashing for robust image retrieval
Singh et al. Meta-DZSL: a meta-dictionary learning based approach to zero-shot recognition
CN111695526B (en) Network model generation method, pedestrian re-recognition method and device
Rad et al. A multi-view-group non-negative matrix factorization approach for automatic image annotation
Yan et al. Variational Bayesian learning for background subtraction based on local fusion feature
CN114492640A (en) Domain-adaptive-based model training method, target comparison method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant