CN113920382B - Cross-domain image classification method based on class consistency structured learning and related device

Info

Publication number: CN113920382B (granted publication of application CN113920382A)
Application number: CN202111530728.1A
Authority: CN (China)
Legal status: Active
Prior art keywords: matrix, target domain, label, domain, updated
Inventors: 陆玉武, 罗幸萍, 林德伟
Original and current assignee: Shenzhen University
Application filed by Shenzhen University; priority to CN202111530728.1A
Other languages: Chinese (zh)

Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24143 Pattern recognition: classification techniques based on distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]

Abstract

The embodiments of the present application disclose a cross-domain image classification method and device based on class consistency structured learning, an electronic device, and a computer-readable storage medium. The method comprises the following steps: obtaining initialized pseudo labels for the target domain samples from a first image classifier trained on source domain data; initializing a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain based on the initialized pseudo labels and the source domain labels; updating the projection matrix using the initialized matrices; performing projection learning with the updated projection matrix to update the initialized pseudo labels; and finally, when the number of cycles reaches a preset number, taking the updated pseudo labels as the final classification results of the target domain sample images. By learning a class-wise Laplacian matrix from the source domain to the target domain, the consistency and continuity of intra-class samples are improved, and the classification performance of the model on target domain samples is improved accordingly.

Description

Cross-domain image classification method based on class consistency structured learning and related device
Technical Field
The application belongs to the technical field of machine learning, and particularly relates to a cross-domain image classification method and device based on class consistency structured learning, electronic equipment and a computer-readable storage medium.
Background
Unsupervised domain adaptation refers to a machine learning approach that trains, on a labeled source domain, a model to be applied to an unlabeled target domain.
Data distribution differences (marginal distribution differences and conditional distribution differences) may exist between the labeled source domain data set and the unlabeled target domain data set; therefore, when a model trained on the source domain is applied to the target domain, its performance may degrade significantly ("overfitting" to the source domain). To alleviate the data distribution difference between the source domain and the target domain, conventional unsupervised domain adaptation methods may adopt feature-adaptation-based approaches, such as Transfer Component Analysis (TCA) and Joint Distribution Adaptation (JDA), or instance-weighting-based approaches, such as Transfer Joint Matching (TJM) and Coupled Knowledge Transfer (CKET).
To overcome the data distribution difference between the source domain and the target domain, most current unsupervised domain adaptation methods for image classification introduce marginal distribution matching and conditional distribution matching. However, after the sample data undergoes marginal and conditional distribution matching, sample points of the same class from different domains become scattered and loosely distributed; that is, intra-class samples across domains are sparsely distributed and lack consistency and continuity. Such scattered, sparse same-class sample clusters greatly reduce the classification performance of the model on target domain samples.
Disclosure of Invention
The embodiments of the present application provide a cross-domain image classification method and device based on class consistency structured learning, an electronic device, and a computer-readable storage medium, which can solve the problem that existing unsupervised domain adaptation methods yield low classification performance on target domain samples due to the poor consistency and continuity of intra-class samples.
In a first aspect, an embodiment of the present application provides a cross-domain image classification method based on class consistency structured learning, including:
acquiring a source domain data set and a target domain data set, wherein the source domain data set comprises source domain sample images and labels of the source domain sample images, and the target domain data set comprises target domain sample images;
obtaining an initialization pseudo label of each target domain sample image based on a first image classifier trained by using a source domain data set;
projecting the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, performing initialization according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain;
updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix;
performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
classifying the projected target domain sample data based on a second image classifier trained by using the projected source domain sample data with the label to obtain a pseudo label of the projected target domain sample data;
when the number of cycles has not reached a preset number, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and, based on the updated graph matrix, the updated class weight matrix and the updated Laplacian matrix of the target domain, returning to the step of updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix;
when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix comprises:

obtaining the updated projection matrix by the formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$, wherein $\beta$ is a hyper-parameter;

wherein $X = [X_s, X_t] \in R^{m \times n}$, $X_s$ is the source domain data set, $X_t$ is the target domain data set, $P$ is the projection matrix, $I_m \in R^{m \times m}$ is the identity matrix of dimension $m$, $R^{m \times m}$ denotes the real space of dimension $m \times m$, $R^{m \times n}$ denotes the real space of dimension $m \times n$, and $m$ and $n$ denote space dimensions;

$\Phi = \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_d) \in R^{d \times d}$ is a diagonal matrix whose diagonal elements are Lagrange multipliers;

$\Omega$ combines the graph matrix $G$, the class weight matrices $M_c$ and the Laplacian matrices of the two domains, weighted by the hyper-parameters $\alpha$, $\eta$ and $\gamma$; its exact expression, like the other expressions noted below, is available only as an image formula in the original publication; $M_c$ is the class weight matrix (image formula);

$v^{(i)}$ is the set of target domain sample points containing the first $\lceil \delta n_t \rceil$ nearest neighbor points of $z_{t,i}$, wherein $\delta \in [0,1]$ is a preset adjacency factor and $\lceil \cdot \rceil$ is the round-up (ceiling) symbol;

$y(z_{t,i}) = y(z_{t,j})$ indicates that the class labels of the sample points $z_{t,i}$ and $z_{t,j}$ are the same, wherein $y(z_{t,i})$ denotes the class label of sample point $z_{t,i}$, $y(z_{t,j})$ denotes the class label of sample point $z_{t,j}$, and $z_{t,j}$ is a neighborhood point of $z_{t,i}$;

$\alpha_i$ and $\alpha_j$ denote the weight coefficients of the samples in the target domain; $Z_t$ denotes the form of the target domain data in the projection space, $Z_t = P^T X_t$, and $z_{t,i}$ denotes the $i$-th datum in $Z_t$;

$x_{s,i}$ and $x_{t,j}$ are a source domain sample point and a target domain sample point, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the source domain sample points and target domain sample points of class $c$;

$L_s$ is the Laplacian matrix of the source domain and $L_t$ is the Laplacian matrix of the target domain (image formulas); $G$ is the graph matrix, whose block expressions (image formulas) involve $R^{n_s \times n_t}$, the real space of dimension $n_s \times n_t$, wherein $n_s$ and $n_t$ denote space dimensions;

$z_i = P^T x_i$ and $z_j = P^T x_j$ are projected sample points, and $y(z_i)$ and $y(z_j)$ are the labels of the projected sample points;

$u^{(i)}$ is the target domain sample point set of the $k$ nearest neighbors of the $i$-th projected source domain sample point (image formula), wherein $k$ is the preset number of nearest neighbors; $c \in \{1, 2, \ldots, C\}$, wherein $C$ is the number of classes common to the source domain and the target domain; $n_s^{(c)}$ and $n_t^{(c)}$ are the numbers of class-$c$ samples in the source domain and the target domain, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the sample points of class $c$ in the source domain and the target domain, respectively; and $n = n_s + n_t$;

$H = I_n - \frac{1}{n} 1_{n \times n}$, wherein $I_n$ and $I_d$ are identity matrices of dimensions $n$ and $d$, respectively, and $1_{n \times n}$ is a square matrix whose elements are all 1.
Therefore, by learning the class-wise Laplacian matrix from the source domain to the target domain, the consistency and continuity of intra-class samples across domains are improved, which further improves the classification performance of the model on target domain samples and promotes knowledge transfer from the source domain to the target domain.
In some possible implementations of the first aspect, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain comprises: calculating the updated graph matrix according to the labels and the pseudo labels by the graph matrix formula, i.e. equation (2) in the detailed description below (available only as an image formula in the original); calculating the updated class weight matrix according to the labels and the pseudo labels by the class weight matrix formula, i.e. equation (7) below; and calculating the updated Laplacian matrix of the target domain according to the labels and the pseudo labels by the target-domain Laplacian formula, i.e. equation (8) below.
In some possible implementations of the first aspect, classifying the projected target domain sample data based on a second image classifier trained using the labeled projected source domain sample data to obtain the pseudo labels of the projected target domain sample data includes:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
and classifying the projected target domain sample data by using a second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is a pseudo label.
In some possible implementations of the first aspect, obtaining an initialization pseudo label for each target domain sample image based on a first image classifier trained using a source domain dataset includes:
training an image classifier by using a source domain data set to obtain a trained first image classifier;
and classifying the sample images of each target domain by using the first image classifier to obtain a second classification result of the sample images of each target domain, wherein the second classification result is an initialization pseudo label.
In a second aspect, an embodiment of the present application provides a cross-domain image classification device based on class consistency structured learning, including:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a source domain data set and a target domain data set, the source domain data set comprises a source domain sample image and a label of the source domain sample image, and the target domain data set comprises a target domain sample image;
the pseudo label initialization module is used for obtaining initialization pseudo labels of all target domain sample images based on a first image classifier trained by using a source domain data set;
the initialization module is used for projecting the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, performing initialization according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain;
the projection matrix updating module is used for updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix;
the projection module is used for performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
the pseudo label updating module is used for classifying the projected target domain sample data based on a second image classifier trained by using the projected source domain sample data with the label to obtain a pseudo label of the projected target domain sample data;
the circulation module is used for, when the number of cycles has not reached a preset number, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and, based on the updated graph matrix, the updated class weight matrix and the updated Laplacian matrix of the target domain, returning to the operation of updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix; and for, when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein, the projection matrix updating module is specifically configured to:
obtain the updated projection matrix by the formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$, wherein $\beta$ is a hyper-parameter;

wherein $X = [X_s, X_t] \in R^{m \times n}$, $X_s$ is the source domain data set, $X_t$ is the target domain data set, $P$ is the projection matrix, $I_m \in R^{m \times m}$ is the identity matrix of dimension $m$, $R^{m \times m}$ denotes the real space of dimension $m \times m$, $R^{m \times n}$ denotes the real space of dimension $m \times n$, and $m$ and $n$ denote space dimensions;

$\Phi = \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_d) \in R^{d \times d}$ is a diagonal matrix whose diagonal elements are Lagrange multipliers;

$\Omega$ combines the graph matrix $G$, the class weight matrices $M_c$ and the Laplacian matrices of the two domains, weighted by the hyper-parameters $\alpha$, $\eta$ and $\gamma$; its exact expression, like the other expressions noted below, is available only as an image formula in the original publication; $M_c$ is the class weight matrix (image formula);

$v^{(i)}$ is the set of target domain sample points containing the first $\lceil \delta n_t \rceil$ nearest neighbor points of $z_{t,i}$, wherein $\delta \in [0,1]$ is a preset adjacency factor and $\lceil \cdot \rceil$ is the round-up (ceiling) symbol;

$y(z_{t,i}) = y(z_{t,j})$ indicates that the class labels of the sample points $z_{t,i}$ and $z_{t,j}$ are the same, wherein $y(z_{t,i})$ denotes the class label of sample point $z_{t,i}$, $y(z_{t,j})$ denotes the class label of sample point $z_{t,j}$, and $z_{t,j}$ is a neighborhood point of $z_{t,i}$;

$\alpha_i$ and $\alpha_j$ denote the weight coefficients of the samples in the target domain; $Z_t$ denotes the form of the target domain data in the projection space, $Z_t = P^T X_t$, and $z_{t,i}$ denotes the $i$-th datum in $Z_t$;

$x_{s,i}$ and $x_{t,j}$ are a source domain sample point and a target domain sample point, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the source domain sample points and target domain sample points of class $c$;

$L_s$ is the Laplacian matrix of the source domain and $L_t$ is the Laplacian matrix of the target domain (image formulas); $G$ is the graph matrix, whose block expressions (image formulas) involve $R^{n_s \times n_t}$, the real space of dimension $n_s \times n_t$, wherein $n_s$ and $n_t$ denote space dimensions;

$z_i = P^T x_i$ and $z_j = P^T x_j$ are projected sample points, and $y(z_i)$ and $y(z_j)$ are the labels of the projected sample points;

$u^{(i)}$ is the target domain sample point set of the $k$ nearest neighbors of the $i$-th projected source domain sample point (image formula), wherein $k$ is the preset number of nearest neighbors; $c \in \{1, 2, \ldots, C\}$, wherein $C$ is the number of classes common to the source domain and the target domain; $n_s^{(c)}$ and $n_t^{(c)}$ are the numbers of class-$c$ samples in the source domain and the target domain, respectively; $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the sample points of class $c$ in the source domain and the target domain, respectively; and $n = n_s + n_t$;

$H = I_n - \frac{1}{n} 1_{n \times n}$, wherein $I_n$ and $I_d$ are identity matrices of dimensions $n$ and $d$, respectively, and $1_{n \times n}$ is a square matrix whose elements are all 1.
In some possible implementations of the second aspect, the circulation module is specifically configured to: calculate the updated graph matrix according to the labels and the pseudo labels by the graph matrix formula, i.e. equation (2) in the detailed description (available only as an image formula in the original); calculate the updated class weight matrix according to the labels and the pseudo labels by the class weight matrix formula, i.e. equation (7); and calculate the updated Laplacian matrix of the target domain according to the labels and the pseudo labels by the target-domain Laplacian formula, i.e. equation (8).
In some possible implementations of the second aspect, the pseudo tag updating module is specifically configured to:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
and classifying the projected target domain sample data by using a second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is a pseudo label.
In some possible implementations of the second aspect, the pseudo tag initialization module is specifically configured to:
training an image classifier by using a source domain data set to obtain a trained first image classifier;
and classifying the sample images of each target domain by using the first image classifier to obtain a second classification result of the sample images of each target domain, wherein the second classification result is an initialization pseudo label.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method according to any one of the first aspect is implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program is executed by a processor to implement the method according to any one of the above first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on an electronic device, causes the electronic device to perform the method of any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
Fig. 1 is a schematic flowchart of a cross-domain image classification method based on class consistency structured learning according to an embodiment of the present application;
fig. 2 is another schematic flowchart of the cross-domain image classification method based on class consistency structured learning according to an embodiment of the present application;
fig. 3 is a schematic block diagram of the structure of a cross-domain image classification device based on class consistency structured learning according to an embodiment of the present application;
fig. 4 is a block diagram schematically illustrating a structure of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [described condition or event]" or "in response to detecting [described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
At present, most unsupervised domain adaptation methods do not consider the consistency and continuity of same-class samples across domains, so after convergence through multiple rounds of projection learning, inter-class distribution boundaries become blurred and intra-class sample distributions become sparse.
To improve the consistency and continuity of intra-class samples, the embodiments of the present application provide a cross-domain image classification method based on class consistency structured learning (Cross-Domain Class-Wise Structure Learning, CCSL), which promotes knowledge transfer from the source domain to the target domain by learning a class-wise Laplacian matrix from the source domain to the target domain on the basis of the CKET algorithm.
In addition, considering that the pseudo label learned by the target domain has certain uncertainty, the embodiment of the application provides a pseudo label credibility mechanism based on the sample distribution characteristics so as to improve the credibility of the pseudo label of the target domain and further reduce the risk of negative knowledge migration from the source domain to the target domain.
Referring to fig. 1, a flowchart of a cross-domain image classification method based on class consistency structured learning according to an embodiment of the present application is shown, where the method includes the following steps:
step S101, a source domain data set and a target domain data set are obtained, wherein the source domain data set comprises source domain sample images and labels of the source domain sample images, and the target domain data set comprises target domain sample images.
It is to be understood that the source domain data set is labeled data and the target domain data set is unlabeled data. The label of a source domain sample image characterizes the class to which the image belongs.
Illustratively, the source domain data set and the target domain data set may use existing public data sets, such as Office-Caltech (SURF), COIL20 and PIE. Office-Caltech (SURF) and COIL20 are data sets for object recognition, and PIE is a data set for face pose recognition.
Taking the Office-Caltech (SURF) data set as an example, Office-Caltech (SURF) includes four sub data sets: C (Caltech), A (Amazon), W (Webcam) and D (DSLR). Each sub data set comprises a different number of pictures, and the four sub data sets share 10 common categories. One of the four sub data sets is randomly selected as the source domain data set, and one of the remaining three is selected as the target domain data set. For example, the Caltech sub data set is selected as the source domain data set and the Amazon sub data set as the target domain data set. The Caltech sub data set comprises 1123 pictures and the Amazon sub data set comprises 958 pictures; each picture is compressed and its features are extracted into an 800-dimensional column vector, finally yielding the source domain data matrix $X_s \in R^{800 \times 1123}$ and the target domain data matrix $X_t \in R^{800 \times 958}$. The label information of the source domain is the true category of each picture, giving the source domain sample labels $Y_s \in R^{1123 \times 1}$; each element $y_s$ of $Y_s$ represents the label of one sample, and since there are ten categories, $y_s \in \{1, 2, \ldots, 10\}$.
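As a concrete illustration of the data shapes just described, the following sketch assembles placeholder matrices with the stated dimensions; the random arrays merely stand in for real SURF features, since loading the actual Office-Caltech data is outside the scope of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder matrices with the shapes stated above (Caltech -> Amazon task).
Xs = rng.random((800, 1123))              # source: 800-dim SURF column vectors, 1123 pictures
Xt = rng.random((800, 958))               # target: 958 pictures
Ys = rng.integers(1, 11, size=(1123, 1))  # source labels, y_s in {1, ..., 10}

X = np.hstack([Xs, Xt])                   # X = [Xs, Xt] in R^{800 x 2081}
```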
In addition, some relevant parameters need to be acquired. The relevant parameters may include α, β, η, γ, δ, projection subspace dimension d, number of neighbors k, and number of iterations T. Illustratively, k is 5 and T is 10.
After the relevant parameters are obtained, the optimal values of the parameters α, β and γ may be found by grid search; illustratively, η is set to 0.5.
In specific applications, the corresponding parameter values can be input according to the selected source domain and target domain data sets. For example, for Office-Caltech10 the subspace dimension is d = 10 and δ = 0.1; for COIL20, d = 20 and δ = 0.05; and for PIE, d = 100 and δ = 0.25.
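A sketch of the grid search mentioned above; the candidate grids, the callables and the scoring criterion are illustrative assumptions (the patent does not specify them), since with an unlabeled target domain one must score candidate settings by a proxy criterion:

```python
from itertools import product

def grid_search(run_ccsl, score, eta=0.5):
    """Try each (alpha, beta, gamma) combination and keep the best-scoring one.
    `run_ccsl` runs the method with the given hyper-parameters; `score`
    evaluates its output. Both are assumed callables, and the grids below
    are illustrative, not taken from the patent."""
    grids = {
        "alpha": [0.01, 0.1, 1.0, 10.0],
        "beta":  [0.01, 0.1, 1.0, 10.0],
        "gamma": [0.01, 0.1, 1.0, 10.0],
    }
    best, best_params = float("-inf"), None
    for a, b, g in product(grids["alpha"], grids["beta"], grids["gamma"]):
        s = score(run_ccsl(alpha=a, beta=b, gamma=g, eta=eta))
        if s > best:
            best, best_params = s, (a, b, g)
    return best_params
```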
Step S102, obtaining initialization pseudo labels of all target domain sample images based on a first image classifier trained by using a source domain data set.
After obtaining the target domain data set, the source domain data set and the relevant parameters, the pseudo labels $y_{t,i}$ ($1 \le i \le n_t$) of the target domain sample images may be initialized in the original space, where the original space is defined relative to the projection space. Illustratively, the initialized pseudo labels of the target domain sample images may be obtained as follows: an image classifier, for example a K-nearest-neighbor classifier, is trained using the labeled source domain sample images to obtain a trained first image classifier. After the trained first image classifier is obtained, each target domain sample image is classified using the first image classifier, and the classification result of each target domain sample image is taken as its initialized pseudo label. The classification result characterizes the class to which the image belongs.
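A minimal sketch of this initialization step, assuming scikit-learn is available and samples are stored as columns as in the earlier sketch; the function name is illustrative, and k = 5 follows the example value given above:

```python
from sklearn.neighbors import KNeighborsClassifier

def init_pseudo_labels(Xs, Ys, Xt, k=5):
    """Train a K-nearest-neighbor classifier on the labeled source samples and
    use it to assign an initial pseudo label to every target domain sample."""
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(Xs.T, Ys.ravel())   # first image classifier, trained on the source domain
    return clf.predict(Xt.T)    # initialized pseudo labels y_{t,i}, 1 <= i <= n_t

Yt_pseudo = init_pseudo_labels(Xs, Ys, Xt)
```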
Step S103, projecting the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, initializing according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain.

In specific applications, each sample image in the source domain data set and each sample image in the target domain data set is projected to the same common space, obtaining the source domain sample point corresponding to each source domain sample image and the target domain sample point corresponding to each target domain sample image. That is, the source domain sample points are the sample points at which the source domain sample images are projected into the common space, and the target domain sample points are the sample points at which the target domain sample images are projected into the common space.

It will be appreciated that the source domain sample points in the common space have corresponding labels, and the label of a source domain sample point in the common space is the same as the label of the source domain sample image in the original space. Similarly, each target domain sample point corresponds to a pseudo label, which is the same as the pseudo label of the target domain sample image in the original space. On this basis, after projection, the class to which each source domain sample point belongs can be determined from the labels, the class to which each target domain sample point belongs can be determined from the pseudo labels, and the target domain sample points and source domain sample points belonging to the same class can thus be determined.
From the target domain sample points and the source domain sample points that belong to the same class, an initialized projection matrix $P = I_m \in R^{m \times m}$ can be obtained. Then, according to the initialized projection matrix, the labels, the pseudo labels and related information, the initialized graph matrix $G$ is calculated by the graph matrix formula, i.e. equation (2) below (available only as an image formula in the original). That formula is obtained as follows:

First, to enhance the classification performance on the target domain images by enhancing the compactness of same-class sample clusters, the following mathematical expression can be obtained, denoted equation (1) (image formula), in which $P \in R^{m \times d}$; $c \in \{1, 2, \ldots, C\}$, wherein $C$ is the number of classes common to the source domain and the target domain; $n_s^{(c)}$ and $n_t^{(c)}$ are the numbers of class-$c$ samples in the source domain and the target domain, respectively; and $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are the sample points of class $c$ in the source domain and the target domain, respectively.

To reassign the weights of the target domain samples based on the sample distribution characteristics, the weight of each target domain sample is determined by the following trust mechanism: when a projected target domain sample point lies within the first $k$ nearest neighbor points of a projected source domain sample point, it is assigned 1; otherwise it is assigned 0 (the formal expression is an image formula). Here $u^{(i)}$ is the set of the $k$ nearest neighbors of the $i$-th projected source domain sample point, and $k$ is the preset number of nearest neighbors.

Based on this, equation (1) can be rewritten as equation (2) (image formula), wherein $X = [X_s, X_t] \in R^{m \times n}$ is the original source and target domain data, $X_s$ is the source domain data set, $X_t$ is the target domain data set, and $n = n_s + n_t$; $z_i = P^T x_i$ and $z_j = P^T x_j$ are projected sample points, and $y(z_i)$ and $y(z_j)$ are the label values of those sample points; the two remaining terms are diagonal matrices (image formulas).
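A small sketch of the trust mechanism just described, under the reading that a target sample receives weight 1 if it appears among the k nearest target-domain neighbors of at least one projected source point; the exact membership rule is an assumption, since the patent's own expression is available only as an image:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def trust_weights(Zs, Zt, k=5):
    """Assumed trust mechanism: alpha_j = 1 if projected target point z_{t,j}
    lies within the k nearest target-domain neighbors u^{(i)} of some projected
    source point z_{s,i}, else 0. Zs and Zt hold projected samples as columns."""
    nn = NearestNeighbors(n_neighbors=k).fit(Zt.T)  # search among target points
    _, idx = nn.kneighbors(Zs.T)                    # neighbor indices, shape (n_s, k)
    alpha = np.zeros(Zt.shape[1])
    alpha[np.unique(idx)] = 1.0                     # targets appearing in some u^{(i)}
    return alpha
```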
After the initialized projection matrix is obtained, the initialized class weight matrix $M_c$ is calculated from the initialized projection matrix, the labels, the pseudo labels and related information by the class weight matrix formula, i.e. equation (7) below (available only as an image formula in the original).

Likewise, based on the initialized projection matrix, the labels, the pseudo labels and related information, the initialized Laplacian matrix $L_s$ of the source domain and the initialized Laplacian matrix $L_t$ of the target domain are calculated by the Laplacian formula, i.e. equation (8) below (image formula). Equations (7) and (8) are obtained as follows:
First, to achieve global matching of the source domain and the target domain, marginal distribution matching and conditional distribution matching are introduced; the specific expression is equation (3) (image formula), in which $x_i$ and $x_j$ denote sample points indexed over the source domain and the target domain; $x_{s,i}$ and $x_{t,j}$ are a source domain sample point and a target domain sample point, respectively; and $x_{s,i}^{(c)}$ and $x_{t,j}^{(c)}$ are a source domain sample point and a target domain sample point of class $c$, respectively.
Then, local manifold learning is added in the source domain and the target domain respectively; the specific expression is equation (4) (image formula).

To ensure a more compact distribution of same-class sample points, the weight coefficients of the target domain samples are reassigned (the reassignment formula is an image formula), wherein $\delta \in [0,1]$ is a preset adjacency factor, $\lceil \cdot \rceil$ is the round-up (ceiling) symbol, and $v^{(i)}$ is the set of target domain sample points containing the first $\lceil \delta n_t \rceil$ nearest neighbor points of $z_{t,i}$.

Based on this, the weight coefficient of each target domain sample point may be defined accordingly (the definition is an image formula).
Up to this point, equations (3) and (4) above can be rewritten as equations (7) and (8), respectively (both image formulas). In equation (7), $1 \le c \le C$, and $M_0$ is assigned as in equation (3) above. In equation (8), the target-domain graph assigns a connection to sample pairs satisfying $y(x_{t,i}) = y(x_{t,j}) \wedge \alpha_i \alpha_j = 1$ (the full expression is an image formula); $L_s$ and $L_t$ are the Laplacian matrices of the source domain and the target domain, respectively, each built from a graph weight matrix and the corresponding diagonal matrix (image formulas), and $L_s$ is solved in a manner similar to $L_t$.

Through the above steps, the initialized target domain pseudo labels, the initialized graph matrix $G$, the initialized class weight matrix $M_c$, the initialized Laplacian matrix $L_s$ of the source domain and the initialized Laplacian matrix $L_t$ of the target domain can all be obtained.
Step S104, updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix.
After the initialized graph matrix $G$, class weight matrix $M_c$, source-domain Laplacian matrix $L_s$ and target-domain Laplacian matrix $L_t$ are obtained, the updated projection matrix $P$ is calculated by the formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$.

Here $I_m \in R^{m \times m}$ is the identity matrix of dimension $m$, and $\Phi = \mathrm{diag}(\Phi_1, \Phi_2, \ldots, \Phi_d) \in R^{d \times d}$ is a diagonal matrix whose diagonal elements are Lagrange multipliers. $\Omega$ combines equations (2), (7) and (8) through the hyper-parameters $\alpha$, $\eta$ and $\gamma$; its exact expression is given only as an image formula in the original.

The formula $(X\Omega X^T + \beta I_m)P = XHX^T P\Phi$ is obtained as follows:

First, equations (2), (7) and (8) above are combined and a regularization term $\beta\|P\|_F^2$ is added, giving the final objective function of CCSL (typeset as an image in the original):

$$\min_P \ \mathrm{tr}\left(P^T X \Omega X^T P\right) + \beta\|P\|_F^2, \quad \mathrm{s.t.}\ P^T X H X^T P = I_d \qquad (9)$$

where the constraint term is derived by Principal Component Analysis (PCA) to maximize the data variance, $H = I_n - \frac{1}{n} 1_{n \times n}$ is the centering matrix, $I_n$ and $I_d$ are identity matrices of dimensions $n$ and $d$ respectively, $1_{n \times n}$ is a square matrix whose elements are all 1, and $\alpha$, $\beta$, $\eta$, $\gamma$ are four preset hyper-parameters.

Equation (9) is a nonlinear function, so it is solved using Lagrange multipliers; the Lagrangian is

$$L(P, \Phi) = \mathrm{tr}\left(P^T (X \Omega X^T + \beta I_m) P\right) + \mathrm{tr}\left((I_d - P^T X H X^T P)\Phi\right) \qquad (10)$$

Finally, taking the derivative of equation (10) with respect to $P$ and setting it to 0 yields:

$$(X\Omega X^T + \beta I_m)P = XHX^T P\Phi \qquad (11)$$
and selecting the first d minimum eigenvectors to form a projection matrix P.
That is, based on the above equation (11), the matrix G and the matrix M are determinedcSource domain laplacian matrix LsAnd the Laplace matrix L of the target domaintAnd calculating a projection matrix P, wherein the projection matrix P is the updated projection matrix.
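A sketch of this update step as a generalized eigenvalue problem, assuming Omega and H have already been assembled as dense numpy arrays; the function name and the small ridge added to the right-hand side are illustrative assumptions, not part of the patent:

```python
import numpy as np
from scipy.linalg import eigh

def update_projection(X, Omega, H, beta, d):
    """Solve (X Omega X^T + beta I_m) P = X H X^T P Phi, i.e. equation (11),
    keeping the eigenvectors of the d smallest eigenvalues as P."""
    m = X.shape[0]
    A = X @ Omega @ X.T + beta * np.eye(m)  # left-hand side of equation (11)
    B = X @ H @ X.T + 1e-9 * np.eye(m)      # right-hand side; tiny ridge since XHX^T can be singular
    # scipy's generalized symmetric solver returns eigenvalues in ascending order.
    _, P = eigh(A, B, subset_by_index=[0, d - 1])
    return P                                # P in R^{m x d}

# Step S105 then projects both domains with the learned P:
# Zs, Zt = P.T @ Xs, P.T @ Xt
```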
Step S105, performing projection learning on the source domain data set and the target domain data set using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data.
After the updated projection matrix $P$ is obtained according to equation (11) above, each source domain sample image in the source domain data set and each target domain sample image in the target domain data set are projected to the same common space using the updated projection matrix $P$, obtaining the projected source domain sample data $Z_s = P^T X_s$ and the projected target domain sample data $Z_t = P^T X_t$.
Step S106, classifying the projected target domain sample data based on a second image classifier trained using the labeled projected source domain sample data, to obtain the pseudo labels of the projected target domain sample data.
In some embodiments, the labeled projected source domain sample data $Z_s = P^T X_s$ is used to train an image classifier, for example a K-nearest-neighbor classifier, to obtain the trained second image classifier. Then, the projected target domain sample data $Z_t = P^T X_t$ is classified using the second image classifier to obtain the classification result of each projected target domain sample datum, and the classification result is taken as the pseudo label of the projected target domain sample data. That is, the initialized pseudo labels are updated to obtain the updated target domain pseudo labels.
Step S107, judging whether the number of cycles reaches the preset number: if not, proceeding to step S108; if so, proceeding to step S109.
In specific applications, steps S104 to S108 are executed in a loop; that is, the projection matrix is continuously updated, and the updated projection matrix is used to update the pseudo labels of the target domain. If, after step S106 is executed, the cumulative number of cycles is greater than or equal to the preset number, the process proceeds to step S109: the pseudo labels calculated in step S106 of the current cycle are taken as the final image classification results, and the updated projection matrix obtained in step S104 of the current cycle is output. Otherwise, if the cumulative number of cycles has not reached the preset number, the process proceeds to step S108 and then returns to step S104.
Step S108, updating the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and returning to step S104.
In specific applications, according to the labels, the pseudo labels and related information, the graph matrix $G$ is updated according to equation (2) above, the class weight matrix $M_c$ according to equation (7) above, and the Laplacian matrix $L_t$ of the target domain according to equation (8) above. The projection matrix is then updated according to the Laplacian matrix $L_s$ of the source domain together with the updated $G$, $M_c$ and $L_t$ obtained in step S108, yielding the updated projection matrix. Next, projection learning is performed on the data in the source domain data set and the target domain data set using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data; an image classifier is trained using the projected source domain sample data to obtain a trained image classifier; and the projected target domain sample data is classified using the trained image classifier to obtain the pseudo labels of the projected target domain sample data. Finally, it is judged whether the current cumulative number of cycles reaches the preset number: if so, the pseudo labels are taken as the final image classification results; if not, the matrix $G$, the matrices $M_c$ and the Laplacian matrix $L_t$ of the target domain are updated again. The cycle repeats until the number of cycles reaches the preset number.
Step S109, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label.
It is to be understood that the pseudo label of each target domain sample image characterizes the category to which that image belongs. In other embodiments, the projection matrix P may be output in addition to the pseudo labels.
It should be noted that, while learning the projection matrix, CKET redistributes the weights of the target domain samples according to the sparsity of their distribution in the projection space. On the basis of CKET, the class-wise Laplacian matrix from the source domain to the target domain is learned, which further improves the consistency and continuity of intra-class samples across domains and thereby promotes knowledge transfer from the source domain to the target domain.
Therefore, by learning the class-wise Laplacian matrix from the source domain to the target domain, the consistency and continuity of intra-class samples across domains are improved, which further improves the classification performance of the model on target domain samples and promotes knowledge transfer from the source domain to the target domain.
To better describe the CCSL method provided in the embodiments of the present application, it is further explained with reference to the other schematic flow diagram of the cross-domain image classification method based on class consistency structured learning shown in fig. 2. Here, the image classifier is specifically a K-nearest-neighbor classifier, and the preset number of cycles is 10.
As shown in fig. 2, the cycle counter is initialized to t = 1, and the corresponding values of the parameters α, β, η, γ, δ, the projection subspace dimension d and the number of neighbor points k are input according to the data sets; that is, the parameter values are chosen according to the selected source domain data set and target domain data set.
The K-nearest-neighbor classifier is then trained using the source domain data set to obtain the trained K-nearest-neighbor classifier.
Next, each sample in the target domain data set is classified using the trained K-nearest-neighbor classifier to initialize the pseudo label of each target domain sample image. The graph matrix $G$ is calculated according to equation (2) above, the class weight matrix $M_c$ according to equation (7) above, and the Laplacian matrix $L_t$ of the target domain and the Laplacian matrix $L_s$ of the source domain according to equation (8) above, thereby initializing $G$, $M_c$, $L_t$ and $L_s$.
Then, the projection matrix $P$ is calculated according to equation (11) above to update the initialized projection matrix, obtaining the updated projection matrix. Projection learning is performed on the source domain data and the target domain data using the updated projection matrix, obtaining the projected source domain data $Z_s = P^T X_s$ and the projected target domain data $Z_t = P^T X_t$. The projected source domain data $Z_s$ is then used to train the K-nearest-neighbor classifier, and the trained classifier classifies the projected target domain data $Z_t$; the classification results are taken as the target domain pseudo labels.
It is then judged whether the number of cycles t is less than or equal to 10. If so, the process returns to recalculating the matrix $G$ according to equation (2), the matrices $M_c$ according to equation (7), and the Laplacian matrix $L_t$ of the target domain and the Laplacian matrix $L_s$ of the source domain according to equation (8). If not, the target domain pseudo labels and the projection matrix $P$ are output, the target domain pseudo labels being the final image classification results.
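Pulling the preceding steps together, the overall iteration shown in fig. 2 can be sketched as follows. It reuses init_pseudo_labels and update_projection from the earlier sketches, while build_graph, build_class_weights, build_laplacians and combine_omega stand in for equations (2), (7), (8) and the image-only definition of Ω; all four are assumed placeholders, not functions from the patent:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def ccsl(Xs, Ys, Xt, beta=0.1, d=10, k=5, T=10):
    """Hedged sketch of the CCSL loop: initialize pseudo labels, then
    alternate projection updates (eq. 11) with pseudo-label updates."""
    X = np.hstack([Xs, Xt])
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix from the PCA constraint
    Yt = init_pseudo_labels(Xs, Ys, Xt, k)           # step S102
    for _ in range(T):                               # steps S104 to S108
        G = build_graph(X, Ys, Yt)                   # equation (2), assumed helper
        M = build_class_weights(X, Ys, Yt)           # equation (7), assumed helper
        Ls, Lt = build_laplacians(Xs, Xt, Ys, Yt)    # equation (8), assumed helper
        Omega = combine_omega(G, M, Ls, Lt)          # weighted by alpha, eta, gamma (image-only formula)
        P = update_projection(X, Omega, H, beta, d)  # equation (11)
        Zs, Zt = P.T @ Xs, P.T @ Xt                  # step S105: projection learning
        clf = KNeighborsClassifier(n_neighbors=k).fit(Zs.T, Ys.ravel())
        Yt = clf.predict(Zt.T)                       # step S106: updated pseudo labels
    return Yt, P                                     # step S109: final classification result
```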
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the method for classifying cross-domain images based on class consistency structure learning described in the foregoing embodiments, fig. 3 shows a block diagram of a cross-domain image classification device based on class consistency structure learning provided in an embodiment of the present application, and for convenience of explanation, only the relevant parts of the embodiment of the present application are shown.
Referring to fig. 3, the apparatus includes:
an obtaining module 31, configured to obtain a source domain data set and a target domain data set, where the source domain data set includes a source domain sample image and a label of the source domain sample image, and the target domain data set includes a target domain sample image;
a pseudo label initialization module 32, configured to obtain an initialized pseudo label of each target domain sample image based on a first image classifier trained using the source domain data set;
the initialization module 33 is configured to project the source domain data set and the target domain data set to the same common space to obtain source domain sample points and target domain sample points, and, based on the initialized pseudo labels and the labels, perform initialization according to the source domain sample points and target domain sample points of the same class to obtain a projection matrix, a graph matrix, a class weight matrix, a Laplacian matrix of the source domain and a Laplacian matrix of the target domain;
a projection matrix updating module 34, configured to update the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain, so as to obtain an updated projection matrix;
the projection module 35 is configured to perform projection learning on the source domain data set and the target domain data set by using the updated projection matrix, and obtain projected source domain sample data and projected target domain sample data;
a pseudo label updating module 36, configured to classify the projected target domain sample data based on a second image classifier trained using the projected source domain sample data with a label, and obtain a pseudo label of the projected target domain sample data;
the circulation module 37 is configured to, when the number of cycles has not reached the preset number, update the graph matrix, the class weight matrix and the Laplacian matrix of the target domain respectively according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix and an updated Laplacian matrix of the target domain, and, based on the updated graph matrix, the updated class weight matrix and the updated Laplacian matrix of the target domain, return to the operation of updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain and the Laplacian matrix of the target domain to obtain an updated projection matrix; and, when the number of cycles reaches the preset number, to obtain the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein the projection matrix updating module is specifically configured to: obtain the updated projection matrix by the formula (XΩX^T + βI_m)P = XHX^T PΦ, where β is a hyper-parameter;
wherein X = [X_s, X_t] ∈ R^(m×n), X_s is the source domain data set, X_t is the target domain data set, P is the projection matrix, I_m ∈ R^(m×m) is an identity matrix of dimension m, R^(m×m) denotes the real number space of dimension m×m, R^(m×n) denotes the real number space of dimension m×n, and m and n respectively denote space dimensions; Φ = diag(Φ_1, Φ_2, ..., Φ_d) ∈ R^(d×d) is a diagonal matrix whose diagonal elements are Lagrange multipliers;
the defining formulas of Ω, of the category weight matrix M_c, and of several of the auxiliary quantities below survive only as formula images in the source and are not reproduced here; α, η and γ are all hyper-parameters;
v^(i) is the set of target domain sample points containing the first ⌈·⌉ nearest neighbour points of z_t,i, where the count inside the ceiling appears only as an image (plausibly ⌈δk⌉ given the definitions below); δ ∈ [0,1] is a preset adjacency factor, and ⌈·⌉ is the round-up (ceiling) symbol;
a further image formula encodes whether two sample points carry the same category label, with y(·) denoting the category label of a sample point and the paired point drawn from its neighbourhood points;
α_i, α_j denote the weight coefficient of each sample in the target domain; Z_t denotes the form of the target domain data in the projection space, Z_t = P^T X_t, and z_t,i denotes the i-th data in Z_t;
a source domain sample point and a target domain sample point, and the source domain sample points and target domain sample points of category c, are denoted by symbols that appear only as images;
L_s is the Laplacian matrix of the source domain and L_t is the Laplacian matrix of the target domain; G is the graph matrix, whose entries are given by an image formula over R^(n_s×n_t), the real number space of dimension n_s×n_t, where n_s and n_t denote space dimensions;
z_i = P^T x_i and z_j = P^T x_j are projected sample points, and y(z_i), y(z_j) are the labels of the projected sample points;
u^(i) is the set of target domain sample points containing the k nearest neighbour points of a given target domain point (the anchor symbol appears only as an image; plausibly z_t,i), k is the preset number of nearest neighbours, c ∈ {1, 2, ..., C}, and C is the number of categories common to the source domain and the target domain;
the numbers of samples of category c in the source domain and the target domain, and the sample points of category c in the source domain and the target domain, are likewise denoted by image symbols; n = n_s + n_t;
I_n and I_d are identity matrices of dimensions n and d respectively, and 1_(n×n) is a square matrix whose elements are all 1; the image formula introducing these symbols plausibly defines the centering matrix H = I_n − (1/n)·1_(n×n).
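Although Ω and H survive only as images, the update equation above has the standard form of a generalized eigenvalue problem: Φ collects the Lagrange multipliers as eigenvalues, and the columns of P are the corresponding eigenvectors. The following is a minimal sketch of this step, assuming Ω and H have already been assembled; the function name, the use of scipy, and the smallest-eigenvalue convention are assumptions, not part of the patent:

    import numpy as np
    from scipy.linalg import eig

    def update_projection_matrix(X, Omega, H, beta, d):
        # Solve (X Omega X^T + beta*I_m) P = X H X^T P Phi for P,
        # i.e. the generalized eigenproblem A p = phi * B p.
        # Omega and H are assumed precomputed; their defining formulas
        # appear only as images in the source document.
        m = X.shape[0]
        A = X @ Omega @ X.T + beta * np.eye(m)   # left-hand side; beta*I keeps it well-posed
        B = X @ H @ X.T                          # right-hand side
        phi, vecs = eig(A, B)                    # generalized eigenvalues and eigenvectors
        order = np.argsort(phi.real)             # smallest multipliers first (a JDA-style convention; an assumption)
        return vecs[:, order[:d]].real           # d projection directions as columns of P

Whether the smallest or largest d eigenvalues are kept depends on whether the underlying trace objective is minimised or maximised, which the text reproduced here does not state.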
In some possible implementations, the loop module is specifically configured to: calculate the updated graph matrix according to the label and the pseudo label; calculate the updated category weight matrix according to the label and the pseudo label; and calculate the updated Laplacian matrix of the target domain according to the label and the pseudo label. The three update formulas survive only as formula images in the source and are not reproduced here.
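Even with those formulas unavailable, the overall alternation that the loop module drives follows directly from the module descriptions. The sketch below is a Python skeleton in which update_G, update_Mc, update_Lt, and assemble_Omega_H are hypothetical placeholder callables standing in for the image-only formulas, and update_projection_matrix is the sketch given earlier:

    import numpy as np

    def run_class_consistency_loop(Xs, ys, Xt, Ls, n_cycles, beta, d,
                                   update_G, update_Mc, update_Lt,
                                   assemble_Omega_H, fit_classifier):
        # Alternating loop sketched from the module descriptions above.
        # update_G / update_Mc / update_Lt / assemble_Omega_H are hypothetical
        # placeholders for formulas that survive only as images; Ls is the
        # source-domain Laplacian, fixed after initialization.
        predict = fit_classifier(Xs.T, ys)           # first image classifier
        pseudo = predict(Xt.T)                       # initialization pseudo labels
        X = np.hstack([Xs, Xt])                      # X = [Xs, Xt], samples as columns
        for _ in range(n_cycles):                    # the preset number of cycles
            G = update_G(ys, pseudo)                 # updated graph matrix
            Mc = update_Mc(ys, pseudo)               # updated category weight matrix
            Lt = update_Lt(ys, pseudo)               # updated target-domain Laplacian
            Omega, H = assemble_Omega_H(G, Mc, Ls, Lt)
            P = update_projection_matrix(X, Omega, H, beta, d)   # sketch above
            Zs, Zt = P.T @ Xs, P.T @ Xt              # projected source / target data
            predict = fit_classifier(Zs.T, ys)       # second image classifier
            pseudo = predict(Zt.T)                   # refreshed pseudo labels
        return pseudo                                # final classification result

Each pass refreshes the pseudo labels with a classifier trained on the newly projected source data, which is what allows the class-wise structure terms to tighten from one cycle to the next.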
In some possible implementations, the pseudo label update module is specifically configured to: train an image classifier using the projected source domain sample data with the label to obtain a second image classifier; and classify the projected target domain sample data using the second image classifier to obtain a first classification result of each projected target domain sample data, the first classification result being a pseudo label.
In some possible implementations, the pseudo label initialization module is specifically configured to: train an image classifier using the source domain data set to obtain the trained first image classifier; and classify each target domain sample image using the first image classifier to obtain a second classification result of each target domain sample image, the second classification result being an initialization pseudo label.
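Both classifier stages follow the same train-then-label pattern; only the feature space changes (raw features for the first image classifier, projected features for the second). A minimal sketch follows, assuming a 1-nearest-neighbour classifier — the patent does not fix a particular classifier, so that choice and the sklearn dependency are illustrative:

    from sklearn.neighbors import KNeighborsClassifier

    def fit_classifier(features, labels):
        # Train an image classifier and return its labelling function.
        # 1-NN is an illustrative stand-in for "an image classifier".
        clf = KNeighborsClassifier(n_neighbors=1).fit(features, labels)
        return clf.predict

    # First image classifier: raw source features -> initialization pseudo labels.
    #   predict = fit_classifier(Xs.T, ys); pseudo0 = predict(Xt.T)
    # Second image classifier: projected source data P^T Xs -> refreshed pseudo
    # labels for the projected target data P^T Xt.
    #   predict = fit_classifier((P.T @ Xs).T, ys); pseudo = predict((P.T @ Xt).T)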
It should be noted that the information interaction between the above devices/units, their execution processes, their specific functions, and their technical effects are based on the same concept as the method embodiments of the present application; for details, reference may be made to the method embodiment section, and they are not described here again.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device 4 of this embodiment includes: at least one processor 40 (only one shown in fig. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40, the processor 40 implementing the steps in any of the various cross-domain image classification method embodiments described above when executing the computer program 42.
The electronic device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The electronic device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of the electronic device 4 and does not constitute a limitation of it; the device may include more or fewer components than shown, or combine certain components, or use different components, such as an input-output device or a network access device.
The processor 40 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may in some embodiments be an internal storage unit of the electronic device 4, such as a hard disk or a memory of the electronic device 4. In other embodiments, the memory 41 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the electronic device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the electronic device 4. The memory 41 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides an electronic device, including: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on an electronic device, causes the electronic device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals or telecommunications signals.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. For example, the above-described apparatus/electronic device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A cross-domain image classification method based on class consistency structured learning is characterized by comprising the following steps:
obtaining a source domain data set and a target domain data set, wherein the source domain data set comprises a source domain sample image and a label of the source domain sample image, and the target domain data set comprises a target domain sample image;
obtaining an initialized pseudo label of each target domain sample image based on a first image classifier trained by using the source domain data set;
projecting the source domain data set and the target domain data set to the same public space to obtain source domain sample points and target domain sample points, and initializing according to the source domain sample points and the target domain sample points of the same type based on the initialized pseudo labels and the labels to obtain a projection matrix, a graph matrix, a type weight matrix, a Laplace matrix of a source domain and a Laplace matrix of a target domain;
updating the projection matrix according to the graph matrix, the class weight matrix, the Laplace matrix of the source domain and the Laplace matrix of the target domain to obtain an updated projection matrix;
performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
classifying the projected target domain sample data based on a second image classifier trained by using the projected source domain sample data with the label to obtain a pseudo label of the projected target domain sample data;
when the number of cycles does not reach the preset number, respectively updating the graph matrix, the category weight matrix, and the Laplacian matrix of the target domain according to the label and the pseudo label to obtain an updated graph matrix, an updated category weight matrix, and an updated Laplacian matrix of the target domain, and returning to the step of updating the projection matrix according to the graph matrix, the category weight matrix, the Laplacian matrix of the source domain, and the Laplacian matrix of the target domain, using the updated graph matrix, the updated category weight matrix, and the updated Laplacian matrix of the target domain, to obtain an updated projection matrix;
when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein updating the projection matrix according to the graph matrix, the class weight matrix, the Laplacian matrix of the source domain, and the Laplacian matrix of the target domain to obtain an updated projection matrix comprises: obtaining the updated projection matrix by the formula (XΩX^T + βI_m)P = XHX^T PΦ, where β is a hyper-parameter;
wherein X = [X_s, X_t] ∈ R^(m×n), X_s is the source domain data set, X_t is the target domain data set, P is the projection matrix, I_m ∈ R^(m×m) is an identity matrix of dimension m, R^(m×m) denotes the real number space of dimension m×m, R^(m×n) denotes the real number space of dimension m×n, and m and n respectively denote space dimensions;
Φ = diag(Φ_1, Φ_2, ..., Φ_d) ∈ R^(d×d) is a diagonal matrix whose diagonal elements are Lagrange multipliers;
the defining formulas of Ω, of the class weight matrix M_c, and of several of the auxiliary quantities below survive only as formula images in the source and are not reproduced here; α, η and γ are all hyper-parameters;
v^(i) is the set of target domain sample points containing the first ⌈·⌉ nearest neighbour points of z_t,i, where the count inside the ceiling appears only as an image (plausibly ⌈δk⌉ given the definitions below); δ ∈ [0,1] is a preset adjacency factor, and ⌈·⌉ is the round-up (ceiling) symbol;
a further image formula encodes whether two sample points carry the same category label, with y(·) denoting the category label of a sample point and the paired point drawn from its neighbourhood points;
α_i, α_j denote the weight coefficient of each sample in the target domain; Z_t denotes the form of the target domain data in the projection space, Z_t = P^T X_t, and z_t,i denotes the i-th data in Z_t;
a source domain sample point and a target domain sample point, and the source domain sample points and target domain sample points of category c, are denoted by symbols that appear only as images;
L_s is the Laplacian matrix of the source domain and L_t is the Laplacian matrix of the target domain; G is the graph matrix, whose entries are given by an image formula over R^(n_s×n_t), the real number space of dimension n_s×n_t, where n_s and n_t denote space dimensions;
z_i = P^T x_i and z_j = P^T x_j are projected sample points, and y(z_i), y(z_j) are the labels of the projected sample points;
u^(i) is the set of target domain sample points containing the k nearest neighbour points of a given target domain point (the anchor symbol appears only as an image; plausibly z_t,i), k is the preset number of nearest neighbours, c ∈ {1, 2, ..., C}, and C is the number of categories common to the source domain and the target domain;
the numbers of samples of category c in the source domain and the target domain, and the sample points of category c in the source domain and the target domain, are likewise denoted by image symbols; n = n_s + n_t;
I_n and I_d are identity matrices of dimensions n and d respectively, and 1_(n×n) is a square matrix whose elements are all 1; the image formula introducing these symbols plausibly defines the centering matrix H = I_n − (1/n)·1_(n×n).
2. The method of claim 1, wherein updating the graph matrix, the class weight matrix, and the target domain laplacian matrix according to the labels and the pseudo labels to obtain an updated graph matrix, an updated class weight matrix, and an updated target domain laplacian matrix respectively comprises:
calculating the updated graph matrix according to the label and the pseudo label;
calculating the updated category weight matrix according to the label and the pseudo label; and
calculating the updated Laplacian matrix of the target domain according to the label and the pseudo label;
wherein the three update formulas survive only as formula images in the source and are not reproduced here.
3. The method of claim 1, wherein classifying the projected target domain sample data based on a second image classifier trained using the projected source domain sample data with the label to obtain a pseudo label for the projected target domain sample data comprises:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
classifying the projected target domain sample data by using the second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is the pseudo label.
4. The method of claim 1, wherein obtaining an initialized pseudo label for each of the target domain sample images based on a first image classifier trained using the source domain dataset comprises:
training an image classifier by using the source domain data set to obtain the trained first image classifier;
classifying each target domain sample image by using the first image classifier to obtain a second classification result of each target domain sample image, wherein the second classification result is the initialized pseudo label.
5. A cross-domain image classification device based on class consistency structured learning is characterized by comprising the following components:
an obtaining module, configured to obtain a source domain data set and a target domain data set, where the source domain data set includes a source domain sample image and a label of the source domain sample image, and the target domain data set includes a target domain sample image;
a pseudo label initialization module, configured to obtain an initialized pseudo label of each target domain sample image based on a first image classifier trained using the source domain data set;
the initialization module is used for projecting the source domain data set and the target domain data set to the same public space to obtain source domain sample points and target domain sample points, and initializing according to the source domain sample points and the target domain sample points of the same type based on the initialized pseudo labels and the labels to obtain a projection matrix, a graph matrix, a type weight matrix, a Laplace matrix of a source domain and a Laplace matrix of a target domain;
the projection matrix updating module is used for updating the projection matrix according to the graph matrix, the category weight matrix, the Laplace matrix of the source domain and the Laplace matrix of the target domain to obtain an updated projection matrix;
the projection module is used for performing projection learning on the source domain data set and the target domain data set by using the updated projection matrix to obtain projected source domain sample data and projected target domain sample data;
a pseudo label updating module, configured to classify the projected target domain sample data based on a second image classifier trained using the projected source domain sample data with the label, and obtain a pseudo label of the projected target domain sample data;
the circulation module is used for: when the number of cycles does not reach a preset number, respectively updating the graph matrix, the category weight matrix, and the Laplacian matrix of the target domain according to the label and the pseudo label to obtain an updated graph matrix, an updated category weight matrix, and an updated Laplacian matrix of the target domain, and returning to the step of updating the projection matrix according to the graph matrix, the category weight matrix, the Laplacian matrix of the source domain, and the Laplacian matrix of the target domain, using the updated graph matrix, the updated category weight matrix, and the updated Laplacian matrix of the target domain, to obtain an updated projection matrix; when the number of cycles reaches the preset number, obtaining the classification result of each target domain sample image, wherein the classification result is the pseudo label;
wherein the projection matrix updating module is specifically configured to:
obtain the updated projection matrix by the formula (XΩX^T + βI_m)P = XHX^T PΦ, where β is a hyper-parameter;
wherein X = [X_s, X_t] ∈ R^(m×n), X_s is the source domain data set, X_t is the target domain data set, P is the projection matrix, I_m ∈ R^(m×m) is an identity matrix of dimension m, R^(m×m) denotes the real number space of dimension m×m, R^(m×n) denotes the real number space of dimension m×n, and m and n respectively denote space dimensions; Φ = diag(Φ_1, Φ_2, ..., Φ_d) ∈ R^(d×d) is a diagonal matrix whose diagonal elements are Lagrange multipliers;
the defining formulas of Ω, of the class weight matrix M_c, and of several of the auxiliary quantities below survive only as formula images in the source and are not reproduced here; α, η and γ are all hyper-parameters;
v^(i) is the set of target domain sample points containing the first ⌈·⌉ nearest neighbour points of z_t,i, where the count inside the ceiling appears only as an image (plausibly ⌈δk⌉ given the definitions below); δ ∈ [0,1] is a preset adjacency factor, and ⌈·⌉ is the round-up (ceiling) symbol;
a further image formula encodes whether two sample points carry the same category label, with y(·) denoting the category label of a sample point and the paired point drawn from its neighbourhood points;
α_i, α_j denote the weight coefficient of each sample in the target domain; Z_t denotes the form of the target domain data in the projection space, Z_t = P^T X_t, and z_t,i denotes the i-th data in Z_t;
a source domain sample point and a target domain sample point, and the source domain sample points and target domain sample points of category c, are denoted by symbols that appear only as images;
L_s is the Laplacian matrix of the source domain and L_t is the Laplacian matrix of the target domain; G is the graph matrix, whose entries are given by an image formula over R^(n_s×n_t), the real number space of dimension n_s×n_t, where n_s and n_t denote space dimensions;
z_i = P^T x_i and z_j = P^T x_j are projected sample points, and y(z_i), y(z_j) are the labels of the projected sample points;
u^(i) is the set of target domain sample points containing the k nearest neighbour points of a given target domain point (the anchor symbol appears only as an image; plausibly z_t,i), k is the preset number of nearest neighbours, c ∈ {1, 2, ..., C}, and C is the number of categories common to the source domain and the target domain;
the numbers of samples of category c in the source domain and the target domain, and the sample points of category c in the source domain and the target domain, are likewise denoted by image symbols; n = n_s + n_t;
I_n and I_d are identity matrices of dimensions n and d respectively, and 1_(n×n) is a square matrix whose elements are all 1; the image formula introducing these symbols plausibly defines the centering matrix H = I_n − (1/n)·1_(n×n).
6. The apparatus of claim 5, wherein the circulation module is specifically configured to:
calculate the updated graph matrix according to the label and the pseudo label;
calculate the updated category weight matrix according to the label and the pseudo label; and
calculate the updated Laplacian matrix of the target domain according to the label and the pseudo label;
wherein the three update formulas survive only as formula images in the source and are not reproduced here.
7. The apparatus of claim 5, wherein the pseudo tag update module is specifically configured to:
training an image classifier by using the projected source domain sample data with the label to obtain a second image classifier;
classifying the projected target domain sample data by using the second image classifier to obtain a first classification result of each projected target domain sample data, wherein the first classification result is the pseudo label.
8. The apparatus of claim 5, wherein the pseudo tag initialization module is specifically configured to:
training an image classifier by using the source domain data set to obtain the trained first image classifier;
classifying each target domain sample image by using the first image classifier to obtain a second classification result of each target domain sample image, wherein the second classification result is the initialized pseudo label.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202111530728.1A 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device Active CN113920382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111530728.1A CN113920382B (en) 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111530728.1A CN113920382B (en) 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device

Publications (2)

Publication Number Publication Date
CN113920382A CN113920382A (en) 2022-01-11
CN113920382B true CN113920382B (en) 2022-03-15

Family

ID=79248895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111530728.1A Active CN113920382B (en) 2021-12-15 2021-12-15 Cross-domain image classification method based on class consistency structured learning and related device

Country Status (1)

Country Link
CN (1) CN113920382B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114219047B (en) * 2022-02-18 2022-05-10 深圳大学 Heterogeneous domain self-adaption method, device and equipment based on pseudo label screening
CN117237857B (en) * 2023-11-13 2024-02-09 腾讯科技(深圳)有限公司 Video understanding task execution method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348081A (en) * 2020-11-05 2021-02-09 平安科技(深圳)有限公司 Transfer learning method for image classification, related device and storage medium
CN113420775A (en) * 2021-03-31 2021-09-21 中国矿业大学 Image classification method under extremely small quantity of training samples based on adaptive subdomain field adaptation of non-linearity

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320387A1 (en) * 2010-06-28 2011-12-29 International Business Machines Corporation Graph-based transfer learning
US10956817B2 (en) * 2018-04-18 2021-03-23 Element Ai Inc. Unsupervised domain adaptation with similarity learning for images
CN109299676A (en) * 2018-09-07 2019-02-01 电子科技大学 A kind of visual pursuit method of combining classification and domain adaptation
KR20200075344A (en) * 2018-12-18 2020-06-26 삼성전자주식회사 Detector, method of object detection, learning apparatus, and learning method for domain transformation
CN110348579B (en) * 2019-05-28 2023-08-29 北京理工大学 Domain self-adaptive migration feature method and system
CN111340021B (en) * 2020-02-20 2022-07-15 中国科学技术大学 Unsupervised domain adaptive target detection method based on center alignment and relation significance
CN111783831B (en) * 2020-05-29 2022-08-05 河海大学 Complex image accurate classification method based on multi-source multi-label shared subspace learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348081A (en) * 2020-11-05 2021-02-09 平安科技(深圳)有限公司 Transfer learning method for image classification, related device and storage medium
CN113420775A (en) * 2021-03-31 2021-09-21 中国矿业大学 Image classification method under extremely small quantity of training samples based on adaptive subdomain field adaptation of non-linearity

Also Published As

Publication number Publication date
CN113920382A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN110033026B (en) Target detection method, device and equipment for continuous small sample images
CN113920382B (en) Cross-domain image classification method based on class consistency structured learning and related device
JP5848833B2 (en) Method and system for comparing images
EP2668618A1 (en) Method and system for comparing images
JP2015504215A5 (en)
CN108985190B (en) Target identification method and device, electronic equipment and storage medium
Shetty et al. Segmentation and labeling of documents using conditional random fields
Wang et al. Optimal transport for label-efficient visible-infrared person re-identification
CN111223128A (en) Target tracking method, device, equipment and storage medium
Son et al. Spectral clustering with brainstorming process for multi-view data
CN114169381A (en) Image annotation method and device, terminal equipment and storage medium
CN111104941B (en) Image direction correction method and device and electronic equipment
Xu et al. Graphical modeling for multi-source domain adaptation
Zhu et al. Multi-view multi-sparsity kernel reconstruction for multi-class image classification
US11941792B2 (en) Machine learning-based analysis of computing device images included in requests to service computing devices
CN114444565A (en) Image tampering detection method, terminal device and storage medium
Ghalyan Estimation of ergodicity limits of bag-of-words modeling for guaranteed stochastic convergence
CN112364916A (en) Image classification method based on transfer learning, related equipment and storage medium
JP2019086979A (en) Information processing device, information processing method, and program
Wang et al. Bayesian denoising hashing for robust image retrieval
Singh et al. Meta-DZSL: a meta-dictionary learning based approach to zero-shot recognition
CN111695526B (en) Network model generation method, pedestrian re-recognition method and device
Rad et al. A multi-view-group non-negative matrix factorization approach for automatic image annotation
Yan et al. Variational Bayesian learning for background subtraction based on local fusion feature
CN114492640A (en) Domain-adaptive-based model training method, target comparison method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant