CN110020674B - Cross-domain self-adaptive image classification method for improving local category discrimination - Google Patents


Info

Publication number
CN110020674B
Authority
CN
China
Prior art keywords: image, matrix, training set, target domain, content
Prior art date
Legal status: Active
Application number
CN201910190041.4A
Other languages
Chinese (zh)
Other versions
CN110020674A (en)
Inventor
Song Shiji (宋士吉)
Chen Yiming (陈一鸣)
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201910190041.4A
Publication of CN110020674A
Application granted
Publication of CN110020674B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/55 Information retrieval of still image data; Clustering; Classification
    • G06F18/213 Pattern recognition; Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2411 Classification techniques based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a cross-domain self-adaptive image classification method for improving local category discrimination, and belongs to the technical field of image processing. In this method, when the image content classification model is trained, the training set and test set images differ in tone, angle, definition and the like, and the data representing the image information obey different probability distributions. The method learns features shared by the two distributions and, within the set of stylistically similar neighbours of each image, mines the common points among images with the same content and the differences among images with different contents. Images with the same content are thereby aggregated more tightly and the mutual interference among images with different contents is reduced, which improves the local category discrimination of the image data set and thereby the image classification accuracy.

Description

Cross-domain self-adaptive image classification method for improving local category discrimination
Technical Field
The invention relates to a cross-domain self-adaptive image classification method for improving local category discrimination, and belongs to the technical field of image processing.
Background
In a general image classification problem, the training set and test set images usually exhibit the same style. These images may have been captured or drawn in the same environment by the same equipment within a short time, or deliberately collected and collated by the collector according to some criterion. After digitization, such data obey the same probability distribution. In this case a classifier trained with the known image content classification information (i.e., the labels) of the training set can accurately identify and classify the image content of the test set. In practical problems, however, the image set involved, i.e., the target domain data set, often lacks content labels that can be used directly, because of problems such as the cost of label production. To cope with this, another labeled image set whose style is not identical to the target style can be used as the training set, i.e., the source domain data set. After digitization, the images of the source domain and the target domain obey different probability distributions, so a classifier trained on the training set (source domain) cannot be used directly to classify the test set (target domain) data. The core of the cross-domain image classification problem is therefore to overcome the inconsistency between the styles of the training set and test set images.
In recent years, researchers have proposed a variety of models and algorithms for cross-domain image classification. Since each image consists of a large number of pixels, the dimension of a raw image is often as high as 10^6, so a commonly used approach is to find a low-dimensional subspace such that, after dimension-reducing mapping into that subspace, the source domain and target domain images obey the same or similar distributions; the transfer component analysis method proposed by Pan et al. in 2011 and the joint distribution adaptation algorithm proposed by Long et al. in 2013 are representative. The core idea is to search for an optimal low-dimensional mapping that minimizes the difference between the sample probability distributions in the low-dimensional space. The distance measure between probability distributions used in both methods is the Maximum Mean Discrepancy (MMD), defined as the supremum, over a chosen class of functions, of the difference between the expected values of the function under the two distributions. Minimizing the MMD in the low-dimensional subspace yields an optimal low-dimensional invariant feature representation of the images and eliminates the difference between the source domain and target domain images after dimension reduction.
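As a concrete illustration of the MMD just described, the following is a minimal numerical sketch of the biased empirical squared-MMD estimate with a Gaussian kernel; the function name `mmd2` and the kernel bandwidth are illustrative choices, not the patent's own notation.

```python
import numpy as np

def mmd2(Xs, Xt, gamma=1.0):
    """Biased empirical squared MMD between samples Xs, Xt (rows = samples),
    using a Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(A, B):
        # pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * np.maximum(d2, 0.0))
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2 * k(Xs, Xt).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 3)), rng.normal(0, 1, (200, 3)))
shifted = mmd2(rng.normal(0, 1, (200, 3)), rng.normal(3, 1, (200, 3)))
print(same < shifted)  # samples from identical distributions give a much smaller MMD
```

A small empirical MMD thus indicates well-aligned distributions, which is exactly what the subspace search below minimizes.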
Although minimizing the MMD works well for distribution alignment, by the mathematical definition of MMD it only pulls the two image sets toward the same distribution numerically, without retaining sufficient information about the images themselves. Part of the information useful for image classification is therefore lost in the low-dimensional subspace, reducing classification accuracy. An important problem is thus how to refine the objective function of MMD-based algorithms so as to protect the key content information of the images while preserving the cross-domain property of the obtained features.
Among algorithms for the classification problem there are many methods that improve the class discrimination of the data distribution. A typical method such as Linear Discriminant Analysis (LDA) simultaneously maximizes the between-class scatter and minimizes the within-class scatter of the data in the form of a Rayleigh quotient, thereby aggregating homogeneous data and separating heterogeneous data. Along the same lines, Yan et al. proposed the Marginal Fisher Analysis (MFA) method in 2007, which moves the class-clustering operation of LDA from the global data distribution into the local neighborhood of each data point, i.e., it maximizes the scatter between each data point and the heterogeneous points in its neighborhood while minimizing the scatter between each data point and the homogeneous points in its neighborhood. Compared with LDA, MFA needs neither the assumption that each class obeys a Gaussian distribution nor prior information about the distributions, has better generalization ability, and can effectively handle multimodal data distributions. However, this method has not yet been applied to the cross-domain adaptation problem for images.
Disclosure of Invention
The invention aims to provide a cross-domain self-adaptive image classification method for improving local category discrimination, which optimizes the local dispersion characteristics of a plurality of image samples and enhances the content discrimination of an image in a local adjacent range so as to be beneficial to the prediction of image content classification labels.
The invention provides a cross-domain self-adaptive image classification method for improving local category discrimination, which comprises the following steps:
(1) the method comprises the steps of scanning a plurality of images line by line, sequentially arranging pixels obtained by line scanning into column vectors according to a scanning sequence, and dividing the column vectors by Euclidean norms of the column vectors to obtain a plurality of image column vectors with Euclidean norms of 1;
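Step (1) can be sketched as follows; `image_to_unit_column` is a hypothetical helper name, and the row-major scan order is the one stated in the step.

```python
import numpy as np

def image_to_unit_column(img):
    """Step (1): row-scan an image into a column vector and divide it by its
    Euclidean norm so the result has Euclidean norm 1."""
    v = np.asarray(img, dtype=float).reshape(-1, 1)  # row-major scan -> column vector
    return v / np.linalg.norm(v)

img = np.arange(1, 7).reshape(2, 3)   # a toy 2x3 "image"
z = image_to_unit_column(img)
print(z.shape)  # (6, 1), with unit Euclidean norm
```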
(2) dividing the plurality of image column vectors obtained in step (1) into a source domain training set $\{Z_S, Y_S\}$ and a target domain test set $\{Z_T\}$:

$$Z_S=\{z_1^S,z_2^S,\ldots,z_{n_S}^S\},\qquad Y_S=\{y_1^S,y_2^S,\ldots,y_{n_S}^S\}$$

wherein $Z_S$ is the set of image column vectors in the source domain training set, $Y_S$ is the set of content classification labels of the images in the source domain training set, $n_S$ is the number of image column vectors in the source domain training set, $z_i^S$ is the $i$-th image column vector of $Z_S$, i.e. the $i$-th image sample in the source domain training set, and $y_i^S$ is the content classification label of the $i$-th image, i.e. $y_i^S$ represents the object described by the image and has dimension 1;

$$Z_T=\{z_1^T,z_2^T,\ldots,z_{n_T}^T\}$$

wherein $Z_T$ is the set of image column vectors in the target domain test set, $n_T$ is the number of image column vectors in the target domain test set, and $z_j^T$ is the $j$-th image column vector of $Z_T$, i.e. the $j$-th image sample in the target domain test set;
(3) computing, for each of the source domain training set samples $z_i^S$ of step (2), the first column vector $x_i^S$:

$$x_i^S=\left[K(z_i^S,z_1^S),\ldots,K(z_i^S,z_{n_S}^S),K(z_i^S,z_1^T),\ldots,K(z_i^S,z_{n_T}^T)\right]^{\mathrm T},\qquad i=1,\ldots,n_S$$

wherein $K(\cdot,\cdot)$ is a kernel function chosen arbitrarily from among a Gaussian kernel function, a hyperbolic tangent kernel function, or a linear kernel function, and the superscript $\mathrm T$ denotes matrix transposition; representing the images of the source domain training set by their first column vectors $x_i^S$ and arranging the $x_i^S$ one after another as columns to obtain the source domain training set matrix $X_S$; likewise computing, for each of the target domain samples $z_j^T$ of step (2), the second column vector $x_j^T$:

$$x_j^T=\left[K(z_j^T,z_1^S),\ldots,K(z_j^T,z_{n_S}^S),K(z_j^T,z_1^T),\ldots,K(z_j^T,z_{n_T}^T)\right]^{\mathrm T},\qquad j=1,\ldots,n_T$$

representing the images of the target domain test set by their second column vectors $x_j^T$ and arranging the $x_j^T$ one after another as columns to obtain the target domain test set matrix $X_T$; from the source domain training set matrix $X_S$ and the target domain test set matrix $X_T$, obtaining the whole data set matrix $X=[X_S,X_T]$;
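Assuming, as is standard for kernelized subspace methods of this kind, that each first or second column vector collects the kernel evaluations of one sample against all source and target samples, step (3) can be sketched as follows (`kernel_features` and the Gaussian bandwidth are illustrative):

```python
import numpy as np

def kernel_features(Z_s, Z_t, gamma=0.5):
    """Represent every sample by its column of Gaussian-kernel evaluations
    against ALL samples (source then target).  Returns X_S, X_T and the
    whole data set matrix X = [X_S, X_T]; columns are samples."""
    Z = np.hstack([Z_s, Z_t])                                   # columns are samples
    d2 = np.sum(Z**2, 0)[:, None] + np.sum(Z**2, 0)[None, :] - 2 * Z.T @ Z
    K = np.exp(-gamma * np.maximum(d2, 0.0))                    # (n_S+n_T) x (n_S+n_T)
    n_s = Z_s.shape[1]
    X_s, X_t = K[:, :n_s], K[:, n_s:]
    return X_s, X_t, np.hstack([X_s, X_t])

rng = np.random.default_rng(1)
X_s, X_t, X = kernel_features(rng.normal(size=(4, 5)), rng.normal(size=(4, 3)))
print(X.shape)  # (8, 8): each column has length n_S + n_T = 8
```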
(4) setting a projection matrix $A^{\mathrm T}$ and using $A^{\mathrm T}$ to linearly map the plurality of image column vectors obtained in step (3), i.e. mapping $x_i^S$ and $x_j^T$ to the projection column vectors $A^{\mathrm T}x_i^S$ and $A^{\mathrm T}x_j^T$ respectively;
(5) taking the projection column vector obtained after the linear mapping in the step (4) as an image data point sample, and establishing an optimization model of cross-domain self-adaptive image classification features, wherein an objective function of the optimization model comprises the following steps:
a. making the squared maximum mean discrepancy sample estimate $\mathrm{MMD}^2(S,T)$ between the probability distribution of the image samples in the source domain training set and that of the image samples in the target domain test set minimal:

$$\mathrm{MMD}^2(S,T)=\left\|\frac{1}{n_S}\sum_{i=1}^{n_S}A^{\mathrm T}x_i^S-\frac{1}{n_T}\sum_{j=1}^{n_T}A^{\mathrm T}x_j^T\right\|^2=\mathrm{Tr}\!\left(A^{\mathrm T}XMX^{\mathrm T}A\right)$$

wherein $\mathrm{Tr}$ denotes the trace of a matrix, i.e. the sum of its diagonal elements, and $M$ is the maximum mean discrepancy matrix:

$$M=\begin{bmatrix}\dfrac{1}{n_S^2}\mathbf 1&-\dfrac{1}{n_Sn_T}\mathbf 1\\[4pt]-\dfrac{1}{n_Sn_T}\mathbf 1&\dfrac{1}{n_T^2}\mathbf 1\end{bmatrix}$$

wherein $\mathbf 1$ denotes an all-ones matrix of the appropriate size;
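The trace identity in objective a can be checked numerically. The sketch below builds the block matrix M and verifies that Tr(A^T X M X^T A) equals the squared distance between the projected source and target sample means (all names are illustrative):

```python
import numpy as np

def mmd_matrix(n_s, n_t):
    """Block matrix M of objective a: Tr(A^T X M X^T A) equals the squared
    distance between the projected source and target sample means."""
    n = n_s + n_t
    M = np.empty((n, n))
    M[:n_s, :n_s] = 1.0 / n_s**2
    M[n_s:, n_s:] = 1.0 / n_t**2
    M[:n_s, n_s:] = M[n_s:, :n_s] = -1.0 / (n_s * n_t)
    return M

rng = np.random.default_rng(2)
n_s, n_t, d, m = 5, 4, 6, 2
X = rng.normal(size=(d, n_s + n_t))
A = rng.normal(size=(d, m))
P = A.T @ X                                   # projected samples, one per column
direct = np.sum((P[:, :n_s].mean(1) - P[:, n_s:].mean(1))**2)
via_trace = np.trace(A.T @ X @ mmd_matrix(n_s, n_t) @ X.T @ A)
print(np.isclose(direct, via_trace))  # the two computations agree
```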
b. according to the classes of the content classification labels of step (2), making the sum of squared class-wise maximum mean discrepancy sample estimates between the probability distribution of each class of image samples in the source domain training set and that of the corresponding class in the target domain test set minimal:

$$\sum_{c=1}^{C}\mathrm{MMD}_c^2(S,T)=\sum_{c=1}^{C}\left\|\frac{1}{n_S^{(c)}}\sum_{i:\,y_i^S=c}A^{\mathrm T}x_i^S-\frac{1}{n_T^{(c)}}\sum_{j:\,\hat y_j^T=c}A^{\mathrm T}x_j^T\right\|^2=\sum_{c=1}^{C}\mathrm{Tr}\!\left(A^{\mathrm T}XM_cX^{\mathrm T}A\right)$$

wherein C denotes the number of image sample classes, $\hat y_j^T$ denotes the predicted content classification label (of dimension 1) temporarily assigned to the data point $x_j^T$ at the current step, $n_S^{(c)}$ denotes the number of image samples in the source domain training set with content classification label c, $n_T^{(c)}$ denotes the number of image samples in the target domain test set whose current predicted content classification label is c, and $M_c$ is the maximum mean discrepancy matrix of the image samples with content classification label c:

$$M_c=\begin{bmatrix}\dfrac{1}{(n_S^{(c)})^2}e_{Sc}e_{Sc}^{\mathrm T}&-\dfrac{1}{n_S^{(c)}n_T^{(c)}}e_{Sc}e_{Tc}^{\mathrm T}\\[4pt]-\dfrac{1}{n_S^{(c)}n_T^{(c)}}e_{Tc}e_{Sc}^{\mathrm T}&\dfrac{1}{(n_T^{(c)})^2}e_{Tc}e_{Tc}^{\mathrm T}\end{bmatrix}$$

wherein $e_{Sc}$ is a column vector of length $n_S$ composed of 0s and 1s: an element of $e_{Sc}$ is 1 when the content classification label of the corresponding image in the source domain training set is c, and 0 when it is not; $e_{Tc}$ is a column vector of length $n_T$ composed of 0s and 1s: an element of $e_{Tc}$ is 1 when the current predicted content classification label of the corresponding image in the target domain test set is c, and 0 when it is not;
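A sketch of the class-wise matrix M_c built from the indicator vectors e_Sc and e_Tc (the helper name and toy labels are illustrative):

```python
import numpy as np

def class_mmd_matrix(y_s, y_t_pred, c):
    """Block matrix M_c of objective b, built from the 0/1 indicator vectors
    e_Sc (true source labels) and e_Tc (current target pseudo-labels)."""
    e_s = (np.asarray(y_s) == c).astype(float)[:, None]
    e_t = (np.asarray(y_t_pred) == c).astype(float)[:, None]
    n_sc, n_tc = e_s.sum(), e_t.sum()
    top = np.hstack([e_s @ e_s.T / n_sc**2, -e_s @ e_t.T / (n_sc * n_tc)])
    bot = np.hstack([-e_t @ e_s.T / (n_sc * n_tc), e_t @ e_t.T / n_tc**2])
    return np.vstack([top, bot])

M1 = class_mmd_matrix([1, 1, 2], [1, 2], c=1)
print(M1.shape)  # (5, 5); by construction all entries of M_c sum to zero
```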
c. minimizing the weighted sum of squared Euclidean distances between the projection column vectors of every two image samples with the same content label inside the source domain training set and inside the target domain test set:

$$\sum_{i,j=1}^{n_S}W_S^{ij}\left\|A^{\mathrm T}x_i^S-A^{\mathrm T}x_j^S\right\|^2+\sum_{k,l=1}^{n_T}W_T^{kl}\left\|A^{\mathrm T}x_k^T-A^{\mathrm T}x_l^T\right\|^2=2\,\mathrm{Tr}\!\left(A^{\mathrm T}X_SR_SX_S^{\mathrm T}A\right)+2\,\mathrm{Tr}\!\left(A^{\mathrm T}X_TR_TX_T^{\mathrm T}A\right)=2\,\mathrm{Tr}\!\left(A^{\mathrm T}XRX^{\mathrm T}A\right)$$

In the first line of the above formula, $W_S^{ij}=\alpha_c\eta_{ij}$ and $W_T^{kl}=\beta_c\eta_{kl}$ are the weighting coefficients for the distance between every two image samples with the same content label c inside the source domain training set and inside the target domain test set respectively. Here $\eta_{ij}=1$ if the image sample $x_j^S$ is a same-class k-nearest neighbor of the image sample $x_i^S$, and $\eta_{ij}=0$ if it is not; the value of k is determined according to the precision of the image processing. $\alpha_c$ is a positive coefficient associated with class c in the source domain training set, and its value is determined according to the precision of the image processing; in one embodiment of the invention, $\alpha_c$ is 0.01. Likewise, $\eta_{kl}=1$ if the image sample $x_l^T$ is a same-class k-nearest neighbor of the image sample $x_k^T$, and $\eta_{kl}=0$ if it is not; $\beta_c$ is a positive coefficient associated with class c in the target domain test set, and its value is determined according to the precision of the image processing.

In the first term of the second line, $W_S$ is the weight matrix composed of the weighting coefficients $W_S^{ij}$, $D_S$ is the diagonal matrix with diagonal elements $D_S^{ii}=\sum_{j=1}^{n_S}W_S^{ij}$, and $R_S=D_S-W_S$ is the intra-class dispersion matrix of the source domain training set. In the second term of the second line, $W_T$ is the weight matrix composed of the weighting coefficients $W_T^{kl}$, $D_T$ is the diagonal matrix with diagonal elements $D_T^{kk}=\sum_{l=1}^{n_T}W_T^{kl}$, and $R_T=D_T-W_T$ is the intra-class dispersion matrix of the target domain. The matrix $R$ in the third line is defined as:

$$R=\begin{bmatrix}R_S&0\\0&R_T\end{bmatrix}$$

wherein $0$ denotes a matrix whose elements are all 0;
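Objective c can be sketched with a same-class k-nearest-neighbour weight matrix and the graph Laplacian R = D - W. The symmetrisation of the neighbour graph below is an assumption made so that the pairwise-distance identity holds exactly; all names are illustrative.

```python
import numpy as np

def intra_class_laplacian(X, y, k=1, alpha=0.01):
    """W[i, j] = alpha if x_j is among the k nearest SAME-class neighbours of
    x_i (symmetrised); R = D - W is the intra-class dispersion (Laplacian)."""
    n = X.shape[1]
    d2 = np.sum((X[:, :, None] - X[:, None, :])**2, axis=0)   # pairwise sq. dist.
    W = np.zeros((n, n))
    for i in range(n):
        same = [j for j in range(n) if j != i and y[j] == y[i]]
        for j in sorted(same, key=lambda j: d2[i, j])[:k]:
            W[i, j] = W[j, i] = alpha                         # symmetrise the graph
    R = np.diag(W.sum(1)) - W
    return W, R

rng = np.random.default_rng(3)
X = rng.normal(size=(2, 6))
y = [0, 0, 0, 1, 1, 1]
W, R = intra_class_laplacian(X, y, k=1)
A = rng.normal(size=(2, 2))
P = A.T @ X
pairwise = sum(W[i, j] * np.sum((P[:, i] - P[:, j])**2)
               for i in range(6) for j in range(6))
print(np.isclose(pairwise, 2 * np.trace(A.T @ X @ R @ X.T @ A)))  # identity holds
```

The inter-class matrix P of objective d is built the same way, only over different-class neighbours.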
d. maximizing the weighted sum of squared Euclidean distances between the projection column vectors of every two image samples with different content labels inside the source domain training set and inside the target domain test set:

$$\sum_{i,j=1}^{n_S}U_S^{ij}\left\|A^{\mathrm T}x_i^S-A^{\mathrm T}x_j^S\right\|^2+\sum_{k,l=1}^{n_T}U_T^{kl}\left\|A^{\mathrm T}x_k^T-A^{\mathrm T}x_l^T\right\|^2=2\,\mathrm{Tr}\!\left(A^{\mathrm T}X_SP_SX_S^{\mathrm T}A\right)+2\,\mathrm{Tr}\!\left(A^{\mathrm T}X_TP_TX_T^{\mathrm T}A\right)=2\,\mathrm{Tr}\!\left(A^{\mathrm T}XPX^{\mathrm T}A\right)$$

In the first line of the above formula, $U_S^{ij}$ and $U_T^{kl}$ are the weighting coefficients for the distance between image samples with different content labels inside the source domain training set and inside the target domain test set respectively: $U_S^{ij}=1$ if the image sample $x_j^S$ is a different-class neighbor of the image sample $x_i^S$, and $U_S^{ij}=0$ if it is not; likewise $U_T^{kl}=1$ if the image sample $x_l^T$ is a different-class neighbor of the image sample $x_k^T$, and $U_T^{kl}=0$ if it is not.

In the first term of the second line, $U_S$ is the weight matrix composed of the weighting coefficients $U_S^{ij}$, $D_S'$ is the diagonal matrix with diagonal elements $D_S'^{ii}=\sum_{j=1}^{n_S}U_S^{ij}$, and $P_S=D_S'-U_S$ is the inter-class dispersion matrix of the source domain training set. In the second term of the second line, $U_T$ is the weight matrix composed of the weighting coefficients $U_T^{kl}$, $D_T'$ is the diagonal matrix with diagonal elements $D_T'^{kk}=\sum_{l=1}^{n_T}U_T^{kl}$, and $P_T=D_T'-U_T$ is the inter-class dispersion matrix of the target domain test set. The matrix $P$ in the third line is defined as:

$$P=\begin{bmatrix}P_S&0\\0&P_T\end{bmatrix}$$
e. making the regularization term of the projection matrix $A^{\mathrm T}$ of step (4) minimal:

$$\lambda\|A\|_F^2$$

wherein $\|A\|_F^2$ is the sum of the squares of all elements of the matrix A, and λ is a positive coefficient whose value is chosen according to the image classification precision; in one embodiment of the method its value is 1;
according to the above objective functions, the optimization model of the cross-domain self-adaptive image classification features is obtained as:

$$\min_{A}\ \mathrm{Tr}\!\left(A^{\mathrm T}X\Big(M+\sum_{c=1}^{C}M_c+R\Big)X^{\mathrm T}A\right)+\lambda\|A\|_F^2\qquad\text{s.t.}\quad A^{\mathrm T}XPX^{\mathrm T}A=I$$

wherein I denotes the identity matrix;
(6) solving the optimization model of the cross-domain self-adaptive image classification features. In the first iteration of solving the optimization model, predicted content labels of the target domain are not yet available, so the model is initialized with the label-independent terms only:

$$\min_{A}\ \mathrm{Tr}\!\left(A^{\mathrm T}XMX^{\mathrm T}A\right)+\lambda\|A\|_F^2\qquad\text{s.t.}\quad A^{\mathrm T}A=I$$

wherein I is an identity matrix. This optimization model is solved by the following formula to obtain an intermediate optimal solution $A^*$:

$$\left(XMX^{\mathrm T}+\lambda I\right)A^*=A^*\Theta$$

where Θ is the diagonal matrix whose diagonal elements are the generalized eigenvalues of the matrix $XMX^{\mathrm T}+\lambda I$ relative to the matrix $I$. Solving for the $n_S+n_T$ generalized eigenvalues of $XMX^{\mathrm T}+\lambda I$ relative to $I$ and the $n_S+n_T$ corresponding generalized eigenvectors (column vectors), the m smallest generalized eigenvalues are selected and arranged in ascending order, and the m generalized eigenvectors respectively corresponding to the selected m generalized eigenvalues are arranged as columns in the same order to obtain the matrix $A^*$; $A^*$ is the first intermediate optimal solution of the above optimization model;
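The generalized eigenvalue computation used in steps (6) and (8) can be sketched with SciPy's `eigh`, which accepts a right-hand matrix and returns eigenvalues in ascending order (the toy matrices below are illustrative, not the patent's data):

```python
import numpy as np
from scipy.linalg import eigh

def smallest_generalized_eigvecs(L, Rmat, m):
    """Solve L a = theta * Rmat a and keep the eigenvectors of the m smallest
    generalized eigenvalues as the columns of A*."""
    theta, V = eigh(L, Rmat)          # eigh returns eigenvalues in ascending order
    return theta[:m], V[:, :m]

rng = np.random.default_rng(4)
n, m = 6, 2
B = rng.normal(size=(n, n))
L = B @ B.T + n * np.eye(n)           # symmetric positive definite "left" matrix
Rmat = np.eye(n)                      # right-hand matrix (identity, as in step (6))
theta, A_star = smallest_generalized_eigvecs(L, Rmat, m)
print(A_star.shape)                   # (6, 2): one column per kept eigenvalue
```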
(7) using the first intermediate optimal solution $A^*$ obtained in step (6), linearly mapping the original image column vectors $x_i^S$ and $x_j^T$ to obtain the image sample column vectors $A^{*\mathrm T}x_i^S$ and $A^{*\mathrm T}x_j^T$; taking the column vectors $A^{*\mathrm T}x_i^S$ as the training set and predicting the image content labels of the column vectors $A^{*\mathrm T}x_j^T$ by the nearest neighbor method, obtaining a group of predicted content labels $\hat y_j^T$ of the target domain test set samples;
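The nearest-neighbour labelling of step (7) can be sketched as follows (function name and toy data are illustrative):

```python
import numpy as np

def nearest_neighbor_labels(P_s, y_s, P_t):
    """Label every projected target column with the label of its nearest
    projected source column (1-nearest-neighbour rule)."""
    d2 = (np.sum(P_t**2, 0)[:, None] + np.sum(P_s**2, 0)[None, :]
          - 2 * P_t.T @ P_s)            # n_T x n_S squared distances
    return np.asarray(y_s)[np.argmin(d2, axis=1)]

P_s = np.array([[0.0, 10.0], [0.0, 10.0]])   # two projected source points
y_s = np.array([0, 1])
P_t = np.array([[0.5, 9.0], [0.1, 9.5]])     # two target points, one near each source
print(nearest_neighbor_labels(P_s, y_s, P_t))  # [0 1]
```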
(8) substituting the predicted content labels obtained in step (7) into the complete optimization model of step (5) and solving the optimization model; the intermediate optimal solution $A^*$ of the optimization model is obtained from the following formula:

$$\left(X\Big(M+\sum_{c=1}^{C}M_c+R\Big)X^{\mathrm T}+\lambda I\right)A^*=XPX^{\mathrm T}A^*\Theta$$

where Θ is the diagonal matrix whose diagonal elements are the generalized eigenvalues of the matrix $X\big(M+\sum_{c=1}^{C}M_c+R\big)X^{\mathrm T}+\lambda I$ relative to the matrix $XPX^{\mathrm T}$. Solving for the $n_S+n_T$ generalized eigenvalues of $X\big(M+\sum_{c=1}^{C}M_c+R\big)X^{\mathrm T}+\lambda I$ relative to $XPX^{\mathrm T}$ and the $n_S+n_T$ corresponding generalized eigenvectors (column vectors), the m smallest generalized eigenvalues are selected and arranged in ascending order, and the m generalized eigenvectors respectively corresponding to the selected m generalized eigenvalues are arranged as columns in the same order to obtain the projection matrix $A^*$; $A^*$ is the second intermediate optimal solution of the above optimization model;
(9) replacing the first intermediate optimal solution of step (7) with the second intermediate optimal solution obtained in step (8) and repeating steps (7) and (8); judging the predicted content labels $\hat y_j^T$ of step (7) obtained over N cycles: if the predicted content labels obtained in the N cycles are identical, the iteration ends and the predicted content labels $\hat y_j^T$ obtained in step (7) of the last iteration are taken as the prediction result, i.e. the image classification result, thereby realizing the cross-domain self-adaptive image classification for improving the local category discrimination; if the predicted content labels obtained in the N cycles are not identical, return to step (7), replace the first intermediate optimal solution of step (7) with the second intermediate optimal solution obtained in step (8), and repeat steps (7) and (8). The value of N is determined according to the image classification precision; in one embodiment of the method, the value of N is 10.
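The outer iteration of step (9), which stops once the pseudo-labels are identical over N consecutive cycles, can be sketched as follows; `update_labels` here is a stand-in for the projection and 1-NN prediction of steps (7) and (8), and all names are illustrative.

```python
import numpy as np

def iterate_until_stable(update_labels, y0, N=10, max_iter=50):
    """Repeat the label update until the predictions are identical over N
    consecutive cycles (or max_iter updates have been performed)."""
    history, y = [np.asarray(y0)], np.asarray(y0)
    for _ in range(max_iter):
        y = np.asarray(update_labels(y))
        history.append(y)
        recent = history[-N:]
        if len(recent) == N and all(np.array_equal(recent[0], r) for r in recent):
            break                          # labels stable over N cycles: converged
    return y

# Toy update with an immediate fixed point: clamp all pseudo-labels to class 1
final = iterate_until_stable(lambda y: np.ones_like(y), np.array([0, 1, 0]), N=3)
print(final.tolist())  # [1, 1, 1]
```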
The cross-domain self-adaptive image classification method for improving the local category discrimination, provided by the invention, has the characteristics and advantages that:
according to the cross-domain self-adaptive image classification method for improving the local category discrimination, when an image content classification model is trained, images of a training set and a testing set are different in aspects of tone, angle, definition and the like, and data representing image information obeys different probability distribution. The method has the advantages that the distributed and shared characteristics of the two images are learned, common points among the images with the same content and differences among the images with different contents are mined in other images with similar styles of each image, the images with the same content are enabled to be more highly aggregated, mutual interference among the images with different contents is reduced, and therefore the local category distinguishing degree of the image data set is improved, and the classifying accuracy is further improved.
In conclusion, the image classification method can optimize the local dispersion characteristics of a plurality of image samples, enhance the content discrimination of the image in the local neighbor range and facilitate the prediction of the image content classification label; and the predicted image content label is updated iteratively, so that the accuracy and the reliability of image content classification are improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention provides a cross-domain self-adaptive image classification method for improving local category discrimination, a flow chart of which is shown in figure 1, and the method comprises the following steps:
(1) the method comprises the steps of scanning a plurality of images line by line, sequentially arranging pixels obtained by line scanning into column vectors according to a scanning sequence, and dividing the column vectors by Euclidean norms of the column vectors to obtain a plurality of image column vectors with Euclidean norms of 1;
(2) dividing the plurality of image column vectors obtained in step (1) into a source domain training set $\{Z_S, Y_S\}$ and a target domain test set $\{Z_T\}$:

$$Z_S=\{z_1^S,z_2^S,\ldots,z_{n_S}^S\},\qquad Y_S=\{y_1^S,y_2^S,\ldots,y_{n_S}^S\}$$

wherein $Z_S$ is the set of image column vectors in the source domain training set, $Y_S$ is the set of content classification labels of the images in the source domain training set, $n_S$ is the number of image column vectors in the source domain training set, $z_i^S$ is the $i$-th image column vector of $Z_S$, i.e. the $i$-th image sample in the source domain training set, and $y_i^S$ is the content classification label of the $i$-th image, i.e. $y_i^S$ represents the object described by the image and has dimension 1;

$$Z_T=\{z_1^T,z_2^T,\ldots,z_{n_T}^T\}$$

wherein $Z_T$ is the set of image column vectors in the target domain test set, $n_T$ is the number of image column vectors in the target domain test set, and $z_j^T$ is the $j$-th image column vector of $Z_T$, i.e. the $j$-th image sample in the target domain test set; the content classification labels in the target domain test set are unknown;
(3) computing, for each of the source domain training set samples $z_i^S$ of step (2), the first column vector $x_i^S$:

$$x_i^S=\left[K(z_i^S,z_1^S),\ldots,K(z_i^S,z_{n_S}^S),K(z_i^S,z_1^T),\ldots,K(z_i^S,z_{n_T}^T)\right]^{\mathrm T},\qquad i=1,\ldots,n_S$$

wherein $K(\cdot,\cdot)$ is a kernel function chosen arbitrarily from among a Gaussian kernel function, a hyperbolic tangent kernel function, or a linear kernel function, and the superscript $\mathrm T$ denotes matrix transposition; representing the images of the source domain training set by their first column vectors $x_i^S$ and arranging the $x_i^S$ one after another as columns to obtain the source domain training set matrix $X_S$; likewise computing, for each of the target domain samples $z_j^T$ of step (2), the second column vector $x_j^T$:

$$x_j^T=\left[K(z_j^T,z_1^S),\ldots,K(z_j^T,z_{n_S}^S),K(z_j^T,z_1^T),\ldots,K(z_j^T,z_{n_T}^T)\right]^{\mathrm T},\qquad j=1,\ldots,n_T$$

representing the images of the target domain test set by their second column vectors $x_j^T$ and arranging the $x_j^T$ one after another as columns to obtain the target domain test set matrix $X_T$; from the source domain training set matrix $X_S$ and the target domain test set matrix $X_T$, obtaining the whole data set matrix $X=[X_S,X_T]$;
(4) setting a projection matrix $A^{\mathrm T}$ and using $A^{\mathrm T}$ to linearly map the plurality of image column vectors obtained in step (3), i.e. mapping $x_i^S$ and $x_j^T$ to the projection column vectors $A^{\mathrm T}x_i^S$ and $A^{\mathrm T}x_j^T$ respectively; the value of the matrix $A^{\mathrm T}$ is as yet undetermined;
(5) Taking the projection column vectors obtained after the linear mapping of step (4) as image data point samples, establish an optimization model of cross-domain adaptive image classification features, whose objective function comprises the following terms:

a. Minimize the squared sample estimate MMD^2(S, T) of the maximum mean distance (maximum mean discrepancy) between the probability distribution of the image samples in the source domain training set and that of the image samples in the target domain test set:

MMD^2(S, T) = Tr(A^T X M X^T A),

where Tr denotes the trace of a matrix, i.e., the sum of its diagonal elements, and M is the maximum mean distance matrix

M = [ (1/n_S^2)·1          -(1/(n_S n_T))·1
      -(1/(n_S n_T))·1     (1/n_T^2)·1      ],

where 1 denotes an all-ones matrix of the appropriate size;
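Term (a) can be sketched directly from the block structure just described (names are illustrative). With A the identity, Tr(A^T X M X^T A) reduces to the squared distance between the source and target feature means:

```python
import numpy as np

def mmd_matrix(nS, nT):
    """Maximum mean distance matrix M of term (a): the (S,S) block is
    1/n_S^2, the (T,T) block 1/n_T^2, the two cross blocks -1/(n_S n_T)."""
    M = np.empty((nS + nT, nS + nT))
    M[:nS, :nS] = 1.0 / nS**2
    M[nS:, nS:] = 1.0 / nT**2
    M[:nS, nS:] = -1.0 / (nS * nT)
    M[nS:, :nS] = -1.0 / (nS * nT)
    return M

def mmd_sq(A, X, M):
    """MMD^2(S, T) = Tr(A^T X M X^T A)."""
    return np.trace(A.T @ X @ M @ X.T @ A)
```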
b. According to the classes of the content classification labels of step (2), minimize the sum over classes of the squared maximum mean distance sample estimates between the per-class sample distribution of the source domain training set and that of the target domain test set:

sum_{c=1}^{C} MMD_c^2(S, T) = sum_{c=1}^{C} Tr(A^T X M_c X^T A),

where C is the number of image sample classes, ŷ_j^T denotes the predicted content classification label (of dimension 1) temporarily assigned to data point x_j^T at the current step, n_S^c is the number of image samples in the source domain training set whose content classification label is c, n_T^c is the number of image samples in the target domain test set whose current predicted content classification label is c, and M_c is the maximum mean distance matrix of the image samples with content classification label c:

M_c = [ (1/(n_S^c)^2) e_Sc e_Sc^T            -(1/(n_S^c n_T^c)) e_Sc e_Tc^T
        -(1/(n_S^c n_T^c)) e_Tc e_Sc^T       (1/(n_T^c)^2) e_Tc e_Tc^T       ],

where e_Sc is a column vector of length n_S composed of 0s and 1s: an element of e_Sc is 1 when the content classification label of the corresponding image in the source domain training set is c, and 0 when it is not c; e_Tc is a column vector of length n_T composed of 0s and 1s: an element of e_Tc is 1 when the current predicted content classification label of the corresponding image in the target domain test set is c, and 0 when it is not c;
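The per-class matrix M_c of term (b) is the outer product of a single signed, count-normalized indicator vector, which gives a compact sketch (names are illustrative):

```python
import numpy as np

def class_mmd_matrix(yS, yT_pred, c):
    """Per-class maximum mean distance matrix M_c of term (b), built from the
    0/1 indicator vectors e_Sc (true source labels) and e_Tc (current
    predicted target labels)."""
    eS = (np.asarray(yS) == c).astype(float)        # e_Sc, length n_S
    eT = (np.asarray(yT_pred) == c).astype(float)   # e_Tc, length n_T
    nSc, nTc = eS.sum(), eT.sum()
    e = np.concatenate([eS / nSc, -eT / nTc])       # signed, count-normalized
    return np.outer(e, e)                           # block structure of M_c
```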
c. Minimize the weighted sum of squared Euclidean distances between the projection column vectors of every pair of image samples with the same content label within the source domain training set, together with the corresponding quantity within the target domain test set:

sum_{i,j} W_ij^S ||A^T x_i^S - A^T x_j^S||^2 + sum_{k,l} W_kl^T ||A^T x_k^T - A^T x_l^T||^2
  = Tr(A^T X_S R_S X_S^T A) + Tr(A^T X_T R_T X_T^T A)
  = Tr(A^T X R X^T A).

In the first line, W_ij^S and W_kl^T are the weight coefficients for the distance between pairs of same-label image samples inside the source domain training set and the target domain test set, respectively. W_ij^S = α_c η_ij, where η_ij = 1 if image sample x_j^S is a same-class k-nearest neighbor of image sample x_i^S, and η_ij = 0 if it is not; the value of k is determined according to the precision of the image processing. α_c is a positive coefficient associated with class c in the source domain training set, also determined according to the precision of the image processing; in one embodiment of the invention, α_c is 0.01. Likewise, W_kl^T = β_c η_kl, where η_kl = 1 if image sample x_l^T is a same-class k-nearest neighbor of image sample x_k^T, and η_kl = 0 if it is not; β_c is a positive coefficient associated with class c in the target domain test set, determined according to the precision of the image processing; in one embodiment of the invention, β_c is 0.01.

In the first term of the second line, W_S is the weight matrix formed by the coefficients W_ij^S, D_S is the diagonal matrix whose diagonal elements are the row sums (D_S)_ii = sum_j W_ij^S, and R_S = D_S - W_S is the intra-class dispersion matrix of the source domain training set. In the second term of the second line, W_T is the weight matrix formed by the coefficients W_kl^T, D_T is the diagonal matrix whose diagonal elements are the row sums (D_T)_kk = sum_l W_kl^T, and R_T = D_T - W_T is the intra-class dispersion matrix of the target domain. The matrix R in the third line is the block-diagonal matrix

R = [ R_S   0
      0     R_T ],

where 0 denotes a matrix whose elements are all 0;
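Term (c) is a graph-Laplacian-style penalty. A minimal sketch for one domain, assuming the Laplacian form R = D − W over same-class k-nearest-neighbor weights (function and parameter names are illustrative; `coef` stands in for α_c or β_c):

```python
import numpy as np

def intra_class_scatter(X_dom, labels, k=2, coef=0.01):
    """Intra-class dispersion matrix R_S (or R_T) of term (c) for one domain:
    W[i, j] = coef when sample j is among the k same-class nearest neighbors
    of sample i; D is the diagonal matrix of row sums of W; R = D - W."""
    n = X_dom.shape[1]
    W = np.zeros((n, n))
    sq = np.sum(X_dom * X_dom, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X_dom.T @ X_dom)
    for i in range(n):
        # same-class candidates, excluding the sample itself
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        if same.size:
            nearest = same[np.argsort(d2[i, same])[:k]]
            W[i, nearest] = coef
    D = np.diag(W.sum(axis=1))
    return D - W
```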
d. Maximize the weighted sum of squared Euclidean distances between the projection column vectors of every pair of image samples with different content labels within the source domain training set, together with the corresponding quantity within the target domain test set:

sum_{i,j} U_ij^S ||A^T x_i^S - A^T x_j^S||^2 + sum_{k,l} U_kl^T ||A^T x_k^T - A^T x_l^T||^2
  = Tr(A^T X_S P_S X_S^T A) + Tr(A^T X_T P_T X_T^T A)
  = Tr(A^T X P X^T A).

In the first line, U_ij^S and U_kl^T are the weight coefficients for the distance between pairs of differently labeled image samples inside the source domain training set and the target domain test set, respectively: U_ij^S = 1 if image sample x_j^S is a different-class neighbor of image sample x_i^S, and U_ij^S = 0 if it is not; likewise, U_kl^T = 1 if image sample x_l^T is a different-class neighbor of image sample x_k^T, and U_kl^T = 0 if it is not.

In the first term of the second line, U_S is the weight matrix formed by the coefficients U_ij^S, B_S is the diagonal matrix whose diagonal elements are the row sums (B_S)_ii = sum_j U_ij^S, and P_S = B_S - U_S is the inter-class dispersion matrix of the source domain training set. In the second term of the second line, U_T is the weight matrix formed by the coefficients U_kl^T, B_T is the diagonal matrix whose diagonal elements are the row sums (B_T)_kk = sum_l U_kl^T, and P_T = B_T - U_T is the inter-class dispersion matrix of the target domain test set. The matrix P in the third line is the block-diagonal matrix

P = [ P_S   0
      0     P_T ];
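Term (d) mirrors term (c) with different-class neighbors. A sketch for one domain, assuming the same Laplacian form P = B − U (names are illustrative):

```python
import numpy as np

def inter_class_scatter(X_dom, labels, k=2):
    """Inter-class dispersion matrix P_S (or P_T) of term (d) for one domain:
    U[i, j] = 1 when sample j is among the k different-class nearest
    neighbors of sample i; B is the diagonal matrix of row sums of U;
    P = B - U.  Maximizing Tr(A^T X P X^T A) pushes differently labeled
    neighbors apart."""
    n = X_dom.shape[1]
    U = np.zeros((n, n))
    sq = np.sum(X_dom * X_dom, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X_dom.T @ X_dom)
    for i in range(n):
        diff = np.flatnonzero(labels != labels[i])   # different-class candidates
        if diff.size:
            nearest = diff[np.argsort(d2[i, diff])[:k]]
            U[i, nearest] = 1.0
    return np.diag(U.sum(axis=1)) - U
```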
e. Minimize the regularization term of the projection matrix A^T of step (4):

λ ||A||_F^2,

where ||A||_F^2 is the sum of the squares of all elements of the matrix A and λ is a positive coefficient whose value is taken according to the image classification precision; in one embodiment of the method, the value is 1.

Combining the above objective terms (the minimized terms a, b, c, and e in the objective, with the maximized term d imposed as a normalization constraint) gives the optimization model of cross-domain adaptive image classification features:

min_A  Tr(A^T X (M + sum_{c=1}^{C} M_c + R) X^T A) + λ ||A||_F^2,   s.t.  A^T X P X^T A = I;
(6) Solve the optimization model of the cross-domain adaptive image classification features. In the first iteration, no predicted labels are yet available for the target domain, so the model is initialized without the label-dependent terms:

min_A  Tr(A^T X M X^T A) + λ ||A||_F^2,   s.t.  A^T X X^T A = I,

where I is an identity matrix. The optimization model is solved by the generalized eigenvalue equation

(X M X^T + λ I) A = X X^T A Θ

to obtain the intermediate optimal solution A*, where Θ is the diagonal matrix whose diagonal elements are the generalized eigenvalues of the matrix X M X^T + λI relative to the matrix X X^T. Solve for the n_S + n_T generalized eigenvalues of X M X^T + λI relative to X X^T and the n_S + n_T corresponding generalized eigenvector columns; select the m smallest generalized eigenvalues, arrange them in ascending order, and arrange the m corresponding generalized eigenvector columns side by side in the same order to obtain the matrix A*. A* is the first intermediate optimal solution of the above optimization model;
(7) Using the first intermediate optimal solution A* obtained in step (6), linearly map the original image column vectors x_i^S and x_j^T to obtain the image sample column vectors y_i^S = A*^T x_i^S and y_j^T = A*^T x_j^T. Use the column vectors y_i^S, with their known labels, as the training set, and predict the image content labels of the column vectors y_j^T by the nearest neighbor method, obtaining a group of predicted content labels ŷ_j^T for the target domain test set samples;
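The nearest-neighbor prediction of step (7) can be sketched as a 1-nearest-neighbor rule in the projected space (names are illustrative):

```python
import numpy as np

def nearest_neighbor_labels(YS_proj, yS, YT_proj):
    """Step (7): predict each target label as the label of the closest
    projected source sample.  YS_proj: (m, n_S) projected source columns
    with labels yS; YT_proj: (m, n_T) projected target columns."""
    sqS = np.sum(YS_proj**2, axis=0)
    sqT = np.sum(YT_proj**2, axis=0)
    d2 = sqT[:, None] + sqS[None, :] - 2.0 * (YT_proj.T @ YS_proj)
    return np.asarray(yS)[np.argmin(d2, axis=1)]
```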
(8) Substitute the predicted content labels obtained in step (7) into the complete optimization model of step (5) and solve it, obtaining the intermediate optimal solution A* of the optimization model from the generalized eigenvalue equation

(X (M + sum_{c=1}^{C} M_c + R) X^T + λ I) A = X P X^T A Θ,

where Θ is the diagonal matrix whose diagonal elements are the generalized eigenvalues of the matrix X(M + sum_c M_c + R)X^T + λI relative to the matrix X P X^T. Solve for the n_S + n_T generalized eigenvalues of that matrix relative to X P X^T and the n_S + n_T corresponding generalized eigenvector columns; select the m smallest generalized eigenvalues, arrange them in ascending order, and arrange the m corresponding generalized eigenvector columns side by side in the same order to obtain the projection matrix A*. A* is the second intermediate optimal solution of the above optimization model;
(9) Replace the first intermediate optimal solution in step (7) with the second intermediate optimal solution obtained in step (8) and repeat steps (7) and (8). Examine the predicted content labels ŷ_j^T produced by step (7) over N consecutive cycles: if the predicted content labels obtained in these N cycles are identical, end the iteration and take the predicted content labels obtained in step (7) of the last iteration as the prediction result, i.e., the image classification result, thereby realizing cross-domain adaptive image classification with improved local category discrimination; if the predicted content labels obtained in the N cycles are not identical, return to step (7), again replace the first intermediate optimal solution of step (7) with the second intermediate optimal solution of step (8), and repeat steps (7) and (8). The value of N is determined according to the image classification precision; in one embodiment of the method, the value of N is 10.
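The alternation of steps (7)-(9) can be sketched as a fixed-point loop over pseudo-labels. `solve_model` (step 8) and `predict_labels` (step 7) are caller-supplied stand-ins, and the `max_iter` safeguard is an assumption not stated in the patent:

```python
import numpy as np

def iterate_until_stable(solve_model, predict_labels, yT_init, N=10, max_iter=100):
    """Steps (7)-(9): alternate solving for the projection and re-predicting
    target labels until the predictions from N consecutive cycles are
    identical; return the final labels and projection."""
    history = [np.asarray(yT_init)]
    A = None
    for _ in range(max_iter):
        A = solve_model(history[-1])          # step (8): refit projection
        history.append(predict_labels(A))     # step (7): re-predict labels
        if len(history) >= N and all(
                np.array_equal(history[-1], h) for h in history[-N:]):
            break                             # step (9): labels stabilized
    return history[-1], A
```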

Claims (1)

1. A cross-domain adaptive image classification method for improving local category discrimination, characterized by comprising the following steps:
(1) the method comprises the steps of scanning a plurality of images line by line, sequentially arranging pixels obtained by line scanning into column vectors according to a scanning sequence, and dividing the column vectors by Euclidean norms of the column vectors to obtain a plurality of image column vectors with Euclidean norms of 1;
(2) dividing the plurality of image column vectors obtained in step (1) into a source domain training set {Z_S, Y_S} and a target domain test set {Z_T}, wherein Z_S = {z_i^S, i = 1, ..., n_S} is the set of image column vectors of the source domain training set, Y_S = {y_i^S, i = 1, ..., n_S} is the set of content classification labels of the images of the source domain training set, n_S is the number of image column vectors in the source domain training set, z_i^S is the i-th image column vector in Z_S, i.e., the i-th image sample of the source domain training set, and y_i^S is the content classification label of the i-th image, i.e., y_i^S represents the object described by the image and has dimension 1; Z_T = {z_j^T, j = 1, ..., n_T}, wherein Z_T is the set of image column vectors of the target domain test set, n_T is the number of image column vectors in the target domain test set, and z_j^T is the j-th image column vector in Z_T, i.e., the j-th image sample of the target domain test set;
(3) for each source domain training set sample z_i^S of step (2), respectively computing the first column vector

x_i^S = [K(z_i^S, z_1^S), ..., K(z_i^S, z_{n_S}^S), K(z_i^S, z_1^T), ..., K(z_i^S, z_{n_T}^T)]^T,

wherein K(·,·) is a kernel function chosen arbitrarily from among the Gaussian kernel function, the hyperbolic tangent kernel function, and the linear kernel function, and the superscript T denotes matrix transposition; representing each image of the source domain training set by its first column vector x_i^S and arranging the column vectors x_i^S side by side to obtain the source domain training set matrix X_S; for each target domain sample z_j^T of step (2), respectively computing the second column vector

x_j^T = [K(z_j^T, z_1^S), ..., K(z_j^T, z_{n_S}^S), K(z_j^T, z_1^T), ..., K(z_j^T, z_{n_T}^T)]^T;

representing each image of the target domain test set by its second column vector x_j^T and arranging the column vectors x_j^T side by side to obtain the target domain test set matrix X_T; obtaining the overall data set matrix X = [X_S, X_T] from the source domain training set matrix X_S and the target domain test set matrix X_T;
(4) setting a projection matrix A^T and using it to linearly map the plurality of image column vectors obtained in step (3), i.e., linearly mapping each x_i^S and x_j^T to the projection column vectors A^T x_i^S and A^T x_j^T, respectively;
(5) taking the projection column vectors obtained after the linear mapping of step (4) as image data point samples, establishing an optimization model of cross-domain adaptive image classification features, the objective function of the optimization model comprising:

a. minimizing the squared sample estimate MMD^2(S, T) of the maximum mean distance between the probability distribution of the image samples of the source domain training set and that of the image samples of the target domain test set:

MMD^2(S, T) = Tr(A^T X M X^T A),

wherein Tr denotes the trace of a matrix, i.e., the sum of its diagonal elements, and M is the maximum mean distance matrix

M = [ (1/n_S^2)·1          -(1/(n_S n_T))·1
      -(1/(n_S n_T))·1     (1/n_T^2)·1      ],

wherein 1 denotes an all-ones matrix;
b. according to the classes of the content classification labels of step (2), minimizing the sum over classes of the squared maximum mean distance sample estimates between the per-class sample distribution of the source domain training set and that of the target domain test set:

sum_{c=1}^{C} MMD_c^2(S, T) = sum_{c=1}^{C} Tr(A^T X M_c X^T A),

wherein C is the number of image sample classes, ŷ_j^T denotes the predicted content classification label (of dimension 1) temporarily assigned to data point x_j^T at the current step, n_S^c is the number of image samples in the source domain training set whose content classification label is c, n_T^c is the number of image samples in the target domain test set whose current predicted content classification label is c, and M_c is the maximum mean distance matrix of the image samples with content classification label c:

M_c = [ (1/(n_S^c)^2) e_Sc e_Sc^T            -(1/(n_S^c n_T^c)) e_Sc e_Tc^T
        -(1/(n_S^c n_T^c)) e_Tc e_Sc^T       (1/(n_T^c)^2) e_Tc e_Tc^T       ],

wherein e_Sc is a column vector of length n_S composed of 0s and 1s: an element of e_Sc is 1 when the content classification label of the corresponding image in the source domain training set is c, and 0 when it is not c; e_Tc is a column vector of length n_T composed of 0s and 1s: an element of e_Tc is 1 when the current predicted content classification label of the corresponding image in the target domain test set is c, and 0 when it is not c;
c. minimizing the weighted sum of squared Euclidean distances between the projection column vectors of every pair of image samples with the same content label within the source domain training set, together with the corresponding quantity within the target domain test set:

sum_{i,j} W_ij^S ||A^T x_i^S - A^T x_j^S||^2 + sum_{k,l} W_kl^T ||A^T x_k^T - A^T x_l^T||^2
  = Tr(A^T X_S R_S X_S^T A) + Tr(A^T X_T R_T X_T^T A)
  = Tr(A^T X R X^T A),

wherein, in the first line, W_ij^S and W_kl^T are the weight coefficients for the distance between pairs of same-label image samples inside the source domain training set and the target domain test set, respectively; W_ij^S = α_c η_ij, wherein η_ij = 1 if image sample x_j^S is a same-class k-nearest neighbor of image sample x_i^S, and η_ij = 0 if it is not, the value of k being determined according to the precision of the image processing; α_c is a positive coefficient associated with class c in the source domain training set, the value of α_c being determined according to the precision of the image processing; likewise, W_kl^T = β_c η_kl, wherein η_kl = 1 if image sample x_l^T is a same-class k-nearest neighbor of image sample x_k^T, and η_kl = 0 if it is not; β_c is a positive coefficient associated with class c in the target domain test set, the value of β_c being determined according to the precision of the image processing;

in the first term of the second line, W_S is the weight matrix formed by the coefficients W_ij^S, D_S is the diagonal matrix whose diagonal elements are the row sums (D_S)_ii = sum_j W_ij^S, and R_S = D_S - W_S is the intra-class dispersion matrix of the source domain training set; in the second term of the second line, W_T is the weight matrix formed by the coefficients W_kl^T, D_T is the diagonal matrix whose diagonal elements are the row sums (D_T)_kk = sum_l W_kl^T, and R_T = D_T - W_T is the intra-class dispersion matrix of the target domain; the matrix R in the third line is the block-diagonal matrix

R = [ R_S   0
      0     R_T ],

wherein 0 denotes a matrix whose elements are all 0;
d. maximizing the weighted sum of squared Euclidean distances between the projection column vectors of every pair of image samples with different content labels within the source domain training set, together with the corresponding quantity within the target domain test set:

sum_{i,j} U_ij^S ||A^T x_i^S - A^T x_j^S||^2 + sum_{k,l} U_kl^T ||A^T x_k^T - A^T x_l^T||^2
  = Tr(A^T X_S P_S X_S^T A) + Tr(A^T X_T P_T X_T^T A)
  = Tr(A^T X P X^T A),

wherein, in the first line, U_ij^S and U_kl^T are the weight coefficients for the distance between pairs of differently labeled image samples inside the source domain training set and the target domain test set, respectively: U_ij^S = 1 if image sample x_j^S is a different-class neighbor of image sample x_i^S, and U_ij^S = 0 if it is not; likewise, U_kl^T = 1 if image sample x_l^T is a different-class neighbor of image sample x_k^T, and U_kl^T = 0 if it is not;

in the first term of the second line, U_S is the weight matrix formed by the coefficients U_ij^S, B_S is the diagonal matrix whose diagonal elements are the row sums (B_S)_ii = sum_j U_ij^S, and P_S = B_S - U_S is the inter-class dispersion matrix of the source domain training set; in the second term of the second line, U_T is the weight matrix formed by the coefficients U_kl^T, B_T is the diagonal matrix whose diagonal elements are the row sums (B_T)_kk = sum_l U_kl^T, and P_T = B_T - U_T is the inter-class dispersion matrix of the target domain test set; the matrix P in the third line is the block-diagonal matrix

P = [ P_S   0
      0     P_T ];
e. minimizing the regularization term of the projection matrix A^T of step (4):

λ ||A||_F^2,

wherein ||A||_F^2 is the sum of the squares of all elements of the matrix A and λ is a positive coefficient whose value, taken according to the image classification precision, is 1;

according to the above objective function, the optimization model of cross-domain adaptive image classification features is obtained as:

min_A  Tr(A^T X (M + sum_{c=1}^{C} M_c + R) X^T A) + λ ||A||_F^2,   s.t.  A^T X P X^T A = I;
(6) solving the optimization model of the cross-domain adaptive image classification features, wherein in the first iteration, no predicted labels being yet available for the target domain, the model is initialized without the label-dependent terms:

min_A  Tr(A^T X M X^T A) + λ ||A||_F^2,   s.t.  A^T X X^T A = I,

wherein I is an identity matrix; the optimization model is solved by the generalized eigenvalue equation

(X M X^T + λ I) A = X X^T A Θ

to obtain the intermediate optimal solution A*, wherein Θ is the diagonal matrix whose diagonal elements are the generalized eigenvalues of the matrix X M X^T + λI relative to the matrix X X^T; solving for the n_S + n_T generalized eigenvalues of X M X^T + λI relative to X X^T and the n_S + n_T corresponding generalized eigenvector columns; selecting the m smallest generalized eigenvalues, arranging them in ascending order, and arranging the m corresponding generalized eigenvector columns side by side in the same order to obtain the matrix A*, A* being the first intermediate optimal solution of the above optimization model;
(7) using the first intermediate optimal solution A* obtained in step (6), linearly mapping the original image column vectors x_i^S and x_j^T to obtain the image sample column vectors y_i^S = A*^T x_i^S and y_j^T = A*^T x_j^T; using the column vectors y_i^S, with their known labels, as the training set, and predicting the image content labels of the column vectors y_j^T by the nearest neighbor method, obtaining a group of predicted content labels ŷ_j^T for the target domain test set samples;
(8) substituting the predicted content labels obtained in step (7) into the complete optimization model of step (5) and solving it, obtaining the intermediate optimal solution A* of the optimization model from the generalized eigenvalue equation

(X (M + sum_{c=1}^{C} M_c + R) X^T + λ I) A = X P X^T A Θ,

wherein Θ is the diagonal matrix whose diagonal elements are the generalized eigenvalues of the matrix X(M + sum_c M_c + R)X^T + λI relative to the matrix X P X^T; solving for the n_S + n_T generalized eigenvalues of that matrix relative to X P X^T and the n_S + n_T corresponding generalized eigenvector columns; selecting the m smallest generalized eigenvalues, arranging them in ascending order, and arranging the m corresponding generalized eigenvector columns side by side in the same order to obtain the projection matrix A*, A* being the second intermediate optimal solution of the above optimization model;
(9) replacing the first intermediate optimal solution in step (7) with the second intermediate optimal solution obtained in step (8) and repeating steps (7) and (8); examining the predicted content labels ŷ_j^T produced by step (7) over N consecutive cycles: if the predicted content labels obtained in the last N cycles are identical, ending the iteration and taking the predicted content labels obtained in step (7) of the last iteration as the prediction result, i.e., the image classification result, thereby realizing cross-domain adaptive image classification with improved local category discrimination; if the predicted content labels obtained in the N cycles are not identical, returning to step (7), replacing the first intermediate optimal solution of step (7) with the second intermediate optimal solution of step (8), and repeating steps (7) and (8), the value of N being determined according to the image classification precision.
CN201910190041.4A 2019-03-13 2019-03-13 Cross-domain self-adaptive image classification method for improving local category discrimination Active CN110020674B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910190041.4A CN110020674B (en) 2019-03-13 2019-03-13 Cross-domain self-adaptive image classification method for improving local category discrimination


Publications (2)

Publication Number Publication Date
CN110020674A CN110020674A (en) 2019-07-16
CN110020674B true CN110020674B (en) 2021-01-29






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant