CN111639686B

CN111639686B - Semi-supervised classification method based on dimension weighting and visual angle feature consistency

Info

Publication number: CN111639686B
Application number: CN202010416737.7A
Authority: CN
Inventors: 聂飞平; 石少君; 王榕; 李学龙
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2020-05-17
Filing date: 2020-05-17
Publication date: 2022-03-15
Anticipated expiration: 2040-05-17
Also published as: CN111639686A

Abstract

The invention provides a semi-supervised classification method based on dimension weighting and view feature consistency. First, a similarity matrix is constructed for each view of the multi-view data using an adaptive local structure learning method. Next, the average of the similarity matrices of all views is taken as the initial consistency similarity matrix, and a multi-view semi-supervised classification model based on dimension weighting and view feature consistency is constructed. The model is then solved by alternating iterative updates until the final label matrix is obtained. Finally, the label of each sample is obtained from the label matrix, completing the classification. Because the model combines the construction of the similarity matrix with label inference, the influence of graph construction quality on the classification result is reduced; and because the feature dimensions within each view are weighted and the local structure information of the data is considered, better classification results can be obtained.

Description

Semi-supervised classification method based on dimension weighting and visual angle feature consistency

Technical Field

The invention belongs to the technical field of machine learning and data mining, and particularly relates to a semi-supervised classification method based on dimension weighting and view angle feature consistency.

Background

With the advent of the big data age, information in many real scenes can be obtained through different channels, different angles, different modalities and different features. For the multi-source data, how to efficiently and accurately fuse the information through a certain strategy to complete a specific task has important research significance in practical scenes.

Under the assumption that the multi-view data sets have "complementarity" and "consistency", multi-view learning refers to a method for describing a researched object from multiple angles and then integrating information of multiple angles for learning. Semi-supervised classification refers to training a classifier with a small number of labeled samples and unlabeled samples, and then using a learned classifier to infer the labels of the unlabeled samples. In an actual scene, a generally obtained multi-view data set has a small number of labels, and a large amount of manpower and material resources are consumed for labeling the data set. Therefore, it is of great research value to label unlabeled samples with a small number of labels in combination with information from multiple perspectives.

Traditional multi-view semi-supervised classification methods fall into three main categories: 1) co-training; 2) graph-based multi-view semi-supervised classification; 3) regression-based multi-view semi-supervised classification. In graph-based semi-supervised classification, samples are the nodes of a graph, and the similarity between any two nodes is the strength of the edge between them. The graph-based semi-supervised learning process is thus analogous to a coloring process in which the labels of the labeled nodes spread along the edges to the unlabeled nodes. Nie et al., in "F. Nie, J. Li, and X. Li, Parameter-free auto-weighted multiple graph learning: A framework for multiview clustering and semi-supervised classification, in Proc. IJCAI, 2016, pp. 1881-1887", first construct a similarity matrix and then infer the labels of the unlabeled samples from the labeled sample information and the constructed similarity graph. Yang et al., in "M. Yang, C. Deng, and F. Nie, Adaptive-weighting discriminative regression for multi-view classification, Pattern Recognition, vol. 88, pp. 236-245, 2019", classify multi-view data sets using the idea of adaptive discriminative regression. Considering that real data sets contain only a small number of labels, Tao et al., in "H. Tao, C. Hou, F. Nie, J. Zhu, and D. Yi, Scalable Multi-View Semi-Supervised Classification via Adaptive Regression, IEEE Transactions on Image Processing, vol. 26, no. 9, pp. 4283-4296", establish a regression-based semi-supervised classification model for each view. To make the model robust to noise and outliers, that method also uses the $L_{2,1}$ norm, and because each view has a different importance for the classification result, the model assigns view weights adaptively. However, these regression-based methods consider only a linear relationship between samples and labels and therefore cannot capture nonlinear relationships.

In graph-based semi-supervised classification, the quality of the constructed similarity graph greatly affects the final classification result, and because the construction of the similarity graph and the label inference are treated as two separate steps, the relationship between them is ignored. In addition, these methods consider only the differences between the features of different views and ignore the differences between the dimensions within a view, and thereby ignore the local structure information of the data. The classification accuracy of these methods suffers as a result.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a semi-supervised classification method based on dimension weighting and view feature consistency. First, a similarity matrix is constructed for each view of the multi-view data using an adaptive local structure learning method. Next, the average of the similarity matrices of all views is taken as the initial consistency similarity matrix, and a multi-view semi-supervised classification model based on dimension weighting and view feature consistency is constructed. The model is then solved by alternating iterative updates until the final label matrix is obtained. Finally, the label of each sample is obtained from the label matrix, completing the classification. The method learns the similarity matrix and propagates the labels simultaneously, which reduces the dependence of label propagation on the quality of the similarity matrix. In addition, weighting the feature dimensions within each view when the similarity matrix is updated improves the semi-supervised classification result.

A semi-supervised classification method based on dimension weighting and visual angle feature consistency is characterized by comprising the following steps:

Step 1: let $\chi = \{X^1, X^2, \ldots, X^V\}$ denote a multi-view data set, in which $X^v \in R^{n \times d(v)}$ represents the features of the $v$-th view, $v = 1, 2, \ldots, V$, $V$ being the number of views, $n$ the number of samples, and $d(v)$ the dimensionality of the $v$-th view features; the number of categories of the data set is set as $C$;

the distance $d_{ij}^v$ from the $i$-th sample point $x_i^v$ to the $j$-th sample point $x_j^v$ in the $v$-th view is calculated, $i, j = 1, 2, \ldots, n$; for each sample point, the distances from all other sample points to that sample point are sorted in ascending order, and the $k$ sample points with the smallest distances are selected as its neighbors; the similarity $s_{ij}^v$ between the $i$-th sample point $x_i^v$ and the $j$-th sample point $x_j^v$ is then calculated, wherein $d_{i,k+1}^v$ denotes the distance between the sample point $x_i^v$ and its $(k+1)$-th nearest sample point, the value of $k$ ranges from 5 to 15, and $s_{ii}^v = 0$ when $i = j$; taking $s_{ij}^v$ as the element in row $i$ and column $j$, the similarity matrix $S^v \in R^{n \times n}$ of the $v$-th view is obtained, $v = 1, 2, \ldots, V$;

Step 2: the similarity matrices of all $V$ views are summed and averaged to obtain the initial consistency similarity matrix $S$; the initial Laplacian matrix $L_S$ is then calculated according to $L_S = D_S - S$, wherein $D_S$ is the degree matrix of $S$, a diagonal matrix with diagonal elements $D_{ii}$, $i = 1, 2, \ldots, n$; the label matrix is $F = [F_l; F_u]$ with $F_l = Y_l$, where $Y_l \in R^{l \times C}$ is the label matrix of the samples with known labels, $F_u \in R^{u \times C}$ is the label matrix of the unlabeled samples, $u = n - l$, and $l$ is the number of samples with known labels; $F$ is initialized with the first $C$ eigenvectors of the matrix $S$; the weight matrix $\Theta^v$ of the $v$-th view is initialized according to $\Theta^v_{ii} = 1/d(v)$, where $\Theta^v \in R^{d(v) \times d(v)}$ is a diagonal matrix and $\Theta^v_{ii}$ is its $i$-th diagonal element, $i = 1, 2, \ldots, d(v)$, $v = 1, 2, \ldots, V$;

Step 3: the multi-view semi-supervised classification model is constructed, wherein $s_{ij}$ denotes the element in row $i$ and column $j$ of the consistency similarity matrix $S$, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\theta^v$ denotes the vector formed by the diagonal elements of the weight matrix $\Theta^v$, $\mathbf{1}$ denotes a column vector whose elements are all 1, and $\gamma$ and $\lambda$ are regularization parameters, $\gamma > 0$, $\lambda > 0$;

Step 4: taking all the matrices obtained in step 2 as initial values, the semi-supervised classification model of step 3 is solved by alternating iteration according to the following procedure until the final label matrix $F$ is obtained:

Step 4.1: fix $\Theta$ and $F$, and update $S$, wherein $s_i$ denotes the $i$-th row vector of the matrix $S$, and $d_i$ denotes a vector whose $j$-th element depends on $f_i$ and $f_j$, the $i$-th and $j$-th row vectors of the matrix $F$, $i, j = 1, 2, \ldots, n$;

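Step 4.2: fix $S$ and $\Theta$, and update $F$:

first, the degree matrix $D_S$ is updated, wherein $D_{ii}$ is the $i$-th diagonal element of the diagonal matrix $D_S$, $i = 1, 2, \ldots, n$;

the Laplacian matrix is then updated as follows:

$$L_S = D_S - S \tag{6}$$

the Laplacian matrix $L_S$ is partitioned into blocks at row $l$ and column $l$:

$$L_S = \begin{bmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{bmatrix}$$

wherein $L_{ll}$ is a matrix of size $l \times l$, $L_{lu}$ is a matrix of size $l \times u$, $L_{ul}$ is a matrix of size $u \times l$, and $L_{uu}$ is a matrix of size $u \times u$;

the consistency similarity matrix $S$ and the degree matrix $D_S$ are partitioned in the same way:

$$S = \begin{bmatrix} S_{ll} & S_{lu} \\ S_{ul} & S_{uu} \end{bmatrix}, \qquad D_S = \begin{bmatrix} D_{ll} & 0 \\ 0 & D_{uu} \end{bmatrix}$$

the label matrix $F_u$ of the unlabeled samples is updated as follows:

$$F_u = (I - P_{uu})^{-1} P_{ul} F_l \tag{10}$$

wherein $P_{uu} = D_{uu}^{-1} S_{uu}$ and $P_{ul} = D_{uu}^{-1} S_{ul}$;

finally, the label matrix $F$ is updated according to $F = [F_l; F_u]$;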

Step 4.3: fix $F$ and $S$, and update $\Theta$, wherein $\theta^v$ denotes the vector formed by the diagonal elements of the weight matrix $\Theta^v$, and $W^v$ is a diagonal matrix for the $v$-th view whose $i$-th diagonal element is the $i$-th diagonal element of the matrix $M^v$, $M^v = (X^v)^T L_S X^v$;

Step 4.4: iteration stop judgment:

the $S$, $F$, $L_S$ and $\Theta$ obtained in the previous update and in the current update are each substituted into the objective function of the model in step 3;

if the difference between the two resulting objective function values $Z$ is smaller than a set threshold, the iteration stops and the current $F$ is the final label matrix; otherwise, return to step 4.1 and continue the iterative update;

Step 5: the label of each sample is obtained as follows:

$$y_i = \arg\max_{1 \le j \le C} F_{ij}, \quad i = 1, 2, \ldots, n \tag{13}$$

wherein $y_i$ denotes the label of the $i$-th sample point, and $F_{ij}$ denotes the element in row $i$ and column $j$ of the final label matrix $F$ obtained in step 4;

samples with the same label are grouped into one class to obtain the classification result.

Further, the regularization parameter γ stated in step 3 is

Further, the threshold value set in step 4 is $10^{-8}$.

The beneficial effects of the invention are as follows. Because the construction of the similarity matrix is combined with label inference, labels are propagated while the similarity matrix is being learned, which reduces the influence of graph construction quality on the classification result. Because the feature dimensions within each view are weighted, the differences between the dimensions within a view are taken into account, and the relationships among those feature dimensions can be better exploited. Because the local structure information of the data is considered, a better neighbor assignment is obtained, and with it a better classification result.

Drawings

FIG. 1 is a flow chart of a semi-supervised classification method based on dimension weighting and view feature consistency according to the present invention;

FIG. 2 is a schematic diagram of a simulation data set;

In FIG. 2, (a) shows the first view of the simulated data set and (b) shows the second view of the simulated data set.

Detailed Description

The present invention will be further described with reference to the following drawings and examples, which include, but are not limited to, the following examples.

As shown in fig. 1, the present invention provides a semi-supervised classification method based on dimension weighting and view feature consistency, which is implemented as follows:

1. Initializing the similarity matrix

Let $\chi = \{X^1, X^2, \ldots, X^V\}$ denote a multi-view data set, in which $X^v \in R^{n \times d(v)}$ represents the features of the $v$-th view, $v = 1, 2, \ldots, V$, $V$ being the number of views, $n$ the number of samples, and $d(v)$ the dimensionality of the $v$-th view features.

In Euclidean space, the closer two samples are, the higher their similarity, and the two samples should have the same output class. Furthermore, multi-view data sets exhibit complementarity and consistency. The initial similarity matrix $S^v$ of the $v$-th view can therefore be obtained by solving objective function (14), wherein $s_{ij}^v$ denotes the element in row $i$ and column $j$ of the matrix $S^v$, $i, j = 1, 2, \ldots, n$.

The first term of equation (14) measures the correlation within the multi-view data; the second term is a regularization term that avoids the trivial solution in which, in the $v$-th view, the sample point nearest to the sample $x_i^v$ is assigned probability 1 and all other sample points are assigned 0. To improve the efficiency of label propagation, the invention uses this model to construct a sparse similarity matrix. Specifically, for a sample point $x_i^v$, the distances from the other sample points to it are measured and sorted in ascending order, and the $k$ sample points with the smallest distances are selected as its neighbors. Weights are assigned by the $k$-nearest-neighbor method: when $j \le k$, $s_{ij}^v$ is a weight determined by $d_{ij}^v$ and $d_{i,k+1}^v$, and when $j > k$, $s_{ij}^v = 0$, where $j$ is the index of the sample point after sorting by distance and $d_{i,k+1}^v$ denotes the distance between the $(k+1)$-th sample point after sorting and the sample point $x_i^v$. Here $k$ is a parameter to be set in advance, usually in the range $5 \le k \le 15$.
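The neighbor assignment described above can be sketched in plain Python. This is a minimal illustration, not the patent's own formula: it assumes squared Euclidean distances and the closed form commonly used for adaptive-neighbor graphs, $s_{ij} = (d_{i,k+1} - d_{ij}) / (k\,d_{i,k+1} - \sum_{h=1}^{k} d_{ih})$. The function name `adaptive_neighbor_similarity` is hypothetical.

```python
def adaptive_neighbor_similarity(X, k):
    """Build a sparse similarity matrix for one view.

    X is a list of n feature vectors (lists of floats) and k is the number
    of neighbors; n must be at least k + 2.
    ASSUMPTION: squared Euclidean distance and the closed-form weight
    (d_{i,k+1} - d_ij) / (k * d_{i,k+1} - sum of the k smallest distances).
    """
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Squared distance from sample i to every other sample, sorted ascending.
        dist = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        d_k1 = dist[k][0]  # distance to the (k+1)-th nearest sample
        denom = k * d_k1 - sum(d for d, _ in dist[:k])
        for d, j in dist[:k]:
            # Uniform fallback when the k+1 nearest distances are all equal.
            S[i][j] = (d_k1 - d) / denom if denom > 0 else 1.0 / k
    return S
```

Each row has at most $k$ nonzero entries, sums to 1, and has a zero on the diagonal, which matches the sparsity and the $s_{ii}^v = 0$ convention described in the text.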

In addition, the regularization parameter $\gamma$ is determined by differentiating the Lagrangian function of equation (14) and applying the KKT conditions.
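The derivation can be sketched as follows. The formula for $\gamma$ is not legible in this text, so the per-row objective, the multiplier names $\eta$ and $\beta_j$, and the resulting expressions below are assumptions based on the standard adaptive-neighbor formulation rather than the patent's own equations.

```latex
% Assumed per-row form of (14); d_{ij}^v is the distance between x_i^v and x_j^v.
\min_{s_i^v}\ \sum_{j=1}^{n}\Bigl(d_{ij}^v\, s_{ij}^v + \gamma\,(s_{ij}^v)^2\Bigr)
\quad\text{s.t.}\quad (s_i^v)^T\mathbf{1}=1,\ \ s_{ij}^v\ge 0 .

% Lagrangian with multipliers \eta and \beta_j \ge 0:
\mathcal{L}=\sum_{j}\Bigl(d_{ij}^v\, s_{ij}^v+\gamma\,(s_{ij}^v)^2\Bigr)
  -\eta\Bigl(\sum_{j} s_{ij}^v-1\Bigr)-\sum_{j}\beta_j\, s_{ij}^v .

% Stationarity plus complementary slackness (KKT) give
s_{ij}^v=\Bigl(\frac{\eta-d_{ij}^v}{2\gamma}\Bigr)_{+} .

% Sort the distances ascending and require exactly k nonzero entries.
% The sum-to-one constraint fixes \eta; the largest admissible \gamma is
\eta=\frac{2\gamma+\sum_{h=1}^{k} d_{ih}^v}{k},
\qquad
\gamma=\frac{k}{2}\,d_{i,k+1}^v-\frac{1}{2}\sum_{h=1}^{k} d_{ih}^v .

% Substituting back:
s_{ij}^v=\frac{d_{i,k+1}^v-d_{ij}^v}{k\,d_{i,k+1}^v-\sum_{h=1}^{k} d_{ih}^v}\ \ (j\le k),
\qquad s_{ij}^v=0\ \ (j>k).
```

Under these assumptions the choice of $\gamma$ is tied directly to the neighbor count $k$, so $k$ is the only parameter that has to be set by hand.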

2. Construction of multi-view semi-supervised classification model

After the initial similarity matrix $S^v$ of each view is obtained, a consistency similarity matrix $S$ must be learned for the multi-view data set, so the initial consistency similarity matrix is set to the average of the per-view matrices, $S = \frac{1}{V}\sum_{v=1}^{V} S^v$. The degree matrix of $S$ is denoted $D_S$; it is a diagonal matrix with diagonal elements $D_{ii}$, $i = 1, 2, \ldots, n$.
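A minimal sketch of this initialization in plain Python follows. The degree formula is not legible in this text, so the sketch assumes the usual row-sum degree $D_{ii} = \sum_j s_{ij}$; the function name `init_consensus` is hypothetical, and the eigenvector-based initialization of $F$ is omitted.

```python
def init_consensus(S_views, dims):
    """Average per-view similarity matrices and build D_S, L_S and theta.

    S_views: list of V similarity matrices (n x n nested lists).
    dims:    list of the V feature dimensionalities d(v).
    ASSUMPTION: the degree of node i is the i-th row sum of S.
    """
    V, n = len(S_views), len(S_views[0])
    # Initial consistency similarity matrix: element-wise mean over views.
    S = [[sum(Sv[i][j] for Sv in S_views) / V for j in range(n)]
         for i in range(n)]
    degree = [sum(row) for row in S]  # diagonal of D_S
    # Laplacian L_S = D_S - S.
    L = [[(degree[i] if i == j else 0.0) - S[i][j] for j in range(n)]
         for i in range(n)]
    # Diagonal of each Theta^v, initialized uniformly to 1/d(v).
    theta = [[1.0 / d] * d for d in dims]
    return S, degree, L, theta
```

With the row-sum degree, every row of $L_S$ sums to zero, which is a quick sanity check on the construction.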

To enable label propagation while the similarity matrix is being learned, model (15) is solved, wherein $s_{ij}$ denotes the element in row $i$ and column $j$ of the consistency similarity matrix $S$, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\theta^v$ is the vector formed by the diagonal elements of the weight matrix $\Theta^v$, and $\gamma$ is a regularization parameter, $\gamma > 0$. $L_S$ is the Laplacian matrix, which is positive semi-definite and is initially calculated as $L_S = D_S - S$. $F = [F_l; F_u]$ denotes the label matrix, which consists of two parts: $F_l = Y_l$, where $Y_l \in R^{l \times C}$ is the label matrix of the first $l$ samples, and $F_u$, the label matrix of the unlabeled samples. The initial value of $F$ is obtained from the first $C$ eigenvectors of the similarity matrix $S$ above; $u = n - l$, and $l$ is the number of samples with known labels.

The first and second terms of the objective function above represent the learning of the similarity graph, and the third term represents label propagation. Combining them ensures that labels are propagated while the similarity graph is being constructed.

In the semi-supervised classification model described above, only the "complementarity" and "consistency" information among the views is used to construct the similarity matrix. However, within each view the different feature dimensions also differ from one another. To account for the influence of the individual dimensions within a view on the classification result, the feature dimensions within each view can be weighted adaptively. Let the weight matrix of the $v$-th view be $\Theta^v$, where $\Theta^v \in R^{d(v) \times d(v)}$ is a diagonal matrix whose diagonal elements are initialized as $\Theta^v_{ii} = 1/d(v)$. This gives the multi-view semi-supervised classification model (16) based on dimension weighting and view feature consistency, wherein $\theta^v$ denotes the vector formed by the diagonal elements of the weight matrix $\Theta^v$. In model (16), $\Theta$, $F$ and $S$ are the unknowns to be learned, and the model can be solved by an alternating iterative update algorithm.
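For orientation, one form of model (16) that is consistent with the symbol definitions and with the three update steps described in this document is written out below. The equation itself is not legible in this text, so the exact placement of $\lambda$, the form of the weighted distance, and the constraint set are assumptions.

```latex
% Hypothetical reconstruction of model (16); not the patent's verbatim formula.
\min_{S,\,F,\,\Theta}\;
  \sum_{v=1}^{V}\sum_{i,j=1}^{n}
    \bigl\|\Theta^v x_i^v-\Theta^v x_j^v\bigr\|_2^2\, s_{ij}
  \;+\;\gamma\,\|S\|_F^2
  \;+\;\lambda\,\operatorname{Tr}\!\bigl(F^T L_S F\bigr)

\text{s.t.}\quad
  s_i^T\mathbf{1}=1,\quad 0\le s_{ij}\le 1,\quad
  F_l=Y_l,\quad
  (\theta^v)^T\mathbf{1}=1,\quad \theta^v\ge 0 .
```

In this form the first two terms depend only on $S$ and $\Theta$ and the third only on $S$ and $F$, which agrees with the statements that two of the three terms are constant in each of the $F$ and $\Theta$ subproblems, and the sum-to-one constraint on $\theta^v$ agrees with the initialization $\Theta^v_{ii} = 1/d(v)$.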

3. Alternating iteration updating and solving multi-view semi-supervised classification model

With the initial values of $\Theta$, $F$, $S$ and $L_S$ obtained above, the final label matrix $F$ is obtained by alternating iterative updates, as follows:

(1) Fix $F$ and $\Theta$, update $S$

When $F$ and $\Theta$ are fixed, model (16) reduces to a minimization problem over $S$. After the terms of the objective are rewritten as sums over pairs of samples and a per-pair cost is defined, the problem is independent for each $i$, so it is equivalent to solving problem (19) for each row vector $s_i$. Forming the Lagrangian function of equation (19) and differentiating it with respect to $s_i$ yields the update of $s_i$.
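The row-wise update can be sketched as a projection onto the probability simplex. This assumes that problem (19) has the common form $\min_{s_i} \| s_i + d_i / (2\gamma) \|_2^2$ subject to $s_i^T \mathbf{1} = 1$, $s_i \ge 0$; that form, and the function names below, are assumptions and are not taken from the patent text.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, tau = 0.0, 0.0
    for idx, value in enumerate(u, start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / idx
        # The last index at which this holds determines the threshold tau.
        if value - candidate > 0:
            tau = candidate
    return [max(x - tau, 0.0) for x in v]


def update_similarity_row(d_i, gamma):
    """Assumed update for one row s_i given its cost vector d_i.

    d_i should contain the costs to the *other* samples; including the
    zero self-cost would put the largest weight on the diagonal.
    """
    return project_to_simplex([-d / (2.0 * gamma) for d in d_i])
```

For example, `update_similarity_row([0.0, 1.0, 4.0], gamma=1.0)` returns `[0.75, 0.25, 0.0]`: small costs receive large weights and distant samples are cut to exactly zero, which is what keeps $S$ sparse.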

(2) Fix $S$ and $\Theta$, update $F$

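When $S$ and $\Theta$ are fixed, the first two terms of the objective function in equation (16) are constant, so the problem is equivalent to minimizing the label propagation term, problem (20), with respect to $F$.

The matrices $F$, $D_S$, $S$ and $L_S = D_S - S$ are each written in block form, i.e. $F = [F_l; F_u]$ and

$$L_S = \begin{bmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{bmatrix}, \qquad S = \begin{bmatrix} S_{ll} & S_{lu} \\ S_{ul} & S_{uu} \end{bmatrix}, \qquad D_S = \begin{bmatrix} D_{ll} & 0 \\ 0 & D_{uu} \end{bmatrix}$$

wherein $F_l$ is a matrix of size $l \times C$, $F_u$ is a matrix of size $u \times C$, $L_{ll}$ is a matrix of size $l \times l$, $L_{lu}$ is a matrix of size $l \times u$, $L_{ul}$ is a matrix of size $u \times l$, $L_{uu}$ is a matrix of size $u \times u$, and $u = n - l$.

Equation (20) can then be converted into problem (21), expressed in the blocks of $L_S$. Taking the derivative of the Lagrangian function of equation (21) with respect to $F_u$ and setting it to 0 gives $F_u = -L_{uu}^{-1} L_{ul} Y_l = (D_{uu} - S_{uu})^{-1} S_{ul} Y_l$, where the second equality uses $L_{ul} = -S_{ul}$, which holds because $D_S$ is diagonal. Defining $P_{uu} = D_{uu}^{-1} S_{uu}$ and $P_{ul} = D_{uu}^{-1} S_{ul}$, $F_u$ can be written as $F_u = (I - P_{uu})^{-1} P_{ul} Y_l$.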
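The closed form above can also be reached without an explicit matrix inverse. The sketch below swaps in a fixed-point iteration, $F_u \leftarrow P_{uu} F_u + P_{ul} F_l$, whose limit is $(I - P_{uu})^{-1} P_{ul} F_l$ whenever every unlabeled sample is connected, directly or indirectly, to a labeled one. It assumes the row-sum degree $D_{ii} = \sum_j s_{ij}$ and that the labeled samples come first; the function name `propagate_labels` is hypothetical.

```python
def propagate_labels(S, F_l, n_iter=1000, tol=1e-12):
    """Iteratively solve F_u = P_uu F_u + P_ul F_l.

    S:   n x n similarity matrix, labeled samples in the first l rows.
    F_l: l x C label matrix of the labeled samples (e.g. one-hot rows).
    ASSUMPTION: P = D^{-1} S with D_ii equal to the i-th row sum of S.
    """
    n, l, C = len(S), len(F_l), len(F_l[0])
    # Row-normalized similarities for the unlabeled rows only.
    P_u = [[s / sum(S[i]) for s in S[i]] for i in range(l, n)]
    F_u = [[0.0] * C for _ in range(n - l)]
    for _ in range(n_iter):
        new = []
        for r in range(n - l):
            row = []
            for c in range(C):
                from_labeled = sum(P_u[r][j] * F_l[j][c] for j in range(l))
                from_unlabeled = sum(P_u[r][j] * F_u[j - l][c]
                                     for j in range(l, n))
                row.append(from_labeled + from_unlabeled)
            new.append(row)
        change = max(abs(a - b)
                     for r1, r2 in zip(new, F_u) for a, b in zip(r1, r2))
        F_u = new
        if change < tol:
            break
    return F_u
```

For small problems a direct linear solve is simpler; the iteration is shown here because it needs only matrix-vector products and makes the "labels flow along edges" interpretation explicit.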

(3) Fix F and S, update Θ

When $F$ and $S$ are fixed, the second and third terms of the objective function in equation (16) are constant, so the problem reduces to problem (22) in $\Theta$. Since the views are independent of one another, solving equation (22) is equivalent to solving a separate problem for each view $v$, wherein $(X^v)^T$ denotes the transpose of the features of the $v$-th view, $M^v = (X^v)^T L_S X^v$, and $W^v$ is a diagonal matrix for the $v$-th view whose $i$-th diagonal element is the $i$-th diagonal element of the matrix $M^v$.
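A minimal sketch of the per-view weight update follows. The update formula is not legible in this text; the sketch assumes the per-view problem is $\min_{\theta^v} (\theta^v)^T W^v \theta^v$ subject to $(\theta^v)^T \mathbf{1} = 1$, $\theta^v \ge 0$, whose Lagrangian solution is $\theta^v_i = (1/w_i) / \sum_j (1/w_j)$ with $w_i = W^v_{ii}$. Both that problem form and the function name are assumptions.

```python
def update_dimension_weights(w):
    """Closed-form weights for one view under the assumed quadratic problem.

    w: strictly positive diagonal entries of W^v (the diagonal of
       (X^v)^T L_S X^v).
    Returns theta with theta_i proportional to 1 / w_i and sum(theta) == 1.
    """
    inverse = [1.0 / w_i for w_i in w]
    total = sum(inverse)
    return [x / total for x in inverse]
```

Under this assumption a dimension with a large $W^v_{ii}$, meaning its values vary strongly across similar samples, receives a small weight, and uniform $W^v$ reproduces the initialization $1/d(v)$.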

(4) The matrices obtained in the current update are substituted into the objective function in formula (16) to compute the objective function value, and the value obtained in the previous iteration is subtracted from it. If the difference is less than a set threshold, the iterative update stops and the current $F$ is the final label matrix; otherwise, return to step (1) for the next iteration. The threshold can generally be set to $10^{-8}$.

4. Sample classification

According to the final label matrix $F$, the column index of the maximum value in each row is taken as the label of the corresponding sample: if the maximum of row $i$ is $F_{ik}$, then $k$ is the label of the $i$-th sample. Samples with the same label are grouped into one class to obtain the final classification result.
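The label read-out is a row-wise argmax. A short Python sketch, using 1-based class indices to match $1 \le j \le C$ (the function name is hypothetical):

```python
def labels_from_matrix(F):
    """Return the 1-based column index of the largest entry in each row."""
    return [max(range(len(row)), key=row.__getitem__) + 1 for row in F]
```

When a row contains ties, this sketch returns the first maximal column; the document does not specify a tie-breaking rule.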

To verify the effect of the method, classification experiments were carried out on a simulated data set and on real data sets. The classification effect is evaluated with classification accuracy (ACC), mean average precision (Map), F-measure, precision (Precision) and recall (Re-call). The larger the ACC value, the more samples are classified correctly; Map is the arithmetic mean of the average precision as used in information retrieval; F-measure is a compromise between precision and recall.

FIG. 2 shows the two views of the simulated data set, and Table 1 gives a brief description of the real data sets, where #v1 to #v6 are the feature dimensions of the first to sixth views of each data set.

MSRC-v1 is a data set containing 240 images in 8 classes, of which seven were selected for the experiment: trees, buildings, airplanes, cows, faces, cars and bicycles. With 30 images per class, this gives 210 images in total. Six features were extracted from these images: 48-dimensional color moments (CMT), 256-dimensional local binary pattern features (LBP), a 100-dimensional histogram of oriented gradients (HOG), 200-dimensional scale-invariant features (SIFT), 512-dimensional GIST features and 1320-dimensional CENTRIST features.

HW is a data set of images of the handwritten digits 0 to 9, with 200 images per class and 2000 images in total. Six features were extracted for classification: 240-dimensional pixel features (PIX) computed with a 2 × 3 window, 76-dimensional Fourier coefficient features (FOU), 216-dimensional profile correlation features (FAC), 47-dimensional Zernike moment features (ZER), 64-dimensional Karhunen-Loève coefficient features (KAR) and 6-dimensional morphological features (MOR).

The Cal101 data set contains object recognition images from 101 categories. Two subsets were used: Cal101-7, formed from 7 widely used categories with 1474 images in total, and Cal101-20, formed from 20 widely used categories with 2386 images in total. For both subsets, six commonly used features were extracted: 48-dimensional Gabor features, 40-dimensional wavelet moments, 254-dimensional CENTRIST features, a 1984-dimensional histogram of oriented gradients, 512-dimensional GIST features and 928-dimensional local binary pattern features (LBP).

For the simulated data set, the number of known sample labels per class is set to 1. The number of neighbors is set to 5 for all data sets, and the regularization parameter $\lambda$ is chosen from $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$. For each value of $\lambda$ the experiment was run 10 times, and the best result for each data set was taken as the final classification result. Gaussian Fields and Harmonic Functions (GFHF) was selected as the comparison method.

Table 2 gives the single-view and multi-view classification results for the simulated data set. Tables 3-6 give the classification results for the four real data sets MSRC-v1, HW, Cal101-7 and Cal101-20 with 10%, 20%, 30% and 40% of the sample labels known, respectively. Table 2 shows that, on the simulated data set, the method of the invention outperforms the single-view classification results on every metric. Tables 3-6 show that, on the four real data sets, the classification accuracy (ACC) of the method never decreases and in most cases increases as the proportion of known sample labels grows.

TABLE 1

TABLE 2 (simulated data set)

Method	ACC(%)	Map(%)	F-measure(%)	Precision(%)	Re-call(%)
View 1 (GFHF)	85.35	87.29	75.79	79.08	72.77
View 2 (GFHF)	77.27	67.80	67.35	73.18	62.37
The method of the invention	97.98	98.68	96.00	96.02	95.98

TABLE 3 (10% of sample labels known)

Data set	ACC(%)	Map(%)	F-measure(%)	Precision(%)	Re-call(%)
MSRC-v1	91.38	85.56	83.81	84.71	82.94
HW	97.48	95.74	95.01	95.09	94.93
Cal101-7	95.37	73.19	95.36	98.89	92.10
Cal101-20	84.13	55.80	86.56	91.34	82.27

TABLE 4 (20% of sample labels known)

Data set	ACC(%)	Map(%)	F-measure(%)	Precision(%)	Re-call(%)
MSRC-v1	91.79	86.03	84.33	84.91	83.76
HW	97.74	96.14	95.53	95.57	95.49
Cal101-7	96.51	79.34	97.04	99.15	95.02
Cal101-20	87.60	62.87	89.92	92.15	87.82

TABLE 5 (30% of sample labels known)

Data set	ACC(%)	Map(%)	F-measure(%)	Precision(%)	Re-call(%)
MSRC-v1	93.06	87.92	86.57	87.04	86.11
HW	98.03	96.67	96.09	96.12	96.06
Cal101-7	97.12	82.92	97.75	99.00	96.53
Cal101-20	88.51	65.20	90.81	92.22	89.46

TABLE 6 (40% of sample labels known)

Data set	ACC(%)	Map(%)	F-measure(%)	Precision(%)	Re-call(%)
MSRC-v1	93.49	89.31	87.28	87.67	86.90
HW	98.03	96.62	96.11	96.13	96.08
Cal101-7	97.37	82.83	98.11	98.89	97.35
Cal101-20	89.23	67.33	91.41	92.35	90.50

Claims

1. A semi-supervised classification method based on dimension weighting and visual angle feature consistency is characterized by comprising the following steps:

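Step 1: let $\chi = \{X^1, X^2, \ldots, X^V\}$ denote a multi-view data set, in which $X^v \in R^{n \times d(v)}$ represents the features of the $v$-th view, $v = 1, 2, \ldots, V$, $V$ being the number of views, $n$ the number of samples, and $d(v)$ the dimensionality of the $v$-th view features; the number of categories of the data set is set as $C$;

the distance $d_{ij}^v$ from the $i$-th sample point $x_i^v$ to the $j$-th sample point $x_j^v$ in the $v$-th view is calculated, $i, j = 1, 2, \ldots, n$; for each sample point, the distances from all other sample points to that sample point are sorted in ascending order, and the $k$ sample points with the smallest distances are selected as its neighbors; the similarity $s_{ij}^v$ between the $i$-th sample point $x_i^v$ and the $j$-th sample point $x_j^v$ is then calculated, wherein $d_{i,k+1}^v$ denotes the distance between the sample point $x_i^v$ and its $(k+1)$-th nearest sample point, the value of $k$ ranges from 5 to 15, and $s_{ii}^v = 0$ when $i = j$; taking $s_{ij}^v$ as the element in row $i$ and column $j$, the similarity matrix $S^v \in R^{n \times n}$ of the $v$-th view is obtained, $v = 1, 2, \ldots, V$;

Step 2: the similarity matrices of all $V$ views are summed and averaged to obtain the initial consistency similarity matrix $S$; the initial Laplacian matrix $L_S$ is then calculated according to $L_S = D_S - S$, wherein $D_S$ is the degree matrix of $S$, a diagonal matrix with diagonal elements $D_{ii}$, $i = 1, 2, \ldots, n$; the label matrix is $F = [F_l; F_u]$ with $F_l = Y_l$, where $Y_l \in R^{l \times C}$ is the label matrix of the samples with known labels, $F_u \in R^{u \times C}$ is the label matrix of the unlabeled samples, $u = n - l$, and $l$ is the number of samples with known labels; $F$ is initialized with the first $C$ eigenvectors of the matrix $S$; the weight matrix $\Theta^v$ of the $v$-th view is initialized according to $\Theta^v_{ii} = 1/d(v)$, where $\Theta^v \in R^{d(v) \times d(v)}$ is a diagonal matrix and $\Theta^v_{ii}$ is its $i$-th diagonal element, $i = 1, 2, \ldots, d(v)$, $v = 1, 2, \ldots, V$;

Step 3: the multi-view semi-supervised classification model is constructed, wherein $s_{ij}$ denotes the element in row $i$ and column $j$ of the consistency similarity matrix $S$, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\theta^v$ denotes the vector formed by the diagonal elements of the weight matrix $\Theta^v$, $\mathbf{1}$ denotes a column vector whose elements are all 1, and $\gamma$ and $\lambda$ are regularization parameters, $\gamma > 0$, $\lambda > 0$;

Step 4: taking all the matrices obtained in step 2 as initial values, the semi-supervised classification model of step 3 is solved by alternating iteration according to the following procedure until the final label matrix $F$ is obtained:

Step 4.1: fix $\Theta$ and $F$, and update $S$, wherein $s_i$ denotes the $i$-th row vector of the matrix $S$, and $d_i$ denotes a vector whose $j$-th element depends on $f_i$ and $f_j$, the $i$-th and $j$-th row vectors of the matrix $F$, $i, j = 1, 2, \ldots, n$;

Step 4.2: fix $S$ and $\Theta$, and update $F$:

first, the degree matrix $D_S$ is updated, wherein $D_{ii}$ is the $i$-th diagonal element of the diagonal matrix $D_S$, $i = 1, 2, \ldots, n$;

the Laplacian matrix is then updated as follows:

$$L_S = D_S - S \tag{6}$$

the Laplacian matrix $L_S$ is partitioned into blocks at row $l$ and column $l$:

$$L_S = \begin{bmatrix} L_{ll} & L_{lu} \\ L_{ul} & L_{uu} \end{bmatrix}$$

wherein $L_{ll}$ is a matrix of size $l \times l$, $L_{lu}$ is a matrix of size $l \times u$, $L_{ul}$ is a matrix of size $u \times l$, and $L_{uu}$ is a matrix of size $u \times u$;

the consistency similarity matrix $S$ and the degree matrix $D_S$ are partitioned in the same way:

$$S = \begin{bmatrix} S_{ll} & S_{lu} \\ S_{ul} & S_{uu} \end{bmatrix}, \qquad D_S = \begin{bmatrix} D_{ll} & 0 \\ 0 & D_{uu} \end{bmatrix}$$

the label matrix $F_u$ of the unlabeled samples is updated as follows:

$$F_u = (I - P_{uu})^{-1} P_{ul} F_l \tag{10}$$

wherein $P_{uu} = D_{uu}^{-1} S_{uu}$ and $P_{ul} = D_{uu}^{-1} S_{ul}$;

finally, the label matrix $F$ is updated according to $F = [F_l; F_u]$;

Step 4.3: fix $F$ and $S$, and update $\Theta$, wherein $\theta^v$ denotes the vector formed by the diagonal elements of the weight matrix $\Theta^v$, and $W^v$ is a diagonal matrix for the $v$-th view whose $i$-th diagonal element is the $i$-th diagonal element of the matrix $M^v$, $M^v = (X^v)^T L_S X^v$;

Step 4.4: iteration stop judgment:

the $S$, $F$, $L_S$ and $\Theta$ obtained in the previous update and in the current update are each substituted into the objective function of the model in step 3; if the difference between the two resulting objective function values $Z$ is smaller than a set threshold, the iteration stops and the current $F$ is the final label matrix; otherwise, return to step 4.1 and continue the iterative update;

Step 5: the label of each sample is obtained as follows:

$$y_i = \arg\max_{1 \le j \le C} F_{ij}, \quad i = 1, 2, \ldots, n \tag{13}$$

wherein $y_i$ denotes the label of the $i$-th sample point, and $F_{ij}$ denotes the element in row $i$ and column $j$ of the final label matrix $F$ obtained in step 4;

samples with the same label are grouped into one class to obtain the classification result.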

2. The semi-supervised classification method based on dimension weighting and view angle feature consistency as claimed in claim 1, wherein: the regularization parameter γ in step 3 is

3. The semi-supervised classification method based on dimension weighting and view angle feature consistency as claimed in claim 1, wherein: the threshold value in step 4 is set to $10^{-8}$.