CN110767263B

CN110767263B - Non-coding RNA-disease association prediction method based on sparse subspace learning

Info

Publication number: CN110767263B
Application number: CN201910991283.3A
Authority: CN
Inventors: 汤永; 伍亚舟; 易东; 卫泽良
Original assignee: Chinese PLA General Hospital
Current assignee: Third Military Medical University TMMU
Priority date: 2019-10-18
Filing date: 2019-10-18
Publication date: 2022-12-06
Anticipated expiration: 2039-10-18
Also published as: CN110767263A

Abstract

The invention discloses a non-coding RNA-disease association prediction method based on sparse subspace learning, belonging to the field of systems biology. The method comprises the following steps: step one, constructing a non-coding RNA-disease association adjacency matrix and then calculating the Gaussian interaction profile kernel similarity of the non-coding RNAs and that of the diseases; step two, calculating a graph-theoretic feature matrix and a statistical feature matrix from the two similarity matrices and the adjacency matrix, then constructing an objective function and solving for a mapping matrix G; and step three, computing the prediction score matrix for non-coding RNA-disease pairs and ranking the scores to give the final prediction result. The invention integrates graph theory, statistical methods and machine learning, can make effective use of the information of negative samples in non-coding RNA-disease association data, and predicts efficiently, accurately and quickly the non-coding RNAs significantly associated with the occurrence and development of diseases, thereby addressing the long turnaround and high cost of biological experiments.

Description

Non-coding RNA-disease association prediction method based on sparse subspace learning

Technical Field

The invention relates to the field of systems biology, and in particular to a non-coding RNA-disease association prediction method based on sparse subspace learning.

Background

Non-coding RNA (ncRNA) refers to RNA molecules in the transcriptome that do not encode proteins; common classes include microRNA, lncRNA and circRNA.

MicroRNAs (miRNAs) are endogenous single-stranded RNAs of about 22 nucleotides that are present in a variety of species, including plants, animals and certain viruses. As important post-transcriptional regulators, they inhibit gene expression and promote mRNA degradation by base-pairing with the 3' untranslated regions (UTRs) of target RNAs. They play key roles in many biological processes, such as cell division, differentiation, development, metabolism, infection, aging, apoptosis and signal transduction. Experimental evidence suggests that aberrant miRNA expression is associated with many human diseases. For example, up-regulated expression of miR-181a may trigger progression to human type 1 diabetes. In addition, hypercholesterolemia is closely associated with increased hepatic miR-223 levels in atherosclerotic mice. Furthermore, miR-21, miR-494 and miR-1973 have been shown to be disease-response biomarkers in classical Hodgkin's lymphoma.

Long non-coding RNAs (lncRNAs) are RNAs longer than 200 nucleotides that participate in the regulation of various biological processes, including epigenetic modification of the genome, post-transcriptional and translational regulation, and enhancer RNA activity, and thereby regulate cell proliferation, differentiation, migration, apoptosis, immunity and the like. Experiments show that lncRNA AC006449.2 can act as a tumor suppressor in ovarian cancer cells. In addition, liver cancer cells highly expressing lncRNA H19 release it in exosomes, enhancing the proliferation, migration and invasion of adjacent liver cancer cells and promoting the occurrence and development of liver cancer. Big-data analysis shows that lncRNA RP11-214F16.8 is highly expressed in breast cancer, where it promotes the proliferation of breast cancer cells and thus breast cancer progression.

Circular RNA (circRNA) is a covalently closed circular RNA molecule formed by back-splicing; it lacks a 5' cap and a 3' poly(A) tail and is characterized by conservation, stability, tissue specificity and spatiotemporal specificity. Numerous studies have found that circRNAs participate, through multiple mechanisms, in regulating the growth and development of animals and in the occurrence and development of diseases. Forced expression of circRNA HRCR in mice with ISO-induced cardiac hypertrophy significantly alleviates the hypertrophy. Experiments show that circRNA Cdr1as influences insulin secretion and islet β-cell renewal. Colorectal cancer studies indicate that hsa_circ_001988 is reduced in cancer tissues and correlates with the degree of tumor cell differentiation and with prognosis.

Because non-coding RNAs affect the development and progression of a variety of complex human diseases, identifying potential ncRNA-disease associations can deepen the understanding of disease pathogenesis at the ncRNA level, which in turn facilitates disease diagnosis and treatment. Since revealing such associations experimentally is expensive and time-consuming, novel and efficient computational methods for association prediction are needed. Existing methods, however, share common drawbacks, such as failing to take global similarity into account, higher false-positive rates related to transitive components, and the inaccuracy introduced by treating randomly selected, unverified samples as negative samples.

Disclosure of Invention

To overcome the above defects of the prior art, the present invention provides a non-coding RNA-disease association prediction method based on sparse subspace learning (Graph Regularized Sparse Subspace Learning method for ncRNA-Disease Association prediction, GRSSL-RDA for short). The method first calculates the Gaussian interaction profile kernel similarity of ncRNAs and that of diseases; then calculates a graph-theoretic feature matrix and a statistical feature matrix from the ncRNA-disease association adjacency matrix, the ncRNA Gaussian interaction profile kernel similarity and the disease Gaussian interaction profile kernel similarity; then constructs an objective function and solves for a mapping matrix G to obtain the prediction score matrix of ncRNA-disease association pairs; and finally ranks the scores to obtain the prediction result. The method can accurately and efficiently predict ncRNAs related to the occurrence and development of diseases from ncRNA-disease association data.

To achieve the above object, the present invention provides the following technical solution: a non-coding RNA-disease association prediction method based on sparse subspace learning, specifically comprising the following steps:

Step one, inputting known non-coding RNA-disease association pairs to construct an adjacency matrix Y;

Step two, calculating the Gaussian interaction profile kernel similarity between diseases and the Gaussian interaction profile kernel similarity between non-coding RNAs:

if a disease d(i) is associated with a non-coding RNA, the corresponding position is marked 1, otherwise 0; the resulting 1 × nm row vector of 0s and 1s is the interaction profile IP(d(i)) of disease d(i). The Gaussian interaction profile kernel similarity between diseases d(i) and d(j) is then calculated as:

$S_d(d(i), d(j)) = \exp\left(-\gamma_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right)$

where the parameter γ_d controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_d:

$\gamma_d = \gamma'_d \Big/ \Big(\frac{1}{nd}\sum_{i=1}^{nd} \lVert IP(d(i)) \rVert^2\Big)$

The Gaussian interaction profile kernel similarity between non-coding RNAs m(i) and m(j) is defined in a similar manner:

$S_m(m(i), m(j)) = \exp\left(-\gamma_m \lVert IP(m(i)) - IP(m(j)) \rVert^2\right)$

where $\gamma_m = \gamma'_m \big/ \big(\frac{1}{nm}\sum_{i=1}^{nm} \lVert IP(m(i)) \rVert^2\big)$ is obtained in the same way, nd is the number of diseases, nm is the number of non-coding RNAs, and γ'_d = γ'_m = 1;
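Step two can be sketched in NumPy as below. This is a minimal illustration, assuming the usual bandwidth normalization in which γ'_d (or γ'_m) is divided by the mean squared norm of the interaction profiles; the function name gip_kernel is hypothetical. The rows of Y give S_d, and the rows of Y^T give S_m.

```python
import numpy as np

def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of a 0/1 matrix Y."""
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)                  # ||IP(i)||^2 for each row
    gamma = gamma_prime / sq_norms.mean()            # assumed bandwidth normalization
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b for every pair of rows
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    sq_dists = np.maximum(sq_dists, 0.0)             # clip round-off below zero
    return np.exp(-gamma * sq_dists)

# Y is diseases x ncRNAs:
#   S_d = gip_kernel(Y)     -> nd x nd
#   S_m = gip_kernel(Y.T)   -> nm x nm
```

An all-zero Y would make the normalizing mean zero, so the sketch assumes at least one known association.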

Step three, extracting feature vectors and assembling the feature matrices: from the disease similarity matrix S_d, the non-coding RNA similarity matrix S_m and the known disease-non-coding RNA association matrix Y, extract the statistical feature vectors X_1 (or X_1') and the graph-theoretic feature vectors X_2 (or X_2') of each disease (or non-coding RNA):

(1) Class I features X_1 (or X_1') of each disease (or non-coding RNA) include:

① for disease d(i) (or non-coding RNA m(j)), the number of associations observed in the corresponding i-th row (or j-th column) of matrix Y;

② the mean similarity score, i.e., the average of the i-th row of S_d (or the j-th column of S_m);

③ dividing the range [0, 1] into n intervals and calculating the proportion of the similarity scores of d(i) (or m(j)) that fall into each interval;
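The three class I features can be computed as in the following sketch. It assumes equal-width intervals on [0, 1] and includes the diagonal (self-similarity of 1) in the mean and the proportions, since the text does not say whether it is excluded; the function name is hypothetical.

```python
import numpy as np

def class1_features(Y, S, n_bins=5):
    """Class I features for the entities indexing the rows of Y and S.

    For diseases pass (Y, S_d); for ncRNAs pass (Y.T, S_m).
    """
    Y = np.asarray(Y, dtype=float)
    S = np.asarray(S, dtype=float)
    counts = Y.sum(axis=1, keepdims=True)          # item 1: number of known associations
    mean_sim = S.mean(axis=1, keepdims=True)       # item 2: mean similarity (diagonal included)
    edges = np.linspace(0.0, 1.0, n_bins + 1)      # item 3: n equal-width intervals on [0, 1]
    hist = np.stack([np.histogram(row, bins=edges)[0] for row in S])
    props = hist / S.shape[1]                      # proportion of scores in each interval
    return np.hstack([counts, mean_sim, props])
```

With n = 5 this yields 1 + 1 + 5 = 7 columns per entity.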

(2) Class II features X_2 (or X_2') of each disease (or non-coding RNA) include:

① the number of neighbors of the node in the unweighted graph derived from S_d (or S_m);

② the similarity values of the node's k nearest neighbors;

③ the average of the class I features over the k nearest neighbors;

④ the similarity-weighted average of the class I features over the node's k nearest neighbors in the similarity-weighted graph;

⑤ the betweenness, closeness and eigenvector centrality of the node in the graph derived from S_d (or S_m);

⑥ the PageRank score of the node in the graph derived from S_d (or S_m);
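A NumPy-only sketch of the class II features is shown below. Several details are assumptions not stated in the text: the unweighted graph keeps edges whose similarity exceeds a hypothetical threshold of 0.5, self-similarity is ignored, and PageRank uses damping 0.85 on the row-normalized similarity graph. The three centralities of item ⑤ are omitted here; graph libraries such as NetworkX provide betweenness_centrality, closeness_centrality and eigenvector_centrality for that purpose. Function names are hypothetical.

```python
import numpy as np

def pagerank(W, alpha=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a non-negative weighted adjacency matrix W."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    deg = W.sum(axis=1, keepdims=True)
    # Row-normalize to transition probabilities; rows with no out-weight jump uniformly.
    P = np.divide(W, deg, out=np.full_like(W, 1.0 / n), where=deg > 0)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1.0 - alpha) / n + alpha * (r @ P)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r

def class2_features(S, X1, k=10, threshold=0.5):
    """Class II features (items 1-4 and 6) from a similarity matrix S and class I features X1."""
    W = np.array(S, dtype=float)
    np.fill_diagonal(W, 0.0)                                   # drop self-similarity
    degree = (W > threshold).sum(axis=1, keepdims=True)        # item 1 (threshold assumed)
    nn = np.argsort(-W, axis=1, kind="stable")[:, :k]          # indices of k nearest neighbours
    knn_sims = np.take_along_axis(W, nn, axis=1)               # item 2
    knn_mean = X1[nn].mean(axis=1)                             # item 3
    w = knn_sims / np.maximum(knn_sims.sum(axis=1, keepdims=True), 1e-12)
    knn_wmean = (w[:, :, None] * X1[nn]).sum(axis=1)           # item 4
    pr = pagerank(W)[:, None]                                  # item 6
    return np.hstack([degree, knn_sims, knn_mean, knn_wmean, pr])
```

With k = 10 and 7 class I columns, this gives 1 + 10 + 7 + 7 + 1 columns before the three centralities are added.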

(3) Class III features X_3 (or X_3') derived from the disease-non-coding RNA pairs include:

① the latent vectors of non-coding RNAs and diseases obtained by factorizing matrix Y;

② the betweenness, closeness and eigenvector centrality of the node in the graph given by matrix Y;

③ the PageRank score of the node in the graph given by matrix Y;

the three classes of disease features are concatenated side by side (one row per disease) to obtain the total disease feature matrix, $X_d = [X_1\ X_2\ X_3]$;

the three classes of non-coding RNA features are concatenated side by side (one row per non-coding RNA) to obtain the total non-coding RNA feature matrix, $X_m = [X_1'\ X_2'\ X_3']$;
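The latent vectors of class III and the final concatenation can be sketched as follows. The text only says that Y is "decomposed"; a truncated singular value decomposition is used here as one possible choice, with the singular values split evenly between the two factors. Function names are hypothetical.

```python
import numpy as np

def latent_vectors(Y, rank=5):
    """Rank-r latent vectors from a truncated SVD of Y (one possible factorization)."""
    U, s, Vt = np.linalg.svd(np.asarray(Y, dtype=float), full_matrices=False)
    root = np.sqrt(s[:rank])
    disease_latent = U[:, :rank] * root   # one row per row of Y (diseases)
    ncrna_latent = Vt[:rank].T * root     # one row per column of Y (ncRNAs)
    return disease_latent, ncrna_latent

def total_feature_matrix(X1, X2, X3):
    """Concatenate the three feature blocks side by side: one row per entity."""
    return np.hstack([X1, X2, X3])
```

Because the product of the two factors is the best rank-r approximation of Y, the latent vectors summarize the association pattern of each disease and each ncRNA in r numbers.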

Step four, constructing an objective function as follows:

where F is the association score matrix (unknown, to be solved); L is the Laplacian matrix of the diseases or ncRNAs (known, computed from S_d or S_m); U is the decision matrix (known, set to the identity matrix); b is the offset (unknown, to be solved); 1_n is the all-ones vector; μ and λ are regularization coefficients (tunable parameters taking non-negative values); G is the mapping matrix (unknown, to be solved); tr(·) denotes the trace of a matrix; and ‖·‖_F denotes the Frobenius norm of a matrix;

in the objective function, term 1 is a Laplacian regularization term that captures the intrinsic manifold structure of the data, so that the method can use a small amount of labeled data together with a large amount of unlabeled data; term 2 measures the difference between the predicted scores and the actual association matrix; term 3 is a subspace regression term that links the feature matrix and the prediction scores and minimizes their difference under the Frobenius norm (in this term, miRNA-disease interaction information is compressed by the projection matrix G from the high-dimensional space F into the low-dimensional space X, which belongs to the category of "subspace learning", while the term has the general regression form y = Ax + b, hence the name); and term 4 introduces the l_{2,1/2} norm, λ‖G‖_{2,1/2}^{1/2}, which selects the most discriminative sparse features;
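The formula of the objective is not reproduced in this text. A reconstruction consistent with the four terms described above and with the closed-form quantities in step five (in particular F = JK) is sketched below; it assumes X is arranged with one column per sample and that A is the centering matrix I - (1/n)1_n1_n^T, neither of which is stated explicitly here.

```latex
\min_{F,G,b}\;
\operatorname{tr}\!\left(F^{\top} L F\right)
+ \operatorname{tr}\!\left((F-Y)^{\top} U (F-Y)\right)
+ \mu \left\lVert X^{\top} G + \mathbf{1}_n b^{\top} - F \right\rVert_F^{2}
+ \lambda \lVert G \rVert_{2,1/2}^{1/2},
\qquad
\lVert G \rVert_{2,1/2}^{1/2} = \sum_{i} \lVert g^{i} \rVert_2^{1/2}

% Zero derivative with respect to b:
b = \tfrac{1}{n}\left(F^{\top}\mathbf{1}_n - G^{\top} X \mathbf{1}_n\right)
\;\Rightarrow\;
X^{\top}G + \mathbf{1}_n b^{\top} = A X^{\top} G + (I - A)F,
\qquad A = I - \tfrac{1}{n}\mathbf{1}_n\mathbf{1}_n^{\top}

% Zero derivative with respect to F:
(L + U + \mu A)\,F = UY + \mu A X^{\top} G
\;\Rightarrow\;
F = JK,\quad J = (L + U + \mu A)^{-1},\quad K = UY + \mu A X^{\top} G

% Eliminating F leaves the G-subproblem (up to a constant):
\min_{G}\; \operatorname{tr}(G^{\top} M G) - 2\operatorname{tr}(G^{\top} N) + \lambda \lVert G \rVert_{2,1/2}^{1/2},
\quad M = XA(\mu I - \mu^{2} J)AX^{\top},\quad N = \mu XAJUY
```

Under these assumptions, the expressions for J, M and N in step five follow directly from the derivation, which is a useful consistency check on the reconstruction.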

Step five, solving the objective function: the matrix G is obtained with the following iterative algorithm:

Input: adjacency matrix Y, disease similarity matrix S_d, ncRNA similarity matrix S_m, non-negative regularization parameters μ and λ;

Procedure:

(1) From S_d (or S_m), compute the Laplacian matrix L ∈ R^{nd×nd} (or R^{nm×nm}) as L = D - W, where W = S_d (or S_m) and D is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$;

(2) From Y, S_d and S_m, compute the disease feature matrix X_d (or the ncRNA feature matrix X_m) as described in step three;

(3) Initialize the decision matrix U ∈ R^{nd×nd} (the identity matrix);

(4) Computing

(5) Compute $J = (L + U + \mu A)^{-1}$;

(6) Compute $M = XA(\mu I - \mu^2 J)AX^{\top}$;

(7) Compute $N = \mu XAJUY$;

(8) Let t = 0 and let G_0 ∈ R^{r×nd} be a random matrix;

(9) Compute the diagonal matrix D_t;

update $G_{t+1} = (M + 4\lambda D_t)^{-1} N$;

t = t + 1;

until convergence;

Output: the optimal mapping matrix G.
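The procedure above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: A in steps (4) to (7) is taken to be the centering matrix I - (1/n)1_n1_n^T (its formula is not reproduced in this text); X is arranged with one column per sample, i.e. the transpose of the feature tables; and since the expression for D_t is also missing, D_t = diag(1/(16‖g^i‖_2^{3/2})) is used, which makes each update G_{t+1} = (M + 4λD_t)^{-1}N a reweighted fixed-point step for the penalty λΣ_i‖g^i‖_2^{1/2}. The shapes (G is features × columns of Y) are this sketch's own convention. Function names are hypothetical.

```python
import numpy as np

def closed_form_terms(X, Y, S, mu=1.0):
    """Steps (1)-(7). X is p x n (one column per sample), Y is n x c, S is n x n."""
    n = Y.shape[0]
    W = np.asarray(S, dtype=float)
    L = np.diag(W.sum(axis=1)) - W                        # (1) L = D - W
    U = np.eye(n)                                         # (3) decision matrix = identity
    A = np.eye(n) - np.ones((n, n)) / n                   # (4) assumed centering matrix
    J = np.linalg.inv(L + U + mu * A)                     # (5)
    M = X @ A @ (mu * np.eye(n) - mu**2 * J) @ A @ X.T    # (6)
    N = mu * X @ A @ J @ U @ Y                            # (7)
    return J, A, M, N

def solve_mapping(X, Y, S, mu=1.0, lam=1.0, max_iter=1000, tol=1e-6, eps=1e-8, seed=0):
    """Steps (8)-(9): reweighted updates G_{t+1} = (M + 4 lam D_t)^{-1} N."""
    _, _, M, N = closed_form_terms(X, Y, S, mu)
    rng = np.random.default_rng(seed)
    G = rng.standard_normal(N.shape)                      # (8) random G_0
    for _ in range(max_iter):
        row_norms = np.linalg.norm(G, axis=1) + eps       # eps guards rows shrinking to zero
        D = np.diag(1.0 / (16.0 * row_norms**1.5))        # (9) assumed weights for the l_{2,1/2} term
        G_new = np.linalg.solve(M + 4.0 * lam * D, N)
        converged = np.linalg.norm(G_new - G) <= tol * max(1.0, np.linalg.norm(G))
        G = G_new
        if converged:
            break
    return G
```

With λ = 0 the loop reduces to the least-squares solution G = M^{-1}N after one step; a larger λ drives whole rows of G toward zero, which is the feature-selection effect attributed to the l_{2,1/2} term.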

Then the matrix F = JK is calculated, where $J = (L + U + \mu A)^{-1}$ and $K = UY + \mu A X^{\top} G$, from which the offset b is further obtained;

Step six, after X, G and b are obtained, calculate the prediction score matrix F_m of the non-coding RNA space:

Similarly, calculate the prediction score matrix F_d of the disease space;
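Once G is available, F = JK and the offset b follow in closed form. The sketch below again assumes A is the centering matrix and uses b = (1/n)(F^T1_n - G^TX1_n), the value that sets the derivative of the regression term with respect to b to zero. It also assumes that the score matrix computed "from X, G and b" in step six is the regression output X^TG + 1_nb^T. Both are assumptions, and the function name is hypothetical.

```python
import numpy as np

def scores_from_mapping(X, Y, S, G, mu=1.0):
    """F = JK, the offset b, and the regression scores X^T G + 1 b^T.

    X is p x n (one column per sample), Y is n x c, S is n x n, G is p x c.
    """
    n = Y.shape[0]
    W = np.asarray(S, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    U = np.eye(n)
    A = np.eye(n) - np.ones((n, n)) / n             # assumed centering matrix
    J = np.linalg.inv(L + U + mu * A)
    K = U @ Y + mu * A @ X.T @ G
    F = J @ K                                       # F = JK
    ones = np.ones(n)
    b = (F.T @ ones - G.T @ X @ ones) / n           # assumed closed form for the offset
    F_reg = X.T @ G + np.outer(ones, b)             # candidate score matrix for this space
    return F, b, F_reg
```

By construction, each column of the regression scores has the same mean as the corresponding column of F.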

Step seven, linearly combine F_m and F_d to obtain the final prediction score matrix:

$F_{\mathrm{prediction}} = \sigma F_m + (1 - \sigma) F_d$

where the combination coefficient σ can be further optimized by search;

Step eight, rank the calculated scores of the non-coding RNA-disease association pairs to give the final prediction result.
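Steps seven and eight amount to a weighted sum followed by a sort. The sketch below assumes both score matrices have already been oriented as diseases × ncRNAs (one of them typically needs a transpose), uses σ = 0.9 as in the embodiment, and ranks only pairs without a known association, which is a common convention rather than something the text specifies. The function name is hypothetical.

```python
import numpy as np

def combine_and_rank(F_m, F_d, Y, sigma=0.9, top=10):
    """Combine the two score matrices and rank unobserved disease-ncRNA pairs."""
    F = sigma * np.asarray(F_m, dtype=float) + (1.0 - sigma) * np.asarray(F_d, dtype=float)
    rows, cols = np.nonzero(np.asarray(Y) == 0)          # assumed: candidates are unknown pairs
    order = np.argsort(-F[rows, cols], kind="stable")[:top]
    return F, list(zip(rows[order].tolist(), cols[order].tolist()))

F_m = np.array([[0.9, 0.2], [0.4, 0.7]])
F_d = np.array([[0.1, 0.8], [0.6, 0.3]])
Y = np.array([[1, 0], [0, 0]])
F, ranked = combine_and_rank(F_m, F_d, Y)
print(ranked)   # [(1, 1), (1, 0), (0, 1)]
```

A grid search over σ would repeat the cross-validation for each candidate value, for example σ in np.linspace(0, 1, 11), and keep the best-scoring one.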

The technical effects and advantages of the invention are as follows:

1. A Laplacian regularization term is introduced into the subspace learning framework, which effectively captures the intrinsic manifold structure of the data and improves prediction performance. The model is semi-supervised: it requires only positive and unlabeled samples rather than negative samples, which greatly reduces the difficulty of model construction.

2. Adding the l_{2,1/2} norm constraint ensures the sparsity of the mapping matrix G, which weakens the influence of noisy data and yields more reliable prediction results.

3. The method integrates graph theory, statistical methods and machine learning, gives ncRNA-disease association predictions efficiently, accurately and quickly, and has good scalability and robustness.

Drawings

FIG. 1 is a general flow diagram of the present invention.

FIG. 2 shows the results of five-fold cross-validation of the present invention and several previously reported methods on the same data set.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments use only miRNAs, which are some but not all ncRNAs (ncRNAs also include other classes, such as lncRNA and circRNA). All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.

The known human miRNA-disease association data used in the embodiment were retrieved and downloaded from the HMDD v2.0 database (http://www.cuilab.cn/HMDD). After cleaning, classifying and standardizing the downloaded data, 5430 experimentally validated human miRNA-disease associations involving 383 diseases and 495 miRNAs were obtained.

The sparse subspace learning-based non-coding RNA-disease association prediction method shown in FIG. 1 is then performed, specifically comprising the following steps:

Step one, inputting the known miRNA-disease association pairs and constructing the adjacency matrix Y:

this gives a 383 × 495 matrix Y whose elements are 0 or 1;

Step two, calculating the Gaussian interaction profile kernel similarity between diseases and the Gaussian interaction profile kernel similarity between miRNAs:

if a disease d(i) is associated with a miRNA, the corresponding position is marked 1, otherwise 0; the resulting 1 × 495 row vector of 0s and 1s is the interaction profile IP(d(i)) of disease d(i), and the Gaussian interaction profile kernel similarity between diseases d(i) and d(j) is calculated as:

$S_d(d(i), d(j)) = \exp\left(-\gamma_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right)$

where the parameter γ_d controls the kernel bandwidth and is obtained by normalizing a new bandwidth parameter γ'_d:

$\gamma_d = \gamma'_d \Big/ \Big(\frac{1}{nd}\sum_{i=1}^{nd} \lVert IP(d(i)) \rVert^2\Big)$

The Gaussian interaction profile kernel similarity between miRNAs m(i) and m(j) is defined in a similar manner:

$S_m(m(i), m(j)) = \exp\left(-\gamma_m \lVert IP(m(i)) - IP(m(j)) \rVert^2\right)$

with $\gamma_m = \gamma'_m \big/ \big(\frac{1}{nm}\sum_{i=1}^{nm} \lVert IP(m(i)) \rVert^2\big)$ and γ'_d = γ'_m = 1;

where nd is the number of diseases (here 383) and nm is the number of miRNAs (here 495). The calculation yields a 383 × 383 symmetric matrix S_d and a 495 × 495 symmetric matrix S_m, every element of which lies between 0 and 1; for example, d(12) Aging and d(18) Amyotrophic Lateral Sclerosis (ALS) have a similarity of 0.7028, and m(186) hsa-mir-539 and m(424) hsa-mir-4792 have a similarity of 0.5787;

Step three, extracting feature vectors and assembling the feature matrices: from the disease similarity matrix S_d, the miRNA similarity matrix S_m and the known disease-miRNA association matrix Y, extract the statistical feature vectors X_1 (or X_1') and the graph-theoretic feature vectors X_2 (or X_2') of each disease (or miRNA):

(1) Class I features X_1 (or X_1') of each disease (or miRNA) include:

① for disease d(i) (or miRNA m(j)), the number of associations observed in the corresponding i-th row (or j-th column) of matrix Y;

② the mean similarity score, i.e., the average of the i-th row of S_d (or the j-th column of S_m);

③ dividing the range [0, 1] into n = 5 intervals and calculating the proportion of the similarity scores of d(i) (or m(j)) that fall into each interval;

after this step, the class I feature matrix X_1 of the diseases (383 rows × (1 + 1 + 5) columns) and the class I feature matrix X_1' of the miRNAs (495 rows × (1 + 1 + 5) columns) are generated;

(2) Class II features X_2 (or X_2') of each disease (or miRNA) include:

① the number of neighbors of the node in the unweighted graph derived from S_d (or S_m);

② the similarity values of the node's k = 10 nearest neighbors;

③ the average of the class I features over the k = 10 nearest neighbors;

④ the similarity-weighted average of the class I features over the node's k = 10 nearest neighbors in the similarity-weighted graph;

⑤ the betweenness, closeness and eigenvector centrality of the node in the graph derived from S_d (or S_m);

⑥ the PageRank score of the node in the graph derived from S_d (or S_m);

after this step, the class II feature matrix X_2 of the diseases (383 rows × (1 + 10 + 7 + 7 + 3 + 1) columns) and the class II feature matrix X_2' of the miRNAs (495 rows × (1 + 10 + 7 + 7 + 3 + 1) columns) are generated;

(3) Class III features X_3 (or X_3') derived from the disease-miRNA pairs include:

① the latent vectors of miRNAs and diseases obtained by factorizing matrix Y (first 5 columns);

② the betweenness, closeness and eigenvector centrality of the node in the graph given by matrix Y;

③ the PageRank score of the node in the graph given by matrix Y;

after this calculation, the class III feature matrix X_3 of the diseases (383 rows × (5 + 3 + 1) columns) and the class III feature matrix X_3' of the miRNAs (495 rows × (5 + 3 + 1) columns) are generated;

the three classes of disease features are concatenated side by side, $X_d = [X_1\ X_2\ X_3]$, giving the total disease feature matrix X_d (383 rows × 45 columns);

the three classes of miRNA features are concatenated side by side, $X_m = [X_1'\ X_2'\ X_3']$, giving the total miRNA feature matrix X_m (495 rows × 45 columns);
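Counting the columns item by item, assuming class I includes the mean-similarity feature and class II includes both neighbor averages of the 7 class I columns, reproduces the stated width of 45. The tally below is only arithmetic on the listed features; the variable names are illustrative.

```python
# Illustrative tally of feature columns in the embodiment.
n_bins, k, rank = 5, 10, 5                     # n = 5 intervals, k = 10 neighbours, 5 latent columns

class1 = 1 + 1 + n_bins                        # association count, mean similarity, interval proportions
class2 = 1 + k + class1 + class1 + 3 + 1       # degree, kNN similarities, kNN mean, weighted kNN mean,
                                               # 3 centralities, PageRank
class3 = rank + 3 + 1                          # latent vector, 3 centralities, PageRank
total = class1 + class2 + class3

print(class1, class2, class3, total)           # 7 29 9 45
```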

step four, constructing an objective function as follows:

where F is the association score matrix to be predicted (unknown, to be solved); L is the Laplacian matrix of the diseases or miRNAs (known, computed from S_d or S_m); U is the decision matrix (known, set to the identity matrix here); b is the offset (unknown, to be solved); 1_n is the all-ones vector; μ and λ are regularization coefficients (tunable parameters, both set directly to 1 here); G is the mapping matrix (unknown, to be solved); tr(·) denotes the trace of a matrix; and ‖·‖_F denotes the Frobenius norm of a matrix;

Step five, solving the objective function: the matrix G is obtained with the following iterative algorithm;

Input: adjacency matrix Y, disease similarity matrix S_d (or miRNA similarity matrix S_m), non-negative regularization parameters μ and λ (both set directly to 1 here);

Procedure:

(1) From S_d (or S_m), compute the corresponding Laplacian matrix L ∈ R^{nd×nd} (or R^{nm×nm}) as L = D - W, where W = S_d (or S_m) and D is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$;

(2) From Y, S_d and S_m, compute the disease feature matrix X_d (or the miRNA feature matrix X_m) as described in step three;

(3) Initialize the decision matrix U ∈ R^{nd×nd} (the identity matrix);

(4) Computing

(5) Compute $J = (L + U + \mu A)^{-1}$;

(6) Compute $M = XA(\mu I - \mu^2 J)AX^{\top}$;

(7) Compute $N = \mu XAJUY$;

(8) Let t = 0 and let G_0 ∈ R^{r×nd} be a random matrix;

(9) Compute the diagonal matrix D_t;

update $G_{t+1} = (M + 4\lambda D_t)^{-1} N$;

t = t + 1;

until convergence;

Output: the optimal mapping matrix G.

When the algorithm is implemented in MATLAB, the mapping matrix G is initialized as a random matrix of 100 rows × 383 (or 495) columns, and the iteration loop exits after 1000 iterations or once the convergence requirement is met;

the matrix G is obtained when the computation finishes;

then the matrix F = JK is calculated, where $J = (L + U + \mu A)^{-1}$ and $K = UY + \mu A X^{\top} G$, from which the offset b is then obtained;

Step six, after X, G and b are obtained, calculate the prediction score matrix F_m of the miRNA space:

Similarly, calculate the prediction score matrix F_d of the disease space;

Step seven, linearly combine F_m and F_d to obtain the final prediction score matrix:

$F_{\mathrm{prediction}} = \sigma F_m + (1 - \sigma) F_d$

where the combination coefficient σ is optimized and taken as 0.9;

Step eight, rank the calculated scores of the miRNA-disease association pairs to give the final prediction result.

Verification of the effectiveness of the invention:

The sparse subspace learning-based non-coding RNA-disease association prediction method shown in FIG. 1 was evaluated by five-fold cross-validation, performed as follows: all known miRNA-disease associations were randomly divided into 5 equal groups; each group in turn served as the test set, and the remaining groups served as the training set.

The training set is used as the input of the method to obtain predictions, and the predicted score of each test sample is then compared with the scores of the candidate miRNAs. To reduce the effect of random partitioning on the test sets, five-fold cross-validation was repeated 100 times.
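The evaluation protocol can be sketched as follows. This follows one common global variant: each hidden association is scored against all pairs with no known association, and AUROC is computed with the rank (Mann-Whitney) formula. The predict argument stands for any hypothetical function mapping a 0/1 training matrix to a score matrix of the same shape; repeating the whole procedure with 100 different seeds corresponds to the repetition described above.

```python
import numpy as np

def auroc(pos_scores, neg_scores):
    """AUROC via the Mann-Whitney statistic; ties count one half."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

def five_fold_auroc(Y, predict, n_folds=5, seed=0):
    """Hide each fold of known associations, re-predict, and score the hidden pairs."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(Y == 1)
    rng.shuffle(known)                              # random, even split of known pairs
    unknown = Y == 0                                # candidate (negative) pairs
    aucs = []
    for fold in np.array_split(known, n_folds):
        Y_train = Y.copy()
        Y_train[fold[:, 0], fold[:, 1]] = 0         # mask the test associations
        F = predict(Y_train)
        aucs.append(auroc(F[fold[:, 0], fold[:, 1]], F[unknown]))
    return float(np.mean(aucs))
```

A mean and standard deviation over repeated runs, e.g. over seeds 0 to 99, would give a summary of the form reported for this method.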

The results shown in FIG. 2 compare the performance of the present method (GRSSLRDA) with several state-of-the-art disease-miRNA association prediction models. The method achieved an AUROC (area under the ROC curve) of 0.9030 ± 0.0005972 in five-fold cross-validation, outperforming several earlier classical models.

In addition, for a specific disease such as lung cancer, miRNA-lung cancer associations were predicted with the method based on the known associations in HMDD v2.0, and 48 of the top 50 predicted miRNAs are supported by external databases.

In the table below, the first and third columns list the 1st to 25th and the 26th to 50th predicted miRNAs, respectively, and I, II and III denote the external databases dbDEMC, miR2Disease and HMDD v3.0, respectively.

Finally, the above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A non-coding RNA-disease association prediction method based on sparse subspace learning, characterized by comprising the following steps:

$S_d(d(i), d(j)) = \exp\left(-\gamma_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right)$

the Gaussian interaction profile kernel similarity between non-coding RNAs m(i) and m(j) is defined in a similar manner:

$S_m(m(i), m(j)) = \exp\left(-\gamma_m \lVert IP(m(i)) - IP(m(j)) \rVert^2\right)$

where nd is the number of diseases, nm is the number of non-coding RNAs, and γ'_d = γ'_m = 1;

Step three, extracting feature vectors and assembling the feature matrices: from the disease similarity matrix S_d, the non-coding RNA similarity matrix S_m and the known disease-non-coding RNA association matrix Y, extract the statistical feature vectors X_1 (or X_1') and the graph-theoretic feature vectors X_2 (or X_2') of each disease (or non-coding RNA):

② the mean similarity score, i.e., the average of the i-th row of S_d (or the j-th column of S_m):

(2) Class II features X_2 (or X_2') of each disease (or non-coding RNA) include:

① the number of neighbors of the node in the unweighted graph derived from S_d (or S_m);

② the similarity values of the node's k nearest neighbors;

③ the average of the class I features over the k nearest neighbors;

⑤ the betweenness, closeness and eigenvector centrality of the node;

⑥ the PageRank score of the node;

② the betweenness, closeness and eigenvector centrality of the node;

③ the PageRank score of the node;

the three classes of disease features are concatenated side by side to obtain the total disease feature matrix, $X_d = [X_1\ X_2\ X_3]$;

the three classes of non-coding RNA features are concatenated side by side to obtain the total non-coding RNA feature matrix, $X_m = [X_1'\ X_2'\ X_3']$;

Step four, constructing an objective function based on subspace learning that integrates Laplacian regularization and an l_{2,1/2} norm constraint term;

Step five, solving the objective function to obtain the non-coding RNA feature matrix X_m, the mapping matrix G and the offset b, and calculating the prediction score matrix F_m of the non-coding RNA space:

Step six, similarly, calculating the prediction score matrix F_d of the disease space;

$F_{\mathrm{prediction}} = \sigma F_m + (1 - \sigma) F_d$

where the combination coefficient σ can be further optimized by grid search;

2. The non-coding RNA-disease association prediction method based on sparse subspace learning according to claim 1, wherein the objective function in step four is a subspace regression integrating Laplacian regularization and an l_{2,1/2} norm constraint term, specifically:

where F is the association score matrix (unknown, to be solved); L is the Laplacian matrix of the diseases or non-coding RNAs (known, computed from S_d or S_m); U is the decision matrix (known, set to the identity matrix); b is the offset (unknown, to be solved); 1_n is the all-ones vector; μ and λ are regularization coefficients (non-negative values, further optimized by grid search); G is the mapping matrix (unknown, to be solved); tr(·) denotes the trace of a matrix; and ‖·‖_F denotes the Frobenius norm of a matrix;

term 1 of the objective function is a Laplacian regularization term that captures the intrinsic manifold structure of the data; term 2 is a difference term that characterizes the difference between the predicted scores and the actual association matrix; term 3 is a subspace regression term that links the feature matrix and the prediction score matrix and measures their difference with the Frobenius norm; and term 4 is an l_{2,1/2} norm constraint term used to select the most discriminative sparse features.