CN116910502A - Sparse feature selection method based on local tag correlation and feature redundancy - Google Patents

Sparse feature selection method based on local tag correlation and feature redundancy Download PDF

Info

Publication number
CN116910502A
CN116910502A
Authority
CN
China
Prior art keywords
feature
matrix
function
redundancy
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310743486.7A
Other languages
Chinese (zh)
Inventor
孙林
马雨萱
常宝方
王振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Normal University
Original Assignee
Henan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Normal University filed Critical Henan Normal University
Priority to CN202310743486.7A priority Critical patent/CN116910502A/en
Publication of CN116910502A publication Critical patent/CN116910502A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322 Rendering the within-class scatter matrix non-singular
    • G06F18/21326 Rendering the within-class scatter matrix non-singular involving optimisations, e.g. using regularisation techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data analysis, and particularly relates to a sparse feature selection method based on local tag correlation and feature redundancy. The method inputs a sample matrix, a tag matrix and the hyper-parameters of an objective function into the objective function for iterative updating, wherein the objective function consists of a loss function constructed among a tag coefficient matrix, an instance matrix, a weight matrix and the tag matrix, a local tag correlation function and a feature redundancy function. The iterative updating process stops when a set stopping rule is reached, and the feature subset is output according to a set ordering. The method deepens the relation between tags and features through the loss function, explores the potential relations between local tags through the local tag correlation function to select a feature subset with better performance, and finally generates a discriminative, low-redundancy feature subset with the feature redundancy function, so that an optimally ordered feature subset can be output, thereby improving the classification effect of the feature selection method and ensuring the accuracy of classification results.

Description

Sparse feature selection method based on local tag correlation and feature redundancy
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a sparse feature selection method based on local tag correlation and feature redundancy.
Background
In the field of multi-tag learning, a single instance is typically associated with multiple semantic scenes and carries one or more tags at the same time. Furthermore, almost all multi-tag datasets are sparse and redundant, which may significantly degrade the performance of a multi-tag classification model. In real life, high-dimensional data contain a lot of redundancy or noise, so preprocessing is necessary. Since such invalid or redundant information reduces the classification performance of the model, it is necessary to alleviate the curse of dimensionality. In general, feature selection can be used to preserve the visual interpretability and physical meaning of the selected features. Therefore, feature selection algorithms in multi-tag learning are effectively applied to fields such as text analysis and biological research.
Three types of feature selection models are currently available to select the important feature subset: wrapper models, filter models and embedded models. Among these, the embedded model is favored over the others because of its excellent classification ability and lower time cost. In order to avoid the influence of redundant and noisy data on classification performance, sparse regularization can be added to a multi-tag feature selection framework. For example, Chen et al., in "Extended adaptive Lasso for multi-class and multi-label feature selection," developed a model that extends adaptive Lasso regression and can be used for multi-class and multi-label datasets, but the model adapts poorly to nonlinear problems and its loss function has non-differentiable points. Li et al., in "Label correlations variation for robust multi-label feature selection," describe a robust multi-label classification method that designs a self-representation coefficient matrix with L2,1 regularization to remove noise and redundant points, but it does not take the correlation between features into account, while exploring the correlation between labels is critical for multi-label feature selection methods. In practice, most researchers tend to study tag correlation through manifold learning. Hu et al., in "Robust multi-label feature selection with dual-graph regularization," describe a multi-tag data framework with a dual-graph structure that uses manifold regularization to preserve the local geometry of tags and features, but this embedded approach takes little account of the dependency between tags and features, resulting in the loss of critical information. Liu et al., in "A robust graph based multi-label feature selection considering feature-label dependency," use manifold regularization on low-dimensional manifolds embedded in the original feature and tag spaces, preserving the local tag manifold structure, but they do not consider interactions between datasets and features with complex distributions. In real-world applications, the raw data obtained are often too chaotic and incomplete, so machine learning models cannot effectively identify and extract important information. Removing redundant features with a feature selection scheme is therefore an important task. In general, existing multi-tag feature selection models consider either the relationships between tags or the redundancy between features; most algorithms only apply simple sparse constraints to high-dimensional data and do not consider the inherent relationship between features and tags in detail. These problems greatly affect the classification effect of feature selection algorithms and make the classification results inaccurate.
Disclosure of Invention
The invention aims to provide a sparse feature selection method based on local tag correlation and feature redundancy, which is used for solving the problem of inaccurate classification results of the existing feature selection method.
In order to solve the technical problems, the invention provides a sparse feature selection method based on local tag correlation and feature redundancy, which comprises the following steps:
1) Inputting a sample matrix, a tag matrix and the hyper-parameters of an objective function into the objective function for iterative updating, wherein the objective function is composed of a loss function constructed among a tag coefficient matrix, an instance matrix, a weight matrix and the tag matrix, a local tag correlation function and a feature redundancy function; the relation between tags and features is explored through the loss function, information between tags is obtained from the local tag correlation function to screen features with higher scores, a feature subset with lower feature redundancy is generated from the feature redundancy function, and the weight coefficient of each feature is solved;
2) Stopping the iterative updating process when a set stopping rule is reached, and outputting the feature subset according to the set ordering.
The beneficial effects are as follows: in order to study the potential relation between tags and features, a loss function is constructed among the tag coefficient matrix, the instance matrix, the weight matrix and the tag matrix; in order to further utilize local tag correlation, local tag information is used to obtain a feature subset with higher scores; and the modified cosine similarity is introduced to control feature redundancy by calculating the similarity among features and to screen out low-redundancy features, so that the classification effect is improved and the accuracy of the classification results is guaranteed.
Further, in step 1), the loss function is a regularization loss function based on the L2,1 norm.
Further, the regularization loss function based on the L2,1 norm is: ||UP − G||_F^2 + γ||P||_{2,1}, wherein U is the sample matrix, P is the weight matrix, G is the tag matrix, ||UP − G||_F denotes the Frobenius norm, γ is a hyper-parameter, and ||P||_{2,1} denotes the L2,1 norm of the matrix P.
Further, in step 1), the local tag correlation function is built based on combining manifold constraints with Laplacian scores.
Further, the local tag correlation function is: min (1/2) Σ_{i,j=1}^{n} ||q(u_i) − q(u_j)||_2^2 (H_R)_{ij} = tr(Q^T L Q), wherein S is the tag penalty term, q(u_i) ∈ Q is a discriminant function, ||·||_2 denotes the L2 norm, tr(·) denotes the trace of a matrix, Q^T denotes the transpose of the matrix Q, L = H − H_R is the graph Laplacian matrix, H is a diagonal matrix, and λ is a hyper-parameter.
Further, in step 1), the feature redundancy function is:
wherein P is a weight matrix; w (W) er Representing a modified cosine similarity between the e-th feature and the r-th feature, n being the total number of features.
Further, in step 1), the objective function is:
wherein α, β and λ denote different hyper-parameters; U is the sample matrix; P is the weight matrix; G is the tag matrix; V is the tag coefficient matrix; L_p is the symmetric graph Laplacian matrix; S is the tag penalty term; W_ij is the modified cosine similarity between the i-th feature and the j-th feature; ||·||_F is the Frobenius norm; and ||·||_{2,1} is the L2,1 norm.
Further, the values of the three matrices P, S and V in the objective function are updated iteratively, and the iteration stops when a set iteration condition is met or the number of iterations reaches the maximum number of iterations.
Further, the set iteration condition is satisfied when the difference between two consecutive function values is smaller than a preset value.
Further, in step 2), outputting the feature subset according to the set ordering means: sorting the features from the largest to the smallest weight coefficient, and outputting the feature subset arranged in that order.
Drawings
FIGS. 1 a-1 f are graphs comparing the method of the present invention with prior art methods for AP in six different data sets;
FIGS. 2 a-2 f are CV plots of the method of the present invention versus the prior art method at six different data sets;
FIGS. 3 a-3 f are RL graphs comparing the method of the present invention with the prior art method in six different data sets;
FIGS. 4 a-4 f are HL graphs comparing the method of the present invention with prior art methods in six different data sets;
FIGS. 5 a-5 f are MA graphs comparing the method of the present invention with prior art methods in six different data sets;
FIGS. 6 a-6 f are graphs comparing MI of the method of the present invention versus prior art methods in six different data sets;
FIG. 7 is a flow chart of a sparse feature selection method based on local tag correlation and feature redundancy in accordance with the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Sparse feature selection method embodiment based on local tag correlation and feature redundancy:
the method of the present embodiment constructs a loss function between the tag coefficient matrix, the instance matrix, the weight matrix, and the tag matrix in order to determine potential links between the tags and the features. In the multi-label classification method, in order to solve the problem of data redundancy, an L2, 1-norm constraint loss function may be used to improve classification performance. To further exploit local label dependencies, manifold constraints are combined with the Laplace score, and these local labels are processed. By adding manifold regularization terms, the local geometry of the labels can be preserved and the local correlation function of the labels given. Thus, a higher scoring feature subset may be obtained using the local tag information. And introducing the modified cosine similarity, controlling the feature redundancy by calculating the similarity among the features, and screening out the features with representativeness and lower redundancy.
The method of this embodiment first establishes a new loss function between the instance matrix, the tag coefficient matrix, the weight matrix and the tag matrix, then introduces the Frobenius norm to explore the potential relationship between features and tags, and sparsifies the weight matrix with the L2,1 norm. The new loss function therefore not only has higher interpretability but also supports better feature ranking. Secondly, manifold constraints are used to process the local geometric structure among tags and to further mine potential information among them; the Laplacian scoring strategy is then fused in to screen out higher-scoring features. Combining the manifold constraint with the Laplacian score for embedded feature selection guides the mining of hidden information of potential tags. Finally, considering the differences among feature scores and the redundancy among instances, the modified cosine similarity is used to analyze feature redundancy and generate a candidate feature subset with lower redundancy; low-redundancy features are then selected with the L2 norm while sparsity is preserved, and a new objective function is established and optimized. Specifically, the procedure of the sparse feature selection method based on local tag correlation and feature redundancy in this embodiment is shown in Fig. 7:
1) Constructing a loss function among the tag coefficient matrix, the instance matrix, the weight matrix and the tag matrix, with the loss function constrained by the L2,1 norm.
The method in this embodiment considers that convex functions are easy to optimize and are therefore often used as sparse regularization terms by sparse methods. Although the L2,0 norm provides a structured regularization technique that exactly selects the first k features, it is mainly used for binary classification. The method of this embodiment therefore uses the L2,1 norm as the penalty term for building the sparse model, which not only sparsifies the data but also increases the interpretability of the model.
Assume a training set is given, where U = [u_1, u_2, …, u_n] ∈ R^{n×d} is the sample matrix; d is the number of features; G = {g_1, g_2, …, g_n} ∈ {1, −1}^{n×m} is the tag matrix; and m is the number of tag classes. For each sample u_i, if u_i contains the j-th tag then g_ij = 1; otherwise (i.e., u_i does not contain the j-th tag) g_ij = −1. The loss function is expressed as follows:
min_P ||UP − G||_F^2 + γ||P||_{2,1}   (1)
wherein γ is a hyper-parameter; ||·||_{2,1} denotes the L2,1 norm; ||·||_F denotes the Frobenius norm; and P ∈ R^{d×m} is the weight matrix, which represents the mapping relationship between the feature space and the tag space.
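For illustration only, the following is a minimal NumPy sketch of this loss, assuming the common form ||UP − G||_F^2 + γ||P||_{2,1}; the squared fitting term, the function names and the random test data are illustrative assumptions rather than the exact expression of the original formula image.

```python
import numpy as np

def l21_norm(P):
    # L2,1 norm: sum of the L2 norms of the rows of P
    return np.sum(np.linalg.norm(P, axis=1))

def loss(U, P, G, gamma):
    # Assumed form: squared Frobenius fitting term plus L2,1 sparsity penalty
    fit = np.linalg.norm(U @ P - G, 'fro') ** 2
    return fit + gamma * l21_norm(P)

# Illustrative usage with random data (n samples, d features, m tags)
n, d, m = 100, 20, 5
U = np.random.randn(n, d)            # sample matrix
G = np.sign(np.random.randn(n, m))   # tag matrix in {1, -1}
P = np.random.randn(d, m)            # weight matrix
print(loss(U, P, G, gamma=0.1))
```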
To better represent tag information, the potential links between tags and features are mined and the most relevant features are selected, improving on equation (1) as follows. Suppose U ∈ R^{n×d} is the sample matrix; G ∈ R^{n×m} is the tag matrix; and P ∈ R^{d×m} is the weight matrix. Then formula (1) can be generalized to:
wherein V ∈ R^{d×m} is the tag coefficient matrix, which represents the importance distribution of the tags; γ denotes a hyper-parameter.
2) Combining the manifold constraint with the Laplacian score to exploit local tag correlation and process the local tags.
By adding manifold regularization terms, the local geometry of the labels can be preserved and the local correlation function of the labels given. Thus, a higher scoring feature subset may be obtained using the local tag information.
The method of the present embodiment allows multiple constraints in embedded feature selection to guide the exploration of potential real tags and the selection of different features for a single tag. Methods that map the feature space to the tag space using local topology use tag information as pairwise constraints to guide the construction of the graph, but such constraints are not applicable to multi-tag datasets, and the relevant feature subsets of different tags differ; if some instances have the same tag, they may share a subset of characteristics closely related to that tag. Thus, the method of the present embodiment processes the matrix associated with the local tags by scoring the local features with the Laplacian score and introducing manifold constraints.
Let T = [t_1, t_2, …, t_n] be the numerical tag matrix; Z_ij denote the topological distance between two feature vectors; and C_ij denote the topological distance between two numerical tag vectors. They can be described as:
wherein u_i and u_j are instance vectors; ||·||_2 denotes the L2 norm; and t_i describes the different meanings of the tags associated with the same instance, which facilitates the use of local tag correlation. In constraint scoring, the tag information serves as a constraint to guide the construction of the graph, but such constraints are not applicable to multi-tag data, so in this embodiment the constraint affinity between u_i and u_j is expressed as follows:
let k denote the number of nearest neighbors, then two instances u i And u j The constraint affinity between is defined as:
wherein H_ij denotes the constraint affinity between the two vectors u_i and u_j; in the affinity matrix, H_ij increases with the similarity between u_i and u_j; Z_ij denotes the topological distance between two feature vectors; and C_ij denotes the topological distance between two numerical tag vectors.
In addition, an informative feature should preserve the local structure of the instances: if two instances are nearest neighbors, their values on that feature should be close to each other; otherwise, they should be far apart. The adjacency matrix is defined in this embodiment as:
wherein H_R ∈ R^{n×n} denotes the adjacency matrix of the k-nearest-neighbor graph of H; and R(u_i) denotes the set of the nearest instances of the i-th instance u_i.
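As a rough sketch of this construction (the exact definitions of Z_ij, C_ij and H_ij appear only as formula images in the original and are not reproduced here), the following NumPy code builds pairwise distances between instances and between their numerical tag vectors, and a k-nearest-neighbor adjacency matrix; the function names and the use of plain squared Euclidean distance are illustrative assumptions.

```python
import numpy as np

def pairwise_sq_dists(X):
    # Squared Euclidean distances between all rows of X
    sq = np.sum(X ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def knn_adjacency(D, k):
    # H_R: symmetric 0/1 adjacency of the k-nearest-neighbor graph built from distance matrix D
    n = D.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[1:k + 1]   # skip the point itself
        A[i, idx] = 1.0
    return np.maximum(A, A.T)             # symmetrize

# Illustrative usage: Z from the feature vectors, C from the numerical tag vectors
n, d, m = 50, 10, 4
U = np.random.randn(n, d)       # instance matrix
T = np.random.randn(n, m)       # numerical tag matrix (placeholder values)
Z = pairwise_sq_dists(U)        # distances between feature vectors
C = pairwise_sq_dists(T)        # distances between tag vectors
H_R = knn_adjacency(Z, k=5)     # k-NN adjacency used as H_R
```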
Suppose U ∈ R^{n×d} is the instance matrix; G ∈ R^{n×m} is the tag matrix; P ∈ R^{d×m} is the weight matrix; V ∈ R^{d×m} is the tag coefficient matrix; E_n ∈ R^n is the all-ones vector; and Q ∈ R^{n×m} is the mapping function. Then the local tag information is expressed as:
Q = [q(u_1), q(u_2), …, q(u_n)]^T ∈ R^{n×m}, i.e., Q = GV + E_nS + UP   (6)
wherein q(u_i) = g_iV + S + u_iP; E_n ∈ R^n is the all-ones vector; S ∈ R^m is the tag penalty term; and q(u_i) ∈ Q denotes a discriminant function.
Let S ∈ R^m be the tag penalty term; H_R ∈ R^{n×n} be the adjacency matrix of the k-nearest-neighbor graph of H; and Q ∈ R^{n×m} be the mapping function. Then the local feature structure is minimized as:
min (1/2) Σ_{i,j=1}^{n} ||q(u_i) − q(u_j)||_2^2 (H_R)_{ij} = tr(Q^T L Q)   (7)
wherein q(u_i) ∈ Q denotes a discriminant function; ||·||_2 denotes the L2 norm; tr(·) denotes the trace of a matrix; Q^T denotes the transpose of the matrix Q; and L = H − H_R is the graph Laplacian matrix, where H is a diagonal matrix. In order to study the local geometric characteristics more easily, L is made symmetric, yielding the symmetric graph Laplacian L_p.
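A hedged sketch of this local tag correlation term is given below. It assumes Q = GV + E_nS + UP as in equation (6) (with V sized so that the product GV is defined), builds the graph Laplacian from the k-nearest-neighbor adjacency H_R and its diagonal degree matrix, and symmetrically normalizes it to obtain L_p; the degree-based normalization is an assumption, not the patented formula.

```python
import numpy as np

def local_tag_correlation(U, G, P, V, S, H_R):
    n = U.shape[0]
    Q = G @ V + np.outer(np.ones(n), S) + U @ P   # Q = GV + E_n S + UP (equation (6))

    deg = H_R.sum(axis=1)                         # degrees of the k-NN graph
    L = np.diag(deg) - H_R                        # graph Laplacian: degree matrix minus adjacency

    # Symmetric normalization, assumed here to yield the symmetric Laplacian L_p
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L_p = d_inv_sqrt @ L @ d_inv_sqrt

    return np.trace(Q.T @ L_p @ Q)                # tr(Q^T L_p Q)
```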
3) Introducing the modified cosine similarity, controlling feature redundancy by calculating the similarity among features, and screening out representative features with lower redundancy.
In the method of this embodiment, feature selection is used to remove redundant or irrelevant features, improve model performance and reduce time cost. A control function based on feature redundancy is used to suppress the connections between features, and cosine similarity can compute the similarity between two features so as to preserve the feature relations in the original dataset. However, cosine similarity only examines the angle between two non-zero vectors and does not account for features with large score differences. To eliminate this defect of vector-based similarity, the method of this embodiment uses the modified cosine similarity (de-centering the data before the cosine computation) to examine feature similarity. On the basis of the weight matrix, a feature redundancy control term is constructed to constrain feature redundancy; that is, the control function can evaluate the correlation between the e-th feature and the r-th feature.
The adjusted cosine similarity first centers the data and then measures the similarity of two vectors by the cosine of the angle between them, as described below.
let p= [ P ] 1 ,p 2 ,…,p d ]∈R d×m Representing a weight matrix, W er Representing a modified cosine similarity between the e-th feature and the r-th feature. The feature redundancy function is:
wherein ||·||_2 denotes the L2 norm.
The modified cosine similarity is obtained by de-centering the data and then measuring the similarity of two vectors by the cosine of the angle between them. The modified cosine similarity is written as:
wherein the mean term denotes the average index score of the a-th feature.
For equation (9), W_er = 0 indicates that there is no correlation between u_e and u_r. A larger |W_er| indicates a stronger correlation: the correlation is strongest when |W_er| is close to 1 and weakest when W_er is close to 0.
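A small sketch of the modified cosine similarity between features is shown below. It centers each feature column by its mean before computing the cosine of the angle between the two centered columns, which is one common reading of equation (9); since the exact centering term of the original formula image is not reproduced, treat the choice of per-feature means as an assumption.

```python
import numpy as np

def adjusted_cosine_similarity(U):
    # U: n x d sample matrix; returns the d x d matrix W of adjusted cosine similarities
    Uc = U - U.mean(axis=0, keepdims=True)        # de-center each feature column
    norms = np.linalg.norm(Uc, axis=0)
    norms[norms == 0] = 1e-12                     # guard against zero-variance features
    W = (Uc.T @ Uc) / np.outer(norms, norms)      # cosine of the angle between centered columns
    return W

# Illustrative usage: |W[e, r]| close to 1 means features e and r are strongly correlated
U = np.random.randn(100, 20)
W = adjusted_cosine_similarity(U)
```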
4) Establishing a multi-tag feature selection algorithm to improve the classification performance on multi-tag data.
In this embodiment, an embedded multi-tag feature selection method is configured according to the objective function: the weight coefficient of each feature is solved, and features are selected from the largest to the smallest weight coefficient. The objective function of LSFSR consists of the following three parts: the L2,1-norm-based loss function in equation (2), the local tag correlation function in equation (7), and the feature redundancy function in equation (8).
Suppose U ∈ R^{n×d} is the instance matrix; G ∈ R^{n×m} is the tag matrix; P ∈ R^{d×m} is the weight matrix; S ∈ R^m is the tag penalty term; V ∈ R^{d×m} is the tag coefficient matrix; L_p is the symmetric graph Laplacian matrix; and Q ∈ R^{n×m} is the discriminant function. Then the objective function of the LSFSR algorithm is expressed as:
wherein α, β and γ denote three different hyper-parameters; W_ij denotes the modified cosine similarity between the i-th and j-th features; ||·||_2 denotes the L2 norm; and ||·||_{2,1} denotes the L2,1 norm, which is used to induce sparsity.
Formula (10) contains three variables: the weight matrix P, the tag penalty term S, and the tag coefficient matrix V. In general, the objective function is non-smooth and cannot be solved directly, so an alternating gradient descent method is applied: when two of the variables are fixed, the optimization problem contains only one unknown variable and can then be solved directly.
Thus, the function is differentiated with respect to the variables P, S and V, respectively, and the gradient descent operations are then performed alternately in each iteration. The specific derivation results are as follows:
wherein H denotes a diagonal matrix; D ∈ R^{d×d} denotes a diagonal matrix; and I ∈ R^{m×m} denotes the identity matrix.
Solving the objective function under the inequality constraints yields the partial derivatives with respect to the variables P, S and V as:
the optimal solution for equation (12) is:
in this section, P, S, V values in the LSFSR algorithm are each an identity matrix initialized to 1. The three matrices according to equation (13) may be updated by an alternate optimization procedure. If the difference between the absolute values of two consecutive objective functions is less than 0.001, the iteration stops. It follows that the results of the LSFSR algorithm will be represented by a feature ordering, which may be specified by algorithm 1. Table 1 below is a schematic representation of the LSFSR algorithm process of this embodiment:
TABLE 1
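Since the concrete update rules of equation (13) and the pseudo-code of Table 1 are given only as images, the following is a hedged sketch of the overall alternating procedure described above: the three matrices are initialized to ones, updated in turn by simple gradient steps on an assumed objective of the form loss + L2,1 penalty + λ·tr(Q^T L_p Q) (the feature redundancy term is omitted for brevity), the iteration stops when the change in the objective falls below 0.001 or a maximum number of iterations is reached, and features are finally ranked by the row norms of P. The gradient-step updates, the step size, the simplified objective and the m×m shape of V (chosen so that GV is defined) are illustrative assumptions rather than the patented update rules.

```python
import numpy as np

def lsfsr_sketch(U, G, H_R, gamma=0.1, lam=0.1, lr=1e-3, max_iter=100, tol=1e-3):
    """Illustrative alternating optimization; NOT the patented update rules of equation (13)."""
    n, d = U.shape
    m = G.shape[1]
    P = np.ones((d, m))      # weight matrix, initialized to 1
    V = np.ones((m, m))      # tag coefficient matrix (m x m here so that G @ V is defined)
    S = np.ones(m)           # tag penalty term, initialized to 1

    deg = H_R.sum(axis=1)
    L_p = np.diag(deg) - H_R                 # graph Laplacian used as a stand-in for L_p

    def objective(P, V, S):
        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        loss = np.linalg.norm(U @ P - G @ V, 'fro') ** 2      # assumed fitting term
        sparsity = gamma * np.sum(np.linalg.norm(P, axis=1))  # L2,1 penalty on P
        local = lam * np.trace(Q.T @ L_p @ Q)                 # local tag correlation term
        return loss + sparsity + local

    prev = objective(P, V, S)
    for _ in range(max_iter):
        # Alternating gradient steps: each variable is updated while the other two are fixed.
        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        row_norms = np.maximum(np.linalg.norm(P, axis=1, keepdims=True), 1e-12)
        grad_P = 2 * U.T @ (U @ P - G @ V) + 2 * lam * U.T @ (L_p @ Q) + gamma * P / row_norms
        P = P - lr * grad_P

        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        grad_V = -2 * G.T @ (U @ P - G @ V) + 2 * lam * G.T @ (L_p @ Q)
        V = V - lr * grad_V

        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        grad_S = 2 * lam * (np.ones(n) @ (L_p @ Q))
        S = S - lr * grad_S

        cur = objective(P, V, S)
        if abs(prev - cur) < tol:            # stop when consecutive objective values are close
            break
        prev = cur

    ranking = np.argsort(-np.linalg.norm(P, axis=1))   # rank features by the row norms of P
    return P, ranking
```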
The per-iteration time complexity of the LSFSR algorithm in this embodiment is analyzed as follows. For the update of P, the matrix L_p needs to be computed only once, and the remaining computations are mainly matrix multiplication, transposition and matrix inversion, which results in a complexity of O(max(n^3 + n^2·d, m^3, d^3)). For the updates of S and V, the main operations are likewise matrix multiplication, transposition and matrix inversion, resulting in complexities of O(max(n^2, dmn, m^3)) and O(max(m^3, dmn, m^2·n)), respectively. In summary, the time complexity of each iteration of the LSFSR algorithm is approximately O(max(dmn, n^3 + nd, m^3)).
Experimental verification: to verify the superiority of the feature selection method of the present invention, 6 multi-label text datasets are selected from http:// mulan. The attributes of these 6 datasets are shown in Table 2; each dataset is divided into a training set and a test set, on which the experiments are carried out and the experimental results are evaluated.
Table 2 detailed information table of multi-tag dataset
LSFSR is compared with GRRO, MFS-MCDM, WFSNR, GLFS, MDFS and MCLS. GRRO is a general multi-tag learning framework based on global optimization and redundancy; MFS-MCDM is a multi-tag feature selection method using multi-criteria decision making, with its parameter set to 0.1; WFSNR is a weak-label feature selection algorithm based on a neighborhood rough set and ReliefF; GLFS is a group-preserving label-specific feature selection algorithm for multi-label learning; MDFS is a manifold-regularized discriminative feature selection method for multi-label learning, with its parameters set to 1, 1 and 0.1, respectively; and MCLS is a manifold-based constrained Laplacian score multi-label feature selection method.
The above methods and the first 100 ranked features obtained by the LSFSR method of this embodiment are fed into ML-KNN classification to compare classification results. To quantify the performance of these multi-tag feature selection methods, six multi-tag learning metrics are used: Micro-average-F1 (MI), Macro-average-F1 (MA), Ranking Loss (RL), Hamming Loss (HL), Coverage (CV) and Average Precision (AP). The closer Hamming Loss, Ranking Loss and Coverage are to 0, the better the effect; the closer Average Precision, Macro-average-F1 and Micro-average-F1 are to 1, the better the effect. In the experiments, an upward indicator after an evaluation criterion means that larger values are better, a downward indicator means that smaller values are better, and bold values in the tables mark the best results. The six evaluation indexes each emphasize different aspects, so few feature selection methods can outperform all others on every index.
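For reference, the six evaluation metrics mentioned here are all available in scikit-learn; the snippet below sketches how they might be computed from binary indicator predictions and real-valued score outputs. It is not part of the patent, and the 0/1 tag encoding (rather than the {1, −1} encoding used above) and the helper name are assumptions.

```python
from sklearn.metrics import (hamming_loss, label_ranking_loss, coverage_error,
                             label_ranking_average_precision_score, f1_score)

def multilabel_metrics(Y_true, Y_pred, Y_score):
    # Y_true, Y_pred: n x m binary indicator matrices; Y_score: n x m real-valued scores
    return {
        'HL': hamming_loss(Y_true, Y_pred),                            # lower is better
        'RL': label_ranking_loss(Y_true, Y_score),                     # lower is better
        'CV': coverage_error(Y_true, Y_score),                         # lower is better
        'AP': label_ranking_average_precision_score(Y_true, Y_score),  # higher is better
        'MI': f1_score(Y_true, Y_pred, average='micro'),               # higher is better
        'MA': f1_score(Y_true, Y_pred, average='macro'),               # higher is better
    }
```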
Table 3 Comparison results of LSFSR with other methods on the 6 datasets
Table 3 gives the experimental results of the seven algorithms on the Science, Health, Enron, Social, Reference and Arts datasets for text classification. On these datasets, the present algorithm achieves the best performance on the six metrics HL, RL, CV, AP, MI and MA. Based on the 36 comparison results in the table (6 datasets and 6 evaluation indexes), the present invention obtains the optimal value in 100% of cases. The analysis of the experimental results fully shows that the classification performance induced by the feature subset obtained by the feature selection method of the present invention is obviously superior to that of the other comparison algorithms.
Figs. 1a-1f, 2a-2f, 3a-3f, 4a-4f, 5a-5f and 6a-6f show the experimental classification results of the seven algorithms on the six multi-label datasets. For each graph, the X-axis (horizontal axis) represents the number of selected features and the Y-axis (vertical axis) represents the value of the evaluation index. From Figs. 1a-6f, LSFSR performs far better than all other algorithms on the six text datasets for the five metrics AP, CV, HL, RL and MA. For the Health and Reference datasets, the advantage of LSFSR on the MI metric is not obvious when the number of selected features is small, but as the number of features increases, the LSFSR algorithm outperforms all other algorithms. In general, it can be seen from Figs. 1a-6f that our method is superior to the other six advanced multi-label feature selection methods.
In this embodiment, a sparse feature selection method based on local tag correlation and feature redundancy is provided. Firstly, in order to address the sparsity of instance data, the L2,1 norm is adopted to constrain the weight matrix; functional relations among the tag matrix, the instance matrix, the weight matrix and the tag coefficient matrix are established, deepening the relation between tags and features. Secondly, on the basis of the embedded method, manifold constraints are introduced to explore the potential relations among local tags, and the Laplacian score is combined to select a feature subset with better performance. Thirdly, the redundancy between high-dimensional data features is constrained through the weight matrix, which helps to generate discriminative, low-redundancy feature subsets, and the L2 norm is used to select low-redundancy features while maintaining sparsity. Finally, a multi-label feature selection algorithm is designed to obtain the optimally ordered feature subset. Experimental results show that our method achieves better classification performance on multiple datasets than the other comparison methods. In future work, we will further study the deep links between features by studying the complex distributions of tags and features and building a sparse multi-tag feature selection model.
The above description is only a preferred embodiment of the present invention, and the patent protection scope of the present invention is defined by the claims, and all equivalent structural changes made by the specification and the drawings of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The sparse feature selection method based on local tag correlation and feature redundancy is characterized by comprising the following steps of:
1) Inputting a sample matrix, a tag matrix and the hyper-parameters of an objective function into the objective function for iterative updating, wherein the objective function is composed of a loss function constructed among a tag coefficient matrix, an instance matrix, a weight matrix and the tag matrix, a local tag correlation function and a feature redundancy function; the relation between tags and features is explored through the loss function, information between tags is obtained from the local tag correlation function to screen features with higher scores, a feature subset with lower feature redundancy is generated from the feature redundancy function, and the weight coefficient of each feature is solved;
2) Stopping the iterative updating process when a set stopping rule is reached, and outputting the feature subset according to the set ordering.
2. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1), the loss function is a regularization loss function based on the L2,1 norm.
3. The sparse feature selection method based on local tag correlation and feature redundancy of claim 2, wherein the regularization loss function based on the L2,1 norm is: ||UP − G||_F^2 + γ||P||_{2,1}, wherein U is the sample matrix, P is the weight matrix, G is the tag matrix, ||UP − G||_F denotes the Frobenius norm, γ is a hyper-parameter, and ||P||_{2,1} denotes the L2,1 norm of the matrix P.
4. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1) the local tag correlation function is established based on combining manifold constraints with laplace scores.
5. The sparse feature selection method based on local tag correlation and feature redundancy of claim 4, wherein the local tag correlation function is: min (1/2) Σ_{i,j=1}^{n} ||q(u_i) − q(u_j)||_2^2 (H_R)_{ij} = tr(Q^T L Q), wherein S is the tag penalty term, q(u_i) ∈ Q is a discriminant function, ||·||_2 denotes the L2 norm, tr(·) denotes the trace of a matrix, Q^T denotes the transpose of the matrix Q, L = H − H_R is the graph Laplacian matrix, H is a diagonal matrix, and λ is a hyper-parameter.
6. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1), the feature redundancy function is:
wherein P is a weight matrix; w (W) er Representing a modified cosine similarity between the e-th feature and the r-th feature, n being the total number of features.
7. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1), the objective function is:
wherein α, β and λ denote different hyper-parameters; U is the sample matrix; P is the weight matrix; G is the tag matrix; V is the tag coefficient matrix; L_p is the symmetric graph Laplacian matrix; S is the tag penalty term; W_ij is the modified cosine similarity between the i-th feature and the j-th feature; ||·||_F is the Frobenius norm; and ||·||_{2,1} is the L2,1 norm.
8. The sparse feature selection method based on local tag correlation and feature redundancy of claim 7, wherein the values of the three matrices P, S and V in the objective function are updated iteratively, and the iteration stops when a set iteration condition is met or the number of iterations reaches the maximum number of iterations.
9. The sparse feature selection method based on local tag correlation and feature redundancy of claim 8, wherein the satisfying a set iteration condition is when a difference between two consecutive function values is less than a preset value.
10. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 2), outputting the feature subset according to the set ordering means: sorting the features from the largest to the smallest weight coefficient, and outputting the feature subset arranged in that order.
CN202310743486.7A 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy Pending CN116910502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310743486.7A CN116910502A (en) 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310743486.7A CN116910502A (en) 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy

Publications (1)

Publication Number Publication Date
CN116910502A true CN116910502A (en) 2023-10-20

Family

ID=88359195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310743486.7A Pending CN116910502A (en) 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy

Country Status (1)

Country Link
CN (1) CN116910502A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454154A (en) * 2023-12-22 2024-01-26 江西农业大学 Robust feature selection method for bias marker data


Similar Documents

Publication Publication Date Title
Ghasemian et al. Evaluating overfit and underfit in models of network community structure
Wang et al. Block diagonal representation learning for robust subspace clustering
Carbonera et al. A density-based approach for instance selection
Tian et al. Global linear neighborhoods for efficient label propagation
Ji et al. An EnKF-based scheme to optimize hyper-parameters and features for SVM classifier
Wang et al. Minimum error entropy based sparse representation for robust subspace clustering
CN111582506A (en) Multi-label learning method based on global and local label relation
CN116910502A (en) Sparse feature selection method based on local tag correlation and feature redundancy
Wang et al. Entropy regularization for unsupervised clustering with adaptive neighbors
Chen et al. Concept factorization with local centroids
CN111027636B (en) Unsupervised feature selection method and system based on multi-label learning
Chen et al. Feature weighted non-negative matrix factorization
Obozinski et al. Joint covariate selection for grouped classification
Leon-Alcaide et al. An evolutionary approach for efficient prototyping of large time series datasets
Wang et al. Laplacian regularized low-rank representation for cancer samples clustering
CN113033626B (en) Image classification method based on multi-task collaborative learning
CN110738245A (en) automatic clustering algorithm selection system and method for scientific data analysis
Shang et al. Unsupervised feature selection via discrete spectral clustering and feature weights
Fu et al. Auto-weighted low-rank representation for clustering
Gao et al. Possibilistic neighborhood graph: A new concept of similarity graph learning
Dornaika et al. Simultaneous label inference and discriminant projection estimation through adaptive self-taught graphs
Li et al. Multi-label feature selection with high-sparse personalized and low-redundancy shared common features
Zhang et al. Unsupervised learning of Dirichlet process mixture models with missing data.
Chang et al. Calibrated multi-task subspace learning via binary group structure constraint
CN113688229B (en) Text recommendation method, system, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination