CN116910502A - Sparse feature selection method based on local tag correlation and feature redundancy - Google Patents

Sparse feature selection method based on local tag correlation and feature redundancy Download PDF

Info

Publication number
CN116910502A
CN116910502A
Authority
CN
China
Prior art keywords
feature
matrix
function
redundancy
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310743486.7A
Other languages
Chinese (zh)
Inventor
孙林
马雨萱
常宝方
王振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Normal University
Original Assignee
Henan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Normal University filed Critical Henan Normal University
Priority to CN202310743486.7A priority Critical patent/CN116910502A/en
Publication of CN116910502A publication Critical patent/CN116910502A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G06F18/21322 Rendering the within-class scatter matrix non-singular
    • G06F18/21326 Rendering the within-class scatter matrix non-singular involving optimisations, e.g. using regularisation techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of data analysis, and particularly relates to a sparse feature selection method based on local tag correlation and feature redundancy. The method inputs a sample matrix, a tag matrix and the hyper-parameters of an objective function into the objective function for iterative updating, wherein the objective function consists of a loss function constructed among a tag coefficient matrix, an instance matrix, a weight matrix and the tag matrix, a local tag correlation function and a feature redundancy function. The iterative updating process stops when a set stopping rule is reached, and the feature subset is output according to a set ordering. The method deepens the relation between tags and features through the loss function, explores the potential relations between local tags through the local tag correlation function to select a feature subset with better performance, and finally generates a discriminative, low-redundancy feature subset with the feature redundancy function, so that an optimally ordered feature subset can be output, thereby improving the classification effect of the feature selection method and ensuring the accuracy of classification results.

Description

Sparse feature selection method based on local tag correlation and feature redundancy
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a sparse feature selection method based on local tag correlation and feature redundancy.
Background
In the field of multi-tag learning, a single instance is typically associated with multiple semantic scenes and carries one or more tags at the same time. Furthermore, almost all multi-tag datasets are sparse and redundant, which may significantly degrade the performance of a multi-tag classification model. In real life, high-dimensional data contain a lot of redundancy or noise, so preprocessing is necessary. Since such invalid or redundant information reduces the classification performance of the model, it is necessary to alleviate the curse of dimensionality. In general, feature selection can be used to preserve the visual interpretability and physical meaning of the selected features. Therefore, feature selection algorithms in multi-tag learning are effectively applied to fields such as text analysis and biological research.
Three types of feature selection models are currently available to select the important feature subset: wrapper models, filter models and embedded models. Among these, the embedded model is favored over the others because of its excellent classification ability and lower time cost. In order to avoid the influence of redundant and noisy data on classification performance, sparse regularization can be added to a multi-tag feature selection framework. For example, Chen et al., in "Extended adaptive Lasso for multi-class and multi-label feature selection," developed a model that extends adaptive Lasso regression and can be used for multi-class and multi-label datasets, but the model adapts poorly to nonlinear problems and its loss function has non-differentiable points. Li et al., in "Label correlations variation for robust multi-label feature selection," describe a robust multi-label classification method that designs a self-representation coefficient matrix with L2,1 regularization to remove noise and redundant points, but it does not take the correlation between features into account, while exploring the correlation between labels is critical for multi-label feature selection methods. In practice, most researchers tend to study tag correlation through manifold learning. Hu et al., in "Robust multi-label feature selection with dual-graph regularization," describe a multi-tag data framework with a dual-graph structure that uses manifold regularization to preserve the local geometry of tags and features, but this embedded approach takes little account of the dependency between tags and features, resulting in the loss of critical information. Liu et al., in "A robust graph based multi-label feature selection considering feature-label dependency," use manifold regularization on low-dimensional manifolds embedded in the original feature and tag spaces, preserving the local tag manifold structure, but they do not consider interactions between datasets and features with complex distributions. In real-world applications, the raw data obtained are often too chaotic and incomplete, so machine learning models cannot effectively identify and extract important information. Removing redundant features with a feature selection scheme is therefore an important task. In general, existing multi-tag feature selection models consider either the relationships between tags or the redundancy between features; most algorithms only apply simple sparse constraints to high-dimensional data and do not consider the inherent relationship between features and tags in detail. These problems greatly affect the classification effect of feature selection algorithms and make the classification results inaccurate.
Disclosure of Invention
The invention aims to provide a sparse feature selection method based on local tag correlation and feature redundancy, which is used for solving the problem of inaccurate classification results of the existing feature selection method.
In order to solve the technical problems, the invention provides a sparse feature selection method based on local tag correlation and feature redundancy, which comprises the following steps:
1) Inputting a sample matrix, a tag matrix and the hyper-parameters of an objective function into the objective function for iterative updating, wherein the objective function is composed of a loss function constructed among a tag coefficient matrix, an instance matrix, a weight matrix and the tag matrix, a local tag correlation function and a feature redundancy function; the relation between tags and features is explored through the loss function, information between tags is obtained from the local tag correlation function to screen features with higher scores, a feature subset with lower feature redundancy is generated from the feature redundancy function, and the weight coefficient of each feature is solved;
2) Stopping the iterative updating process when a set stopping rule is reached, and outputting the feature subset according to the set ordering.
The beneficial effects are as follows: in order to study the potential relation between tags and features, a loss function is constructed among the tag coefficient matrix, the instance matrix, the weight matrix and the tag matrix; in order to further utilize local tag correlation, local tag information is used to obtain a feature subset with higher scores; and the modified cosine similarity is introduced to control feature redundancy by calculating the similarity among features and to screen out low-redundancy features, so that the classification effect is improved and the accuracy of the classification results is guaranteed.
Further, in step 1), the loss function is a regularization loss function based on the L2,1 norm.
Further, the regularization loss function based on the L2,1 norm is: ||UP − G||_F^2 + γ||P||_{2,1}, wherein U is the sample matrix, P is the weight matrix, G is the tag matrix, ||UP − G||_F denotes the Frobenius norm, γ is a hyper-parameter, and ||P||_{2,1} denotes the L2,1 norm of the matrix P.
Further, in step 1), the local tag correlation function is built based on combining manifold constraints with Laplacian scores.
Further, the local tag correlation function is: min (1/2) Σ_{i,j=1}^{n} ||q(u_i) − q(u_j)||_2^2 (H_R)_{ij} = tr(Q^T L Q), wherein S is the tag penalty term, q(u_i) ∈ Q is a discriminant function, ||·||_2 denotes the L2 norm, tr(·) denotes the trace of a matrix, Q^T denotes the transpose of the matrix Q, L = H − H_R is the graph Laplacian matrix, H is a diagonal matrix, and λ is a hyper-parameter.
Further, in step 1), the feature redundancy function is:
wherein P is a weight matrix; w (W) er Representing a modified cosine similarity between the e-th feature and the r-th feature, n being the total number of features.
Further, in step 1), the objective function is:
wherein α, β and λ denote different hyper-parameters; U is the sample matrix; P is the weight matrix; G is the tag matrix; V is the tag coefficient matrix; L_p is the symmetric graph Laplacian matrix; S is the tag penalty term; W_ij is the modified cosine similarity between the i-th feature and the j-th feature; ||·||_F is the Frobenius norm; and ||·||_{2,1} is the L2,1 norm.
Further, the values of the three matrices P, S and V in the objective function are updated iteratively, and the iteration stops when a set iteration condition is met or the number of iterations reaches the maximum number of iterations.
Further, the set iteration condition is satisfied when the difference between two consecutive function values is smaller than a preset value.
Further, in step 2), outputting the feature subset according to the set ordering means: sorting the features from the largest to the smallest weight coefficient, and outputting the feature subset arranged in that order.
Drawings
FIGS. 1 a-1 f are graphs comparing the method of the present invention with prior art methods for AP in six different data sets;
FIGS. 2 a-2 f are CV plots of the method of the present invention versus the prior art method at six different data sets;
FIGS. 3 a-3 f are RL graphs comparing the method of the present invention with the prior art method in six different data sets;
FIGS. 4 a-4 f are HL graphs comparing the method of the present invention with prior art methods in six different data sets;
FIGS. 5 a-5 f are MA graphs comparing the method of the present invention with prior art methods in six different data sets;
FIGS. 6 a-6 f are graphs comparing MI of the method of the present invention versus prior art methods in six different data sets;
FIG. 7 is a flow chart of a sparse feature selection method based on local tag correlation and feature redundancy in accordance with the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent.
Sparse feature selection method embodiment based on local tag correlation and feature redundancy:
the method of the present embodiment constructs a loss function between the tag coefficient matrix, the instance matrix, the weight matrix, and the tag matrix in order to determine potential links between the tags and the features. In the multi-label classification method, in order to solve the problem of data redundancy, an L2, 1-norm constraint loss function may be used to improve classification performance. To further exploit local label dependencies, manifold constraints are combined with the Laplace score, and these local labels are processed. By adding manifold regularization terms, the local geometry of the labels can be preserved and the local correlation function of the labels given. Thus, a higher scoring feature subset may be obtained using the local tag information. And introducing the modified cosine similarity, controlling the feature redundancy by calculating the similarity among the features, and screening out the features with representativeness and lower redundancy.
The method of this embodiment first establishes a new loss function between the instance matrix, the tag coefficient matrix, the weight matrix and the tag matrix, then introduces the Frobenius norm to explore the potential relationship between features and tags, and sparsifies the weight matrix with the L2,1 norm. The new loss function therefore not only has higher interpretability but also supports better feature ranking. Secondly, manifold constraints are used to process the local geometric structure among tags and to further mine potential information among them; the Laplacian scoring strategy is then fused in to screen out higher-scoring features. Combining the manifold constraint with the Laplacian score for embedded feature selection guides the mining of hidden information of potential tags. Finally, considering the differences among feature scores and the redundancy among instances, the modified cosine similarity is used to analyze feature redundancy and generate a candidate feature subset with lower redundancy; low-redundancy features are then selected with the L2 norm while sparsity is preserved, and a new objective function is established and optimized. Specifically, the procedure of the sparse feature selection method based on local tag correlation and feature redundancy in this embodiment is shown in Fig. 7:
1) Constructing a loss function among the tag coefficient matrix, the instance matrix, the weight matrix and the tag matrix, with the loss function constrained by the L2,1 norm.
The method in this embodiment considers that convex functions are easy to optimize and are therefore often used as sparse regularization terms by sparse methods. Although the L2,0 norm provides a structured regularization technique that exactly selects the first k features, it is mainly used for binary classification. The method of this embodiment therefore uses the L2,1 norm as the penalty term for building the sparse model, which not only sparsifies the data but also increases the interpretability of the model.
Assume a training set is given, where U = [u_1, u_2, …, u_n] ∈ R^{n×d} is the sample matrix; d is the number of features; G = {g_1, g_2, …, g_n} ∈ {1, −1}^{n×m} is the tag matrix; and m is the number of tag classes. For each sample u_i, if u_i contains the j-th tag then g_ij = 1; otherwise (i.e., u_i does not contain the j-th tag) g_ij = −1. The loss function is expressed as follows:
min_P ||UP − G||_F^2 + γ||P||_{2,1}   (1)
wherein γ is a hyper-parameter; ||·||_{2,1} denotes the L2,1 norm; ||·||_F denotes the Frobenius norm; and P ∈ R^{d×m} is the weight matrix, which represents the mapping relationship between the feature space and the tag space.
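For illustration only, the following is a minimal NumPy sketch of this loss, assuming the common form ||UP − G||_F^2 + γ||P||_{2,1}; the squared fitting term, the function names and the random test data are illustrative assumptions rather than the exact expression of the original formula image.

```python
import numpy as np

def l21_norm(P):
    # L2,1 norm: sum of the L2 norms of the rows of P
    return np.sum(np.linalg.norm(P, axis=1))

def loss(U, P, G, gamma):
    # Assumed form: squared Frobenius fitting term plus L2,1 sparsity penalty
    fit = np.linalg.norm(U @ P - G, 'fro') ** 2
    return fit + gamma * l21_norm(P)

# Illustrative usage with random data (n samples, d features, m tags)
n, d, m = 100, 20, 5
U = np.random.randn(n, d)            # sample matrix
G = np.sign(np.random.randn(n, m))   # tag matrix in {1, -1}
P = np.random.randn(d, m)            # weight matrix
print(loss(U, P, G, gamma=0.1))
```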
To better represent tag information, the potential links between tags and features are mined and the most relevant features are selected, improving on equation (1) as follows. Suppose U ∈ R^{n×d} is the sample matrix; G ∈ R^{n×m} is the tag matrix; and P ∈ R^{d×m} is the weight matrix. Then formula (1) can be generalized to:
wherein V ∈ R^{d×m} is the tag coefficient matrix, which represents the importance distribution of the tags; γ denotes a hyper-parameter.
2) Combining the manifold constraint with the Laplacian score to exploit local tag correlation and process the local tags.
By adding manifold regularization terms, the local geometry of the labels can be preserved and the local correlation function of the labels given. Thus, a higher scoring feature subset may be obtained using the local tag information.
The method of the present embodiment allows multiple constraints in embedded feature selection to guide the exploration of potential real tags and the selection of different features for a single tag. Methods that map the feature space to the tag space using local topology use tag information as pairwise constraints to guide the construction of the graph, but such constraints are not applicable to multi-tag datasets, and the relevant feature subsets of different tags differ; if some instances have the same tag, they may share a subset of characteristics closely related to that tag. Thus, the method of the present embodiment processes the matrix associated with the local tags by scoring the local features with the Laplacian score and introducing manifold constraints.
Let T = [t_1, t_2, …, t_n] be the numerical tag matrix; Z_ij denote the topological distance between two feature vectors; and C_ij denote the topological distance between two numerical tag vectors. They can be described as:
wherein u_i and u_j are instance vectors; ||·||_2 denotes the L2 norm; and t_i describes the different meanings of the tags associated with the same instance, which facilitates the use of local tag correlation. In constraint scoring, the tag information serves as a constraint to guide the construction of the graph, but such constraints are not applicable to multi-tag data, so in this embodiment the constraint affinity between u_i and u_j is expressed as follows:
let k denote the number of nearest neighbors, then two instances u i And u j The constraint affinity between is defined as:
wherein H_ij denotes the constraint affinity between the two vectors u_i and u_j; in the affinity matrix, H_ij increases with the similarity between u_i and u_j; Z_ij denotes the topological distance between two feature vectors; and C_ij denotes the topological distance between two numerical tag vectors.
In addition, an informative feature should preserve the local structure of the instances: if two instances are nearest neighbors, their values on that feature should be close to each other; otherwise, they should be far apart. The adjacency matrix is defined in this embodiment as:
wherein H_R ∈ R^{n×n} denotes the adjacency matrix of the k-nearest-neighbor graph of H; and R(u_i) denotes the set of the nearest instances of the i-th instance u_i.
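As a rough sketch of this construction (the exact definitions of Z_ij, C_ij and H_ij appear only as formula images in the original and are not reproduced here), the following NumPy code builds pairwise distances between instances and between their numerical tag vectors, and a k-nearest-neighbor adjacency matrix; the function names and the use of plain squared Euclidean distance are illustrative assumptions.

```python
import numpy as np

def pairwise_sq_dists(X):
    # Squared Euclidean distances between all rows of X
    sq = np.sum(X ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def knn_adjacency(D, k):
    # H_R: symmetric 0/1 adjacency of the k-nearest-neighbor graph built from distance matrix D
    n = D.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(D[i])[1:k + 1]   # skip the point itself
        A[i, idx] = 1.0
    return np.maximum(A, A.T)             # symmetrize

# Illustrative usage: Z from the feature vectors, C from the numerical tag vectors
n, d, m = 50, 10, 4
U = np.random.randn(n, d)       # instance matrix
T = np.random.randn(n, m)       # numerical tag matrix (placeholder values)
Z = pairwise_sq_dists(U)        # distances between feature vectors
C = pairwise_sq_dists(T)        # distances between tag vectors
H_R = knn_adjacency(Z, k=5)     # k-NN adjacency used as H_R
```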
Suppose U ∈ R^{n×d} is the instance matrix; G ∈ R^{n×m} is the tag matrix; P ∈ R^{d×m} is the weight matrix; V ∈ R^{d×m} is the tag coefficient matrix; E_n ∈ R^n is the all-ones vector; and Q ∈ R^{n×m} is the mapping function. Then the local tag information is expressed as:
Q = [q(u_1), q(u_2), …, q(u_n)]^T ∈ R^{n×m}, i.e., Q = GV + E_nS + UP   (6)
wherein q(u_i) = g_iV + S + u_iP; E_n ∈ R^n is the all-ones vector; S ∈ R^m is the tag penalty term; and q(u_i) ∈ Q denotes a discriminant function.
Let S ∈ R^m be the tag penalty term; H_R ∈ R^{n×n} be the adjacency matrix of the k-nearest-neighbor graph of H; and Q ∈ R^{n×m} be the mapping function. Then the local feature structure is minimized as:
min (1/2) Σ_{i,j=1}^{n} ||q(u_i) − q(u_j)||_2^2 (H_R)_{ij} = tr(Q^T L Q)   (7)
wherein q(u_i) ∈ Q denotes a discriminant function; ||·||_2 denotes the L2 norm; tr(·) denotes the trace of a matrix; Q^T denotes the transpose of the matrix Q; and L = H − H_R is the graph Laplacian matrix, where H is a diagonal matrix. In order to study the local geometric characteristics more easily, L is made symmetric, yielding the symmetric graph Laplacian L_p.
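A hedged sketch of this local tag correlation term is given below. It assumes Q = GV + E_nS + UP as in equation (6) (with V sized so that the product GV is defined), builds the graph Laplacian from the k-nearest-neighbor adjacency H_R and its diagonal degree matrix, and symmetrically normalizes it to obtain L_p; the degree-based normalization is an assumption, not the patented formula.

```python
import numpy as np

def local_tag_correlation(U, G, P, V, S, H_R):
    n = U.shape[0]
    Q = G @ V + np.outer(np.ones(n), S) + U @ P   # Q = GV + E_n S + UP (equation (6))

    deg = H_R.sum(axis=1)                         # degrees of the k-NN graph
    L = np.diag(deg) - H_R                        # graph Laplacian: degree matrix minus adjacency

    # Symmetric normalization, assumed here to yield the symmetric Laplacian L_p
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L_p = d_inv_sqrt @ L @ d_inv_sqrt

    return np.trace(Q.T @ L_p @ Q)                # tr(Q^T L_p Q)
```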
3) Introducing the modified cosine similarity, controlling feature redundancy by calculating the similarity among features, and screening out representative features with lower redundancy.
In the method of this embodiment, feature selection is used to remove redundant or irrelevant features, improve model performance and reduce time cost. A control function based on feature redundancy is used to suppress the connections between features, and cosine similarity can compute the similarity between two features so as to preserve the feature relations in the original dataset. However, cosine similarity only examines the angle between two non-zero vectors and does not account for features with large score differences. To eliminate this defect of vector-based similarity, the method of this embodiment uses the modified cosine similarity (de-centering the data before the cosine computation) to examine feature similarity. On the basis of the weight matrix, a feature redundancy control term is constructed to constrain feature redundancy; that is, the control function can evaluate the correlation between the e-th feature and the r-th feature.
The adjusted cosine similarity first centers the data and then measures the similarity of two vectors by the cosine of the angle between them, as described below.
let p= [ P ] 1 ,p 2 ,…,p d ]∈R d×m Representing a weight matrix, W er Representing a modified cosine similarity between the e-th feature and the r-th feature. The feature redundancy function is:
wherein ||·||_2 denotes the L2 norm.
The modified cosine similarity is obtained by de-centering the data and then measuring the similarity of two vectors by the cosine of the angle between them. The modified cosine similarity is written as:
wherein the mean term denotes the average index score of the a-th feature.
For equation (9), W_er = 0 indicates that there is no correlation between u_e and u_r. A larger |W_er| indicates a stronger correlation: the correlation is strongest when |W_er| is close to 1 and weakest when W_er is close to 0.
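A small sketch of the modified cosine similarity between features is shown below. It centers each feature column by its mean before computing the cosine of the angle between the two centered columns, which is one common reading of equation (9); since the exact centering term of the original formula image is not reproduced, treat the choice of per-feature means as an assumption.

```python
import numpy as np

def adjusted_cosine_similarity(U):
    # U: n x d sample matrix; returns the d x d matrix W of adjusted cosine similarities
    Uc = U - U.mean(axis=0, keepdims=True)        # de-center each feature column
    norms = np.linalg.norm(Uc, axis=0)
    norms[norms == 0] = 1e-12                     # guard against zero-variance features
    W = (Uc.T @ Uc) / np.outer(norms, norms)      # cosine of the angle between centered columns
    return W

# Illustrative usage: |W[e, r]| close to 1 means features e and r are strongly correlated
U = np.random.randn(100, 20)
W = adjusted_cosine_similarity(U)
```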
4) Establishing a multi-tag feature selection algorithm to improve the classification performance on multi-tag data.
In this embodiment, an embedded multi-tag feature selection method is configured according to the objective function: the weight coefficient of each feature is solved, and features are selected from the largest to the smallest weight coefficient. The objective function of LSFSR consists of the following three parts: the L2,1-norm-based loss function in equation (2), the local tag correlation function in equation (7), and the feature redundancy function in equation (8).
Suppose U ∈ R^{n×d} is the instance matrix; G ∈ R^{n×m} is the tag matrix; P ∈ R^{d×m} is the weight matrix; S ∈ R^m is the tag penalty term; V ∈ R^{d×m} is the tag coefficient matrix; L_p is the symmetric graph Laplacian matrix; and Q ∈ R^{n×m} is the discriminant function. Then the objective function of the LSFSR algorithm is expressed as:
wherein α, β and γ denote three different hyper-parameters; W_ij denotes the modified cosine similarity between the i-th and j-th features; ||·||_2 denotes the L2 norm; and ||·||_{2,1} denotes the L2,1 norm, which is used to induce sparsity.
Formula (10) contains three variables: the weight matrix P, the tag penalty term S, and the tag coefficient matrix V. In general, the objective function is non-smooth and cannot be solved directly, so an alternating gradient descent method is applied: when two of the variables are fixed, the optimization problem contains only one unknown variable and can then be solved directly.
Thus, the function is differentiated with respect to the variables P, S and V, respectively, and the gradient descent operations are then performed alternately in each iteration. The specific derivation results are as follows:
wherein H denotes a diagonal matrix; D ∈ R^{d×d} denotes a diagonal matrix; and I ∈ R^{m×m} denotes the identity matrix.
Solving the objective function under the inequality constraints yields the partial derivatives with respect to the variables P, S and V as:
the optimal solution for equation (12) is:
in this section, P, S, V values in the LSFSR algorithm are each an identity matrix initialized to 1. The three matrices according to equation (13) may be updated by an alternate optimization procedure. If the difference between the absolute values of two consecutive objective functions is less than 0.001, the iteration stops. It follows that the results of the LSFSR algorithm will be represented by a feature ordering, which may be specified by algorithm 1. Table 1 below is a schematic representation of the LSFSR algorithm process of this embodiment:
TABLE 1
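Since the concrete update rules of equation (13) and the pseudo-code of Table 1 are given only as images, the following is a hedged sketch of the overall alternating procedure described above: the three matrices are initialized to ones, updated in turn by simple gradient steps on an assumed objective of the form loss + L2,1 penalty + λ·tr(Q^T L_p Q) (the feature redundancy term is omitted for brevity), the iteration stops when the change in the objective falls below 0.001 or a maximum number of iterations is reached, and features are finally ranked by the row norms of P. The gradient-step updates, the step size, the simplified objective and the m×m shape of V (chosen so that GV is defined) are illustrative assumptions rather than the patented update rules.

```python
import numpy as np

def lsfsr_sketch(U, G, H_R, gamma=0.1, lam=0.1, lr=1e-3, max_iter=100, tol=1e-3):
    """Illustrative alternating optimization; NOT the patented update rules of equation (13)."""
    n, d = U.shape
    m = G.shape[1]
    P = np.ones((d, m))      # weight matrix, initialized to 1
    V = np.ones((m, m))      # tag coefficient matrix (m x m here so that G @ V is defined)
    S = np.ones(m)           # tag penalty term, initialized to 1

    deg = H_R.sum(axis=1)
    L_p = np.diag(deg) - H_R                 # graph Laplacian used as a stand-in for L_p

    def objective(P, V, S):
        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        loss = np.linalg.norm(U @ P - G @ V, 'fro') ** 2      # assumed fitting term
        sparsity = gamma * np.sum(np.linalg.norm(P, axis=1))  # L2,1 penalty on P
        local = lam * np.trace(Q.T @ L_p @ Q)                 # local tag correlation term
        return loss + sparsity + local

    prev = objective(P, V, S)
    for _ in range(max_iter):
        # Alternating gradient steps: each variable is updated while the other two are fixed.
        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        row_norms = np.maximum(np.linalg.norm(P, axis=1, keepdims=True), 1e-12)
        grad_P = 2 * U.T @ (U @ P - G @ V) + 2 * lam * U.T @ (L_p @ Q) + gamma * P / row_norms
        P = P - lr * grad_P

        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        grad_V = -2 * G.T @ (U @ P - G @ V) + 2 * lam * G.T @ (L_p @ Q)
        V = V - lr * grad_V

        Q = G @ V + np.outer(np.ones(n), S) + U @ P
        grad_S = 2 * lam * (np.ones(n) @ (L_p @ Q))
        S = S - lr * grad_S

        cur = objective(P, V, S)
        if abs(prev - cur) < tol:            # stop when consecutive objective values are close
            break
        prev = cur

    ranking = np.argsort(-np.linalg.norm(P, axis=1))   # rank features by the row norms of P
    return P, ranking
```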
The per-iteration time complexity of the LSFSR algorithm in this embodiment is analyzed as follows. For the update of P, the matrix L_p needs to be computed only once, and the remaining computations are mainly matrix multiplication, transposition and matrix inversion, which results in a complexity of O(max(n^3 + n^2·d, m^3, d^3)). For the updates of S and V, the main operations are likewise matrix multiplication, transposition and matrix inversion, resulting in complexities of O(max(n^2, dmn, m^3)) and O(max(m^3, dmn, m^2·n)), respectively. In summary, the time complexity of each iteration of the LSFSR algorithm is approximately O(max(dmn, n^3 + nd, m^3)).
Experimental verification: to verify the superiority of the feature selection method of the present invention, 6 multi-label text datasets are selected from http:// mulan. The attributes of these 6 datasets are shown in Table 2; each dataset is divided into a training set and a test set, on which the experiments are carried out and the experimental results are evaluated.
Table 2 detailed information table of multi-tag dataset
LSFSR is compared with GRRO, MFS-MCDM, WFSNR, GLFS, MDFS and MCLS. GRRO is a general multi-tag learning framework based on global optimization and redundancy; MFS-MCDM is a multi-tag feature selection method using multi-criteria decision making, with its parameter set to 0.1; WFSNR is a weak-label feature selection algorithm based on a neighborhood rough set and ReliefF; GLFS is a group-preserving label-specific feature selection algorithm for multi-label learning; MDFS is a manifold-regularized discriminative feature selection method for multi-label learning, with its parameters set to 1, 1 and 0.1, respectively; and MCLS is a manifold-based constrained Laplacian score multi-label feature selection method.
The above methods and the first 100 ranked features obtained by the LSFSR method of this embodiment are fed into ML-KNN classification to compare classification results. To quantify the performance of these multi-tag feature selection methods, six multi-tag learning metrics are used: Micro-average-F1 (MI), Macro-average-F1 (MA), Ranking Loss (RL), Hamming Loss (HL), Coverage (CV) and Average Precision (AP). The closer Hamming Loss, Ranking Loss and Coverage are to 0, the better the effect; the closer Average Precision, Macro-average-F1 and Micro-average-F1 are to 1, the better the effect. In the experiments, an upward indicator after an evaluation criterion means that larger values are better, a downward indicator means that smaller values are better, and bold values in the tables mark the best results. The six evaluation indexes each emphasize different aspects, so few feature selection methods can outperform all others on every index.
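For reference, the six evaluation metrics mentioned here are all available in scikit-learn; the snippet below sketches how they might be computed from binary indicator predictions and real-valued score outputs. It is not part of the patent, and the 0/1 tag encoding (rather than the {1, −1} encoding used above) and the helper name are assumptions.

```python
from sklearn.metrics import (hamming_loss, label_ranking_loss, coverage_error,
                             label_ranking_average_precision_score, f1_score)

def multilabel_metrics(Y_true, Y_pred, Y_score):
    # Y_true, Y_pred: n x m binary indicator matrices; Y_score: n x m real-valued scores
    return {
        'HL': hamming_loss(Y_true, Y_pred),                            # lower is better
        'RL': label_ranking_loss(Y_true, Y_score),                     # lower is better
        'CV': coverage_error(Y_true, Y_score),                         # lower is better
        'AP': label_ranking_average_precision_score(Y_true, Y_score),  # higher is better
        'MI': f1_score(Y_true, Y_pred, average='micro'),               # higher is better
        'MA': f1_score(Y_true, Y_pred, average='macro'),               # higher is better
    }
```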
Table 3 Comparison results of LSFSR with other methods on the 6 datasets
Table 3 gives the experimental results of the seven algorithms on the Science, Health, Enron, Social, Reference and Arts datasets for text classification. On these datasets, the present algorithm achieves the best performance on the six metrics HL, RL, CV, AP, MI and MA. Based on the 36 comparison results in the table (6 datasets and 6 evaluation indexes), the present invention obtains the optimal value in 100% of cases. The analysis of the experimental results fully shows that the classification performance induced by the feature subset obtained by the feature selection method of the present invention is obviously superior to that of the other comparison algorithms.
Figs. 1a-1f, 2a-2f, 3a-3f, 4a-4f, 5a-5f and 6a-6f show the experimental classification results of the seven algorithms on the six multi-label datasets. For each graph, the X-axis (horizontal axis) represents the number of selected features and the Y-axis (vertical axis) represents the value of the evaluation index. From Figs. 1a-6f, LSFSR performs far better than all other algorithms on the six text datasets for the five metrics AP, CV, HL, RL and MA. For the Health and Reference datasets, the advantage of LSFSR on the MI metric is not obvious when the number of selected features is small, but as the number of features increases, the LSFSR algorithm outperforms all other algorithms. In general, it can be seen from Figs. 1a-6f that our method is superior to the other six advanced multi-label feature selection methods.
In this embodiment, a sparse feature selection method based on local tag correlation and feature redundancy is provided. Firstly, in order to address the sparsity of instance data, the L2,1 norm is adopted to constrain the weight matrix; functional relations among the tag matrix, the instance matrix, the weight matrix and the tag coefficient matrix are established, deepening the relation between tags and features. Secondly, on the basis of the embedded method, manifold constraints are introduced to explore the potential relations among local tags, and the Laplacian score is combined to select a feature subset with better performance. Thirdly, the redundancy between high-dimensional data features is constrained through the weight matrix, which helps to generate discriminative, low-redundancy feature subsets, and the L2 norm is used to select low-redundancy features while maintaining sparsity. Finally, a multi-label feature selection algorithm is designed to obtain the optimally ordered feature subset. Experimental results show that our method achieves better classification performance on multiple datasets than the other comparison methods. In future work, we will further study the deep links between features by studying the complex distributions of tags and features and building a sparse multi-tag feature selection model.
The above description is only a preferred embodiment of the present invention, and the patent protection scope of the present invention is defined by the claims, and all equivalent structural changes made by the specification and the drawings of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The sparse feature selection method based on local tag correlation and feature redundancy is characterized by comprising the following steps of:
1) Inputting a sample matrix, a tag matrix and the hyper-parameters of an objective function into the objective function for iterative updating, wherein the objective function is composed of a loss function constructed among a tag coefficient matrix, an instance matrix, a weight matrix and the tag matrix, a local tag correlation function and a feature redundancy function; the relation between tags and features is explored through the loss function, information between tags is obtained from the local tag correlation function to screen features with higher scores, a feature subset with lower feature redundancy is generated from the feature redundancy function, and the weight coefficient of each feature is solved;
2) Stopping the iterative updating process when a set stopping rule is reached, and outputting the feature subset according to the set ordering.
2. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1), the loss function is a regularization loss function based on the L2,1 norm.
3. The sparse feature selection method based on local tag correlation and feature redundancy of claim 2, wherein the regularization loss function based on the L2,1 norm is: ||UP − G||_F^2 + γ||P||_{2,1}, wherein U is the sample matrix, P is the weight matrix, G is the tag matrix, ||UP − G||_F denotes the Frobenius norm, γ is a hyper-parameter, and ||P||_{2,1} denotes the L2,1 norm of the matrix P.
4. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1) the local tag correlation function is established based on combining manifold constraints with laplace scores.
5. The sparse feature selection method based on local tag correlation and feature redundancy of claim 4, wherein the local tag correlation function is: min (1/2) Σ_{i,j=1}^{n} ||q(u_i) − q(u_j)||_2^2 (H_R)_{ij} = tr(Q^T L Q), wherein S is the tag penalty term, q(u_i) ∈ Q is a discriminant function, ||·||_2 denotes the L2 norm, tr(·) denotes the trace of a matrix, Q^T denotes the transpose of the matrix Q, L = H − H_R is the graph Laplacian matrix, H is a diagonal matrix, and λ is a hyper-parameter.
6. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1), the feature redundancy function is:
wherein P is a weight matrix; w (W) er Representing a modified cosine similarity between the e-th feature and the r-th feature, n being the total number of features.
7. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 1), the objective function is:
wherein α, β and λ denote different hyper-parameters; U is the sample matrix; P is the weight matrix; G is the tag matrix; V is the tag coefficient matrix; L_p is the symmetric graph Laplacian matrix; S is the tag penalty term; W_ij is the modified cosine similarity between the i-th feature and the j-th feature; ||·||_F is the Frobenius norm; and ||·||_{2,1} is the L2,1 norm.
8. The sparse feature selection method based on local tag correlation and feature redundancy of claim 7, wherein the values of the three matrices P, S and V in the objective function are updated iteratively, and the iteration stops when a set iteration condition is met or the number of iterations reaches the maximum number of iterations.
9. The sparse feature selection method based on local tag correlation and feature redundancy of claim 8, wherein the satisfying a set iteration condition is when a difference between two consecutive function values is less than a preset value.
10. The sparse feature selection method based on local tag correlation and feature redundancy of claim 1, wherein in step 2), outputting the feature subset according to the set ordering means: sorting the features from the largest to the smallest weight coefficient, and outputting the feature subset arranged in that order.
CN202310743486.7A 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy Pending CN116910502A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310743486.7A CN116910502A (en) 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310743486.7A CN116910502A (en) 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy

Publications (1)

Publication Number Publication Date
CN116910502A true CN116910502A (en) 2023-10-20

Family

ID=88359195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310743486.7A Pending CN116910502A (en) 2023-06-21 2023-06-21 Sparse feature selection method based on local tag correlation and feature redundancy

Country Status (1)

Country Link
CN (1) CN116910502A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454154A (en) * 2023-12-22 2024-01-26 江西农业大学 Robust feature selection method for bias marker data


Similar Documents

Publication Publication Date Title
Ghasemian et al. Evaluating overfit and underfit in models of network community structure
Wang et al. Block diagonal representation learning for robust subspace clustering
Carbonera et al. A density-based approach for instance selection
Tian et al. Global linear neighborhoods for efficient label propagation
Ji et al. An EnKF-based scheme to optimize hyper-parameters and features for SVM classifier
Wang et al. Minimum error entropy based sparse representation for robust subspace clustering
CN111582506A (en) Multi-label learning method based on global and local label relation
CN116910502A (en) Sparse feature selection method based on local tag correlation and feature redundancy
Wang et al. Entropy regularization for unsupervised clustering with adaptive neighbors
Chen et al. Concept factorization with local centroids
CN111027636B (en) Unsupervised feature selection method and system based on multi-label learning
Chen et al. Feature weighted non-negative matrix factorization
Obozinski et al. Joint covariate selection for grouped classification
Leon-Alcaide et al. An evolutionary approach for efficient prototyping of large time series datasets
Wang et al. Laplacian regularized low-rank representation for cancer samples clustering
CN113033626B (en) Image classification method based on multi-task collaborative learning
CN110738245A (en) automatic clustering algorithm selection system and method for scientific data analysis
Shang et al. Unsupervised feature selection via discrete spectral clustering and feature weights
Fu et al. Auto-weighted low-rank representation for clustering
Gao et al. Possibilistic neighborhood graph: A new concept of similarity graph learning
Dornaika et al. Simultaneous label inference and discriminant projection estimation through adaptive self-taught graphs
Li et al. Multi-label feature selection with high-sparse personalized and low-redundancy shared common features
Zhang et al. Unsupervised learning of Dirichlet process mixture models with missing data.
Chang et al. Calibrated multi-task subspace learning via binary group structure constraint
CN113688229B (en) Text recommendation method, system, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination