CN107203489A - A feature selection method based on optimal reconstruction - Google Patents

A feature selection method based on optimal reconstruction

Info

Publication number
CN107203489A
Authority
CN
China
Prior art keywords
matrix
optimal
data
representing matrix
mentioned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710188156.0A
Other languages
Chinese (zh)
Inventor
张晓宇
王树鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201710188156.0A
Publication of CN107203489A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The present invention provides a feature selection method based on optimal reconstruction, comprising the steps of: 1) representing the data in a data set, each item of which has d-dimensional original features, as a data matrix X, where d > 1; 2) establishing an optimal linear reconstruction model for the data matrix X, the optimization target of the model being a representation matrix B; 3) transposing the data matrix X to obtain a feature matrix F, and initializing the representation matrix B to zero; 4) solving the optimal linear reconstruction model, after the processing of step 3), by iterative alternating optimization to obtain an optimal representation matrix $B^*$; 5) selecting, according to the optimal representation matrix $B^*$, an optimal k-dimensional feature subset capable of representing all d-dimensional features, where k < d. The method requires no data labeling, so the full data set can be used to optimize the model, which guarantees the amount of data; at the same time, feature selection based on the full data can comprehensively reflect the true two-way "data-feature" distribution.

Description

A feature selection method based on optimal reconstruction
Technical Field
The present invention relates to the technical field of data mining, and more particularly to a feature selection method based on optimal reconstruction.
Background
In machine learning, high-dimensional data is often difficult to process efficiently, both mathematically and computationally. Feature learning seeks a low-dimensional representation of the original high-dimensional data. There are two kinds of feature learning: feature extraction and feature selection. Feature extraction maps the original high-dimensional feature space to a low-dimensional subspace; even after the optimal mapping function has been found, all features must take part in the feature transformation to obtain the low-dimensional representation. Feature selection instead directly chooses a low-dimensional subset of the original high-dimensional features and discards the rest. It involves no feature computation: once the optimal feature positions have been found, the low-dimensional features can be read directly from those positions without using all features. Consequently, in practical applications feature selection executes more efficiently and is more commonly used.
Current feature selection methods design their objective function from the discriminative performance of the features and solve, under a sparsity constraint, for an optimal linear mapping model from the data input to the discriminative output. Such methods depend heavily on the quality and scale of the labeled data. In practice, data labeling is time-consuming, laborious and costly, so the quantity of labeled data is hard to guarantee. Moreover, labeled data accounts for only a small proportion of the full data, and the distribution followed by the labeled data cannot fully reflect the distribution of the full data. These factors limit the effectiveness of feature selection.
Summary of the Invention
The object of the present invention is to provide a feature selection method based on optimal reconstruction. The method requires no data labeling, so the full data set can be used to optimize the model, which guarantees the amount of data. At the same time, feature selection based on the full data can comprehensively reflect the true two-way "data-feature" distribution, so the selected features have optimal representation performance.
To achieve this object, the technical solution adopted by the invention is as follows:
A feature selection method based on optimal reconstruction, comprising the steps of:
1) Represent the data in a data set, each item of which has d-dimensional original features, as a data matrix X, where d > 1;
2) Establish an optimal linear reconstruction model for the data matrix X, the optimization target of the model being a representation matrix B;
3) Transpose the data matrix X to obtain a feature matrix F, and initialize the representation matrix B to zero;
4) Solve the optimal linear reconstruction model, after the processing of step 3), by iterative alternating optimization to obtain the optimal representation matrix $B^*$;
5) Select, according to the optimal representation matrix $B^*$, an optimal k-dimensional feature subset capable of representing all d-dimensional features, where k < d.
Further, the matrix form of the optimization model for the representation matrix B in step 2) is:
$$\min_{B} \; \|(FB-F)^T\|_{2,1} + \beta \|B\|_{2,1}$$
where $\|(FB-F)^T\|_{2,1}$ denotes the reconstruction error, $b_i$ ($1 \le i \le d$) is the i-th column vector of the representation matrix B, and $f_i$ ($1 \le i \le d$) is the i-th column vector of the feature matrix F; $\|B\|_{2,1}$ is the regularization term that ensures row sparsity, and $\beta$ is the parameter that balances the reconstruction error against row sparsity.
Further, the iterative alternating optimization in step 4) means repeating the following steps until the algorithm converges, which yields the optimal representation matrix $B^*$:
4-1) Compute the diagonal matrix $P \in \mathbb{R}^{d \times d}$, whose i-th diagonal element is $p_{ii} = 1/(2\|Fb_i - f_i\|_2)$, where $b_i$ ($1 \le i \le d$) is the i-th column vector of the representation matrix B and $f_i$ ($1 \le i \le d$) is the i-th column vector of the feature matrix F;
4-2) Compute the diagonal matrix $Q \in \mathbb{R}^{d \times d}$, whose i-th diagonal element is $q_{ii} = 1/(2\|b^i\|_2)$, where $b^i$ ($1 \le i \le d$) is the i-th row vector of the representation matrix B;
4-3) Update the representation matrix B, whose i-th column vector is $b_i = p_{ii}(p_{ii}F^TF + \beta Q)^{-1}F^Tf_i$, where $\beta$ is the parameter that balances the reconstruction error against row sparsity;
4-4) When the algorithm convergence condition C is met, end the iteration and return the optimal representation matrix $B^*$.
Further, in step 4-3) the analytic solution of the representation matrix B is obtained by solving column by column.
Further, methods for judging in step 4-4) whether the convergence condition C is met include, but are not limited to: computing the difference between the representation matrices B obtained in two consecutive iterations, denoted $\Delta B$; the algorithm has converged when $\|\Delta B\|_F < t$, where $\|\cdot\|_F$ is the Frobenius norm and t is an empirically set threshold.
Further, step 5) specifically comprises:
5-1) Take the absolute value of every element of the optimal representation matrix $B^*$ to obtain abs($B^*$);
5-2) Sum abs($B^*$) by row to obtain the indicator vector idx;
5-3) Choose the k largest elements of the indicator vector idx; the set of features corresponding to these k elements is the optimal k-dimensional feature subset capable of representing all d-dimensional features.
The beneficial effects of the invention are as follows. The invention provides a feature selection method based on optimal reconstruction. The method designs its objective function from the representation performance of the features and solves for the optimal linear reconstruction model of feature reconstruction under a sparsity constraint; that is, it finds the optimal feature subset that can linearly reconstruct the original features. The method does not depend on specific data labels and makes effective use of the full data. It formulates feature selection as an optimal linear reconstruction problem, learns a sparse representation matrix automatically, and solves the model efficiently by iterative alternating optimization in which every optimization step has an analytic solution, which improves solution efficiency and reduces computational complexity. By modeling the overall distribution of the full data, it obtains an optimal feature subset that comprehensively represents the two-way "data-feature" distribution, so that the low-dimensional feature subset retains optimal representation performance while the dimensionality of the high-dimensional original features is reduced.
Brief Description of the Drawings
Fig. 1 is a flow chart of the feature selection method based on optimal reconstruction provided by the invention.
Fig. 2 is a detailed flow chart of the feature selection method based on optimal reconstruction according to an embodiment of the invention.
Detailed Description
To make the above features and advantages of the invention more apparent, embodiments are described in detail below with reference to the accompanying drawings.
In this application the notation is as follows: matrices are written as bold uppercase letters, vectors as bold lowercase letters, and scalars as regular letters. Given a matrix $M = [m_{ij}]$, its i-th row vector and j-th column vector are denoted $m^i$ and $m_j$, respectively. The $L_p$-norm of a vector $v \in \mathbb{R}^n$ is defined as:
$$\|v\|_p = \Big(\sum_{i=1}^{n} |v_i|^p\Big)^{1/p} \quad (1)$$
The $L_{2,1}$-norm of a matrix $M \in \mathbb{R}^{n \times m}$ is defined as:
$$\|M\|_{2,1} = \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{m} m_{ij}^2} = \sum_{i=1}^{n} \|m^i\|_2 \quad (2)$$
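As a quick numerical illustration of the $L_{2,1}$-norm defined above, the following NumPy sketch sums the Euclidean norms of the rows of a matrix. The function name `l21_norm` and the sample matrix are illustrative choices, not part of the patent.

```python
import numpy as np

def l21_norm(M: np.ndarray) -> float:
    """L2,1-norm: sum of the Euclidean (L2) norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

# Rows have L2 norms 5, 0 and 13, so the L2,1-norm is their sum.
M = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [5.0, 12.0]])
value = l21_norm(M)
print(value)
```

An all-zero row contributes nothing to the norm, which is why penalizing it tends to favor matrices in which many rows vanish entirely.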
The invention provides a feature selection method based on optimal reconstruction which, as shown in Fig. 1, comprises the following steps:
1) Represent the data in a data set, each item of which has d-dimensional original features, as a data matrix X, where d > 1.
Suppose the data set contains n data samples, each with d-dimensional original features, i.e. each sample $x_i$ ($1 \le i \le n$) can be represented as a d-dimensional feature vector. The data matrix can then be written as $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$. To reduce the feature dimensionality, k (k < d) of the original d features must be chosen.
2) Establish an optimal linear reconstruction model for the data matrix X, the optimization target of the model being the representation matrix B, and initialize the data matrix X and the representation matrix B.
The method models the feature selection problem as an optimal linear reconstruction model, which selects the optimal k-dimensional feature subset capable of representing all d-dimensional features. The data matrix X is transposed to obtain the feature matrix $F = X^T = [f_1, f_2, \ldots, f_d] \in \mathbb{R}^{n \times d}$. Each column vector $f_i$ ($1 \le i \le d$) of the feature matrix corresponds to a point in the n-dimensional space whose basis vectors are the n data samples. The representation matrix B is initialized to zero. The optimization target of the method is to find the representation matrix $B = [b_1, b_2, \ldots, b_d] \in \mathbb{R}^{d \times d}$ such that:
$$\min_{B} \; \sum_{i=1}^{d} \|Fb_i - f_i\|_2 + \beta \|B\|_{2,1} \quad (3)$$
In formula (3), $\sum_{i=1}^{d} \|Fb_i - f_i\|_2$ denotes the reconstruction error, where $b_i$ ($1 \le i \le d$) is the i-th column vector of the representation matrix B and $f_i$ ($1 \le i \le d$) is the i-th column vector of the feature matrix F; $\|B\|_{2,1}$ is the regularization term that ensures row sparsity (row sparsity means that most row vectors of B are zero vectors, which achieves the purpose of feature selection); and $\beta$ is the parameter that balances the reconstruction error against row sparsity.
The matrix form of formula (3) is:
$$\min_{B} \; \|(FB-F)^T\|_{2,1} + \beta \|B\|_{2,1} \quad (4)$$
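The equivalence of the summation form (3) and the matrix form (4) can be checked numerically. The sketch below is only an illustration under assumed names (`objective_sum`, `objective_matrix`) and random test data; it evaluates both expressions and also shows that the identity matrix reconstructs F perfectly, at the cost of a penalty of $\beta d$.

```python
import numpy as np

def objective_sum(F, B, beta):
    """Formula (3): per-column reconstruction errors plus the row-sparsity penalty."""
    d = F.shape[1]
    recon = sum(np.linalg.norm(F @ B[:, i] - F[:, i]) for i in range(d))
    penalty = np.sum(np.linalg.norm(B, axis=1))   # sum of row norms of B
    return recon + beta * penalty

def objective_matrix(F, B, beta):
    """Formula (4): the same objective written with L2,1-norms."""
    def l21(M):
        return np.sum(np.linalg.norm(M, axis=1))
    return l21((F @ B - F).T) + beta * l21(B)

rng = np.random.default_rng(0)
F = rng.standard_normal((6, 4))   # n = 6 samples, d = 4 features
B = rng.standard_normal((4, 4))
beta = 0.5

print(objective_sum(F, B, beta), objective_matrix(F, B, beta))
# B = I has zero reconstruction error but keeps every feature (penalty beta * d).
print(objective_matrix(F, np.eye(4), beta))
```

The identity matrix is therefore never row-sparse; the regularization term is what pushes the solution toward using only a few rows of B, i.e. a few features.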
3) Solve the optimal linear reconstruction model by iterative alternating optimization to obtain the optimal representation matrix $B^*$. By introducing auxiliary vectors p and q, the method obtains the augmented cost function:
$$\operatorname{Tr}\big((FB-F)\,P\,(FB-F)^T\big) + \beta \operatorname{Tr}\big(B^TQB\big) \quad (5)$$
In formula (5), Tr(·) is the trace of a matrix, i.e. the sum of the elements on its main diagonal; $(FB-F)P(FB-F)^T$ and $B^TQB$ are matrix products; and $\beta$ is the parameter that balances the reconstruction error against row sparsity. P and Q are the diagonal matrices
$$P = \operatorname{diag}(p) \in \mathbb{R}^{d \times d} \quad (6)$$
$$Q = \operatorname{diag}(q) \in \mathbb{R}^{d \times d} \quad (7)$$
whose diagonal elements are, respectively:
$$p_{ii} = \frac{1}{2\|Fb_i - f_i\|_2} \quad (8)$$
$$q_{ii} = \frac{1}{2\|b^i\|_2} \quad (9)$$
In formula (8), $b_i$ ($1 \le i \le d$) is the i-th column vector of the representation matrix B and $f_i$ ($1 \le i \le d$) is the i-th column vector of the feature matrix F; in formula (9), $b^i$ ($1 \le i \le d$) is the i-th row vector of the representation matrix B.
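A minimal sketch of how the diagonal elements in formulas (8) and (9) might be computed is given below. One detail is an assumption added here and not stated in the patent: because B is initialized to zero, the literal formula for $q_{ii}$ would divide by zero, so the sketch adds a small smoothing constant `eps` under the square root. The function name `reweighting_diagonals` is likewise illustrative.

```python
import numpy as np

def reweighting_diagonals(F, B, eps=1e-6):
    """Return the diagonals of P and Q as 1-D arrays.

    p[i] = 1 / (2 * ||F b_i - f_i||_2)   (column residual norms)
    q[i] = 1 / (2 * ||b^i||_2)           (row norms of B)
    eps is an assumed smoothing term that avoids division by zero.
    """
    R = F @ B - F
    p = 1.0 / (2.0 * np.sqrt(np.sum(R ** 2, axis=0) + eps))
    q = 1.0 / (2.0 * np.sqrt(np.sum(B ** 2, axis=1) + eps))
    return p, q

F = np.array([[3.0, 0.0],
              [4.0, 2.0]])        # columns have norms 5 and 2
B = np.zeros((2, 2))              # zero initialization, as in the method
p, q = reweighting_diagonals(F, B)
print(p)   # close to 1/(2*5) and 1/(2*2)
print(q)   # large but finite, thanks to eps
```

With this smoothing, rows of B that are (near) zero receive a large weight in Q and are therefore strongly discouraged from becoming non-zero in the next update.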
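With the vectors p and q fixed, the representation matrix B can be computed by:
$$B = \arg\min_{B} \; \operatorname{Tr}\big((FB-F)\,P\,(FB-F)^T\big) + \beta \operatorname{Tr}\big(B^TQB\big) \quad (10)$$
Differentiating with respect to B and setting the derivative to zero gives:
$$F^TFBP - F^TFP + \beta QB = 0 \quad (11)$$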
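The stationarity condition can be derived in a few lines. In the sketch below, $J(B)$ is merely a label introduced here for the augmented cost with p and q held fixed; the derivation uses only the standard matrix-calculus identities for traces and the fact that P and Q are diagonal, hence symmetric.

```latex
\begin{aligned}
J(B) &= \operatorname{Tr}\big((FB-F)\,P\,(FB-F)^T\big) + \beta\,\operatorname{Tr}\big(B^TQB\big) \\
\frac{\partial}{\partial B}\operatorname{Tr}\big((FB-F)\,P\,(FB-F)^T\big) &= 2\,F^T(FB-F)\,P \\
\frac{\partial}{\partial B}\,\beta\,\operatorname{Tr}\big(B^TQB\big) &= 2\,\beta\,QB \\
\frac{\partial J}{\partial B} = 0 \;&\Longleftrightarrow\; F^TFBP - F^TFP + \beta\,QB = 0
\end{aligned}
```

Because B is multiplied by P on the right and by Q on the left, B cannot be isolated by a single matrix inversion, which motivates treating the columns one at a time.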
The analytic solution of the representation matrix B cannot be obtained directly from formula (11). The invention therefore solves column by column: for the i-th column ($1 \le i \le d$), formula (11) can be rewritten in the following column-vector form:
$$p_{ii}F^TFb_i + \beta Qb_i - p_{ii}F^Tf_i = 0 \quad (12)$$
According to formula (12), the analytic solution of the i-th column of the representation matrix B is:
$$b_i = p_{ii}\,(p_{ii}F^TF + \beta Q)^{-1}F^Tf_i \quad (13)$$
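The column-wise update of formula (13) can be sketched as follows. This is an illustrative implementation, not the patent's own code: it solves the linear system of formula (12) with `numpy.linalg.solve` instead of forming the inverse explicitly, and the weights `p`, `q` and the value of `beta` are hypothetical test values.

```python
import numpy as np

def update_columns(F, p, q, beta):
    """Column-wise closed-form update of B, following formula (13)."""
    d = F.shape[1]
    G = F.T @ F                    # F^T F, shared by every column
    Q = np.diag(q)
    B = np.empty((d, d))
    for i in range(d):
        A = p[i] * G + beta * Q    # p_ii F^T F + beta Q
        rhs = p[i] * G[:, i]       # p_ii F^T f_i (F^T f_i is column i of F^T F)
        B[:, i] = np.linalg.solve(A, rhs)
    return B

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 3))    # n = 8 samples, d = 3 features
p = np.array([0.5, 1.0, 2.0])      # hypothetical positive weights
q = np.array([1.0, 2.0, 3.0])
beta = 0.3
B = update_columns(F, p, q, beta)

# Largest violation of formula (12) over all columns; should be near zero.
G = F.T @ F
residual_max = max(
    np.abs(p[i] * G @ B[:, i] + beta * np.diag(q) @ B[:, i] - p[i] * G[:, i]).max()
    for i in range(3)
)
print(residual_max)
```

Solving the system rather than inverting the matrix is a numerical design choice made here; for positive weights and $\beta > 0$ the system matrix is positive definite, so each column has a unique solution.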
In summary, the optimal representation matrix $B^*$ is obtained by alternately optimizing formulas (6), (7) and (13) in an iterative loop until the algorithm convergence condition C is met (that is, by iterative alternating optimization). Methods for judging whether the convergence condition C is met include, but are not limited to: computing the difference between the representation matrices B obtained in two consecutive iterations, denoted $\Delta B$; the algorithm has converged when $\|\Delta B\|_F < t$, where t is an empirically set threshold and the subscript F stands for Frobenius. In other words, the algorithm converges once the Frobenius norm $\|\Delta B\|_F$ of the difference between the representation matrices of two consecutive iterations falls below the threshold t.
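Putting the pieces together, the alternating loop described above might look like the following sketch. Several details are assumptions made for illustration rather than values given in the patent: the function name `optimal_reconstruction`, the defaults for `beta`, `t` and `max_iter`, the iteration cap itself, and the smoothing constant `eps`, which keeps the weights finite when B starts at zero.

```python
import numpy as np

def optimal_reconstruction(X, beta=1.0, t=1e-6, max_iter=100, eps=1e-6):
    """Iterative alternating optimization of the representation matrix B.

    X is the d x n data matrix (one sample per column).
    """
    F = X.T                                   # n x d feature matrix
    d = F.shape[1]
    G = F.T @ F
    B = np.zeros((d, d))                      # zero initialization
    for _ in range(max_iter):
        # Diagonals of P and Q (smoothed by eps, an added assumption).
        R = F @ B - F
        p = 1.0 / (2.0 * np.sqrt(np.sum(R ** 2, axis=0) + eps))
        q = 1.0 / (2.0 * np.sqrt(np.sum(B ** 2, axis=1) + eps))
        Q = np.diag(q)
        # Column-wise update of B.
        B_new = np.empty_like(B)
        for i in range(d):
            B_new[:, i] = np.linalg.solve(p[i] * G + beta * Q, p[i] * G[:, i])
        # Convergence condition C: Frobenius norm of the change below t.
        delta = np.linalg.norm(B_new - B, ord="fro")
        B = B_new
        if delta < t:
            break
    return B

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 30))              # d = 5 features, n = 30 samples
B_star = optimal_reconstruction(X, beta=0.1)
print(np.round(np.linalg.norm(B_star, axis=1), 3))   # row norms of B*
```

The iteration cap guards against the threshold never being reached; if the loop stops because of the cap, the returned matrix is simply the last iterate rather than a converged solution.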
4) Select, according to the optimal representation matrix $B^*$, the optimal k-dimensional feature subset capable of representing all d-dimensional features, where k < d.
The detailed process of selecting the k-dimensional features according to the optimal representation matrix $B^*$ is:
4-1) Take the absolute value of every element of the optimal representation matrix $B^*$ to obtain abs($B^*$).
4-2) Sum abs($B^*$) by row to obtain the indicator vector idx.
4-3) Choose the k largest elements of the indicator vector idx. The features corresponding to these k elements are the selected k-dimensional features based on optimal reconstruction; the set they form is the chosen optimal k-dimensional feature subset capable of representing all d-dimensional features.
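The selection steps 4-1) to 4-3) can be sketched in a few lines. The function name `select_features` and the small example matrix are illustrative; the example matrix is hand-made rather than the output of an actual optimization.

```python
import numpy as np

def select_features(B_star, k):
    """Return the indices of the k features with the largest row sums of |B*|."""
    idx = np.abs(B_star).sum(axis=1)          # indicator vector: one score per feature
    order = np.argsort(-idx, kind="stable")   # feature indices, highest score first
    return order[:k]

B_star = np.array([[0.9,  0.1, 0.0],
                   [0.0,  0.0, 0.0],          # all-zero row: feature 1 is never used
                   [0.2, -0.7, 0.5]])
selected = select_features(B_star, k=2)
print(selected)   # feature 2 scores highest, then feature 0
```

Row i of B scales feature i in every reconstruction, so a row with a large absolute sum marks a feature that contributes heavily to reconstructing the others.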
The invention is illustrated below by means of a specific embodiment.
A detailed flow chart of the feature selection method based on optimal reconstruction according to this embodiment is shown in Fig. 2. The method comprises the following steps:
1) Input the known data matrix $X \in \mathbb{R}^{d \times n}$ and the number k of features to select.
2) Initialize the data matrix X and the representation matrix B (the optimization target of the optimal linear reconstruction model): transpose the data matrix X to obtain the feature matrix $F = X^T \in \mathbb{R}^{n \times d}$, and set the representation matrix to zero, $B = 0 \in \mathbb{R}^{d \times d}$.
3) Optimize the representation matrix B to obtain the optimal representation matrix $B^*$, i.e. repeat steps 3-1) to 3-4) until the algorithm converges.
3-1) Compute the diagonal matrix $P \in \mathbb{R}^{d \times d}$, whose i-th diagonal element is $p_{ii} = 1/(2\|Fb_i - f_i\|_2)$, where $b_i$ ($1 \le i \le d$) is the i-th column vector of the representation matrix B and $f_i$ ($1 \le i \le d$) is the i-th column vector of the feature matrix F;
3-2) Compute the diagonal matrix $Q \in \mathbb{R}^{d \times d}$, whose i-th diagonal element is $q_{ii} = 1/(2\|b^i\|_2)$, where $b^i$ ($1 \le i \le d$) is the i-th row vector of the representation matrix B;
3-3) Update the representation matrix B, whose i-th column vector is $b_i = p_{ii}(p_{ii}F^TF + \beta Q)^{-1}F^Tf_i$, where $\beta$ is the parameter that balances the reconstruction error against row sparsity;
3-4) Evaluate the algorithm convergence condition C: $\|\Delta B\|_F < t$. When C is met, the iteration ends and the optimal representation matrix $B^*$ is returned.
4) Perform feature selection according to the optimal representation matrix $B^*$, which specifically comprises the following steps:
4-1) Take the absolute value of every element of the optimal representation matrix $B^*$ to obtain abs($B^*$).
4-2) Sum abs($B^*$) by row to obtain the indicator vector idx.
4-3) Sort the elements of the indicator vector idx in descending order; the features corresponding to the first k elements are the selected k-dimensional features based on optimal reconstruction, that is, the chosen optimal k-dimensional feature subset capable of representing all d-dimensional features.
5) Output the dimension-reduced data matrix (the selected optimal k-dimensional feature subset).
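Once the feature indices are known, producing the dimension-reduced data matrix involves no further computation, only indexing. The sketch below assumes the layout used in this description (features as rows of X, samples as columns); the function name `reduce_data` and the example indices are hypothetical.

```python
import numpy as np

def reduce_data(X, selected):
    """Keep only the selected feature rows of the d x n data matrix X."""
    return X[np.asarray(selected), :]

X = np.arange(12.0).reshape(4, 3)     # d = 4 features (rows), n = 3 samples (columns)
selected = [2, 0]                     # hypothetical output of the selection step
X_reduced = reduce_data(X, selected)
print(X_reduced.shape)                # k x n
print(X_reduced)
```

The same index list can be applied to any new sample with the same d features, which is the efficiency advantage of feature selection over feature extraction noted in the background section.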
The above embodiments merely illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or substitute equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention is defined by the claims.

Claims (6)

1. A feature selection method based on optimal reconstruction, comprising the steps of:
1) representing the data in a data set, each item of which has d-dimensional original features, as a data matrix X, where d > 1;
2) establishing an optimal linear reconstruction model for the data matrix X, the optimization target of the model being a representation matrix B;
3) transposing the data matrix X to obtain a feature matrix F, and initializing the representation matrix B to zero;
4) solving the optimal linear reconstruction model, after the processing of step 3), by iterative alternating optimization to obtain an optimal representation matrix $B^*$;
5) selecting, according to the optimal representation matrix $B^*$, an optimal k-dimensional feature subset capable of representing all d-dimensional features, where k < d.
2. The method of claim 1, characterized in that the matrix form of the optimization model for the representation matrix B in step 2) is:
$$\min_{B} \; \|(FB-F)^T\|_{2,1} + \beta \|B\|_{2,1}$$
where $\|(FB-F)^T\|_{2,1}$ denotes the reconstruction error, $b_i$ ($1 \le i \le d$) is the i-th column vector of the representation matrix B, and $f_i$ ($1 \le i \le d$) is the i-th column vector of the feature matrix F; $\|B\|_{2,1}$ is the regularization term that ensures row sparsity, and $\beta$ is the parameter that balances the reconstruction error against row sparsity.
3. The method of claim 1, characterized in that the iterative alternating optimization in step 4) means repeating the following steps until the algorithm converges, which yields the optimal representation matrix $B^*$:
4-1) computing the diagonal matrix $P \in \mathbb{R}^{d \times d}$, whose i-th diagonal element is $p_{ii} = 1/(2\|Fb_i - f_i\|_2)$, where $b_i$ ($1 \le i \le d$) is the i-th column vector of the representation matrix B and $f_i$ ($1 \le i \le d$) is the i-th column vector of the feature matrix F;
4-2) computing the diagonal matrix $Q \in \mathbb{R}^{d \times d}$, whose i-th diagonal element is $q_{ii} = 1/(2\|b^i\|_2)$, where $b^i$ ($1 \le i \le d$) is the i-th row vector of the representation matrix B;
4-3) updating the representation matrix B, whose i-th column vector is $b_i = p_{ii}(p_{ii}F^TF + \beta Q)^{-1}F^Tf_i$, where $\beta$ is the parameter that balances the reconstruction error against row sparsity;
4-4) when the algorithm convergence condition C is met, ending the iteration and returning the optimal representation matrix $B^*$.
4. The method of claim 3, characterized in that in step 4-3) the analytic solution of the representation matrix B is obtained by solving column by column.
5. The method of claim 3, characterized in that methods for judging in step 4-4) whether the convergence condition C is met include, but are not limited to: computing the difference between the representation matrices B obtained in two consecutive iterations, denoted $\Delta B$; the algorithm has converged when $\|\Delta B\|_F < t$, where $\|\cdot\|_F$ is the Frobenius norm and t is an empirically set threshold.
6. The method of claim 1, characterized in that step 5) specifically comprises:
5-1) taking the absolute value of every element of the optimal representation matrix $B^*$ to obtain abs($B^*$);
5-2) summing abs($B^*$) by row to obtain the indicator vector idx;
5-3) choosing the k largest elements of the indicator vector idx, the set of features corresponding to these k elements being the optimal k-dimensional feature subset capable of representing all d-dimensional features.
CN201710188156.0A 2017-03-27 2017-03-27 A feature selection method based on optimal reconstruction Pending CN107203489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710188156.0A CN107203489A (en) 2017-03-27 A feature selection method based on optimal reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710188156.0A CN107203489A (en) 2017-03-27 A feature selection method based on optimal reconstruction

Publications (1)

Publication Number Publication Date
CN107203489A 2017-09-26

Family

ID=59904943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710188156.0A Pending CN107203489A (en) 2017-03-27 2017-03-27 A feature selection method based on optimal reconstruction

Country Status (1)

Country Link
CN (1) CN107203489A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363839A (en) * 2018-01-23 2018-08-03 佛山市顺德区中山大学研究院 The optimal pin of extensive BGA package based on priori is distributed generation method
CN110029544A (en) * 2019-06-03 2019-07-19 西南交通大学 A kind of measurement method and device of track irregularity
CN111783816A (en) * 2020-02-27 2020-10-16 北京沃东天骏信息技术有限公司 Feature selection method and device, multimedia and network data dimension reduction method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170926