CN108446735A - A feature selection method based on differential-evolution-optimized neighborhood component analysis - Google Patents
A feature selection method based on differential-evolution-optimized neighborhood component analysis
- Publication number
- CN108446735A (application CN201810233510.1A)
- Authority
- CN
- China
- Prior art keywords
- feature
- vector
- population
- formula
- nca
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
Abstract
The present invention discloses a feature selection method based on differential-evolution-optimized neighborhood component analysis (NCA), which aims to solve the problem of how to improve the NCA algorithm from the perspective of its optimization procedure so as to obtain optimal feature weight coefficients. The method of the present invention uses a differential evolution algorithm to optimize the objective function of the NCA algorithm and thereby obtains globally optimal feature weight coefficients. Compared with the traditional NCA method, optimizing the NCA objective function with differential evolution ensures that the final weight coefficient vector is a globally optimal result rather than a local optimum. In addition, the method of the present invention differs from traditional NCA in that it never considers an objective function containing a regularization parameter, so there is no need to determine the size of such a parameter. The invented method can therefore be regarded as an improvement strategy applied to the traditional NCA method for classification-oriented feature selection.
Description
Technical field
The present invention relates to a feature selection method, and in particular to a feature selection method based on differential-evolution-optimized neighborhood component analysis.
Background art
In recent years, data mining methods have found wide application across all industries, and both theoretical and applied research on them has received extensive attention. In industrial informatization, the financial field, and the internet industry, considerable manpower and material resources have been invested in research on data mining and machine learning. Feature selection occupies an important position in data mining and machine learning. Although it is not itself a specific data mining or machine learning algorithm, feature selection can significantly improve the performance of subsequent data mining algorithms, and its positive effect is especially obvious when modeling high-dimensional data. For the classification models commonly used in pattern recognition, the model input is typically high-dimensional sample data, while the model output is the class label of each sample. Under the premise of applying the same classification algorithm, whether or not feature selection is performed on the input data makes a marked difference in classification accuracy, because building the classification model after feature selection removes the negative influence of many interfering features and thereby improves the precision of the model.
Many researchers have proposed solutions to the feature selection problem for different objects and different settings. Among them, neighborhood component analysis (Neighborhood Component Analysis, NCA) is a relatively novel feature selection algorithm that can be used specifically for feature selection before classification modeling. NCA optimizes the leave-one-out classification accuracy of the 1-nearest-neighbor method and thereby obtains a weight coefficient for each input feature; features whose weight coefficients are close to 0 are useless and can be removed. However, the optimization procedure that the traditional NCA method uses to solve for the feature weight coefficients easily falls into local optima, and the weight coefficients are also prone to overfitting. Although the degree of overfitting can be adjusted by introducing a regularization parameter, at present that parameter can only be chosen by means of cross-validation. The traditional NCA algorithm therefore needs further study and improvement.
Summary of the invention
The main technical problem to be solved by the present invention is how to improve the NCA algorithm from the perspective of its optimization procedure so as to obtain optimal feature weight coefficients. Specifically, the method of the present invention uses a differential evolution algorithm to optimize the objective function of the NCA algorithm, thereby obtaining globally optimal feature weight coefficients.
The technical solution adopted by the present invention to solve the above technical problem is a feature selection method based on differential-evolution-optimized neighborhood component analysis, comprising the following steps:
(1) Collect the sample data sets X_1, X_2, ..., X_C corresponding to the C application classes y_1, y_2, ..., y_C, where C denotes the total number of classes and the c-th data set X_c ∈ R^{N_c×m} contains N_c samples of m features, c = 1, 2, ..., C.
(2) Stack the data sets X_1, X_2, ..., X_C into a matrix X ∈ R^{N×m} and standardize X column by column to obtain X = [x_1, x_2, ..., x_N]^T ∈ R^{N×m}, so as to eliminate the influence of each feature's dimension, where N = N_1 + N_2 + ... + N_C and x_i ∈ R^{m×1} denotes the i-th sample.
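The column-wise standardization of step (2) can be sketched as follows. This is a minimal z-score example; the patent does not name the exact standardization scheme, so z-scoring is assumed, and the toy matrix is illustrative only:

```python
import numpy as np

# Toy data standing in for X: N = 4 samples, m = 3 features with very
# different scales (values are illustrative, not from the patent).
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0],
              [4.0, 40.0, 400.0]])

# Column-wise z-score standardization: subtract each feature's mean and
# divide by its standard deviation, so each feature's unit no longer matters.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this step every column has mean 0 and standard deviation 1, so no single feature dominates the distance computation of step (5) merely because of its units.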
(3) Set the parameters of the differential evolution algorithm: population size nP = 6m, scaling factor Z = 0.6, maximum number of iterations I_max ≥ 2000, and crossover probability p = 0.1.
(4) Randomly initialize an m × nP matrix W = [w_1, w_2, ..., w_nP], then set the iteration counter iter = 0 and k = 1.
(5) Take the k-th column of W as the individual w_k ∈ R^{m×1}, then compute the distance d_ij between any two samples x_i and x_j of X ∈ R^{N×m} according to d_ij = w_k^T |x_i − x_j|, where |x_i − x_j| takes the absolute value of every element of the vector x_i − x_j, and the subscripts i, j = 1, 2, ..., N.
(6) Compute the probability p_ij that x_i selects x_j as its reference point according to the following formula:
p_ij = exp(−d_ij) / Σ_{l≠i} exp(−d_il), with p_ii = 0 (1)
(7) Compute the objective function f_k of the k-th individual w_k according to f_k = Σ_i Σ_j z_ij p_ij, where z_ij is a binary indicator that takes the value 1 only when x_i and x_j belong to the same class.
(8) Check whether k < nP holds. If so, set k = k + 1 and return to step (5); if not, obtain the objective function vector F = [f_1, f_2, ..., f_nP], find the maximum value f_best in F and its corresponding individual w_best, and proceed to step (9).
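Steps (5)–(7) can be sketched as follows. This is a sketch, not the patent's code: the softmax form of p_ij is the standard NCA choice, assumed here because the formula image of step (6) is not reproduced in the text, and the toy data at the bottom is illustrative:

```python
import numpy as np

def nca_objective(w, X, y):
    """Objective f_k of steps (5)-(7) for one weight vector w:
    f = sum_i sum_j z_ij * p_ij, the expected number of correctly
    classified samples under stochastic nearest-neighbor selection."""
    # Step (5): weighted L1 distances d_ij = w^T |x_i - x_j|.
    diff = np.abs(X[:, None, :] - X[None, :, :])   # shape (N, N, m)
    d = diff @ w                                   # shape (N, N)
    # Step (6): p_ij -- probability that sample i picks sample j as its
    # reference point; p_ii = 0 enforces the leave-one-out rule.
    K = np.exp(-d)
    np.fill_diagonal(K, 0.0)
    p = K / K.sum(axis=1, keepdims=True)
    # Step (7): z_ij = 1 only when x_i and x_j share a class label.
    z = (y[:, None] == y[None, :]).astype(float)
    return float((z * p).sum())

# A weight that stresses a discriminative feature should score higher than
# an all-zero weight (zero weights make every reference point equally likely).
X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([1, 1, 2, 2])
f_informative = nca_objective(np.array([1.0]), X, y)
f_zero = nca_objective(np.array([0.0]), X, y)
```

Note that f is bounded by N, the number of samples, since each row of p sums to 1; maximizing f therefore drives weights toward values that place same-class samples close together.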
(9) Generate a mutation vector v_k for each individual according to the following formula:
v_k = w_k + Z(w_best − w_k) + Z(w_a − w_b) (2)
where the subscripts a and b are two mutually different integers randomly drawn from the interval [1, nP].
(10) Correct the mutation vector v_k element-wise according to the following formula, where v_{k,n} denotes the n-th element of v_k, n = 1, 2, ..., m:
(11) Generate a trial vector u_k ∈ R^{m×1} according to the following formula:
u_{k,n} = v_{k,n} if rand_n < p, and u_{k,n} = w_{k,n} otherwise,
where u_{k,n} and w_{k,n} are the n-th elements of u_k and w_k respectively, each element of the vector rand ∈ R^{m×1} is a random number uniformly distributed between 0 and 1, and rand_n is the n-th element of rand.
(12) Update the individual w_k according to the following formula:
w_k = u_k if h(u_k) > f_k, and w_k is kept unchanged otherwise,
where h(u_k) denotes the objective function value obtained with u_k in place of w_k.
(13) Repeat steps (9)–(12) until all individuals have been updated, yielding a new matrix W, then set iter = iter + 1.
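One generation of the update loop in steps (9)–(13) can be sketched as follows. The mutation rule is formula (2); the crossover and selection rules are the standard DE binomial forms, assumed here because the corresponding formula images are not reproduced; the [0, 1] clipping in the correction step is likewise an assumption, and the quadratic toy objective merely stands in for the NCA objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def de_generation(W, f, objective, Z=0.6, p=0.1):
    """One pass over the population (steps (9)-(13), maximization).
    W: (m, nP) population matrix; f: (nP,) objective values."""
    m, nP = W.shape
    w_best = W[:, np.argmax(f)].copy()
    for k in range(nP):
        # Two mutually different random indices a, b (step (9));
        # excluding k as well is a common DE convention.
        a, b = rng.choice([i for i in range(nP) if i != k], size=2, replace=False)
        # Mutation, formula (2): v_k = w_k + Z(w_best - w_k) + Z(w_a - w_b).
        v = W[:, k] + Z * (w_best - W[:, k]) + Z * (W[:, a] - W[:, b])
        # Correction (step (10)); clipping to [0, 1] is an assumed bound.
        v = np.clip(v, 0.0, 1.0)
        # Crossover (step (11)): take v_{k,n} where rand_n < p, else w_{k,n}.
        u = np.where(rng.random(m) < p, v, W[:, k])
        # Selection (step (12)): keep the trial vector only if it improves f_k.
        h_u = objective(u)
        if h_u > f[k]:
            W[:, k], f[k] = u, h_u
    return W, f

# Toy maximization target with optimum at w = 0.5 (illustrative only).
objective = lambda w: -float(np.sum((w - 0.5) ** 2))
m, nP = 3, 18
W = rng.random((m, nP))
f = np.array([objective(W[:, k]) for k in range(nP)])
f_start = f.max()
for _ in range(30):
    W, f = de_generation(W, f, objective)
```

Because selection only accepts improvements, the best objective value is non-decreasing across generations, which is what makes the final w_best the incumbent optimum when the iteration budget runs out.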
(14) Check whether iter > I_max holds. If not, return to step (5) and continue; if so, output the individual w_best corresponding to the maximum objective function value f_best, whose elements are the weight coefficients of the respective features.
(15) According to the magnitude of each element of w_best ∈ R^{m×1}, remove the features corresponding to elements close to 0; the remaining features are the result of feature selection.
Compared with conventional methods, the advantages of the method of the present invention are as follows. First, the method uses a differential evolution algorithm to optimize the objective function of the NCA algorithm, ensuring that the final weight coefficient vector is a globally optimal result rather than a local optimum. Second, unlike traditional NCA, the method never considers an objective function containing a regularization parameter, so there is no need to determine the size of such a parameter. The invented method can therefore be regarded as an improvement strategy applied to the traditional NCA method for classification-oriented feature selection.
Description of the drawings
Fig. 1 is a flowchart of the implementation of the method of the present invention.
Fig. 2 is a schematic diagram of the feature selection result of the method of the present invention.
Specific embodiments
The method of the present invention is described in detail below with reference to the accompanying drawings and a specific implementation case.
As shown in Fig. 1, the present invention discloses a feature selection method based on differential-evolution-optimized neighborhood component analysis. A two-class numerical case is designed below to verify the effectiveness of the method of the present invention.
A 500 × 20 data set X whose entries are uniformly distributed on the interval [0, 1] is generated randomly. The samples in X that satisfy the condition X_3 · X_9 / X_15 < 0.4 are given the class label y_1 = 1, and all other samples, which do not satisfy the condition, are given the class label y_2 = 2.
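The synthetic data set of this case can be generated as follows. This is a sketch: the random seed is arbitrary, and since the text numbers features from 1, the 0-based column indices are 2, 8 and 14:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed is arbitrary, not from the patent

# 500 x 20 matrix with entries uniformly distributed on [0, 1].
X = rng.random((500, 20))

# Label rule of the case: y = 1 where X_3 * X_9 / X_15 < 0.4, else y = 2
# (1-based feature numbers 3, 9, 15 -> 0-based columns 2, 8, 14).
y = np.where(X[:, 2] * X[:, 8] / X[:, 14] < 0.4, 1, 2)
```

Only columns 3, 9 and 15 determine the label, so a correct feature selector should return exactly those three features.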
(1) The training data set above consists of two classes of sample data, and the feature selection result ought to pick out the features corresponding to columns 3, 9 and 15 of the data set X. The method of the present invention is now applied as follows.
(2) Standardize X column by column to obtain X = [x_1, x_2, ..., x_500]^T ∈ R^{500×20}, so as to eliminate the influence of each feature's dimension.
(3) Set the parameters of the differential evolution algorithm: population size nP = 120 (i.e., 6m with m = 20), scaling factor Z = 0.6, maximum number of iterations I_max = 2000, and crossover probability p = 0.1.
(4) Randomly initialize an m × nP matrix W = [w_1, w_2, ..., w_120], then set the iteration counter iter = 0 and k = 1.
(5) Take the k-th column of W as the individual w_k ∈ R^{m×1}, then compute the distance d_ij between any two samples x_i and x_j according to d_ij = w_k^T |x_i − x_j|.
(6) Compute the probability p_ij that x_i selects x_j as its reference point.
(7) Compute the objective function f_k of the k-th individual w_k according to f_k = Σ_i Σ_j z_ij p_ij.
(8) Check whether k < 120 holds. If so, set k = k + 1 and return to step (5); if not, obtain the objective function vector F = [f_1, f_2, ..., f_120], find the maximum value f_best in F and its corresponding individual w_best, and proceed to step (9).
(9) Generate a mutation vector v_k for each individual.
(10) Correct the mutation vector v_k.
(11) Generate a trial vector u_k ∈ R^{m×1}.
(12) Update the individual w_k.
(13) Repeat steps (9)–(12) until all individuals have been updated, yielding a new matrix W, then set iter = iter + 1.
(14) Check whether iter > I_max holds. If not, return to step (5) and continue; if so, output the individual w_best corresponding to the maximum objective function value f_best, whose elements are the weight coefficients of the respective features.
(15) According to the magnitude of each element of w_best ∈ R^{20×1}, remove the features corresponding to elements close to 0; the remaining features are the result of feature selection.
As shown in Fig. 2, which gives the scatter plot of each feature's weight coefficient, the method of the present invention correctly selects the relevant features.
The implementation case above only illustrates a specific embodiment of the present invention and is not intended to limit the invention. Any modification made to the present invention within the spirit of the invention and the scope of the claims falls within the protection scope of the present invention.
Claims (1)
1. A feature selection method based on differential-evolution-optimized neighborhood component analysis, characterized by comprising the following steps:
Step (1): collect the sample data sets X_1, X_2, ..., X_C corresponding to the C application classes y_1, y_2, ..., y_C, where C denotes the total number of classes and the c-th data set X_c ∈ R^{N_c×m} contains N_c samples of m features, c = 1, 2, ..., C;
Step (2): stack the data sets X_1, X_2, ..., X_C into a matrix X ∈ R^{N×m} and standardize X column by column to obtain X = [x_1, x_2, ..., x_N]^T ∈ R^{N×m}, so as to eliminate the influence of each feature's dimension, where N = N_1 + N_2 + ... + N_C, x_i ∈ R^{m×1} denotes the i-th sample, and the superscript T denotes the transpose of a matrix or vector;
Step (3): set the parameters of the differential evolution algorithm: population size nP = 6m, scaling factor Z = 0.6, maximum number of iterations I_max ≥ 2000, and crossover probability p = 0.1;
Step (4): randomly initialize an m × nP matrix W = [w_1, w_2, ..., w_nP], then set the iteration counter iter = 0 and k = 1;
Step (5): take the k-th column of W as the individual w_k ∈ R^{m×1}, then compute the distance d_ij between any two samples x_i and x_j of X ∈ R^{N×m} according to d_ij = w_k^T |x_i − x_j|, where |x_i − x_j| takes the absolute value of every element of the vector x_i − x_j, and the subscripts i, j = 1, 2, ..., N;
Step (6): compute the probability p_ij that x_i selects x_j as its reference point according to the following formula:
p_ij = exp(−d_ij) / Σ_{l≠i} exp(−d_il), with p_ii = 0 (1)
Step (7): compute the neighborhood-component-analysis objective function f_k of the k-th individual w_k according to f_k = Σ_i Σ_j z_ij p_ij, where z_ij is a binary indicator that takes the value 1 only when x_i and x_j belong to the same class;
Step (8): check whether k < nP holds; if so, set k = k + 1 and return to step (5); if not, obtain the objective function vector F = [f_1, f_2, ..., f_nP], find the maximum value f_best in F and its corresponding individual w_best, and proceed to step (9);
Step (9): generate a mutation vector v_k for each individual according to the following formula:
v_k = w_k + Z(w_best − w_k) + Z(w_a − w_b) (2)
where the subscripts a and b are two mutually different integers randomly drawn from the interval [1, nP];
Step (10): correct the mutation vector v_k element-wise according to the following formula, where v_{k,n} denotes the n-th element of v_k, n = 1, 2, ..., m:
Step (11): generate a trial vector u_k ∈ R^{m×1} according to the following formula:
u_{k,n} = v_{k,n} if rand_n < p, and u_{k,n} = w_{k,n} otherwise,
where u_{k,n} and w_{k,n} are the n-th elements of u_k and w_k respectively, each element of the vector rand ∈ R^{m×1} is a random number uniformly distributed between 0 and 1, and rand_n is the n-th element of rand;
Step (12): update the individual w_k according to the following formula:
w_k = u_k if h(u_k) > f_k, and w_k is kept unchanged otherwise,
where h(u_k) denotes the objective function value obtained with u_k in place of w_k;
Step (13): repeat steps (9)–(12) until all individuals have been updated, yielding a new matrix W, then set iter = iter + 1;
Step (14): check whether iter > I_max holds; if not, return to step (5) and continue; if so, output the individual w_best corresponding to the maximum objective function value f_best, whose elements are the weight coefficients of the respective features;
Step (15): according to the magnitude of each element of w_best ∈ R^{m×1}, remove the features corresponding to elements close to 0; the remaining features are the result of feature selection.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810233510.1A CN108446735A (en) | 2018-03-06 | 2018-03-06 | A kind of feature selection approach optimizing neighbour's constituent analysis based on differential evolution |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108446735A true CN108446735A (en) | 2018-08-24 |
Family
ID=63196015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810233510.1A Withdrawn CN108446735A (en) | 2018-03-06 | 2018-03-06 | A kind of feature selection approach optimizing neighbour's constituent analysis based on differential evolution |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108446735A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109407649A (en) * | 2018-10-09 | 2019-03-01 | 宁波大学 | A kind of fault type matching process based on fault signature variables choice |
CN109636487A (en) * | 2019-01-14 | 2019-04-16 | 平安科技(深圳)有限公司 | Advertisement sending method, server, computer equipment and storage medium |
CN109636487B (en) * | 2019-01-14 | 2023-09-29 | 平安科技(深圳)有限公司 | Advertisement pushing method, server, computer device and storage medium |
CN113191616A (en) * | 2021-04-18 | 2021-07-30 | 宁波大学科学技术学院 | Polypropylene product quality abnormity detection method based on double-layer correlation characteristic analysis |
CN113191616B (en) * | 2021-04-18 | 2023-01-24 | 宁波大学科学技术学院 | Polypropylene product quality abnormity detection method based on double-layer correlation characteristic analysis |
CN113177608A (en) * | 2021-05-21 | 2021-07-27 | 河南大学 | Neighbor model feature selection method and device for incomplete data |
CN113177608B (en) * | 2021-05-21 | 2023-09-05 | 河南大学 | Neighbor model feature selection method and device for incomplete data |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | |

Application publication date: 20180824