CN108446735A - Feature selection method for optimizing neighborhood component analysis based on differential evolution - Google Patents


Info

Publication number
CN108446735A
CN108446735A (application number CN201810233510.1A)
Authority
CN
China
Prior art keywords
feature
vector
population
formula
nca
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810233510.1A
Other languages
Chinese (zh)
Inventor
童楚东
俞海珍
朱莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN201810233510.1A (CN108446735A)
Publication of CN108446735A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a feature selection method that optimizes neighborhood component analysis (NCA) with differential evolution, and is intended to solve the problem of how to improve the NCA algorithm from the perspective of its optimization procedure so as to obtain optimal feature weight coefficients. The method of the present invention uses a differential evolution algorithm to optimize the objective function of NCA, thereby obtaining globally optimal feature weight coefficients. Compared with the traditional NCA method, optimizing the NCA objective function with differential evolution ensures that the final weight coefficient vector is a globally optimal result rather than a local optimum. Secondly, the method differs from traditional NCA in that it does not consider an objective function containing a regularization parameter, so there is no need to determine the size of a regularization parameter. The proposed method can therefore be regarded as an improvement strategy applied to the traditional NCA method for classification-oriented feature selection.

Description

Feature selection method for optimizing neighborhood component analysis based on differential evolution
Technical field
The present invention relates to a feature selection method, and more particularly to a feature selection method that optimizes neighborhood component analysis based on differential evolution.
Background technology
In recent years, data mining methods have been widely applied across all trades and professions, and both theoretical and applied research on them has received broad attention. In industrial informatization, the financial field, and the internet industry, large amounts of manpower and material resources have been invested in research on data mining and machine learning. Feature selection occupies an important position in data mining and machine learning. Although it is not itself a specific data mining or machine learning algorithm, feature selection can significantly improve the performance of subsequent data mining algorithms; its positive effect is especially obvious when modeling high-dimensional data. In typical pattern-recognition classification models, the input of the model is high-dimensional sample data, and the output of the model is the class label corresponding to each sample. Under the premise of applying the same classification algorithm, performing feature selection on the input data versus not performing it leads to a marked difference in classification accuracy, because building the classification model after feature selection removes the negative influence of much interfering information and thereby improves the precision of the classification model.
Regarding research on feature selection, many researchers have proposed corresponding solutions for different objects and different problems. Among these, neighborhood component analysis (Neighborhood Component Analysis, NCA) is a relatively novel feature selection algorithm that can be dedicated to feature selection before building a classification model. NCA optimizes the leave-one-out classification accuracy of the 1-nearest-neighbor rule and thereby obtains a weight coefficient for each input feature. Features whose weight coefficients are close to 0 are useless features and can be rejected. However, the optimization process by which the traditional NCA method solves for the feature weight coefficients easily falls into a local optimum, and the weight coefficients are also prone to overfitting. Although the degree of overfitting can be adjusted by introducing a regularization parameter, at present the regularization parameter can only be selected by means of cross-validation. Therefore, the improvement of the traditional NCA algorithm needs further study.
Invention content
The main technical problem to be solved by the present invention is: how to improve the NCA algorithm from the perspective of its optimization procedure so as to obtain optimal feature weight coefficients. Specifically, the method of the present invention uses a differential evolution algorithm to optimize the objective function of NCA, thereby obtaining globally optimal feature weight coefficients.
The technical solution adopted by the present invention to solve the above technical problem is: a feature selection method for optimizing neighborhood component analysis based on differential evolution, comprising the following steps:
(1) Collect the sample data sets X_1, X_2, ..., X_C corresponding to the different application classes y_1, y_2, ..., y_C, where C denotes the total number of classes and the c-th class data set X_c ∈ R^(N_c×m) contains N_c samples of m features, c = 1, 2, ..., C.
(2) Stack the data sets X_1, X_2, ..., X_C into a matrix X ∈ R^(N×m) and standardize X column by column to obtain X = [x_1, x_2, ..., x_N]^T ∈ R^(N×m), so as to eliminate the influence of each feature's dimension, where N = N_1 + N_2 + ... + N_C and x_i ∈ R^(m×1) denotes the i-th sample.
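The standardization of step (2) can be sketched as ordinary column-wise z-scoring; the zero-mean, unit-variance convention is an assumption, since the text only says the influence of each feature's dimension is eliminated:

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: subtract each feature's mean and divide by its
    sample standard deviation, so every feature becomes dimensionless."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```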
(3) Set the parameters of the differential evolution algorithm, including the population size nP = 6m, the scaling factor Z = 0.6, the maximum number of iterations Imax ≥ 2000, and the crossover probability p = 0.1.
(4) Randomly initialize an m × nP matrix W = [w_1, w_2, ..., w_nP], then set the iteration counter iter = 0 and k = 1.
(5) Take the k-th column vector of W as the population member w_k ∈ R^(m×1), then compute the distance d_ij = w_k^T|x_i - x_j| between any two samples x_i and x_j in X, where |x_i - x_j| denotes the vector x_i - x_j with every element taken in absolute value, and the subscripts i, j = 1, 2, ..., N.
(6) Compute, according to the formula shown below, the probability p_ij that x_i selects x_j as its reference point:
(7) Compute the objective function f_k = Σ_ij z_ij p_ij corresponding to the k-th population member w_k, where z_ij is a binary indicator that takes the value 1 if and only if x_i and x_j belong to the same class.
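Steps (5)-(7) together define the objective f_k for one population member. The sketch below uses the weighted L1 distance d_ij = w_k^T|x_i - x_j| of step (5); the softmax form of p_ij with p_ii = 0 is an assumption (the standard NCA choice), since formula (1) is not reproduced in this text:

```python
import numpy as np

def nca_objective(w, X, y):
    """Objective f_k of steps (5)-(7) for weight vector w.

    d_ij = w^T |x_i - x_j|            (weighted L1 distance, step (5))
    p_ij = exp(-d_ij) / sum_{l != i} exp(-d_il), p_ii = 0   (assumed NCA form)
    f_k  = sum_ij z_ij * p_ij, with z_ij = 1 iff y_i == y_j (step (7))
    """
    D = np.abs(X[:, None, :] - X[None, :, :]) @ w   # (N, N) pairwise distances
    P = np.exp(-D)
    np.fill_diagonal(P, 0.0)                        # a point never picks itself
    P /= P.sum(axis=1, keepdims=True)               # row-wise softmax
    Z = (y[:, None] == y[None, :]).astype(float)    # same-class indicator z_ij
    return float((Z * P).sum())
```

With all weights zero every p_ij is uniform, so f equals the expected leave-one-out accuracy of random guessing; discriminative weights push f toward N.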
(8) Judge whether the condition k < nP is satisfied. If so, set k = k + 1 and return to step (5); if not, form the objective function vector F = [f_1, f_2, ..., f_nP], find the maximum value f_best in F and its corresponding population member w_best, and proceed to step (9).
(9) Generate a corresponding mutation vector v_k for each population member according to the formula below:
v_k = w_k + Z(w_best - w_k) + Z(w_a - w_b)  (2)
In the above formula, the subscripts a and b are two mutually different integers randomly generated from the interval [1, nP].
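Formula (2) translates directly into code; note that a and b below are 0-based column indices rather than the 1-based interval [1, nP] of the text:

```python
import numpy as np

def mutate(W, k, best, Z, rng):
    """Formula (2): v_k = w_k + Z*(w_best - w_k) + Z*(w_a - w_b),
    where a and b are two distinct random column indices of W."""
    nP = W.shape[1]
    a, b = rng.choice(nP, size=2, replace=False)    # a != b
    wk = W[:, k]
    return wk + Z * (W[:, best] - wk) + Z * (W[:, a] - W[:, b])
```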
(10) Correct the mutation vector v_k according to the formula below:
where v_{k,n} denotes the n-th element of the vector v_k, n = 1, 2, ..., m.
(11) Generate a trial vector u_k ∈ R^(m×1) according to the formula below:
where u_{k,n} and w_{k,n} are the n-th elements of u_k and w_k respectively, every element of the vector rand ∈ R^(m×1) is a uniformly distributed random number between 0 and 1, and rand_n is the n-th element of the random vector rand.
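Formula (4) itself is not reproduced here, but the description of the vector rand and the crossover probability p matches the standard DE binomial crossover, which the sketch below assumes: each element of the trial vector is taken from the mutant where rand_n < p and from the current member otherwise:

```python
import numpy as np

def crossover(w, v, p, rng):
    """Assumed standard DE binomial crossover for formula (4):
    u_n = v_n where rand_n < p, otherwise u_n = w_n."""
    return np.where(rng.random(w.shape) < p, v, w)
```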
(12) Update the population member w_k according to the formula below:
In the above formula, h(u_k) denotes the objective function value obtained when u_k replaces the population member w_k.
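Since the method maximizes f_k, the update of step (12) is presumably the standard greedy DE selection (formula (5) is not reproduced in the text): the trial vector replaces w_k only if its objective h(u_k) exceeds f_k. A one-line sketch:

```python
def select(w, u, f_w, f_u):
    """Assumed standard greedy DE selection for formula (5):
    keep the trial vector u only if it improves the objective."""
    return (u, f_u) if f_u > f_w else (w, f_w)
```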
(13) Repeat steps (9)-(12) until all population members have been updated, yielding the new matrix W, and set iter = iter + 1.
(14) Judge whether the condition iter > Imax is satisfied. If not, return to step (5) and continue; if so, output the population member w_best corresponding to the maximum objective value f_best, whose elements are the weight coefficients of the respective features.
(15) According to the magnitude of each element of w_best ∈ R^(m×1), reject the features whose corresponding elements are close to 0; the remaining features are the result of feature selection.
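Step (15) only says to reject features whose weights are "close to 0"; the sketch below uses an illustrative cutoff `tol`, which is an assumption rather than a value from the text:

```python
import numpy as np

def select_features(w_best, tol=0.05):
    """Step (15): keep only features whose weight is not close to zero.
    The cutoff `tol` is illustrative; the text just says 'close to 0'."""
    return np.flatnonzero(np.abs(w_best) >= tol)
```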
Compared with conventional methods, the method of the present invention has the following advantages:
First, the method of the present invention optimizes the NCA objective function with a differential evolution algorithm, which ensures that the final weight coefficient vector is a globally optimal result rather than a local optimum. Second, unlike traditional NCA, the method does not consider an objective function containing a regularization parameter, so there is no need to determine the size of a regularization parameter. The proposed method can therefore be regarded as an improvement strategy applied to the traditional NCA method for classification-oriented feature selection.
Description of the drawings
Fig. 1 is the flow chart of the implementation of the method of the present invention.
Fig. 2 is a schematic diagram of the feature selection result of the method of the present invention.
Specific implementation mode
The method of the present invention is described in detail below with reference to the accompanying drawings and a specific implementation case.
As shown in Figure 1, the present invention discloses a feature selection method that optimizes neighborhood component analysis based on differential evolution. A two-class numerical case is designed below to verify the validity of the method of the present invention.
Randomly generate a 500 × 20 data set X uniformly distributed on the interval [0, 1]. The samples in X that satisfy the condition X_3·X_9/X_15 < 0.4 are assigned the class label y_1 = 1, and the remaining samples that do not satisfy the condition are assigned the class label y_2 = 2.
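The synthetic two-class data set of this case study can be generated as follows (the 1-based feature numbers 3, 9, 15 correspond to 0-based columns 2, 8, 14):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((500, 20))      # uniform on [0, 1], 500 samples x 20 features
# Label rule of the case study: class 1 if x3 * x9 / x15 < 0.4, else class 2
# (1-based feature numbers; columns 2, 8 and 14 in 0-based indexing).
y = np.where(X[:, 2] * X[:, 8] / X[:, 14] < 0.4, 1, 2)
```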
(1) The above training data set consists of two classes of sample data, and the feature selection result ought to select the features corresponding to columns 3, 9, and 15 of X. The method of the present invention is then implemented as follows.
(2) Standardize X column by column to obtain X = [x_1, x_2, ..., x_500]^T ∈ R^(500×20), so as to eliminate the influence of each feature's dimension.
(3) Set the parameters of the differential evolution algorithm, including the population size nP = 120, the scaling factor Z = 0.6, the maximum number of iterations Imax = 2000, and the crossover probability p = 0.1.
(4) Randomly initialize the m × nP matrix W = [w_1, w_2, ..., w_nP], then set the iteration counter iter = 0 and k = 1.
(5) Take the k-th column vector of W as the population member w_k ∈ R^(m×1), then compute the distance d_ij = w_k^T|x_i - x_j| between any two samples x_i and x_j in X.
(6) Compute the probability p_ij that x_i selects x_j as its reference point.
(7) Compute the objective function f_k = Σ_ij z_ij p_ij corresponding to the k-th population member w_k.
(8) Judge whether the condition k < 120 is satisfied. If so, set k = k + 1 and return to step (5); if not, form the objective function vector F = [f_1, f_2, ..., f_120], find the maximum value f_best in F and its corresponding population member w_best, and proceed to step (9).
(9) Generate a corresponding mutation vector v_k for each population member.
(10) Correct the mutation vector v_k.
(11) Generate a trial vector u_k ∈ R^(m×1).
(12) Update the population member w_k.
(13) Repeat steps (9)-(12) until all population members have been updated, yielding the new matrix W, and set iter = iter + 1.
(14) Judge whether the condition iter > Imax is satisfied. If not, return to step (5) and continue; if so, output the population member w_best corresponding to the maximum objective value f_best, whose elements are the weight coefficients of the respective features.
(15) According to the magnitude of each element of w_best ∈ R^(20×1), reject the features whose corresponding elements are close to 0; the remaining features are the result of feature selection.
As shown in Fig. 2, a scatter plot of each feature's weight coefficient, it can be found that the method of the present invention correctly selected the corresponding features.
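For reference, the whole procedure of steps (3)-(14) can be sketched end to end. The NCA softmax for p_ij, the binomial crossover, and the greedy selection are assumed standard forms since formulas (1), (4), and (5) are not reproduced in the text; clipping mutants to non-negative values stands in for the unspecified correction of formula (3); and nP and Imax are scaled down so the sketch runs quickly:

```python
import numpy as np

def de_nca_feature_weights(X, y, nP, Imax, Z=0.6, p=0.1, seed=0):
    """Sketch of steps (3)-(14): differential evolution maximizing the
    NCA leave-one-out objective.  Assumptions: softmax p_ij with p_ii=0,
    binomial crossover, greedy selection, non-negativity clipping."""
    rng = np.random.default_rng(seed)
    N, m = X.shape
    same = (y[:, None] == y[None, :]).astype(float)
    absdiff = np.abs(X[:, None, :] - X[None, :, :])       # (N, N, m)

    def f(w):                                             # steps (5)-(7)
        P = np.exp(-(absdiff @ w))
        np.fill_diagonal(P, 0.0)
        P /= P.sum(axis=1, keepdims=True)
        return (same * P).sum()

    W = rng.random((m, nP))                               # step (4)
    F = np.array([f(W[:, k]) for k in range(nP)])
    for _ in range(Imax):                                 # steps (9)-(14)
        best = int(np.argmax(F))
        for k in range(nP):
            a, b = rng.choice(nP, size=2, replace=False)
            v = W[:, k] + Z * (W[:, best] - W[:, k]) + Z * (W[:, a] - W[:, b])
            v = np.clip(v, 0.0, None)                     # formula (3) stand-in
            u = np.where(rng.random(m) < p, v, W[:, k])   # formula (4)
            fu = f(u)
            if fu > F[k]:                                 # formula (5)
                W[:, k], F[k] = u, fu
    return W[:, int(np.argmax(F))]
```

On a toy data set whose class label depends on a single feature, the returned weight vector concentrates on that feature, mirroring the behavior the case study reports for features 3, 9, and 15.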
The above implementation case is only intended to illustrate a specific implementation of the present invention and does not limit the invention. Any modification made to the present invention within the spirit of the invention and the protection scope of the claims falls within the protection scope of the present invention.

Claims (1)

1. A feature selection method for optimizing neighborhood component analysis based on differential evolution, characterized by comprising the following steps:
Step (1): collect the sample data sets X_1, X_2, ..., X_C corresponding to the different application classes y_1, y_2, ..., y_C, where C denotes the total number of classes and the c-th class data set X_c ∈ R^(N_c×m) contains N_c samples of m features, c = 1, 2, ..., C;
Step (2): stack the data sets X_1, X_2, ..., X_C into a matrix X ∈ R^(N×m) and standardize X column by column to obtain X = [x_1, x_2, ..., x_N]^T ∈ R^(N×m), so as to eliminate the influence of each feature's dimension, where N = N_1 + N_2 + ... + N_C, x_i ∈ R^(m×1) denotes the i-th sample, and the superscript T denotes the transpose of a matrix or vector;
Step (3): set the parameters of the differential evolution algorithm, including the population size nP = 6m, the scaling factor Z = 0.6, the maximum number of iterations Imax ≥ 2000, and the crossover probability p = 0.1;
Step (4): randomly initialize an m × nP matrix W = [w_1, w_2, ..., w_nP], then set the iteration counter iter = 0 and k = 1;
Step (5): take the k-th column vector of W as the population member w_k ∈ R^(m×1), then compute the distance d_ij = w_k^T|x_i - x_j| between any two samples x_i and x_j in X, where |x_i - x_j| denotes the vector x_i - x_j with every element taken in absolute value, and the subscripts i, j = 1, 2, ..., N;
Step (6): compute, according to the formula shown below, the probability p_ij that x_i selects x_j as its reference point;
Step (7): compute the neighborhood component analysis objective function f_k = Σ_ij z_ij p_ij corresponding to the k-th population member w_k, where z_ij is a binary indicator that takes the value 1 if and only if x_i and x_j belong to the same class;
Step (8): judge whether the condition k < nP is satisfied; if so, set k = k + 1 and return to step (5); if not, form the objective function vector F = [f_1, f_2, ..., f_nP], find the maximum value f_best in F and its corresponding population member w_best, and proceed to step (9);
Step (9): generate a corresponding mutation vector v_k for each population member according to the formula below:
v_k = w_k + Z(w_best - w_k) + Z(w_a - w_b)  (2)
where the subscripts a and b are two mutually different integers randomly generated from the interval [1, nP];
Step (10): correct the mutation vector v_k according to the formula below:
where v_{k,n} denotes the n-th element of the vector v_k, n = 1, 2, ..., m;
Step (11): generate a trial vector u_k ∈ R^(m×1) according to the formula below:
where u_{k,n} and w_{k,n} are the n-th elements of u_k and w_k respectively, every element of the vector rand ∈ R^(m×1) is a uniformly distributed random number between 0 and 1, and rand_n is the n-th element of the random vector rand;
Step (12): update the population member w_k according to the formula below:
where h(u_k) denotes the objective function value obtained when u_k replaces the population member w_k;
Step (13): repeat steps (9)-(12) until all population members have been updated, yielding the new matrix W, and set iter = iter + 1;
Step (14): judge whether the condition iter > Imax is satisfied; if not, return to step (5) and continue; if so, output the population member w_best corresponding to the maximum objective value f_best, whose elements are the weight coefficients of the respective features;
Step (15): according to the magnitude of each element of w_best ∈ R^(m×1), reject the features whose corresponding elements are close to 0; the remaining features are the result of feature selection.
CN201810233510.1A 2018-03-06 2018-03-06 Feature selection method for optimizing neighborhood component analysis based on differential evolution Withdrawn CN108446735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810233510.1A CN108446735A (en) 2018-03-06 2018-03-06 Feature selection method for optimizing neighborhood component analysis based on differential evolution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810233510.1A CN108446735A (en) 2018-03-06 2018-03-06 Feature selection method for optimizing neighborhood component analysis based on differential evolution

Publications (1)

Publication Number Publication Date
CN108446735A true CN108446735A (en) 2018-08-24

Family

ID=63196015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810233510.1A Withdrawn CN108446735A (en) 2018-03-06 2018-03-06 Feature selection method for optimizing neighborhood component analysis based on differential evolution

Country Status (1)

Country Link
CN (1) CN108446735A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109407649A (en) * 2018-10-09 2019-03-01 Ningbo University Fault type matching method based on fault feature variable selection
CN109636487A (en) * 2019-01-14 2019-04-16 平安科技(深圳)有限公司 Advertisement sending method, server, computer equipment and storage medium
CN109636487B (en) * 2019-01-14 2023-09-29 平安科技(深圳)有限公司 Advertisement pushing method, server, computer device and storage medium
CN113191616A (en) * 2021-04-18 2021-07-30 宁波大学科学技术学院 Polypropylene product quality abnormity detection method based on double-layer correlation characteristic analysis
CN113191616B (en) * 2021-04-18 2023-01-24 宁波大学科学技术学院 Polypropylene product quality abnormity detection method based on double-layer correlation characteristic analysis
CN113177608A (en) * 2021-05-21 2021-07-27 河南大学 Neighbor model feature selection method and device for incomplete data
CN113177608B (en) * 2021-05-21 2023-09-05 河南大学 Neighbor model feature selection method and device for incomplete data

Similar Documents

Publication Publication Date Title
CN108446735A (en) Feature selection method for optimizing neighborhood component analysis based on differential evolution
CN104536412B (en) Photoetching procedure dynamic scheduling method based on index forecasting and solution similarity analysis
CN111191732A (en) Target detection method based on full-automatic learning
Talavera-Llames et al. Big data time series forecasting based on nearest neighbours distributed computing with Spark
CN109214449A (en) A kind of power grid investment demand forecasting method
CN102521656A (en) Integrated transfer learning method for classification of unbalance samples
CN105929690B (en) A kind of Flexible Workshop Robust Scheduling method based on decomposition multi-objective Evolutionary Algorithm
CN108921604B (en) Advertisement click rate prediction method based on cost-sensitive classifier integration
CN105373606A (en) Unbalanced data sampling method in improved C4.5 decision tree algorithm
CN103886330A (en) Classification method based on semi-supervised SVM ensemble learning
CN103473598A (en) Extreme learning machine based on length-changing particle swarm optimization algorithm
CN112685504B (en) Production process-oriented distributed migration chart learning method
CN103617435A (en) Image sorting method and system for active learning
Febriantono et al. Classification of multiclass imbalanced data using cost-sensitive decision tree C5.0
CN110751378A (en) Nuclear facility decommissioning scheme evaluation method and system
CN113392587A (en) Parallel support vector machine classification method for large-area landslide risk evaluation
CN107392155A (en) Handwritten character recognition method based on sparse restricted Boltzmann machines with multi-objective optimization
CN107273922A (en) A kind of screening sample and weighing computation method learnt towards multi-source instance migration
CN108830407B (en) Sensor distribution optimization method in structure health monitoring under multi-working condition
CN111737924B (en) Method for selecting typical load characteristic transformer substation based on multi-source data
CN110084376B (en) Method and device for automatically separating data into boxes
CN108805152A (en) A kind of scene classification method and device
CN116993548A (en) Incremental learning-based education training institution credit assessment method and system for LightGBM-SVM
CN116306785A (en) Student performance prediction method of convolution long-short term network based on attention mechanism
CN116452373A (en) Multi-target genetic algorithm-based intelligent generation method and system for block building body quantity

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20180824