CN110210529A - A feature selection method based on binary quantum particle swarm optimization - Google Patents
- Publication number
- CN110210529A (application CN201910400448.5A)
- Authority
- CN
- China
- Prior art keywords
- feature
- calculated
- algorithm
- correlation
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2111—Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Physiology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Genetics & Genomics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a feature selection method based on binary quantum particle swarm optimization. Feature relevance is first analysed using the maximal information coefficient, feature selection is then performed with an improved BQPSO algorithm, and classification accuracy is finally verified with an SVM. Experimental results on gene expression profiles show that feature selection based on the improved BQPSO algorithm is a practicable method. The invention mainly improves the standard binary quantum particle swarm optimization algorithm: the local attractor is computed using a comprehensive learning strategy, and the mutation idea of genetic algorithms is introduced to increase the diversity of the population. Experiments show that feature selection with the improved BQPSO algorithm yields better classification accuracy.
Description
Technical field
The invention belongs to the technical field of data mining and relates to a feature selection method based on binary quantum particle swarm optimization.
Background
In classification problems, a data set often contains thousands of features, including relevant, irrelevant and redundant ones. Because the data set is so large, classification performance may even degrade, which is known as the "curse of dimensionality". Reducing the dimensionality of a data set through feature selection is one approach to dimensionality reduction.
Feature selection occupies a very important position in the field of pattern recognition and has high research value. On the one hand, feature selection effectively reduces the amount of data to be processed and therefore the computational cost; on the other hand, it eliminates non-critical interfering features, reduces the correlation between features, and improves the effectiveness of the features.
Currently, feature selection methods are based on filter, wrapper and embedded approaches. Wrapper methods use a classifier to evaluate the generated feature subsets, whereas filter methods evaluate feature subsets according to their information content and statistical measures. In general, wrapper methods achieve better results than filter methods, but at a higher computational cost. In embedded methods, the process of building the classifier is itself a process of feature selection. How to design an effective feature selection method is a major problem facing high-dimensional data today.
Summary of the invention
The purpose of the present invention is to address the existing need for feature selection on high-dimensional, small-sample data by proposing a feature selection method based on binary quantum particle swarm optimization. The method uses the maximal information coefficient (MIC) (see Reshef, D. N., et al., "Detecting novel associations in large data sets", Science (New York, N.Y.), 2011, 334(6062)) for data preprocessing, deleting weakly relevant features. Feature selection is then performed with an improved Binary Quantum Particle Swarm Optimization (BQPSO) algorithm, and the classification accuracy of the selected features is finally verified with an SVM, so that a high accuracy is maintained.
The specific steps of the present invention are as follows:
Step 1: Input a public data set;
Step 2: Calculate the correlation between each feature of the data set and the class label using the maximal information coefficient (MIC); features whose correlation is below a set threshold are treated as weakly relevant and deleted;
The correlation between each feature and the class label is calculated using the MIC, specifically:
where X is a sample feature, Y is the class label, and B is the total number of samples raised to the power 0.6 or 0.55.
Step 3: For the strongly relevant features, select the optimal feature subset using the mutation idea of genetic algorithms together with the binary quantum particle swarm optimization algorithm;
Specifically:
1) Initialize the population;
2) Calculate the fitness value of each particle in the swarm according to the fitness function and compare it with the particle's previous local best; if f(x_i) < f(pbest_i), then pbest_i = x_i; otherwise do not update;
3) Calculate the swarm's best value gbest and the mean best value mbest;
4) Calculate the local attractor p_i and the new position-update probability pr of the particle;
5) Update the value of x_i according to the function Transf(p_i, pr);
6) Screen out the particles with poorer fitness values and, using the mutation idea of genetic algorithms, mutate them with probability Pm to improve the diversity of the population;
7) Determine whether the termination condition is met; if not, return to step 4); otherwise proceed to the next step;
8) Output the optimal feature subset;
where the fitness function is:
where w_A is the weight of the SVM classification accuracy, w_F is the weight of the number of features strongly relevant to the class label, sum(chrom) is the number of features strongly relevant to the class label, Acc is the classification accuracy obtained with the selected features, mic_c is the correlation between features and the class label calculated with the maximal information coefficient (MIC), and mic_f is the correlation between features calculated with the MIC;
Step 4: Evaluate the validity of the selected feature subset using the support vector machine algorithm.
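For reference, the MIC used in Step 2 is conventionally defined as in Reshef et al. The block below states that standard definition; it is assumed, not confirmed, to correspond to the formula of Step 2. Here |X| and |Y| denote the numbers of bins into which the two axes are partitioned, I(X;Y) is the mutual information of the binned variables, and n is the number of samples (this notation is introduced for illustration):

```latex
I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}

\mathrm{MIC}(X;Y) = \max_{|X|\,|Y| < B} \frac{I(X;Y)}{\log_2 \min\bigl(|X|, |Y|\bigr)},
\qquad B = n^{0.6} \ \text{or} \ n^{0.55}
```

For a data set of 45 samples, for example, B = 45^0.6 ≈ 9.8, so only grids with fewer than about 10 cells are searched.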
Beneficial effects of the present invention: the invention mainly improves the standard binary quantum particle swarm optimization algorithm. The local attractor is computed using a comprehensive learning strategy, and the mutation idea of genetic algorithms is introduced to increase the diversity of the population. Experiments show that feature selection with the improved BQPSO algorithm achieves better classification accuracy.
Brief description of the drawings
Fig. 1 is the overall flow chart of the algorithm of the present invention;
Fig. 2 is the flow chart of the binary quantum particle swarm optimization algorithm of the present invention;
Fig. 3 shows the classification accuracy obtained with a support vector machine (SVM) on the feature subset that the present invention selects from the Lymphoma data set.
Specific embodiment
As shown in Fig. 1, a feature selection method based on binary quantum particle swarm optimization comprises the following steps:
Step 1: Input the public data set Lymphoma, which has 45 samples and 4026 features; 22 samples are negative and 23 are positive.
Step 2: Calculate the correlation between every feature and the class label using the maximal information coefficient (MIC). The MIC is calculated as shown in formulas (1) and (2).
Step 3: Rank the features by MIC value and delete some of the weakly relevant features according to the set threshold.
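Steps 2 and 3 amount to a filter on relevance scores. The following is a minimal sketch, assuming the MIC of each feature against the class label has already been computed by some MIC implementation; the function name `filter_by_relevance`, the example scores and the threshold 0.3 are illustrative assumptions, not values from the patent:

```python
def filter_by_relevance(mic_scores, threshold):
    """Return indices of features whose score is >= threshold,
    ordered from most to least relevant.

    mic_scores: one float per feature (assumed to be the precomputed MIC
    between that feature and the class label).
    """
    ranked = sorted(range(len(mic_scores)),
                    key=lambda j: mic_scores[j], reverse=True)
    # Features scoring below the threshold are treated as weakly relevant.
    return [j for j in ranked if mic_scores[j] >= threshold]


# Hypothetical scores for five features.
scores = [0.82, 0.10, 0.45, 0.05, 0.61]
kept = filter_by_relevance(scores, threshold=0.3)
print(kept)  # [0, 4, 2]
```

Only the surviving indices are passed on to the search in the next step, which keeps the binary strings handled by the swarm short.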
Step 4: Search the remaining features with the binary quantum particle swarm algorithm to obtain the optimal feature subset. The detailed algorithm flow chart is shown in Fig. 2.
In the BQPSO algorithm there are no concepts of velocity or trajectory, only the positions of particles and the distances between them. The distance between two particles is expressed as the Hamming distance. In QPSO, p_i is the local attractor computed for a particle; the value of p_id lies between pbest_id and gbest_d, so p_i = (p_i1, p_i2, ..., p_iD) lies in the hyper-rectangle whose diagonal has pbest_i and gbest as its endpoints. The distance from p_i to pbest_i or to gbest therefore cannot exceed the length of the diagonal, i.e. the following inequalities must hold:
|p_i - pbest_i| ≤ |pbest_i - gbest| (3)
|p_i - gbest| ≤ |pbest_i - gbest| (4)
Computing the local attractor p_i gives the swarm diversity and lets particles escape their local search region. In the BQPSO algorithm, p_i is generated differently from the QPSO algorithm: a new offspring is produced by randomly crossing over each bit of the parents pbest_i and gbest.
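The bitwise crossover described above can be sketched as follows. This is an illustration only: the equal 0.5 probability of taking a bit from either parent and the function names are assumptions, and the comprehensive-learning variant of the improved algorithm is not modelled here.

```python
import random


def hamming(a, b):
    """Hamming distance between two equal-length 0/1 lists."""
    return sum(x != y for x, y in zip(a, b))


def local_attractor(pbest_i, gbest, rng=random):
    """Build p_i by taking each bit at random from pbest_i or gbest."""
    return [pb if rng.random() < 0.5 else gb
            for pb, gb in zip(pbest_i, gbest)]


rng = random.Random(0)
pbest_i = [1, 0, 1, 1, 0, 0, 1, 0]
gbest = [1, 1, 0, 1, 0, 1, 1, 0]
p_i = local_attractor(pbest_i, gbest, rng)

diag = hamming(pbest_i, gbest)
# Inequalities (3) and (4) hold for any outcome of the random crossover,
# because every bit of p_i equals the corresponding bit of one parent.
assert hamming(p_i, pbest_i) <= diag
assert hamming(p_i, gbest) <= diag
```

Where the two parents agree on a bit, p_i inherits that bit unchanged, so the attractor can only differ from a parent at positions where the parents themselves differ.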
As the iterations of the PSO algorithm proceed, particles tend to converge prematurely and become trapped in a local optimum. To solve this problem, the mutation idea of genetic algorithms is introduced: particles with poorer fitness are mutated with probability Pm, which increases the diversity of the population and prevents particles from falling into a local optimum too early.
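A minimal sketch of this mutation step is given below. Several details are assumptions made for illustration rather than taken from the patent: the "poorer" particles are taken to be a fixed fraction of the swarm ranked by fitness, Pm is applied per bit, and lower fitness is treated as better, in line with the update rule f(x_i) < f(pbest_i).

```python
import random


def mutate_worst(swarm, fitness, pm, fraction=0.25, rng=random):
    """Bit-flip mutation applied only to the worst-ranked particles.

    swarm: list of 0/1 lists; fitness: one float per particle (lower is
    better); pm: per-bit flip probability; fraction: share of the swarm
    treated as "poor" (an assumed selection rule).
    """
    n_poor = max(1, int(len(swarm) * fraction))
    # Indices ordered from worst (largest fitness) to best.
    worst_first = sorted(range(len(swarm)),
                         key=lambda i: fitness[i], reverse=True)
    for i in worst_first[:n_poor]:
        swarm[i] = [1 - bit if rng.random() < pm else bit
                    for bit in swarm[i]]
    return swarm


swarm = [[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
fitness = [0.10, 0.90, 0.50, 0.70]
mutate_worst(swarm, fitness, pm=0.1, rng=random.Random(1))
```

Restricting mutation to the poorer particles leaves the best solutions found so far untouched while still injecting new bit patterns into the swarm.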
The method aims to obtain high classification accuracy while selecting few features. The fitness function of the algorithm is therefore designed as formula (3):
where sum(chrom) is the number of features selected by each individual and Acc is the accuracy obtained by classifying with the selected features. The method uses a binary SVM classifier: a classification model is built on the samples using the feature subset of each individual, and the fitness value is used to evaluate the result. The fitness function keeps the number of selected features as small as possible while keeping the classification error rate as low as possible.
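The trade-off described here can be illustrated with a small sketch. The expression below is not the patent's formula: it is an assumed weighted sum of the error rate and the fraction of selected features, chosen only because it matches the stated goal and the minimisation convention of the rule f(x_i) < f(pbest_i). The weights `w_a` and `w_f` and their default values are hypothetical, and Acc is assumed to be supplied by an SVM trained elsewhere.

```python
def fitness(chrom, acc, w_a=0.9, w_f=0.1):
    """Assumed fitness to minimise (not the patent's exact formula).

    chrom: 0/1 list marking the selected features.
    acc: classification accuracy in [0, 1] obtained with those features.
    """
    if not any(chrom):
        return float("inf")  # an empty subset cannot be classified
    error = 1.0 - acc
    selected_fraction = sum(chrom) / len(chrom)
    return w_a * error + w_f * selected_fraction


# With equal accuracy, the subset with fewer features scores lower (better).
small = fitness([1, 0, 0, 0], acc=0.9)
large = fitness([1, 1, 1, 1], acc=0.9)
assert small < large
```

Under this assumed form, a large `w_a` relative to `w_f` means accuracy dominates and the feature count mainly breaks ties between subsets of similar accuracy.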
The binary quantum particle swarm optimization algorithm selects features as follows:
1) Initialize the population.
2) Calculate the fitness value of each particle in the swarm according to the fitness function and compare it with the particle's previous local best; if f(x_i) < f(pbest_i), then pbest_i = x_i; otherwise do not update.
3) Calculate the swarm's best value gbest and the mean best value mbest.
4) Calculate the local attractor p_i and the new position-update probability pr of the particle.
5) Update the value of x_i according to the function Transf(p_i, pr).
6) Screen out the particles with poorer fitness values and, using the mutation idea of genetic algorithms, mutate them with probability Pm to improve the diversity of the population.
7) Determine whether the termination condition is met; if not, return to step 4); otherwise proceed to the next step.
8) Output the optimal chromosome, i.e. the optimal 0/1 string, in which 0 means the feature is not selected and 1 means the feature is selected.
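Steps 3) to 5) can be sketched as follows. The text names mbest, pr and Transf without giving their formulas, so this sketch assumes a commonly described BQPSO formulation: mbest is a per-bit majority vote over the personal bests, pr is proportional to the Hamming distance between x_i and mbest, and Transf flips each bit of p_i with probability pr. The coefficient `alpha` and all function names are assumptions.

```python
import math
import random


def hamming(a, b):
    """Hamming distance between two equal-length 0/1 lists."""
    return sum(x != y for x, y in zip(a, b))


def mean_best(pbests, rng=random):
    """Per-bit majority vote over all personal bests (ties broken at random)."""
    n = len(pbests)
    mbest = []
    for bits in zip(*pbests):
        ones = sum(bits)
        if 2 * ones > n:
            mbest.append(1)
        elif 2 * ones < n:
            mbest.append(0)
        else:
            mbest.append(rng.randint(0, 1))
    return mbest


def update_probability(x_i, mbest, alpha, rng=random):
    """pr = min(1, alpha * d_H(x_i, mbest) * ln(1/u) / length)."""
    u = 1.0 - rng.random()  # in (0, 1], so the logarithm is finite
    b = alpha * hamming(x_i, mbest) * math.log(1.0 / u)
    return min(1.0, b / len(x_i))


def transf(p_i, pr, rng=random):
    """Flip each bit of the local attractor p_i with probability pr."""
    return [1 - bit if rng.random() < pr else bit for bit in p_i]


rng = random.Random(0)
pbests = [[1, 0, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0]]
mbest = mean_best(pbests, rng)
x_i = [0, 1, 1, 0]
pr = update_probability(x_i, mbest, alpha=0.8, rng=rng)
new_x_i = transf([1, 0, 1, 1], pr, rng)
```

In this formulation a particle far from mbest receives a larger pr and is scattered more widely around its attractor, while a particle that already coincides with mbest is simply moved onto p_i.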
Step 5: Repeat the above four steps several times to obtain the selected feature subsets. Verify each resulting feature subset using ten-fold cross-validation. A comparison of the classification accuracy of the improved BQPSO algorithm and the BQPSO algorithm, both modelled with support vector machine classification, is shown in Fig. 3.
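The ten-fold cross-validation of Step 5 can be sketched with index splitting alone. This is an illustrative, unstratified split; the `evaluate` callback is hypothetical and stands for training an SVM on the training indices, restricted to the selected features, and returning its accuracy on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k nearly equal folds
    for i, test in enumerate(folds):
        train = [j for f_idx, fold in enumerate(folds)
                 if f_idx != i for j in fold]
        yield train, test


def cross_val_accuracy(n_samples, evaluate, k=10, seed=0):
    """Average the accuracy returned by evaluate(train, test) over k folds."""
    scores = [evaluate(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


# With 45 samples, five folds hold 5 test samples and five hold 4.
sizes = sorted(len(test) for _, test in k_fold_indices(45))
assert sizes == [4] * 5 + [5] * 5
```

With only 45 samples each test fold is very small, so the averaged accuracy is more informative than the accuracy of any single fold.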
Claims (2)
1. A feature selection method based on binary quantum particle swarm optimization, characterized in that the specific steps of the method are as follows:
Step 1: Input a public data set;
Step 2: Calculate the correlation between each feature of the data set and the class label using the maximal information coefficient (MIC); features whose correlation is below a set threshold are treated as weakly relevant and deleted;
Step 3: For the strongly relevant features, select the optimal feature subset using the mutation idea of genetic algorithms together with the binary quantum particle swarm optimization algorithm;
Specifically:
1) Initialize the population;
2) Calculate the fitness value of each particle in the swarm according to the fitness function and compare it with the particle's previous local best; if f(x_i) < f(pbest_i), then pbest_i = x_i; otherwise do not update;
3) Calculate the swarm's best value gbest and the mean best value mbest;
4) Calculate the local attractor p_i and the new position-update probability pr of the particle;
5) Update the value of x_i according to the function Transf(p_i, pr);
6) Screen out the particles with poorer fitness values and, using the mutation idea of genetic algorithms, mutate them with probability Pm to improve the diversity of the population;
7) Determine whether the termination condition is met; if not, return to step 4); otherwise proceed to the next step;
8) Output the optimal feature subset;
where the fitness function is:
where w_A is the weight of the SVM classification accuracy, w_F is the weight of the number of features strongly relevant to the class label, sum(chrom) is the number of features strongly relevant to the class label, Acc is the classification accuracy obtained with the selected features, mic_c is the correlation between features and the class label calculated with the maximal information coefficient (MIC), and mic_f is the correlation between features calculated with the MIC;
Step 4: Evaluate the validity of the selected feature subset using the support vector machine algorithm.
2. The feature selection method based on binary quantum particle swarm optimization according to claim 1, characterized in that the correlation between each feature and the class label is calculated using the maximal information coefficient (MIC),
specifically:
where X is a sample feature, Y is the class label, and B is the total number of samples raised to the power 0.6 or 0.55.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910400448.5A CN110210529A (en) | 2019-05-14 | 2019-05-14 | A feature selection method based on binary quantum particle swarm optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910400448.5A CN110210529A (en) | 2019-05-14 | 2019-05-14 | A feature selection method based on binary quantum particle swarm optimization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110210529A true CN110210529A (en) | 2019-09-06 |
Family
ID=67787230
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910400448.5A Pending CN110210529A (en) | A feature selection method based on binary quantum particle swarm optimization | 2019-05-14 | 2019-05-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110210529A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659719A (en) * | 2019-09-19 | 2020-01-07 | 江南大学 | Aluminum profile flaw detection method |
CN111191764A (en) * | 2019-12-30 | 2020-05-22 | 内蒙古工业大学 | Bus passenger flow volume test method and system based on SPGAPSO-SVM algorithm |
CN112819062A (en) * | 2021-01-26 | 2021-05-18 | 淮阴工学院 | Fluorescence spectrum quadratic characteristic selection method based on mixed particle swarm and continuous projection |
CN113408731A (en) * | 2021-06-21 | 2021-09-17 | 北京计算机技术及应用研究所 | K-near quantum circuit realizing method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140257767A1 (en) * | 2013-03-09 | 2014-09-11 | Bigwood Technology, Inc. | PSO-Guided Trust-Tech Methods for Global Unconstrained Optimization |
US20150242759A1 (en) * | 2014-02-21 | 2015-08-27 | Battelle Memorial Institute | Method of generating features optimal to a dataset and classifier |
CN105718943A (en) * | 2016-01-19 | 2016-06-29 | 南京邮电大学 | Character selection method based on particle swarm optimization algorithm |
CN107657098A (en) * | 2017-09-15 | 2018-02-02 | 哈尔滨工程大学 | Perimeter antenna array Sparse methods based on quantum chicken group's mechanism of Evolution |
CN108140145A (en) * | 2015-08-13 | 2018-06-08 | D-波系统公司 | For the system and method for creating and being interacted using the higher degree between quantum device |
CN108805159A (en) * | 2018-04-17 | 2018-11-13 | 杭州电子科技大学 | A kind of high dimensional data feature selection approach based on filtration method and genetic algorithm |
Non-Patent Citations (1)
Title |
---|
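Shen Yuanfeng: "Research on feature selection methods based on improved particle swarm optimization algorithms", China Master's Theses Full-text Database, Information Science and Technology, no. 1, pages 18 - 49 * |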
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110659719A (en) * | 2019-09-19 | 2020-01-07 | 江南大学 | Aluminum profile flaw detection method |
CN110659719B (en) * | 2019-09-19 | 2022-02-08 | 江南大学 | Aluminum profile flaw detection method |
CN111191764A (en) * | 2019-12-30 | 2020-05-22 | 内蒙古工业大学 | Bus passenger flow volume test method and system based on SPGAPSO-SVM algorithm |
CN112819062A (en) * | 2021-01-26 | 2021-05-18 | 淮阴工学院 | Fluorescence spectrum quadratic characteristic selection method based on mixed particle swarm and continuous projection |
CN113408731A (en) * | 2021-06-21 | 2021-09-17 | 北京计算机技术及应用研究所 | K-near quantum circuit realizing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110210529A (en) | A feature selection method based on binary quantum particle swarm optimization | |
US11977634B2 (en) | Method and system for detecting intrusion in parallel based on unbalanced data Deep Belief Network | |
CN107609121B (en) | News text classification method based on LDA and word2vec algorithm | |
Cheng et al. | Label ranking methods based on the Plackett-Luce model | |
Bandyopadhyay et al. | Multiobjective GAs, quantitative indices, and pattern classification | |
CN102663100B (en) | Two-stage hybrid particle swarm optimization clustering method | |
CN110147321A (en) | A kind of recognition methods of the defect high risk module based on software network | |
CN108363810A (en) | Text classification method and device | |
CN110188785A (en) | A kind of data clusters analysis method based on genetic algorithm | |
Parrott et al. | Multi-objective techniques in genetic programming for evolving classifiers | |
CN112906890A (en) | User attribute feature selection method based on mutual information and improved genetic algorithm | |
CN110837884B (en) | Effective mixed characteristic selection method based on improved binary krill swarm algorithm and information gain algorithm | |
CN109670687A (en) | A kind of mass analysis method based on particle group optimizing support vector machines | |
CN108805159A (en) | A kind of high dimensional data feature selection approach based on filtration method and genetic algorithm | |
CN112633346A (en) | Feature selection method based on feature interactivity | |
CN114625868A (en) | Electric power data text classification algorithm based on selective ensemble learning | |
CN111914930A (en) | Density peak value clustering method based on self-adaptive micro-cluster fusion | |
Ahlawat et al. | A genetic algorithm based feature selection for handwritten digit recognition | |
CN104636814A (en) | Method and system for optimizing random forest models | |
CN111275206A (en) | Integrated learning method based on heuristic sampling | |
CN117978661A (en) | Influence maximization method based on refused neighborhood | |
CN114169406A (en) | Feature selection method based on symmetry uncertainty joint condition entropy | |
Lin et al. | A new density-based scheme for clustering based on genetic algorithm | |
Lingras et al. | Statistical, evolutionary, and neurocomputing clustering techniques: cluster-based vs object-based approaches | |
CN105654498A (en) | Image segmentation method based on dynamic local search and immune clone automatic clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||