CN105279520A - Optimal feature subset selection method based on classification capacity structure vector complementarity

Optimal feature subset selection method based on classification capacity structure vector complementarity

Info

Publication number
CN105279520A
Authority
CN
China
Prior art keywords
feature
classification
vector
capacity
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510621401.3A
Other languages
Chinese (zh)
Other versions
CN105279520B (en)
Inventor
王淑琴 (Wang Shuqin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Normal University
Original Assignee
Tianjin Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Normal University filed Critical Tianjin Normal University
Priority to CN201510621401.3A
Publication of CN105279520A
Application granted
Publication of CN105279520B
Expired - Fee Related
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211: Selection of the most significant subset of features
    • G06F18/2113: Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation

Abstract

The invention addresses the problem that most existing methods use a single value as the evaluation criterion for the classification ability of a feature or feature subset, and provides an optimal feature subset selection method based on the complementarity of classification capacity structure vectors. The method defines a binary feature classification capacity structure vector V = (V1, V2, ..., VK), with one component per two-class subproblem, and the notion of features with complementary structure vectors; it uses a dichotomy (binary search) to compute the classification separating capacity threshold of each subclass problem, and on that basis selects the optimal feature subset by a greedy strategy, following the principle of maximizing the structural complementarity of the features in the selected subset. The method fully accounts for the different evaluations a feature receives on the classification of different class pairs and follows the complementarity maximization principle throughout the selection process. It conforms to the natural law of complementing advantages and exploits the classification information of the features to the fullest, thereby obtaining a better feature subset, effectively reducing redundant features, and improving classification prediction accuracy.

Description

Optimal feature subset selection method based on classification capacity structure vector complementarity
Technical field
The invention belongs to the field of machine learning and pattern recognition, and specifically proposes a rational and effective feature subset selection method.
Background technology
Feature selection is one of the two main approaches to dimensionality reduction. It plays a vital role in machine learning and pattern recognition, is one of the fundamental problems studied in those fields, and is a crucial data preprocessing step when constructing classifiers. Feature selection chooses, from the original feature set and according to some evaluation criterion, a subset of features that is significant for classification, thereby removing irrelevant or redundant features and reducing the dimensionality of the original space to some m much smaller than the original dimension. With the rapid development of the internet and of high-throughput techniques, we have entered the era of big data: data sets are enormous and their features numerous and complex, which makes research on feature selection algorithms all the more important. In recent years, applying feature selection to data sets containing thousands of features has become one of the principal problems in the field. Feature selection can make data easier to understand, reduce measurement and storage requirements, shorten training and running time, and improve prediction performance. Within this research direction, how to evaluate the classification capacity of features and how to obtain an effective feature subset are key questions.
In recent years, domestic scholars have carried out a large amount of research on feature selection, and domestic journals have published many results. These feature selection algorithms share one thing in common: every classification capacity measure assigns a single score to a feature or feature subset to describe the magnitude of its classification ability. It is generally assumed that a feature with a larger score has stronger classification ability than one with a smaller score, so features with large scores are chosen preferentially. However, some work has shown that features with small scores should sometimes also be selected, and that combinations of features with high classification ability values do not always yield better classification results. Representing a feature's classification capacity by a single value gives only a comprehensive evaluation of that capacity, and ignores the fact that each feature evaluates differently on the classification of different classes.
Summary of the invention
To solve the above problems of existing methods, the present invention proposes a new optimal feature subset selection method based on the complementarity of classification capacity structure vectors. The invention obtains a vectorized classification capacity by evaluating each feature's classification capacity on each subclass problem, i.e., it represents a feature's separating capacity on the different subclass problems by multiple values, and then selects features or feature subsets according to the principle of classification capacity structure vector complementarity. The invention is suited to multi-class data sets in which the number of samples is much smaller than the number of features, such as cancer data sets used for classification prediction. The effectiveness of the invention is illustrated on a breast cancer data set in a specific embodiment.
The invention defines a binary feature classification capacity structure vector and the notion of features with complementary classification capacity structures, uses a dichotomy to compute the threshold of each subclass problem, and on that basis selects the optimal feature subset by a greedy strategy, following the principle of maximizing the structural complementarity of the features in the selected subset. This approach both conforms to the natural law of complementing advantages and exploits the classification information of the features to the fullest, thereby obtaining a better feature subset. Research into feature selection algorithms that take the structural complementarity of classification capacity into account is therefore of great significance.
To achieve the above object, the invention discloses the following technical content:
An optimal feature subset selection method based on classification capacity structure vector complementarity, characterized in that the method first defines the binary feature classification capacity structure vector and computes the structure vector of each feature; the concrete steps are as follows:
For a classification problem with n features and c classes, first use the 1-vs-1 scheme to convert it into K = c(c-1)/2 two-class subproblems, each formed by a pair of classes j and k (1 ≤ j < k ≤ c). Then adopt the Fisher Discriminant Ratio (FDR for short) as the measure of a feature's separating capacity on a subproblem, denoted FDR_i^{jk}, and compute each feature f_i's separating capacity on each of the K subproblems, i = 1, ..., n. Finally, using the thresholds obtained by the threshold calculation method below, convert all FDR values into 0 or 1, which yields each feature's classification separating capacity structure vector over the subproblems.
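In code terms, this step reduces to building an n × K score matrix. Below is a minimal Python sketch of the 1-vs-1 decomposition and the FDR computation, assuming samples in a matrix X (one row per sample) and class labels in an array y; the function name fdr_matrix and the small epsilon guard against zero variance are our own additions, not part of the patent.

```python
from itertools import combinations
import numpy as np

def fdr_matrix(X, y):
    """Return an (n_features, K) matrix of FDR values, one column per
    one-vs-one subproblem, plus the list of class pairs defining them."""
    classes = np.unique(y)
    pairs = list(combinations(classes, 2))      # K = c(c-1)/2 subproblems
    scores = np.empty((X.shape[1], len(pairs)))
    for m, (j, k) in enumerate(pairs):
        Xj, Xk = X[y == j], X[y == k]
        mu_j, mu_k = Xj.mean(axis=0), Xk.mean(axis=0)
        var_j, var_k = Xj.var(axis=0), Xk.var(axis=0)
        # FDR_i^{jk} = (mu_i^j - mu_i^k)^2 / ((sigma_i^j)^2 + (sigma_i^k)^2)
        scores[:, m] = (mu_j - mu_k) ** 2 / (var_j + var_k + 1e-12)
    return scores, pairs
```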
The threshold of each subclass problem is computed separately by a dichotomy; the concrete steps are as follows:
Because each feature's classification capacity differs from one subclass problem to another, a threshold is computed for each subclass problem separately, giving K = c(c-1)/2 thresholds in total. To reduce the time complexity of the threshold computation, a fairly simple binary search is used. The computation is illustrated for the threshold of the subclass problem formed by classes j and k;
first set the initial value of the threshold θ to the mean of all features' separating capacities FDR_i^{jk} on this subclass problem; sort all features in descending order of FDR_i^{jk}, and assign the maximum and minimum values to the variables Max and Min.
On this basis, the optimal feature subset is then selected by a greedy strategy driven by classification capacity structure complementarity. The concrete steps are as follows:
After the thresholds are determined, take as the initial feature subset the union, over all subproblems, of the features whose classification separating capacity exceeds the threshold. For each feature in the initial subset, compute from its classification capacity structure vector its total separating capacity, i.e., the weighted sum of the FDR values of the subproblems whose structure vector component is 1, taken as the total classification capacity. Sort the features of the initial subset in descending order of total classification capacity.
Take the features of the initial subset in order, from front to back, and compare each candidate f with all features already in the chosen subset SF. If f's classification capacity structure vector is complementary with that of every feature in SF, take f directly into SF, i.e., SF = SF ∪ {f}. Otherwise, for all candidates whose structure vectors are covered by those of the selected features, compute for each the OR of its sample mis-hit vector with the total sample mis-hit vector, and take into SF the candidate that increases the number of 1s in the total mis-hit vector the most; if no feature changes the total mis-hit vector, take none. Repeat this process until the total sample mis-hit vector is the all-ones vector; the subset SF then obtained is the chosen optimal feature subset.
The concepts and definitions relevant to the invention are as follows.
Subclass problem:
Given a classification problem with n features and c classes, let F = {f_1, ..., f_n} be the feature set and C = {1, ..., c} the set of classes. The 1-vs-1 scheme converts the problem into K = c(c-1)/2 two-class subproblems, each formed by a pair of classes. Each of these two-class subproblems is called a subclass problem.
Tagsort ability:
A measure of a feature's capacity to classify a classification problem. The invention adopts the Fisher Discriminant Ratio of a feature as its classification capacity value on a subproblem, FDR for short:

FDR_i^{jk} = (μ_i^j − μ_i^k)² / ((σ_i^j)² + (σ_i^k)²),

where μ_i^j and μ_i^k are the mean values of feature f_i on the samples of classes j and k, and (σ_i^j)² and (σ_i^k)² are its variances on the two classes of samples.
Tagsort ability structure vector:
The classification capacity FDR values of a feature on all subproblems form a vector, called the classification capacity structure vector of that feature. To reduce computational complexity, the invention adopts a binary-form structure vector, denoted V_i = (v_i^1, v_i^2, ..., v_i^K).
A threshold θ^m must be set for each subproblem m to convert each feature's FDR value on that subproblem into 0 or 1.
In the invention, the component of feature f_i's structure vector for subproblem m is defined as: v_i^m = 1 if FDR_i^m > θ^m, and v_i^m = 0 otherwise.
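Given the FDR matrix from the sketch above and one threshold per subproblem (obtained by the dichotomy described below), this binarization is a single broadcast comparison; the names below are illustrative.

```python
import numpy as np

def binarize(scores, thresholds):
    """scores: (n_features, K) FDR matrix; thresholds: length-K vector of
    per-subproblem thresholds. Returns the binary structure vectors:
    v_i^m = 1 iff FDR_i^m > theta^m."""
    return (scores > thresholds).astype(int)
```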
Sample mistakenly hit vector:
To compute the threshold of each subclass problem and to guide feature subset selection so that the selected subset can classify all samples, the sample mis-hit vector is introduced.
For the subproblem formed by classes 1 and 2: if a sample belongs to class 1 and its value of feature f_i lies between the minimum and maximum of f_i over all class-2 samples, the class-1 sample is considered mis-hit by feature f_i; otherwise it is hit.
The sample mis-hit vector of feature f_i on the m-th subproblem is then denoted H_i^m; it has one 0/1 component per sample of the subproblem, where 0 means the corresponding sample is mis-hit and 1 means it is hit. H_i^m is uniquely determined by f_i and the subproblem.
Concatenating feature f_i's sample mis-hit vectors over all subproblems gives its overall sample mis-hit vector H_i.
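A sketch of the hit test follows. The patent states the in-range rule for class-1 samples against the class-2 range; applying it symmetrically to the class-2 samples is our reading of the definition, and the function name is ours.

```python
import numpy as np

def mis_hit_vector(x, y, j, k):
    """0/1 hit components of one feature (values x over all samples, labels y)
    on the subproblem class j vs class k: a class-j sample is mis-hit (0) when
    its value lies inside the [min, max] range of the feature over the class-k
    samples, and hit (1) otherwise; symmetrically for class-k samples."""
    xj, xk = x[y == j], x[y == k]
    hit_j = (xj < xk.min()) | (xj > xk.max())   # outside the class-k range -> hit
    hit_k = (xk < xj.min()) | (xk > xj.max())
    return np.concatenate([hit_j, hit_k]).astype(int)
```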
1. Cover:
Suppose two features f_i and f_j have structure vectors V_i and V_j. If v_i^m ≥ v_j^m for every subproblem m = 1, ..., K, then feature f_i is said to cover feature f_j, denoted V_i ⊇ V_j; otherwise f_i does not cover f_j, denoted V_i ⊉ V_j.
2. Classification capacity structure vector complementary features:
For features f_i and f_j, if neither V_i ⊇ V_j nor V_j ⊇ V_i holds, the two features are called classification capacity structure vector complementary features.
3. Initial feature subset and optimal feature subset:
Initial feature subset: after the thresholds are determined, the union over all subproblems of the features whose classification separating capacity exceeds the threshold is taken as the initial feature subset.
Optimal feature subset: the feature subset chosen from the initial feature subset according to the principle of maximizing the structural complementarity of the feature structure vectors and the greedy strategy is called the optimal feature subset.
Compared with the prior art, the feature subset selection method based on classification capacity structure vector complementarity disclosed by the invention has the following beneficial effects:
(1) The selection method of the invention not only fully accounts for each feature's different evaluations on the classification of different class pairs, but also follows the principle of maximizing the structural complementarity of classification capacity throughout feature selection. It thus both conforms to the natural law of complementing advantages and exploits the classification information of the features to the fullest, obtaining a better feature subset, effectively reducing redundant features, and improving the accuracy of classification prediction.
(2) The selection method of the invention overcomes the problem that the classification capacity measures in existing feature selection algorithms all use a single value as a comprehensive evaluation of the classification capacity of a feature or feature subset, ignoring each feature's different evaluations on different class pairs. Experimental results show that the feature subset selection method based on classification capacity structure vector complementarity effectively reduces redundant features and improves the accuracy of classification prediction; it is effective.
(3) The invention can be used for classification prediction on cancer data sets, improving prediction accuracy and helping to find the important genes that cause cancer, so as to better study targeted drugs for treating cancer.
Description of the drawings
Fig. 1 shows the layout of a classification problem data set;
Fig. 2 is the algorithm flow chart based on dichotomy calculated threshold.
Embodiments
To explain the implementation of the invention more fully, the invention is further described below with reference to the drawings and embodiments. These embodiments only explain, and do not limit, the scope of the invention.
Embodiment 1
1. read classification problem data set.
A classification problem data set is usually a two-dimensional matrix. For example, the data set of a classification problem with n features, c classes and s samples is arranged as shown in Fig. 1, where x_ij denotes the value of the j-th feature of the i-th sample and y_i denotes the class of the i-th sample. Table 1 shows the expression values of some of the feature genes for some of the samples in the breast cancer (breast) data set: the second row gives the class of each sample, the third row gives the expression values of the first feature on each sample, and so on for the remaining rows; each column is one sample, i.e., the feature expression values and class of one subject. All feature values of each sample in the data set are read into a two-dimensional array X, and the class of each sample into a one-dimensional array Y.
Table 1: expression values of some feature genes for part of the samples in the breast cancer (breast) data set
2. Compute each feature's classification separating capacity value on each subclass problem, i.e., FDR_i^{jk}.
First use the 1-vs-1 scheme to convert the multi-class problem into K = c(c-1)/2 two-class subproblems, each formed by a pair of classes j and k. Then adopt the Fisher Discriminant Ratio as feature f_i's separating capacity value on a subproblem. Feature f_i's separating capacity on the subproblem formed by classes j and k is denoted FDR_i^{jk} and computed as

FDR_i^{jk} = (μ_i^j − μ_i^k)² / ((σ_i^j)² + (σ_i^k)²),

where μ_i^j is the mean value of feature f_i on the class-j samples, μ_i^k is its mean value on the class-k samples, and (σ_i^j)² and (σ_i^k)² are its variances on the two classes of samples.
Following the above computation, each feature f_i's separating capacity FDR_i^{jk} is computed for each of the K subproblems, i = 1, ..., n, 1 ≤ j < k ≤ c. Each feature's separating capacities on all subproblems thus form a vector, V_i = (FDR_i^{12}, FDR_i^{13}, ..., FDR_i^{(c−1)c}), called the feature classification ability structure vector.
3. Compute the threshold of each subclass problem by the dichotomy.
Because each feature's classification capacity differs from one subclass problem to another, a threshold is computed for each subclass problem separately, giving K thresholds in total. To reduce the time complexity of the threshold computation, a fairly simple binary search is used. The computation is illustrated for the threshold of the subproblem formed by classes j and k; the corresponding algorithm flow chart is shown in Fig. 2.
First sort all features in descending order of their FDR values on this subproblem, and assign the maximum and minimum values to the variables Max and Min. Take the mean of all features' FDR values on this subclass problem as the initial threshold θ, and set Flag = 0.
For every feature whose FDR value is below this threshold, set the corresponding component of its classification capacity structure vector to 0; for every feature whose FDR value exceeds the threshold, set the corresponding component to 1.
For all features whose structure component is 1, compute the OR of their mis-hit vectors, obtaining H. If H is the all-ones vector and Flag = 0, take the mean FDR value of the features whose component is 1 as the new threshold θ, and set to 0 the structure components of all features whose FDR value is below this new threshold. Otherwise, if H is not the all-ones vector, take the mean FDR value of the features whose component is 0 as the new threshold θ, update Max to the former θ, set to 1 the structure components of the features whose FDR value exceeds the new threshold, and set Flag = 1.
Again compute the OR of the mis-hit vectors of all features whose structure component is 1, obtaining H.
Repeat this process until H is the all-ones vector and Flag = 1; the threshold θ at that point is recorded as the final threshold.
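A sketch of the dichotomy as we read it, for one subproblem, follows; the iteration cap is a practical guard of our own and not part of the patent, and the array names are illustrative.

```python
import numpy as np

def subproblem_threshold(fdr, hits, max_iters=100):
    """fdr: (n_features,) FDR values on this subproblem; hits:
    (n_features, n_samples) 0/1 hit vectors of each feature on this
    subproblem. Returns the final threshold theta."""
    theta = fdr.mean()                      # initial threshold: mean FDR
    flag = False
    for _ in range(max_iters):
        on = fdr > theta                    # features whose component is 1
        all_hit = hits[on].any(axis=0).all() if on.any() else False
        if all_hit and not flag:
            theta = fdr[on].mean()          # every sample hit: try raising
        elif not all_hit and (~on).any():
            theta = fdr[~on].mean()         # a sample missed: lower, set flag
            flag = True
        else:
            break                           # all samples hit with the flag set
    return theta
```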
4. Optimal feature subset selection based on the greedy strategy and classification capacity structure complementarity; the procedure is shown as Algorithm 1.
After the threshold of each subproblem is determined, take as the initial feature subset CF the union, over all subproblems, of the features whose classification separating capacity exceeds the threshold. For each feature in CF, compute from its classification capacity structure vector its total separating capacity, i.e., the weighted sum of the FDR values of the subproblems whose structure vector component is 1, taken as the total classification capacity. Sort the features of CF in descending order of total classification capacity.
From each feature's classification capacity structure vector, compute its mis-hit vector over all samples: for a given subproblem, if the feature's corresponding structure vector component is 1, the segment for that subproblem's samples is the mis-hit vector computed earlier; if the component is 0, the segment for that subproblem is the all-zero vector. Concatenating the segments of all subproblems gives the feature's overall mis-hit vector.
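A sketch of this assembly step, with illustrative names:

```python
import numpy as np

def full_mis_hit_vector(v, per_sub_hits):
    """Assemble a feature's overall mis-hit vector: subproblems whose
    structure component is 0 contribute all-zero segments. v: length-K binary
    structure vector; per_sub_hits: list of the feature's per-subproblem
    hit vectors (e.g. from mis_hit_vector above)."""
    segments = [h if v[m] == 1 else np.zeros_like(h)
                for m, h in enumerate(per_sub_hits)]
    return np.concatenate(segments)
```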
Take the features of CF in order, from front to back, and compare each candidate f with all features already in the chosen subset SF. If f's structure vector is complementary with that of every feature in SF, take f directly into SF, i.e., SF = SF ∪ {f}. Otherwise, for all candidates whose structure vectors are covered by those of the selected features, compute the OR of each candidate's sample mis-hit vector with the total sample mis-hit vector, and take into SF the candidate that increases the number of 1s in the total mis-hit vector the most. If no feature changes the total mis-hit vector, take none. Repeat until the total sample mis-hit vector is the all-ones vector; SF is then the chosen optimal feature subset.
Algorithm 1: optimal feature subset selection based on the greedy strategy and classification capacity structure complementarity
Input: the classification separating capacity threshold of each subproblem
Output: the optimal feature subset SF
Initialization: SF = ∅; the total sample mis-hit vector Hit is the all-zero vector;
For each feature f: if f's FDR value on some subproblem exceeds that subproblem's threshold, then add f to the initial subset CF;
For each feature f in CF: compute its total separating capacity (the weighted sum of the FDR values of the subproblems whose structure vector component is 1);
Sort the features of CF in descending order of total classification capacity;
Compute each feature's mis-hit vector over all samples;
do
    take the next feature f in CF
    if f is complementary with every feature in SF then
        SF = SF ∪ {f}; Hit = Hit OR H_f
    else
        max = the number of 1s in Hit; best = none
        for each feature g in CF whose structure vector is covered:
            b = the number of 1s in (Hit OR H_g)
            if b > max then max = b; best = g
        if best ≠ none then SF = SF ∪ {best}; Hit = Hit OR H_best
while Hit is not the all-ones vector
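For reference, a runnable Python sketch of Algorithm 1 follows, under our reading of its loop structure; the variable names, tie handling, and the treatment of a pass that adds several complementary features are assumptions rather than the patent's reference implementation.

```python
import numpy as np

def covers(a, b):
    """a covers b when every 1-component of b is also 1 in a."""
    return bool(np.all(a >= b))

def select_optimal_subset(struct_vecs, fdr, feat_hits, candidates):
    """struct_vecs: (n, K) binary structure vectors; fdr: (n, K) FDR values;
    feat_hits: (n, S) 0/1 mis-hit vectors over all samples; candidates:
    indices of the initial subset CF. Returns the selected feature indices."""
    capacity = (struct_vecs * fdr).sum(axis=1)        # total separating capacity
    order = sorted(candidates, key=lambda f: -capacity[f])
    selected = []
    hit = np.zeros(feat_hits.shape[1], dtype=int)     # total sample mis-hit vector
    while not hit.all():
        best, best_gain, progressed = None, 0, False
        for f in order:
            if f in selected:
                continue
            vf = struct_vecs[f]
            if all(not covers(struct_vecs[g], vf) and not covers(vf, struct_vecs[g])
                   for g in selected):
                selected.append(f)                    # complementary with all chosen
                hit = hit | feat_hits[f]
                progressed = True
            else:
                gain = int((hit | feat_hits[f]).sum() - hit.sum())
                if gain > best_gain:
                    best, best_gain = f, gain
        if best is not None:
            selected.append(best)
            hit = hit | feat_hits[best]
        elif not progressed:
            break                                     # nothing improves: stop
    return selected
```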
Embodiment 2
Experimental results and data for the invention:
The experimental data set is the breast cancer (breast) data set, downloaded from http://www.ccbm.jhu.edu/ (see the reference list). The breast data set contains 5 classes, 9216 features and 54 samples. Conventional objective evaluation indices are used to test the algorithm's performance, chiefly the number of selected features and the classification prediction accuracy: the number of selected features is the number of features chosen by a feature selection algorithm, and the classification prediction accuracy is the accuracy obtained by feeding the chosen feature subset into a classifier. To verify the effectiveness of the proposed method, it is compared with existing feature selection methods such as FCBF, CFS, mRMR and Relief. Because the mRMR and Relief methods only evaluate and rank features rather than producing feature subsets, the feature evaluation in CFS, mRMR and Relief is combined with the FCBF subset selection method to obtain the CFS_FCBF, mRMR_FCBF and Relief_FCBF feature subset selection methods, whose results can then be compared with the subsets chosen by the proposed method. To show the necessity of feature selection, the proposed method is also compared with classification prediction using all features directly (Orig). The classifiers used are naive Bayes (NB), support vector machine (SVM), k-nearest neighbours (KNN), decision tree (C4.5), random forest (RF) and simple classification and regression tree (SCart).
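The evaluation protocol described above can be approximated with standard library classifiers, as in this hedged sketch: scikit-learn estimators stand in for the named classifiers (its CART-style DecisionTreeClassifier approximates both C4.5 and SCart), and the 5-fold cross-validation is our assumption, since the text does not state the exact protocol.

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

def evaluate_subset(X, y, selected):
    """Cross-validated accuracy of several classifiers on the chosen subset."""
    Xs = X[:, selected]
    for name, clf in [("NB", GaussianNB()), ("SVM", SVC()),
                      ("KNN", KNeighborsClassifier()),
                      ("Tree", DecisionTreeClassifier()),
                      ("RF", RandomForestClassifier())]:
        acc = cross_val_score(clf, Xs, y, cv=5).mean()
        print(f"{name}: {acc:.3f}")
```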
Table 2 shows, for the features in the optimal subset chosen on the breast data set by the proposed method, their overall rank, their rank within their subproblem and their total FDR value, together with their rank in the comparison methods and whether those methods selected them. As can be seen from Table 2, the features chosen by the proposed method all rank near the top within their subproblems, although some of them rank low in the overall ranking and behind the choices of the other existing methods. For example, features 8715_A8715 and 9063_A9063 rank low in the overall ranking but high within their subproblems; they are therefore chosen by the proposed method but not by the other methods.
Table 3 compares the number of features selected on the breast data set by the different methods. Table 4 compares the classification prediction accuracy of the proposed method with the other methods.
Tables 3 and 4 show that the proposed method outperforms the other four methods: it not only selects relatively fewer features but also obtains the highest classification accuracy with every classifier. It can also be seen that classification prediction after selecting a feature subset with a feature selection algorithm outperforms prediction without feature selection.
All of this indicates that the proposed method is effective and can obtain a good feature subset.
List of references:
A. C. Tan, D. Q. Naiman, L. Xu, R. L. Winslow, D. Geman. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics, 2005, 21(20): 3896–3904.
Table 2: the optimal feature subset chosen by the proposed method on the breast data set
Table 3: the number of features selected by the different methods on the breast data set
Table 4: comparison of the classification prediction accuracy of the proposed method and the other methods

Claims (1)

1. An optimal feature subset selection method based on classification capacity structure vector complementarity, characterized in that the method's concrete steps are as follows:
Step 1: define the binary feature classification capacity structure vector and classification capacity structure complementary features, and compute each feature's classification capacity structure vector;
Step 2: compute the feature classification capacity threshold of each subclass problem by a dichotomy;
Step 3: on the basis of the above steps, select the optimal feature subset by a greedy strategy, following the principle of maximizing the structural complementarity of the features in the selected subset; the computation of the feature classification capacity structure vectors is as follows:
For a classification problem with n features and c classes, first use the 1-vs-1 scheme to convert it into K = c(c-1)/2 two-class subproblems, each formed by a pair of classes j and k (1 ≤ j < k ≤ c); then adopt the Fisher Discriminant Ratio (FDR for short) as the measure of a feature's separating capacity on a subproblem, denoted FDR_i^{jk}, and compute each feature f_i's separating capacity on each of the K subproblems, i = 1, ..., n; finally, using the thresholds obtained by the following threshold calculation method, convert all FDR values into 0 or 1, which yields each feature's classification separating capacity structure vector over the subproblems; the computation of the subclass problem thresholds is as follows:
Because each feature's classification capacity differs from one subclass problem to another, a threshold is computed for each subclass problem separately, giving K thresholds in total; to reduce the time complexity of the threshold computation, a fairly simple binary search is used; the computation is illustrated for the threshold of the subclass problem formed by classes j and k;
first sort all features in descending order of their separating capacities on this subclass problem and assign the maximum and minimum values to the variables Max and Min; take the mean of all features' FDR values on this subclass problem as the initial threshold θ, and set Flag = 0;
for every feature whose FDR value is below this threshold, set the corresponding component of its classification capacity structure vector to 0, and for every feature whose FDR value exceeds the threshold set the corresponding component to 1; for all features whose structure component is 1, compute the OR of their mis-hit vectors, obtaining H; if H is the all-ones vector and Flag = 0, take the mean FDR value of the features whose component is 1 as the new threshold θ, and set to 0 the structure components of all features whose FDR value is below this new threshold; otherwise, if H is not the all-ones vector, take the mean FDR value of the features whose component is 0 as the new threshold θ, update Max to the former θ, set to 1 the structure components of the features whose FDR value exceeds the new threshold, and set Flag = 1; again compute the OR of the mis-hit vectors of all features whose structure component is 1, obtaining H; repeat this process until H is the all-ones vector and Flag = 1, and record the threshold θ at that point as the final threshold;
the optimal feature subset selection steps are as follows:
after the thresholds are determined, take as the initial feature subset the union, over all subproblems, of the features whose classification separating capacity exceeds the threshold;
for each feature in the initial subset, compute from its classification capacity structure vector its total separating capacity, i.e., the weighted sum of the FDR values of the subproblems whose structure vector component is 1, taken as the total classification capacity, and sort the features of the initial subset in descending order of total classification capacity; take the features of the initial subset in order, from front to back, and compare each candidate with all features already in the chosen subset; if the candidate's structure vector is complementary with that of every chosen feature, take it directly into the chosen subset; otherwise, for all candidates whose structure vectors are covered by those of the chosen features, compute the OR of each candidate's sample mis-hit vector with the total sample mis-hit vector, and take into the chosen subset the candidate that increases the number of 1s in the total mis-hit vector the most; if no feature changes the total mis-hit vector, take none; repeat this process until the total sample mis-hit vector is the all-ones vector; the subset then obtained is the chosen optimal feature subset.
CN201510621401.3A 2015-09-25 2015-09-25 Optimal feature subset selection method based on classification capacity structure vector complementarity Expired - Fee Related CN105279520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510621401.3A CN105279520B (en) 2015-09-25 2015-09-25 Optimal feature subset selection method based on classification capacity structure vector complementarity


Publications (2)

Publication Number Publication Date
CN105279520A 2016-01-27
CN105279520B CN105279520B (en) 2018-07-24

Family

ID=55148501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510621401.3A Expired - Fee Related CN105279520B (en) 2015-09-25 2015-09-25 Optimal feature subset selection method based on classification capacity structure vector complementarity

Country Status (1)

Country Link
CN (1) CN105279520B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090326923A1 (en) * 2006-05-15 2009-12-31 Panasonic Corporation Method and apparatus for named entity recognition in natural language
CN101154266A (en) * 2006-09-25 2008-04-02 郝红卫 Dynamic selection and circulating integration method for categorizer
CN101136141A (en) * 2007-10-12 2008-03-05 清华大学 Vehicle type classification method based on single frequency continuous-wave radar

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SOMOL et al.: "Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality", IEEE *
XIE Juanying (谢娟英) et al.: "Feature selection algorithm based on the discernibility of feature subsets and support vector machines", Chinese Journal of Computers (《计算机学报》) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991296A (en) * 2017-04-01 2017-07-28 大连理工大学 Ensemble classifier method based on the greedy feature selecting of randomization
CN106991296B (en) * 2017-04-01 2019-12-27 大连理工大学 Integrated classification method based on randomized greedy feature selection
CN109117956A (en) * 2018-07-05 2019-01-01 浙江大学 A kind of determination method of optimal feature subset
CN109117956B (en) * 2018-07-05 2021-08-24 浙江大学 Method for determining optimal feature subset
CN109523056A (en) * 2018-10-12 2019-03-26 中国平安人寿保险股份有限公司 Object ability classification prediction technique and device, electronic equipment, storage medium
CN109523056B (en) * 2018-10-12 2023-11-07 中国平安人寿保险股份有限公司 Object capability classification prediction method and device, electronic equipment and storage medium
CN112802555A (en) * 2021-02-03 2021-05-14 南开大学 Complementary differential expression gene selection method based on mvAUC
CN112802555B (en) * 2021-02-03 2022-04-19 南开大学 Complementary differential expression gene selection method based on mvAUC

Also Published As

Publication number Publication date
CN105279520B (en) 2018-07-24

Similar Documents

Publication Publication Date Title
Zheng et al. A novel hybrid algorithm for feature selection based on whale optimization algorithm
CN105279520A Optimal feature subset selection method based on classification capacity structure vector complementarity
Allahverdipour et al. An improved k-nearest neighbor with crow search algorithm for feature selection in text documents classification
CN116959725A (en) Disease risk prediction method based on multi-mode data fusion
Nunthanid et al. Parameter-free motif discovery for time series data
Li et al. Nonlinear semi-supervised metric learning via multiple kernels and local topology
Khaleel et al. An automatic text classification system based on genetic algorithm
Morovvat et al. An ensemble of filters and wrappers for microarray data classification
Liping Feature selection algorithm based on conditional dynamic mutual information
Garcıa et al. On the suitability of numerical performance measures for class imbalance problems
Abd-el Fattah et al. A TOPSIS based method for gene selection for cancer classification
Kumar et al. Review of gene subset selection using modified k-nearest neighbor clustering algorithm
Ghaderi Zefrehi et al. Threshold prediction for detecting rare positive samples using a meta-learner
CN106021929A (en) Filter characteristic selection method based on subclass problem classification ability measurement
CN113936246A (en) Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning
Wang et al. Entropic feature discrimination ability for pattern classification based on neural IAL
Sheikhi et al. A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
Chen Research on Cost-sensitive Classification Methods for Imbalanced Data
Khanchouch et al. A comparative study of multi-SOM algorithms for determining the optimal number of clusters
Zhang et al. Source-Free Domain Adaptation for Rotating Machinery Cross-Domain Fault Diagnosis with Neighborhood Reciprocity Clustering
Georgiev et al. Feature selection using Gustafson-Kessel fuzzy algorithm in high dimension data clustering
Kianmehr et al. Effective classification by integrating support vector machine and association rule mining
Li et al. A novel LASSO-based feature weighting selection method for microarray data classification
Tewari et al. Soccer Analytics using Machine Learning
Cebron et al. Active learning in parallel universes

Legal Events

Code Event
C06 / PB01 Publication
C10 / SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 2018-07-24; termination date: 2019-09-25)