CN105279520A - Optimal character subclass selecting method based on classification ability structure vector complementation - Google Patents
- Publication number
- CN105279520A CN105279520A CN201510621401.3A CN201510621401A CN105279520A CN 105279520 A CN105279520 A CN 105279520A CN 201510621401 A CN201510621401 A CN 201510621401A CN 105279520 A CN105279520 A CN 105279520A
- Authority
- CN
- China
- Prior art keywords
- feature
- classification
- vector
- capacity
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
Abstract
The invention addresses the problem that most existing methods use a single value as the classification-ability evaluation criterion of a feature or feature subset, and provides an optimal feature subset selection method based on classification-ability structure vector complementation. The method defines the binary-form feature classification-ability structure vector V=(V1:V2: ...:Vn) and classification-ability structure vector complementary features, computes by bisection the feature separating-ability threshold of each subclass problem, and on this basis selects the optimal feature subset according to the structure complementation maximization principle among the chosen features and a greedy strategy. The method fully considers each feature's different evaluations of the classification abilities for different classes, and follows the structure complementation maximization principle throughout feature selection. It conforms to the natural law of complementary advantages and maximizes the use of feature classification information, thereby obtaining a better feature subset, effectively reducing redundant features, and improving classification prediction accuracy.
Description
Technical field
The invention belongs to the field of machine learning and pattern recognition, and specifically proposes a rational and effective feature subset selection method.
Background art
Feature selection is one of the two main methods of dimensionality reduction. It plays a vital role in machine learning and pattern recognition, is one of the fundamental problems studied in those fields, and is a crucial data preprocessing step in building classifiers. Feature selection chooses, according to some evaluation criterion, a subset of the original feature set that is significant for classification and removes irrelevant or redundant features, thereby reducing the dimension of the original space to m dimensions, much smaller than the original dimension. With the rapid development of the internet and of high-throughput technologies, the era of big data has arrived: data are enormous in quantity and complex in features, which makes research on feature selection algorithms all the more important. In recent years, one of the principal problems faced by feature selection research has been its application to data sets containing thousands of features. Feature selection can make data easier to understand, reduce measurement and storage requirements, reduce training and execution time, and improve prediction performance. In this research direction, how to evaluate the classification ability of features and how to obtain an effective feature subset are key questions.
In recent years, scholars have carried out a great deal of research on feature selection, and many results have been published in domestic journals. These feature selection algorithms share one point in common: every classification-ability measure assigns a feature or feature subset a single score describing the size of its classification ability. It is generally believed that a feature with a larger score has stronger classification ability than one with a smaller score, so features with larger scores are chosen preferentially. However, some work has shown that certain features with small scores should also be selected, and that combinations of features with high classification-ability scores do not always yield the best classification results. Representing a feature's classification ability with a single value gives only an overall evaluation of that ability, and ignores each feature's different evaluations of the classification abilities for different classes.
Summary of the invention
To solve the problems of the above existing methods, the present invention proposes a new optimal feature subset selection method based on classification-ability structure vector complementation. The invention obtains a vectorized classification ability by evaluating each feature's classification ability on the different subclass problems, i.e., it represents a feature's separating ability for the different subclass problems with multiple values, and then selects features or feature subsets according to the principle of classification-ability structure vector complementation. The invention is suitable for classification prediction on multi-class data sets whose number of samples is much smaller than the number of features, such as cancer data sets. Taking a breast cancer data set as an example, the effectiveness of the invention is illustrated in the specific embodiments.
The present invention defines the binary-form feature classification-ability structure vector and classification-ability structure complementary features, computes the threshold of each subclass problem by bisection, and on this basis selects the optimal feature subset according to the structure complementation maximization principle among the chosen features and a greedy strategy. This method both conforms to the natural law of complementary advantages and exploits feature classification information to the utmost, thereby obtaining a better feature subset. Research on feature selection algorithms that consider classification-ability structure complementation is therefore of great significance.
To achieve the above object, the invention discloses the following technical content:
A method for selecting an optimal feature subset based on classification-ability structure vector complementation, characterized in that the method first defines the binary-form feature classification-ability structure vector and completes the computation of each feature's structure vector. The concrete steps are as follows:
For a classification problem with n features and c classes, first adopt the 1-vs-1 scheme to convert it into K = c(c-1)/2 two-class subproblems, each formed by two of the classes. Then adopt the Fisher discriminant ratio (FDR for short) as the measure of a feature's ability to separate a subproblem, denoted f_ij for feature i on subproblem j, and compute f_ij for every feature i = 1, ..., n and every subproblem j = 1, ..., K. Finally, using the threshold obtained by the threshold computation method below, convert every f_ij into 0 or 1, thereby obtaining each feature's binary classification separating-ability structure vector over the subproblems.
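The structure-vector computation just described can be sketched as follows. This is a minimal illustration, not the patented implementation: the function and variable names are chosen here, the array layout (samples × features) is an assumption, and the thresholds are passed in directly, whereas the patent obtains them by the bisection procedure described below.

```python
import numpy as np
from itertools import combinations

def fdr(x1, x2):
    """Fisher discriminant ratio of one feature on two sample groups."""
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var(), x2.var()
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)  # small eps guards against 0/0

def structure_vectors(X, y, thresholds):
    """Binary classification-ability structure vectors.

    X: (samples, features) data matrix; y: class labels.
    thresholds: one FDR threshold per 1-vs-1 subproblem.
    Returns a (features, K) 0/1 matrix, K = c*(c-1)/2 subproblems.
    """
    classes = np.unique(y)
    subs = list(combinations(classes, 2))        # the 1-vs-1 subproblems
    V = np.zeros((X.shape[1], len(subs)), dtype=int)
    for j, (a, b) in enumerate(subs):
        Xa, Xb = X[y == a], X[y == b]
        for i in range(X.shape[1]):
            V[i, j] = int(fdr(Xa[:, i], Xb[:, i]) >= thresholds[j])
    return V
```

For a 3-class problem this yields K = 3 subproblems, and each row of the returned matrix is one feature's structure vector.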
The threshold of each subclass problem is computed separately by bisection. The concrete steps are as follows:
Because each feature's classification ability differs across subclass problems, a threshold is computed for each subclass problem, giving K thresholds in total. To reduce the time complexity of threshold calculation, a simple binary search is used. Taking the threshold of one subclass problem, formed by two given classes, as an example, the computation proceeds as follows:
First set the initial threshold to the mean of all features' separating-ability (FDR) values on this subclass problem. Sort all features in descending order of their separating-ability values, and assign the maximum and minimum values to the variables Max and Min.
On this basis, the optimal feature subset is then selected using the greedy strategy and classification-ability structure complementation. The concrete steps are as follows:
After the thresholds are determined, the union of the features whose separating ability exceeds the threshold in some subproblem is taken as the initial feature subset. For each feature in the initial subset, compute its total separating ability from its classification-ability structure vector: sum, as a weighted sum, the FDR values of the subproblems whose structure-vector component is 1; this is the feature's total classification ability. Sort the features of the initial subset in descending order of total classification ability.
Examine the features of the initial subset from front to back, comparing each with all features already in the chosen subset. If a feature is structure-vector complementary to every feature already chosen, add it to the subset directly. Otherwise, for all the features related to it by the structure-vector cover relation, compute the OR of each such feature's sample mishit vector with the total sample mishit vector, and choose the feature that increases the number of 1s in the total mishit vector the most; if no feature changes the total mishit vector, choose none. Repeat this process until the total sample mishit vector is the all-ones vector; the resulting subset is the selected optimal feature subset.
Concepts and definitions relevant to the present invention.
Subclass problem:
Given a classification problem with n features and c classes, with feature set F and class set C, the 1-vs-1 scheme converts it into K = c(c-1)/2 two-class subproblems, each formed by two of the classes. Each such two-class subproblem is called a subclass problem.
Feature classification ability:
A measure of a feature's ability to classify a classification problem. The present invention adopts the Fisher discriminant ratio of a feature, FDR = (μ₁ − μ₂)² / (σ₁² + σ₂²), as the feature's classification-ability value for a subproblem, referred to as the FDR value, where μ₁ and μ₂ are the mean values of the feature on the samples of the two classes of the subproblem, and σ₁², σ₂² are its variances on the two classes of samples.
Feature classification-ability structure vector:
The classification-ability FDR values of a feature on all subproblems form a vector, called the classification-ability structure vector of the feature. To simplify computation, the present invention adopts the binary-form structure vector. A threshold must be set to convert each feature's FDR value on each subproblem into 0 or 1: in the present invention, a feature's structure-vector component for a subproblem is defined as 1 if the feature's FDR value on that subproblem is not less than the subproblem's threshold, and 0 otherwise.
Sample mishit vector:
To compute the threshold of each subclass problem and to carry out feature subset selection, the sample mishit vector is introduced, so that the selected subset can classify all samples.
If a sample of class 1 has a feature value that lies between the minimum and maximum of that feature's values over all samples of class 2, this class-1 sample is considered mishit by the feature; otherwise it is hit.
The sample mishit vector of a feature on a subproblem is then recorded as a binary vector with one component per sample: 0 means the corresponding sample is mishit, and 1 means it is hit. This vector is uniquely determined by the feature and the subproblem.
Concatenating a feature's sample mishit vectors over all subproblems gives the sample mishit vector of that feature.
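The definition above can be sketched for one subproblem as follows. This is an illustrative sketch with invented names; the patent states the rule for a class-1 sample against the class-2 range, and applying the symmetric rule to class-2 samples against the class-1 range is an assumption made here for completeness.

```python
import numpy as np

def mishit_vector(xa, xb):
    """Sample mishit vector of one feature on a two-class subproblem.

    xa, xb: the feature's values on the samples of the two classes.
    A sample is scored 1 (hit) when its value falls outside the other
    class's [min, max] range, and 0 (mishit) when it falls inside it.
    Components list the class-a samples first, then the class-b samples.
    """
    hit_a = [int(v < xb.min() or v > xb.max()) for v in xa]
    hit_b = [int(v < xa.min() or v > xa.max()) for v in xb]
    return np.array(hit_a + hit_b, dtype=int)
```

For example, with class-a values (0, 1, 4) and class-b values (3, 5), the value 4 lies inside the class-b range [3, 5] and is therefore mishit, while 0 and 1 lie outside it and are hit.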
1. Cover:
Suppose two features f_i and f_j have structure vectors V(f_i) and V(f_j). If every component of V(f_j) that equals 1 also equals 1 in V(f_i), then feature f_i is said to cover feature f_j; otherwise f_i does not cover f_j.
2. Classification-ability structure vector complementary features:
For features f_i and f_j, if neither structure vector covers the other, i.e., each feature has separating ability (component 1) on some subproblem where the other does not, the two features are called classification-ability structure vector complementary features.
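Under the componentwise reading of the two definitions above, cover and complementarity reduce to simple comparisons on 0/1 vectors. The sketch below encodes that reading; the function names are illustrative and the "neither covers the other" characterization of complementarity follows the definition as reconstructed here.

```python
import numpy as np

def covers(v, w):
    """v covers w when every 1-component of w is also 1 in v."""
    return bool(np.all(v >= w))

def complementary(v, w):
    """Structure-vector complementary: neither vector covers the other,
    i.e. each feature separates some subproblem the other does not."""
    return (not covers(v, w)) and (not covers(w, v))
```

For instance, (1, 0, 0) and (0, 1, 0) are complementary, whereas (1, 1, 0) covers (1, 0, 0), so that pair is not.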
3. Initial feature subset and optimal feature subset
Initial feature subset: after the thresholds are determined, the union of the features whose separating ability exceeds the threshold in some subproblem is taken as the initial feature subset.
Optimal feature subset: the subset chosen from the initial feature subset according to the feature classification-ability structure vector complementation maximization principle and the greedy strategy is called the optimal feature subset.
Compared with the prior art, the feature subset selection method based on classification-ability structure vector complementation disclosed by the invention has the following beneficial effects:
(1) The selection method of the invention not only fully considers each feature's different evaluations of the classification abilities for different classes, but also follows the classification-ability structure complementation maximization principle during feature selection. The method both conforms to the natural law of complementary advantages and exploits feature classification information to the utmost, thereby obtaining a better feature subset, effectively reducing redundant features and improving classification prediction accuracy.
(2) The selection method of the invention overcomes the problem that the classification-ability measures in existing feature selection algorithms all use a single value as the overall evaluation of the classification ability of a feature or feature subset, ignoring each feature's different evaluations for different classes. Experimental results show that the method effectively reduces redundant features and improves classification prediction accuracy, and is effective.
(3) The invention can be used for classification prediction on cancer data sets, improving prediction accuracy; it helps to find the important genes that cause cancer and hence to better study targeted drugs for cancer treatment.
Description of the drawings
Fig. 1 shows the data set of a classification problem;
Fig. 2 is the flow chart of the bisection-based threshold calculation algorithm.
Embodiment
To explain the implementation of the present invention more fully, the invention is further described below with reference to the drawings and embodiments. These embodiments are only illustrative and do not limit the scope of the invention.
Embodiment 1
1. Read the classification problem data set.
The data set of a classification problem is usually a two-dimensional matrix. For example, the data set of a classification problem with n features, c classes and m samples is shown in Fig. 1, where each entry represents the value of one feature for one sample, and an additional entry represents the class of each sample. Table 1 shows the expression values of some feature genes for some samples of the breast cancer (breast) data set, where the second row gives the sample classes, the third row gives the expression values of the first feature on each sample, and so on for the other rows; each column is one sample, i.e., one person's feature expression values and class. All feature values of each sample in the data set are read into a two-dimensional array, and the class of each sample is read into a one-dimensional array.
Table 1: expression values of some feature genes for some samples of the breast cancer (breast) data set
2. Compute each feature's separating-ability value for each subclass problem, i.e., its FDR value.
First adopt the 1-vs-1 scheme to convert the multi-class problem into K = c(c-1)/2 two-class subproblems, each formed by two of the classes. Then adopt the Fisher discriminant ratio as the feature's separating-ability value for each subproblem. The separating ability of feature i for subproblem j, denoted f_ij, is computed as
f_ij = (μ₁ − μ₂)² / (σ₁² + σ₂²),
where μ₁ and μ₂ are the means of feature i on the samples of the two classes of subproblem j, and σ₁², σ₂² are the corresponding variances on the two classes of samples.
According to the above computation, compute the separating ability f_ij of each feature i = 1, ..., n for each subproblem j = 1, ..., K. The separating abilities of a feature for all subproblems then form a vector (f_i1, f_i2, ..., f_iK), called the feature classification-ability structure vector.
3. Compute the threshold of each subclass problem by bisection.
Because each feature's classification ability differs across subclass problems, a threshold is computed for each of the K subclass problems. To reduce the time complexity of threshold calculation, a simple binary search is used. Taking the threshold of one subclass problem as an example, the computation proceeds as follows; the corresponding algorithm flow chart is shown in Fig. 2.
First sort all features in descending order of their separating-ability (FDR) values on this subproblem, and assign the maximum and minimum values to the variables Max and Min. Take the mean of all features' FDR values on this subclass problem as the initial threshold T, and set Flag = 0.
For every feature whose FDR value is below the threshold, clear the corresponding component of its classification-ability structure vector to 0; for every feature whose FDR value is above the threshold, set the corresponding component to 1.
For all features whose structure-vector component is 1, compute the OR of their mishit vectors, denoted H.
If H is the all-ones vector and Flag = 0, take the mean of the FDR values of the features whose component is 1 as the new threshold T, and clear to 0 the component of every feature whose FDR value is below this new threshold. Otherwise, if H is not the all-ones vector, take the mean of the FDR values of the features whose component is 0 as the new threshold T, update Max to the former T, set to 1 the component of every feature whose FDR value is above this new threshold, and set Flag = 1.
Again compute the OR of the mishit vectors of all features whose component is 1.
Repeat this process until H is the all-ones vector and Flag = 1. The threshold T at that point is recorded as the final threshold.
4. Select the optimal feature subset based on the greedy strategy and classification-ability structure complementation; the procedure is shown as Algorithm 1.
After the threshold of each subproblem is determined, the union of the features whose separating ability exceeds the threshold in some subproblem is taken as the initial feature subset. For each feature in the initial subset, compute its total separating ability from its classification-ability structure vector: sum, as a weighted sum, the FDR values of the subproblems whose structure-vector component is 1; this is the feature's total classification ability. Sort the features of the initial subset in descending order of total classification ability.
From the structure vector, compute each feature's mishit vector over all samples: for a given subproblem, if the feature's structure-vector component is 1, the mishit vector over that subproblem's samples is the mishit vector computed earlier; if the component is 0, the mishit vector of that subproblem is the zero vector. Concatenating the mishit vectors of all subproblems gives the feature's mishit vector.
Examine the features of the initial subset from front to back, comparing each with all features already in the chosen subset. If a feature is structure-vector complementary to every chosen feature, add it to the subset directly. Otherwise, for all the features related to it by the structure-vector cover relation, compute the OR of each such feature's sample mishit vector with the total sample mishit vector, and choose the feature that increases the number of 1s in the total mishit vector the most; if no feature changes the total mishit vector, choose none. Repeat this process until the total sample mishit vector is the all-ones vector; the resulting subset is the selected optimal feature subset.
Algorithm 1: optimal feature subset selection based on the greedy strategy and classification-ability structure complementation
Input: the classification separating-ability threshold of each subproblem;
Output: the optimal feature subset S.
Initialize S to the empty set and the total sample mishit vector Hit to the zero vector;
For each feature f: if its FDR value exceeds the threshold in some subproblem, then add f to the initial subset CF;
For each feature in CF: compute its total separating ability;
Sort the features of CF in descending order of total classification ability;
Compute the mishit vector of each feature in CF over all samples;
do
  For each feature f in CF:
    If f is complementary to every feature in S, then add f to S and set Hit = Hit OR mishit(f);
    else
      max = the number of 1s in Hit;
      For each feature g related to f by the cover relation:
        compute b = the number of 1s in Hit OR mishit(g);
        if b > max, then record g as the best candidate and set max = b;
      If a best candidate was found, then add it to S and set Hit = Hit OR its mishit vector;
while Hit is not the all-ones vector.
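The selection loop of Algorithm 1 can be sketched as follows. This is a deliberately simplified sketch: the names are invented, the candidate handling in the non-complementary branch is condensed to "add the feature if it is complementary to all chosen features or improves sample coverage", and the outer do-while repetition of the patent is folded into a single ordered pass; it illustrates the greedy principle rather than reproducing Algorithm 1 exactly.

```python
import numpy as np

def greedy_select(V, hits, order):
    """Greedy subset selection by structure complementation.

    V: (features, K) binary structure vectors.
    hits: (features, samples) concatenated mishit vectors.
    order: feature indices sorted by descending total classification ability.
    Returns the indices of the chosen features.
    """
    chosen = []
    total = np.zeros(hits.shape[1], dtype=int)    # total sample mishit vector
    for f in order:
        # complementary to all chosen features: neither covers the other
        comp = all(not np.all(V[f] >= V[g]) and not np.all(V[g] >= V[f])
                   for g in chosen)
        # does the feature hit at least one sample not yet hit?
        gain = np.bitwise_or(total, hits[f]).sum() > total.sum()
        if comp or gain:
            chosen.append(f)
            total |= hits[f]
        if total.all():                            # every sample is hit
            break
    return chosen
```

In the example below, the second feature is covered by the first and adds no new hit samples, so it is skipped, while the third feature is complementary and completes the coverage.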
Embodiment 2
Experimental results and data of the present invention:
The experimental data set of the present invention is the breast cancer (breast) data set, downloaded from http://www.ccbm.jhu.edu/ (see the list of references). The breast data set contains 5 classes, 9216 features and 54 samples. Traditional objective evaluation indices are used to test the performance of the algorithm, chiefly the number of selected features and the classification prediction accuracy, where the number of selected features is the number of features chosen by the feature selection algorithm, and the classification prediction accuracy is the accuracy obtained by using the chosen feature subset as the input of a classifier. To verify the effectiveness of the proposed method, it is compared with existing feature selection methods such as FCBF, CFS, mRMR and Relief. Because mRMR and Relief only evaluate features and produce a ranking rather than a feature subset, the feature evaluation methods in CFS, mRMR and Relief are combined with the FCBF feature subset selection method to obtain the CFS_FCBF, mRMR_FCBF and Relief_FCBF feature subset selection methods, so that their subsets can be compared with those chosen by the present method. To show the necessity of feature selection, the proposed method is also compared with classification prediction using all features directly (Orig). The classifiers used are naive Bayes (NB), support vector machine (SVM), k-nearest neighbour (KNN), decision tree (C4.5), random forest (RF) and simple classification and regression tree (SCart).
Table 2 shows, for the features in the optimal feature subset chosen by the proposed method on the breast data set, their ranks within their subproblems and by total FDR value, together with their ranks in the other comparison methods and whether they were selected. As can be seen from Table 2, the features chosen by the proposed method all rank near the top within their subproblems, although some rank low in the final ranking and in the rankings of the other existing methods. For example, features 8715_A8715 and 9063_A9063 rank low in the final ranking but high within their subproblems; they are therefore chosen by the proposed method but not by the other methods.
Table 3 compares the number of features selected by the different methods on the breast data set. Table 4 compares the classification prediction accuracy of the proposed method with that of the other methods.
From Tables 3 and 4 it can be seen that the proposed method is superior to the other four methods: it selects relatively fewer features and obtains the highest classification accuracy on every classifier. It can also be seen that classifying after selecting a feature subset with a feature selection algorithm outperforms classification without feature selection.
All this shows that the proposed method is effective and can obtain a good feature subset.
List of references:
A. C. Tan, D. Q. Naiman, L. Xu, R. L. Winslow, D. Geman. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics, 2005, 21(20): 3896–3904.
Table 2: the optimal feature subset chosen by the proposed method on the breast data set
Table 3: the number of features selected by the different methods on the breast data set
Table 4: comparison of the classification prediction accuracy of the proposed method with the other methods
Claims (1)
1. A method for selecting an optimal feature subset based on classification-ability structure vector complementation, characterized in that the concrete steps of the method are as follows:
Step 1: define the binary-form feature classification-ability structure vector and classification-ability structure complementary features, and compute the structure vector of each feature;
Step 2: compute the feature classification-ability threshold of each subclass problem by bisection;
Step 3: on the basis of the above steps, select the optimal feature subset according to the structure complementation maximization principle among the chosen features and a greedy strategy;
wherein the computation of the feature classification-ability structure vector proceeds as follows: for a classification problem with n features and c classes, first adopt the 1-vs-1 scheme to convert it into K = c(c-1)/2 two-class subproblems, each formed by two of the classes; then adopt the Fisher discriminant ratio as the feature's separating-ability value for each subproblem, referred to as the FDR value and denoted f_ij for feature i on subproblem j, and compute f_ij for every feature i = 1, ..., n and every subproblem j = 1, ..., K; finally, using the threshold obtained by the threshold computation method below, convert every f_ij into 0 or 1, thereby obtaining each feature's classification separating-ability structure vector over the subproblems;
the threshold of a subclass problem is computed as follows: because each feature's classification ability differs across subclass problems, a threshold is computed separately for each of the K subclass problems; to reduce the time complexity of threshold calculation, a simple binary search is used; taking the threshold of the subclass problem formed by two given classes as an example, the computation proceeds as follows:
first sort all features in descending order of their separating-ability (FDR) values on this subproblem, and assign the maximum and minimum values to the variables Max and Min; take the mean of all features' FDR values on this subclass problem as the initial threshold T, and set Flag = 0;
for every feature whose FDR value is below the threshold, clear the corresponding component of its classification-ability structure vector to 0, and for every feature whose FDR value is above the threshold, set the corresponding component to 1; for all features whose component is 1, compute the OR of their mishit vectors, denoted H;
if H is the all-ones vector and Flag = 0, take the mean of the FDR values of the features whose component is 1 as the new threshold T, and clear to 0 the component of every feature whose FDR value is below this new threshold; otherwise, if H is not the all-ones vector, take the mean of the FDR values of the features whose component is 0 as the new threshold T, update Max to the former T, set to 1 the component of every feature whose FDR value is above this new threshold, and set Flag = 1; again compute the OR of the mishit vectors of all features whose component is 1; repeat this process until H is the all-ones vector and Flag = 1, and record the threshold T at that point as the final threshold;
The described optimal feature subset selection method proceeds as follows:

After the threshold is determined, the union of the features whose classification discrimination capacity exceeds the threshold in any subproblem is taken as the initial feature subset. For each feature in the initial feature subset, its total discrimination capacity is computed from its classification capacity structure vector as the weighted sum of the FDR values of the subproblems whose structure-vector components are 1, and the features of the initial subset are sorted in descending order of this total classification capacity. The features are then examined in order from front to back and compared with all features already chosen: if a feature's classification capacity structure vector is complementary to those of all features in the selected subset, the feature is added to the subset directly. Otherwise, for every feature whose classification capacity structure vector is covered, the OR of that feature's sample mistakenly-hit vector with the total sample mistakenly-hit vector is computed, and the feature that increases the number of 1s in the total vector the most is added to the subset; if no feature can change the total vector, none is added. This process is repeated until the total sample mistakenly-hit vector is the all-ones vector, at which point the selected subset is the chosen optimal feature subset.
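The greedy, complementarity-maximising step above amounts to a greedy set cover over sample vectors. The sketch below is an illustration under stated assumptions, not the patent's implementation: features are assumed already sorted by total classification capacity (best first), and each round keeps the feature that turns the most remaining 0s of the running OR vector into 1s (function and variable names are hypothetical):

```python
def greedy_complementary_subset(hit):
    """Greedy sketch of the complementarity-based subset selection.

    hit[f][s] == 1 when feature f correctly handles sample s.  Repeatedly
    pick the feature that adds the most newly-covered samples to the total
    vector; stop when the total vector is all ones or no feature helps.
    """
    n = len(hit[0])
    total = [0] * n            # running OR over the chosen features
    chosen = []
    remaining = list(range(len(hit)))
    while remaining and not all(total):
        # gain = how many 0-components of `total` this feature flips to 1
        gains = [(sum(1 for s in range(n) if hit[f][s] and not total[s]), f)
                 for f in remaining]
        best_gain, best = max(gains, key=lambda g: g[0])  # earliest on ties
        if best_gain == 0:     # no feature changes the total vector
            break
        chosen.append(best)
        remaining.remove(best)
        total = [t | hit[best][s] for s, t in enumerate(total)]
    return chosen
```

Because ties are broken in favour of the earlier (higher-capacity) feature, a feature whose hit vector is fully covered by an already-chosen one is never selected, which is how redundant features are dropped.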
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510621401.3A CN105279520B (en) | 2015-09-25 | 2015-09-25 | Optimal feature subset choosing method based on classification capacity structure vector complementation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105279520A true CN105279520A (en) | 2016-01-27 |
CN105279520B CN105279520B (en) | 2018-07-24 |
Family
ID=55148501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510621401.3A Expired - Fee Related CN105279520B (en) | 2015-09-25 | 2015-09-25 | Optimal feature subset choosing method based on classification capacity structure vector complementation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105279520B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991296A (en) * | 2017-04-01 | 2017-07-28 | 大连理工大学 | Ensemble classifier method based on the greedy feature selecting of randomization |
CN109117956A (en) * | 2018-07-05 | 2019-01-01 | 浙江大学 | A kind of determination method of optimal feature subset |
CN109523056A (en) * | 2018-10-12 | 2019-03-26 | 中国平安人寿保险股份有限公司 | Object ability classification prediction technique and device, electronic equipment, storage medium |
CN112802555A (en) * | 2021-02-03 | 2021-05-14 | 南开大学 | Complementary differential expression gene selection method based on mvAUC |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101136141A (en) * | 2007-10-12 | 2008-03-05 | 清华大学 | Vehicle type classification method based on single frequency continuous-wave radar |
CN101154266A (en) * | 2006-09-25 | 2008-04-02 | 郝红卫 | Dynamic selection and circulating integration method for categorizer |
US20090326923A1 (en) * | 2006-05-15 | 2009-12-31 | Panasonic Corporation | Method and apparatus for named entity recognition in natural language
2015-09-25: application CN201510621401.3A filed; granted as CN105279520B; status: not in force (Expired - Fee Related)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090326923A1 (en) * | 2006-05-15 | 2009-12-31 | Panasonic Corporation | Method and apparatus for named entity recognition in natural language |
CN101154266A (en) * | 2006-09-25 | 2008-04-02 | 郝红卫 | Dynamic selection and circulating integration method for categorizer |
CN101136141A (en) * | 2007-10-12 | 2008-03-05 | 清华大学 | Vehicle type classification method based on single frequency continuous-wave radar |
Non-Patent Citations (2)
Title |
---|
SOMOL et al.: "Evaluating stability and comparing output of feature selectors that optimize feature subset cardinality", IEEE * |
Xie Juanying et al.: "Feature selection algorithm based on feature subset discernibility and support vector machines", 《计算机学报》 (Chinese Journal of Computers) * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991296A (en) * | 2017-04-01 | 2017-07-28 | 大连理工大学 | Ensemble classifier method based on the greedy feature selecting of randomization |
CN106991296B (en) * | 2017-04-01 | 2019-12-27 | 大连理工大学 | Integrated classification method based on randomized greedy feature selection |
CN109117956A (en) * | 2018-07-05 | 2019-01-01 | 浙江大学 | A kind of determination method of optimal feature subset |
CN109117956B (en) * | 2018-07-05 | 2021-08-24 | 浙江大学 | Method for determining optimal feature subset |
CN109523056A (en) * | 2018-10-12 | 2019-03-26 | 中国平安人寿保险股份有限公司 | Object ability classification prediction technique and device, electronic equipment, storage medium |
CN109523056B (en) * | 2018-10-12 | 2023-11-07 | 中国平安人寿保险股份有限公司 | Object capability classification prediction method and device, electronic equipment and storage medium |
CN112802555A (en) * | 2021-02-03 | 2021-05-14 | 南开大学 | Complementary differential expression gene selection method based on mvAUC |
CN112802555B (en) * | 2021-02-03 | 2022-04-19 | 南开大学 | Complementary differential expression gene selection method based on mvAUC |
Also Published As
Publication number | Publication date |
---|---|
CN105279520B (en) | 2018-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zheng et al. | A novel hybrid algorithm for feature selection based on whale optimization algorithm | |
CN105279520A (en) | Optimal character subclass selecting method based on classification ability structure vector complementation | |
Allahverdipour et al. | An improved k-nearest neighbor with crow search algorithm for feature selection in text documents classification | |
CN116959725A (en) | Disease risk prediction method based on multi-mode data fusion | |
Nunthanid et al. | Parameter-free motif discovery for time series data | |
Li et al. | Nonlinear semi-supervised metric learning via multiple kernels and local topology | |
Khaleel et al. | An automatic text classification system based on genetic algorithm | |
Morovvat et al. | An ensemble of filters and wrappers for microarray data classification | |
Liping | Feature selection algorithm based on conditional dynamic mutual information | |
Garcıa et al. | On the suitability of numerical performance measures for class imbalance problems | |
Abd-el Fattah et al. | A TOPSIS based method for gene selection for cancer classification | |
Kumar et al. | Review of gene subset selection using modified k-nearest neighbor clustering algorithm | |
Ghaderi Zefrehi et al. | Threshold prediction for detecting rare positive samples using a meta-learner | |
CN106021929A (en) | Filter characteristic selection method based on subclass problem classification ability measurement | |
CN113936246A (en) | Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning | |
Wang et al. | Entropic feature discrimination ability for pattern classification based on neural IAL | |
Sheikhi et al. | A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection | |
Chen | Research on Cost-sensitive Classification Methods for Imbalanced Data | |
Khanchouch et al. | A comparative study of multi-SOM algorithms for determining the optimal number of clusters | |
Zhang et al. | Source-Free Domain Adaptation for Rotating Machinery Cross-Domain Fault Diagnosis with Neighborhood Reciprocity Clustering | |
Georgiev et al. | Feature selection using Gustafson-Kessel fuzzy algorithm in high dimension data clustering | |
Kianmehr et al. | Effective classification by integrating support vector machine and association rule mining | |
Li et al. | A novel LASSO-based feature weighting selection method for microarray data classification | |
Tewari et al. | Soccer Analytics using Machine Learning | |
Cebron et al. | Active learning in parallel universes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 2018-07-24; Termination date: 2019-09-25