CN109934278A - A high-dimensional feature selection method combining information gain with neighborhood rough sets
- Publication number
- CN109934278A (application number CN201910168981.3A)
- Authority
- CN
- China
- Prior art keywords
- attribute
- information gain
- feature
- red
- reduction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a high-dimensional feature selection method combining information gain with neighborhood rough sets. The method comprises the following steps: step 1, data preprocessing; step 2, image segmentation; step 3, feature extraction; step 4, feature normalization; step 5, feature selection based on information gain; step 6, feature selection based on neighborhood rough sets; step 7, classification and recognition of the twice-reduced result. The feasibility of the two-stage reduction algorithm is also analyzed at the theoretical level. The method improves accuracy and effectively reduces time complexity. A comprehensive comparison of the performance of high-dimensional feature selection algorithms built with different methods supports the advantage of the method, and the stepwise selection of models supports the soundness of the results. The method has reference value for identifying benign and malignant lung tumors.
Description
Technical field
The present invention relates to the technical field of image processing, and more particularly to a high-dimensional feature selection method combining information gain with neighborhood rough sets.
Background art
Information gain (IG) and rough sets (RS) are two algorithms commonly used for feature selection. IG measures how much information a feature provides to a classifier by comparing the cases in which the feature is included and excluded. The information that each feature provides is computed in turn, the features are sorted in descending order, and the top K features are taken according to some rule, which accomplishes feature selection by information gain. IG-based feature selection has low computational complexity and needs only a single pass, so it runs efficiently and can effectively remove redundant, irrelevant and noisy features. As a filter-type algorithm, however, IG still has problems. It can only assess the contribution of a feature to the system as a whole, not to a particular class, and it does not consider the relationships between features. It is therefore suited only to "global" feature selection (all classes share the same feature set) and cannot perform "local" feature selection (each class has its own feature set, because some features discriminate one class well but are insignificant for the others).
RS is an effective tool for handling uncertain data. Because it requires no prior knowledge, it is widely used in feature selection, pattern recognition, data mining and knowledge discovery. The two key concepts in RS research are concept approximation and attribute reduction; attribute reduction lowers the attribute dimension without affecting the discriminability required by the current recognition task. RS is built on equivalence relations, however, which restricts it considerably in many practical applications. To avoid dependence of the data on a single method and to better remove redundant and irrelevant attributes from a data set, many researchers have combined the global feature selection ability of IG with the strong attribute reduction ability of RS for high-dimensional feature selection, with successful applications in sentiment analysis, real-estate pricing analysis, tumor diagnosis and classification, fishing-condition forecasting, and so on.
Pawlak RS can only handle nominal variables, whereas the data in practical applications are often continuous numerical variables. A discretized data set fits the construction of RS equivalence classes, but discretization may lose important information, and different discretization strategies also affect the reduction result. For this reason, Hu Qinghua et al. introduced a neighborhood relation and proposed an improved Pawlak RS, the neighborhood rough set (NRS), which handles continuous numerical data directly. IG and RS can each perform feature selection alone, but each has limitations, so combining their advantages is feasible: IG selects a highly relevant feature set, and NRS then removes the highly redundant attributes, overcoming the heavy loss of original information caused by RS being suited only to discrete variables. Obtaining the optimal feature subset through two attribute reductions better removes redundant and irrelevant features from the data set, improves the performance of the algorithm, reduces time complexity, and avoids dependence of the data on a single method.
Therefore, how to provide a high-dimensional feature selection method combining information gain with neighborhood rough sets is a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, the present invention provides a high-dimensional feature selection method combining information gain with neighborhood rough sets, and analyzes the feasibility of the two-stage reduction algorithm at the theoretical level. Comparison with no reduction and with the Pawlak RS, IG and NRS reduction algorithms shows that the method improves accuracy and effectively reduces time complexity. A comprehensive comparison of the performance of high-dimensional feature selection algorithms built with different methods supports the advantage of the method, and the stepwise selection of models supports the soundness of the results. The method has reference value for identifying benign and malignant lung tumors.
To achieve the above object, the invention provides the following technical solution:
A high-dimensional feature selection method combining information gain with neighborhood rough sets, comprising the following steps:
Step 1: data preprocessing. The images are numbered in order and converted from pseudo-color to grayscale; an ROI is delineated in each grayscale image, and the ROI image is normalized.
Step 2: image segmentation. The preprocessed ROI image is segmented using the maximum between-class variance method.
Step 3: feature extraction. Features are extracted from the target region of the segmented ROI, and a continuous decision information table S_0 is constructed.
Step 4: feature normalization. The condition attributes of the continuous decision information table S_0 constructed in step 3 are normalized to obtain a new continuous decision information table S.
Step 5: feature selection based on information gain. The continuous decision information table S from step 4 is taken as input, and the attribute set red_1 after information gain reduction is obtained.
Step 6: feature selection based on neighborhood rough sets. The attribute set red_1 after information gain reduction is taken as input, and neighborhood rough set feature selection yields the twice-reduced result red.
Step 7: classification and recognition of the twice-reduced result.
Preferably, in the above high-dimensional feature selection method, in step 1, in order to eliminate errors introduced while acquiring the ROI and to facilitate subsequent image processing, all ROI images are normalized to 50 × 50 pixels.
Preferably, in the above high-dimensional feature selection method, in step 2, the ROI image is split at a threshold into two groups, one corresponding to the background and one to the target.
Preferably, in the above high-dimensional feature selection method, the features extracted in step 3 include shape features, texture features and gray-level features.
Preferably, in the above high-dimensional feature selection method, in step 4, the features extracted from the target region of the segmented ROI are normalized so that all normalized data fall within [0,1], by the formula:
x' = (x - x_min) / (x_max - x_min)
where x_max and x_min denote the maximum and minimum values of the sample array, respectively. Only the condition attributes of the continuous decision table S_0 constructed after the feature extraction of step 3 are normalized; the decision attribute is not normalized. This yields the new continuous decision information table S.
Preferably, in the above high-dimensional feature selection method, step 5 specifically comprises:
1) Input the continuous decision information table S = (U, A, V, f), where U denotes the universe; A = C ∪ D; C denotes the condition attribute set, i.e. the set of normalized features extracted from the target region of the segmented ROI; D denotes the set of decision attributes; V denotes the union of the attribute value domains; and f denotes the information function that defines the mapping.
2) Initialize the attribute set red_1 = ∅, compute the information gain Gain(c_i) of each condition attribute, and compute the average value average of the information gains of the condition attributes.
3) Select the attribute c_i with the largest information gain, set red_1 = red_1 ∪ {c_i}, and remove that attribute from the condition attribute set C.
4) If the information gain of the attribute c_i with the largest information gain is less than average, stop and output the attribute set red_1 after information gain reduction; otherwise return to step 3).
Preferably, in the above high-dimensional feature selection method, step 6 specifically comprises:
1) Input the attribute set after information gain reduction, red_1 = (U, A′, V, f), where A′ = C′ ∪ D and C′ denotes the set of condition attributes from step 5 whose information gain is greater than or equal to the average information gain; determine the set of neighborhood radii δ and set the lower limit of significance to 0.001.
2) Initialize the twice-reduced set red = ∅ and the sample set smp = U.
3) For each a_i ∈ C′ - red, let B = red ∪ {a_i} and compute the positive region POS_B(D) = ∪_{j=1..N} N_B(D_j), where N_B(X) = {x_i | δ_B(x_i) ⊆ X, x_i ∈ U} is the lower approximation of X, δ_B(x_i) = {x_j | x_j ∈ U, Δ_B(x_i, x_j) ≤ δ} is the neighborhood of x_i, and N is the number of equivalence classes D_1, ..., D_N into which the decision attribute D partitions the universe U.
4) Among the candidate attributes, select a_k such that the positive region POS_{red ∪ {a_k}}(D) is largest.
5) Compute the attribute significance sig(a_k, red, D) by the formula sig(c, B, D) = γ_{B∪{c}}(D) - γ_B(D), where γ_B(D) = |POS_B(D)| / |U| denotes the dependency of the decision attribute D on the subset B.
6) If sig(a_k, red, D) is not greater than the set lower limit of significance, output the reduction result red and terminate; otherwise record k, set red = red ∪ {a_k}, remove the samples of the positive region from smp, and return to step 3) to continue until the reduction result red is output.
As the above technical solution shows, compared with the prior art, the present invention provides a high-dimensional feature selection method combining information gain with neighborhood rough sets and analyzes the feasibility of the two-stage reduction algorithm at the theoretical level. The method improves accuracy and effectively reduces time complexity. A comprehensive comparison of the performance of high-dimensional feature selection algorithms built with different methods supports the advantage of the method, and the stepwise selection of models supports the soundness of the results. The method has reference value for identifying benign and malignant lung tumors.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the invention;
Fig. 2 is a schematic diagram of the data preprocessing of the invention;
Fig. 3 is a schematic diagram of the image segmentation of the invention;
Fig. 4 is a flowchart of the feature selection based on neighborhood rough sets of the invention;
Fig. 5 is a bar chart comparing the reduction lengths of the different algorithms in Experiment 1;
Fig. 6 is a bar chart comparing the classification accuracy of the different algorithms in Experiment 2;
Fig. 7 is a bar chart comparing the classification time of the different algorithms in Experiment 2.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The embodiment of the invention discloses a high-dimensional feature selection method combining information gain with neighborhood rough sets, and analyzes the feasibility of the two-stage reduction algorithm at the theoretical level. Comparison with no reduction and with the Pawlak RS, IG and NRS reduction algorithms shows that the method improves accuracy and effectively reduces time complexity. A comprehensive comparison of the performance of high-dimensional feature selection algorithms built with different methods supports the advantage of the method, and the stepwise selection of models supports the soundness of the results. The method has reference value for identifying benign and malignant lung tumors.
Embodiment:
(1) Data acquisition
The data come from the General Hospital of Ningxia Medical University. Each case includes the clinical diagnosis, the image data and the examination findings; the clinical diagnosis is the standard for determining whether a lung tumor is benign or malignant. To avoid insufficient model training caused by too little data, this study is not restricted to a particular type of lung tumor. A total of 3000 lung tumor cases were obtained, comprising 1500 malignant lung tumor CT cases and 1500 benign lung tumor CT cases.
(2) Data preprocessing
For each case, the CT images of the benign or malignant lung tumor are obtained from the DICOM files according to the examination conclusion in the medical order. The images are numbered in order and converted from pseudo-color to grayscale. In each grayscale image, a sub-image with strong discriminating ability, centered on the lesion marked by the radiologist, is cropped as the lung tumor ROI, and the ROI image is normalized to 50 × 50 pixels. The data preprocessing process is shown in Fig. 2.
(3) Image segmentation
In order to measure the shape, texture and gray-level features of the lung images accurately, the preprocessed ROI is segmented with the maximum between-class variance method (Otsu's algorithm). Otsu's algorithm is one of the most effective and stable methods for automatic threshold selection, and under certain conditions it is unaffected by changes in image contrast and brightness. Its basic principle is to split the ROI image at a threshold into two groups, one corresponding to the background and one to the target. Fig. 3 shows five groups of examples before and after segmentation.
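The thresholding principle described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments. It assumes the ROI is given as a flat list of 8-bit gray levels and that the brighter group is the target; the function names `otsu_threshold` and `segment` are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one group, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the group at or below t
    sum0 = 0  # sum of gray levels in that group
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment(pixels, levels=256):
    """Boolean mask: True for the brighter group (assumed to be the target)."""
    t = otsu_threshold(pixels, levels)
    return [p > t for p in pixels]


# Hypothetical ROI with a dark background and a bright target.
roi = [10] * 50 + [200] * 50
mask = segment(roi)
```

For a 50 × 50 ROI the list would hold 2500 values; whether the target is the brighter or the darker group depends on the image and has to be decided per data set.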
(4) Feature extraction
Features are extracted from the target region of the ROI segmented in step (3). The extracted features have 104 dimensions in total, including shape features, texture features and gray-level features; the specific features are listed in Table 1. After feature extraction, the continuous decision information table S_0 is constructed: it contains 3000 samples, each with 104 condition attributes and 1 decision attribute.
Table 1 Lung tumor CT image features
(5) Feature normalization
To obtain accurate data processing results, the features extracted from the target region of the segmented ROI (i.e. the continuous feature set extracted in step (4)) are normalized to eliminate differences in order of magnitude and dimension. The present invention uses the common min-max method so that all normalized data fall within [0,1], by the formula:
x' = (x - x_min) / (x_max - x_min)
where x_max and x_min denote the maximum and minimum values of the sample array, respectively.
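A minimal sketch of the min-max normalization follows. It assumes the decision table is held as a list of rows whose columns are the condition attributes, and that the maximum and minimum are taken per attribute over all samples. Mapping a constant column to 0.0 is an assumption made here to avoid division by zero; the function name is hypothetical.

```python
def min_max_normalize(rows):
    """Scale every column (condition attribute) of `rows` into [0, 1]."""
    n_cols = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_cols)]
    maxs = [max(r[j] for r in rows) for j in range(n_cols)]
    normalized = []
    for r in rows:
        normalized.append([
            # A constant column has no spread; map it to 0.0 (assumption).
            (r[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
            for j in range(n_cols)
        ])
    return normalized


# Hypothetical table with two condition attributes on different scales.
table = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = min_max_normalize(table)
```

The decision attribute is kept in a separate list and is not passed to this function, in line with the statement that only the condition attributes are normalized.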
(6) Feature selection based on information gain
Input: the continuous decision information table S = (U, A, V, f), where U = (x_1, x_2, ..., x_n) is called the universe and denotes the set of all samples; A = C ∪ D; C denotes the set of condition attributes (the 104-dimensional feature set normalized in step (5)); D denotes the set of decision attributes (whether the lung tumor is benign or malignant, with 1 representing a malignant lung tumor and -1 a benign lung tumor); V denotes the union of the attribute value domains; and f denotes the information function that defines the mapping.
Output: the attribute set red_1 after information gain reduction.
Steps: 1) Initialize the set red_1 = ∅, compute the information gain Gain(c_i) of each condition attribute, and compute the average value average of the information gains of the condition attributes.
2) Select the attribute c_i with the largest information gain, set red_1 = red_1 ∪ {c_i}, and remove that attribute from the condition attribute set C.
3) If the information gain of the attribute c_i with the largest information gain is less than average, stop and output red_1; otherwise return to step 2).
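The selection loop above can be sketched as follows. Several points are assumptions rather than details given in the text: the information gain of a continuous attribute is computed after equal-width binning of its normalized values into `bins` intervals, the gain is compared with the average before the attribute is added (so that red_1 holds exactly the attributes whose gain is at least the average), and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=10):
    """Gain of one attribute with values in [0, 1], after equal-width binning."""
    groups = {}
    for v, y in zip(values, labels):
        b = min(int(v * bins), bins - 1)  # the value 1.0 goes into the last bin
        groups.setdefault(b, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def ig_reduce(rows, labels, bins=10):
    """Return (red_1 as a list of column indices, list of all gains)."""
    n_cols = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels, bins)
             for j in range(n_cols)]
    average = sum(gains) / n_cols
    remaining = set(range(n_cols))
    red1 = []
    while remaining:
        # Attribute with the largest gain; ties go to the lower index.
        best = max(remaining, key=lambda j: (gains[j], -j))
        if gains[best] < average:
            break
        red1.append(best)
        remaining.remove(best)
    return red1, gains


# Hypothetical data: column 0 separates the classes, column 1 is constant.
rows = [[0.1, 0.5], [0.2, 0.5], [0.8, 0.5], [0.9, 0.5]]
labels = [-1, -1, 1, 1]
red1, gains = ig_reduce(rows, labels)
```

Because the gains do not change between iterations, the loop is equivalent to keeping every attribute whose gain is at least the average; it is written as a loop only to mirror the steps in the text.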
(7) Feature selection based on neighborhood rough sets
NRS attribute reduction deletes redundant attributes without affecting the decision-making ability of the decision system itself. The reduction algorithm uses a forward greedy search, as shown in Fig. 4. Its main steps are as follows:
Input: the attribute set after information gain reduction, red_1 = (U, A′, V, f), where A′ = C′ ∪ D and C′ denotes the set of condition attributes from step (6) whose information gain is greater than or equal to the average information gain; determine the set of neighborhood radii δ and set the lower limit of significance to 0.001.
Output: the twice-reduced set red.
Steps: 1) Initialize the twice-reduced set red = ∅ and the sample set smp = U.
2) For each a_i ∈ C′ - red, let B = red ∪ {a_i} and compute the positive region POS_B(D) = ∪_{j=1..N} N_B(D_j), where N_B(X) = {x_i | δ_B(x_i) ⊆ X, x_i ∈ U} is the lower approximation of X, δ_B(x_i) = {x_j | x_j ∈ U, Δ_B(x_i, x_j) ≤ δ} is the neighborhood of x_i, and N is the number of equivalence classes D_1, ..., D_N into which the decision attribute D partitions the universe U.
3) Among the candidate attributes, select a_k such that the positive region POS_{red ∪ {a_k}}(D) is largest.
4) Compute the attribute significance sig(a_k, red, D) by the formula sig(c, B, D) = γ_{B∪{c}}(D) - γ_B(D), where γ_B(D) = |POS_B(D)| / |U| denotes the dependency of the decision attribute D on the subset B.
5) If sig(a_k, red, D) is not greater than the set lower limit of significance, output the reduction result red and terminate; otherwise record k, set red = red ∪ {a_k}, remove the samples of the positive region from smp, and return to step 2) to continue until the reduction result red is output.
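A minimal sketch of the forward greedy search follows, under stated assumptions. The distance Δ_B is taken to be the Euclidean distance over the attributes in B, a single radius `delta` is used, and the loop keeps adding attributes while the significance exceeds the lower limit and stops otherwise, which is the usual forward-greedy convention. The positive region and dependency follow the standard neighborhood rough set definitions. All names and the default radius are hypothetical.

```python
def neighborhood(rows, i, attrs, delta):
    """Indices of all samples within distance delta of sample i over attrs."""
    xi = rows[i]
    result = []
    for j, xj in enumerate(rows):
        dist = sum((xi[a] - xj[a]) ** 2 for a in attrs) ** 0.5
        if dist <= delta:
            result.append(j)
    return result


def positive_region(rows, labels, attrs, delta, samples):
    """Samples whose whole neighborhood carries the same decision label."""
    return {
        i for i in samples
        if all(labels[j] == labels[i]
               for j in neighborhood(rows, i, attrs, delta))
    }


def nrs_reduce(rows, labels, candidates, delta=0.1, sig_min=0.001):
    """Forward greedy reduction over the candidate attribute indices."""
    n = len(rows)
    red = []
    smp = set(range(n))  # samples not yet in the positive region
    remaining = list(candidates)
    while remaining and smp:
        best_attr, best_pos = None, set()
        for a in remaining:
            pos = positive_region(rows, labels, red + [a], delta, smp)
            if best_attr is None or len(pos) > len(best_pos):
                best_attr, best_pos = a, pos
        # With Euclidean distance, neighborhoods only shrink as attributes
        # are added, so the gain in dependency is the share of newly
        # covered samples.
        sig = len(best_pos) / n
        if sig <= sig_min:
            break
        red.append(best_attr)
        remaining.remove(best_attr)
        smp -= best_pos
    return red


# Hypothetical data: attribute 0 separates the classes, attribute 1 does not.
rows = [[0.0, 0.5], [0.05, 0.1], [1.0, 0.5], [0.95, 0.1]]
labels = [-1, -1, 1, 1]
red = nrs_reduce(rows, labels, candidates=[0, 1], delta=0.2)
```

Restricting the search to the samples still in smp avoids re-checking samples that are already in the positive region, which is what makes the forward search fast on larger tables.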
(8) The support vector machine (SVM) is a machine learning method built on statistical learning theory. Following the structural risk minimization principle, it copes well with practical difficulties such as small samples, overfitting, high dimensionality and local extrema, and has strong generalization and classification ability; it can effectively handle the nonlinear, high-dimensional problems that arise in computer-aided diagnosis (CAD) based on medical images. The twice-reduced result is classified with an SVM using the radial basis function (RBF) kernel, and the SVM parameters C and g are optimized by grid search (GS).
The performance evaluation of early diagnostic accuracy includes two main indices, sensitivity and specificity, but these two alone can hardly reflect the overall performance of a classifier. Therefore, the present invention evaluates the reduction model by reduction length, and evaluates the classification model by accuracy (Accuracy), sensitivity (Sensitivity), specificity (Specificity), F value (F-score), Matthews correlation coefficient (MCC), balanced F score (F1Score), Youden index (YI) and algorithm time (Time).
Accuracy is the most common evaluation index; the higher the accuracy, the better the classifier. It is computed as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity and specificity measure the classifier's ability to recognize positive and negative examples, respectively; the larger the value, the better the recognition performance. They are computed as:
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
The F value is the weighted harmonic mean of recall and precision and is used to balance the two.
MCC is the correlation coefficient between the actual and predicted classes. It takes true positives, true negatives, false positives and false negatives into account and is a relatively balanced index. Its range is [-1,1]; the closer the value is to 1, the more accurate the prediction. It is computed as:
MCC = (TP × TN - FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
F1Score is a more comprehensive index used in statistics to measure the accuracy of a binary classification model. It is a weighted (harmonic) average of precision and recall; its range is [0,1], and the closer the value is to 1, the more accurate the model. It is computed as:
F1Score = 2 × Precision × Recall / (Precision + Recall), where Precision = TP / (TP + FP) and Recall = TP / (TP + FN)
YI, also called the correct index, is the sum of sensitivity and specificity minus 1. Its range is [0,1]; the closer the value is to 1, the more trustworthy the model's predictions. It is computed as:
YI = Sensitivity + Specificity - 1
Time denotes the time the algorithm takes from the start of its run to its end.
Here, TP is the number of samples correctly classified as positive, i.e. samples that are actually positive and are classified as positive; FP is the number of samples wrongly classified as positive, i.e. samples that are actually negative but are classified as positive; FN is the number of samples wrongly classified as negative, i.e. samples that are actually positive but are classified as negative; TN is the number of samples correctly classified as negative, i.e. samples that are actually negative and are classified as negative.
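The indices defined above can be computed from the four confusion-matrix counts as in the following sketch. The formulas are the standard definitions of these indices. The function name and the example counts are hypothetical, and the sketch assumes that no denominator is zero.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification indices from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    youden = sensitivity + specificity - 1
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
        "youden": youden,
    }


# Hypothetical counts for one test fold of 300 malignant and 300 benign cases.
m = classification_metrics(tp=270, fp=30, fn=30, tn=270)
```

With malignant coded as the positive class, sensitivity is the share of malignant cases that are detected and specificity is the share of benign cases that are correctly cleared.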
Experimental results and analysis
In theory, the feature subset obtained after the two-stage reduction by IG and NRS effectively lowers the dimension of the original decision information table and reduces time and space complexity. IG serves as a preliminary screen that reduces data noise and removes weakly relevant attributes, and the second reduction by NRS effectively removes highly redundant attributes. To further verify the feasibility and effectiveness of the proposed two-stage high-dimensional feature selection algorithm, 3000 lung tumor CT images (1500 benign, 1500 malignant) are taken as the research object. After the ROI is obtained, shape, texture and gray-level features totaling 104 dimensions are extracted to construct the original feature set, two-stage reduction is performed with IG and NRS, and the reduction result is classified with an SVM.
Experiment 1: comparison of the reduction results of different algorithms
The original decision information table is reduced with different algorithms; the results are shown in Fig. 5. As Fig. 5 shows, reducing the original decision information table with any of the algorithms lowers the dimension of the table considerably compared with no reduction. The reduction length of the proposed algorithm is greater only than that of the NRS algorithm, and is 65 dimensions lower than the dimension of the original information table.
Experiment 2: comparison of the classification results of different algorithms
The reduction results of the different algorithms from Experiment 1 are each classified with an SVM under five-fold cross-validation (in each fold, 300 of the 1500 benign cases and 300 of the 1500 malignant cases are chosen as the test set, and the remaining 1200 of each are used as the training set). The algorithms are evaluated on eight indices: accuracy, sensitivity, specificity, F value, MCC, F1Score, Youden index and time. For each index, the average over the five folds is taken as the final evaluation result. The results are listed in Table 2:
Table 2 Comparison of the classification results of different algorithms
As Table 2 shows, the evaluation indices of the same algorithm differ from fold to fold; to measure performance comprehensively, the average over the five folds is taken as the final classification result of each algorithm. Except in comparison with Pawlak RS-SVM, the sensitivity of the proposed algorithm is slightly lower than that of the other algorithms, while its other indices are better: accuracy, specificity, F value, MCC, F1Score and Youden index increase by 0.17%~0.84%, 0.67%~1.4%, 0.0015~0.0081, 0.0035~0.0169, 0.0017~0.0083 and 0.003~0.0167 respectively, and the time decreases by 8.06 s~203.81 s. Because accuracy and time are the most commonly used evaluation indices, their averages are plotted as bar charts in Fig. 6 and Fig. 7 to show the differences between the algorithms more clearly. As Fig. 6 and Fig. 7 show, the proposed algorithm has the highest accuracy and the Pawlak RS-SVM model the lowest. Pawlak RS is built on equivalence relations and can only handle nominal variables, so numerical data must first be discretized. This not only increases the time complexity of the algorithm but also loses important information; different discretization methods also affect the final result, and the discretized feature set cannot fully characterize the lung tumor ROI. Compared with no reduction, the time of the proposed algorithm is lower by a factor of 4.27, and it is also lower than that of the other algorithms compared. The proposed algorithm therefore improves the accuracy of high-dimensional feature selection for lung tumors, effectively reduces time complexity, and has some value for wider application.
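The five-fold protocol of Experiment 2 (300 benign and 300 malignant cases held out per fold) can be sketched as a stratified split. Shuffling with a fixed seed is an assumption; the text only states how many cases of each class are held out. The function name is hypothetical.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices); each fold holds out 1/k of every class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
    for fold in range(k):
        test = []
        for indices in by_class.values():
            size = len(indices) // k
            test.extend(indices[fold * size:(fold + 1) * size])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


# 1500 malignant (1) and 1500 benign (-1) cases, as in the embodiment.
labels = [1] * 1500 + [-1] * 1500
folds = list(stratified_k_fold(labels))
```

Each index is then averaged over the five (train, test) pairs. If a class size is not divisible by k, the leftover samples are never held out in this sketch, which does not arise with 1500 cases per class.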
To improve the performance of computer-aided diagnosis algorithms for lung tumors, the advantages and disadvantages of IG and NRS are analyzed, a high-dimensional feature selection algorithm for lung tumors that combines IG and NRS is proposed, and the feasibility of the two-stage reduction algorithm is analyzed at the theoretical level. To verify the effectiveness of the algorithm, 104-dimensional features are extracted from 3000 lung tumor CT images to construct the decision information table, the optimal feature subset is obtained through two attribute reductions by IG and NRS, and classification is finally performed with an SVM. Comparison with no reduction and with the Pawlak RS, IG and NRS reduction algorithms shows that the algorithm improves accuracy and effectively reduces time complexity. A comprehensive comparison of the performance of lung tumor high-dimensional feature selection algorithms built with different methods supports the advantage of the method of the invention, and the stepwise selection of models supports the soundness of the results. The method has reference value for computer-aided diagnosis of lung tumors.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (5)
1. A high-dimensional feature selection method combining information gain and neighborhood rough sets, characterized in that the method comprises the following steps:
Step 1: data preprocessing; the images are numbered in sequence, their pseudo-color is removed, and they are converted into grayscale images; a region of interest (ROI) is delineated in each grayscale image, and the ROI images are normalized;
Step 2: image segmentation; the preprocessed ROI images are segmented using the maximum between-class variance method (Otsu's method) to obtain a background region image and a target region image;
Step 3: feature extraction; features are extracted from the target region image of the segmented ROI, and a continuous decision information table $S_0$ is constructed;
Step 4: feature normalization; the continuous decision information table $S_0$ constructed in Step 3 is normalized, wherein only the condition attributes in $S_0$ are normalized, to obtain a continuous decision information table $S$;
Step 5: feature selection based on information gain; the continuous decision information table $S$ from Step 4 is taken as input and feature selection is performed to obtain the attribute set $red_1$ after information gain reduction;
Step 6: feature selection based on neighborhood rough sets; the attribute set $red_1$ after information gain reduction is taken as input, and feature selection by the neighborhood rough set yields the twice-reduced result $red$;
Step 7: classification and recognition are performed on the twice-reduced result.
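Step 2 of claim 1 relies on the maximum between-class variance criterion. The following is a minimal pure-Python sketch of that criterion, assuming integer gray levels supplied as a flat list and assuming that the brighter class is the target region. The names `otsu_threshold` and `segment` are illustrative and do not come from the patent.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the lower class yet
        if w1 == 0:
            break     # every pixel is already in the lower class
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def segment(pixels, levels=256):
    """Split pixels into (threshold, background, target).

    Assumption: the target region is the brighter class.
    """
    t = otsu_threshold(pixels, levels)
    background = [p for p in pixels if p <= t]
    target = [p for p in pixels if p > t]
    return t, background, target
```

When several thresholds give the same variance, this sketch keeps the lowest one; a production implementation would typically operate on a 2-D image array instead of a flat list.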
2. The high-dimensional feature selection method combining information gain and neighborhood rough sets according to claim 1, characterized in that the features extracted in Step 3 comprise shape features, texture features, and gray-level features.
3. The high-dimensional feature selection method combining information gain and neighborhood rough sets according to claim 1, characterized in that, in Step 4, the features extracted from the target region image of the segmented ROI are normalized so that all normalized data fall within [0, 1], using the formula:
$x' = \dfrac{x - x_{min}}{x_{max} - x_{min}}$
where $x_{max}$ and $x_{min}$ denote the maximum and minimum values of the sample data, respectively.
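The normalization in claim 3 maps each feature into [0, 1] using that feature's minimum and maximum. A small sketch of this min-max scaling applied column by column to the condition attributes is shown below. The function names and the handling of a constant column (mapped to 0.0) are assumptions made for illustration and are not stated in the patent.

```python
def min_max_normalize(column):
    """Scale one feature column into [0, 1] with (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(column), max(column)
    if x_max == x_min:
        # Assumption: a constant feature carries no information, map it to 0.0.
        return [0.0 for _ in column]
    span = x_max - x_min
    return [(x - x_min) / span for x in column]


def normalize_table(rows):
    """Normalize every condition-attribute column of a table.

    rows: list of samples, each a list of feature values. The decision
    attribute is kept in a separate list and is therefore left untouched.
    """
    columns = list(zip(*rows))
    normalized = [min_max_normalize(col) for col in columns]
    return [list(sample) for sample in zip(*normalized)]
```

Keeping the decision labels outside `rows` mirrors the statement in claim 1 that only the condition attributes are normalized.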
4. The high-dimensional feature selection method combining information gain and neighborhood rough sets according to claim 1, characterized in that Step 5 comprises the following steps:
1) input the continuous decision information table $S = (U, A, V, f)$, where $U$ denotes the universe; $A = C \cup D$; $C$ denotes the condition attribute set, namely the set of normalized features extracted from the target region image of the segmented ROI; $D$ denotes the set of decision attributes; $V$ denotes the union of the attribute value domains; and $f$ denotes the information function that defines the mapping;
2) initialize the attribute set $red_1 = \emptyset$, compute the information gain $Gain(C_i)$ of each condition attribute, and compute the average value, average, of the information gains of the condition attributes;
3) select the attribute $c_i$ with the largest information gain, set $red_1 = red_1 \cup \{c_i\}$, and remove this attribute from the condition attribute set $C$;
4) if the information gain of the attribute $c_i$ with the largest information gain is less than average, stop and output the attribute set $red_1$ after information gain reduction; otherwise, return to step 2).
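The information gain stage of claim 4 can be illustrated with a short sketch. It makes two assumptions that the claim does not state: continuous features already scaled to [0, 1] are discretized by equal-width binning before entropy is computed, and the iterative selection is simplified to a single pass that keeps every attribute whose gain is at least the average, which matches how claim 5 describes the resulting attribute set. All function names and the `bins` parameter are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def discretize(values, bins=4):
    """Equal-width binning of values in [0, 1] (an assumption of this sketch)."""
    return [min(int(v * bins), bins - 1) for v in values]


def information_gain(values, labels, bins=4):
    """Gain = H(D) - H(D | attribute), with the attribute discretized first."""
    binned = discretize(values, bins)
    total = len(labels)
    conditional = 0.0
    for b in set(binned):
        subset = [lab for v, lab in zip(binned, labels) if v == b]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def select_by_information_gain(rows, labels, bins=4):
    """Return (red1, gains): attribute indices with gain >= average, best first."""
    columns = list(zip(*rows))
    gains = [information_gain(col, labels, bins) for col in columns]
    average = sum(gains) / len(gains)
    ranked = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    red1 = [i for i in ranked if gains[i] >= average]
    return red1, gains
```

For example, with `rows = [[0.1, 0.5], [0.2, 0.9], [0.8, 0.5], [0.9, 0.9]]` and `labels = [0, 0, 1, 1]`, only the first attribute separates the two classes, so it is the only one kept.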
5. The high-dimensional feature selection method combining information gain and neighborhood rough sets according to claim 4, characterized in that Step 6 comprises the following steps:
1) input the attribute set after information gain reduction, $red_1 = (U, A', V, f)$, where $A' = C' \cup D$ and $C'$ denotes the set of condition attributes from Step 5 whose information gain is greater than or equal to the average information gain; determine the set of neighborhood radii $\delta$, and set the lower limit of significance to 0.001;
2) initialize the twice-reduced set $red = \emptyset$ and the sample set $smp = U$;
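3) for each candidate attribute, compute the positive region $POS_B(D)$ using the formula $POS_B(D) = \bigcup_{n=1}^{N} \underline{N_B} X_n$, where $\underline{N_B} X = \{x_i \mid \delta_B(x_i) \subseteq X,\ x_i \in U\}$, $\delta_B(x_i) = \{x_j \mid x_j \in U,\ \Delta_B(x_i, x_j) \le \delta\}$, and $N$ is the number of equivalence classes $X_1, \dots, X_N$ into which the decision attribute $D$ partitions the universe $U$;
4) for $a \in B$, select $a_k$ such that the positive region is largest;
5) compute the attribute significance $sig(a_k, red, D)$ using the formula $sig(c, B, D) = \gamma_{B \cup c}(D) - \gamma_B(D)$, where $\gamma_B(D) = |POS_B(D)| / |U|$ denotes the dependency of the decision attribute $D$ on the subset $B$;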
6) if $sig(a_k, red, D)$ is greater than the set lower limit of significance, output the reduction result $red$ and terminate the procedure; otherwise, record the value of $k$, update the sets accordingly, and return to step 2) to continue the computation until the reduction result $red$ is output.
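The neighborhood rough set stage of claim 5 can be sketched as a forward greedy search over attributes. The sketch below is not the claimed procedure verbatim. It assumes a Euclidean distance for $\Delta_B$, treats the dependency of the empty attribute set as 0, omits the shrinking sample set $smp$, and follows the commonly published formulation in which an attribute is added while its significance exceeds the lower limit and the search stops otherwise. The function names and the default `delta` are hypothetical.

```python
import math


def neighborhood(i, rows, attrs, delta):
    """Indices of samples within distance delta of sample i on attrs.

    Assumption: Euclidean distance restricted to the chosen attributes.
    """
    xi = rows[i]
    result = []
    for j, xj in enumerate(rows):
        dist = math.sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs))
        if dist <= delta:
            result.append(j)
    return result


def positive_region(rows, labels, attrs, delta):
    """Samples whose whole delta-neighborhood shares their decision label."""
    return [i for i in range(len(rows))
            if all(labels[j] == labels[i]
                   for j in neighborhood(i, rows, attrs, delta))]


def dependency(rows, labels, attrs, delta):
    """gamma_B(D) = |POS_B(D)| / |U|; defined as 0 for an empty attribute set."""
    if not attrs:
        return 0.0
    return len(positive_region(rows, labels, attrs, delta)) / len(rows)


def nrs_reduce(rows, labels, candidates, delta=0.15, min_sig=0.001):
    """Forward greedy reduction: add the attribute with the largest dependency
    while its significance exceeds min_sig (stopping rule is an assumption)."""
    red = []
    remaining = list(candidates)
    while remaining:
        base = dependency(rows, labels, red, delta)
        best = max(remaining,
                   key=lambda a: dependency(rows, labels, red + [a], delta))
        sig = dependency(rows, labels, red + [best], delta) - base
        if sig <= min_sig:
            break
        red.append(best)
        remaining.remove(best)
    return red
```

In this sketch `candidates` would be the attribute indices kept by the information gain stage, and the returned list plays the role of the twice-reduced set $red$ that is passed to the classifier.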
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910168981.3A CN109934278B (en) | 2019-03-06 | 2019-03-06 | High-dimensionality feature selection method for information gain mixed neighborhood rough set |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910168981.3A CN109934278B (en) | 2019-03-06 | 2019-03-06 | High-dimensionality feature selection method for information gain mixed neighborhood rough set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109934278A (en) | 2019-06-25 |
CN109934278B (en) | 2023-06-27 |
Family ID: 66986458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910168981.3A Active CN109934278B (en) | 2019-03-06 | 2019-03-06 | High-dimensionality feature selection method for information gain mixed neighborhood rough set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109934278B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110464345A (en) * | 2019-08-22 | 2019-11-19 | 北京航空航天大学 | A kind of separate head bioelectrical power signal interference elimination method and system |
CN110598192A (en) * | 2019-06-28 | 2019-12-20 | 太原理工大学 | Text feature reduction method based on neighborhood rough set |
CN110988804A (en) * | 2019-11-11 | 2020-04-10 | 浙江大学 | Radar radiation source individual identification system based on radar pulse sequence |
CN111476455A (en) * | 2020-03-03 | 2020-07-31 | 中国南方电网有限责任公司 | Power grid operation section feature selection and online generation method based on two-stage structure |
CN111553127A (en) * | 2020-04-03 | 2020-08-18 | 河南师范大学 | Multi-label text data feature selection method and device |
CN112200259A (en) * | 2020-10-19 | 2021-01-08 | 哈尔滨理工大学 | Information gain text feature selection method and classification device based on classification and screening |
CN112365992A (en) * | 2020-11-27 | 2021-02-12 | 安徽理工大学 | Medical examination data identification and analysis method based on NRS-LDA |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004114650A2 (en) * | 2003-06-16 | 2004-12-29 | Hewlett Packard Development Company, L.P. | Systems and methods for dot gain determination and dot gain based printing |
CN101923604A (en) * | 2010-07-23 | 2010-12-22 | 福建师范大学 | Classification method for weighted KNN oncogene expression profiles based on neighborhood rough set |
US20140213466A1 (en) * | 2010-11-19 | 2014-07-31 | Rutgers, The State University Of New Jersey | High-throughput assessment method for contact hypersensitivity |
CN102755172A (en) * | 2011-04-28 | 2012-10-31 | 株式会社东芝 | Nuclear medical imaging method and device |
CN102510363A (en) * | 2011-09-30 | 2012-06-20 | 哈尔滨工程大学 | LFM (linear frequency modulation) signal detecting method under strong interference source environment |
CN103202714A (en) * | 2012-01-16 | 2013-07-17 | 株式会社东芝 | Ultrasonic Diagnostic Apparatus, Medical Image Processing Apparatus, And Medical Image Processing Method |
CN103258204A (en) * | 2012-02-21 | 2013-08-21 | 中国科学院心理研究所 | Automatic micro-expression recognition method based on Gabor features and edge orientation histogram (EOH) features |
CN103336790A (en) * | 2013-06-06 | 2013-10-02 | 湖州师范学院 | Hadoop-based fast neighborhood rough set attribute reduction method |
CN103336791A (en) * | 2013-06-06 | 2013-10-02 | 湖州师范学院 | Hadoop-based fast rough set attribute reduction method |
CN103744928A (en) * | 2013-12-30 | 2014-04-23 | 北京理工大学 | Network video classification method based on historical access records |
CN105758450A (en) * | 2015-12-23 | 2016-07-13 | 西安石油大学 | Fire protection pre-warning sensing system building method based on multiple sensor emergency robots |
CN106202886A (en) * | 2016-06-29 | 2016-12-07 | 中国铁路总公司 | Track circuit red band Fault Locating Method based on fuzzy coarse central Yu decision tree |
CN107194420A (en) * | 2017-05-16 | 2017-09-22 | 浙江象立医疗科技有限公司 | A kind of Fuzzy and Rough concentrates the attribute selection method based on information gain-ratio |
CN107679368A (en) * | 2017-09-11 | 2018-02-09 | 宁夏医科大学 | PET/CT high dimensional feature level systems of selection based on genetic algorithm and varied precision rough set |
CN108389109A (en) * | 2018-02-11 | 2018-08-10 | 中国民航信息网络股份有限公司 | A kind of suspicious order feature extracting method of civil aviaton based on composite character selection algorithm |
CN108334859A (en) * | 2018-02-28 | 2018-07-27 | 上海海洋大学 | A kind of optical remote sensing Warships Model identification crowdsourcing system based on fine granularity feature |
Non-Patent Citations (5)
Title |
---|
LIU JINGHUA: "Online multi-label streaming feature selection based on neighborhood rough set", 《PATTERN RECOGNITION》 * |
刘翠翠: "基于改进邻域粗糙集的肿瘤特征基因选择算法的研究", 《无线互联科技》 * |
王荣荣等: "基于粗糙集和遗传算法的水轮发电机组故障诊断方法", 《中国农村水利水电》 * |
詹蓉等: "个性化需求分类的定量分析研究", 《软科学》 * |
邓大勇等: "多粒度粗糙集的双层绝对约简", 《模式识别与人工智能》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110598192A (en) * | 2019-06-28 | 2019-12-20 | 太原理工大学 | Text feature reduction method based on neighborhood rough set |
CN110464345A (en) * | 2019-08-22 | 2019-11-19 | 北京航空航天大学 | A kind of separate head bioelectrical power signal interference elimination method and system |
CN110988804A (en) * | 2019-11-11 | 2020-04-10 | 浙江大学 | Radar radiation source individual identification system based on radar pulse sequence |
CN110988804B (en) * | 2019-11-11 | 2022-01-25 | 浙江大学 | Radar radiation source individual identification system based on radar pulse sequence |
CN111476455A (en) * | 2020-03-03 | 2020-07-31 | 中国南方电网有限责任公司 | Power grid operation section feature selection and online generation method based on two-stage structure |
CN111553127A (en) * | 2020-04-03 | 2020-08-18 | 河南师范大学 | Multi-label text data feature selection method and device |
CN111553127B (en) * | 2020-04-03 | 2023-11-24 | 河南师范大学 | Multi-label text data feature selection method and device |
CN112200259A (en) * | 2020-10-19 | 2021-01-08 | 哈尔滨理工大学 | Information gain text feature selection method and classification device based on classification and screening |
CN112365992A (en) * | 2020-11-27 | 2021-02-12 | 安徽理工大学 | Medical examination data identification and analysis method based on NRS-LDA |
Also Published As
Publication number | Publication date |
---|---|
CN109934278B (en) | 2023-06-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||