CN109934278A - High-dimensional feature selection method combining information gain and a neighborhood rough set - Google Patents

High-dimensional feature selection method combining information gain and a neighborhood rough set

Info

Publication number
CN109934278A
CN109934278A (application CN201910168981.3A)
Authority
CN
China
Prior art keywords
attribute
information gain
feature
red
reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910168981.3A
Other languages
Chinese (zh)
Other versions
CN109934278B (en)
Inventor
陆惠玲
周涛
张飞飞
梁蒙蒙
杨健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningxia Medical University
Original Assignee
Ningxia Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningxia Medical University filed Critical Ningxia Medical University
Priority to CN201910168981.3A priority Critical patent/CN109934278B/en
Publication of CN109934278A publication Critical patent/CN109934278A/en
Application granted granted Critical
Publication of CN109934278B publication Critical patent/CN109934278B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a high-dimensional feature selection method combining information gain with a neighborhood rough set. The specific steps are as follows: step 1, data preprocessing; step 2, image segmentation; step 3, feature extraction; step 4, feature normalization; step 5, feature selection based on information gain; step 6, feature selection based on the neighborhood rough set; step 7, classification of the two-stage reduction result. The invention provides a high-dimensional feature selection method combining information gain and a neighborhood rough set and analyzes the feasibility of the two-stage reduction algorithm at a theoretical level. The algorithm improves accuracy and effectively reduces time complexity; a comprehensive comparison against high-dimensional feature selection algorithms built with different methods confirms the superiority of the proposed method, and the step-by-step selection of model components guarantees the soundness of the results. The method has reference value for the benign/malignant identification of lung tumors.

Description

High-dimensional feature selection method combining information gain and a neighborhood rough set
Technical field
The present invention relates to the technical field of image processing, and more particularly to a high-dimensional feature selection method combining information gain and a neighborhood rough set.
Background technique
Information gain (information gain, IG) and rough set (rough set, RS) are two commonly used feature selection algorithms. IG is an index that measures how much information a feature provides to the classifier when it is included versus when it is not: the amount of information each feature provides to the classifier is computed in turn, the features are sorted in descending order, and the top K features are taken according to a certain rule, thereby achieving feature selection with information gain. Feature selection with IG has low computational complexity and requires only a single pass, so it is efficient and can effectively reject redundant, irrelevant and noisy features. However, as a filter algorithm, IG still has problems: it can only examine the contribution of a feature to the whole system rather than to a particular class, and it does not consider the relationships between features; it is therefore only suitable for "global" feature selection (all classes use the same feature set) and cannot perform "local" feature selection (each class has its own feature set, and some features discriminate a certain class well while being meaningless for other classes). RS is an effective tool for handling uncertain data; because it requires no prior knowledge, it has been widely applied in feature selection, pattern recognition, data mining, knowledge discovery and other fields. The two key concepts studied in RS are concept approximation and attribute reduction, where attribute reduction reduces the dimensionality of the attributes without affecting the distinguishability required by the current recognition task; however, RS is built on equivalence relations, which is rather limiting in many practical applications. Therefore, in order to avoid dependence on a single method and to better reject the redundant and irrelevant attributes in a data set, many scholars have combined the global feature selection ability of IG with the superior attribute reduction ability of RS for high-dimensional feature selection, with successful applications to sentiment analysis, real-estate price analysis, tumor diagnosis classification, fishing-condition prediction and so on. However, Pawlak RS can only handle nominal variables, while the data in practical applications are often continuous numeric variables; although a discretized data set suits the construction of RS equivalence classes, important information may be lost and different discretization strategies also affect the reduction result. For this reason, Hu Qinghua et al. introduced neighborhood relations and proposed an improved Pawlak RS, the neighborhood rough set (neighborhood rough set, NRS), which can handle continuous numeric data directly. Although IG and RS can each perform feature selection on their own, both have limitations, so combining their advantages for feature selection is feasible: a highly relevant feature subset is first selected with IG, and the highly redundant attributes are then rejected with NRS, where NRS overcomes the problem that RS is only suitable for discrete variables and causes a large loss of the original information. Obtaining the optimal feature subset through two attribute reductions can better reject the redundant and irrelevant features in the data set, improve the performance of the algorithm, reduce time complexity, and also avoid dependence on a single method.
Therefore, providing a high-dimensional feature selection method that combines information gain with a neighborhood rough set is an urgent problem for those skilled in the art.
Summary of the invention
In view of this, the present invention provides a high-dimensional feature selection method combining information gain and a neighborhood rough set, and analyzes the feasibility of the two-stage reduction algorithm at a theoretical level. Comparison with the no-reduction algorithm and the Pawlak RS, IG and NRS reduction algorithms shows that the algorithm improves accuracy and effectively reduces time complexity; a comprehensive comparison of high-dimensional feature selection algorithms built with different methods confirms the superiority of the proposed method, and the step-by-step selection of model components guarantees the soundness of the results. The method has reference value for the benign/malignant identification of lung tumors.
To achieve the above objectives, the invention provides the following technical scheme:
A high-dimensional feature selection method combining information gain and a neighborhood rough set, the specific steps of which include the following:
Step 1: data preprocessing; the images are numbered in sequence, converted from pseudo-color to grayscale, the ROI region is delineated in the grayscale image, and the ROI image is normalized;
Step 2: image segmentation; the preprocessed ROI image is segmented with the maximum between-class variance method;
Step 3: feature extraction; features are extracted from the segmented target region of the ROI, and a continuous decision information table S0 is constructed;
Step 4: feature normalization; the condition attributes of the continuous decision information table S0 constructed in step 3 are normalized, obtaining a new continuous decision information table S;
Step 5: feature selection based on information gain; the continuous decision information table S from step 4 is taken as input, and the attribute set red1 after information gain reduction is obtained;
Step 6: feature selection based on the neighborhood rough set; the attribute set red1 after information gain reduction is input and, through neighborhood rough set feature selection, the two-stage reduction result red is obtained;
Step 7: classification is performed on the two-stage reduction result.
Preferably, in the above high-dimensional feature selection method combining information gain and a neighborhood rough set, in step 1 all ROI images are normalized to 50 × 50 pixels in order to eliminate errors introduced during ROI acquisition and to facilitate subsequent image processing.
Preferably, in the above high-dimensional feature selection method combining information gain and a neighborhood rough set, in step 2 the ROI image is split at a certain threshold into two groups, one corresponding to the background and one corresponding to the target.
Preferably, in the above high-dimensional feature selection method combining information gain and a neighborhood rough set, the features extracted in step 3 include: shape features, texture features and gray-level features.
Preferably, in the above high-dimensional feature selection method combining information gain and a neighborhood rough set, in step 4 the features extracted from the segmented target region of the ROI are normalized so that all normalized data fall within [0, 1], using the formula
x' = (x − xmin) / (xmax − xmin)
where xmax and xmin denote the maximum and minimum of the sample array, respectively. Only the condition attributes of the continuous decision information table S0 constructed after the feature extraction of step 3 are normalized; the decision attribute is not normalized. The new continuous decision information table S is thus obtained.
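A minimal NumPy sketch of this column-wise min-max normalization, assuming the condition attributes are stored as a 2-D array with one row per sample; the variable and function names are illustrative, not part of the patent:

```python
import numpy as np

def minmax_normalize(X):
    """Scale every condition attribute (column) of X into [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # guard against constant columns
    return (X - x_min) / span

# Usage sketch: normalize a 3000 x 104 condition-attribute matrix; the
# decision attribute (the benign/malignant label) is kept aside untouched.
# X_norm = minmax_normalize(X)
```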
Preferably, in the above high-dimensional feature selection method combining information gain and a neighborhood rough set, the specific steps of step 5 include:
1) Input the continuous decision information table S = (U, A, V, f), where U denotes the universe; A = C ∪ D, C is the condition attribute set, i.e., the normalized set of features extracted from the segmented target region of the ROI, and D is the set of decision attributes; V is the union of the attribute value domains; f is the information function describing the mapping relation;
2) Initialize the attribute set red1 = ∅, compute the information gain Gain(Ci) of each condition attribute, and compute the average value average of the information gains of the condition attributes;
3) Select the attribute ci with the largest information gain, set red1 = red1 ∪ {ci}, and remove ci from the condition attribute set C;
4) If the information gain of the attribute ci with the largest information gain is less than the average value average, stop and output the attribute set red1 of the information gain reduction; otherwise return to step 2).
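A sketch of this information gain stage in Python, using scikit-learn's mutual_info_classif as a stand-in estimator of information gain for continuous attributes and keeping every attribute whose gain is at least the average gain (a simplification of the iterative max-selection loop above); the names are illustrative, not part of the patent:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def ig_select(X, y):
    """Keep the condition attributes whose estimated information gain is
    at least the average gain, ordered from largest gain to smallest."""
    gains = mutual_info_classif(X, y, random_state=0)  # IG proxy for continuous attributes
    average = gains.mean()
    red1 = np.where(gains >= average)[0]
    return red1[np.argsort(-gains[red1])]

# Usage sketch (X_norm: 3000 x 104 normalized features, y in {1, -1}):
# red1 = ig_select(X_norm, y)
# X_red1 = X_norm[:, red1]
```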
Preferably, in the above high-dimensional feature selection method combining information gain and a neighborhood rough set, the specific steps of step 6 include:
1) Input the attribute set red1 of the information gain reduction as the decision table (U, A', V, f), where A' = C' ∪ D and C' is the set of condition attributes from step 5 whose information gain is greater than or equal to the average information gain; determine the neighborhood radius δ; set the significance lower limit to 0.001;
2) Initialize the two-stage reduction set red = ∅ and the sample set smp = U;
3) For each candidate attribute a ∈ C' \ red, compute the positive region POSred∪{a}(D) = { xi ∈ U | δred∪{a}(xi) ⊆ Xj for some decision class Xj }, where the neighborhood δB(xi) = { xj | xj ∈ U, ΔB(xi, xj) ≤ δ }, ΔB is the distance computed over the attributes in B, and N denotes the number of equivalence classes into which the decision attribute D divides the universe U;
4) Select the attribute ak for which the positive region POSred∪{ak}(D) is largest;
5) Compute the attribute significance sig(ak, red, D) = γred∪{ak}(D) − γred(D), where γB(D) = |POSB(D)| / |U| denotes the dependency of the decision attribute D on the attribute subset B;
6) If sig(ak, red, D) is greater than the preset significance lower limit, record k, set red = red ∪ {ak}, update smp, and return to step 2) to continue; otherwise output the reduction result red and terminate.
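A minimal Python sketch of this neighborhood rough set stage, assuming the conventional forward greedy reduction (an attribute is added while its significance exceeds the 0.001 floor, and the loop stops when no attribute clears it); the Euclidean distance for ΔB and the neighborhood radius value are assumptions, and all names are illustrative:

```python
import numpy as np

def positive_region_size(X, y, attrs, delta):
    """Count samples whose delta-neighborhood over `attrs` is pure, i.e.
    lies entirely inside a single decision class (the positive region)."""
    sub = X[:, attrs]
    count = 0
    for i in range(len(sub)):
        dist = np.linalg.norm(sub - sub[i], axis=1)   # Delta_B(x_i, x_j), Euclidean
        neighborhood = dist <= delta                  # delta_B(x_i)
        if np.unique(y[neighborhood]).size == 1:      # neighborhood inside one class
            count += 1
    return count

def nrs_reduce(X, y, candidates, delta=0.15, sig_floor=0.001):
    """Forward greedy neighborhood rough set reduction over `candidates`."""
    n = len(y)
    red, remaining = [], list(candidates)
    gamma_red = 0.0
    while remaining:
        # attribute whose addition maximizes the positive region
        best = max(remaining,
                   key=lambda a: positive_region_size(X, y, red + [a], delta))
        gamma_new = positive_region_size(X, y, red + [best], delta) / n
        if gamma_new - gamma_red <= sig_floor:        # significance below the floor: stop
            break
        red.append(best)
        remaining.remove(best)
        gamma_red = gamma_new
    return red

# Usage sketch (red1 from the information gain stage):
# red = nrs_reduce(X_norm, np.asarray(y), list(red1), delta=0.15, sig_floor=0.001)
```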
It can be seen from the above technical scheme that, compared with the prior art, the present invention provides a high-dimensional feature selection method combining information gain and a neighborhood rough set and analyzes the feasibility of the two-stage reduction algorithm at a theoretical level. The algorithm improves accuracy and effectively reduces time complexity; a comprehensive comparison of high-dimensional feature selection algorithms built with different methods confirms the superiority of the proposed method, and the step-by-step selection of model components guarantees the soundness of the results. The method has reference value for the benign/malignant identification of lung tumors.
Brief description of the drawings
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the invention; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a flowchart of the invention;
Fig. 2 is a schematic diagram of the data preprocessing of the invention;
Fig. 3 is a schematic diagram of the image segmentation of the invention;
Fig. 4 is a flowchart of the feature selection based on the neighborhood rough set of the invention;
Fig. 5 is a histogram comparing the reduction lengths of different algorithms in Experiment 1 of the invention;
Fig. 6 is a histogram comparing the classification accuracy of different algorithms in Experiment 2 of the invention;
Fig. 7 is a histogram comparing the classification time of different algorithms in Experiment 2 of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a high-dimensional feature selection method combining information gain and a neighborhood rough set and analyzes the feasibility of the two-stage reduction algorithm at a theoretical level. Comparison with the no-reduction algorithm and the Pawlak RS, IG and NRS reduction algorithms shows that the algorithm improves accuracy and effectively reduces time complexity; a comprehensive comparison of high-dimensional feature selection algorithms built with different methods confirms the superiority of the proposed method, and the step-by-step selection of model components guarantees the soundness of the results. The method has reference value for the benign/malignant identification of lung tumors.
Embodiment:
(1) data acquisition
The data come from the General Hospital of Ningxia Medical University. Each case includes the clinical diagnosis, image data and examination findings; the clinical diagnosis is the reference standard for benign/malignant lung tumors. To avoid insufficient model training caused by too little data, this study is not limited to a particular type of lung tumor. In total, 3000 lung tumor data were obtained, including 1500 malignant lung tumor CT samples and 1500 benign lung tumor CT samples.
(2) data prediction
Benign and malignant lung tumor CT images are read from the DICOM files according to the examination conclusion in the medical order of each case. The images are numbered in sequence and converted from pseudo-color to grayscale. Centered on the lesion marked by the radiologist, a sub-image with strong discriminative power is intercepted from the grayscale image as the lung tumor ROI, and the ROI image is normalized to 50 × 50 pixels. The data preprocessing process is shown in Fig. 2.
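A sketch of this preprocessing using OpenCV, under the assumption that each slice has already been loaded as a pseudo-color array and that the ROI bounding box around the marked lesion is supplied separately; the function name is illustrative:

```python
import cv2

def preprocess_slice(pseudo_color_img, roi_box, size=(50, 50)):
    """Convert a pseudo-color slice to grayscale, crop the lesion-centered
    ROI, and normalize it to a fixed 50 x 50 pixel patch."""
    gray = cv2.cvtColor(pseudo_color_img, cv2.COLOR_BGR2GRAY)
    x, y, w, h = roi_box                          # bounding box around the marked lesion
    roi = gray[y:y + h, x:x + w]
    return cv2.resize(roi, size, interpolation=cv2.INTER_AREA)
```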
(3) image segmentation
In order to measure the shape, texture and gray-level features of the lung images accurately, the preprocessed ROI is segmented with the maximum between-class variance method (the Otsu algorithm), because the Otsu algorithm is one of the most effective and most stable automatic threshold selection methods and, under certain conditions, is not affected by changes in image contrast and brightness. Its basic principle is to split the ROI image at a certain threshold into two groups, one corresponding to the background and one corresponding to the target. Fig. 3 shows five example pairs before and after segmentation.
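A minimal OpenCV sketch of the Otsu thresholding described above, splitting an 8-bit grayscale ROI patch into background and target groups; the variable and function names are illustrative:

```python
import cv2

def otsu_segment(roi_patch):
    """Split an 8-bit grayscale ROI patch into background (0) and target (255)
    at the threshold chosen automatically by Otsu's method."""
    threshold, mask = cv2.threshold(roi_patch, 0, 255,
                                    cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    target = cv2.bitwise_and(roi_patch, roi_patch, mask=mask)  # keep target pixels only
    return threshold, mask, target
```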
(4) feature extraction
Feature extraction is carried out on the target region obtained after the segmentation of step (3). A total of 104-dimensional features are extracted, including shape features, texture features and gray-level features; the specific features are listed in Table 1. After feature extraction, the continuous decision information table S0 is constructed: it contains 3000 samples, each with 104-dimensional condition attributes and a 1-dimensional decision attribute.
Table 1 CT image features of lung tumors
(5) feature normalization
In order to obtain accurate processing results, the features extracted from the segmented target region of the ROI (i.e., the continuous feature set extracted in step (4)) are normalized to eliminate differences in magnitude and dimension. The invention uses the common min-max method so that the normalized data all fall within [0, 1]:
x' = (x − xmin) / (xmax − xmin)
where xmax and xmin denote the maximum and minimum of the sample array, respectively.
(6) Feature selection based on information gain
Input: the continuous decision information table S = (U, A, V, f), where U = (x1, x2, ..., xn) is the universe, i.e., the set of all samples; A = C ∪ D, where C is the set of condition attributes (the 104-dimensional feature set normalized in step (5)) and D is the set of decision attributes (the benign/malignant label of the lung tumor, with 1 denoting a malignant lung tumor and −1 a benign lung tumor); V is the union of the attribute value domains; f is the information function describing the mapping relation.
Output: the attribute set red1 after information gain reduction.
Steps: 1) Initialize the set red1 = ∅, compute the information gain Gain(Ci) of each condition attribute, and compute the average value average of the information gains of the condition attributes;
2) Select the attribute ci with the largest information gain, set red1 = red1 ∪ {ci}, and remove ci from the condition attribute set C;
3) If the information gain of the attribute ci with the largest information gain is less than the average value average, stop and output red1; otherwise return to step 2).
(7) Feature selection based on the neighborhood rough set
NRS attribute reduction deletes redundant attributes without affecting the decision-making capability of the decision system itself; the reduction algorithm uses a forward greedy strategy, as shown in Fig. 4, and its key steps are as follows:
Input: the attribute set red1 after information gain reduction, as the decision table (U, A', V, f), where A' = C' ∪ D and C' is the set of condition attributes from step (6) whose information gain is greater than or equal to the average information gain; determine the neighborhood radius δ; set the significance lower limit to 0.001.
Output: the two-stage reduction set red.
Steps: 1) Initialize the two-stage reduction set red = ∅ and the sample set smp = U;
2) For each candidate attribute a ∈ C' \ red, compute the positive region POSred∪{a}(D) = { xi ∈ U | δred∪{a}(xi) ⊆ Xj for some decision class Xj }, where the neighborhood δB(xi) = { xj | xj ∈ U, ΔB(xi, xj) ≤ δ }, ΔB is the distance computed over the attributes in B, and N denotes the number of equivalence classes into which the decision attribute D divides the universe U;
3) Select the attribute ak for which the positive region POSred∪{ak}(D) is largest;
4) Compute the attribute significance sig(ak, red, D) = γred∪{ak}(D) − γred(D), where γB(D) = |POSB(D)| / |U| denotes the dependency of the decision attribute D on the attribute subset B;
5) If sig(ak, red, D) is greater than the preset significance lower limit, record k, set red = red ∪ {ak}, update smp, and return to step 2) to continue; otherwise output the reduction result red and terminate.
(8) The support vector machine (SVM) is a machine learning method built on statistical learning theory. Based on the structural risk minimization principle, it handles small-sample, overfitting, high-dimensional and local-extremum problems well, has strong generalization and classification ability, and can effectively address the "nonlinear, high-dimensional" problems encountered in computer-aided diagnosis based on medical images. The SVM is used to classify the two-stage reduction result, where the radial basis function (Radial Basis Function, RBF) is chosen as the kernel and the SVM parameters C and g are optimized by the grid search (Grid Search, GS) algorithm.
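A scikit-learn sketch of this classification stage: an RBF-kernel SVM whose C and gamma are tuned by grid search with five-fold cross-validation on the reduced attribute set; the parameter grid values and names are illustrative assumptions:

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

def train_svm_on_reduct(X, y, red):
    """Grid-search C and gamma of an RBF-kernel SVM on the reduced attributes."""
    param_grid = {"C": [2.0 ** k for k in range(-5, 6)],
                  "gamma": [2.0 ** k for k in range(-8, 3)]}
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="accuracy")
    search.fit(X[:, red], y)
    return search.best_estimator_, search.best_params_

# Usage sketch:
# model, params = train_svm_on_reduct(X_norm, y, red)
```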
The performance evaluation of early-diagnosis accuracy includes the two main indices of sensitivity and specificity, but these two indices alone can hardly give a comprehensive interpretation of the overall performance of a classifier. Therefore, the evaluation index used for the reduction model is the reduction length, and the evaluation indices used for the classification model include: accuracy (Accuracy), sensitivity (Sensitivity), specificity (Specificity), F value (F-score), Matthews correlation coefficient (Matthews correlation coefficient, MCC), balanced F score (balanced F score, F1 score), Youden index (Youden index, YI) and algorithm time (Time).
Accuracy (Accuracy) is the most common evaluation index; the higher the accuracy, the better the classifier. It is calculated as:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Sensitivity (sensitivity) and specificity (specificity) measure the classifier's ability to recognize positive and negative examples, respectively; the larger the value, the better the recognition performance. They are calculated as:
Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP)
The F value is the weighted harmonic mean of recall and precision and is used to balance precision and recall.
MCC is the correlation coefficient between the actual and the predicted classification. It takes true positives, true negatives, false positives and false negatives into account and is a relatively balanced index. Its value ranges over [−1, 1]; the closer the value is to 1, the more accurate the prediction for the tested objects. It is calculated as:
MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
The F1 score is a comprehensive index used in statistics to measure the accuracy of a binary classification model; it is a weighted average of precision and recall. Its value ranges over [0, 1]; the closer the value is to 1, the higher the accuracy of the model. It is calculated as:
F1 = 2 · Precision · Recall / (Precision + Recall)
YI, also known as the correctness index, is the sum of sensitivity and specificity minus 1. Its value ranges over [0, 1]; the closer the value is to 1, the better the authenticity of the model prediction. It is calculated as:
YI=Sensitivity+Specificity-1
Algorithm time (Time) denotes the time taken by the algorithm from the start of its execution to its end.
Here TP denotes the number of samples correctly classified as positive, i.e., samples that are actually positive and classified as positive by the classifier; FP denotes the number of samples incorrectly classified as positive, i.e., samples that are actually negative but classified as positive; FN denotes the number of samples incorrectly classified as negative, i.e., samples that are actually positive but classified as negative; TN denotes the number of samples correctly classified as negative, i.e., samples that are actually negative and classified as negative.
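A small Python sketch computing the listed indices from the TP/FP/FN/TN counts defined above, using their standard formulas; the function name is illustrative:

```python
import math

def classification_indices(tp, fp, fn, tn):
    """Compute the evaluation indices above from the 2 x 2 confusion counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)                    # recall on positive (malignant) cases
    specificity = tn / (tn + fp)                    # recall on negative (benign) cases
    precision   = tp / (tp + fp)
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
    mcc         = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    youden      = sensitivity + specificity - 1
    return {"Accuracy": accuracy, "Sensitivity": sensitivity,
            "Specificity": specificity, "F1": f1, "MCC": mcc, "Youden": youden}
```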
Experimental result and analysis
From a theoretical point of view, the feature subset obtained after the two-stage IG and NRS reduction effectively reduces the dimensionality of the original decision information table and lowers time and space complexity. The IG stage preliminarily screens out data noise and rejects attributes with low relevance, and the second NRS reduction effectively rejects highly redundant attributes. In order to further verify the feasibility and effectiveness of the proposed two-stage-reduction high-dimensional feature selection algorithm, 3000 lung tumor CT images (1500 benign and 1500 malignant) are taken as the research objects; after obtaining the ROI, shape, texture and gray-level features totalling 104 dimensions are extracted to construct the original feature set, two-stage reduction is carried out with IG and NRS, and the reduction result is classified with SVM.
Experiment 1: comparison of the reduction results of different algorithms
The original decision information table is reduced with different algorithms; the concrete results are shown in Fig. 5. As can be seen from Fig. 5, the dimensionality of the information table after reduction with any of the algorithms is greatly reduced compared with no reduction; the reduction length of the algorithm of the invention is only larger than that of the NRS algorithm and is 65 dimensions smaller than the dimensionality of the original information table.
Experiment 2: comparison of the classification results of different algorithms
The reduction results of the different algorithms in Experiment 1 are classified with SVM using five-fold cross-validation (each time 300 samples are taken from the 1500 benign and the 1500 malignant samples respectively as the test set, and the remaining 1200 of each serve as the training set). The superiority of the algorithms is evaluated with eight indices: accuracy, sensitivity, specificity, F value, MCC, F1 score, Youden index and total time; for each index, the average of the five-fold cross-validation results of each algorithm is taken as the final evaluation result. The concrete results are shown in Table 2:
Table 2 Comparison of the classification results of different algorithms
As can be seen from Table 2, the evaluation indices of the same algorithm differ across cross-validation folds; in order to measure the performance of an algorithm comprehensively, the average over the five folds is taken as its final classification result. Except in comparison with Pawlak RS-SVM, the sensitivity of the algorithm of the invention is slightly lower than that of the other algorithms, while all other indices are better: accuracy, specificity, F value, MCC, F1 score and Youden index are improved by 0.17%–0.84%, 0.67%–1.4%, 0.0015–0.0081, 0.0035–0.0169, 0.0017–0.0083 and 0.003–0.0167 respectively, and the time is reduced by 8.06 s–203.81 s. Since accuracy and time are the most common evaluation indices, the averages of these two indices are drawn as histograms in Fig. 6 and Fig. 7 to show the differences between the algorithms more clearly. As can be seen from Fig. 6 and Fig. 7, the accuracy of the algorithm of the invention is the highest and that of the Pawlak RS-SVM model is the lowest. This is because Pawlak RS is built on equivalence relations and can only handle nominal variables: numeric data must be discretized, which not only increases the time complexity of the algorithm but also loses important information, different discretization methods also affect the final result, and the discretized feature set cannot fully characterize the lung tumor ROI. The time complexity of the algorithm of the invention is 4.27 times lower than that without reduction and is also lower than that of the other algorithms. It can be seen that the algorithm of the invention can improve the accuracy of high-dimensional feature selection for lung tumors, effectively reduce the time complexity of the algorithm, and has certain promotion value.
In order to improve the performance of computer-aided diagnosis of lung tumors, the advantages and disadvantages of IG and NRS are analyzed, a high-dimensional feature selection algorithm for lung tumors combining IG and NRS is proposed, and the feasibility of the two-stage reduction algorithm is analyzed at a theoretical level. To verify the effectiveness of the algorithm, 104-dimensional features of 3000 lung tumor CT images are extracted to construct the decision information table, the optimal feature subset is obtained through two-stage attribute reduction with IG and NRS, and finally classification is carried out with SVM. Comparison with the no-reduction algorithm and the Pawlak RS, IG and NRS reduction algorithms shows that the algorithm improves accuracy and effectively reduces time complexity; a comprehensive comparison of lung tumor high-dimensional feature selection algorithms built with different methods confirms the superiority of the method of the invention, and the step-by-step selection of model components guarantees the soundness of the results. The method has reference value for the computer-aided diagnosis of lung tumors.
Each embodiment in this specification is described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. The device disclosed in an embodiment corresponds to the method disclosed in the embodiment, so its description is relatively simple; for relevant details, reference is made to the description of the method part.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. A high-dimensional feature selection method combining information gain and a neighborhood rough set, characterized in that the specific steps include the following:
Step 1: data preprocessing; the images are numbered in sequence, converted from pseudo-color to grayscale, the ROI region is delineated in the grayscale image, and the ROI image is normalized;
Step 2: image segmentation; the preprocessed ROI image is segmented with the maximum between-class variance method to obtain a background region image and a target region image;
Step 3: feature extraction; features are extracted from the segmented target region of the ROI, and a continuous decision information table S0 is constructed;
Step 4: feature normalization; the continuous decision information table S0 constructed in step 3 is normalized, wherein only the condition attributes of S0 are normalized, obtaining the continuous decision information table S;
Step 5: feature selection based on information gain; the continuous decision information table S from step 4 is taken as input, feature selection is carried out, and the attribute set red1 after information gain reduction is obtained;
Step 6: feature selection based on the neighborhood rough set; the attribute set red1 after information gain reduction is input and, through neighborhood rough set feature selection, the two-stage reduction result red is obtained;
Step 7: classification is performed on the two-stage reduction result.
2. The high-dimensional feature selection method combining information gain and a neighborhood rough set according to claim 1, characterized in that the features extracted in step 3 include: shape features, texture features and gray-level features.
3. The high-dimensional feature selection method combining information gain and a neighborhood rough set according to claim 1, characterized in that in step 4 the features extracted from the segmented target region of the ROI are normalized so that all normalized data fall within [0, 1], using the formula
x' = (x − xmin) / (xmax − xmin)
where xmax and xmin denote the maximum and minimum of the sample array, respectively.
4. The high-dimensional feature selection method combining information gain and a neighborhood rough set according to claim 1, characterized in that the specific steps of step 5 include:
1) Input the continuous decision information table S = (U, A, V, f), where U denotes the universe; A = C ∪ D, C denotes the condition attribute set, i.e., the normalized set of features extracted from the segmented target region of the ROI, and D denotes the set of decision attributes; V denotes the union of the attribute value domains; f denotes the information function describing the mapping relation;
2) Initialize the attribute set red1 = ∅, compute the information gain Gain(Ci) of each condition attribute, and compute the average value average of the information gains of the condition attributes;
3) Select the attribute ci with the largest information gain, set red1 = red1 ∪ {ci}, and remove ci from the condition attribute set C;
4) If the information gain of the attribute ci with the largest information gain is less than the average value average, stop and output the attribute set red1 of the information gain reduction; otherwise return to step 2).
5. The high-dimensional feature selection method combining information gain and a neighborhood rough set according to claim 4, characterized in that the specific steps of step 6 include:
1) Input the attribute set red1 of the information gain reduction as the decision table (U, A', V, f), where A' = C' ∪ D and C' denotes the set of condition attributes from step 5 whose information gain is greater than or equal to the average information gain; determine the neighborhood radius δ; set the significance lower limit to 0.001;
2) Initialize the two-stage reduction set red = ∅ and the sample set smp = U;
3) For each candidate attribute a ∈ C' \ red, compute the positive region POSred∪{a}(D) = { xi ∈ U | δred∪{a}(xi) ⊆ Xj for some decision class Xj }, where the neighborhood δB(xi) = { xj | xj ∈ U, ΔB(xi, xj) ≤ δ }, ΔB is the distance computed over the attributes in B, and N denotes the number of equivalence classes into which the decision attribute D divides the universe U;
4) Select the attribute ak for which the positive region POSred∪{ak}(D) is largest;
5) Compute the attribute significance sig(ak, red, D) = γred∪{ak}(D) − γred(D), where γB(D) = |POSB(D)| / |U| denotes the dependency of the decision attribute D on the attribute subset B;
6) If sig(ak, red, D) is greater than the preset significance lower limit, record k, set red = red ∪ {ak}, update smp, and return to step 2) to continue; otherwise output the reduction result red and terminate.
CN201910168981.3A 2019-03-06 2019-03-06 High-dimensionality feature selection method for information gain mixed neighborhood rough set Active CN109934278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910168981.3A CN109934278B (en) 2019-03-06 2019-03-06 High-dimensionality feature selection method for information gain mixed neighborhood rough set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910168981.3A CN109934278B (en) 2019-03-06 2019-03-06 High-dimensionality feature selection method for information gain mixed neighborhood rough set

Publications (2)

Publication Number Publication Date
CN109934278A true CN109934278A (en) 2019-06-25
CN109934278B CN109934278B (en) 2023-06-27

Family

ID=66986458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910168981.3A Active CN109934278B (en) 2019-03-06 2019-03-06 High-dimensionality feature selection method for information gain mixed neighborhood rough set

Country Status (1)

Country Link
CN (1) CN109934278B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110464345A (en) * 2019-08-22 2019-11-19 北京航空航天大学 A kind of separate head bioelectrical power signal interference elimination method and system
CN110598192A (en) * 2019-06-28 2019-12-20 太原理工大学 Text feature reduction method based on neighborhood rough set
CN110988804A (en) * 2019-11-11 2020-04-10 浙江大学 Radar radiation source individual identification system based on radar pulse sequence
CN111476455A (en) * 2020-03-03 2020-07-31 中国南方电网有限责任公司 Power grid operation section feature selection and online generation method based on two-stage structure
CN111553127A (en) * 2020-04-03 2020-08-18 河南师范大学 Multi-label text data feature selection method and device
CN112200259A (en) * 2020-10-19 2021-01-08 哈尔滨理工大学 Information gain text feature selection method and classification device based on classification and screening
CN112365992A (en) * 2020-11-27 2021-02-12 安徽理工大学 Medical examination data identification and analysis method based on NRS-LDA

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004114650A2 (en) * 2003-06-16 2004-12-29 Hewlett Packard Development Company, L.P. Systems and methods for dot gain determination and dot gain based printing
CN101923604A (en) * 2010-07-23 2010-12-22 福建师范大学 Classification method for weighted KNN oncogene expression profiles based on neighborhood rough set
CN102510363A (en) * 2011-09-30 2012-06-20 哈尔滨工程大学 LFM (linear frequency modulation) signal detecting method under strong interference source environment
CN102755172A (en) * 2011-04-28 2012-10-31 株式会社东芝 Nuclear medical imaging method and device
CN103202714A (en) * 2012-01-16 2013-07-17 株式会社东芝 Ultrasonic Diagnostic Apparatus, Medical Image Processing Apparatus, And Medical Image Processing Method
CN103258204A (en) * 2012-02-21 2013-08-21 中国科学院心理研究所 Automatic micro-expression recognition method based on Gabor features and edge orientation histogram (EOH) features
CN103336790A (en) * 2013-06-06 2013-10-02 湖州师范学院 Hadoop-based fast neighborhood rough set attribute reduction method
CN103336791A (en) * 2013-06-06 2013-10-02 湖州师范学院 Hadoop-based fast rough set attribute reduction method
CN103744928A (en) * 2013-12-30 2014-04-23 北京理工大学 Network video classification method based on historical access records
US20140213466A1 (en) * 2010-11-19 2014-07-31 Rutgers, The State University Of New Jersey High-throughput assessment method for contact hypersensitivity
CN105758450A (en) * 2015-12-23 2016-07-13 西安石油大学 Fire protection pre-warning sensing system building method based on multiple sensor emergency robots
CN106202886A (en) * 2016-06-29 2016-12-07 中国铁路总公司 Track circuit red band Fault Locating Method based on fuzzy coarse central Yu decision tree
CN107194420A (en) * 2017-05-16 2017-09-22 浙江象立医疗科技有限公司 A kind of Fuzzy and Rough concentrates the attribute selection method based on information gain-ratio
CN107679368A (en) * 2017-09-11 2018-02-09 宁夏医科大学 PET/CT high dimensional feature level systems of selection based on genetic algorithm and varied precision rough set
CN108334859A (en) * 2018-02-28 2018-07-27 上海海洋大学 A kind of optical remote sensing Warships Model identification crowdsourcing system based on fine granularity feature
CN108389109A (en) * 2018-02-11 2018-08-10 中国民航信息网络股份有限公司 A kind of suspicious order feature extracting method of civil aviaton based on composite character selection algorithm

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004114650A2 (en) * 2003-06-16 2004-12-29 Hewlett Packard Development Company, L.P. Systems and methods for dot gain determination and dot gain based printing
CN101923604A (en) * 2010-07-23 2010-12-22 福建师范大学 Classification method for weighted KNN oncogene expression profiles based on neighborhood rough set
US20140213466A1 (en) * 2010-11-19 2014-07-31 Rutgers, The State University Of New Jersey High-throughput assessment method for contact hypersensitivity
CN102755172A (en) * 2011-04-28 2012-10-31 株式会社东芝 Nuclear medical imaging method and device
CN102510363A (en) * 2011-09-30 2012-06-20 哈尔滨工程大学 LFM (linear frequency modulation) signal detecting method under strong interference source environment
CN103202714A (en) * 2012-01-16 2013-07-17 株式会社东芝 Ultrasonic Diagnostic Apparatus, Medical Image Processing Apparatus, And Medical Image Processing Method
CN103258204A (en) * 2012-02-21 2013-08-21 中国科学院心理研究所 Automatic micro-expression recognition method based on Gabor features and edge orientation histogram (EOH) features
CN103336790A (en) * 2013-06-06 2013-10-02 湖州师范学院 Hadoop-based fast neighborhood rough set attribute reduction method
CN103336791A (en) * 2013-06-06 2013-10-02 湖州师范学院 Hadoop-based fast rough set attribute reduction method
CN103744928A (en) * 2013-12-30 2014-04-23 北京理工大学 Network video classification method based on historical access records
CN105758450A (en) * 2015-12-23 2016-07-13 西安石油大学 Fire protection pre-warning sensing system building method based on multiple sensor emergency robots
CN106202886A (en) * 2016-06-29 2016-12-07 中国铁路总公司 Track circuit red band Fault Locating Method based on fuzzy coarse central Yu decision tree
CN107194420A (en) * 2017-05-16 2017-09-22 浙江象立医疗科技有限公司 A kind of Fuzzy and Rough concentrates the attribute selection method based on information gain-ratio
CN107679368A (en) * 2017-09-11 2018-02-09 宁夏医科大学 PET/CT high dimensional feature level systems of selection based on genetic algorithm and varied precision rough set
CN108389109A (en) * 2018-02-11 2018-08-10 中国民航信息网络股份有限公司 A kind of suspicious order feature extracting method of civil aviaton based on composite character selection algorithm
CN108334859A (en) * 2018-02-28 2018-07-27 上海海洋大学 A kind of optical remote sensing Warships Model identification crowdsourcing system based on fine granularity feature

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LIU JINGHUA: "Online multi-label streaming feature selection based on neighborhood rough set", Pattern Recognition *
LIU CUICUI: "Research on a tumor feature gene selection algorithm based on an improved neighborhood rough set", Wireless Internet Technology *
WANG RONGRONG et al.: "Fault diagnosis method for hydro-generator units based on rough set and genetic algorithm", China Rural Water and Hydropower *
ZHAN RONG et al.: "Quantitative analysis of personalized requirement classification", Soft Science *
DENG DAYONG et al.: "Double-layer absolute reduction of multi-granulation rough sets", Pattern Recognition and Artificial Intelligence *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598192A (en) * 2019-06-28 2019-12-20 太原理工大学 Text feature reduction method based on neighborhood rough set
CN110464345A (en) * 2019-08-22 2019-11-19 北京航空航天大学 A kind of separate head bioelectrical power signal interference elimination method and system
CN110988804A (en) * 2019-11-11 2020-04-10 浙江大学 Radar radiation source individual identification system based on radar pulse sequence
CN110988804B (en) * 2019-11-11 2022-01-25 浙江大学 Radar radiation source individual identification system based on radar pulse sequence
CN111476455A (en) * 2020-03-03 2020-07-31 中国南方电网有限责任公司 Power grid operation section feature selection and online generation method based on two-stage structure
CN111553127A (en) * 2020-04-03 2020-08-18 河南师范大学 Multi-label text data feature selection method and device
CN111553127B (en) * 2020-04-03 2023-11-24 河南师范大学 Multi-label text data feature selection method and device
CN112200259A (en) * 2020-10-19 2021-01-08 哈尔滨理工大学 Information gain text feature selection method and classification device based on classification and screening
CN112365992A (en) * 2020-11-27 2021-02-12 安徽理工大学 Medical examination data identification and analysis method based on NRS-LDA

Also Published As

Publication number Publication date
CN109934278B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
Arunkumar et al. Fully automatic model‐based segmentation and classification approach for MRI brain tumor using artificial neural networks
CN109934278A (en) A kind of high-dimensional feature selection method of information gain mixing neighborhood rough set
Carvalho et al. Breast cancer diagnosis from histopathological images using textural features and CBIR
Farid et al. A novel approach of CT images feature analysis and prediction to screen for corona virus disease (COVID-19)
CN108364006B (en) Medical image classification device based on multi-mode deep learning and construction method thereof
Lee et al. Random forest based lung nodule classification aided by clustering
de Carvalho Filho et al. Automatic detection of solitary lung nodules using quality threshold clustering, genetic algorithm and diversity index
Bridge et al. Introducing the GEV activation function for highly unbalanced data to develop COVID-19 diagnostic models
Orozco et al. Lung nodule classification in CT thorax images using support vector machines
Kundu et al. An automatic bleeding frame and region detection scheme for wireless capsule endoscopy videos based on interplane intensity variation profile in normalized RGB color space
de Sousa Costa et al. Classification of malignant and benign lung nodules using taxonomic diversity index and phylogenetic distance
CN109978880A (en) Lung tumors CT image is carried out sentencing method for distinguishing using high dimensional feature selection
Borkowski et al. Comparing artificial intelligence platforms for histopathologic cancer diagnosis
Dong et al. Cervical cell classification based on the CART feature selection algorithm
Buda et al. Deep radiogenomics of lower-grade gliomas: convolutional neural networks predict tumor genomic subtypes using MR images
Yuan et al. An efficient multi-path 3D convolutional neural network for false-positive reduction of pulmonary nodule detection
Sethanan et al. Double AMIS-ensemble deep learning for skin cancer classification
Diniz et al. An ensemble method for nuclei detection of overlapping cervical cells
Kumar et al. Recent advances in machine learning for diagnosis of lung disease: A broad view
Vogado et al. A ensemble methodology for automatic classification of chest X-rays using deep learning
Ganeshkumar et al. Two-stage deep learning model for automate detection and classification of lung diseases
Singh et al. Detection of Brain Tumors Through the Application of Deep Learning and Machine Learning Models
Grace John et al. Extreme learning machine algorithm‐based model for lung cancer classification from histopathological real‐time images
Kaur et al. A survey on medical image segmentation
Hu et al. Classification of malignant-benign pulmonary nodules in lung CT images using an improved random forest (Use style: Paper title)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant