CN108962382B - Hierarchical important feature selection method based on breast cancer clinical high-dimensional data


Info

Publication number
CN108962382B
CN108962382B (application CN201810552686.3A)
Authority
CN
China
Prior art keywords
feature
threshold
value
statistical
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810552686.3A
Other languages
Chinese (zh)
Other versions
CN108962382A (en)
Inventor
付波
刘沛
林劼
郑鸿
邓玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201810552686.3A
Publication of CN108962382A
Application granted
Publication of CN108962382B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H 50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for calculating health indices; for individual health risk assessment
    • G16H 50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hierarchical important feature selection method based on breast cancer clinical high-dimensional data. The method comprises two stages: statistical feature selection and integrated feature selection. Statistical feature selection applies single-factor analysis, using different statistical tests to make a preliminary selection of the features that have a significant influence on the outcome variable. Integrated feature selection builds a gradient boosting tree model, obtains feature importance scores after training, and then applies a designed and validated importance-score threshold to select the features that have an important influence on the outcome variable. The invention effectively addresses the problems of excessively high feature dimensionality, redundant features, and disordered data encountered when building clinical breast cancer prediction models. Redundant or meaningless features in clinical breast cancer high-dimensional data are eliminated, so that as few features as possible, each with an important influence on breast cancer modelling, are selected, preserving both the accuracy and the practicality of the breast cancer model.

Description

Hierarchical important feature selection method based on breast cancer clinical high-dimensional data
Technical Field
The invention relates to the fields of computer technology, statistical machine learning, feature engineering, and related areas.
Background
Breast cancer is the malignant tumor with the highest incidence among women worldwide and seriously threatens women's health. Breast cancer patients are usually treated with surgery and chemotherapy, and they remain at risk of recurrence at any time after treatment. Scientific evaluation and prediction of the survival state of breast cancer patients can help doctors make appropriate treatment plans, and it provides new support for reducing patients' recurrence risk and improving prognosis.
Estimating and predicting the survival state of a breast cancer patient, for example the recurrence-free survival rate, can be achieved by building a machine learning prediction model on breast cancer clinical data. However, the quality of the clinical data largely determines the performance of the prediction model. In the real world, clinical data of breast cancer patients generally include the patient's basic information, diagnosis history, pathology, surgery, chemotherapy, radiotherapy, endocrine therapy, targeted therapy, and so on. These data features are high-dimensional and commonly suffer from missing, abnormal, duplicated, and inconsistent values, so raw real-world clinical data must be cleaned to ensure data quality.
Data cleaning alone, however, cannot solve the high dimensionality of breast cancer clinical data. Feature engineering and dimensionality reduction of the high-dimensional feature data are therefore necessary, mainly for the following two reasons:
(1) Practicality of the prediction model. After the prediction model is embedded in a breast cancer patient prognosis evaluation system, a doctor or patient must input the information the prediction requires. That information enters the prediction model as input feature values, and the system then makes an effective prediction from it. Too many input features consume the patient's or doctor's effort and time, which greatly reduces the practicality of the prediction model.
(2) Performance of the prediction model. Feature engineering identifies and removes unneeded, irrelevant, and redundant attributes that do not improve, and may actually degrade, the performance of the prediction model. In practice fewer features are preferable, because they reduce the complexity of the model, and a simpler model is easier to understand and interpret.
Therefore, to construct a practical and high-performance prediction model, the key is to apply feature engineering to the clinical high-dimensional data so as to screen out the features that have an important influence on recurrence-free survival of breast cancer, thereby assisting doctors in diagnosis, reducing patients' recurrence risk, and improving prognosis.
High-dimensional data feature selection methods can generally be divided into the following:
(1) Single-factor analysis. Each factor is analysed separately, and a statistical test determines whether it has a significant influence on the target variable. This method can only exclude a few clearly irrelevant features and ignores interactions between features.
(2) Feature importance analysis. A base learner (such as CART or a random forest) is fitted to the training data to obtain an importance score for each feature, and features with an importance score of 0 are excluded. This method can eliminate irrelevant features, but the dimensionality of the finally selected feature set remains high; it cannot reduce the feature dimensionality as far as possible.
(3) Recursive feature elimination, proposed by Guyon et al. Building on feature importance analysis, this method recursively eliminates the less important features one by one, evaluates the base learner on each new feature set, and recomputes every feature's importance score as the basis for the next elimination. The best-performing feature set is finally selected. On real high-dimensional data this method demands substantial computing resources and time, and the choice of base learner and the instability of the importance scores often strongly influence the result.
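A minimal sketch of recursive feature elimination (method (3) above) using scikit-learn's RFE may make the procedure concrete; the synthetic dataset, the random-forest base learner, and the target of 5 retained features are illustrative assumptions, not part of the invention.

```python
# Hedged sketch of recursive feature elimination with scikit-learn's RFE.
# Dataset, base learner and n_features_to_select are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
# Drop the least important feature at each step, refitting the learner
# on the surviving set each time, as described above.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5, step=1)
selector.fit(X, y)
kept = selector.support_  # boolean mask of the surviving feature set
```

As the text notes, each elimination step refits the base learner, which is what makes this approach expensive on genuinely high-dimensional clinical data.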
A high-dimensional feature selection method should eliminate redundant or irrelevant features while preserving model performance and keeping time complexity acceptable, reducing the number of finally selected features as far as possible. How to select important features from high-dimensional data is therefore a problem that researchers at home and abroad must think about carefully.
Disclosure of Invention
The invention aims to solve the problem of excessively high dimensionality of clinical data when establishing a breast cancer survival prediction model. A hierarchical feature selection method combining statistical feature selection with integrated feature selection addresses both the extraction of important features and the practicality of the model.
The hierarchical important feature selection method based on breast cancer clinical high-dimensional data disclosed by the invention comprises the following steps:
Statistical feature selection processing:
Perform feature extraction on the original clinical data and clean it to obtain the original feature set F_n;
Compute the significance value of each feature F_i in the original feature set F_n;
The features F_i whose significance value is less than a preset threshold form the statistical feature set F_m.
Integrated feature selection processing:
Obtain the importance-score mean \overline{Score}_i of each feature F_i in the statistical feature set F_m: set different random-number seeds; for each seed, establish a gradient boosting tree model on the statistical feature set F_m and output the importance score Score_i of each feature F_i under the current seed; average Score_i over all seeds to obtain each feature's importance-score mean \overline{Score}_i;
Based on a preset importance-score threshold, the features F_i of the statistical feature set F_m whose importance-score mean \overline{Score}_i is greater than the importance-score threshold form the important feature set F_e.
Further, the significance value of feature F_i is computed as follows:
Different measures are adopted according to the attribute type of feature F_i.
For a feature F_i whose attribute is a categorical variable, first judge whether F_i is an ordered or an unordered categorical variable. If F_i is an ordered categorical variable, compute its significance value (p-value) with the Mann-Whitney U test; if F_i is an unordered categorical variable, compute its significance value with the chi-square test.
For a feature F_i whose attribute is a continuous variable, first use the KS test (Kolmogorov-Smirnov test) to check whether the distribution of F_i follows a normal distribution. If it does, compute its significance value with the independent-samples t-test; otherwise compute its significance value with the Mann-Whitney U test.
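The test dispatch described above can be sketched as a single function using SciPy; the variable-type labels ('ordinal', 'nominal', 'continuous'), the binary outcome y, and the KS-based normality check on standardized values are assumptions made for illustration.

```python
# Sketch of the per-feature significance-value dispatch described above.
# The `kind` labels and binary outcome encoding are illustrative assumptions.
import numpy as np
from scipy import stats

def significance_value(x, y, kind):
    """p-value of feature x against a binary outcome y (0/1).
    kind: 'ordinal', 'nominal', or 'continuous'."""
    g0, g1 = x[y == 0], x[y == 1]
    if kind == 'ordinal':
        return stats.mannwhitneyu(g0, g1).pvalue
    if kind == 'nominal':
        # contingency table: outcome classes x category values
        table = np.array([[np.sum((x == v) & (y == c))
                           for v in np.unique(x)] for c in (0, 1)])
        return stats.chi2_contingency(table)[1]
    # continuous: KS check on standardized values decides between
    # the independent-samples t-test and the Mann-Whitney U test
    z = (x - x.mean()) / x.std()
    if stats.kstest(z, 'norm').pvalue > 0.05:
        return stats.ttest_ind(g0, g1).pvalue
    return stats.mannwhitneyu(g0, g1).pvalue
```

A feature would then be kept in the statistical feature set when this p-value falls below the preset threshold (0.05 in the embodiment).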
Further, a preferred way to set the importance-score threshold is as follows:
Set the initial threshold to 0 and adopt backward feature selection, increasing the threshold step by step to obtain the feature set under each candidate threshold. For each such feature set, establish a gradient boosting tree model and obtain its evaluation index value on a test set. Among all feature sets whose evaluation index value is within an acceptable range of the maximum, select the threshold corresponding to the feature set with the fewest features as the feature importance-score threshold.
The method of the invention makes full use of hierarchical feature selection, screening layer by layer, and selects a combination of important features containing as few features as possible without affecting the performance of the breast cancer model. The method has the following advantages:
(1) Statistical feature selection finds the single-dimensional features that have a significant influence on the outcome variable, eliminating the influence of significantly irrelevant single features on the performance of the final prediction model;
(2) The gradient boosting tree serves as the base learner and handles interactions among multi-dimensional data features well, so the probability space of the data features is fully learned and the accuracy of the importance scores is ensured;
(3) The importance-score mean is obtained over repeated trials, which shields the scores from chance random-seed events in machine learning and ensures their reliability and stability;
(4) The importance-score threshold is chosen selectively rather than by eliminating features one by one, reducing the time and computing resources consumed by feature selection;
(5) The simplest feature set is selected within an acceptable range of model-performance loss, ensuring both the performance and the practicality of the resulting prediction model.
The invention therefore has clear advantages and broad application scenarios.
Drawings
FIG. 1 is a basic process flow diagram of the present invention;
FIG. 2 is a flow chart of statistical feature selection in accordance with the present invention;
FIG. 3 is a flow chart of the integrated feature selection of the present invention;
FIG. 4 is a schematic diagram of threshold setting for integrated feature selection;
FIG. 5 is a schematic diagram of an implementation process of an application of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings.
Referring to FIG. 1, the hierarchical important feature selection method oriented to breast cancer clinical high-dimensional data comprises statistical feature calculation, integrated feature calculation, and the threshold-setting mode involved in integrated feature calculation. The invention uses a hierarchical feature selection method combining statistical feature selection and integrated feature selection to effectively solve the problems of important-feature extraction, model practicality, and the like. The specific implementation process is as follows:
S1: statistical feature selection.
Perform feature extraction on the original clinical data and clean it to obtain the original feature set F_n; compute the significance value of each feature F_i (the subscript i is the dimension identifier) in the original feature set F_n; the features F_i whose significance value is less than a preset threshold constitute the statistical feature set F_m. Referring to FIG. 2, the process is performed as follows:
S101: extract the features of the breast cancer clinical data and clean the data to obtain the original feature set F_n. Traverse each feature F_i of F_n and judge whether F_i belongs to the categorical variables or the continuous variables; if it is a categorical variable, execute step S102; if it is a continuous variable, execute step S104.
S102: if feature F_i is a categorical variable, judge whether it is an ordered or an unordered categorical variable.
S103: if feature F_i is an ordered categorical variable, compute its p-value with the Mann-Whitney U test; if it is an unordered categorical variable, compute its p-value with the chi-square test. Then jump to S106.
S104: if feature F_i is a continuous variable, apply the KS test to check whether its distribution follows a normal distribution.
S105: if the distribution follows a normal distribution (e.g., p > 0.05 is taken as following the normal distribution), compute the p-value with the independent-samples t-test; otherwise compute the p-value with the Mann-Whitney U test.
S106: if the p-value of feature F_i under its statistical test is less than 0.05, add F_i to the selected feature set, i.e. the statistical feature set F_m, where F_m is initially an empty set.
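Steps S101 to S106 amount to one filtering pass over the raw feature set, which can be sketched as follows; the function name, the `raw_features` mapping, and the injected `p_value_of` test (standing in for the type-dependent tests of S103/S105) are hypothetical.

```python
# Hypothetical sketch of steps S101-S106 as a single filter pass.
# `p_value_of(values, y, kind)` stands in for the type-dependent
# significance test described in the text (Mann-Whitney U, chi-square,
# or independent-samples t-test); its name and signature are assumptions.
def select_statistical_features(raw_features, y, p_value_of, alpha=0.05):
    """Return the statistical feature set F_m: names of features whose
    significance value (p-value) is below alpha (0.05 in the embodiment).
    raw_features maps feature name -> (values, kind)."""
    f_m = []  # F_m starts as an empty set (S106)
    for name, (values, kind) in raw_features.items():
        if p_value_of(values, y, kind) < alpha:  # S103/S105 -> S106
            f_m.append(name)
    return f_m
```

The statistical pass is deliberately cheap: each feature is tested once against the outcome, and only the survivors move on to the integrated selection stage.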
S2: and (4) integrating feature selection.
For the obtained statistical feature set F_m, a gradient boosting tree learning method is applied to further screen the important features. Referring to FIG. 3, the implementation process is as follows:
S201: score the importance of the statistical feature set F_m:
Establish a gradient boosting tree model using the training data containing the statistical feature set F_m. Through model parameter tuning and training, output the importance score Score_i of each feature in the statistical feature set F_m.
S202: obtain the importance-score mean \overline{Score}_i: set different random-number seeds and repeat the experiment of step S201 T times (T is set to 100 in the specific embodiment); finally, average the T experimental results to obtain the importance-score mean \overline{Score}_i of each feature in the statistical feature set F_m.
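Step S202 can be sketched with scikit-learn's gradient boosting classifier; the toy dataset and T = 5 repetitions (the embodiment uses 100) are illustrative assumptions.

```python
# Sketch of S201-S202: repeat the gradient-boosting fit under T different
# random seeds and average the importance scores to damp run-to-run
# variation. The dataset and T = 5 are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
T = 5  # the specific embodiment uses T = 100
scores = np.zeros(X.shape[1])
for seed in range(T):
    model = GradientBoostingClassifier(random_state=seed)
    scores += model.fit(X, y).feature_importances_
mean_scores = scores / T  # importance-score mean of each feature in F_m
```

Averaging over seeds is what gives the importance scores the stability claimed in advantage (3): a single fit's ranking can shift with the seed, but the mean is far less sensitive.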
S203: setting a feature importance score threshold:
Sort the features (elements) of the statistical feature set F_m by importance-score mean from small to large to obtain the initial candidate feature set F_h; then apply a backward feature selection method to the initial candidate feature set F_h to obtain the feature importance-score threshold. Referring to FIG. 4, the implementation process is as follows:
(1) Set the initial threshold threshold_0 to 0.
(2) Set a random or fixed step size for the threshold increase (observing the importance-score means) to obtain the threshold threshold_d of each step and the candidate feature set F_hd under it, where threshold_d = threshold_{d-1} + step, threshold_0 = 0, and the step identifier d starts at 1. The candidate feature set F_hd is the result of screening the initial candidate feature set F_h with threshold_d: if the importance-score mean \overline{Score}_i of feature F_i in the initial candidate feature set F_h is greater than threshold_d, feature F_i is retained; otherwise F_i is removed from the set F_h. The screened result is the candidate feature set F_hd.
(3) Update the step identifier d to d + 1, and continue computing threshold_d and the candidate feature set F_hd until a preset maximum number of steps is reached (set to 10 in this embodiment). The end condition of this step can also be that the current candidate feature set F_hd is an empty set, or that threshold_d is equal to or greater than the importance-score mean of the last feature in the initial candidate feature set F_h.
(4) For the non-empty candidate feature sets F_h1, F_h2, … obtained in the above steps, establish a gradient boosting tree model using each contained candidate feature set F_hj, where the subscript j is the identifier of a non-empty candidate feature set.
(5) Tune the parameters of and train each gradient boosting tree model to obtain its evaluation index value V_j on an independent test set; the evaluation index is set according to the actual demand.
(6) Finally select the feature importance-score threshold threshold_{j*}, whose subscript j* satisfies:

j* = argmin_{ j : V_j >= max_k V_k - Δ } |F_hj|

where Δ denotes a preset deviation threshold chosen according to the actual situation. That is, among all candidate feature sets whose evaluation index value is within the acceptable range Δ of the maximum, the threshold threshold_{j*} corresponding to the feature set with the smallest number of features |F_hj| is taken as the final feature importance-score threshold (i.e., the threshold t shown in FIG. 1).
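Steps (1) to (6) can be sketched end to end as follows; the dataset, the use of accuracy as the evaluation index V_j, the fixed step size, and the deviation threshold Δ are all illustrative assumptions.

```python
# Sketch of S203 steps (1)-(6): step the threshold up from 0, build a model
# on each non-empty candidate set F_hd, score it on a held-out test set, and
# keep the smallest set whose score is within delta of the best. Dataset,
# fixed step size and delta are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# stand-in for the seed-averaged importance means of S202
mean_scores = GradientBoostingClassifier(random_state=0).fit(
    X_tr, y_tr).feature_importances_

step, delta = 0.02, 0.03
results = []                                   # (threshold_d, F_hd, V_d)
for d in range(1, 11):                         # preset maximum of 10 steps
    thr = d * step                             # threshold_d
    kept = np.where(mean_scores > thr)[0]      # candidate set F_hd
    if kept.size == 0:                         # optional stop condition
        break
    m = GradientBoostingClassifier(random_state=0).fit(X_tr[:, kept], y_tr)
    results.append((thr, kept, m.score(X_te[:, kept], y_te)))

v_max = max(v for _, _, v in results)
# among the sets within delta of the best score, take the fewest features
thr_star, kept_star, _ = min((r for r in results if r[2] >= v_max - delta),
                             key=lambda r: r[1].size)
```

Because the threshold steps over whole candidate sets instead of eliminating features one at a time, only about ten models are fitted, which is the time saving claimed in advantage (4).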
S204: an important feature is selected.
The features in the statistical feature set F_m whose importance-score mean is greater than or equal to the threshold form the important feature set F_e.
The feature selection method of the invention is applied in a breast cancer prediction system; a schematic diagram of a specific application is shown in FIG. 5, comprising a training stage and a prediction stage. The training process is as follows: in the data preprocessing module, historical data of breast cancer patients are extracted and sorted, then divided into demographic features, diagnostic features, pathological features, and treatment features. All features are input into the statistical feature selection module, which preliminarily screens out the features that are not statistically significant. The screened statistical feature data are then input into the integrated feature selection module, which eliminates the features below a threshold that has been set, through repeated trials, parameter tuning, and performance comparison, together with the feature evaluation scores. This yields the final features (important features) with strong statistical and model discrimination capability, achieving dimensionality reduction. The dimension-reduced features serve as input to construct the breast cancer prediction machine learning model.
In the prediction stage, for a given patient (prediction object), feature data corresponding to the important features screened in the training stage are extracted from the patient's breast cancer clinical data and input into the breast cancer prediction model, and the patient's disease state is output based on the prediction result.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (4)

1. A hierarchical important feature selection method based on breast cancer clinical high-dimensional data, characterized by comprising the following steps:
statistical feature selection processing:
performing feature extraction on the raw clinical data, cleaning, extracting and sorting the data, and dividing it into demographic features, diagnostic features, pathological features and treatment features to obtain the original feature set F_n;
computing the significance value of each feature F_i in the original feature set F_n, wherein the significance value of feature F_i is computed as follows:
different measures are adopted according to the attribute type of feature F_i;
for a feature F_i whose attribute is a categorical variable, first judging whether F_i is an ordered or an unordered categorical variable; if F_i is an ordered categorical variable, computing the significance value of feature F_i with the Mann-Whitney U test; if F_i is an unordered categorical variable, computing the significance value of feature F_i with the chi-square test;
for a feature F_i whose attribute is a continuous variable, first using the KS test to check whether the distribution of F_i follows a normal distribution; if it does, computing the significance value of feature F_i with the independent-samples t-test; otherwise, computing the significance value of feature F_i with the Mann-Whitney U test;
the features F_i whose significance value is less than a preset threshold forming the statistical feature set F_m;
integrated feature selection processing:
obtaining the importance-score mean \overline{Score}_i of each feature F_i in the statistical feature set F_m: setting different random-number seeds; for each seed, establishing a gradient boosting tree model on the statistical feature set F_m and outputting the importance score Score_i of each feature F_i under the current seed; averaging Score_i over all seeds to obtain the importance-score mean \overline{Score}_i of each feature F_i;
based on a preset importance-score threshold, the features F_i of the statistical feature set F_m whose importance-score mean \overline{Score}_i is greater than the importance-score threshold forming the important feature set F_e, the important feature set F_e serving as input to construct a breast cancer prediction machine learning model;
wherein the importance-score threshold is set as follows:
setting the initial threshold to 0 and adopting backward feature selection, gradually increasing the threshold to obtain the feature set under each corresponding threshold; establishing a gradient boosting tree model for the feature set corresponding to each threshold to obtain the evaluation index value of the model on a test set; and, among all feature sets whose evaluation index value is within an acceptable range of the maximum evaluation index value, selecting the threshold corresponding to the feature set with the fewest features as the feature importance-score threshold.
2. The method of claim 1, wherein the importance-score threshold is preferably set by:
setting the value of the random or fixed step size step of the threshold increase, and initializing threshold_0 = 0 and the step identifier d = 1;
computing the threshold of step d as threshold_d = threshold_{d-1} + step; the features F_i of the statistical feature set F_m whose importance-score mean \overline{Score}_i is greater than threshold_d forming the candidate feature set F_hd of step d;
increasing the step identifier d by 1, and continuing to compute threshold_d and the candidate feature set F_hd until d reaches a preset maximum number of steps, obtaining a number of non-empty candidate feature sets F_h1, F_h2, …;
for each obtained non-empty candidate feature set F_hj, establishing a gradient boosting tree model using the candidate feature set F_hj, and obtaining the evaluation index value V_j of the gradient boosting tree model on an independent test set, wherein the subscript j is the identifier of a non-empty candidate feature set;
according to the formula

j* = argmin_{ j : V_j >= max_k V_k - Δ } |F_hj|

selecting from all threshold_j the threshold_{j*} of the corresponding identifier j* as the importance-score threshold, wherein Δ denotes a preset deviation threshold and |F_hj| denotes the number of features of the candidate feature set F_hj.
3. The method of claim 2, wherein the condition for stopping the computation of threshold_d and the candidate feature set is replaced by: the current candidate feature set F_hd is an empty set, or the current threshold_d is equal to or greater than the maximum importance-score mean in the statistical feature set F_m.
4. The method of claim 2, wherein screening the candidate feature set of step d comprises first sorting the features of the statistical feature set F_m by importance-score mean from small to large to form the initial candidate feature set F_h, and then screening the initial candidate feature set F_h to obtain the candidate feature set of step d.
CN201810552686.3A 2018-05-31 2018-05-31 Hierarchical important feature selection method based on breast cancer clinical high-dimensional data Active CN108962382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810552686.3A CN108962382B (en) 2018-05-31 2018-05-31 Hierarchical important feature selection method based on breast cancer clinical high-dimensional data


Publications (2)

Publication Number Publication Date
CN108962382A CN108962382A (en) 2018-12-07
CN108962382B true CN108962382B (en) 2022-05-03

Family

ID=64492813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810552686.3A Active CN108962382B (en) 2018-05-31 2018-05-31 Hierarchical important feature selection method based on breast cancer clinical high-dimensional data

Country Status (1)

Country Link
CN (1) CN108962382B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383766A (en) * 2018-12-28 2020-07-07 中山大学肿瘤防治中心 Computer data processing method, device, medium and electronic equipment
CN110363333A (en) * 2019-06-21 2019-10-22 南京航空航天大学 The prediction technique of air transit ability under the influence of a kind of weather based on progressive gradient regression tree
CN112183758A (en) * 2019-07-04 2021-01-05 华为技术有限公司 Method and device for realizing model training and computer storage medium
CN112309571B (en) * 2020-10-30 2022-04-15 电子科技大学 Screening method of prognosis quantitative characteristics of digital pathological image

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999760A (en) * 2012-09-28 2013-03-27 常州工学院 Target image area tracking method for on-line self-adaptive adjustment of voting weight
CN106650314A (en) * 2016-11-25 2017-05-10 中南大学 Method and system for predicting amino acid mutation
CN107316205A (en) * 2017-05-27 2017-11-03 银联智惠信息服务(上海)有限公司 Recognize humanized method, device, computer-readable medium and the system of holding
CN107729915A (en) * 2017-09-08 2018-02-23 第四范式(北京)技术有限公司 For the method and system for the key character for determining machine learning sample
CN107909433A (en) * 2017-11-14 2018-04-13 重庆邮电大学 A kind of Method of Commodity Recommendation based on big data mobile e-business

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7693865B2 (en) * 2006-08-30 2010-04-06 Yahoo! Inc. Techniques for navigational query identification
CN107256245B (en) * 2017-06-02 2020-05-05 河海大学 Offline model improvement and selection method for spam message classification
CN114298323A (en) * 2017-09-08 2022-04-08 第四范式(北京)技术有限公司 Method and system for generating combined features of machine learning samples
CN107944913B (en) * 2017-11-21 2022-03-22 重庆邮电大学 High-potential user purchase intention prediction method based on big data user behavior analysis


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers;Zhibin Xiao 等;《ISPRS International Journal of Geo-Information》;20170228;1-22 *
Clinical comparative study of retroperitoneal laparoscopic partial nephrectomy for central versus peripheral renal tumors; Du Jian; China Master's Theses Full-text Database, Medicine and Health Sciences; 20180115; E072-1507 *
Short-term rainfall prediction model based on ensemble learning and deep learning; Guan Pengzhou et al.; Selected Award-winning Papers of the 2017 (5th) National College Student Statistical Modeling Competition; 20171205; 1-22 *


Similar Documents

Publication Publication Date Title
CN108962382B (en) Hierarchical important feature selection method based on breast cancer clinical high-dimensional data
CN109272048B (en) Pattern recognition method based on deep convolutional neural network
Rajini et al. Computer aided detection of ischemic stroke using segmentation and texture features
CN104424386A (en) Multi-parameter magnetic resonance image based prostate cancer computer auxiliary identification system
Merjulah et al. Classification of myocardial ischemia in delayed contrast enhancement using machine learning
CN110859624A (en) Brain age deep learning prediction system based on structural magnetic resonance image
CN112530592A (en) Non-small cell lung cancer risk prediction method based on machine learning
CN107545133A (en) A kind of Gaussian Blur cluster calculation method for antidiastole chronic bronchitis
Zhu et al. DSNN: a DenseNet-based SNN for explainable brain disease classification
Xiang et al. A novel weight pruning strategy for light weight neural networks with application to the diagnosis of skin disease
Hu et al. A GLCM embedded CNN strategy for computer-aided diagnosis in intracerebral hemorrhage
Syed et al. Detection of tumor in MRI images using artificial neural networks
He et al. Quantification of cognitive function in Alzheimer’s disease based on deep learning
JP7413295B2 (en) Image processing device, image processing method and program
Gull et al. A deep learning approach for multi‐stage classification of brain tumor through magnetic resonance images
WO2023198224A1 (en) Method for constructing magnetic resonance image preliminary screening model for mental disorders
CN113113143A (en) Myocardial infarction risk degree evaluation system considering delayed enhancement nuclear magnetic image
González et al. Deep convolutional neural network to predict 1p19q co-deletion and IDH1 mutation status from MRI in low grade gliomas
CN115067887A (en) Alzheimer disease prediction platform based on structural brain connection group method
Sünkel et al. Hybrid quantum machine learning assisted classification of COVID-19 from computed tomography scans
CN114373096A (en) Pulmonary nodule benign and malignant prediction system and method based on multi-feature fusion
CN109875522A (en) A method of prediction prostate biopsy and root value criterion pathological score consistency
Ali et al. Fuzzy classifier for classification of medical data
WO2022210473A1 (en) Prognosis prediction device, prognosis prediction method, and program
CN116487038B (en) Prediction system and storage medium for progression of mild cognitive impairment to Alzheimer's disease

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant