CN108962382A - Hierarchical important feature selection method based on breast cancer clinical high-dimensional data - Google Patents

Hierarchical important feature selection method based on breast cancer clinical high-dimensional data

Info

Publication number
CN108962382A
Authority
CN
China
Prior art keywords
feature
value
threshold
threshold value
selection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810552686.3A
Other languages
Chinese (zh)
Other versions
CN108962382B (en)
Inventor
付波
刘沛
林劼
郑鸿
邓玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201810552686.3A priority Critical patent/CN108962382B/en
Publication of CN108962382A publication Critical patent/CN108962382A/en
Application granted granted Critical
Publication of CN108962382B publication Critical patent/CN108962382B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H 50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for calculating health indices; for individual health risk assessment
    • G16H 50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hierarchical important feature selection method based on breast cancer clinical high-dimensional data. The feature selection method of the invention comprises statistical feature selection and ensemble feature selection. Statistical feature selection uses univariate analysis: different statistical tests preliminarily select the features that have a significant effect on the outcome variable. Ensemble feature selection builds a gradient boosting tree model, obtains feature importance scores after model training, and then applies a designed and validated importance score threshold to select the features that have a major influence on the outcome variable. The invention effectively overcomes problems such as excessively high feature dimensionality, excessive redundant features, and messy data in clinical breast cancer prediction modeling. Redundant or meaningless features in the clinical breast cancer high-dimensional data can be excluded, so that as few features as possible, each with a major influence on breast cancer, are selected for modeling, ensuring the accuracy and practicability of the breast cancer model.

Description

Hierarchical important feature selection method based on breast cancer clinical high-dimensional data
Technical field
The present invention relates to the fields of computer technology, statistical machine learning, and feature engineering.
Background technique
Breast cancer is the malignant tumor with the highest incidence among women worldwide and seriously threatens women's health. Breast cancer patients can usually be treated with measures such as surgery and chemotherapy, but they may face the risk of recurrence at any time after treatment. Scientifically assessing and predicting the survival of breast cancer patients can help doctors formulate appropriate treatment plans, providing new support for reducing the risk of recurrence and improving prognosis.
Assessing and predicting the survival of breast cancer patients, for example the recurrence-free survival rate, can be achieved by building a machine learning prediction model on breast cancer clinical data. However, the quality of the clinical data largely determines the performance of the prediction model. In the real world, the clinical data of breast cancer patients generally include basic patient information, diagnosis history, pathology, surgery, chemotherapy, radiotherapy, endocrine therapy, and targeted therapy. These data have high feature dimensionality and often contain missing, abnormal, duplicated, and inconsistent values, so the original real-world clinical data must be cleaned to ensure data quality.
Data cleaning alone cannot solve the high dimensionality of breast cancer clinical data. Feature engineering and dimensionality reduction of the high-dimensional feature data are therefore necessary, mainly in the following two respects:
(1) Practicability of the prediction model. After the prediction model is embedded in a breast cancer prognosis system, doctors or patients need to enter the necessary information for prediction. This information enters the prediction model as input feature values, and only then can the system make effective predictions from the input. If there are too many input features, entering them consumes the patient's or doctor's energy and time, which greatly reduces the practicability of the prediction model.
(2) Performance of the prediction model. In practice, feature engineering is used to identify and remove unneeded, irrelevant, and redundant attributes; such attributes do not improve the performance of the prediction model and may in fact degrade it. In practical problems we prefer fewer features, because fewer features reduce the complexity of the model, and a simpler model is easier to understand and explain.
Therefore, to build a practical and high-performance prediction model, the key is to apply feature engineering to the clinical high-dimensional data in order to screen out the features that have a major influence on breast cancer recurrence-free survival, thereby assisting diagnosis, reducing the patient's risk of recurrence, and improving prognosis.
High-dimensional feature selection methods can be broadly divided into the following categories:
(1) Univariate analysis. Each factor is analyzed individually, and statistical tests determine whether the factor has a significant effect on the target variable. This method can only exclude a small number of irrelevant features and ignores interactions between features.
(2) Feature importance analysis. A base learner (such as CART or random forest) is fitted to the training data to obtain an importance score for each feature, and features with an importance score of 0 are excluded. This method can exclude irrelevant features, but the dimensionality of the finally selected feature set is often still high, so it cannot reduce the feature dimensionality as much as possible.
(3) Recursive feature elimination, proposed by Guyon et al. On the basis of feature importance analysis, this method recursively eliminates the least important features one by one, repeatedly evaluates the base learner on the new feature set, and recomputes the importance score of each feature as the basis for the next elimination; the best-performing feature set is finally selected. Under real high-dimensional data scenarios this method is demanding in computing resources and time, and the choice of base learner and the instability of the feature importance scores often have a significant impact on the result.
A high-dimensional feature selection method is required to exclude redundant or irrelevant features and reduce the number of finally selected features as much as possible, under the conditions of preserving model performance and acceptable time complexity. How to select important features from high-dimensional data is therefore a problem that researchers at home and abroad need to focus on.
Summary of the invention
The object of the present invention is to address the problem of excessively high dimensionality of clinical data when building a breast cancer survival prediction model. A hierarchical feature selection method combining statistical feature selection and ensemble feature selection is used to solve the problems of important feature extraction and model practicability.
The hierarchical important feature selection method based on breast cancer clinical high-dimensional data of the present invention comprises the following steps:
Statistical feature selection processing:
Perform feature extraction and cleaning on the original clinical data to obtain the original feature set F_n;
Compute the significance value of each feature F_i in the original feature set F_n;
Form the statistical feature set F_m from the features F_i whose significance value is less than a preset threshold.
Ensemble feature selection processing:
Obtain the mean importance score of each feature F_i in the statistical feature set F_m: set different random number seeds; for each seed, select training data containing the statistical feature set F_m, build a gradient boosting tree model, and output the importance score Score_i of each feature F_i in F_m under the current random number seed; average the importance scores Score_i over all random number seeds to obtain the mean importance score of each feature F_i.
Based on a preset importance score threshold, form the important feature set F_e from the features F_i in the statistical feature set F_m whose mean importance score is greater than the importance score threshold.
Further, the significance value of a feature F_i is computed as follows:
Compute the significance value of feature F_i using different test forms depending on its attribute type;
For a feature F_i whose attribute is a categorical variable, first judge whether F_i is an ordered or an unordered categorical variable; if F_i is an ordered categorical variable, compute the significance value (p value) of F_i with the Mann-Whitney U test; if F_i is an unordered categorical variable, compute the significance value of F_i with the chi-square test;
For a feature F_i whose attribute is a continuous variable, first use the Kolmogorov-Smirnov (KS) test to check whether the distribution of F_i is normal; if it follows a normal distribution, compute the significance value of F_i with the independent-samples T test; otherwise, compute the significance value of F_i with the Mann-Whitney U test.
Further, the importance score threshold is preferably set as follows:
Set the initial threshold to 0 and gradually, selectively increase it using a backward feature selection method, obtaining the feature set corresponding to each threshold. For the feature set corresponding to each threshold, build a gradient boosting tree model and obtain the model's evaluation metric value on a test set. Among all corresponding feature sets whose difference from the maximum evaluation metric value is within an acceptable range, choose the threshold corresponding to the feature set with the fewest features as the feature importance score threshold.
The method of the present invention performs hierarchical feature selection, screening layer by layer. Without affecting the performance of the breast cancer model, it selects an important feature combination containing as few features as possible. The method has the following advantages:
(1) Statistical feature selection finds the individual features that have a significant effect on the outcome variable, eliminating the possible influence of clearly irrelevant single features on the final prediction model performance;
(2) Gradient boosting trees are used as the base learner, which handles the interactions between multi-dimensional data features well, fully learns the probability space of the data features, and ensures the accuracy of the importance scores;
(3) The mean importance score is computed from repeated runs, which shields the influence of accidental random seed choices in machine learning and ensures the reliability and stability of the importance scores;
(4) The importance score threshold is chosen selectively rather than by eliminating features one by one, which reduces the time and computing resources consumed by feature selection;
(5) The simplest feature set within an acceptable range of model performance loss is selected, which ensures the performance and practicability of the constructed prediction model.
Therefore, the present invention has obvious advantages and a wide range of applicable scenarios.
Description of the drawings
Fig. 1 is the basic processing flowchart of the present invention;
Fig. 2 is the statistical feature selection flowchart of the present invention;
Fig. 3 is the ensemble feature selection flowchart of the present invention;
Fig. 4 is a schematic diagram of threshold setting in ensemble feature selection;
Fig. 5 is a schematic diagram of the implementation process of an application of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments and the accompanying drawings.
Referring to Fig. 1, the hierarchical important feature selection method for breast cancer clinical high-dimensional data of the present invention includes statistical feature computation, ensemble feature computation, and the related threshold setting within the ensemble feature computation. The hierarchical feature selection method combining statistical feature selection and ensemble feature selection effectively solves problems such as important feature extraction and model practicability. The specific implementation process is as follows:
S1: Statistical feature selection.
Perform feature extraction and cleaning on the original clinical data to obtain the original feature set F_n; compute the significance value of each feature F_i (the subscript i is the dimension index) in F_n, and form the statistical feature set F_m from the features F_i whose significance value is less than a preset threshold. Referring to Fig. 2, the implementation process is as follows:
S101: Perform feature extraction and cleaning on the breast cancer clinical data to obtain the original feature set F_n. Traverse each feature F_i in F_n and judge its attribute type, i.e., whether F_i is a categorical variable or a continuous variable. If it is a categorical variable, execute step S102; if it is a continuous variable, execute step S104.
S102: If feature F_i is a categorical variable, further judge whether it is an ordered or an unordered categorical variable.
S103: If F_i is an ordered categorical variable, compute its p value with the Mann-Whitney U test; if F_i is an unordered categorical variable, compute its p value with the chi-square test. Then jump to step S106.
S104: If F_i is a continuous variable, use the KS test to check whether its distribution is normal.
S105: If it follows a normal distribution (e.g., p > 0.05 in the KS test), compute the p value with the independent-samples T test; otherwise, compute the p value with the Mann-Whitney U test.
S106: If the statistical test p value of feature F_i is less than 0.05, add F_i to the selected feature set, i.e., the statistical feature set F_m, whose initial value is the empty set.
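As an illustration of steps S101 to S106, the sketch below dispatches each feature to the corresponding test with SciPy, assuming a binary outcome y (for example recurrence versus none), a pandas DataFrame of cleaned features, and caller-supplied column lists; the function name, its parameters, and the standardization used before the KS normality check are illustrative assumptions rather than details taken from the patent.

```python
# A minimal sketch of steps S101-S106, assuming a binary outcome y (e.g. recurrence vs. none),
# a pandas DataFrame X of cleaned clinical features, and caller-supplied column lists.
# Ordered categorical columns are assumed to be numerically encoded; the names
# `categorical_cols`, `ordered_cols`, and `significance_level` are illustrative.
import pandas as pd
from scipy import stats

def statistical_feature_selection(X: pd.DataFrame, y: pd.Series,
                                  categorical_cols, ordered_cols,
                                  significance_level: float = 0.05):
    selected = []                          # the statistical feature set F_m starts empty
    g0, g1 = X[y == 0], X[y == 1]          # the two outcome groups
    for col in X.columns:                  # traverse every feature F_i in F_n
        if col in categorical_cols:
            if col in ordered_cols:        # ordered categorical -> Mann-Whitney U test
                _, p = stats.mannwhitneyu(g0[col].dropna(), g1[col].dropna())
            else:                          # unordered categorical -> chi-square test
                table = pd.crosstab(X[col], y)
                _, p, _, _ = stats.chi2_contingency(table)
        else:                              # continuous: KS normality check first
            x = X[col].dropna()
            z = (x - x.mean()) / x.std()
            _, p_norm = stats.kstest(z, "norm")
            if p_norm > 0.05:              # treated as normal -> independent-samples t-test
                _, p = stats.ttest_ind(g0[col].dropna(), g1[col].dropna())
            else:                          # otherwise Mann-Whitney U test
                _, p = stats.mannwhitneyu(g0[col].dropna(), g1[col].dropna())
        if p < significance_level:         # keep F_i only when it is statistically significant
            selected.append(col)
    return selected
```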
S2: Ensemble feature selection.
For the obtained statistical feature set F_m, gradient boosting tree learning is used to further screen the important features. Referring to Fig. 3, the implementation process is as follows:
S201: Score the importance of the statistical feature set F_m:
Using training data containing the statistical feature set F_m, build a gradient boosting tree model. After model parameter tuning and training, output the importance score Score_i of each feature in F_m.
S202: Obtain the mean importance scores:
Set different random number seeds and repeat the experiment of step S201 T times (T is set to 100 in this embodiment); finally, average the T results to obtain the mean importance score of each feature in the statistical feature set F_m.
S203: Set the feature importance score threshold:
The features (elements) of the statistical feature set F_m are sorted in ascending order of mean importance score to form the initial candidate feature set F_h; a backward feature selection method is then applied to F_h to obtain the feature importance score threshold. Referring to Fig. 4, the process is as follows:
(1) Set the initial threshold threshold_0 to 0.
(2) Set the variable width or fixed step size step by which the threshold grows (chosen by inspecting the mean importance scores), and obtain the candidate feature set F_hd under the threshold threshold_d of each step, where threshold_d = threshold_{d-1} + step, threshold_0 = 0, and the step index d is initialized to 1. The candidate feature set F_hd is obtained by screening the initial candidate feature set F_h with threshold_d: if the mean importance score of a feature F_i in F_h is greater than threshold_d, F_i is kept; otherwise F_i is removed from F_h. The screened result is the candidate feature set F_hd.
(3) Update the step index d = d + 1 and continue to compute threshold_d and the candidate feature set F_hd until the preset maximum number of steps is reached (10 in this embodiment). The termination condition of this step may also be that the current candidate feature set F_hd is empty, or that threshold_d is equal to or greater than the largest mean importance score in the initial candidate feature set F_h.
(4) For the non-empty candidate feature sets F_h1, F_h2, ... obtained in the above steps, build a gradient boosting tree model using training data containing the candidate feature set F_hj, where the subscript j indexes the non-empty candidate feature sets.
(5) Tune the parameters of each gradient boosting tree model and train it, and obtain the model's evaluation metric value V_j on an independent test set; the evaluation metric is chosen according to actual requirements.
(6) The index j* of the finally chosen feature importance score threshold threshold_{j*} satisfies
j* = argmin_j { |F_hj| : max_k V_k - V_j ≤ Δ },
where Δ denotes a preset deviation threshold chosen according to the actual situation. That is, among all candidate feature sets whose evaluation metric value differs from the maximum evaluation metric value by at most the acceptable range Δ, the one with the smallest number of features |F_hj| is selected, and its corresponding threshold threshold_{j*} is taken as the final feature importance score threshold threshold (the threshold t shown in Fig. 1).
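The threshold search in steps (1) through (6) can be sketched as follows, assuming the mean importance scores from S202, scikit-learn's GradientBoostingClassifier as the gradient boosting tree model, accuracy as the evaluation metric, and illustrative values for step, the maximum number of steps, and Δ; none of these concrete choices come from the patent.

```python
# A minimal sketch of the backward threshold selection in S203.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

def select_importance_threshold(feature_names, mean_scores,
                                X_train, y_train, X_test, y_test,
                                step: float = 0.005, max_steps: int = 10,
                                delta: float = 0.01):
    candidates = []                                   # (threshold_d, candidate set F_hd)
    for d in range(max_steps + 1):
        thr = d * step                                # threshold_d = threshold_{d-1} + step
        subset = [f for f, s in zip(feature_names, mean_scores) if s > thr]
        if not subset:                                # stop once the candidate set is empty
            break
        candidates.append((thr, subset))

    results = []                                      # (threshold, |F_hj|, V_j)
    for thr, subset in candidates:
        model = GradientBoostingClassifier(random_state=0)
        model.fit(X_train[subset], y_train)           # train on data restricted to F_hj
        v = accuracy_score(y_test, model.predict(X_test[subset]))
        results.append((thr, len(subset), v))

    v_max = max(v for _, _, v in results)
    # among the sets within delta of the best metric, pick the one with the fewest features
    acceptable = [r for r in results if v_max - r[2] <= delta]
    best_thr, _, _ = min(acceptable, key=lambda r: r[1])
    return best_thr
```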
S204: Select the important features.
Form the important feature set F_e from the features in the statistical feature set F_m whose mean importance score is greater than or equal to the threshold threshold.
The feature selection method of the present invention is applied in a breast cancer prediction system; Fig. 5 is a schematic diagram of the concrete application process, which includes a training stage and a prediction stage. The training process is as follows: in the data preprocessing module, the historical data of breast cancer patients are extracted and organized into demographic features, diagnostic features, pathological features, and treatment features. All of these features are input into the statistical feature selection module, which preliminarily screens out the features that have no statistical significance. The statistically selected feature data are then input into the ensemble feature selection module, which, through repeated trials, parameter tuning, and performance comparison, sets a threshold and feature evaluation scores that meet the requirements and removes the features below the threshold. This yields the final features (important features) with strong statistical and model discriminative power, achieving the purpose of dimensionality reduction. With the reduced features as input, the breast cancer prediction machine learning model is built.
In the prediction stage, for a given patient (the prediction object), the values of the important features screened out in the training stage are extracted from the patient's breast cancer clinical data and input into the breast cancer prediction model, which outputs the patient's disease status based on the prediction result.
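The training and prediction stages described above could be wired together roughly as follows, reusing the helper functions sketched earlier in this description; the function names and the final GradientBoostingClassifier settings are illustrative assumptions, not the patent's reference implementation.

```python
# A rough sketch of the Fig. 5 workflow, assuming the helpers sketched above
# (statistical_feature_selection, mean_importance_scores, select_importance_threshold)
# and pandas DataFrames of cleaned clinical data.
from sklearn.ensemble import GradientBoostingClassifier

def train_breast_cancer_model(X, y, X_test, y_test, categorical_cols, ordered_cols):
    # Stage 1: statistical feature selection (F_n -> F_m)
    f_m = statistical_feature_selection(X, y, categorical_cols, ordered_cols)
    # Stage 2: ensemble feature selection (F_m -> F_e)
    scores = mean_importance_scores(X[f_m], y)
    thr = select_importance_threshold(f_m, scores, X, y, X_test, y_test)
    f_e = [f for f, s in zip(f_m, scores) if s >= thr]
    # Build the final breast cancer prediction model on the reduced feature set F_e
    model = GradientBoostingClassifier(random_state=0).fit(X[f_e], y)
    return model, f_e

def predict_patient(model, f_e, patient_row):
    """patient_row: one-row DataFrame holding the patient's clinical data."""
    return model.predict(patient_row[f_e])[0]
```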
The above is only a specific embodiment of the present invention. Unless specifically stated otherwise, any feature disclosed in this specification may be replaced by an alternative feature that is equivalent or serves a similar purpose; all of the disclosed features, or all of the steps in the methods or processes, may be combined in any manner, except for mutually exclusive features and/or steps.

Claims (6)

1. A hierarchical important feature selection method based on breast cancer clinical high-dimensional data, characterized by comprising the following steps:
statistical feature selection processing:
performing feature extraction and cleaning on original clinical data to obtain an original feature set F_n;
computing a significance value of each feature F_i in the original feature set F_n;
forming a statistical feature set F_m from the features F_i whose significance value is less than a preset threshold;
ensemble feature selection processing:
obtaining a mean importance score of each feature F_i in the statistical feature set F_m: setting different random number seeds; for each random number seed, selecting training data containing the statistical feature set F_m, building a gradient boosting tree model, and outputting the importance score Score_i of each feature F_i in F_m under the current random number seed; and averaging the importance scores Score_i over all random number seeds to obtain the mean importance score of each feature F_i;
based on a preset importance score threshold, forming an important feature set F_e from the features F_i in the statistical feature set F_m whose mean importance score is greater than the importance score threshold.
2. The method according to claim 1, characterized in that the significance value of a feature F_i is computed as follows:
computing the significance value of the feature F_i using different test forms depending on its attribute type;
for a feature F_i whose attribute is a categorical variable, first judging whether F_i is an ordered or an unordered categorical variable; if F_i is an ordered categorical variable, computing the significance value of F_i with the Mann-Whitney U test; if F_i is an unordered categorical variable, computing the significance value of F_i with the chi-square test;
for a feature F_i whose attribute is a continuous variable, first using the KS test to check whether the distribution of F_i is normal; if it follows a normal distribution, computing the significance value of F_i with the independent-samples T test; otherwise, computing the significance value of F_i with the Mann-Whitney U test.
3. The method according to claim 1 or 2, characterized in that the importance score threshold is preferably set as follows:
setting an initial threshold to 0 and gradually, selectively increasing the threshold using a backward feature selection method to obtain the feature set under each threshold; for the feature set corresponding to each threshold, building a gradient boosting tree model and obtaining the model's evaluation metric value on a test set; and among all corresponding feature sets whose difference from the maximum evaluation metric value is within an acceptable range, taking the threshold corresponding to the feature set with the fewest features as the feature importance score threshold.
4. The method according to claim 3, characterized in that the importance score threshold is preferably set as follows:
setting the value step of the variable width or fixed step size by which the threshold grows, and initializing threshold_0 = 0 and the step index d = 1;
computing the threshold of step d as threshold_d = threshold_{d-1} + step, where step denotes the value of the variable width or fixed step size of threshold growth, and forming the candidate feature set F_hd of step d from the features F_i in the statistical feature set F_m whose mean importance score is greater than threshold_d;
incrementing the step index d by 1 and continuing to compute threshold_d and the candidate feature set F_hd until d reaches a preset maximum number of steps;
for each obtained non-empty candidate feature set F_hj, building a gradient boosting tree model using training data containing F_hj, and obtaining the evaluation metric value V_j of the gradient boosting tree model on an independent test set, where the subscript j indexes the non-empty candidate feature sets;
selecting, according to the formula j* = argmin_j { |F_hj| : max_k V_k - V_j ≤ Δ }, the threshold threshold_{j*} corresponding to the index j* from all threshold_j as the importance score threshold, where Δ denotes a preset deviation threshold and |F_hj| denotes the number of features in the candidate feature set F_hj.
5. The method according to claim 4, characterized in that the termination condition for stopping the computation of threshold_d and the candidate feature set is replaced by: the current candidate feature set F_hd is empty, or the current threshold_d is equal to or greater than the maximum mean importance score in the statistical feature set F_m.
6. The method according to claim 4, characterized in that, when screening the candidate feature set of step d, the features in the statistical feature set F_m are first sorted in ascending order of mean importance score to form an initial candidate feature set F_h, and the candidate feature set of step d is then screened from the initial candidate feature set F_h.
CN201810552686.3A 2018-05-31 2018-05-31 Hierarchical important feature selection method based on breast cancer clinical high-dimensional data Expired - Fee Related CN108962382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810552686.3A CN108962382B (en) 2018-05-31 2018-05-31 Hierarchical important feature selection method based on breast cancer clinical high-dimensional data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810552686.3A CN108962382B (en) 2018-05-31 2018-05-31 Hierarchical important feature selection method based on breast cancer clinical high-dimensional data

Publications (2)

Publication Number Publication Date
CN108962382A true CN108962382A (en) 2018-12-07
CN108962382B CN108962382B (en) 2022-05-03

Family

ID=64492813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810552686.3A Expired - Fee Related CN108962382B (en) 2018-05-31 2018-05-31 Hierarchical important feature selection method based on breast cancer clinical high-dimensional data

Country Status (1)

Country Link
CN (1) CN108962382B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059508A1 (en) * 2006-08-30 2008-03-06 Yumao Lu Techniques for navigational query identification
CN102999760A (en) * 2012-09-28 2013-03-27 常州工学院 Target image area tracking method for on-line self-adaptive adjustment of voting weight
CN106650314A (en) * 2016-11-25 2017-05-10 中南大学 Method and system for predicting amino acid mutation
CN107316205A (en) * 2017-05-27 2017-11-03 银联智惠信息服务(上海)有限公司 Recognize humanized method, device, computer-readable medium and the system of holding
CN107256245A (en) * 2017-06-02 2017-10-17 河海大学 Improved and system of selection towards the off-line model that refuse messages are classified
CN107679549A (en) * 2017-09-08 2018-02-09 第四范式(北京)技术有限公司 Generate the method and system of the assemblage characteristic of machine learning sample
CN107729915A (en) * 2017-09-08 2018-02-23 第四范式(北京)技术有限公司 For the method and system for the key character for determining machine learning sample
CN107909433A (en) * 2017-11-14 2018-04-13 重庆邮电大学 A kind of Method of Commodity Recommendation based on big data mobile e-business
CN107944913A (en) * 2017-11-21 2018-04-20 重庆邮电大学 High potential user's purchase intention Forecasting Methodology based on big data user behavior analysis

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHIBIN XIAO 等: "Identifying Different Transportation Modes from Trajectory Data Using Tree-Based Ensemble Classifiers", 《ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION》 *
关鹏洲 等: "Short-term rainfall prediction model based on ensemble learning and deep learning", Selected Award-Winning Papers of the 2017 (5th) National Undergraduate Statistical Modeling Competition *
杜健: "Clinical comparative study of retroperitoneal laparoscopic partial nephrectomy for central and peripheral renal tumors", China Master's Theses Full-text Database, Medicine and Health Sciences *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383766A (en) * 2018-12-28 2020-07-07 中山大学肿瘤防治中心 Computer data processing method, device, medium and electronic equipment
CN110363333A (en) * 2019-06-21 2019-10-22 南京航空航天大学 The prediction technique of air transit ability under the influence of a kind of weather based on progressive gradient regression tree
WO2021000958A1 (en) * 2019-07-04 2021-01-07 华为技术有限公司 Method and apparatus for realizing model training, and computer storage medium
CN112309571A (en) * 2020-10-30 2021-02-02 电子科技大学 Screening method of prognosis quantitative characteristics of digital pathological image

Also Published As

Publication number Publication date
CN108962382B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN108962382A (en) A kind of layering important feature selection method based on breast cancer clinic high dimensional data
CN106815481B (en) Lifetime prediction method and device based on image omics
CN108257135A (en) The assistant diagnosis system of medical image features is understood based on deep learning method
CN104951894B (en) Hospital's disease control intellectual analysis and assessment system
CN107463771B (en) Case grouping method and system
CN104636631B (en) A kind of device using diabetes system big data prediction diabetes
CN109785928A (en) Diagnosis and treatment proposal recommending method, device and storage medium
CN107748900A (en) Tumor of breast sorting technique and device based on distinction convolutional neural networks
CN107203999A (en) A kind of skin lens image automatic division method based on full convolutional neural networks
CN110070540A (en) Image generating method, device, computer equipment and storage medium
CN108304887A (en) Naive Bayesian data processing system and method based on the synthesis of minority class sample
CN115100467B (en) Pathological full-slice image classification method based on nuclear attention network
CN110245657A (en) Pathological image similarity detection method and detection device
CN108509982A (en) A method of the uneven medical data of two classification of processing
CN106529165A (en) Method for identifying cancer molecular subtype based on spectral clustering algorithm of sparse similar matrix
CN110859624A (en) Brain age deep learning prediction system based on structural magnetic resonance image
CN103678534A (en) Physiological information and health correlation acquisition method based on rough sets and fuzzy inference
CN109599181A (en) A kind of Prediction of survival system and prediction technique being directed to T3-LARC patient before the treatment
Rastogi et al. Brain tumor segmentation and tumor prediction using 2D-Vnet deep learning architecture
CN114926396B (en) Mental disorder magnetic resonance image preliminary screening model construction method
JP2024043567A (en) Training method, training device, electronic device, storage medium, and pathological image classification system for pathological image feature extractor based on feature separation
Xiang et al. A novel weight pruning strategy for light weight neural networks with application to the diagnosis of skin disease
CN110236497A (en) A kind of fatty liver prediction technique based on tongue phase and BMI index
CN106570325A (en) Partial-least-squares-based abnormal detection method of mammary gland cell
Ramos et al. Fast and smart segmentation of paraspinal muscles in magnetic resonance imaging with CleverSeg

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220503