CN111145912A - Machine learning-based prediction device for personalized ovulation promotion scheme - Google Patents

Machine learning-based prediction device for personalized ovulation promotion scheme

Info

Publication number
CN111145912A
Authority
CN
China
Prior art keywords
model
scheme
learner
prediction
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911337735.2A
Other languages
Chinese (zh)
Other versions
CN111145912B (en
Inventor
吴健
陈晋泰
陈婷婷
冯芮苇
应豪超
雷璧闻
刘雪晨
宋庆宇
曹燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201911337735.2A
Publication of CN111145912A
Application granted
Publication of CN111145912B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B10/00 Other methods or instruments for diagnosis, e.g. instruments for taking a cell sample, for biopsy, for vaccination diagnosis; Sex determination; Ovulation-period determination; Throat striking implements
    • A61B10/0012 Ovulation-period determination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Computing Systems (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a machine learning-based prediction device for a personalized ovulation induction scheme. The device comprises a computer memory and a computer processor; an ovulation induction scheme prediction model comprising a trained primary learner and a trained secondary learner is stored in the computer memory. The primary learner consists of an SVM model, an ExtraTrees model, a RandomForest model, a LightGBM model and an XGBoost model, and the secondary learner is a CatBoost model. When executing the computer program, the computer processor performs the following steps: performing feature engineering on the clinical feature data to be predicted; inputting the processed feature data into the primary learner to obtain the predicted values of the five models; and feeding the 5 predicted values to the trained secondary learner to obtain the final prediction result. The device can improve the prediction accuracy of the ovulation induction scheme.

Description

Machine learning-based prediction device for personalized ovulation promotion scheme
Technical Field
The invention belongs to the field of medical artificial intelligence, and in particular relates to a machine learning-based prediction device for a personalized ovulation induction scheme.
Background
Over the past 30 years, the rapid development of reproductive medicine has stabilized the clinical pregnancy rate and the embryo implantation rate of the test-tube baby technique, known medically as in vitro fertilization-embryo transfer (IVF-ET). Carrying out the ovarian hyperstimulation protocol (the COS protocol, in medical terminology) is a critical step in IVF, because it determines the quantity and quality of the oocytes obtained later. Personalized application of COS protocols is therefore increasingly emphasized: the protocol must be tailored to each patient's physical characteristics and condition.
In China, doctors currently select a COS protocol mainly by observing each patient's physical signs and day-to-day response to medication, and devise different protocols for different patients. This approach requires deep knowledge of, and solid experience in, reproductive endocrinology if the quantity and quality of the oocytes and embryos obtained later are to be ensured to any degree. Given the current state of medical resources in China, however, the ratio of doctors to patients is severely unbalanced and experienced doctors are few, so protocol selection is inconsistent from patient to patient, which ultimately affects the IVF success rate.
With the rapid development of machine learning within artificial intelligence, machine learning methods are now widely applied to medical data.
In machine learning, a model captures relevant information from samples. For a given task, each sample provides an input (features) and an output (a label). A machine learning algorithm learns from these observations how to map features to labels, producing a generalized model that can perform the task correctly on unseen inputs (e.g., patients who have never been treated).
Classification is a common and important machine learning task: predicting the class to which a sample belongs. Since linear discriminant analysis was proposed in the 1930s, many classification algorithms have been developed, including linear models such as Logistic regression and COX models, decision trees, RandomForest, various boosting-based classification tree models, and Neural Networks (NN).
As the technology has progressed, different algorithms have been developed for specific fields according to their own characteristics. Applying them to a particular scenario still raises various problems, and each application scenario has its own difficulties to overcome.
Until now, machine learning algorithms have not been applied to the study of ovulation induction schemes. To improve the accuracy and efficiency of scheme selection, a system for predicting the ovulation induction scheme is urgently needed.
Disclosure of Invention
The invention provides a prediction device of an individual ovulation induction scheme based on machine learning, improves the prediction accuracy of the ovulation induction scheme, and provides an effective suggestion for a doctor to select the ovulation induction scheme.
A machine learning-based prediction device for a personalized ovulation induction scheme comprises a computer memory, a computer processor, and a computer program that is stored in the computer memory and executable on the computer processor. An ovulation induction scheme prediction model is stored in the computer memory and comprises a trained primary learner and a trained secondary learner; the primary learner consists of an SVM model, an ExtraTrees model, a RandomForest model, a LightGBM model and an XGBoost model, and the secondary learner is a CatBoost model;
the computer processor, when executing the computer program, performs the steps of:
performing feature engineering on the clinical feature data to be predicted, the feature engineering comprising abnormal value processing, missing value processing and feature combination calculation;
inputting the processed clinical feature data into the primary learner to obtain the predicted values of the five models;
and feeding the 5 predicted values to the trained secondary learner to obtain the final prediction result.
The prediction device exploits the fact that different algorithms observe the data from different data-space and data-structure perspectives, so that the models compensate for one another's weaknesses and the combined result is better. This improves the prediction accuracy of the final ovulation induction scheme, and fusing multiple models reduces overfitting of the overall model, so that the device can assist doctors in deciding on an ovulation induction scheme.
The training process of the primary learner and the secondary learner is as follows:
collecting all clinical feature data of patients who underwent assisted reproduction with ovulation induction therapy, from admission until the treatment result was obtained; having a specialist review all patient records to identify the patients whose oocyte number and quality met the requirements, and including their clinical feature data and the ovulation induction scheme used in the sample data; labeling the samples by ovulation induction scheme, with the long scheme labeled 0, the short scheme 1, the ultra-long scheme 2, the antagonist scheme 3, the ultra-short scheme 4 and the micro-stimulation scheme 5, to form a training set;
performing feature engineering on the clinical feature data in the training set and inputting the result into the SVM, ExtraTrees, RandomForest, LightGBM and XGBoost models of the primary learner to obtain one predicted value from each; taking the 5 predicted values as the input of the CatBoost model of the secondary learner to compute the final predicted value; each model computes a cross-entropy loss from its predicted value and the label value of the sample, and updates its parameters according to that loss.
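The label encoding and the cross-entropy loss named above can be illustrated with a minimal Python sketch. The predicted probability vector is made up for illustration, and the text does not specify how each library applies the loss internally, so this shows only the generic multi-class definition.

```python
import math

# Label encoding stated in the text: scheme name -> class index.
SCHEME_LABELS = {
    "long": 0, "short": 1, "ultra-long": 2,
    "antagonist": 3, "ultra-short": 4, "micro-stimulation": 5,
}

def cross_entropy(probs, label, eps=1e-12):
    """Multi-class cross-entropy for one sample: -log(p[label])."""
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return -math.log(max(probs[label], eps))

# Hypothetical predicted distribution over the six schemes.
probs = [0.05, 0.05, 0.10, 0.70, 0.05, 0.05]
loss_correct = cross_entropy(probs, SCHEME_LABELS["antagonist"])
loss_wrong = cross_entropy(probs, SCHEME_LABELS["long"])
```

The loss is small when the model puts high probability on the true scheme and large otherwise, which is what drives the parameter update.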
Furthermore, during model training, an oversampling method and a cross-validation method are adopted to train the superovulation scheme prediction model, so that the balance and stability of model training are improved.
When a cross validation method is adopted to train the superovulation scheme prediction model, training an SVM model, an ExtraTrees model, a RandomForest model, a LightGBM model and an XGboost model in a primary learner by adopting 5-fold cross validation; after training is complete, 5 models are generated for each model, and 25 models are generated by the primary learner.
In the model training process, each model in the primary learner obtains importance ranking of each feature on the superovulation scheme prediction model through calculation, and the feature importance ranking results of each model are averaged to obtain the final feature importance ranking.
The ovulation induction scheme prediction model can be trained offline and then stored in the prediction device;
or it can be trained online, in which case the clinical feature data received for prediction in each application are, after feature engineering, also used as training samples to optimize and update the prediction model.
In the feature engineering process of clinical feature data of the present invention, the abnormal value process specifically includes: feature data outside the medical range is processed as null values.
The missing value processing specifically comprises the following steps: for continuous characteristic missing data, adopting an average filling method, a median filling method, a mode filling method and a nearest neighbor filling method; for discrete feature missing data, a mode filling method and a nearest neighbor filling method are adopted.
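Of the filling methods listed, nearest neighbor filling is the least self-explanatory, so here is a minimal Python sketch of one way it could work. The column names, the Euclidean distance, and the assumption that only the target column is missing are illustrative choices, not details given in the text.

```python
import math

def knn_fill(rows, target, k=1):
    """Fill None in column `target` with the mean of that column over the
    k complete rows nearest in the remaining columns (Euclidean distance).
    Assumes the other columns of every row are present."""
    others = [c for c in rows[0] if c != target]
    donors = [r for r in rows if r[target] is not None]
    if not donors:
        raise ValueError("no complete rows to borrow values from")
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(dict(r))
            continue
        nearest = sorted(
            donors,
            key=lambda d: math.dist([d[c] for c in others],
                                    [r[c] for c in others]),
        )[:k]
        value = sum(d[target] for d in nearest) / len(nearest)
        filled.append({**r, target: value})
    return filled

# Hypothetical records; the third patient's basal FSH is missing.
rows = [
    {"age": 30, "weight_kg": 55, "fsh": 6.0},
    {"age": 38, "weight_kg": 70, "fsh": 11.0},
    {"age": 31, "weight_kg": 56, "fsh": None},
]
filled = knn_fill(rows, "fsh", k=1)
```

In practice the features would usually be put on a comparable scale before distances are computed, so that one large-valued column does not dominate.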
The feature combination calculation specifically comprises: the two data of height and weight are combined into a new characteristic index, and the two data of basal follicle stimulating hormone and luteinizing hormone are combined into a new characteristic index.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses machine learning algorithms to integrate many features from many patients and learn useful information from past successful cases. It automates the selection of the ovulation induction scheme and helps doctors choose a more suitable personalized scheme for patients undergoing IVF-ET.
2. The prediction model of the personalized ovulation induction scheme provided by the invention combines the strengths of 5 models, improves the prediction accuracy of the ovulation induction scheme, and gives doctors an effective suggestion when selecting a scheme. In addition, the prediction model can output a feature importance ranking, which gives doctors a more specific reference for designing a treatment scheme better suited to the patient, and fills the gap in applying machine learning to personalized ovulation induction schemes.
Drawings
Fig. 1 is a schematic flow chart of the implementation of the prediction device of the personalized ovarian hyperstimulation scheme based on machine learning.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
This embodiment provides a machine learning-based prediction device for a personalized ovulation induction scheme, comprising a computer memory, a computer processor, and a computer program that is stored in the computer memory and executable on the computer processor. An ovulation induction scheme prediction model is stored in the computer memory and is obtained, online or offline, in the following three stages:
Stage 1: reception and preprocessing of data
The feature data come from the clinical records of all patients who underwent IVF-ET treatment in the reproductive medicine department of a women's health care hospital during the eight years from 2010 to 2017. They comprise basic information (height, weight and age) of the male and female partners, routine blood indexes (female indexes are left unmarked), biochemical indexes, causes of infertility, sex hormones, other indexes for predicting ovarian function, male smoking history, male alcoholism history, family history of diabetes, family history of hypertension, and so on. The data thus cover several broad categories, each containing multiple individual features. First, a specialist judges from the quantity and quality of the oocytes finally obtained whether each patient's treatment record is a case that meets the standard, and the records that do are included in the sample to be analyzed. All samples are then divided into 6 classes according to the ovulation induction scheme used: the long scheme is labeled 0, the short scheme 1, the ultra-long scheme 2, the antagonist scheme 3, the ultra-short scheme 4, and the micro-stimulation scheme 5.
Stage 2: construction of training samples
For the collected feature data, the text features are first categorically encoded. Abnormal values and missing values are then processed for all features.
Specifically, discrete data are first one-hot encoded, and entries with irregular content or format are set to null. Abnormal value detection is then performed on the continuous feature data, and values outside the medically plausible range are set to null. Missing continuous feature data are filled by methods such as mean filling, median filling, mode filling and nearest neighbor filling; missing discrete feature data are filled by mode filling and nearest neighbor filling.
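The preprocessing steps just described can be sketched in plain Python. The plausible age range, the infertility-cause categories and the choice to fill the discrete feature before encoding it are all assumptions made for this illustration; the text does not specify them.

```python
from statistics import median, mode

def null_out_of_range(values, low, high):
    """Abnormal value processing: anything outside [low, high] becomes None."""
    return [v if v is not None and low <= v <= high else None for v in values]

def fill_missing(values, strategy):
    """Missing value processing: fill None with strategy(observed values)."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

def one_hot(values):
    """One-hot encode a discrete feature; columns follow sorted category order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Continuous feature: 340 is an entry error, None is a missing value.
ages = [29, 34, 340, None, 31]
ages = null_out_of_range(ages, 18, 60)   # hypothetical plausible range
ages = fill_missing(ages, median)        # median filling

# Discrete feature: mode filling, then one-hot encoding.
causes = ["tubal", "male factor", None, "tubal"]
encoded = one_hot(fill_missing(causes, mode))
```

Passing the filling strategy as a function makes it easy to swap in `statistics.mean` for mean filling without changing the rest of the pipeline.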
New features are then generated from the processed feature data by combining existing features. For example, height and weight can be combined into the body mass index (BMI), and basal follicle-stimulating hormone and luteinizing hormone can be combined into the basal follicle-stimulating hormone/luteinizing hormone ratio.
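Both combined features are simple ratios, as the following Python sketch shows. The units (centimetres, kilograms) and the handling of a missing or zero denominator are assumptions for illustration.

```python
def bmi(height_cm, weight_kg):
    """Body mass index: weight in kg divided by the square of height in m."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)

def fsh_lh_ratio(fsh, lh):
    """Basal FSH/LH ratio; None when either value is missing or LH is zero."""
    if fsh is None or not lh:
        return None
    return fsh / lh

patient_bmi = bmi(160, 51.2)
patient_ratio = fsh_lh_ratio(6.0, 3.0)
```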
Correlation detection is then performed on the processed feature data to remove redundant, highly correlated features. For example, the Pearson correlation coefficient between the white blood cell count and the neutrophil count in the routine blood test can reach 0.9 or more, so only one feature of such a highly correlated pair is retained. In this embodiment the correlation coefficient threshold is 0.8: two features whose coefficient exceeds 0.8 are considered highly correlated, and one of them can be removed. The correlation between clinical features can be obtained statistically or from medical experience.
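A minimal Python sketch of this correlation filter follows. The values are invented, and the rule of keeping whichever feature of a correlated pair is seen first is an assumption, since the text does not say which one to retain.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.8):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values for five patients.
columns = {
    "white_blood_cells": [5.0, 6.2, 7.1, 8.4, 9.0],
    "neutrophils": [3.0, 3.9, 4.4, 5.5, 5.8],
    "age": [34, 28, 36, 30, 33],
}
kept = drop_correlated(columns)
```

Here the neutrophil count is dropped because it tracks the white blood cell count almost perfectly, while age is only weakly correlated with it and is kept.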
After the characteristic data is processed, a group of clinical characteristic data corresponding to each patient is a training sample.
Stage 3: construction of the personalized ovulation induction scheme prediction model
The prediction model of the personalized ovulation induction scheme adopts a stacking framework. The first layer is the primary learner, which uses 5 models: an SVM model, an ExtraTrees model, a RandomForest model, a LightGBM model and an XGBoost model. The SVM model is one of the most classical machine learning classification algorithms of recent decades and performs very well on small-scale, high-dimensional classification problems. The ExtraTrees model, which chooses its split attributes completely at random, and the RandomForest model, which uses random feature selection and random sample sampling when building each sub-decision tree, each gain substantially in generalization ability. XGBoost and LightGBM are different implementations of the gradient boosting decision tree (GBDT); they apply different optimizations toward the same goal and perform very well in many data mining tasks and competitions.
The CatBoost model in the secondary learner is also an improvement on GBDT; in major competitions it performs no worse than XGBoost and LightGBM, and sometimes slightly better.
Regarding the stacking structure adopted by the prediction model, it should be emphasized that the models in the primary learner must be "accurate yet diverse": each model should have high prediction accuracy, and the correlation between models must not be too high, so that the respective strengths of the models can be combined without producing redundant information.
The purpose of the secondary learner is to fuse the information learned by each model in the primary learner and learn further from it. The secondary learner therefore no longer trains on the original training data, which reduces the risk of overfitting.
The 5 models adopted by the primary learner in this embodiment differ in design principle, and accuracy testing shows that they meet the "accurate yet diverse" requirement (high individual accuracy, low mutual correlation).
The ExtraTrees, RandomForest and SVM models are provided by the scikit-learn library, and the XGBoost, LightGBM and CatBoost models are provided by their respective development kits.
Next, the constructed personalized ovulation induction scheme prediction model is trained using the training samples constructed in stage 2.
In practice, few patients use the micro-stimulation scheme, so the training samples are unevenly distributed across classes; this embodiment therefore uses an oversampling method to improve sample balance. Model training then starts from the balanced data.
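The text does not say which oversampling method is used. As one possibility, the Python sketch below uses random oversampling, which duplicates randomly chosen minority-class samples until every class is as large as the largest one; the data and the seed are illustrative.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until all
    classes have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_x, out_y = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_x.append(s)
            out_y.append(y)
    return out_x, out_y

# Hypothetical imbalance: scheme 5 (micro-stimulation) has a single sample.
labels = [0] * 6 + [3] * 4 + [5] * 1
samples = list(range(len(labels)))   # stand-ins for feature vectors
new_x, new_y = random_oversample(samples, labels)
class_counts = Counter(new_y)
```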
Specifically, each of the 5 models in the primary learner is first trained using 5-fold cross-validation. That is, the training samples are randomly divided into 5 equal parts; 4 parts are used as the training set and the remaining part as the validation set, which yields 5 combinations of training set and validation set.
Each cross-validation round proceeds as follows: the model is trained on the training set, the resulting model predicts the validation set, and the model from each round is saved. After cross-validation training is complete, each model has produced data whose number of rows equals the total number of samples (the sum of the sizes of the 5 validation sets) and whose number of columns is 1. The data produced by the five models are concatenated column-wise into data with as many rows as there are samples and 5 columns, which serve as the training samples of the secondary learner. After cross-validation training is complete, 5 trained models exist for each model type, so the primary learner contains 25 models in total.
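The out-of-fold procedure above can be sketched in plain Python. The five learners here are toy nearest-class-mean classifiers standing in for the SVM, ExtraTrees, RandomForest, LightGBM and XGBoost models, and the data are synthetic; only the fold handling and the shape of the result follow the text.

```python
import random

def k_fold_indices(n, n_splits=5, seed=0):
    """Shuffle sample indices and split them into n_splits validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]

def fit_centroid(feature):
    """Toy stand-in learner: nearest class mean on a single feature."""
    def fit(X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            sums[label] = sums.get(label, 0.0) + row[feature]
            counts[label] = counts.get(label, 0) + 1
        means = {c: sums[c] / counts[c] for c in sums}
        return lambda row: min(means, key=lambda c: abs(row[feature] - means[c]))
    return fit

def out_of_fold(fit, X, y, folds):
    """Train one model per fold; predict each sample with the model that
    did not see it. Returns the prediction column and the fold models."""
    oof, models = [None] * len(X), []
    for val_idx in folds:
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        models.append(model)
        for i in val_idx:
            oof[i] = model(X[i])
    return oof, models

rng = random.Random(1)
y = [i % 2 for i in range(20)]                       # synthetic labels
X = [[label + rng.gauss(0, 0.3) for _ in range(5)] for label in y]
folds = k_fold_indices(len(X))

columns, all_models = [], []
for feature in range(5):          # five stand-ins for the five primary models
    oof, models = out_of_fold(fit_centroid(feature), X, y, folds)
    columns.append(oof)
    all_models.extend(models)
meta_X = [list(row) for row in zip(*columns)]        # N rows x 5 columns
```

`meta_X`, paired with the original labels `y`, is what the secondary learner would be trained on, and `all_models` holds the 25 saved fold models.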
Next, the data generated by the primary learner, with as many rows as there are samples and 5 columns, are used as training samples (with the original sample labels) to train the secondary learner, the CatBoost model.
Training and optimization yield the trained primary learner, consisting of SVM models (5), ExtraTrees models (5), RandomForest models (5), LightGBM models (5) and XGBoost models (5), and the trained secondary learner, a CatBoost model (1).
The trained personalized ovulation induction scheme prediction model has high accuracy and can, to a certain extent, give doctors effective suggestions when selecting a personalized ovulation induction scheme.
During training of the primary learner, each model can rank the features by importance by computing information entropy. Averaging the ranking results of all models gives the final importance ranking of each feature. This ranking indicates which indexes a doctor should pay most attention to, and so helps the doctor design a targeted treatment scheme for the patient.
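One way to read "averaging the ranking results" is to average each feature's rank position across models, as in the Python sketch below. The importance scores and feature names are invented, and how each model derives its scores is outside the sketch.

```python
def average_ranking(importances_per_model):
    """Rank features within each model (1 = most important), average the
    ranks across models, and return features from best to worst mean rank."""
    features = list(importances_per_model[0])
    n_models = len(importances_per_model)
    mean_rank = {f: 0.0 for f in features}
    for importances in importances_per_model:
        ordered = sorted(features, key=lambda f: importances[f], reverse=True)
        for rank, f in enumerate(ordered, start=1):
            mean_rank[f] += rank / n_models
    return sorted(features, key=lambda f: mean_rank[f])

# Hypothetical importance scores from three models.
scores = [
    {"age": 0.5, "bmi": 0.2, "fsh_lh": 0.3},
    {"age": 0.4, "bmi": 0.1, "fsh_lh": 0.5},
    {"age": 0.6, "bmi": 0.3, "fsh_lh": 0.1},
]
final_ranking = average_ranking(scores)
```

Averaging ranks rather than raw scores avoids having to reconcile importance values that different libraries report on different scales.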
The resulting personalized ovulation induction scheme prediction model is stored in the memory of the prediction device, as shown in Fig. 1. In application, the patient's feature data first undergo feature engineering (abnormal value processing, missing value processing, feature combination and so on) and are then input into the ExtraTrees, RandomForest, SVM, XGBoost and LightGBM models of the primary learner. For each of these five model types, the 5 models saved during cross-validation each produce a predicted value, and these 5 values are averaged into 1, so the primary learner outputs 5 predicted values in total. The 5 predicted values are then input into the secondary learner, the CatBoost model, whose output is the final predicted category for the record.
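The inference path can be sketched as follows. Every model here is a stand-in callable, and the sketch assumes each fold model returns a single numeric value; the text does not specify whether that value is a class index or a score.

```python
from statistics import mean

MODEL_NAMES = ["SVM", "ExtraTrees", "RandomForest", "LightGBM", "XGBoost"]

def predict_stacked(fold_models, secondary, features):
    """Average the 5 fold-model outputs per model type, then pass the
    resulting 5 values to the secondary learner."""
    primary = [mean(m(features) for m in fold_models[name])
               for name in MODEL_NAMES]
    return secondary(primary), primary

# Stand-ins: 5 fold models per type that differ by a small offset, and a
# secondary learner that rounds the mean of its 5 inputs to a class index.
fold_models = {
    name: [lambda x, b=b: x[0] + b for b in (-0.2, -0.1, 0.0, 0.1, 0.2)]
    for name in MODEL_NAMES
}
secondary = lambda values: round(mean(values))

label, primary = predict_stacked(fold_models, secondary, [3.0])
```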
When the personalized ovulation induction scheme prediction model is trained online, the feature data received for prediction in each application are processed and used as training samples to optimize and update the model.
The personalized ovulation induction scheme prediction model combines the strengths of 5 models, improves the prediction accuracy of the ovulation induction scheme, and gives doctors effective suggestions when selecting a scheme. In addition, the prediction model can output a feature importance ranking, which gives doctors a more specific reference for designing a treatment scheme better suited to the patient.
In this embodiment, the models in the primary learner are trained with 5-fold cross-validation; depending on the training effect, the number of folds may instead be 3, 10 or another value.
The computer processor in this embodiment may be any type of processor, and the Memory may be a Random Access Memory (RAM), a Read Only Memory (ROM), a Flash Memory (Flash Memory), a first-in first-out Memory (FIFO), a first-in last-out Memory (FILO), and the like.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A prediction apparatus for a machine learning-based personalized ovarian hyperstimulation protocol, comprising a computer memory, a computer processor and a computer program stored in said computer memory and executable on said computer processor, characterized in that:
a prediction model of the ovulation induction scheme is stored in the computer memory, and the prediction model of the ovulation induction scheme comprises a trained primary learner and a trained secondary learner; the primary learner consists of an SVM model, an ExtraTrees model, a RandomForest model, a LightGBM model and an XGBoost model, and the secondary learner adopts a CatBoost model;
the computer processor, when executing the computer program, performs the steps of:
carrying out feature engineering processing on the clinical characteristic data to be predicted, wherein the feature engineering processing comprises abnormal value processing, missing value processing and feature combination calculation;
inputting the processed clinical characteristic data into a primary learner for calculation to obtain predicted values of the five models;
and calculating the 5 predicted values by adopting a trained secondary learner to obtain a final predicted result.
2. The device for predicting the machine learning-based personalized ovarian hyperstimulation protocol according to claim 1, wherein the training process of the primary learner and the secondary learner is as follows:
collecting all clinical characteristic data of a patient who adopts the superovulation therapy to perform assisted reproduction from admission to the time when a treatment result is obtained after the superovulation therapy; judging all patient records according to a professional doctor, determining patients with the required egg number and quality, and bringing clinical characteristic data and the adopted ovulation induction scheme into sample data; classifying and labeling the samples according to an ovulation induction scheme, wherein the sample adopting a long scheme is labeled as 0, a short scheme is labeled as 1, an ultra-long scheme is labeled as 2, an antagonist scheme is labeled as 3, an ultra-short scheme is labeled as 4, and a micro-stimulation scheme is labeled as 5 to form a training set;
respectively inputting the clinical characteristic data in the training set into an SVM model, an ExtraTrees model, a RandomForest model, a LightGBM model and an XGBoost model of a primary learner after feature engineering processing is carried out on the clinical characteristic data, respectively obtaining a predicted value, taking the 5 predicted values as the input of a CatBoost model of a secondary learner, and calculating to obtain a final predicted value; each model calculates a cross-entropy loss function according to its predicted value and the label value of the sample, thereby updating the model parameters according to the loss function.
3. The machine learning-based prediction device for a personalized ovulation induction scheme, characterized in that an oversampling method and a cross-validation method are used to train the ovulation induction scheme prediction model.
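Claim 3 does not say which oversampling method is used. As one possible reading, the sketch below applies plain random oversampling, duplicating samples of the rarer schemes until every class is as large as the largest one. The function name, the seed parameter and the example data are assumptions.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes have equal size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced data: three "long" (0), one "short" (1), one "micro-stimulation" (5).
balanced_samples, balanced_labels = random_oversample(
    ["a", "b", "c", "d", "e"], [0, 0, 0, 1, 5]
)
class_counts = Counter(balanced_labels)
```

When oversampling is combined with cross-validation, it is usually applied to the training folds only, so that duplicated samples do not leak into the validation fold.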
4. The machine learning-based prediction device for a personalized ovulation induction scheme according to claim 3, wherein, when the ovulation induction scheme prediction model is trained by cross-validation, 5-fold cross-validation is used to train the SVM model, ExtraTrees model, RandomForest model, LightGBM model and XGBoost model in the primary learner; after training, each model type yields 5 fitted models, so the primary learner contains 25 models in total.
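The count of 25 models follows from 5 model types times 5 folds. The sketch below generates the fold indices and enumerates the fits without training anything; `k_fold_indices` and `plan_fits` are hypothetical helpers, and the contiguous, unshuffled split is an assumption.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


MODEL_NAMES = ["SVM", "ExtraTrees", "RandomForest", "LightGBM", "XGBoost"]


def plan_fits(n_samples, k=5):
    """List every (model type, fold number) pair that would be fitted."""
    return [
        (name, fold)
        for name in MODEL_NAMES
        for fold, _ in enumerate(k_fold_indices(n_samples, k))
    ]


fits = plan_fits(103)             # 103 is an arbitrary sample count
folds = list(k_fold_indices(103))
```

Each sample appears in exactly one validation fold, so every model type produces one held-out prediction per training sample.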
5. The machine learning-based prediction device for a personalized ovulation induction scheme according to claim 1 or 2, wherein, during model training, each model in the primary learner computes an importance ranking of the features for the ovulation induction scheme prediction model, and the feature importance rankings of the models are averaged to obtain the final feature importance ranking.
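One way to read the averaging in claim 5 is to convert each model's importance scores into ranks and then average the ranks per feature. The sketch below does that; averaging the raw importance scores instead would be an equally plausible reading. The feature names and scores are made up for illustration, and every model is assumed to report the same set of features.

```python
def average_feature_ranking(importances_per_model):
    """Average each feature's rank across models; the lowest mean rank comes first."""
    rank_sums = {}
    for importances in importances_per_model:
        # Rank 1 is the most important feature for this model.
        ordered = sorted(importances, key=importances.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            rank_sums[feature] = rank_sums.get(feature, 0) + rank

    n_models = len(importances_per_model)
    mean_rank = {feature: total / n_models for feature, total in rank_sums.items()}
    # Ties in mean rank are broken alphabetically to keep the output deterministic.
    return sorted(mean_rank, key=lambda feature: (mean_rank[feature], feature))


# Hypothetical importance scores from three models.
final_ranking = average_feature_ranking([
    {"weight": 0.5, "basal_fsh": 0.3, "lh": 0.2},
    {"weight": 0.3, "basal_fsh": 0.6, "lh": 0.1},
    {"weight": 0.4, "basal_fsh": 0.5, "lh": 0.1},
])
```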
6. The machine learning-based prediction device for a personalized ovulation induction scheme according to claim 1 or 2, wherein the ovulation induction scheme prediction model is trained offline and then stored in the prediction device;
or the model is trained online, and the clinical feature data to be predicted that is received in each application is used, after feature engineering, as a training sample to optimize and update the prediction model.
7. The machine learning-based prediction device for a personalized ovulation induction scheme according to claim 1, wherein the abnormal value processing is specifically as follows: feature data outside the medical range are treated as null values.
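The rule in claim 7 can be sketched as a range check that replaces out-of-range values with `None`. The ranges below are placeholders chosen for the example, not clinical reference values from the patent, and the field names are assumptions.

```python
# Hypothetical plausibility ranges; real ranges would come from clinical guidance.
MEDICAL_RANGES = {
    "height_cm": (120.0, 220.0),
    "weight_kg": (30.0, 200.0),
}


def null_out_of_range(record, ranges=MEDICAL_RANGES):
    """Return a copy of the record with out-of-range values replaced by None."""
    cleaned = dict(record)
    for name, (low, high) in ranges.items():
        value = cleaned.get(name)
        if value is not None and not (low <= value <= high):
            cleaned[name] = None
    return cleaned


# A height of 1620 cm is most likely a unit or typing error.
cleaned_record = null_out_of_range({"height_cm": 1620.0, "weight_kg": 55.0})
```

Turning abnormal values into nulls lets the missing value processing step handle them together with values that were never recorded.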
8. The machine learning-based prediction device for a personalized ovulation induction scheme according to claim 1, wherein the missing value processing is specifically as follows: for missing data in continuous features, mean filling, median filling, mode filling and nearest-neighbor filling are used; for missing data in discrete features, mode filling and nearest-neighbor filling are used.
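The mean, median and mode filling methods of claim 8 can be sketched with the Python standard library. Nearest-neighbor filling is omitted here because it needs a distance measure over whole records, which the claim does not specify. The function and parameter names are assumptions.

```python
import statistics

CONTINUOUS_STRATEGIES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}
DISCRETE_STRATEGIES = {"mode": statistics.mode}


def fill_missing(values, strategy, discrete=False):
    """Replace None entries with a statistic computed from the observed values."""
    allowed = DISCRETE_STRATEGIES if discrete else CONTINUOUS_STRATEGIES
    if strategy not in allowed:
        raise ValueError(f"strategy {strategy!r} is not allowed for this feature type")
    observed = [v for v in values if v is not None]
    fill_value = allowed[strategy](observed)
    return [fill_value if v is None else v for v in values]


filled_mean = fill_missing([1.0, None, 3.0], "mean")
filled_median = fill_missing([1.0, None, 2.0, 10.0], "median")
filled_mode = fill_missing(["a", None, "b", "a"], "mode", discrete=True)
```

The `discrete` flag mirrors the claim's distinction: a mean or median is only meaningful for continuous features, so requesting one for a discrete feature raises an error.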
9. The machine learning-based prediction device for a personalized ovulation induction scheme according to claim 1, wherein the feature combination calculation is specifically as follows:
height and weight are combined into one new feature index, and basal follicle-stimulating hormone and luteinizing hormone are combined into another new feature index.
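Claim 9 does not state how the two pairs are combined. A common choice, assumed here purely for illustration, is the body mass index for height and weight and the FSH/LH ratio for the two hormones. The field names and units are also assumptions.

```python
def combine_features(record):
    """Add two derived features: BMI and the FSH/LH ratio (both are assumptions)."""
    combined = dict(record)
    height_m = record["height_cm"] / 100.0
    # Body mass index: weight in kilograms divided by height in metres squared.
    combined["bmi"] = record["weight_kg"] / (height_m ** 2)
    # Ratio of basal follicle-stimulating hormone to luteinizing hormone.
    combined["fsh_lh_ratio"] = record["basal_fsh"] / record["lh"]
    return combined


# Hypothetical patient record.
combined_record = combine_features(
    {"height_cm": 160.0, "weight_kg": 64.0, "basal_fsh": 6.0, "lh": 3.0}
)
```

A production version would need to guard against null or zero inputs, for example when abnormal value processing has already nulled one of the source fields.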
CN201911337735.2A 2019-12-23 2019-12-23 Machine learning-based prediction device for personalized ovulation promotion scheme Active CN111145912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911337735.2A CN111145912B (en) 2019-12-23 2019-12-23 Machine learning-based prediction device for personalized ovulation promotion scheme

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911337735.2A CN111145912B (en) 2019-12-23 2019-12-23 Machine learning-based prediction device for personalized ovulation promotion scheme

Publications (2)

Publication Number Publication Date
CN111145912A (en) 2020-05-12
CN111145912B (en) 2023-04-18

Family

ID=70519384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911337735.2A Active CN111145912B (en) 2019-12-23 2019-12-23 Machine learning-based prediction device for personalized ovulation promotion scheme

Country Status (1)

Country Link
CN (1) CN111145912B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107807947A (en) * 2016-09-09 2018-03-16 索尼公司 The system and method for providing recommendation on an electronic device based on emotional state detection
CN109448855A (en) * 2018-09-17 2019-03-08 大连大学 A kind of diabetes glucose prediction technique based on CNN and Model Fusion
WO2019066421A2 (en) * 2017-09-27 2019-04-04 이화여자대학교 산학협력단 Dna copy number variation-based prediction method for kind of cancer
CN109635118A (en) * 2019-01-10 2019-04-16 博拉网络股份有限公司 A kind of user's searching and matching method based on big data
CN109637663A (en) * 2018-11-14 2019-04-16 浙江大学山东工业技术研究院 A kind of prediction means of percutaneous coronary intervention (PCI) cardiac events based on machine learning
CN110033860A (en) * 2019-02-27 2019-07-19 杭州贝安云科技有限公司 A kind of Inherited Metabolic Disorders recall rate method for improving based on machine learning
US10426442B1 (en) * 2019-06-14 2019-10-01 Cycle Clarity, LLC Adaptive image processing in assisted reproductive imaging modalities
US20190303795A1 (en) * 2018-03-29 2019-10-03 NEC Laboratories Europe GmbH Method and system for model integration in ensemble learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
盛雅兰 (Sheng Yalan): "Applied research on kidney disease risk prediction based on a combined classifier" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111933290A (en) * 2020-08-14 2020-11-13 苏州赫亚斯顿智能科技有限公司 Method and device for establishing artificial reproduction pregnancy prediction by machine learning model
CN111933290B (en) * 2020-08-14 2023-10-10 北京赫雅智能科技有限公司 Method and device for predicting artificial reproduction conception by machine learning model
CN112185555A (en) * 2020-09-10 2021-01-05 北京工业大学 Gestational diabetes prediction method based on stacking algorithm
CN112354042A (en) * 2020-12-01 2021-02-12 南通市肿瘤医院 Analgesia pump flow control method and device
CN112837826A (en) * 2020-12-30 2021-05-25 浙江大学温州研究院 Severe sequential organ failure scoring method and system based on machine learning
CN113317820A (en) * 2021-05-14 2021-08-31 杭州医学院 Follicle development prediction system in superovulation therapy based on artificial intelligence technology
CN113936801A (en) * 2021-10-18 2022-01-14 河北工业大学 Machine learning fusion-based general anesthesia induced contraction compression prediction method and system
CN117235673A (en) * 2023-11-15 2023-12-15 中南大学 Cell culture prediction method and device, electronic equipment and storage medium
CN117235673B (en) * 2023-11-15 2024-01-30 中南大学 Cell culture prediction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111145912B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN111145912B (en) Machine learning-based prediction device for personalized ovulation promotion scheme
WO2021120936A1 (en) Chronic disease prediction system based on multi-task learning model
CN109920501B (en) Electronic medical record classification method and system based on convolutional neural network and active learning
CN104298651B (en) Biomedicine named entity recognition and protein interactive relationship extracting on-line method based on deep learning
CN111222340B (en) Breast electronic medical record entity recognition system based on multi-standard active learning
Siristatidis et al. Artificial intelligence in IVF: a need
CN109949929A (en) A kind of assistant diagnosis system based on the extensive case history of deep learning
TWI723868B (en) Method for applying a label made after sampling to neural network training model
Sanni et al. Analysis of performance metrics of heart failured patients using Python and machine learning algorithms
Durai et al. Liver disease prediction using machine learning
CN112420191A (en) Traditional Chinese medicine auxiliary decision making system and method
CN109213871A (en) Patient information knowledge mapping construction method, readable storage medium storing program for executing and terminal
CN111883258B (en) Method for constructing OHSS indexing parting prediction model
CN116936108A (en) Unbalanced data-oriented disease prediction system
CN111986814A (en) Modeling method of lupus nephritis prediction model of lupus erythematosus patient
CN109119155B (en) ICU death risk assessment system based on deep learning
CN113345581B (en) Cerebral apoplexy post thrombolysis bleeding probability prediction method based on ensemble learning
Oğur et al. Development of an artificial intelligence-supported hybrid data management platform for monitoring depression and anxiety symptoms in the perinatal period: Pilot-scale study
Du et al. The effects of deep network topology on mortality prediction
WO2021231044A1 (en) System and method for testing for sars-cov-2/covid-19 based on wearable medical sensors and neural networks
CN115547502B (en) Hemodialysis patient risk prediction device based on time sequence data
CN116313141A (en) Knowledge-graph-based intelligent inquiry method for unknown cause fever
Dong et al. Readmission prediction of diabetic patients based on AdaBoost-RandomForest mixed model
Magade et al. Automating Decision Process of Overnight Patient Care Using Hybrig Machine Learning Algorithms
Li et al. TLDA: A transfer learning based dual-augmentation strategy for traditional Chinese Medicine syndrome differentiation in rare disease

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant