CN117409963A - Premature infant feeding intolerance risk prediction method and system - Google Patents
- Publication number
- CN117409963A (application CN202310123452.8A)
- Authority
- CN
- China
- Prior art keywords
- variables
- characteristic variables
- sample
- intolerant
- calculating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention discloses a method for predicting the risk of feeding intolerance in premature infants, comprising the following steps: obtaining gastrointestinal feeding intolerance information from preset premature infant case information; obtaining an intolerance variable according to whether gastrointestinal feeding is tolerated, performing correlation analysis between each group of characteristic variables and the intolerance variable, and taking the characteristic variables with high correlation as sample characteristic variables; calculating the SHAP value of each group of sample characteristic variables, and selecting the sample characteristic variables with large SHAP values as input characteristic variables; dividing the input characteristic variables into a training set and a test set by 10-fold cross-validation, training a model on the training set based on the XGBoost algorithm, and constructing the XGBoost model function; and presetting a classification threshold, calculating the prediction probability of a sample to be predicted with the model function, and, if the prediction probability is greater than the classification threshold, judging that the premature infant corresponding to the sample is intolerant to gastrointestinal feeding. The invention also discloses a system that uses the method.
Description
Technical Field
The invention relates to the technical field of intelligent prediction, in particular to a premature infant feeding intolerance risk prediction method and system.
Background
Feeding intolerance in premature infants is a clinically common set of digestive-system symptoms. Because the gastrointestinal tract of a premature infant is immature, and gastrointestinal motility develops more slowly than the digestive and absorptive functions, premature infants are more prone to feeding intolerance than term infants. It manifests as vomiting, abdominal distension, gastric retention and the like after gastrointestinal feeding is started, which seriously affects early nutritional support and makes appropriate feeding of premature infants a great challenge. Research shows that the incidence of feeding intolerance in premature infants is 33.80-53.45% in China and about 25% in other countries. Feeding intolerance leads to insufficient nutrient intake and extrauterine growth retardation, and prolonged parenteral nutrition also increases the incidence of complications such as nosocomial infection, metabolic disorders and liver damage. At the same time, hospital stays are prolonged, the economic burden on society and families increases, and the survival rate and quality of life of premature infants are affected.
Feeding intolerance is a common clinical symptom with a complex pathogenesis and numerous influencing factors; accurately identifying its high-risk factors and taking targeted preventive measures is key to reducing it. In China, Chen Qiong et al. used logistic regression modeling to predict the occurrence of feeding intolerance in premature infants; Li Yan et al. used Spearman correlation and regression analysis to examine the relationship between changes in superior mesenteric artery blood flow before and after feeding and feeding tolerance in premature infants, hoping to predict feeding intolerance from changes in gastrointestinal dynamics. Abroad, Carlo explored early effective biomarkers for prediction, measuring visceral tissue oxygenation with near-infrared spectroscopy to predict feeding tolerance in premature infants; Valentina used a generalized linear model to evaluate the relationship between visceral blood oxygen saturation, superior mesenteric artery Doppler blood flow velocity and feeding tolerance; Bozzetti used logistic regression modeling to predict feeding tolerance in newborns with intrauterine growth restriction.
As the above shows, most domestic and foreign studies that build models with data mining use only simple data mining methods, so the resulting model may not be optimal. Most of them also rely on biomarkers, which are costly, complex to operate and demanding in manpower and material resources. Some assessment tools must be combined with imaging examinations to make a prediction, which increases the economic burden on patients and does not meet the public's wish for a convenient and rapid screening method. A systematic, convenient and accurate prediction method is therefore needed to make up for the shortcomings of existing research.
Disclosure of Invention
An object of the present invention is to provide a method that accurately predicts the risk of feeding intolerance in premature infants.
A method of predicting risk of feeding intolerance in premature infants comprising the steps of:
obtaining gastrointestinal feeding intolerance information according to preset premature infant case information, and obtaining a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
according to whether gastrointestinal feeding is intolerant or not, assigning a value to each premature infant case information to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
calculating the SHAP value of each group of sample characteristic variables, and selecting the sample characteristic variables with large SHAP values to obtain input characteristic variables;
dividing the input characteristic variables into a training set and a test set by 10-fold cross-validation, training a model on the training set based on the XGBoost algorithm, and constructing the XGBoost model function;
presetting a classification threshold, calculating the prediction probability of a sample to be predicted by using a model function, and if the prediction probability is larger than the classification threshold, judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding.
According to the premature infant feeding intolerance risk prediction method provided by the invention, the features are ranked by importance by calculating the SHAP value of each feature, and features are then selected for model training. This overcomes the curse of dimensionality and the noise caused by having too many features, as well as the over-fitting caused by increased model complexity.
In addition, the premature infant feeding intolerance risk prediction method provided by the invention can also have the following additional technical characteristics:
further, the characteristic variables include the following sets:
body weight, gestational age, 1-minute Apgar score, resuscitation history, neonatal asphyxia, NRDS, infection, PDA, PS use, probiotic use, blood transfusion, apnea, hyperthermia, abnormal interval between bowel movements, time of first feeding, and mechanical ventilation.
Further, the step of assigning a value to each of the premature infant case information based on whether the gastrointestinal feeding is intolerant, respectively, comprises:
if the premature infant is intolerant to feeding, assigning a first identification value;
otherwise, the second identification value is assigned.
Further, the step of performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation, namely sample characteristic variables, includes:
inputting each group of characteristic variables and the intolerance variable into statistical software, and executing Spearman correlation analysis to obtain a plurality of correlation coefficients ρi;
acquiring the characteristic variables with ρi < 0.05, and identifying them as characteristic variables with high correlation, namely sample characteristic variables.
Further, the step of calculating the SHAP value of each group of sample characteristic variables, and selecting the sample characteristic variables with large SHAP values to obtain the input characteristic variables includes:
calculating the SHAP value of each group of sample characteristic variables, and sorting all the SHAP values in descending order;
acquiring the top n sample characteristic variables in the Shapley value ranking, and taking them as input characteristic variables, wherein n is a positive integer.
Further, the step of presetting the classification threshold includes:
calculating the Youden index P of the prediction model under different data sets using Youden's rule;
determining the cut-off point by the maximum of the Youden index, and taking the average value of the Youden index P as the optimal classification threshold Bestp of the model.
Further, the step of calculating the Youden index P of the prediction model under different data sets using Youden's rule includes:
calculating the sensitivity and specificity of the prediction model using the binary classification confusion matrix;
calculating the Youden index P as: P = sensitivity + specificity - 1.
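The index P = sensitivity + specificity - 1 defined above (commonly called the Youden index) can be illustrated with a minimal sketch. The function names and the toy labels and probabilities are hypothetical, the sketch assumes both classes are present in the data, and it only shows the index on a single data set together with the candidate cut-off that maximizes it; how values are averaged across data sets is left as the text describes it.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = feeding intolerant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def youden_index(y_true, y_prob, threshold):
    """P = sensitivity + specificity - 1 at the given probability threshold."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # assumes at least one positive case
    specificity = tn / (tn + fp)  # assumes at least one negative case
    return sensitivity + specificity - 1


def best_cutoff(y_true, y_prob, candidates):
    """Candidate threshold with the largest Youden index."""
    return max(candidates, key=lambda c: youden_index(y_true, y_prob, c))


# Toy data (hypothetical): true labels and predicted probabilities.
labels = [0, 0, 0, 1, 1, 1]
probabilities = [0.1, 0.2, 0.6, 0.4, 0.7, 0.9]
print(best_cutoff(labels, probabilities, [0.3, 0.5, 0.8]))
```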
Another object of the invention is to propose a system for predicting the risk of feeding intolerance of premature infants, comprising:
the characteristic variable acquisition module is used for acquiring gastrointestinal feeding intolerance information according to preset premature infant case information and acquiring a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
the sample characteristic variable acquisition module is used for respectively assigning a value to each premature infant case information according to whether gastrointestinal feeding is intolerant or not to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
the input characteristic variable acquisition module is used for calculating the SHAP value of each group of sample characteristic variables, and selecting the sample characteristic variables with large SHAP values to obtain the input characteristic variables;
the model construction module is used for dividing the input characteristic variables into a training set and a test set by 10-fold cross-validation, training a model on the training set based on the XGBoost algorithm, and constructing the XGBoost model function;
the prediction module is used for presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding if the prediction probability is larger than the classification threshold.
The beneficial effects of the invention are as follows: XGBoost is a Boosting ensemble learning algorithm that combines multiple CART regression tree models into a strong classifier. It offers high accuracy and fast running speed, uses regularization to reduce over-fitting, and avoids interference from outliers.
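To make the idea of a model function built from several regression trees concrete, the following is a minimal sketch, not the trained model of the invention. It assumes a logistic link over the summed tree outputs, as is usual for boosted binary classifiers; the two depth-1 trees, their split points and their leaf values are entirely hypothetical.

```python
import math


def stump(feature, threshold, left_value, right_value):
    """Depth-1 regression tree: left_value if x[feature] < threshold, else right_value."""
    def predict(x):
        return left_value if x[feature] < threshold else right_value
    return predict


def predict_probability(trees, x, base_score=0.0):
    """Sum the tree outputs (the margin) and map it to a probability with a sigmoid."""
    margin = base_score + sum(tree(x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))


def is_intolerant(trees, x, classification_threshold):
    """Judge a sample intolerant when its probability exceeds the preset threshold."""
    return predict_probability(trees, x) > classification_threshold


# Hypothetical ensemble over two encoded features (values 0 or 1).
toy_trees = [stump(0, 0.5, -1.0, 1.0), stump(1, 0.5, -0.5, 0.5)]
print(is_intolerant(toy_trees, [1, 1], 0.5))
```

In a real boosted model each tree is fitted to the residual errors of the trees before it; the sketch only shows how their outputs combine at prediction time.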
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a first embodiment of the present invention;
FIG. 2 is a flow chart showing the steps of a method for predicting feeding intolerance of premature infants based on SHAP feature selection and XGBoost in accordance with a first embodiment of the present invention;
FIG. 3 is a schematic diagram showing SHAP additivity results;
FIG. 4 is a schematic diagram of feature importance ranking based on SHAP values;
FIG. 5 is a schematic diagram of the SHAP-based feature summary;
FIG. 6 is a schematic diagram of a 10-fold cross-validation principle;
FIG. 7 is a schematic representation of the ROC curve of the method of the invention;
FIG. 8 is a block diagram of a second embodiment of the present invention.
Detailed Description
In order that the objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Referring to fig. 1 and 2, a first embodiment of the present invention provides a method for predicting feeding intolerance risk of premature infants, comprising the following steps.
S1, obtaining gastrointestinal feeding intolerance information according to preset premature infant case information, and obtaining multiple groups of characteristic variables in the gastrointestinal feeding intolerance information.
In this embodiment, the preset premature infant case information is the case information of premature infants hospitalized in the Neonatal Intensive Care Unit (NICU), obtained through the hospital's electronic medical record system; both infants with gastrointestinal feeding intolerance (FI) and infants who tolerated gastrointestinal feeding are selected.
Further, the inclusion criteria for the preset premature infant case information are: (1) gestational age <37 weeks; (2) admission within 24 hours after birth; (3) hospital stay ≥7 days. The exclusion criteria are: (1) severe digestive tract malformation, congenital heart disease, inherited metabolic disease and the like; (2) infants who were never fed or whose treatment was voluntarily abandoned.
In this embodiment, the plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information may be obtained by collecting the risk factors that affect feeding intolerance in premature infants, namely: (1) general condition of the infant (sex, gestational age, birth weight, gestational time, Apgar score at one minute after birth, whether conceived by in vitro fertilization, history of resuscitation after birth, body temperature); (2) maternal condition (pregnancy complications, amniotic fluid abnormality, placental abnormality, umbilical cord abnormality, fetal membrane abnormality, mode of delivery, assisted reproduction, multiple pregnancy, fetal position, maternal age); (3) diseases of the infant after birth (neonatal asphyxia, neonatal respiratory distress syndrome, neonatal hypoxic-ischemic encephalopathy, neonatal infection, neonatal hyperbilirubinemia, patent ductus arteriosus); (4) drug use (antibiotics, probiotics, pulmonary surfactant (PS), caffeine); (5) others (time of first feeding, interval between bowel movements, ventilator use, blood transfusion, apnea). In other embodiments, the characteristic variables may be selected according to the actual situation.
S2, according to whether gastrointestinal feeding is intolerant or not, assigning value to each premature infant case information to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables.
In this embodiment, the sample characteristic variables include body weight, gestational age, 1-minute Apgar score, resuscitation history, neonatal asphyxia, NRDS, infection, PDA, PS use, probiotic use, blood transfusion, apnea, hyperthermia, abnormal interval between bowel movements, time of first feeding, and mechanical ventilation. In practice the sample characteristic variables may differ, and this embodiment is not limiting.
Specifically, the step of assigning a value to each of the premature infant case information based on whether gastrointestinal feeding is intolerant includes:
S21, if the premature infant is intolerant to feeding, assigning a first identification value;
S22, otherwise, assigning a second identification value.
In this embodiment, the case data acquired in step S1 are entered using the double-entry and automatic logic error-checking functions of EpiData 3.1 and then imported into the SPSS 26.0 statistical software. All variables are assigned values starting from 0, so that categorical variables become numerical variables.
Specifically, the first identification value is 1, and the second identification value is 0. In other embodiments, the identification value may be selected according to the actual situation.
In this embodiment, when assigning a value to each premature infant, the value is also assigned to the corresponding characteristic variable, and the assignment mode is shown in table 1.
TABLE 1
Variable | Assigned value |
Body weight | 0 = ≤1.5 kg, 1 = <2.5 kg, 2 = ≥2.5 kg |
Gestational age | 0 = <34 weeks, 1 = ≥34 weeks |
1-minute Apgar score | 0 = ≤6 points, 1 = ≥7 points |
History of resuscitation | 0 = no, 1 = yes |
Neonatal asphyxia | 0 = no, 1 = yes |
NRDS | 0 = no, 1 = yes |
Infection | 0 = no, 1 = yes |
PDA | 0 = no, 1 = yes |
PS use | 0 = no, 1 = yes |
Probiotics | 0 = no, 1 = yes |
Blood transfusion | 0 = no, 1 = yes |
Apnea | 0 = no, 1 = yes |
Hyperthermia | 0 = no, 1 = yes |
Abnormal interval between bowel movements | 0 = no, 1 = yes |
Time of first feeding | 0 = <24 h, 1 = ≥24 h |
Mechanical ventilation | 0 = no, 1 = yes |
Feeding intolerance | 0 = no, 1 = yes |
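The assignment scheme in Table 1 can be sketched as a small encoding function. This is an illustration only: the field names of the raw case record are hypothetical, and treating a missing yes/no field as 0 is an assumption made for the sketch, not something the table specifies.

```python
# Hypothetical names for the yes/no variables of Table 1.
BINARY_FLAGS = [
    "resuscitation_history", "neonatal_asphyxia", "nrds", "infection", "pda",
    "ps_use", "probiotics", "blood_transfusion", "apnea", "hyperthermia",
    "abnormal_stool_interval", "mechanical_ventilation", "feeding_intolerance",
]


def encode_weight(weight_kg):
    """0 = <=1.5 kg, 1 = <2.5 kg, 2 = >=2.5 kg."""
    if weight_kg <= 1.5:
        return 0
    if weight_kg < 2.5:
        return 1
    return 2


def encode_case(case):
    """Turn one raw case record (a dict) into the numeric codes of Table 1."""
    encoded = {
        "body_weight": encode_weight(case["birth_weight_kg"]),
        "gestational_age": 0 if case["gestational_age_weeks"] < 34 else 1,
        "apgar_1min": 0 if case["apgar_1min"] <= 6 else 1,
        "first_feeding": 0 if case["first_feeding_hours"] < 24 else 1,
    }
    for flag in BINARY_FLAGS:
        # Assumption: a flag absent from the record is coded as 0 (no).
        encoded[flag] = 1 if case.get(flag, False) else 0
    return encoded


# Hypothetical record, not taken from the study data.
example_case = {
    "birth_weight_kg": 1.2,
    "gestational_age_weeks": 32,
    "apgar_1min": 7,
    "first_feeding_hours": 30,
    "apnea": True,
    "feeding_intolerance": True,
}
print(encode_case(example_case))
```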
Specifically, the step of performing a correlation analysis on each set of feature variables and intolerant variables to obtain feature variables with high correlation, i.e., sample feature variables, includes:
S23, inputting each group of characteristic variables and the intolerance variable into statistical software, and executing Spearman correlation analysis to obtain a plurality of correlation coefficients ρi;
S24, obtaining the characteristic variables with ρi < 0.05, and identifying them as characteristic variables with high correlation, namely sample characteristic variables.
It should be noted that a sample characteristic variable can be understood as a statistically significant factor, whose variation has a more marked influence than that of other factors.
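The screening in steps S23 and S24 can be sketched in plain Python. Two assumptions are made and should be read as such: first, that the 0.05 criterion refers to the significance value of the Spearman test, in line with the note that sample characteristic variables are statistically significant factors; second, that a seeded permutation test stands in for the analytic p-value a statistics package would report. All function names and the toy data are hypothetical.

```python
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 0.0 if var_a == 0 or var_b == 0 else cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided p-value: share of label shuffles with |rho| at least as large."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(spearman_rho(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def select_sample_features(features, outcome, alpha=0.05):
    """Keep the characteristic variables whose p-value is below alpha."""
    return [name for name, column in features.items()
            if permutation_p_value(column, outcome) < alpha]


# Toy data (hypothetical): 20 cases, 1 = feeding intolerant.
outcome = [0] * 10 + [1] * 10
features = {"apnea": [0] * 10 + [1] * 10, "noise": [0, 1] * 10}
print(select_sample_features(features, outcome))
```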
S3, calculating the SHAP value of each group of sample characteristic variables, and selecting the sample characteristic variables with large SHAP values to obtain input characteristic variables.
Specifically, the step of calculating the SHAP value of each group of sample characteristic variables, and selecting the sample characteristic variables with large SHAP values to obtain the input characteristic variables comprises the following steps:
S31, calculating the SHAP value of each group of sample characteristic variables, and sorting all the SHAP values in descending order;
S32, acquiring the top n sample characteristic variables in the Shapley value ranking, and taking them as input characteristic variables, wherein n is a positive integer.
In this embodiment, n = 13; in other embodiments, n may be selected according to the actual situation.
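The ranking and top-n selection can be sketched as follows. The sketch assumes that per-sample Shapley values are already available as a matrix (one row per sample, one column per feature) and that a feature's overall importance is the mean of the absolute values in its column, which is a common convention for SHAP importance plots rather than something the text states. The numbers and feature names are hypothetical.

```python
def top_n_features(shap_rows, feature_names, n):
    """Rank features by mean absolute Shapley value and return the top n."""
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n], importance


# Hypothetical Shapley values for 3 samples and 3 features.
toy_shap = [
    [0.20, -0.05, 0.01],
    [-0.30, 0.10, 0.00],
    [0.10, -0.15, 0.02],
]
toy_names = ["gestational_age", "apnea", "infection"]
selected, scores = top_n_features(toy_shap, toy_names, n=2)
print(selected)
```

Taking absolute values before averaging keeps features whose positive and negative contributions would otherwise cancel out.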
Referring to fig. 4, the feature importance ranking of the present invention calculates the Shapley value of each feature through SHAP and then ranks the sample features by importance. FIG. 5 shows a SHAP-based feature summary, which reveals not only the magnitude of each feature's influence but also its overall direction, providing great assistance in selecting features.
It can be understood that in this step, the statistically significant factors from step S2 are imported into R, and the Shapley value of each sample feature is calculated by SHAP; the larger the Shapley value, the greater the feature's influence on the model, so that the importance of each feature to the final prediction result can be measured.
Referring to fig. 3, the main idea of SHAP comes from the problem of fair distribution in cooperative game theory. For a machine learning model, the model produces a predicted value for each sample, and this prediction is jointly determined by the contribution value (Shapley value) of each feature. Let the i-th sample be $x_i$, the k-th feature of the i-th sample be $x_{ik}$, and the predicted value of the model to be explained on this sample be $y_i$; then the SHAP interpretation obeys the following equation:
$$y_i = y_{base} + \sum_{k=1}^{K} f(x_{ik})$$
wherein K is the number of features; $y_{base}$ is the baseline of the whole model output, namely the expected prediction of the original model over all training samples, that is, the mean prediction over all samples; and $f(x_{ik})$ is the contribution value (Shapley value) of the k-th feature of the i-th sample to the final predicted value $y_i$. When $f(x_{ik}) > 0$, the feature raises the predicted value and has a positive effect on the output; when $f(x_{ik}) < 0$, the feature lowers the predicted value and acts in the opposite direction. Therefore, SHAP can not only reveal the magnitude of the influence but also reflect its overall direction, and characterize the overall positive or negative relationship between each characteristic variable and feeding intolerance in premature infants.
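The additive decomposition described here can be checked on a small example. The sketch below computes exact Shapley values by enumerating all feature subsets, which is only feasible for a handful of features; it is not the SHAP library, and `toy_model`, `background` and the sample `x` are hypothetical stand-ins for the trained model and its training data.

```python
from itertools import combinations
from math import factorial

def coalition_value(model, x, background, subset):
    """Average model output when the features in `subset` take x's values
    and the remaining features take the values of each background row."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)

def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every subset of the other features."""
    k = len(x)
    phi = []
    for j in range(k):
        others = [m for m in range(k) if m != j]
        contrib = 0.0
        for size in range(k):
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (coalition_value(model, x, background, s | {j})
                        - coalition_value(model, x, background, s))
                contrib += weight * gain
        phi.append(contrib)
    return phi

def toy_model(features):
    # Hypothetical model with one interaction term.
    return 0.2 + 0.5 * features[0] - 0.3 * features[1] + 0.4 * features[0] * features[2]

background = [[0, 0, 0], [1, 0, 1], [0, 1, 0]]  # hypothetical training samples
x = [1, 1, 1]                                   # sample to be explained
y_base = coalition_value(toy_model, x, background, set())  # mean prediction
phi = shapley_values(toy_model, x, background)
```

Here `y_base + sum(phi)` reproduces `toy_model(x)`, and the second feature, whose coefficient is negative, receives a negative contribution, matching the sign interpretation given above.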
S4, dividing the training set and the testing set by adopting a 10-fold cross validation method for the input characteristic variables, training a training set model based on an XGBoost algorithm, and constructing a model function of the XGBoost.
Referring to fig. 6, the "10-fold cross-validation method" is a common method for validating classifier performance. The original data set is divided into 10 parts on average, 9 different parts are selected as training sets each time, 1 part is left as a test set, then training, prediction and evaluation of the model are carried out, and the result of each time is recorded. The above procedure was repeated 10 times and the recorded 10 results were averaged as a final indicator of the quality of the assessment model. The 10-fold cross validation ensures that each sample is validated once, so that the influence of data set division can be reduced, and the stability and generalization capability of the model can be conveniently inspected.
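The 10-fold procedure just described can be sketched in a few lines. The helper names `k_fold_indices` and `cross_validate` and the callback `fit_and_score` (which would train on the training indices and return an evaluation score on the test indices) are assumptions for illustration.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Train on k-1 folds, evaluate on the held-out fold, and average the k scores."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k, scores

# Placeholder scorer: reports the held-out fraction instead of a real model metric.
mean_score, fold_scores = cross_validate(
    95, lambda train_idx, test_idx: len(test_idx) / (len(train_idx) + len(test_idx))
)
```

Every sample index appears in exactly one test fold, which is the property that lets each sample be validated once.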
It should be noted that XGBoost is a boosting tree model that integrates many CART regression trees into a strong classifier. The idea is to keep adding trees, each grown by repeated feature splitting; adding a tree amounts to learning a new function that fits the residual of the previous prediction.
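The idea of adding one tree at a time to fit the residual of the previous prediction can be illustrated with a deliberately simplified sketch. It uses squared error and single-split trees (stumps) on one feature, so it is ordinary gradient boosting rather than XGBoost itself; the names `fit_stump` and `boost`, the learning rate and the data are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the largest value so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv

def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees, errors = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, trees, errors

# Hypothetical toy data: one feature and a binary outcome.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
base, trees, errors = boost(xs, ys)
```

The training error recorded after each added tree shrinks as more trees are added, which is the behaviour the paragraph above describes.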
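Assuming a total of t trees, where F represents the space of tree models, the predicted value $\hat{y}_i$ can be expressed as:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i), \quad f_k \in F$$
The objective function is:
$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{t} \Omega(f_k)$$
wherein l is a loss function representing the error between the predicted value and the actual value, and Ω is a regularization function that prevents model overfitting.
The regularization function in XGBoost is expressed as follows:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$
where T represents the number of leaf nodes of each tree, w represents the leaf weights of each tree, and γ and λ are added in order to suppress the growth of the tree and prevent model overfitting. λ is the L2 regularization coefficient and γ is the splitting threshold. From the objective function, the optimal scoring function is obtained by solving for the optimum, wherein $G_j$ and $H_j$ denote the sums of the first-order and second-order gradient statistics of the loss over the samples in leaf j; the smaller the output value of the function, the better the tree model:
$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
A tree model can be evaluated according to the scoring function, but the number of candidate trees is unbounded and it is impossible to score all of them. The XGBoost algorithm solves this problem greedily: starting from the root node of the tree, it calculates whether the objective function value decreases after a split compared with before the split. Assuming the node before splitting is j, its contribution to the objective function is:
$$Obj_j = -\frac{1}{2} \frac{G_j^2}{H_j + \lambda} + \gamma$$
After the node splits, the objective function contribution of the two child nodes is:
$$Obj_s = -\frac{1}{2} \left( \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} \right) + 2\gamma$$
At this time, the change in the objective function is:
$$Obj_j - Obj_s = \frac{1}{2} \left( \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G_j^2}{H_j + \lambda} \right) - \gamma$$
Finally, with $G_j = G_L + G_R$ and $H_j = H_L + H_R$, the information gain of the objective function after each split is obtained:
$$Gain = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$$
wherein $G_L$ and $G_R$ are respectively the sums of the first-order gradient statistics of the left and right leaf nodes during splitting, and $H_L$ and $H_R$ are the sums of the second-order gradient statistics of the left and right leaf nodes.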
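A small numeric sketch of the split evaluation follows. It assumes the logistic loss, for which the first- and second-order gradients with respect to the raw score are p - y and p(1 - p), and it uses hypothetical labels for the samples falling in the left and right child nodes; the function names and the values of λ and γ are likewise assumptions.

```python
from math import exp

def logistic_grad_hess(y_true, raw_score):
    """First- and second-order gradients of the log loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + exp(-raw_score))
    return p - y_true, p * (1.0 - p)

def node_objective(G, H, lam, gamma):
    """Contribution of one leaf to the optimal objective: -1/2 * G^2 / (H + lambda) + gamma."""
    return -0.5 * G * G / (H + lam) + gamma

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)] - gamma."""
    return 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam)) - gamma

# Hypothetical candidate split, with every sample starting from raw score 0 (p = 0.5).
left_labels, right_labels = [1, 1, 1], [0, 0, 0, 1]
left = [logistic_grad_hess(y, 0.0) for y in left_labels]
right = [logistic_grad_hess(y, 0.0) for y in right_labels]
GL, HL = sum(g for g, _ in left), sum(h for _, h in left)
GR, HR = sum(g for g, _ in right), sum(h for _, h in right)

gain = split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0)
gain_penalized = split_gain(GL, HL, GR, HR, lam=1.0, gamma=1.0)
```

With γ = 0 the gain is positive and the split would be accepted; raising γ to 1 makes the same split unprofitable, which is how γ acts as a splitting threshold.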
In addition, grid search is used to find the optimal parameters when training the model. Different parameter settings can greatly affect the prediction performance of the model; grid search builds a search space from the candidate parameter values and exhaustively evaluates every combination, so that the best-performing combination is selected.
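A minimal sketch of such an exhaustive search is given below. The text does not state which parameters were tuned, so the grid over `max_depth` and `learning_rate` and the scoring function `toy_score` are purely hypothetical; in practice the score would be a cross-validated metric of the trained model.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Evaluate every parameter combination in the grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical stand-in for a cross-validated score; peaks at depth 4, rate 0.1.
    return -(params["max_depth"] - 4) ** 2 - 100 * (params["learning_rate"] - 0.1) ** 2

param_grid = {"max_depth": [2, 4, 6], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(param_grid, toy_score)
```

The search visits all nine combinations; its cost grows multiplicatively with each added parameter, which is the price of the exhaustive coverage.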
Referring to fig. 7, the present application further uses an ROC curve to verify the constructed model. The ROC curve takes the false positive rate (1 - specificity) as the horizontal axis and the true positive rate (sensitivity) as the vertical axis, and connects the points generated by different cut-off values; the area under the curve (AUC) reflects the accuracy of the diagnostic test. The AUC generally ranges between 0.5 and 1. It is generally considered that an AUC of 0.5 to 0.7 indicates mediocre diagnostic accuracy, an AUC of 0.7 to 0.8 indicates moderate accuracy, and an AUC above 0.8 indicates good accuracy. As can be seen from fig. 7, the model constructed by the present invention has good diagnostic performance.
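For illustration, the ROC points and the AUC can be computed from labels and predicted probabilities as sketched below. The function names and the toy labels and scores are assumptions; the values bear no relation to the result shown in fig. 7.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) pairs for every distinct cut-off."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels (1 = intolerant) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(y_true, scores))
```

For this toy data eight of the nine positive/negative pairs are ranked correctly, so the area equals 8/9.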
S5, presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and if the prediction probability is larger than the classification threshold, judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding.
Specifically, the step of presetting the classification threshold value includes:
S51, calculating the Youden index P of the prediction model under different data sets by using the Youden rule;
S52, determining the critical point by maximizing the Youden index, taking the average of the Youden index P over the 10 runs, and using this average as the best classification threshold Bestp of the model.
Further, the step of calculating the Youden index P of the prediction model under different data sets by using the Youden rule includes:
S511, calculating the sensitivity and specificity of the prediction model by using a binary classification confusion matrix;
S512, calculating the Youden index P as: P = sensitivity + specificity - 1.
The confusion matrix is a visualization tool used particularly in supervised learning and is a standard format for accuracy evaluation; it is represented as a matrix of n rows and n columns, as shown in table 2 below.
TABLE 2
(1) Sensitivity: the proportion of all positive examples that are correctly identified as positive, i.e., patients are judged to be patients and no missed diagnosis occurs.
(2) Specificity: the proportion of all negative examples that are correctly identified as negative, i.e., healthy subjects are judged to be healthy and no misdiagnosis occurs.
The Youden index P is calculated as sensitivity + specificity - 1; it combines the ability of the diagnostic method under evaluation to find true patients and true non-patients, minus the base value of 1. The larger the value, the better the diagnostic method under evaluation.
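The three quantities defined above can be computed directly from the four cells of a binary confusion matrix, as in the following sketch. The function names and the toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for binary labels where 1 = feeding intolerant."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn

def youden_index(y_true, y_pred):
    """P = sensitivity + specificity - 1."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # correctly identified positives / all positives
    specificity = tn / (tn + fp)  # correctly identified negatives / all negatives
    return sensitivity + specificity - 1, sensitivity, specificity

# Hypothetical true labels and predicted labels for ten infants.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
P, sensitivity, specificity = youden_index(y_true, y_pred)
```

Here the sensitivity is 3/4 and the specificity is 5/6, giving a Youden index of about 0.583.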
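The method for calculating Bestp is as follows:
$$Bestp = \frac{1}{10} \sum_{j=1}^{10} P^{(j)}$$
where $P^{(j)}$ denotes the Youden index P obtained in the j-th of the 10 runs described in step S52.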
It should be noted that, after the model function y = f(x) of XGBoost is constructed, the model outputs a prediction probability $P_i$ for each sample. The data set provided by the invention has K features, and the features of each sample can be expressed as follows:
$$X_i = (X_{i1}, X_{i2}, X_{i3}, \ldots, X_{iK})$$
In the above formula, $X_i$ is the i-th sample and $X_{ik}$ is the k-th feature of the i-th sample; the prediction probability of each sample is $P_i = f(X_i)$. When $P_i$ is greater than the Bestp value, the sample is predicted to be intolerant to feeding; otherwise it is predicted to be tolerant to feeding, thereby yielding the risk prediction of feeding intolerance in premature infants.
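One possible reading of the threshold procedure is sketched below: in each cross-validation run, the probability cut-off that maximizes the Youden index is located, the cut-offs are averaged over the runs, and the average is used as the classification threshold. This interpretation, the function names, and the two toy folds (the embodiment uses 10) are assumptions for illustration only.

```python
def youden_at(y_true, probs, cut):
    """Youden index when samples with probability > cut are called intolerant."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p > cut)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p <= cut)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p <= cut)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p > cut)
    return tp / (tp + fn) + tn / (tn + fp) - 1

def best_cutoff(y_true, probs):
    """Observed probability that maximizes the Youden index (lowest one on ties)."""
    return max(sorted(set(probs)), key=lambda c: youden_at(y_true, probs, c))

def average_cutoff(folds):
    """Average the per-fold optimal cut-offs."""
    cuts = [best_cutoff(y_true, probs) for y_true, probs in folds]
    return sum(cuts) / len(cuts)

def predict_intolerant(prob, bestp):
    """A sample is predicted feeding intolerant when its probability exceeds bestp."""
    return prob > bestp

# Hypothetical held-out labels and predicted probabilities for two folds.
folds = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.5, 0.7, 0.2]),
]
bestp = average_cutoff(folds)
```

For these toy folds the per-fold cut-offs are 0.4 and 0.5, so the averaged threshold is 0.45; a new sample with predicted probability 0.62 would be flagged as intolerant and one with 0.30 would not.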
In summary, the invention preprocesses the data with SHAP-based feature selection, ranks the importance of the features by calculating the Shapley value of each feature, and selects features for model training, thereby overcoming the curse of dimensionality and the noise caused by a large number of features as well as the over-fitting caused by increased model complexity; the model is trained and evaluated with the 10-fold cross validation method, which reduces the influence of data partitioning on the prediction result, so that the prediction accuracy is more reliable and the generalization capability is stronger; in addition, an XGBoost ensemble learning model is built, grid search is used for parameter tuning, and the optimal critical point is determined by the Youden index maximization principle, which further improves the reliability and prediction accuracy of the model and is of great significance for preventing feeding intolerance in premature infants.
Referring to fig. 8, a second embodiment of the present invention provides a system for predicting risk of feeding intolerance of premature infants, comprising:
the characteristic variable acquisition module is used for acquiring gastrointestinal feeding intolerance information according to preset premature infant case information and acquiring a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
the sample characteristic variable acquisition module is used for respectively assigning a value to each premature infant case information according to whether gastrointestinal feeding is intolerant or not to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
the input characteristic variable acquisition module is used for calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables;
the model construction module is used for dividing a training set and a testing set by adopting a 10-fold cross validation method on input characteristic variables, training a training set model based on an XGBoost algorithm, and constructing a model function of the XGBoost;
the prediction module is used for presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding if the prediction probability is larger than the classification threshold.
In this example, the preset premature infant case information is the case information of premature infants hospitalized in the Neonatal Intensive Care Unit (NICU), retrieved and analyzed through the hospital electronic medical record system, from which infants with gastrointestinal feeding intolerance (FI) and infants who tolerate gastrointestinal feeding are selected.
In particular, the plurality of sets of characteristic variables in the gastrointestinal feeding intolerance information include:
(1) general condition of the infant (sex, gestational age, birth weight, gestational time, Apgar score one minute after birth, whether a test-tube infant, whether there is a history of post-birth resuscitation, body temperature); (2) maternal conditions (gestational complications, amniotic fluid abnormality, placenta abnormality, umbilical cord abnormality, fetal membrane abnormality, mode of delivery, assisted reproduction, whether a multiple pregnancy, fetal position, maternal age); (3) diseases of the infant after birth (neonatal asphyxia, neonatal respiratory distress syndrome, neonatal hypoxic-ischemic encephalopathy, neonatal infection, neonatal hyperbilirubinemia, patent ductus arteriosus); (4) drug use (antibiotics, probiotics, pulmonary surfactant (PS), caffeine); (5) others (milk opening time, second stool interval time, ventilator use, blood transfusion, and apnea).
In the sample characteristic variable acquisition module, the case data acquired by the characteristic variable acquisition module are entered using EpiData 3.1 with double entry and automatic logic error checking, and are then imported into SPSS 26.0 statistical software; all variables are assigned values starting from 0, and the categorical variables are converted into numerical variables.
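The conversion of categorical variables into numerical variables assigned from 0 can be sketched as follows. The embodiment performs this step in SPSS; the function name `encode_from_zero`, the alphabetical ordering of the codes and the example categories are assumptions.

```python
def encode_from_zero(values):
    """Map each distinct category to an integer code starting at 0 (alphabetical order)."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

# Hypothetical categorical variable: mode of delivery for four cases.
delivery_mode = ["cesarean", "vaginal", "cesarean", "vaginal"]
encoded, codes = encode_from_zero(delivery_mode)
```

Keeping the `codes` dictionary alongside the encoded column makes it possible to apply the same assignment to new cases at prediction time.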
Specifically, the step of presetting the classification threshold value includes:
calculating the Youden index P of the prediction model under different data sets by using the Youden rule, wherein the sensitivity and specificity of the prediction model are calculated by using a binary classification confusion matrix, and P = sensitivity + specificity - 1;
determining the critical point by maximizing the Youden index, and taking the average value of the Youden index P as the best classification threshold Bestp of the model.
It should be noted that, after the model function y = f(x) of XGBoost is constructed, the model outputs a prediction probability $P_i$ for each sample. The data set provided by the invention has K features, and the features of each sample can be expressed as follows:
$$X_i = (X_{i1}, X_{i2}, X_{i3}, \ldots, X_{iK})$$
In the above formula, $X_i$ is the i-th sample and $X_{ik}$ is the k-th feature of the i-th sample; the prediction probability of each sample is $P_i = f(X_i)$. When $P_i$ is greater than the Bestp value, the sample is predicted to be intolerant to feeding; otherwise it is predicted to be tolerant to feeding, thereby yielding the risk prediction of feeding intolerance in premature infants.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (8)
1. A method of predicting risk of feeding intolerance in premature infants, comprising the steps of:
obtaining gastrointestinal feeding intolerance information according to preset premature infant case information, and obtaining a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
according to whether gastrointestinal feeding is intolerant or not, assigning a value to each premature infant case information to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
calculating the Shapley value of each group of sample characteristic variables, and selecting sample characteristic variables with large Shapley values to obtain input characteristic variables;
dividing a training set and a testing set by adopting a 10-fold cross validation method for input characteristic variables, training a training set model based on an XGBoost algorithm, and constructing a model function of the XGBoost;
presetting a classification threshold, calculating the prediction probability of a sample to be predicted by using a model function, and if the prediction probability is larger than the classification threshold, judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding.
2. The method of claim 1, wherein the characteristic variables comprise the following:
body weight, gestational age, 1-minute Apgar score, resuscitation history, neonatal asphyxia, NRDS, infection, PDA, PS use, probiotic use, blood transfusion, apnea, hyperthermia, abnormal interval between bowel movements, milk opening time and mechanical ventilation.
3. The method of claim 1, wherein assigning a value to each premature infant case information based on whether gastrointestinal feeding is intolerant comprises:
if the premature infant is intolerant to feeding, assigning a first identification value;
otherwise, the second identification value is assigned.
4. A method of predicting risk of feeding intolerance in premature infants according to claim 3 wherein the step of performing a correlation analysis on each set of characteristic variables and intolerance variables to obtain a characteristic variable of high correlation, i.e. a sample characteristic variable, comprises:
inputting each group of characteristic variables and the intolerance variable into statistical software, and executing Spearman correlation analysis to obtain a plurality of correlation coefficients ρi;
and acquiring the characteristic variables with ρi <0.05, and identifying them as the characteristic variables with high correlation, namely sample characteristic variables.
5. The method of claim 1, wherein the step of calculating Shapley values for each set of sample characteristic variables, selecting sample characteristic variables having large Shapley values, and obtaining input characteristic variables comprises:
calculating the Shapley values of each group of sample characteristic variables, and sorting all the Shapley values in order from large to small;
and acquiring the top n sample characteristic variables in the Shapley value ranking, and taking them as input characteristic variables, wherein n is a positive integer.
6. The method of claim 1, wherein the step of pre-setting a classification threshold comprises:
calculating the Youden index P of the prediction model under different data sets by using the Youden rule;
determining the critical point by maximizing the Youden index, and taking the average value of the Youden index P as the best classification threshold Bestp of the model.
7. The method of claim 6, wherein the step of calculating the Youden index P of the prediction model under different data sets by using the Youden rule comprises:
calculating the sensitivity and specificity of the prediction model by using a binary classification confusion matrix;
calculating the Youden index P as: P = sensitivity + specificity - 1.
8. A system for predicting risk of feeding intolerance in premature infants, comprising:
the characteristic variable acquisition module is used for acquiring gastrointestinal feeding intolerance information according to preset premature infant case information and acquiring a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
the sample characteristic variable acquisition module is used for respectively assigning a value to each premature infant case information according to whether gastrointestinal feeding is intolerant or not to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
the input characteristic variable acquisition module is used for calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables;
the model construction module is used for dividing a training set and a testing set by adopting a 10-fold cross validation method on input characteristic variables, training a training set model based on an XGBoost algorithm, and constructing a model function of the XGBoost;
the prediction module is used for presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding if the prediction probability is larger than the classification threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310123452.8A CN117409963A (en) | 2023-02-07 | 2023-02-07 | Premature infant feeding intolerance risk prediction method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117409963A true CN117409963A (en) | 2024-01-16 |
Family
ID=89485849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310123452.8A Pending CN117409963A (en) | 2023-02-07 | 2023-02-07 | Premature infant feeding intolerance risk prediction method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117409963A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117831080A (en) * | 2024-03-04 | 2024-04-05 | 正大农业科学研究有限公司 | Pig growth condition prediction device based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||