CN117409963A - Premature infant feeding intolerance risk prediction method and system - Google Patents


Info

Publication number
CN117409963A
CN117409963A CN202310123452.8A CN202310123452A CN117409963A CN 117409963 A CN117409963 A CN 117409963A CN 202310123452 A CN202310123452 A CN 202310123452A CN 117409963 A CN117409963 A CN 117409963A
Authority
CN
China
Prior art keywords
variables
characteristic variables
sample
intolerant
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310123452.8A
Other languages
Chinese (zh)
Inventor
徐惠
周瑞
付连国
杨丽娟
陈信
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
First Affiliated Hospital of Bengbu Medical College
Original Assignee
First Affiliated Hospital of Bengbu Medical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by First Affiliated Hospital of Bengbu Medical College
Priority to CN202310123452.8A
Publication of CN117409963A
Legal status: Pending


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a premature infant feeding intolerance risk prediction method, which comprises the following steps: obtaining gastrointestinal feeding intolerance information according to preset premature infant case information; assigning values according to whether gastrointestinal feeding is tolerated to obtain intolerance variables, and performing correlation analysis between each group of characteristic variables and the intolerance variables to obtain the characteristic variables with a high degree of correlation, namely the sample characteristic variables; calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables; dividing the input characteristic variables into a training set and a test set by 10-fold cross-validation, training a model on the training set based on the XGBoost algorithm, and constructing the XGBoost model function; presetting a classification threshold, calculating the prediction probability of a sample to be predicted by using the model function, and, if the prediction probability is larger than the classification threshold, judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding. The invention also discloses a system adopting the method.

Description

Premature infant feeding intolerance risk prediction method and system
Technical Field
The invention relates to the technical field of intelligent prediction, in particular to a premature infant feeding intolerance risk prediction method and system.
Background
Feeding intolerance in premature infants is a clinically common digestive-system syndrome. Because the gastrointestinal tract of a premature infant is immature, and gastrointestinal motility develops more slowly than the digestive and absorptive functions, premature infants are more prone to feeding intolerance than term infants. It manifests as vomiting, abdominal distension, gastric retention and the like after gastrointestinal feeding begins, which seriously affects early nutritional support of the premature infant and poses great challenges to appropriate feeding. Studies show that the incidence of feeding intolerance in premature infants is 33.80-53.45% in China and about 25% abroad. Feeding intolerance leads to insufficient nutrient intake and extrauterine growth retardation, and prolonged parenteral nutrition also increases the incidence of complications such as nosocomial infections, metabolic disorders and liver damage. It also prolongs hospitalization, increases the economic burden on society and families, and affects the survival rate and quality of life of premature infants.
Feeding intolerance is a common clinical condition with a complex pathogenesis and numerous influencing factors; accurately identifying the high-risk factors and taking targeted preventive measures is key to reducing it. In China, Chen Qiong et al. used logistic regression modeling to predict the occurrence of feeding intolerance in premature infants; Li Yan et al. used Spearman correlation and regression analysis to examine the relationship between changes in superior mesenteric artery blood flow before and after a meal and feeding tolerance in premature infants, hoping to predict feeding intolerance from changes in gastrointestinal dynamics. Abroad, Carlo developed early biomarkers for prediction, measuring the visceral tissue oxygenation fraction with near-infrared spectroscopy to predict feeding tolerance in premature infants; Valentina used a generalized linear model to evaluate the relationship between visceral blood oxygen saturation, Doppler blood flow velocity in the superior mesenteric artery, and feeding tolerance; Bozzetti used logistic regression modeling to predict feeding tolerance in newborns with intrauterine growth restriction.
As the above shows, most domestic and foreign studies that build models with data mining use only simple methods, so the resulting model may not be optimal. Most of them also rely on biomarkers, which are costly, complicated to measure, and demanding in manpower and material resources, and some assessment tools require imaging examinations for prediction, which increases the economic burden on patients and does not meet the public's expectation of a convenient and rapid screening method. A systematic, convenient and accurate prediction method is therefore needed to make up for the shortcomings of existing research.
Disclosure of Invention
An object of the present invention is to propose a method that accurately predicts the risk of feeding intolerance in premature infants.
A method of predicting risk of feeding intolerance in premature infants comprising the steps of:
obtaining gastrointestinal feeding intolerance information according to preset premature infant case information, and obtaining a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
according to whether gastrointestinal feeding is intolerant or not, assigning a value to each premature infant case information to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables;
dividing the input characteristic variables into a training set and a test set by 10-fold cross-validation, training a model on the training set based on the XGBoost algorithm, and constructing the XGBoost model function;
presetting a classification threshold, calculating the prediction probability of a sample to be predicted by using a model function, and if the prediction probability is larger than the classification threshold, judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding.
According to the premature infant feeding intolerance risk prediction method provided by the invention, the features are ranked by importance by calculating the SHAP value of each feature, and the selected features are used for model training. This overcomes the curse of dimensionality and the noise caused by a large number of features, as well as the overfitting caused by increased model complexity.
In addition, the premature infant feeding intolerance risk prediction method provided by the invention can also have the following additional technical characteristics:
further, the characteristic variables include the following sets:
body weight, gestational age, 1-minute Apgar score, resuscitation history, neonatal asphyxia, neonatal respiratory distress syndrome (NRDS), infection, patent ductus arteriosus (PDA), pulmonary surfactant (PS) use, probiotic use, blood transfusion, apnea, high body temperature, abnormal interval between bowel movements, time of first feeding, and mechanical ventilation.
Further, the step of assigning a value to each of the premature infant case information based on whether the gastrointestinal feeding is intolerant, respectively, comprises:
if the premature infant is intolerant to feeding, assigning a first identification value;
otherwise, the second identification value is assigned.
Further, the step of performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation, namely sample characteristic variables, includes:
inputting each group of characteristic variables and the intolerance variables into statistical software, and performing Spearman correlation analysis to obtain a plurality of correlation coefficients ρi;
and acquiring the characteristic variables with ρi < 0.05, and identifying them as the characteristic variables with a high degree of correlation, namely the sample characteristic variables.
Further, the step of calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables, includes:
calculating the Shapley value of each group of sample characteristic variables, and sorting all the Shapley values from largest to smallest;
and acquiring the top n sample characteristic variables in the Shapley value ranking, and taking them as the input characteristic variables, wherein n is a positive integer.
Further, the step of presetting the classification threshold value includes:
calculating the Youden index P of the prediction model on different data sets by using the Youden rule;
determining the critical point by maximizing the Youden index, and taking the average value of the Youden index P as the best classification threshold Bestp of the model.
Further, the step of calculating the Youden index P of the prediction model on different data sets by using the Youden rule includes:
calculating the sensitivity and specificity of the prediction model by using the binary confusion matrix;
calculating the Youden index P as: P = sensitivity + specificity - 1.
Another object of the invention is to propose a system for predicting the risk of feeding intolerance of premature infants, comprising:
the characteristic variable acquisition module is used for acquiring gastrointestinal feeding intolerance information according to preset premature infant case information and acquiring a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
the sample characteristic variable acquisition module is used for respectively assigning a value to each premature infant case information according to whether gastrointestinal feeding is intolerant or not to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
the input characteristic variable acquisition module is used for calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables;
the model construction module is used for dividing the input characteristic variables into a training set and a test set by 10-fold cross-validation, training a model on the training set based on the XGBoost algorithm, and constructing the XGBoost model function;
the prediction module is used for presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding if the prediction probability is larger than the classification threshold.
The beneficial effects of the invention are as follows: XGBoost is a Boosting ensemble learning algorithm that combines multiple CART regression tree models into a strong classifier. It offers high accuracy and fast running speed, uses regularization to reduce overfitting, and is robust to interference from outliers.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a first embodiment of the present invention;
FIG. 2 is a flow chart showing the steps of a method for predicting feeding intolerance of premature infants based on SHAP feature selection and XGBoost in accordance with a first embodiment of the present invention;
FIG. 3 is a schematic diagram showing SHAP additivity results;
FIG. 4 is a schematic diagram of feature importance ranking based on SHAP values;
FIG. 5 is a schematic diagram of the SHAP-based feature summary;
FIG. 6 is a schematic diagram of a 10-fold cross-validation principle;
FIG. 7 is a schematic representation of the ROC curve of the method of the invention;
fig. 8 is a block diagram of a second embodiment of the present invention.
Detailed Description
In order that the objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Several embodiments of the invention are presented in the figures. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Referring to fig. 1 and 2, a first embodiment of the present invention provides a method for predicting feeding intolerance risk of premature infants, comprising the following steps.
S1, obtaining gastrointestinal feeding intolerance information according to preset premature infant case information, and obtaining multiple groups of characteristic variables in the gastrointestinal feeding intolerance information.
In this example, the preset premature infant case information is the case information of premature infants hospitalized in the neonatal intensive care unit (NICU), obtained through the hospital's electronic medical record system; both infants with gastrointestinal feeding intolerance (FI) and infants who tolerate gastrointestinal feeding are selected.
Further, the inclusion criteria for the preset premature infant case information are: (1) gestational age < 37 weeks; (2) admission within 24 hours after birth; (3) hospitalization time ≥ 7 days. The exclusion criteria are: (1) serious digestive tract malformation, congenital heart disease, inherited metabolic disease and the like; (2) infants in whom feeding was never started, or whose treatment was voluntarily abandoned.
In this embodiment, the plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information may be obtained by collecting the risk factors related to feeding intolerance in premature infants, namely: (1) general condition of the infant (sex, gestational age, birth weight, gestational time, Apgar score one minute after birth, whether conceived by in vitro fertilization, whether there is a history of resuscitation after birth, body temperature); (2) condition of the infant's mother (pregnancy complications, amniotic fluid abnormality, placenta abnormality, umbilical cord abnormality, fetal membrane abnormality, mode of delivery, assisted reproduction, multiple pregnancy, fetal position, maternal age); (3) diseases of the infant after birth (neonatal asphyxia, neonatal respiratory distress syndrome, neonatal hypoxic-ischemic encephalopathy, neonatal infection, neonatal hyperbilirubinemia, patent ductus arteriosus); (4) drug use (antibiotics, probiotics, pulmonary surfactant PS, caffeine); (5) others (time of first feeding, interval between two bowel movements, ventilator use, blood transfusion, apnea). In other embodiments, the characteristic variables may be selected according to the actual situation.
S2, according to whether gastrointestinal feeding is intolerant or not, assigning value to each premature infant case information to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables.
In this example, the sample characteristic variables include body weight, gestational age, 1-minute Apgar score, resuscitation history, neonatal asphyxia, NRDS, infection, PDA, PS use, probiotic use, blood transfusion, apnea, high body temperature, abnormal interval between bowel movements, time of first feeding, and mechanical ventilation. In practice, the sample characteristic variables may differ from case to case, and this embodiment is not limiting.
Specifically, the step of assigning a value to each of the premature infant case information based on whether gastrointestinal feeding is intolerant includes:
s21, if the premature infant is intolerant to feeding, assigning a first identification value;
s22, otherwise, assigning a second identification value.
In this embodiment, the case data acquired in step S1 are entered using the double data entry and automatic logic checking functions of EpiData 3.1, and then imported into the SPSS 26.0 statistical software. All variables are assigned values starting from 0, so that the categorical variables become numerical variables.
Specifically, the first identification value is 1, and the second identification value is 0. In other embodiments, the identification value may be selected according to the actual situation.
In this embodiment, when assigning a value to each premature infant, the value is also assigned to the corresponding characteristic variable, and the assignment mode is shown in table 1.
TABLE 1
Variable | Assigned value
Body weight | 0 = ≤1.5 kg, 1 = <2.5 kg, 2 = ≥2.5 kg
Gestational age | 0 = <34 weeks, 1 = ≥34 weeks
1-minute Apgar score | 0 = ≤6 points, 1 = ≥7 points
History of resuscitation | 0 = no, 1 = yes
Neonatal asphyxia | 0 = no, 1 = yes
NRDS | 0 = no, 1 = yes
Infection | 0 = no, 1 = yes
PDA | 0 = no, 1 = yes
PS use | 0 = no, 1 = yes
Probiotics | 0 = no, 1 = yes
Blood transfusion | 0 = no, 1 = yes
Apnea | 0 = no, 1 = yes
High body temperature | 0 = no, 1 = yes
Abnormal interval between two bowel movements | 0 = no, 1 = yes
Time of first feeding | 0 = <24 h, 1 = ≥24 h
Mechanical ventilation | 0 = no, 1 = yes
Feeding intolerance | 0 = no, 1 = yes
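The assignment scheme of Table 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `encode_case` and the dictionary keys are hypothetical, only a subset of the rows is shown, and body weight code 1 is assumed to cover weights above 1.5 kg and below 2.5 kg.

```python
def encode_case(case):
    """Encode a raw case record into the integer codes of Table 1 (subset).

    The keys of `case` are hypothetical names chosen for this sketch.
    """
    w = case["weight_kg"]
    if w <= 1.5:
        weight = 0
    elif w < 2.5:          # assumed band: above 1.5 kg and below 2.5 kg
        weight = 1
    else:
        weight = 2
    return {
        "weight": weight,
        "gestational_age": 0 if case["gestational_age_weeks"] < 34 else 1,
        "apgar_1min": 0 if case["apgar_1min"] <= 6 else 1,
        "first_feed": 0 if case["first_feed_hours"] < 24 else 1,
        # Intolerance variable: first identification value 1, second 0.
        "feeding_intolerance": 1 if case["feeding_intolerant"] else 0,
    }


example = encode_case({
    "weight_kg": 1.8,
    "gestational_age_weeks": 32,
    "apgar_1min": 8,
    "first_feed_hours": 30,
    "feeding_intolerant": True,
})
```

Binary yes/no variables such as NRDS or PDA would be encoded in the same way as the intolerance variable.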
Specifically, the step of performing a correlation analysis on each set of feature variables and intolerant variables to obtain feature variables with high correlation, i.e., sample feature variables, includes:
S23, inputting each group of characteristic variables and the intolerance variables into statistical software, and performing Spearman correlation analysis to obtain a plurality of correlation coefficients ρi;
S24, obtaining the characteristic variables with ρi less than 0.05, and identifying them as the characteristic variables with a high degree of correlation, namely the sample characteristic variables.
It should be noted that a sample characteristic variable can be understood as a statistically significant factor, whose variation has a more pronounced influence than that of other factors.
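As a rough illustration of the coefficient computed in S23, the following standard-library sketch calculates the Spearman rank correlation between one encoded characteristic variable and the intolerance variable. The embodiment uses statistical software (SPSS) for this step; the helper names here are hypothetical, and the sketch computes only the coefficient. How the 0.05 cut-off of S24 is applied (for example, to the significance value that statistical software reports alongside the coefficient) is not implemented and is left as an open assumption.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Illustrative data only: a binary feature and the intolerance variable.
feature = [0, 0, 1, 1, 1, 0, 1, 0]
intolerant = [0, 0, 1, 1, 0, 0, 1, 0]
rho = spearman_rho(feature, intolerant)
```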
S3, calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables.
Specifically, the step of calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables, comprises the following steps:
S31, calculating the Shapley value of each group of sample characteristic variables, and sorting all the Shapley values from largest to smallest;
S32, acquiring the top n sample characteristic variables in the Shapley value ranking, and taking them as the input characteristic variables, wherein n is a positive integer.
In this embodiment, n = 13; in other embodiments, n may be selected according to the actual situation.
Referring to fig. 4, the feature importance ranking implementation method of the present invention calculates Shapley values of each feature through SHAP, and then ranks the importance of the sample features. FIG. 5 shows a summary of features based on SHAP, which can reveal not only the importance of the effect, but also the general direction of the effect, providing great assistance in selecting features.
It can be understood that in this step, the statistically significant factors in step S2 are imported into the R language, and Shapley values of each sample feature are calculated by SHAP, where the larger the Shapley values, the more significant the influence on the model, so as to measure the importance of each feature on the final prediction result.
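The ranking and top-n selection of S31 and S32 might be sketched as follows. The source states only that larger Shapley values indicate a stronger influence; using the mean absolute Shapley value over all samples as the per-feature importance is an assumption of this sketch (it is the usual convention in SHAP importance plots), and the function name and toy numbers are hypothetical.

```python
def top_n_features(shap_matrix, feature_names, n):
    """Rank features by mean absolute Shapley value and keep the first n.

    shap_matrix[i][k] is the Shapley value of feature k for sample i.
    """
    n_samples = len(shap_matrix)
    importance = {
        name: sum(abs(row[k]) for row in shap_matrix) / n_samples
        for k, name in enumerate(feature_names)
    }
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n]


# Toy values for illustration only, not results from the patent.
shap_matrix = [
    [0.1, -0.5, 0.0],
    [0.2, 0.4, -0.1],
]
selected = top_n_features(shap_matrix, ["weight", "nrds", "pda"], n=2)
```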
Referring to fig. 3, the main idea of SHAP comes from the problem of fair payoff allocation in cooperative game theory. For a machine learning model, the model produces a predicted value for each sample, and this prediction is the result jointly determined by the contribution value (Shapley value) of each feature. Let the i-th sample be $x_i$, the k-th feature of the i-th sample be $x_{ik}$, and the predicted value of the model to be explained on this sample be $y_i$. Then the SHAP interpretation obeys the following equation:
$$y_i = y_{base} + f(x_{i1}) + f(x_{i2}) + \cdots + f(x_{ik})$$
where $y_{base}$ is the baseline of the whole model output, i.e. the expected prediction of the original model over all training samples (the mean prediction over all samples), and $f(x_{ik})$ is the contribution (Shapley value) of the k-th feature of the i-th sample to the final predicted value $y_i$. When $f(x_{ik}) > 0$, the feature raises the predicted value and has a positive effect on the output; when $f(x_{ik}) < 0$, the feature lowers the predicted value and has a negative effect. Therefore, SHAP can reveal not only the importance of an effect but also its overall direction, characterizing the overall positive or negative relationship between a characteristic variable and feeding intolerance in premature infants.
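The additive property described above can be checked on a toy example by computing exact Shapley values by enumeration. This is a minimal sketch, not the SHAP library's algorithm: it uses a single reference point as the baseline instead of an average over the training samples, and `toy_model` and its coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x`.

    Features outside a coalition take their value from `background`,
    a single reference sample (a simplifying assumption of this sketch).
    """
    k = len(x)

    def value(subset):
        z = [x[j] if j in subset else background[j] for j in range(k)]
        return model(z)

    phi = []
    for j in range(k):
        others = [m for m in range(k) if m != j]
        total = 0.0
        for size in range(k):
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {j}) - value(set(s)))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical scoring function with one interaction term.
    return 0.2 + 0.3 * z[0] - 0.1 * z[1] + 0.05 * z[0] * z[2]


x = [1, 1, 1]
background = [0, 0, 0]
phi = shapley_values(toy_model, x, background)
y_base = toy_model(background)
# Additivity: the baseline plus all contributions recovers the prediction.
reconstructed = y_base + sum(phi)
```

In this toy case the second feature receives a negative contribution, which corresponds to a feature that lowers the predicted value.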
S4, dividing the input characteristic variables into a training set and a test set by 10-fold cross-validation, training a model on the training set based on the XGBoost algorithm, and constructing the XGBoost model function.
Referring to fig. 6, the "10-fold cross-validation method" is a common method for validating classifier performance. The original data set is divided into 10 parts on average, 9 different parts are selected as training sets each time, 1 part is left as a test set, then training, prediction and evaluation of the model are carried out, and the result of each time is recorded. The above procedure was repeated 10 times and the recorded 10 results were averaged as a final indicator of the quality of the assessment model. The 10-fold cross validation ensures that each sample is validated once, so that the influence of data set division can be reduced, and the stability and generalization capability of the model can be conveniently inspected.
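The partitioning described above can be sketched with the standard library alone. The function name, the shuffling seed and the sample count are illustrative assumptions; the sketch only produces index splits and leaves model training out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that each sample is tested once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(95, k=10))
tested = sorted(j for _, test in splits for j in test)
```

Each of the 10 iterations would train on `train`, evaluate on `test`, and record the result; the 10 recorded results are then averaged.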
It should be noted that XGBoost is a boosted tree model, which combines multiple CART regression tree models into a strong classifier. The idea is to keep adding trees, growing each tree by repeated feature splitting; each added tree in effect learns a new function that fits the residual of the previous prediction.
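Assuming a total of t trees, with F denoting the space of tree models, the predicted value $\hat{y}_i$ can be expressed as:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i), \quad f_k \in F$$
The objective function is:
$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{t} \Omega(f_k)$$
where l is a loss function representing the error between the predicted value and the actual value, and Ω is a regularization function that prevents model overfitting.
The regularization function in XGBoost is expressed as follows:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$
where T is the number of leaf nodes of a tree and $w_j$ is the weight of leaf j; γ and λ are added to restrain the growth of the tree and prevent model overfitting. λ is the L2 regularization coefficient and γ is the splitting threshold. From the objective function, the optimal scoring function is obtained in closed form; the smaller its value, the better the tree model:
$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
where $G_j$ and $H_j$ are the sums of the first-order and second-order gradient statistics of the loss over the samples in leaf j.
A tree model can be evaluated with this scoring function, but the number of candidate trees is effectively unlimited, so it is impossible to score them all. XGBoost solves this with a greedy algorithm: starting from the root node of the tree, it checks whether the objective function value decreases after a split compared with before the split. Suppose the node before splitting is j.
Its contribution to the objective function is:
$$Obj_j = -\frac{1}{2} \frac{G_j^2}{H_j + \lambda} + \gamma$$
After the node splits, the contribution of the two child nodes to the objective function is:
$$Obj_s = -\frac{1}{2} \left( \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} \right) + 2\gamma$$
The change in the objective function is then:
$$Obj_j - Obj_s = \frac{1}{2} \left( \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G_j^2}{H_j + \lambda} \right) - \gamma$$
Finally, with $G_j = G_L + G_R$ and $H_j = H_L + H_R$, the information gain of the objective function after each split is obtained:
$$Gain = \frac{1}{2} \left( \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right) - \gamma$$
where $G_L$ and $G_R$ are the sums of the first-order gradient statistics of the left and right leaf nodes at the split, and $H_L$ and $H_R$ are the sums of the second-order gradient statistics of the left and right leaf nodes.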
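The split evaluation can be illustrated numerically. The sketch below assumes the standard XGBoost second-order form, in which a leaf with first-order gradient sum G and second-order gradient sum H contributes -G²/(2(H+λ)) + γ to the objective; the function names and the numbers are hypothetical.

```python
def leaf_score(G, H, lam):
    """Objective contribution of one leaf, excluding the gamma penalty."""
    return -0.5 * G * G / (H + lam)


def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Reduction of the objective when one node is split into two leaves."""
    before = leaf_score(G_L + G_R, H_L + H_R, lam) + gamma      # one leaf
    after = (leaf_score(G_L, H_L, lam)
             + leaf_score(G_R, H_R, lam) + 2 * gamma)           # two leaves
    return before - after


gain = split_gain(G_L=-4.0, H_L=3.0, G_R=6.0, H_R=5.0, lam=1.0, gamma=0.5)
# The same quantity from the closed-form gain expression.
closed_form = 0.5 * (16 / 4 + 36 / 6 - 4 / 9) - 0.5
```

A split is worthwhile only when the gain is positive, which is why a larger γ suppresses tree growth.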
In addition, grid search is used to find the optimal parameters when training the model. Different parameter settings can greatly affect the prediction performance of the model; grid search builds a search space from the candidate parameter values and searches it exhaustively, so that the best-performing parameter combination is selected.
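An exhaustive search over a parameter grid can be sketched as follows. The source does not state which parameters or candidate values were searched, so the grid below is purely illustrative, and `toy_score` is a hypothetical stand-in for the cross-validated performance of a model trained with the given parameters.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical score surface peaking at max_depth=4, learning_rate=0.1.
    return -(params["max_depth"] - 4) ** 2 - (params["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_score)
```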
Referring to fig. 7, the present application further uses an ROC curve to verify the constructed model. The ROC curve takes the false positive rate (1 - specificity) as the horizontal axis and the true positive rate (sensitivity) as the vertical axis, and connects the points generated by different cut-off values; the area under the curve (AUC) reflects the accuracy of the diagnostic test. The value of this index ranges between 0.5 and 1. It is generally considered that an AUC of 0.5 to 0.7 indicates fair diagnostic accuracy, an AUC of 0.7 to 0.8 indicates moderate diagnostic accuracy, and an AUC > 0.8 indicates good diagnostic accuracy. As can be seen from fig. 7, the model constructed by the present invention has good diagnostic accuracy.
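For illustration, the AUC can be computed without drawing the curve, using its equivalence to the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name and the data are hypothetical, and both classes are assumed to be present.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```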
S5, presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and if the prediction probability is larger than the classification threshold, judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding.
Specifically, the step of presetting the classification threshold value includes:
S51, calculating the Youden index P of the prediction model on different data sets by using the Youden rule;
S52, determining the critical point by maximizing the Youden index, averaging the Youden index P over the 10 runs, and taking the average value as the best classification threshold Bestp of the model.
Further, the step of calculating the Youden index P of the prediction model on different data sets by using the Youden rule includes:
S511, calculating the sensitivity and specificity of the prediction model by using the binary confusion matrix;
S512, calculating the Youden index P as: P = sensitivity + specificity - 1.
The confusion matrix is a visual tool, particularly used for supervised learning, and is a standard format for representing precision evaluation, and is represented by a matrix form of n rows and n columns, as shown in table 2 below.
TABLE 2
(1) Sensitivity: the proportion of all positive examples that are correctly identified as positive, i.e. patients are judged to be patients; it measures how well missed diagnoses are avoided.
(2) Specificity: the proportion of all negative examples that are correctly identified as negative, i.e. healthy subjects are judged to be healthy; it measures how well misdiagnoses are avoided.
The Youden index P is calculated as sensitivity + specificity - 1, i.e. the combined ability of the diagnostic method under evaluation to identify true patients and true non-patients, minus the base value 1. The larger the value, the better the diagnostic method under evaluation.
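For illustration only, the definitions above can be computed from a two-class confusion matrix as in the following sketch. The matrix layout [[TN, FP], [FN, TP]], the function names and the example labels are assumptions made for this example and do not reproduce table 2.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels coded 0 (negative) and 1 (positive)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def youden_from_matrix(matrix):
    """Return (sensitivity, specificity, Youden index P) from the 2x2 matrix."""
    (tn, fp), (fn, tp) = matrix
    sensitivity = tp / (tp + fn)   # identified positives / all positives
    specificity = tn / (tn + fp)   # identified negatives / all negatives
    return sensitivity, specificity, sensitivity + specificity - 1


# Hypothetical outcome: 4 intolerant infants (3 detected), 6 tolerant infants (5 detected).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(youden_from_matrix(confusion_matrix(y_true, y_pred)))
```

A perfect classifier gives P = 1, while a classifier that is no better than chance gives P close to 0.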
The method for calculating the Bestp comprises the following steps:
it should be noted that, once the model function y = f(x) of XGBoost has been constructed, the model outputs a prediction probability $P_i$ for each sample. The data set provided by the invention has K features, and the features of each sample can be expressed as follows:
$X_i = (X_{i1}, X_{i2}, X_{i3}, \ldots, X_{iK})$
In the above formula, $X_i$ is the i-th sample and $X_{ik}$ is the k-th feature of the i-th sample; the prediction probability of each sample is $P_i = f(X_i)$. When $P_i$ is greater than the Bestp value, the sample is predicted to be feeding intolerant; otherwise it is predicted to be feeding tolerant, which yields the risk prediction of feeding intolerance in premature infants.
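The decision rule above can be sketched in a few lines. Note that `toy_model` below is a hypothetical logistic stand-in for the trained model function f, with made-up weights; it is not XGBoost and is used only so that the example is self-contained.

```python
import math


def toy_model(x):
    """Hypothetical stand-in for f: maps a feature vector to a probability."""
    weights = (0.8, -0.5, 1.2)  # illustrative values only
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def classify(sample, model, best_p):
    """Apply P_i = f(X_i); predict intolerance only when P_i is greater than Bestp."""
    p = model(sample)
    return p, ("intolerant" if p > best_p else "tolerant")


print(classify((1, 0, 1), toy_model, best_p=0.5))
print(classify((0, 2, 0), toy_model, best_p=0.5))
```

A probability exactly equal to the threshold is treated as tolerant here, following the "greater than" wording of step S5.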
In summary, the data are preprocessed by SHAP-based feature selection: the SHAP value of each feature is calculated, the features are ranked by importance, and the selected features are used for model training. This overcomes the curse of dimensionality and the noise caused by a large number of features, as well as the over-fitting caused by increased model complexity. The model is trained and evaluated with a 10-fold cross-validation method, which reduces the influence of data partitioning on the prediction result, so that the prediction accuracy is more reliable and the generalization ability is stronger. In addition, an XGBoost ensemble learning model is built, grid search is used for parameter tuning, and the optimal critical point is determined by the maximum Youden index principle, which further improves the reliability and prediction accuracy of the model and is of great significance for preventing feeding intolerance in premature infants.
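To make the feature-ranking idea concrete, the following sketch computes exact Shapley values for a small, made-up value function and keeps the n highest-ranked features. The value function, its weights and the function names are hypothetical; in practice the SHAP values would be derived from the trained model rather than from a hand-written function, and exact enumeration is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in totals.items()}


def top_n(shap, n):
    """Sort features by Shapley value from large to small and keep the first n."""
    return sorted(shap, key=shap.get, reverse=True)[:n]


def toy_value(coalition):
    """Hypothetical model score obtained from a subset of features."""
    weights = {"gestational_age": 0.30, "birth_weight": 0.20, "apnea": 0.10}
    score = sum(weights[f] for f in coalition)
    if {"gestational_age", "birth_weight"} <= coalition:
        score += 0.10  # illustrative interaction, shared equally between the two
    return score


shap = shapley_values(["gestational_age", "birth_weight", "apnea"], toy_value)
print(top_n(shap, 2))
```

The Shapley values sum to the score of the full feature set, which is the property that makes them usable as an additive importance measure.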
Referring to fig. 8, a second embodiment of the present invention provides a system for predicting risk of feeding intolerance of premature infants, comprising:
the characteristic variable acquisition module is used for acquiring gastrointestinal feeding intolerance information according to preset premature infant case information and acquiring a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
the sample characteristic variable acquisition module is used for respectively assigning a value to each premature infant case information according to whether gastrointestinal feeding is intolerant or not to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
the input characteristic variable acquisition module is used for calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables;
the model construction module is used for dividing the input characteristic variables into a training set and a testing set by a 10-fold cross validation method, training a model on the training set based on the XGBoost algorithm, and constructing the model function of the XGBoost;
the prediction module is used for presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding if the prediction probability is larger than the classification threshold.
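By way of example, the 10-fold division used by the model construction module can be sketched as below. The function name `k_fold_indices` and the sample count are hypothetical, and the split shown is a plain shuffled split, not stratified by outcome, which is an assumption since the application does not specify it.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Each sample is used for testing exactly once and for training k - 1 times.
for fold_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
    print(fold_no, len(train), len(test))
```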
In this example, the preset premature infant case information is the case information of premature infants hospitalized in the Neonatal Intensive Care Unit (NICU), retrieved from the hospital electronic medical record system; infants with gastrointestinal feeding intolerance (FI) and infants who tolerate gastrointestinal feeding are both selected.
In particular, the plurality of sets of characteristic variables in the gastrointestinal feeding intolerance information include:
(1) general condition of the infant (sex, gestational age, birth weight, gestational time, Apgar score one minute after birth, whether conceived by in vitro fertilization, whether there is a history of resuscitation after birth, body temperature);
(2) maternal condition (pregnancy complications, amniotic fluid abnormality, placental abnormality, umbilical cord abnormality, fetal membrane abnormality, mode of delivery, assisted reproduction, multiple pregnancy, fetal position, maternal age);
(3) diseases of the infant after birth (neonatal asphyxia, neonatal respiratory distress syndrome, neonatal hypoxic-ischemic encephalopathy, neonatal infection, neonatal hyperbilirubinemia, patent ductus arteriosus);
(4) drug use (antibiotics, probiotics, pulmonary surfactant (PS), caffeine);
(5) others (time of first feeding, second stool interval time, ventilator use, blood transfusion, apnea).
In the sample characteristic variable acquisition module, the case data acquired by the characteristic variable acquisition module are entered with EpiData 3.1 using double data entry and automatic logical error checking, and are then imported into the SPSS 26.0 statistical software; all variables are coded starting from 0, and categorical variables are converted into numerical variables.
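The coding step can be illustrated outside the statistical software with the short sketch below, which maps each category of a variable to an integer starting from 0. The function name `encode_from_zero` and the category labels are hypothetical and serve only as an example.

```python
def encode_from_zero(values, order=None):
    """Map categories to 0, 1, 2, ... and return (codes, mapping).

    `order` fixes which category receives 0; by default categories are sorted.
    """
    categories = list(order) if order is not None else sorted(set(values))
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


# Hypothetical variables: mode of delivery and a yes/no resuscitation history.
codes, mapping = encode_from_zero(["vaginal", "cesarean", "vaginal"])
print(codes, mapping)
print(encode_from_zero(["yes", "no", "yes"], order=["no", "yes"])[0])
```

Passing an explicit `order` keeps the coding stable across data sets, for example so that "no" is always 0 and "yes" is always 1.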
Specifically, the step of presetting the classification threshold includes:
calculating the Youden index P of the prediction model under different data sets using Youden's rule, wherein the sensitivity and specificity of the prediction model are calculated from a binary-classification confusion matrix and the Youden index is calculated as: P = sensitivity + specificity - 1;
determining the critical point by maximizing the Youden index, and taking the average value of the Youden index P as the optimal classification threshold Bestp of the model.
It should be noted that, once the model function y = f(x) of XGBoost has been constructed, the model outputs a prediction probability $P_i$ for each sample. The data set provided by the invention has K features, and the features of each sample can be expressed as follows:
$X_i = (X_{i1}, X_{i2}, X_{i3}, \ldots, X_{iK})$
In the above formula, $X_i$ is the i-th sample and $X_{ik}$ is the k-th feature of the i-th sample; the prediction probability of each sample is $P_i = f(X_i)$. When $P_i$ is greater than the Bestp value, the sample is predicted to be feeding intolerant; otherwise it is predicted to be feeding tolerant, which yields the risk prediction of feeding intolerance in premature infants.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (8)

1. A method of predicting risk of feeding intolerance in premature infants, comprising the steps of:
obtaining gastrointestinal feeding intolerance information according to preset premature infant case information, and obtaining a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
according to whether gastrointestinal feeding is intolerant or not, assigning a value to each premature infant case information to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
calculating the Shapley value of each group of sample characteristic variables, and selecting sample characteristic variables with large Shapley values to obtain input characteristic variables;
dividing the input characteristic variables into a training set and a testing set by a 10-fold cross validation method, training a model on the training set based on the XGBoost algorithm, and constructing the model function of the XGBoost;
presetting a classification threshold, calculating the prediction probability of a sample to be predicted by using a model function, and if the prediction probability is larger than the classification threshold, judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding.
2. The method of claim 1, wherein the characteristic variables comprise the following:
body weight, gestational age, 1-minute Apgar score, resuscitation history, neonatal asphyxia, NRDS, infection, PDA, PS use, probiotic use, blood transfusion, apnea, hyperthermia, abnormal interval between bowel movements, time of first feeding and mechanical ventilation.
3. The method of claim 1, wherein assigning each premature case information based on whether gastrointestinal feeding is intolerant comprises:
if the premature infant is intolerant to feeding, assigning a first identification value;
otherwise, the second identification value is assigned.
4. A method of predicting risk of feeding intolerance in premature infants according to claim 3 wherein the step of performing a correlation analysis on each set of characteristic variables and intolerance variables to obtain a characteristic variable of high correlation, i.e. a sample characteristic variable, comprises:
inputting each group of characteristic variables and the intolerance variables into statistical software, and performing Spearman correlation analysis to obtain a plurality of correlation coefficients ρi;
and acquiring a characteristic variable of ρi <0.05, and identifying the characteristic variable as a characteristic variable with high correlation, namely a sample characteristic variable.
5. The method of claim 1, wherein the step of calculating Shapley values for each set of sample characteristic variables, selecting sample characteristic variables having large Shapley values, and obtaining input characteristic variables comprises:
calculating the Shapley values of each group of sample characteristic variables, and sorting all the Shapley values in descending order;
and acquiring the n top-ranked sample characteristic variables by Shapley value, and taking them as the input characteristic variables, wherein n is a positive integer.
6. The method of claim 1, wherein the step of pre-setting a classification threshold comprises:
calculating the Youden index P of the prediction model under different data sets using Youden's rule;
determining the critical point by maximizing the Youden index, and taking the average value of the Youden index P as the optimal classification threshold Bestp of the model.
7. The method of claim 6, wherein the step of calculating the Youden index P of the prediction model under different data sets using Youden's rule comprises:
calculating the sensitivity and specificity of the prediction model from a binary-classification confusion matrix;
calculating the Youden index P as: P = sensitivity + specificity - 1.
8. A system for predicting risk of feeding intolerance in premature infants, comprising:
the characteristic variable acquisition module is used for acquiring gastrointestinal feeding intolerance information according to preset premature infant case information and acquiring a plurality of groups of characteristic variables in the gastrointestinal feeding intolerance information;
the sample characteristic variable acquisition module is used for respectively assigning a value to each premature infant case information according to whether gastrointestinal feeding is intolerant or not to obtain intolerant variables, and performing correlation analysis on each group of characteristic variables and intolerant variables to obtain characteristic variables with high correlation degree, namely sample characteristic variables;
the input characteristic variable acquisition module is used for calculating the Shapley value of each group of sample characteristic variables, and selecting the sample characteristic variables with large Shapley values to obtain the input characteristic variables;
the model construction module is used for dividing the input characteristic variables into a training set and a testing set by a 10-fold cross validation method, training a model on the training set based on the XGBoost algorithm, and constructing the model function of the XGBoost;
the prediction module is used for presetting a classification threshold, calculating the prediction probability of the sample to be predicted by using a model function, and judging that the premature infant corresponding to the sample to be predicted is intolerant to gastrointestinal feeding if the prediction probability is larger than the classification threshold.
CN202310123452.8A 2023-02-07 2023-02-07 Premature infant feeding intolerance risk prediction method and system Pending CN117409963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310123452.8A CN117409963A (en) 2023-02-07 2023-02-07 Premature infant feeding intolerance risk prediction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310123452.8A CN117409963A (en) 2023-02-07 2023-02-07 Premature infant feeding intolerance risk prediction method and system

Publications (1)

Publication Number Publication Date
CN117409963A true CN117409963A (en) 2024-01-16

Family

ID=89485849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310123452.8A Pending CN117409963A (en) 2023-02-07 2023-02-07 Premature infant feeding intolerance risk prediction method and system

Country Status (1)

Country Link
CN (1) CN117409963A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831080A (en) * 2024-03-04 2024-04-05 正大农业科学研究有限公司 Pig growth condition prediction device based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination