CN115312196A - Novel model construction evaluation method for screening pressure injury risk factors and application thereof

Info

Publication number
CN115312196A
Authority
CN
China
Prior art keywords
model
random forest
prediction
evaluation method
logistic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210122514.9A
Other languages
Chinese (zh)
Inventor
徐洁
孙彩霞
邓晓芳
潘晓云
陈瑜
许玲玲
舒美春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
First Affiliated Hospital of Wenzhou Medical University
Original Assignee
First Affiliated Hospital of Wenzhou Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by First Affiliated Hospital of Wenzhou Medical University
Priority to CN202210122514.9A
Publication of CN115312196A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, for simulation or modelling of medical disorders

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a novel model construction evaluation method for screening pressure injury risk factors and application thereof. The method comprises: 1) collecting basic patient information; 2) dividing the data into sets; 3) constructing models; 4) comparative testing; 5) evaluation and comparison. By comparing the prediction performance of the Logistic regression, decision tree and random forest prediction models, the invention determines the model with the best prediction performance and provides a basis for selecting a suitable model for the later development of pressure injury prediction software, so as to improve the accuracy and convenience of clinically predicting pressure injury in adult patients, reduce the incidence of pressure injury in adult patients, and relieve patients' pain and medical burden.

Description

Novel model construction evaluation method for screening pressure injury risk factors and application thereof
Technical Field
The invention relates to the technical field of medical treatment, in particular to a novel model construction evaluation method for screening pressure injury risk factors and application thereof.
Background
Pressure injury (PI), formerly known as bedsore, pressure sore or pressure ulcer (PU), was formally renamed "pressure injury" by the US National Pressure Ulcer Advisory Panel (NPUAP) in 2016. It refers to localized injury to the skin and/or subcutaneous tissue, usually over a bony prominence or at a skin site in contact with a medical device, and may present as intact skin or as an open ulcer, possibly with pain. Pressure injury is recognized as a serious hospital adverse event. It is associated with many adverse effects, such as infection, systemic amyloidosis caused by chronic inflammation, and rhabdomyolysis caused by chronic pressure and ischemia, and it also reduces the patient's quality of life, prolongs hospitalization, and increases costs to the medical system.
Hospital-acquired pressure injury (HAPI) refers to pressure injury acquired in a hospital. Pressure injury has always been a major health problem for medical institutions worldwide. The global prevalence of pressure injury is 12.8%, and the prevalence of hospital-acquired pressure injury is 8.4%, accounting for 62% of all pressure injuries. The prevalence of hospital-acquired pressure injury is 12.6%-14.5% in North America, 12.6%-16.5% in Europe, 3%-50% in Australia and 2.1%-31.3% in Asia. In China, the prevalence of pressure injury among inpatients is 1.67%, the prevalence of hospital-acquired pressure injury is 0.68%, and a further rate of 1.26% is reported. The occurrence of pressure injury not only increases the patient's pain, reduces quality of life and prolongs hospitalization, but also adds to the workload of nursing staff, raises medical expenses and increases the burden on the health system. British studies have shown that the cost of treatment rises with the severity of the pressure injury, from approximately 1214 pounds for stage 1 to 14108 pounds for stage 4. The cost of treating hospital-acquired pressure injuries in the United States was about $11 billion a decade ago and is now estimated at about $26.8 billion, with stage 3 and stage 4 patients accounting for 56% of the total cost; the cost of treating hospital-acquired pressure injury is about $10708 per patient.
Intensive care unit (ICU) patients are often bedridden for long periods and sedated because of critical illness, have poor nutritional status, and may also have diabetes, infection, cardiovascular disease and other comorbidities, so their incidence of pressure injury is far higher than that of ordinary inpatients in other departments. Chaboyer et al. performed a meta-analysis of the incidence and prevalence of pressure injury in ICU patients, which showed an incidence ranging from 10.0% to 25.9% and a prevalence ranging from 16.9% to 23.8%. A secondary data analysis by Coyer et al. showed that, excluding stage 1 pressure injury, from 2012 to 2014 the prevalence of hospital-acquired pressure injury was 11% in intensive care unit patients and 3% in non-intensive care unit patients, and that intensive care unit patients were 3.8 times as likely as non-intensive care unit patients to develop pressure injury during hospitalization. A multi-center investigation by Jiangchenxia et al. showed that the incidence of pressure injury was 1.4% in medical wards, 0.8% in surgical wards and 3.3% in geriatric wards, and that the incidence in intensive care unit patients (11.9%) was the highest of all departments. Assessing and predicting a patient's risk of pressure injury and intervening early can prevent pressure injury, thereby reducing harm to the patient, lowering hospitalization costs and lightening the cost burden on the medical system, because preventing pressure injury costs less than treating it.
Therefore, the use of a pressure injury prediction tool is of critical importance.
At present, the Braden scale, developed by Dr. Braden and Dr. Bergstrom in the United States in 1987, is widely used for intensive care unit patients at home and abroad. However, researchers have pointed out that the Braden scale lacks items for evaluating important risk factors specific to ICU patients, such as the use of sedatives and analgesics and the use of medical devices, so that its predictive power for ICU patients is insufficient, its accuracy is low, and it is not well suited to ICU patients. Researchers have also modified the Braden scale, combined it with other scales, or developed dedicated ICU pressure injury risk assessment scales, but in China these studies have been carried out only on small samples, their reliability and validity need further verification in large-sample tests, and combined use with other scales is cumbersome. With the progress of science and technology, the world has entered the era of big data, and artificial intelligence and data mining technologies are widely applied in the medical field, for example in prevention, health care, diagnosis and treatment, imaging and surgical robots. Machine learning, an important branch of artificial intelligence, is widely applied in the medical field, for example in image recognition, genetics and genomics, intelligent diagnosis and treatment, and prognosis prediction. Models constructed with machine learning techniques show strong predictive performance in medical prognosis prediction.
Google Brain constructed a deep convolutional neural network model from 128175 retrospectively collected retinal images and then used the model to classify new images; the results showed a sensitivity of 90.3% and a specificity of 98.1% in identifying diabetic retinopathy, with an area under the ROC curve of 0.991. Kim et al. used a machine learning model to distinguish interstitial pneumonia from other interstitial lung diseases common in surgical lung biopsy samples; in the training set of 48 samples the model had a specificity of 92% and a sensitivity of 82%, and in the test set of 36 samples it had a specificity of 95% and a sensitivity of 59%. Data mining and machine learning have become one of the directions of modern medical development and play a major role in improving the accuracy of disease diagnosis and the management of nursing adverse events.
Identifying the risk factors of pressure injury and carrying out early risk assessment are very important for preventing pressure injury. Summarizing the research progress on factors related to pressure injury in intensive care unit patients helps clinical medical staff further identify the factors that influence pressure injury in these patients, and provides a basis for clinical nursing practice and for research on constructing pressure injury prediction models.
Disclosure of Invention
The invention provides a novel model construction evaluation method for screening pressure injury risk factors and application thereof.
The scheme of the invention is as follows:
A novel model construction evaluation method for screening pressure injury risk factors comprises the following steps:
1) Data collection: basic patient information, comprising independent variable information and dependent variable information, is collected, and the basic information of a plurality of patients is entered through an input device into EpiData at the processing end to serve as the data set;
2) Data division: the processing end uses software to randomly divide the data set into a training set and a test set, wherein the training set accounts for 70% of the data set and the remaining 30% forms the test set (also called the verification set);
3) Model construction: the training set is used to construct the models, and three construction modules at the processing end respectively construct a Logistic regression model, a decision tree model and a random forest model from the training set;
4) Comparative testing: the test set is used to test and compare the performance of the constructed Logistic regression model, decision tree model and random forest model;
5) Evaluation and comparison: each model is evaluated on the test set to obtain its evaluation indexes, and the evaluation indexes of the models are then compared to identify the best-performing model on each index.
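The random 70%/30% division in step 2) can be illustrated with a minimal Python sketch. The patent does not name the software used for this step; the function name `split_train_test`, the fixed seed and the integer patient IDs are assumptions made only for illustration.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=42):
    """Randomly split records into a training set and a test set."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set: 100 patient record IDs.
patients = list(range(100))
train_set, test_set = split_train_test(patients)
print(len(train_set), len(test_set))
```

Fixing the seed is a design choice that makes the comparison of the three models repeatable on the same split.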
The models in step 5) are evaluated by cross-validation. In cross-validation, most of the samples in a data set are used to construct the model, the constructed model is then used to predict the remaining small portion of samples, the prediction error on those samples is computed, and the sum of squared prediction errors is recorded. Cross-validation guards against over-fitting and under-fitting and yields a reliable and stable model. A common cross-validation method is K-fold cross-validation (K-Fold Cross Validation, K-CV). In K-fold cross-validation, the initial data set is randomly divided into K sub-data sets; one sub-data set is held out as the test data, and the remaining K-1 sub-data sets are used to train and construct the model. The procedure is repeated K times so that each sub-data set is tested once, and finally the average evaluation index of the K models is calculated. The advantage of this method is that every randomly generated sub-data set is used for both training and testing, and each sample is used for testing exactly once. K is commonly taken as five or ten, with ten-fold cross-validation being the most commonly used.
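The K-fold procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: `k_fold_indices`, `cross_validate` and the `evaluate` callback (which would train a model on the training indices and return an evaluation index on the test indices) are hypothetical names.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the K folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # K roughly equal sub-data sets
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_validate(n_samples, evaluate, k=10):
    """Average the evaluation index returned by evaluate(train, test) over K folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Placeholder evaluate callback: returns the share of samples held out in each fold.
mean_share = cross_validate(50, lambda train, test: len(test) / (len(train) + len(test)))
```

Each sample index appears in exactly one test fold, which matches the statement that each sub-data set is tested once.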
The evaluation indexes of the models in step 5) are obtained directly from a confusion matrix. In a binary classification problem, the confusion matrix has the following 4 outcomes: true positive (TP), true negative (TN), false positive (FP) and false negative (FN). True positive means that the true category is positive and the prediction is also positive; true negative means that the true category is negative and the prediction is also negative; false positive means that the true category is negative but the prediction is positive; false negative means that the true category is positive but the prediction is negative;
the confusion matrix in the binary classification problem is shown in the following table (Table 1):

Table 1: Confusion matrix for the binary classification problem

                    Predicted positive     Predicted negative
Actual positive     True positive (TP)     False negative (FN)
Actual negative     False positive (FP)    True negative (TN)

The values of the indexes can be calculated from Table 1 according to the corresponding formulas.
Accuracy is the most common metric in classification problems and is the ratio of the number of correctly predicted samples to the total number of samples. Accuracy lies between 0 and 1; in general, the higher the accuracy, the better the classification model. The calculation formula is as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy is suitable for multi-class models, but it is a poor measure on imbalanced data sets. Precision refers to the proportion of samples predicted to be positive that are truly positive. The higher the precision, the fewer truly negative samples there are among the samples judged positive. The calculation formula is as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall, i.e., sensitivity, refers to the proportion of truly positive samples that are also predicted to be positive. The higher the recall, the smaller the proportion of positive samples that are misclassified. The calculation formula is as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Ideally both precision and recall are high, but the two metrics tend to trade off against each other, so a compromise metric, the F-measure (F-score), is used. The F-measure is the harmonic mean of precision and recall: the higher the F-measure, the better the classification of positive samples, the better the precision and recall, and the better the performance of the algorithm. Precision and recall are also applicable to imbalanced data sets, but not to multi-class models. The F-measure lies between 0 and 1, and the higher its value, the more effective the classification model. The calculation formula is as follows:
$$F = \frac{2PR}{P + R}$$
where P is the precision and R is the recall.
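As a worked illustration of these indexes, the sketch below counts TP, TN, FP and FN from a pair of label lists and derives the indexes from them, assuming the standard definitions in terms of the confusion matrix. The function names and the example labels are hypothetical, and specificity (TN / (TN + FP)) is included because it is listed among the evaluation indexes of step 5).

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels coded 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(tp, tn, fp, fn):
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,                      # same quantity as sensitivity
        "specificity": safe_ratio(tn, tn + fp),
        "f_measure": safe_ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical true labels and model predictions for 8 patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

In this example there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so every index evaluates to 0.75.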
The receiver operating characteristic curve (ROC curve for short), also called the sensitivity curve, is used to analyze and evaluate the discriminating effect of a classifier and is a comprehensive index reflecting sensitivity and specificity. Multiple pairs of sensitivity and false positive rate (1 - specificity) are obtained by moving the decision point (critical value), and the curve is drawn with sensitivity on the vertical axis and (1 - specificity) on the horizontal axis. The point on the ROC curve closest to the upper left corner of the plot is the critical value at which sensitivity and specificity are both high.
AUC (Area Under the Curve), namely the area under the ROC curve, is an index for evaluating the quality of a model: the larger the area under the curve, the higher the accuracy. The AUC value lies between 0 and 1, and it is generally considered that the closer the AUC is to 1, the higher the sensitivity and specificity of the model, the better its discriminating ability, and the higher its prediction performance and accuracy. Evaluation criteria: AUC < 0.5, the model has no discriminatory ability; AUC between 0.5 and 0.7, the prediction accuracy of the model is low; AUC between 0.7 and 0.9, the prediction accuracy of the model is moderate; AUC above 0.9, the prediction accuracy of the model is high.
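One way to compute the AUC without drawing the curve is the rank-based formulation: the AUC equals the probability that a randomly chosen positive sample receives a higher predicted score than a randomly chosen negative sample, with ties counted as one half. The sketch below uses this formulation; it is an illustration and not necessarily how the patent's software computes the AUC, and the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and predicted probabilities of pressure injury.
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, predicted)
print(auc)
```

Here three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.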
As a preferred technical scheme, the dependent variable information is Y, where Y is whether pressure injury occurs; the dependent variable takes the values 0 and 1, with 0 for no and 1 for yes. The independent variable information comprises m independent variables X, where m is greater than or equal to 1. The independent variables include gender, history of diabetes, history of hypertension, history of stroke, state of consciousness, mechanical ventilation, sedatives, analgesics, vasopressors, age, Braden scale score, days of ICU hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid and serum albumin, of which age, Braden scale score, days of ICU hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid and serum albumin are continuous independent variables. The categorical independent variables are coded as follows: gender, male = 0, female = 1; history of diabetes, no = 0, yes = 1; history of hypertension, no = 0, yes = 1; history of stroke, no = 0, yes = 1; state of consciousness, no = 0, yes = 1; mechanical ventilation, no = 0, yes = 1; sedatives, no = 0, yes = 1; analgesics, no = 0, yes = 1; vasopressors, no = 0, yes = 1.
As a preferred technical scheme, the construction module constructs the Logistic regression model as follows: software is used to perform multivariate Logistic regression analysis on the independent variables found to be statistically significant in univariate analysis; the variables that are statistically significant in the multivariate Logistic regression analysis are then entered into the Logistic regression equation to construct the Logistic regression model; and finally the model is visualized with a nomogram;
the m independent variables influencing the value of Y are X1, X2, X3, ..., Xm, and the conditional probability of Y = 1 under the action of the m independent variables is P = P(Y = 1 | X1, X2, X3, ..., Xm); the Logistic regression equation is then:
$$\ln\frac{P}{1 - P} = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_m X_m$$
where ln is the natural logarithm, P is the probability, θ0 is a constant term, and θ1, θ2, ..., θm are the regression coefficients.
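Assuming the usual form of the Logistic regression equation, in which the log-odds ln(P / (1 - P)) equal the constant term plus the weighted sum of the independent variables, the predicted probability is recovered by inverting the log-odds. The sketch below shows this inversion; the coefficients and variable values are hypothetical and are not taken from the patent's fitted model.

```python
import math

def predict_probability(theta0, thetas, xs):
    """Return P(Y = 1 | X) from a constant term, regression coefficients and variable values."""
    logit = theta0 + sum(theta * x for theta, x in zip(thetas, xs))
    return 1.0 / (1.0 + math.exp(-logit))   # inverse of ln(P / (1 - P))

# Hypothetical model with two independent variables:
# X1 = days of ICU hospitalization (continuous), X2 = mechanical ventilation (0 = no, 1 = yes).
theta0 = -2.0
thetas = [0.1, 1.0]
p = predict_probability(theta0, thetas, [10, 1])   # logit = -2.0 + 1.0 + 1.0 = 0.0
print(p)
```

With these illustrative values the log-odds are zero, so the predicted probability of pressure injury is 0.5.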
As a preferred technical scheme, the construction module constructs the decision tree model as follows: all independent variables are taken as input and whether pressure injury occurs is taken as output; an initial decision tree model is constructed on the training set using the rpart function of the rpart package; and the initial model is then pruned using the prune function to obtain a binary decision tree model;
constructing the decision tree mainly comprises tree building and pruning. Tree building refers to learning rules from a data set through an algorithm so as to build the model, and the decision tree model adopts the CART algorithm; pruning refers to removing excessive or unreliable branches from the body of the tree, thereby improving the prediction performance of the decision tree.
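The tree-building step can be illustrated by how a single split is chosen. The sketch below assumes the Gini impurity criterion that CART commonly uses for classification; the patent does not state the splitting criterion, pruning is not shown, and the function names and example data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one continuous variable."""
    best_threshold, best_impurity = None, float("inf")
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2            # candidate midway between neighbouring values
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity

# Hypothetical data: days of ICU hospitalization and whether pressure injury occurred.
icu_days = [2, 3, 5, 8, 12, 15]
injury = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(icu_days, injury)
print(threshold, impurity)
```

A full tree would repeat this search over all independent variables at every node; in this toy data set the split at 6.5 days separates the two classes completely.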
As a preferred technical scheme, the construction module constructs the random forest model as follows: all independent variables are taken as input and whether pressure injury occurs is taken as output; an initial random forest model is constructed on the training set using the randomForest function of the randomForest package; the number of trees and the number of variables selected at which the random forest model has the best prediction performance are obtained through parameter tuning; and the importance ranking of the variables of the optimal random forest model is then obtained using the varImpPlot function;
the random forest model is an ensemble algorithm comprising a plurality of decision trees. Ensemble learning improves the prediction performance of a model by combining a plurality of classifiers. Bagging and Boosting are the two most common ensemble learning methods; they differ in that Bagging trains its base learners in parallel whereas Boosting trains them sequentially, and the random forest algorithm is the most representative Bagging algorithm. The randomness in a random forest is mainly reflected in the random selection of data and the random selection of features. Random selection of data means that data subsets are constructed by sampling with replacement from the training data set and each subset is used to construct a sub-decision tree, each of which produces a result. The k classification trees grown from the bootstrap sample data sets form a combination (the forest); the classification result of the algorithm is decided by a vote of the decision trees in the combination, and the category with the most votes is the prediction of the random forest algorithm. In essence the method improves on the decision tree algorithm by combining a plurality of decision trees, with the growth of each tree depending on an independent, randomly selected sample.
Features are the variables in machine learning. Random selection of features means that each split of a sub-decision tree in the random forest does not use all independent variables; instead, a number of independent variables are randomly selected from all the independent variables, and the best of these randomly selected variables is chosen to construct the sub-decision tree. The sub-decision trees in the random forest therefore differ from one another, which increases the diversity of the model and improves its prediction performance.
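The two sources of randomness and the final vote can be sketched as three small helpers. This illustrates the mechanism only and does not reproduce the randomForest package; the function names, the choice of four candidate variables and the example votes are assumptions.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Random selection of data: draw len(rows) samples with replacement."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(feature_names, n_features, rng):
    """Random selection of features: pick n_features distinct candidate variables."""
    return rng.sample(feature_names, n_features)

def majority_vote(tree_predictions):
    """The category with the most votes is the forest's prediction."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(20))                       # hypothetical training row indices
features = ["age", "braden_score", "icu_days", "hemoglobin",
            "urea_nitrogen", "creatinine", "lactic_acid", "serum_albumin"]
sample = bootstrap_sample(rows, rng)
candidates = random_feature_subset(features, 4, rng)
prediction = majority_vote([1, 0, 1, 1, 0])  # votes from five hypothetical trees
print(prediction)
```

Because sampling is with replacement, some rows appear more than once in a bootstrap sample and others are left out, which is what makes the individual trees differ.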
As a preferred technical solution, the evaluation indexes in step 5) include accuracy, precision, recall, F-measure, sensitivity, specificity, positive predictive value, negative predictive value and area under the ROC curve.
As a preferred technical solution, the patients in step 1) must meet the inclusion criteria and must not meet any exclusion criterion.
As a preferred technical solution, the inclusion criteria are: age of 18 years or older; ICU stay of more than 24 hours. The exclusion criteria are: incomplete data; burn patients or patients whose skin condition cannot be clearly assessed; patients for whom turning over is restricted.
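The inclusion and exclusion criteria amount to a simple filter over patient records. The sketch below is one possible encoding; the dictionary keys (`age`, `icu_hours`, `incomplete_data`, `skin_unassessable`, `turning_restricted`) are hypothetical field names, not the names used in the patent's EpiData database.

```python
def is_eligible(patient):
    """Return True if the patient meets the inclusion criteria and no exclusion criterion."""
    meets_inclusion = patient["age"] >= 18 and patient["icu_hours"] > 24
    excluded = (
        patient["incomplete_data"]          # incomplete data
        or patient["skin_unassessable"]     # burns or skin condition that cannot be assessed
        or patient["turning_restricted"]    # turning over is restricted
    )
    return meets_inclusion and not excluded

# Hypothetical patient records.
candidates = [
    {"age": 65, "icu_hours": 72, "incomplete_data": False,
     "skin_unassessable": False, "turning_restricted": False},
    {"age": 17, "icu_hours": 48, "incomplete_data": False,
     "skin_unassessable": False, "turning_restricted": False},
    {"age": 50, "icu_hours": 30, "incomplete_data": False,
     "skin_unassessable": False, "turning_restricted": True},
]
eligible = [p for p in candidates if is_eligible(p)]
print(len(eligible))
```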
The invention also discloses a method for improving the evaluation of pressure injury risk factors by using the novel model construction evaluation method for screening pressure injury risk factors.
The invention also discloses application of the novel model construction evaluation method for screening pressure injury risk factors in the development of pressure injury prediction software.
By adopting the above technical scheme, the novel model construction evaluation method for screening pressure injury risk factors and the application thereof comprise the following steps: 1) data collection: basic patient information, comprising independent variable information and dependent variable information, is collected, and the basic information of a plurality of patients is entered through an input device into EpiData at the processing end to serve as the data set; 2) data division: the processing end uses software to randomly divide the data set into a training set and a test set, wherein the training set accounts for 70% of the data set and the remaining 30% forms the test set (also called the verification set); 3) model construction: the training set is used to construct the models, and three construction modules at the processing end respectively construct a Logistic regression model, a decision tree model and a random forest model from the training set; 4) comparative testing: the test set is used to test and compare the performance of the constructed Logistic regression model, decision tree model and random forest model; 5) evaluation and comparison: each model is evaluated on the test set to obtain its evaluation indexes, and the evaluation indexes of the models are then compared to identify the best-performing model on each index.
The invention has the beneficial effects that:
By comparing the prediction performance of the Logistic regression, decision tree and random forest prediction models, the method determines the prediction model with the best prediction performance and provides a basis for selecting a suitable model for the development of intensive care unit pressure injury prediction software, thereby improving the accuracy and convenience of clinically predicting pressure injury in adult intensive care unit patients, reducing the incidence of pressure injury in these patients, and relieving patients' pain and medical burden;
the method confirms that the number of days of ICU hospitalization is an important predictor of pressure injury in intensive care unit patients, and that the random forest model can be selected as the prediction model for developing intensive care unit pressure injury prediction software.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1: Braden scale scores of patients on admission in Example 2 of the invention;
FIG. 2: nomogram of the Logistic regression model in Example 2 of the invention;
FIG. 3: decision tree model diagram in Example 2 of the invention;
FIG. 4: plot of the number of trees in the random forest in Example 2 of the invention;
FIG. 5: independent variable importance ranking chart of the random forest model in Example 2 of the invention;
FIG. 6: ROC curve of the Logistic regression model in Example 2 of the invention;
FIG. 7: ROC curve of the decision tree model in Example 2 of the invention;
FIG. 8: ROC curve of the random forest model in Example 2 of the invention;
FIG. 9: comparison of the ROC curves of the Logistic regression, decision tree and random forest models in Example 2 of the invention;
FIG. 10: flow chart of model construction according to the invention.
Detailed Description
In order to make up for the above deficiencies, the invention provides a novel model construction evaluation method for screening pressure injury risk factors and application thereof to solve the problems in the background art.
In the present invention:
APACHE is Acute Physiology and Chronic Health Evaluation;
DT is decision tree;
FP is false positive;
FN is false negative;
HAPI is hospital-acquired pressure injury;
ICU is intensive care unit;
PI is pressure injury;
PU is pressure ulcer;
RF is random forest;
TP is true positive;
TN is true negative.
A novel model construction evaluation method for screening pressure injury risk factors comprises the following steps:
1) Data collection: basic patient information, comprising independent variable information and dependent variable information, is collected, and the basic information of a plurality of patients is entered through an input device into EpiData at the processing end to serve as the data set;
2) Data division: the processing end uses software to randomly divide the data set into a training set and a test set, wherein the training set accounts for 70% of the data set and the remaining 30% forms the test set (also called the verification set);
3) Model construction: the training set is used to construct the models, and three construction modules at the processing end respectively construct a Logistic regression model, a decision tree model and a random forest model from the training set;
4) Comparative testing: the test set is used to test and compare the performance of the constructed Logistic regression model, decision tree model and random forest model;
5) Evaluation and comparison: each model is evaluated on the test set to obtain its evaluation indexes, and the evaluation indexes of the models are then compared to identify the best-performing model on each index.
The dependent variable information is Y, where Y is whether pressure injury occurs; the dependent variable takes the values 0 and 1, with 0 for no and 1 for yes. The independent variable information comprises m independent variables X, where m is greater than or equal to 1. The independent variables include gender, history of diabetes, history of hypertension, history of stroke, state of consciousness, mechanical ventilation, sedatives, analgesics, vasopressors, age, Braden scale score, days of ICU hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid and serum albumin, of which age, Braden scale score, days of ICU hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid and serum albumin are continuous independent variables.
The construction module is used for constructing the Logistic regression model, namely, software is used for carrying out multi-factor Logistic regression analysis on independent variables with statistical significance through single-factor analysis, then variables with statistical significance of multi-factor Logistic regression analysis results are brought into a Logistic regression equation to construct the Logistic regression model, and finally, a nomogram is used for visualizing the model.
The construction module constructs a decision tree model, namely all independent variables are used as input, whether pressure damage occurs or not is used as output, an rpar function of an rpart program package is used for constructing a decision tree initial model in a training set, and then a prune function is used for pruning the initial model, so that a binary decision tree model is obtained.
The construction module constructs a random forest model by taking all independent variables as input and outputting whether pressure damage occurs or not, constructing a random forest initial model in a training set by using a randomForest function of a randomForest program package, obtaining the forest number and variable selection number when the random forest model has the best prediction performance through parameter tuning, and then obtaining the importance ranking of the optimal random forest model variable by using a varImpPlot function.
The simulation information in the step 5) comprises accuracy, precision, recall rate, F measurement, sensitivity, specificity, positive prediction value, negative prediction value and area under an ROC curve.
The patients in step 1) must meet the inclusion criteria and must not meet any exclusion criterion.
The inclusion criteria are: age ≥ 18 years; ICU stay longer than 24 hours. The exclusion criteria are: incomplete information; burns, or a skin condition that is unclear and cannot be judged; restricted turning of the patient.
The invention also discloses a method for improving the evaluation of pressure injury risk factors, based on the novel model construction evaluation method for screening pressure injury risk factors.
The invention also discloses the application of the novel model construction evaluation method for screening pressure injury risk factors in the development of pressure injury prediction software.
In order to make the technical means, creative features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Example 1:
1) Data collection: basic patient information, comprising independent variable information and dependent variable information, is collected, and the basic information of a plurality of patients is entered through an input device into EpiData at the processing end to serve as a data set;
2) Classification: the processing end uses software to randomly divide the data set into a training set and a test set, wherein the training set accounts for 70% of the data in the data set and the remainder forms the test set;
3) Model construction: the training set is used for constructing the models, and three construction modules in the processing end respectively construct a Logistic regression model, a decision tree model and a random forest model from the training set;
4) Comparative testing: the test set is used to test the performance of the constructed Logistic regression model, decision tree model and random forest model respectively;
5) Evaluation and comparison: each model is evaluated on the test set to obtain its evaluation indexes, and the evaluation indexes of the models are then compared to identify which model performs best on each index.
The dependent variable information is Y, where Y denotes whether pressure injury occurs and takes the value 0 or 1, with 0 representing no and 1 representing yes; the independent variable information comprises m independent variables X, wherein m is greater than or equal to 1, and the independent variables include gender, history of diabetes, history of hypertension, history of stroke, state of consciousness, mechanical ventilation, sedatives, analgesics, vasopressors, age, Braden scale score, ICU days of hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid and serum albumin, wherein age, Braden scale score, ICU days of hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid and serum albumin are continuous independent variables.
The construction module constructs the Logistic regression model as follows: software is used to perform single-factor analysis, the independent variables that are statistically significant in the single-factor analysis are entered into a multi-factor Logistic regression analysis, the variables that are statistically significant in the multi-factor analysis are brought into the Logistic regression equation to construct the Logistic regression model, and finally the model is visualized with a nomogram.
The construction module constructs the decision tree model as follows: all independent variables are used as input and whether pressure injury occurs is used as output, the rpart function of the rpart package is used to construct an initial decision tree model on the training set, and the prune function is then used to prune the initial model, so that a binary decision tree model is obtained.
The construction module constructs the random forest model as follows: all independent variables are used as input and whether pressure injury occurs is used as output, the randomForest function of the randomForest package is used to construct an initial random forest model on the training set, the number of trees and the number of variables selected at each node that give the best prediction performance are obtained through parameter tuning, and the varImpPlot function is then used to obtain the variable importance ranking of the optimal random forest model.
The evaluation indexes in step 5) comprise accuracy, precision, recall, F measure, sensitivity, specificity, positive predictive value, negative predictive value and area under the ROC curve.
The patients in step 1) must meet the inclusion criteria and must not meet any exclusion criterion.
The inclusion criteria are: age ≥ 18 years; ICU stay longer than 24 hours. The exclusion criteria are: incomplete information; burns, or a skin condition that is unclear and cannot be judged; restricted turning of the patient.
The invention also discloses a method for improving the evaluation of pressure injury risk factors, based on the novel model construction evaluation method for screening pressure injury risk factors.
The method for evaluating the risk factors of pressure injury comprises the following parts:
Part one: clinical judgment of pressure injury risk factors. A patient who meets any of the following items is directly evaluated as belonging to the high-risk group for pressure injury:
□ Ventilator-assisted ventilation
□ Coma or complete paralysis
□ Continuous sedation
□ Fecal or urinary incontinence
Part two: when the patient does not meet any item in part one, the evaluation proceeds to part two. A patient who meets two items is evaluated as belonging to the medium-risk group, and a patient who meets one item is evaluated as belonging to the low-risk group.
□ Hemiplegia or paraplegia
□ Emergency surgery with an operation duration of 90 min or more
□ BMI < 18.5
□ Serum albumin < 35 g/L
□ Hemoglobin < 30 g/L
Part three: when the patient does not meet any of the above items, the patient is evaluated as belonging to the group without risk of pressure injury.
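The three-part assessment above can be sketched as a small function. This is only an illustration: the item identifiers are hypothetical names invented here, and reading "two items" as "two or more" is an assumption, since the text does not say how three or more items are handled.

```python
# Hypothetical item identifiers; the text lists the items only in prose.
PART_ONE_ITEMS = {
    "ventilator_assisted_ventilation",
    "coma_or_complete_paralysis",
    "continuous_sedation",
    "fecal_or_urinary_incontinence",
}
PART_TWO_ITEMS = {
    "hemiplegia_or_paraplegia",
    "emergency_surgery_90min_or_more",
    "bmi_below_18_5",
    "serum_albumin_below_35",
    "hemoglobin_below_30",
}


def risk_group(findings):
    """Map the set of items that apply to a patient to a risk group."""
    findings = set(findings)
    # Part one: any single item means high risk.
    if findings & PART_ONE_ITEMS:
        return "high"
    # Part two: count the matching items.
    matches = len(findings & PART_TWO_ITEMS)
    if matches >= 2:  # assumption: "two items" is read as "two or more"
        return "medium"
    if matches == 1:
        return "low"
    # Part three: nothing applies.
    return "none"


print(risk_group({"bmi_below_18_5", "serum_albumin_below_35"}))
```

Keeping the two item lists as separate sets mirrors the order of the assessment: part two is only consulted when part one yields no match.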
Preventive measures against pressure injury:
□ Use a turning schedule
□ Ensure the turning frequency
□ Use pressure-relieving equipment on the bed surface or chair surface
□ Maximize the patient's mobility
□ Protect pressure-bearing areas with ulcer dressings
□ Keep the skin clean, and use skin cream to maintain the weak alkalinity of the skin
□ Ensure the patient's nutrition.
Example 2:
1 Study subjects
Adult patients hospitalized in the intensive care unit of a general hospital in Wenzhou from January 2017 to December 2019 were selected as study subjects by convenience sampling.
1.1 Inclusion and exclusion criteria
(1) Inclusion criteria: (1) age ≥ 18 years; (2) ICU stay longer than 24 h. (2) Exclusion criteria: (1) pressure injury already present on admission; (2) incomplete medical history or in-hospital clinical data; (3) burns, or a skin condition that is unclear and cannot be judged; (4) restricted turning of the patient.
1.2 Sample size
The sample size is generally 10 to 15 times the number of research factors, and the invention sets it at 10 times. The invention collects 17 factors related to pressure injury, so 170 samples each are needed for patients with pressure injury (PI) and patients without PI; allowing a 20% increase for evaluations invalidated by incomplete data, at least 408 samples are needed in total. In the actual collection process, 639 adult intensive care unit patients meeting the inclusion and exclusion criteria were collected, which satisfies the sample size requirement.
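The sample size arithmetic can be checked with a few lines. This is a sketch of the calculation as described in the text; the variable names are illustrative only.

```python
import math

research_factors = 17      # factors collected in the study
multiplier = 10            # lower bound of the 10-15x rule used in the text
groups = 2                 # patients with PI and patients without PI
allowance_percent = 20     # increase for incomplete data

per_group = research_factors * multiplier            # 170
base_total = per_group * groups                      # 340
extra = math.ceil(base_total * allowance_percent / 100)
required_total = base_total + extra

collected = 639
print(per_group, base_total, required_total, collected >= required_total)
```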
2 Research tools
Factors related to pressure injury in intensive care unit patients were determined through literature review, and a questionnaire on pressure injury in adult intensive care unit patients was designed as the survey tool. The questionnaire covers the patient's basic information, treatment conditions, laboratory examination results and the like:
(1) Basic information: sex, age, ethnicity, occupation, marital status, education level, religious belief, primary admission diagnosis, history of diabetes, history of hypertension, history of stroke, etc.;
(2) Treatment conditions: state of consciousness, and use of mechanical ventilation, sedatives, analgesics, vasoactive drugs, and the like;
(3) Laboratory examination items: hemoglobin, serum albumin, lactic acid, creatinine, urea nitrogen;
(4) Braden scale score of the patient at admission (Figure 1).
3 Research methods
3.1 Data collection
Using the electronic medical record system of a general hospital in Wenzhou, adult patients hospitalized in the intensive care unit from January 2017 to December 2019 were retrieved, the medical record data of patients meeting the inclusion and exclusion criteria were collected retrospectively, and the questionnaire was filled in. For PI patients, the state of consciousness, drug use and laboratory examination items were taken from the day on which the pressure injury occurred. A previous study [57] showed that ICU patients developed pressure injury after an average of 11.7 days; therefore, for patients who did not develop PI, this study collected data up to 14 days after admission. Repeated measurements were averaged. The relevant indexes were determined as follows:
(1) Pressure injury: assessed according to the definition and staging of pressure injury updated in 2016 by the U.S. National Pressure Ulcer Advisory Panel (NPUAP) [1]. Patients whose pressure injury status could not be assessed were excluded.
(2) State of consciousness: this study divides patients into those with and those without disturbance of consciousness. Disturbance of consciousness is an impairment of a person's ability to perceive and recognize their own state and surroundings, and includes lethargy, coma, confusion, delirium, etc. A Glasgow Coma Scale (GCS) score below 15 points indicates that the patient has some degree of disturbance of consciousness. Patients without disturbance of consciousness, i.e. conscious patients, have a GCS score of 15 points.
(3) Mechanical ventilation: whether the patient received mechanical ventilation through nasal or oral endotracheal intubation, mechanical ventilation through tracheotomy, or ventilation with a noninvasive ventilator.
(4) Sedatives: whether the patient received sedative drugs such as propofol, midazolam or dexmedetomidine hydrochloride.
(5) Analgesics: whether the patient received analgesic drugs such as fentanyl, remifentanil or butorphanol.
(6) Vasopressors: whether the patient received vasopressor drugs such as adrenaline, noradrenaline or dopamine.
3.2 Data sorting and assignment
The collected questionnaire data were entered into EpiData according to a unified standard to establish the data set.
Table 2 Variable assignment
[Table 2 is an image in the original: BDA0003499031560000131, BDA0003499031560000141]
3.3 Data preprocessing
R software was used to convert the categorical variables into factors, and the data set was then randomly divided into a training set and a test set at a ratio of 7:3.
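The random 7:3 division can be sketched as follows. The source performs this step in R; the Python below is only an illustration of the idea, and the function name, the seed and the use of patient indices in place of real records are assumptions made here.

```python
import random


def split_dataset(records, train_fraction=0.7, seed=2022):
    """Shuffle the records and cut them into a training set and a test set."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Patient indices stand in for the real records.
train_set, test_set = split_dataset(range(100))
print(len(train_set), len(test_set))
```

Fixing the seed keeps the partition stable between runs, which matters when the three models are to be compared on the same test set.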
3.4 Construction of the models
3.4.1 Logistic regression model
R software was used to perform multi-factor Logistic regression analysis on the independent variables that were statistically significant in the single-factor analysis; the variables that were statistically significant in the multi-factor Logistic regression analysis were brought into the Logistic regression equation to construct the Logistic regression model, and the model was finally visualized with a nomogram.
3.4.2 Decision tree model
With all independent variables as input and whether pressure injury occurs as output, an initial decision tree model was constructed on the training set with the rpart function of the rpart package, and the initial model was pruned with the prune function to obtain a binary decision tree.
3.4.3 Random forest model
With all independent variables as input and whether pressure injury occurs as output, an initial random forest model was constructed on the training set with the randomForest function of the randomForest package; the number of trees and the number of variables selected at each node that give the best prediction performance were obtained through parameter tuning, and the variable importance ranking of the optimal random forest model was then obtained with the varImpPlot function.
3.5 Evaluation of the models
The accuracy, precision, recall, F measure, sensitivity, specificity, positive predictive value, negative predictive value and area under the ROC curve of the Logistic regression model, the decision tree model and the random forest model were calculated on the test set, and the prediction performance of the three prediction models was evaluated and compared.
4 Statistical methods
R (3.6.3) was used for data set partitioning and for describing the general profile of the patients. For the single-factor analysis, the continuous independent variables were tested for normality and homogeneity of variance, using the Shapiro-Wilk test for normality and Levene's test for homogeneity of variance. The t test was used for continuous variables that were normally distributed with homogeneous variance, a nonparametric test was used for continuous variables that were not normally distributed or had heterogeneous variance, and the chi-square test was used for categorical variables, with test level α = 0.05. The test level of the multi-factor Logistic regression analysis was α = 0.05. The Logistic regression model was constructed with the glm function. The initial decision tree model was constructed on the training set with the rpart function of the rpart package and pruned with the prune function to obtain the final decision tree model. The initial random forest model was constructed on the training set with the randomForest function of the randomForest package, the number of trees and the number of variables giving the best prediction performance were obtained through parameter tuning, and the variable importance ranking of the random forest model was finally obtained with the varImpPlot function. Model prediction was performed with the predict function, and ROC curves were drawn with the pROC package.
Results
1 General data of the study subjects
A total of 639 adult intensive care unit patients were included in the study, of whom 219 developed PI and 420 did not. The patients were divided into a training set and a test set at a ratio of 7:3; the training set comprises 472 patients and the test set comprises 167 patients. The general data of the patients in the training set and the test set are shown in Table 3.
Table 3 General data of patients in the training set and the test set
[Table 3 is an image in the original: BDA0003499031560000151, BDA0003499031560000161]
2 Construction of the models
2.1 Construction of the Logistic regression model
2.1.1 Single-factor analysis
Single-factor analysis was performed on a total of 17 independent variables: gender, history of diabetes, history of hypertension, history of stroke, state of consciousness, mechanical ventilation, sedatives, analgesics, vasopressors, age, Braden scale score, ICU days of hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid and serum albumin. The results showed that Braden scale score, ICU days of hospitalization, state of consciousness, mechanical ventilation, sedatives, urea nitrogen and lactic acid were statistically significantly associated with the occurrence of pressure injury (P < 0.05), whereas gender, age, history of diabetes, history of hypertension, history of stroke, analgesics, vasopressors, hemoglobin, creatinine and serum albumin were not (P > 0.05). The results of the single-factor analysis are shown in Table 4:
Table 4 Single-factor analysis
[Table 4 is an image in the original: BDA0003499031560000171, BDA0003499031560000181]
Note: * denotes P < 0.05
2.1.2 Multi-factor Logistic regression analysis
Multi-factor Logistic regression analysis was performed on the 7 independent variables that were statistically significant in the single-factor analysis. The results showed that 3 variables, namely ICU days of hospitalization, state of consciousness and sedatives, were statistically significantly associated with the occurrence of pressure injury (P < 0.05) and are independent risk factors for pressure injury, whereas mechanical ventilation, Braden scale score, urea nitrogen and lactic acid are not independent risk factors for pressure injury. The results of the multi-factor Logistic regression analysis are shown in Table 5.
Table 5 Multi-factor Logistic regression analysis of pressure injury
[Table 5 is an image in the original: BDA0003499031560000182]
Note: * denotes P < 0.05
2.1.3 Logistic regression model
The single-factor analysis and the multi-factor Logistic regression analysis yielded 3 independent influencing factors of pressure injury and their regression coefficients. The established regression equation is as follows:
$$\operatorname{Logit}(P) = \ln\frac{P}{1-P} = -2.490 + 0.981X_5 + 0.787X_7 + 0.083X_{12}$$
The probability model for predicting the occurrence of pressure injury is as follows:
$$P = \frac{1}{1 + \exp\left(-\left(-2.490 + 0.981X_5 + 0.787X_7 + 0.083X_{12}\right)\right)}$$
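A minimal sketch of turning the regression equation into a predicted probability, using the standard logistic transform P = 1/(1 + exp(-Logit(P))). Two things are assumptions rather than statements from the text: that X5, X7 and X12 are state of consciousness, sedative use and ICU days of hospitalization (inferred from the order of the variable list and the three significant factors), and that the two binary variables are coded 1 for "disturbance of consciousness" and "sedative used" (the assignment table is only available as an image).

```python
import math

INTERCEPT = -2.490
COEF_X5 = 0.981    # assumed: state of consciousness (1 = disturbance)
COEF_X7 = 0.787    # assumed: sedative use (1 = used)
COEF_X12 = 0.083   # assumed: ICU days of hospitalization


def logit_value(x5, x7, x12):
    """Linear predictor of the regression equation."""
    return INTERCEPT + COEF_X5 * x5 + COEF_X7 * x7 + COEF_X12 * x12


def predicted_probability(x5, x7, x12):
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-logit_value(x5, x7, x12)))


# Illustrative patient: disturbed consciousness, sedated, 14 ICU days.
print(f"{predicted_probability(1, 1, 14):.3f}")
```

Under this coding every coefficient is positive, so the predicted probability rises with each additional risk factor and with each additional ICU day.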
2.1.4 Drawing the nomogram
The Logistic regression model was visualized as a nomogram (see Figure 2). The Points axis at the top of the nomogram gives the score corresponding to the value of each predictor below it, Total Points is the sum of the scores of all predictors, and the probability at the bottom corresponding to Total Points is the predicted probability that the patient will develop pressure injury.
2.2 Construction of the decision tree model
An initial model was established with all independent variables as input and whether pressure injury occurs as output. The independent variables remaining after pruning are ICU days of hospitalization and state of consciousness. Figure 3 shows the decision tree model. The uppermost box is the root node and the lowest boxes are leaf nodes, and each path from the root node to a leaf node is a classification rule. As can be seen from Figure 3, when the patient's ICU days of hospitalization are fewer than 10, the patient is predicted not to develop pressure injury; when the ICU days of hospitalization are 10 or more but the patient has no disturbance of consciousness, the patient is also predicted not to develop pressure injury; otherwise, the patient is predicted to develop pressure injury.
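The classification rules read from the pruned tree can be written out directly. This is a sketch of the rules as described in the text, not the fitted rpart object; the function and argument names are illustrative.

```python
def tree_predicts_pressure_injury(icu_days, disturbed_consciousness):
    """Apply the two splits described for the pruned decision tree."""
    # Root split: fewer than 10 ICU days means "no pressure injury".
    if icu_days < 10:
        return False
    # Second split: with 10 or more ICU days, the prediction depends on
    # whether the patient has a disturbance of consciousness.
    return bool(disturbed_consciousness)


for days, disturbed in [(5, True), (12, False), (12, True)]:
    print(days, disturbed, tree_predicts_pressure_injury(days, disturbed))
```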
2.3 Construction of the random forest model
An initial model was established with all independent variables as input and whether pressure injury occurs as output. Two important parameters determine the prediction capability of a random forest model: the number of variables preselected at each tree node and the number of trees in the forest. Parameter tuning showed that the random forest model has the best prediction performance when the number of trees (Figure 4) is set to 96 and the number of variables selected at each node is 5. The five most important independent variables of the model (Figure 5) are, in descending order of importance: ICU days of hospitalization, urea nitrogen, serum albumin, creatinine and lactic acid.
3 Evaluation of the models
3.1 Logistic regression model
3.1.1 Classification results of the Logistic regression model in the test set
The classification results of the Logistic regression model in the test set are shown in Table 6.
Table 6 Classification results of the Logistic regression model in the test set
[Table 6 is an image in the original: BDA0003499031560000191]
According to Table 6, the following values for the Logistic regression model in the test set can be calculated by formula: accuracy 71.26%, precision 61.11%, recall 39.29%, F measure 47.83%, sensitivity 39.29%, specificity 87.39%, positive predictive value 61.11% and negative predictive value 74.05%.
3.1.2 ROC curve of the Logistic regression model
The ROC curve of the Logistic regression model's predictions on the test set data was drawn with the plot function (Figure 6), and the area under the ROC curve is 0.757.
3.2 Decision tree model
3.2.1 Classification results of the decision tree model in the test set
The classification results of the decision tree model in the test set are shown in Table 7.
Table 7 Classification results of the decision tree model in the test set
[Table 7 is an image in the original: BDA0003499031560000201]
According to Table 7, the following values for the decision tree model in the test set can be calculated by formula: accuracy 74.85%, precision 62.50%, recall 62.50%, F measure 62.50%, sensitivity 62.50%, specificity 81.08%, positive predictive value 62.50% and negative predictive value 81.08%.
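The evaluation indexes follow from the four cells of a confusion matrix, as the sketch below shows. The classification table itself is only available as an image, so the counts used here (35 true positives, 21 false negatives, 21 false positives, 90 true negatives) are an assumption: they were back-calculated from the reported percentages and the 167-patient test set, and are not taken from the table.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the evaluation indexes from confusion-matrix counts."""
    precision = tp / (tp + fp)      # identical to positive predictive value
    recall = tp / (tp + fn)         # identical to sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Counts inferred from the reported percentages (assumption, see above).
metrics = classification_metrics(tp=35, fn=21, fp=21, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Because recall and sensitivity are the same quantity, as are precision and positive predictive value, the nine reported indexes reduce to six distinct calculations plus the area under the ROC curve.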
3.2.2 ROC curve of the decision tree model
The ROC curve of the decision tree model's predictions on the test set data was drawn with the plot function (Figure 7), and the area under the ROC curve is 0.742.
3.3 Random forest model
3.3.1 Classification results of the random forest model in the test set
The classification results of the random forest model in the test set are shown in Table 8.
Table 8 Classification results of the random forest model in the test set
[Table 8 is an image in the original: BDA0003499031560000202]
According to Table 8, the following values for the random forest model in the test set can be calculated by formula: accuracy 75.45%, precision 65.96%, recall 55.36%, F measure 60.20%, sensitivity 55.36%, specificity 85.59%, positive predictive value 65.96% and negative predictive value 79.17%.
3.3.2 ROC curve of the random forest model
The ROC curve of the random forest model's predictions on the test set data was drawn with the plot function (Figure 8), and the area under the ROC curve is 0.816.
4 Comparison of the models
4.1 Comparison of the evaluation indexes of the models in the test set
As can be seen from Table 9, among the three constructed models, the Logistic regression model has the highest specificity, the decision tree model has the highest recall, F measure, sensitivity and negative predictive value, and the random forest model has the highest accuracy, precision and positive predictive value.
Table 9 Evaluation indexes (%) of the Logistic regression, decision tree and random forest models in the test set
[Table 9 is an image in the original: BDA0003499031560000211]
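The comparison can be reproduced from the percentages and ROC areas reported in the text for each model. The sketch below simply looks up which model scores highest on each index; the dictionary layout and function name are illustrative.

```python
# Values as reported in the text for the test set (percentages, AUC as a fraction).
REPORTED = {
    "Logistic regression": {
        "accuracy": 71.26, "precision": 61.11, "sensitivity": 39.29,
        "f_measure": 47.83, "specificity": 87.39, "npv": 74.05, "auc": 0.757,
    },
    "Decision tree": {
        "accuracy": 74.85, "precision": 62.50, "sensitivity": 62.50,
        "f_measure": 62.50, "specificity": 81.08, "npv": 81.08, "auc": 0.742,
    },
    "Random forest": {
        "accuracy": 75.45, "precision": 65.96, "sensitivity": 55.36,
        "f_measure": 60.20, "specificity": 85.59, "npv": 79.17, "auc": 0.816,
    },
}


def best_model(index_name):
    """Return the model with the highest value for the given index."""
    return max(REPORTED, key=lambda model: REPORTED[model][index_name])


for index_name in REPORTED["Random forest"]:
    print(f"{index_name}: {best_model(index_name)}")
```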
4.2 Comparison of the ROC curves of the models
The ROC curves of the Logistic regression model, the decision tree model and the random forest model were drawn on the same graph (Figure 9). The area under the ROC curve of the random forest model, shown in red, is the largest, indicating that the prediction performance of the random forest model is better than that of the Logistic regression model and the decision tree model.
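The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen patient with pressure injury receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. A minimal sketch of that calculation follows. The source computes the area with the pROC package in R; the labels and scores below are toy values invented for illustration, not study data.

```python
def area_under_roc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    ranked_correctly = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                ranked_correctly += 1.0
            elif p == n:
                ranked_correctly += 0.5   # ties count as half
    return ranked_correctly / (len(positives) * len(negatives))


# Toy example: 1 = pressure injury, 0 = no pressure injury.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print(round(area_under_roc(toy_labels, toy_scores), 3))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the model with the largest area is regarded as having the best overall discrimination.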
Discussion
1 Feature selection of the models
Features, also called attributes, are the independent variables. During model construction, the features that are meaningful for the model or contribute most to it are selected. For the Logistic regression model, meaningful variables are screened for modeling mainly through single-factor analysis and multi-factor Logistic regression analysis. The multi-factor Logistic regression analysis showed that ICU days of hospitalization, state of consciousness and sedatives are independent risk factors for the occurrence of pressure injury, and these variables are therefore used as the predictors of the Logistic regression model. ICU days of hospitalization are associated with the occurrence of pressure injury. The likely reason is that the longer a patient stays in the ICU, the longer the patient is bedridden and the lower the body's resistance becomes; in addition, when the patient is restrained or sedated, the patient's activity and mobility decrease, and prolonged local pressure increases the risk of pressure injury.
A decision tree model classifies data according to their attributes. The choice of attributes is important when the tree is built, and the common attribute selection criteria are information gain, information gain ratio and the Gini index. The basic decision tree algorithms are the ID3 algorithm, the C4.5 algorithm and the CART algorithm. The ID3 algorithm uses information gain to select features while building the tree, and the feature with the largest information gain determines each node of the decision tree. The invention adopts the CART algorithm, i.e. the classification and regression tree algorithm, which can be used for both classification and regression. When constructing a decision tree, the CART algorithm usually builds the tree structure by binary recursive partitioning: the data sample set is divided into two sub-sample sets, a decision tree is constructed recursively within each subset, and each non-leaf node therefore has exactly two branches. The basic principle of the CART algorithm is to minimize the average error, and its attribute selection criterion is the Gini index. The decision tree model constructed in this research uses two attributes, ICU days of hospitalization and state of consciousness, as its classification rules.
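How the Gini index selects a binary split can be illustrated with a short sketch. It scans candidate thresholds on one continuous attribute and keeps the one with the lowest weighted Gini impurity. The data are toy values invented for illustration, and the function names are not from rpart.

```python
def gini_impurity(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_binary_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value < threshold' split."""
    best_threshold, best_impurity = None, float("inf")
    for threshold in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v < threshold]
        right = [y for v, y in zip(values, labels) if v >= threshold]
        if not left or not right:
            continue   # a split must send samples to both branches
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Toy data: ICU days and whether pressure injury occurred (1 = yes).
toy_days = [2, 4, 6, 8, 11, 13, 15, 20]
toy_outcome = [0, 0, 0, 0, 1, 1, 0, 1]
print(best_binary_split(toy_days, toy_outcome))
```

A full CART implementation repeats this search over every attribute and then recurses into the two resulting subsets, which is what produces the binary tree described above.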
The feature importance ranking of the random forest model is obtained mainly from the out-of-bag (OOB) data, which are used to calculate the prediction error rate of the model, i.e. the out-of-bag error. The five most important variables of the random forest model constructed in this study are ICU days of hospitalization, urea nitrogen, serum albumin, creatinine and lactic acid.
The Logistic regression model, the decision tree model and the random forest model all use ICU days of hospitalization as a predictor of pressure injury. In the decision tree model, ICU days of hospitalization is the primary classification rule, and in the random forest model it is the most important predictor. ICU days of hospitalization is therefore an important predictor of pressure injury in ICU patients.
2 Comparison of the prediction performance of the models
The results of the invention show that, on the test set, each of the three prediction models has strengths and weaknesses in accuracy, precision, recall, F measure, sensitivity, specificity, positive predictive value and negative predictive value. The Logistic regression model has only the highest specificity; the decision tree model has the highest recall, F measure, sensitivity and negative predictive value; and the random forest model has the highest accuracy, precision and positive predictive value. The area under the ROC curve, a comprehensive index of prediction performance, shows that the AUC of the random forest model is higher than that of the Logistic regression model and the decision tree model. Ranked by AUC, the prediction capability of the models is therefore random forest > Logistic regression > decision tree.
The prediction performance of the random forest model is better than that of the Logistic regression model and the decision tree model.
The results show that the random forest model has the highest prediction accuracy and the largest area under the ROC curve.
3 Advantages and disadvantages of the models
The Logistic regression model is simple, easy to understand and highly interpretable. Starting from the regression coefficients of the variables, a nomogram can visualize the influence of the different variables on the final result, which is convenient for clinical use. Its disadvantages are that it cannot handle nonlinear problems, it is sensitive to multicollinearity in the data, and its form is so simple, being close to a linear model, that it has difficulty fitting the true distribution of the data.
The CART algorithm of the decision tree model requires no logarithm operations during attribute selection, so the amount of calculation is small and the efficiency is high, and it can handle missing data. The CART algorithm splits the data in two at each step, so the resulting decision tree is a binary tree whose structure is clear at a glance and whose classification rules are easy to find. Compared with the Logistic regression model and the random forest model, the decision process of the decision tree model is very intuitive and easy to understand, matches the way people make decisions, and is highly interpretable. Its disadvantages are that the probability of misclassification is high when there are too many classes, and that the prediction results are unstable for data sets with small samples.
The random forest has attracted wide attention because of its good classification performance. Compared with other classification algorithms it has certain advantages, mainly high classification accuracy, small generalization error, fast training and easy parallelization. An important characteristic of the random forest model is that it incorporates the idea of bagging: multiple sample data sets are obtained by repeated random sampling with replacement, the influence of sampling error is balanced by randomly drawing a large number of sub-sample data sets, and a model trained on the large amount of generated data gives more reliable results. Its advantages are that it can handle high-dimensional data and data with imbalanced classes, it can handle missing values, and it adapts well to both discrete and continuous data; in addition, the features can be ranked by importance during the estimation of the random forest model. Because the bagging algorithm is introduced, the model is robust to noise, its prediction capability and accuracy are improved, and the overfitting problem is reduced compared with a single decision tree.
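Sampling with replacement, the core of bagging, can be sketched in a few lines. Each tree is trained on a bootstrap sample of the training set, and the patients that were never drawn form that tree's out-of-bag data. The sample size of 472 matches the training set reported in the study; the seed and function name are assumptions made for illustration.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement and return (in-bag, out-of-bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed for a reproducible illustration
in_bag, out_of_bag = bootstrap_sample(472, rng)
print(len(in_bag), len(set(in_bag)), len(out_of_bag))
```

For large samples about 1/e, roughly 37%, of the observations are left out of each bootstrap sample, which is what makes an out-of-bag error estimate possible without a separate validation set.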
4. Application prospects of artificial intelligence and of the prediction model of the invention in the nursing field
The method can be applied to artificial intelligence in the nursing field;
the method can also be applied to electronic medical records, which are the digitized medical service records (text, symbols, charts, data, images, and the like) generated through an information system by the medical staff of a medical institution during clinical diagnosis, treatment, and guided intervention for outpatients, inpatients, or health-care recipients. Electronic medical records contain a large amount of medical data and lend themselves to secondary analysis. A risk prediction model based on electronic medical record data can better help nursing staff make decisions in advance and reduce adverse patient outcomes. Currently, many hospitals cannot effectively use big data to analyze electronic medical records, produce high-quality research, and apply it in clinical practice. Electronic medical record data have great mining potential and are used more and more in clinical settings; risk prediction models can guide nursing work more scientifically and effectively, and there is still much room for their development and application in the nursing field. A model constructed by machine learning not only has good prediction performance but is also convenient to use in practice.
Conclusion
1. The number of ICU hospitalization days is an important predictor of pressure injury in intensive care unit patients.
2. Intensive care unit pressure injury prediction software is developed using the model constructed in the invention.
Novelty
(1) The prediction model comparison is carried out on intensive care unit patients with high risk of pressure injury.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are given by way of illustration of the principles of the present invention, but that various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications are within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. A novel model construction evaluation method for screening pressure injury risk factors is characterized by comprising the following steps:
1) Data collection: collecting patient basic information, wherein the patient basic information comprises independent variable information and dependent variable information, and entering the basic information of a plurality of patients into EpiData at a processing end through an input device to serve as a data set;
2) Partitioning: the processing end uses software to randomly divide the data set into a training set and a test set, wherein the training set accounts for 70% of the data in the data set and the remainder is the test set;
3) Model construction: the training set is used for constructing the models, and three construction modules in the processing end respectively construct a Logistic regression model, a decision tree model and a random forest model from the training set;
4) Comparative testing: performing a performance comparison test on the constructed Logistic regression model, decision tree model and random forest model using the test set;
5) Evaluation and comparison: evaluating each model on the test set to obtain the evaluation indexes of each model, and then comparing the evaluation indexes of the models to report, for each index, which model performs better.
2. The novel model construction evaluation method for screening the risk factors of the pressure injury as claimed in claim 1, wherein: the dependent variable information is Y, where Y indicates whether pressure injury occurs and takes the value 0 or 1, 0 representing no and 1 representing yes; the independent variable information comprises m independent variables X, wherein m is greater than or equal to 1, and the independent variables include gender, history of diabetes, history of hypertension, history of stroke, state of consciousness, mechanical ventilation, sedatives, analgesics, vasopressors, age, Braden scale score, ICU days of hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid, and serum albumin, wherein age, Braden scale score, ICU days of hospitalization, hemoglobin, urea nitrogen, creatinine, lactic acid, and serum albumin are continuous independent variables.
3. The novel model construction evaluation method for screening risk factors for pressure injury as set forth in claim 1, characterized in that: the construction module constructs the Logistic regression model as follows: software is used to perform multi-factor Logistic regression analysis on the independent variables found to be statistically significant in single-factor analysis; the variables that are statistically significant in the multi-factor Logistic regression analysis are then entered into a Logistic regression equation to construct the Logistic regression model; and finally a nomogram is used to visualize the model.
4. The novel model construction evaluation method for screening risk factors for pressure injury as set forth in claim 1, characterized in that: the construction module constructs the decision tree model by taking all independent variables as input and whether pressure injury occurs as output, constructing an initial decision tree model on the training set using the rpart function of the rpart package, and pruning the initial model using the prune function to obtain a binary decision tree model.
5. The novel model construction evaluation method for screening risk factors for pressure injury as set forth in claim 1, characterized in that: the construction module constructs the random forest model by taking all independent variables as input and whether pressure injury occurs as output, constructing an initial random forest model on the training set using the randomForest function of the randomForest package, obtaining through parameter optimization the number of trees and the number of selected variables at which the random forest model has the best prediction performance, and then obtaining the variable importance ranking of the optimal random forest model using the varImpPlot function.
6. The novel model construction evaluation method for screening risk factors for pressure injury as set forth in claim 1, characterized in that: the evaluation indexes in step 5) comprise accuracy, precision, recall, F-measure, sensitivity, specificity, positive predictive value, negative predictive value and area under the ROC curve.
7. The novel model construction and evaluation method for screening risk factors for pressure injury according to claim 1, characterized in that: the patients in step 1) must meet the inclusion criteria and none of the exclusion criteria.
8. The novel model construction evaluation method for screening risk factors for pressure injury as set forth in claim 7, characterized in that: the inclusion criteria are: age ≥ 18 years; and time in the ICU of more than 24 hours; the exclusion criteria are: incomplete information; burns, or a skin condition that is unclear and cannot be assessed; and restricted turning.
9. A pressure injury risk factor evaluation method, improved by the novel model construction evaluation method for screening pressure injury risk factors as claimed in claims 1 to 8.
10. Use of the novel model construction evaluation method for screening pressure injury risk factors as claimed in claims 1 to 8 in the development of pressure injury prediction software.
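The 70% training split of claim 1 and the evaluation indexes listed in claim 6 can be sketched as follows. The claims name R packages; this illustration instead uses plain Python with the standard library only, and the function names, the fixed seed, and the example labels and scores are assumptions made for illustration rather than part of the claimed method.

```python
import random

def split_70_30(rows, seed=0):
    """Randomly split rows into a 70% training set and a 30% test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]

def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def confusion_metrics(y_true, y_pred):
    """Evaluation indexes derived from the 2x2 confusion matrix (1 = injury)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = _ratio(tp, tp + fp)  # identical to positive predictive value
    recall = _ratio(tp, tp + fn)     # identical to sensitivity
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "sensitivity": recall,
        "specificity": _ratio(tn, tn + fp),
        "ppv": precision,
        "npv": _ratio(tn, tn + fn),
    }

def auc(y_true, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case has the higher score; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))

# Hypothetical test-set labels, predicted classes, and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.6, 0.3, 0.7]

train_rows, test_rows = split_70_30(range(10))
metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Computing the same dictionary for each of the three models on the same test set gives index values that can be compared side by side, as in step 5) of claim 1.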
CN202210122514.9A 2022-02-09 2022-02-09 Novel model construction evaluation method for screening pressure injury risk factors and application thereof Withdrawn CN115312196A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210122514.9A CN115312196A (en) 2022-02-09 2022-02-09 Novel model construction evaluation method for screening pressure injury risk factors and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210122514.9A CN115312196A (en) 2022-02-09 2022-02-09 Novel model construction evaluation method for screening pressure injury risk factors and application thereof

Publications (1)

Publication Number Publication Date
CN115312196A true CN115312196A (en) 2022-11-08

Family

ID=83855043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210122514.9A Withdrawn CN115312196A (en) 2022-02-09 2022-02-09 Novel model construction evaluation method for screening pressure injury risk factors and application thereof

Country Status (1)

Country Link
CN (1) CN115312196A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116312929A (en) * 2023-05-25 2023-06-23 四川互慧软件有限公司 OKR-based pressure sore management method, system, equipment and medium

Similar Documents

Publication Publication Date Title
Johnson et al. Reproducibility in critical care: a mortality prediction case study
Cramer et al. Predicting the incidence of pressure ulcers in the intensive care unit using machine learning
Bozkurt et al. Using automatically extracted information from mammography reports for decision-support
Ding et al. Mortality prediction for ICU patients combining just-in-time learning and extreme learning machine
CN113838577B (en) Convenient layered old people MODS early death risk assessment model, device and establishment method
CN111933281B (en) Disease typing determination system, method, device and storage medium
Chen et al. Heterogeneous postsurgical data analytics for predictive modeling of mortality risks in intensive care units
Ho et al. Imputation-enhanced prediction of septic shock in ICU patients
CN107767960A (en) Data processing method, device and the electronic equipment of clinical detection project
Meng et al. Mimic-if: Interpretability and fairness evaluation of deep learning models on mimic-iv dataset
CN111553478A (en) Community old people cardiovascular disease prediction system and method based on big data
CN113593708A (en) Sepsis prognosis prediction method based on integrated learning algorithm
CN112967803A (en) Early mortality prediction method and system for emergency patients based on integrated model
Popkes et al. Interpretable outcome prediction with sparse Bayesian neural networks in intensive care
CN116189866A (en) Remote medical care analysis system based on data analysis
CN115295151A (en) Sepsis prediction system, prediction model construction method, system and kit
CN115312196A (en) Novel model construction evaluation method for screening pressure injury risk factors and application thereof
CN114023440A (en) Model and device capable of explaining layered old people MODS early death risk assessment and establishing method thereof
Theodoraki et al. Innovative data mining approaches for outcome prediction of trauma patients
Luo et al. Data mining-based detection of rapid growth in length of stay on COPD patients
CN114724701A (en) Noninvasive ventilation curative effect prediction system based on superposition integration algorithm and automatic encoder
CN114566284A (en) Disease prognosis risk prediction model training method and device and electronic equipment
WO2023106960A1 (en) Method for predicting the onset of a medical event in a person's health
Rajmohan et al. G-Sep: A Deep Learning Algorithm for Detection of Long-Term Sepsis Using Bidirectional Gated Recurrent Unit
CN114649071A (en) Real world data-based peptic ulcer treatment scheme prediction system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221108