CN113782183A

CN113782183A - Pressure damage risk prediction device and method based on multi-algorithm fusion

Info

Publication number: CN113782183A
Application number: CN202110869608.8A
Authority: CN
Inventors: 韩琳; 马玉霞; 张红燕; 袁晨璐
Original assignee: GANSU PROVINCIAL HOSPITAL
Current assignee: GANSU PROVINCIAL HOSPITAL
Priority date: 2021-07-29
Filing date: 2021-07-29
Publication date: 2021-12-10
Anticipated expiration: 2041-07-29
Also published as: CN113782183B

Abstract

The invention relates to a pressure injury risk prediction device and method based on multi-algorithm fusion. The device comprises a processing unit configured to perform the following steps: acquiring analyzable medical record data and redistributing it based on significant risk variables to generate a first classification training set for a specific target population; performing regression modeling on the first classification training set with a random forest model to generate a first pressure injury risk prediction model for that training set; classifying the first pressure injury risk prediction models to obtain second-type risk variables and second weights that characterize each model, and combining a plurality of first pressure injury risk prediction models based on the second-type risk variables to generate a second pressure injury risk prediction model; and predicting the risk of pressure injury with the second pressure injury risk prediction model.

Description

Pressure damage risk prediction device and method based on multi-algorithm fusion

Technical Field

The invention relates to the technical field of medical data processing, in particular to a pressure injury risk prediction device and method based on multi-algorithm fusion.

Background

Pressure injury (PI) is localized damage to the skin and/or underlying soft tissue, usually over a bony prominence or at a site in contact with a medical device or other equipment. Pressure injuries can harm patients physically and psychologically, and can also increase length of hospital stay, complication rates and mortality.

Current research on pressure injury focuses mainly on its mechanism of development, the characteristics of the injury, the characteristics of injured patients, and nursing measures. Most of this work consists of statistical analysis and objective description of historical medical records, and research on predicting pressure injury is lacking. Risk prediction is the first step in preventing pressure injury, and the accuracy of the prediction directly affects both the choice of preventive measures and their effectiveness.

In clinical medicine, pressure injury risk prediction models are currently built with multi-factor regression analysis. For example, document [1] (Liqing, Suqiang, Linying, et al. Analysis and prediction of pressure injury of inpatients based on machine learning [J]. College University Proceedings (Natural Science Edition), 2020(10)) discloses prediction models built with three methods: a support vector machine, a probabilistic neural network and a generalized regression neural network, where the model uses a Gaussian kernel function whose parameters are optimized with a genetic algorithm. However, this approach does not account for the clinical complexity of patients, nor for the need for a risk prediction model to be extensible so that it can incorporate new risk variables.

Furthermore, understanding differs among persons skilled in the art, and the inventor studied many documents and patents in making the present invention that space does not permit listing in detail. This does not mean the present invention lacks the features of that prior art; on the contrary, the present invention may have all the features of the prior art, and the applicant reserves the right to add related prior art to the background.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a pressure injury risk prediction device based on multi-algorithm fusion, comprising a processing unit configured to perform the following steps:

acquiring analyzable medical record data, and redistributing the analyzable medical record data based on the significant risk variables, thereby generating a first classification training set for a specific target population;

performing regression modeling on the first classification training set with a random forest model to generate a first pressure injury risk prediction model for the first classification training set;

classifying the first pressure injury risk prediction models to obtain second-type risk variables and second weights that characterize each model, and combining a plurality of first pressure injury risk prediction models based on the second-type risk variables to generate a second pressure injury risk prediction model;

and predicting the risk of pressure injury with the second pressure injury risk prediction model.

Pressure injury risk is generally predicted with an established risk prediction model, and such a model should incorporate as many relevant risk variables as possible so that no variable affecting pressure injury is omitted and prediction accuracy is improved. However, different target populations are affected by different risk variables, and the same risk variable contributes to risk prediction to different degrees in different populations. A model applied to a specific target population must therefore overcome the influence of risk variables that are specific to other populations. The invention addresses this through multi-algorithm fusion, building the pressure injury risk prediction model from a random forest model and a multiple logistic regression model. On the one hand, each first pressure injury risk prediction model targets a specific population and thus overcomes the influence of irrelevant risk variables. On the other hand, a patient's medical record data may be composite, i.e., it may match two or more first pressure injury risk prediction models, so the first models must be combinable with one another and extensible so that they can incorporate new risk variables.
Because the first pressure injury risk prediction models must be extensible to new risk variables and combinable with one another, the extended or combined model must still produce stable predictions. However, since each first pressure injury risk prediction model is built with a random forest model, incorporating a new risk variable backed by a large volume of data may cause the model's output to lean toward the side with more data records. Balancing (averaging) the data volume of the second-type risk variables in the second classification training set prevents this skew. In addition, if many of the second-type risk variables are associated with one another, the output also leans toward the side with the larger number of associated variables. The invention therefore divides the second-type risk variables by degree of association into a plurality of third-type risk variables, each containing the same number of second-type risk variables. This balances the number of variables per category and keeps the risk prediction result from being skewed.

According to a preferred embodiment, the processing unit is configured to:

classifying the medical record data in the first classification training set with a random forest model to obtain first-type risk variables for the first classification training set;

performing regression on the first classification training set and its corresponding first-type risk variables based on the random forest model to obtain first weights representing the interrelationships among the first-type risk variables;

and dividing the first classification training set based on the first weights into a plurality of second classification training sets, and modeling each second classification training set with a random forest model to generate a plurality of first pressure injury risk prediction models.

According to a preferred embodiment, the processing unit is configured to:

establishing a multiple logistic regression model that takes the first-type risk variables as independent variables and whether they are associated as the dependent variable;

obtaining the degree of association among the first-type risk variables from the multiple logistic regression model;

dividing the first classification training set based on the degree of association to generate the second classification training sets.

According to a preferred embodiment, the processing unit is configured to:

constructing an association table from the degree of association between each pair of first-type risk variables;

acquiring a first-type risk variable pair whose first weight is smaller than a first threshold;

and counting, based on the association table, the number of first-type risk variables associated with the variables in that pair.

According to a preferred embodiment, the processing unit is configured to:

if the number of shared first-type risk variables exceeds a second threshold, searching for the next first-type risk variable pair whose first weight is smaller than the second threshold;

and if the number of shared first-type risk variables is less than or equal to the second threshold, selecting the member of the pair that is associated with the fewest other first-type risk variables as an isolated first-type risk variable.
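The pair-scanning rule in the steps above is stated only loosely, so the following is a minimal sketch of one possible reading, not the patented procedure. It assumes the first weights are stored in a dictionary keyed by variable pairs, that two variables count as associated when their weight reaches a hypothetical `association_cutoff`, and that the count of same (shared) variables means the variables associated with both members of the pair. All function names and sample weights are hypothetical.

```python
def build_association_table(first_weights, association_cutoff):
    """Map each first-type risk variable to the set of variables it is associated with.

    first_weights: {(var_a, var_b): weight} (hypothetical storage format).
    association_cutoff: hypothetical weight at or above which two variables are associated.
    """
    table = {}
    for (a, b), weight in first_weights.items():
        table.setdefault(a, set())
        table.setdefault(b, set())
        if weight >= association_cutoff:
            table[a].add(b)
            table[b].add(a)
    return table


def find_isolated_variable(first_weights, first_threshold, second_threshold, association_cutoff):
    """Scan weakly related pairs (weight < first_threshold), weakest first. Skip a pair whose
    members share more than second_threshold associated variables; otherwise return the
    member with the fewest associations as the isolated first-type risk variable."""
    table = build_association_table(first_weights, association_cutoff)
    candidates = sorted((w, a, b) for (a, b), w in first_weights.items() if w < first_threshold)
    for _, a, b in candidates:
        shared = table[a] & table[b]
        if len(shared) > second_threshold:
            continue  # too many shared associations: move on to the next pair
        # Ties on association count are broken by name so the result is deterministic.
        return min((a, b), key=lambda v: (len(table[v]), v))
    return None  # no pair satisfied the rule


weights = {
    ("age", "bmi"): 0.8, ("age", "skin_type"): 0.7, ("bmi", "skin_type"): 0.6,
    ("age", "incontinence"): 0.1, ("bmi", "incontinence"): 0.2,
    ("incontinence", "skin_type"): 0.05,
}
print(find_isolated_variable(weights, first_threshold=0.3, second_threshold=1,
                             association_cutoff=0.5))  # prints: incontinence
```

Sorting candidates by weight is itself an assumption; the text only says to search for "the next" pair.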

According to a preferred embodiment, the processing unit is configured to:

classifying the first pressure injury risk prediction model to obtain second-type risk variables characterizing the model and second weights, where a second weight represents how strongly a second-type risk variable is correlated with the occurrence of pressure injury in the first pressure injury risk prediction model.

According to a preferred embodiment, the processing unit is configured to:

obtaining the second-type risk variables and second weights of the first pressure injury risk prediction model by using the Gini coefficient as the splitting (competition) rule of the random forest model, the second weight being the Gini coefficient.
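In random forest implementations, the Gini criterion is usually applied as Gini impurity: a node's impurity is $1 - \sum_k p_k^2$, and a variable's Gini-based importance accumulates the impurity decrease of every split made on it. The sketch below shows only that per-split quantity for a single threshold split; the function names and toy data are illustrative assumptions, not taken from the source.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 = no injury, 1 = injury)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, feature_values, threshold):
    """Weighted impurity decrease from splitting on feature_value <= threshold."""
    left = [y for y, x in zip(labels, feature_values) if x <= threshold]
    right = [y for y, x in zip(labels, feature_values) if x > threshold]
    n = len(labels)
    children = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - children


labels = [0, 0, 1, 1]
scores = [1, 2, 3, 4]  # e.g. a normalized risk variable
print(gini_decrease(labels, scores, 2))  # perfect split: 0.5
```

A forest-level importance would sum such decreases, weighted by the share of samples reaching each node, over all trees.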

According to a preferred embodiment, the processing unit is configured to:

balancing (averaging) the data volume of the second-type risk variables in the second classification training sets;

dividing the second-type risk variables based on the degree of association to generate a plurality of third-type risk variables;

and modeling based on the plurality of third-type risk variables to generate the second pressure injury risk prediction model.
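The two balancing steps above can be sketched as follows. The text does not say how data volume is averaged, so random undersampling to the smallest group is an assumed technique here; the equal-size division likewise assumes the variables are already ordered so that strongly associated ones are adjacent. Function names and sample data are hypothetical.

```python
import random


def balance_by_undersampling(groups, seed=0):
    """Randomly undersample each group of records to the size of the smallest group,
    so that no second-type risk variable dominates through sheer data volume."""
    rng = random.Random(seed)
    target = min(len(records) for records in groups.values())
    return {key: rng.sample(records, target) for key, records in groups.items()}


def partition_equal_size(ordered_variables, n_groups):
    """Split variables (assumed pre-ordered by association) into n_groups
    third-type groups that each contain the same number of variables."""
    if n_groups <= 0 or len(ordered_variables) % n_groups != 0:
        raise ValueError("variables cannot be split into equal-sized groups")
    size = len(ordered_variables) // n_groups
    return [ordered_variables[i:i + size] for i in range(0, len(ordered_variables), size)]


groups = {"incontinence": list(range(120)), "skin_type": list(range(40)), "bmi": list(range(75))}
balanced = balance_by_undersampling(groups)
print({key: len(records) for key, records in balanced.items()})  # every group has 40 records
print(partition_equal_size(["a", "b", "c", "d", "e", "f"], 3))
```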

The invention also provides a pressure injury risk prediction method, which comprises the following steps:

and predicting the risk of the pressure injury by using the second pressure injury risk prediction model.

According to a preferred embodiment, the method further comprises:

Drawings

FIG. 1 is a schematic flow chart of the steps of a preferred embodiment of the method of the present invention;

FIG. 2 is a block schematic diagram of a preferred embodiment of the device of the present invention.

List of reference numerals

100: processing unit
200: storage unit
300: communication unit

Detailed Description

The following detailed description is made with reference to the accompanying drawings.

Preferably, the invention provides a pressure injury risk prediction method in which the risk of pressure injury is predicted with a pressure injury risk prediction model.

Preferably, a risk prediction model is a tool that uses multi-factor analysis of multiple risk factors to predict the absolute probability that an individual currently has, or will develop, a given disease. A pressure injury risk prediction model aims to predict the risk of pressure injury accurately so that medical staff can take targeted measures in time. Pressure injury in the present invention refers to pressure injury acquired in a medical institution.

Preferably, referring to FIG. 1, the steps of building a pressure injury risk prediction model are described below.

S100: Screening medical record data to obtain analyzable medical record data. Preferably, medical record data of external institutions can be acquired over a network. An external institution may be a hospital, a disease center, or another institution that stores patient medical records; the network may be a local area network, the Internet, a mobile network, etc. External databases contain many medical records whose circumstances vary, and retrospective analysis of pressure injury presupposes that the patient did not have a pressure injury on admission or shortly afterward. Externally acquired medical record data therefore needs to be processed to screen out records that cannot be analyzed retrospectively. The screening steps are as follows.

S101: Retrieving medical record data and excluding records of patients who had a pressure injury or a skin injury on admission.

S102: Excluding records of patients who developed a pressure injury within a first time threshold after admission.

Preferably, excluding records with a pressure injury present on admission leaves data on patients who had no pressure injury on admission. Preferably, records of skin injury include records of burns, psoriasis, lupus erythematosus and the like.

Preferably, the first time threshold may be set as needed, e.g., 24 hours, 10 days or 20 days. To ensure the validity of the medical record data used for training, time-dependent factors must be considered. For example, records of patients who developed a pressure injury within 24 hours of admission need to be excluded, because a pressure injury arising shortly after admission may be related to factors present before admission and would make the pressure injury risk prediction model inaccurate.

Preferably, text values in the analyzable medical record data are converted to numbers, and the numeric data is then normalized to remove differences in dimension (scale). Because patient information in medical records may not be numeric, it must be converted into numeric values the model can recognize, for example with 2-level, 8-level or other multilevel codes. The patient information includes risk variables for pressure injury. For example, eating status may be coded 0 for poor and 1 for normal. Incontinence may be coded 1 for full control, 2 for occasional incontinence, 3 for fecal/urinary incontinence and 4 for fecal incontinence. Skin type may be coded 1 for normal, 2 for thin, 3 for dry, 4 for edema, 5 for moist, 6 for discoloration, 7 for dehiscence, etc.
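A minimal sketch of the encoding step, using the example codes from the paragraph above. The field names, the exact text labels and the `encode_record` helper are hypothetical; a real system would need a codebook agreed with the source hospital's record format.

```python
# Hypothetical code tables built from the example codes in the text.
EATING = {"poor": 0, "normal": 1}
INCONTINENCE = {"full control": 1, "occasional incontinence": 2,
                "fecal/urinary incontinence": 3, "fecal incontinence": 4}
SKIN = {"normal": 1, "thin": 2, "dry": 3, "edema": 4, "moist": 5,
        "discoloration": 6, "dehiscence": 7}

CODEBOOK = {"eating": EATING, "incontinence": INCONTINENCE, "skin": SKIN}


def encode_record(record):
    """Replace text values with integer codes; fields without a code table pass through."""
    encoded = {}
    for field, value in record.items():
        table = CODEBOOK.get(field)
        encoded[field] = table[value] if table is not None else value
    return encoded


print(encode_record({"eating": "poor", "skin": "dry", "age": 70}))
```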

Preferably, some physiological indicators can be converted with SI (International System of Units) conversion factors. For example, converting creatinine from mg/dL to micromoles per liter requires multiplying by 88.4, and converting glucose from mg/dL to millimoles per liter requires multiplying by 0.0555. Preferably, dimension normalization scales all variables to the range 0-10: the minimum value of the variable in the medical record data is subtracted from the current value, the result is divided by the difference between the variable's maximum and minimum values, and the quotient is multiplied by 10. This arrangement has the following beneficial effect:

In the prior art, data for multivariate classification algorithms such as random forest models, multiple logistic regression models and support vector machines is generally normalized to the range 0-1. This produces many decimal values in subsequent computation, which requires a large amount of floating-point arithmetic and therefore considerable computational overhead. Normalizing the data to the range 0-10 reduces the decimals generated, which reduces floating-point computation and saves computational cost.
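The SI conversion and 0-10 scaling described above can be sketched as follows. The dictionary of factors covers only the two examples in the text, and the helper names are hypothetical. In practice the minimum and maximum would typically be taken from the training set and reused for the validation set.

```python
# Conversion factors quoted in the text (conventional unit -> SI unit).
SI_FACTORS = {
    "creatinine": 88.4,   # mg/dL -> umol/L
    "glucose": 0.0555,    # mg/dL -> mmol/L
}


def to_si(name, value):
    """Convert a lab value to SI units; unknown indicators are left unchanged."""
    return value * SI_FACTORS.get(name, 1.0)


def normalize_0_10(values):
    """Min-max scale a column to [0, 10]: 10 * (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [10.0 * (v - lo) / (hi - lo) for v in values]


print(to_si("creatinine", 1.2))      # about 106.08 umol/L
print(normalize_0_10([2, 4, 6]))     # [0.0, 5.0, 10.0]
```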

S200: Randomly dividing the analyzable medical record data into at least one training set for building the pressure injury risk prediction model and at least one validation set for validating it. There may be one, two, three or more training sets, and one, two, three or more validation sets. For example, the analyzable medical record data may be randomly split into two equal halves. Preferably, the analyzable medical record data is randomly divided into ten parts: nine parts are used to build the pressure injury risk prediction model, and the remaining part is used to validate the model built from those nine parts.
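A minimal sketch of the ten-part division, assuming records are held in a Python list. Rotating which part serves as the validation set (as in 10-fold cross-validation) is one reading of the text; taking only the first (training, validation) pair gives the single nine-to-one split. The function name and seed are hypothetical.

```python
import random


def ten_part_split(records, n_parts=10, seed=42):
    """Shuffle records, cut them into n_parts parts, and yield (training, validation)
    pairs in which each part serves once as the validation set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    for k in range(n_parts):
        training = [r for j, part in enumerate(parts) if j != k for r in part]
        yield training, parts[k]


records = list(range(100))
splits = list(ten_part_split(records))
training, validation = splits[0]
print(len(training), len(validation))  # 90 10
```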

Preferably, the plurality of risk variables is obtained based on a training set.

Preferably, a risk variable is a factor that influences the occurrence of pressure injury. The risk variables include at least age, sex, weight, length of hospital stay, department, disease type, type of surgery, duration of surgery, vital sign indicators, pulse, blood oxygen saturation, hemoglobin, serum protein, blood gas analysis indicators, type of respiration (e.g., mechanical ventilation), presence or absence of comorbidities (e.g., diabetes, infection), smoking status, medication, eating status, excretion status, etc. Preferably, these risk variables may be further subdivided; for example, disease type can be subdivided into cardiovascular disease, peripheral vascular disease, pneumonia, influenza and the like.

Preferably, the key to constructing a pressure injury risk prediction model is the screening of the independent variables, i.e., the risk variables. Because the model is built with multi-factor regression analysis, which essentially quantifies the relationship between the independent and dependent variables, all independent variables that can cause the dependent variable to change need to be identified. In this embodiment, the independent variables are the risk variables and the dependent variable is the occurrence of pressure injury.

Preferably, the acquired plurality of risk variables are screened.

Preferably, univariate (single-factor) analysis is performed first, and the risk variables found significant in it are then entered together into a multi-factor analysis model. This univariate-then-multivariate screening has limitations in some cases, for example when there are too many risk variables, when the risk variables are collinear, or when there are many missing values and samples containing missing values are not discarded. Each situation can be handled with a corresponding method. For example, collinearity can be addressed with regularization techniques such as ridge regression, LASSO regression or elastic net models, and a large number of missing values can be handled with a random forest model. In addition, cluster analysis, principal component analysis, stepwise regression, gradient boosting and similar methods can be used to screen the risk variables.
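To illustrate why regularization helps with collinearity, the sketch below fits closed-form ridge regression (a linear model, used here only for illustration) to two simulated, standardized, nearly collinear variables loosely modeled on systolic and diastolic pressure. The data, penalty values and function name are assumptions. Unpenalized least squares can split the shared effect between the two columns almost arbitrarily, while the ridge penalty pulls the estimates toward a stable, nearly equal split; LASSO would instead tend to zero one of them out.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Closed-form ridge regression, beta = (X^T X + lam * I)^(-1) X^T y.
    X and y are centered first so that no intercept term is needed."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)


rng = np.random.default_rng(0)
systolic = rng.normal(size=200)                            # simulated, standardized
diastolic = systolic + rng.normal(scale=0.01, size=200)    # nearly collinear copy
X = np.column_stack([systolic, diastolic])
y = systolic + diastolic + rng.normal(scale=0.5, size=200)

beta_ols = ridge_coefficients(X, y, lam=0.0)     # ordinary least squares: unstable split
beta_ridge = ridge_coefficients(X, y, lam=10.0)  # penalized: both close to 1
print(beta_ols, beta_ridge)
```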

S300: Building the pressure injury risk prediction model with a regression-type algorithm and incorporating the screened risk variables into the model.

Preferably, the regression-type algorithm may be a parametric, semi-parametric or non-parametric model. A parametric model may be a general linear model, such as a linear regression model, or a generalized linear model, such as a logistic regression model or a Poisson regression model. A semi-parametric model may be a Cox proportional hazards model or a competing risks model. A non-parametric model may be a machine learning algorithm such as k-nearest neighbors (KNN), a support vector machine (SVM), a classification and regression tree, a random forest, a neural network or deep learning.

Preferably, the risk variables are the independent variables of the pressure injury risk prediction model, and the dependent variable is the occurrence of pressure injury. A multiple logistic regression model is used below as an illustration.

Preferably, the relationship between the dependent variable and the independent variables may be expressed as:

$$y = \frac{1}{1 + e^{-z}} \qquad (1)$$

$$z = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n \qquad (2)$$

where $y$ is the dependent variable, taking the value 0 (no pressure injury occurred) or 1 (pressure injury occurred); $X_1, \ldots, X_n$ are the independent variables, i.e., the risk variables; $n$ is the number of risk variables; $\beta_0$ is the constant (intercept) term; and $\beta_1, \ldots, \beta_n$ are the regression coefficients. A regression coefficient characterizes how strongly an independent variable influences the dependent variable, i.e., how much the corresponding risk variable contributes to the occurrence of pressure injury. Equivalently, the regression coefficients can be regarded as weights describing how much each risk variable changes the dependent variable.

Preferably, rearranging equation (1) and taking the natural logarithm yields:

$$\ln\frac{y}{1 - y} = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n \qquad (3)$$

where $\ln\frac{y}{1-y}$ is the logit (log-odds) transformation. In equation (3), $y$ denotes the probability that the dependent variable takes the value 1, and $1 - y$ the probability that it takes the value 0.

Preferably, the probability that $y$ takes the value 1 is $P(y = 1) = \frac{e^{z}}{1 + e^{z}}$.

Preferably, the probability that $y$ takes the value 0 is $P(y = 0) = \frac{1}{1 + e^{z}}$.

Preferably, the values of the regression coefficients may be estimated based on medical record data in the training set according to a maximum likelihood method.
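Maximum likelihood estimation of the coefficients in equation (3) has no closed form, so it is usually solved iteratively. The sketch below uses Newton-Raphson (equivalently, iteratively reweighted least squares) on simulated data; the data, true coefficients and function name are assumptions for illustration, and production code would normally rely on a statistics library.

```python
import numpy as np


def fit_logistic_mle(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates of beta_0..beta_n in equation (3),
    computed by Newton-Raphson (iteratively reweighted least squares)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # column of ones carries beta_0
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))              # equation (1): P(y = 1)
        score = Xd.T @ (y - p)                               # gradient of log-likelihood
        information = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))             # two simulated, standardized risk variables
true_beta = np.array([-1.0, 0.8, -0.5])    # beta_0, beta_1, beta_2
z = true_beta[0] + X @ true_beta[1:]
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-z))).astype(float)
beta_hat = fit_logistic_mle(X, y)
print(np.round(beta_hat, 2))               # close to [-1.0, 0.8, -0.5]
```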

S400: Validating the built pressure injury risk prediction model on the validation set. Preferably, the model is updated based on its predictive performance and consistency. Preferably, predictive performance can be characterized by indicators such as sensitivity, specificity and the area under the receiver operating characteristic (ROC) curve (AUC). Sensitivity characterizes the model's ability to identify patients who truly have the condition; specificity characterizes its ability to rule out patients who truly do not. The AUC generally lies between 0.5 and 1 and measures the model's ability to discriminate between the two groups; a larger AUC indicates better discrimination. Preferably, the AUC can be explained through the confusion matrix. Each case is either positive or negative, and a prediction is true if correct and false if wrong, so the confusion matrix contains true positives, false positives, true negatives and false negatives, as shown in Table 1.

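TABLE 1 Confusion matrix

| | Predicted positive (1) | Predicted negative (0) |
|---|---|---|
| Actual positive (1) | True positive (TP) | False negative (FN) |
| Actual negative (0) | False positive (FP) | True negative (TN) |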

True positives are denoted TP. The number of true positive samples is the number of truly diseased patients classified as diseased, i.e., the actual value is 1 and the predicted value is 1.

False positives are denoted FP. The number of false positive samples is the number of healthy patients classified as diseased, i.e., the actual value is 0 and the predicted value is 1.

True negatives are denoted TN. The number of true negative samples is the number of healthy patients classified as disease-free, i.e., the actual and predicted values are both 0.

False negatives are denoted FN. The number of false negative samples is the number of truly diseased patients classified as disease-free, i.e., the actual value is 1 and the predicted value is 0.

Sensitivity can be expressed as the true positive rate, i.e., the probability that a diseased patient is classified as diseased:

$$S_E = \frac{TP}{TP + FN}$$

Specificity can be expressed as the true negative rate, i.e., the probability that a healthy patient is classified as disease-free:

$$S_P = \frac{TN}{TN + FP}$$
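The four confusion-matrix counts and the two rates described above can be computed from 0/1 labels as follows; the function names and sample labels are illustrative only.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for 0/1 labels, where 1 means pressure injury occurred."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity(actual, predicted):
    """S_E = TP / (TP + FN) and S_P = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return tp / (tp + fn), tn / (tn + fp)


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(actual, predicted))         # (3, 1, 3, 1)
print(sensitivity_specificity(actual, predicted))  # (0.75, 0.75)
```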

AUC denotes the area under the receiver operating characteristic (ROC) curve. The vertical axis of the ROC curve is the sensitivity $S_E$, and the horizontal axis is $1 - S_P$, i.e., the false positive rate. The ROC curve can be written as $S_E = F(1 - S_P)$, where $F(\cdot)$ denotes a function, and the AUC is the area under this curve within the unit square spanned by $S_E$ and $1 - S_P$. An AUC of 1 is the ideal case: no diseased patient is classified as disease-free and no healthy patient is classified as diseased. The AUC therefore characterizes the discriminative power of the pressure injury risk prediction model.
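The ROC curve and its area can be sketched directly from the definitions above: lower the decision threshold through every distinct predicted probability, record $(1 - S_P, S_E)$ at each step, and integrate with the trapezoidal rule. The function names and example scores are illustrative assumptions.

```python
def roc_points(actual, scores):
    """(1 - specificity, sensitivity) points for thresholds at every distinct score,
    from the strictest threshold down to the most lenient."""
    pos = sum(actual)
    neg = len(actual) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(actual, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(actual, scores)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))   # 1.0: perfect discrimination
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```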

Preferably, consistency can be evaluated with a goodness-of-fit (GOF) test. When the P value of the test for a risk prediction model is greater than 0.05, the model is considered to have adequately extracted the information in the data and to fit well. The P value is the probability, assuming the null hypothesis is true, of obtaining a result as extreme as or more extreme than the one observed. Preferably, the accuracy (calibration) of a risk prediction model is generally assessed with a calibration curve: a scatter plot with the predicted probability of occurrence on the horizontal axis and the observed probability of occurrence on the vertical axis. A straight line is fitted to the scatter plot; the closer this line is to the 45° line through the origin, the better the model's accuracy, and the further it deviates from that line, the worse the prediction accuracy. In logistic regression analysis, the calibration curve is in effect a visualization of the goodness-of-fit result.
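The text does not name a specific goodness-of-fit test. As an assumption, the sketch below uses the Hosmer-Lemeshow test, a common choice for logistic regression, and returns the grouped (mean predicted, observed) points that a calibration curve plots. To stay within the standard library, the chi-square tail probability uses the closed form that holds for even degrees of freedom, so the sketch requires an even number of groups.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(actual, probs, n_groups=10):
    """Group records by sorted predicted probability and compare observed with expected
    events. Returns (statistic, p_value, calibration_points), where each calibration
    point is (mean predicted probability, observed event rate) for one group."""
    if n_groups % 2 != 0 or len(probs) < n_groups:
        raise ValueError("need an even n_groups and at least n_groups records")
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    statistic, points = 0.0, []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        size = len(idx)
        observed = sum(actual[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        points.append((expected / size, observed / size))
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            statistic += (observed - expected) ** 2 / denom
    return statistic, chi2_sf_even_df(statistic, n_groups - 2), points


# Constructed, perfectly calibrated data: in each block of 20 records the
# number of events equals 20 times the predicted probability.
probs, actual = [], []
for g in range(10):
    p = 0.05 + 0.1 * g
    k = round(20 * p)
    probs += [p] * 20
    actual += [1] * k + [0] * (20 - k)
stat, p_value, calibration = hosmer_lemeshow(actual, probs)
print(round(stat, 6), round(p_value, 6))  # statistic near 0, p near 1
```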

Example 1

The embodiment provides a method for predicting the risk of pressure injury, which predicts the risk of pressure injury through a pressure injury risk prediction model. The present embodiment improves the pressure injury risk prediction model established in steps S100 to S400.

Preferably, when pressure injury risk is predicted with a multiple logistic regression model or another regression model, two issues must be considered: possible multicollinearity among the risk variables, and whether the model applies to different types of people, i.e., whether its application is limited.

Multicollinearity refers to correlation among multiple risk variables, such that one risk variable can be expressed as a combination of others. For example, systolic and diastolic blood pressure are correlated, as are total cholesterol and low-density lipoprotein cholesterol. If multicollinear risk variables are included when fitting the regression model, the regression coefficients estimated by the multiple logistic regression equation may contradict common knowledge and may even have the wrong sign, which can seriously distort the pressure injury risk prediction.

The model's applicability is limited because target populations differ, medical record databases provide different data, the screened risk variables differ, and so on. Including as many risk variables as possible avoids omitting variables that influence pressure injury and improves prediction accuracy. However, different target populations are affected by different risk variables, and the same variable contributes to prediction to different degrees in different populations, so a model applied to a specific target population must overcome the influence of risk variables specific to other populations.
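The source does not say how multicollinearity is detected. One widely used diagnostic, assumed here, is the variance inflation factor (VIF): each risk variable is regressed on all the others, and $VIF_j = 1/(1 - R_j^2)$, with values above roughly 5 to 10 commonly read as problematic. The simulated blood-pressure and age data below are illustrative only.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from a least-squares regression
    of column j on all other columns plus an intercept."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r2 = 1.0 - (residual @ residual) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs


rng = np.random.default_rng(0)
systolic = rng.normal(130, 15, 500)
diastolic = 0.6 * systolic + rng.normal(0, 3, 500)  # correlated with systolic
age = rng.normal(65, 10, 500)                       # independent of both
vifs = variance_inflation_factors(np.column_stack([systolic, diastolic, age]))
print(np.round(vifs, 1))  # systolic and diastolic inflated (around 10), age near 1
```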

For ease of understanding, consider patients with vascular disease and patients undergoing orthopedic surgery. If both groups are included in the training set, the risk variables include at least those specific to each group: for vascular disease patients, for example, intake of drugs for treating vascular disease; for orthopedic surgery patients, pressure from surgical instruments, duration of surgery, surgical position and the like. For patients with vascular disease alone, variables such as surgical instruments, duration of surgery and surgical position are irrelevant. If the two types of medical record data are mixed into one training set for building the pressure injury risk prediction model, the model also treats these irrelevant variables as contributing to the dependent variable and assigns them regression coefficients; for vascular disease patients these variables act as interference factors and make the prediction inaccurate. Moreover, in practice the medical records of vascular disease patients are likely to contain no data on surgical instruments, duration of surgery or surgical position, so from the model's perspective these patients' data are missing, which further degrades predictive performance.

Preferably, the present embodiment differs from the foregoing steps S100 to S400 in the following steps.

Preferably, the at least one training set and the at least one verification set acquired in step S200 are generally obtained by randomly dividing the analyzable medical record data. As a result, medical record data from different target populations are mixed in the training set and the verification set, so training has to account for the significant risk variables of each specific target population.

S201: Redistribute the randomly divided training set based on the significant risk variables so as to obtain a first classification training set for a specific target population. Preferably, there may be one, two, three or more first classification training sets in the present embodiment. Specifically, the first classification training sets of the present embodiment may correspond one-to-one to specific target populations: for example, vascular disease patients correspond to one first classification training set, and orthopedic surgery patients correspond to another.

Preferably, the chi-square test can be used to analyze the risk variables and identify significant risk variables, that is, risk variables that have a significant impact on pressure injury, such as department, BMI, skin type, incontinence and perception limitation. Preferably, medical record data in the training set that share the same value of a significant risk variable are redistributed to the same first classification training set. For example, the significant risk variable may be the type of disease the patient suffers from, so that patients are classified as vascular disease patients, cardiovascular surgery patients, orthopedic surgery patients, ICU patients and so on; the training set is then divided into one first classification training set for each of these groups.
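Purely as an illustration, the chi-square screening and the redistribution of step S201 might be sketched in Python as below. The record layout (dictionaries with a binary `pi` key for pressure injury), the function names and the 0.05 significance level are assumptions of this sketch, not details taken from the method; only the Pearson chi-square statistic itself is standard.

```python
from collections import Counter, defaultdict

# Upper 5% critical values of the chi-square distribution for 1 to 5 degrees
# of freedom (an assumed significance level of 0.05).
CHI2_CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(values, outcomes):
    """Pearson chi-square statistic and degrees of freedom for a categorical
    risk variable cross-tabulated against the pressure injury outcome."""
    n = len(values)
    observed = Counter(zip(values, outcomes))
    row_totals = Counter(values)
    col_totals = Counter(outcomes)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def is_significant(values, outcomes):
    """True if the variable is significant at the assumed 0.05 level."""
    stat, df = chi_square_statistic(values, outcomes)
    if df not in CHI2_CRITICAL_005:
        raise ValueError("critical value table only covers 1 to 5 degrees of freedom")
    return stat > CHI2_CRITICAL_005[df]


def redistribute(records, variable):
    """Place records sharing a value of the significant variable into the same
    first classification training set (one set per value)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[variable]].append(record)
    return dict(groups)


# Hypothetical example: department versus pressure injury (1 = occurred).
departments = ["vascular"] * 10 + ["orthopedic"] * 10
outcomes = [1] * 8 + [0] * 2 + [1] * 1 + [0] * 9
records = [{"department": d, "pi": o} for d, o in zip(departments, outcomes)]

if is_significant(departments, outcomes):
    first_sets = redistribute(records, "department")
    print({name: len(rows) for name, rows in first_sets.items()})
```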

Preferably, in step S200 the risk variables of the medical record data in the training set need to be screened. To address multicollinearity, the correlation coefficients between risk variables can be calculated and one variable of each highly correlated pair eliminated. This approach, however, is only appropriate when the correlation coefficient is so high that the two variables can be regarded as approximate duplicates. If the correlation coefficient is lower, so that the two variables cannot be regarded as duplicates, eliminating one of them may discard information that it carries.
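A minimal sketch of the correlation-based elimination described above, assuming plain Python lists of numeric values per risk variable; the 0.9 cut-off and the variable names are hypothetical. It also shows the weakness noted in the text: a variable is either kept whole or discarded whole.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_near_duplicates(columns, threshold=0.9):
    """Keep the first variable of every pair whose |r| exceeds the threshold
    and drop the second one."""
    kept = list(columns)
    for a, b in combinations(columns, 2):
        if a in kept and b in kept and abs(pearson(columns[a], columns[b])) > threshold:
            kept.remove(b)
    return kept


columns = {
    "skin_score": [1, 2, 3, 4, 5],
    "skin_score_copy": [2, 4, 6, 8, 10],
    "age_band": [5, 3, 4, 1, 2],
}
print(drop_near_duplicates(columns))  # ['skin_score', 'age_band']
```

Lowering the threshold to 0.7 would also discard `age_band` (|r| = 0.8), even though it is clearly not a copy of `skin_score`; that is the loss of information the paragraph warns about.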

Preferably, regularization can also be used to address multicollinearity. Specifically, in step S300 the regression coefficients are estimated from the medical record data in the training set by maximum likelihood, whereas regularization estimates them by maximum a posteriori (MAP) estimation. Preferably, the regression coefficients are therefore estimated from the medical record data in the training set by MAP estimation, which can be viewed as regularized maximum likelihood estimation. Maximum likelihood estimation treats the parameter to be estimated (the regression coefficient) as a fixed but unknown constant that is estimated from randomly generated samples. MAP estimation instead treats the parameter as an unknown random variable that follows some probability distribution: a prior distribution over the parameter is specified, and the estimate is then obtained by Bayes' theorem. Although the prior used in MAP estimation can improve generalization, the multiple logistic regression model remains very sensitive to multicollinearity, because it assumes that the risk variables are mutually independent; the risk variables with lower correlation coefficients that are retained during screening still affect the pressure injury risk prediction. To address these problems, the method adopts multi-algorithm fusion and builds the pressure injury risk prediction model from a random forest model and a multiple logistic regression model.
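The difference between the two estimators can be sketched as follows, assuming a zero-mean Gaussian prior on the coefficients, under which MAP estimation reduces to L2-regularized logistic regression. The prior precision, learning rate, epoch count and toy data are assumptions for illustration; the intercept is left unpenalized.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, prior_precision=0.0, lr=0.1, epochs=2000):
    """Batch gradient ascent for logistic regression.

    prior_precision = 0 maximizes the log-likelihood (maximum likelihood).
    prior_precision > 0 maximizes the log-posterior under an assumed zero-mean
    Gaussian prior on the coefficients (MAP, i.e. L2 regularization).
    The intercept is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The gradient of the Gaussian log-prior is -prior_precision * w[j].
            w[j] += lr * (grad_w[j] - prior_precision * w[j]) / n
        b += lr * grad_b / n
    return w, b


# Hypothetical data: two nearly collinear risk variables.
X = [[0, 0], [0, 0], [1, 1.1], [1, 1.1], [2, 1.9], [2, 1.9], [3, 3.1], [3, 3.1]]
y = [0, 0, 0, 1, 1, 0, 1, 1]
w_ml, _ = fit_logistic(X, y)
w_map, _ = fit_logistic(X, y, prior_precision=5.0)
print(w_ml, w_map)
```

With nearly collinear inputs the MAP coefficients are pulled toward zero, which is the stabilizing effect attributed to the prior above; the model's underlying assumption of independent risk variables is not removed.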

Preferably, the random forest model is an ensemble of classification trees, and two sources of randomness are used when constructing each decision tree: first, the training data for each tree are drawn at random from the original data by the bootstrap method; second, the explanatory variables used by each tree are drawn at random from the original feature set. Many classification trees are generated in this way, and their results are then aggregated.
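The two sources of randomness can be illustrated with the hypothetical helper below, which draws the inputs of a single tree: a bootstrap sample of record indices and a random subset of explanatory variables (4, matching the number of attributes given in step S2031). Many random forest implementations redraw the variable subset at every node rather than once per tree; the per-tree draw here follows the description above. Tree construction itself is omitted.

```python
import random


def draw_tree_inputs(n_records, feature_names, n_features, rng):
    """Draw the two random inputs of one decision tree.

    1. A bootstrap sample: n_records row indices drawn with replacement.
    2. A random subset of n_features explanatory variables, without replacement.
    Rows never drawn are returned as the tree's out-of-bag indices.
    """
    bootstrap = [rng.randrange(n_records) for _ in range(n_records)]
    features = rng.sample(feature_names, n_features)
    out_of_bag = sorted(set(range(n_records)) - set(bootstrap))
    return bootstrap, features, out_of_bag


rng = random.Random(42)
names = ["department", "bmi", "skin_type", "incontinence", "perception", "age"]
for tree in range(3):
    rows, feats, oob = draw_tree_inputs(10, names, 4, rng)
    print(tree, feats, "out-of-bag rows:", oob)
```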

S202: Perform regression modeling on the first classification training set using a random forest model so as to generate a first pressure injury risk prediction model for the first classification training set. Preferably, generating the first pressure injury risk prediction model comprises the following steps:

S2021: Classify the medical record data in the first classification training set using the random forest model so as to obtain first-class risk variables for the first classification training set.

S2022: Perform regression on the first classification training set and its corresponding first-class risk variables based on the random forest model to obtain first weight values describing the interrelation among the first-class risk variables.

S2023: Divide the first classification training set based on the first weight values to form a plurality of second classification training sets, and model each second classification training set with a random forest model to generate a plurality of first pressure injury risk prediction models. Preferably, the invention classifies the first classification training set again using a random forest model in order to screen out comprehensively the risk variables related to pressure injury in that set, namely the first-class risk variables. The screened first-class risk variables are then modeled with a multiple logistic regression model to obtain the interrelation, or degree of correlation, between them. Based on these interrelations, the invention identifies relatively isolated variables among the first-class risk variables and uses them to divide the first classification training set into second classification training sets. This arrangement provides the following benefits:

Dividing the first classification training set by the first weight values into second classification training sets amounts to grouping the specific noise data in the first classification training set. Building a random forest model only after data with the same specific noise have been placed in the same group significantly reduces the influence of noise, avoids overfitting, and allows the resulting risk prediction model to generalize (be applied) to new medical record data. In this embodiment, noise data refers to isolated risk variables. Specifically, the first classification training sets obtained in step S201 by classifying the training set according to the significant risk variables found with the chi-square test may not be accurately classified, and the reclassification with the random forest model in step S202 further screens out the irrelevant risk variables remaining in the first classification training set.
In addition, because the multiple logistic regression model is sensitive to the interrelations among its independent variables, i.e., among the risk variables, it can capture the interrelations among the first-class risk variables relatively accurately. Binary regression on the first-class risk variables with the multiple logistic regression model therefore yields first weight values that quantify these interrelations. From the first weight values, the degree of isolation of each first-class risk variable can be evaluated and the relatively isolated variables identified, and the first classification training set is divided according to these degrees of isolation to obtain the second classification training sets. The medical record data within each second classification training set then have similar degrees of risk-variable association, which reduces interference from specific first-class risk variables and helps avoid overfitting of the random forest model.

Preferably, step S2022 further comprises the following steps:

Acquire the correlations among the first-class risk variables based on the multiple logistic regression model. Preferably, a first-class risk variable is selected at random, and its correlation with each of the other first-class risk variables is calculated based on the multiple logistic regression model.

Preferably, the degrees of association among the first-class risk variables are obtained based on the multiple logistic regression model, and the first classification training set is divided based on these degrees of association to generate the second classification training sets. This arrangement provides the following benefit:

Although calculating degrees of association between first-class risk variables cannot identify the isolated first-class risk variables precisely, and therefore cannot eliminate specific noise to the greatest extent, dividing by degree of association avoids the risk that the division fails because too little related data is available.

Preferably, a first-class risk variable is selected at random, and the degree of association between it and each of the other first-class risk variables is calculated based on the multiple logistic regression model. Preferably, the degree of association may be characterized by a regression coefficient. For example, a first-class risk variable A is selected at random, and the regression coefficients between A and each other first-class risk variable are calculated. A regression coefficient characterizes how much another first-class risk variable changes when A changes: if B changes by 1 unit when A changes by 1 unit, the degree of association is 1; if B changes by 0.1 unit when A changes by 1 unit, the degree of association is 0.1. Preferably, the first-class risk variables whose degree of association exceeds a third threshold are then screened out. The third threshold may be set according to the actual numbers of first-class risk variables and medical records; preferably, it may be the median of the degrees of association.
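As a sketch of the worked example above, the degree of association can be computed as the least-squares slope of another variable on the selected variable A, with the median as the third threshold. Using an ordinary least-squares slope in place of a coefficient from the multiple logistic regression model, and taking its absolute value, are assumptions of this illustration; the function names are hypothetical.

```python
from statistics import median


def slope(x, y):
    """Least-squares change in y per one-unit change in x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def screen_associated(columns, anchor):
    """Association degree of every other variable with the anchor, and the
    variables whose degree exceeds the median (the third threshold)."""
    degrees = {
        name: abs(slope(columns[anchor], values))
        for name, values in columns.items()
        if name != anchor
    }
    third_threshold = median(degrees.values())
    screened = [name for name, d in degrees.items() if d > third_threshold]
    return degrees, screened


columns = {
    "A": [1, 2, 3, 4],
    "B": [1, 2, 3, 4],          # changes 1 unit per unit of A
    "C": [0.1, 0.2, 0.3, 0.4],  # changes 0.1 unit per unit of A
    "D": [2, 4, 6, 8],          # changes 2 units per unit of A
}
# In the described method the anchor is drawn at random (for example with
# random.choice); "A" is fixed here to reproduce the example in the text.
degrees, screened = screen_associated(columns, "A")
print(degrees, screened)
```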

Preferably, the first weight value can be used to characterize the degree of association of a first-class risk variable. The first classification training set is divided based on the first weight values into a plurality of second classification training sets as follows:

counting, based on the mutual relation table, how many times each first-class risk variable appears in the first-class risk variable pairs;

if the count for the same first-class risk variable exceeds a second threshold, searching for the next pair of first-class risk variables whose first weight is smaller than the first threshold. Preferably, if the count is less than or equal to the second threshold, the first-class risk variable that is paired with the fewest other first-class risk variables is selected as the isolated first-class risk variable. Preferably, the medical record data in the first classification training set that contain the isolated first-class risk variable are selected as a second classification training set. The first threshold may be chosen close to zero and set according to the first weights actually obtained; preferably, it may be a value less than 20% of the mean first weight. Preferably, the second threshold may be set according to the number of first-class risk variables involved, for example 40% of the total number of first-class risk variables.
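The isolation step leaves several details open, so the code below is only one possible reading, labeled as such: a pair of first-class risk variables counts as linked when its first weight reaches the first threshold (20% of the mean first weight), variables with no more links than the second threshold (40% of the number of variables) are candidates, and the candidate with the fewest links is taken as the isolated variable. The function names, the pair-weight dictionary and the record format are hypothetical.

```python
def find_isolated(pair_weights, variables):
    """One possible reading of the isolation step (an assumption).

    pair_weights maps (a, b) pairs of first-class risk variables to first
    weights. A pair counts as a link when its weight reaches the first
    threshold; the candidate with the fewest links is returned.
    """
    mean_weight = sum(pair_weights.values()) / len(pair_weights)
    first_threshold = 0.2 * mean_weight      # "less than 20% of the mean"
    second_threshold = 0.4 * len(variables)  # "40% of the total number"
    links = {v: 0 for v in variables}
    for (a, b), weight in pair_weights.items():
        if weight >= first_threshold:
            links[a] += 1
            links[b] += 1
    candidates = [v for v in variables if links[v] <= second_threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda v: links[v])


def second_training_set(records, isolated):
    """Records of the first classification training set that contain the
    isolated variable (here: a truthy value under that key)."""
    return [r for r in records if r.get(isolated)]


variables = ["A", "B", "C", "D", "E"]
weights = {
    ("A", "B"): 1.0, ("A", "C"): 0.9, ("B", "C"): 0.8,
    ("A", "D"): 0.7, ("B", "D"): 0.6, ("C", "D"): 0.5,
    ("A", "E"): 0.01, ("B", "E"): 0.01, ("C", "E"): 0.01, ("D", "E"): 0.01,
}
records = [{"A": 1, "E": 1}, {"A": 1, "E": 0}, {"C": 1, "E": 1}]
isolated = find_isolated(weights, variables)
print(isolated, second_training_set(records, isolated))
```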

S203: Classify the first pressure injury risk prediction model to obtain second-class risk variables that characterize the model, together with second weights. A second weight represents the degree of correlation of a second-class risk variable with the occurrence of pressure injury in the first pressure injury risk prediction model. In practical use, a patient's medical record data can be matched according to the second weights of the second-class risk variables.

S2031: Obtain the second-class risk variables and second weights of the first pressure injury risk prediction model using the Gini coefficient as the splitting (competition) rule of the random forest model. Preferably, the second weight is the Gini coefficient and represents the degree of association of the second-class risk variable with pressure injury. The random forest algorithm draws N samples from the second classification training set by bootstrap sampling and builds a decision tree model on each of them. Each decision tree consists of a root node, leaf nodes and branches and uses 4 randomly selected variable attributes; each node is split with the best split among these 4 features, and every tree is grown fully without pruning, which yields a combined classifier. Each test sample is classified by the N decision tree models to obtain N classification results, and the final classification is decided by voting over these N results. Preferably, the Gini coefficient g(t) of node t before splitting is:

$$g(t) = 1 - \sum_{j} p^{2}(j \mid t)$$

Here p(j|t) denotes the normalized probability that the output variable takes the j-th class at node t. When all samples in the node have the same output value, the variation in the output variable is smallest and the Gini coefficient is 0. When all classes are equally probable, the variation in the output variable is largest, and so is the Gini coefficient.
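The Gini coefficient described above, together with the impurity decrease used to choose splits, might be computed as follows. Weighting each child node by its share of the parent's samples is the usual CART form and is assumed here, since the text does not spell out the decrease formula.

```python
from collections import Counter


def gini(labels):
    """Gini impurity g(t) = 1 - sum_j p(j|t)^2 of the labels in node t."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Drop in impurity when node t is split into left and right children,
    each child weighted by its share of the parent's samples."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


print(gini([1, 1, 1, 1]))                           # pure node: 0.0
print(gini([0, 1, 0, 1]))                           # two equally likely classes: 0.5
print(gini_decrease([0, 0, 1, 1], [0, 0], [1, 1]))  # perfect split: 0.5
```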

Preferably, the classification tree measures the decrease in heterogeneity, Δg(t), by the reduction in the Gini coefficient. Preferably, a simple majority vote may be used to decide the final classification result:

$$H(x) = \arg\max_{Y} \sum_{i=1}^{k} I\big(h_i(x) = Y\big)$$

where H(x) denotes the combined classification model, h_i(x) denotes a single decision tree classification model, Y denotes the target variable, and I(·) denotes the indicator function. The whole process is repeated k times. Samples that are never drawn are called out-of-bag data. Preferably, model performance can be measured by the mean squared residual of the predictions on the out-of-bag data.
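The majority vote H(x) and the out-of-bag measure can be sketched as below. The input layout (one list of tree predictions per record) and the function names are assumptions of this sketch; for 0/1 labels the mean squared residual equals the out-of-bag misclassification rate.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """H(x) = argmax_Y sum_i I(h_i(x) = Y): the most frequent class among the
    trees' predictions (ties go to the class encountered first)."""
    return Counter(tree_predictions).most_common(1)[0][0]


def oob_mean_squared_residual(y_true, oob_predictions):
    """Mean squared residual of out-of-bag predictions.

    oob_predictions[r] lists the predictions for record r from the trees whose
    bootstrap sample did not contain r; records with no such tree are skipped.
    """
    residuals = [
        (y - majority_vote(preds)) ** 2
        for y, preds in zip(y_true, oob_predictions)
        if preds
    ]
    if not residuals:
        raise ValueError("no record has out-of-bag predictions")
    return sum(residuals) / len(residuals)


print(majority_vote([1, 0, 1]))                                    # 1
print(oob_mean_squared_residual([1, 0, 1], [[1, 1, 0], [1, 1], []]))  # 0.5
```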

Preferably, a patient's medical record data may be of a composite type, i.e., two or more first pressure injury risk prediction models may apply to the same record. The first pressure injury risk prediction models therefore need to be combinable: two or more of them must be able to be combined, and they must be extensible so that new risk variables can be incorporated. Preferably, step S300 is as follows:

Divide the second-class risk variables based on their degrees of association to generate a plurality of third-class risk variables. Preferably, the second pressure injury risk prediction model is generated by modeling on the plurality of third-class risk variables. Preferably, each of the third-class risk variables obtained by the division contains the same number of second-class risk variables.
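The text does not say how the second-class risk variables are grouped by degree of association, only that each group contains the same number of them. The sketch below assumes one simple scheme, sorting by association degree and cutting the ordering into consecutive equal-sized groups; the function name, variable names and association values are hypothetical.

```python
def split_equal_groups(association, group_size):
    """Sort second-class risk variables by association degree and cut the
    ordering into consecutive groups of group_size variables each."""
    ordered = sorted(association, key=association.get, reverse=True)
    if group_size <= 0 or len(ordered) % group_size:
        raise ValueError("variables cannot be split into equal groups of this size")
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


association = {"bmi": 0.9, "age": 0.1, "incontinence": 0.5, "skin_type": 0.7}
print(split_equal_groups(association, 2))
# [['bmi', 'skin_type'], ['incontinence', 'age']]
```

Other schemes, such as interleaving high- and low-association variables across groups, would satisfy the equal-size condition equally well.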

This arrangement provides the following benefits:

Because the first pressure injury risk prediction models must be able to incorporate new risk variables and be combined with one another, an expanded or combined model must still predict stably. However, since the first pressure injury risk prediction model is built on a random forest model, if a new risk variable with a large volume of data is incorporated, the model's output may be biased toward the side with more data records; balancing the data volume of the second-class risk variables in the second classification training set prevents the prediction from being skewed. In addition, if many of the second-class risk variables are associated with one another, the model's output also tends toward the more strongly associated variables. The invention therefore divides the second-class risk variables by degree of association into a plurality of third-class risk variables, each containing the same number of second-class risk variables, so that the variable categories are balanced and the risk prediction result does not become skewed.

Preferably, referring to Fig. 2, the present embodiment further provides a pressure injury risk prediction device comprising a processing unit 100, a storage unit 200 and a communication unit 300. Preferably, the processing unit 100 may perform the above steps S100 to S400 and steps S10 to S70: in one aspect, it may be configured to perform steps S100 to S400; in another aspect, it may be configured to perform steps S10 to S70. Preferably, the storage unit 200 is configured to store the medical record data, the analyzable medical record data, the training set, the verification set, the pressure injury risk prediction model, the risk variables, the regression coefficients and the like. Preferably, the communication unit 300 is used to access a network and connect to devices in order to obtain medical record data. For example, the communication unit 300 can access the medical record database by wired and/or wireless means such as the Internet, the Internet of Things, a mobile network or Ethernet. Preferably, the communication unit 300 may be an Ethernet RJ-45 interface, a thin coaxial cable BNC interface, a thick coaxial cable AUI interface, an FDDI interface, an ATM interface or the like. The communication unit 300 may also be a Wi-Fi module, a Bluetooth module, a Zigbee module or the like. Preferably, the communication unit 300 may also be a combination of an RJ-45 interface, a BNC interface, a thick coaxial AUI interface, an FDDI interface, an ATM interface, a Wi-Fi module, a Bluetooth module and a Zigbee module.

Preferably, the processing unit 100 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a graphics processing unit (GPU), another programmable logic device, a transistor logic device, a hardware component, or any combination thereof.

Preferably, the storage unit 200 may be a magnetic disk, a hard disk, an optical disk, a removable hard disk, a solid state disk, a flash memory, or the like.

The present specification encompasses multiple inventive concepts, and the applicant reserves the right to file divisional applications for each of them. Paragraphs introduced by terms such as "preferably", "according to a preferred embodiment" or "optionally" each disclose a separate inventive concept.

It should be noted that the above embodiments are exemplary, and those skilled in the art, having the benefit of the present disclosure, may devise various solutions that fall within the scope of the invention. It should be understood by those skilled in the art that the present specification and figures are illustrative only and do not limit the claims. The scope of the invention is defined by the claims and their equivalents.

Claims

1. A pressure injury risk prediction device based on multi-algorithm fusion, for predicting the risk of pressure injury acquired in a medical institution, characterized by comprising a processing unit (100), wherein,

the processing unit (100) is configured to perform the steps of:

2. The pressure injury risk prediction device of claim 1, wherein the processing unit (100) is configured to:

3. The pressure injury risk prediction device of claim 1 or 2, wherein the processing unit (100) is configured to:

4. The pressure injury risk prediction device of any preceding claim, wherein the processing unit (100) is configured to:

5. The pressure injury risk prediction device of any preceding claim, wherein the processing unit (100) is configured to:

6. The pressure injury risk prediction device of any preceding claim, wherein the processing unit (100) is configured to:

classify the first pressure injury risk prediction model to obtain second-class risk variables representing the model characteristics and second weights, wherein,

the second weight represents the degree of correlation of the second-class risk variable with the occurrence of pressure injury in the first pressure injury risk prediction model.

7. The pressure injury risk prediction device of any preceding claim, wherein the processing unit (100) is configured to:

acquire the second-class risk variables and second weights of the first pressure injury risk prediction model using the Gini coefficient as the splitting or competition rule of the random forest model, wherein,

the second weight is the Gini coefficient.

8. The pressure injury risk prediction device of any preceding claim, wherein the processing unit (100) is configured to:

9. A pressure injury risk prediction method according to any of the preceding claims, for predicting the risk of pressure injury acquired in a medical institution, wherein the method comprises:

10. The method for predicting the risk of pressure injury according to any of the preceding claims, wherein the method further comprises: