CN113470819A - Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest - Google Patents

Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest

Info

Publication number
CN113470819A
Authority
CN
China
Prior art keywords
data
pressure sore
sample
random forest
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110837331.0A
Other languages
Chinese (zh)
Inventor
梁伟
宁优
黄素珍
陈晓红
陈妍
徐雪松
史长发
杨艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN202110837331.0A
Publication of CN113470819A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)

Abstract

The invention belongs to the field of applied artificial intelligence, and particularly relates to a random-forest-based method for early prediction of pressure sore adverse events from small, unbalanced samples. The method comprises the following steps: acquiring clinical data, performing feature extraction and preprocessing on the data based on clinical experience to obtain preprocessed data, and dividing the preprocessed data into a training set and a test set; training on the training set with a bagging algorithm to obtain a plurality of decision trees that form a random forest classifier, and predicting pressure sores by voting; and verifying and analyzing the random forest classifier with the test set as input, using cross-validation. The prediction model obtained by the method reaches a prediction accuracy of 94.92% and can process large volumes of data.

Description

Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest
Technical Field
The invention belongs to the field of applied artificial intelligence, and particularly relates to a random-forest-based method for early prediction of pressure sore adverse events from small, unbalanced samples, which is applied to pressure sore risk assessment and timely early warning for hospitalized patients.
Background
Pressure sores, also called pressure ulcers or bedsores, are skin ulcers in which tissue ulcerates and becomes necrotic owing to sustained ischemia, hypoxia and malnutrition caused by prolonged compression or friction of skin tissue. They occur most often in patients who must remain bedridden for long periods because of paralysis or surgery. Pressure sores can aggravate a patient's existing conditions, lead to infection, increase the patient's physical and psychological burden, greatly reduce quality of life, prolong hospitalization and increase mortality. According to reports in the related literature, about 6 million people die of pressure sores every year in China. It is therefore important to understand comprehensively the factors that contribute to pressure sore formation, to study early-warning feature indexes for pressure sores, and to predict pressure sores early so that prevention and timely, effective treatment can begin early and patient mortality can be reduced.
However, the causes of pressure sores are complex and partly hidden, and are difficult for patients and healthcare workers to detect and attend to at an early stage. At present, adverse events of pressure sores are evaluated in the medical field mainly with scale methods, including the Braden scale, the Norton scale, the Braden Q scale and the Waterlow scale, among others. However, evaluation with traditional scale methods has a number of disadvantages: 1) the prediction result is influenced by subjective factors of the medical staff. Because the translations of the various versions of the scales differ, medical staff find the detailed scoring rules hard to read and habitually substitute personal clinical experience for them, so the judgment reflects personal factors and the prediction is not objective; 2) operating errors exist. According to related studies in China, medical workers make false reports, miss reports, evaluate inaccurately, and lack sufficient pressure sore knowledge and clinical experience during pressure sore evaluation, all of which affect the pre-judgment result; 3) the prediction is delayed. Because pressure sores form for complex reasons and their early features are not obvious, patients rarely pay attention to them, and because nursing work is complicated they are easily overlooked. Prediction by traditional methods therefore lags, which delays early treatment of pressure sore patients and increases the risk of pressure sores.
Disclosure of Invention
Based on the above problems, and by means of technologies such as machine learning and big data analysis, a random-forest-based method for early prediction of pressure sore adverse events from unbalanced small samples is provided. Taking pressure sores as the research object and relying on in-hospital pressure sore case data from 2011 to 2017 at a tertiary hospital in Hunan Province, the method fuses mechanism analysis with a data-driven approach for pressure sore feature extraction, establishes random-forest-based pressure sore prediction for the first time, and divides the data set by random sampling. The result is a pressure sore prediction model with high prediction accuracy that supports real-time evaluation with batch processing.
The invention provides an early prediction method for adverse events of pressure sores of unbalanced small samples based on random forests, which specifically comprises the following steps:
acquiring clinical data, performing feature extraction and preprocessing on the data based on clinical experience to obtain preprocessed data, and dividing the preprocessed data into a training set and a test set;
training based on a bagging algorithm by taking the training set as input to obtain a plurality of decision trees to construct a random forest classifier, and predicting pressure sores in a voting mode;
and verifying and analyzing the random forest classifier by using the test set as input and adopting a cross-validation method.
Further, the preprocessing specifically includes:
carrying out structured processing on the text data by adopting a natural language processing method; and eliminating abnormal data, and completing missing data by using a model-based method.
Further, the step of training based on the bagging algorithm to obtain a plurality of decision trees to construct the random forest classifier comprises:
selecting a plurality of samples from the training set by adopting a Bootstrap method, and obtaining a plurality of training samples as a sample set;
selecting the best attribute of all attributes as a node by taking the sample set as input, and establishing a CART decision tree according to a preset decision tree construction method;
and repeating the steps for multiple times to obtain a plurality of different decision trees to form the random forest classifier.
Further, the preset decision tree construction method specifically includes:
acquiring a sample set, and judging whether the number of samples is smaller than a sample number threshold and whether features exist; if no features exist or the number of samples is smaller than the sample number threshold, returning the decision subtree and stopping recursion at the current node;
calculating the Gini coefficient of the sample set, and judging whether the Gini coefficient is smaller than a set Gini coefficient threshold; if it is smaller, returning the decision subtree and stopping recursion at the current node;
identifying each feature type in the sample set, and calculating the Gini coefficient under each feature; and constructing a decision tree in a binary recursive manner, using the feature Gini coefficient as the criterion.
Further, the calculation formula of the Gini coefficient is as follows:
$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2$$
wherein m represents the number of sample classes, m = 2 in the pressure sore sample set, and $p_i$ is the probability that a sample point belongs to the i-th class.
Further, the step of constructing the decision tree in a binary recursive manner, using the feature Gini coefficient as the criterion, specifically includes:
selecting the optimal feature and the optimal feature value by taking the feature Gini coefficient as the criterion, dividing the sample set into two parts, and establishing the left and right nodes of the current node;
the calculation formula of the Gini coefficient of the divided sample set is as follows:
$$\mathrm{Gini}_{\mathrm{split}}(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
wherein $|D|$ represents the total number of samples, and $|D_1|$ and $|D_2|$ represent the numbers of samples in the two subsets $D_1$ and $D_2$, corresponding to the pressure sore and non-pressure sore categories, respectively.
Advantageous effects:
According to the invention, once the relevant feature indexes of a patient are input, the probability of a pressure sore adverse event for the patient in the current physical condition is obtained in real time. The model is optimized on the basis of random-forest early prediction of pressure sore adverse events from small unbalanced samples, and the prediction accuracy reaches 94.92%. The model can process pressure sore screening for patients in batches: it autonomously predicts the risk of a pressure sore adverse event for 1000 patients, and automatically records and stores the results, in only 100 s, so that risk prediction can be carried out for 36000 people in a hospital within one hour. The prediction results provide a basis for medical staff to decide whether to begin early pressure sore intervention for a patient and which treatment scheme to use.
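The hourly figure follows from the stated batch time by simple arithmetic, assuming the processing rate stays constant over the whole hour (an assumption; the source reports only the 100 s batch):

```latex
\frac{1000\ \text{patients}}{100\ \text{s}} = 10\ \text{patients/s},
\qquad
10\ \text{patients/s} \times 3600\ \text{s/h} = 36000\ \text{patients/h}
```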
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of an early stage prediction method for pressure sore adverse events based on unbalanced small samples of random forests according to an embodiment of the present invention;
fig. 2 is a ten-fold cross validation result diagram provided in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, in the embodiment of the present invention, a method for early predicting an adverse pressure sore event based on an unbalanced small sample of a random forest is provided, which specifically includes the following steps:
step S101, acquiring clinical data, performing feature extraction and preprocessing on the data based on clinical experience to obtain preprocessed data, and dividing the preprocessed data into a training set and a test set.
In the embodiment of the invention, pressure sore feature extraction and data preprocessing are based on clinical experience. The data come from a hospital in Hunan Province. The original pressure sore data set is complicated, highly redundant and sparse in key information; unstructured data account for more than 80%, and a small number of data-entry errors introduce outliers and noise. The data are therefore first structured and preprocessed: text data (such as medical orders) are structured by a natural language processing method, abnormal data are then removed by the quartile method, and missing data are completed by a model-based method.
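The quartile-based removal of abnormal data can be sketched as follows. This is only an illustration under stated assumptions: it uses the conventional 1.5 × IQR fence, which the source does not specify, and the function name `remove_outliers_iqr` and the sample ages are hypothetical.

```python
from statistics import quantiles

def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is an assumed, conventional multiplier; the source only says
    that a quartile method is used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical ages with one data-entry error (640 typed instead of 64).
ages = [58, 61, 64, 66, 70, 72, 75, 640]
cleaned = remove_outliers_iqr(ages)
```

In practice the same filter would be applied per numerical column, and the rows it drops would be reviewed rather than silently discarded, since the data set is small.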
The invention constructs a pressure sore adverse event index system based on expert interviews, the analytic hierarchy process and similar methods, and determines the indexes by correlation analysis; the extraction rules of the indexes are detailed in Table 1.
Table 1 index extraction rule table
Figure BDA0003177620500000051
Figure BDA0003177620500000061
Figure BDA0003177620500000071
As can be seen from Table 1, the pressure sore patient data are high-dimensional but small in volume, so a model suited to these characteristics is selected for predicting pressure sore adverse events, and the preprocessed data are divided into a training set and a test set.
And S102, training by taking the training set as input based on a bagging algorithm to obtain a plurality of decision trees to construct a random forest classifier, and predicting pressure sores in a voting mode.
In the embodiment of the invention, the training set is $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x = \{x_1, x_2, \ldots, x_n\}$ are the input variables of the pressure sore data, namely indexes such as age, bed rest, immobilization and limb dysfunction, and y is the label of the pressure sore data, namely the identifier corresponding to the feature indexes.
A random forest combines multiple decision trees through ensemble learning: it uses decision trees as the base classifiers of a bagging algorithm and makes the final classification by voting. Bagging repeatedly samples from the data set with replacement according to a uniform probability distribution. Each sub-training set has the same size as the original data set, and because the training samples of each sub-classifier are drawn from the original data set with replacement, the same sample may appear several times in one training set. The specific steps are as follows:
using the Bootstrap method, n samples are drawn with replacement from the training sample set, so that the training data set of each tree is different and contains repeated training samples; K attributes are randomly selected from all attributes, and the optimal attribute among them is chosen as a node to build a CART decision tree; these steps are repeated M times to obtain M different decision trees, which form the random forest; finally, the trees vote on the classification result to determine which class the data belong to.
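The three ingredients of the procedure above (Bootstrap sampling with replacement, choosing K attributes, and majority voting) can be sketched as below. This is a minimal illustration, not the patent's implementation: the function names, the toy records and the five example votes are hypothetical, and attribute selection is assumed to be uniformly random.

```python
import random
from collections import Counter

def bootstrap_sample(dataset, rng):
    """Draw len(dataset) samples with replacement (Bootstrap)."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]

def random_attributes(num_attributes, k, rng):
    """Pick K distinct attribute indices for one tree (assumed uniform)."""
    return sorted(rng.sample(range(num_attributes), k))

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
# Hypothetical toy records: (feature vector, label), where 1 = pressure sore.
data = [([70, 1, 0], 1), ([45, 0, 0], 0), ([82, 1, 1], 1), ([39, 0, 1], 0)]
sample = bootstrap_sample(data, rng)
attrs = random_attributes(num_attributes=3, k=2, rng=rng)
vote = majority_vote([1, 0, 1, 1, 0])  # votes from M = 5 hypothetical trees
```

Using an odd number of trees M avoids tied votes in the two-class case; with an even M, a tie-breaking rule would have to be chosen.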
In the embodiment of the invention, the CART decision tree is constructed by dividing the value of the split attribute into two subsets through a CART algorithm, sequentially calculating Gini index values determined by the training set from the two subsets, and then constructing the decision tree in a binary recursion mode, thereby generating subtrees of a left branch and a right branch. When the node is split, the algorithm measures data partitioning by using Gini indexes, and the calculation process is as follows:
The Gini coefficient of the pressure sore training set samples is calculated as:
$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2 \qquad (1)$$
wherein m represents the number of sample classes, m = 2 in the pressure sore sample set, and $p_i$ is the probability that a sample point belongs to the i-th class.
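A short sketch of this calculation, estimating each class probability by its proportion in the sample set. The function name `gini` and the ten example labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini coefficient: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# Hypothetical unbalanced sample: 2 pressure sore cases (1) among 10 patients.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
g = gini(labels)  # 1 - (0.2**2 + 0.8**2)
```

With two classes the value ranges from 0 for a pure set to 0.5 for an evenly mixed one, so a strongly unbalanced set already starts with a fairly low Gini coefficient.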
When the Gini coefficient is calculated for each partition attribute of the pressure sore data set, the data set D is divided into two subsets D1 and D2, and the Gini coefficient of the partition is:
$$\mathrm{Gini}_{\mathrm{split}}(D) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2) \qquad (2)$$
wherein $|D|$ represents the total number of samples, and $|D_1|$ and $|D_2|$ represent the numbers of samples in the two subsets $D_1$ and $D_2$, corresponding to the pressure sore and non-pressure sore categories, respectively.
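The weighted Gini coefficient of a partition can be sketched as follows. The helper names and the example split are hypothetical, and both subsets are assumed to be non-empty.

```python
from collections import Counter

def gini(labels):
    """Gini coefficient of one subset (0.0 for an empty subset)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_split(left_labels, right_labels):
    """Size-weighted Gini coefficient of a partition of D into D1 and D2."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) \
        + (len(right_labels) / n) * gini(right_labels)

# Hypothetical split of 10 patients on a yes/no attribute such as bed rest.
d1 = [1, 1, 0]                # labels of patients on the "yes" branch
d2 = [0, 0, 0, 0, 0, 0, 0]    # labels of patients on the "no" branch
score = gini_split(d1, d2)    # 0.3 * (4/9) + 0.7 * 0
```

A lower score means the two subsets are purer, so the tree-building step picks the attribute and value that minimise it.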
In the embodiment of the invention, according to a training data set, the following operations are recursively performed on each node from a root node, and the specific steps of constructing the binary decision tree are as follows:
Input: the pressure sore training set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ and the parameters of the stopping conditions.
Output: a CART decision tree.
1. Judge whether the number of samples is smaller than a threshold and whether features exist; if no features exist or the number of samples is smaller than the threshold, return the decision subtree and stop recursion at the current node.
2. Calculate the Gini coefficient of the sample set D according to formula (1) and judge whether it is smaller than a threshold; if so, return the decision subtree and stop recursion at the current node.
3. Identify each feature type, judge whether it is discrete or continuous, and for each type calculate the Gini coefficient under each candidate split according to formula (2), using the corresponding processing method.
4. According to the optimal feature and the optimal feature value, divide the data set into two parts, D1 and D2, and establish the left and right nodes of the current node at the same time; the data set of the left node is D1 and the data set of the right node is D2.
5. Recursively call the procedure on the left and right child nodes in turn to generate the decision tree.
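The five steps above can be sketched as a small recursive builder. This is an illustrative sketch rather than the patented implementation: every feature is treated as numeric with `<=` threshold splits (the source handles discrete and continuous features separately), the default thresholds `min_samples=2` and `min_gini=1e-3` are assumed values, and the dictionary-based tree and all names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini coefficient of a label list (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def leaf(labels):
    """Leaf node predicting the majority class of the samples that reached it."""
    return {"leaf": True, "label": Counter(labels).most_common(1)[0][0]}

def best_split(rows, labels, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best, n = None, len(rows)
    for f in features:
        # Every distinct value except the largest is a candidate threshold,
        # so both sides of a split are always non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_cart(rows, labels, features, min_samples=2, min_gini=1e-3):
    # Step 1: stop when there are too few samples or no features.
    if len(rows) < min_samples or not features:
        return leaf(labels)
    # Step 2: stop when the node is already (nearly) pure.
    if gini(labels) < min_gini:
        return leaf(labels)
    # Step 3: score every candidate split by its Gini coefficient.
    split = best_split(rows, labels, features)
    if split is None:  # all rows are identical on every feature
        return leaf(labels)
    _, f, t = split
    # Step 4: divide D into D1 (left node) and D2 (right node).
    d1 = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    d2 = [(r, y) for r, y in zip(rows, labels) if r[f] > t]

    # Step 5: recurse on the left and right child nodes.
    def grow(part):
        return build_cart([r for r, _ in part], [y for _, y in part],
                          features, min_samples, min_gini)

    return {"leaf": False, "feature": f, "threshold": t,
            "left": grow(d1), "right": grow(d2)}

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while not tree["leaf"]:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["label"]

# Hypothetical rows: [age, bed rest flag]; label 1 = pressure sore.
rows = [[70, 1], [45, 0], [82, 1], [39, 0], [66, 1], [50, 0]]
labels = [1, 0, 1, 0, 1, 0]
tree = build_cart(rows, labels, features=[0, 1])
```

Calling `predict(tree, [75, 1])` on the toy tree walks a single split on age. A forest would build M such trees, each on its own Bootstrap sample and attribute subset.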
And S103, verifying and analyzing the random forest classifier by using the test set as input and adopting a cross-validation method.
In the embodiment of the invention, to verify that the random forest classifier model is correct when it provides decision support to users, and to ensure that the model can accurately predict a patient's pressure sore probability and output the prediction result on a computer, the model's performance is evaluated with classification metrics such as accuracy, recall, F1-score, the ROC curve and the confusion matrix. Because the pressure sore data form a small-sample data set, the model is verified by cross-validation, which yields a more reliable accuracy estimate from the available data and lets the test data objectively represent the distribution of all the data without a carefully designed test set. The ten-fold cross-validation result is shown in FIG. 2. Ten-fold cross-validation randomly divides the sample data into 10 parts; in each round 9 parts are used as the training set and the remaining 1 part as the test set, and after a round is completed another 9 parts are selected for training. After 10 rounds, a loss function is used to select the optimal model and parameters. FIG. 2 shows the prediction accuracy output after each round of training.
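The ten-fold procedure can be sketched as below. This sketch assumes the standard k-fold scheme in which each of the 10 parts serves exactly once as the test set; the function names are hypothetical, and `dummy_scorer` is only a stand-in for "train a random forest on the training indices and return its accuracy on the test indices".

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices once and partition them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Run k rounds; in round i, fold i is the test set and the rest train."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, test_idx))
    return scores

def dummy_scorer(train_idx, test_idx):
    """Hypothetical stand-in: returns the training fraction, not an accuracy."""
    return len(train_idx) / (len(train_idx) + len(test_idx))

folds = k_fold_indices(100)
scores = cross_validate(n_samples=100, train_and_score=dummy_scorer)
mean_score = sum(scores) / len(scores)
```

For an unbalanced data set, a stratified variant that keeps the class ratio similar in every fold is a common refinement; the source does not say whether stratification was used.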
According to the invention, once the relevant feature indexes of a patient are input, the probability of a pressure sore adverse event for the patient in the current physical condition is obtained in real time. The model is optimized on the basis of random-forest early prediction of pressure sore adverse events from small unbalanced samples, and the prediction accuracy reaches 94.92%. The model can process pressure sore screening for patients in batches: it autonomously predicts the risk of a pressure sore adverse event for 1000 patients, and automatically records and stores the results, in only 100 s, so that risk prediction can be carried out for 36000 people in a hospital within one hour. The prediction results provide a basis for medical staff to decide whether to begin early pressure sore intervention for a patient and which treatment scheme to use.
The present invention will be further described with reference to the following examples.
The invention provides a random-forest-based early prediction model for pressure sore adverse events from unbalanced small samples; with reference to the flowchart shown in FIG. 1, it comprises the following steps.
(1) An original patient data table is input as the object of feature index extraction. The data table contains both numerical data and a large amount of text data: missing values in the numerical data are completed by a model-based method, and information is extracted from the text data by natural language processing. Part of the data table is shown in Tables 2 and 3 below:
TABLE 2 basic information Table
Figure BDA0003177620500000101
TABLE 3 advice list
Figure BDA0003177620500000102
(2) A pressure sore data sample set usable for model training is obtained by preprocessing the patient data and performing the related feature engineering, namely feature screening and feature extraction. The sample set is divided into a training set and a test set by a data sampling technique, and the divided data are shown in Table 4 below:
TABLE 4 results of the divisions
Figure BDA0003177620500000111
(3) The model is trained on the training set data with a random forest classifier. During training the data set is divided and sampled by the bagging algorithm, and a training accuracy of 94.92% is obtained. The held-out pressure sore test set is then fed to the trained model for evaluation and verification, and the evaluation of the model results is shown in the following table:
TABLE 5 model evaluation chart
Figure BDA0003177620500000112
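The evaluation indexes named in the description (accuracy, recall, F1-score and the confusion matrix) can be computed as in the sketch below. The labels and predictions are hypothetical and do not reproduce the values in Table 5; the function names are likewise illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical unbalanced test set: 2 pressure sore cases among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
result = metrics(y_true, y_pred)
```

On unbalanced data, accuracy alone can mislead: predicting "no pressure sore" for all ten patients above would still score 0.8 accuracy while recall drops to 0, which is why recall and F1-score are reported alongside it.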
(4) Using the random-forest-based early prediction model for pressure sore adverse events from small unbalanced samples, pressure sore risk is predicted for different patients and the prediction results are output, providing data support and suggestions for subsequent treatment. The prediction results for some patients are shown in the following table:
TABLE 6 prediction results
Figure BDA0003177620500000121
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described and may be performed in other orders. Moreover, at least a portion of the steps in various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

Claims (6)

1. An early prediction method for an adverse event of pressure sore of a small unbalanced sample based on a random forest is characterized by specifically comprising the following steps:
acquiring clinical data, performing feature extraction and preprocessing on the data based on clinical experience to obtain preprocessed data, and dividing the preprocessed data into a training set and a test set;
training based on a bagging algorithm by taking the training set as input to obtain a plurality of decision trees to construct a random forest classifier, and predicting pressure sores in a voting mode;
and verifying and analyzing the random forest classifier by using the test set as input and adopting a cross-validation method.
2. The method for early prediction of pressure sore adverse events based on unbalanced small samples of random forests as claimed in claim 1, wherein the preprocessing specifically comprises:
carrying out structured processing on the text data by adopting a natural language processing method; and eliminating abnormal data, and completing missing data by using a model-based method.
3. The method of claim 1, wherein training a plurality of decision trees based on a bagging algorithm to construct a random forest classifier comprises:
drawing a plurality of samples from the training set using the Bootstrap method, the drawn training samples forming a sample set;
with the sample set as input, selecting the best of all attributes as a node, and building a CART decision tree according to a preset decision tree construction method;
and repeating the above steps multiple times to obtain a plurality of different decision trees, which together form the random forest classifier.
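The Bootstrap and repetition steps of claim 3 might look like the sketch below. It assumes that each bootstrap sample has the same size as the training set, which the claim does not state, and it takes the tree builder as a parameter. `majority_class_stub` is a hypothetical placeholder standing in for a real CART builder so that the example runs on its own.

```python
import random
from collections import Counter


def bootstrap_sample(rows, labels, rng):
    """Draw len(rows) samples with replacement (the Bootstrap step)."""
    picks = [rng.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in picks], [labels[i] for i in picks]


def build_forest(rows, labels, build_tree, n_trees=10, seed=0):
    """Train one tree per bootstrap sample and collect them into a forest."""
    rng = random.Random(seed)
    return [build_tree(*bootstrap_sample(rows, labels, rng))
            for _ in range(n_trees)]


def majority_class_stub(rows, labels):
    """Placeholder for a CART builder: always predicts the sample's top class."""
    top = Counter(labels).most_common(1)[0][0]
    return lambda row: top


# Hypothetical data: 20 one-feature rows, 5 positive and 15 negative.
rows = [[i] for i in range(20)]
labels = [1] * 5 + [0] * 15
forest = build_forest(rows, labels, majority_class_stub, n_trees=7)
```

Because every tree sees a different resampled set, the trees differ from one another even though they share one construction method.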
4. The early prediction method for pressure sore adverse events with small unbalanced samples based on a random forest as claimed in claim 3, wherein the preset decision tree construction method specifically comprises the following steps:
acquiring a sample set, and judging whether the number of samples is smaller than a sample number threshold and whether any features exist; if no features exist or the number of samples is smaller than the sample number threshold, returning the decision subtree and stopping recursion at the current node;
calculating the Gini coefficient of the sample set, and judging whether it is smaller than a set Gini coefficient threshold; if it is smaller, returning the decision subtree and stopping recursion at the current node;
identifying each feature type in the sample set, and calculating the Gini coefficient under each feature; and constructing the decision tree in a binary recursive manner, using the Gini coefficient of each feature as the criterion.
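The two stopping tests in claim 4 can be expressed compactly. This is a sketch only: the threshold values `min_samples=5` and `gini_threshold=0.05` and the function names are hypothetical, since the claim gives no numbers, and returning the most frequent class at a leaf is an assumed convention.

```python
from collections import Counter


def gini(labels):
    """Gini index of a label list: 1 minus the sum of squared class shares."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def should_stop(labels, n_features, min_samples=5, gini_threshold=0.05):
    """Return True when the current node should become a leaf."""
    # Test 1: too few samples, or no features left to split on.
    if len(labels) < min_samples or n_features == 0:
        return True
    # Test 2: the node is already nearly pure.
    return gini(labels) < gini_threshold


def leaf_label(labels):
    """Label returned for a leaf: the most frequent class at the node."""
    return Counter(labels).most_common(1)[0][0]
```

A recursive builder would call `should_stop` first at every node and search for a split only when it returns False.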
5. The method of claim 4, wherein the Gini coefficient is calculated as follows:
Figure FDA0003177620490000021
wherein m represents the number of sample classes, m being 2 for the pressure sore sample set, and p_i is the probability that a sample point belongs to the i-th class.
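The formula figure is not reproduced in this text. Assuming it shows the standard CART Gini index, which is consistent with the symbols m and p_i defined in the claim, the expression and a two-class worked example would read:

```latex
\mathrm{Gini}(p) = \sum_{i=1}^{m} p_i (1 - p_i) = 1 - \sum_{i=1}^{m} p_i^{2}
% Two-class case (m = 2), writing p for the pressure sore probability:
\mathrm{Gini}(p) = 2p(1 - p)
% Worked example with p = 0.1: 2 \times 0.1 \times 0.9 = 0.18
```

The value is 0 for a pure node and reaches its two-class maximum of 0.5 at p = 0.5.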
6. The early prediction method for pressure sore adverse events with small unbalanced samples based on a random forest as claimed in claim 4, wherein constructing the decision tree in a binary recursive manner, using the Gini coefficient of each feature as the criterion, specifically comprises the following steps:
selecting the optimal feature value using the Gini coefficient of each feature as the criterion, dividing the sample set into two parts, and establishing the left and right child nodes of the current node;
the Gini coefficient of the divided sample set is calculated as follows:
Figure FDA0003177620490000022
wherein |D| represents the total number of samples, and |D1| and |D2| represent the numbers of samples in the pressure sore and non-pressure sore categories, respectively.
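The split selection in claim 6 can be sketched as an exhaustive search over candidate thresholds. Because the formula figure is not reproduced in this text, the criterion used here is an assumption: the standard CART size-weighted Gini index of the two parts produced by a split, called `left` and `right` below. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a label list."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini index of a binary split."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_split(rows, labels):
    """Return (feature_index, threshold, score) with the lowest split Gini."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must leave samples on both sides
            score = split_gini(left, right)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
rows = [[1, 7], [2, 3], [3, 7], [8, 3], [9, 7], [10, 3]]
labels = [0, 0, 0, 1, 1, 1]
feature, threshold, score = best_split(rows, labels)
```

The rows with `row[feature] <= threshold` would form the left child node and the remaining rows the right child node, after which the same search is repeated on each child.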
CN202110837331.0A 2021-07-23 2021-07-23 Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest Pending CN113470819A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837331.0A CN113470819A (en) 2021-07-23 2021-07-23 Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110837331.0A CN113470819A (en) 2021-07-23 2021-07-23 Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest

Publications (1)

Publication Number Publication Date
CN113470819A (en) 2021-10-01

Family

ID=77882297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837331.0A Pending CN113470819A (en) 2021-07-23 2021-07-23 Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest

Country Status (1)

Country Link
CN (1) CN113470819A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114532995A (en) * 2022-03-25 2022-05-27 郑羽宇 Reminding method and reminding device for preventing pressure sores

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598726A (en) * 2019-07-16 2019-12-20 广东工业大学 Transmission tower bird damage risk prediction method based on random forest
AU2020100709A4 (en) * 2020-05-05 2020-06-11 Bao, Yuhang Mr A method of prediction model based on random forest algorithm
CN112382388A (en) * 2020-12-14 2021-02-19 中南大学 Early warning method for adverse pressure sore event

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114532995A (en) * 2022-03-25 2022-05-27 郑羽宇 Reminding method and reminding device for preventing pressure sores
CN114532995B (en) * 2022-03-25 2023-11-14 郑羽宇 Reminding method and reminding device for preventing pressure sores

Similar Documents

Publication Publication Date Title
US20220254493A1 (en) Chronic disease prediction system based on multi-task learning model
US20200250554A1 (en) Method and storage medium for predicting the dosage based on human physiological parameters
ȚĂRANU Data mining in healthcare: decision making and precision
Martins et al. Data mining for cardiovascular disease prediction
Karthiga et al. Early prediction of heart disease using decision tree algorithm
CN108492877B (en) Cardiovascular disease auxiliary prediction method based on DS evidence theory
CN110046757B (en) Outpatient clinic volume prediction system and prediction method based on LightGBM algorithm
CN113782183B (en) Device and method for predicting risk of pressure injury based on multi-algorithm fusion
CN110504031A (en) Cloud for Health behavior Intervention manages database building method and system
Adi et al. Stroke risk prediction model using machine learning
CN113470819A (en) Early prediction method for adverse event of pressure sore of small unbalanced sample based on random forest
Sudharson et al. Performance analysis of enhanced adaboost framework in multifacet medical dataset
CN114191665A (en) Method and device for classifying man-machine asynchronous phenomena in mechanical ventilation process
JP7365747B1 (en) Disease treatment process abnormality identification system based on hierarchical neural network
CN116798604A (en) Heating respiratory syndrome monitoring and early warning method and system based on multi-source data
CN113593703B (en) Device and method for constructing pressure injury risk prediction model
Xao et al. Fasting blood glucose change prediction model based on medical examination data and data mining techniques
Pelin et al. Prediction of human development index with health indicators using tree-based regression models
Sharma Data Mining Prediction Techniques in Health Care Sector
Rao et al. Extracting Insights and Prognosis of Corona Disease
CN112382388A (en) Early warning method for adverse pressure sore event
CN111243697A (en) Method and system for judging target object data based on neural network
Wang et al. Forecast of hospitalization costs of child patients based on machine learning methods and multiple classification
Saranya et al. BD-MDL: BIPOLAR DISORDER DETECTION USING MACHINE LEANRING AND DEEP LEARNING
CN117133459B (en) Machine learning-based postoperative intracranial infection prediction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination