CN114093523B - Construction method and application of new coronary pneumonia mild and severe disease prediction model - Google Patents

Construction method and application of new coronary pneumonia mild and severe disease prediction model

Info

Publication number
CN114093523B
Authority
CN
China
Prior art keywords
group
prediction
mild
severe
coronary pneumonia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111332027.7A
Other languages
Chinese (zh)
Other versions
CN114093523A (en
Inventor
李�杰
李鑫
埃德温·王
王亚东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202111332027.7A priority Critical patent/CN114093523B/en
Publication of CN114093523A publication Critical patent/CN114093523A/en
Application granted granted Critical
Publication of CN114093523B publication Critical patent/CN114093523B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/48Other medical applications
    • A61B5/4842Monitoring progression or stage of a disease
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271Specific aspects of physiological measurement analysis
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/72Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271Specific aspects of physiological measurement analysis
    • A61B5/7275Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Surgery (AREA)
  • Molecular Biology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Biophysics (AREA)
  • Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Psychiatry (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Signal Processing (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

A construction method and application of a prediction model for mild and severe novel coronavirus pneumonia (COVID-19), belonging to the technical field of medical disease prediction. To address the problems of existing techniques for predicting whether COVID-19 patients will develop mild or severe disease, a method for constructing such a prediction model is provided. The method comprises four steps: processing missing values and extreme values; constructing a feature set FS that reflects a patient's risk of developing mild or severe disease; constructing an integrated model EM that combines the complementary strengths of several baseline models; and extending missing features with functionally associated features. The results show that the construction method handles missing values and extreme values well, improves several performance measures of the prediction model, performs well on multiple prediction indices, and achieves stable performance. The prediction model obtained by the construction method can accurately distinguish mild from severe patients in the early stage of infection, which facilitates early intensive care and treatment for severe patients.

Description

Construction method and application of new coronary pneumonia mild and severe disease prediction model
Technical Field
The invention belongs to the technical field of medical disease prediction, and particularly relates to a construction method and application of a new coronary pneumonia mild-severe disease prediction model.
Background
In the early stage of infection, accurately predicting which novel coronavirus pneumonia (COVID-19) patients will develop mild or severe disease helps implement graded care and provide intensive care and treatment for severe patients in advance, thereby effectively improving the cure rate and reducing the burden on the medical system.
However, the performance and applicability of techniques for predicting mild and severe COVID-19 are affected by several factors, of which the following are the most important:
1) Processing missing values. The data set used to construct a prediction model often contains abnormal values (mainly missing values), which often lie far from the true values and bias the prediction result. Handling these abnormal values reasonably helps improve the performance of the prediction technique.
2) Constructing a feature set that reflects the patient's risk of developing mild or severe disease. Only some of the many clinical features reflect the patient's condition, and redundant features may even reduce the performance of the prediction method. Correctly constructing a feature set is therefore particularly important for prediction performance.
3) Constructing a high-performance prediction model. Different prediction models often make their judgments in different ways, so their performance differs across prediction targets. Combining the complementary strengths of several models in a reasonable way compensates for the weaknesses of a single model and yields better, more stable prediction performance.
4) Processing missing features. When a prediction method is applied, limits on detection technology and medical resources mean that some of the required clinical features may be unavailable for new patients. Extending these missing features helps broaden the application range of the prediction method.
Disclosure of Invention
In order to solve the technical problem, the invention provides a method for constructing a new coronary pneumonia mild-severe disease prediction model, which comprises the following steps:
S1, grouping the patient features by function and processing missing values and extreme values: dividing the patients into a mild group and a severe group according to patient survival; computing the 95th percentile of each feature and replacing extreme values with it to eliminate their interference; according to the functional similarity of the clinical features, dividing the features of the mild group and of the severe group into an independent feature group, a cardiovascular group, a liver and kidney function group and an inflammation group, and filling missing values within each group separately;
S2, constructing a feature set FS that reflects the patient's risk of developing mild or severe disease: based on a genetic algorithm, first encoding a group of binary bit strings, each with a length equal to the number of features in the original data set, where each bit indicates whether the corresponding feature is selected; selecting 5 baseline models that predict well but differ in their prediction results; constructing one feature set for each baseline model; and finally merging the features that appear in more than half of the baseline-model feature sets to obtain the final feature set FS;
S3, constructing an integrated model EM that combines the complementary strengths of the baseline models: linearly combining the 5 baseline models obtained in S2, which predict well but differ in their prediction results, using a group of coefficients to obtain the integrated model EM;
S4, extending missing features with functionally associated features and verifying the prediction algorithm: for features in FS that are missing from the external validation set, substituting functionally related features, and repeating steps S1 to S3 on the external validation set to verify the prediction algorithm.
Further defined, the missing-value filling method in S1 is as follows: for a missing value of a feature in the cardiovascular group, the liver and kidney function group or the inflammation group, the value is estimated from the 3 individuals in the same group whose other feature values are most similar. That is, if an individual $X_k$ in a group has features $\langle x_{1k}, x_{2k}, \ldots, x_{nk} \rangle$ and the value $x_{nk}$ is missing, $x_{nk}$ is estimated as the mean of feature $n$ over the 3 individuals in the group whose distance $d$ to $X_k$ on the other features is smallest. The distance $d$ between any two individuals $X_1$ and $X_2$ is defined as:
Figure GDA0003620202160000021
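The filling rule above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the distance formula survives only as an image reference in this text, so the sketch assumes a Euclidean distance over the features that both individuals have observed. The function names `distance` and `impute_missing` and the use of `None` for a missing value are likewise assumptions.

```python
import math

def distance(a, b, skip):
    """Assumed Euclidean distance over features observed in both rows,
    ignoring the feature at index `skip` (the one being filled)."""
    total = 0.0
    for i, (u, v) in enumerate(zip(a, b)):
        if i == skip or u is None or v is None:
            continue
        total += (u - v) ** 2
    return math.sqrt(total)

def impute_missing(rows, k=3):
    """Fill each None with the mean of that feature over the k nearest
    rows of the same group (rows = individuals of one functional group)."""
    filled = [list(r) for r in rows]
    for r_idx, row in enumerate(rows):
        for f_idx, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other individuals for which this feature is observed.
            donors = [other for o_idx, other in enumerate(rows)
                      if o_idx != r_idx and other[f_idx] is not None]
            donors.sort(key=lambda other: distance(row, other, f_idx))
            nearest = donors[:k]
            if nearest:
                filled[r_idx][f_idx] = sum(d[f_idx] for d in nearest) / len(nearest)
    return filled
```

Distances are always computed on the original rows, so a value filled earlier never influences a later estimate. In practice the routine would be called once per functional group and per patient group (mild, severe).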
Further defined, the 5 baseline models in S2 that predict well but differ in their prediction results are a gradient boosting decision tree, an extreme gradient boosting decision tree, a random forest, a linear regression and a support vector machine.
Further defined, the method in S2 for constructing a feature set for each baseline model is as follows: taking the area under the ROC curve (AUC) of each baseline model's prediction output as the optimization target, 200 rounds of iteration are performed; in each round the binary bit strings whose AUC ranks in the top 30% are retained, new bit strings are generated by recombination and mutation, and iteration continues so as to maximize the target.
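A compact sketch of this selection loop is given below. Only the 200 rounds, the top-30% retention and the use of recombination and mutation come from the text; the population size, the mutation rate, single-point recombination and the name `evolve_feature_mask` are assumptions. The `fitness` callable stands in for "AUC of one baseline model trained on the selected features" and is supplied by the caller.

```python
import random

def evolve_feature_mask(n_features, fitness, pop_size=20, rounds=200,
                        keep_frac=0.3, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask (n_features >= 2) that maximizes fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    n_keep = max(2, int(pop_size * keep_frac))
    for _ in range(rounds):
        # Retain the bit strings whose fitness ranks in the top 30%.
        population.sort(key=fitness, reverse=True)
        survivors = population[:n_keep]
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randint(1, n_features - 1)   # single-point recombination
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)
```

Because the survivors are carried over unchanged, the best fitness in the population never decreases from one round to the next. Running the routine once per baseline model yields the five per-model feature sets described in S2.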
Further defined, the coefficients in S3 are calculated with a genetic algorithm: first a group of binary bit strings is encoded, each of which can be decoded to a decimal in the range 0 to 1 with a precision of 8 digits after the decimal point; taking the area under the ROC curve of the EM model's prediction output as the optimization target, the coefficients that maximize this area are found iteratively; new bit strings are generated by the same recombination and mutation method as in S2.
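The text does not state how a bit string is decoded into a coefficient. One common genetic-algorithm encoding, shown here purely as an assumption, maps the integer value of the bit string linearly onto the interval from 0 to 1; 27 bits are then the fewest that resolve steps of 1e-8. The names `N_BITS` and `decode` are illustrative.

```python
import math

PRECISION_DIGITS = 8
# Fewest bits whose 2**n grid points cover [0, 1] in steps of at most 1e-8.
N_BITS = math.ceil(math.log2(10 ** PRECISION_DIGITS + 1))

def decode(bits):
    """Map a list of 0/1 bits linearly onto a decimal in [0, 1]."""
    as_int = int("".join(str(b) for b in bits), 2)
    return round(as_int / (2 ** len(bits) - 1), PRECISION_DIGITS)
```

With five baseline models, one chromosome would concatenate five such bit strings, one per coefficient.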
Further defined, in S3 the prediction score $prob_{em}$ of the integrated model EM for each patient is computed from the predicted value $prob_i$ output by each baseline model $m_i$ and its corresponding coefficient $c_i$, according to the following formula:
Figure GDA0003620202160000022
The invention also provides a method for predicting mild and severe novel coronavirus pneumonia (COVID-19), based on the mild/severe prediction model obtained by the construction method of any one of claims 1 to 6, the method comprising the following steps:
(1) inputting the clinical features of a COVID-19 patient into the mild/severe prediction model;
(2) calculating and outputting the score $prob_{em}$ of the tested patient using the prediction-score formula of the model, and classifying the tested patient as mild or severe according to $prob_{em}$.
Further defined, the criteria for mild and severe are as follows: when $0 < prob_{em} < 0.5$, the tested patient is classified as mild COVID-19; when $prob_{em} \geq 0.5$, the tested patient is classified as severe COVID-19.
The invention has the beneficial effects that:
The invention provides a method for constructing a prediction model for mild and severe novel coronavirus pneumonia (COVID-19). The method was first trained and tested on a larger data set and then verified on a larger independent validation set. The results show that: a) the data preprocessing method handles missing values and extreme values well; b) the constructed feature set helps improve several performance measures of the prediction model; c) the integrated model produced by the complementary fusion method performs well on multiple prediction indices; d) combined with the associated-feature extension method, the prediction algorithm still achieves stable performance on an independent validation set. The prediction algorithm therefore has good application prospects in helping to overcome the COVID-19 epidemic. The construction method can accurately predict mild and severe patients in the early stage of infection, which facilitates graded care and early intensive care and treatment for severe patients, thereby effectively improving the cure rate of COVID-19 patients and reducing the burden on the medical system.
Drawings
Fig. 1 is a modeling block diagram of a new coronary pneumonia mild and severe prediction model.
Detailed Description
Example 1: construction method of new coronary pneumonia mild and severe disease prediction model
The method first trains and tests on a sample set (cohort 1) of confirmed novel coronavirus pneumonia (COVID-19) patients to construct the prediction model. Each patient in the sample set has 20 clinical features (age, blood oxygen saturation, body temperature, platelets, mean arterial pressure, blood urea nitrogen, creatine, leukocytes, sodium ions, lymphocytes, international normalized ratio, D-dimer, glucose, glutamic-oxaloacetic transaminase, glutamic-pyruvic transaminase, interleukin-6, C-reactive protein, ferritin, procalcitonin, troponin), and the cure rate among these patients is 75.6%. The model is then verified by training and testing on a second sample set (cohort 2) of confirmed COVID-19 patients from a different source, in which the cure rate is 95.8%. The model is constructed as follows:
Step one: grouping the patient features by function and processing missing values and extreme values
The patients in cohort 1 were divided into a mild group and a severe group according to their treatment outcome. The 95th percentile of each feature was computed in each of the two groups and used to replace extreme values, eliminating their interference. According to the functional similarity of the clinical features, the features of the mild group and of the severe group were each divided into an independent feature group, a cardiovascular group, a liver and kidney function group and an inflammation group, and missing values were filled within each group separately. For a missing value of a feature in the cardiovascular group, the liver and kidney function group or the inflammation group, the value was estimated from the 3 individuals in the same group whose other feature values are most similar. Specifically, if an individual $X_k$ in a group has features $\langle x_{1k}, x_{2k}, \ldots, x_{nk} \rangle$ and the value $x_{nk}$ is missing, $x_{nk}$ is estimated as the mean of feature $n$ over the 3 individuals in the group whose distance $d$ to $X_k$ on the other features is smallest. The distance $d$ between any two individuals $X_1$ and $X_2$ is defined as:
Figure GDA0003620202160000041
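The extreme-value handling of step one can be sketched as follows. The text does not say how an extreme value is identified, so this sketch assumes that any observed value above the 95th percentile of its feature, within one patient group, is replaced by that percentile. The helper names and the linear-interpolation percentile are illustrative choices, and `None` marks a missing value that is left for the filling step.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between order statistics."""
    data = sorted(values)
    if not data:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def cap_extremes(values, q=95):
    """Replace observed values above the q-th percentile by that percentile.
    Missing values (None) are passed through unchanged."""
    observed = [v for v in values if v is not None]
    limit = percentile(observed, q)
    return [v if v is None else min(v, limit) for v in values]
```

Capping before filling keeps a single outlier from dominating the mean of the three nearest individuals.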
Step two: constructing a feature set FS that reflects the patient's risk of developing mild or severe disease
A feature-set construction method was designed on the basis of a genetic algorithm. First, a group of binary bit strings is encoded, each with a length equal to the number of features in the original data set; each bit indicates whether the corresponding feature is selected. Five baseline models that predict well but differ in their prediction results are selected, and for each of them the area under the ROC curve (AUC) of its prediction output is taken as the optimization target over 200 rounds of iteration. In each round, the bit strings whose AUC ranks in the top 30% are retained, new bit strings are generated by recombination and mutation, and iteration continues so as to maximize the target (AUC). One feature set is thus constructed for each baseline model. The final feature set FS consists of the features that appear in more than half of the baseline-model feature sets.
The 5 baseline models that predict well but differ in their prediction results are a gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), a random forest (RF), a linear regression (LR) and a support vector machine (SVM).
The finally constructed feature set FS comprises 14 clinical features: age, blood oxygen saturation, platelets, mean arterial pressure, leukocytes, lymphocytes, international normalized ratio, D-dimer, glucose, glutamic-pyruvic transaminase, interleukin-6, C-reactive protein, procalcitonin, troponin.
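The merging rule of step two, keeping the features that appear in more than half of the baseline-model feature sets, amounts to a majority vote. A small sketch follows; the function name `merge_feature_sets` and the per-model sets in the usage note are made up for illustration and are not the sets found in the study.

```python
from collections import Counter

def merge_feature_sets(feature_sets):
    """Return the features present in more than half of the given sets."""
    counts = Counter(f for fs in feature_sets for f in set(fs))
    threshold = len(feature_sets) / 2
    return sorted(f for f, c in counts.items() if c > threshold)
```

For example, with five hypothetical per-model sets in which `age` occurs five times, `d_dimer` three times and `ferritin` twice, the call keeps `age` and `d_dimer` and drops `ferritin`, since three of five is the smallest majority.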
Step three: constructing an integrated model EM that combines the complementary strengths of the baseline models
The 5 prediction models obtained in step two, which predict well but differ in their prediction results, were linearly combined using a group of coefficients. Each baseline model underwent 100 rounds of half-and-half cross-validation on the data restricted to the features in FS, and the performance scores of each model in each test were recorded. The prediction score $prob_{em}$ of the EM model for each patient is computed from the predicted value $prob_i$ output by each baseline model $m_i$ and its corresponding coefficient $c_i$, according to the following formula:
Figure GDA0003620202160000042
The coefficients are calculated with a genetic algorithm: first a group of binary bit strings is encoded, each of which can be decoded to a decimal in the range 0 to 1 with a precision of 8 digits after the decimal point. Taking the AUC of the EM model's prediction output as the optimization target, the coefficients that maximize the AUC are found iteratively. New bit strings are generated by the same recombination and mutation method as in step two.
The coefficients finally obtained for the baseline models are: 0.39620338 (GBDT), 0.9574559 (XGBoost), 0.26222304 (RF), 0.0315571 (LR) and 0.24549838 (SVM). The prediction score of the EM model for each patient is the average of the predicted values output by the baseline models, weighted by the corresponding coefficients.
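The weighted combination can be illustrated with the coefficients listed above. The exact formula survives only as an image reference in this text, so the normalization used here (dividing by the sum of the coefficients, which keeps the score between 0 and 1) is an assumption, as are the names `COEFFICIENTS` and `ensemble_score`.

```python
# Coefficients reported for the five baseline models.
COEFFICIENTS = {
    "GBDT": 0.39620338,
    "XGBoost": 0.9574559,
    "RF": 0.26222304,
    "LR": 0.0315571,
    "SVM": 0.24549838,
}

def ensemble_score(probs, coefficients=COEFFICIENTS):
    """Assumed form: coefficient-weighted average of the baseline outputs.
    `probs` maps each model name to its predicted probability for one patient."""
    total_weight = sum(coefficients.values())
    weighted = sum(coefficients[name] * probs[name] for name in coefficients)
    return weighted / total_weight
```

Under this assumption XGBoost carries roughly half of the total weight and LR under 2%, so the ensemble score follows the XGBoost output most closely.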
The scores of the EM model on cohort 1 for each prediction index are: accuracy 0.868, AUC 0.907, precision 0.804 and recall 0.605.
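For reference, the four indices used in this evaluation (accuracy, AUC, precision and recall) can be computed from true labels and predicted scores as sketched below. The 0.5 decision threshold and the label convention (1 for severe, 0 for mild) are assumptions of the sketch, and the AUC is obtained by pairwise comparison of positive and negative scores, which equals the area under the ROC curve.

```python
def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision and recall at a fixed decision threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def auc(y_true, scores):
    """Probability that a random positive scores higher than a random
    negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A recall well below the precision means the model misses more severe patients than it raises false alarms, which is worth weighing when choosing the threshold.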
Step four: extending missing features with functionally associated features and verifying the prediction algorithm
Following the associated-feature extension method, features identical or similar to those in FS are selected from the features of cohort 2, and the feature-set construction method of step two is then used to construct the feature set of cohort 2. A subset of cohort 1 (subset 1) whose age distribution matches that of the patients in cohort 2 was selected and subjected to 100 rounds of half-and-half cross-validation. The patients in cohort 2 were likewise subjected to 100 rounds of half-and-half cross-validation; in each round, a subset of patients (subset 2) with the same proportion of non-cured patients as cohort 1 was drawn according to time of diagnosis, and the performance of the method on the two subsets was compared.
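The repeated half-and-half cross-validation can be sketched as below. The text does not define the procedure further, so this sketch assumes that each round shuffles the patients, splits them into two halves, and uses each half once for training and once for testing. The generator name `half_and_half_rounds` is illustrative, and the matching of age range and non-cured proportion described above is not reproduced here.

```python
import random

def half_and_half_rounds(n_samples, rounds=100, seed=0):
    """Yield (train_indices, test_indices) pairs: two per round,
    with the roles of the two halves swapped."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(rounds):
        rng.shuffle(indices)
        first, second = indices[:half], indices[half:]
        yield first, second
        yield second, first
```

Averaging each index over all splits gives a figure that depends less on any single partition of the patients.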
The scores of the EM model on subset 1 of cohort 1 are: accuracy 0.854, AUC 0.893, precision 0.799 and recall 0.588. Because subset 1 is restricted to a particular age range of cohort 1, the model performs slightly worse on it than on cohort 1 as a whole. The scores of the EM model on subset 2 are: accuracy 0.810, AUC 0.870, precision 0.683 and recall 0.511. Although the features of the data set to which subset 2 belongs (cohort 2) are not exactly the same as those of cohort 1, the model still predicts well on it thanks to the associated-feature extension method.
Example 2: method for predicting severe and mild new coronary pneumonia
Based on the mild/severe prediction model for novel coronavirus pneumonia (COVID-19) obtained by the construction method of Example 1, the method for predicting mild and severe COVID-19 comprises the following steps:
(1) inputting the clinical features of a COVID-19 patient into the mild/severe prediction model;
(2) calculating and outputting the score $prob_{em}$ of the tested patient using the prediction-score formula of the model, and classifying the tested patient as mild or severe according to $prob_{em}$.
The criteria for mild and severe are as follows: when $0 < prob_{em} < 0.5$, the tested patient is classified as mild COVID-19; when $prob_{em} \geq 0.5$, the tested patient is classified as severe COVID-19.
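The decision rule is a fixed threshold on the ensemble score, sketched below. The function name `classify` is illustrative. The stated rule does not cover a score of exactly 0, which this sketch treats as mild by assumption.

```python
def classify(prob_em, threshold=0.5):
    """Label a patient from the ensemble score: below the threshold is
    'mild', at or above it is 'severe'."""
    if not 0.0 <= prob_em <= 1.0:
        raise ValueError("prob_em must lie between 0 and 1")
    return "severe" if prob_em >= threshold else "mild"
```

A score of 0.5 itself falls on the severe side, in line with the rule that a score greater than or equal to 0.5 indicates severe disease.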
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A construction method of a new coronary pneumonia mild-severe disease prediction model is characterized by comprising the following steps:
S1, grouping the patient features by function and processing missing values and extreme values: dividing the patients into a mild group and a severe group according to patient survival; computing the 95th percentile of each feature and replacing extreme values with it to eliminate their interference; according to the functional similarity of the clinical features, dividing the features of the mild group and of the severe group into an independent feature group, a cardiovascular group, a liver and kidney function group and an inflammation group, and filling missing values within each group separately;
S2, constructing a feature set FS that reflects the patient's risk of developing mild or severe disease: based on a genetic algorithm, first encoding a group of binary bit strings, each with a length equal to the number of features in the original data set, where each bit indicates whether the corresponding feature is selected; selecting 5 baseline models that predict well but differ in their prediction results; for each baseline model, taking the area under the ROC curve (AUC) of its prediction output as the optimization target and performing 200 rounds of iteration to construct one feature set, retaining in each round the bit strings whose AUC ranks in the top 30%, generating new bit strings by recombination and mutation, and continuing to iterate so as to maximize the target; and finally merging the features that appear in more than half of the baseline-model feature sets to obtain the final feature set FS; the 5 baseline models that predict well but differ in their prediction results are a gradient boosting decision tree, extreme gradient boosting, a random forest, a linear regression and a support vector machine; the feature set FS has 14 clinical features: age, blood oxygen saturation, platelets, mean arterial pressure, leukocytes, lymphocytes, international normalized ratio, D-dimer, glucose, glutamic-pyruvic transaminase, interleukin-6, C-reactive protein, procalcitonin, troponin;
S3, constructing an integrated model EM that combines the complementary strengths of the baseline models: linearly combining the 5 baseline models obtained in S2, which predict well but differ in their prediction results, using a group of coefficients to obtain the integrated model EM;
S4, extending missing features with functionally associated features and verifying the prediction algorithm: for features in FS that are missing from the external validation set, substituting functionally related features, and repeating steps S1 to S3 on the external validation set to verify the prediction algorithm.
2. The construction method according to claim 1, wherein the missing-value filling method in S1 is as follows: for a missing value of a feature in the cardiovascular group, the liver and kidney function group or the inflammation group, the value is estimated from the 3 individuals in the same group whose other feature values are most similar; that is, if an individual $X_k$ in a group has features $\langle x_{1k}, x_{2k}, \ldots, x_{nk} \rangle$ and the value $x_{nk}$ is missing, $x_{nk}$ is estimated as the mean of feature $n$ over the 3 individuals in the group whose distance $d$ to $X_k$ on the other features is smallest; the distance $d$ between any two individuals $X_1$ and $X_2$ is defined as:
Figure FDA0003620202150000011
3. The construction method according to claim 1, wherein the coefficients in S3 are calculated with a genetic algorithm: first a group of binary bit strings is encoded, each of which is decoded to a decimal in the range 0 to 1 with a precision of 8 digits after the decimal point; taking the area under the ROC curve of the prediction output of the integrated model EM as the optimization target, the coefficients that maximize this area are found iteratively; and new bit strings are generated by the same recombination and mutation method as in S2.
4. The construction method according to claim 3, wherein in S3 the prediction score $prob_{em}$ of the integrated model EM for each patient is computed from the predicted value $prob_i$ output by each baseline model $m_i$ and its corresponding coefficient $c_i$, according to the following formula:
Figure FDA0003620202150000021
5. A method for predicting mild and severe new coronary pneumonia, based on the mild and severe new coronary pneumonia prediction model obtained by the construction method according to any one of claims 1 to 4, the method comprising the steps of:
(1) inputting the clinical features of a new coronary pneumonia patient into the mild and severe new coronary pneumonia prediction model;
(2) calculating and outputting the score prob_em of the tested patient according to the prediction score formula provided by the mild and severe new coronary pneumonia prediction model, and classifying the tested patient as mild or severe according to prob_em.
6. The method according to claim 5, wherein the criteria for mild and severe classification are: when 0 < prob_em < 0.5, the tested patient has mild new coronary pneumonia; when prob_em ≥ 0.5, the tested patient has severe new coronary pneumonia.
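The scoring and classification of claims 4 to 6 can be sketched as follows. The formula of claim 4 appears only as an image in this text, so the plain weighted sum of c_i and prob_i used here is an assumption; the function names and the example values are hypothetical.

```python
def ensemble_score(probs, coeffs):
    """Assumed combination: sum over baseline models of c_i * prob_i."""
    if len(probs) != len(coeffs):
        raise ValueError("one coefficient is required per baseline model")
    return sum(c * p for c, p in zip(coeffs, probs))

def classify(prob_em, threshold=0.5):
    """Severe when prob_em >= 0.5, mild below it (claim 6).
    The claim states the mild range as 0 < prob_em < 0.5; a score of
    exactly 0 is treated as mild here, which is an assumption."""
    return "severe" if prob_em >= threshold else "mild"

# Hypothetical outputs of three baseline models and their coefficients.
probs = [0.2, 0.7, 0.9]
coeffs = [0.5, 0.3, 0.2]
score = ensemble_score(probs, coeffs)   # approximately 0.49
label = classify(score)                 # below the 0.5 threshold
```

Because the threshold is inclusive on the severe side, a score of exactly 0.5 is classified as severe.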
CN202111332027.7A 2021-11-11 2021-11-11 Construction method and application of new coronary pneumonia mild and severe disease prediction model Active CN114093523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111332027.7A CN114093523B (en) 2021-11-11 2021-11-11 Construction method and application of new coronary pneumonia mild and severe disease prediction model

Publications (2)

Publication Number Publication Date
CN114093523A (en) 2022-02-25
CN114093523B (en) 2022-06-24

Family

ID=80299827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111332027.7A Active CN114093523B (en) 2021-11-11 2021-11-11 Construction method and application of new coronary pneumonia mild and severe disease prediction model

Country Status (1)

Country Link
CN (1) CN114093523B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974565A (en) * 2022-05-17 2022-08-30 上海市第四人民医院 Refined management and control method and system for diagnosis and treatment stages of new coronary patients

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681757A (en) * 2020-06-03 2020-09-18 广西壮族自治区人民医院 25(OH) D level-based prediction system for severity of new coronary pneumonia disease and construction and use method thereof

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101208602A (en) * 2005-04-15 2008-06-25 贝克顿迪金森公司 Diagnosis of sepsis
US20180025290A1 (en) * 2016-07-22 2018-01-25 Edwards Lifesciences Corporation Predictive risk model optimization
WO2018229019A1 (en) * 2017-06-12 2018-12-20 Koninklijke Philips N.V. Risk assessment of disseminated intravascular coagulation
EP3923785B1 (en) * 2019-02-14 2024-05-22 Baylor College of Medicine Method of predicting fluid responsiveness in patients
CN111524599A (en) * 2020-04-24 2020-08-11 中国地质大学(武汉) New coronary pneumonia data processing method and prediction system based on machine learning
CN112837822B (en) * 2020-09-24 2023-05-02 广州市疾病预防控制中心(广州市卫生检验中心、广州市食品安全风险监测与评估中心、广州医科大学公共卫生研究院) Marker for predicting light-to-heavy progress of patient with COVID-19, kit and establishment method
CN112652398A (en) * 2020-12-22 2021-04-13 浙江大学 New coronary pneumonia severe prediction method and system based on machine learning algorithm
CN112652361B (en) * 2020-12-29 2023-09-05 中国医科大学附属盛京医院 GBDT model-based myeloma high-risk screening method and application thereof
CN112967810A (en) * 2021-05-07 2021-06-15 四川大学华西医院 New coronavirus pneumonia severe prediction system and method
CN113409947B (en) * 2021-07-29 2022-02-01 四川大学华西医院 New coronary pneumonia severe change prediction model and system, and establishment method and prediction method thereof

Also Published As

Publication number Publication date
CN114093523A (en) 2022-02-25

Similar Documents

Publication Publication Date Title
CN109036553B (en) Disease prediction method based on automatic extraction of medical expert knowledge
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN107742061B (en) Protein interaction prediction method, system and device
CN111968741B (en) Deep learning and integrated learning-based diabetes complication high-risk early warning system
Bhanot et al. A robust meta‐classification strategy for cancer detection from MS data
CN113505225B (en) Small sample medical relation classification method based on multi-layer attention mechanism
CN114093523B (en) Construction method and application of new coronary pneumonia mild and severe disease prediction model
CN116151485B (en) Method and system for predicting inverse facts and evaluating effects
CN113505477A (en) Process industry soft measurement data supplementing method based on SVAE-WGAN
CN114358169B (en) Colorectal cancer detection system based on XGBoost
Hong et al. Forward regression for Cox models with high-dimensional covariates
Moreira et al. Performance evaluation of predictive classifiers for pregnancy care
CN114925767A (en) Scene generation method and device based on variational self-encoder
CN118312816A (en) Cluster weighted clustering integrated medical data processing method and system based on member selection
Sudharson et al. Enhancing the Efficiency of Lung Disease Prediction using CatBoost and Expectation Maximization Algorithms
Zhang et al. QD-compressor: A quantization-based delta compression framework for deep neural networks
Bond et al. An unsupervised machine learning approach for ground‐motion spectra clustering and selection
CN118038959A (en) RNA modification prediction model construction method, mRNA and RNA modification prediction method
CN104616027A (en) Non-adjacent graph structure sparse face recognizing method
CN111488903A (en) Decision tree feature selection method based on feature weight
CN112784886B (en) Brain image classification method based on multi-layer maximum spanning tree graph core
CN113988083A (en) Factual information coding and evaluating method for shipping news abstract generation
CN110265151B (en) Learning method based on heterogeneous temporal data in EHR
Sherubha et al. Adaptive boosting model for breast cancer prediction
Dhamala et al. Multivariate time-series similarity assessment via unsupervised representation learning and stratified locality sensitive hashing: Application to early acute hypotensive episode detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant