CN113223708A - Method for constructing disease risk prediction model and related equipment - Google Patents

Publication number
CN113223708A
Authority
CN
China
Prior art keywords
copd
sample
risk prediction
prediction model
anxiety
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110606089.6A
Other languages
Chinese (zh)
Inventor
唐婷玉
黄勍栋
周晓曦
陆晓玲
吴海燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Hospital
Original Assignee
Zhejiang Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Hospital
Priority to CN202110606089.6A
Publication of CN113223708A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method, a device, and a storage medium for constructing a disease risk prediction model, as well as a disease risk prediction device and a storage medium. A disease risk prediction model whose independent-variable set comprises the indexes gender, marital status, education level, long-term residence, average family income, medical expense payment mode, direct economic burden of treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last year, regular inhalant therapy, and home oxygen therapy can effectively judge whether a COPD patient suffers from depression or anxiety.

Description

Method for constructing disease risk prediction model and related equipment
Technical Field
The invention relates to the technical field of medical information prediction, in particular to a method, equipment and a storage medium for constructing a disease risk prediction model, and disease risk prediction equipment and a storage medium.
Background
Chronic obstructive pulmonary disease (COPD) is a common, preventable, and treatable disease characterized by persistent airflow limitation. The airflow limitation is progressive and is associated with an enhanced chronic inflammatory response of the airways and lungs to noxious particles or gases. COPD mainly affects the lungs but can also cause various systemic (extrapulmonary) adverse effects, of which depression and anxiety are among the most important. Once COPD patients develop anxiety or depression, their treatment compliance drops sharply and acute exacerbations become frequent, leading to a marked decline in quality of life and a significantly increased risk of death.
Therefore, constructing a model that can effectively predict whether COPD patients suffer from depression or anxiety, so as to guide subsequent intervention and treatment schemes, is a technical problem to be solved at present.
Disclosure of Invention
In view of the above problems, the present invention provides a method, a device, and a storage medium for constructing a disease risk prediction model, as well as a disease risk prediction device and a storage medium.
In a first aspect, an embodiment of the present invention provides a method for constructing a disease risk prediction model, including:
obtaining a target sample set, wherein each sample in the target sample set comprises an independent-variable set and a label, the label describes whether the COPD patient corresponding to the sample has depression or anxiety, and the independent-variable set comprises indexes of the COPD patient corresponding to the sample; and
based on the target sample set, performing model training by using a machine learning algorithm to obtain a disease risk prediction model;
wherein the disease risk prediction model is used to predict whether a COPD patient suffers from depression or anxiety; the indexes include: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
Further, the obtaining a target sample set includes:
obtaining various clinical indexes of a sample;
counting, for each clinical index, the number of samples in which that clinical index is missing;
screening candidate indexes from each clinical index according to the deletion quantity corresponding to each clinical index;
filling the candidate indexes missing in each sample;
forming an independent-variable set corresponding to each sample based on the candidate indexes of each filled sample; and
for each sample, if the sample comprises a field representing an anxiety-free state and a field representing a depression-free state, determining that the label of the sample indicates that the corresponding COPD patient does not suffer from depression or anxiety; otherwise, determining that the label of the sample indicates that the corresponding COPD patient suffers from depression or anxiety.
Further, performing model training by using a machine learning algorithm based on the target sample set to obtain a disease risk prediction model includes:
randomly dividing the target sample set into a training sample subset and a test sample subset according to a preset proportion;
performing model training by using a logistic regression algorithm based on the training sample subset to obtain weights corresponding to all indexes, and obtaining a disease risk prediction model based on the weights corresponding to all indexes; and
performing performance evaluation on the disease risk prediction model by using the training sample subset and the test sample subset, and performing model training again if the performance evaluation result does not meet the requirement.
In a second aspect, an embodiment of the present invention further provides a device for constructing a disease risk prediction model, where the device includes: a memory and a processor; the memory to store program instructions; the processor is configured to call the program instructions, and when the program instructions are executed, the processor is configured to perform any one of the above construction methods.
In a third aspect, an embodiment of the present invention further provides a disease risk prediction apparatus, where the apparatus includes: a memory and a processor;
the memory to store program instructions;
the processor, configured to invoke the program instructions, and when the program instructions are executed, configured to:
acquiring an independent-variable set of a COPD patient to be tested;
inputting the independent-variable set of the COPD patient to be tested into the disease risk prediction model obtained according to any one of the construction methods, and obtaining a risk score for the COPD patient to be tested suffering from depression or anxiety; and
judging whether the COPD patient to be tested has depression or anxiety according to the risk score.
In a fourth aspect, an embodiment of the present invention further provides a disease risk prediction apparatus, including: a memory and a processor;
the memory to store program instructions;
the processor, configured to invoke the program instructions, and when the program instructions are executed, configured to:
acquiring an independent-variable set of a COPD patient to be tested;
inputting the independent-variable set of the COPD patient to be tested into a disease risk prediction model to obtain a risk score for the COPD patient to be tested suffering from depression or anxiety; and
judging whether the COPD patient to be tested suffers from depression or anxiety according to the risk score;
wherein the independent-variable set comprises indexes of the COPD patient to be tested, and the indexes comprise: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
Further, in the disease risk prediction device, the disease risk prediction model is:
y ═ sex (± (-1.135) + age ± + marking ± (-0.8872) + marking ±) 0.2666+ edu ± + edu [ (-) - + live (±) + 17.07) + income [ (-15.75) + income [ (-15.9) + income [ (-15.73) + patent [ (-1.59) + patent [ (+ 14.39+ patent [ (-2.44) + ebureden [ (+ 0.11) + cadence + clock + 0 + compact ± + factor + 2.44 + -);
wherein Y represents the risk score, x represents the multiplication sign, and the indexes in the independent-variable set take the following values:
sex: when the sex is female, sex2 = 0; when the sex is male, sex2 = 1;
age: age is the specific age value;
marital status: when widowed, marriage2 = 0 and marriage3 = 0; when single/divorced, marriage2 = 1 and marriage3 = 0; when married/de facto marriage, marriage2 = 0 and marriage3 = 1;
education level: when primary school or no schooling, edu2 = 0, edu3 = 0, and edu4 = 0; when junior middle school, edu2 = 1, edu3 = 0, and edu4 = 0; when high school or secondary specialized school, edu2 = 0, edu3 = 1, and edu4 = 0; when college or university, edu2 = 0, edu3 = 0, and edu4 = 1;
long-term residence: when rural, live2 = 0; when town, live2 = 1;
annual family income: when 20001-30000, income2 = 0, income3 = 0, and income4 = 0; when 30001-40000, income2 = 1, income3 = 0, and income4 = 0; when the annual income of the family is 40001-; when 50001 or more, income2 = 0, income3 = 0, and income4 = 1;
medical payment mode: when publicly funded, payment2 = 0, payment3 = 0, and payment4 = 0; when medical insurance, payment2 = 1, payment3 = 0, and payment4 = 0; when New Rural Cooperative Medical Scheme, payment2 = 0, payment3 = 1, and payment4 = 0; when self-pay, payment2 = 0, payment3 = 0, and payment4 = 1;
direct economic burden for treating COPD over the past year: when 3000 or below, eburden2 = 0, eburden3 = 0, and eburden4 = 0; when 3001-6000, eburden2 = 1, eburden3 = 0, and eburden4 = 0; when the direct economic burden for treating COPD over the past year was 6001-; when 9001 or above, eburden2 = 0, eburden3 = 0, and eburden4 = 1;
smoking: when the patient does not smoke, smoke2 = 0; when the patient smokes, smoke2 = 1;
COPD course: when 0-5 years, copdYear2 = 0, copdYear3 = 0, and copdYear4 = 0; when 5-10 years, copdYear2 = 1, copdYear3 = 0, and copdYear4 = 0; when 10-15 years, copdYear2 = 0, copdYear3 = 1, and copdYear4 = 0; when >15 years, copdYear2 = 0, copdYear3 = 0, and copdYear4 = 1;
number of acute exacerbations of COPD in the last 1 year: when <2, exacerbation2 = 0; when ≥2, exacerbation2 = 1;
regular inhalation therapy: when absent, inhalation2 = 0; when present, inhalation2 = 1;
home oxygen therapy: when absent, oxygen2 = 0; when present, oxygen2 = 1.
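The dummy-variable coding described above can be illustrated with a short Python sketch. It covers only the encoding step for a few of the indexes and makes no claim about the model coefficients. The function name `encode_patient`, the `LEVELS` table, and the level labels are hypothetical names chosen for illustration; the assumption is that the first level of each index is the reference level and maps to all zeros.

```python
# Illustrative sketch only: names and level labels are assumptions.
# The first level of each categorical index is the reference level and
# maps to all zeros (e.g. widowed -> marriage2 = 0, marriage3 = 0).
LEVELS = {
    "marriage": ["widowed", "single/divorced", "married/de facto"],
    "edu": ["primary or none", "junior middle", "high/secondary", "college/university"],
    "live": ["rural", "town"],
    "copdYear": ["0-5", "5-10", "10-15", ">15"],
}


def encode_patient(patient):
    """Return dummy variables such as marriage2, marriage3, edu2, ..."""
    encoded = {"age": patient["age"]}  # age is used as its numeric value
    for name, levels in LEVELS.items():
        idx = levels.index(patient[name])
        # Level k (k >= 2) gets its own 0/1 indicator named f"{name}{k}".
        for k in range(2, len(levels) + 1):
            encoded[f"{name}{k}"] = 1 if idx == k - 1 else 0
    return encoded


example = encode_patient({
    "age": 68,
    "marriage": "married/de facto",
    "edu": "junior middle",
    "live": "town",
    "copdYear": ">15",
})
```

With this coding, each index with n levels contributes n - 1 indicator variables, which matches the variable names listed in the text (for example marriage2 and marriage3 for three marital-status levels).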
Further, in the condition risk prediction device, the processor is specifically configured to: comparing the risk score with a risk score threshold, and if the risk score is larger than the risk score threshold, judging that the COPD patient to be detected has depression or anxiety; and if the risk score is less than or equal to the risk score threshold value, judging that the COPD patient to be detected does not suffer from depression or anxiety.
Further, the risk score threshold is determined based on the risk scores of the samples in a target sample set, the risk score of each sample being obtained based on the disease risk prediction model; and the target sample set is used for model training to obtain the disease risk prediction model.
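The text states that the threshold is derived from the risk scores of the samples but does not say how. The following Python sketch shows one possible approach, which is an assumption and not taken from the source: choose the cut-off that maximises Youden's J (sensitivity + specificity - 1) over the sample scores, then apply the comparison rule in which a score strictly greater than the threshold is judged positive. The names `pick_threshold` and `judge` and the toy scores are hypothetical.

```python
def pick_threshold(scores, labels):
    """Assumed rule: pick the cut-off maximising Youden's J on the samples."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        # A sample is predicted positive only if its score exceeds t.
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= t and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t


def judge(score, threshold):
    """True means 'has depression or anxiety'; equality counts as negative."""
    return score > threshold


# Toy risk scores and labels (1 = depression or anxiety), for illustration.
scores = [0.1, 0.3, 0.4, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
threshold = pick_threshold(scores, labels)
```

Other selection rules, such as fixing a minimum sensitivity, would fit the same description equally well.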
Further, in the condition risk prediction device, the processor is further configured to:
receiving a clinical diagnosis result of whether the COPD patient to be tested has depression or anxiety;
obtaining an incremental sample, wherein the incremental sample comprises the independent-variable set and a label of the COPD patient to be tested, the label of the incremental sample being determined according to the clinical diagnosis result; and
updating the disease risk prediction model and the risk score threshold using the incremental sample as a sample of the target sample set; wherein the target sample set is used for model training to obtain the disease risk prediction model.
In a fifth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the construction method described above, or implements the steps performed by the processor in the disease risk prediction device according to any one of claims 4-9.
With the method, device, and storage medium for constructing a disease risk prediction model, and the disease risk prediction device and storage medium, provided by the invention, a disease risk prediction model whose independent-variable set comprises the indexes (sex, marital status, education degree, long-term residence, family annual income, medical expense payment mode, direct economic burden for treating COPD in the past year, smoking, COPD course, number of acute exacerbations of COPD in the last 1 year, regular inhalant treatment, and home oxygen therapy) can effectively judge whether a COPD patient suffers from depression or anxiety. The model can be used for early screening of COPD patients for depression and anxiety, and the screening result can guide subsequent intervention and adjustment of the treatment plan.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flow chart of a method for constructing a disease risk prediction model according to an embodiment of the invention.
Fig. 2 is a block diagram showing a construction apparatus of a disease risk prediction model according to an embodiment of the present invention.
Fig. 3 shows an internal structural diagram of a computer apparatus according to an embodiment of the present invention.
Fig. 4 shows a flow chart of a disease risk prediction method according to an embodiment of the invention.
Fig. 5 shows a block diagram of a disease risk prediction apparatus according to the present invention.
Fig. 6 is a graph showing the results of performance evaluation on a training set of the disease risk prediction model according to experimental example 1 of the present invention.
Fig. 7 is a graph showing the results of performance evaluation on a test set of the disease risk prediction model according to experimental example 1 of the present invention.
Fig. 8 is a graph showing the results of performance evaluation on the training set of the disease risk prediction model in comparative example 1 according to the present invention.
Fig. 9 is a graph showing the results of performance evaluation on the test set of the disease risk prediction model in comparative example 1 according to the present invention.
Fig. 10 is a graph showing the results of performance evaluation on the training set of the disease risk prediction model in comparative example 2 according to the present invention.
Fig. 11 is a graph showing the results of performance evaluation on the test set of the disease risk prediction model in comparative example 2 according to the present invention.
Fig. 12 is a graph showing the results of performance evaluation on the training set of the disease risk prediction model in comparative example 3 according to the present invention.
Fig. 13 is a graph showing the results of performance evaluation on the test set of the disease risk prediction model in comparative example 3 according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
In some of the flows described in the present specification and claims and in the above figures, a number of operations are included that occur in a particular order, but it should be clearly understood that these operations may be performed out of order or in parallel as they occur herein, with the order of the operations being indicated as 101, 102, etc. merely to distinguish between the various operations, and the order of the operations by themselves does not represent any order of performance. Additionally, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that, the descriptions of "first", "second", etc. in this document are used for distinguishing different messages, devices, modules, etc., and do not represent a sequential order, nor limit the types of "first" and "second" to be different.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for constructing a disease risk prediction model according to an embodiment of the present invention. In this embodiment, the method is described as applied to a terminal by way of example; it can be understood that the method can also be applied to a server, or to a system including the terminal and the server, where it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps.
Step 101, obtaining a target sample set, wherein each sample in the target sample set comprises an independent-variable set and a label, the label describes whether the COPD patient corresponding to the sample has depression or anxiety, and the independent-variable set comprises indexes of the COPD patient corresponding to the sample; wherein the disease risk prediction model is used to predict whether a COPD patient suffers from depression or anxiety; the indexes include: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
In an embodiment of the present invention, each sample in the target sample set corresponds to a COPD patient, and each sample includes an independent variable set and a label, where the independent variable set includes indexes corresponding to the COPD patient, specifically, the indexes in this embodiment include: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
In one embodiment, marital status is divided into two cases: not married (widowed, single, or divorced) and married (married or de facto marriage). In another embodiment, it is divided into three cases: widowed, single/divorced, and married/de facto marriage. In other embodiments, it is divided into five cases: single, divorced, widowed, married, and de facto marriage.
In one embodiment, education level is divided into two cases: junior middle school or below (no schooling, primary school, or junior middle school) and high school or above (high school, secondary specialized school, college, or university and above). In another embodiment, it is divided into four cases: primary school or no schooling, junior middle school, high school or secondary specialized school, and college or university. In other embodiments, it is divided into finer categories such as primary school or no schooling, junior middle school, secondary school, college, and university.
In one embodiment, the average annual income of family members can be divided into stages according to income, for example, the income can be divided into four cases of 20001-; in one embodiment, the direct economic burden for treating COPD over the past year can also be divided in stages, for example, 3000 and below, 3001-.
In one embodiment, the COPD course may be divided according to the length of the period for confirming COPD, and particularly may be divided according to different time periods; for example, the four cases can be divided into 0-5 years, 5-10 years, 10-15 years and >15 years.
In one embodiment, the number of acute exacerbations of COPD in the last 1 year is divided according to the number of times; for example, it can be divided into two cases: <2 times and ≥2 times.
In one embodiment, the long-term residence is divided into rural and urban areas; in some embodiments, the classification may also be based on the size of the towns.
In one embodiment, the medical payment mode is divided into three cases: publicly funded, medical insurance/New Rural Cooperative Medical Scheme, and self-pay. In another embodiment, it can be divided into four cases: publicly funded, medical insurance, New Rural Cooperative Medical Scheme, and self-pay.
In one embodiment, the age can be divided into different cases according to different age groups, generally patients with COPD are mostly old, and the age can be divided into 10 years as one stage, for example, the cases can be divided into six cases of 50 years and below, 51-60 years, 61-70 years, 71-80 years, 81-90 years, 91 years and above; in another embodiment, the age value can be directly used as a specific parameter for calculation without dividing a specific age group.
In one embodiment, regular inhalation therapy is divided into two cases, with and without regular inhalation of a therapeutic agent, where at least one therapeutic agent is inhaled regularly. In a particular embodiment, the therapeutic agent may be at least one of Spiriva, Seretide, and Symbicort. Spiriva is a trade name; the full name of the drug is tiotropium bromide powder for inhalation or tiotropium bromide spray, and its main component is tiotropium bromide. Seretide is also a trade name, for a compound preparation whose components are salmeterol xinafoate and fluticasone propionate. Symbicort is likewise a trade name; its full name is Symbicort Turbuhaler, also called budesonide formoterol powder for inhalation, and its main components are budesonide and formoterol. All three drugs can improve lung function, are used for the maintenance treatment of COPD, improve the quality of life of COPD patients, and reduce acute exacerbations of COPD.
In one embodiment, home oxygen therapy is divided into two cases: with home oxygen therapy and without home oxygen therapy. In another embodiment, the division can be made according to the daily duration of home oxygen therapy, for example into four cases: no home oxygen therapy, 1-5 hours/day, 6-10 hours/day, and 10-15 hours/day. In another embodiment, the daily duration of oxygen therapy can also be used directly as a parameter.
In one embodiment, step 101 may further include the following steps:
Step 1011, obtaining various clinical indexes of the samples.
Step 1012, counting, for each clinical index, the number of samples in which that clinical index is missing.
Step 1013, screening candidate indexes from the clinical indexes according to the missing count corresponding to each clinical index.
In the embodiment of the invention, the number of missing values is counted for each clinical index, and the clinical indexes whose missing count is smaller than a preset value are selected as candidate indexes. An index that is missing in many samples would require extensive filling before use, so its stability and accuracy as a member of the model's independent-variable set would be low.
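The screening in steps 1012 and 1013 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: samples are dictionaries, a missing value is represented by `None`, and the function name `screen_candidate_indexes`, the parameter `max_missing`, and the index `bmi` are hypothetical and do not come from the source.

```python
def screen_candidate_indexes(samples, index_names, max_missing):
    """Keep indexes whose missing count is smaller than max_missing."""
    # Step 1012: count, per index, how many samples lack a value.
    missing_counts = {
        name: sum(1 for s in samples if s.get(name) is None)
        for name in index_names
    }
    # Step 1013: keep only indexes below the preset missing-count limit.
    candidates = [n for n in index_names if missing_counts[n] < max_missing]
    return candidates, missing_counts


# Toy samples; "bmi" is a made-up index used only to show an exclusion.
samples = [
    {"age": 70, "smoke": "yes", "bmi": None},
    {"age": 65, "smoke": None, "bmi": None},
    {"age": 72, "smoke": "no", "bmi": 22.1},
]
candidates, counts = screen_candidate_indexes(
    samples, ["age", "smoke", "bmi"], max_missing=2
)
```

Here `bmi` is missing in two of three samples and is dropped, while `smoke` is missing once and is kept for later filling.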
Step 1014, filling the candidate indexes missing in each sample.
In the embodiment of the present invention, a missing candidate index may be filled with the value that most samples take for that index. In another embodiment, a sample that has the index may be selected at random and its value used for filling. In another embodiment, the filling may be performed according to the correlation between the other indexes of the corresponding sample and those of other samples. Specifically, if a sample y lacks a candidate index x, at least one sample with high index similarity to sample y is found according to the indexes that are not missing in sample y, and the candidate index x in sample y is filled based on the candidate index x in the at least one sample. For example, if the missing value is the average annual family income, other samples with the same long-term residence as the corresponding sample can be found, and the most common average annual family income among those samples is used to fill the value for the corresponding sample.
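Two of the filling strategies above, the overall most common value and the most common value among samples sharing another index, can be sketched in Python. This is an illustrative sketch only: the function name `fill_with_mode`, the `group_by` parameter, and the dictionary keys are assumptions, and similarity is simplified to equality on a single index.

```python
from collections import Counter


def fill_with_mode(samples, index, group_by=None):
    """Fill missing `index` values with the most common observed value.

    If `group_by` is given, the mode is taken among samples that share the
    same `group_by` value, falling back to the overall mode.
    """
    def mode(rows):
        observed = [r[index] for r in rows if r.get(index) is not None]
        return Counter(observed).most_common(1)[0][0] if observed else None

    overall = mode(samples)
    filled = []
    for s in samples:
        row = dict(s)  # copy so the input samples are left unchanged
        if row.get(index) is None:
            peers = [r for r in samples
                     if group_by and r.get(group_by) == s.get(group_by)]
            group_mode = mode(peers) if peers else None
            row[index] = group_mode if group_mode is not None else overall
        filled.append(row)
    return filled


samples = [
    {"live": "rural", "income": "20001-30000"},
    {"live": "rural", "income": "20001-30000"},
    {"live": "town", "income": "50001 or more"},
    {"live": "town", "income": "50001 or more"},
    {"live": "town", "income": "50001 or more"},
    {"live": "rural", "income": None},
]
filled_overall = fill_with_mode(samples, "income")
filled_by_residence = fill_with_mode(samples, "income", group_by="live")
```

In this toy data the two strategies disagree: the overall mode is the town income band, while the residence-matched mode gives the rural sample the band most common among rural samples.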
Step 1015, forming an independent-variable set corresponding to each sample based on the candidate indexes of each filled sample.
Step 1016, for each sample, if the sample comprises a field characterizing the absence of anxiety and a field characterizing the absence of depression, determining that the label of the sample indicates that the corresponding COPD patient does not suffer from depression or anxiety; otherwise, determining that the label of the sample indicates that the corresponding COPD patient suffers from depression or anxiety.
In the embodiment of the invention, the sample label is determined by checking whether the data corresponding to the sample contains fields characterizing that the corresponding COPD patient is free of depression and free of anxiety. If the data contains both the depression-free field and the anxiety-free field, the COPD patient corresponding to the sample is judged not to suffer from depression or anxiety. If the data does not contain both the anxiety-free field and the depression-free field, the COPD patient corresponding to the sample has at least one of depression or anxiety.
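The labelling rule of step 1016 amounts to a single conjunction, as the following Python sketch shows. The field names `no_anxiety` and `no_depression` and the function name `label_sample` are hypothetical; the sketch assumes that the presence of a field is modelled as a boolean flag in a dictionary.

```python
def label_sample(record):
    """Return 0 (neither condition) only if both 'free of' fields are present."""
    has_no_anxiety = record.get("no_anxiety") is True
    has_no_depression = record.get("no_depression") is True
    # Any other combination means at least one of depression or anxiety.
    return 0 if (has_no_anxiety and has_no_depression) else 1


labels = [
    label_sample({"no_anxiety": True, "no_depression": True}),
    label_sample({"no_anxiety": True}),
    label_sample({}),
]
```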
Step 102, performing model training by using a machine learning algorithm based on the target sample set to obtain a disease risk prediction model.
In the embodiment of the invention, a machine learning algorithm is utilized to carry out model training, and the obtained disease risk model is a classification model. In one embodiment, the machine learning algorithm therein is a supervised machine learning algorithm. In one embodiment, the machine learning algorithm may include at least one of a support vector machine algorithm, a naive bayes algorithm, a decision tree algorithm, a random forest algorithm, a neural network algorithm, and a regression algorithm.
In one embodiment, step 102, performing model training using a machine learning algorithm based on the target sample set to obtain a disease risk prediction model, includes the following steps.
Step 1021, randomly dividing the target sample set into a training sample subset and a test sample subset according to a preset proportion.
In the embodiment of the present invention, the preset proportion refers to the ratio of the number of samples in the training sample subset to the number of samples in the test sample subset; in one embodiment, the preset proportion is 7:3.
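The random division of step 1021 may be sketched as follows, using only the Python standard library. The function name `split_samples`, the integer rounding rule and the use of `random.Random` are assumptions of this sketch; it is not intended to reproduce the exact split obtained in the experimental example.

```python
import random


def split_samples(samples, ratio=(7, 3), seed=None):
    """Shuffle the samples and divide them into training and test subsets."""
    rng = random.Random(seed)          # a fixed seed makes the split repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * ratio[0] // sum(ratio)
    return shuffled[:n_train], shuffled[n_train:]


train_subset, test_subset = split_samples(range(375), ratio=(7, 3), seed=8)
print(len(train_subset), len(test_subset))  # 262 113
```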
Step 1022, performing model training using a logistic regression algorithm based on the training sample subset to obtain the weight corresponding to each index, and obtaining a disease risk prediction model based on the weights corresponding to the indexes.
In the embodiment of the invention, model training is carried out by using a logistic regression algorithm based on samples in a training sample subset, specifically, the weights of all indexes are obtained, and then a corresponding logistic regression model is obtained based on the weights of all indexes, wherein the logistic regression model is a disease risk prediction model.
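To make the notion of "weights of all indexes" concrete, a minimal logistic regression fit may be sketched as follows. This sketch uses plain batch gradient descent, which is a substitute chosen for brevity and is not necessarily the solver used in the embodiment; the toy data, learning rate and number of epochs are likewise assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def risk_score(weights, bias, x):
    """Linear predictor: bias plus the weighted sum of the index values."""
    return bias + sum(w * v for w, v in zip(weights, x))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit one weight per index by batch gradient descent on the log-loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(risk_score(weights, bias, xi)) - yi
            grad_b += err
            for j, v in enumerate(xi):
                grad_w[j] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy data: the label depends only on the first (hypothetical) index.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
weights, bias = fit_logistic(X, y)
```

The fitted `weights` and `bias` define the model; `risk_score` then yields the score that is compared against a threshold at prediction time.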
Step 1023, performing performance evaluation on the disease risk prediction model using the training sample subset and the test sample subset, and performing model training again if the performance evaluation result does not meet the requirement.
In the embodiment of the invention, the performance of the obtained model is evaluated using the training sample subset and the test sample subset until the performance evaluation result meets the requirement. Here, meeting the requirement means that, when ROC curves are drawn for the obtained disease risk prediction model on the training sample subset and the test sample subset, the values of sensitivity (TPR), specificity (TNR), accuracy (ACC) and AUC are each greater than their respective preset values; the preset values are set according to conventional experience in the art and are not specifically limited herein. For example, in one specific embodiment, a model is deemed satisfactory when the AUC value is greater than 0.7.
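The evaluation quantities named above may be computed as in the following sketch, where TPR is the true positive rate, TNR is the true negative rate and ACC is the accuracy. The AUC is computed here through the pairwise-ranking formulation (the fraction of positive/negative pairs in which the positive sample scores higher, ties counting one half), which is one standard way to obtain the area under the ROC curve; the function names and toy data are assumptions.

```python
def confusion_rates(labels, predictions):
    """Return (TPR, TNR, ACC) for binary labels and binary predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s > 0.5 else 0 for s in scores]
print(confusion_rates(labels, predictions))  # (0.5, 0.5, 0.5)
print(auc(labels, scores))                   # 0.75
```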
In one embodiment, step 102 further comprises the following step: performing variable optimization on the disease risk prediction model using a forward stepwise selection method, a backward stepwise selection method, or a combination of the forward and backward stepwise selection methods, so as to obtain the finally determined disease risk prediction model.
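A greedy forward stepwise selection may be sketched as follows. The embodiment does not state which criterion drives the selection, so the sketch takes a caller-supplied `score` function in which lower is better (an information criterion would be one possible choice); the toy scoring function and its numbers are purely hypothetical.

```python
def forward_select(candidates, score):
    """Add one variable at a time while the score keeps improving."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Try each remaining variable and keep the one with the lowest score.
        trial_score, trial_var = min(
            ((score(selected + [c]), c) for c in remaining),
            key=lambda t: t[0],
        )
        if trial_score >= best:
            break                      # no candidate improves the model
        selected.append(trial_var)
        remaining.remove(trial_var)
        best = trial_score
    return selected


# Hypothetical criterion: each variable costs 1, useful ones reduce the score.
GAIN = {"age": 5, "smoking": 3}


def toy_score(variables):
    return 10 - sum(GAIN.get(v, 0) for v in variables) + len(variables)


print(forward_select(["sex", "age", "smoking"], toy_score))  # ['age', 'smoking']
```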
In one embodiment, when the disease risk prediction model does not meet the requirement, the index data of the samples in the target sample set can be checked, and model training is performed again after obviously wrong index data has been processed (for example, the samples can be removed or the data corrected according to actual needs); it is also possible to increase the number of samples in the target sample set and train the model again. In one embodiment, model training may also be performed again after changing the specific indexes in the independent variable set (e.g., new indexes may be added). In one embodiment, the target sample set may be divided again into a training sample subset and a test sample subset, and the model training may be repeated.
In one embodiment, as shown in fig. 2, there is provided an apparatus for constructing a disease risk prediction model, including: a sample set acquisition unit 201 and a model acquisition unit 202;
a sample set obtaining unit 201, configured to randomly divide the target sample set into a training sample subset and a test sample subset according to a preset proportion;
a model obtaining unit 202, configured to perform model training using a machine learning algorithm based on the target sample set, so as to obtain a disease risk prediction model;
wherein the condition risk prediction model is used to predict whether a COPD patient suffers from depression or anxiety; the indexes include: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
For the specific definition of the apparatus for constructing a disease risk prediction model, see the above definition of the method for constructing a disease risk prediction model, which is not described herein again. The units in the device for constructing the disease risk prediction model can be wholly or partially realized by software, hardware and a combination thereof. The units may be embedded in hardware or independent from a processor in the computer device, or may be stored in a memory in the computer device in software, so that the processor can call and execute operations corresponding to the units.
In the embodiment of the present invention, an electronic device is provided, where the electronic device may be a computer device, the computer device may be a terminal, and an internal structure diagram of the electronic device may be as shown in fig. 3. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of constructing a model for risk prediction of a medical condition. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, there is provided an electronic device, specifically a device for constructing a disease risk prediction model, the device comprising a memory and a processor; the memory is configured to store program instructions; the processor is configured to invoke the program instructions and, when the program instructions are executed, to perform the following steps: obtaining a target sample set, wherein each sample in the target sample set comprises an independent variable set and a label, the label describes whether the COPD patient corresponding to the sample has depression or anxiety, and the independent variable set comprises indexes of the COPD patient corresponding to the sample; and performing model training using a machine learning algorithm based on the target sample set to obtain a disease risk prediction model; wherein the disease risk prediction model is used to predict whether a COPD patient suffers from depression or anxiety; the indexes include: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy. For the specific definition of the device for constructing a disease risk prediction model, see the above definition of the method for constructing a disease risk prediction model, which is not described herein again.
The electronic device in this embodiment may specifically be a computer device.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: obtaining a target sample set, wherein each sample in the target sample set comprises an independent variable set and a label, the label describes whether the COPD patient corresponding to the sample has depression or anxiety, and the independent variable set comprises indexes of the COPD patient corresponding to the sample; and performing model training using a machine learning algorithm based on the target sample set to obtain a disease risk prediction model; wherein the disease risk prediction model is used to predict whether a COPD patient suffers from depression or anxiety; the indexes include: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
In one embodiment, there is provided an electronic device, specifically a disease risk prediction device, comprising a memory and a processor; the memory is configured to store program instructions; the processor is configured to invoke the program instructions and, when the program instructions are executed, to perform the steps of the disease risk prediction method.
Fig. 4 is a schematic flow chart of a disease risk prediction method according to an embodiment of the present invention, and as shown in fig. 4, the disease risk prediction method in this embodiment includes the following steps.
Step 401, obtaining an independent variable set of a COPD patient to be tested.
In the embodiment of the present invention, the set of independent variables specifically includes various indexes of a COPD patient to be detected, and the indexes specifically include: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
Step 402, inputting the independent variable set of the COPD patient to be tested into a disease risk prediction model, and obtaining the risk score of the COPD patient to be tested suffering from depression or anxiety.
In the embodiment of the present invention, the condition risk prediction model is obtained according to the construction method in the foregoing embodiment, and the condition risk prediction model is a model for determining whether a COPD patient has depression or anxiety.
Step 403, judging whether the COPD patient to be tested has depression or anxiety according to the risk score; wherein the independent variable set comprises indexes of the COPD patient to be tested, and the indexes comprise: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
In an embodiment of the present invention, when determining whether the COPD patient to be detected has depression or anxiety according to the risk score, specifically, comparing the risk score with a risk score threshold, and if the risk score is greater than the risk score threshold, determining that the COPD patient to be detected has depression or anxiety; and if the risk score is less than or equal to the risk score threshold value, judging that the COPD patient to be detected does not suffer from depression or anxiety.
In one embodiment, the risk score threshold is determined based on the risk scores of samples in a target sample set, the risk score of a sample being obtained from the disease risk prediction model; the target sample set is the set used for model training to obtain the disease risk prediction model. In a particular embodiment, the risk score threshold is obtained by computing a risk score for each sample in the training sample subset with the disease risk prediction model and performing ROC analysis on those risk scores.
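One possible way to derive a threshold from ROC analysis is sketched below. The text only states that ROC analysis is performed; choosing the cut-off that maximizes Youden's index (TPR + TNR - 1) is an assumption of this sketch, as are the function name and toy data. A sample is treated as positive when its score is strictly greater than the threshold, matching the decision rule of step 403.

```python
def roc_threshold(labels, scores):
    """Return the candidate cut-off with the largest Youden index."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):      # every observed score is a candidate
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.6, 0.9]
print(roc_threshold(labels, scores))  # 0.4
```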
In one embodiment, the method for predicting the risk of a condition further comprises the steps of:
receiving a clinical diagnosis result of whether the COPD patient to be tested has depression or anxiety;
obtaining an incremental sample, wherein the incremental sample comprises the independent variable set and a label of the COPD patient to be tested, and the label of the incremental sample is determined according to the clinical diagnosis result; and
updating the disease risk prediction model and the risk score threshold using the incremental sample as a sample of the target sample set; wherein the target sample set is used for model training to obtain the disease risk prediction model.
In the embodiment of the invention, the clinical diagnosis result of the COPD patient to be tested is used as a label, and the independent variable set and the label of that patient form the incremental sample. The incremental sample is added as a new sample to the target sample set used for model training, yielding a new target sample set; model training is performed again on the new target sample set and the risk score threshold is obtained again, thereby updating the disease risk prediction model and the risk score threshold. In this way, collecting data from COPD patients to be tested increases the sample size available to the model and improves its prediction performance.
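The update procedure described above may be sketched as follows. The training routine and the threshold routine are passed in as callables because their concrete form is not fixed here; the names `update_model`, `train_model` and `derive_threshold`, and the stub functions in the usage lines, are hypothetical.

```python
def update_model(target_samples, variables, clinical_label,
                 train_model, derive_threshold):
    """Append the incremental sample, then retrain and re-derive the threshold."""
    updated_samples = list(target_samples) + [(variables, clinical_label)]
    model = train_model(updated_samples)
    threshold = derive_threshold(model, updated_samples)
    return updated_samples, model, threshold


# Usage with stand-in callables (placeholders, not real training code).
existing = [([70, 1], 1), ([55, 0], 0)]
samples, model, threshold = update_model(
    existing, [62, 1], 1,
    train_model=lambda s: {"n_samples": len(s)},
    derive_threshold=lambda m, s: 0.5,
)
print(len(samples), model)  # 3 {'n_samples': 3}
```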
In one embodiment, the disease risk prediction model may be a static model. In one embodiment, the disease risk prediction model is: y ═ sex (± (-1.135) + age ± + marking ± (-0.8872) + marking ±) 0.2666+ edu ± + edu [ (-) - + live (±) + 17.07) + income [ (-15.75) + income [ (-15.9) + income [ (-15.73) + patent [ (-1.59) + patent [ (+ 14.39+ patent [ (-2.44) + ebureden [ (+ 0.11) + cadence + clock + 0 + compact ± + factor + 2.44 + -); wherein Y represents the risk score and * represents the multiplication sign, and the specific values of each index in the independent variable set are as follows:
gender: when the gender is female, sex2 = 0; when the gender is male, sex2 = 1;
age: age is the specific age value;
marital status: when widowed, marriage2 = 0 and marriage3 = 0; when single/divorced, marriage2 = 1 and marriage3 = 0; when married/de facto marriage, marriage2 = 0 and marriage3 = 1;
education level: when primary school or no schooling, edu2 = 0, edu3 = 0 and edu4 = 0; when junior middle school, edu2 = 1, edu3 = 0 and edu4 = 0; when high school or technical secondary school, edu2 = 0, edu3 = 1 and edu4 = 0; when college and above, edu2 = 0, edu3 = 0 and edu4 = 1;
long-term residence: when rural, live2 = 0; when town, live2 = 1;
annual family income: when 20001-30000, income2 = 0, income3 = 0 and income4 = 0; when 30001-40000, income2 = 1, income3 = 0 and income4 = 0; when 40001-50000, income2 = 0, income3 = 1 and income4 = 0; when 50001 or more, income2 = 0, income3 = 0 and income4 = 1;
medical payment mode: when public expense, payment2 = 0, payment3 = 0 and payment4 = 0; when medical insurance, payment2 = 1, payment3 = 0 and payment4 = 0; when new rural cooperative medical scheme, payment2 = 0, payment3 = 1 and payment4 = 0; when self-paid, payment2 = 0, payment3 = 0 and payment4 = 1;
direct economic burden for treating COPD in the past year: when 3000 and below, eburden2 = 0, eburden3 = 0 and eburden4 = 0; when 3001-6000, eburden2 = 1, eburden3 = 0 and eburden4 = 0; when 6001-9000, eburden2 = 0, eburden3 = 1 and eburden4 = 0; when 9001 and above, eburden2 = 0, eburden3 = 0 and eburden4 = 1;
smoking: when not smoking, smoke2 = 0; when smoking, smoke2 = 1;
COPD course: when 0-5 years, copdYear2 = 0, copdYear3 = 0 and copdYear4 = 0; when 5-10 years, copdYear2 = 1, copdYear3 = 0 and copdYear4 = 0; when 10-15 years, copdYear2 = 0, copdYear3 = 1 and copdYear4 = 0; when >15 years, copdYear2 = 0, copdYear3 = 0 and copdYear4 = 1;
number of acute exacerbations of COPD in the last 1 year: when <2, exacerbation2 = 0; when ≥2, exacerbation2 = 1;
regular inhalation therapy: when absent, inhalation2 = 0; when present, inhalation2 = 1;
home oxygen therapy: when absent, oxygen2 = 0; when present, oxygen2 = 1.
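The variable definitions above follow a dummy-variable scheme in which the first category of each index is the reference level (all of its variables are 0) and each further category switches on exactly one variable. A minimal sketch, assuming the categories are numbered from 1 and using a hypothetical helper name `dummy_encode`:

```python
def dummy_encode(prefix, level, n_levels):
    """Expand a category number (1..n_levels) into variables prefix2..prefixN.

    Level 1 is the reference category, so every variable is 0 for it.
    """
    if not 1 <= level <= n_levels:
        raise ValueError(f"level must be in 1..{n_levels}, got {level}")
    return {f"{prefix}{k}": int(level == k) for k in range(2, n_levels + 1)}


print(dummy_encode("edu", 3, 4))       # {'edu2': 0, 'edu3': 1, 'edu4': 0}
print(dummy_encode("marriage", 1, 3))  # {'marriage2': 0, 'marriage3': 0}
print(dummy_encode("sex", 2, 2))       # {'sex2': 1}
```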
In one embodiment, the risk score of each sample in the training sample subset is calculated according to the risk prediction model, and ROC (receiver operating characteristic) analysis is performed on the risk scores of the samples in the training sample subset to obtain a risk score threshold of 1.414. When the risk score of the COPD patient to be tested is greater than 1.414, the COPD patient to be tested is judged to have depression or anxiety; when the risk score of the COPD patient to be tested is less than or equal to 1.414, the COPD patient to be tested is judged not to suffer from depression or anxiety.
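The decision rule with the threshold of 1.414 may be sketched as follows; the function name is hypothetical, and the comparison is strict so that a score equal to the threshold is judged negative, as stated above.

```python
RISK_SCORE_THRESHOLD = 1.414  # value reported for this embodiment


def has_depression_or_anxiety(risk_score, threshold=RISK_SCORE_THRESHOLD):
    """True only when the risk score is strictly greater than the threshold."""
    return risk_score > threshold


print(has_depression_or_anxiety(1.5))    # True
print(has_depression_or_anxiety(1.414))  # False
```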
In one embodiment, as shown in fig. 5, there is provided a condition risk prediction apparatus including: a variable set acquisition unit 501, a risk score acquisition unit 502 and a judgment unit 503; wherein:
a variable set acquiring unit 501, configured to acquire an independent variable set of a COPD patient to be tested;
a risk score obtaining unit 502, configured to input the independent variable set of the COPD patient to be tested into the disease risk prediction model, and obtain a risk score of the COPD patient to be tested suffering from depression or anxiety;
a judging unit 503, configured to judge whether the COPD patient to be tested has depression or anxiety according to the risk score; wherein the independent variable set comprises indexes of the COPD patient to be tested, and the indexes comprise: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an independent variable set of a COPD patient to be tested;
inputting the independent variable set of the COPD patient to be tested into a disease risk prediction model to obtain a risk score of the COPD patient to be tested suffering from depression or anxiety; and
judging whether the COPD patient to be tested has depression or anxiety according to the risk score.
In the computer-readable storage medium, the disease risk prediction model may be obtained according to the construction method described in the embodiment of the present invention, or may be a static model in the embodiment of the present invention.
Experimental example 1
1. Data pre-processing
In this experimental example, the number of samples is 375 and each sample has 27 clinical indexes. For each clinical index, the number of samples in which it is missing is counted, and any index missing in more than 100 samples is deleted.
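The index-deletion rule may be sketched as follows, assuming each sample is a dictionary in which a missing value is stored as `None`. The toy data uses a limit of 1 instead of 100 so that the effect is visible on three samples; the index name `bmi` is a made-up placeholder and is not one of the indexes of this experimental example.

```python
def drop_sparse_indexes(samples, max_missing=100):
    """Remove every index that is missing in more than max_missing samples."""
    names = list(samples[0].keys())
    missing = {n: sum(1 for s in samples if s.get(n) is None) for n in names}
    kept = [n for n in names if missing[n] <= max_missing]
    return [{n: s.get(n) for n in kept} for s in samples]


toy_samples = [
    {"age": 70, "bmi": None},
    {"age": 65, "bmi": None},
    {"age": None, "bmi": 22.0},
]
print(drop_sparse_indexes(toy_samples, max_missing=1))
# [{'age': 70}, {'age': 65}, {'age': None}]
```

Values that remain missing after this step (such as the `None` age above) are the ones subsequently handled by the filling step.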
In the sample data, each sample corresponds to a label, and the label indicates whether the COPD patient corresponding to the sample has depression or anxiety. Counting the samples on this basis, 67 samples correspond to COPD patients who do not suffer from depression or anxiety, and 308 samples correspond to COPD patients who suffer from depression or anxiety. After deletion, 13 indexes remain for each sample: gender, age, marital status, education, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalant therapy and home oxygen therapy; missing values in the data are then filled. The clinical index data of each sample are coded as follows:
age field: the specific age value;
gender field: "female" = 1, "male" = 2;
marital status field: "widowed" = 1, "single/divorced" = 2, "married/de facto marriage" = 3;
education level field: "primary school or no schooling" = 1, "junior middle school" = 2, "high school or technical secondary school" = 3, "college and above" = 4;
long-term residence field: "rural" = 1, "town" = 2;
annual family income field: "20001-30000" = 1, "30001-40000" = 2, "40001-50000" = 3, "50001 and above" = 4;
medical expense payment mode field: "public expense" = 1, "medical insurance" = 2, "new rural cooperative medical scheme" = 3, "self-paid" = 4;
direct economic burden for treating COPD in the past year field: "3000 and below" = 1, "3001-6000" = 2, "6001-9000" = 3, "9001 and above" = 4;
smoking (yes/no) field: "no" = 1, "yes" = 2;
COPD course (years) field: "0-5 years" = 1, "5-10 years" = 2, "10-15 years" = 3, ">15 years" = 4;
number of acute exacerbations of COPD in the last 1 year field: "<2 times" = 1, "≥2 times" = 2;
regular inhalant treatment (yes/no) field: "no" = 1, "yes" = 2;
home oxygen therapy (yes/no) field: "no" = 1, "yes" = 2.
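The field coding above amounts to a lookup from each recorded answer to an integer code, with age passed through unchanged. A minimal sketch for three of the fields follows; the dictionary keys and the helper name `code_record` are assumptions of this sketch, and the remaining fields would be added to the table in the same way.

```python
FIELD_CODES = {
    "gender": {"female": 1, "male": 2},
    "long_term_residence": {"rural": 1, "town": 2},
    "home_oxygen_therapy": {"no": 1, "yes": 2},
}


def code_record(record):
    """Replace each categorical answer with its integer code; keep age as is."""
    coded = {}
    for field, value in record.items():
        if field == "age":
            coded[field] = value       # age uses the specific age value
        else:
            coded[field] = FIELD_CODES[field][value]
    return coded


patient = {"age": 68, "gender": "male", "long_term_residence": "rural",
           "home_oxygen_therapy": "yes"}
print(code_record(patient))
# {'age': 68, 'gender': 2, 'long_term_residence': 1, 'home_oxygen_therapy': 2}
```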
2. Model training
All samples were split into a training set and a test set at a ratio of 7:3, with the random seed set to 8. The samples in the training set were then subjected to logistic regression analysis, and variable optimization was performed using a forward stepwise selection method. The final model training results are shown in Table 1:
TABLE 1
Coefficients:
Estimate Std.Error z value Pr(>|z|)
Figure BDA0003080949970000191
Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1
The above table is converted to the formula: y ═ sex (± (-1.135) + age ± + marking ± (-0.8872) + marking ±) 0.2666+ edu ± + edu [ (-) - + live (±) + 17.07) + income [ (-15.75) + income [ (-15.9) + income [ (-15.73) + patent [ (-1.59) + patent [ (+ 14.39+ patent [ (-2.44) + ebureden [ (+ 0.11) + cadence + clock + 0 + compact ± + factor + 2.44 + -); wherein Y represents the risk score and * represents the multiplication sign, and the specific values of each index in the independent variable set are as follows:
gender: when the gender is female, sex2 = 0; when the gender is male, sex2 = 1;
age: age is the specific age value;
marital status: when widowed, marriage2 = 0 and marriage3 = 0; when single/divorced, marriage2 = 1 and marriage3 = 0; when married/de facto marriage, marriage2 = 0 and marriage3 = 1;
education level: when primary school or no schooling, edu2 = 0, edu3 = 0 and edu4 = 0; when junior middle school, edu2 = 1, edu3 = 0 and edu4 = 0; when high school or technical secondary school, edu2 = 0, edu3 = 1 and edu4 = 0; when college and above, edu2 = 0, edu3 = 0 and edu4 = 1;
long-term residence: when rural, live2 = 0; when town, live2 = 1;
annual family income: when 20001-30000, income2 = 0, income3 = 0 and income4 = 0; when 30001-40000, income2 = 1, income3 = 0 and income4 = 0; when 40001-50000, income2 = 0, income3 = 1 and income4 = 0; when 50001 or more, income2 = 0, income3 = 0 and income4 = 1;
medical payment mode: when public expense, payment2 = 0, payment3 = 0 and payment4 = 0; when medical insurance, payment2 = 1, payment3 = 0 and payment4 = 0; when new rural cooperative medical scheme, payment2 = 0, payment3 = 1 and payment4 = 0; when self-paid, payment2 = 0, payment3 = 0 and payment4 = 1;
direct economic burden for treating COPD in the past year: when 3000 and below, eburden2 = 0, eburden3 = 0 and eburden4 = 0; when 3001-6000, eburden2 = 1, eburden3 = 0 and eburden4 = 0; when 6001-9000, eburden2 = 0, eburden3 = 1 and eburden4 = 0; when 9001 and above, eburden2 = 0, eburden3 = 0 and eburden4 = 1;
smoking: when not smoking, smoke2 = 0; when smoking, smoke2 = 1;
COPD course: when 0-5 years, copdYear2 = 0, copdYear3 = 0 and copdYear4 = 0; when 5-10 years, copdYear2 = 1, copdYear3 = 0 and copdYear4 = 0; when 10-15 years, copdYear2 = 0, copdYear3 = 1 and copdYear4 = 0; when >15 years, copdYear2 = 0, copdYear3 = 0 and copdYear4 = 1;
number of acute exacerbations of COPD in the last 1 year: when <2, exacerbation2 = 0; when ≥2, exacerbation2 = 1;
regular inhalation therapy: when absent, inhalation2 = 0; when present, inhalation2 = 1;
home oxygen therapy: when absent, oxygen2 = 0; when present, oxygen2 = 1.
3. Risk score threshold
The risk score of each sample in the training set is obtained using the model, and ROC analysis is performed on the risk scores of the samples in the training set to obtain a risk score threshold of 1.414.
If the risk score of a subject is less than or equal to 1.414, the subject is judged not to have anxiety or depression, or to have a low risk of anxiety or depression; if the risk score is greater than 1.414, the subject is judged to have anxiety or depression, or to have a high risk of anxiety or depression.
4. Performance evaluation
The evaluation result of the performance evaluation on the training set is shown in fig. 6. As can be seen from fig. 6, on the training set the sensitivity (TPR) of the model is 0.6273585, the specificity (TNR) is 0.8431373, and the accuracy (ACC) is 0.6692015; the AUC value (area under the curve) is 0.7629.
The evaluation result of the performance evaluation on the test set is shown in fig. 7. As can be seen from fig. 7, on the test set the sensitivity (TPR) of the model is 0.5520833, the specificity (TNR) is 0.75, and the accuracy (ACC) is 0.5803571; the AUC value (area under the curve) is 0.7018.
The performance evaluation results of the model on the training set and the test set show that both AUC values are greater than 0.7, so the model has good clinical application value.
Comparative example 1
This comparative example is based on experimental example 1, with four variables removed: age, education level, COPD course and regular inhalant treatment. Model training was carried out again, and the finally obtained model is shown in Table 2:
TABLE 2
Coefficients:
Estimate Std.Error z value Pr(>|z|)
Figure BDA0003080949970000211
Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1
The evaluation result of the performance evaluation of the model in this comparative example on the training set is shown in fig. 8. As can be seen from fig. 8, on the training set the sensitivity (TPR) of the model is 0.4056604, the specificity (TNR) is 0.9215686, and the accuracy (ACC) is 0.5057034; the AUC value (area under the curve) is 0.693.
The evaluation result of the performance evaluation of the model in this comparative example on the test set is shown in fig. 9. As can be seen from fig. 9, on the test set the sensitivity (TPR) of the model is 0.34375, the specificity (TNR) is 0.9375, and the accuracy (ACC) is 0.4285714; the AUC value (area under the curve) is 0.476.
Comparative example 2
This comparative example is based on experimental example 1, with four variables removed: age, education level, COPD course and home oxygen therapy. Model construction was carried out again, and the finally obtained model is shown in Table 3:
TABLE 3
Coefficients:
Estimate Std.Error z value Pr(>|z|)
Figure BDA0003080949970000221
Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1
The evaluation result of the performance evaluation of the model in this comparative example on the training set is shown in fig. 10. As can be seen from fig. 10, on the training set the sensitivity (TPR) of the model is 0.3915094, the specificity (TNR) is 0.9411765, and the accuracy (ACC) is 0.4980989; the AUC value (area under the curve) is 0.696.
The evaluation result of the performance evaluation of the model in this comparative example on the test set is shown in fig. 11. As can be seen from fig. 11, on the test set the sensitivity (TPR) of the model is 0.3333333, the specificity (TNR) is 0.9375, and the accuracy (ACC) is 0.4196429; the AUC value (area under the curve) is 0.440.
Comparative example 3
This comparative example is based on experimental example 1, with four variables removed: education level, annual family income, medical expense payment mode and COPD course. Model construction was carried out again, and the finally obtained model is shown in Table 4:
TABLE 4
Coefficients:
Estimate Std.Error z value Pr(>|z|)
Figure BDA0003080949970000222
Signif.codes:0‘***’0.001‘**’0.01‘*’0.05‘.’0.1‘’1
The evaluation result of the performance evaluation of the model in this comparative example on the training set is shown in fig. 12. As can be seen from fig. 12, on the training set the sensitivity (TPR) of the model is 0.5471698, the specificity (TNR) is 0.8039216, and the accuracy (ACC) is 0.5969582; the AUC value (area under the curve) is 0.716.
The evaluation result of the performance evaluation of the model in this comparative example on the test set is shown in fig. 13. As can be seen from fig. 13, on the test set the sensitivity (TPR) of the model is 0.4166667, the specificity (TNR) is 0.6875, and the accuracy (ACC) is 0.4553571; the AUC value (area under the curve) is 0.440.
The model used in experimental example 1 is the disease risk prediction model of the present invention, that is, the indexes of the independent variable set are gender, age, marital status, education level, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, COPD course, number of acute exacerbations of COPD in the last 1 year, regular inhalant therapy and home oxygen therapy. In the experimental example, the AUC values of the model on both the training set and the test set are greater than 0.7, indicating a good predictive effect on whether COPD patients suffer from depression or anxiety. In contrast, the models in comparative examples 1 to 3 adopt independent variable sets (specifically, indexes) different from those in the examples of the present invention. The AUC values of the three comparative models on the training set (0.693, 0.696 and 0.716) are all lower than that of the experimental example (0.7629); more notably, their AUC values on the test set (0.476, 0.440 and 0.440) are all close to 0.5, that is, no better than random guessing, so these models have no clinical diagnostic value. It can be seen that the index combination used in experimental example 1 predicts whether a COPD patient suffers from depression or anxiety more effectively than the other combinations in the comparative examples.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by hardware under the instruction of a program. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
While the invention has been described in detail with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. A method for constructing a disease risk prediction model, comprising:
obtaining a target sample set, wherein each sample in the target sample set comprises an independent variable set and a label, the label describes whether the COPD patient corresponding to the sample suffers from depression or anxiety, and the independent variable set comprises indexes of the COPD patient corresponding to the sample; and
performing model training based on the target sample set using a machine learning algorithm to obtain a disease risk prediction model;
wherein the disease risk prediction model is used to predict whether a COPD patient suffers from depression or anxiety; the indexes include: gender, age, marital status, education level, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
2. The construction method according to claim 1, wherein the obtaining a set of target samples comprises:
obtaining clinical indexes of samples;
counting, for each clinical index, the number of samples in which the clinical index is missing, as the missing count of that clinical index;
screening candidate indexes from the clinical indexes according to the missing count corresponding to each clinical index;
filling in the candidate indexes missing from each sample;
forming the independent variable set corresponding to each sample based on the candidate indexes of the sample after filling; and
for each sample, if the sample comprises a field representing an anxiety-free state and a field representing a depression-free state, determining that the label corresponding to the sample is a label indicating that the COPD patient corresponding to the sample does not suffer from depression or anxiety; otherwise, determining that the label corresponding to the sample is a label indicating that the COPD patient corresponding to the sample suffers from depression or anxiety.
3. The construction method according to claim 1 or 2, wherein the performing model training based on the target sample set using a machine learning algorithm to obtain a disease risk prediction model comprises:
randomly dividing the target sample set into a training sample subset and a test sample subset according to a preset proportion;
performing model training using a logistic regression algorithm based on the training sample subset to obtain a weight corresponding to each index, and obtaining the disease risk prediction model based on the weights corresponding to the indexes; and
evaluating the performance of the disease risk prediction model using the training sample subset and the test sample subset, and performing model training again if the performance evaluation result does not meet the requirement.
4. An apparatus for constructing a disease risk prediction model, the apparatus comprising: a memory and a processor; the memory being configured to store program instructions; the processor being configured to invoke the program instructions and, when the program instructions are executed, to perform the construction method of any one of claims 1-3.
5. A disease risk prediction device, characterized in that the device comprises: a memory and a processor;
the memory being configured to store program instructions;
the processor being configured to invoke the program instructions and, when the program instructions are executed, to perform:
acquiring an independent variable set of a COPD patient to be tested;
inputting the independent variable set of the COPD patient to be tested into a disease risk prediction model obtained by the construction method according to any one of claims 1-3, to obtain a risk score of the COPD patient to be tested suffering from depression or anxiety; and
judging whether the COPD patient to be tested suffers from depression or anxiety according to the risk score.
6. A disease risk prediction device, characterized in that the device comprises: a memory and a processor;
the memory being configured to store program instructions;
the processor being configured to invoke the program instructions and, when the program instructions are executed, to perform:
acquiring an independent variable set of a COPD patient to be tested;
inputting the independent variable set of the COPD patient to be tested into a disease risk prediction model to obtain a risk score of the COPD patient to be tested suffering from depression or anxiety; and
judging whether the COPD patient to be tested suffers from depression or anxiety according to the risk score;
wherein the independent variable set comprises indexes of the COPD patient to be tested, and the indexes comprise: gender, age, marital status, education level, long-term residence, average family income, medical expense payment means, direct economic burden for treating COPD in the past year, smoking, course of COPD, number of acute exacerbations of COPD in the last 1 year, regular inhalation therapy and home oxygen therapy.
7. The disease risk prediction device according to claim 6, wherein the disease risk prediction model is: Y ═ sex (± (-1.135) + age ± + marking ± (-0.8872) + marking ±) 0.2666+ edu ± + edu [ (-) - + live (±) + 17.07) + income [ (-15.75) + income [ (-15.9) + income [ (-15.73) + patent [ (-1.59) + patent [ (+ 14.39+ patent [ (-2.44) + ebureden [ (+ 0.11) + cadence + clock + 0 + compact ± + factor + 2.44 + -);
wherein Y represents the risk score, x denotes multiplication, and the indexes in the independent variable set take the following values:
gender: female, sex2 = 0; male, sex2 = 1;
age: the specific age value;
marital status: widowed, marriage2 = 0, marriage3 = 0; single/divorced, marriage2 = 1, marriage3 = 0; married/de facto marriage, marriage2 = 0, marriage3 = 1;
education level: primary school or no schooling, edu2 = 0, edu3 = 0, edu4 = 0; junior middle school, edu2 = 1, edu3 = 0, edu4 = 0; high school or technical secondary school, edu2 = 0, edu3 = 1, edu4 = 0; college or university, edu2 = 0, edu3 = 0, edu4 = 1;
long-term residence: rural area, live2 = 0; town, live2 = 1;
annual family income: 20001-30000, income2 = 0, income3 = 0, income4 = 0; 30001-40000, income2 = 1, income3 = 0, income4 = 0; 40001-; 50001 or more, income2 = 0, income3 = 0, income4 = 1;
medical expense payment means: publicly funded, payment2 = 0, payment3 = 0, payment4 = 0; medical insurance, payment2 = 1, payment3 = 0, payment4 = 0; new rural cooperative medical scheme, payment2 = 0, payment3 = 1, payment4 = 0; self-paid, payment2 = 0, payment3 = 0, payment4 = 1;
direct economic burden for treating COPD in the past year: 3000 or less, eburden2 = 0, eburden3 = 0, eburden4 = 0; 3001-6000, eburden2 = 1, eburden3 = 0, eburden4 = 0; 6001-; 9001 or more, eburden2 = 0, eburden3 = 0, eburden4 = 1;
smoking: non-smoker, smoke2 = 0; smoker, smoke2 = 1;
COPD course: 0-5 years, copdYear2 = 0, copdYear3 = 0, copdYear4 = 0; 5-10 years, copdYear2 = 1, copdYear3 = 0, copdYear4 = 0; 10-15 years, copdYear2 = 0, copdYear3 = 1, copdYear4 = 0; more than 15 years, copdYear2 = 0, copdYear3 = 0, copdYear4 = 1;
number of acute exacerbations of COPD in the last 1 year: fewer than 2, exacerbation2 = 0; 2 or more, exacerbation2 = 1;
regular inhalation therapy: no, inhalation2 = 0; yes, inhalation2 = 1;
home oxygen therapy: no, oxygen2 = 0; yes, oxygen2 = 1.
8. The disease risk prediction device according to any of claims 4-7, wherein the processor is specifically configured to:
compare the risk score with a risk score threshold; if the risk score is greater than the risk score threshold, judge that the COPD patient to be tested suffers from depression or anxiety; and if the risk score is less than or equal to the risk score threshold, judge that the COPD patient to be tested does not suffer from depression or anxiety;
preferably, the risk score threshold is determined based on the risk scores of the samples in the target sample set, the risk scores of the samples being obtained from the disease risk prediction model, and the target sample set being used for model training to obtain the disease risk prediction model.
9. The disease risk prediction device according to any of claims 4-7, wherein the processor is further configured to perform:
receiving a clinical diagnosis result of whether the COPD patient to be tested suffers from depression or anxiety;
obtaining an incremental sample, wherein the incremental sample comprises the independent variable set and a label of the COPD patient to be tested, and the label of the incremental sample is determined according to the clinical diagnosis result; and
updating the disease risk prediction model and the risk score threshold using the incremental sample as a sample of the target sample set;
wherein the target sample set is used for model training to obtain the disease risk prediction model.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the construction method according to any one of claims 1 to 3, or carries out the steps performed by the processor in the disease risk prediction device according to any one of claims 4-9.
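The prediction flow recited in the claims above (dummy-code the indexes, compute a weighted risk score, compare it with a threshold) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented model: only a few indexes are included, the dictionary keys (`sex`, `marital_status`, `smokes`, `exacerbations_last_year`) are hypothetical, and `WEIGHTS`, `INTERCEPT` and `THRESHOLD` are placeholder values, not the coefficients or threshold of the patent.

```python
# Dummy coding of marital status: (marriage2, marriage3), widowed as baseline.
MARITAL = {
    "widowed": (0, 0),
    "single/divorced": (1, 0),
    "married": (0, 1),
}

# PLACEHOLDER weights and threshold for illustration only (not from the patent).
WEIGHTS = {
    "sex2": -1.0,
    "marriage2": -0.5,
    "marriage3": 0.25,
    "smoke2": 0.5,
    "exacerbation2": 1.0,
}
INTERCEPT = 0.0
THRESHOLD = 0.5


def encode(patient):
    """Turn raw index values into the 0/1 dummy variables."""
    marriage2, marriage3 = MARITAL[patient["marital_status"]]
    return {
        "sex2": 1 if patient["sex"] == "male" else 0,
        "marriage2": marriage2,
        "marriage3": marriage3,
        "smoke2": 1 if patient["smokes"] else 0,
        # 1 when there were two or more acute exacerbations in the last year
        "exacerbation2": 1 if patient["exacerbations_last_year"] >= 2 else 0,
    }


def risk_score(features):
    """Linear risk score: intercept plus the weighted sum of dummy variables."""
    return INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())


def has_depression_or_anxiety(patient, threshold=THRESHOLD):
    """Positive only when the score is strictly greater than the threshold."""
    return risk_score(encode(patient)) > threshold


patient = {
    "sex": "male",
    "marital_status": "married",
    "smokes": True,
    "exacerbations_last_year": 3,
}
print(risk_score(encode(patient)), has_depression_or_anxiety(patient))
```

The strict comparison mirrors the rule in claim 8, under which a score equal to the threshold is judged as not suffering from depression or anxiety.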
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110606089.6A CN113223708A (en) 2021-05-24 2021-05-24 Method for constructing disease risk prediction model and related equipment

Publications (1)

Publication Number Publication Date
CN113223708A (en) 2021-08-06

Family

ID=77082068

Country Status (1)

Country Link
CN (1) CN113223708A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1215472A (en) * 1996-02-02 1999-04-28 史密丝克莱恩比彻姆公司 Method and system for identifying patient at risk for an adverse health outcome
CN107451390A (en) * 2017-02-22 2017-12-08 Cc和I研究有限公司 System for predicting acute exacerbations in patients with chronic obstructive pulmonary disease
CN108597601A (en) * 2018-04-20 2018-09-28 山东师范大学 Diagnosis of chronic obstructive pulmonary disease auxiliary system based on support vector machines and method
CN109215781A (en) * 2018-09-14 2019-01-15 苏州贝斯派生物科技有限公司 A kind of construction method and building system of the Kawasaki disease risk evaluation model based on logistic algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
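王雪 (Wang Xue): "慢性阻塞性肺疾病稳定期患者焦虑抑郁现况及相关影响因素研究" [Study on the current status of anxiety and depression and related influencing factors in patients with stable chronic obstructive pulmonary disease], 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》 [China Master's Theses Full-text Database, Medicine and Health Sciences] *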

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114582511A (en) * 2022-05-07 2022-06-03 中国人民解放军总医院第八医学中心 Bronchiectasis acute exacerbation early warning method, device, equipment and medium
CN114582511B (en) * 2022-05-07 2022-11-15 中国人民解放军总医院第八医学中心 Bronchiectasis acute exacerbation early warning method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination