CN116364274A - Disease prediction method and system based on causal inference and dynamic integration of multiple labels - Google Patents
- Publication number
- CN116364274A (application CN202310268757.8A)
- Authority
- CN
- China
- Legal status
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Pathology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides a disease prediction method and system based on causal inference and dynamic integration of multiple labels. The method comprises: acquiring multi-source information about a patient, including demographic indicators, lifestyle, physical examination results, chief-complaint symptoms, past medical history, and past medication information; establishing a causal model to analyze the causal relationships among the features and select the feature set with causal effects; training a plurality of multi-label base learning classifiers on that feature set and updating their weights through stacking integration to obtain the best-performing prediction models; and dynamically constructing new multi-label integrated prediction models from combinations of different numbers and types of these best-performing models, then selecting the combination with the highest prediction performance to predict the disease. The method reflects causal relationships among features, avoids misjudgments caused by mere correlation, improves prediction accuracy, helps doctors improve diagnosis and treatment outcomes, and promotes the digital and intelligent development of the medical industry.
Description
Technical Field
The invention relates to the technical field of disease prediction, in particular to a disease prediction method and system based on causal inference and dynamic integration of multiple labels.
Background
Currently, prediction of chronic diseases relies mainly on traditional medical and biometric methods. These methods typically apply machine learning or artificial intelligence algorithms, such as support vector machines, decision trees, and neural networks, to sample data sets to predict the risk of disease onset. However, they do not account for the interactions among factors or for how those factors evolve over time, which limits the accuracy and reliability of the predictions.
Traditional machine learning algorithms usually ignore potential causal relationships when predicting disease, which can bias the model. In addition, disease progression is often dynamic, and time plays an important role in prognosis. Disease prediction plays a vital role in clinical practice, yet traditional methods typically focus only on specific disease indicators or symptoms and ignore the complex relationships among different factors. As a result, misjudgment and missed diagnosis often occur during diagnosis and treatment.
Disclosure of Invention
The technical problem to be solved by the invention is therefore to overcome these defects of the prior art by providing a disease prediction method and system based on causal inference and dynamic integration of multiple labels. A causal inference algorithm analyzes the causal relationships among different data to improve prediction accuracy, and a dynamically integrated multi-label algorithm is used for training and prediction to improve robustness and generalization.
The technical scheme for solving the technical problems is as follows:
In a first aspect, the present invention provides a disease prediction method based on causal inference and dynamic integration of multiple labels, comprising the following steps:
acquiring multi-source information of a patient, comprising: demographic index information, lifestyle information, physical examination information, complaint symptoms, past medical history information, and past medication information;
establishing a causal model to analyze causal relations among all the features, and screening feature sets with causal effects;
training a plurality of multi-label base learning classifiers on the feature set with causal effects, and updating weights through stacking integration to obtain the prediction model with optimal performance;
and dynamically constructing new multi-label integrated prediction models by combining different numbers and different types of prediction models with optimal performance, and selecting a combination model with highest prediction performance to predict the disease.
Optionally, the demographic index information includes: sex, age, height, weight; the lifestyle information includes: history of smoking and history of drinking; physical examination information, comprising: biochemical examination, electrocardiogram information and imaging data; the past medical history information includes: patient's own medical history and family medical history.
Optionally, the step of establishing a causal model to analyze causal relationships among the features and screen feature sets with causal effects includes:
constructing a Bayesian network: let $P(U)$ be the joint probability distribution over the variable set $U$, which contains the features and the outcomes $y_n \in L$, $n = 1, \dots, N$, where $N$ is the number of patients and $L = \{l_1, l_2, \dots, l_q\}$ is the set of $q$ distinct binary outcome labels; let $G$ be a directed acyclic graph whose node set is $U$. If $\langle U, G, P(U) \rangle$ satisfies the Markov condition, that is, each variable is independent of any subset of its non-descendants given its parents in $G$, the triplet $\langle U, G, P(U) \rangle$ is called a Bayesian network;
learning the Markov blanket: under the faithfulness assumption, the Markov blanket of each feature $F_i \in F$ in the Bayesian network $BN\langle U, G, P(U) \rangle$, denoted $MB(F_i)$, is unique and given by $MB(F_i) = pa(F_i) \cup ch(F_i) \cup sp(F_i)$, where $F$ is the set of features; $pa(F_i)$ is the parent set of $F_i$, i.e., the variables that directly affect $F_i$; $ch(F_i)$ is the child set of $F_i$, i.e., the variables directly affected by $F_i$; and $sp(F_i)$ is the spouse set of $F_i$, i.e., the other parents of the children of $F_i$, which are related to $F_i$ indirectly;
screening multi-label associated features: maximize the outcome probability $P(T_i \mid S)$, where $S$ is a candidate feature subset. The causal feature selection for data set $D$ is defined as:
$$S^{*} = \arg\max_{S} |S|,$$
$$\text{s.t. } P_i(T_i \mid S) = P'(T_i \mid S), \quad (T_i \in T'_i,\ j \neq i)$$
where $T$ denotes the disease categories that may be output;
repeating the Markov blanket learning and multi-label feature screening steps until the feature distribution probability is maximized for all outcome labels $y \in L$, and selecting the feature set with causal effects.
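The Markov blanket definition above can be made concrete with a small sketch. It assumes the directed acyclic graph is already known and stored as a mapping from each variable to its parents; learning that graph from patient data, which is the hard part of the method, is not shown. The graph and variable names are hypothetical, and taking the union of the labels' blankets is one simple reading of the multi-label screening step, not the patent's exact procedure.

```python
from typing import Dict, Iterable, Set

# Hypothetical DAG for illustration, stored as child -> set of parents.
EXAMPLE_DAG: Dict[str, Set[str]] = {
    "age": set(),
    "smoking": set(),
    "obesity": set(),
    "hypertension": {"age", "smoking"},
    "lvh": {"hypertension", "obesity"},  # left ventricular hypertrophy
    "arrhythmia": {"lvh"},
    "palpitations": {"arrhythmia"},
}


def markov_blanket(dag: Dict[str, Set[str]], node: str) -> Set[str]:
    """MB(node) = pa(node) | ch(node) | sp(node) in a known DAG."""
    pa = set(dag.get(node, set()))
    # Children: variables that list `node` among their parents.
    ch = {v for v, parents in dag.items() if node in parents}
    # Spouses: the other parents of node's children.
    sp = {p for c in ch for p in dag[c]} - {node}
    return pa | ch | sp


def causal_feature_set(dag: Dict[str, Set[str]], labels: Iterable[str]) -> Set[str]:
    """Union of the labels' Markov blankets, excluding the labels themselves."""
    label_set = set(labels)
    features: Set[str] = set()
    for label in label_set:
        features |= markov_blanket(dag, label)
    return features - label_set
```

Here `markov_blanket(EXAMPLE_DAG, "hypertension")` yields `age`, `smoking`, `lvh` and `obesity`; `obesity` enters only as a spouse, because it shares the child `lvh` with `hypertension`.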
Optionally, the process of training a plurality of multi-label base learning classifiers comprises:
initialization: for every patient $i$ and label $l$, initialize the weight $W_1(i, l)$, obtain the initial sample data set $D_1$, and set the iteration counter $t = 1$;
training a base classifier: using the $m$-th base classifier with stacking integration on the current data set ($D_1$ in the first iteration), train a single base classifier $h_{mt}(x, l)$ to predict patient outcomes;
weight update: compute the Hamming loss of the base classifier, i.e., the proportion of misclassified labels $e_t$; compute the update weight $\alpha_t$; and use $\alpha_t$ to compute the next-iteration weights $W_{t+1}(i, l)$;
repeating the integration iteration: set $t = t + 1$ and repeat until the preset number of iterations is reached;
weighted voting of the boosted classifier: combine the single classifiers $h_{mt}$, $t = 1, \dots, T$, by weighted voting to obtain the boosted learning classifier $h_m$.
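The weight-update formulas were not preserved in the source text. Assuming they follow the AdaBoost.MH pattern that the described ingredients suggest (a Hamming-loss error $e_t$, a round weight $\alpha_t$, per-(patient, label) weights and a normalization factor), one round could be sketched as below, with labels coded as $\pm 1$ and $\alpha_t = \tfrac{1}{2}\ln\frac{1-e_t}{e_t}$. Both formulas are assumptions, not the patent's own.

```python
import numpy as np


def boosting_round(W: np.ndarray, Y: np.ndarray, H: np.ndarray):
    """One AdaBoost.MH-style weight update (assumed form).

    W: (N, q) weights over (patient, label) pairs.
    Y, H: (N, q) true labels and base-classifier predictions in {-1, +1}.
    Returns (e_t, alpha_t, W_next).
    """
    mistakes = (Y != H).astype(float)
    # Weighted Hamming loss: share of the total weight on misclassified labels.
    e_t = float((W * mistakes).sum() / W.sum())
    e_t = min(max(e_t, 1e-10), 1.0 - 1e-10)  # keep the log finite
    alpha_t = 0.5 * np.log((1.0 - e_t) / e_t)
    # Y * H is -1 on mistakes, so those weights grow; correct ones shrink.
    W_next = W * np.exp(-alpha_t * Y * H)
    W_next /= W_next.sum()  # normalization factor Z_t
    return e_t, alpha_t, W_next


# Usage: N = 2 patients, q = 2 labels, W_1(i, l) = 1/N as in the embodiment.
W1 = np.full((2, 2), 1 / 2)
Y = np.array([[1, -1], [1, 1]])
H = np.array([[1, -1], [-1, 1]])  # one wrong (patient, label) pair
e1, a1, W2 = boosting_round(W1, Y, H)
```

With one of four equally weighted pairs misclassified, $e_1 = 0.25$ and $\alpha_1 = \tfrac{1}{2}\ln 3$; after normalization the misclassified pair carries weight 0.5 and each correct pair 1/6, so the next base classifier concentrates on the hard case.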
Optionally, the step of dynamically constructing a new multi-label integrated prediction model by combining different numbers and different kinds of base classifiers includes:
initialization: set the data set $D_s$, built from the data with causal effects, to the empty set;
classifying the training samples: use each trained base learning classifier $h_m(x, l)$, $m = 1, \dots, M$, to classify the feature vector $x_1$ of the first patient, obtaining $c_{1m} = h_m(x_1, l)$;
updating data set $D_s$: add $((c_{11}, c_{12}, \dots, c_{1M}), y_1)$ to $D_s$, and repeat the classification of training samples until all $N$ inpatients have been classified, obtaining $c_{nm} = h_m(x_n, l)$ and the new data set $D_s = \{((c_{i1}, c_{i2}, \dots, c_{iM}), y_i)\}_{i=1}^{N}$;
training the integration part of the model: train on the new data set $D_s$, dynamically selecting models from the base learner pool and finally stacking them, using a new learning algorithm $Z$ so that $H = Z(D_s)$;
output: $H(x) = H(h_1(x, l), h_2(x, l), \dots, h_M(x, l))$.
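The construction of $D_s$ is ordinary stacking: each patient's feature vector is replaced by the vector of base-classifier outputs, and a second-level learner $Z$ is fitted on those outputs. The sketch assumes base classifiers are plain callables returning 0/1 label matrices; the accuracy-weighted vote standing in for $Z$ is a hypothetical choice, since the source does not name the new learning algorithm.

```python
from typing import Callable, Sequence

import numpy as np

# A base classifier h_m maps an (N, d) feature matrix to (N, q) 0/1 predictions.
Classifier = Callable[[np.ndarray], np.ndarray]


def build_meta_dataset(base: Sequence[Classifier], X: np.ndarray) -> np.ndarray:
    """C[i, m, l] = h_m(x_i, l), i.e. the c_im entries of D_s; shape (N, M, q)."""
    return np.stack([h(X) for h in base], axis=1)


def fit_meta_learner(C: np.ndarray, Y: np.ndarray) -> Callable[[np.ndarray], np.ndarray]:
    """Hypothetical meta-learner Z: per-label vote weighted by each model's accuracy."""
    acc = (C == Y[:, None, :]).mean(axis=0)  # (M, q): accuracy of model m on label l

    def H(C_new: np.ndarray) -> np.ndarray:
        # Map 0/1 votes to -1/+1, weight them, and threshold the sum at 0.
        score = (acc[None, :, :] * (2 * C_new - 1)).sum(axis=1)
        return (score > 0).astype(int)

    return H


# Toy usage with two hypothetical base classifiers.
def h_perfect(X: np.ndarray) -> np.ndarray:
    return np.column_stack([X[:, 0] > 0, np.abs(X[:, 0]) < 1.5]).astype(int)


def h_ones(X: np.ndarray) -> np.ndarray:
    return np.ones((X.shape[0], 2), dtype=int)


X = np.array([[-1.0], [1.0], [2.0], [-2.0]])
Y = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
C = build_meta_dataset([h_perfect, h_ones], X)
H = fit_meta_learner(C, Y)
Y_hat = H(build_meta_dataset([h_perfect, h_ones], X))  # H(h_1(x,l), ..., h_M(x,l))
```

For brevity the meta-learner is fitted on in-sample predictions; stacking usually fits $Z$ on out-of-fold predictions so that it does not simply trust base classifiers that overfit.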
Optionally, after acquiring the multi-source information of the patient, the method further includes:
preprocessing and cleaning the multi-source information to remove noise and outliers, and performing feature selection and dimensionality reduction.
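The source does not say how noise and outliers are removed. As one common, assumed choice, the sketch drops incomplete records, discards rows outside the interquartile-range fences, and standardizes each column; feature selection and dimensionality reduction are left to the causal screening step and are not shown.

```python
import numpy as np


def clean_features(X: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Drop incomplete rows and IQR outliers, then z-score each column."""
    X = np.asarray(X, dtype=float)
    X = X[~np.isnan(X).any(axis=1)]  # remove rows with missing values
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    # Keep rows whose every column lies within [Q1 - k*IQR, Q3 + k*IQR].
    inside = (X >= q1 - k * iqr) & (X <= q3 + k * iqr)
    X = X[inside.all(axis=1)]
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std


raw = np.array([
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [100.0, 12.0],   # outlier in column 0
    [np.nan, 1.0],   # incomplete record
])
cleaned = clean_features(raw)
```

The multiplier `k = 1.5` is the conventional Tukey fence; clinical variables with legitimately extreme values may need per-feature thresholds instead.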
In a second aspect, embodiments of the present invention provide a disease prediction system based on causal inference and dynamic integration of multiple labels, the system comprising:
a data collection module for obtaining multi-source information of a patient, comprising: demographic index information, lifestyle information, physical examination information, complaint symptoms, past medical history information, and past medication information;
the causal inference module is used for analyzing causal relations among all the features to establish a causal model, and screening feature sets with causal effects based on the causal inference model;
the dynamic integrated multi-label algorithm module, used for training a plurality of multi-label base learning classifiers on the feature set with causal effects, and updating weights through stacking integration to obtain the prediction model with optimal performance;
and the disease prediction model determining module is used for dynamically constructing a new multi-label integrated prediction model through different numbers and different types of prediction model combination modes with optimal performance, and selecting a combination model with highest prediction performance to predict the disease.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor communicatively connected to each other; the memory stores computer instructions, and the processor executes them to perform the method of the first aspect or any optional implementation of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect, or any one of the alternative embodiments of the first aspect.
The disease prediction method and system based on causal inference and dynamic integration of multiple labels provided by the embodiment of the invention can reflect causal relationships among various characteristics by analyzing causal relationships and dynamic changes among different patient characteristics and combining a multiple label classification algorithm, avoid misjudgment caused by correlation, improve disease prediction accuracy, help improve diagnosis level and treatment effect of doctors, and facilitate the digitized and intelligent development of the medical industry. Meanwhile, the disease prediction method and the disease prediction system provided by the invention have wide application prospects, and have important significance in the fields of medical care, health management, medical scientific research and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a disease prediction method based on causal inference and dynamic integration of multiple tags according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a disease prediction method based on causal inference and dynamic integration of multiple tags according to an embodiment of the present invention;
FIG. 3 is a flowchart showing key steps of a causal inference and dynamic integration multi-label based disease prediction method according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a causal inference and dynamic multi-label integrated disease prediction system according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The embodiment of the invention provides a disease prediction method based on causal inference and dynamic integration of multiple labels, wherein a flow chart of the method is shown in fig. 1, and a schematic block diagram is shown in fig. 2:
Step S1: acquire multi-source information of a patient, comprising: demographic index information, lifestyle information, physical examination information, complaint symptoms, past medical history information, and past medication information.
Specifically, in the embodiment of the present invention, the demographic index information includes: sex, age, height, weight, occupation; the lifestyle information includes: history of smoking and history of drinking; physical examination information, comprising: biochemical examination (white blood cells, red blood cells, distribution width of red blood cells, hemoglobin, platelets, etc.), electrocardiographic information, and imaging data; the past medical history information includes: patient's own medical history and family medical history. The complaint symptoms are clinical manifestations of the patient's own initiative, such as chest pain, nausea, palpitations, etc.
Step S2: establish a causal model to analyze the causal relationships among the features, and screen out the feature set with causal effects.
Specifically, in the embodiment of the present invention, the collected patient information is used to construct a multi-label causal feature selection framework according to the causal invariance principle, and the concept of the Markov blanket (MB) in a Bayesian network is used to obtain the feature set with causal effects for the final multi-label data source. This comprises the following steps:
Step S21: construct a Bayesian network: let $P(U)$ be the joint probability distribution over the variable set $U$, which contains the features and the outcomes $y_n \in L$, $n = 1, \dots, N$, where $N$ is the number of patients and $L = \{l_1, l_2, \dots, l_q\}$ is the set of $q$ distinct binary outcome labels; let $G$ be a directed acyclic graph whose node set is $U$. If $\langle U, G, P(U) \rangle$ satisfies the Markov condition, that is, each variable is independent of any subset of its non-descendants given its parents in $G$, the triplet $\langle U, G, P(U) \rangle$ is called a Bayesian network;
Step S22: learn the Markov blanket: under the faithfulness assumption, the Markov blanket of each feature $F_i \in F$ in the Bayesian network $BN\langle U, G, P(U) \rangle$, denoted $MB(F_i)$, is unique and given by $MB(F_i) = pa(F_i) \cup ch(F_i) \cup sp(F_i)$, where $F$ is the set of features; $pa(F_i)$ is the parent set of $F_i$, i.e., the variables that directly affect $F_i$; $ch(F_i)$ is the child set of $F_i$, i.e., the variables directly affected by $F_i$; and $sp(F_i)$ is the spouse set of $F_i$, i.e., the other parents of the children of $F_i$, which are related to $F_i$ indirectly.
Step S23: screen multi-label associated features: maximize the outcome probability $P(T_i \mid S)$, where $S$ is a candidate feature subset. The causal feature selection for data set $D$ is defined as:
$$S^{*} = \arg\max_{S} |S|,$$
$$\text{s.t. } P_i(T_i \mid S) = P'(T_i \mid S), \quad (T_i \in T'_i,\ j \neq i)$$
where $T$ denotes the disease categories that may be output;
Step S24: repeat steps S22 and S23 until the probability of the most effective feature distribution is maximized for all outcome labels $y \in L$, and select the feature set with causal effects.
Steps S21 and S22 build a causal network graph to form causal chains and select the features that may have a causal effect on the predicted diseases; step S23 combines the potential disease labels with the screened variables and keeps screening for features with causal effects on the disease labels. This process effectively reduces the dimensionality of the patient information and retains only features with causal effects for subsequent model construction, ensuring that the selected features have real causal effects on the labels.
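The screening in steps S21 to S24 rests on two standard properties of Bayesian networks, stated here for reference rather than taken from the source. The Markov condition lets the joint distribution factorize over each variable's parents, and the set $pa \cup ch \cup sp$ shields a variable from every other variable in $U$ (faithfulness makes it the unique minimal such set). Applied to a label node $T_i$, this means $MB(T_i)$ carries all the information in $U$ relevant to predicting $T_i$, which is why dropping the remaining features loses nothing:

$$P(U) = \prod_{V \in U} P\bigl(V \mid pa(V)\bigr)$$

$$T_i \perp\!\!\!\perp U \setminus \bigl(MB(T_i) \cup \{T_i\}\bigr) \mid MB(T_i) \quad\Longrightarrow\quad P\bigl(T_i \mid MB(T_i)\bigr) = P\bigl(T_i \mid U \setminus \{T_i\}\bigr)$$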
Step S3: train a plurality of multi-label base learning classifiers on the feature set with causal effects, and update weights through stacking integration to obtain the prediction model with optimal performance.
The goal of step S3 is to find the underlying multi-label classifiers that improve predictive performance. Because there are many multi-label algorithms, choosing the base classifier is difficult. In the embodiment of the invention, the base classifiers are built from four multi-label models whose performance is currently relatively stable, BR, CC, LP and RAkEL, with C4.5 as the meta-classifier, and the optimal prediction model is obtained by updating weights through stacking integration (Stacking). For $m = 1, \dots, M$, the embodiment trains $M$ multi-label base learning classifiers through the following steps:
Step S31: initialization: for every patient $i$ and label $l$, initialize the weight $W_1(i, l) = 1/N$, obtain the initial sample data set $D_1$, and set the iteration counter $t = 1$;
Step S32: train a base classifier: using the $m$-th base classifier with stacking integration on the current data set ($D_1$ in the first iteration), train a single base classifier $h_{mt}(x, l)$ to predict patient outcomes;
Step S33: weight update: compute the Hamming loss of the base classifier, i.e., the proportion of misclassified labels $e_t$ (the smaller the value, the better the model's predictions); compute the update weight $\alpha_t$; and use $\alpha_t$ to compute the next-iteration weights $W_{t+1}(i, l)$, divided by a normalization factor, where $y[l]$ indicates whether label $l$ belongs to instance $(x, y)$.
Step S34: repeat the integration iteration: set $t = t + 1$ and repeat until the preset number of iterations $T$ is reached;
Step S35: weighted voting of the boosted classifier: combine the single classifiers $h_{mt}$, $t = 1, \dots, T$, by weighted voting to obtain the boosted learning classifier $h_m$, which represents the optimal prediction model.
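Step S35 combines the $T$ rounds by weighted voting. Assuming the usual boosting rule $h_m(x,l) = \operatorname{sign}\bigl(\sum_{t=1}^{T} \alpha_t h_{mt}(x,l)\bigr)$ with $\pm 1$ labels (the source's own formula was not preserved), the vote reduces to a weighted sum over rounds:

```python
import numpy as np


def weighted_vote(alphas: np.ndarray, round_preds: np.ndarray) -> np.ndarray:
    """Combine T boosting rounds: h_m(x, l) = sign(sum_t alpha_t * h_mt(x, l)).

    alphas: (T,) round weights; round_preds: (T, N, q) predictions in {-1, +1}.
    Ties (score exactly 0) are resolved to +1.
    """
    scores = np.tensordot(alphas, round_preds, axes=1)  # sum over T -> (N, q)
    return np.where(scores >= 0, 1, -1)


alphas = np.array([1.0, 0.2, 0.2])
round_preds = np.array([[[1, -1]], [[-1, 1]], [[-1, 1]]])  # T=3, N=1, q=2
h_m = weighted_vote(alphas, round_preds)
```

One confident round ($\alpha = 1.0$) outvotes two weak rounds ($\alpha = 0.2$ each), so the combined prediction follows the first round.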
Step S4: dynamically construct new multi-label integrated prediction models by combining different numbers and types of the best-performing prediction models, and select the combination with the highest prediction performance to predict the disease.
The embodiment of the invention dynamically builds a new multi-label integrated prediction model, which comprises the following steps:
Step S41: initialization: set the data set $D_s$, built from the data with causal effects, to the empty set;
Step S42: classify the training samples: use each trained base learning classifier $h_m(x, l)$, $m = 1, \dots, M$, to classify the feature vector $x_1$ of the first patient, obtaining $c_{1m} = h_m(x_1, l)$;
Step S43: update data set $D_s$: add $((c_{11}, c_{12}, \dots, c_{1M}), y_1)$ to $D_s$, and repeat the classification of training samples until all $N$ inpatients have been classified, obtaining $c_{nm} = h_m(x_n, l)$ and the new data set $D_s = \{((c_{i1}, c_{i2}, \dots, c_{iM}), y_i)\}_{i=1}^{N}$;
Step S44: train the integration part of the model: train on the new data set $D_s$, dynamically selecting models from the base learner pool and finally stacking them, using a new learning algorithm $Z$ so that $H = Z(D_s)$;
Step S45: output: $H(x) = H(h_1(x, l), h_2(x, l), \dots, h_M(x, l))$.
This process dynamically builds new multi-label integrated prediction models from combinations of different numbers and types of base classifiers, selects the best-performing combination, and uses it to diagnose the patient's disease so as to maximize prediction accuracy.
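Selecting the combination with the highest prediction performance can be read as a search over subsets of the base-classifier pool. The sketch assumes the combined prediction is a majority vote and the score is validation Hamming loss; the source specifies neither, and the predictions below are toy data keyed by the model names BR, CC and LP from step S3.

```python
from itertools import combinations
from typing import Dict, Tuple

import numpy as np


def hamming_loss(Y_true: np.ndarray, Y_pred: np.ndarray) -> float:
    """Fraction of (patient, label) pairs predicted incorrectly."""
    return float((Y_true != Y_pred).mean())


def majority_vote(preds: np.ndarray) -> np.ndarray:
    """preds: (k, N, q) 0/1 predictions; a label is 1 if at least half vote 1."""
    return (preds.mean(axis=0) >= 0.5).astype(int)


def select_best_combination(
    val_preds: Dict[str, np.ndarray], Y_val: np.ndarray
) -> Tuple[Tuple[str, ...], float]:
    """Try every non-empty subset of the pool; keep the lowest validation Hamming loss."""
    names = sorted(val_preds)
    best: Tuple[Tuple[str, ...], float] = ((), float("inf"))
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            Y_hat = majority_vote(np.stack([val_preds[n] for n in combo]))
            loss = hamming_loss(Y_val, Y_hat)
            if loss < best[1]:  # strict: ties keep the smaller, earlier subset
                best = (combo, loss)
    return best


Y_val = np.array([[1, 0], [0, 1], [1, 1]])
pool = {
    "BR": np.array([[1, 0], [0, 1], [0, 1]]),  # one label wrong
    "CC": Y_val.copy(),                         # all correct
    "LP": np.zeros_like(Y_val),                 # predicts no labels
}
best_combo, best_loss = select_best_combination(pool, Y_val)
```

Exhaustive search costs $2^M - 1$ evaluations, only 15 for the four models BR, CC, LP and RAkEL; a much larger pool would call for greedy forward selection instead.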
The key steps of the disease prediction method based on causal inference and dynamic integration of multiple labels provided by the embodiment of the invention are shown in FIG. 3. The method has good universality and adaptability: it can be applied to a variety of disease prediction scenarios and can adaptively update the prediction model as the data change, maintaining prediction accuracy and efficiency.
An embodiment of the present invention provides a disease prediction system based on causal inference and dynamic integration of multiple tags, as shown in fig. 4, the system includes:
a data collection module for obtaining multi-source information of a patient, comprising: demographic index information, lifestyle information, physical examination information, complaint symptoms, past medical history information, and past medication information. Details refer to the related description of step S1 in the above method embodiment, and will not be described herein.
The data processing module is used for preprocessing and cleaning the multi-source information, removing noise and abnormal values, performing feature selection and dimension reduction operation, and preparing for the construction of a subsequent disease prediction model.
And the causal inference module is used for analyzing causal relations among all the features to establish a causal model, and screening feature sets with causal effects based on the causal inference model. For details, refer to the related description of step S2 in the above method embodiment, and no further description is given here.
And the dynamic integration multi-label module is used for training a plurality of multi-label-based learning classifiers by utilizing the feature set with the causal effect, and updating weights through stacking integration to obtain a prediction model with optimal performance. For details, refer to the related description of step S3 in the above method embodiment, and no further description is given here.
And the disease prediction model determining module is used for dynamically constructing a new multi-label integrated prediction model through different numbers and different types of prediction model combination modes with optimal performance, and selecting a combination model with highest prediction performance to predict the disease. For details, see the description of step S4 in the above method embodiment, and the details are not repeated here.
The interface display module is used for displaying the disease prediction result to the user, so that the user can more conveniently obtain the prediction result through an intuitive user interface and make a corresponding clinical decision.
In a preferred embodiment, the system provided in the embodiment of the present invention further includes a disease prediction model update module comprising an incremental learning unit, which adjusts the weights of new and old samples using a sample-importance-based method, and a transfer learning unit, which uses the existing prediction model to predict new diseases. By learning from newly received data, the module maintains disease prediction accuracy over time.
In practical application, an example of application of the disease prediction system provided by the embodiment of the present invention is adopted:
1. Patient A, female, 45 years old, height 160 cm, weight 70 kg; no smoking history, drinking history, no family history. Chief complaint: recent palpitations, shortness of breath and similar symptoms. An electrocardiographic examination revealed an arrhythmia. The biochemical examination results were as follows: white blood cells 8.5×10⁹/L, red blood cells 4.1×10¹²/L, red blood cell distribution width 13.6%, hemoglobin 131 g/L, platelets 226×10⁹/L. The imaging data show left ventricular hypertrophy.
Using the system provided by the embodiment of the present invention, the doctor inputs Patient A's sex, age, height, weight, smoking history, drinking history, family history, clinical manifestations (symptoms such as palpitations and shortness of breath), biochemical examination results (white blood cells, red blood cell distribution width, hemoglobin, platelets, etc.), electrocardiogram information, imaging information and so on. The system screens out the features related to the diseases, then performs model training and prediction, and finally obtains the following disease prediction results:
Hypertension: probability of illness 70%
Arrhythmia: probability of illness 60%
Coronary heart disease: probability of illness 40%
The doctor then evaluates the prediction results together with Patient A's symptoms, signs, examination results and other information, makes a final diagnosis of arrhythmia, and formulates a corresponding treatment plan, including drug therapy and lifestyle adjustment.
2. In an intelligent medical device, sensors monitor the patient's physiological indicators (such as heart rate, blood pressure and blood oxygen saturation) and transmit the data to a cloud server integrated with the disease prediction system provided by this embodiment for analysis and diagnosis; depending on the diagnosis result, the device automatically sends an alarm or reminder to the doctor when necessary.
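Because the system is multi-label, each disease label receives its own probability, so the three values reported for Patient A need not sum to 100% (here they sum to 170%). A minimal sketch of turning such per-label probabilities into a candidate label set, assuming a hypothetical decision threshold of 0.5 that the patent does not specify:

```python
from typing import Dict, List

# Per-label probabilities reported for Patient A in example 1.
patient_a = {"Hypertension": 0.70, "Arrhythmia": 0.60, "Coronary heart disease": 0.40}


def labels_above(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose probability reaches the (assumed) threshold."""
    return [label for label, p in probs.items() if p >= threshold]


print(labels_above(patient_a))  # ['Hypertension', 'Arrhythmia']
```

Several labels can be flagged at once; as in the example, the doctor then decides which of them is the actual diagnosis.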
Fig. 5 shows a schematic structural diagram of a computer device according to an embodiment of the present invention, including a processor 901 and a memory 902. The processor 901 and the memory 902 may be connected by a bus or in another manner; Fig. 5 takes a bus connection as an example.
The processor 901 may be a central processing unit (Central Processing Unit, CPU). The processor 901 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or a combination thereof.
As a non-transitory computer-readable storage medium, the memory 902 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the above method embodiments. By running the non-transitory software programs, instructions, and modules stored in the memory 902, the processor 901 executes its various functional applications and data processing, i.e., implements the methods in the above method embodiments.
The memory 902 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for a function; the storage data area may store data created by the processor 901, and the like. In addition, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 902 optionally includes memory remotely located relative to processor 901, which may be connected to processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902 that, when executed by the processor 901, perform the methods of the method embodiments described above.
The specific details of the computer device may be correspondingly understood by referring to the corresponding related descriptions and effects in the above method embodiments, which are not repeated herein.
Those skilled in the art will appreciate that all or part of the processes in the above method embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations are within the scope of the invention as defined by the appended claims.
Claims (10)
1. A disease prediction method based on causal inference and dynamic integration of multiple labels, comprising the steps of:
acquiring multi-source information of a patient, comprising: demographic index information, lifestyle information, physical examination information, complaint symptoms, past medical history information, and past medication information;
establishing a causal model to analyze causal relations among all the features, and screening feature sets with causal effects;
training a plurality of multi-label base learning classifiers by using the feature set with causal effects, and updating weights through stacking integration to obtain a prediction model with optimal performance;
and dynamically constructing new multi-label integrated prediction models by combining different numbers and different types of the optimal-performance prediction models, and selecting the combined model with the highest prediction performance to predict the disease.
2. The disease prediction method based on causal inference and dynamic integration of multiple labels according to claim 1, wherein the demographic index information comprises: sex, age, height and weight; the lifestyle information comprises: smoking history and drinking history; the physical examination information comprises: biochemical examination, electrocardiogram information and imaging data; and the past medical history information comprises: the patient's own medical history and family medical history.
3. The disease prediction method based on causal inference and dynamic integration of multiple labels according to claim 1, wherein the step of establishing a causal model to analyze causal relations among all the features and screening feature sets with causal effects comprises:
constructing a Bayesian network: let $P(U)$ be the joint probability distribution over the outcomes $y \in L$ of patients $n = 1, \dots, N$, where $N$ is the number of patients and $L = \{l_1, l_2, \dots, l_q\}$ is the set of $q$ different binary outcome labels, and let $U$ be the set of nodes of the directed acyclic graph $G$; if $\langle U, G, P(U) \rangle$ satisfies the Markov condition, i.e., each variable is independent of any subset of its non-descendants given its parents in $G$, the triplet $\langle U, G, P(U) \rangle$ is called a Bayesian network;
training a Markov chain: given a Bayesian network $BN\langle U, G, P(U) \rangle$ satisfying the faithfulness assumption, the Markov blanket of each feature $F_i \in F$, denoted $MB(F_i)$, is unique and is given by $MB(F_i) = pa(F_i) \cup ch(F_i) \cup sp(F_i)$, where $F$ denotes the set of features; $pa(F_i)$ is the parent set of $F_i$, i.e., the set of variables that directly affect $F_i$; $ch(F_i)$ is the child set of $F_i$, i.e., the set of variables directly affected by $F_i$; and $sp(F_i)$ is the spouse set of $F_i$, i.e., the other parents of the children of $F_i$, which have an indirect influence relationship with $F_i$;
screening multi-label associated features: maximize the outcome probability $P(T_i \mid S)$, where $S$ denotes the selected feature subset; causal feature selection for the dataset $D$ is defined as:
$S^* = \arg\max |S|$,
s.t. $P_i(T_i \mid S) = P'(T_i \mid S)$, $(T_i \in T'_i,\ j \neq i)$
where $T$ denotes the disease categories that may be output;
repeating the process of training the Markov chain and screening the multi-label associated features until the feature distribution probability is maximized for all outcome labels $y \in L$, and selecting the feature set with causal effects.
4. The disease prediction method based on causal inference and dynamic integration of multiple labels according to claim 3, wherein the process of training a plurality of multi-label base learning classifiers comprises:
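When the DAG $G$ is already known, $MB(F_i) = pa(F_i) \cup ch(F_i) \cup sp(F_i)$ can be read directly off the graph, and a multi-label causal feature set can be formed as the union, over all outcome labels, of the features in each label's Markov blanket. The sketch below assumes the graph is supplied as a hypothetical parent map with illustrative node names (`F1` to `F6`, `y1`, `y2`). In the claimed method the structure is learned from patient data and screened with the probability constraint in claim 3, which this sketch does not implement.

```python
from typing import Dict, Iterable, Set

# Hypothetical DAG as a parent map: node -> set of its parents.
Parents = Dict[str, Set[str]]


def markov_blanket(parents: Parents, node: str) -> Set[str]:
    """MB(node) = parents | children | spouses (other parents of its children)."""
    pa = set(parents.get(node, set()))
    ch = {v for v, ps in parents.items() if node in ps}
    sp = {p for c in ch for p in parents[c]} - {node}
    return pa | ch | sp


def causal_feature_set(parents: Parents, features: Iterable[str],
                       labels: Iterable[str]) -> Set[str]:
    """Union over outcome labels of the features lying in each label's blanket."""
    feats = set(features)
    selected: Set[str] = set()
    for y in labels:
        selected |= markov_blanket(parents, y) & feats
    return selected


# Illustrative graph: F1, F2 -> y1; F1 -> y2; y1 -> F3 <- F4; y2 -> F5.
dag: Parents = {
    "y1": {"F1", "F2"},
    "y2": {"F1"},
    "F3": {"y1", "F4"},  # F4 is a spouse of y1 through their common child F3
    "F5": {"y2"},
}
features = ["F1", "F2", "F3", "F4", "F5", "F6"]
print(sorted(causal_feature_set(dag, features, ["y1", "y2"])))
# ['F1', 'F2', 'F3', 'F4', 'F5']  (F6 is unconnected to either label)
```

Taking the union across labels keeps a feature if it is causally relevant to at least one disease label, which matches the multi-label goal of covering all outcomes $y \in L$.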
initializing: for all patient individuals $i$, initializing the weight $W_1(i, l)$ of each patient and obtaining the initial sample dataset $D_1$, where $l$ denotes a label; setting the iteration number $t = 1$;
training a base classifier: applying the $m$-th base classifier to the dataset $D_1$ for stacking integration, and training a single base classifier $h_{m1}(x, l)$ to predict patient outcomes;
updating weights: calculating the Hamming loss of the base classifier, i.e., the proportion $e_t$ of misclassified labels; calculating the update weight $\alpha_t$, and using $\alpha_t$ to calculate the weights $W_{t+1}(i, l)$ for the next iteration;
repeating the integration iteration: setting $t = t + 1$ and repeating until the preset number of iterations is reached;
weighted voting of the boosted classifier: combining the single classifiers $h_{mt}$, $t = 1, \dots, T$, by weighted voting to obtain the boosted classifier $h_m$.
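Claim 4 does not give the formulas for $e_t$, $\alpha_t$ or $W_{t+1}(i, l)$. A minimal sketch, assuming an AdaBoost-style update applied to the weighted Hamming loss (in the spirit of AdaBoost.MH): labels are coded as ±1, $\alpha_t = \tfrac{1}{2}\ln\frac{1-e_t}{e_t}$, and weights are multiplied by $\exp(-\alpha_t y h)$ and renormalized. A hypothetical per-label decision stump stands in for the $m$-th base classifier. These choices are illustrative, not taken from the patent.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: List[float]) -> int:
    j, theta, s = stump
    return s if x[j] > theta else -s


def fit_stump(X: List[List[float]], y: List[int], w: List[float]) -> Stump:
    """Weighted decision stump for one label (hypothetical base learner)."""
    best, best_err = (0, 0.0, 1), float("inf")
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        # Candidate thresholds: below the minimum and midpoints between values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for theta in thresholds:
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, theta, s), xi) != yi)
                if err < best_err:
                    best, best_err = (j, theta, s), err
    return best


def boost_multilabel(X: List[List[float]], Y: List[List[int]], T: int = 10,
                     eps: float = 1e-10) -> List[Tuple[float, List[Stump]]]:
    N, q = len(X), len(Y[0])
    # Initialization: uniform weight W_1(i, l) over all (patient, label) pairs.
    W = [[1.0 / (N * q)] * q for _ in range(N)]
    ensemble = []  # (alpha_t, one stump per label) for t = 1..T
    for _ in range(T):
        # Base classifier h_t(x, l): one stump per label on the current weights.
        stumps = [fit_stump(X, [Y[i][l] for i in range(N)], [W[i][l] for i in range(N)])
                  for l in range(q)]
        # e_t: weighted Hamming loss, the weight mass of misclassified (i, l) pairs.
        e_t = sum(W[i][l] for i in range(N) for l in range(q)
                  if stump_predict(stumps[l], X[i]) != Y[i][l])
        e_t = min(max(e_t, eps), 1 - eps)  # avoid division by zero / log(0)
        alpha_t = 0.5 * math.log((1 - e_t) / e_t)
        ensemble.append((alpha_t, stumps))
        # W_{t+1}: raise weight on misclassified pairs, lower it on correct ones.
        for i in range(N):
            for l in range(q):
                W[i][l] *= math.exp(-alpha_t * Y[i][l] * stump_predict(stumps[l], X[i]))
        Z = sum(map(sum, W))
        W = [[w / Z for w in row] for row in W]
    return ensemble


def predict(ensemble: List[Tuple[float, List[Stump]]], x: List[float]) -> List[int]:
    # Weighted vote over t = 1..T: h_m(x, l) = sign(sum_t alpha_t * h_t(x, l)).
    q = len(ensemble[0][1])
    return [1 if sum(a * stump_predict(st[l], x) for a, st in ensemble) >= 0 else -1
            for l in range(q)]


X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
Y = [[-1, 1], [-1, 1], [1, -1], [1, -1]]  # two labels per patient
model = boost_multilabel(X, Y, T=5)
print(predict(model, [0.95, 0.05]))  # [1, -1]
```

Clamping $e_t$ keeps $\alpha_t$ finite when a round classifies every label correctly, as happens on this toy data.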
5. The disease prediction method based on causal inference and dynamic integration of multiple labels according to claim 4, wherein the step of dynamically constructing a new multi-label integrated prediction model by combining different numbers and different kinds of base classifiers comprises:
initializing: setting the dataset $D_s$, built on the features with causal effects, to the empty set;
classifying the training samples: using the trained base learning classifiers $h_m(x, l)$ to classify the features $x_1$, obtaining $c_{1m} = h_m(x_1, l)$;
updating the dataset $D_s$: $D_s = \{c_{11}, c_{12}, \dots, c_{1m}, y\}$; repeating the classification of the training samples until all $N$ inpatients have been classified and predicted, obtaining $c_{nm} = h_m(x_n, l)$ and thus the new dataset $D_s = \{((c_{i1}, c_{i2}, \dots, c_{im}), y)\}$;
training the integration part of the model: using the new dataset $D_s$ and the results of the trained models, dynamically selecting models from the base-learner pool and finally stacking and integrating them, with a new learning algorithm $Z$ used in this part so that $H = Z(D_s)$;
outputting: $H(x) = H(h_1(x, l), h_2(x, l), \dots, h_m(x, l))$.
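Claim 5 leaves the meta-learning algorithm $Z$ and the selection criterion open. A minimal sketch under three assumptions: (i) base classifiers are Python callables $h_m(x, l)$ returning 0/1; (ii) a lookup-table combiner in the style of the behaviour-knowledge-space method stands in for $Z$; and (iii) every non-empty subset of the base-learner pool is tried, keeping the one with the best Hamming accuracy on a validation split. All function names and the toy classifiers are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A base classifier h_m(x, l) returns a 0/1 prediction for label l.
BaseClf = Callable[[Sequence[float], int], int]


def meta_features(pool: Sequence[BaseClf], x: Sequence[float], l: int) -> Tuple[int, ...]:
    """Row of D_s for one (patient, label): (c_1, ..., c_m) with c_m = h_m(x, l)."""
    return tuple(h(x, l) for h in pool)


def fit_meta(pool: Sequence[BaseClf], X: List[List[float]], Y: List[List[int]]) -> BaseClf:
    """Stand-in for Z: map each (label, base-output tuple) seen in D_s to its
    majority true label (a behaviour-knowledge-space style combiner)."""
    votes: Dict[Tuple[int, Tuple[int, ...]], Counter] = defaultdict(Counter)
    for x, y in zip(X, Y):
        for l, y_l in enumerate(y):
            votes[(l, meta_features(pool, x, l))][y_l] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}

    def H(x: Sequence[float], l: int) -> int:
        c = meta_features(pool, x, l)
        # Unseen output tuples fall back to a plain majority vote of the pool.
        return table.get((l, c), int(2 * sum(c) > len(c)))

    return H


def hamming_accuracy(H: BaseClf, X: List[List[float]], Y: List[List[int]]) -> float:
    q = len(Y[0])
    hits = sum(H(x, l) == y[l] for x, y in zip(X, Y) for l in range(q))
    return hits / (len(X) * q)


def select_best_combination(pool: Sequence[BaseClf], X_tr, Y_tr, X_val, Y_val
                            ) -> Tuple[float, Tuple[int, ...], Optional[BaseClf]]:
    """Try every subset (different numbers and kinds of base classifiers)."""
    best: Tuple[float, Tuple[int, ...], Optional[BaseClf]] = (-1.0, (), None)
    for k in range(1, len(pool) + 1):
        for idx in combinations(range(len(pool)), k):
            H = fit_meta([pool[i] for i in idx], X_tr, Y_tr)
            score = hamming_accuracy(H, X_val, Y_val)
            if score > best[0]:
                best = (score, idx, H)
    return best


def h_good(x, l): return int(x[l] > 0.5)   # tracks the true rule
def h_zero(x, l): return 0                 # uninformative
def h_flip(x, l): return int(x[l] <= 0.5)  # inverted, still informative


X_tr = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6], [0.1, 0.3]]
Y_tr = [[1, 0], [0, 1], [1, 1], [0, 0]]
X_val, Y_val = [[0.6, 0.4], [0.3, 0.9]], [[1, 0], [0, 1]]
score, idx, H = select_best_combination([h_good, h_zero, h_flip], X_tr, Y_tr, X_val, Y_val)
print(score, idx)  # 1.0 (0,)
```

Exhaustive subset search grows as $2^m$, so it only suits a small base-learner pool; larger pools would need a greedy or validation-driven pruning strategy.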
6. The disease prediction method based on causal inference and dynamic integration of multiple labels according to claim 5, further comprising, after obtaining the multi-source information of the patient:
and preprocessing and cleaning the multi-source information, removing noise and abnormal values, and performing feature selection and dimension reduction operation.
7. A disease prediction system based on causal inference and dynamic integration of multiple labels, comprising:
a data collection module for obtaining multi-source information of a patient, comprising: demographic index information, lifestyle information, physical examination information, complaint symptoms, past medical history information, and past medication information;
the causal inference module is used for analyzing causal relations among all the features to establish a causal model, and screening feature sets with causal effects based on the causal inference model;
the dynamic integrated multi-label algorithm module is used for training a plurality of multi-label base learning classifiers by using the feature set with causal effects, and updating weights through stacking integration to obtain a prediction model with optimal performance;
and the disease prediction model determining module is used for dynamically constructing new multi-label integrated prediction models by combining different numbers and different types of the optimal-performance prediction models, and selecting the combined model with the highest prediction performance to predict the disease.
8. The disease prediction system based on causal inference and dynamic integration of multiple labels according to claim 7, further comprising:
the data processing module is used for preprocessing and cleaning the multi-source information, removing noise and abnormal values, and performing feature selection and dimension reduction operation;
the interface display module is used for displaying the disease prediction result to a user;
a disease prediction model update module comprising an incremental learning unit and a transfer learning unit, wherein the incremental learning unit adjusts the weights of new and old samples by a sample-importance-based method, and the transfer learning unit uses the existing prediction model to predict new diseases.
9. An electronic device, comprising:
a memory and a processor communicatively connected to each other, the memory storing computer instructions which, when executed by the processor, perform the disease prediction method based on causal inference and dynamic integration of multiple labels according to any one of claims 1-6.
10. A computer-readable storage medium storing computer instructions for causing a computer to perform the disease prediction method based on causal inference and dynamic integration of multiple labels according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310268757.8A CN116364274A (en) | 2023-03-16 | 2023-03-16 | Disease prediction method and system based on causal inference and dynamic integration of multiple labels |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310268757.8A CN116364274A (en) | 2023-03-16 | 2023-03-16 | Disease prediction method and system based on causal inference and dynamic integration of multiple labels |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116364274A (en) | 2023-06-30 |
Family
ID=86918383
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310268757.8A Pending CN116364274A (en) | 2023-03-16 | 2023-03-16 | Disease prediction method and system based on causal inference and dynamic integration of multiple labels |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116364274A (en) |
- 2023-03-16 CN CN202310268757.8A patent/CN116364274A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116779044A (en) * | 2023-07-04 | 2023-09-19 | 重庆大学 | Gene classification method, system and equipment based on multi-tag feature selection |
CN117352162A (en) * | 2023-10-24 | 2024-01-05 | 重庆邮电大学 | Disease factor data processing method based on double-rule causal feature selection |
CN117409978A (en) * | 2023-12-15 | 2024-01-16 | 贵州大学 | Disease prediction model construction method, system, device and readable storage medium |
CN117409978B (en) * | 2023-12-15 | 2024-04-19 | 贵州大学 | Disease prediction model construction method, system, device and readable storage medium |
CN117457153A (en) * | 2023-12-26 | 2024-01-26 | 深圳市龙岗区第三人民医院 | Intelligent recommendation system and method for nursing in psychiatric house |
CN117809854A (en) * | 2023-12-29 | 2024-04-02 | 重庆邮电大学 | Dangerous factor causal relation extraction method based on medical causal knowledge embedding |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Islam et al. | Chronic kidney disease prediction based on machine learning algorithms | |
WO2021120936A1 (en) | Chronic disease prediction system based on multi-task learning model | |
CN116364274A (en) | Disease prediction method and system based on causal inference and dynamic integration of multiple labels | |
CN111492437A (en) | Method and system for supporting medical decision | |
US20220084633A1 (en) | Systems and methods for automatically identifying a candidate patient for enrollment in a clinical trial | |
Kumar et al. | Medical big data mining and processing in e-healthcare | |
Ghasemieh et al. | A novel machine learning model with Stacking Ensemble Learner for predicting emergency readmission of heart-disease patients | |
Rattan et al. | Artificial intelligence and machine learning: what you always wanted to know but were afraid to ask | |
Sahoo et al. | Heart failure prediction using machine learning techniques | |
Zahid et al. | Mortality prediction with self normalizing neural networks in intensive care unit patients | |
Chinnasamy et al. | Machine learning based cardiovascular disease prediction | |
Popkes et al. | Interpretable outcome prediction with sparse Bayesian neural networks in intensive care | |
Koyi et al. | A research survey on state of the art heart disease prediction systems | |
Sampath et al. | Ensemble Nonlinear Machine Learning Model for Chronic Kidney Diseases Prediction | |
Islam et al. | Cardiovascular Disease Prediction Using Machine Learning Approaches | |
CN113012808B (en) | Health prediction method | |
Singh et al. | Real-Time Symptomatic Disease Predictor Using Multi-Layer Perceptron | |
Miriyala et al. | A review on recent machine learning algorithms used in CAD diagnosis | |
Kaur et al. | A Systematic Review of Medical Expert Systems for Cardiac Arrest Prediction | |
Belaala | Big Data analytics using Artificial Intelligence techniques in medical PHM | |
Zhang et al. | Cardiac arrhythmia classification with rejection of ECG recordings based on uncertainty estimation from deep neural networks | |
Saeidi et al. | Artificial intelligence and clinical decision making: approaches and challenges | |
An et al. | PARSE: A personalized clinical time-series representation learning framework via abnormal offsets analysis | |
US12094582B1 (en) | Intelligent healthcare data fabric system | |
AU2021102832A4 (en) | System & method for automatic health prediction using fuzzy based machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||