CN118335319A - Early prediction method for common major diseases based on virtual person simulation - Google Patents

Publication number
CN118335319A
Authority
CN
China
Prior art keywords
partition
variables
prediction
virtual human
bayesian network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410749228.4A
Other languages
Chinese (zh)
Other versions
CN118335319B (en)
Inventor
张韬
李佳圆
温晓玲
伍东升
陈苗双
胡琳
陈馨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202410749228.4A
Publication of CN118335319A
Application granted
Publication of CN118335319B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention belongs to the technical field of disease prediction and discloses a method for early prediction of common major diseases based on virtual human simulation. The method constructs a virtual human prediction model based on a dynamic Bayesian network through structure learning and parameter learning, and performs early prediction of common major diseases with that model. The invention combines the multidimensional dynamic molecular change characteristics observed over the course of a longitudinal healthy-aging cohort, builds a "virtual human" simulation prediction model based on a machine-learning-enhanced dynamic Bayesian network model, group learning and related techniques, reveals the "multi-cause, multi-effect" joint effect of complex exposures and phenotypic characteristics on multiple health outcomes, screens novel, biologically meaningful markers of healthy aging, and improves the sensitivity and accuracy of early risk prediction for major diseases during aging through the complementarity of omics information at different levels.

Description

Early prediction method for common major diseases based on virtual person simulation
Technical Field
The invention belongs to the technical field of disease prediction, and particularly relates to a common major disease early prediction method based on virtual human simulation.
Background
Risk factors for common major diseases include age, genetic factors, lifestyle, environmental factors, chronic disease history and psychological factors; the factors that influence onset are wide-ranging and their relationships are relatively complex. The analysis framework shown in Fig. 2 provides a systematic method for evaluating evidence on preventive medical services and gives important guidance for constructing an accurate risk assessment model. With the rapid development of high-throughput multi-omics technologies such as genomics, proteomics and metabolomics, it has become possible to elucidate, at the molecular level, the complex mechanisms underlying the onset and progression of common major diseases. Studies have shown that genetic factors also play an important role in the onset and progression of common major diseases. Studying the risk of common major diseases at the molecular level and then constructing a multi-omics risk assessment model therefore makes it possible to predict the phenotype of common major diseases and to stratify their risk, laying a foundation for precise prevention and diagnosis.
Traditional single-omics studies provide important information for screening novel molecular markers of common major diseases during aging, but the biological process information they provide is often quite limited, and so is their overall ability to explain the complex aging process. Organically integrating information across different levels, such as genes, proteins, metabolism, lipids and microbiota, not only provides more evidence for explaining the biological mechanisms of aging but also facilitates in-depth mining of novel molecular markers related to common major diseases during aging. It is therefore necessary to fuse multidimensional information such as genomics, metabolomics, dietary patterns and nutritional status, build a risk assessment model for the onset of common major diseases, identify high-risk groups, dynamically assess an individual's future risk of common major diseases, and provide personalized, early, targeted preventive intervention for those high-risk groups.
Disclosure of Invention
The invention aims to provide a method for early predicting common major diseases based on virtual human simulation, which adopts a dynamic Bayesian network model based on real causal relationship and machine learning as a virtual human prediction model, can carry out stage prediction on the major diseases, and provides important reference basis for risk assessment and prevention and control of the common major diseases.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The early prediction method for the common major diseases based on the virtual human simulation comprises a virtual human prediction model based on a dynamic Bayesian network and constructed through structure learning and parameter learning, and early prediction is carried out on the common major diseases through the virtual human prediction model;
The structure learning is used for screening out factors related to risk prediction from potential factors, and determining the topological structure of a dynamic Bayesian network to construct a virtual human prediction model;
The structure learning is to firstly convert knowledge into normalized fuzzy membership degree by utilizing a fuzzy theory, determine the causal relationship between two variables by the membership degree, and construct a plurality of initial virtual human prediction models by taking the relationship between the two variables as a strong limiting condition or a weak limiting condition for searching the Bayesian network structure space; then, an optimal virtual person prediction model is explored from a plurality of initial virtual person prediction models by using a partition MCMC method;
The parameter learning is used for determining the probability distribution of each variable in the optimal virtual human prediction model given its parent node set, estimating the conditional probability relationships among all nodes in the optimal virtual human prediction model from the existing dataset, and assigning probability parameters to each node; the parameter learning takes the stage of common major diseases as the classification outcome index for risk assessment, clusters the majority-class samples with the EKM method, and predicts the outcome with an amplified two-stage stacking algorithm; finally, the optimal parameters of the virtual human prediction model are determined by grid search with 5-fold cross-validation, so as to construct the final virtual human prediction model;
the amplified two-stage stacking algorithm takes the class posterior probabilities output by N different primary learners on the same dataset as N separate fixed-dimension input vectors of the meta-layer classifier, and adds one primary layer to the stacking algorithm.
Further, the causal relationship between two variables is determined by membership: when the membership is 0, there is no causal relationship between the two variables, when the membership is 1, there is causal relationship between the two variables, and when the membership is between 0 and 1, the causal relationship between the two variables cannot be determined.
Further, the strong constraint refers to the causal relationship of two variables when the membership is 0 and the membership is 1, and the weak constraint refers to the causal relationship of two variables when the membership is between 0 and 1.
Further, the partition MCMC method includes:
All variables in the initial topology of the Bayesian network are divided into m zones according to the partition requirements and partition rules, and the m zones are numbered in sequence. Let the number of variables in the i-th zone (i = 1, 2, …, m) be $k_i$, so that the total number of variables is $n = \sum_{i=1}^{m} k_i$. The number of variables in each partition is $\lambda = (k_1, k_2, \dots, k_m)$, the specific variables in each partition are recorded by $\pi_\lambda$, the labeled partition is $\Lambda = (\lambda, \pi_\lambda)$, and a Bayesian network structure under the labeled partition Λ is denoted $G_\Lambda$. Given data D, the posterior distribution of the labeled partition Λ is proportional to the total score obtained by combining the scores of each node $X_i$ and its parent set $\mathrm{Pa}_i$ in the Bayesian network structure: $P(\Lambda \mid D) \propto \prod_{i=1}^{n} \sum_{\mathrm{Pa}_i \in \Lambda} S(X_i, \mathrm{Pa}_i \mid D)$, where the inner sum runs over the parent sets permitted by Λ. Because the labeled partition space and the Bayesian network structure space are equivalent, the optimal network structure in the Bayesian network structure space is determined from the labeled partition with the largest total score;
In each iteration, the current labeled partition is Λ, the proposed labeled partition is $\Lambda^*$, and the acceptance probability is $\rho = \min\left\{1, \frac{\#\mathrm{nbd}(\Lambda)\, P(\Lambda^* \mid D)}{\#\mathrm{nbd}(\Lambda^*)\, P(\Lambda \mid D)}\right\}$, where $\mathrm{nbd}(\Lambda)$ is the neighborhood of the labeled partition Λ, that is, the set of labeled partitions obtained from Λ by splitting one partition into two partitions or merging two adjacent partitions into one, and # denotes the number of elements in that set.
Further, the partition requirement includes that there are no arrow connections between variables in the same region, that there are no parent nodes for the variables in region 1, that each variable in each region except region 1 must have at least one parent node from the previous region;
the partitioning rule is determined by a strong constraint, and a random simulation mode is adopted for partitioning variables which are not explicitly specified in the strong constraint.
Further, different primary learners are selected according to the type of dataset: the primary learners for image data include a convolutional neural network (CNN), a fully convolutional network (FCN) and a deep Boltzmann machine (DBM); the primary learners for text data include a recurrent neural network (RNN), a long short-term memory network (LSTM) and a gated recurrent unit (GRU); the primary learners for numerical data include logistic regression, a support vector machine (SVM) and naive Bayes.
The invention combines the multidimensional dynamic molecular change characteristics observed over the course of a longitudinal healthy-aging cohort, builds a "virtual human" simulation prediction model based on a machine-learning-enhanced dynamic Bayesian network model, group learning and related techniques, reveals the "multi-cause, multi-effect" joint effect of complex exposures and phenotypic characteristics on multiple health outcomes, screens novel, biologically meaningful markers of healthy aging, and improves the sensitivity and accuracy of early risk prediction for major diseases during aging through the complementarity of omics information at different levels. On this basis, an early risk screening tool suitable for clinical application is developed, the translation of research results into practice is promoted, and a scientific basis is provided for building a comprehensive prevention and control system for healthy aging.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a USPSTF analysis framework.
FIG. 3 is a schematic diagram of a primary learner selection according to the present invention.
FIG. 4 is a graph comparing model accuracy of the present invention.
FIG. 5 is a graph comparing model sensitivity of the present invention.
Fig. 6 is a graph comparing AUC of the model of the present invention.
FIG. 7 is a graph comparing model specificity of the present invention.
Detailed Description
As shown in fig. 1, the method for early prediction of common serious diseases based on virtual human simulation provided by the embodiment constructs a prediction model of virtual human simulation based on techniques such as machine learning enhancement type dynamic bayesian network model, group learning and the like, and effectively expresses and fuses multi-source information.
In the embodiment, a dynamic Bayesian network is constructed through structure learning and parameter learning to serve as a virtual human prediction model, and early prediction of common major diseases is carried out through the virtual human prediction model.
The structure learning is used for screening out factors related to risk prediction from a plurality of potential factors and determining the topological structure of the dynamic Bayesian network model to construct a virtual person prediction model. The present embodiments are directed to identifying those factors that have an impact on the target variable, thereby constructing a more accurate and interpretable Bayesian network model structure; wherein it involves analyzing and mining data to identify correlations and causal relationships between variables; the result of structure learning will directly affect the quality and predictive performance of the virtual human predictive model.
The embodiment adopts a fuzzy theory and a partition MCMC method to realize structure learning; the structure learning comprises two stages, wherein the first stage is knowledge driving, and the fuzzy theory is a mathematical tool for processing fuzzy and uncertain information, so that the common ambiguity and uncertainty in the medical field can be effectively processed; the goal of the first stage is to translate knowledge into normalized fuzzy membership. In brief, it is desirable to transform uncertain medical knowledge into a formalized representation by a fuzzification method for subsequent learning and reasoning processes; fuzzy membership can be regarded as a priori probability reflecting causal relationships between variables; the method helps limit the number of variables and the complexity of the Bayesian network structure, so that the difficulty of structure learning is reduced to a certain extent. In the first stage, the embodiment uses the abundant knowledge experience obtained by a plurality of sources such as medical database, expert consultation and the like; including but not limited to case data, medical literature, expert opinion, etc. And then, processing the acquired rich knowledge experience by using a fuzzy theory, and comprehensively using information sources in all aspects to investigate the causal relationship among the variables.
In general, the causal relationship between two variables A and B can be divided into three types: Type 1, A → B, i.e., A is the cause of B; Type 2, B → A, i.e., B is the cause of A; Type 3, there is no causal relationship between A and B.
And taking the three types as three fuzzy subsets for measuring the relation between the variable A and the variable B, and respectively calculating normalized fuzzy membership degrees of the relation between the variable A and the variable B belonging to the three fuzzy subsets according to each information source.
According to the membership degree, the following three cases can be distinguished. Case 1: if the fuzzy membership degree of the relationship between variables A and B for a certain type equals 0, that type of relationship certainly does not exist between A and B; for example, if the fuzzy membership degree for the type A → B equals 0, A is certainly not the cause of B. Case 2: if the fuzzy membership degree of the relationship between variables A and B for a certain type equals 1, that type of relationship certainly exists between A and B; for example, if the fuzzy membership degree for the type A → B equals 1, A is certainly the cause of B. Case 3: if the membership degree for every type is greater than 0 and less than 1, it is difficult to determine the relationship between A and B from knowledge and experience alone, and further judgment based on the data characteristics is needed.
Cases 1 and 2 are used as strong constraints for searching the Bayesian network structure space, that is, the presence (or absence) of an arrow between nodes A and B in the Bayesian network is enforced. Case 3 is used as a weak constraint for searching the Bayesian network structure space and is incorporated, in the form of a prior probability, into the partition MCMC structure learning of the next stage. The strong and weak constraints narrow the search range of the Bayesian network structure space, which supports the partition MCMC search for the optimal Bayesian network structure.
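The mapping from membership degrees to constraints described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_memberships`, the relation labels and the tolerance `tol` are hypothetical and do not come from the patent.

```python
from typing import Dict, Tuple, Union

def classify_memberships(
    membership: Dict[str, float], tol: float = 1e-9
) -> Dict[str, Tuple[str, Union[str, float]]]:
    """Map each relation type's fuzzy membership to a constraint.

    membership 0      -> strong constraint, relation forbidden
    membership 1      -> strong constraint, relation required
    0 < membership < 1 -> weak constraint, value kept as a prior weight
    """
    result = {}
    for relation, mu in membership.items():
        if not 0.0 <= mu <= 1.0:
            raise ValueError(f"membership for {relation!r} must lie in [0, 1]")
        if mu <= tol:
            result[relation] = ("strong", "forbidden")
        elif mu >= 1.0 - tol:
            result[relation] = ("strong", "required")
        else:
            result[relation] = ("weak", mu)
    return result

# Hypothetical example: expert knowledge is certain that A causes B.
certain = classify_memberships({"A->B": 1.0, "B->A": 0.0, "none": 0.0})
# Hypothetical example: knowledge is inconclusive, so priors are passed on.
unsure = classify_memberships({"A->B": 0.6, "B->A": 0.1, "none": 0.3})
```

In such a sketch, the strong entries would fix or forbid arrows before the search starts, while the weak entries would be handed to the structure search as prior weights.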
The second stage explores the optimal Bayesian network structure with the partition MCMC method, using a Bayesian network as the model that describes the potential relationships among variables. A Bayesian network is a probabilistic graphical model that represents the dependencies between variables with a directed acyclic graph and describes these relationships with probability distributions. The objective of the second stage is to find the optimal network structure within the constrained Bayesian network structure space and to use that structure as the virtual human prediction model. The partition MCMC method can explore a large search space more efficiently, which speeds up learning of the Bayesian network structure.
The partition MCMC method comprises the following steps:
First, all variables in the initial topology of the Bayesian network are divided into m zones, which are numbered in sequence as zone 1, zone 2, …, zone m. The partition must satisfy the following requirements: (a) variables in the same zone are not connected by arrows; (b) the variables of zone 1 have no parent nodes; (c) each variable of every zone other than zone 1 must have at least one parent node from the previous zone. The partitioning rule is mainly determined by the strong constraints of the first stage; variables that are not explicitly specified by those strong constraints are partitioned by random simulation.
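As a rough illustration of the three partition requirements, the following Python sketch checks whether a candidate graph is compatible with an ordered list of zones. The function name and the data layout are hypothetical. The sketch also assumes that parents may only come from earlier zones, which keeps the graph acyclic but is not stated explicitly in the text.

```python
from typing import Dict, List, Set

def satisfies_partition_requirements(
    zones: List[Set[str]], parents: Dict[str, Set[str]]
) -> bool:
    """zones[0] is zone 1, zones[1] is zone 2, and so on.

    parents maps a variable to the set of its parent variables.
    """
    zone_of = {v: i for i, zone in enumerate(zones) for v in zone}
    for variable, i in zone_of.items():
        parent_zones = {zone_of[p] for p in parents.get(variable, set())}
        if i == 0:
            # (b) variables of zone 1 have no parents
            if parent_zones:
                return False
            continue
        # (a) no arrows inside a zone; assumption: none from later zones either
        if any(z >= i for z in parent_zones):
            return False
        # (c) at least one parent from the immediately preceding zone
        if (i - 1) not in parent_zones:
            return False
    return True

# Hypothetical variables, for illustration only.
zones = [{"age"}, {"bmi"}, {"disease"}]
ok = satisfies_partition_requirements(
    zones, {"bmi": {"age"}, "disease": {"bmi", "age"}}
)
bad = satisfies_partition_requirements(
    zones, {"bmi": {"age"}, "disease": {"age"}}
)
```

Here `bad` fails requirement (c), because `disease` has no parent in the zone directly before it.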
Next, let the number of variables in the i-th zone (i = 1, 2, …, m) be $k_i$, so that the total number of variables is $n = \sum_{i=1}^{m} k_i$. Each partitioning $\lambda = (k_1, k_2, \dots, k_m)$ is paired with $\pi_\lambda$, an ordering of the variables corresponding to the partitioning λ; according to the partitioning rule, λ records the number of variables in each partition and $\pi_\lambda$ records the specific variables in each partition. The labeled partition is $\Lambda = (\lambda, \pi_\lambda)$, and $G_\Lambda$ denotes a Bayesian network structure under the labeled partition Λ. Given data D, the posterior distribution of the labeled partition Λ is proportional to the total score obtained by combining the scores of each node $X_i$ and its parent set $\mathrm{Pa}_i$ in the Bayesian network structure: $P(\Lambda \mid D) \propto \prod_{i=1}^{n} \sum_{\mathrm{Pa}_i \in \Lambda} S(X_i, \mathrm{Pa}_i \mid D)$, where the inner sum runs over the parent sets permitted by Λ. This total score expresses the equivalence of the labeled partition space and the Bayesian network structure space: once the labeled partition with the largest score has been found in the labeled partition space, the corresponding optimal network structure in the Bayesian network structure space can be determined from it.
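One way to read the total score of a labeled partition is as a product over nodes of a sum over the parent sets that the partition allows, which is how partition MCMC is commonly formulated. The Python sketch below follows that reading. It is an assumption-laden illustration: the names `permitted_parent_sets` and `partition_score` are hypothetical, the `score` callable stands in for whatever node score is used, and brute-force enumeration is only feasible for a handful of variables.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set

def permitted_parent_sets(zones: List[Set[str]], i: int) -> List[FrozenSet[str]]:
    """Parent sets allowed for a node in zone index i (0-based).

    Assumption: parents come from earlier zones and at least one of them
    lies in the immediately preceding zone.
    """
    if i == 0:
        return [frozenset()]
    earlier = sorted(v for zone in zones[:i] for v in zone)
    previous = zones[i - 1]
    allowed = []
    for size in range(1, len(earlier) + 1):
        for combo in combinations(earlier, size):
            if previous & set(combo):
                allowed.append(frozenset(combo))
    return allowed

def partition_score(
    zones: List[Set[str]], score: Callable[[str, FrozenSet[str]], float]
) -> float:
    """Unnormalised posterior weight of a labeled partition."""
    total = 1.0
    for i, zone in enumerate(zones):
        for node in zone:
            total *= sum(score(node, pa) for pa in permitted_parent_sets(zones, i))
    return total

# With a constant score of 1 the result simply counts compatible graphs.
count_chain = partition_score([{"a"}, {"b"}, {"c"}], lambda node, pa: 1.0)
count_pair = partition_score([{"a", "b"}, {"c"}], lambda node, pa: 1.0)
```

Using a constant score is a convenient sanity check, because the value then equals the number of graphs compatible with the partition.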
Then, an MCMC method is constructed on the labeled partition space. In each iteration of the MCMC method, the current labeled partition is denoted Λ, the proposed labeled partition is denoted $\Lambda^*$, and the acceptance probability is $\rho = \min\left\{1, \frac{\#\mathrm{nbd}(\Lambda)\, P(\Lambda^* \mid D)}{\#\mathrm{nbd}(\Lambda^*)\, P(\Lambda \mid D)}\right\}$, where $\mathrm{nbd}(\Lambda)$ is the neighborhood of the labeled partition Λ, that is, the set of labeled partitions obtained from Λ by splitting one partition into two partitions or merging two adjacent partitions into one, and # denotes the number of elements in that set. Given a training dataset, the partition MCMC method provides a series of Bayesian network structure samples that follow a stationary posterior probability distribution; Bayesian theory is then used for statistical inference on these samples, so that the uncertainty in the Bayesian network structure caused by high-dimensional complex data structures, sampling error and other factors can be handled well.
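The split-and-merge move and the acceptance step can be sketched as a standard Metropolis-Hastings rule with a uniform proposal over neighbors. The code below is a minimal sketch under that assumption; the function names are hypothetical, and `posterior` stands in for any unnormalised posterior weight of a labeled partition.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

Partition = List[FrozenSet[str]]

def neighbours(zones: Partition) -> List[Partition]:
    """All labeled partitions reachable by one split or one adjacent merge."""
    out = []
    # Split one zone into two ordered, non-empty parts.
    for i, zone in enumerate(zones):
        members = sorted(zone)
        for size in range(1, len(members)):
            for left in combinations(members, size):
                left_set = frozenset(left)
                out.append(zones[:i] + [left_set, zone - left_set] + zones[i + 1:])
    # Merge two adjacent zones into one.
    for i in range(len(zones) - 1):
        out.append(zones[:i] + [zones[i] | zones[i + 1]] + zones[i + 2:])
    return out

def acceptance_probability(
    current: Partition, proposed: Partition,
    posterior: Callable[[Partition], float],
) -> float:
    """Metropolis-Hastings acceptance for a uniform proposal over neighbors."""
    ratio = (len(neighbours(current)) * posterior(proposed)) / (
        len(neighbours(proposed)) * posterior(current)
    )
    return min(1.0, ratio)

current = [frozenset("ab"), frozenset("c")]
proposed = [frozenset("abc")]          # merge of the two zones
flat = lambda partition: 1.0           # placeholder posterior, illustration only
rho_forward = acceptance_probability(current, proposed, flat)
rho_backward = acceptance_probability(proposed, current, flat)
```

The neighborhood sizes enter the ratio because a partition with many neighbors proposes each of them with lower probability, and the correction keeps the chain targeting the intended posterior.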
The structure learning of the embodiment is realized by combining the fuzzy theory and the partition MCMC method, so that the advantages of the fuzzy theory and the partition MCMC method in the aspect of processing uncertainty information are fully exerted; by limiting the search range of the optimal bayesian network structure to a range having medical professional significance, the resulting structure is ensured to have practical significance, and the efficiency of the algorithm can be improved.
The parameter learning is used to determine the probability distribution of each variable in the network given its parent node set. The present embodiment will utilize existing data to estimate the conditional probability relationships between the nodes in the network, and assign appropriate probability parameters to each node in the bayesian network, so that the network can more accurately reflect the statistical features and probability distribution of the data. The basic construction process of the machine learning enhanced dynamic Bayesian network is formed through structure learning and parameter learning, and a final virtual human prediction model is constructed; by combining a real causal relationship and a machine learning technology, the virtual person prediction model can better utilize multi-source information to perform risk prediction and decision support, and provides effective tools and methods for data analysis and prediction tasks applied to actual scenes.
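For discrete variables, estimating the conditional probability of a node given its parents can be illustrated by simple counting. The sketch below is not the patent's procedure: the function name `estimate_cpt`, the record layout and the optional additive smoothing parameter `alpha` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

def estimate_cpt(
    records: List[Dict[str, Hashable]],
    child: str,
    parents: Sequence[str],
    states: Sequence[Hashable],
    alpha: float = 1.0,
) -> Dict[Tuple, Dict[Hashable, float]]:
    """Estimate P(child | parents) by counting, with additive smoothing alpha."""
    counts = defaultdict(Counter)
    for row in records:
        parent_config = tuple(row[p] for p in parents)
        counts[parent_config][row[child]] += 1
    cpt = {}
    for parent_config, counter in counts.items():
        total = sum(counter.values()) + alpha * len(states)
        cpt[parent_config] = {s: (counter[s] + alpha) / total for s in states}
    return cpt

# Hypothetical toy data: one binary parent and one binary child.
records = [
    {"smoker": 1, "disease": 1},
    {"smoker": 1, "disease": 0},
    {"smoker": 0, "disease": 0},
    {"smoker": 0, "disease": 0},
]
cpt_raw = estimate_cpt(records, "disease", ["smoker"], [0, 1], alpha=0.0)
cpt_smooth = estimate_cpt(records, "disease", ["smoker"], [0, 1], alpha=1.0)
```

Smoothing avoids zero probabilities for parent configurations that are rare in the data, which matters when some disease stages have few samples.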
In the parameter learning process, this embodiment takes the stage of the common major diseases as the classification outcome index for risk assessment. The proportions of the different categories of this outcome index differ markedly. This embodiment integrates the staging of different types of common major diseases into a single composite index, which further enlarges the differences in the proportions of its categories and leaves the dataset unbalanced. When a disease prediction model is trained on an unbalanced dataset, conventional machine learning algorithms tend to produce models that maximize overall classification accuracy and easily neglect the minority classes, which degrades model performance and severely affects prediction accuracy for the minority groups in particular. Because the imbalance among data for different stages of common major diseases may affect model parameter estimation, and in order to reduce algorithm running time, this embodiment follows the idea of decision-level fusion: primary classifiers make decisions on different types of data separately, and the results of these base classifiers are then fused by ensemble learning.
This embodiment uses an amplified two-stage stacking algorithm to address the data imbalance problem. The algorithm differs from conventional stacking in two respects. First, it changes how input attributes are represented between levels. The conventional stacking algorithm uses the outputs of N different primary learners on the same sample as the feature elements of a single input vector for the meta-learner, so as the number of primary learners grows, the dimension of the meta-layer features grows with it and the running time of the algorithm increases. In this embodiment, the class posterior probabilities output by the N different primary learners for the same sample are instead used as N separate fixed-dimension input vectors of the meta-layer classifier. This prevents the dimension of the meta-layer training data from growing with the number of primary classifiers, increases the sample size of the meta-layer learner's training data, saves running time, and avoids the data sparsity caused by excessively high dimensionality.
Second, it increases the number of layers of the stacking algorithm. This embodiment adds one primary layer to the stacking algorithm, extending the conventional two-layer stacking algorithm to three layers. Because the change in input attribute representation between levels already saves running time, the stacking algorithm with the added primary layer runs no longer than a conventional stacking algorithm using the same number of primary learners. Adding a primary layer can also further improve the generalization ability of the ensemble. By adding a layer to the conventional stacking algorithm, the amplified two-stage stacking algorithm of this embodiment helps ensure both the accuracy of the diagnostic model's estimates and their reliability under extrapolation. It can handle the challenges posed by unbalanced datasets more effectively and improve the accuracy and robustness of the model when predicting minority classes.
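The change in how inputs are represented between levels can be made concrete by comparing the shape of the meta-layer training data under both schemes. The sketch below is one possible interpretation, not the patent's implementation: the function names are hypothetical, and `probas[n][s]` is assumed to hold the class posterior vector that primary learner n outputs for sample s.

```python
from typing import List, Sequence, Tuple

def concatenated_meta_features(
    probas: List[List[Sequence[float]]],
) -> List[List[float]]:
    """Conventional stacking: one row per sample, dimension N * C."""
    n_samples = len(probas[0])
    return [
        [p for learner in probas for p in learner[s]]
        for s in range(n_samples)
    ]

def fixed_dimension_meta_features(
    probas: List[List[Sequence[float]]], labels: Sequence[int]
) -> Tuple[List[List[float]], List[int]]:
    """Interpretation of the amplified scheme: each learner's posterior
    vector is its own meta-layer row, giving N * S rows of dimension C."""
    rows, targets = [], []
    for learner in probas:
        for s, vector in enumerate(learner):
            rows.append(list(vector))
            targets.append(labels[s])
    return rows, targets

# Hypothetical outputs of N = 3 learners on S = 2 samples with C = 2 classes.
probas = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.8, 0.2], [0.1, 0.9]],
]
labels = [0, 1]
wide = concatenated_meta_features(probas)
rows, targets = fixed_dimension_meta_features(probas, labels)
```

With the first scheme the feature dimension is 6 and grows with every added learner. With the second it stays at 2 while the number of meta-layer training rows grows from 2 to 6, which matches the stated goals of a fixed dimension and a larger meta-layer sample size.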
In the parameter learning process, the majority-class samples are clustered with the EKM (Ensemble K-modes) method, which effectively divides them into different clusters. After clustering, two different data combination strategies are used: (1) the majority-class samples are clustered into K clusters, and each cluster is combined with the minority-class samples to form a new dataset s1; (2) the majority-class samples are clustered into K clusters, each cluster is divided into K parts, and one part is selected from each cluster and combined with the minority-class samples to form a new balanced sample set s2.
For ensemble learning, this embodiment first selects several corresponding primary learners according to the data type. As shown in Fig. 3, the primary learners for image data include a convolutional neural network (CNN), a fully convolutional network (FCN), a deep Boltzmann machine (DBM) and the like; the primary learners for text data include a recurrent neural network (RNN), a long short-term memory network (LSTM), a gated recurrent unit (GRU) and the like; and the primary learners for numerical data include logistic regression, a support vector machine (SVM), naive Bayes and the like.
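The two data combination strategies can be illustrated once the majority-class clusters are available. The sketch below assumes the clustering has already been done and is passed in as a list of lists; the function names, the interleaved way of cutting a cluster into K parts, and the `part_index` parameter are hypothetical choices made for illustration.

```python
from typing import List, Sequence

def strategy_s1(clusters: List[list], minority: list) -> List[list]:
    """Strategy (1): pair every majority cluster with all minority samples."""
    return [cluster + minority for cluster in clusters]

def strategy_s2(
    clusters: List[list], minority: list, part_index: int = 0
) -> list:
    """Strategy (2): cut each cluster into K parts, take one part from each
    cluster, and combine the selection with the minority samples."""
    k = len(clusters)
    chosen = []
    for cluster in clusters:
        parts = [cluster[j::k] for j in range(k)]  # K interleaved parts
        chosen.extend(parts[part_index])
    return chosen + minority

# Hypothetical sample identifiers; K = 3 majority clusters.
clusters = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
minority = ["m1", "m2"]
s1_sets = strategy_s1(clusters, minority)
s2_set = strategy_s2(clusters, minority, part_index=0)
```

Strategy (1) yields K datasets that each keep one whole cluster, while strategy (2) yields a smaller set that samples every cluster, so the two trade off cluster purity against coverage of the majority class.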
Then, the optimal parameters of the model are selected by grid search with 5-fold cross-validation, and a conditional probability table is calculated for each node in the network. Finally, the conditional probability tables are used for risk assessment and for staging common major diseases. The assessment results provide an important reference for risk assessment and for the prevention and treatment of common major diseases.
To verify the validity of the prediction method provided by this embodiment, the virtual human prediction model obtained after structure learning and parameter learning is evaluated; here, evaluation means assessing the predicted outcome of the patient's condition. The patient's actual condition can be obtained from medical records, follow-up visits and the like, so it is only necessary to compare the predicted condition with the actual condition.
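Grid search with 5-fold cross-validation can be sketched without any particular modelling library. In the code below, `fit_and_score` is a hypothetical callback that trains a model with the given parameters on the training indices and returns a score on the held-out indices; the fold assignment by striding is also just one simple choice.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k folds over n samples."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def grid_search(
    param_grid: Dict[str, Sequence],
    fit_and_score: Callable[[dict, List[int], List[int]], float],
    n_samples: int,
    k: int = 5,
) -> Tuple[dict, float]:
    """Return the parameter combination with the best mean fold score."""
    best_params, best_score = None, float("-inf")
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [
            fit_and_score(params, train, test)
            for train, test in k_fold_indices(n_samples, k)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Placeholder scorer that peaks at alpha = 1.0; a real one would train a model.
toy_scorer = lambda params, train, test: -(params["alpha"] - 1.0) ** 2
best_params, best_score = grid_search({"alpha": [0.1, 1.0, 10.0]}, toy_scorer, n_samples=10)
```

For unbalanced outcomes, a stratified fold assignment that preserves class proportions in every fold would usually be preferable to the plain striding used here.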
Evaluation indices: several metrics can be chosen to evaluate an early prediction model for major diseases, such as root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) for quantitative variables, and error rate and ROC analysis for categorical variables. These indices are conventional and are not described in detail here. It should be noted, however, that the conventional ROC curve is only suitable for analyzing binary outcomes (e.g., whether the patient dies), whereas the outcome in this embodiment has more than two health states (e.g., extremely low risk, medium risk, high risk, extremely high risk, etc.). The traditional way of handling a multi-class outcome variable is to recode it as a binary outcome, but this may seriously bias the estimated classification performance. This embodiment therefore adopts high-dimensional ROC analysis, in which the dimensions of the high-dimensional ROC surface are defined by the correct classification rates (CCR) of the different categories. Specifically, the correct classification rate of each category is taken as a coordinate axis to form a coordinate system; the coordinate positions (i.e., operating points) corresponding to different combinations of cut-off values are marked in this coordinate system, and the points are connected to form the high-dimensional ROC surface. By analogy with the area under the two-dimensional ROC curve, high-dimensional ROC analysis uses the volume under the high-dimensional ROC surface (VUS) to measure the ability of the treatment selection markers to discriminate accurately among all subjects. The probabilistic meaning of VUS is the probability that, when one individual is drawn at random from each population to form a new sample, every individual in that sample is correctly assigned to its actual group. This embodiment estimates VUS with two methods, non-parametric and semi-parametric, whose accuracy and precision have been verified in previous studies.
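A non-parametric VUS estimate can be illustrated for three ordered classes as the fraction of cross-class triples whose marker values fall in the expected order. The sketch below is a simplified illustration, not the estimator used in the patent: the function name is hypothetical, only three classes are handled, and ties are counted as incorrect.

```python
from itertools import product
from typing import Sequence

def vus_three_class(
    low: Sequence[float], mid: Sequence[float], high: Sequence[float]
) -> float:
    """Fraction of (low, mid, high) triples that are strictly increasing."""
    n_triples = len(low) * len(mid) * len(high)
    if n_triples == 0:
        raise ValueError("each class needs at least one observation")
    ordered = sum(1 for a, b, c in product(low, mid, high) if a < b < c)
    return ordered / n_triples

# Hypothetical risk scores for three outcome groups.
perfect = vus_three_class([0.1, 0.2], [0.4, 0.5], [0.8, 0.9])
partial = vus_three_class([0.1, 0.6], [0.5], [0.9])
```

For three classes, a marker with no discriminating ability gives a VUS of about 1/6, since only one of the six possible orderings of a triple is correct, whereas a perfectly separating marker gives 1.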
The datasets are clustered with the EKM method. Two similarity measures are considered: mad, the similarity between the new sample and the training set, and mac, the similarity between the new sample and the correctly classified samples of the training set. Together with the two data combination strategies, this gives four different clustered sample datasets, namely s1 mad, s1 mac, s2 mad and s2 mac. For each of them the prediction probability is corrected and the accuracy, sensitivity, specificity and AUC are calculated with four primary learners (LR, C4.5, SVM and KNN), as shown in Figs. 4-7.
(1) Accuracy: as shown in Fig. 4, performance tends to increase as IR increases. Overall, s1 mad and s1 mac perform better than s2 mad and s2 mac.
(2) Sensitivity: as shown in Fig. 5, the trends are similar. At smaller IR (2, 4), s1 mad and s1 mac perform better than s2 mad and s2 mac; at larger IR (16, 32) the result is reversed, and s2 mad and s2 mac perform better than s1 mad and s1 mac.
(3) AUC: as shown in Fig. 6, the performance of the four methods on the four learners tends to increase as IR increases, similar to the behaviour of accuracy and specificity. Overall, s1 mad and s1 mac perform better than s2 mad and s2 mac.
(4) Specificity: as shown in Fig. 7, performance tends to increase as IR increases, similar to accuracy. Overall, s1 mad and s1 mac perform better than s2 mad and s2 mac on the four classifiers, except when IR = 2.
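For reference, the four indices compared above can be computed for a binary task as in the following sketch. It assumes labels coded as 1 (positive) and 0 (negative) with both classes present, and uses the rank-based (Mann-Whitney) form of the AUC; the function names are hypothetical and the code is not taken from the embodiment.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(pos_scores, neg_scores):
    """Probability that a positive scores higher than a negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


metrics = binary_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
area = auc([0.9, 0.4], [0.1, 0.2, 0.3, 0.8])
```

With imbalanced classes, accuracy is dominated by the majority class, which is one reason sensitivity and specificity are reported separately alongside it.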
The foregoing is merely a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any modification and substitution based on the technical scheme and the inventive concept provided by the present invention should be covered in the scope of the present invention.

Claims (6)

1. A method for early prediction of common major diseases based on virtual human simulation, characterized by comprising the following steps: constructing a virtual human prediction model based on a dynamic Bayesian network through structure learning and parameter learning, and performing early prediction of common major diseases with the virtual human prediction model;
the structure learning is used to screen out the factors related to risk prediction from the potential factors and to determine the topological structure of the dynamic Bayesian network, so as to construct the virtual human prediction model;
in the structure learning, knowledge is first converted into normalized fuzzy membership degrees using fuzzy theory, the causal relationship between two variables is determined from the membership degree, and the relationship between the two variables is used as a strong constraint or a weak constraint on the search of the Bayesian network structure space to construct a plurality of initial virtual human prediction models; an optimal virtual human prediction model is then explored from the plurality of initial virtual human prediction models using a partition MCMC method;
the parameter learning is used to determine the probability distribution of each variable in the optimal virtual human prediction model given its parent node set, to estimate the conditional probability relationships among all nodes in the optimal virtual human prediction model using the existing data set, and to assign probability parameters to each node; the parameter learning takes the stage of the common major disease as the categorical outcome index of risk assessment, clusters the major diseases with the EKM method, and predicts the result with an augmented two-stage stacking algorithm; finally, the optimal parameters of the virtual human prediction model are determined by grid search and 5-fold cross-validation, so as to construct the final virtual human prediction model;
the augmented two-stage stacking algorithm takes the posterior class probabilities output by N different primary learners on the same data set as N fixed-dimension input vectors of the meta-layer classifier, respectively, and adds one primary layer on top of the stacking algorithm.
2. The method for early prediction of common major diseases based on virtual human simulation according to claim 1, characterized in that the causal relationship between two variables is determined from the membership degree as follows: when the membership degree is 0, there is no causal relationship between the two variables; when the membership degree is 1, there is a causal relationship between the two variables; and when the membership degree is between 0 and 1, the causal relationship between the two variables cannot be determined.
3. The method for early prediction of common major diseases based on virtual human simulation according to claim 2, characterized in that the strong constraint is the causal relationship between two variables when the membership degree is 0 or 1, and the weak constraint is the causal relationship between two variables when the membership degree is between 0 and 1.
4. The method for early prediction of common major diseases based on virtual human simulation according to claim 1, characterized in that the partition MCMC method comprises the following steps:
dividing all variables in the initial topological structure of the Bayesian network into m partitions according to the partition requirements and partition rules, and numbering the m partitions in sequence; denoting the number of variables in the i-th partition (i = 1, 2, …, m) by k_i, the numbers of variables in the partitions by λ, and the specific variables in each partition by π_λ, so that the labelled partition is Λ = (λ, π_λ); given data D, the posterior distribution of the labelled partition Λ is proportional to the total score obtained by combining the scores of each node X_i and its parent set Pa_i in the Bayesian network structures under Λ; because the labelled partition space is equivalent to the Bayesian network structure space, the optimal network structure in the Bayesian network structure space is determined from the labelled partition with the largest total score;
in each iteration, the current labelled partition is Λ and the proposed labelled partition is Λ*, and the acceptance probability is computed from Λ, Λ* and their neighbourhoods, where the neighbourhood of a labelled partition consists of the labelled partitions obtained from it by splitting one partition into two partitions or by merging two adjacent partitions into one partition.
5. The method for early prediction of common major diseases based on virtual human simulation according to claim 4, characterized in that the partition requirements are: there is no arrow between variables in the same partition, the variables in the first partition have no parent node, and every variable in each partition other than the first has at least one parent node from the previous partition;
the partition rules are determined by the strong constraints, and variables that are not explicitly specified in the strong constraints are assigned to partitions by random simulation.
6. The method for early prediction of common major diseases based on virtual human simulation according to claim 1, characterized in that different primary learners are selected for different types of data sets, wherein the primary learners for image data include a convolutional neural network (CNN), a fully convolutional network (FCN) and a deep Boltzmann machine (DBM); the primary learners for text data include a recurrent neural network (RNN), a long short-term memory network (LSTM) and a gated recurrent unit (GRU); and the primary learners for numerical data include logistic regression, a support vector machine (SVM) and naive Bayes.
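The partition requirements of claim 5 and the split and merge moves of claim 4 can be illustrated with a short sketch. It is only an illustration under stated assumptions: a labelled partition is represented as a list of lists of variable names, a network structure as a dictionary mapping each variable to its set of parents, and all function and variable names are hypothetical. Scoring and the acceptance step are deliberately left out.

```python
from itertools import combinations


def satisfies_partition_requirements(partitions, parents):
    """Check a network structure against an ordered partition of its variables."""
    for i, block in enumerate(partitions):
        members = set(block)
        for v in block:
            pa = parents.get(v, set())
            if pa & members:
                return False  # an arrow inside one partition
            if i == 0 and pa:
                return False  # first partition must have no parents
            if i > 0 and not (pa & set(partitions[i - 1])):
                return False  # needs a parent from the previous partition
    return True


def merge_moves(partitions):
    """Partitions obtained by merging two adjacent partitions into one."""
    return [
        partitions[:i] + [partitions[i] + partitions[i + 1]] + partitions[i + 2:]
        for i in range(len(partitions) - 1)
    ]


def split_moves(partitions):
    """Partitions obtained by splitting one partition into two non-empty ones."""
    moves = []
    for i, block in enumerate(partitions):
        for r in range(1, len(block)):
            for first in combinations(block, r):
                second = [v for v in block if v not in first]
                moves.append(partitions[:i] + [list(first), second] + partitions[i + 1:])
    return moves


current = [["age", "sex"], ["bmi"]]
neighbours = merge_moves(current) + split_moves(current)
ok = satisfies_partition_requirements(current, {"bmi": {"age"}})
```

In this representation a proposal would be drawn from `neighbours`; how the proposal is scored and accepted depends on the scoring function and acceptance rule, which are not reproduced here.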
CN202410749228.4A 2024-06-12 2024-06-12 Early prediction method for common major diseases based on virtual person simulation Active CN118335319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410749228.4A CN118335319B (en) 2024-06-12 2024-06-12 Early prediction method for common major diseases based on virtual person simulation

Publications (2)

Publication Number Publication Date
CN118335319A (en) 2024-07-12
CN118335319B (en) 2024-08-16

Family

ID=91780428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410749228.4A Active CN118335319B (en) 2024-06-12 2024-06-12 Early prediction method for common major diseases based on virtual person simulation

Country Status (1)

Country Link
CN (1) CN118335319B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170140109A1 (en) * 2014-02-04 2017-05-18 Optimata Ltd. Method and system for prediction of medical treatment effect
CN107767954A (en) * 2017-10-16 2018-03-06 中国科学院地理科学与资源研究所 A kind of Environmental Health Risk Monitoring early warning system and method based on space Bayesian network
CN111627553A (en) * 2020-05-26 2020-09-04 四川大学华西医院 Method for constructing individualized prediction model of first-onset schizophrenia
CN111863237A (en) * 2020-05-29 2020-10-30 东莞理工学院 Intelligent auxiliary diagnosis system for mobile terminal diseases based on deep learning
US20200381083A1 (en) * 2019-05-31 2020-12-03 410 Ai, Llc Estimating predisposition for disease based on classification of artificial image objects created from omics data
WO2023059663A1 (en) * 2021-10-04 2023-04-13 The Broad Institute, Inc. Systems and methods for assessment of body fat composition and type via image processing
US20230248998A1 (en) * 2023-04-12 2023-08-10 Buvaneswari Natarajan System and method for predicting diseases in its early phase using artificial intelligence
CN117153423A (en) * 2023-09-21 2023-12-01 南京工业大学 Bayesian inference-based method for predicting outbreak time of new-born infectious disease
CN117457217A (en) * 2023-12-22 2024-01-26 天津医科大学朱宪彝纪念医院(天津医科大学代谢病医院、天津代谢病防治中心) Risk assessment method and system for diabetic nephropathy
CN118155787A (en) * 2024-01-26 2024-06-07 云上贵州大数据产业发展有限公司 Medical data processing method and system based on Internet big data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
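万红丽 et al.: "Application of a hierarchical Bayesian model in correcting underestimation to estimate the prevalence of chronic diseases", Modern Preventive Medicine (现代预防医学), 10 November 2022 (2022-11-10) *
王潇; 郭宗君; 季晓云; 王晓林; 张敏; 王志宏: "Study on a prediction model of risk factors for the onset of vascular cognitive impairment", Journal of Qingdao University Medical College (青岛大学医学院学报), no. 03, 14 August 2017 (2017-08-14) *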

Also Published As

Publication number Publication date
CN118335319B (en) 2024-08-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant