CN116959725A - Disease risk prediction method based on multi-mode data fusion
- Publication number
- CN116959725A (application CN202310951791.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- fusion
- prediction
- disease risk
- model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/23—Clustering techniques
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
- G06F18/25—Fusion techniques
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/084—Backpropagation, e.g. using gradient descent
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to the technical field of disease risk prediction, and in particular to a disease risk prediction method based on multi-mode data fusion, comprising the following steps: data preprocessing, in which data of different modalities, such as medical images, genomic data and electronic medical records, are cleaned, standardized and normalized; feature extraction, in which features are selected and extracted from the preprocessed data using deep learning algorithms, clustering algorithms and other methods; data fusion, in which the extracted features are fused by a fusion algorithm into a comprehensive feature; prediction model construction, in which a disease risk prediction model is built on the comprehensive feature; and result output, in which the risk of a new case is predicted and the prediction result is output. The invention achieves efficient fusion and use of multi-modal data, provides a more accurate and stable disease risk prediction method, and has practical value and broad application prospects in the medical and healthcare field.
Description
Technical Field
The invention relates to the technical field of disease risk prediction, and in particular to a disease risk prediction method based on multi-mode data fusion.
Background
In the medical field, disease risk prediction is vital work: it can help doctors identify high-risk groups in advance, enabling early prevention and early treatment and reducing the risk of disease onset. Existing disease risk prediction methods are mainly based on single-modality data, such as medical images, genomic data or electronic medical record data.
However, because different data types differ in their characteristics and information content, data of a single modality cannot comprehensively reflect the complexity of a disease, and the information carried by the various data types cannot be fully used, which limits the accuracy and stability of the prediction results.
Disclosure of Invention
In view of the above, the present invention provides a disease risk prediction method based on multi-modal data fusion.
A disease risk prediction method based on multi-mode data fusion comprises the following steps:
step one: data preprocessing, comprising cleaning, standardizing and normalizing data of different modalities, namely medical images, genomic data and electronic medical records;
step two: feature extraction, comprising extracting features from the preprocessed data and selecting and extracting features with deep learning and clustering algorithms;
step three: data fusion, in which the extracted features are fused by a fusion algorithm into a comprehensive feature, followed by construction of a disease risk prediction model based on the comprehensive feature;
step four: outputting the prediction result, in which the risk of a new case is predicted and the prediction result is output.
Further, the data preprocessing step comprises cleaning, standardizing and normalizing the medical image, genomic and electronic medical record data, wherein:
data cleaning comprises identifying and handling missing, duplicate and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or scales into unitless relative values, eliminating the effect of differing units so that data from different sources can be compared and fused;
normalization converts the data into a uniform value range, eliminating the effect of differing value ranges and ensuring stable model training and reliable prediction results.
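The patent does not say how missing, duplicate or outlier data are handled. As a minimal sketch for one tabular modality (for example numeric electronic medical record fields), the function below assumes three common rules: drop exact duplicate rows, fill missing values with the column median, and clip outliers to the Tukey interquartile-range fences. The function name `clean_table` and the `iqr_factor` parameter are hypothetical.

```python
import numpy as np


def clean_table(x, iqr_factor=1.5):
    """Clean a 2-D numeric table (rows = patients, columns = fields).

    Assumed rules, not taken from the patent:
      * exact duplicate rows are dropped, keeping the first occurrence;
      * missing values (NaN) are filled with the column median;
      * outliers are clipped to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].
    """
    x = np.array(x, dtype=float)  # copy, so the caller's array is untouched

    # 1. Duplicates: compare rows with NaN mapped to None so NaN matches NaN.
    seen, keep = set(), []
    for i, row in enumerate(x):
        key = tuple(None if np.isnan(v) else float(v) for v in row)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    x = x[keep]

    # 2. Missing values: column-wise median imputation.
    medians = np.nanmedian(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    x[rows, cols] = medians[cols]

    # 3. Outliers: clip each column to its interquartile-range fences.
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    return np.clip(x, q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)
```

Clipping keeps every patient record. Removing outlier rows instead would be an equally valid reading of the text.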
Further, the feature extraction step uses a deep learning algorithm, a clustering algorithm and a feature selection algorithm, wherein:
a deep learning algorithm is used to extract spatial features from the medical image data;
a clustering algorithm is used to process the electronic medical record data, grouping the records by similarity and extracting group features;
a feature selection algorithm is used to select, from the extracted features, those with the greatest influence on the prediction result.
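The patent names only "a deep learning algorithm" for image data. The core operation a convolutional network uses to capture spatial structure can be sketched in NumPy as a convolution, a ReLU and global average pooling, which yields one feature per kernel. The hand-set edge kernels below stand in for weights a trained CNN would learn. They, and the function names, are illustrative assumptions.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN convolution layer applies."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def image_features(image, kernels):
    """Convolution -> ReLU -> global average pooling; one feature per kernel."""
    image = np.asarray(image, dtype=float)
    feats = []
    for kernel in kernels:
        fmap = np.maximum(conv2d_valid(image, np.asarray(kernel, dtype=float)), 0.0)
        feats.append(fmap.mean())  # global average pooling
    return np.array(feats)


# Hand-set edge detectors standing in for kernels a trained CNN would learn.
VERTICAL_EDGE = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
HORIZONTAL_EDGE = VERTICAL_EDGE.T
```

Take a 4×4 image whose two right-hand columns are 1 and whose other columns are 0. The two kernels give features [3.0, 0.0]: the vertical-edge kernel responds and the horizontal-edge kernel does not.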
Further, the data fusion step uses linear fusion to integrate the features of the different modalities, namely the features of medical images, genomic data and electronic medical records.
Further, the linear fusion specifically operates as follows:
first, the feature matrices from the different modalities are normalized so that every feature lies in the same range; this gives each feature a fair weight in the fusion and prevents features with large value ranges from dominating the fused result;
next, the features of the different modalities are weighted and summed, using either preset weights or weights learned through training; the weights are determined by cross-validation so as to maximize the contribution of the fused features to the prediction model, taking into account the relative importance of each modality for disease risk prediction;
finally, the feature matrix obtained by linear fusion is used as input data for training the prediction model and for prediction.
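A minimal sketch of the normalize-then-weighted-sum operation described above follows. An element-wise weighted sum only makes sense when every modality's feature matrix has the same shape, so the sketch assumes each modality has already been projected to a common feature dimension. The patent does not say how. It also assumes min-max scaling and weights rescaled to sum to 1. `min_max` and `linear_fusion` are hypothetical names.

```python
import numpy as np


def min_max(x, eps=1e-12):
    """Scale each column to [0, 1]; constant columns become 0."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.maximum(hi - lo, eps)


def linear_fusion(feature_mats, weights):
    """Normalise each modality's feature matrix, then take the weighted sum.

    Assumes every matrix has the same shape (n_samples, n_features), e.g. after
    each modality has been projected to a common feature dimension.
    """
    if len(feature_mats) != len(weights):
        raise ValueError("need exactly one weight per modality")
    if len({np.shape(m) for m in feature_mats}) != 1:
        raise ValueError("all feature matrices must have the same shape")
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # assumption: weights are rescaled to sum to 1
    return sum(wi * min_max(m) for wi, m in zip(w, feature_mats))
```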
Further, the data fusion step is model-based, with the following procedure:
first, the features extracted from each modality are input into a separate prediction model suited to that modality's characteristics: a convolutional neural network (CNN) model for medical image data, a recurrent neural network (RNN) model for genomic data, and a support vector machine (SVM) model for electronic medical record data;
then, each model's disease risk prediction is taken as a new feature, and these are combined into a prediction-result feature matrix;
next, the prediction-result feature matrix is input into a new prediction model, a linear regression model, which learns how best to combine the individual models' predictions to obtain the most accurate disease risk prediction;
finally, the new prediction model is used to predict unseen data, and its output is the disease risk prediction based on multi-modal data fusion.
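The procedure above is a form of stacking. The sketch below assumes a few things to stay self-contained. It replaces the patent's CNN, RNN and SVM base models with plain least-squares linear scorers, one per modality. It builds the prediction-result feature matrix from out-of-fold scores, a common stacking practice that the patent does not mention, so the combiner never learns from scores a base model produced on its own training rows. It then fits the linear-regression combiner with `numpy.linalg.lstsq`. All function names are hypothetical.

```python
import numpy as np


def fit_linear(x, y):
    """Ordinary least squares with an intercept column; returns coefficients."""
    a = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return coef


def predict_linear(coef, x):
    return np.column_stack([np.ones(len(x)), x]) @ coef


def out_of_fold_scores(x, y, n_folds=5):
    """Base-model risk scores for each sample from a model that never saw it.

    Rows are assumed to be shuffled already, since folds are contiguous blocks.
    """
    idx = np.arange(len(y))
    scores = np.empty(len(y))
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        scores[fold] = predict_linear(fit_linear(x[train], y[train]), x[fold])
    return scores


def fit_stacked(modalities, y, n_folds=5):
    """modalities: list of (n_samples, n_features) arrays, one per modality."""
    modalities = [np.asarray(m, dtype=float) for m in modalities]
    y = np.asarray(y, dtype=float)
    # Prediction-result feature matrix: one column of risk scores per modality.
    meta_x = np.column_stack([out_of_fold_scores(m, y, n_folds) for m in modalities])
    meta_coef = fit_linear(meta_x, y)  # linear-regression combiner
    base_coefs = [fit_linear(m, y) for m in modalities]  # refit on all data
    return base_coefs, meta_coef


def predict_stacked(base_coefs, meta_coef, modalities):
    meta_x = np.column_stack(
        [predict_linear(c, np.asarray(m, dtype=float)) for c, m in zip(base_coefs, modalities)]
    )
    return predict_linear(meta_coef, meta_x)
```

With the real base models in place, only the two `fit_linear` calls on per-modality data would change. The combiner stays a linear regression, as in the text.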
Further, the prediction model is validated by evaluating it on an independent test data set to determine its performance, including calculating the accuracy, the recall and the area under the ROC curve (AUC) to assess the model's predictive performance comprehensively.
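The three validation metrics can be computed as in the following plain-Python sketch. Risk scores are turned into 0/1 predictions with a threshold of 0.5, which is an assumption since the patent gives no threshold. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. This equals the area under the ROC curve. The function names are hypothetical.

```python
def accuracy_recall(y_true, scores, threshold=0.5):
    """Accuracy and recall after turning risk scores into 0/1 predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == t for p, t in zip(preds, y_true))
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, y_true))
    positives = sum(t == 1 for t in y_true)
    recall = tp / positives if positives else float("nan")
    return correct / len(y_true), recall


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative
    (ties count one half), i.e. the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases. For large test sets, a rank-based formula gives the same value faster.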
Further, the output prediction result comprises a specific disease risk value, a disease risk grade, and a confidence interval for the disease risk.
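The patent lists the three outputs but does not say how grades or intervals are derived. The following is a minimal sketch under stated assumptions. Several risk estimates for one patient are available, for example from models trained on bootstrap resamples of the training data. The point risk is their mean and the interval is a percentile interval. The grade boundaries 0.2 and 0.5 are hypothetical, as are the function names.

```python
import numpy as np

# Hypothetical grade boundaries: the patent does not define any.
GRADE_CUTS = (0.2, 0.5)
GRADE_LABELS = ("low", "medium", "high")


def risk_grade(risk, cuts=GRADE_CUTS, labels=GRADE_LABELS):
    """Map a risk in [0, 1] to a grade: below 0.2 low, below 0.5 medium, else high."""
    return labels[int(np.searchsorted(cuts, risk, side="right"))]


def summarize_risk(samples, level=0.95):
    """Point risk, grade and percentile interval from repeated risk estimates.

    `samples` is assumed to hold one patient's risk as predicted by, e.g.,
    models trained on bootstrap resamples of the training data.
    """
    samples = np.asarray(samples, dtype=float)
    risk = float(samples.mean())
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(samples, [alpha, 1.0 - alpha])
    return {"risk": risk, "grade": risk_grade(risk), "ci": (float(lo), float(hi))}
```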
The beneficial effects of the invention are as follows:
through feature extraction, data fusion, model construction and validation, the invention effectively integrates multi-modal data from medical images, genomic data and electronic medical records. Compared with traditional single-modality disease risk prediction methods, it considers and uses data of multiple modalities more comprehensively, improving the accuracy and stability of the prediction result.
Through model-based fusion, the method not only combines the information of each modality but can also uncover latent associations between modalities, further improving predictive performance. In addition, periodically updating and optimizing the prediction model helps keep its performance at an optimal level and lets it adapt to possible changes in medical data and disease patterns.
Drawings
To illustrate the invention or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only embodiments of the invention, and a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a prediction method according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments, in order to make its objects, technical solutions and advantages clearer.
It should be noted that, unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by a person of ordinary skill in the art to which the present invention belongs. The terms "first", "second" and the like do not denote any order, quantity or importance but merely distinguish one element from another. The words "comprising", "comprises" and the like mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The term "connected" and the like is not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right" and the like indicate only relative positional relationships, which may change when the absolute position of the described object changes.
Example 1
As shown in FIG. 1, a disease risk prediction method based on multi-modal data fusion comprises the following steps:
step one: data preprocessing, comprising cleaning, standardizing and normalizing data of different modalities, such as medical images, genomic data and electronic medical records;
step two: feature extraction, comprising extracting features from the preprocessed data and selecting and extracting features with deep learning algorithms, clustering algorithms and other methods;
step three: data fusion, in which the extracted features are fused by a fusion algorithm into a comprehensive feature, followed by construction of a disease risk prediction model based on the comprehensive feature;
step four: outputting the prediction result, in which risk prediction is performed on a new case and the prediction result is output;
by this method, data of different modalities can be used, and feature extraction and fusion improve the accuracy of disease risk prediction.
The data preprocessing step comprises cleaning, standardizing and normalizing the medical image, genomic and electronic medical record data, wherein:
data cleaning comprises identifying and handling missing, duplicate and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or scales into unitless relative values, eliminating the effect of differing units so that data from different sources can be compared and fused;
normalization converts the data into a uniform value range (for example, between 0 and 1), eliminating the effect of differing value ranges and ensuring stable model training and reliable prediction results;
the aim of this step is to bring the data of the different modalities to a common standard for subsequent processing, further improving the accuracy and stability of disease risk prediction.
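Standardization and normalization can be sketched as follows. The patent names the two operations but not their formulas. Z-score standardization and min-max normalization to [0, 1] are the usual choices and are assumed here. The sketch also assumes the statistics are learned on training data only and reused unchanged for new cases, which avoids leaking test information. A new case outside the training range can therefore normalize to a value outside [0, 1]. The function names are hypothetical.

```python
import numpy as np


def fit_scalers(train):
    """Learn per-column statistics from training data only."""
    train = np.asarray(train, dtype=float)
    mean, std = train.mean(axis=0), train.std(axis=0)
    lo, hi = train.min(axis=0), train.max(axis=0)
    # Guard constant columns against division by zero.
    std = np.where(std > 0, std, 1.0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return {"mean": mean, "std": std, "lo": lo, "span": span}


def standardize(x, s):
    """Z-score: unitless values with mean 0 and standard deviation 1 on the training data."""
    return (np.asarray(x, dtype=float) - s["mean"]) / s["std"]


def normalize(x, s):
    """Min-max: maps the training range onto [0, 1]; new data may fall outside it."""
    return (np.asarray(x, dtype=float) - s["lo"]) / s["span"]
```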
The feature extraction step uses a deep learning algorithm, a clustering algorithm and a feature selection algorithm, wherein:
a deep learning algorithm is used to extract spatial features from the medical image data;
a clustering algorithm is used to process the electronic medical record data, grouping the records by similarity and extracting group features;
a feature selection algorithm is used to select, from the extracted features, those with the greatest influence on the prediction result;
the selected features form a feature set intended to give the best predictive performance while avoiding overfitting. This approach extracts representative features effectively, improves the accuracy of the prediction result, and is efficient and practical for big data.
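The patent does not name its clustering or feature-selection algorithms. The NumPy sketch below assumes k-means for grouping medical records and one-hot cluster membership as the "group features". For selection it assumes a simple filter that keeps the columns with the largest absolute Pearson correlation with the outcome. Any of these could be swapped for another method. The function names are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Plain k-means on the rows of x; returns cluster labels and centres."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every record to every centre.
        d = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
            for c in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


def group_features(labels, k):
    """One-hot cluster membership used as group features for each record."""
    return np.eye(k)[labels]


def select_top_features(x, y, m):
    """Indices of the m columns with the largest |Pearson correlation| with y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(axis=0), y - y.mean()
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get correlation 0
    corr = (xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(corr), kind="stable")[:m]
```

A correlation filter ignores interactions between features. Wrapper or embedded methods, such as selection by a model's own importance scores, are common alternatives.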
In addition, the data fusion step may instead use a deep fusion method, for example fusing the features with a deep neural network (DNN), whose nonlinear transformations and mappings can extract higher-level, more abstract fused features. This makes full use of the data of the different modalities and improves prediction accuracy and stability. This step is the core of the method: it determines whether the prediction model can effectively use the information of each modality and so outperform any single-modality prediction.
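A deep-fusion network of the kind described can be sketched as a forward pass. The per-modality features are concatenated, passed through a nonlinear hidden layer that serves as the fused representation, and mapped to a risk score with a sigmoid. The one-hidden-layer architecture, the ReLU activation and the random untrained weights are all assumptions. In practice the weights would be trained, for example by backpropagation, and that step is not shown. The function names are hypothetical.

```python
import numpy as np


def init_fusion_mlp(n_in, n_hidden, seed=0):
    """Random (untrained) weights for a one-hidden-layer fusion network."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def deep_fusion_forward(params, modality_feats):
    """Concatenate per-modality features, apply a nonlinear hidden layer,
    and map the fused representation to a risk score in (0, 1)."""
    x = np.concatenate([np.asarray(f, dtype=float) for f in modality_feats], axis=1)
    fused = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    logits = fused @ params["w2"] + params["b2"]
    risk = 1.0 / (1.0 + np.exp(-logits[:, 0]))  # sigmoid
    return fused, risk
```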
The linear fusion operates as follows:
first, the feature matrices from the different modalities are normalized so that every feature lies in the same range, for example 0 to 1; this gives each feature a fair weight in the fusion and prevents features with large value ranges from dominating the fused result;
next, the features of the different modalities are weighted and summed, using either preset weights or weights learned through training; the weights are determined by cross-validation so as to maximize the contribution of the fused features to the prediction model, and the determination takes into account the importance of each modality for disease risk prediction (for some diseases, for example, genomic data may matter more than medical image data);
finally, the feature matrix obtained by linear fusion is used as input data for training the prediction model and for prediction, so that the model makes full use of multi-modal information and predicts more accurately and stably;
this linear fusion effectively combines information from different modalities and makes the disease risk prediction more accurate and stable.
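Choosing the fusion weights by cross-validation could look like the sketch below. It rests on several assumptions. The weights are constrained to sum to 1 and searched on a grid with step 0.1. The downstream model is a least-squares linear regression, and the criterion is k-fold mean squared error. The feature matrices share one shape, as the element-wise weighted sum requires. The grid grows quickly with the number of modalities but is small for three. Weights chosen this way should still be checked on the separate test set described in the validation step. The function names are hypothetical.

```python
import itertools

import numpy as np


def _min_max(x):
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / np.where(hi > lo, hi - lo, 1.0)


def _cv_mse(x, y, n_folds):
    """Mean squared error of a least-squares model under k-fold cross-validation."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        a = np.column_stack([np.ones(len(train)), x[train]])
        coef, *_ = np.linalg.lstsq(a, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(fold)), x[fold]]) @ coef
        sq_err += float(np.sum((pred - y[fold]) ** 2))
    return sq_err / len(y)


def choose_fusion_weights(feature_mats, y, step=0.1, n_folds=5):
    """Grid-search modality weights (summing to 1) by cross-validated MSE."""
    mats = [_min_max(np.asarray(m, dtype=float)) for m in feature_mats]
    y = np.asarray(y, dtype=float)
    n_steps = int(round(1.0 / step))
    best_w, best_mse = None, np.inf
    for combo in itertools.product(range(n_steps + 1), repeat=len(mats)):
        if sum(combo) != n_steps:
            continue  # keep only weight vectors that sum to 1
        w = np.array(combo) / n_steps
        fused = sum(wi * m for wi, m in zip(w, mats))
        mse = _cv_mse(fused, y, n_folds)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse
```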
The prediction model is validated by evaluating it on an independent test data set to determine its performance, including calculating the accuracy, the recall and the area under the ROC curve (AUC) to assess its predictive performance comprehensively. This step shows how the model performs on unseen data, checks its generalization ability, and guards against overfitting.
The output prediction result comprises a specific disease risk value, a disease risk grade, and a confidence interval for the disease risk. Presenting the result this way gives doctors more comprehensive and specific information and helps them make more accurate diagnosis and treatment decisions.
Example 2
The method comprises the following steps:
step one: data preprocessing, comprising cleaning, standardizing and normalizing data of different modalities, such as medical images, genomic data and electronic medical records;
step two: feature extraction, comprising extracting features from the preprocessed data and selecting and extracting features with deep learning algorithms, clustering algorithms and other methods;
step three: data fusion, in which the extracted features are fused by a fusion algorithm into a comprehensive feature, followed by construction of a disease risk prediction model based on the comprehensive feature;
step four: outputting the prediction result, in which risk prediction is performed on a new case and the prediction result is output;
by this method, data of different modalities can be used, and feature extraction and fusion improve the accuracy of disease risk prediction.
The data preprocessing step comprises cleaning, standardizing and normalizing the medical image, genomic and electronic medical record data, wherein:
data cleaning comprises identifying and handling missing, duplicate and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or scales into unitless relative values, eliminating the effect of differing units so that data from different sources can be compared and fused;
normalization converts the data into a uniform value range (for example, between 0 and 1), eliminating the effect of differing value ranges and ensuring stable model training and reliable prediction results;
the aim of this step is to bring the data of the different modalities to a common standard for subsequent processing, further improving the accuracy and stability of disease risk prediction.
The feature extraction step uses a deep learning algorithm, a clustering algorithm and a feature selection algorithm, wherein:
a deep learning algorithm is used to extract spatial features from the medical image data;
a clustering algorithm is used to process the electronic medical record data, grouping the records by similarity and extracting group features;
a feature selection algorithm is used to select, from the extracted features, those with the greatest influence on the prediction result;
the selected features form a feature set intended to give the best predictive performance while avoiding overfitting. This approach extracts representative features effectively, improves the accuracy of the prediction result, and is efficient and practical for big data.
The data fusion step is model-based, with the following procedure:
first, the features extracted from each modality are input into a separate prediction model suited to that modality's characteristics: a convolutional neural network (CNN) model for medical image data, a recurrent neural network (RNN) model for genomic data, and a support vector machine (SVM) model for electronic medical record data;
then, each model's disease risk prediction is taken as a new feature, and these are combined into a prediction-result feature matrix; this step rests on the observation that different models' predictions for the same task often carry different information, and combining them can yield a richer and more accurate prediction;
next, the prediction-result feature matrix is input into a new prediction model, a linear regression model, which learns how best to combine the individual models' predictions to obtain the most accurate disease risk prediction;
finally, the new prediction model is used to predict unseen data, and its output is the disease risk prediction based on multi-modal data fusion;
this model-based fusion effectively combines data from different modalities and can uncover latent associations between them, improving the accuracy and robustness of disease risk prediction. The approach is well suited to high-dimensional, complex medical data and is an important means of implementing the present invention.
The output prediction result comprises a specific disease risk value, a disease risk grade, and a confidence interval for the disease risk. Presenting the result this way gives doctors more comprehensive and specific information and helps them make more accurate diagnosis and treatment decisions.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the invention is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
The present invention is intended to embrace all such alternatives, modifications and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement and the like of the present invention shall fall within its scope of protection.
Claims (8)
1. A disease risk prediction method based on multi-mode data fusion, characterized by comprising the following steps:
step one: data preprocessing, comprising cleaning, standardizing and normalizing data of different modalities, namely medical images, genomic data and electronic medical records;
step two: feature extraction, comprising extracting features from the preprocessed data and selecting and extracting features with deep learning and clustering algorithms;
step three: data fusion, in which the extracted features are fused by a fusion algorithm into a comprehensive feature;
step four: constructing a prediction model, namely a disease risk prediction model based on the comprehensive feature;
step five: outputting the prediction result, in which the risk of a new case is predicted and the prediction result is output.
2. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the data preprocessing step comprises cleaning, standardizing and normalizing the medical image, genomic and electronic medical record data, wherein:
data cleaning comprises identifying and handling missing, duplicate and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or scales into unitless relative values, eliminating the effect of differing units so that data from different sources can be compared and fused;
normalization converts the data into a uniform value range, eliminating the effect of differing value ranges and ensuring stable model training and reliable prediction results.
3. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the feature extraction step employs a deep learning algorithm, a clustering algorithm and a feature selection algorithm, in which:
the deep learning algorithm is used to extract spatial features from the medical image data;
the clustering algorithm is used to process the electronic medical record data, grouping medical records by similarity and extracting group features;
and the feature selection algorithm is used to select, from the extracted features, those with the greatest influence on the prediction result.
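The clustering and feature-selection parts of claim 3 can be sketched in standard-library Python as follows. The deep-learning image branch is left out because it would need a neural-network framework. The following assumptions are not in the claims:

- k-means stands in for the unspecified clustering algorithm.
- Cluster membership is one-hot encoded as the "group feature".
- Feature selection ranks columns by absolute Pearson correlation with the outcome label, which is a simple filter method.

All function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means clustering; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each record joins its nearest centroid (squared Euclidean).
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def group_features(labels, k):
    """One-hot encode cluster membership as group-level features."""
    return [[1.0 if lab == c else 0.0 for c in range(k)] for lab in labels]


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 for a constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(matrix, y, k):
    """Keep the k columns with the largest |Pearson r| against the label y."""
    cols = list(zip(*matrix))
    scores = [abs(pearson(c, y)) for c in cols]
    keep = sorted(sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)[:k])
    return keep, [[row[j] for j in keep] for row in matrix]
```

A filter method like this ignores interactions between features. Wrapper or embedded selection methods are common alternatives when the data set is large enough to support them.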
4. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the data fusion step uses linear fusion to integrate the features of the different modalities, including medical image, genomic data and electronic medical record features.
5. The disease risk prediction method based on multi-modal data fusion according to claim 4, wherein the linear fusion is performed as follows:
first, the feature matrices from the different modalities are normalized so that every feature lies in the same value range; this gives each feature a fair weight during fusion and prevents features with large value ranges from dominating the fusion result;
next, the features of the different modalities are weighted and summed using preset weights or weights learned during training; the weights are determined by cross-validation so as to maximize the contribution of the fused features to the prediction model, and the importance of each modality's features for disease risk prediction is taken into account when determining them;
finally, the feature matrix obtained by linear fusion is used as input data for training the prediction model and making predictions.
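Claims 4 and 5 can be illustrated with a small standard-library Python sketch. It makes three assumptions beyond what the claims state:

- Every modality has already been mapped to a feature matrix of the same shape. A weighted sum is only defined in that case.
- The candidate weights come from a coarse grid and sum to 1.
- A nearest-centroid classifier is an acceptable stand-in for the downstream model when each weighting is scored by k-fold cross-validation.

All names are hypothetical.

```python
import itertools


def min_max(matrix):
    """Rescale each feature column into [0, 1] so no modality dominates by range."""
    cols = list(zip(*matrix))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(x - lo) / sp for x, lo, sp in zip(row, lows, spans)] for row in matrix]


def linear_fuse(modalities, weights):
    """Weighted element-wise sum of normalized, equally shaped feature matrices."""
    normed = [min_max(m) for m in modalities]
    n_rows, n_cols = len(normed[0]), len(normed[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, normed)) for j in range(n_cols)]
            for i in range(n_rows)]


def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Stand-in classifier (assumed) used only to score a candidate weighting."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return sum(predict(x) == y for x, y in zip(test_x, test_y)) / len(test_y)


def cv_accuracy(x, y, folds):
    """Mean accuracy over `folds` interleaved cross-validation folds."""
    scores = []
    for f in range(folds):
        test_idx = sorted(range(f, len(x), folds))
        train_idx = [i for i in range(len(x)) if i not in test_idx]
        scores.append(nearest_centroid_accuracy(
            [x[i] for i in train_idx], [y[i] for i in train_idx],
            [x[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / folds


def choose_weights(modalities, y, grid=(0.0, 0.25, 0.5, 0.75, 1.0), folds=3):
    """Grid-search modality weights (summing to 1) by cross-validated accuracy."""
    best = None
    for w in itertools.product(grid, repeat=len(modalities)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # only weight vectors that sum to 1 are considered
        score = cv_accuracy(linear_fuse(modalities, w), y, folds)
        if best is None or score > best[0]:
            best = (score, w)
    return best
```

Here the normalization is fitted on all rows before cross-validation, so the scores are slightly optimistic. A stricter version would refit the scaling inside each training fold.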
6. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the data fusion step uses model fusion, performed as follows:
first, the features extracted from each modality are input into a separate prediction model suited to the characteristics of that data: a convolutional neural network model for the medical image data, a recurrent neural network model for the genomic data, and a support vector machine model for the electronic medical record data;
then, each model's disease risk prediction is taken as a new feature, and these new features are combined into a prediction-result feature matrix;
next, the prediction-result feature matrix is input into a new prediction model, a linear regression model, which learns how best to combine the predictions of the individual models to obtain the most accurate disease risk prediction;
finally, the new prediction model is applied to unseen data, and its output is the disease risk prediction based on multi-modal data fusion.
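The model-fusion flow of claim 6 is a form of stacking. The standard-library Python sketch below covers only the meta-level step. It assumes that the per-modality models (the CNN, RNN and SVM of the claim) have already produced risk scores, which are represented here by plain numbers. It then fits the linear regression meta-model by ordinary least squares via the normal equations. Function names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_linear_regression(features, target):
    """Ordinary least squares with intercept: solves (X^T X) b = X^T y."""
    design = [[1.0] + list(row) for row in features]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, target)) for i in range(p)]
    return solve(xtx, xty)  # [intercept, weight per base model, ...]


def meta_predict(coef, base_scores):
    """Combine one case's base-model risk scores with the fitted meta-model."""
    return coef[0] + sum(w * s for w, s in zip(coef[1:], base_scores))
```

If the meta-model is fitted on the base models' in-sample predictions, it tends to overweight base models that overfit. The usual stacking practice is to train it on out-of-fold predictions instead. The normal equations also require more cases than base models, and the base predictions must not be collinear.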
7. The disease risk prediction method based on multi-modal data fusion according to claim 5, wherein the model validation step comprises evaluating the model on an independent test data set to determine its performance, including calculating the accuracy, the recall and the area under the ROC curve (AUC) to comprehensively evaluate the model's predictive performance.
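The three metrics named in claim 7 can be computed without external libraries, as in the following sketch. In practice a library such as scikit-learn (`accuracy_score`, `recall_score`, `roc_auc_score`) would usually be used. The AUC here uses its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The 0.5 decision threshold in the usage lines is an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of truly positive cases that were flagged."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(accuracy(y_true, y_pred), recall(y_true, y_pred), roc_auc(y_true, scores))  # 0.75 0.5 0.75
```

The pairwise AUC loop is O(n^2). That is fine for a sketch, but a sort-based rank computation scales better on large test sets.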
8. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the output prediction result includes a specific disease risk value, a disease risk grade classification and a confidence interval for the disease risk.
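One way to produce the three outputs of claim 8 is sketched below. The claim does not say how the confidence interval is obtained. This sketch assumes an ensemble of risk scores for the same case, for example from models retrained on bootstrap resamples. It reports their mean as the risk value and a percentile interval as the confidence interval. It then maps the mean to a grade using hypothetical cut-offs of 0.3 and 0.7. The function name `risk_report` is also hypothetical.

```python
import math
import statistics


def risk_report(ensemble_scores, alpha=0.05, cutoffs=(0.3, 0.7)):
    """Summarize ensemble risk scores for one case as value, grade and interval."""
    scores = sorted(ensemble_scores)
    n = len(scores)
    risk = statistics.fmean(scores)
    # Percentile interval over the alpha/2 tails; indices are rounded outward.
    lower = scores[math.floor((alpha / 2) * (n - 1))]
    upper = scores[math.ceil((1 - alpha / 2) * (n - 1))]
    low_cut, high_cut = cutoffs
    if risk < low_cut:
        grade = "low"
    elif risk < high_cut:
        grade = "medium"
    else:
        grade = "high"
    return {"risk": risk, "grade": grade, "interval": (lower, upper)}
```

With only a handful of ensemble members, the interval simply spans the smallest to the largest score. A meaningful 95% interval needs a much larger number of resamples.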
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310951791.5A CN116959725A (en) | 2023-07-31 | 2023-07-31 | Disease risk prediction method based on multi-mode data fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310951791.5A CN116959725A (en) | 2023-07-31 | 2023-07-31 | Disease risk prediction method based on multi-mode data fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116959725A (en) | 2023-10-27 |
Family
ID=88454519
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310951791.5A Pending CN116959725A (en) | 2023-07-31 | 2023-07-31 | Disease risk prediction method based on multi-mode data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116959725A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117253614A (en) * | 2023-11-14 | 2023-12-19 | 天津医科大学朱宪彝纪念医院(天津医科大学代谢病医院、天津代谢病防治中心) | Diabetes risk early warning method based on big data analysis |
CN117253614B (en) * | 2023-11-14 | 2024-01-26 | 天津医科大学朱宪彝纪念医院(天津医科大学代谢病医院、天津代谢病防治中心) | Diabetes risk early warning method based on big data analysis |
CN117423423A (en) * | 2023-12-18 | 2024-01-19 | 四川互慧软件有限公司 | Health record integration method, equipment and medium based on convolutional neural network |
CN117423423B (en) * | 2023-12-18 | 2024-02-13 | 四川互慧软件有限公司 | Health record integration method, equipment and medium based on convolutional neural network |
CN117476247A (en) * | 2023-12-27 | 2024-01-30 | 杭州深麻智能科技有限公司 | Intelligent analysis method for disease multi-mode data |
CN117476247B (en) * | 2023-12-27 | 2024-04-19 | 杭州乐九医疗科技有限公司 | Intelligent analysis method for disease multi-mode data |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN116959725A (en) | Disease risk prediction method based on multi-mode data fusion | |
CN107358014B (en) | Clinical pretreatment method and system of physiological data | |
TWI766618B (en) | Key point detection method, electronic device and computer readable storage medium | |
CN111950622B (en) | Behavior prediction method, device, terminal and storage medium based on artificial intelligence | |
CN113113130A (en) | Tumor individualized diagnosis and treatment scheme recommendation method | |
CN103714261A (en) | Intelligent auxiliary medical treatment decision supporting method of two-stage mixed model | |
CN112465231B (en) | Method, apparatus and readable storage medium for predicting regional population health status | |
Biswas et al. | Hybrid expert system using case based reasoning and neural network for classification | |
TWI677830B (en) | Method and device for detecting key variables in a model | |
Maram et al. | A framework for performance analysis on machine learning algorithms using covid-19 dataset | |
Tavakoli | Seq2image: Sequence analysis using visualization and deep convolutional neural network | |
CN116259415A (en) | Patient medicine taking compliance prediction method based on machine learning | |
Tiruneh et al. | Feature selection for construction organizational competencies impacting performance | |
Lu et al. | Rice disease identification method based on improved CNN-BiGRU | |
Swetha et al. | Leveraging Scalable Classifier Mining for Improved Heart Disease Diagnosis | |
CN113707323A (en) | Disease prediction method, device, equipment and medium based on machine learning | |
WO2023061174A1 (en) | Method and apparatus for constructing risk prediction model for autism spectrum disorder | |
CN116401564A (en) | PCA-based redundant variable screening improvement method and device | |
CN110033862B (en) | Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium | |
CN110265151B (en) | Learning method based on heterogeneous temporal data in EHR | |
Zhou et al. | Pre-clustering active learning method for automatic classification of building structures in urban areas | |
AlShwaish et al. | Mortality prediction based on imbalanced new born and perinatal period data | |
Zhu et al. | Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration to Mitigate EHR Data Sparsity | |
CN117476110B (en) | Multi-scale biomarker discovery system based on artificial intelligence | |
Medasani et al. | Machine Learning Techniques for Cardiac Risk Analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
SE01 | Entry into force of request for substantive examination | |