CN116959725A - Disease risk prediction method based on multi-mode data fusion - Google Patents

Disease risk prediction method based on multi-mode data fusion

Info

Publication number
CN116959725A
Authority
CN
China
Prior art keywords
data
fusion
prediction
disease risk
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310951791.5A
Other languages
Chinese (zh)
Inventor
马梦媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202310951791.5A
Publication of CN116959725A

Classifications

    • G: PHYSICS
      • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
        • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
          • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
            • G16H50/30: for calculating health indices; for individual health risk assessment
            • G16H50/50: for simulation or modelling of medical disorders
            • G16H50/70: for mining of medical data, e.g. analysing previous cases of other patients
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00: Pattern recognition
            • G06F18/20: Analysing
              • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
                • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/23: Clustering techniques
              • G06F18/24: Classification techniques
                • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
                  • G06F18/2411: based on the proximity to a decision surface, e.g. support vector machines
              • G06F18/25: Fusion techniques
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00: Computing arrangements based on biological models
            • G06N3/02: Neural networks
              • G06N3/04: Architecture, e.g. interconnection topology
                • G06N3/0464: Convolutional networks [CNN, ConvNet]
              • G06N3/08: Learning methods
                • G06N3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
          • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
            • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention relates to the technical field of disease risk prediction, and in particular to a disease risk prediction method based on multi-modal data fusion, comprising the following steps: data preprocessing, in which data of different modalities, such as medical images, genome data, and electronic medical records, are cleaned, standardized, and normalized; feature extraction, in which features are extracted and selected from the preprocessed data using a deep learning algorithm, a clustering algorithm, and other methods; data fusion, in which the extracted features are fused by a fusion algorithm to form a comprehensive feature; prediction model construction, in which a disease risk prediction model is built on the comprehensive feature; and prediction result output, in which the risk of a new case is predicted and the result is output. The invention fuses and uses multi-modal data efficiently, provides a more accurate and stable disease risk prediction method, and has practical value and broad application prospects in the field of medical health.

Description

Disease risk prediction method based on multi-mode data fusion
Technical Field
The invention relates to the technical field of disease risk prediction, in particular to a disease risk prediction method based on multi-mode data fusion.
Background
In the medical field, disease risk prediction is vital work: it helps doctors identify high-risk groups in advance, enabling early prevention and early treatment and reducing the risk of disease occurrence. Existing disease risk prediction methods are mainly based on single-modality data, such as medical images, genome data, or electronic medical record data.
However, because the various kinds of data differ in their characteristics and information content, data of a single modality cannot comprehensively reflect the complexity of a disease, and the information in the other kinds of data goes unused, which limits the accuracy and stability of the prediction result.
Disclosure of Invention
In view of the above problems, the present invention provides a disease risk prediction method based on multi-modal data fusion.
A disease risk prediction method based on multi-modal data fusion comprises the following steps:
step one: data preprocessing, in which the data of the different modalities, namely medical images, genome data, and electronic medical records, are cleaned, standardized, and normalized;
step two: feature extraction, in which features are extracted and selected from the preprocessed data using a deep learning algorithm and a clustering algorithm;
step three: data fusion, in which the extracted features are fused by a fusion algorithm to form a comprehensive feature, and prediction model construction, in which a disease risk prediction model is built on the comprehensive feature;
step four: prediction result output, in which the risk of a new case is predicted and the prediction result is output.
Further, the data preprocessing step comprises cleaning, standardizing, and normalizing the medical image data, the genome data, and the electronic medical record data, wherein:
data cleaning comprises identifying and handling missing data, duplicate data, and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or measurement scales into unitless relative values, so as to eliminate the influence of differing units and allow data from different sources to be compared and fused;
normalization converts the data into a unified value range, so as to eliminate the influence of differing value ranges and ensure the stability of model training and the reliability of the prediction result.
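The difference between standardization and normalization can be illustrated with a minimal sketch. The text does not name specific formulas, so the z-score and min-max scaling used here are assumptions, as are the function names and the sample blood-pressure column.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score (assumed formula): subtract the mean, divide by the standard deviation.

    The result is unitless, so columns recorded in different units become comparable.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values, low=0.0, high=1.0):
    """Min-max scaling (assumed formula): map values linearly into [low, high]."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        return [low for _ in values]
    scale = (high - low) / (largest - smallest)
    return [low + (v - smallest) * scale for v in values]


# Hypothetical column: systolic blood pressure in mmHg from an electronic medical record.
systolic_mmhg = [110.0, 120.0, 130.0, 140.0, 150.0]
z_scores = standardize(systolic_mmhg)
scaled = min_max_normalize(systolic_mmhg)
```

In practice the mean, standard deviation, minimum, and maximum would be computed on the training data only and reused for new cases, so that a new case is scaled the same way as the data the model was trained on.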
Further, the feature extraction step uses a deep learning algorithm, a clustering algorithm, and a feature selection algorithm, wherein:
the deep learning algorithm extracts the spatial features in the medical image data;
the clustering algorithm processes the electronic medical record data, grouping the records by similarity and extracting group features;
the feature selection algorithm selects, from the extracted features, those with the greatest influence on the prediction result.
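The feature selection algorithm is not specified in the text. As one possible illustration, the sketch below ranks features by the absolute Pearson correlation between each feature and the outcome label and keeps the top k. The ranking criterion, the feature names, and the sample values are all assumptions.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_top_k(features, labels, k):
    """Return the names of the k features most correlated (in absolute value) with the labels."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], labels)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical extracted features for six patients (one list per feature).
features = {
    "lesion_area": [0.1, 0.2, 0.8, 0.9, 0.7, 0.3],
    "variant_score": [0.5, 0.4, 0.6, 0.5, 0.4, 0.6],
    "ehr_cluster_risk": [0.2, 0.1, 0.7, 0.6, 0.9, 0.2],
}
labels = [0, 0, 1, 1, 1, 0]  # 1 = disease occurred
selected = select_top_k(features, labels, k=2)
```

A univariate filter like this ignores interactions between features; wrapper or embedded selection methods would be alternatives under the same step.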
Further, the data fusion step uses linear fusion, which integrates the features of the different modalities, including the features of the medical images, the genome data, and the electronic medical records.
Further, the linear fusion specifically operates as follows:
first, the feature matrices from the different modalities are normalized so that the values of every feature lie in the same range, which keeps the weighting of the features fair during fusion and prevents features with a large value range from unduly influencing the fusion result;
next, the features of the different modalities are weighted and summed according to preset weights or weights learned through training; the weights are determined by cross-validation so as to maximize the contribution of the fused features to the prediction model, and their determination takes into account the importance of each modality's features in disease risk prediction;
finally, the feature matrix obtained by linear fusion is used as input data for training the prediction model and for prediction.
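The linear fusion steps above can be sketched as follows. The sketch assumes that each modality's feature matrix has already been reduced to the same shape (same patients as rows, same number of columns), since a weighted sum is only defined for matrices of equal shape; that assumption, the weights, and the sample values are illustrative only.

```python
def normalize_columns(matrix):
    """Min-max scale every column of a row-major matrix to [0, 1]."""
    scaled_columns = []
    for column in zip(*matrix):
        smallest, largest = min(column), max(column)
        span = largest - smallest
        scaled_columns.append(
            [0.0 if span == 0 else (v - smallest) / span for v in column]
        )
    return [list(row) for row in zip(*scaled_columns)]


def linear_fuse(modal_matrices, weights):
    """Normalize each modality, then return the weighted sum of the matrices."""
    if len(modal_matrices) != len(weights):
        raise ValueError("one weight per modality is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    normalized = [normalize_columns(m) for m in modal_matrices]
    n_rows, n_cols = len(normalized[0]), len(normalized[0][0])
    return [
        [sum(w * m[r][c] for w, m in zip(weights, normalized)) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical feature matrices: three patients, two features per modality.
image_features = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
genome_features = [[0.0, 5.0], [1.0, 5.5], [2.0, 6.0]]
ehr_features = [[1.0, 70.0], [3.0, 80.0], [2.0, 90.0]]

# Hypothetical preset weights; in the method they would be chosen by cross-validation.
fused = linear_fuse([image_features, genome_features, ehr_features], [0.5, 0.3, 0.2])
```

Because every normalized value lies in [0, 1] and the weights sum to 1, every fused value also lies in [0, 1].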
Further, the data fusion step uses model-based fusion, and the operation flow is as follows:
first, the features extracted from the data of the different modalities are input into respective prediction models, each chosen to suit the characteristics of its data: a convolutional neural network (CNN) model for the medical image data; a recurrent neural network (RNN) model for the genome data; and a support vector machine (SVM) model for the electronic medical record data;
then, the disease risk predictions of the individual models are taken as new features and combined into a prediction result feature matrix;
next, the prediction result feature matrix is input into a new prediction model, which is a linear regression model that learns how best to combine the predictions of the individual models so as to obtain the most accurate disease risk prediction;
finally, the new prediction model is used to predict unknown data, and the result obtained is the disease risk prediction result based on multi-modal data fusion.
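The second-level model in this flow can be illustrated with a minimal sketch. The base models (CNN, RNN, SVM) are not implemented here; their outputs are replaced by hypothetical risk scores, and the linear regression is fitted by ordinary least squares through the normal equations. All names and numbers are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes the system is non-singular.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def fit_linear_regression(rows, targets):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    design = [[1.0] + list(row) for row in rows]
    size = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(size)] for i in range(size)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(size)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Combine the base-model predictions of one case into a single risk score."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Hypothetical prediction result feature matrix: one row per patient,
# columns = risk predicted by the image, genome, and medical-record models.
base_predictions = [
    [0.9, 0.8, 0.7],
    [0.2, 0.3, 0.1],
    [0.7, 0.4, 0.8],
    [0.1, 0.2, 0.3],
    [0.6, 0.9, 0.5],
    [0.3, 0.1, 0.4],
]
labels = [1, 0, 1, 0, 1, 0]  # observed outcomes

coefficients = fit_linear_regression(base_predictions, labels)
new_case_risk = predict(coefficients, [0.8, 0.7, 0.6])
```

Two practical points follow from this choice of model. A linear regression output is not confined to [0, 1], so it may need clipping before being reported as a risk. The base-model predictions used to fit the second-level model are also usually produced on data the base models did not train on, to keep the combination from simply rewarding overfitted base models.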
Further, the method comprises a step of validating the prediction model, in which the prediction model is evaluated on an independent test data set to determine its performance; this step comprises calculating the accuracy, the recall, and the area under the ROC curve (AUC) to comprehensively evaluate the predictive performance of the model.
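The three evaluation indices can be computed as in the following sketch. The 0.5 decision threshold and the sample scores are assumptions; the AUC is computed from its pairwise definition, that is, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class matches the true class."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positive cases that the model identified."""
    true_positive = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_negative = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = true_positive + false_negative
    return true_positive / total if total else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical independent test set: true outcomes and predicted risk scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

test_accuracy = accuracy(y_true, y_pred)
test_recall = recall(y_true, y_pred)
test_auc = roc_auc(y_true, scores)
```

Accuracy and recall depend on the chosen threshold, whereas the AUC is computed from the raw scores and is threshold-independent, which is why the three indices together give a fuller picture than any one alone.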
Further, the output of the prediction result comprises the specific value of the disease risk, the grade classification of the disease risk, and the confidence interval of the disease risk.
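One way to assemble such an output is sketched below. The text does not say how the grade boundaries or the confidence interval are obtained, so the cut-offs (0.33 and 0.66), the use of repeated predictions from resampled models, and the percentile interval are all assumptions.

```python
from statistics import mean


def risk_grade(risk, cutoffs=(0.33, 0.66)):
    """Map a risk value to a grade; the cut-offs are hypothetical."""
    low_cutoff, high_cutoff = cutoffs
    if risk < low_cutoff:
        return "low"
    if risk < high_cutoff:
        return "medium"
    return "high"


def percentile_interval(samples, level=0.8):
    """Nearest-rank percentile interval covering roughly `level` of the samples."""
    ordered = sorted(samples)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def build_report(samples, level=0.8):
    """Summarize repeated risk predictions for one case as value, grade, and interval."""
    risk = mean(samples)
    return {
        "risk": risk,
        "grade": risk_grade(risk),
        "interval": percentile_interval(samples, level),
        "level": level,
    }


# Hypothetical risk predictions for one new case from 11 models
# trained on bootstrap resamples of the training data.
bootstrap_predictions = [0.62, 0.70, 0.66, 0.74, 0.68, 0.71, 0.65, 0.69, 0.73, 0.67, 0.72]
report = build_report(bootstrap_predictions)
```

With only a handful of resampled predictions the interval is coarse; a larger number of resamples would give a smoother estimate.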
The beneficial effects of the invention are as follows:
Through the steps of feature extraction, data fusion, model construction, and validation, the invention effectively integrates multi-modal data from medical images, genome data, and electronic medical records. Compared with traditional single-modality disease risk prediction methods, the method considers and uses the data of the various modalities more comprehensively and improves the accuracy and stability of the prediction result.
Through model-based fusion, the method not only combines the information of each modality but can also discover potential associations among the modalities, further improving predictive performance. In addition, periodically updating and optimizing the prediction model helps keep its performance at a high level and lets it adapt to possible changes in medical data and disease patterns.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only embodiments of the invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a prediction method according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.
It is to be noted that, unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by a person of ordinary skill in the art to which the invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but are used only to distinguish one element from another. The word "comprising" or "comprises," and the like, means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper," "lower," "left," "right," and the like are used only to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
Example 1
As shown in FIG. 1, a disease risk prediction method based on multi-modal data fusion comprises the following steps:
step one: data preprocessing, in which data of different modalities, such as medical images, genome data, and electronic medical records, are cleaned, standardized, and normalized;
step two: feature extraction, in which features are extracted and selected from the preprocessed data using a deep learning algorithm, a clustering algorithm, and other methods;
step three: data fusion, in which the extracted features are fused by a fusion algorithm to form a comprehensive feature, and prediction model construction, in which a disease risk prediction model is built on the comprehensive feature;
step four: prediction result output, in which the risk of a new case is predicted and the prediction result is output.
With this method, data of different modalities can be used together, and the accuracy of disease risk prediction can be improved through feature extraction and fusion.
The data preprocessing step comprises cleaning, standardizing, and normalizing the medical image data, the genome data, and the electronic medical record data, wherein:
data cleaning comprises identifying and handling missing data, duplicate data, and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or measurement scales into unitless relative values, so as to eliminate the influence of differing units and allow data from different sources to be compared and fused;
normalization converts the data into a uniform value range (for example, between 0 and 1), so as to eliminate the influence of differing value ranges and ensure the stability of model training and the reliability of the prediction result.
The aim of this step is to allow the data of the different modalities to undergo subsequent processing under the same standard, further enhancing the accuracy and stability of disease risk prediction.
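The data cleaning part of this step can be illustrated with a minimal sketch. The embodiment does not prescribe how missing, duplicate, or outlier data are handled, so the choices here (keeping the first record per patient, median imputation, and clipping to the 1.5 × IQR fences) are assumptions, as are the field names and sample values.

```python
from statistics import median, quantiles


def clean_records(records, key, field):
    """Drop duplicate records, impute missing values, and clip outliers for one field."""
    # 1. Duplicates: keep the first record seen for each key.
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(dict(record))

    # 2. Missing values: fill with the median of the observed values.
    observed = [r[field] for r in unique if r[field] is not None]
    fill_value = median(observed)

    # 3. Outliers: clip to the interquartile-range fences.
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    for r in unique:
        value = fill_value if r[field] is None else r[field]
        r[field] = min(max(value, lower_fence), upper_fence)
    return unique


# Hypothetical electronic medical record extract: fasting glucose in mmol/L.
raw_records = [
    {"patient_id": 1, "glucose": 5.1},
    {"patient_id": 2, "glucose": 5.4},
    {"patient_id": 2, "glucose": 5.4},   # duplicate entry
    {"patient_id": 3, "glucose": None},  # missing value
    {"patient_id": 4, "glucose": 5.0},
    {"patient_id": 5, "glucose": 25.0},  # implausibly high value
    {"patient_id": 6, "glucose": 5.6},
    {"patient_id": 7, "glucose": 4.9},
]
cleaned = clean_records(raw_records, key="patient_id", field="glucose")
```

Whether an extreme value should be clipped, removed, or kept is a clinical judgment as much as a statistical one; a value that looks like an outlier may be exactly the signal that indicates high risk.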
The feature extraction step uses a deep learning algorithm, a clustering algorithm, and a feature selection algorithm, wherein:
the deep learning algorithm extracts the spatial features in the medical image data;
the clustering algorithm processes the electronic medical record data, grouping the records by similarity and extracting group features;
the feature selection algorithm selects, from the extracted features, those with the greatest influence on the prediction result.
The selected features constitute a feature set intended to provide the best predictive performance while avoiding overfitting. This approach extracts representative features, improves the accuracy of the prediction result, and is efficient and practical for processing large data sets.
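The grouping of medical records by similarity can be illustrated with a small k-means sketch. The embodiment does not name a clustering algorithm, so k-means, the deterministic initialization from the first k records, and the choice of group features (cluster index and distance to the cluster centre) are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points are used as initial centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def group_features(points, centroids, labels):
    """Group features per record: (cluster index, distance to its cluster centre)."""
    return [
        (label, squared_distance(p, centroids[label]) ** 0.5)
        for p, label in zip(points, labels)
    ]


# Hypothetical normalized record features: (age, body-mass index), both scaled to [0, 1].
records = [
    (0.10, 0.20), (0.90, 0.80), (0.15, 0.25),
    (0.85, 0.90), (0.20, 0.10), (0.80, 0.85),
]
centroids, cluster_ids = k_means(records, k=2)
record_group_features = group_features(records, centroids, cluster_ids)
```

The resulting group features can be appended to each patient's feature vector before the feature selection step.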
In addition, the data fusion step may also use a deep fusion method, for example fusing the features with a deep neural network (DNN), whose nonlinear transformations and mappings can extract higher-level, more abstract fused features. In this way the data of the different modalities can be fully used, improving prediction accuracy and stability. Data fusion is the core of the method: it determines whether the prediction model can effectively use the information of each modality and thereby outperform prediction from any single modality.
The linear fusion operates as follows:
first, the feature matrices from the different modalities are normalized so that the values of every feature lie in the same range, for example 0 to 1, which keeps the weighting of the features fair during fusion and prevents features with a large value range from unduly influencing the fusion result;
next, the features of the different modalities are weighted and summed according to preset weights or weights learned through training; the weights are determined by cross-validation so as to maximize the contribution of the fused features to the prediction model, and their determination takes into account the importance of each modality's features in disease risk prediction; for example, for certain diseases the genome data may be more important than the medical image data;
finally, the feature matrix obtained by linear fusion is used as input data for training the prediction model and for prediction, so that the model can make full use of the multi-modal information and the accuracy and stability of prediction are improved.
Linear fusion thus combines the information of the different modalities effectively and makes the disease risk prediction more accurate and stable.
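How the fusion weights might be chosen by cross-validation is sketched below for the simplest case of two modalities, each reduced to one normalized score per patient. The candidate grid, the fold assignment, and the midpoint-threshold classifier used to score each candidate weight are all assumptions made for illustration.

```python
from statistics import mean


def fuse(genome_scores, image_scores, weight):
    """Weighted sum of two modalities; `weight` applies to the genome scores."""
    return [weight * g + (1.0 - weight) * i for g, i in zip(genome_scores, image_scores)]


def fit_threshold(scores, labels):
    """Midpoint between the mean score of positive and negative training cases."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    return (mean(positives) + mean(negatives)) / 2.0


def cv_accuracy(scores, labels, k=3):
    """k-fold cross-validated accuracy; case i belongs to fold i % k."""
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(scores)) if i % k != fold]
        held_out = [i for i in range(len(scores)) if i % k == fold]
        threshold = fit_threshold([scores[i] for i in train], [labels[i] for i in train])
        correct += sum(1 for i in held_out if (scores[i] >= threshold) == (labels[i] == 1))
    return correct / len(scores)


def select_weight(genome_scores, image_scores, labels, grid):
    """Return the candidate weight with the highest cross-validated accuracy."""
    return max(grid, key=lambda w: cv_accuracy(fuse(genome_scores, image_scores, w), labels))


# Hypothetical normalized scores for nine patients; here the genome scores
# separate the classes well and the image scores do not.
genome_scores = [0.90, 0.10, 0.80, 0.20, 0.85, 0.15, 0.95, 0.05, 0.90]
image_scores = [0.40, 0.60, 0.50, 0.50, 0.30, 0.40, 0.60, 0.50, 0.50]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1]

weight_grid = [0.0, 0.25, 0.5, 0.75, 1.0]
best_weight = select_weight(genome_scores, image_scores, labels, weight_grid)
best_accuracy = cv_accuracy(fuse(genome_scores, image_scores, best_weight), labels)
```

When several candidate weights tie on cross-validated accuracy, `max` returns the first one in the grid; a finer grid or a continuous metric such as AUC would separate the candidates more reliably.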
The step of validating the prediction model comprises evaluating the prediction model on an independent test data set to determine its performance, including calculating the accuracy, the recall, and the area under the ROC curve (AUC) to comprehensively evaluate the predictive performance of the model. This step shows how the model performs on unseen data, which helps to confirm its generalization ability and to detect overfitting.
The output of the prediction result comprises the specific value of the disease risk, the grade classification of the disease risk, and the confidence interval of the disease risk. Output in this form gives doctors more comprehensive and specific information and helps them make more accurate diagnosis and treatment decisions.
Example 2
The method comprises the following steps:
step one: data preprocessing, in which data of different modalities, such as medical images, genome data, and electronic medical records, are cleaned, standardized, and normalized;
step two: feature extraction, in which features are extracted and selected from the preprocessed data using a deep learning algorithm, a clustering algorithm, and other methods;
step three: data fusion, in which the extracted features are fused by a fusion algorithm to form a comprehensive feature, and prediction model construction, in which a disease risk prediction model is built on the comprehensive feature;
step four: prediction result output, in which the risk of a new case is predicted and the prediction result is output.
With this method, data of different modalities can be used together, and the accuracy of disease risk prediction can be improved through feature extraction and fusion.
The data preprocessing step comprises cleaning, standardizing, and normalizing the medical image data, the genome data, and the electronic medical record data, wherein:
data cleaning comprises identifying and handling missing data, duplicate data, and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or measurement scales into unitless relative values, so as to eliminate the influence of differing units and allow data from different sources to be compared and fused;
normalization converts the data into a uniform value range (for example, between 0 and 1), so as to eliminate the influence of differing value ranges and ensure the stability of model training and the reliability of the prediction result.
The aim of this step is to allow the data of the different modalities to undergo subsequent processing under the same standard, further enhancing the accuracy and stability of disease risk prediction.
The feature extraction step uses a deep learning algorithm, a clustering algorithm, and a feature selection algorithm, wherein:
the deep learning algorithm extracts the spatial features in the medical image data;
the clustering algorithm processes the electronic medical record data, grouping the records by similarity and extracting group features;
the feature selection algorithm selects, from the extracted features, those with the greatest influence on the prediction result.
The selected features constitute a feature set intended to provide the best predictive performance while avoiding overfitting. This approach extracts representative features, improves the accuracy of the prediction result, and is efficient and practical for processing large data sets.
The data fusion step uses model-based fusion, and the operation flow is as follows:
first, the features extracted from the data of the different modalities are input into respective prediction models, each chosen to suit the characteristics of its data: a convolutional neural network (CNN) model for the medical image data; a recurrent neural network (RNN) model for the genome data; and a support vector machine (SVM) model for the electronic medical record data;
then, the disease risk predictions of the individual models are taken as new features and combined into a prediction result feature matrix; this step rests on the observation that the predictions of different models for the same task often contain different information, and combining that information can yield a richer and more accurate prediction;
next, the prediction result feature matrix is input into a new prediction model, which is a linear regression model that learns how best to combine the predictions of the individual models so as to obtain the most accurate disease risk prediction;
finally, the new prediction model is used to predict unknown data, and the result obtained is the disease risk prediction result based on multi-modal data fusion.
Model-based fusion effectively combines data from different modalities and can discover potential associations among them, improving the accuracy and robustness of disease risk prediction. This approach is well suited to high-dimensional, complex medical data and is an important means of implementing the invention.
The output of the prediction result comprises the specific value of the disease risk, the grade classification of the disease risk, and the confidence interval of the disease risk. Output in this form gives doctors more comprehensive and specific information and helps them make more accurate diagnosis and treatment decisions.
Those of ordinary skill in the art will appreciate that the discussion of any of the above embodiments is merely exemplary and is not intended to suggest that the scope of the invention is limited to these examples. Within the idea of the invention, the technical features of the above embodiments, or of different embodiments, may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not described in detail for the sake of brevity.
The present invention is intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. A disease risk prediction method based on multi-modal data fusion, characterized by comprising the following steps:
step one: data preprocessing, in which the data of the different modalities, namely medical images, genome data, and electronic medical records, are cleaned, standardized, and normalized;
step two: feature extraction, in which features are extracted and selected from the preprocessed data using a deep learning algorithm and a clustering algorithm;
step three: data fusion, in which the extracted features are fused by a fusion algorithm to form a comprehensive feature;
step four: prediction model construction, in which a disease risk prediction model is built on the comprehensive feature;
step five: prediction result output, in which the risk of a new case is predicted and the prediction result is output.
2. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the data preprocessing step comprises cleaning, standardizing, and normalizing the medical image data, the genome data, and the electronic medical record data, wherein:
data cleaning comprises identifying and handling missing data, duplicate data, and outlier data to reduce their negative impact on the prediction result;
standardization converts data with different measurement units or measurement scales into unitless relative values, so as to eliminate the influence of differing units and allow data from different sources to be compared and fused;
normalization converts the data into a unified value range, so as to eliminate the influence of differing value ranges and ensure the stability of model training and the reliability of the prediction result.
3. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the feature extraction step uses a deep learning algorithm, a clustering algorithm, and a feature selection algorithm, wherein:
the deep learning algorithm extracts the spatial features in the medical image data;
the clustering algorithm processes the electronic medical record data, grouping the records by similarity and extracting group features;
the feature selection algorithm selects, from the extracted features, those with the greatest influence on the prediction result.
4. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the data fusion step uses linear fusion, which integrates the features of the different modalities, including the features of the medical images, the genome data, and the electronic medical records.
5. The method for predicting disease risk by multi-modal data fusion according to claim 4, wherein the linear fusion specifically operates as follows:
first, normalizing the feature matrices from the different modalities so that every feature lies in the same value range, which keeps the weighting of the features fair during fusion and prevents features with a large value range from unduly influencing the fusion result;
then, computing a weighted sum of the features of the different modalities using preset weights or weights obtained through training, wherein the weights are determined by cross-validation so as to maximize the contribution of the fused features to the prediction model, and the importance of each modality's features to disease risk prediction is taken into account when determining the weights;
and finally, using the feature matrix obtained by linear fusion as input data for training the prediction model and for making predictions.
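The weighted summation in claim 5 can be sketched as follows. The sketch assumes that every modality has already been normalized and projected to a feature vector of the same length, since an element-wise weighted sum requires equal dimensions; the claim does not state this. The weight search is reduced to picking the best of several candidate weight sets by a caller-supplied validation score, a simplified stand-in for cross-validation. All names are hypothetical.

```python
def linear_fuse(modal_features, weights):
    """Weighted element-wise sum of per-modality feature vectors of equal length."""
    if set(modal_features) != set(weights):
        raise ValueError("weights must cover exactly the given modalities")
    lengths = {len(vec) for vec in modal_features.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must share one feature dimension")
    fused = [0.0] * lengths.pop()
    for name, vec in modal_features.items():
        w = weights[name]
        for i, x in enumerate(vec):
            fused[i] += w * x
    return fused


def pick_weights(candidates, validation_score):
    """Return the candidate weight set with the highest validation score."""
    return max(candidates, key=validation_score)


# Hypothetical normalized features for one patient
modal_features = {
    "image": [1.0, 0.0],
    "genome": [0.0, 1.0],
    "emr": [1.0, 1.0],
}
weights = {"image": 0.5, "genome": 0.25, "emr": 0.25}
fused = linear_fuse(modal_features, weights)
print(fused)
```

In practice `validation_score` would train and evaluate the prediction model on held-out folds for each candidate weight set; that loop is left out here.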
6. The disease risk prediction method based on multi-modal data fusion according to claim 1, wherein the data fusion step is based on model fusion, and the operation flow is as follows:
first, inputting the features extracted from the data of each modality into a respective prediction model suited to the characteristics of that data, wherein a convolutional neural network model is used for medical image data, a recurrent neural network model is used for genome data, and a support vector machine model is used for electronic medical record data;
then, taking each model's disease risk prediction as a new feature and combining these new features into a prediction result feature matrix;
next, inputting the prediction result feature matrix into a new prediction model, wherein the new prediction model is a linear regression model that learns how best to combine the prediction results of the individual models so as to obtain the most accurate disease risk prediction;
and finally, predicting unknown data with the new prediction model, the result being the disease risk prediction result based on multi-modal data fusion.
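The model-based fusion of claim 6 resembles what is commonly called stacking. The sketch below covers only the second stage: it takes the per-modality risk predictions as given numbers (the base models themselves are not implemented) and fits a linear regression with an intercept by ordinary least squares. The solver, the function names and the sample values are assumptions for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; base predictions may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_meta_model(base_preds, labels):
    """Least-squares fit of labels on [1, p_1, ..., p_k] via the normal equations."""
    rows = [[1.0] + list(p) for p in base_preds]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, preds):
    """Combine one sample's base-model predictions into a fused risk value."""
    return coef[0] + sum(c * p for c, p in zip(coef[1:], preds))


# Hypothetical (image, genome, EMR) base-model outputs and known risk labels
base_preds = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
labels = [0.1, 0.6, 0.4, 0.3, 1.1]
coef = fit_meta_model(base_preds, labels)
print(predict(coef, (0.5, 0.5, 0.5)))
```

To limit leakage, the base-model predictions used to fit the meta-model would normally come from samples the base models were not trained on.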
7. The method of claim 5, wherein the step of validating the model comprises evaluating the model on an independent test data set to determine its performance, including calculating the accuracy, the recall and the area under the ROC curve so as to comprehensively evaluate the model's predictive performance.
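The three metrics named in claim 7 can be computed as in the sketch below. It assumes binary labels (1 for disease, 0 for no disease) and computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The sample data and the 0.5 decision threshold are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives among all actual positives."""
    positives = sum(1 for t in y_true if t == 1)
    if positives == 0:
        return 0.0
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / positives


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out test set: true labels and predicted risk scores
y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(accuracy(y_true, y_pred), recall(y_true, y_pred), roc_auc(y_true, scores))
```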
8. The method for predicting disease risk by multi-modal data fusion according to claim 1, wherein the output of the prediction result includes a specific disease risk value, a disease risk level classification, and a confidence interval for the disease risk.
CN202310951791.5A 2023-07-31 2023-07-31 Disease risk prediction method based on multi-mode data fusion Pending CN116959725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310951791.5A CN116959725A (en) 2023-07-31 2023-07-31 Disease risk prediction method based on multi-mode data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310951791.5A CN116959725A (en) 2023-07-31 2023-07-31 Disease risk prediction method based on multi-mode data fusion

Publications (1)

Publication Number Publication Date
CN116959725A true CN116959725A (en) 2023-10-27

Family

ID=88454519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310951791.5A Pending CN116959725A (en) 2023-07-31 2023-07-31 Disease risk prediction method based on multi-mode data fusion

Country Status (1)

Country Link
CN (1) CN116959725A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253614A (en) * 2023-11-14 2023-12-19 天津医科大学朱宪彝纪念医院(天津医科大学代谢病医院、天津代谢病防治中心) Diabetes risk early warning method based on big data analysis
CN117253614B (en) * 2023-11-14 2024-01-26 天津医科大学朱宪彝纪念医院(天津医科大学代谢病医院、天津代谢病防治中心) Diabetes risk early warning method based on big data analysis
CN117423423A (en) * 2023-12-18 2024-01-19 四川互慧软件有限公司 Health record integration method, equipment and medium based on convolutional neural network
CN117423423B (en) * 2023-12-18 2024-02-13 四川互慧软件有限公司 Health record integration method, equipment and medium based on convolutional neural network
CN117476247A (en) * 2023-12-27 2024-01-30 杭州深麻智能科技有限公司 Intelligent analysis method for disease multi-mode data
CN117476247B (en) * 2023-12-27 2024-04-19 杭州乐九医疗科技有限公司 Intelligent analysis method for disease multi-mode data

Similar Documents

Publication Publication Date Title
CN116959725A (en) Disease risk prediction method based on multi-mode data fusion
CN107358014B (en) Clinical pretreatment method and system of physiological data
TWI766618B (en) Key point detection method, electronic device and computer readable storage medium
CN111950622B (en) Behavior prediction method, device, terminal and storage medium based on artificial intelligence
CN113113130A (en) Tumor individualized diagnosis and treatment scheme recommendation method
CN103714261A (en) Intelligent auxiliary medical treatment decision supporting method of two-stage mixed model
CN112465231B (en) Method, apparatus and readable storage medium for predicting regional population health status
Biswas et al. Hybrid expert system using case based reasoning and neural network for classification
TWI677830B (en) Method and device for detecting key variables in a model
Maram et al. A framework for performance analysis on machine learning algorithms using covid-19 dataset
Tavakoli Seq2image: Sequence analysis using visualization and deep convolutional neural network
CN116259415A (en) Patient medicine taking compliance prediction method based on machine learning
Tiruneh et al. Feature selection for construction organizational competencies impacting performance
Lu et al. Rice disease identification method based on improved CNN-BiGRU
Swetha et al. Leveraging Scalable Classifier Mining for Improved Heart Disease Diagnosis
CN113707323A (en) Disease prediction method, device, equipment and medium based on machine learning
WO2023061174A1 (en) Method and apparatus for constructing risk prediction model for autism spectrum disorder
CN116401564A (en) PCA-based redundant variable screening improvement method and device
CN110033862B (en) Traditional Chinese medicine quantitative diagnosis system based on weighted directed graph and storage medium
CN110265151B (en) Learning method based on heterogeneous temporal data in EHR
Zhou et al. Pre-clustering active learning method for automatic classification of building structures in urban areas
AlShwaish et al. Mortality prediction based on imbalanced new born and perinatal period data
Zhu et al. Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration to Mitigate EHR Data Sparsity
CN117476110B (en) Multi-scale biomarker discovery system based on artificial intelligence
Medasani et al. Machine Learning Techniques for Cardiac Risk Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination