CN111354464A - CAD prediction model establishing method and device and electronic equipment - Google Patents
- Publication number
- CN111354464A (application CN201811588441.2A)
- Authority
- CN
- China
- Prior art keywords
- cad
- prediction model
- data
- polygene
- physical condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Abstract
The invention provides a method and an apparatus for establishing a CAD prediction model, and an electronic device, relating to the technical field of modeling. The method comprises: acquiring genotype data and physical condition data of a sample object; calculating a CAD polygene risk value using the genotype data; and establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data. This solves the technical problem in the prior art that the accuracy of data obtained by current CAD prediction is low.
Description
Technical Field
The invention relates to the technical field of modeling, in particular to a method and a device for establishing a CAD prediction model and electronic equipment.
Background
Coronary artery disease (CAD), also called coronary heart disease, is a cardiovascular disease with extremely high morbidity and mortality worldwide.
Blood is supplied to the heart through two major coronary arteries and a network of blood vessels on the surface of the heart muscle. When cholesterol and fatty deposits form in the arteries, the passage narrows, a condition known as atherosclerosis. Blood flowing in the artery can form a thrombus that blocks the artery. Under physiological or psychological stress, the heart beats faster and requires more oxygen and nutrients, a demand that severely narrowed or occluded coronary arteries cannot meet. The result is coronary insufficiency, which leads to angina pectoris. A heart attack occurs when blood flow to the heart muscle is suddenly and sharply reduced because a thrombus occludes a coronary artery.
CAD prediction is the first step in guarding against CAD, but the accuracy of data obtained by CAD prediction using current methods is low.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for establishing a CAD prediction model, and an electronic device, so as to solve the technical problem in the prior art that the accuracy of data obtained by performing CAD prediction is low.
In a first aspect, an embodiment of the present invention provides a method for establishing a coronary artery disease (CAD) prediction model, including:
acquiring genotype data and physical condition data of a sample object;
calculating by using the genotype data to obtain a CAD polygene risk value;
and establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data.
With reference to the first aspect, the present invention provides a first possible implementation manner of the first aspect, where the calculating using the genotype data to obtain the CAD polygene risk value includes:
calculating by using CAD gene associated data in the genotype data to obtain the weight of CAD gene variation;
and performing a weighted calculation using the weights to obtain the CAD polygene risk value.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data includes:
inputting the CAD polygene risk value and the physical condition data into an initial model, and training the initial model through a machine learning algorithm to obtain the CAD prediction model.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the training the initial model through a machine learning algorithm based on the CAD polygene risk value and the physical condition data input into the initial model to obtain a CAD prediction model includes:
taking target data of sample objects with disease labels as a training set, and inputting the training set into an initial model, wherein the target data comprises: the CAD polygene risk values and the physical condition data;
training the initial model through a machine learning algorithm based on the training set to obtain a plurality of models to be selected;
and selecting the CAD prediction model from the plurality of models to be selected by using a test set, wherein the test set is target data of sample objects with a non-diseased label.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the machine learning algorithm includes: at least one of a random forest algorithm, a support vector machine algorithm, and a decision tree algorithm.
In a second aspect, an embodiment of the present invention further provides a CAD prediction model creating apparatus, including:
an acquisition unit for acquiring genotype data and physical condition data of a sample subject;
the calculation unit is used for calculating by utilizing the genotype data to obtain a CAD polygene risk value;
and the establishing unit is used for establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the computing unit includes:
the calculation module is used for calculating by using CAD gene associated data in the genotype data to obtain the weight of CAD gene variation;
and the weighting module is used for carrying out weighting calculation on the weights to obtain a CAD polygene risk value.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the establishing unit includes:
an input module, configured to use target data of a sample object with a diseased label as a training set, and input the training set into an initial model, where the target data includes: the CAD polygene risk values and the physical condition data;
the training module is used for training the initial model through a machine learning algorithm based on the training set to obtain a plurality of models to be selected;
and the selecting module is used for selecting the CAD prediction model from the plurality of models to be selected by using a test set, wherein the test set is target data of sample objects with a non-diseased label.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program operable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, the present invention also provides a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to execute the method according to the first aspect.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects. Genotype data and physical condition data of a sample object are first acquired; a CAD polygene risk value is then calculated using the genotype data; and a CAD prediction model is then established based on the CAD polygene risk value and the physical condition data. Because the model combines genotype data with physical condition data, it draws on the individual's genetic data at the gene level and on the individual's physical condition data at the physical level, effectively integrating both kinds of information. A CAD prediction model established by this method therefore makes prediction results more comprehensive, accurate and effective, which solves the technical problem in the prior art that the accuracy of data obtained by CAD prediction is low.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart illustrating a CAD predictive model building method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a CAD predictive model building method according to a second embodiment of the present invention;
FIG. 3 is a diagram showing a summary statistic format of genome-wide association analysis provided by the second embodiment of the invention;
FIG. 4 is a flow chart illustrating a CAD predictive model application method according to a second embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a CAD predictive model building apparatus according to a third embodiment of the present invention;
FIG. 6 shows a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention.
Reference numerals: 3-CAD prediction model establishing apparatus; 31-acquisition unit; 32-calculation unit; 33-establishing unit; 4-electronic device; 41-memory; 42-processor; 43-bus; 44-communication interface.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, the accuracy of data obtained by CAD prediction is low. The method, the apparatus, and the electronic device for establishing a CAD prediction model provided in the embodiments of the present invention can solve this technical problem.
To facilitate understanding of the embodiments, the CAD prediction model establishing method, apparatus and electronic device disclosed in the embodiments of the present invention are first described in detail.
Example one:
The method for establishing a coronary artery disease (CAD) prediction model provided by this embodiment of the invention, as shown in FIG. 1, comprises the following steps:
S11: Acquire genotype data and physical condition data of the sample object.
Preferably, the physical condition data of the sample object may include: age, sex, body mass index, blood pressure, smoking status, diabetes status, high-density lipoprotein, total cholesterol, etc.
S12: Calculate a CAD polygene risk value using the genotype data.
In this step, the genotype data is used to calculate a polygenic risk score (i.e., the CAD polygene risk value). A polygenic risk score is a score that measures an individual's risk of a given disease using the individual's gene sequencing data. Currently available calculation methods include LDpred, P+T, etc.
S13: Establish a CAD prediction model based on the CAD polygene risk value and the physical condition data.
Specifically, the initial model is trained through a machine learning algorithm based on CAD polygene risk values and physical condition data input into the initial model, and a CAD prediction model is obtained. Therefore, this step can be understood as a training process of the model to realize the construction of the model through machine learning.
In this embodiment, the risk factors of CAD and the variables related to physical condition are combined, and the prediction model is built by machine learning: at the gene level, the model is constructed using the individual's gene sequencing data; at the physical level, it is constructed using the individual's physical condition data.
The method for establishing the CAD prediction model provided by this embodiment can therefore serve as a method for establishing a coronary heart disease risk prediction model based on machine learning and polygenic risk scoring. It uses a machine learning method together with individual electronic medical record data and genetic variation data to build an accurate prediction model of individual CAD risk.
In the prior art, the risk of CAD is predicted from a single angle: only genetic information, or only personal physical state information such as age, blood pressure and sex, is considered.
In this embodiment, information at the individual gene level and information about the individual's physical state are effectively integrated, both to construct the model and to predict CAD. Because both kinds of information are considered during prediction, a CAD prediction model established by the method of this embodiment gives more accurate prediction results than CAD prediction from a single angle, making CAD prediction more comprehensive, accurate and effective.
Example two:
The method for establishing the CAD prediction model provided by this embodiment of the invention, as shown in FIG. 2, comprises the following steps:
S21: Acquire genotype data and physical condition data of the sample object.
Further, the physical condition data of the sample subject is a risk variable related to CAD, for example, age, sex, body mass index, blood pressure, whether or not to smoke, whether or not to suffer from diabetes, high density lipoprotein, total cholesterol, and the like.
S22: Calculate the weight of each CAD gene variation using the CAD gene associated data in the genotype data.
The polygenic risk score can be calculated from the CAD gene associated data in the genotype data using the LDpred method or other calculation methods, for example P+T. This embodiment is described taking the LDpred method as an example.
In this step, the weight of each genetic variation is calculated using the coronary heart disease genome-wide association analysis summary statistics of UK Biobank. The core of calculating the polygenic risk score is to estimate the weight of each genetic variation with respect to the disease, and this process generally takes the summary statistics of a genome-wide association study (GWAS) as input.
First, the CAD genome-wide association analysis results are standardized into txt files in the form shown in FIG. 3, which is the format of the standardized genome-wide association analysis summary statistics. Genome-wide association analysis results come in several different formats, and the standardized format in this example facilitates the use of LDpred to calculate polygenic risk scores.
In FIG. 3, chr represents the chromosome on which the variant is located; pos represents the position on the chromosome; ref and alt represent the reference allele and the alternative allele; reffrq represents the frequency of the reference allele; info is not used and can be set to 1; rs represents the ID of the SNP (single nucleotide polymorphism); pval represents the p value; and effalt represents the effect size.
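The column layout described for FIG. 3 can be illustrated with a small parser. This is a minimal sketch under stated assumptions, not part of LDpred: the `Variant` class, the `parse_summary_stats` function, whitespace-separated columns with a header row, and the example rows (including the placeholder SNP IDs) are all hypothetical and serve only to show the nine fields.

```python
from dataclasses import dataclass

# Column order as described for FIG. 3; a header row is assumed here.
COLUMNS = ["chr", "pos", "ref", "alt", "reffrq", "info", "rs", "pval", "effalt"]


@dataclass
class Variant:
    chrom: str       # chromosome on which the variant is located
    pos: int         # position on the chromosome
    ref: str         # reference allele
    alt: str         # alternative allele
    ref_freq: float  # frequency of the reference allele
    info: float      # unused; may be set to 1
    rsid: str        # SNP identifier
    pval: float      # association p value
    effect: float    # effect size


def parse_summary_stats(text):
    """Parse whitespace-separated summary statistics into Variant records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    variants = []
    for ln in lines[1:]:
        f = ln.split()
        if len(f) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(f)}: {ln!r}")
        variants.append(Variant(f[0], int(f[1]), f[2], f[3], float(f[4]),
                                float(f[5]), f[6], float(f[7]), float(f[8])))
    return variants


# Made-up rows for illustration only; these are not real association results.
EXAMPLE = """\
chr pos ref alt reffrq info rs pval effalt
chr1 1020428 C T 0.85 1 rsEXAMPLE1 0.0037 0.0162
chr9 22098574 A G 0.52 1 rsEXAMPLE2 1.2e-08 -0.21
"""

variants = parse_summary_stats(EXAMPLE)
```

Validating the header and field count up front makes a file in one of the other GWAS result formats fail early rather than produce silently misaligned columns.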
Next, the standardized summary statistics are integrated with an LD reference file using the coord method in the LDpred software to obtain an integrated HDF5 file. The integrated file is then used as input to the LDpred method in the LDpred software, which calculates a weight value for each SNP (single nucleotide polymorphism).
S23: Perform a weighted calculation using the weights to obtain the CAD polygene risk value.
The genotype data of each individual requiring risk prediction is weighted according to the weight values obtained in step S22 to obtain the polygenic risk score. Specifically, the valid method in the LDpred software takes the output file of calculated weight values and the individual genotype PLINK file as input, and calculates each individual's final polygenic risk score for coronary heart disease, namely the CAD polygene risk value.
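The weighted calculation in step S23 amounts to a weighted sum over variants. The following is a minimal sketch of that sum, not the LDpred implementation: it assumes each genotype is expressed as a dosage (0, 1 or 2 copies of the effect allele) and that variants missing from an individual's genotype are skipped. The function name, the variant IDs and the numeric values are hypothetical.

```python
def polygenic_risk_score(weights, dosages):
    """Return sum(weight[v] * dosage[v]) over variants present in both inputs.

    weights: dict mapping variant ID -> per-variant weight (from step S22)
    dosages: dict mapping variant ID -> allele count (0, 1 or 2) for one person
    """
    score = 0.0
    for variant_id, weight in weights.items():
        if variant_id in dosages:  # assumption: ungenotyped variants are skipped
            score += weight * dosages[variant_id]
    return score


# Made-up weights and genotype for illustration only.
weights = {"rsA": 0.5, "rsB": -0.25, "rsC": 0.125}
dosages = {"rsA": 2, "rsB": 1}  # rsC was not genotyped for this person

score = polygenic_risk_score(weights, dosages)
# 0.5 * 2 + (-0.25) * 1 = 0.75
```

Skipping missing variants is only one possible policy; imputing a dosage from the allele frequency is another common choice and would change the score.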
S24: Take target data of sample objects with a diseased label as a training set, and input the training set into an initial model, wherein the target data comprises: the CAD polygene risk values and the physical condition data.
Specifically, the calculated polygenic risk score (i.e., the CAD polygene risk value) and the personal physical state variables (i.e., the physical condition data) are put into the initial model together for feature selection, the feature selection result is used as the training set, and the training set is then input into the initial model.
S25: Train the initial model through a machine learning algorithm based on the training set to obtain a plurality of models to be selected.
The machine learning algorithm includes at least one of a random forest algorithm, a support vector machine algorithm, and a decision tree algorithm. In this step, models are established and trained using several machine learning algorithms, such as random forests, support vector machines and decision trees.
Preferably, the risk variables associated with coronary heart disease (i.e., the physical condition data) are combined with the polygenic risk score (i.e., the CAD polygene risk value) to create risk prediction models, thereby obtaining the plurality of models to be selected.
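To make the training step concrete, the sketch below fits the simplest possible decision tree, a depth-one tree (a decision stump), on feature rows that combine a polygenic risk score with one physical condition variable. It is an illustration under stated assumptions, far simpler than the random forest, support vector machine or full decision tree algorithms named above; the function names, the feature choice (score and age) and all data values are hypothetical.

```python
def predict_stump(stump, row):
    """Predict 1 (diseased) or 0 (non-diseased) from a single threshold test."""
    feature, threshold, direction = stump
    return 1 if direction * (row[feature] - threshold) > 0 else 0


def train_stump(rows, labels):
    """Fit a depth-one decision tree by exhaustive search.

    Tries every feature, every midpoint between consecutive distinct values,
    and both directions, keeping the first split with the fewest
    training errors.
    """
    best = None  # (errors, feature, threshold, direction)
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for threshold in thresholds:
            for direction in (1, -1):
                stump = (feature, threshold, direction)
                errors = sum(predict_stump(stump, r) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, feature, threshold, direction)
    if best is None:
        raise ValueError("no feature has two distinct values")
    return best[1:]


# Made-up training rows: (polygenic risk score, age); label 1 = diseased.
rows = [(-1.0, 40), (-0.5, 62), (0.0, 45), (0.5, 58), (1.0, 50), (1.5, 66)]
labels = [0, 0, 0, 1, 1, 1]

stump = train_stump(rows, labels)  # splits on feature 0 (the risk score)
predictions = [predict_stump(stump, r) for r in rows]
```

In practice, several such models trained with different algorithms or settings would form the plurality of models to be selected.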
It should be noted that machine learning is an interdisciplinary field that studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is often used for binary classification tasks, in which a model is built to predict binary labels.
S26: Select the CAD prediction model from the plurality of models to be selected using a test set, wherein the test set is target data of sample objects with a non-diseased label.
The model with the best performance among the models to be selected obtained in step S25 is selected as the final model, where the performance of a binary classification model is represented by the area under the curve (AUC). AUC is one of the most common machine learning evaluation metrics. Its definition is geometric, but in statistics and machine learning it is often used to evaluate the performance of a binary classification model; the curve here generally refers to the receiver operating characteristic (ROC) curve.
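The selection by AUC can be sketched as follows. The sketch uses the pairwise interpretation of the area under the ROC curve: the fraction of (diseased, non-diseased) sample pairs in which the diseased sample receives the higher score, with ties counted as one half. The function names, the candidate model names and the scores are hypothetical, chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def select_best(candidate_scores, labels):
    """Return (name, auc) of the candidate with the highest AUC."""
    aucs = {name: roc_auc(labels, scores)
            for name, scores in candidate_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Made-up evaluation labels and per-model predicted risks.
labels = [0, 0, 1, 1]
candidate_scores = {
    "model_a": [0.1, 0.4, 0.35, 0.8],  # one diseased sample ranked below a healthy one
    "model_b": [0.2, 0.3, 0.6, 0.9],   # every diseased sample ranked above every healthy one
}

best_name, best_auc = select_best(candidate_scores, labels)
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based computation gives the same value more efficiently on large evaluation sets.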
As another implementation of this embodiment, the data of the sample population with diseased and non-diseased labels is divided into a training set and a test set at a ratio of 70% to 30%, i.e., the sample population with the diseased label accounts for 70% and is used as the training set, and the sample population with the non-diseased label accounts for 30% and is used as the test set. Models are then trained on the training set using several machine learning methods, such as random forests, support vector machines and decision trees, and the model with the best AUC on the test set is selected for actual prediction, that is, as the CAD prediction model.
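A 70%/30% division can be sketched as below. This is a minimal illustration under an assumption that goes beyond the text: it performs a stratified random split, so that both labels appear in each subset, which computing AUC on the test set requires. The function name, the fixed seed and the sample labels are hypothetical.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test, preserving the label ratio.

    Returns two sorted lists of indices into `labels`.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Made-up labels: 10 diseased (1) and 10 non-diseased (0) samples.
labels = [1] * 10 + [0] * 10

train_idx, test_idx = stratified_split(labels)
```

With 10 samples per class, each class contributes 7 samples to the training set and 3 to the test set.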
After the CAD prediction model is built, when new individual genotype data and physical condition data are obtained, the multi-gene risk score can be calculated, and then the data are input into the CAD prediction model for prediction, namely the application process of the CAD prediction model.
FIG. 4 shows the application method of the CAD prediction model. Preferably, the prediction model used in this CAD prediction method is the CAD prediction model established by the CAD prediction model establishing method provided in this embodiment.
Specifically, the application method of the CAD prediction model may include: first, genotype data and physical condition data of a subject to be predicted are acquired. And then, calculating by using the genotype data to obtain a CAD polygene risk value of the object to be predicted. And finally, inputting the physical condition data and the CAD polygene risk value into a CAD prediction model to obtain a CAD prediction result of the object to be predicted.
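The three application steps can be sketched as a small pipeline. Everything here is an assumption made for illustration: the feature names and their order, the `build_features` and `predict_cad` functions, and in particular `placeholder_model`, which merely stands in for a trained CAD prediction model and has no clinical meaning.

```python
import math

# Hypothetical fixed feature order; a trained model must see the same order
# at prediction time as it saw during training.
FEATURE_ORDER = ["prs", "age", "sex", "bmi", "systolic_bp",
                 "smoker", "diabetes", "hdl", "total_cholesterol"]


def build_features(prs, physical):
    """Combine the polygenic risk value with physical condition data."""
    record = {"prs": prs, **physical}
    missing = [name for name in FEATURE_ORDER if name not in record]
    if missing:
        raise KeyError(f"missing physical condition fields: {missing}")
    return [float(record[name]) for name in FEATURE_ORDER]


def predict_cad(model, prs, physical):
    """Feed the assembled feature vector to a model returning a risk in [0, 1]."""
    return model(build_features(prs, physical))


def placeholder_model(features):
    """Stand-in for a trained model: a logistic function of the risk score only."""
    return 1.0 / (1.0 + math.exp(-features[0]))


# Made-up subject data for illustration only.
physical = {"age": 55, "sex": 1, "bmi": 27.5, "systolic_bp": 135,
            "smoker": 0, "diabetes": 0, "hdl": 1.2, "total_cholesterol": 5.4}

features = build_features(0.0, physical)
risk = predict_cad(placeholder_model, 0.0, physical)
```

Raising an error on missing fields, rather than filling in defaults, keeps an incomplete physical examination record from silently producing a prediction.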
In practical applications, the application of the CAD prediction model can be understood as follows: the data obtained from an individual's physical examination and the individual's polygenic risk score are input into the model, so that the probability of the individual developing CAD is accurately predicted.
Example three:
As shown in FIG. 5, the CAD prediction model establishing apparatus 3 according to this embodiment of the present invention includes: an acquisition unit 31, a calculation unit 32 and an establishing unit 33.
The acquisition unit is used for acquiring genotype data and physical condition data of the sample object. And the calculation unit is used for calculating by using the genotype data to obtain the CAD polygene risk value. The establishing unit is used for establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data.
Further, the calculation unit includes: a calculation module and a weighting module. The calculation module is used for calculating by using CAD gene related data in the genotype data to obtain the weight of CAD gene variation. And the weighting module is used for carrying out weighting calculation on the weights to obtain a CAD polygene risk value.
Further, the establishing unit includes: the device comprises an input module, a training module and a selection module. The input module is used for taking target data of the sample object with the diseased label as a training set and inputting the training set into the initial model, wherein the target data comprises: CAD polygenic risk values and physical condition data.
The training module is used for training the initial model through a machine learning algorithm based on the training set to obtain a plurality of models to be selected. The selection module is used for selecting the CAD prediction model from the plurality of models to be selected using a test set, wherein the test set is target data of sample objects with a non-diseased label.
The CAD prediction model establishment apparatus provided in the embodiment of the present invention has the same technical features as the CAD prediction model establishment method provided in the above embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
Example four:
As shown in FIG. 6, the electronic device 4 includes a memory 41 and a processor 42, where the memory stores a computer program that can run on the processor, and the processor implements the steps of the method provided in the first embodiment or the second embodiment when executing the computer program.
Referring to fig. 6, the electronic device further includes: a bus 43 and a communication interface 44, the processor 42, the communication interface 44 and the memory 41 being connected by the bus 43; the processor 42 is for executing executable modules, such as computer programs, stored in the memory 41.
The memory 41 may include a high-speed random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 44 (which may be wired or wireless), and the internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.
The bus 43 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The memory 41 is used for storing a program, and the processor 42 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 42.
In a preferred implementation of this embodiment, the processor 42 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 42 or by instructions in the form of software. The processor 42 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 41, and the processor 42 reads the information in the memory 41 and performs the steps of the method in combination with its hardware.
Example five:
The computer-readable medium provided by this embodiment of the invention has non-volatile program code executable by a processor, and the program code causes the processor to execute the method provided by the first embodiment or the second embodiment.
Unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In all examples shown and described herein, any particular value should be construed as merely exemplary and not as a limitation; other examples of the exemplary embodiments may therefore have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer-readable medium having the processor-executable nonvolatile program code according to the embodiments of the present invention has the same technical features as the CAD prediction model building method, the CAD prediction model building apparatus, and the electronic device according to the embodiments, so that the same technical problems can be solved, and the same technical effects can be achieved.
The computer program product for performing the CAD prediction model building method provided in the embodiment of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor. The instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific implementations of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method for establishing a CAD prediction model of coronary heart disease is characterized by comprising the following steps:
acquiring genotype data and physical condition data of a sample object;
calculating by using the genotype data to obtain a CAD polygene risk value;
and establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data.
2. The method for building a CAD prediction model according to claim 1, wherein the calculating using the genotype data to obtain a CAD polygene risk value comprises:
calculating by using CAD gene associated data in the genotype data to obtain the weight of CAD gene variation;
and performing weighting calculation on the weights to obtain a CAD polygene risk value.
3. The method for building a CAD prediction model according to claim 1, wherein the establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data comprises:
and training the initial model through a machine learning algorithm based on the CAD polygene risk value and the physical condition data input into the initial model to obtain a CAD prediction model.
4. The method for building a CAD prediction model according to claim 3, wherein the training of the initial model by a machine learning algorithm based on the CAD polygene risk values and the physical condition data input into the initial model to obtain the CAD prediction model comprises:
taking target data of sample objects with disease labels as a training set, and inputting the training set into an initial model, wherein the target data comprises: the CAD polygene risk values and the physical condition data;
training the initial model through a machine learning algorithm based on the training set to obtain a plurality of models to be selected;
and selecting from the plurality of models to be selected by using a test set to obtain a CAD prediction model, wherein the test set is target data of sample objects labeled as not diseased.
5. The CAD predictive model building method according to claim 3 or 4, wherein said machine learning algorithm includes: at least one of a random forest algorithm, a support vector machine algorithm, and a decision tree algorithm.
6. A CAD prediction model creation apparatus, comprising:
an acquisition unit for acquiring genotype data and physical condition data of a sample subject;
a calculation unit for calculating a CAD polygene risk value by using the genotype data;
and an establishing unit for establishing a CAD prediction model based on the CAD polygene risk value and the physical condition data.
7. The CAD predictive model building apparatus according to claim 6, wherein said calculation unit includes:
a calculation module for calculating the weights of CAD gene variations by using CAD gene associated data in the genotype data;
and a weighting module for performing a weighting calculation on the weights to obtain a CAD polygene risk value.
8. The CAD prediction model creation apparatus according to claim 6, wherein the creation unit includes:
an input module for taking target data of sample objects with disease labels as a training set and inputting the training set into an initial model, wherein the target data comprises: the CAD polygene risk values and the physical condition data;
a training module for training the initial model through a machine learning algorithm based on the training set to obtain a plurality of models to be selected;
and a selecting module for selecting from the plurality of models to be selected by using a test set to obtain a CAD prediction model, wherein the test set is target data of sample objects labeled as not diseased.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 5 when executing the computer program.
10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 5.
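The weighting calculation recited in claims 2 and 7 can be illustrated with a short sketch. It is not the claimed implementation: it assumes that each CAD-associated variant already has a weight (for example an effect size estimated from association data) and that the polygene risk value is the weighted sum of risk-allele counts (0, 1, or 2) per variant. The function name, the variant identifiers, the weights, and the treatment of missing genotypes as 0 copies are all hypothetical.

```python
from typing import Dict


def polygene_risk_value(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele counts over all variants that have a weight."""
    total = 0.0
    for variant, weight in weights.items():
        # Assumption of this sketch: a missing genotype counts as 0 risk alleles.
        dosage = dosages.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid allele count {dosage} for {variant}")
        total += weight * dosage
    return total


# Hypothetical per-variant weights and one sample's risk-allele counts.
weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": -0.125}
sample = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygene_risk_value(sample, weights)
print(score)
```

The resulting value would then be combined with the physical condition data as one input feature of the prediction model.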
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811588441.2A CN111354464A (en) | 2018-12-24 | 2018-12-24 | CAD prediction model establishing method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111354464A (en) | 2020-06-30 |
Family
ID=71195553
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811588441.2A Pending CN111354464A (en) | 2018-12-24 | 2018-12-24 | CAD prediction model establishing method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111354464A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112017784A (en) * | 2020-10-22 | 2020-12-01 | 平安科技(深圳)有限公司 | Coronary heart disease risk prediction method based on multi-modal data and related equipment |
CN112768074A (en) * | 2021-01-19 | 2021-05-07 | 大禹(上海)医疗健康科技有限公司 | Artificial intelligence-based serious disease risk prediction method and system |
CN113066531A (en) * | 2021-04-13 | 2021-07-02 | 平安国际智慧城市科技股份有限公司 | Risk prediction method and device, computer equipment and storage medium |
CN113593630A (en) * | 2021-08-23 | 2021-11-02 | 北京果壳生物科技有限公司 | Family coronary heart disease risk assessment and risk factor identification system |
CN113611412A (en) * | 2020-09-03 | 2021-11-05 | 北京大学 | Method, device and system for predicting coronary heart disease risk caused by T2DM |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102803951A (en) * | 2009-06-15 | 2012-11-28 | 心脏Dx公司 | Determination of coronary artery disease risk |
CN103493054A (en) * | 2010-10-12 | 2014-01-01 | 美国西门子医疗解决公司 | Healthcare information technology system for predicting development of cardiovascular conditions |
US20140342355A1 (en) * | 2011-08-05 | 2014-11-20 | Gendiag.Exe, S.L. | Cardiovascular disease |
CN105002286A (en) * | 2015-07-30 | 2015-10-28 | 中国医学科学院阜外心血管病医院 | Multiple single nucleotide polymorphic loca related to onset risks of hypertension and/or cardiovascular disease and associated application |
US20150356243A1 (en) * | 2013-01-11 | 2015-12-10 | Oslo Universitetssykehus Hf | Systems and methods for identifying polymorphisms |
US20160215341A1 (en) * | 2013-08-30 | 2016-07-28 | Gendiag.Exe, S.L. | Risk markers for cardiovascular disease in patients with chronic kidney disease |
CN106609300A (en) * | 2015-10-23 | 2017-05-03 | 北京乐普基因科技股份有限公司 | Coronary artery disease risk assessment kit and risk assessment method |
CN108172296A (en) * | 2018-01-23 | 2018-06-15 | 上海其明信息技术有限公司 | A kind of method for building up of database and the Risk Forecast Method of genetic disease |
CN109065171A (en) * | 2018-11-05 | 2018-12-21 | 苏州贝斯派生物科技有限公司 | The construction method and system of Kawasaki disease risk evaluation model based on integrated study |
Non-Patent Citations (2)
Title |
---|
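Pei Jingjing et al.: "Research on a risk assessment model for cardiovascular and cerebrovascular diseases based on high-throughput sequencing technology", Journal of Yunnan Minzu University (Natural Sciences Edition), no. 03, 21 May 2018 (2018-05-21), pages 81 - 86 * |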
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113611412A (en) * | 2020-09-03 | 2021-11-05 | 北京大学 | Method, device and system for predicting coronary heart disease risk caused by T2DM |
CN112017784A (en) * | 2020-10-22 | 2020-12-01 | 平安科技(深圳)有限公司 | Coronary heart disease risk prediction method based on multi-modal data and related equipment |
CN112017784B (en) * | 2020-10-22 | 2021-02-09 | 平安科技(深圳)有限公司 | Coronary heart disease risk prediction method based on multi-modal data and related equipment |
CN112768074A (en) * | 2021-01-19 | 2021-05-07 | 大禹(上海)医疗健康科技有限公司 | Artificial intelligence-based serious disease risk prediction method and system |
CN113066531A (en) * | 2021-04-13 | 2021-07-02 | 平安国际智慧城市科技股份有限公司 | Risk prediction method and device, computer equipment and storage medium |
CN113593630A (en) * | 2021-08-23 | 2021-11-02 | 北京果壳生物科技有限公司 | Family coronary heart disease risk assessment and risk factor identification system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN111354464A (en) | CAD prediction model establishing method and device and electronic equipment | |
Azodi et al. | Opening the black box: interpretable machine learning for geneticists | |
Wang et al. | Deep learning for plant genomics and crop improvement | |
US9646265B2 (en) | Model updating method, model updating device, and recording medium | |
Marjoram et al. | Post-GWAS: where next? More samples, more SNPs or more biology? | |
Kirk et al. | Model selection in systems and synthetic biology | |
Binder et al. | Big data in medical science—a biostatistical view: Part 21 of a series on evaluation of scientific publications | |
US20160249863A1 (en) | Health condition determination method and health condition determination system | |
CN112074915B (en) | Visualization of biomedical predictions | |
Arenas et al. | Protein evolution along phylogenetic histories under structurally constrained substitution models | |
JP2007034700A (en) | Prediction program and prediction device | |
CN111178537A (en) | Feature extraction model training method and device | |
JP6851460B2 (en) | Optimal solution determination method, optimal solution determination program, non-temporary recording medium and optimal solution determination device | |
Hoang et al. | Splice sites detection using chaos game representation and neural network | |
Aadland et al. | High-throughput reconstruction of ancestral protein sequence, structure, and molecular function | |
JPWO2008111349A1 (en) | Survival analysis system, survival analysis method, and survival analysis program | |
JP2009237923A (en) | Learning method and system | |
JP6840627B2 (en) | Hyperparameter evaluation method, computer and program | |
WO2023148733A1 (en) | Method and system for predicting drug-drug interactions | |
EP2560108A1 (en) | Logical operation system | |
KR101864986B1 (en) | Disease susceptibility and causal element prediction method based on genome information and apparatus therefor | |
Kamal et al. | An integrated algorithm for local sequence alignment | |
JP6422512B2 (en) | Computer system and graphical model management method | |
EP3985580A1 (en) | Information processing device, information processing method, and program | |
CN110738318B (en) | Network structure operation time evaluation and evaluation model generation method, system and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |