CN111354464A - CAD prediction model establishing method and device and electronic equipment - Google Patents


Info

Publication number
CN111354464A
Authority
CN
China
Prior art keywords
cad
prediction model
data
polygene
physical condition
Legal status
Pending
Application number
CN201811588441.2A
Other languages
Chinese (zh)
Inventor
王伟任
罗依雯
蒋佳新
杨超
朱木春
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201811588441.2A
Publication of CN111354464A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Abstract

The invention provides a method and a device for establishing a CAD prediction model, and an electronic device, and relates to the technical field of modeling. The method includes: acquiring genotype data and physical condition data of a sample subject; calculating a CAD polygenic risk value from the genotype data; and establishing a CAD prediction model based on the CAD polygenic risk value and the physical condition data. This addresses the technical problem in the prior art that current CAD prediction has low accuracy.

Description

CAD prediction model establishing method and device and electronic equipment
Technical Field
The invention relates to the technical field of modeling, and in particular to a method and a device for establishing a CAD prediction model, and to an electronic device.
Background
Coronary artery disease (CAD), also called coronary heart disease, is a cardiovascular disease with very high morbidity and mortality worldwide.
Blood is supplied to the heart through two major coronary arteries and distributed through a network of blood vessels on the surface of the heart muscle. When cholesterol and fat deposits form in the arteries and narrow the passage, the condition is known as atherosclerosis. Blood flowing through a narrowed artery can form a thrombus that blocks the artery. Under physiological or psychological stress the heart beats faster and needs more oxygen and nutrients, a demand that severely narrowed or occluded coronary arteries cannot meet. The result is coronary insufficiency, which leads to angina pectoris. A heart attack (myocardial infarction) occurs when blood flow to the heart muscle is suddenly and sharply reduced because a thrombus occludes a coronary artery.
CAD prediction is the first step in preventing CAD, but predictions made with current prior-art methods have low accuracy.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for establishing a CAD prediction model, and an electronic device, to solve the technical problem in the prior art that CAD prediction has low accuracy.
In a first aspect, an embodiment of the present invention provides a method for establishing a coronary heart disease (CAD) prediction model, including:
acquiring genotype data and physical condition data of a sample subject;
calculating a CAD polygenic risk value from the genotype data; and
establishing a CAD prediction model based on the CAD polygenic risk value and the physical condition data.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which calculating the CAD polygenic risk value from the genotype data includes:
calculating weights of CAD genetic variants from the CAD gene-associated data in the genotype data; and
performing a weighted calculation using these weights to obtain the CAD polygenic risk value.
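Under the usual additive assumption, which the text does not spell out, this weighted calculation can be written as follows. Here $g_{ij}\in\{0,1,2\}$ is the number of copies of the effect allele of variant $j$ carried by individual $i$, $w_j$ is the weight estimated for variant $j$, and $M$ is the number of variants included; these symbols are introduced for illustration only.

```latex
\mathrm{PRS}_i \;=\; \sum_{j=1}^{M} w_j \, g_{ij}
```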
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which establishing the CAD prediction model based on the CAD polygenic risk value and the physical condition data includes:
inputting the CAD polygenic risk value and the physical condition data into an initial model, and training the initial model with a machine learning algorithm to obtain the CAD prediction model.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which inputting the CAD polygenic risk value and the physical condition data into the initial model and training it with a machine learning algorithm to obtain the CAD prediction model includes:
taking target data of sample subjects with disease labels as a training set and inputting the training set into the initial model, where the target data include the CAD polygenic risk values and the physical condition data;
training the initial model on the training set with a machine learning algorithm to obtain a plurality of candidate models; and
selecting the CAD prediction model from the candidate models using a test set, where the test set is target data of sample subjects labeled as not having the disease.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which the machine learning algorithm includes at least one of a random forest algorithm, a support vector machine algorithm, and a decision tree algorithm.
In a second aspect, an embodiment of the present invention further provides an apparatus for establishing a CAD prediction model, including:
an acquisition unit, configured to acquire genotype data and physical condition data of a sample subject;
a calculation unit, configured to calculate a CAD polygenic risk value from the genotype data; and
an establishing unit, configured to establish a CAD prediction model based on the CAD polygenic risk value and the physical condition data.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the calculation unit includes:
a calculation module, configured to calculate weights of CAD genetic variants from the CAD gene-associated data in the genotype data; and
a weighting module, configured to perform a weighted calculation using the weights to obtain the CAD polygenic risk value.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the establishing unit includes:
an input module, configured to use target data of sample subjects with disease labels as a training set and input the training set into an initial model, where the target data include the CAD polygenic risk values and the physical condition data;
a training module, configured to train the initial model on the training set with a machine learning algorithm to obtain a plurality of candidate models; and
a selection module, configured to select the CAD prediction model from the candidate models using a test set, where the test set is target data of sample subjects labeled as not having the disease.
In a third aspect, an embodiment of the present invention further provides an electronic device including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable medium storing non-volatile program code executable by a processor, where the program code causes the processor to execute the method according to the first aspect.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects. In the method, apparatus and electronic device for establishing a CAD prediction model, genotype data and physical condition data of a sample subject are first acquired; a CAD polygenic risk value is then calculated from the genotype data; and a CAD prediction model is established based on the CAD polygenic risk value and the physical condition data. Because the model combines genotype data with physical condition data, it draws on the individual's genetic data at the gene level as well as on the individual's physical condition data. By effectively integrating these two levels of information, a CAD prediction model established with this method gives more comprehensive, accurate and effective predictions, which addresses the low accuracy of CAD prediction in the prior art.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flowchart of a CAD prediction model establishing method according to the first embodiment of the present invention;
FIG. 2 is a flowchart of a CAD prediction model establishing method according to the second embodiment of the present invention;
FIG. 3 shows the standardized format of the genome-wide association summary statistics used in the second embodiment of the present invention;
FIG. 4 is a flowchart of a method of applying the CAD prediction model according to the second embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a CAD prediction model establishing apparatus according to the third embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to the fourth embodiment of the present invention.
Reference numerals: 3: CAD prediction model establishing apparatus; 31: acquisition unit; 32: calculation unit; 33: establishing unit; 4: electronic device; 41: memory; 42: processor; 43: bus; 44: communication interface.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At present, CAD prediction has low accuracy. The method, apparatus and electronic device for establishing a CAD prediction model provided in the embodiments of the present invention address this technical problem.
To facilitate understanding of the embodiments, the method, apparatus and electronic device for establishing a CAD prediction model disclosed in the embodiments of the present invention are first described in detail.
Example one:
As shown in FIG. 1, the method for establishing a coronary heart disease (CAD) prediction model provided by this embodiment of the present invention includes the following steps:
S11: acquire genotype data and physical condition data of the sample subject.
Preferably, the physical condition data of the sample subject may include age, sex, body mass index, blood pressure, smoking status, diabetes status, high-density lipoprotein, total cholesterol, and the like.
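For illustration only, the physical condition variables listed above could be held in a small record like the Python sketch below. The class name, field names, units and the 0/1 coding of sex are assumptions. The text names the variables but not their encoding, and it lists blood pressure as a single item, which the sketch takes to mean systolic pressure.

```python
from dataclasses import astuple, dataclass


@dataclass
class PhysicalCondition:
    """Hypothetical container for the physical condition data of one subject."""
    age: float                # years
    sex: int                  # assumed coding: 1 = male, 0 = female
    bmi: float                # body mass index, kg/m^2
    systolic_bp: float        # mmHg (assumption: "blood pressure" = systolic)
    smoker: bool
    diabetes: bool
    hdl_cholesterol: float    # assumed unit: mmol/L
    total_cholesterol: float  # assumed unit: mmol/L

    def to_vector(self) -> list[float]:
        """Flatten to a numeric feature vector; booleans become 1.0 or 0.0."""
        return [float(v) for v in astuple(self)]
```

Field order fixes the column order of the resulting feature vector, so the same order must be used at training time and at prediction time.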
S12: calculate a CAD polygenic risk value from the genotype data.
In this step, a polygenic risk score (PRS), i.e. the CAD polygenic risk value, is calculated from the genotype data. A polygenic risk score measures an individual's risk of a given disease from that individual's genetic sequencing data; currently available calculation methods include LDpred and P+T (pruning and thresholding).
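The P+T approach mentioned above can be sketched as follows, assuming GWAS summary statistics are already available. This is a simplified, hypothetical illustration. Real P+T clumps SNPs by linkage disequilibrium (r²) measured in a reference panel, whereas this sketch uses a fixed physical-distance window as a stand-in. The threshold and window values are placeholders, not values taken from the source.

```python
from typing import Dict, Iterable, NamedTuple


class SumStat(NamedTuple):
    rsid: str
    chrom: int
    pos: int      # base-pair position
    beta: float   # GWAS effect size of the effect allele
    pval: float


def prune_and_threshold(stats: Iterable[SumStat],
                        p_threshold: float = 5e-8,
                        window_bp: int = 250_000) -> Dict[str, float]:
    """Simplified P+T: return {rsid: weight} for the SNPs that survive.

    Thresholding keeps SNPs with p < p_threshold. Pruning then walks the
    survivors from most to least significant and drops any SNP lying within
    window_bp of an already-kept SNP on the same chromosome (a distance-based
    stand-in for LD r^2 clumping).
    """
    candidates = sorted((s for s in stats if s.pval < p_threshold),
                        key=lambda s: s.pval)
    kept = []
    for snp in candidates:
        if all(k.chrom != snp.chrom or abs(k.pos - snp.pos) > window_bp
               for k in kept):
            kept.append(snp)
    # Each retained SNP is weighted by its GWAS effect size.
    return {s.rsid: s.beta for s in kept}
```

With the placeholder defaults, the sketch prunes a SNP that lies 100 kb from a more significant SNP on the same chromosome and drops a SNP with p = 0.01 at the threshold step.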
S13: establish a CAD prediction model based on the CAD polygenic risk value and the physical condition data.
Specifically, the CAD polygenic risk value and the physical condition data are input into an initial model, which is trained with a machine learning algorithm to obtain the CAD prediction model. This step is therefore the training process through which the model is built by machine learning.
In this embodiment, CAD risk factors and physical condition variables are combined, and the prediction model is built by machine learning. At the gene level, the model is constructed from personal genetic sequencing data; at the physical level, it is constructed from the individual's physical condition data.
The method for establishing a CAD prediction model provided by this embodiment can therefore serve as a method for building a coronary heart disease risk prediction model based on machine learning and polygenic risk scores. By combining machine learning with individual electronic medical record data and genetic variation data, it builds an accurate model of individual CAD risk.
In the prior art, CAD risk is predicted from a single perspective, considering either genetic information alone or personal physical information such as age, blood pressure and sex alone.
In this embodiment, individual gene-level information and physical condition information are effectively integrated to construct the model, and both are used together to predict CAD. Compared with single-perspective prediction, a CAD prediction model established with the method of this embodiment therefore gives more accurate, comprehensive and effective predictions.
Example two:
As shown in FIG. 2, the method for establishing a CAD prediction model provided by this embodiment of the present invention includes the following steps:
S21: acquire genotype data and physical condition data of the sample subject.
Further, the physical condition data of the sample subject are CAD-related risk variables, for example age, sex, body mass index, blood pressure, smoking status, diabetes status, high-density lipoprotein, total cholesterol, and the like.
S22: calculate weights of CAD genetic variants from the CAD gene-associated data in the genotype data.
When the polygenic risk score is calculated from the CAD gene-associated data in the genotype data, other methods such as P+T may be used instead of LDpred. This embodiment is described using LDpred as the example.
In this step, the weight of each genetic variant is calculated from the UK Biobank genome-wide association summary statistics for coronary heart disease. The core of calculating a polygenic risk score is estimating the weight of each genetic variant's contribution to the disease, and this process generally takes genome-wide association study (GWAS) summary statistics as input.
First, the CAD genome-wide association results are normalized into txt files in the two forms of the standardized summary-statistics format shown in FIG. 3. GWAS results come in several different formats, and the standardized format used in this example makes it convenient to calculate polygenic risk scores with LDpred.
As shown in FIG. 3, chr denotes the chromosome on which the variant is located, pos the position on the chromosome, ref and alt the reference and alternative alleles, reffrq the frequency of the reference allele, info is unused and can be set to 1, rs the ID of the SNP (single nucleotide polymorphism), pval the p-value, and effalt the effect size.
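The sketch below shows a minimal reader, assuming the standardized file is whitespace-delimited and has a header row naming the columns in the order just listed. The function and class names are hypothetical, and the example row used to exercise it contains made-up values.

```python
from dataclasses import dataclass

# Column order as described for FIG. 3.
COLUMNS = ["chr", "pos", "ref", "alt", "reffrq", "info", "rs", "pval", "effalt"]


@dataclass
class SummaryRecord:
    chrom: str
    pos: int
    ref: str
    alt: str
    reffrq: float   # frequency of the reference allele
    info: float     # unused; set to 1
    rsid: str
    pval: float
    effalt: float   # effect size


def read_standard_sumstats(lines):
    """Parse whitespace-delimited summary statistics in the column order above."""
    it = iter(lines)
    header = next(it).split()
    if header != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    records = []
    for line in it:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, ref, alt, reffrq, info, rs, pval, effalt = line.split()
        records.append(SummaryRecord(chrom, int(pos), ref, alt, float(reffrq),
                                     float(info), rs, float(pval), float(effalt)))
    return records
```

Given a header line and one data row, the function returns a single record with typed fields (integer position, floating-point frequency, p-value and effect). A header that deviates from the expected column order raises an error rather than silently misassigning columns.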
Next, the coord step of the LDpred software integrates the standardized summary statistics with an LD reference file, producing an integrated HDF5 file. This file is then passed as input to the LDpred step of the software, which calculates a weight for each SNP (single nucleotide polymorphism).
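The text relies on the LDpred software for the per-SNP weights and does not spell out the computation. As a hedged illustration of an LD-aware weight, the sketch below computes the posterior-mean weights of the infinitesimal model (LDpred-inf) for a single LD block by solving (M/(N*h2) * I + D) w = beta_hat. Here beta_hat are marginal GWAS effects on standardized genotypes, D is the SNP correlation matrix from the LD reference, N is the GWAS sample size, h2 is the SNP heritability, and M is the number of SNPs genome-wide. This is neither the Gibbs-sampler variant that LDpred also provides nor the software's own code, and the function name is hypothetical.

```python
import numpy as np


def ldpred_inf_weights(beta_marginal, ld_matrix, n_gwas, h2, m_total):
    """Infinitesimal-model (LDpred-inf style) posterior-mean weights for one LD block.

    beta_marginal: marginal GWAS effects on standardized genotypes, shape (l,)
    ld_matrix:     SNP-by-SNP correlation matrix from an LD reference, shape (l, l)
    n_gwas:        GWAS sample size N
    h2:            SNP heritability of the trait
    m_total:       number of SNPs genome-wide M
    """
    beta_marginal = np.asarray(beta_marginal, dtype=float)
    ld_matrix = np.asarray(ld_matrix, dtype=float)
    shrink = m_total / (n_gwas * h2)
    # Solve (shrink * I + D) w = beta_marginal rather than forming an inverse.
    a = shrink * np.eye(len(beta_marginal)) + ld_matrix
    return np.linalg.solve(a, beta_marginal)
```

When the SNPs in a block are uncorrelated (D = I), each weight is simply the marginal effect shrunk by 1 / (1 + M/(N*h2)). Correlated SNPs share credit for the same signal. These weights are on the standardized-genotype scale, so applying them to raw 0/1/2 allele counts requires dividing each weight by that SNP's genotype standard deviation.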
S23: perform a weighted calculation using the weights to obtain the CAD polygenic risk value.
Using the weights obtained in step S22, a weighted calculation is performed on the genotype data of each individual whose risk is to be predicted, which yields the polygenic risk score. Specifically, the validate step of the LDpred software takes the weight output file and the individual's genotype PLINK files as input and calculates each person's final coronary heart disease polygenic risk score, i.e. the CAD polygenic risk value.
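Conceptually, the scoring step reduces to a weighted sum of allele counts for each individual. The sketch below assumes the PLINK genotypes have already been decoded into a dosage matrix whose columns are aligned with the weight vector, with the same SNP order and the same effect allele. Reading PLINK files, strand checks and allele flipping are outside its scope. Mean imputation of missing genotypes is a common convention rather than something the source specifies, and the function name is hypothetical.

```python
import numpy as np


def polygenic_risk_scores(dosages, weights):
    """Weighted allele-count sum per individual.

    dosages: (n_individuals, n_snps) effect-allele counts in {0, 1, 2}
             (or imputed dosages in [0, 2]); NaN marks a missing genotype
    weights: (n_snps,) per-SNP weights, column-aligned with dosages
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.shape[1] != weights.shape[0]:
        raise ValueError("number of SNP columns must match number of weights")
    # Mean-impute missing genotypes per SNP so that they contribute the
    # cohort-average dosage rather than zero.
    col_means = np.nanmean(dosages, axis=0)
    dosages = np.where(np.isnan(dosages), col_means, dosages)
    return dosages @ weights
```

For example, with weights [0.1, -0.2, 0.05], an individual carrying [0, 1, 2] effect alleles scores 0 - 0.2 + 0.1 = -0.1.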
S24: use the target data of sample subjects with disease labels as the training set, and input the training set into the initial model, where the target data include the CAD polygenic risk values and the physical condition data.
Specifically, the calculated polygenic risk score (the CAD polygenic risk value) and the individual's physical condition variables (the physical condition data) are put together into the initial model for feature selection. The result of the feature selection is used as the training set, which is then input into the initial model.
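The source does not name the feature-selection method. As one possible choice, the sketch below stacks the polygenic risk value with the physical condition variables and keeps the k columns with the highest ANOVA F-score, using scikit-learn's `SelectKBest`. The function name and the default k are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif


def build_features(prs, covariates, labels, k=5):
    """Stack the PRS with physical-condition covariates and keep the k
    features most associated with the label (ANOVA F-test).

    Returns the reduced matrix and the indices of the kept columns;
    column 0 of the stacked matrix is the PRS.
    """
    x = np.column_stack([np.asarray(prs, dtype=float),
                         np.asarray(covariates, dtype=float)])
    selector = SelectKBest(score_func=f_classif, k=min(k, x.shape[1]))
    x_selected = selector.fit_transform(x, labels)
    return x_selected, selector.get_support(indices=True)
```

The selector should be fitted on the training portion only. Selecting features on all samples before the train/test split leaks test information and inflates the test AUC.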
S25: train the initial model on the training set with a machine learning algorithm to obtain a plurality of candidate models.
The machine learning algorithm includes at least one of a random forest algorithm, a support vector machine algorithm and a decision tree algorithm. In this step, models are built and trained with several machine learning algorithms, such as random forests, support vector machines and decision trees.
Preferably, the coronary heart disease risk variables (the physical condition data) are added to the polygenic risk score (the CAD polygenic risk value) to build risk prediction models, which yields a plurality of candidate models.
It should be noted that machine learning is a multidisciplinary field that studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures, thereby continually improving their performance. Machine learning is often applied to binary classification tasks, in which a model is built to predict a binary label.
S26: select the CAD prediction model from the candidate models using a test set, where the test set is target data of sample subjects labeled as not having the disease.
The candidate model from step S25 with the best performance is selected as the final model, and the performance of each binary classification model is measured by the area under the curve (AUC). AUC is one of the most commonly used evaluation metrics in machine learning. Although AUC is defined geometrically, in statistics and machine learning it is commonly used to evaluate binary classifiers, and the curve in question is usually the receiver operating characteristic (ROC) curve.
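AUC also has an equivalent probabilistic reading. It is the probability that a randomly chosen diseased sample receives a higher score than a randomly chosen non-diseased one, with ties counted as one half. The pure-Python sketch below computes AUC this way, with label 1 marking disease. The function name is hypothetical. The sketch is quadratic in the number of samples and is meant only to make the definition concrete; scikit-learn's `roc_auc_score` is the usual tool.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    with ties counted as one half. labels: 1 = diseased, 0 = not diseased."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the example, three of the four diseased/non-diseased pairs are ranked correctly, so AUC = 3/4. A model that ranks every diseased sample above every non-diseased one scores 1.0, and random scoring gives about 0.5.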
As another implementation of this embodiment, the data of the sample population carrying diseased and non-diseased labels are divided into a training set and a test set in a ratio of 70% to 30%; that is, 70% of the labeled sample population is used as the training set and the remaining 30% as the test set. Models are then trained on the training set with several machine learning methods, such as random forests, support vector machines and decision trees, and the model with the best AUC on the test set is selected for actual prediction, i.e. as the CAD (coronary artery disease) prediction model.
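The following is a minimal sketch of this train-and-select procedure with scikit-learn. It assumes a numeric feature matrix x (polygenic risk value plus physical condition variables) and binary labels y, with 1 marking disease. The hyperparameters, the stratified split and the function name are illustrative assumptions, not values from the source.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def select_cad_model(x, y, seed=0):
    """Train three candidate models on a 70/30 split and return
    (best_name, fitted_best_model, auc_by_name)."""
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.3, stratify=y, random_state=seed)
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # SVMs are scale-sensitive, so standardize inside the pipeline;
        # probability=True enables predict_proba for computing AUC.
        "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed)),
        "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=seed),
    }
    aucs = {}
    for name, model in candidates.items():
        model.fit(x_train, y_train)
        aucs[name] = roc_auc_score(y_test, model.predict_proba(x_test)[:, 1])
    best = max(aucs, key=aucs.get)  # highest test-set AUC wins
    return best, candidates[best], aucs
```

The same 30% split is used both to pick the winner and to report its AUC, so that AUC is optimistically biased. Holding out a separate validation set, or using cross-validation for model selection, avoids this bias.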
After the CAD prediction model is built, whenever genotype data and physical condition data of a new individual are obtained, the individual's polygenic risk score can be calculated and then input, together with the physical condition data, into the CAD prediction model for prediction. This is the application process of the CAD prediction model.
FIG. 4 shows the method of applying the CAD prediction model. Preferably, the prediction model used in this application method is the CAD prediction model established by the method provided in this embodiment.
Specifically, applying the CAD prediction model may include the following steps. First, genotype data and physical condition data of a subject to be predicted are acquired. Then, the subject's CAD polygenic risk value is calculated from the genotype data. Finally, the physical condition data and the CAD polygenic risk value are input into the CAD prediction model to obtain the subject's CAD prediction result.
In practice, applying the CAD prediction model amounts to inputting the measurements from an individual's physical examination, together with the individual's polygenic risk score, into the model, which then predicts the probability that the individual will develop CAD.
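The application flow of FIG. 4 can be sketched as a single function. The sketch assumes a classifier with a scikit-learn-style `predict_proba` method that was trained on columns ordered as [polygenic risk value, physical condition variables], and genotype dosages already aligned with the SNP weights. The function name and this column ordering are hypothetical.

```python
import numpy as np


def predict_cad_probability(model, snp_weights, dosages, physical_features):
    """Apply a fitted CAD model to new individuals.

    model:             fitted classifier exposing predict_proba, trained on
                       columns [PRS, physical features...] in this order
    snp_weights:       (n_snps,) per-SNP weights from the weighting step
    dosages:           (n_people, n_snps) effect-allele counts
    physical_features: (n_people, n_features) physical condition data
    """
    # Step 1: CAD polygenic risk value for each new individual.
    prs = np.asarray(dosages, dtype=float) @ np.asarray(snp_weights, dtype=float)
    # Step 2: combine with physical condition data in the training column order.
    x = np.column_stack([prs, np.asarray(physical_features, dtype=float)])
    # Step 3: probability of the positive (CAD) class.
    return model.predict_proba(x)[:, 1]
```

Keeping the column order identical to the one used in training is the main correctness requirement, because the model itself has no way to detect swapped features.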
Example three:
As shown in FIG. 5, the CAD prediction model establishing apparatus 3 provided by this embodiment of the present invention includes an acquisition unit 31, a calculation unit 32 and an establishing unit 33.
The acquisition unit is configured to acquire genotype data and physical condition data of the sample subject. The calculation unit is configured to calculate a CAD polygenic risk value from the genotype data. The establishing unit is configured to establish a CAD prediction model based on the CAD polygenic risk value and the physical condition data.
Further, the calculation unit includes a calculation module and a weighting module. The calculation module is configured to calculate weights of CAD genetic variants from the CAD gene-associated data in the genotype data. The weighting module is configured to perform a weighted calculation using the weights to obtain the CAD polygenic risk value.
Further, the establishing unit includes an input module, a training module and a selection module. The input module is configured to use the target data of sample subjects with disease labels as the training set and input the training set into the initial model, where the target data include the CAD polygenic risk values and the physical condition data.
The training module is configured to train the initial model on the training set with a machine learning algorithm to obtain a plurality of candidate models. The selection module is configured to select the CAD prediction model from the candidate models using a test set, where the test set is target data of sample subjects labeled as not having the disease.
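The division into units and modules can be mirrored in code as a few small composable classes. The sketch below shows one possible mapping, with hypothetical class names. Data loading, weight estimation and model training are injected as callables, so that, for example, LDpred-derived weights or the AUC-based model selection of Example two could be plugged in.

```python
from typing import Any, Callable, Tuple

import numpy as np

Arrays = Tuple[np.ndarray, np.ndarray, np.ndarray]


class AcquisitionUnit:
    """Acquisition unit: returns (dosages, physical_features, labels)."""
    def __init__(self, loader: Callable[[], Arrays]):
        self.loader = loader

    def acquire(self) -> Arrays:
        return self.loader()


class CalculationUnit:
    """Calculation module supplies per-SNP weights; weighting module sums them."""
    def __init__(self, calculation_module: Callable[[], np.ndarray]):
        self.calculation_module = calculation_module

    def risk_values(self, dosages: np.ndarray) -> np.ndarray:
        weights = np.asarray(self.calculation_module(), dtype=float)
        return np.asarray(dosages, dtype=float) @ weights  # weighting module


class EstablishingUnit:
    """Establishing unit: trains a model on [risk value, physical features]."""
    def __init__(self, trainer: Callable[[np.ndarray, np.ndarray], Any]):
        self.trainer = trainer

    def establish(self, risk, physical, labels):
        x = np.column_stack([risk, np.asarray(physical, dtype=float)])
        return self.trainer(x, labels)


class CadModelBuildingApparatus:
    """Wires the three units together in the order of the method steps."""
    def __init__(self, acquisition, calculation, establishing):
        self.acquisition = acquisition
        self.calculation = calculation
        self.establishing = establishing

    def build(self):
        dosages, physical, labels = self.acquisition.acquire()
        risk = self.calculation.risk_values(dosages)
        return self.establishing.establish(risk, physical, labels)
```

Injecting the three callables keeps each unit testable in isolation.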
The CAD prediction model establishing apparatus provided in this embodiment of the present invention has the same technical features as the CAD prediction model establishing method provided in the above embodiments, so it can solve the same technical problem and achieve the same technical effect.
Example four:
As shown in FIG. 6, the electronic device 4 includes a memory 41 and a processor 42. The memory stores a computer program executable on the processor, and the processor implements the steps of the method provided in Example one or Example two when executing the computer program.
Referring to FIG. 6, the electronic device further includes a bus 43 and a communication interface 44. The processor 42, the communication interface 44 and the memory 41 are connected through the bus 43, and the processor 42 is configured to execute executable modules, such as computer programs, stored in the memory 41.
The memory 41 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. Communication between a network element of the system and at least one other network element is implemented through at least one communication interface 44 (wired or wireless), which may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.
The bus 43 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The memory 41 is configured to store a program, and the processor 42 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 42.
Preferably, the processor 42 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 42 or by instructions in the form of software. The processor 42 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like. It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logical block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or registers. The storage medium is located in the memory 41, and the processor 42 reads the information in the memory 41 and performs the steps of the above method in combination with its hardware.
Example five:
The computer-readable medium provided by this embodiment of the present invention stores non-volatile program code executable by a processor, and the program code causes the processor to execute the method provided in Example one or Example two.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer-readable medium having processor-executable non-volatile program code provided by the embodiments of the present invention has the same technical features as the CAD prediction model establishing method, the CAD prediction model establishing apparatus, and the electronic device provided by the above embodiments. It can therefore solve the same technical problems and achieve the same technical effects.
The computer program product for performing the CAD prediction model building method provided in the embodiment of the present invention includes a computer-readable storage medium storing a nonvolatile program code executable by a processor, where instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, and will not be described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented as software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product. This product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention. They illustrate the technical solutions of the present invention rather than limit them, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with this technical field may still, within the technical scope disclosed herein, modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for establishing a prediction model for coronary heart disease (CAD), characterized by comprising the following steps:
acquiring genotype data and physical condition data of a sample object;
calculating a CAD polygenic risk value using the genotype data; and
establishing a CAD prediction model based on the CAD polygenic risk value and the physical condition data.
2. The CAD prediction model establishing method according to claim 1, wherein calculating the CAD polygenic risk value using the genotype data comprises:
calculating weights of CAD gene variants using CAD gene-associated data in the genotype data; and
performing a weighted calculation with the weights to obtain the CAD polygenic risk value.
3. The CAD prediction model establishing method according to claim 1, wherein establishing the CAD prediction model based on the CAD polygenic risk value and the physical condition data comprises:
inputting the CAD polygenic risk value and the physical condition data into an initial model, and training the initial model through a machine learning algorithm to obtain the CAD prediction model.
4. The CAD prediction model establishing method according to claim 3, wherein training the initial model through the machine learning algorithm, based on the CAD polygenic risk value and the physical condition data input into the initial model, to obtain the CAD prediction model comprises:
taking target data of sample objects with disease labels as a training set and inputting the training set into the initial model, wherein the target data comprises the CAD polygenic risk value and the physical condition data;
training the initial model through the machine learning algorithm on the training set to obtain a plurality of candidate models; and
selecting the CAD prediction model from the plurality of candidate models by using a test set, wherein the test set is target data of sample objects whose labels indicate no disease.
5. The CAD prediction model establishing method according to claim 3 or 4, wherein the machine learning algorithm comprises at least one of a random forest algorithm, a support vector machine algorithm, and a decision tree algorithm.
6. A CAD prediction model establishing apparatus, comprising:
an acquisition unit, configured to acquire genotype data and physical condition data of a sample object;
a calculation unit, configured to calculate a CAD polygenic risk value using the genotype data; and
an establishing unit, configured to establish a CAD prediction model based on the CAD polygenic risk value and the physical condition data.
7. The CAD prediction model establishing apparatus according to claim 6, wherein the calculation unit comprises:
a calculation module, configured to calculate weights of CAD gene variants using CAD gene-associated data in the genotype data; and
a weighting module, configured to perform a weighted calculation with the weights to obtain the CAD polygenic risk value.
8. The CAD prediction model establishing apparatus according to claim 6, wherein the establishing unit comprises:
an input module, configured to take target data of sample objects with disease labels as a training set and input the training set into an initial model, wherein the target data comprises the CAD polygenic risk value and the physical condition data;
a training module, configured to train the initial model through a machine learning algorithm on the training set to obtain a plurality of candidate models; and
a selecting module, configured to select the CAD prediction model from the plurality of candidate models by using a test set, wherein the test set is target data of sample objects whose labels indicate no disease.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to perform the method of any of claims 1 to 5.
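Claims 2 and 7 do not specify how the variant weights are derived or combined. One common realization, which is assumed here and not stated in the claims, gives each CAD-associated variant a weight equal to the natural logarithm of its reported odds ratio, then sums weight times allele dosage (0, 1 or 2) over all weighted variants. The variant IDs, odds ratios, and the names `variant_weights` and `polygenic_risk_value` are hypothetical.

```python
import math
from typing import Dict, Mapping

# Hypothetical association data: variant ID -> odds ratio of the counted allele.
# IDs and values are placeholders for illustration, not data from this application.
CAD_ASSOCIATIONS: Dict[str, float] = {
    "variant_1": 1.20,
    "variant_2": 1.10,
    "variant_3": 0.90,  # odds ratio below 1: the counted allele is protective
}


def variant_weights(associations: Mapping[str, float]) -> Dict[str, float]:
    """Weight of each CAD variant, assumed here to be ln(odds ratio)."""
    return {vid: math.log(odds) for vid, odds in associations.items()}


def polygenic_risk_value(
    genotype: Mapping[str, int], weights: Mapping[str, float]
) -> float:
    """Weighted sum of allele dosages (0, 1 or 2) over the weighted variants.

    A variant absent from the genotype counts as dosage 0. This is a
    simplification; real pipelines often impute missing calls instead.
    """
    score = 0.0
    for vid, weight in weights.items():
        dosage = genotype.get(vid, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage {dosage!r} for {vid}")
        score += weight * dosage
    return score


if __name__ == "__main__":
    weights = variant_weights(CAD_ASSOCIATIONS)
    sample = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
    print(round(polygenic_risk_value(sample, weights), 4))
```

Because the weights are log odds ratios, a protective allele contributes a negative term. The resulting value can then be placed alongside the physical condition data as one model input.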
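Claims 4, 5 and 8 describe training several candidate models and keeping the one that performs best on a test set. The following is a minimal, dependency-free sketch under stated assumptions. Each candidate is a depth-1 decision tree (a decision stump, the simplest case of the decision tree algorithm named in claim 5) fitted on a single feature. Both the training set and the test set are assumed to contain labeled cases (1) and controls (0). Selection uses test-set accuracy. The feature layout, the helper names, and the toy values are hypothetical and are not data from this application.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical feature layout: CAD polygenic risk value plus physical condition data.
FEATURES = ("prs", "age", "bmi", "systolic_bp")

# One sample: (feature vector, label), with label 1 = CAD case and 0 = control.
Sample = Tuple[Sequence[float], int]


@dataclass(frozen=True)
class Stump:
    """Depth-1 decision tree on a single feature."""

    feature: int
    threshold: float
    polarity: int  # +1: values above threshold -> CAD; -1: values below -> CAD

    def predict(self, x: Sequence[float]) -> int:
        return 1 if self.polarity * (x[self.feature] - self.threshold) > 0 else 0


def accuracy(model: Stump, data: Sequence[Sample]) -> float:
    """Fraction of samples whose label the model predicts correctly."""
    return sum(model.predict(x) == y for x, y in data) / len(data)


def train_stump(feature: int, train: Sequence[Sample]) -> Stump:
    """Fit one candidate: the threshold and polarity with the best training accuracy."""
    options = [
        Stump(feature, threshold, polarity)
        for threshold in sorted({x[feature] for x, _ in train})
        for polarity in (1, -1)
    ]
    return max(options, key=lambda m: accuracy(m, train))


def build_cad_model(
    train: Sequence[Sample], test: Sequence[Sample]
) -> Tuple[Stump, float]:
    """Train one candidate per feature, then keep the best one on the test set."""
    candidates: List[Stump] = [train_stump(f, train) for f in range(len(FEATURES))]
    best = max(candidates, key=lambda m: accuracy(m, test))
    return best, accuracy(best, test)


if __name__ == "__main__":
    # Toy, made-up values for illustration only.
    train_set = [
        ([0.9, 50, 24.0, 120], 1),
        ([0.8, 62, 27.0, 135], 1),
        ([0.7, 45, 22.0, 118], 1),
        ([0.2, 58, 26.0, 140], 0),
        ([0.1, 40, 21.0, 115], 0),
        ([0.3, 66, 29.0, 150], 0),
    ]
    test_set = [
        ([0.85, 48, 23.0, 125], 1),
        ([0.6, 55, 28.0, 130], 1),
        ([0.25, 60, 25.0, 145], 0),
        ([0.15, 52, 30.0, 138], 0),
    ]
    model, acc = build_cad_model(train_set, test_set)
    print(FEATURES[model.feature], model.threshold, acc)
```

The selection step does not depend on the learner. A random forest or support vector machine, for example from scikit-learn, could replace `train_stump`. Accuracy could likewise be swapped for another metric such as ROC AUC.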
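Claims 6 to 8 restate the method as an apparatus made of units and modules. The sketch below shows one possible arrangement of that decomposition as plain Python classes, assuming precomputed variant weights and a pluggable learner. It is not the applicant's implementation. All class names, field names, the `midpoint_trainer` learner, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Sequence, Tuple

Features = List[float]
Model = Callable[[Features], int]  # returns 1 (CAD) or 0
Trainer = Callable[[List[Tuple[Features, int]]], Model]


@dataclass
class AcquisitionUnit:
    """Acquires genotype data and physical condition data of a sample object."""

    genotypes: Mapping[str, Mapping[str, int]]  # subject -> variant -> dosage
    physical: Mapping[str, Mapping[str, float]]  # subject -> field -> value

    def acquire(self, subject: str) -> Tuple[Mapping[str, int], Mapping[str, float]]:
        return self.genotypes[subject], self.physical[subject]


@dataclass
class CalculationUnit:
    """Weighting module: weighted sum of dosages with precomputed weights."""

    weights: Mapping[str, float]

    def risk_value(self, genotype: Mapping[str, int]) -> float:
        return sum(w * genotype.get(v, 0) for v, w in self.weights.items())


@dataclass
class EstablishingUnit:
    """Training module (one model per trainer) plus selecting module (test set)."""

    trainers: Sequence[Trainer]

    def establish(self, train, test) -> Model:
        candidates = [fit(train) for fit in self.trainers]

        def test_accuracy(model: Model) -> float:
            return sum(model(x) == y for x, y in test) / len(test)

        return max(candidates, key=test_accuracy)


@dataclass
class CadModelApparatus:
    acquisition: AcquisitionUnit
    calculation: CalculationUnit
    establishing: EstablishingUnit
    physical_fields: Sequence[str]

    def features(self, subject: str) -> Features:
        genotype, physical = self.acquisition.acquire(subject)
        prs = self.calculation.risk_value(genotype)
        return [prs] + [float(physical[f]) for f in self.physical_fields]

    def build(self, train_ids, test_ids, labels: Mapping[str, int]) -> Model:
        # Input module: pair each subject's target data with its label.
        train = [(self.features(s), labels[s]) for s in train_ids]
        test = [(self.features(s), labels[s]) for s in test_ids]
        return self.establishing.establish(train, test)


def midpoint_trainer(index: int) -> Trainer:
    """Hypothetical learner: threshold one feature midway between class means."""

    def fit(train):
        pos = [x[index] for x, y in train if y == 1]
        neg = [x[index] for x, y in train if y == 0]
        pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
        cut = (pos_mean + neg_mean) / 2
        sign = 1 if pos_mean >= neg_mean else -1
        return lambda x: 1 if sign * (x[index] - cut) > 0 else 0

    return fit
```

In use, each unit is constructed with its data or learners and passed to `CadModelApparatus`. Calling `build(train_ids, test_ids, labels)` then returns the selected model. Because the learners are injected, the establishing unit can hold any of the algorithms listed in claim 5 without changing the other units.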
CN201811588441.2A 2018-12-24 2018-12-24 CAD prediction model establishing method and device and electronic equipment Pending CN111354464A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811588441.2A CN111354464A 2018-12-24 2018-12-24 CAD prediction model establishing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811588441.2A CN111354464A 2018-12-24 2018-12-24 CAD prediction model establishing method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111354464A 2020-06-30

Family

ID=71195553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811588441.2A Pending CN111354464A 2018-12-24 2018-12-24 CAD prediction model establishing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111354464A

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102803951A (en) * 2009-06-15 2012-11-28 心脏Dx公司 Determination of coronary artery disease risk
CN103493054A (en) * 2010-10-12 2014-01-01 美国西门子医疗解决公司 Healthcare information technology system for predicting development of cardiovascular conditions
US20140342355A1 (en) * 2011-08-05 2014-11-20 Gendiag.Exe, S.L. Cardiovascular disease
CN105002286A (en) * 2015-07-30 2015-10-28 中国医学科学院阜外心血管病医院 Multiple single nucleotide polymorphic loca related to onset risks of hypertension and/or cardiovascular disease and associated application
US20150356243A1 (en) * 2013-01-11 2015-12-10 Oslo Universitetssykehus Hf Systems and methods for identifying polymorphisms
US20160215341A1 (en) * 2013-08-30 2016-07-28 Gendiag.Exe, S.L. Risk markers for cardiovascular disease in patients with chronic kidney disease
CN106609300A (en) * 2015-10-23 2017-05-03 北京乐普基因科技股份有限公司 Coronary artery disease risk assessment kit and risk assessment method
CN108172296A (en) * 2018-01-23 2018-06-15 上海其明信息技术有限公司 A kind of method for building up of database and the Risk Forecast Method of genetic disease
CN109065171A (en) * 2018-11-05 2018-12-21 苏州贝斯派生物科技有限公司 The construction method and system of Kawasaki disease risk evaluation model based on integrated study

Non-Patent Citations (2)

Title
Pei Jingjing et al.: "Research on a cardio-cerebrovascular disease risk assessment model based on high-throughput sequencing technology", Journal of Yunnan Minzu University (Natural Sciences Edition), no. 03, 21 May 2018, pages 81-86 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113611412A (en) * 2020-09-03 2021-11-05 北京大学 Method, device and system for predicting coronary heart disease risk caused by T2DM
CN112017784A (en) * 2020-10-22 2020-12-01 平安科技(深圳)有限公司 Coronary heart disease risk prediction method based on multi-modal data and related equipment
CN112017784B (en) * 2020-10-22 2021-02-09 平安科技(深圳)有限公司 Coronary heart disease risk prediction method based on multi-modal data and related equipment
CN112768074A (en) * 2021-01-19 2021-05-07 大禹(上海)医疗健康科技有限公司 Artificial intelligence-based serious disease risk prediction method and system
CN113066531A (en) * 2021-04-13 2021-07-02 平安国际智慧城市科技股份有限公司 Risk prediction method and device, computer equipment and storage medium
CN113593630A (en) * 2021-08-23 2021-11-02 北京果壳生物科技有限公司 Family coronary heart disease risk assessment and risk factor identification system

Similar Documents

Publication Publication Date Title
CN111354464A (en) CAD prediction model establishing method and device and electronic equipment
Azodi et al. Opening the black box: interpretable machine learning for geneticists
Wang et al. Deep learning for plant genomics and crop improvement
US9646265B2 (en) Model updating method, model updating device, and recording medium
Marjoram et al. Post-GWAS: where next? More samples, more SNPs or more biology?
Kirk et al. Model selection in systems and synthetic biology
Binder et al. Big data in medical science—a biostatistical view: Part 21 of a series on evaluation of scientific publications
US20160249863A1 (en) Health condition determination method and health condition determination system
CN112074915B (en) Visualization of biomedical predictions
Arenas et al. Protein evolution along phylogenetic histories under structurally constrained substitution models
JP2007034700A (en) Prediction program and prediction device
CN111178537A (en) Feature extraction model training method and device
JP6851460B2 (en) Optimal solution determination method, optimal solution determination program, non-temporary recording medium and optimal solution determination device
Hoang et al. Splice sites detection using chaos game representation and neural network
Aadland et al. High-throughput reconstruction of ancestral protein sequence, structure, and molecular function
JPWO2008111349A1 (en) Survival analysis system, survival analysis method, and survival analysis program
JP2009237923A (en) Learning method and system
JP6840627B2 (en) Hyperparameter evaluation method, computer and program
WO2023148733A1 (en) Method and system for predicting drug-drug interactions
EP2560108A1 (en) Logical operation system
KR101864986B1 (en) Disease susceptibility and causal element prediction method based on genome information and apparatus therefor
Kamal et al. An integrated algorithm for local sequence alignment
JP6422512B2 (en) Computer system and graphical model management method
EP3985580A1 (en) Information processing device, information processing method, and program
CN110738318B (en) Network structure operation time evaluation and evaluation model generation method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination