CN112509700A - Stable coronary heart disease risk prediction method and device - Google Patents

Info

Publication number
CN112509700A
CN112509700A CN202110157644.1A CN202110157644A CN112509700A CN 112509700 A CN112509700 A CN 112509700A CN 202110157644 A CN202110157644 A CN 202110157644A CN 112509700 A CN112509700 A CN 112509700A
Authority
CN
China
Prior art keywords
heart disease
coronary heart
stable coronary
relative abundance
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110157644.1A
Other languages
Chinese (zh)
Inventor
杨跃进
朱海波
杨进刚
董超然
许靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuwai Hospital of CAMS and PUMC
Original Assignee
Fuwai Hospital of CAMS and PUMC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuwai Hospital of CAMS and PUMC
Priority to CN202110157644.1A
Publication of CN112509700A
Priority to PCT/CN2022/075241 (WO2022166934A1)
Priority to CN202210114319.1A (CN114360726B)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a risk prediction method and a risk prediction device for stable coronary heart disease. The method comprises: obtaining fecal sample DNA data of stable coronary heart disease patients and healthy people; performing paired-end sequencing on the fecal sample DNA data to obtain intestinal flora metagenome data; performing species annotation analysis and functional annotation analysis to obtain relative abundance information; determining intestinal flora characteristic data according to the relative abundance information and pre-screened stable coronary heart disease biomarkers; inputting the intestinal flora characteristic data into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model; adjusting the parameters of the machine learning model and testing the model after parameter adjustment; evaluating the performance of the machine learning model using the AUROC index; and predicting the risk of stable coronary heart disease with the model. The method predicts the risk of stable coronary heart disease and improves prediction accuracy.

Description

Stable coronary heart disease risk prediction method and device
Technical Field
The invention relates to the technical field of biomedicine, in particular to a risk prediction method and device for stable coronary heart disease.
Background
According to recent reports, cardiovascular disease, chiefly coronary atherosclerotic heart disease (CAD), has become the leading cause of death among urban and rural residents in China. The prevailing view is that cardiovascular diseases, including coronary heart disease, are immunometabolic diseases that are systemic, progressive and inflammatory. The main pathology is the formation and inflammatory progression of atherosclerotic plaque, whose essential feature is a non-bacterial inflammatory response caused by lipid deposition and inflammatory cell accumulation, known as metabolic inflammation. During plaque formation and progression, various inflammatory cells and a large number of inflammatory mediators take part in every stage, from the progression of lipid streaks to atheromatous plaques through to plaque rupture and the resulting thrombosis. Coronary heart disease is dynamic and complex, and the mechanisms by which inflammatory unstable plaque forms, develops and ruptures remain unclear. If the initiating factor or cause of inflammatory instability of coronary plaque could be clarified, and an effective method found to intervene in the inflammatory process at its source, then the occurrence, development and rupture of unstable coronary plaque, and the sudden onset of acute coronary syndrome, could be effectively prevented, greatly reducing the morbidity and mortality of cardiovascular disease in China. This would have profound social significance and scientific value for safeguarding the lives and health of people in China and for building a healthy China.
The intestinal mucosa is the largest immunocompetent organ of the body. The hundreds of billions of bacteria residing in the intestinal tract are called the intestinal microbiota, and the host provides them with a suitable environment and the necessary nutrition. In turn, the gut flora is involved in regulating various functions of the human body, such as providing metabolic nutrition to the host, promoting growth and immune regulation, eliminating pathogenic microorganisms, and maintaining gut barrier integrity and normal homeostasis. Recent research indicates that the intestinal microbial flora acts as a source regulator in human immune-inflammatory and metabolic diseases and is closely related to metabolic inflammation and to diseases such as insulin resistance, atherosclerosis, obesity and diabetes; its role as a source regulator in the occurrence and development of coronary heart disease is only beginning to emerge, like the tip of an iceberg. Studies have shown that patients with coronary heart disease have intestinal dysbiosis, manifested by an increased proportion of Escherichia coli, streptococci and Helicobacter pylori. The intestinal flora can promote the formation of atherosclerosis through several pathways, such as metabolic pathways and inflammatory reactions. Therefore, studying the characteristics of the intestinal flora in coronary heart disease helps to better understand its pathogenesis and provides a new approach to its prediction and treatment.
With the rapid development of sequencing technologies such as metagenomics, massive amounts of data are being generated. It is therefore very important to extract, from this voluminous and redundant biological data, biomarkers capable of predicting the risk of stable coronary heart disease, and thereby to achieve accurate risk prediction for stable coronary heart disease.
Disclosure of Invention
The embodiment of the invention provides a risk prediction method of a stable coronary heart disease, which is used for carrying out risk prediction on the stable coronary heart disease and improving the prediction accuracy, and comprises the following steps:
obtaining fecal sample DNA data of stable coronary heart disease patients and healthy people;
performing paired-end sequencing on the fecal sample DNA data to obtain intestinal flora metagenome data;
performing species annotation analysis and functional annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people;
determining intestinal flora characteristic data according to the relative abundance information and a pre-screened stable coronary heart disease biomarker, wherein the stable coronary heart disease biomarker is pre-screened according to differential bacteria relative abundance historical information, and the differential bacteria relative abundance historical information is obtained by performing difference analysis on the relative abundance historical information of stable coronary heart disease patients and that of healthy people;
inputting the intestinal flora characteristic data into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model;
adjusting the parameters of the machine learning model using the GridSearchCV algorithm and the Hyperopt algorithm;
testing the machine learning model after parameter adjustment using test data;
evaluating the performance of the machine learning model using the AUROC index according to the test result;
and predicting the risk of stable coronary heart disease using the stable coronary heart disease risk prediction model that has passed the performance evaluation.
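The AUROC index used in the performance evaluation step can be computed directly from true labels and predicted risk scores. The following is a minimal sketch, assuming labels are encoded as 1 for stable coronary heart disease patients and 0 for healthy people; the function name `auroc` and this encoding are illustrative assumptions and do not come from the patent text.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 (1 = patient, 0 = healthy; an assumed encoding)
    scores: predicted risk scores, higher meaning more likely a patient
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    # Assign 1-based ranks; tied scores share the average of their ranks.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)
```

A value of 1.0 means every patient is scored above every healthy sample, while 0.5 corresponds to chance-level ranking.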
The embodiment of the invention provides a risk prediction device for a stable coronary heart disease, which is used for carrying out risk prediction on the stable coronary heart disease and improving the prediction accuracy, and comprises the following components:
the DNA data acquisition module is used for acquiring fecal sample DNA data of stable coronary heart disease patients and healthy people;
the paired-end sequencing processing module is used for performing paired-end sequencing on the fecal sample DNA data to obtain intestinal flora metagenome data;
the annotation analysis module is used for performing species annotation analysis and function annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people;
the characteristic data determination module is used for determining the characteristic data of the intestinal flora according to the relative abundance information and a pre-screened biomarker of the stable coronary heart disease, wherein the biomarker of the stable coronary heart disease is pre-screened according to the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of the stable coronary heart disease patient and the historical information of the relative abundance of the healthy population;
the model training module is used for inputting the intestinal flora characteristic data into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model;
the parameter adjusting module is used for adjusting parameters of the machine learning model by utilizing a GridSearchCV algorithm and a Hyperopt algorithm;
the model testing module is used for testing the machine learning model after the parameters are adjusted by using the testing data;
the performance evaluation module is used for evaluating the performance of the machine learning model by using the AUROC index according to the test result;
and the risk prediction module is used for predicting the risk of the stable coronary heart disease by using the stable coronary heart disease risk prediction model qualified in performance evaluation.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the risk prediction method for stable coronary heart disease.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the above risk prediction method for stable coronary heart disease is stored in the computer-readable storage medium.
The embodiment of the invention obtains fecal sample DNA data of stable coronary heart disease patients and healthy people; performs paired-end sequencing on the fecal sample DNA data to obtain intestinal flora metagenome data; performs species annotation analysis and functional annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people; determines intestinal flora characteristic data according to the relative abundance information and a pre-screened stable coronary heart disease biomarker, wherein the biomarker is pre-screened according to differential bacteria relative abundance historical information, which is obtained by performing difference analysis on the relative abundance historical information of stable coronary heart disease patients and that of healthy people; inputs the intestinal flora characteristic data into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model; adjusts the parameters of the machine learning model using the GridSearchCV algorithm and the Hyperopt algorithm; tests the machine learning model after parameter adjustment using test data; evaluates the performance of the machine learning model using the AUROC index according to the test result; and predicts the risk of stable coronary heart disease using the risk prediction model that has passed the performance evaluation.
The embodiment of the invention fully considers the characteristics of the intestinal flora of patients with stable coronary heart disease, and uses a machine learning algorithm to screen, from complex and redundant biological big data, non-invasive biomarkers that can be used to predict and monitor the risk of stable coronary heart disease, thereby improving prediction accuracy and filling the gap in clinical early warning of stable coronary heart disease.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic diagram of a risk prediction method for stable coronary heart disease according to an embodiment of the present invention;
FIG. 2 is a graph of AUROC in a training set according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the screened stable coronary heart disease biomarkers that play an important role in the model according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a risk prediction device for stable coronary heart disease according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
As described above, with the rapid development of sequencing technologies such as metagenomics, huge amounts of data are being generated. Finding information useful for disease prediction and diagnosis in this voluminous and complicated biological data is a very challenging task. With the advent of the big data era, researchers have developed various algorithms to mine data in the life sciences, and machine learning algorithms are central to marker-based diagnostic models. Machine learning includes a number of methods, such as linear regression and random forest. Different algorithms are applicable under different conditions, and are easily influenced by individual differences among biological samples, experimental methods and the like.
In order to predict the risk of stable coronary heart disease and improve prediction accuracy, an embodiment of the present invention provides a risk prediction method for stable coronary heart disease. As shown in FIG. 1, the method may include:
step 101, obtaining fecal sample DNA data of stable coronary heart disease patients and healthy people;
step 102, performing paired-end sequencing on the fecal sample DNA data to obtain intestinal flora metagenome data;
step 103, performing species annotation analysis and functional annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people;
step 104, determining intestinal flora characteristic data according to the relative abundance information and a pre-screened stable coronary heart disease biomarker, wherein the stable coronary heart disease biomarker is pre-screened according to differential bacteria relative abundance historical information, and the differential bacteria relative abundance historical information is obtained by performing difference analysis on the relative abundance historical information of stable coronary heart disease patients and that of healthy people;
step 105, inputting the intestinal flora characteristic data into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model;
step 106, adjusting the parameters of the machine learning model using the GridSearchCV algorithm and the Hyperopt algorithm;
step 107, testing the machine learning model after parameter adjustment using test data;
step 108, evaluating the performance of the machine learning model using the AUROC index according to the test result;
and step 109, predicting the risk of stable coronary heart disease using the stable coronary heart disease risk prediction model that has passed the performance evaluation.
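The parameter adjustment step names GridSearchCV (from scikit-learn) and Hyperopt. The sketch below hand-rolls only the idea behind GridSearchCV, namely trying every combination in a parameter grid and scoring each one by k-fold cross-validation; it does not call either library, and Hyperopt's search strategy is not shown. The k-nearest-neighbour model, the accuracy score, and all function and parameter names are stand-ins chosen to keep the example self-contained, not the model or settings used in the patent.

```python
from itertools import product


def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds interleaved test folds."""
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]


def knn_predict(train_X, train_y, test_X, k, weighted):
    """Tiny k-nearest-neighbour classifier used only as a stand-in model."""
    preds = []
    for x in test_X:
        nearest = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, t)) ** 0.5, label)
            for t, label in zip(train_X, train_y)
        )[:k]
        votes = {}
        for dist, label in nearest:
            weight = 1.0 / (dist + 1e-9) if weighted else 1.0
            votes[label] = votes.get(label, 0.0) + weight
        preds.append(max(votes, key=votes.get))
    return preds


def grid_search_cv(X, y, param_grid, n_folds=3):
    """Try every parameter combination; score each by mean CV accuracy."""
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for test_idx in k_fold_indices(len(X), n_folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            preds = knn_predict(
                [X[i] for i in train_idx], [y[i] for i in train_idx],
                [X[i] for i in test_idx], **params,
            )
            correct = sum(p == y[i] for p, i in zip(preds, test_idx))
            fold_scores.append(correct / len(test_idx))
        score = sum(fold_scores) / n_folds
        if score > best_score:  # keep the first combination on ties
            best_params, best_score = params, score
    return best_params, best_score
```

For example, `grid_search_cv(X, y, {"k": [1, 3], "weighted": [False, True]})` evaluates four combinations. Exhaustive search grows multiplicatively with each added parameter, which is one reason a sampling-based tuner may be combined with it.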
As shown in FIG. 1, the embodiment of the present invention obtains fecal sample DNA data of stable coronary heart disease patients and healthy people; performs paired-end sequencing on the fecal sample DNA data to obtain intestinal flora metagenome data; performs species annotation analysis and functional annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people; determines intestinal flora characteristic data according to the relative abundance information and a pre-screened stable coronary heart disease biomarker, wherein the biomarker is pre-screened according to differential bacteria relative abundance historical information, which is obtained by performing difference analysis on the relative abundance historical information of stable coronary heart disease patients and that of healthy people; inputs the intestinal flora characteristic data into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model; adjusts the parameters of the machine learning model using the GridSearchCV algorithm and the Hyperopt algorithm; tests the machine learning model after parameter adjustment using test data; evaluates the performance of the machine learning model using the AUROC index according to the test result; and predicts the risk of stable coronary heart disease using the risk prediction model that has passed the performance evaluation.
The embodiment of the invention fully considers the characteristics of the intestinal flora of patients with stable coronary heart disease, and uses a machine learning algorithm to screen, from complex and redundant biological big data, non-invasive biomarkers that can be used to predict and monitor the risk of stable coronary heart disease, thereby improving prediction accuracy and filling the gap in clinical early warning of stable coronary heart disease.
In an embodiment, fecal sample DNA data is obtained for stable coronary heart disease patients and healthy people.
In this embodiment, after the fecal sample DNA data of stable coronary heart disease patients and healthy people is obtained, the total amount data and total concentration data of the fecal sample DNA are determined by the agarose gel method; the total amount data and total concentration data are compared with preset thresholds; and the fecal sample DNA data is screened according to the comparison result.
In an embodiment, paired-end sequencing is performed on the fecal sample DNA data to obtain intestinal flora metagenome data.
In an embodiment, after the intestinal flora metagenome data is obtained, adapters in the data are removed using Trimmomatic software, and the adapter-free data is trimmed according to a preset base quality value; the quality of the trimmed intestinal flora metagenome data is evaluated using FastQC software; and performing species annotation analysis and functional annotation analysis on the intestinal flora metagenome data then comprises: performing species annotation analysis and functional annotation analysis on the intestinal flora metagenome data that passed the quality evaluation.
In specific implementation, a fecal sample is collected after the patient has undergone the project's examination, placed on dry ice within 30 minutes, and transferred as soon as possible to a -80 ℃ freezer until testing. DNA is then extracted, and the extracted nucleic acid is quality-controlled by the agarose gel method, requiring a total DNA amount of at least 1 μg and a total DNA concentration of at least 20 ng/μL. Libraries are built for samples of qualified quality, and Illumina HiSeq 4000 paired-end sequencing is then performed to obtain paired-end sequencing data for each sample, which is stored in FASTQ files. FASTQ is a text format that stores biological sequences (usually nucleic acid sequences) together with their corresponding quality scores, all encoded in ASCII; it is the de facto standard format for high-throughput sequencing.
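The DNA quality-control thresholds stated above can be expressed as a simple screening function. This is a minimal sketch: the two thresholds come from the text, while the function names, the sample identifiers and the dictionary layout are hypothetical.

```python
MIN_TOTAL_UG = 1.0          # total DNA amount threshold from the text (μg)
MIN_CONC_NG_PER_UL = 20.0   # total DNA concentration threshold from the text (ng/μL)


def passes_dna_qc(total_ug, conc_ng_per_ul):
    """Return True when a sample meets both thresholds."""
    return total_ug >= MIN_TOTAL_UG and conc_ng_per_ul >= MIN_CONC_NG_PER_UL


def screen_samples(samples):
    """Keep the ids of samples that pass QC.

    samples: dict mapping sample id -> (total_ug, conc_ng_per_ul);
    this layout is an assumption made for illustration.
    """
    return [
        sample_id
        for sample_id, (total_ug, conc) in samples.items()
        if passes_dna_qc(total_ug, conc)
    ]
```

Only samples returned by `screen_samples` would proceed to library construction and sequencing.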
In specific implementation, quality control is performed on the data using Trimmomatic software, that is, adapters and low-quality sequences in the raw data are trimmed and removed. Trimmomatic is a widely used filtering tool for Illumina platform data; it supports multithreading, processes data quickly, and is mainly used to remove adapters from FASTQ sequences and to trim FASTQ reads according to base quality values. It offers paired-end and single-end modes, supports gzip- and bzip2-compressed files, and can convert between the phred-33 and phred-64 formats. FastQC is a Java-based program that can quickly assess the quality of sequencing data. After filtering, FastQC is used to evaluate the quality-controlled data, and the quality of the FASTQ sequencing files can be judged from its analysis results. If the quality is acceptable, subsequent data analysis proceeds; otherwise, the parameters are adjusted and the paired-end sequencing data is trimmed again with Trimmomatic. Note that each base in a sequenced read has a quality value, written as a letter or symbol whose ASCII value minus 64 gives the score (phred-64 encoding; phred-33 files subtract 33 instead), which represents the accuracy of the base call. A read is considered low quality if its average quality value is below 20 or if it contains many N bases.
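The low-quality rule described above (decode each quality character by subtracting the encoding offset, then check the mean score and the number of N bases) can be sketched as follows. The mean-quality cutoff of 20 and the offset of 64 come from the text; the text does not quantify "many N", so the `max_n_fraction` default of 0.1 is purely an assumed placeholder, as are the function names.

```python
def phred_scores(quality_string, offset=64):
    """Decode a FASTQ quality string into per-base Phred scores.

    offset=64 follows the text; use offset=33 for phred-33 files.
    """
    return [ord(ch) - offset for ch in quality_string]


def is_low_quality(sequence, quality_string, offset=64,
                   min_mean_q=20, max_n_fraction=0.1):
    """Flag a read whose mean quality is below min_mean_q or that has too many Ns.

    max_n_fraction is an assumed cutoff; the text only says "many N".
    """
    scores = phred_scores(quality_string, offset)
    mean_q = sum(scores) / len(scores)
    n_fraction = sequence.upper().count("N") / len(sequence)
    return mean_q < min_mean_q or n_fraction > max_n_fraction
```

Using the wrong offset shifts every score by 31, so the encoding of the input files should be confirmed before applying any quality cutoff.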
In the embodiment, species annotation analysis and function annotation analysis are carried out on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people.
In this embodiment, performing species annotation analysis and functional annotation analysis on the intestinal flora metagenome data includes: downloading an intestinal flora database comprising a plurality of reference genomes, the reference genomes covering bacteria, archaea, viruses and eukaryotes; and, according to the intestinal flora database, performing species annotation analysis on the intestinal flora metagenome data using MetaPhlAn2 software and functional annotation analysis on the intestinal flora metagenome data using HUMAnN2 software.
In this example, metagenomic species annotation analysis is performed on the quality-controlled data using MetaPhlAn2 software. MetaPhlAn2 collates more than 17,000 reference genomes, including 13,500 bacteria and archaea, 3,500 viruses and 110 eukaryotes. After the corresponding database is downloaded, the software can assign taxa accurately and calculate the relative abundance of species accurately. It achieves species-level resolution as well as strain-level identification and tracking. Species annotation and functional annotation of the intestinal flora metagenome data yield the species abundance information of the intestinal flora, which is used to build the prediction model.
In this embodiment, the R package vegan is used to analyze species diversity, with the intestinal flora species abundance data as the input file. LEfSe (LDA Effect Size) has a web version (http://huttenhower.sph.harvard.edu/galaxy/); the intestinal flora species abundance data is prepared, uploaded to the web version, and run through the default workflow to obtain the result, namely the differential flora between groups. The coronary heart disease intestinal flora characteristic data is the abundance data of the differential bacterial species obtained from the LEfSe analysis.
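The text does not say which diversity measure is computed with vegan. As an illustration only, the sketch below computes the Shannon index with the natural logarithm, which is the default of vegan's `diversity` function, from one sample's species abundances. It is written in Python rather than R, and the function name is hypothetical.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln p_i) over species with p_i > 0.

    abundances: non-negative species abundances for one sample; they are
    normalised to proportions here, so counts or percentages both work.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)
```

A sample dominated by a single species gives a value near 0, while an even community of S species gives ln(S).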
In an embodiment, the characteristic data of the intestinal flora is determined according to the relative abundance information and a pre-screened biomarker of the stable coronary heart disease, the biomarker of the stable coronary heart disease is pre-screened according to the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of the stable coronary heart disease patient and the historical information of the relative abundance of the healthy population.
In this example, the stable coronary heart disease biomarker is pre-screened as follows: feature selection is performed on the differential bacteria relative abundance historical information using the Boruta feature selection package, and the stable coronary heart disease biomarker is thereby determined.
In this embodiment, feature selection is performed on the differential bacteria relative abundance historical information using the Boruta feature selection package as follows: creating a shadow feature matrix from the differential bacteria relative abundance historical information; determining real feature data and shadow feature data according to the shadow feature matrix; determining an importance label for the relative abundance historical information of each differential bacterium according to the real feature data and the shadow feature data; and performing feature selection on the differential bacteria relative abundance historical information according to the importance labels.
In this embodiment, the pre-screened biomarkers for stable coronary heart disease include: Bacteroides massiliensis, unclassified Eggerthella unclassified, Klebsiella pneumoniae, Clostridium occidentalis, unclassified Paraleyde Paraprevotella unclassified, Klebsiella pneumoniae _5_1_63Faa Lachnospiraceae _5_1_63FAA, Ananaerobacter faecalis hadrus, unclassified Bilophilum unclassified, Eubacterium ventriosum, Prevotella coprinus, human Roseberia rosenbifera hominiensis, Enterobacterium barbarum Barcelides, Pseudomonas multocida, Escherichia coli, and Escherichia coli.
In this embodiment, obtaining the historical information of the relative abundance of the differential bacteria by performing difference analysis on the historical information of the relative abundance of stable coronary heart disease patients and of healthy people includes: performing the difference analysis using LEfSe (LDA Effect Size) software.
In specific implementation, the Boruta algorithm is used for feature selection. The goal of Boruta is to select all features that are relevant to the dependent variable, rather than a feature subset that minimizes the cost function of one particular model. The significance of the Boruta algorithm is that it gives a more comprehensive understanding of the factors influencing the dependent variable, so that feature selection can be performed better and more efficiently. Boruta is available as a feature selection package in Python; after the package is installed, the historical information of the relative abundance of the differential bacteria is input, and the important features suitable for modeling are obtained. The specific algorithm steps are as follows: (1) create shadow features: randomly shuffle the values of each real feature in R to obtain a shadow feature matrix S, and append S to the real features to form a new feature matrix N = [R, S]; (2) train a model with the new feature matrix N as input to obtain importance values for the real features and the shadow features; (3) take the maximum importance among the shadow features, and record one hit for a real feature whenever its importance exceeds that maximum; (4) using the hits accumulated for each real feature in (3), mark the feature as important or unimportant; (5) delete the unimportant features and repeat steps (1) to (4) until all features are marked.
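The shadow-feature steps above can be illustrated with a minimal Python sketch. It is not the Boruta package itself and makes several assumptions that are not in the original text: the importance score is the absolute Pearson correlation with the label (Boruta normally uses tree-based importances), a fixed hit-rate cutoff replaces Boruta's statistical test, and the feature names `taxon_a` and `taxon_b` are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Stand-in importance score (assumption): |Pearson correlation| with the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_feature_hits(columns, labels, n_rounds=50, seed=0):
    """Count, per real feature, how often it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Step (1): shuffle every real column to build the shadow matrix S.
        shadows = []
        for values in columns.values():
            shuffled = list(values)
            rng.shuffle(shuffled)
            shadows.append(shuffled)
        # Steps (2)-(3): score real and shadow features, compare with the shadow maximum.
        shadow_max = max(abs_correlation(s, labels) for s in shadows)
        for name, values in columns.items():
            if abs_correlation(values, labels) > shadow_max:
                hits[name] += 1
    return hits


def label_features(hits, n_rounds, min_fraction=0.9):
    """Step (4), simplified: a fixed hit-rate cutoff (assumption) marks each feature."""
    return {
        name: "important" if count / n_rounds >= min_fraction else "unimportant"
        for name, count in hits.items()
    }


# Hypothetical toy data: taxon_a follows the label, taxon_b is unrelated to it.
labels = [0, 1] * 50
columns = {
    "taxon_a": [float(v) for v in labels],
    "taxon_b": [0.0, 0.0, 1.0, 1.0] * 25,
}
hits = shadow_feature_hits(columns, labels, n_rounds=20, seed=1)
marks = label_features(hits, n_rounds=20)
```

Step (5), deleting unimportant features and repeating, is omitted here to keep the sketch short.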
In this embodiment, the characteristic data of the intestinal flora is input into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model. The parameters of the machine learning model are adjusted using the GridSearchCV algorithm and the Hyperopt algorithm. The machine learning model after parameter adjustment is tested using the test data. According to the test result, the performance of the machine learning model is evaluated using the AUROC index. Risk prediction of stable coronary heart disease is performed using the stable coronary heart disease risk prediction model that passes the performance evaluation.
In this embodiment, inputting the characteristic data of the intestinal flora into a pre-established machine learning model for training includes: inputting the characteristic data of the intestinal flora into a pre-established LightGBM machine learning model for training; adjusting the parameters of the LightGBM machine learning model using the GridSearchCV algorithm and the Hyperopt algorithm; testing the LightGBM machine learning model after parameter adjustment using the test data; and, according to the test result, evaluating the performance of the LightGBM machine learning model using the AUROC index.
In this embodiment, GridSearchCV (grid search) tunes parameters as follows: within a specified parameter range, the parameters are adjusted step by step, a learner is trained with each adjusted setting, and the setting with the highest accuracy on the validation set is selected from all candidates; it is a loop-and-compare process. LightGBM is a gradient boosting framework that is generally faster than XGBoost; compared with traditional algorithms, it offers higher training efficiency, lower memory usage, better accuracy, support for parallel learning, and the ability to process large-scale data. Hyperopt is a tool that tunes parameters through Bayesian optimization, which is faster and gives better results. In addition, Hyperopt can be combined with MongoDB for distributed parameter tuning, so that relatively good parameters can be found quickly.
In this embodiment, the LightGBM machine learning model is constructed using the LightGBM package in Python. The model mainly relies on two algorithms: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). GOSS (reducing the number of samples): most of the samples with small gradients are excluded, and only the remaining samples are used to calculate the information gain. Data instances have different gradients, and by the definition of information gain, instances with larger gradients contribute more to it; therefore, when sampling, the samples with larger gradients are kept (those above a preset threshold or within the top percentile range) and the samples with smaller gradients are randomly dropped. At the same sampling rate, this gives more accurate results than uniform random sampling, especially when the range of information gain is large. EFB (reducing the number of features): mutually exclusive features are bundled, i.e., replaced by a single composite feature. Many features are almost mutually exclusive (for example, they are rarely non-zero at the same time), especially in sparse feature spaces. Such features can be bundled; the bundling problem is reduced to a graph coloring problem, and an approximate solution is obtained with a greedy algorithm.
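The GOSS idea described above can be sketched in a few lines of Python. This is an illustration only, not LightGBM's internal code: the function name `goss_sample` and the default rates are assumptions, and the re-weighting factor (1 - top_rate) / other_rate follows the published description of GOSS rather than anything stated in this document.

```python
import random


def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (index, weight) pairs chosen by a GOSS-style sampler."""
    n = len(gradients)
    # Rank instances by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    # Keep every large-gradient instance; randomly sample the small-gradient ones.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Up-weight the sampled small-gradient instances to compensate for the dropped ones.
    amplify = (1.0 - top_rate) / other_rate
    return [(i, 1.0) for i in top] + [(i, amplify) for i in sampled]


# Hypothetical gradients: instance i has gradient i / 100.
gradients = [i / 100.0 for i in range(100)]
selected = goss_sample(gradients)
```

With these defaults, 20 large-gradient instances are kept as they are and 10 of the remaining 80 are sampled and up-weighted, so only 30 of 100 instances enter the gain computation.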
In this embodiment, GridSearchCV and Hyperopt are packages available in Python, and parameter tuning is performed after these packages are installed. The name GridSearchCV can be split into two parts, GridSearch and CV, i.e., grid search and cross-validation. Grid search is a parameter search: within a specified parameter range, the parameters are adjusted step by step, a learner is trained with each adjusted setting, and the setting with the highest accuracy on the validation set is selected from all candidates; it is a train-and-compare process. Hyperopt is a Python library for "distributed asynchronous algorithm configuration/hyper-parameter optimization". With it, the tedious hyperparameter optimization process can be automated and good hyperparameters obtained automatically. Broadly speaking, a model with hyperparameters can be regarded as an inevitably non-convex function of those hyperparameters, so Hyperopt can almost always obtain a more reasonable tuning result than manual tuning. Especially for models that are complex to tune, it can reach a final performance well beyond that of manual tuning, and much faster.
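The grid search plus cross-validation loop described above can be written out in plain Python to show what happens at each step. This is a sketch, not scikit-learn's GridSearchCV: the function `grid_search_cv`, the toy one-dimensional k-nearest-neighbour learner, and the data are all hypothetical.

```python
from itertools import product
from statistics import mean


def grid_search_cv(param_grid, fit, score, samples, n_folds=5):
    """Try every parameter combination; return the one with the best mean validation score."""
    names = sorted(param_grid)
    folds = [samples[i::n_folds] for i in range(n_folds)]  # interleaved folds
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for i, validation in enumerate(folds):
            # Train on every fold except the one held out for validation.
            training = [s for j, fold in enumerate(folds) if j != i for s in fold]
            model = fit(training, **params)
            fold_scores.append(score(model, validation))
        average = mean(fold_scores)
        if average > best_score:
            best_params, best_score = params, average
    return best_params, best_score


def fit_knn(training, k):
    """Toy learner (assumption): 1-D k-nearest-neighbour majority vote."""
    def predict(x):
        nearest = sorted(training, key=lambda s: abs(s[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > len(nearest) else 0
    return predict


def accuracy(model, validation):
    return mean([1.0 if model(x) == y else 0.0 for x, y in validation])


# Hypothetical data: one feature, label is 1 when the feature is at least 10.
samples = [(float(x), 1 if x >= 10 else 0) for x in range(20)]
best_params, best_score = grid_search_cv({"k": [1, 3, 5]}, fit_knn, accuracy, samples)
```

Because every combination is trained once per fold, the cost grows with the product of the grid sizes, which is the usual motivation for complementing grid search with a guided search such as Hyperopt.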
In this embodiment, AUROC stands for "area under the receiver operating characteristic curve" and is often used as an index for evaluating the predictive ability of a model. Before discussing AUROC, the concept of a confusion matrix needs to be understood. A binary prediction has four possible outcomes: the prediction is 0 and the true category is 0, called a True Negative (TN); the prediction is 0 and the true category is 1, called a False Negative (FN); the prediction is 1 and the true category is 0, called a False Positive (FP); the prediction is 1 and the true category is 1, called a True Positive (TP). When comparing two different models, a single index is usually more convenient than several, so two indices are first calculated from the confusion matrix and then combined into one:
the True Positive Rate (TPR), i.e., sensitivity, hit rate, recall, is defined as TP/(TP + FN). This index corresponds to the proportion of positive data points that are correctly identified as positive to all positive data points. In other words, the higher the TPR, the fewer positive data points we miss.
The False Positive Rate (FPR), also called the false alarm rate, is defined as FP/(FP + TN). This index corresponds to the proportion of negative data points that are mistaken for positives among all negative data points. In other words, the higher the FPR, the more negative data points we misclassify.
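As a small worked example of the two definitions above, the following Python sketch counts the four confusion-matrix outcomes and derives TPR and FPR from them. The label and prediction vectors are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def tpr_fpr(y_true, y_pred):
    """TPR = TP / (TP + FN); FPR = FP / (FP + TN)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels and predictions for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
tpr, fpr = tpr_fpr(y_true, y_pred)
```

Here three of the four positives are found and one of the four negatives is flagged by mistake, so TPR is 0.75 and FPR is 0.25.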
To combine the FPR and TPR into one index, we first compute the two indices above for a series of classification thresholds (e.g., 0.00, 0.01, 0.02, …, 1.00) applied to the predicted probabilities (for example, those output by a logistic regression), and then plot them in one graph, with the FPR value on the horizontal axis and the TPR value on the vertical axis. The resulting curve is the ROC curve, and the index we consider is the area under this curve, called AUROC. The diagonal dashed line is the ROC curve of a random predictor, whose AUROC is 0.5. A random predictor is typically used as a baseline to verify that the model is useful. The higher the AUROC, the better the predictive power of the model.
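The threshold sweep described above can be sketched in Python as follows. This is an illustration only (in practice a library routine would normally be used); the threshold grid 0.00 to 1.00 in steps of 0.01 comes from the text, while the function names, the trapezoidal integration, and the toy scores are assumptions.

```python
def roc_points(y_true, scores, n_steps=100):
    """Return sorted (FPR, TPR) points for thresholds 0.00, 0.01, ..., 1.00."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    points = set()
    for step in range(n_steps + 1):
        threshold = step / n_steps
        # Predict 1 when the score reaches the threshold.
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
        points.add((fp / negatives, tp / positives))
    return sorted(points)


def auroc(y_true, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    points = roc_points(y_true, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores: a model that ranks both positives above both negatives,
# and the same scores assigned the wrong way round.
labels = [1, 1, 0, 0]
good = auroc(labels, [0.9, 0.8, 0.3, 0.2])
bad = auroc(labels, [0.2, 0.3, 0.8, 0.9])
```

A model that ranks every positive above every negative reaches an area of 1, a model that ranks them in exactly the wrong order reaches 0, and the random-predictor baseline mentioned above lies in between at 0.5.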
The following provides a specific example illustrating a specific application of the method for predicting risk of stable coronary heart disease according to the present invention.
1. Clinical enrollment criteria:
Patients were divided into 2 groups based on the clinical characteristics of coronary atherosclerotic heart disease: (1) the stable coronary artery disease group with stable plaques (stable CAD group, sCAD, N = 213); (2) the normal control group without atherosclerotic plaques (normal coronary artery group, NCA, N = 175). On the basis of clinical information collection, fresh or properly frozen feces were collected from all groups for intestinal metagenome sequencing.
The inclusion criteria of the study population are as follows: stable coronary heart disease (old myocardial infarction, history of PCI, stable angina, or "healthy persons" without clinical ischemic symptoms in whom coronary stenosis >50% was found by CT or angiography).
Exclusion criteria:
1) type 2-5 myocardial infarction as diagnosed by the Universal Definition of Myocardial Infarction;
2) severe heart failure/cardiogenic shock (Killip class > 2 or NYHA class > 2);
3) mechanical complications (perforation of the ventricular septum, rupture of the free wall, rupture of the papillary muscle, etc.);
4) sudden cardiac arrest and/or cardiopulmonary resuscitation after the onset;
5) oral or intravenous use of any antibiotic for ≥ 1 week within 3 months;
6) acute Coronary Syndrome (ACS) or coronary revascularization (including PCI and CABG) within 3 months;
7) trauma or surgery within 3 months;
8) history of cerebrovascular disease (including cerebral infarction or cerebral hemorrhage) within 3 months;
9) bleeding of the upper or lower digestive tract within 3 months;
10) clear infection (including digestive tract, respiratory tract, body surface infection, etc.) within 3 months;
11) chronic intestinal diseases (e.g., Crohn's disease, ulcerative colitis, etc.);
12) any tumor;
13) rheumatic immune diseases;
14) chronic kidney disease, including after kidney transplantation.
Study subject enrollment and case information collection procedure:
(1) informed consent;
(2) inclusion/exclusion criteria;
(3) patient lifestyle questionnaires and clinical data;
(4) on the basis of clinical information collection, blood and fresh or properly frozen feces were collected from all groups for omics analysis.
The clinical study was conducted in compliance with the World Medical Association Declaration of Helsinki and the relevant national regulations. The clinical study protocol was approved by the medical ethics committee of Fuwai Hospital, and all patients participating in the study signed the informed consent form for the project.
2. The implementation method comprises the following steps:
A total of 388 participants took part in the study at Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences. They were classified into the following two groups according to diagnostic guidelines and exclusion criteria: NCA group (N = 175) and sCAD group (N = 213).
Blood samples were collected on the morning after admission, after fasting for more than 10 hours, and the relevant routine clinical biochemical indices were measured at Fuwai Hospital; all measurements followed international standard methods. At the same time, a stool sample was collected from each patient, placed on dry ice within 30 minutes, and transferred as soon as possible to a -80 ℃ freezer until testing. DNA was extracted, and the extracted nucleic acid was quality-checked by agarose gel electrophoresis. The total amount of DNA was required to be ≥ 1 μg and the DNA concentration ≥ 20 ng/μL. Libraries were constructed from the qualified samples, and double-end sequencing was performed on an Illumina HiSeq 4000. After the raw metagenomic double-end sequencing data were obtained, quality control was performed with Trimmomatic software to remove low-quality sequences and adapters. The data after quality control were evaluated with FastQC software, and metagenomic species annotation analysis was performed on them with MetaPhlAn2 software. After the species abundance information of the intestinal flora of coronary heart disease patients and normal controls was obtained, species diversity was analyzed, and the differences in flora between groups were analyzed with LEfSe (LDA Effect Size) to obtain the intestinal flora characteristics of coronary heart disease, on which a species-level prediction model was built. The LightGBM machine learning method was used for modeling, and ten-fold cross-validation was used to randomly divide the data into a training set and a test set. First, feature selection was performed with the Boruta algorithm.
The parameters were then tuned repeatedly with GridSearchCV (grid search) and Hyperopt, and the optimal parameters were selected. A batch of external data that had never been used in modeling was obtained, the constructed model was used to predict this batch, and the quality of the prediction model was judged by AUROC. The importance of a feature is expressed by its contribution to the model. All analyses used the scikit-learn package of Python. FIG. 2 is a graph of AUROC in the training set, and FIG. 3 is a graph of the screened stable coronary heart disease biomarkers that play an important role in the model.
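The random division into training and test sets by cross-validation can be sketched as follows. This assumes plain ten-fold cross-validation with a single shuffle, which the text does not spell out; the function name and the seed are hypothetical, and only the sample count of 388 comes from the study description.

```python
import random


def ten_fold_splits(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) for each fold after one random shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        # Every fold serves as the test set once; the other nine form the training set.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# 388 participants (175 NCA + 213 sCAD), as in the study described above.
splits = list(ten_fold_splits(388))
```

Each participant appears in exactly one test fold, so every sample is predicted once by a model that did not see it during training.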
Based on the same inventive concept, the embodiment of the present invention further provides a risk prediction device for stable coronary heart disease, as described in the following embodiments. Because the principle by which the device solves the problem is similar to that of the risk prediction method for stable coronary heart disease, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
Fig. 4 is a structural diagram of a risk prediction apparatus for stable coronary heart disease according to an embodiment of the present invention, as shown in fig. 4, the apparatus includes:
a DNA data obtaining module 401, configured to obtain DNA data of stool samples of patients with stable coronary heart disease and healthy people;
a double-end sequencing processing module 402, configured to perform double-end sequencing processing on the fecal sample DNA data to obtain intestinal flora metagenome data;
an annotation analysis module 403, configured to perform species annotation analysis and function annotation analysis on the intestinal flora metagenome data to obtain relative abundance information of stable coronary disease patients and healthy people;
a characteristic data determining module 404, configured to determine characteristic data of the intestinal flora according to the relative abundance information and a pre-screened biomarker of stable coronary heart disease, where the biomarker of stable coronary heart disease is pre-screened according to the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of stable coronary heart disease patients and the historical information of the relative abundance of healthy people;
the model training module 405 is configured to input the feature data of the intestinal flora into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model;
a parameter adjusting module 406, configured to perform parameter adjustment on the machine learning model by using a GridSearchCV algorithm and a Hyperopt algorithm;
the model testing module 407 is configured to test the machine learning model after parameter adjustment by using the test data;
the performance evaluation module 408 is used for evaluating the performance of the machine learning model by using the AUROC index according to the test result;
and the risk prediction module 409 is used for predicting the risk of the stable coronary heart disease by using the stable coronary heart disease risk prediction model qualified in performance evaluation.
In one embodiment, the stable coronary heart disease biomarker is pre-screened as follows:
performing feature selection on the historical information of the relative abundance of the differential bacteria by using a Boruta feature selection package, and determining the biomarker of the stable coronary heart disease.
In one embodiment, feature selection is performed on the historical information of the relative abundance of the differential bacteria by using the Boruta feature selection package as follows:
creating a shadow feature matrix according to the historical information of the relative abundance of the differential bacteria;
determining real characteristic data and shadow characteristic data according to the shadow characteristic matrix;
determining an importance label corresponding to the historical information of the relative abundance of each differential bacterium according to the real characteristic data and the shadow characteristic data;
and performing feature selection on the relative abundance historical information of the differential bacteria according to the importance degree label.
In one embodiment, the pre-screened biomarkers for stable coronary heart disease comprise: Bacteroides massiliensis, Eggerthella unclassified, Klebsiella pneumoniae, Clostridium occidentalis, Paraprevotella unclassified, Lachnospiraceae bacterium 5_1_63FAA, Anaerostipes hadrus, Bilophila unclassified, Eubacterium ventriosum, Prevotella copri, Roseburia hominis, Enterobacterium barbarum Barcelides, Pseudomonas multocida, Escherichia coli, and Escherichia coli.
In conclusion, the embodiment of the invention obtains the DNA data of the stool samples of patients with stable coronary heart disease and healthy people; performs double-end sequencing processing on the DNA data of the stool samples to obtain intestinal flora metagenome data; performs species annotation analysis and function annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people; determines the characteristic data of the intestinal flora according to the relative abundance information and a pre-screened biomarker of the stable coronary heart disease, wherein the biomarker of the stable coronary heart disease is pre-screened according to the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of the stable coronary heart disease patients and the historical information of the relative abundance of the healthy people; inputs the characteristic data of the intestinal flora into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model; adjusts the parameters of the machine learning model using the GridSearchCV algorithm and the Hyperopt algorithm; tests the machine learning model after parameter adjustment using the test data; evaluates the performance of the machine learning model using the AUROC index according to the test result; and performs risk prediction of stable coronary heart disease using the stable coronary heart disease risk prediction model that passes the performance evaluation.
The embodiment of the invention fully considers the characteristics of the intestinal flora of patients with stable coronary heart disease, and uses a machine learning algorithm to screen, from complex and redundant biological big data, non-invasive biomarkers that can be used for predicting and monitoring the risk of stable coronary heart disease, thereby improving prediction accuracy and filling the gap in clinical early warning of stable coronary heart disease.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and shall all fall within its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for risk prediction of stable coronary heart disease, comprising:
obtaining DNA data of stool samples of stable coronary heart disease patients and healthy people;
performing double-end sequencing treatment on the DNA data of the fecal sample to obtain intestinal flora metagenome data;
performing species annotation analysis and function annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people;
determining the characteristic data of the intestinal flora according to the relative abundance information and a pre-screened biomarker of the stable coronary heart disease, wherein the biomarker of the stable coronary heart disease is pre-screened according to the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of the stable coronary heart disease patients and the historical information of the relative abundance of the healthy people;
inputting the characteristic data of the intestinal flora into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model;
utilizing a GridSearchCV algorithm and a Hyperopt algorithm to carry out parameter adjustment on the machine learning model;
testing the machine learning model after parameter adjustment by using the test data;
according to the test result, performing performance evaluation on the machine learning model by using the AUROC index;
and performing risk prediction on the stable coronary heart disease by using the stable coronary heart disease risk prediction model qualified in performance evaluation.
2. The method for risk prediction of stable coronary heart disease according to claim 1, wherein the biomarker for stable coronary heart disease is pre-screened as follows:
performing feature selection on the historical information of the relative abundance of the differential bacteria by using a Boruta feature selection package, and determining the biomarker of the stable coronary heart disease.
3. The method for risk prediction of stable coronary heart disease according to claim 2, wherein feature selection is performed on the historical information of the relative abundance of the differential bacteria by using the Boruta feature selection package as follows:
creating a shadow feature matrix according to the historical information of the relative abundance of the differential bacteria;
determining real characteristic data and shadow characteristic data according to the shadow characteristic matrix;
determining an importance label corresponding to the historical information of the relative abundance of each differential bacterium according to the real characteristic data and the shadow characteristic data;
and performing feature selection on the relative abundance historical information of the differential bacteria according to the importance degree label.
4. The method for risk prediction of stable coronary heart disease according to claim 1, wherein the pre-screened biomarkers of stable coronary heart disease comprise: Bacteroides massiliensis, Eggerthella unclassified, Klebsiella pneumoniae, Clostridium occidentalis, Paraprevotella unclassified, Lachnospiraceae bacterium 5_1_63FAA, Anaerostipes hadrus, Bilophila unclassified, Eubacterium ventriosum, Prevotella copri, Roseburia hominis, Enterobacterium barbarum Barcelides, Pseudomonas multocida, Escherichia coli, and Escherichia coli.
5. A risk prediction device for stable coronary heart disease, comprising:
the DNA data acquisition module is used for acquiring the DNA data of the stool samples of stable coronary heart disease patients and healthy people;
the double-end sequencing processing module is used for carrying out double-end sequencing processing on the DNA data of the stool samples to obtain intestinal flora metagenome data;
the annotation analysis module is used for performing species annotation analysis and function annotation analysis on the intestinal flora metagenome data to obtain the relative abundance information of stable coronary heart disease patients and healthy people;
the characteristic data determination module is used for determining the characteristic data of the intestinal flora according to the relative abundance information and a pre-screened biomarker of the stable coronary heart disease, wherein the biomarker of the stable coronary heart disease is pre-screened according to the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of the stable coronary heart disease patient and the historical information of the relative abundance of the healthy population;
the model training module is used for inputting the intestinal flora characteristic data into a pre-established machine learning model for training to obtain a stable coronary heart disease risk prediction model;
the parameter adjusting module is used for adjusting parameters of the machine learning model by utilizing a GridSearchCV algorithm and a Hyperopt algorithm;
the model testing module is used for testing the machine learning model after the parameters are adjusted by using the testing data;
the performance evaluation module is used for evaluating the performance of the machine learning model by using the AUROC index according to the test result;
and the risk prediction module is used for predicting the risk of the stable coronary heart disease by using the stable coronary heart disease risk prediction model qualified in performance evaluation.
6. The risk prediction device for stable coronary heart disease according to claim 5, wherein the biomarker for stable coronary heart disease is pre-screened as follows:
performing feature selection on the historical information of the relative abundance of the differential bacteria by using a Boruta feature selection package, and determining the biomarker of the stable coronary heart disease.
7. The risk prediction device for stable coronary heart disease according to claim 6, wherein feature selection is performed on the historical information of the relative abundance of the differential bacteria by using the Boruta feature selection package as follows:
creating a shadow feature matrix according to the historical information of the relative abundance of the differential bacteria;
determining real characteristic data and shadow characteristic data according to the shadow characteristic matrix;
determining an importance label corresponding to the historical information of the relative abundance of each differential bacterium according to the real characteristic data and the shadow characteristic data;
and performing feature selection on the historical information of the relative abundance of the differential bacteria according to the importance labels.
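The four steps recited in this claim can be sketched as follows. This is a simplified illustration under stated assumptions, not the Boruta package itself: Boruta normally derives importance from a random forest and applies a statistical test over many iterations, whereas this sketch uses the absolute Pearson correlation with the label as a stand-in importance score and a fixed hit-rate threshold. The names `abs_correlation`, `shadow_feature_selection`, `n_rounds`, `min_hit_rate`, and the toy taxa are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / (vx * vy) ** 0.5


def shadow_feature_selection(features, labels, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Label each feature 'important' or 'unimportant' by comparison with shadow copies.

    features: dict mapping taxon name -> list of relative abundances (one per sample).
    labels:   list of 0/1 outcomes in the same sample order.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        # Shadow matrix: every column shuffled, which breaks any link to the labels.
        shadow_scores = []
        for values in features.values():
            shuffled = values[:]
            rng.shuffle(shuffled)
            shadow_scores.append(abs_correlation(shuffled, labels))
        threshold = max(shadow_scores)
        # A real feature scores a hit when it beats the best shadow feature.
        for name, values in features.items():
            if abs_correlation(values, labels) > threshold:
                hits[name] += 1
    return {
        name: "important" if count / n_rounds >= min_hit_rate else "unimportant"
        for name, count in hits.items()
    }


# Toy data (hypothetical): 10 healthy controls followed by 10 patients.
labels = [0] * 10 + [1] * 10
features = {
    "taxon_a": [0.01 * i for i in range(20)],  # abundance rises with the label
    "taxon_b": [0.05] * 20,                    # constant, carries no signal
}
result = shadow_feature_selection(features, labels)
```

The returned dictionary plays the role of the importance labels: only taxa labelled `important` would be retained as candidate biomarkers.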
8. The risk prediction device of stable coronary heart disease according to claim 5, wherein the pre-screened biomarkers of stable coronary heart disease comprise: Bacteroides massiliensis, unclassified Eggerthella, Klebsiella pneumoniae, Clostridium occidentalis, unclassified Paraprevotella, Lachnospiraceae _5_1_63FAA, Ananaerobacter faecalis hadrus, unclassified Bilophilum, Eubacterium ventriosum, Prevotella coprinus, human Roseberia rosenbifera hominiensis, Enterobacterium barbarum Barcelides, Pseudomonas multocida, Escherichia coli, and Escherichia coli.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 4.

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202110157644.1A CN112509700A (en) 2021-02-05 2021-02-05 Stable coronary heart disease risk prediction method and device
PCT/CN2022/075241 WO2022166934A1 (en) 2021-02-05 2022-01-30 Gut microbiota markers for evaluating onset risk of cardiovascular diseases and uses thereof
CN202210114319.1A CN114360726B (en) 2021-02-05 2022-01-30 Stable coronary heart disease onset risk assessment marker and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110157644.1A CN112509700A (en) 2021-02-05 2021-02-05 Stable coronary heart disease risk prediction method and device

Publications (1)

Publication Number Publication Date
CN112509700A true CN112509700A (en) 2021-03-16

Family

ID=74952773

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110157644.1A Pending CN112509700A (en) 2021-02-05 2021-02-05 Stable coronary heart disease risk prediction method and device
CN202210114319.1A Active CN114360726B (en) 2021-02-05 2022-01-30 Stable coronary heart disease onset risk assessment marker and application thereof

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202210114319.1A Active CN114360726B (en) 2021-02-05 2022-01-30 Stable coronary heart disease onset risk assessment marker and application thereof

Country Status (1)

Country Link
CN (2) CN112509700A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114283890A (en) * 2021-12-15 2022-04-05 南京医科大学 Disease risk prediction method and device based on Ruminococcus microbiota
WO2022166934A1 (en) * 2021-02-05 2022-08-11 中国医学科学院阜外医院 Gut microbiota markers for evaluating onset risk of cardiovascular diseases and uses thereof
TWI826332B (en) * 2023-06-08 2023-12-11 宏碁股份有限公司 Method and system for establishing disease prediction model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101568837A (en) * 2006-08-07 2009-10-28 比奥-拉德巴斯德公司 Method for the prediction of vascular events and the diagnosis of acute coronary syndrome
CN102762743A (en) * 2009-12-09 2012-10-31 阿维埃尔公司 Biomarker assay for diagnosis and classification of cardiovascular disease
CN107710205A (en) * 2015-04-14 2018-02-16 优比欧迈公司 Method and system for microbiome-derived characterization, diagnosis and treatment of cardiovascular disease conditions
CN108351342A (en) * 2015-08-20 2018-07-31 深圳华大生命科学研究院 The biomarker of coronary heart disease
CN108962381A (en) * 2017-05-19 2018-12-07 西门子保健有限责任公司 Learning-based method for personalized assessment, long-term prediction and management of atherosclerosis

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016049920A1 (en) * 2014-09-30 2016-04-07 Bgi Shenzhen Co., Limited Biomarkers for coronary artery disease
WO2016049918A1 (en) * 2014-09-30 2016-04-07 Bgi Shenzhen Co., Limited Biomarkers for coronary artery disease
KR101940423B1 (en) * 2016-12-16 2019-01-18 주식회사 엠디헬스케어 Method for diagnosis of heart disease using analysis of bacteria metagenome
CN111157722B (en) * 2019-11-25 2022-10-11 广州惠善医疗技术有限公司 Use of biomarkers
CN111430027B (en) * 2020-03-18 2023-04-28 浙江大学 Duplex affective disorder biomarker based on intestinal microorganisms and screening application thereof
CN111440884B (en) * 2020-04-22 2021-03-16 中国医学科学院北京协和医院 Intestinal flora for diagnosing sarcopenia and application thereof
CN112111586A (en) * 2020-08-11 2020-12-22 康美华大基因技术有限公司 Crohn disease related microbial marker set and application thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DURGADEVI VELUSAMY et al.: "Ensemble of heterogeneous classifiers for diagnosis and prediction of coronary artery disease with reduced feature subset", 《COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE》 *
MARIUS TRØSEID et al.: "The gut microbiome in coronary artery disease and heart failure: Current knowledge and future directions", 《EBIOMEDICINE》 *
李俊艳 et al.: "Study of the correlation between intestinal flora and coronary heart disease based on high-throughput sequencing", 《中国全科医学杂志》 *

Also Published As

Publication number Publication date
CN114360726A (en) 2022-04-15
CN114360726B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN114292931B (en) Risk assessment marker for acute coronary syndrome and application thereof
Blanco-Míguez et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4
CN112509635A (en) Acute coronary syndrome risk prediction method and device for stable coronary heart disease
CN112509700A (en) Stable coronary heart disease risk prediction method and device
JP6681337B2 (en) Device, kit and method for predicting the onset of sepsis
US20240079092A1 (en) Systems and methods for deriving and optimizing classifiers from multiple datasets
CN105296590B (en) Large intestine carcinoma marker and its application
JP2013513387A (en) Biomarker assay for diagnosis and classification of cardiovascular disease
US20150018238A1 (en) Multi-biomarker-based outcome risk stratification model for pediatric septic shock
KR102044094B1 (en) Method for classifying cancer or normal by deep neural network using gene expression data
CN111206079B (en) Death time inference method based on microbiome sequencing data and machine learning algorithm
CN112289376B (en) Method and device for detecting somatic cell mutation
CN110904213A (en) Intestinal flora-based ulcerative colitis biomarker and application thereof
CN111020020A (en) Biomarker combination for schizophrenia, application thereof and MetaPhlAn2 screening method
CN110838365A (en) Irritable bowel syndrome related flora marker and kit thereof
CN115896242A (en) Intelligent cancer screening model and method based on peripheral blood immune characteristics
Li et al. Exploring postmortem succession of rat intestinal microbiome for PMI based on machine learning algorithms and potential use for humans
Kayvanpour et al. microRNA neural networks improve diagnosis of acute coronary syndrome (ACS)
CN116913382A (en) Artificial intelligence model and method for predicting intestinal age index based on microbiome sequencing data
US20240194294A1 (en) Artificial-intelligence-based method for detecting tumor-derived mutation of cell-free dna, and method for early diagnosis of cancer, using same
WO2022166934A1 (en) Gut microbiota markers for evaluating onset risk of cardiovascular diseases and uses thereof
Long et al. Exploration of the shared gene signatures between myocardium and blood in sepsis: evidence from bioinformatics analysis
CN111020021A (en) Intestinal flora-based small-scale schizophrenia biomarker combination, application thereof and mOTU screening method
CN115678999B (en) Application of marker in lung cancer recurrence prediction and prediction model construction method
KR102659915B1 (en) Method of gene selection for predicting medical information of patients and uses thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210316