WO2016049920A1 - Biomarkers for coronary artery disease - Google Patents
- Publication number
- WO2016049920A1 (application PCT/CN2014/088046)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cvd
- streptococcus
- cad
- biomarker
- con
- Prior art date
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6888—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
- C12Q1/689—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
Definitions
- The present invention relates to biomarkers and methods for predicting the risk of a disease related to microbes, in particular coronary artery disease (CAD) or related heart diseases.
- CAD coronary artery disease
- Coronary artery disease refers to any abnormal condition of the coronary arteries that interferes with the delivery of an adequate supply of blood to the cardiac (i.e., heart) muscle or any portion thereof.
- CAD is caused by the accumulation of plaque on the arterial walls (i.e., atherosclerosis), particularly in the large and medium-sized arteries serving the heart. These conditions have similar causes, mechanisms, and treatments.
- CAD represents the leading cause of death and morbidity worldwide. Early diagnosis of CAD will help not only to prevent mortality but also to reduce the costs of surgical intervention.
- The “gold standard” for detecting CAD is invasive coronary angiography. However, this is costly and can pose a risk to the patient. Prior to angiography, non-invasive diagnostic modalities such as myocardial perfusion imaging (MPI) and CT-angiography may be used; however, these carry complications including radiation exposure and contrast agent sensitivity, and add only moderately to the identification of obstructive CAD.
- MPI myocardial perfusion imaging
- Coronary artery disease, as one of the most influential complex diseases, has been increasingly investigated by GWAS in recent years, which attributed 10.6% of the inherent cause to 46 common variants (Ehret, G. B. et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103-109, incorporated herein by reference).
- Gut microbiota plays a crucial role in our health in many respects, such as harvesting energy from food, producing important metabolites, promoting the development and maturation of the immune system, and protecting the host from pathogen infection.
- The characteristics of most coronary artery disease are inflammation, oxidation and lipid metabolism, which might correlate with the gut microbes and their metabolites.
- Embodiments of the present disclosure seek to solve at least one of the problems existing in the prior art to at least some extent.
- The present invention is based on the following findings by the inventors:
- MWAS Metagenome-Wide Association Study
- The inventors calculated the probability of illness through a random forest model based on the 65 CAD-associated gut microbes and 4 optimized gut microbes.
- The inventors' data provide insight into the characteristics of the gut metagenome related to CAD risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant disorders, and the potential usefulness of a gut-microbiota-based approach for assessment of individuals at risk of such disorders.
- a biomarker set for predicting a disease related to microbiota in a subject, consisting of:
- a gut biomarker comprising at least one of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544, or microbes with genomic DNA comprising at least a partial sequence of SEQ ID NO: 1 to 122009.
- the biomarker set consists of at least one of the species listed in Table 4, preferably at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% of the species listed in Table 4.
- preferably, at least one of Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis.
- the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009 as stated in Table 5-1.
- a biomarker set for predicting a disease related to microbiota in a subject, consisting of:
- a gut biomarker comprising at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
- the disease is coronary artery disease or related heart disease.
- a kit for determining the gene marker set of any one of claims 1 to 4, comprising primers used for PCR amplification and designed according to the DNA sequence set forth below:
- the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
- a kit for determining the gene marker set described above, comprising one or more probes designed according to the genes set forth below:
- the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
- the training dataset is constructed based on the relative abundance information of each biomarker in a plurality of subjects having CAD and a plurality of normal subjects, using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
- the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in a sample; the sample disease status is a vector, with 1 for CAD and 0 for control.
- the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
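The matrix-plus-vector layout described above can be sketched as follows. This is a minimal illustration only: the three biomarker names are taken from the species list, but the sample names and abundance values are made up and carry no biological meaning, and the helper names `check_training_dataset` and `to_sample_major` are hypothetical.

```python
# Hypothetical toy data: 3 biomarkers (rows) x 4 samples (columns).
biomarkers = ["Streptococcus oralis", "Streptococcus mitis", "Bacteroides vulgatus"]
samples = ["S1", "S2", "S3", "S4"]

# abundance[i][j] = relative abundance of biomarker i in sample j (made-up values).
abundance = [
    [0.020, 0.015, 0.001, 0.002],
    [0.010, 0.012, 0.002, 0.001],
    [0.050, 0.040, 0.120, 0.150],
]

# Disease status vector, one entry per sample: 1 for CAD, 0 for control.
status = [1, 1, 0, 0]


def check_training_dataset(matrix, row_names, col_names, labels):
    """Validate the row/column/label layout described in the text."""
    if len(matrix) != len(row_names):
        raise ValueError("one row per biomarker is required")
    if any(len(row) != len(col_names) for row in matrix):
        raise ValueError("every row needs one value per sample")
    if len(labels) != len(col_names) or set(labels) - {0, 1}:
        raise ValueError("labels must be 0/1, one per sample")
    return True


def to_sample_major(matrix):
    """Transpose to one feature vector per sample (a common classifier input layout)."""
    return [list(column) for column in zip(*matrix)]


check_training_dataset(abundance, biomarkers, samples, status)
features_by_sample = to_sample_major(abundance)
```

The transpose step is included because many classifier interfaces expect one row per sample rather than one row per feature; whether it is needed depends on the tool actually used.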
- a method of diagnosing whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota comprising:
- the method comprises:
- The markers of the present invention are more specific and sensitive than conventional markers.
- Analysis of stool promises accuracy, safety, affordability, and patient compliance, and stool samples are transportable.
- The present invention relates to an in vitro method that is comfortable and noninvasive, so people will participate more readily in a given screening program.
- The markers of the present invention may also serve as tools for therapy monitoring in CAD patients, to detect the response to therapy.
- Fig. 1 Density histogram showing the P-value distribution of all genes identified in the study cohorts.
- The horizontal line represents the distribution of P-values under the null hypothesis.
- Fig. 2 The 65 most discriminant MLGs in the Random Forest model using 126 MLG markers.
- The bar length indicates the importance of each variable (MLG species).
- Fig. 3 Performance of the 65-MLG Random Forest model. 165 samples (88 cases, 77 controls) formed the training set, and another 86 samples (29 cases, 57 controls) formed the test set used for validation, with a false negative rate of 2/29 and a false positive rate of 12/57.
- Example 1 Identifying biomarkers for evaluating coronary artery disease risk
- ACVD atherosclerotic cardiovascular disease
- DNA library construction was performed following the manufacturer's instructions (Illumina).
- The inventors used the same workflow as described previously to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of the sequencing primers.
- The inventors constructed one paired-end (PE) library with an insert size of 350 bp for each sample, followed by high-throughput sequencing to obtain around 30 million PE reads of length 2 × 100 bp.
- High-quality reads were obtained by filtering out low-quality reads with ambiguous 'N' bases, adapter contamination and human DNA contamination from the Illumina raw reads, and by simultaneously trimming low-quality terminal bases of reads.
- In total, the inventors obtained about 4.77 Gb of high-quality clean fecal microbiota sequencing data per sample (Table 2) from 165 samples (88 cases and 77 controls) on the Illumina HiSeq 2000 platform.
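As a rough consistency check on the sequencing figures above, the arithmetic can be worked through as follows. It assumes that 1 Gb means 10^9 bases and treats the "around 30 million" read pairs as exact, so the retained fraction is only indicative.

```python
# Approximate per-sample figures quoted in the text.
read_pairs = 30_000_000        # "around 30 million PE reads"
read_length_bp = 100           # each read of a pair is 100 bp long
clean_gb = 4.77                # high-quality clean data per sample, in Gb

# Two reads per pair, so raw yield is pairs x 2 x read length.
raw_bases = read_pairs * 2 * read_length_bp
raw_gb = raw_bases / 1e9       # assumption: 1 Gb = 1e9 bases

# Share of the raw yield that survives quality filtering, under the assumptions above.
retained_fraction = clean_gb / raw_gb
```

Under these assumptions the raw yield is about 6 Gb per sample, of which roughly four fifths remains after filtering.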
- Taxonomic assignment of genes was performed using an in-house pipeline described in the published T2D paper (Qin et al. 2012, supra).
- IMG species and mOTU species profiles were obtained by alignment to the 4,653 reference genomes from IMG v400 (Markowitz, V. M. et al. IMG: the integrated microbial genomes database and comparative analysis system. Nucleic acids research 40, D115-D122 (2012), incorporated herein by reference) and to the 79,268 sequences of the mOTU reference (Sunagawa, S. et al. Metagenomic species profiling using universal phylogenetic marker genes. Nature methods 10, 1196-1199 (2013), incorporated herein by reference) with default parameters, respectively. 1,290 IMG species (species shared among at least 10 subjects) and 560 species-level mOTUs were identified.
- The inventors used permutational multivariate analysis of variance (PERMANOVA) to assess the effect of 25 different characteristics, including CAD status, HDLC, CHOL, Gender, FBG, hypertension, APOB, Age, CREA, LDLC, HbA1c, APOA, TP, diabetes, ALB, TRIG, BMI, WHR, Lpa, HBDH, CKMB, AST, CK, ProBNP_E_, ALT, on the gene profiles of the 4.5M reference gene catalogue.
- The inventors performed the analysis using the method implemented in the “vegan” package in R, and the permuted p-value was obtained from 10,000 permutations.
- PERMANOVA identified two significant factors associated with gut microbes (based on gene profiles) (q < 0.05, Table 3).
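The idea behind a permutation-based PERMANOVA p-value can be sketched for a single grouping factor. This is not the vegan implementation the inventors used: it is a simplified one-factor version written in Python, the toy points and the absolute-difference distance are placeholders, and the text does not state which distance measure was applied.

```python
import random


def pseudo_f(dist, groups):
    """Pseudo-F statistic from a square distance matrix and one group label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared pairwise distances summed and divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_p(dist, groups, n_perm=10_000, seed=0):
    """Permutation p-value: share of label shuffles with F at least as large as observed."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Made-up one-dimensional "profiles" for six samples in two groups.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
status = [1, 1, 1, 0, 0, 0]
dist = [[abs(p - q) for q in points] for p in points]
f_obs, p_value = permanova_p(dist, status, n_perm=999)
```

With only six samples the smallest attainable p-value is limited by the number of distinct labelings, which is one reason cohort-sized data and many permutations (10,000 in the text) are used in practice.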
- FDR false discovery rate
- Receiver Operator Characteristic (ROC) analysis
- The inventors applied ROC analysis to assess the performance of the ACVD classification based on metagenomic markers. The inventors then used the “pROC” package in R to draw the ROC curve.
- ROC Receiver Operator Characteristic
- MLG metagenomic linkage group
- The inventors used the 438,750 gene markers to build metagenomic linkage groups (MLGs) using the same method described in the published T2D paper (Qin et al. 2012, supra). All 438,750 genes were annotated by aligning them to the 4,653 reference genomes in IMG v400. An MLG was assigned to a genome if more than 50% of its constituent genes were annotated to that genome; otherwise it was termed unclassified. In total, 136 MLG genomes with more than 550 genes were selected; MLG genomes belonging to the same species were grouped to construct MLG species, and the inventors finally obtained 127 MLG species.
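The "more than 50% of constituent genes" assignment rule can be illustrated with a small sketch. The function name `assign_mlg` and the genome labels are hypothetical, and how genes without any alignment are counted is an assumption here (they stay in the denominator).

```python
from collections import Counter


def assign_mlg(gene_annotations, min_fraction=0.5):
    """Return the genome holding more than `min_fraction` of an MLG's genes.

    gene_annotations has one entry per constituent gene: a genome name, or
    None when the gene did not align to any reference genome. Unaligned genes
    still count toward the total (an assumption of this sketch).
    """
    if not gene_annotations:
        return "unclassified"
    counts = Counter(g for g in gene_annotations if g is not None)
    if not counts:
        return "unclassified"
    genome, hits = counts.most_common(1)[0]
    if hits / len(gene_annotations) > min_fraction:
        return genome
    return "unclassified"


# Made-up example: 6 of 10 genes hit genome_A, so the MLG is assigned to it.
example = assign_mlg(["genome_A"] * 6 + ["genome_B"] * 3 + [None])
```

An MLG whose genes split exactly 50/50 between two genomes stays unclassified, because the rule in the text requires strictly more than 50%.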
- The inventors applied the Wilcoxon rank-sum test with Benjamini-Hochberg adjustment to the 127 MLG species, and 126 MLGs were selected as ACVD-associated MLGs with q < 0.05. To estimate the relative abundance of an MLG species, the inventors averaged the abundance of the genes of the MLG species after removing the 5% lowest and 5% highest abundant genes (Qin et al. 2012, supra).
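The trimmed average used for MLG relative abundance can be sketched as below. The function name is hypothetical, and rounding the 5% cut down to a whole number of genes is an assumption; the text does not say how fractional counts were handled.

```python
def mlg_relative_abundance(gene_abundances, trim_percent=5):
    """Mean gene abundance after dropping the lowest and highest `trim_percent` of genes."""
    if not gene_abundances:
        raise ValueError("an MLG needs at least one gene")
    values = sorted(gene_abundances)
    # Number of genes removed at each end (floor; an assumption of this sketch).
    k = len(values) * trim_percent // 100
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)


# Made-up MLG with 20 genes: one gene is dropped at each end before averaging.
toy_genes = [1.0] * 18 + [0.0, 1000.0]
toy_abundance = mlg_relative_abundance(toy_genes)
```

Trimming both tails keeps a single extreme gene from dominating the abundance estimate for the whole group.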
- From the distribution and the occurrence rate (Qin et al. 2012, supra) of the 438,750 genes, 94.8% of the significant genes (P-value < 0.01) were included in MLGs.
- 136 MLGs (each with > 550 genes, > 50% coverage and q < 0.05) were annotated to the NCBI database, and MLGs from the same species were grouped to obtain 126 MLG species.
- MLG species marker identification
- To identify the MLG species markers, the inventors used the “randomForest 4.5-36” package in R version 2.10 on the 126 ACVD-associated MLG species. First, the inventors sorted all 126 MLG species by the importance given by the “randomForest” method (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference). MLG marker sets were constructed by creating incremental subsets of the top-ranked MLG species, starting from 5 MLG species and ending at all 126 MLG species. For each MLG marker set, the inventors calculated the false prediction ratio in the cohort of 165 Chinese individuals.
- The set of 65 MLG species with the lowest false prediction ratio was selected as the MLG species markers (Fig. 2, Table 4 and Tables 5-1 and 5-2), with a false negative (FN) rate of 6.81% (6/88) and a false positive (FP) rate of 3.89% (3/77) (Fig. 3, Trainset).
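The incremental-subset search described above can be sketched generically. In the text the ranking and the false prediction ratio came from the randomForest package in R; here `false_prediction_ratio` is a hypothetical callback standing in for that evaluation, and keeping the smallest subset on ties is an assumption.

```python
def select_marker_set(ranked_markers, false_prediction_ratio, start=5):
    """Evaluate the top-k subsets for k = start..len(ranked_markers); return the best."""
    best_subset, best_error = None, float("inf")
    for k in range(start, len(ranked_markers) + 1):
        subset = ranked_markers[:k]
        error = false_prediction_ratio(subset)
        # Strict '<' keeps the smallest subset when two sizes tie.
        if error < best_error:
            best_subset, best_error = subset, error
    return best_subset, best_error


# Made-up ranking of 10 markers and a toy error function minimized at 7 markers.
ranked = [f"marker_{i}" for i in range(1, 11)]


def toy_error(subset):
    return abs(len(subset) - 7) / 10


chosen, chosen_error = select_marker_set(ranked, toy_error)
```

With the real data, the callback would train and evaluate a classifier on each candidate subset, which is where nearly all of the running time goes.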
- The inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the selected MLG species markers (Tables 6-1, 6-2, 6-3, 6-4 and 6-5); the area under the ROC curve (AUC), calculated with the R package “pROC”, was 98.17% (95% CI: 96.6%-99.74%) (Fig. 4).
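For readers unfamiliar with the AUC, a minimal sketch of how it can be computed from predicted probabilities follows. It uses the pairwise (Mann-Whitney) formulation, which equals the trapezoidal area under the empirical ROC curve; it is not the pROC code, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random case outscores a random control (ties count 1/2)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Made-up probabilities of illness for three cases (1) and three controls (0).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy data eight of the nine case-control pairs are ordered correctly, so the AUC is 8/9.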
- Most of the case-enriched MLG species (51 in total) were opportunistic pathogens from Streptococcus (9 of 11 MLG species were oral pathogens), Clostridium (6 MLG species), Ruminococcus (4 MLG species) and Lactobacillus (4 MLG species).
- Rothia mucilaginosa naturally inhabits the oral cavity and upper respiratory tract and is increasingly recognized as an emerging opportunistic pathogen associated with prosthetic device infections and endocarditis.
- Gemella sanguinis could strengthen inflammation in immunodeficient patients.
- Akkermansia muciniphila was also enriched in CAD patients.
- IMG species and mOTU species marker identification
- Based on the IMG species and mOTU species profiles, the inventors identified the ACVD-associated IMG species and mOTU species with q < 0.05 (Wilcoxon rank-sum test with Benjamini-Hochberg adjustment). Subsequently, IMG species markers and mOTU species markers were selected using the random forest approach, as in the MLG species marker selection.
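The Benjamini-Hochberg adjustment that turns p-values into q-values can be sketched as follows. The Wilcoxon rank-sum test itself is not implemented here, and the four p-values are made up purely to show the adjustment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Made-up p-values for four hypothetical species.
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_q = benjamini_hochberg(toy_p)
# Indices of species passing the q < 0.05 threshold used in the text.
selected = [i for i, q in enumerate(toy_q) if q < 0.05]
```

Note that a species with a raw p-value below 0.05 can still fail the q < 0.05 threshold once the number of tests is taken into account, as the second and third toy entries do.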
- The inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the 4 microbes from Streptococcus (Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis) as a biomarker (Table 9); the area under the ROC curve, calculated with the R package “pROC”, was 85.86% (95% CI: 80.24%-91.48%) (Fig. 5).
- The false negative (FN) rate was 28.40% (25/88) and the false positive (FP) rate was 20.77% (16/77).
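The reported rates follow directly from the counts in parentheses, as the short calculation below shows. The function name is hypothetical; the counts are those quoted in the text. The quoted percentages appear to be truncated rather than rounded to two decimals.

```python
def error_rates(false_negatives, n_cases, false_positives, n_controls):
    """False negative rate among cases and false positive rate among controls."""
    fn_rate = false_negatives / n_cases
    fp_rate = false_positives / n_controls
    return fn_rate, fp_rate


# Counts quoted for the 4-species Streptococcus model: 25/88 cases, 16/77 controls.
fn_rate, fp_rate = error_rates(25, 88, 16, 77)

# Sensitivity and specificity are the complements of the two error rates.
sensitivity = 1.0 - fn_rate
specificity = 1.0 - fp_rate
```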
- mlg_id: X104; range 93250-97356; count 4107
- mlg_id: X124; range 97357-100151; count 2795
- mlg_id: X79; range 100152-103376; count 3225
- mlg_id: X13; range 103377-104672; count 1296
- mlg_id: X96; range 104673-106081; count 1409
- mlg_id: X105; range 106082-106998; count 917
- mlg_id: X6; range 106999-107917; count 919
- mlg_id: X85; range 107918-109895; count 1978
- mlg_id: X3; range 109896-110770; count 875
- mlg_id: X94; range 110771-114337; count 3567
- mlg_id: X111; range 114338-115109; count 772
- mlg_id: X8; range 115110-116480; count 1371
- mlg_id: X98; range 116481-119348; count 2868
- The inventors used another, independent study group of 29 case samples and 57 control samples as the test set (Table 10); these samples were also collected at Guangdong Provincial People's Hospital.
- DNA was extracted and a DNA library was constructed, followed by high-throughput sequencing as described in Example 1.
- The inventors estimated the relative abundance of an MLG in all samples by using the relative abundance values of the genes from this MLG (Qin et al. 2012, supra).
- The inputs were a training dataset (a matrix in which each row represents an MLG, each column represents a sample, and each cell represents the relative abundance of an MLG in a sample; the disease status of the training samples in Example 1 is a vector, with 1 for CAD and 0 for control) and a testset (just the MLG relative abundance profile of the test set).
- The inventors used the randomForest function from the randomForest package in R to build the classifier, and the predict function was used to predict the testset.
- The output is a matrix containing the prediction results (the first column, “0”, is the probability of health; the second column, “1”, is the probability of CAD; the cutoff is 0.5, and if the probability of CAD is ≥ 0.5, the subject is at risk of CAD).
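Reading the two-column output can be sketched as follows. The function name and the probabilities are made up, and the call is positive when the probability of CAD is at or above the cutoff, matching the "at least 0.5" wording used for the cutoff elsewhere in this document.

```python
def classify(prob_matrix, cutoff=0.5):
    """Turn rows of (P(healthy), P(CAD)) into calls: 1 = at risk of CAD, 0 = not."""
    calls = []
    for p_healthy, p_cad in prob_matrix:
        # Each row holds two class probabilities, so they should sum to 1.
        if abs(p_healthy + p_cad - 1.0) > 1e-9:
            raise ValueError("row probabilities should sum to 1")
        calls.append(1 if p_cad >= cutoff else 0)
    return calls


# Made-up prediction output for three test subjects.
toy_output = [(0.8, 0.2), (0.5, 0.5), (0.1, 0.9)]
toy_calls = classify(toy_output)
```

The middle subject sits exactly on the cutoff and is therefore called at risk under the "at least 0.5" reading.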
- The inventors used the 65 selected MLGs to rerun the random forest, and the probability of illness was then calculated (Table 11, Fig. 3 Testset).
- The 4 Streptococcus microbes: Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis
- FN false negative
- FP false positive
- The area under the ROC curve was 81.94% (95% CI: 72.98%-90.9%) in the test set.
- The inventors have identified and validated 65 CAD-associated gut microbes and 4 optimized gut microbes by a random forest model based on CAD-associated gene markers, and have constructed a method to evaluate the risk of CAD based on these 65 CAD-associated gut microbes and 4 optimized gut microbes.
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Organic Chemistry (AREA)
- Health & Medical Sciences (AREA)
- Analytical Chemistry (AREA)
- Wood Science & Technology (AREA)
- Zoology (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Molecular Biology (AREA)
- Physics & Mathematics (AREA)
- Biophysics (AREA)
- Biochemistry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Pathology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Provided are biomarkers and methods for predicting the risk of a disease related to microbes, in particular coronary artery disease (CAD) or related heart diseases.
Description
CROSS-REFERENCE TO RELATED APPLICATION
None
The present invention relates to biomarkers and methods for predicting the risk of a disease related to microbes, in particular coronary artery disease (CAD) or related heart diseases.
Coronary artery disease (CAD) refers to any abnormal condition of the coronary arteries that interferes with the delivery of an adequate supply of blood to the cardiac (i.e., heart) muscle or any portion thereof. Typically, CAD is caused by the accumulation of plaque on the arterial walls (i.e., atherosclerosis), particularly in the large and medium-sized arteries serving the heart. These conditions have similar causes, mechanisms, and treatments. CAD represents the leading cause of death and morbidity worldwide. Early diagnosis of CAD will help not only to prevent mortality but also to reduce the costs of surgical intervention.
The “gold standard” for detecting CAD is invasive coronary angiography. However, this is costly and can pose a risk to the patient. Prior to angiography, non-invasive diagnostic modalities such as myocardial perfusion imaging (MPI) and CT-angiography may be used; however, these carry complications including radiation exposure and contrast agent sensitivity, and add only moderately to the identification of obstructive CAD.
Current knowledge indicates that genetic factors, environmental factors and their interactions jointly give rise to complex phenotypes and many diseases. Coronary artery disease (CAD), as one of the most influential complex diseases, has been increasingly investigated by GWAS in recent years, which attributed 10.6% of the inherent cause to 46 common variants (Ehret, G. B. et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103-109, incorporated herein by reference). However, our knowledge of the effect of environmental factors such as gut microbes, and of the contribution of genes and microbes to disease, still needs further development.
Our “forgotten organ”, the gut microbiota, plays a crucial role in our health in many respects, such as harvesting energy from food, producing important metabolites, promoting the development and maturation of the immune system, and protecting the host from pathogen infection. Recent studies suggested that flora dysbiosis, chronic inflammation and metabolic abnormality exist in the intestine in some metabolic diseases such as diabetes and obesity. The characteristics of most coronary artery disease are inflammation, oxidation and lipid metabolism, which might correlate with the gut microbes and their metabolites. Recent research indicates that gut microbes can metabolize red meat ingredients (L-carnitine, phosphatidylcholine, cholesterol) into TMA, which is further oxidized into TMAO in the liver; this gives rise to oxidation reactions in blood vessels that lead to inflammation and lipid deposition, ultimately resulting in atherosclerosis and coronary heart disease. Meanwhile, compared with healthy subjects, the gut microbiota of symptomatic atherosclerosis patients exhibits obvious abnormality (Koeth, R. A. et al. Intestinal microbiota metabolism of L-carnitine, a nutrient in red meat, promotes atherosclerosis. Nature medicine 19, 576-585, incorporated herein by reference). These studies suggest that dysbiosis of gut microbes may strongly influence the pathogenesis of coronary artery disease by inducing human metabolic abnormality. However, the characteristics of gut flora dysbiosis in coronary artery disease patients whose pathogenesis is induced by atherosclerosis, and its impact on the metabolic system, remain unclear.
SUMMARY
Embodiments of the present disclosure seek to solve at least one of the problems existing in the prior art to at least some extent.
The present invention is based on the following findings by the inventors:
Assessment and characterization of gut microbiota has become a major research area in human disease, including coronary artery disease (CAD) . To carry out analysis on gut microbial content in CAD patients, the inventors carried out a protocol for a Metagenome-Wide Association Study (MGWAS) (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012) , incorporated herein by reference) based on deep shotgun sequencing of the gut microbial DNA from 165 individuals. The inventors identified and validated 65 CAD-associated gut microbes and 4 optimized gut microbes. To exploit the potential ability of CAD classification by gut microbiota, the inventors calculated probability of illness through a random forest model based on the 65 CAD-associated gut microbes and 4 optimized gut microbes.
The inventors' data provide insight into the characteristics of the gut metagenome related to CAD risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant disorders, and the potential usefulness of a gut-microbiota-based approach for assessment of individuals at risk of such disorders.
In one aspect of the present disclosure, there is provided a biomarker set for predicting a disease related to microbiota in a subject, consisting of:
a gut biomarker comprising at least one of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544, or microbes with genomic DNA comprising at least a partial sequence of SEQ ID NO: 1 to 122009; alternatively, the biomarker set consists of at least one of the species listed in Table 4, preferably at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% of the species listed in Table 4;
preferably, at least one of Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis.
According to embodiments of the present disclosure, the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009 as stated in Table 5-1.
In another aspect of the present disclosure, there is provided a biomarker set for predicting a disease related to microbiota in a subject, consisting of:
a gut biomarker comprising at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the disease is coronary artery disease or a related heart disease.
In another aspect of the present disclosure, there is provided a kit for determining the gene marker set of any one of claims 1 to 4, comprising primers used for PCR amplification and designed according to the DNA sequence as set forth below:
the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
In another aspect of the present disclosure, there is provided a kit for determining the gene marker set described above, comprising one or more probes designed according to the genes as set forth below:
the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
In another aspect of the present disclosure, there is provided use of the gene marker set described above for predicting the risk of coronary artery disease (CAD) or a related disorder in a subject to be tested, comprising:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set described above in the sample obtained in step (1);
(3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model,
wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
According to embodiments of the present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
According to embodiments of the present disclosure, the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in the sample, and the sample disease status is a vector, with 1 for CAD and 0 for control.
According to embodiments of the present disclosure, the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
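By way of non-limiting illustration only, the layout of the training dataset described above (rows as biomarkers, columns as samples, and a 0/1 disease status vector) may be sketched as follows. The marker names, sample identifiers and abundance values are hypothetical placeholders and are not taken from Tables 6-1 to 6-5; the helper function names are likewise assumptions introduced for this sketch.

```python
# Hypothetical toy data: rows are biomarkers, columns are samples.
biomarkers = ["marker_A", "marker_B", "marker_C"]   # placeholder names
samples = ["S1", "S2", "S3", "S4"]                  # placeholder sample IDs
abundance = [
    [0.10, 0.00, 0.25, 0.05],   # marker_A across S1..S4
    [0.02, 0.30, 0.01, 0.20],   # marker_B across S1..S4
    [0.00, 0.07, 0.00, 0.09],   # marker_C across S1..S4
]
status = [1, 0, 1, 0]           # 1 = CAD, 0 = control, one entry per column


def check_layout(matrix, marker_names, sample_ids, labels):
    """Return True if the matrix, names and status vector are mutually consistent."""
    if len(matrix) != len(marker_names):
        return False
    if any(len(row) != len(sample_ids) for row in matrix):
        return False
    return len(labels) == len(sample_ids) and all(y in (0, 1) for y in labels)


def per_sample_features(matrix):
    """Transpose the biomarker-by-sample matrix into one feature vector per sample."""
    return [list(column) for column in zip(*matrix)]
```

Most classifier implementations expect one feature vector per sample, which is why the sketch includes the transposition step.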
In another aspect of the present disclosure, there is provided use of the gene marker set described above for preparation of a kit for predicting the risk of coronary artery disease (CAD) or a related disorder in a subject to be tested, comprising:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set described above in the sample obtained in step (1);
(3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model,
wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
According to embodiments of the present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
According to embodiments of the present disclosure, the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in the sample, and the sample disease status is a vector, with 1 for CAD and 0 for control.
According to embodiments of the present disclosure, the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
In another aspect of the present disclosure, there is provided a method of diagnosing whether a subject has an abnormal condition related to microbiota or is at risk of developing an abnormal condition related to microbiota, comprising:
determining the relative abundance of the biomarkers described above in a sample from the subject, and
determining whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota based on the relative abundance.
According to embodiments of the present disclosure, the method comprises:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set described above in the sample obtained in step (1);
(3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model,
wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
According to embodiments of the present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
According to embodiments of the present disclosure, the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in the sample, and the sample disease status is a vector, with 1 for CAD and 0 for control.
According to embodiments of the present disclosure, the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
It is believed that the 65 CAD-associated gut microbes and the 4 optimized gut microbes are valuable for increasing CAD detection at earlier stages for the following reasons. First, the markers of the present invention are more specific and sensitive than conventional markers. Second, analysis of stool promises accuracy, safety, affordability and patient compliance, and stool samples are transportable. Thus, the present invention relates to an in vitro method that is comfortable and noninvasive, so people will participate more readily in a given screening program. Third, the markers of the present invention may also serve as tools for therapy monitoring in CAD patients to detect the response to therapy.
BRIEF DESCRIPTION OF DRAWINGS
These and other aspects and advantages of the present disclosure will become apparent and more readily appreciated from the following descriptions taken in conjunction with the drawings, in which:
Fig. 1 Density histogram showing the P-value distribution of all genes identified in the study cohorts. The horizontal line represents the distribution of P-values under the null hypothesis.
Fig. 2 The 65 most discriminant MLGs in the Random Forest model using 126 MLG markers. The bar length indicates the importance of each variable (MLG species).
Fig. 3 Performance of 65 MLG Random Forest models. 165 samples (88 cases, 77 controls) formed the training set and another 86 samples (29 cases, 57 controls) formed the test set used for validation, with a false negative rate of 2/29 and a false positive rate of 12/57.
Fig. 4 Identification of ACVD-associated markers from gut metagenome. Performance of 65 MLG Random Forest models, 165 samples (88 cases and 77 controls) were applied as the training sets (AUC=98.17%) . The area between the two outside curves represents the 95% CI shape.
Fig. 5 Identification of ACVD-associated markers from gut metagenome. Performance of 4 MLG Random Forest models, 165 samples (88 cases and 77 controls) were applied as the training sets (AUC=85.86%) . The area between the two outside curves represents the 95% CI shape.
EXAMPLES
Terms used herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as “a” , “an” and “the” are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.
The present invention is further exemplified in the following non-limiting Examples. Unless otherwise stated, parts and percentages are by weight and degrees are Celsius. As apparent to one of ordinary skill in the art, these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only, and the agents were all commercially available.
Example 1. Identifying biomarkers for evaluating coronary artery disease risk
1.1 Sample collection
Fecal samples from 165 south Chinese subjects, including 88 atherosclerotic cardiovascular disease (ACVD) patients and 77 control subjects (training set, Table 1), were collected by Guangdong Provincial People's Hospital in 2011. ACVD patients were diagnosed and categorized according to pathological features (coronary angiography). Subjects were asked to collect fresh fecal samples at the hospital. Collected samples were placed in sterile tubes and immediately stored at -80℃ until further analysis.
Complete ethical approval was obtained, and all patients gave written informed consent. The study was approved by the Institutional Review Board of Guangdong General Hospital.
Table 1 Baseline characteristics of atherosclerotic cardiovascular disease (ACVD) cases and controls. Fourth column reports results from Wilcoxon rank-sum tests.
NOTE: Gender was unknown for one of the 88 patients and for two of the 77 controls.
1.2 DNA extraction
Fecal samples were thawed on ice and DNA extraction was performed using the Qiagen QIAamp DNA Stool Mini Kit (Qiagen) according to the manufacturer's instructions. Extracts were treated with DNase-free RNase to eliminate RNA contamination. DNA quantity was determined using a NanoDrop spectrophotometer, a Qubit Fluorometer (with the Quant-iT™ dsDNA BR Assay Kit) and gel electrophoresis.
1.3 DNA library construction and sequencing of fecal samples
DNA library construction was performed following the manufacturer's instructions (Illumina). The inventors used the same workflow as described previously to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of the sequencing primers. The inventors constructed one paired-end (PE) library with an insert size of 350 bp for each sample, followed by high-throughput sequencing to obtain around 30 million PE reads of length 2x100 bp. High-quality reads were obtained by filtering out low-quality reads with ambiguous 'N' bases, adapter contamination and human DNA contamination from the Illumina raw reads, and by simultaneously trimming low-quality terminal bases of reads.
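By way of non-limiting illustration only, two of the read-cleaning steps named above (discarding reads with ambiguous N bases and trimming low-quality terminal bases) may be sketched as follows. The quality threshold `min_q`, the minimum retained length `min_len`, and the function names are assumptions made for this sketch; the present disclosure does not state the thresholds actually used, and adapter and human-read removal are not shown.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases whose quality is below min_q from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def clean_read(seq, quals, min_q=20, min_len=30):
    """Return the trimmed (seq, quals), or None if the read is discarded.

    min_q and min_len are illustrative values, not values from the disclosure.
    """
    if "N" in seq.upper():          # ambiguous base: discard the whole read
        return None
    seq, quals = trim_low_quality_ends(seq, quals, min_q)
    if len(seq) < min_len:          # too short after trimming: discard
        return None
    return seq, quals
```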
In total, the inventors generated about 4.77 Gb per sample of high-quality clean fecal microbiota sequencing data (Table 2) from the 165 samples (88 cases and 77 controls) on the Illumina HiSeq 2000 platform.
Table 2 Summary of metagenomic data. Fourth column reports results from Wilcoxon rank-sum tests.
Parameter | Controls | Cases | P-value |
Average raw bases (G) | 4.85 | 4.92 | 0.831 |
After removing low quality bases | 4.76 (98.14%) | 4.79 (97.36%) | |
After removing human reads | 4.73 (97.53%) | 4.78 (97.15%) | 0.874 |
1.4 Metagenomic data processing and analysis
1.4.1 Gene catalogue construction
Gene catalogue construction. Employing the same parameters that were used to construct the type 2 diabetes gene catalogue (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012), incorporated herein by reference), the inventors performed de novo assembly and gene prediction for the high-quality reads of the 165 samples using SOAPdenovo v1.06 (Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20, 265-272, doi: 10.1101/gr.097261.109 (2009), incorporated herein by reference) and GeneMark v2.7 (Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic acids research 38, e132, doi: 10.1093/nar/gkq275 (2010), incorporated herein by reference), respectively. All predicted genes were aligned pairwise using BLAT, and genes for which over 90% of their length could be aligned to another gene with more than 95% identity (no gaps allowed) were removed as redundant, resulting in a non-redundant gene catalogue comprising 4,537,046 genes (the 4.5M gene catalogue).
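By way of non-limiting illustration only, the redundancy criterion stated above (over 90% of a gene's length aligned to another gene at more than 95% identity) may be sketched as follows. The sketch assumes that pairwise alignments have already been computed and are supplied as tuples, and it assumes a greedy longest-first order for choosing which gene of a redundant pair is kept; neither the input format nor the tie-breaking order is stated in the present disclosure.

```python
def remove_redundant(gene_lengths, alignments, min_cov=0.90, min_ident=0.95):
    """Return the genes kept after redundancy removal.

    gene_lengths: dict mapping gene id -> gene length (bp)
    alignments:   iterable of (query, target, aligned_len, identity) tuples,
                  a hypothetical summary of precomputed pairwise alignments
    """
    hits = {}
    for query, target, aligned_len, identity in alignments:
        hits.setdefault(query, []).append((target, aligned_len, identity))

    kept, kept_set = [], set()
    # Assumption: visit longer genes first so the longest copy is retained.
    for gene in sorted(gene_lengths, key=lambda g: (-gene_lengths[g], g)):
        redundant = any(
            target in kept_set
            and aligned_len / gene_lengths[gene] > min_cov
            and identity > min_ident
            for target, aligned_len, identity in hits.get(gene, [])
        )
        if not redundant:
            kept.append(gene)
            kept_set.add(gene)
    return kept
```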
Taxonomic assignment of genes. Taxonomic assignment of the predicted genes was performed using an in-house pipeline that was described in the published T2D paper (Qin et al. 2012, supra).
1.4.2 Data profile construction
Gene profile. These 4,537,046 genes and their associated relative abundances in the 165 samples were used to establish the gene profile for the association study (the inventors used the same method described in the published T2D paper (Qin et al. 2012, supra) to compute the relative gene abundance).
IMG species and mOTU species profiles. Total fecal clean reads were aligned to the 4,653 reference genomes from IMG v400 (Markowitz, V. M. et al. IMG: the integrated microbial genomes database and comparative analysis system. Nucleic acids research 40, D115-D122 (2012), incorporated herein by reference) and to the 79,268 sequences of the mOTU reference (Sunagawa, S. et al. Metagenomic species profiling using universal phylogenetic marker genes. Nature methods 10, 1196-1199 (2013), incorporated herein by reference) with default parameters, respectively. 1,290 IMG species (species that were shared among at least 10 subjects) and 560 species-level mOTUs were identified.
1.4.3 Analysis of factors influencing gut microbiota gene profile. The inventors used permutational multivariate analysis of variance (PERMANOVA) to assess the effect of 25 different characteristics, including CAD status, HDLC, CHOL, gender, FBG, hypertension, APOB, age, CREA, LDLC, HbA1c, APOA, TP, diabetes, ALB, TRIG, BMI, WHR, Lpa, HBDH, CKMB, AST, CK, ProBNP_E_ and ALT, on the gene profiles of the 4.5M reference gene catalogue. The inventors performed the analysis using the method implemented in the “vegan” package in R, and the permuted p-value was obtained from 10,000 permutations. The inventors also corrected for multiple testing using “p.adjust” in R with the Benjamini-Hochberg method to obtain the q-value for each test. PERMANOVA identified two significant factors associated with gut microbes (based on gene profiles) (q<0.05, Table 3). The analysis indicated that CAD status and HDLC were the most strongly associated factors, supporting disease status as the major determinant influencing the composition of the gut microbiota. Gender, age and some CAD clinical indices, such as CHOL, FBG, hypertension and APOB, were also significant factors.
Table 3 PERMANOVA of the gene profile based on Euclidean distance. The analysis was conducted to test whether clinical parameters and ACVD status have a significant impact on the gut microbiota (q-value<0.05).
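By way of non-limiting illustration only, the Benjamini-Hochberg adjustment used to obtain q-values from p-values may be sketched in plain Python as follows. The inventors used “p.adjust” in R; this sketch is an independent re-implementation of the standard step-up procedure, and the function name is an assumption introduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues
```

A factor would then be called significant when its q-value falls below the chosen threshold, for example q<0.05 as in Table 3.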
1.4.4 Identification of ACVD associated markers
Identification of ACVD associated genes. To identify associations between the metagenomic profile and ACVD, a two-tailed Wilcoxon rank-sum test was applied to the profiles of 2.1M high-occurrence genes (genes that were present in fewer than 10 of the 165 samples were removed). 438,750 gene markers (20.48% of the 2.1M genes) were obtained, which were enriched in either cases or controls with p-value<0.01, FDR=2.23% (Fig. 1).
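By way of non-limiting illustration only, the occurrence filter applied before testing (removing genes present in fewer than 10 samples) may be sketched as follows. The sketch assumes that a gene counts as present in a sample when its relative abundance is greater than zero, which is an assumption not stated in the present disclosure; the function name and input format are likewise hypothetical.

```python
def filter_by_occurrence(gene_profile, min_samples=10):
    """Keep genes detected in at least min_samples samples.

    gene_profile: dict mapping gene id -> list of relative abundances,
                  one value per sample (hypothetical input format).
    """
    return {
        gene: values
        for gene, values in gene_profile.items()
        if sum(1 for v in values if v > 0) >= min_samples
    }
```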
Estimating the false discovery rate (FDR). Instead of a sequential p-value rejection method, the inventors applied the “q-value” method proposed in a previous study to estimate the FDR (Storey, J. D. A direct approach to false discovery rates. Journal of the Royal Statistical Society 64, 479-498 (2002), incorporated herein by reference).
Receiver Operating Characteristic (ROC) analysis. The inventors applied ROC analysis to assess the performance of the ACVD classification based on metagenomic markers. The inventors then used the “pROC” package in R to draw the ROC curve.
1.4.5 Construction of MLG and identification of ACVD associated MLG species markers
126 MLG species based on the 438,750 ACVD-associated marker gene profile. The inventors used the 438,750 gene markers to build metagenomic linkage groups (MLGs) using the same method described in the published T2D paper (Qin et al. 2012, supra). All 438,750 genes were annotated by aligning them to the 4,653 reference genomes in IMG v400. An MLG was assigned to a genome if more than 50% of its constituent genes were annotated to that genome; otherwise it was termed unclassified. In total, 136 MLG genomes with more than 550 genes were selected; MLG genomes belonging to the same species were grouped to construct MLG species, and the inventors finally obtained 127 MLG species. The inventors applied the Wilcoxon rank-sum test with Benjamini-Hochberg adjustment to the 127 MLG species, and 126 MLGs were selected as ACVD-associated MLGs with q<0.05. To estimate the relative abundance of an MLG species, the inventors calculated the average abundance of the genes of the MLG species after removing the 5% lowest and 5% highest abundant genes (Qin et al. 2012, supra).
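By way of non-limiting illustration only, the MLG abundance estimate described above (the mean gene abundance after removing the 5% lowest and 5% highest abundant genes) may be sketched as follows. Rounding the number of trimmed genes down to a whole number is an assumption made for this sketch, and the function name is hypothetical.

```python
def mlg_abundance(gene_abundances, trim_percent=5):
    """Trimmed mean of gene abundances within one MLG for one sample.

    Drops the trim_percent lowest and trim_percent highest values; the count
    dropped at each end is rounded down (an assumption of this sketch).
    """
    values = sorted(gene_abundances)
    if not values:
        raise ValueError("an MLG must contain at least one gene")
    k = len(values) * trim_percent // 100   # genes trimmed at each end
    trimmed = values[k:len(values) - k]
    return sum(trimmed) / len(trimmed)
```

Trimming both tails makes the estimate less sensitive to a few genes with unusually high or low abundance.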
In total, the inventors built 136 metagenomic linkage groups (MLGs, each with >550 genes) based on the distribution and the occurrence rate (Qin et al. 2012, supra) of the 438,750 genes; 94.8% of the significant genes (P-value<0.01) were included in MLGs. The 136 MLGs (each with >550 genes, >50% coverage and q<0.05) were annotated against the NCBI database, and MLGs from the same species were grouped to obtain 126 MLG species.
65 MLG species marker identification. To identify MLG species markers, the inventors used the “randomForest 4.5-36” package in R version 2.10 based on the 126 ACVD-associated MLG species. First, the inventors sorted all 126 MLG species by the importance given by the “randomForest” method (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference). MLG marker sets were constructed by creating incremental subsets of the top-ranked MLG species, starting from 5 MLG species and ending at all 126 MLG species. For each MLG marker set, the inventors calculated the false prediction ratio in the cohort of 165 Chinese subjects. Finally, the set of 65 MLG species with the lowest false prediction ratio was selected as the MLG species markers (Fig. 2, Table 4 and Tables 5-1 and 5-2), with a false negative (FN) rate of 6.81% (6/88) and a false positive (FP) rate of 3.89% (3/77) (Fig. 3, Trainset). Furthermore, the inventors drew the ROC curve using the OOB (out-of-bag) prediction probability of illness from the randomForest model based on the selected MLG species markers (Tables 6-1, 6-2, 6-3, 6-4 and 6-5) and calculated the area under the ROC curve (AUC) to be 98.17% (95% CI: 96.6%-99.74%) using the R package “pROC” (Fig. 4).
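By way of non-limiting illustration only, the incremental subset search described above (evaluating the top 5, top 6, and so on up to all ranked MLG species, and keeping the subset with the lowest false prediction ratio) may be sketched as follows. The callable `error_of` is a hypothetical stand-in for training a classifier on a subset and returning its false prediction ratio; the classifier itself is not implemented here, and keeping the smallest subset when ratios tie is an assumption of this sketch.

```python
def select_marker_set(ranked_markers, error_of, start=5):
    """Pick the top-k prefix of ranked_markers with the lowest error.

    ranked_markers: markers sorted by decreasing importance
    error_of:       hypothetical callable, subset -> false prediction ratio
    start:          size of the smallest subset evaluated
    """
    best_subset, best_error = None, float("inf")
    for k in range(start, len(ranked_markers) + 1):
        subset = ranked_markers[:k]
        err = error_of(subset)
        if err < best_error:        # strict '<' keeps the smallest subset on ties
            best_subset, best_error = subset, err
    return best_subset, best_error
```

In practice `error_of` could wrap any classifier; in the present disclosure the classifier is a random forest built with the randomForest package in R.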
Among the 65 MLG species, the control-enriched MLG species Bacteroides uniformis (q=4.21E-11), Bacteroides vulgatus (q=1.80E-09) and Clostridiales sp. SS3/4 (q=1.68E-08) are known SCFA-producing bacteria. Most of the case-enriched MLG species (51 in total) were opportunistic pathogens from Streptococcus (9 of 11 MLG species were oral pathogens), Clostridium (6 MLG species), Ruminococcus (4 MLG species) and Lactobacillus (4 MLG species). Rothia mucilaginosa naturally inhabits the oral cavity and upper respiratory tract and is increasingly recognized as an emerging opportunistic pathogen associated with prosthetic device infections and endocarditis. Clostridium bolteae, which has been isolated from human fecal material, blood and intra-abdominal abscesses, is a Gram-positive pathogen that can produce toxins, including a neurotoxin; it is encountered in clinically significant infections in humans, and its mean counts in autistic children were 46-fold (P-value=0.01) greater than those in control children. Gemella sanguinis could strengthen inflammation in immunodeficient patients. Akkermansia muciniphila was also enriched in CAD patients.
1.4.6 Identification of ACVD associated IMG species and mOTU species. Based on the IMG species and mOTU species profiles, the inventors identified the ACVD-associated IMG species and mOTU species with q<0.05 (Wilcoxon rank-sum test with Benjamini-Hochberg adjustment). Subsequently, IMG species markers and mOTU species markers were selected using the random forest approach, as in the MLG species marker selection.
65 IMG species (area under the ROC curve 98.52%) and 15 mOTU species (area under the ROC curve 96.16%) also clearly separated CAD patients from healthy subjects (q<0.05; see Tables 7 and 8) by Wilcoxon rank-sum test and random forest selection. By overlapping these with the 65 MLG markers, the inventors found that pathogens of oral origin, including Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis, as well as Akkermansia muciniphila, were significantly enriched in cases.
The inventors drew the ROC curve using the OOB (out-of-bag) prediction probability of illness from the randomForest model based on the 4 microbes from Streptococcus (Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis) as a biomarker (Table 9) and calculated the area under the ROC curve to be 85.86% (95% CI: 80.24%-91.48%) using the R package “pROC” (Fig. 5). The false negative (FN) rate was 28.40% (25/88) and the false positive (FP) rate was 20.77% (16/77).
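By way of non-limiting illustration only, the two performance measures reported above (the area under the ROC curve and the false negative and false positive rates) may be computed from predicted probabilities as sketched below. The inventors used the R package “pROC”; this sketch instead uses the equivalent rank-based (Mann-Whitney) formulation of the AUC, and it assumes the 0.5 probability cutoff stated elsewhere in this disclosure. The function names and the toy inputs in any usage are hypothetical.

```python
def auc(probabilities, labels):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    pos = [p for p, y in zip(probabilities, labels) if y == 1]
    neg = [p for p, y in zip(probabilities, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def error_rates(probabilities, labels, cutoff=0.5):
    """Return (false negative rate, false positive rate) at the given cutoff."""
    pairs = list(zip(probabilities, labels))
    n_case = sum(1 for _, y in pairs if y == 1)
    n_ctrl = sum(1 for _, y in pairs if y == 0)
    fn = sum(1 for p, y in pairs if y == 1 and p < cutoff)    # cases called healthy
    fp = sum(1 for p, y in pairs if y == 0 and p >= cutoff)   # controls called CAD
    return fn / n_case, fp / n_ctrl
```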
Table 5-1. SEQ ID of the 65 MLG species
MLG ID | SEQ ID NO: | genes number |
mlg_id: X127 | 1~2248 | 2248 |
mlg_id: X21 | 2249~2862 | 614 |
mlg_id: X63 | 2863~6837 | 3975 |
mlg_id: X118 | 6838~7608 | 771 |
mlg_id: X102 | 7609~9275 | 1667 |
mlg_id: X26 | 9276~10648 | 1373 |
mlg_id: X80 | 10649~13331 | 2683 |
mlg_id: X72 | 13332~15672 | 2341 |
mlg_id: X125 | 15673~18620 | 2948 |
mlg_id: X44 | 18621~19520 | 900 |
mlg_id: X74 | 19521~20240 | 720 |
mlg_id: X84 | 20241~20801 | 561 |
mlg_id: X95 | 20802~23660 | 2859 |
mlg_id: X108 | 23661~26771 | 3111 |
mlg_id: X115 | 26772~30163 | 3392 |
mlg_id: X92 | 30164~32611 | 2448 |
mlg_id: X109 | 32612~35767 | 3156 |
mlg_id: X103 | 35768~38880 | 3113 |
mlg_id: X10 | 38881~39514 | 634 |
mlg_id: X11 | 39515~40260 | 746 |
mlg_id: X91 | 40261~41635 | 1375 |
mlg_id: X48 | 41636~42428 | 793 |
mlg_id: X107 | 42429~43644 | 1216 |
mlg_id: X93 | 43645~45587 | 1943 |
mlg_id: X77 | 45588~46386 | 799 |
mlg_id: X65 | 46387~48007 | 1621 |
mlg_id: X123 | 48008~50806 | 2799 |
mlg_id: X50 | 50807~51375 | 569 |
mlg_id: X39 | 51376~52814 | 1439 |
mlg_id: X64 | 52815~59026 | 6212 |
mlg_id: X114 | 59027~60592 | 1566 |
mlg_id: X101 | 60593~63979 | 3387 |
mlg_id: X86 | 63980~64989 | 1010 |
mlg_id: X76 | 64990~67004 | 2015 |
mlg_id: X70 | 67005~68492 | 1488 |
mlg_id: X68 | 68493~70925 | 2433 |
mlg_id: X17 | 70926~72312 | 1387 |
mlg_id: X2 | 72313~72990 | 678 |
mlg_id: X1 | 72991~75225 | 2235 |
mlg_id: X88 | 75226~76581 | 1356 |
mlg_id: X116 | 76582~79319 | 2738 |
mlg_id: X82 | 79320~81414 | 2095 |
mlg_id: X25 | 81415~83112 | 1698 |
mlg_id: X28 | 83113~85682 | 2570 |
mlg_id: X75 | 85683~87229 | 1547 |
mlg_id: X40 | 87230~88951 | 1722 |
mlg_id: X83 | 88952~90306 | 1355 |
mlg_id: X59 | 90307~91116 | 810 |
mlg_id: X69 | 91117~92589 | 1473 |
mlg_id: X24 | 92590~93249 | 660 |
mlg_id: X104 | 93250~97356 | 4107 |
mlg_id: X124 | 97357~100151 | 2795 |
mlg_id: X79 | 100152~103376 | 3225 |
mlg_id: X13 | 103377~104672 | 1296 |
mlg_id: X96 | 104673~106081 | 1409 |
mlg_id: X105 | 106082~106998 | 917 |
mlg_id: X6 | 106999~107917 | 919 |
mlg_id: X85 | 107918~109895 | 1978 |
mlg_id: X3 | 109896~110770 | 875 |
mlg_id: X94 | 110771~114337 | 3567 |
mlg_id: X111 | 114338~115109 | 772 |
mlg_id: X8 | 115110~116480 | 1371 |
mlg_id: X98 | 116481~119348 | 2868 |
mlg_id: X4 | 119349~120420 | 1072 |
mlg_id: X5 | 120421~122009 | 1589 |
Table 5-2. SEQ ID of the 4 MLG species
MLG ID | SEQ ID NO: | genes number |
mlg_id: | 1~1356 | 1356 |
mlg_id: X68 | 1357~3789 | 2433 |
mlg_id: X96 | 3790~5198 | 1409 |
mlg_id: X82 | 5199~7293 | 2095 |
Table 9 Relative abundance profiles of the 4 MLGs in 165 samples
Example 2. Validating the biomarkers in another 86 individuals
To validate the discriminatory power of the biomarkers, namely the 65 selected MLGs and the 4 microbes from Streptococcus, the inventors used a new independent study group of 29 case samples and 57 control samples as the test set (Table 10); these samples were also collected at Guangdong Provincial People's Hospital.
Table 10. Sample information
Group | case | control | total number |
Test set | 29 | 57 | 86 |
For each sample, DNA was extracted and a DNA library was constructed followed by high throughput sequencing as described in Example 1. The inventors estimated the relative abundance of a MLG in all samples by using the relative abundance values of genes from this MLG (Qin et al. 2012, supra) .
For the randomForest model, the “randomForest 4.5-36” package in R version 2.10 was used. The input is a training dataset (namely Tables 6-1, 6-2, 6-3, 6-4 and 6-5, or Table 9, respectively), the sample disease status (the training dataset is a matrix in which each row represents an MLG, each column represents a sample, and each cell represents the relative abundance of an MLG in a sample; the disease status of the training samples in Example 1 is a vector, with 1 for CAD and 0 for control), and a test set (just the MLG relative abundance profile of the test set). The inventors then used the randomForest function from the randomForest package in R to build the classifier, and the predict function was used to predict the test set. The output is a matrix containing the prediction results (the first column, “0”, is the probability of health; the second column, “1”, is the probability of CAD; the cutoff is 0.5, and if the probability of CAD is ≥0.5, the subject is at risk of CAD).
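By way of non-limiting illustration only, reading the two-column prediction matrix described above and applying the 0.5 cutoff may be sketched as follows. The sample identifiers and probability values in any usage are hypothetical, and the function name is an assumption introduced here; the sketch does not train a random forest and only interprets its output.

```python
def classify(prob_matrix, sample_ids, cutoff=0.5):
    """Map each sample id to True when it is called at risk of CAD.

    prob_matrix: one row per sample, [P(health), P(CAD)], mirroring the
                 columns named "0" and "1" in the prediction output.
    """
    calls = {}
    for sample_id, row in zip(sample_ids, prob_matrix):
        p_cad = row[1]                      # second column: probability of CAD
        calls[sample_id] = p_cad >= cutoff  # at or above the cutoff: at risk
    return calls
```

For example, a row of [0.5, 0.5] is called at risk, because the stated rule uses “≥0.5”.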
The inventors used the 65 selected MLGs to redo random forest and then probability of illness was calculated (Table 11, Fig. 3 Testset) . The model was tested on the test set (n=86, 29 case samples and 57 control samples) and prediction error was calculated. False negative (FN) rate was 6.89% (2/29) and false positive (FP) rate was 21.05% (12/57) , and the area under the ROC curve was 94.34% (95% CI: 89.86%-98.83%) .
Furthermore, the inventors used 4 Streptococcus species (Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis) as a biomarker set to test their power to separate CAD patients from controls (Table 11). In the test set, the false negative (FN) rate was 17.24% (5/29), the false positive (FP) rate was 35.08% (20/57), and the area under the ROC curve was 81.94% (95% CI: 72.98%-90.9%).
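For readers unfamiliar with the area under the ROC curve, one common way to compute it is as the probability that a randomly chosen case receives a higher predicted CAD probability than a randomly chosen control, with ties counted as one half. The sketch below uses that pairwise definition; the function name `roc_auc` and the scores are hypothetical, and the text does not state which software produced the reported AUC values or confidence intervals.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (case, control) pairs in which the case has the
    higher score; tied pairs contribute one half.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted CAD probabilities.
cases = [0.9, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.4]
auc = roc_auc(cases, controls)
```

An AUC of 1.0 means every case outranks every control, while 0.5 corresponds to no discrimination.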
Table 11 Prediction results of 65 MLGs and 4 MLGs
Thus the inventors have identified and validated 65 CAD-associated gut microbes and 4 optimized gut microbes by a random forest model based on CAD-associated gene markers, and have constructed a method to evaluate the risk of CAD based on these 65 CAD-associated gut microbes and 4 optimized gut microbes.
Although explanatory embodiments have been shown and described, it will be appreciated by those skilled in the art that the above embodiments cannot be construed to limit the present disclosure, and that changes, alternatives, and modifications can be made in the embodiments without departing from the spirit, principles and scope of the present disclosure.
Claims (22)
- A biomarker set for predicting a disease related to microbiota in a subject consisting of: Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544.
- The biomarker set for predicting a disease related to microbiota in a subject according to claim 1, comprising at least a partial sequence of SEQ ID NO: 1 to 122009.
- A biomarker set for predicting a disease related to microbiota in a subject consisting of: a gut biomarker comprising at least a partial sequence of SEQ ID NO: 1 to 122009.
- The biomarker set for predicting a disease related to microbiota in a subject, wherein the disease is coronary artery disease or related heart disease.
- A kit for determining the gene marker set of any one of claims 1 to 4, comprising primers used for PCR amplification and designed according to the DNA sequence as set forth in claim 3.
- A kit for determining the gene marker set of any one of claims 1 to 4, comprising one or more probes designed according to the genes as set forth in claim 3.
- Use of the gene marker set of any one of claims 1 to 4 for predicting the risk of coronary artery disease (CAD) or related disorder in a subject to be tested, comprising: (1) collecting a sample from the subject to be tested; (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 4 in the samples obtained in step (1); (3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model, wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
- The use of claim 7, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
- The use of claim 8, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 4, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for CAD and 0 for control.
- The use of claim 8, wherein the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
- The use of claim 8, wherein the training dataset is Table 6-1, 6-2, 6-3, 6-4, 6-5, and the probability of CAD being at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
- Use of the gene marker set of any one of claims 1 to 4 for preparation of a kit for predicting the risk of coronary artery disease (CAD) or related disorder in a subject to be tested, comprising: (1) collecting a sample from the subject to be tested; (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 4 in the samples obtained in step (1); (3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model, wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
- The use of claim 12, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
- The use of claim 13, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 4, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for CAD and 0 for control.
- The use of claim 13, wherein the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
- The use of claim 13, wherein the training dataset is Table 6-1, 6-2, 6-3, 6-4, 6-5, and the probability of CAD being at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
- A method of diagnosing whether a subject has an abnormal condition related to microbiota or is at risk of developing an abnormal condition related to microbiota, comprising: determining the relative abundance of the biomarkers of any one of claims 1 to 4 in a sample from the subject, and determining whether the subject has an abnormal condition related to microbiota or is at risk of developing an abnormal condition related to microbiota based on the relative abundance.
- The method according to claim 17, comprising: (1) collecting a sample from the subject to be tested; (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 4 in the samples obtained in step (1); (3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model, wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
- The method of claim 18, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
- The method of claim 19, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 4, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for CAD and 0 for control.
- The method of claim 19, wherein the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
- The method of claim 19, wherein the training dataset is Table 6-1, 6-2, 6-3, 6-4, 6-5, and the probability of CAD being at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/088046 WO2016049920A1 (en) | 2014-09-30 | 2014-09-30 | Biomarkers for coronary artery disease |
CN201480082463.5A CN107075563B (en) | 2014-09-30 | 2014-09-30 | Biomarkers for coronary artery disease |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016049920A1 (en) | 2016-04-07 |
Family ID: 55629345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/088046 WO2016049920A1 (en) | 2014-09-30 | 2014-09-30 | Biomarkers for coronary artery disease |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107075563B (en) |
WO (1) | WO2016049920A1 (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10357521B2 (en) | 2015-05-14 | 2019-07-23 | University Of Puerto Rico | Methods for restoring microbiota of newborns |
CN110396537A (en) * | 2018-04-24 | 2019-11-01 | 深圳华大生命科学研究院 | Asthma biomarker and application thereof |
CN110914453A (en) * | 2017-07-31 | 2020-03-24 | 深圳华大生命科学研究院 | Biomarkers for atherosclerotic cardiovascular disease |
US11564667B2 (en) | 2015-12-28 | 2023-01-31 | New York University | Device and method of restoring microbiota of newborns |
CN115851910A (en) * | 2022-11-23 | 2023-03-28 | 湖州市中心医院 | Marker, system and application for diagnosing or predicting coronary heart disease |
WO2023049842A1 (en) * | 2021-09-23 | 2023-03-30 | Flagship Pioneering Innovations Vi, Llc | Diagnosis and treatment of diseases and conditions of the intestinal tract |
WO2023064923A3 (en) * | 2021-10-15 | 2023-06-29 | Mammoth Biosciences, Inc. | Fusion effector proteins and uses thereof |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019051678A1 (en) * | 2017-09-13 | 2019-03-21 | Bgi Shenzhen | Biomarker for atherosclerotic cardiovascular diseases |
WO2019071516A1 (en) * | 2017-10-12 | 2019-04-18 | Perfect (China) Co., Ltd. | Biomarkers for chronic hepatitis b and use thereof |
WO2019205188A1 (en) * | 2018-04-24 | 2019-10-31 | 深圳华大生命科学研究院 | Biomarker for depression and use thereof |
CN110396538B (en) * | 2018-04-24 | 2023-05-23 | 深圳华大生命科学研究院 | Migraine biomarkers and uses thereof |
CN110872632A (en) * | 2018-08-30 | 2020-03-10 | 深圳华大生命科学研究院 | Specific gene sequence of streptococcus pharyngolaris, detection primer and application thereof |
CN111004735A (en) * | 2019-03-21 | 2020-04-14 | 江南大学 | Lactobacillus fermentum and application thereof in improving intestinal health |
CN112710722A (en) * | 2019-10-26 | 2021-04-27 | 复旦大学 | Machine learning-based biomarker dimension expansion screening method |
CN112509701A (en) * | 2021-02-05 | 2021-03-16 | 中国医学科学院阜外医院 | Risk prediction method and device for acute coronary syndrome |
WO2022166934A1 (en) * | 2021-02-05 | 2022-08-11 | 中国医学科学院阜外医院 | Gut microbiota markers for evaluating onset risk of cardiovascular diseases and uses thereof |
CN112509700A (en) * | 2021-02-05 | 2021-03-16 | 中国医学科学院阜外医院 | Stable coronary heart disease risk prediction method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007137865A1 (en) * | 2006-06-01 | 2007-12-06 | University Of Zürich | The use of mrp 8/14 levels for discrimination of individuals at risk of acute coronary syndromes |
WO2008132763A2 (en) * | 2007-04-30 | 2008-11-06 | Decode Genetics Ehf | Genetic variants useful for risk assessment of coronary artery disease and myocardial infarction |
WO2014019271A1 (en) * | 2012-08-01 | 2014-02-06 | Bgi Shenzhen | Biomarkers for diabetes and usages thereof |
WO2014053608A1 (en) * | 2012-10-03 | 2014-04-10 | Metabogen Ab | Identification of a person having risk for atherosclerosis and associated diseases by the person's gut microbiome and the prevention of such diseases |
WO2014060538A1 (en) * | 2012-10-17 | 2014-04-24 | Institut National De La Recherche Agronomique | Determination of reduced gut bacterial diversity |
-
2014
- 2014-09-30 CN CN201480082463.5A patent/CN107075563B/en active Active
- 2014-09-30 WO PCT/CN2014/088046 patent/WO2016049920A1/en active Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10357521B2 (en) | 2015-05-14 | 2019-07-23 | University Of Puerto Rico | Methods for restoring microbiota of newborns |
US11564667B2 (en) | 2015-12-28 | 2023-01-31 | New York University | Device and method of restoring microbiota of newborns |
CN110914453A (en) * | 2017-07-31 | 2020-03-24 | 深圳华大生命科学研究院 | Biomarkers for atherosclerotic cardiovascular disease |
CN110914453B (en) * | 2017-07-31 | 2023-12-19 | 深圳华大生命科学研究院 | Biomarkers for atherosclerotic cardiovascular disease |
CN110396537A (en) * | 2018-04-24 | 2019-11-01 | 深圳华大生命科学研究院 | Asthma biomarker and application thereof |
CN110396537B (en) * | 2018-04-24 | 2023-06-20 | 深圳华大生命科学研究院 | Asthma biomarker and application thereof |
WO2023049842A1 (en) * | 2021-09-23 | 2023-03-30 | Flagship Pioneering Innovations Vi, Llc | Diagnosis and treatment of diseases and conditions of the intestinal tract |
WO2023064923A3 (en) * | 2021-10-15 | 2023-06-29 | Mammoth Biosciences, Inc. | Fusion effector proteins and uses thereof |
CN115851910A (en) * | 2022-11-23 | 2023-03-28 | 湖州市中心医院 | Marker, system and application for diagnosing or predicting coronary heart disease |
Also Published As
Publication number | Publication date |
---|---|
CN107075563B (en) | 2021-05-04 |
CN107075563A (en) | 2017-08-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14903200; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 14903200; Country of ref document: EP; Kind code of ref document: A1 |