WO2016049920A1 - Biomarkers for coronary artery disease - Google Patents

Publication number
WO2016049920A1
Authority
WO
WIPO (PCT)
Prior art keywords
cvd
streptococcus
cad
biomarker
con
Application number
PCT/CN2014/088046
Other languages
French (fr)
Inventor
Qiang FENG
Zhuye JIE
Huihua XIA
Jun Wang
Original Assignee
Bgi Shenzhen Co., Limited
Bgi Shenzhen
Application filed by Bgi Shenzhen Co., Limited, Bgi Shenzhen
Priority to PCT/CN2014/088046 (WO2016049920A1)
Priority to CN201480082463.5A (CN107075563B)
Publication of WO2016049920A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6888: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • the present invention relates to biomarkers and methods for predicting the risk of a disease related to microbes, in particular coronary artery disease (CAD) or related heart diseases.
  • CAD coronary artery disease
  • Coronary artery disease refers to any abnormal condition of the coronary arteries that interferes with the delivery of an adequate supply of blood to the cardiac (i.e., heart) muscle or any portion thereof.
  • CAD is caused by the accumulation of plaque on the arterial walls (i.e., atherosclerosis), particularly in the large and medium-sized arteries serving the heart. These conditions have similar causes, mechanisms, and treatments.
  • CAD represents the leading cause of death and morbidity worldwide. Early diagnosis of CAD will help not only to prevent mortality but also to reduce the costs of surgical intervention.
  • the “gold standard” for detecting CAD is invasive coronary angiography. However, this is costly and can pose a risk to the patient. Prior to angiography, non-invasive diagnostic modalities such as myocardial perfusion imaging (MPI) and CT-angiography may be used; however, these have complications, including radiation exposure and contrast agent sensitivity, and add only moderately to the identification of obstructive CAD.
  • MPI myocardial perfusion imaging
  • Coronary artery disease, as one of the most influential complex diseases, has been increasingly investigated by GWAS in recent years, which revealed 10.6% of the inherent cause through 46 common variations (Ehret, G. B. et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103-109, incorporated herein by reference).
  • the gut microbiota plays a crucial role in our health in many respects, such as extracting energy from food, producing important metabolites, promoting the development and maturation of the immune system, and protecting the host from pathogen infection, among others.
  • the characteristics for most coronary artery disease are inflammation, oxidation and lipid metabolism, which might potentially correlate with the gut microbes and their metabolites.
  • Embodiments of the present disclosure seek to solve at least one of the problems existing in the prior art to at least some extent.
  • the present invention is based on the following findings by the inventors:
  • MWAS Metagenome-Wide Association Study
  • the inventors calculated probability of illness through a random forest model based on the 65 CAD-associated gut microbes and 4 optimized gut microbes.
  • the inventors′ data provide insight into the characteristics of the gut metagenome related to CAD risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant disorders, and the potential usefulness for a gut-microbiota-based approach for assessment of individuals at risk of such disorders.
  • a biomarker set for predicting a disease related to microbiota in a subject consisting of:
  • a gut biomarker comprising at least one of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA
  • the biomarker set consists of at least one of the species listed in Table 4, preferably at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100% of the species listed in Table 4.
  • preferably, at least one of Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis.
  • the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009 as stated in Table 5-1.
  • biomarker set for predicting a disease related to microbiota in a subject consisting of:
  • a gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
  • the disease is coronary artery disease or related heart disease.
  • a kit for determining the gene marker set of any one of claims 1 to 4, comprising primers used for PCR amplification and designed according to the DNA sequence as set forth below:
  • the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
  • a kit for determining the gene marker set described above, comprising one or more probes designed according to the genes as set forth below:
  • the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
  • the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; optionally, the multivariate statistical model is a randomForest model.
  • the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in a sample; the sample disease status is a vector, with 1 for CAD and 0 for control.
  • the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
  • a method of diagnosing whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota comprising:
  • the method comprises:
  • the markers of the present invention are more specific and sensitive as compared with conventional markers.
  • analysis of stool promises accuracy, safety, affordability, and patient compliance, and stool samples are transportable.
  • the present invention relates to an in vitro method, which is comfortable and noninvasive, so people will participate in a given screening program more easily.
  • the markers of the present invention may also serve as tools for therapy monitoring in CAD patients to detect the response to therapy.
  • Fig. 1 Density histogram showing the P-value distribution of all genes identified in the study cohorts.
  • the horizontal line represents the distribution of P-values under the null hypothesis.
  • Fig. 2 The 65 most discriminant MLGs in the Random Forest model using 126 MLG markers.
  • the bar length indicates the importance of the variable (MLG species).
  • Fig. 3 Performance of the 65-MLG Random Forest model. 165 samples (88 cases, 77 controls) formed the training set, and another 86 samples (29 cases, 57 controls) formed the test set used for validation, with a false negative rate of 2/29 and a false positive rate of 12/57.
  • Example 1 Identifying biomarkers for evaluating coronary artery disease risk
  • ACVD atherosclerotic cardiovascular disease
  • DNA library construction was performed following the manufacturer's instructions (Illumina).
  • the inventors used the same workflow as described previously to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of the sequencing primers.
  • the inventors constructed one paired-end (PE) library with an insert size of 350 bp for each sample, followed by high-throughput sequencing to obtain around 30 million PE reads of length 2 × 100 bp.
  • High-quality reads were obtained by filtering out low-quality reads with ambiguous 'N' bases, adapter contamination and human DNA contamination from the Illumina raw reads, and by simultaneously trimming low-quality terminal bases of reads.
  • in total, the inventors generated about 4.77 Gb per sample of fecal microbiota sequencing data (high-quality clean data) (Table 2) from 165 samples (88 cases and 77 controls) on the Illumina HiSeq 2000 platform.
  • Taxonomic assignment of genes was performed using an in-house pipeline described in the published T2D paper (Qin et al. 2012, supra).
  • IMG species and mOTU species profiles were aligned to the 4,653 reference genomes from IMG v400 (Markowitz, V. M. et al. IMG: the integrated microbial genomes database and comparative analysis system. Nucleic acids research 40, D115-D122 (2012) , incorporated herein by reference) and to the 79268 sequences of mOTU reference (Sunagawa, S. et al. Metagenomic species profiling using universal phylogenetic marker genes. Nature methods 10, 1196-1199 (2013) , incorporated herein by reference) with default parameters, respectively. 1290 IMG species (species that were shared among at least 10 subjects) and 560 species level mOTUs were identified.
  • the inventors used permutational multivariate analysis of variance (PERMANOVA) to assess the effect of 25 different characteristics, including CAD status, HDLC, CHOL, Gender, FBG, hypertension, APOB, Age, CREA, LDLC, HbA1c, APOA, TP, diabetes, ALB, TRIG, BMI, WHR, Lpa, HBDH, CKMB, AST, CK, ProBNP_E_, ALT, on gene profiles of the 4.5M reference gene catalogue.
  • the inventors performed the analysis using the method implemented in the package "vegan" in R, and the permuted p-value was obtained from 10,000 permutations.
  • PERMANOVA identified two significant factors associated with gut microbes (based on gene profiles) (q < 0.05, Table 3).
  • FDR false discovery rate
  • Receiver Operator Characteristic (ROC) analysis: the inventors applied ROC analysis to assess the performance of the ACVD classification based on metagenomic markers. The inventors then used the “pROC” package in R to draw the ROC curve.
  • ROC Receiver Operator Characteristic
  • MLG metagenomic linkage group
  • the inventors used the 438,750 gene markers to build metagenomic linkage groups (MLGs) using the same method described in the published T2D paper (Qin et al. 2012, supra). All 438,750 genes were annotated by aligning them to the 4,653 reference genomes in IMG v400. An MLG was assigned to a genome if more than 50% of its constituent genes were annotated to that genome; otherwise it was termed unclassified. A total of 136 MLG genomes with a gene number > 550 were selected; MLG genomes belonging to the same species were grouped to construct MLG species, and finally the inventors obtained 127 MLG species.
  • the inventors applied the Wilcoxon rank-sum test with Benjamini-Hochberg adjustment to the 127 MLG species, and 126 MLGs were selected as ACVD-associated MLGs with q < 0.05. To estimate the relative abundance of an MLG species, the inventors calculated the average abundance of the genes of the MLG species after removing the 5% lowest and 5% highest abundant genes (Qin et al. 2012, supra).
  • Based on the distribution and the occurrence rate (Qin et al. 2012, supra) of the 438,750 genes, 94.8% of the significant genes (P-value < 0.01) were included in MLGs.
  • 136 MLGs (each > 550 genes, > 50% coverage and q < 0.05) were annotated against the NCBI database, and MLGs from the same species were grouped to give 126 MLG species.
  • MLG species marker identification: to identify MLG species markers among the 126 ACVD-associated MLG species, the inventors used the “randomForest 4.5-36” package in R version 2.10. First, the inventors sorted all 126 MLG species by the importance given by the “randomForest” method (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference). MLG marker sets were constructed by creating incremental subsets of the top-ranked MLG species, starting from 5 MLG species and ending at all 126 MLG species. For each MLG marker set, the inventors calculated the false prediction ratio in the cohort of 165 Chinese individuals.
  • the set of 65 MLG species with the lowest false prediction ratio was selected as the MLG species markers (Fig. 2, Table 4 and Tables 5-1 and 5-2), with a false negative (FN) rate of 6.81% (6/88) and a false positive (FP) rate of 3.89% (3/77) (Fig. 3, Trainset).
  • the inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the selected MLG species markers (Tables 6-1, 6-2, 6-3, 6-4 and 6-5); the area under the ROC curve (AUC), calculated using the R package “pROC”, was 98.17% (95% CI: 96.6%-99.74%) (Fig. 4).
  • Most of the case-enriched MLG species (51 in total) were opportunistic pathogens from Streptococcus (9 of 11 MLG species were oral pathogens), Clostridium (6 MLG species), Ruminococcus (4 MLG species) and Lactobacillus (4 MLG species).
  • Rothia mucilaginosa naturally inhabits the oral cavity and upper respiratory tract and is increasingly recognized as an emerging opportunistic pathogen associated with prosthetic device infections and endocarditis.
  • Gemella sanguinis could strengthen the inflammation in immunodeficiency patients.
  • Akkermansia muciniphila was also enriched in CAD patients.
  • IMG species and mOTU species marker identification: based on the IMG species and mOTU species profiles, the inventors identified the ACVD-associated IMG species and mOTU species with q < 0.05 (Wilcoxon rank-sum test with Benjamini-Hochberg adjustment). Subsequently, IMG species markers and mOTU species markers were selected using the random forest approach, as in the MLG species marker selection.
  • the inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the 4 microbes from Streptococcus (Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis) as a biomarker (Table 9); the area under the ROC curve, calculated using the R package “pROC”, was 85.86% (95% CI: 80.24%-91.48%) (Fig. 5).
  • the false negative (FN) rate was 28.40% (25/88) and false positive (FP) rate was 20.77% (16/77) .
  • mlg_id, range (start to end), number of entries in the range:
      X104, 93250 to 97356, 4107
      X124, 97357 to 100151, 2795
      X79, 100152 to 103376, 3225
      X13, 103377 to 104672, 1296
      X96, 104673 to 106081, 1409
      X105, 106082 to 106998, 917
      X6, 106999 to 107917, 919
      X85, 107918 to 109895, 1978
      X3, 109896 to 110770, 875
      X94, 110771 to 114337, 3567
      X111, 114338 to 115109, 772
      X8, 115110 to 116480, 1371
      X98, 116481 to 119348, 2868
  • the inventors used another independent study group of 29 case samples and 57 control samples as the test set (Table 10); these samples were also collected at Guangdong Provincial People's Hospital.
  • DNA was extracted and a DNA library was constructed followed by high throughput sequencing as described in Example 1.
  • the inventors estimated the relative abundance of an MLG in all samples by using the relative abundance values of genes from this MLG (Qin et al. 2012, supra).
  • a training dataset (a matrix in which each row represents an MLG, each column represents a sample, and each cell represents the relative abundance of an MLG in a sample; the disease status of the training samples in Example 1 is a vector, with 1 for CAD and 0 for control) and a test set (only the MLG relative abundance profile of the test set).
  • the inventors used the randomForest function from the randomForest package in R to build the classifier, and the predict function was used to predict the test set.
  • The output is a matrix containing the prediction results (the first column, “0”, is the probability of health; the second column, “1”, is the probability of CAD; the cutoff is 0.5, and if the probability of CAD ≥ 0.5, the subject is at risk of CAD).
  • the inventors used the 65 selected MLGs to rebuild the random forest model, and the probability of illness was then calculated (Table 11, Fig. 3 Testset).
  • Streptococcus (Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis)
  • FN false negative
  • FP false positive
  • the area under the ROC curve was 81.94% (95% CI: 72.98%-90.9%) in test set.
  • the inventors have identified and validated 65 CAD-associated gut microbes and 4 optimized gut microbes by a random forest model based on CAD-associated gene markers, and have constructed a method to evaluate the risk of CAD based on these 65 CAD-associated gut microbes and 4 optimized gut microbes.
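
Two of the statistical steps described in Example 1 (the trimmed average used as the relative abundance of an MLG species, and the Benjamini-Hochberg adjustment applied to the Wilcoxon P-values) can be illustrated with a short sketch. The source performed its analysis in R; the Python below is only an illustration, and the function names, the rounding-down of the 5% trim count, and any inputs are assumptions rather than details taken from the source.

```python
from typing import List, Sequence


def mlg_relative_abundance(gene_abundances: Sequence[float], trim: float = 0.05) -> float:
    """Mean gene abundance after dropping the lowest and highest `trim` fraction."""
    if not gene_abundances:
        raise ValueError("an MLG needs at least one gene abundance")
    ordered = sorted(gene_abundances)
    # Number of genes dropped at each end; rounding down is an assumption.
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

With 20 gene abundances, for example, the sketch drops one gene at each end before averaging; species whose adjusted value falls below 0.05 would correspond to the q < 0.05 selection described above.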
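
The rule stated in Example 1 for naming an MLG (assign it to a genome when more than 50% of its genes are annotated to that genome, otherwise call it unclassified) can be sketched as follows. This is a minimal illustration in Python, not the inventors' pipeline; the function name `assign_mlg` and the use of `None` for an unannotated gene are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def assign_mlg(gene_annotations: Sequence[Optional[str]], threshold: float = 0.5) -> str:
    """Return the genome holding more than `threshold` of the MLG's genes, else 'unclassified'."""
    if not gene_annotations:
        return "unclassified"
    # Count only genes that received an annotation (None marks an unannotated gene).
    counts = Counter(g for g in gene_annotations if g is not None)
    if not counts:
        return "unclassified"
    genome, hits = counts.most_common(1)[0]
    # The fraction is taken over all genes of the MLG, annotated or not.
    return genome if hits / len(gene_annotations) > threshold else "unclassified"
```

Taking the fraction over all genes of the MLG, including unannotated ones, is one reading of "more than 50% of its constituent genes"; the source does not spell out the denominator.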
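
The marker selection described in Example 1 (rank the MLG species by importance, build incremental subsets of the top-ranked species starting from 5, and keep the subset with the lowest false prediction ratio) can be sketched as below. This is a hedged illustration: `select_marker_set` and the `error_of` callback are hypothetical names, the step size of one species per subset is an assumption, and in the source the error would come from a randomForest model rather than from a user-supplied function.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_marker_set(
    importance: Dict[str, float],
    error_of: Callable[[Sequence[str]], float],
    start: int = 5,
) -> Tuple[List[str], float]:
    """Evaluate top-k subsets (k = start .. all markers) and return the best one."""
    # Rank markers from most to least important.
    ranked = sorted(importance, key=lambda name: importance[name], reverse=True)
    best_subset: List[str] = []
    best_error = float("inf")
    for size in range(min(start, len(ranked)), len(ranked) + 1):
        subset = ranked[:size]
        err = error_of(subset)  # hypothetical: false prediction ratio of a classifier
        if err < best_error:  # on ties the smaller subset is kept (an assumption)
            best_subset, best_error = subset, err
    return best_subset, best_error
```

In practice `error_of` would train and evaluate a classifier on the given subset; any callable returning a false prediction ratio fits this interface.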

Abstract

Provided are biomarkers and methods for predicting the risk of a disease related to microbes, in particular coronary artery disease (CAD) or related heart diseases.

Description

BIOMARKERS FOR CORONARY ARTERY DISEASE
CROSS-REFERENCE TO RELATED APPLICATION
None
FIELD
The present invention relates to biomarkers and methods for predicting the risk of a disease related to microbes, in particular coronary artery disease (CAD) or related heart diseases.
BACKGROUND
Coronary artery disease (CAD) refers to any abnormal condition of the coronary arteries that interferes with the delivery of an adequate supply of blood to the cardiac (i.e., heart) muscle or any portion thereof. Typically, CAD is caused by the accumulation of plaque on the arterial walls (i.e., atherosclerosis), particularly in the large and medium-sized arteries serving the heart. These conditions have similar causes, mechanisms, and treatments. CAD represents the leading cause of death and morbidity worldwide. Early diagnosis of CAD will help not only to prevent mortality but also to reduce the costs of surgical intervention.
The “gold standard” for detecting CAD is invasive coronary angiography. However, this is costly and can pose a risk to the patient. Prior to angiography, non-invasive diagnostic modalities such as myocardial perfusion imaging (MPI) and CT-angiography may be used; however, these have complications, including radiation exposure and contrast agent sensitivity, and add only moderately to the identification of obstructive CAD.
Current knowledge indicates that genetic and environmental factors and their interactions jointly induce complex phenotypes and many diseases. Coronary artery disease (CAD), as one of the most influential complex diseases, has been increasingly investigated by GWAS in recent years, which revealed 10.6% of the inherent cause through 46 common variations (Ehret, G. B. et al. Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103-109, incorporated herein by reference). However, our knowledge of the effect of environmental factors such as gut microbes, and of the contribution of genes and microbes to disease, still needs further study.
Our “forgotten organ”, the gut microbiota, plays a crucial role in our health in many respects, such as extracting energy from food, producing important metabolites, promoting the development and maturation of the immune system, and protecting the host from pathogen infection, among others. Recent studies suggest that flora dysbiosis, chronic inflammation and metabolic abnormality exist in the intestine in some metabolic diseases such as diabetes and obesity. The characteristics of most coronary artery disease are inflammation, oxidation and lipid metabolism, which might correlate with the gut microbes and their metabolites. Recent research indicates that gut microbes can metabolize red meat ingredients (L-carnitine, phosphatidylcholine, cholesterol) into TMA, which is further oxidized into TMAO in the liver; TMAO gives rise to oxidation reactions in blood vessels that lead to inflammation and lipid deposition, ultimately resulting in atherosclerosis and coronary heart disease. Meanwhile, compared with healthy subjects, the gut microbiota of symptomatic atherosclerosis patients exhibits obvious abnormality (Koeth, R. A. et al. Intestinal microbiota metabolism of L-carnitine, a nutrient in red meat, promotes atherosclerosis. Nature medicine 19, 576-585, incorporated herein by reference). These studies suggest that dysbiosis of gut microbes may strongly influence the pathogenesis of coronary artery disease by inducing human metabolic abnormality. However, the characteristics of gut flora dysbiosis in patients whose coronary artery disease arises from atherosclerosis, and its impact on the metabolic system, remain unclear.
SUMMARY
Embodiments of the present disclosure seek to solve at least one of the problems existing in the prior art to at least some extent.
The present invention is based on the following findings by the inventors:
Assessment and characterization of gut microbiota has become a major research area in human disease, including coronary artery disease (CAD). To analyze the gut microbial content of CAD patients, the inventors carried out a protocol for a Metagenome-Wide Association Study (MGWAS) (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012), incorporated herein by reference) based on deep shotgun sequencing of the gut microbial DNA from 165 individuals. The inventors identified and validated 65 CAD-associated gut microbes and 4 optimized gut microbes. To exploit the potential of gut microbiota for CAD classification, the inventors calculated the probability of illness through a random forest model based on the 65 CAD-associated gut microbes and 4 optimized gut microbes. The inventors' data provide insight into the characteristics of the gut metagenome related to CAD risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant disorders, and the potential usefulness of a gut-microbiota-based approach for assessment of individuals at risk of such disorders.
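
As an illustration of how a probability of illness of this kind can be turned into a classification and scored, the sketch below applies the 0.5 cutoff stated elsewhere in this document (a probability of CAD of at least 0.5 counts as at risk, with status 1 for CAD and 0 for control) and computes the false negative rate, false positive rate and area under the ROC curve. It is a minimal sketch in Python, not the R workflow used by the inventors; the function names are assumptions, and the AUC is computed by the pairwise (Mann-Whitney) formulation rather than by the pROC package.

```python
from typing import Sequence, Tuple


def error_rates(prob_cad: Sequence[float], status: Sequence[int], cutoff: float = 0.5) -> Tuple[float, float]:
    """Return (false negative rate, false positive rate) at the given cutoff."""
    cases = sum(1 for s in status if s == 1)
    controls = len(status) - cases
    if cases == 0 or controls == 0:
        raise ValueError("need at least one case and one control")
    # A subject is called at risk when the probability of CAD is at least the cutoff.
    predicted = [1 if p >= cutoff else 0 for p in prob_cad]
    fn = sum(1 for y, s in zip(predicted, status) if s == 1 and y == 0)
    fp = sum(1 for y, s in zip(predicted, status) if s == 0 and y == 1)
    return fn / cases, fp / controls


def auc(prob_cad: Sequence[float], status: Sequence[int]) -> float:
    """Probability that a random case scores above a random control (ties count 0.5)."""
    case_scores = [p for p, s in zip(prob_cad, status) if s == 1]
    control_scores = [p for p, s in zip(prob_cad, status) if s == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for cohorts of a few hundred subjects such as the one described here.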
In one aspect of present disclosure, there is provided with a biomarker set for predicting a disease related to microbiota in a subject consisting of:
a gut biomarker comprising at least one of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544, or microbes with genomic DNA comprising at least a partial sequence of SEQ ID NO: 1 to 122009, alternatively, the biomarker set consists of at least one of the species listed in Table 4, preferably at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100%of the species listed in Table 4.
preferably, at least one of Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis.
According to embodiments of the present disclosure, the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009 as stated in Table 5-1.
In another aspect of the present disclosure, there is provided a biomarker set for predicting a disease related to microbiota in a subject, consisting of:
a gut biomarker comprising at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the disease is coronary artery disease or a related heart disease.
In another aspect of the present disclosure, there is provided a kit for determining the gene marker set of any one of claims 1 to 4, comprising primers used for PCR amplification and designed according to the DNA sequence as set forth below:
the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
In another aspect of the present disclosure, there is provided a kit for determining the gene marker set described above, comprising one or more probes designed according to the genes as set forth below:
the gut biomarker comprises at least a partial sequence of at least one of SEQ ID NO: 1 to 122009.
In another aspect of the present disclosure, there is provided use of the gene marker set described above for predicting the risk of coronary artery disease (CAD) or a related disorder in a subject to be tested, comprising:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set described above in the sample obtained in step (1);
(3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model,
wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
According to embodiments of the present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
According to embodiments of the present disclosure, the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in a sample, and the sample disease status is a vector, with 1 for CAD and 0 for control.
According to embodiments of the present disclosure, the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
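By way of non-limiting illustration only, the layout of the training dataset described above can be sketched in Python as follows. The source analysis was carried out in R, and Python is used here purely for illustration. The sample identifiers, the two marker names selected and all abundance values are hypothetical and are not taken from Tables 6-1 to 6-5; the helper name per_sample_features is likewise an assumption.

```python
# Hypothetical toy data; values are illustrative and not from the disclosure.
biomarkers = ["Streptococcus oralis", "Bacteroides uniformis"]
samples = ["S1", "S2", "S3"]

# Rows are biomarkers, columns are samples, cells are relative abundances.
abundance = [
    [3.0e-6, 0.0, 1.2e-6],     # Streptococcus oralis
    [1.0e-7, 4.0e-6, 2.0e-7],  # Bacteroides uniformis
]

# Disease status vector aligned with the columns: 1 for CAD, 0 for control.
status = [1, 0, 1]


def per_sample_features(matrix):
    """Transpose biomarker-by-sample rows into one feature vector per sample."""
    return [list(column) for column in zip(*matrix)]


features = per_sample_features(abundance)

# Every sample needs exactly one status label and one value per biomarker.
assert len(features) == len(status) == len(samples)
assert all(len(vector) == len(biomarkers) for vector in features)
```

Most classifier libraries expect one row per sample, so a transposition of this kind is typically needed before such a matrix is passed to a model.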
In another aspect of the present disclosure, there is provided use of the gene marker set described above for preparation of a kit for predicting the risk of coronary artery disease (CAD) or a related disorder in a subject to be tested, comprising:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set described above in the sample obtained in step (1);
(3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model,
wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
According to embodiments of the present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
According to embodiments of the present disclosure, the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in a sample, and the sample disease status is a vector, with 1 for CAD and 0 for control.
According to embodiments of the present disclosure, the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
In another aspect of the present disclosure, there is provided a method of diagnosing whether a subject has an abnormal condition related to microbiota or is at risk of developing an abnormal condition related to microbiota, comprising:
determining the relative abundance of the biomarkers described above in a sample from the subject, and
determining whether the subject has an abnormal condition related to microbiota or is at risk of developing an abnormal condition related to microbiota based on the relative abundance.
According to embodiments of the present disclosure, the method comprises:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set described above in the sample obtained in step (1);
(3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a multivariate statistical model,
wherein a probability of CAD greater than a cutoff indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
According to embodiments of the present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
According to embodiments of the present disclosure, the training dataset is a matrix with each row representing a biomarker of the biomarker set described above, each column representing a sample, and each cell representing the relative abundance of a biomarker in a sample, and the sample disease status is a vector, with 1 for CAD and 0 for control.
According to embodiments of the present disclosure, the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
According to embodiments of the present disclosure, the training dataset is at least one of Tables 6-1, 6-2, 6-3, 6-4 and 6-5, and a probability of CAD of at least 0.5 indicates that the subject to be tested has or is at risk of developing coronary artery disease (CAD) or a related disorder.
It is believed that the 65 CAD-associated gut microbes and the 4 optimized gut microbes are valuable for increasing CAD detection at earlier stages for the following reasons. First, the markers of the present invention are more specific and sensitive than conventional markers. Second, analysis of stool promises accuracy, safety, affordability, and patient compliance, and stool samples are transportable. Thus, the present invention relates to an in vitro method that is comfortable and noninvasive, so people will participate more readily in a given screening program. Third, the markers of the present invention may also serve as tools for therapy monitoring in CAD patients to detect the response to therapy.
BRIEF DESCRIPTION OF DRAWINGS
These and other aspects and advantages of the present disclosure will become apparent and more readily appreciated from the following descriptions taken in conjunction with the drawings, in which:
Fig. 1 Density histogram showing the P-value distribution of all genes identified in the study cohorts. The horizontal line represents the distribution of P-values under the null hypothesis.
Fig. 2 The 65 most discriminant MLGs in the Random Forest model using 126 MLG markers. The bar length indicates the importance of the variable (MLG species).
Fig. 3 Performance of the 65 MLG Random Forest model. 165 samples (88 cases, 77 controls) were used as the training set, and another 86 samples (29 cases, 57 controls) were used as the test set for validation, with a false negative rate of 2/29 and a false positive rate of 12/57.
Fig. 4 Identification of ACVD-associated markers from the gut metagenome. Performance of the 65 MLG Random Forest model; 165 samples (88 cases and 77 controls) were used as the training set (AUC=98.17%). The area between the two outer curves represents the 95% CI.
Fig. 5 Identification of ACVD-associated markers from the gut metagenome. Performance of the 4 MLG Random Forest model; 165 samples (88 cases and 77 controls) were used as the training set (AUC=85.86%). The area between the two outer curves represents the 95% CI.
EXAMPLES
Terms used herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as “a” , “an” and “the” are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.
The present invention is further exemplified in the following non-limiting Examples. Unless otherwise stated, parts and percentages are by weight and degrees are Celsius. As apparent to one of ordinary skill in the art, these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only, and the agents were all commercially available.
Example 1. Identifying biomarkers for evaluating coronary artery disease risk
1.1 Sample collection
Fecal samples from 165 southern Chinese subjects, including 88 atherosclerotic cardiovascular disease (ACVD) patients and 77 control subjects (training set, Table 1), were collected by Guangdong Provincial People's Hospital in 2011. ACVD patients were diagnosed and categorized according to pathological features (coronary angiography). Subjects were asked to collect fresh fecal samples at the hospital. Collected samples were placed in sterile tubes and immediately stored at -80 ℃ until further analysis.
Full ethical approval was obtained, and all patients gave written informed consent. The study was approved by the Institutional Review Board of Guangdong General Hospital.
Table 1 Baseline characteristics of atherosclerotic cardiovascular disease (ACVD) cases and controls. Fourth column reports results from Wilcoxon rank-sum tests.
Figure PCTCN2014088046-appb-000001
NOTE: Gender information was unavailable for one of the 88 patients and for two of the 77 controls.
1.2 DNA extraction
Fecal samples were thawed on ice and DNA extraction was performed using the Qiagen QIAamp DNA Stool Mini Kit (Qiagen) according to the manufacturer's instructions. Extracts were treated with DNase-free RNase to eliminate RNA contamination. DNA quantity was determined using a NanoDrop spectrophotometer, a Qubit Fluorometer (with the Quant-iT™ dsDNA BR Assay Kit) and gel electrophoresis.
1.3 DNA library construction and sequencing of fecal samples
DNA library construction was performed following the manufacturer's instructions (Illumina). The inventors used the same workflow as described previously to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of the sequencing primers. The inventors constructed one paired-end (PE) library with an insert size of 350 bp for each sample, followed by high-throughput sequencing to obtain around 30 million PE reads of length 2×100 bp. High-quality reads were obtained by filtering out low-quality reads with ambiguous 'N' bases, adapter contamination and human DNA contamination from the Illumina raw reads, and by trimming low-quality terminal bases of reads at the same time.
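As a non-limiting illustration of two of the filtering steps described above (discarding reads with ambiguous 'N' bases and trimming low-quality terminal bases), a minimal Python sketch is given below. The quality threshold, the minimum retained length, the number of tolerated 'N' bases and the function names are all assumptions made for illustration; the disclosure does not state the values actually used, and adapter and human-read removal are not shown.

```python
def trim_low_quality_ends(seq, quals, min_qual=20):
    """Trim bases below min_qual from both ends of a read (threshold assumed)."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_qual:
        start += 1
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    return seq[start:end], quals[start:end]


def filter_read(seq, quals, max_n=0, min_qual=20, min_len=30):
    """Return the cleaned (sequence, qualities) pair, or None if the read is rejected.

    max_n, min_qual and min_len are hypothetical parameters, not source values.
    """
    # Reject reads containing more ambiguous bases than tolerated.
    if seq.upper().count("N") > max_n:
        return None
    seq, quals = trim_low_quality_ends(seq, quals, min_qual)
    # Reject reads that became too short after trimming.
    if len(seq) < min_len:
        return None
    return seq, quals
```

For a paired-end library, a real pipeline would normally apply such a filter to both mates and keep or discard the pair together.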
In total, the inventors generated about 4.77 Gb of high-quality clean fecal microbiota sequencing data per sample (Table 2) from 165 samples (88 cases and 77 controls) on the Illumina HiSeq 2000 platform.
Table 2 Summary of metagenomic data. Fourth column reports results from Wilcoxon rank-sum tests.
Parameter Controls Cases P-value
Average raw bases (G) 4.85 4.92 0.831
After removing low quality bases 4.76 (98.14%) 4.79 (97.36%)  
After removing human reads 4.73 (97.53%) 4.78 (97.15%) 0.874
1.4 Metagenomic data processing and analysis
1.4.1 Gene catalogue construction
Gene catalogue construction. Employing the same parameters that were used to construct the type 2 diabetes gene catalogue (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012), incorporated herein by reference), the inventors performed de novo assembly and gene prediction for the high-quality reads of the 165 samples using SOAPdenovo v1.06 (Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Research 20, 265-272, doi:10.1101/gr.097261.109 (2009), incorporated herein by reference) and GeneMark v2.7 (Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic acids research 38, e132, doi:10.1093/nar/gkq275 (2010), incorporated herein by reference), respectively. All predicted genes were aligned pairwise using BLAT, and any gene for which over 90% of its length could be aligned to another gene with more than 95% identity (no gaps allowed) was removed as redundant, resulting in a non-redundant gene catalogue comprising 4,537,046 genes (the 4.5M gene catalogue).
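The redundancy criterion stated above (over 90% of a gene's length aligned to another gene with more than 95% identity) can be sketched in Python as follows, for illustration only. The alignment records are assumed to have been parsed already from the pairwise alignment output; the record layout, the function names and the greedy processing order (shorter genes first) are assumptions and do not reproduce the actual pipeline.

```python
def is_redundant(query_len, aligned_len, identity, min_cov=0.90, min_identity=0.95):
    """True if more than 90% of the query aligns with more than 95% identity."""
    return aligned_len / query_len > min_cov and identity > min_identity


def remove_redundant(gene_lengths, alignments):
    """Return the gene IDs kept after redundancy removal.

    gene_lengths: dict mapping gene ID to gene length.
    alignments: iterable of (query, target, aligned_len, identity) records,
                a hypothetical, already-parsed form of the alignment output.
    """
    removed = set()
    # Assumption: shorter genes are examined first so the longer gene is kept.
    for query, target, aligned_len, identity in sorted(
        alignments, key=lambda record: gene_lengths[record[0]]
    ):
        if query == target or query in removed or target in removed:
            continue
        if is_redundant(gene_lengths[query], aligned_len, identity):
            removed.add(query)
    return [gene for gene in gene_lengths if gene not in removed]
```

In this sketch a gene is dropped only when its surviving counterpart is still in the catalogue, which is one of several reasonable ways to resolve chains of near-identical genes.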
Taxonomic assignment of genes. Taxonomic assignment of the predicted genes was performed using an in-house pipeline described in the published T2D paper (Qin et al. 2012, supra).
1.4.2 Data profile construction
Gene profile. These 4,537,046 genes and their associated measures of relative abundance in the 165 samples were used to establish the gene profile for the association study. The inventors used the same method described in the published T2D paper (Qin et al. 2012, supra) to compute the relative gene abundance.
IMG species and mOTU species profiles. Total fecal clean reads were aligned to the 4,653 reference genomes from IMG v400 (Markowitz, V. M. et al. IMG: the integrated microbial genomes database and comparative analysis system. Nucleic acids research 40, D115-D122 (2012), incorporated herein by reference) and to the 79,268 sequences of the mOTU reference (Sunagawa, S. et al. Metagenomic species profiling using universal phylogenetic marker genes. Nature methods 10, 1196-1199 (2013), incorporated herein by reference) with default parameters, respectively. 1,290 IMG species (species that were shared among at least 10 subjects) and 560 species-level mOTUs were identified.
1.4.3 Analysis of factors influencing gut microbiota gene profile. The inventors used permutational multivariate analysis of variance (PERMANOVA) to assess the effect of 25 different characteristics, including CAD status, HDLC, CHOL, Gender, FBG, hypertension, APOB, Age, CREA, LDLC, HbA1c, APOA, TP, diabetes, ALB, TRIG, BMI, WHR, Lpa, HBDH, CKMB, AST, CK, ProBNP_E_ and ALT, on the gene profiles of the 4.5M reference gene catalogue. The inventors performed the analysis using the method implemented in the “vegan” package in R, and the permuted p-value was obtained from 10,000 permutations. The inventors also corrected for multiple testing using “p.adjust” in R with the Benjamini-Hochberg method to obtain the q-value for each test. PERMANOVA identified two significant factors associated with the gut microbiota (based on gene profiles) (q<0.05, Table 3). The analysis indicated that CAD status and HDLC were the most strongly associated factors, supporting disease status as the major determinant influencing the composition of the gut microbiota. Gender, age and some CAD clinical indices such as CHOL, FBG, hypertension and APOB were also significant factors.
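For illustration only, the Benjamini-Hochberg adjustment named above can be written out in a few lines of Python. This is a generic sketch of the standard step-up procedure, not the inventors' code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical p-values for four tested characteristics.
example_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A characteristic would then be reported as significant when its q-value falls below the chosen threshold, 0.05 in the analysis described above.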
Table 3 PERMANOVA of the gene profile based on Euclidean distance. The analysis was conducted to test whether clinical parameters and ACVD status have a significant impact on the gut microbiota (q-value<0.05).
Figure PCTCN2014088046-appb-000002
1.4.4 Identification of ACVD associated markers
Identification of ACVD associated genes. To identify associations between the metagenomic profile and ACVD, a two-tailed Wilcoxon rank-sum test was applied to the profiles of 2.1M high-occurrence genes (genes that were present in fewer than 10 of the 165 samples were removed). 438,750 gene markers (20.48% of the 2.1M genes) were obtained, which were enriched in either cases or controls with p-value<0.01 and FDR=2.23% (Fig. 1).
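The occurrence filter and the two-tailed rank-sum test described above can be sketched as follows, by way of illustration only. The sketch uses the normal approximation with a tie correction and no continuity correction, which is an assumption; statistical packages may instead use an exact test or a continuity correction, so p-values can differ slightly. All function names are hypothetical.

```python
import math


def keep_gene(abundances, min_samples=10):
    """Keep a gene only if it is present (abundance > 0) in at least min_samples samples."""
    return sum(1 for value in abundances if value > 0) >= min_samples


def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(cases, controls):
    """Two-tailed Wilcoxon rank-sum test; returns (U statistic of cases, p-value)."""
    n1, n2 = len(cases), len(controls)
    n = n1 + n2
    pooled = list(cases) + list(controls)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction for the variance of U.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

A gene would be retained as a marker when its p-value is below the threshold used above (0.01), with the direction of enrichment given by which group has the higher ranks.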
Estimating the false discovery rate (FDR). Instead of a sequential p-value rejection method, the inventors applied the “q-value” method proposed in a previous study to estimate the FDR (Storey, J. D. A direct approach to false discovery rates. Journal of the Royal Statistical Society 64, 479-498 (2002), incorporated herein by reference).
Receiver Operating Characteristic (ROC) analysis. The inventors applied ROC analysis to assess the performance of the ACVD classification based on metagenomic markers. The inventors then used the “pROC” package in R to draw the ROC curve.
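For illustration only, the area under the ROC curve can be computed directly from predicted probabilities and true labels, since it equals the proportion of case-control pairs in which the case receives the higher score (ties counting one half). The Python sketch below uses this pairwise definition; it is not the “pROC” implementation, and the function name and example values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from scores and 0/1 labels (1 = case)."""
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities of illness for two cases and two controls.
example_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of samples, which is unproblematic for cohorts of a few hundred subjects.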
1.4.5 Construction of MLGs and identification of ACVD associated MLG species markers
126 MLG species based on the 438,750 ACVD associated marker gene profile. The inventors used the 438,750 gene markers to build metagenomic linkage groups (MLGs) using the same method described in the published T2D paper (Qin et al. 2012, supra). All 438,750 genes were annotated by aligning them to the 4,653 reference genomes in IMG v400. An MLG was assigned to a genome if more than 50% of its constituent genes were annotated to that genome; otherwise it was termed unclassified. A total of 136 MLG genomes with gene number>550 were selected, and MLG genomes belonging to the same species were grouped to construct MLG species, finally yielding 127 MLG species. The inventors applied the Wilcoxon rank-sum test with Benjamini-Hochberg adjustment to the 127 MLG species, and 126 MLG species were selected as ACVD-associated MLGs with q<0.05. To estimate the relative abundance of an MLG species, the inventors calculated the average abundance of the genes of the MLG species after removing the 5% lowest and 5% highest abundant genes (Qin et al. 2012, supra).
In total, the inventors built 136 metagenomic linkage groups (MLGs, each >550 genes) based on the distribution and the occurrence rate (Qin et al. 2012, supra) of the 438,750 genes; 94.8% of the significant genes (P-value<0.01) were included in MLGs. The 136 MLGs (each >550 genes, >50% coverage and q<0.05) were annotated against the NCBI database, and MLGs from the same species were grouped to obtain 126 MLG species.
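The genome assignment rule (more than 50% of constituent genes annotated to one genome) and the trimmed-mean abundance estimate (dropping the 5% lowest and 5% highest abundant genes) described above can be sketched in Python as follows, for illustration only. The function names, the representation of unannotated genes as None and the rounding down of the number of trimmed genes are assumptions.

```python
from collections import Counter


def assign_genome(gene_annotations, min_fraction=0.5):
    """Assign an MLG to a genome if more than half of its genes map to it.

    gene_annotations: one genome name (or None if unannotated) per gene.
    """
    counts = Counter(name for name in gene_annotations if name is not None)
    if not counts:
        return "unclassified"
    genome, hits = counts.most_common(1)[0]
    if hits / len(gene_annotations) > min_fraction:
        return genome
    return "unclassified"


def mlg_abundance(gene_abundances, trim_fraction=0.05):
    """Mean gene abundance after dropping the lowest and highest 5% of genes.

    Assumes a non-empty list; the number trimmed per tail is rounded down.
    """
    values = sorted(gene_abundances)
    k = int(len(values) * trim_fraction)
    kept = values[k:len(values) - k] if k else values
    return sum(kept) / len(kept)
```

Trimming both tails makes the estimate robust to a handful of genes with extreme abundance, for example genes shared with other, more abundant organisms.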
65 MLG species marker identification. To identify MLG species markers, the inventors used the “randomForest 4.5-36” package in R version 2.10 based on the 126 ACVD associated MLG species. First, the inventors sorted all 126 MLG species by the importance given by the “randomForest” method (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference). MLG marker sets were constructed by creating incremental subsets of the top-ranked MLG species, starting from 5 MLG species and ending at all 126 MLG species. For each MLG marker set, the inventors calculated the false prediction ratio in the cohort of 165 Chinese subjects. Finally, the set of 65 MLG species with the lowest false prediction ratio was selected as the MLG species markers (Fig. 2, Table 4 and Tables 5-1 and 5-2), with a false negative (FN) rate of 6.81% (6/88) and a false positive (FP) rate of 3.89% (3/77) (Fig. 3, training set). Furthermore, the inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the selected MLG species markers (Tables 6-1, 6-2, 6-3, 6-4 and 6-5) and calculated an area under the ROC curve (AUC) of 98.17% (95% CI: 96.6%-99.74%) using the R package “pROC” (Fig. 4).
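The incremental subset selection and the error rates reported above can be illustrated with the following Python sketch. The classifier itself is abstracted behind a callback named evaluate, which is hypothetical and stands for fitting a model on a marker subset and returning its false prediction ratio. The step size of one marker per subset and the definition of the false prediction ratio as (FN + FP) divided by the number of samples are assumptions, since neither is stated in the text.

```python
def error_rates(true_status, predicted_status):
    """Return (FN rate, FP rate, overall false prediction ratio) for 0/1 labels."""
    pairs = list(zip(true_status, predicted_status))
    false_neg = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    false_pos = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    n_cases = sum(1 for truth in true_status if truth == 1)
    n_controls = sum(1 for truth in true_status if truth == 0)
    return (
        false_neg / n_cases,
        false_pos / n_controls,
        (false_neg + false_pos) / len(pairs),
    )


def select_marker_set(ranked_markers, evaluate, start=5):
    """Pick the top-ranked prefix with the lowest error returned by evaluate.

    ranked_markers: markers sorted by decreasing importance.
    evaluate: hypothetical callback mapping a marker subset to its error ratio.
    """
    best_subset, best_error = None, None
    for size in range(start, len(ranked_markers) + 1):
        subset = ranked_markers[:size]
        error = evaluate(subset)
        # Strict comparison keeps the smallest subset when errors tie.
        if best_error is None or error < best_error:
            best_subset, best_error = subset, error
    return best_subset, best_error
```

With 6 misclassified cases out of 88 and 3 misclassified controls out of 77, error_rates reproduces the fractions 6/88 and 3/77 quoted above for the training set.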
Among the 65 MLG species, the control-enriched MLG species Bacteroides uniformis (q=4.21E-11), Bacteroides vulgatus (q=1.80E-09) and Clostridiales sp. SS3/4 (q=1.68E-08) are known short-chain fatty acid (SCFA)-producing bacteria. Most of the case-enriched MLG species (51 in total) were opportunistic pathogens from Streptococcus (9 of 11 MLG species were oral pathogens), Clostridium (6 MLG species), Ruminococcus (4 MLG species) and Lactobacillus (4 MLG species). Rothia mucilaginosa naturally inhabits the oral cavity and upper respiratory tract and is increasingly recognized as an emerging opportunistic pathogen associated with prosthetic device infections and endocarditis. Clostridium bolteae, which has been isolated from human fecal material, blood and intra-abdominal abscesses, is a Gram-positive pathogen that can produce toxins including a neurotoxin; it is encountered in clinically significant infections in humans, and its mean counts in autistic children were 46-fold (P-value=0.01) greater than those in control children. Gemella sanguinis could strengthen inflammation in immunodeficient patients. Akkermansia muciniphila was also enriched in CAD patients.
1.4.6 Identification of ACVD associated IMG species and mOTU species. Based on the IMG species and mOTU species profiles, the inventors identified the ACVD associated IMG species and mOTU species with q<0.05 (Wilcoxon rank-sum test with Benjamini-Hochberg adjustment). Subsequently, IMG species markers and mOTU species markers were selected using the random forest approach, as in the MLG species marker selection.
65 IMG species (area under the ROC curve 98.52%) and 15 mOTU species (area under the ROC curve 96.16%) also clearly separated CAD patients from healthy subjects (q<0.05; see Tables 7 and 8) by Wilcoxon rank-sum test and random forest selection. By overlapping these with the 65 MLG markers, the inventors found that pathogens of oral origin, including Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis, as well as Akkermansia muciniphila, were significantly enriched in cases.
The inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the 4 microbes from Streptococcus (Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis) as a biomarker (Table 9) and calculated an area under the ROC curve of 85.86% (95% CI: 80.24%-91.48%) using the R package “pROC” (Fig. 5). The false negative (FN) rate was 28.40% (25/88) and the false positive (FP) rate was 20.77% (16/77).
Figure PCTCN2014088046-appb-000003
Figure PCTCN2014088046-appb-000004
Table 5-1. SEQ ID of the 65 MLG species
MLG ID SEQ ID NO: Number of genes
mlg_id: X127 1~2248 2248
mlg_id: X21 2249~2862 614
mlg_id: X63 2863~6837 3975
mlg_id: X118 6838~7608 771
mlg_id: X102 7609~9275 1667
mlg_id: X26 9276~10648 1373
mlg_id: X80 10649~13331 2683
mlg_id: X72 13332~15672 2341
mlg_id: X125 15673~18620 2948
mlg_id: X44 18621~19520 900
mlg_id: X74 19521~20240 720
mlg_id: X84 20241~20801 561
mlg_id: X95 20802~23660 2859
mlg_id: X108 23661~26771 3111
mlg_id: X115 26772~30163 3392
mlg_id: X92 30164~32611 2448
mlg_id: X109 32612~35767 3156
mlg_id: X103 35768~38880 3113
mlg_id: X10 38881~39514 634
mlg_id: X11 39515~40260 746
mlg_id: X91 40261~41635 1375
mlg_id: X48 41636~42428 793
mlg_id: X107 42429~43644 1216
mlg_id: X93 43645~45587 1943
mlg_id: X77 45588~46386 799
mlg_id: X65 46387~48007 1621
mlg_id: X123 48008~50806 2799
mlg_id: X50 50807~51375 569
mlg_id: X39 51376~52814 1439
mlg_id: X64 52815~59026 6212
mlg_id: X114 59027~60592 1566
mlg_id: X101 60593~63979 3387
mlg_id: X86 63980~64989 1010
mlg_id: X76 64990~67004 2015
mlg_id: X70 67005~68492 1488
mlg_id: X68 68493~70925 2433
mlg_id: X17 70926~72312 1387
mlg_id: X2 72313~72990 678
mlg_id: X1 72991~75225 2235
mlg_id: X88 75226~76581 1356
mlg_id: X116 76582~79319 2738
mlg_id: X82 79320~81414 2095
mlg_id: X25 81415~83112 1698
mlg_id: X28 83113~85682 2570
mlg_id: X75 85683~87229 1547
mlg_id: X40 87230~88951 1722
mlg_id: X83 88952~90306 1355
mlg_id: X59 90307~91116 810
mlg_id: X69 91117~92589 1473
mlg_id: X24 92590~93249 660
mlg_id: X104 93250~97356 4107
mlg_id: X124 97357~100151 2795
mlg_id: X79 100152~103376 3225
mlg_id: X13 103377~104672 1296
mlg_id: X96 104673~106081 1409
mlg_id: X105 106082~106998 917
mlg_id: X6 106999~107917 919
mlg_id: X85 107918~109895 1978
mlg_id: X3 109896~110770 875
mlg_id: X94 110771~114337 3567
mlg_id: X111 114338~115109 772
mlg_id: X8 115110~116480 1371
mlg_id: X98 116481~119348 2868
mlg_id: X4 119349~120420 1072
mlg_id: X5 120421~122009 1589
Table 5-2. SEQ ID of the 4 MLG species
MLG ID SEQ ID NO: Number of genes
mlg_id: X88 1~1356 1356
mlg_id: X68 1357~3789 2433
mlg_id: X96 3790~5198 1409
mlg_id: X82 5199~7293 2095
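As a worked check of the numbering in Table 5-2, the SEQ ID NO ranges follow from the gene counts by consecutive numbering starting at 1. The short Python sketch below reproduces the four ranges from the counts listed in the table; the function name is hypothetical and the sketch is given for illustration only.

```python
def seq_id_ranges(gene_counts, first_id=1):
    """Assign consecutive, inclusive SEQ ID NO ranges from per-MLG gene counts."""
    ranges = {}
    next_id = first_id
    for mlg_id, count in gene_counts:
        ranges[mlg_id] = (next_id, next_id + count - 1)
        next_id += count
    return ranges


# Gene counts as listed in Table 5-2.
table_5_2_counts = [("X88", 1356), ("X68", 2433), ("X96", 1409), ("X82", 2095)]
table_5_2_ranges = seq_id_ranges(table_5_2_counts)
```

The same relation (end minus start plus one equals the number of genes) can be used to check each row of Table 5-1.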
Figure PCTCN2014088046-appb-000005
Figure PCTCN2014088046-appb-000006
Figure PCTCN2014088046-appb-000007
Figure PCTCN2014088046-appb-000008
Figure PCTCN2014088046-appb-000009
Figure PCTCN2014088046-appb-000010
Figure PCTCN2014088046-appb-000011
Figure PCTCN2014088046-appb-000012
Figure PCTCN2014088046-appb-000013
Figure PCTCN2014088046-appb-000014
Figure PCTCN2014088046-appb-000015
Figure PCTCN2014088046-appb-000016
Figure PCTCN2014088046-appb-000017
Figure PCTCN2014088046-appb-000018
Figure PCTCN2014088046-appb-000019
Figure PCTCN2014088046-appb-000020
Figure PCTCN2014088046-appb-000021
Figure PCTCN2014088046-appb-000022
Figure PCTCN2014088046-appb-000023
Figure PCTCN2014088046-appb-000024
Figure PCTCN2014088046-appb-000025
Figure PCTCN2014088046-appb-000026
Figure PCTCN2014088046-appb-000027
Figure PCTCN2014088046-appb-000028
Figure PCTCN2014088046-appb-000029
Figure PCTCN2014088046-appb-000030
Table 9. Relative abundance profiles of the 4 MLGs in 165 samples
Figure PCTCN2014088046-appb-000031
Figure PCTCN2014088046-appb-000032
Figure PCTCN2014088046-appb-000033
Figure PCTCN2014088046-appb-000034
Example 2. Validating the biomarkers in another 86 individuals
To validate the discriminatory power of the biomarkers, namely the 65 selected MLGs and the 4 microbes from Streptococcus, the inventors used a new, independent study group of 29 case samples and 57 control samples as a test set (Table 10). These samples were also collected at Guangdong Provincial People's Hospital.
Table 10. Sample information
Group case control total number
Test set 29 57 86
For each sample, DNA was extracted and a DNA library was constructed, followed by high-throughput sequencing as described in Example 1. The inventors estimated the relative abundance of an MLG in all samples by using the relative abundance values of genes from this MLG (Qin et al. 2012, supra).
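As an illustration of how an MLG-level abundance could be derived from gene-level abundances, the sketch below averages the gene values after discarding the extremes. The text only states that gene relative abundances were used, so the trimming step, the 5% fraction, and the function name `mlg_relative_abundance` are assumptions for illustration rather than the procedure of the disclosure.

```python
def mlg_relative_abundance(gene_abundances, trim_fraction=0.05):
    """Estimate one MLG's relative abundance in one sample.

    Assumption: drop the lowest and highest `trim_fraction` of gene
    abundances, then average the remainder (a trimmed mean).
    """
    if not gene_abundances:
        raise ValueError("an MLG must contain at least one gene")
    values = sorted(gene_abundances)
    k = int(len(values) * trim_fraction)
    # Fall back to all values if trimming would leave nothing.
    kept = values[k:len(values) - k] if len(values) - 2 * k > 0 else values
    return sum(kept) / len(kept)
```

A trimmed mean keeps a few unusually abundant or absent genes from dominating the MLG estimate.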
The randomForest model was built with the “randomForest 4.5-36” package in R version 2.10. The input consists of a training dataset (namely Table 6-1, 6-2, 6-3, 6-4, 6-5 or Table 9, respectively), the sample disease status, and a test set (just the MLG relative abundance profile of the test set). The training dataset is a matrix in which each row represents an MLG, each column represents a sample, and each cell represents the relative abundance of an MLG in a sample; the disease status of the training samples in Example 1 is a vector, with 1 for CAD and 0 for control. The inventors then used the randomForest function from the randomForest package in R to build the classifier, and the predict function was used to predict the test set. The output is a matrix containing the prediction results: the first column, “0”, is the probability of health, and the second column, “1”, is the probability of CAD. The cutoff is 0.5; if the probability of CAD is ≥0.5, the subject is at risk of CAD.
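The data layout and the cutoff rule described above can be sketched in a few lines. This is not the R code used by the inventors; it only illustrates, under assumed function names (`samples_as_rows`, `call_risk`), how an MLG-by-sample matrix can be turned into one feature vector per sample and how the two-column probability output maps to a risk call at the 0.5 cutoff.

```python
CUTOFF = 0.5  # cutoff stated in the description


def samples_as_rows(mlg_by_sample):
    """Transpose a matrix with rows = MLGs and columns = samples
    into one feature vector (list of MLG abundances) per sample."""
    return [list(column) for column in zip(*mlg_by_sample)]


def call_risk(prob_matrix, cutoff=CUTOFF):
    """Each row of prob_matrix is [P(control), P(CAD)].

    Return 1 (at risk of CAD) when P(CAD) >= cutoff, otherwise 0.
    """
    return [1 if p_cad >= cutoff else 0 for _p_control, p_cad in prob_matrix]
```

Note that the comparison is inclusive, so a probability of exactly 0.5 is called as at risk, matching the "≥0.5" rule in the text.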
The inventors reran the random forest with the 65 selected MLGs and then calculated the probability of illness (Table 11, Fig. 3 Testset). The model was tested on the test set (n=86; 29 case samples and 57 control samples) and the prediction error was calculated. The false negative (FN) rate was 6.89% (2/29), the false positive (FP) rate was 21.05% (12/57), and the area under the ROC curve was 94.34% (95% CI: 89.86%-98.83%).
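The false negative and false positive rates quoted above are simple ratios: misclassified cases over all cases, and misclassified controls over all controls. A minimal sketch follows, assuming labels coded as in the description (1 for CAD, 0 for control); the function name `error_rates` is hypothetical.

```python
def error_rates(true_labels, predicted_labels):
    """Return (false_negative_rate, false_positive_rate).

    FN rate = cases predicted as control / all cases.
    FP rate = controls predicted as CAD / all controls.
    """
    pairs = list(zip(true_labels, predicted_labels))
    n_case = sum(1 for t, _p in pairs if t == 1)
    n_control = sum(1 for t, _p in pairs if t == 0)
    if n_case == 0 or n_control == 0:
        raise ValueError("need at least one case and one control")
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return fn / n_case, fp / n_control
```

With 29 cases of which 2 are missed and 57 controls of which 12 are flagged, the function returns 2/29 and 12/57, the ratios reported for the 65-MLG model.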
Furthermore, the inventors used the 4 microbes from Streptococcus (Streptococcus oralis, Streptococcus sanguinis, Streptococcus mitis and Streptococcus infantis) as a biomarker to test their power to separate CAD patients from controls (Table 11), finding that the false negative (FN) rate was 17.24% (5/29), the false positive (FP) rate was 35.08% (20/57), and the area under the ROC curve was 81.94% (95% CI: 72.98%-90.9%) in the test set.
Table 11. Prediction results of the 65 MLGs and the 4 MLGs
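The area under the ROC curve can be read as the probability that a randomly chosen case receives a higher CAD probability than a randomly chosen control. The sketch below computes it by direct pairwise comparison; the text does not state how the AUC or its confidence interval was computed, so this approach and the name `roc_auc` are assumptions for illustration.

```python
def roc_auc(true_labels, cad_probabilities):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Counts case/control pairs where the case scores higher;
    ties contribute half a pair.
    """
    case = [p for t, p in zip(true_labels, cad_probabilities) if t == 1]
    control = [p for t, p in zip(true_labels, cad_probabilities) if t == 0]
    if not case or not control:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p_case in case:
        for p_control in control:
            if p_case > p_control:
                wins += 1.0
            elif p_case == p_control:
                wins += 0.5
    return wins / (len(case) * len(control))
```

Unlike the FN and FP rates, this value does not depend on the 0.5 cutoff, which is why it is useful for comparing the 65-MLG and 4-MLG models.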
Figure PCTCN2014088046-appb-000035
Figure PCTCN2014088046-appb-000036
Figure PCTCN2014088046-appb-000037
Thus the inventors have identified and validated 65 CAD-associated gut microbes and 4 optimized gut microbes by a random forest model based on CAD-associated gene markers. The inventors have also constructed a method to evaluate the risk of CAD based on these 65 CAD-associated gut microbes and 4 optimized gut microbes.
Although explanatory embodiments have been shown and described, it will be appreciated by those skilled in the art that the above embodiments cannot be construed to limit the present disclosure, and that changes, alternatives, and modifications can be made in the embodiments without departing from the spirit, principles and scope of the present disclosure.

Claims (22)

  1. A biomarker set for predicting a disease related to microbiota in a subject consisting of:
    Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544.
  2. The biomarker set for predicting a disease related to microbiota in a subject according to claim 1, comprising at least a partial sequence of SEQ ID NO: 1 to 122009.
  3. A biomarker set for predicting a disease related to microbiota in a subject consisting of:
    a gut biomarker comprising at least a partial sequence of SEQ ID NO: 1 to 122009.
  4. The biomarker set for predicting a disease related to microbiota in a subject, wherein the disease is coronary artery disease or related heart disease.
  5. A kit for determining the gene marker set of any one of claims 1 to 4, comprising primers used for PCR amplification and designed according to the DNA sequence as set forth in claim 3.
  6. A kit for determining the gene marker set of any one of claims 1 to 4, comprising one or more probes designed according to the genes as set forth in claim 3.
  7. Use of the gene marker set of any one of claims 1 to 4 for predicting the risk of coronary artery disease (CAD) or related disorder in a subject to be tested, comprising:
    (1) collecting a sample from the subject to be tested;
    (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 4 in the samples obtained in step (1);
    (3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
    wherein the probability of CAD greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the coronary artery disease (CAD) or related disorder.
  8. The use of claim 7, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
  9. The use of claim 8, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 4, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for CAD and 0 for control.
  10. The use of claim 8, wherein the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
  11. The use of claim 8, wherein the training dataset is Table 6-1, 6-2, 6-3, 6-4, 6-5, and the probability of CAD being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the coronary artery disease (CAD) or related disorder.
  12. Use of the gene marker set of any one of claims 1 to 4 for preparation of a kit for predicting the risk of coronary artery disease (CAD) or related disorder in a subject to be tested, comprising:
    (1) collecting a sample from the subject to be tested;
    (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 4 in the samples obtained in step (1);
    (3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
    wherein the probability of CAD greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the coronary artery disease (CAD) or related disorder.
  13. The use of claim 12, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
  14. The use of claim 13, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 4, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for CAD and 0 for control.
  15. The use of claim 13, wherein the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
  16. The use of claim 13, wherein the training dataset is Table 6-1, 6-2, 6-3, 6-4, 6-5, and the probability of CAD being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the coronary artery disease (CAD) or related disorder.
  17. A method of diagnosing whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota, comprising:
    determining the relative abundance of the biomarkers of any one of claims 1 to 4 in a sample from the subject, and
    determining whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota based on the relative abundance.
  18. The method according to claim 17, comprising:
    (1) collecting a sample from the subject to be tested;
    (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 4 in the samples obtained in step (1);
    (3) obtaining a probability of CAD by comparing the relative abundance information of each biomarker of the subject to be tested with a training dataset using a Multivariate statistical model,
    wherein the probability of CAD greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the coronary artery disease (CAD) or related disorder.
  19. The method of claim 18, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having CAD and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
  20. The method of claim 19, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 4, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for CAD and 0 for control.
  21. The method of claim 19, wherein the relative abundance information of each of Akkermansia muciniphila, Bacteroides fragilis, Clostridium bolteae, Clostridium hathewayi, Clostridium nexile, Clostridium sp. HGF2, Clostridium spiroforme, Clostridium symbiosum, Coprobacillus sp. 3_3_56FAA, Eggerthella sp. HGA1, Eubacterium limosum, Gemella sanguinis, Klebsiella pneumoniae, Lachnospiraceae bacterium 9_1_43BFAA, Lactobacillus amylovorus, Lactobacillus fermentum, Lactobacillus salivarius, Lactobacillus vaginalis, Rothia mucilaginosa, Ruminococcus gnavus, Ruminococcus obeum, Ruminococcus sp. 5_1_39BFAA, Ruminococcus torques, Streptococcus anginosus, Streptococcus infantarius, Streptococcus infantis, Streptococcus mitis, Streptococcus oralis, Streptococcus parasanguinis, Streptococcus pasteurianus, Streptococcus salivarius, Streptococcus sanguinis, Streptococcus sp. 2_1_36FAA, Streptococcus vestibularis, Subdoligranulum sp. 4_3_54A2FAA, CVD 1218, CVD 1259, CVD 1486, CVD 19194, CVD 19221, CVD 2015, CVD 2448, CVD 25206, CVD 461, CVD 547, CVD 659, CVD 8035, CVD 8194, CVD 8305, CVD 9620, CVD 977, Bacteroides cellulosilyticus, Bacteroides stercoris, Bacteroides uniformis, Bacteroides vulgatus, Bacteroides xylanisolvens, Bilophila wadsworthia, Clostridiales sp. SS3/4, Parabacteroides distasonis, Con 14667, Con 14806, Con 17745, Con 3602, Con 4962, Con 5544 is obtained based on the relative abundance information of SEQ ID NO: 1 to 122009.
  22. The method of claim 19, wherein the training dataset is Table 6-1, 6-2, 6-3, 6-4, 6-5, and the probability of CAD being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the coronary artery disease (CAD) or related disorder.
PCT/CN2014/088046 2014-09-30 2014-09-30 Biomarkers for coronary artery disease WO2016049920A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/088046 WO2016049920A1 (en) 2014-09-30 2014-09-30 Biomarkers for coronary artery disease
CN201480082463.5A CN107075563B (en) 2014-09-30 2014-09-30 Biomarkers for coronary artery disease

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/088046 WO2016049920A1 (en) 2014-09-30 2014-09-30 Biomarkers for coronary artery disease

Publications (1)

Publication Number Publication Date
WO2016049920A1 true WO2016049920A1 (en) 2016-04-07

Family

ID=55629345

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/088046 WO2016049920A1 (en) 2014-09-30 2014-09-30 Biomarkers for coronary artery disease

Country Status (2)

Country Link
CN (1) CN107075563B (en)
WO (1) WO2016049920A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10357521B2 (en) 2015-05-14 2019-07-23 University Of Puerto Rico Methods for restoring microbiota of newborns
CN110396537A (en) * 2018-04-24 2019-11-01 深圳华大生命科学研究院 Asthma biomarker and application thereof
CN110914453A (en) * 2017-07-31 2020-03-24 深圳华大生命科学研究院 Biomarkers for atherosclerotic cardiovascular disease
US11564667B2 (en) 2015-12-28 2023-01-31 New York University Device and method of restoring microbiota of newborns
CN115851910A (en) * 2022-11-23 2023-03-28 湖州市中心医院 Marker, system and application for diagnosing or predicting coronary heart disease
WO2023049842A1 (en) * 2021-09-23 2023-03-30 Flagship Pioneering Innovations Vi, Llc Diagnosis and treatment of diseases and conditions of the intestinal tract
WO2023064923A3 (en) * 2021-10-15 2023-06-29 Mammoth Biosciences, Inc. Fusion effector proteins and uses thereof

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019051678A1 (en) * 2017-09-13 2019-03-21 Bgi Shenzhen Biomarker for atherosclerotic cardiovascular diseases
WO2019071516A1 (en) * 2017-10-12 2019-04-18 Perfect (China) Co., Ltd. Biomarkers for chronic hepatitis b and use thereof
WO2019205188A1 (en) * 2018-04-24 2019-10-31 深圳华大生命科学研究院 Biomarker for depression and use thereof
CN110396538B (en) * 2018-04-24 2023-05-23 深圳华大生命科学研究院 Migraine biomarkers and uses thereof
CN110872632A (en) * 2018-08-30 2020-03-10 深圳华大生命科学研究院 Specific gene sequence of streptococcus pharyngolaris, detection primer and application thereof
CN111004735A (en) * 2019-03-21 2020-04-14 江南大学 Lactobacillus fermentum and application thereof in improving intestinal health
CN112710722A (en) * 2019-10-26 2021-04-27 复旦大学 Machine learning-based biomarker dimension expansion screening method
CN112509701A (en) * 2021-02-05 2021-03-16 中国医学科学院阜外医院 Risk prediction method and device for acute coronary syndrome
WO2022166934A1 (en) * 2021-02-05 2022-08-11 中国医学科学院阜外医院 Gut microbiota markers for evaluating onset risk of cardiovascular diseases and uses thereof
CN112509700A (en) * 2021-02-05 2021-03-16 中国医学科学院阜外医院 Stable coronary heart disease risk prediction method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007137865A1 (en) * 2006-06-01 2007-12-06 University Of Zürich The use of mrp 8/14 levels for discrimination of individuals at risk of acute coronary syndromes
WO2008132763A2 (en) * 2007-04-30 2008-11-06 Decode Genetics Ehf Genetic variants useful for risk assessment of coronary artery disease and myocardial infarction
WO2014019271A1 (en) * 2012-08-01 2014-02-06 Bgi Shenzhen Biomarkers for diabetes and usages thereof
WO2014053608A1 (en) * 2012-10-03 2014-04-10 Metabogen Ab Identification of a person having risk for atherosclerosis and associated diseases by the person's gut microbiome and the prevention of such diseases
WO2014060538A1 (en) * 2012-10-17 2014-04-24 Institut National De La Recherche Agronomique Determination of reduced gut bacterial diversity

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10357521B2 (en) 2015-05-14 2019-07-23 University Of Puerto Rico Methods for restoring microbiota of newborns
US11564667B2 (en) 2015-12-28 2023-01-31 New York University Device and method of restoring microbiota of newborns
CN110914453A (en) * 2017-07-31 2020-03-24 深圳华大生命科学研究院 Biomarkers for atherosclerotic cardiovascular disease
CN110914453B (en) * 2017-07-31 2023-12-19 深圳华大生命科学研究院 Biomarkers for atherosclerotic cardiovascular disease
CN110396537A (en) * 2018-04-24 2019-11-01 深圳华大生命科学研究院 Asthma biomarker and application thereof
CN110396537B (en) * 2018-04-24 2023-06-20 深圳华大生命科学研究院 Asthma biomarker and application thereof
WO2023049842A1 (en) * 2021-09-23 2023-03-30 Flagship Pioneering Innovations Vi, Llc Diagnosis and treatment of diseases and conditions of the intestinal tract
WO2023064923A3 (en) * 2021-10-15 2023-06-29 Mammoth Biosciences, Inc. Fusion effector proteins and uses thereof
CN115851910A (en) * 2022-11-23 2023-03-28 湖州市中心医院 Marker, system and application for diagnosing or predicting coronary heart disease

Also Published As

Publication number Publication date
CN107075563B (en) 2021-05-04
CN107075563A (en) 2017-08-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14903200

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14903200

Country of ref document: EP

Kind code of ref document: A1