WO2016049932A1 - Biomarkers for obesity related diseases - Google Patents

Info

Publication number
WO2016049932A1
Authority
WO
WIPO (PCT)
Prior art keywords
dsm
bacteroides
obesity
biomarker
subject
Prior art date
Application number
PCT/CN2014/088062
Other languages
French (fr)
Inventor
Qiang FENG
Dongya ZHANG
Longqing TANG
Jun Wang
Original Assignee
Bgi Shenzhen Co., Limited
Bgi Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bgi Shenzhen Co., Limited and Bgi Shenzhen
Priority to CN201480082465.4A (patent CN107075446B)
Priority to PCT/CN2014/088062 (patent WO2016049932A1)
Publication of WO2016049932A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • the present invention relates to biomarkers and methods for predicting the risk of a disease related to microbes, in particular obesity or related diseases.
  • Obesity, which is prevalent in developed countries, has increased considerably worldwide (de Carvalho Pereira et al., 2013). It is reported that the worldwide prevalence of overweight and obesity combined rose by 27.5% for adults and 47.1% for children between 1980 and 2013. The number of overweight individuals increased from 857 million in 1980 to 2.1 billion in 2013, of whom 671 million are affected by obesity. More than 50% of these obese individuals live in ten countries; the USA has the largest number of obese individuals, followed by China (Ng et al., 2014).
  • BMI body mass index
  • Embodiments of the present disclosure seek to solve at least one of the problems existing in the prior art to at least some extent.
  • the present invention is based on the following findings by the inventors:
  • MGWAS Metagenome-Wide Association Study
  • the inventors identified and validated 54 obesity-associated gut microbes.
  • the inventors calculated probability of illness through a random forest model based on the 54 obesity-associated gut microbes.
  • the inventors' data provide insight into the characteristics of the gut metagenome related to obesity risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant disorders, and the potential usefulness for a gut-microbiota-based approach for assessment of individuals at risk of such disorders.
  • the markers of the present invention are more specific and sensitive as compared with conventional markers.
  • analysis of stool promises accuracy, safety, affordability, and patient compliance. And samples of stool are transportable.
  • the present invention relates to an in vitro method, which is comfortable and noninvasive, so people will participate in a given screening program more easily.
  • the markers of the present invention may also serve as tools for therapy monitoring in obesity patients to detect the response to therapy.
  • a biomarker set for predicting a disease related to microbiota in a subject consisting of:
  • gut biomarker comprising Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353 or
  • microbes with genomic DNA sequences comprising SEQ ID NO: 1 to 48497
  • the biomarker set consists of at least one of the species listed in Table 3, preferably at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, or 100% of the species listed in Table 3.
  • biomarker set for predicting a disease related to microbiota in a subject consisting of:
  • a gut biomarker comprises at least a partial sequence of SEQ ID NO: 1 to 48497 as stated in Table 4.
  • the disease is obesity or related disease.
  • kits for determining the gene marker set described above comprising primers used for PCR amplification and designed according to the DNA sequence as set forth in at least a partial sequence of SEQ ID NO: 1 to 48497.
  • kits for determining the gene marker set described above comprising one or more probes designed according to the genes as set forth in SEQ ID NO: 1 to 48497.
  • the probability of obesity greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
  • the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having obesity and a plurality of normal subjects using a multivariate statistical model; alternatively, the multivariate statistical model is a randomForest model.
  • the training dataset is a matrix with each row representing a biomarker of the biomarker set according to any one of claims 1 to 3, each column representing a sample, and each cell representing the relative abundance of a biomarker in the sample; the sample disease status is a vector, with 1 for obesity and 0 for control.
  • Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162 and Eubacterium hallii DSM 3353 is obtained based on the relative abundance information of SEQ ID NO: 1 to 48497.
  • the training dataset is at least one of Tables 5-1, 5-2, 5-3 and 5-4, and a probability of obesity of at least 0.5 indicates that the subject to be tested has or is at risk of developing obesity or a related disorder.
  • a method of diagnosing whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota comprising:
  • the method comprises:
  • Fig. 1 shows the P-value distribution from the obesity association analysis, which identified a disproportionate over-representation of strongly associated markers at lower P-values.
  • Fig. 5 shows the probability of illness calculated for each sample before and after the operation.
  • Fig. 6 shows the variation of the probability of illness in each sample.
  • Fig. 7 shows the probability of illness among the three groups. The probability in the samples taken after the operation was significantly lower than before.
  • Example 1 Identifying biomarkers for evaluating obesity risk
  • DNA library construction was performed following the manufacturer's instructions (Illumina, insert size 350 bp, read length 100 bp).
  • the inventors used the same workflow as described previously to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of the sequencing primers.
  • the inventors constructed one paired-end (PE) library with an insert size of 350 bp for each sample, followed by high-throughput sequencing to obtain around 30 million PE reads of length 2x100 bp.
  • High-quality reads were obtained by filtering out low-quality reads with ambiguous 'N' bases, adapter contamination and human DNA contamination from the Illumina raw reads, and by simultaneously trimming low-quality terminal bases of reads.
  • In total, the inventors generated about 5.9 Gb of fecal microbiota sequencing data per sample (high-quality clean data) (Table 1) from 158 samples (78 cases and 80 controls) on the Illumina HiSeq 2000 platform.
  • Table 1 Summary of metagenomic data. Fourth column reports results from Wilcoxon rank-sum tests.
  • the average read mapping rate is shown in Table 1. This mapping rate was close to that of the samples in Li, J. et al. 2014, supra, which indicated that it was sufficient for further study.
  • the inventors derived the gene profile (9.9Mb genes) from the mapping result using the same method as Li, J. et al. 2014, supra.
  • Taxonomic assignment of genes was performed using an in-house pipeline described in the published paper (Li, J. et al. 2014, supra).
  • PERMANOVA permutational multivariate analysis of variance
  • the inventors performed the analysis using the method implemented in the “vegan” package in R, and the permuted p-value was obtained from 10,000 permutations.
  • the inventors also corrected for multiple testing using “p.adjust” in R with the Benjamini-Hochberg method to get the q-value for each test.
  • PERMANOVA identified three significant factors associated with gut microbes (based on gene profiles) (q < 0.05, Table 2).
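  • A minimal sketch of the Benjamini-Hochberg adjustment named above, written in Python rather than R purely for illustration. The function name `benjamini_hochberg` is hypothetical; the computation follows the standard step-up definition (the q-value of the i-th smallest of m p-values is the running minimum of p·m/i, taken from the largest p-value downward), which is the quantity R's `p.adjust` returns for the "BH" method.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Visit the p-values from largest to smallest so that a running minimum
    # enforces monotonic q-values.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values
```

A test would then be called significant when its q-value is below the chosen threshold (0.05 in the passage above).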
  • FDR false discovery rate
  • Receiver Operator Characteristic (ROC) analysis: the inventors applied ROC analysis to assess the performance of the obesity classification based on metagenomic markers. The inventors then used the “pROC” package in R to draw the ROC curve.
  • ROC Receiver Operator Characteristic
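  • A minimal sketch of what an ROC analysis computes, assuming labels coded 1 for obesity and 0 for control and a predicted probability of illness per sample. This is not the pROC implementation; the function names `roc_auc` and `tpr_fpr` are hypothetical. The area under the curve is computed from its pairwise definition: the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def tpr_fpr(labels, scores, cutoff):
    """One point of the ROC curve: (true positive rate, false positive rate)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one case and one control")
    return tp / positives, fp / negatives
```

Sweeping `cutoff` over the observed scores and plotting the resulting points traces the ROC curve.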
  • 237 MLG species based on the profile of 396,100 obesity-associated marker genes.
  • the inventors used the 396,100 gene markers to build metagenomic linkage groups (MLGs) using the same method described in the published T2D paper (Qin et al. 2012, supra). All 396,100 genes were annotated by aligning them to the 4,653 reference genomes in IMG v400. An MLG was assigned to a genome if more than 50% of its constituent genes were annotated to that genome; otherwise it was termed unclassified. In total, 237 MLG genomes with gene number > 100 were selected (P-value < 0.01). To estimate the relative abundance of an MLG species, the inventors calculated the average abundance of the genes of the MLG species after removing the 5% lowest and 5% highest abundant genes (Qin et al. 2012, supra).
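  • A minimal sketch of the two rules described above: assigning an MLG to a genome when more than 50% of its genes are annotated to that genome, and estimating MLG abundance as the mean gene abundance after dropping the 5% lowest and 5% highest genes. The function names, the use of `None` for unannotated genes, and rounding the trimmed count down are assumptions made for illustration, not details taken from the cited method.

```python
from collections import Counter


def assign_mlg(gene_genomes, min_fraction=0.5):
    """Return the genome holding more than min_fraction of the MLG's genes,
    or "unclassified". gene_genomes has one entry per gene (None if the gene
    has no genome annotation)."""
    counts = Counter(g for g in gene_genomes if g is not None)
    if not counts:
        return "unclassified"
    genome, n_genes = counts.most_common(1)[0]
    if n_genes > min_fraction * len(gene_genomes):
        return genome
    return "unclassified"


def mlg_abundance(gene_abundances, trim_percent=5):
    """Mean gene abundance after removing the lowest and highest trim_percent
    of genes (count at each end rounded down, an assumption)."""
    if not gene_abundances:
        raise ValueError("MLG has no genes")
    values = sorted(gene_abundances)
    k = len(values) * trim_percent // 100
    kept = values[k:len(values) - k]
    return sum(kept) / len(kept)
```

Trimming both tails keeps a few very high or very low genes from dominating the abundance estimate of the whole group.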
  • a random forest model (R 2.14, randomForest 4.6-7 package) (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference) was trained using the MLG abundance profile of the training cohort (158 samples) to select the optimal set of MLG markers. The model was tested on one or more testing sets and the prediction error was calculated.
  • randomForest 4.6-7 package in R version 2.14
  • input is a training dataset (namely relative abundance profiles of selected MLGs in training samples)
  • sample disease status: the disease status of the training samples is a vector, with 1 for obesity and 0 for control
  • test set: just the relative abundance profiles of selected MLGs in the test set
  • the inventors used the randomForest function from the randomForest package in R to build the classifier, and the predict function was used to predict the test set.
  • Output is the prediction results (probability of illness; the cutoff is 0.5, and if the probability of illness is ≥ 0.5, the subject is at risk of obesity)
  • MLG species marker identification: to identify MLG species markers among the 237 obesity-associated MLG species, the inventors used the randomForest 4.6-7 package in R version 2.14. First, the inventors sorted all 237 MLG species by the importance given by the “randomForest” method (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference). MLG marker sets were constructed by creating incremental subsets of the top-ranked MLG species, starting from 1 MLG species and ending at all 237 MLG species. For each MLG marker set, the inventors calculated the false prediction ratio in the 158 samples.
  • the set of 54 MLG species with the lowest false prediction ratio was selected as the MLG species markers (Table 3 and Table 4).
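  • A minimal sketch of the incremental-subset search described above, assuming the markers are already sorted by importance. The function `select_marker_prefix` and its `error_fn` argument are hypothetical: `error_fn` stands in for whatever returns the false prediction ratio of a model built on a given marker subset (a random forest in the text), and breaking ties in favor of the smaller set is an assumption.

```python
def select_marker_prefix(ranked_markers, error_fn):
    """Evaluate the top-1, top-2, ..., top-N marker sets and return
    (best_subset, best_error) for the subset with the lowest error."""
    best_subset, best_error = None, float("inf")
    for size in range(1, len(ranked_markers) + 1):
        subset = ranked_markers[:size]
        error = error_fn(subset)
        # Strict comparison keeps the smaller set when two sets tie.
        if error < best_error:
            best_subset, best_error = subset, error
    return best_subset, best_error
```

With 237 ranked MLG species this evaluates 237 candidate sets, one model fit per set.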
  • the inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the selected MLG species markers (Tables 5-0, 5-1, 5-2, 5-3 and 5-4), and the area under the ROC curve (AUC) was 0.9651 in the 158 samples (Fig. 2).
  • TPR true positive rate
  • FPR false positive rate
  • Table 3 54 most discriminant MLGs (species markers) associated with obesity
  • mlg_id | SEQ ID NO range | count
    4404 | 33424-35102 | 1679
    6546 | 35103-35487 | 385
    11335 | 35488-35965 | 478
    3647 | 35966-38270 | 2305
    94 | 38271-38506 | 236
    13978 | 38507-40389 | 1883
    12929 | 40390-40550 | 161
    3665 | 40551-40691 | 141
    12935 | 40692-41880 | 1189
    31770 | 41881-42212 | 332
    5711 | 42213-44891 | 2679
    15213 | 44892-47552 | 2661
    2245 | 47553-48497 | 945
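  • A small consistency check on the rows above, offered as a sketch. It assumes that each row lists an MLG identifier, the first and last SEQ ID NO belonging to that MLG, and the number of sequences in the range; that reading of the columns is an inference from the numbers, and the names `ROWS`, `range_size` and `inconsistent_rows` are hypothetical.

```python
# (mlg_id, first SEQ ID NO, last SEQ ID NO, count) as listed in the text.
ROWS = [
    (4404, 33424, 35102, 1679), (6546, 35103, 35487, 385),
    (11335, 35488, 35965, 478), (3647, 35966, 38270, 2305),
    (94, 38271, 38506, 236), (13978, 38507, 40389, 1883),
    (12929, 40390, 40550, 161), (3665, 40551, 40691, 141),
    (12935, 40692, 41880, 1189), (31770, 41881, 42212, 332),
    (5711, 42213, 44891, 2679), (15213, 44892, 47552, 2661),
    (2245, 47553, 48497, 945),
]


def range_size(first, last):
    """Number of identifiers in the inclusive range first..last."""
    return last - first + 1


def inconsistent_rows(rows):
    """Return mlg_ids whose count disagrees with the range size or whose
    range does not start right after the previous row's range."""
    bad = []
    previous_last = None
    for mlg_id, first, last, count in rows:
        if range_size(first, last) != count:
            bad.append(mlg_id)
        elif previous_last is not None and first != previous_last + 1:
            bad.append(mlg_id)
        previous_last = last
    return bad
```

Under that reading, every listed count equals the size of its range and the ranges run contiguously up to SEQ ID NO 48497.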
  • DNA was extracted and a DNA library was constructed followed by high throughput sequencing as described in Example 1.
  • the inventors calculated the gene abundance profile for these samples using the same method as described in Qin et al. 2012, supra. Then the gene relative abundance of each of the markers as set forth in SEQ ID NOs: 1-48497 was determined.
  • the inventors estimated the relative abundance of a MLG in all samples by using the relative abundance values of genes from this MLG (Qin et al. 2012, supra) .
  • the inventors used random forest to set a model.
  • For the randomForest model (randomForest 4.6-7 package in R version 2.14), the input is a training dataset (namely the relative abundance profiles of selected MLGs in training samples, Tables 5-1, 5-2, 5-3 and 5-4), the sample disease status (the disease status of the training samples is a vector, with 1 for obesity and 0 for control), and a test set (just the relative abundance profiles of selected MLGs in the test set).
  • the inventors used the randomForest function from the randomForest package in R to build the classifier, and the predict function was used to predict the test set.
  • Output is the prediction results (probability of illness; the cutoff is 0.5, and if the probability of illness is ≥ 0.5, the subject is at risk of obesity).
  • Case means samples taken before the operation
  • control means samples taken 1 month and 3 months after the operation.
  • the probability of illness was calculated for each sample before and after the operation. The results show that the probability of illness decreases after the operation (Fig. 5, Table 8).
  • the error rate was 9% (2/22).
  • the inventors have identified and validated 54 obesity-associated gut microbes by a random forest model based on obesity-associated gene markers, and have constructed a method to evaluate the risk of obesity based on these 54 obesity-associated gut microbes.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention provides biomarkers and methods for predicting the risk of a disease related to microbes, in particular obesity or related diseases.

Description

BIOMARKERS FOR OBESITY RELATED DISEASES
CROSS-REFERENCE TO RELATED APPLICATION
None
FIELD
The present invention relates to biomarkers and methods for predicting the risk of a disease related to microbes, in particular obesity or related diseases.
BACKGROUND
Obesity, which is prevalent in developed countries, has increased considerably worldwide (de Carvalho Pereira et al., 2013). It is reported that the worldwide prevalence of overweight and obesity combined rose by 27.5% for adults and 47.1% for children between 1980 and 2013. The number of overweight individuals increased from 857 million in 1980 to 2.1 billion in 2013, of whom 671 million are affected by obesity. More than 50% of these obese individuals live in ten countries; the USA has the largest number of obese individuals, followed by China (Ng et al., 2014).
There is a growing body of evidence suggesting that patients who are told by their physician that they are overweight are more likely to lose weight than those who are not. However, the low rates of physician diagnosis and advice for behavioral health risk factors related to obesity are concerning (Bleich et al., 2011).
In children, the diagnosis of obesity is based on age- and gender-specific body mass index (BMI) cut-points. This is in contrast to adults, in whom an obesity diagnosis is based on BMI regardless of age or gender. Unlike adults, for whom the diagnostic criteria are simpler, fewer obese children are accurately diagnosed, owing to the more complicated diagnostic criteria and changes in terminology for pediatric obesity (Walsh et al., 2013). Moreover, the limitations of BMI in identifying different populations should be considered (Nevill et al., 2006). Waist circumference (WC) can therefore be considered a reliable and useful tool for assessing abdominal adiposity in epidemiological studies, but this measurement seems to be harder to perform (Miguel-Etayo et al., 2014). Furthermore, regional studies of the diagnosis of pediatric obesity using the International Classification of Diseases, Ninth Revision (ICD-9), the National Ambulatory Medical Care Survey (NAMCS), and the National Hospital Ambulatory Medical Care Survey (NHAMCS) have shown relatively low sensitivity of a clinical diagnosis (Walsh et al., 2013).
Recent insight suggests that the human gut microbiota could play an important role in obesity. An early report, based on sequencing of amplified 16S rRNA genes, indicated a much higher ratio of Firmicutes to Bacteroidetes in faecal samples from 12 obese humans than in two lean controls (Ley et al., 2006). Recent observational studies using metagenomic sequencing in human obesity have demonstrated reduced bacterial diversity, a relative depletion of Bacteroidetes, and enrichment in genes involved in carbohydrate and lipid metabolism (Allin and Pedersen, 2014). These correlative findings suggest that altered gut microbiota may be a causal factor in the pathogenesis of obesity, which indicates that characteristics of the gut microbiota could be used as criteria for the diagnosis of obesity.
In summary, there are considerable missed opportunities and low sensitivity in the diagnosis of obesity. A more valid (less biased) assessment of overweight and/or obesity needs to be developed.
SUMMARY
Embodiments of the present disclosure seek to solve at least one of the problems existing in the prior art to at least some extent.
The present invention is based on the following findings by the inventors:
Assessment and characterization of gut microbiota has become a major research area in human disease, including obesity. To carry out analysis on gut microbial content in obesity patients, the inventors carried out a protocol for a Metagenome-Wide Association Study (MGWAS) (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012) , incorporated herein by reference) based on deep shotgun sequencing of the gut microbial DNA from 158 individuals. The inventors identified and validated 54 obesity-associated gut microbes. To exploit the potential ability of obesity classification by gut microbiota, the  inventors calculated probability of illness through a random forest model based on the 54 obesity-associated gut microbes. The inventors' data provide insight into the characteristics of the gut metagenome related to obesity risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant disorders, and the potential usefulness for a gut-microbiota-based approach for assessment of individuals at risk of such disorders.
It is believed that the 54 obesity-associated gut microbes are valuable for increasing obesity detection at earlier stages for the following reasons. First, the markers of the present invention are more specific and sensitive as compared with conventional markers. Second, analysis of stool promises accuracy, safety, affordability, and patient compliance, and stool samples are transportable. Thus, the present invention relates to an in vitro method that is comfortable and noninvasive, so people will participate more readily in a given screening program. Third, the markers of the present invention may also serve as tools for therapy monitoring in obesity patients to detect the response to therapy.
In one aspect of the present disclosure, there is provided a biomarker set for predicting a disease related to microbiota in a subject, consisting of:
gut biomarker comprising Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353 or
microbes, with genomic DNA sequences comprising SEQ ID NO: 1 to 48497
Alternatively, the biomarker set consists of at least one of the species listed in Table 3,  preferably at least 10% , at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 100%of the species listed in Table 3.
In another aspect of present disclosure, there is provided a biomarker set for predicting a disease related to microbiota in a subject consisting of:
a gut biomarker comprising at least a partial sequence of SEQ ID NO: 1 to 48497 as stated in Table 4.
According to embodiments of present disclosure, the disease is obesity or related disease.
In another aspect of present disclosure, there is provided a kit for determining the gene marker set described above, comprising primers used for PCR amplification and designed according to the DNA sequence as set forth in at least a partial sequence of SEQ ID NO: 1 to 48497.
In another aspect of present disclosure, there is provided a kit for determining the gene marker set described above, comprising one or more probes designed according to the genes as set forth in SEQ ID NO: 1 to 48497.
In another aspect of present disclosure, there is provided use of the gene marker set described above for predicting the risk of obesity or related disorder in a subject to be tested, comprising:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 3 in the samples obtained in step (1) ;
(3) obtaining a probability of obesity by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
wherein the probability of obesity greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
According to embodiments of present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having obesity and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
According to embodiments of present disclosure, the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 3, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for obesity and 0 for control.
According to embodiments of present disclosure, the relative abundance information of each of Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162 and Eubacterium hallii DSM 3353 is obtained based on the relative abundance information of SEQ ID NO: 1 to 48497.
According to embodiments of present disclosure, the training dataset is at least one of Tables 5-1, 5-2, 5-3 and 5-4, and the probability of obesity being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
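The data layout and decision rule described above can be sketched as follows. This is a minimal illustration in Python (the disclosure itself works in R); the marker names, sample names and abundance values are invented, and only the row/column orientation, the 1/0 status coding and the 0.5 cutoff are taken from the text.

```python
# Toy training data laid out as in the text: one row per biomarker,
# one column per sample. All names and values here are invented.
biomarkers = ["marker_1", "marker_2", "marker_3"]
samples = ["sample_a", "sample_b", "sample_c", "sample_d"]
abundance = [
    [1.2e-5, 0.0, 3.4e-6, 8.0e-6],  # marker_1 in sample_a..sample_d
    [0.0, 2.1e-6, 0.0, 5.5e-6],     # marker_2
    [7.3e-6, 1.0e-6, 9.9e-6, 0.0],  # marker_3
]
status = [1, 0, 1, 0]  # 1 = obesity, 0 = control, one entry per sample


def to_sample_major(matrix):
    """Transpose a biomarker-by-sample matrix into one feature row per sample."""
    return [list(column) for column in zip(*matrix)]


def at_risk(probability, cutoff=0.5):
    """Apply the decision rule: a probability of at least the cutoff flags risk."""
    return probability >= cutoff


features = to_sample_major(abundance)  # 4 samples x 3 biomarkers
```

Most classifier libraries expect the sample-major orientation, which is why the transpose is shown explicitly.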
In another aspect of present disclosure, there is provided use of the gene marker set described above for preparation of a kit for predicting the risk of obesity or related disorder in a subject to be tested, comprising:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 3 in the samples obtained in step (1) ;
(3) obtaining a probability of obesity by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
wherein the probability of obesity greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
According to embodiments of present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having obesity and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
According to embodiments of present disclosure, the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 3, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for obesity and 0 for control.
According to embodiments of present disclosure, the relative abundance information of each of Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162 and Eubacterium hallii DSM 3353 is obtained based on the relative abundance information of SEQ ID NO: 1 to 48497.
According to embodiments of present disclosure, the training dataset is at least one of Tables 5-1, 5-2, 5-3 and 5-4, and the probability of obesity being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
In another aspect of present disclosure, there is provided a method of diagnosing whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota, comprising:
determining the relative abundance of the biomarkers described above in a sample from the subject, and
determining whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota based on the relative abundance.
According to embodiments of present disclosure, the method comprises:
(1) collecting a sample from the subject to be tested;
(2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 3 in the samples obtained in step (1) ;
(3) obtaining a probability of obesity by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
wherein the probability of obesity greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
According to embodiments of present disclosure, the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having obesity and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
According to embodiments of present disclosure, the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 3, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for obesity and 0 for control.
According to embodiments of present disclosure, the relative abundance information of each of Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162 and Eubacterium hallii DSM 3353 is obtained based on the relative abundance information of SEQ ID NO: 1 to 48497.
According to embodiments of present disclosure, the training dataset is at least one of Tables 5-1, 5-2, 5-3 and 5-4, and the probability of obesity being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
BRIEF DESCRIPTION OF DRAWINGS
These and other aspects and advantages of the present disclosure will become apparent and more readily appreciated from the following descriptions taken in conjunction with the drawings, in which:
Fig. 1 shows the p-value distribution from the obesity association analysis, which identified a disproportionate over-representation of strongly associated markers at lower p-values.
Fig. 2 shows the ROC curve drawn from the probability of illness in the training set; AUC = 0.9651.
Fig. 3 shows the validation of the 54 MLG markers: the inventors built a random forest model, predicted the 42 test samples, and calculated the probability of illness for each sample. The prediction error rate was 8/42 = 19.05%.
Fig. 4 shows the ROC curve for the test set (42 samples), drawn from the probability of illness in the test set; AUC = 0.9188.
Fig. 5 shows the probability of illness calculated for each sample before and after the operation.
Fig. 6 shows the variation of the probability of illness in each sample.
Fig. 7 shows the probability of illness among the three groups. The probability for samples taken after the operation was significantly lower than for samples taken before.
EXAMPLES
Terms used herein have meanings as commonly understood by a person of ordinary skill in the areas relevant to the present invention. Terms such as "a", "an" and "the" are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terminology herein is used to describe specific embodiments of the invention, but their usage does not delimit the invention, except as outlined in the claims.
The present invention is further exemplified in the following non-limiting Examples. Unless otherwise stated, parts and percentages are by weight and degrees are Celsius. As apparent to one of ordinary skill in the art, these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only, and the agents were all commercially available.
Example 1. Identifying biomarkers for evaluating obesity risk
1.1 Sample collection
Fecal samples from 158 Chinese subjects, including 78 obesity patients and 80 control subjects (training set), were collected by Rui Jin Hospital Shanghai Jiao Tong University School of Medicine in 2012. Obesity patients were aged 18 to 30 with a BMI over 25. Subjects were asked to collect fresh fecal samples at the hospital. Collected samples were placed in sterile tubes and immediately stored at -80℃ until further analysis.
Complete ethical approval was obtained, and all patients gave written informed consent. The study was approved by the Institutional Review Board of Rui Jin Hospital Shanghai Jiao Tong University School of Medicine.
1.2 DNA extraction
Fecal samples were thawed on ice and DNA extraction was performed using the Qiagen QIAamp DNA Stool Mini Kit (Qiagen) according to the manufacturer's instructions. Extracts were treated with DNase-free RNase to eliminate RNA contamination. DNA quantity was determined using a NanoDrop spectrophotometer, a Qubit Fluorometer (with the Quant-iT™ dsDNA BR Assay Kit) and gel electrophoresis.
1.3 DNA library construction and sequencing of fecal samples
DNA library construction was performed following the manufacturer's instructions (Illumina, insert size 350 bp, read length 100 bp). The inventors used the same workflow as described previously to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of the sequencing primers. The inventors constructed one paired-end (PE) library with an insert size of 350 bp for each sample, followed by high-throughput sequencing to obtain around 30 million PE reads of length 2x100 bp. High-quality reads were obtained by filtering out low-quality reads with ambiguous 'N' bases, adapter contamination and human DNA contamination from the Illumina raw reads, and by simultaneously trimming low-quality terminal bases of reads.
In total, the inventors generated about 5.9 Gb of fecal microbiota sequencing data (high-quality clean data) per sample (Table 1) from 158 samples (78 cases and 80 controls) on the Illumina HiSeq 2000 platform.
Table 1 Summary of metagenomic data. Fourth column reports results from Wilcoxon rank-sum tests.
Figure PCTCN2014088062-appb-000001
1.4 Metagenomic data processing and analysis
1.4.1 Reads mapping
The inventors used the updated human gut gene catalogue established in Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. (2014) (incorporated herein by reference) and mapped the high-quality reads to it with the alignment criterion of identity >= 90%. The average read mapping rate is shown in Table 1. This mapping rate was close to that of the samples in Li, J. et al. 2014, supra, which indicated that it was sufficient for further study. After read mapping, the inventors derived the gene profile (9.9 million genes) from the mapping result using the same method as Li, J. et al. 2014, supra.
Taxonomic assignment of genes. Taxonomic assignment of the predicted genes was performed using an in-house pipeline which has been described in the published paper (Li, J. et al. 2014, supra).
1.4.2 Data profile construction
Gene profile. Based on the read mapping results, the inventors used the same method described in the published T2D paper (Qin et al. 2012, supra) to compute the relative gene abundance.
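The cited method is not restated in this text. As a hedged illustration only, the Python sketch below assumes a common formulation of relative gene abundance: the number of reads mapped to each gene is divided by the gene length, and the resulting values are normalized to sum to 1 within a sample. The gene names, read counts and lengths are invented.

```python
def relative_gene_abundance(read_counts, gene_lengths):
    """Length-normalize read counts, then scale so the sample sums to 1.

    Assumed formulation (not quoted from the source):
        b_i = reads_i / length_i ;  a_i = b_i / sum_j b_j
    """
    copy_number = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(copy_number.values())
    if total == 0:
        # No mapped reads at all: report zero abundance for every gene.
        return {g: 0.0 for g in copy_number}
    return {g: value / total for g, value in copy_number.items()}


# Invented example: two genes with equal coverage, one gene with no reads.
counts = {"gene1": 300, "gene2": 100, "gene3": 0}
lengths = {"gene1": 1500, "gene2": 500, "gene3": 900}
profile = relative_gene_abundance(counts, lengths)
```

Dividing by gene length keeps long genes from appearing more abundant simply because they attract more reads.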
1.4.3 Analysis of factors influencing gut microbiota gene profile
The inventors used permutational multivariate analysis of variance (PERMANOVA) to assess the effect of 6 clinical parameters, namely age, sex, height, weight, BMI and obesity status, based on the gene profile. The inventors performed the analysis using the method implemented in the "vegan" package in R, and the permuted p-value was obtained from 10,000 permutations. The inventors also corrected for multiple testing using "p.adjust" in R with the Benjamini-Hochberg method to get the q-value for each test. PERMANOVA identified three factors significantly associated with the gut microbiota (based on gene profiles) (q < 0.05, Table 2). The analysis indicated that weight, BMI and obesity status were strongly associated factors, supporting that disease (obesity) status was the major determinant influencing the composition of the gut microbiota.
Table 2 PERMANOVA based on euclidean distance analysis of gene profile. The analysis was conducted to test whether clinical parameters, and obese status have significant impact on the gut microbiota with q-value <0.05.
Phenotype Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
Age 1 0.317034738 0.317034738 1.004112579 0.006395454 0.4094
Sex 1 0.377329497 0.377329497 1.196542903 0.007611763 0.1727
Height 1 0.331409667 0.331409667 1.049947284 0.006685435 0.3291
Weight 1 0.969536515 0.969536515 3.111941857 0.019558192 1.00E-04
BMI 1 0.954186893 0.954186893 3.0617069 0.019248548 1.00E-04
Obese 1 0.972185352 0.972185352 3.120613959 0.019611626 2.00E-04
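The Benjamini-Hochberg step can be illustrated on the six permutation p-values listed in Table 2. This is a Python sketch rather than the R call used by the inventors, and it assumes the correction was applied across exactly these six tests; the q-values it produces are therefore illustrative and are not figures reported in the source.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Pr(>F) column of Table 2.
table2_p = {
    "Age": 0.4094,
    "Sex": 0.1727,
    "Height": 0.3291,
    "Weight": 1.00e-4,
    "BMI": 1.00e-4,
    "Obese": 2.00e-4,
}
names = list(table2_p)
q_values = dict(zip(names, benjamini_hochberg([table2_p[k] for k in names])))
significant = [k for k in names if q_values[k] < 0.05]
```

Under that assumption, the three factors that pass q < 0.05 are weight, BMI and obesity status, in line with the text.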
1.4.4 Identification of obesity associated markers
Identification of obesity associated genes. To identify associations between the metagenomic profile and obesity, a two-tailed Wilcoxon rank-sum test was applied to the profiles of 9,879,897 high-occurrence genes (genes present in fewer than 10 of the 158 samples were removed). This yielded 396,100 gene markers enriched in either cases or controls with p-value < 0.01 (FDR = 3.8%, Fig. 1).
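The filtering and testing steps can be sketched in Python as follows. The rank-sum p-value here uses the normal approximation with a tie correction and no continuity correction, which is an assumption; the source does not say which variant of the test was run. The gene names and abundance values are invented.

```python
import math


def occurrence(values):
    """Number of samples in which the gene is present (abundance > 0)."""
    return sum(1 for v in values if v > 0)


def rank_sum_p(case, control):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted((v, g) for g, grp in enumerate((case, control)) for v in grp)
    n1, n2 = len(case), len(control)
    n = n1 + n2
    rank_sum_case, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1..j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_case += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_case - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


# Invented profiles: (abundance in cases, abundance in controls).
genes = {
    "gene_enriched": ([5, 6, 7, 8, 9, 10, 11, 12, 13, 14], [0, 0, 1, 2, 3, 4, 0, 0, 1, 2]),
    "gene_rare": ([0] * 10, [0] * 9 + [1]),
}
# Keep genes present in at least 10 samples, then test case vs control.
tested = {g: rank_sum_p(c, k) for g, (c, k) in genes.items() if occurrence(c + k) >= 10}
markers = [g for g, p in tested.items() if p < 0.01]
```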
Estimating the false discovery rate (FDR). Instead of a sequential p-value rejection method, the inventors applied the "q-value" method proposed in a previous study to estimate the FDR (Storey, J. D. A direct approach to false discovery rates. Journal of the Royal Statistical Society 64, 479-498 (2002), incorporated herein by reference).
Receiver Operator Characteristic (ROC) analysis. The inventors applied the ROC analysis to assess the performance of the obesity classification based on metagenomic markers. The inventors then used the “pROC” package in R to draw the ROC curve.
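For readers unfamiliar with ROC analysis, the following Python sketch shows how an ROC curve and its AUC can be computed from predicted probabilities and true labels. It is a plain re-implementation for illustration, not the pROC code used by the inventors, and the scores and labels are invented.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, sweeping the cutoff over every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("ROC needs at least one positive and one negative sample")
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: 3 cases (label 1) and 3 controls (label 0).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
```

In this toy set, 8 of the 9 case-control pairs are ranked correctly, so the area is 8/9.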
1.4.5 Construction of MLG and identification of obesity associated MLG species markers
237 MLG species based on the 396,100 obesity-associated marker gene profiles. The inventors used the 396,100 gene markers to build metagenomic linkage groups (MLGs) using the same method described in the published T2D paper (Qin et al. 2012, supra). All 396,100 genes were annotated by aligning them to the 4,653 reference genomes in IMG v400. An MLG was assigned to a genome if more than 50% of its constituent genes were annotated to that genome; otherwise it was termed unclassified. In total, 237 MLGs with gene number > 100 were selected (P-value < 0.01). To estimate the relative abundance of an MLG species, the inventors averaged the abundance of the genes of the MLG species after removing the 5% least abundant and 5% most abundant genes (Qin et al. 2012, supra).
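The two rules in this paragraph, genome assignment by majority annotation and abundance estimation by a trimmed mean, can be sketched in Python as below. How the 5% is rounded to a whole number of genes is not stated in the source; rounding down is an assumption made here, and the annotations and abundance values are invented.

```python
from collections import Counter


def assign_genome(gene_annotations):
    """Assign an MLG to a genome if more than 50% of its genes map to it.

    gene_annotations holds one genome name per gene, or None if unannotated.
    """
    counts = Counter(a for a in gene_annotations if a is not None)
    if counts:
        genome, hits = counts.most_common(1)[0]
        if hits > 0.5 * len(gene_annotations):
            return genome
    return "unclassified"


def mlg_abundance(gene_abundances, trim=0.05):
    """Mean gene abundance after dropping the lowest and highest 5% of genes."""
    ordered = sorted(gene_abundances)
    k = int(len(ordered) * trim)  # genes dropped at each end (rounded down)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Invented 20-gene MLG with one very low and one very high gene.
toy_genes = [0.0] + [1.0] * 18 + [100.0]
toy_abundance = mlg_abundance(toy_genes)
toy_genome = assign_genome(["A", "A", "A", None, "B"])
```

Trimming both tails keeps a single mis-assembled or highly conserved gene from dominating the MLG estimate.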
1.5 MLG-based classifier
A random forest model (R 2.14, randomForest 4.6-7 package) (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference) was trained using the MLG abundance profile of the training cohort (158 samples) to select the optimal set of MLG markers. The model was tested on one or more testing sets and the prediction error was calculated.
The randomForest model was run with the randomForest 4.6-7 package in R version 2.14. The input is a training dataset (namely the relative abundance profiles of the selected MLGs in the training samples), the sample disease status (a vector for the training samples, with 1 for obesity and 0 for control), and a test set (just the relative abundance profiles of the selected MLGs in the test set). The inventors used the randomForest function from the randomForest package in R to build the classifier, and the predict function to predict the test set. The output is the prediction result (probability of illness; the cutoff is 0.5, and if the probability of illness is ≥ 0.5, the subject is at risk of obesity).
54 MLG species marker identification. To identify MLG species markers, the inventors used the randomForest 4.6-7 package in R version 2.14 on the 237 obesity-associated MLG species. First, the inventors sorted all 237 MLG species by the importance given by the "randomForest" method (Liaw, Andy & Wiener, Matthew. Classification and Regression by randomForest, R News (2002), Vol. 2/3 p. 18, incorporated herein by reference). MLG marker sets were constructed by creating incremental subsets of the top-ranked MLG species, starting from 1 MLG species and ending at all 237 MLG species. For each MLG marker set, the inventors calculated the false prediction ratio in the 158 samples. Finally, the set of 54 MLG species with the lowest false prediction ratio was selected as the MLG species markers (Table 3 and Table 4). Furthermore, the inventors drew the ROC curve using the OOB (out-of-bag) predicted probability of illness from the randomForest model based on the selected MLG species markers (Tables 5-0, 5-1, 5-2, 5-3 and 5-4), and the area under the ROC curve (AUC) was 0.9651 in the 158 samples (Fig. 2). At the best cutoff of 0.5294, the true positive rate (TPR) was 0.8625 and the false positive rate (FPR) was 0.07692, indicating that the 54 MLG markers could be used to accurately classify obese individuals.
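The incremental subset search described in this paragraph can be sketched in Python as follows. The error function is passed in as a callable because, in the source, it is the false prediction ratio of a random forest on the 158 training samples; here it is replaced by an invented lookup table so that the sketch runs on its own. Keeping the smaller set when two sets tie is also an assumption.

```python
def select_marker_set(ranked_markers, error_rate):
    """Pick the top-k prefix of an importance-ranked list with the lowest error.

    ranked_markers: markers sorted from most to least important.
    error_rate: callable that returns the false prediction ratio for a subset.
    """
    best_subset, best_error = None, float("inf")
    for k in range(1, len(ranked_markers) + 1):
        subset = ranked_markers[:k]
        err = error_rate(subset)
        if err < best_error:  # strict comparison keeps the smaller set on ties
            best_subset, best_error = subset, err
    return best_subset, best_error


# Hypothetical ranking and error values, for illustration only.
toy_ranking = ["m1", "m2", "m3", "m4", "m5"]
toy_errors = {1: 0.30, 2: 0.22, 3: 0.15, 4: 0.15, 5: 0.18}
chosen, chosen_error = select_marker_set(toy_ranking, lambda s: toy_errors[len(s)])
```

In the source, the same kind of scan over 1 to 237 ranked MLG species ended at the set of 54 markers.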
Table 3 54 most discriminant MLGs (species markers) associated with obesity
Figure PCTCN2014088062-appb-000002
Figure PCTCN2014088062-appb-000003
Figure PCTCN2014088062-appb-000004
Table 4. SEQ ID of the 54 MLG species
MLG ID SEQ ID NO: genes number
mlg_id: 11523 1~1762 1762
mlg_id: 127 1763~1867 105
mlg_id: 3640 1868~2332 465
mlg_id: 13556 2333~2457 125
mlg_id: 10626 2458~2598 141
mlg_id: 3600 2599~3865 1267
mlg_id: 139 3866~5780 1915
mlg_id: 3231 5781~5888 108
mlg_id: 15383 5889~6022 134
mlg_id: 6315 6023~6157 135
mlg_id: 2514 6158~8015 1858
mlg_id: 10488 8016~8816 801
mlg_id: 141 8817~8982 166
mlg_id: 8678 8983~10386 1404
mlg_id: 844 10387~12442 2056
mlg_id: 13519 12443~12728 286
mlg_id: 12243 12729~12944 216
mlg_id: 74 12945~13047 103
mlg_id: 15399 13048~13425 378
mlg_id: 252 13426~13576 151
mlg_id: 12049 13577~13848 272
mlg_id: 31817 13849~14267 419
mlg_id: 8585 14268~17734 3467
mlg_id: 61 17735~17890 156
mlg_id: 10628 17891~18031 141
mlg_id: 1180 18032~18234 203
mlg_id: 115 18235~19185 951
mlg_id: 11108 19186~20157 972
mlg_id: 994 20158~20261 104
mlg_id: 13037 20262~20391 130
mlg_id: 20358 20392~20644 253
mlg_id: 10879 20645~20749 105
mlg_id: 2728 20750~21085 336
mlg_id: 605 21086~23716 2631
mlg_id: 10526 23717~24188 472
mlg_id: 2060 24189~28313 4125
mlg_id: 4035 28314~28498 185
mlg_id: 7361 28499~30352 1854
mlg_id: 15056 30353~30659 307
mlg_id: 12221 30660~33317 2658
mlg_id: 8423 33318~33423 106
mlg_id: 4404 33424~35102 1679
mlg_id: 6546 35103~35487 385
mlg_id: 11335 35488~35965 478
mlg_id: 3647 35966~38270 2305
mlg_id: 94 38271~38506 236
mlg_id: 13978 38507~40389 1883
mlg_id: 12929 40390~40550 161
mlg_id: 3665 40551~40691 141
mlg_id: 12935 40692~41880 1189
mlg_id: 31770 41881~42212 332
mlg_id: 5711 42213~44891 2679
mlg_id: 15213 44892~47552 2661
mlg_id: 2245 47553~48497 945
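Because Table 4 assigns each MLG a contiguous block of SEQ ID NOs, a sequence number can be mapped back to its MLG with a range lookup. The Python sketch below uses only the first four rows of Table 4; the function and variable names are hypothetical.

```python
import bisect

# First four rows of Table 4: (MLG ID, first SEQ ID NO, last SEQ ID NO).
TABLE4_ROWS = [
    (11523, 1, 1762),
    (127, 1763, 1867),
    (3640, 1868, 2332),
    (13556, 2333, 2457),
]
RANGE_STARTS = [first for _, first, _ in TABLE4_ROWS]


def mlg_for_seq_id(seq_id):
    """Return the MLG ID whose SEQ ID range contains seq_id, or None."""
    idx = bisect.bisect_right(RANGE_STARTS, seq_id) - 1
    if idx >= 0:
        mlg_id, first, last = TABLE4_ROWS[idx]
        if first <= seq_id <= last:
            return mlg_id
    return None


def gene_count(first, last):
    """Number of genes in an inclusive SEQ ID range."""
    return last - first + 1


counts = [gene_count(first, last) for _, first, last in TABLE4_ROWS]
```

The computed counts for these rows match the "genes number" column of the table.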
Table 5-0 Prediction results of 54 MLGs in 158 samples
Figure PCTCN2014088062-appb-000005
Figure PCTCN2014088062-appb-000006
Figure PCTCN2014088062-appb-000007
Figure PCTCN2014088062-appb-000008
Figure PCTCN2014088062-appb-000009
Figure PCTCN2014088062-appb-000010
Figure PCTCN2014088062-appb-000011
Figure PCTCN2014088062-appb-000012
Figure PCTCN2014088062-appb-000013
Figure PCTCN2014088062-appb-000014
Figure PCTCN2014088062-appb-000015
Figure PCTCN2014088062-appb-000016
Figure PCTCN2014088062-appb-000017
Figure PCTCN2014088062-appb-000018
Figure PCTCN2014088062-appb-000019
Figure PCTCN2014088062-appb-000020
Figure PCTCN2014088062-appb-000021
Figure PCTCN2014088062-appb-000022
Figure PCTCN2014088062-appb-000023
Figure PCTCN2014088062-appb-000024
Figure PCTCN2014088062-appb-000025
Figure PCTCN2014088062-appb-000026
Example 2. Validating the 54 MLG biomarkers in 42 samples (test set)
The inventors validated the discriminatory power of the obesity classifier using a new independent study group, including 17 obesity patients and 25 non-obesity controls, whose samples were also collected at Rui Jin Hospital Shanghai Jiao Tong University School of Medicine.
For each sample, DNA was extracted and a DNA library was constructed followed by high throughput sequencing as described in Example 1. The inventors calculated the gene abundance profile for these samples using the same method as described in Qin et al. 2012, supra. Then the gene relative abundance of each of the markers as set forth in SEQ ID NOs: 1-48497 was determined. The inventors estimated the relative abundance of a MLG in all samples by using the relative abundance values of genes from this MLG (Qin et al. 2012, supra) .
To validate the 54 MLG markers, the inventors built a random forest model using the randomForest 4.6-7 package in R version 2.14. The input is a training dataset (namely the relative abundance profiles of the selected MLGs in the training samples, Tables 5-1, 5-2, 5-3 and 5-4), the sample disease status (a vector for the training samples, with 1 for obesity and 0 for control), and a test set (just the relative abundance profiles of the selected MLGs in the test set). The inventors used the randomForest function from the randomForest package in R to build the classifier, and the predict function to predict the test set. The output is the prediction result (probability of illness; the cutoff is 0.5, and if the probability of illness is ≥ 0.5, the subject is at risk of obesity).
The inventors then predicted the 42 samples and calculated the probability of illness for each sample (Table 6). The prediction error rate was 8/42 = 19.05% (Fig. 3), and most obesity patients (16/17) were correctly classified as obese. The ROC curve for the test set was drawn from the probability of illness in the test set, with AUC = 0.9188 (Fig. 4), validating that the 54 MLG markers could be used to accurately classify obese individuals. At the best cutoff of 0.592, the true positive rate (TPR) was 0.8824 and the false positive rate (FPR) was 0.16.
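The figures in this paragraph can be cross-checked with simple arithmetic. The Python sketch below assumes that the 8 errors and the 16/17 figure both refer to the 0.5 cutoff, so that the single missed obesity patient is the only false negative and the remaining errors are misclassified controls; that split is inferred, not stated in the source.

```python
n_obese, n_control = 17, 25
n_errors = 8
correct_obese = 16

# Overall prediction error rate, as quoted in the text.
error_rate_pct = round(100 * n_errors / (n_obese + n_control), 2)

# Inferred split of the errors at the 0.5 cutoff (assumption, see lead-in).
false_negatives = n_obese - correct_obese
false_positives = n_errors - false_negatives

# Sample counts implied by the rates quoted at the best cutoff of 0.592.
tp_at_best = round(0.8824 * n_obese)   # obesity patients above the cutoff
fp_at_best = round(0.16 * n_control)   # controls above the cutoff
```

Under that assumption, 7 of the 25 controls are misclassified at the 0.5 cutoff, whereas the quoted rates at the 0.592 cutoff correspond to 15 of 17 patients and 4 of 25 controls scoring above it.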
Table 6 Prediction results of 54 MLGs in 42 samples
Figure PCTCN2014088062-appb-000027
Figure PCTCN2014088062-appb-000028
Example 3. Validating the 54 MLG biomarkers in 22 samples (test set)
The inventors validated the discriminatory power of the obesity classifier using another test set (Table 7), including 9 case samples and 13 control samples (5 samples taken 1 month after the operation and 8 samples taken 3 months after the operation), which were also collected at Rui Jin Hospital Shanghai Jiao Tong University School of Medicine. Case samples were taken before the operation; control samples were taken 1 month or 3 months after the operation.
Table 7 22 samples information
Before 1-M 3-M
DB62 DB-S1-62 DB-S3-62
DB67 - DB-S3-67
DB68 DB-S1-68 DB-S3-68
DB78 - DB-S3-78
DB85 DB-S1-85 -
DB124 DB-S1-124 DB-S3-124
DB125 DB-S1-125 DB-S3-125
DB126 - DB-S3-126
DB01 - DB-S3-01
*Before: before the operation; 1-M: one month after the operation; 3-M: three months after the operation; -: no sample.
For each sample, DNA was extracted and a DNA library was constructed followed by high throughput sequencing as described in Example 1. The inventors calculated the gene abundance profile for these samples using the same method as described in Qin et al. 2012, supra. Then the gene relative abundance of each of the markers as set forth in SEQ ID NOs: 1-48497 was determined. The inventors estimated the relative abundance of a MLG in all samples by using the relative abundance values of genes from this MLG (Qin et al. 2012, supra) .
To validate the 54 MLG markers, the inventors built a random forest model using the randomForest 4.6-7 package in R version 2.14. The input is a training dataset (namely the relative abundance profiles of the selected MLGs in the training samples, Tables 5-1, 5-2, 5-3 and 5-4), the sample disease status (a vector for the training samples, with 1 for obesity and 0 for control), and a test set (just the relative abundance profiles of the selected MLGs in the test set). The inventors used the randomForest function from the randomForest package in R to build the classifier, and the predict function to predict the test set. The output is the prediction result (probability of illness; the cutoff is 0.5, and if the probability of illness is ≥ 0.5, the subject is at risk of obesity).
The probability of illness was calculated for each sample before and after the operation. The results show that the probability of illness decreased after the operation (Fig. 5, Table 8). The error rate was 9% (2/22).
The variation in the probability of illness for each sample is shown in Fig. 6. Among the three groups, the probability of illness for samples taken after the operation was significantly lower than for samples taken before (Fig. 7, Table 9), validating that the 54 MLG markers could be used to accurately classify obese individuals.
Table 8 Prediction results of 54 MLGs in 22 samples
Samples (DB: obesity) probability of obesity
DB85 0.882
DB68 0.826
DB78 0.822
DB01 0.81
DB125 0.776
DB126 0.732
DB62 0.662
DB.S1.68 0.654
DB124 0.606
DB67 0.592
DB.S1.62 0.5
DB.S3.126 0.496
DB.S1.125 0.476
DB.S3.125 0.458
DB.S3.62 0.412
DB.S1.85 0.396
DB.S3.124 0.368
DB.S3.67 0.354
DB.S1.124 0.34
DB.S3.01 0.336
DB.S3.68 0.256
DB.S3.78 0.252
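The 2/22 error rate can be reproduced from the values in Table 8. The Python sketch below applies the stated rule (a probability of at least 0.5 indicates obesity) and treats pre-operation samples as cases and post-operation samples as controls; identifying post-operation samples by the ".S" in their names is an assumption about the naming scheme.

```python
# (sample, probability of obesity) pairs copied from Table 8.
table8 = [
    ("DB85", 0.882), ("DB68", 0.826), ("DB78", 0.822), ("DB01", 0.81),
    ("DB125", 0.776), ("DB126", 0.732), ("DB62", 0.662), ("DB.S1.68", 0.654),
    ("DB124", 0.606), ("DB67", 0.592), ("DB.S1.62", 0.5), ("DB.S3.126", 0.496),
    ("DB.S1.125", 0.476), ("DB.S3.125", 0.458), ("DB.S3.62", 0.412),
    ("DB.S1.85", 0.396), ("DB.S3.124", 0.368), ("DB.S3.67", 0.354),
    ("DB.S1.124", 0.34), ("DB.S3.01", 0.336), ("DB.S3.68", 0.256),
    ("DB.S3.78", 0.252),
]


def is_case(sample_name):
    """Pre-operation (case) samples have no '.S1.'/'.S3.' time-point tag."""
    return ".S" not in sample_name


# A sample is misclassified when the cutoff decision disagrees with its group.
misclassified = [name for name, p in table8 if (p >= 0.5) != is_case(name)]
error_rate = len(misclassified) / len(table8)
```

Both misclassified samples are 1-month post-operation samples, and one of them sits exactly on the 0.5 cutoff.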
Table 9 P value of probability of illness in three groups
Group Before 1-M 3-M
Before - 0.0007437 4.114e-05
1-M - - 0.8584
Thus the inventors have identified and validated 54 obesity-associated gut microbes by a random forest model based on obesity-associated gene markers. The inventors have also constructed a method to evaluate the risk of obesity based on these 54 obesity-associated gut microbes.
Although explanatory embodiments have been shown and described, it would be appreciated by those skilled in the art that the above embodiments cannot be construed to limit the present disclosure, and changes, alternatives, and modifications can be made in the embodiments without departing from spirit, principles and scope of the present disclosure.

Claims (21)

  1. A biomarker set for predicting a disease related to microbiota in a subject consisting of:
    gut biomarker comprising Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, or
    microbes, with genomic DNA sequences comprising SEQ ID NO: 1 to 48497.
  2. A biomarker set for predicting a disease related to microbiota in a subject according to claim 1 consisting of:
    a gut biomarker comprises at least a partial sequence of SEQ ID NO: 1 to 48497.
  3. The biomarker set for predicting a disease related to microbiota in a subject of claim 1 or 2, wherein the disease is obesity or related disease.
  4. A kit for determining the gene marker set of any one of claims 1 to 3, comprising primers used for PCR amplification and designed according to the DNA sequence as set forth in claim 2.
  5. A kit for determining the gene marker set of any one of claims 1 to 3, comprising one or more probes designed according to the genes as set forth in claim 2.
  6. Use of the gene marker set of any one of claims 1 to 3 for predicting the risk of obesity or related disorder in a subject to be tested, comprising:
    (1) collecting a sample from the subject to be tested;
    (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 3 in the samples obtained in step (1);
    (3) obtaining a probability of obesity by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
    wherein the probability of obesity greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
  7. The use of claim 6, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having obesity and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
  8. The use of claim 7, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 3, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for obesity and 0 for control.
  9. The use of claim 7, wherein the relative abundance information of each of Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162 and Eubacterium hallii DSM 3353 is obtained based on the relative abundance information of SEQ ID NO: 1 to 48497.
  10. The use of claim 7, wherein the training dataset is at least one of Tables 5-1, 5-2, 5-3 and 5-4, and the probability of obesity being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
  11. Use of the gene marker set of any one of claims 1 to 3 for preparation of a kit for predicting the risk of obesity or related disorder in a subject to be tested, comprising:
    (1) collecting a sample from the subject to be tested;
    (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 3 in the samples obtained in step (1);
    (3) obtaining a probability of obesity by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
    wherein the probability of obesity greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
  12. The use of claim 11, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having obesity and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
  13. The use of claim 12, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 3, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for obesity and 0 for control.
  14. The use of claim 12, wherein the relative abundance information of each of Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162 and Eubacterium hallii DSM 3353 is obtained based on the relative abundance information of SEQ ID NO: 1 to 48497.
  15. The use of claim 12, wherein the training dataset is at least one of Tables 5-1, 5-2, 5-3 and 5-4, and the probability of obesity being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
  16. A method of diagnosing whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota, comprising:
    determining the relative abundance of the biomarkers of any one of claims 1 to 3 in a sample from the subject, and
    determining whether a subject has an abnormal condition related to microbiota or is at the risk of developing an abnormal condition related to microbiota based on the relative abundance.
  17. The method according to claim 16, comprising:
    (1) collecting a sample from the subject to be tested;
    (2) determining the relative abundance information of each biomarker of the biomarker set according to any one of claims 1 to 3 in the samples obtained in step (1);
    (3) obtaining a probability of obesity by comparing the relative abundance information of each biomarker of subject to be tested with a training dataset using a Multivariate statistical model,
    wherein the probability of obesity greater than a cutoff indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
  18. The method of claim 17, wherein the training dataset is constructed based on the relative abundance information of each biomarker of a plurality of subjects having obesity and a plurality of normal subjects using a Multivariate statistical model, alternatively the Multivariate statistical model is a randomForest model.
  19. The method of claim 18, wherein the training dataset is a matrix with each row representing each biomarker of the biomarker set according to any one of claims 1 to 3, each column representing samples, each cell representing relative abundance profile of a biomarker in the sample, and sample disease status is a vector, with 1 for obesity and 0 for control.
  20. The method of claim 18, wherein the relative abundance information of each of Bacteroides intestinalis DSM 17393, Alistipes shahii WAL 8301, Faecalibacterium prausnitzii L2-6, Bacteroides ovatus 3_8_47FAA, Bacteroides intestinalis DSM 17393, Bacteroides sp. 1_1_30, Coprococcus eutactus ATCC 27759, Klebsiella pneumoniae 342, Veillonella sp. oral taxon 158 str. F0412, Bacteroides sp. 1_1_30, Dialister invisus DSM 15470, Bacteroides intestinalis DSM 17393, Faecalibacterium prausnitzii L2-6, Bacteroides sp. 3_1_33FAA, Faecalibacterium cf. prausnitzii KLE1255, Klebsiella oxytoca KCTC 1686, Bacteroides thetaiotaomicron VPI-5482, Bacteroides ovatus 3_8_47FAA, Haemophilus parainfluenzae T3T1, Haemophilus parainfluenzae T3T1, Faecalibacterium prausnitzii L2-6, Bacteroides intestinalis DSM 17393, Haemophilus parainfluenzae T3T1, Ruminococcus sp. 5_1_39BFAA, Coprococcus comes ATCC 27758, Eubacterium hallii DSM 3353, Dorea formicigenerans ATCC 27755, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Dorea longicatena DSM 13814, Ruminococcus obeum A2-162, Eubacterium hallii DSM 3353, Ruminococcus torques L2-14, Dorea longicatena DSM 13814, Collinsella aerofaciens ATCC 25986, Ruminococcus obeum A2-162 and Eubacterium hallii DSM 3353 is obtained based on the relative abundance information of SEQ ID NO: 1 to 48497.
  21. The method of claim 18, wherein the training dataset is at least one of Tables 5-1, 5-2, 5-3 and 5-4, and the probability of obesity being at least 0.5 indicates that the subject to be tested has or is at the risk of developing the obesity or related disorder.
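
The data layout and decision rule recited in the claims above (a matrix with biomarkers as rows and samples as columns, a 1/0 status vector, and a probability cutoff of 0.5) can be sketched as follows. This is an illustration under stated assumptions, not part of the claims: the counts are invented, the function names are hypothetical, and relative abundance is taken here simply as each marker's share of a sample's total, which may differ from the normalisation used in the disclosure.

```python
def to_relative_abundance(counts):
    """Scale each sample (column) so that its marker values sum to 1.

    `counts` is a list of rows, one row per biomarker and one column per
    sample. Share-of-total normalisation is an assumption of this sketch.
    """
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for row in counts
    ]


def at_risk(probability, cutoff=0.5):
    """Decision rule: a probability of at least the cutoff flags the subject."""
    return probability >= cutoff


# Hypothetical counts: 3 biomarkers (rows) x 2 samples (columns).
counts = [
    [30, 5],
    [10, 15],
    [60, 80],
]
status = [1, 0]  # one entry per sample: 1 = obesity, 0 = control

training_matrix = to_relative_abundance(counts)
print(training_matrix)
print(at_risk(0.62), at_risk(0.31))
```

The status vector has one entry per column of the matrix, so a model trained on this layout reads samples column by column.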
PCT/CN2014/088062 2014-09-30 2014-09-30 Biomarkers for obesity related diseases WO2016049932A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480082465.4A CN107075446B (en) 2014-09-30 2014-09-30 Biomarkers for obesity related diseases
PCT/CN2014/088062 WO2016049932A1 (en) 2014-09-30 2014-09-30 Biomarkers for obesity related diseases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/088062 WO2016049932A1 (en) 2014-09-30 2014-09-30 Biomarkers for obesity related diseases

Publications (1)

Publication Number Publication Date
WO2016049932A1 (en) 2016-04-07

Family

ID=55629355

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/088062 WO2016049932A1 (en) 2014-09-30 2014-09-30 Biomarkers for obesity related diseases

Country Status (2)

Country Link
CN (1) CN107075446B (en)
WO (1) WO2016049932A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10357521B2 (en) 2015-05-14 2019-07-23 University Of Puerto Rico Methods for restoring microbiota of newborns
EP3520799A1 (en) * 2018-02-06 2019-08-07 European Molecular Biology Laboratory In-vitro model of the human gut microbiome and uses thereof in the analysis of the impact of xenobiotics
WO2020123787A1 (en) * 2018-12-12 2020-06-18 University Of Pittsburgh-Of The Commonwealth System Of Higher Education Integrated proviral sequencing assay
CN112154202A (en) * 2018-05-04 2020-12-29 4D制药研究有限公司 Simulating an intestinal environment
EP3697219A4 (en) * 2017-10-16 2021-07-14 Mayo Foundation for Medical Education and Research Methods and materials for identifying and treating mammals responsive to obesity treatments
US11564667B2 (en) 2015-12-28 2023-01-31 New York University Device and method of restoring microbiota of newborns
WO2023091883A3 (en) * 2021-11-19 2023-08-24 University Of Georgia Research Foundation, Inc. Dual-specificity rna aptamers for regulating o-glcnacylation

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019041140A1 (en) * 2017-08-29 2019-03-07 深圳华大基因研究院 Application of alistipes shahii in preparing a composition for preventing and/or treating lipid metabolism related diseases
WO2019204985A1 (en) * 2018-04-24 2019-10-31 深圳华大生命科学研究院 Osteoporosis biomarker and use thereof
WO2019205188A1 (en) * 2018-04-24 2019-10-31 深圳华大生命科学研究院 Biomarker for depression and use thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1978107A1 (en) * 2007-04-03 2008-10-08 Centre National De La Recherche Scientifique (Cnrs) Fto gene polymorphisms associated to obesity and/or type II diabetes
CA2776420A1 (en) * 2009-10-05 2011-04-14 Aak Patent B.V. Methods for diagnosing irritable bowel syndrome
NZ602473A (en) * 2010-03-01 2015-01-30 Agronomique Inst Nat Rech Method of diagnostic of obesity
US10973861B2 (en) * 2013-02-04 2021-04-13 Seres Therapeutics, Inc. Compositions and methods
WO2014145958A2 (en) * 2013-03-15 2014-09-18 Seres Health, Inc. Network-based microbial compositions and methods

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AJAY DUSEJA ET AL.: "Obesity and NAFLD: The role of Bacteria and Microbiota", CLINICS IN LIVER DISEASE, vol. 19, no. 1, 24 October 2013 (2013-10-24), pages 59 - 71 *
CHIH-MIN CHIU ET AL.: "Systematic Analysis of the Association between Gut Flora and Obesity through High-Throughput Sequencing and Bioinformatics Approaches", BIOMED RESEARCH INTERNATIONAL, vol. 2014, 14 August 2014 (2014-08-14), pages 1 - 10 *
JUNJIE QIN ET AL.: "A human gut microbial gene catalog established by metagenomic sequencing", NATURE, vol. 464, no. 7285, 23 September 2013 (2013-09-23), pages 59 - 65 *
YUANHUA LIU ET AL.: "Adapting functional genomic tools to metagenomic analyses: investigating the role of gut bacteria in relation to obesity", BRIEFINGS IN FUNCTIONAL GENOMICS, vol. 9, no. 5, 6 May 2010 (2010-05-06), pages 355 - 361 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10357521B2 (en) 2015-05-14 2019-07-23 University Of Puerto Rico Methods for restoring microbiota of newborns
US11564667B2 (en) 2015-12-28 2023-01-31 New York University Device and method of restoring microbiota of newborns
EP3697219A4 (en) * 2017-10-16 2021-07-14 Mayo Foundation for Medical Education and Research Methods and materials for identifying and treating mammals responsive to obesity treatments
EP3520799A1 (en) * 2018-02-06 2019-08-07 European Molecular Biology Laboratory In-vitro model of the human gut microbiome and uses thereof in the analysis of the impact of xenobiotics
WO2019154823A1 (en) * 2018-02-06 2019-08-15 European Molecular Biology Laboratory In-vitro model of the human gut microbiome and uses thereof in the analysis of the impact of xenobiotics
CN112154202A (en) * 2018-05-04 2020-12-29 4D制药研究有限公司 Simulating an intestinal environment
JP2021521871A (en) * 2018-05-04 2021-08-30 フォーディー ファーマ リサーチ リミテッド4D Pharma Research Limited Simulated intestinal environment
WO2020123787A1 (en) * 2018-12-12 2020-06-18 University Of Pittsburgh-Of The Commonwealth System Of Higher Education Integrated proviral sequencing assay
WO2023091883A3 (en) * 2021-11-19 2023-08-24 University Of Georgia Research Foundation, Inc. Dual-specificity rna aptamers for regulating o-glcnacylation

Also Published As

Publication number Publication date
CN107075446A (en) 2017-08-18
CN107075446B (en) 2022-01-21

Similar Documents

Publication Publication Date Title
WO2016049932A1 (en) Biomarkers for obesity related diseases
US20190367995A1 (en) Biomarkers for colorectal cancer
CN107075563B (en) Biomarkers for coronary artery disease
CN107217089B (en) Method and device for determining individual state
WO2020244017A1 (en) Intestinal flora-based schizophrenia biomarker combination, and applications thereof and motu screening method therefor
EP3245298B1 (en) Biomarkers for colorectal cancer related diseases
CN110904213B (en) Ulcerative colitis biomarker based on intestinal flora and application thereof
CN111500705B (en) IgAN intestinal flora marker, igAN metabolite marker and application thereof
CN107217088B (en) Ankylosing spondylitis microbial markers
CN112899368A (en) Biomarker for early diagnosis of primary hepatocellular carcinoma, detection reagent and application thereof
CN110838365A (en) Irritable bowel syndrome related flora marker and kit thereof
CN111334590A (en) Kit for identifying colorectal cancer and application thereof
CN110643721A (en) Kit for detecting colorectal cancer indicator bacteria
WO2017156739A1 (en) Isolated nucleic acid application thereof
WO2016049927A1 (en) Biomarkers for obesity related diseases
CN114891904A (en) Maternal intestinal flora marker for children ASD diagnosis and application thereof
CN107217086B (en) Disease marker and application
CN105733988B (en) Composition and application
CN105671177B (en) Ankylosing spondylitis marker and application thereof
CN108064273B (en) Biomarkers for colorectal cancer-related diseases
WO2016049917A1 (en) Biomarkers for obesity related diseases
CN109072278A (en) Isolated nucleic acid and application
JP4461263B2 (en) Method for obtaining data for enabling early diagnosis of Dravet syndrome and use thereof
CN110781915A (en) Method for improving colorectal cancer indicator bacterium detection sensitivity by applying support vector machine algorithm
CN114058695B (en) Application of urinary tract flora detection in female urinary tract calculus diagnosis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14903143

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14903143

Country of ref document: EP

Kind code of ref document: A1