CN107075562B

CN107075562B - Biomarkers for obesity related diseases

Info

Publication number: CN107075562B
Application number: CN201480082372.1A
Authority: CN
Inventors: 冯强; 张东亚; 唐龙清; 王俊
Original assignee: BGI Shenzhen Co Ltd
Current assignee: BGI Shenzhen Co Ltd
Priority date: 2014-09-30
Filing date: 2014-09-30
Publication date: 2021-09-24
Anticipated expiration: 2034-09-30
Also published as: CN107075562A; WO2016049917A1

Abstract

Biomarkers and methods for predicting the risk of a microorganism-associated disease, particularly obesity or related diseases, are provided.

Description

Biomarkers for obesity related diseases

Cross Reference to Related Applications

None

Technical Field

The present invention relates to biomarkers and methods for predicting the risk of a disease associated with a microorganism, in particular obesity or related diseases.

Background

Obesity is common in developed countries and is increasing significantly worldwide (de Carvalho Pereira et al, 2013). The worldwide prevalence of combined overweight and obesity is reported to have risen by 27.5% in adults and 47.1% in children between 1980 and 2013. The number of overweight people increased from 857 million in 1980 to 2.1 billion in 2013, with 671 million people affected by obesity. More than 50% of obese individuals live in ten countries; the United States has the largest number of obese people, followed by China (Ng et al, 2014).

There is increasing evidence that patients diagnosed as overweight by physicians are more likely to lose weight than patients not diagnosed as overweight. However, the low diagnosis rate of physicians is associated with recommendations for health risk factors for obesity-related behavior (Bleich et al, 2011).

In children, the diagnosis of obesity is based on age- and gender-specific Body Mass Index (BMI) cut-off points. This is in contrast to adults, for whom obesity is diagnosed from BMI regardless of age or gender. Because the diagnostic criteria for children are more complex than those for adults and the terminology for childhood obesity has changed over time, few obese children are diagnosed accurately (Walsh et al, 2013). Furthermore, the limitations of applying the same BMI criteria across different populations should be considered (Nevill et al, 2006). Waist Circumference (WC) can be considered a reliable and useful tool for assessing abdominal adiposity in epidemiological studies, but such measurements appear to be more difficult to perform (Miguel-Etayo et al, 2014). In addition, regional studies of childhood obesity diagnosis using the International Classification of Diseases, Ninth Revision (ICD-9), the National Ambulatory Medical Care Survey (NAMCS), and the National Hospital Ambulatory Medical Care Survey (NHAMCS) showed relatively low sensitivity of clinical diagnosis (Walsh et al, 2013).

Recent observations indicate that the human gut microbiota can play an important role in obesity. Early reports based on 16S rRNA gene amplicon sequencing indicated that the ratio of Firmicutes to Bacteroidetes in stool samples from 12 obese humans was much higher than in two lean controls (Ley et al, 2006). Recent observational studies employing metagenomic sequencing in human obesity have demonstrated reduced bacterial diversity, relative depletion of Bacteroidetes, and enrichment of genes involved in carbohydrate and lipid metabolism (Allin and Pedersen, 2014). These findings indicate that alterations in the gut microbiota are a causative factor in the pathogenesis of obesity, which suggests that characteristics of the gut microbiota could be used as a criterion for the diagnosis of obesity.

In summary, the diagnosis of obesity suffers from many missed opportunities and low sensitivity. There is a need to develop more effective (less biased) assessments of overweight and/or obesity.

Disclosure of Invention

Embodiments of the present disclosure seek to address, at least to some extent, at least one of the problems in the prior art.

The present invention is based on the following findings of the present inventors:

The assessment and characterization of gut microbiota has become a major area of research in human diseases, including obesity. To analyze the gut microbial composition of obese patients, the present inventors carried out a metagenome-wide association study (MGWAS) protocol based on deep shotgun sequencing of gut microbial DNA from 158 individuals (Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55-60 (2012), incorporated herein by reference). The present inventors identified and validated 396,100 obesity-related gene markers. To exploit the potential of gut microbiota for obesity classification, the present inventors developed a disease classifier system based on 9 gene markers, defined as the optimal gene set by the minimum redundancy maximum relevance (mRMR) feature selection method. To assess the risk of obesity intuitively from these 9 gut microbial gene markers, the present inventors calculated a health index. The inventors' data provide insight into the characteristics of the gut metagenome associated with obesity risk, and offer an example for future studies of the pathophysiological role of the gut metagenome in other related diseases, as well as for potential gut-microbiota-based assessment of individuals at risk for such diseases.

It is believed that gene markers of the gut microbiota are valuable for improving the detection of obesity at an early stage, for the following reasons. First, the markers of the present invention are more specific and more sensitive than conventional markers. Second, stool analysis offers accuracy, safety, affordability, and patient compliance, and stool samples are transportable. The present invention therefore relates to a comfortable, non-invasive in vitro method, which makes it easier for a person to take part in a given screening procedure. Third, the markers of the invention can also be used as therapy monitoring tools to detect a patient's response to therapy.

One aspect of the present disclosure provides a biomarker set for predicting a disease associated with microbiota in a subject, consisting of:

gene markers having the nucleotide sequences set forth in SEQ ID NO: 1 to 9.

According to an embodiment of the present disclosure, the disease is obesity or a related disease.

With these biomarkers, a subject can be assessed for certain diseases associated with microbiota; for example, obesity or related diseases can be determined from a sample from the subject, such as a stool sample.

In another aspect of the present disclosure, there is provided a kit for determining the above gene marker set, comprising primers for PCR amplification designed according to the DNA sequences set forth in SEQ ID NO: 1 to 9, or at least a partial sequence thereof.

Another aspect of the present disclosure provides a kit for determining the above-mentioned gene marker set, comprising one or more probes designed according to the sequences set forth in SEQ ID NO: 1 to 9.

Another aspect of the present disclosure provides the use of the above-described gene marker set for predicting the risk of obesity or a related disease in a subject, comprising:

(1) collecting sample j from the subject;

(2) determining relative abundance information for each of SEQ ID NO: 1 to 9 in the sample; and

(3) calculating the index of sample j, denoted I_j, by the following formula:

A_ij is the relative abundance of marker i in sample j, wherein i refers to each gene marker in the set of gene markers;

N is a first subset consisting of all patient-enriched markers among the selected biomarkers associated with the abnormal condition,

M is a second subset consisting of all control-enriched markers among the selected biomarkers associated with the abnormal condition,

| N | and | M | are the number of biomarkers in the first and second subsets respectively,

wherein

An index greater than a threshold value indicates that the subject has an abnormal condition or is at risk of developing an abnormal condition.

According to some embodiments of the disclosure, | N | is 5 and | M | is 4.

According to some embodiments of the present disclosure, the threshold value is 0.03519 to 0.1337.

Another aspect of the present disclosure provides a use of the above gene marker set for the preparation of a kit for predicting the risk of obesity or related diseases in a subject, comprising:

(1) collecting sample j from the subject;

(3) calculating the index of sample j, denoted I_j, by the following formula:

N is a first subset consisting of all patient-enriched markers among the selected biomarkers associated with the abnormal condition,

M is a second subset consisting of all control-enriched markers among the selected biomarkers associated with the abnormal condition,

wherein

According to some embodiments of the disclosure, | N | is 5 and | M | is 4.

Another aspect of the present disclosure provides a method of diagnosing whether a subject has or is at risk for developing an abnormal condition associated with a microbiota, comprising:

determining the relative abundance of the aforementioned biomarkers in a sample from the subject, and

determining, based on the relative abundance, whether the subject has or is at risk of developing an abnormal condition associated with the microbiota.

According to an embodiment of the present disclosure, the method comprises:

(1) collecting sample j from the subject;

(2) determining relative abundance information for each of SEQ ID NO: 1 to 9 in the sample; and

(3) calculating the index of sample j, denoted I_j, by the following formula:

wherein

According to some embodiments of the disclosure, | N | is 5 and | M | is 4.

According to an embodiment of the present disclosure, the abnormal condition associated with microbiota is obesity or a related disease.

Drawings

These and other aspects and advantages of the present disclosure will become apparent and more readily appreciated from the following description, taken in conjunction with the accompanying drawings, wherein:

FIG. 1. P-value distribution from the obesity association analysis, showing a disproportionate over-representation of strongly associated markers at low P-values.

FIG. 2. The present inventors performed an incremental search among the obesity-related gene markers using the minimum redundancy maximum relevance (mRMR) method and generated a sequence of marker subsets. The error rate of each subset was then estimated by leave-one-out cross-validation (LOOCV) of a linear discriminant classifier. The best subset (lowest error rate) contained 9 gene markers.

FIG. 3. ROC curve plotted from the obesity index in the training set (AUC = 0.9763).

FIG. 4. ROC curve for the test set (42 samples), plotted from the obesity index of the test set (AUC = 0.9024).

FIG. 5. ROC curve for the test set (22 samples), plotted from the obesity index of the test set (AUC = 0.8462).

Detailed Description

Examples of the invention

The terms used herein have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms such as "a," "an," and "the" are not intended to refer to only a singular entity, but include the general class of terms that may be used to describe a particular example. The terms used herein are used to describe specific embodiments of the invention, but their usage does not limit the invention, unless otherwise specified in the claims.

The invention is further illustrated in the following non-limiting examples. Unless otherwise indicated, parts and percentages are by weight and degrees are in degrees Celsius. It will be apparent to those of ordinary skill in the art that these examples, while representing preferred embodiments of the invention, are given by way of illustration only and that all reagents are commercially available.

Detailed description of the preferred embodiments

EXAMPLE 1 identification of biomarkers for assessing obesity Risk

1.1 sample Collection

Stool samples from 158 Chinese subjects, including 78 obese patients and 80 control subjects (training set), were collected in 2012 by Ruijin Hospital, Shanghai Jiao Tong University School of Medicine. The obese patients were 18 to 30 years of age with a BMI above 25. Subjects were asked to provide fresh stool samples at the hospital. The collected samples were placed in sterile tubes and immediately stored at -80 °C until further analysis.

Full ethical approval was obtained, and all patients gave written informed consent. The study was approved by the ethical review committee of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine.

1.2 DNA extraction

Stool samples were thawed on ice and DNA was extracted using the QIAamp DNA Stool Mini Kit (Qiagen) according to the manufacturer's instructions. The extract was treated with DNase-free RNase to eliminate RNA contamination. The amount of DNA was determined using a NanoDrop spectrophotometer, a Qubit fluorometer (with the Quant-iT™ dsDNA BR Assay Kit) and gel electrophoresis.

1.3 DNA library construction and sequencing of fecal samples

DNA library construction was performed according to the manufacturer's instructions (Illumina, insert size 350 bp, read length 100 bp). The inventors performed cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of sequencing primers using the same workflow as previously described. The inventors constructed a paired-end (PE) library with an insert size of 350 bp for each sample, followed by high-throughput sequencing to obtain about 3 million PE reads of 2 x 100 bp in length. High-quality reads were obtained from the Illumina raw reads by filtering out low-quality reads containing ambiguous "N" bases, adapter contamination, and human DNA contamination, and by trimming low-quality terminal bases of the reads at the same time.

In total, the inventors generated about 5.9 Gb of fecal microbiota sequencing data (high-quality clean data) per sample from the 158 samples (78 cases and 80 controls) on the Illumina HiSeq 2000 platform (Table 1).

Table 1 metagenomic data summary. The fourth column reports the results from the Wilcoxon rank sum test.

1.4 metagenomic data processing and analysis

1.4.1 read alignment

The present inventors used the updated human gut gene catalog established by Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. (2014) (incorporated herein by reference), and aligned the high-quality reads to this catalog with an alignment criterion of identity ≥ 90%. The average read alignment rates are shown in Table 1. The alignment rate is close to that of the samples in Li, J. et al, 2014, supra, indicating that it is sufficient for further study. After read alignment, the inventors derived a gene profile (9.9 M genes) from the alignment results using the same method as Li, J. et al, 2014, supra.

Taxonomic assignment of genes. The internally developed protocol (pipeline) described in published papers (Li, j. et al, 2014, supra) was used for taxonomic assignment of predicted genes.

1.4.2 data File construction

Gene profiling. Based on the results of the read alignment, the inventors calculated relative gene abundance using the same method described in the published T2D paper (Qin et al, 2012, supra).
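The relative abundance step can be illustrated with a minimal Python sketch. It assumes the two-step scheme commonly attributed to Qin et al, 2012 (length-normalised read counts, then scaling so that all genes in a sample sum to 1); the function name and the toy numbers are hypothetical and are not taken from this document.

```python
def relative_abundance(read_counts, gene_lengths):
    """Length-normalise read counts per gene, then scale so they sum to 1.

    Assumption: this mirrors the scheme of Qin et al, 2012 as commonly
    described; it is not copied from the present document.
    """
    if len(read_counts) != len(gene_lengths):
        raise ValueError("read_counts and gene_lengths must have the same length")
    # Step 1: copy number of each gene = mapped reads / gene length.
    copy_numbers = [x / length for x, length in zip(read_counts, gene_lengths)]
    total = sum(copy_numbers)
    if total == 0:
        # No reads mapped at all: every gene has zero abundance.
        return [0.0] * len(copy_numbers)
    # Step 2: relative abundance = copy number / sum of all copy numbers.
    return [b / total for b in copy_numbers]


# Hypothetical toy sample: three genes with made-up read counts and lengths (bp).
example = relative_abundance([30, 10, 0], [1500, 500, 900])
```

In this toy case the first two genes have the same read density (0.02 reads per bp), so they share the total abundance equally and the third gene gets zero.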

1.4.3 Analysis of factors affecting the gene profile of gut microbiota

Based on the gene profile, the inventors used permutational multivariate analysis of variance (PERMANOVA) to assess the effect of 6 clinical parameters: age, gender, height, weight, BMI, and obesity status. The inventors performed the analysis using the method implemented in the "vegan" package in R, and obtained permuted p-values from 10,000 permutations. The inventors also corrected for multiple testing with the Benjamini-Hochberg method, using "p.adjust" in R, to obtain a q-value for each test. PERMANOVA identified three significant factors associated with the gut microbiota based on gene profiles (q < 0.05, Table 2). The analysis showed that body weight, BMI and obesity status are strongly associated factors, indicating that disease (obesity) status is a major determinant of gut microbiota composition.

Table 2. PERMANOVA of gene profiles based on Euclidean distance. The analysis tested whether clinical parameters and obesity status have a significant effect on the gut microbiota (q < 0.05).
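The Benjamini-Hochberg correction mentioned above can be sketched in plain Python as follows. This is an illustrative re-implementation of the standard step-up procedure (the same adjustment that R's p.adjust performs with method "BH"), not the inventors' code; the function name and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    n = len(p_values)
    # Visit the tests from the largest to the smallest p-value.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    q_values = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank of this p-value in ascending order
        candidate = p_values[idx] * n / rank
        # Enforce monotonicity: a q-value never exceeds that of a larger p-value.
        running_min = min(running_min, candidate)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values for three tests.
q = bh_adjust([0.001, 0.5, 0.03])
```

A factor would then be called significant when its q-value falls below the chosen level, for example 0.05 as used in Table 2.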

1.4.4 determination of obesity-related markers

Determination of obesity-related genes. To determine the association between the metagenomic profile and obesity, a two-tailed Wilcoxon rank-sum test was applied to the profile of 9,879,897 high-frequency genes (genes present in fewer than 10 of the 158 samples were removed). This yielded 396,100 gene markers enriched in either cases or controls, with p-values < 0.01 and an FDR of 3.8% (FIG. 1).

False discovery rate (FDR) estimation. Rather than a sequential p-value rejection method, the inventors applied the "q-value" method proposed in a previous study to estimate the FDR (Storey, J. D. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B 64, 479-498 (2002), incorporated herein by reference).

Receiver Operating Characteristic (ROC) analysis. The inventors applied ROC analysis to evaluate the performance of the metagenomic marker-based obesity classification. The inventors then used the "pROC" package in R to plot ROC curves.
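As an illustration of what the area under the ROC curve measures, the following Python sketch computes the AUC directly as the fraction of case/control pairs in which the case has the higher score (ties count as one half). It is a stand-in written for this explanation, not the "pROC" routine used by the inventors; the function name and scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical index values for three cases and three controls.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

Here 8 of the 9 case/control pairs are ordered correctly, so the sketch returns 8/9. The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples.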

1.5 Selection of the 9 best markers from the biomarkers (minimum redundancy maximum relevance (mRMR) feature selection framework)

To determine the optimal gene set, minimum redundancy maximum relevance (mRMR) feature selection (see Peng, H., Long, F. & Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27, 1226-1238, doi: 10.1109/TPAMI.2005.159 (2005), incorporated herein by reference) was used to select from all obesity-related gene markers. The inventors performed an incremental search using the "sideChannelAttack" package of the R software and obtained 158 sequential marker sets. The inventors estimated the error rate of each sequential set by leave-one-out cross-validation (LOOCV) of a linear discriminant classifier. The best marker set is the one with the lowest error rate. In this study, the inventors performed feature selection on the panel of 396,100 obesity-associated gene markers. Since it is computationally infeasible to run mRMR on all genes, the inventors first derived a statistically non-redundant gene set by selecting 8010 genes (q < 0.0005). The inventors then applied mRMR feature selection and determined the best set of 9 gene biomarkers for obesity classification (lowest error rate, FIG. 2), which are strongly associated with obesity and are shown in Table 3 and Table 4. The gene ids are from the reference gene catalog published in Li, J. et al, 2014, supra.

Table 3. Enrichment information for the 9 best gene markers

Gene id	Enrichment (1 = obesity, 0 = control)
64552	0
1208989	0
2285506	0
3104115	1
3581202	0
5042942	1
5243950	1
6793200	1
7860042	1

Table 4. SEQ ID NO of the 9 best gene markers

Gene id	SEQ ID NO:
7860042	1
1208989	2
5243950	3
5042942	4
3104115	5
2285506	6
3581202	7
64552	8
6793200	9
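The leave-one-out error estimate used to compare candidate marker sets in section 1.5 can be sketched as below. Note the substitution: the text uses a linear discriminant classifier, whereas this sketch plugs in a much simpler nearest-centroid rule purely to keep the example self-contained. All names and the toy data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest (squared Euclidean) to x.

    Stand-in classifier: NOT the linear discriminant classifier of the text.
    """
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))

    return min(centroids, key=lambda label: dist(centroids[label]))


def loocv_error(features, labels, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and return the error rate."""
    errors = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) != labels[i]:
            errors += 1
    return errors / len(features)


# Hypothetical toy data: two marker abundances per sample, 0 = control, 1 = case.
toy_x = [[0.1, 0.0], [0.2, 0.1], [0.0, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_y = [0, 0, 0, 1, 1, 1]
error_rate = loocv_error(toy_x, toy_y)
```

In an incremental search, this error rate would be computed once per candidate marker subset, and the subset with the lowest value kept.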

1.6 Gut health index (obesity index)

To exploit the potential of gut microbiota for disease classification, the inventors developed a disease classification system based on the 9 gene markers defined above. To evaluate the risk of disease intuitively from these gut microbial gene markers, the present inventors calculated a gut health index (obesity index).

To evaluate the effect of the gut metagenome on obesity, the inventors defined and calculated a gut health index for each individual based on the 9 gene markers selected as described above. For each individual sample j, the gut health index, denoted I_j, is calculated by the following formula:

A_ij is the relative abundance of marker i in sample j;

N is the subset consisting of all patient-enriched markers among the selected biomarkers associated with the abnormal condition (i.e., all obesity-enriched markers among the 9 selected gene markers),

M is the subset consisting of all control-enriched markers among the selected biomarkers associated with the abnormal condition (i.e., all control-enriched markers among the 9 selected gene markers),

| N | and | M | are the numbers (sizes) of biomarkers in the two subsets, where | N | is 5 and | M | is 4, and an index greater than a cutoff value indicates that the subject has obesity or is at risk of developing obesity.
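The formula itself is not reproduced in this text, so the following Python sketch is only a guess at its general shape: the mean of the (optionally transformed) abundances of the N markers minus the mean of the (optionally transformed) abundances of the M markers. The identity transform used by default, the function name, and the absence of any scaling factor are all assumptions. The marker assignment comes from Table 3 and the abundances from Table 7 (sample DB78A).

```python
# Enrichment labels from Table 3: 1 = obesity-enriched (set N), 0 = control-enriched (set M).
ENRICHMENT = {
    64552: 0, 1208989: 0, 2285506: 0, 3581202: 0,
    3104115: 1, 5042942: 1, 5243950: 1, 6793200: 1, 7860042: 1,
}


def health_index(abundance, enrichment=ENRICHMENT, transform=lambda a: a):
    """Mean transformed abundance over N minus mean transformed abundance over M.

    ASSUMPTION: the exact formula (including any log transform or scaling
    constant) is missing from the source text; this is an illustrative form only.
    """
    n_set = [g for g, e in enrichment.items() if e == 1]
    m_set = [g for g, e in enrichment.items() if e == 0]
    mean_n = sum(transform(abundance.get(g, 0.0)) for g in n_set) / len(n_set)
    mean_m = sum(transform(abundance.get(g, 0.0)) for g in m_set) / len(m_set)
    return mean_n - mean_m


# Relative abundances of sample DB78A as listed in Table 7.
db78a = {
    64552: 0.0, 1208989: 0.0, 2285506: 1.46332e-06, 3104115: 3.47323e-06,
    3581202: 0.0, 5042942: 0.0, 5243950: 5.26732e-06, 6793200: 1.06787e-06,
    7860042: 0.0,
}
raw_index = health_index(db78a)
```

Because any transform or scaling in the original formula is unknown here, the raw value produced by this sketch should not be compared with the cutoff values reported in the text; only its sign convention (higher means more obesity-like) follows from the definitions above.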

1.7 Classification of obesity based on gut microbiota

The present inventors calculated an obesity index based on the relative abundance of these 9 gene markers, which clearly distinguished the microbiomes of obese patients from the control microbiomes (Table 5). Classifying the 78 obese patient microbiomes against the 80 control microbiomes using the obesity index gave an area under the Receiver Operating Characteristic (ROC) curve of 0.9763 (FIG. 3). At the optimal index cutoff value of 0.03519, the True Positive Rate (TPR) was 0.9487, the False Positive Rate (FPR) was 0.1, and the error rate was 8.23% (13/158), indicating that the 9 gene markers can be used to classify obese individuals accurately.

Table 5. Calculated gut health index for the 158 samples (obese patients and non-obese controls)
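How TPR, FPR and error rate follow from a chosen cutoff can be shown with a short Python sketch. The index values below are invented for illustration and are not taken from Table 5; only the cutoff 0.03519 and the rule that an index above the cutoff is called obese come from the text. The function name is hypothetical.

```python
def classify_at_cutoff(case_indices, control_indices, cutoff):
    """Call a sample 'obese' when its index exceeds the cutoff; return summary rates."""
    tp = sum(1 for v in case_indices if v > cutoff)     # cases called obese
    fn = len(case_indices) - tp                          # cases missed
    fp = sum(1 for v in control_indices if v > cutoff)  # controls called obese
    total = len(case_indices) + len(control_indices)
    return {
        "TPR": tp / len(case_indices),
        "FPR": fp / len(control_indices),
        "error_rate": (fn + fp) / total,
    }


# Hypothetical index values: 3 cases and 4 controls.
toy = classify_at_cutoff([0.5, 0.2, 0.01], [0.04, -0.3, -0.1, 0.0], cutoff=0.03519)
```

In this toy example one case falls below the cutoff and one control lies above it, giving a TPR of 2/3, an FPR of 1/4 and an error rate of 2/7.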

Example 2 validation of 9 Gene biomarkers in 42 samples (test set)

The present inventors validated the discriminative power of the obesity classifier using another, independent study group comprising 17 obese patients and 25 non-obese controls, collected at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine.

DNA was extracted from each sample and a DNA library was constructed, followed by high-throughput sequencing as described in Example 1. The inventors calculated the gene abundance profiles of these samples using the same method as described in Qin et al, 2012, supra. The relative abundance of each of the marker genes set forth in SEQ ID NO: 1 to 9 was then determined. The index for each sample was then calculated by the following formula:

A_ij is the relative abundance of marker i in sample j;

| N | and | M | are the numbers of biomarkers in the two subsets, where | N | is 5 and | M | is 4,

wherein an index greater than a threshold value indicates that the subject has or is at risk of developing obesity.

Table 6 shows the calculated index for each sample and Table 7 shows the relative abundance of the relevant genes for representative sample DB78A. In this assessment, at a cutoff value of 0.03519 (the best index cutoff for the 158 samples above), the error rate was 21.42% (9/42), demonstrating that the 9 gene markers can classify obese individuals. Most obese patients (16/17) were correctly diagnosed as obese. In addition, the ROC curve for the test set was plotted from the obesity index of the test set, giving an AUC of 0.9024 (FIG. 4). At the optimal threshold of 0.1337, the True Positive Rate (TPR) was 0.9412 and the False Positive Rate (FPR) was 0.24.

Table 6. Calculated gut health index for the 42 samples

Table 7. Relative abundance of genes in sample DB78A

Gene id	DB78A (calculated relative abundance of gene)	Enrichment (1 = obesity, 0 = control)
64552	0	0
1208989	0	0
2285506	1.46332E-06	0
3104115	3.47323E-06	1
3581202	0	0
5042942	0	1
5243950	5.26732E-06	1
6793200	1.06787E-06	1
7860042	0	1

Example 3 validation of 9 Gene biomarkers in 22 samples (test set)

The inventors verified the discriminative power of the obesity classifier using another 22 samples (Table 8), comprising 9 case samples and 13 control samples (5 samples collected 1 month after surgery and 8 samples collected 3 months after surgery), which were also collected at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine. Cases are pre-operative samples and controls are samples taken 1 month and 3 months after surgery.

Table 8. Information on the 22 samples

Before: before surgery; 1-M: one month after surgery; 3-M: three months after surgery.

A_ij is the relative abundance of marker i in sample j.

M is the subset consisting of all control-enriched markers among the selected biomarkers associated with the abnormal condition (i.e., all control-enriched markers among the 9 selected gene markers),

Table 9 shows the calculated index for each sample and Table 10 shows the relative abundance of the relevant genes for representative sample DB126. In this assessment, the error rate was 22.72% (5/22) at a cutoff value of 0.03519 (the best index cutoff for the 158 samples above), demonstrating that the 9 gene markers can classify obese individuals. Most obese patients (8/9) were correctly diagnosed as obese. In addition, the ROC curve for the test set was plotted from the obesity index of the test set, giving an AUC of 0.8462 (FIG. 5). At the optimal threshold of 0.9695, the True Positive Rate (TPR) was 0.6667 and the False Positive Rate (FPR) was 0.07692.

Table 9. Calculated gut health index for the 22 samples

Table 10. Relative abundance of genes in sample DB126

Gene id	DB126 (calculated relative abundance of gene)	Enrichment (1 = obesity, 0 = control)
64552	0	0
1208989	0	0
2285506	7.99701E-08	0
3104115	6.25943E-05	1
3581202	0	0
5042942	7.19308E-08	1
5243950	5.97579E-07	1
6793200	0	1
7860042	1.52752E-07	1

Thus, the inventors identified and validated a set of 9 markers using the minimum redundancy maximum relevance (mRMR) feature selection method, starting from 396,100 obesity-related markers. The inventors also established a gut health index to evaluate the risk of obesity based on these 9 gut microbial gene markers.

Although illustrative embodiments have been shown and described, it will be understood by those skilled in the art that the above embodiments are not to be construed as limiting the present disclosure and that changes, substitutions and alterations can be made thereto without departing from the spirit, principles and scope of the present invention.

Sequence listing

<110> BGI Shenzhen Co., Ltd.

Shenzhen Huada Gene Research Institute

<120> biomarkers for obesity-related diseases

<130> XXXX

<160> 9

<170> PatentIn version 3.5

<210> 1

<211> 267

<212> DNA

<213> unknown

<220>

<223> isolated from intestinal tract, not identified

<400> 1

gtactatata aatatggagg tgattacatg gcagagcgtg aaaaacaata taaaggaatc 60

attagttatg aaaaattatg gaatcttatg caaacaaaaa atataaaaaa aagagacttg 120

agagagactt ataaaatttc tcctactatt attagtagac ttagcaacaa cgcaaacgta 180

gctgtagaca ctatcatgta tctttgtgaa atcttaaact gtcagcccag tgatattatg 240

gaatacatcc cgccggaatc agtttaa 267

<210> 2

<211> 1362

<212> DNA

<213> unknown

<220>

<223> isolated from intestinal tract, not identified

<400> 2

atgtctgata aaattgccgc cattgccacg ggccacgccc gcacgggcat cggcgttctg 60

cgtctatccg gcgacgggtg cattgaggcc gcggaacagg tcttccggct gaactccggc 120

aggccgcttt cttccctctc cgaccgcaag cttgcgctcg gcacgctctt tgacgcacag 180

ggacggccca tcgaccactg catggcattc atctcccgcg ccccgcattc ctacaccggc 240

gaggataccg ccgaaattca gtgccacggc tcccccgcag cactgaccgc cgggcttgaa 300

gcgctgttcg ccgcaggctt ccggcaggcg cggcccggtg aattcacgcg ccgcgcgttt 360

ttgaacggcc agatggattt gacgcaggcc gaagccgtga tcgatctcat cgacgccgag 420

acagccgacg cggcggcaaa cgccgccgga caagttgcag gtgcgatccg caaaaagatc 480

gacccgatct acgacggctg ggtcgatctg tgcgcgcatt tccacgccgt gctcgattac 540

cccgacgagg atatcgaccc cttcacgctc tccggctatg aagcgtctct gacagaaagc 600

agccgtcagc ttaccgcact gctttcctcc tgtcgtcgcg gccggatggt gcagtccgga 660

atcaaggcgg tcattctggg cagtccaaac gccggaaaat ccagccttct taaccgcctg 720

tccggctttg accgcgtgat cgtcaccgac attcccggca cgacgcgcga cacggtcgag 780

cagaccgtca cactcgggcg gcaccttgtc cgtctggttg ataccgcagg catccgcgac 840

acgcaggatg tgatcgaaaa aatcggcgtt gaccgctcga tggaagccgc gaaggactgc 900

gatgtagcgc tgtatgtcct tgatgattcc aagcctatga cggaagagga tcgccgcgcg 960

atgggcgcgg cgctcgaagc gccggaagcc gttgcgattt tgaacaaaca ggatctggct 1020

gccgtgatcg aaccgtccga tctgccgttt gcgtacattg tccccatctc ctgcaaggac 1080

ggcacgggct ttgatctgct ggagcaggcg tttgatatgc tcttcccgga cgatacgccg 1140

tgtgacggtt cgcttctgac gaacgcgcgg caggccggcg ccattctgcg cgcaaaaaaa 1200

tccgtcgatt ccgccgtgca gagcctgcgt gcgggtatga cgcccgacgc ggtgctggtc 1260

gatctggaat cggcgatgga tgcgctcggc gaggtcacgg gccggacgat gcgcgaggat 1320

atcacaaacc gcatttttga gcggttctgc gttggaaaat aa 1362

<210> 3

<211> 546

<212> DNA

<213> Dorea longicatena

<220>

<223> Dorea longicatena isolated from intestinal tract

<400> 3

agaagctggg ttgaatttct agaaggagga gcacacatga gacaggaacg tattgatgct 60

aattcattag aattaaacga aaaagtagta tcaattaaac gtgtaaccaa agttgttaag 120

ggtggccgta atatgagatt cacagcttta gtagttgttg gtgacggaaa tggccatgtt 180

ggtgcaggtt taggaaaagc tacagaaatt ccggaagcaa tccgcaaagg aaaagaagat 240

gcagctaaga atctcatctc tgtagcatta gatgaaaatg acagtgtaac acatgattat 300

atcggtaaat tcggaggcgc ttctgtatta cttaagaagg ctccagaagg tactggagtt 360

atcgccggtg gtccggcgcg tgccgtaatc gagatggcag gaatcaagaa cattcgtaca 420

aaatctcttg gttctaacaa caaacagaat gtagtacttg ctacaatcga aggattacgc 480

cagattaaaa ctccagaaga agtagctaag cttcgcggta aatctgttga agagatcttt 540

ggctaa 546

<210> 4

<211> 567

<212> DNA

<213> unknown

<220>

<223> isolated from intestinal tract, not identified

<400> 4

atggagattc agatctatta tcagaaatgg gatgaaaaaa gcaaagattc acatgagtta 60

ttagaacgtg tgattcgcat ttacgccaag gcacatcaga taaaattgcc ggaaacgtta 120

cagatttgtc aggaacaaaa taaaaagccg tttctaaaaa acgtatcgga tatctgcttt 180

tctatttccc acagcgggga atggtggagc tgcgcgattg cgccgcagga agttggtctg 240

gacattcagg aagaacatga ctgcaaggca gaacgtctgg caaaacgatt ttttcatccg 300

atggaagtag catggatgga acagcatgga tatgagaatt tttgccggtt gtgggcatac 360

aaggaaagct atgtgaaata tacagggatt gggcttgtaa agggtatgga ttattattcg 420

gtggtgaagc cggatggcac attgaccgga gaggaaagtg tctggcaaaa gcagattgcc 480

tttcaaccag agtgttatat ggttgtgact gcaaaacagg aagcagaagt aaagctgttt 540

ttattgacag aacagagtaa aaaatga 567

<210> 5

<211> 849

<212> DNA

<213> unknown

<220>

<223> isolated from intestinal tract, not identified

<400> 5

atgttcgaag accagataag aaagtacgac aacctcgttt ctcaatacaa aaagattgta 60

tcagacaaat tctgcaagca agactttatc gaatacaatg agattctatt ctctgcccac 120

agctgtgcta tcgaggggaa cagtttttct gtggatgaaa cacgtaccct caaggaaaaa 180

ggactgggta tgatacccca aggaaaaact ctccttgaag cgtttgagat gcttgaccat 240

ttccaagcat acgaatatct tctgaaaaat ctggacaaac ctttgtctga ggaactttta 300

aaggaaacac acaaactact gacagaacat acattggcat tccggacaca acatgacgaa 360

catccttcac agccaggaga atataccaca gtagacatgt gtgcaggaga tactattttt 420

ggagaccata aggaacttat aaagcgcgta cccaacctat tggaaagtac acagaaagca 480

atggactcgg gtgatataca tcccatcata atcgcagcca aatttcacgg tttttatgaa 540

tatctccatc cttttcgaga cgggaacggc cggcttggaa gaatgtttac caacttcatt 600

ttgctgaaaa aagaccaacc gatacttatc attccccgtg aaaaaagaga agagtatatc 660

agtgcgttaa gattcatccg aaaagaacgc acagatgaat atctgataga ctttttcttt 720

aacacagcca ttgaaagaat gcaatctgaa attgaagaaa aaaacaatat gacaaacaat 780

ttcattcatg gaataaattt tattcagaat gaaaaatcca cctcacaaga caacgagatt 840

gaagcttaa 849

<210> 6

<211> 1020

<212> DNA

<213> Bacteroides intestinalis

<220>

<223> isolated from intestinal tract, Bacteroides intestinalis

<400> 6

ataaatcata aatctgaaat cataaatcat aagatggata ccaaatcaca aaacagtaac 60

atgctactag cgttcttaac gctgacaggc gtcattgcca ttgtagccgt agtgggtttc 120

ttcatgttgc gcaaaggacc ggaaatcgta caaggtcaag ctgaagtcac agagtatcgg 180

gtatcgagca aagtaccggg acgaattctg gaattccgcg taaaggaagg acaaagtgta 240

caagccgggg ataccctcgc catactggaa gctccggatg tgttagctaa gctggaacag 300

gctcgggccg cagaagctgc tgctcaggca cagaatgaga aagcactgaa aggcgcccgt 360

catgaacagg tacaagctgc ctttgagatg tggcaaaaag ccaaagcagg acttgaaata 420

gccgagaaat cttacaaacg tgtgaagaat cttgctgatc agggcgtgat gtcagcccaa 480

aagctggatg aggttactgc acagcgggac gctgccgtcg ctacagaaaa agccgccaaa 540

gctcaatacg acatggcaaa aaacggtgcg gaacgagaag ataaagcagc agctgccgcc 600

ttggtagacc gtgccaaagg agccgttgca gaagtagaat cttatatcaa agaaacttat 660

cttatagccc agacagccgg agaagtttcg gaaatcttcc ccaaagtagg tgaattggta 720

ggaaccggtg caccgatcat gaacatagcc atactggatg atatgtgggt aacttttaat 780

gttcgggaag atttgctgca aggtctgaca atgggaactg aattcgaagc tttcgttccg 840

gcattggata aaaatatccg cctgaaagtg aactacatga aggatcttgg tacttatgca 900

gcctggaagg ctacgaaaac aaccgggcaa tttgatctga agacttttga ggtaaaagct 960

ttaccacagg agaaagtgga aggcttacgt cccggaatgt cggtgatact gaagaagtga 1020

<210> 7

<211> 765

<212> DNA

<213> unknown

<220>

<223> isolated from intestinal tract, not identified

<400> 7

atggcaagtc ctttcctgcc gaaaaacggc atcacaggac aattcgccat cggacggcaa 60

aatcctgcga accgggcgga gcaaatgcag aaacttgcaa attcagcctg cacacgatat 120

aataaagcgg aaagtaggtg cattttcgtg aaaaaggcag tcaattgggc aatgtacact 180

ggcagcatcg gctttatgct gaacggaatc attggcatcg tgaagctgct gcacggcaat 240

aacaaggtgg cgatgcaagc tcaccttgct gtcgttttct ttgtcatcgc ggcggttggc 300

atcatctgca agattctgct gaaccgctgc ccgaactgcc ggaagagcgt catgacgcgc 360

ggcgaatact gcccgtactg cggtgaaaaa atcaaaaaaa gcaatgagct gctggacggc 420

acagttctgc ggcacgagtg gttcaaaagc agaaaatcgc caatttcgcc cgcagacggg 480

aagaatgaag cgggggaaaa ggtgcgcttt cataaaaagg cagttaattg ggtgcaggcg 540

attgcacttg ttgcgttcgt ggtggatttg tgcttcattg ccggggaatg gttgcagagc 600

gattgcgaaa ggctgacggc aggctgcatc ggtgcggcgt gctggttcat cgtgctgata 660

tgcgtgtttt gcaagatacg gatgaaccgc tgcccgcact gcgggaaggt tgtcgagaca 720

cgcggcgact attgtccgta ctgcggcgaa aaagtcaaat tgtga 765

<210> 8

<211> 3603

<212> DNA

<213> Prevotella copri

<220>

<223> Prevotella copri isolated from intestinal tract

<400> 8

tattttttga aagtagatgt taccgattcg ctttcagaaa cgtattttta ttatagtatt 60

aacgaaaaag agaaaaatat gaaagaagta agatgtgtat tcatgtttgc tgtcctgact 120

ttatggatgg cattgcctgt aggagcacag aatgcaaaga gaggcatcac gatgtcgctc 180

gttaatgagt ctttggcttc tgcattgaga aaagtacagc aggagtcgga ttataaggtg 240

agttttgtga tagaggacgt aaatccctat accacaacgg ttcatctgaa gaatgcttct 300

gcctcaactg ccgtaaagca gattttacag ggtaaacctt tcacttactc agtaagtggt 360

aagtttatta ccgtcaagaa ggttttgcaa cataaaactg cagctgaagc tacagataat 420

ggagctgcca tccgccccct ctcaggaaca attaccgatg aggatggaga acctatgatt 480

ggcgcttctg ttgtagtgcc tggcagtcct tttggaaccg tgactaatac ggatggtaat 540

tttgagtttt atattcctaa aggatgtcat gaggttacgg tttcgtatgt cggtatgaat 600

gaccagaagg taaaggtggc tggtagggat catgtgaaga ttgtgatgtc tgaaaataag 660

accatacttg gtgaagtggt tgtaacgggt tatcagacaa tctctaagga gcgtgcaacc 720

ggtgctttta ccaaggtaac agctgatgaa ctgaaagaca agaggttggg taatctctct 780

actgtgctgg ctggcgaggt tgccggatat aatgacggca tgattcgtgg cgtaaccacg 840

atgaatgcct ctgcttctcc tctctatgtc atagatggct ttccagtaga gaaaactaag 900

ttgaccggca atggaactgg tgatattacc gaggaaatcc ctgatttgaa tatggatgat 960

attgagagta ttactgttct taaagatgct gctgctgctt ccatctatgg tgctcgtgct 1020

gccaatggtg tcattgtcat cactaccaag aaatcggcag ctaagggtaa gaccaatgtc 1080

tctttcaatg cttctcttac ctggcatcct tattcttatt atacagatta tctcgccaat 1140

tcttctcttg ttattgactt ggagaaggag tgggctgcgc agaatccgaa tcttgcaggc 1200

aatggagcta aggtatacgc tcagaatatg ttggatcaga aaatctatac gagcgcgggt 1260

atctgcaata tcttgaatta ttatgcaggc aatatctctg agtcagaaat gaatgctaaa 1320

ctgactgatt tggctggaaa aggttataac tactatgacc agatgaagaa gtatgctaag 1380

cgcgatccgt tgtatcagca gtataatatg aatattgtaa acaactcggt gaataatctc 1440

ttcaaggctt cggtcactta taagtataat acccttgaag ataagtattc caataatcag 1500

agtttaggta tcaaccttac caatacttct catttcacaa aatggctttc tttggattta 1560

ggtgcttacc taaaggtggg tgaggatcag acacagaact acaatgtcct ttctcctgga 1620

tattctgttt tgccatacga taatttagtg aatgctgatg gctcttgtta taccaagccg 1680

atgagtgaga gatatgatgc aagtactttg gctatctatc agaagtatgg tctttataat 1740

atggatatca ctcctttgga agagctggat cgtcagatta ccaccaacaa ggagttggct 1800

ttacgaactt ttgcccgtct gaatgtacag attctgccct gtttgaagta tgctgcttcc 1860

ttccagtacg aacgagcttc atatcgtggt gaaaactggg ctgataaagc ttctgtgcag 1920

gttcggggtg tggtaaacgg gtatgcaacc gataatggtg atggctctgt caattatgtg 1980

attccttatg gggatatcct ggttcgttca gaccaatata cctctgctta taacttccgt 2040

cagcagttga gttttgatca gactttcaag gatgttcaca gcgtaactgc tattctgggc 2100

acggaaacca tccagaacaa gcaggaattg catcgagaca aactgtttaa ctatgattct 2160

cagatgttaa catccaatat ggtgaataat gctgatttgt tgaaaggtat agcgggtgtg 2220

ctgggatata aatctatgac cgcgactgat ttgtctgctt cttatgaaaa tgtaaaccgt 2280

tatgtttcgg tatatggtaa tgctgcttat acttacgatg accgttatag cttgactggt 2340

agtttacgat gggaccgctc taatctctgg ggaaccagtt cgaagtttca gaacaagcct 2400

atctggtcgg taggtgccag ttggattatc agtaaggaga agttcttcca cgcagattgg 2460

gtcaattatc tgaaactccg tgtttctgat ggtattgccg gtaacgtatc taagaattct 2520

gccccttata tggtggcaag ctataataat aacgggcatg tgggaggtac acagggatat 2580

gtgcagtcgc gtgccaatcc gatgctgagc tgggagaaaa ccaatacctt taatataggt 2640

atcgatttct cgctctttaa gaaccgactc aatggtactg tagagtatta cgacaagaag 2700

ggtaccgacc tcctggcttc ttccatgggt gtacctactg aaggttgggg ctacagcacc 2760

tataccatca acaatggcga gatgtacaat cgtggtgtag aaatttccct gagtggtgag 2820

gttctgcgta ccagggattt ctcatggaga gccaatatga cttatgctta taataagaat 2880

gaggtgactt atgtaaatgt aaaggcgccg gtttatattc ttcagttgga ttacccttcg 2940

gcttatccta ttatcggcaa cgaatataat gcaatctatg gctataagtg ggctggattg 3000

agtgaagaag gattgccaca ggtgtataac gagaatggtg agaaggtaac taatcagcca 3060

acgacattgg atgccatctc ttacatgggt actaccactc ctaagtatag tggatctttc 3120

ggaaccagta tcagctacaa ggatttcgat ttcagtatgc agttcctttt tgcgggtggt 3180

cataagatgc gcaatgccaa tcctgctttc ctcacttgca gttattccag cgtaggatat 3240

atctccaata ttgcgggagc tagtgccgga cttgctaatc gttggcagaa accaggtgat 3300

gaggcttata ccaatgtgcc aaaggctgtt tttgccgaaa gtggtttgtc agcatcttct 3360

ttgtatagta cttattttta ttctgatatc aatatccttg atgccagtta tattcgtctg 3420

aacaacattt ctctggctta tcatttgcca aagtcgctgt gccgttcgct ctatatgcag 3480

agtgcccgtg ttcaggctaa tgtggagaat ccgttcttct gggcaaagac caaacaggct 3540

aagtaccagc tgggtggtta taatgccacc aactatgtgt taggcattta tctcaacttt 3600

taa 3603

<210> 9

<211> 375

<212> DNA

<213> unknown

<220>

<223> isolated from intestinal tract, not identified

<400> 9

atggcaaaat acgatgattt accagtattc aaggctacgt atgatctgtt atttcagata 60

tttaatgtaa gccagcattg gcgacgtgat atacgatata gcttaggtga agatcttaag 120

aaggaaataa tagaaatctt acaactcatc tatcaagcaa attctacaag aagcaaaatt 180

gcatatattt cgtcttgcag agtaaaaata gtcaaagtca aactacaggt acgtatagcc 240

aaagatttga aagaattgca tataaatcaa tatgcatttc tggcagaaat gatggaaagt 300

gtttctaaac agctgacaag ctggatgaaa tcagaacaga aaaaagaaca aaatttagaa 360

caaaaagaaa aataa 375
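
For illustration only, a listing entry can be checked for internal consistency. The minimal Python sketch below assumes the layout used in this listing: residues in space-separated groups, with a cumulative residue count at the end of each line. The function name `parse_listing_block` is hypothetical and is not taken from the source. The sketch rebuilds SEQ ID NO: 9 and compares its length with the <211> value stated for that entry.

```python
def parse_listing_block(lines):
    """Join the residue lines of one <400> block into a single sequence.

    Assumption: every non-blank line holds space-separated residue groups
    followed by the cumulative residue count, as in the listing above.
    """
    residues = []
    total = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank separator lines
        *groups, count = parts
        chunk = "".join(groups)
        total += len(chunk)
        # The trailing number must equal the number of residues read so far.
        if total != int(count):
            raise ValueError(f"running count {total} != stated count {count}")
        residues.append(chunk)
    return "".join(residues)


# Residue lines of SEQ ID NO: 9, copied from the <400> 9 block.
SEQ_ID_9 = """
atggcaaaat acgatgattt accagtattc aaggctacgt atgatctgtt atttcagata 60
tttaatgtaa gccagcattg gcgacgtgat atacgatata gcttaggtga agatcttaag 120
aaggaaataa tagaaatctt acaactcatc tatcaagcaa attctacaag aagcaaaatt 180
gcatatattt cgtcttgcag agtaaaaata gtcaaagtca aactacaggt acgtatagcc 240
aaagatttga aagaattgca tataaatcaa tatgcatttc tggcagaaat gatggaaagt 300
gtttctaaac agctgacaag ctggatgaaa tcagaacaga aaaaagaaca aaatttagaa 360
caaaaagaaa aataa 375
"""

DECLARED_LENGTH = 375  # value of <211> for SEQ ID NO: 9

seq9 = parse_listing_block(SEQ_ID_9.splitlines())
print("length matches <211>:", len(seq9) == DECLARED_LENGTH)
```

The same function can be applied to any other <400> block of the listing by pasting its residue lines and the corresponding <211> value. A mismatch in the running count raises a `ValueError` that points at the first inconsistent line.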

Claims

1. A kit for detecting a biomarker set, wherein the biomarker set consists of the intestinal biomarkers of SEQ ID NO: 1 to 9,

the kit comprising primers for PCR amplification, a primer being designed for each of the nucleic acid sequences of SEQ ID NO: 1 to 9.

2. A kit for detecting a biomarker set, wherein the biomarker set consists of the intestinal biomarkers of SEQ ID NO: 1 to 9,

the kit comprising one or more probes, a probe being designed for the gene of each of SEQ ID NO: 1 to 9.

3. Use of the primers as claimed in claim 1 or the probes as claimed in claim 2 in the manufacture of a kit for predicting the risk of obesity in a subject, wherein the predicting comprises:

(1) collecting a sample j from the subject;

(2) determining relative abundance information for each of the intestinal biomarkers in the intestinal biomarker set consisting of the nucleic acid sequences of SEQ ID NO: 1 to 9; and

(3) calculating an index I_j of sample j according to the following formula:

wherein A_ij is the relative abundance of marker i in sample j, and i refers to each of the intestinal biomarkers in the intestinal biomarker set;

N is a first subset of the 9 selected intestinal biomarkers, consisting of all those enriched in obese patients,

M is a second subset of the 9 selected intestinal biomarkers, consisting of all those enriched in controls,

|N| and |M| are the numbers of biomarkers in the first subset and the second subset, respectively,

wherein an index greater than a threshold value indicates that the subject has or is at risk of developing obesity,

|N| is 5 and |M| is 4, and

the threshold value is in the range of 0.03519 to 0.1337.