Disclosure of Invention
Embodiments of the present disclosure seek to address, at least to some extent, at least one of the problems in the prior art.
The present invention is based on the following findings of the present inventors:
The assessment and characterization of gut microbiota has become a major area of research in human diseases, including obesity. To analyze the gut microbial composition of obese patients, the present inventors performed a metagenome-wide association study (MGWAS) protocol based on deep shotgun sequencing of gut microbial DNA from 158 individuals (Qin, J. et al., A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55-60 (2012), incorporated herein by reference). The present inventors identified and validated 396,100 obesity-related gene markers. To exploit the potential of gut microbiota for obesity classification, the present inventors developed a disease classifier system based on 9 gene markers, defined as the optimal gene set by the minimum redundancy maximum relevance (mRMR) feature selection method. To visually assess the risk of obesity based on these 9 gut microbial gene markers, the present inventors calculated a health index. These data characterize in depth the gut metagenome associated with obesity risk. They also provide an example for future studies of the pathophysiological role of the gut metagenome in other related diseases, and for potential applications in assessing individuals at risk for such diseases based on gut microbiota.
Genetic markers of the gut microbiota are believed to be valuable for improving the early detection of obesity, for the following reasons. First, the markers of the present invention are more specific and more sensitive than conventional markers. Second, stool analysis offers accuracy, safety, affordability and good patient compliance, and stool samples are easy to transport. The present invention therefore relates to a comfortable, non-invasive in vitro method that makes it easier for a person to participate in a given screening procedure. Third, the markers of the invention can also be used as therapy-monitoring tools to detect how patients respond to treatment.
One aspect of the present disclosure provides a biomarker set for predicting a disease associated with microbiota in a subject, consisting of genes having the nucleotide sequences set forth in SEQ ID NOs: 1 to 9.
According to an embodiment of the present disclosure, the disease is obesity or a related disease.
With these biomarkers, a subject can be analyzed for certain diseases associated with microbiota. For example, obesity or a related disease can be determined from a sample from the subject, such as a stool sample.
In another aspect of the present disclosure, there is provided a kit for determining the above gene marker set, comprising nucleic acid sequences for PCR amplification designed according to the sequences set forth in SEQ ID NOs: 1 to 9, or according to at least a partial sequence thereof.
Another aspect of the present disclosure provides a kit for determining the above-mentioned gene marker set, comprising one or more sequences according to SEQ ID NO: 1 to 9, or a pharmaceutically acceptable salt thereof.
Another aspect of the present disclosure provides the use of the above-described gene marker set for predicting the risk of obesity or a related disease in a subject, comprising:
(1) collecting sample j from the subject;
(2) determining the relative abundance of each of the genes set forth in SEQ ID NOs: 1 to 9; and
(3) calculating the index of sample j, denoted I_j, as follows:
A_ij is the relative abundance of marker i in sample j, where i refers to each gene marker in the gene marker set;
N is a first subset of the selected biomarkers associated with the abnormal condition, consisting of the markers enriched in patients;
M is a second subset of the selected biomarkers associated with the abnormal condition, consisting of the markers enriched in controls;
|N| and |M| are the numbers of biomarkers in the first and second subsets, respectively;
wherein an index greater than a threshold value indicates that the subject has an abnormal condition or is at risk of developing an abnormal condition.
According to some embodiments of the disclosure, |N| is 5 and |M| is 4.
According to some embodiments of the present disclosure, the threshold value is 0.03519 to 0.1337.
Another aspect of the present disclosure provides a use of the above gene marker set for the preparation of a kit for predicting the risk of obesity or related diseases in a subject, comprising:
(1) collecting sample j from the subject;
(2) determining the relative abundance of each of the genes set forth in SEQ ID NOs: 1 to 9; and
(3) calculating the index of sample j, denoted I_j, as follows:
A_ij is the relative abundance of marker i in sample j, where i refers to each gene marker in the gene marker set;
N is a first subset of the selected biomarkers associated with the abnormal condition, consisting of the markers enriched in patients;
M is a second subset of the selected biomarkers associated with the abnormal condition, consisting of the markers enriched in controls;
|N| and |M| are the numbers of biomarkers in the first and second subsets, respectively;
wherein an index greater than a threshold value indicates that the subject has an abnormal condition or is at risk of developing an abnormal condition.
According to some embodiments of the disclosure, |N| is 5 and |M| is 4.
According to some embodiments of the present disclosure, the threshold value is 0.03519 to 0.1337.
Another aspect of the present disclosure provides a method of diagnosing whether a subject has or is at risk for developing an abnormal condition associated with a microbiota, comprising:
determining the relative abundance of the aforementioned biomarkers in a sample from the subject; and
determining, based on the relative abundance, whether the subject has or is at risk of developing an abnormal condition associated with the microbiota.
According to an embodiment of the present disclosure, the method comprises:
(1) collecting sample j from the subject;
(2) determining the relative abundance of each of the genes set forth in SEQ ID NOs: 1 to 9; and
(3) calculating the index of sample j, denoted I_j, as follows:
A_ij is the relative abundance of marker i in sample j, where i refers to each gene marker in the gene marker set;
N is a first subset of the selected biomarkers associated with the abnormal condition, consisting of the markers enriched in patients;
M is a second subset of the selected biomarkers associated with the abnormal condition, consisting of the markers enriched in controls;
|N| and |M| are the numbers of biomarkers in the first and second subsets, respectively;
wherein an index greater than a threshold value indicates that the subject has an abnormal condition or is at risk of developing an abnormal condition.
According to some embodiments of the disclosure, |N| is 5 and |M| is 4.
According to some embodiments of the present disclosure, the threshold value is 0.03519 to 0.1337.
According to an embodiment of the present disclosure, the abnormal condition associated with microbiota is obesity or a related disease.
Detailed Description
Examples of the invention
The terms used herein have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms such as "a," "an," and "the" are not intended to refer to only a singular entity, but include the general class of which a specific example may be used for illustration. The terms used herein describe specific embodiments of the invention, but their usage does not limit the invention, except as specified in the claims.
The invention is further illustrated in the following non-limiting examples. Unless otherwise indicated, parts and percentages are by weight and degrees are in degrees Celsius. It will be apparent to those of ordinary skill in the art that these examples, while representing preferred embodiments of the invention, are given by way of illustration only and that all reagents are commercially available.
Detailed description of the preferred embodiments
Example 1 Identification of biomarkers for assessing obesity risk
1.1 Sample collection
Stool samples from 158 Chinese subjects, comprising 78 obese patients and 80 control subjects (training set), were collected in 2012 at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine. The obese patients were 18 to 30 years of age, with a BMI above 25 kg/m². Subjects were asked to provide fresh stool samples at the hospital. The collected samples were placed in sterile tubes and immediately stored at -80 °C until further analysis.
Full ethical approval was obtained, and written informed consent was obtained from all participants. The study was approved by the ethics review committee of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine.
1.2 DNA extraction
Stool samples were thawed on ice, and DNA was extracted using the QIAamp DNA Stool Mini Kit (Qiagen) according to the manufacturer's instructions. The extracts were treated with DNase-free RNase to eliminate RNA contamination. DNA quantity was determined using a NanoDrop spectrophotometer, a Qubit fluorometer (with the Quant-iT™ dsDNA BR Assay Kit) and gel electrophoresis.
1.3 DNA library construction and sequencing of fecal samples
DNA libraries were constructed according to the manufacturer's instructions (Illumina; insert size 350 bp, read length 100 bp). The inventors performed cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of sequencing primers using the same workflow as previously described. A paired-end (PE) library with an insert size of 350 bp was constructed for each sample and sequenced at high throughput to obtain about 3 million PE reads of 2 × 100 bp. High-quality reads were obtained from the Illumina raw reads in two ways. First, reads containing ambiguous "N" bases, adapter contamination or human DNA contamination were removed. Second, low-quality bases at the read ends were trimmed.
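The read-cleaning step above can be sketched as follows. This is a minimal illustration only. The quality cutoff (Phred 20), the minimum retained length (50 bp) and the Phred+33 quality encoding are assumed values that the text does not specify. Adapter and human-DNA screening, which require alignment against reference sequences, are omitted.

```python
def trim_low_quality_tail(seq, qual, min_q=20, phred_offset=33):
    """Trim 3'-end bases whose Phred score is below min_q (assumed cutoff)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq, qual, min_q=20, min_len=50):
    """Return the cleaned (seq, qual) pair, or None if the read is discarded.

    Reads containing an ambiguous 'N' base are removed outright; reads shorter
    than min_len after tail trimming are also removed (min_len is assumed).
    """
    if "N" in seq.upper():
        return None
    seq, qual = trim_low_quality_tail(seq, qual, min_q)
    return (seq, qual) if len(seq) >= min_len else None


# Example: five low-quality ('#' = Q2) bases are trimmed from a 60 bp read.
read = clean_read("ACGT" * 15, "I" * 55 + "#" * 5)
print(len(read[0]))  # 55
```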
In total, about 5.9 Gb of high-quality (clean) fecal microbiota sequencing data per sample was generated for the 158 samples (78 cases and 80 controls) on the Illumina HiSeq 2000 platform (Table 1).
Table 1 Metagenomic data summary. The fourth column reports the results of the Wilcoxon rank-sum test.
1.4 Metagenomic data processing and analysis
1.4.1 Read alignment
The present inventors used the updated human gut gene catalog established by Li, J. et al. (An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. (2014), incorporated herein by reference) and aligned the high-quality reads to this catalog with an alignment identity of ≥ 90%. The average read alignment rates are shown in Table 1. They are close to those of the samples in Li, J. et al., 2014, supra, indicating that the alignment is sufficient for further study. After read alignment, the inventors derived a gene profile (about 9.9 million genes) from the alignment results using the same method as Li, J. et al., 2014, supra.
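The identity criterion can be illustrated with a small filter over alignment records. The aligner is not named here, so this sketch makes three assumptions. It assumes SAM-style records. It defines identity as the fraction of alignment columns (aligned, inserted and deleted bases) not counted in the NM edit-distance tag. It reads "≥ 90" as 90%. Other tools define identity differently.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, nm):
    """Identity = (alignment columns - NM) / alignment columns.

    Alignment columns are M/=/X (aligned bases) plus I and D; soft/hard clips,
    skipped regions (N) and padding (P) are excluded. NM is the SAM edit distance.
    """
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")
    return (columns - nm) / columns if columns else 0.0


def keep_alignment(cigar, nm, min_identity=0.90):
    """Apply the identity >= 90% criterion to one alignment record."""
    return alignment_identity(cigar, nm) >= min_identity


print(keep_alignment("100M", 5))     # True  (identity 0.95)
print(keep_alignment("90M10S", 10))  # False (identity 80/90, about 0.89)
```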
Taxonomic assignment of genes. The predicted genes were taxonomically assigned using the in-house pipeline described in a published paper (Li, J. et al., 2014, supra).
1.4.2 Profile construction
Gene profiling. Based on the read alignment results, the inventors calculated relative gene abundances using the same method described in the published T2D paper (Qin et al., 2012, supra).
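The relative-abundance calculation can be sketched as below. The sketch assumes the length-normalized form used in the cited T2D study. In that form, the read count of each gene is divided by the gene length and then scaled so that all genes in a sample sum to 1. The array layout and function name are illustrative.

```python
import numpy as np


def relative_abundance(read_counts, gene_lengths):
    """Relative abundance a_ik = (x_ik / L_i) / sum_g (x_gk / L_g).

    read_counts:  (n_genes, n_samples) reads mapped to gene i in sample k
    gene_lengths: (n_genes,) gene lengths in bp
    Assumes every sample has at least one mapped read.
    """
    counts = np.asarray(read_counts, dtype=float)
    lengths = np.asarray(gene_lengths, dtype=float)[:, None]
    per_base = counts / lengths  # corrects for longer genes attracting more reads
    return per_base / per_base.sum(axis=0, keepdims=True)


# Two genes with equal read counts: the shorter gene gets twice the abundance.
print(relative_abundance([[100], [100]], [1000, 500]).ravel())  # [0.33333333 0.66666667]
```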
1.4.3 Analysis of factors affecting the gene profile of the gut microbiota
Based on the gene profile, the inventors used PERMANOVA, a non-parametric (permutational) multivariate analysis of variance, to assess the effects of 6 clinical parameters: age, gender, height, weight, BMI and obesity status. The analysis was performed with the method implemented in the "vegan" package in R, and permutation p values were obtained from 10,000 permutations. The inventors also corrected for multiple testing with the Benjamini-Hochberg method, using "p.adjust" in R, to obtain a q value for each test. PERMANOVA identified three significant factors associated with the gut microbiota gene profile (q < 0.05; Table 2). These factors were body weight, BMI and obesity status, which are strongly correlated with one another. This indicates that disease (obesity) status is a major determinant of gut microbiota composition.
Table 2 PERMANOVA of gene profiles based on Euclidean distance. The analysis tests whether clinical parameters and obesity status have significant effects on the gut microbiota (significance at q < 0.05).
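The PERMANOVA and multiple-testing steps can be sketched as follows for a single categorical factor such as obesity status. This is a simplified stand-in for illustration, not a port of the "vegan" implementation. Continuous covariates (age, height, weight, BMI) would require the regression form of PERMANOVA, which is omitted, and the function names are hypothetical. The Benjamini-Hochberg helper applies the same step-up adjustment as R's p.adjust(method = "BH").

```python
import numpy as np


def pseudo_f(d2, groups):
    """PERMANOVA pseudo-F for one categorical factor from squared distances d2."""
    n = len(groups)
    labels = np.unique(groups)
    ss_total = d2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(profile, groups, n_perm=10_000, seed=0):
    """profile: (n_samples, n_genes) floats. Euclidean distance; returns (F, p)."""
    profile = np.asarray(profile, dtype=float)
    groups = np.asarray(groups)
    sq = (profile ** 2).sum(axis=1)
    # Squared Euclidean distances without building an n x n x n_genes array.
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * profile @ profile.T, 0, None)
    f_obs = pseudo_f(d2, groups)
    rng = np.random.default_rng(seed)
    hits = sum(pseudo_f(d2, rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (q values)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    monotone = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(monotone, 1.0)
    return q
```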
1.4.4 Identification of obesity-related markers
Identification of obesity-related genes. To identify associations between the metagenomic profile and obesity, a two-tailed Wilcoxon rank-sum test was applied to the profile of 9,879,897 high-frequency genes, from which genes present in fewer than 10 of the 158 samples were removed. This yielded 396,100 gene markers enriched in either cases or controls at p < 0.01, with an FDR of 3.8% (FIG. 1).
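The marker screen can be sketched with SciPy's Mann-Whitney U test, which is equivalent to the two-sided Wilcoxon rank-sum test. The prevalence filter (presence in at least 10 samples) and the p < 0.01 cutoff come from the text. The rule that labels a marker as case- or control-enriched by comparing group means is an assumption, because the text does not state it.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def screen_markers(profile, is_case, min_samples=10, alpha=0.01):
    """Two-sided Wilcoxon rank-sum screen of a gene profile.

    profile: (n_genes, n_samples) relative abundances
    is_case: (n_samples,) True for obese cases, False for controls
    Returns (gene_index, p_value, enriched) with enriched 1 = cases, 0 = controls.
    """
    profile = np.asarray(profile, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    frequent = (profile > 0).sum(axis=1) >= min_samples  # drop rarely detected genes
    markers = []
    for g in np.flatnonzero(frequent):
        cases, controls = profile[g, is_case], profile[g, ~is_case]
        p = mannwhitneyu(cases, controls, alternative="two-sided").pvalue
        if p < alpha:
            # Assumed direction rule: the group with the higher mean abundance.
            markers.append((int(g), float(p), int(cases.mean() > controls.mean())))
    return markers
```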
False discovery rate (FDR) estimation. The inventors estimated the FDR with the "q-value" method proposed in a previous study (Storey, J.D. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B 64, 479-498 (2002), incorporated herein by reference), rather than with a sequential p-value rejection method.
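A minimal version of Storey's q-value estimate is sketched below. It uses a single fixed tuning parameter λ = 0.5 to estimate π0, the proportion of truly null genes. The published method (and the Bioconductor "qvalue" package) can instead smooth the estimate over a range of λ values, so numbers from this sketch may differ slightly. The helper names are hypothetical.

```python
import numpy as np


def estimate_pi0(p, lam=0.5):
    """Storey's estimate of the proportion of true nulls, capped at 1."""
    p = np.asarray(p, dtype=float)
    return min(1.0, np.mean(p > lam) / (1 - lam))


def storey_qvalues(pvals, lam=0.5):
    """q_i = min over p_(k) >= p_i of pi0 * m * p_(k) / k."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    raw = estimate_pi0(p, lam) * m * p[order] / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(raw[::-1])[::-1]  # enforce monotonicity in p
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


def estimated_fdr(pvals, threshold=0.01, lam=0.5):
    """Estimated FDR of calling every gene with p <= threshold significant."""
    p = np.asarray(pvals, dtype=float)
    n_called = max(1, int(np.sum(p <= threshold)))
    return estimate_pi0(p, lam) * len(p) * threshold / n_called
```

Applied to the p values of all tested genes with threshold = 0.01, estimated_fdr gives the kind of figure quoted above for the 396,100 markers.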
Receiver operating characteristic (ROC) analysis. The inventors used ROC analysis to evaluate the performance of obesity classification based on the metagenomic markers, and plotted ROC curves with the "pROC" package in R.
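The area under the ROC curve can be computed without plotting. It equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The sketch below is an illustrative stand-in, not the pROC implementation. The function roc_points sweeps the cutoff over the observed scores and calls a sample positive when its score is at or above the cutoff. Integrating its curve with the trapezoid rule gives the same value as auc.

```python
import numpy as np


def roc_points(scores, labels):
    """(FPR, TPR) at every observed cutoff; score >= cutoff is called positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cutoffs = np.r_[np.inf, np.unique(scores)[::-1]]
    tpr = np.array([(scores[labels] >= c).mean() for c in cutoffs])
    fpr = np.array([(scores[~labels] >= c).mean() for c in cutoffs])
    return fpr, tpr


def auc(scores, labels):
    """AUC = P(case score > control score) + 0.5 * P(tie)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```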
1.5 Selection of the 9 best markers (minimum redundancy maximum relevance (mRMR) feature selection framework)
To determine the optimal gene set, minimum redundancy maximum relevance (mRMR) feature selection was applied to the obesity-related gene markers (Peng, H., Long, F. & Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1226-1238 (2005), doi:10.1109/TPAMI.2005.159, incorporated herein by reference). The inventors performed an incremental search using the "sideChannelAttack" package in R and obtained 158 sequential marker sets. The error rate of each sequential set was estimated by leave-one-out cross-validation (LOOCV) of a linear discriminant classifier, and the marker set with the lowest error rate was taken as the best choice. In this study, feature selection was performed on the panel of 396,100 obesity-associated gene markers. Because running mRMR on all of these genes is computationally infeasible, the inventors first derived a statistically non-redundant candidate set of 8,010 genes (q < 0.0005). The inventors then applied mRMR feature selection and identified the optimal set of 9 gene biomarkers strongly associated with obesity (lowest error rate; FIG. 2), shown in Tables 3 and 4. Gene IDs refer to the reference gene catalog published in Li, J. et al., 2014, supra.
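The selection procedure can be sketched in two stages. The first stage is an incremental mRMR search using the "mutual information difference" criterion: relevance to the class label minus mean redundancy with the markers already selected. The second stage runs LOOCV of a two-class linear discriminant classifier on each prefix of the ranked list. This is an illustrative re-implementation, not the "sideChannelAttack" code. The tertile discretization used for mutual information and the small ridge term added to the pooled covariance are assumptions.

```python
import numpy as np


def mutual_info(x, y):
    """Mutual information (nats) between two discrete integer arrays."""
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * np.mean(y == yv)))
    return mi


def discretize(X, n_bins=3):
    """Assumed preprocessing: map each feature to rank-based bins (tertiles)."""
    ranks = X.argsort(axis=0).argsort(axis=0)
    return ranks * n_bins // X.shape[0]


def mrmr(Xd, y, k):
    """Rank k features by relevance I(f; y) minus mean I(f; already selected)."""
    relevance = np.array([mutual_info(Xd[:, f], y) for f in range(Xd.shape[1])])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        scores = {
            f: relevance[f] - np.mean([mutual_info(Xd[:, f], Xd[:, s]) for s in selected])
            for f in range(Xd.shape[1]) if f not in selected
        }
        selected.append(max(scores, key=scores.get))
    return selected


def lda_fit(X, y, ridge=1e-6):
    """Two-class Fisher LDA with pooled covariance; returns weights and bias."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    centered = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    cov = centered.T @ centered / (len(X) - 2) + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2 + np.log(np.mean(y == 1) / np.mean(y == 0))
    return w, b


def loocv_error(X, y):
    """Leave-one-out error rate of the LDA classifier (y is a 0/1 int array)."""
    wrong = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w, b = lda_fit(X[keep], y[keep])
        wrong += int(X[i] @ w + b > 0) != y[i]
    return wrong / len(y)


# Usage sketch: rank 20 markers, then keep the prefix with the lowest LOOCV error.
# order = mrmr(discretize(X), y, k=20)
# errors = [loocv_error(X[:, order[:k]], y) for k in range(1, 21)]
# best_set = order[: int(np.argmin(errors)) + 1]
```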
Table 3 Enrichment information for the 9 best gene markers
Gene id | Enrichment (1 = obesity, 0 = control)
64552 | 0
1208989 | 0
2285506 | 0
3104115 | 1
3581202 | 0
5042942 | 1
5243950 | 1
6793200 | 1
7860042 | 1
Table 4 SEQ ID NOs of the 9 best gene markers
Gene id | SEQ ID NO:
7860042 | 1
1208989 | 2
5243950 | 3
5042942 | 4
3104115 | 5
2285506 | 6
3581202 | 7
64552 | 8
6793200 | 9
1.6 Intestinal health index (obesity index)
To exploit the potential of gut microbiota for disease classification, the inventors developed a disease classification system based on the 9 gene markers defined above. To assess disease risk intuitively from these gut microbial gene markers, the present inventors calculated an intestinal health index (obesity index).
To evaluate the effect of the gut metagenome on obesity, the inventors defined and calculated a gut health index for each individual based on the 9 gene markers selected as described above. For each sample j, the intestinal health index, denoted I_j, was calculated as follows:
A_ij is the relative abundance of marker i in sample j;
N is the subset of the selected biomarkers associated with the abnormal condition that are enriched in patients (i.e., the subset of the 9 selected gene markers enriched in obese patients);
M is the subset of the selected biomarkers associated with the abnormal condition that are enriched in controls (i.e., the subset of the 9 selected gene markers enriched in controls);
|N| and |M| are the numbers (sizes) of biomarkers in the two subsets, here |N| = 5 and |M| = 4; an index greater than a cutoff value indicates that the subject has obesity or is at risk of developing obesity.
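The text defines the quantities that enter I_j but does not state the functional form of the index. The sketch below therefore assumes a log-ratio form consistent with the definitions above: the mean of log10(A_ij + c) over the markers in N, minus the same mean over the markers in M. The pseudocount c = 1e-6 is also an assumed value; it keeps the logarithm defined when a marker is absent (A_ij = 0). The enrichment labels are those of Table 3, and the example abundances are those reported for sample DB78A in Table 7. Under these assumptions the sketch gives I ≈ 0.25 for DB78A. This result illustrates the calculation and should not be read as a Table 6 value.

```python
import math

PSEUDOCOUNT = 1e-6  # assumed value; keeps log10 defined when A_ij = 0

# Enrichment of the 9 markers (Table 3): 1 = obesity (set N), 0 = control (set M).
ENRICHMENT = {64552: 0, 1208989: 0, 2285506: 0, 3104115: 1, 3581202: 0,
              5042942: 1, 5243950: 1, 6793200: 1, 7860042: 1}


def health_index(abundance, enrichment=ENRICHMENT, c=PSEUDOCOUNT):
    """Assumed form: mean_{i in N} log10(A_ij + c) - mean_{i in M} log10(A_ij + c)."""
    n_set = [math.log10(abundance[g] + c) for g, e in enrichment.items() if e == 1]
    m_set = [math.log10(abundance[g] + c) for g, e in enrichment.items() if e == 0]
    return sum(n_set) / len(n_set) - sum(m_set) / len(m_set)


# Relative abundances of sample DB78A (Table 7).
DB78A = {64552: 0, 1208989: 0, 2285506: 1.46332e-06, 3104115: 3.47323e-06,
         3581202: 0, 5042942: 0, 5243950: 5.26732e-06, 6793200: 1.06787e-06,
         7860042: 0}

index = health_index(DB78A)
print(round(index, 3), index > 0.03519)  # 0.255 True
```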
1.7 Classification of obesity based on gut microbiota
The present inventors calculated an obesity index based on the relative abundances of these 9 gene markers. The index clearly distinguished the microbiomes of obese patients from control microbiomes (Table 5). Using the obesity index to separate the 78 obese-patient microbiomes from the 80 control microbiomes gave an area under the receiver operating characteristic (ROC) curve of 0.9763 (FIG. 3). At the optimal index cutoff of 0.03519, the true positive rate (TPR) was 0.9487, the false positive rate (FPR) was 0.1 and the error rate was 8.23% (13/158). This indicates that the 9 gene markers can be used to classify obese individuals accurately.
Table 5 Calculated gut health index for the 158 samples (obese patients and non-obese controls)
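TPR, FPR and error rate at a given cutoff, and an "optimal" cutoff, can be computed from per-sample index values with a sketch like the following. The text does not say how the optimal cutoff was chosen. This sketch assumes Youden's J statistic (TPR minus FPR), a common criterion, and evaluates midpoints between the sorted index values. As in the text, a sample is called obese when its index is strictly greater than the cutoff. The function names are hypothetical.

```python
import numpy as np


def classify_metrics(index, is_obese, cutoff):
    """TPR, FPR and overall error rate when 'index > cutoff' is called obese."""
    index = np.asarray(index, dtype=float)
    is_obese = np.asarray(is_obese, dtype=bool)
    called = index > cutoff
    tpr = called[is_obese].mean()
    fpr = called[~is_obese].mean()
    error = (called != is_obese).mean()
    return tpr, fpr, error


def youden_cutoff(index, is_obese):
    """Candidate cutoff (midpoint between sorted values) maximizing TPR - FPR."""
    values = np.unique(np.asarray(index, dtype=float))  # sorted, unique
    candidates = np.r_[values[0] - 1, (values[:-1] + values[1:]) / 2, values[-1] + 1]
    best, best_j = candidates[0], -np.inf
    for c in candidates:
        tpr, fpr, _ = classify_metrics(index, is_obese, c)
        if tpr - fpr > best_j:  # strict '>' keeps the lowest cutoff on ties
            best, best_j = c, tpr - fpr
    return best
```

Choosing the cutoff on the training set and then reusing it unchanged on the test sets, as Examples 2 and 3 do with 0.03519, gives an estimate of performance that is not tuned to the test data.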
Example 2 Validation of the 9 gene biomarkers in 42 samples (test set)
The present inventors validated the discrimination ability of the obesity classifier in a new, independent study group of 17 obese patients and 25 non-obese controls collected at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine.
DNA was extracted from each sample, and DNA libraries were constructed and sequenced at high throughput as described in Example 1. The inventors calculated the gene abundance profiles of these samples using the same method as described in Qin et al., 2012, supra, and then determined the relative abundance of each of the marker genes set forth in SEQ ID NOs: 1-9. The index of each sample was then calculated as follows:
A_ij is the relative abundance of marker i in sample j;
N is the subset of the selected biomarkers associated with the abnormal condition that are enriched in patients (i.e., the subset of the 9 selected gene markers enriched in obese patients);
M is the subset of the selected biomarkers associated with the abnormal condition that are enriched in controls (i.e., the subset of the 9 selected gene markers enriched in controls);
|N| and |M| are the numbers of biomarkers in the two subsets, here |N| = 5 and |M| = 4;
wherein an index greater than a threshold value indicates that the subject has or is at risk of developing obesity.
Table 6 shows the calculated index for each sample, and Table 7 shows the relative abundances of the relevant genes in the representative sample DB78A. In this evaluation, at a cutoff value of 0.03519 (the optimal index cutoff in the 158 samples above), the error rate was 21.42% (9/42), indicating that the 9 gene markers can be used to classify obese individuals. Most obese patients (16/17) were correctly diagnosed as obese. In addition, an ROC curve was plotted from the obesity index of the test set, giving an AUC of 0.9024 (FIG. 4). At the optimal threshold of 0.1337, the TPR was 0.9412 and the FPR was 0.24.
Table 6 Calculated gut health index for the 42 samples
Table 7 Relative abundance of genes in sample DB78A
Gene id | DB78A (relative gene abundance) | Enrichment (1 = obesity, 0 = control)
64552 | 0 | 0
1208989 | 0 | 0
2285506 | 1.46332E-06 | 0
3104115 | 3.47323E-06 | 1
3581202 | 0 | 0
5042942 | 0 | 1
5243950 | 5.26732E-06 | 1
6793200 | 1.06787E-06 | 1
7860042 | 0 | 1
Example 3 Validation of the 9 gene biomarkers in 22 samples (test set)
The inventors further verified the discrimination ability of the obesity classifier using another 22 samples (Table 8), also collected at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine. These comprised 9 case samples and 13 control samples, of which 5 were collected 1 month after surgery and 8 were collected 3 months after surgery. Cases are pre-operative samples; controls are samples taken 1 and 3 months after surgery.
Table 8 Information on the 22 samples
Before: before surgery; 1-M: one month after surgery; 3-M: three months after surgery.
DNA was extracted from each sample, and DNA libraries were constructed and sequenced at high throughput as described in Example 1. The inventors calculated the gene abundance profiles of these samples using the same method as described in Qin et al., 2012, supra, and then determined the relative abundance of each of the marker genes set forth in SEQ ID NOs: 1-9. The index of each sample was then calculated as follows:
A_ij is the relative abundance of marker i in sample j;
N is the subset of the selected biomarkers associated with the abnormal condition that are enriched in patients (i.e., the subset of the 9 selected gene markers enriched in obese patients);
M is the subset of the selected biomarkers associated with the abnormal condition that are enriched in controls (i.e., the subset of the 9 selected gene markers enriched in controls);
|N| and |M| are the numbers of biomarkers in the two subsets, here |N| = 5 and |M| = 4;
wherein an index greater than a threshold value indicates that the subject has or is at risk of developing obesity.
Table 9 shows the calculated index for each sample, and Table 10 shows the relative abundances of the relevant genes in the representative sample DB126. In this evaluation, at a cutoff value of 0.03519 (the optimal index cutoff in the 158 samples above), the error rate was 22.72% (5/22), indicating that the 9 gene markers can be used to classify obese individuals. Most obese patients (8/9) were correctly diagnosed as obese. In addition, an ROC curve was plotted from the obesity index of the test set, giving an AUC of 0.8462 (FIG. 5). At the optimal threshold of 0.9695, the TPR was 0.6667 and the FPR was 0.07692.
Table 9 Calculated gut health index for the 22 samples
Table 10 Relative abundance of genes in sample DB126
Gene id | DB126 (relative gene abundance) | Enrichment (1 = obesity, 0 = control)
64552 | 0 | 0
1208989 | 0 | 0
2285506 | 7.99701E-08 | 0
3104115 | 6.25943E-05 | 1
3581202 | 0 | 0
5042942 | 7.19308E-08 | 1
5243950 | 5.97579E-07 | 1
6793200 | 0 | 1
7860042 | 1.52752E-07 | 1
Thus, starting from 396,100 obesity-related markers, the inventors identified and validated a set of 9 markers by the minimum redundancy maximum relevance (mRMR) feature selection method. The inventors further established an intestinal health index to evaluate the risk of obesity based on these 9 gut microbial gene markers.
Although illustrative embodiments have been shown and described, it will be understood by those skilled in the art that the above embodiments are not to be construed as limiting the present disclosure and that changes, substitutions and alterations can be made thereto without departing from the spirit, principles and scope of the present invention.
Sequence listing
<110> Shenzhen Huada Gene Science and Technology Limited
Shenzhen Huada Gene Research Institute
<120> biomarkers for obesity-related diseases
<130> XXXX
<160> 9
<170> PatentIn version 3.5
<210> 1
<211> 267
<212> DNA
<213> unknown
<220>
<223> isolated from intestinal tract, not identified
<400> 1
gtactatata aatatggagg tgattacatg gcagagcgtg aaaaacaata taaaggaatc 60
attagttatg aaaaattatg gaatcttatg caaacaaaaa atataaaaaa aagagacttg 120
agagagactt ataaaatttc tcctactatt attagtagac ttagcaacaa cgcaaacgta 180
gctgtagaca ctatcatgta tctttgtgaa atcttaaact gtcagcccag tgatattatg 240
gaatacatcc cgccggaatc agtttaa 267
<210> 2
<211> 1362
<212> DNA
<213> unknown
<220>
<223> isolated from intestinal tract, not identified
<400> 2
atgtctgata aaattgccgc cattgccacg ggccacgccc gcacgggcat cggcgttctg 60
cgtctatccg gcgacgggtg cattgaggcc gcggaacagg tcttccggct gaactccggc 120
aggccgcttt cttccctctc cgaccgcaag cttgcgctcg gcacgctctt tgacgcacag 180
ggacggccca tcgaccactg catggcattc atctcccgcg ccccgcattc ctacaccggc 240
gaggataccg ccgaaattca gtgccacggc tcccccgcag cactgaccgc cgggcttgaa 300
gcgctgttcg ccgcaggctt ccggcaggcg cggcccggtg aattcacgcg ccgcgcgttt 360
ttgaacggcc agatggattt gacgcaggcc gaagccgtga tcgatctcat cgacgccgag 420
acagccgacg cggcggcaaa cgccgccgga caagttgcag gtgcgatccg caaaaagatc 480
gacccgatct acgacggctg ggtcgatctg tgcgcgcatt tccacgccgt gctcgattac 540
cccgacgagg atatcgaccc cttcacgctc tccggctatg aagcgtctct gacagaaagc 600
agccgtcagc ttaccgcact gctttcctcc tgtcgtcgcg gccggatggt gcagtccgga 660
atcaaggcgg tcattctggg cagtccaaac gccggaaaat ccagccttct taaccgcctg 720
tccggctttg accgcgtgat cgtcaccgac attcccggca cgacgcgcga cacggtcgag 780
cagaccgtca cactcgggcg gcaccttgtc cgtctggttg ataccgcagg catccgcgac 840
acgcaggatg tgatcgaaaa aatcggcgtt gaccgctcga tggaagccgc gaaggactgc 900
gatgtagcgc tgtatgtcct tgatgattcc aagcctatga cggaagagga tcgccgcgcg 960
atgggcgcgg cgctcgaagc gccggaagcc gttgcgattt tgaacaaaca ggatctggct 1020
gccgtgatcg aaccgtccga tctgccgttt gcgtacattg tccccatctc ctgcaaggac 1080
ggcacgggct ttgatctgct ggagcaggcg tttgatatgc tcttcccgga cgatacgccg 1140
tgtgacggtt cgcttctgac gaacgcgcgg caggccggcg ccattctgcg cgcaaaaaaa 1200
tccgtcgatt ccgccgtgca gagcctgcgt gcgggtatga cgcccgacgc ggtgctggtc 1260
gatctggaat cggcgatgga tgcgctcggc gaggtcacgg gccggacgat gcgcgaggat 1320
atcacaaacc gcatttttga gcggttctgc gttggaaaat aa 1362
<210> 3
<211> 546
<212> DNA
<213> Dorea longicatena
<220>
<223> Dorea longicatena isolated from intestinal tract
<400> 3
agaagctggg ttgaatttct agaaggagga gcacacatga gacaggaacg tattgatgct 60
aattcattag aattaaacga aaaagtagta tcaattaaac gtgtaaccaa agttgttaag 120
ggtggccgta atatgagatt cacagcttta gtagttgttg gtgacggaaa tggccatgtt 180
ggtgcaggtt taggaaaagc tacagaaatt ccggaagcaa tccgcaaagg aaaagaagat 240
gcagctaaga atctcatctc tgtagcatta gatgaaaatg acagtgtaac acatgattat 300
atcggtaaat tcggaggcgc ttctgtatta cttaagaagg ctccagaagg tactggagtt 360
atcgccggtg gtccggcgcg tgccgtaatc gagatggcag gaatcaagaa cattcgtaca 420
aaatctcttg gttctaacaa caaacagaat gtagtacttg ctacaatcga aggattacgc 480
cagattaaaa ctccagaaga agtagctaag cttcgcggta aatctgttga agagatcttt 540
ggctaa 546
<210> 4
<211> 567
<212> DNA
<213> unknown
<220>
<223> isolated from intestinal tract, not identified
<400> 4
atggagattc agatctatta tcagaaatgg gatgaaaaaa gcaaagattc acatgagtta 60
ttagaacgtg tgattcgcat ttacgccaag gcacatcaga taaaattgcc ggaaacgtta 120
cagatttgtc aggaacaaaa taaaaagccg tttctaaaaa acgtatcgga tatctgcttt 180
tctatttccc acagcgggga atggtggagc tgcgcgattg cgccgcagga agttggtctg 240
gacattcagg aagaacatga ctgcaaggca gaacgtctgg caaaacgatt ttttcatccg 300
atggaagtag catggatgga acagcatgga tatgagaatt tttgccggtt gtgggcatac 360
aaggaaagct atgtgaaata tacagggatt gggcttgtaa agggtatgga ttattattcg 420
gtggtgaagc cggatggcac attgaccgga gaggaaagtg tctggcaaaa gcagattgcc 480
tttcaaccag agtgttatat ggttgtgact gcaaaacagg aagcagaagt aaagctgttt 540
ttattgacag aacagagtaa aaaatga 567
<210> 5
<211> 849
<212> DNA
<213> unknown
<220>
<223> isolated from intestinal tract, not identified
<400> 5
atgttcgaag accagataag aaagtacgac aacctcgttt ctcaatacaa aaagattgta 60
tcagacaaat tctgcaagca agactttatc gaatacaatg agattctatt ctctgcccac 120
agctgtgcta tcgaggggaa cagtttttct gtggatgaaa cacgtaccct caaggaaaaa 180
ggactgggta tgatacccca aggaaaaact ctccttgaag cgtttgagat gcttgaccat 240
ttccaagcat acgaatatct tctgaaaaat ctggacaaac ctttgtctga ggaactttta 300
aaggaaacac acaaactact gacagaacat acattggcat tccggacaca acatgacgaa 360
catccttcac agccaggaga atataccaca gtagacatgt gtgcaggaga tactattttt 420
ggagaccata aggaacttat aaagcgcgta cccaacctat tggaaagtac acagaaagca 480
atggactcgg gtgatataca tcccatcata atcgcagcca aatttcacgg tttttatgaa 540
tatctccatc cttttcgaga cgggaacggc cggcttggaa gaatgtttac caacttcatt 600
ttgctgaaaa aagaccaacc gatacttatc attccccgtg aaaaaagaga agagtatatc 660
agtgcgttaa gattcatccg aaaagaacgc acagatgaat atctgataga ctttttcttt 720
aacacagcca ttgaaagaat gcaatctgaa attgaagaaa aaaacaatat gacaaacaat 780
ttcattcatg gaataaattt tattcagaat gaaaaatcca cctcacaaga caacgagatt 840
gaagcttaa 849
<210> 6
<211> 1020
<212> DNA
<213> Bacteroides intestinalis
<220>
<223> isolated from intestinal tract, Bacteroides intestinalis
<400> 6
ataaatcata aatctgaaat cataaatcat aagatggata ccaaatcaca aaacagtaac 60
atgctactag cgttcttaac gctgacaggc gtcattgcca ttgtagccgt agtgggtttc 120
ttcatgttgc gcaaaggacc ggaaatcgta caaggtcaag ctgaagtcac agagtatcgg 180
gtatcgagca aagtaccggg acgaattctg gaattccgcg taaaggaagg acaaagtgta 240
caagccgggg ataccctcgc catactggaa gctccggatg tgttagctaa gctggaacag 300
gctcgggccg cagaagctgc tgctcaggca cagaatgaga aagcactgaa aggcgcccgt 360
catgaacagg tacaagctgc ctttgagatg tggcaaaaag ccaaagcagg acttgaaata 420
gccgagaaat cttacaaacg tgtgaagaat cttgctgatc agggcgtgat gtcagcccaa 480
aagctggatg aggttactgc acagcgggac gctgccgtcg ctacagaaaa agccgccaaa 540
gctcaatacg acatggcaaa aaacggtgcg gaacgagaag ataaagcagc agctgccgcc 600
ttggtagacc gtgccaaagg agccgttgca gaagtagaat cttatatcaa agaaacttat 660
cttatagccc agacagccgg agaagtttcg gaaatcttcc ccaaagtagg tgaattggta 720
ggaaccggtg caccgatcat gaacatagcc atactggatg atatgtgggt aacttttaat 780
gttcgggaag atttgctgca aggtctgaca atgggaactg aattcgaagc tttcgttccg 840
gcattggata aaaatatccg cctgaaagtg aactacatga aggatcttgg tacttatgca 900
gcctggaagg ctacgaaaac aaccgggcaa tttgatctga agacttttga ggtaaaagct 960
ttaccacagg agaaagtgga aggcttacgt cccggaatgt cggtgatact gaagaagtga 1020
<210> 7
<211> 765
<212> DNA
<213> unknown
<220>
<223> isolated from intestinal tract, not identified
<400> 7
atggcaagtc ctttcctgcc gaaaaacggc atcacaggac aattcgccat cggacggcaa 60
aatcctgcga accgggcgga gcaaatgcag aaacttgcaa attcagcctg cacacgatat 120
aataaagcgg aaagtaggtg cattttcgtg aaaaaggcag tcaattgggc aatgtacact 180
ggcagcatcg gctttatgct gaacggaatc attggcatcg tgaagctgct gcacggcaat 240
aacaaggtgg cgatgcaagc tcaccttgct gtcgttttct ttgtcatcgc ggcggttggc 300
atcatctgca agattctgct gaaccgctgc ccgaactgcc ggaagagcgt catgacgcgc 360
ggcgaatact gcccgtactg cggtgaaaaa atcaaaaaaa gcaatgagct gctggacggc 420
acagttctgc ggcacgagtg gttcaaaagc agaaaatcgc caatttcgcc cgcagacggg 480
aagaatgaag cgggggaaaa ggtgcgcttt cataaaaagg cagttaattg ggtgcaggcg 540
attgcacttg ttgcgttcgt ggtggatttg tgcttcattg ccggggaatg gttgcagagc 600
gattgcgaaa ggctgacggc aggctgcatc ggtgcggcgt gctggttcat cgtgctgata 660
tgcgtgtttt gcaagatacg gatgaaccgc tgcccgcact gcgggaaggt tgtcgagaca 720
cgcggcgact attgtccgta ctgcggcgaa aaagtcaaat tgtga 765
<210> 8
<211> 3603
<212> DNA
<213> Prevotella copri
<220>
<223> Prevotella copri isolated from intestinal tract
<400> 8
tattttttga aagtagatgt taccgattcg ctttcagaaa cgtattttta ttatagtatt 60
aacgaaaaag agaaaaatat gaaagaagta agatgtgtat tcatgtttgc tgtcctgact 120
ttatggatgg cattgcctgt aggagcacag aatgcaaaga gaggcatcac gatgtcgctc 180
gttaatgagt ctttggcttc tgcattgaga aaagtacagc aggagtcgga ttataaggtg 240
agttttgtga tagaggacgt aaatccctat accacaacgg ttcatctgaa gaatgcttct 300
gcctcaactg ccgtaaagca gattttacag ggtaaacctt tcacttactc agtaagtggt 360
aagtttatta ccgtcaagaa ggttttgcaa cataaaactg cagctgaagc tacagataat 420
ggagctgcca tccgccccct ctcaggaaca attaccgatg aggatggaga acctatgatt 480
ggcgcttctg ttgtagtgcc tggcagtcct tttggaaccg tgactaatac ggatggtaat 540
tttgagtttt atattcctaa aggatgtcat gaggttacgg tttcgtatgt cggtatgaat 600
gaccagaagg taaaggtggc tggtagggat catgtgaaga ttgtgatgtc tgaaaataag 660
accatacttg gtgaagtggt tgtaacgggt tatcagacaa tctctaagga gcgtgcaacc 720
ggtgctttta ccaaggtaac agctgatgaa ctgaaagaca agaggttggg taatctctct 780
actgtgctgg ctggcgaggt tgccggatat aatgacggca tgattcgtgg cgtaaccacg 840
atgaatgcct ctgcttctcc tctctatgtc atagatggct ttccagtaga gaaaactaag 900
ttgaccggca atggaactgg tgatattacc gaggaaatcc ctgatttgaa tatggatgat 960
attgagagta ttactgttct taaagatgct gctgctgctt ccatctatgg tgctcgtgct 1020
gccaatggtg tcattgtcat cactaccaag aaatcggcag ctaagggtaa gaccaatgtc 1080
tctttcaatg cttctcttac ctggcatcct tattcttatt atacagatta tctcgccaat 1140
tcttctcttg ttattgactt ggagaaggag tgggctgcgc agaatccgaa tcttgcaggc 1200
aatggagcta aggtatacgc tcagaatatg ttggatcaga aaatctatac gagcgcgggt 1260
atctgcaata tcttgaatta ttatgcaggc aatatctctg agtcagaaat gaatgctaaa 1320
ctgactgatt tggctggaaa aggttataac tactatgacc agatgaagaa gtatgctaag 1380
cgcgatccgt tgtatcagca gtataatatg aatattgtaa acaactcggt gaataatctc 1440
ttcaaggctt cggtcactta taagtataat acccttgaag ataagtattc caataatcag 1500
agtttaggta tcaaccttac caatacttct catttcacaa aatggctttc tttggattta 1560
ggtgcttacc taaaggtggg tgaggatcag acacagaact acaatgtcct ttctcctgga 1620
tattctgttt tgccatacga taatttagtg aatgctgatg gctcttgtta taccaagccg 1680
atgagtgaga gatatgatgc aagtactttg gctatctatc agaagtatgg tctttataat 1740
atggatatca ctcctttgga agagctggat cgtcagatta ccaccaacaa ggagttggct 1800
ttacgaactt ttgcccgtct gaatgtacag attctgccct gtttgaagta tgctgcttcc 1860
ttccagtacg aacgagcttc atatcgtggt gaaaactggg ctgataaagc ttctgtgcag 1920
gttcggggtg tggtaaacgg gtatgcaacc gataatggtg atggctctgt caattatgtg 1980
attccttatg gggatatcct ggttcgttca gaccaatata cctctgctta taacttccgt 2040
cagcagttga gttttgatca gactttcaag gatgttcaca gcgtaactgc tattctgggc 2100
acggaaacca tccagaacaa gcaggaattg catcgagaca aactgtttaa ctatgattct 2160
cagatgttaa catccaatat ggtgaataat gctgatttgt tgaaaggtat agcgggtgtg 2220
ctgggatata aatctatgac cgcgactgat ttgtctgctt cttatgaaaa tgtaaaccgt 2280
tatgtttcgg tatatggtaa tgctgcttat acttacgatg accgttatag cttgactggt 2340
agtttacgat gggaccgctc taatctctgg ggaaccagtt cgaagtttca gaacaagcct 2400
atctggtcgg taggtgccag ttggattatc agtaaggaga agttcttcca cgcagattgg 2460
gtcaattatc tgaaactccg tgtttctgat ggtattgccg gtaacgtatc taagaattct 2520
gccccttata tggtggcaag ctataataat aacgggcatg tgggaggtac acagggatat 2580
gtgcagtcgc gtgccaatcc gatgctgagc tgggagaaaa ccaatacctt taatataggt 2640
atcgatttct cgctctttaa gaaccgactc aatggtactg tagagtatta cgacaagaag 2700
ggtaccgacc tcctggcttc ttccatgggt gtacctactg aaggttgggg ctacagcacc 2760
tataccatca acaatggcga gatgtacaat cgtggtgtag aaatttccct gagtggtgag 2820
gttctgcgta ccagggattt ctcatggaga gccaatatga cttatgctta taataagaat 2880
gaggtgactt atgtaaatgt aaaggcgccg gtttatattc ttcagttgga ttacccttcg 2940
gcttatccta ttatcggcaa cgaatataat gcaatctatg gctataagtg ggctggattg 3000
agtgaagaag gattgccaca ggtgtataac gagaatggtg agaaggtaac taatcagcca 3060
acgacattgg atgccatctc ttacatgggt actaccactc ctaagtatag tggatctttc 3120
ggaaccagta tcagctacaa ggatttcgat ttcagtatgc agttcctttt tgcgggtggt 3180
cataagatgc gcaatgccaa tcctgctttc ctcacttgca gttattccag cgtaggatat 3240
atctccaata ttgcgggagc tagtgccgga cttgctaatc gttggcagaa accaggtgat 3300
gaggcttata ccaatgtgcc aaaggctgtt tttgccgaaa gtggtttgtc agcatcttct 3360
ttgtatagta cttattttta ttctgatatc aatatccttg atgccagtta tattcgtctg 3420
aacaacattt ctctggctta tcatttgcca aagtcgctgt gccgttcgct ctatatgcag 3480
agtgcccgtg ttcaggctaa tgtggagaat ccgttcttct gggcaaagac caaacaggct 3540
aagtaccagc tgggtggtta taatgccacc aactatgtgt taggcattta tctcaacttt 3600
taa 3603
<210> 9
<211> 375
<212> DNA
<213> unknown
<220>
<223> isolated from intestinal tract, not identified
<400> 9
atggcaaaat acgatgattt accagtattc aaggctacgt atgatctgtt atttcagata 60
tttaatgtaa gccagcattg gcgacgtgat atacgatata gcttaggtga agatcttaag 120
aaggaaataa tagaaatctt acaactcatc tatcaagcaa attctacaag aagcaaaatt 180
gcatatattt cgtcttgcag agtaaaaata gtcaaagtca aactacaggt acgtatagcc 240
aaagatttga aagaattgca tataaatcaa tatgcatttc tggcagaaat gatggaaagt 300
gtttctaaac agctgacaag ctggatgaaa tcagaacaga aaaaagaaca aaatttagaa 360
caaaaagaaa aataa 375