Summary of the Invention
Embodiments of the present disclosure seek to solve, at least to some extent, at least one of the problems existing in the prior art.
The present invention is based on the following findings by the inventors:
The assessment and characterization of gut microbiota has become a major research area in human diseases, including obesity. To analyze the gut microbial composition of obesity patients, the inventors carried out a metagenome-wide association study (MGWAS) protocol (Qin, J. et al., A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55-60 (2012), incorporated herein by reference) based on deep shotgun sequencing of gut microbial DNA from 158 individuals. The inventors identified and validated 396,100 obesity-associated gene markers. To exploit the potential of gut microbiota for obesity classification, the inventors developed a disease classifier system based on 9 gene markers, which were defined as the optimal gene set by the minimum redundancy-maximum relevance (mRMR) feature selection method. To allow an intuitive assessment of obesity risk based on these 9 gut microbial gene markers, the inventors calculated a health index. The inventors' data provide an in-depth characterization of the gut metagenome in relation to obesity risk, a paradigm for future studies of the pathophysiological role of the gut metagenome in other relevant diseases, and a potential gut-microbiota-based approach for assessing individuals at risk of such diseases.
It is believed that gene markers of gut microbiota are valuable for improving the early detection of obesity, for the following reasons. First, the markers of the invention are more specific and more sensitive than conventional markers. Second, stool analysis ensures accuracy, safety, affordability and patient compliance, and stool samples are transportable. The present invention therefore relates to a comfortable, non-invasive in vitro method that makes it easier for people to take part in a given screening program. Third, the markers of the invention can also serve as a treatment-monitoring tool for cancer patients, to detect their response to treatment.
In one aspect, the present disclosure provides a biomarker set for predicting a microbiota-related disease in a subject, the set consisting of:
gut biomarkers comprising at least a partial sequence of SEQ ID NO: 1 to 9.
According to an embodiment of the disclosure, the disease is obesity or a related disease.
Using these biomarkers, certain microbiota-related diseases of a subject can be analyzed. For example, obesity or a related disease can be determined from samples from the subject, such as stool samples.
In another aspect, the present disclosure provides a kit for determining the above gene marker set, comprising primers for PCR amplification designed according to the DNA sequences set forth in at least a partial sequence of SEQ ID NO: 1 to 9.
In another aspect, the present disclosure provides a kit for determining the above gene marker set, comprising one or more probes designed according to the genes set forth in SEQ ID NO: 1 to 9.
In another aspect, the present disclosure provides use of the above gene marker set for predicting the risk of obesity or a related disease in a subject, comprising:
(1) collecting a sample j from the subject;
(2) determining the relative abundance of each of SEQ ID NO: 1 to 9 in the DNA of the sample; and
(3) calculating the index of sample j, denoted I_j, according to the following formula:
A_ij is the relative abundance of marker i in sample j, where i refers to each gene marker in the gene marker set;
N is a first subset of the selected biomarkers related to the abnormal condition, consisting of the markers enriched in patients,
M is a second subset of the selected biomarkers related to the abnormal condition, consisting of the markers enriched in controls,
|N| and |M| are the numbers of biomarkers in the first subset and the second subset, respectively,
wherein an index greater than a cutoff value indicates that the subject has the abnormal condition or is at risk of developing the abnormal condition.
According to some embodiments of the disclosure, |N| is 5 and |M| is 4.
According to some embodiments of the disclosure, the cutoff value is 0.03519 to 0.1337.
In another aspect, the present disclosure provides use of the above gene marker set in the preparation of a kit for predicting the risk of obesity or a related disease in a subject, comprising:
(1) collecting a sample j from the subject;
(2) determining the relative abundance of each of SEQ ID NO: 1 to 9 in the DNA of the sample; and
(3) calculating the index of sample j, denoted I_j, according to the following formula:
A_ij is the relative abundance of marker i in sample j, where i refers to each gene marker in the gene marker set;
N is a first subset of the selected biomarkers related to the abnormal condition, consisting of the markers enriched in patients,
M is a second subset of the selected biomarkers related to the abnormal condition, consisting of the markers enriched in controls,
|N| and |M| are the numbers of biomarkers in the first subset and the second subset, respectively,
wherein an index greater than a cutoff value indicates that the subject has the abnormal condition or is at risk of developing the abnormal condition.
According to some embodiments of the disclosure, |N| is 5 and |M| is 4.
According to some embodiments of the disclosure, the cutoff value is 0.03519 to 0.1337.
In another aspect, the present disclosure provides a method of diagnosing whether a subject has a microbiota-related abnormal condition or is at risk of developing a microbiota-related abnormal condition, comprising:
determining the relative abundance of the above biomarkers in a sample from the subject, and
determining, based on the relative abundance, whether the subject has a microbiota-related abnormal condition or is at risk of developing a microbiota-related abnormal condition.
According to an embodiment of the disclosure, the method comprises:
(1) collecting a sample j from the subject;
(2) determining the relative abundance of each of SEQ ID NO: 1 to 9 in the DNA of the sample; and
(3) calculating the index of sample j, denoted I_j, according to the following formula:
A_ij is the relative abundance of marker i in sample j, where i refers to each gene marker in the gene marker set;
N is a first subset of the selected biomarkers related to the abnormal condition, consisting of the markers enriched in patients,
M is a second subset of the selected biomarkers related to the abnormal condition, consisting of the markers enriched in controls,
|N| and |M| are the numbers of biomarkers in the first subset and the second subset, respectively,
wherein an index greater than a cutoff value indicates that the subject has the abnormal condition or is at risk of developing the abnormal condition.
According to some embodiments of the disclosure, |N| is 5 and |M| is 4.
According to some embodiments of the disclosure, the cutoff value is 0.03519 to 0.1337.
According to an embodiment of the disclosure, the microbiota-related abnormal condition is obesity or a related disease.
Examples
Example 1. Identification of biomarkers for assessing obesity risk
1.1 Sample collection
Stool samples from 158 Chinese subjects, comprising 78 obesity patients and 80 control subjects (the training set), were collected in 2012 at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine. The obesity patients were aged 18 to 30 years and had a BMI above 25. Subjects were asked to provide fresh stool samples at the hospital. The collected samples were placed in sterile tubes and immediately stored at -80 °C until further analysis.
Full ethical approval was obtained, and all patients gave written informed consent. The study was approved by the Institutional Review Board of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine.
1.2 DNA extraction
Stool samples were thawed on ice, and DNA was extracted using the Qiagen QIAamp DNA Stool Mini Kit (Qiagen) according to the manufacturer's instructions. Extracts were treated with DNase-free RNase to remove RNA contamination. DNA quantity was determined using a NanoDrop spectrophotometer, a Qubit fluorometer (with the Quant-iT™ dsDNA BR Assay Kit) and gel electrophoresis.
1.3 DNA library construction and sequencing of stool samples
DNA libraries were constructed according to the manufacturer's instructions (Illumina; insert size 350 bp, read length 100 bp). The inventors used the same workflow as described previously to perform cluster generation, template hybridization, isothermal amplification, linearization, blocking and denaturation, and hybridization of the sequencing primers. For each sample, the inventors constructed one paired-end (PE) library with an insert size of 350 bp, followed by high-throughput sequencing to obtain about 30 million PE reads of length 2x100 bp. High-quality reads were obtained by filtering out, from the raw Illumina reads, low-quality reads with ambiguous "N" bases, adapter contamination and human DNA contamination, and by simultaneously trimming low-quality terminal bases from the reads.
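The read clean-up described above can be illustrated with a minimal sketch. It covers only the trimming of low-quality terminal bases and the removal of reads containing ambiguous "N" bases; adapter and human DNA filtering are not shown. The function names and the thresholds `min_q`, `max_n` and `min_len` are hypothetical choices made for illustration and are not the values used in the study.

```python
def trim_low_quality_tail(seq, quals, min_q=20):
    """Trim bases from the 3' end while their quality score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def clean_reads(reads, max_n=0, min_len=30, min_q=20):
    """Yield (sequence, qualities) for reads that survive trimming and filtering.

    All thresholds are illustrative assumptions, not the study's settings.
    """
    for seq, quals in reads:
        seq, quals = trim_low_quality_tail(seq, quals, min_q)
        if len(seq) < min_len:
            continue  # too short after trimming
        if seq.upper().count("N") > max_n:
            continue  # too many ambiguous bases
        yield seq, quals


if __name__ == "__main__":
    example = [
        ("ACGTACGT", [30, 30, 30, 30, 30, 30, 5, 5]),  # tail gets trimmed
        ("ACNTACGT", [30] * 8),                        # dropped: contains N
    ]
    for seq, quals in clean_reads(example, min_len=4):
        print(seq, quals)
```

In a paired-end setting both mates would normally be kept or dropped together; that bookkeeping is omitted here for brevity.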
On the Illumina HiSeq 2000 platform, the inventors generated about 5.9 Gb of high-quality, clean fecal microbiome sequencing data per sample from the 158 samples (78 cases and 80 controls) (Table 1).
Table 1. Summary of metagenomic data. The fourth column reports the results of the Wilcoxon rank-sum test.
1.4 Metagenomic data processing and analysis
1.4.1 Read alignment
The inventors used the updated human gut gene catalog established by Li, J. et al., An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. (2014) (incorporated herein by reference), and aligned the high-quality reads to this updated catalog with an identity threshold of >= 90%. The average read mapping rate is shown in Table 1. This mapping rate is close to that of the samples in Li, J. et al., 2014, supra, indicating that it is sufficient for further study. After read alignment, the inventors derived gene profiles (9.9 million genes) from the alignment results using the same method as Li, J. et al., 2014, supra.
Taxonomic assignment of genes. The taxonomic assignment of genes was predicted using the in-house pipeline described in the published paper (Li, J. et al., 2014, supra).
1.4.2 Data profile construction
Gene profiles. Based on the read alignment results, the inventors calculated relative gene abundances using the same method as described in the published T2D paper (Qin et al., 2012, supra).
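The text refers to Qin et al. for the abundance calculation without restating it. As a hedged illustration only, the sketch below assumes the commonly used length-normalized definition: the number of reads mapped to a gene is divided by the gene length, and the result is divided by the sum over all genes in the sample. The function name and the dictionary inputs are hypothetical.

```python
def relative_gene_abundance(read_counts, gene_lengths):
    """Length-normalized relative abundance per gene for one sample.

    ASSUMPTION: abundance_i = (reads_i / length_i) / sum_j (reads_j / length_j).
    This is an illustrative definition, not quoted from the source text.
    """
    copy_numbers = {g: read_counts[g] / gene_lengths[g] for g in read_counts}
    total = sum(copy_numbers.values())
    if total == 0:
        return {g: 0.0 for g in copy_numbers}
    return {g: b / total for g, b in copy_numbers.items()}


if __name__ == "__main__":
    counts = {"geneA": 10, "geneB": 20}      # hypothetical mapped-read counts
    lengths = {"geneA": 100, "geneB": 100}   # hypothetical gene lengths in bp
    print(relative_gene_abundance(counts, lengths))
```

Under this definition the abundances of one sample sum to 1, which is consistent with the very small per-gene values (on the order of 1E-06) reported for individual markers later in the document.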
1.4.3 Analysis of factors influencing the gut microbiota gene profiles
Based on the gene profiles, the inventors used permutational multivariate analysis of variance (PERMANOVA) to assess the effect of 6 clinical parameters (age, sex, height, weight, BMI and obesity status). The analysis was performed using the method implemented in the "vegan" package in R, and permuted p-values were obtained from 10,000 permutations. The inventors also corrected for multiple testing using the Benjamini-Hochberg method via "p.adjust" in R, to obtain a q-value for each test. PERMANOVA identified three factors significantly associated with the gut microbiota (based on the gene profiles) (q < 0.05, Table 2). The analysis showed that weight, BMI and obesity status were strongly associated, indicating that disease (obesity) status is a major determinant of gut microbiota composition.
Table 2. PERMANOVA based on Euclidean distances of the gene profiles. The analysis tested, at q < 0.05, whether the clinical parameters and obesity status have a significant effect on the gut microbiota.
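The Benjamini-Hochberg correction mentioned in section 1.4.3 can be sketched in a few lines. The study used R's `p.adjust`; the Python function below, with the hypothetical name `bh_adjust`, is an illustrative re-implementation of the same step-up procedure and not the code used by the inventors.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    n = len(p_values)
    # Walk from the largest p-value to the smallest, keeping a running minimum.
    order = sorted(range(n), key=lambda k: p_values[k], reverse=True)
    q_values = [0.0] * n
    running_min = 1.0
    for position, k in enumerate(order):
        rank = n - position  # rank of p_values[k] in ascending order
        running_min = min(running_min, p_values[k] * n / rank)
        q_values[k] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical p-values for three tested clinical parameters.
    print(bh_adjust([0.001, 0.5, 0.04]))
```

A parameter would then be reported as significant when its q-value is below the chosen level, 0.05 in Table 2.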
1.4.4 Identification of obesity-associated markers
Identification of obesity-associated genes. To determine associations between the metagenomic profiles and obesity, a two-tailed Wilcoxon rank-sum test was applied to the profiles of 9,879,897 high-frequency genes (genes present in fewer than 10 of the 158 samples were removed). This yielded 396,100 gene markers enriched in either cases or controls, with p < 0.01 and FDR = 3.8% (Fig. 1).
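The two steps in this paragraph, the presence filter and the two-tailed rank-sum test, can be sketched as follows. This is a minimal illustration under stated assumptions: the p-value uses the normal approximation with a tie correction and no continuity correction, so it will not match every statistical package digit for digit. The function names are hypothetical.

```python
import math


def present_in_enough_samples(abundances, min_samples=10):
    """Keep a gene only if it is detected (abundance > 0) in at least min_samples samples."""
    return sum(1 for a in abundances if a > 0) >= min_samples


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic of x, p-value).

    Uses midranks for ties and a tie-corrected normal approximation.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    n = n1 + n2
    pooled = sorted((v, 0 if k < n1 else 1) for k, v in enumerate(list(x) + list(y)))
    rank_sum_x = 0.0
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end < n and pooled[end][0] == pooled[start][0]:
            end += 1
        t = end - start                   # size of this group of tied values
        midrank = (start + 1 + end) / 2   # average of ranks start+1 .. end
        rank_sum_x += midrank * sum(1 for _, grp in pooled[start:end] if grp == 0)
        tie_term += t ** 3 - t
        start = end
    u = rank_sum_x - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    cases = [1, 2, 3, 4, 5]        # hypothetical abundances in cases
    controls = [6, 7, 8, 9, 10]    # hypothetical abundances in controls
    print(rank_sum_test(cases, controls))
```

Because most genes are absent from many samples, ties at zero are common in this kind of data, which is why the tie correction is included in the sketch.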
Estimation of the false discovery rate (FDR). Instead of a sequential p-value rejection method, the inventors applied the "q-value" method proposed in a previous study to estimate the FDR (Storey, J. D. A direct approach to false discovery rates. Journal of the Royal Statistical Society 64, 479-498 (2002), incorporated herein by reference).
Receiver operating characteristic (ROC) analysis. The inventors used ROC analysis to assess the performance of the metagenomic-marker-based obesity classification, and used the "pROC" package in R to draw the ROC curves.
1.5 Method for selecting the 9 optimal markers from the biomarkers (minimum redundancy-maximum relevance (mRMR) feature selection framework)
To determine the optimal gene set, the minimum redundancy-maximum relevance (mRMR) feature selection method (for details see Peng, H., Long, F. & Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27, 1226-1238, doi:10.1109/TPAMI.2005.159 (2005), incorporated herein by reference) was used to select from all obesity-associated gene markers. The inventors performed an incremental search using the "sideChannelAttack" package in R and found 158 sequential marker sets. The error rate of each sequential set was estimated by leave-one-out cross-validation (LOOCV) with a linear discriminant classifier. The optimal marker set was the one with the lowest error rate. In this study, feature selection was performed on the set of 396,100 obesity-associated gene markers. Because it was computationally infeasible to run mRMR on all the genes, the inventors derived a statistically non-redundant gene set. First, 8010 genes were pre-selected (q < 0.0005). The inventors then applied the mRMR feature selection method and identified an optimal set of 9 gene biomarkers strongly associated with obesity for obesity classification (lowest error rate, Fig. 2), shown in Tables 3 and 4. The gene ids are from the published reference gene catalog of Li, J. et al., 2014, supra.
Table 3. Enrichment information of the 9 optimal gene markers
Gene id | Enrichment (1 = obesity, 0 = control)
64552 | 0
1208989 | 0
2285506 | 0
3104115 | 1
3581202 | 0
5042942 | 1
5243950 | 1
6793200 | 1
7860042 | 1
Table 4. SEQ ID NOs of the 9 optimal gene markers
Gene id | SEQ ID NO:
Gene_id:7860042 | 1
Gene_id:1208989 | 2
Gene_id:5243950 | 3
Gene_id:5042942 | 4
Gene_id:3104115 | 5
Gene_id:2285506 | 6
Gene_id:3581202 | 7
Gene_id:64552 | 8
Gene_id:6793200 | 9
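The incremental mRMR search of section 1.5 can be illustrated with a small greedy sketch. It assumes features that have already been discretized and uses the difference criterion (relevance minus mean redundancy), both of which are simplifying assumptions; the study itself used the "sideChannelAttack" R package on gene abundance profiles. The function names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[va] * count_b[vb]))
        for (va, vb), c in count_ab.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR: repeatedly add the feature with the highest
    relevance to the labels minus mean redundancy with the features already chosen."""
    relevance = [mutual_information(f, labels) for f in features]
    selected = []
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for idx, f in enumerate(features):
            if idx in selected:
                continue
            if selected:
                redundancy = sum(
                    mutual_information(f, features[s]) for s in selected
                ) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[idx] - redundancy
            if score > best_score:
                best, best_score = idx, score
        selected.append(best)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]    # hypothetical case/control labels
    f0 = [0, 0, 0, 1, 1, 1, 1, 1]        # informative feature
    f1 = list(f0)                        # exact duplicate of f0 (fully redundant)
    f2 = [0, 1, 0, 0, 1, 1, 0, 1]        # weaker but complementary feature
    print(mrmr_select([f0, f1, f2], labels, 2))
```

In this toy example the duplicate feature is passed over in favor of the weaker but non-redundant one, which is the behavior that motivates mRMR. The prefixes of the resulting ranking correspond to the sequential marker sets whose error rates the text evaluates by LOOCV.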
1.6 Gut health index (obesity index)
To exploit the potential of gut microbiota for disease classification, the inventors developed a disease classification system based on the 9 gene markers defined by the inventors. To allow an intuitive evaluation of disease risk based on these gut microbial gene markers, the inventors calculated a gut health index (obesity index).
To evaluate the effect of the gut metagenome on obesity, the inventors defined and calculated a gut health index for each individual based on the 9 gene markers selected as described above. For each individual sample, the gut health index of sample j, denoted I_j, was calculated according to the following formula:
A_ij is the relative abundance of marker i in sample j;
N is the subset of the selected biomarkers related to the abnormal condition that are enriched in patients (i.e. the subset of the 9 selected gene markers that are enriched in obesity),
M is the subset of the selected biomarkers related to the abnormal condition that are enriched in controls (i.e. the subset of the 9 selected gene markers that are enriched in controls),
|N| and |M| are the numbers (sizes) of biomarkers in these two subsets, respectively, where |N| is 5 and |M| is 4,
wherein an index greater than the cutoff value indicates that the subject has obesity or is at risk of developing obesity.
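The formula itself is not reproduced in this text, so the following is only a sketch of one form that is consistent with the definitions above: the mean of the transformed abundances over N minus the mean over M. Both the difference-of-means form and the default transform `log10(a + 1e-20)` are assumptions made for illustration and should not be read as the formula of the disclosure. The subsets N and M are taken from the enrichment column of Table 3; the function name is hypothetical.

```python
import math

# Enrichment labels from Table 3 (1 = enriched in obesity, 0 = enriched in controls).
ENRICHMENT = {
    64552: 0, 1208989: 0, 2285506: 0, 3104115: 1, 3581202: 0,
    5042942: 1, 5243950: 1, 6793200: 1, 7860042: 1,
}


def gut_health_index(abundance, enrichment=ENRICHMENT, transform=None):
    """Mean transformed abundance over N minus mean transformed abundance over M.

    ASSUMPTION: the difference-of-means form and the default transform
    log10(a + 1e-20) are illustrative; the source formula is not shown in the text.
    """
    if transform is None:
        def transform(a):
            return math.log10(a + 1e-20)
    n_set = [g for g, e in enrichment.items() if e == 1]  # enriched in patients
    m_set = [g for g, e in enrichment.items() if e == 0]  # enriched in controls
    mean_n = sum(transform(abundance.get(g, 0.0)) for g in n_set) / len(n_set)
    mean_m = sum(transform(abundance.get(g, 0.0)) for g in m_set) / len(m_set)
    return mean_n - mean_m


if __name__ == "__main__":
    # Hypothetical abundances: obesity-enriched markers present, control-enriched absent.
    sample = {3104115: 2e-6, 5042942: 1e-6, 5243950: 3e-6, 6793200: 1e-6, 7860042: 4e-6}
    index = gut_health_index(sample)
    print(index, index > 0.03519)
```

Whatever the exact transform, the structure explains why |N| and |M| appear in the definitions: they normalize the two sums so that subsets of different size (5 and 4 here) contribute on the same scale.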
1.7 Gut-microbiota-based obesity classification
The inventors calculated the obesity index from the relative abundances of the 9 gene markers; the index clearly separated the obesity patients' microbiomes from the control microbiomes (Table 5). Using the obesity index to classify the 78 obesity patient microbiomes against the 80 control microbiomes gave an area under the receiver operating characteristic (ROC) curve of 0.9763 (Fig. 3). At the optimal index cutoff of 0.03519, the true positive rate (TPR) was 0.9487, the false positive rate (FPR) was 0.1, and the error rate was 8.23% (13/158), indicating that the 9 gene markers can be used to classify obese individuals accurately.
Table 5. Calculated gut health index of the 158 samples (obesity patients and non-obese controls)
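The rates quoted in section 1.7 follow from comparing each sample's index with the cutoff. A minimal sketch, with a hypothetical function name and hypothetical index values, is given below; it treats an index strictly greater than the cutoff as a positive (obese) call, as stated in the definitions, and expects at least one case and one control.

```python
def classification_rates(indices, is_case, cutoff):
    """Return (TPR, FPR, error rate) when index > cutoff is called positive."""
    tp = fp = fn = tn = 0
    for value, case in zip(indices, is_case):
        positive = value > cutoff
        if case and positive:
            tp += 1
        elif case:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn)            # fraction of cases called positive
    fpr = fp / (fp + tn)            # fraction of controls called positive
    error = (fp + fn) / (tp + fp + fn + tn)
    return tpr, fpr, error


if __name__ == "__main__":
    indices = [2.0, 1.0, -0.5, 0.5, -1.0, -2.0]         # hypothetical index values
    is_case = [True, True, True, False, False, False]   # hypothetical labels
    print(classification_rates(indices, is_case, cutoff=0.03519))
```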
Example 2. Validation of the 9 gene biomarkers in 42 samples (test set)
The inventors validated the discriminatory power of the obesity classifier using another, independent study cohort (17 obesity patients and 25 non-obese controls, collected at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine).
DNA was extracted from each sample and a DNA library was constructed, followed by high-throughput sequencing as described in Example 1. The inventors calculated the gene abundance profiles of these samples using the same method as described in Qin et al., 2012, supra. The gene relative abundance of each marker set forth in SEQ ID NO: 1-9 was then determined. The index of each sample was then calculated by the following formula:
A_ij is the relative abundance of marker i in sample j;
N is the subset of the selected biomarkers related to the abnormal condition that are enriched in patients (i.e. the subset of the 9 selected gene markers that are enriched in obesity),
M is the subset of the selected biomarkers related to the abnormal condition that are enriched in controls (i.e. the subset of the 9 selected gene markers that are enriched in controls),
|N| and |M| are the numbers of biomarkers in these two subsets, respectively, where |N| is 5 and |M| is 4,
wherein an index greater than the cutoff value indicates that the subject has obesity or is at risk of developing obesity.
Table 6 shows the calculated index of each sample, and Table 7 shows the relative abundances of the relevant genes for a representative sample, DB78A. In this evaluation, at the cutoff of 0.03519 (the optimal index cutoff from the 158 samples above), the error rate was 21.42% (9/42), confirming that the 9 gene markers can classify obese individuals. Most obesity patients (16/17) were correctly diagnosed as obese. In addition, the ROC curve of the test set was drawn from the obesity index of the test set, giving AUC = 0.9024 (Fig. 4). At the optimal cutoff of 0.1337, the true positive rate (TPR) was 0.9412 and the false positive rate (FPR) was 0.24.
Table 6. Calculated gut health index of the 42 samples
Table 7. Gene relative abundances of sample DB78A
Gene id | DB78A (calculated gene relative abundance) | Enrichment (1 = obesity, 0 = control)
64552 | 0 | 0
1208989 | 0 | 0
2285506 | 1.46332E-06 | 0
3104115 | 3.47323E-06 | 1
3581202 | 0 | 0
5042942 | 0 | 1
5243950 | 5.26732E-06 | 1
6793200 | 1.06787E-06 | 1
7860042 | 0 | 1
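The area under the ROC curve reported for the test set summarizes how well the index ranks cases above controls. The study drew its curves with the "pROC" R package; the sketch below is an illustrative pairwise computation, with a hypothetical function name and hypothetical index values, that gives the same quantity as the area under the empirical ROC curve.

```python
def roc_auc(case_indices, control_indices):
    """AUC as the fraction of (case, control) pairs in which the case has the
    higher index; ties count as half a pair."""
    wins = 0.0
    for c in case_indices:
        for d in control_indices:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(case_indices) * len(control_indices))


if __name__ == "__main__":
    cases = [2.0, 1.0, -0.5]      # hypothetical indices of obesity patients
    controls = [0.5, -1.0, -2.0]  # hypothetical indices of controls
    print(roc_auc(cases, controls))
```

An AUC of 1.0 means every case ranks above every control, and 0.5 corresponds to a ranking no better than chance. The pairwise loop is quadratic in the number of samples, which is harmless for cohorts of the size described here.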
Example 3. Validation of the 9 gene biomarkers in 22 samples (test set)
The inventors validated the discriminatory power of the obesity classifier using a further 22 samples (Table 8), comprising 9 case samples and 13 control samples (5 samples taken 1 month after surgery and 8 samples taken 3 months after surgery), also collected at Ruijin Hospital, Shanghai Jiao Tong University School of Medicine. Cases are pre-operative samples; controls are samples taken 1 month and 3 months after surgery.
Table 8. Information on the 22 samples
* Before: before surgery; 1-M: one month after surgery; 3-M: three months after surgery.
DNA was extracted from each sample and a DNA library was constructed, followed by high-throughput sequencing as described in Example 1. The inventors calculated the gene abundance profiles of these samples using the same method as described in Qin et al., 2012, supra. The gene relative abundance of each marker set forth in SEQ ID NO: 1-9 was then determined. The index of each sample was then calculated by the following formula:
A_ij is the relative abundance of marker i in sample j;
N is the subset of the selected biomarkers related to the abnormal condition that are enriched in patients (i.e. the subset of the 9 selected gene markers that are enriched in obesity),
M is the subset of the selected biomarkers related to the abnormal condition that are enriched in controls (i.e. the subset of the 9 selected gene markers that are enriched in controls),
|N| and |M| are the numbers of biomarkers in these two subsets, respectively, where |N| is 5 and |M| is 4,
wherein an index greater than the cutoff value indicates that the subject has obesity or is at risk of developing obesity.
Table 9 shows the calculated index of each sample, and Table 10 shows the relative abundances of the relevant genes for a representative sample, DB126. In this evaluation, at the cutoff of 0.03519 (the optimal index cutoff from the 158 samples above), the error rate was 22.72% (5/22), confirming that the 9 gene markers can classify obese individuals. Most obesity patients (8/9) were correctly diagnosed as obese. In addition, the ROC curve of the test set was drawn from the obesity index of the test set, giving AUC = 0.8462 (Fig. 5). At the optimal cutoff of 0.9695, the true positive rate (TPR) was 0.6667 and the false positive rate (FPR) was 0.07692.
Table 9. Calculated gut health index of the 22 samples
Table 10. Gene relative abundances of sample DB126
Gene id | DB126 (calculated gene relative abundance) | Enrichment (1 = obesity, 0 = control)
64552 | 0 | 0
1208989 | 0 | 0
2285506 | 7.99701E-08 | 0
3104115 | 6.25943E-05 | 1
3581202 | 0 | 0
5042942 | 7.19308E-08 | 1
5243950 | 5.97579E-07 | 1
6793200 | 0 | 1
7860042 | 1.52752E-07 | 1
Thus, the inventors identified and validated a set of 9 markers from 396,100 obesity-associated markers by the minimum redundancy-maximum relevance (mRMR) feature selection method. The inventors also established a gut health index that assesses the risk of obesity based on these 9 gut microbial gene markers.
Although illustrative embodiments have been shown and described, those skilled in the art will understand that the above embodiments are not to be construed as limiting the disclosure, and that changes, substitutions and modifications may be made to the embodiments without departing from the spirit, principles and scope of the present invention.