WO2012068949A1 - Classification method based on the metagenome 16s high variable region v6 - Google Patents

Classification method based on the metagenome 16s high variable region v6

Info

Publication number
WO2012068949A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
sequence
sample
hypervariable region
region
Prior art date
Application number
PCT/CN2011/081858
Other languages
French (fr)
Chinese (zh)
Inventor
刘晓
周宏伟
栗东芳
Original Assignee
深圳华大基因科技有限公司
深圳华大基因研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大基因科技有限公司, 深圳华大基因研究院
Publication of WO2012068949A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria

Definitions

  • The invention relates to the technical field of microbial gene sequencing analysis, and in particular to a classification method based on the metagenomic 16S hypervariable region V6.
  • 16S rRNA: 16S ribosomal ribonucleic acid (RNA).
  • The 16S rRNA is about 1500 bp (base pairs) in size. The information it contains reflects evolutionary relationships across the biological world, it is easy to work with, and it is suitable for taxa at all levels; in metagenomic studies, 16S sequencing is therefore the most commonly used method for clustering and classification.
  • Traditional metagenomic sequencing uses Sanger technology to sequence the 16S rRNA gene (16S rDNA), giving read lengths of at least 500 bp. Reads of this length are long enough to assemble a nearly complete 16S rDNA sequence, which helps identify the species of origin of each sequence accurately, but the approach is prone to chimeras, and the sequencing is relatively costly, time-consuming, and laborious.
  • One technical problem to be solved by the present invention is to provide a classification method based on the metagenomic 16S hypervariable region V6. By performing Solexa sequencing on the V6 hypervariable region of 16S and systematically classifying the resulting short variable-region sequences, species abundance information can be reflected accurately at low cost.
  • One aspect of the present invention provides a classification method based on the metagenomic 16S hypervariable region V6, the method comprising: extracting the deoxyribonucleic acid (DNA) of the microorganisms; performing polymerase chain reaction (PCR) with primers on the hypervariable region V6 of the metagenomic 16S ribosomal DNA (rDNA), and adding a tag sequence to each sample; mixing the PCR products of the different samples; constructing a library from the mixed PCR products by the Solexa library construction method; performing pair-end sequencing on the library of the hypervariable region V6 with a Solexa sequencing tool to obtain sequencing reads; screening the reads to filter out low-quality reads; assembling the full-length sequence of the hypervariable region V6 from the reads using contig (overlap) relationships; assigning the reads to the corresponding samples by the tag sequence; and performing classification analysis on the reads, so as to achieve high-throughput classification of the microbial population by hypervariable-region sequencing.
  • the method further comprises: performing sampling of the microbial population prior to extracting the deoxyribonucleic acid DNA of the microorganism.
  • The method further comprises: after classifying the sequencing reads, classifying sequences at different levels of difference into operational taxonomic units (OTUs); and, based on the tag sequences and the sequencing reads, performing diversity analysis with the Chao1 richness estimator and the abundance-based coverage estimator (ACE).
  • In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, after the Chao1 and ACE diversity analysis, a diversity analysis plot and a relative abundance plot of the microbial population are output.
  • The step "performing polymerase chain reaction (PCR) with primers on the hypervariable region V6 of the metagenomic 16S ribosomal DNA (rDNA), and adding a tag sequence to each sample" further includes: using primers 967f: CNACGCGAAGAACCTTANC (SEQ ID NO: 1) and 1406R: GACAGCCATGCANCACCT (SEQ ID NO: 2) to amplify fragments of the 16S hypervariable region V6 of the bacteria in the microbial population; and adding a tag sequence to each microbial sample, the tag sequence being placed in front of the 5' end of primer 967f, with the bases GT added between the tag sequence and primer 967f.
  • The method further comprises: performing PCR on the hypervariable region V6 of archaea, using primers 958AR: AATTGGANTCAACGCCGG (SEQ ID NO: 3) and 1048AR: CGRCGGCCATGCACCWC (SEQ ID NO: 4).
  • The step of "mixing the PCR products of different samples" further comprises: quantifying the concentration of the PCR products of the 16S hypervariable region V6; and mixing them together in equimolar amounts.
  • The step of "constructing a library from the mixed PCR products by the Solexa library construction method" further comprises: purifying the mixed product, repairing the ends, adding an "A" base at the 3' end, and ligating pair-end sequencing adaptors; purifying the sample after adaptor ligation; dissolving the purified sample and using it as the template for PCR amplification; and gel-purifying the PCR product.
  • Low-quality data: a sequence that does not match the nearest primer, a sequence shorter than 50 base pairs, or a sequence that has at least one ambiguous base.
  • The step of "assembling the full-length sequence of the hypervariable region V6 using contig relationships" further comprises: using the first 75, 70, 65, 60 and 55 bases from the 5' ends of the hypervariable region V6 PCR product to find overlaps for assembly; wherein the criterion for assembly is that a pair of sequences has an overlap longer than 5 base pairs with less than 10% mismatch in the overlap region.
  • The step of "classifying the sequencing reads" further comprises: aligning the reads assigned to each sample against an existing 16S V6 database, so as to achieve high-throughput classification analysis of the microbial population by hypervariable-region tag sequencing and thereby study the structure of the microbial population.
  • The classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention performs high-throughput sequencing of the microbial population in a specific environment by combining tagging technology with Solexa technology. This reduces labor and economic cost and makes it easier to study the relationship between microbial community structure and health, environmental factors, and the like.
  • FIG. 1 is a flowchart of a classification method based on the metagenomic 16S hypervariable region V6 according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of another embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention.
  • FIG. 3 shows the number of OTUs for microbial populations in different environments at difference levels of 0.03 and 0.3.
  • FIG. 1 is a flowchart of a classification method based on the metagenomic 16S hypervariable region V6 according to an embodiment of the present invention.
  • the classification method flow 100 based on the metagenomic 16S hypervariable region V6 includes:
  • Step 102: extract the DNA of the microorganisms.
  • Step 104: perform PCR with primers on the hypervariable region V6 of the metagenomic 16S ribosomal DNA (rDNA) (the region is flanked at both ends by conserved regions of 20 base pairs (bp), and the variable region in between is about 60-90 bp), and add a tag sequence to each sample.
  • Primers 967f: CNACGCGAAGAACCTTANC (SEQ ID NO: 1) and 1406R: GACAGCCATGCANCACCT (SEQ ID NO: 2) are used to amplify fragments of the 16S hypervariable region V6 of the bacteria in the microbial population; each microbial sample is given a tag sequence, which is placed in front of the 5' end of primer 967f, and the bases GT (i.e., G and T) are added between the tag sequence and primer 967f.
  • The tag sequence may be a barcode sequence of 8 bases. Tags are designed to follow certain rules, such as rules on base content and number of bases, so that individual sequencing errors and the like do not cause one tag to be confused with another; see, for example, the methods and principles disclosed in U.S. Patent Application No. US20100267043A1.
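The tag layout described above (8-base tag, then GT, then primer 967f) and the requirement that tags stay distinguishable can be sketched as follows. This is a minimal illustration, not the patent's design procedure: using Hamming distance as the separation measure is an assumption, and the function names are hypothetical. The tag list is taken from the sample table later in this document.

```python
from itertools import combinations

PRIMER_967F = "CNACGCGAAGAACCTTANC"  # SEQ ID NO: 1, as given in the text
LINKER = "GT"                        # bases placed between tag and primer

# 8-base tags listed in the sample table of this document.
TAGS = [
    "AATTGCCG", "TTAAGGCC", "TTGGTTCC", "AACCAACC", "CAGACAGA",
    "CAGTGAGA", "CATCTCGT", "GGTAGGAT", "GTGTAGAG",
]


def tagged_primer(tag: str) -> str:
    """Return tag + linker + forward primer, written 5' to 3'."""
    return tag + LINKER + PRIMER_967F


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_tag_distance(tags) -> int:
    """Smallest pairwise Hamming distance in a tag set (assumed criterion)."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


if __name__ == "__main__":
    print(tagged_primer(TAGS[0]))
    print("minimum pairwise distance:", min_tag_distance(TAGS))
```

A minimum distance of at least 2 means a single substitution error cannot turn one tag into another, although it may still produce an unrecognized tag.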
  • For archaea, primers 958AR: AATTGGANTCAACGCCGG (SEQ ID NO: 3) and 1048AR: CGRCGGCCATGCACCWC (SEQ ID NO: 4) are used.
  • Step 106 mixing the PCR products of different samples.
  • The concentration of each 16S hypervariable region V6 PCR product is quantified with a spectrophotometer (e.g., Nanodrop), and the products are then mixed together in equimolar amounts.
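Equimolar mixing can be worked through numerically. The sketch below converts a mass concentration into a molar concentration and derives the volume of each product that contributes the same number of molecules. It assumes double-stranded DNA with an average mass of about 660 g/mol per base pair, a common approximation that the source does not state; the function names and example values are hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (assumed approximation)


def molar_conc_nM(ng_per_ul: float, length_bp: int) -> float:
    """Convert ng/uL of a dsDNA amplicon to nM (numerically equal to fmol/uL)."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes_ul(samples: dict, fmol_each: float) -> dict:
    """Volume of each sample that contributes `fmol_each` femtomoles.

    `samples` maps a sample name to (concentration in ng/uL, amplicon length in bp).
    """
    return {
        name: fmol_each / molar_conc_nM(conc, length)
        for name, (conc, length) in samples.items()
    }


if __name__ == "__main__":
    # Hypothetical Nanodrop readings for two ~100 bp V6 amplicons.
    readings = {"sample_a": (6.6, 100), "sample_b": (13.2, 100)}
    for name, vol in pooling_volumes_ul(readings, fmol_each=100.0).items():
        print(f"{name}: {vol:.2f} uL")
```

A sample at twice the concentration contributes half the volume, which is the whole point of pooling by moles rather than by volume.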
  • Step 108: construct a library from the mixed PCR products by the Solexa library construction method.
  • The mixed product is purified with the QIAquick PCR Purification Kit (Qiagen), end-repaired (i.e., the sticky ends of all DNA duplexes are blunted by an enzymatic reaction), given an "A" base at the 3' end, and ligated to pair-end adaptors (Pair-end library preparation kit, Illumina); after adaptor ligation, the sample is purified; the purified sample is dissolved and used as the template for PCR amplification (12 cycles); and the PCR product is gel-purified with the QIAquick Gel Extraction Kit (Qiagen) (i.e., run on an electrophoresis gel, the gel excised at the DNA position, and the DNA purified with the kit).
  • Step 110: the library of the hypervariable region V6 is subjected to pair-end sequencing using a Solexa sequencing tool (such as the Illumina GA, Illumina GA2, Illumina HiSeq 2000, Illumina HiSeq 1000, etc.) to obtain the raw sequencing data.
  • sequencing is performed directly using the Illumina GA II (75 bp pair-end strategy).
  • the Illumina genome analyzer is a new generation of high-throughput sequencer with low-cost sequencing and high data readout.
  • Solexa sequencing costs one-tenth the cost of 454 sequencing with the same amount of sequencing.
  • The error rate is low (for example, a single-base sequencing error rate < 10^-5) and sequencing is unbiased, so for metagenomics the abundance information of the species can be faithfully reflected.
  • The sequencing data are screened to filter out low-quality reads.
  • A low-quality read is any of the following: a read that does not match the nearest primer, a read shorter than 50 bp, or a read that has at least one ambiguous base.
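The three filtering criteria above can be sketched as a single predicate. This is an illustration under stated assumptions: "matching the primer" is interpreted here as an exact match at a known offset in the read, with IUPAC degenerate codes in the primer (such as N, R, W) allowed to match any base they stand for; the source does not specify the matching rule, and the function names are hypothetical.

```python
# IUPAC nucleotide codes used to interpret degenerate primer positions.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def matches_primer(read: str, primer: str, offset: int = 0) -> bool:
    """True if the read carries the primer at `offset` (assumed matching rule)."""
    segment = read[offset:offset + len(primer)]
    if len(segment) < len(primer):
        return False
    return all(base in IUPAC.get(code, "") for base, code in zip(segment, primer))


def is_low_quality(read: str, primer: str, offset: int = 0, min_length: int = 50) -> bool:
    """Apply the three criteria: too short, ambiguous base, or primer mismatch."""
    if len(read) < min_length:
        return True
    if any(base not in "ACGT" for base in read):
        return True
    return not matches_primer(read, primer, offset)


if __name__ == "__main__":
    primer = "CNACGCGAAGAACCTTANC"  # 967f from the text
    # Hypothetical read: 8-base tag + GT + primer + 30 bases of insert.
    read = "AATTGCCG" + "GT" + "CAACGCGAAGAACCTTACC" + "A" * 30
    print(is_low_quality(read, primer, offset=10))
```

With a tag of 8 bases followed by GT, the primer would start at offset 10 in a forward read, which is why the example passes `offset=10`.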
  • Step 114: assemble the full-length sequence of the hypervariable region V6 using contig relationships. For example, the first 75, 70, 65, 60 or 55 bp from the 5' ends of the hypervariable region V6 PCR product are used to find overlaps for assembly; the assembly criterion may be that a pair of sequences has an overlap longer than 5 bp and less than 10% mismatch in the overlap region (i.e., more than 90% match).
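The overlap criterion in this step (overlap longer than 5 bp, less than 10% mismatch) can be sketched as a simple pair-merging routine. It is a minimal sketch, not the patent's implementation: it tries the longest candidate overlap first, and at mismatched positions it keeps the forward read's base, whereas the source says the base call should depend on the sequencing quality of both reads. Quality scores are omitted to keep the example short.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 6, max_mismatch: float = 0.10):
    """Merge a pair-end read pair into one sequence, or return None.

    `rev` is given as sequenced; it is reverse-complemented so that both reads
    are on the same strand. An overlap is accepted when it is at least
    `min_overlap` long (i.e. more than 5 bp) and its mismatch fraction is
    strictly below `max_mismatch`.
    """
    rc = reverse_complement(rev)
    for overlap in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        tail, head = fwd[-overlap:], rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches < max_mismatch * overlap:
            # Simplification: keep the forward read's bases in the overlap.
            return fwd + rc[overlap:]
    return None


if __name__ == "__main__":
    fragment = "AAAAACCCCCGGGGGTTTTTAAAAACCCCCGGGGGTTTTT"  # toy 40 bp amplicon
    forward = fragment[:30]
    reverse = reverse_complement(fragment[10:])
    print(merge_pair(forward, reverse) == fragment)
```

To mimic the 75, 70, 65, 60, 55 schedule in the text, the reads could be truncated to each of those lengths in turn before calling `merge_pair`, stopping at the first length that yields a merge.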
  • Step 116 assigning the sequencing sequence to the corresponding sample by the tag sequence.
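Assigning reads to samples by tag can be sketched as a dictionary lookup on the first bases of each read. The sketch assumes the tag occupies the first 8 bases of the read and must match exactly; error-tolerant matching is left out, and the function name is hypothetical. The example tags and sample names come from the sample table in this document.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_len=8):
    """Split reads by their leading tag.

    Returns (reads per sample with the tag removed, reads with an unknown tag).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        sample = tag_to_sample.get(read[:tag_len])
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[tag_len:])
    return dict(by_sample), unassigned


if __name__ == "__main__":
    tags = {"AATTGCCG": "Mangrove 1-a", "TTAAGGCC": "Mangrove 1-b"}
    reads = ["AATTGCCGGTCAAC", "TTAAGGCCGTCTAC", "GGGGGGGGGTCAAC"]
    assigned, unknown = demultiplex(reads, tags)
    print(assigned, unknown)
```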
  • Step 118: perform classification analysis on the sequencing reads to achieve high-throughput classification of the microbial population by hypervariable-region sequencing. For example, the reads assigned to each sample are aligned against the 16S V6 reference database refhvr-V6 using the GAST software, achieving high-throughput classification analysis of the microbial population by hypervariable-region tag sequencing, from which the structure of the microbial population can be studied.
  • The classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention compares sequences of the 16S hypervariable region against an rRNA database and classifies them on the basis of the best match.
  • The classification method provides information on the composition and diversity of the microbial population, and in classifying microorganisms and measuring their relative abundance it achieves the same technical effect as sequencing the full-length 16S; in addition, by using massively parallel sequencing, the invention can find more rare microbial species.
  • Although Solexa reads are only about 75 bp long, the platform has high throughput and produces a large amount of data, so the method is cost-effective for exploring changes in microbial community structure (including the rare biosphere).
  • Fig. 2 is a flow chart showing another embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention.
  • The classification method 200 based on the metagenomic 16S hypervariable region V6 includes steps 201, 202-218, 219, and 220, wherein steps 202-218 may respectively have the same or similar technical content as steps 102-118 shown in FIG. 1; for brevity, that content is not described again here.
  • Step 201: sample the microbial population. For example, sediment is taken from a body of water such as a lake as the sample.
  • Step 219: classify the sequences at different levels of difference into operational taxonomic units (OTUs).
  • Step 220: based on the tag sequences and the sequencing reads, perform Chao1 and ACE (abundance-based coverage estimator) diversity analysis using the Mothur and Canoco software.
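As an illustration of what the Chao1 analysis estimates, the sketch below computes the classic Chao1 richness estimate from per-OTU read counts: the observed number of OTUs plus a correction based on how many OTUs were seen exactly once (F1) and exactly twice (F2). This is a textbook form written for illustration and is not taken from the patent; the fallback used when no OTU is seen twice is one common convention, and software such as Mothur may use a different variant.

```python
def chao1(otu_counts) -> float:
    """Classic Chao1 richness estimate from a list of per-OTU read counts.

    S_chao1 = S_obs + F1^2 / (2 * F2) when F2 > 0,
    with the fallback S_obs + F1 * (F1 - 1) / 2 when F2 == 0 (assumed convention).
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + (f1 * f1) / (2.0 * f2)
    return s_obs + f1 * (f1 - 1) / 2.0


if __name__ == "__main__":
    # Hypothetical sample: six OTUs, three singletons and two doubletons.
    print(chao1([1, 1, 1, 2, 2, 5]))
```

Many singletons relative to doubletons push the estimate well above the observed count, which is how the estimator signals that rare species remain unsampled.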
  • The present invention provides a classification method based on the metagenomic 16S hypervariable region V6.
  • The 16S hypervariable region V6 reads produced by Solexa are short and do not contain enough evolutionary information to infer a systematic classification on their own.
  • The present invention therefore uses search software such as GAST and Mothur to align the reads of each sample against the 16S V6 region database refhvr_V6, achieving high-throughput classification analysis of the microbial population by hypervariable-region tag sequencing.
  • The sequencing technology used in the present invention is combined with tag sequences and greatly improves resolution: a single run on Solexa (Illumina) produces 100-fold more reads than 454. A good classification result can therefore be obtained by sequencing only the short 16S rRNA V6 region. Because the tagging technique is used and the sequenced length is short, more samples can be sequenced in a single lane (the chip of an Illumina high-throughput sequencer has 8 channels, each called a "lane"), which greatly reduces the sequencing cost per sample.
  • Step 1 Sampling the microbial population.
  • Step 2 Extract the DNA of the microbial sample.
  • Step 3 Use specific primers for PCR amplification and add a sequence tag to each sample.
  • Primers 967f: CNACGCGAAGAACCTTANC (SEQ ID NO: 1) and 1406R: GACAGCCATGCANCACCT (SEQ ID NO: 2) were used to amplify the 16S V6 region fragment of the bacteria in the microbial population. Since all the microbial samples need to be mixed and sequenced together, a tag sequence can be added to each sample. This can be a barcode sequence of 8 bases, placed in front of the 5' end of primer 967f. A linker "GT" is added between the tag sequence (barcode sequence) and primer 967f.
  • For archaea, primers 958AR: AATTGGANTCAACGCCGG (SEQ ID NO: 3) and 1048AR: CGRCGGCCATGCACCWC (SEQ ID NO: 4) can be used, with a barcode sequence and "GT" linker added to each microbial sample in the same way.
  • Step 4: mix the PCR products of the samples and build a library from the mixed PCR products using the optimized Solexa library construction method.
  • The concentration is quantified using a Nanodrop spectrophotometer, and the products are then mixed in equimolar amounts.
  • PCR products from a total of 65 samples, 52 bacterial V6 and 13 archaeal V6, are mixed together.
  • Step 5 Solexa sequencing. Specifically, sequencing can be performed directly with Illumina GA II according to the manufacturer's Illumina instructions (75 bp pair-end strategy, ie, 75-base double-end sequencing), as shown in Table 1.
  • Table 1 rows (sample; tag sequence; template; two count columns whose headers are not preserved in this text):
  • Mangrove 1-a; AATTGCCG; mangrove 1 sample DNA as template; 75,016; 26,436
  • Mangrove 1-b; TTAAGGCC; mangrove 1 sample DNA as template; 95,077; 28,337
  • Mangrove 1-c; TTGGTTCC; mangrove 1 sample DNA as template; 66,169; 20,403
  • The above sample DNA is amplified with bacterial V6 region primers. Shenzhen, Xianhu Botanical Garden:
  • Xianhu Botanical Garden 1-a; AACCAACC; Xianhu Botanical Garden 1 sample DNA as template; 100,127; 28,292
  • Xianhu Botanical Garden 1-b; CAGACAGA; Xianhu Botanical Garden 1 sample DNA as template; 70,732; 21,892
  • Xianhu Botanical Garden 1-c; CAGTGAGA; Xianhu Botanical Garden 1 sample DNA as template; 104,230; 29,007
  • Xianhu Botanical Garden 1-d; CATCTCGT; Xianhu Botanical Garden 1 sample DNA as template; 33,956; 10,280
  • Xianhu Botanical Garden 1-e; GGTAGGAT; Xianhu Botanical Garden 1 sample DNA as template; 37,598; 10,168
  • Xianhu Botanical Garden 1-f; GTGTAGAG; Xianhu Botanical Garden 1 sample DNA as template; 14,345; 6,106
  • Xianhu Bot
  • Step 7: assemble the full-length sequence of V6 using contig relationships.
  • The sequence of the hypervariable region V6 is assembled from the overlap region of the pair-end reads.
  • The average PCR product is 100 bp long, and each tag sequence is read for 75 bp from both ends. Because Solexa sequencing quality declines gradually toward the 3' end, the first 75, 70, 65, 60, and 55 bp from the 5' end can be used to find the overlap and assemble the full-length V6 sequence.
  • The criterion for joining a pair of sequences is an overlap longer than 5 bp with less than 10% mismatch in the overlap region.
  • The base call at a mismatched site depends on the sequencing quality of the two reads.
  • Step 8: assign each sequence to its sample by the barcode tag sequence.
  • Step 9: classify the microbial population in the sample. Specifically, the reads of each sample are aligned against the 16S V6 database refhvr-V6, and the degree of difference is then calculated using the GAST algorithm.
  • Step 10: perform OTU (operational taxonomic unit) classification, for example using the GAST-OTU strategy (that is, computing OTUs with the GAST algorithm) to group sequences into OTUs at different levels of difference.
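To make the idea of grouping sequences into OTUs at a given level of difference concrete, the sketch below uses a simple greedy centroid clustering. It is explicitly not the GAST-OTU strategy or Mothur's clustering: it is a swapped-in toy technique that assumes equal-length, already aligned sequences and uses the fraction of differing positions as the distance. All names are hypothetical.

```python
def difference(a: str, b: str) -> float:
    """Fraction of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(sequences, threshold=0.03):
    """Greedy centroid clustering (toy stand-in, not the GAST-OTU strategy).

    Each sequence joins the first OTU whose founding sequence is within
    `threshold`; otherwise it founds a new OTU.
    """
    otus = []  # each OTU is a list whose first element is its centroid
    for seq in sequences:
        for otu in otus:
            if difference(seq, otu[0]) <= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])
    return otus


if __name__ == "__main__":
    seqs = ["A" * 100, "A" * 98 + "CC", "C" * 100]
    print(len(cluster_otus(seqs, threshold=0.03)))
    print(len(cluster_otus(seqs, threshold=0.01)))
```

Raising the threshold merges more sequences into each OTU, so OTU counts reported at 0.03 are always at least as large as those reported at a looser level for the same data.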
  • More than 3.7 million tag sequences and 680,000 exact (i.e., perfectly matched) sequences were obtained and classified into OTUs with the software mothur (v.1.6.0), which can be downloaded from http://www.mothur.org/wiki/Main_Page.
  • Figure 3 shows the number of OTUs for microbial populations in different environments at difference levels of 0.03 and 0.3.
  • The rarefaction curves show the number of OTUs for the sediments of Beishan Reservoir 4, Xianhu Botanical Garden 1, and Symbolisha 8 at values of unique (the measure used to evaluate the degree of difference) of 0.03 and 0.3.
  • The sediments of Beishan Reservoir have the greatest species diversity and evenness, and the microbial diversity of the Symbolisha seawater sediments is the lowest.
  • The reservoir freshwater sediments show more distribution diversity than the other environments. About 27% of the sequences from reservoir freshwater sediments, 20% from Donghu Park sediments, and 17% from Symbolisha marine sediments had not previously been classified, indicating that freshwater environments contain more rare species yet to be discovered.
  • The classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention performs high-throughput sequencing of the microbial population in a specific environment using Solexa technology combined with a tagging technique.
  • In a single lane, we measured approximately 4 million 16S rRNA V6 tag sequences from 65 samples.
  • The numbers of distinct tag sequences are 257,001, 228,101, 144,295 and 137,997, with an estimated diversity of 1 million.
  • The sediments of Beishan Reservoir have the highest species diversity and evenness.
  • Classifying the microbial population by Solexa sequencing of the 16S rRNA V6 variable region is economical: it reduces labor and economic cost and makes it easier to study the relationship between microbial community structure and health, environmental factors, and the like.
  • The number of reads is higher than the number of 16S tag sequences reported in previous sequencing studies.
  • The 690,165 exact V6 tag sequences are approximately 630,000 more than those in the Ribosomal Database Project release 10.15 database.
  • The present invention provides a classification method based on the metagenomic 16S hypervariable region V6 that classifies the microorganisms in a sample using V6 hypervariable-region sequencing alone. The method gives very good results in classifying the microbial population and measuring its relative abundance, even where V6 region sequences differ from their nearest reference sequences. The results show that microbial species analysis based on the V6 variable region detects not only the dominant microorganisms but also more of the rare ones.
  • Sequencing the V6 variable region of SSU rRNA shows that microbial diversity goes well beyond the earlier phenotype-based classification of Bergey, and that microbial populations are far more complex than had been imagined. In addition, for exploring the diversity and relative abundance of microbial populations, massively parallel Solexa sequencing of V6 variable-region sequences has advantages over other techniques. Further research on variable-region sequencing has revealed many advantages over other sequencing approaches, for example in the relative level of microbial diversity captured, sequence length, homopolymer density, the ability to resolve taxa at the species level, and suitability for different amplification primers.
  • In the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, Solexa sequencing of the V6 variable region yields taxonomic and relative abundance values similar to those of conventional full-length SSU rRNA sequencing. Because the sequences are short, the same run provides more reads and more samples, identifies more microbes, and costs less per read than traditional full-length SSU rRNA sequencing.
  • Solexa sequencing will open up broader opportunities for variable-region sequencing of microbes, such as longer reads, the use of other variable regions, combinations of several variable regions, or greater sequencing depth.
  • The biggest advantage of variable-region tag sequencing is that it exploits massively parallel Solexa sequencing, which reaches a depth several orders of magnitude greater than before and facilitates the exploration of the broad diversity of microbial populations and of rare organisms.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed is a classification method for analyzing a microorganism population by using a Solexa sequencing platform to sequence the metagenome 16S rDNA high variable region V6.

Description

Classification method based on the metagenomic 16S hypervariable region V6

Technical field

The invention relates to the technical field of microbial gene sequencing analysis, and in particular to a classification method based on the metagenomic 16S hypervariable region V6.

Background
To study the types of microbial populations in a biological environment, conventional methods include: direct culture of microorganisms; denaturing gradient gel electrophoresis (DGGE); terminal restriction fragment length polymorphism (T-RFLP); fluorescence in situ hybridization (FISH); and polymerase chain reaction (PCR) targeting likely microbial species. These approaches, however, reveal only a small fraction of the microbial species in an environment. Metagenomic analysis, which studies the genomes of the microbial population in the environment directly, yields a more comprehensive catalogue of microbial species and supports subsequent research on, and application of, the microbial population.
Because the sequence of 16S rRNA (ribosomal ribonucleic acid) in prokaryotes is highly conserved, it can accurately indicate the phylogenetic relationships between bacteria. The 16S rRNA is about 1500 bp (base pairs) in size; the information it contains reflects evolutionary relationships across the biological world, it is easy to work with, and it is suitable for taxa at all levels. In metagenomic studies, 16S sequencing is therefore the most commonly used method for clustering and classification. Traditional metagenomic sequencing uses Sanger technology to sequence the 16S rRNA gene (16S rDNA), giving read lengths of at least 500 bp. Reads of this length are long enough to assemble a nearly complete 16S rDNA sequence, which helps identify the species of origin of each sequence accurately, but the approach is prone to chimeras, and the sequencing is relatively costly, time-consuming, and laborious.
With newly developed sequencing technologies and the gradual reduction in sequencing costs, metagenomic research has become increasingly practical; the technologies involved include pyrosequencing, Solexa, and others. A major challenge for these revolutionary technologies is that their reads are too short to sequence the full 16S rDNA of each individual organism, so the sequence information is not sufficient to classify microorganisms accurately.
In summary, providing a method for cluster analysis of microorganisms that is more accurate, convenient, fast, and low in cost has become a technical problem to be solved in the art.

Summary of the invention

One technical problem to be solved by the present invention is to provide a classification method based on the metagenomic 16S hypervariable region V6. By performing Solexa sequencing on the V6 hypervariable region of 16S and systematically classifying the resulting short variable-region sequences, species abundance information can be reflected accurately at low cost.
本发明的一个方面提供了一种基于宏基因组 16S 高可变区 V6 的分类方法, 该方法包括: 对于提取微生物的脱氧核糖核酸 DNA; 通过引物对宏基因组 16S核糖体脱氧核糖核酸 rDNA的高 可变区 V6进行聚合酶链式反应 PCR, 为每个样品加上标签序列; 把不同样品的 PCR 产物进行混合; 对混合后的 PCR 产物进行 Solexa建库法建库; 使用 Solexa测序工具对高可变区 V6的文库进 行双末端 pair-end测序, 得到测序序列 (reads ); 对测序序列进行 筛选, 以过滤掉低质量的测序序列; 基于测序序列利用重叠群的关 系对高可变区 V6 的全长序列进行组装; 通过标签序列把测序序列 分配到对应的样品上; 通过对测序序列进行分类分析, 以实现使用 高可变区的测序对微生物群体进行高通量的分类。  One aspect of the present invention provides a classification method based on the hypergenome 16S hypervariable region V6, the method comprising: deoxyribonucleic acid DNA for extracting microorganisms; and high primer 16G ribosomal deoxyribonucleic acid rDNA Transformer V6 performs polymerase chain reaction PCR, labeling each sample; mixing PCR products of different samples; performing Solexa library construction on the mixed PCR products; The library of the variable region V6 is subjected to double-end pair-end sequencing to obtain a sequencing sequence; the sequencing sequence is screened to filter out low-quality sequencing sequences; based on the sequencing sequence, the relationship of the contig is used to the high-variable region V6 The full-length sequence is assembled; the sequencing sequence is assigned to the corresponding sample by the tag sequence; and the sequencing sequence is analyzed to achieve high-throughput classification of the microbial population by sequencing using the high variable region.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the method further comprises: sampling the microbial community before extracting the DNA of the microorganisms.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the method further comprises: after classifying the sequencing reads, classifying sequences at different dissimilarity levels into operational taxonomic units (OTUs); and, based on the tag sequences and the sequencing reads, performing diversity analysis with the Chao1 and ACE (abundance-based coverage estimator) species-richness estimators.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, after the Chao1 and ACE diversity analysis is performed, a diversity analysis plot and a relative abundance plot of the microbial community are output.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the step of "performing PCR with primers on the hypervariable region V6 of the metagenomic 16S rDNA, and adding a tag sequence to each sample" further comprises: using primers 967f: CNACGCGAAGAACCTTANC (SEQ ID NO: 1) and 1406R: GACAGCCATGCANCACCT (SEQ ID NO: 2) to amplify the 16S hypervariable region V6 fragment of the bacteria in the microbial community; and adding a tag sequence to each microbial sample, the tag sequence being added in front of the 5' end of primer 967f, with the bases GT added between the tag sequence and primer 967f.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the method further comprises: for PCR of the hypervariable region V6 of archaea, using primers 958AR: AATTGGANTCAACGCCGG (SEQ ID NO: 3) and 1048AR: CGRCGGCCATGCACCWC (SEQ ID NO: 4).
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the step of "mixing the PCR products of the different samples" further comprises: quantifying the concentration of the PCR products of the 16S hypervariable region V6; and mixing them together in equimolar amounts.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the step of "constructing a library from the mixed PCR products by the Solexa library construction method" further comprises: purifying the mixed product, repairing the ends, adding a base A at the 3' end, and ligating paired-end sequencing adapters; purifying the sample after adapter ligation; dissolving the purified sample and using it as a template for PCR amplification; and gel-purifying the PCR product.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the low-quality data are: sequences that do not match the proximal primer, sequences shorter than 50 base pairs, or sequences containing at least one ambiguous base.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the step of "assembling the full-length sequence of the hypervariable region V6 using contig relationships" further comprises: using the first 75, 70, 65, 60, and 55 bases from the 5' end of the V6 PCR product reads for overlapping and assembly, wherein the assembly criterion is that a pair of sequences has an overlap longer than 5 base pairs and a mismatch rate below 10% within the overlap.
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the step of "classifying the sequencing reads" further comprises: aligning the reads assigned to each sample against an existing 16S V6 database, thereby achieving high-throughput classification of the microbial community by tag sequencing of a hypervariable region, and in turn studying the structure of the microbial community.
The classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention uses Solexa technology combined with tagging to perform high-throughput sequencing of the microbial community in a given environment. This reduces both labor and cost, and makes it practical to study how microbial community structure relates to health, environmental factors, and so on.
Brief Description of the Drawings
Figure 1 is a flowchart of a classification method based on the metagenomic 16S hypervariable region V6 according to an embodiment of the present invention;
Figure 2 is a flowchart of another embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention;
Figure 3 shows the numbers of OTUs for microbial communities from different environments at dissimilarity levels of 0.03 and 0.3.
Detailed Description
The present invention is described more fully below with reference to the accompanying drawings, which illustrate exemplary embodiments of the invention.
Figure 1 is a flowchart of a classification method based on the metagenomic 16S hypervariable region V6 according to an embodiment of the present invention.
As shown in Figure 1, flow 100 of the classification method based on the metagenomic 16S hypervariable region V6 includes:
Step 102: extract the DNA of the microorganisms. For example, extract microbial DNA from the sample sediment using the Ultraclean Soil DNA kit (MoBio, USA).
Step 104: perform PCR with primers on the hypervariable region V6 of the metagenomic 16S rDNA (the region is flanked on each side by a conserved region of about 20 base pairs (bp), and the variable region in the middle is about 60-90 bp), and add a tag sequence to each sample. For example, use primers 967f: CNACGCGAAGAACCTTANC (SEQ ID NO: 1) and 1406R: GACAGCCATGCANCACCT (SEQ ID NO: 2) to amplify the 16S hypervariable region V6 fragment of the bacteria in the microbial community, and add a tag sequence to each microbial sample. The tag sequence is added in front of the 5' end of primer 967f, and the bases GT (i.e., bases G and T) are added between the tag sequence and primer 967f. The tag sequence may be a barcode of 8 bases. Its design must follow certain rules, such as base content and the number of differing bases, so that tags are not confused with one another because of occasional sequencing errors; see, for example, the methods and principles disclosed in U.S. patent application US20100267043A1.
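The primer layout and the "number of differing bases" rule in step 104 can be illustrated with a short sketch. It builds the tagged forward primer (tag, then GT, then 967f) and reports the smallest pairwise Hamming distance in a tag set. The three tags are taken from Table 1; the function names are hypothetical, and the use of Hamming distance as the design check is an assumption, since the text does not state the exact rule.

```python
from itertools import combinations

FORWARD_PRIMER_967F = "CNACGCGAAGAACCTTANC"  # SEQ ID NO: 1, as given in the text
LINKER = "GT"                                # bases placed between tag and primer


def tagged_primer(tag):
    """Return tag + linker + primer, written 5' to 3'."""
    if len(tag) != 8 or set(tag) - set("ACGT"):
        raise ValueError("tag must consist of 8 unambiguous bases")
    return tag + LINKER + FORWARD_PRIMER_967F


def hamming(a, b):
    """Number of positions at which two equal-length tags differ."""
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags):
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


if __name__ == "__main__":
    tags = ["AACGGCAA", "AAGGAACC", "AATTGCGC"]  # first three tags of Table 1
    for t in tags:
        print(t, tagged_primer(t))
    print("minimum pairwise distance:", min_pairwise_distance(tags))
```

A larger minimum distance means more sequencing errors are needed before one tag can be mistaken for another.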
In one embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, primers 958AR: AATTGGANTCAACGCCGG (SEQ ID NO: 3) and 1048AR: CGRCGGCCATGCACCWC (SEQ ID NO: 4) are used for PCR of the hypervariable region V6 of archaea.
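The primers above contain degenerate positions (N, R, W). As a small illustration, the sketch below expands a degenerate primer into the concrete sequences it stands for, using the standard IUPAC nucleotide codes. The function name is hypothetical and the expansion is shown only to make the notation explicit; it is not part of the described method.

```python
from itertools import product

# Standard IUPAC nucleotide codes used by the primers in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


if __name__ == "__main__":
    for seq in expand_degenerate("CGRCGGCCATGCACCWC"):  # SEQ ID NO: 4
        print(seq)
```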
Step 106: mix the PCR products of the different samples. For example, quantify the concentration of the PCR products of the 16S hypervariable region V6 with a spectrophotometer (such as a Nanodrop), then mix them together in equimolar amounts.
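The equimolar mixing in step 106 comes down to a unit conversion. A minimal sketch, assuming double-stranded DNA with an average mass of about 660 g/mol per base pair (a common approximation): a concentration in ng/µL is converted to nM, and the volume that contains a chosen number of femtomoles is computed for each sample. The sample names, concentrations, amplicon length, and target amount are hypothetical values for illustration.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair, approximate value for dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes(concs_ng_per_ul, length_bp, fmol_per_sample):
    """Volume (uL) of each sample that contains fmol_per_sample femtomoles.

    1 nM equals 1 fmol/uL, so volume = amount / molarity.
    """
    return {
        name: fmol_per_sample / ng_per_ul_to_nM(conc, length_bp)
        for name, conc in concs_ng_per_ul.items()
    }


if __name__ == "__main__":
    # Hypothetical Nanodrop readings for two PCR products of about 100 bp.
    readings = {"sample_A": 6.6, "sample_B": 13.2}
    for name, vol in pooling_volumes(readings, 100, 100.0).items():
        print(f"{name}: {vol:.2f} uL")
```

A sample at twice the concentration contributes half the volume, so every sample adds the same number of molecules to the pool.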
Step 108: construct a library from the mixed PCR products by the Solexa library construction method. For example, purify the mixed product with the QIAquick PCR purification Kit (Qiagen); repair the ends (i.e., enzymatically convert the sticky ends of all double-stranded DNA into blunt ends); add an "A"; ligate the paired-end adapters (Pair-end library preparation kit, Illumina); purify the sample after adapter ligation; dissolve the purified sample and use it as a template for PCR amplification (12 cycles); and gel-purify the PCR product with the QIAquick gel extraction kit (Qiagen) (i.e., load and run the gel, excise the gel at the DNA position, and purify with the kit).
Step 110: perform paired-end sequencing of the V6 library with a Solexa sequencing instrument (such as the Illumina GA, Illumina GA2, Illumina HiSeq 2000, or Illumina HiSeq 1000) to obtain the raw read data. For example, sequence directly on an Illumina GA II (75 bp paired-end strategy). The Solexa sequencer (Illumina Genome Analyzer) is a new-generation high-throughput sequencer with low sequencing cost and high data output: for the same amount of sequence, Solexa sequencing costs one tenth as much as 454 sequencing. It also has a low error rate (e.g., a single-base sequencing error rate below 10⁻⁵) and sequences without bias, so for a metagenome it can faithfully reflect species abundance information.
Step 112: screen the read data to filter out low-quality reads. For example, a low-quality read is any of the following: a read that does not match the proximal primer, a read shorter than 50 bp, or a read containing at least one ambiguous base.
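The three filtering criteria of step 112 translate directly into a predicate. The sketch below is one possible reading, assuming that the tag and GT linker have already been removed so that each read starts with the primer, and that an ambiguous base is reported as "N". The function names are hypothetical; the primer and the 50 bp threshold come from the text.

```python
# Standard IUPAC nucleotide codes, needed because the primer is degenerate.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def matches_primer(read, primer):
    """True if the read begins with a sequence compatible with the primer."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer))


def passes_filter(read, primer, min_len=50):
    """Keep reads that match the primer, are long enough, and contain no N."""
    return (
        len(read) >= min_len
        and "N" not in read
        and matches_primer(read, primer)
    )


if __name__ == "__main__":
    primer_967f = "CNACGCGAAGAACCTTANC"
    reads = [
        "CAACGCGAAGAACCTTACC" + "A" * 40,   # passes all three checks
        "CAACGCGAAGAACCTTACC" + "A" * 10,   # shorter than 50 bases
        "GAACGCGAAGAACCTTACC" + "A" * 40,   # first base disagrees with primer
    ]
    kept = [r for r in reads if passes_filter(r, primer_967f)]
    print(f"kept {len(kept)} of {len(reads)} reads")
```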
Step 114: assemble the full-length sequence of the hypervariable region V6 using contig relationships. For example, use the first 75, 70, 65, 60, or 55 bp from the 5' end of the V6 PCR product reads for overlapping and assembly. The assembly criterion may be that a pair of sequences has an overlap longer than 5 bp and a mismatch rate below 10% within the overlap (i.e., more than 90% identity).
Step 116: assign the sequencing reads to their corresponding samples by the tag sequence.
Step 118: classify the sequencing reads, so as to achieve high-throughput classification of the microbial community by sequencing a hypervariable region. For example, use the GAST software to align the reads assigned to each sample against the 16S V6 database refhvr_V6, thereby achieving high-throughput classification of the microbial community by tag sequencing of a hypervariable region, and in turn studying the structure of the microbial community.
The classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention compares the sequences of the 16S hypervariable region with an rRNA database and classifies them on the basis of the best match. The method provides information on both the composition and the diversity of the microbial community, and it is technically equivalent to full-length 16S sequencing for classifying microorganisms and measuring the relative abundance of the community. In addition, the massively parallel sequencing used by the invention can detect more rare microbial species.
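Once every read in a sample has been given a taxon by best match, relative abundance is a matter of counting. A minimal sketch follows; the taxon labels and the function name are hypothetical, and the best-match step itself is not shown.

```python
from collections import Counter


def relative_abundance(assignments):
    """Fraction of reads assigned to each taxon within one sample."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical per-read taxon assignments for a single sample.
    sample = ["Proteobacteria", "Proteobacteria", "Proteobacteria", "Firmicutes"]
    for taxon, frac in relative_abundance(sample).items():
        print(f"{taxon}: {frac:.2%}")
```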
Furthermore, because Solexa has a read length of about 75 bp, high throughput, and large data output, the method is very cost-effective for exploring changes in the structure of microbial communities (including the rare biosphere).
Figure 2 is a flowchart of another embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention.
As shown in Figure 2, flow 200 of the classification method based on the metagenomic 16S hypervariable region V6 includes steps 201, 202 to 218, 219, and 220. Steps 202 to 218 (including steps 204, 206, and 208) may respectively carry out technical content identical or similar to that of steps 102 to 118 shown in Figure 1; for brevity, that content is not repeated here.
As shown in Figure 2, before step 202 ("extracting the DNA of the microorganisms"), step 201 is performed: sampling the microbial community. For example, sediment is collected from a body of water such as a lake as the sample.
After step 218 ("classifying the sequencing reads"), step 219 is performed: classifying sequences at different dissimilarity levels into operational taxonomic units (OTUs). For example, using Mothur v.1.6.0 (available at http://www.mothur.org/wiki/Main_Page), sequences at different dissimilarity levels are classified into OTUs with the GAST-OTU strategy.
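To make the idea of OTUs at a given dissimilarity level concrete, the sketch below groups sequences with a simple greedy scheme: each sequence joins the first existing OTU whose seed sequence is within the cutoff, otherwise it starts a new OTU. This is an illustrative stand-in, not the algorithm used by Mothur or the GAST-OTU strategy; the distance measure (edit distance divided by the longer length) and all names are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def dissimilarity(a, b):
    """Edit distance normalized by the length of the longer sequence."""
    return edit_distance(a, b) / max(len(a), len(b))


def greedy_otus(seqs, cutoff=0.03):
    """Assign each sequence to the first OTU whose seed is within the cutoff."""
    seeds, otus = [], []
    for s in seqs:
        for k, seed in enumerate(seeds):
            if dissimilarity(s, seed) <= cutoff:
                otus[k].append(s)
                break
        else:
            seeds.append(s)
            otus.append([s])
    return otus


if __name__ == "__main__":
    a = "ACGT" * 25
    b = a[:10] + "A" + a[11:50] + "T" + a[51:]  # two substitutions relative to a
    c = "T" * 100
    for cutoff in (0.03, 0.3):
        print(cutoff, len(greedy_otus([a, b, c], cutoff)), "OTUs")
```

With a larger cutoff, more sequences fall within reach of an existing seed, so the number of OTUs can only stay the same or decrease for a fixed input order.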
Step 220: based on the tag sequences and the sequencing reads, perform Chao1 and ACE (abundance-based coverage estimator) diversity analysis using the Mothur and Canoco software.
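As an illustration of the Chao1 richness estimator named in step 220, the sketch below computes the bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where S_obs is the number of observed OTUs, F1 the number of OTUs seen exactly once, and F2 the number seen exactly twice. Whether the software in the text applies exactly this variant is an assumption, the OTU counts are hypothetical, and ACE is not shown.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 estimate of OTU richness for one sample."""
    s_obs = sum(1 for c in otu_counts if c > 0)   # observed OTUs
    f1 = sum(1 for c in otu_counts if c == 1)     # singletons
    f2 = sum(1 for c in otu_counts if c == 2)     # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Hypothetical read counts per OTU in one sample.
    counts = [10, 5, 2, 2, 1, 1, 1]
    print("observed OTUs:", len(counts))
    print("Chao1 estimate:", chao1(counts))
```

Many singletons relative to doubletons push the estimate well above the observed count, reflecting OTUs that were probably present but not sampled.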
In the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, the V6 reads produced by Solexa are very short and do not contain enough evolutionary information to infer a phylogenetic classification. The invention nevertheless uses search software such as GAST and Mothur to align each sample's reads against the 16S V6 region database refhvr_V6, thereby achieving high-throughput classification of microbial communities by tag sequencing of a hypervariable region. In short, sequencing microbial samples with Solexa achieves a good balance among throughput, cost, and classification effectiveness. In addition, because the sequencing technique used here incorporates tag sequences, resolution is greatly improved: in a single run, Solexa (Illumina) can produce 100 times more reads than 454. A good classification result can therefore be obtained by sequencing only the short 16S rRNA V6 region. Moreover, because tagging is used and the sequenced length is relatively short, more samples can be loaded on a single lane (the flow cell of an Illumina high-throughput sequencer has 8 channels, each called a "lane"), which greatly reduces the sequencing cost per sample.
A specific embodiment of the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention is described in detail below.
Step 1: sample the microbial communities.
Specifically, sediment was collected from Shenzhen Beishan Reservoir, Shenzhen Xianhu Botanical Garden, the Shenzhen mangrove forest, Shenzhen Dameisha, Shenzhen Longgang River, a Shenzhen sewage treatment plant, and Shenzhen Donghu Park, for a total of 65 samples.
Step 2: extract the DNA of the microbial samples.
Specifically, all sediment DNA was extracted from fresh or deep-frozen sediment samples with the Ultraclean Soil DNA kit (MoBio, USA).
Step 3: perform PCR amplification with specific primers, and add a sequence tag to each sample.
Specifically, primers 967f: CNACGCGAAGAACCTTANC (SEQ ID NO: 1) and 1406R: GACAGCCATGCANCACCT (SEQ ID NO: 2) were used to amplify the 16S V6 fragment of the bacteria in the microbial community. Because all the microorganisms are to be pooled and sequenced together, a tag sequence can be added to each sample. The tag can be an 8-base error-correcting barcode; it is added in front of the 5' end of primer 967f, with a "GT" linker inserted between the tag (barcode) and primer 967f.
In addition, for PCR of the V6 region of archaea, primers 958AR: AATTGGANTCAACGCCGG (SEQ ID NO: 3) and 1048AR: CGRCGGCCATGCACCWC (SEQ ID NO: 4) can be used, after which the barcode and the "GT" linker are added to the microbial samples in the same way.
Step 4: mix the PCR products of the samples and construct a library from the mixed PCR products with the optimized Solexa library construction method.
Specifically, the barcoded PCR products of the 16S V6 region were quantified with a Nanodrop spectrophotometer and then mixed together in equimolar amounts. In this embodiment, the V6 PCR products of 52 bacterial samples and 13 archaeal samples, 65 samples in total, were mixed together.
The mixed products were purified with the QIAquick PCR purification Kit (Qiagen), end-repaired, given a base A at the 3' end, and ligated to paired-end sequencing adapters (Pair-end library preparation kit, Illumina). After adapter ligation, the sample was purified and dissolved in 30 μL of EB (elution buffer). An aliquot of this solution was then used as a template for PCR amplification (12 cycles), and the PCR product was gel-purified with the QIAquick gel extraction kit (Qiagen).
Step 5: Solexa sequencing. Specifically, sequencing can be performed directly on an Illumina GA II according to the manufacturer's (Illumina) instructions (75 bp paired-end strategy, i.e., paired-end sequencing with a read length of 75 bases), as shown in Table 1.
表 1 样品名 -标签序列 - Solexa reads  Table 1 Sample Name - Label Sequence - Solexa reads
总 reads 特异 样品名称 标签序列 备注  Total reads specific sample name tag sequence note
数 reads数 深圳- 北山水库样品以细菌 v6区引物扩增  Number of reads Shenzhen - Beishan Reservoir samples amplified by bacteria v6 region primers
北山水库— 1 AACGGCAA 76,019 30,309 北山水库— 2 AAGGAACC 71,051 24,714 北山水库— 3 AATTGCGC 60,441 23,558 北山水库— 4 ACAGACTC 83,156 32,553 北山水库— 5 ACTCAGAC 74,639 26,703 北山水库— 6 CACTACTC 84,390 31,436 北山水库— 7 CACTAGAC 77,874 28,372 北山水库— 8 CAGTGTCA 74,475 30,909 北山水库— 9 GGAAGCAT 38,817 15,630 北山水库— 10 GTAGCATC 82,994 30,159 北山水库— 11 GTCTTGAG 66,029 25,175 北山水库— 12 GTTCCATC 54,770 21,399 北山水库— 13 GTTCCTAC 47,365 19,466 北山水库— 14 TGCTCATC 55,725 21,762 北山水库— 15 TGGTTGCA 57,097 23,625 北山水库— 16 TTATCCGC 29,660 13,013 北山水库— 17 TTCGCCAT 36,927 15,879 北山水库— 18 TTGCGGTA 38,178 19,324 深圳-东湖公园及深圳-红树林样品以细菌 v6区引物扩增 Beishan Reservoir — 1 AACGGCAA 76,019 30,309 Beishan Reservoir — 2 AAGGAACC 71,051 24,714 Beishan Reservoir — 3 AATTGCGC 60,441 23,558 Beishan Reservoir — 4 ACAGACTC 83,156 32,553 Beishan Reservoir — 5 ACTCAGAC 74,639 26,703 Beishan Reservoir — 6 CACTACTC 84,390 31,436 Beishan Reservoir — 7 CACTAGAC 77,874 28,372 Beishan Reservoir — 8 CAGTGTCA 74,475 30,909 Beishan Reservoir — 9 GGAAGCAT 38,817 15,630 Beishan Reservoir — 10 GTAGCATC 82,994 30,159 Beishan Reservoir — 11 GTCTTGAG 66,029 25,175 Beishan Reservoir — 12 GTTCCATC 54,770 21,399 Beishan Reservoir — 13 GTTCCTAC 47,365 19,466 Beishan Reservoir — 14 TGCTCATC 55,725 21,762 Beishan Reservoir — 15 TGGTTGCA 57,097 23,625 Beishan Reservoir — 16 TTATCCGC 29,660 13,013 Beishan Reservoir — 17 TTCGCCAT 36,927 15,879 Beishan Reservoir — 18 TTGCGGTA 38,178 19,324 Shenzhen-Donghu Park and Shenzhen-Mangrove samples are amplified by bacterial v6 region primers
东湖公园 1 ACGATCGT 取样于东湖公园草坪 85,018 28,387 东湖公园 2 CAAGTGCT 取样于东湖公园草坪 85,926 26,677 东湖公园 3 TGTGAGAG 取样于东湖公园草坪 74,697 23,953 红树林 1 TGCTCTAC 取样于红树林树根周围 57,215 20,566 红树林 2 GTTGGATC 取样于红树林树根周围 96,113 31,700 深圳-仙湖植物园、 深圳-龙岗河及深圳 -污水处理厂样品以细菌 v6区引物扩增 仙湖植物园 1 CAGACTCA 取样于仙湖植物园花丛 79,151 25,651 仙湖植物园 2 CCATGGAT 取样于仙湖植物园花丛 81,335 26,120 仙湖植物园 3 TTGCTACC 取样于仙湖植物园花丛 65,935 21,413 龙岗河 1 GTACACGT 取样于龙岗河淤泥 91,359 27,729 龙岗河 2 GTAGCTAC 取样于龙岗河淤泥 75,548 23,583 污水处理厂 1 GTCAGTCA 取样于污水处理厂废水入口 70,169 23,649 污水处理厂 2 GTGTACTG 取样于污水处理厂废水入口 72,584 21,994 以上深圳-红树林样品 DNA 以细菌 v6区引物扩增 Donghu Park 1 ACGATCGT Sampling at Donghu Park Lawn 85,018 28,387 Donghu Park 2 CAAGTGCT Sampling at Donghu Park Lawn 85,926 26,677 Donghu Park 3 TGTGAGAG Sampling at Donghu Park Lawn 74,697 23,953 Mangrove 1 TGCTCTAC Sampling around mangrove roots 57,215 20,566 Mangrove 2 GTTGGATC Sampling around the roots of mangrove trees 96,113 31,700 Shenzhen-Xianhu Botanical Garden, Shenzhen-Longgang River and Shenzhen-Sewage Treatment Plant samples were amplified by bacteria v6 region primers Xianhu Botanical Garden 1 CAGACTCA Sampling at Xianhu Botanical Garden Flowers 79,151 25,651 Xianhu Botanical Garden 2 CCATGGAT Sampling at Xianhu Botanical Garden, 81,335 26,120 Xianhu Botanical Garden 3 TTGCTACC Sampling at Xianhu Botanical Garden 65,935 21,413 Longgang River 1 GTACACGT Sampling at Longgang River Sludge 91,359 27,729 Longgang River 2 GTAGCTAC Sampling at Longgang River Sludge 75,548 23,583 Sewage Treatment Plant 1 GTCAGTCA Sampling at wastewater treatment plant wastewater inlet 70,169 23,649 Sewage treatment plant 2 GTGTACTG Sampling at wastewater treatment plant wastewater inlet 72,584 21,994 Above Shenzhen-Mangrove sample DNA Amplification with bacterial v6 region primer
红树林 1 a AATTGCCG 以 红树林 1样品 DNA作为模板 75,016 26,436 红树林 1 b TTAAGGCC 以 红树林 1样品 DNA作为模板 95,077 28,337 红树林 1 c TTGGTTCC 以 红树林 1样品 DNA作为模板 66,169 20,403 以上深圳-仙湖植物园样品 DNA以细菌 v6区引物扩增 Mangrove 1 a AATTGCCG with mangrove 1 sample DNA as template 75,016 26,436 Mangrove 1 b TTAAGGCC with mangrove 1 sample DNA as template 95,077 28,337 Mangrove 1 c TTGGTTCC with mangrove 1 sample DNA as template 66,169 20,403 above Shenzhen-Xianhu Botanical Garden Sample DNA is amplified by bacterial v6 region primers
仙湖植物园 1— a AACCAACC 以仙湖植物园 1 样品 DNA作为模板 100,127 28,292 仙湖植物园 1— b CAGACAGA 以仙湖植物园 1 样品 DNA作为模板 70,732 21,892 仙湖植物园 1— c CAGTGAGA 以仙湖植物园 1 样品 DNA作为模板 104,230 29,007 仙湖植物园 1— d CATCTCGT 以仙湖植物园 1 样品 DNA作为模板 33,956 10,280 仙湖植物园 1— e GGTAGGAT 以仙湖植物园 1 样品 DNA作为模板 37,598 10,168 仙湖植物园 1— f GTGTAGAG 以仙湖植物园 1 样品 DNA作为模板 14,345 6,106 仙湖植物园 1— g GTTGGTAC 以仙湖植物园 1 样品 DNA作为模板 33,356 12,099 仙湖植物园 1— h TGGAGTAG 以仙湖植物园 1 样品 DNA作为模板 107,998 29,426 仙湖植物园 1— i TGTGACTG 以仙湖植物园 1 样品 DNA作为模板 214,344 54,573 仙湖植物园 1J GTCAGAGA 以仙湖植物园 1 样品 DNA作为模板 44,731 11,576 仙湖植物园 1— k GTCTTCTG 以仙湖植物园 1 样品 DNA作为模板 44,912 11,195 深圳-大梅沙海岸样品以细菌 v6区引物扩增 Xianhu Botanical Garden 1— a AACCAACC Take the sample DNA of Xianhu Botanical Garden as template 100,127 28,292 Xianhu Botanical Garden 1— b CAGACAGA Take Xianhu Botanical Garden 1 sample DNA as template 70,732 21,892 Xianhu Botanical Garden 1—c CAGTGAGA to Xianhu Botanical Garden 1 Sample DNA As a template 104, 230 29,007 Xianhu Botanical Garden 1 - d CATCTCGT with Xianhu Botanical Garden 1 sample DNA as template 33,956 10,280 Xianhu Botanical Garden 1 - e GGTAGGAT with Xianhu Botanical Garden 1 sample DNA as template 37,598 10,168 Xianhu Botanical Garden 1 - f GTGTAGAG to Xianhu Botanical Garden 1 sample DNA as template 14,345 6,106 Xianhu Botanical Garden 1— g GTTGGTAC with Xianhu Botanical Garden 1 sample DNA as template 33,356 12,099 Xianhu Botanical Garden 1—h TGGAGTAG with Xianhu Botanical Garden 1 sample DNA as template 107,998 29,426 Xianhu Botanical Garden 1-i TGTGACTG takes the DNA of Xianhu Botanical Garden 1 as template 214,344 54,573 Xianhu Botanical Garden 1J GTCAGAGA takes Xiannan Botanical Garden 1 sample DNA as template 44,731 11,576 Xianhu Botanical Garden 1-k GTCTTCTG takes Xianhu Botanical Garden 1 sample DNA as template 44,912 11,195 Shenzhen-Da Bacterial samples coast sand v6 region primer
Dameisha-1    AACGCGTT    40,151     11,803
Dameisha-2    AAGCTTGC    77,856     24,389
Dameisha-3    ACAGAGAC    103,043    28,314
Dameisha-4    ACCTGATG    93,710     26,667
Dameisha-5    ACTCACTC    74,530     21,144
Dameisha-6    CAACGATC    99,482     30,508
Dameisha-7    CAACGTAC    104,545    29,222
Dameisha-8    CAAGTCGT    113,590    26,376
Samples from Shenzhen Donghu Park, Shenzhen Mangrove, Shenzhen Xianhu Botanical Garden, Shenzhen Longgang River and the Shenzhen sewage treatment plant were amplified with the bacterial V6 region primers:
Donghu Park 1a               GGATGGTA    Donghu Park 1 sample DNA as template               16,274    3,450
Donghu Park 3a               TGGTTCGA    Donghu Park 3 sample DNA as template               14,042    3,346
Donghu Park 3b               ACTGCAGT    Donghu Park 3 sample DNA as template               7,949     2,008
Mangrove 1a                  TGGAGATG    Mangrove 1 sample DNA as template                  27,490    6,453
Mangrove 2a                  TGCAACGT    Mangrove 2 sample DNA as template                  10,502    2,654
Xianhu Botanical Garden 1a   TGACTCTC    Xianhu Botanical Garden 1 sample DNA as template   49,096    11,083
Xianhu Botanical Garden 2a   GGATTACC    Xianhu Botanical Garden 2 sample DNA as template   7,625     2,527
Xianhu Botanical Garden 3a   CCAACCTT    Xianhu Botanical Garden 3 sample DNA as template   2,447     959
Longgang River 1a            TGACTGAC    Longgang River 1 sample DNA as template            32,108    8,401
Longgang River 2a            ACCACATG    Longgang River 2 sample DNA as template            4,919     1,852
Sewage treatment plant 1a    ACCACTAG    Sewage treatment plant 1 sample DNA as template    5,765     2,078
Sewage treatment plant 2a    ACCTGTAG    Sewage treatment plant 2 sample DNA as template    7,140     2,367
Mangrove 2b                  GGAACGTA    Mangrove 2 sample DNA as template                  4,454     1,398
Step 6. After the raw sequencing data are obtained, low-quality data are filtered out. Specifically, sequences that do not match the adjacent primer, sequences shorter than 50 bp, and sequences containing one or more differing bases are removed, as shown in Table 2.
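The three filtering criteria of Step 6 can be illustrated with a minimal Python sketch. It is not the implementation used in this embodiment. The function names, the `offset` parameter and the reading of a "differing base" as an uncalled base (`N`) are assumptions made for illustration only.

```python
# Degenerate (IUPAC) codes that occur in the primers quoted in this document.
# Assumption: no other degenerate codes are needed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "W": "AT"}

def primer_matches(read, primer, offset=0):
    """Return True if the primer is found at `offset` in the read,
    honoring degenerate bases in the primer."""
    window = read[offset:offset + len(primer)]
    if len(window) < len(primer):
        return False
    return all(base in IUPAC[p] for base, p in zip(window, primer))

def passes_filter(read, primer, offset=0, min_len=50):
    """Apply the three Step 6 criteria to a single read."""
    if not primer_matches(read, primer, offset):
        return False      # read does not match the adjacent primer
    if len(read) < min_len:
        return False      # read is shorter than 50 bp
    if "N" in read:
        return False      # assumption: a 'differing base' is an uncalled base
    return True
```

When the barcode and linker are still attached to the read, `offset` would be set to their combined length so that the primer comparison starts at the right position.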
Table 2. Metagenomic data
Figure imgf000014_0001
*These data were obtained by searching for contigs with a length of 60 bp (allowing 0 or 1 mismatch).
Step 7. Assemble the full-length V6 sequence using the overlap (contig) relationships.
Specifically, the sequence of hypervariable region V6 is assembled from the overlapping region of the paired-end reads. The average length of the PCR product is 100 bp, and each tag sequence is read for 75 bp from each end. Because the quality of Solexa sequencing declines gradually toward the 3' end, the first 75, 70, 65, 60 and 55 bp from the 5' end can be used for the overlap in order to assemble the full-length V6 sequence. A pair of reads is joined if the overlap is longer than 5 bp and the mismatch rate within the overlap is below 10%. The base call at a mismatched position is decided by the sequencing quality of the two ends.
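The joining rule of Step 7 can be sketched as follows. This is an illustrative sketch, not the assembler used in this embodiment. The function names, the longest-overlap-first search and the representation of qualities as integer lists are assumptions. The thresholds (overlap longer than 5 bp, mismatch rate below 10%, prefixes of 75, 70, 65, 60 and 55 bp) are taken from the text.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(fwd, fwd_q, rev, rev_q, min_overlap=6, max_mismatch=0.10):
    """Join a forward read with the reverse complement of its mate.

    fwd_q and rev_q are per-base quality scores (lists of ints).
    Returns the merged sequence, or None if no overlap is acceptable.
    """
    rc, rc_q = revcomp(rev), rev_q[::-1]
    # Assumption: try the longest candidate overlap first.
    for ov in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        a, b = fwd[-ov:], rc[:ov]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches / ov < max_mismatch:
            # At each overlap position keep the base with the higher quality.
            joined = "".join(
                x if qx >= qy else y
                for x, y, qx, qy in zip(a, b, fwd_q[-ov:], rc_q[:ov])
            )
            return fwd[:-ov] + joined + rc[ov:]
    return None

def assemble(fwd, fwd_q, rev, rev_q, lengths=(75, 70, 65, 60, 55)):
    """Try progressively shorter 5' prefixes of both reads until they join."""
    for n in lengths:
        merged = merge_pair(fwd[:n], fwd_q[:n], rev[:n], rev_q[:n])
        if merged is not None:
            return merged
    return None
```

For a 100 bp product read for 75 bp from each end, the two reads overlap by 50 bp, so the first attempt with the full 75 bp prefixes is normally sufficient. The shorter prefixes discard the lower-quality 3' bases when the first attempt fails.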
Step 8. Assign the sequencing reads to their corresponding samples by barcode tag sequence.
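A minimal sketch of this assignment is shown below. It assumes that the 8 bp barcode sits at the very start of the read, is followed by the GT linker described in claim 5, and must match exactly. The barcode table is a three-row excerpt of the sample list above. The function name and the return format are hypothetical.

```python
# Excerpt of the barcode table from this document (sample names as listed above).
BARCODES = {
    "AACGCGTT": "Dameisha-1",
    "AAGCTTGC": "Dameisha-2",
    "ACAGAGAC": "Dameisha-3",
}

def assign_sample(read, barcodes=BARCODES, linker="GT"):
    """Return (sample, read_without_barcode_and_linker), or None if unassigned.

    Assumption: exact barcode match at the 5' end, followed by the linker.
    """
    for barcode, sample in barcodes.items():
        prefix = barcode + linker
        if read.startswith(prefix):
            return sample, read[len(prefix):]
    return None
```

Reads for which `assign_sample` returns `None` would simply be left out of the per-sample analysis in this sketch.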
Step 9. Classify the microbial populations in the samples. Specifically, the sequencing reads of each sample are aligned against the 16S V6 database refhvr_V6, and the dissimilarity is then calculated with the GAST algorithm.
Step 10. Perform OTU (operational taxonomic unit) classification, for example by using the GAST-OTU strategy (that is, using the GAST algorithm to compute OTUs) to group sequences of different dissimilarity levels into OTUs. In this embodiment, more than 3.7 million tag sequences and 680,000 exact (perfectly matching) sequences were obtained and assigned to OTUs with the software mothur (v.1.6.0), which can be downloaded from http://www.mothur.org/wiki/Main_Page.
Step 11. Data analysis.
Specifically, the mothur and Canoco (v4.5) software is used to perform Chao1, ACE (abundance-based coverage estimator) and other diversity analyses, as shown in Tables 3 and 4. This yields diversity analysis plots, relative abundance plots and similar outputs for the microbial populations.
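To make the Chao1 estimate concrete, the sketch below computes the bias-corrected Chao1 richness from a list of per-OTU read counts. It is an illustration only and does not reproduce the mothur implementation used in this embodiment. The function name and the input format are assumptions.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate.

    otu_counts: iterable with the number of reads assigned to each OTU.
    S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of OTUs seen exactly once and twice.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

For example, `chao1([1, 1, 1, 2, 2, 5, 10])` has 7 observed OTUs, 3 singletons and 2 doubletons, giving 7 + 3·2/(2·3) = 8.0. The ACE estimator follows the same idea but draws on all rare OTUs rather than only singletons and doubletons.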
Diversity assessment in specific environments
Figure imgf000015_0001
*These reads come from 60 bp contigs (allowing 0 or 1 mismatch).
When exact V6 tag sequences are used for classification, the ACE and Chao1 estimates reveal extremely rich species diversity in the specific environments. Our data also support the earlier view that each gram of soil contains millions of bacteria. A complete Solexa run combined with barcode tagging can generate 100 million tag sequences, which will make it increasingly practical to explore environmental bacterial diversity by sequencing.
Analysis of common genera and highly abundant genera in specific sediments
Figure imgf000016_0001
Figure imgf000017_0001
*Citation counts were taken from Google Scholar (2009.11.18); #NA means undetermined or not found by Google Scholar.
Figure 3 shows the number of OTUs of the microbial populations in different environments at dissimilarity levels of 0.03 and 0.3.
As shown in Figure 3, the rarefaction curves give the number of OTUs for the sediments of Beishan Reservoir 4, Xianhu Botanical Garden 1 and Dameisha 8, computed with unique (an algorithm for evaluating dissimilarity) at dissimilarity levels of 0.03 and 0.3. The Beishan Reservoir sediment has the greatest species diversity and evenness, and the Dameisha seawater sediment has the lowest microbial diversity. In the class-level taxonomic structure, the freshwater reservoir sediments show a more diverse distribution than the other environments. About 27% of the sequences from the freshwater reservoir sediments, 20% from the Donghu Park sediments and 17% from the Dameisha marine sediments had not previously been taxonomically defined, which indicates that freshwater environments hold more unexplored rare species.
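A rarefaction curve of the kind described for Figure 3 plots the expected number of OTUs against the number of reads sampled. The sketch below uses the standard analytical (hypergeometric) expectation instead of repeated random subsampling. It is an illustration under that assumption and not the procedure that produced Figure 3. The function names are hypothetical.

```python
from math import comb

def rarefy(otu_counts, n):
    """Expected number of OTUs in a random subsample of n reads.

    E[S_n] = sum over OTUs of 1 - C(N - N_i, n) / C(N, n),
    where N is the total read count and N_i the count of OTU i.
    """
    counts = [c for c in otu_counts if c > 0]
    total_reads = sum(counts)
    if not 0 <= n <= total_reads:
        raise ValueError("n must lie between 0 and the total number of reads")
    all_subsamples = comb(total_reads, n)
    return sum(1 - comb(total_reads - c, n) / all_subsamples for c in counts)

def rarefaction_curve(otu_counts, step=1):
    """List of (n, expected OTUs) points from 0 reads up to the full sample."""
    total_reads = sum(otu_counts)
    return [(n, rarefy(otu_counts, n)) for n in range(0, total_reads + 1, step)]
```

A curve that is still rising steeply at the full sample size indicates that deeper sequencing would reveal further OTUs, whereas a flattening curve suggests that most of the diversity has been captured.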
The classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention uses Solexa technology combined with tagging to perform high-throughput sequencing of microbial populations in specific environments. In a single lane we sequenced about 4 million 16S rRNA V6 tag sequences from 65 samples. In the Beishan Reservoir, Donghu Park, Mangrove and Dameisha seawater sediment environments, the numbers of distinct tag sequences were 257,001, 228,101, 144,295 and 137,997 respectively, and the estimated diversity reached 1 million. Among these, the Beishan Reservoir sediment has the highest species diversity and evenness. Classifying microbial populations by Solexa sequencing of the 16S rRNA V6 variable region is therefore economical: it reduces both manual labor and cost, which makes it easy and feasible to study the relationship between microbial community structure and health, environmental factors and so on. In addition, both the total number of reads and the number of reads with 0 mismatches are higher than previously reported numbers of sequenced 16S tag sequences. The 690,165 exact V6 tag sequences are about 630,000 more than those in the Ribosomal Database Project release 10.15 database.
With reference to the foregoing exemplary description, those skilled in the art will clearly see that the present invention has the following advantages. The classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention classifies the microorganisms in a sample by sequencing only hypervariable region V6. The method performs well in classifying microbial populations and measuring their relative abundance, even when the V6 sequences differ somewhat from their nearest reference sequences. The results show that analyzing microbial species by sequencing the V6 variable region detects not only the dominant microorganisms but also many more rare ones. Sequencing the V6 variable region of SSU rRNA shows that microbial diversity is not confined to the earlier phenotype-based Bergey's taxonomy, and that microbial populations are far more complex than imagined. In addition, for exploring the diversity and relative abundance of microbial populations, massively parallel Solexa sequencing of V6 variable region sequences has many advantages over other techniques. Further study of variable region sequencing shows that it has many advantages over other sequencing approaches, such as the relative level of microbial diversity, sequence length, homopolymer density, the ability to resolve taxa to the species level, and adaptability to different amplification primers.
In the classification method based on the metagenomic 16S hypervariable region V6 provided by the present invention, Solexa sequencing of the V6 variable region produces taxonomic assignments and relative abundance values similar to those of conventional full-length SSU rRNA sequencing. Because its sequences are shorter, however, a single run provides reads for more samples, identifies more microorganisms, and costs less per read than conventional full-length SSU rRNA sequencing. As the technology advances and yields more reads and longer sequences, Solexa sequencing will offer broader opportunities for classifying microorganisms by variable region sequencing, such as longer reads, the use of other variable regions, combinations of several variable regions, or greater sequencing depth. The greatest advantage of variable region tag sequencing is that it exploits massively parallel Solexa sequencing: it reaches a sequencing depth and breadth several orders of magnitude greater than was previously achieved, which advances the exploration of the vast diversity of microbial populations and the rare biosphere.
The description of the present invention is given for the purposes of illustration and description and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The functional modules described herein, and the way they are divided, serve only to illustrate the idea of the invention. Following the teaching of the invention and the needs of the actual application, those skilled in the art may freely change the division of the functional modules and their construction to achieve the same functions. The embodiments were chosen and described in order to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and design various embodiments, with various modifications, suited to a particular use.

Claims

1. A classification method based on the metagenomic 16S hypervariable region V6, characterized in that the method comprises:
for extracted microbial deoxyribonucleic acid (DNA), performing polymerase chain reaction (PCR) on hypervariable region V6 of the metagenomic 16S ribosomal DNA (rDNA) with primers, and adding a tag sequence to each sample;
mixing the PCR products of the different samples;
constructing a library from the mixed PCR products by the Solexa library construction method;
performing paired-end sequencing of the hypervariable region V6 library with a Solexa sequencing tool to obtain sequencing reads;
screening the sequencing reads to filter out low-quality sequencing reads;
assembling the full-length sequence of hypervariable region V6 from the sequencing reads using their overlap (contig) relationships;
assigning the sequencing reads to the corresponding samples by tag sequence; and
classifying the sequencing reads, thereby achieving high-throughput classification of the microbial population by sequencing of the hypervariable region.
2. The method according to claim 1, characterized in that the method further comprises: sampling the microbial population before extracting the microbial DNA.
3. The method according to claim 1, characterized in that the method further comprises: after classifying the sequencing reads, grouping sequencing reads of different dissimilarity levels into operational taxonomic units (OTUs); and
performing, based on the tag sequences and the sequencing reads, the Chao1 population diversity estimate and the ACE (abundance-based coverage estimator) diversity analysis.
4. The method according to claim 3, characterized in that, after the Chao1 population diversity estimate and the ACE (abundance-based coverage estimator) diversity analysis are performed, a diversity analysis plot and a relative abundance plot of the microbial population are output.
5. The method according to claim 1, characterized in that performing PCR on hypervariable region V6 of the metagenomic 16S rDNA with primers and adding a tag sequence to each sample comprises:
amplifying the 16S hypervariable V6 region fragment of the bacteria in the microbial population with primers 967f: CNACGCGAAGAACCTTANC and 1406R: GACAGCCATGCANCACCT; and
adding a tag sequence to each microbial sample, wherein the tag sequence is placed in front of the 5' end of primer 967f and the bases GT are added between the tag sequence and primer 967f.
6. The method according to claim 5, characterized in that the method further comprises: for PCR of hypervariable region V6 of archaea, using primers 958AR: AATTGGANTCAACGCCGG and 1048AR: CGRCGGCCATGCACCWC.
7. The method according to claim 1, characterized in that mixing the PCR products of the different samples comprises:
quantifying the concentration of the PCR products of the 16S hypervariable region V6; and mixing them together in equimolar amounts.
8. The method according to claim 1, characterized in that constructing a library from the mixed PCR products by the Solexa library construction method comprises:
purifying the mixed product, repairing the ends, adding base A at the 3' end, and ligating paired-end sequencing adapters;
purifying the sample after the adapters have been ligated;
dissolving the purified sample and using it as the template for PCR amplification; and gel-purifying the PCR product.
9. The method according to claim 1, characterized in that the low-quality sequencing reads comprise: sequencing reads that do not match the adjacent primer, sequencing reads shorter than 50 base pairs, or sequencing reads having at least one differing base.
10. The method according to claim 1, characterized in that assembling the full-length sequence of hypervariable region V6 from the sequencing reads using their overlap (contig) relationships comprises: using the first 75, 70, 65, 60 and 55 base pairs from the 5' end of the hypervariable region V6 PCR product for overlapping and assembly; wherein the assembly criterion is that a pair of sequencing reads has an overlap longer than 5 base pairs and a mismatch rate of less than 10% in the overlap region.
11. The method according to claim 1, characterized in that classifying the sequencing reads comprises:
aligning the sequencing reads assigned to the corresponding samples against an existing 16S V6 database, so as to achieve high-throughput classification of the microbial population by tag sequencing of the hypervariable region and thereby study the structure of the microbial population.
PCT/CN2011/081858 2010-11-24 2011-11-07 Classification method based on the metagenome 16s high variable region v6 WO2012068949A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010557119.0 2010-11-24
CN2010105571190A CN102477460A (en) 2010-11-24 2010-11-24 Method for performing sequencing and cluster analysis on V6 hypervariable region of metagenomic 16S rDNA

Publications (1)

Publication Number Publication Date
WO2012068949A1 true WO2012068949A1 (en) 2012-05-31

Family

ID=46090244

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/081858 WO2012068949A1 (en) 2010-11-24 2011-11-07 Classification method based on the metagenome 16s high variable region v6

Country Status (2)

Country Link
CN (1) CN102477460A (en)
WO (1) WO2012068949A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105637099B (en) * 2013-08-23 2020-05-19 深圳华大智造科技有限公司 Long fragment de novo assembly using short reads
CN104484558B (en) * 2014-12-08 2018-04-24 深圳华大基因科技服务有限公司 The analysis report automatic generation method and system of biological information project
CN105279391A (en) * 2015-09-06 2016-01-27 苏州协云和创生物科技有限公司 Metagenome 16S rRNA high-throughput sequencing data processing and analysis process control method
CN105296620B (en) * 2015-10-26 2019-01-15 上海市内分泌代谢病研究所 The macro genome signature of enteron aisle is as diabetes B acarbose therapeutic efficacy screening mark
CN107038349B (en) * 2016-02-03 2020-03-31 深圳华大生命科学研究院 Method and apparatus for determining pre-rearrangement V/J gene sequence
CN106055924B (en) 2016-05-19 2019-02-01 完美(中国)有限公司 Microbiological manipulations taxon is determining and sequence assists isolated method and system
CN106021987B (en) * 2016-05-24 2019-04-09 人和未来生物科技(长沙)有限公司 Ultralow frequency mutating molecule label clustering clustering algorithm
CN106775998A (en) * 2016-11-09 2017-05-31 上海派森诺生物科技股份有限公司 High flux 16S sequencing data automatic processing methods
CN106480213A (en) * 2016-11-30 2017-03-08 江西中烟工业有限责任公司 The grand gene order-checking of ageing tobacco leaf surface microorganism and authentication method
CN107292124A (en) * 2017-06-25 2017-10-24 广东国盛医学科技有限公司 Grand genome manipulation taxon recognition methods based on layering pivot deep learning
CN110111843B (en) * 2018-01-05 2021-07-06 深圳华大基因科技服务有限公司 Method, apparatus and storage medium for clustering nucleic acid sequences
CN109797438A (en) * 2019-01-17 2019-05-24 武汉康测科技有限公司 A kind of joint component and library constructing method quantifying sequencing library building for the variable region 16S rDNA
CN110176275A (en) * 2019-05-22 2019-08-27 中国药科大学 The macro genomic data analysis method in oral cavity based on high-flux sequence
CN111816258B (en) * 2020-07-20 2023-10-31 杭州谷禾信息技术有限公司 Optimization method for accurate identification of human flora 16S rDNA high-throughput sequencing species
CN112489726A (en) * 2020-11-10 2021-03-12 哈尔滨因极科技有限公司 Analysis method, device and equipment based on 16S microbial amplification sequencing data
CN113077845A (en) * 2021-04-13 2021-07-06 中国科学院大气物理研究所 Analysis method for composition of atmospheric aerosol microbial community

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833613A (en) * 2010-06-04 2010-09-15 中国科学院青岛生物能源与过程研究所 Oral microbial community database and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GALAND P E ET AL.: "Unique archaeal assemblages in the Arctic Ocean unveiled by massively parallel tag sequencing.", THE ISME JOURNAL., vol. 3, 2009, pages 860 - 869 *
WU J Y ET AL.: "Effects of polymerase, template dilution and cycle number on PCR based 16S rRNA diversity analysis using the deep sequencing method.", BMC MICROBIOLOGY., vol. 10, 12 October 2010 (2010-10-12), pages 255, XP021073133, DOI: doi:10.1186/1471-2180-10-255 *

Also Published As

Publication number Publication date
CN102477460A (en) 2012-05-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11842498

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11842498

Country of ref document: EP

Kind code of ref document: A1