WO2014075296A1 - Nucleic acid sequencing method and system, and quality control method and system - Google Patents

Nucleic acid sequencing method and system, and quality control method and system

Info

Publication number
WO2014075296A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
library
chip
quality control
formal
Application number
PCT/CN2012/084757
Other languages
English (en)
French (fr)
Inventor
刘琳
何毅敏
尹烨
席凤
罗宇芬
Original Assignee
深圳华大基因科技服务有限公司
Application filed by 深圳华大基因科技服务有限公司
Priority to CN201280076470.5A (CN104822842A)
Priority to PCT/CN2012/084757 (WO2014075296A1)
Publication of WO2014075296A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation

Definitions

  • The present invention relates to the field of nucleic acid sequencing technology, and in particular to a quality control method and quality control system for nucleic acid sequencing libraries, and to a nucleic acid sequencing method and nucleic acid sequencing system.
  • High-throughput sequencing, also known as "next-generation" sequencing technology, is characterized by sequencing hundreds of thousands to millions of DNA molecules in parallel, generally with short read lengths. It is a milestone in the development of sequencing technology: because millions of DNA molecules can be sequenced simultaneously, detailed, comprehensive analysis of the transcriptome and genome of a species becomes possible. It is therefore also called deep sequencing or next-generation sequencing (NGS).
  • NGS: next-generation sequencing
  • MPSS: Massively Parallel Signature Sequencing
  • Polony Sequencing (polymerase colony sequencing)
  • 454 pyrosequencing, Illumina (Solexa) sequencing, ABI SOLiD sequencing, Ion semiconductor sequencing, DNA nanoball sequencing, and others.
  • The PGM sequencing platform is a sequencer from Life Technologies.
  • Its most notable feature is its short sequencing time.
  • The Ion Proton is a higher-throughput upgraded sequencer based on the same technology.
  • The earliest PGM library construction method had low data output: a 314 chip yielded only 10M.
  • As the technology has been updated, sequencer throughput and the cost of a single chip have both increased, and the corresponding sequencing risk has risen too. Under these circumstances, effective quality control of sequencing libraries before sequencing has become a prominent problem.
  • The library construction process is as follows: first, genomic DNA is fragmented according to the PGM sample preparation method into a series of fragments with a main band below 500 bp.
  • Sequencing libraries constructed by bridge amplification, such as those for the Illumina Solexa sequencing platform, can be quality-controlled with an Agilent 2100, qPCR, and similar methods.
  • However, there is no dedicated instrument or method for quality control of libraries constructed by emPCR. Moreover, as the technology has advanced, the throughput of the Ion PGM sequencer has multiplied, and the Ion Proton sequencer offers more than 100 times the throughput of the PGM 314 chip. This high throughput places higher demands on the quality control of their sequencing libraries.
  • In view of the deficiencies of the prior art, the object of the present invention is to provide a quality control method and quality control system for nucleic acid sequencing libraries, and a nucleic acid sequencing method and system, that can effectively detect unqualified libraries when nucleic acid sequencing is performed on a high-throughput sequencing system.
  • To achieve this object, the present invention adopts the following technical solutions:
  • The invention discloses a quality control method for nucleic acid sequencing performed on a high-throughput sequencing system. Before a sample library is formally sequenced on a formal chip, it is pre-sequenced on a prediction chip; whether the library is qualified is judged from the pre-sequencing result, and unqualified samples are not formally sequenced. The capacity of the prediction chip is smaller than that of the formal chip.
  • The invention also discloses a nucleic acid sequencing method comprising a step of sequencing a sequencing library and a preceding step of quality control of the library using the above quality control method.
  • The invention further discloses a quality control system for a sequencing library, comprising a pre-sequencing module provided with a pre-sequencing chip. Before the library is formally sequenced on the formal chip, the module pre-sequences it on the pre-sequencing chip, and the pre-sequencing result can be used to judge whether the library is qualified.
  • The capacity of the pre-sequencing chip is smaller than that of the formal sequencing chip.
  • The invention also discloses a nucleic acid sequencing system comprising a formal sequencing module, which formally sequences a sequencing library on a formal chip, and the above quality control system, which performs quality control on the library before formal sequencing. If the quality control result is qualified, formal sequencing is performed, and the data pre-sequenced on the prediction chip in the quality control system are pooled with the data from the formal sequencing module as valid sequencing data; if the result is unqualified, no formal sequencing is performed.
  • By pre-sequencing on a prediction chip with a smaller capacity than the formal chip, the invention can judge from the pre-sequencing result whether a sample library is qualified, thereby screening out unqualified libraries and avoiding the waste of materials, reagents, time, and labor caused by sequencing them directly.
  • Figure 1: Schematic diagram of the ligation reaction between library DNA and ISPs during Em-PCR template preparation. The gray spheres are ISPs, the black curves are library DNA, and the rectangles at both ends are the adapters.
  • Figure 2: Comparison of read lengths between the 314 chip and the 316 chip.
  • The x-axis shows read length, and the y-axis shows the number of sequences with that read length.
  • Figure 3: Comparison of base sequence quality between the 314 and 316 chips.
  • The x-axis shows quality, and the y-axis shows the number of sequences.
  • Figure 4: Comparison of base distribution between the 314 and 316 chips.
  • The x-axis shows read length, and the y-axis shows the proportion of each base.
  • The figure shows the base percentage composition along reads measured in each run.
  • Figure 5: Comparison of how base quality changes with cycle number on the 314 and 316 chips.
  • The x-axis shows read length, and the y-axis shows the average base quality in that cycle.
  • The present invention concerns a quality control method and system for nucleic acid sequencing on a high-throughput sequencing system, and a nucleic acid sequencing method and system that include them.
  • The quality control method of the present invention mainly comprises: before a sample library is formally sequenced on a formal chip, pre-sequencing it on a prediction chip and judging from the result whether it is qualified; unqualified samples are not formally sequenced. The capacity of the prediction chip is smaller than that of the formal chip.
  • The nucleic acid sequencing method of the present invention comprises a step of sequencing the sequencing library and, before that step, a step of quality control of the library using the quality control method of the present invention.
  • If the quality control result is qualified, the data pre-sequenced on the prediction chip are pooled with the data from subsequent formal sequencing as valid sequencing data.
  • The nucleic acid sequencing method may further comprise a step of preparing the sequencing library: fragmenting the DNA sample, end-repairing the fragments, ligating adapters, amplifying the target fragments by emulsion PCR, and recovering the target fragments to obtain the sequencing library.
  • The quality control system for sequencing libraries of the present invention comprises a pre-sequencing module provided with a pre-sequencing chip, which is used to pre-sequence the library; the result can be used to judge whether the library is qualified, and the capacity of the pre-sequencing chip is smaller than that of the formal sequencing chip.
  • The nucleic acid sequencing system of the present invention comprises a formal sequencing module and the above quality control system, which performs quality control on the sequencing library before formal sequencing. If the result is qualified, formal sequencing is performed, and the pre-sequencing data from the prediction chip in the quality control system are pooled with the formal sequencing data from the formal sequencing module as valid sequencing data; if the result is unqualified, no formal sequencing is performed.
  • The capacity of the prediction chip is 1% to 10% of the formal chip's capacity.
  • The high-throughput sequencing systems to which the method or system of the invention applies are preferably those using emulsion PCR (emPCR), more preferably the commonly used Ion Torrent, ABI SOLiD, and Roche 454 sequencing platforms. Of these, the Ion PGM (Ion Personal Genome Machine) and Ion Proton on the Ion Torrent platform are particularly suitable for the quality control method of the present invention.
  • When sample libraries are pre-sequenced on the prediction chip, a single library may be tested at a time, or several libraries of different origins may be tagged with index sequences, pooled, and tested together in one run on a single prediction chip.
  • The results of pooled testing still reflect the quality of each sample library effectively, so each can be judged qualified or not; pooling several libraries does not make the test less accurate than testing a single library alone.
  • The method or system of the present invention is best suited to sequencing on the Ion PGM and Ion Proton systems.
  • The lowest-capacity compatible chip, the PGM 314 chip, can therefore serve as the preferred prediction chip for the method or system of the present invention.
  • Chips for the Ion PGM include the 314, 316, and 318 chips.
  • Their capacities and market prices are given in Table 1:
  • A library constructed from a genomic DNA sample, such as the E. coli genome, is first pre-sequenced on a 314 chip (capacity 10M).
  • The pre-sequencing data are analyzed for quality control. If the data meet the requirements, the library is qualified and is then formally sequenced on a 316 chip (capacity 100M), yielding good data.
  • Multiple libraries can be quality-controlled simultaneously by adding index sequences to them, and then sequenced separately or pooled (chosen according to the sequencer's throughput and the amount of data required) to obtain the expected results.
  • The Ion Proton sequencing platform is the newest next-generation sequencer launched by Life Technologies after the Ion PGM; it came to market in September 2012.
  • The Ion Proton sequencing platform has no dedicated quality control method.
  • Its chip types are PI and PII, each with a capacity greater than 1G and a price far higher than that of the Ion 314, 316, or 318 chips. Every chip is single-use.
  • The Ion 314, 316, and 318 chips, whose capacity is relatively low but whose price is low in absolute terms, are compatible with the Ion Proton platform. Thus, with the quality control method of the present invention, pre-sequencing can be performed before formal sequencing on a PI or PII chip using the smaller Ion 314, 316, or 318 chip, preferably the 314 chip. At relatively small cost, pre-sequencing screens out most unqualified libraries and avoids the losses from loading them directly onto the sequencer.
  • The present invention can equally be applied flexibly to quality control of libraries whose construction involves emPCR, for example on the Life Technologies (Applied Biosystems) SOLiD and Roche 454 sequencing platforms.
  • Each of these two high-throughput platforms has only one chip type and no dedicated quality control method, and sequencing must be monitored manually or automatically in real time.
  • With the quality control method of the present invention, for example by designing or purchasing a chip compatible with the SOLiD, 454, or PGM platform whose capacity is at least 1% of the required data amount, library quality control can detect most unqualified libraries and avoid the waste of materials, reagents, time, and labor caused by directly sequencing them, which makes it highly practical.
  • The Ion 314 chip is also suitable as the prediction chip for QC pre-sequencing on these platforms, in which case pre-sequencing can be performed on the Ion Torrent sequencing platform. If cost is less of a concern, Life Technologies 316 and 318 chips can replace the 314 chip here.
  • The libraries tested can be prepared by the standard library preparation method provided for each sequencing platform, for example the method provided for the current Life Technologies PGM sequencing platform.
  • The total DNA sample is first fragmented to a certain length by mechanical shearing or enzymatic digestion; the ends are then repaired and ligated to adapters.
  • After adapter ligation, the target fragments are amplified by Em-PCR with specific PCR primers, and the target fragment library is finally recovered by agarose gel electrophoresis and gel excision.
  • The constructed libraries (see Example 1: a human pair-end DNA tag library constructed from E. coli genomic DNA) are pooled in proportions according to the design purpose and pre-sequenced on a PGM 314 chip to predict library quality and quantitative concentration, and the change in quality value is compared with a 316 chip over the same read length (that is, for total read lengths of 100 and 200 bases respectively, the library's quality values over the first 100 cycles are compared).
  • The quality value (Q-value) reflects sequencing quality and ranges from 0 to 40; within this range, a higher value means better quality.
  • Q20 is the proportion of all bases whose quality value is greater than 20 and reflects the quality of the sequenced reads.
  • A longer sequencing read length lowers sequencing quality, which shows up as a lower Q20.
  • This downward trend is the same on chips with different read lengths and yields, so the quality change of the same library on the 316 chip can be inferred from its sequencing quality on the 314 chip, which has a lower yield and lower cost.
  • The base distribution of read clusters and its trend across cycles reflect similar quality.
  • Whether library quality is acceptable can be judged by those skilled in the art from the 314 pre-sequencing result using empirical values. Generally, based on experience with 100PE (pair-end) library construction on the Illumina HiSeq 2000 sequencing platform, a library with Q20 greater than 80% can be judged qualified.
  • An easy-to-apply criterion can also be set directly: a library with a QC result of Q20 > 80% is deemed qualified and can proceed to formal sequencing; otherwise it is unqualified. This avoids wasting the high-capacity chip in the later step.
  • The pre-sequencing data from the prediction chip are also valid data and can be pooled into the data from subsequent formal sequencing for downstream analysis.
  • Figures 3, 4, and 5 compare the 314 and 316 chips in base sequence quality, base distribution, and base quality versus cycle number, respectively, where A: 314 chip; B: 316 chip (library that passed QC); C: 316 chip (unqualified library without QC). All three figures clearly show that the libraries differ considerably in base sequence quality, base distribution, and the change of base quality with cycle number.
  • Panel A shows the 314-chip sequencing result with 100 bp reads.
  • Panel B shows the 316-chip sequencing result for the qualified library with 200 bp reads.
  • Panel C shows the 316-chip sequencing result for the unqualified library with 200 bp reads. In Figure 3, as read length increases, the quality values on the x-axis also show different trends.
  • In the first two panels, the high-quality-value data of the qualified library are higher than those of the unqualified library, and the overall Q20 of the unqualified library is also lower than that of the qualified library.
  • In Figure 4, the difference between qualified and unqualified libraries is even more pronounced.
  • The bases of the qualified library are uniformly distributed, consistently along the read length, whereas the base distribution of the unqualified library fluctuates markedly.
  • Figure 5 is a heat map of the quality value (y-axis) against read length (x-axis); lighter colors indicate a higher proportion of bases at that position.
  • In the qualified library, the proportion of high-quality bases is clearly higher than in the unqualified library, and the 314 and 316 chips show a consistent trend.
  • Because the Ion Proton from Life Technologies rests on exactly the same technology as the PGM, the low-cost 314 chip and the PGM sequencing platform can likewise be used for quality control for the Ion Proton. The method can also be applied to other sequencing platforms that use em-PCR technology, such as ABI's SOLiD and Roche's 454 sequencing platforms.
  • NanoDrop 1000 spectrophotometer (Thermo Fisher), for measuring DNA concentration
  • DNA Polymerase I, part # 1000577
  • 5× T4 DNA Ligase Buffer, part # 1000581
  • T4 DNA Ligase, part # 1000580
  • Resuspension Buffer, part # 1001388
  • The product was purified with the QIAquick PCR Purification Kit and dissolved in 40 µl EB.
  • The reaction volume is 200 µl, with the following composition:
  • Reaction conditions: incubation at room temperature for 20 min.
  • The reaction volume is 100 µl, with the following composition:
  • Reaction conditions: 25 °C for 15 min, 72 °C for 5 min, then hold at 4 °C.
  • The reaction product was purified with 1.8 volumes of AMPure beads (Beckman Coulter Genomics) and dissolved in 20 µl EB.
  • The purified DNA from the previous step was run on a 2.0% gel for fragment recovery.
  • Electrophoresis conditions: 100 V, 2 h.
  • Em-PCR template preparation. 2.1 200 bp library: refer to the Ion Xpress™ Template 200 Kit instructions; the following reagents are from the Ion Xpress™ Template 200 Kit.
  • The library from the previous step was diluted to a final concentration of 280 × 10^6 molecules per 18 µl, i.e., 280 × 10^6 molecules per reaction (280 × 10^6 ISPs/reaction).
  • The IKA DT-20 oil phase (9 ml), the ISPs, and the PCR aqueous-phase mix were prepared according to the Ion Xpress™ Template 200 Kit instructions.
  • The library DNA attaches to the ISPs and is replicated (Fig. 1).
  • The reaction product consists of ISPs in a water-in-oil emulsion.
  • Biotin-containing MyOne beads were added to bind specifically to the ISP amplification products, and Melt-off solution was then added to convert the DNA templates on the ISPs from double-stranded to single-stranded, yielding single-stranded ISPs.
  • The single-stranded ISPs were checked with a Qubit 2.0 (Invitrogen) to confirm that they meet the sequencer's requirements.
  • The library prepared in the previous step was diluted to a final concentration of 280 × 10^6 molecules per 5 µl, i.e., 160 × 10^6 molecules per reaction (160 × 10^6 ISPs/reaction).
  • The PCR aqueous-phase mix components are as follows:
  • The library DNA attaches to the ISPs and is replicated (Fig. 1).
  • The reaction product consists of ISPs in a water-in-oil emulsion.
  • Biotin-containing MyOne beads were added to bind specifically to the ISP amplification products, and Melt-off solution was then added to convert the DNA templates on the ISPs from double-stranded to single-stranded, yielding single-stranded ISPs.
  • This step follows the Ion OneTouch™ Template Kit instructions: the reagents are placed in the designated positions, and the step is performed by the ES instrument of the OneTouch automated system.
  • The single-stranded ISPs are checked with a Qubit 2.0 (Invitrogen) to confirm that they meet the sequencer's requirements before proceeding to the next step.
  • Qubit 2.0 (Invitrogen)
  • The sequencing procedure is described in detail in the PGM operating instructions.
  • The 100 bp library uses 100 bp sequencing reagents, and the 200 bp library uses 200 bp sequencing reagents.
  • Install the appropriate sequencing chip (e.g., 314, 316, or 318 chip).
  • The 100 bp library and the 200 bp library were sequenced on a 314 chip and a 316 chip, respectively; enzyme and the prepared single-stranded ISPs were loaded onto the chip for sequencing. The 314 chip served as the prediction chip; after its pre-sequencing result was obtained (mean Q20 80.7%), formal sequencing was performed on the 316 chip. In this embodiment, the 314 and 316 chips were used with the quality control method of the present invention. On the 314 chip, the average read length was 100 bp and the mean Q20 was 80.7%; on the 316 chip, the average read length was 200 bp and the quality value stayed at 60.9% throughout (Figure 3).
  • The difference between the two, about 20 percentage points, arises because the 314 chip has shorter reads and therefore better quality than the 316 chip.
  • The Q20 of the 314 chip therefore needs to be discounted by a certain amount (e.g., 20 percentage points) to reflect library quality effectively. If different libraries are labeled with tag sequences, the quality of multiple libraries can be tested simultaneously on a low-cost 314 chip.
  • The table below shows the expected sequencing run time and expected data output of each chip at different read lengths.
  • A low-capacity chip has a shorter run time, and its capacity suffices for the data needed for pooled QC of multiple libraries, saving the time that directly sequencing unqualified libraries would take. For example, if 10 libraries are pooled and quality-controlled on a 314 chip, QC takes only 1.5 h. If one library is unqualified, sequencing it directly on a 318 chip would consume 2.4 h, so QC saves 0.9 h. If no library is unqualified, the QC data can be used directly as sequencing data, so little time is lost.
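
The run-time argument in the last item reduces to simple arithmetic, reproduced in the minimal Python sketch below. The run times (1.5 h for a pooled 314 QC run, 2.4 h for a 318 run) are the figures quoted in the text; the function name `hours_saved` and the assumption that each failed library would otherwise occupy its own full 318 run are illustrative only.

```python
def hours_saved(qc_run_h: float, formal_run_h: float, n_failed: int) -> float:
    """Sequencer time avoided by screening a pool on a small chip first.

    Hypothetical model: without QC, each failed library is assumed to consume
    one full formal run (formal_run_h); with QC, a single pooled prediction-chip
    run (qc_run_h) is spent instead. A negative result means QC used more run
    time than it saved, e.g. when nothing fails (per the text, that QC data is
    then reused as sequencing data rather than wasted).
    """
    return n_failed * formal_run_h - qc_run_h


# Figures from the text: 10 pooled libraries, 314 QC run 1.5 h,
# 318 run 2.4 h, one library unqualified.
print(round(hours_saved(qc_run_h=1.5, formal_run_h=2.4, n_failed=1), 2))  # 0.9
```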

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

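The present invention discloses a quality control method and quality control system for high-throughput nucleic acid sequencing, and a corresponding nucleic acid sequencing method and system. The method comprises pre-sequencing a sample library on a prediction chip and judging from the pre-sequencing result whether the sample library is qualified; unqualified samples are not formally sequenced, and the capacity of the prediction chip is smaller than that of the formal chip.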

Description

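Nucleic Acid Sequencing Method and System, and Quality Control Method and System
Technical Field
The present invention relates to the field of nucleic acid sequencing technology, and in particular to a quality control method and quality control system for nucleic acid sequencing libraries, and to a nucleic acid sequencing method and nucleic acid sequencing system.
Background Art
High-throughput sequencing, also known as "next-generation" sequencing technology, is characterized by sequencing hundreds of thousands to millions of DNA molecules in parallel in a single run, generally with relatively short read lengths. High-throughput sequencing is a milestone in the development of sequencing technology. Because it can sequence millions of DNA molecules simultaneously, it makes detailed, comprehensive analysis of the transcriptome and genome of a species possible, and it is therefore also called deep sequencing or next-generation sequencing (NGS). By development history, influence, sequencing principle, and technology, the main types are: Massively Parallel Signature Sequencing (MPSS), Polony Sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, ABI SOLiD sequencing, Ion semiconductor sequencing, DNA nanoball sequencing, and others.
The PGM sequencing platform is a sequencer introduced by Life Technologies whose chief feature is its short sequencing time; the Ion Proton is a higher-throughput upgraded sequencer based on the same technology.
The earliest PGM library construction method produced little data: a 314 chip yielded only 10M. As the technology has been updated, sequencer throughput and the cost of a single chip have both increased, and the corresponding sequencing risk has risen as well. Under these circumstances, how to perform effective quality control of sequencing libraries in advance has become a prominent problem.
Taking a sample library preparation method for the Life Technologies PGM sequencing platform (see the Ion Xpress™ Template 200 Kit instructions) as an example, library construction proceeds as follows: first, genomic DNA is fragmented according to the PGM sample preparation method into a series of DNA fragments with a main band below 500 bp; the sticky ends produced by fragmentation are then repaired to blunt ends; the DNA fragments are then ligated to adapters that carry a terminal "T" base and contain a tag sequence identifying the sample source; the ligation products are size-selected by electrophoresis to recover target fragments of the desired molecular weight; finally, the DNA fragments carrying adapters at both ends are amplified by emulsion PCR (emPCR), and the final PCR product is purified.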
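Sequencing libraries constructed by bridge amplification, such as those for the Illumina Solexa sequencing platform, can be quality-controlled with an Agilent 2100, qPCR, and similar methods. However, there is currently no dedicated instrument or method for quality control of libraries constructed by emPCR. Moreover, as the technology has continued to improve, the throughput of the Ion PGM sequencer has multiplied, and the Ion Proton sequencer offers more than 100 times the throughput of the PGM 314 chip; this high throughput places higher demands on the quality control of their sequencing libraries.
Summary of the Invention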
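In view of the deficiencies of the prior art, the object of the present invention is to provide a quality control method and quality control system for nucleic acid sequencing libraries that can effectively detect unqualified libraries when nucleic acid sequencing is performed on a high-throughput sequencing system, as well as a nucleic acid sequencing method and system.
To achieve the above object, the present invention adopts the following technical solutions:
The present invention discloses a quality control method for nucleic acid sequencing, the nucleic acid sequencing being performed on a high-throughput sequencing system. The quality control method comprises: before a sample library is formally sequenced on a formal chip, pre-sequencing the sample library on a prediction chip and judging from the pre-sequencing result whether the sample library is qualified, unqualified samples not being formally sequenced, wherein the capacity of the prediction chip is smaller than that of the formal chip.
The present invention also discloses a nucleic acid sequencing method comprising a step of sequencing a sequencing library and, before it, a step of performing quality control on the sequencing library using the above quality control method.
The present invention further discloses a quality control system for a sequencing library, comprising a pre-sequencing module provided with a pre-sequencing chip. Before the library is formally sequenced on a formal chip, the pre-sequencing module pre-sequences the library on the pre-sequencing chip, and the pre-sequencing result can be used to judge whether the library is qualified; the capacity of the pre-sequencing chip is smaller than that of the formal sequencing chip.
The present invention also discloses a nucleic acid sequencing system comprising a formal sequencing module for formally sequencing a sequencing library on a formal chip, and further comprising the above quality control system for sequencing libraries, used to perform quality control on the library before formal sequencing. If the quality control result is qualified, formal sequencing is performed, and the data pre-sequenced on the prediction chip in the quality control system are pooled with the data obtained by formal sequencing in the formal sequencing module, together serving as valid sequencing data; if the quality control result is unqualified, no formal sequencing is performed.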
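The claimed system can be pictured as two modules with a gate between them. The Python sketch below is a hypothetical illustration, not the patent's implementation: the class name `SequencingSystem`, the `Read` alias, and the three callables (`pre_sequence`, `formal_sequence`, `qc_passes`) are invented stand-ins for instrument runs and for whichever QC criterion is applied. It shows the two behaviours the text specifies: an unqualified library is never formally sequenced, and for a qualified library the pre-sequencing reads are pooled with the formal reads as valid data.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-in for a read record; a real pipeline would use FASTQ/BAM records.
Read = str


@dataclass
class SequencingSystem:
    """Sketch of the claimed system: a QC (pre-sequencing) module on a small
    prediction chip, followed by a formal sequencing module on a larger chip."""

    pre_sequence: Callable[[str], List[Read]]     # library -> reads from the prediction chip
    formal_sequence: Callable[[str], List[Read]]  # library -> reads from the formal chip
    qc_passes: Callable[[List[Read]], bool]       # judges the pre-sequencing result

    def run(self, library: str) -> Optional[List[Read]]:
        pre_reads = self.pre_sequence(library)
        if not self.qc_passes(pre_reads):
            return None  # unqualified: formal sequencing is skipped
        # Qualified: pre-sequencing reads are pooled with the formal reads
        # and together form the valid sequencing data.
        return pre_reads + self.formal_sequence(library)


# Usage with stub callables standing in for real instrument runs.
system = SequencingSystem(
    pre_sequence=lambda lib: [f"{lib}:pre"],
    formal_sequence=lambda lib: [f"{lib}:formal1", f"{lib}:formal2"],
    qc_passes=lambda reads: not reads[0].startswith("bad"),
)
print(system.run("libA"))  # ['libA:pre', 'libA:formal1', 'libA:formal2']
print(system.run("bad1"))  # None
```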
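When nucleic acid sequencing is performed on a high-throughput sequencing system, the present invention pre-sequences on a prediction chip with a smaller capacity than the formal chip, so that whether a sample library is qualified can be judged from the pre-sequencing result. Unqualified libraries are thereby screened out, avoiding the waste of materials, reagents, time, and labor caused by sequencing them directly.
Brief Description of the Drawings
Figure 1: Schematic diagram of the ligation reaction between library DNA and ISPs, showing the attachment of library DNA to ISPs during Em-PCR template preparation. The gray spheres are ISPs, the black curves are library DNA, and the rectangles at both ends are the adapters.
Figure 2: Comparison of read lengths between the 314 chip and the 316 chip. A: 314 chip; B: 316 chip. The x-axis shows read length, and the y-axis shows the number of sequences with that read length.
Figure 3: Comparison of base sequence quality between the 314 chip and the 316 chip. A: 314 chip; B: 316 chip; C: 316 chip (unqualified library without quality control). The x-axis shows quality, and the y-axis shows the number of sequences.
Figure 4: Comparison of base distribution between the 314 chip and the 316 chip. A: 314 chip; B: 316 chip; C: 316 chip (unqualified library without quality control). The x-axis shows read length, and the y-axis shows the proportion of each base. The figure shows the base percentage composition along reads measured in each run.
Figure 5: Comparison of how base quality changes with cycle number on the 314 chip and the 316 chip. A: 314 chip; B: 316 chip; C: 316 chip (unqualified library without quality control). The x-axis shows read length, and the y-axis shows the average base quality in that cycle.
Detailed Description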
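The present invention concerns a quality control method and system for nucleic acid sequencing performed on a high-throughput sequencing system, and a nucleic acid sequencing method and system that include the quality control method.
The quality control method of the present invention mainly comprises: before a sample library is formally sequenced on a formal chip, pre-sequencing it on a prediction chip and judging from the pre-sequencing result whether it is qualified; unqualified samples are not formally sequenced. The capacity of the prediction chip is smaller than that of the formal chip.
The nucleic acid sequencing method of the present invention comprises a step of sequencing the sequencing library and, before that step, a step of performing quality control on the library using the quality control method of the present invention. In this sequencing method, if the quality control result is qualified, the data pre-sequenced on the prediction chip are pooled with the data from subsequent formal sequencing, together serving as valid sequencing data. The method may further comprise a step of preparing the sequencing library, which comprises fragmenting a DNA sample, end-repairing the fragments, ligating them to adapters, amplifying the target fragments by emulsion PCR, and then recovering the target fragments to obtain the sequencing library.
The quality control system for sequencing libraries of the present invention comprises a pre-sequencing module provided with a pre-sequencing chip, which is used to pre-sequence the library; the pre-sequencing result can be used to judge whether the library is qualified, and the capacity of the pre-sequencing chip is smaller than that of the formal sequencing chip.
The nucleic acid sequencing system of the present invention comprises a formal sequencing module and the above quality control system for sequencing libraries. The quality control system performs quality control on the sequencing library before formal sequencing; if the result is qualified, formal sequencing is performed, and the data pre-sequenced on the prediction chip in the quality control system are pooled with the data from the formal sequencing module, together serving as valid sequencing data; if the result is unqualified, no formal sequencing is performed.
So that whether a sample library is qualified can be analyzed effectively from the pre-sequencing data, the capacity of the prediction chip is 1% to 10% of the formal chip's capacity.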
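As a quick check of the stated 1% to 10% range, the sketch below computes the capacity ratio using the nominal capacities quoted elsewhere in this document (314 chip 10M, 316 chip 100M, Ion Proton chips "greater than 1G"). The function names are hypothetical, and both capacities are assumed to be expressed in the same unit.

```python
def capacity_ratio(prediction_capacity: float, formal_capacity: float) -> float:
    """Prediction-chip capacity as a fraction of formal-chip capacity."""
    if formal_capacity <= 0:
        raise ValueError("formal chip capacity must be positive")
    return prediction_capacity / formal_capacity


def in_claimed_range(prediction_capacity: float, formal_capacity: float,
                     low: float = 0.01, high: float = 0.10) -> bool:
    """True if the ratio lies within the 1% to 10% range stated in the text."""
    return low <= capacity_ratio(prediction_capacity, formal_capacity) <= high


# Nominal capacities quoted in the text (same unit for both chips):
# 314 chip 10M, 316 chip 100M; Ion Proton chips are only said to exceed 1G,
# so 1G is used here as a lower bound on their capacity.
M, G = 1e6, 1e9
print(capacity_ratio(10 * M, 100 * M))    # 0.1 (314 -> 316)
print(in_claimed_range(10 * M, 100 * M))  # True
print(in_claimed_range(10 * M, 1 * G))    # True (ratio 0.01 at the 1G bound)
```

Because the Proton chips are only described as exceeding 1G, the actual 314-to-Proton ratio may fall below the 1% bound.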
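The high-throughput sequencing systems to which the method or system of the present invention applies are preferably those using emulsion PCR (emPCR), more preferably the commonly used Ion Torrent, ABI SOLiD, and Roche 454 sequencing platforms. Of these, the Ion PGM (Ion Personal Genome Machine) and Ion Proton on the Ion Torrent platform are especially suitable for the quality control method of the present invention.
In the method or system of the present invention, when sample libraries are pre-sequenced on the prediction chip, a single sample library may be tested at a time, or multiple sample libraries of different origins may be tagged with index sequences, pooled, and tested together in a single run on one prediction chip. The results of pooled testing still effectively reflect the quality of each sample library, from which each library can be judged qualified or not; pooled testing is not less accurate than testing a single library on its own.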
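Pooled pre-sequencing relies on the index tags to attribute each read back to its library before its quality is judged. The sketch below assumes, purely for illustration, that the index is an exact prefix of each read and that each read carries per-base quality values; real Ion Torrent barcode handling is done by the platform software and is not modelled here. `demultiplex`, `q20`, and the index sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A read is (sequence, per-base quality values); a hypothetical minimal record.
Read = Tuple[str, List[int]]


def demultiplex(reads: List[Read], indexes: Dict[str, str]) -> Dict[str, List[Read]]:
    """Assign pooled reads to libraries by an exact index match at the read start,
    trimming the index. Reads that match no index are dropped."""
    by_library: Dict[str, List[Read]] = defaultdict(list)
    for seq, quals in reads:
        for index, library in indexes.items():
            if seq.startswith(index):
                k = len(index)
                by_library[library].append((seq[k:], quals[k:]))
                break
    return dict(by_library)


def q20(reads: List[Read]) -> float:
    """Fraction of bases with quality >= 20 (a common Q20 convention)."""
    quals = [q for _, qs in reads for q in qs]
    return sum(q >= 20 for q in quals) / len(quals) if quals else 0.0


# Usage: two hypothetical libraries pooled on one prediction chip.
pooled = [
    ("ACGTTTTT", [30] * 8),
    ("ACGTGGGG", [10] * 8),
    ("TGCAAAAA", [35] * 8),
]
per_library = demultiplex(pooled, {"ACGT": "lib1", "TGCA": "lib2"})
for library, reads in sorted(per_library.items()):
    print(library, len(reads), q20(reads))
# lib1 2 0.5
# lib2 1 1.0
```

Each library's Q20 is computed only over its own reads, which is what lets a pooled run judge every library separately.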
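The method or system of the present invention is best suited to sequencing on the Ion PGM and Ion Proton systems. Accordingly, among the chips compatible with the Ion PGM and Ion Proton, the lowest-capacity chip, the PGM 314 chip, serves as the preferred prediction chip for the method or system of the present invention. Chips for the Ion PGM include the 314, 316, and 318 chips; their capacities and market prices are given in Table 1: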
Table 1: Capacity and market price of the Ion 314, 316, and 318 chips (table image not reproduced)
在本发明一个具体的实施方式中, 基于目前 Life technologies公 司的 PGM及 Ion Proton测序平台提供的文库制备方法, 将一个基因组 DNA样品构建的文库, 如大肠杆菌基因组, 先在 314芯片 (容量 10M)上 进行预测序,得到预测序结果数据,分析预测序的结果数据来进行质控, 如果数据符合要求,则说明文库合格,再将此文库在 316芯片(容量 100M ) 上进行正式测序, 从而得到良好的数据效果。 在此基础上, 对于多个不 同来源的文库, 可通过在文库中加入 index序列对多个文库同时进行质 控, 再分别上机或混合上机(根据测序仪测序通量及所需要的数据量来 选择), 得到预期的数据结果。 对于 Ion Proton 测序平台, 它是 Life Technologies公司在继 Ion PGM之后最新推出的新一代测序仪。 它的上 市时间是 2012年 9月。 Ion Proton测序平台没有专门的质控方法, 其 芯片类型有 PI和 ΡΠ, 容量都大于 1G, 价格也远高于 Ion 314、 316或 318 芯片, 各芯片都是一次性使用。 由于容量相对低但价格绝对低廉的 Ion 314、 316或 318芯片可兼容于 Ion Proton平台, 由此, 利用本发 明所述的质控方法,在利用 PI或 ΡΠ进行正式测序以前,采用容量更小 的 Ion 314、 316或 318芯片优选 314芯片进行预测序, 可在预测序过程 中付出相对小的代价 选出绝大部分不合格文库,免除大部分不合格文 库直接上机而造成的损失。
本发明可同样灵活地运用于文库构建时涉及 emPCR过程的文库的质 控上, 如应用于 Life technologies (applied biosys terns) SOLiD 和 Roche 454测序平台。 这两种高通量测序平台各自都只有 1种类型的芯 片, 也没有专门的质控方法, 测序时需要人工或机器实时监控。 利用本 发明质控方法,如设计或购买与 S01iD、 454或 PGM平台兼容的且容量仅 为所需数据量至少 1%的芯片来进行文库质控,可检测出大部分不合格文 库,免除不合格文库直接测序造成的材料试剂、 时间及人工成本的浪费, 有很强的实用性。 在 SOliD及 Roche 454测序平台上, Ion 314芯片也 适合作为预测芯片用于质控的预测序过程, 这时预测序过程可以在 Ion Torrent 测序平台上进行。 如果放宽费用的考虑, life technologies 316、 318芯片可以代替 314芯片用于此处的预测序。
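In the method or system of the present invention, the library under test can be prepared by the standard library preparation method provided for each sequencing platform, for example the method currently provided for Life Technologies' PGM platform. The preparation proceeds as follows:

1. The total DNA sample is fragmented to a defined length by mechanical shearing or enzymatic digestion.
2. The fragment ends are repaired and ligated to adapters.
3. The adapter-ligated target fragments are amplified by emPCR with specific PCR primers.
4. The target fragment library is recovered by agarose gel electrophoresis and gel excision.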
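In one specific embodiment of the present invention, constructed libraries (see Example 1: paired-end DNA tag libraries built from E. coli genomic DNA) are mixed in proportions set by the experimental design. The mixture is pre-sequenced on a PGM 314 chip to predict library quality and quantify concentration. The change in quality values is compared with that on a 316 chip: with total read lengths of 100 and 200 bases respectively, the library quality values over the first 100 cycles are compared.

The quality value (Q-value) reflects sequencing quality. It ranges from 0 to 40, and within this range a higher value means better quality. Q20 is the proportion of all bases whose quality value is above 20. It reflects the quality of the sequenced reads: the closer it is to 1, the better the sequencing quality.

The library's mean Q20 was 80.7% on the 314 chip and stayed at 60.9% on the 316 chip (Figure 3), a difference of about 20 percentage points. The 314 chip scores higher because its reads are shorter: the total read lengths are 100 bases on the 314 chip and 200 bases on the 316 chip. It follows that when the 314 chip serves as the QC chip for the 316 chip, the 314 chip's Q20 must be discounted by a certain amount.

Longer reads lower sequencing quality, which shows up as a lower Q20. This downward trend is the same on chips with different read lengths and outputs. The quality change on the low-output, low-cost 314 chip can therefore indicate how the same library's quality will change on the 316 chip. The base distribution across reads, and its change with cycle number, show a similar quality pattern.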
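Q20 can be computed directly from per-base quality scores. The following is a minimal Python sketch, with these assumptions (the function names are hypothetical):

- Reads are in FASTQ format with Phred+33 quality encoding, which the text does not specify.
- Bases are counted with the usual Q >= 20 convention.

Under the Phred definition, Q = -10·log10(p), so Q20 corresponds to an error probability of 1%.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer Q-values."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition Q = -10*log10(p), rearranged to p = 10**(-Q/10)."""
    return 10 ** (-q / 10)


def q20_fraction(quality_strings, threshold=20, offset=33):
    """Fraction of all bases whose Q-value is >= threshold (the usual Q20 convention)."""
    high = total = 0
    for qual in quality_strings:
        scores = phred_scores(qual, offset)
        high += sum(1 for q in scores if q >= threshold)
        total += len(scores)
    return high / total if total else 0.0


# Under Phred+33, 'I' decodes to Q40, '5' to Q20 and '+' to Q10.
print(q20_fraction(["II5+", "5555"]))  # 0.875
```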
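In the above embodiment, when a 314 chip, for example, serves as the pre-sequencing chip for QC, a person skilled in the art can judge from the 314 pre-sequencing results, using empirical values, whether the library is acceptable. Experience with 100PE (paired-end) libraries on the Illumina HiSeq 2000 platform suggests that a library with Q20 above 80% can be judged qualified.

When the 314 chip serves as the pre-sequencing chip in the present invention, a simple operational criterion can also be set directly, in addition to judging by empirical values:

- A library with a QC result of Q20 > 80% is deemed qualified and proceeds to formal sequencing.
- Any other library is deemed unqualified, which avoids wasting the subsequent high-capacity chip.

For libraries that pass QC, the data from pre-sequencing on the pre-sequencing chip are also valid data. They can be pooled with the data from subsequent formal sequencing for downstream analysis.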
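The gating rule and the data pooling described above fit in a few lines. The following is a minimal Python sketch, with these assumptions:

- `PreSeqResult` and `sequence_with_qc` are hypothetical names.
- `formal_sequencer` stands in for whatever actually runs the formal chip.
- The 0.80 cut-off is the Q20 > 80% criterion from the text.

```python
from dataclasses import dataclass, field


@dataclass
class PreSeqResult:
    library: str
    q20: float                                 # fraction of bases >= Q20 on the pre-sequencing chip
    reads: list = field(default_factory=list)  # reads produced by pre-sequencing


def sequence_with_qc(pre, formal_sequencer, min_q20=0.80):
    """Run formal sequencing only if pre-sequencing Q20 exceeds min_q20.

    formal_sequencer: hypothetical callable taking a library name and returning
    the reads from the formal (high-capacity) chip.
    Returns the pooled valid data, or None if the library fails QC.
    """
    if pre.q20 <= min_q20:
        return None  # unqualified: the formal chip is never used
    formal_reads = formal_sequencer(pre.library)
    # Pre-sequencing reads from a qualified library count as valid data too.
    return list(pre.reads) + list(formal_reads)
```

The formal run is skipped entirely for a failing library, so the only cost of a failed library is the small pre-sequencing chip.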
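Figures 3, 4 and 5 compare the 314 and 316 chips on three measures: base sequence quality, base distribution, and the change of base quality with cycle number. The panels are:

- A: 314 chip, 100 bp reads.
- B: 316 chip, 200 bp reads, library that passed QC.
- C: 316 chip, 200 bp reads, unqualified library that did not undergo QC.

The three figures clearly show that qualified and unqualified libraries differ considerably on all three measures.

Figure 3: as read length increases, the quality values on the horizontal axis follow different trends. In the first two panels the qualified library has more high-quality data than the unqualified library, and the unqualified library's overall Q20 is also lower.

Figure 4: the difference between qualified and unqualified libraries is more pronounced. The bases of the qualified library are evenly distributed, and this holds along the whole read length. The base distribution of the unqualified library fluctuates markedly.

Figure 5: a heat map of quality value (vertical axis) against read position (horizontal axis), in which lighter colours indicate a higher proportion of bases at that position. The qualified library has a clearly higher proportion of high-quality bases than the unqualified library, and the 314 and 316 chips show consistent trends.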
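The Ion Proton platform from Life Technologies rests on exactly the same technology as the PGM. The inexpensive 314 chip and the PGM platform can therefore also serve as the QC means for the Ion Proton. The method applies as well to other sequencing platforms that use emPCR, such as ABI's SOLiD and Roche's 454 platforms. The present invention is described in further detail below through specific embodiments with reference to the drawings.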
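Reagents and instruments used in the examples of this application:
Main laboratory instruments
Table 2
Instrument | Model | Manufacturer
Thermal cycler (PCR instrument) | Veriti Thermal Cycler | ABI
NanoDrop 1000 (DNA concentration measurement) | Spectrophotometer | Thermo Fisher Scientific
Electrophoresis tank | DYCP-31DN | Beijing Liuyi Instrument Factory
Electrophoresis power supply | DYY-6C | Beijing Liuyi Instrument Factory
Gel imaging system | Tanon | Shanghai Tanon Science & Technology Co., Ltd.
DarkReader Transilluminator (gel excision) | D195M | Clare Chemical Research
Covaris shearing instrument | S-2 | Covaris
Thermomixer (heated mixer) | Thermomixer comfort | Eppendorf
Refrigerated centrifuge | 5417R | Eppendorf
Benchtop centrifuge | 5418 | Eppendorf
Benchtop centrifuge | SVC-75004334 | Heraeus
Microwave oven | MM721AAU. | Midea
Analytical balance | BS 124S | Sartorius
Reagents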
Table 3
Reagent | Part number
10 mM dNTP Mix | part # 1000564
DNA Polymerase I | part # 1000577
5x T4 DNA Ligase Buffer | part # 1000581
T4 DNA Ligase | part # 1000580
10x Restriction Buffer | part # 1000583
5x Phusion HF Buffer (Phusion high-fidelity buffer) | part # 1000585
2x Phusion Polymerase (Phusion high-fidelity polymerase) | part # 1000584
25 mM dNTP Mix | part # 1001663
25 bp Ladder | part # 1001662
10x Gel Elution Buffer | part # 1000571
Resuspension Buffer | part # 1001388
Sera-mag Magnetic Oligo(dT) Beads | part # 1002545
Ultra Pure Water | part # 1000467
10x Polynucleotide Kinase Buffer | B904 (Enzymatics)
10x Blue Buffer | B011 (Enzymatics)
dATP | P0756L (NEB)
2x Rapid Ligation Buffer | B101 (Enzymatics)
Index PE Adapter Oligo Mix |
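Note: reagents used in the experiments but not listed in the tables above are from the Life Technologies PE DNA sample preparation kit (Ion OneTouch™ System Template Kit, 4468660, purchased from Life Technologies).
Example 1: Construction of a non-tagged library for PGM sequencing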
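The following routine steps were performed according to the experimental protocol published by Life Technologies.
1. Construction of the PGM sequencing fragment library
1) Whole-genome fragmentation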
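E. coli gDNA was sheared into fragments of the target length using the Covaris® S2 System, with the following sample composition:
Component | Concentration | Volume
E. coli gDNA | 100 ng/µl | 10 µl
H2O | | 90 µl
Total | | 100 µl
(Table in the original publication provided as an image; not reproduced.)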
The sheared product was purified with the QIAquick PCR Purification Kit and dissolved in 40 µl EB.
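2) End-repair reaction
End repair of the fragmented E. coli DNA was performed in the following reaction (use the buffer and enzyme mix supplied in the Ion Xpress™ Plus Fragment Library Kit):
The 200 µl reaction consists of:
Table 6
Reagent | Volume per reaction
Fragmented DNA | 39 µl
Nuclease-free Water | 119 µl
5X End Repair Buffer | 40 µl
End Repair Enzyme | 2 µl
Total | 200 µl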
Reaction conditions: incubate at room temperature for 20 min.
Purify the product and dissolve it in 25 µl of EB (QIAGEN Elution Buffer).
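The volumes in Table 6 are worth checking before setting up the reaction. They should sum to the stated total, and the 5X buffer should end up at 1X. The following is a minimal Python sketch of that check; the helper names are hypothetical and the volumes are copied from Table 6.

```python
def reaction_total(components):
    """Sum the component volumes (µl) of a reaction mix."""
    return sum(components.values())


def final_concentration(stock_x, volume_ul, total_ul):
    """Final working strength (in X) of a buffer diluted into the total volume."""
    return stock_x * volume_ul / total_ul


# Volumes from Table 6 (end-repair reaction), in µl.
END_REPAIR = {
    "Fragmented DNA": 39,
    "Nuclease-free Water": 119,
    "5X End Repair Buffer": 40,
    "End Repair Enzyme": 2,
}

total = reaction_total(END_REPAIR)
assert total == 200, f"mix totals {total} µl, expected 200 µl"
print(total, final_concentration(5, END_REPAIR["5X End Repair Buffer"], total))  # 200 1.0
```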
3) Adapter ligation
Adapter ligation of the library DNA was performed in the following reaction (use the reagents supplied in the Ion Fragment Library Kit):
The 100 µl reaction consists of:
Table 7 (image in the original publication; not reproduced)
Reaction conditions: 25 °C for 15 min, 72 °C for 5 min, then hold at 4 °C.
The product was purified with 1.8 volumes of AMPure beads (Beckman Coulter Genomics) and dissolved in 20 µl EB.
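4) Size selection of target fragments
The DNA purified in the previous step was run on a 2.0% preparative agarose gel at 100 V for 2 h. Target fragments of 180-200 bp or 280-300 bp were excised, recovered, and dissolved in 40 µl EB.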
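2. emPCR template preparation
2.1 200 bp library: performed according to the Ion Xpress™ Template 200 Kit user guide (the following reagents are from the Ion Xpress™ Template 200 Kit)
1) Determine the appropriate library concentration
Using the Agilent 2100 measurement as the reference, dilute the library prepared in the previous step to a final concentration of 280 × 10^6 molecules per 18 µl. This gives 280 × 10^6 molecules per reaction (280 × 10^6 ISP/reaction).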
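Reaching a target of 280 × 10^6 molecules in 18 µl means converting a concentration measurement into molecule counts. The following is a minimal Python sketch, with these assumptions (neither is stated in the text):

- The measurement is a mass concentration in ng/µl.
- The mean mass of one double-stranded base pair is 660 g/mol.

The helper names are hypothetical, and the 1 ng/µl, 200 bp library is an invented example.

```python
AVOGADRO = 6.022e23       # molecules per mole
BP_MASS_G_PER_MOL = 660   # assumed mean mass of one double-stranded base pair


def molecules_per_ul(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA mass concentration to molecules per microlitre."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (mean_length_bp * BP_MASS_G_PER_MOL)


def dilution_factor(conc_ng_per_ul, mean_length_bp, target_molecules=280e6, volume_ul=18):
    """Fold dilution so that volume_ul of diluted library holds target_molecules."""
    stock = molecules_per_ul(conc_ng_per_ul, mean_length_bp)
    target = target_molecules / volume_ul
    if stock < target:
        raise ValueError("library is too dilute to reach the target input")
    return stock / target


# Hypothetical library: 1 ng/µl with a 200 bp mean fragment length.
print(round(dilution_factor(1.0, 200)))
```

Passing `target_molecules` and `volume_ul` as parameters lets the same helper serve the 100 bp protocol in section 2.2, which uses a 5 µl input.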
2) Generate water-in-oil ISP templates:
Following the Ion Xpress™ Template 200 Kit user guide, prepare the IKA DT-20 oil phase (9 ml), the ISPs, and the PCR aqueous-phase mix. The PCR aqueous-phase mix consists of:
Table 8 (image in the original publication; not reproduced)
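Finally, mix the diluted library (18 µl per reaction) with the PCR aqueous-phase mix and run PCR with the following program:
Table 9
Stage | Step | Temperature | Time
Hold | Denature | 94 °C | 6 min
Cycling (40 cycles) | Denature | 94 °C | 30 s
Cycling (40 cycles) | Anneal | 58 °C | 30 s
Cycling (40 cycles) | Extend | 72 °C | 90 s
Cycling (10 cycles) | Denature | 94 °C | 30 s
Cycling (10 cycles) | Extend | 68 °C | 6 min
Hold | - | 10 °C | ∞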
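Adding up Table 9 gives a lower bound on thermal-cycler time for planning. The following is a minimal Python sketch, with these assumptions:

- Ramp times between temperatures are ignored.
- The open-ended final 10 °C hold is left out.
- The data layout and the function name are hypothetical.

```python
# (cycles, [(step, temperature_C, seconds), ...]) transcribed from Table 9;
# the final 10 °C hold is open-ended and therefore excluded.
PROGRAM = [
    (1, [("denature", 94, 360)]),
    (40, [("denature", 94, 30), ("anneal", 58, 30), ("extend", 72, 90)]),
    (10, [("denature", 94, 30), ("extend", 68, 360)]),
]


def hold_time_seconds(program):
    """Total programmed hold time, ignoring ramp rates between temperatures."""
    return sum(cycles * sum(seconds for _, _, seconds in steps) for cycles, steps in program)


seconds = hold_time_seconds(PROGRAM)
print(seconds, seconds / 60)  # 10260 171.0
```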
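3) Prepare single-stranded ISP templates
The library DNA binds to the ISPs and is amplified on them (Figure 1); the reaction product is a water-in-oil emulsion of ISPs. After enrichment, biotinylated MyOne beads are added and bind specifically to the amplified ISPs. Melt-off solution is then added to denature the DNA templates on the ISPs from double-stranded to single-stranded, yielding single-stranded ISPs.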
Melt-off solution composition:
Table 10 (image in the original publication; not reproduced)
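The product was checked with a Qubit 2.0 (Invitrogen) and met the requirements for sequencing.
2.2 100 bp library: performed according to the Ion OneTouch System procedure (the following reagents are from the Ion OneTouch™ Template Kit)
1) Determine the appropriate library concentration
Using the 2100 measurement as the reference, dilute the library prepared in the previous step to a final concentration of 280 × 10^6 molecules per 5 µl, i.e., 160 × 10^6 molecules per reaction (160 × 10^6 ISP/reaction).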
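2) Generate water-in-oil ISP templates
Following the Ion OneTouch™ Template Kit instructions, load 50 ml each of the oil phase and the ISP recovery solution onto the OneTouch automated system. Place the matching PCR reaction plate, then prepare the PCR aqueous-phase mix, which consists of: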
Table 11
Order | Reagent | Volume (µl)
1 | Nuclease-free water | 595
2 | Ion OneTouch™ Reagent Mix | 200
3 | Ion OneTouch™ Enzyme Mix | 100
4 | Diluted library | 5
5 | Ion Sphere™ Particles | 100
Total | | 900
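Place the prepared mix on the OneTouch system and click Start. The automated system runs PCR, during which the library DNA binds to the ISPs and is amplified on them. When the PCR program finishes, the system performs ISP enrichment automatically.
3) Prepare single-stranded ISP templates
The library DNA binds to the ISPs and is amplified on them (Figure 1); the reaction product is a water-in-oil emulsion of ISPs. After enrichment, biotinylated MyOne beads are added and bind specifically to the amplified ISPs. Melt-off solution is then added to denature the DNA templates on the ISPs from double-stranded to single-stranded, yielding single-stranded ISPs.
For this step, follow the Ion OneTouch™ Template Kit instructions and place the reagents in the designated positions. The step is then carried out by the ES instrument of the OneTouch automated system.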
Melt-off solution composition:
Table 12 (image in the original publication; not reproduced)
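The single-stranded ISPs were checked with a Qubit 2.0 (Invitrogen) and, once they met the sequencing requirements, were carried to the next step.
Example 2: Sequencing of the constructed libraries
The libraries obtained in Example 1 were sequenced on the PGM using different sequencing chips (314/316), strictly following the instrument's recommended procedure.
For the sequencing procedure, see the PGM user manual. The 100 bp library was sequenced with 100 bp sequencing reagents, and the 200 bp library with 200 bp sequencing reagents. Install the corresponding sequencing chip (e.g., the 314, 316 or 318 chip).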
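In the data reported here, the 100 bp and 200 bp libraries were run on the 314 and 316 chips respectively. Enzyme and the prepared single-stranded ISPs were loaded onto the chip for sequencing. The 314 chip served as the pre-sequencing chip. Pre-sequencing on it showed the library to be qualified (mean Q20 of 80.7%), and formal sequencing was then performed on a 316 chip.

In this embodiment, the quality control method of the present invention was applied with both the 314 and 316 chips. On the 314 chip the mean read length was 100 bp and the mean Q20 was 80.7%. On the 316 chip the mean read length was 200 bp and Q20 stayed at 60.9% (Figure 3), a difference of about 20 percentage points. The 314 chip scores higher because its reads are shorter.

It follows that when the 314 chip serves as the QC chip for the 316 chip, its Q20 must be discounted by a certain amount (e.g., 20 percentage points), after which it reflects library quality effectively. If different libraries are labelled with index sequences, several libraries can be tested simultaneously on one low-cost 314 chip.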
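The text's discount (about 20 percentage points from 314 to 316) can be expressed as a simple offset. The following is a minimal Python sketch, with these assumptions:

- The Q20 drop seen on a reference library carries over unchanged to new libraries at the same read lengths. This is a simplification of the text's observation that the downward trend is the same across chips.
- The function names are hypothetical.
- The 85.0% input is an invented example.

```python
def q20_offset(pre_q20_pct, formal_q20_pct):
    """Q20 drop (percentage points) from the short-read pre-sequencing chip
    to the longer-read formal chip, measured on a reference library."""
    return pre_q20_pct - formal_q20_pct


def projected_formal_q20(pre_q20_pct, offset_pct):
    """Rough projection of formal-chip Q20 from a new library's pre-sequencing Q20."""
    return pre_q20_pct - offset_pct


# Reference values from this embodiment: 80.7% on the 314 chip, 60.9% on the 316 chip.
offset = q20_offset(80.7, 60.9)  # about 19.8 percentage points
# Hypothetical new library scoring 85.0% on the 314 chip.
print(round(offset, 1), round(projected_formal_q20(85.0, offset), 1))
```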
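The table below gives the expected run time and expected data output of each chip at different read lengths. Low-capacity chips run faster, and their capacity is still enough for pooled QC of several libraries, so less time is lost to sequencing unqualified libraries directly.

For example, pooled QC of 10 libraries on a 314 chip takes only 1.5 h. If one of the libraries is unqualified, sequencing it directly on a 318 chip would have taken 2.4 h, so 0.9 h is saved. If none of the libraries is unqualified, the QC data can be used directly as sequencing data, so no extra time is spent either.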
Table 13 (expected run time and data output per chip and read length; image in the original publication, not reproduced)
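The time-saving argument above reduces to one subtraction. The following is a minimal Python sketch in integer minutes, which avoids float rounding in 2.4 - 1.5. The function name is hypothetical; the 90 and 144 minute inputs are the 1.5 h and 2.4 h figures from the text.

```python
def minutes_saved(qc_minutes, formal_minutes, any_library_failed):
    """Time saved by pooled pre-sequencing QC, following the reasoning in the text.

    If a library fails, the formal run it would have wasted is avoided, minus the
    QC run itself; if all pass, the QC data are reused as sequencing data, so the
    QC run costs no extra time and saves none.
    """
    if any_library_failed:
        return formal_minutes - qc_minutes
    return 0


# 10 pooled libraries on a 314 chip: 1.5 h (90 min); a direct 318 run: 2.4 h (144 min).
print(minutes_saved(90, 144, True))  # 54, i.e. 0.9 h
```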
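References
1. Ion Xpress™ Template 200 Kit user guide. Life Technologies.
2. Ion OneTouch™ Template Kit user guide. Life Technologies.
Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that various modifications and substitutions may be made to those details in light of all the teachings disclosed, and all such changes fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.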

Claims

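1. A quality control method for a sequencing library, comprising: before a library is formally sequenced on a formal chip, pre-sequencing the library on a pre-sequencing chip and judging from the pre-sequencing result whether the library is qualified, an unqualified library not being formally sequenced, wherein the capacity of the pre-sequencing chip is smaller than the capacity of the formal chip.
2. The quality control method for a sequencing library according to claim 1, characterized in that the capacity of the pre-sequencing chip is at least 1% of the capacity of the formal chip.
3. The quality control method for a sequencing library according to claim 2, characterized in that the capacity of the pre-sequencing chip is 1% to 10% of the capacity of the formal chip.
4. The quality control method for a sequencing library according to any one of claims 1 to 3, characterized in that construction of the sequencing library includes an emulsion PCR process.
5. The quality control method for a sequencing library according to claim 4, characterized in that the pre-sequencing uses the pre-sequencing chip to test a single library or a pooled library.
6. The quality control method for a sequencing library according to claim 5, characterized in that the pooled library is labelled with index sequences.
7. The quality control method for a sequencing library according to claim 5, characterized in that the sequencing is performed on a high-throughput sequencing system selected from at least one of the Ion Torrent sequencing platform, the ABI SOLiD sequencing platform and the Roche 454 sequencing platform, the Ion Torrent sequencing platform including the Ion PGM and the Ion Proton.
8. The quality control method for a sequencing library according to claim 7, characterized in that the pre-sequencing chip is at least one of the PGM 314 chip, the 316 chip and the 318 chip, and the pre-sequencing is performed on the Ion Torrent sequencing platform.
9. The quality control method for a sequencing library according to claim 8, characterized in that the library is judged qualified if the pre-sequencing result has a Q20 greater than 80%, Q20 being the proportion of all bases whose quality value is greater than 20.
10. A nucleic acid sequencing method, comprising a step of sequencing a sequencing library, characterized in that the method further comprises, before the step of sequencing the sequencing library, a step of performing quality control on the sequencing library, the quality control being performed by the quality control method according to any one of claims 1 to 9.
11. The nucleic acid sequencing method according to claim 10, characterized in that, in the sequencing method, if the quality control result is qualified, the data obtained by pre-sequencing on the pre-sequencing chip are pooled with the data obtained by subsequent formal sequencing, together serving as valid sequencing data.
12. The nucleic acid sequencing method according to claim 11, characterized by further comprising a step of preparing the sequencing library, the step comprising fragmenting a DNA sample, repairing the ends of the fragments, ligating adapters, amplifying the target fragments by emulsion PCR, and then recovering the target fragments to obtain the sequencing library.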
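13. A quality control system for a sequencing library, characterized by comprising a pre-sequencing module provided with a pre-sequencing chip, the pre-sequencing module being configured to pre-sequence a library on the pre-sequencing chip before the library is formally sequenced on a formal chip, the pre-sequencing result being usable to judge whether the library is qualified, wherein the capacity of the pre-sequencing chip is smaller than the capacity of the formal sequencing chip.
14. The system according to claim 13, characterized in that the capacity of the pre-sequencing chip is at least 1% of the capacity of the formal chip.
15. The system according to claim 14, characterized in that the capacity of the pre-sequencing chip is 1% to 10% of the capacity of the formal chip.
16. The system according to any one of claims 13 to 15, characterized in that construction of the sequencing library includes an emulsion PCR process.
17. The system according to claim 16, characterized in that the pre-sequencing uses the pre-sequencing chip to test a single library or a pooled library.
18. The system according to claim 17, characterized in that the pooled library is labelled with index sequences.
19. The system according to claim 17, characterized in that the sequencing is performed on a high-throughput sequencing system selected from at least one of the Ion Torrent sequencing platform, the ABI SOLiD sequencing platform and the Roche 454 sequencing platform, the Ion Torrent sequencing platform including the Ion PGM and the Ion Proton.
20. The system according to claim 19, characterized in that the pre-sequencing chip is at least one of the PGM 314 chip, the 316 chip and the 318 chip, and the pre-sequencing is performed on the Ion Torrent sequencing platform.
21. The system according to claim 20, characterized in that the library is judged qualified if the pre-sequencing result has a Q20 greater than 80%, Q20 being the proportion of all bases whose quality value is greater than 20.
22. A nucleic acid sequencing system, comprising a formal sequencing module configured to formally sequence a sequencing library on a formal chip, characterized by further comprising the quality control system for a sequencing library according to any one of claims 13 to 21, configured to perform quality control on the sequencing library before formal sequencing; if the quality control result is qualified, formal sequencing is performed, and the data obtained by pre-sequencing on the pre-sequencing chip in the quality control system are pooled with the data obtained by formal sequencing in the formal sequencing module, together serving as valid sequencing data; if the quality control result is unqualified, formal sequencing is not performed.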
PCT/CN2012/084757 2012-11-16 2012-11-16 Nucleic acid sequencing method and system, and quality control method and system WO2014075296A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
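CN201280076470.5A CN104822842A (zh) 2012-11-16 2012-11-16 Nucleic acid sequencing method and system, and quality control method and system
PCT/CN2012/084757 WO2014075296A1 (zh) 2012-11-16 2012-11-16 Nucleic acid sequencing method and system, and quality control method and system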

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/084757 WO2014075296A1 (zh) 2012-11-16 2012-11-16 Nucleic acid sequencing method and system, and quality control method and system

Publications (1)

Publication Number Publication Date
WO2014075296A1 true WO2014075296A1 (zh) 2014-05-22

Family

ID=50730515

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/084757 WO2014075296A1 (zh) 2012-11-16 2012-11-16 Nucleic acid sequencing method and system, and quality control method and system

Country Status (2)

Country Link
CN (1) CN104822842A (zh)
WO (1) WO2014075296A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016044328A1 (en) * 2014-09-18 2016-03-24 The Regents Of The University Of California Single-molecule phenotype analysis

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
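KR102316648B1 (ko) 2018-01-05 2021-10-22 Illumina, Inc. Prediction of reagent chiller instability and flow cell heater failure in a sequencing system
US11288576B2 (en) * 2018-01-05 2022-03-29 Illumina, Inc. Predicting quality of sequencing results using deep neural networks
CN109629008B (zh) * 2018-12-29 2021-12-03 艾吉泰康生物科技(北京)有限公司 Quality control method for reagent components used in next-generation sequencing library construction, and template combination used therein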

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AIRD D. ET AL: "ANALYZING AND MINIMIZING PCR AMPLIFICATION BIAS IN ILLUMINA SEQUENCING LIBRARIES", GENOME BIOLOGY, 2011, pages 1 - 14 *
CHIEN Z. ET AL: "CONSTRUCTION OF 12 MULTIPLEX TRANSCRIPTOME LIBRARIES FOR ROCHE/454 PYROSEQUENCING PLATFORM", JOURNAL OF XIAMEN UNIVERSITY (NATURAL SCIENCE), vol. 51, no. 4, July 2012 (2012-07-01), pages 774 - 781 *
KIDDER B.L.ET AL: "CHIP-SEQ:TECHNICAL CONSIDERATIONS FOR OBTAINING HIGH-QUALITY DATA", NATURE IMMUNOLOGY, vol. 12, no. 10, 2011, pages 918 - 922, XP055240604 *
WANG SHENGYUE: "PROSPECTS OF CLINICAL APPLICATION OF NEW GENERATIONAL HIGH FLUX SEQUENCING TECHNIQUE", GUANGDONG MEDICAL JOURNAL, vol. 31, no. 3, February 2010 (2010-02-01), pages 269 - 272 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016044328A1 (en) * 2014-09-18 2016-03-24 The Regents Of The University Of California Single-molecule phenotype analysis
US11634706B2 (en) 2014-09-18 2023-04-25 The Trustees of the California State University Single-molecule phenotype analysis

Also Published As

Publication number Publication date
CN104822842A (zh) 2015-08-05

Similar Documents

Publication Publication Date Title
Hayashi et al. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs
US20240150828A1 (en) Sequencing methods and compositions for prenatal diagnoses
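RU2752700C2 (ru) Methods and compositions for DNA profiling
CN111254190B (zh) Nanopore third-generation sequencing detection method for plasma viromics
WO2012037880A1 (zh) DNA tags and use thereof
CN108138365A (zh) High-throughput single-cell transcriptome library construction method
Holmberg et al. Akonni TruTip® and Qiagen® methods for extraction of fetal circulating DNA-evaluation by real-time and digital PCR
WO2023030259A1 (zh) Primer composition, kit and method for detecting microhaplotype loci based on next-generation sequencing technology, and use thereof
WO2014075296A1 (zh) Nucleic acid sequencing method and system, and quality control method and system
WO2017093561A1 (en) Method for non-invasive prenatal testing
CN108300790B (zh) Forensic next-generation sequencing kit based on 165 Y-SNPs
WO2016045105A1 (zh) Pf rapid library construction method and use thereof
CN109295500B (zh) Single-cell methylation sequencing technique and use thereof
CN109161587A (zh) Method for detecting breakpoints and location information of chromosomal duplicated segments
Morrissy et al. Digital gene expression by tag sequencing on the illumina genome analyzer
TW201321520A (zh) Methods and systems for virus detection
WO2023155847A1 (zh) Method and kit for constructing a sequencing library for detecting chromosomal copy number variation
Ezer et al. Generation of RNA sequencing libraries for transcriptome analysis of globin-rich tissues of the domestic dog
CN113186291B (zh) Multiplex PCR-based primer set and kit
WO2022246783A1 (zh) Probe composition for identifying or assisting in identifying mammalian species, and kit and use thereof
Carbonell-Sala et al. CapTrap-Seq: A platform-agnostic and quantitative approach for high-fidelity full-length RNA transcript sequencing
WO2020135650A1 (zh) Method for constructing a gene sequencing library
CN113293200B (zh) Method for reducing or eliminating amplification product contamination in next-generation sequencing, and use thereof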
Roth Application of Synthetic External RNA Controls in a Targeted Hybrid Capture Assay
Liu et al. Integrative lncRNA, circRNA, and mRNA analysis reveals expression profiles of six forensic body fluids/tissue

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12888433

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 04/11/2015)

122 Ep: pct application non-entry in european phase

Ref document number: 12888433

Country of ref document: EP

Kind code of ref document: A1