HK1237002B - Primer for nucleic acid random fragmentation and nucleic acid random fragmentation method

Info

Publication number: HK1237002B
Application number: HK17110727.0A
Authority: HK
Inventors: 耿春雨; 韩鸿雁; 郭冠瑛; 章文蔚; 蒋慧; 江媛
Original assignee: 深圳华大智造科技股份有限公司
Filing date: 2014-10-17
Publication date: 2022-07-08

Description

A primer for random nucleic acid fragmentation and a method for random nucleic acid fragmentation

Technical Field

The present application relates to the field of nucleic acid fragmentation, and in particular to a primer for random nucleic acid fragmentation and a method for random nucleic acid fragmentation based on that primer.

Background Art

Since Roche's pyrosequencing method opened up next-generation sequencing (NGS), the field has grown rapidly. As high-throughput sequencing has developed, high-throughput, low-cost sample preparation has become a key consideration. Sample processing methods and automated devices based on various principles have been developed, mainly covering sample fragmentation, end processing of nucleic acid molecules, and adapter ligation.

Sample fragmentation is achieved mainly by physical methods, such as ultrasonic shearing, or by enzymatic methods, that is, treatment with non-specific endonucleases. Among physical methods, the dominant one is Covaris, based on its patented Adaptive Focused Acoustics (AFA) technology. Under isothermal conditions, a spherical solid-state ultrasonic transducer operating at >400 kHz geometrically focuses acoustic energy with a wavelength of 1 mm onto the sample. This preserves the integrity of the nucleic acid sample and gives a high recovery rate. Covaris instruments include the economical M series, the single-tube, full-power S series, and the higher-throughput E and L series. Fragments produced by physical shearing are well randomized, but throughput depends on having many Covaris instruments, and end processing, adapter addition, PCR, and various purification steps must still be carried out separately afterwards. Among enzymatic methods is NEBNext dsDNA Fragmentase from NEB. This reagent first creates random nicks in double-stranded DNA; a second enzyme then recognizes the nicks and cuts the complementary strand, thereby fragmenting the DNA. The reagent can be used on genomic DNA, whole-genome amplification products, PCR products, and so on, and its randomness is also good. However, it can introduce artificial short insertions and deletions, and it likewise requires separate end processing, adapter addition, PCR, and the corresponding purification steps afterwards. In addition, transposase-based fragmentation kits, led by the Nextera kit from Epicentre, use a transposase to fragment the DNA and add adapters at the same time, which shortens sample processing.

In terms of ease of use, transposase fragmentation clearly outperforms the other methods in both throughput and simplicity, but it has drawbacks of its own. Transposition depends on a specific 19-bp Me sequence. Although the transposase can attach different adapter sequences to the 5' and 3' ends of the target sequence by being loaded with two entirely different adapters, both adapters must contain the Me sequence. As a result, every fragment carries a Me sequence symmetrically at each end, and, because of the way the transposase acts, a 9-nt gap is left between the target sequence (the fragment) and the Me sequence. Identical Me sequences flanking the target sequence affect some downstream applications. In ligation-based next-generation sequencing, for example, the Me sequences on the two sides of the same strand are complementary to each other, so the single-stranded molecule readily anneals to itself, which hinders binding of the anchor primer.

A related patent application, CN 102703426 A, proposed a solution in which the fragmented sequences are digested with a specific endonuclease to remove the 9-nt sequence and the Me sequence. That method uses the transposase only to fragment the nucleic acid randomly, and it introduces the drawback that adapters must then be added in a separate step. The procedure is cumbersome and not well suited to higher-throughput applications.

Summary of the Invention

The purpose of the present application is to provide a new primer for random nucleic acid fragmentation and, on that basis, a new double-random-primer DNA fragmentation method, that is, a method for random nucleic acid fragmentation.

To achieve the above purpose, the present application adopts the following technical solutions:

In one aspect, the present application discloses a primer for random nucleic acid fragmentation, consisting of a plurality of upstream random primers and a plurality of downstream random primers. The sequence composition of the upstream random primer is 5'-X-Y-3', and that of the downstream random primer is 5'-P-Y'-X'-close-3', where Y and Y' are random sequences, X is all or part of the sequence of the 5'-end adapter of the sequencing platform, X' is all or part of the sequence of the 3'-end adapter of the sequencing platform, P is a phosphorylation modification, and close is a blocking modification that prevents formation of a 3'-5' phosphodiester bond.

It should be noted that the upstream and downstream random primers of the present application differ from ordinary PCR primers. By design, both hybridize to the same template strand. During the extension stage, only the upstream random primer extends, toward its 3' end, filling the gap between it and the nearest downstream random primer. Because the 5' end of the downstream random primer is phosphorylated, the 3' end of the extension product can be ligated to the 5' end of the downstream random primer, joining the upstream random primer, its extension, and the downstream random primer into one segment. Since both primers hybridize to the template at random, the DNA sample is fragmented randomly. Unlike amplification of a DNA sample with conventional random primers, only the upstream random primer is extended here, and extension stops as soon as it reaches the downstream random primer. This avoids what happens in random-primer amplification, where in the second round the strand extended from one upstream primer may pair with several downstream primers, producing a series of smeared bands that compromise the accuracy of the result.

It should also be noted that the two random sequences Y and Y', like the sequences of conventional random primers, consist of 3-12 bases, preferably 5-9 bases. X and X' are adapter sequences of the sequencing platform and prepare the fragments for subsequent sequencing. X may be all or only part of the 5'-end adapter because, after the DNA sample has been randomly fragmented, the purified random fragments must still be amplified by PCR with a pair of universal primers. When those PCR primers amplify the random fragments, primer design can complete the full 5'-end adapter sequence, so the X sequence in the upstream random primer may be all or part of that adapter. For the same reason, X' is all or part of the sequence of the 3'-end adapter of the sequencing platform.

Preferably, the blocking modification is a dideoxy modification. The role of the blocking modification is to prevent extension from the 3' end of the downstream random primer and to prevent downstream random primers from being ligated to one another. In theory, therefore, any blocking modification that prevents formation of a 3'-5' phosphodiester bond can be used in the present application. After weighing several factors, the present application confirms that the dideoxy modification both blocks the 3' end and does not interfere with hybridization of the random primer.

Preferably, the 5' end of the X sequence of the upstream random primer further includes 2-6 protection bases, and the 3' end of the X' sequence of the downstream random primer further includes 2-6 protection bases, with the blocking modification placed on the terminal protection base.

It should be noted that the protection bases serve to make the sequence more stable. In less demanding applications, the protection bases may be omitted.

Preferably, several spacer bases are included between the X and Y sequences of the upstream random primer, and several spacer bases are included between the Y' and X' sequences of the downstream random primer. Spacer bases are not mandatory. Like protection bases, they help keep the sequence stable, but they are amplified in the PCR that follows purification, so they leave a spacer between the adapter and the randomly fragmented DNA. Unless there is a special requirement, it is therefore best not to add spacer bases.

In one implementation of the present application, the X sequence of the upstream random primer has the sequence shown in Seq ID No. 1, and the X' sequence of the downstream random primer has the sequence shown in Seq ID No. 2:

Seq ID No. 1: 5'-GACCGCTTGGCCTCCGACT-3'

Seq ID No. 2: 5'-GTCTCCAGTCGAAGCCCGA-3'.
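As a rough illustration of the 5'-X-Y-3' and 5'-P-Y'-X'-close-3' layouts, the sketch below assembles one primer of each kind from the two adapter sequences above. The function names, the choice of an 8-base random region (within the preferred 5-9 range), and the text labels for the modifications are assumptions made for this example only.

```python
import random

BASES = "ACGT"

def random_bases(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def upstream_primer(x, y_len, rng):
    """5'-X-Y-3': adapter sequence X followed by a random 3' region Y."""
    return x + random_bases(y_len, rng)

def downstream_primer(x_prime, y_len, rng):
    """5'-P-Y'-X'-close-3': random region Y' followed by adapter X'.

    The modifications are returned as plain labels (hypothetical notation):
    a 5' phosphate (P) and a 3' dideoxy block (close).
    """
    return random_bases(y_len, rng) + x_prime, "5'-phosphate", "3'-dideoxy"

X = "GACCGCTTGGCCTCCGACT"        # Seq ID No. 1
X_PRIME = "GTCTCCAGTCGAAGCCCGA"  # Seq ID No. 2

rng = random.Random(0)
up = upstream_primer(X, 8, rng)
down, five_prime_mod, three_prime_mod = downstream_primer(X_PRIME, 8, rng)
print("upstream:  ", up)
print("downstream:", five_prime_mod, down, three_prime_mod)
```

In a real primer pool every molecule carries a different random region; calling the functions repeatedly gives a feel for that diversity.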

In another aspect, the present application discloses a method for random nucleic acid fragmentation, comprising double-random anchoring of a DNA sample with the primers of the present application. Specifically, the upstream and downstream random primers are hybridized to the denatured DNA sample. Between each upstream random primer and its nearest downstream random primer, a DNA polymerase extends the upstream random primer toward its 3' end until the sequence between the two primers is filled in. A DNA ligase then joins the 3' end of the extension product to the 5' end of the downstream random primer, so that the upstream random primer, its extension, and the downstream random primer form one segment. Random hybridization of the upstream and downstream random primers thus achieves double-random fragmentation of the DNA sample.
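The anchoring, extension, and ligation steps can be pictured with a toy model. The sketch below is not the patented procedure: it assumes primers land at uniformly random positions on a single strand, ignores primer length and hybridization efficiency, and counts a product whenever an upstream primer is immediately followed by a downstream primer. The function name and parameters are hypothetical.

```python
import random

def simulate_fragments(template_len, n_up, n_down, rng):
    """Return template spans (in bases) of toy ligation products.

    A product forms only where an upstream primer's nearest neighbour on
    its 3' side is a downstream primer; extension stops at that primer.
    """
    sites = [(rng.randrange(template_len), "up") for _ in range(n_up)]
    sites += [(rng.randrange(template_len), "down") for _ in range(n_down)]
    sites.sort()
    spans = []
    for (pos_a, kind_a), (pos_b, kind_b) in zip(sites, sites[1:]):
        if kind_a == "up" and kind_b == "down":
            spans.append(pos_b - pos_a)
    return spans

rng = random.Random(1)
sparse = simulate_fragments(1_000_000, 200, 100, rng)
dense = simulate_fragments(1_000_000, 2000, 1000, rng)
mean_sparse = sum(sparse) / len(sparse)
mean_dense = sum(dense) / len(dense)
print(f"mean span with 300 primers:  {mean_sparse:.0f} bases")
print(f"mean span with 3000 primers: {mean_dense:.0f} bases")
```

Under these assumptions, a tenfold increase in primer count shortens the mean span roughly tenfold, which matches the qualitative statement that denser primers give smaller fragments.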

It should be noted that, in the method of the present application, given amounts of the upstream random primers, the downstream random primers, and the denatured DNA sample are added to a reaction mixture containing DNA polymerase, DNA ligase, and dNTPs, in which the extension and ligation reactions take place. Hybridization of the two primers and extension of the upstream random primer follow the same principle as conventional PCR: the random primers first hybridize to the denatured DNA sample, and the DNA polymerase then extends the 3' end of the upstream random primer. Because only the 5' end of the downstream random primer is phosphorylated, only an upstream random primer and its nearest downstream random primer can be joined into one segment by the DNA ligase after extension. This ensures that random fragmentation is accurate and avoids the introduction of erroneous structures.

Preferably, when the upstream and downstream random primers are hybridized to the denatured DNA sample, the total amount of upstream and downstream random primers is R×n picomoles, where 2.7≤R≤750 and n=1.515×(m÷L). Here m is the mass of the DNA sample in nanograms, L is the expected fragment length after fragmentation, and n is the theoretical amount, in picomoles, of upstream and downstream random primers needed to fragment the DNA sample into fragments of length L.
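The dosing rule above is easy to evaluate numerically. The following sketch applies n=1.515×(m÷L) and the stated range 2.7≤R≤750; the function names and the example input values are assumptions chosen for illustration.

```python
def theoretical_primer_pmol(m_ng, frag_len_bp):
    """Theoretical primer amount n (pmol) for m ng of DNA and target length L (bp)."""
    if m_ng <= 0 or frag_len_bp <= 0:
        raise ValueError("mass and fragment length must be positive")
    return 1.515 * (m_ng / frag_len_bp)

def total_primer_pmol(m_ng, frag_len_bp, r):
    """Total upstream + downstream primer amount R x n (pmol)."""
    if not 2.7 <= r <= 750:
        raise ValueError("R must lie in the range 2.7 to 750")
    return r * theoretical_primer_pmol(m_ng, frag_len_bp)

# Example inputs (assumed): 100 ng of DNA, 500 bp target fragments, R = 20.
n = theoretical_primer_pmol(100, 500)
total = total_primer_pmol(100, 500, 20)
print(f"n = {n:.3f} pmol, total = {total:.2f} pmol")
```

For these inputs, n is about 0.303 pmol and the total is about 6.06 pmol.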

It should be noted that the amount of upstream and downstream random primers depends on the degree of fragmentation required. The more primer is used, the more densely random primers hybridize along the DNA strand and the shorter the distance between an upstream random primer and its nearest downstream random primer, so the ligated segments, and hence the DNA fragments obtained, are smaller; in other words, the degree of fragmentation is higher. Conversely, less primer gives larger random fragments. The present application derives the theoretical primer amount n needed to fragment a DNA sample into fragments of length L and, from extensive experimental analysis, concludes that the total amount of upstream and downstream random primers actually required for a given target length L is R times the theoretical amount n, that is, R×n picomoles. The larger R is, the smaller the fragment length L, where L is the number of base pairs in a fragment, in bp.

Preferably, the molar ratio of upstream random primer to downstream random primer is 1-3:1, more preferably 2:1, and preferably R=20.
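Given a total primer amount, the molar ratio determines how much of each primer to add. The sketch below splits a total between the two primers; the function name and the example total of 6.0 pmol are assumptions for illustration.

```python
def split_primers(total_pmol, up_to_down_ratio=2.0):
    """Split a total primer amount into (upstream, downstream) pmol.

    up_to_down_ratio is the upstream:downstream molar ratio, restricted
    here to the stated range of 1:1 to 3:1 (2:1 preferred).
    """
    if not 1.0 <= up_to_down_ratio <= 3.0:
        raise ValueError("ratio must lie between 1:1 and 3:1")
    downstream = total_pmol / (up_to_down_ratio + 1.0)
    upstream = total_pmol - downstream
    return upstream, downstream

upstream_pmol, downstream_pmol = split_primers(6.0)
print(f"upstream {upstream_pmol:.2f} pmol, downstream {downstream_pmol:.2f} pmol")
```

At the preferred 2:1 ratio, a 6.0 pmol total corresponds to about 4.0 pmol of upstream primer and 2.0 pmol of downstream primer.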

In another aspect, the present application discloses a method for constructing a nucleic acid library, comprising randomly fragmenting a DNA sample by the random nucleic acid fragmentation method of the present application and then amplifying the double-randomly fragmented DNA by PCR with a pair of universal primers, thereby obtaining a nucleic acid library enriched in random fragments. The universal primers consist of a forward primer and a reverse primer. The 3' end of the forward primer has all or part of the sequence of the sequencing platform's 5'-end adapter carried by the upstream random primer, and the 3' end of the reverse primer has all or part of the reverse complement of the sequencing platform's 3'-end adapter carried by the downstream random primer.

It should be noted that the forward and reverse primers mentioned here are conventional PCR primers. They are designed against the randomly fragmented DNA and use it as the template for PCR, amplifying the signal of the random fragments. That is why the 3' end of the forward primer has all or part of the sequence of the sequencing platform's 5'-end adapter, and the 3' end of the reverse primer has all or part of the reverse complement of the sequencing platform's 3'-end adapter. Because the upstream and downstream random primers contain all or part of the adapter sequences, and the forward and reverse primers also contain part or all of them, it is sufficient that the PCR product contains the complete adapter sequences. Taking the upstream random primer and the forward primer as an example: when the X sequence of the upstream random primer is the entire 5'-end adapter of the sequencing platform, the 3' end of the forward primer may be the entire 5'-end adapter or only the 5' portion of it, as long as amplification from the X sequence yields the complete 5'-end adapter sequence. Likewise, when the X sequence is only part of the 5'-end adapter, the forward primer can supply the rest. In that case the 3' end of the forward primer may be the entire 5'-end adapter, or the missing portion plus a portion that overlaps the adapter sequence in X, so that the complete 5'-end adapter is amplified. The 3' end of the forward primer can therefore be all or part of the sequence of the sequencing platform's 5'-end adapter; similarly, the 3' end of the reverse primer has all or part of the reverse complement of the sequencing platform's 3'-end adapter.

It should also be noted that, although the primers of the present application reliably fragment the DNA sample at random without producing interfering smeared fragments, no real PCR amplification takes place when the upstream and downstream random primers are joined; the upstream random primer is merely extended. The copy number of the random DNA fragments is therefore limited, and the fragments must be amplified by PCR with the forward and reverse primers for sequencing or library construction. Both primers are designed from the adapters of the sequencing platform. Since every random DNA fragment contains part or all of the platform's adapter sequences, once those adapter sequences are fixed the forward and reverse primers amplify all of the fragments, which is why they are called universal primers. In addition, after ligation the ligated fragments usually need to be purified before PCR. They can be purified by conventional DNA purification methods; the present application preferably separates and purifies the DNA fragments with magnetic beads.

Preferably, the 5' ends of the forward primer and the reverse primer each carry an adapter sequence of a second sequencing platform.

It should be noted that carrying an adapter sequence of a second sequencing platform at the 5' ends of the forward and reverse primers is optional. If sequencing is needed on only one platform, the 5'-end and 3'-end adapters of that platform are sufficient; the adapters of the second platform are included to accommodate later sequencing needs.

In one implementation of the present application, the forward primer contains the sequence shown in Seq ID No. 1, and the reverse primer contains the sequence shown in Seq ID No. 3:

Seq ID No. 3: 5'-TCGGGCTTCGACTGGAGAC-3'.
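The reverse primer is described as carrying the reverse complement of the 3'-end adapter, so Seq ID No. 3 should be the reverse complement of Seq ID No. 2. The short check below confirms this for the listed sequences; the helper function name is an assumption for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

SEQ_ID_2 = "GTCTCCAGTCGAAGCCCGA"  # X' in the downstream random primer
SEQ_ID_3 = "TCGGGCTTCGACTGGAGAC"  # reverse primer

matches = reverse_complement(SEQ_ID_2) == SEQ_ID_3
print("Seq ID No. 3 is the reverse complement of Seq ID No. 2:", matches)
```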

By adopting the above technical solutions, the present application achieves the following beneficial effects:

The primers of the present application, and the random nucleic acid fragmentation method based on them, are suitable for a variety of DNA samples, including random fragmentation of cDNA. The upstream and downstream double random primers can cover almost the entire target sequence with high coverage uniformity, providing a better basis for subsequent molecular biology work and data mining. The method needs only a few common enzymes and a PCR instrument to complete the entire sample processing workflow, which makes it possible for small and medium-sized research institutions, universities, and downstream application users to prepare high-throughput libraries independently. Because the target sequence only needs to be denatured and then subjected to random anchoring of the random primers and a polymerization-ligation reaction before amplification, the method is simple and fast and removes the dependence on large high-end instruments or expensive kits. Its ease of operation greatly lowers the skill required of technicians and broadens the range of applications of large-scale high-throughput sequencing.

Brief Description of the Drawings

Figure 1 is a step-by-step schematic of the random DNA fragmentation method in an embodiment of the present application;

Figure 2 shows electrophoresis results from condition optimization in an embodiment of the present application;

Figure 3 shows electrophoresis results from condition optimization in an embodiment of the present application.

Detailed Description

The primers designed in the present application bind the upstream and downstream random primers to the same template strand rather than to the two complementary strands. The aim is not PCR amplification. Instead, as shown in Figure 1, extension of the upstream random primer fills the gap between adjacent upstream and downstream random primers, and a DNA ligase then joins them into a complete fragment. Because both the upstream and downstream primers are random, the DNA sample is fragmented in a double-random manner. And because the upstream random primer stops extending once it reaches the downstream random primer, the method avoids what happens when a DNA sample is amplified by PCR with random primers, where in the second round the strand extended from one upstream primer pairs with several downstream primers and produces smeared fragment bands that interfere with subsequent steps.

The random nucleic acid fragmentation method based on the primers of the present application takes little time and runs almost entirely on a PCR instrument, so sample preparation is very fast and can be automated. Compared with other methods, it reduces operator error and systematic error in sample processing, and it removes the dependence on large high-end instruments or expensive kits.

It should be noted that, in the random nucleic acid fragmentation method of the present application, the upstream and downstream random primers are hybridized to a denatured DNA sample. The DNA sample can be denatured by heat or by chemical reagents. With heat denaturation, the treatment time is inversely related to the temperature: the higher the temperature, the shorter the treatment. A suitable denaturation temperature is 95-98°C with a treatment time of 1-5 minutes; in one implementation of the present application, the sample is held at 95°C for 5 minutes. Commonly used reagents for chemical denaturation include KOH, NaOH, EDTA, and the like, and are not specifically limited here. After chemical denaturation, the alkali in the reaction must be neutralized before annealing so that the reaction remains in a neutral environment with a suitable salt concentration. The neutralization buffer can be any of various low-concentration acidic buffers, such as a combined HCl and Tris-HCl buffer, and is applied only to samples treated with alkaline solution.

The size of the extension products can be matched to the requirements of the sequencing platform by adjusting the molar ratio of the upstream random primer, the downstream random primer, and the template; this is not specifically limited here. In the present application, the number of random bases in the upstream and downstream random primers can follow different designs, including 5, 6, 7, or 8 random bases, to ensure that the primers bind at different positions on the target sequence. This is the same as for conventional random primers and is not specifically limited here. To prevent the 5' random primer and the 3' random primer from introducing erroneous structures in subsequent reactions, the 5' end of the upstream random primer is designed without phosphorylation, whereas the 5' end of the downstream random primer is phosphorylated and its 3' end carries a blocking modification, preferably a dideoxy modification, to prevent formation of a 3'-5' phosphodiester bond. In addition, the downstream and upstream random primers may also contain adapter sequences suitable for different sequencing platforms.

In addition, the DNA polymerase can be any conventional DNA polymerase, and the ligase used after extension can be any conventional DNA ligase; neither is specifically limited here. After ligation, in one implementation of the present application, the single-stranded DNA is purified by single-stranded DNA magnetic bead selection at a bead concentration of 1.0×. The PCR then uses the single-stranded DNA as the template to amplify the signal of the randomly fragmented DNA.

In the present application, the amount of upstream and downstream random primer directly affects the length of the DNA fragments obtained. In the preferred scheme, the upstream and downstream random primers are therefore added to the DNA sample in a total amount of R×n picomoles, where 2.7≤R≤750, n=1.515×(m÷L), m is the mass of the DNA sample in nanograms, and L is the expected fragment length after fragmentation. The formula n=1.515×(m÷L) was derived theoretically by the inventors, whereas the range of R was obtained from the different fragment sizes required and from extensive experimental analysis; the larger R is, the smaller the resulting fragments. The derivation of n=1.515×(m÷L) is as follows:

Taking the 3 Gb human genome as an example: it contains 3×10⁹ base pairs, so its molar mass is approximately

M = (3×10⁹ × 660) g/mol = (3×10⁶ × 660) ng/pmol = 1.98×10⁹ ng/pmol.

The number of moles of genomic DNA with a mass of m (ng) is

n₁ = m ÷ M = m ÷ (3×10⁶ × 660 × 10¹²) mol,

and the corresponding number of molecules is

N₁ = n₁ × Na = m ÷ M × Na.

The theoretical number of random primer molecules needed to fragment these molecules to a length of L (bp) is

n₂ = (3×10⁹ ÷ L) × N₁,

and the corresponding number of moles of random primer is

n = n₂ ÷ Na.

Therefore,

n = n₂ ÷ Na = (3×10⁹ ÷ L) × N₁ ÷ Na = (3×10⁹ ÷ L) × (m ÷ M × Na) ÷ Na

= 3×10⁹ × m ÷ L ÷ M = (3×10⁹ ÷ (1.98×10⁹)) × m ÷ L = 1.515 × m ÷ L pmol,

where Na is Avogadro's constant, Na = 6.02×10²³.

That is, fragmenting m nanograms of the 3 Gb human genome into fragments of length L theoretically requires n pmol of upstream and downstream random primer in total. Extensive experimental analysis shows, however, that obtaining fragments of length L in practice requires a total of R times n pmol, that is, R×n picomoles; the smaller L is, the larger R is.
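The dosing rule can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the coefficient 1.515 is taken directly from the derivation, and the sample values of 50 ng and 300 bp are chosen purely as an example.

```python
# Illustrative sketch; function names are assumptions, not part of the application.

COEFF = 1.515  # pmol * bp / ng, from the derivation: 3e9 / 1.98e9


def theoretical_primer_pmol(m_ng: float, l_bp: float) -> float:
    """Theoretical total primer amount n (pmol) for m ng of dsDNA and target length L (bp)."""
    if m_ng <= 0 or l_bp <= 0:
        raise ValueError("mass and fragment length must be positive")
    return COEFF * (m_ng / l_bp)


def actual_primer_pmol(m_ng: float, l_bp: float, r: float) -> float:
    """Actual total primer amount R x n (pmol); R is restricted to the stated range."""
    if not 2.7 <= r <= 750:
        raise ValueError("R must satisfy 2.7 <= R <= 750")
    return r * theoretical_primer_pmol(m_ng, l_bp)


if __name__ == "__main__":
    n = theoretical_primer_pmol(50, 300)
    print(f"n = {n:.4f} pmol")
    print(f"R x n (R = 20) = {actual_primer_pmol(50, 300, 20):.2f} pmol")
```

Rejecting values of R outside 2.7 to 750 is a design choice of this sketch that mirrors the stated range; it is not a claim about what happens experimentally outside that range.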

It should be noted that, as the derived formula shows, the theoretical amount n of upstream and downstream random primer has no direct theoretical dependence on the actual length of the DNA sample; what determines n is the desired fragment length L and the total mass of the DNA sample. The formula for n derived here from the 3 Gb human genome is therefore not limited to that genome. In other words, the random fragmentation method and library construction method of the present application are broadly applicable and can be used on any DNA sample, including cDNA.
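Repeating the derivation with a generic template length makes this independence explicit. The symbol G (template length in bp) is introduced here only for illustration; 660 g/mol per base pair is the value used in the derivation, and 660 g/mol equals 0.66 ng/pmol.

```latex
\begin{align*}
M &= G \times 660\ \text{g/mol} = G \times 0.66\ \text{ng/pmol},\\
n &= \frac{G}{L}\cdot\frac{m}{M}
   = \frac{G}{L}\cdot\frac{m}{0.66\,G}\ \text{pmol}
   = \frac{m}{0.66\,L}\ \text{pmol}
   \approx 1.515\,\frac{m}{L}\ \text{pmol}.
\end{align*}
```

G cancels, so only the sample mass m (ng) and the target fragment length L (bp) remain.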

It should also be noted that the theoretical amount n in the present application is derived for double-stranded DNA, and DNA samples obtained experimentally are usually double-stranded, so the derivation and limits on the amount of upstream and downstream random primer are broadly applicable. For special single-stranded DNA samples, it is sufficient to substitute the corresponding molar mass M and leave the rest unchanged; the molar mass of single-stranded DNA is approximately M = Y × 324.5 g/mol, where Y is the length of the single-stranded DNA. Because samples are usually double-stranded, the present application does not specifically limit the single-stranded case.

The present application is described in further detail below through a specific example and the drawings. The example only illustrates the present application further and should not be construed as limiting it.

Example

1. Primer Design

In this example, upstream and downstream random primers, each containing 8 random positions, were designed and ordered. The sequences are as follows:

Upstream random primer: 5'-GACGACCGCTTGGCCTCCGACTTNNNNNNNN-3'

Downstream random primer: 5'-P-NNNNNNNNGTCTCCAGTCGAAGCCCGACG-ddC-3'

In the upstream random primer, "GACCGCTTGGCCTCCGACT" is the 5' adapter sequence of the sequencing platform within the X sequence, the "GAC" preceding the 5' adapter consists of protective bases, "NNNNNNNN" is the random Y sequence, and a spacer base lies between the X and Y sequences. In the downstream random primer, "GTCTCCAGTCGAAGCCCGA" is the 3' adapter sequence of the sequencing platform within the X' sequence, the "CG" following the 3' adapter consists of protective bases, "NNNNNNNN" is the random Y' sequence, P denotes the phosphorylation modification, and ddC denotes the dideoxy modification. The random sequences are synthesized randomly and are not specifically limited here.
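The primer layout described above can be checked by rebuilding both primers from their labelled parts. The following is a minimal sketch: the function and constant names are hypothetical, the spacer base "T" is read off the upstream sequence given in this example, and the 5' phosphate and 3' ddC are treated as annotations rather than as part of the base string.

```python
# Illustrative sketch; names are assumptions, sequences are copied from the example.

ADAPTER_5P = "GACCGCTTGGCCTCCGACT"  # Seq ID No.1, 5' adapter within X
ADAPTER_3P = "GTCTCCAGTCGAAGCCCGA"  # Seq ID No.2, 3' adapter within X'


def upstream_primer(n_random: int = 8) -> str:
    """5'-[protective GAC][5' adapter][spacer T][random Y]-3' (no 5' phosphate)."""
    return "GAC" + ADAPTER_5P + "T" + "N" * n_random


def downstream_primer(n_random: int = 8) -> str:
    """5'-P-[random Y'][3' adapter][protective CG]-ddC-3' (modifications not encoded)."""
    return "N" * n_random + ADAPTER_3P + "CG"


def annotate_downstream(seq: str) -> str:
    """Add the 5' phosphate and 3' dideoxy-C labels in the notation used in the text."""
    return f"5'-P-{seq}-ddC-3'"


if __name__ == "__main__":
    print(f"5'-{upstream_primer()}-3'")
    print(annotate_downstream(downstream_primer()))
```

Changing `n_random` to 5, 6, or 7 gives the alternative random-sequence lengths mentioned in the description.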

The synthesized primers were diluted to 10 µM for later use.

2. Denaturation of Genomic DNA

This example uses chemical denaturation. Specifically, the extracted human genomic DNA was diluted to 50 ng/µL, and the denaturation reaction was set up as follows: 1 µL DNA sample, 0.6 µL ddH₂O, and 1 µL denaturation buffer, for a total of 2.6 µL. Denaturation was completed by incubating at room temperature for 3 minutes. In this example, the denaturation buffer consisted of 208 mM KOH and 1.3 mM EDTA.

3. Random Primer Annealing

1 µL of neutralization buffer (208 mM HCl, 312.5 mM Tris-HCl) was added to the denaturation reaction above and incubated at room temperature for 3 minutes. Then 1 µL of annealing mix was added, in which the downstream and upstream random primers were present at a ratio of 1:2 and the total amount of upstream and downstream random primer was 5.1 pmol.

The annealing mix was prepared as follows: 0.46 µL 10× phi buffer, 0.03 µL ddH₂O, 0.34 µL upstream random primer (10 µM), and 0.17 µL downstream random primer (10 µM), for a total of 1 µL.
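The stated total of 5.1 pmol and the 1:2 ratio follow from the volumes and the 10 µM stock concentration, since 1 µM equals 1 pmol/µL. A small sketch of that arithmetic, with hypothetical names:

```python
# Illustrative sketch; names are assumptions, values are taken from the annealing mix.


def pmol(volume_ul: float, conc_um: float) -> float:
    """Amount in pmol: 1 uM = 1 pmol/uL, so pmol = uL x uM."""
    return volume_ul * conc_um


ANNEALING_MIX_UL = {
    "10x phi buffer": 0.46,
    "ddH2O": 0.03,
    "upstream random primer (10 uM)": 0.34,
    "downstream random primer (10 uM)": 0.17,
}

upstream_pmol = pmol(ANNEALING_MIX_UL["upstream random primer (10 uM)"], 10)
downstream_pmol = pmol(ANNEALING_MIX_UL["downstream random primer (10 uM)"], 10)
total_pmol = upstream_pmol + downstream_pmol
total_volume_ul = sum(ANNEALING_MIX_UL.values())

if __name__ == "__main__":
    print(f"upstream: {upstream_pmol:.1f} pmol, downstream: {downstream_pmol:.1f} pmol")
    print(f"total primer: {total_pmol:.1f} pmol in {total_volume_ul:.2f} uL")
    print(f"downstream:upstream = 1:{upstream_pmol / downstream_pmol:.0f}")
```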

The reaction was incubated at room temperature for 10 minutes.

4. Sequence Extension

15.4 µL of extension mix, containing 0.85 nmol of dNTPs in total, was added to the reaction above. The extension mix was prepared as follows: 1.54 µL 10× phi buffer, 3.56 µL pure water, 1 µL dimethyl sulfoxide, 8 µL 5 M betaine, 0.85 µL dNTPs (0.25 mM each), 0.25 µL DNA polymerase (2 U/µL), and 0.2 µL DNA ligase (400 U/µL), for a total of 15.4 µL.

The extension conditions depend on the library size suited to the sequencing platform. In this example, extension was carried out at 37°C for 20 minutes, followed by 15 minutes at 65°C to heat-inactivate the DNA polymerase. Because the size of the random fragments is determined by the molar ratio of total upstream and downstream random primer to the DNA sample, larger fragments require a longer extension time and smaller fragments a shorter one. The extension conditions are therefore tied to the fragment size, that is, to the library size.

5. Purification of Ligation Products

The single-stranded ligation product was purified with magnetic beads; this example used 1.0× PEG32 magnetic beads. 30 µL of PEG32 magnetic beads was added to the 30 µL ligation reaction above, and the purified product was redissolved in pure water to give single-stranded, randomly fragmented DNA.

6. PCR

The purified randomly fragmented DNA was used as the template for amplification with a primer set designed against the 5' adapter and 3' adapter in the upstream and downstream random primers. The forward primer is shown in Seq ID No.4 and the reverse primer in Seq ID No.5:

Seq ID No.4: 5'-TCCTAAGACCGCTTGGCCTCCGACT-3'

Seq ID No.5: 5'-AGACAAGCTCGATCGGGCTTCGACTGGAGAC-3'

Compared with a forward primer having the sequence "GACCGCTTGGCCTCCGACT" shown in Seq ID No.1 and a reverse primer having the sequence "TCGGGCTTCGACTGGAGAC" shown in Seq ID No.3, the forward and reverse primers in this example each carry an adapter for a second sequencing platform at the 5' end, giving the primers shown in Seq ID No.4 and Seq ID No.5. Because an adapter added at the 5' end does not affect amplification, the forward and reverse primers shown in Seq ID No.1 and Seq ID No.3 are equally suitable for this example.
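The relationship among the five listed sequences can be verified directly: Seq ID No.3 is the reverse complement of Seq ID No.2, and Seq ID No.4 and No.5 are Seq ID No.1 and No.3 with extra bases at the 5' end. A minimal sketch, in which the helper name `reverse_complement` is an assumption and the sequences are copied from the text:

```python
# Illustrative sketch; helper name is an assumption, sequences are from the example.

SEQ_ID_1 = "GACCGCTTGGCCTCCGACT"
SEQ_ID_2 = "GTCTCCAGTCGAAGCCCGA"
SEQ_ID_3 = "TCGGGCTTCGACTGGAGAC"
SEQ_ID_4 = "TCCTAAGACCGCTTGGCCTCCGACT"
SEQ_ID_5 = "AGACAAGCTCGATCGGGCTTCGACTGGAGAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # The reverse primer core must anneal to the 3' adapter in the ligated product.
    print(reverse_complement(SEQ_ID_2) == SEQ_ID_3)
    # The second-platform adapters are the 5' extensions of Seq ID No.4 and No.5.
    print(SEQ_ID_4[: -len(SEQ_ID_1)], SEQ_ID_5[: -len(SEQ_ID_3)])
```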

PCR reaction mix: 20.5 µL purified single-stranded DNA, 25 µL 2× PCR buffer, 2 µL forward primer (20 µM), 2 µL reverse primer (20 µM), and 0.5 µL DNA polymerase (400 U/µL), for a total of 50 µL.

PCR conditions: denaturation at 95°C for 3 min; 15 cycles of 95°C for 30 sec, 55°C for 30 sec, and 72°C for 1 min; a final extension at 72°C for 10 min; then hold at 4°C.
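The cycling program can be written down as data, which also gives the programmed hold time. This is a sketch only: the class and function names are hypothetical, and ramp times between temperatures and the final 4°C hold are not included.

```python
# Illustrative sketch; names are assumptions, temperatures and times are from the text.
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


INITIAL_DENATURATION = Step(95, 180)
CYCLE = (Step(95, 30), Step(55, 30), Step(72, 60))
N_CYCLES = 15
FINAL_EXTENSION = Step(72, 600)


def programmed_seconds() -> int:
    """Sum of programmed hold times, excluding ramping and the 4 C hold."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + N_CYCLES * per_cycle + FINAL_EXTENSION.seconds


if __name__ == "__main__":
    print(f"{programmed_seconds() / 60:.0f} min of programmed hold time")
```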

7. Sequencing Verification

The PCR product was sequenced on an Illumina HiSeq 2000 (PE101). The reads were filtered and aligned to the reference genome, and the amount of aligned data and the genome coverage at different depths were tallied. The results are shown in Table 1.

Table 1. Coverage and uniformity of sequencing data aligned to the reference genome

| Metric | Value |
| --- | --- |
| Sequencing reads (M) | 4.5 |
| Mapping rate | 99.70% |
| Uniquely mapped reads (M) | 4.5 |
| Genome coverage | 99.99% |
| Genome coverage at 4X depth | 99.89% |
| Genome coverage at 10X depth | 98.53% |
| Genome coverage at 20X depth | 96.25% |
| Genome coverage at 30X depth | 85.20% |
| Genome coverage at 40X depth | 73.55% |
| Genome coverage at 50X depth | 55.60% |

The mapping rate shows that the fragments amplified with the double random primers in this example derive almost entirely from the target species, indicating good specificity. The genome coverage shows that the target genome is essentially fully covered, indicating that the random fragmentation method is suitably random. The coverage at different depths shows that coverage is fairly uniform: most regions are covered at depths such as 4X and 10X, which is sufficient for subsequent variant analysis.

Building on the experiments above, this example further optimized the number of random bases and the ratio of downstream to upstream random primer, and also compared three polymerases; the results are shown in Figure 2.
In Figure 2, lane D2000 is the DNA marker; lane 1 is 5 random bases, downstream random primer:upstream random primer = 1:1, with Taq DNA polymerase; lane 2 is 5 random bases, downstream random primer:upstream random primer = 2:1, with Taq DNA polymerase; lane 3 is 5 random bases, downstream random primer:upstream random primer = 3:1, with Taq DNA polymerase; lane 4 is 5 random bases, downstream random primer:upstream random primer = 3:1, with E. coli DNA polymerase I; lane 5 is 5 random bases, downstream random primer:upstream random primer = 3:1, with Klenow fragment; lane 6 is 9 random bases, downstream random primer:upstream random primer = 2:1, with Klenow fragment; lane 7 is 8 random bases, downstream random primer:upstream random primer = 2:1, with Klenow fragment; lane 8 is 5 random bases, downstream random primer:upstream random primer = 2:1, with Klenow fragment; lane 9 is 6 random bases, downstream random primer:upstream random primer = 2:1, with Klenow fragment; lane 10 is 7 random bases, downstream random primer:upstream random primer = 2:1, with Klenow fragment. Comparing the three polymerases, Klenow fragment performed best. Comparing the numbers of random bases, random anchoring PCR with 8 random bases performed somewhat better. Downstream to upstream random primer ratios from 1:1 to 3:1 were all workable, with 2:1 giving the better result.

On this basis, the total amount of upstream and downstream random primer was also tested; the results are shown in Figure 3. In Figure 3, lane D2000 is the DNA marker and lane 1 is the negative control. Lanes 2 to 5 all used downstream random primer:upstream random primer = 2:1 and Klenow fragment, with total random primer amounts of 200 pmol (lane 2), 51 pmol (lane 3), 5.1 pmol (lane 4), and 0.51 pmol (lane 5). A total of 5.1 pmol gave the best result. For a DNA sample of m = 50 ng fragmented to L = 300 bp, the theoretical primer amount is n = 0.253 pmol, so R = 5.1/0.253, approximately 20; that is, fragmenting 50 ng of DNA to 300 bp actually requires about 20 times the theoretical amount of upstream and downstream random primer.
The value of R was examined in extensive experiments. Taking library construction, and specifically the fragment size a library requires, as the basis, R falls in the range 2.7≤R≤750 depending on the fragment length L: the smaller L is, the higher the ratio of actual to theoretical amount, that is, the larger R is. R is preferably 20 for the fairly routine case of fragmenting a DNA sample to 300 bp.
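The multiple R for each primer amount tested in Figure 3 can be recomputed from n = 1.515 × (m ÷ L). The sketch below assumes that all four lanes used the same 50 ng sample and 300 bp target as the worked figure in the text; the function name is hypothetical.

```python
# Illustrative sketch; assumes m = 50 ng and L = 300 bp for every lane.


def r_factor(total_primer_pmol: float, m_ng: float, l_bp: float) -> float:
    """Ratio R of the primer amount actually added to the theoretical amount n."""
    n_pmol = 1.515 * (m_ng / l_bp)
    return total_primer_pmol / n_pmol


LANE_TOTALS_PMOL = {2: 200, 3: 51, 4: 5.1, 5: 0.51}

if __name__ == "__main__":
    for lane, total in LANE_TOTALS_PMOL.items():
        r = r_factor(total, m_ng=50, l_bp=300)
        in_range = 2.7 <= r <= 750
        print(f"lane {lane}: {total} pmol -> R = {r:.2f} (within 2.7-750: {in_range})")
```

Under these assumptions the 51 pmol and 5.1 pmol lanes fall inside the stated range of R, while the 200 pmol and 0.51 pmol lanes fall just outside it.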

The primers in this example use double random anchoring with upstream and downstream random primers to fragment the DNA sample randomly. The primer set is then used to amplify the purified, randomly fragmented single-stranded DNA, giving a DNA library that can be used on different sequencing platforms. The procedure is simple and convenient, avoids dependence on special equipment and expensive kits, and greatly broadens the range of applications for large-scale high-throughput sequencing.

The above is a further detailed description of the present application in connection with specific embodiments, and the implementation of the present application should not be regarded as limited to these descriptions. A person of ordinary skill in the art to which the present application belongs may make simple deductions or substitutions without departing from the concept of the present application, and all of these should be regarded as falling within its scope of protection.

SEQUENCE LISTING

<110> Shenzhen MGI Technology Co., Ltd.

<120> A primer for random nucleic acid fragmentation and a method for random nucleic acid fragmentation

<130> 17F24115

<160> 5

<170> PatentIn version 3.3

<210> 1

<211> 19

<212> DNA

<213> Artificial sequence

<400> 1

gaccgcttgg cctccgact 19

<210> 2

<211> 19

<212> DNA

<213> Artificial sequence

<400> 2

gtctccagtc gaagcccga 19

<210> 3

<211> 19

<212> DNA

<213> Artificial sequence

<400> 3

tcgggcttcg actggagac 19

<210> 4

<211> 25

<212> DNA

<213> Artificial sequence

<400> 4

tcctaagacc gcttggcctc cgact 25

<210> 5

<211> 31

<212> DNA

<213> Artificial sequence

<400> 5

agacaagctc gatcgggctt cgactggaga c 31

Claims

1. A method for random fragmentation of nucleic acids, comprising using primers to perform double random anchoring on a DNA sample, wherein the primers consist of a plurality of upstream random primers and a plurality of downstream random primers; specifically comprising: hybridizing the upstream random primers and the downstream random primers to a denatured DNA sample; between the most adjacent upstream and downstream random primers, by the action of DNA polymerase, extending the upstream random primers to the 3' end to fill the sequence between the upstream and downstream random primers; and then, by the action of DNA ligase, ligating the 3' end of the extended sequence of the upstream random primers to the 5' end of the downstream random primers, i.e., ligating the upstream random primers and their extended sequences to the downstream random primers into a single segment, thereby achieving double random fragmentation of the DNA sample through random hybridization of the upstream and downstream random primers;

During the process of hybridizing the upstream random primer and the downstream random primer to the denatured DNA sample, the total amount of the upstream random primer and the downstream random primer used is R×n picomoles, where 2.7≤R≤750, n=1.515×(m÷L), m is the weight of the DNA sample in nanograms, L is the expected length of the fragmented DNA, and n is the theoretical amount of upstream and downstream random primers required to break the DNA sample into fragments of length L, in picomoles.

The molar ratio of the upstream random primer to the downstream random primer is 1 to 3:1.

The sequence composition of the upstream random primer is: 5’-X-Y-3’;

The sequence composition of the downstream random primer is: 5’-P-Y’-X’-close-3’;

Where Y and Y’ are random sequences, X is all or part of the 5’ adapter sequence of the sequencing platform, X’ is all or part of the 3’ adapter sequence of the sequencing platform, P is a phosphorylation modification, and close is a blocking modification used to prevent the formation of 3’-5’ phosphodiester bonds.

In use, the upstream random primer and the downstream random primer hybridize and bind on the same template strand. The upstream random primer extends to the 3' end, filling the gap between the nearest downstream random primer and the upstream random primer. The 3' end of the extended sequence of the upstream random primer is connected to the 5' end of the downstream random primer, thus linking the upstream random primer and its extended sequence with the downstream random primer into a single segment.

The blocking modification is a dideoxy modification;

The X sequence of the upstream random primer also includes 2-6 protective bases at its 5' end; the X' sequence of the downstream random primer also includes 2-6 protective bases at its 3' end, and the blocking modification is applied to one of the terminal protective bases.

The X sequence of the upstream random primer has the sequence shown in Seq ID No. 1, and the X’ sequence of the downstream random primer has the sequence shown in Seq ID No. 2;

Seq ID No.1: 5’-GACCGCTTGGCCTCCGACT-3’;

Seq ID No.2: 5’-GTCTCCAGTCGAAGCCCGA-3’;

The 5' end of the X sequence of the upstream random primer is protected with "GAC", and the 3' end of the X' sequence of the downstream random primer is protected with "CG".

2. The method according to claim 1, wherein the molar ratio of the upstream random primer to the downstream random primer is 2:1.

3. The method according to claim 1, wherein R = 20.

4. A method for constructing a nucleic acid library, comprising randomly fragmenting a DNA sample using the method described in any one of claims 1-3, and then performing PCR amplification on the double-randomly fragmented DNA fragments using a pair of universal primers to obtain a nucleic acid library enriched with random fragments; wherein the universal primers consist of a forward primer and a reverse primer, wherein the 3' end of the forward primer has all or part of the sequence of the 5' adapter of the sequencing platform, and the 3' end of the reverse primer has all or part of the reverse complementary sequence of the 3' adapter of the sequencing platform.

5. The construction method according to claim 4, characterized in that: the 5' ends of the forward primer and the reverse primer respectively have adapter sequences of the second sequencing platform.

6. The construction method according to claim 4, wherein the forward primer contains the sequence shown in Seq ID No. 1, and the reverse primer contains the sequence shown in Seq ID No. 3;

Seq ID No. 3: 5’-TCGGGCTTCGACTGGAGAC-3’.