CN112226821B - Method for constructing a sequencing library for the MGI sequencing platform based on double-strand cyclization

Info

Publication number: CN112226821B
Application number: CN202011107885.7A
Authority: CN
Inventors: 闫科技; 张智慧
Original assignee: Kunyu Biotechnology Jiangmen Co ltd
Current assignee: Kunyu Biotechnology Jiangmen Co ltd
Priority date: 2020-10-16
Filing date: 2020-10-16
Publication date: 2024-02-06
Anticipated expiration: 2040-10-16
Also published as: CN112226821A

Abstract

The invention discloses a method for constructing a sequencing library for the MGI sequencing platform based on double-strand cyclization, comprising the following steps: constructing a double-stranded adaptor backbone that carries the adaptor sequences required by the MGI sequencing platform, can be joined by TA ligation, and retains a single-stranded segment in its middle part; fragmenting the sample DNA, followed by end repair and A-tailing; using the double-stranded adaptor backbone and the end-repaired, A-tailed DNA fragments as substrates and joining them into a circle by TA ligation to generate a double-stranded ligation product; and digesting the double-stranded ligation product with exonuclease so that only the circularized single-stranded structure is retained, then purifying the digestion product to obtain a final library compatible with the MGI sequencing platform. Because cyclization is carried out on double-stranded DNA using the TA ligation principle, the method yields a single-stranded circular sequencing library that can be loaded directly onto the sequencer; the construction process is simple and fast, and changes in sequence information caused by amplification are avoided.

Description

Method for constructing a sequencing library for the MGI sequencing platform based on double-strand cyclization

Technical Field

The invention relates to the technical field of high-throughput sequencing, and in particular to a method for constructing a sequencing library for the MGI sequencing platform based on double-strand cyclization.

Background

Since the double-helix structure of DNA was elucidated in the 20th century, biology has entered the era of molecular biology. Deciphering the complex genetic code is the first step toward a deeper understanding of life. With the launch of the Human Genome Project (HGP), researchers began to decipher the genetic codes of species on a large scale. This effort accelerated the development of sequencing methods, enabled genome-level sequencing of humans and several important species, and brought the study of the genetic code into the era of genomics.

Sequencing technology plays an irreplaceable role in genomics research. Sequencing methods are broadly divided into three generations according to their development. First-generation DNA sequencing mainly comprises the Maxam-Gilbert chemical degradation method and the Sanger dideoxy chain-termination method. It played a major role in the Human Genome Project and in early genome sequencing, but suffers from high cost, low throughput, and low speed. After the start of the 21st century, second-generation sequencing technologies emerged, represented by the 454 technology of Roche, the Solexa technology of Illumina, and the SOLiD technology of ABI. Also called high-throughput sequencing, these technologies maintain high accuracy while greatly reducing sequencing cost and increasing speed and throughput, and they have strongly driven the development of genomics. In recent years, single-molecule sequencing has developed into a third generation of sequencing technology, represented mainly by the SMRT technology of Pacific Biosciences and the nanopore single-molecule sequencing technology of Oxford Nanopore Technologies.

The sequencing technology most widely used in genomics research today is still second-generation high-throughput sequencing. Compared with the first and third generations, it offers high throughput, high speed, and low cost, is well suited to large-scale genomics research, and is unlikely to be displaced in the short term. Most of the current commercial market is occupied by the various sequencer models of Illumina. Although the price of sequencing has dropped considerably, the cost of testing large numbers of samples, together with the cost of the associated kits, remains a major limitation for most laboratories wishing to carry out related research. In addition, the sequencer is the upstream platform of the sequencing industry and a decisive piece of equipment for the gene detection, editing, and synthesis industries, yet the core sequencing technologies have long been held mostly by foreign companies under strict patent protection, which creates large barriers to domestic research and industrial development.

MGI Tech (华大智造), a subsidiary of China's BGI Group, has successfully developed a sequencing platform based on DNBseq sequencing technology and has launched a series of sequencers such as the BGISEQ-500. The platform is covered by fully independent intellectual property rights and, compared with the Illumina sequencing platform, offers advantages such as higher throughput and lower cost. To some extent it breaks the monopoly of foreign companies over the domestic sequencing industry and opens up new prospects for domestic research and industrial development.

The MGISEQ series sequencers developed by MGI Tech rely mainly on the following core technologies: 1) DNA nanoball (DNB) technology, which amplifies DNA fragments by rolling circle amplification. In this linear amplification mode every copy is made from the original template, so amplification errors do not accumulate exponentially as they do in bridge PCR, and the original sequence information is preserved to the greatest extent. 2) A patterned array chip, in which an array of binding sites and alignment marks is formed on the surface of a silicon wafer so that the DNBs are arranged regularly. The signals are uniform and dense and do not interfere with one another, which ensures sequencing accuracy and improves the utilization of the sequencing chip. 3) An optimized combinatorial probe-anchor synthesis technology (cPAS), in which a DNA molecular anchor and fluorescent probes are polymerized on the DNB and the sequence information is read by sequencing by ligation (SBL), giving higher base-calling accuracy than sequencing by synthesis (SBS).
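The difference between linear rolling circle amplification and exponential PCR described in item 1) can be illustrated with a small calculation. The sketch below is a simplified model and not part of the disclosed method: it assumes that each base is miscopied independently with a hypothetical per-copy error rate, and it compares a copy made directly from the original circle (one copying event) with a PCR strand whose lineage has passed through up to n successive copying events. The error rate and the cycle count are illustrative assumptions only.

```python
def error_probability(per_copy_error: float, copy_events: int) -> float:
    """Probability that a given base carries at least one copying error
    after `copy_events` successive copying events in its lineage."""
    return 1.0 - (1.0 - per_copy_error) ** copy_events


# Hypothetical per-base, per-copy polymerase error rate (illustrative only).
PER_COPY_ERROR = 1e-5

# RCA: every repeat in a DNB is copied directly from the original circle,
# so its lineage contains a single copying event.
rca_error = error_probability(PER_COPY_ERROR, copy_events=1)

# PCR: a strand present after n cycles may descend from copies of copies,
# so its lineage can contain up to n copying events (worst case).
pcr_cycles = 12  # hypothetical cycle count
pcr_worst_case_error = error_probability(PER_COPY_ERROR, copy_events=pcr_cycles)

print(f"RCA per-base error probability: {rca_error:.2e}")
print(f"PCR worst-case per-base error probability ({pcr_cycles} cycles): "
      f"{pcr_worst_case_error:.2e}")
```

Under these assumptions the worst-case PCR lineage has a higher per-base error probability than an RCA copy, which is the qualitative point made in the text; the actual magnitude depends on the polymerase and the protocol.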

The MGI sequencing platform has certain advantages over the Illumina platform; its library construction and sequencing workflow is shown in FIG. 1. However, because of some inherent characteristics of its library construction and sequencing, the library construction process still leaves room for improvement and optimization, for the following reasons.

(1) Because the target fragments are amplified by rolling circle amplification, single-stranded circular DNA is used as the starting library for on-instrument amplification. Compared with conventional workflows, library construction therefore includes an additional cyclization step to generate the single-stranded circles, which makes the whole library construction process tedious and time-consuming;

(2) With single-strand cyclization, only a small fraction of the PCR products is circularized and goes on to generate effective data, which affects library uniformity and the fidelity of the original sequence information to some extent.

In view of this, the existing library construction method for the MGI sequencing platform needs to be improved so that the whole library construction process is simpler and faster and so that changes in sequence information caused by amplification are avoided.

Disclosure of Invention

The technical problem to be solved by the invention is that the existing library construction method for the MGI sequencing platform is tedious and time-consuming and affects library uniformity and the fidelity of the original sequence information to some extent.

In order to solve the above technical problem, the technical solution adopted by the invention is to provide a method for constructing a sequencing library for the MGI sequencing platform based on double-strand cyclization, comprising the following steps:

constructing a double-stranded adaptor backbone, wherein the double-stranded adaptor backbone carries the adaptor sequences required by the MGI sequencing platform, can be joined by TA ligation, and retains a single-stranded segment in its middle part;

fragmenting the sample DNA, followed by end repair and A-tailing;

using the constructed double-stranded adaptor backbone and the end-repaired, A-tailed sample DNA fragments as substrates, and joining them into a circle by TA ligation to generate a double-stranded ligation product;

digesting the double-stranded ligation product with exonuclease so that only the circularized single-stranded circular product is retained, and then purifying the digestion product to obtain a final library compatible with the MGI sequencing platform.

In the above method, the 3' end of the double-stranded adaptor backbone has a T-base overhang for complementary pairing with the A-tailed sample DNA fragments.

In the above method, the sequence of the double-stranded adaptor backbone is:

In the above method, the sample DNA is fragmented by sonication or enzymatic digestion, and the fragmented sample DNA is distributed in the range of 200 bp to 700 bp.

In the above method, the whole library construction process comprises 6 reaction stages, namely sample DNA fragmentation, DNA fragment screening, end repair and A-tailing, double-strand cyclization, single-strand formation, and library purification, and takes 3 to 4 hours in total.

In the above method, sample DNA fragmentation takes 20 min, DNA fragment screening 30 min, end repair and A-tailing 50 min, double-strand cyclization 35 min, single-strand formation 35 min, and library purification 35 min.

In the above method, the screening and purification of the sample fragment DNA are performed using magnetic beads.

Compared with the prior art, the method performs cyclization on double-stranded DNA using the TA ligation principle and then obtains a single-stranded circular DNA library that can be loaded directly onto the sequencer. The whole library construction process is simpler and faster, and cyclization efficiency is improved to some extent. In addition, because the fragments are not amplified by PCR during library construction, the original information of the sample DNA sequence is retained and changes in sequence information caused by amplification are avoided.

Drawings

FIG. 1 is a schematic diagram of a conventional library construction and sequencing flow for an MGI sequencing platform;

FIG. 2 is a schematic flow chart of the method of the present invention;

FIG. 3 is an electrophoretogram of a linker backbone product constructed in the present invention;

FIG. 4 is an electrophoretogram of the RCA products of the libraries in an embodiment of the invention.

Detailed Description

The invention provides a construction method of a sequencing library of an MGI sequencing platform based on double-strand cyclization, which has the advantages that the construction process of the sequencing library is simple and quick, and the change of sample DNA information generated by amplification is avoided. The invention is described in detail below with reference to the drawings and the detailed description.

In order to make the explanation and the description of the technical solution and the implementation of the present invention clearer, several preferred embodiments for implementing the technical solution of the present invention are described below. It should be apparent that the specific embodiments described below are only some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

FIG. 2 is a flow chart of the method of the invention for constructing a sequencing library for the MGI sequencing platform based on double-strand cyclization, in which cyclization is carried out on double-stranded DNA using the TA ligation principle and a single-stranded circular sequencing library that can be loaded directly onto the sequencer is then obtained. As shown in FIG. 2, the method of the invention comprises the following steps:

Step 100: constructing a double-stranded adaptor backbone that can be used for TA ligation and is compatible with downstream library generation.

Because cyclization is carried out on double-stranded DNA using the TA ligation principle, a double-stranded adaptor backbone that supports TA ligation and is compatible with downstream library generation must be constructed in advance. It must satisfy the following requirements:

(1) The DNA at the ligation sites is double-stranded, so that the ligase can recognize and join it efficiently;

(2) The 3' end has a T-base overhang for complementary pairing with the A-tailed sample DNA fragments;

(3) The sequence of the double-stranded adaptor backbone is the adaptor sequence used by the MGI sequencing platform, so that the final library has exactly the same sequence structure as a conventional library;

(4) The middle part of the double-stranded adaptor backbone is single-stranded DNA, so that a single-stranded circular structure is formed after cyclization.

A preferred sequence of the adaptor backbone of the double-stranded structure is:

Step 200: fragmenting the sample DNA and screening the fragments.

Step 300: performing end repair and A-tailing on the screened sample DNA fragments.

Step 400: using the constructed double-stranded adaptor backbone and the end-repaired, A-tailed sample DNA fragments as substrates, and joining them into a circle by TA ligation to generate a double-stranded ligation product.

This step uses the TA ligation principle to carry out the ligation and cyclization reaction. The reaction proceeds in the same way as a conventional ligation reaction, but the reaction volume must be increased and the reaction buffer adjusted in order to reduce linear products formed by multiple ligations between adaptor backbones and sample DNA fragments and to promote the formation and accumulation of circular products.

Step 500: digesting the double-stranded ligation product with exonuclease so that only the circularized single-stranded circular product is retained, and purifying the digestion product to obtain a final library compatible with the MGI sequencing platform.

In the above method, the detailed steps of step 100 are as follows:

Step 110: diluting each of the three single-stranded DNA fragments that make up the double-stranded adaptor backbone to 100 μmol/L with water;

wherein the sequence of DNA fragment 1 is:

AGTCGGAGGCCAAGCGGTCTTAGGAAGACAANNNNNNNNNNCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT;

the sequence of DNA fragment 2 is:

TTGTCTTCCTAAGACCGCTTGGCCTCCGACTT;

the sequence of DNA fragment 3 is:

AGTCGGATCGTAGCCATGTCGTTCTGTGAGCC.
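The relationship between the three fragments listed above and the four design requirements of step 100 can be checked computationally. The following is a minimal sketch, assuming that all three sequences are written 5' to 3' (the source does not state the orientation) and treating N as pairing with N. It tests whether fragments 2 and 3 are complementary to the two ends of fragment 1, which bases are left unpaired at the 3' ends, and which part of fragment 1 remains single-stranded. The function and variable names are illustrative and do not come from the patent.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N is kept as N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Sequences copied from step 110; assumed to be written 5' to 3'.
FRAGMENT_1 = ("AGTCGGAGGCCAAGCGGTCTTAGGAAGACAANNNNNNNNNN"
              "CAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT")
FRAGMENT_2 = "TTGTCTTCCTAAGACCGCTTGGCCTCCGACTT"
FRAGMENT_3 = "AGTCGGATCGTAGCCATGTCGTTCTGTGAGCC"

# Region of fragment 1 covered by fragment 2 when fragment 2's last base is unpaired.
left_duplex = reverse_complement(FRAGMENT_2[:-1])
# Region of fragment 1 covered by fragment 3 when fragment 1's last base is unpaired.
right_duplex = reverse_complement(FRAGMENT_3)

left_ok = FRAGMENT_1.startswith(left_duplex)
right_ok = FRAGMENT_1[:-1].endswith(right_duplex)

# Whatever lies between the two duplex regions is not covered by a short strand.
single_stranded = FRAGMENT_1[len(left_duplex):len(FRAGMENT_1) - 1 - len(right_duplex)]

print("fragment 2 pairs with the 5' end of fragment 1:", left_ok)
print("fragment 3 pairs with the 3' end of fragment 1:", right_ok)
print("unpaired 3' bases:", FRAGMENT_2[-1], "and", FRAGMENT_1[-1])
print("single-stranded middle region of fragment 1:", single_stranded)
```

Under the stated orientation assumption, both short fragments pair with the ends of the long fragment, each end of the resulting duplex carries a single unpaired 3' T, and the N-containing middle segment of fragment 1 stays single-stranded. This is consistent with requirements (1), (2), and (4) of step 100.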

Step 120: preparing the following reaction system in a PCR tube:

Step 130: vortexing to mix thoroughly and centrifuging briefly;

Step 140: placing the PCR tube in a PCR instrument and running the following reaction program:

wherein cooling from 95 °C to 25 °C is carried out slowly at 0.1 °C/s, which gives the best reaction results.

Step 150: diluting the reaction product to 50 μL with water and keeping it at 4 °C for immediate use or storing it at -20 °C.
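The duration of the slow cooling ramp in step 140 follows directly from the temperatures and the ramp rate given above. The sketch below covers only the ramp itself; any hold times at 95 °C or 25 °C belong to the reaction program, which is not reproduced in this text, and are therefore not included. The function name is illustrative.

```python
def ramp_duration_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time needed to move between two temperatures at a constant ramp rate."""
    if rate_c_per_s <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(start_c - end_c) / rate_c_per_s


# Values taken from step 140: 95 °C down to 25 °C at 0.1 °C/s.
ramp_s = ramp_duration_seconds(start_c=95.0, end_c=25.0, rate_c_per_s=0.1)

print(f"Cooling ramp: {ramp_s:.0f} s (about {ramp_s / 60:.1f} min)")
```

The ramp alone therefore takes roughly 700 s, a little under 12 minutes.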

In step 200 of the above method, the detailed steps of sample DNA fragmentation (about 20 min) are as follows:

fragmenting the sample DNA by sonication or enzymatic digestion, with the best results obtained when the fragmented DNA is distributed in the range of 200 bp to 700 bp;

adding water to the fragmented DNA to a total volume of 50 μL, transferring the entire volume to a new 1.5 mL centrifuge tube, and carrying out magnetic bead fragment screening.

In step 200 of the above method, the detailed steps of DNA fragment screening and purification (about 30 min) are as follows:

taking out the magnetic beads, equilibrating them to room temperature, and vortexing thoroughly to mix;

adding 30 μL of magnetic beads to the DNA fragmentation product and pipetting up and down until completely mixed;

incubating at room temperature for 5 min, then placing the centrifuge tube on a magnetic rack;

letting it stand for 2-5 min until the liquid is clear, then carefully transferring the supernatant to a new 1.5 mL centrifuge tube with a pipette;

adding 10 μL of purification magnetic beads to the 80 μL of supernatant with a pipette and pipetting up and down until completely mixed;

letting it stand for 2-5 min until the liquid is clear, then carefully aspirating and discarding the supernatant with a pipette;

with the centrifuge tube on the magnetic rack, adding 500 μL of freshly prepared 80% ethanol, letting it stand for 30 s, then aspirating and discarding the supernatant;

repeating the previous step and removing all residual liquid from the tube with a pipette;

keeping the centrifuge tube on the magnetic rack, opening the tube cap, and drying at room temperature for 5 min while avoiding over-drying;

removing the centrifuge tube from the magnetic rack, adding 40 μL of elution buffer, and gently pipetting up and down until completely mixed;

incubating at room temperature for 5 min, centrifuging briefly, and placing the centrifuge tube on the magnetic rack;

letting it stand for 2-5 min until the liquid is clear, then transferring 40 μL of supernatant to a new 0.2 mL PCR tube to obtain the screened and purified sample DNA fragments.
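The two bead additions in the screening steps above can be expressed as bead-to-sample volume ratios, which is how two-sided magnetic bead size selection is commonly described. The sketch below uses only the volumes given in the text (50 μL of sample, then 30 μL and 10 μL of beads). Reading these volumes as a two-sided size selection is an assumption, since the source does not name the bead product or state the ratios explicitly.

```python
def bead_ratio(total_bead_volume_ul: float, sample_volume_ul: float) -> float:
    """Bead-to-sample volume ratio, relative to the original sample volume."""
    if sample_volume_ul <= 0:
        raise ValueError("sample volume must be positive")
    return total_bead_volume_ul / sample_volume_ul


SAMPLE_UL = 50.0        # fragmented DNA brought to 50 uL with water
FIRST_BEADS_UL = 30.0   # first bead addition (supernatant is kept)
SECOND_BEADS_UL = 10.0  # second bead addition (beads are kept)

first_ratio = bead_ratio(FIRST_BEADS_UL, SAMPLE_UL)
cumulative_ratio = bead_ratio(FIRST_BEADS_UL + SECOND_BEADS_UL, SAMPLE_UL)
supernatant_ul = SAMPLE_UL + FIRST_BEADS_UL  # volume carried into the second step

print(f"first bead ratio:      {first_ratio:.1f}x")
print(f"cumulative bead ratio: {cumulative_ratio:.1f}x")
print(f"supernatant volume:    {supernatant_ul:.0f} uL")
```

The computed supernatant volume matches the 80 μL stated in the text. Which fragment sizes are retained at a given ratio depends on the bead product and should be taken from its documentation.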

In step 300 of the above method, the detailed steps of sample DNA fragment end repair and addition of A (about 50 min) are as follows:

preparing the reaction solution on ice as follows:

mixing by pipetting and centrifuging briefly;

placing the PCR tube in a PCR instrument and running the following program:

centrifuging briefly and placing on ice for the subsequent double-strand cyclization.

In step 400 of the above method, the detailed steps of double-strand cyclization (about 35 min) are as follows: preparing the ligation reaction solution on ice as follows:

vortexing to mix and centrifuging briefly;

placing the PCR tube in a PCR instrument and running the following program:

removing the tube, centrifuging briefly, and placing it on ice for the subsequent single-strand formation step.

In step 500 of the above method, the detailed steps of single-strand formation (about 35 min) are as follows: preparing the digestion reaction solution on ice as follows:

vortexing to mix and centrifuging briefly;

removing the tube and centrifuging briefly;

adding 10 μL of digestion stop solution to the reaction product, vortexing to mix, centrifuging briefly, and then transferring the entire reaction solution to a new 1.5 mL centrifuge tube.

In the above method, the detailed steps of library purification (about 35 min) are as follows:

taking out the purification magnetic beads in advance, equilibrating them to room temperature, and vortexing thoroughly to mix;

adding 200 μL of purification magnetic beads to the digestion product and gently pipetting up and down until completely mixed;

incubating at room temperature for 10 min;

placing the centrifuge tube on a magnetic rack, letting it stand for 2-5 min until the liquid is clear, then aspirating and discarding the supernatant;

keeping the centrifuge tube on the magnetic rack, adding 500 μL of freshly prepared 80% ethanol, letting it stand for 30 s, then carefully aspirating and discarding the supernatant;

repeating the previous step and removing all residual liquid from the tube;

keeping the centrifuge tube on the magnetic rack, opening the tube cap, and drying at room temperature for 2-5 min while avoiding over-drying;

removing the centrifuge tube from the magnetic rack, adding 20 μL of elution buffer, and gently pipetting up and down until completely mixed;

incubating at room temperature for 5 min;

placing the centrifuge tube on the magnetic rack, letting it stand for 2-5 min until the liquid is clear, and transferring 20 μL of supernatant to a new 1.5 mL centrifuge tube;

the resulting product is the final library, which can be used for subsequent quality control and on-instrument sequencing or stored in a freezer at -20 °C.

The total time for the whole library construction process, including preparation time and reaction time, is about 3-4 hours.

The feasibility of several key steps of the method of the invention was verified in order to demonstrate that the technical solution achieves the stated objectives. The main verification results are as follows:

(1) Verification that the adaptor backbone is assembled into a double-stranded structure.

The double-stranded adaptor backbone is formed from synthesized single-stranded DNA fragments by annealing and phosphorylation: one long single strand and two short single strands complementary to it are cooled slowly in a PCR instrument at suitable concentrations so that they anneal into a long double-stranded structure, namely the double-stranded adaptor backbone. Single-stranded fragments that had not been annealed were used as a control. Gel electrophoresis showed that the constructed double-stranded adaptor backbone has the expected length. The gel image is shown in FIG. 3, in which lane 1 is the double-stranded adaptor backbone reaction product and lane 2 is the fragment control that did not undergo the annealing reaction. The size comparison on the gel shows that the double-stranded adaptor was constructed successfully.

(2) Verification that the cyclization reaction proceeds in double-stranded form.

Double-strand cyclization uses as reactants the adaptor backbone constructed beforehand and the sample DNA fragments that have been fragmented, end-repaired, and A-tailed. The two ends of one adaptor backbone are joined to the two ends of a sample DNA fragment by TA ligation to form a double-stranded circular structure. Exonuclease digestion is then used to digest the uncircularized adaptor backbones, the sample DNA fragments, and the nick-containing strand of the double-stranded circular structure in the ligation product, finally yielding a single-stranded circular library. To verify that double-stranded ligation succeeded and that a circular structure was formed, the digested and purified product was used as the template for a rolling circle amplification (RCA) reaction, with the single-stranded circular product generated using the BGI (Huada) library kit as a control. The amplification products were analyzed by gel electrophoresis. As shown in FIG. 4, the lane 2 sample is the library obtained by the MGI library construction method and the lane 1 sample is the library obtained by the method of the invention. Both libraries produced rolling circle amplification products in the RCA reaction, which verifies that the cyclization reaction can be carried out in double-stranded form.

(3) The circular products generated by the method were purified and quantified and then sequenced on the BGI (Huada) sequencing platform. Effective data for the target genome were obtained in the sequencing data, which demonstrates that the method can be used for DNA library construction and sequencing.

The method of the invention is compared below with the conventional method provided by MGI in terms of operating time and cyclization efficiency, in order to verify the advantages and value of the invention.

In terms of operating time, taking genome fragmentation as the start of the reaction and the final purified circular library as the end, the method of the invention comprises 6 reaction stages (sample DNA fragmentation, about 20 min; DNA fragment screening, about 30 min; end repair and A-tailing, about 50 min; double-strand cyclization, about 35 min; single-strand formation, about 35 min; library purification, about 35 min) and takes about 3-4 hours in total, so that the whole library can be constructed within half a day. The current official method provided by MGI requires 11 reaction stages (sample DNA fragmentation, about 20 min; DNA fragment screening, about 30 min; end repair and A-tailing, about 50 min; adaptor ligation, about 35 min; product purification, about 30 min; PCR, about 35 min; product purification, about 30 min; double-strand denaturation, about 10 min; single-strand cyclization, about 35 min; product digestion, about 35 min; library purification, about 35 min) and takes about 6-7 hours in total. The method of the invention therefore shortens the time required by nearly half.
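The stage durations listed above can be added up to compare the two workflows. The sketch below sums only the approximate per-stage times given in the text. The overall totals quoted in the text (about 3-4 hours and about 6-7 hours) also include preparation time, so the sums are expected to be somewhat lower than those figures. The variable names are illustrative.

```python
# Approximate stage durations in minutes, as listed in the text.
INVENTION_STAGES_MIN = [
    ("sample DNA fragmentation", 20),
    ("DNA fragment screening", 30),
    ("end repair and A-tailing", 50),
    ("double-strand cyclization", 35),
    ("single-strand formation", 35),
    ("library purification", 35),
]

MGI_STAGES_MIN = [
    ("sample DNA fragmentation", 20),
    ("DNA fragment screening", 30),
    ("end repair and A-tailing", 50),
    ("adaptor ligation", 35),
    ("product purification", 30),
    ("PCR", 35),
    ("product purification", 30),
    ("double-strand denaturation", 10),
    ("single-strand cyclization", 35),
    ("product digestion", 35),
    ("library purification", 35),
]

invention_total = sum(minutes for _, minutes in INVENTION_STAGES_MIN)
mgi_total = sum(minutes for _, minutes in MGI_STAGES_MIN)
reduction = 1 - invention_total / mgi_total

print(f"method of the invention: {len(INVENTION_STAGES_MIN)} stages, {invention_total} min")
print(f"conventional MGI method: {len(MGI_STAGES_MIN)} stages, {mgi_total} min")
print(f"reduction in summed stage time: {reduction:.0%}")
```

The summed stage times are 205 min and 345 min, a reduction of about 41%, which is in line with the statement that the time required is shortened by nearly half.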

In terms of cyclization efficiency, the same sample DNA was used as the starting material, and the same molar amount of fragmented double-stranded DNA was added to each cyclization reaction. The double-strand cyclization used in the invention and the single-strand cyclization currently provided by MGI were then carried out separately. The cyclized products were subjected to enzymatic digestion and magnetic bead purification to remove uncircularized fragments and adaptors, and the concentration of the final products was measured using a Qubit fluorometric quantification reagent and instrument. The ratio of the mass of the final cyclized product to the mass of the DNA fragments initially added was used to evaluate cyclization efficiency. The ratio obtained with the method of the invention reaches 20%-30%, whereas the ratio obtained with the single-strand cyclization method currently provided by MGI is about 10%.
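The efficiency measure defined above, the mass of the final cyclized product divided by the mass of the DNA fragments initially added, can be written as a short calculation. In the sketch below the input mass and the measured concentration are hypothetical values chosen for illustration, not data from the patent; only the 20 μL elution volume is taken from the library purification steps. The function names are illustrative.

```python
def product_mass_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Mass of DNA in an eluate, from its measured concentration and volume."""
    return concentration_ng_per_ul * volume_ul


def cyclization_efficiency(final_product_ng: float, input_dna_ng: float) -> float:
    """Ratio of final cyclized product mass to initially added DNA fragment mass."""
    if input_dna_ng <= 0:
        raise ValueError("input DNA mass must be positive")
    return final_product_ng / input_dna_ng


INPUT_DNA_NG = 1000.0        # hypothetical mass of fragments added to the reaction
ELUTION_VOLUME_UL = 20.0     # elution volume from the library purification steps
MEASURED_NG_PER_UL = 12.5    # hypothetical fluorometric concentration reading

final_mass = product_mass_ng(MEASURED_NG_PER_UL, ELUTION_VOLUME_UL)
efficiency = cyclization_efficiency(final_mass, INPUT_DNA_NG)

print(f"final product: {final_mass:.0f} ng, cyclization efficiency: {efficiency:.0%}")
```

With these illustrative numbers the efficiency is 25%, which would fall within the 20%-30% range reported for the method of the invention.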

In summary, the invention redesigns and optimizes the current method for constructing high-throughput sequencing libraries for the MGI sequencing platform and provides a new library construction workflow, which has the following advantages over the conventional whole-genome library construction method currently provided by MGI:

(1) The operation is simpler and the whole process takes less time;

(2) Fewer types of enzyme reagents are needed and the library construction cost is lower;

(3) The fragments are not amplified by PCR during library construction, which eliminates sequence changes caused by amplification;

(4) Double-strand cyclization is more efficient, allowing more DNA fragments to form the final effective sequencing library.

The method described above constructs a whole-genome sequencing library for the MGI sequencing platform with sample DNA as the starting sample type, but the method is equally applicable to the construction of methylation libraries and of related compatible libraries, such as transcriptome libraries, that use RNA as the starting sample type.

The present invention is not limited to the preferred embodiments described above. Any structural variation made by anyone under the teaching of the present invention that has a technical solution identical or similar to that of the present invention falls within the scope of protection of the present invention.

Claims

1. A method for constructing a sequencing library for the MGI sequencing platform based on double-strand cyclization, characterized by comprising the following steps:

fragmenting the sample DNA, followed by end repair and A-tailing;

digesting the double-stranded ligation product with exonuclease so that only the circularized single-stranded circular product is retained, and then purifying the digestion product to obtain a final library compatible with the MGI sequencing platform;

wherein the sequence of the double-stranded adaptor backbone is:

2. The method of claim 1, wherein the double-stranded adaptor backbone has a T-base overhang at the 3' end for complementary pairing with the A-tailed sample DNA fragments.

3. The method according to claim 1, wherein the sample DNA is fragmented by sonication or enzymatic digestion, and the fragmented sample DNA is distributed in the range of 200 bp to 700 bp.

4. The method of claim 1, wherein the library construction process comprises 6 reaction stages, namely sample DNA fragmentation, DNA fragment screening, end repair and A-tailing, double-strand cyclization, single-strand formation, and library purification, and takes 3-4 hours in total.

5. The method according to claim 4, wherein sample DNA fragmentation takes 20 min, DNA fragment screening 30 min, end repair and A-tailing 50 min, double-strand cyclization 35 min, single-strand formation 35 min, and library purification 35 min.

6. The method according to claim 4, wherein the screening and purification of the sample DNA fragments are performed by using magnetic beads.