CN113667716A

CN113667716A - Sequencing library construction method based on rolling circle amplification and application thereof

Info

Publication number: CN113667716A
Application number: CN202110996788.6A
Authority: CN
Inventors: 肖飞; 罗玄梅; 邹丽辉; 苏斐; 张丽丽; 李贺鑫
Original assignee: Beijing Hospital
Current assignee: Beijing Hospital
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2021-11-19
Anticipated expiration: 2041-08-27
Also published as: CN113667716B

Abstract

The application provides a method for constructing a sequencing library based on rolling circle amplification, and applications thereof. The construction method comprises the following steps: providing a closed circular double-stranded DNA, cDNA or RNA molecule; performing rolling circle amplification using specific primers, so that each circle is amplified into only one single-stranded DNA product containing multiple copies, which serves as a first strand; and generating a complementary second strand using the first strand as a template, thereby obtaining a double-stranded DNA product. Sequencing methods and kits are also provided.

Description

Sequencing library construction method based on rolling circle amplification and application thereof

Technical Field

The invention relates to the field of gene detection, in particular to a construction method of a sequencing library based on rolling circle amplification, application thereof, a sequencing method and a kit.

Background

Third-generation sequencing technologies, for example the nanopore sequencing technology of Oxford Nanopore Technologies (ONT) and the SMRT (single molecule real-time) sequencing technology of Pacific Biosciences (PacBio), are single molecule sequencing (SMS) methods. Their most important characteristic is the ability to sequence a single molecule, combined with high throughput, long read length and high speed. Long reads reduce the cost of assembly and save memory and computation time. Third-generation sequencing also extends the applications of second-generation sequencing, for example by directly reading the methylation information of DNA/RNA.

However, the high base-calling error rate associated with single molecule sequencing has limited the use of third-generation sequencing in research on small insertions or deletions (InDels) and single nucleotide variations (SNVs). In particular, when classifying nucleic acid sequences with high sequence diversity, especially sequences that differ by only one or a few bases, as in clonotype typing of immune repertoires and species identification by microbial 16S amplicon sequencing, third-generation sequencing often cannot reach the accuracy of second-generation sequencing.

The PacBio sequencing platform self-corrects using a series of subreads generated by circular sequencing to obtain high-quality HiFi reads. This approach not only provides accurate sequence information, but also simplifies the analysis workflow and greatly reduces the time consumed in downstream processing. However, it suffers from limited read length (compared to ONT) and high cost.

The ONT nanopore sequencing platform performs base calling according to the different changes in current amplitude produced when different bases pass through the nanopore. The read length of ONT (100 Kb) is much longer than that of PacBio (10 Kb), the data can be read in real time, the throughput is higher and the sequencing instrument is portable, but the base-calling error rate is higher.

The immune repertoire refers to the sum of all functionally diverse B cells and T cells in the circulatory system of an individual at any given time. T cells and B cells carry surface receptors that specifically bind to particular antigens, called T cell receptors and B cell receptors (TCR/BCR). The TCR/BCR contains Complementarity Determining Regions (CDRs), namely CDR1, CDR2 and CDR3, of which CDR3 is the most variable and plays a key role in antigen recognition. Immune repertoires are highly diverse: thousands of clonotypes are present, and some clonotypes are present in only one copy. The high error rate of current third-generation single-molecule sequencing prevents its use in immune repertoire research. Indeed, current immune repertoire research is limited to second-generation sequencing technologies, such as Illumina. However, because of the short read length of second-generation sequencing platforms, the established library construction and analysis methods mostly study only the CDR3 region, thereby losing the information of the full-length RNA transcript. Meanwhile, because of the diversity of the V, D and J gene segments themselves, second-generation sequencing requires numerous primers (e.g., 108 primers are used for the IG/TR DNA amplicon assay provided by the EuroClonality-NGS working group). In addition, amplification is highly biased, the many PCR reactions are complicated and time-consuming, and it is difficult to determine the correct mixing ratio of the products from each PCR reaction tube.
First-generation sequencing can reach a read length of about 1000 bp, and when it is used for immune repertoire research based on sequencing from the L gene segment to the C gene segment, information on full-length RNA transcripts can be obtained. However, first-generation sequencing has low throughput, and L gene segment primers have low specificity and low affinity, so abundant full-length information is difficult to obtain in practice. All of these factors have greatly limited more comprehensive study of immune repertoires.

Extrachromosomal circular DNAs (eccDNAs) refer to single-stranded or double-stranded closed circular DNAs located outside the chromosome, and have a wide length distribution of several hundred bp to several hundred megabases. eccDNA is widely present in various eukaryotes and has high tissue and disease specificity. Most of the recent studies show that eccDNA is an important mechanism for driving tumor heterogeneity, and meanwhile, eccDNA can affect cell life activities, promote tumor cell evolution and adaptive evolution, and increase genome plasticity and instability.

Circular RNAs (circRNAs) are a class of non-coding RNAs that can be as small as 100bp or greater than 4000bp in length, have covalently linked closed-loop structures, and result from back-splicing events. It has now been found that some circRNAs act as miRNA sponges in the cytoplasm, bind and sequester RNA Binding Proteins (RBPs), or act as regulators of transcription in the nucleus, and are important participants in the regulatory network of gene expression. Many studies have found that circRNAs may play an important role in atherosclerosis, neurodegenerative diseases, prion diseases and cancer.

Second-generation sequencing has a short read length and cannot directly sequence circular nucleic acids. Because some natural eccDNA/circRNA circles are long, the circular structure must be linearized and the sequence fragmented during library construction. The sequence of the eccDNA/circRNA is then inferred afterwards by an algorithm based on integration sites, so the real eccDNA/circRNA and its components cannot be analyzed directly and accurately.

The inventors have noted that the primers of conventional rolling circle amplification techniques are random hexamers, which can bind to any position of a nucleic acid sequence and initiate amplification there. As a result, one circular nucleic acid sequence yields several long multi-copy sequences after the rolling circle amplification reaction. A sequencing library built in this way contains a large amount of redundant data and no longer preserves the proportion of each nucleic acid sequence in the original library, which makes quantification difficult.

Disclosure of Invention

The present invention is directed to addressing one or more of the problems set forth above. To this end, the invention provides a method for constructing a sequencing library for single molecule sequencing (i.e., third-generation sequencing), applications thereof, and a related kit. The invention uses specific primers to carry out rolling circle amplification on the circular cDNA, dsDNA or RNA form of the molecules to be sequenced, so that one circular sequence generates only one long sequence containing multiple copies, i.e., single copy amplification. The sequencing library obtained by the construction method is suitable for single-molecule sequencing on a third-generation sequencing platform, for example the ONT or PacBio sequencing platform. A consensus sequence is generated by self-correction among the copies on a long fragment, so that base quality is markedly improved, high-accuracy reads are obtained, the single-base error rate is reduced, the cost is reduced, and the range of applications of third-generation sequencing is widened.
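The self-correction among copies can be illustrated with a minimal Python sketch. It assumes that the copies within one long read have already been split out and aligned to equal length, and it uses a simple per-position majority vote; this is an illustrative simplification rather than the consensus algorithm actually used, and the example sequences are hypothetical.

```python
from collections import Counter


def consensus_from_copies(copies):
    """Majority-vote consensus over equal-length, pre-aligned copies.

    Assumption: splitting the long read into copies and aligning them
    has already been done by an upstream tool.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must be pre-aligned to equal length")
    consensus = []
    for column in zip(*copies):
        # Take the most frequent base observed at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical copies of one template; two copies carry a single read error.
copies = ["ACGTACGT", "ACGAACGT", "ACGTACGT", "ACCTACGT"]
result = consensus_from_copies(copies)
print(result)
```

Because each error occurs in only one of the four copies, the vote at every position recovers the underlying base, which is the intuition behind the accuracy gain described above.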

In addition, the traditional rolling circle amplification utilizes non-specific primers, and performs multi-copy amplification on a closed circular molecule form of a molecule to be sequenced, namely, one circular nucleic acid sequence generates a plurality of long sequences containing multiple copies, so that the ratio of each nucleic acid sequence in an original library is changed while a large amount of data redundancy is generated, and the quantification is difficult. The invention is based on single copy amplification and can realize relative quantification of sequencing molecules.
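The effect on quantification can be shown with a small arithmetic sketch in Python. The species names, circle counts and numbers of products per circle below are hypothetical values chosen only to illustrate the argument; they are not measurements from this application.

```python
def read_fractions(circles, products_per_circle):
    """Fraction of long products contributed by each species.

    circles: number of circular template molecules per species.
    products_per_circle: long products generated from one circle.
    """
    reads = {name: circles[name] * products_per_circle[name] for name in circles}
    total = sum(reads.values())
    return {name: reads[name] / total for name in reads}


# Hypothetical library with two species present in equal amounts.
circles = {"A": 100, "B": 100}

# Specific primer: exactly one product per circle, proportions are preserved.
single = read_fractions(circles, {"A": 1, "B": 1})

# Non-specific priming: hypothetical, unequal numbers of products per circle.
multi = read_fractions(circles, {"A": 2, "B": 6})

print(single)
print(multi)
```

With one product per circle the read fractions equal the template fractions, whereas unequal priming shifts the apparent composition even though the input is unchanged.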

The construction method of the invention is particularly well suited to research on circRNA, eccDNA, amplicon sequencing, immune repertoires and the like.

In a first aspect, there is provided a method of constructing a sequencing library for single molecule sequencing, comprising:

providing a closed circular double stranded DNA molecule, cDNA molecule or RNA molecule form of the molecule to be sequenced;

performing rolling circle amplification using primers specific to the closed circular double-stranded DNA molecules, cDNA molecules or RNA molecules, such that only one single-stranded DNA product containing multiple copies is obtained per circle as a first strand;

a complementary second strand is generated using the first strand as a template, thereby obtaining a double-stranded DNA product as a sequencing library for single molecule sequencing.

In some embodiments, the closed circular double stranded DNA or cDNA molecule is extrachromosomal circular DNA, or is formed by:

A) ligation of blunt-ended double-stranded DNA or cDNA molecules into closed loops by ligases, such as T4 DNA ligase or T4 RNA ligase;

B) ligation of sticky-ended double-stranded DNA into closed loops by TA ligation, for example using a T-bridged fragment with a dT sticky end at the 3' end, such as the fragment defined by SEQ ID NO: 8 and 9.

The T-bridged fragment used in the examples herein is represented by SEQ ID NO: 8 and 9. It consists of XcmI restriction site sequences at both ends and the ccdB gene in the middle, with an overhang of one T base at each end, as shown in FIG. 13.

TA ligation is a looping technique commonly used in the art, and ligation is performed by pairing between sticky ends T and A bases at the ends of two double-stranded molecules to be ligated, respectively.

The amplification enzymes used in rolling circle amplification are known to those skilled in the art, and include phi29 DNA polymerase, Bst DNA polymerase and Klenow enzyme. phi29 DNA polymerase is preferred for DNA molecules, and Bst 3.0 DNA polymerase is preferred for RNA molecules.

In some embodiments, the cDNA molecules are derived from total RNA of leukocytes (e.g., from peripheral blood, bone marrow, etc.). For immune repertoire studies, a miRNA linker (SEQ ID NO: 6) can be ligated to the 3' end of the cDNA; dsDNA can be obtained by multiplex amplification using specific primers (e.g., SEQ ID NO: 7, 21, 23-30); and/or dsDNA can be ligated into closed loops by a DNA ligase such as T4 DNA ligase, and rolling circle amplification can be performed using primers (e.g., SEQ ID NO: 22, 31-39) and phi29 DNA polymerase.

In some embodiments, the double-stranded DNA products are ligated into a loop by a ligase (e.g., T4 DNA ligase, T4 RNA ligase), or are looped using a T-bridged fragment, in which case the sequence of the specific primer may be SEQ ID NO: 20.

When phi29 DNA polymerase is used for rolling circle amplification, the specific primer may lack terminal modifications. Those skilled in the art know that phi29 DNA polymerase normally has 3' to 5' exonuclease activity, and that primer degradation by this activity can be prevented by phosphorothioate modifications at the 3' end. The inventors have found that, in rolling circle amplification for library construction, an excess of specific primers with unmodified ends can be added instead, preferably at 100-1000 μM, so that the primer-specific sites of the DNA strands to be sequenced are fully saturated, which further reduces cost.

In some embodiments, the complementary second strand of the first strand is produced by:

generating a poly-A sequence at the 3' end of the first strand using a terminal transferase;

using Oligo d(T)₂₀, which is complementary to the poly-A sequence of the first strand, as a primer, and a DNA polymerase (e.g., phi29 DNA polymerase, Bst DNA polymerase, or Klenow enzyme) to generate the second strand, forming a dsDNA product.

The inventors have found that dsDNA produced by the above method, when used for sequencing, further improves the sequencing results and improves accuracy.

The sequencing libraries generated by the construction methods described herein are suitable for single molecule sequencing, for example nanopore sequencing on the ONT platform or single molecule real-time sequencing on the PacBio platform. For third-generation single molecule sequencing, the resulting dsDNA products can be ligated to sequencing adapters, e.g., using the SQK-LSK109 ligation sequencing kit of the ONT sequencing platform, to obtain a sequencing library.

In a second aspect, there is provided a sequencing method comprising:

obtaining a sequencing library using the method of construction of the first aspect;

the library is sequenced using a single molecule sequencing method, e.g., nanopore platform sequencing such as the ONT platform or other single molecule real-time sequencing platform such as the PacBio platform sequencing.

The construction method or sequencing library can be used for immune repertoire sequencing, amplicon sequencing, extrachromosomal circular DNA sequencing and circular RNA sequencing research.

In a third aspect, there is provided a kit for sequencing library construction for single molecule sequencing, comprising:

1) specific primers for isothermal amplification, and

2) an enzyme for rolling circle amplification such as phi29 DNA polymerase, Bst DNA polymerase or Klenow enzyme, and

3) a T-bridged fragment with a dT cohesive end at the 3' end, such as a double-stranded DNA consisting of the sequences SEQ ID NO: 8 and 9, and a specific primer sequence therefor, SEQ ID NO: 20; and/or

4) a linker with a 5'-end rApp modification and a 3'-end NH₂ blocking modification, e.g., the miRNA linker of SEQ ID NO: 6, and a specific primer therefor, SEQ ID NO: 7.

In some embodiments, the kit further comprises a DNA or RNA ligase, such as T4 DNA ligase or T4 RNA ligase.

In some embodiments, the kit further comprises:

dATP and Oligo d(T)₂₀; and/or

specific primers SEQ ID NO: 21, 23-31 for immune repertoire amplification and specific primers SEQ ID NO: 22, 31-39 for rolling circle amplification.

Based on the disclosure herein, it will be understood by those skilled in the art that the specific primers of the present invention specifically bind to closed circular double-stranded DNA, cDNA or RNA molecules (only one binding site is present), and that each test molecule is amplified by the action of rolling circle amplification enzyme to give only one single-stranded DNA product containing multiple copies.
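The requirement that a specific primer has only one binding site on the circle can be checked computationally. The following Python sketch is one way to do so under stated assumptions: it counts exact matches only, scans both strands of a double-stranded circle, and includes windows that span the ligation junction. The function names and the example sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_circular_sites(circle, primer):
    """Count exact primer binding sites on a double-stranded circle.

    A window equal to the primer means the primer can anneal to the
    opposite strand there; a window equal to its reverse complement means
    it can anneal to the given strand. Windows wrap around the junction.
    """
    n, k = len(circle), len(primer)
    if k == 0 or k > n:
        raise ValueError("primer must be non-empty and no longer than the circle")
    extended = circle + circle[: k - 1]  # unroll across the junction
    rc = reverse_complement(primer)
    sites = 0
    for i in range(n):
        window = extended[i : i + k]
        sites += (window == primer) + (window == rc)
    return sites


# Hypothetical 20 nt circle; the first primer spans the junction.
circle = "ATGCCGTAGGCTTAACGGAT"
print(count_circular_sites(circle, "GATATG"))
print(count_circular_sites(circle, "GG"))
```

A primer is a candidate for single copy amplification in this simplified model only when the count is exactly 1; real primer design would also consider mismatches and annealing conditions.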

Before rolling circle amplification, specific molecules, such as miRNA linkers, T-bridged fragments, can be ligated to one end of the dsDNA or cDNA in order to design specific primers for the specific molecules for multiplex primer PCR amplification and/or rolling circle amplification. In addition, the linker and the bridging fragment can be used as a barcode (barcode) of a molecule to be sequenced to realize multi-sample mixed sequencing, and then the barcode is utilized to perform data splitting among samples.
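Data splitting by barcode can be sketched as follows in Python. This is a minimal illustration that assumes exact barcode matches and hypothetical barcode sequences and sample names; real single-molecule reads contain errors and would normally need error-tolerant matching.

```python
def demultiplex(reads, barcodes):
    """Assign each read to the sample whose barcode it contains.

    reads: list of read sequences.
    barcodes: mapping of sample name to barcode sequence (hypothetical).
    Reads matching no barcode, or more than one, are left unassigned.
    """
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        hits = [sample for sample, barcode in barcodes.items() if barcode in read]
        if len(hits) == 1:
            bins[hits[0]].append(read)
        else:
            bins["unassigned"].append(read)
    return bins


# Hypothetical barcodes carried by the linker or bridging fragment.
barcodes = {"sample1": "ACGTAC", "sample2": "TGCATG"}
reads = ["TTACGTACGG", "GGTGCATGAA", "CCCCCCCC"]
bins = demultiplex(reads, barcodes)
print(bins)
```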

Based on the disclosure herein, one skilled in the art understands that, alternatively, for highly diverse, low copy number molecules to be sequenced, such as immune repertoire molecules, multiplex primer PCR amplification using specific primers can be performed prior to rolling circle amplification to enrich for the molecules to be sequenced.

It will be appreciated by those skilled in the art that the specific primers used in rolling circle amplification can be designed against an attached specific sequence or against the sequence to be sequenced itself. Specific sequences within the sequence to be tested can be readily determined and primers designed for them. For example, the GenBank nucleotide databases are searched, sequence identity and similarity are identified using computer software such as BLASTN and BLASTX, and primers are designed using primer design software.

Those skilled in the art are familiar with various end modifications. For example, a 5' adenylation modification allows ligation to the 3' end of the cDNA by 5' App DNA/RNA thermostable ligase; the 3' end can be blocked to prevent it from ligating to other nucleic acid molecules; and a 5' terminal phosphorylation modification can be made for DNA ligase-mediated ligation of DNA fragments.

In specific embodiments, the TA ligation based dsDNA looping method comprises:

a) providing a double-stranded DNA bridged fragment with a phosphorylated 5' end and an overhanging dT base at the 3' end;

b) providing a dsDNA form of the molecule to be sequenced with a phosphorylated 5' end and an overhanging dA base at the 3' end, for example a dsDNA amplification product obtained by multiplex primer PCR amplification using 5'-phosphorylated primers, which carries an overhanging dA base at the 3' end;

c) looping the bridged fragment and the dsDNA using the principle of TA ligation;

d) removing the non-circular dsDNA by treatment with Lambda Exonuclease and Exonuclease III.

In a specific example, the T4 RNA ligase 1 based cDNA looping method comprises:

a) providing RNA to be detected and carrying out reverse transcription;

b) RNase A treatment to remove RNA from the reaction system;

c) T4 RNA ligase 1-mediated circularization of the cDNA;

d) removing the non-circular cDNA by Exonuclease I treatment.

In a specific embodiment, the rolling circle amplification method comprises:

a) obtaining a circular DNA form of the molecule to be sequenced;

b) synthesis of the first strand by rolling circle amplification with phi29 DNA polymerase using specific primers (e.g. primers directed against the bridged fragment);

c) successively incorporating multiple dATPs at the 3' end of the first strand using terminal transferase to form a poly-A sequence;

d) synthesis of the second strand with phi29 DNA polymerase, using an Oligo d(T)₂₀ primer that pairs complementarily with the poly-A sequence of the first strand.
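Steps b) to d) can be modelled at the sequence level with a toy Python sketch. It is a simplification under explicit assumptions: the circle is written in the sense of the strand being synthesized, the primer matches it exactly at one site, the number of copies and the poly-A tail length (20 here) are hypothetical parameters, and the second strand is simply the reverse complement of the tailed first strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_first_strand(circle, primer, copies):
    """Concatemer that starts at the primer site and repeats the circle."""
    doubled = circle + circle  # lets the primer site span the junction
    start = doubled.find(primer)
    if start == -1:
        raise ValueError("primer site not found on the circle")
    unit = doubled[start : start + len(circle)]
    return unit * copies


def add_poly_a(strand, tail_length=20):
    """Terminal transferase step: append a poly-A tail at the 3' end."""
    return strand + "A" * tail_length


def second_strand(tailed_first_strand):
    """Oligo d(T)-primed synthesis copies the whole tailed first strand."""
    return reverse_complement(tailed_first_strand)


# Hypothetical 8 nt circle, 3 copies.
first = rolling_circle_first_strand("ATGCCGTA", "CCG", copies=3)
tailed = add_poly_a(first)
second = second_strand(tailed)
print(first)
print(second)
```

In this model the second strand begins with the oligo d(T) stretch and then carries the complement of every copy, so both strands of the dsDNA product contain the same number of copies.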

In a specific embodiment, the method for studying the full-length transcriptome of the immune repertoire TCR/BCR comprises:

a) providing total RNA of white blood cells in a sample to be tested;

b) reverse transcribing the mRNA using an Oligo d(T)₂₀ primer to obtain cDNA;

c) treating with RNase A to remove RNA from the reaction system;

d) ligating an adenylated linker to the 3' end of the cDNA using 5' App DNA/RNA thermostable ligase;

e) performing multiplex primer PCR amplification with the cDNA as template, using 5'-phosphorylated specific primers (primers for the adenylated linker and/or the T cell receptor and/or B cell receptor C region), such as SEQ ID NO: 7, 21, 23-30;

f) removing the single overhanging dA introduced at the 3' end by multiplex primer PCR amplification, using T4 DNA polymerase;

g) circularizing the product of the previous step using T4 DNA ligase;

h) treating with Lambda Exonuclease and Exonuclease III to remove non-circular DNA;

i) synthesizing the first strand by rolling circle amplification with phi29 DNA polymerase using specific primers for the T cell receptor and/or B cell receptor C region, e.g. SEQ ID NO: 22, 31-39;

j) successively incorporating multiple dATPs at the 3' end of the first strand using terminal transferase to form a poly-A sequence;

k) synthesizing the second strand with phi29 DNA polymerase, using Oligo d(T)₂₀ paired complementarily with the poly-A sequence of the first strand.

The dsDNA product is then used for library construction and sequencing on a third-generation sequencing platform, for example by following the instructions of the ONT SQK-LSK109 ligation sequencing kit and sequencing on the matching sequencing instrument.

By carrying out specific rolling circle amplification on the circular template, each circle is only amplified to obtain a long double-stranded DNA product containing multiple copies, so that high-precision sequencing read length is obtained, the high error rate of base reading of a third-generation sequencing platform is well corrected, the data redundancy and amplification preference brought by the conventional rolling circle amplification technology are eliminated, the relative quantification of molecules to be detected can be realized, and the cost is reduced.

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. It is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Drawings

FIG. 1 is a detailed schematic of the TA ligation based dsDNA looping technique.

FIG. 2 is a detailed schematic of the cDNA looping technique based on T4 RNA ligase 1.

FIG. 3 is a detailed schematic of primer-specific rolling circle amplification.

FIG. 4 is a schematic diagram of the technical process of the TCR/BCR full-length transcriptome study.

FIG. 5 shows the sequencing accuracy of libraries constructed by the library construction method herein. raw read 1-8 are 8 randomly selected base sequences (each corresponding to one nanopore) obtained by sequencing according to the ONT official LSK-109 library construction instructions; consensus read 1-5 are 5 randomly selected base sequences (each corresponding to one nanopore) obtained by the sequencing scheme of Example 1 of the invention; and sanger-sequencing-result is the true sequence of the molecule to be tested (obtained by first-generation sequencing). A: multiple sequence alignment of randomly selected consensus sequences generated by the invention, results obtained by the ONT platform official sequencing process, and the first-generation sequencing data (Sanger sequencing result). B: pairwise alignment of the generated consensus sequences and of the ONT platform official sequencing results against the first-generation sequencing data.

FIG. 6 shows the relative quantitative capability of sequencing by constructing libraries by the library construction methods herein.

FIG. 7 shows the analysis of the sequencing results of the method of the invention in Example 2. A: CDR3 length analysis; B: diversity analysis; C: clonality evaluation, in which clonotypes are divided by frequency into low frequency (small), medium frequency (medium), high frequency (large) and ultra-high frequency (hyperexpanded), showing the proportion (relative abundance) of clonotypes in each frequency class.

FIG. 8 shows the sequencing results of the commercial second-generation immune repertoire library in Example 2. A: CDR3 length analysis; B: diversity analysis; C: clonality evaluation, in which clonotypes are divided by frequency into low frequency (small), medium frequency (medium), high frequency (large) and ultra-high frequency (hyperexpanded), showing the proportion (relative abundance) of clonotypes in each frequency class.

FIG. 9 shows the analysis of the sequencing results of the method of the invention in Example 3. A: CDR3 length analysis; B: diversity analysis; C: clonality evaluation, in which clonotypes are divided by frequency into low frequency (small), medium frequency (medium), high frequency (large) and ultra-high frequency (hyperexpanded), showing the proportion (relative abundance) of clonotypes in each frequency class.

FIG. 10 shows the sequencing results of the commercial second-generation immune repertoire library in Example 3. A: CDR3 length analysis; B: diversity analysis; C: clonality evaluation, in which clonotypes are divided by frequency into low frequency (small), medium frequency (medium), high frequency (large) and ultra-high frequency (hyperexpanded), showing the proportion (relative abundance) of clonotypes in each frequency class.

FIG. 11 shows the analysis of the sequencing results of the method of the invention in Example 4. A: CDR3 length analysis; B: diversity analysis; C: clonality evaluation, in which clonotypes are divided by frequency into low frequency (small), medium frequency (medium), high frequency (large) and ultra-high frequency (hyperexpanded), showing the proportion (relative abundance) of clonotypes in each frequency class.

FIG. 12 shows the sequencing results of the commercial second-generation immune repertoire library in Example 4. A: CDR3 length analysis; B: diversity analysis; C: clonality evaluation, in which clonotypes are divided by frequency into low frequency (small), medium frequency (medium), high frequency (large) and ultra-high frequency (hyperexpanded), showing the proportion (relative abundance) of clonotypes in each frequency class.

FIG. 13 shows the structure of the T-bridged fragment, in which the italic part indicates the XcmI cleavage site and the remaining part is the ccdB gene.

Detailed Description

Reference will now be made in detail to embodiments of the invention, one or more examples of which are described below. Each example is provided by way of explanation, not limitation, of the invention. In fact, it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the scope or spirit of the invention. For instance, features illustrated or described as part of one embodiment, can be used on another embodiment to yield a still further embodiment.

It is therefore intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents. Other objects, features and aspects of the present invention are disclosed in or are apparent from the summary of the invention herein. It is to be understood by one of ordinary skill in the art that the present section is merely a description of exemplary embodiments and is not intended as limiting the broader aspects of the present invention.

Example 1: sequencing accuracy and quantitative performance of constructed sequencing libraries

This example illustrates the accuracy and quantitative performance of sequencing with a library constructed by the library construction method of the present application, using a mixture of the commercially available plasmids Antimouse-pRSF, Antirabbitt-pRSF, Dsbc-pRSF, FUCA1_pRSF and INP-pMV (five plasmids mixed at a molar ratio of 1:1:1:20:80).

1. Specific primers SEQ ID NO: 1-5 are designed against specific sequences on the Antimouse-pRSF, Antirabbitt-pRSF, Dsbc-pRSF, FUCA1_pRSF and INP-pMV plasmids. After the five plasmids are mixed in the stated proportion, rolling circle amplification is carried out to synthesize the first strand ssDNA.

Plasmid DNA	10-100 ng
Specific primer (100 μM)	1-10 μL

After mixing, complementary pairing (annealing) is carried out: 95 °C for 5 min, 50 °C for 15 s, 30 °C for 15 s, and 20 °C for 10 min. The mixture is then placed temporarily on ice.
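The template and primer amounts above imply a large molar excess of primer, which can be estimated with a short Python calculation. The plasmid length of 4000 bp is a hypothetical value (plasmid sizes are not stated here), and 650 g/mol per base pair is the usual approximation for the average mass of double-stranded DNA.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair, common approximation for dsDNA


def dsdna_pmol(mass_ng, length_bp):
    """Picomoles of a double-stranded DNA of the given mass and length."""
    mol = mass_ng * 1e-9 / (length_bp * AVG_BP_MASS)
    return mol * 1e12


def primer_pmol(volume_ul, conc_um):
    """Picomoles of primer; 1 uM corresponds to 1 pmol per uL."""
    return volume_ul * conc_um


# Upper template amount and lower primer volume from the table above;
# the 4000 bp plasmid length is an assumption for illustration only.
template = dsdna_pmol(mass_ng=100, length_bp=4000)
primer = primer_pmol(volume_ul=1, conc_um=100)
excess = primer / template

print(f"template: {template:.4f} pmol")
print(f"primer:   {primer:.1f} pmol")
print(f"molar excess of primer: {excess:.0f}-fold")
```

Even at the lowest primer volume and the highest template mass, the primer is present in a several-thousand-fold molar excess under these assumptions, consistent with the aim of saturating the primer-specific sites.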

The specific primers SEQ ID NO 1-5 are synthesized and have the sequence:

Then the following were added thereto:

After mixing, the reaction was incubated at 30 °C for 18 h. The enzyme was then inactivated by treatment at 65 °C for 10 min.

The resulting first strand ssDNA was recovered by precipitation using the ethanol method.

2. Incorporation of multiple dATPs at the 3' end of the first strand ssDNA from step 1 using terminal transferase (TdT) to form a poly-A sequence:

10X TdT reaction buffer	5 μL
CoCl₂ (2.5 mM)	5 μL
ssDNA	0.1-10 μg
dATP (10 mM)	0.75 μL
TdT (NEB)	10-50 U
Nuclease-free water	To 50 μL

After mixing, the reaction was incubated at 37 °C for 0.5-1 h. The enzyme was then inactivated by treatment at 75 °C for 20 min.
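The final concentrations in the 50 μL tailing reaction follow directly from the stock concentrations and volumes in the table, using C1·V1 = C2·V2. A short Python check (the helper function name is illustrative):

```python
def final_conc(stock_conc, stock_volume_ul, total_volume_ul):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_UL = 50.0

datp_mm = final_conc(10.0, 0.75, TOTAL_UL)   # dATP, 10 mM stock
cocl2_mm = final_conc(2.5, 5.0, TOTAL_UL)    # CoCl2, 2.5 mM stock
buffer_x = final_conc(10.0, 5.0, TOTAL_UL)   # 10X TdT reaction buffer

print(f"dATP:   {datp_mm:.2f} mM")
print(f"CoCl2:  {cocl2_mm:.2f} mM")
print(f"buffer: {buffer_x:.0f}X")
```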

The resulting ssDNA was recovered by precipitation using the ethanol method.

3. Generation of the second strand with phi29 DNA polymerase, using Oligo d(T)₂₀ complementarily paired to the poly-A sequence of the first strand as the primer:

ssDNA	0.1-10 μg
Oligo d(T)₂₀ primer (100 μM)	0.5-5 μL

After mixing, the following temperature program is run: 95 °C for 5 min, 50 °C for 15 s, 30 °C for 15 s, and 20 °C for 10 min. The mixture is then placed temporarily on ice.

Then, the following were added thereto:

After mixing, the reaction was incubated at 30 °C for 24 h. The enzyme was then inactivated by treatment at 65 °C for 10 min.

The resulting dsDNA was recovered by precipitation using the ethanol method.

4. Using the SQK-LSK109 ligation sequencing kit of the ONT sequencing platform, end repair was performed and sequencing adapters were added as described in the instructions.

5. Sequencing was carried out on the matching ONT sequencing instrument.

The raw sequencing data containing long multi-copy sequences were processed with the C3POa algorithm (https://github.com/rvolden/C3POa) to obtain consensus sequences. The inventors then compared the sequencing results using the multiple sequence alignment software Clustal Omega and the NCBI blastn alignment software, in order to evaluate the base accuracy of the consensus sequences (shown in FIG. 5) and the quantitative capability of the invention (shown in FIG. 6).

Specifically, the consensus sequences obtained by the method of the present invention were compared with the sequencing results obtained with the library construction workflow of the official SQK-LSK109 ligation sequencing kit of the ONT sequencing platform (sequences from 8 nanopores, raw_read 1-8, each sequence corresponding to one nanopore, were randomly selected). It was found that self-correction across the multiple copies within each read significantly reduced the ONT base-calling error rate. FIG. 5A shows the multiple sequence alignment of randomly selected consensus sequences generated by the present invention and reads from the ONT platform official sequencing workflow against first-generation sequencing data (Sanger sequencing results). FIG. 5B shows pairwise alignments against the first-generation sequencing data for the results of the ONT platform official sequencing workflow (randomly selected sequences from 8 nanopores, raw_read 1-8, each corresponding to one nanopore) and for the consensus sequences generated by the present invention (randomly selected sequences from 5 nanopores, consensus reads 1-5, each corresponding to one nanopore). The multiple sequence alignment shows that the base error rate of the consensus sequences is lower than that obtained with the ONT platform official sequencing workflow. The pairwise alignments show that the identity (Identities) between the consensus sequences and the first-generation sequencing data, which serve as the gold standard, is 98-99%, with scores (Score) of 5879-6071; the identity and score of the sequencing data from the ONT platform official sequencing workflow are lower than those of the present invention.
The multiple sequence alignment displays the alignment at every base position, and shows that the base identity between the consensus sequences and the first-generation sequencing data is higher than that of the ONT platform official sequencing workflow.
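The self-correction idea can be illustrated with a minimal sketch. It is not the C3POa implementation: it assumes the copies from one read have already been split and aligned to equal length, and it uses a simple column-wise majority vote. The toy sequences and both function names are hypothetical.

```python
from collections import Counter


def majority_consensus(copies):
    """Column-wise majority vote over equal-length, pre-aligned copies."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to the same length")
    consensus = []
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


def percent_identity(query, reference):
    """Percentage of matching positions between two equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must have the same length")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


# Hypothetical reference and three copies, each with one error at a different position.
reference = "ATGGGCCATC"
copies = ["ATCGGCCATC", "ATGGGACATC", "ATGGGCCATG"]

consensus = majority_consensus(copies)
print(percent_identity(copies[0], reference))  # single copy: 90.0
print(percent_identity(consensus, reference))  # consensus: 100.0
```

Because the errors in the individual copies fall at different positions, each is outvoted by the other copies, so the consensus has a higher identity than any single copy.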

FIG. 6 shows that when the plasmids Antimouse-pRSF : Antirabbit-pRSF : Dsbc-pRSF : Dsbc-pRSF : Dsbc-pRSF were mixed at a ratio of 1:1:1:20:80, the ratio of the numbers of reads obtained by sequencing was about 8:8:9:160:672, which is basically consistent with the mixing ratio. This indicates that the constructed sequencing library has good quantitative capability.
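The agreement between the two ratios can be checked with a short calculation. The sketch below normalizes both ratios to their first component and reports the observed-to-expected fold for each plasmid; the function name is illustrative only.

```python
def normalize(values):
    """Scale a ratio so that its first component equals 1."""
    first = values[0]
    return [v / first for v in values]


mixed = [1, 1, 1, 20, 80]      # mixing ratio stated in the text
reads = [8, 8, 9, 160, 672]    # approximate read-count ratio stated in the text

expected = normalize(mixed)
observed = normalize(reads)
fold = [o / e for o, e in zip(observed, expected)]

print(observed)  # read ratio relative to the first plasmid
print(fold)      # values close to 1 indicate proportional recovery
```

With these numbers the observed ratio is 1 : 1 : 1.125 : 20 : 84, so every component lies within about 13% of the mixing ratio.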

In conclusion, when used for sequencing, the library construction method significantly improves the base-calling accuracy of the ONT platform and has good quantitative capability. Given its accuracy and quantitative capability, the method is expected to be useful for amplicon sequencing, immune repertoire sequencing and the like.

Example 2: Library construction by circularizing dsDNA into closed circles through TA ligation, and sequencing of the IGH gene

1. Construction of a bridged fragment with a dT overhang at the 3' end

The ccdB2 fragment from the commercial plasmid ccdB2-pMV was inserted into the pRSF-Duet1 vector by EcoRI and HindIII restriction digestion, giving a plasmid named ccdB2_RCA1.

The digestion system is as follows:

ccdB2-pMV/pRSF-Duet1	2～10 μg
EcoRI restriction endonuclease (NEB)	2 μL
HindIII restriction endonuclease (NEB)	2 μL
10X CutSmart Buffer (NEB)	4 μL
Nuclease-free water	to 40 μL

The digestion was carried out at 37 ℃ for 1 h.

The nucleic acid molecules of the corresponding fragment size were recovered using an agarose gel recovery kit.

The ligation system is as follows:

T4 DNA ligase (ThermoFisher)	2～5 U
10X T4 DNA ligase buffer	1 μL
ccdB2 fragment	about 500 ng
pRSF-Duet1 digestion product	about 500 ng
Nuclease-free water	to 10 μL

The ligation was carried out at room temperature for 2 h, and the product was transformed into chemically competent DH5α cells by heat shock.

The ccdB2_RCA1 plasmid was extracted using a ThermoFisher plasmid miniprep kit and digested with the restriction enzyme XcmI at 37 ℃. A fragment of about 303 bp was recovered from the digestion product by agarose gel electrophoresis to obtain the bridging fragment with a dT overhang at the 3' end. The digestion system is as follows:

ccdB2_RCA1	2～10 μg
XcmI restriction enzyme (NEB)	2 μL
10X CutSmart Buffer (NEB)	4 μL
Nuclease-free water	to 40 μL

The digestion was carried out at 37 ℃ for 1 to 3 h. The nucleic acid molecules of the corresponding fragment size were recovered using an agarose gel recovery kit. The resulting bridging fragment is a double-stranded DNA molecule with a dT overhang at the 3' end; the sequence of one strand is shown as SEQ ID NO: 8, and that of the complementary strand as SEQ ID NO: 9.

5'-TGTATGGATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCCCCCTGGCCAGTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCATGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCGCGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAACCATACAT-3'(SEQ ID NO:8)

5'-TGTATGGTTATATTCCCCAGAACATCAGGTTAATGGCGTTTTTGATGTCATTTTCGCGGTGGCTGAGATCAGCCACTTCTTCCCCGATAACGGAGACCGGCACACTGGCCATATCGGTGGTCATCATGCGCCAGCTTTCATCCCCGATATGCACCACCGGGTAAAGTTCACGGGAGACTTTATCTGACAGCAGACGTGCACTGGCCAGGGGGATCACCATCCGTCGCCCGGGCGTGTCAATAATATCACTCTGTACATCCACAAACAGACGATAACGGCTCTCTCTTTTATAGGTGTAAACCTTAAACTGCATCCATACAT-3'(SEQ ID NO:9)
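The dT overhang can be checked directly from the terminal bases of the two strands. The sketch below uses only the first 20 nt and the last 21 nt of SEQ ID NO: 8 and SEQ ID NO: 9 as given above; the helper names are hypothetical. It tests whether each 3' end consists of bases that pair with the partner strand's 5' end followed by one unpaired T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_single_dt_overhang(partner_5p, own_3p):
    """True if own_3p pairs with partner_5p except for one extra 3'-terminal T."""
    paired, overhang = own_3p[:-1], own_3p[-1]
    return overhang == "T" and paired == reverse_complement(partner_5p)


# Terminal segments copied from the sequences in the text.
SEQ8_5P = "TGTATGGATGCAGTTTAAGG"    # first 20 nt of SEQ ID NO: 8
SEQ8_3P = "CTGGGGAATATAACCATACAT"   # last 21 nt of SEQ ID NO: 8
SEQ9_5P = "TGTATGGTTATATTCCCCAG"    # first 20 nt of SEQ ID NO: 9
SEQ9_3P = "CCTTAAACTGCATCCATACAT"   # last 21 nt of SEQ ID NO: 9

print(has_single_dt_overhang(SEQ9_5P, SEQ8_3P))
print(has_single_dt_overhang(SEQ8_5P, SEQ9_3P))
```

Both checks return True for these segments, consistent with a fragment whose two 3' ends each carry one unpaired T available for TA ligation to dA-tailed PCR products.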

2. Total RNA extraction and generation of dsDNA

The peripheral blood leukocyte IGH gene of rheumatoid arthritis patients is taken as an example.

First, the red blood cells in the peripheral blood were removed using an erythrocyte lysis buffer (4.16 g NH₄Cl, 0.5 g KHCO₃ and 0.02 g disodium ethylenediaminetetraacetate, made up to 500 mL with nuclease-free water and adjusted to pH 7.2), and total RNA was then extracted from the peripheral blood leukocytes by the Trizol (Invitrogen) method.

The extracted total RNA was reverse transcribed using M-MLV Reverse Transcriptase (Invitrogen) according to the following procedure to obtain cDNA.

The reaction system is as follows:

Total RNA	1～5 μg
dNTP (10 mM)	1 μL
Oligo(dT)₂₀ primer (10 μM)	1 μL

The mixture was incubated at 65 ℃ for 5 min. The following were then added to the reaction mixture:

5X First Strand Buffer	4 μL
0.1 M DTT	2 μL

The mixture was incubated at 37 ℃ for 2 min. The following were then added to the reaction mixture:

M-MLV RT	1 μL
Nuclease-free water	to 20 μL

The reaction was carried out at 37 ℃ for 50 min, followed by 75 ℃ for 15 min to inactivate the enzyme. The cDNA product can be stored at 4 ℃ for short-term use; for long-term storage it should be kept at -80 ℃.

The cDNA was subjected to multiplex-primer PCR amplification using the QIAGEN Multiplex PCR Kit according to the following procedure:

The primers were synthesized with the following sequences, and all carry a phosphorylation modification at the 5' end:

The amplification program was as follows:

The dsDNA was recovered by precipitation using the ethanol method.

3. The bridging fragment of part 1 and the dsDNA recovered in part 2 were circularized by TA ligation. The reaction system is as follows:

10X T4 DNA Ligase Buffer	2 μL
T4 DNA ligase (ThermoFisher)	5～10 U
dsDNA	1～10 μg
Bridging fragment	2～10 μg
Nuclease-free water	to 20 μL

The reaction was carried out at room temperature for 0.5 to 2 h.

The reaction product, circular dsDNA, is recovered by precipitation using the ethanol method.

4. Non-circularized DNA was removed by treatment with Lambda Exonuclease and Exonuclease III:

DNA	0.5～10 μg
Lambda Exonuclease (NEB)	10～20 U
Exonuclease III (NEB)	20～50 U
10X CutSmart Buffer	2 μL
Nuclease-free water	to 20 μL

The mixture was treated at 37 ℃ for 8 to 16 h, and the enzymes were then inactivated at 70 ℃ for 20 min.

The cyclized reaction product is recovered by precipitation using the ethanol process.

5. Synthesis of first-strand ssDNA by rolling circle amplification using a primer specific for the bridging fragment:

Circularized DNA product	10～100 ng
Specific primer (100 μM)	1～10 μL

After mixing, the following temperature gradient was applied: 95 ℃ for 5 min, 50 ℃ for 15 s, 30 ℃ for 15 s and 20 ℃ for 10 min; the mixture was then immediately placed on ice.

The specific primer was synthesized, with the following sequence:

5'-CAGTTTAAGGTTTACACCTATAAAA-3'(SEQ ID NO:20)
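As a quick check of where this primer sits, the sketch below searches for SEQ ID NO: 20 in the first 60 nt of SEQ ID NO: 8 (copied from the sequence listing). The helper name is hypothetical.

```python
def locate(primer, template):
    """Return the 1-based (start, end) position of primer on template, or None."""
    index = template.find(primer)
    if index < 0:
        return None
    return index + 1, index + len(primer)


# First 60 nt of the bridging-fragment strand SEQ ID NO: 8.
BRIDGE_5P = "TGTATGGATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGT"
# Bridge-specific rolling circle amplification primer, SEQ ID NO: 20.
PRIMER = "CAGTTTAAGGTTTACACCTATAAAA"

print(locate(PRIMER, BRIDGE_5P))  # (11, 35)
```

The primer is identical to positions 11-35 of SEQ ID NO: 8, so by base pairing it can anneal only to the complementary strand (SEQ ID NO: 9) of the circle. This is consistent with the stated aim of obtaining a single first-strand product per circle.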

Then the following were added thereto:

After mixing, the mixture was treated at 30 ℃ for 18 to 36 h, and the enzyme was then inactivated at 65 ℃ for 10 min.

The reaction product, first strand ssDNA, is recovered by precipitation using the ethanol method.

6. Incorporation of multiple dATPs (polyA sequences) at the 3' end of the first strand obtained in section 5 using terminal transferase:

The ssDNA in the reaction product is recovered by precipitation using the ethanol method.

7. Oligo(dT)₂₀, which pairs complementarily with the poly-A sequence of the ssDNA formed in part 6, was used as a primer, and the second strand was synthesized by phi29 DNA polymerase (NEB):

ssDNA	0.5～10 μg
Oligo(dT)₂₀ primer (100 μM)	1～10 μL

Then, the following were added thereto:

After mixing, the mixture was treated at 30 ℃ for 24 to 72 h, and the enzyme was then inactivated at 65 ℃ for 10 min.

The dsDNA product was recovered by precipitation using the ethanol method.

9. End repair and sequencing adapter ligation were performed using the SQK-LSK109 rapid ligation sequencing kit of the ONT sequencing platform, as described in the kit instructions.

10. Sequencing was carried out on the corresponding ONT sequencer.

The off-line fastq file was processed with the C3POa algorithm to generate consensus sequences for IGH analysis, and the CDR3 analysis results for IGH were compared with those obtained with a commercial second-generation immune repertoire sequencing protocol (Eggetakang corporation). The commercial protocol studies CDR3 sequences at the DNA level; in contrast, the present invention works at the mRNA level and obtains not only CDR3 information but also full-length transcripts. In addition, more nonfunctional CDR3 sequences are present at the DNA level, whereas few nonfunctional CDR3 sequences are present at the mRNA level.

Specifically, the consensus sequences generated by the method of the invention, or the assembled reads from the commercial sequencing protocol, were aligned against the immune repertoire database using the MiXCR software (https://mixcr.readthedocs.io/en/master/), and the R package immunarch (https://immunarch.com) was then used to perform CDR3 length analysis, heterogeneity analysis and clonality evaluation based on the CDR3 regions. The analysis results of the present invention are shown in FIG. 7, and those of the commercial second-generation immune repertoire sequencing protocol are shown in FIG. 8.

The CDR3 length analysis in FIG. 7A shows that CDR3 lengths are concentrated in the range of 10-30 bp; compared with the commercial second-generation immune repertoire sequencing result in FIG. 8A, the method of the present invention detects longer CDR3 sequences. The heterogeneity analysis in FIG. 7B shows that nearly 25000 clonotypes were detected, indicating that the present invention has the potential to detect a large number of clonotypes; compared with the commercial second-generation immune repertoire sequencing result in FIG. 8B, the method of the present invention detects more clonotypes. In the clonality evaluation of the method of the present invention shown in FIG. 7C, most clones are medium-frequency (medium) or low-frequency (small) clones, which is broadly consistent with the immune status of a patient with rheumatoid arthritis and with the patient's clinical diagnosis (rheumatoid arthritis). These results also agree with the commercial second-generation immune repertoire sequencing result in FIG. 8C.
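The three summaries discussed here (CDR3 length distribution, number of clonotypes, and clonotype frequency) can be sketched in a few lines. This is not the immunarch implementation: it is a simplified illustration that treats each distinct CDR3 nucleotide sequence as one clonotype, and the input reads and function names are hypothetical.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_seqs):
    """Map CDR3 length (in bases) to the number of reads with that length."""
    return dict(sorted(Counter(len(s) for s in cdr3_seqs).items()))


def clonotype_frequencies(cdr3_seqs):
    """Map each distinct CDR3 sequence (clonotype) to its share of all reads."""
    counts = Counter(cdr3_seqs)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}


# Hypothetical toy input; real input would be the CDR3 column of an alignment table.
reads = ["TGTGCGAGAGAT", "TGTGCGAGAGAT", "TGTGCGAAAGGGTAC", "TGTGCG"]

lengths = cdr3_length_distribution(reads)
freqs = clonotype_frequencies(reads)

print(lengths)                  # length distribution
print(len(freqs), "clonotypes")  # number of distinct clonotypes
print(max(freqs.values()))       # share of the most frequent clonotype
```

Grouping the frequencies into classes such as low, medium and high frequency then only requires choosing thresholds, which are left out here because the text does not state them.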

The above results show that the analysis results of the present invention are approximately consistent with the sequencing results of the commercial second generation immune repertoire, but the method of the present invention can detect more information, such as longer CDR3 sequence information, more clonotypes, and full-length transcriptome information for further analysis.

Example 3: Library construction by circularizing cDNA with T4 RNA ligase, and sequencing

1. Reverse transcription was performed on the total RNA of peripheral blood leukocytes extracted in part 2 of Example 2, using a primer for the IGK constant region, according to the following procedure:

Total RNA	1～5 μg
dNTP (2.5 mM)	1 μL
IGK-primer (10 μM)	1 μL

The IGK-primer was synthesized, with the sequence:

5'-GCGTTATCCACCTTCC-3' (SEQ ID NO: 21)

The mixture was incubated at 65 ℃ for 5 min. The following were then added to the reaction mixture:

5X First Strand Buffer	4 μL
0.1 M DTT	2 μL

The mixture was incubated at 37 ℃ for 2 min. The following were then added to the reaction mixture:

M-MLV RT (Invitrogen)	1 μL
Nuclease-free water	to 20 μL

The reaction was carried out at 37 ℃ for 50 min, followed by 75 ℃ for 15 min to inactivate the enzyme.

2. 1 μL of RNase A was added to the product of part 1, and the mixture was treated at room temperature for 3 to 6 h to remove the RNA remaining in the reaction. The resulting cDNA was recovered using 50 μL of Beckman RNAClean XP magnetic beads.

3. The recovered cDNA was circularized using T4 RNA ligase 1:

10X T4 RNA ligase buffer	5 μL
cDNA	0.5～10 μg
T4 RNA ligase 1 (NEB)	10～50 U
50% PEG8000	25 μL
ATP (10 μM)	4 μL
Nuclease-free water	to 50 μL

After mixing, the mixture was reacted overnight at 16 ℃, and the enzyme was then inactivated at 100 ℃ for 2 min.

The DNA in the reaction product was recovered by precipitation using the ethanol method.

4. Removal of non-circularized cDNA using Exonuclease I:

cDNA	0.5～10 μg
Exonuclease I (NEB)	10～50 U
10X reaction buffer	2 μL
Nuclease-free water	to 20 μL

After mixing, the mixture was treated at 37 ℃ for 1 to 6 h, and the enzyme was then inactivated at 80 ℃ for 20 min.

The resulting circular cDNA was recovered by precipitation using the ethanol method.

5. Rolling circle amplification using a primer specific for the IGK constant region to generate first-strand ssDNA:

Circular cDNA	10～100 ng
Specific primer (100 μM)	1～10 μL

The specific primer was synthesized, with the sequence:

5'-GAACTGTGGCTGCACCATCTGTC-3' (SEQ ID NO: 22).

Then, the following were added thereto:

6. Incorporation of multiple dATPs at the 3' end of the first strand from part 5 using terminal transferase:

10X TdT reaction buffer	5 μL
CoCl₂ (2.5 mM)	5 μL
ssDNA	0.5～10 μg
dATP (10 mM)	0.75 μL
TdT (NEB)	10～50 U
Nuclease-free water	to 50 μL
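The final concentrations in this 50 μL tailing reaction follow from simple dilution (stock concentration × stock volume / total volume). The sketch below works them out for the components whose stock concentrations are given in the table; the function name is illustrative.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Final concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_UL = 50.0

buffer_x = final_concentration(10.0, 5.0, TOTAL_UL)   # 10X buffer, 5 uL
cocl2_mm = final_concentration(2.5, 5.0, TOTAL_UL)    # 2.5 mM CoCl2, 5 uL
datp_mm = final_concentration(10.0, 0.75, TOTAL_UL)   # 10 mM dATP, 0.75 uL

print(f"buffer: {buffer_x:g}X")
print(f"CoCl2:  {cocl2_mm:g} mM")
print(f"dATP:   {datp_mm:g} mM")
```

Under these volumes the buffer is at 1X, CoCl₂ at 0.25 mM and dATP at 0.15 mM in the final reaction.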

The resulting ssDNA was recovered by precipitation using the ethanol method.

7. Oligo(dT)₂₀, which pairs complementarily with the poly-A sequence formed in part 6, was used as a primer, and the second strand was synthesized by phi29 DNA polymerase (NEB) to form a dsDNA product:

Then the following were added thereto:

The resulting dsDNA product is recovered by precipitation using the ethanol method.

The off-line fastq file was processed with the C3POa algorithm to generate consensus sequences, the consensus sequences were aligned against the immune repertoire database using the MiXCR software, and CDR3 length analysis, heterogeneity analysis and clonality evaluation were performed using the R package immunarch; the results are shown in FIG. 9 and FIG. 10. FIG. 9A shows that the CDR3 lengths obtained by the present invention are concentrated in the range of 10-15 bp, slightly shorter than in the commercial second-generation immune repertoire sequencing result of FIG. 10A but with a broadly similar distribution trend. The heterogeneity analysis of the method of the present invention in FIG. 9B shows that nearly 2500 clonotypes were detected; compared with the commercial second-generation immune repertoire sequencing result of FIG. 10B, the method of the present invention detects more clonotypes. In the clonality evaluation of the method of the present invention shown in FIG. 9C, high-frequency clonotypes account for less than 5% and most clonotypes are medium-frequency clonotypes, which is broadly consistent with the immune status of the rheumatoid arthritis patient and also agrees with the commercial second-generation immune repertoire sequencing result of FIG. 10C.

The above results show that the analysis results of the method of the present invention are broadly consistent with the CDR3-based commercial second-generation immune repertoire sequencing results, but the method detects more information, such as more clonotypes, and provides full-length transcriptome information for further analysis.

Example 4: Full-length transcriptome study of TCR and BCR

This example studies the full-length TCR and BCR transcriptome of the peripheral blood of a patient with acute lymphoblastic leukemia.

1. Total RNA from peripheral blood leukocytes was extracted according to the method of example 2.

2. Reverse transcription of mRNA with Oligo(dT)₂₀ primer to obtain cDNA:

Total RNA	1～5 μg
dNTP (10 mM)	1 μL
Oligo(dT)₂₀ primer (10 μM)	1 μL

5X First Strand Buffer	4 μL
0.1 M DTT	2 μL

The mixture was incubated at 37 ℃ for 2 min. The following were then added to the reaction mixture:

M-MLV RT (Invitrogen)	1 μL
Nuclease-free water	to 20 μL

The reaction was carried out at 37 ℃ for 50 min, followed by 75 ℃ for 15 min to inactivate the enzyme. The cDNA product can be stored at 4 ℃ for short-term use; for long-term storage it should be kept at -80 ℃.

3. 1 μL of RNase A was added, and the mixture was treated at room temperature for 1 to 6 h to remove the RNA remaining from the previous reaction. The resulting cDNA was recovered using 50 μL of Beckman RNAClean XP magnetic beads.

4. Ligation of an adenylated linker to the 3' end of the cDNA using 5' App DNA/RNA thermostable ligase:

cDNA	0.5～10 μg
Universal miRNA cloning linker (NEB) (10 μM)	2 μL
10X NEBuffer 1	2 μL
50 mM MnCl₂	2 μL
5' App DNA/RNA thermostable ligase (NEB)	2 μL
Nuclease-free water	to 20 μL

After mixing, the mixture was treated at 65 ℃ overnight, and the enzyme was then inactivated at 90 ℃ for 3 min.

Universal miRNA cloning linker sequence (SEQ ID NO: 6): 5'-rAppCTGTAGGCACCATCAAT-NH₂-3'.

Specific primer sequence complementary to the miRNA linker (SEQ ID NO: 7): 5'-ATTGATGGTGCCTACAG-3'.
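That the primer is complementary to the linker can be confirmed by taking the reverse complement of the linker's base sequence. The sketch below ignores the 5' rApp and 3' NH₂ modifications and compares bases only; the helper name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


LINKER = "CTGTAGGCACCATCAAT"   # SEQ ID NO: 6, base sequence without modifications
PRIMER = "ATTGATGGTGCCTACAG"   # SEQ ID NO: 7

print(reverse_complement(LINKER))            # ATTGATGGTGCCTACAG
print(reverse_complement(LINKER) == PRIMER)  # True
```

The primer is the exact reverse complement of the linker over all 17 bases, so it can prime from the linker ligated to the 3' end of the cDNA.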

The ligation product was recovered by precipitation using the ethanol method.

5. The cDNA was subjected to multiplex-primer PCR amplification using QIAGEN Multiplex PCR according to the following procedure:

The primers were synthesized with the following sequences, and all carry a phosphorylation modification at the 5' end:

Name	Sequence	SEQ ID NO
miRNA primer	5'-ATTGATGGTGCCTACAG-3'	7
TRB_C_5P	5'-CACGTGGTCGGGGWAGAAGC-3'	23
TRA_C_5P	5'-AGCTGGTACACGGCAGGGTC-3'	24
IGH_lgG_C_5P	5'-GAGTTCCACGACACCGTCAC-3'	25
IGH_lgA_C_5P	5'-GGCTCCTGGGGGAAGAAGCC-3'	26
IGH_lgE_C_5P	5'-TAGCCCGTGGCCAGGCAG-3'	27
IGH_lgD_C_5P	5'-CCCAGTTATCAAGCATGCCA-3'	28
IGH_lgM_C_5P	5'-GGGGAATTCTCACAGGAGAC-3'	29
IGL_C_5P	5'-GCTCCCGGGTAGAAGT-3'	30
IGK_C_5P	5'-GCGTTATCCACCTTCC-3'	21
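One primer in this table, TRB_C_5P, contains the IUPAC degenerate code W (A or T), so the synthesized oligonucleotide is a mixture of two sequences. The sketch below expands degenerate codes into the explicit sequences they represent; the function name is illustrative, and the code table is the standard IUPAC nucleotide alphabet.

```python
from itertools import product

# Standard IUPAC nucleotide codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """List every explicit sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


trb_variants = expand_degenerate("CACGTGGTCGGGGWAGAAGC")   # TRB_C_5P, SEQ ID NO: 23
igk_variants = expand_degenerate("GCGTTATCCACCTTCC")       # IGK_C_5P, SEQ ID NO: 21

print(trb_variants)
print(len(igk_variants))
```

For TRB_C_5P this yields two sequences that differ only at position 14; primers without degenerate codes, such as IGK_C_5P, expand to themselves.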

The amplification procedure was as follows:

The reaction product dsDNA was recovered by precipitation using the ethanol method.

6. Removal of the overhanging dA base added at the 3' end of the reaction product by the multiplex-primer PCR amplification, using T4 DNA polymerase:

10X NEBuffer 2.1	2 μL
dNTP (2.5 mM)	4 μL
DNA	0.5～10 μg
0.1% BSA	2 μL

The mixture was incubated at 70 ℃ for 5 min. The following were then added to the reaction mixture:

T4 DNA polymerase (NEB)	0.5～2 U
Nuclease-free water	to 20 μL

The reaction was carried out at 37 ℃ for 5 min, followed by 75 ℃ for 20 min to inactivate the enzyme.

7. The PCR products were circularized using T4 DNA ligase:

10X T4 DNA ligation buffer	2 μL
T4 DNA ligase (NEB)	10～20 U
DNA	0.5～10 μg
Nuclease-free water	to 20 μL

The mixture was treated at room temperature for 2 to 6 h.

The resulting circular dsDNA was recovered by precipitation using the ethanol method.

8. Non-circular DNA was removed by treatment with Lambda Exonuclease and Exonuclease III:

After mixing, the mixture was treated at 37 ℃ for 8 to 16 h, and the enzymes were then inactivated at 80 ℃ for 20 min.

The dsDNA was recovered by precipitation using the ethanol method.

9. Synthesis of first-strand ssDNA by rolling circle amplification using primers specific for the TCR/BCR constant regions:

After mixing, the following temperature gradient was applied: 95 ℃ for 5 min, 50 ℃ for 15 s, 30 ℃ for 15 s and 20 ℃ for 10 min. The mixture was immediately placed on ice, and the following were then added thereto:

The primers were synthesized, with the following sequences:

The reaction product ssDNA was recovered by precipitation using the ethanol method.

10. Terminal transferase was used to incorporate multiple dATPs at the 3' end of the synthesized first strand:

11. Oligo(dT)₂₀ was used as a primer, and the second strand was synthesized by phi29 DNA polymerase to form a dsDNA product:

ssDNA	0.5～10 μg
Oligo(dT)₂₀ (100 μM)	1～10 μL

Then, the following were added thereto:

The dsDNA product was recovered by precipitation using the ethanol method.

12. End repair and sequencing adapter ligation were performed using the SQK-LSK109 rapid ligation sequencing kit of the ONT sequencing platform, as described in the kit instructions.

13. Sequencing was carried out on the corresponding ONT sequencer.

The off-line fastq file was processed with the C3POa algorithm to generate consensus sequences, the consensus sequences were aligned against the immune repertoire database using the MiXCR software, and CDR3 length analysis, heterogeneity analysis and clonality evaluation were performed using the R package immunarch. The results are shown in FIG. 11 and FIG. 12. FIG. 11A shows that CDR3 lengths are distributed in the range of 10-30 bp; compared with the commercial second-generation immune repertoire sequencing result of FIG. 12A, the present invention detects longer CDR3 sequences. FIG. 11B and FIG. 11C show that the heterogeneity and clonality evaluations of the method of the present invention are consistent with the commercial second-generation immune repertoire sequencing results of FIG. 12B and FIG. 12C, and are broadly consistent with the immune status of a patient with acute lymphoblastic leukemia, in particular the TCR abnormalities. The clonality evaluation identified a TRB clonotype accounting for more than 5% of the repertoire, which is broadly consistent with a diagnosis of acute T-lymphoblastic leukemia and agrees with the subsequent clinical flow cytometry and bone marrow biopsy pathology results. This indicates that the present invention is expected to be useful for assisting clinical diagnosis.
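The 5% criterion used in the clonality evaluation can be expressed as a simple filter over clonotype read counts. The sketch below is an illustration only: the clonotype names and counts are hypothetical, and the function is not part of MiXCR or immunarch.

```python
def dominant_clonotypes(clone_counts, threshold=0.05):
    """Return clonotypes whose share of all reads exceeds the threshold."""
    total = sum(clone_counts.values())
    if total == 0:
        return {}
    return {
        clone: count / total
        for clone, count in clone_counts.items()
        if count / total > threshold
    }


# Hypothetical read counts per TRB clonotype.
trb_counts = {"TRB_clone_A": 120, "TRB_clone_B": 40, "TRB_clone_C": 840}

flagged = dominant_clonotypes(trb_counts)
print(sorted(flagged))  # ['TRB_clone_A', 'TRB_clone_C']
```

In this toy input, clonotypes A and C exceed 5% of the reads while B (4%) does not; on real data the same filter would list the expanded clonotypes for follow-up.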

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments illustrate only several implementations of the present invention. Their description is specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. The scope of protection of the present invention shall be subject to the appended claims.

Sequence listing

<110> Beijing Hospital

<120> sequencing library construction method based on rolling circle amplification and application thereof

<130> LZ2105657CN01

<160> 39

<170> PatentIn version 3.3

<210> 1

<211> 19

<212> DNA

<213> Artificial

<220>

<223> Antimouse-pRSF_RCA1

<400> 1

atgggccatc accatcatc 19

<210> 2

<211> 19

<212> DNA

<213> Artificial

<220>

<223> Antirabbit-pRSF_RCA1

<400> 2

tgggccatca ccatcatca 19

<210> 3

<211> 19

<212> DNA

<213> Artificial

<220>

<223> Dsbc-pRSF_RCA1

<400> 3

tgggccatca ccatcatca 19

<210> 4

<211> 20

<212> DNA

<213> Artificial

<220>

<223> FUCA1-pRSF_RCA1

<400> 4

agaaaagagt tagaagagca 20

<210> 5

<211> 20

<212> DNA

<213> Artificial

<220>

<223> INP-pRSF_RCA1

<400> 5

caccgttgaa agccgttact 20

<210> 6

<211> 17

<212> DNA

<213> Artificial

<220>

<223> Universal miRNA cloning linker sequence, 5 'is rApp, 3' is NH2

<400> 6

ctgtaggcac catcaat 17

<210> 7

<211> 17

<212> DNA

<213> Artificial

<220>

<223> specific primer sequence complementary to miRNA joint

<400> 7

attgatggtg cctacag 17

<210> 8

<211> 306

<212> DNA

<213> Artificial

<220>

<223> T-bridged fragment

<400> 8

tgtatggatg cagtttaagg tttacaccta taaaagagag agccgttatc gtctgtttgt 60

ggatgtacag agtgatatta ttgacacgcc cgggcgacgg atggtgatcc ccctggccag 120

tgcacgtctg ctgtcagata aagtctcccg tgaactttac ccggtggtgc atatcgggga 180

tgaaagctgg cgcatgatga ccaccgatat ggccagtgtg ccggtctccg ttatcgggga 240

agaagtggct gatctcagcc accgcgaaaa tgacatcaaa aacgccatta acctgatgtt 300

ctggggaata taaccataca t 321

<210> 9

<211> 305

<212> DNA

<213> Artificial

<220>

<223> T-bridged complementary sequences

<400> 9

tgtatggtta tattccccag aacatcaggt taatggcgtt tttgatgtca ttttcgcggt 60

ggctgagatc agccacttct tccccgataa cggagaccgg cacactggcc atatcggtgg 120

tcatcatgcg ccagctttca tccccgatat gcaccaccgg gtaaagttca cgggagactt 180

tatctgacag cagacgtgca ctggccaggg ggatcaccat ccgtcgcccg ggcgtgtcaa 240

taatatcact ctgtacatcc acaaacagac gataacggct ctctctttta taggtgtaaa 300

ccttaaactg catccataca t 321

<210> 10

<211> 24

<212> DNA

<213> Artificial

<220>

<223> IGHV1

<400> 10

cctcagtgaa ggtctcctgc aagg 24

<210> 11

<211> 24

<212> DNA

<213> Artificial

<220>

<223> IGHV2

<400> 11

tcctgcgctg gtgaaaccca caca 24

<210> 12

<211> 23

<212> DNA

<213> Artificial

<220>

<223> IGHV3

<400> 12

ggtccctgag actctcctgt gca 23

<210> 13

<211> 24

<212> DNA

<213> Artificial

<220>

<223> IGHV4

<400> 13

tcggagaccc tgtccctcac ctgc 24

<210> 14

<211> 21

<212> DNA

<213> Artificial

<220>

<223> IGHV5

<400> 14

cagtctggag cagaggtgaa a 21

<210> 15

<211> 24

<212> DNA

<213> Artificial

<220>

<223> IGHV6

<400> 15

cctgtgccat ctccggggac agtg 24

<210> 16

<211> 20

<212> DNA

<213> Artificial

<220>

<223> CHA

<400> 16

ggctcctggg ggaagaagcc 20

<210> 17

<211> 20

<212> DNA

<213> Artificial

<220>

<223> CHG

<400> 17

gagttccacg acaccgtcac 20

<210> 18

<211> 20

<212> DNA

<213> Artificial

<220>

<223> CHM

<400> 18

ggggaattct cacaggagac 20

<210> 19

<211> 24

<212> DNA

<213> Artificial

<220>

<223> IGHJ

<400> 19

acctgaggag acggtgacca gggt 24

<210> 20

<211> 25

<212> DNA

<213> Artificial

<220>

<223> bridge-specific primer

<400> 20

cagtttaagg tttacaccta taaaa 25

<210> 21

<211> 16

<212> DNA

<213> Artificial

<220>

<223> IGK-primers

<400> 21

gcgttatcca ccttcc 16

<210> 22

<211> 23

<212> DNA

<213> Artificial

<220>

<223> specific primer for IGK constant region

<400> 22

gaactgtggc tgcaccatct gtc 23

<210> 23

<211> 20

<212> DNA

<213> Artificial

<220>

<223> TRB_C_5P

<400> 23

cacgtggtcg gggwagaagc 20

<210> 24

<211> 20

<212> DNA

<213> Artificial

<220>

<223> TRA_C_5P

<400> 24

agctggtaca cggcagggtc 20

<210> 25

<211> 20

<212> DNA

<213> Artificial

<220>

<223> IGH_lgG_C_5P

<400> 25

gagttccacg acaccgtcac 20

<210> 26

<211> 20

<212> DNA

<213> Artificial

<220>

<223> IGH_lgA_C_5P

<400> 26

ggctcctggg ggaagaagcc 20

<210> 27

<211> 18

<212> DNA

<213> Artificial

<220>

<223> IGH_lgE_C_5P

<400> 27

tagcccgtgg ccaggcag 18

<210> 28

<211> 20

<212> DNA

<213> Artificial

<220>

<223> IGH_lgD_C_5P

<400> 28

cccagttatc aagcatgcca 20

<210> 29

<211> 20

<212> DNA

<213> Artificial

<220>

<223> IGH_lgM_C_5P

<400> 29

ggggaattct cacaggagac 20

<210> 30

<211> 16

<212> DNA

<213> Artificial

<220>

<223> IGL_C_5P

<400> 30

gctcccgggt agaagt 16

<210> 31

<211> 23

<212> DNA

<213> Artificial

<220>

<223> TCRB_RCA1

<400> 31

aggacctgaa maacgtgttc cca 23

<210> 32

<211> 24

<212> DNA

<213> Artificial

<220>

<223> TCRA_RCA1

<400> 32

atatccagaa ccctgaccct gccg 24

<210> 33

<211> 23

<212> DNA

<213> Artificial

<220>

<223> IGHC_lgG_RCA1

<400> 33

cytccaccaa gggcccatcg gtc 23

<210> 34

<211> 23

<212> DNA

<213> Artificial

<220>

<223> IGHC_lgA_RCA1

<400> 34

catccccgac cagccccaag gtc 23

<210> 35

<211> 23

<212> DNA

<213> Artificial

<220>

<223> IGHC_lgE_RCA1

<400> 35

cctccacaca gagcccatcc gtc 23

<210> 36

<211> 23

<212> DNA

<213> Artificial

<220>

<223> IGHC_lgD_RCA1

<400> 36

cacccaccaa ggctccggat gtg 23

<210> 37

<211> 18

<212> DNA

<213> Artificial

<220>

<223> IGHC_lgM_RCA1

<400> 37

ggagtgcatc cgccccaa 18

<210> 38

<211> 22

<212> DNA

<213> Artificial

<220>

<223> IGLC_RCA1

<400> 38

cactctgttc ccrccctcct ct 22

<210> 39

<211> 24

<212> DNA

<213> Artificial

<220>

<223> IGLC4_RCA1

<400> 39

acaaggccac actggtgtgt ctca 24

Claims

1. A method of constructing a sequencing library for single molecule sequencing, comprising:

providing a closed circular double-stranded DNA molecule, cDNA molecule or RNA molecule form of a molecule to be sequenced;

2. The method of claim 1, wherein the closed circular double stranded DNA or cDNA molecule is extrachromosomal circular DNA or is formed by:

3. The method of claim 1 or 2, wherein the rolling circle amplification uses phi29DNA polymerase, Bst DNA polymerase or Klenow enzyme.

4. The method according to any one of claims 1 to 3, wherein the cDNA is obtained by reverse transcription of total RNA of leukocytes, and the specific primer used for rolling circle amplification is SEQ ID NO:22 and 31-39.

5. The method of claim 4, wherein a single-stranded DNA linker having the sequence of SEQ ID NO: 6 is ligated to the 3' end of the cDNA molecule obtained by reverse transcription, and multiplex amplification is performed using the primers of SEQ ID NOs: 7, 21 and 23-30.

6. The method according to any one of claims 1 to 5, wherein when the double-stranded DNA is circularized by using a T-bridged fragment, the sequence of the specific primer is SEQ ID NO: 20.

7. The method of any one of claims 1 to 6, wherein phi29 DNA polymerase is used in the rolling circle amplification and the specific primer is free of terminal modifications.

8. The method of construction according to any one of claims 1 to 7 wherein the complementary second strand of the first strand is produced by:

using Oligo(dT)₂₀, which is complementary to the poly-A sequence of the first strand, as a primer, and using a DNA polymerase to generate the second strand.

9. The method of claim 8, wherein the DNA polymerase is phi29DNA polymerase, Bst DNA polymerase or Klenow enzyme.

10. The construction method of any one of claims 1 to 9, further comprising ligating double stranded DNA products to a sequencing adaptor to obtain the sequencing library.

11. The method of claim 10, wherein the sequencing adaptor is ligated to the double-stranded DNA product using a ligation sequencing kit of the ONT platform.

12. The construction method of claim 1, wherein the sequencing library is used for single molecule sequencing, e.g., nanopore platform sequencing such as ONT platform or other single molecule real-time sequencing platform such as PacBio platform sequencing.

13. A sequencing method, comprising:

obtaining a sequencing library using the construction method of any one of claims 1 to 12;

14. Use of the construction method according to any one of claims 1 to 12 or the sequencing method according to claim 13 in immune repertoire sequencing, amplicon sequencing, extrachromosomal circular DNA sequencing or circular RNA sequencing.

15. A kit for sequencing library construction for single molecule sequencing comprising:

1) specific primers for rolling circle amplification; and

2) an enzyme for rolling circle amplification such as phi29DNA polymerase, Bst DNA polymerase or Klenow enzyme; and

3) a T-bridged fragment with a dT cohesive end at the 3' end, such as a double-stranded DNA consisting of the sequences SEQ ID NO 8 and 9 and a specific primer sequence therefor SEQ ID NO 20; and/or

4) a linker with a 5'-end rApp modification and a 3'-end NH₂ blocking modification, such as the miRNA linker of SEQ ID NO: 6, and a specific primer therefor of SEQ ID NO: 7.

16. The kit of claim 15, further comprising:

a DNA or RNA ligase, such as T4 DNA ligase or T4 RNA ligase.

17. The kit of claim 15 or 16, further comprising:

dATP and Oligo(dT)₂₀; and/or

specific primers SEQ ID NO: 7, 21 and 23-30 for multiplex-primer PCR amplification of immune repertoire cDNA, and specific primers SEQ ID NO: 22 and 31-39 for rolling circle amplification.