CN117844905A

CN117844905A - Method for constructing sequencing library for detecting full length of poly (A) tail contained in RNA

Info

Publication number: CN117844905A
Application number: CN202311699882.0A
Authority: CN
Inventors: 何元林; 江玥; 王曦
Original assignee: Suzhou Nanyi University Innovation Center; Nanjing Medical University
Current assignee: Suzhou Nanyi University Innovation Center; Nanjing Medical University
Priority date: 2023-12-11
Filing date: 2023-12-11
Publication date: 2024-04-09

Abstract

The invention relates to a method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail. The method comprises: preparing total RNA of a sample to be tested; performing terminal extension on the total RNA in a poly(A) polymerase reaction system; directly carrying out reverse transcription and template switching on the terminally extended product in a reverse transcription reaction system, without RNA purification, wherein the reverse transcription reaction system comprises dCTP, a reverse transcription primer and a TSO primer, the sequence of the reverse transcription primer comprises a molecular tag, and the sequence of the TSO primer comprises a molecular tag and a spacer sequence; carrying out PCR amplification on the product; and sequencing the amplified product to obtain a sequencing library of the RNA having poly(A) tails in the sample to be tested. Because the RNA does not need to be purified before PCR, loss of trace RNA is reduced and the method is suitable for trace samples; the method also improves template-switching efficiency, reduces invalid reads, and enables accurate quantification of transcripts.

Description

Method for constructing sequencing library for detecting full length of poly (A) tail contained in RNA

Technical Field

The invention belongs to the technical field of biology, and particularly relates to a method for constructing a sequencing library for detecting the full length of an RNA poly (A) tail.

Background

Metabolism of eukaryotic mRNA is a complex process involving coordinated control of transcription, splicing, capping, and 3' end formation. The structure of the mature mRNA includes a 5' cap, a 3' poly(A) tail, untranslated regions (UTRs) and an open reading frame (ORF). The 3' poly(A) tail plays a role in regulating mRNA nuclear export, mRNA translation initiation, and mRNA stability. In fields such as RNA vaccine and drug research and early embryo development research, it is generally necessary to detect and analyze the integrity of mRNA containing a poly(A) tail and the length of that tail.

NGS-based methods for detecting poly(A) tails include PAL-seq (poly(A)-tail length profiling by sequencing), TAIL-seq, FLAM-seq (full-length poly(A) and mRNA sequencing), PAIso-seq (poly(A) inclusive RNA isoform sequencing), PAIso-seq2, and the like. TAIL-seq and PAL-seq were developed on second-generation sequencing, which sequences and identifies the poly(A) region with poor accuracy; these methods detect only the poly(A) region, cannot assess the integrity or accuracy of the RNA, and have an effective read length of only 231 bp for poly(T).

FLAM-seq, PAIso-seq and PAIso-seq2 are all experimental protocols optimized from the highly sensitive Smart-seq2. The core of this technology is template switching at the 5' end of the RNA transcript, which relies on an intrinsic property of Moloney murine leukemia virus (MMLV) reverse transcriptase and on a specially designed template-switch oligonucleotide (TSO). During first-strand synthesis, reverse transcription is initiated from the reverse transcription primer; when the MMLV reverse transcriptase reaches the 5' end of the RNA template, its terminal transferase activity adds a few extra nucleotides (mainly deoxycytidine, CCC) to the 3' end of the newly synthesized cDNA strand. These CCC bases serve as the anchor site for the GGG of the TSO. After base pairing between the TSO and the added deoxycytidine stretch, the reverse transcriptase switches template from the original RNA to the TSO and continues synthesis to the 5' end of the TSO. The resulting cDNA therefore contains the complete 5' end of the transcript, and the chosen universal sequence is added to the reverse transcription product.

FLAM-seq requires nanogram quantities of RNA and therefore cannot handle trace RNA samples.

PAIso-seq and PAIso-seq2 can detect the poly(A) tail in trace samples. Their experimental workflow comprises the following steps: RNA end extension, digestion of excess extension primers, RNA purification, reverse transcription and template switching, PCR amplification, and third-generation library construction.

However, this workflow requires purification of the RNA before reverse transcription and template switching, so some RNA is inevitably lost. The efficiency of template switching is typically below 10%, mainly because MMLV also adds non-deoxycytidine nucleotides to the 3' end of the newly synthesized cDNA, and these cannot anchor the TSO. In addition, the TSOs currently in use have a defect: the TSO can itself act as a primer during reverse transcription, so that the resulting amplified products carry no 3'-terminal poly(A) tail information, which seriously reduces the proportion of usable data. Furthermore, the reverse transcription anchor primers used previously carry no UMI information, so transcripts cannot be precisely quantified. There is therefore a need for an optimized protocol that allows accurate quantification of transcripts.

Disclosure of Invention

The invention aims to provide a method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail, which is suitable for trace RNA, improves template-switching efficiency, and allows accurate quantification of transcripts.

In order to achieve the above purpose, the invention adopts the following technical scheme:

In a first aspect, the invention provides a method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail, comprising the following steps:

(1) Preparing total RNA of a sample to be tested;

(2) Performing terminal extension on the total RNA in a PolyA polymerase reaction system;

(3) Directly carrying out reverse transcription and template switching on the terminally extended product in a reverse transcription reaction system, without RNA purification, wherein the reverse transcription reaction system comprises dCTP, a reverse transcription primer and a TSO primer, the sequence of the reverse transcription primer comprises a molecular tag, and the sequence of the TSO primer comprises a molecular tag and a spacer sequence;

(4) Carrying out PCR amplification on the product obtained in the step (3);

(5) Sequencing the amplified product of step (4) to obtain a sequencing library of RNA having poly (A) tails in the sample to be tested.

According to some embodiments, the total RNA in step (1) has a mass of 0.01 ng or more, e.g., 0.01 ng to 1 μg. The total RNA in step (1) may be derived from a single cell, as in Example 1, where the RNA is extracted from a single cell.

According to some embodiments, the concentration of dCTP is 10 to 120 mM, preferably 30 to 120 mM, more preferably 50 to 120 mM, even more preferably 80 to 120 mM, e.g. 80 mM, 85 mM, 90 mM, 95 mM, 100 mM, 105 mM, 110 mM, 115 mM or 120 mM.

According to some embodiments, the molecular tag in the reverse transcription primer is NNNNNNND, wherein each N is independently one of A, T, C and G, and D is one of A, G and T.

According to some further embodiments, the sequence of the reverse transcription primer is

ACGACGCTCTTCCGATCTTGTACCTTNNNNNNNDCCCCCCCCCTTT.
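As an illustration of how the degenerate tag in this primer can be read back from a sequence, the following minimal sketch converts the primer into a regular expression and extracts the 8-base tag. It assumes the input is already in primer orientation; the function names and the example oligo are hypothetical and are not part of the disclosed method.

```python
import re

# IUPAC codes used by the primer: N is any base, D is any base except C.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "D": "[AGT]"}

FIXED_5P = "ACGACGCTCTTCCGATCTTGTACCTT"  # fixed 5' portion of the primer
TAG = "NNNNNNND"                         # molecular tag
C9T3 = "C" * 9 + "T" * 3                 # C9T3 portion at the 3' end


def to_regex(degenerate: str) -> str:
    """Translate a degenerate oligo into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in degenerate)


RT_PRIMER_PATTERN = re.compile(
    to_regex(FIXED_5P) + "(?P<umi>" + to_regex(TAG) + ")" + to_regex(C9T3)
)


def extract_rt_umi(sequence: str):
    """Return the 8-base tag if the primer is found, otherwise None."""
    match = RT_PRIMER_PATTERN.search(sequence.upper())
    return match.group("umi") if match else None


# Hypothetical oligo carrying the tag ACGTACGA (last base is a valid D).
example = FIXED_5P + "ACGTACGA" + C9T3
print(extract_rt_umi(example))
```

A tag whose last base is C does not satisfy the D position, so such a sequence yields no match in this sketch.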

According to some embodiments, the molecular tag in the TSO primer is NNNNNNNN, wherein each N is independently one of A, T, C and G, and the spacer sequence is one of CTAAC, CAGCA and ATAAC.

According to some further embodiments, the spacer sequence in the TSO primer is located at the 3' end of the molecular tag.

According to some further embodiments, the sequence of the TSO primer is

CAGTGGTATCAACCAGTNNNNNNNNSPACERGrGrG, wherein SPACER is the spacer sequence described above, and rG is a guanine ribonucleotide.
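The layout of the TSO primer (fixed 5' portion, 8-base molecular tag, spacer, then GrGrG) can be sketched as follows. This is only an illustration of the stated layout: the helper name `build_tso` is hypothetical, and the tag length of 8 follows the NNNNNNNN definition given above.

```python
SPACERS = ("CTAAC", "CAGCA", "ATAAC")  # the three spacer options listed in the text
TSO_5P = "CAGTGGTATCAACCAGT"           # fixed 5' portion of the TSO primer
TSO_3P = "GrGrG"                       # 3' end; rG denotes a guanine ribonucleotide


def build_tso(spacer: str, tag_length: int = 8) -> str:
    """Assemble the TSO primer string with the spacer placed 3' of the molecular tag."""
    if spacer not in SPACERS:
        raise ValueError(f"unknown spacer: {spacer}")
    return TSO_5P + "N" * tag_length + spacer + TSO_3P


for spacer in SPACERS:
    print(build_tso(spacer))
```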

According to some preferred embodiments, introducing an 8-base UMI into both the reverse transcription primer and the TSO primer yields synthesized cDNA carrying a 16-base UMI tag, so that up to 3.22×10⁹ molecular tags can be achieved (4⁷ × 3 × 4⁸ = 3.22×10⁹).
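The tag count above follows directly from the two degenerate patterns: each N contributes 4 choices and the single D contributes 3. A short check, in which the function name is illustrative only:

```python
# Number of bases allowed at each degenerate position.
CHOICES = {"N": 4, "D": 3}  # N = A/T/C/G, D = A/G/T


def tag_diversity(pattern: str) -> int:
    """Count the distinct sequences a degenerate tag pattern can take."""
    total = 1
    for symbol in pattern:
        total *= CHOICES[symbol]
    return total


rt_tags = tag_diversity("NNNNNNND")   # reverse transcription primer tag: 4**7 * 3
tso_tags = tag_diversity("NNNNNNNN")  # TSO primer tag: 4**8
combined = rt_tags * tso_tags         # paired 16-base tag

print(rt_tags, tso_tags, combined)
```

The product is 3,221,225,472, which rounds to the 3.22×10⁹ stated in the text.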

According to some embodiments, each 10 μL of the reverse transcription system comprises 1 μL of extension-reverse transcription buffer, 0.75 μL of 100 mM DTT, 0.27 μL of 100 mM dCTP, 3 μL of 5 M betaine, 0.09 μL of 1 M MgCl₂, 0.5 μL of 20 mM of the reverse transcription primer (C9T3_UMI_RT_primer), 0.1 μL of 100 mM of the TSO primer (SPACER_TSO_UMI_primer), 1 μL of reverse transcriptase, 1 μL of dNTP mixture (10 mM each), 0.2 μL of RNase inhibitor, and 2.09 μL of enzyme-free water.
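As a quick consistency check on this recipe, the sketch below sums the component volumes and derives the concentration that each added stock contributes to the 10 μL mix using C_final = C_stock × V_added / V_total. The 0.5 μL primer volume is the value stated in claim 8; the dictionary keys and function name are illustrative, and any contribution from the buffer itself is ignored.

```python
import math

# Component volumes (in microlitres) per 10 uL of reverse transcription system.
RT_MIX_UL = {
    "extension-reverse transcription buffer": 1.0,
    "DTT (100 mM)": 0.75,
    "dCTP (100 mM)": 0.27,
    "betaine (5 M)": 3.0,
    "MgCl2 (1 M)": 0.09,
    "reverse transcription primer": 0.5,
    "TSO primer": 0.1,
    "reverse transcriptase": 1.0,
    "dNTP mixture (10 mM each)": 1.0,
    "RNase inhibitor": 0.2,
    "enzyme-free water": 2.09,
}


def final_concentration(stock: float, volume_ul: float, total_ul: float) -> float:
    """Dilution formula: concentration contributed by a stock in the final volume."""
    return stock * volume_ul / total_ul


total_ul = sum(RT_MIX_UL.values())
dctp_mm = final_concentration(100.0, RT_MIX_UL["dCTP (100 mM)"], total_ul)
betaine_m = final_concentration(5.0, RT_MIX_UL["betaine (5 M)"], total_ul)

print(f"total volume: {total_ul:.2f} uL")
print(f"added dCTP in the mix: {dctp_mm:.2f} mM")
print(f"betaine in the mix: {betaine_m:.2f} M")
```

The volumes add up to 10 μL, and the added dCTP corresponds to about 2.7 mM in the 10 μL mix before it is combined with the extension product.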

According to some embodiments, each 3 μL of the PolyA polymerase reaction system comprises 0.5 μL of 10 mM GTP/ITP mixture, 1 μL of Poly(A) polymerase, 0.5 μL of extension-reverse transcription buffer, 0.2 μL of RNase inhibitor, and 0.8 μL of enzyme-free water.
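When several samples are processed together, the per-reaction volumes above are usually scaled into a master mix. The sketch below shows one way to do that; the 10% overage for pipetting loss and the function name are assumptions for illustration and are not specified in the source.

```python
import math

# Per-reaction volumes (in microlitres) for the 3 uL PolyA polymerase reaction system.
POLYA_MIX_UL = {
    "GTP/ITP mixture (10 mM)": 0.5,
    "Poly(A) polymerase": 1.0,
    "extension-reverse transcription buffer": 0.5,
    "RNase inhibitor": 0.2,
    "enzyme-free water": 0.8,
}


def scale_mix(per_reaction: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Scale per-reaction volumes to n reactions plus a fractional overage (assumed)."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 3) for name, volume in per_reaction.items()}


master_mix = scale_mix(POLYA_MIX_UL, n_reactions=8)
for name, volume in master_mix.items():
    print(f"{name}: {volume} uL")
print(f"dispense {sum(POLYA_MIX_UL.values()):.1f} uL per reaction")
```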

The extension-reverse transcription buffer may be a conventional buffer, for example a buffer comprising 50 mM Tris-HCl (pH 8.3), 75 mM KCl, 3 mM MgCl₂ and 20 mM DTT; it may also be the Life 5× first-strand synthesis buffer, for example a buffer comprising 50 mM NaCl, 10 mM Tris-HCl, 10 mM MgCl₂ and 1 mM DTT.

Preferably, the molar concentration ratio of GTP to ITP in the GTP/ITP mixture is 3:1.

According to some embodiments, the specific steps of step (1) are: the cell samples or embryo samples are lysed in 2 μL of 0.2% (v/v) Triton X-100, denatured at 72 °C for 3 min, and immediately placed on ice.

According to some embodiments, the system of step (2) is reacted at 37 °C for 1 to 2 hours, after which the Poly(A) polymerase is inactivated at 65 °C for 5 min and the product is stored briefly at 4 °C.

According to some embodiments, the system of step (3) is reacted at 42 °C for 2 h and then at 70 °C for 10 min to obtain full-length cDNA. The product of this step can be stored briefly (for example overnight) at 4 °C.

According to some embodiments, the reaction system of step (4) is subjected to PCR amplification in the presence of upstream and downstream primers to obtain a large amount of cDNA for subsequent experiments.

According to some embodiments, the number of PCR cycles in step (4) may be adjusted as required. For example, each cycle is 98 °C for 20 s, 67 °C for 20 s and 72 °C for 360 s, and the number of cycles is 10 to 16, e.g., 10, 11, 12, 13, 14, 15 or 16 cycles.
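Because the extension step is long (360 s), the cycle number has a noticeable effect on run time. The sketch below estimates the cycling time for the stated per-cycle steps. The step labels are assumptions, and the estimate excludes any initial denaturation, final extension or ramping time, none of which is given in the text.

```python
# Per-cycle steps as (label, temperature in degrees C, duration in seconds).
CYCLE_STEPS = (
    ("denaturation", 98, 20),  # label assumed
    ("annealing", 67, 20),     # label assumed
    ("extension", 72, 360),    # label assumed
)


def cycling_minutes(n_cycles: int) -> float:
    """Total time spent in the cycling steps, in minutes."""
    if not 10 <= n_cycles <= 16:
        raise ValueError("the text describes 10 to 16 cycles")
    seconds_per_cycle = sum(duration for _, _, duration in CYCLE_STEPS)
    return n_cycles * seconds_per_cycle / 60


for cycles in (10, 12, 14, 16):
    print(f"{cycles} cycles: {cycling_minutes(cycles):.1f} min")
```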

According to some embodiments, the method further comprises a purification step after step (4) and before step (5).

Further, the amplified cDNA obtained in step (4) is purified with AMPure magnetic beads. Specifically, a 0.8× to 1× volume of AMPure magnetic beads is mixed thoroughly with the product obtained in step (4) and left to stand at room temperature; optionally, the mixture is pipetted again to mix and left to stand at room temperature; the standing time is, for example, 5 to 10 min. The tube is then placed on a magnetic rack, the supernatant is removed once the liquid has cleared, 80% ethanol is added, the tube is left to stand, and the supernatant is discarded; the standing time is preferably 30 seconds; this magnetic separation step is repeated. The beads are then dried at room temperature until their surface is no longer glossy but has not cracked. Eluent is added, mixed by pipetting, and left to stand at room temperature (25 °C) for 5 min. The centrifuge tube is placed on a magnetic rack and left to stand until the liquid is clear, and the supernatant is transferred to a new centrifuge tube; the clearing time is, for example, 2 to 5 min.
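The bead volume follows from the stated 0.8× to 1× ratio and the volume of the PCR product. A minimal sketch, with an illustrative function name:

```python
import math


def bead_volume_range(sample_ul: float, low_ratio: float = 0.8, high_ratio: float = 1.0):
    """Return the (minimum, maximum) bead volume in microlitres for a sample volume."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return sample_ul * low_ratio, sample_ul * high_ratio


low, high = bead_volume_range(35.0)  # a 35 uL PCR product, as in Example 1
print(f"add {low:.1f} to {high:.1f} uL of beads")
```

For the 35 μL PCR product of Example 1, this gives 28 to 35 μL of beads; the example uses 35 μL, i.e. a 1× ratio.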

According to some embodiments, the specific method of step (5) is: constructing a SMRTbell Template library using the SMRTbell Template Prep Kit (PacBio), and sequencing the SMRTbell Template library on a PacBio platform to obtain the full-length sequencing result, including the poly(A) tail, for the sample to be tested.

Based on the sequencing results, the full-length RNA sequence of the target gene and the poly (A) tail sequence thereof in the sample can be accurately analyzed.
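To make the tail-length readout concrete, the following toy sketch counts the run of A residues at the 3' end of a read. It assumes the read is already oriented in the mRNA sense and that the primer sequence and the G/I extension have been trimmed. It does not tolerate sequencing errors or non-A residues inside the tail and is not the analysis pipeline used by the inventors; the function name and reads are hypothetical.

```python
from statistics import median


def terminal_a_run(sequence: str) -> int:
    """Length of the uninterrupted run of A at the 3' end of a sense-oriented read."""
    seq = sequence.upper()
    return len(seq) - len(seq.rstrip("A"))


# Hypothetical trimmed reads for one gene.
reads = [
    "GATTACAGGCT" + "A" * 8,
    "GATTACAGGCT" + "A" * 3,
    "GATTACAGGCT",
]
tail_lengths = [terminal_a_run(read) for read in reads]
print(tail_lengths, median(tail_lengths))
```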

In a second aspect the invention provides the use of a method as described above for analysing the sequence of an RNA having a poly (A) tail.

Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:

By optimizing the steps of the library construction method and the reaction systems, the method does not require RNA purification before PCR and thus reduces the loss of trace RNA, making it suitable for trace samples; it also improves template-switching efficiency, reduces invalid reads, and enables accurate quantification of transcripts.
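One common way in which molecular tags support quantification is to count each distinct tag combination once per gene, so that PCR duplicates are not counted as separate molecules. The sketch below illustrates this idea under the assumption that each read has already been assigned a gene and had both tags extracted; the record layout, function name and tag values are hypothetical.

```python
from collections import defaultdict


def count_molecules(records):
    """Count distinct (RT tag, TSO tag) pairs per gene.

    records: iterable of (gene, rt_tag, tso_tag) tuples, one per read.
    """
    tags_by_gene = defaultdict(set)
    for gene, rt_tag, tso_tag in records:
        tags_by_gene[gene].add((rt_tag, tso_tag))
    return {gene: len(tags) for gene, tags in tags_by_gene.items()}


# Hypothetical reads: the first two share both tags and count as one molecule.
records = [
    ("Ccnb1", "ACGTACGA", "TTGCAGCA"),
    ("Ccnb1", "ACGTACGA", "TTGCAGCA"),
    ("Ccnb1", "GGTTACGT", "TTGCAGCA"),
    ("Cfl1", "ACGTACGA", "CCATGGTA"),
]
print(count_molecules(records))
```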

Drawings

FIG. 1 is a schematic flow chart of a library construction method of the present invention;

FIG. 2 is a graph showing the distribution of transcript poly(A) tail length in the three kinds of trace samples (GV, MII and zygote) in Example 1;

FIG. 3 is a graph showing the poly(A) tail length distribution of the Ccnb1, Cfl1 and Spin1 genes in GV, MII and zygote in Example 1.

Detailed Description

The invention is further described below with reference to examples. The present invention is not limited to the following examples. The implementation conditions adopted in the embodiments can be further adjusted according to different requirements of specific use, and the implementation conditions which are not noted are conventional conditions in the industry. The technical features of the various embodiments of the present invention may be combined with each other as long as they do not collide with each other.

Example 1: Analysis of transcript poly(A) tail length in mouse oocytes (GV, MII) and mouse fertilized eggs (zygote) using the method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail

1. Collecting oocyte and embryo samples of mice, adding them to 2 μL of 0.2% (v/v) Triton X-100, and lysing at 72 °C for 3 min;

2. Another 0.2 mL PCR tube was taken and labeled as RNA extension Master mix, and the reaction system was prepared as in Table 1:

TABLE 1

Component name	Volume (μL)
Extension-reverse transcription buffer	0.5
RRI (RNase inhibitor)	0.2
Poly(A) polymerase	1
GTP/ITP (10 mM)	0.5
Enzyme-free water	0.8
Total volume	3

3. After adding 3 μL of RNA extension Master mix to the reaction tube of step 1, the mixture was placed in a PCR instrument and run according to the program in Table 2:

TABLE 2

Temperature	Time (min)
37 °C	120
72 °C	5
4 °C	∞

4. Immediately after the end of step 3, the template-switching experiment was set up by preparing the reverse transcription RT-Master mix as shown in Table 3 below:

TABLE 3

The C9T3_UMI_RT_primer sequence is: ACGACGCTCTTCCGATCTTGTACCTTNNNNNNNDCCCCCCCCCTTT, where each N is independently one of A, T, C and G, and D is one of A, G and T. The SPACER_TSO_UMI_primer sequence is: CAGTGGTATCAACCAGTNNNNNNNNSPACERGrGrG, where each N is independently one of A, T, C and G, rG is a guanine ribonucleotide, and SPACER is one of CTAAC, CAGCA and ATAAC.

5. To the reaction tube of step 3, 10 μL of RT-Master mix was added and mixed well, and the program shown in Table 4 below was run immediately:

TABLE 4

6. PCR amplification: a PCR master mix was prepared, with the reaction system shown in Table 5 below:

TABLE 5

Component name	Volume (μL)
KAPA HiFi HotStart ReadyMix (2X)	17.5
F_Primer (10 μM)	1
R_Primer (10 μM)	1
H₂O	0.5
Total volume	20

The F_Primer sequence is CAGTGGTATCAACGCAGAG; the R_Primer sequence is ACACGACGCTCTTCCGATCT.

7. To the product of step 5, 20 μL of PCR master mix was added to give a total volume of 35 μL; the mixture was mixed well and the program shown in Table 6 below was run immediately:

TABLE 6

8. Purifying cDNA immediately after the reaction in the step 7 is finished;

8.1. adding 35 μL of AMPure DNA purification magnetic beads to the product obtained in step 7, mixing well, and incubating for 5 min at room temperature;

8.2. placing the mixture on a magnetic rack, removing supernatant after the liquid is clarified, adding 80% ethanol, standing in a tube, and discarding supernatant; the time of the standing is preferably 30 seconds;

8.3. repeating the step 8.2;

8.4. drying at room temperature until the surface of the magnetic beads is no longer glossy but has not cracked;

8.5. adding 30 μL of Elution Solution, mixing by pipetting, and standing at 25 °C for 5 min;

8.6. placing the centrifuge tube on a magnetic rack, standing until liquid is clear, and transferring supernatant into a new centrifuge tube; the time for clarifying the liquid is, for example, 2-5min.

9. The cDNA yield was quantified with Qubit.

10. The SMRTbell Template library was then constructed according to the SMRTbell Template Prep Kit (PacBio);

11. PacBio sequencing: the SMRTbell Template library was sequenced on a PacBio platform to obtain the third-generation sequencing result of the sample to be tested.

12. Based on the sequencing results, the full-length RNA sequence of the target gene and its poly(A) tail sequence in the sample can be accurately analyzed. The transcript poly(A) tail length distribution of the three kinds of trace samples in this example (GV, MII and zygote) is shown in FIG. 2, and the poly(A) tail length distribution of the Ccnb1, Cfl1 and Spin1 genes in GV, MII and zygote is shown in FIG. 3. It can be seen that the method of the present invention can detect the poly(A) tail sequence of a gene of interest in a trace sample.

Example 2

Example 2 is substantially the same as Example 1 except that the RNA obtained in step 1 of Example 1 (mouse oocyte and embryo samples collected, added to 2 μL of 0.2% (v/v) Triton X-100, and lysed at 72 °C for 3 min) was replaced with 2 μL of the corresponding RNA sample. In Example 2, several experiments were performed, in which the RNA content of the 2 μL RNA sample was 0.1 ng (sample 1), 1 ng (sample 2) and 10 ng (sample 3), respectively. The cDNA yields detected in step 9 are shown in Table 7 below.

Comparative example 1

Comparative example 1 is substantially the same as example 2 except that: after the step 3 is finished, the product obtained in the step 3 is purified according to the existing magnetic bead purification method, and then the subsequent step 4 and subsequent steps are carried out. The cDNA yields detected in step 9 are shown in Table 7 below.

TABLE 7

As can be seen from Table 7, when the amount of RNA sample is small, omitting the purification step by optimizing the reaction system improves the cDNA yield, which facilitates subsequent library construction and sequencing.

Comparative example 2

Comparative example 2 is substantially the same as Example 2 except that the 0.27 μL of dCTP (100 mM) in step 4 of Example 2 was replaced with 0.27 μL of enzyme-free water. The cDNA yields detected in step 9 are shown in Table 8 below.

TABLE 8

As can be seen from Table 8, the cDNA yield can be further improved by adding dCTP to the reverse transcription reaction system.

Example 3

Example 3 is substantially the same as Example 2 except that the 14 cycles of step 7 were replaced by 12 cycles. The cDNA yields detected in step 9 are shown in Table 9 below; the proportion of invalid reads was estimated from the third-generation sequencing results and is also shown in Table 9.

Comparative example 3

Comparative example 3 is substantially the same as Example 3 except that the SPACER_TSO_UMI_primer in step 4 was replaced by a primer without SPACER, whose sequence is: CAGTGGTATCAACCAGTNNNNNNNNrGrG, wherein rG is a guanine ribonucleotide and each N is independently one of A, T, C and G. The cDNA yields detected in step 9 are shown in Table 9 below; the proportion of invalid reads was assessed from the third-generation sequencing results and is shown in Tables 9 and 10 below.

TABLE 9

Table 10

As can be seen from Table 9, although the cDNA yield is higher when the TSO primer lacks the steric hindrance provided by the SPACER, the proportion of invalid reads is also high. With the TSO primer provided by the invention, the proportion of invalid reads can be effectively reduced while the cDNA yield remains sufficient for subsequent detection.

The present invention has been described in detail with the purpose of enabling those skilled in the art to understand the contents of the present invention and to implement the same, but not to limit the scope of the present invention, and all equivalent changes or modifications made according to the spirit of the present invention should be included in the scope of the present invention.

Claims

1. A method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail, characterized by comprising the following steps:

(1) Preparing total RNA of a sample to be tested;

(4) Carrying out PCR amplification on the product obtained in the step (3);

2. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 1, characterized in that the concentration of dCTP is 10 to 120 mM.

3. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 1, characterized in that the molecular tag in the reverse transcription primer is NNNNNNND, wherein each N is independently one of A, T, C and G, and D is one of A, G and T.

4. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 3, characterized in that the sequence of the reverse transcription primer is ACGACGCTCTTCCGATCTTGTACCTTNNNNNNNDCCCCCCCCCTTT.

5. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 1, characterized in that the molecular tag in the TSO primer is NNNNNNNN, wherein each N is independently one of A, T, C and G, and the spacer sequence is one of CTAAC, CAGCA and ATAAC.

6. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 5, characterized in that the spacer sequence in the TSO primer is located at the 3' end of the molecular tag.

7. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 5, characterized in that the sequence of the TSO primer is CAGTGGTATCAACCAGTNNNNNNNNSPACERGrGrG, wherein SPACER is the spacer sequence, and rG is a guanine ribonucleotide.

8. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 1, characterized in that each 10 μL of the reverse transcription system comprises 1 μL of extension-reverse transcription buffer, 0.75 μL of 100 mM DTT, 0.27 μL of 100 mM dCTP, 3 μL of 5 M betaine, 0.09 μL of 1 M MgCl₂, 0.5 μL of 20 mM of the reverse transcription primer, 0.1 μL of 100 mM of the TSO primer, 1 μL of reverse transcriptase, 1 μL of dNTP mixture (10 mM each), 0.2 μL of RNase inhibitor, and 2.09 μL of enzyme-free water.

9. The method for constructing a sequencing library for detecting the full length of RNA containing a poly(A) tail according to claim 1, characterized in that each 3 μL of the PolyA polymerase reaction system comprises 0.5 μL of 10 mM GTP/ITP mixture, 1 μL of Poly(A) polymerase, 0.5 μL of extension-reverse transcription buffer, 0.2 μL of RNase inhibitor, and 0.8 μL of enzyme-free water.

10. Use of the method of any one of claims 1 to 9 for analyzing the sequence of RNA having a poly (a) tail.