CN110499356B

CN110499356B - Construction method of sequencing library of RNA (ribonucleic acid) with poly (A) tail in sample to be detected

Info

Publication number: CN110499356B
Application number: CN201910837492.2A
Authority: CN
Inventors: 陆发隆; 刘玉胜
Original assignee: Institute of Genetics and Developmental Biology of CAS
Current assignee: Institute of Genetics and Developmental Biology of CAS
Priority date: 2019-09-05
Filing date: 2019-09-05
Publication date: 2021-06-08
Anticipated expiration: 2039-09-05
Also published as: CN110499356A

Abstract

The invention discloses a method for constructing a sequencing library of RNA with a poly (A) tail in a sample to be tested. The method comprises the following steps: taking total RNA of the sample to be tested, carrying out end extension using a primer, then digesting thoroughly with USER enzyme, and recovering nucleic acid fragments longer than 200 nt; carrying out reverse transcription on the nucleic acid fragments, and then carrying out template switching using a template-switching oligonucleotide (TSO) to obtain cDNA (complementary DNA); amplifying the cDNA by PCR using PCR primers to obtain amplified cDNA; and constructing a sequencing library from the amplified cDNA and finally sequencing it on an instrument. Experiments show that the sequencing library constructed by the method of the invention can be used to determine the sequence of RNA with a poly (A) tail, including the sequence of the poly (A) tail itself. The invention has important application value for research on RNA poly (A) tails.

Description

Construction method of sequencing library of RNA (ribonucleic acid) with poly (A) tail in sample to be detected

Technical Field

The invention belongs to the technical field of biology, and particularly relates to a construction method of a sequencing library of RNA (ribonucleic acid) with a poly (A) tail in a sample to be detected.

Background

Most mature RNAs, such as messenger RNA (mRNA) and long non-coding RNA (lincRNA), are regulated by numerous post-transcriptional modifications. Most mRNAs and lincRNAs receive a non-templated poly (A) tail, added under the catalysis of poly (A) polymerase. The poly (A) tail is considered one of the key factors regulating RNA stability and translation efficiency. mRNA and lincRNA with poly (A) tails account for only a small portion of intracellular RNA. Oligo (dT)-dependent affinity purification is a common method for isolating mRNA, but it is inevitably biased toward longer poly (A) tails. Thus, when analyzing poly (A) tails, it is desirable to avoid preferential enrichment so that the original, true poly (A) information is obtained.

Since the advent of the next-generation sequencing (NGS) era, transcriptome studies have proliferated. However, the base-calling algorithms of existing NGS technology cannot handle homopolymeric sequences longer than 30 nt, so the specific base composition of the poly (A) tail cannot be resolved well. Even Sanger sequencing has difficulty with long homopolymeric sequences. Smart-seq2 is an extremely sensitive single-cell RNA-seq technique that performs reverse transcription and cDNA library construction with the 3' UTR anchor primer 5'-AAGCAGTGGTATCAACGCAGAGTACT30VN-3' (T30 denotes 30 consecutive T; N is any base and V is A, C or G). The V and N at the end of this primer anchor it to the end of the 3' UTR, and the poly (A) tail is then eliminated from the final cDNA library during data analysis, which excludes the effect of homopolymeric sequences on sequence assembly. Other RNA-seq tools on the Illumina platform likewise remove the effect of poly (A) during library construction, sequencing and data analysis. The Iso-seq technology based on the PacBio platform builds libraries from cDNA lacking poly (A), using a strategy similar to Smart-seq2. Thus, conventional RNA-seq or Iso-seq data generated on the Illumina or PacBio platforms lack poly (A) information.
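To make the degenerate anchor notation concrete, the following Python sketch expands the T30VN anchor primer into its explicit sequences. The function name `expand_anchor` and the lookup table are illustrative assumptions; only the primer prefix and the meanings of V and N come from the text above.

```python
from itertools import product

# Degenerate codes used in the anchor, as defined in the text above.
IUPAC = {"V": "ACG", "N": "ACGT"}


def expand_anchor(prefix: str, t_count: int = 30, anchor: str = "VN") -> list[str]:
    """Return every explicit sequence encoded by prefix + T(t_count) + anchor."""
    choices = [IUPAC[code] for code in anchor]
    return [prefix + "T" * t_count + "".join(combo) for combo in product(*choices)]


variants = expand_anchor("AAGCAGTGGTATCAACGCAGAGTAC")
print(len(variants))  # 3 choices for V times 4 choices for N
for seq in variants[:3]:
    print(seq[-8:])   # show only the 3' end, where the variants differ
```

Because V excludes T, the base next to the T run cannot pair with another A of the tail, which is what pins the primer to the junction between the 3' UTR and the poly (A) tail.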

Currently, PAL-seq (poly (A) tail length profiling by sequencing) and TAIL-seq can measure poly (A) tail length transcriptome-wide by optimizing the base-calling algorithm for poly (T). By these means, researchers have obtained data on the dynamics of poly (A) tail length from yeast, cell lines, mouse liver, Arabidopsis thaliana leaves, Drosophila, frog, and zebrafish embryos. PAL-seq relies on a special sequencing procedure that can only be carried out on a discontinued sequencer (Illumina Genome Analyzer II). Its principle is that dTTP and biotin-labeled dUTP are incorporated during primer extension, and the poly (A) tail length is estimated from the incorporation ratio of the biotin label. PAL-seq neither identifies non-A residues in the poly (A) tail nor yields single-base-resolution data. TAIL-seq and mTAIL-seq, its modified version with mRNA enrichment, require the position of the poly (T) terminus to be determined by analyzing the raw sequencing images with a special base-calling algorithm. Only a small number of sequencers currently provide the raw sequencing images required for this analysis. Consequently, most users cannot readily apply TAIL-seq or mTAIL-seq through existing commercial sequencing services and general-purpose sequencing instruments. This approach has two further drawbacks: the effective poly (T) read length of the required special base-calling algorithm is only 231 bp, and information on non-T residues within the poly (A) tail is not easy to obtain. Thus, TAIL-seq and PAL-seq provide a means to describe poly (A) length and to detect non-adenosine (non-A) modifications at the poly (A) terminus, and uridine (U) and guanosine (G) modifications at the 3' end of RNA tails have been identified with them. However, the base composition of the poly (A) tail other than at its 3' end remains unclear.

During oocyte maturation and early embryonic development in the mouse, various developmental events occur, including clearance of maternal mRNA and zygotic genome activation (ZGA). These important events are closely related to the various RNAs stored in GV (germinal vesicle) oocytes, which are regulated through their poly (A) tails.

Disclosure of Invention

The invention aims to obtain the complete sequence of the poly (A) tail in the RNA of a sample to be detected, so that the structure and the function of the poly (A) tail can be more accurately analyzed.

The invention first claims a method for constructing a sequencing library of RNA with a poly (A) tail in a sample to be tested, the method comprising the following steps in sequence:

(a1) taking total RNA of a sample to be detected, carrying out terminal extension by using a primer, then adding USER enzyme for full digestion, and recovering a nucleic acid fragment with the length of more than 200 nt;

the primer can comprise a DNA fragment A and a DNA fragment C from the 5 'end to the 3' end in sequence;

the DNA fragment A can be any sequence of 15-40bp (such as 15-25bp, 25-40bp, 15bp, 25bp or 40bp) that does not bind to a poly (A) tail, and is used for PCR amplification;

the DNA fragment C can be composed of 3-24 nucleotides (such as 3-18, 18-24, 3, 18 or 24 nucleotides), and each nucleotide can be dU or T; at least one of the 3 nucleotides at the 5' end of the DNA fragment C is dU;

(a2) taking the nucleic acid fragment, and carrying out reverse transcription by using an RT primer to obtain cDNA;

the RT primer is all or part of the nucleotide sequence of the DNA fragment A;

(a3) taking the cDNA, and performing template exchange by using TSO;

the TSO can comprise a DNA fragment D and a nucleic acid fragment D from the 5 'end to the 3' end in sequence;

the DNA fragment D can be any sequence of 15-40bp (such as 15-25bp, 25-40bp, 15bp, 25bp or 40bp) and is used for PCR amplification;

the nucleic acid fragment D may comprise segment 2, segment 2 being used for binding to the cDNA (C has been added to the 5' end of the cDNA during reverse transcription); segment 2 can be composed of N guanine ribonucleotides or of "(N-1) guanine ribonucleotides and 1 guanine deoxyribonucleotide modified by locked nucleic acid", wherein N can be a natural number of 3 or more;

(a4) taking the cDNA which is finished in the step (a3), and carrying out PCR amplification by adopting a PCR primer to obtain amplified cDNA;

the upstream primer in the PCR primer is all or part of the nucleotide sequence of the DNA fragment D;

the downstream primer in the PCR primer is all or part of the nucleotide sequence of the DNA fragment A;

(a5) taking the amplified cDNA, and sequencing; obtaining a sequencing library of RNA with poly (A) tail in the sample to be tested.
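The design rules stated for DNA fragment A and DNA fragment C in step (a1) can be checked mechanically. The following Python sketch is one way to do so; the function names and the string notation (with `dU` written as a two-character token) are illustrative assumptions, and the example sequences are the ones given later in this description.

```python
import re

# One token per residue: 'dU' counts as a single nucleotide.
TOKEN = re.compile(r"dU|[ACGT]")


def tokenize(oligo: str) -> list[str]:
    """Split an oligo string into residues, rejecting unknown characters."""
    tokens = TOKEN.findall(oligo)
    if "".join(tokens) != oligo:
        raise ValueError(f"unrecognized characters in {oligo!r}")
    return tokens


def fragment_c_is_valid(oligo: str) -> bool:
    """3-24 residues, each dU or T, with at least one dU among the 5'-most three."""
    tokens = tokenize(oligo)
    length_ok = 3 <= len(tokens) <= 24
    composition_ok = all(t in ("dU", "T") for t in tokens)
    five_prime_ok = "dU" in tokens[:3]
    return length_ok and composition_ok and five_prime_ok


def fragment_a_length_ok(seq: str) -> bool:
    """DNA fragment A is stated to be 15-40 residues long."""
    return 15 <= len(tokenize(seq)) <= 40


print(fragment_c_is_valid("dUTTTTTTTTdUTTTTTTTTTT"))
print(fragment_a_length_ok("AAGCAGTGGTATCAACGCAGAGTAC"))
```

The sketch does not test the requirement that fragment A must not bind a poly (A) tail; that would need a separate sequence-composition check.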

The invention also provides a construction method of the sequencing library of RNA with poly (A) tail in the single cell, which sequentially comprises the following steps:

(b1) cracking the single cell to obtain cell lysate; then, conducting terminal extension by using a primer; then adding USER enzyme for full digestion, and recovering nucleic acid fragments with the length of more than 200 nt;

the DNA fragment C can be composed of 3-24 nucleotides (such as 3-18, 18-24, 3, 18 or 24 nucleotides), and each nucleotide is dU or T; at least one of the 3 nucleotides at the 5' end of the DNA fragment C is dU;

(b2) taking the nucleic acid fragment, and carrying out reverse transcription by using an RT primer to obtain cDNA;

the RT primer is all or part of the nucleotide sequence of the DNA fragment A;

(b3) taking the cDNA, and performing template exchange by using TSO;

the nucleic acid fragment D may comprise segment 2, segment 2 being used for binding to the cDNA (C has been added to the 5' end of the cDNA during reverse transcription); segment 2 consists of N guanine ribonucleotides or of "(N-1) guanine ribonucleotides and 1 guanine deoxyribonucleotide modified by locked nucleic acid", wherein N can be a natural number of 3 or more;

(b4) taking the cDNA which is finished in the step (b3), and carrying out PCR amplification by adopting a PCR primer to obtain amplified cDNA;

(b5) taking the amplified cDNA, and sequencing; a sequencing library of RNA with a poly (A) tail in a single cell was obtained.

Any of the above guide primers may comprise, in order from the 5' end to the 3' end, a DNA fragment A and a DNA fragment C.

Any one of the TSOs described above may be composed of a DNA fragment D and a nucleic acid fragment D in order from the 5 'end to the 3' end.

Any one of the above-mentioned nucleic acid fragment D may specifically consist of segment 2.

In any of the above methods, in the step (b1), "lysing a single cell to obtain a cell lysate" may be performed by: adding a nonionic surfactant and an RNase inhibitor into the single cells, and incubating to obtain a cell lysate.

The non-ionic surfactant may be TritonX-100.

In step (b1), the single cell may specifically be lysed as follows: 19 μL of an aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor are added to the single cell, which is then incubated to obtain a cell lysate.

The incubation parameter can be 70-90 deg.C (such as 70-80 deg.C, 80-90 deg.C, 70 deg.C, 80 deg.C or 90 deg.C) for 3-10min (such as 3-5min, 5-10min, 3min, 5min or 10 min).

Any of the above guide primers may further comprise a DNA fragment B, located between the 3' end of DNA fragment A and the 5' end of DNA fragment C; the DNA fragment B can be a Barcode sequence of 8-30bp (such as 8-16bp, 16-30bp, 8bp, 16bp or 30bp) and is used for distinguishing different samples to be tested or different single cells.

Any of the above guide primers may consist, in order from the 5' end to the 3' end, of a DNA fragment A, a DNA fragment B and a DNA fragment C.
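Since DNA fragment B serves as a Barcode for distinguishing samples or single cells, reads from a pooled run can in principle be assigned back to their source by searching for it. The following Python sketch illustrates the idea with exact matching on either strand. The function `assign_sample`, the sample names and the second barcode are hypothetical; only the first barcode is an example sequence from this description, and a practical pipeline would likely tolerate sequencing errors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


BARCODES = {
    "sample_1": "TACTAGAGTAGCACTC",  # example DNA fragment B from this description
    "sample_2": "GATCGTCAGTCATGCA",  # hypothetical second barcode, for illustration only
}


def assign_sample(read: str, barcodes: dict[str, str]):
    """Return the one sample whose barcode occurs in the read (either strand), else None."""
    hits = [
        name
        for name, barcode in barcodes.items()
        if barcode in read or revcomp(barcode) in read
    ]
    return hits[0] if len(hits) == 1 else None


# Toy read in transcript orientation: body, poly(A) tail, then the copied barcode.
toy_read = "GGCATTCG" + "A" * 20 + revcomp(BARCODES["sample_1"])
print(assign_sample(toy_read, BARCODES))
```

Reads that match no barcode, or more than one, are left unassigned rather than guessed.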

In any of the above methods, the performing of the terminal extension in steps (a1) and (b1) further requires adding dNTP and DNA polymerase to the system.

The DNA polymerase may in particular be the Klenow fragment (exo-, NEB).

The conditions for the terminal extension may be: incubating at 32-42 deg.C (such as 32-37 deg.C, 37-42 deg.C, 32 deg.C, 37 deg.C or 42 deg.C) for 30min-90min (such as 30min-60min, 60min-90min, 30min, 60min or 90 min).

The condition of "USER enzyme digestion" can be 32-42 deg.C (such as 32-37 deg.C, 37-42 deg.C, 32 deg.C, 37 deg.C or 42 deg.C) for 30min-90min (such as 30min-60min, 60min-90min, 30min, 60min or 90 min).

In any of the above-described methods, in step (a2) and step (b2), the reverse transcriptase adds C to the 5' end of the reverse transcription product (i.e., cDNA) because the reverse transcriptase has terminal transferase activity.

In any of the methods described above, in steps (a3) and (b3), the nucleic acid fragment D may comprise segment 1, segment 1 being located 5' to segment 2; segment 1 may be any sequence of 0-10bp (e.g., 0-2bp, 2-10bp, 0bp, 2bp, or 10 bp).

Any of the above-mentioned nucleic acid fragment D may specifically consist of segment 1 and segment 2.

In any of the above methods, in step (a3) and step (b3), DNA fragment D may be identical to DNA fragment A.

If both DNA fragment D and DNA fragment A contain DNA fragment I, the upstream primer or the downstream primer may be all or part of the nucleotide sequence of DNA fragment I, in which case the upstream primer and the downstream primer may be the same, i.e., a single primer is used for PCR amplification.

In any of the above methods, the "taking amplified cDNA and sequencing" in step (a5) and step (b5) may be: mixing the amplified cDNA larger than 200bp with the amplified cDNA larger than 2kb, and then adopting an SMRTbell Template Prep kit to construct an SMRTbell Template library; the SMRTbell Template library was sequenced using the PacBio platform.

The RNA in the sequencing library of RNA with a poly (A) tail in the sample to be tested constructed by any one of the methods described above or the RNA with a poly (A) tail in a single cell constructed by any one of the methods described above all contains complete poly (A) tail information.

The RNA in the sequencing library of RNA with poly (A) tail in the sample to be tested constructed by any one of the methods or the RNA with poly (A) tail in the single cell constructed by any one of the methods comprises complete RNA information.

In the above, the nucleotide sequence of the guide primer may be 5'-AAGCAGTGGTATCAACGCAGAGTACTACTAGAGTAGCACTCdUTTTTTTTTdUTTTTTTTTTT-3'. The nucleotide sequence of DNA fragment A or DNA fragment D may be 5'-AAGCAGTGGTATCAACGCAGAGTAC-3'. The nucleotide sequence of DNA fragment B may be 5'-TACTAGAGTAGCACTC-3'. The nucleotide sequence of DNA fragment C may be 5'-dUTTTTTTTTdUTTTTTTTTTT-3'. The nucleotide sequence of the RT primer may be 5'-AAGCAGTGGTATCAACGCAGAGTAC-3'. The nucleotide sequences of both the upstream and the downstream primers may be 5'-AAGCAGTGGTATCAACGCAGAGT-3'. The nucleotide sequence of nucleic acid fragment D may be 5'-ATrGrGG*-3' (rG represents a guanine ribonucleotide, and G* represents a guanine deoxyribonucleotide modified with a locked nucleic acid). The nucleotide sequence of segment 1 may be 5'-AT-3'. The nucleotide sequence of segment 2 may be 5'-rGrGG*-3'. The nucleotide sequence of the TSO may be 5'-AAGCAGTGGTATCAACGCAGAGTACATrGrGG*-3'.
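The relationship between the individual fragments and the full oligonucleotides can be sketched in Python by joining the example fragments in the order the description states (A, B, C for the guide primer; DNA fragment D, segment 1, segment 2 for the TSO). The variable names and the plain-string notation for `dU`, `rG` and `G*` are illustrative assumptions.

```python
# Example fragment sequences as listed in this description (5' to 3').
FRAGMENT_A = "AAGCAGTGGTATCAACGCAGAGTAC"
FRAGMENT_B = "TACTAGAGTAGCACTC"           # Barcode
FRAGMENT_C = "dUTTTTTTTTdUTTTTTTTTTT"     # dU written as a two-character token
FRAGMENT_D = FRAGMENT_A                   # DNA fragment D may equal DNA fragment A
SEGMENT_1 = "AT"
SEGMENT_2 = "rGrGG*"                      # rG = ribo-G, G* = LNA-modified G
PCR_PRIMER = "AAGCAGTGGTATCAACGCAGAGT"

# Assemble in the stated 5'-to-3' order.
guide_primer = FRAGMENT_A + FRAGMENT_B + FRAGMENT_C
tso = FRAGMENT_D + SEGMENT_1 + SEGMENT_2

# The RT and PCR primers are "all or part of" fragment A; here the PCR
# primer is a 5' prefix of fragment A, so one primer serves both ends.
pcr_primer_is_part_of_a = FRAGMENT_A.startswith(PCR_PRIMER)

print(guide_primer)
print(tso)
print(len(FRAGMENT_A), len(FRAGMENT_B), pcr_primer_is_part_of_a)
```

Because DNA fragment D equals DNA fragment A in this example, both ends of the cDNA carry the same priming sequence, which is why a single PCR primer suffices.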

The present invention also protects any of S1) -S6).

S1) application of the method for constructing the sequencing library of the RNA with the poly (A) tail in any sample to be detected in analyzing the sequence of the RNA with the poly (A) tail in the sample to be detected.

S2) application of the method for constructing the sequencing library of the RNA with the poly (A) tail in any sample to be detected in analyzing the poly (A) tail sequence of the RNA with the poly (A) tail in the sample to be detected.

S3) use of the method for constructing the sequencing library of RNA having a poly (A) tail in any one of the single cells described above in the analysis of the sequence of RNA having a poly (A) tail in a single cell.

S4) use of the method for constructing the sequencing library of RNA having a poly (A) tail in any one of the single cells described above in the analysis of the poly (A) tail sequence of RNA having a poly (A) tail in a single cell.

S5) use of any of the methods described above for analyzing the sequence of RNA having a poly (A) tail.

S6) use of any of the methods described above for analyzing a poly (A) tail sequence of an RNA having a poly (A) tail.

The invention also claims a kit, which may comprise at least one of the following: any of the above guide primers, any of the above RT primers, any of the above TSOs and any of the above PCR primers.

The kit can specifically comprise any one of the above-mentioned guide primers, any one of the above-mentioned RT primers, any one of the above-mentioned TSOs and any one of the above-mentioned PCR primers.

The invention also protects the application of any one of the kits, which can be at least one of T1) -T8):

T1) analyzing the sequence of RNA with a poly (A) tail in a sample to be tested;

T2) analyzing the poly (A) tail sequence of RNA with a poly (A) tail in a sample to be tested;

T3) analyzing the sequence of RNA with a poly (A) tail in a single cell;

T4) analyzing the poly (A) tail sequence of RNA with a poly (A) tail in a single cell;

T5) analyzing the sequence of RNA with a poly (A) tail;

T6) analyzing the poly (A) tail sequence of RNA with a poly (A) tail;

T7) constructing a sequencing library of RNA with a poly (A) tail in a sample to be tested;

T8) constructing a sequencing library of RNA with a poly (A) tail in a single cell.

Any of the above test samples may be a biological cell or a biological tissue.

The total RNA of any one of the above test samples may be total RNA of biological cells or total RNA of biological tissues. The total RNA may or may not be purified (i.e., contain small amounts of impurities such as cells, tissues, extracts, etc.).

Any of the single cells described above may be biological single cells. The single cells of the organism may or may not be purified (i.e., contain small amounts of impurities such as centrates, mitochondria, etc.).

Any of the above organisms may be animals, plants or microorganisms. The animal may be a rodent, human, pig, cow, rabbit or sheep, and the like. The rodent may be a mouse or a rat. The mouse may be a CD1 (ICR) mouse. The pig may be a white pig.

Any of the above cells may be egg cells, cancer cells, leukocytes or plant (common) cells, and the like. The cancer cell may specifically be a cervical cancer cell. The egg cell may be a GV egg cell.

Any of the above tissues may be liver tissue, kidney tissue, spleen tissue or plant (common) tissue, etc.

In the embodiments of the present invention, the sample to be tested may specifically be GV oocytes of CD1 (ICR) mice, HeLa cells or porcine liver tissue. The single cell may be a single GV oocyte of a CD1 (ICR) mouse.

The present inventors have developed a new sequencing method, named PAIso-seq (poly (A) inclusive RNA isoform sequencing), which can accurately and sensitively read the full-length poly (A) information of different RNA isoforms transcriptome-wide. The steps of the PAIso-seq method are as follows: first, an adapter is added after the poly (A) tail to preserve the poly (A) sequence; then full-length cDNA is amplified by template switching, retaining the complete poly (A) tail; finally, third-generation sequencing is performed on the PacBio platform. The template-dependent end extension preserves the integrity of the poly (A) sequence, and the adapter sequence provides a primer binding site for reverse transcription and PCR amplification. Because the oligo (dT)-mediated end extension is template dependent, it occurs only on RNA containing poly (A), which removes the need for a separate poly (A)+ RNA enrichment step. The use of template switching makes the sensitivity of PAIso-seq comparable to that of single-cell RNA-seq, the most sensitive technique to date. The PacBio sequencing platform also has the advantage of accurately resolving long homopolymeric sequences. In addition, the circular consensus sequencing strategy adopted by this platform reads a single cDNA molecule multiple times and thereby yields highly accurate consensus base calls. Using PAIso-seq, the inventors analyzed poly (A) tails transcriptome-wide, including quantitative poly (A) tail length and accurate non-A modification analysis, in mouse GV oocytes, single mouse GV oocytes, HeLa cells and porcine liver tissue, respectively.
The results indicate that, in addition to U and G modifications at the 3' end of many mRNAs, some U, G or C non-A modifications occur inside the poly (A) tail; different RNA isoforms can be transcribed from the same gene, and specific isoforms carry different poly (A) tails. PAIso-seq also allows precise analysis of the base composition of the poly (A) tail. The PAIso-seq data show that many non-A residues are widely distributed within the poly (A) tail, rather than only at the 3' end of the tail as previously thought. Thus, PAIso-seq allows fine analysis of the poly (A) tails of different RNA isoforms and suggests that the base composition of poly (A) tails is far more complex than previously thought. This in turn suggests that complex mechanisms for modifying and regulating the poly (A) tail remain to be explored. These results provide an accurate and sensitive new tool for studying the poly (A) tail and open a new door for studies of its function and regulation. The PAIso-seq method is very sensitive and can be used for transcriptome-wide RNA poly (A) analysis of a single cell.

Drawings

FIG. 1 is a schematic flow chart of the construction of a PAIso-seq sequencing library for a sample to be tested.

FIG. 2 is a diagram showing the analysis of poly (A) tail length of Dnmt1 gene, Cnot7 gene, Btg4 gene and Plat gene in GV egg cells and individual GV eggs.

FIG. 3 is a diagram of analysis of poly (A) information in transcripts. GV rep.1 is GV egg sample repeat 1, GV rep.2 is GV egg sample repeat 2, SCGV com is a mixed sample of individual GV eggs, the blue line is a linear regression line, the light grey area indicates the regression confidence interval, and the number of transcripts with the corresponding length poly (a) tail is indicated.

FIG. 4 shows the poly (A) tail length distribution, the frequency of non-A modification and the Spearman correlation analysis for all transcripts in the pooled data of single GV ova, in GV ova and in single GV ova. GV rep.2 is GV egg sample replicate 2, SCGV com is a mixed sample of individual GV eggs, the blue line is the linear regression line, and the light grey area indicates the regression confidence interval.

FIG. 5 shows the poly (A) tail length distribution and non-A modification frequency of transcripts in Hela cells.

FIG. 6 shows poly (A) tail length distribution and non-A modification frequency of transcripts in porcine liver tissue.

Detailed Description

The following examples are given to facilitate a better understanding of the invention, but do not limit the invention.

The experimental procedures in the following examples are conventional unless otherwise specified.

The test materials used in the following examples were purchased from a conventional biochemical reagent store unless otherwise specified.

All quantitative tests in the following examples were performed in triplicate, and the results were averaged.

Example 1. Construction of a PAIso-seq sequencing library for a sample to be tested

Because long homopolymeric sequences cannot be resolved, the current Illumina platform is not a good tool for analyzing poly (A) tails. TAIL-seq and PAL-seq can obtain poly (T) length information by using a special poly (T) length analysis algorithm or a special sequencing method, but neither method can identify non-A modifications located outside the 3' terminal region of the poly (A) tail. In addition, TAIL-seq and PAL-seq require milligram-scale RNA samples, which limits their use with precious biopsy and patient samples. The more recently developed PacBio third-generation sequencing technology can identify and analyze the base composition of long homopolymeric sequences by reading single molecules in real time. In addition, because the library uses circularized sequencing templates, a single template can be read multiple times to obtain CCS (Circular Consensus Sequence) information, which greatly improves the accuracy of sequence analysis. Thus, the PacBio third-generation sequencing platform may be the best choice for resolving poly (A) tail length and base composition in RNA.

Through a large number of experiments, the inventor of the invention accurately analyzes the full-length RNA sequences and poly (A) tail sequences of a plurality of target genes in a sample to be detected by constructing a PAISo-seq sequencing library of the sample to be detected. The method comprises the following specific steps:

1. End extension

Take the total RNA of the sample to be tested, add dNTPs and Klenow fragment (exo-, NEB), and mix well; then carry out the end-extension reaction using the guide primer.

The guide primer comprises, in order from the 5' end to the 3' end, a DNA fragment A, a DNA fragment B and a DNA fragment C; DNA fragment A is an arbitrary sequence (15-40 bp in length) and is used for PCR amplification; DNA fragment B is a Barcode sequence (8-30 bp in length) and is used for distinguishing different PAIso-seq sequencing libraries; DNA fragment C consists of 3-24 nucleotides, each of which is a uracil deoxyribonucleotide (denoted dU) or T; at least one of the 3 nucleotides at the 5' end of DNA fragment C is dU.

The terminal extension reaction conditions were: incubate at 37 ℃ for 1 h.

2. USER digestion and RNA purification

(1) The USER enzyme (NEB) was added to the reaction system completing step 1 and incubated at 37 ℃ for 1 h.

The USER enzyme recognizes and cleaves at the dU residues in the guide primer, thereby cutting the guide primer and preventing it from acting as a reverse transcription primer.

(2) Recovering nucleic acid fragments with length of more than 200 nt.

The recovery method is specifically as follows: add 50 μL of RNA Binding Buffer (a component of the RNA Clean & Concentrator-5 kit) and 50 μL of ethanol to the reaction system from step (1) and mix well to obtain a mixed solution; transfer the mixed solution to a Zymo-Spin IC column and centrifuge; wash the column using the RNA Clean & Concentrator-5 kit (Zymo Research) and centrifuge; add 6-8 μL of nuclease-free water to the column matrix for elution and centrifuge to obtain a nucleic acid solution; the nucleic acid fragments in the nucleic acid solution have all undergone end extension and are longer than 200 nt.
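The effect of the USER digestion in step 2 on the guide primer can be pictured with a short Python sketch. It models the enzyme simply as removing every dU residue and breaking the strand there, which is a simplification; the function name `user_digest` is illustrative, and the guide primer is assembled from the example fragments A, B and C of this description.

```python
import re

# One token per residue: 'dU' counts as a single nucleotide.
TOKEN = re.compile(r"dU|[ACGT]")


def user_digest(oligo: str) -> list[str]:
    """Return the pieces left after excising every dU residue from the oligo."""
    fragments, current = [], []
    for token in TOKEN.findall(oligo):
        if token == "dU":
            if current:
                fragments.append("".join(current))
            current = []  # the dU itself is lost, leaving a gap
        else:
            current.append(token)
    if current:
        fragments.append("".join(current))
    return fragments


guide_primer = (
    "AAGCAGTGGTATCAACGCAGAGTAC"   # DNA fragment A
    + "TACTAGAGTAGCACTC"          # DNA fragment B (Barcode)
    + "dUTTTTTTTTdUTTTTTTTTTT"    # DNA fragment C
)
pieces = user_digest(guide_primer)
print([len(p) for p in pieces])
```

In this model every piece is far shorter than 200 nt, so the leftover guide primer is not retained by the recovery of fragments longer than 200 nt, and the oligo (dT) stretch is separated from fragment A.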

3. Reverse transcription and terminal addition of C

Take the nucleic acid solution obtained in step 2 and carry out reverse transcription using the RT primer; because the reverse transcriptase has terminal transferase activity, it adds C to the 5' end of the reverse transcription product, giving cDNA with a C tag at the 5' end.

The RT primer is all or part of the nucleotide sequence of the DNA fragment A.
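Why an RT primer taken from DNA fragment A can prime only end-extended molecules can be sketched in Python. The model below assumes that end extension copies fragments B and A of the guide primer onto the RNA 3' end as their reverse complement, and it writes the RNA in the DNA alphabet (T instead of U) for simplicity. The function names and the toy transcript are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


FRAGMENT_A = "AAGCAGTGGTATCAACGCAGAGTAC"
FRAGMENT_B = "TACTAGAGTAGCACTC"
RT_PRIMER = FRAGMENT_A  # "all or part of" DNA fragment A


def end_extend(rna: str) -> str:
    """Append the copy of the guide primer's B and A fragments to the 3' end."""
    return rna + revcomp(FRAGMENT_A + FRAGMENT_B)


def annealing_site(template: str, primer: str) -> int:
    """Index where the primer would anneal on the template, or -1 if nowhere."""
    return template.find(revcomp(primer))


toy_rna = "GGCATTCGAT" + "A" * 12       # hypothetical transcript with a short tail
extended = end_extend(toy_rna)

print(annealing_site(toy_rna, RT_PRIMER))    # no site before end extension
print(annealing_site(extended, RT_PRIMER))   # site lies downstream of the tail and barcode
```

Since the priming site sits beyond the poly (A) tail, the whole tail is copied into the cDNA instead of being used as the primer binding site.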

4. Template switching

Take the reaction system from step 3 and carry out template switching using the TSO to obtain cDNA.

The TSO comprises, in order from the 5' end to the 3' end, a DNA fragment D and a nucleic acid fragment D; DNA fragment D is any sequence (15-40 bp in length); nucleic acid fragment D consists of segment 1 and segment 2; segment 1 is any DNA sequence (0-10 bp in length); segment 2 consists of N rG or of "(N-1) rG and 1 G*", where N is a natural number of 3 or more; rG represents a guanine ribonucleotide, and G* represents a guanine deoxyribonucleotide modified with a locked nucleic acid.

The DNA fragment D can be a DNA fragment A.

5. Amplification of cDNA

Using the cDNA obtained in step 4 as a template, carry out PCR amplification with an upstream primer and a downstream primer to obtain amplified cDNA.

The upstream primer is all or part of the nucleotide sequence of the DNA fragment D.

The downstream primer is all or part of the nucleotide sequence of the DNA fragment A.

If the DNA fragment A and the DNA fragment D both contain the DNA fragment I, and the upstream primer and the downstream primer are all or part of the nucleotide sequence of the DNA fragment I, the upstream primer and the downstream primer can be the same, i.e., a single primer is used for PCR amplification.

6. Ligation of circularization adapters

The amplified cDNA obtained in step 5 is size-selected using AMPure PB beads: amplified cDNA larger than 200 bp is selected with 1× beads, and amplified cDNA larger than 2 kb is selected with 0.4× beads. The amplified cDNA larger than 200 bp and the amplified cDNA larger than 2 kb are mixed in equimolar amounts, and an SMRTbell Template library is then constructed using the SMRTbell Template Prep kit (PacBio).
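Mixing the two size-selected fractions in equimolar amounts requires converting each mass concentration into a molar one. The Python sketch below shows the arithmetic, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA. The concentrations and mean fragment lengths are hypothetical placeholders, not values from this description.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per bp of dsDNA (approximation; equals pg/pmol)


def pmol_per_ul(ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) into pmol/uL."""
    return ng_per_ul * 1000.0 / (AVG_MASS_PER_BP * mean_length_bp)


def volume_for(pmol: float, ng_per_ul: float, mean_length_bp: float) -> float:
    """Volume (uL) of a fraction that contains the requested amount in pmol."""
    return pmol / pmol_per_ul(ng_per_ul, mean_length_bp)


# Hypothetical measurements for the two fractions.
over_200bp = {"ng_per_ul": 20.0, "mean_length_bp": 1500.0}
over_2kb = {"ng_per_ul": 10.0, "mean_length_bp": 3000.0}

target_pmol = 0.05  # amount of each fraction to combine (hypothetical)
vol_short = volume_for(target_pmol, **over_200bp)
vol_long = volume_for(target_pmol, **over_2kb)

print(f"{vol_short:.2f} uL of the >200 bp fraction")
print(f"{vol_long:.2f} uL of the >2 kb fraction")
```

In practice the mean lengths would come from a fragment-size measurement of each fraction; longer fragments need more mass to supply the same number of molecules.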

7. PacBio sequencing

And sequencing the SMRTbell Template library by adopting a PacBio platform to obtain a PAISo-seq sequencing library of the sample to be detected.

According to the sequencing result, the full-length RNA sequence of the target gene in the sample to be tested and the poly (A) tail sequence thereof can be accurately analyzed.
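How a poly (A) tail, including internal non-A residues, might be read out of one consensus read can be sketched in Python. This is a simplified heuristic and not the inventors' analysis pipeline: it assumes the read is in transcript orientation, locates the copied Barcode that follows the tail, and then walks back toward the 5' end with a drop-off score so that isolated non-A bases stay inside the tail. The function name, scoring parameters and toy read are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_tail(read: str, adapter: str, match: int = 1,
                 mismatch: int = -2, max_drop: int = 6):
    """Return the A-rich stretch immediately upstream of the adapter, or None."""
    end = read.rfind(adapter)
    if end < 0:
        return None
    score = best = 0
    best_start = end
    for i in range(end - 1, -1, -1):
        score += match if read[i] == "A" else mismatch
        if score > best:
            best, best_start = score, i       # tail extends at least to here
        elif best - score > max_drop:
            break                             # left the tail for good
    return read[best_start:end]


adapter = revcomp("TACTAGAGTAGCACTC")         # copied Barcode (example fragment B)
toy_tail = "A" * 10 + "G" + "A" * 9           # tail with one internal G
toy_read = "GGCATTCGTC" + toy_tail + adapter

tail = extract_tail(toy_read, adapter)
non_a = [(pos, base) for pos, base in enumerate(tail) if base != "A"]
print(len(tail), non_a)
```

The penalty and drop-off values trade sensitivity to internal non-A residues against the risk of running into the 3' UTR, and would need tuning on real data.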

A schematic flow chart of the construction of the PAIso-seq sequencing library of a sample to be tested is shown in FIG. 1 (in which DNA fragment D is identical to DNA fragment A).

If the sample to be tested is a single cell, "taking total RNA of a sample to be tested" in step 1 is replaced with "taking a single cell, adding 19 μL of an aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor, incubating at 80 ℃ for 5 min, and lysing to obtain a cell lysate"; the other steps are unchanged, and the PAIso-seq sequencing library of the single cell is obtained.

Example 2. Use of PAIso-seq sequencing libraries constructed by the method of Example 1 to analyze the poly (A) tail length of transcripts in mouse GV oocytes and single mouse GV oocytes

1. Pregnant mare serum gonadotropin (ProSpec) was injected intraperitoneally into 7-8-week-old CD1 (ICR) mice (Beijing Vital River Laboratory Animal Technology Co., Ltd.), and oocytes were then obtained by puncturing the ovaries (30-gauge needle); the oocytes were washed once with M2 medium (Sigma) and 3 times with PBS containing 0.1% (v/v) BSA (PBSA) to obtain GV oocytes.

2. Total RNA was extracted from the GV oocytes using the Direct-zol RNA MicroPrep kit (Zymo Research) to obtain total RNA of GV oocytes. The specific steps are as follows:

(1) adding 500 μ L TRIzol (Ambion) into GV egg cells, lysing, and mixing thoroughly;

(2) adding 500 mu L of ethanol into the system which finishes the step (1), and thoroughly mixing;

(3) transferring the system completing the step (2) to a Zymo-Spin IC column, and centrifuging; then washing the column and centrifuging;

(4) after step (3), adding 10-30 μL of nuclease-free water to the column matrix for elution, and centrifuging to obtain the total RNA of the GV oocytes.

3. The "total RNA of test sample" in example 1 was replaced with total RNA of GV egg cells, and the other steps were not changed, to obtain a PAISo-seq sequencing library of GV egg cells. This experiment was repeated twice.

4. "Taking total RNA of the sample to be tested" in step 1 of Example 1 was replaced with "taking a single GV ovum, adding 19 μL of aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor, incubating at 80 ℃ for 5 min, and lysing to obtain a cell lysate", with the other operations in steps 1 and 2 unchanged, to obtain a single-cell nucleic acid solution; the nucleic acid solutions of 15 single cells (obtained from 15 single GV ova respectively) were mixed to obtain a mixed nucleic acid solution (in the mixed nucleic acid solution, each single cell contributes the same mass of nucleic acid); the "nucleic acid solution" in step 3 of Example 1 was then replaced with the mixed nucleic acid solution, with steps 3 to 7 unchanged, to obtain the PAISo-seq sequencing library of a mixed sample of single GV ova. The experiment was repeated once.
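If the parenthetical in step 4 is read as each single cell contributing an equal mass of nucleic acid to the pool, the volume taken from each single-cell solution is the target mass divided by that solution's concentration. The following is a minimal sketch of that arithmetic only; the function name, the concentrations and the target mass per cell are hypothetical and do not come from the experiments described here.

```python
def equal_mass_pooling_volumes(concentrations_ng_per_ul, mass_per_cell_ng):
    """Volume (uL) to take from each solution so that every cell
    contributes the same nucleic acid mass to the pool."""
    volumes = []
    for conc in concentrations_ng_per_ul:
        if conc <= 0:
            raise ValueError("concentration must be positive")
        volumes.append(mass_per_cell_ng / conc)
    return volumes


# Hypothetical concentrations for three single-cell solutions (ng/uL).
concentrations = [0.5, 1.0, 2.0]
# Hypothetical target mass contributed by each cell (ng).
volumes = equal_mass_pooling_volumes(concentrations, mass_per_cell_ng=1.0)

for conc, vol in zip(concentrations, volumes):
    print(f"{conc} ng/uL -> take {vol:.2f} uL")
```

The same calculation extends unchanged to all 15 single-cell solutions.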

5. "Taking total RNA of the sample to be tested" in step 1 of Example 1 was replaced with "taking a single GV ovum, adding 19 μL of aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor, incubating at 80 ℃ for 5 min, and lysing to obtain a cell lysate", and the other steps were unchanged, to obtain the PAISo-seq sequencing library of a single GV ovum. A total of 15 GV ova (designated C1-C15) were collected and, accordingly, PAISo-seq sequencing libraries of 15 GV ova were obtained.

In step 3, step 4 and step 5, the nucleotide sequences of the primers are as follows. Primer:

(the boxed portion is the Barcode sequence); RT primer: 5'-AAGCAGTGGTATCAACGCAGAGTAC-3'; TSO: 5'-AAGCAGTGGTATCAACGCAGAGATRACRGGG^*-3'. Because DNA fragment D is identical to DNA fragment A, the upstream primer and the downstream primer for PCR amplification are also identical, that is, a single primer is used for PCR amplification; primer for PCR amplification: 5'-AAGCAGTGGTATCAACGCAGAGT-3'.
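The relationship between the RT primer and the single PCR primer quoted above can be checked by plain string comparison. The sketch below is illustrative only; it uses the two sequences exactly as printed in this text, and the helper name is hypothetical.

```python
# Sequences copied from the text above (5' to 3').
RT_PRIMER = "AAGCAGTGGTATCAACGCAGAGTAC"
PCR_PRIMER = "AAGCAGTGGTATCAACGCAGAGT"


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading positions at which the two sequences agree."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


prefix = shared_prefix_length(RT_PRIMER, PCR_PRIMER)
print(f"RT primer: {len(RT_PRIMER)} nt, PCR primer: {len(PCR_PRIMER)} nt, "
      f"shared 5' prefix: {prefix} nt")
```

As printed, the PCR primer corresponds to the first 23 nt of the 25-nt RT primer, so the PCR primer can anneal to the end introduced by the RT primer.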

6. Analyzing the poly (A) tail length of the Dnmt1 gene, the Cnot7 gene, the Btg4 gene and the Plat gene in a mixed sample of the GV egg cells and the single GV ovum according to the sequencing results of the step 3 and the step 4, wherein the poly (A) tail length is expressed by a median; and analyzing poly (A) information in the captured transcript.

The results of the poly (A) tail length analysis are shown in the left panel of FIG. 2 (GV rep.1 is GV egg cell sample replicate 1, GV rep.2 is GV egg cell sample replicate 2, SCGV com. is the mixed sample of single GV ova; from top to bottom: Dnmt1 gene, Cnot7 gene, Btg4 gene and Plat gene). The results showed that the poly (A) tail of the Dnmt1 gene was longer, while the poly (A) tails of the Cnot7 gene, Btg4 gene and Plat gene were shorter. Transcript numbers are Spearman-correlated among the 3 replicates (A in FIG. 3).

The poly (A) tail lengths are Spearman-correlated among the 3 replicates (B in FIG. 3).
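The two summary statistics used above, the median poly (A) tail length per gene and the Spearman correlation between replicates, can be sketched as follows. This is a minimal illustration and not the analysis pipeline used for FIG. 2 and FIG. 3; the gene labels and per-read tail lengths are made-up placeholders, and the helper names are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-read poly(A) tail lengths (nt) in two replicates.
rep1_reads = {"gene_a": [80, 95, 110], "gene_b": [20, 25, 30],
              "gene_c": [15, 18, 40], "gene_d": [10, 12, 14]}
rep2_reads = {"gene_a": [90, 100, 105], "gene_b": [22, 24, 35],
              "gene_c": [16, 20, 21], "gene_d": [9, 11, 30]}

genes = sorted(rep1_reads)
rep1_medians = [median(rep1_reads[g]) for g in genes]
rep2_medians = [median(rep2_reads[g]) for g in genes]
rho = spearman(rep1_medians, rep2_medians)
print(dict(zip(genes, rep1_medians)), dict(zip(genes, rep2_medians)), rho)
```

Using the median rather than the mean keeps a few very long or very short tails from dominating the per-gene value, and the rank-based correlation only asks whether genes are ordered similarly in the two replicates.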

7. Analyzing the poly (A) tail length distribution and non-A modification frequency of all transcripts in the data for the single GV ova, the GV egg cells and the mixed sample of single GV ova according to the sequencing results of step 3, step 4 and step 5, and analyzing the Spearman correlation of poly (A) tail length between C15 and C4, between C4 and GV rep.2, and between C15 and GV rep.2.

Part of the poly (A) tail length distribution results for all transcripts is shown in FIG. 4A (the numbers below and the red dots indicate the median poly (A) tail lengths).

The results of the non-A modified frequency part are shown in FIG. 4B.

The poly (A) tail lengths of C15 and C4 are Spearman-correlated (upper panel of C in FIG. 4).

The poly (A) tail lengths of C4 and GV rep.2 are Spearman-correlated (middle panel of C in FIG. 4).

The poly (A) tail lengths of C15 and GV rep.2 are Spearman-correlated (lower panel of C in FIG. 4).
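The non-A modification frequency reported in step 7 can be illustrated with a small sketch. It assumes, as one possible definition that is not confirmed by this text, that the frequency of a given non-A residue is the fraction of poly (A) tails containing at least one such residue. The tail sequences are made-up placeholders written as cDNA-style bases (T rather than U), and the function name is hypothetical.

```python
from collections import Counter


def non_a_frequency(tails):
    """Fraction of tails containing at least one C, G or T residue."""
    if not tails:
        raise ValueError("at least one tail sequence is required")
    tails_with_base = Counter()
    for tail in tails:
        # Count each tail once per residue type, however many times it occurs.
        for base in set(tail.upper()) - {"A"}:
            tails_with_base[base] += 1
    return {base: tails_with_base[base] / len(tails) for base in "CGT"}


# Hypothetical poly(A) tail sequences extracted from reads.
tails = ["AAAAAAAA", "AAAATAAA", "AAGAAAAA", "AAAAAATT"]
frequency = non_a_frequency(tails)
print(frequency)
```

Other definitions (for example, per-residue rather than per-tail frequencies) would only change the counting step inside the loop.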

8. The poly (A) tail lengths of the Dnmt1 gene, Cnot7 gene, Btg4 gene and Plat gene in single GV ova were determined using the PAT assay in combination with Fragment Analyzer capillary electrophoresis (described in Mishima, Yuichiro; Tomari, Yukihide. Codon Usage and 3' UTR Length Determine Maternal mRNA Stability in Zebrafish. doi: 10.1016/j.molcel.2016.02.027). The experiment was repeated three times and the mean value was taken.

The detection results are shown in the right panel of FIG. 2. The results showed that the poly (A) tail of the Dnmt1 gene was longer, while the poly (A) tails of the Cnot7 gene, Btg4 gene and Plat gene were shorter, which is fully consistent with the results in the left panel of FIG. 2.

The above results indicate that the PAISo-seq sequencing library of GV egg cells can be used to analyze the poly (A) tail length of transcripts in GV egg cells, and that the PAISo-seq sequencing library of single GV ova, or of a mixed sample of single GV ova, can be used to analyze the poly (A) tail length of transcripts in single GV ova.

Example 3. Use of the PAISo-seq sequencing library of the test sample constructed in Example 1 to analyze the poly (A) tail length and non-A modification frequency of transcripts in Hela cells

1. Extracting total RNA from Hela cells (ATCC) by using a Direct-zol RNA MicroPrep kit to obtain the total RNA of the Hela cells. The method comprises the following specific steps:

(1) taking Hela cells, adding 500 μL of TRIzol for lysis, and mixing thoroughly;

(4) after completing step (3), adding 10-30 μL of nuclease-free water to the column matrix for elution, and centrifuging to obtain the total RNA of the Hela cells.

2. The PAISo-seq sequencing library of Hela cells was obtained by replacing "total RNA of test sample" in example 1 with total RNA of Hela cells, and the other steps were not changed. This experiment was repeated twice.

The nucleotide sequences of the primer, RT primer, TSO, and the primer for PCR amplification were the same as those in example 2.

3. According to the sequencing results of step 2, the transcript poly (A) tail length distribution and the frequency of non-A modifications in Hela cells were analyzed.

The results of the distribution of the transcript poly (A) tail lengths in Hela cells are shown in the left panel of FIG. 5.

The results for the non-A modification frequency in Hela cells are shown in the right panel of FIG. 5.

The results indicate that the PAISo-seq sequencing library of Hela cells can analyze transcripts in Hela cells for poly (A) tail length and non-A modification frequency.

Example 4. Use of the PAISo-seq sequencing library of the test sample constructed in Example 1 to analyze the poly (A) tail length and non-A modification frequency of transcripts in porcine liver tissue

1. Porcine liver tissue was taken (the pig breed is a white pig), and total RNA was extracted using a Direct-zol RNA MicroPrep kit to obtain the total RNA of the porcine liver tissue. The specific steps are as follows:

(1) taking the porcine liver tissue, adding 500 μL of TRIzol for lysis, and mixing thoroughly;

(4) after completing step (3), adding 10-30 μL of nuclease-free water to the column matrix for elution, and centrifuging to obtain the total RNA of the porcine liver tissue.

2. The PAISo-seq sequencing library of the porcine liver tissue was obtained by replacing the "total RNA of the sample to be tested" in example 1 with the total RNA of the porcine liver tissue, and the other steps were not changed. This experiment was repeated twice.

3. According to the sequencing result of step 2, the transcript poly (A) tail length distribution and the non-A modification frequency in the pig liver tissue were analyzed.

The results of the transcript poly (A) tail length distribution in porcine liver tissue are shown in the left panel of FIG. 6.

The results for the non-A modification frequency in porcine liver tissue are shown in the right panel of FIG. 6.

The results indicate that the PAISo-seq sequencing library of porcine liver tissue can analyze transcripts in porcine liver tissue for poly (A) tail length and non-A modification frequency.

Example 5. Accuracy of the PAISo-seq sequencing library of the test sample constructed in Example 1 in detecting the length of the poly (A) tail

1. Spike-in reference sequences were synthesized by Takara; the spike-ins with poly (A) tails of different lengths carried distinguishing barcodes before the start codon (the spike-in tails were set to 10, 30, 50, 70 and 100 A's, respectively).

2. After step 1 was completed, PCR amplification was carried out separately using mCherry vectors containing the spike-ins with poly (A) tails of different lengths as templates to obtain double-stranded spike-in DNA of different lengths, and the PCR amplification products were then recovered from the gel according to their lengths.

3. Taking the PCR amplification product, and adopting an SMRTbell Template Prep kit to construct an SMRTbell Template library.

4. The SMRTbell Template library was sequenced using the PacBio platform.

According to the sequencing results, the PAISo-seq analysis data of Spike-in shows that the median length of poly (A) tail is 10, 28, 48, 67 and 97, respectively. As can be seen, the PAISo-seq sequencing library of the test sample constructed in example 1 allows an accurate quantitative description of the length of the poly (A) tail.

The above results show that the PAISo-seq sequencing library of the test sample constructed in example 1 accurately reflects the length of the poly (A) tail.
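The agreement between the designed spike-in tail lengths and the measured medians reported in this example can be expressed as simple differences. The sketch below uses only the five designed lengths and the five measured medians stated above; the variable names are illustrative.

```python
# Designed poly(A) tail lengths of the spike-ins (number of A's).
expected = [10, 30, 50, 70, 100]
# Median tail lengths reported from the PAISo-seq data for the same spike-ins.
measured = [10, 28, 48, 67, 97]

# Signed difference and relative error for each spike-in.
differences = [m - e for e, m in zip(expected, measured)]
relative_errors = [d / e for d, e in zip(differences, expected)]
max_abs_difference = max(abs(d) for d in differences)

for e, m, d, r in zip(expected, measured, differences, relative_errors):
    print(f"designed {e:>3} A -> measured median {m:>3} "
          f"(difference {d:+d}, {r:+.1%})")
print(f"largest absolute difference: {max_abs_difference} nt")
```

Every measured median is within 3 nt of the designed length, and none exceeds the designed length.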

Claims

1. The construction method of the sequencing library of the RNA with the poly (A) tail in the sample to be detected sequentially comprises the following steps:

(a1) taking total RNA of a sample to be detected, carrying out end extension by using a primer and a Klenow fragment, then adding USER enzyme for full digestion, and recovering a nucleic acid fragment with the length of more than 200 nt;

the primer comprises a DNA fragment A and a DNA fragment C from the 5 'end to the 3' end in sequence;

the DNA fragment A is an arbitrary sequence of 15-40bp that cannot bind to the poly (A) tail, and is used for PCR amplification;

the DNA fragment C consists of 3-24bp nucleotides, and each nucleotide is dU or T; at least one of the 3 nucleotides at the 5' end of the DNA fragment C is dU;

(a2) taking the nucleic acid fragment, and carrying out reverse transcription by using an RT primer and a reverse transcriptase with terminal transferase activity to obtain cDNA;

the RT primer is the whole nucleotide sequence of the DNA fragment A;

(a3) taking the cDNA, and performing template exchange by using TSO;

the TSO comprises a DNA fragment D and a nucleic acid fragment D from the 5 'end to the 3' end in sequence;

the DNA fragment D is any sequence of 15-40bp and is used for PCR amplification;

the nucleic acid fragment D comprises a segment 2, and the segment 2 is used for combining with the cDNA; segment 2 consists of N guanine ribonucleotides or (N-1) guanine ribonucleotides and 1 guanine deoxyribonucleotide modified by locked nucleic acid, wherein N is a natural number of more than 3;

the upstream primer in the PCR primer is the whole nucleotide sequence of the DNA fragment D;

the downstream primer in the PCR primer is the whole nucleotide sequence of the DNA fragment A;

2. The method of claim 1, wherein: in the step (a1), the primer further comprises a DNA fragment B, wherein the DNA fragment B is located at the 3 'end of the DNA fragment A and the 5' end of the DNA fragment C; the DNA fragment B is a Barcode sequence of 8-30bp and is used for distinguishing different samples to be detected.

3. The method of claim 1, wherein: in the step (a3), the nucleic acid fragment D comprises segment 1, segment 1 is located at the 5' end of segment 2; segment 1 is any sequence of 0-10 bp.

4. The method of claim 1, wherein: in the step (a3), the DNA fragment D is DNA fragment A.

5. The method of claim 1, wherein: in the step (a5), the amplified cDNA is sequenced as follows: mixing the amplified cDNA larger than 200bp with the amplified cDNA larger than 2kb, and then adopting an SMRTbell Template Prep kit to construct an SMRTbell Template library; the SMRTbell Template library was sequenced using the PacBio platform.

6. The construction method of the sequencing library of the RNA with the poly (A) tail in the single cell sequentially comprises the following steps:

(b1) cracking the single cell to obtain cell lysate; then using a primer and a Klenow fragment for terminal extension; then adding USER enzyme for full digestion, and recovering nucleic acid fragments with the length of more than 200 nt;

(b2) taking the nucleic acid fragment, and carrying out reverse transcription by using an RT primer and a reverse transcriptase with terminal transferase activity to obtain cDNA;

the RT primer is the whole nucleotide sequence of the DNA fragment A;

(b3) taking the cDNA, and performing template exchange by using TSO;

7. The method of claim 6, wherein: in the step (b1), the method for obtaining the cell lysate by lysing the single cells comprises the following steps: adding a nonionic surfactant and an RNase inhibitor into the single cells, and incubating to obtain a cell lysate.

8. The method of claim 6, wherein: in the step (b1), the primer further comprises a DNA fragment B, wherein the DNA fragment B is located at the 3 'end of the DNA fragment A and the 5' end of the DNA fragment C; the DNA fragment B is 8-30bp Barcode sequence and is used for distinguishing different single cells.

9. The method of claim 6, wherein: in the step (b3), the nucleic acid fragment D comprises segment 1, segment 1 is located at the 5' end of segment 2; segment 1 is any sequence of 0-10 bp.

10. The method of claim 6, wherein: in the step (b3), the DNA fragment D is DNA fragment A.

11. The method of claim 6, wherein: in the step (b5), the amplified cDNA is sequenced as follows: mixing the amplified cDNA larger than 200bp with the amplified cDNA larger than 2kb, and then adopting an SMRTbell Template Prep kit to construct an SMRTbell Template library; the SMRTbell Template library was sequenced using the PacBio platform.

12. Use of the method of any one of claims 1 to 5 for analysing the sequence of RNA having a poly (A) tail in a test sample.

13. Use of the method of any one of claims 1 to 5 for analyzing a poly (A) tail sequence of RNA having a poly (A) tail in a sample to be tested.

14. Use of the method of any one of claims 6 to 11 for analyzing the sequence of RNA having a poly (A) tail in a single cell.

15. Use of the method of any one of claims 6 to 11 for analyzing the poly (A) tail sequence of RNA having a poly (A) tail in a single cell.

16. Use of the method of any one of claims 1 to 11 for analyzing the sequence of RNA having a poly (A) tail.

17. Use of the method of any one of claims 1 to 11 for analyzing the poly (A) tail sequence of RNA having a poly (A) tail.