CN110499356B - Construction method of sequencing library of RNA (ribonucleic acid) with poly (A) tail in sample to be detected - Google Patents
- Publication number
- CN110499356B CN201910837492.2A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
Abstract
The invention discloses a method for constructing a sequencing library of RNA with a poly(A) tail in a sample to be detected. The method comprises the following steps: taking total RNA of the sample to be detected, carrying out terminal extension with a primer, then digesting completely with USER enzyme, and recovering nucleic acid fragments longer than 200 nt; reverse-transcribing the nucleic acid fragments and then performing template switching with a template-switching oligonucleotide (TSO) to obtain cDNA (complementary DNA); amplifying the cDNA by PCR with PCR primers to obtain amplified cDNA; and constructing a sequencing library from the amplified cDNA and sequencing it. Experiments show that the sequencing library constructed by this method allows the sequence of RNA with a poly(A) tail to be analyzed, including the sequence of the poly(A) tail itself. The invention has important application value for research on the poly(A) tails of RNA.
Description
Technical Field
The invention belongs to the field of biotechnology, and in particular relates to a method for constructing a sequencing library of RNA (ribonucleic acid) with a poly(A) tail in a sample to be detected.
Background
Most mature RNAs, such as messenger RNA (mRNA) and long intergenic non-coding RNA (lincRNA), are regulated by numerous post-transcriptional modifications. Most mRNAs and lincRNAs acquire a non-templated poly(A) tail through catalysis by poly(A) polymerase. The poly(A) tail is considered one of the key factors regulating RNA stability and translation efficiency. mRNA and lincRNA with poly(A) tails account for only a small fraction of intracellular RNA. Oligo(dT)-dependent affinity purification is a common method for isolating mRNA, but it is inevitably biased toward longer poly(A) tails. When analyzing poly(A) tails, it is therefore desirable to avoid such preferential enrichment so that the original, true poly(A) information is obtained.
Since the advent of next-generation sequencing (NGS), transcriptome studies have proliferated. However, the base-calling algorithms of existing NGS technology cannot handle homopolymeric sequences longer than 30 nt, so the base composition of the poly(A) tail cannot be resolved well. Even Sanger sequencing has difficulty reading long homopolymeric sequences. Smart-seq2 is an extremely sensitive single-cell RNA-seq technique that performs reverse transcription and cDNA library construction with the 3' UTR anchor primer 5'-AAGCAGTGGTATCAACGCAGAGTACT30VN-3' (T30 denotes 30 consecutive T residues, N any base, and V A, C or G). The V and N at the 3' end of this primer anchor it to the end of the 3' UTR, and the poly(A) tail is then removed from the final cDNA library during data analysis, which excludes the effect of homopolymeric sequences on sequence assembly. Other RNA-seq tools on the Illumina platform may likewise remove poly(A) during library construction, sequencing and data analysis. Iso-seq, based on the PacBio platform, builds libraries from cDNA lacking poly(A) using a strategy similar to Smart-seq2. Consequently, conventional RNA-seq or Iso-seq data generated on the Illumina or PacBio platforms lack poly(A) information.
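As an illustration of the anchor-primer notation quoted above, the following minimal Python sketch expands the run-length shorthand (T30) and enumerates the concrete sequences encoded by the degenerate V and N positions. The function names are hypothetical and chosen only for this example; the primer string itself is the one quoted in the text.

```python
import re
from typing import List

# IUPAC degenerate codes used in the quoted primer: V = A, C or G; N = any base.
IUPAC = {"V": "ACG", "N": "ACGT"}


def expand_repeats(notation: str) -> str:
    """Turn run-length notation such as 'ACT30VN' into an explicit string."""
    return re.sub(
        r"([ACGTVN])(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        notation,
    )


def anchor_variants(primer: str) -> List[str]:
    """List every concrete sequence encoded by the two trailing degenerate bases."""
    body, v, n = primer[:-2], primer[-2], primer[-1]
    return [body + a + b for a in IUPAC[v] for b in IUPAC[n]]


primer = expand_repeats("AAGCAGTGGTATCAACGCAGAGTACT30VN")
variants = anchor_variants(primer)
print(len(primer), len(variants))
```

Because V excludes T, every variant carries a non-T base directly after the T stretch, which is what lets the primer anchor at the junction between the 3' UTR and the poly(A) tail.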
Currently, PAL-seq (poly(A)-tail length profiling by sequencing) and TAIL-seq can measure poly(A) tail length across the whole transcriptome by optimizing the base-calling algorithm for poly(T). With these methods, researchers have obtained data on the dynamics of poly(A) tail length from yeast, cell lines, mouse liver, Arabidopsis thaliana leaves, Drosophila, frog and zebrafish embryos. PAL-seq relies on a special sequencing procedure that can only be run on a discontinued sequencer (Illumina Genome Analyzer II). Its principle is that dTTP and biotin-labeled dUTP are incorporated during primer extension, and the poly(A) tail length is estimated from the proportion of biotin label incorporated. PAL-seq does not chemically recognize non-A residues in the poly(A) tail, nor does it yield single-base-resolution data. TAIL-seq and mTAIL-seq, its variant with mRNA enrichment, determine the position of the poly(T) terminus by analyzing the raw sequencing images with a special base-calling algorithm. Only a few sequencers currently provide the raw sequencing images required for this analysis, so most users cannot readily apply mTAIL-seq through existing commercial sequencing services and general-purpose sequencing instruments. This approach has two further drawbacks: the effective read length of the special base-calling algorithm for poly(T) is only 231 bp, and information on non-T residues within the poly(A) tail is not easy to obtain. In short, TAIL-seq and PAL-seq provide a means of measuring poly(A) length and of detecting non-adenosine (non-A) modifications at the poly(A) terminus, and uridine (U) and guanosine (G) modifications at the 3' end of RNA tails have been identified in this way.
However, the base composition of the poly(A) tail away from the 3' end remains unclear.
During the maturation of mouse ova and early embryonic development, various developmental events occur, including the clearance of maternal mRNA and zygotic genome activation (ZGA). These important events are closely related to the various RNAs stored in GV ova, which are regulated through their poly(A) tails.
Disclosure of Invention
The object of the invention is to obtain the complete sequence of the poly(A) tail of RNA in a sample to be detected, so that the structure and function of the poly(A) tail can be analyzed more accurately.
The invention first provides a method for constructing a sequencing library of RNA with a poly(A) tail in a sample to be detected, comprising the following steps in sequence:
(a1) taking total RNA of the sample to be detected, carrying out terminal extension with a guide primer, then adding USER enzyme for complete digestion, and recovering nucleic acid fragments longer than 200 nt;
the guide primer may comprise, in order from the 5' end to the 3' end, a DNA fragment A and a DNA fragment C;
DNA fragment A may be any sequence of 15-40 bp (such as 15-25 bp, 25-40 bp, 15 bp, 25 bp or 40 bp) that cannot bind to the poly(A) tail, and is used for PCR amplification;
DNA fragment C may consist of 3-24 (such as 3-18, 18-24, 3, 18 or 24) nucleotides, each of which may be dU or T; at least one of the 3 nucleotides at the 5' end of DNA fragment C is dU;
(a2) reverse-transcribing the nucleic acid fragments with an RT primer to obtain cDNA;
the RT primer is all or part of the nucleotide sequence of DNA fragment A;
(a3) performing template switching on the cDNA with a TSO;
the TSO may comprise, in order from the 5' end to the 3' end, a DNA fragment D and a nucleic acid fragment D;
DNA fragment D may be any sequence of 15-40 bp (such as 15-25 bp, 25-40 bp, 15 bp, 25 bp or 40 bp) and is used for PCR amplification;
nucleic acid fragment D may comprise segment 2, which is used for binding to the cDNA (C has been added to the 5' end of the cDNA during reverse transcription); segment 2 may consist of N guanine ribonucleotides, or of (N-1) guanine ribonucleotides and 1 guanine deoxyribonucleotide modified with locked nucleic acid, where N may be a natural number of 3 or more;
(a4) amplifying the cDNA obtained in step (a3) by PCR with PCR primers to obtain amplified cDNA;
the upstream primer of the PCR primers is all or part of the nucleotide sequence of DNA fragment D;
the downstream primer of the PCR primers is all or part of the nucleotide sequence of DNA fragment A;
(a5) sequencing the amplified cDNA, thereby obtaining the sequencing library of RNA with a poly(A) tail in the sample to be detected.
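The constraints placed on the primer used for terminal extension in step (a1) can be sketched as a small validation routine. This is only an illustrative Python sketch, not part of the claimed method: the function names are hypothetical, the "no run of four T" test is an assumed, crude stand-in for "cannot bind to the poly(A) tail", and the example sequences are those given elsewhere in this description.

```python
from typing import List


def tokenize(seq: str) -> List[str]:
    """Split a sequence string into nucleotides, treating 'dU' as one token."""
    tokens, i = [], 0
    while i < len(seq):
        if seq.startswith("dU", i):
            tokens.append("dU")
            i += 2
        else:
            tokens.append(seq[i])
            i += 1
    return tokens


def check_fragment_a(frag_a: str) -> bool:
    """Fragment A: 15-40 nt; 'no TTTT run' is an assumed proxy for not binding poly(A)."""
    return 15 <= len(frag_a) <= 40 and "TTTT" not in frag_a


def check_fragment_c(frag_c: str) -> bool:
    """Fragment C: 3-24 nt, only dU or T, at least one dU among the first three."""
    tokens = tokenize(frag_c)
    only_du_or_t = all(t in ("dU", "T") for t in tokens)
    return 3 <= len(tokens) <= 24 and only_du_or_t and "dU" in tokens[:3]


def build_primer(frag_a: str, frag_c: str, frag_b: str = "") -> str:
    """Concatenate A, optional barcode B, and C in 5'-to-3' order after validation."""
    if not check_fragment_a(frag_a):
        raise ValueError("fragment A violates the stated constraints")
    if not check_fragment_c(frag_c):
        raise ValueError("fragment C violates the stated constraints")
    return frag_a + frag_b + frag_c


FRAG_A = "AAGCAGTGGTATCAACGCAGAGTAC"   # example fragment A from the description
FRAG_B = "TACTAGAGTAGCACTC"            # example barcode (fragment B)
FRAG_C = "dUTTTTTTTTdUTTTTTTTTTT"      # example fragment C
primer = build_primer(FRAG_A, FRAG_C, FRAG_B)
print(primer)
```

Treating dU as a single token keeps the length check in nucleotides rather than characters, which matters because fragment C is limited to 24 nucleotides.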
The invention also provides a method for constructing a sequencing library of RNA with a poly(A) tail in a single cell, comprising the following steps in sequence:
(b1) lysing the single cell to obtain a cell lysate; carrying out terminal extension with a guide primer; then adding USER enzyme for complete digestion, and recovering nucleic acid fragments longer than 200 nt;
the guide primer may comprise, in order from the 5' end to the 3' end, a DNA fragment A and a DNA fragment C;
DNA fragment A may be any sequence of 15-40 bp (such as 15-25 bp, 25-40 bp, 15 bp, 25 bp or 40 bp) that cannot bind to the poly(A) tail, and is used for PCR amplification;
DNA fragment C may consist of 3-24 (such as 3-18, 18-24, 3, 18 or 24) nucleotides, each of which is dU or T; at least one of the 3 nucleotides at the 5' end of DNA fragment C is dU;
(b2) reverse-transcribing the nucleic acid fragments with an RT primer to obtain cDNA;
the RT primer is all or part of the nucleotide sequence of DNA fragment A;
(b3) performing template switching on the cDNA with a TSO;
the TSO may comprise, in order from the 5' end to the 3' end, a DNA fragment D and a nucleic acid fragment D;
DNA fragment D may be any sequence of 15-40 bp (such as 15-25 bp, 25-40 bp, 15 bp, 25 bp or 40 bp) and is used for PCR amplification;
nucleic acid fragment D may comprise segment 2, which is used for binding to the cDNA (C has been added to the 5' end of the cDNA during reverse transcription); segment 2 consists of N guanine ribonucleotides, or of (N-1) guanine ribonucleotides and 1 guanine deoxyribonucleotide modified with locked nucleic acid, where N may be a natural number of 3 or more;
(b4) amplifying the cDNA obtained in step (b3) by PCR with PCR primers to obtain amplified cDNA;
the upstream primer of the PCR primers is all or part of the nucleotide sequence of DNA fragment D;
the downstream primer of the PCR primers is all or part of the nucleotide sequence of DNA fragment A;
(b5) sequencing the amplified cDNA, thereby obtaining the sequencing library of RNA with a poly(A) tail in the single cell.
Any of the above guide primers may consist of, in order from the 5' end to the 3' end, DNA fragment A and DNA fragment C.
Any of the above TSOs may consist of, in order from the 5' end to the 3' end, DNA fragment D and nucleic acid fragment D.
Any of the above nucleic acid fragments D may specifically consist of segment 2.
In any of the above methods, in step (b1), "lysing the single cell to obtain a cell lysate" may be performed by adding a nonionic surfactant and an RNase inhibitor to the single cell and incubating to obtain the cell lysate.
The nonionic surfactant may be Triton X-100.
In step (b1), "lysing the single cell to obtain a cell lysate" may specifically be performed as follows: μL of an aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor are added to the single cell, followed by incubation to obtain the cell lysate.
The incubation may be carried out at 70-90 °C (such as 70-80 °C, 80-90 °C, 70 °C, 80 °C or 90 °C) for 3-10 min (such as 3-5 min, 5-10 min, 3 min, 5 min or 10 min).
Any of the above guide primers may further comprise a DNA fragment B located between DNA fragment A and DNA fragment C, that is, at the 3' end of DNA fragment A and the 5' end of DNA fragment C; DNA fragment B may be a barcode sequence of 8-30 bp (such as 8-16 bp, 16-30 bp, 8 bp, 16 bp or 30 bp) used to distinguish different samples to be detected or different single cells.
Any of the above guide primers may consist of, in order from the 5' end to the 3' end, DNA fragment A, DNA fragment B and DNA fragment C.
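How a barcode in DNA fragment B could be used to tell samples or single cells apart after sequencing can be sketched as follows. This is a minimal Python illustration under stated assumptions: the read is assumed to be in the same orientation as the primer (fragment A immediately followed by the barcode), exact matching is used, the function and sample names are hypothetical, and the second barcode is invented purely for the example.

```python
from typing import Dict, Optional

FRAG_A = "AAGCAGTGGTATCAACGCAGAGTAC"  # example fragment A from the description


def assign_sample(
    read: str, barcodes: Dict[str, str], frag_a: str = FRAG_A
) -> Optional[str]:
    """Return the sample whose barcode directly follows fragment A, else None."""
    start = read.find(frag_a)
    if start < 0:
        return None  # fragment A not found in this orientation
    after_a = read[start + len(frag_a):]
    for barcode, sample in barcodes.items():
        if after_a.startswith(barcode):
            return sample
    return None  # fragment A present but barcode unknown


barcodes = {
    "TACTAGAGTAGCACTC": "sample_1",  # example fragment B from the description
    "GATCGATCGATCGATC": "sample_2",  # hypothetical second barcode
}

read = FRAG_A + "TACTAGAGTAGCACTC" + "TTTTTTTTTT" + "GCATGC"
print(assign_sample(read, barcodes))
```

A practical pipeline would also need to handle reverse-complemented reads and sequencing errors in the barcode; both are left out here to keep the sketch short.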
In any of the above methods, the terminal extension in steps (a1) and (b1) further requires adding dNTPs and a DNA polymerase to the reaction system.
The DNA polymerase may specifically be Klenow fragment (exo-, NEB).
The terminal extension may be carried out by incubating at 32-42 °C (such as 32-37 °C, 37-42 °C, 32 °C, 37 °C or 42 °C) for 30-90 min (such as 30-60 min, 60-90 min, 30 min, 60 min or 90 min).
The USER enzyme digestion may be carried out at 32-42 °C (such as 32-37 °C, 37-42 °C, 32 °C, 37 °C or 42 °C) for 30-90 min (such as 30-60 min, 60-90 min, 30 min, 60 min or 90 min).
In any of the above methods, in steps (a2) and (b2), the reverse transcriptase adds C to the 5' end of the reverse transcription product (i.e., the cDNA) because it has terminal transferase activity.
In any of the above methods, in steps (a3) and (b3), nucleic acid fragment D may further comprise segment 1, located 5' of segment 2; segment 1 may be any sequence of 0-10 bp (such as 0-2 bp, 2-10 bp, 0 bp, 2 bp or 10 bp).
Any of the above nucleic acid fragments D may specifically consist of segment 1 and segment 2.
In any of the above methods, in steps (a3) and (b3), DNA fragment D may be identical to DNA fragment A.
If DNA fragment D and DNA fragment A both contain a DNA fragment I, the upstream primer or the downstream primer may be all or part of the nucleotide sequence of DNA fragment I; in that case the upstream and downstream primers may be identical, i.e., a single primer is used for PCR amplification.
In any of the above methods, sequencing the amplified cDNA in steps (a5) and (b5) may be performed as follows: amplified cDNA larger than 200 bp is mixed with amplified cDNA larger than 2 kb, an SMRTbell template library is constructed with the SMRTbell Template Prep Kit, and the SMRTbell template library is sequenced on the PacBio platform.
The RNAs represented in the sequencing library of RNA with a poly(A) tail in a sample to be detected, or in the sequencing library of RNA with a poly(A) tail in a single cell, constructed by any of the above methods all contain complete poly(A) tail information.
The RNAs represented in the sequencing library of RNA with a poly(A) tail in a sample to be detected, or in the sequencing library of RNA with a poly(A) tail in a single cell, constructed by any of the above methods all contain complete RNA information.
In the above, the nucleotide sequence of the primer may be 5'-AAGCAGTGGTATCACGCAGAGTACTAGCACTCdUTTTTTTTTdUTTTTTTTTTT-3'. The nucleotide sequence of DNA fragment A or DNA fragment D may be 5'-AAGCAGTGGTATCAACGCAGAGTAC-3'. The nucleotide sequence of DNA fragment B may be 5'-TACTAGAGTAGCACTC-3'. The nucleotide sequence of DNA fragment C may be 5'-dUTTTTTTTTdUTTTTTTTTTT-3'. The nucleotide sequence of the RT primer may be 5'-AAGCAGTGGTATCAACGCAGAGTAC-3'. The nucleotide sequences of both the upstream and the downstream primers may be 5'-AAGCAGTGGTATCAACGCAGAGT-3'. The nucleotide sequence of nucleic acid fragment D may be 5'-ATrGrGG*-3' (rG represents a guanine ribonucleotide; G* indicates a guanine deoxyribonucleotide modified with a locked nucleic acid). The nucleotide sequence of segment 1 may be 5'-AT-3'. The nucleotide sequence of segment 2 may be 5'-rGrGG*-3'. The nucleotide sequence of the TSO may be 5'-AAGCAGTGGTATCAACGCAGAGATRACRGGG*-3'.
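The relationships the description states between these example oligonucleotides ("all or part of" a fragment, and the composition of segment 2) can be checked mechanically. The following Python sketch is illustrative only. The helper names are hypothetical; the minimum N defaults to 3 so that the quoted example segment 2 passes, and placing the locked-nucleic-acid G at the 3' end is an assumption taken from that example rather than a stated requirement.

```python
from typing import List

FRAG_A = "AAGCAGTGGTATCAACGCAGAGTAC"   # example DNA fragment A
FRAG_D = "AAGCAGTGGTATCAACGCAGAGTAC"   # example DNA fragment D (same sequence as A)
RT_PRIMER = "AAGCAGTGGTATCAACGCAGAGTAC"
PCR_PRIMER = "AAGCAGTGGTATCAACGCAGAGT"  # used as both upstream and downstream primer
SEGMENT_1 = ["A", "T"]
SEGMENT_2 = ["rG", "rG", "G*"]          # rG = guanine ribonucleotide, G* = LNA-modified G


def is_all_or_part(oligo: str, fragment: str) -> bool:
    """True if the oligo equals the fragment or is a contiguous part of it."""
    return len(oligo) > 0 and oligo in fragment


def segment2_ok(tokens: List[str], n_min: int = 3) -> bool:
    """N rG residues, or (N-1) rG followed by one LNA-modified G (assumed 3'-terminal)."""
    if len(tokens) < n_min:
        return False
    body, last = tokens[:-1], tokens[-1]
    return all(t == "rG" for t in body) and last in ("rG", "G*")


checks = {
    "RT primer is part of fragment A": is_all_or_part(RT_PRIMER, FRAG_A),
    "PCR primer is part of fragment A": is_all_or_part(PCR_PRIMER, FRAG_A),
    "PCR primer is part of fragment D": is_all_or_part(PCR_PRIMER, FRAG_D),
    "segment 2 is well formed": segment2_ok(SEGMENT_2),
}
for name, ok in checks.items():
    print(name, ok)
```

Because the example fragments A and D share the same sequence, one primer satisfies both the upstream and the downstream requirement, which corresponds to the single-primer PCR case described above.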
The present invention also protects any one of S1) to S6):
S1) use of any of the above methods for constructing a sequencing library of RNA with a poly (A) tail in a sample to be tested, in analyzing the sequence of RNA with a poly (A) tail in the sample to be tested.
S2) use of any of the above methods for constructing a sequencing library of RNA with a poly (A) tail in a sample to be tested, in analyzing the poly (A) tail sequence of RNA with a poly (A) tail in the sample to be tested.
S3) use of the method for constructing the sequencing library of RNA having a poly (A) tail in any one of the single cells described above in the analysis of the sequence of RNA having a poly (A) tail in a single cell.
S4) use of the method for constructing the sequencing library of RNA having a poly (A) tail in any one of the single cells described above in the analysis of the poly (A) tail sequence of RNA having a poly (A) tail in a single cell.
S5) use of any of the methods described above for analyzing the sequence of RNA having a poly (A) tail.
S6) use of any of the methods described above for analyzing a poly (A) tail sequence of an RNA having a poly (A) tail.
The invention also protects a kit, which may comprise at least one of: any of the above guide primers, any of the above RT primers, any of the above TSOs and any of the above PCR primers.
The kit may specifically comprise any of the above guide primers, any of the above RT primers, any of the above TSOs and any of the above PCR primers.
The invention also protects the use of any of the above kits for at least one of t1) to t8):
t1) analyzing the sequence of RNA with poly (A) tail in the sample to be detected;
t2) analyzing the poly (A) tail sequence of RNA with poly (A) tail in the sample to be tested;
t3) analyzing the sequence of RNA with a poly (A) tail in single cells;
t4) analyzing the poly (A) tail sequence of RNA having a poly (A) tail in single cells;
t5) analyzing the sequence of RNA having a poly (A) tail;
t6) analyzing the poly (A) tail sequence of RNA having a poly (A) tail;
t7) constructing a sequencing library of RNA with poly (A) tail in a sample to be tested;
t8) construction of a sequencing library of RNA with a poly (A) tail in a single cell.
Any of the above test samples may be a biological cell or a biological tissue.
The total RNA of any one of the above test samples may be total RNA of biological cells or total RNA of biological tissues. The total RNA may or may not be purified (i.e., contain small amounts of impurities such as cells, tissues, extracts, etc.).
Any of the single cells described above may be biological single cells. The single cells of the organism may or may not be purified (i.e., contain small amounts of impurities such as centrates, mitochondria, etc.).
Any of the above organisms may be an animal, a plant or a microorganism. The animal may be a rodent, human, pig, cow, rabbit, sheep or the like. The rodent may be a mouse or a rat. The mouse may be a CD1 (ICR) mouse. The pig may be a white pig.
Any of the above cells may be egg cells, cancer cells, leukocytes or plant (common) cells, and the like. The cancer cell may specifically be a cervical cancer cell. The egg cell may be a GV egg cell.
Any of the above tissues may be liver tissue, kidney tissue, spleen tissue or plant (common) tissue, etc.
In the embodiments of the present invention, the sample to be tested may specifically be GV egg cells of a CD1 (ICR) mouse, Hela cells or porcine liver tissue. The single cell may be a single GV ovum of a CD1 (ICR) mouse.
The present inventors have developed a new sequencing method, named PAIso-seq (poly (A) inclusive RNA isoform sequencing), which accurately and sensitively reads the full-length poly (A) information carried by different RNA isoforms across the transcriptome. The steps of the PAIso-seq method are as follows: first, a linker is added after the poly (A) tail to preserve the poly (A) sequence; then full-length cDNA is amplified by template exchange, retaining the complete poly (A) tail; finally, third-generation sequencing is performed on the PacBio platform. The template-dependent end extension mechanism preserves the integrity of the poly (A) sequence, and the linker sequence provides a primer binding site for reverse transcription and PCR amplification. Because end extension primed by oligo (dT) is template-dependent, it occurs only on RNA containing poly (A), which removes the need for a subsequent poly (A)+ RNA enrichment step. The use of template exchange makes the sensitivity of the PAISo-seq method comparable to that of single-cell RNA-seq, the most sensitive technique to date. The PacBio sequencing platform can also accurately resolve long homopolymeric sequences. In addition, the circular consensus sequencing strategy of this platform reads a single cDNA molecule multiple times, yielding highly accurate consensus base calls. Using the PAISo-seq method, the inventors analyzed poly (A) tails transcriptome-wide, including quantitative tail length and precise non-A modification analysis, in mouse GV egg cells, single mouse GV eggs, Hela cells and porcine liver tissue, respectively.
The results indicate that, in addition to U and G modifications at the 3' end of many mRNAs, U, G or C non-A modifications also occur inside the poly (A) tail; different RNA isoforms can be transcribed from the same gene, and specific isoforms carry different poly (A) tails. The PAISo-seq method also allows precise analysis of the base composition of the poly (A) tail. The PAISo-seq data show that many non-A residues are widely distributed within the body of the poly (A) tail, rather than only at its 3' end as previously thought. Thus, the PAISo-seq method enables fine analysis of poly (A) tails from different RNA isoforms and suggests that the base composition of poly (A) tails is far more complex than previously thought, and that complex mechanisms of poly (A) tail modification and regulation remain to be explored. These results provide an accurate and sensitive new tool for the study of the poly (A) tail and open a new avenue for functional and regulatory studies of the poly (A) tail. The PAISo-seq method is very sensitive and can be used for transcriptome-wide RNA poly (A) analysis of a single cell.
Drawings
FIG. 1 is a schematic flow chart of the construction of a PAISo-seq sequencing library for a sample to be tested.
FIG. 2 is a diagram showing the analysis of poly (A) tail length of Dnmt1 gene, Cnot7 gene, Btg4 gene and Plat gene in GV egg cells and individual GV eggs.
FIG. 3 is a diagram of the analysis of poly (A) information in transcripts. GV rep.1 is GV egg sample replicate 1, GV rep.2 is GV egg sample replicate 2, and SCGV com. is the mixed sample of single GV eggs; the blue line is the linear regression line, the light grey area indicates the regression confidence interval, and the number of transcripts with poly (A) tails of the corresponding length is indicated.
FIG. 4 shows the poly (A) tail length distribution, the non-A modification frequency and the Spearman correlation analysis for all transcripts in the data from single GV eggs, GV egg cells and the mixed sample of single GV eggs. GV rep.2 is GV egg sample replicate 2, and SCGV com. is the mixed sample of single GV eggs; the blue line is the linear regression line, and the light grey area indicates the regression confidence interval.
FIG. 5 shows the poly (A) tail length distribution and non-A modification frequency of transcripts in Hela cells.
FIG. 6 shows poly (A) tail length distribution and non-A modification frequency of transcripts in porcine liver tissue.
Detailed Description
The following examples are given to facilitate a better understanding of the invention, but do not limit the invention.
The experimental procedures in the following examples are conventional unless otherwise specified.
The test materials used in the following examples were purchased from a conventional biochemical reagent store unless otherwise specified.
In the quantitative tests in the following examples, three replicates were set up and the results were averaged.
Example 1 construction of a PAISo-seq sequencing library for samples to be tested
Because it cannot resolve long homopolymeric sequences, the current Illumina platform is not a good tool for analyzing poly (A) tails. The TAIL-seq and PAL-seq techniques can obtain the length information of poly (T) by using a special poly (T) length analysis algorithm or a special sequencing method, but neither method can identify non-A modifications located outside the 3' terminal region of the poly (A) tail. In addition, the TAIL-seq and PAL-seq methods require milligram-scale RNA samples, which limits their use with precious biopsy samples and patient samples. The recently developed PacBio third-generation sequencing technology can identify and analyze the base composition of long homopolymeric sequences by reading single molecules in real time. In addition, the use of circularized sequencing templates in the library allows a single template to be read multiple times to obtain CCS (Circular Consensus Sequence) information, which greatly improves the accuracy of sequence analysis. Thus, the PacBio third-generation sequencing platform may be the best choice for resolving poly (A) tail length and base composition in RNA.
Through a large number of experiments, the inventor of the invention accurately analyzes the full-length RNA sequences and poly (A) tail sequences of a plurality of target genes in a sample to be detected by constructing a PAISo-seq sequencing library of the sample to be detected. The method comprises the following specific steps:
1. end extension
Take the total RNA of the sample to be tested, add dNTPs and Klenow fragment (exo-, NEB), and mix well; then carry out the end extension reaction using the primer.
The primer comprises, in order from the 5' end to the 3' end, a DNA fragment A, a DNA fragment B and a DNA fragment C; the DNA fragment A is an arbitrary sequence (15-40 bp in length) used for PCR amplification; the DNA fragment B is a Barcode sequence (8-30 bp in length) used to distinguish different PAISo-seq sequencing libraries; the DNA fragment C consists of 3-24 nucleotides, each of which is a uracil deoxyribonucleotide (denoted dU) or T; at least one of the 3 nucleotides at the 5' end of DNA fragment C is dU.
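As an illustrative aid, the design rules stated above for the end-extension primer can be checked programmatically. The following is a minimal Python sketch and not part of the described method; the function names `tokenize` and `check_guide_primer` are hypothetical, and oligos are assumed to be written as plain strings in which `dU` denotes one nucleotide.

```python
import re

# One nucleotide is either the two-character token "dU" or a single DNA base.
TOKEN = re.compile(r"dU|[ACGT]")


def tokenize(seq: str) -> list[str]:
    """Split an oligo string into nucleotide tokens, treating 'dU' as one nucleotide."""
    tokens = TOKEN.findall(seq)
    if "".join(tokens) != seq:
        raise ValueError(f"unexpected characters in {seq!r}")
    return tokens


def check_guide_primer(frag_a: str, frag_b: str, frag_c: str) -> list[str]:
    """Return the list of violated design rules (empty if the design passes)."""
    a, b, c = tokenize(frag_a), tokenize(frag_b), tokenize(frag_c)
    problems = []
    if not 15 <= len(a) <= 40:
        problems.append("fragment A must be 15-40 nt")
    if not 8 <= len(b) <= 30:
        problems.append("fragment B must be 8-30 nt")
    if not 3 <= len(c) <= 24:
        problems.append("fragment C must be 3-24 nt")
    if any(t not in ("dU", "T") for t in c):
        problems.append("fragment C may contain only dU or T")
    if "dU" not in c[:3]:
        problems.append("one of the three 5'-most nucleotides of fragment C must be dU")
    return problems


if __name__ == "__main__":
    frag_a = "AAGCAGTGGTATCAACGCAGAGTAC"
    frag_b = "TACTAGAGTAGCACTC"
    frag_c = "dU" + "T" * 8 + "dU" + "T" * 10
    print(check_guide_primer(frag_a, frag_b, frag_c))
```

Returning a list of violations rather than a single boolean makes it easier to see which rule a candidate barcode or T stretch breaks.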
The terminal extension reaction conditions were: incubate at 37 ℃ for 1 h.
2. USER digestion and RNA purification
(1) The USER enzyme (NEB) was added to the reaction system completing step 1 and incubated at 37 ℃ for 1 h.
The USER enzyme recognizes and cleaves at the dU residues in the guide primer, thereby cutting the guide primer and preventing it from acting as a reverse transcription primer.
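The effect of this digestion on the guide primer can be pictured with a short Python sketch. It is an idealized model, assuming that the oligo is cleaved at every dU and that the dU nucleotide itself is lost; the function name `user_digest` is hypothetical, and the fragment sequences are the example sequences given in this description.

```python
def user_digest(oligo: str) -> list[str]:
    """Idealized USER digestion: cut at every dU, drop the dU, keep non-empty pieces."""
    return [piece for piece in oligo.split("dU") if piece]


if __name__ == "__main__":
    frag_a = "AAGCAGTGGTATCAACGCAGAGTAC"
    frag_b = "TACTAGAGTAGCACTC"
    frag_c = "dU" + "T" * 8 + "dU" + "T" * 10
    for piece in user_digest(frag_a + frag_b + frag_c):
        print(len(piece), piece)
```

In this model the T stretches are released as short pieces separated from fragments A and B, so no intact oligo (dT)-containing primer remains.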
(2) Recovering nucleic acid fragments with length of more than 200 nt.
The recovery method is as follows: add 50 μL of RNA Binding Buffer (a component of the RNA Clean & Concentrator-5 kit) and 50 μL of ethanol to the reaction system from step (1) and mix well to obtain a mixed solution; transfer the mixed solution to a Zymo-Spin IC column and centrifuge; wash the column using the RNA Clean & Concentrator-5 kit (Zymo Research) and centrifuge; add 6-8 μL of nuclease-free water to the column matrix for elution and centrifuge to obtain a nucleic acid solution; the nucleic acid fragments in the nucleic acid solution have all undergone end extension and are longer than 200 nt.
3. Reverse transcription and terminal C addition
Take the nucleic acid solution obtained in step 2 and carry out reverse transcription with the RT primer. Owing to its terminal transferase activity, the reverse transcriptase adds C residues to the 3' end of the reverse transcription product (the end corresponding to the 5' end of the RNA template), yielding cDNA with a C tag.
The RT primer is all or part of the nucleotide sequence of the DNA fragment A.
4. Template exchange
Take the reaction system from step 3 and perform template exchange using the TSO to obtain cDNA.
The TSO comprises, in order from the 5' end to the 3' end, a DNA fragment D and a nucleic acid fragment D; the DNA fragment D is an arbitrary sequence (15-40 bp in length); the nucleic acid fragment D consists of segment 1 and segment 2; segment 1 is an arbitrary DNA sequence (0-10 bp in length); segment 2 consists of N rG, or of (N-1) rG and 1 G*, where N is a natural number of 3 or more; rG represents a guanine ribonucleotide and G* represents a locked nucleic acid-modified guanine deoxyribonucleotide.
The DNA fragment D can be a DNA fragment A.
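The composition rules for the TSO can likewise be expressed as a small check. The Python sketch below is illustrative only; the names `segment2_ok` and `build_tso` are hypothetical, `rG` and `G*` are assumed to be written as literal tokens, and the position of `G*` within segment 2 is not constrained because the text above does not specify it.

```python
import re

# Tokens allowed in segment 2: a guanine ribonucleotide or an LNA-modified G.
SEG2_TOKEN = re.compile(r"rG|G\*")


def segment2_ok(segment2: str) -> bool:
    """True if segment 2 is N rG, or (N-1) rG plus one G*, with N >= 3."""
    tokens = SEG2_TOKEN.findall(segment2)
    if "".join(tokens) != segment2:
        return False  # something other than rG / G* is present
    return len(tokens) >= 3 and tokens.count("G*") <= 1


def build_tso(frag_d: str, segment1: str, segment2: str) -> str:
    """Concatenate DNA fragment D, segment 1 and segment 2 after checking the length rules."""
    if not 15 <= len(frag_d) <= 40:
        raise ValueError("DNA fragment D must be 15-40 nt")
    if len(segment1) > 10:
        raise ValueError("segment 1 must be 0-10 nt")
    if not segment2_ok(segment2):
        raise ValueError("segment 2 must be N rG, or (N-1) rG and one G*, with N >= 3")
    return frag_d + segment1 + segment2


if __name__ == "__main__":
    print(build_tso("AAGCAGTGGTATCAACGCAGAGTAC", "AT", "rGrGG*"))
```

With the example fragment D, segment 1 and segment 2 sequences given in this description, the function simply returns their concatenation.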
5. Amplification of cDNA
Perform PCR amplification using the cDNA obtained in step 4 as the template, with an upstream primer and a downstream primer, to obtain amplified cDNA.
The upstream primer is all or part of the nucleotide sequence of the DNA fragment D.
The downstream primer is all or part of the nucleotide sequence of the DNA fragment A.
If the DNA fragment A and the DNA fragment D both contain the DNA fragment I, and the upstream primer and the downstream primer are all or part of the nucleotide sequence of the DNA fragment I, the upstream primer and the downstream primer can be the same, i.e., a single primer is used for PCR amplification.
6. Ligation of cyclized linkers
The amplified cDNA obtained in step 5 is subjected to length selection by using Pure PB beads, specifically, amplified cDNA larger than 200bp is selected by using 1 × beads, and amplified cDNA larger than 2kb is selected by using 0.4 × beads. The amplified cDNA of more than 200bp and the amplified cDNA of more than 2kb were mixed equimolar, and then an SMRTbell Template library was constructed using an SMRTbell Template Prep kit (PacBio).
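Equimolar mixing of the two size-selected fractions requires converting mass concentration to molar concentration. The Python sketch below shows one way to do this arithmetic; it assumes the common approximation of about 660 g/mol per base pair of double-stranded DNA, and the concentrations and mean fragment lengths in the usage example are hypothetical values, not measurements from this example.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation)


def pmol_per_ul(conc_ng_per_ul: float, mean_len_bp: float) -> float:
    """Convert a dsDNA concentration in ng/uL to pmol/uL for a given mean length."""
    return conc_ng_per_ul * 1000.0 / (mean_len_bp * AVG_BP_MASS)


def equimolar_volumes(pools: dict[str, tuple[float, float]], pmol_each: float) -> dict[str, float]:
    """Volume (uL) of each pool that contains `pmol_each` picomoles.

    `pools` maps a pool name to (concentration in ng/uL, mean length in bp).
    """
    return {
        name: pmol_each / pmol_per_ul(conc, length)
        for name, (conc, length) in pools.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements for the two size-selected fractions.
    pools = {">200 bp fraction": (66.0, 1000.0), ">2 kb fraction": (66.0, 2000.0)}
    for name, vol in equimolar_volumes(pools, pmol_each=0.5).items():
        print(f"{name}: {vol:.2f} uL")
```

At equal mass concentration, the fraction with longer fragments contains fewer molecules per microlitre, so a proportionally larger volume of it is needed.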
7. PacBio sequencing
Sequence the SMRTbell Template library on the PacBio platform to obtain the PAISo-seq sequencing library of the sample to be tested.
According to the sequencing result, the full-length RNA sequence of the target gene in the sample to be tested and the poly (A) tail sequence thereof can be accurately analyzed.
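To make the read structure concrete, the following Python sketch extracts a poly (A) tail and its non-A residues from a single consensus read. It is a simplified illustration and not the inventors' analysis pipeline. It assumes that the read is in the sense orientation, that the tail is immediately followed by the reverse complement of fragments A and B (the sequence copied from the primer during end extension), that the adapter matches exactly, and that non-A residues inside the tail occur singly between runs of A. The names `revcomp` and `extract_tail` are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Runs of A, optionally separated by single non-A residues, ending at the adapter.
TAIL = re.compile(r"(?:A+[CGT])*A+$")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_tail(read: str, frag_a: str, barcode: str):
    """Return (tail, [(position, base), ...] for non-A residues), or None if no adapter."""
    adapter = revcomp(frag_a + barcode)  # equals revcomp(barcode) + revcomp(frag_a)
    idx = read.find(adapter)
    if idx < 0:
        return None
    match = TAIL.search(read[:idx])
    tail = match.group() if match else ""
    non_a = [(i, base) for i, base in enumerate(tail) if base != "A"]
    return tail, non_a


if __name__ == "__main__":
    frag_a = "AAGCAGTGGTATCAACGCAGAGTAC"
    barcode = "TACTAGAGTAGCACTC"
    read = "GCTTCG" + "AAAAAGAAAATAAAA" + revcomp(frag_a + barcode)
    print(extract_tail(read, frag_a, barcode))
```

Real reads would additionally need orientation handling and error-tolerant adapter matching, and this simple pattern can absorb A-rich sequence from the 3' UTR, so the tail boundary it reports is only a heuristic.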
A schematic flow chart of the construction of the PAISo-seq sequencing library of a sample to be tested is shown in FIG. 1 (DNA fragment D is identical to DNA fragment A).
If the sample to be tested is a single cell, "taking total RNA of the sample to be tested" in step 1 is replaced with "taking the single cell, adding 19 μL of an aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor, incubating at 80 ℃ for 5 min, and lysing to obtain a cell lysate"; the other steps are unchanged, giving the PAISo-seq sequencing library of the single cell.
Example 2, use of the PAISo-seq sequencing library of a sample to be tested constructed in Example 1 to analyze the poly (A) tail length of transcripts in mouse GV egg cells and single mouse GV eggs
1. Pregnant mare serum gonadotropin (ProSpec) was injected intraperitoneally into 7-8-week-old CD1 (ICR) mice (Beijing Vital River Laboratory Animal Technology Co., Ltd.), and ova were then obtained by ovary puncture (30 gauge); the ova were washed once with M2 medium (Sigma) and 3 times with PBS containing 0.1% (v/v) BSA (PBSA) to obtain GV ova.
2. Total RNA was extracted from the GV egg cells using the Direct-zol RNA MicroPrep kit (Zymo Research) to obtain total RNA of GV egg cells. The specific steps are as follows:
(1) add 500 μL of TRIzol (Ambion) to the GV egg cells, lyse, and mix thoroughly;
(2) add 500 μL of ethanol to the system from step (1) and mix thoroughly;
(3) transfer the system from step (2) to a Zymo-Spin IC column and centrifuge; then wash the column and centrifuge;
(4) after step (3), add 10-30 μL of nuclease-free water to the column matrix for elution, and centrifuge to obtain total RNA of GV egg cells.
3. The "total RNA of test sample" in example 1 was replaced with total RNA of GV egg cells, and the other steps were not changed, to obtain a PAISo-seq sequencing library of GV egg cells. This experiment was repeated twice.
4. In step 1 of Example 1, "taking total RNA of a sample to be tested" was replaced with "taking a single GV ovum, adding 19 μL of an aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor, incubating at 80 ℃ for 5 min, and lysing to obtain a cell lysate"; the other operations in steps 1 and 2 were unchanged, giving a single-cell nucleic acid solution. The nucleic acid solutions of 15 single cells (obtained from 15 single GV ova, respectively) were mixed to obtain a mixed nucleic acid solution (containing the same mass of nucleic acid from each single cell). The "nucleic acid solution" in step 3 of Example 1 was then replaced with the mixed nucleic acid solution, and steps 3 to 7 were carried out unchanged, giving the PAISo-seq sequencing library of the mixed sample of single GV eggs. The experiment was repeated once.
5. In step 1 of Example 1, "taking total RNA of a sample to be tested" was replaced with "taking a single GV ovum, adding 19 μL of an aqueous solution containing 0.2% (v/v) Triton X-100 and 1 μL of RNase inhibitor, incubating at 80 ℃ for 5 min, and lysing to obtain a cell lysate"; the other steps were unchanged, giving the PAISo-seq sequencing library of a single GV ovum. A total of 15 GV eggs (designated C1-C15) were collected and, accordingly, PAISo-seq sequencing libraries of 15 GV eggs were obtained.
In step 3, step 4 and step 5, the nucleotide sequences of the primers are as follows: guide primer: (the box is a Barcode sequence); RT primer: 5'-AAGCAGTGGTATCAACGCAGAGTAC-3'; TSO: 5'-AAGCAGTGGTATCAACGCAGAGTACATrGrGG*-3'. Because DNA fragment D is the same as DNA fragment A, the upstream and downstream primers for PCR amplification are also the same, i.e., a single primer is used for PCR amplification; primer for PCR amplification: 5'-AAGCAGTGGTATCAACGCAGAGT-3'.
6. Based on the sequencing results of step 3 and step 4, the poly (A) tail lengths (expressed as medians) of the Dnmt1, Cnot7, Btg4 and Plat genes in the GV egg cells and in the mixed sample of single GV eggs were analyzed, together with the poly (A) information in the captured transcripts.
The results of the poly (A) tail length analysis are shown in the left panel of FIG. 2 (GV rep.1 is GV egg sample replicate 1, GV rep.2 is GV egg sample replicate 2, SCGV com. is the mixed sample of single GV eggs; from top to bottom: Dnmt1, Cnot7, Btg4 and Plat genes). The results show that the poly (A) tail of the Dnmt1 gene is longer, while the poly (A) tails of the Cnot7, Btg4 and Plat genes are shorter. Transcript numbers are Spearman-correlated among the 3 replicates (A in FIG. 3).
Poly (A) tail lengths are Spearman-correlated among the 3 replicates (B in FIG. 3).
7. Based on the sequencing results of step 3, step 4 and step 5, the poly (A) tail length distribution and non-A modification frequency of all transcripts were analyzed in the data from single GV eggs, GV egg cells and the mixed sample of single GV eggs, and the Spearman correlation of poly (A) tail length was analyzed between C15 and C4, between C4 and GV rep.2, and between C15 and GV rep.2.
The results of the poly (A) tail length distribution portion of all transcripts are shown in FIG. 4A (the numbers below and in the red dots are the median poly (A) tail lengths).
The results of the non-A modified frequency part are shown in FIG. 4B.
The poly (A) tail lengths of C15 and C4 are Spearman-correlated (top panel of C in FIG. 4).
The poly (A) tail lengths of C4 and GV rep.2 are Spearman-correlated (middle panel of C in FIG. 4).
The poly (A) tail lengths of C15 and GV rep.2 are Spearman-correlated (lower panel of C in FIG. 4).
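For readers who wish to reproduce this kind of comparison, the Python sketch below computes per-gene median tail lengths and a Spearman rank correlation between two samples using only the standard library. It is a generic illustration, not the inventors' analysis code; the function names are hypothetical and the numbers in the usage example are invented for demonstration.

```python
from statistics import mean, median


def median_tail_by_gene(tails: dict[str, list[int]]) -> dict[str, float]:
    """Median poly(A) tail length per gene from per-read tail lengths."""
    return {gene: median(lengths) for gene, lengths in tails.items()}


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented per-gene median tail lengths for two samples (same gene order).
    sample_1 = [82.0, 35.0, 41.0, 28.0]
    sample_2 = [79.0, 38.0, 40.0, 30.0]
    print(round(spearman(sample_1, sample_2), 3))
```

Rank-based correlation is used because it depends only on the ordering of tail lengths between samples and is therefore insensitive to a few very long tails.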
8. Poly (A) tail lengths of the Dnmt1, Cnot7, Btg4 and Plat genes in individual GV ova were determined by PAT combined with Fragment Analyzer capillary electrophoresis (described in Mishima, Yuichiro; Tomari, Yukihide. Codon Usage and 3' UTR Length Determine Maternal mRNA Stability in Zebrafish. DOI: 10.1016/j.molcel.2016.02.027). The experiment was repeated three times and the mean value was taken.
The detection results are shown in the right panel of FIG. 2. The results show that the poly (A) tail of the Dnmt1 gene is longer, while the poly (A) tails of the Cnot7, Btg4 and Plat genes are shorter, which is fully consistent with the results in the left panel of FIG. 2.
The above results indicate that the PAISo-seq sequencing library of GV eggs can be used to analyze the poly (A) tail length of transcripts in GV eggs, and that the PAISo-seq sequencing library of single GV eggs, or of a mixed sample of single GV eggs, can be used to analyze the poly (A) tail length of transcripts in single GV eggs.
Example 3, use of the PAISo-seq sequencing library of a sample to be tested constructed in Example 1 to analyze the poly (A) tail length and non-A modification frequency of transcripts in Hela cells
1. Total RNA was extracted from Hela cells (ATCC) using the Direct-zol RNA MicroPrep kit to obtain total RNA of Hela cells. The specific steps are as follows:
(1) add 500 μL of TRIzol to the Hela cells, lyse, and mix thoroughly;
(2) add 500 μL of ethanol to the system from step (1) and mix thoroughly;
(3) transfer the system from step (2) to a Zymo-Spin IC column and centrifuge; then wash the column and centrifuge;
(4) after step (3), add 10-30 μL of nuclease-free water to the column matrix for elution, and centrifuge to obtain total RNA of Hela cells.
2. The PAISo-seq sequencing library of Hela cells was obtained by replacing "total RNA of test sample" in example 1 with total RNA of Hela cells, and the other steps were not changed. This experiment was repeated twice.
The nucleotide sequences of the primer, RT primer, TSO, and the primer for PCR amplification were the same as those in example 2.
3. According to the sequencing results of step 2, the transcript poly (A) tail length distribution and the frequency of non-A modifications in Hela cells were analyzed.
The results of the distribution of the transcript poly (A) tail lengths in Hela cells are shown in the left panel of FIG. 5.
The results for the frequency of non-A modifications in Hela cells are shown in the right panel of FIG. 5.
The results indicate that the PAISo-seq sequencing library of Hela cells can analyze transcripts in Hela cells for poly (A) tail length and non-A modification frequency.
Example 4, use of the PAISo-seq sequencing library of a sample to be tested constructed in Example 1 to analyze the poly (A) tail length and non-A modification frequency of transcripts in porcine liver tissue
1. Total RNA was extracted from porcine liver tissue (the pig breed is a white pig) using the Direct-zol RNA MicroPrep kit to obtain total RNA of porcine liver tissue. The specific steps are as follows:
(1) add 500 μL of TRIzol to the porcine liver tissue, lyse, and mix thoroughly;
(2) add 500 μL of ethanol to the system from step (1) and mix thoroughly;
(3) transfer the system from step (2) to a Zymo-Spin IC column and centrifuge; then wash the column and centrifuge;
(4) after step (3), add 10-30 μL of nuclease-free water to the column matrix for elution, and centrifuge to obtain total RNA of porcine liver tissue.
2. The PAISo-seq sequencing library of the porcine liver tissue was obtained by replacing the "total RNA of the sample to be tested" in example 1 with the total RNA of the porcine liver tissue, and the other steps were not changed. This experiment was repeated twice.
The nucleotide sequences of the primer, RT primer, TSO, and the primer for PCR amplification were the same as those in example 2.
3. According to the sequencing result of step 2, the transcript poly (A) tail length distribution and the non-A modification frequency in the pig liver tissue were analyzed.
The results of the transcript poly (A) tail length distribution in porcine liver tissue are shown in the left panel of FIG. 6.
The results for the non-A modification frequency in porcine liver tissue are shown in the right panel of FIG. 6.
The results indicate that the PAISo-seq sequencing library of porcine liver tissue can analyze transcripts in porcine liver tissue for poly (A) tail length and non-A modification frequency.
Example 5, accuracy of the PAISo-seq sequencing library of a sample to be tested constructed in Example 1 in detecting poly (A) tail length
1. Reference spike-in sequences were synthesized by Takara; spike-ins carrying poly (A) tails of different lengths (10, 30, 50, 70 and 100 A's, respectively) were distinguished by barcodes placed before the start codon.
2. After step 1, PCR amplification was carried out using the mCherry vectors containing the spike-ins with poly (A) tails of different lengths as templates to obtain double-stranded spike-in DNA of different lengths, and the PCR amplification products were then gel-purified according to length.
3. Taking the PCR amplification product, and adopting an SMRTbell Template Prep kit to construct an SMRTbell Template library.
4. The SMRTbell Template library was sequenced using the PacBio platform.
According to the sequencing results, the PAISo-seq analysis data of Spike-in shows that the median length of poly (A) tail is 10, 28, 48, 67 and 97, respectively. As can be seen, the PAISo-seq sequencing library of the test sample constructed in example 1 allows an accurate quantitative description of the length of the poly (A) tail.
The above results show that the PAISo-seq sequencing library of the test sample constructed in example 1 accurately reflects the length of the poly (A) tail.
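The agreement between designed and measured spike-in tail lengths can be summarized with simple arithmetic. The Python sketch below uses only the designed lengths (10, 30, 50, 70 and 100) and the measured medians (10, 28, 48, 67 and 97) reported above; the function names are hypothetical.

```python
EXPECTED = [10, 30, 50, 70, 100]  # designed spike-in tail lengths (nt)
MEASURED = [10, 28, 48, 67, 97]   # median tail lengths reported above (nt)


def deviations(expected: list[int], measured: list[int]) -> list[int]:
    """Measured minus expected tail length, in nucleotides."""
    return [m - e for e, m in zip(expected, measured)]


def relative_errors(expected: list[int], measured: list[int]) -> list[float]:
    """Deviation as a fraction of the expected tail length."""
    return [(m - e) / e for e, m in zip(expected, measured)]


if __name__ == "__main__":
    for e, d, r in zip(EXPECTED, deviations(EXPECTED, MEASURED),
                       relative_errors(EXPECTED, MEASURED)):
        print(f"expected {e:>3} nt: deviation {d:+d} nt ({r:+.1%})")
```

On these figures the measured median is never more than 3 nt below the designed length.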
Claims (17)
1. The construction method of the sequencing library of the RNA with the poly (A) tail in the sample to be detected sequentially comprises the following steps:
(a1) taking total RNA of a sample to be detected, carrying out end extension by using a primer and a Klenow fragment, then adding USER enzyme for full digestion, and recovering a nucleic acid fragment with the length of more than 200 nt;
the primer comprises a DNA fragment A and a DNA fragment C from the 5 'end to the 3' end in sequence;
the DNA fragment A is an arbitrary sequence of 15-40bp that cannot bind to the poly (A) tail and is used for PCR amplification;
the DNA fragment C consists of 3-24bp nucleotides, and each nucleotide is dU or T; at least one of the 3 nucleotides at the 5' end of the DNA fragment C is dU;
(a2) taking the nucleic acid fragment, and carrying out reverse transcription by using an RT primer and a reverse transcriptase with terminal transferase activity to obtain cDNA;
the RT primer is the whole nucleotide sequence of the DNA fragment A;
(a3) taking the cDNA, and performing template exchange by using TSO;
the TSO comprises a DNA fragment D and a nucleic acid fragment D from the 5 'end to the 3' end in sequence;
the DNA fragment D is any sequence of 15-40bp and is used for PCR amplification;
the nucleic acid fragment D comprises a segment 2, and the segment 2 is used for binding to the cDNA; segment 2 consists of N guanine ribonucleotides, or of (N-1) guanine ribonucleotides and 1 locked nucleic acid-modified guanine deoxyribonucleotide, wherein N is a natural number of 3 or more;
(a4) taking the cDNA which is finished in the step (a3), and carrying out PCR amplification by adopting a PCR primer to obtain amplified cDNA;
the upstream primer in the PCR primer is the whole nucleotide sequence of the DNA fragment D;
the downstream primer in the PCR primer is the whole nucleotide sequence of the DNA fragment A;
(a5) taking the amplified cDNA, and sequencing; obtaining a sequencing library of RNA with poly (A) tail in the sample to be tested.
2. The method of claim 1, wherein: in the step (a1), the primer further comprises a DNA fragment B, wherein the DNA fragment B is located at the 3 'end of the DNA fragment A and the 5' end of the DNA fragment C; the DNA fragment B is a Barcode sequence of 8-30bp and is used for distinguishing different samples to be detected.
3. The method of claim 1, wherein: in the step (a3), the nucleic acid fragment D comprises segment 1, segment 1 is located at the 5' end of segment 2; segment 1 is any sequence of 0-10 bp.
4. The method of claim 1, wherein: in the step (a3), the DNA fragment D is DNA fragment A.
5. The method of claim 1, wherein: in the step (a5), the amplified cDNA is sequenced as follows: mixing the amplified cDNA larger than 200bp with the amplified cDNA larger than 2kb, and then adopting an SMRTbell Template Prep kit to construct an SMRTbell Template library; the SMRTbell Template library was sequenced using the PacBio platform.
6. The construction method of the sequencing library of the RNA with the poly (A) tail in the single cell sequentially comprises the following steps:
(b1) cracking the single cell to obtain cell lysate; then using a primer and a Klenow fragment for terminal extension; then adding USER enzyme for full digestion, and recovering nucleic acid fragments with the length of more than 200 nt;
the primer comprises a DNA fragment A and a DNA fragment C from the 5 'end to the 3' end in sequence;
the DNA fragment A is an arbitrary sequence of 15-40bp that cannot bind to the poly (A) tail and is used for PCR amplification;
the DNA fragment C consists of 3-24bp nucleotides, and each nucleotide is dU or T; at least one of the 3 nucleotides at the 5' end of the DNA fragment C is dU;
(b2) taking the nucleic acid fragment, and carrying out reverse transcription by using an RT primer and a reverse transcriptase with terminal transferase activity to obtain cDNA;
the RT primer is the whole nucleotide sequence of the DNA fragment A;
(b3) taking the cDNA, and performing template exchange by using TSO;
the TSO comprises a DNA fragment D and a nucleic acid fragment D from the 5 'end to the 3' end in sequence;
the DNA fragment D is any sequence of 15-40bp and is used for PCR amplification;
the nucleic acid fragment D comprises a segment 2, and the segment 2 is used for binding to the cDNA; segment 2 consists of N guanine ribonucleotides, or of (N-1) guanine ribonucleotides and 1 locked nucleic acid-modified guanine deoxyribonucleotide, wherein N is a natural number of 3 or more;
(b4) taking the cDNA which is finished in the step (b3), and carrying out PCR amplification by adopting a PCR primer to obtain amplified cDNA;
the upstream primer in the PCR primer is the whole nucleotide sequence of the DNA fragment D;
the downstream primer in the PCR primer is the whole nucleotide sequence of the DNA fragment A;
(b5) taking the amplified cDNA, and sequencing; a sequencing library of RNA with a poly (A) tail in a single cell was obtained.
7. The method of claim 6, wherein: in the step (b1), the method for obtaining the cell lysate by lysing the single cells comprises the following steps: adding a nonionic surfactant and an RNase inhibitor into the single cells, and incubating to obtain a cell lysate.
8. The method of claim 6, wherein: in the step (b1), the primer further comprises a DNA fragment B, wherein the DNA fragment B is located at the 3 'end of the DNA fragment A and the 5' end of the DNA fragment C; the DNA fragment B is 8-30bp Barcode sequence and is used for distinguishing different single cells.
9. The method of claim 6, wherein: in the step (b3), the nucleic acid fragment D comprises segment 1, which is located at the 5' end of segment 2; segment 1 is any sequence of 0-10 bp.
10. The method of claim 6, wherein: in the step (b3), the DNA fragment D is DNA fragment A.
11. The method of claim 6, wherein: in the step (b5), the amplified cDNA is sequenced as follows: mixing the amplified cDNA larger than 200 bp with the amplified cDNA larger than 2 kb, and then constructing an SMRTbell Template library using an SMRTbell Template Prep kit; the SMRTbell Template library is sequenced using the PacBio platform.
12. Use of the method of any one of claims 1 to 5 for analyzing the sequence of RNA having a poly (A) tail in a sample to be tested.
13. Use of the method of any one of claims 1 to 5 for analyzing the poly (A) tail sequence of RNA having a poly (A) tail in a sample to be tested.
14. Use of the method of any one of claims 6 to 11 for analyzing the sequence of RNA having a poly (A) tail in a single cell.
15. Use of the method of any one of claims 6 to 11 for analyzing the poly (A) tail sequence of RNA having a poly (A) tail in a single cell.
16. Use of the method of any one of claims 1 to 11 for analyzing the sequence of RNA having a poly (A) tail.
17. Use of the method of any one of claims 1 to 11 for analyzing the poly (A) tail sequence of RNA having a poly (A) tail.
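As a rough illustration of what analyzing the poly (A) tail sequence of a read could involve, the sketch below splits a read into its body and its terminal run of A. It is an assumption-laden example and not the analysis procedure of the patent: it supposes that the read is given in the RNA-sense orientation with primer and adapter sequences already removed, and it treats only an uninterrupted run of A at the 3' end as the tail, so non-A residues inside a tail would shorten the reported length.

```python
# Illustrative only: a strict 3'-terminal A-run split, not the patent's method.

def split_poly_a(read: str):
    """Return (body, tail), where tail is the uninterrupted run of 'A'
    at the 3' end of the read and body is everything before it."""
    end_of_body = len(read)
    while end_of_body > 0 and read[end_of_body - 1] == "A":
        end_of_body -= 1
    return read[:end_of_body], read[end_of_body:]
```

The tail length is then `len(tail)`, and the tail string itself can be inspected separately from the transcript body.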
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910837492.2A CN110499356B (en) | 2019-09-05 | 2019-09-05 | Construction method of sequencing library of RNA (ribonucleic acid) with poly (A) tail in sample to be detected |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110499356A (en) | 2019-11-26 |
CN110499356B (en) | 2021-06-08 |
Family
ID=68591320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910837492.2A Active CN110499356B (en) | Construction method of sequencing library of RNA (ribonucleic acid) with poly (A) tail in sample to be detected | 2019-09-05 | 2019-09-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110499356B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111088250B (en) * | 2019-12-25 | 2022-03-08 | 中国科学院苏州生物医学工程技术研究所 | mRNA capture sequence, capture carrier synthesis method and high-throughput single-cell sequencing library preparation method |
CN113308514A (en) * | 2021-05-19 | 2021-08-27 | 武汉大学 | Construction method and kit for detection library of trace m6A and high-throughput detection method |
CN114582419B (en) * | 2022-01-29 | 2023-02-10 | 苏州大学 | Sliding window based gene sequence poly A tail extraction method |
CN116497105B (en) * | 2023-06-28 | 2023-09-29 | 浙江大学 | Single-cell transcriptome sequencing kit based on terminal transferase and sequencing method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104673785A (en) * | 2015-02-12 | 2015-06-03 | 上海交通大学 | Method for researching APA within strand-specific whole-genome range |
CN107502607A (en) * | 2017-06-20 | 2017-12-22 | 浙江大学 | A kind of a large amount of tissues, cell sample mRNA molecular barcode mark, library construction, the method for sequencing |
US11732257B2 (en) * | 2017-10-23 | 2023-08-22 | Massachusetts Institute Of Technology | Single cell sequencing libraries of genomic transcript regions of interest in proximity to barcodes, and genotyping of said libraries |
Also Published As
Publication number | Publication date |
---|---|
CN110499356A (en) | 2019-11-26 |
Similar Documents
Publication | Title |
---|---|
CN110499356B (en) | Construction method of sequencing library of RNA (ribonucleic acid) with poly (A) tail in sample to be detected |
Ginsberg | RNA amplification strategies for small sample populations |
CN106947827B (en) | Bighead carp gender specific molecular marker, screening method and application thereof |
US20080318801A1 (en) | Method and kit for evaluating rna quality |
CN110050067A (en) | Generate the method for the double stranded DNA through expanding and composition and kit for the method |
US20220033811A1 (en) | Method and kit for preparing complementary dna |
CN111808854B (en) | Balanced joint with molecular bar code and method for quickly constructing transcriptome library |
CN112359093B (en) | Method and kit for preparing and expressing and quantifying free miRNA library in blood |
Cairney et al. | Special symposium: in vitro plant recalcitrance transcript profiling: a tool to assess the development of conifer embryos |
CN110747514B (en) | High-throughput single-cell small RNA library construction method |
CN109971843B (en) | Sequencing method of single cell transcriptome |
CN114875118B (en) | Methods, kits and devices for determining cell lineage |
US20220002797A1 (en) | Full-length rna sequencing |
US20220396833A1 (en) | In situ readout of dna barcodes |
EP1195434A1 (en) | METHOD FOR CONSTRUCTING FULL-LENGTH cDNA LIBRARIES |
Sudre et al. | A collection of bovine cDNA probes for gene expression profiling in muscle |
CN112858693A (en) | Biomolecule detection method |
CN109385468B (en) | Kit and method for detecting strand-specific efficiency |
EP1698694A1 (en) | Method of obtaining gene tag |
Rabani | Massively parallel analysis of regulatory RNA sequences |
Shore et al. | CleanTag adapters improve small RNA next-generation sequencing library preparation by reducing adapter dimers |
Petrović et al. | The application of modern molecular techniques in animal selection |
CN116875703A (en) | Molecular marker related to calf growth and development and application thereof |
Zhang et al. | Single-Cell Analysis of Long Noncoding RNAs (lncRNAs) in Mouse Brain Cells |
CN115992204A (en) | Method for rapidly obtaining vertebrate mitochondrial genome sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||