CN111635930B - Method for extracting unknown RNA full-length sequence by high-throughput sequencing - Google Patents
- Publication number
- CN111635930B (application CN202010398919.6A)
- Authority
- CN
- China
- Prior art keywords
- rna
- sequence
- cdna
- full
- unknown
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
Abstract
The invention discloses a method for extracting the full-length sequence of an unknown RNA by high-throughput sequencing, comprising the following steps: designing gene-specific reverse transcription primers for the 5' end and the 3' end of the unknown RNA according to the sequence of the brand-new short fragment obtained by RNA-seq; synthesizing 5'-end and 3'-end first-strand cDNA separately using reverse transcriptase and the gene-specific reverse transcription primers; synthesizing 5'-end and 3'-end second-strand cDNA separately using DNA polymerase I; pooling the second-strand cDNA from both ends, fragmenting it by sonication, constructing a library, and sequencing; and finally obtaining the full-length sequence of the unknown RNA through bioinformatics analysis. The invention obtains the 5'-end and 3'-end sequences of the unknown RNA simultaneously in a single experiment, and is rapid, flexible, highly sensitive, and low in experimental cost.
Description
Technical Field
The invention relates to the field of gene detection, in particular to a method for extracting an unknown RNA full-length sequence by high-throughput sequencing.
Background
Obtaining the full-length RNA sequence is essential for the study of unknown RNAs. Many methods exist for extracting the full-length sequence of an unknown RNA. The most typical is rapid amplification of cDNA ends (RACE), developed by Michael A. Frohman et al., in which an adaptor linked to oligo(dT) is introduced at the poly(A) tail at the 3' end of the mRNA (3' RACE) or at a poly(A) tail added to the 3' end of the cDNA by terminal deoxynucleotidyl transferase (TdT) (5' RACE); the cDNA ends are then PCR-amplified with gene-specific primers and adaptor primers, cloned, and sequenced. Several RACE-based methods for amplifying cDNA ends have since been derived. For example, in one 5' RACE method a poly(dC) tail is added to the 3' end of the cDNA and deoxyinosine is incorporated into the poly(dG) linked to the anchor primer, in order to increase the stability of binding between the anchor primer and the cDNA template and the specificity of the primer; in another, an adaptor is ligated to the 5' end of the RNA before reverse transcription, and the cDNA 5' end is amplified using an adaptor primer and a gene-specific primer.
For unknown RNAs whose brand-new fragment is short or highly homologous to other sequences, it is difficult to design suitable gene-specific reverse transcription primers, so RACE and its derivative methods are not suitable for extracting the full length. RACE and its derivative methods place very stringent requirements on primers, and unsuitable primers may cause the experiment to fail. 5' RACE and its derivative methods require designing a gene-specific reverse transcription primer, introducing an adaptor containing an anchor primer at the tail of the cDNA after reverse transcription, performing PCR with a gene-specific amplification primer and the anchor primer to produce a target sequence carrying restriction enzyme sites, and then cloning and sequencing. The RACE methods in common use at present additionally require nested PCR, with nested PCR primers designed between the gene-specific amplification primer and the anchor primer. In summary, RACE and its derivative methods require several pairs of primers, and the more primers are used, the greater their influence on the experimental result.
RACE and its derivative methods obtain the 5'-end sequence and the 3'-end sequence of the RNA in separate experiments, so the experimental period is long, the operation is complex, and the experimental cost is high.
Accordingly, there is a need for further development and advancement in the art.
Disclosure of Invention
In view of the above technical problems, embodiments of the invention provide a method for extracting the full-length sequence of unknown RNA by high-throughput sequencing.
The technical scheme of the invention is as follows:
A method for extracting the full-length sequence of unknown RNA by high-throughput sequencing comprises the following steps: designing gene-specific reverse transcription primers for the 5' end and the 3' end of the unknown RNA according to the brand-new short fragment sequence obtained by RNA-seq; synthesizing 5'-end and 3'-end first-strand cDNA separately using reverse transcriptase and the gene-specific reverse transcription primers; synthesizing 5'-end and 3'-end second-strand cDNA separately using DNA polymerase I; pooling the two second-strand cDNAs, fragmenting them by sonication, constructing a library, and sequencing; and extracting the full-length sequence of the unknown RNA.
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the gene-specific reverse transcription primers are designed as follows: the 5'-end gene-specific reverse transcription primer (GSP1) is reverse-complementary to the RNA sequence of the brand-new short fragment obtained by RNA-seq, the 3'-end gene-specific reverse transcription primer (GSP2) is complementary in the same direction to the RNA sequence of the brand-new short fragment obtained by RNA-seq, and the fragments amplified from the two gene-specific reverse transcription primers share an overlapping region.
The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing further comprises total RNA extraction and removal of ribosomal RNA.
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the 5'-end and 3'-end first-strand cDNAs are synthesized separately using reverse transcriptase and the gene-specific reverse transcription primers, with the following specific steps:
(1) The following mixtures were prepared:
The mixture is incubated at 65 °C for 5 minutes and then cooled on ice for at least 1 minute.
(2) The following cDNA synthesis mixtures were prepared:
(3) The cDNA synthesis mixture (… μl) is added to the mixture of RNA and gene-specific reverse transcription primer, mixed gently, and centrifuged briefly to collect the liquid at the bottom of the tube. The reaction is incubated at 50 °C for 50 minutes, terminated by incubation at 85 °C for 5 minutes, and cooled on ice to give first-strand cDNA, which is then purified with beads.
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the 5'-end and 3'-end second-strand cDNAs are synthesized separately using DNA polymerase I, with the following specific steps:
(1) The following mixtures were prepared:
Mix well and incubate on ice for 5 minutes.
(2) The following enzymes were added:
Mix and incubate at 15 °C for 2.5 hours to obtain second-strand cDNA.
The 10X second-strand buffer consists of 500 mM Tris-HCl (pH 7.8), 50 mM MgCl2, and 10 mM DTT. The buffer is first prepared and filtered without DTT, and 0.1 M DTT (Invitrogen) is then added to reach the final composition.
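The buffer arithmetic above can be checked with a short sketch. The 10X concentrations and the 0.1 M DTT stock are taken from the text; the 1 ml batch volume and the function names are assumptions chosen only for illustration.

```python
# Stock (10X) composition as stated in the text, in mM.
SECOND_STRAND_BUFFER_10X_MM = {
    "Tris-HCl (pH 7.8)": 500.0,
    "MgCl2": 50.0,
    "DTT": 10.0,
}


def working_concentrations(stock_mm, fold=10.0):
    """Concentrations (mM) after diluting a concentrated stock `fold` times."""
    return {name: conc / fold for name, conc in stock_mm.items()}


def stock_volume_ul(stock_conc_mm, final_conc_mm, final_volume_ul):
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    return final_conc_mm * final_volume_ul / stock_conc_mm


one_x = working_concentrations(SECOND_STRAND_BUFFER_10X_MM)

# Assumed batch size: 1000 ul of 10X buffer. DTT stock is 0.1 M = 100 mM.
dtt_ul = stock_volume_ul(stock_conc_mm=100.0, final_conc_mm=10.0,
                         final_volume_ul=1000.0)

print(one_x)
print(f"0.1 M DTT to add per 1000 ul of 10X buffer: {dtt_ul} ul")
```

Because the DTT stock makes up a tenth of the final volume, the filtered DTT-free buffer would need to be prepared correspondingly more concentrated; the text does not state how this was handled.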
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the 5'-end and 3'-end second-strand cDNAs are pooled and then fragmented by sonication to a fragment size of 200 bp.
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, library construction comprises the following steps:
(1) The reaction system for end repair and tailing is as follows:
Vortex to mix, place on ice, and immediately transfer to a PCR instrument for incubation (20 °C for 30 min; 65 °C for 30 min; hold at 4 °C) to obtain the end-repair and tailing reaction product.
(2) Adapter ligation
The adapter ligation reaction solution is added to the end-repair and tailing reaction product; the reaction system is as follows:
Mix thoroughly, centrifuge briefly, and incubate at 20 °C for 15 minutes to obtain the adapter-ligated product.
(3) Post-ligation purification.
(4) Library amplification; the amplification system is as follows:
Mix well, centrifuge briefly, and run the PCR with the following program: pre-denaturation at 98 °C for 1 min; 11 cycles of denaturation at 98 °C for 15 sec, annealing at 60 °C for 30 sec, and extension at 72 °C for 30 sec; final extension at 72 °C for 5 min, to obtain the PCR product.
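The library amplification program can be written down as data, which makes the cycle structure explicit. The sketch below uses only the temperatures, times, and cycle count stated in the text; the `Step` type and function names are illustrative, and the total covers programmed hold times only, not instrument ramping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


PRE_DENATURATION = Step("pre-denaturation", 98, 60)
CYCLE = (
    Step("denaturation", 98, 15),
    Step("annealing", 60, 30),
    Step("extension", 72, 30),
)
N_CYCLES = 11
FINAL_EXTENSION = Step("final extension", 72, 300)


def expand_program():
    """Return the full list of steps in the order the thermocycler runs them."""
    steps = [PRE_DENATURATION]
    for _ in range(N_CYCLES):
        steps.extend(CYCLE)
    steps.append(FINAL_EXTENSION)
    return steps


def total_hold_seconds(steps):
    """Sum of programmed hold times (ramping between temperatures excluded)."""
    return sum(step.seconds for step in steps)


program = expand_program()
print(len(program), "steps,", total_hold_seconds(program) / 60, "min of hold time")
```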
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, E-gel electrophoresis is performed after library construction, and fragments of 200 bp-700 bp are recovered from the gel to complete library size selection.
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the library is sequenced using the HiSeq PE150 sequencing strategy.
In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the bioinformatics analysis comprises the following steps:
(1) Quality control: including library quality analysis and removal of adapter sequences.
(2) Sequence alignment: the raw sequencing data were aligned using Bowtie2 version 2.1.0 and Samtools 0.1.19-44428cd.
(3) Checking the alignment results with BLAT: the alignment results of the raw sequencing data were checked and visualized using the BLAT tool on the UCSC platform, and the full-length sequence of the unknown RNA was finally extracted.
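The alignment step can be sketched as a set of command lines. The patent names only the tools and versions, so the exact invocations below are an assumption: the file names and the `mm9` index prefix are hypothetical, and the `samtools sort` form shown is the legacy `<in.bam> <out.prefix>` syntax of the 0.1.x series (later releases changed it). The function only builds the commands and does not run them.

```python
import shlex


def build_alignment_commands(index_prefix, reads1, reads2, sample):
    """Build argv lists for paired-end alignment, BAM conversion, sort, index."""
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_prefix = f"{sample}.sorted"
    return [
        # Paired-end alignment against a prebuilt Bowtie2 index.
        ["bowtie2", "-x", index_prefix, "-1", reads1, "-2", reads2, "-S", sam],
        # SAM to BAM conversion.
        ["samtools", "view", "-bS", "-o", bam, sam],
        # Legacy (0.1.x) sort syntax: input BAM, then output prefix.
        ["samtools", "sort", bam, sorted_prefix],
        # Index the sorted BAM for visualization.
        ["samtools", "index", f"{sorted_prefix}.bam"],
    ]


# Hypothetical inputs, for illustration only.
commands = build_alignment_commands(
    "mm9", "sample_R1.clean.fastq", "sample_R2.clean.fastq", "sample"
)
for argv in commands:
    print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` once the tools and input files are in place.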
Advantageous effects
The invention provides a method for extracting an unknown RNA full-length sequence by high-throughput sequencing, which has the following advantages:
The method has higher sensitivity. The experimental result (FIG. 2) shows that the full-length sequence of the unknown RNA can ultimately be obtained even when contaminating homologous sequences interfere. This is an advantage when extracting the full-length sequence of a highly homologous unknown RNA: with RACE and its derivative methods, reverse transcription with gene-specific primers may produce many fragments from such an RNA, which raises the false-positive rate and increases the time and cost of screening for positive fragments. With the present method, even if the unknown RNA is highly homologous, the target sequence can be detected as long as it has been amplified, which greatly improves the detection sensitivity and the success rate of the experiment.
In addition, primer design is highly flexible. In theory, the method can be used as long as the gene-specific reverse transcription primers for the 5' end and the 3' end are complementary to the RNA sequence of the brand-new short fragment obtained by RNA-seq, are correctly oriented, and yield amplified fragments with an overlapping region.
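The requirement that the two amplified fragments overlap is what allows the 5'-end and 3'-end sequences to be joined into one full-length sequence. The patent does this through alignment and BLAT inspection; the sketch below is only a simplified stand-in that joins two strings at an exact suffix/prefix overlap, using made-up toy sequences.

```python
def merge_by_overlap(left, right, min_overlap=10):
    """Join `left` and `right` where a suffix of `left` equals a prefix of `right`.

    Tries the longest possible overlap first. Returns the merged sequence,
    or None if no exact overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


# Toy sequences (hypothetical, not from the patent); shared region is TAGCCATG.
five_prime_part = "ACGTACGTTAGCCATG"
three_prime_part = "TAGCCATGGGCTTAAC"

merged = merge_by_overlap(five_prime_part, three_prime_part, min_overlap=6)
print(merged)
```

Real reads contain sequencing errors, so an exact-match join like this would normally be replaced by alignment against a reference, as the text describes.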
The experiment requires only one pair of gene-specific reverse transcription primers, for the 5' end and the 3' end, plus one pair of primers for identifying the brand-new short fragment obtained by RNA-seq in the 5'-end and 3'-end second-strand cDNA. The clone sequencing step that identifies the short fragment in the second-strand cDNA is an optional quality control step and does not determine the experimental result. The method therefore uses fewer primers, which reduces the influence of primers on the experimental result.
The direction of reverse transcription depends on the direction of the primer and is independent of the direction of the RNA template. Therefore, although the 5'-end and 3'-end cDNA sequences are synthesized separately before sonication, the experimental procedure is identical apart from the primers. The 5'-end and 3'-end second-strand cDNAs are pooled at the sonication step, and extraction of the full-length sequence of the unknown RNA is then completed in a single workflow, which simplifies the experimental steps and saves time. As sequencing throughput increases and sequencing cost falls, the method can generally reduce experimental cost.
Drawings
FIG. 1 is a schematic diagram of a method for extracting an unknown RNA full-length sequence by high-throughput sequencing.
FIG. 2 is a PAGE gel electrophoresis image of the PCR products from clone sequencing identification of the brand-new short fragment obtained by RNA-seq in the 5'-end and 3'-end second-strand cDNA in an embodiment of the present invention.
FIG. 3 shows the E-gel electrophoresis of the library and the gel recovery position in an embodiment of the present invention.
FIG. 4 shows basic information on the raw sequencing data in an embodiment of the present invention.
FIG. 5 shows the quality values of all bases in the fastq files of the raw sequencing data in an embodiment of the present invention (a: quality values of all bases of reads 1; b: quality values of all bases of reads 2).
FIG. 6 shows the result of checking, with the BLAT tool on the UCSC platform, the 3'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention (underlined section).
FIG. 7 shows the result of checking, with the BLAT tool on the UCSC platform, the 5'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention (underlined section).
FIG. 8 shows the result of checking, with the BLAT tool on the UCSC platform, the full-length cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention (underlined section).
FIG. 9 is a PAGE gel electrophoresis image of the clone sequencing identification of the undetectable sequence in the full-length cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention; M is a 50 bp DNA ladder, and 1 is the PCR product of the clone sequencing identification of the undetectable sequence.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items. In addition, the technical features of the different embodiments described below may be combined with each other as long as they do not conflict.
The method is designed based on the experimental principle shown in fig. 1, and the specific experimental steps are as follows:
1. Primer design
(1) Design of the 5'-end and 3'-end gene-specific reverse transcription primers
The direction of reverse transcription depends on the direction of the primer and is independent of the direction of the RNA template. The gene-specific reverse transcription primers for the 5' and 3' ends of the unknown RNA are therefore designed as follows: the 5'-end gene-specific reverse transcription primer (GSP1) is reverse-complementary to the RNA sequence of the brand-new short fragment obtained by RNA-seq, the 3'-end gene-specific reverse transcription primer (GSP2) is complementary in the same direction to the RNA sequence of the brand-new short fragment obtained by RNA-seq, and the fragments amplified from the two reverse transcription primers share an overlapping region. Primer3 is used to design the gene-specific reverse transcription primers, and the primers themselves satisfy general primer design principles, such as GC content, secondary structure, and Tm value.
In a specific example, the brand-new short fragment obtained by RNA-seq (chr13:64,787,049-64,787,219, NCBI37/mm9) came from RNA-seq data of a mouse embryonic stem cell (mES) line established in the laboratory, in which a distal enhancer (designated E3) that interacts with the Sox2 promoter had been knocked out. Its sequence (SEQ ID NO. 1) is: CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCA
The corresponding RNA sequence of the brand-new short fragment obtained by RNA-seq (SEQ ID NO. 2) is: UGGGUCUUUUUGUUCUUCCAGCAUGAGAUUCCUUAUAGAAUUCUAAUUCCUAGAAGUGCAUUGCCUAACAGCCUGAUAGCUGGAUGGGGGGAAGCACAAAUGUCUCAGAUAGUACAUCAAGAACACAGUUCCCUGAAUUAUGGCACCAG
Accordingly, the 5'-end gene-specific reverse transcription primer (GSP1) sequence (SEQ ID NO. 3) was designed as: CTGGTGCCATAATTCAGGGA; the 3'-end gene-specific reverse transcription primer (GSP2) sequence (SEQ ID NO. 4) is: GGATCTTCACGTAACGGATTGT.
(2) Primers for clone sequencing identification of the brand-new short fragment obtained by RNA-seq in the 5'-end and 3'-end second-strand cDNA
The upstream primer sequence (SEQ ID NO. 5) is: CTGGTGCCATAATTCAGGGA; the downstream primer sequence (SEQ ID NO. 6) is: CCTAGAAGTGCATTGCCTAACA; the product size is 101 bp.
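The relationships among the listed sequences can be checked mechanically. The sketch below uses only the sequences given above (SEQ ID NO. 1 to 6); the helper function names are illustrative. It checks that SEQ ID NO. 2 is the reverse complement of SEQ ID NO. 1, that GSP1 is reverse-complementary to the 3' end of the RNA, that GSP2 is the base-by-base complement of an RNA segment read in the same direction, and that the identification primers span 101 bp.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def same_direction_complement(dna):
    """Base-by-base complement without reversing the sequence."""
    return dna.translate(COMPLEMENT)


def amplicon_size(template, forward, reverse):
    """Span from the forward primer site to the end of the reverse primer site."""
    start = template.find(forward)
    rc = reverse_complement(reverse)
    end = template.find(rc, start)
    if start < 0 or end < 0:
        return None
    return end + len(rc) - start


SEQ1_DNA = (
    "CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCC"
    "CCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGA"
    "ATCTCATGCTGGAAGAACAAAAAGACCCA"
)
SEQ2_RNA = (
    "UGGGUCUUUUUGUUCUUCCAGCAUGAGAUUCCUUAUAGAAUUCUAAUUCCUAGAAGUGCA"
    "UUGCCUAACAGCCUGAUAGCUGGAUGGGGGGAAGCACAAAUGUCUCAGAUAGUACAUCAA"
    "GAACACAGUUCCCUGAAUUAUGGCACCAG"
)
GSP1 = "CTGGTGCCATAATTCAGGGA"        # SEQ ID NO. 3
GSP2 = "GGATCTTCACGTAACGGATTGT"      # SEQ ID NO. 4
UPSTREAM = "CTGGTGCCATAATTCAGGGA"    # SEQ ID NO. 5
DOWNSTREAM = "CCTAGAAGTGCATTGCCTAACA"  # SEQ ID NO. 6

rna_as_dna = SEQ2_RNA.replace("U", "T")

print("SEQ2 is reverse complement of SEQ1:",
      reverse_complement(SEQ1_DNA) == rna_as_dna)
print("GSP1 pairs with the RNA 3' end:",
      rna_as_dna.endswith(reverse_complement(GSP1)))
print("GSP2 same-direction complement found in RNA:",
      same_direction_complement(GSP2) in rna_as_dna)
print("Identification amplicon size:",
      amplicon_size(SEQ1_DNA, UPSTREAM, DOWNSTREAM))
```

The computed amplicon size agrees with the 101 bp product stated in the text.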
2. Total RNA extraction
(1) The cells came from a mouse embryonic stem cell (mES) line established in the laboratory, in which a distal enhancer (designated E3) that interacts with the Sox2 promoter had been knocked out. When the mES colonies in a 60 mm culture dish had grown to an average size of 200-400 μm, the medium was removed and 2 ml of Trizol was added directly to the cells in the dish. The dish was gently swirled so that the Trizol came into full contact with all the cells, lysing them and inactivating RNase. After 5 minutes at room temperature, when the cells had lysed to a homogeneous state, the lysate was gently mixed by pipetting and transferred to RNase-free EP tubes.
(2) Chloroform (… μl) was added to 500 μl of the lysate. The tube was capped, mixed vigorously for 10-15 seconds, left at room temperature for 2 minutes, and centrifuged at 12,000 g for 15 minutes at 4 °C.
(3) After centrifugation, the upper aqueous phase was transferred to a new RNase-free EP tube, taking care not to aspirate the interphase. 250 μl of isopropanol was added and the tube was gently inverted to mix; the sample was slightly cloudy at this point. It was left at room temperature for 10 minutes and centrifuged at 12,000 g for 10 minutes at 4 °C.
(4) After centrifugation, a small milky-white pellet was visible at the bottom of the tube. The supernatant was discarded, 500 μl of 75% (v/v) ethanol was added, and the tube was centrifuged at 7,500 g for 5 minutes at 4 °C.
(5) Step (4) was repeated.
(6) The supernatant was discarded and the pellet was air-dried for 5 minutes. 60 μl of Nuclease-free H2O was added, and the tube was placed in a metal bath at 55 °C for 10 minutes. The RNA was used directly or stored at -80 °C until use.
3. Ribosomal RNA removal (Nanjing Vazyme Biotech Co., Ltd.; NR603)
3.1 Experimental procedure:
3.1.1 Preparing the total RNA sample:
In a Nuclease-free centrifuge tube, 1 μg of total RNA was diluted to 11 μl with Nuclease-free H2O and placed on ice until use.
3.1.2 Hybridization of the RNA sample with the probe:
(1) The following reaction solution was prepared in a Nuclease-free microcentrifuge tube:
The mixture was gently pipetted 10 times to mix thoroughly.
(2) The sample was collected at the bottom of the tube by brief centrifugation and placed in a PCR instrument for the probe hybridization reaction:
3.1.3 RNase H digestion:
(1) The following reaction solution was prepared on ice:
The mixture was gently pipetted 10 times to mix thoroughly.
(2) The sample was placed in a PCR instrument and RNase H digestion was performed:
3.1.4 DNase I digestion:
(1) The following reaction solution was prepared on ice:
The mixture was gently pipetted 10 times to mix thoroughly.
(2) The sample was placed in a PCR instrument and DNase I digestion was performed:
The sample was collected at the bottom of the tube by brief centrifugation, placed on ice, and taken immediately into the next step.
3.1.5 Purification of Ribosomal-depleted RNA using VAHTS RNA Clean Beads (Nanjing Vazyme Biotech Co., Ltd.; N412-01).
(1) The VAHTS RNA Clean Beads were mixed by vortexing, and 110 μl (2.2×) was added to the RNA sample from the previous step and mixed thoroughly by pipetting up and down 10 times.
(2) The mixture was allowed to stand on ice for 15 minutes to bind the RNA to the beads.
(3) The sample was placed on a magnetic rack for 5 minutes, and after the solution cleared the supernatant was carefully removed.
(4) With the sample kept on the magnetic rack throughout, the beads were rinsed with 200 μl of 80% ethanol freshly prepared with Nuclease-free H2O, incubated at room temperature for 30 seconds, and the supernatant was carefully removed.
(5) Step (4) was repeated once.
(6) With the sample kept on the magnetic rack, the beads were dried with the cap open at room temperature for 5-10 minutes or until the ethanol had completely evaporated.
(7) The sample was removed from the magnetic rack, 10 μl of Nuclease-free H2O was added, and the beads were mixed thoroughly by pipetting 6 times. After standing at room temperature for 2 minutes and then on the magnetic rack for 5 minutes until the solution cleared, 8 μl of the supernatant was carefully transferred to a new Nuclease-free centrifuge tube.
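The bead amounts in this protocol are quoted both as a volume and as a ratio to the sample volume (for example 110 μl at 2.2×). A minimal Python sketch of that arithmetic follows; the function names are illustrative, and the 50 μl sample volume is only inferred from 110 μl / 2.2, since the text does not state it.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float) -> float:
    """Bead volume (in microliters) for a given sample volume and bead:sample ratio."""
    return round(sample_volume_ul * ratio, 2)


def implied_sample_volume_ul(bead_volume: float, ratio: float) -> float:
    """Sample volume implied by a stated bead volume and ratio."""
    return round(bead_volume / ratio, 2)


# 110 ul of beads at 2.2x implies a 50 ul sample (inferred, not stated in the text).
print(implied_sample_volume_ul(110, 2.2))
print(bead_volume_ul(50, 2.2))
```

Rounding to two decimals avoids floating-point noise in the printed volumes.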
4. First-strand cDNA synthesis (Invitrogen; 18080-051): first-strand cDNA was synthesized separately for the 5' and 3' ends, as follows:
(1) The following mixture was prepared:
Ribosomal-depleted RNA 8 μl
10 mM dNTP mix 1 μl
2 μM gene-specific primer 1 μl
The mixture was incubated at 65℃ for 5 minutes and then cooled on ice for at least 1 minute.
(2) The following cDNA synthesis mixture was prepared:
(3) μl of cDNA synthesis mixture was added to the mixture of RNA and gene-specific reverse transcription primer, mixed gently, and collected at the bottom of the tube by brief centrifugation. The reaction was incubated at 50℃ for 50 minutes, terminated at 85℃ for 5 minutes, and cooled on ice to obtain first-strand cDNA, which was then purified with magnetic beads.
5. The 5' and 3' end first-strand cDNAs were each purified using Agencourt AMPure XP magnetic beads (Beckman Coulter; A63881), as follows:
(1) 24 μl (1.2×) of AMPure XP magnetic beads was added to the first-strand cDNA and mixed thoroughly by vortexing or repeated pipetting.
(2) The mixture was allowed to stand at room temperature for 15 minutes to bind the cDNA to the beads.
(3) The sample containing the magnetic beads was placed on a magnetic rack and allowed to stand for 5 minutes or until the solution became clear.
(4) The supernatant was removed with a pipette, and the beads were washed twice with 200 μl of freshly prepared 80% EtOH (ethanol, 85% by volume). The sample tube was kept on the magnetic rack throughout the washes, with at least 30 seconds of incubation at room temperature for each wash.
(5) After the second wash, the sample tube was briefly spun and returned to the magnetic rack, residual EtOH was removed with a 10 μl pipette, and the beads were dried with the cap open at room temperature for 5-10 minutes or until the ethanol had completely evaporated.
(6) The sample was removed from the magnetic rack, and 42.5 μl of Nuclease-free H2O was added and mixed by pipetting or brief vortexing. The sample was incubated at room temperature for 2 minutes to elute the first-strand cDNA from the beads.
(7) The sample tube was returned to the magnetic rack for 5 minutes until the solution was clear, and 40.5 μl of the supernatant was carefully transferred to a new Nuclease-free PCR tube.
6. Second-strand cDNA synthesis: two-strand cDNA was synthesized separately from the 5' and 3' end first-strand cDNAs, as follows:
(1) The following mixture was prepared:
First-strand cDNA 40.5 μl
10X second strand buffer 5 μl
10 mM dNTP mix 1.5 μl
The components were mixed well and incubated on ice for 5 minutes.
(2) The following enzymes were added:
DNA polymerase I, E. coli, 10 U/μl 2.5 μl
RNase H, 2 U/μl 0.5 μl
The reaction was mixed well and incubated at 15℃ for 2.5 hours. The temperature must not exceed 15℃, and a heated lid must not be used.
The 10X second strand buffer consists of 500 mM Tris-HCl (pH 7.8), 50 mM MgCl2 and 10 mM DTT. The buffer was filtered before DTT was added, and 0.1 M DTT was then added to reach the final composition.
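As a worked check of the buffer arithmetic, the sketch below computes the total reaction volume and the final (1X) concentrations in the second-strand reaction. It is an illustration only: the component volumes are taken from claim 4 of this document, and the dictionary and function names are assumptions.

```python
# Component volumes in microliters, as listed in claim 4 of this document.
VOLUMES_UL = {
    "first-strand cDNA": 40.5,
    "10X second strand buffer": 5.0,
    "10 mM dNTP mix": 1.5,
    "DNA polymerase I (10 U/ul)": 2.5,
    "RNase H (2 U/ul)": 0.5,
}

# Composition of the 10X second strand buffer, in mM.
BUFFER_10X_MM = {"Tris-HCl": 500.0, "MgCl2": 50.0, "DTT": 10.0}


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock * added_ul / total_ul


total_ul = sum(VOLUMES_UL.values())
buffer_ul = VOLUMES_UL["10X second strand buffer"]

final_mm = {
    name: final_concentration(conc, buffer_ul, total_ul)
    for name, conc in BUFFER_10X_MM.items()
}
dntp_final_mm = final_concentration(10.0, VOLUMES_UL["10 mM dNTP mix"], total_ul)

print(total_ul)       # total reaction volume in ul
print(final_mm)       # buffer components at their working concentration
print(dntp_final_mm)  # final dNTP concentration in mM
```

With these volumes the buffer is diluted exactly tenfold, which is consistent with its 10X label.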
7. RNA was removed from the 5' and 3' end two-strand cDNAs, respectively
μl of RNase A (Thermo; #EN0531) was added to each of the 5'-end and 3'-end two-strand cDNAs, and the reaction was incubated at 37℃ for 1 hour to digest the RNA.
8. The 5' and 3' end two-strand cDNAs were each purified using Agencourt AMPure XP magnetic beads, as follows:
(1) 66 μl (1.2×) of AMPure XP magnetic beads was added to the two-strand cDNA and mixed thoroughly by vortexing or repeated pipetting.
(2) The mixture was allowed to stand at room temperature for 15 minutes to bind the two-strand cDNA to the magnetic beads.
(3) The sample containing the magnetic beads was placed on a magnetic rack and allowed to stand for 5 minutes or until the solution became clear.
(4) The supernatant was removed with a pipette, and the beads were washed twice with 500 μl of freshly prepared 80% EtOH (ethanol, volume fraction 85%). The sample tube was kept on the magnetic rack throughout the washes, with at least 30 seconds of incubation at room temperature for each wash.
(5) After the second wash, the sample tube was briefly spun and returned to the magnetic rack, residual EtOH was removed with a 10 μl pipette, and the beads were dried at room temperature for 5-10 minutes or until the ethanol had completely evaporated.
(6) 62 μl of TE Buffer was added and mixed by pipetting or brief vortexing. The sample was incubated at room temperature for 2 minutes to elute the two-strand cDNA from the magnetic beads.
(7) The sample tube was returned to the magnetic rack for 5 minutes until the solution was clear. μl of the supernatant was carefully transferred to a new Nuclease-free PCR tube.
9. 8 μl each of the purified 5'-end and 3'-end two-strand cDNA was used for clone sequencing to identify whether the two-strand cDNA contains the novel short fragment obtained by RNA-seq. The PCR results are shown in FIG. 2 and indicate an effect of homologous heterologous sequences in the experiment. Clone sequencing showed that the two-strand cDNAs of both ends contain the target fragment. This step is optional: if, for example, PCR primers are difficult to design because the novel fragment obtained from RNA-seq is too short, step 10 can be performed directly without clone sequencing identification.
9.1 Three PCR replicates were run for each of the 5' and 3' end two-strand cDNAs, with the following reagents and conditions:
Pre-denaturation: 98℃ for 5 min; 35 cycles (denaturation: 98℃ for 10 sec; annealing: 55℃ for 30 sec; extension: 72℃ for 1 min); final extension: 72℃ for 5 min; hold: 4℃, ∞. The PCR product was obtained.
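The cycling conditions above can be written down as a small data structure, which makes it easy to estimate how long the program holds at temperature. The Python sketch below is illustrative only (the `Step` class and function name are assumptions), and the estimate ignores ramp times between temperatures and the final hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def programmed_seconds(initial: Step, cycle: list, n_cycles: int, final: Step) -> int:
    """Total programmed hold time, excluding ramping and the final hold."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


# Conditions of step 9.1.
initial = Step("pre-denaturation", 98, 5 * 60)
cycle = [
    Step("denaturation", 98, 10),
    Step("annealing", 55, 30),
    Step("extension", 72, 60),
]
final = Step("final extension", 72, 5 * 60)

total = programmed_seconds(initial, cycle, 35, final)
print(total, "s")
print(round(total / 60, 1), "min")
```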
9.2 The PCR products were run on a 12% PAGE gel, and the target band was excised and recovered from the gel as follows:
(1) The bottom of a 0.5 ml DNase-free EP tube was pierced with a 21G needle, 3 holes per tube.
(2) The pierced 0.5 ml EP tube was placed inside a 1.5 ml DNase-free EP tube.
(3) The excised gel slice was placed in the pierced 0.5 ml EP tube and the cap was closed.
(4) The tubes were centrifuged at 20,000g for 4 minutes at room temperature, forcing the gel slice through the holes into the 1.5 ml EP tube as fine gel particles.
(5) The 0.5 ml EP tube was discarded, 200 μl of DNase-free water was added to the gel particles, and the tube wall was flicked with a finger to mix.
(6) The EP tube was incubated in a 70℃ metal bath for 10 minutes.
(7) The mixture was homogenized for 30 seconds on a vortex mixer at medium intensity, and liquid on the tube wall was spun down to the bottom.
(8) The end of a 1 ml pipette tip was cut off, and the tip was used to pipette the homogenized gel slurry and transfer it to a centrifuge tube containing a spin column.
(9) The tube was centrifuged at 20,000g for 3 minutes at room temperature, the column was discarded, and the flow-through was collected.
(10) The flow-through from the 3 PCR replicates was pooled and mixed by pipetting; the volume was approximately 600 μl.
(11) 25 μl of 5 M NaCl was added to the pooled flow-through and mixed well, followed by 12 μl of Glycogen (Invitrogen; 10814-010) and 750 μl of isopropanol. The mixture was incubated at -20℃ for at least 1 hour.
(12) After centrifugation at 20,000g for 30 minutes at 4℃, a small white DNA pellet was visible at the bottom of the tube.
(13) The supernatant was removed and the pellet was washed with 750 μl of cold 80% (volume ratio) ethanol.
(14) The sample was centrifuged at 20,000g at 4℃ and the ethanol was removed.
(15) The DNA pellet was air-dried at room temperature for about 10 minutes, at which point it was pale and jelly-like.
(16) 15 μl of DNase-free water was added and the DNA was resuspended by pipetting about 15 times to obtain the gel recovery product.
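For the precipitation in step (11), the resulting salt concentration and isopropanol fraction follow directly from the stated volumes. The sketch below is a rough illustration that takes the "approximately 600 μl" pooled volume at face value; the variable names are assumptions.

```python
# Volumes in microliters from steps (10) and (11); 600 ul is approximate in the text.
pooled_ul = 600.0
nacl_ul = 25.0          # 5 M NaCl stock
glycogen_ul = 12.0
isopropanol_ul = 750.0
NACL_STOCK_MM = 5000.0  # 5 M expressed in mM

total_ul = pooled_ul + nacl_ul + glycogen_ul + isopropanol_ul

# NaCl concentration right after adding the salt, before glycogen and isopropanol.
nacl_before_alcohol_mm = NACL_STOCK_MM * nacl_ul / (pooled_ul + nacl_ul)

# NaCl concentration and isopropanol fraction in the final precipitation mix.
nacl_final_mm = NACL_STOCK_MM * nacl_ul / total_ul
isopropanol_fraction = isopropanol_ul / total_ul

print(total_ul)
print(nacl_before_alcohol_mm)
print(round(nacl_final_mm, 1))
print(round(isopropanol_fraction, 2))
```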
9.3 The gel recovery product was ligated into the vector using the pClone007 Blunt Simple Vector Kit (Beijing Tsingke Biotechnology Co., Ltd.; TSV-007BS) and transformed into Trelief™ 5α competent cells (Beijing Tsingke Biotechnology Co., Ltd.; TSC01), and clones were then picked for sequencing.
10. Ultrasonic disruption
The 5' and 3' end two-strand cDNAs were pooled, transferred to a 1.5 ml Bioruptor tube, and fragmented on a Bioruptor Pico sonicator (on/off cycle time 30"/30", 13 cycles) to a fragment size of 200 bp.
11. Purification of the fragmented two-strand cDNA with Ampure beads
(1) μl (1.2×) of Ampure XP beads was added to 100 μl of the fragmented two-strand cDNA and mixed thoroughly by vortexing or repeated pipetting.
(2) The mixture was allowed to stand at room temperature for 15 minutes to bind the DNA to the magnetic beads.
(3) The sample containing the magnetic beads was placed on a magnetic rack and allowed to stand for 5 minutes or until the solution became clear.
(4) The supernatant was removed with a pipette, and the beads were washed twice with 500 μl of freshly prepared 80% EtOH (ethanol, volume fraction 85%). The sample tube was kept on the magnetic rack throughout the washes, with at least 30 seconds of incubation at room temperature for each wash.
(5) After the second wash, the sample tube was briefly spun and returned to the magnetic rack, residual EtOH was removed with a 10 μl pipette, and the beads were dried at room temperature for 5-10 minutes or until the ethanol had completely evaporated.
(6) μl of Nuclease-free H2O was added and mixed by pipetting or brief vortexing. The sample was incubated at room temperature for 2 minutes to elute the fragmented two-strand cDNA from the magnetic beads.
(7) The sample tube was returned to the magnetic rack for 5 minutes until the solution was clear, and 25 μl of the supernatant was carefully transferred to a new Nuclease-free PCR tube.
12. Library construction (KAPA Biosystems; KK8504):
12.1 End repair and A-tailing reaction
(1) The following reaction was assembled in a PCR tube:
Fragmented two-strand cDNA 25 μl
End Repair & A-Tailing Buffer 3.5 μl
End Repair & A-Tailing Enzyme Mix 1.5 μl
(2) The mixture was gently vortexed, placed on ice, and immediately transferred to a PCR instrument for incubation (20℃, 30 min; 65℃, 30 min; 4℃, ∞) to obtain the end repair and A-tailing product.
12.2 Adapter ligation
(1) The adapter ligation reaction was assembled in the end repair and A-tailing reaction tube:
(2) The reaction was mixed thoroughly, briefly centrifuged, and incubated at 20℃ for 15 minutes to obtain the adapter-ligated product.
12.3 Post-ligation purification
(1) μl (0.8×) of Ampure XP beads was added to the adapter-ligated product and mixed thoroughly by vortexing or repeated pipetting.
(2) The mixture was allowed to stand at room temperature for 15 minutes to bind the DNA to the magnetic beads.
(3) The sample containing the magnetic beads was placed on a magnetic rack and allowed to stand for 5 minutes or until the solution became clear.
(4) The supernatant was removed with a pipette, and the beads were washed twice with 200 μl of freshly prepared 80% EtOH (ethanol, volume fraction 85%). The sample tube was kept on the magnetic rack throughout the washes, with at least 30 seconds of incubation at room temperature for each wash.
(5) After the second wash, the sample tube was briefly spun and returned to the magnetic rack, residual EtOH was removed with a 10 μl pipette, and the beads were dried at room temperature for 5-10 minutes or until the ethanol had completely evaporated.
(6) μl of Nuclease-free H2O was added and mixed by pipetting or brief vortexing. The sample was incubated at room temperature for 2 minutes to elute the DNA from the beads.
(7) The tube was returned to the magnetic rack for 5 minutes until the solution was clear, and 16 μl of the supernatant was carefully transferred to a new PCR tube, yielding the purified adapter-ligated product.
12.4 Library amplification
(1) The amplification reaction was assembled in a PCR tube:
2X KAPA HiFi HotStart ReadyMix 20 μl
10X KAPA Library Amplification Primer Mix 4 μl
Adapter-ligated library 16 μl
(2) The reaction was mixed thoroughly, briefly centrifuged, and run with the following PCR program. Pre-denaturation: 98℃ for 1 min; 11 cycles (denaturation: 98℃ for 15 sec; annealing: 60℃ for 30 sec; extension: 72℃ for 30 sec); final extension: 72℃ for 5 min; hold: 10℃, ∞. The PCR product was obtained.
12.5 Library size selection
The PCR product was run on an E-gel, and fragments of 200 bp to 700 bp were selected and recovered from the gel using the Zymoclean Gel DNA Recovery Kit (Zymo Research; D4008), as shown in FIG. 3. The steps were as follows:
(1) 3 gel volumes of ADB were added, and the tube was placed in a metal bath at 55℃ and inverted to mix about every 2 minutes until the gel was completely dissolved.
(2) All of the liquid was transferred to an adsorption column and centrifuged at 12,000g for 1 min at room temperature.
(3) The flow-through was discarded, 200 μl of DNA wash buffer was added, and the column was centrifuged at 12,000g for 30 seconds at room temperature.
(4) The flow-through was discarded and step (3) was repeated once.
(5) The column was placed in a collection tube, 14 μl of DNA Elution buffer was added, and the column was centrifuged at 12,000g for 1 min at room temperature. The column was discarded, the concentration was measured, and the library was stored at -20℃.
13. High throughput sequencing
13.1 The library was sequenced using the Hiseq-PE150 sequencing strategy; the data volume was about 2G.
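To give a feel for what this data volume means, the sketch below converts it into an approximate number of read pairs. It assumes that "2G" means about 2 gigabases of sequence and that every read is the full 150 nt; neither assumption is stated in the text.

```python
READ_LENGTH_NT = 150                # PE150: two reads of 150 nt per fragment
DATA_VOLUME_BASES = 2_000_000_000   # assumed meaning of "about 2G"


def read_pairs(total_bases: int, read_length: int) -> int:
    """Approximate number of read pairs for paired-end data of fixed read length."""
    return total_bases // (2 * read_length)


pairs = read_pairs(DATA_VOLUME_BASES, READ_LENGTH_NT)
print(pairs)
```

Under these assumptions the run corresponds to several million read pairs.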
13.2 Bioinformatic analysis
The analysis workflow comprises quality control, sequence alignment, and checking of the alignment results with BLAT.
(1) Quality control
Quality control comprised library quality analysis and removal of adapter sequences, using FastQC v0.11.5 and cutadapt version 1.8.dev0. FIGS. 4 and 5 show the results of the library quality analysis with FastQC v0.11.5. FIG. 4 shows basic information on the raw off-machine data. FIG. 5 shows the quality values of all bases in the fastq files (a: quality values of all bases of reads 1; b: quality values of all bases of reads 2).
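The quality-control step can be sketched as two command lines, one for FastQC and one for cutadapt. The Python snippet below only builds and prints the commands; it does not run them. The file names, output directory and adapter sequence are hypothetical placeholders, since the text does not give the exact invocation or the adapters used.

```python
import shlex

# Hypothetical inputs; none of these names or sequences come from the source text.
READS_1 = "sample_R1.fastq.gz"
READS_2 = "sample_R2.fastq.gz"
ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix; replace with the adapter actually ligated


def fastqc_command(reads, outdir="fastqc_out"):
    """FastQC call that writes its reports for the given FASTQ files to outdir."""
    return ["fastqc", "--outdir", outdir, *reads]


def cutadapt_command(r1, r2, out1, out2, adapter=ADAPTER):
    """Paired-end cutadapt call: -a trims read 1, -A trims read 2, -o/-p name the outputs."""
    return ["cutadapt", "-a", adapter, "-A", adapter, "-o", out1, "-p", out2, r1, r2]


qc_cmd = fastqc_command([READS_1, READS_2])
trim_cmd = cutadapt_command(READS_1, READS_2, "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz")

for cmd in (qc_cmd, trim_cmd):
    print(shlex.join(cmd))
```

Passing the lists to `subprocess.run(cmd, check=True)` would execute them once the tools are installed.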
(2) Sequence alignment
The off-machine data were aligned using Bowtie2 version 2.1.0 and Samtools 0.1.19-44428cd to search for the novel short fragment obtained from RNA-seq and its adjacent sequences. As a result, the 3'-terminal cDNA sequence (SEQ ID NO. 7) of the unknown RNA obtained by high-throughput sequencing is: AAAGAAACAATCACACCCAATTCTATTTAGGTAAGCCAGTGACTTTATTGGGGTTACTTACAGGAGTGTGGATGACGCAAAGGTGGATGTACCACTGAAAAGCCCACCCCAGCATGGTGATGACTCATGAAAGCGGAATCCCTGGCATAC. The 5'-terminal cDNA sequence (SEQ ID NO. 8) of the unknown RNA obtained by high-throughput sequencing is: TGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCAGGTAGTACATCAAGAACACAGTTCCCTG.
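A possible shape for the alignment step is sketched below as a list of commands built in Python. This is an assumption-laden illustration: the text names only the tools and versions, so the index name, file names and choice of options are hypothetical. The `samtools sort` call uses the legacy "output prefix" form of the 0.1.x series; newer samtools releases use `-o` instead.

```python
import shlex

# Hypothetical names; the source does not state the reference or the file layout.
INDEX = "reference_index"
R1, R2 = "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz"
PREFIX = "sample"


def alignment_commands(index, r1, r2, prefix):
    """Bowtie2 paired-end alignment followed by SAM-to-BAM conversion, sorting and indexing."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_prefix = f"{prefix}.sorted"
    return [
        ["bowtie2", "-x", index, "-1", r1, "-2", r2, "-S", sam],
        ["samtools", "view", "-bS", "-o", bam, sam],
        ["samtools", "sort", bam, sorted_prefix],  # legacy 0.1.x syntax: output prefix
        ["samtools", "index", f"{sorted_prefix}.bam"],
    ]


commands = alignment_commands(INDEX, R1, R2, PREFIX)
for cmd in commands:
    print(shlex.join(cmd))
```

The sorted, indexed BAM file can then be inspected around the position of the known short fragment to read off the adjacent sequence.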
(3) Checking the alignment results with BLAT
The alignment results were checked with the BLAT tool on the UCSC platform and visualized, as shown in FIGS. 6 and 7. In FIG. 6, the underlined portion shows the result of checking the 3'-terminal cDNA sequence of the unknown RNA obtained by high-throughput sequencing with the BLAT tool on the UCSC platform.
In FIG. 7, the underlined portion shows the result of checking the 5'-terminal cDNA sequence of the unknown RNA obtained by high-throughput sequencing with the BLAT tool on the UCSC platform. It can be seen that one sequence within the 5'-terminal cDNA sequence of the unknown RNA obtained by high-throughput sequencing (SEQ ID NO. 9), tagtacatcaagaacacagttccctg, does not align to the genome, presumably because different transcripts of the unknown RNA are present. The BLAT check also showed that the sequence not detected in the 5'-terminal cDNA sequence of the unknown RNA obtained by high-throughput sequencing (SEQ ID NO. 10) is: ggcaggaatgaagatattctaag. The clone sequencing in step 9, which identified the novel short fragment obtained from RNA-seq in the 5' and 3' end two-strand cDNAs, confirmed that this undetected sequence is an intron sequence.
In summary, the underlined portion of FIG. 8 shows the result of checking the full-length cDNA sequence of the unknown RNA obtained by high-throughput sequencing with the BLAT tool on the UCSC platform. The BLAT check shows that the undetected sequences in the full-length cDNA sequence of the unknown RNA further include (SEQ ID NO. 11): tctctgttcctaaatttctggtgccataattcagggaactgtgttct. Since the 5'-terminal sequence of the unknown RNA has been obtained and this undetected sequence lies between the 5'-terminal sequence and the 5'-terminal reverse transcription initiation site, it can be presumed to be part of the cDNA sequence of the unknown RNA. The undetected sequence was verified by clone sequencing using the reverse transcription kit HiScript III SuperMix for qPCR (+gDNA wiper) (Nanjing Vazyme Biotech Co., Ltd.; R323-01). The upstream primer sequence (SEQ ID NO. 12) is GTACCACTGAAAAGCCCACC and the downstream primer sequence (SEQ ID NO. 13) is AAGTGCATTGCCTAACAGCC; the fragment size is 177 bp. The PCR results are shown in FIG. 9, and clone sequencing confirmed that the undetected sequence is contained in the cDNA sequence of the unknown RNA.
Finally, the full-length cDNA sequence of the unknown RNA (SEQ ID NO. 14) taken by high-throughput sequencing is: AAAGAAACAATCACACCCAATTCTATTTAGGTAAGCCAGTGACTTTATTGGGGTTACTTACAGGAGTGTGGATGACGCAAAGGTGGATGTACCACTGAAAAGCCCACCCCAGCATGGTGATGACTCATGAAAGCGGAATCCCTGGCATACTCTCTGTTCCTAAATTTCTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCAGGTAGTACATCAAGAACACAGTTCCCTG, 344bp in length.
The full-length sequence of the unknown RNA (SEQ ID NO. 15) extracted by high-throughput sequencing is: CAGGGAACUGUGUUCUUGAUGUACUACCUGGGUCUUUUUGUUCUUCCAGCAUGAGAUUCCUUAUAGAAUUCUAAUUCCUAGAAGUGCAUUGCCUAACAGCCUGAUAGCUGGAUGGGGGGAAGCACAAAUGUCUCAGAUAGUACAUCAAGAACACAGUUCCCUGAAUUAUGGCACCAGAAAUUUAGGAACAGAGAGUAUGCCAGGGAUUCCGCUUUCAUGAGUCAUCACCAUGCUGGGGUGGGCUUUUCAGUGGUACAUCCACCUUUGCGUCAUCCACACUCCUGUAAGUAACCCCAAUAAAGUCACUGGCUUACCUAAAUAGAAUUGGGUGUGAUUGUUUCUUU, 344nt in length.
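How the partial sequences fit together can be illustrated with a short consistency check. The Python sketch below is not the patent's analysis pipeline; it simply concatenates the 3'-terminal cDNA (SEQ ID NO. 7), the bridging sequence (SEQ ID NO. 11) and the 5'-terminal cDNA (SEQ ID NO. 8) from the sequence listing, then reports the total length and where the original short fragment (SEQ ID NO. 1) lies within the result. The helper name `clean` is an illustrative assumption.

```python
def clean(listing: str) -> str:
    """Join the 10-nt groups of a sequence-listing entry into one lowercase string."""
    return "".join(listing.split()).lower()


# Entries copied from the sequence listing of this document.
SEQ7 = clean("""
aaagaaacaa tcacacccaa ttctatttag gtaagccagt gactttattg gggttactta
caggagtgtg gatgacgcaa aggtggatgt accactgaaa agcccacccc agcatggtga
tgactcatga aagcggaatc cctggcatac
""")
SEQ11 = clean("tctctgttcc taaatttctg gtgccataat tcagggaact gtgttct")
SEQ8 = clean("""
tgatgtacta tctgagacat ttgtgcttcc ccccatccag ctatcaggct gttaggcaat
gcacttctag gaattagaat tctataagga atctcatgct ggaagaacaa aaagacccag
gtagtacatc aagaacacag ttccctg
""")
SEQ1 = clean("""
ctggtgccat aattcaggga actgtgttct tgatgtacta tctgagacat ttgtgcttcc
ccccatccag ctatcaggct gttaggcaat gcacttctag gaattagaat tctataagga
atctcatgct ggaagaacaa aaagaccca
""")

full_length = SEQ7 + SEQ11 + SEQ8

print(len(SEQ7), len(SEQ11), len(SEQ8))  # lengths of the three parts
print(len(full_length))                  # length of the assembled cDNA
print(full_length.find(SEQ1))            # 0-based offset of the short fragment, -1 if absent
```

A check of this kind is a quick way to confirm that an assembled sequence still contains the seed fragment it was extended from.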
It will be understood that equivalents and modifications will occur to those skilled in the art in light of the present teachings and concepts, and all such modifications and substitutions are intended to be included within the scope of the present invention as defined in the accompanying claims.
Sequence listing
<110> institute of genome of national academy of agricultural sciences
<120> a method for extracting unknown RNA full-length sequence by high-throughput sequencing
<160> 15
<170> SIPOSequenceListing 1.0
<210> 1
<211> 149
<212> DNA
<213> New short fragment sequence obtained from RNA-seq (mouse embryonic stem cells)
<400> 1
ctggtgccat aattcaggga actgtgttct tgatgtacta tctgagacat ttgtgcttcc 60
ccccatccag ctatcaggct gttaggcaat gcacttctag gaattagaat tctataagga 120
atctcatgct ggaagaacaa aaagaccca 149
<210> 2
<211> 149
<212> RNA
<213> RNA sequence of completely novel short fragment obtained from RNA-seq (mouse embryonic stem cells)
<400> 2
ugggucuuuu uguucuucca gcaugagauu ccuuauagaa uucuaauucc uagaagugca 60
uugccuaaca gccugauagc uggauggggg gaagcacaaa ugucucagau aguacaucaa 120
gaacacaguu cccugaauua uggcaccag 149
<210> 3
<211> 20
<212> DNA
<213> 5' -terminal Gene-specific reverse transcription primer GSP1 sequence (Synthetic sequence)
<400> 3
ctggtgccat aattcaggga 20
<210> 4
<211> 22
<212> DNA
<213> 3' -terminal Gene-specific reverse transcription primer GSP2 sequence (Synthetic sequence)
<400> 4
ggatcttcac gtaacggatt gt 22
<210> 5
<211> 20
<212> DNA
<213> clone sequencing identification of novel short fragments obtained from RNA-seq contained in 5 'and 3' terminal two-strand cDNAs, upstream primer sequences (Synthetic sequence)
<400> 5
ctggtgccat aattcaggga 20
<210> 6
<211> 22
<212> DNA
<213> clone sequencing to identify that the 5 'and 3' terminal two-strand cDNAs contain a completely novel short fragment obtained from the RNA-seq, the downstream primer sequence (Synthetic sequence)
<400> 6
cctagaagtg cattgcctaa ca 22
<210> 7
<211> 150
<212> DNA
<213> unknown RNA 3' -terminal cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)
<400> 7
aaagaaacaa tcacacccaa ttctatttag gtaagccagt gactttattg gggttactta 60
caggagtgtg gatgacgcaa aggtggatgt accactgaaa agcccacccc agcatggtga 120
tgactcatga aagcggaatc cctggcatac 150
<210> 8
<211> 147
<212> DNA
<213> unknown RNA5' terminal cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)
<400> 8
tgatgtacta tctgagacat ttgtgcttcc ccccatccag ctatcaggct gttaggcaat 60
gcacttctag gaattagaat tctataagga atctcatgct ggaagaacaa aaagacccag 120
gtagtacatc aagaacacag ttccctg 147
<210> 9
<211> 26
<212> DNA
<213> sequence not aligned to genome in unknown RNA5' -terminal cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)
<400> 9
tagtacatca agaacacagt tccctg 26
<210> 10
<211> 23
<212> DNA
<213> sequence confirmed as an intron which was not detected in the cDNA sequence of the 5' -end of unknown RNA obtained by high-throughput sequencing (mouse embryonic stem cells)
<400> 10
ggcaggaatg aagatattct aag 23
<210> 11
<211> 47
<212> DNA
<213> sequence undetectable in unknown RNA full-length cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)
<400> 11
tctctgttcc taaatttctg gtgccataat tcagggaact gtgttct 47
<210> 12
<211> 20
<212> DNA
<213> clone sequencing to identify undetectable sequences in the full-length cDNA sequence of unknown RNA obtained by high throughput sequencing, upstream primer sequence (Synthetic sequence)
<400> 12
gtaccactga aaagcccacc 20
<210> 13
<211> 20
<212> DNA
<213> clone sequencing to identify undetectable sequences in the full-length cDNA sequence of unknown RNA obtained by high throughput sequencing, downstream primer sequence (Synthetic sequence)
<400> 13
aagtgcattg cctaacagcc 20
<210> 14
<211> 344
<212> DNA
<213> high throughput sequencing of the retrieved unknown RNA full-length cDNA sequence (mouse embryonic stem cells)
<400> 14
aaagaaacaa tcacacccaa ttctatttag gtaagccagt gactttattg gggttactta 60
caggagtgtg gatgacgcaa aggtggatgt accactgaaa agcccacccc agcatggtga 120
tgactcatga aagcggaatc cctggcatac tctctgttcc taaatttctg gtgccataat 180
tcagggaact gtgttcttga tgtactatct gagacatttg tgcttccccc catccagcta 240
tcaggctgtt aggcaatgca cttctaggaa ttagaattct ataaggaatc tcatgctgga 300
agaacaaaaa gacccaggta gtacatcaag aacacagttc cctg 344
<210> 15
<211> 344
<212> RNA
<213> high throughput sequencing of the retrieved unknown RNA full-length sequence (mouse embryonic stem cells)
<400> 15
cagggaacug uguucuugau guacuaccug ggucuuuuug uucuuccagc augagauucc 60
uuauagaauu cuaauuccua gaagugcauu gccuaacagc cugauagcug gaugggggga 120
agcacaaaug ucucagauag uacaucaaga acacaguucc cugaauuaug gcaccagaaa 180
uuuaggaaca gagaguaugc cagggauucc gcuuucauga gucaucacca ugcuggggug 240
ggcuuuucag ugguacaucc accuuugcgu cauccacacu ccuguaagua accccaauaa 300
agucacuggc uuaccuaaau agaauugggu gugauuguuu cuuu 344
Claims (9)
1. A method for extracting the full-length sequence of an unknown RNA by high-throughput sequencing, characterized in that 5'-end and 3'-end gene-specific reverse transcription primers for the unknown RNA are designed according to the sequence of a novel short fragment obtained by RNA-seq; 5'-end and 3'-end first-strand cDNAs are synthesized separately using reverse transcriptase and the gene-specific reverse transcription primers; 5'-end and 3'-end two-strand cDNAs are then synthesized separately using DNA polymerase I; the two-strand cDNAs of the two ends are combined, ultrasonically disrupted, built into a library and sequenced; and the full-length sequence of the unknown RNA is finally extracted by bioinformatic analysis;
the 5'-end and 3'-end gene-specific reverse transcription primers for the unknown RNA are designed as follows: the 5'-end gene-specific reverse transcription primer is reverse complementary to the RNA sequence of the novel short fragment obtained by RNA-seq, the 3'-end gene-specific reverse transcription primer is co-directionally complementary to the RNA sequence of the novel short fragment obtained by RNA-seq, and the fragments amplified by the gene-specific reverse transcription primers of the two ends have an overlapping region.
2. The method of claim 1, further comprising: total RNA extraction and removal of ribosomal RNA.
3. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 2, wherein the 5'-end and 3'-end first-strand cDNAs are synthesized separately using reverse transcriptase and the gene-specific reverse transcription primers, specifically as follows:
(1) The following mixture is prepared:
Ribosomal-depleted RNA 8 μl
10 mM dNTP mix 1 μl
2 μM gene-specific primer 1 μl
incubating at 65℃ for 5 minutes, followed by cooling on ice for at least 1 minute;
(2) The following cDNA synthesis mixture is prepared:
(3) μl of cDNA synthesis mixture is added to the mixture of RNA and gene-specific reverse transcription primer, mixed gently, collected at the bottom of the tube by brief centrifugation, incubated at 50℃ for 50 minutes, incubated at 85℃ for 5 minutes to terminate the reaction, and cooled on ice to obtain first-strand cDNA, which is then purified with magnetic beads.
4. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 3, wherein the 5'-end and 3'-end two-strand cDNAs are synthesized separately using DNA polymerase I, specifically as follows:
(1) The following mixture is prepared:
First-strand cDNA 40.5 μl
10X second strand buffer 5 μl
10 mM dNTP mix 1.5 μl
mixing well and incubating on ice for 5 minutes;
(2) The following enzymes are added:
DNA polymerase I, E. coli, 10 U/μl 2.5 μl
RNase H, 2 U/μl 0.5 μl
mixing well and incubating at 15℃ for 2.5 hours to obtain two-strand cDNA;
wherein the 10X second strand buffer consists of 500 mM Tris-HCl (pH 7.8), 50 mM MgCl2 and 10 mM DTT; the buffer is filtered before DTT is added, and 0.1 M DTT is then added to reach the final composition.
5. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 4, wherein the 5'-end and 3'-end two-strand cDNAs are combined and ultrasonically disrupted to a fragment size of 200 bp.
6. The method of claim 5, wherein constructing the library comprises the steps of:
(1) End repair and A-tailing reaction, with the following reaction system:
Fragmented two-strand cDNA 25 μl
End Repair & A-Tailing Buffer 3.5 μl
End Repair & A-Tailing Enzyme Mix 1.5 μl;
vortexing, placing on ice, immediately transferring to a PCR instrument, incubating at 20℃ for 30 min and at 65℃ for 30 min, and finally holding at 4℃ to obtain the end repair and A-tailing product;
(2) Adapter ligation
adding the adapter ligation reaction solution to the end repair and A-tailing product, with the following reaction system:
DNA Ligase 5 μl;
mixing thoroughly, briefly centrifuging, and incubating at 20℃ for 15 minutes to obtain the adapter-ligated product;
(3) Post-ligation purification;
(4) Library amplification, with the following amplification system:
2X KAPA HiFi HotStart ReadyMix 20 μl
10X KAPA Library Amplification Primer Mix 4 μl
Adapter-ligated library 16 μl;
mixing thoroughly, briefly centrifuging, and performing PCR according to the following program: pre-denaturation: 98℃ for 1 min; 11 cycles of denaturation: 98℃ for 15 sec, annealing: 60℃ for 30 sec, extension: 72℃ for 30 sec; final extension: 72℃ for 5 min, to obtain a PCR product.
7. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 6, wherein the library is constructed and then subjected to E-gel electrophoresis, and a 200bp-700bp fragment is selected for gel recovery, so that library selection is completed.
8. The method of claim 7, wherein the library is sequenced using a Hiseq-PE150 sequencing strategy.
9. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 8, wherein the bioinformatic analysis by which the full-length sequence of the unknown RNA is finally extracted comprises the following steps:
(1) Quality control: including library quality analysis and removal of adapter sequences;
(2) Sequence alignment: aligning the off-machine data using Bowtie2 version 2.1.0 and Samtools 0.1.19-44428cd;
(3) BLAT check of the alignment results: checking the alignment results of the off-machine data with the BLAT tool on the UCSC platform, visualizing them, and finally extracting the full-length sequence of the unknown RNA.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010398919.6A CN111635930B (en) | 2020-05-12 | 2020-05-12 | Method for extracting unknown RNA full-length sequence by high-throughput sequencing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111635930A (en) | 2020-09-08 |
CN111635930B (en) | 2023-10-24 |
Family
ID=72327920
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010398919.6A Active CN111635930B (en) | 2020-05-12 | 2020-05-12 | Method for extracting unknown RNA full-length sequence by high-throughput sequencing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111635930B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004073118A (en) * | 2002-08-20 | 2004-03-11 | Hubit Genomix Inc | Method for synthesizing polynucleotide |
CN1578841A (en) * | 2001-12-08 | 2005-02-09 | 视基因公司 | Annealing control primer and the use of the same annealing control primer |
CN102181527A (en) * | 2011-03-16 | 2011-09-14 | 中山大学 | Construction method of terminal gene library of full genome mRNA3' |
CN102559664A (en) * | 2012-02-22 | 2012-07-11 | 长春理工大学 | Method for cloning cDNA full length of bacterial gene |
CN104845968A (en) * | 2015-03-27 | 2015-08-19 | 武汉华美生物工程有限公司 | Long-chain non-coding RNA cloning method |
CN105779439A (en) * | 2016-04-19 | 2016-07-20 | 武汉生命之美科技有限公司 | Library construction method for RNA 5'-terminal information acquired through low-initial-dose high-throughput sequencing analysis transcription |
CN106757380A (en) * | 2017-01-20 | 2017-05-31 | 深圳大学 | A kind of method for building pre miRNA3`RACE seq libraries in plant |
CN109137086A (en) * | 2018-10-16 | 2019-01-04 | 梁耀极 | A kind of banking process of the full length mRNA sequencing of improvement |
CN110283883A (en) * | 2019-05-08 | 2019-09-27 | 湖南农业大学 | A kind of primer and method for unknown RNA mycoviruses genomic clone |
WO2020025599A1 (en) * | 2018-07-30 | 2020-02-06 | Gmi - Gregor Mendel Institut Für Molekulare Pflanzenbiologie Gmbh | Parallel analysis of rna 5' ends from low-input rna |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2001238106A1 (en) * | 2000-02-10 | 2001-08-20 | The Penn State Research Foundation | Method for amplifying full length single strand polynucleotide sequences |
US7504240B2 (en) * | 2005-03-10 | 2009-03-17 | Masanori Hirano | Methods for synthesizing polynucleotides using a single primer |
Non-Patent Citations (7)
Title |
---|
Dongwei Li et al. Baiting out a full length sequence from unmapped RNA-seq data. BMC Genomics. 2021, vol. 22 (no. 1), p. 857. * |
Julien Lagarde et al. Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq). Nat Commun. 2016, pp. 1-11. * |
M A Frohman et al. Rapid production of full-length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci U S A. 1988, vol. 85 (no. 23), pp. 8998-9002. * |
Masanori Hirano. RACE using only a gene-specific primer: application of a template-switching model. Mol Biotechnol. 2004, vol. 27 (no. 3), pp. 179-186. * |
Neil I Bower et al. Targeted rapid amplification of cDNA ends (T-RACE)--an improved RACE reaction through degradation of non-target sequences. Nucleic Acids Res. 2010, vol. 38 (no. 21), p. e194. * |
Ryuji J Machida et al. Four methods of preparing mRNA 5' end libraries using the Illumina sequencing platform. PLoS One. 2014, vol. 9 (no. 7), p. e101812. * |
Signe Olivarius et al. High-throughput verification of transcriptional starting sites by Deep-RACE. Biotechniques. 2009, vol. 46 (no. 2), pp. 130-132. * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10400279B2 (en) | Method for constructing a sequencing library based on a single-stranded DNA molecule and application thereof | |
JP6438126B2 (en) | Method and reagent kit for constructing nucleic acid single-stranded circular library | |
CN102971434A (en) | High-throughput sequencing method for methylated DNA and use thereof | |
JP4644685B2 (en) | Preparation method of base sequence tag | |
CN111379031B (en) | Nucleic acid library construction method, nucleic acid library obtained by the method and use thereof | |
CN111808854B (en) | Balanced joint with molecular bar code and method for quickly constructing transcriptome library | |
JP2001512474A (en) | In vitro peptide or protein expression library | |
WO2012028105A1 (en) | Sequencing library and its preparation method thereof, terminal nucleic acid sequence determining method and system | |
US20210214783A1 (en) | Method for constructing sequencing library, obtained sequencing library and sequencing method | |
CN113466444A (en) | Chromatin conformation capture method | |
CN109750086B (en) | Method for constructing single-chain circular library | |
CN111549380B (en) | Kit for constructing double-stranded RNA sequencing library and application thereof | |
CN113668068A (en) | Genome methylation library and preparation method and application thereof | |
CN113308514A (en) | Construction method and kit for detection library of trace m6A and high-throughput detection method | |
CN110387400B (en) | Parallel liquid phase hybridization capture method for simultaneously capturing positive and negative sense double chains of genome target region | |
CN111635930B (en) | Method for extracting unknown RNA full-length sequence by high-throughput sequencing | |
CN111118126B (en) | mRNA detection method based on high-throughput sequencing | |
CN116024324B (en) | Method for detecting off-target of gene editing cell | |
CN108342385A (en) | A kind of connector and the method that sequencing library is built by way of high efficiency cyclisation | |
EP1195434A1 (en) | METHOD FOR CONSTRUCTING FULL-LENGTH cDNA LIBRARIES | |
CN108624709A (en) | A kind of universal primer and detection method detecting destination gene expression in genetically modified plants | |
CN113046353B (en) | Differential screening deoxyribozyme probe for specifically inducing triple negative breast cancer | |
WO2020259303A1 (en) | Method for rapid construction of rna 3'-end gene expression library | |
CN110387399B (en) | Method for linearly amplifying double-stranded DNA and application | |
JP4403069B2 (en) | Methods for using the 5 'end of mRNA for cloning and analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||