CN111635930B

CN111635930B - Method for extracting unknown RNA full-length sequence by high-throughput sequencing

Info

Publication number: CN111635930B
Application number: CN202010398919.6A
Authority: CN
Inventors: 张玉波; 李东卫; 黄其通; 黄雷; 李清
Original assignee: Agricultural Genomics Institute at Shenzhen of CAAS
Current assignee: Agricultural Genomics Institute at Shenzhen of CAAS
Priority date: 2020-05-12
Filing date: 2020-05-12
Publication date: 2023-10-24
Anticipated expiration: 2040-05-12
Also published as: CN111635930A

Abstract

The invention discloses a method for extracting the full-length sequence of an unknown RNA by high-throughput sequencing, which comprises the following steps: designing gene-specific reverse transcription primers for the 5' end and the 3' end of the unknown RNA according to the sequence of the novel short fragment obtained by RNA-seq; synthesizing 5'-end and 3'-end first-strand cDNA separately using reverse transcriptase and the gene-specific reverse transcription primers; synthesizing 5'-end and 3'-end second-strand cDNA separately using DNA polymerase I; pooling the second-strand cDNAs of the two ends; constructing a library after ultrasonic fragmentation and sequencing it; and finally obtaining the full-length sequence of the unknown RNA through bioinformatics analysis. The invention can retrieve the sequences of both the 5' end and the 3' end of the unknown RNA in a single experiment, and is rapid, flexible, highly sensitive, and low in experimental cost.

Description

Method for extracting unknown RNA full-length sequence by high-throughput sequencing

Technical Field

The invention relates to the field of gene detection, in particular to a method for extracting an unknown RNA full-length sequence by high-throughput sequencing.

Background

Retrieving the full-length sequence of an RNA is essential for the study of unknown RNAs. There are many methods for extracting the full-length sequence of unknown RNA. The most typical is rapid amplification of cDNA ends (RACE), developed by Michael A. Frohman et al., in which an adaptor linked to oligo(dT) is introduced at the 3' end of the mRNA (3' RACE) or at a poly(A) tail added to the 3' end of the cDNA by terminal deoxynucleotidyl transferase (TdT) (5' RACE), and the cDNA ends are PCR-amplified with gene-specific primers and adaptor primers and then cloned and sequenced. Several RACE-based methods for amplifying cDNA ends have been derived from it. In one 5' RACE method, a poly(dC) tail is added to the 3' end of the cDNA, and deoxyinosine is incorporated into the poly(dG) linked to the anchor primer in order to increase the stability with which the anchor primer binds the cDNA template and the specificity of the primer. In another, an adaptor is ligated to the 5' end of the RNA before reverse transcription, and the 5' end of the cDNA is amplified using an adaptor primer and a gene-specific primer.

For unknown RNAs whose newly obtained fragment is short or highly homologous to other sequences, it is difficult to design suitable gene-specific reverse transcription primers, so RACE and its derivative methods are not well suited to retrieving the full length. RACE and its derivative methods place very stringent requirements on the primers, and unsuitable primers may cause the experiment to fail. 5' RACE and its derivative methods involve designing a gene-specific reverse transcription primer, introducing an adaptor containing an anchor primer at the tail of the cDNA after reverse transcription, performing PCR with a gene-specific amplification primer and the anchor primer to produce a target sequence carrying restriction enzyme sites, and then cloning and sequencing. The RACE methods in common use today additionally require nested PCR, with nested PCR primers designed between the gene-specific amplification primer and the anchor primer. In summary, RACE and its derivative methods require several pairs of primers, and the more primers are used, the greater their influence on the experimental result.

RACE and its derivative methods retrieve the 5'-end sequence and the 3'-end sequence of an RNA separately, so the experimental period is long, the operation is complex, and the experimental cost is high.

Accordingly, there is a need for further development and advancement in the art.

Disclosure of Invention

In view of the above technical problems, embodiments of the invention provide a method for extracting the full-length sequence of unknown RNA by high-throughput sequencing.

The technical scheme of the invention is as follows:

A method for extracting the full-length sequence of unknown RNA by high-throughput sequencing comprises the following steps: designing gene-specific reverse transcription primers for the 5' end and the 3' end of the unknown RNA according to the sequence of the novel short fragment obtained by RNA-seq; synthesizing 5'-end and 3'-end first-strand cDNA separately using reverse transcriptase and the gene-specific reverse transcription primers; synthesizing 5'-end and 3'-end second-strand cDNA separately using DNA polymerase I; pooling the two second-strand cDNAs; performing ultrasonic fragmentation, library construction, and sequencing; and extracting the full-length sequence of the unknown RNA.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the 5'-end gene-specific reverse transcription primer (GSP1) is reverse-complementary to the RNA sequence of the novel short fragment obtained by RNA-seq, the 3'-end gene-specific reverse transcription primer (GSP2) is complementary in the same direction (homodromously complementary) to the RNA sequence of the novel short fragment obtained by RNA-seq, and the fragments amplified from the two gene-specific reverse transcription primers have an overlapping region.

The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing further comprises total RNA extraction and removal of ribosomal RNA.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the 5'-end and 3'-end first-strand cDNAs are synthesized separately using reverse transcriptase and the gene-specific reverse transcription primers, with the following specific steps:

(1) The following mixtures were prepared:

The mixture was incubated at 65 ℃ for 5 minutes and then cooled on ice for at least 1 minute.

(2) The following cDNA synthesis mixtures were prepared:

(3) μl of the cDNA synthesis mixture was added to the mixture of RNA and gene-specific reverse transcription primer, mixed gently, and briefly centrifuged to collect the liquid at the bottom of the tube. The reaction was incubated at 50 ℃ for 50 minutes, terminated by incubation at 85 ℃ for 5 minutes, and cooled on ice to give first-strand cDNA, which was then purified with magnetic beads.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the 5'-end and 3'-end second-strand cDNAs are synthesized separately using DNA polymerase I, with the following specific steps:

(1) The following mixtures were prepared:

The mixture was mixed well and incubated on ice for 5 minutes.

(2) The following enzymes were added:

The reaction was mixed and incubated at 15 ℃ for 2.5 hours to give second-strand cDNA.

The 10× second strand buffer consists of 500 mM Tris-HCl (pH 7.8), 50 mM MgCl₂, and 10 mM DTT. The buffer was filtered before DTT was added, and 0.1 M DTT (Invitrogen) was then added to give the final composition.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the 5'-end and 3'-end second-strand cDNAs are pooled and then fragmented by sonication, with a fragment size of 200 bp.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, library construction comprises the following steps:

(1) End repair and A-tailing, with the following reaction system:

The mixture was vortexed, placed on ice, and immediately transferred to a PCR instrument for incubation (20 ℃ for 30 min; 65 ℃ for 30 min; hold at 4 ℃) to give the end repair and A-tailing product.

(2) Adaptor ligation

The adaptor ligation reaction solution was added to the end repair and A-tailing product, with the following reaction system:

The reaction was mixed thoroughly, briefly centrifuged, and incubated at 20 ℃ for 15 minutes to give the adaptor-ligated product.

(3) Post-ligation purification.

(4) Library amplification, with the following amplification system:

The reaction was mixed, briefly centrifuged, and subjected to PCR according to the following program: pre-denaturation at 98 ℃ for 1 min; 11 cycles (denaturation at 98 ℃ for 15 sec; annealing at 60 ℃ for 30 sec; extension at 72 ℃ for 30 sec); final extension at 72 ℃ for 5 min, giving the PCR product.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, E-gel electrophoresis is performed after library construction, and fragments of 200 bp-700 bp are selected for gel recovery to complete library size selection.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the library is sequenced using the HiSeq PE150 sequencing strategy.

In the method for extracting the full-length sequence of unknown RNA by high-throughput sequencing, the bioinformatics analysis comprises the following steps:

(1) Quality control: library quality analysis and removal of adaptor sequences.

(2) Sequence alignment: the raw sequencing data were aligned using Bowtie2 version 2.1.0 and SAMtools 0.1.19-44428cd.

(3) Checking the alignment result with BLAT: the alignment result of the raw sequencing data was checked and visualized with the BLAT tool on the UCSC platform, and the full-length sequence of the unknown RNA was finally retrieved.
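The alignment step can be sketched as a short command plan. The Python snippet below only builds the command lines and does not run them. The file names, the index name `mm9`, and the thread count are illustrative assumptions that do not come from this document, and the `samtools sort <in.bam> <out.prefix>` form is the legacy syntax of the SAMtools 0.1.x releases.

```python
from typing import List


def alignment_commands(index: str, reads1: str, reads2: str,
                       prefix: str, threads: int = 4) -> List[List[str]]:
    """Build (but do not execute) Bowtie2 and SAMtools 0.1.x command lines.

    All arguments are hypothetical placeholders chosen for illustration.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    return [
        # Paired-end alignment of the two read files against a Bowtie2 index.
        ["bowtie2", "-p", str(threads), "-x", index,
         "-1", reads1, "-2", reads2, "-S", sam],
        # Convert the SAM output to BAM.
        ["samtools", "view", "-bS", "-o", bam, sam],
        # Legacy 0.1.x sort syntax: input BAM followed by an output prefix.
        ["samtools", "sort", bam, f"{prefix}.sorted"],
        # Index the sorted BAM so that individual regions can be inspected.
        ["samtools", "index", f"{prefix}.sorted.bam"],
    ]


if __name__ == "__main__":
    for cmd in alignment_commands("mm9", "reads_1.fastq", "reads_2.fastq", "unknown_rna"):
        print(" ".join(cmd))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later without shell quoting issues.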

Advantageous effects

The invention provides a method for extracting an unknown RNA full-length sequence by high-throughput sequencing, which has the following advantages:

First, the method is highly sensitive. The experimental result (FIG. 2) shows that the full-length sequence of the unknown RNA can ultimately be obtained even when homologous non-target sequences interfere. This is an advantage when retrieving the full-length sequence of a highly homologous unknown RNA: if the unknown RNA is highly homologous to other sequences, reverse transcription with gene-specific primers may produce many different fragments, which raises the false positive rate and increases the time and cost of screening for positive fragments with RACE and its derivative methods. With the present method, even if the unknown RNA is highly homologous, the target sequence can be detected as long as it has been amplified, which greatly improves the detection sensitivity and the success rate of the experiment.

In addition, primer design is highly flexible. In principle, the method can be used as long as the gene-specific reverse transcription primers for the 5' end and the 3' end are complementary to the RNA sequence of the novel short fragment obtained by RNA-seq, are correctly oriented, and yield amplified fragments that overlap.
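Why the overlapping region matters can be illustrated with a toy example. The Python sketch below joins two sequences by their longest suffix/prefix overlap. It is a simplified illustration only, not the bioinformatics procedure of this method (which aligns reads with Bowtie2 and checks them with BLAT); the function name, the `min_overlap` parameter, and the short sequences are invented for the example.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 10) -> Optional[str]:
    """Join two sequences on the longest suffix of `left` that equals a prefix of `right`.

    Returns None when no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(longest_possible, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


if __name__ == "__main__":
    five_prime_part = "AAACCCGGGTTT"   # hypothetical sequence extending to the 5' end
    three_prime_part = "GGGTTTACGT"    # hypothetical sequence extending to the 3' end
    print(merge_by_overlap(five_prime_part, three_prime_part, min_overlap=4))
```

Without a shared stretch of sequence the function returns `None`, which mirrors the requirement that the fragments from the two primers must overlap before the two ends can be joined into one full-length sequence.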

The experiment requires only one pair of gene-specific reverse transcription primers (for the 5' end and the 3' end) and one pair of primers for confirming that the 5'-end and 3'-end second-strand cDNAs contain the novel short fragment obtained by RNA-seq. The clone-sequencing step that performs this confirmation is an optional quality control step and does not determine the experimental result. The method therefore uses fewer primers, which reduces the influence of the primers on the experimental result.

Because the direction of reverse transcription depends on the direction of the primer, it is independent of the direction of the RNA template. Therefore, although the 5'-end and 3'-end cDNAs are synthesized separately before sonication, the two procedures are identical apart from the primers, and the 5'-end and 3'-end second-strand cDNAs are pooled at the sonication step, so that the full-length sequence of the unknown RNA is retrieved through a single subsequent workflow. This simplifies the experimental steps and saves time. As sequencing throughput increases and sequencing cost falls, the method can generally reduce the experimental cost.

Drawings

FIG. 1 is a schematic diagram of a method for extracting an unknown RNA full-length sequence by high-throughput sequencing.

FIG. 2 is a PAGE gel electrophoresis image of the PCR products used for clone-sequencing identification of whether the 5'-end and 3'-end second-strand cDNAs contain the novel short fragment obtained by RNA-seq in an embodiment of the present invention.

FIG. 3 shows the E-gel electrophoresis and the gel recovery position for library selection in an embodiment of the present invention.

FIG. 4 shows basic information on the raw sequencing data in an embodiment of the present invention.

FIG. 5 shows the quality values of all bases in the fastq files of the raw sequencing data in an embodiment of the present invention (a: quality values of all bases of reads 1; b: quality values of all bases of reads 2).

FIG. 6 shows the 3'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention (underlined portion), as checked with the BLAT tool on the UCSC platform.

FIG. 7 shows the 5'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention (underlined portion), as checked with the BLAT tool on the UCSC platform.

FIG. 8 shows the full-length cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention (underlined portion), as checked with the BLAT tool on the UCSC platform.

FIG. 9 is a PAGE gel electrophoresis image for the clone-sequencing identification of the undetected sequence in the full-length cDNA sequence of the unknown RNA obtained by high-throughput sequencing in an embodiment of the present invention; M is a 50 bp DNA ladder, and lane 1 is the PCR product used for clone-sequencing identification of the undetected sequence.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. The term "and/or" as used in this specification includes any and all combinations of one or more of the associated listed items. In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The method is designed according to the experimental principle shown in FIG. 1, and the specific experimental steps are as follows:

1. Primer design

(1) Design of the 5'-end and 3'-end gene-specific reverse transcription primers

Because the direction of reverse transcription depends on the direction of the primer, it is independent of the direction of the RNA template. The gene-specific reverse transcription primers for the 5' end and the 3' end of the unknown RNA are therefore designed as follows: the 5'-end gene-specific reverse transcription primer (GSP1) is reverse-complementary to the RNA sequence of the novel short fragment obtained by RNA-seq, the 3'-end gene-specific reverse transcription primer (GSP2) is complementary in the same direction (homodromously complementary) to the RNA sequence of the novel short fragment obtained by RNA-seq, and the fragments amplified from the two reverse transcription primers have an overlapping region. Primer3 is used to design the gene-specific reverse transcription primers, and the primers themselves must satisfy general primer design principles such as GC content, secondary structure, and Tm value.

In a specific example, the novel short fragment obtained by RNA-seq (chr13:64,787,049-64,787,219, NCBI37/mm9) came from RNA-seq data of a mouse embryonic stem cell (mES) line established in the laboratory in which a distal enhancer (designated E3) that interacts with the Sox2 promoter had been knocked out. Its specific sequence (SEQ ID NO. 1) is:

CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCA

The corresponding RNA sequence of the novel short fragment obtained by RNA-seq (SEQ ID NO. 2) is:

UGGGUCUUUUUGUUCUUCCAGCAUGAGAUUCCUUAUAGAAUUCUAAUUCCUAGAAGUGCAUUGCCUAACAGCCUGAUAGCUGGAUGGGGGGAAGCACAAAUGUCUCAGAUAGUACAUCAAGAACACAGUUCCCUGAAUUAUGGCACCAG

Accordingly, the 5'-end gene-specific reverse transcription primer (GSP1) sequence (SEQ ID NO. 3) was designed as CTGGTGCCATAATTCAGGGA, and the 3'-end gene-specific reverse transcription primer (GSP2) sequence (SEQ ID NO. 4) as GGATCTTCACGTAACGGATTGT.

(2) Primers for clone-sequencing identification of whether the 5'-end and 3'-end second-strand cDNAs contain the novel short fragment obtained by RNA-seq

The upstream primer sequence (SEQ ID NO. 5) is CTGGTGCCATAATTCAGGGA, the downstream primer sequence (SEQ ID NO. 6) is CCTAGAAGTGCATTGCCTAACA, and the product size is 101 bp.
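The relationships among these sequences can be checked directly. The following Python sketch uses only the sequences listed in this example and confirms that SEQ ID NO. 2 is the reverse complement of SEQ ID NO. 1 written as RNA, that GSP1 is reverse-complementary to the 3' end of that RNA, that GSP2 is the base-by-base complement of an internal stretch of the RNA read in the same direction, and that the two identification primers span 101 bp. The helper and variable names are illustrative and not part of the method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Base-by-base complement, written in the same direction as the input."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complement of the input, written in the opposite direction."""
    return complement(dna)[::-1]


# SEQ ID NO. 1 (DNA) and SEQ ID NO. 2 (RNA), split into pieces for readability.
SEQ1_DNA = ("CTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACAT"
            "TTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAG"
            "GAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCA")
SEQ2_RNA = ("UGGGUCUUUUUGUUCUUCCAGCAUGAGAUUCCUUAUAGAAUUCUAAUUC"
            "CUAGAAGUGCAUUGCCUAACAGCCUGAUAGCUGGAUGGGGGGAAGCACAA"
            "AUGUCUCAGAUAGUACAUCAAGAACACAGUUCCCUGAAUUAUGGCACCAG")
GSP1 = "CTGGTGCCATAATTCAGGGA"         # SEQ ID NO. 3
GSP2 = "GGATCTTCACGTAACGGATTGT"       # SEQ ID NO. 4
UPSTREAM = "CTGGTGCCATAATTCAGGGA"     # SEQ ID NO. 5
DOWNSTREAM = "CCTAGAAGTGCATTGCCTAACA" # SEQ ID NO. 6

rna_as_dna = SEQ2_RNA.replace("U", "T")

# SEQ ID NO. 2 is the reverse complement of SEQ ID NO. 1, written as RNA.
assert reverse_complement(SEQ1_DNA) == rna_as_dna

# GSP1 is reverse-complementary to the 3' end of the RNA fragment.
assert rna_as_dna.endswith(reverse_complement(GSP1))

# GSP2 is the same-direction complement of an internal stretch of the RNA.
assert complement(GSP2) in rna_as_dna

# The upstream primer sits at the start of SEQ ID NO. 1; the downstream primer's
# binding site is located through its reverse complement.
start = SEQ1_DNA.find(UPSTREAM)
end = SEQ1_DNA.find(reverse_complement(DOWNSTREAM)) + len(DOWNSTREAM)
amplicon_length = end - start
print(amplicon_length)  # 101
```

The computed amplicon length agrees with the 101 bp product size stated for the identification primers.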

2. Total RNA extraction

(1) The cells were from a mouse embryonic stem cell (mES) line established in the laboratory in which a distal enhancer (designated E3) that interacts with the Sox2 promoter had been knocked out. When the mES clones in a 60 mm culture dish had grown to an average size of 200-400 μm, the medium was removed, 2 ml of Trizol was added directly to the cells in the dish, and the dish was gently swirled so that the Trizol came into full contact with all the cells, lysing the cells and inactivating RNases. After 5 minutes at room temperature, the cells were lysed to a homogeneous state; the lysate was gently mixed with a pipette and transferred to an RNase-free EP tube.

(2) μl of chloroform was added to 500 μl of the lysed cells; the tube was capped, shaken vigorously for 10-15 seconds, left at room temperature for 2 minutes, and centrifuged at 12,000 g for 15 minutes at 4 ℃.

(3) The upper aqueous phase after centrifugation was transferred to a new RNase-free EP tube, taking care not to aspirate the interphase. 250 μl of isopropanol was added and the tube was gently inverted to mix; at this point the sample was slightly cloudy. It was left at room temperature for 10 minutes and centrifuged at 12,000 g for 10 minutes at 4 ℃.

(4) After centrifugation, a small milky-white pellet was visible at the bottom of the tube. The supernatant was discarded, 500 μl of 75% (v/v) ethanol was added, and the tube was centrifuged at 7,500 g for 5 minutes at 4 ℃.

(5) Step (4) was repeated.

(6) The supernatant was discarded, and the pellet was air-dried for 5 minutes, dissolved in 60 μl of Nuclease-free H2O, and heated in a metal bath at 55 ℃ for 10 minutes. The RNA was used directly or stored at -80 ℃ until use.

3. Ribosomal RNA removal (Nanjing Vazyme Biotech Co., Ltd.; NR603)

3.1 Experimental procedure:

3.1.1 Preparation of the total RNA sample:

In a Nuclease-free centrifuge tube, 1 μg of total RNA was diluted to 11 μl with Nuclease-free H2O and placed on ice until use.

3.1.2 Hybridization of the RNA sample with the probe:

(1) The following reaction solution was prepared in a Nuclease-free microcentrifuge tube:

The mixture was mixed thoroughly by gently pipetting 10 times.

(2) The sample was collected at the bottom of the tube by brief centrifugation and placed in a PCR instrument for the probe hybridization reaction:

3.1.3 RNase H digestion:

(1) The following reaction solution was prepared on ice:

The mixture was mixed thoroughly by gently pipetting 10 times.

(2) The sample was placed in a PCR instrument for RNase H digestion:

3.1.4 DNase I digestion:

(1) The following reaction solution was prepared on ice:

The mixture was mixed thoroughly by gently pipetting 10 times.

(2) The sample was placed in a PCR instrument for DNase I digestion:

The sample was collected at the bottom of the tube by brief centrifugation, placed on ice, and taken immediately into the next step.

3.1.5 Purification of the ribosomal-depleted RNA using VAHTS RNA Clean Beads (Nanjing Vazyme Biotech Co., Ltd.; N412-01).

(1) The VAHTS RNA Clean Beads were mixed by vortexing, and 110 μl (2.2×) was added to the RNA sample from the previous step and mixed thoroughly by pipetting 10 times.

(2) The mixture was allowed to stand on ice for 15 minutes to bind the RNA to the beads.

(3) The sample was placed on a magnetic rack for 5 minutes, and after the solution had cleared, the supernatant was carefully removed.

(4) With the sample kept on the magnetic rack, the beads were rinsed with 200 μl of 80% ethanol freshly prepared with Nuclease-free H2O, and after incubation at room temperature for 30 seconds, the supernatant was carefully removed.

(5) Step (4) was repeated once.

(6) With the sample still on the magnetic rack, the tube was opened and the beads were air-dried at room temperature for 5-10 minutes, or until the ethanol had completely evaporated.

(7) The sample was removed from the magnetic rack, 10 μl of Nuclease-free H2O was added, and the mixture was mixed thoroughly by pipetting 6 times. It was allowed to stand at room temperature for 2 minutes and then on the magnetic rack for 5 minutes, and after the solution had cleared, 8 μl of the supernatant was carefully transferred to a new Nuclease-free centrifuge tube.

4. First-strand cDNA synthesis (Invitrogen; 18080-051): first-strand cDNA was synthesized separately for the 5' end and the 3' end, as follows:

(1) The following mixtures were prepared:

(2) The following cDNA synthesis mixtures were prepared:

(3) μl of the cDNA synthesis mixture was added to the mixture of RNA and gene-specific reverse transcription primer, mixed gently, and briefly centrifuged to collect the liquid at the bottom of the tube. The reaction was incubated at 50 ℃ for 50 minutes, terminated by incubation at 85 ℃ for 5 minutes, and cooled on ice to give first-strand cDNA, which was then purified with magnetic beads.

5. Purification of the 5'-end and 3'-end first-strand cDNAs separately using Agencourt AMPure XP magnetic beads (Beckman Coulter; A63881), as follows:

(1) 24 μl (1.2×) of AMPure XP magnetic beads was added to the first-strand cDNA and mixed thoroughly by vortexing or by pipetting up and down several times.

(2) The reaction mixture was allowed to stand at room temperature for 15 minutes to bind the cDNA to the beads.

(3) The sample containing the magnetic beads was placed on a magnetic rack and allowed to stand for 5 minutes, or until the solution became clear.

(4) The supernatant was removed with a pipette, and the beads were washed twice with 200 μl of freshly prepared 80% EtOH (ethanol, 85% by volume). The sample tube was kept on the magnetic rack throughout the washes, and each wash was incubated at room temperature for at least 30 seconds.

(5) After the second wash, the sample tube was spun briefly and returned to the magnetic rack, residual EtOH was removed with a 10 μl pipette, and the beads were air-dried with the lid open at room temperature for 5-10 minutes, or until the ethanol had completely evaporated.

(6) The sample was removed from the magnetic rack, and 42.5 μl of Nuclease-free H2O was added and mixed by pipetting or brief vortexing. The sample was incubated at room temperature for 2 minutes to elute the first-strand cDNA from the beads.

(7) The sample tube was returned to the magnetic rack for 5 minutes, until the solution was clear, and 40.5 μl of the supernatant was carefully transferred to a new Nuclease-free PCR tube.
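The bead volumes quoted in the purification steps follow from a bead-to-sample volume ratio. As a rough sketch (the function names are illustrative, and the sample volumes printed below are inferred from the stated bead volumes and ratios rather than given in the text):

```python
def bead_volume_ul(sample_ul: float, ratio: float) -> float:
    """Volume of bead suspension for a given sample volume and bead ratio."""
    return sample_ul * ratio


def implied_sample_ul(bead_ul: float, ratio: float) -> float:
    """Sample volume implied by a stated bead volume and bead ratio."""
    return bead_ul / ratio


# Bead volumes (in microlitres) and ratios stated in the protocol.
stated = {
    "rRNA-depleted RNA clean-up": (110.0, 2.2),
    "first-strand cDNA clean-up": (24.0, 1.2),
    "second-strand cDNA clean-up": (66.0, 1.2),
}

if __name__ == "__main__":
    for step, (bead_ul, ratio) in stated.items():
        sample_ul = implied_sample_ul(bead_ul, ratio)
        print(f"{step}: {bead_ul} ul of beads at {ratio}x implies about {sample_ul:.1f} ul of sample")
```

The same helper gives the bead volume for any other sample volume at a chosen ratio, for example `bead_volume_ul(20, 1.2)` for a 20 μl sample at 1.2×.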

6. Second-strand cDNA synthesis: second-strand synthesis was performed separately on the 5'-end and 3'-end cDNAs, as follows:

(1) The following mixtures were prepared:

The mixture was mixed well and incubated on ice for 5 minutes.

(2) The following enzymes were added:

The reaction was mixed well and incubated at 15 ℃ for 2.5 hours. The temperature must not exceed 15 ℃, and a heated lid must not be used.

The 10× second strand buffer consists of 500 mM Tris-HCl (pH 7.8), 50 mM MgCl₂, and 10 mM DTT. The buffer was filtered before DTT was added, and 0.1 M DTT was then added to give the final composition.
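The buffer recipe involves two simple dilutions, which can be worked through with C1·V1 = C2·V2. The sketch below assumes that the 10× buffer is used at 1× in the reaction (implied by its name but not stated explicitly) and uses a 1000 μl batch of 10× buffer purely as a hypothetical example.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock needed so that C1 * V1 = C2 * V2 (same units throughout)."""
    return final_conc * final_volume / stock_conc


# Composition of the 10x second strand buffer in mM, as given in the text.
buffer_10x_mM = {"Tris-HCl": 500.0, "MgCl2": 50.0, "DTT": 10.0}

# Working concentrations if the buffer is diluted tenfold (assumption).
buffer_1x_mM = {name: conc / 10 for name, conc in buffer_10x_mM.items()}

# Volume of 0.1 M (100 mM) DTT stock needed to reach 10 mM DTT
# in a hypothetical 1000 ul batch of 10x buffer.
dtt_stock_ul = stock_volume(final_conc=10.0, final_volume=1000.0, stock_conc=100.0)

if __name__ == "__main__":
    print(buffer_1x_mM)
    print(dtt_stock_ul)
```

Under these assumptions the DTT stock makes up one tenth of the final buffer volume, so the other components need to be prepared correspondingly more concentrated before the DTT is added.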

7. RNA removal from the 5'-end and 3'-end second-strand cDNAs separately

μl of RNase A (Thermo; #EN0531) was added to each of the 5'-end and 3'-end second-strand cDNAs, and the reaction was incubated at 37 ℃ for 1 hour to digest the RNA.

8. Purification of the 5'-end and 3'-end second-strand cDNAs separately using Agencourt AMPure XP magnetic beads, as follows:

(1) 66 μl (1.2×) of AMPure XP magnetic beads was added to the second-strand cDNA and mixed thoroughly by vortexing or by pipetting up and down several times.

(2) The mixture was allowed to stand at room temperature for 15 minutes to bind the second-strand cDNA to the magnetic beads.

(4) The supernatant was removed with a pipette, and the beads were washed twice with 500 μl of freshly prepared 80% EtOH (ethanol, volume fraction 85%). The sample tube was kept on the magnetic rack throughout the washes, and each wash was incubated at room temperature for at least 30 seconds.

(5) After the second wash, the sample tube was spun briefly and returned to the magnetic rack, residual EtOH was removed with a 10 μl pipette, and the beads were air-dried at room temperature for 5-10 minutes, or until the ethanol had completely evaporated.

(6) 62 μl of TE Buffer was added and mixed by pipetting or brief vortexing. The sample was incubated at room temperature for 2 minutes to elute the second-strand cDNA from the magnetic beads.

(7) The sample tube was returned to the magnetic rack for 5 minutes, until the solution was clear. μl of the supernatant was carefully transferred to a new Nuclease-free PCR tube.

9. 8 μl each of the purified 5'-end and 3'-end second-strand cDNAs was used for clone sequencing to identify whether the second-strand cDNA contains the novel short fragment obtained by RNA-seq. The PCR result is shown in FIG. 2 and indicates that homologous non-target sequences affected the experiment. The clone sequencing result showed that the second-strand cDNAs of both ends contained the target fragment. This step is optional: if, for example, PCR primers are difficult to design because the novel fragment obtained by RNA-seq is too short, the clone-sequencing identification can be omitted and step 10 performed directly.

9.1 Three PCR replicates were performed for each of the 5'-end and 3'-end second-strand cDNAs, with the following reagents and conditions:

Pre-denaturation: 98 ℃ for 5 min; 35 cycles (denaturation: 98 ℃ for 10 sec; annealing: 55 ℃ for 30 sec; extension: 72 ℃ for 1 min); final extension: 72 ℃ for 5 min; hold: 4 ℃, ∞, giving the PCR product.

9.2 The PCR products were run on a 12% PAGE gel, and the target band was excised and recovered from the gel as follows:

(1) The bottom of a 0.5 ml DNase-free EP tube was pierced with a 21G needle, 3 holes per tube.

(2) The pierced 0.5 ml EP tube was placed in a 1.5 ml DNase-free EP tube.

(3) The excised gel slice was placed in the pierced 0.5 ml EP tube, and the cap was closed.

(4) The tubes were centrifuged at 20,000 g for 4 minutes at room temperature; after centrifugation the gel slice from the 0.5 ml EP tube had passed into the 1.5 ml EP tube as fine gel particles.

(5) The 0.5 ml EP tube was discarded, and 200 μl of DNase-free water was added to the gel particles and mixed well by flicking the tube wall.

(6) The EP tube was incubated in a 70 ℃ metal bath for 10 minutes.

(7) The mixture was homogenized for 30 seconds on a vortex shaker at medium intensity, and the liquid on the tube wall was spun down to the bottom.

(8) The end of a 1 ml pipette tip was cut off, and the tip was used to pipette the homogenized gel slurry and transfer it to a centrifuge tube containing a spin column.

(9) The tube was centrifuged at 20,000 g for 3 minutes at room temperature, the column was discarded, and the flow-through was collected.

(10) The flow-throughs of the 3 PCR replicates were pooled and mixed well by pipetting, giving a volume of approximately 600 μl.

(11) 25 μl of 5 M NaCl was added to the pooled flow-through and mixed well, followed by 12 μl of glycogen (Invitrogen; 10814-010) and 750 μl of isopropanol. The mixture was incubated at -20 ℃ for at least 1 hour.

(12) The sample was centrifuged at 20,000 g for 30 min at 4 ℃, after which a small white DNA pellet was visible at the bottom of the tube.

(13) The supernatant was removed, and the pellet was washed with 750 μl of cold 80% (v/v) ethanol.

(14) The sample was centrifuged at 20,000 g at 4 ℃, and the ethanol was removed.

(15) The DNA pellet was air-dried at room temperature for about 10 minutes, at which point it was pale and jelly-like.

(16) 15 μl of DNase-free water was added, and the DNA was resuspended by pipetting about 15 times to give the gel recovery product.

9.3 The gel recovery product was ligated using the pClone007 Blunt Simple Vector Kit (Beijing Tsingke Biotechnology Co., Ltd.; TSV-007BS) and transformed into Trelief™ 5α competent cells (Beijing Tsingke Biotechnology Co., Ltd.; TSC01), and clones were then picked for sequencing.

10. Ultrasonic breaking

The 5'-end and 3'-end second-strand cDNAs were pooled, transferred to a 1.5 ml Bioruptor sonication tube, and fragmented with a Bioruptor Pico sonicator according to the program (on/off cycle time 30 s/30 s, 13 cycles), giving a fragment size of 200 bp.

11. Purification of the fragmented second-strand cDNA using AMPure beads

(1) μl (1.2×) of AMPure XP beads was added to 100 μl of the fragmented second-strand cDNA and mixed thoroughly by vortexing or by pipetting up and down several times.

(2) The mixture was allowed to stand at room temperature for 15 minutes to bind the DNA to the magnetic beads.

(6) μl of Nuclease-free H2O was added and mixed by pipetting or brief vortexing. The sample was incubated at room temperature for 2 minutes to elute the fragmented second-strand cDNA from the magnetic beads.

(7) The sample tube was returned to the magnetic rack for 5 minutes, until the solution was clear, and 25 μl of the supernatant was carefully transferred to a new Nuclease-free PCR tube.

12. Library construction (KAPA Biosystems; KK 8504):

12.1 end repair and tailing reactions

(1) The following operations were performed in PCR tubes

(2) The mixture was gently vortexed, ice-bathed, immediately transferred to a PCR apparatus and immediately incubated (20 ℃,30min;65 ℃,30min;4 ℃ C., ++) to obtain a reaction product for end repair and tailing.

12.2 Joint connection

(1) Preparing a joint connection reaction liquid in the tail end repair and tailing reaction pipe:

(2) Completely mixing, instantaneous centrifuging, and incubating at 20 ℃ for 15 minutes to obtain the joint product.

12.3 post ligation purification

(1) Mu.l (0.8X) of Ampure XP beads were added to the adaptor product and thoroughly mixed by vortexing or multiple up and down blows.

(4) The supernatant was removed with a pipette, and the beads were washed twice with 200 μl of freshly prepared 80% EtOH (ethanol, volume fraction 80%). The sample tube was kept on the magnetic rack throughout the washes, with an incubation of at least 30 seconds at room temperature for each wash.

(6) Nuclease-free H2O was added and mixed by pipetting or brief vortexing. The mixture was incubated at room temperature for 2 minutes to elute the DNA from the beads.

(7) The tube was returned to the magnetic rack for 5 minutes until the solution was clear, and 16 μl of the supernatant was carefully transferred to a new PCR tube, yielding the purified adapter-ligated product.

12.4 Library amplification

(1) The amplification reaction was assembled in a PCR tube:

(2) The reaction was mixed thoroughly, briefly centrifuged, and subjected to PCR with the following program. Initial denaturation: 98 ℃ for 1 min; 11 cycles (denaturation: 98 ℃ for 15 sec; annealing: 60 ℃ for 30 sec; extension: 72 ℃ for 30 sec); final extension: 72 ℃ for 5 min; hold: 10 ℃, ∞. The PCR product was obtained.

12.5 Library size selection

The PCR product was run on an E-gel, and fragments of 200 bp to 700 bp were selected and recovered from the gel using the Zymoclean Gel DNA Recovery Kit (Zymo Research; D4008), as shown in FIG. 3. The steps were as follows:

(1) ADB was added at 3 gel volumes, and the tube was incubated in a metal bath at 55 ℃ and inverted about every 2 minutes until the gel was completely dissolved.

(2) All the liquid was transferred to an adsorption column and centrifuged at 12,000g for 1min at room temperature.

(3) The flow-through was discarded, 200 μl of DNA Wash Buffer was added, and the column was centrifuged at 12,000g for 30 seconds at room temperature.

(4) The flow-through was discarded, and step (3) was repeated once.

(5) The column was placed in a collection tube, 14 μl of DNA Elution Buffer was added, and the column was centrifuged at 12,000g for 1 min at room temperature. The column was discarded, and the eluate was quantified and stored at -20 ℃.

13. High throughput sequencing

13.1 The library was sequenced using the Hiseq-PE150 sequencing strategy, with a data volume of about 2G.

13.2 bioinformatic analysis

The analysis workflow comprises quality control, sequence alignment, and verification of the alignment results with BLAT.

(1) Quality control

Quality control included library quality analysis and removal of adapter sequences, using FastQC v0.11.5 and cutadapt version 1.8.dev0. FIGS. 4 and 5 show the results of library quality analysis using FastQC v0.11.5. FIG. 4 shows basic information on the raw sequencing data. FIG. 5 shows the quality values of all bases in the fastq files (a: quality values of all bases of reads 1; b: quality values of all bases of reads 2).
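As a rough illustration of what a per-base quality summary such as FIG. 5 represents, the following sketch decodes FASTQ quality strings and averages the scores by read position. It assumes Phred+33 encoding, which the text does not state, and it is not FastQC's implementation; the function names are hypothetical.

```python
from typing import Iterable, List


def phred33_scores(quality_line: str) -> List[int]:
    """Decode one FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_line]


def mean_quality_per_position(quality_lines: Iterable[str]) -> List[float]:
    """Average the Phred scores at each read position across all reads."""
    totals: List[int] = []
    counts: List[int] = []
    for line in quality_lines:
        for pos, score in enumerate(phred33_scores(line)):
            if pos == len(totals):
                # First read long enough to reach this position.
                totals.append(0)
                counts.append(0)
            totals[pos] += score
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two toy reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(mean_quality_per_position(["II", "5I"]))
```

In practice the quality strings would be taken from every fourth line of the reads 1 and reads 2 fastq files.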

(2) Sequence alignment

The raw sequencing data were aligned using Bowtie2 version 2.1.0 and Samtools 0.1.19-44428cd to search for the novel short fragment obtained from RNA-seq and its adjacent sequences. As a result, the 3'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing (SEQ ID NO. 7) is: AAAGAAACAATCACACCCAATTCTATTTAGGTAAGCCAGTGACTTTATTGGGGTTACTTACAGGAGTGTGGATGACGCAAAGGTGGATGTACCACTGAAAAGCCCACCCCAGCATGGTGATGACTCATGAAAGCGGAATCCCTGGCATAC. The 5'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing (SEQ ID NO. 8) is: TGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCAGGTAGTACATCAAGAACACAGTTCCCTG.
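The search for reads that contain the novel short fragment or its neighbouring sequence was carried out with Bowtie2 and Samtools. As a much simpler stand-in that only conveys the idea, the sketch below keeps reads that share an exact k-mer with the fragment on either strand. The function names, the value of k, and the toy reads are illustrative assumptions, and this is not how Bowtie2 works internally.

```python
from typing import Iterable, List, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reads_sharing_kmer(reads: Iterable[str], fragment: str, k: int = 20) -> List[str]:
    """Keep reads that share at least one exact k-mer with the fragment on either strand."""
    seeds = kmers(fragment, k) | kmers(reverse_complement(fragment), k)
    hits = []
    for read in reads:
        read_upper = read.upper()
        if any(read_upper[i:i + k] in seeds for i in range(len(read_upper) - k + 1)):
            hits.append(read)
    return hits


# Toy example with k = 8 (hypothetical reads, not real sequencing data)
fragment = "CTGGTGCCATAATTCAGGGA"  # first 20 nt of SEQ ID NO. 1
reads = [
    "AAATTTCTGGTGCCATAA",  # overlaps the fragment start and extends beyond it
    "TCCCTGAATTATGGCACC",  # matches the reverse-complement strand
    "GGGGGGGGGGGGGGGGGG",  # unrelated
]
print(reads_sharing_kmer(reads, fragment, k=8))
```

Reads that extend past the ends of the known fragment, like the first toy read, are the ones that reveal adjacent sequence.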

(3) BLAT test results

The alignment results were checked with the BLAT tool on the UCSC platform and visualized, as shown in FIGS. 6 and 7. In FIG. 6, the underlined section shows the BLAT result on the UCSC platform for the 3'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing.

In FIG. 7, the underlined section shows the BLAT result on the UCSC platform for the 5'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing. It can be seen that one sequence within the 5'-end cDNA sequence (SEQ ID NO. 9): tagtacatcaagaacacagttccctg is not aligned to the genome, presumably because the unknown RNA has different transcripts. The BLAT results also showed that the undetected sequence (SEQ ID NO. 10) in the 5'-end cDNA sequence of the unknown RNA obtained by high-throughput sequencing was: ggcaggaatgaagatattctaag. The clone sequencing in step 9, which identified the novel short fragment obtained from RNA-seq in the 5'-end and 3'-end two-strand cDNAs, confirmed that this undetected sequence is an intron sequence.

In summary, in FIG. 8 the underlined section shows the BLAT result on the UCSC platform for the full-length cDNA sequence of the unknown RNA obtained by high-throughput sequencing. The BLAT results show that the full-length cDNA sequence further contains an undetected sequence (SEQ ID NO. 11): tctctgttcctaaatttctggtgccataattcagggaactgtgttct. Since the 5'-terminal sequence of the unknown RNA has been obtained and this undetected sequence lies between the 5'-terminal sequence and the 5'-end reverse transcription initiation site, it can be presumed to be part of the cDNA sequence of the unknown RNA. The undetected sequence was verified by clone sequencing after reverse transcription with the HiScript III SuperMix for qPCR (+gDNA wiper) kit (Nanjinopran Biotechnology Co.; R323-01). The upstream primer sequence (SEQ ID NO. 12) is: GTACCACTGAAAAGCCCACC, the downstream primer sequence (SEQ ID NO. 13) is: AAGTGCATTGCCTAACAGCC, and the fragment size is 177 bp. The PCR results are shown in FIG. 9, and the clone sequencing results confirm that the undetected sequence is contained in the cDNA sequence of the unknown RNA.

Finally, the full-length cDNA sequence of the unknown RNA (SEQ ID NO. 14) retrieved by high-throughput sequencing is: AAAGAAACAATCACACCCAATTCTATTTAGGTAAGCCAGTGACTTTATTGGGGTTACTTACAGGAGTGTGGATGACGCAAAGGTGGATGTACCACTGAAAAGCCCACCCCAGCATGGTGATGACTCATGAAAGCGGAATCCCTGGCATACTCTCTGTTCCTAAATTTCTGGTGCCATAATTCAGGGAACTGTGTTCTTGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAATGCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCAGGTAGTACATCAAGAACACAGTTCCCTG, 344 bp in length.

The full-length sequence of the unknown RNA (SEQ ID NO. 15) extracted by high-throughput sequencing is: CAGGGAACUGUGUUCUUGAUGUACUACCUGGGUCUUUUUGUUCUUCCAGCAUGAGAUUCCUUAUAGAAUUCUAAUUCCUAGAAGUGCAUUGCCUAACAGCCUGAUAGCUGGAUGGGGGGAAGCACAAAUGUCUCAGAUAGUACAUCAAGAACACAGUUCCCUGAAUUAUGGCACCAGAAAUUUAGGAACAGAGAGUAUGCCAGGGAUUCCGCUUUCAUGAGUCAUCACCAUGCUGGGGUGGGCUUUUCAGUGGUACAUCCACCUUUGCGUCAUCCACACUCCUGUAAGUAACCCCAAUAAAGUCACUGGCUUACCUAAAUAGAAUUGGGUGUGAUUGUUUCUUU, 344nt in length.
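The relationship between the reported sequences can be checked with a few lines of code. The sketch below joins the 3'-end cDNA (SEQ ID NO. 7), the undetected sequence (SEQ ID NO. 11), and the 5'-end cDNA (SEQ ID NO. 8) in that order, then takes the reverse complement with T replaced by U to obtain the RNA. The variable and function names are illustrative; the sequences are copied from the sequence listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# SEQ ID NO. 7: 3'-end cDNA sequence (150 nt)
SEQ7_3P_CDNA = (
    "AAAGAAACAATCACACCCAATTCTATTTAGGTAAGCCAGTGACTTTATTGGGGTTACTTA"
    "CAGGAGTGTGGATGACGCAAAGGTGGATGTACCACTGAAAAGCCCACCCCAGCATGGTGA"
    "TGACTCATGAAAGCGGAATCCCTGGCATAC"
)
# SEQ ID NO. 11: sequence not detected by BLAT (47 nt)
SEQ11_UNDETECTED = "TCTCTGTTCCTAAATTTCTGGTGCCATAATTCAGGGAACTGTGTTCT"
# SEQ ID NO. 8: 5'-end cDNA sequence (147 nt)
SEQ8_5P_CDNA = (
    "TGATGTACTATCTGAGACATTTGTGCTTCCCCCCATCCAGCTATCAGGCTGTTAGGCAAT"
    "GCACTTCTAGGAATTAGAATTCTATAAGGAATCTCATGCTGGAAGAACAAAAAGACCCAG"
    "GTAGTACATCAAGAACACAGTTCCCTG"
)


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def cdna_to_rna(cdna: str) -> str:
    """Reverse-complement the cDNA and write the result as RNA (T -> U)."""
    return reverse_complement(cdna).replace("T", "U")


full_cdna = SEQ7_3P_CDNA + SEQ11_UNDETECTED + SEQ8_5P_CDNA
rna = cdna_to_rna(full_cdna)

print(len(full_cdna), len(rna))
print(rna[:15], rna[-14:])
```

The concatenation is 344 nt long, and the resulting RNA begins with CAGGGAACUGUGUUC and ends with GUGAUUGUUUCUUU, in agreement with SEQ ID NO. 14 and SEQ ID NO. 15.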

It will be understood that equivalents and modifications will occur to those skilled in the art in light of the present teachings and concepts, and all such modifications and substitutions are intended to be included within the scope of the present invention as defined in the accompanying claims.

Sequence listing

<110> Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences

<120> a method for extracting unknown RNA full-length sequence by high-throughput sequencing

<160> 15

<170> SIPOSequenceListing 1.0

<210> 1

<211> 149

<212> DNA

<213> New short fragment sequence obtained from RNA-seq (mouse embryonic stem cells)

<400> 1

ctggtgccat aattcaggga actgtgttct tgatgtacta tctgagacat ttgtgcttcc 60

ccccatccag ctatcaggct gttaggcaat gcacttctag gaattagaat tctataagga 120

atctcatgct ggaagaacaa aaagaccca 149

<210> 2

<211> 149

<212> RNA

<213> RNA sequence of completely novel short fragment obtained from RNA-seq (mouse embryonic stem cells)

<400> 2

ugggucuuuu uguucuucca gcaugagauu ccuuauagaa uucuaauucc uagaagugca 60

uugccuaaca gccugauagc uggauggggg gaagcacaaa ugucucagau aguacaucaa 120

gaacacaguu cccugaauua uggcaccag 149

<210> 3

<211> 20

<212> DNA

<213> 5' -terminal Gene-specific reverse transcription primer GSP1 sequence (Synthetic sequence)

<400> 3

ctggtgccat aattcaggga 20

<210> 4

<211> 22

<212> DNA

<213> 3' -terminal Gene-specific reverse transcription primer GSP2 sequence (Synthetic sequence)

<400> 4

ggatcttcac gtaacggatt gt 22

<210> 5

<211> 20

<212> DNA

<213> clone sequencing identification of novel short fragments obtained from RNA-seq contained in 5 'and 3' terminal two-strand cDNAs, upstream primer sequences (Synthetic sequence)

<400> 5

ctggtgccat aattcaggga 20

<210> 6

<211> 22

<212> DNA

<213> clone sequencing to identify that the 5 'and 3' terminal two-strand cDNAs contain a completely novel short fragment obtained from the RNA-seq, the downstream primer sequence (Synthetic sequence)

<400> 6

cctagaagtg cattgcctaa ca 22

<210> 7

<211> 150

<212> DNA

<213> unknown RNA 3' -terminal cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)

<400> 7

aaagaaacaa tcacacccaa ttctatttag gtaagccagt gactttattg gggttactta 60

caggagtgtg gatgacgcaa aggtggatgt accactgaaa agcccacccc agcatggtga 120

tgactcatga aagcggaatc cctggcatac 150

<210> 8

<211> 147

<212> DNA

<213> unknown RNA5' terminal cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)

<400> 8

tgatgtacta tctgagacat ttgtgcttcc ccccatccag ctatcaggct gttaggcaat 60

gcacttctag gaattagaat tctataagga atctcatgct ggaagaacaa aaagacccag 120

gtagtacatc aagaacacag ttccctg 147

<210> 9

<211> 26

<212> DNA

<213> sequence not aligned to genome in unknown RNA5' -terminal cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)

<400> 9

tagtacatca agaacacagt tccctg 26

<210> 10

<211> 23

<212> DNA

<213> sequence confirmed as an intron which was not detected in the cDNA sequence of the 5' -end of unknown RNA obtained by high-throughput sequencing (mouse embryonic stem cells)

<400> 10

ggcaggaatg aagatattct aag 23

<210> 11

<211> 47

<212> DNA

<213> sequence undetectable in unknown RNA full-length cDNA sequence obtained by high throughput sequencing (mouse embryonic stem cells)

<400> 11

tctctgttcc taaatttctg gtgccataat tcagggaact gtgttct 47

<210> 12

<211> 20

<212> DNA

<213> clone sequencing to identify undetectable sequences in the full-length cDNA sequence of unknown RNA obtained by high throughput sequencing, upstream primer sequence (Synthetic sequence)

<400> 12

gtaccactga aaagcccacc 20

<210> 13

<211> 20

<212> DNA

<213> clone sequencing to identify undetectable sequences in the full-length cDNA sequence of unknown RNA obtained by high throughput sequencing, downstream primer sequence (Synthetic sequence)

<400> 13

aagtgcattg cctaacagcc 20

<210> 14

<211> 344

<212> DNA

<213> high throughput sequencing of the retrieved unknown RNA full-length cDNA sequence (mouse embryonic stem cells)

<400> 14

aaagaaacaa tcacacccaa ttctatttag gtaagccagt gactttattg gggttactta 60

caggagtgtg gatgacgcaa aggtggatgt accactgaaa agcccacccc agcatggtga 120

tgactcatga aagcggaatc cctggcatac tctctgttcc taaatttctg gtgccataat 180

tcagggaact gtgttcttga tgtactatct gagacatttg tgcttccccc catccagcta 240

tcaggctgtt aggcaatgca cttctaggaa ttagaattct ataaggaatc tcatgctgga 300

agaacaaaaa gacccaggta gtacatcaag aacacagttc cctg 344

<210> 15

<211> 344

<212> RNA

<213> high throughput sequencing of the retrieved unknown RNA full-length sequence (mouse embryonic stem cells)

<400> 15

cagggaacug uguucuugau guacuaccug ggucuuuuug uucuuccagc augagauucc 60

uuauagaauu cuaauuccua gaagugcauu gccuaacagc cugauagcug gaugggggga 120

agcacaaaug ucucagauag uacaucaaga acacaguucc cugaauuaug gcaccagaaa 180

uuuaggaaca gagaguaugc cagggauucc gcuuucauga gucaucacca ugcuggggug 240

ggcuuuucag ugguacaucc accuuugcgu cauccacacu ccuguaagua accccaauaa 300

agucacuggc uuaccuaaau agaauugggu gugauuguuu cuuu 344

Claims

1. A method for extracting an unknown RNA full-length sequence by high-throughput sequencing, characterized in that 5'-end and 3'-end gene-specific reverse transcription primers for the unknown RNA are designed according to the sequence of a brand-new short fragment obtained by RNA-seq; 5'-end and 3'-end one-strand cDNAs are respectively synthesized using reverse transcriptase and the gene-specific reverse transcription primers; 5'-end and 3'-end two-strand cDNAs are then respectively synthesized using DNA polymerase I; the synthesized two-strand cDNAs of the two ends are combined; a library is constructed and sequenced after ultrasonic disruption; and the unknown RNA full-length sequence is finally extracted by bioinformatics analysis;

the design of the gene specific reverse transcription primer of the 5 'end and the 3' end of the unknown RNA is specifically as follows: the 5 'end gene specific reverse transcription primer is reversely complementary with the RNA sequence of the brand-new short fragment obtained by the RNA-seq, the 3' end gene specific reverse transcription primer is co-directionally complementary with the RNA sequence of the brand-new short fragment obtained by the RNA-seq, and the fragments amplified by the two end gene specific reverse transcription primers have an overlapping region.
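The two primer orientations can be illustrated with the sequences in the sequence listing. In the sketch below, "reversely complementary" is taken to mean the reverse complement of an RNA window, and "co-directionally complementary" is taken to mean the base-by-base complement written in the same direction as the RNA. This reading is an interpretation, chosen because it reproduces SEQ ID NO. 3 and SEQ ID NO. 4 from SEQ ID NO. 2. The window coordinates and function names are likewise assumptions of this sketch and are not stated in the text.

```python
# Map RNA bases to their complementary DNA bases.
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")

# SEQ ID NO. 2: RNA sequence of the novel short fragment obtained from RNA-seq (149 nt)
RNA_FRAGMENT = (
    "UGGGUCUUUUUGUUCUUCCAGCAUGAGAUUCCUUAUAGAAUUCUAAUUCCUAGAAGUGCA"
    "UUGCCUAACAGCCUGAUAGCUGGAUGGGGGGAAGCACAAAUGUCUCAGAUAGUACAUCAA"
    "GAACACAGUUCCCUGAAUUAUGGCACCAG"
)


def reverse_complement_primer(rna_window: str) -> str:
    """DNA primer that is the reverse complement of the RNA window."""
    return rna_window.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


def codirectional_complement_primer(rna_window: str) -> str:
    """DNA primer that is the complement of the RNA window, written in the same direction."""
    return rna_window.translate(_RNA_TO_DNA_COMPLEMENT)


# Window choices are assumptions: the last 20 nt, and nt 49-70 of the fragment.
gsp1 = reverse_complement_primer(RNA_FRAGMENT[-20:])
gsp2 = codirectional_complement_primer(RNA_FRAGMENT[48:70])

print(gsp1)
print(gsp2)
```

With these windows, `gsp1` equals the GSP1 sequence in SEQ ID NO. 3 and `gsp2` equals the GSP2 sequence in SEQ ID NO. 4.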

2. The method of claim 1, further comprising: total RNA extraction and removal of ribosomal RNA.

3. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 2, wherein the synthesis of the 5 '-end and the 3' -end single-strand cDNA is respectively carried out by reverse transcriptase and a gene-specific reverse transcription primer, and specifically comprises the following steps:

(1) The following mixtures were prepared:

Ribosomal-depleted RNA 8μl

10mM dNTP mix 1μl

2 μM gene-specific primer 1 μl

Incubation at 65 ℃ for 5 minutes, followed by cooling on ice for at least 1 minute;

(2) The following cDNA synthesis mixtures were prepared:

4. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 3, wherein the 5'-end and 3'-end two-strand cDNAs are respectively synthesized using DNA polymerase I, specifically:

(1) The following mixtures were prepared:

One-strand cDNA 40.5 μl

10X second strand buffer 5μl

10mM dNTP mix 1.5μl

Mixing well and incubating on ice for 5 minutes;

(2) The following enzymes were added:

DNA polymerase I, E. coli, 10 U/μl 2.5 μl

RNase H, 2 U/μl 0.5 μl

The components were mixed and incubated at 15 ℃ for 2.5 hours to prepare the two-strand cDNA;

wherein the 10X second strand buffer consists of 500 mM Tris-HCl pH 7.8, 50 mM MgCl2 and 10 mM DTT; the buffer is first prepared and filtered without DTT, and 0.1 M DTT is then added to reach the final composition.

5. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 4, wherein 5 '-end and 3' -end two-strand cDNA are combined and subjected to ultrasonic disruption, and the disruption fragment size is 200bp.

6. The method of claim 5, wherein constructing the library comprises the steps of:

Fragmented two-strand cDNA 25 μl

End Repair & A-Tailing Buffer 3.5 μl

End Repair & A-Tailing Enzyme Mix 1.5 μl;

vortexing gently, placing on ice, immediately transferring to a PCR instrument, incubating at 20 ℃ for 30 min and then at 65 ℃ for 30 min, and finally holding at 4 ℃ to obtain the end repair and A-tailing reaction product;

(2) Adapter ligation

DNA Ligase 5 μl;

mixing thoroughly, briefly centrifuging, and incubating at 20 ℃ for 15 minutes to obtain the adapter-ligated product;

(3) Purifying after connection;

(4) Library amplification, amplification system as follows:

2X KAPA HiFi HotStart ReadyMix 20 μl

10X KAPA Library Amplification Primer Mix 4 μl

Adapter-ligated library 16 μl;

mixing thoroughly, briefly centrifuging, and performing PCR according to the following program: initial denaturation: 98 ℃ for 1 min; 11 cycles of denaturation: 98 ℃ for 15 sec, annealing: 60 ℃ for 30 sec, and extension: 72 ℃ for 30 sec; final extension: 72 ℃ for 5 min, to obtain the PCR product.

7. The method for extracting the full-length sequence of unknown RNA by high-throughput sequencing according to claim 6, wherein the library is constructed and then subjected to E-gel electrophoresis, and a 200bp-700bp fragment is selected for gel recovery, so that library selection is completed.

8. The method of claim 7, wherein the library is sequenced using a Hiseq-PE150 sequencing strategy.

9. The method for extracting the full-length sequence of the unknown RNA by high-throughput sequencing according to claim 8, wherein extracting the full-length sequence of the unknown RNA by bioinformatics analysis comprises the following steps:

(1) Quality control: including library quality analysis and removal of adapter sequences;

(2) Sequence alignment: aligning the raw sequencing data using Bowtie2 version 2.1.0 and Samtools 0.1.19-44428cd;