CN116516495A

CN116516495A - Construction method and application for capturing full-length non-coding RNA sequencing library

Info

Publication number: CN116516495A
Application number: CN202310366344.3A
Authority: CN
Inventors: 杨建华; 李斌; 屈良鹄
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2023-04-06
Filing date: 2023-04-06
Publication date: 2023-08-01

Abstract

The invention discloses a method for constructing a sequencing library that captures full-length non-coding RNA, and applications thereof. The method comprises the following steps: S1, obtaining RNA from a sample to be tested, and ligating a 3' DNA linker and a 5' RNA linker to the two ends of the RNA, respectively, to obtain an RNA ligation product; S2, mixing the RNA ligation product with DNA probes targeting non-target RNA, annealing, and removing the non-target RNA and the residual DNA probes to obtain a target RNA ligation product; S3, synthesizing cDNA from the target RNA ligation product using a truncated reverse transcription primer, and then amplifying the cDNA by PCR using primers containing anchor bases. The construction method captures the terminal information of ncRNA, increases the proportion of sequencing reads derived from target ncRNA, markedly reduces uninformative sequencing and sequencing cost, and improves the accuracy of detecting medium- and low-abundance ncRNA.

Description

Construction method and application for capturing full-length non-coding RNA sequencing library

Technical Field

The invention relates to the technical field of molecular biology, in particular to a construction method and application for capturing a full-length non-coding RNA sequencing library.

Background

In addition to messenger RNA (mRNA), which encodes proteins, the human genome is transcribed into a large amount of RNA that does not encode protein, i.e., non-coding RNA (ncRNA). The non-coding RNAs discovered so far include: tRNA and rRNA, involved in protein synthesis; snRNA, involved in RNA processing; box C/D snoRNA and box H/ACA snoRNA, involved in RNA modification; and miRNA, piRNA, circRNA and lncRNA, involved in post-transcriptional regulation of mRNA. Mutation and abnormal expression of these non-coding RNAs are closely related to major human diseases such as tumors. As key regulatory molecules for genetic information, non-coding RNAs must be processed to a specific length after transcription and must interact with RNA-binding proteins in order to exert their regulatory function. Kink-turn (K-turn) RNA is one example: it is an ncRNA, usually 60 to 200 nt long, that contains a K-turn three-dimensional structure formed by a C box (conserved motif RUGAUGA) and a D box (conserved motif CUGA). Through the K-turn structure it binds the 15.5 kDa protein (abbreviated as 15.5K; the homologous protein is Snu13p in yeast, L7Ae in archaea, and ybxF/YbxQ in bacteria) and guides 2'-O-methylation modification of RNA or assembly of the splicing complex.

In the related art, identifying the full-length sequence of an ncRNA is an effective means of analyzing it. The main approaches are RNA sequence alignment, RNA structure prediction, RACE (rapid amplification of cDNA ends) and high-throughput sequencing. Sequence alignment and structure prediction rely on the sequence and structural conservation of ncRNA to predict the RNA ends and, from these, the full-length sequence; these methods are efficient but have low accuracy. RACE-based methods identify RNA ends and full-length sequences accurately, but their throughput is too low. High-throughput sequencing can resolve RNA ends, and hence full-length sequences, both efficiently and accurately, but current methods for sequencing full-length ncRNA mainly target either small RNAs such as miRNA and piRNA, or long ncRNAs such as lncRNA that carry an mRNA-like polyA tail. For ncRNAs of medium length and low abundance that lack a polyA tail, such as K-turn RNA, no dedicated high-throughput sequencing method is available. How to specifically capture medium-length, low-abundance ncRNAs such as K-turn RNA and resolve their full-length sequences therefore remains the biggest technical problem in the RNA research field.

In recent years, researchers have developed a number of techniques for capturing interactions between RNA and RNA-binding proteins, such as RIP-seq and CLIP-seq. These techniques fall into two main categories: (1) RNA that interacts with an RNA-binding protein is captured by immunoprecipitation of the protein, fragmented, reverse transcribed with random primers, and then used for library construction and sequencing; (2) RNA regions not bound by the RNA-binding protein are removed by enzymatic digestion, and the RNA fragments that interact with the protein are captured by immunoprecipitation and ligated to RNA linkers for library construction. Although these methods can study RNA-binding proteins and their interacting RNAs at high throughput, they have significant drawbacks. First, the information obtained is RNA fragment information rather than the full length of the interacting RNA, so it cannot be determined whether the interaction involves the RNA precursor or the mature form. Second, reads from highly abundant RNAs (e.g., rRNA, snRNA and tRNA) dominate the sequencing data, and this excess of uninformative data severely reduces the amount of useful data, affecting data quality and the resolution of the results. How to capture the full length of each ncRNA and increase its proportion in the sequencing data therefore remains a major challenge.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a method for constructing a sequencing library that captures full-length non-coding RNA, and applications thereof. The method increases the proportion of sequencing reads from target ncRNA in the sequencing library, markedly reduces uninformative sequencing and sequencing cost, and improves the accuracy of detecting medium- and low-abundance ncRNA.

The invention also provides a method for sequencing the full-length non-coding RNA.

In a first aspect of the invention, there is provided a method of constructing a sequencing library that captures full-length non-coding RNA, comprising:

S1, obtaining RNA from a sample to be tested, and ligating a 3' DNA linker and a 5' RNA linker to the two ends of the RNA, respectively, to obtain an RNA ligation product;

S2, mixing the RNA ligation product with DNA probes targeting non-target RNA, annealing, and removing the non-target RNA and the residual DNA probes to obtain a target RNA ligation product;

S3, synthesizing cDNA from the target RNA ligation product using a truncated reverse transcription primer, and then amplifying the cDNA by PCR using primers containing anchor bases, to obtain a sequencing library that captures full-length non-coding RNA.

The construction method according to the embodiment of the invention has at least the following beneficial effects:

(1) A 3' DNA linker and a 5' RNA linker are first ligated to the two ends of the RNA to capture information on both RNA ends. The RNA ligation product is then annealed with DNA probes targeting non-target RNA, and the non-target RNA and the residual DNA probes are digested with RNase H and the single-stranded DNA exonuclease RecJf, respectively, which effectively reduces their interference with subsequent experiments. After the enriched target RNA ligation product is obtained, cDNA is synthesized with a truncated reverse transcription primer and amplified by PCR with primers containing anchor bases, which improves the accuracy of RNA end identification.

(2) In the related art, reverse transcription with random primers is used to obtain sequence information, which cannot capture the two ends of the RNA. In the present invention, linkers ligated to the target RNA strand effectively anchor both ends of the RNA, so that both ends can be determined by sequencing at single-base precision; without the corresponding linkers, the two RNA ends cannot be anchored. The method therefore yields RNA ends at single-base precision and provides an effective means of accurately studying the structure and motif characteristics of ncRNA and of discovering new types of ncRNA.

(3) In the invention, a 3' DNA linker and a 5' RNA linker are ligated to the two ends of the RNA to capture information on both RNA ends. Because coding RNA (such as mRNA) carries a 5' cap structure, it cannot be ligated to the 5' RNA linker during linker ligation, which effectively avoids interference from coding RNA in the construction of the full-length non-coding RNA library and in the identification results.

(4) The method for constructing the full-length non-coding RNA sequencing library greatly increases the proportion of target ncRNA sequences in the sequencing library, markedly reduces uninformative sequencing and sequencing cost, and effectively improves the accuracy of detecting medium- and low-abundance ncRNA.

In some embodiments of the invention, the full-length non-coding RNA comprises at least one of tRNA, rRNA, snRNA, snoRNA, scaRNA, miRNA, piRNA, circRNA and lncRNA.

Preferably, the non-coding RNA is a non-coding RNA of medium length and low abundance.

Preferably, the non-coding RNA is a Kink-turn type RNA.

In some embodiments of the invention, the RNA of the test sample is RNA from which genomic DNA is removed.

Preferably, the method for removing genomic DNA comprises: adding 1× RQ1 DNase Reaction Buffer, 2 U/μL RiboLock RNase Inhibitor and RQ1 RNase-Free DNase to the RNA of the sample to be tested, reacting at 37 °C for 30 minutes, and purifying the RNA with an RNA Clean & Concentrator-5 kit.

The RQ1 RNase-Free DNase may be the Promega product with catalog number M6101; the RNA Clean & Concentrator-5 may be the Zymo Research product with catalog number R1015.

In some embodiments of the invention, the RNA of the test sample comprises at least one of total RNA of cellular origin, total RNA of tissue origin, RNA immunoprecipitated with RNA binding protein, RNA of different organelle origin.

Preferably, the RNA immunoprecipitated by the RNA binding protein comprises 15.5K immunoprecipitated RNA.

In some embodiments of the invention, the total RNA of cellular or tissue origin may be obtained using a TRIzol RNA extraction method.

In some embodiments of the invention, the 3' DNA linker is an adenylated 3' DNA linker that carries random bases at its 5' end;

preferably, the 3' DNA linker is: rAppNNNNNNTGGAATTCTCGGGTGCCAAGG-C3 Spacer, wherein rApp is the adenylation modification, NNNNNN is six random deoxyribonucleotides, N represents any one of the four deoxyribonucleotides A, T, C and G, and the C3 Spacer is a blocking group.

Preferably, unligated 3' DNA linker is removed using 5' deadenylase, single-stranded DNA binding protein and RecJf; specifically, the 5' deadenylase is reacted at 28-32 °C for 0.8-1.2 hours, the single-stranded DNA binding protein is reacted on ice for 25-35 minutes, and the RecJf is reacted at 36-38 °C for 0.8-1.2 hours.

In some embodiments of the invention, the 3' end of the 5' RNA linker carries random bases.

Preferably, the nucleotide sequence of the 5' RNA linker is: GUUCAGAGUCUACAGUCCGACGAUCNNNNNN, wherein NNNNNN represents six random ribonucleotides and N represents any one of the four ribonucleotides A, U, C and G.
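How the random bases and the constant linker sequence bracket the RNA in a sequenced insert can be illustrated with a minimal sketch. It assumes a read that spans the whole insert in the sense orientation, beginning with the random bases contributed by the 5' RNA linker and ending with the random bases and constant region of the 3' DNA linker. The function name `split_insert`, the tag-length parameters and the example RNA are hypothetical and are not part of the disclosed method.

```python
# Constant region of the 3' DNA linker as given in the text (random bases excluded).
LINKER3_CONSTANT = "TGGAATTCTCGGGTGCCAAGG"


def split_insert(read, n5=6, n3=6, linker3=LINKER3_CONSTANT):
    """Split a sense-strand read into (5' random tag, RNA, 3' random tag).

    n5 and n3 are the assumed numbers of random bases on each side.
    Returns None when the 3' linker constant region is not found or the
    read is too short to hold both random tags.
    """
    pos = read.find(linker3)
    if pos < n5 + n3:  # also covers pos == -1 (linker not found)
        return None
    tag5 = read[:n5]
    rna = read[n5:pos - n3]
    tag3 = read[pos - n3:pos]
    return tag5, rna, tag3


# Hypothetical example: a 12-nt RNA flanked by two 6-nt random tags.
example = "ACGTAC" + "GGGATGATGACC" + "TTGCAA" + LINKER3_CONSTANT
print(split_insert(example))
```

With six random positions, each tag can take 4^6 = 4096 different values, and the first and last bases of the returned RNA segment correspond to the two RNA ends anchored by the linkers.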

Because rRNA, snRNA and snoRNA are the most abundant RNA species in total RNA and therefore interfere most with the identification of novel non-coding RNA species, the invention takes the removal of these three RNA classes as an example.

Specifically, when the full-length non-coding RNA is selected from at least one of tRNA, scaRNA, miRNA, piRNA, circRNA and lncRNA, the non-target RNA comprises at least one of rRNA, snRNA, snoRNA.

Preferably, when the full-length non-coding RNA is a Kink-turn type RNA, the non-target RNA comprises at least one of rRNA, snRNA and snoRNA.

Preferably, the rRNA comprises 28S rRNA, 18S rRNA, 5.8S rRNA, 5S rRNA, 12S rRNA, and 16S rRNA; the snRNA comprises U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac; the snoRNA includes SNORD101, SNORD20 and SNORA23.

In some embodiments of the invention, the nucleotide sequence of the snRNA-targeted DNA probe is shown in SEQ ID NOS.1-29.

In some embodiments of the invention, the nucleotide sequence of the snoRNA-targeting DNA probe is shown in SEQ ID No. 30-196.

In some embodiments of the invention, the DNA probe has a length of 38-55nt.

In some embodiments of the invention, the DNA probe has a length of 40-55 nt. The DNA probes targeting non-target RNA are designed so that the spacing between adjacent probes is less than 10 nt.
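The length and spacing constraints on the probes can be sketched as a simple tiling routine. This is a minimal illustration under stated assumptions, not the design procedure used for SEQ ID NOS. 1-196: the function names, the default probe length of 50 nt and the default spacing of 5 nt are hypothetical, and real probe design would also consider specificity and melting temperature.

```python
def reverse_complement_dna(rna):
    """Return the DNA strand complementary to an RNA sequence, written 5'->3'."""
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def tile_probes(rna, probe_len=50, gap=5):
    """Tile antisense DNA probes along a non-target RNA.

    probe_len: probe length in nt (the embodiments give 38-55 nt and 40-55 nt).
    gap: uncovered nt between adjacent probes (the text requires < 10 nt).
    Returns a list of (start, end, probe) with 0-based, end-exclusive
    coordinates on the RNA. A trailing window shorter than probe_len is
    shifted left so that the last probe still has full length.
    """
    if not 38 <= probe_len <= 55:
        raise ValueError("probe_len outside the 38-55 nt range given in the text")
    if not 0 <= gap < 10:
        raise ValueError("gap must be below 10 nt")
    if len(rna) < probe_len:
        raise ValueError("RNA shorter than one probe")
    probes = []
    start = 0
    while start < len(rna):
        end = min(start + probe_len, len(rna))
        window_start = max(0, end - probe_len)
        probe = reverse_complement_dna(rna[window_start:end])
        probes.append((window_start, end, probe))
        start = end + gap
    return probes


# Hypothetical 160-nt target used only to show the tiling coordinates.
for start, end, probe in tile_probes("ACGU" * 40):
    print(start, end, len(probe))
```

Because each probe is the reverse complement of its target window, the RNA in the resulting RNA:DNA hybrid is the substrate that RNase H digests in the removal step.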

In some embodiments of the invention, the annealing temperature is 70-80 ℃.

In some embodiments of the invention, the RNA ligation product is mixed with the DNA probe in equal mass.

In some embodiments of the invention, the non-target RNA and the residual DNA probe are removed using an RNase H enzyme and an exonuclease RecJf, respectively.

In some embodiments of the invention, the truncated reverse transcription primer sequence is set forth in SEQ ID NO. 197.

The invention uses truncated reverse transcription primer, which can reduce the mismatch probability.
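What "truncated" means for the primer can be checked against the constant region of the 3' DNA linker. The following sketch is an illustration added for clarity rather than an analysis stated in the source; the variable names are hypothetical, and the sequences are those given in the text.

```python
LINKER3_CONSTANT = "TGGAATTCTCGGGTGCCAAGG"  # 3' DNA linker without its random bases
RT_PRIMER = "GCCTTGGCACCCGAGAAT"            # SEQ ID NO. 197


def reverse_complement(dna):
    """Reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))


linker_rc = reverse_complement(LINKER3_CONSTANT)

# Longest suffix of the primer that matches the start of the linker's
# reverse complement, i.e. the part of the primer that pairs with the linker.
overlap = next(
    RT_PRIMER[i:]
    for i in range(len(RT_PRIMER))
    if linker_rc.startswith(RT_PRIMER[i:])
)

# Linker bases (written 5'->3') that the primer leaves unpaired.
uncovered = reverse_complement(linker_rc[len(overlap):])

print(linker_rc)
print(overlap, len(overlap))
print(uncovered)
```

Under this comparison, the primer pairs with the 17 bases at the 3' end of the linker constant region and leaves the four linker bases nearest the random bases (TGGA) unpaired.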

In some embodiments of the invention, the fragment size in the non-coding RNA sequencing library is 150bp to 1500bp.

Preferably, the fragment size in the non-coding RNA sequencing library is 150 bp-700 bp.

In a second aspect of the invention, there is provided a method of sequencing full length non-coding RNA comprising constructing a sequencing library using the method described above; and sequencing the sequencing library.

In some embodiments of the invention, the sequencing is PE150 double-ended sequencing.

The sequencing library of the invention can be constructed from RNA of different sources and used for sequencing analysis. Examples include a PEN-seq (Sequencing of Paired-Ends of NcRNAs) sequencing library for total RNA of cellular or tissue origin; a sub-PEN-seq (Sequencing of Paired-Ends of subcellular NcRNAs) sequencing library for RNA of individual cell fractions; and a RIP-PEN-seq (RNA ImmunoPrecipitation coupled with sequencing of Paired-Ends of NcRNAs) sequencing library for RNA immunoprecipitated with an RNA-binding protein. Construction and sequencing of a sub-PEN-seq library comprises separating the RNA of each cell fraction (such as cytoplasmic RNA, nuclear RNA and nucleolar RNA) and then applying the PEN-seq strategy to sequence both ends of the ncRNA and determine its full-length sequence. Construction and sequencing of a RIP-PEN-seq library comprises performing RNA immunoprecipitation with a K-turn RNA-specific binding protein to enrich ncRNA, and then enriching the K-turn RNA ligation product in combination with PEN-seq. The RIP-PEN-seq technique thus combines RNA immunoprecipitation with PEN-seq and can accurately identify the full-length sequence of an RNA while capturing its interactions.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The invention is further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a schematic diagram of the construction of sequencing libraries of PEN-seq, RIP-PEN-seq and sub-PEN-seq according to the present invention.

FIG. 2 is a flow chart of the invention for identifying RNA double-ended and full-length sequences based on PEN-seq, RIP-PEN-seq and sub-PEN-seq libraries.

FIG. 3 is a schematic representation of the sequencing data analysis flow of the present invention.

FIG. 4 is the computational analysis flow of the invention for identifying K-turn RNA based on the K-turn structural motif contained in K-turn RNA.

FIG. 5 shows the knockdown effect in stable-knockdown HEK293T cells, as detected by qPCR and Western blot experiments.

FIG. 6 compares the start points of known K-turn RNAs (i.e., box C/D snoRNAs) identified using PEN-seq in negative control cells (shNC) with their annotated start points.

FIG. 7 compares the end points of known K-turn RNAs (i.e., box C/D snoRNAs) identified using PEN-seq in negative control cells (shNC) with their annotated end points.

FIG. 8 shows the two end sites and the full length of the K-turn RNA bktRNA1 in shNC, sh15.5K-1 and sh15.5K-2 cells, visualized using IGV.

FIG. 9 is a heat map of the expression levels of K-turn RNA in shNC, sh15.5K-1 and sh15.5K-2 cells according to an embodiment of the present invention.

FIG. 10 is a violin plot of the differences in K-turn RNA expression levels among shNC, sh15.5K-1 and sh15.5K-2 cells.

FIG. 11 shows the Western blot of the invention demonstrating the overexpression of FLAG-15.5K in a cell line stably expressing FLAG-15.5K (A) and immunoprecipitation of FLAG-15.5K protein (B), wherein pCGP is a negative control cell.

FIG. 12 is a graph (A) of the results of a comparison of the start point of a known K-turn RNA (i.e., box C/D snoRNA) with the annotated start point, and a graph (B) of the results of a comparison of the end point of a known K-turn RNA (i.e., box C/D snoRNA) with the annotated end point, as identified by the 15.5K RIP-PEN-seq of the present invention.

FIG. 13 shows the two end sites and the full length of the K-turn RNAs in 10 GAS5 introns identified by 15.5K RIP-PEN-seq, visualized using UCSC, wherein Coverage indicates the full length and expression level of the RNA.

FIG. 14 shows the two end sites and the full length of the K-turn RNA bktRNA1 in the CWD19L1 intron identified by 15.5K RIP-PEN-seq, visualized using UCSC, wherein Coverage indicates the full length and expression level of the RNA, and Conservation is the evolutionary conservation of bktRNA1 in 100 vertebrates.

FIG. 15 shows Western blots verifying the separation of cytoplasm (Cytoplasm, Cyto), nucleoplasm (Nucleoplasm, Np) and nucleolus (Nucleolus, No) in HEK293T cells (A) and HCT116 cells (B).

FIG. 16 shows the two end sites and the full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in each cell fraction of HEK293T cells.

FIG. 17 shows the two end sites and the full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in HCT116 cells.

FIG. 18 is a heat map of the expression levels of K-turn RNA in the individual cell fractions in the HEK293T and HCT116 sub-PEN-seq data.

FIG. 19 is a violin plot of the differences in K-turn RNA expression levels between cell fractions in the HEK293T and HCT116 sub-PEN-seq data.

Detailed Description

The conception and the technical effects produced by the present invention will be clearly and completely described in conjunction with the embodiments below to fully understand the objects, features and effects of the present invention. It is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and that other embodiments obtained by those skilled in the art without inventive effort are within the scope of the present invention based on the embodiments of the present invention.

In the description of the present invention, the descriptions of the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

In an embodiment of the invention, the solvent of the cell membrane lysis buffer is 10 mM Tris-HCl buffer (pH 7.5), and the solutes and their concentrations are as follows: 10 mM NaCl, 3 mM MgCl₂, 0.3% (by volume) NP-40, 10% (by volume) glycerol, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor and 400 μM Ribonucleoside Vanadyl Complex, wherein RiboLock RNase Inhibitor is a Thermo Fisher product, cat# EO0381, and Ribonucleoside Vanadyl Complex is an NEB product, cat# S1402S.

In an embodiment of the invention, the S1 sucrose solution contains: 0.25 M sucrose, 10 mM MgCl₂, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor and 400 μM Ribonucleoside Vanadyl Complex. The S2 sucrose solution contains: 0.34 M sucrose, 5 mM MgCl₂, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor and 400 μM Ribonucleoside Vanadyl Complex. The S3 sucrose solution contains: 0.88 M sucrose, 5 mM MgCl₂, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor and 400 μM Ribonucleoside Vanadyl Complex.

In an embodiment of the invention, the solvent of the RIP binding solution is 50 mM Tris-HCl buffer (pH 7.5), and the solutes are as follows: 150 mM NaCl, 1 mM MgCl₂, 0.05% (by volume) NP-40, 20 mM EDTA-Na₂, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor and 1× Protease Inhibitor Cocktail.

Where specific conditions are not noted in the examples, the procedures are carried out under conventional conditions or the conditions recommended by the manufacturer. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.

In an embodiment of the invention, the PEN-seq sequencing library is constructed from total RNA of cellular or tissue origin; the RIP-PEN-seq sequencing library is constructed from RNA immunoprecipitated with an RNA-binding protein; and the sub-PEN-seq sequencing library is constructed from RNA of individual cell fractions.

In the embodiment of the invention, the basic process of sequencing library construction is shown in FIG. 1 and comprises the following steps. After RNA is obtained, an adenylated DNA linker with random bases is ligated to the 3' end of the ncRNA, and an RNA linker with random bases is ligated to the 5' end. Specific DNA probes are then designed against non-target RNAs (such as rRNA, snRNA and snoRNA). After the ligation products are annealed with the DNA probes, the non-target RNAs are digested with RNase H and the DNA probes are digested with the single-stranded DNA 5'-3' exonuclease RecJf, which enriches the target ncRNA ligation products. The target RNA ligation products are then reverse transcribed into cDNA using a truncated reverse transcription primer, and the cDNA is amplified using primers containing anchor bases to obtain a sequencing library, which is then subjected to paired-end PE150 high-throughput sequencing.

The method for capturing the non-coding RNA to construct the sequencing library, combined with the novel double-end sequencing technology, can effectively improve the ratio of sequencing sequences (reads) of target ncRNA in data and reduce the sequencing cost.

Example 1 method for construction of sequencing library

The embodiment of the invention provides a method for constructing a sequencing library, which comprises the steps of cell culture, total RNA extraction, DNase removal of genomic DNA in RNA, 3'DNA joint connection and residual joint removal, 5' RNA joint connection, non-target RNA removal, reverse transcription, library amplification and the like.

1. Extraction of total RNA from cells

When the cells in a 6-well plate had grown to about 90% confluency, the medium was discarded and 1 mL of TRIzol was added. After lysis at room temperature for 10 minutes, the lysate was transferred to a 1.5 mL centrifuge tube, 200 μL of chloroform was added, and the mixture was vortexed for 15 seconds and left at room temperature for 3 minutes. After centrifugation at 13000 × g for 10 minutes at 4 °C, 500 μL of the supernatant was collected, 500 μL of isopropanol was added, and the mixture was mixed well and precipitated at room temperature for 10 minutes. The sample was then centrifuged at 20000 × g for 10 minutes at 4 °C and the supernatant was discarded. The RNA pellet was washed with 1 mL of 75% ethanol (prepared with DEPC water), centrifuged at 20000 × g for 5 minutes at 4 °C, and the supernatant was discarded; this wash was repeated once. The pellet was then washed once with 1 mL of absolute ethanol, centrifuged at 20000 × g for 5 minutes at 4 °C, and the supernatant was discarded. The RNA pellet was dried under vacuum and dissolved in 30 μL of DEPC water, and the RNA concentration was determined using NanoDrop (acceptable total RNA must meet the following criteria: 28S rRNA:18S rRNA approximately equal to 2, A260/A280 greater than 2, and A260/A230 greater than 2). Acceptable total RNA samples were either used directly in the next experiment or stored at -80 °C.
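The acceptance criteria for total RNA can be written as a small check. This is a sketch only: the function name `passes_rna_qc` is hypothetical, and the `tolerance` used to interpret "approximately equal to 2" is an assumption, since the text does not state one.

```python
def passes_rna_qc(ratio_28s_18s, a260_a280, a260_a230, tolerance=0.2):
    """Check the acceptance criteria listed for total RNA.

    tolerance is a hypothetical allowance for "approximately equal to 2";
    the text does not give a numerical value.
    """
    return (
        abs(ratio_28s_18s - 2.0) <= tolerance
        and a260_a280 > 2.0
        and a260_a230 > 2.0
    )


# Hypothetical measurements for two samples.
print(passes_rna_qc(2.1, 2.05, 2.2))
print(passes_rna_qc(1.4, 2.05, 2.2))
```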

2. Removal of genomic DNA from Total RNA

μg of total RNA sample was taken, and 5 μL of 10× RQ1 DNase Reaction Buffer (the reaction buffer supplied with the RQ1 RNase-Free DNase kit), 2.5 μL of RiboLock RNase Inhibitor and 5 μL of RQ1 RNase-Free DNase (Promega, cat# M6101) were added, followed by DEPC water to 50 μL; the reaction was carried out at 37 °C for 30 minutes. The RNA from the above reaction was then purified using RNA Clean & Concentrator-5 and eluted with 12 μL of DEPC water. After the concentration was measured using NanoDrop, the sample was either used directly for the subsequent linker ligation or stored at -80 °C.

3. 3' DNA linker ligation

(1) Adenylation treatment of 3' DNA linkers

To 500 pmol of 3' DNA linker were added 10 μL of 10× 5' DNA Adenylation Reaction Buffer (the buffer supplied with Mth RNA Ligase), 10 μL of 1 mM ATP and 10 μL of Mth RNA Ligase (NEB, cat# M2610), followed by DEPC water to 100 μL; the reaction was carried out at 65 °C for 2 hours and the enzyme was inactivated at 85 °C for 10 minutes. The DNA linker was then purified using Oligo Clean & Concentrator and eluted with 20 μL of DEPC water. After the concentration was measured using NanoDrop, the concentration of the DNA linker was adjusted to 20 μM to give the adenylated 3' DNA linker, wherein the adenylated 3' DNA linker is: rAppNNNNNNTGGAATTCTCGGGTGCCAAGG-C3 Spacer, wherein rApp is the adenylation modification, NNNNNN is six random deoxyribonucleotides, N represents any one of the four deoxyribonucleotides A, T, C and G, and the C3 Spacer is a blocking group.
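Adjusting the purified linker to 20 μM requires converting the NanoDrop mass concentration into a molar concentration and then diluting. The following sketch shows that arithmetic; the function names, the molar mass of 9000 g/mol and the reading of 225 ng/μL are hypothetical values chosen for illustration, and the actual molar mass should be taken from the oligonucleotide supplier.

```python
def ng_per_ul_to_um(ng_per_ul, mw_g_per_mol):
    """Convert a mass concentration (ng/uL) to a molar concentration (uM)."""
    return ng_per_ul * 1000.0 / mw_g_per_mol


def water_to_add_ul(volume_ul, conc_um, target_um=20.0):
    """Volume of water needed to dilute volume_ul from conc_um to target_um."""
    if conc_um < target_um:
        raise ValueError("sample is already below the target concentration")
    return volume_ul * (conc_um / target_um - 1.0)


# Upper bound on the eluate concentration: all 500 pmol recovered in 20 uL.
max_conc_um = 500.0 / 20.0  # pmol per uL is numerically equal to uM

# Hypothetical reading: 225 ng/uL for a linker with an assumed molar mass of 9000 g/mol.
measured_um = ng_per_ul_to_um(225.0, 9000.0)
print(max_conc_um, measured_um, water_to_add_ul(20.0, measured_um))
```

Since 500 pmol eluted in 20 μL can give at most 25 μM, a target of 20 μM is reachable only if most of the linker is recovered during purification.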

(2) 3' DNA linker ligation

500 ng of total RNA from which genomic DNA had been removed was taken, DEPC water was added to 10.5 μL, and 0.5 μL of the adenylated 3' DNA linker was added. After mixing, the mixture was denatured at 70 °C for 2 minutes and then immediately placed on ice for 2 minutes. Then 2 μL of 10× T4 RNA Ligase 2, truncated KQ reaction buffer (the reaction buffer supplied with T4 RNA Ligase 2, truncated KQ), 5 μL of PEG 8000 (50%), 1 μL of RiboLock RNase Inhibitor and 1 μL of T4 RNA Ligase 2, truncated KQ (NEB, cat# M0373) were added, and after mixing, the reaction was carried out at 16 °C for 18 hours.
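The component volumes of the 3' ligation reaction can be tallied to check the final reaction volume and the resulting concentrations. The sketch below only restates the volumes given above; the dictionary keys and variable names are hypothetical, and the 20 μM linker stock concentration is taken from the adenylation step.

```python
# Volumes in microlitres, as listed for the 3' DNA linker ligation.
ligation_3p_ul = {
    "RNA (500 ng) in DEPC water": 10.5,
    "adenylated 3' DNA linker (20 uM)": 0.5,
    "10x ligase reaction buffer": 2.0,
    "PEG 8000 (50%)": 5.0,
    "RiboLock RNase Inhibitor": 1.0,
    "T4 RNA Ligase 2, truncated KQ": 1.0,
}

total_ul = sum(ligation_3p_ul.values())

linker_pmol = 0.5 * 20.0                      # uL x uM = pmol
linker_final_um = linker_pmol / total_ul      # pmol / uL = uM
buffer_final_x = 2.0 * 10.0 / total_ul        # fold concentration of the buffer
peg_final_percent = 5.0 * 50.0 / total_ul     # percent PEG 8000 in the reaction

print(total_ul, linker_pmol, linker_final_um, buffer_final_x, peg_final_percent)
```

The volumes add up to 20 μL, which brings the 10× buffer to 1× and gives 10 pmol of linker (0.5 μM) and 12.5% PEG 8000 in the reaction.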

(3) Removal of residual joints

2 μL of 5' Deadenylase (NEB, cat# M0331) was added to the reaction system and, after mixing, reacted at 30 °C for 1 hour; 2 μg of single-stranded DNA binding protein (Promega, cat# M3011) was then added and, after mixing, incubated on ice for 30 minutes; finally, 2 μL of RecJf (NEB, cat# M0264) was added and, after mixing, reacted at 37 °C for 1 hour.

4. 5' RNA linker ligation

2 μL (40 pmol) of denatured 5' RNA linker, 2 μL of 10× T4 RNA Ligase reaction buffer (the reaction buffer supplied with T4 RNA Ligase 1), 2.56 μL of PEG 8000 (50%), 1 μL of RiboLock RNase Inhibitor, 4 μL of 10 mM ATP and 4 μL of T4 RNA Ligase 1 (NEB, cat# M0204) were added to the reaction system from which the residual linker had been removed. After mixing, the reaction was carried out at 16 °C for 18 hours. The RNA was then purified using RNA Clean & Concentrator-5 and eluted with 12 μL of DEPC water to give the RNA ligation product, wherein the nucleotide sequence of the 5' RNA linker is: GUUCAGAGUCUACAGUCCGACGAUCNNNNNN, wherein NNNNNN represents six random ribonucleotides and N represents any one of the four ribonucleotides A, U, C and G.

5. Removal of non-target RNA

11.2 μL of the RNA ligation product obtained above was mixed with 0.8 μL of DNA probes (50 μM) targeting non-target RNA and 3 μL of 5× annealing buffer. The mixture was incubated at 95 °C for 2 minutes, cooled to 22 °C at 0.1 °C per second, held at 22 °C for 5 minutes, and then placed on ice.

The DNA probes targeting rRNA follow the published literature (Adiconis, X., et al., Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods, 2013. 10(7): p. 623-9). The DNA probe sequences targeting snRNA are shown in Table 1.
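The annealing step fixes the probe amount and the duration of the temperature program, which can be worked out from the stated values. The sketch below is plain arithmetic on those values; the function and variable names are hypothetical, and instrument-specific ramping behaviour is ignored.

```python
def ramp_seconds(start_c, end_c, rate_c_per_s):
    """Time needed to move between two temperatures at a constant rate."""
    return abs(start_c - end_c) / rate_c_per_s


mix_ul = 11.2 + 0.8 + 3.0            # RNA ligation product + probes + 5x buffer
probe_pmol = 0.8 * 50.0              # uL x uM = pmol
buffer_final_x = 3.0 * 5.0 / mix_ul  # fold concentration of the annealing buffer

hold_95_s = 2 * 60
cooling_s = ramp_seconds(95.0, 22.0, 0.1)
hold_22_s = 5 * 60
total_s = hold_95_s + cooling_s + hold_22_s

print(round(mix_ul, 1), round(probe_pmol, 1), round(buffer_final_x, 2))
print(round(cooling_s), round(total_s), round(total_s / 60, 1))
```

The 15 μL mixture contains 40 pmol of probes in 1× annealing buffer, and cooling from 95 °C to 22 °C at 0.1 °C per second takes about 730 seconds, so the whole program lasts roughly 19 minutes.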

Table 1: DNA probe sequence information targeting snRNA

The sequence of the DNA probe targeting the snoRNA is shown in SEQ ID NO. 30-196 (Table 2).

Table 2: DNA probe sequence information targeting snoRNA


Then, 2 μL of 10× RNase H reaction buffer (the reaction buffer supplied with RNase H), 0.2 μL of RiboLock RNase Inhibitor and 2 μL of RNase H (NEB, cat# M0297) were added to the above reaction system, and DEPC water was added to 20 μL. After mixing, the mixture was reacted at 37 °C for 30 minutes. The RNA was then purified using RNA Clean & Concentrator-5 and eluted with 22 μL of DEPC water.

6. Removal of DNA probes

21.5 μL of the above product, from which non-target RNA had been removed, was denatured at 70 °C for 2 minutes and immediately placed on ice for 2 minutes. Then 3 μL of 10× NEBuffer 2 (the reaction buffer supplied with RecJf), 1 μL of RiboLock RNase Inhibitor and 7 μg of single-stranded DNA binding protein were added and mixed, and the mixture was placed on ice for 30 minutes. μL of RecJf was added and reacted at 37 °C for 1 hour to digest the DNA probes, followed by RNA purification using RNA Clean & Concentrator-5. The sample was eluted with 12 μL of DEPC water and either used directly in the next reaction or stored at -80 °C.

7. Reverse transcription reaction

Taking 11.5. Mu.L of the above DNA probe-removed product, adding 0.5. Mu.L of 40. Mu.M truncated reverse transcription primer, mixing, denaturing at 65℃for 5 minutes, immediately placing on ice, then adding 4. Mu.L of 5 XRT buffer (Thermo Fisher Co., product under the name 18090050), 1. Mu.L of 100mM DTT, 1. Mu.L of 10mM dNTPs, 1. Mu. L RiboLock RNase Inhibitor and 1. Mu. LSuperScript IV Reverse Transcriptase, mixing, and reacting at 50℃for 60 minutes, wherein the nucleotide sequence of the truncated reverse transcription primer is as follows: GCCTTGGCACCCGAGAAT (SEQ ID NO. 197).

To the reaction system, 4 μL of Exonuclease I (NEB, Cat. No. M0293) and 4 μL of rSAP (NEB, Cat. No. M0371) were added, and the reaction was incubated at 37℃ for 15 minutes. Then 5 μL of 0.5 M EDTA and 7 μL of 1 M NaOH were added, mixed well, and incubated at 70℃ for 12 minutes. The cDNA was then purified using Oligo Clean & Concentrator and eluted with 16 μL of DEPC water.
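As an illustration of the primer stated in step 7, the following minimal Python sketch computes the length, GC fraction and reverse complement of SEQ ID NO. 197. The reverse complement is the sequence the primer would anneal to on the template; that this site lies within the 3' DNA linker is an assumption of the sketch, and the helper names are hypothetical.

```python
# Truncated reverse transcription primer from step 7 (SEQ ID NO. 197).
RT_PRIMER = "GCCTTGGCACCCGAGAAT"

# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print("length:", len(RT_PRIMER))
    print("annealing site (5'->3'):", reverse_complement(RT_PRIMER))
    print("GC fraction:", round(gc_fraction(RT_PRIMER), 2))
```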

8. Sequencing adapter ligation

μL of the cDNA obtained above was taken, and 25 μL of NEBNext Ultra II Q5 Master Mix (NEB, Cat. No. M0544), 5 μL of RP1 (10 μM) and 5 μL of RPI-X (10 μM; a series of primers carrying different INDEX sequences and containing bases that anchor to the 3' DNA linker) were added to carry out the PCR reaction.

The primer sequences of RP1 and RPI 1-12 are shown in Table 3.

Table 3: primer sequences

Note: the underlined parts of the table are the inserted INDEX sequences and the part is the thio modification.

The PCR program was: pre-denaturation at 98℃ for 30 seconds; 15 cycles of denaturation at 98℃ for 10 seconds and annealing at 65℃ for 75 seconds; extension at 65℃ for 5 minutes; and hold at 4℃.
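The PCR program above can be written down as a small data structure, for example to check the total programmed hold time. This is a minimal sketch: the class and function names are hypothetical, and ramp times between temperatures (which depend on the thermocycler) are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the PCR program described in the text.
PRE_DENATURATION = Step("pre-denaturation", 98, 30)
CYCLE = [Step("denaturation", 98, 10), Step("annealing", 65, 75)]
N_CYCLES = 15
FINAL_EXTENSION = Step("final extension", 65, 300)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding ramping and the 4 C hold."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return PRE_DENATURATION.seconds + N_CYCLES * per_cycle + FINAL_EXTENSION.seconds


if __name__ == "__main__":
    seconds = total_hold_seconds()
    print(f"{seconds} s (about {seconds / 60:.1f} min)")
```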

After the PCR reaction was complete, the product was purified using DNA Clean & Concentrator-5 and eluted with 20 μL of nuclease-free water. It was then electrophoresed on 4% low-melting agarose, the bands in the 150 to 700 bp range were recovered using the Zymoclean Gel DNA Recovery Kit and eluted with 18 μL of DEPC water, and the concentration of the gel-recovered product was measured using NanoDrop, giving the sequencing library.

After the sequencing library was obtained, PE150 paired-end sequencing was performed on an Illumina HiSeq X Ten sequencer, and both ends of the non-coding RNAs were identified and their full-length sequences analyzed.

Example 2 Construction method of PEN-seq sequencing library

In this example, the PEN-seq sequencing library was constructed from total RNA of HEK293T cells with stable 15.5K knockdown and sequenced, by the following procedure.

(I) Culture of HEK293T cells with stable 15.5K knockdown

In HEK293T cells, cell lines with inducible silencing of 15.5K (sh15.5K-1 and sh15.5K-2, plus the control shNC) were constructed using the miR-E approach described in the published article (Fellmann, C., et al., An optimized microRNA backbone for effective single-copy RNAi. Cell Rep, 2013. 5(6): p. 1704-13). The three cell lines were seeded in 6-well plates, and after 24 hours of culture, doxycycline (Selleck, Cat. No. S5159) was added to a final concentration of 3 μM, and culture was continued for 48 hours.

(II) Extraction of total RNA from HEK293T cells with stable 15.5K knockdown

After the medium was discarded, 1 mL of TRIzol was added and the cells were lysed at room temperature for 10 minutes. The lysate was transferred to a 1.5 mL centrifuge tube, 200 μL of chloroform was added, and the mixture was vortexed for 15 seconds and left at room temperature for 3 minutes. After centrifugation at 13000 ×g for 10 minutes at 4℃, 500 μL of supernatant was retained, 500 μL of isopropanol was added, and the mixture was mixed well and precipitated at room temperature for 10 minutes. It was then centrifuged at 20000 ×g for 10 minutes at 4℃ and the supernatant was discarded. The RNA pellet was washed with 1 mL of 75% ethanol (prepared with DEPC water), centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded; this wash was repeated once. The pellet was then washed once with 1 mL of absolute ethanol, centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. The RNA pellet was dried under vacuum and dissolved in 30 μL of DEPC water, and the RNA concentration was measured using NanoDrop (total RNA must meet the following criteria: 28S rRNA:18S rRNA approximately equal to 2, A260/A280 greater than 2, A260/A230 greater than 2).
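The acceptance criteria stated at the end of this step can be expressed as a simple check. The sketch below is illustrative only: the function name is hypothetical, and the tolerance used to interpret "approximately equal to 2" for the 28S:18S ratio is an assumption, since the text gives no numeric tolerance.

```python
from typing import Optional


def passes_rna_qc(
    a260_a280: float,
    a260_a230: float,
    ratio_28s_18s: Optional[float] = None,
    tolerance: float = 0.2,  # assumed meaning of "approximately equal to 2"
) -> bool:
    """Return True if the sample meets the stated purity/integrity criteria."""
    ok = a260_a280 > 2.0 and a260_a230 > 2.0
    if ratio_28s_18s is not None:
        # Integrity check is only applied when a 28S:18S ratio is supplied.
        ok = ok and abs(ratio_28s_18s - 2.0) <= tolerance
    return ok


if __name__ == "__main__":
    print(passes_rna_qc(2.05, 2.10, ratio_28s_18s=1.9))
    print(passes_rna_qc(1.80, 2.10, ratio_28s_18s=2.0))
```

The 28S:18S argument is optional so that the same check can be applied to samples for which only absorbance ratios are specified.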

(III) Construction of the PEN-seq library from stable 15.5K knockdown cells

The PEN-seq sequencing library in this example was constructed following steps 2-8 of Example 1.

(IV) high throughput sequencing

PE150 paired-end sequencing of the PEN-seq library constructed above was performed on an Illumina HiSeq X Ten sequencer.

Example 3 Construction of RIP-PEN-seq sequencing library

The RIP-PEN-seq library in this example is constructed from RNA immunoprecipitated with an RNA-binding protein. The procedure comprises culture of HEK293T cells stably expressing FLAG-15.5K, cell lysis, RNA immunoprecipitation, isolation of the RNA interacting with FLAG-15.5K, and library construction and sequencing.

(I) Construction of a cell line stably overexpressing FLAG-15.5K and cell harvesting

HEK293T cells stably expressing FLAG-15.5K were constructed using a lentiviral vector and expanded. When the cells in the culture dish reached about 90% confluence, the medium was discarded and the cells were washed twice with pre-cooled DPBS. After the DPBS was discarded, 3 mL of pre-cooled DPBS was added, the cells were collected into a centrifuge tube with a cell scraper and centrifuged at 1000 ×g for 5 minutes at 4℃, and the DPBS supernatant was discarded to obtain the cell pellet.

(II) cell lysis and RNA immunoprecipitation

An equal volume of cell lysis buffer was added to the cell pellet, and the pellet was resuspended with a pipette and incubated on ice for 15 minutes. After centrifugation at 15000 ×g for 15 minutes at 4℃, the supernatant (cell lysate) was retained. Dynabeads Protein G magnetic beads were added to the cell lysate at 1/20 of its volume and rotated at 4℃ for 30 minutes, and the beads were then separated from the lysate on a magnetic rack. The lysate was diluted 10-fold with RIP binding solution, the antibody against the RNA-binding protein (here, a FLAG antibody targeting FLAG-15.5K) was added at 5 μg per mL of diluted lysate, and the mixture was rotated at 4℃ for 12 hours. Dynabeads Protein G (Thermo Fisher, Cat. No. 10004D) was then added at 10 μL/g antibody, and incubation with rotation was continued at 4℃ for 3 hours.
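The proportions given in this step can be turned into a small calculator for a given lysate volume. This is a minimal sketch with a hypothetical function name. It assumes that "diluted 10-fold" means a final volume of ten times the lysate volume, and it does not compute the capture-bead volume, because the bead-to-antibody ratio as printed is left as stated in the text.

```python
def rip_amounts(lysate_ml: float) -> dict:
    """Scale pre-clearing beads, dilution volume and antibody to the lysate volume."""
    preclear_beads_ul = lysate_ml * 1000 / 20   # 1/20 volume of the lysate
    diluted_lysate_ml = lysate_ml * 10          # assumed: 10-fold final volume
    antibody_ug = 5 * diluted_lysate_ml         # 5 ug per mL of diluted lysate
    return {
        "preclear_beads_ul": preclear_beads_ul,
        "diluted_lysate_ml": diluted_lysate_ml,
        "antibody_ug": antibody_ug,
    }


if __name__ == "__main__":
    for key, value in rip_amounts(1.0).items():
        print(key, value)
```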

(III) FLAG-15.5K interaction RNA isolation

After the incubation, the magnetic beads were separated from the solution on a magnetic rack and the solution was discarded. 1 mL of RIP washing solution was then added to the beads, which were rotated at room temperature for 3 minutes, and the solution was discarded using the magnetic rack. The wash was repeated 4 more times. Then 1 mL of TRIzol (Thermo Fisher, Cat. No. 15596018) was added to the washed beads, mixed, and left at room temperature for 5 minutes; 200 μL of chloroform was added, and the mixture was vortexed for 15 seconds and left at room temperature for 3 minutes. After centrifugation at 13000 ×g for 10 minutes at 4℃, 500 μL of supernatant was retained, 500 μL of isopropanol and 4 μL of glycogen (Thermo Fisher, Cat. No. AM9510) were added, and the mixture was mixed well and precipitated overnight at -20℃. The sample was centrifuged at 20000 ×g for 30 minutes at 4℃; the RNA pellet was then washed with 1 mL of 75% ethanol (prepared with DEPC water), centrifuged at 20000 ×g for 5 minutes, and the supernatant was discarded; this wash was repeated once. The pellet was then washed once with 1 mL of absolute ethanol, centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. The RNA pellet was dried under vacuum and dissolved in 30 μL of DEPC water, and the RNA concentration was measured using NanoDrop (A260/A280 greater than 2, A260/A230 greater than 2). Qualified RNA samples were either used directly in the next experiment or stored at -80℃.

(IV) RIP-PEN-seq sequencing library preparation and sequencing

The RIP-PEN-seq sequencing library in this example was prepared and sequenced following steps 2-8 of Example 1.

Example 4 Construction of a sub-PEN-seq sequencing library

The sub-PEN-seq sequencing library in this example is constructed from RNA derived from each subcellular fraction, by the following procedure.

(I) HEK293T and HCT116 cell culture and collection

HEK293T and HCT116 cells cultured in the laboratory were used as samples, with a starting amount of 3 × 10⁷ cells. After the cells had grown to about 90% confluence in the culture dish, the medium was discarded and the cells were washed twice with DPBS solution (pH 7.4). The cells were then digested with trypsin; after digestion was terminated with serum-containing medium, the cell suspension was collected in conical tubes, placed on ice, and centrifuged at 500 ×g for 5 minutes at 4℃. The supernatant was discarded, the cells were resuspended in pre-chilled DPBS and counted, and the relative volume (RV) of the cells was determined.

(II) cytoplasmic RNA isolation

Pre-chilled cell membrane lysis buffer was added at 15 times the relative volume, and the cells were resuspended with a pipette, gently mixed and placed on ice for 10 minutes. After gentle vortexing and centrifugation at 1000 ×g for 3 minutes at 4℃, the supernatant (the cytoplasmic fraction) was transferred to a new centrifuge tube; the pellet is the nuclear fraction. For the cytoplasmic fraction, 950 μL of absolute ethanol and 50 μL of 3 M sodium acetate (pH 5.5; Thermo Fisher, Cat. No. AM9740) were added per 330 μL of cytoplasmic fraction, mixed well, and precipitated at -20℃ for 2 hours. The sample was then centrifuged at 18000 ×g for 15 minutes at 4℃ and the supernatant was discarded. 1 mL of 75% ethanol was added, and the pellet was washed by vortexing and centrifuged at 18000 ×g for 5 minutes at 4℃. After the supernatant was removed, the pellet was briefly air-dried and lysed in 1 mL of TRIzol; 10 μL of 0.5 M EDTA was added and the sample was incubated at 65℃ for 10 minutes to fully dissolve the RNA. RNA was then extracted with chloroform to obtain the cytoplasmic RNA.
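Because the reagents in this step are specified relative to the cell relative volume (RV) and per 330 μL of fraction, the amounts can be scaled as in the following minimal sketch. The function names are hypothetical, and the example volumes in the usage lines are illustrative only.

```python
def reagent_volume_ul(relative_volume_ul: float, fold: float) -> float:
    """Volume of a reagent added at `fold` times the cell relative volume (RV)."""
    return relative_volume_ul * fold


def precipitation_mix(fraction_ul: float) -> dict:
    """Ethanol and 3 M sodium acetate for a fraction, scaled from the
    stated ratio of 950 uL ethanol + 50 uL sodium acetate per 330 uL."""
    scale = fraction_ul / 330.0
    return {"ethanol_ul": 950 * scale, "sodium_acetate_ul": 50 * scale}


if __name__ == "__main__":
    # Illustrative inputs: RV of 100 uL, 660 uL of recovered fraction.
    print(reagent_volume_ul(100, 15))   # lysis buffer at 15 x RV
    print(precipitation_mix(660))
```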

(III) Isolation of nucleoplasmic RNA

The nuclei were washed by adding pre-chilled cell membrane lysis buffer at 30 times the relative volume to the nuclear fraction from (II) above and centrifuging at 200 ×g for 2 minutes at 4℃; this step was repeated once. Cell membrane lysis buffer was then added at 30 times the relative volume, and after resuspension the nuclei were centrifuged at 1200 ×g for 5 minutes at 4℃ and the supernatant was discarded. The nuclei were resuspended in 10 relative volumes of S1 sucrose solution and added to 10 relative volumes of S3 sucrose solution. After centrifugation at 1200 ×g for 10 minutes at 4℃, the pellet obtained is the purified nuclei. The purified nuclear pellet was resuspended in 10 relative volumes of S2 sucrose solution, transferred to a new tube, and sonicated under the following conditions: 50% power, 15 seconds on, 45 seconds off, 7 pulses. The resuspension was then added to 10 relative volumes of S3 sucrose solution and centrifuged at 2000 ×g for 20 minutes at 4℃. The supernatant contains the nucleoplasm and the pellet contains the nucleoli. The nucleoplasmic supernatant was collected, 950 μL of absolute ethanol and 50 μL of 3 M sodium acetate were added per 330 μL, and the mixture was mixed well and precipitated at -20℃ for 2 hours; nucleoplasmic RNA was then extracted following the method in (II).

(IV) Isolation of nucleolar RNA

The nucleolar pellet was resuspended in 500 μL of S sucrose solution, centrifuged at 2000 ×g for 5 minutes at 4℃, and the supernatant was removed. 1 mL of TRIzol was added, the pellet was lysed at room temperature for 10 minutes, and RNA was extracted with chloroform to obtain the nucleolar RNA.

(V) Preparation and sequencing of the sub-PEN-seq library

The sub-PEN-seq sequencing library in this example was prepared and sequenced following steps 2-8 of Example 1.

Application example 1 construction of PEN-seq sequencing library and sequencing analysis

In this application example, total RNA was isolated from the three doxycycline-treated stable cell lines HEK293T-shNC, sh15.5K-1 and sh15.5K-2, libraries were prepared and sequenced using the PEN-seq construction method of Example 2, and the data were then analyzed.

1. Data analysis method

The data analysis procedure for identifying the two ends and the full-length sequences of non-coding RNAs from PEN-seq sequencing libraries is shown in FIGS. 2 and 3. Specifically, after the raw high-throughput paired-end sequencing data of the PEN-seq library are obtained, adapter sequences and low-quality sequences are first removed using Cutadapt (v2.8). The filtered data are then aligned to the human reference genome (hg38) using the alignment software STAR (v2.7.1a). The BAM file of the alignment results is read using SAMtools, and cluster analysis is performed based on the overlap between sequences. Finally, the end sites (start and end points) and full-length sequences of the non-coding RNAs are determined from the clustering results.
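The overlap-based clustering step can be sketched as follows. This is not the implementation shown in FIGS. 2 and 3, which is not reproduced in the text; it is a minimal illustration that assumes each aligned read pair has already been reduced to one fragment interval, and that the start and end of each RNA are called as the most frequent fragment start and end within a cluster. All names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    chrom: str
    strand: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def cluster_fragments(fragments: List[Fragment]) -> List[List[Fragment]]:
    """Group fragments on the same chromosome and strand that overlap."""
    by_key = defaultdict(list)
    for frag in fragments:
        by_key[(frag.chrom, frag.strand)].append(frag)

    clusters = []
    for key in sorted(by_key):
        current, current_end = [], None
        for frag in sorted(by_key[key], key=lambda f: (f.start, f.end)):
            if current and frag.start < current_end:
                # Overlaps the running cluster: extend it.
                current.append(frag)
                current_end = max(current_end, frag.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [frag], frag.end
        if current:
            clusters.append(current)
    return clusters


def call_ends(cluster: List[Fragment]) -> Tuple[int, int]:
    """Assumed rule: the most frequent start and end define the RNA ends."""
    start = Counter(f.start for f in cluster).most_common(1)[0][0]
    end = Counter(f.end for f in cluster).most_common(1)[0][0]
    return start, end
```

In practice the fragment intervals would be derived from the BAM records; how that is done here is left open, since the text only states that the BAM file is read with SAMtools.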

The PEN-seq data analysis results can be displayed visually using IGV or the UCSC Genome Browser. In addition, the PEN-seq results can be used to screen for K-turn RNAs based on their structural characteristics; the computational workflow for identifying K-turn RNAs from the K-turn structural motifs they contain is shown in FIG. 4.
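A sequence-level pre-filter for K-turn RNA candidates might look like the sketch below. It only searches for the C box (RUGAUGA) and D box (CUGA) motifs and the typical 60 to 200 nt length described in the Background; it does not model the three-dimensional K-turn structure and is not the workflow of FIG. 4. The function name and the search window near each end are assumptions.

```python
import re
from typing import Optional, Tuple

C_BOX = re.compile(r"[AG]UGAUGA")  # RUGAUGA, where R is A or G
D_BOX = re.compile(r"CUGA")


def find_box_cd_motifs(
    rna: str, window: int = 20, min_len: int = 60, max_len: int = 200
) -> Optional[Tuple[int, int]]:
    """Return (C box start, D box start) if a C box lies within `window` nt of
    the 5' end and a D box starts within `window` nt of the 3' end, else None."""
    seq = rna.upper().replace("T", "U")  # accept DNA-alphabet input
    if not (min_len <= len(seq) <= max_len):
        return None
    c_hit = C_BOX.search(seq, 0, window)
    d_hits = list(D_BOX.finditer(seq, max(0, len(seq) - window)))
    if c_hit is None or not d_hits:
        return None
    # Use the D box closest to the 3' end.
    return c_hit.start(), d_hits[-1].start()


if __name__ == "__main__":
    # Synthetic sequence for illustration only, not a real snoRNA.
    example = "GG" + "AUGAUGA" + "C" * 60 + "CUGA" + "GG"
    print(find_box_cd_motifs(example))
```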

2. Data analysis results

First, the effect of stable 15.5K knockdown in HEK293T cells was examined by qPCR and Western blot; the results are shown in FIG. 5. The shNC, sh15.5K-1 and sh15.5K-2 HEK293T stable cell lines were each treated with DMSO or doxycycline for 48 hours, and RNA and protein were collected. The qPCR results show that, compared with the negative control shNC cells, expression of 15.5K was significantly reduced in sh15.5K-1 and sh15.5K-2 cells after doxycycline induced the shRNA targeting 15.5K, and the Western blot results show that the protein level was also significantly reduced.

The end sites and full-length sequences of known and novel K-turn RNAs were identified based on the three-dimensional structural features of the K-turn, and compared with the annotated start and end positions of known K-turn RNAs. The comparison between the start points of known K-turn RNAs (i.e., box C/D snoRNAs) identified using the PEN-seq sequencing library and the annotated start points is shown in FIG. 6; the results show that the PEN-seq sequencing library can accurately identify the start points of known K-turn RNAs. The comparison between the identified end points of known K-turn RNAs (i.e., box C/D snoRNAs) and the annotated end points is shown in FIG. 7; the results show that the PEN-seq sequencing library can also accurately identify the end points of known K-turn RNAs.

The above results indicate that the full-length sequence of the known K-turn RNA can be accurately identified in PEN-seq libraries constructed based on HEK293T-shNC cells.
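A comparison of identified and annotated ends of this kind can be summarized as a distribution of offsets, as in the following minimal sketch. The function name, the input layout (a mapping from RNA name to start and end coordinates) and the example names are hypothetical; the figures themselves are not reproduced here.

```python
from collections import Counter
from typing import Dict, Tuple

Coords = Dict[str, Tuple[int, int]]  # RNA name -> (start, end)


def end_offsets(identified: Coords, annotated: Coords) -> Tuple[Counter, Counter]:
    """Count (identified - annotated) offsets for starts and ends.
    An offset of 0 means the identified end matches the annotation exactly."""
    start_offsets, stop_offsets = Counter(), Counter()
    for name, (ann_start, ann_end) in annotated.items():
        if name not in identified:
            continue  # annotated RNA not detected in the library
        obs_start, obs_end = identified[name]
        start_offsets[obs_start - ann_start] += 1
        stop_offsets[obs_end - ann_end] += 1
    return start_offsets, stop_offsets


if __name__ == "__main__":
    annotated = {"rna_A": (100, 180), "rna_B": (500, 560), "rna_C": (900, 975)}
    identified = {"rna_A": (100, 180), "rna_B": (500, 562)}
    print(end_offsets(identified, annotated))
```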

In addition, a set of novel K-turn RNAs was identified by high-throughput sequencing of the PEN-seq sequencing library constructed according to the present invention. As shown in FIG. 8, the level of bktRNA1 decreased in the 15.5K-silenced cell lines sh15.5K-1 and sh15.5K-2 (Coverage indicates the full length and expression level of the RNA). This further shows that the PEN-seq sequencing library can clearly determine the start and end points of bktRNA1, and that its expression decreases significantly after 15.5K is silenced, which is consistent with the existing report (Watkins, N.J., A. Dickmanns, and R. Luhrmann, Conserved stem II of the box C/D motif is essential for nucleolar localization and is required, along with the 15.5K protein, for the hierarchical assembly of the box C/D snoRNP. Mol Cell Biol, 2002. 22(23): p. 8342-52) that 15.5K promotes the processing of K-turn RNA.

For the newly identified set of K-turn RNAs, their expression levels in shNC, sh15.5K-1 and sh15.5K-2 cells were further compared. A heat map of the expression levels of K-turn RNAs in shNC, sh15.5K-1 and sh15.5K-2 cells is shown in FIG. 9 (RPM: reads per million reads); like bktRNA1, the expression levels of these K-turn RNAs were significantly down-regulated when 15.5K was silenced. The changes in K-turn RNA expression levels in shNC, sh15.5K-1 and sh15.5K-2 were also analyzed using violin plots; the results, shown in FIG. 10, indicate that silencing 15.5K significantly down-regulates the expression level of K-turn RNA.
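The RPM normalization mentioned above can be illustrated with a short sketch. The function name is hypothetical, the read counts in the usage lines are made-up illustrative numbers, and the choice of total mapped reads as the denominator is an assumption, since the text does not specify it.

```python
def rpm(read_count: int, total_reads: int) -> float:
    """Reads per million: reads assigned to one RNA, scaled to a library
    of one million reads (total mapped reads assumed as the denominator)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return read_count * 1_000_000 / total_reads


if __name__ == "__main__":
    # Illustrative numbers only, not data from the application examples.
    control = rpm(50, 2_000_000)
    knockdown = rpm(10, 1_600_000)
    print(control, knockdown)
```

Normalizing to library size in this way allows expression levels to be compared between libraries of different sequencing depth.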

In conclusion, these results show that constructing a PEN-seq sequencing library with the method of the invention and performing sequencing analysis effectively captures full-length non-coding RNA sequence information, which is of great significance for studying non-coding RNA at the transcript level.

Application example 2 construction of RIP-PEN-seq sequencing library and sequencing analysis

In this application example, HEK293T-FLAG-15.5K cells stably expressing FLAG-15.5K were collected, FLAG-15.5K was immunoprecipitated using a FLAG antibody, libraries were prepared and sequenced using the RIP-PEN-seq construction method of Example 3, and the data were then analyzed.

1. Data analysis method

The data analysis procedure for identifying the two ends and the full-length sequences of non-coding RNAs from RIP-PEN-seq is as follows. First, adapter sequences and low-quality sequences in the raw paired-end sequencing data of the PEN-seq sequencing library are removed using Cutadapt (v2.8). The filtered data are then aligned to the human reference genome (hg38) using the alignment software STAR (v2.7.1a). The BAM file of the alignment results is read using SAMtools, and cluster analysis is performed based on the overlap between sequences. Finally, the end sites (start and end points) and full-length sequences of the non-coding RNAs are determined from the clustering results. The RIP-PEN-seq data analysis results can be displayed visually using IGV or the UCSC Genome Browser. In addition, the RIP-PEN-seq results are used to determine the K-turn RNAs that interact with 15.5K.
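The trimming and alignment steps named above could be assembled as command lines along the following lines. This is a sketch under stated assumptions: the text gives only the tool names and versions, so the adapter sequences, quality and length thresholds, thread count, file names and genome index path are all placeholders, and the Python helper names are hypothetical. Only commonly used Cutadapt and STAR options are shown.

```python
from typing import List


def cutadapt_cmd(r1: str, r2: str, out1: str, out2: str,
                 adapter_r1: str, adapter_r2: str,
                 min_quality: int = 20, min_length: int = 18) -> List[str]:
    """Paired-end adapter and quality trimming (thresholds are assumptions)."""
    return ["cutadapt",
            "-a", adapter_r1,          # 3' adapter on read 1 (placeholder)
            "-A", adapter_r2,          # 3' adapter on read 2 (placeholder)
            "-q", str(min_quality),    # trim low-quality 3' ends
            "-m", str(min_length),     # drop pairs that become too short
            "-o", out1, "-p", out2,
            r1, r2]


def star_cmd(genome_dir: str, r1: str, r2: str, prefix: str,
             threads: int = 8) -> List[str]:
    """Alignment to a pre-built STAR index (e.g. one built from hg38)."""
    return ["STAR",
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", r1, r2,
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", prefix]


if __name__ == "__main__":
    # Placeholder adapters and paths; pass the lists to subprocess.run to execute.
    print(" ".join(cutadapt_cmd("raw_R1.fastq", "raw_R2.fastq",
                                "trim_R1.fastq", "trim_R2.fastq",
                                "ADAPTER_R1", "ADAPTER_R2")))
    print(" ".join(star_cmd("hg38_star_index", "trim_R1.fastq",
                            "trim_R2.fastq", "sample_")))
```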

2. Data analysis results

HEK293T-FLAG-15.5K cells stably expressing FLAG-15.5K were collected, and the overexpression of FLAG-15.5K in the stable cell line was verified by Western blot, with pCGP as the negative control cells. The results are shown in FIG. 11A: expression of the FLAG-15.5K protein is markedly increased in HEK293T-FLAG-15.5K cells. FLAG-15.5K was then immunoprecipitated using the FLAG antibody, and the immunoprecipitation of FLAG-15.5K protein in HEK293T-FLAG-15.5K cells was examined by Western blot. The results are shown in FIG. 11B: the FLAG-15.5K protein is significantly enriched after immunoprecipitation from HEK293T-FLAG-15.5K cells with the FLAG antibody, which specifically targets FLAG-15.5K.

Library preparation, sequencing and data analysis were then performed following the RIP-PEN-seq procedure, identifying the end sites and full-length sequences of known and novel K-turn RNAs that interact with 15.5K. These were compared with the annotated start and end positions of known K-turn RNAs. The comparison between the start points of known K-turn RNAs (i.e., box C/D snoRNAs) identified by 15.5K RIP-PEN-seq and the annotated start points is shown in FIG. 12A; the results show that sequencing a RIP-PEN-seq library can accurately identify the start points of known K-turn RNAs. The comparison between the end points of known K-turn RNAs (i.e., box C/D snoRNAs) identified by 15.5K RIP-PEN-seq and the annotated end points is shown in FIG. 12B; the results show that sequencing a RIP-PEN-seq library can also accurately identify the end points of known K-turn RNAs.

The above results demonstrate that constructing a RIP-PEN-seq sequencing library using the method of the present invention and performing high-throughput sequencing can accurately identify the end sites and full length of known K-turn RNAs, e.g., the K-turn RNAs in the 10 GAS5 introns shown in FIG. 13. In addition, a set of novel K-turn RNAs was identified, such as bktRNA1, whose start and end points are shown in FIG. 14.

Application example 3 data processing of sub-PEN-seq sequencing library

After the cultured HEK293T and HCT116 cells were collected, sub-PEN-seq sequencing libraries were constructed and sequenced according to the method of Example 4, and the data were analyzed.

1. Data analysis method

The data analysis procedure for identifying the two ends, the full-length sequences and the cellular localization of non-coding RNAs from sub-PEN-seq is as follows. First, adapter sequences and low-quality sequences in the raw paired-end sequencing data of the PEN-seq library are removed using Cutadapt (v2.8). The filtered data are then aligned to the human reference genome (hg38) using the alignment software STAR (v2.7.1a). The BAM file of the alignment results is read using SAMtools, and cluster analysis is performed based on the overlap between sequences. Finally, the end sites (start and end points) and full-length sequences of the non-coding RNAs are determined from the clustering results. The sub-PEN-seq data analysis results can be displayed visually using IGV or the UCSC Genome Browser. In addition, the sub-PEN-seq results can be used to screen for K-turn RNAs based on their structural characteristics, and to analyze the distribution of K-turn RNAs across the different cell fractions and the cellular localization of the RNAs.

2. Data analysis results

After the cultured HEK293T and HCT116 cells were collected, the cytoplasm, nucleoplasm and nucleolus fractions were separated according to the sub-PEN-seq fractionation method, and the separation was assessed using proteins specific to each fraction. The Western blot verification of the separation of HEK293T cytoplasm (Cytoplasm, Cyto), nucleoplasm (Nucleoplasm, Np) and nucleolus (Nucleolus, No) is shown in FIG. 15A. The results show that GAPDH, a protein specific to the cytoplasmic fraction, FUS, a protein specific to the nucleoplasmic fraction, and FBL, a protein specific to the nucleolar fraction, are each markedly enriched in their own fraction and very low in the other fractions; that is, the cell fractions are well separated. The Western blot verification of the separation of HCT116 cytoplasm, nucleoplasm and nucleolus is shown in FIG. 15B. These results show that the method of the present invention effectively separates the individual cell fractions.

Library preparation, sequencing and data analysis were then performed according to the construction method of the sub-PEN-seq sequencing library of Example 4, identifying the end sites, full-length sequences and cellular localization of known and novel K-turn RNAs. The end sites and full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in each cell fraction of HEK293T are shown in FIG. 16, and those identified by sub-PEN-seq in HCT116 are shown in FIG. 17. The results show that high-throughput sequencing of the sub-PEN-seq sequencing library constructed according to the invention can accurately identify known K-turn RNA, and expression analysis shows that the K-turn RNAs of HEK293T and HCT116 cells are mainly distributed in the nucleoplasm and nucleolus. In addition, the expression levels of K-turn RNAs in each cell fraction in the HEK293T and HCT116 sub-PEN-seq library data were analyzed using a heat map; the results, shown in FIG. 18, indicate that the expression level of K-turn RNA in the nucleoplasm and nucleolus is higher than that in the cytoplasmic fraction. Further, the differences in K-turn RNA expression levels between the cell fractions in the HEK293T and HCT116 sub-PEN-seq data were analyzed using violin plots; the results, shown in FIG. 19, indicate that K-turn RNA is mainly distributed in the nucleoplasm and nucleolus.
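The distribution of one RNA across the three fractions can be summarized as the share of its normalized signal in each fraction, as in the minimal sketch below. The function names are hypothetical, the RPM values in the usage lines are made-up illustrative numbers, and comparing RPM values across fractions in this way assumes the fraction libraries are comparable, which the text does not discuss.

```python
from typing import Dict


def fraction_shares(rpm_by_fraction: Dict[str, float]) -> Dict[str, float]:
    """Share of an RNA's summed RPM found in each cell fraction."""
    total = sum(rpm_by_fraction.values())
    if total == 0:
        return {name: 0.0 for name in rpm_by_fraction}
    return {name: value / total for name, value in rpm_by_fraction.items()}


def dominant_fraction(rpm_by_fraction: Dict[str, float]) -> str:
    """Name of the fraction with the highest RPM."""
    return max(rpm_by_fraction, key=rpm_by_fraction.get)


if __name__ == "__main__":
    # Illustrative values only, not data from the application examples.
    example = {"cytoplasm": 10.0, "nucleoplasm": 40.0, "nucleolus": 50.0}
    print(fraction_shares(example), dominant_fraction(example))
```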

These results show that constructing a sequencing library that captures full-length non-coding RNA with the method of the invention and performing high-throughput sequencing can detect the full length of non-coding RNAs in each cell fraction (such as the cytoplasm, nucleoplasm and nucleolus).

In summary, the invention provides a construction method and application for a sequencing library that captures full-length non-coding RNA. High-throughput sequencing of non-coding RNA sequencing libraries constructed according to the invention shows that non-coding RNAs can be accurately identified, which is of great significance for capturing the full length of various ncRNAs and for studying the localization of non-coding RNAs.

While the embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art. Furthermore, embodiments of the invention and features of the embodiments may be combined with each other without conflict.

Claims

1. A method of constructing a sequencing library that captures full-length non-coding RNA, comprising:

2. The method of claim 1, wherein the RNA of the sample to be tested comprises at least one of total RNA from a cell or tissue source, RNA immunoprecipitated with an RNA binding protein, and RNA from a different organelle source.

3. The method according to claim 1, wherein the 3' DNA linker is a 3' DNA linker with an adenylated 5' end that carries random bases;

preferably, the 3 'end of the 5' RNA linker carries a random base.

4. A method of construction according to any one of claims 1 to 3, wherein the non-target RNA comprises at least one of rRNA, snRNA, snoRNA.

5. The method according to claim 4, wherein the length of the DNA probe is 38-55nt.

6. The method according to claim 4, wherein the non-target RNA and the residual DNA probe are removed by RNase H and exonuclease RecJF, respectively.

7. The method according to claim 4, wherein the truncated reverse transcription primer sequence is as set forth in SEQ ID NO. 197.

8. The method of claim 4, wherein the fragment size in the captured full-length non-coding RNA sequencing library is 150bp to 1500bp.

9. A method of sequencing full-length non-coding RNA comprising constructing a sequencing library using the method of any one of claims 1 to 8; and sequencing the sequencing library.

10. The method of claim 9, wherein the sequencing is PE150 paired-end sequencing.