CN116516495A - Construction method and application for capturing full-length non-coding RNA sequencing library - Google Patents
- Publication number
- CN116516495A (application CN202310366344.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
- C12Q1/6869—Methods for sequencing
Abstract
The invention discloses a method for constructing a sequencing library that captures full-length non-coding RNA, and its application. The method comprises the following steps: S1, obtaining RNA from a sample to be tested, and ligating a 3' DNA linker and a 5' RNA linker to the two ends of the RNA, respectively, to obtain an RNA ligation product; S2, mixing and annealing the RNA ligation product with DNA probes targeting non-target RNA, then removing the non-target RNA and the residual DNA probes to obtain a target RNA ligation product; S3, synthesizing cDNA from the target RNA ligation product using a truncated reverse transcription primer designed for it, and then PCR-amplifying the cDNA using primers containing anchor bases. The construction method captures the terminal information of ncRNA, increases the proportion of sequencing reads derived from target ncRNA, markedly reduces uninformative sequencing and lowers sequencing cost, and at the same time improves the accuracy of detecting medium- and low-abundance ncRNA.
Description
Technical Field
The invention relates to the technical field of molecular biology, and in particular to a method for constructing a sequencing library that captures full-length non-coding RNA, and its application.
Background
In addition to messenger RNA (mRNA), which encodes protein, the human genome is transcribed into a large amount of RNA that does not encode protein, i.e., non-coding RNA (ncRNA). Known non-coding RNAs include: tRNA and rRNA, involved in protein synthesis; snRNA, involved in RNA processing; box C/D snoRNA and box H/ACA snoRNA, involved in RNA modification; and miRNA, piRNA, circRNA and lncRNA, involved in post-transcriptional regulation of mRNA, among others. Mutation and abnormal expression of these non-coding RNAs are closely related to major human diseases such as tumors. As key regulators of genetic information, non-coding RNAs must be processed to a specific length after transcription and must interact with RNA-binding proteins to exert their regulatory functions. Take Kink-turn (K-turn) RNA as an example: it is an ncRNA, usually 60 to 200 nt long, that contains a K-turn three-dimensional structure formed by a C box (conserved motif RUGAUGA) and a D box (conserved motif CUGA). The 15.5 kDa binding protein (abbreviated 15.5K; its homologs are Snu13p in yeast, L7Ae in archaea and YbxF/YbxQ in bacteria) binds the K-turn structure and thereby guides 2'-O-methylation modification of RNA or assembly of the splicing complex.
In the related art, identifying the full-length sequence of an ncRNA is an effective means of analyzing it, and the main approaches are RNA sequence alignment, RNA structure prediction, RACE (rapid amplification of cDNA ends) and high-throughput sequencing. Sequence alignment and structure prediction rely on the sequence and structural conservation of ncRNA to predict the RNA ends and infer the full-length sequence from them; these methods are efficient but of low accuracy. RACE-based methods identify the ends and full-length sequence of an RNA accurately, but their throughput is too low. High-throughput sequencing can resolve RNA ends, and hence full-length sequences, both efficiently and accurately, but current full-length ncRNA sequencing methods mainly target small RNAs such as miRNA and piRNA, or long ncRNAs with an mRNA-like polyA tail such as lncRNA. For ncRNAs of medium length and low abundance that lack a polyA tail, such as K-turn RNA, no dedicated high-throughput sequencing method is available. How to specifically capture medium-length, low-abundance ncRNAs such as K-turn RNA and analyze their full-length sequences therefore remains a major technical problem in the RNA research field.
In recent years, researchers have developed a number of techniques for capturing interactions between RNA and RNA-binding proteins, such as RIP-seq and CLIP-seq. These techniques fall into two main categories: (1) RNA that interacts with an RNA-binding protein is captured by immunoprecipitation of that protein, then fragmented, reverse transcribed with random primers, and used for library construction and sequencing; (2) RNA regions not bound by the RNA-binding protein are digested enzymatically, the RNA fragments that interact with the protein are captured by immunoprecipitation, and RNA linkers are ligated to the fragments for library construction. Although these methods allow high-throughput study of RNA-binding proteins and the RNAs that interact with them, they have significant drawbacks. First, the information obtained is RNA fragment information rather than the full-length sequence of the interacting RNA, so it cannot be determined whether the interaction involves the RNA precursor or the mature form. Second, reads from high-abundance RNAs (e.g., rRNA, snRNA and tRNA) dominate the sequencing data, and this excess of uninformative data severely squeezes the amount of useful data, affecting data quality and the resolution of the results. How to capture the full length of each ncRNA and increase its proportion in the sequencing data therefore remains a major challenge.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a method for constructing a sequencing library that captures full-length non-coding RNA, and its application. The method increases the proportion of sequencing reads from target ncRNA in the library, markedly reduces uninformative sequencing, lowers sequencing cost, and improves the accuracy of detecting medium- and low-abundance ncRNA.
The invention also provides a method for sequencing the full-length non-coding RNA.
In a first aspect of the invention, there is provided a method of constructing a sequencing library that captures full-length non-coding RNA, comprising:
S1, obtaining RNA from a sample to be tested, and ligating a 3' DNA linker and a 5' RNA linker to the two ends of the RNA, respectively, to obtain an RNA ligation product;
S2, mixing and annealing the RNA ligation product with DNA probes targeting non-target RNA, then removing the non-target RNA and the residual DNA probes to obtain a target RNA ligation product;
S3, synthesizing cDNA from the target RNA ligation product using a truncated reverse transcription primer designed for it, and then PCR-amplifying the cDNA using primers containing anchor bases to obtain the sequencing library that captures full-length non-coding RNA.
The construction method according to embodiments of the invention has at least the following beneficial effects:
(1) A 3' DNA linker and a 5' RNA linker are first ligated to the two ends of the RNA, respectively, to capture information on both ends of the RNA. The RNA ligation product is then annealed with DNA probes targeting non-target RNA; the non-target RNA is digested with RNase H and the residual DNA probes with the single-stranded DNA exonuclease RecJf, which effectively reduces interference from non-target RNA and residual DNA probes in subsequent experiments. Once the enriched target RNA ligation product is obtained, cDNA is synthesized with a truncated reverse transcription primer and then PCR-amplified with primers containing anchor bases, which improves the accuracy of identifying RNA ends.
(2) In the related art, reverse transcription with random primers is used to obtain sequence information, which cannot capture information on the two ends of the RNA. In the present invention, ligating linkers to the target RNA strand effectively anchors both ends of the RNA, so that sequencing yields both ends at single-base precision; without the corresponding linkers, the two ends of the RNA cannot be anchored. The method therefore obtains ends with single-base precision and provides an effective means of accurately studying the structure and motif characteristics of ncRNA and of discovering novel types of ncRNA.
(3) In the invention, a 3' DNA linker and a 5' RNA linker are ligated to the two ends of the RNA, respectively, to capture information on both ends. Because coding RNA (such as mRNA) carries a 5' cap structure, it cannot be ligated to the 5' RNA linker during linker ligation, which effectively prevents coding RNA from interfering with the construction of the full-length non-coding RNA library and with the identification results.
(4) The method for constructing the full-length non-coding RNA sequencing library greatly increases the proportion of target ncRNA sequences in the library, markedly reduces uninformative sequencing and lowers sequencing cost, and at the same time effectively improves the accuracy of detecting medium- and low-abundance ncRNA.
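The effect described in (4) is simple arithmetic: removing non-target reads raises the share of target reads in the same sequencing output. The sketch below illustrates this under stated assumptions; the function names, the read counts and the 90% depletion efficiency are hypothetical and are not values reported in this document.

```python
def target_fraction(target_reads: float, non_target_reads: float) -> float:
    """Share of all sequencing reads that derive from target ncRNA."""
    total = target_reads + non_target_reads
    if total <= 0:
        raise ValueError("total read count must be positive")
    return target_reads / total


def fraction_after_depletion(target_reads: float,
                             non_target_reads: float,
                             depletion_efficiency: float) -> float:
    """Target share after removing a given fraction (0..1) of non-target reads.

    Assumes depletion leaves target reads untouched, which is an idealization.
    """
    if not 0.0 <= depletion_efficiency <= 1.0:
        raise ValueError("depletion_efficiency must be between 0 and 1")
    remaining = non_target_reads * (1.0 - depletion_efficiency)
    return target_fraction(target_reads, remaining)


if __name__ == "__main__":
    # Hypothetical library: 50,000 target reads among 1,000,000 total reads.
    before = target_fraction(50_000, 950_000)
    after = fraction_after_depletion(50_000, 950_000, 0.9)
    print(f"target share before depletion: {before:.3f}")
    print(f"target share after depletion:  {after:.3f}")
```

With these illustrative numbers, the target share rises from 5% to roughly a third of the library, so fewer total reads are needed to reach a given depth on low-abundance ncRNA.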
In some embodiments of the invention, the full-length non-coding RNA comprises at least one of tRNA, rRNA, snRNA, snoRNA, scaRNA, miRNA, piRNA, circRNA and lncRNA.
Preferably, the non-coding RNA is a non-coding RNA of medium length and low abundance.
Preferably, the non-coding RNA is a Kink-turn type RNA.
In some embodiments of the invention, the RNA of the sample to be tested is RNA from which genomic DNA has been removed.
Preferably, the method for removing genomic DNA comprises: adding 1× RQ1 DNase Reaction Buffer, 2 U/μL RiboLock RNase Inhibitor and RQ1 RNase-Free DNase to the RNA of the sample to be tested, reacting at 37 °C for 30 minutes, and purifying the RNA with the RNA Clean & Concentrator-5 kit.
The RQ1 RNase-Free DNase may specifically be the Promega product with catalog number M6101; the RNA Clean & Concentrator-5 kit may specifically be the ZYMO RESEARCH product with catalog number R1015.
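The concentrations above are final concentrations in the reaction, so the volume of each stock follows from the dilution relation C1·V1 = C2·V2. The sketch below works this out for the RNase inhibitor. The 40 U/μL stock concentration and the 20 μL reaction volume are illustrative assumptions, not values given in this document, and `stock_volume_ul` is a hypothetical helper name.

```python
def stock_volume_ul(stock_conc: float, final_conc: float,
                    reaction_volume_ul: float) -> float:
    """Volume of stock (in microliters) needed to reach final_conc.

    Solves C1 * V1 = C2 * V2 for V1; both concentrations must share one unit.
    """
    if stock_conc <= 0 or reaction_volume_ul <= 0 or final_conc < 0:
        raise ValueError("concentrations and volume must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * reaction_volume_ul / stock_conc


if __name__ == "__main__":
    # Assumed values: 40 U/uL inhibitor stock, 20 uL reaction, 2 U/uL final.
    print(stock_volume_ul(stock_conc=40.0, final_conc=2.0, reaction_volume_ul=20.0))
    # Assumed 10x buffer stock diluted to 1x in the same 20 uL reaction.
    print(stock_volume_ul(stock_conc=10.0, final_conc=1.0, reaction_volume_ul=20.0))
```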
In some embodiments of the invention, the RNA of the sample to be tested comprises at least one of total RNA of cellular origin, total RNA of tissue origin, RNA immunoprecipitated with an RNA-binding protein, and RNA from different organelles.
Preferably, the RNA immunoprecipitated with an RNA-binding protein comprises RNA immunoprecipitated with 15.5K.
In some embodiments of the invention, the total RNA of cellular or tissue origin may be obtained using a TRIzol RNA extraction method.
In some embodiments of the invention, the 3' DNA linker is an adenylated 3' DNA linker carrying random bases at its 5' end;
preferably, the 3' DNA linker is: rApNNNNNNTGGAATTCTCGGGTGCCAAGG-C3 Spacer, wherein rApp is the adenylation modification, NNNNNN is six random deoxyribonucleotide bases, N represents any one of the four deoxyribonucleotides A, T, C, G, and C3 Spacer is a blocking group.
Preferably, unligated 3' DNA linker is removed using 5' deadenylase, single-stranded DNA binding protein and RecJf; specifically, the 5' deadenylase is reacted at 28-32℃ for 0.8-1.2 hours, the single-stranded DNA binding protein is reacted on ice for 25-35 minutes, and the RecJf is reacted at 36-38℃ for 0.8-1.2 hours.
In some embodiments of the invention, the 3' end of the 5' RNA linker carries random bases.
Preferably, the nucleotide sequence of the 5' RNA linker is: guucagagucuacaguccgacgaucnnnn, wherein NNNNNN represents six random ribonucleotide bases and N represents any one of the four ribonucleotides A, U, C, G.
Because rRNA, snRNA and snoRNA are the most abundant species in total RNA and therefore interfere most with the identification of novel non-coding RNA species, the present invention takes these three RNA species as examples for the removal operation.
Specifically, when the full-length non-coding RNA is selected from at least one of tRNA, scaRNA, miRNA, piRNA, circRNA and lncRNA, the non-target RNA comprises at least one of rRNA, snRNA, snoRNA.
Preferably, when the full-length non-coding RNA is a Kink-turn type RNA, the non-target RNA comprises at least one of rRNA, snRNA, snoRNA.
Preferably, the rRNA comprises 28S rRNA, 18S rRNA, 5.8S rRNA, 5S rRNA, 12S rRNA, and 16S rRNA; the snRNA comprises U1, U2, U4, U5, U6, U11, U12, U4atac and U6atac; the snoRNA includes SNORD101, SNORD20 and SNORA23.
In some embodiments of the invention, the nucleotide sequence of the snRNA-targeted DNA probe is shown in SEQ ID NOS.1-29.
In some embodiments of the invention, the nucleotide sequence of the snoRNA-targeting DNA probe is shown in SEQ ID No. 30-196.
In some embodiments of the invention, the DNA probe has a length of 38-55nt.
In some embodiments of the invention, the DNA probe has a length of 40-55nt. The DNA probes targeting non-target RNAs are designed so that the spacing between adjacent probes is less than 10nt.
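The probe design constraints above (probes of 40-55 nt, with less than 10 nt between adjacent probes) can be illustrated with a minimal sketch. The greedy left-to-right tiling, the function names and the toy target sequence are illustrative assumptions and are not taken from the patent; the actual probe sequences are those listed in the sequence tables.

```python
# Sketch: tile antisense DNA probes along a non-target RNA.
# Probe length (40-55 nt) and gap (< 10 nt) follow the text; the greedy
# tiling strategy and all names here are illustrative assumptions.

COMPLEMENT = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def antisense_dna(rna_segment: str) -> str:
    """Return the DNA strand complementary to an RNA segment, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna_segment.upper()))


def tile_probes(target_rna: str, probe_len: int = 50, gap: int = 5):
    """Yield (start, end, probe) tuples covering target_rna from 5' to 3'."""
    if not 40 <= probe_len <= 55:
        raise ValueError("probe length should be 40-55 nt")
    if not 0 <= gap < 10:
        raise ValueError("gap between probes should be under 10 nt")
    start = 0
    while start < len(target_rna):
        end = min(start + probe_len, len(target_rna))
        # Pull the final window back so the last probe is still full length.
        if end - start < probe_len and len(target_rna) >= probe_len:
            start = len(target_rna) - probe_len
        yield start, end, antisense_dna(target_rna[start:end])
        start = end + gap


# Toy 120-nt target (hypothetical sequence, for illustration only).
for start, end, probe in tile_probes("ACGU" * 30):
    print(start, end, probe)
```

The probes are emitted as the reverse complement of the RNA so that the resulting RNA:DNA hybrids are substrates for RNase H.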
In some embodiments of the invention, the annealing temperature is 70-80 ℃.
In some embodiments of the invention, the RNA ligation product is mixed with the DNA probe in equal mass.
In some embodiments of the invention, the non-target RNA and the residual DNA probe are removed using an RNase H enzyme and an exonuclease RecJf, respectively.
In some embodiments of the invention, the truncated reverse transcription primer sequence is set forth in SEQ ID NO. 197.
The invention uses truncated reverse transcription primer, which can reduce the mismatch probability.
In some embodiments of the invention, the fragment size in the non-coding RNA sequencing library is 150bp to 1500bp.
Preferably, the fragment size in the non-coding RNA sequencing library is 150 bp-700 bp.
In a second aspect of the invention, there is provided a method of sequencing full length non-coding RNA comprising constructing a sequencing library using the method described above; and sequencing the sequencing library.
In some embodiments of the invention, the sequencing is PE150 double-ended sequencing.
The sequencing library of the invention can be constructed from RNA of different sources and used for sequencing analysis. Examples include the PEN-seq sequencing library (Sequencing of Paired-Ends of NcRNAs, PEN-seq) for total RNA of cellular or tissue origin; the sub-PEN-seq sequencing library (Sequencing of Paired-Ends of subcellular NcRNAs, sub-PEN-seq) for RNA of individual cell fractions; and the RIP-PEN-seq sequencing library (RNA ImmunoPrecipitation coupled with sequencing of Paired-Ends of NcRNAs, RIP-PEN-seq) for RNA immunoprecipitated with RNA binding proteins. Construction and sequencing of the sub-PEN-seq sequencing library comprises separating the RNA of each cell fraction (such as cytoplasmic RNA, nuclear RNA and nucleolar RNA) and then using the PEN-seq strategy to sequence both ends of the ncRNA and determine its full-length sequence. Construction and sequencing of the RIP-PEN-seq sequencing library comprises RNA immunoprecipitation with a K-turn RNA-specific binding protein to enrich ncRNA, followed by enrichment of the K-turn RNA ligation products in combination with PEN-seq. The RIP-PEN-seq technology thus combines RNA co-immunoprecipitation with PEN-seq, and can accurately identify the full-length sequence of RNA while capturing RNA interactions.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The invention is further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a schematic diagram of the construction of sequencing libraries of PEN-seq, RIP-PEN-seq and sub-PEN-seq according to the present invention.
FIG. 2 is a flow chart of the invention for identifying RNA double-ended and full-length sequences based on PEN-seq, RIP-PEN-seq and sub-PEN-seq libraries.
FIG. 3 is a schematic representation of the sequencing data analysis flow of the present invention.
FIG. 4 is a computer analysis flow for K-turn RNA identification based on the K-turn structural motif contained in the K-turn RNA of the present invention;
FIG. 5 is a graph showing the efficiency of stable 15.5K knockdown in HEK293T cells as detected by qPCR and Western blot experiments;
FIG. 6 is a graph showing the comparison of the start points of known K-turn RNAs (i.e., box C/D snoRNAs) identified using PEN-seq with the annotated start points in the negative control cell shNC according to the present invention.
FIG. 7 is a graph showing the comparison of the end points of known K-turn RNAs (i.e., box C/D snoRNAs) identified using PEN-seq with the annotated end points in the negative control cell shNC according to the present invention.
FIG. 8 shows the double-ended sites and full length of the K-turn RNA bktRNA1 in shNC, sh15.5K-1 and sh15.5K-2 cells, visualized using IGV according to the invention.
FIG. 9 is a heat map of the expression levels of K-turn RNA in shNC, sh15.5K-1, and sh15.5K-2 cells according to an embodiment of the present invention.
FIG. 10 is a violin plot of the variation in K-turn RNA expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells in accordance with the present invention.
FIG. 11 shows the Western blot of the invention demonstrating the overexpression of FLAG-15.5K in a cell line stably expressing FLAG-15.5K (A) and immunoprecipitation of FLAG-15.5K protein (B), wherein pCGP is a negative control cell.
FIG. 12 is a graph (A) of the results of a comparison of the start point of a known K-turn RNA (i.e., box C/D snoRNA) with the annotated start point, and a graph (B) of the results of a comparison of the end point of a known K-turn RNA (i.e., box C/D snoRNA) with the annotated end point, as identified by the 15.5K RIP-PEN-seq of the present invention.
FIG. 13 shows the double-ended sites and full length of the K-turn RNAs in 10 GAS5 introns identified by 15.5K RIP-PEN-seq and visualized using UCSC according to the present invention, wherein Coverage indicates the full length and expression level of the RNA.
FIG. 14 shows the double-ended sites and full length of the K-turn RNA bktRNA1 in the CWD19L1 intron identified by 15.5K RIP-PEN-seq and visualized using UCSC according to the present invention, wherein Coverage indicates the full length and expression level of the RNA, and Conservation is the evolutionary conservation of bktRNA1 in 100 vertebrates.
FIG. 15 shows the Western blot of the present invention demonstrating the separation of cytoplasm (Cyto), nucleoplasm (Np) and nucleolus (No) in HEK293T cells (A) and HCT116 cells (B).
FIG. 16 shows the double-ended sites and full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in each cell fraction of HEK293T according to the present invention.
FIG. 17 shows the double-ended sites and full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in HCT116 according to the present invention.
FIG. 18 is a heat map of the expression levels of K-turn RNA in the individual cell fractions in the HEK293T and HCT116 cell sub-PEN-seq data according to the present invention.
FIG. 19 is a violin plot analyzing the differences in K-turn RNA expression levels between different cell fractions in the HEK293T and HCT116 cell sub-PEN-seq data in accordance with the present invention.
Detailed Description
The conception and the technical effects produced by the present invention will be clearly and completely described in conjunction with the embodiments below to fully understand the objects, features and effects of the present invention. It is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments, and that other embodiments obtained by those skilled in the art without inventive effort are within the scope of the present invention based on the embodiments of the present invention.
In the description of the present invention, the descriptions of the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
In an embodiment of the invention, the solvent of the cell membrane lysate is 10 mM Tris-HCl buffer, pH 7.5, and the solutes and their concentrations are as follows: 10 mM NaCl, 3 mM MgCl2, 0.3% (by volume) NP-40, 10% (by volume) glycerol, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor, 400 μM Ribonucleoside Vanadyl Complex, wherein RiboLock RNase Inhibitor is specifically a Thermo Fisher product, cat# EO0381, and Ribonucleoside Vanadyl Complex is an NEB product, cat# S1402S.
In an embodiment of the invention, the S1 sucrose solution comprises: 0.25 M sucrose, 10 mM MgCl2, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor, 400 μM Ribonucleoside Vanadyl Complex. The S2 sucrose solution comprises: 0.34 M sucrose, 5 mM MgCl2, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor, 400 μM Ribonucleoside Vanadyl Complex. The S3 sucrose solution comprises: 0.88 M sucrose, 5 mM MgCl2, 1 mM DTT, 100 U/mL RiboLock RNase Inhibitor, 400 μM Ribonucleoside Vanadyl Complex.
In an embodiment of the invention, the solvent of the RIP binding solution is 50mM Tris-HCl buffer solution with pH of 7.5, and the solute is as follows: 150mM NaCl,1mM MgCl2,0.05% (volume percent) NP-40, 20mM EDTA-Na2,1mM DTT,100U/mL RiboLock RNase Inhibitor, 1X Protease Inhibitor Cocktail.
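As an illustration of how such a recipe translates into pipetting volumes, the sketch below applies C1·V1 = C2·V2 to the molar components of the RIP binding solution. The final concentrations come from the text; every stock concentration is a hypothetical bench value chosen for the example, and NP-40, the RNase inhibitor and the protease inhibitor cocktail are omitted for brevity.

```python
# Sketch: stock volumes for the RIP binding solution via C1*V1 = C2*V2.
# Final concentrations are taken from the text; every stock concentration
# below is a hypothetical bench value, not something the source specifies.

FINAL_MM = {"Tris-HCl pH 7.5": 50, "NaCl": 150, "MgCl2": 1, "EDTA-Na2": 20, "DTT": 1}
STOCK_MM = {"Tris-HCl pH 7.5": 1000, "NaCl": 5000, "MgCl2": 1000, "EDTA-Na2": 500, "DTT": 1000}


def stock_volumes_ml(total_ml: float) -> dict:
    """Volume of each stock (mL) needed to prepare total_ml of buffer."""
    return {name: total_ml * FINAL_MM[name] / STOCK_MM[name] for name in FINAL_MM}


volumes = stock_volumes_ml(10.0)
for name, vol in volumes.items():
    print(f"{name}: {vol:.3f} mL")
```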
Where specific conditions are not noted in the examples, the procedures are carried out according to conventional conditions or the conditions recommended by the manufacturer. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products.
In an embodiment of the invention, the PEN-seq sequencing library is constructed using total RNA of cellular or tissue origin; the RIP-PEN-seq sequencing library is constructed using RNA immunoprecipitated with RNA binding proteins; and the sub-PEN-seq sequencing library is constructed using RNA from individual cell fractions.
In the embodiment of the invention, the basic process of sequencing library construction is shown in FIG. 1 and comprises the following steps: after RNA is obtained, an adenylated DNA linker carrying random bases is ligated to the 3' end of the ncRNA, and an RNA linker carrying random bases is ligated to the 5' end; specific DNA probes are then designed against non-target RNAs (such as rRNA, snRNA and snoRNA), and after the ligation products are annealed with the DNA probes, the non-target RNAs are digested with RNase H and the DNA probes are digested with the single-stranded DNA 5'-3' exonuclease RecJf, thereby enriching the target ncRNA ligation products; the target RNA ligation products are then reverse-transcribed into cDNA using the truncated reverse transcription primer, and the cDNA is amplified using primers containing anchoring bases to obtain the sequencing library, which is subsequently subjected to PE150 double-ended high-throughput sequencing.
The method for capturing the non-coding RNA to construct the sequencing library, combined with the novel double-end sequencing technology, can effectively improve the ratio of sequencing sequences (reads) of target ncRNA in data and reduce the sequencing cost.
Example 1 method for construction of sequencing library
The embodiment of the invention provides a method for constructing a sequencing library, which comprises the steps of cell culture, total RNA extraction, DNase removal of genomic DNA in RNA, 3'DNA joint connection and residual joint removal, 5' RNA joint connection, non-target RNA removal, reverse transcription, library amplification and the like.
1. Extraction of total RNA from cells
When the cells in a 6-well plate had grown to about 90% confluency, the medium was discarded and 1 mL of TRIzol was added. After lysis at room temperature for 10 minutes, the lysate was transferred to a 1.5 mL centrifuge tube, 200 μL of chloroform was added, and the mixture was vortexed for 15 seconds and left at room temperature for 3 minutes. After centrifugation at 13000 ×g for 10 minutes at 4℃, 500 μL of the supernatant was retained, 500 μL of isopropanol was added, and the mixture was mixed well and precipitated at room temperature for 10 minutes. The sample was then centrifuged at 20000 ×g for 10 minutes at 4℃ and the supernatant was discarded; the RNA pellet was washed with 1 mL of 75% ethanol (prepared with DEPC water), centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. This wash was repeated once. The pellet was washed once with 1 mL of absolute ethanol, centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. The RNA pellet was dried in vacuo and dissolved in 30 μL of DEPC water, and the RNA concentration was determined using NanoDrop (total RNA is required to meet the following: 28S rRNA:18S rRNA approximately equal to 2, A260/A280 greater than 2, and A260/A230 greater than 2). Qualified total RNA samples were either used directly in the next experiment or stored at -80℃.
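The acceptance criteria for total RNA can be written as a small check. This is a sketch only: the thresholds are those stated above, while the tolerance used to interpret "approximately equal to 2" and the function name are assumptions made for illustration.

```python
def passes_rna_qc(a260_a280: float, a260_a230: float,
                  ratio_28s_18s: float, tolerance: float = 0.3) -> bool:
    """Return True if a total RNA sample meets the stated QC criteria.

    The 2.0 thresholds come from the text; `tolerance` (how far the
    28S:18S ratio may deviate from 2) is an assumed value.
    """
    purity_ok = a260_a280 > 2.0 and a260_a230 > 2.0
    integrity_ok = abs(ratio_28s_18s - 2.0) <= tolerance
    return purity_ok and integrity_ok


print(passes_rna_qc(2.05, 2.10, 1.9))  # sample meeting all three criteria
print(passes_rna_qc(1.80, 2.10, 2.0))  # sample failing the A260/A280 criterion
```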
2. Removal of genomic DNA from Total RNA
μg of total RNA sample was taken, and 5 μL of RQ1 DNase 10× Reaction Buffer (the reaction buffer supplied with the RQ1 RNase-Free DNase kit), 2.5 μL of RiboLock RNase Inhibitor and 5 μL of RQ1 RNase-Free DNase (Promega, cat# M6101) were added; DEPC water was added to 50 μL, and the reaction was carried out at 37℃ for 30 minutes. The RNA from the above reaction was then purified using RNA Clean & Concentrator-5 and eluted with 12 μL of DEPC water. After concentration measurement using NanoDrop, the samples were either used directly for the subsequent linker ligation or stored at -80℃.
3. 3' DNA linker ligation
(1) Adenylation treatment of 3' DNA linkers
500 pmol of 3' DNA linker was mixed with 10 μL of 10× 5' DNA Adenylation Reaction Buffer (the buffer supplied with the Mth RNA Ligase kit), 10 μL of riboLock 1 mM ATP and 10 μL of Mth RNA Ligase (NEB, cat# M2610), and DEPC water was added to 100 μL; the reaction was carried out at 65℃ for 2 hours and the enzyme was inactivated at 85℃ for 10 minutes. The DNA linker was then purified using Oligo Clean & Concentrator and eluted with 20 μL of DEPC water; after the concentration was measured using NanoDrop, the concentration of the DNA linker was adjusted to 20 μM to give the adenylated 3' DNA linker, wherein the adenylated 3' DNA linker is: rApNNNNNNTGGAATTCTCGGGTGCCAAGG-C3 Spacer, wherein rApp is the adenylation modification, NNNNNN is six random deoxyribonucleotide bases, N represents any one of the four deoxyribonucleotides A, T, C, G, and C3 Spacer is a blocking group.
(2) 3' DNA linker ligation
500 ng of total RNA from which genomic DNA had been removed was taken, DEPC water was added to 10.5 μL, and 0.5 μL of the adenylated 3' DNA linker was added; after mixing, the mixture was denatured at 70℃ for 2 minutes and immediately placed on ice for 2 minutes. Then 2 μL of 10× T4 RNA Ligase 2, truncated KQ reaction buffer (the reaction buffer supplied with T4 RNA Ligase 2, truncated KQ), 5 μL of PEG 8000 MW (50%), 1 μL of RiboLock RNase Inhibitor and 1 μL of T4 RNA Ligase 2, truncated KQ (NEB, cat# M0373) were added, and after mixing the reaction was carried out at 16℃ for 18 hours.
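As a worked check of the 3' ligation set-up, the following sketch sums the component volumes given above and derives the final buffer strength, PEG concentration and linker amount. The 20 μM linker concentration is taken from the adenylation step; the variable names are illustrative.

```python
# Worked arithmetic for the 3' DNA linker ligation reaction (volumes in uL).
components_ul = {
    "RNA in DEPC water": 10.5,
    "adenylated 3' DNA linker (20 uM)": 0.5,
    "10x reaction buffer": 2.0,
    "50% PEG 8000": 5.0,
    "RiboLock RNase Inhibitor": 1.0,
    "T4 RNA Ligase 2, truncated KQ": 1.0,
}

total_ul = sum(components_ul.values())
buffer_fold = 10 * components_ul["10x reaction buffer"] / total_ul
peg_percent = 50 * components_ul["50% PEG 8000"] / total_ul
linker_pmol = 20 * components_ul["adenylated 3' DNA linker (20 uM)"]  # uM * uL = pmol

print(f"total volume: {total_ul} uL")
print(f"buffer: {buffer_fold}x, PEG 8000: {peg_percent}%, linker: {linker_pmol} pmol")
```

Under these figures the components add up to a 20 μL reaction with 1× buffer, 12.5% PEG 8000 and 10 pmol of linker per 500 ng of RNA.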
(3) Removal of residual joints
2 μL of 5' deadenylase (NEB, cat# M0331) was added to the above reaction system and, after mixing, reacted at 30℃ for 1 hour; 2 μg of single-stranded DNA binding protein (Promega, cat# M3011) was then added and, after mixing, the mixture was kept on ice for 30 minutes; finally, 2 μL of RecJf (NEB, cat# M0264) was added and, after mixing, reacted at 37℃ for 1 hour.
4. 5' RNA linker ligation
To the reaction system from which residual linker had been removed, 2 μL (40 pmol) of denatured 5' RNA linker, 2 μL of 10× T4 RNA Ligase reaction buffer (the reaction buffer supplied with T4 RNA Ligase 1), 2.56 μL of PEG 8000 MW (50%), 1 μL of RiboLock RNase Inhibitor, 4 μL of 10 mM ATP and 4 μL of T4 RNA Ligase 1 (NEB, cat# M0204) were added, and after mixing the reaction was carried out at 16℃ for 18 hours. The RNA from the reaction was then purified using RNA Clean & Concentrator-5 and eluted with 12 μL of DEPC water to give the RNA ligation product, wherein the nucleotide sequence of the 5' RNA linker is: guucagagucuacaguccgacgaucnnnn, wherein NNNNNN represents six random ribonucleotide bases and N represents any one of the four ribonucleotides A, U, C, G.
5. Removal of non-target RNA
11.2 μL of the RNA ligation product obtained above was taken, 0.8 μL of DNA probe (50 μM) targeting non-target RNA and 3 μL of 5× annealing buffer were added and mixed, and the mixture was held at 95℃ for 2 minutes, cooled to 22℃ at 0.1℃ per second, kept at 22℃ for 5 minutes, and then placed on ice.
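The annealing step above implies a few derived quantities that are convenient to have at the bench. The sketch below computes them from the stated volumes, probe concentration and ramp rate; it assumes an ideal linear ramp and ignores instrument overhead, and the variable names are illustrative.

```python
# Derived quantities for the probe annealing step (values from the text).
rna_ul, probe_ul, buffer_ul = 11.2, 0.8, 3.0
probe_um = 50.0            # probe stock concentration, uM
ramp_rate_c_per_s = 0.1    # cooling rate, degrees C per second

total_ul = rna_ul + probe_ul + buffer_ul
probe_pmol = probe_um * probe_ul               # uM * uL = pmol
buffer_fold = 5 * buffer_ul / total_ul         # final strength of the 5x buffer
ramp_s = (95 - 22) / ramp_rate_c_per_s         # time to cool from 95 to 22 C
program_s = 2 * 60 + ramp_s + 5 * 60           # 95 C hold + ramp + 22 C hold

print(f"{total_ul:.1f} uL total, {probe_pmol:.0f} pmol probe, {buffer_fold:.1f}x buffer")
print(f"ramp {ramp_s:.0f} s, whole program {program_s / 60:.1f} min")
```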
The DNA probes targeting rRNA were designed with reference to the published literature (Adiconis, X., et al., Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods, 2013. 10(7): p. 623-9). The DNA probe sequences targeting snRNA are shown in Table 1.
Table 1: DNA probe sequence information targeting snRNA
The sequence of the DNA probe targeting the snoRNA is shown in SEQ ID NO. 30-196 (Table 2).
Table 2: DNA probe sequence information targeting snoRNA
Then, 2 μL of 10× RNase H reaction buffer (the reaction buffer supplied with RNase H), 0.2 μL of RiboLock RNase Inhibitor and 2 μL of RNase H (NEB, cat# M0297) were added to the above reaction system, and DEPC water was added to 20 μL. After mixing, the mixture was reacted at 37℃ for 30 minutes, and the RNA from the reaction was then purified using RNA Clean & Concentrator-5 and finally eluted with 22 μL of DEPC water.
6. Removal of DNA probes
21.5 μL of the above product from which non-target RNA had been removed was denatured at 70℃ for 2 minutes and immediately placed on ice for 2 minutes; 3 μL of 10× NEBuffer 2 (the reaction buffer supplied with RecJf), 1 μL of RiboLock RNase Inhibitor and 7 μg of single-stranded DNA binding protein were then added and mixed, and the mixture was kept on ice for 30 minutes. μL RecJf was added and reacted at 37℃ for 1 hour to digest the DNA probe, followed by RNA purification using RNA Clean & Concentrator-5. The samples were eluted with 12 μL of DEPC water and either used directly in the next reaction or stored at -80℃.
7. Reverse transcription reaction
11.5 μL of the above product from which the DNA probe had been removed was taken, 0.5 μL of 40 μM truncated reverse transcription primer was added, and after mixing the mixture was denatured at 65℃ for 5 minutes and immediately placed on ice; 4 μL of 5× RT buffer (Thermo Fisher, cat# 18090050), 1 μL of 100 mM DTT, 1 μL of 10 mM dNTPs, 1 μL of RiboLock RNase Inhibitor and 1 μL of SuperScript IV Reverse Transcriptase were then added and mixed, and the reaction was carried out at 50℃ for 60 minutes, wherein the nucleotide sequence of the truncated reverse transcription primer is: GCCTTGGCACCCGAGAAT (SEQ ID NO. 197).
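The relationship between the truncated reverse transcription primer and the 3' DNA linker can be checked with a short sketch. Both sequences are taken from the text; the helper names are illustrative, and the check considers Watson-Crick pairing only.

```python
# Check how the truncated RT primer pairs with the constant region of the
# 3' DNA linker (sequences from the text; helper names are illustrative).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LINKER_CONSTANT = "TGGAATTCTCGGGTGCCAAGG"   # 3' DNA linker without the random bases
RT_PRIMER = "GCCTTGGCACCCGAGAAT"            # SEQ ID NO. 197


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def paired_length(primer: str, linker: str) -> int:
    """Longest 3' suffix of primer that pairs with the 3' end of linker."""
    target = reverse_complement(linker)
    for n in range(min(len(primer), len(target)), 0, -1):
        if target.startswith(primer[-n:]):
            return n
    return 0


n_paired = paired_length(RT_PRIMER, LINKER_CONSTANT)
unpaired_linker = LINKER_CONSTANT[: len(LINKER_CONSTANT) - n_paired]
print(f"primer bases paired with the linker: {n_paired}")
print(f"linker bases between the random bases and the primer 3' end: {unpaired_linker}")
```

With these sequences, the 17 bases at the 3' end of the primer pair with the 3' end of the linker, so the primer's 3' end lies 4 nt (TGGA) away from the random bases.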
To the reaction system, 4 μL of Exonuclease I (NEB, cat# M0293) and 4 μL of rSAP (NEB, cat# M0371) were added and reacted at 37℃ for 15 minutes. Then, 5 μL of 0.5 M EDTA and 7 μL of 1 M NaOH were added, and the mixture was mixed well and reacted at 70℃ for 12 minutes. The cDNA was then purified using Oligo Clean & Concentrator and eluted with 16 μL of DEPC water to obtain the cDNA.
8. Sequencing adapter ligation
μL of the cDNA obtained above was taken, and 25 μL of NEBNext Ultra II Q5 Master Mix (NEB, cat# M0544), 5 μL of RP1 (10 μM) and 5 μL of RPI-X (10 μM; one of a series of primers that contain different INDEX sequences and carry bases for anchoring to the 3' DNA linker) were added to carry out the PCR reaction.
Wherein the primer sequences of RP1 and RPI 1-12 are shown in Table 3.
Table 3: primer sequences
Note that: the underlined parts of the table are the inserted INDEX sequences and the part is the thio modification.
The PCR reaction program was: pre-denaturation at 98℃ for 30 seconds; 15 cycles of denaturation at 98℃ for 10 seconds and annealing at 65℃ for 75 seconds; extension at 65℃ for 5 minutes; and storage at 4℃.
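For planning purposes, the program above can be written down as data and its programmed hold time summed. This is a sketch with illustrative names; it counts only the programmed holds and ignores ramping between temperatures.

```python
# PCR program from the text: (label, temperature in C, seconds).
INITIAL = ("pre-denaturation", 98, 30)
CYCLE = [("denaturation", 98, 10), ("annealing", 65, 75)]
N_CYCLES = 15
FINAL = ("extension", 65, 300)


def hold_time_seconds(n_cycles: int = N_CYCLES) -> int:
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE)
    return INITIAL[2] + n_cycles * per_cycle + FINAL[2]


total = hold_time_seconds()
print(f"{total} s = {total / 60:.2f} min of programmed hold time")
```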
After completion of the PCR reaction, the product was purified using DNA Clean & Concentrator-5 and eluted with 20 μL of enzyme-free water, then subjected to electrophoresis on 4% low-melting agarose; bands in the range of 150-700 bp were recovered using the Zymoclean Gel DNA Recovery Kit and eluted with 18 μL of DEPC water, and the concentration of the gel-recovered product was measured using NanoDrop, thereby obtaining the sequencing library.
After the sequencing library is obtained, PE150 double-ended sequencing is performed on it using an Illumina HiSeq X Ten sequencer, the two ends of the non-coding RNA are identified, and its full-length sequence is analyzed.
Example 2 construction method of PEN-seq sequencing library
The PEN-seq sequencing library in this example was constructed and sequenced using total RNA from HEK293T cells with stable 15.5K knockdown, including the following procedures.
(I) Culture of HEK293T cells with stable 15.5K knockdown
In HEK293T cells, cell lines with inducible silencing of 15.5K (sh15.5K-1 and sh15.5K-2, and the control shNC) were constructed using the miR-E approach described in the published article (Fellmann, C., et al., An optimized microRNA backbone for effective single-copy RNAi. Cell Rep, 2013. 5(6): p. 1704-13). The three cell lines were seeded into 6-well plates and cultured for 24 hours, doxycycline (Selleck, cat# S5159) was then added to a final concentration of 3 μM, and culture was continued for 48 hours.
(II) extraction of stable knockdown 15.5K HEK293T Total RNA
After the medium was discarded, 1 mL of TRIzol was added. After lysis at room temperature for 10 minutes, the lysate was transferred to a 1.5 mL centrifuge tube, 200 μL of chloroform was added, and the mixture was vortexed for 15 seconds and left at room temperature for 3 minutes. After centrifugation at 13000 ×g for 10 minutes at 4℃, 500 μL of the supernatant was retained, 500 μL of isopropanol was added, and the mixture was mixed well and precipitated at room temperature for 10 minutes. The sample was then centrifuged at 20000 ×g for 10 minutes at 4℃ and the supernatant was discarded; the RNA pellet was washed with 1 mL of 75% ethanol (prepared with DEPC water), centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. This wash was repeated once. The pellet was washed once with 1 mL of absolute ethanol, centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. The RNA pellet was dried in vacuo and dissolved in 30 μL of DEPC water, and the RNA concentration was determined using NanoDrop (total RNA is required to meet the following: 28S rRNA:18S rRNA approximately equal to 2, A260/A280 greater than 2, and A260/A230 greater than 2).
(III) construction of a stable 15.5K knock-down PEN-seq library
The PEN-seq sequencing library in this example was constructed according to steps 2-8 of Example 1.
(IV) high throughput sequencing
PE150 double-ended sequencing was performed on the PEN-seq sequencing library constructed as described above using an Illumina HiSeq X Ten sequencer.
EXAMPLE 3 construction of RIP-PEN-seq sequencing library
RIP-PEN-seq library construction of this example uses RNA immunoprecipitated with RNA binding proteins, which specifically includes HEK293T cell culture stably expressing FLAG-15.5K, cell lysis, RNA immunoprecipitation, FLAG-15.5K interacting RNA isolation, and library construction and sequencing thereof.
(I) Construction of cell lines stably overexpressing FLAG-15.5K and cell harvesting
HEK293T cells stably expressing FLAG-15.5K were constructed using lentiviral vectors and expanded. When the cells in the cell culture dish had grown to about 90% confluence, the medium was discarded and the cells were washed twice with pre-chilled DPBS. After the DPBS was discarded, 3 mL of pre-chilled DPBS was added and the cells were collected into a centrifuge tube using a cell scraper. After centrifugation at 1000 ×g for 5 minutes at 4℃, the upper DPBS layer was discarded to obtain the cell pellet.
(II) cell lysis and RNA immunoprecipitation
An equal volume of cell lysis buffer was added to the cell pellet, and the pellet was resuspended with a pipette, incubated on ice for 15 minutes, and centrifuged at 15000 ×g for 15 minutes at 4℃; the upper cell lysate was retained. Dynabeads Protein G magnetic beads were added to the cell lysate at 1/20 volume and rotated at 4℃ for 30 minutes, after which the magnetic beads were separated from the cell lysate using a magnetic rack. The cell lysate was diluted 10-fold with RIP binding solution, an antibody against the RNA binding protein (here, a FLAG antibody targeting FLAG-15.5K) was added at 5 μg per mL of diluted cell lysate, and the mixture was rotated at 4℃ for 12 hours. Dynabeads Protein G (Thermo Fisher, cat# 10004D) was then added at 10 μL/g antibody, and incubation was continued at 4℃ for 3 hours with rotation.
(III) FLAG-15.5K interaction RNA isolation
After the incubation was completed, the magnetic beads and the solution were separated using a magnetic rack and the solution was discarded; 1 mL of RIP washing solution was then added to the magnetic beads, the beads were rotated at room temperature for 3 minutes, and the solution was discarded using the magnetic rack. The washing was repeated 4 more times. Then, 1 mL of TRIzol (Thermo Fisher, cat# 15596018) was added to the washed beads, and after mixing the mixture was left at room temperature for 5 minutes; 200 μL of chloroform was then added, and the mixture was vortexed for 15 seconds and left at room temperature for 3 minutes. After centrifugation at 13000 ×g for 10 minutes at 4℃, 500 μL of the supernatant was retained, 500 μL of isopropanol and 4 μL of glycogen (Thermo Fisher, cat# AM9510) were added, and the mixture was mixed well and precipitated overnight at -20℃. The sample was centrifuged at 20000 ×g for 30 minutes at 4℃ and the supernatant was discarded; the RNA pellet was washed with 1 mL of 75% ethanol (prepared with DEPC water), centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. This wash was repeated once. The pellet was washed once with 1 mL of absolute ethanol, centrifuged at 20000 ×g for 5 minutes at 4℃, and the supernatant was discarded. The RNA pellet was dried in vacuo and dissolved in 30 μL of DEPC water, and the RNA concentration was determined using NanoDrop (A260/A280 greater than 2, A260/A230 greater than 2); qualified RNA samples were either used directly in the next experiment or stored at -80℃.
(IV) RIP-PEN-seq sequencing library preparation and sequencing
The RIP-PEN-seq sequencing library in this example was prepared and sequenced according to steps 2-8 of Example 1.
EXAMPLE 4 construction of a sub-PEN-seq sequencing library
The sub-PEN-seq sequencing library in the embodiment adopts RNA derived from each component of the cells, and the specific construction method comprises the following procedures.
(I) HEK293T and HCT116 cell culture and collection
HEK293T and HCT116 cells cultured in the laboratory were used as samples, and the initial amount of the cell samples was 3×10⁷. After the cells had grown to about 90% confluence in the cell culture dish, the medium was discarded and the cells were washed twice with DPBS solution (pH 7.4); the cells were then digested with pancreatin, and after digestion was terminated with serum-containing medium, the cell suspension was collected in conical tubes, placed on ice, and centrifuged at 500 ×g for 5 minutes at 4℃. The supernatant was discarded, the cells were resuspended in pre-chilled DPBS and counted, and the relative volume (RV) of the cells was determined.
(II) cytoplasmic RNA isolation
Pre-chilled cell membrane lysate was added at 15 times the relative volume, and the cells were resuspended with a pipette, gently mixed and placed on ice for 10 minutes. After gentle vortexing, the sample was centrifuged at 1000 ×g for 3 minutes at 4℃; the supernatant (i.e., the cytoplasmic fraction) was transferred to a new centrifuge tube, and the pellet was the nuclear fraction. For the cytoplasmic fraction, 950 μL of absolute ethanol and 50 μL of 3 M sodium acetate (pH 5.5; Thermo Fisher, cat# AM9740) were added per 330 μL of cytoplasmic fraction, and the mixture was mixed well and precipitated at -20℃ for 2 hours. It was then centrifuged at 18000 ×g for 15 minutes at 4℃ and the supernatant was discarded. 1 mL of 75% ethanol was added, and the pellet was washed by vortexing and centrifuged at 18000 ×g for 5 minutes at 4℃; after the supernatant was removed, the pellet was briefly air-dried, 1 mL of TRIzol was added for lysis, 10 μL of 0.5 M EDTA was added, and the mixture was incubated at 65℃ for 10 minutes to dissolve the RNA fully. RNA was then extracted with chloroform to obtain cytoplasmic RNA.
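Because the precipitation reagents are specified per 330 μL of fraction, they scale linearly with the volume actually recovered. The sketch below performs that scaling; the function name and the example volume of 990 μL are illustrative assumptions.

```python
def precipitation_volumes(fraction_ul: float) -> dict:
    """Scale the per-330-uL precipitation recipe to a given fraction volume."""
    scale = fraction_ul / 330.0
    return {
        "absolute ethanol (uL)": 950.0 * scale,
        "3 M sodium acetate (uL)": 50.0 * scale,
    }


# Hypothetical example: 990 uL of cytoplasmic fraction was recovered.
for reagent, volume in precipitation_volumes(990.0).items():
    print(f"{reagent}: {volume:.0f}")
```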
(III) Isolation of nucleoplasmic RNA
The nuclear fraction from (II) above was washed by adding pre-chilled cell membrane lysate at 30 times the relative volume and centrifuging at 200 ×g for 2 minutes at 4℃; this step was repeated once. Cell membrane lysate was then added at 30 times the relative volume, the nuclei were resuspended and centrifuged at 1200 ×g for 5 minutes at 4℃, and the supernatant was discarded. The nuclei were resuspended in 10 relative volumes of S1 sucrose solution and then added to 10 relative volumes of S3 sucrose solution. After centrifugation at 1200 ×g for 10 minutes at 4℃, the resulting pellet was the purified nuclei. The purified nuclear pellet was resuspended in 10 relative volumes of S2 sucrose solution, transferred to a new tube, and disrupted by sonication under the following conditions: 50% power, 15 seconds of sonication at 45-second intervals, 7 times. The sonicated suspension was then added to 10 relative volumes of S3 sucrose solution and centrifuged at 2000 ×g for 20 minutes at 4℃. The supernatant contained the nucleoplasm and the pellet contained the nucleoli. The nucleoplasmic supernatant was removed, 950 μL of absolute ethanol and 50 μL of 3 M sodium acetate were added per 330 μL, and the mixture was mixed well and precipitated at -20℃ for 2 hours; RNA was then extracted with reference to the method in (II) to obtain nucleoplasmic RNA.
(IV) Isolation of nucleolar RNA
The nucleolar pellet was resuspended in 500 μL of S sucrose solution and centrifuged at 2000 × g and 4 °C for 5 minutes. After the supernatant was removed, 1 mL of TRIzol was added, the pellet was lysed at room temperature for 10 minutes, and RNA was extracted with chloroform to obtain nucleolar RNA.
Preparation and sequencing of the sub-PEN-seq library
The sub-PEN-seq sequencing library in this example was prepared and sequenced as described in steps 2-8 of Example 1.
Application Example 1: Construction and sequencing analysis of the PEN-seq sequencing library
In this application example, total RNA was isolated from three doxycycline-treated stable cell lines (HEK293T-shNC, sh15.5K-1, and sh15.5K-2), libraries were prepared and sequenced using the PEN-seq sequencing library construction method of Example 2, and the data were then analyzed.
1. Data analysis method
The data analysis procedure for identifying the double-ended sites and full-length sequences of non-coding RNA from a PEN-seq sequencing library is shown in FIGS. 2 and 3. Specifically, after the raw high-throughput paired-end sequencing data of the PEN-seq sequencing library are obtained, adapter sequences and low-quality sequences are first removed with Cutadapt (v2.8). The filtered data are then aligned to the human reference genome (hg38) with the alignment software STAR (v2.7.1a). The resulting BAM files are read with SAMtools, the aligned sequences are clustered according to the overlap between them, and the double-ended sites (start and end points) and full-length sequences of the non-coding RNAs are finally determined from the clustering results.
The PEN-seq data analysis results can be visualized with IGV or the UCSC Genome Browser. In addition, the PEN-seq results can be used to screen for K-turn RNAs based on their structural features; the computational workflow for identifying K-turn RNAs from the K-turn structural motif they contain is shown in FIG. 4.
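The overlap-based clustering step of the analysis can be sketched as follows. This is a minimal illustration, not the procedure of FIGS. 2 and 3: it assumes each aligned read pair has already been reduced to a fragment tuple `(chrom, strand, start, end)` in 0-based half-open coordinates, and it assumes that the most frequent start and end within a cluster are taken as the double-ended sites. The function names `cluster_fragments` and `call_ends` are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_fragments(fragments):
    """Group fragments that overlap on the same chromosome and strand."""
    by_locus = defaultdict(list)
    for chrom, strand, start, end in fragments:
        by_locus[(chrom, strand)].append((start, end))

    clusters = []
    for (chrom, strand), spans in by_locus.items():
        spans.sort()
        current = [spans[0]]
        current_end = spans[0][1]
        for start, end in spans[1:]:
            if start < current_end:  # overlaps the running cluster
                current.append((start, end))
                current_end = max(current_end, end)
            else:  # gap: close the cluster and start a new one
                clusters.append((chrom, strand, current))
                current = [(start, end)]
                current_end = end
        clusters.append((chrom, strand, current))
    return clusters


def call_ends(cluster):
    """Pick the modal start and end of a cluster (an assumed rule)."""
    chrom, strand, spans = cluster
    start = Counter(s for s, _ in spans).most_common(1)[0][0]
    end = Counter(e for _, e in spans).most_common(1)[0][0]
    return chrom, strand, start, end, len(spans)


# Hypothetical fragments, for illustration only.
fragments = [
    ("chr1", "+", 100, 170),
    ("chr1", "+", 100, 170),
    ("chr1", "+", 102, 170),
    ("chr1", "+", 500, 560),
    ("chr1", "-", 100, 170),
]
for cluster in cluster_fragments(fragments):
    print(call_ends(cluster))
```

Using the modal position rather than the outermost position makes the call less sensitive to a few degraded or mis-ligated molecules, but other rules are equally plausible.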
2. Data analysis results
First, qPCR and Western blot experiments were used to verify the stable knockdown of 15.5K in HEK293T cells; the results are shown in FIG. 5. The shNC, sh15.5K-1, and sh15.5K-2 HEK293T stable cell lines were each treated with DMSO or doxycycline for 48 hours, and RNA and protein were collected. The qPCR results show that, compared with the negative control shNC cells, doxycycline induction of the 15.5K-targeting shRNA in sh15.5K-1 and sh15.5K-2 cells significantly reduced 15.5K expression, and the Western blot results show that the protein level was also significantly reduced.
The double-ended sites and full-length sequences of known and novel K-turn RNAs were identified based on the three-dimensional structural features of the K-turn and compared with the annotated start and end positions of known K-turn RNAs. The comparison between the start points of known K-turn RNAs (i.e., box C/D snoRNAs) identified with the PEN-seq sequencing library and the annotated start points is shown in FIG. 6; the results show that the PEN-seq sequencing library accurately identifies the start points of known K-turn RNAs. The comparison between the end points of known K-turn RNAs (i.e., box C/D snoRNAs) identified with the PEN-seq sequencing library and the annotated end points is shown in FIG. 7; the results show that the PEN-seq sequencing library also accurately identifies the end points of known K-turn RNAs.
The above results indicate that the full-length sequences of known K-turn RNAs can be accurately identified in PEN-seq libraries constructed from HEK293T-shNC cells.
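A comparison between identified and annotated end positions of this kind can be summarized as a distribution of position offsets. The sketch below is an illustration under stated assumptions: positions are given as dictionaries keyed by RNA name, all names and coordinates are hypothetical, and the helper names `end_offsets` and `fraction_within` are not from the disclosure.

```python
from collections import Counter


def end_offsets(identified, annotated):
    """Count (identified - annotated) offsets for RNAs present in both sets."""
    shared = identified.keys() & annotated.keys()
    return Counter(identified[name] - annotated[name] for name in shared)


def fraction_within(offsets, tolerance=0):
    """Fraction of RNAs whose identified position is within `tolerance` nt."""
    total = sum(offsets.values())
    if total == 0:
        return 0.0
    hits = sum(n for off, n in offsets.items() if abs(off) <= tolerance)
    return hits / total


# Hypothetical start positions, for illustration only.
identified_starts = {"snoRNA_a": 1000, "snoRNA_b": 2050, "snoRNA_c": 3001}
annotated_starts = {"snoRNA_a": 1000, "snoRNA_b": 2050,
                    "snoRNA_c": 3000, "snoRNA_d": 4000}

offsets = end_offsets(identified_starts, annotated_starts)
print(sorted(offsets.items()))
print(fraction_within(offsets, tolerance=0))
print(fraction_within(offsets, tolerance=1))
```

The same two helpers can be applied to end points by passing end-position dictionaries instead.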
In addition, a new set of K-turn RNAs was identified by high-throughput sequencing of the PEN-seq sequencing library constructed according to the present invention. As shown in FIG. 8 (where Coverage indicates the full length and expression level of the RNA), the level of bktRNA1 decreased in the 15.5K-silenced cell lines sh15.5K-1 and sh15.5K-2. This further indicates that the PEN-seq sequencing library clearly determines the start and end points of bktRNA1 and that its expression decreases significantly after 15.5K is silenced, which is consistent with the existing report that 15.5K can promote the processing of K-turn RNA (Watkins, N.J., A. Dickmanns, and R. Luhrmann, Conserved stem II of the box C/D motif is essential for nucleolar localization and is required, along with the 15.5K protein, for the hierarchical assembly of the box C/D snoRNP. Mol Cell Biol, 2002. 22(23): p. 8342-52).
For the newly identified set of K-turn RNAs, the expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells were further compared. As with bktRNA1, silencing 15.5K significantly down-regulated the expression of these K-turn RNAs. FIG. 9 shows a heat map of K-turn RNA expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells, where RPM denotes reads per million reads. The changes in K-turn RNA expression levels in shNC, sh15.5K-1, and sh15.5K-2 cells were also analyzed with violin plots; the results are shown in FIG. 10 and show that silencing 15.5K significantly down-regulates the expression of K-turn RNAs.
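The RPM normalization used for such expression comparisons can be sketched as follows. This is a minimal illustration: the read counts are hypothetical, the library total is assumed to be the sum of the supplied counts, and the pseudocount of 1 in the fold-change helper is an assumption rather than part of the disclosed method.

```python
import math


def reads_per_million(counts):
    """Convert raw read counts to RPM (reads per million reads)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def log2_fold_change(knockdown_rpm, control_rpm, pseudocount=1.0):
    """log2 ratio of knockdown to control RPM; the pseudocount is assumed."""
    return math.log2((knockdown_rpm + pseudocount) / (control_rpm + pseudocount))


# Hypothetical counts for one library, for illustration only.
counts = {"bktRNA1": 50, "all_other_reads": 999_950}
rpm = reads_per_million(counts)
print(rpm["bktRNA1"])

# Hypothetical RPM values in control and knockdown cells.
print(log2_fold_change(knockdown_rpm=24.0, control_rpm=99.0))
```

A negative log2 fold change indicates lower expression in the knockdown cells than in the control.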
In conclusion, the results show that constructing a PEN-seq sequencing library by the method of the present invention and performing sequencing analysis effectively captures full-length non-coding RNA sequence information, which is of great significance for studying non-coding RNA at the transcript level.
Application Example 2: Construction and sequencing analysis of the RIP-PEN-seq sequencing library
In this application example, HEK293T-FLAG-15.5K cells stably expressing FLAG-15.5K were collected, FLAG-15.5K was immunoprecipitated with a FLAG antibody, libraries were prepared and sequenced using the RIP-PEN-seq sequencing library construction method of Example 3, and the data were then analyzed.
1. Data analysis method
The data analysis procedure for identifying the double-ended sites and full-length sequences of non-coding RNA from RIP-PEN-seq is as follows. First, adapter sequences and low-quality sequences in the raw paired-end sequencing data of the PEN-seq sequencing library are removed with Cutadapt (v2.8). The filtered data are then aligned to the human reference genome (hg38) with the alignment software STAR (v2.7.1a). The resulting BAM files are read with SAMtools, the aligned sequences are clustered according to the overlap between them, and the double-ended sites (start and end points) and full-length sequences of the non-coding RNAs are finally determined from the clustering results. The RIP-PEN-seq data analysis results can be visualized with IGV or the UCSC Genome Browser. In addition, the RIP-PEN-seq results are used to determine which K-turn RNAs interact with 15.5K.
2. Data analysis results
HEK293T-FLAG-15.5K cells stably expressing FLAG-15.5K were collected, and the overexpression of FLAG-15.5K in the stable cell line was verified by Western blot, with pCGP cells as the negative control. The results are shown in FIG. 11A and show that expression of the FLAG-15.5K protein is markedly increased in HEK293T-FLAG-15.5K cells. FLAG-15.5K was then immunoprecipitated with a FLAG antibody, and the efficiency of the immunoprecipitation in HEK293T-FLAG-15.5K cells was assessed by Western blot. The results are shown in FIG. 11B and show that the FLAG-15.5K protein is significantly enriched after immunoprecipitation with the FLAG antibody, which specifically targets FLAG-15.5K.
Library preparation, sequencing, and data analysis were then performed following the RIP-PEN-seq procedure to identify the double-ended sites and full-length sequences of known and novel K-turn RNAs that interact with 15.5K. The identified positions were compared with the annotated start and end positions of known K-turn RNAs. The comparison between the start points of known K-turn RNAs (i.e., box C/D snoRNAs) identified by 15.5K RIP-PEN-seq and the annotated start points is shown in FIG. 12A; the results show that sequencing a RIP-PEN-seq sequencing library accurately identifies the start points of known K-turn RNAs. The comparison between the end points of known K-turn RNAs (i.e., box C/D snoRNAs) identified by 15.5K RIP-PEN-seq and the annotated end points is shown in FIG. 12B; the results show that sequencing a RIP-PEN-seq sequencing library also accurately identifies the end points of known K-turn RNAs.
The above results demonstrate that constructing a RIP-PEN-seq sequencing library by the method of the present invention and performing high-throughput sequencing accurately identifies the double-ended sites and full length of known K-turn RNAs, e.g., the K-turn RNAs in the 10 GAS5 introns shown in FIG. 13. In addition, a new set of K-turn RNAs was identified, such as bktRNA1, whose start and end points are shown in FIG. 14.
Application Example 3: Data processing of the sub-PEN-seq sequencing library
Cultured HEK293T and HCT116 cells were collected, sub-PEN-seq sequencing libraries were constructed and sequenced according to the method of Example 4, and the data were analyzed.
1. Data analysis method
The data analysis procedure for identifying the double-ended sites, full-length sequences, and cellular localization of non-coding RNA from sub-PEN-seq is as follows. First, adapter sequences and low-quality sequences in the raw paired-end sequencing data of the PEN-seq library are removed with Cutadapt (v2.8). The filtered data are then aligned to the human reference genome (hg38) with the alignment software STAR (v2.7.1a). The resulting BAM files are read with SAMtools, the aligned sequences are clustered according to the overlap between them, and the double-ended sites (start and end points) and full-length sequences of the non-coding RNAs are finally determined from the clustering results. The sub-PEN-seq data analysis results can be visualized with IGV or the UCSC Genome Browser. In addition, the sub-PEN-seq results can be used to screen for K-turn RNAs based on their structural features and to analyze the distribution of K-turn RNAs across cell fractions and the cellular localization of the RNAs.
2. Data analysis results
After the cultured HEK293T and HCT116 cells were collected, the cytoplasm, nucleoplasm, and nucleolus fractions were separated according to the fractionation method used in sub-PEN-seq, and the separation was checked with marker proteins specific to each fraction. The Western blot verifying the separation of HEK293T cytoplasm (Cytoplasm, Cyto), nucleoplasm (Nucleoplasm, Np), and nucleolus (Nucleolus, No) is shown in FIG. 15A. The results show that GAPDH (specific to the cytoplasmic fraction), FUS (specific to the nucleoplasmic fraction), and FBL (specific to the nucleolar fraction) are each clearly enriched in their own fraction and present at very low levels in the other fractions; that is, the cell fractions were well separated. The Western blot verifying the separation of HCT116 cytoplasm, nucleoplasm, and nucleolus is shown in FIG. 15B. These results show that the method of the present invention separates the individual cell fractions effectively.
Library preparation, sequencing, and data analysis were then performed according to the sub-PEN-seq sequencing library construction method of Example 4 to identify the double-ended sites, full-length sequences, and cellular localization of known and novel K-turn RNAs. The double-ended sites and full length of the K-turn RNA bktRNA1 identified by sub-PEN-seq in each cell fraction of HEK293T are shown in FIG. 16, and those identified by sub-PEN-seq in HCT116 are shown in FIG. 17. The results show that high-throughput sequencing of the sub-PEN-seq sequencing library constructed according to the present invention accurately identifies known K-turn RNAs, and expression analysis shows that the K-turn RNAs of HEK293T and HCT116 cells are mainly distributed in the nucleoplasm and nucleolus. In addition, the expression levels of K-turn RNAs in each cell fraction were analyzed with heat maps of the HEK293T and HCT116 sub-PEN-seq library data; the results are shown in FIG. 18 and show that K-turn RNA expression is higher in the nucleoplasm and nucleolus than in the cytoplasmic fraction. The differences in K-turn RNA expression levels between cell fractions in the HEK293T and HCT116 sub-PEN-seq data were also analyzed with violin plots; the results are shown in FIG. 19 and show that K-turn RNAs are mainly distributed in the nucleoplasm and nucleolus.
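The distribution of an RNA across the three fractions can be expressed as the share of its normalized signal in each fraction. The sketch below is illustrative only: it assumes per-fraction RPM values are already available, uses the fraction labels Cyto, Np, and No from the text, and the RPM values and the function name `fraction_shares` are hypothetical.

```python
def fraction_shares(rpm_by_fraction):
    """Return the share of total RPM found in each cell fraction."""
    total = sum(rpm_by_fraction.values())
    if total == 0:
        return {fraction: 0.0 for fraction in rpm_by_fraction}
    return {fraction: rpm / total for fraction, rpm in rpm_by_fraction.items()}


# Hypothetical RPM values for one K-turn RNA, for illustration only.
rpm_by_fraction = {"Cyto": 10.0, "Np": 40.0, "No": 50.0}
shares = fraction_shares(rpm_by_fraction)
print(shares)
print(max(shares, key=shares.get))  # fraction holding the largest share
```

Comparing shares rather than raw RPM values assumes the three libraries are comparable after normalization, which depends on how the fractions were sequenced.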
The results show that constructing a sequencing library that captures full-length non-coding RNA by the method of the present invention and performing high-throughput sequencing can detect the full length of non-coding RNAs in each cell fraction (such as cytoplasm, nucleoplasm, and nucleolus).
In summary, the invention provides a method for constructing a sequencing library that captures full-length non-coding RNA, and applications thereof. High-throughput sequencing of the non-coding RNA sequencing libraries constructed by the invention shows that non-coding RNAs can be accurately identified, which is of great significance for capturing the full length of various ncRNAs and for studying the localization of non-coding RNAs.
While the embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art. Furthermore, embodiments of the invention and features of the embodiments may be combined with each other without conflict.
Claims (10)
1. A method of constructing a sequencing library that captures full-length non-coding RNA, comprising:
S1, obtaining RNA of a sample to be tested, and ligating a 3' DNA linker and a 5' RNA linker to the two ends of the RNA, respectively, to obtain an RNA ligation product;
S2, mixing the RNA ligation product with a DNA probe targeting non-target RNA, annealing, and removing the non-target RNA and the residual DNA probe to obtain a target RNA ligation product;
S3, designing a truncated reverse transcription primer for the target RNA ligation product, synthesizing cDNA, and then performing PCR amplification on the cDNA using a primer containing an anchor base to obtain a sequencing library that captures full-length non-coding RNA.
2. The method of claim 1, wherein the RNA of the sample to be tested comprises at least one of total RNA from a cell or tissue source, RNA immunoprecipitated with an RNA binding protein, and RNA from a different organelle source.
3. The method according to claim 1, wherein the 3' DNA linker is a 5'-adenylated 3' DNA linker having a random base;
preferably, the 3' end of the 5' RNA linker carries a random base.
4. The method according to any one of claims 1 to 3, wherein the non-target RNA comprises at least one of rRNA, snRNA, and snoRNA.
5. The method according to claim 4, wherein the length of the DNA probe is 38-55nt.
6. The method according to claim 4, wherein the non-target RNA and the residual DNA probe are removed by RNase H and exonuclease RecJF, respectively.
7. The method according to claim 4, wherein the sequence of the truncated reverse transcription primer is as set forth in SEQ ID NO: 197.
8. The method of claim 4, wherein the fragment size in the captured full-length non-coding RNA sequencing library is 150bp to 1500bp.
9. A method of sequencing full-length non-coding RNA comprising constructing a sequencing library using the method of any one of claims 1 to 8; and sequencing the sequencing library.
10. The method of claim 9, wherein the sequencing is PE150 double-ended sequencing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310366344.3A CN116516495A (en) | 2023-04-06 | 2023-04-06 | Construction method and application for capturing full-length non-coding RNA sequencing library |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116516495A (en) | 2023-08-01 |
Family
ID=87403814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310366344.3A Pending CN116516495A (en) | 2023-04-06 | 2023-04-06 | Construction method and application for capturing full-length non-coding RNA sequencing library |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||