WO2022099670A1 - A method for whole-transcriptome RNA structure probing and application thereof - Google Patents
- Publication number: WO2022099670A1 (international application PCT/CN2020/128949)
- Authority: WO, WIPO (PCT)
- Prior art keywords: RNA, nucleic acid, detection method, structure detection, acid structure
Classifications
- C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation (under C12Q1/6869, Methods for sequencing)
- C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; compositions therefor; processes of preparing such compositions involving nucleic acids
- C12Q1/6858: Allele-specific amplification (under C12Q1/6844, Nucleic acid amplification reactions)
- C12Q2600/156: Polymorphic or mutational markers (under C12Q2600/00, Oligonucleotides characterized by their use)
- C12Q2600/16: Primer sets for multiplex assays (under C12Q2600/00)
- C12Q2600/178: Oligonucleotides characterized by their use: miRNA, siRNA or ncRNA (under C12Q2600/00)
- All of the above fall under C (CHEMISTRY; METALLURGY) > C12 (BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING) > C12Q (MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES)
Definitions
- The invention relates to the technical field of molecular biology, and in particular provides a whole-transcriptome RNA structure detection method and its application.
- The present invention can detect the secondary structure of all RNA molecules in cells, especially RNAs shorter than 200 nt.
- RNA structuromics combines chemical probing with next-generation sequencing to study RNA structure.
- Chemical reagents widely used for in vivo RNA structure probing include dimethyl sulfate (DMS), 1-methyl-7-nitroisatoic anhydride (1M7), 2-methylnicotinic acid imidazolide-azide (NAI-N3) and ethoxydihydroxybutanone (kethoxal).
- In vivo, DMS methylates single-stranded adenine at N1 and single-stranded cytosine at N3. NAI-N3 instead acylates the free 2'-hydroxyl groups of all four nucleotides in single-stranded regions.
- icSHAPE combines the structure-selective 2'-hydroxyl acylation by NAI-N3 with high-throughput sequencing to probe transcriptome-wide RNA structure.
- icSHAPE has been used to reveal structural differences in RNA associated with different biological processes. Examples include translation in living cells, RNA-protein interaction regions, and N6-methyladenosine (m6A) modification sites.
- DMS-seq and icSHAPE rely on chemically modified nucleotides terminating reverse transcription. These termination signals are used to estimate the probability that each nucleotide is in a single-stranded conformation.
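The RT-stop readout can be illustrated with a minimal sketch that tallies termination events per nucleotide from the mapped 5' ends of reads. The sketch assumes that each read's 5' end maps one nucleotide 3' of the modified base, so each stop is credited to the preceding position. `count_rt_stops` is a hypothetical helper and is not part of the DMS-seq or icSHAPE software.

```python
from collections import Counter


def count_rt_stops(read_starts, transcript_length):
    """Tally RT-stop events per nucleotide of one transcript.

    read_starts: 1-based transcript coordinates of each read's mapped 5' end.
    Assumption: reverse transcription halts one nucleotide 3' of a modified
    base, so the modified base is taken to be read_start - 1.
    Returns a list whose index i holds the stop count for position i + 1.
    """
    stops = Counter()
    for start in read_starts:
        modified = start - 1
        # Reads starting at position 1 (or beyond the transcript) add no stop.
        if 1 <= modified <= transcript_length:
            stops[modified] += 1
    return [stops[pos] for pos in range(1, transcript_length + 1)]


# Reads starting at 3, 3 and 5 imply stops at positions 2, 2 and 4.
print(count_rt_stops([3, 3, 5, 1], 5))  # [0, 2, 0, 1, 0]
```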
- These techniques lose structural information near the 3' end of the probed target. Reverse-transcription stops close to the 3' end produce very short reads that are difficult to align. The affected region may be an entire short transcript under study, or a short fragment of a longer RNA such as one of its functional regions.
- This deficiency severely restricts structural analysis of short targets, such as small RNAs (sRNAs, RNAs shorter than about 200 nt) or the binding sites of RNA-binding proteins (RBPs).
- DMS mutational profiling (DMS-MaPseq) and SHAPE-MaP avoid this 3'-end loss. They measure the rate of mutations that reverse transcription introduces at chemically modified nucleotides, rather than termination signals.
- However, DMS-MaPseq provides only partial nucleotide coverage, since it detects only adenosine (A) and cytidine (C). Current SHAPE-MaP reagents (e.g., NMIA, 1M7) have only moderate cell-membrane permeability, which limits their use for probing RNA structure in vivo.
- We combined icSHAPE-MaP, the RNA structure detection method proposed in the present invention, with tertiary structure modeling. This showed that spatial distance is an important parameter in Dicer processing of pre-miRNAs.
- The present invention provides a method for detecting RNA structure across the whole transcriptome.
- With this method, the RNA structure map of Dicer-bound substrates was obtained, revealing the structural types and structural features of Dicer substrates.
- The present invention provides a method for nucleic acid structure detection and its application. The method comprises:
  1) modifying the nucleic acid with a labeling reagent;
  2) processing the nucleic acid;
  3) sequencing the processed nucleic acid;
  4) calculating a structure score from the sequencing results; and
  5) predicting the nucleic acid structure.
- The RNA may be full-length RNA.
- The RNA may be transcriptome RNA.
- The RNA may be small RNA.
- The RNA may be miRNA, snoRNA, snRNA, tRNA, vault RNA, Y RNA, pre-miRNA, miscRNA, 5S rRNA and the like. It may also be a fragment of an RNA transcript, such as an exon or intron of an mRNA or of a lncRNA.
- The present invention provides a method for RNA structure detection, comprising: 1) modifying the nucleic acid with a labeling reagent; 2) processing the RNA; 3) sequencing the processed product; 4) calculating a structure score from the sequencing results; and 5) predicting the nucleic acid structure.
- The RNA structure detection method comprises one of the following features a) to d):
- a) the processing in step 2) is reverse transcription of the RNA to obtain cDNA;
- b) the processed product in step 3) is cDNA, and the sequencing is deep sequencing of the cDNA;
- c) calculating the structure score in step 4) comprises counting the mutation frequency at each nucleotide position and calculating the mutation rate;
- d) predicting the nucleic acid structure in step 5) comprises using the RNA structure score map obtained in step 4) to predict RNA secondary structure, tertiary structure or other higher-order structures.
- The present invention provides a whole-transcriptome RNA structure detection method, comprising: 1) modifying the nucleic acid with a labeling reagent; 2) processing the RNA; 3) sequencing the processed product; 4) calculating a structure score from the sequencing results; and 5) predicting the nucleic acid structure.
- The whole-transcriptome RNA structure detection method comprises one of the following features a) to d):
- a) the processing in step 2) is reverse transcription of the RNA to obtain cDNA;
- b) the processed product in step 3) is cDNA, and the sequencing is deep sequencing of the cDNA;
- c) calculating the structure score in step 4) comprises counting the mutation frequency at each nucleotide position and calculating the mutation rate;
- d) predicting the nucleic acid structure in step 5) comprises using the RNA structure score map obtained in step 4) to predict RNA secondary structure, tertiary structure or other higher-order structures.
- Secondary structures include single-stranded regions, paired double-stranded regions, stem-loops (hairpins), bulges, junctions (multibranch loops), interior loops, pseudoknots, kissing hairpins and the like.
- Tertiary structure is the more complex three-dimensional structure that forms when the nucleic acid chain folds further in space on the basis of its secondary structure.
- Other higher-order structures include the spatial conformation of RNA-protein complexes and the like.
- In the whole-transcriptome RNA structure detection method provided by the present invention, the structure detection method can be DMS mutational profiling or SHAPE-MaP (SHAPE with mutational profiling).
- The labeling reagent is a chemical modification reagent.
- The chemical modification reagent has high intracellular reactivity.
- High intracellular reactivity means that the reagent reacts selectively with single-stranded nucleotides in structured RNA within a reasonable time and produces enough modification sites. Examples include NAI, NAI-N3, DMS and kethoxal.
- 1M7 and NMIA are modification reagents with low intracellular reactivity.
- Preferably, the labeling reagent is dimethyl sulfate (DMS), 2-methylnicotinic acid imidazolide-azide (NAI-N3) or ethoxydihydroxybutanone (kethoxal). More preferably, the labeling reagent is NAI-N3.
- The method can detect the structures of all types of RNA in cells, in vivo or in vitro.
- The RNA can be shorter than 200 nt.
- In step 1), the nucleic acid is modified with the labeling reagent in one of two ways:
  - cells are incubated with the labeling reagent, and RNA is then extracted; or
  - RNA is mixed with the labeling reagent in vitro and then purified with a kit.
- 5'- and 3'-end adapters are added to the RNA before reverse transcription.
- the 5'-end adaptor has the following gene sequence: 5'-rArCrArCrGrArCrGrCrUrCrUrCrCrGrArUrCrUrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrNrN
- The reverse transcription primer has the sequence 5'-AGACGTGTGCTCTTCCGATCT-3' (SEQ ID No. 3).
- The cDNA obtained in step 2) is amplified in a PCR reaction system, and the resulting PCR product is deep-sequenced.
- The PCR reaction system includes the P5 primer, the P3 primer, 25× SYBR Green and 2× Phusion High-Fidelity PCR Master Mix.
- The P5 primer has the sequence 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' (SEQ ID No. 4).
- The P3 primer has the sequence 5'-CAAGCAGAAGACGGCATACGAGAT[8-base barcode]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3' (SEQ ID No. 5). The 8-base barcode is inserted between "...GAGAT" and "GTGAC...".
- The 8-base barcode in the P3 primer distinguishes sequencing libraries generated from different samples.
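The primer layout can be checked with a few lines of code. The sketch below assembles a P3 primer from the two fixed segments of SEQ ID No. 5 and an 8-base barcode. It then confirms that the reverse-transcription primer (SEQ ID No. 3) forms the 3' end of P3. The helpers `build_p3` and `barcode_of` and the example barcode are hypothetical; they only encode the sequences given above.

```python
P3_HEAD = "CAAGCAGAAGACGGCATACGAGAT"            # P3 segment 5' of the barcode
P3_TAIL = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # P3 segment 3' of the barcode
RT_PRIMER = "AGACGTGTGCTCTTCCGATCT"             # SEQ ID No. 3


def build_p3(barcode):
    """Return a P3 primer (SEQ ID No. 5 layout) carrying an 8-base sample barcode."""
    barcode = barcode.upper()
    if len(barcode) != 8 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be exactly 8 bases of A/C/G/T")
    return P3_HEAD + barcode + P3_TAIL


def barcode_of(p3):
    """Recover the barcode from a primer built by build_p3."""
    if not (p3.startswith(P3_HEAD) and p3.endswith(P3_TAIL)):
        raise ValueError("not a P3 primer with the expected fixed segments")
    return p3[len(P3_HEAD):-len(P3_TAIL)]


p3 = build_p3("ACGTACGT")  # hypothetical example barcode
print(len(p3), p3.endswith(RT_PRIMER), barcode_of(p3))  # 66 True ACGTACGT
```

In a multiplexed run, each sample would simply get a distinct barcode passed to `build_p3`.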
- Stage I: 98 °C for 1 minute.
- Stage II: 98 °C for 15 seconds, 65 °C for 30 seconds, 72 °C for 45 seconds.
- Stage II is repeated for several cycles. The number of cycles is set from the fluorescence values shown by the qPCR instrument and is generally 13-15.
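Writing the cycling program as data makes the time budget explicit. The sketch below totals the programmed hold times for the typical 13-15 cycles. The `Step` type and `total_hold_seconds` are illustrative names. Ramping between temperatures is ignored, so actual run times will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One thermocycler hold: temperature in deg C and duration in seconds."""
    temp_c: float
    seconds: int


STAGE_I = (Step(98, 60),)                              # initial denaturation
STAGE_II = (Step(98, 15), Step(65, 30), Step(72, 45))  # repeated each cycle


def total_hold_seconds(cycles):
    """Sum of programmed hold times for Stage I plus `cycles` Stage II cycles.

    Ramp times and instrument overhead are ignored (an assumption).
    """
    if cycles < 1:
        raise ValueError("cycles must be a positive integer")
    stage_i = sum(step.seconds for step in STAGE_I)
    per_cycle = sum(step.seconds for step in STAGE_II)
    return stage_i + cycles * per_cycle


for n in (13, 15):
    print(n, total_hold_seconds(n) / 60, "min")  # 20.5 and 23.5 min
```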
- The sequencing coverage threshold can be 1000× or 500×, and is preferably 2000×.
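A coverage threshold like this is typically applied per transcript before scores are reported. The minimal sketch below assumes that "coverage" means the mean per-nucleotide read depth of a transcript, since the source does not define it further. `filter_by_coverage` and the transcript names are hypothetical.

```python
from statistics import mean


def filter_by_coverage(depth_by_transcript, threshold=2000):
    """Keep transcripts whose mean per-nucleotide read depth is >= threshold.

    depth_by_transcript maps a transcript name to its per-position depths.
    Assumption: "coverage" means mean depth; the source names 500x, 1000x
    or, preferably, 2000x as thresholds.
    """
    return {
        name: depths
        for name, depths in depth_by_transcript.items()
        if depths and mean(depths) >= threshold
    }


example = {"tx_a": [2500, 1800], "tx_b": [100, 200], "tx_c": []}
print(sorted(filter_by_coverage(example)))       # ['tx_a']
print(sorted(filter_by_coverage(example, 100)))  # ['tx_a', 'tx_b']
```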
- Calculating the structure score includes any one of the following steps:
- a) preprocessing the sequencing data, which includes:
  - removing 3' adapters (preferably with Cutadapt);
  - filtering for high-quality reads (preferably with Trimmomatic); and
  - removing duplicate (repeated) sequences (preferably with Perl).
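The duplicate-removal step is described only as being done preferably in Perl. The Python sketch below assumes that "duplicate" means reads with identical sequences, for example PCR duplicates. It also assumes that reads arrive as (read_id, sequence) pairs after adapter and quality trimming. `collapse_duplicates` is hypothetical and does not attempt to reproduce Cutadapt or Trimmomatic.

```python
def collapse_duplicates(reads):
    """Remove reads whose sequence was already seen, keeping first occurrences.

    reads: iterable of (read_id, sequence) pairs, in file order.
    Assumption: identical sequences are treated as duplicates.
    """
    seen = set()
    unique = []
    for read_id, sequence in reads:
        if sequence not in seen:
            seen.add(sequence)
            unique.append((read_id, sequence))
    return unique


reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "GGCC")]
print(collapse_duplicates(reads))  # [('r1', 'ACGT'), ('r3', 'GGCC')]
```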
- The RNA secondary structure is preferably predicted with the RNAstructure package.
- The RNA secondary structure is preferably visualized with VARNA (VARNAv3-93).
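Using structure scores as folding constraints requires writing them in a file the folding program can read. The minimal sketch below assumes RNAstructure's plain-text SHAPE format: one line per nucleotide, giving a 1-based index and a value, with values below -500 treated as missing data. `shape_file_lines` and `write_shape_file` are hypothetical helpers.

```python
import math
from pathlib import Path

NO_DATA = -999.0  # below -500, read by RNAstructure as "no data" (assumed)


def shape_file_lines(scores):
    """Format scores as '<1-based index>\t<value>' lines; None/NaN -> NO_DATA."""
    lines = []
    for index, score in enumerate(scores, start=1):
        if score is None or math.isnan(score):
            score = NO_DATA
        lines.append(f"{index}\t{score:.4f}")
    return lines


def write_shape_file(path, scores):
    """Write the constraint file and return its path."""
    path = Path(path)
    path.write_text("\n".join(shape_file_lines(scores)) + "\n")
    return path


print(shape_file_lines([0.5, None, 1.25]))
```

The resulting file would then be passed to RNAstructure's `Fold` program through its SHAPE option (`-sh`). Check the installed version's documentation for the exact flags and the default slope and intercept parameters.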
- The RNA sequence is an sRNA sequence or a protein-bound RNA.
- The mutation rate includes all mutation types, such as mismatches, insertions, deletions and other complex mutations.
- The mutation rate at each nucleotide is calculated with shape_mutation_counter.
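shape_mutation_counter performs this step in the described workflow, and the sketch below does not reproduce that tool. It only illustrates the arithmetic under simple assumptions:
- each mutation event has already been called and assigned to one transcript position;
- all listed mutation types are pooled into one count per position;
- that count is divided by the read depth at the position.

The function names are hypothetical.

```python
from collections import Counter

# Mutation classes pooled into one rate, following the list in the text above
MUTATION_KINDS = {"mismatch", "insertion", "deletion", "complex"}


def tally_mutations(events, length):
    """Count mutation events per 1-based position; events are (position, kind)."""
    counts = Counter()
    for pos, kind in events:
        if kind not in MUTATION_KINDS:
            raise ValueError(f"unknown mutation kind: {kind!r}")
        if 1 <= pos <= length:
            counts[pos] += 1
    return [counts[p] for p in range(1, length + 1)]


def mutation_rates(mutation_counts, depths):
    """Rate = mutations / read depth at each position; None where depth is 0."""
    if len(mutation_counts) != len(depths):
        raise ValueError("mutation_counts and depths must have the same length")
    return [m / d if d > 0 else None for m, d in zip(mutation_counts, depths)]


counts = tally_mutations([(1, "mismatch"), (1, "deletion"), (3, "insertion")], 3)
print(counts, mutation_rates(counts, [100, 50, 0]))  # [2, 0, 1] [0.02, 0.0, None]
```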
- r: mutation rate
- nai: the NAI-N3 (labeling reagent) sample group
- dmso: the DMSO control sample group
- f: normalization factor
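The symbols r, nai, dmso and f listed above are the inputs to the structure score, but the formula itself is not reproduced in this text. Purely as an assumption consistent with those symbols, a common SHAPE-MaP-style form subtracts the DMSO background from the NAI-N3 mutation rate and divides by the normalization factor: S_i = (r_nai,i - r_dmso,i) / f.

The sketch below implements that form. It further assumes the widely used 2%/8% rule for f: ignore the top 2% of values as outliers and average the next 8%. Neither choice is confirmed by the source, and the function names are hypothetical.

```python
def normalization_factor(raw, outlier_frac=0.02, mean_frac=0.08):
    """Assumed 2%/8% rule: skip the top 2% of values, average the next 8%."""
    values = sorted((v for v in raw if v is not None), reverse=True)
    if not values:
        raise ValueError("no scored positions")
    n_skip = int(len(values) * outlier_frac)
    n_avg = max(1, int(len(values) * mean_frac))
    window = values[n_skip:n_skip + n_avg]
    f = sum(window) / len(window)
    if f <= 0:
        raise ValueError("normalization factor must be positive")
    return f


def structure_scores(r_nai, r_dmso):
    """Assumed score S_i = (r_nai,i - r_dmso,i) / f; returns (scores, f)."""
    if len(r_nai) != len(r_dmso):
        raise ValueError("inputs must have the same length")
    raw = [
        None if a is None or b is None else a - b
        for a, b in zip(r_nai, r_dmso)
    ]
    f = normalization_factor(raw)
    return [None if v is None else v / f for v in raw], f


scores, f = structure_scores([0.05, 0.01, 0.03, 0.02], [0.01, 0.01, 0.01, 0.01])
print([round(s, 3) for s in scores], round(f, 3))  # [1.0, 0.0, 0.5, 0.25] 0.04
```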
- The present invention also provides a method for detecting the structure of specific RNAs, which combines the above method with RNA immunoprecipitation.
- The RNA is protein-bound RNA, such as Dicer-bound substrate RNA.
- The present invention also provides a kit for whole-transcriptome RNA structure detection. The kit includes the chemical modification reagent and the nucleotide sequences described in any one of the above whole-transcriptome RNA structure detection methods.
- The present invention proposes a new technique, icSHAPE-MaP, which probes RNA in its intact form. It uses reverse-transcriptase mutational profiling to detect modifications induced by labeling reagents with high intracellular reactivity, such as NAI-N3.
- This method enables structural analysis of small RNA species, either full-length sRNAs or fragments of long RNAs (e.g., RBP binding sites).
- The present invention also demonstrates the use of icSHAPE-MaP to reveal the structure map of sRNAs that are Dicer substrates. In the future, icSHAPE-MaP could be used to reveal the structural features of other RBP-bound RNAs.
- Reuter, J.S., and Mathews, D.H. (2010). RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129.
- Figure 1 shows the proportion of reads aligned to the reference genome that carry mutations in different groups.
- the proportion of reads carrying mutations was significantly lower than that in the in vivo NAI-N3 modified and in vitro NAI-N3 modified groups, indicating that the increase in the mutation rate was indeed caused by the NAI-N3 modification;
- Figure 2 is a bar graph of the mutation rate of four bases in the samples of the control DMSO group, the in vivo NAI-N3 modified group and the in vitro NAI-N3 modified group, indicating that NAI-N3 can modify four bases at the same time;
- Figure 3 is the mutation rate of eight types of mutations in human 5S rRNA, indicating the contribution of different types of mutations to the mutation rate;
- Figure 4 is the mutation rate at each nucleotide position of the control DMSO group of human 5S rRNA and the NAI-N3 modified group in vivo;
- Figure 5 is the known secondary structure model of 5S rRNA and the icSHAPE-MaP structure score at each nucleotide position;
- Figure 6 shows examples of a snoRNA and a tRNA with icSHAPE-MaP structure scores and their secondary structure models;
- Figure 7 shows cumulative distribution curves of the Pearson correlation coefficients between two replicates for the mutation rates of the NAI-N3-modified group in the same regions;
- Figure 8 is a cumulative distribution curve of the Pearson correlation coefficient for the mutation rates in samples from the DMS modification group (top) or the NAI-N3 modification group (bottom) for A and C bases between replicates;
- Figure 9 is a doughnut chart of the different types of Dicer substrates;
- Figure 10 is a violin plot of the length distribution of Dicer-bound fragments;
- Figure 11 shows the read coverage of two fragments in GDI1 and DICER1 enriched by RNA immunoprecipitation;
- Figure 12 is a doughnut chart of the different types of RNAs with icSHAPE-MaP structure scores;
- Figure 13 is a heat map of the Pearson correlation coefficients between two replicates for nucleotide mutation rates in DMSO or NAI-N3 samples;
- Figure 14 is a doughnut chart of the different types of Dicer-enriched RNAs with icSHAPE-MaP structure scores;
- Figure 15 is a bar graph of the relative expression levels of the 150 most highly expressed pre-miRNAs in HEK293T cells; bar height represents the relative expression level of each pre-miRNA;
- Figure 16 is a violin plot of the areas under the curve (AUCs) obtained from the comparative analysis of 209 tRNA secondary structures from the GtRNAdb database and the resulting icSHAPE-MaP structure scores;
- Figure 17 shows the secondary structure model of hsa-miR-125a constructed by RNAstructure software with the structural score as the constraint, and the color of each nucleotide indicates the structural score of icSHAPE-MaP;
- Figure 18 is the secondary structure model of hsa-miR-125a from the miRBase database; as in Figure 17, the color of each nucleotide indicates its icSHAPE-MaP structure score;
- Figure 19 shows the secondary structure models of the hsa-miR-19a and hsa-miR-27b pre-miRNAs and the corresponding icSHAPE-MaP structure score for each nucleotide: (top) modeled with RNAstructure using the icSHAPE-MaP structure scores as constraints; (bottom) models from the miRBase database;
- Figure 20 is a violin plot of the pseudo-free energies of 108 pre-miRNA structures, either from the miRBase database or predicted by RNAstructure.
- HEK293T cell line was purchased from ATCC.
- the Dicer KO HEK293T cell line (NoDice 2-20) was a gift from Dr. Bryan R. Cullen, Duke University. Cells were cultured at 37°C in a humidified incubator with 5% CO2 in high-glucose DMEM containing L-glutamine and sodium pyruvate (Thermo Scientific HyClone), supplemented with 10% fetal bovine serum. All cell transfection experiments were performed with polyethyleneimine (PEI) (Sigma-Aldrich).
- HEK293T cells were scraped from the dish and washed with PBS. The cells were resuspended in 100 mM NAI-N3 and incubated in a thermostatic mixer at 37°C for 5 min. The reaction was stopped by centrifugation at 2500 g for 1 min at 4°C, and the supernatant was removed. The cells were collected and resuspended in 250 μl PBS, and 750 μl TRIzol LS reagent was added for RNA extraction according to the manufacturer's instructions. The resulting RNA, or RNA prepared in vitro, was run on a 6% denaturing urea-PAGE gel for size selection (25-200 nt). The gel slice containing RNA of the desired length was crushed, placed in buffer (500 mM NaCl, 1 mM EDTA pH 8.0, 10 mM Tris-HCl pH 8.0), and incubated overnight at 4°C with rotation.
- a plasmid expressing cleavage-dead human Dicer (carrying two mutations, D1320A and D1709A, in its RNase III domains; Addgene) was transfected into NoDice 2-20 cells. On day 1, 9 × 10^6 cells were seeded in a 15 cm plate. After 24 hours, the cells were transfected with 20 μg of plasmid and 60 μl of PEI (1 μg/μl). The plasmid and the PEI were each first incubated with 1 mL of Opti-MEM I Reduced Serum Medium (Gibco). The two mixtures were then combined, left at room temperature for 15 minutes, and added to the cells. After 48 hours, the cells were lysed with lysis buffer.
- the lysis buffer contained 50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% Triton X-100 and 1 mM EDTA, supplemented with protease inhibitor cocktail (Roche) and the RNase inhibitor RiboLock (40 U/mL, Thermo Fisher).
- the lysate was centrifuged at 15,000 g for 10 minutes at 4°C to remove insoluble cell debris.
- the supernatant was incubated with anti-FLAG M2 magnetic beads (Sigma) for 3 hours at room temperature.
- beads were washed once with high-salt wash buffer (50 mM Tris-HCl pH 7.4, 1 M NaCl, 1% Triton X-100, protease inhibitor cocktail (Roche), RiboLock (Thermo Fisher, 40 U/mL)) and twice with low-salt wash buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 5 mM EDTA, protease inhibitor cocktail (Roche), RiboLock (Thermo Fisher, 40 U/mL)).
- RNA (8 μL) obtained in Example 2 or Example 3 was mixed with 3' ligation reaction mix (6 μL PEG8000, 1 μL 3' linker (10 μM), 1 μL DTT (100 mM), 2 μL 10× ligation buffer, 1 μL T4 RNA Ligase KQ (NEB), 1 μL RiboLock) and incubated at 25°C for 2 hours to ligate the 3' linker. The enzyme was then inactivated at 65°C for 20 minutes;
- mutagenic reverse transcription buffer: 50 mM Tris-HCl pH 8.0, 500 μM dNTPs, 75 mM KCl, 10 mM DTT, 6 mM MnCl2, 1 μL RiboLock
- the cDNA product obtained from the above reaction was purified with a DNA concentration kit (Zymo);
- PCR was performed in a qPCR instrument (Agilent, Mx3000P) to monitor amplification, using the following program: Phase I: 98°C for 1 min; Phase II: 98°C for 15 s, 65°C for 30 s, 72°C for 45 s, repeated for a number of cycles. The cycle number is determined from the fluorescence readout of the qPCR instrument and is typically 13 to 15;
- the PCR product was purified with a DNA concentration kit (Zymo) and further size-selected (150-330 nt) on a 6% native PAGE gel to remove excess PCR primers. The final PCR product was purified from the gel as described above, yielding the final library. Libraries were sequenced on the Illumina HiSeq X Ten platform (paired-end, 150 cycles).
- Example 5. icSHAPE-MaP structure score calculation
- preprocessing: adapters were removed with cutadapt (v1.16), low-quality reads were filtered out with Trimmomatic (v0.33), and duplicate sequencing reads were removed with a custom Perl script;
- alignment: human sRNA sequences shorter than about 200 nt were collected, such as miRNA (from miRBase v22), snoRNA (from GENCODE v26), snRNA (from GENCODE v26), tRNA (from GtRNAdb v2.0), vault RNA (from RefSeq v109), Y RNA (from RefSeq v109), and 5S rRNA.
- the reads processed as above were aligned to the collected human sRNA sequences with STAR (v2.7.1a), using the parameters --outFilterMismatchNmax 3, --outFilterMultimapNmax 10, --alignEndsType Local, --scoreGap -1000 and --outSAMmultNmax 1.
- the raw profile scores are normalized with normalize_profiles.py.
- the icSHAPE-MaP structure score for the i base is the difference between the mutation rate of the i base in the NAI-N3 modified samples and the control DMSO group samples divided by the normalization factor f
- NAI-N3 modifications can cause various types of mutations, including mismatches, insertions, deletions, and other complex mutations (Figure 3).
- correlation of mutation rates between replicates: the total read counts of the two replicates were balanced by downsampling. All bases were sorted by coverage. Bases with coverage greater than 500, 1000, 2000, 3000, 4000, or 5000 were selected, and the replicate correlation of mutation rates was calculated over sliding windows (window size: 50 nt; step: 10 nt). Finally, a cumulative distribution curve was generated from the correlations obtained at each threshold.
- RNA secondary structure prediction with constraints: the Fold program in the RNAstructure package (v5.6) was used to predict RNA secondary structure.
- the icSHAPE-MaP structure score is used as a constraint, with the parameters: -si -0.6 -sm 1.8 -SHAPE icSHAPE-Map.shape -mfe
- RNA secondary structure visualization: RNA secondary structures were visualized with the VARNA v3-93 command line, using the parameters "-basesStyle1 on" and "-applyBasesStyle1 on" to color the bases.
- Dicer belongs to the RNase III family, which cleaves double-stranded RNA (dsRNA) and precursor microRNA (pre-miRNA) hairpins into mature small interfering RNA (siRNA) and microRNA (miRNA), respectively. How precisely Dicer determines the cleavage sites of its substrates is critical to RNA interference (RNAi) and miRNA production. Previous studies have shown that Dicer uses different measurements to determine its cleavage site: 1) counting from the 3' overhang of dsRNA substrates (3' counting rule); 2) or counting from the 5'-terminal phosphate of pre-miRNAs and dsRNAs (5' counting rule).
- the constrained model of pre-miR-27b has a 6-nt terminal loop and a bulge adjacent to it (Figure 19).
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Organic Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Physics & Mathematics (AREA)
- Immunology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
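A method for probing whole-transcriptome RNA structure, and use thereof. The method combines in vivo click-chemistry-based selective 2'-hydroxyl acylation with mutational profiling to probe the structure of intact RNA. Combined with RNA immunoprecipitation, it is further applied to resolve the structural map of Dicer-bound substrate RNAs, revealing the structural types and features of Dicer substrates. The method can also perform complete, full-length structural analysis of small RNAs, laying a foundation for studying the structure and biological function of whole-transcriptome RNA molecules in cells.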
Description
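The present invention relates to the field of molecular biotechnology and specifically provides a method for probing whole-transcriptome RNA structure, and use thereof. The invention can probe the secondary structure of all RNA molecules in a cell, particularly RNAs shorter than 200 nt.
Whole-transcriptome RNA structuromics combines chemical probing with next-generation sequencing to study RNA structure. Chemical reagents widely used for in vivo RNA structure probing include dimethyl sulfate (DMS), 1-methyl-7-nitroisatoic anhydride (1M7), 2-methylnicotinic acid imidazolide-azide (NAI-N3), and kethoxal. In vivo, DMS modifies the N1 position of single-stranded adenines and the N3 position of single-stranded cytosines, whereas NAI-N3 acylates the free 2'-hydroxyl of all four single-stranded nucleotides. The icSHAPE technique exploits this structure-selective 2'-hydroxyl acylation by NAI-N3, combined with subsequent sequencing, to probe transcriptome RNA structure. icSHAPE has been used to reveal structural differences in RNAs associated with different biological processes, such as translation in living cells, RNA-protein interaction regions, and N6-methyladenosine modification sites.
DMS-seq and icSHAPE rely on the fact that chemically modified nucleotides cause reverse transcription to stop, and these stop signals are used to estimate the probability that a nucleotide is in a single-stranded conformation. However, the short reads produced by stops near the 3' end of a target are difficult to align, so structural information at the 3' end of the probed target is lost. The lost region may be an entire transcript under study or a fragment of one, such as a functional region of a long RNA. This defect severely limits the structural analysis of short targets, such as small RNAs (sRNAs, RNAs shorter than about 200 nt) or the binding sites of RNA-binding proteins (RBPs). DMS mutational profiling (DMS-MaPseq) and SHAPE-MaP overcome the loss of 3'-end structural information by measuring the mutation rate produced during reverse transcription at nucleotides modified by the chemical reagent, rather than stop signals. However, DMS-MaPseq provides only partial nucleotide coverage (it can probe only adenosine "A" and cytidine "C"), and current SHAPE-MaP reagents (e.g., NMIA, 1M7) have only moderate cell-membrane permeability, which limits their use for probing RNA structure in vivo.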
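Summary of the Invention
To address the above problems, we developed a method for probing whole-transcriptome RNA structure. In brief, we combined the ability of NAI-N3 to modify RNA 2'-hydroxyls structure-selectively inside cells with the advantages of reverse-transcription mutational profiling to develop a new structure-probing method, icSHAPE-MaP. To demonstrate its capability, we used icSHAPE-MaP to determine complete structural information for cellular sRNAs. In addition, we combined icSHAPE-MaP with RNA immunoprecipitation (RIP) to determine, on a global scale, the structural map of the substrates of the endoribonuclease Dicer.
Using icSHAPE-MaP together with tertiary structure modeling, we found that spatial distance is an important parameter in Dicer's processing of pre-miRNAs.
To solve the above problems in the prior art, the present invention provides a method for probing whole-transcriptome RNA structure. With this invention, the structural map of Dicer-bound substrate RNAs was successfully resolved, and the structural types and features of Dicer substrates were revealed.
The present invention provides a method for probing nucleic acid structure, and use thereof, comprising: 1) modifying the nucleic acid with a labeling reagent; 2) processing the nucleic acid; 3) sequencing the processed nucleic acid; 4) calculating structure scores from the sequencing results; and 5) predicting the nucleic acid structure.
The nucleic acid is RNA; further, the RNA is full-length RNA; further, the RNA is transcriptome RNA; further, the RNA is small RNA; further, the RNA may be miRNA, snoRNA, snRNA, tRNA, vault RNA, Y RNA, pre-miRNA, miscRNA, 5S rRNA, etc., or fragments of RNA transcripts, such as exons and introns of mRNAs or of lncRNAs.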
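In a specific embodiment, the present invention provides an RNA structure probing method, comprising: 1) modifying the nucleic acid with a labeling reagent; 2) processing the RNA; 3) sequencing the processed product; 4) calculating structure scores from the sequencing results; and 5) predicting the nucleic acid structure.
Further, the RNA structure probing method comprises one of the following steps a)-d):
a) the processing in step 2) is reverse transcription of the RNA to obtain cDNA;
b) the processed product in step 3) is cDNA, and the sequencing is deep sequencing of the cDNA;
c) calculating structure scores in step 4) comprises counting the mutation frequency at each nucleotide position and calculating mutation rates;
d) predicting the nucleic acid structure in step 5) comprises applying the RNA structure score profile obtained in step 4) to predict RNA secondary, tertiary, or other higher-order structure.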
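In a specific embodiment, the present invention provides a whole-transcriptome RNA structure probing method, comprising: 1) modifying the nucleic acid with a labeling reagent; 2) processing the RNA; 3) sequencing the processed product; 4) calculating structure scores from the sequencing results; and 5) predicting the nucleic acid structure.
Further, the whole-transcriptome RNA structure probing method comprises one of the following steps a)-d):
a) the processing in step 2) is reverse transcription of the RNA to obtain cDNA;
b) the processed product in step 3) is cDNA, and the sequencing is deep sequencing of the cDNA;
c) calculating structure scores in step 4) comprises counting the mutation frequency at each nucleotide position and calculating mutation rates;
d) predicting the nucleic acid structure in step 5) comprises applying the RNA structure score profile obtained in step 4) to predict RNA secondary, tertiary, or other higher-order structure.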
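Preferably, the secondary structure includes single-stranded RNA, paired double-stranded RNA, stem-loops or hairpins, bulges, junctions or multi-branch loops, internal loops, pseudoknots, kissing hairpins, etc. The tertiary structure is the complex structure formed when the nucleic acid chain folds further in three-dimensional space on the basis of the secondary structure. Other higher-order structures include the spatial conformation of RNA-protein complexes, etc.
In the whole-transcriptome RNA structure probing method provided by the present invention, the structure probing method may be DMS mutational profiling or SHAPE-MaP (mutational profiling).
Further, the labeling reagent is a chemical modification reagent. Preferably, the chemical modification reagent has high intracellular reactivity. High intracellular reactivity means that, inside cells and within a reasonable time, the reagent reacts selectively with RNA nucleotides that are structurally biased toward the single-stranded state and generates enough modification sites; examples are NAI, NAI-N3, DMS, and kethoxal. By contrast, 1M7 and NMIA are modification reagents with low intracellular reactivity.
Preferably, the labeling reagent is dimethyl sulfate (DMS), 2-methylnicotinic acid imidazolide-azide (NAI-N3), or kethoxal; more preferably, the labeling reagent is NAI-N3.
Further, the method can probe the structure of all types of RNA, in cells in vivo or in vitro.
Further, the RNA may be shorter than 200 nt.
In the whole-transcriptome RNA structure probing method provided by the present invention, step 1), modifying the nucleic acid with a labeling reagent, specifically comprises either incubating cells with the labeling reagent and then extracting RNA, or mixing RNA with the labeling reagent in vitro and then purifying the RNA with a kit.
Further, 5'- and 3'-end adapters are added to the chemically modified RNA before reverse transcription.
Further, the 5'-end adapter has the sequence 5'-rArCrArCrGrArCrGrCrUrCrUrUrCrCrGrArUrCrUrNrNrNrNrNrNrNrN-3' (SEQ ID No.1), and the 3'-end adapter has the sequence 5'-adenylated-AGATCGGAAGAGCACACGTCT-3' (SEQ ID No.2) SpacerC3.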
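Further, the reverse transcription primer has the sequence 5'-AGACGTGTGCTCTTCCGATCT-3' (SEQ ID No.3).
In the whole-transcriptome RNA structure probing method provided by the present invention, the cDNA obtained in step 2) is added to a PCR reaction for amplification, and the resulting PCR product is deep-sequenced.
Further, the PCR reaction contains P5 primer, P3 primer, 25× SYBR Green, and 2× Phusion High-Fidelity PCR Master Mix.
Further, the P5 primer has the sequence 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' (SEQ ID No.4), and the P3 primer has the sequence 5'-CAAGCAGAAGACGGCATACGAGAT[8-nt barcode]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3' (SEQ ID No.5), with the 8-nt barcode inserted between "...GAGAT" and "GTGAC...".
Further, the 8-nt barcode in the P3 primer sequence is used to distinguish sequencing libraries generated from different samples.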
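The adapter and primer sequences above are designed to fit together. The RT primer should be the reverse complement of the 3' adapter, the P5 primer should end with the fixed part of the 5' adapter, and the P3 primer should end with the RT primer sequence. The Python sketch below checks these relationships and inserts an 8-nt barcode into SEQ ID No.5. It is an illustrative consistency check only. The helper names (`parse_rna_oligo`, `build_p3`) are hypothetical, and the 5'-adenylation and SpacerC3 modifications are ignored.

```python
import re

# Sequences from SEQ ID No.1-5, written 5'->3' (chemical modifications omitted).
ADAPTER_5P_RNA = "rArCrArCrGrArCrGrCrUrCrUrUrCrCrGrArUrCrUrNrNrNrNrNrNrNrN"  # SEQ ID No.1
ADAPTER_3P = "AGATCGGAAGAGCACACGTCT"                                         # SEQ ID No.2
RT_PRIMER = "AGACGTGTGCTCTTCCGATCT"                                          # SEQ ID No.3
P5_PRIMER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"    # SEQ ID No.4
P3_LEFT = "CAAGCAGAAGACGGCATACGAGAT"              # SEQ ID No.5, before the barcode
P3_RIGHT = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"   # SEQ ID No.5, after the barcode

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return dna.translate(COMPLEMENT)[::-1]


def parse_rna_oligo(notation: str) -> str:
    """Convert 'rArC...' oligo notation into a plain RNA string 'AC...'."""
    return "".join(re.findall(r"r([ACGUN])", notation))


def build_p3(barcode: str) -> str:
    """Insert an 8-nt sample barcode between '...GAGAT' and 'GTGAC...'."""
    if len(barcode) != 8 or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be exactly 8 nt of A/C/G/T")
    return P3_LEFT + barcode + P3_RIGHT


adapter_5p = parse_rna_oligo(ADAPTER_5P_RNA)               # 20 fixed nt + 8 random N
adapter_5p_core = adapter_5p.rstrip("N").replace("U", "T")  # fixed part, as DNA

checks = {
    "RT primer = reverse complement of 3' adapter": RT_PRIMER == reverse_complement(ADAPTER_3P),
    "P5 primer ends with 5' adapter core": P5_PRIMER.endswith(adapter_5p_core),
    "P3 primer ends with RT primer": P3_RIGHT.endswith(RT_PRIMER),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Running it prints `True` for each check. `build_p3` rejects any barcode that is not exactly 8 nt of A/C/G/T.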
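Further, the PCR program is: Phase I: 98°C for 1 min; Phase II: 98°C for 15 s, 65°C for 30 s, 72°C for 45 s, with Phase II repeated for a number of cycles. The number of cycles is determined from the fluorescence shown by the qPCR instrument and is typically 13-15.
In the whole-transcriptome RNA structure probing method provided by the present invention, the sequencing coverage threshold may be 1000× or 500×, and is preferably 2000×.
Further, for step 4), calculating the scores comprises any one of the following steps:
a) preprocessing the sequencing data, including removing 3' adapters (preferably with Cutadapt), filtering out low-quality reads (preferably with Trimmomatic), and removing duplicate reads (preferably with Perl);
b) mapping the clean reads to reference sequences (preferably with STAR);
c) calculating icSHAPE-MaP structure scores (preferably with ShapeMapper2);
d) predicting RNA secondary structure (preferably with the RNAstructure package);
e) visualizing RNA secondary structure (preferably with VARNA v3-93).
Further, the RNA sequences are sRNA sequences or protein-bound RNAs.
Further, when calculating icSHAPE-MaP structure scores, the mutation rate includes all types of mutations, such as mismatches, insertions, deletions, and other complex mutations.
Further, the mutation rate of each nucleotide is calculated with shape_mutation_counter.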
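Further, the icSHAPE-MaP structure score of base i is calculated as
$$S_i = \frac{r_{\mathrm{nai},i} - r_{\mathrm{dmso},i}}{f}$$
where r is the mutation rate, nai denotes the labeling-reagent sample group, dmso denotes the DMSO sample group, and f is the normalization factor.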
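The present invention further provides a method for probing the structure of specific RNAs, in which the above method is combined with RNA immunoprecipitation.
Further, the specific RNA is protein-bound RNA, such as Dicer-bound substrate RNA.
The present invention further provides a kit for probing whole-transcriptome RNA structure, characterized in that the kit comprises the chemical modification reagent and the nucleotide sequences described in any of the above methods for probing whole-transcriptome RNA structure.
The beneficial effects of the present invention are as follows:
The present invention proposes a new biotechnology, "icSHAPE-MaP", which probes the in vivo secondary structure of intact RNAs. It uses reverse-transcriptase mutational profiling to detect modifications induced by labeling reagents with high intracellular reactivity, such as NAI-N3. Importantly, the method allows structural analysis of small RNA species, either full-length sRNAs or fragments of long RNAs (e.g., RBP binding sites). The invention also demonstrates the use of icSHAPE-MaP to reveal the structural map of sRNAs that are Dicer substrates. In the future, icSHAPE-MaP could be used to reveal the structural features of other RBP-bound RNAs.
The above summarizes only some aspects of the present invention and does not limit the invention in any respect. Unless otherwise specified, the practice of the invention uses conventional techniques of cell biology, cell culture, molecular biology, immunology, etc. These techniques are explained in detail in the literature, for example: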
1. Reuter, J.S., and Mathews, D.H. (2010). RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics 11, 129.
2. Das, R., Karanicolas, J., and Baker, D. (2010). Atomic accuracy in predicting and designing noncanonical RNA structure. Nat Methods 7, 291-294.
3. Zubradt, M., Gupta, P., Persad, S., et al. (2017). DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat Methods 14, 75-82. https://doi.org/10.1038/nmeth.4057
4. Siegfried, N., Busan, S., Rice, G., et al. (2014). RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods 11, 959-965. https://doi.org/10.1038/nmeth.3029
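To describe the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art could derive other drawings from them without creative effort.
Figure 1 shows the proportion of reads aligned to the reference genome that carry mutations in different groups. In the control DMSO group, this proportion was significantly lower than in the in vivo and in vitro NAI-N3-modified groups, indicating that the increased mutation rate was indeed caused by NAI-N3 modification;
Figure 2 is a bar chart of the mutation rates of the four bases in samples from the control DMSO group, the in vivo NAI-N3-modified group and the in vitro NAI-N3-modified group, showing that NAI-N3 modifies all four bases;
Figure 3 shows the mutation rates of eight mutation types in human 5S rRNA, indicating the contribution of each mutation type to the overall mutation rate;
Figure 4 shows the mutation rate at each nucleotide position of human 5S rRNA in the control DMSO group and in the in vivo NAI-N3-modified group;
Figure 5 shows the known secondary structure model of 5S rRNA and the icSHAPE-MaP structure score at each nucleotide position;
Figure 6 shows examples of a snoRNA and a tRNA with icSHAPE-MaP structure scores and their secondary structure models;
Figure 7 shows cumulative distribution curves of the Pearson correlation coefficients between two replicates for the mutation rates of the NAI-N3-modified group in the same regions;
Figure 8 shows cumulative distribution curves of the Pearson correlation coefficients between two replicates for the mutation rates at A and C bases in DMS-modified (top) or NAI-N3-modified (bottom) samples;
Figure 9 is a doughnut chart of the different types of Dicer substrates;
Figure 10 is a violin plot of the length distribution of Dicer-bound fragments;
Figure 11 shows the read coverage of two fragments in GDI1 and DICER1 enriched by RNA immunoprecipitation;
Figure 12 is a doughnut chart of the different types of RNAs with icSHAPE-MaP structure scores;
Figure 13 is a heat map of the Pearson correlation coefficients between two replicates for nucleotide mutation rates in DMSO or NAI-N3 samples;
Figure 14 is a doughnut chart of the different types of Dicer-enriched RNAs with icSHAPE-MaP structure scores;
Figure 15 is a bar chart of the relative expression levels of the 150 most highly expressed pre-miRNAs in HEK293T cells; bar height represents the relative expression level of each pre-miRNA;
Figure 16 is a violin plot of the areas under the curve (AUCs) obtained by comparing 209 tRNA secondary structures from the GtRNAdb database with the corresponding icSHAPE-MaP structure scores;
Figure 17 shows the secondary structure model of hsa-miR-125a built with RNAstructure using the structure scores as constraints; the color of each nucleotide indicates its icSHAPE-MaP structure score;
Figure 18 shows the secondary structure model of hsa-miR-125a from the miRBase database; as in Figure 17, the color of each nucleotide indicates its icSHAPE-MaP structure score;
Figure 19 shows the secondary structure models of the hsa-miR-19a and hsa-miR-27b pre-miRNAs and the corresponding icSHAPE-MaP structure score of each nucleotide: (top) modeled with RNAstructure using the icSHAPE-MaP structure scores as constraints; (bottom) models from the miRBase database;
Figure 20 is a violin plot of the pseudo-free energies of 108 pre-miRNA structures, either from the miRBase database or predicted by RNAstructure.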
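The invention is described in further detail below by way of examples. However, the scope of the above subject matter of the invention should not be understood as limited to these examples. Various substitutions or changes made according to common technical knowledge and customary means in the art, without departing from the above technical concept of the invention, fall within the scope of the invention.
Unless otherwise specified, the experimental methods used in the following examples are conventional methods, and the materials, reagents, etc. used are commercially available.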
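Example 1. Cell culture and transfection
The HEK293T cell line was purchased from ATCC. The Dicer KO HEK293T cell line (NoDice 2-20) was a gift from Dr. Bryan R. Cullen, Duke University. Cells were cultured at 37°C in a humidified incubator with 5% CO2 in high-glucose DMEM containing L-glutamine and sodium pyruvate (Thermo Scientific HyClone), supplemented with 10% fetal bovine serum. All cell transfection experiments were performed with polyethyleneimine (PEI) (Sigma-Aldrich).
Example 2. Chemical modification of RNA
HEK293T cells were scraped from the dish and washed with PBS. The cells were resuspended in 100 mM NAI-N3 and incubated at 37°C for 5 min in a thermostatic mixer. The reaction was stopped by centrifugation at 2500 g for 1 min at 4°C, and the supernatant was removed. The cells were collected and resuspended in 250 μl PBS, and 750 μl TRIzol LS reagent was added for RNA extraction according to the manufacturer's instructions. The resulting RNA, or RNA prepared in vitro, was run on a 6% denaturing urea-PAGE gel for size selection (25-200 nt). The gel slice containing RNA of the desired length was crushed, placed in buffer (500 mM NaCl, 1 mM EDTA pH 8.0, 10 mM Tris-HCl pH 8.0), and incubated overnight at 4°C with rotation. The solution containing the eluted RNA was passed through a 0.45 μm Spin-X column (Thermo Fisher) by centrifugation, and RNA in the desired size range (25-200 nt) was purified and concentrated with an RNA concentration kit (Zymo).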
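Example 3. RNA immunoprecipitation
A plasmid expressing cleavage-dead human Dicer (carrying two mutations, D1320A and D1709A, in its RNase III domains; Addgene) was transfected into NoDice 2-20 cells. On day 1, 9 × 10^6 cells were seeded in a 15 cm plate.
After 24 hours, the cells were transfected with 20 μg of plasmid and 60 μl of PEI (1 μg/μl). The plasmid and the PEI were each first mixed with 1 mL of Opti-MEM I Reduced Serum Medium (Gibco) and incubated. The two mixtures were then combined, left at room temperature for 15 minutes, and added to the cells. After 48 hours, the cells were lysed with lysis buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% Triton X-100, 1 mM EDTA, supplemented with protease inhibitor cocktail (Roche) and the RNase inhibitor RiboLock (40 U/mL, Thermo Fisher)). The lysate was centrifuged at 15,000 g for 10 minutes at 4°C to remove insoluble cell debris. The supernatant was incubated with anti-FLAG M2 magnetic beads (Sigma) at room temperature for 3 hours.
After incubation, the beads were washed once with high-salt wash buffer (50 mM Tris-HCl pH 7.4, 1 M NaCl, 1% Triton X-100, protease inhibitor cocktail (Roche), RiboLock (Thermo Fisher, 40 U/mL)) and twice with low-salt wash buffer (50 mM Tris-HCl pH 7.4, 150 mM NaCl, 5 mM EDTA, protease inhibitor cocktail (Roche), RiboLock (Thermo Fisher, 40 U/mL)). After the final wash, for the NAI-N3-modified group, the beads were incubated with modification buffer (333 mM HEPES, 20 mM MgCl2, 150 mM NaCl, 50 mM NAI-N3) on a thermostatic mixer at 37°C and 1000 rpm for 12 minutes. For the control DMSO group, the NAI-N3 in the modification buffer was replaced with DMSO. Finally, RNA was extracted with TRIzol LS.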
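Example 4. Library construction
The RNA obtained in Example 2 or Example 3 (8 μL) was mixed with 3' ligation reaction mix (6 μL PEG8000, 1 μL 3' adapter (10 μM), 1 μL DTT (100 mM), 2 μL 10× ligation buffer, 1 μL T4 RNA Ligase KQ (NEB), 1 μL RiboLock) and incubated at 25°C for 2 hours to ligate the 3' adapter. The enzyme was then inactivated at 65°C for 20 minutes;
1.2 μL of reverse transcription primer (10 μM) was added to the mixture and annealed at 75°C for 5 min, 37°C for 15 min and 25°C for 15 min, so that the RT primer pairs with the 3' adapter and neutralizes excess 3' adapter;
5' ligation reaction mix (3 μL PEG8000, 3 μL 10 mM ATP, 1 μL 10× ligation buffer, 0.5 μL RiboLock, 0.5 μL 5' adapter (20 μM), 1 μL T4 RNA Ligase I (NEB)) was then added to the mixture and incubated at 25°C for 2 hours;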
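The amounts in these ligation and annealing steps can be checked with simple arithmetic: 1 μL of a 1 μM stock contains 1 pmol. The Python sketch below only restates the numbers given in this example and assumes the listed volumes are additive. On those assumptions, the 12 pmol of RT primer is a 1.2-fold molar excess over the 10 pmol of 3' adapter it is meant to neutralize, and the 5' adapter is also supplied at 10 pmol. The reaction grows from 20 μL to about 30.2 μL, with the ligation buffer staying at roughly 1×.

```python
# Volumes (uL) from Example 4; stock concentrations (uM) as stated there.
RNA_INPUT_UL = 8.0
LIGATION_3P_UL = {"PEG8000": 6.0, "3' adapter (10 uM)": 1.0, "DTT (100 mM)": 1.0,
                  "10x ligation buffer": 2.0, "T4 RNA Ligase KQ": 1.0, "RiboLock": 1.0}
RT_PRIMER_UL = 1.2
LIGATION_5P_UL = {"PEG8000": 3.0, "ATP (10 mM)": 3.0, "10x ligation buffer": 1.0,
                  "RiboLock": 0.5, "5' adapter (20 uM)": 0.5, "T4 RNA Ligase I": 1.0}


def pmol(volume_ul: float, conc_um: float) -> float:
    """Amount in pmol: 1 uL of a 1 uM stock contains 1 pmol."""
    return volume_ul * conc_um


adapter_3p_pmol = pmol(LIGATION_3P_UL["3' adapter (10 uM)"], 10.0)
rt_primer_pmol = pmol(RT_PRIMER_UL, 10.0)
adapter_5p_pmol = pmol(LIGATION_5P_UL["5' adapter (20 uM)"], 20.0)

# Running reaction volume, assuming all listed volumes are simply additive.
volume_after_3p = RNA_INPUT_UL + sum(LIGATION_3P_UL.values())
volume_after_anneal = volume_after_3p + RT_PRIMER_UL
volume_after_5p = volume_after_anneal + sum(LIGATION_5P_UL.values())

# Effective ligation-buffer strength (in x) during each ligation step.
buffer_x_3p = 10 * LIGATION_3P_UL["10x ligation buffer"] / volume_after_3p
buffer_x_5p = 10 * (LIGATION_3P_UL["10x ligation buffer"]
                    + LIGATION_5P_UL["10x ligation buffer"]) / volume_after_5p

print(f"3' adapter {adapter_3p_pmol:.1f} pmol, RT primer {rt_primer_pmol:.1f} pmol "
      f"({rt_primer_pmol / adapter_3p_pmol:.1f}x), 5' adapter {adapter_5p_pmol:.1f} pmol")
print(f"volume: {volume_after_3p:.1f} -> {volume_after_anneal:.1f} -> {volume_after_5p:.1f} uL")
print(f"ligation buffer: {buffer_x_3p:.2f}x (3' step), {buffer_x_5p:.2f}x (5' step)")
```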
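The reaction mixture was purified with an RNA concentration kit (Zymo) to obtain RNA ligated to both the 5' and 3' adapters;
9 μL of mutagenic reverse transcription buffer (50 mM Tris-HCl pH 8.0, 500 μM dNTPs, 75 mM KCl, 10 mM DTT, 6 mM MnCl2, 1 μL RiboLock) was added to 10 μL of purified RNA, and the mixture was incubated at 42°C for 2 minutes;
1 μL of SuperScript II (Thermo Fisher) was then added, and reverse transcription was carried out at 42°C for 3 hours;
The resulting cDNA was purified with a DNA concentration kit (Zymo);
A PCR reaction was set up with 20 μL of eluted cDNA and PCR reaction mix (0.5 μL P5 primer (20 μM), 0.5 μL P3 index primer (20 μM), 0.4 μL 25× SYBR Green, 20 μL 2× Phusion High-Fidelity PCR Master Mix (NEB));
PCR was performed in a qPCR instrument (Agilent, Mx3000P) to monitor amplification, with the following program: Phase I: 98°C for 1 min; Phase II: 98°C for 15 s, 65°C for 30 s, 72°C for 45 s, repeated for a number of cycles. The cycle number was determined from the fluorescence readout of the qPCR instrument and was typically 13-15;
The PCR product was purified with a DNA concentration kit (Zymo) and further size-selected (150-330 nt) on a 6% native PAGE gel to remove excess PCR primers. The final PCR product was purified from the gel as described above, yielding the final library. Libraries were sequenced on the Illumina HiSeq X Ten platform (paired-end, 150 cycles).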
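The final amplicon carries the full P5 primer, the 8 random nucleotides (NNNNNNNN) from the 5' adapter, and the full P3 primer, whose 3' end pairs with the 3' adapter. The 150-330 nt size window can therefore be translated into an approximate RNA insert length. The sketch below assumes the amplicon layout "P5 primer | 8 random nt | insert | reverse complement of P3 primer", which follows from the sequences in SEQ ID No.1-5. PAGE sizing of double-stranded DNA is itself approximate.

```python
# Primer sequences from SEQ ID No.4 and No.5 (the 8-nt barcode written as N).
P5_PRIMER = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P3_PRIMER = "CAAGCAGAAGACGGCATACGAGAT" + "NNNNNNNN" + "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
RANDOM_5P_NT = 8  # the NNNNNNNN at the 3' end of the 5' adapter (SEQ ID No.1)

# Assumed layout: P5 primer | 8 random nt | RNA insert | reverse complement of P3 primer.
# The 3' adapter is already contained in the last 21 nt of the P3 primer.
OVERHEAD = len(P5_PRIMER) + RANDOM_5P_NT + len(P3_PRIMER)


def insert_range(library_min: int, library_max: int) -> tuple:
    """Approximate RNA insert lengths covered by a library size window."""
    return (library_min - OVERHEAD, library_max - OVERHEAD)


print(OVERHEAD, insert_range(150, 330))
```

This prints `132 (18, 198)`. The resulting insert window of roughly 18-198 nt is broadly in line with the 25-200 nt RNA size selection in Example 2.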
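Example 5. Calculation of icSHAPE-MaP structure scores
Preprocessing: adapters were removed with cutadapt (v1.16), low-quality reads were filtered out with Trimmomatic (v0.33), and duplicate sequencing reads were removed with a custom Perl script;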
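The custom Perl script used to remove duplicate reads is not described. A common approach, assumed here, is to collapse reads with identical sequences. Because the 5' adapter ends in eight random nucleotides (NNNNNNNN), reads that share both the random sequence and the insert are likely PCR duplicates. The Python sketch below keeps the first occurrence of each sequence in a single-end FASTQ stream; for the paired-end data generated here, one would key on both mate sequences. All function names are hypothetical.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+', quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records, ignoring blank lines."""
    clean = [line.rstrip("\n") for line in lines if line.strip()]
    if len(clean) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(clean), 4):
        yield tuple(clean[i:i + 4])


def collapse_duplicates(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only the first read for each distinct sequence (random 8 nt included)."""
    seen = set()
    for record in records:
        sequence = record[1]
        if sequence in seen:
            continue
        seen.add(sequence)
        yield record


def write_fastq(records: Iterable[Record], out: TextIO) -> None:
    for record in records:
        out.write("\n".join(record) + "\n")


# Usage: r2 duplicates r1 and is dropped.
demo = io.StringIO("@r1\nACGTAC\n+\nIIIIII\n@r2\nACGTAC\n+\nIIIIII\n@r3\nTTGCAA\n+\nIIIIII\n")
out = io.StringIO()
write_fastq(collapse_duplicates(read_fastq(demo)), out)
print(out.getvalue(), end="")
```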
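Alignment: human sRNA sequences shorter than about 200 nt were collected, such as miRNA (from miRBase v22), snoRNA (from GENCODE v26), snRNA (from GENCODE v26), tRNA (from GtRNAdb v2.0), vault RNA (from RefSeq v109), Y RNA (from RefSeq v109), and 5S rRNA. The reads processed as above were aligned to the collected human sRNA sequences with STAR (v2.7.1a), using the parameters --outFilterMismatchNmax 3, --outFilterMultimapNmax 10, --alignEndsType Local, --scoreGap -1000 and --outSAMmultNmax 1. To find other sRNA fragments that are poorly annotated in the human genome, unaligned reads were aligned to the human genome (GRCh38.p12) and the above analysis was repeated. Both in vivo and in vitro, the proportion of aligned reads carrying mutations was significantly higher in the NAI-N3-modified libraries than in the control DMSO libraries, indicating that NAI-N3 indeed induces mutations during reverse transcription (Figure 1). The increase in mutation rate was more pronounced at A and U bases. This is consistent with the earlier observation that single-stranded regions are enriched in A/U relative to G/C (Figure 2).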
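The STAR parameters above map onto command-line options as shown in the sketch below. Only the five alignment options come from this example. The genome index path, FASTQ names, thread count and the `--outReadsUnmapped Fastx` option are assumptions. That last option is included so unaligned reads are written out and can be realigned to GRCh38.p12, as described above. The helper only builds the argument list; it could be passed to `subprocess.run(cmd, check=True)` on a machine with STAR installed.

```python
import shlex
from typing import List


def star_command(genome_dir: str, fastqs: List[str], out_prefix: str,
                 threads: int = 8) -> List[str]:
    """Build a STAR command line with the alignment parameters from Example 5."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,          # hypothetical index of the sRNA sequences
        "--readFilesIn", *fastqs,
        "--outFileNamePrefix", out_prefix,
        "--outReadsUnmapped", "Fastx",      # assumption: keep unaligned reads for the genome pass
        # Parameters listed in Example 5:
        "--outFilterMismatchNmax", "3",
        "--outFilterMultimapNmax", "10",
        "--alignEndsType", "Local",
        "--scoreGap", "-1000",
        "--outSAMmultNmax", "1",
    ]


cmd = star_command("index/human_sRNA",
                   ["sample_R1.clean.fastq", "sample_R2.clean.fastq"],
                   "sample_sRNA.")
print(shlex.join(cmd))
```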
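Calculation of icSHAPE-MaP structure scores: data from replicate samples were merged with the samtools merge command. ShapeMapper2 (v2.1.4) was used to compute the final structure scores as follows:
a. mutations on each read were parsed with shapemapper_mutation_parser. This step counts 8 mutation types: mismatches, insertions, deletions, multinucleotide mismatches, multinucleotide insertions, multinucleotide deletions, complex insertions and complex deletions;
b. mutation frequencies at each nucleotide were counted with shapemapper_mutation_counter;
c. icSHAPE-MaP structure scores were calculated with make_reactivity_profiles.py;
d. the raw structure scores were normalized with normalize_profiles.py.
The calculation for each base can be summarized by the following formula:
$$S_i = \frac{r_{\mathrm{nai},i} - r_{\mathrm{dmso},i}}{f}$$
that is, the icSHAPE-MaP structure score of base i is the difference between the mutation rates of base i in the NAI-N3-modified sample and in the control DMSO sample, divided by the normalization factor f.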
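The score formula can be sketched in Python as follows. Each mutation rate is the mutation count (all mutation types) divided by read depth at that position, and positions below a depth threshold are masked. Here the normalization factor f is computed with a box-plot style rule: values above Q3 + 1.5×IQR are discarded, and the top 10% of the remaining values are averaged. This rule is an assumption for illustration. The exact rules in ShapeMapper2's normalize_profiles.py may differ, for example for short RNAs. The 1000× default is one of the coverage thresholds mentioned in this document.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def mutation_rates(mut_counts: Sequence[int], depths: Sequence[int],
                   min_depth: int = 1000) -> List[float]:
    """Per-nucleotide mutation rate; NaN where coverage is below min_depth."""
    return [m / d if d >= min_depth else math.nan for m, d in zip(mut_counts, depths)]


def normalization_factor(raw: Sequence[float]) -> float:
    """Box-plot style factor (assumed): drop outliers, then average the top 10%."""
    values = sorted(v for v in raw if not math.isnan(v))
    if len(values) < 2:
        raise ValueError("need at least two scored positions")
    q1, _, q3 = statistics.quantiles(values, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    kept = [v for v in values if v <= cutoff]             # still sorted ascending
    top = kept[-max(1, math.ceil(0.1 * len(kept))):]
    f = statistics.mean(top)
    if f <= 0:
        raise ValueError("normalization factor must be positive")
    return f


def structure_scores(nai_counts, nai_depths, dmso_counts, dmso_depths,
                     min_depth: int = 1000) -> Tuple[List[float], float]:
    """S_i = (r_nai,i - r_dmso,i) / f; NaN propagates for low-coverage positions."""
    r_nai = mutation_rates(nai_counts, nai_depths, min_depth)
    r_dmso = mutation_rates(dmso_counts, dmso_depths, min_depth)
    raw = [a - b for a, b in zip(r_nai, r_dmso)]
    f = normalization_factor(raw)
    return [s / f for s in raw], f


# Toy example: ten graded positions, one outlier and one low-coverage position.
nai = [20 * i for i in range(10)] + [2000, 5]
depth = [2000] * 11 + [100]
scores, f = structure_scores(nai, depth, [0] * 12, depth)
print(round(f, 4), [round(s, 2) for s in scores])
```

The outlier (rate 1.0) is excluded from f. It still receives a normalized score well above 1, while the position with 100× coverage is reported as NaN.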
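NAI-N3 modification can cause various types of mutations, including mismatches, insertions, deletions, and other complex mutations (Figure 3).
Correlation of mutation rates between replicates: the total read counts of the two replicates were balanced by downsampling, and all bases were sorted by coverage. Bases with coverage greater than 500, 1000, 2000, 3000, 4000, or 5000 were selected, and the replicate correlation of mutation rates was computed over sliding windows (window size: 50 nt; step: 10 nt). Finally, cumulative distribution curves were generated from the correlations obtained at each threshold.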
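The replicate analysis can be sketched as follows. For each coverage threshold, a 50-nt window slides in 10-nt steps, the Pearson correlation between the two replicates' mutation rates is computed in each window, and the correlations are summarized as an empirical cumulative distribution. This sketch assumes the read counts have already been balanced by downsampling. It keeps a window only if every position passes the threshold in both replicates; the source does not state how partially covered windows were handled. The `fraction_above` helper mirrors the statistic reported for Figure 7 (share of windows with r > 0.96).

```python
import math
from typing import Dict, List, Sequence, Tuple

THRESHOLDS = (500, 1000, 2000, 3000, 4000, 5000)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined for a constant window
    return sxy / math.sqrt(sxx * syy)


def window_correlations(rates1, rates2, cov1, cov2, min_cov: int,
                        size: int = 50, step: int = 10) -> List[float]:
    """Pearson r per sliding window whose positions all exceed min_cov in both replicates."""
    corrs = []
    for start in range(0, len(rates1) - size + 1, step):
        window = range(start, start + size)
        if all(cov1[i] > min_cov and cov2[i] > min_cov for i in window):
            r = pearson([rates1[i] for i in window], [rates2[i] for i in window])
            if not math.isnan(r):
                corrs.append(r)
    return corrs


def ecdf(values: Sequence[float]) -> List[Tuple[float, float]]:
    """(value, cumulative fraction) pairs for plotting a cumulative distribution."""
    xs = sorted(values)
    return [(x, (k + 1) / len(xs)) for k, x in enumerate(xs)]


def fraction_above(values: Sequence[float], cutoff: float = 0.96) -> float:
    return sum(v > cutoff for v in values) / len(values) if values else math.nan


def cdf_by_threshold(rates1, rates2, cov1, cov2) -> Dict[int, List[Tuple[float, float]]]:
    return {t: ecdf(window_correlations(rates1, rates2, cov1, cov2, t)) for t in THRESHOLDS}


# Usage with identical toy replicates: every window correlates perfectly.
rates = [(i * 7 % 13) / 100 for i in range(100)]
cov = [6000] * 100
curves = cdf_by_threshold(rates, rates, cov, cov)
print({t: len(c) for t, c in curves.items()})
```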
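Computational prediction of RNA secondary structure with constraints: the Fold program in the RNAstructure package (v5.6) was used to predict RNA secondary structure. The icSHAPE-MaP structure scores were used as constraints, with the parameters -si -0.6 -sm 1.8 -SHAPE icSHAPE-Map.shape -mfe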
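The -sm 1.8 and -si -0.6 settings are the slope and intercept of the standard RNAstructure SHAPE pseudo-free-energy term, ΔG_SHAPE(i) = m·ln(S_i + 1) + b (kcal/mol), which is added for nucleotides in base-paired (stacked) positions. The sketch below evaluates this term and writes scores in a two-column SHAPE file (1-based position, score), using -999 for positions without data. Several details are assumptions to check against the RNAstructure v5.6 documentation: clipping negative scores to zero, the -999 placeholder, the sequence and output file names, and the long option `--SHAPE` in the printed command.

```python
import math
import shlex
from typing import Optional, Sequence

SLOPE = 1.8        # -sm, kcal/mol
INTERCEPT = -0.6   # -si, kcal/mol
MISSING = -999.0   # assumed "no data" placeholder in the .shape file


def shape_pseudo_energy(score: float, slope: float = SLOPE,
                        intercept: float = INTERCEPT) -> float:
    """m * ln(S + 1) + b; negative scores are clipped to 0 (assumption)."""
    s = max(score, 0.0)
    return slope * math.log(s + 1.0) + intercept


def shape_file_text(scores: Sequence[Optional[float]]) -> str:
    """Two whitespace-separated columns: 1-based position and score."""
    rows = []
    for pos, s in enumerate(scores, start=1):
        value = MISSING if s is None or math.isnan(s) else s
        rows.append(f"{pos}\t{value:.4f}")
    return "\n".join(rows) + "\n"


# Hypothetical file names; the .shape file would hold shape_file_text(...) output.
fold_cmd = ["Fold", "pre-miRNA.fa", "pre-miRNA.ct",
            "--SHAPE", "icSHAPE-Map.shape", "-si", "-0.6", "-sm", "1.8", "-mfe"]
print(shlex.join(fold_cmd))
print(round(shape_pseudo_energy(0.0), 2), round(shape_pseudo_energy(1.0), 2))
```

With these settings, an unreactive nucleotide (S = 0) gains a -0.6 kcal/mol bonus for pairing. A highly reactive nucleotide pays a positive penalty, which pushes the folding algorithm to leave it unpaired.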
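RNA secondary structure visualization: RNA secondary structures were visualized with the VARNA v3-93 command line, using the parameters "-basesStyle1 on" and "-applyBasesStyle1 on" to color the bases.
Using icSHAPE-MaP, in vivo structure scores were obtained for 186 transcripts and in vitro structure scores for 250 transcripts (Figures 4-5). For 5S rRNA, the structure scores agreed closely with the known secondary structure model (AUC = 0.825; the closer to 1, the better the agreement), demonstrating the accuracy of icSHAPE-MaP (see Figure 5).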
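The AUC compares structure scores with a reference secondary structure. Nucleotides that are unpaired in the reference are treated as positives and paired nucleotides as negatives. An AUC of 1 therefore means every unpaired nucleotide scores higher than every paired one, and 0.5 is chance level. The sketch below computes the AUC as the Mann-Whitney probability (ties count one half) from a dot-bracket string. The dot-bracket input and the function names are assumptions, since the source does not describe how the reference models were encoded.

```python
import math
from typing import List, Sequence


def unpaired_mask(dot_bracket: str) -> List[bool]:
    """True where the reference structure has '.', i.e. an unpaired nucleotide."""
    return [c == "." for c in dot_bracket]


def structure_auc(scores: Sequence[float], dot_bracket: str) -> float:
    """ROC AUC of scores for separating unpaired (positive) from paired nucleotides."""
    if len(scores) != len(dot_bracket):
        raise ValueError("scores and structure must have the same length")
    unpaired = unpaired_mask(dot_bracket)
    pos = [s for s, u in zip(scores, unpaired) if u and not math.isnan(s)]
    neg = [s for s, u in zip(scores, unpaired) if not u and not math.isnan(s)]
    if not pos or not neg:
        return math.nan
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy hairpin: the two loop nucleotides are the most reactive, so AUC = 1.0.
print(structure_auc([0.1, 0.2, 0.9, 0.8, 0.0, 0.1], "((..))"))
```

The pairwise loop is quadratic in length. This is fine for sRNAs of fewer than 200 nt, but a rank-based formula would be preferable for long transcripts.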
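icSHAPE-MaP also yielded accurate structure scores for other sRNAs with known secondary or tertiary structure models. These include the 3' fragment of RNU7 (a small nuclear RNA, snRNA; AUC = 0.994) and Gln-TTG-2-1 (a tRNA; AUC = 0.818) (Figure 6).
A sequencing coverage threshold of 2000× gave structure scores of very high quality and reproducibility. Sequencing cost must be weighed against data quality and reproducibility. At a coverage threshold of 500×, more than 80% of fragments had a Pearson correlation coefficient of mutation rates above 0.96, indicating good experimental reproducibility. Coverage of 1000× or even 500× is therefore also a reasonable threshold (Figure 7). Importantly, at the same coverage threshold, the NAI-N3-modified group (icSHAPE-MaP) was more reproducible than the DMS-modified group (DMS-MaPseq). In terms of sequencing cost, icSHAPE-MaP thus requires much lower sequencing coverage than DMS-MaPseq for the same reproducibility (Figure 8).
Example 6. Application of icSHAPE-MaP to the analysis of Dicer substrates
Dicer belongs to the RNase III family and cleaves double-stranded RNA (dsRNA) and precursor microRNA (pre-miRNA) hairpins into mature small interfering RNA (siRNA) and microRNA (miRNA), respectively. How precisely Dicer determines the cleavage sites of its substrates is critical for RNA interference (RNAi) and miRNA biogenesis. Previous studies have shown that Dicer uses different measurements to determine its cleavage site. 1) It counts a certain number of nucleotides from the 3' overhang of dsRNA substrates (3' counting rule); 2) or from the 5'-terminal phosphate of pre-miRNAs and dsRNAs (5' counting rule). 3) In addition, in vivo studies of short hairpin RNAs (shRNAs) and pre-miRNAs have shown that Dicer uses a single-stranded region (a bulge or the terminal loop) as an anchor and cleaves precisely 2 nt downstream of it (loop counting rule). However, it remains unclear when and to what extent these mechanisms apply to pre-miRNA processing. Moreover, Dicer also binds many substrates from which no corresponding miRNA or siRNA is produced, suggesting that it has other roles in RNA metabolism. Whether and how Dicer distinguishes cleavable from non-cleavable substrates is still unknown.
Using the methods of Examples 1-5 to analyze Dicer substrates, 1,595 Dicer-enriched RNAs were detected in the unmodified DMSO libraries (Figure 9). The list of enriched RNAs was highly similar to those from other enrichment strategies, with more than 50% of pre-miRNAs shared between them (Table 1). Besides pre-miRNAs, we identified other cellular transcripts with a median length of about 70 nt (Figure 10). These included snoRNAs, tRNAs, intronic and exonic fragments of messenger RNAs (mRNAs), and transcripts from intergenic regions, so most Dicer-bound fragments are about 60-70 nt long, as expected. The read coverage profiles of these intronic and exonic fragments showed very sharp boundaries, suggesting that they are functional products processed from the mRNAs at these loci (Figure 11).
Table 1: Comparison of the Dicer binding sites found in the present invention with those found by PAR-CLIP
Using RIP-icSHAPE-MaP, we obtained structural information for 820 well-covered RNAs (>1000× sequencing coverage) (Figure 12). Mutation profiles were highly correlated between independent biological replicates, indicating good reproducibility of the method (Figure 13). Based on our RNA-seq data, we used RIP enrichment scores to classify 439 of these RNAs as Dicer targets, including 122 pre-miRNAs. These cover nearly all of the most highly expressed pre-miRNAs in HEK293T cells (Figures 14-15). As a reference for structure modeling, we compared the icSHAPE-MaP structure scores of the tRNAs in our dataset with their structure models published in GtRNAdb and calculated AUCs. Most AUCs were well above 0.5, with a median above 0.7. The structure scores of most tRNAs therefore agree well with the secondary structure models from GtRNAdb (Figure 16), showing good consistency between our structure probing and existing covariation-based tRNA structure models.
Using the present invention, we obtained a structure-score-constrained model of pre-miR-125a, which contains a 12-nt terminal loop (G25-G36) (Figure 17). In contrast, the unconstrained model from miRBase (release 22.1) shows multiple bulges, internal loops and a smaller terminal loop (Figure 18). The constrained model of pre-miR-19a also has a 12-nt terminal loop, whereas its miRBase model has a smaller one. For pre-miR-27b, the constrained model has a 6-nt terminal loop with an adjacent bulge, whereas its miRBase model has a small 3-nt terminal loop with an adjacent large internal loop (Figure 19). Overall, the constrained models of pre-miRNAs also have lower free energies than the unconstrained models from miRBase (p = 1.65e-9). A lower pseudo-free energy indicates a more stable structure, so the models predicted by RNAstructure with icSHAPE-MaP structure scores as constraints are more stable (Figure 20). These results indicate that using icSHAPE-MaP structure scores as constraints allows accurate modeling of RNA secondary structure, providing a structural basis for studying how Dicer processes its substrates and what functions they serve.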
Claims (24)
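- A nucleic acid structure probing method, characterized in that the method comprises: 1) modifying the nucleic acid with a labeling reagent; 2) processing the nucleic acid; 3) sequencing the processed nucleic acid; 4) calculating structure scores from the sequencing results; and 5) predicting the nucleic acid structure.
- The nucleic acid structure probing method according to claim 1, wherein the nucleic acid is RNA; further, the RNA is full-length RNA; further, the RNA is transcriptome RNA; further, the RNA is small RNA; further, the RNA may be miRNA, snoRNA, snRNA, tRNA, vault RNA, Y RNA, miscRNA, pre-miRNA, 5S rRNA, etc., or fragments of RNA transcripts, such as exons and introns of mRNAs or of lncRNAs.
- The nucleic acid structure probing method according to claim 2, characterized in that the nucleic acid is RNA and the method comprises one of the following steps a)-d): a) the processing in step 2) is reverse transcription of the RNA to obtain cDNA; b) the processed product in step 3) is cDNA, and the sequencing is deep sequencing of the cDNA; c) calculating structure scores in step 4) comprises counting the mutation frequency at each nucleotide position and calculating mutation rates; d) predicting the nucleic acid structure in step 5) comprises applying the RNA structure score profile obtained in step 4) to predict RNA secondary, tertiary, or other higher-order structure.
- The nucleic acid structure probing method according to claim 3, characterized in that the nucleic acid is whole-transcriptome RNA.
- The nucleic acid structure probing method according to any one of claims 1-4, wherein the structure probing method may be DMS mutational profiling or SHAPE-MaP (mutational profiling).
- The nucleic acid structure probing method according to any one of claims 1-5, wherein the labeling reagent is a chemical modification reagent.
- The nucleic acid structure probing method according to any one of claims 2-6, characterized in that the method can probe the structure of all types of RNA, in cells in vivo or in vitro.
- The nucleic acid structure probing method according to any one of claims 2-7, characterized in that the RNA is shorter than 200 nt.
- The nucleic acid structure probing method according to any one of claims 1-8, characterized in that step 1), modifying the nucleic acid with a labeling reagent, specifically comprises either incubating cells with the labeling reagent and then extracting RNA, or mixing RNA with the labeling reagent in vitro and then purifying the RNA with a kit.
- The nucleic acid structure probing method according to any one of claims 3-9, characterized in that 5'- and 3'-end adapters are added to the chemically modified RNA before reverse transcription.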
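- The nucleic acid structure probing method according to claim 10, characterized in that the 5'-end adapter has the sequence 5'-rArCrArCrGrArCrGrCrUrCrUrUrCrCrGrArUrCrUrNrNrNrNrNrNrNrN-3' (SEQ ID No.1), and the 3'-end adapter has the sequence 5'-adenylated-AGATCGGAAGAGCACACGTCT-3' (SEQ ID No.2) SpacerC3.
- The nucleic acid structure probing method according to claim 10, characterized in that the reverse transcription primer has the sequence 5'-AGACGTGTGCTCTTCCGATCT-3' (SEQ ID No.3).
- The nucleic acid structure probing method according to any one of claims 3-12, characterized in that, for step 3), the cDNA obtained in step 2) is added to a PCR reaction for amplification, and the resulting PCR product is deep-sequenced.
- The nucleic acid structure probing method according to claim 13, characterized in that the PCR reaction contains P5 primer, P3 primer, 25× SYBR Green, and 2× Phusion High-Fidelity PCR Master Mix.
- The nucleic acid structure probing method according to claim 14, characterized in that the P5 primer has the sequence 5'-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3' (SEQ ID No.4), and the P3 primer has the sequence 5'-CAAGCAGAAGACGGCATACGAGAT[8-nt barcode]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3' (SEQ ID No.5).
- The nucleic acid structure probing method according to claim 15, characterized in that, in the P3 primer sequence, the 8-nt barcode is used to distinguish sequencing libraries generated from different samples.
- The nucleic acid structure probing method according to any one of claims 1-16, wherein, for step 4), calculating structure scores comprises any one of the following steps: a) preprocessing the sequencing data, including removing 3' adapters, filtering out low-quality reads, and then removing duplicate reads; b) mapping the clean reads to reference sequences; c) calculating icSHAPE-MaP structure scores; d) predicting RNA secondary structure; e) visualizing RNA secondary structure.
- The nucleic acid structure probing method according to claim 17, characterized in that the RNA includes, but is not limited to, small RNA (sRNA) or protein-bound RNA.
- The nucleic acid structure probing method according to any one of claims 17-18, characterized in that, when calculating icSHAPE-MaP structure scores, the mutation rate includes all types of mutations, such as mismatches, insertions, deletions, and other complex mutations.
- The nucleic acid structure probing method according to any one of claims 17-19, characterized in that the mutation rate of each nucleotide is calculated with shape_mutation_counter.
- The nucleic acid structure probing method according to any one of claims 1-21, characterized in that the method further comprises a step of RNA immunoprecipitation to obtain the RNA.
- The method according to claim 22, characterized in that the RNA is protein-bound RNA.
- A kit for probing whole-transcriptome RNA structure, characterized in that the kit comprises the chemical modification reagent and the nucleotide sequences described in the nucleic acid structure probing method of any one of claims 1-23.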
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/128949 WO2022099670A1 (zh) | 2020-11-16 | 2020-11-16 | 一种全转录组rna结构探测的方法及其应用 |
US18/260,442 US20240076735A1 (en) | 2020-11-16 | 2020-11-16 | Method for detecting whole transcriptome rna structure and use thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/128949 WO2022099670A1 (zh) | 2020-11-16 | 2020-11-16 | 一种全转录组rna结构探测的方法及其应用 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022099670A1 true WO2022099670A1 (zh) | 2022-05-19 |
Family
ID=81602073
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/128949 WO2022099670A1 (zh) | 2020-11-16 | 2020-11-16 | 一种全转录组rna结构探测的方法及其应用 |
Country Status (2)
Country | Link |
---|---|
US (1) | US20240076735A1 (zh) |
WO (1) | WO2022099670A1 (zh) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140154673A1 (en) * | 2012-08-14 | 2014-06-05 | The Board Of Trustees Of The Leland Stanford Junior University | Probes of rna structure and methods for using the same |
CN108384831A (zh) * | 2018-02-13 | 2018-08-10 | 塔里木大学 | 具有4至10个核苷酸单体的寡聚核苷酸的检测方法 |
CN111192631A (zh) * | 2020-01-02 | 2020-05-22 | 中国科学院计算技术研究所 | 用于构建用于预测蛋白质-rna相互作用结合位点模型的方法和系统 |
2020
- 2020-11-16: US18/260,442 (US20240076735A1), active, pending
- 2020-11-16: PCT/CN2020/128949 (WO2022099670A1), active application filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140154673A1 (en) * | 2012-08-14 | 2014-06-05 | The Board Of Trustees Of The Leland Stanford Junior University | Probes of rna structure and methods for using the same |
CN108384831A (zh) * | 2018-02-13 | 2018-08-10 | 塔里木大学 | 具有4至10个核苷酸单体的寡聚核苷酸的检测方法 |
CN111192631A (zh) * | 2020-01-02 | 2020-05-22 | 中国科学院计算技术研究所 | 用于构建用于预测蛋白质-rna相互作用结合位点模型的方法和系统 |
Non-Patent Citations (3)
Title |
---|
MATTHEW J SMOLA, GREGGORY M RICE, STEVEN BUSAN, NATHAN A SIEGFRIED, KEVIN M WEEKS: "Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis", NATURE PROTOCOLS, vol. 10, no. 11, 1 November 2015 (2015-11-01), GB , pages 1643 - 1669, XP055746052, ISSN: 1754-2189, DOI: 10.1038/nprot.2015.103 * |
SMOLA MATTHEW J, WEEKS KEVIN M: "In-cell RNA structure probing with SHAPE-MaP", NATURE PROTOCOLS, vol. 13, no. 6, 1 June 2018 (2018-06-01), GB , pages 1181 - 1195, XP055848487, ISSN: 1754-2189, DOI: 10.1038/nprot.2018.010 * |
ZHENG JUN: "Prediction of Secondary Structure of SAM-V Riboswitch by SHAPE Chemical Detection", CHINESE MASTER'S THESES FULL-TEXT DATABASE, 11 January 2017 (2017-01-11), pages 1 - 67, XP055930529 * |
Also Published As
Publication number | Publication date |
---|---|
US20240076735A1 (en) | 2024-03-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Werner et al. | Chromatin-enriched lncRNAs can act as cell-type specific activators of proximal gene transcription | |
Leppek et al. | Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics | |
Wang et al. | RNA structure probing uncovers RNA structure-dependent biological functions | |
Golden et al. | An Argonaute phosphorylation cycle promotes microRNA-mediated silencing | |
Jathar et al. | Technological developments in lncRNA biology | |
Chillón et al. | The molecular structure of long non-coding RNAs: emerging patterns and functional implications | |
Zhang et al. | Quantitative profiling of pseudouridylation landscape in the human transcriptome | |
Blythe et al. | The ins and outs of lncRNA structure: How, why and what comes next? | |
Watters et al. | Characterizing RNA structures in vitro and in vivo with selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) | |
Saletore et al. | The birth of the Epitranscriptome: deciphering the function of RNA modifications | |
Vainberg Slutskin et al. | Unraveling the determinants of microRNA mediated regulation using a massively parallel reporter assay | |
Waldron et al. | mRNA structural elements immediately upstream of the start codon dictate dependence upon eIF4A helicase activity | |
Georgieva et al. | Characterization of the uterine leiomyoma microRNAome by deep sequencing | |
Bottini et al. | Recent computational developments on CLIP-seq data analysis and microRNA targeting implications | |
Dassi et al. | Hyper conserved elements in vertebrate mRNA 3′-UTRs reveal a translational network of RNA-binding proteins controlled by HuR | |
Reon et al. | Biological processes discovered by high-throughput sequencing | |
Quarles et al. | Ensemble analysis of primary microRNA structure reveals an extensive capacity to deform near the Drosha cleavage site | |
Wang et al. | Translating extracellular microRNA into clinical biomarkers for drug-induced toxicity: from high-throughput profiling to validation | |
Sterling et al. | An efficient and sensitive method for preparing cDNA libraries from scarce biological samples | |
Solé et al. | The use of circRNAs as biomarkers of cancer | |
Sulaiman et al. | Prospective advances in circular RNA investigation | |
Martin et al. | Using SHAPE-MaP to probe small molecule-RNA interactions | |
Liu et al. | NAP-seq reveals multiple classes of structured noncoding RNAs with regulatory functions | |
Piao et al. | An ultra low-input method for global RNA structure probing uncovers Regnase-1-mediated regulation in macrophages | |
Fitzsimmons et al. | Rewiring of RNA methylation by the oncometabolite fumarate in renal cell carcinoma |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20961225; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 18260442; Country of ref document: US |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20961225; Country of ref document: EP; Kind code of ref document: A1 |