WO2022107814A1 - RNA probe for mutation profiling and use thereof - Google Patents
RNA probe for mutation profiling and use thereof
- Publication number
- WO2022107814A1 (application PCT/JP2021/042250)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- rna
- sequence
- barcode
- library
- probe
- Prior art date
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1034—Isolating an individual clone by screening libraries
- C12N15/1065—Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/10—Processes for the isolation, preparation or purification of DNA or RNA
- C12N15/1096—Processes for the isolation, preparation or purification of DNA or RNA cDNA Synthesis; Subtracted cDNA library construction, e.g. RT, RT-PCR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- The present invention relates to an RNA probe for mutation profiling, and more particularly to an RNA probe in which a structured barcode sequence is added to an RNA to be analyzed, and to a method for analyzing the higher-order structure of RNA using the probe.
- RNA is a biomolecule that functions as a template for protein synthesis. RNA itself also forms tightly folded higher-order structures and controls gene expression, the intracellular localization of transcripts, and splicing mechanisms. Most of these functional RNAs are defined by the sterically specific arrangement that the bases of the primary sequence adopt during structure formation.
- RNA higher-order structure is formed from a combination of various structural motifs such as the stem, stem-loop, kissing loop, multi-junction, kink-turn, pseudoknot, and quadruplex.
- Methods for analyzing such structures include mutation-based methods such as the SHAPE-MaP method (see Patent Document 1), which selectively modifies the 2'-position of the sugar of a nucleic acid, and the DMS-MaPseq method (see Non-Patent Document 1), which uses dimethyl sulfate (DMS).
- Mutational profiling (MaP) is used to estimate the secondary structure of RNA. The distribution of chemical modifications correlates with the secondary structure of the RNA and is recorded as position-specific reverse transcription termination, substitution, insertion, or deletion when the base sequence of the complementary DNA is determined.
- Mutation profiling can analyze a wider variety of RNAs simultaneously when integrated with next-generation sequencing. For example, in the DMS-MaPseq method and the SHAPE-MaP method, DNA fragments derived from RNA into which mutations have been introduced are mapped computationally to a reference genome. This operation sorts the sequences even when multiple RNA types are mixed, so that structure-specific mutations can be counted simultaneously for RNAs from multiple regions or from different molecules.
- In addition, a plurality of types of mutations in the same molecule can be counted by using a nanopore sequencer to detect directly the change in electrical potential caused by a modified species (see, for example, Non-Patent Document 2).
- However, when a mapping operation against the reference genome is used to sort the sequences, there is the drawback that the genomic position from which a read is derived cannot be determined when similar sequences are present.
- Examples of such similar sequences include gene families and allele-specific RNAs.
- Moreover, mutagenesis with RNA modification reagents increases the diversity of similar sequences and thus aggravates this problem.
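The ambiguity described above can be illustrated with a minimal sketch. The two reference sequences, the read sequences, and the helper names `hamming` and `best_references` are hypothetical and are not taken from the patent; real pipelines use an aligner rather than a plain mismatch count.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_references(read: str, references: dict[str, str]) -> list[str]:
    """Return the names of all references tied for the fewest mismatches."""
    distances = {name: hamming(read, seq) for name, seq in references.items()}
    best = min(distances.values())
    return sorted(name for name, d in distances.items() if d == best)


# Two hypothetical family members that differ only at index 4.
references = {
    "member_A": "ACGUACGUAC",
    "member_B": "ACGUCCGUAC",
}

unmodified_read = "ACGUACGUAC"
# A read from member_A carrying a modification-induced substitution at index 4.
mutated_read = "ACGUGCGUAC"

print(best_references(unmodified_read, references))  # ['member_A']
print(best_references(mutated_read, references))     # ['member_A', 'member_B']
```

Once the only distinguishing position is mutated, the read is equally close to both references and its origin can no longer be decided from the target sequence alone.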
- An object of the present invention is to improve the detection accuracy of introduced base mutations, insertions, deletions, and the like, without affecting the higher-order structure of the RNA to be analyzed, when mutation profiling is performed using an RNA library.
- The present invention has been made to solve this problem. For mutation profiling, a barcode sequence is added to each RNA contained in the RNA library; each barcode has a different, unique sequence and a structure that suppresses reaction with the chemical modification agent.
- The method for analyzing the higher-order structure of RNA includes (a) a step of preparing one or more RNA probes in which a barcode sequence is added to the RNA to be analyzed, (b) a step of contacting the RNA probes with an RNA modifier, and (c) a step of detecting the position and frequency of modified bases in the sequences of the RNA probes obtained in step (b).
- The barcode sequence is characterized by having a structure in which reaction with the RNA modifier is suppressed and by not forming a higher-order structure with the RNA to be analyzed.
- The detection step (c) preferably includes the following steps.
- (c1) A step of synthesizing complementary DNA with reverse transcriptase using the mixture of RNA probes obtained in step (b) as a template, (c2) a step of determining the base sequence of the complementary DNA and aligning the base sequences including the barcode sequence, and (c3) a step of detecting the position and frequency of mutations occurring in the aligned base sequences.
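Steps (c2) and (c3) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patent's implementation: the barcode is assumed to sit at the start of each read with a fixed length (`BARCODE_LEN`), barcodes are matched exactly, reads are assumed to be already aligned and of full length, and only substitutions are counted. All sequences and names are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 8  # hypothetical fixed barcode length


def demultiplex(reads, barcodes):
    """Group target sequences by exact match of the leading barcode.

    barcodes maps barcode sequence -> probe ID; reads with an unknown
    barcode are discarded.
    """
    groups = defaultdict(list)
    for read in reads:
        probe_id = barcodes.get(read[:BARCODE_LEN])
        if probe_id is not None:
            groups[probe_id].append(read[BARCODE_LEN:])
    return dict(groups)


def mutation_rates(target_reads, reference):
    """Per-position substitution frequency against the reference.

    Reads whose length differs from the reference are skipped, because
    this sketch does not handle insertions or deletions.
    """
    counts = [0] * len(reference)
    used = 0
    for read in target_reads:
        if len(read) != len(reference):
            continue
        used += 1
        for i, (base, ref_base) in enumerate(zip(read, reference)):
            if base != ref_base:
                counts[i] += 1
    if used == 0:
        return [0.0] * len(reference)
    return [c / used for c in counts]


# Two probes with identical targets, distinguishable only by barcode.
barcodes = {"GGCCAAGG": "ID1", "CCGGTTCC": "ID2"}
references = {"ID1": "ACGTAC", "ID2": "ACGTAC"}
reads = [
    "GGCCAAGG" + "ACGTAC",
    "GGCCAAGG" + "ACTTAC",  # substitution at index 2
    "CCGGTTCC" + "ACGTAC",
    "CCGGTTCC" + "ACGTAA",  # substitution at index 5
    "TTTTTTTT" + "ACGTAC",  # unknown barcode, discarded
]

groups = demultiplex(reads, barcodes)
rates = {pid: mutation_rates(groups[pid], references[pid]) for pid in groups}
print(rates)
```

Because the reads are sorted by barcode rather than by the target sequence, the two identical targets in this example still receive separate mutation counts.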
- Also provided are an RNA probe containing an RNA to be analyzed to which a barcode sequence forming a structure containing a plurality of base pairs is added, and an RNA probe library containing a plurality of such RNA probes.
- Further provided is an RNA probe library group consisting of two or more replicas of this RNA probe library. Every replicated RNA probe further contains a second barcode sequence, which is identical for all probes within one library but distinguishable from the second barcode sequences of the other libraries.
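A read from such a library group carries two identifiers, one for the library (the second barcode) and one for the individual probe. The sketch below assumes a hypothetical read layout in which a fixed-length second barcode is followed by a fixed-length probe barcode and then the target; the lengths, sequences, and names are illustrative only.

```python
from typing import NamedTuple, Optional

BATCH_LEN = 4  # hypothetical length of the second (library) barcode
PROBE_LEN = 8  # hypothetical length of the first (probe) barcode


class ParsedRead(NamedTuple):
    library: str
    probe_id: str
    target: str


def parse_read(read: str,
               batch_barcodes: dict[str, str],
               probe_barcodes: dict[str, str]) -> Optional[ParsedRead]:
    """Split a read into library, probe ID, and target sequence.

    Returns None when either barcode is not recognized.
    """
    batch = read[:BATCH_LEN]
    probe = read[BATCH_LEN:BATCH_LEN + PROBE_LEN]
    library = batch_barcodes.get(batch)
    probe_id = probe_barcodes.get(probe)
    if library is None or probe_id is None:
        return None
    return ParsedRead(library, probe_id, read[BATCH_LEN + PROBE_LEN:])


batch_barcodes = {"AAGG": "replica_1", "CCTT": "replica_2"}
probe_barcodes = {"GGCCAAGG": "ID1"}

parsed = parse_read("CCTT" + "GGCCAAGG" + "ACGTAC", batch_barcodes, probe_barcodes)
print(parsed)
```

With this two-level key, replicas that were treated under different conditions can be pooled for sequencing and separated again afterwards.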
- According to the present invention, the detection accuracy of introduced base mutations, insertions, deletions, and the like can be improved without affecting the higher-order structure of the RNA to be analyzed.
- FIG. 1 is a flow chart showing a method for analyzing a higher-order structure of RNA in one embodiment.
- FIG. 2 is a flow chart showing a method for analyzing a higher-order structure of RNA in another embodiment.
- FIG. 3 is a schematic diagram showing (a) the barcode sequences used for producing the first library and (b) an outline of the library structure.
- FIG. 4 is a schematic diagram showing an outline of a library structure produced by using 37 types of first barcode sequences and 4 types of second barcode sequences (batch barcodes).
- FIG. 5 shows the base sequences of two samples (ID1 and ID32) synthesized as individual strands among the RNA probes contained in the first library.
- FIG. 6 is a schematic diagram showing the flow of the mutation profiling operation performed using the second library.
- FIG. 7 shows the absolute value of the delta mutation rate of all nucleotides in the barcode of a sample chemically modified with NAI or DMS. The results are shown separately for structured barcodes (ID 1-28) and unstructured barcodes (ID 29-37) in RNA probes in the first library.
- FIG. 8 shows the delta mutation rate for each nucleotide when each library was chemically modified with NAI or DMS. The X-axis shows the sequence of the target RNA of ID1 and the estimated structure in dot-bracket notation. (A) is the result when the first library and four kinds of the second library were processed by NAI, and (b) is the result when the first library and four kinds of the second library were processed by DMS.
- FIG. 9 is a violin plot showing, for each ID, the kernel density distribution of the delta mutation rates of nucleotides predicted to be part of a base-paired region (black) or a non-base-paired region (gray) when the second library was chemically modified with NAI or DMS, either individually or as a pool.
- A) is a sample treated with NAI
- B) is a sample treated with DMS.
- FIG. 11 is a graph plotting, for each ID, the number of reads obtained when all reads from next-generation sequencing of the RNA probe library with ID2, which had been subjected to mutation profiling by DMS using the RNA probe library group carrying structured batch barcodes, were mapped against the file of the RNA probe library group carrying the barcodes of ID1 to ID96.
- FIG. 12 shows the results of performing mutation profiling without a modifier using an RNA probe library group to which a structured batch barcode was added, and plotting the percentage of each RNA determined to have the correct ID.
- FIG. 13 shows the results of performing mutation profiling by DMS using the RNA probe library group to which the structured batch barcode was added, and plotting the ratio of each RNA determined to have the correct ID.
- FIG. 14 is a graph plotting, for each ID, the number of reads obtained when all reads from the RNA probe library with ID7 were mapped against the RNA probe library group carrying the barcodes of ID1 to ID96, after mutation profiling with structured batch barcodes followed by next-generation sequencing in combination with multiple indexes.
- FIG. 15 is a graph plotting the number of reads of each structured batch barcode ID mapped to each index ID, as a result of next-generation sequencing performed with indexes assigned one-to-one to the structured batch barcodes.
- FIG. 16 is a graph plotting, for the data of FIG. 15, the number of RNA types (RNA IDs) misassigned in the RNA probe library carrying each structured batch barcode ID.
- FIG. 17 is a result of assigning a one-to-one corresponding index to a structured batch barcode, performing next-generation sequencing, and plotting the accuracy in determining the ID of the structured batch barcode for each index.
- FIG. 18 is a diagram showing an example of a structured batch barcode array (ID12 and ID28) used in Example 4.
- “RNA to be analyzed” and “target RNA” are used interchangeably and mean an RNA molecule having a sequence that may interact with a small-molecule compound or protein in vivo.
- the RNA to be analyzed may be a biological sample extracted from a living body as it is, or may be an artificially synthesized RNA. In the case of artificial synthesis, it is preferable to include a motif region, which is a functional structural unit of RNA, extracted based on the sequence information of RNA.
- “Motif region” means a functional structural unit for RNA to interact with a substance of interest.
- RNA probes and pseudoknots which are the constituent elements of this RNA motif, are called structural motifs, and the combination of these structural motifs forms a higher-order structure of RNA.
- the motif region contained in the RNA probe of the present invention may consist of a single stem-loop structure (hairpin loop structure) or may include a plurality of stem-loop structures (multi-branched loop structure). It may also include one or more kink-turns, pseudoknots, guanine quadruplexes (G-quadruplex) and the like.
- structural motifs can be composed not only by Watson-Crick base pairs but also by Hoogsteen base pairs.
- RNA probe refers to a nucleic acid molecule containing RNA to be analyzed, preferably a nucleic acid molecule composed of RNA, to which a primer binding site for amplification, a barcode sequence, or the like is added.
- “Library” refers to a set of a plurality (two or more) of different types of molecules (for example, a plurality of different DNA molecules or a plurality of different RNA molecules). In the method according to the present embodiment, analysis can be performed using a large number of RNA probes as needed. Therefore, a library preferably contains 10 or more, more preferably 10² or more, and may contain 10³ or more, 10⁴ or more, or still more preferably 10⁶ or more different RNA molecules.
- The higher-order structure of RNA refers to, in solution, partial double-strand formation (also referred to as a stem structure) based on intramolecular base pairing, a single-stranded structure without base pairing, a cyclic single-stranded structure (referred to as a loop structure), or a combination thereof. Such a structure is in a specific equilibrium state depending on the state of the solution (temperature, salt concentration, etc.) and fluctuates with the movement of the RNA molecule.
- stem structure means a double helix structure formed by an arbitrary nucleic acid sequence contained in RNA and a sequence complementary to the nucleic acid sequence.
- “Complementary” means the ability of two nucleic acid sequences to hybridize. Since the two sequences need only hybridize, the two nucleic acid sequences constituting the stem structure may have at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99%, or 100% sequence complementarity.
- the "barcode sequence” is a tag having a unique sequence added to each nucleic acid molecule for each type or for each molecule. It is also called “index” or “unique molecular identifier (UMI)". UMI typically aims to improve quantification by reducing amplification bias by assigning a random sequence to each molecule in solution.
- When a barcode sequence unique to each type of RNA is added to a plurality of RNAs to be analyzed, each RNA can be identified and analyzed by type, based on the added barcode, after the plurality of RNAs have been modified and amplified simultaneously.
- Data from multiple experiments can be separated and obtained from the same next-generation sequencing data, enabling efficient data analysis.
- The barcode sequence can be provided, for example, as a group of nucleic acids having random bases. Because the number of distinct sequences is what matters, the barcode sequences may be synthesized randomly (so that the sequences are diverse, without needing to know their contents). Alternatively, the barcode sequences may be a set of nucleic acids of known sequence designed to provide sufficient diversity.
- FIG. 1 is a flow chart showing a method for analyzing a higher-order structure of RNA in one embodiment of the present invention.
- This method includes a step of preparing one or more RNA probes in which a barcode sequence is added to the RNA to be analyzed (S10), a step of contacting the RNA probe with an RNA modifier (S20), and a step of detecting the position and frequency of modified bases in the sequence of the RNA probe obtained in step S20 (S30), and further, if necessary, a step of displaying the detection result (S40).
- the barcode sequence is characterized by having a structure in which the reaction with the RNA modifier is suppressed.
- The detection step (S30) preferably comprises a step of synthesizing complementary DNA with reverse transcriptase using the mixture of RNA probes obtained in step S20 as a template (S31), a step of determining the base sequence of the complementary DNA and aligning the base sequence including the barcode sequence (S32), and a step of detecting the position and frequency of mutations occurring in the aligned base sequence (S33).
- the RNA to be analyzed preferably contains a motif region for exerting a function in the living body.
- This motif region may consist of a single stem-loop structure (hairpin loop structure) or may include multiple stem-loop structures (multi-branched loop structure).
- It is preferable to extract the motif region based on the stem structure (see, for example, WO2018/003809). This makes it possible to prepare an RNA probe that reflects the functional structural units existing in RNA without dividing the motif region.
- the motif region may have any sequence length as long as its function is maintained, and may be, for example, 1000 bases or less, 900 bases or less, 800 bases or less, 700 bases or less, 600 bases or less, 500 bases or less, It may be 400 bases or less, 300 bases or less, 200 bases or less, 150 bases or less, 100 bases or less, and 50 bases or less.
- The stem structure in the RNA can be recognized using RNA secondary structure prediction software such as CentroidFold (Hamada, M. et al., Bioinformatics, Vol. 25, pp. 465-473, 2009) or IPknot (Sato, K. et al., Methods Biochem. Anal., Vol. 27, pp. i85-i93, 2011). Further, any RNA sequence information can be used, for example, UTRdb (Grillo, G. et al., Nucl. Acids Res., Vol. 38, D75-D80, 2010), IRESite (Mokrejs).
- RNA sequence information may be obtained from a database containing not only RNA sequence information but also structural information. For example, Rfam (Nawrocki, EP et al., Nucl. Acids Res., Vol. 43, D130-D137, 2015), Structure Surfer (Berkowitz, N.D.
- RNA structures determined by various methods may be used; for example, those downloaded from the Protein Data Bank (https://www.rcsb.org/) can be used. Further, the structure may be an RNA higher-order structure designed independently, and for example, data designed by software such as RNA invoke may be used.
- RNA to be analyzed is structured.
- Structured means that RNA is folded in solution to form secondary or tertiary structure, or remains in primary structure (sequence), thereby suppressing reaction with RNA modifiers.
- The barcode sequence can be designed to form a structure containing multiple base pairs that is less susceptible to such modifications. “Multiple base pairs” means that two or more bases, contiguous or separated, form hydrogen bonds with other bases in the barcode sequence; these may be Watson-Crick base pairs, Hoogsteen base pairs, or the like.
- a stable structure can be formed at least temporarily, but in order to form a more stable structure, three or more base pairs are preferable. Four or more base pairs are even more preferred, and five or more base pairs are even more preferred.
- The upper limit of the number of base pairs is not particularly limited, but since a sufficiently stable structure can be obtained with about 10 base pairs, 30 or fewer base pairs are preferable from the viewpoint of cost, 20 or fewer are more preferred, and 15 or fewer are even more preferred.
- The barcode sequence is preferably designed so that a non-base-paired sequence, i.e., a single-stranded structure, is maintained. Furthermore, it is preferable that the barcode sequence having this structure be computationally optimized so as not to affect the RNA to be analyzed. This avoids the problem that the added barcode sequence itself, through intramolecular interaction with the RNA to be analyzed, causes a structure far from the original RNA structure to form or affects the stability of the structure. Computational sequence optimization can be performed using known programs such as the Vienna RNA package.
- In the RNA probe modification step (S20), the RNA probe prepared in the previous step (S10) is brought into contact with a desired RNA modifying agent to cause a modification reaction of the RNA probe.
- the RNA modifier includes compounds that selectively modify unconstrained nucleotides, such as single-stranded regions in RNA probes.
- Such compounds typically include, but are not limited to, isatoic anhydride derivatives that react with the ribose 2'-hydroxy group, known as SHAPE reagents, such as 1-methyl-7-nitroisatoic anhydride (1M7), 1-methyl-6-nitroisatoic anhydride (1M6), and N-methylisatoic anhydride (NMIA), as well as 2-methylnicotinic acid imidazolide (NAI).
- dimethyl sulfate can be used as an RNA modifier because it forms adducts at the N1 position of adenosine, the N3 position of cytosine, and the N3 position of uridine and the N1 position of guanosine.
- NAI generally reacts with all four nucleotides and DMS reacts only with adenine and cytosine.
- DMS can also react with guanine and uridine under basic pH conditions (e.g., pH 8.0).
- the RNA modifier may selectively modify a constrained nucleotide that forms a double strand in the RNA probe.
- The RNA modifying agent includes, but is not limited to, for example, RNase V1, which is an enzyme that degrades double-stranded RNA, Dicer of the RNase III family, or a fusion protein of a double-strand-binding protein and an RNA-modifying protein.
- The solution may be a biological solution containing different concentrations and amounts of proteins, cells, viruses, lipids, monosaccharides and polysaccharides, amino acids, nucleotides, and DNA, as well as various salts and metabolites. Further, it may be a solution containing small-molecule or medium-sized-molecule drugs at different concentrations and amounts. It may also contain various surfactants, polymers, and osmolytes. The concentration of the RNA modifier can be adjusted to achieve the desired degree of modification of the RNA.
- The RNA to be analyzed can be modified in the presence of proteins or other small and high molecular weight biological ligands. If the reactivity of the RNA modifier depends on the pH, the pH may be maintained, for example, in the range of 7.5 to 9.0, but is not limited thereto. The dynamic range that distinguishes the most reactive from the least reactive nucleotides is typically 20- to 50-fold.
- The RNA can be folded into the desired conformation by any procedure at the desired pH (e.g., about pH 8). The RNA can first be heated and then snap-cooled in a low-ionic-strength buffer to eliminate multimeric forms. A folding solution can then be added to allow the RNA to adopt the appropriate conformation and to prepare it for probing with structure-sensitive RNA modifiers. In some embodiments, the RNA is not natively folded prior to modification. Modification can be carried out while the RNA is denatured by heat and/or low-salt conditions.
- This step is a step of detecting the position and frequency of the modified base in the sequence of the RNA probe obtained in the modification step (S20).
- the method is not particularly limited as long as it reads the modified base in the RNA sequence, and may be, for example, a pull-down method using an antibody specific to the modified base or a nanopore sequencing method for directly reading the potential of RNA.
- This direct RNA nanopore sequencing method is a technique for detecting RNA modification sites at the single molecule level.
- the direct RNA sequencing platform developed and marketed by Oxford Nanopore Technologies migrates RNA bound to motor proteins via membrane-suspended biological nanopores.
- the modified base detection step (S30) is mutation profiling involving the conversion of RNA to complementary DNA (cDNA), as shown in FIG.
- cDNA is synthesized by reverse transcriptase or another polymerase using the mixture of RNA probes obtained in step S20 as a template (S31).
- The reverse transcriptase is an enzyme that synthesizes cDNA from RNA, and examples thereof include, but are not limited to, thermostable enzymes such as murine or avian reverse transcriptases. Alternatively, it may be TGIRT (thermostable group II intron reverse transcriptase), a reverse transcriptase present in retrotransposons of prokaryotes, fungi, and the like.
- InGex's TGIRT-III is superior in thermal stability, processivity, and fidelity to conventional retrovirus-derived reverse transcriptases. Further, it is known to introduce a mutation at sites modified by DMS during reverse transcription (DMS-MaPseq method).
- These enzymes make it possible to detect a chemical modification in RNA by reading through the nucleotide containing the adduct and incorporating an incorrect (non-complementary) nucleotide at the site of the chemical modification.
- “Incorrect” with respect to nucleotide incorporation refers to the incorporation of a nucleotide that is non-complementary (violating the Watson-Crick rule) to the nucleotide present in the original sequence. It also includes small deletions in the sequence.
- Chemical modifications in nucleic acids such as RNA can be detected efficiently by analyzing the cDNA with massively parallel sequencing (MPS).
- the 5'end side is fixed on the flow cell via adapters at both ends of tens of millions to hundreds of millions of DNA fragments.
- the adapter on the 5'end side fixed in advance on the flow cell and the adapter sequence on the 3'end side of the DNA fragment are annealed to form a bridge-shaped DNA fragment.
- By performing a nucleic acid amplification reaction with DNA polymerase in this state, a large number of single-stranded DNA fragments can be locally amplified and fixed. The next-generation sequencer then performs sequencing using the obtained single-stranded DNA as a template; as of 2020, about 3 Tb of sequence information can be obtained in a single analysis.
- Next-generation sequencing (NGS) is also called “massively parallel sequencing” or “ultra-high-throughput sequencing”.
- the sequence data (reads) obtained by the next-generation sequencer are aligned in a form including a barcode sequence.
- By sorting the sequence data by barcode sequence, samples containing multiple types of RNA probes can be sequenced at the same time. Further, even when the RNAs to be analyzed contain similar sequences, such as gene families and single nucleotide polymorphisms, they can be identified and analyzed.
- For an unreliable alignment, the alignment may be evaluated by taking the mutation information of the barcode into account.
- the accuracy of the sequence information can be improved by aligning the RNA sequence to be analyzed together with the barcode sequence.
- the mutation rate at a given nucleotide is simply the number of mutations (mismatch, deletion and insertion) at that location divided by the number of reads.
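- The mutation-rate definition above can be sketched in a few lines of Python. This is only an illustration: the `PositionCounts` container and the function names are hypothetical, and the delta mutation rate is assumed here to be the rate in the modifier-treated sample minus the rate in the untreated control, which this section does not define explicitly.

```python
from dataclasses import dataclass


@dataclass
class PositionCounts:
    """Per-nucleotide counts from aligned reads (hypothetical container)."""
    mismatches: int
    deletions: int
    insertions: int
    reads: int  # number of reads covering this position


def mutation_rate(c: PositionCounts) -> float:
    """Mutations (mismatch + deletion + insertion) divided by the number of reads."""
    if c.reads == 0:
        return 0.0  # assumption: report 0 for positions with no coverage
    return (c.mismatches + c.deletions + c.insertions) / c.reads


def delta_mutation_rate(treated: PositionCounts, control: PositionCounts) -> float:
    """Assumed definition: treated-sample rate minus untreated-control rate."""
    return mutation_rate(treated) - mutation_rate(control)


# Made-up counts for one nucleotide, for illustration only
treated = PositionCounts(mismatches=40, deletions=8, insertions=2, reads=1000)
control = PositionCounts(mismatches=4, deletions=1, insertions=0, reads=1000)
print(mutation_rate(treated))
print(round(delta_mutation_rate(treated, control), 4))
```

Returning 0 for uncovered positions is a convenience of this sketch; in practice such positions would more likely be filtered out by the read-depth quality control mentioned in the text.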
- the data for calculating the raw reactivity for each nucleotide can be normalized using various criteria. Data quality control is possible by considering the reading depth and standard error of the sequence.
- <Display of detection result (S40)> The location and frequency of mutations detected in the above steps can be illustrated by methods known to those of skill in the art, such as mutation histograms, sequence depths and reactivity profiles.
- Examples of alignment software include BWA and STAR.
- These data are quantified and vectorized as mutation counts, and various operations can be performed.
- Mutations that show statistically significant reactivity can be annotated.
- The analysis of RNA samples in this step can be performed using a computer program product stored on a computer-readable medium.
- exemplary computer-readable media suitable for carrying out the present invention include chip memory devices, disk storage devices, programmable logic devices, and application-specific integrated circuits.
- the computer program products that carry out this process can be installed on a single device or computing platform, or distributed among multiple devices or computing platforms. Therefore, the higher-order structure of RNA obtained by the method of this embodiment can be displayed on a display connected to a computer.
- The structured barcode disclosed in this embodiment has several advantageous effects. First, the barcode sequence is unlikely to be modified in the reaction with the RNA modifier, so it can be correctly identified as a barcode. In addition, the barcode portion is suppressed from interacting with the RNA to be analyzed or with other RNA molecules. This allows structured barcode sequences not only to distinguish similar sequences within a library, but also to distinguish different batches of the same library.
- FIG. 4 shows a method of creating a library group using 37 types of first barcode sequences and 4 types of second barcode sequences. By amplifying the initially prepared library of 37 DNAs with 4 different primers, a second barcode sequence is added that is identical within one library but differs between batches of libraries. By performing an in vitro transcription reaction using these, an RNA library group to which two types of barcode sequences are added can be prepared.
- <RNA probe and RNA probe library> As another embodiment of the present invention, an RNA probe containing a structured barcode sequence and an RNA probe library containing a plurality of the RNA probes are provided.
- the structured bar code sequence is a bar code sequence that forms a structure containing a plurality of base pairs.
- Examples of the structure of the barcode sequence of the present embodiment include complementary double-stranded, triple-stranded, and quadruple-stranded structures, and specific examples thereof include a stem-loop structure and a pseudoknot structure.
- The stem portions form complementary double strands, but to increase sequence diversity they may contain wobble base pairs such as G-U, I-U, I-A, and I-C, which have thermodynamic stability comparable to Watson-Crick base pairs.
- I represents inosine, and its base, hypoxanthine, can base pair with uracil, adenine, and cytosine. Uracil can be paired with two bases, guanine and adenine.
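- The pairing rules described above can be written as a small lookup table. The following Python sketch is illustrative only; the function names are hypothetical, and it checks pairing by rule without estimating thermodynamic stability.

```python
# Pairs treated as allowed in a barcode stem: Watson-Crick pairs plus the
# wobble pairs named in the text (G-U, I-U, I-A, I-C). "I" stands for inosine.
ALLOWED_PAIRS = {
    frozenset("AU"), frozenset("GC"),                    # Watson-Crick
    frozenset("GU"),                                     # G-U wobble
    frozenset("IU"), frozenset("IA"), frozenset("IC"),   # inosine wobble
}


def can_pair(a: str, b: str) -> bool:
    """True if bases a and b form one of the allowed pairs."""
    return frozenset((a, b)) in ALLOWED_PAIRS


def stem_is_paired(strand5: str, strand3: str) -> bool:
    """True if every base of the 5' strand pairs with the 3' strand read antiparallel."""
    if len(strand5) != len(strand3):
        return False
    return all(can_pair(a, b) for a, b in zip(strand5, reversed(strand3)))


# Both strands are written 5'->3'; the first pair of this stem is a G-U wobble
print(stem_is_paired("GGAUC", "GAUCU"))
print(can_pair("A", "G"))
```

Using `frozenset` makes each rule order-independent, so G-U and U-G are covered by a single entry.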
- the structure containing a plurality of base pairs is a stem-loop structure, having one or more bulges and / or internal loop structures at the stem site.
- a base that serves as a negative control and a positive control for structure-specific mutations can be loaded at the same time.
- the structured barcode functions as a control for molecular species that modify the terminal loop but not the bulge or internal loop.
- the structure containing a plurality of base pairs is an RNA structure registered in PDB (Protein Data Bank) or a variant thereof.
- The position of the structured barcode sequence in the RNA probe of this embodiment is not particularly limited, and it can be placed at any position. For example, it may be on the 5' end side or the 3' end side of the RNA to be analyzed. Alternatively, one strand of the barcode sequence forming the complementary strands may be located on the 5' end side of the RNA to be analyzed and the other strand on the 3' end side, so that the two strands pair while sandwiching the RNA to be analyzed. Further, the number of structured barcode sequences is not particularly limited, and a plurality of structured barcodes having the same or different sequences may be present.
- the RNA probe of the present embodiment contains an RNA motif containing at least one structural motif as the RNA to be analyzed.
- As this motif region, one extracted from arbitrary RNA sequence information can be used.
- the motif region contained in the RNA probe of the present invention may be selected from any RNA secondary structure data already identified by the RNA structure study.
- this RNA probe may be labeled with a fluorescent dye (eg, FITC, PE, Cy3, Cy5, etc.), a radioisotope, digoxigenin (DIG), biotin, etc. for detection.
- Labeling can be performed by incorporating a pre-labeled nucleic acid at the time of probe synthesis, and for example, an artificial nucleic acid labeled on the 5'side can be incorporated.
- Alternatively, labeled artificial nucleic acids can be incorporated over the entire length of the RNA.
- A labeled artificial nucleic acid can also be attached to the 3' side using, for example, T4 RNA ligase 1.
- the labeling may be performed in multiple stages by a click reaction or the like.
- a fluorescent dye or biotin can be incorporated into RNA by reacting DBCO-biotin and DBCO-Cy3 with RNA in which pCp-N3 is added to the 3'end using T4 RNA ligase1.
- the proportions of these labels may be 10, 20, 30, 40, 50, 60, 70, 80, 90, 99, 100%.
- the RNA probe of this embodiment can be synthesized by any conventionally known genetic engineering method.
- The RNA probe can be made by transcribing template DNA whose synthesis was outsourced to a synthesis contractor.
- the DNA containing the sequence of the RNA probe may have a promoter sequence.
- a T7 promoter sequence is exemplified as a preferable promoter sequence.
- RNA can be transcribed from DNA having a desired RNA probe sequence using the MEGAshortscript™ T7 Transcription Kit provided by Life Technologies.
- The RNA may contain modified bases in addition to adenine, guanine, cytosine, and uracil.
- Modified RNAs are exemplified by, for example, pseudouridine, 5-methylcytosine, 5-methyluridine, 2'-O-methyluridine, 2-thiouridine, and N6-methyladenosine.
- an RNA probe library containing a plurality of RNA probes containing RNAs to be analyzed having different sequences is provided.
- the oligo library can then be redissolved, amplified, and then subjected to an in vitro transcription reaction to prepare an RNA probe library.
- The oligonucleotide library can be produced by outsourcing its synthesis to, for example, Agilent Technologies or Twist Bioscience.
- By amplifying the RNA probe library of this embodiment, which contains a plurality of RNA probes, with a plurality of primers each containing a second barcode sequence, an RNA probe library group consisting of two or more replicas can be prepared. All replicated RNA probes contain first and second barcode sequences; the second barcode sequence is identical within one library but distinguishable from those of other libraries. According to the examples described later, even when mutation profiling was performed on a mixture of a plurality of RNA probe libraries, the same results were obtained as when each RNA probe library was used separately. Therefore, it is considered that after different mutation profiling experiments are performed with the respective RNA probe libraries, the libraries can be mixed for next-generation sequencing and each experiment can still be identified by its second barcode sequence.
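- The idea of identifying both the probe and the batch from a pooled sequencing run can be sketched as follows. This Python example is a simplified illustration, not the analysis pipeline of the examples (which maps reads with alignment software): the barcode sequences, IDs, and function names are all hypothetical, and barcodes are matched exactly as substrings, with no tolerance for mutations.

```python
def find_unique(read, barcodes):
    """Return the ID of the single barcode contained in the read, else None."""
    hits = [bid for bid, bc in barcodes.items() if bc in read]
    return hits[0] if len(hits) == 1 else None


def assign(read, first_barcodes, batch_barcodes):
    """Assign a read to (probe ID, batch ID) using the first and second barcodes."""
    probe_id = find_unique(read, first_barcodes)
    batch_id = find_unique(read, batch_barcodes)
    if probe_id is None or batch_id is None:
        return None  # ambiguous or unrecognized reads are discarded
    return probe_id, batch_id


# Hypothetical barcode sets written as cDNA (DNA alphabet)
FIRST = {"ID1": "GGACTCTTCGGAGTCC", "ID2": "CCTGAGGAGACTCAGG"}
BATCH = {"a": "ACGTTGCA", "b": "TGCAACGT"}

read = "GG" + FIRST["ID1"] + "AATTTGATGAAACTAC" + BATCH["b"]
print(assign(read, FIRST, BATCH))  # ('ID1', 'b')
```

Exact matching is the simplest possible rule; it relies on the barcodes themselves carrying few mutations, which is the property the structured barcodes are designed to provide.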
- RNA library of this embodiment can be used as a kit for analyzing chemical modification of RNA and / or RNA structure analysis.
- Methods of using such a kit include the method for higher-order structural analysis of RNA according to the present invention.
- Example 1 Materials and methods (barcode sequence design) The barcode sequences in this example used stems and loops of different lengths. Stems of 6, 7 or 8 base pairs (bp) containing normal base pairs and G-U wobble base pairs were randomly generated. Loops of three different lengths were used for each stem length. For each barcode, either one of four tetraloops (UUCG, GAGA, GCUU, GUAA) or a sequence of 3 or 5 bases (UCG, AGA, CUU, UAA, UUACG, GAAGA, GCUAU, AGUAA) was selected. The Vienna RNA package was used to verify that the barcodes fold correctly. As a control, unstructured barcodes of 10, 15 and 21 bases were generated.
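- The design rule described above (random stem of 6 to 8 bp with optional G-U wobble pairs, closed by one of the listed loops) can be sketched in Python as follows. The function name `make_barcode` and the parameter `wobble_prob` are hypothetical, and the returned dot-bracket string is only the intended structure; the fold check with the Vienna RNA package mentioned in the text is not performed here.

```python
import random

# Loop sequences listed in the text
TETRALOOPS = ["UUCG", "GAGA", "GCUU", "GUAA"]
TRILOOPS = ["UCG", "AGA", "CUU", "UAA"]
PENTALOOPS = ["UUACG", "GAAGA", "GCUAU", "AGUAA"]

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Wobble alternatives: G may face U instead of C, and U may face G instead of A
WOBBLE = {"G": "U", "U": "G"}


def make_barcode(stem_bp, loop, rng, wobble_prob=0.2):
    """Return (sequence, intended dot-bracket structure) for one stem-loop barcode."""
    strand5 = [rng.choice("ACGU") for _ in range(stem_bp)]
    partners = []
    for base in strand5:
        if base in WOBBLE and rng.random() < wobble_prob:
            partners.append(WOBBLE[base])      # G-U wobble pair
        else:
            partners.append(COMPLEMENT[base])  # Watson-Crick pair
    strand3 = "".join(reversed(partners))      # antiparallel 3' strand
    sequence = "".join(strand5) + loop + strand3
    structure = "(" * stem_bp + "." * len(loop) + ")" * stem_bp
    return sequence, structure


rng = random.Random(0)  # fixed seed so the sketch is reproducible
for stem_bp in (6, 7, 8):
    for loops in (TRILOOPS, TETRALOOPS, PENTALOOPS):
        seq, struct = make_barcode(stem_bp, rng.choice(loops), rng)
        print(stem_bp, seq, struct)
```

Candidates generated this way would still need to be passed through a secondary structure predictor, and any that do not fold into the intended hairpin would be discarded.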
- As the target RNA sequence, 5'-GUGUAUGAUGAAACUACAUUAAGUUAACUCGUGCAC-3' (SEQ ID NO: 1) was used. From this sequence, 12 positions that did not form base pairs were selected, and at each position the base was changed to each of the other three bases, giving 36 point mutants. Together with the original sequence, a total of 37 sequences were obtained. Any pair of the 37 sequences differs in only one or two bases.
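- The counting in this paragraph (1 original sequence + 12 positions × 3 substitutions = 37 sequences, any two differing in one or two bases) can be checked with a short Python sketch. The positions in `POSITIONS` are hypothetical placeholders, since the text does not list which 12 unpaired positions were chosen; the counts hold for any 12 distinct positions.

```python
from itertools import combinations

TARGET = "GUGUAUGAUGAAACUACAUUAAGUUAACUCGUGCAC"  # SEQ ID NO: 1

# Hypothetical 0-based positions; the 12 positions actually used are not given in the text
POSITIONS = [8, 9, 10, 11, 12, 13, 18, 19, 20, 21, 22, 23]


def point_mutants(seq, positions):
    """Return the original sequence followed by all single-base substitutions."""
    variants = [seq]
    for pos in positions:
        for base in "ACGU":
            if base != seq[pos]:
                variants.append(seq[:pos] + base + seq[pos + 1:])
    return variants


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


library = point_mutants(TARGET, POSITIONS)
distances = {hamming(a, b) for a, b in combinations(library, 2)}
print(len(library))       # 37
print(sorted(distances))  # [1, 2]
```

A distance of 1 arises between the original and any mutant, or between two mutants at the same position; a distance of 2 arises between mutants at different positions. This is why a barcode is needed to tell the library members apart reliably.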
- FIG. 3 shows an outline of the barcode sequence and library structure used in the first library.
- FIG. 3 (a) is a barcode sequence of one RNA probe (ID1), which is composed of a 7 bp stem and a 4-nucleotide loop.
- the first library sequence has the following four parts in the direction of 5'to 3': i) 5'cassettes required for RNA library generation by in vitro transcription (IVT) and preparation of sequencing libraries (dashed line on the 5'side in FIG. 3B); ii) Different barcode sequences for each sequence (IDs 1-28 containing structured barcodes and IDs 29-37 containing unstructured barcodes in FIG.
- iii) Target RNA sequence flanked on both sides by two-base spacers (solid line in FIG. 3(b); point mutations in the sequence are indicated by triangles);
- iv) 3' cassette required for RNA library generation by in vitro transcription (IVT), reverse transcription, and preparation of the sequencing library (dashed line on the 3' side in FIG. 3(b)).
- FIG. 4 shows an outline of the barcode sequence and library structure used in the second library.
- The RNA of this design contains two barcodes, an in-library barcode (first barcode) and a batch barcode (second barcode). It can be divided into the following four parts in the 5' to 3' direction: i) the same 5' cassette used in the first library design; ii) the same barcode sequence used in the first library design; iii) the target RNA sequence flanked by two-base spacers; iv) a 12-base linker sequence that enhances primer binding.
- the base sequence of the primer used for the amplification of the second library is as follows.
- RNA polymerase promoter sequence (IVT recognition site: 5'-TAATACGACTCACTATAG-3'(SEQ ID NO: 6)).
- a forward primer with a cassette sequence and a reverse primer with a sequence complementary to the 3'cassette sequence were used.
- Pr_d2a SEQ ID NO: 2
- Pr_d2b SEQ ID NO: 3
- Pr_d2c SEQ ID NO: 4
- Pr_d2d SEQ ID NO: 5
- The prepared double-stranded DNA was used as a template for an IVT reaction using the MEGAshortscript™ T7 Transcription Kit (Thermo Fisher Scientific).
- The reaction was prepared according to the manual.
- The reaction volume was 20 µL and the template concentration was 100 nM.
- The reaction was incubated at 37 °C for 6 hours and then treated with TURBO DNase (included in the kit) at 37 °C for 15 minutes.
- RNA was purified with RNA Clean & Concentrator-25 from Zymo Research.
- Among the RNA probes contained in the first library synthesized by the in vitro transcription reaction, the nucleotide sequences of ID1 (SEQ ID NO: 7) and ID32 (SEQ ID NO: 8), which were synthesized as individual strands, are shown in FIG. 5. In FIG. 5, each barcode sequence portion is surrounded by a square, and the target RNA sequence is underlined.
- (RNA modification) Two different chemical modifiers were used for RNA modification: dimethyl sulfate (DMS) and 2-methylnicotinic acid imidazolide (NAI).
- The following RNA preparation was used in experiments with both modifiers. 250 ng of RNA (single strand or pool) dissolved in 6 µL of water was incubated at 95 °C for 2 minutes and snap-cooled on ice for at least 2 minutes. Next, 3 µL of 3.3× folding buffer was added and the sample was incubated at 37 °C for 20 minutes (1× folding buffer is composed of 100 mM HEPES (pH 8.0), 100 mM NaCl, and 10 mM MgCl2).
- Control samples were prepared in the same manner using 1 µL of DMSO instead of NAI.
- The modified RNA samples were reverse transcribed using a reverse primer with a sequence complementary to the 3' cassette sequence.
- For NAI-modified RNA, SuperScript™ II reverse transcriptase (Thermo Fisher Scientific) was used in the presence of manganese.
- For DMS-modified RNA, TGIRT™-III enzyme (InGex) was used.
- In both cases, 1 μL of 2 μM reverse primer was mixed with 2 μL of 10 mM dNTPs (New England Biolabs) and 7 μL of the previously modified RNA. The samples were annealed (85 °C, 1 min → 65 °C, 10 min → hold at 4 °C) on a ProFlex™ PCR System (Thermo Fisher Scientific), which was also used for the reverse transcription step.
- Next, 9 μL of 2.22× MaP buffer was added and the mixture was incubated for 2 minutes at room temperature; 1 μL of enzyme was then added, and the sample was placed in the cycler and reverse transcribed (see Table 2).
- Index PCR was performed using 1 ng of amplicon PCR product in a reaction volume of 25 μL.
- The other reaction components were 1× Platinum™ SuperFi™ PCR Master Mix and 1 μM index primers from the Nextera XT Index Kit v2 (Illumina).
- The samples were transferred to the ProFlex™ PCR System. After initial heating to 98 °C for 30 seconds, 6 cycles of 3-step PCR were performed at 98 °C for 10 seconds, 55 °C for 10 seconds, and 72 °C for 20 seconds. After the last cycle, the temperature was held at 72 °C for 5 minutes and then lowered to 4 °C.
- Purification used AMPure XP beads (Beckman Coulter). 13 μL of water was added to the dried AMPure XP beads, mixed well, and incubated at room temperature for 10 minutes, after which 12 μL of supernatant was recovered. The samples were then pooled for next-generation sequencing.
- For sequencing, the NextSeq 500/550 Mid Output Kit v2.5 (Illumina, 150 cycles) was used with paired-end reads and standard read primers.
- Adapters were first trimmed from the FASTQ files, and the reads in the resulting FASTQ files were then mapped with alignment software to a file containing the reference sequences (the reference file). In this analysis, mapping was performed with the STAR aligner. Mutations, deletions, and insertions were counted for further analysis.
- FIG. 6 is a schematic diagram showing the flow of the mutation profiling operation performed using the second library.
- The four libraries, each of which had been chemically modified separately, were combined in one tube and subjected to a reverse transcription reaction.
- In addition, four tubes were prepared in which the reverse transcription reaction was performed separately on each of the four libraries.
- FIG. 7 (a) is a boxplot showing the absolute value of the delta mutation rate for all nucleotides in the barcode sequences of the first library modified with NAI.
- FIG. 7 (b) shows the result of the same analysis for the sample treated with DMS.
- The notch indicates the median and the box indicates the interquartile range.
- The whiskers extend from the edges of the box to the most extreme values that lie within 1.5 times the height of the box.
- Outliers are shown as circles.
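The boxplot conventions described for FIG. 7 can be reproduced with a short calculation. The following is a minimal sketch, assuming quartiles are computed with Python's "inclusive" method; the source does not say which quartile definition or plotting software was used, and the function name `box_stats` is hypothetical.

```python
import statistics


def box_stats(values):
    """Return median, quartiles, whisker ends, and outliers for one box."""
    data = sorted(values)
    # Quartile definition is an assumption; plotting tools differ slightly here.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),  # most extreme values within the fences
        "outliers": outliers,  # drawn as individual circles
    }
```

Applied to the absolute delta mutation rates of one sample, the returned `whiskers` and `outliers` correspond to the whisker ends and the circles in the figure.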
- These delta mutation rates are shown in FIGS. 8 (a) and 8 (b), in which the target sequence of ID1 is plotted on the x-axis.
- The delta mutation rates are shown for the first library and for all four batches of the second library (data are from pooled samples).
- The delta mutation rates of the first and second libraries differ slightly, but in both libraries the mutation rates are high in unconstrained nucleotide regions, showing that the structural probing reflects secondary-structure information. The ViennaRNA package was used for the structure prediction.
- FIG. 8 shows the mutation profile of only a single ID. The mutation profiles of all IDs were then analyzed and compared with the secondary structures predicted by the ViennaRNA package.
- FIG. 9 is a violin plot in which, for the second library chemically modified with NAI or DMS either individually or as a pool, the absolute values of the delta mutation rates are plotted separately for regions predicted to form base pairs (black regions in FIG. 9) and regions predicted to be unpaired (gray regions in FIG. 9).
- FIG. 9 (a) shows the samples treated with NAI and FIG. 9 (b) the samples treated with DMS. Of the IDs shown on each x-axis, IDs 1 to 28 contain structured barcode sequences and IDs 29 to 37 contain unstructured barcode sequences.
- The results also show that the distributions of the four individual samples (left side of each "violin" in FIG. 9) and the pooled samples (right side of each "violin" in FIG. 9) are very similar. For DMS, only positions with the bases A and C are considered.
- For a multiplexed library (RNA probe library) containing 54 RNA structures in total, 96 structured batch barcodes were prepared. For mapping, a different barcode was then assigned to each of the 54 RNA structures in the library, and reference files covering the 96 × 54 combinations were created.
- Of these, RNA probe libraries carrying two batch barcodes with different IDs were synthesized in vitro, and a mutation profiling experiment with DMS was performed.
- For the verification experiment, a corresponding index was assigned to each structured batch barcode, and next-generation sequencing analysis was performed. All the reads obtained were then mapped to the reference file. In this analysis, mapping was performed with the STAR aligner. The results are shown in FIGS. 10 and 11.
- FIG. 10 shows an experiment using structured batch barcode 1; the horizontal axis shows the ID actually determined by sequencing and mapping, and the vertical axis shows the total number of reads (Dept_sum).
- In the mutation profiling reaction using structured batch barcode 1, no modifier was used, so there is no effect of RNA structure-selective mutation introduction.
- Most of the reads carrying structured batch barcode 1 were correctly determined to be ID1. Reads were erroneously mapped to 18 other IDs, but the number of reads for those IDs is very small, 1/1000 to 1/10,000 or less of that for the correct ID1, and therefore does not affect the interpretation of the mutation profile data.
- FIG. 11 shows an experiment using structured batch barcode 2; the horizontal axis shows the ID actually determined by sequencing and mapping, and the vertical axis shows the total number of reads (Dept_sum).
- In this experiment, a modifier was used, and mutations were introduced selectively according to the higher-order structure of the RNA.
- Compared with FIG. 10, FIG. 11 shows that the number of IDs for which some reads were detected increased because of mutation introduction, but, as in FIG. 10, the majority of reads were assigned to the correct ID, ID2.
- The total number of reads for erroneously determined IDs is very small, 1/100 to 1/10,000 or less of that for the correct ID (reads determined to be ID2), and does not affect the interpretation of the mutation profile data.
- The accuracy (the percentage of reads determined to be the correct ID) was confirmed for each of the 54 RNAs in the library (FIGS. 12 and 13).
- The accuracy averaged 99.91% under the unmodified condition and 99.44% under the mutation-introducing condition, so high accuracy was maintained even when mutations were introduced.
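The accuracy figure above is the fraction of mapped reads assigned to the correct batch barcode ID. A minimal sketch of that calculation follows, assuming the per-ID read totals are available as a dictionary; the function name and the example counts are hypothetical and are not data from the source.

```python
def assignment_accuracy(read_counts, correct_id):
    """Fraction of mapped reads that were assigned to the correct ID."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return read_counts.get(correct_id, 0) / total


# Hypothetical read totals for one RNA; not values from the experiment.
example_counts = {"ID1": 99910, "ID7": 60, "ID12": 30}
print(f"{assignment_accuracy(example_counts, 'ID1'):.2%}")
```

Averaging this value over all RNAs in the library gives a per-condition accuracy of the kind reported above.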
- Structured batch barcodes allow the correct barcode ID to be clearly distinguished from incorrect IDs in mutation profiling without impairing mapping accuracy, so they are useful for multiplexing, in which multiple different conditions are mixed and processed together.
- Example 3 Effect of multiplexing by combining a barcode with another barcode (index)
- After the mutation profiling reaction with RNA has been completed and the RNA has been converted to DNA, combining the sample with commercially available index primers (e.g., Nextera XT Index Kit <Illumina>) increases the complexity of sample origins and conditions that can be handled.
- In FIG. 14, the vertical axis shows the index primers based on Illumina's sequences (which function as barcodes), and the horizontal axis shows the ID determined when the sample of structured RNA ID 7 prepared in Example 2 was mapped.
- The color scale shows the average number of reads.
- The structured batch barcode (ID) can be identified with high accuracy with any index primer. In other words, the number of samples can be expanded on a large scale by combining batch barcodes with other forms of DNA barcode. For example, using 10 index primers and 96 structured barcodes allows 10 × 96 = 960 conditions to be set.
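- For a multiplexed library (RNA probe library) containing 1500 different RNA probes in total, 32 structured batch barcodes were prepared, and 32 different index primers were used to assign an index (Index ID) to each of the 32 structured batch barcodes.
- If the barcodes function correctly, the file corresponding to index ID1 contains the RNA probe library carrying structured batch barcode ID1. All the reads obtained were then mapped to the reference file. In this analysis, mapping was performed with the STAR aligner.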
- The horizontal axis shows the correct index (Index ID), and the vertical axis shows the structured batch barcode ID (Batch Barcode ID) actually determined by sequencing and mapping.
- The color of the heat map indicates the average number of mapped reads (Dept_mean) in the RNA probe library.
- As shown in FIG. 16, in most cases erroneous determinations occurred for none or fewer than 10 of the 1500 RNA types in the library, so the effect on the library as a whole is very small.
- The number of reads for these erroneously determined RNA types is only about 1/100 to 1/10,000 or less of that for the correct ID, so the effect is smaller still, and the erroneous determinations can be said not to affect the interpretation of the profile results (FIG. 17). The structured batch barcodes therefore have high orthogonality, as intended, and function as barcodes.
- In FIG. 16, there are some data points at which about 800 and about 130 RNA types are mixed in, but these occur consecutively between adjacent tubes and the barcodes involved show no sequence similarity. They are therefore judged to be contamination caused by human error rather than a problem caused by particular structured barcodes.
- The structured barcode RNA of ID12 is 22 bases long, 5'-GCUAGAAGAUUUGUCUUCUGGU-3' (SEQ ID NO: 9), and contains a 4-base loop.
- The structured barcode RNA of ID28 is 19 bases long, 5'-UUGCGAGAUAUUCUCGCGA-3' (SEQ ID NO: 10), and contains a 3-base loop. Structured barcodes can thus vary not only in base sequence but also in length and higher-order structure, which further expands the number of possible combinations.
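The hairpin shape of these two barcodes can be checked by pairing bases from the two ends inward. The following is a simplified sketch, assuming Watson-Crick and G-U wobble pairs only and a minimum loop of 3 bases; it is not a thermodynamic fold prediction of the kind the ViennaRNA package provides, and the function name is hypothetical.

```python
# Watson-Crick pairs plus G-U wobble pairs.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def hairpin_stem_and_loop(seq, min_loop=3):
    """Pair bases from both ends inward; return (stem_bp, loop_length)."""
    i, j = 0, len(seq) - 1
    stem = 0
    # Keep pairing while the bases match and at least min_loop bases remain between them.
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in ALLOWED_PAIRS:
        stem += 1
        i += 1
        j -= 1
    return stem, j - i + 1


for name, barcode in [("ID12", "GCUAGAAGAUUUGUCUUCUGGU"), ("ID28", "UUGCGAGAUAUUCUCGCGA")]:
    print(name, hairpin_stem_and_loop(barcode))
```

Under these assumptions the loop lengths come out as 4 bases for ID12 and 3 bases for ID28, matching the description above.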
- Structured barcodes make it possible to multiplex structural probing tests across multiple reaction conditions.
- Structural probing tests can be performed with a plurality of different reaction compositions and experimental conditions, and the effect of these conditions on RNA structure can be screened on a large scale.
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Biotechnology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Microbiology (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Analytical Chemistry (AREA)
- Immunology (AREA)
- Plant Pathology (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Crystallography & Structural Chemistry (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Saccharide Compounds (AREA)
Abstract
Description
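(c1) a step of synthesizing complementary DNA with reverse transcriptase using the mixture of RNA probes obtained in step (b) as a template; (c2) a step of determining the base sequence of the complementary DNA and aligning the base sequences containing the barcode sequence; and (c3) a step of detecting the positions and frequencies of mutations occurring in the aligned base sequences.
As used herein, "RNA to be analyzed" and "target RNA" are used interchangeably and refer to an RNA molecule having a sequence that may interact with a low-molecular-weight compound or a protein in vivo. The RNA to be analyzed may be a biological sample extracted from a living body and used as is, or may be artificially synthesized RNA. When synthesized artificially, it preferably contains a motif region, a functional structural unit of RNA, extracted on the basis of RNA sequence information. A "motif region" means a functional structural unit through which RNA interacts with a target substance. The components of such an RNA motif, such as stem-loops and pseudoknots, are called structural motifs, and combinations of these structural motifs form the higher-order structure of RNA. The motif region contained in the RNA probe of the present invention may consist of a single stem-loop structure (hairpin loop structure) or may contain multiple stem-loop structures (a multibranched loop structure). It may also contain one or more kink-turns, pseudoknots, G-quadruplexes, and the like. Structural motifs may be formed not only by Watson-Crick base pairs but also by Hoogsteen base pairs.
FIG. 1 is a flow diagram showing a method for analyzing the higher-order structure of RNA in one embodiment of the present invention. The method includes a step of preparing one or more RNA probes in which a barcode sequence is added to the RNA to be analyzed (S10), a step of bringing the RNA probes into contact with an RNA modifier (S20), a step of detecting the positions and frequencies of modified bases in the sequences of the RNA probes obtained in step S20 (S30), and, if necessary, a step of displaying the detection results (S40). Here, the barcode sequence is characterized by having a structure in which reaction with the RNA modifier is suppressed.
The RNA to be analyzed preferably contains a motif region for exerting its function in vivo. This motif region may consist of a single stem-loop structure (hairpin loop structure) or may contain multiple stem-loop structures (a multibranched loop structure). In this embodiment, motif regions are preferably extracted with reference to stem structures (see, for example, WO2018/003809). This makes it possible to prepare RNA probes that reflect functional structural units actually present in RNA without splitting motif regions. The motif region may have any sequence length as long as its function is maintained, for example 1000 bases or less, 900 bases or less, 800 bases or less, 700 bases or less, 600 bases or less, 500 bases or less, 400 bases or less, 300 bases or less, 200 bases or less, 150 bases or less, 100 bases or less, or 50 bases or less.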
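The RNA modification reaction in this step (S20) brings the RNA probe prepared in the preceding step (S10) into contact with a desired RNA modifier to cause a modification reaction of the RNA probe. In one embodiment, the RNA modifier is a compound that selectively modifies unconstrained nucleotides, such as single-stranded regions, in the RNA probe. Such compounds typically include, but are not limited to, isatoic anhydride derivatives known as SHAPE reagents that react with the ribose 2'-hydroxyl group, for example 1-methyl-7-nitroisatoic anhydride (1M7), 1-methyl-6-nitroisatoic anhydride (1M6), NMIA (N-methylisatoic anhydride), and 2-methylnicotinic acid imidazolide (NAI). In addition to SHAPE reagents, dimethyl sulfate (DMS) can be used as an RNA modifier because it forms adducts at the N1 position of adenosine, the N3 position of cytosine, the N3 position of uridine, and the N1 position of guanosine. As an example, NAI generally reacts with all four nucleotides, whereas DMS reacts only with adenine and cytosine. On the other hand, DMS can also react with guanine and uridine at a pH shifted toward basic (for example, pH 8.0).
This step detects the positions and frequencies of modified bases in the sequences of the RNA probes obtained in the modification step (S20) above. Any method that reads modified bases in an RNA sequence may be used without particular limitation; for example, it may be a pull-down method using an antibody specific for the modified base, or a nanopore sequencing method that reads the electrical signal of RNA directly. Direct RNA nanopore sequencing is a technique for detecting modification sites in RNA at the single-molecule level. In the direct RNA sequencing platform currently developed and marketed by Oxford Nanopore Technologies, RNA bound to a motor protein moves through a biological nanopore suspended in a membrane. When the RNA passes through the pore under a voltage bias, picoampere-scale changes in ionic current are observed that depend on the chemical identity (that is, the sequence) of the short stretch (5 nucleotides) passing through the pore constriction (see Garalde, D. R., et al. (2018) Highly parallel direct RNA sequencing on an array of nanopores. Nat. Methods, and Workman, R. E., et al. (2019) Nanopore native RNA sequencing of a human poly(A) transcriptome. Nat. Methods, 16, 1297-1305). It has been reported that nucleotides modified with 1-acetylimidazole (AcIm), one of the SHAPE reagents, can be detected by this method (William Stephenson et al., Direct detection of RNA modifications and structure using single molecule nanopore sequencing. bioRxiv doi: https://doi.org/10.1101/2020.05.31.126763, posted June 01, 2020).
The positions and frequencies of the mutations detected in the above step can be illustrated by methods known to those skilled in the art, such as mutation histograms, sequencing depth, and reactivity profiles. Alignment software such as BWA or STAR can be used to analyze mutation positions and frequencies. These data are quantified and vectorized as mutation counts, on which various operations can be performed. In addition, mutations showing statistically significant reactivity can be annotated.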
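The structured barcode disclosed in this embodiment has several advantageous effects. First, the barcode sequence is unlikely to be modified in the reaction with the RNA modifier, so it can be correctly identified as a barcode. In addition, interaction of the barcode portion with the RNA to be analyzed or with other RNA molecules is suppressed. As a result, structured barcode sequences can not only distinguish similar sequences within a library but also distinguish different batches of the same library. For example, FIG. 4 shows a method of preparing a group of libraries using 37 first barcode sequences and 4 second barcode sequences. By amplifying the initially prepared library of 37 DNAs with 4 different primers, a second barcode sequence is added that is identical within one library but differs between libraries of different batches. By performing in vitro transcription reactions with these, a group of RNA libraries carrying two kinds of barcode sequences can be prepared.
As another embodiment of the present invention, an RNA probe containing a structured barcode sequence and an RNA probe library containing a plurality of such RNA probes are provided. In one embodiment, a structured barcode sequence is a barcode sequence that forms a structure containing multiple base pairs. The barcode sequence of this embodiment includes, for example, a complementary double-stranded structure, a triplex structure, or a quadruplex structure; specific examples include stem-loop structures and pseudoknot structures. The stem portion forms a complementary duplex, but to increase sequence diversity it may contain G-U, I-U, I-A, and I-C wobble base pairs, which have thermodynamic stability comparable to that of Watson-Crick base pairs. I represents inosine, whose base, hypoxanthine, can pair with uracil, adenine, and cytosine. Uracil can pair with two bases, guanine and adenine.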
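Materials and Methods
(Design of barcode sequences)
The barcode sequences in this example used stems and loops of different lengths. Stems 6, 7, or 8 base pairs (bp) long, containing canonical base pairs and GU wobble base pairs, were generated at random. For each stem length, loops of three different lengths were used. For each barcode, one of four tetraloops (UUCG, GAGA, GCUU, GUAA) or one of the 3- or 5-base sequences (UCG, AGA, CUU, UAA, UUACG, GAAGA, GCUAU, AGUAA) was selected. The ViennaRNA package was used to check that the barcodes fold correctly. As controls, unstructured barcodes 10, 15, and 21 bases long were generated.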
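The design rule above (a random stem of canonical and GU wobble pairs closed by one of the listed loops) can be sketched as follows. This is an illustrative sketch only: the uniform choice among pair types and the function name are assumptions, and the folding check with the ViennaRNA package described in the text is not reproduced here.

```python
import random

CANONICAL_PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
WOBBLE_PAIRS = [("G", "U"), ("U", "G")]
# Loop sequences listed in the text, grouped by length.
LOOPS = {
    3: ["UCG", "AGA", "CUU", "UAA"],
    4: ["UUCG", "GAGA", "GCUU", "GUAA"],
    5: ["UUACG", "GAAGA", "GCUAU", "AGUAA"],
}


def random_hairpin_barcode(stem_bp, loop_len, rng):
    """Build a stem-loop barcode: 5' stem strand + loop + 3' stem strand."""
    # Uniform sampling over pair types is an assumption, not the source's procedure.
    pairs = [rng.choice(CANONICAL_PAIRS + WOBBLE_PAIRS) for _ in range(stem_bp)]
    five_prime = "".join(p[0] for p in pairs)
    three_prime = "".join(p[1] for p in reversed(pairs))
    return five_prime + rng.choice(LOOPS[loop_len]) + three_prime


rng = random.Random(0)
print(random_hairpin_barcode(stem_bp=7, loop_len=4, rng=rng))
```

A 7-bp stem with a 4-base loop gives an 18-base barcode. In practice each candidate would still need to be checked with a folding tool before use, as the text describes.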
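To demonstrate the usefulness of the structured barcodes, the following sequence was used as the target RNA:
5'-GUGUAUGAUGAAACUACAUUAAGUUAACUCGUGCAC-3' (SEQ ID NO: 1). From this sequence, 12 positions that do not form base pairs were selected, and at each position point mutants were made by changing the base to each of the other three bases, giving 36 point mutants. This gave 37 sequences in total. Any pair of these 37 sequences differs by only 1 or 2 bases.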
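The count of 37 sequences and the claim that any two differ by 1 or 2 bases can be checked with a short script. This is a sketch under one stated assumption: the source does not list which 12 unpaired positions were chosen, so `HYPOTHETICAL_POSITIONS` is a placeholder used only to illustrate the arithmetic.

```python
from itertools import combinations

TARGET = "GUGUAUGAUGAAACUACAUUAAGUUAACUCGUGCAC"  # SEQ ID NO: 1

# Placeholder 0-based positions; the actual 12 unpaired positions are not given in the text.
HYPOTHETICAL_POSITIONS = range(10, 22)


def point_mutants(seq, positions):
    """Return every single-base substitution of seq at the given positions."""
    mutants = []
    for pos in positions:
        for base in "ACGU":
            if base != seq[pos]:
                mutants.append(seq[:pos] + base + seq[pos + 1:])
    return mutants


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


library = [TARGET] + point_mutants(TARGET, HYPOTHETICAL_POSITIONS)
pair_distances = {hamming(a, b) for a, b in combinations(library, 2)}
print(len(library), sorted(pair_distances))
```

Two mutants at the same position differ by one base and two mutants at different positions differ by two, which is why no pair is further apart than 2 regardless of which positions are chosen.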
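An overview of the barcode sequences and library structure used for the first library is shown in FIG. 3. FIG. 3(a) shows the barcode sequence of one RNA probe (ID1), which consists of a 7-bp stem and a 4-nucleotide loop. The first library sequences have the following four parts in the 5' to 3' direction:
i) a 5' cassette required for generating the RNA library by in vitro transcription (IVT) and for preparing the sequencing library (broken line on the 5' side in FIG. 3(b));
ii) a barcode sequence that differs for each individual sequence (IDs 1 to 28, containing structured barcodes, and IDs 29 to 37, containing unstructured barcodes, in FIG. 3(b));
iii) the target RNA sequence flanked on both sides by 2-base spacers (solid line in FIG. 3(b); point mutations in the sequence are indicated by triangles);
iv) a 3' cassette required for generating the RNA library by in vitro transcription (IVT), for reverse transcription, and for preparing the sequencing library (broken line on the 3' side in FIG. 3(b)).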
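An overview of the barcode sequences and library structure used for the second library is shown in FIG. 4. The RNA of this design contains two barcodes, an in-library barcode (first barcode) and a batch barcode (second barcode). It can be divided into the following six parts in the 5' to 3' direction:
i) the same 5' cassette used in the first library design;
ii) the same barcode sequence used in the first library design;
iii) the target RNA sequence flanked on both sides by 2-base spacers;
iv) a 12-base linker sequence that enhances primer binding;
v) one of four batch barcodes, which has the same sequence for all target RNAs within one batch;
vi) the same 3' cassette used in the first library design.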
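The 5' to 3' layout listed above can be written as a simple concatenation. In this sketch every sequence other than the target RNA (SEQ ID NO: 1) is a hypothetical placeholder, because the cassette, spacer, linker, and batch barcode sequences are not given in this section; the function name is also hypothetical.

```python
def assemble_second_library_probe(five_cassette, barcode, target, linker,
                                  batch_barcode, three_cassette, spacer="AA"):
    """Concatenate the six parts of a second-library probe in 5' to 3' order."""
    parts = [
        five_cassette,             # i) 5' cassette
        barcode,                   # ii) in-library (first) barcode
        spacer + target + spacer,  # iii) target RNA flanked by 2-base spacers
        linker,                    # iv) 12-base linker
        batch_barcode,             # v) batch (second) barcode
        three_cassette,            # vi) 3' cassette
    ]
    return "".join(parts)


# All values below except the target are placeholders for illustration only.
probe = assemble_second_library_probe(
    five_cassette="GGGAACGACUCGAGUAGAGUCG",   # hypothetical
    barcode="GGAUCCAUUCGUGGAUCC",             # hypothetical 7-bp stem, 4-base loop
    target="GUGUAUGAUGAAACUACAUUAAGUUAACUCGUGCAC",
    linker="ACUACUACUACU",                    # hypothetical 12-base linker
    batch_barcode="GCGAUCUUCGGAUCGC",         # hypothetical
    three_cassette="AAAGAAACAACAACAACAAC",    # hypothetical
)
print(len(probe))
```

Keeping the batch barcode as a separate argument mirrors the way the four batches are created from one DNA library by changing only the reverse primer.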
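The libraries and primers described above were synthesized in DNA form by Integrated DNA Technologies, Inc. (IDT). As controls, two individual RNA probes (ID1 and ID32, respectively) carrying a structured or unstructured barcode sequence designed for the first library were synthesized.
First, the libraries were amplified by PCR using Platinum™ SuperFi™ PCR Master Mix (Thermo Fisher Scientific). For the first library and for the two individual single-stranded RNAs from this library, a forward primer having the 5' cassette sequence downstream of the T7 RNA polymerase promoter sequence (IVT recognition site: 5'-TAATACGACTCACTATAG-3' (SEQ ID NO: 6)) and a reverse primer having a sequence complementary to the 3' cassette sequence were used. As reverse primers for preparing the second library, Pr_d2a (SEQ ID NO: 2), Pr_d2b (SEQ ID NO: 3), Pr_d2c (SEQ ID NO: 4), and Pr_d2d (SEQ ID NO: 5) were used to create four different batches and add the barcodes. In all reactions, each primer was added to a final concentration of 500 nM, and the template was provided at a total concentration of 0.4 nM. The reaction volume was 25 μL. All PCRs were performed on a ProFlex™ PCR System (Thermo Fisher Scientific).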
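Two different chemical modifiers were used for RNA modification: the methylating agent dimethyl sulfate (DMS), purchased from Sigma-Aldrich, and the SHAPE reagent 2-methylnicotinic acid imidazolide (NAI). The same RNA preparation was used in the experiments with both modifiers. 250 ng of RNA (single strand or pool) dissolved in 6 μL of water was incubated at 95 °C for 2 minutes and snap-cooled on ice for at least 2 minutes. Next, 3 μL of 3.3× folding buffer was added and the sample was incubated at 37 °C for 20 minutes (1× folding buffer consists of 100 mM HEPES (pH 8.0), 100 mM NaCl, and 10 mM MgCl2).
1 μL of 1000 mM NAI solution was added to an empty 0.2 mL PCR tube. The tube was kept on ice until just before the RNA was added. At 37 °C, 9 μL of the RNA-containing sample was added to the NAI, and the solution was mixed by pipetting up and down. The sample was left at 37 °C for 10 minutes.
At 37 °C, 1 μL of 50% DMS in ethanol was added to 9 μL of the previously prepared RNA-containing sample. The sample was left at 37 °C for 6 minutes. The reaction was stopped with 5 μL of β-mercaptoethanol, mixed thoroughly, and incubated at 37 °C for 2 minutes. The RNA was then purified with the RNA Clean and Concentrator-5 kit from Zymo Research, with a final elution volume of 15 μL. For each DMS-modified RNA sample, a control sample was prepared by treating it in the same way with 1 μL of 50% aqueous ethanol instead of DMS.
The modified RNA samples were reverse transcribed using a reverse primer with a sequence complementary to the 3' cassette sequence. For NAI-modified RNA, SuperScript™ II reverse transcriptase (Thermo Fisher Scientific) was used in the presence of manganese. For DMS-modified RNA, TGIRT™-III enzyme (InGex) was used. In both cases, 1 μL of 2 μM reverse primer was mixed with 2 μL of 10 mM dNTPs (New England Biolabs) and 7 μL of the previously modified RNA. The samples were annealed (85 °C, 1 min → 65 °C, 10 min → hold at 4 °C) on a ProFlex™ PCR System (Thermo Fisher Scientific), which was also used for the reverse transcription step. Next, 9 μL of 2.22× MaP buffer was added and the mixture was incubated for 2 minutes at room temperature; 1 μL of enzyme was then added, and the sample was placed in the cycler and reverse transcribed (see Table 2).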
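The stock concentrations quoted in the folding, NAI modification, and reverse transcription steps are chosen so that the buffers end up at about 1× in the final volume. The following sketch works through that dilution arithmetic using only the volumes stated in the text; the helper name `final_concentration` is hypothetical.

```python
def final_concentration(stock, stock_volume_ul, total_volume_ul):
    """Concentration after diluting a stock into the total reaction volume."""
    return stock * stock_volume_ul / total_volume_ul


# Folding: 6 uL RNA in water + 3 uL of 3.3x folding buffer.
folding_volume = 6 + 3
buffer_after_folding = final_concentration(3.3, 3, folding_volume)

# NAI modification: 9 uL folded RNA + 1 uL of 1000 mM NAI.
nai_volume = 9 + 1
nai_final_mM = final_concentration(1000, 1, nai_volume)
buffer_during_modification = final_concentration(3.3, 3, nai_volume)

# Reverse transcription: primer + dNTPs + modified RNA + MaP buffer + enzyme.
rt_volume = 1 + 2 + 7 + 9 + 1
map_buffer_final = final_concentration(2.22, 9, rt_volume)
primer_final_uM = final_concentration(2, 1, rt_volume)
dntp_final_mM = final_concentration(10, 2, rt_volume)

print(buffer_after_folding, nai_final_mM, buffer_during_modification)
print(rt_volume, map_buffer_final, primer_final_uM, dntp_final_mM)
```

The folding buffer is about 1.1× after folding and about 1× once the modifier is added, NAI is at 100 mM during modification, and the 20 μL reverse transcription reaction contains about 1× MaP buffer, 0.1 μM primer, and 1 mM dNTPs.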
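Two PCRs, an amplicon PCR and an index PCR, were performed for library preparation. For the amplicon PCR, 1 ng of reverse transcription product was used in a reaction volume of 25 μL. The other reaction components were 1× Platinum™ SuperFi™ PCR Master Mix and 1× SuperFi GC Enhancer (both from Thermo Fisher Scientific) and 500 nM forward and reverse primers. The samples were transferred to the ProFlex™ PCR System. After initial heating to 98 °C for 30 seconds, 3-step PCR was performed at 98 °C for 10 seconds, 64 °C for 10 seconds, and 72 °C for 20 seconds. After the last cycle, the temperature was held at 72 °C for 5 minutes and then lowered to 4 °C. For purification, the DNA cleanup and concentration protocol of the Monarch® PCR & DNA Cleanup Kit (5 μg) (New England Biolabs Inc.) was used. 8 μL of DNA elution buffer was used for the final elution. The samples were then ready for indexing for next-generation sequencing.
For sequencing, the NextSeq 500/550 Mid Output Kit v2.5 (Illumina, 150 cycles) was used with paired-end reads and standard read primers.
Adapters were first trimmed from the FASTQ files, and the reads in the resulting FASTQ files were then mapped with alignment software to a file containing the reference sequences (the reference file). In this analysis, mapping was performed with the STAR aligner. Mutations, deletions, and insertions were counted for further analysis.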
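(Barcodes for distinguishing sequences within an RNA library)
To test whether barcodes help distinguish similar sequences in a mutation profiling experiment, the library of the first design was used. The similarity of two sequences was measured with the Levenshtein distance, a measure of string similarity. This distance is the minimum number of insertions, deletions, and substitutions needed to convert one sequence into another. Without barcodes, this number is 1 or 2 for any pair of sequences in the library. With barcodes added, the Levenshtein distance is 7 or more. Sequences can therefore be identified correctly even with the increase in mutation rate expected in a mutation profiling experiment. In addition to the complete library, two single sequences from the library (ID1 and ID32) were used as controls. ID1 contains a structured barcode, whereas ID32 contains an unstructured barcode (see FIG. 5).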
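The Levenshtein distance used above can be computed with the standard dynamic-programming recurrence. This is a generic textbook implementation offered as a sketch, not the code used by the authors; applying it to every pair of barcoded sequences would reproduce the kind of minimum-distance check described in the text.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion from a
                current[j - 1] + 1,                    # insertion into a
                previous[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous = current
    return previous[-1]


wild_type = "GUGUAUGAUGAAACUACAUUAAGUUAACUCGUGCAC"  # SEQ ID NO: 1
mutant = wild_type[:10] + "C" + wild_type[11:]      # illustrative single substitution
print(levenshtein(wild_type, mutant))
```

Prepending a distinct barcode to each sequence raises the pairwise distance by the distance between the barcodes, which is how the library reaches the stated minimum of 7.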
Delta mutation rate = modified mutation rate − unmodified mutation rate (1)
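Equation (1) is applied per nucleotide position. A minimal sketch follows, assuming the mutation rate at a position is the number of mutated reads divided by the read depth at that position; that definition, the function names, and the example numbers are assumptions for illustration and are not taken from the source.

```python
def mutation_rate(mutation_counts, depths):
    """Per-position mutation rate; positions without coverage are reported as 0.0."""
    return [m / d if d else 0.0 for m, d in zip(mutation_counts, depths)]


def delta_mutation_rate(mod_counts, mod_depths, ctl_counts, ctl_depths):
    """Equation (1): modified-sample rate minus unmodified-control rate, per position."""
    modified = mutation_rate(mod_counts, mod_depths)
    control = mutation_rate(ctl_counts, ctl_depths)
    return [m - c for m, c in zip(modified, control)]


# Hypothetical counts for three positions.
delta = delta_mutation_rate(
    mod_counts=[50, 10, 0], mod_depths=[1000, 1000, 1000],
    ctl_counts=[5, 10, 2], ctl_depths=[1000, 1000, 1000],
)
print([abs(d) for d in delta])  # absolute values, as plotted in the figures
```

For DMS-treated samples, the same calculation would be restricted to positions whose reference base is A or C.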
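The second library was used to test whether barcodes help distinguish different versions of an RNA library within a common pool of all versions. To this end, before in vitro transcription, primers Pr_d2a, Pr_d2b, Pr_d2c, and Pr_d2d were used to add batch barcodes (second barcodes) to the second library, dividing it into four different versions. As shown in FIG. 6, the four different versions of the RNA library were either modified with NAI or DMS or handled as the respective controls. After the purification step, a pooled sample was created for each treatment condition by mixing equal amounts of the four versions of the library. The four different versions of the library and the pooled samples were each processed in the same way in the subsequent steps.
FIG. 8 shows the mutation profile of only a single ID. The mutation profiles of all IDs were then analyzed and compared with the secondary structures predicted by the ViennaRNA package. FIG. 9 is a violin plot in which, for the second library chemically modified with NAI or DMS either individually or as a pool, the absolute values of the delta mutation rates are plotted separately for regions predicted to form base pairs (black regions in FIG. 9) and regions predicted to be unpaired (gray regions in FIG. 9). FIG. 9(a) shows the samples treated with NAI and FIG. 9(b) the samples treated with DMS; of the IDs shown on each x-axis, IDs 1 to 28 contain structured barcode sequences and IDs 29 to 37 contain unstructured barcode sequences. The results also show that the distributions of the four individual samples (left side of each "violin" in FIG. 9) and the pooled samples (right side of each "violin" in FIG. 9) are very similar. For DMS, only positions with the bases A and C are considered.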
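For a multiplexed library (RNA probe library) containing 54 RNA structures in total, 96 structured batch barcodes were prepared. For mapping, a different barcode was then assigned to each of the 54 RNA structures in the library, and reference files covering the 96 × 54 combinations were created. Of these, RNA probe libraries carrying two batch barcodes with different IDs were synthesized in vitro, and a mutation profiling experiment with DMS was performed. For the verification experiment, a corresponding index was assigned to each structured batch barcode, and next-generation sequencing analysis was performed. All the reads obtained were then mapped to the reference file. In this analysis, mapping was performed with the STAR aligner. The results are shown in FIGS. 10 and 11.
After the mutation profiling reaction with RNA has been completed and the RNA has been converted to DNA, combining the sample with commercially available index primers (e.g., Nextera XT Index Kit <Illumina>) increases the complexity of sample origins and conditions that can be handled. In FIG. 14, the vertical axis shows the index primers based on Illumina's sequences (which function as barcodes), and the horizontal axis shows the ID determined when the sample of structured RNA ID7 prepared in Example 2 was mapped. The color scale shows the average number of reads.
For a multiplexed library (RNA probe library) containing 1500 different RNA probes in total, 32 structured batch barcodes were prepared. For mapping, different batch barcodes were then assigned to all 1500 RNAs, a reference file of 32 × 1500 (48,000) sequences was created, and the RNA probe libraries were synthesized in vitro. Next, profile analysis was performed using the group of RNA probe libraries carrying structured batch barcodes. For the verification experiment, an index (Index ID) was assigned to each of the 32 different structured batch barcodes using 32 different index primers, and sequencing analysis was performed on a next-generation sequencer (MiSeq <Illumina>). The reads were then sorted into 32 files by index. If the barcodes function correctly, the file corresponding to index ID1 contains the RNA probe library carrying structured batch barcode ID1. All the reads obtained were then mapped to the reference file. In this analysis, mapping was performed with the STAR aligner.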
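The reference file described above contains one entry for every combination of RNA probe and batch barcode. The following sketch builds such a set as FASTA text, assuming the batch barcode is simply appended 3' of the probe sequence; the naming scheme, the omission of cassettes and linkers, and the placeholder sequences are all assumptions for illustration.

```python
from itertools import product


def build_reference(probes, batch_barcodes):
    """Return {entry_name: sequence} for every probe x batch-barcode combination."""
    reference = {}
    for (probe_id, probe_seq), (batch_id, batch_seq) in product(
        probes.items(), batch_barcodes.items()
    ):
        # Cassettes and linker are omitted here; only the combinatorics is shown.
        reference[f"batch{batch_id}_probe{probe_id}"] = probe_seq + batch_seq
    return reference


def to_fasta(reference):
    """Serialize the reference entries as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in reference.items())


# Placeholder sequences; the real library has 1500 probes and 32 batch barcodes.
probes = {1: "GUGUAUGAUG", 2: "GUGUAUCAUG", 3: "GUGUAUAAUG"}
batch_barcodes = {1: "GCGAUCUUCGGAUCGC", 2: "CCAUGGUUCGCCAUGG"}
reference = build_reference(probes, batch_barcodes)
print(len(reference))
print(to_fasta(reference).splitlines()[0])
```

With 1500 probes and 32 batch barcodes the same function yields 48,000 entries, and the batch ID of the best-matching entry for each read is what gets compared with the index ID in the verification experiment.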
[1] Komatsu, K. R., Taya, T., Matsumoto, S., Miyashita, E., Kashida, S., & Saito, H. (2020). RNA structure-wide discovery of functional interactions with multiplexed RNA motif library. Nature communications, 11(1), 1-14.
[2] Tapsin, S., Sun, M., Shen, Y., Zhang, H., Lim, X. N., Susanto, T. T., ... & Wan, Y. (2018). Genome-wide identification of natural RNA aptamers in prokaryotes and eukaryotes. Nature communications, 9(1), 1-10.
[3] Corley, M., Flynn, R. A., Lee, B., Blue, S. M., Chang, H. Y., & Yeo, G. W. (2020). Footprinting SHAPE-eCLIP Reveals Transcriptome-wide Hydrogen Bonds at RNA-Protein Interfaces. Molecular Cell, 80(5), 903-914.
Claims (14)
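- A method for analyzing the higher-order structure of RNA, comprising:
(a) a step of preparing one or more RNA probes in which a barcode sequence is added to RNA to be analyzed;
(b) a step of bringing the RNA probes into contact with an RNA modifier; and
(c) a step of detecting the positions and frequencies of modified bases in the sequences of the RNA probes obtained in step (b),
wherein the barcode sequence has a structure in which reaction with the RNA modifier is suppressed.
- The method according to claim 1, wherein step (c) comprises the following steps:
(c1) a step of synthesizing complementary DNA with reverse transcriptase using the mixture of RNA probes obtained in step (b) as a template;
(c2) a step of determining the base sequence of the complementary DNA and aligning the base sequences containing the barcode sequence; and
(c3) a step of detecting the positions and frequencies of mutations occurring in the aligned base sequences.
- The method according to claim 1 or 2, wherein, when the RNA modifier selectively modifies constrained nucleotides in the RNA probe, the barcode sequence is a sequence that does not form base pairs.
- The method according to claim 1 or 2, wherein, when the RNA modifier selectively modifies unconstrained nucleotides in the RNA probe, the barcode sequence forms a structure containing multiple base pairs.
- The method according to claim 4, wherein the structure containing multiple base pairs is a complementary double-stranded structure, a triplex structure, or a quadruplex structure.
- The method according to claim 4 or 5, wherein the multiple base pairs are present in the stem of a stem-loop structure or a pseudoknot structure.
- The method according to any one of claims 4 to 6, wherein the structure containing multiple base pairs is a stem-loop structure having one or more bulges and/or internal loop structures in the stem.
- The method according to any one of claims 4 to 7, wherein the structure containing multiple base pairs is an RNA structure registered in the PDB (Protein Data Bank) or a variant thereof.
- The method according to any one of claims 1 to 8, wherein the RNA to be analyzed contains at least one RNA motif.
- An RNA probe comprising RNA to be analyzed to which a barcode sequence that forms a structure containing multiple base pairs has been added.
- The RNA probe according to claim 10, wherein the structure containing multiple base pairs is a complementary double-stranded structure, a triplex structure, or a quadruplex structure.
- The RNA probe according to claim 10 or 11, wherein the multiple base pairs are present in the stem of a stem-loop structure or a pseudoknot structure.
- An RNA probe library comprising a plurality of RNA probes in which a barcode sequence that forms a structure containing multiple base pairs is added to each RNA to be analyzed.
- A group of RNA probe libraries consisting of two or more replicates of the RNA probe library according to claim 13, wherein every replicated RNA probe further contains a second barcode sequence, and the second barcode sequence is identical within one library but distinguishable between libraries.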
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21894688.7A EP4202056A4 (en) | 2020-11-18 | 2021-11-17 | RNA PROBE FOR MUTATION PROFILING AND ITS USE |
CA3200114A CA3200114C (en) | 2020-11-18 | 2021-11-17 | Rna probe for mutation profiling and use thereof |
CN202180064091.3A CN116234903B (zh) | 2020-11-18 | 2021-11-17 | 用于突变谱分析的rna探针及其用途 |
IL301876A IL301876B2 (en) | 2020-11-18 | 2021-11-17 | RNA testing for mutation profiling and its use |
JP2022530711A JP7141165B1 (ja) | 2020-11-18 | 2021-11-17 | 変異プロファイリングのためのrnaプローブ及びその使用 |
JP2022139711A JP2022177068A (ja) | 2020-11-18 | 2022-09-02 | 変異プロファイリングのためのrnaプローブ及びその使用 |
US18/296,375 US20240052339A1 (en) | 2020-11-18 | 2023-04-06 | Rna probe for mutation profiling and use thereof |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020-191550 | 2020-11-18 | ||
JP2020191550 | 2020-11-18 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/296,375 Continuation US20240052339A1 (en) | 2020-11-18 | 2023-04-06 | Rna probe for mutation profiling and use thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022107814A1 true WO2022107814A1 (ja) | 2022-05-27 |
Family
ID=81708923
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/042250 WO2022107814A1 (ja) | 2020-11-18 | 2021-11-17 | 変異プロファイリングのためのrnaプローブ及びその使用 |
Country Status (7)
Country | Link |
---|---|
US (1) | US20240052339A1 (ja) |
EP (1) | EP4202056A4 (ja) |
JP (2) | JP7141165B1 (ja) |
CN (1) | CN116234903B (ja) |
CA (1) | CA3200114C (ja) |
IL (1) | IL301876B2 (ja) |
WO (1) | WO2022107814A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0312220B2 (ja) | 1985-05-08 | 1991-02-19 | Honda Motor Co Ltd | |
WO2018003809A1 (ja) | 2016-06-27 | 2018-01-04 | 国立大学法人京都大学 | Rna構造ライブラリ |
JP6612220B2 (ja) * | 2013-10-07 | 2019-11-27 | ザ ユニバーシティ オブ ノース カロライナ アット チャペル ヒル | 核酸における化学修飾の検出 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101622363B (zh) * | 2007-07-13 | 2012-09-05 | 爱科来株式会社 | 用于检测jak2基因的突变的探针及其用途 |
CN101586150B (zh) * | 2008-05-23 | 2016-09-28 | 陕西佰美基因股份有限公司 | 检测探针、通用寡核苷酸芯片及核酸检测方法及其用途 |
US9175338B2 (en) * | 2008-12-11 | 2015-11-03 | Pacific Biosciences Of California, Inc. | Methods for identifying nucleic acid modifications |
WO2011140510A2 (en) * | 2010-05-06 | 2011-11-10 | Bioo Scientific Corporation | Oligonucleotide ligation, barcoding and methods and compositions for improving data quality and throughput using massively parallel sequencing |
ES2927412T3 (es) * | 2018-11-08 | 2022-11-04 | Siemens Healthcare Gmbh | Secuenciación directa de nanoporos de ARN con la ayuda de un polinucleótido de horquilla |
-
2021
- 2021-11-17 EP EP21894688.7A patent/EP4202056A4/en active Pending
- 2021-11-17 IL IL301876A patent/IL301876B2/en unknown
- 2021-11-17 CN CN202180064091.3A patent/CN116234903B/zh active Active
- 2021-11-17 JP JP2022530711A patent/JP7141165B1/ja active Active
- 2021-11-17 WO PCT/JP2021/042250 patent/WO2022107814A1/ja unknown
- 2021-11-17 CA CA3200114A patent/CA3200114C/en active Active
-
2022
- 2022-09-02 JP JP2022139711A patent/JP2022177068A/ja active Pending
-
2023
- 2023-04-06 US US18/296,375 patent/US20240052339A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0312220B2 (ja) | 1985-05-08 | 1991-02-19 | Honda Motor Co Ltd | |
JP6612220B2 (ja) * | 2013-10-07 | 2019-11-27 | ザ ユニバーシティ オブ ノース カロライナ アット チャペル ヒル | 核酸における化学修飾の検出 |
WO2018003809A1 (ja) | 2016-06-27 | 2018-01-04 | 国立大学法人京都大学 | Rna構造ライブラリ |
Non-Patent Citations (18)
Title |
---|
AW, J.G.A.LIM, S.W.WANG, J.X. ET AL.: "Determination of isoform-specific RNA structure with nanopore long reads", NAT BIOTECHNOL, 2020 |
BENSON, D. ET AL., NUCL. ACIDS RES., vol. 41, 2013, pages D36 - D42 |
BERKOWITZ, N. D. ET AL., BMC BIOINFORMATICS, vol. 17, 2016, pages 215 |
CORLEY, M.FLYNN, R. A.LEE, B.BLUE, S. M.CHANG, H. Y.YEO, G. W.: "Footprinting SHAPE-eCLIP Reveals Transcriptome-wide Hydrogen Bonds at RNA-Protein Interface", MOLECULAR CELL, vol. 80, no. 5, 2020, pages 903 - 914, XP086383987, DOI: 10.1016/j.molcel.2020.11.014 |
GARALDE, D. R. ET AL.: "Highly parallel direct RNA sequencing on an array of nanopores", NAT. METHODS, 2018 |
HAMADA, M. ET AL., BIOINFORMATICS, vol. 25, 2009, pages 465 - 473 |
KOMATSU, K. R.TAYA, T.MATSUMOTO, S.MIYASHITA, E.KASHIDA, S.SAITO, H.: "RNA structure-wide discovery of functional interactions with multiplexed RNA motif library", NATURE COMMUNICATIONS, vol. 11, no. 1, 2020, pages 1 - 14 |
KWOK CHUN KIT, TANG YIN, ASSMANN SARAH M., BEVILACQUA PHILIP C.: "The RNA structurome: transcriptome-wide structure probing with next-generation sequencing", TRENDS IN BIOCHEMICAL SCIENCES, vol. 40, no. 4, 1 April 2015 (2015-04-01), AMSTERDAM, NL , pages 221 - 232, XP055931788, ISSN: 0968-0004, DOI: 10.1016/j.tibs.2015.02.005 * |
MEGAN ZUBRADT ET AL.: "DMS-Mapseq for genome-wide or targeted RNA structure probing in vivo", NAT METHODS, vol. 14, 2017, pages 75 - 82, XP055931783, DOI: 10.1038/nmeth.4057 |
MOKREJS, M. ET AL., NUCL. ACIDS RES., vol. 38, 2010, pages D131 - D136 |
NAWROCKI, E. P. ET AL., NUCL. ACIDS RES., vol. 43, 2015, pages D130 - D137 |
SATO, K. ET AL., METHODS BIOCHEM. ANAL., vol. 27, 2011, pages i85 - i93 |
See also references of EP4202056A4 |
STROBEL ERIC J; WATTERS KYLE E; LOUGHREY DAVID; LUCKS JULIUS B: "RNA systems biology: uniting functional discoveries and structural tools to understand global roles of RNAs", CURRENT OPINION IN BIOTECHNOLOGY, vol. 39, 30 April 2016 (2016-04-30), GB , pages 182 - 191, XP029569342, ISSN: 0958-1669, DOI: 10.1016/j.copbio.2016.03.019 * |
TAPSIN, S.SUN, M.SHEN, Y.ZHANG, H.LIM, X. N.SUSANTO, T. TWAN. Y.: "Genome-wide identification of natural RNA aptamers in prokaryotes and eukaryotes", NATURE COMMUNICATIONS, vol. 9, no. 1, 2018, pages 1 - 10 |
WILLIAM STEPHENSON ET AL.: "Direct detection of RNA modifications and structure using single molecule nanopore", BIORXIV DOI: HTTPS://DOI.ORG/10.1101/2020.05.31.126763, 1 June 2020 (2020-06-01) |
WORKMAN, R.E. ET AL.: "Nanopore native RNA sequencing of a human poly(A) transcriptome", NAT. METHODS, vol. 16, 2019, pages 1297 - 1305, XP036953641, DOI: 10.1038/s41592-019-0617-2 |
ZUBRADT MEGHAN, GUPTA PAROMITA, PERSAD SITARA, LAMBOWITZ ALAN M, WEISSMAN JONATHAN S, ROUSKIN SILVI: "DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo", NATURE METHODS, vol. 14, no. 1, 1 January 2017 (2017-01-01), New York, pages 75 - 82, XP055931783, ISSN: 1548-7091, DOI: 10.1038/nmeth.4057 * |
Also Published As
Publication number | Publication date |
---|---|
JP2022177068A (ja) | 2022-11-30 |
CA3200114C (en) | 2024-06-04 |
IL301876B2 (en) | 2024-05-01 |
US20240052339A1 (en) | 2024-02-15 |
JP7141165B1 (ja) | 2022-09-22 |
EP4202056A1 (en) | 2023-06-28 |
IL301876A (en) | 2023-06-01 |
CN116234903A (zh) | 2023-06-06 |
CA3200114A1 (en) | 2022-05-27 |
CN116234903B (zh) | 2024-06-11 |
JPWO2022107814A1 (ja) | 2022-05-27 |
IL301876B1 (en) | 2024-01-01 |
EP4202056A4 (en) | 2024-05-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11676682B1 (en) | Methods for accurate sequence data and modified base position determination | |
RU2698125C2 (ru) | Библиотеки для секвенирования нового поколения | |
CN109154013B (zh) | 转座酶和y衔接子用于片段化和标签化dna的用途 | |
CN102648295B (zh) | 用于多重基因分型的多样品索引 | |
CN109844137B (zh) | 用于鉴定嵌合产物的条形码化环状文库构建 | |
JP2017508471A (ja) | 次世代シークエンシングにおける稀な遺伝子変異の正確な検出 | |
US10385476B2 (en) | Methods and compositions for the selection and optimization of oligonucleotide tag sequences | |
US20220364169A1 (en) | Sequencing method for genomic rearrangement detection | |
CN108138175A (zh) | 用于分子条形码编码的试剂、试剂盒和方法 | |
JP2022160425A (ja) | 次世代配列決定法を用いた標的タンパク質の集団的定量方法とその用途 | |
WO2022107814A1 (ja) | 変異プロファイリングのためのrnaプローブ及びその使用 | |
TWI771847B (zh) | 擴增和確定目標核苷酸序列的方法 | |
WO2023201487A1 (zh) | 接头、接头连接试剂及试剂盒和文库构建方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2022530711 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21894688 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2021894688 Country of ref document: EP Effective date: 20230322 |
|
ENP | Entry into the national phase |
Ref document number: 3200114 Country of ref document: CA |
|
NENP | Non-entry into the national phase |
Ref country code: DE |