WO2022186673A1 - Next-generation-sequencing-based RNA sequencing panel for targeted genes, and analysis algorithm
- Publication number
- WO2022186673A1 (PCT/KR2022/003196)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- fusion
- gene
- leukemia
- data
- probe
Links
- 108090000623 proteins and genes Proteins 0.000 title claims abstract description 145
- 238000007481 next generation sequencing Methods 0.000 title claims description 56
- 238000004458 analytical method Methods 0.000 title claims description 20
- 238000003559 RNA-seq method Methods 0.000 title abstract description 49
- 230000004927 fusion Effects 0.000 claims abstract description 195
- 238000000034 method Methods 0.000 claims abstract description 39
- 238000001514 detection method Methods 0.000 claims abstract description 26
- 239000000523 sample Substances 0.000 claims description 68
- 208000032839 leukemia Diseases 0.000 claims description 61
- 230000035772 mutation Effects 0.000 claims description 41
- 230000014509 gene expression Effects 0.000 claims description 39
- 238000003745 diagnosis Methods 0.000 claims description 37
- 238000012163 sequencing technique Methods 0.000 claims description 35
- 102100037799 DNA-binding protein Ikaros Human genes 0.000 claims description 32
- 101000599038 Homo sapiens DNA-binding protein Ikaros Proteins 0.000 claims description 32
- 102100027842 Fibroblast growth factor receptor 3 Human genes 0.000 claims description 30
- 101710182396 Fibroblast growth factor receptor 3 Proteins 0.000 claims description 30
- 101000912957 Homo sapiens Protein DEK Proteins 0.000 claims description 30
- 102100026113 Protein DEK Human genes 0.000 claims description 30
- 108010060313 Core Binding Factor beta Subunit Proteins 0.000 claims description 27
- 102000008147 Core Binding Factor beta Subunit Human genes 0.000 claims description 27
- 102100021631 B-cell lymphoma 6 protein Human genes 0.000 claims description 26
- 102100026008 Breakpoint cluster region protein Human genes 0.000 claims description 26
- 101000971234 Homo sapiens B-cell lymphoma 6 protein Proteins 0.000 claims description 26
- 101000933320 Homo sapiens Breakpoint cluster region protein Proteins 0.000 claims description 26
- 101000823316 Homo sapiens Tyrosine-protein kinase ABL1 Proteins 0.000 claims description 25
- 102100022596 Tyrosine-protein kinase ABL1 Human genes 0.000 claims description 25
- 102100021569 Apoptosis regulator Bcl-2 Human genes 0.000 claims description 24
- 108091012583 BCL2 Proteins 0.000 claims description 24
- -1 EPOR Proteins 0.000 claims description 23
- 102100024379 AF4/FMR2 family member 1 Human genes 0.000 claims description 22
- 102100021975 CREB-binding protein Human genes 0.000 claims description 22
- 102100033992 Dual specificity protein phosphatase 22 Human genes 0.000 claims description 22
- 102100022103 Histone-lysine N-methyltransferase 2A Human genes 0.000 claims description 22
- 101000833180 Homo sapiens AF4/FMR2 family member 1 Proteins 0.000 claims description 22
- 101000896987 Homo sapiens CREB-binding protein Proteins 0.000 claims description 22
- 101001017467 Homo sapiens Dual specificity protein phosphatase 22 Proteins 0.000 claims description 22
- 101001045846 Homo sapiens Histone-lysine N-methyltransferase 2A Proteins 0.000 claims description 22
- 101000581507 Homo sapiens Methyl-CpG-binding domain protein 1 Proteins 0.000 claims description 22
- 101001134861 Homo sapiens Pericentriolar material 1 protein Proteins 0.000 claims description 22
- 101000813738 Homo sapiens Transcription factor ETV6 Proteins 0.000 claims description 22
- 102100027383 Methyl-CpG-binding domain protein 1 Human genes 0.000 claims description 22
- 101000956427 Homo sapiens Cytokine receptor-like factor 2 Proteins 0.000 claims description 21
- 101001129654 Homo sapiens Prohibitin-2 Proteins 0.000 claims description 21
- 101000997832 Homo sapiens Tyrosine-protein kinase JAK2 Proteins 0.000 claims description 21
- 102100031156 Prohibitin-2 Human genes 0.000 claims description 21
- 102100039580 Transcription factor ETV6 Human genes 0.000 claims description 21
- 102100033444 Tyrosine-protein kinase JAK2 Human genes 0.000 claims description 21
- 102100038497 Cytokine receptor-like factor 2 Human genes 0.000 claims description 20
- 108010051742 Platelet-Derived Growth Factor beta Receptor Proteins 0.000 claims description 20
- 102100026547 Platelet-derived growth factor receptor beta Human genes 0.000 claims description 20
- 108010018650 MEF2 Transcription Factors Proteins 0.000 claims description 19
- 102100039212 Myocyte-specific enhancer factor 2D Human genes 0.000 claims description 19
- 102100032481 B-cell CLL/lymphoma 9 protein Human genes 0.000 claims description 18
- 102000015367 CRBN Human genes 0.000 claims description 18
- 101000798495 Homo sapiens B-cell CLL/lymphoma 9 protein Proteins 0.000 claims description 18
- 101001030211 Homo sapiens Myc proto-oncogene protein Proteins 0.000 claims description 18
- 101000601724 Homo sapiens Paired box protein Pax-5 Proteins 0.000 claims description 18
- 101001129610 Homo sapiens Prohibitin 1 Proteins 0.000 claims description 18
- 101000959489 Homo sapiens Protein AF-9 Proteins 0.000 claims description 18
- 101000941994 Homo sapiens Protein cereblon Proteins 0.000 claims description 18
- 101000962461 Homo sapiens Transcription factor Maf Proteins 0.000 claims description 18
- 102100038895 Myc proto-oncogene protein Human genes 0.000 claims description 18
- 102100029166 NT-3 growth factor receptor Human genes 0.000 claims description 18
- 102100037504 Paired box protein Pax-5 Human genes 0.000 claims description 18
- 102100031169 Prohibitin 1 Human genes 0.000 claims description 18
- 102100039686 Protein AF-9 Human genes 0.000 claims description 18
- 101000613608 Rattus norvegicus Monocyte to macrophage differentiation factor Proteins 0.000 claims description 18
- 102100039189 Transcription factor Maf Human genes 0.000 claims description 18
- 101100395211 Trichoderma harzianum his3 gene Proteins 0.000 claims description 18
- 102100027881 Tumor protein 63 Human genes 0.000 claims description 18
- 101710140697 Tumor protein 63 Proteins 0.000 claims description 18
- 108010064892 trkC Receptor Proteins 0.000 claims description 18
- ZEOWTGPWHLSLOG-UHFFFAOYSA-N Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F Chemical compound Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F ZEOWTGPWHLSLOG-UHFFFAOYSA-N 0.000 claims description 17
- 102100023593 Fibroblast growth factor receptor 1 Human genes 0.000 claims description 17
- 101710182386 Fibroblast growth factor receptor 1 Proteins 0.000 claims description 17
- 102100031785 Endothelial transcription factor GATA-2 Human genes 0.000 claims description 16
- 101001066265 Homo sapiens Endothelial transcription factor GATA-2 Proteins 0.000 claims description 16
- 101000823271 Homo sapiens Tyrosine-protein kinase ABL2 Proteins 0.000 claims description 16
- 102100022651 Tyrosine-protein kinase ABL2 Human genes 0.000 claims description 16
- 108010058546 Cyclin D1 Proteins 0.000 claims description 15
- 102100024165 G1/S-specific cyclin-D1 Human genes 0.000 claims description 15
- 101000916644 Homo sapiens Macrophage colony-stimulating factor 1 receptor Proteins 0.000 claims description 15
- 101000846284 Homo sapiens Pre-mRNA 3'-end-processing factor FIP1 Proteins 0.000 claims description 15
- 102100028198 Macrophage colony-stimulating factor 1 receptor Human genes 0.000 claims description 15
- 102100031755 Pre-mRNA 3'-end-processing factor FIP1 Human genes 0.000 claims description 15
- AQQSXKSWTNWXKR-UHFFFAOYSA-N 2-(2-phenylphenanthro[9,10-d]imidazol-3-yl)acetic acid Chemical compound C1(=CC=CC=C1)C1=NC2=C(N1CC(=O)O)C1=CC=CC=C1C=1C=CC=CC=12 AQQSXKSWTNWXKR-UHFFFAOYSA-N 0.000 claims description 14
- 102100033793 ALK tyrosine kinase receptor Human genes 0.000 claims description 14
- 102100026031 Beta-glucuronidase Human genes 0.000 claims description 14
- 102100024185 G1/S-specific cyclin-D2 Human genes 0.000 claims description 14
- 102100037859 G1/S-specific cyclin-D3 Human genes 0.000 claims description 14
- 101000933465 Homo sapiens Beta-glucuronidase Proteins 0.000 claims description 14
- 101000896557 Homo sapiens Eukaryotic translation initiation factor 3 subunit B Proteins 0.000 claims description 14
- 101000980741 Homo sapiens G1/S-specific cyclin-D2 Proteins 0.000 claims description 14
- 101000738559 Homo sapiens G1/S-specific cyclin-D3 Proteins 0.000 claims description 14
- 101000988834 Homo sapiens Hypoxanthine-guanine phosphoribosyltransferase Proteins 0.000 claims description 14
- 101000971533 Homo sapiens Killer cell lectin-like receptor subfamily G member 1 Proteins 0.000 claims description 14
- 101001067833 Homo sapiens Peptidyl-prolyl cis-trans isomerase A Proteins 0.000 claims description 14
- 101000718497 Homo sapiens Protein AF-10 Proteins 0.000 claims description 14
- 101000584785 Homo sapiens Ras-related protein Rab-7a Proteins 0.000 claims description 14
- 101000596772 Homo sapiens Transcription factor 7-like 1 Proteins 0.000 claims description 14
- 101000666382 Homo sapiens Transcription factor E2-alpha Proteins 0.000 claims description 14
- 101000979205 Homo sapiens Transcription factor MafA Proteins 0.000 claims description 14
- 101000979190 Homo sapiens Transcription factor MafB Proteins 0.000 claims description 14
- 101000964718 Homo sapiens Zinc finger protein 384 Proteins 0.000 claims description 14
- 102100029098 Hypoxanthine-guanine phosphoribosyltransferase Human genes 0.000 claims description 14
- 102100034539 Peptidyl-prolyl cis-trans isomerase A Human genes 0.000 claims description 14
- 102100026286 Protein AF-10 Human genes 0.000 claims description 14
- 102100030019 Ras-related protein Rab-7a Human genes 0.000 claims description 14
- 102100038313 Transcription factor E2-alpha Human genes 0.000 claims description 14
- 102100023237 Transcription factor MafA Human genes 0.000 claims description 14
- 102100023234 Transcription factor MafB Human genes 0.000 claims description 14
- 108700020467 WT1 Proteins 0.000 claims description 14
- 101150084041 WT1 gene Proteins 0.000 claims description 14
- 102100022748 Wilms tumor protein Human genes 0.000 claims description 14
- 102100040731 Zinc finger protein 384 Human genes 0.000 claims description 14
- OGQSCIYDJSNCMY-UHFFFAOYSA-H iron(3+);methyl-dioxido-oxo-$l^{5}-arsane Chemical compound [Fe+3].[Fe+3].C[As]([O-])([O-])=O.C[As]([O-])([O-])=O.C[As]([O-])([O-])=O OGQSCIYDJSNCMY-UHFFFAOYSA-H 0.000 claims description 14
- 102100029234 Histone-lysine N-methyltransferase NSD2 Human genes 0.000 claims description 13
- 101000779641 Homo sapiens ALK tyrosine kinase receptor Proteins 0.000 claims description 13
- 101000634048 Homo sapiens Histone-lysine N-methyltransferase NSD2 Proteins 0.000 claims description 13
- 101000610107 Homo sapiens Pre-B-cell leukemia transcription factor 1 Proteins 0.000 claims description 13
- 101000909637 Homo sapiens Transcription factor COE1 Proteins 0.000 claims description 13
- 102100040171 Pre-B-cell leukemia transcription factor 1 Human genes 0.000 claims description 13
- 102100024207 Transcription factor COE1 Human genes 0.000 claims description 13
- 101001126417 Homo sapiens Platelet-derived growth factor receptor alpha Proteins 0.000 claims description 12
- 102100030485 Platelet-derived growth factor receptor alpha Human genes 0.000 claims description 12
- 108010043471 Core Binding Factor Alpha 2 Subunit Proteins 0.000 claims description 11
- 102100039121 Histone-lysine N-methyltransferase MECOM Human genes 0.000 claims description 11
- 101100076418 Homo sapiens MECOM gene Proteins 0.000 claims description 11
- 101000685323 Homo sapiens Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial Proteins 0.000 claims description 11
- 101000837401 Homo sapiens T-cell leukemia/lymphoma protein 1A Proteins 0.000 claims description 11
- 108700024831 MDS1 and EVI1 Complex Locus Proteins 0.000 claims description 11
- 102100023155 Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial Human genes 0.000 claims description 11
- 102100028676 T-cell leukemia/lymphoma protein 1A Human genes 0.000 claims description 11
- 238000009396 hybridization Methods 0.000 claims description 11
- 102100027951 Brain and acute leukemia cytoplasmic protein Human genes 0.000 claims description 10
- 101100219190 Drosophila melanogaster byn gene Proteins 0.000 claims description 10
- 101100129584 Escherichia coli (strain K12) trg gene Proteins 0.000 claims description 10
- 101001077417 Gallus gallus Potassium voltage-gated channel subfamily H member 6 Proteins 0.000 claims description 10
- 102100031181 Glyceraldehyde-3-phosphate dehydrogenase Human genes 0.000 claims description 10
- 102100027377 HBS1-like protein Human genes 0.000 claims description 10
- 102100038885 Histone acetyltransferase p300 Human genes 0.000 claims description 10
- 101000697853 Homo sapiens Brain and acute leukemia cytoplasmic protein Proteins 0.000 claims description 10
- 101001009070 Homo sapiens HBS1-like protein Proteins 0.000 claims description 10
- 101000882390 Homo sapiens Histone acetyltransferase p300 Proteins 0.000 claims description 10
- 101001011441 Homo sapiens Interferon regulatory factor 4 Proteins 0.000 claims description 10
- 101001055145 Homo sapiens Interleukin-2 receptor subunit beta Proteins 0.000 claims description 10
- 101001033279 Homo sapiens Interleukin-3 Proteins 0.000 claims description 10
- 101001013158 Homo sapiens Myeloid leukemia factor 1 Proteins 0.000 claims description 10
- 101000591286 Homo sapiens Myocardin-related transcription factor A Proteins 0.000 claims description 10
- 101001000104 Homo sapiens Myosin-11 Proteins 0.000 claims description 10
- 101000844245 Homo sapiens Non-receptor tyrosine-protein kinase TYK2 Proteins 0.000 claims description 10
- 101000583474 Homo sapiens Phosphatidylinositol-binding clathrin assembly protein Proteins 0.000 claims description 10
- 101000611053 Homo sapiens Proteasome subunit beta type-2 Proteins 0.000 claims description 10
- 101000573199 Homo sapiens Protein PML Proteins 0.000 claims description 10
- 101001062093 Homo sapiens RNA-binding protein 15 Proteins 0.000 claims description 10
- 101001112293 Homo sapiens Retinoic acid receptor alpha Proteins 0.000 claims description 10
- 101001010792 Homo sapiens Transcriptional regulator ERG Proteins 0.000 claims description 10
- 102100030126 Interferon regulatory factor 4 Human genes 0.000 claims description 10
- 102100026879 Interleukin-2 receptor subunit beta Human genes 0.000 claims description 10
- 102100039064 Interleukin-3 Human genes 0.000 claims description 10
- 102100029691 Myeloid leukemia factor 1 Human genes 0.000 claims description 10
- 102100034099 Myocardin-related transcription factor A Human genes 0.000 claims description 10
- 102100036639 Myosin-11 Human genes 0.000 claims description 10
- 102100032028 Non-receptor tyrosine-protein kinase TYK2 Human genes 0.000 claims description 10
- 102100025372 Nuclear pore complex protein Nup98-Nup96 Human genes 0.000 claims description 10
- 102100031014 Phosphatidylinositol-binding clathrin assembly protein Human genes 0.000 claims description 10
- 102100040400 Proteasome subunit beta type-2 Human genes 0.000 claims description 10
- 102100026375 Protein PML Human genes 0.000 claims description 10
- 101710156592 Putative TATA-binding protein pB263R Proteins 0.000 claims description 10
- 102100029244 RNA-binding protein 15 Human genes 0.000 claims description 10
- 102000003890 RNA-binding protein FUS Human genes 0.000 claims description 10
- 108090000292 RNA-binding protein FUS Proteins 0.000 claims description 10
- 101100443768 Rattus norvegicus Dock9 gene Proteins 0.000 claims description 10
- 102100023606 Retinoic acid receptor alpha Human genes 0.000 claims description 10
- 102100040296 TATA-box-binding protein Human genes 0.000 claims description 10
- 101710145783 TATA-box-binding protein Proteins 0.000 claims description 10
- 108020004445 glyceraldehyde-3-phosphate dehydrogenase Proteins 0.000 claims description 10
- 108010054452 nuclear pore complex protein 98 Proteins 0.000 claims description 10
- 239000002773 nucleotide Substances 0.000 claims description 10
- 125000003729 nucleotide group Chemical group 0.000 claims description 10
- 101000634835 Homo sapiens M1-specific T cell receptor alpha chain Proteins 0.000 claims description 9
- 101000763322 Homo sapiens M1-specific T cell receptor beta chain Proteins 0.000 claims description 9
- 101000996563 Homo sapiens Nuclear pore complex protein Nup214 Proteins 0.000 claims description 9
- 101100078258 Homo sapiens RUNX1T1 gene Proteins 0.000 claims description 9
- 101000634836 Homo sapiens T cell receptor alpha chain MC.7.G5 Proteins 0.000 claims description 9
- 101000763321 Homo sapiens T cell receptor beta chain MC.7.G5 Proteins 0.000 claims description 9
- 102100033819 Nuclear pore complex protein Nup214 Human genes 0.000 claims description 9
- 102100024952 Protein CBFA2T1 Human genes 0.000 claims description 9
- 108700040655 RUNX1 Translocation Partner 1 Proteins 0.000 claims description 9
- 230000002018 overexpression Effects 0.000 claims description 9
- 102100026964 M1-specific T cell receptor beta chain Human genes 0.000 claims description 8
- 102100029450 M1-specific T cell receptor alpha chain Human genes 0.000 claims description 7
- 102100022807 Potassium voltage-gated channel subfamily H member 2 Human genes 0.000 claims description 7
- 239000002299 complementary DNA Substances 0.000 claims description 7
- 102100025373 Runt-related transcription factor 1 Human genes 0.000 claims 3
- 101000578768 Pneumocystis carinii Mitogen-activated protein kinase 2 Proteins 0.000 claims 1
- 238000012360 testing method Methods 0.000 abstract description 10
- 208000002250 Hematologic Neoplasms Diseases 0.000 abstract description 9
- 238000001914 filtration Methods 0.000 abstract description 9
- 238000010195 expression analysis Methods 0.000 abstract description 7
- 230000008685 targeting Effects 0.000 abstract description 7
- 238000012913 prioritisation Methods 0.000 abstract description 4
- 230000008901 benefit Effects 0.000 abstract description 3
- 238000013461 design Methods 0.000 abstract description 2
- 208000031261 Acute myeloid leukaemia Diseases 0.000 description 26
- 208000033776 Myeloid Acute Leukemia Diseases 0.000 description 22
- 206010028980 Neoplasm Diseases 0.000 description 19
- 238000010790 dilution Methods 0.000 description 18
- 239000012895 dilution Substances 0.000 description 18
- 238000010200 validation analysis Methods 0.000 description 17
- 201000007224 Myeloproliferative neoplasm Diseases 0.000 description 16
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 14
- 201000011510 cancer Diseases 0.000 description 14
- 230000008707 rearrangement Effects 0.000 description 14
- 208000032791 BCR-ABL1 positive chronic myelogenous leukemia Diseases 0.000 description 13
- 208000010833 Chronic myeloid leukaemia Diseases 0.000 description 13
- 108091008121 PML-RARA Proteins 0.000 description 13
- 238000012340 reverse transcriptase PCR Methods 0.000 description 13
- 108020004414 DNA Proteins 0.000 description 12
- 230000035945 sensitivity Effects 0.000 description 11
- 210000004027 cell Anatomy 0.000 description 10
- 230000005856 abnormality Effects 0.000 description 9
- 201000010099 disease Diseases 0.000 description 9
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 9
- 201000006462 myelodysplastic/myeloproliferative neoplasm Diseases 0.000 description 9
- 102000002664 Core Binding Factor Alpha 2 Subunit Human genes 0.000 description 8
- 208000006664 Precursor Cell Lymphoblastic Leukemia-Lymphoma Diseases 0.000 description 8
- 208000033761 Myelogenous Chronic BCR-ABL Positive Leukemia Diseases 0.000 description 7
- 208000019420 lymphoid neoplasm Diseases 0.000 description 7
- 201000000050 myeloid neoplasm Diseases 0.000 description 7
- 208000024893 Acute lymphoblastic leukemia Diseases 0.000 description 6
- 208000014697 Acute lymphocytic leukaemia Diseases 0.000 description 6
- 208000031422 Lymphocytic Chronic B-Cell Leukemia Diseases 0.000 description 6
- 238000003556 assay Methods 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 6
- 239000013610 patient sample Substances 0.000 description 6
- 208000010839 B-cell chronic lymphocytic leukemia Diseases 0.000 description 5
- 239000012634 fragment Substances 0.000 description 5
- 201000000638 mature B-cell neoplasm Diseases 0.000 description 5
- 208000010915 neoplasm of mature B-cells Diseases 0.000 description 5
- 102200087780 rs77375493 Human genes 0.000 description 5
- 210000001519 tissue Anatomy 0.000 description 5
- 206010064571 Gene mutation Diseases 0.000 description 4
- 206010052178 Lymphocytic lymphoma Diseases 0.000 description 4
- 208000032758 Precursor T-lymphoblastic lymphoma/leukaemia Diseases 0.000 description 4
- 238000010804 cDNA synthesis Methods 0.000 description 4
- 238000010835 comparative analysis Methods 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 206010028537 myelofibrosis Diseases 0.000 description 4
- 231100000590 oncogenic Toxicity 0.000 description 4
- 230000002246 oncogenic effect Effects 0.000 description 4
- 208000003476 primary myelofibrosis Diseases 0.000 description 4
- 208000036762 Acute promyelocytic leukaemia Diseases 0.000 description 3
- 208000004736 B-Cell Leukemia Diseases 0.000 description 3
- 208000025324 B-cell acute lymphoblastic leukemia Diseases 0.000 description 3
- 208000025321 B-lymphoblastic leukemia/lymphoma Diseases 0.000 description 3
- 206010014950 Eosinophilia Diseases 0.000 description 3
- 208000025205 Mantle-Cell Lymphoma Diseases 0.000 description 3
- 230000002159 abnormal effect Effects 0.000 description 3
- 230000008859 change Effects 0.000 description 3
- 230000008826 genomic mutation Effects 0.000 description 3
- 238000007901 in situ hybridization Methods 0.000 description 3
- 210000003519 mature b lymphocyte Anatomy 0.000 description 3
- 210000005259 peripheral blood Anatomy 0.000 description 3
- 239000011886 peripheral blood Substances 0.000 description 3
- 238000002360 preparation method Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000003753 real-time PCR Methods 0.000 description 3
- 238000002864 sequence alignment Methods 0.000 description 3
- 102100031048 Coiled-coil domain-containing protein 6 Human genes 0.000 description 2
- KCXVZYZYPLLWCC-UHFFFAOYSA-N EDTA Chemical compound OC(=O)CN(CC(O)=O)CCN(CC(O)=O)CC(O)=O KCXVZYZYPLLWCC-UHFFFAOYSA-N 0.000 description 2
- 101000777370 Homo sapiens Coiled-coil domain-containing protein 6 Proteins 0.000 description 2
- 206010025323 Lymphomas Diseases 0.000 description 2
- 108020005187 Oligonucleotide Probes Proteins 0.000 description 2
- 208000008601 Polycythemia Diseases 0.000 description 2
- 208000033826 Promyelocytic Acute Leukemia Diseases 0.000 description 2
- 108091008109 Pseudogenes Proteins 0.000 description 2
- 102000057361 Pseudogenes Human genes 0.000 description 2
- 238000002123 RNA extraction Methods 0.000 description 2
- 238000011529 RT qPCR Methods 0.000 description 2
- 208000000389 T-cell leukemia Diseases 0.000 description 2
- 210000004369 blood Anatomy 0.000 description 2
- 239000008280 blood Substances 0.000 description 2
- 210000000601 blood cell Anatomy 0.000 description 2
- 210000001185 bone marrow Anatomy 0.000 description 2
- 208000032852 chronic lymphocytic leukemia Diseases 0.000 description 2
- 238000012790 confirmation Methods 0.000 description 2
- 238000007796 conventional method Methods 0.000 description 2
- 239000013256 coordination polymer Substances 0.000 description 2
- 239000012470 diluted sample Substances 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000002068 genetic effect Effects 0.000 description 2
- 201000005787 hematologic cancer Diseases 0.000 description 2
- 210000000265 leukocyte Anatomy 0.000 description 2
- 208000003747 lymphoid leukemia Diseases 0.000 description 2
- 238000007403 mPCR Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 108020004999 messenger RNA Proteins 0.000 description 2
- 108700024542 myc Genes Proteins 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 239000002751 oligonucleotide probe Substances 0.000 description 2
- 230000001717 pathogenic effect Effects 0.000 description 2
- 238000004393 prognosis Methods 0.000 description 2
- 102220197873 rs1057519750 Human genes 0.000 description 2
- 230000005945 translocation Effects 0.000 description 2
- 101710082795 30S ribosomal protein S17, chloroplastic Proteins 0.000 description 1
- 206010000830 Acute leukaemia Diseases 0.000 description 1
- 108091093088 Amplicon Proteins 0.000 description 1
- 101000849579 Arabidopsis thaliana 30S ribosomal protein S13, chloroplastic Proteins 0.000 description 1
- 101100330290 Arabidopsis thaliana CS26 gene Proteins 0.000 description 1
- 208000003950 B-cell lymphoma Diseases 0.000 description 1
- 208000019838 Blood disease Diseases 0.000 description 1
- 241000282465 Canis Species 0.000 description 1
- 101700026669 DACH1 Proteins 0.000 description 1
- 102000053602 DNA Human genes 0.000 description 1
- 238000001712 DNA sequencing Methods 0.000 description 1
- 102100028735 Dachshund homolog 1 Human genes 0.000 description 1
- 102000004190 Enzymes Human genes 0.000 description 1
- 108090000790 Enzymes Proteins 0.000 description 1
- 108700024394 Exon Proteins 0.000 description 1
- 241000282412 Homo Species 0.000 description 1
- 101000583839 Homo sapiens Muscleblind-like protein 1 Proteins 0.000 description 1
- 101001091996 Homo sapiens Rho GTPase-activating protein 22 Proteins 0.000 description 1
- 108091092195 Intron Proteins 0.000 description 1
- OUYCCCASQSFEME-QMMMGPOBSA-N L-tyrosine Chemical compound OC(=O)[C@@H](N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-QMMMGPOBSA-N 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 238000000585 Mann–Whitney U test Methods 0.000 description 1
- 206010027476 Metastases Diseases 0.000 description 1
- 241001465754 Metazoa Species 0.000 description 1
- 241000699670 Mus sp. Species 0.000 description 1
- 102100030965 Muscleblind-like protein 1 Human genes 0.000 description 1
- 208000005890 Neuroma Diseases 0.000 description 1
- 108091028043 Nucleic acid sequence Proteins 0.000 description 1
- 101150093908 PDGFRB gene Proteins 0.000 description 1
- OAICVXFJPJFONN-UHFFFAOYSA-N Phosphorus Chemical compound [P] OAICVXFJPJFONN-UHFFFAOYSA-N 0.000 description 1
- 108091000080 Phosphotransferase Proteins 0.000 description 1
- 108010029485 Protein Isoforms Proteins 0.000 description 1
- 102000001708 Protein Isoforms Human genes 0.000 description 1
- 239000013614 RNA sample Substances 0.000 description 1
- 241000700159 Rattus Species 0.000 description 1
- 102100035757 Rho GTPase-activating protein 22 Human genes 0.000 description 1
- 238000012300 Sequence Analysis Methods 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- 230000001594 aberrant effect Effects 0.000 description 1
- 230000007488 abnormal function Effects 0.000 description 1
- 230000001154 acute effect Effects 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 230000003321 amplification Effects 0.000 description 1
- 108010038083 amyloid fibril protein AS-SAM Proteins 0.000 description 1
- 239000002246 antineoplastic agent Substances 0.000 description 1
- 229940041181 antineoplastic drug Drugs 0.000 description 1
- 238000003149 assay kit Methods 0.000 description 1
- 210000003719 b-lymphocyte Anatomy 0.000 description 1
- 238000001574 biopsy Methods 0.000 description 1
- 210000001772 blood platelet Anatomy 0.000 description 1
- 102220345055 c.812delC Human genes 0.000 description 1
- 238000010805 cDNA synthesis kit Methods 0.000 description 1
- 230000007910 cell fusion Effects 0.000 description 1
- 239000003153 chemical reaction reagent Substances 0.000 description 1
- 230000002759 chromosomal effect Effects 0.000 description 1
- 230000001684 chronic effect Effects 0.000 description 1
- 238000003776 cleavage reaction Methods 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 230000008482 dysregulation Effects 0.000 description 1
- 230000000235 effect on cancer Effects 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 210000003743 erythrocyte Anatomy 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 239000002360 explosive Substances 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 238000010353 genetic engineering Methods 0.000 description 1
- 102000054766 genetic haplotypes Human genes 0.000 description 1
- 230000007614 genetic variation Effects 0.000 description 1
- 208000014951 hematologic disease Diseases 0.000 description 1
- 208000024200 hematopoietic and lymphoid system neoplasm Diseases 0.000 description 1
- 208000018706 hematopoietic system disease Diseases 0.000 description 1
- 238000012744 immunostaining Methods 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 238000003780 insertion Methods 0.000 description 1
- 230000037431 insertion Effects 0.000 description 1
- 238000002955 isolation Methods 0.000 description 1
- 229940043355 kinase inhibitor Drugs 0.000 description 1
- 238000002372 labelling Methods 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 244000144972 livestock Species 0.000 description 1
- 230000003211 malignant effect Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 238000002493 microarray Methods 0.000 description 1
- 238000012775 microarray technology Methods 0.000 description 1
- 210000005087 mononuclear cell Anatomy 0.000 description 1
- 238000003199 nucleic acid amplification method Methods 0.000 description 1
- 102000039446 nucleic acids Human genes 0.000 description 1
- 108020004707 nucleic acids Proteins 0.000 description 1
- 150000007523 nucleic acids Chemical class 0.000 description 1
- 230000001575 pathological effect Effects 0.000 description 1
- 229910052698 phosphorus Inorganic materials 0.000 description 1
- 239000011574 phosphorus Substances 0.000 description 1
- 102000020233 phosphotransferase Human genes 0.000 description 1
- 239000003757 phosphotransferase inhibitor Substances 0.000 description 1
- 238000010837 poor prognosis Methods 0.000 description 1
- 239000013641 positive control Substances 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 239000002096 quantum dot Substances 0.000 description 1
- 230000002285 radioactive effect Effects 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 108020004418 ribosomal RNA Proteins 0.000 description 1
- 230000007017 scission Effects 0.000 description 1
- 238000000926 separation method Methods 0.000 description 1
- 230000001568 sexual effect Effects 0.000 description 1
- YEENEYXBHNNNGV-XEHWZWQGSA-M sodium;3-acetamido-5-[acetyl(methyl)amino]-2,4,6-triiodobenzoate;(2r,3r,4s,5s,6r)-2-[(2r,3s,4s,5r)-3,4-dihydroxy-2,5-bis(hydroxymethyl)oxolan-2-yl]oxy-6-(hydroxymethyl)oxane-3,4,5-triol Chemical compound [Na+].CC(=O)N(C)C1=C(I)C(NC(C)=O)=C(I)C(C([O-])=O)=C1I.O[C@H]1[C@H](O)[C@@H](CO)O[C@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 YEENEYXBHNNNGV-XEHWZWQGSA-M 0.000 description 1
- 210000001082 somatic cell Anatomy 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
- 208000024891 symptom Diseases 0.000 description 1
- 230000001225 therapeutic effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- Cancer can develop in any tissue of the body. Cancer cells usually invade and destroy adjacent tissues, then gradually invade the circulatory system and metastasize to parts of the body far from the original site, eventually killing the host (for example, a human). Cancer cells divide abnormally; under a microscope they have lost the shape of normal tissue or cells and exhibit abnormal function. Cancer can be accompanied by various genetic mutations depending on the type of tumor, and cell mutations have been reported to have a significant effect on cancer development and progression.
- A next-generation sequencing panel for leukemia diagnosis further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBB, DEK, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, PSMB2 and TP63.
- Figure 6 shows the comparative evaluation of analysis using the next-generation sequencing panel of the present invention and the existing commercialized targeted RNAseq analysis (B-ALL: B-lymphoblastic leukemia/lymphoma; APL: acute promyelocytic leukemia; AML: acute myeloid leukemia; T-ALL: T-lymphoblastic leukemia/lymphoma; FISH: fluorescence in situ hybridization).
- B-ALL B-lymphoblastic leukemia/lymphoma
- APL acute promyelocytic leukemia
- AML acute myeloid leukemia
- T-ALL T-lymphoblastic leukemia/lymphoma
- FISH fluorescence in situ hybridization.
- The present invention provides a next-generation sequencing panel for leukemia diagnosis.
- The present invention provides a next-generation sequencing panel for leukemia diagnosis comprising a probe that specifically binds to PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
- The next-generation sequencing panel of the present invention may further include a probe that specifically binds to at least one selected from the group consisting of BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, TP63, DEK, DUSP22, FGFR3, IGK, IGL, IKZF1, KMT2A, NTRK3, PAX5, PBX1, PPIA, RAB7A, TCF3, TP63 and ZNF384.
- Mature B-cell neoplasm (MBN) can be diagnosed with high sensitivity and specificity by selecting the genes to which the probes specifically bind and detecting mutations, fusions and abnormal expression of each gene.
- The present invention provides a next-generation sequencing panel comprising a probe that specifically binds to ABL1, ABL2, AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, CRLF2, CSF1R, DEK, DUSP22, EBF1, EP300, EPOR, ERG, ETV6, FGFR1, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, JAK2, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYC, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PDGFRB, PHB, PHB2, PICALM, PML, PPI
- Leukemia can be diagnosed with high sensitivity and specificity by detecting mutations, fusions and abnormal expression of the 84 genes targeted by the sequencing panel of the present invention. More specifically, the panel can detect fusion genes such as ABL1-ETV6 and CSF1R-MEF2D described in Table 1 and, in particular, IGH-CRLF2, a fusion gene found in patients with Ph-like ALL, thereby enabling effective diagnosis of Ph-like ALL.
- The present invention provides a method of providing information for leukemia diagnosis, comprising: obtaining read data by selecting target genes through target capture hybridization with the next-generation sequencing panel and sequencing them; checking whether PHB and PHB2 are overexpressed from the read data; and detecting, from the read data, a fusion containing any one gene selected from the group consisting of IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
- The raw read data may be adjusted by keeping only the data whose quality score meets or exceeds a given standard. The quality score is a numerical value representing the estimated error probability in the raw data; specifically, it may be a Phred score, an index indicating the quality of each base.
- SAM/BAM data may be obtained by aligning the read data to a reference sequence using HISAT2, GTF data may be obtained by calculating the expression of each gene from the SAM/BAM data using StringTie, and the GTF data may be normalized using DESeq2.
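The alignment and quantification steps named above can be sketched as a list of command lines. This is only an illustration under stated assumptions: the file names and the `expression_commands` helper are hypothetical, the `samtools sort` step is an assumption added because StringTie expects a coordinate-sorted BAM, and the specific options shown (`--dta`, `-e`) are one common configuration, not settings taken from the patent. DESeq2 is an R/Bioconductor package, so normalization is left outside the sketch.

```python
def expression_commands(sample: str, index_prefix: str, annotation_gtf: str,
                        fastq_r1: str, fastq_r2: str) -> list[list[str]]:
    """Build (but do not run) the commands for HISAT2 alignment and StringTie quantification.

    All paths are hypothetical placeholders supplied by the caller.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    gtf = f"{sample}.gtf"
    return [
        # Splice-aware alignment of paired-end reads to the reference index.
        ["hisat2", "--dta", "-x", index_prefix, "-1", fastq_r1, "-2", fastq_r2, "-S", sam],
        # Assumption: coordinate-sort the alignments before quantification.
        ["samtools", "sort", "-o", bam, sam],
        # Estimate expression for the annotated genes and write a GTF.
        ["stringtie", bam, "-G", annotation_gtf, "-e", "-o", gtf],
    ]


if __name__ == "__main__":
    for cmd in expression_commands("CS1", "ref_index", "genes.gtf", "CS1_R1.fq.gz", "CS1_R2.fq.gz"):
        print(" ".join(cmd))
```

Returning the commands as argument lists keeps the sketch free of side effects; each list could be passed to `subprocess.run` in an environment where the tools are installed.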
- Next-generation sequencing is a method of decoding vast amounts of genomic information at high speed by dividing the genome into countless fragments and then analyzing and combining the nucleotide sequence of each fragment. It consists of RNA extraction, cDNA synthesis, adapter ligation, target capture hybridization and sequencing.
- Each step may be performed by a method known in the art. Specifically, cDNA is synthesized from RNA extracted from a patient's blood sample, followed by adapter ligation, PCR and target capture hybridization. Sequencing may then be performed on the library thus prepared.
- The adjusting step may keep only the data whose quality score meets or exceeds a given standard. The quality score is a numerical value representing the estimated error probability in the raw data; specifically, it may be a Phred score, an index indicating the quality of each base. A file in which the nucleotide sequence and the Phred score of each sequencing read are recorded together is called a FASTQ file.
- When the Phred score is 20 (Q20), the probability that the corresponding base call is an error is 1%; when it is 30 (Q30), the error probability is 0.1%.
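The relationship between a Phred score and its error probability follows the standard definition Q = -10·log10(P). The sketch below illustrates that conversion and the decoding of a FASTQ quality string; the function names are hypothetical, and the ASCII offset of 33 assumes the common Phred+33 encoding rather than anything stated in the patent.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Error probability implied by a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score implied by an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_fastq_quality(quality_line: str, offset: int = 33) -> list[int]:
    """Decode one FASTQ quality line into per-base Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def passes_quality(quality_line: str, min_q: int) -> bool:
    """True if every base in the read has a Phred score of at least min_q."""
    return all(q >= min_q for q in decode_fastq_quality(quality_line))


if __name__ == "__main__":
    print(phred_to_error_prob(20), phred_to_error_prob(30))
    print(decode_fastq_quality("I5+"))
```

Requiring every base to pass is only one possible filtering policy; trimming low-quality tails or thresholding on the mean score are equally plausible choices.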
- the visible bases are judged to have excellent sequencing quality and are used for further analysis.
- Freebayes is a haplotype-based genetic variant detection tool useful for calling variants in a population, and is available at https://github.com/freebayes/freebayes.
- a total of 84 genes associated with hematologic cancers were selected based on previous literature.
- Predicted fusion candidates were filtered to exclude false-positive results and then classified using a stratified grading system according to the relevance of clinical symptoms with priority evidence in the literature.
- Fusion candidates were considered true fusions if they i) were supported by a minimum number of reads (FFPM ≥ 0.1 and junction reads ≥ 1); ii) were not short repeats, pseudogenes or read-throughs, and were not found in healthy populations or normal samples; iii) affected the expression level of a fusion partner gene or caused an in-frame fusion; and iv) were reported by the two fusion detection algorithms (FusionCatcher final result file; STAR-Fusion preliminary and final result files).
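The filtering criteria above can be sketched as a simple predicate. This is a minimal illustration, not the patent's implementation: the `FusionCandidate` fields are hypothetical, FFPM is computed as fusion-supporting fragments per million total fragments, and criterion iv is read here as "reported by both STAR-Fusion and FusionCatcher", which is an assumption about the garbled source wording.

```python
from dataclasses import dataclass


@dataclass
class FusionCandidate:
    name: str
    fusion_fragments: int            # fragments supporting the fusion
    junction_reads: int              # reads spanning the fusion junction
    total_fragments: int             # total sequenced fragments in the sample
    is_artifact_class: bool          # short repeat, pseudogene, read-through, or seen in normal samples
    in_frame: bool
    alters_partner_expression: bool
    callers: frozenset               # names of the tools that reported the fusion


def ffpm(c: FusionCandidate) -> float:
    """Fusion fragments per million total fragments."""
    return c.fusion_fragments / c.total_fragments * 1_000_000


def is_true_fusion(c: FusionCandidate, min_ffpm: float = 0.1, min_junction_reads: int = 1) -> bool:
    supported = ffpm(c) >= min_ffpm and c.junction_reads >= min_junction_reads  # criterion i
    not_artifact = not c.is_artifact_class                                      # criterion ii
    functional = c.in_frame or c.alters_partner_expression                      # criterion iii
    # Criterion iv (assumed reading): both callers must report the fusion.
    both_callers = {"STAR-Fusion", "FusionCatcher"} <= c.callers
    return supported and not_artifact and functional and both_callers


if __name__ == "__main__":
    candidate = FusionCandidate("BCR-ABL1", 50, 12, 10_000_000, False, True, False,
                                frozenset({"STAR-Fusion", "FusionCatcher"}))
    print(ffpm(candidate), is_true_fusion(candidate))
```

Keeping each criterion in its own named variable makes it easy to report which filter rejected a candidate if that is needed during review.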
- Tier 1 (good studies with consensus of field experts) and Tier 2 (several small published studies with some agreement; preclinical trials; or several case reports without agreement) fusions were selected using a previous tier grading system.
- expert consensus studies and fusion databases including ChimerDB and Mitelman databases were used.
- cDNA synthesis was performed with the PrimeScript™ II 1st strand cDNA synthesis kit (Takara) using 500-1,000 ng of total RNA. Using Takara ExTaq (Takara), 1 µL of cDNA was amplified with the following primers.
- Table 6 shows the final results of targeted RNA-seq compared with the conventional methods for 30 clinical samples: 6 AML, 9 B-ALL, 4 T-ALL, 3 mature B-cell neoplasms, 6 MPN, 1 MDS/MPN and 1 myeloid/lymphoid neoplasm with PDGFRB rearrangement.
- Targeted RNA-seq detected 12 identical fusions and 1 reciprocal fusion, CCND1-IGH.
- the partner gene was designated as CCDC6 in target RNA-seq, unlike conventional FISH, which was also confirmed by direct sequencing.
- RNA-seq identified 16 variants (tier 1 or 2) in the expressed transcripts of 10 samples (Table 6).
- Four frame-shifting mutations in GATA2 and WT1 were found in two AML cases (clinical sample [CS] 2-3).
- The M244V, Y253H, E255K/V, V299L and T315I mutations of ABL1, which are associated with tyrosine kinase inhibitor (TKI) resistance, were identified by targeted RNA-seq.
- TKI tyrosine kinase inhibitor
- JAK2 V617F mutations were identified in three BCR-ABL1-negative MPN samples, including two erythrocytosis samples and one primary myelofibrosis sample (CS26-28). All available cases with 15 of these variants were confirmed by DNA-based NGS or real-time PCR.
- the present invention developed and validated a clinically applicable target RNA-seq system for 84 genes related to other hematological malignancies by considering the data as well as the basis of the previous literature.
- The platform of the present invention showed stable performance in assay validation and efficiently detected both known and novel gene fusions.
- the targeted RNA-seq system showed better applicability to detect clinically significant sequence variants as well as expression features using 30 clinical samples from patients with hematological malignancies.
- RNA-seq data generated on the Illumina platform in a clinical setting should be interpreted with caution when trace-level reads show exactly the same breakpoints and sequences as the top-hit fusions of other samples in the same pool and are inconsistent with the patient's clinical and pathological signs.
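One way to surface such trace-level calls for manual review is to compare each fusion call against the strongest call with the same breakpoint in the same sequencing pool. The sketch below is illustrative only: the record layout, the `flag_possible_carryover` name and the `trace_ratio` cutoff of 1% are assumptions, not values taken from the patent, and any flagged call would still need to be checked against the patient's clinical findings.

```python
def flag_possible_carryover(calls: list[dict], trace_ratio: float = 0.01) -> list[dict]:
    """Return calls that share a breakpoint with a much stronger call from another sample.

    Each call is a dict with keys "sample", "breakpoint" and "ffpm";
    all calls are assumed to come from the same sequencing pool.
    """
    # Find the top-hit call for each breakpoint across the pool.
    top_hit: dict[str, dict] = {}
    for call in calls:
        bp = call["breakpoint"]
        if bp not in top_hit or call["ffpm"] > top_hit[bp]["ffpm"]:
            top_hit[bp] = call

    flagged = []
    for call in calls:
        top = top_hit[call["breakpoint"]]
        from_other_sample = top["sample"] != call["sample"]
        is_trace = call["ffpm"] < trace_ratio * top["ffpm"]
        if from_other_sample and is_trace:
            flagged.append(call)
    return flagged


if __name__ == "__main__":
    pool = [
        {"sample": "CS1", "breakpoint": "bp_A", "ffpm": 120.0},
        {"sample": "CS2", "breakpoint": "bp_A", "ffpm": 0.2},
        {"sample": "CS3", "breakpoint": "bp_B", "ffpm": 5.0},
    ]
    for call in flag_possible_carryover(pool):
        print(call["sample"], call["breakpoint"], call["ffpm"])
```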
- Fusion transcripts were first selected as true-positive fusions after four filtering steps.
- Candidates found mostly in short repeats, pseudogenes, read-throughs or healthy populations were removed from further evaluation, whereas fusions inducing aberrant expression of partner genes and in-frame fusions were retained. Then, before fusions were graded according to prioritized clinical evidence, it was recognized that, unlike in a research setting, not all fusions were sufficient for clinical reporting, because many fusions have not been demonstrated in hematological malignancies.
- The advantage of this targeting method is that it reduces the cost of oligonucleotide probe sets and, with probes for only one partner, can easily detect chimeric transcripts, especially for CRLF2, ETV6, KMT2A, NUP98, PAX5 and PDGFRA/B fusions, which have multiple partner genes.
- While the identification of genomic variants relies primarily on DNA-based NGS, using RNA-seq for this purpose is difficult because of the inherent complexity of transcripts spanning multiple exons at genomic locations. Although this hurdle has been overcome with splice-aware mappers such as HISAT2, TopHat and STAR [10-12], only a few RNA-seq studies have investigated variant detection in clinical diagnostic settings. In the present invention, clinically significant mutations were identified in the RNA-seq data. Among other things, the simultaneous detection of ABL1 mutations associated with TKI resistance and the BCR-ABL1 fusion in B-ALL and CML-BP patients demonstrated that targeted RNA-seq enables faster diagnostic and therapeutic decisions.
- An RNAseq analysis system developed on the basis of constructing cDNA libraries for specific genes using anchored multiplex PCR (Engvall M, Cahill N, Jonsson BI, Hoglund M, Hallbook H, Cavelier L: Detection of leukemia gene fusions by targeted RNA-sequencing in routine diagnostics. BMC Med Genomics 2020, 13:106) and the analysis system using the next-generation sequencing panel of the present invention were compared and evaluated. For this comparative evaluation, one B-ALL patient sample, two AML patient samples, and one patient sample each for acute promyelocytic leukemia and T-ALL were used. As a result, in the B-ALL patient sample carrying the IGH-CRLF2 gene fusion, the fusion was detected only by analysis using the next-generation sequencing panel of the present invention.
- Ph-like ALL Philadelphia chromosome-like acute lymphoblastic leukemia
- CRLF2 cytokine receptor-like factor 2
- Ph-like B-ALL showed a very poor prognosis and was classified as a new subtype of ALL. Diagnosing Ph-like B-ALL through gene analysis in ALL, especially B-ALL, is therefore very important for treating leukemia from the standpoint of precision medicine. It can thus be seen that the next-generation sequencing panel of the present invention is more useful for detecting Ph-like B-ALL than the cDNA library-based targeted RNAseq using anchored multiplex PCR that is used in some clinical laboratories (FIG. 6).
- "Our targeted RNA seq system" indicates the gene fusions detected using the next-generation sequencing panel of the present invention, which targets 84 genes, and "Commercial targeted RNA seq system" indicates the gene fusions detected using the previously commercialized targeted RNAseq analysis system.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Chemical & Material Sciences (AREA)
- Data Mining & Analysis (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Bioethics (AREA)
- Evolutionary Computation (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Public Health (AREA)
- Artificial Intelligence (AREA)
- Genetics & Genomics (AREA)
- Molecular Biology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention provides a targeted RNA-seq panel covering 84 genes associated with hematologic malignancies, integrated with stepwise filtering, prioritization strategies and a bioinformatics pipeline for use in a clinical diagnostic setting. Across various clinical samples, the system identifies gene fusions with greater sensitivity than conventional molecular methods. Clinically significant variants and expression profiles of the transcriptome can be investigated directly and simultaneously from the RNA-seq data, without additional parallel testing. The present invention thus provides a comprehensive tool for analyzing hematologic malignancies in a clinical laboratory and demonstrates the advantages of a clinical-laboratory-oriented targeted RNA-seq system, which increases the diagnostic yield of gene fusion detection and simplifies the diagnostic steps.
Description
The present invention relates to a next-generation sequencing panel for leukemia diagnosis and a method for providing information for leukemia diagnosis using the same.
Cancer can develop in any tissue of the body. Cancer cells usually invade and destroy adjacent tissues, then gradually invade the circulatory system and metastasize to parts of the body far from the original site, eventually killing the host (for example, a human). Cancer cells divide abnormally; under a microscope they have lost the shape of normal tissue or cells and exhibit abnormal function. Cancer can be accompanied by various genetic mutations depending on the type of tumor, and cell mutations have been reported to have a significant effect on cancer development and progression.
Accordingly, various methods for detecting genetic mutations in cancer cells are being studied, and the detected mutation information can greatly assist in diagnosing cancer patients and selecting precisely tailored anticancer drugs.
Existing genomic variant detection methods use amplicons and probes designed to detect only a single genomic variant. Detecting the various somatic mutations that cause cancer therefore requires additional experiments for each of the other variants, and these methods cannot discover novel variants beyond those already known.
In addition, existing methods use a separate detection method for each type of genomic variant (e.g., SNV: real-time PCR, direct sequencing; expression analysis: microarray, quantitative real-time PCR; translocation: FISH), so detecting all types of variants in the cancer tissue of a single patient takes a long time and incurs a high cost.
The recent introduction of next-generation sequencing (NGS) has made it possible to analyze multiple cancer-related genes simultaneously. However, a considerable number of false-positive results still occur, which means that many challenges remain in applying bioinformatics to disease diagnosis and prognosis prediction.
An object of the present invention is to provide a next-generation sequencing panel for leukemia diagnosis that targets specific genes, and a method for providing information for leukemia diagnosis using the same.
1. A next-generation sequencing panel for leukemia diagnosis, comprising probes that specifically bind to PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
2. The next-generation sequencing panel for leukemia diagnosis of 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, DUSP22, EBF1, FGFR3, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYH11, NUP98, PCM1, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA and WT1.
3. The next-generation sequencing panel for leukemia diagnosis of 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, EP300, ERG, FGFR3, FIP1L1, HBS1L, HPRT1, IGK, IGL, IKZF1, KMT2A, MAF, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, PAX5, PBX1, PCM1, PPIA, RAB7A, TCF3 and ZNF384.
4. The next-generation sequencing panel for leukemia diagnosis of 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of ALK, BCL2, BCL6, BCL9, BCR, CBFB, DEK, DUSP22, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, NUP214, PCM1, TBP, TCL1A, TRB, TRG and TYK2.
5. The next-generation sequencing panel for leukemia diagnosis of 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, TP63, DEK, DUSP22, FGFR3, IGK, IGL, IKZF1, KMT2A, NTRK3, PAX5, PBX1, PPIA, RAB7A, TCF3, TP63 and ZNF384.
6. The next-generation sequencing panel for leukemia diagnosis of 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBB, DEK, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, PSMB2 and TP63.
7. The next-generation sequencing panel for leukemia diagnosis of 1 above, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GATA2, IKZF1, MAFA, MAFB and PCM1.
8. The next-generation sequencing panel for leukemia diagnosis of 1 above, further comprising a probe that specifically binds to PDGFRA.
9. A next-generation sequencing panel for leukemia diagnosis, comprising probes that specifically bind to ABL1, ABL2, AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, CRLF2, CSF1R, DEK, DUSP22, EBF1, EP300, EPOR, ERG, ETV6, FGFR1, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, JAK2, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYC, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PDGFRB, PHB, PHB2, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384.
10. A method for providing information for leukemia diagnosis, comprising: obtaining read data by selecting target genes through target capture hybridization with the sequencing panel of any one of 1 to 9 above and sequencing them;
checking whether PHB and PHB2 are overexpressed from the read data; and
detecting, from the read data, a fusion containing any one gene selected from the group consisting of IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
11. The method for providing information for leukemia diagnosis of 10 above, wherein overexpression is determined by aligning the read data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie to obtain GTF data, and normalizing the GTF data with DESeq2.
12. The method for providing information for leukemia diagnosis of 10 above, wherein the gene fusion detection comprises aligning the read data to a reference sequence with Bowtie, STAR, Blat or Bowtie2 and detecting fusions with the STAR-Fusion or FusionCatcher fusion gene identification tool.
13. The method for providing information for leukemia diagnosis of 10 above, further comprising determining whether any one gene selected from the group consisting of AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, DEK, DUSP22, EBF1, EP300, ERG, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384 is overexpressed, fused or mutated.
14. A method for providing information for leukemia diagnosis, comprising: (a) binding cDNA synthesized from RNA isolated from a subject to each probe of the sequencing panel of any one of claims 1 to 9 and performing next-generation sequencing (NGS) to obtain raw read data;
(b) adjusting the raw read data to data having a quality score of Q10 or higher;
(c) detecting fusions of each gene in the adjusted data;
(d) detecting variants relative to a reference sequence in each gene of the adjusted data; and
(e) determining the expression of each gene from the adjusted data,
wherein the fusion detection comprises aligning the adjusted data to a reference sequence with Bowtie, STAR, Blat or Bowtie2 and detecting fusions with a fusion gene identification tool (STAR-Fusion, FusionCatcher),
the variant detection is performed by aligning the sequences of the adjusted data with STAR to obtain SAM/BAM data, sorting the BAM data and marking duplicates with Picard, and calling SNVs and indels with Freebayes on the aligned, sorted and de-duplicated BAM data, and
the gene expression is determined by aligning the adjusted data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie to obtain GTF data, and normalizing the GTF data with DESeq2.
본 발명은 특정 유전자를 표적으로 하는 RNA-seq 패널을 포함하는 백혈병 진단용 차세대 염기서열분석 (NGS, next generation sequencing) 패널에 관한 것으로, 높은 민감도 및 정확도로 백혈병을 진단하거나, 백혈병 진단을 위한 정보를 제공할 수 있다.The present invention relates to a next generation sequencing (NGS) panel for diagnosing leukemia comprising an RNA-seq panel that targets a specific gene. can provide
FIG. 1 shows the bioinformatics pipeline for analyzing targeted RNA-sequencing data for fusion detection, variant detection and expression profiling.
FIG. 2 shows the analytical validation of targeted RNA sequencing for carryover (A-B), repeatability (C-D) and linearity (E-F). A-B: all known fusions described in Table 4 were detected as true fusions in run 1, with trace levels of carryover fusions, by both STAR-Fusion (A) and FusionCatcher (B). Carryover fusions had significantly lower FFPM and read counts than true fusions (P < 0.001). C-D: known fusions showed reliable repeatability of read counts within replicates (C), and repeatability improved when normalized FFPM values were used (D). E-F: the FFPM of diluted samples carrying the BCR-ABL1 (E) and PML-RARA (F) fusions showed a linear log2 fold change (r² = 0.9852 and 0.9447, respectively). FFPM: fusion fragments per million.
FIG. 3 shows a heatmap and hierarchical clustering of 30 patients with hematologic malignancies and 3 normal controls analyzed by targeted RNA sequencing. The heatmap shows the normalized log2 fold change in gene expression by color, with both the rows (target genes of the panel) and the columns (patient and normal control samples) clustered. The color bar at the top shows the disease group of each sample, and the disease groups separate into four distinct clusters (CS: clinical sample; NC: normal control; AML: acute myeloid leukemia; B-ALL: B-lymphoblastic leukemia/lymphoma; T-ALL: T-lymphoblastic leukemia/lymphoma; MBN: mature B-cell neoplasm; MPN: myeloproliferative neoplasm; CML-BP: chronic myeloid leukemia, blast phase; MDS/MPN: myelodysplastic/myeloproliferative neoplasm; MLN: myeloid/lymphoid neoplasm).
FIGS. 4 and 5 show the frequency of gene fusions detected in various types of leukemia using the next-generation sequencing panel of the present invention. In FIG. 4, the black bars indicate the frequency of patients in whom a gene fusion was detected for each leukemia type. Gene fusions were found in 77% (72) of 93 leukemia patients. Gene fusions were observed in 94% (33/35) of adult B-ALL patients and 83% (25/30) of pediatric B-ALL patients. FIG. 5 shows the gene fusion patterns and frequencies for each type of leukemia. Among the gene fusions found in adult B-ALL (n=35), the most common fusion gene was BCR-ABL1 (24/33, 73%), and the most common fusion gene in pediatric B-ALL was ETV6-RUNX1 (4/26, 15%).
FIG. 6 shows a comparative evaluation of analysis with the next-generation sequencing panel of the present invention against an existing commercial targeted RNA-seq assay (B-ALL: B-lymphoblastic leukemia/lymphoma; APL: acute promyelocytic leukemia; AML: acute myeloid leukemia; T-ALL: T-lymphoblastic leukemia/lymphoma; FISH: fluorescence in situ hybridization).
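The FFPM unit used in the description of FIG. 2 (fusion fragments per million) can be illustrated with a minimal sketch. The text only gives the name of the unit; the choice of the total number of sequenced fragments as the denominator, and the function name `ffpm`, are assumptions made for illustration.

```python
def ffpm(fusion_fragments: int, total_fragments: int) -> float:
    """Fusion fragments per million sequenced fragments.

    Assumption: the denominator is the total number of sequenced fragments
    in the sample; the source text only names the unit.
    """
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return fusion_fragments * 1_000_000 / total_fragments


# The same fusion read support at two sequencing depths (made-up numbers):
shallow = ffpm(150, 6_000_000)
deep = ffpm(150, 12_000_000)
print(shallow, deep)
```

Scaling by depth in this way is what makes fusion support comparable between replicates sequenced to different depths, which is consistent with the improved repeatability reported for normalized FFPM values.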
The present invention provides a next-generation sequencing panel for diagnosing leukemia.
In the present invention, "leukemia" is a type of blood cancer in which blood cells, particularly white blood cells, proliferate abnormally, and may be, for example, acute myeloid leukemia, acute lymphocytic leukemia, chronic myeloid leukemia or chronic lymphocytic leukemia.
In the present invention, "diagnosis" refers to any act of discovering and confirming that abnormal blood cells are proliferating excessively and unchecked, thereby suppressing the production of normal white blood cells, red blood cells and platelets. It may include determining an individual's susceptibility to leukemia, determining whether an individual currently has leukemia, or determining the prognosis of an individual with leukemia.
In the present invention, "individual" means any animal, including humans, rats, mice and livestock, that has developed or may develop leukemia. As a specific example, it may be a mammal, including a human.
In the present invention, a "probe" means a single-stranded DNA or RNA fragment used in genetic engineering to search for a specific gene or other DNA sequence. It may be a nucleic acid, from a few bases to several hundred bases in length, that is produced by enzymatic or chemical separation and purification or by synthesis and that can bind specifically to mRNA. The probe may be labeled with a radioisotope, an enzyme or the like to confirm the presence or absence of mRNA, and may be designed and modified by known methods.
In the present invention, a "panel" means a gene panel or gene probe panel constructed from any combination of probes that bind to a plurality of genes for diagnosing leukemia. The combination includes, for example, the full set of probes for 13, 14, 25, 29, 35, 41, 49 or 84 genes, or any subset or subcombination thereof.
The present invention provides a next-generation sequencing panel for diagnosing leukemia, comprising probes that bind specifically to PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
Leukemia can be diagnosed with high sensitivity and specificity by detecting variants, fusions and abnormal expression of the PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC genes selected with the sequencing panel of the present invention. For example, Philadelphia chromosome-like lymphoblastic leukemia (Philadelphia chromosome-like ALL) can be diagnosed with high sensitivity and specificity by detecting abnormal expression (overexpression) of PHB and PHB2 together with a fusion involving the IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB or MYC gene.
The next-generation sequencing panel of the present invention may further comprise a probe that binds specifically to at least one selected from the group consisting of AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, DUSP22, EBF1, FGFR3, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYH11, NUP98, PCM1, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA and WT1. By selecting the genes to which the probes bind specifically and detecting variants, fusions and abnormal expression of each gene, acute myeloid leukemia (AML) can be diagnosed with high sensitivity and specificity.
The next-generation sequencing panel of the present invention may further comprise a probe that binds specifically to at least one selected from the group consisting of AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, EP300, ERG, FGFR3, FIP1L1, HBS1L, HPRT1, IGK, IGL, IKZF1, KMT2A, MAF, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, PAX5, PBX1, PCM1, PPIA, RAB7A, TCF3 and ZNF384. By selecting the genes to which the probes bind specifically and detecting variants, fusions and abnormal expression of each gene, B-lymphoblastic leukemia/lymphoma (B-ALL) can be diagnosed with high sensitivity and specificity.
The next-generation sequencing panel of the present invention may further comprise a probe that binds specifically to at least one selected from the group consisting of ALK, BCL2, BCL6, BCL9, BCR, CBFB, DEK, DUSP22, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, NUP214, PCM1, TBP, TCL1A, TRB, TRG and TYK2. By selecting the genes to which the probes bind specifically and detecting variants, fusions and abnormal expression of each gene, T-lymphoblastic leukemia/lymphoma (T-ALL) can be diagnosed with high sensitivity and specificity.
The next-generation sequencing panel of the present invention may further comprise a probe that binds specifically to at least one selected from the group consisting of BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, TP63, DEK, DUSP22, FGFR3, IGK, IGL, IKZF1, KMT2A, NTRK3, PAX5, PBX1, PPIA, RAB7A, TCF3, TP63 and ZNF384. By selecting the genes to which the probes bind specifically and detecting variants, fusions and abnormal expression of each gene, mature B-cell neoplasm (MBN) can be diagnosed with high sensitivity and specificity.
The next-generation sequencing panel of the present invention may further comprise a probe that binds specifically to at least one selected from the group consisting of AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBB, DEK, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, PSMB2 and TP63. By selecting the genes to which the probes bind specifically and detecting variants, fusions and abnormal expression of each gene, myeloproliferative neoplasms (MPN) can be diagnosed with high sensitivity and specificity.
The next-generation sequencing panel of the present invention may further comprise a probe that binds specifically to at least one selected from the group consisting of AFF1, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GATA2, IKZF1, MAFA, MAFB and PCM1. By selecting the genes to which the probes bind specifically and detecting variants, fusions and abnormal expression of each gene, myelodysplastic/myeloproliferative neoplasm (MDS/MPN) can be diagnosed with high sensitivity and specificity.
The next-generation sequencing panel of the present invention may further comprise a probe that binds specifically to PDGFRA. By selecting the gene to which the probe binds specifically and detecting variants, fusions and abnormal expression, myeloid/lymphoid neoplasm with eosinophilia and gene rearrangement (MLN) can be diagnosed with high sensitivity and specificity.
The present invention provides a next-generation sequencing panel for diagnosing leukemia, comprising probes that bind specifically to ABL1, ABL2, AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, CRLF2, CSF1R, DEK, DUSP22, EBF1, EP300, EPOR, ERG, ETV6, FGFR1, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, JAK2, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYC, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PDGFRB, PHB, PHB2, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384 (84 genes).
Leukemia can be diagnosed with high sensitivity and specificity by detecting variants, fusions and abnormal expression of the 84 genes selected with the sequencing panel of the present invention. More specifically, the panel can detect the fusion genes listed in Table 1, such as ABL1-ETV6 and CSF1R-MEF2D, and in particular IGH-CRLF2, a fusion gene found in patients with Ph-like ALL, so that Ph-like ALL can be diagnosed effectively.
Table 1. Fusion genes associated with Philadelphia chromosome-like lymphoblastic leukemia (Ph-like ALL).
Kinase | Fusion partner genes |
ABL1 | ETV6, NUP214 |
CSF1R | MEF2D |
PDGFRB | EBF1, ETV6 |
PDGFRA | FIP1L1 |
CRLF2 | IGH |
JAK2 | BCR, EBF1, ETV6, PAX5, PCM1 |
EPOR | IGH, IGK |
NTRK3 | ETV6 |
FGFR1 | BCR |
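As a minimal sketch, the kinase and partner pairs of Table 1 can be held in a lookup structure and used to check whether a detected fusion matches one of the listed Ph-like ALL fusions. The names `PH_LIKE_FUSIONS` and `is_table1_fusion` are hypothetical, and matching the two genes in either order is an assumption made for illustration.

```python
# Kinase gene -> fusion partner genes, transcribed from Table 1.
PH_LIKE_FUSIONS = {
    "ABL1": {"ETV6", "NUP214"},
    "CSF1R": {"MEF2D"},
    "PDGFRB": {"EBF1", "ETV6"},
    "PDGFRA": {"FIP1L1"},
    "CRLF2": {"IGH"},
    "JAK2": {"BCR", "EBF1", "ETV6", "PAX5", "PCM1"},
    "EPOR": {"IGH", "IGK"},
    "NTRK3": {"ETV6"},
    "FGFR1": {"BCR"},
}


def is_table1_fusion(gene_a: str, gene_b: str) -> bool:
    """Return True if the gene pair matches a row of Table 1, in either order."""
    return (
        gene_b in PH_LIKE_FUSIONS.get(gene_a, set())
        or gene_a in PH_LIKE_FUSIONS.get(gene_b, set())
    )


print(is_table1_fusion("IGH", "CRLF2"))  # listed in Table 1
print(is_table1_fusion("BCR", "ABL1"))   # not a Table 1 pair
```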
The present invention provides a method of providing information for diagnosing leukemia, comprising: selecting target genes by target capture hybridization using the next-generation sequencing panel and sequencing them to obtain read data; determining from the read data whether PHB and PHB2 are overexpressed; and detecting from the read data a fusion involving any one gene selected from the group consisting of IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
The sequencing panel of the present invention is as described above.
The target capture hybridization method of the present invention is a target enrichment method that selects the gene targets of interest before abnormalities (variants, gene fusions, abnormal expression) in those genes are detected, and may use probes that bind specifically to the target genes. For example, target genes may be selected with a sequencing panel comprising probes that bind specifically to particular genes.
The target capture hybridization of the present invention is performed after RNA extraction from the genome to be analyzed, cDNA synthesis, adapter ligation and PCR.
The read data of the present invention is either raw read data or adjusted data obtained by adjusting the raw read data.
Adjusting the raw read data may consist of filtering the raw read data to retain only data whose quality score meets or exceeds a given threshold. The quality score is a numerical representation of the estimated error probability in the raw data and may specifically be the Phred score, an indicator of the quality of each base.
Determining overexpression may comprise: aligning the read data to a reference sequence to obtain SAM/BAM data; computing the expression of each gene from the SAM/BAM data to obtain GTF data; and normalizing the GTF data.
HISAT2 may be used to align the read data to the reference sequence and obtain the SAM/BAM data, StringTie may be used to compute the expression of each gene from the SAM/BAM data and obtain the GTF data, and DESeq2 may be used to normalize the GTF data.
Fusion detection may consist of identifying fusion genes by comparing the read data with a reference sequence. The reference sequence may be, for example, the reference sequence within each of the algorithms or software programs Bowtie, STAR, Blat or Bowtie2. STAR-Fusion or FusionCatcher may be used as the fusion gene identification tool.
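The two determinations of this method (PHB and PHB2 overexpression, and a fusion involving one of the eleven listed genes) can be combined as in the following sketch. It is illustrative only: the text does not state a numerical cutoff for overexpression, so the `min_log2_fc` threshold, the input shapes and the function name `supports_ph_like` are all assumptions.

```python
# Genes whose fusions are looked for in this method, as listed in the text.
FUSION_GENES = {
    "IGH", "ABL1", "ABL2", "CRLF2", "CSF1R", "EPOR",
    "ETV6", "FGFR1", "JAK2", "PDGFRB", "MYC",
}


def supports_ph_like(log2_fold_change, fusions, min_log2_fc=1.0):
    """Combine the two findings described in the method.

    log2_fold_change: dict of gene -> normalized log2 fold change vs. controls.
    fusions: iterable of (gene_a, gene_b) pairs reported by a fusion caller.
    min_log2_fc: hypothetical overexpression cutoff (not given in the source).
    """
    overexpressed = all(
        log2_fold_change.get(gene, 0.0) >= min_log2_fc for gene in ("PHB", "PHB2")
    )
    has_listed_fusion = any(FUSION_GENES & set(pair) for pair in fusions)
    return overexpressed and has_listed_fusion


# Made-up example values:
print(supports_ph_like({"PHB": 1.8, "PHB2": 1.2}, [("IGH", "CRLF2")]))
```

The result is a piece of information to be weighed alongside other findings, in keeping with the method's stated purpose of providing information for diagnosis.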
The present invention also provides a method of providing information for diagnosing leukemia, comprising: performing next-generation sequencing with the next-generation sequencing panel for diagnosing leukemia to obtain raw data; an adjustment step; detecting fusions of each gene; detecting variants of each gene; and determining the expression of each gene.
Next-generation sequencing (NGS) is a method of rapidly decoding vast amounts of genomic information by dividing the genome into a very large number of fragments, sequencing each fragment and assembling the results, and consists of the steps of RNA extraction, cDNA synthesis, adapter ligation, target capture hybridization and sequencing. Each step may be performed by methods known in the art. Specifically, cDNA is synthesized from RNA extracted from a patient blood sample, adapters are attached, PCR and target capture hybridization are performed, and the library thus prepared is sequenced.
The adjustment step may consist of filtering the raw data to retain only data whose quality score meets or exceeds a given threshold. The quality score is a numerical representation of the estimated error probability in the raw data and may specifically be the Phred score, an indicator of the quality of each base. A file that records the base sequence of each sequencing read together with its Phred scores is called a FASTQ file.
A Phred score of 20 (Q20) means that the probability that the base call is wrong is 1%, and a score of 30 (Q30) corresponds to an error probability of 0.1%. In general, bases with a Phred score of 30 or higher are considered to have excellent sequencing quality and are used in downstream analysis.
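The relation between Phred score and error probability quoted above follows from Q = -10 · log10(P). The sketch below converts scores to probabilities and decodes a FASTQ quality string; it assumes the common Phred+33 ASCII encoding, and the function names are hypothetical.

```python
def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred score, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def decode_fastq_quality(quality: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


def fraction_at_least(quality: str, q_min: int = 30) -> float:
    """Fraction of bases in a read whose Phred score is q_min or higher."""
    scores = decode_fastq_quality(quality)
    if not scores:
        return 0.0
    return sum(score >= q_min for score in scores) / len(scores)


print(phred_to_error_prob(20), phred_to_error_prob(30))
print(decode_fastq_quality("I5?"))
```

A read-level filter could then keep reads whose `fraction_at_least` value exceeds a chosen cutoff; the text does not specify such a cutoff, so any value used would be a local choice.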
The step of detecting gene fusions may comprise aligning the data to the reference sequence within each program using the sequence alignment algorithms or software Bowtie, STAR, Blat or Bowtie2, and identifying gene fusions with a fusion gene identification tool (STAR-Fusion, FusionCatcher).
The sequence alignment algorithms Bowtie and Bowtie2 are available at http://bowtie-bio.sourceforge.net/bowtie2/index.shtml, STAR at https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/03_alignment.html, and Blat at https://genome.ucsc.edu/goldenPath/help/blatSpec.html.
The fusion gene identification tool STAR-Fusion is a program that identifies candidate fusion transcripts using the STAR aligner and is available at https://github.com/STAR-Fusion/STAR-Fusion/wiki#RunnningStarF. FusionCatcher is software that identifies somatic fusion genes, translocations and chimeras in RNA-seq data and is available at https://github.com/ndaniel/fusioncatcher.
The step of detecting gene variants may comprise aligning the sequences of the adjusted data to obtain SAM/BAM data, identifying and marking duplicates in the SAM/BAM data with Picard, and calling SNVs and indels with FreeBayes on the aligned, sorted and deduplicated BAM data.
The SAM data is a tab-delimited text file containing sequence alignment data, that is, alignment and mapping information; it is a file in which the FASTQ reads obtained by next-generation sequencing are mapped back to the transcriptome or genome sequence. BAM data is a compressed file holding the same information as SAM data; because it is smaller than SAM, BAM files are the format mainly used by the major programs that handle large volumes of next-generation sequencing data.
Picard is a tool for controlling the technical bias caused by duplicates, that is, uninformative reads that arise when a single read or fragment is abnormally amplified during PCR in library preparation, and is available at https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates.
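In the SAM format, duplicate marking is recorded in the FLAG field (the second tab-separated column), where bit 0x400 denotes a PCR or optical duplicate. The following minimal sketch reads that bit from a SAM record; the example records are made up for illustration and the function name is hypothetical.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


def is_marked_duplicate(sam_line: str) -> bool:
    """Return True if a SAM alignment record has the duplicate bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment record has at least 11 fields")
    flag = int(fields[1])
    return bool(flag & DUPLICATE_FLAG)


# Made-up records: QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL
original = "read1\t83\tchr22\t1000\t60\t4M\t=\t900\t-104\tACGT\tIIII"
duplicate = "read2\t1107\tchr22\t1000\t60\t4M\t=\t900\t-104\tACGT\tIIII"
print(is_marked_duplicate(original), is_marked_duplicate(duplicate))
```

Downstream tools can then skip flagged reads without the reads having been physically removed from the file.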
FreeBayes is a haplotype-based genetic variant detection tool that is useful for calling variants in a population, and is available at https://github.com/freebayes/freebayes.
SNV (single nucleotide variant) refers to a variation of a single base and is a concept that encompasses SNP (single nucleotide polymorphism, a variant present at a frequency of 1% or more in a population). Indel (insertion/deletion) means that a short base sequence has been inserted into or deleted from the genome.
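The distinction between SNVs and indels can be made concrete by comparing the lengths of the reference and alternate alleles of a called variant, as in this minimal sketch. The function name and the label strings are hypothetical, and the input is assumed to be a single REF/ALT allele pair with REF different from ALT.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"          # one base replaced by another
    if len(ref) < len(alt):
        return "insertion"    # alternate allele is longer
    if len(ref) > len(alt):
        return "deletion"     # alternate allele is shorter
    return "other"            # e.g. equal-length multi-base substitution


for ref, alt in [("A", "G"), ("C", "CTT"), ("GAT", "G"), ("AC", "GT")]:
    print(ref, alt, classify_variant(ref, alt))
```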
The step of determining gene expression may comprise: aligning the adjusted data to a reference sequence with HISAT2 to obtain SAM/BAM data; computing the expression of each gene with StringTie to obtain GTF data; and normalizing the GTF data with DESeq2.
HISAT2 is an alignment program for mapping next-generation sequencing reads to a population of human genomes and to a single reference genome provided by the program, and is available at http://daehwankimlab.github.io/hisat2/.
StringTie is a program that efficiently assembles RNA-seq data into potential transcripts. Specifically, it can assemble and quantify full-length transcripts representing multiple splice variants for each gene locus, and is available at http://ccb.jhu.edu/software/stringtie/index.shtml.
GTF (gene transfer format) data means data containing annotation information for genes.
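A GTF record is a single line of nine tab-separated columns, the last of which holds `key "value";` attribute pairs. The sketch below parses one such line; the example record, including its coordinates and expression values, is made up for illustration, and the attribute names shown (`cov`, `FPKM`, `TPM`) are those commonly seen in StringTie output rather than values taken from the text.

```python
import re

GTF_COLUMNS = (
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attributes",
)
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line: str) -> dict:
    """Parse one GTF record into a dict, with attributes as a nested dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(GTF_COLUMNS):
        raise ValueError("a GTF record has exactly 9 tab-separated columns")
    record = dict(zip(GTF_COLUMNS, values))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = dict(_ATTRIBUTE.findall(values[8]))
    return record


# Made-up example record (coordinates and values are illustrative only).
example = (
    "chr22\tStringTie\ttranscript\t1000\t5000\t1000\t+\t.\t"
    'gene_id "BCR"; transcript_id "BCR.1"; cov "35.2"; FPKM "12.4"; TPM "20.1";'
)
record = parse_gtf_line(example)
print(record["attributes"]["gene_id"], record["attributes"]["TPM"])
```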
DESeq2 performs an internal normalization in which the geometric mean of each gene is computed across all samples and the count of each gene in each sample is then divided by that mean; the median of these ratios within a sample is that sample's size factor. Normalization addresses the problem that, because of anomalies such as data duplication, the data may not reflect the actual expression level of a gene, and is an essential step in gene expression analysis.
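The size-factor calculation described above (often called median-of-ratios normalization) can be sketched in plain Python. This is an illustration of the arithmetic only, not the DESeq2 implementation; the function name, the input layout and the example counts are assumptions, and genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts: dict) -> list:
    """Median-of-ratios size factors.

    counts: dict of gene -> list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Log of the per-gene geometric mean across samples (zero-count genes skipped).
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - m for g, m in log_geo_means.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Made-up counts: sample 2 was sequenced twice as deeply as sample 1.
raw = {"PHB": [100, 200], "PHB2": [50, 100], "GAPDH": [10, 20]}
sf = size_factors(raw)
normalized = {g: [c / f for c, f in zip(row, sf)] for g, row in raw.items()}
print(sf, normalized["PHB"])
```

After dividing by the size factors, the two samples in this example have equal normalized counts for every gene, which is the intended effect of removing the depth difference.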
Hereinafter, the present invention is described in more detail through examples.
Example 1. Next-generation sequencing panel targeting 84 genes
1. Experimental methods
(1) Sample collection and preparation
The diagnostic samples comprised 1 human reference RNA (Cat no. 740000, Agilent Technologies), 1 human reference genome (NA12878), 4 validation samples with recurrent gene fusions, 30 clinical samples with or without recurrent gene fusions, and 14 normal peripheral blood (PB) samples without clonal hematologic disorders. All validation and clinical samples were derived from patients with hematologic malignancies and were included when the patient's diagnosis was well characterized. Patient diagnoses were made according to the WHO classification on the basis of microscopic findings in bone marrow (BM) aspirates and trephine biopsy sections, tissue immunostaining, immunophenotyping, chromosome analysis, FISH, multiplex RT-PCR, real-time PCR, and the NGS DNA sequencing commonly used in clinical laboratories. The collection of patient and normal samples for the study was approved by the Institutional Review Board of Chonnam National University Hwasun Hospital (approval no. CNUHH-2020-091).
Patient samples were collected in EDTA (ethylenediaminetetraacetic acid) tubes. For the 4 validation samples, 8 patient BM aspirate samples with a high leukemic cell fraction (43% to 96% of cells) and recurrent gene fusions were pooled in pairs carrying the same fusion. The validation samples contained the BCR-ABL1, PML-RARA, RUNX1-RUNX1T1 and CBFB-MYH11 fusions. The clinical samples comprised 27 BM aspirates and 3 PB samples, consisting of 6 acute myeloid leukemias (AML), 9 B-lymphoblastic leukemias/lymphomas (B-ALL), 4 T-lymphoblastic leukemias/lymphomas (T-ALL), 3 mature B-cell neoplasms, 6 MPNs, 1 myelodysplastic/myeloproliferative neoplasm (MDS/MPN), and 1 myeloid/lymphoid neoplasm with eosinophilia and gene rearrangement. The mononuclear cell layer was isolated from blood samples using Lymphoprep (Alere Technologies AS). RNA was extracted with the RNAqueous Isolation kit (Thermo Fisher Scientific) according to the manufacturer's instructions.
(2) Design and evaluation of the target capture panel
A total of 84 genes associated with hematologic cancers (AML, ALL, lymphoma, MPN, and myeloid/lymphoid neoplasms with gene rearrangement) were selected on the basis of previous literature.
The 84 genes and the types of leukemia associated with each gene are listed in Tables 2 and 3.
The 84 genes targeted by the panel of the present invention | ||||||||
ABL1 | CCND1 | EP300 | GUSB | JAK2 | MRTFA | PDGFRA | RBM15 | TRG |
ABL2 | CCND2 | EPOR | HBS1L | KMT2A | MYC | PDGFRB | RUNX1 | TYK2 |
AFF1 | CCND3 | ERG | HPRT1 | MAF | MYH11 | PHB | RUNX1T1 | WT1 |
ALK | CRBN | ETV6 | IGH | MAFA | NSD2 | PHB2 | SDHA | ZNF384 |
BAALC | CREBBP | FGFR1 | IGK | MAFB | NTRK3 | PICALM | TBP | |
BCL2 | CRLF2 | FGFR3 | IGL | MECOM | NUP214 | PML | TCF3 | |
BCL6 | CSF1R | FIP1L1 | IKZF1 | MEF2D | NUP98 | PPIA | TCL1A | |
BCL9 | DEK | FUS | IL2RB | MLF1 | PAX5 | PSMB2 | TP63 | |
BCR | DUSP22 | GAPDH | IL3 | MLLT10 | PBX1 | RAB7A | TRA | |
CBFB | EBF1 | GATA2 | IRF4 | MLLT3 | PCM1 | RARA | TRB | |
No. | Subtype of leukemia | Related genes |
1 | Acute myeloid leukemia (AML) | ABL2, AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, CRLF2, DEK, DUSP22, EBF1, EPOR, ETV6, FGFR3, FGFR1, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYC, MYH11, NUP98, PCM1, PHB, PHB2, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA, WT1 |
2 | B-lymphoblastic leukemia/lymphoma (B-ALL) | ABL1, ABL2, AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, CRLF2, DEK, DUSP22, EP300, EPOR, ERG, FGFR1, FGFR3, FIP1L1, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, KMT2A, MAF, MEF2D, MLLT10, MLLT3, MYC, NSD2, NTRK3, PAX5, PBX1, PCM1, PHB, PHB2, PPIA, RAB7A, TCF3, ZNF384 |
3 | T-lymphoblastic leukemia/lymphoma (T-ALL) | ABL1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CRLF2, DEK, DUSP22, EPOR, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, MYC, NSD2, NTRK3, NUP214, PCM1, PHB, PHB2, TBP, TCL1A, TRB, TRG, TYK2 |
4 | Mature B-cell neoplasm (MBN) | ABL1, BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, CSF1R, TP63, DEK, DUSP22, ETV6, FGFR1, FGFR3, IGH, IGK, IGL, IKZF1, KMT2A, MYC, NTRK3, PAX5, PBX1, PHB, PHB2, PPIA, RAB7A, TCF3, TP63, ZNF384 |
5 | Myeloproliferative neoplasm (MPN) | ABL1, AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, EPOR, ETV6, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, MYC, PHB, PHB2, PSMB2, TP63 |
6 | Myelodysplastic/myeloproliferative neoplasm (MDS/MPN) | ABL1, ABL2, AFF1, BCR, CBFB, CRBN, CREBBP, DEK, ETV6, FGFR3, GATA2, IKZF1, MAFA, MAFB, MYC, PCM1, PHB, PHB2 |
7 | Myeloid/lymphoid neoplasm with eosinophilia and gene rearrangement (MLN) | FGFR1, JAK2, PDGFRB, MYC, PDGFRA, PHB, PHB2 |
8 | Philadelphia chromosome-like ALL | ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB, MYC |
Custom oligonucleotide probes were designed to capture the target genes. To assess whether the probes captured the 84 panel genes uniformly, a DNA template (human reference genome NA12878) was sequenced. The overall mean coverage plot was inspected visually using deepTools, and coverage uniformity (%) was calculated as the percentage of base positions whose coverage was higher than 0.2 times the mean coverage across the target regions.
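The uniformity metric described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `coverage_uniformity` and the example depths are hypothetical, and the per-base depths are assumed to have been extracted beforehand from the aligned reads over the target regions.

```python
from statistics import mean


def coverage_uniformity(depths, fraction=0.2):
    """Return the percentage of positions whose depth exceeds
    `fraction` times the mean depth of all positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    threshold = fraction * mean(depths)
    covered = sum(1 for depth in depths if depth > threshold)
    return 100.0 * covered / len(depths)


# Hypothetical per-base depths over a small target region (not study data).
example_depths = [100, 120, 95, 110, 5, 105, 98, 102, 115, 90]
print(f"uniformity: {coverage_uniformity(example_depths):.1f}%")
```

In this toy input the mean depth is 94, so the threshold is 18.8 and only the position with depth 5 falls below it.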
(3) Analytical validation matrix and comparative analysis
Thirty analytical validation matrices were prepared before the experiment (Table 4). To evaluate within-run repeatability and carryover, one reference RNA derived from cancer cell lines, one normal sample, four validation samples with high tumor burden, and two replicates were tested in a single run (run 1). For the dilution test, two validation samples containing BCR-ABL1 and PML-RARA were serially diluted two-fold (1:2, 1:4, 1:8) from an initial input of 1,500 ng and tested (run 2). Each replicate was tested again for between-run validation (run 3). Thereafter, 30 clinical samples and 13 normal samples were compared against conventional FISH or RT-PCR methods for fusion gene detection, and were further tested for expression and variant analysis.
Table 4. Analytical validation matrix of targeted RNA sequencing for gene fusion detection
Run no. | Sample index | |||||||
 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
1 | Human reference RNA* | Normal sample | VS1 (BCR-ABL1) | VS1 replicate (BCR-ABL1) | VS2 (PML-RARA) | VS2 replicate (PML-RARA) | VS3 (RUNX1-RUNX1T1) | VS4 (CBFB-MYH11) |
2 | VS1-D1 [1:2 dilution] | VS2-D1 [1:2 dilution] | VS1-D2 [1:4 dilution] | VS2-D2 [1:4 dilution] | VS1-D3 [1:8 dilution] | VS2-D3 [1:8 dilution] | VS1 replicate (BCR-ABL1) | VS2 replicate (PML-RARA) |
3 | VS1-D1 replicate [1:2 dilution] | VS2-D1 replicate [1:2 dilution] | VS1-D2 replicate [1:4 dilution] | VS2-D2 replicate [1:4 dilution] | VS1-D3 replicate [1:8 dilution] | VS2-D3 replicate [1:8 dilution] | VS1 replicate (BCR-ABL1) | VS2 replicate (PML-RARA) |
* Universal human reference RNA (Cat. no. 740000, Agilent Technologies). Gene fusions in parentheses are the known fusions in the validation samples, previously detected by multiplex RT-PCR or fluorescence in situ hybridization. VS (validation sample with a known fusion); D (diluted sample).
(4) Library preparation and targeted RNA-seq
cDNA synthesis, library preparation, and capture hybridization were performed using the HEMEaccuTest RNA kit (NGeneBio, Seoul, Korea). Ribosomal RNA was removed from 800 to 1,500 ng of extracted total RNA using the NEBNext® rRNA Depletion Kit (NEB), after which cDNA was synthesized and purified. Adapter ligation, PCR enrichment, and target capture hybridization were performed according to the manufacturer's instructions. The concentration and size of the libraries were measured using a Qubit 2.0 Fluorometer (Invitrogen) and a 4200 TapeStation system (Agilent Technologies), respectively. The libraries were sequenced as 150 bp paired-end reads on the MiSeqDx (Illumina) using the MiSeq Reagent Kit v3 (300 cycles).
(5) Bioinformatics pipeline
The bioinformatics pipeline used in this study is summarized in Figure 1. The sequencing output files of paired-end reads in FASTQ format were trimmed at a quality score of Q10. After trimming, fusion transcripts were identified using both the STAR-Fusion and FusionCatcher algorithms.
For fusion detection, predicted fusions were examined using two parameters: fusion read counts and FFPM (fusion fragments per million). FFPM is a normalized counterpart of the raw fusion read count and is provided by STAR-Fusion. To detect nucleotide variants, the aligned BAM files generated by STAR were processed with Picard, and variant calling was then performed with FreeBayes. Variants were annotated using ANNOVAR and filtered on the annotated information, namely region (exonic and splicing), function (non-synonymous, missense, nonsense, frameshift), and frequency (less than 1% in population databases, and pathogenic or likely pathogenic in disease databases). The filtered variants were graded according to the level of evidence for clinical significance, and tier 1 and tier 2 variants with evidence of clinical significance were finally selected.
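To illustrate why a per-million normalization such as FFPM is easier to compare across libraries than a raw read count, the following minimal Python sketch assumes the usual definition (fusion-supporting fragments divided by total sequenced fragments, scaled to one million). The function name `ffpm` and the numbers are hypothetical and are not taken from the study or from the STAR-Fusion source code.

```python
def ffpm(fusion_fragments, total_fragments):
    """Fusion fragments per million total fragments (assumed definition)."""
    if total_fragments <= 0:
        raise ValueError("total_fragments must be positive")
    return fusion_fragments * 1_000_000 / total_fragments


# The same raw support (50 fragments) in two libraries of different depth.
shallow_library = ffpm(50, 10_000_000)
deep_library = ffpm(50, 40_000_000)
print(shallow_library, deep_library)
```

With identical raw support, the deeper library yields a four-fold lower normalized value, which is the kind of depth effect the normalization is meant to remove.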
For expression analysis, the trimmed reads were aligned using HISAT2, and StringTie was then used for transcript assembly and quantification of expression levels. The resulting alignment files were converted to BAM format using Samtools. The read data were then normalized using DESeq2. The log2 fold change was calculated from the normalized read count of each clinical sample and the mean of the 14 normal controls.
To determine dysregulated genes, an arbitrary log2-fold-change cutoff of ±2.0 was set. Throughout the entire process, data were mapped to the human reference genome (GRCh37/hg19).
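The fold-change step can be sketched as follows. This is a simplified illustration, not the DESeq2 workflow itself: it assumes the counts are already normalized, adds a pseudocount of 1 to avoid division by zero (an assumption not stated in the source), and treats the ±2.0 cutoff as inclusive (also an assumption). The function names are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(sample_count, control_counts, pseudocount=1.0):
    """log2 ratio of a sample's normalized count to the mean of controls."""
    control_mean = mean(control_counts)
    return math.log2((sample_count + pseudocount) / (control_mean + pseudocount))


def classify_expression(lfc, cutoff=2.0):
    """Label a gene as up- or down-regulated using a symmetric cutoff."""
    if lfc >= cutoff:
        return "up"
    if lfc <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical normalized counts: one patient sample vs. three controls.
lfc = log2_fold_change(799, [99, 99, 99])
print(lfc, classify_expression(lfc))
```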
(6) Filtering and prioritization of fusion candidates
Predicted fusion candidates were filtered to exclude false-positive results and were then classified with a tiered grading system according to the strength of evidence in the literature and relevance to the clinical presentation. Under the stepwise filtering criteria, a fusion candidate was regarded as a true fusion when it met all of the following: i) it was supported by a minimum read count (FFPM ≥ 0.1 and junction read count ≥ 1); ii) it did not correspond to a short repeat, pseudogene, or read-through, and was not found in healthy populations or normal samples; iii) it affected the expression level of a fusion partner gene or produced an in-frame fusion; and iv) it was detected by both fusion detection algorithms (the FusionCatcher final result file, and the STAR-Fusion preliminary and final result files).
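The four criteria above can be expressed as a single predicate. The sketch below is an illustration only: the `FusionCandidate` record, its field names, and the example values are hypothetical, and the real pipeline derives these flags from the STAR-Fusion and FusionCatcher output files rather than from a hand-built record.

```python
from dataclasses import dataclass

REQUIRED_CALLERS = frozenset({"STAR-Fusion", "FusionCatcher"})


@dataclass
class FusionCandidate:
    name: str
    ffpm: float
    junction_reads: int
    is_artifact: bool                 # short repeat, pseudogene, or read-through
    seen_in_normals: bool             # found in healthy populations or normal samples
    in_frame: bool
    alters_partner_expression: bool
    callers: frozenset                # names of algorithms that reported the fusion


def is_true_fusion(c, min_ffpm=0.1, min_junction_reads=1):
    """Apply criteria i) to iv); all four must hold."""
    supported = c.ffpm >= min_ffpm and c.junction_reads >= min_junction_reads
    not_artifact = not (c.is_artifact or c.seen_in_normals)
    functional = c.in_frame or c.alters_partner_expression
    both_callers = REQUIRED_CALLERS <= c.callers
    return supported and not_artifact and functional and both_callers


# Hypothetical candidates: a well-supported fusion and a low-level carryover.
strong = FusionCandidate("BCR-ABL1", 32.9, 410, False, False, True, False,
                         frozenset({"STAR-Fusion", "FusionCatcher"}))
carryover = FusionCandidate("PML-RARA", 0.05, 1, False, False, True, False,
                            frozenset({"FusionCatcher"}))
print(is_true_fusion(strong), is_true_fusion(carryover))
```

Keeping each criterion as a named boolean makes it straightforward to report which rule rejected a candidate, which is useful when reviewing borderline calls.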
Predicted true-positive fusions were classified according to previously published guidelines for interpreting NGS results in cancer.
Tier 1 fusions (supported by well-powered studies with consensus among experts in the field) and tier 2 fusions (supported by multiple small published studies with some consensus, by preclinical trials, or by a few case reports without consensus) were selected using the previously described tiered grading system. For tier assignment, expert-consensus studies and fusion databases, including ChimerDB and the Mitelman database, were used.
If a predicted true-positive fusion was not associated with the patient's cancer type, it was not regarded as tier 1 or tier 2 even if it appeared in well-known studies and disease databases. Among all predicted true fusions of tier 1 and tier 2, fusions verified by multiplex RT-PCR, FISH, or direct sequencing were regarded as confirmed fusions, and fusions not identified by any other method were regarded as putative fusions.
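The reporting rule in the preceding paragraph reduces to a small decision function. The sketch below is illustrative; the function name `report_status` and its return labels are hypothetical, and tier (grade) assignment itself is assumed to have been done manually from the literature and databases.

```python
def report_status(tier, matches_patient_cancer_type, orthogonally_confirmed):
    """Classify a predicted true fusion for reporting.

    tier: evidence tier assigned from literature/databases (1, 2, or lower).
    matches_patient_cancer_type: fusion is associated with the patient's cancer type.
    orthogonally_confirmed: verified by multiplex RT-PCR, FISH, or direct sequencing.
    """
    if tier not in (1, 2) or not matches_patient_cancer_type:
        return "not reported"
    return "confirmed" if orthogonally_confirmed else "putative"


print(report_status(1, True, True))    # tier 1, verified by another method
print(report_status(2, True, False))   # tier 2, no orthogonal verification
print(report_status(1, False, True))   # unrelated to the patient's cancer type
```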
(7) Fusion and mutation detection methods
Multiplex RT-PCR was performed using the HemaVision kit (DNA Technology), which targets 28 translocations and 145 breakpoints. FISH was performed using dual-fusion probes targeting IGH-CCND1 (MetaSystems), BCR-ABL1 (MetaSystems), RUNX1-RUNX1T1 (Abbott Molecular), and PML-RARA (Abbott Molecular), and break-apart probes for ETV6-RUNX1 (Abbott Molecular), PDGFRB (MetaSystems), CBFB (MetaSystems), and KMT2A (Abbott Molecular). When a fusion predicted by targeted RNA-seq was not verified by multiplex RT-PCR or FISH, direct sequencing was attempted. cDNA was synthesized from 500 to 1,000 ng of total RNA using the PrimeScript™ II 1st strand cDNA Synthesis Kit (Takara). Using Takara ExTaq (Takara), 1 μL of cDNA was amplified with the following primers.
Primer | Sequence | SEQ ID NO: |
PAX5-F | 5'-AGATGCGGGGAGACTTGTT-3' | 1 |
ARHGAP22-R | 5'-CTGCACCCAGTCCTCCATGT-3' | 2 |
DACH1-R | 5'-GCTCATTGCCATGGTGACAG-3' | 3 |
PICALM-F | 5'-ACCCCCTGTAATGGCCTATC-3' | 4 |
MLLT10-R | 5'-CAGTGGCTGCTTTGCTTTCTC-3' | 5 |
MECOM-F | 5'-CTGCATAGATGCCAGTCAACCA-3' | 6 |
MBNL1-R | 5'-CAGGCATCATGGCATTGGCTA-3' | 7 |
MLLT3-R | 5'-TCGTGCAAGTGGAAGACGAC-3' | 8 |
CCDC6-F | 5'-TCCGAGAGTGAGTCCAGCTT-3' | 9 |
PDGFRB-R | 5'-CGGATCTCGTAACGTGGCTT-3' | 10 |
The size of the PCR products was measured using a 4200 TapeStation system (Agilent Technologies). The 548 bp GAPDH gene product was used as a positive control in all PCR steps.
Direct sequencing was performed by Macrogen (Seoul, Korea) on the PCR products using the same forward and reverse primers. Sequencing files were analyzed with SeqMan software (DNASTAR). Where DNA-based PCR or sequencing results were available, all variants detected by targeted RNA-seq were verified by comparison with the results of DNA-based methods, including quantitative real-time PCR (JAK2 MutaQuant assay kit, Ipsogen) and sequencing (HEMEaccuTest DNA kit; NGeneBio).
(8) Statistical analysis
The Wilcoxon rank-sum test was used to compare the mean values of carryover fusions and true fusions. Linear regression was performed to evaluate repeatability and linearity. Hierarchical clustering was performed with complete linkage, using Euclidean distance as the proximity measure. All statistical analyses were performed using RStudio (RStudio, Inc.).
2. Experimental results
(1) Analytical validation using validation samples
To examine coverage of the target genes included in the panel, a DNA template (human reference genome NA12878) was used, because RNA samples can show varying patterns depending on the gene expression and fusions present in each sample.
The coverage plot showed uniform mean coverage from the start to the end positions of the target transcripts. The coverage uniformity (the percentage of base pairs above 0.2× the overall mean depth) was calculated as 99.8%, indicating uniform coverage of the target genes in the panel. Figure 2 shows the analytical performance of the targeted RNA-seq of the present invention. In the within-run test (run 1 in Table 4), all expected fusions were reliably detected, after applying the filtering strategy, in the 6 positive samples and the 1 reference RNA with known fusions. Before filtering, carryover fusions, including BCR-ABL1, PML-RARA, RUNX1-RUNX1T1, and CBFB-MYH11, were observed in all 8 samples, in each case in samples that did not contain the respective fusion.
The mean log2 FFPM values for carryover fusions and true fusions in STAR-Fusion were -0.37 and 5.04, respectively, and the mean log2 fusion-supporting read counts for carryover fusions and true fusions in FusionCatcher were 2.30 and 9.62, respectively. Carryover fusions showed significantly lower log2 FFPM and log2 fusion-supporting read values than true fusions (P < 0.001; Figures 2A and 2B) and were filtered out because of their low read counts.
In both the within-run and between-run tests (runs 1 to 3 in Table 4), the read counts of all replicates showed reliable repeatability (r² = 0.9655; Figure 2C). When the normalized FFPM values provided by STAR-Fusion were used, repeatability was higher than with read counts alone (r² = 0.9874; Figure 2D). In the two-fold dilution test (runs 2 and 3 in Table 4), the two known fusions (BCR-ABL1 and PML-RARA) were reliably detected with high FFPM (> 9.0) through three successive two-fold dilutions (1:8 dilution). The FFPM of the diluted samples containing BCR-ABL1 and PML-RARA showed a linear log2 fold change (r² = 0.9852 and 0.9447, respectively; Figures 2E and 2F), and the limit of detection was predicted to lie at four to five two-fold dilutions (1:16 to 1:32), assuming an FFPM cutoff of 0.1.
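The linearity assessment and the extrapolated limit of detection can be sketched with an ordinary least-squares fit of log2 FFPM against the number of two-fold dilution steps. The code below is a minimal illustration with made-up, idealized FFPM values (not the values measured in the study); the function names are hypothetical, and the extrapolation simply finds where the fitted line crosses the log2 of the assumed FFPM cutoff.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares: return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def dilutions_to_cutoff(slope, intercept, ffpm_cutoff=0.1):
    """Number of two-fold dilution steps at which the fit reaches the cutoff."""
    return (math.log2(ffpm_cutoff) - intercept) / slope


# Hypothetical FFPM at 0, 1, 2, 3 two-fold dilution steps (undiluted to 1:8).
steps = [0, 1, 2, 3]
log2_ffpm = [math.log2(v) for v in [16.0, 8.0, 4.0, 2.0]]

slope, intercept, r_squared = linear_fit(steps, log2_ffpm)
print(slope, intercept, r_squared)
print(dilutions_to_cutoff(slope, intercept))
```

An ideal two-fold series gives a slope of -1 on the log2 scale; a measured slope that departs from -1 or a low r² would indicate non-linear recovery across the dilution range.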
(2) Gene fusion detection using clinical samples
In the first step, approximately 227 million transcript sequence reads were generated from the 30 clinical samples. From the raw reads, a total of 1,243 and 3,363 fusion transcripts meeting the minimum read count were predicted by STAR-Fusion and FusionCatcher, respectively. After applying the filtering and prioritization strategy described in the experimental methods, 40 and 211 fusion transcripts, including isoforms and reciprocal fusions, were selected by STAR-Fusion and FusionCatcher, respectively, as clinically significant reportable fusions (tier 1 and tier 2). After selecting the predominant isoforms and disregarding reciprocal fusions, a total of 30 fusions were finally curated.
Table 6 shows the final results of targeted RNA-seq compared with conventional methods in the 30 clinical samples: 6 AML, 9 B-ALL, 4 T-ALL, 3 mature B-cell neoplasms, 6 MPN, 1 MDS/MPN, and 1 myeloid/lymphoid neoplasm with PDGFRB rearrangement. Of the 13 known fusions, targeted RNA-seq detected 12 identical fusions and 1 reciprocal fusion, CCND1-IGH. In the one sample with a PDGFRB rearrangement, targeted RNA-seq, unlike conventional FISH, identified the partner gene as CCDC6, which was also confirmed by direct sequencing.
Table 6. Comparison of results between conventional methods (FISH or multiplex RT-PCR) and targeted RNA-seq using 30 clinical samples of hematologic malignancies
Sample no. | Diagnosis | FISH or multiplex RT-PCR | Targeted RNA-seq*: confirmed fusion◈ | Targeted RNA-seq*: putative fusion▣ | Targeted RNA-seq*: variant |
CS1 | AML | KMT2A-MLLT3 | KMT2A-MLLT3 | | |
CS2 | AML | PML-RARA | PML-RARA | | GATA2 p.I379Gfs*85, WT1 p.T363Nfs*27, WT1 p.P271Rfs*20 |
CS3 | AML | PML-RARA | PML-RARA | NUP98-TOP2B | WT1 p.K250Qfs*3 |
CS4 | AML | Negative | Negative | | |
CS5 | AML | Negative | Negative | | |
CS6 | AML | Negative | Negative | | |
CS7 | B-ALL | BCR-ABL1 | BCR-ABL1 | | ABL1 p.E255K |
CS8 | B-ALL | BCR-ABL1 | BCR-ABL1 | | |
CS9 | B-ALL | BCR-ABL1 | BCR-ABL1 | P2RY8-CRLF2§ | |
CS10 | B-ALL | KMT2A-AFF1 | KMT2A-AFF1 | | |
CS11 | B-ALL | ETV6-RUNX1 | ETV6-RUNX1 | ERG-DYRK1A§, IGH-PAX5§ | |
CS12 | B-ALL | Negative | PAX5-ARHGAP22 | IGH-PAX5§ | |
CS13 | B-ALL | Negative | PAX5-DACH1 | | |
CS14 | B-ALL | Negative | Negative | IGH-CRLF2§, P2RY8-CRLF2§ | JAK2 p.R683G |
CS15 | B-ALL | Negative | Negative | | |
CS16 | T-ALL | Negative | PICALM-MLLT10 | | |
CS17 | T-ALL | Negative | Negative | | |
CS18 | T-ALL | Negative | Negative | NUP214-ABL1§ | RUNX1 p.R162K |
CS19 | T-ALL | Negative | Negative | | |
CS20 | MCL | IGH-CCND1 | CCND1-IGH | | |
CS21 | B-CLL | NT | Negative | IGH-BCL2§, IGH-PAX5§ | |
CS22 | B-CLL | NT | Negative | IGH-BCL2§, IGH-PAX5§ | |
CS23 | CML, BP (myeloid BP) | BCR-ABL1 | BCR-ABL1, MECOM-MBNL1 | | ABL1 p.Y253H, ABL1 p.V299L, ABL1 p.T315I, IKZF1 p.S442fs |
CS24 | CML, BP (lymphoid BP) | BCR-ABL1 | BCR-ABL1, PAX5-MLLT3 | | ABL1 p.M244V, ABL1 p.E255V |
CS25 | CML, CP | BCR-ABL1 | BCR-ABL1 | | |
CS26 | PV | Negative | Negative | | JAK2 p.V617F |
CS27 | PV | Negative | Negative | | JAK2 p.V617F |
CS28 | PMF | NT | Negative | | JAK2 p.V617F |
CS29 | MDS/MPN-U | Negative | Negative | | |
CS30 | MLN with PDGFRB rearrangement | PDGFRB gene rearrangement | CCDC6-PDGFRB | | |
* All fusions and variants detected by targeted RNA-seq were classified with a grading system according to the level of evidence to determine their clinical significance, and only tier 1 and tier 2 findings were selected.
◈ Gene fusions detected by targeted RNA-seq and confirmed by multiplex RT-PCR, FISH, or direct sequencing.
▣ Gene fusions detected by targeted RNA-seq but not confirmed by multiplex RT-PCR, FISH, or direct sequencing.
§ Fusions filtered out of the final results by the STAR-Fusion algorithm.
- Overexpressed partner genes in a fusion are shown in bold.
- FISH (fluorescence in situ hybridization); RT-PCR (reverse transcriptase PCR); RNA-seq (RNA sequencing); CS (clinical sample); AML (acute myeloid leukemia); NOS (not otherwise specified); B-ALL (B-lymphoblastic leukemia/lymphoma); T-ALL (T-lymphoblastic leukemia/lymphoma); MCL (mantle cell lymphoma); B-CLL (chronic lymphocytic leukemia, B-cell type); CML (chronic myeloid leukemia); BP (blast phase); CP (chronic phase); PV (polycythemia vera); PMF (primary myelofibrosis); MDS/MPN-U (myelodysplastic/myeloproliferative neoplasm, unclassifiable); MLN (myeloid/lymphoid neoplasm); NT (not tested).
Using the tiered grading system, five fusion transcripts were newly detected by targeted RNA-seq as tier 1 or tier 2 fusions, and all of their breakpoints were confirmed by direct sequencing. These five additional fusions were PAX5-ARHGAP22 and PAX5-DACH1 in two B-ALL samples, PICALM-MLLT10 in one T-ALL sample, and MECOM-MBNL1 and PAX5-MLLT3 in two CML-BP samples. Among these additional fusions, ARHGAP22, DACH1, and MBNL1 were non-target genes, which could be captured through partial hybridization of the target probes to their fusion partners. In addition, 12 disease-associated putative fusions were found, but they could not be confirmed by direct sequencing.
Most of the putative fusions (10 of the 12) were predicted to increase expression of a partner gene, and 7 of these were predicted to be IGH rearrangements (4 IGH-PAX5, 1 IGH-CRLF2, and 2 IGH-BCL2).
The remaining two putative fusions were predicted to be disease-associated in-frame fusions (one NUP98-TOP2B in an AML sample and one NUP214-ABL1 in a T-ALL sample), and they could not be detected by direct sequencing because of low expression.
(3) 임상 샘플에서의 변이 검출(3) detection of mutations in clinical samples
더 나아가, 표적화된 RNA-seq는 10 개의 샘플의 발현된 전사체에서 16개의 변이체 (tier 1 또는 2)를 식별하였다 (표 6). 2 건의 AML 사례에서 GATA2와 WT1 내 4개의 프레임 이동 돌연변이가 발견되었다 (임상 샘플 [CS] 2-3). 하나의 B-ALL 및 두 개의 CML-BP (CS7 및 CS23-24) 샘플의 세 가지 경우에서, ABL1의 M244V, Y253H, E255K/V, V299L 및 T315I 돌연변이가 표적 RNA-seq에서 지정되었으며, 이는 티로신 키나제 억제제 (TKI) 저항성과 관련되어 있다. JAK2 R683G, RUNX1 R162K 및 IKZF1 S442fs 돌연변이를 포함한 질병 관련 변이체는 각각 하나의 B-ALL, 하나의 T-ALL 및 하나의 CML-BP 사례 (CS14, CS18 및 CS23)에서 검출되었다.Furthermore, targeted RNA-seq identified 16 variants (tier 1 or 2) in the expressed transcripts of 10 samples (Table 6). Four frame-shifting mutations in GATA2 and WT1 were found in two AML cases (clinical sample [CS] 2-3). In three cases of one B-ALL and two CML-BP (CS7 and CS23-24) samples, the M244V, Y253H, E255K/V, V299L and T315I mutations of ABL1 were assigned in the target RNA-seq, indicating that tyrosine It is associated with kinase inhibitor (TKI) resistance. Disease-associated variants, including JAK2 R683G, RUNX1 R162K and IKZF1 S442fs mutations, were detected in one B-ALL, one T-ALL and one CML-BP case (CS14, CS18 and CS23), respectively.
Three JAK2 V617F mutations were called in three BCR-ABL1-negative MPN samples, comprising two polycythemia samples and one primary myelofibrosis sample (CS26-28). All cases with available material, covering 15 of these variants, were confirmed by DNA-based NGS or real-time PCR.
(4) Expression analysis of clinical samples
FIG. 3 shows a heatmap of the hierarchical clustering of 30 hematologic malignancy cases and 3 normal controls. Hierarchical clustering produced four subtrees according to the underlying structure of the expression data: cluster 1 (the first T-ALL and AML group), cluster 2 (B-cell leukemia and lymphoma), cluster 3 (the second T-ALL and AML group) and cluster 4 (MPN, other myeloid neoplasms and normal controls). With the exception of one B-ALL case, the clustering gave a reliable partition consistent with the cancer subtype and lineage of the malignant cells.
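As an illustration of how samples can be grouped by expression profile, the following is a minimal sketch of agglomerative hierarchical clustering. The average-linkage rule, the Euclidean distance, the sample names and the expression values are all assumptions made for this example; the text does not state which distance or linkage was used for FIG. 3.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain.

    profiles: dict mapping a sample name to its list of normalized expression values.
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Made-up two-gene profiles for four hypothetical samples.
toy_profiles = {
    "sample_A": [0.1, 5.2],
    "sample_B": [0.3, 5.0],
    "sample_C": [6.8, 0.4],
    "sample_D": [7.1, 0.2],
}
print(agglomerate(toy_profiles, 2))
```

Cutting the tree at four clusters instead of two would correspond to the four subtrees described above, given real expression data.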
(5) Interpretation of results
The present invention developed and validated a clinically applicable targeted RNA-seq system covering 84 genes associated with different hematologic malignancies, taking into account the inventors' own data as well as the previous literature. The platform showed stable performance in analytical validation and efficiently detected both known and novel gene fusions. In addition, using 30 clinical samples from patients with hematologic malignancies, the targeted RNA-seq system showed improved applicability for detecting clinically significant sequence variants as well as expression signatures.
With regard to analytical validation, targeted RNA-seq showed reliable performance in tests of target gene coverage, inter-run and intra-run repeatability, and linearity. However, trace levels of carryover fusions were observed in these tests. A plausible explanation is index hopping (also called index swapping), which has recently been reported on Illumina platforms as the misassignment of sequencing reads caused by residual primers or adapters during cluster amplification. Not every fusion gave rise to suspected index-hopping calls, but fusion transcripts that were top hits in a pool (mean FFPM = 105.7) were falsely detected in the results of other samples. Suspected index-hopping fusions had far fewer supporting reads than true fusions (p<0.001) and were therefore filtered out on the basis of their low supporting read counts. In this context, RNA-seq data generated on Illumina platforms in a clinical setting should be interpreted with caution when trace-level transcript reads show exactly the same breakpoint and sequence as a top-hit fusion of another sample in the same pool and are inconsistent with the patient's clinical and pathological findings.
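The caution described above can be sketched as a simple cross-sample check within one sequencing pool. This is only an illustration: the `FusionCall` record, the `min_ratio` threshold of 100 and the example values are hypothetical, and the text itself only states that suspected index-hopping calls were removed on the basis of low supporting read counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    sample: str       # sample identifier within one sequencing pool
    fusion: str       # fusion name, e.g. "BCR-ABL1"
    breakpoint: str   # breakpoint label (placeholder strings in this sketch)
    ffpm: float       # fusion fragments per million reads


def flag_index_hopping(calls, min_ratio=100.0):
    """Return calls that share fusion and breakpoint with a far stronger call in another sample.

    min_ratio is an assumed, illustrative threshold, not a value from the text.
    """
    strongest = {}
    for call in calls:
        key = (call.fusion, call.breakpoint)
        if key not in strongest or call.ffpm > strongest[key].ffpm:
            strongest[key] = call

    suspects = []
    for call in calls:
        best = strongest[(call.fusion, call.breakpoint)]
        if best.sample != call.sample and best.ffpm >= min_ratio * call.ffpm:
            suspects.append(call)
    return suspects


# Made-up pool: S1 carries a strong fusion, S2 shows a trace of the same breakpoint.
pool = [
    FusionCall("S1", "BCR-ABL1", "bp_1", 120.0),
    FusionCall("S2", "BCR-ABL1", "bp_1", 0.4),
    FusionCall("S2", "PAX5-DACH1", "bp_2", 15.0),
]
for suspect in flag_index_hopping(pool):
    print("review before reporting:", suspect.sample, suspect.fusion)
```

A flagged call is a prompt for review against the patient's clinical and pathological findings, not an automatic rejection.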
To date, the performance of oncogenic fusion detection by NGS has improved considerably. This is largely owing to robust bioinformatics tools for short-read alignment and to multi-layer filtering strategies for excluding false-positive fusions. In the present invention, a stepwise filtering strategy was used to remove false-positive calls.
Of the 1,243 and 3,363 fusion candidates predicted by STAR-Fusion and FusionCatcher, respectively, 83 (6.7%) and 477 (14.2%) fusion transcripts were selected after four filtering steps as true-positive fusions for first consideration. In particular, in view of the oncogenic function of fusions, candidates that mostly involved short repeats, pseudogenes or read-throughs, or that were found in healthy populations, were removed from further evaluation, whereas in-frame fusions and fusions causing aberrant expression of a partner gene were retained. Even so, before the fusions were graded according to prioritized clinical evidence, not all of them were suitable for clinical reporting, unlike in a research setting, because many had not been documented in hematologic malignancies. To address this, a tiered grading system was adopted that classifies aberrations by level of evidence to determine their significance. Prioritizing fusions with the tiered grading system narrowed the number of fusion transcripts to 40 (3.2%) and 211 (6.3%) in the STAR-Fusion and FusionCatcher results, respectively, representing reportable fusions of clinical significance. As this shows, appropriate filtering and prioritization strategies are essential for handling RNA-seq data in a clinical setting.
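The filtering logic and the reported proportions can be sketched as follows. The counts are those quoted in the text; the dictionary keys, the flag names and the `passes_filter` helper are hypothetical and only illustrate the kind of rule described (drop repeat, pseudogene, read-through and healthy-population candidates; retain in-frame fusions and fusions that dysregulate a partner gene).

```python
# Hypothetical annotation flags that mark a candidate for removal.
EXCLUDE_FLAGS = {"short_repeat", "pseudogene", "read_through", "seen_in_healthy"}


def passes_filter(candidate):
    """Keep a candidate with no exclusion flag that is in-frame or dysregulates a partner gene."""
    if EXCLUDE_FLAGS & set(candidate["flags"]):
        return False
    return candidate["in_frame"] or candidate["partner_dysregulated"]


def percent(kept, total):
    """Share of the original candidates that remain, as a percentage with one decimal."""
    return round(100.0 * kept / total, 1)


# Candidate counts quoted in the text for each fusion caller.
FUNNEL = {
    "STAR-Fusion": {"predicted": 1243, "after_filtering": 83, "reportable": 40},
    "FusionCatcher": {"predicted": 3363, "after_filtering": 477, "reportable": 211},
}

for caller, n in FUNNEL.items():
    print(
        caller,
        percent(n["after_filtering"], n["predicted"]),
        percent(n["reportable"], n["predicted"]),
    )
```

Both percentages are computed against the number of originally predicted candidates, which reproduces the figures given above.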
Ultimately, 18 confirmed fusions and 12 putative fusions were curated from the targeted RNA-seq results of the 30 clinical samples. Five of the 18 confirmed fusions had not been identified by the earlier FISH or multiplex RT-PCR tests commonly used in clinical laboratories. Three of these five fusions were identified with only one of the partner genes targeted by the probe-hybridization method. Similarly, in one case with a known PDGFRB rearrangement, targeted RNA-seq identified the partner gene as the non-target gene CCDC6. As these four cases show, hybrid capture can efficiently improve the diagnostic yield of fusion detection because it can pull down adjacent flanking regions together with the transcript of interest. The advantages of this targeting approach are that it reduces the cost of the oligonucleotide probe set and that chimeric transcripts can be readily detected with probes for only one partner, particularly for fusions with multiple possible partner genes such as CRLF2, ETV6, KMT2A, NUP98, PAX5 and PDGFRA/B fusions.
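The point about needing a probe for only one partner can be checked against the five newly detected fusions named earlier in the text. In this sketch, `PANEL_SUBSET` lists only the panel genes relevant to those five fusions (all appear in the 84-gene panel), and `targeted_partners` is a hypothetical helper written for the example.

```python
# Subset of the 84-gene panel that is relevant to the five newly detected fusions.
PANEL_SUBSET = {"PAX5", "PICALM", "MLLT10", "MECOM", "MLLT3"}

NEW_FUSIONS = [
    "PAX5-ARHGAP22",
    "PAX5-DACH1",
    "PICALM-MLLT10",
    "MECOM-MBNL1",
    "PAX5-MLLT3",
]


def targeted_partners(fusion, panel):
    """Return the partner genes of a 'GENE5-GENE3' fusion that have probes in the panel."""
    five_prime, three_prime = fusion.split("-", 1)
    return [gene for gene in (five_prime, three_prime) if gene in panel]


# Fusions captured through a single targeted partner.
single_partner = [f for f in NEW_FUSIONS if len(targeted_partners(f, PANEL_SUBSET)) == 1]
print(single_partner)
```

The three fusions captured through one partner are those involving the non-target genes ARHGAP22, DACH1 and MBNL1, in line with the "three of five" stated above.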
Although the putative fusions could not be confirmed experimentally, the surrogate overexpression of the partner gene in 10 of them appears to indicate a rearrangement between the two genomic regions. Most of these cases were IGH rearrangements with intronic breakpoints, which the FusionCatcher algorithm likely detected from a small residual DNA fraction. With the STAR-Fusion algorithm, these IGH rearrangements were identified in the preliminary file but filtered out of the final result. Thus, targeted RNA-seq used for fusion analysis alone was insufficient to detect DNA-level rearrangements directly, but this could be complemented by expression analysis. This suggests that putative targeted RNA-seq results accompanied by elevated expression can guide additional tests such as FISH in a clinical setting.
Identification of genomic variants relies mainly on DNA-based sequencing by NGS, whereas using RNA-seq for this purpose is difficult because of the inherent complexity of transcripts that span multiple exons at a genomic locus. This obstacle has been overcome with splice-aware mappers such as HISAT2, TopHat and STAR [10-12], but only a few RNA-seq studies have examined variant detection in a clinical diagnostic setting. In the present invention, notable variants of clinical significance could be identified in the RNA-seq data. Above all, the simultaneous detection of BCR-ABL1 fusions and TKI resistance-associated ABL1 mutations in B-ALL and CML-BP patients demonstrated the advantage of targeted RNA-seq in enabling faster diagnostic and treatment decisions.
In addition, in three MPN cases, a negative result for the BCR-ABL1 fusion and a positive result for the JAK2 V617F mutation could be confirmed simultaneously. Other variants of prognostic and diagnostic relevance were also found in two AML, two ALL and one CML-BP patient. These findings therefore show that targeted RNA-seq combined with reliable computational bioinformatics tools can simplify the diagnostic workflow without the need to perform additional DNA-based sequencing in parallel.
Targeted RNA-seq data can also be used to measure molecular signatures through expression analysis similar to that performed with microarray technology over the past 20 years. As noted above, expression data can help identify driver events caused by gene fusions. Of the 30 fusions found by targeted RNA-seq, 6 and 9 showed overexpression of the 5' and 3' partner genes, respectively. These results may be explained by the structural and functional mechanisms of some oncogenic fusions. In oncogenic fusions that cause dysregulation, the 5' or 3' fusion partner is overexpressed through the contribution of a 3' partner with a highly stable UTR or through the regulatory elements of the 5' partner gene, respectively.
Furthermore, subsequent analysis of the expression data showed distinct clustering of the clinical samples by cancer subtype and cell lineage. The classification was based on a few representative molecular features that served as subtype identifiers (e.g., MECOM and EPOR overexpression in MPN, and EBF1, PAX5 and TCL1A overexpression in B-ALL). This may support the diagnosis of subtype or lineage in ambiguous cases, or the discovery of new disease subtypes.
Example 2. Analysis of genetic alterations in leukemia patients using the next-generation sequencing panel of the present invention
Bone marrow specimens collected at diagnosis from 93 patients with confirmed leukemia (15 with acute myeloid leukemia, 35 with adult B-acute lymphoid leukemia, 30 with childhood B-acute lymphoid leukemia and 13 with T-acute lymphoid leukemia) were used. Among all 93 leukemia patients, tier 1 or tier 2 gene fusions were observed in 72 (77%). In childhood B-acute lymphoid leukemia (B-ALL), gene fusions were detected in 83% (25/30) of the patients, and in adult B-ALL in 94% (33/35). In patients with acute myeloid leukemia (AML) and T-acute lymphoid leukemia (T-ALL), gene fusions were detected in 53% (8/15) and 46% (6/13), respectively (see FIGS. 4 and 5).
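The per-group detection rates above can be recomputed directly from the quoted counts, which also confirms that the four groups add up to the overall 72 of 93. The dictionary layout and group labels are choices made for this sketch only.

```python
# (patients with a tier 1 or tier 2 fusion, patients tested), as quoted in the text.
COHORTS = {
    "childhood B-ALL": (25, 30),
    "adult B-ALL": (33, 35),
    "AML": (8, 15),
    "T-ALL": (6, 13),
}


def rate(positive, total):
    """Detection rate as a whole-number percentage."""
    return round(100 * positive / total)


for name, (positive, total) in COHORTS.items():
    print(f"{name}: {positive}/{total} = {rate(positive, total)}%")

# Pool the four groups to obtain the overall detection rate.
all_positive = sum(p for p, _ in COHORTS.values())
all_tested = sum(t for _, t in COHORTS.values())
print(f"overall: {all_positive}/{all_tested} = {rate(all_positive, all_tested)}%")
```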
Example 3. Comparative evaluation of the next-generation sequencing panel of the present invention and an existing panel
The analysis system using the next-generation sequencing panel of the present invention was compared with a commercial targeted RNA-seq analysis system that is based on constructing cDNA libraries for specific genes by anchored multiplex PCR (Engvall M, Cahill N, Jonsson BI, Hoglund M, Hallbook H, Cavelier L: Detection of leukemia gene fusions by targeted RNA-sequencing in routine diagnostics. BMC Med Genomics 2020, 13:106). One B-ALL patient sample, two AML patient samples, and one patient sample each of acute promyelocytic leukemia and T-ALL were used for this comparison. As a result, in the B-ALL patient sample harboring an IGH-CRLF2 gene fusion, the fusion was detected only by the analysis using the next-generation sequencing panel of the present invention.
Philadelphia chromosome-like acute lymphoblastic leukemia (Ph-like ALL) is found in 20-25% of B-ALL, and gene fusions involving CRLF2 are known to account for about 61% of its causative genetic alterations. Ph-like B-ALL has a very poor prognosis and has been classified as a new subtype of ALL. Diagnosing Ph-like B-ALL by genetic analysis in ALL, and in B-ALL in particular, is therefore very important for treating leukemia from a precision-medicine perspective. Accordingly, the next-generation sequencing panel of the present invention is more useful for detecting Ph-like B-ALL than the cDNA library-based targeted RNA-seq using anchored multiplex PCR that is in use at some clinical laboratories (FIG. 6). In FIG. 6, "Our targeted RNA seq system" denotes the gene fusions detected with the next-generation sequencing panel of the present invention targeting 84 genes, and "Commercial targeted RNA seq system" denotes those detected with the existing commercial targeted RNA-seq analysis system.
Claims (14)
- A next-generation sequencing panel for diagnosing leukemia, comprising probes that specifically bind to PHB, PHB2, IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
- The next-generation sequencing panel for diagnosing leukemia according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BAALC, BCL2, BCL6, BCR, CBFB, CRBN, CREBBP, DEK, DUSP22, EBF1, FGFR3, FIP1L1, FUS, GATA2, GUSB, IKZF1, IL3, KMT2A, MECOM, MEF2D, MLF1, MLLT3, MRTFA, MYH11, NUP98, PCM1, PICALM, PML, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TRA and WT1.
- The next-generation sequencing panel for diagnosing leukemia according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, ALK, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CREBBP, DEK, DUSP22, EP300, ERG, FGFR3, FIP1L1, HBS1L, HPRT1, IGK, IGL, IKZF1, KMT2A, MAF, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, PAX5, PBX1, PCM1, PPIA, RAB7A, TCF3 and ZNF384.
- The next-generation sequencing panel for diagnosing leukemia according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of ALK, BCL2, BCL6, BCL9, BCR, CBFB, DEK, DUSP22, FGFR3, HPRT1, IKZF1, IL2RB, IRF4, KMT2A, MAF, MAFA, MEF2D, MLLT10, MLLT3, NSD2, NTRK3, NUP214, PCM1, TBP, TCL1A, TRB, TRG and TYK2.
- The next-generation sequencing panel for diagnosing leukemia according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of BCL6, BCL9, CCND1, CCND2, CCND3, CREBBP, TP63, DEK, DUSP22, FGFR3, IGK, IGL, IKZF1, KMT2A, NTRK3, PAX5, PBX1, PPIA, RAB7A, TCF3, TP63 and ZNF384.
- The next-generation sequencing panel for diagnosing leukemia according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCL2, BCL6, BCR, CBFB, CRBN, CREBB, DEK, FGFR3, GAPDH, GUSB, IKZF1, MAF, MAFB, PSMB2 and TP63.
- The next-generation sequencing panel for diagnosing leukemia according to claim 1, further comprising a probe that specifically binds to at least one selected from the group consisting of AFF1, BCR, CBFB, CRBN, CREBBP, DEK, FGFR3, GATA2, IKZF1, MAFA, MAFB and PCM1.
- The next-generation sequencing panel for diagnosing leukemia according to claim 1, further comprising a probe that specifically binds to PDGFRA.
- A next-generation sequencing panel for diagnosing leukemia, comprising probes that specifically bind to ABL1, ABL2, AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, CRLF2, CSF1R, DEK, DUSP22, EBF1, EP300, EPOR, ERG, ETV6, FGFR1, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGH, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, JAK2, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYC, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PDGFRB, PHB, PHB2, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384.
- A method of providing information for diagnosing leukemia, comprising: obtaining read data by selecting target genes through target-capture hybridization with the sequencing panel of any one of claims 1 to 9 and sequencing them; determining from the read data whether PHB and PHB2 are overexpressed; and detecting from the read data a fusion involving any one gene selected from the group consisting of IGH, ABL1, ABL2, CRLF2, CSF1R, EPOR, ETV6, FGFR1, JAK2, PDGFRB and MYC.
- The method of providing information for diagnosing leukemia according to claim 10, wherein the overexpression is determined by aligning the read data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie, and normalizing the resulting GTF data with DESeq2.
- The method of providing information for diagnosing leukemia according to claim 10, wherein the gene fusion detection comprises aligning the read data to a reference sequence with Bowtie, STAR, BLAT or Bowtie2 and detecting fusions with the STAR-Fusion or FusionCatcher fusion gene identification tool.
- The method of providing information for diagnosing leukemia according to claim 10, further comprising determining whether any one gene selected from the group consisting of AFF1, ALK, BAALC, BCL2, BCL6, BCL9, BCR, CBFB, CCND1, CCND2, CCND3, CRBN, CREBBP, DEK, DUSP22, EBF1, EP300, ERG, FGFR3, FIP1L1, FUS, GAPDH, GATA2, GUSB, HBS1L, HPRT1, IGK, IGL, IKZF1, IL2RB, IL3, IRF4, KMT2A, MAF, MAFA, MAFB, MECOM, MEF2D, MLF1, MLLT10, MLLT3, MRTFA, MYH11, NSD2, NTRK3, NUP214, NUP98, PAX5, PBX1, PCM1, PDGFRA, PICALM, PML, PPIA, PSMB2, RAB7A, RARA, RBM15, RUNX1, RUNX1T1, SDHA, TBP, TCF3, TCL1A, TP63, TRA, TRB, TRG, TYK2, WT1 and ZNF384 is overexpressed, fused or mutated.
- A method of providing information for diagnosing leukemia, comprising: (a) binding cDNA synthesized from RNA isolated from a subject to each probe of the sequencing panel of any one of claims 1 to 9 and performing next-generation sequencing (NGS) to obtain raw read data; (b) adjusting the raw read data to data having a quality score of Q10 or higher; (c) detecting fusions of each gene in the adjusted data; (d) detecting variants relative to a reference sequence in each gene of the adjusted data; and (e) determining the expression of each gene from the adjusted data, wherein the fusion detection comprises aligning the adjusted data to a reference sequence with Bowtie, STAR, BLAT or Bowtie2 and detecting fusions with a fusion gene identification tool (STAR-Fusion, FusionCatcher); the variant detection is performed by aligning the sequences of the adjusted data with STAR to obtain SAM/BAM data, sorting the BAM data and marking duplicates with Picard, and calling SNVs and indels with FreeBayes on the aligned, sorted and duplicate-removed BAM data; and the gene expression is determined by aligning the adjusted data to a reference sequence with HISAT2 to obtain SAM/BAM data, calculating the expression of each gene with StringTie, and normalizing the resulting GTF data with DESeq2.
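Step (b) of the last claim refers to adjusting raw reads to a quality score of Q10 or higher. One possible reading, shown here purely as a sketch, is to trim low-quality bases from the 3' end of each read. The 3'-end trimming strategy, the Phred+33 encoding and the function names are assumptions; the claims do not name a trimming tool or method.

```python
def phred33_scores(quality_string):
    """Decode a FASTQ quality string, assuming the Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_3prime(sequence, quality_string, min_q=10):
    """Drop bases from the 3' end until the last remaining base has quality >= min_q."""
    scores = phred33_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality_string[:end]


# Made-up read: 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0.
read_seq, read_qual = "ACGTAC", "IIII#!"
print(trim_3prime(read_seq, read_qual))
```

In practice this step would be carried out by a dedicated read-trimming tool before the alignment steps named in the claim.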
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2021-0029703 | 2021-03-05 | ||
KR20210029703 | 2021-03-05 | ||
KR1020220028718A KR20220125708A (en) | 2021-03-05 | 2022-03-07 | Next-generation sequencing-based target gene RNA sequencing panel and analysis algorithm |
KR10-2022-0028718 | 2022-03-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022186673A1 true WO2022186673A1 (en) | 2022-09-09 |
Family
ID=83154297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/003196 WO2022186673A1 (en) | 2021-03-05 | 2022-03-07 | Next-generation-sequencing-based rna sequencing panel for targeted genes, and analysis algorithm |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022186673A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012031008A2 (en) * | 2010-08-31 | 2012-03-08 | The General Hospital Corporation | Cancer-related biological materials in microvesicles |
US20160130648A1 (en) * | 2014-11-12 | 2016-05-12 | Neogenomics Laboratories, Inc. | Deep sequencing of peripheral blood plasma dna as a reliable test for confirming the diagnosis of myelodysplastic syndrome |
KR101845957B1 (en) * | 2016-02-23 | 2018-04-05 | 전남대학교기술지주회사(주) | Kit for diagnosis of leukemia and diagnostic method targeting prohibitin gene |
US20180177771A1 (en) * | 2015-06-29 | 2018-06-28 | Abraxis Bioscience, Llc | Biomarkers for nanoparticle compositions |
US20190078093A1 (en) * | 2016-03-18 | 2019-03-14 | Caris Science, Inc. | Oligonucleotide probes and uses thereof |
-
2022
- 2022-03-07 WO PCT/KR2022/003196 patent/WO2022186673A1/en active Application Filing
Non-Patent Citations (2)
Title |
---|
"Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health", 13 September 2017, INTECH, ISBN: 978-953-51-3504-3, article PEREIRA MICHELE ARAÚJO, IMADA EDDIE LUIDY, GUEDES RAFAEL LUCAS MUNIZ: "RNA‐seq: Applications and Best Practices", XP055963891, DOI: 10.5772/intechopen.69250 * |
KIM IN-SUK, LEE JA YOUNG, KONG SUN-YOUNG, LEE SEUNG-TAE, HUH JUNGWON, NAM MYUNG-HYUN, KIM MYUNGSHIN, CHO YOUNG-UK, HUH HEE-JIN, SO: "Revision of Laboratory Testing Guidelines for Initial Diagnosis of Hematologic Neoplasms", LABORATORY MEDICINE ONLINE, vol. 10, no. 1, 1 January 2020 (2020-01-01), pages 10, XP055963896, DOI: 10.3343/lmo.2020.10.1.10 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Klco et al. | Advances in germline predisposition to acute leukaemias and myeloid neoplasms | |
Choy et al. | Prenatal diagnosis of fetuses with increased nuchal translucency by genome sequencing analysis | |
WO2019157791A1 (en) | Detection method and device of copy number variations, and computer readable medium | |
Bi et al. | Comparison of chromosome analysis and chromosomal microarray analysis: what is the value of chromosome analysis in today’s genomic array era? | |
AU745862C (en) | Contiguous genomic sequence scanning | |
Lee et al. | Chromosomal microarray with clinical diagnostic utility in children with developmental delay or intellectual disability | |
Safavi et al. | Novel gene targets detected by genomic profiling in a consecutive series of 126 adults with acute lymphoblastic leukemia | |
JP2009148291A (en) | Molecular detection of chromosome aberrations | |
CN110033829A (en) | The fusion detection method of homologous gene based on difference SNP marker object | |
AU711754B2 (en) | Methods for the detection of clonal populations of transformed cells in a genomically heterogeneous cellular sample | |
AU2014346680A1 (en) | Targeted screening for mutations | |
WO2020096248A1 (en) | Manufacturing and detection method of probe for detecting mutations in lung cancer tissue cells | |
Pastor et al. | Optical mapping of the 22q11.2DS region reveals complex repeat structures and preferred locations for non-allelic homologous recombination (NAHR) | 
Pohovski et al. | Multiplex ligation-dependent probe amplification workflow for the detection of submicroscopic chromosomal abnormalities in patients with developmental delay/intellectual disability | |
Ming et al. | Rapid detection of submicroscopic chromosomal rearrangements in children with multiple congenital anomalies using high density oligonucleotide arrays | |
WO2019031866A1 (en) | Method for detecting gene rearrangement by using next generation sequencing | |
Rausch et al. | Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures | |
Weisschuh et al. | Diagnostic genome sequencing improves diagnostic yield: a prospective single-centre study in 1000 patients with inherited eye diseases | |
KR20220125708A (en) | Next-generation sequencing-based target gene RNA sequencing panel and analysis algorithm | |
WO2022186673A1 (en) | Next-generation-sequencing-based rna sequencing panel for targeted genes, and analysis algorithm | |
WO2020209590A1 (en) | Composition for diagnosis or prognosis prediction of glioma, and method for providing information related thereto | |
Liu et al. | Comprehensive Analysis of Hemophilia A (CAHEA): Towards Full Characterization of the F8 Gene Variants by Long-Read Sequencing | |
WO2019132581A1 (en) | Composition for diagnosing cancer such as breast cancer and ovarian cancer, and use thereof | |
Wong et al. | Detection and calibration of microdeletions and microduplications by array-based comparative genomic hybridization and its applicability to clinical genetic testing | |
Sabri et al. | Whole exome sequencing of chronic myeloid leukemia patients |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22763654; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 22763654; Country of ref document: EP; Kind code of ref document: A1 |