WO2023172929A1 - Artificial intelligence architecture for predicting cancer biomarkers - Google Patents
- Publication number
- WO2023172929A1 PCT/US2023/063887
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- biomarker
- treatment method
- biomarker comprises
- biological sample
- clustered
Links
- 239000000107 tumor biomarker Substances 0.000 title abstract description 9
- 238000013473 artificial intelligence Methods 0.000 title description 7
- 238000000034 method Methods 0.000 claims abstract description 468
- 239000000090 biomarker Substances 0.000 claims abstract description 315
- 239000012472 biological sample Substances 0.000 claims abstract description 134
- 238000003384 imaging method Methods 0.000 claims abstract description 18
- 238000011282 treatment Methods 0.000 claims description 196
- 210000001519 tissue Anatomy 0.000 claims description 131
- 206010028980 Neoplasm Diseases 0.000 claims description 84
- 238000012163 sequencing technique Methods 0.000 claims description 83
- 230000035772 mutation Effects 0.000 claims description 78
- 206010006187 Breast cancer Diseases 0.000 claims description 73
- BASFCYQUMIYNBI-UHFFFAOYSA-N platinum Chemical compound [Pt] BASFCYQUMIYNBI-UHFFFAOYSA-N 0.000 claims description 70
- 208000026310 Breast neoplasm Diseases 0.000 claims description 61
- -1 ATR Proteins 0.000 claims description 59
- 201000011510 cancer Diseases 0.000 claims description 54
- 230000000869 mutational effect Effects 0.000 claims description 51
- 108090000623 proteins and genes Proteins 0.000 claims description 49
- 230000006801 homologous recombination Effects 0.000 claims description 38
- 238000002744 homologous recombination Methods 0.000 claims description 38
- 230000004075 alteration Effects 0.000 claims description 36
- 238000012549 training Methods 0.000 claims description 36
- 229910052697 platinum Inorganic materials 0.000 claims description 35
- 206010061535 Ovarian neoplasm Diseases 0.000 claims description 34
- 238000012360 testing method Methods 0.000 claims description 32
- WZUVPPKBWHMQCE-UHFFFAOYSA-N Haematoxylin Chemical compound C12=CC(O)=C(O)C=C2CC2(O)C1C1=CC=C(O)C(O)=C1OC2 WZUVPPKBWHMQCE-UHFFFAOYSA-N 0.000 claims description 30
- 238000013527 convolutional neural network Methods 0.000 claims description 30
- 238000013528 artificial neural network Methods 0.000 claims description 29
- 230000007547 defect Effects 0.000 claims description 28
- 108060006698 EGF receptor Proteins 0.000 claims description 27
- 102100037858 G1/S-specific cyclin-E1 Human genes 0.000 claims description 24
- 101000738568 Homo sapiens G1/S-specific cyclin-E1 Proteins 0.000 claims description 24
- 102100025064 Cellular tumor antigen p53 Human genes 0.000 claims description 22
- 102000001301 EGF receptor Human genes 0.000 claims description 22
- 230000007812 deficiency Effects 0.000 claims description 20
- 230000034431 double-strand break repair via homologous recombination Effects 0.000 claims description 20
- 102100021147 DNA mismatch repair protein Msh6 Human genes 0.000 claims description 19
- 101000968658 Homo sapiens DNA mismatch repair protein Msh6 Proteins 0.000 claims description 19
- 108700020463 BRCA1 Proteins 0.000 claims description 18
- 102000036365 BRCA1 Human genes 0.000 claims description 18
- 101150072950 BRCA1 gene Proteins 0.000 claims description 18
- 108700020462 BRCA2 Proteins 0.000 claims description 18
- 102000052609 BRCA2 Human genes 0.000 claims description 18
- 101150008921 Brca2 gene Proteins 0.000 claims description 18
- 101000984753 Homo sapiens Serine/threonine-protein kinase B-raf Proteins 0.000 claims description 18
- 230000037430 deletion Effects 0.000 claims description 18
- 238000012217 deletion Methods 0.000 claims description 18
- 239000000523 sample Substances 0.000 claims description 18
- 238000002560 therapeutic procedure Methods 0.000 claims description 18
- 101001012157 Homo sapiens Receptor tyrosine-protein kinase erbB-2 Proteins 0.000 claims description 17
- 102000007530 Neurofibromin 1 Human genes 0.000 claims description 17
- 108010085793 Neurofibromin 1 Proteins 0.000 claims description 17
- 102100030086 Receptor tyrosine-protein kinase erbB-2 Human genes 0.000 claims description 17
- 241000700605 Viruses Species 0.000 claims description 17
- 230000037442 genomic alteration Effects 0.000 claims description 17
- 102100028914 Catenin beta-1 Human genes 0.000 claims description 16
- 108091007854 Cdh1/Fizzy-related Proteins 0.000 claims description 16
- 102000038594 Cdh1/Fizzy-related Human genes 0.000 claims description 16
- 101710105178 F-box/WD repeat-containing protein 7 Proteins 0.000 claims description 16
- 102100028138 F-box/WD repeat-containing protein 7 Human genes 0.000 claims description 16
- WSFSSNUMVMOOMR-UHFFFAOYSA-N Formaldehyde Chemical compound O=C WSFSSNUMVMOOMR-UHFFFAOYSA-N 0.000 claims description 16
- 241000711549 Hepacivirus C Species 0.000 claims description 16
- 101000916173 Homo sapiens Catenin beta-1 Proteins 0.000 claims description 16
- 241001502974 Human gammaherpesvirus 8 Species 0.000 claims description 16
- 241000725303 Human immunodeficiency virus Species 0.000 claims description 16
- 102000013609 MutL Protein Homolog 1 Human genes 0.000 claims description 16
- 108010026664 MutL Protein Homolog 1 Proteins 0.000 claims description 16
- 206010033128 Ovarian cancer Diseases 0.000 claims description 16
- 238000001514 detection method Methods 0.000 claims description 16
- 239000003814 drug Substances 0.000 claims description 16
- 210000004940 nucleus Anatomy 0.000 claims description 16
- 208000031448 Genomic Instability Diseases 0.000 claims description 15
- 241000701044 Human gammaherpesvirus 4 Species 0.000 claims description 15
- 229940079593 drug Drugs 0.000 claims description 15
- YQGOJNYOYNNSMM-UHFFFAOYSA-N eosin Chemical compound [Na+].OC(=O)C1=CC=CC=C1C1=C2C=C(Br)C(=O)C(Br)=C2OC2=C(Br)C(O)=C(Br)C=C21 YQGOJNYOYNNSMM-UHFFFAOYSA-N 0.000 claims description 15
- 102100033996 Double-strand break repair protein MRE11 Human genes 0.000 claims description 14
- 101000591400 Homo sapiens Double-strand break repair protein MRE11 Proteins 0.000 claims description 14
- 229940076838 Immune checkpoint inhibitor Drugs 0.000 claims description 14
- 102000037984 Inhibitory immune checkpoint proteins Human genes 0.000 claims description 14
- 108091008026 Inhibitory immune checkpoint proteins Proteins 0.000 claims description 14
- 230000004927 fusion Effects 0.000 claims description 14
- 239000012274 immune-checkpoint protein inhibitor Substances 0.000 claims description 14
- 239000003112 inhibitor Substances 0.000 claims description 14
- 101150061338 mmr gene Proteins 0.000 claims description 14
- 102000015619 APOBEC Deaminases Human genes 0.000 claims description 13
- 108010024100 APOBEC Deaminases Proteins 0.000 claims description 13
- 102100027103 Serine/threonine-protein kinase B-raf Human genes 0.000 claims description 13
- 102100023931 Transcriptional regulator ATRX Human genes 0.000 claims description 13
- 238000003064 k means clustering Methods 0.000 claims description 13
- 238000000513 principal component analysis Methods 0.000 claims description 13
- 102100034540 Adenomatous polyposis coli protein Human genes 0.000 claims description 12
- 102100038111 Cyclin-dependent kinase 12 Human genes 0.000 claims description 12
- 102100025334 Guanine nucleotide-binding protein G(q) subunit alpha Human genes 0.000 claims description 12
- 102100036738 Guanine nucleotide-binding protein subunit alpha-11 Human genes 0.000 claims description 12
- 101000924577 Homo sapiens Adenomatous polyposis coli protein Proteins 0.000 claims description 12
- 101000721661 Homo sapiens Cellular tumor antigen p53 Proteins 0.000 claims description 12
- 101000884345 Homo sapiens Cyclin-dependent kinase 12 Proteins 0.000 claims description 12
- 101000857888 Homo sapiens Guanine nucleotide-binding protein G(q) subunit alpha Proteins 0.000 claims description 12
- 101001072407 Homo sapiens Guanine nucleotide-binding protein subunit alpha-11 Proteins 0.000 claims description 12
- 101001109719 Homo sapiens Nucleophosmin Proteins 0.000 claims description 12
- 101000628562 Homo sapiens Serine/threonine-protein kinase STK11 Proteins 0.000 claims description 12
- 101000819111 Homo sapiens Trans-acting T-cell-specific transcription factor GATA-3 Proteins 0.000 claims description 12
- 101001087416 Homo sapiens Tyrosine-protein phosphatase non-receptor type 11 Proteins 0.000 claims description 12
- 239000003798 L01XE11 - Pazopanib Substances 0.000 claims description 12
- 102100022678 Nucleophosmin Human genes 0.000 claims description 12
- 108010068097 Rad51 Recombinase Proteins 0.000 claims description 12
- 102000001332 SRC Human genes 0.000 claims description 12
- 108060006706 SRC Proteins 0.000 claims description 12
- 102100026715 Serine/threonine-protein kinase STK11 Human genes 0.000 claims description 12
- 102100021386 Trans-acting T-cell-specific transcription factor GATA-3 Human genes 0.000 claims description 12
- 102100033019 Tyrosine-protein phosphatase non-receptor type 11 Human genes 0.000 claims description 12
- 210000000349 chromosome Anatomy 0.000 claims description 12
- 229960000639 pazopanib Drugs 0.000 claims description 12
- CUIHSIWYWATEQL-UHFFFAOYSA-N pazopanib Chemical compound C1=CC2=C(C)N(C)N=C2C=C1N(C)C(N=1)=CC=NC=1NC1=CC=C(C)C(S(N)(=O)=O)=C1 CUIHSIWYWATEQL-UHFFFAOYSA-N 0.000 claims description 12
- 230000007704 transition Effects 0.000 claims description 12
- 102100028849 DNA mismatch repair protein Mlh3 Human genes 0.000 claims description 11
- 102100034157 DNA mismatch repair protein Msh2 Human genes 0.000 claims description 11
- 102100037700 DNA mismatch repair protein Msh3 Human genes 0.000 claims description 11
- 101000577867 Homo sapiens DNA mismatch repair protein Mlh3 Proteins 0.000 claims description 11
- 101001134036 Homo sapiens DNA mismatch repair protein Msh2 Proteins 0.000 claims description 11
- 101001027762 Homo sapiens DNA mismatch repair protein Msh3 Proteins 0.000 claims description 11
- 101000738901 Homo sapiens PMS1 protein homolog 1 Proteins 0.000 claims description 11
- 241000701806 Human papillomavirus Species 0.000 claims description 11
- 229910015837 MSH2 Inorganic materials 0.000 claims description 11
- 102100025825 Methylated-DNA-protein-cysteine methyltransferase Human genes 0.000 claims description 11
- 102100037482 PMS1 protein homolog 1 Human genes 0.000 claims description 11
- 108010011536 PTEN Phosphohydrolase Proteins 0.000 claims description 11
- 102000014160 PTEN Phosphohydrolase Human genes 0.000 claims description 11
- 108040008770 methylated-DNA-[protein]-cysteine S-methyltransferase activity proteins Proteins 0.000 claims description 11
- KCOYQXZDFIIGCY-CZIZESTLSA-N (3e)-4-amino-5-fluoro-3-[5-(4-methylpiperazin-1-yl)-1,3-dihydrobenzimidazol-2-ylidene]quinolin-2-one Chemical compound C1CN(C)CCN1C1=CC=C(N\C(N2)=C/3C(=C4C(F)=CC=CC4=NC\3=O)N)C2=C1 KCOYQXZDFIIGCY-CZIZESTLSA-N 0.000 claims description 10
- 102100033793 ALK tyrosine kinase receptor Human genes 0.000 claims description 10
- 101001042041 Bos taurus Isocitrate dehydrogenase [NAD] subunit beta, mitochondrial Proteins 0.000 claims description 10
- 208000037088 Chromosome Breakage Diseases 0.000 claims description 10
- 102100030708 GTPase KRas Human genes 0.000 claims description 10
- 102100039788 GTPase NRas Human genes 0.000 claims description 10
- 101000744505 Homo sapiens GTPase NRas Proteins 0.000 claims description 10
- 101000960234 Homo sapiens Isocitrate dehydrogenase [NADP] cytoplasmic Proteins 0.000 claims description 10
- 101000605639 Homo sapiens Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform Proteins 0.000 claims description 10
- 101000997835 Homo sapiens Tyrosine-protein kinase JAK1 Proteins 0.000 claims description 10
- 102100039905 Isocitrate dehydrogenase [NADP] cytoplasmic Human genes 0.000 claims description 10
- 102100038332 Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform Human genes 0.000 claims description 10
- 102000012338 Poly(ADP-ribose) Polymerases Human genes 0.000 claims description 10
- 108010061844 Poly(ADP-ribose) Polymerases Proteins 0.000 claims description 10
- 229920000776 Poly(Adenosine diphosphate-ribose) polymerase Polymers 0.000 claims description 10
- 102100028286 Proto-oncogene tyrosine-protein kinase receptor Ret Human genes 0.000 claims description 10
- 102100033438 Tyrosine-protein kinase JAK1 Human genes 0.000 claims description 10
- 102100033177 Vascular endothelial growth factor receptor 2 Human genes 0.000 claims description 10
- 229950005778 dovitinib Drugs 0.000 claims description 10
- ZEOWTGPWHLSLOG-UHFFFAOYSA-N Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F Chemical compound Cc1ccc(cc1-c1ccc2c(n[nH]c2c1)-c1cnn(c1)C1CC1)C(=O)Nc1cccc(c1)C(F)(F)F ZEOWTGPWHLSLOG-UHFFFAOYSA-N 0.000 claims description 9
- 108010009392 Cyclin-Dependent Kinase Inhibitor p16 Proteins 0.000 claims description 9
- 102100023593 Fibroblast growth factor receptor 1 Human genes 0.000 claims description 9
- 101710182386 Fibroblast growth factor receptor 1 Proteins 0.000 claims description 9
- 101000795643 Homo sapiens Hamartin Proteins 0.000 claims description 9
- 101001126417 Homo sapiens Platelet-derived growth factor receptor alpha Proteins 0.000 claims description 9
- 101000686031 Homo sapiens Proto-oncogene tyrosine-protein kinase ROS Proteins 0.000 claims description 9
- 101100355599 Neurospora crassa (strain ATCC 24698 / 74-OR23-1A / CBS 708.71 / DSM 1257 / FGSC 987) mus-11 gene Proteins 0.000 claims description 9
- 102100030485 Platelet-derived growth factor receptor alpha Human genes 0.000 claims description 9
- 102100023347 Proto-oncogene tyrosine-protein kinase ROS Human genes 0.000 claims description 9
- 101150006234 RAD52 gene Proteins 0.000 claims description 9
- 102000053062 Rad52 DNA Repair and Recombination Human genes 0.000 claims description 9
- 108700031762 Rad52 DNA Repair and Recombination Proteins 0.000 claims description 9
- 102100033254 Tumor suppressor ARF Human genes 0.000 claims description 9
- 230000033607 mismatch repair Effects 0.000 claims description 9
- 230000000392 somatic effect Effects 0.000 claims description 9
- 102000000872 ATM Human genes 0.000 claims description 8
- 108010004586 Ataxia Telangiectasia Mutated Proteins Proteins 0.000 claims description 8
- 102100027161 BRCA2-interacting transcriptional repressor EMSY Human genes 0.000 claims description 8
- 102100035631 Bloom syndrome protein Human genes 0.000 claims description 8
- 108091009167 Bloom syndrome protein Proteins 0.000 claims description 8
- 101710098191 C-4 methylsterol oxidase ERG25 Proteins 0.000 claims description 8
- 101000715943 Caenorhabditis elegans Cyclin-dependent kinase 4 homolog Proteins 0.000 claims description 8
- 108010058546 Cyclin D1 Proteins 0.000 claims description 8
- 102100024812 DNA (cytosine-5)-methyltransferase 3A Human genes 0.000 claims description 8
- 108010024491 DNA Methyltransferase 3A Proteins 0.000 claims description 8
- 102100039116 DNA repair protein RAD50 Human genes 0.000 claims description 8
- 102100027830 DNA repair protein XRCC2 Human genes 0.000 claims description 8
- ZBNZXTGUTAYRHI-UHFFFAOYSA-N Dasatinib Chemical compound C=1C(N2CCN(CCO)CC2)=NC(C)=NC=1NC(S1)=NC=C1C(=O)NC1=C(C)C=CC=C1Cl ZBNZXTGUTAYRHI-UHFFFAOYSA-N 0.000 claims description 8
- 102100031480 Dual specificity mitogen-activated protein kinase kinase 1 Human genes 0.000 claims description 8
- 101150021185 FGF gene Proteins 0.000 claims description 8
- 108091008794 FGF receptors Proteins 0.000 claims description 8
- 108010067741 Fanconi Anemia Complementation Group N protein Proteins 0.000 claims description 8
- 102100034553 Fanconi anemia group J protein Human genes 0.000 claims description 8
- 102100023600 Fibroblast growth factor receptor 2 Human genes 0.000 claims description 8
- 101710182389 Fibroblast growth factor receptor 2 Proteins 0.000 claims description 8
- 102100027842 Fibroblast growth factor receptor 3 Human genes 0.000 claims description 8
- 101710182396 Fibroblast growth factor receptor 3 Proteins 0.000 claims description 8
- 102100024165 G1/S-specific cyclin-D1 Human genes 0.000 claims description 8
- 102100029974 GTPase HRas Human genes 0.000 claims description 8
- 102100031885 General transcription and DNA repair factor IIH helicase subunit XPB Human genes 0.000 claims description 8
- 102100032610 Guanine nucleotide-binding protein G(s) subunit alpha isoforms XLas Human genes 0.000 claims description 8
- 102100036703 Guanine nucleotide-binding protein subunit alpha-13 Human genes 0.000 claims description 8
- 102100031561 Hamartin Human genes 0.000 claims description 8
- 241000700721 Hepatitis B virus Species 0.000 claims description 8
- 102100021866 Hepatocyte growth factor Human genes 0.000 claims description 8
- 102100035108 High affinity nerve growth factor receptor Human genes 0.000 claims description 8
- 102100039999 Histone deacetylase 2 Human genes 0.000 claims description 8
- 102100027755 Histone-lysine N-methyltransferase 2C Human genes 0.000 claims description 8
- 102100038970 Histone-lysine N-methyltransferase EZH2 Human genes 0.000 claims description 8
- 101000779641 Homo sapiens ALK tyrosine kinase receptor Proteins 0.000 claims description 8
- 101001057996 Homo sapiens BRCA2-interacting transcriptional repressor EMSY Proteins 0.000 claims description 8
- 101000743929 Homo sapiens DNA repair protein RAD50 Proteins 0.000 claims description 8
- 101000649306 Homo sapiens DNA repair protein XRCC2 Proteins 0.000 claims description 8
- 101000729474 Homo sapiens DNA-directed RNA polymerase I subunit RPA1 Proteins 0.000 claims description 8
- 101001095815 Homo sapiens E3 ubiquitin-protein ligase RING2 Proteins 0.000 claims description 8
- 101000848171 Homo sapiens Fanconi anemia group J protein Proteins 0.000 claims description 8
- 101000584633 Homo sapiens GTPase HRas Proteins 0.000 claims description 8
- 101000584612 Homo sapiens GTPase KRas Proteins 0.000 claims description 8
- 101000920748 Homo sapiens General transcription and DNA repair factor IIH helicase subunit XPB Proteins 0.000 claims description 8
- 101001014590 Homo sapiens Guanine nucleotide-binding protein G(s) subunit alpha isoforms XLas Proteins 0.000 claims description 8
- 101001014594 Homo sapiens Guanine nucleotide-binding protein G(s) subunit alpha isoforms short Proteins 0.000 claims description 8
- 101001072481 Homo sapiens Guanine nucleotide-binding protein subunit alpha-13 Proteins 0.000 claims description 8
- 101000898034 Homo sapiens Hepatocyte growth factor Proteins 0.000 claims description 8
- 101000596894 Homo sapiens High affinity nerve growth factor receptor Proteins 0.000 claims description 8
- 101001035011 Homo sapiens Histone deacetylase 2 Proteins 0.000 claims description 8
- 101001008892 Homo sapiens Histone-lysine N-methyltransferase 2C Proteins 0.000 claims description 8
- 101000882127 Homo sapiens Histone-lysine N-methyltransferase EZH2 Proteins 0.000 claims description 8
- 101001076408 Homo sapiens Interleukin-6 Proteins 0.000 claims description 8
- 101000916644 Homo sapiens Macrophage colony-stimulating factor 1 receptor Proteins 0.000 claims description 8
- 101001057193 Homo sapiens Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Proteins 0.000 claims description 8
- 101001014610 Homo sapiens Neuroendocrine secretory protein 55 Proteins 0.000 claims description 8
- 101000981336 Homo sapiens Nibrin Proteins 0.000 claims description 8
- 101000797903 Homo sapiens Protein ALEX Proteins 0.000 claims description 8
- 101000579425 Homo sapiens Proto-oncogene tyrosine-protein kinase receptor Ret Proteins 0.000 claims description 8
- 101000779418 Homo sapiens RAC-alpha serine/threonine-protein kinase Proteins 0.000 claims description 8
- 101000686227 Homo sapiens Ras-related protein R-Ras2 Proteins 0.000 claims description 8
- 101000932478 Homo sapiens Receptor-type tyrosine-protein kinase FLT3 Proteins 0.000 claims description 8
- 101001092125 Homo sapiens Replication protein A 70 kDa DNA-binding subunit Proteins 0.000 claims description 8
- 101000777293 Homo sapiens Serine/threonine-protein kinase Chk1 Proteins 0.000 claims description 8
- 101000802948 Homo sapiens Serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B alpha isoform Proteins 0.000 claims description 8
- 101000868152 Homo sapiens Son of sevenless homolog 1 Proteins 0.000 claims description 8
- 101000823316 Homo sapiens Tyrosine-protein kinase ABL1 Proteins 0.000 claims description 8
- 101000740048 Homo sapiens Ubiquitin carboxyl-terminal hydrolase BAP1 Proteins 0.000 claims description 8
- 101000808011 Homo sapiens Vascular endothelial growth factor A Proteins 0.000 claims description 8
- 101000804798 Homo sapiens Werner syndrome ATP-dependent helicase Proteins 0.000 claims description 8
- 239000002067 L01XE06 - Dasatinib Substances 0.000 claims description 8
- 239000002146 L01XE16 - Crizotinib Substances 0.000 claims description 8
- 101000740049 Latilactobacillus curvatus Bioactive peptide 1 Proteins 0.000 claims description 8
- 108010068342 MAP Kinase Kinase 1 Proteins 0.000 claims description 8
- 102100028198 Macrophage colony-stimulating factor 1 receptor Human genes 0.000 claims description 8
- 102100027240 Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Human genes 0.000 claims description 8
- 108091092878 Microsatellite Proteins 0.000 claims description 8
- 102100025725 Mothers against decapentaplegic homolog 4 Human genes 0.000 claims description 8
- 101710143112 Mothers against decapentaplegic homolog 4 Proteins 0.000 claims description 8
- 102100040884 Partner and localizer of BRCA2 Human genes 0.000 claims description 8
- 102100033810 RAC-alpha serine/threonine-protein kinase Human genes 0.000 claims description 8
- 102100025003 Ras-related protein R-Ras2 Human genes 0.000 claims description 8
- 101710100969 Receptor tyrosine-protein kinase erbB-3 Proteins 0.000 claims description 8
- 102100029986 Receptor tyrosine-protein kinase erbB-3 Human genes 0.000 claims description 8
- 102100029981 Receptor tyrosine-protein kinase erbB-4 Human genes 0.000 claims description 8
- 101710100963 Receptor tyrosine-protein kinase erbB-4 Proteins 0.000 claims description 8
- 102100020718 Receptor-type tyrosine-protein kinase FLT3 Human genes 0.000 claims description 8
- 102100035729 Replication protein A 70 kDa DNA-binding subunit Human genes 0.000 claims description 8
- 102100031081 Serine/threonine-protein kinase Chk1 Human genes 0.000 claims description 8
- 102100035728 Serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B alpha isoform Human genes 0.000 claims description 8
- 102000013380 Smoothened Receptor Human genes 0.000 claims description 8
- 101710090597 Smoothened homolog Proteins 0.000 claims description 8
- 208000000389 T-cell leukemia Diseases 0.000 claims description 8
- 208000028530 T-cell lymphoblastic leukemia/lymphoma Diseases 0.000 claims description 8
- CBPNZQVSJQDFBE-FUXHJELOSA-N Temsirolimus Chemical compound C1C[C@@H](OC(=O)C(C)(CO)CO)[C@H](OC)C[C@@H]1C[C@@H](C)[C@H]1OC(=O)[C@@H]2CCCCN2C(=O)C(=O)[C@](O)(O2)[C@H](C)CC[C@H]2C[C@H](OC)/C(C)=C/C=C/C=C/[C@@H](C)C[C@@H](C)C(=O)[C@H](OC)[C@H](O)/C(C)=C/[C@@H](C)C(=O)C1 CBPNZQVSJQDFBE-FUXHJELOSA-N 0.000 claims description 8
- 102100022596 Tyrosine-protein kinase ABL1 Human genes 0.000 claims description 8
- 102100039037 Vascular endothelial growth factor A Human genes 0.000 claims description 8
- 102100035336 Werner syndrome ATP-dependent helicase Human genes 0.000 claims description 8
- 230000003321 amplification Effects 0.000 claims description 8
- 230000037429 base substitution Effects 0.000 claims description 8
- 229960005061 crizotinib Drugs 0.000 claims description 8
- KTEIFNKAUNYNJU-GFCCVEGCSA-N crizotinib Chemical compound O([C@H](C)C=1C(=C(F)C=CC=1Cl)Cl)C(C(=NC=1)N)=CC=1C(=C1)C=NN1C1CCNCC1 KTEIFNKAUNYNJU-GFCCVEGCSA-N 0.000 claims description 8
- 229960002448 dasatinib Drugs 0.000 claims description 8
- 238000001983 electron spin resonance imaging Methods 0.000 claims description 8
- 102000052178 fibroblast growth factor receptor activity proteins Human genes 0.000 claims description 8
- 238000003199 nucleic acid amplification method Methods 0.000 claims description 8
- 239000002773 nucleotide Substances 0.000 claims description 8
- 125000003729 nucleotide group Chemical group 0.000 claims description 8
- 239000012188 paraffin wax Substances 0.000 claims description 8
- 231100000241 scar Toxicity 0.000 claims description 8
- 229960000235 temsirolimus Drugs 0.000 claims description 8
- QFJCIRLUMZQUOT-UHFFFAOYSA-N temsirolimus Natural products C1CC(O)C(OC)CC1CC(C)C1OC(=O)C2CCCCN2C(=O)C(=O)C(O)(O2)C(C)CCC2CC(OC)C(C)=CC=CC=CC(C)CC(C)C(=O)C(OC)C(O)C(C)=CC(C)C(=O)C1 QFJCIRLUMZQUOT-UHFFFAOYSA-N 0.000 claims description 8
- 229960004066 trametinib Drugs 0.000 claims description 8
- LIRYPHYGHXZJBZ-UHFFFAOYSA-N trametinib Chemical compound CC(=O)NC1=CC=CC(N2C(N(C3CC3)C(=O)C3=C(NC=4C(=CC(I)=CC=4)F)N(C)C(=O)C(C)=C32)=O)=C1 LIRYPHYGHXZJBZ-UHFFFAOYSA-N 0.000 claims description 8
- 230000005945 translocation Effects 0.000 claims description 8
- 101150020330 ATRX gene Proteins 0.000 claims description 7
- 101700002522 BARD1 Proteins 0.000 claims description 7
- 102100028048 BRCA1-associated RING domain protein 1 Human genes 0.000 claims description 7
- 102100022057 Hepatocyte nuclear factor 1-alpha Human genes 0.000 claims description 7
- 101001045751 Homo sapiens Hepatocyte nuclear factor 1-alpha Proteins 0.000 claims description 7
- 239000002138 L01XE21 - Regorafenib Substances 0.000 claims description 7
- 108700028341 SMARCB1 Proteins 0.000 claims description 7
- 101150008214 SMARCB1 gene Proteins 0.000 claims description 7
- 102100025746 SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily B member 1 Human genes 0.000 claims description 7
- 108700042462 X-linked Nuclear Proteins 0.000 claims description 7
- 229960004836 regorafenib Drugs 0.000 claims description 7
- FNHKPVJBJVTLMP-UHFFFAOYSA-N regorafenib Chemical compound C1=NC(C(=O)NC)=CC(OC=2C=C(F)C(NC(=O)NC=3C=C(C(Cl)=CC=3)C(F)(F)F)=CC=2)=C1 FNHKPVJBJVTLMP-UHFFFAOYSA-N 0.000 claims description 7
- HKVAMNSJSFKALM-GKUWKFKPSA-N Everolimus Chemical compound C1C[C@@H](OCCO)[C@H](OC)C[C@@H]1C[C@@H](C)[C@H]1OC(=O)[C@@H]2CCCCN2C(=O)C(=O)[C@](O)(O2)[C@H](C)CC[C@H]2C[C@H](OC)/C(C)=C/C=C/C=C/[C@@H](C)C[C@@H](C)C(=O)[C@H](OC)[C@H](O)/C(C)=C/[C@@H](C)C(=O)C1 HKVAMNSJSFKALM-GKUWKFKPSA-N 0.000 claims description 6
- 101001050559 Homo sapiens Kinesin-1 heavy chain Proteins 0.000 claims description 6
- 102100023422 Kinesin-1 heavy chain Human genes 0.000 claims description 6
- 239000005517 L01XE01 - Imatinib Substances 0.000 claims description 6
- 239000002176 L01XE26 - Cabozantinib Substances 0.000 claims description 6
- 239000012661 PARP inhibitor Substances 0.000 claims description 6
- 229940121906 Poly ADP ribose polymerase inhibitor Drugs 0.000 claims description 6
- 229960001292 cabozantinib Drugs 0.000 claims description 6
- ONIQOQHATWINJY-UHFFFAOYSA-N cabozantinib Chemical compound C=12C=C(OC)C(OC)=CC2=NC=CC=1OC(C=C1)=CC=C1NC(=O)C1(C(=O)NC=2C=CC(F)=CC=2)CC1 ONIQOQHATWINJY-UHFFFAOYSA-N 0.000 claims description 6
- 229960005167 everolimus Drugs 0.000 claims description 6
- 201000011243 gastrointestinal stromal tumor Diseases 0.000 claims description 6
- 229960002411 imatinib Drugs 0.000 claims description 6
- KTUFNOKKBVMGRW-UHFFFAOYSA-N imatinib Chemical compound C1CN(C)CCN1CC1=CC=C(C(=O)NC=2C=C(NC=3N=C(C=CN=3)C=3C=NC=CC=3)C(C)=CC=2)C=C1 KTUFNOKKBVMGRW-UHFFFAOYSA-N 0.000 claims description 6
- 101100002343 Arabidopsis thaliana ARID1 gene Proteins 0.000 claims description 5
- 101001003194 Eleusine coracana Alpha-amylase/trypsin inhibitor Proteins 0.000 claims description 5
- 101000785776 Homo sapiens Artemin Proteins 0.000 claims description 5
- 101000712511 Homo sapiens DNA repair and recombination protein RAD54-like Proteins 0.000 claims description 5
- 101000974340 Homo sapiens Nuclear receptor corepressor 1 Proteins 0.000 claims description 5
- 101000799466 Homo sapiens Thrombopoietin receptor Proteins 0.000 claims description 5
- 101000997832 Homo sapiens Tyrosine-protein kinase JAK2 Proteins 0.000 claims description 5
- 102100024403 Nibrin Human genes 0.000 claims description 5
- 102100022935 Nuclear receptor corepressor 1 Human genes 0.000 claims description 5
- 108091081400 Subtelomere Proteins 0.000 claims description 5
- 102100034196 Thrombopoietin receptor Human genes 0.000 claims description 5
- 102100033444 Tyrosine-protein kinase JAK2 Human genes 0.000 claims description 5
- 210000002230 centromere Anatomy 0.000 claims description 5
- 230000002068 genetic effect Effects 0.000 claims description 5
- 238000009169 immunotherapy Methods 0.000 claims description 5
- XIIOFHFUYBLOLW-UHFFFAOYSA-N selpercatinib Chemical compound OC(COC=1C=C(C=2N(C=1)N=CC=2C#N)C=1C=NC(=CC=1)N1CC2N(C(C1)C2)CC=1C=NC(=CC=1)OC)(C)C XIIOFHFUYBLOLW-UHFFFAOYSA-N 0.000 claims description 5
- SPMVMDHWKHCIDT-UHFFFAOYSA-N 1-[2-chloro-4-[(6,7-dimethoxy-4-quinolinyl)oxy]phenyl]-3-(5-methyl-3-isoxazolyl)urea Chemical compound C=12C=C(OC)C(OC)=CC2=NC=CC=1OC(C=C1Cl)=CC=C1NC(=O)NC=1C=C(C)ON=1 SPMVMDHWKHCIDT-UHFFFAOYSA-N 0.000 claims description 4
- QFCXANHHBCGMAS-UHFFFAOYSA-N 4-[[4-(4-chloroanilino)furo[2,3-d]pyridazin-7-yl]oxymethyl]-n-methylpyridine-2-carboxamide Chemical compound C1=NC(C(=O)NC)=CC(COC=2C=3OC=CC=3C(NC=3C=CC(Cl)=CC=3)=NN=2)=C1 QFCXANHHBCGMAS-UHFFFAOYSA-N 0.000 claims description 4
- 101100127296 Dictyostelium discoideum kif1 gene Proteins 0.000 claims description 4
- 108010010285 Forkhead Box Protein L2 Proteins 0.000 claims description 4
- 102100035137 Forkhead box protein L2 Human genes 0.000 claims description 4
- 102100034533 Histone H2AX Human genes 0.000 claims description 4
- 101001067891 Homo sapiens Histone H2AX Proteins 0.000 claims description 4
- 101000599886 Homo sapiens Isocitrate dehydrogenase [NADP], mitochondrial Proteins 0.000 claims description 4
- 101000971638 Homo sapiens Kinesin-like protein KIF1A Proteins 0.000 claims description 4
- 101000984620 Homo sapiens Low-density lipoprotein receptor-related protein 1B Proteins 0.000 claims description 4
- 101000824415 Homo sapiens Protocadherin Fat 3 Proteins 0.000 claims description 4
- 101000691459 Homo sapiens Serine/threonine-protein kinase N2 Proteins 0.000 claims description 4
- 101000934996 Homo sapiens Tyrosine-protein kinase JAK3 Proteins 0.000 claims description 4
- 102100037845 Isocitrate dehydrogenase [NADP], mitochondrial Human genes 0.000 claims description 4
- 108090000484 Kelch-Like ECH-Associated Protein 1 Proteins 0.000 claims description 4
- 102000004034 Kelch-Like ECH-Associated Protein 1 Human genes 0.000 claims description 4
- 102100021527 Kinesin-like protein KIF1A Human genes 0.000 claims description 4
- 239000005551 L01XE03 - Erlotinib Substances 0.000 claims description 4
- 239000002147 L01XE04 - Sunitinib Substances 0.000 claims description 4
- 239000002139 L01XE22 - Masitinib Substances 0.000 claims description 4
- 102100027121 Low-density lipoprotein receptor-related protein 1B Human genes 0.000 claims description 4
- 108010075654 MAP Kinase Kinase Kinase 1 Proteins 0.000 claims description 4
- 102100033115 Mitogen-activated protein kinase kinase kinase 1 Human genes 0.000 claims description 4
- 101100127288 Mus musculus Kif1a gene Proteins 0.000 claims description 4
- FOFDIMHVKGYHRU-UHFFFAOYSA-N N-(1,3-benzodioxol-5-ylmethyl)-4-(4-benzofuro[3,2-d]pyrimidinyl)-1-piperazinecarbothioamide Chemical compound C12=CC=CC=C2OC2=C1N=CN=C2N(CC1)CCN1C(=S)NCC1=CC=C(OCO2)C2=C1 FOFDIMHVKGYHRU-UHFFFAOYSA-N 0.000 claims description 4
- 102000048850 Neoplasm Genes Human genes 0.000 claims description 4
- 108700019961 Neoplasm Genes Proteins 0.000 claims description 4
- 102000001759 Notch1 Receptor Human genes 0.000 claims description 4
- 108010029755 Notch1 Receptor Proteins 0.000 claims description 4
- 102100022134 Protocadherin Fat 3 Human genes 0.000 claims description 4
- 102100026180 Serine/threonine-protein kinase N2 Human genes 0.000 claims description 4
- 102100025387 Tyrosine-protein kinase JAK3 Human genes 0.000 claims description 4
- 229950009545 amuvatinib Drugs 0.000 claims description 4
- 229960003005 axitinib Drugs 0.000 claims description 4
- RITAVMQDGBJQJZ-FMIVXFBMSA-N axitinib Chemical compound CNC(=O)C1=CC=CC=C1SC1=CC=C(C(\C=C\C=2N=CC=CC=2)=NN2)C2=C1 RITAVMQDGBJQJZ-FMIVXFBMSA-N 0.000 claims description 4
- 229960001602 ceritinib Drugs 0.000 claims description 4
- VERWOWGGCGHDQE-UHFFFAOYSA-N ceritinib Chemical compound CC=1C=C(NC=2N=C(NC=3C(=CC=CC=3)S(=O)(=O)C(C)C)C(Cl)=CN=2)C(OC(C)C)=CC=1C1CCNCC1 VERWOWGGCGHDQE-UHFFFAOYSA-N 0.000 claims description 4
- 229960002271 cobimetinib Drugs 0.000 claims description 4
- RESIMIUSNACMNW-BXRWSSRYSA-N cobimetinib fumarate Chemical compound OC(=O)\C=C\C(O)=O.C1C(O)([C@H]2NCCCC2)CN1C(=O)C1=CC=C(F)C(F)=C1NC1=CC=C(I)C=C1F.C1C(O)([C@H]2NCCCC2)CN1C(=O)C1=CC=C(F)C(F)=C1NC1=CC=C(I)C=C1F RESIMIUSNACMNW-BXRWSSRYSA-N 0.000 claims description 4
- 229960001433 erlotinib Drugs 0.000 claims description 4
- AAKJLRGGTJKAMG-UHFFFAOYSA-N erlotinib Chemical compound C=12C=C(OCCOC)C(OCCOC)=CC2=NC=NC=1NC1=CC=CC(C#C)=C1 AAKJLRGGTJKAMG-UHFFFAOYSA-N 0.000 claims description 4
- 229960004655 masitinib Drugs 0.000 claims description 4
- WJEOLQLKVOPQFV-UHFFFAOYSA-N masitinib Chemical compound C1CN(C)CCN1CC1=CC=C(C(=O)NC=2C=C(NC=3SC=C(N=3)C=3C=NC=CC=3)C(C)=CC=2)C=C1 WJEOLQLKVOPQFV-UHFFFAOYSA-N 0.000 claims description 4
- ONDPWWDPQDCQNJ-UHFFFAOYSA-N n-(3,3-dimethyl-1,2-dihydroindol-6-yl)-2-(pyridin-4-ylmethylamino)pyridine-3-carboxamide;phosphoric acid Chemical compound OP(O)(O)=O.OP(O)(O)=O.C=1C=C2C(C)(C)CNC2=CC=1NC(=O)C1=CC=CN=C1NCC1=CC=NC=C1 ONDPWWDPQDCQNJ-UHFFFAOYSA-N 0.000 claims description 4
- 238000003062 neural network model Methods 0.000 claims description 4
- 229940121487 ripretinib Drugs 0.000 claims description 4
- CEFJVGZHQAGLHS-UHFFFAOYSA-N ripretinib Chemical compound O=C1N(CC)C2=CC(NC)=NC=C2C=C1C(C(=CC=1F)Br)=CC=1NC(=O)NC1=CC=CC=C1 CEFJVGZHQAGLHS-UHFFFAOYSA-N 0.000 claims description 4
- 229950010746 selumetinib Drugs 0.000 claims description 4
- CYOHGALHFOKKQC-UHFFFAOYSA-N selumetinib Chemical compound OCCONC(=O)C=1C=C2N(C)C=NC2=C(F)C=1NC1=CC=C(Br)C=C1Cl CYOHGALHFOKKQC-UHFFFAOYSA-N 0.000 claims description 4
- 229960001796 sunitinib Drugs 0.000 claims description 4
- WINHZLLDWRZWRT-ATVHPVEESA-N sunitinib Chemical compound CCN(CC)CCNC(=O)C1=C(C)NC(\C=C/2C3=CC(F)=CC=C3NC\2=O)=C1C WINHZLLDWRZWRT-ATVHPVEESA-N 0.000 claims description 4
- 229950004186 telatinib Drugs 0.000 claims description 4
- 229950009455 tepotinib Drugs 0.000 claims description 4
- AHYMHWXQRWRBKT-UHFFFAOYSA-N tepotinib Chemical compound C1CN(C)CCC1COC1=CN=C(C=2C=C(CN3C(C=CC(=N3)C=3C=C(C=CC=3)C#N)=O)C=CC=2)N=C1 AHYMHWXQRWRBKT-UHFFFAOYSA-N 0.000 claims description 4
- 229960000940 tivozanib Drugs 0.000 claims description 4
- 229950000578 vatalanib Drugs 0.000 claims description 4
- YCOYDOIWSSHVCK-UHFFFAOYSA-N vatalanib Chemical compound C1=CC(Cl)=CC=C1NC(C1=CC=CC=C11)=NN=C1CC1=CC=NC=C1 YCOYDOIWSSHVCK-UHFFFAOYSA-N 0.000 claims description 4
- NYNZQNWKBKUAII-KBXCAEBGSA-N (3s)-n-[5-[(2r)-2-(2,5-difluorophenyl)pyrrolidin-1-yl]pyrazolo[1,5-a]pyrimidin-3-yl]-3-hydroxypyrrolidine-1-carboxamide Chemical compound C1[C@@H](O)CCN1C(=O)NC1=C2N=C(N3[C@H](CCC3)C=3C(=CC=C(F)C=3)F)C=CN2N=C1 NYNZQNWKBKUAII-KBXCAEBGSA-N 0.000 claims description 3
- HCDMJFOHIXMBOV-UHFFFAOYSA-N 3-(2,6-difluoro-3,5-dimethoxyphenyl)-1-ethyl-8-(morpholin-4-ylmethyl)-4,7-dihydropyrrolo[4,5]pyrido[1,2-d]pyrimidin-2-one Chemical compound C=1C2=C3N(CC)C(=O)N(C=4C(=C(OC)C=C(OC)C=4F)F)CC3=CN=C2NC=1CN1CCOCC1 HCDMJFOHIXMBOV-UHFFFAOYSA-N 0.000 claims description 3
- 108020004414 DNA Proteins 0.000 claims description 3
- 230000010558 Gene Alterations Effects 0.000 claims description 3
- 101000742859 Homo sapiens Retinoblastoma-associated protein Proteins 0.000 claims description 3
- 108010014608 Proto-Oncogene Proteins c-kit Proteins 0.000 claims description 3
- 102100038042 Retinoblastoma-associated protein Human genes 0.000 claims description 3
- 108010053099 Vascular Endothelial Growth Factor Receptor-2 Proteins 0.000 claims description 3
- HWGQMRYQVZSGDQ-HZPDHXFCSA-N chembl3137320 Chemical compound CN1N=CN=C1[C@H]([C@H](N1)C=2C=CC(F)=CC=2)C2=NNC(=O)C3=C2C1=CC(F)=C3 HWGQMRYQVZSGDQ-HZPDHXFCSA-N 0.000 claims description 3
- 229950000521 entrectinib Drugs 0.000 claims description 3
- 229950003970 larotrectinib Drugs 0.000 claims description 3
- HAYYBYPASCDWEQ-UHFFFAOYSA-N n-[5-[(3,5-difluorophenyl)methyl]-1h-indazol-3-yl]-4-(4-methylpiperazin-1-yl)-2-(oxan-4-ylamino)benzamide Chemical compound C1CN(C)CCN1C(C=C1NC2CCOCC2)=CC=C1C(=O)NC(C1=C2)=NNC1=CC=C2CC1=CC(F)=CC(F)=C1 HAYYBYPASCDWEQ-UHFFFAOYSA-N 0.000 claims description 3
- 229950008835 neratinib Drugs 0.000 claims description 3
- ZNHPZUKZSNBOSQ-BQYQJAHWSA-N neratinib Chemical compound C=12C=C(NC\C=C\CN(C)C)C(OCC)=CC2=NC=C(C#N)C=1NC(C=C1Cl)=CC=C1OCC1=CC=CC=N1 ZNHPZUKZSNBOSQ-BQYQJAHWSA-N 0.000 claims description 3
- 229950011068 niraparib Drugs 0.000 claims description 3
- PCHKPVIQAHNQLW-CQSZACIVSA-N niraparib Chemical compound N1=C2C(C(=O)N)=CC=CC2=CN1C(C=C1)=CC=C1[C@@H]1CCCNC1 PCHKPVIQAHNQLW-CQSZACIVSA-N 0.000 claims description 3
- 229950004707 rucaparib Drugs 0.000 claims description 3
- DWYRIWUZIJHQKQ-SANMLTNESA-N (1S)-1-(4-fluorophenyl)-1-[2-[4-[6-(1-methylpyrazol-4-yl)pyrrolo[2,1-f][1,2,4]triazin-4-yl]piperazin-1-yl]pyrimidin-5-yl]ethanamine Chemical compound Cn1cc(cn1)-c1cc2c(ncnn2c1)N1CCN(CC1)c1ncc(cn1)[C@@](C)(N)c1ccc(F)cc1 DWYRIWUZIJHQKQ-SANMLTNESA-N 0.000 claims description 2
- STUWGJZDJHPWGZ-LBPRGKRZSA-N (2S)-N1-[4-methyl-5-[2-(1,1,1-trifluoro-2-methylpropan-2-yl)-4-pyridinyl]-2-thiazolyl]pyrrolidine-1,2-dicarboxamide Chemical compound S1C(C=2C=C(N=CC=2)C(C)(C)C(F)(F)F)=C(C)N=C1NC(=O)N1CCC[C@H]1C(N)=O STUWGJZDJHPWGZ-LBPRGKRZSA-N 0.000 claims description 2
- YXTKHLHCVFUPPT-YYFJYKOTSA-N (2s)-2-[[4-[(2-amino-5-formyl-4-oxo-1,6,7,8-tetrahydropteridin-6-yl)methylamino]benzoyl]amino]pentanedioic acid;(1r,2r)-1,2-dimethanidylcyclohexane;5-fluoro-1h-pyrimidine-2,4-dione;oxalic acid;platinum(2+) Chemical compound [Pt+2].OC(=O)C(O)=O.[CH2-][C@@H]1CCCC[C@H]1[CH2-].FC1=CNC(=O)NC1=O.C1NC=2NC(N)=NC(=O)C=2N(C=O)C1CNC1=CC=C(C(=O)N[C@@H](CCC(O)=O)C(O)=O)C=C1 YXTKHLHCVFUPPT-YYFJYKOTSA-N 0.000 claims description 2
- LIOLIMKSCNQPLV-UHFFFAOYSA-N 2-fluoro-n-methyl-4-[7-(quinolin-6-ylmethyl)imidazo[1,2-b][1,2,4]triazin-2-yl]benzamide Chemical compound C1=C(F)C(C(=O)NC)=CC=C1C1=NN2C(CC=3C=C4C=CC=NC4=CC=3)=CN=C2N=C1 LIOLIMKSCNQPLV-UHFFFAOYSA-N 0.000 claims description 2
- XYDNMOZJKOGZLS-NSHDSACASA-N 3-[(1s)-1-imidazo[1,2-a]pyridin-6-ylethyl]-5-(1-methylpyrazol-4-yl)triazolo[4,5-b]pyrazine Chemical compound N1=C2N([C@H](C3=CN4C=CN=C4C=C3)C)N=NC2=NC=C1C=1C=NN(C)C=1 XYDNMOZJKOGZLS-NSHDSACASA-N 0.000 claims description 2
- JDUBGYFRJFOXQC-KRWDZBQOSA-N 4-amino-n-[(1s)-1-(4-chlorophenyl)-3-hydroxypropyl]-1-(7h-pyrrolo[2,3-d]pyrimidin-4-yl)piperidine-4-carboxamide Chemical compound C1([C@H](CCO)NC(=O)C2(CCN(CC2)C=2C=3C=CNC=3N=CN=2)N)=CC=C(Cl)C=C1 JDUBGYFRJFOXQC-KRWDZBQOSA-N 0.000 claims description 2
- AILRADAXUVEEIR-UHFFFAOYSA-N 5-chloro-4-n-(2-dimethylphosphorylphenyl)-2-n-[2-methoxy-4-[4-(4-methylpiperazin-1-yl)piperidin-1-yl]phenyl]pyrimidine-2,4-diamine Chemical compound COC1=CC(N2CCC(CC2)N2CCN(C)CC2)=CC=C1NC(N=1)=NC=C(Cl)C=1NC1=CC=CC=C1P(C)(C)=O AILRADAXUVEEIR-UHFFFAOYSA-N 0.000 claims description 2
- WXNSCLIZKHLNSG-MCZRLCSDSA-N 6-(2,5-dioxopyrrol-1-yl)-N-[2-[[2-[[(2S)-1-[[2-[[2-[[(10S,23S)-10-ethyl-18-fluoro-10-hydroxy-19-methyl-5,9-dioxo-8-oxa-4,15-diazahexacyclo[14.7.1.02,14.04,13.06,11.020,24]tetracosa-1,6(11),12,14,16,18,20(24)-heptaen-23-yl]amino]-2-oxoethoxy]methylamino]-2-oxoethyl]amino]-1-oxo-3-phenylpropan-2-yl]amino]-2-oxoethyl]amino]-2-oxoethyl]hexanamide Chemical compound CC[C@@]1(O)C(=O)OCC2=C1C=C1N(CC3=C1N=C1C=C(F)C(C)=C4CC[C@H](NC(=O)COCNC(=O)CNC(=O)[C@H](CC5=CC=CC=C5)NC(=O)CNC(=O)CNC(=O)CCCCCN5C(=O)C=CC5=O)C3=C14)C2=O WXNSCLIZKHLNSG-MCZRLCSDSA-N 0.000 claims description 2
- 206010052747 Adenocarcinoma pancreas Diseases 0.000 claims description 2
- 108091093088 Amplicon Proteins 0.000 claims description 2
- MLDQJTXFUGDVEO-UHFFFAOYSA-N BAY-43-9006 Chemical compound C1=NC(C(=O)NC)=CC(OC=2C=CC(NC(=O)NC=3C=C(C(Cl)=CC=3)C(F)(F)F)=CC=2)=C1 MLDQJTXFUGDVEO-UHFFFAOYSA-N 0.000 claims description 2
- QADPYRIHXKWUSV-UHFFFAOYSA-N BGJ-398 Chemical compound C1CN(CC)CCN1C(C=C1)=CC=C1NC1=CC(N(C)C(=O)NC=2C(=C(OC)C=C(OC)C=2Cl)Cl)=NC=N1 QADPYRIHXKWUSV-UHFFFAOYSA-N 0.000 claims description 2
- 102100027314 Beta-2-microglobulin Human genes 0.000 claims description 2
- 229940124204 C-kit inhibitor Drugs 0.000 claims description 2
- 239000012275 CTLA-4 inhibitor Substances 0.000 claims description 2
- 229940045513 CTLA4 antagonist Drugs 0.000 claims description 2
- 206010009944 Colon cancer Diseases 0.000 claims description 2
- 208000001333 Colorectal Neoplasms Diseases 0.000 claims description 2
- 108010025464 Cyclin-Dependent Kinase 4 Proteins 0.000 claims description 2
- 102100036252 Cyclin-dependent kinase 4 Human genes 0.000 claims description 2
- 102100024185 G1/S-specific cyclin-D2 Human genes 0.000 claims description 2
- 102100034458 Hepatitis A virus cellular receptor 2 Human genes 0.000 claims description 2
- 101710083479 Hepatitis A virus cellular receptor 2 homolog Proteins 0.000 claims description 2
- 101000937544 Homo sapiens Beta-2-microglobulin Proteins 0.000 claims description 2
- 101000980741 Homo sapiens G1/S-specific cyclin-D2 Proteins 0.000 claims description 2
- 101000824318 Homo sapiens Protocadherin Fat 1 Proteins 0.000 claims description 2
- 101000654718 Homo sapiens SET-binding protein Proteins 0.000 claims description 2
- 239000005411 L01XE02 - Gefitinib Substances 0.000 claims description 2
- 239000005511 L01XE05 - Sorafenib Substances 0.000 claims description 2
- 239000002136 L01XE07 - Lapatinib Substances 0.000 claims description 2
- 239000002137 L01XE24 - Ponatinib Substances 0.000 claims description 2
- 102000017578 LAG3 Human genes 0.000 claims description 2
- 101150030213 Lag3 gene Proteins 0.000 claims description 2
- 206010058467 Lung neoplasm malignant Diseases 0.000 claims description 2
- 108090000744 Mitogen-Activated Protein Kinase Kinases Proteins 0.000 claims description 2
- 102000004232 Mitogen-Activated Protein Kinase Kinases Human genes 0.000 claims description 2
- 102000048238 Neuregulin-1 Human genes 0.000 claims description 2
- 108090000556 Neuregulin-1 Proteins 0.000 claims description 2
- 239000012270 PD-1 inhibitor Substances 0.000 claims description 2
- 239000012668 PD-1-inhibitor Substances 0.000 claims description 2
- 239000012271 PD-L1 inhibitor Substances 0.000 claims description 2
- 102100024216 Programmed cell death 1 ligand 1 Human genes 0.000 claims description 2
- 206010060862 Prostate cancer Diseases 0.000 claims description 2
- 208000000236 Prostatic Neoplasms Diseases 0.000 claims description 2
- 102000016971 Proto-Oncogene Proteins c-kit Human genes 0.000 claims description 2
- 102100022095 Protocadherin Fat 1 Human genes 0.000 claims description 2
- 102100032741 SET-binding protein Human genes 0.000 claims description 2
- 229940126547 T-cell immunoglobulin mucin-3 Drugs 0.000 claims description 2
- XSMVECZRZBFTIZ-UHFFFAOYSA-M [2-(aminomethyl)cyclobutyl]methanamine;2-oxidopropanoate;platinum(4+) Chemical compound [Pt+4].CC([O-])C([O-])=O.NCC1CCC1CN XSMVECZRZBFTIZ-UHFFFAOYSA-M 0.000 claims description 2
- 229960001686 afatinib Drugs 0.000 claims description 2
- ULXXDDBFHOBEHA-CWDCEQMOSA-N afatinib Chemical compound N1=CN=C2C=C(O[C@@H]3COCC3)C(NC(=O)/C=C/CN(C)C)=CC2=C1NC1=CC=C(F)C(Cl)=C1 ULXXDDBFHOBEHA-CWDCEQMOSA-N 0.000 claims description 2
- 229960001611 alectinib Drugs 0.000 claims description 2
- KDGFLJKFZUIJMX-UHFFFAOYSA-N alectinib Chemical compound CCC1=CC=2C(=O)C(C3=CC=C(C=C3N3)C#N)=C3C(C)(C)C=2C=C1N(CC1)CCC1N1CCOCC1 KDGFLJKFZUIJMX-UHFFFAOYSA-N 0.000 claims description 2
- 229950010482 alpelisib Drugs 0.000 claims description 2
- 229950009576 avapritinib Drugs 0.000 claims description 2
- 229950004272 brigatinib Drugs 0.000 claims description 2
- 229950009671 capivasertib Drugs 0.000 claims description 2
- 229950005852 capmatinib Drugs 0.000 claims description 2
- 229960004562 carboplatin Drugs 0.000 claims description 2
- 190000008236 carboplatin Chemical compound 0.000 claims description 2
- 229960005395 cetuximab Drugs 0.000 claims description 2
- 239000003795 chemical substances by application Substances 0.000 claims description 2
- 201000001329 chromosome 9p deletion syndrome Diseases 0.000 claims description 2
- 229960004316 cisplatin Drugs 0.000 claims description 2
- DQLATGHUWYMOKM-UHFFFAOYSA-L cisplatin Chemical compound N[Pt](N)(Cl)Cl DQLATGHUWYMOKM-UHFFFAOYSA-L 0.000 claims description 2
- 229960002465 dabrafenib Drugs 0.000 claims description 2
- BFSMGDJOXZAERB-UHFFFAOYSA-N dabrafenib Chemical compound S1C(C(C)(C)C)=NC(C=2C(=C(NS(=O)(=O)C=3C(=CC=CC=3F)F)C=CC=2)F)=C1C1=CC=NC(N)=N1 BFSMGDJOXZAERB-UHFFFAOYSA-N 0.000 claims description 2
- 229950002205 dacomitinib Drugs 0.000 claims description 2
- LVXJQMNHJWSHET-AATRIKPKSA-N dacomitinib Chemical compound C=12C=C(NC(=O)\C=C\CN3CCCCC3)C(OC)=CC2=NC=NC=1NC1=CC=C(F)C(Cl)=C1 LVXJQMNHJWSHET-AATRIKPKSA-N 0.000 claims description 2
- 229940121647 egfr inhibitor Drugs 0.000 claims description 2
- 229950001969 encorafenib Drugs 0.000 claims description 2
- 229960002584 gefitinib Drugs 0.000 claims description 2
- XGALLCVXEZPNRQ-UHFFFAOYSA-N gefitinib Chemical compound C=12C=C(OCCCN3CCOCC3)C(OC)=CC2=NC=NC=1NC1=CC=C(F)C(Cl)=C1 XGALLCVXEZPNRQ-UHFFFAOYSA-N 0.000 claims description 2
- 239000002955 immunomodulating agent Substances 0.000 claims description 2
- 229940121354 immunomodulator Drugs 0.000 claims description 2
- 230000002584 immunomodulator Effects 0.000 claims description 2
- 229950005712 infigratinib Drugs 0.000 claims description 2
- WIJZXSAJMHAVGX-DHLKQENFSA-N ivosidenib Chemical compound FC1=CN=CC(N([C@H](C(=O)NC2CC(F)(F)C2)C=2C(=CC=CC=2)Cl)C(=O)[C@H]2N(C(=O)CC2)C=2N=CC=C(C=2)C#N)=C1 WIJZXSAJMHAVGX-DHLKQENFSA-N 0.000 claims description 2
- 229950010738 ivosidenib Drugs 0.000 claims description 2
- 229960004891 lapatinib Drugs 0.000 claims description 2
- BCFGMOOMADDAQU-UHFFFAOYSA-N lapatinib Chemical compound O1C(CNCCS(=O)(=O)C)=CC=C1C1=CC=C(N=CN=C2NC=3C=C(Cl)C(OCC=4C=C(F)C=CC=4)=CC=3)C2=C1 BCFGMOOMADDAQU-UHFFFAOYSA-N 0.000 claims description 2
- 229960003784 lenvatinib Drugs 0.000 claims description 2
- WOSKHXYHFSIKNG-UHFFFAOYSA-N lenvatinib Chemical compound C=12C=C(C(N)=O)C(OC)=CC2=NC=CC=1OC(C=C1Cl)=CC=C1NC(=O)NC1CC1 WOSKHXYHFSIKNG-UHFFFAOYSA-N 0.000 claims description 2
- CMJCXYNUCSMDBY-ZDUSSCGKSA-N lgx818 Chemical compound COC(=O)N[C@@H](C)CNC1=NC=CC(C=2C(=NN(C=2)C(C)C)C=2C(=C(NS(C)(=O)=O)C=C(Cl)C=2)F)=N1 CMJCXYNUCSMDBY-ZDUSSCGKSA-N 0.000 claims description 2
- 229950008991 lobaplatin Drugs 0.000 claims description 2
- 201000005202 lung cancer Diseases 0.000 claims description 2
- 208000020816 lung neoplasm Diseases 0.000 claims description 2
- 229950003135 margetuximab Drugs 0.000 claims description 2
- 229950007221 nedaplatin Drugs 0.000 claims description 2
- 190000005734 nedaplatin Chemical compound 0.000 claims description 2
- 229960000572 olaparib Drugs 0.000 claims description 2
- 231100000590 oncogenic Toxicity 0.000 claims description 2
- 230000002246 oncogenic effect Effects 0.000 claims description 2
- 229960003278 osimertinib Drugs 0.000 claims description 2
- DUYJMQONPNNFPI-UHFFFAOYSA-N osimertinib Chemical compound COC1=CC(N(C)CCN(C)C)=C(NC(=O)C=C)C=C1NC1=NC=CC(C=2C3=CC=CC=C3N(C)C=2)=N1 DUYJMQONPNNFPI-UHFFFAOYSA-N 0.000 claims description 2
- 229960001756 oxaliplatin Drugs 0.000 claims description 2
- DWAFYCQODLXJNR-BNTLRKBRSA-L oxaliplatin Chemical compound O1C(=O)C(=O)O[Pt]11N[C@@H]2CCCC[C@H]2N1 DWAFYCQODLXJNR-BNTLRKBRSA-L 0.000 claims description 2
- 229960004390 palbociclib Drugs 0.000 claims description 2
- AHJRHEGDXFFMBM-UHFFFAOYSA-N palbociclib Chemical compound N1=C2N(C3CCCC3)C(=O)C(C(=O)C)=C(C)C2=CN=C1NC(N=C1)=CC=C1N1CCNCC1 AHJRHEGDXFFMBM-UHFFFAOYSA-N 0.000 claims description 2
- 201000002094 pancreatic adenocarcinoma Diseases 0.000 claims description 2
- 229960001972 panitumumab Drugs 0.000 claims description 2
- 229940121655 pd-1 inhibitor Drugs 0.000 claims description 2
- 229940121656 pd-l1 inhibitor Drugs 0.000 claims description 2
- 229940121317 pemigatinib Drugs 0.000 claims description 2
- 229960002087 pertuzumab Drugs 0.000 claims description 2
- 229960001131 ponatinib Drugs 0.000 claims description 2
- PHXJVRSECIGDHY-UHFFFAOYSA-N ponatinib Chemical compound C1CN(C)CCN1CC(C(=C1)C(F)(F)F)=CC=C1NC(=O)C1=CC=C(C)C(C#CC=2N3N=CC=CC3=NC=2)=C1 PHXJVRSECIGDHY-UHFFFAOYSA-N 0.000 claims description 2
- 229940121597 pralsetinib Drugs 0.000 claims description 2
- GBLBJPZSROAGMF-BATDWUPUSA-N pralsetinib Chemical compound CO[C@]1(CC[C@@H](CC1)C1=NC(NC2=NNC(C)=C2)=CC(C)=N1)C(=O)N[C@@H](C)C1=CC=C(N=C1)N1C=C(F)C=N1 GBLBJPZSROAGMF-BATDWUPUSA-N 0.000 claims description 2
- 201000002025 prostate sarcoma Diseases 0.000 claims description 2
- 108700042226 ras Genes Proteins 0.000 claims description 2
- 229940121602 repotrectinib Drugs 0.000 claims description 2
- FIKPXCOQUIZNHB-WDEREUQCSA-N repotrectinib Chemical compound C[C@H]1CNC(=O)C2=C3N=C(N[C@H](C)C4=C(O1)C=CC(F)=C4)C=CN3N=C2 FIKPXCOQUIZNHB-WDEREUQCSA-N 0.000 claims description 2
- HMABYWSNWIZPAG-UHFFFAOYSA-N rucaparib Chemical compound C1=CC(CNC)=CC=C1C(N1)=C2CCNC(=O)C3=C2C1=CC(F)=C3 HMABYWSNWIZPAG-UHFFFAOYSA-N 0.000 claims description 2
- 229960005399 satraplatin Drugs 0.000 claims description 2
- 190014017285 satraplatin Chemical compound 0.000 claims description 2
- 229950003500 savolitinib Drugs 0.000 claims description 2
- 229940121610 selpercatinib Drugs 0.000 claims description 2
- 229960003787 sorafenib Drugs 0.000 claims description 2
- 229950004550 talazoparib Drugs 0.000 claims description 2
- 229960001612 trastuzumab emtansine Drugs 0.000 claims description 2
- 229960003862 vemurafenib Drugs 0.000 claims description 2
- GPXBXXGIAQBQNI-UHFFFAOYSA-N vemurafenib Chemical compound CCCS(=O)(=O)NC1=CC=C(F)C(C(=O)C=2C3=CC(=CN=C3NC=2)C=2C=CC(Cl)=CC=2)=C1F GPXBXXGIAQBQNI-UHFFFAOYSA-N 0.000 claims description 2
- 101150080074 TP53 gene Proteins 0.000 claims 4
- JJWKPURADFRFRB-UHFFFAOYSA-N carbonyl sulfide Chemical compound O=C=S JJWKPURADFRFRB-UHFFFAOYSA-N 0.000 claims 4
- 108700025694 p53 Genes Proteins 0.000 claims 4
- 101150068332 KIT gene Proteins 0.000 claims 3
- 101150048834 braF gene Proteins 0.000 claims 3
- 101150039808 Egfr gene Proteins 0.000 claims 2
- 206010051066 Gastrointestinal stromal tumour Diseases 0.000 claims 2
- 101000904868 Homo sapiens Transcriptional regulator ATRX Proteins 0.000 claims 2
- 108700021358 erbB-1 Genes Proteins 0.000 claims 2
- 101100455868 Arabidopsis thaliana MKK2 gene Proteins 0.000 claims 1
- 102100032887 Clusterin Human genes 0.000 claims 1
- 108090000197 Clusterin Proteins 0.000 claims 1
- 101100004648 Drosophila melanogaster brat gene Proteins 0.000 claims 1
- 108010086512 Hepatocyte Nuclear Factor 1 Proteins 0.000 claims 1
- 102000006754 Hepatocyte Nuclear Factor 1 Human genes 0.000 claims 1
- 101150097381 Mtor gene Proteins 0.000 claims 1
- 241001529936 Murinae Species 0.000 claims 1
- 108700020796 Oncogene Proteins 0.000 claims 1
- 206010039491 Sarcoma Diseases 0.000 claims 1
- 101100049199 Xenopus laevis vegt-a gene Proteins 0.000 claims 1
- 101150069914 brat gene Proteins 0.000 claims 1
- 230000002055 immunohistochemical effect Effects 0.000 claims 1
- 229950001290 lorlatinib Drugs 0.000 claims 1
- IIXWYSCJSQVBQM-LLVKDONJSA-N lorlatinib Chemical compound N=1N(C)C(C#N)=C2C=1CN(C)C(=O)C1=CC=C(F)C=C1[C@@H](C)OC1=CC2=CN=C1N IIXWYSCJSQVBQM-LLVKDONJSA-N 0.000 claims 1
- FAQDUNYVKQKNLD-UHFFFAOYSA-N olaparib Chemical compound FC1=CC=C(CC2=C3[CH]C=CC=C3C(=O)N=N2)C=C1C(=O)N(CC1)CCN1C(=O)C1CC1 FAQDUNYVKQKNLD-UHFFFAOYSA-N 0.000 claims 1
- 208000003154 papilloma Diseases 0.000 claims 1
- 230000003612 virological effect Effects 0.000 claims 1
- 238000005516 engineering process Methods 0.000 abstract description 45
- APWRZPQBPCAXFP-UHFFFAOYSA-N 1-(1-oxo-2H-isoquinolin-5-yl)-5-(trifluoromethyl)-N-[2-(trifluoromethyl)pyridin-4-yl]pyrazole-4-carboxamide Chemical compound O=C1NC=CC2=C(C=CC=C12)N1N=CC(=C1C(F)(F)F)C(=O)NC1=CC(=NC=C1)C(F)(F)F APWRZPQBPCAXFP-UHFFFAOYSA-N 0.000 description 44
- KVCQTKNUUQOELD-UHFFFAOYSA-N 4-amino-n-[1-(3-chloro-2-fluoroanilino)-6-methylisoquinolin-5-yl]thieno[3,2-d]pyrimidine-7-carboxamide Chemical compound N=1C=CC2=C(NC(=O)C=3C4=NC=NC(N)=C4SC=3)C(C)=CC=C2C=1NC1=CC=CC(Cl)=C1F KVCQTKNUUQOELD-UHFFFAOYSA-N 0.000 description 30
- 206010055113 Breast cancer metastatic Diseases 0.000 description 23
- 230000004083 survival effect Effects 0.000 description 22
- 238000013459 approach Methods 0.000 description 20
- 230000004044 response Effects 0.000 description 19
- 108010078814 Tumor Suppressor Protein p53 Proteins 0.000 description 15
- 230000002611 ovarian Effects 0.000 description 15
- 210000000481 breast Anatomy 0.000 description 14
- 238000013135 deep learning Methods 0.000 description 14
- 238000003205 genotyping method Methods 0.000 description 14
- 238000013526 transfer learning Methods 0.000 description 14
- 238000002493 microarray Methods 0.000 description 11
- 238000002512 chemotherapy Methods 0.000 description 10
- 238000003745 diagnosis Methods 0.000 description 10
- 238000004590 computer program Methods 0.000 description 9
- 238000004458 analytical method Methods 0.000 description 8
- 230000008901 benefit Effects 0.000 description 8
- 230000002950 deficient Effects 0.000 description 8
- 238000010801 machine learning Methods 0.000 description 8
- 108091007743 BRCA1/2 Proteins 0.000 description 7
- 230000000694 effects Effects 0.000 description 7
- 238000003556 assay Methods 0.000 description 6
- 239000013598 vector Substances 0.000 description 6
- 238000007482 whole exome sequencing Methods 0.000 description 6
- KCBWAFJCKVKYHO-UHFFFAOYSA-N 6-(4-cyclopropyl-6-methoxypyrimidin-5-yl)-1-[[4-[1-propan-2-yl-4-(trifluoromethyl)imidazol-2-yl]phenyl]methyl]pyrazolo[3,4-d]pyrimidine Chemical compound C1(CC1)C1=NC=NC(=C1C1=NC=C2C(=N1)N(N=C2)CC1=CC=C(C=C1)C=1N(C=C(N=1)C(F)(F)F)C(C)C)OC KCBWAFJCKVKYHO-UHFFFAOYSA-N 0.000 description 5
- 238000001574 biopsy Methods 0.000 description 5
- 238000013136 deep learning model Methods 0.000 description 5
- 238000002405 diagnostic procedure Methods 0.000 description 5
- 238000009826 distribution Methods 0.000 description 5
- 238000002372 labelling Methods 0.000 description 5
- 238000012545 processing Methods 0.000 description 5
- 238000000926 separation method Methods 0.000 description 5
- 238000010200 validation analysis Methods 0.000 description 5
- 206010069754 Acquired gene mutation Diseases 0.000 description 4
- 230000033616 DNA repair Effects 0.000 description 4
- 238000001712 DNA sequencing Methods 0.000 description 4
- 102000001195 RAD51 Human genes 0.000 description 4
- 210000004602 germ cell Anatomy 0.000 description 4
- 201000008129 pancreatic ductal adenocarcinoma Diseases 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 238000011160 research Methods 0.000 description 4
- 230000028617 response to DNA damage stimulus Effects 0.000 description 4
- XGVXKJKTISMIOW-ZDUSSCGKSA-N simurosertib Chemical compound N1N=CC(C=2SC=3C(=O)NC(=NC=3C=2)[C@H]2N3CCC(CC3)C2)=C1C XGVXKJKTISMIOW-ZDUSSCGKSA-N 0.000 description 4
- 230000037439 somatic mutation Effects 0.000 description 4
- 238000012952 Resampling Methods 0.000 description 3
- 238000004422 calculation algorithm Methods 0.000 description 3
- 230000000295 complement effect Effects 0.000 description 3
- 238000001325 log-rank test Methods 0.000 description 3
- 230000000877 morphologic effect Effects 0.000 description 3
- 230000007170 pathology Effects 0.000 description 3
- 230000037361 pathway Effects 0.000 description 3
- 238000011518 platinum-based chemotherapy Methods 0.000 description 3
- 230000008685 targeting Effects 0.000 description 3
- UKGJZDSUJSPAJL-YPUOHESYSA-N (e)-n-[(1r)-1-[3,5-difluoro-4-(methanesulfonamido)phenyl]ethyl]-3-[2-propyl-6-(trifluoromethyl)pyridin-3-yl]prop-2-enamide Chemical compound CCCC1=NC(C(F)(F)F)=CC=C1\C=C\C(=O)N[C@H](C)C1=CC(F)=C(NS(C)(=O)=O)C(F)=C1 UKGJZDSUJSPAJL-YPUOHESYSA-N 0.000 description 2
- IRPVABHDSJVBNZ-RTHVDDQRSA-N 5-[1-(cyclopropylmethyl)-5-[(1R,5S)-3-(oxetan-3-yl)-3-azabicyclo[3.1.0]hexan-6-yl]pyrazol-3-yl]-3-(trifluoromethyl)pyridin-2-amine Chemical compound C1=C(C(F)(F)F)C(N)=NC=C1C1=NN(CC2CC2)C(C2[C@@H]3CN(C[C@@H]32)C2COC2)=C1 IRPVABHDSJVBNZ-RTHVDDQRSA-N 0.000 description 2
- 230000009471 action Effects 0.000 description 2
- 210000004027 cell Anatomy 0.000 description 2
- 238000012937 correction Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 230000018109 developmental process Effects 0.000 description 2
- 239000000104 diagnostic biomarker Substances 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000014509 gene expression Effects 0.000 description 2
- 231100000518 lethal Toxicity 0.000 description 2
- 230000001665 lethal effect Effects 0.000 description 2
- FDLYAMZZIXQODN-UHFFFAOYSA-N olaparib Chemical compound FC1=CC=C(CC=2C3=CC=CC=C3C(=O)NN=2)C=C1C(=O)N(CC1)CCN1C(=O)C1CC1 FDLYAMZZIXQODN-UHFFFAOYSA-N 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000005192 partition Methods 0.000 description 2
- LZMJNVRJMFMYQS-UHFFFAOYSA-N poseltinib Chemical compound C1CN(C)CCN1C(C=C1)=CC=C1NC1=NC(OC=2C=C(NC(=O)C=C)C=CC=2)=C(OC=C2)C2=N1 LZMJNVRJMFMYQS-UHFFFAOYSA-N 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 238000013515 script Methods 0.000 description 2
- 230000035945 sensitivity Effects 0.000 description 2
- 238000010186 staining Methods 0.000 description 2
- 238000002626 targeted therapy Methods 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 238000012070 whole genome sequencing analysis Methods 0.000 description 2
- VCGRFBXVSFAGGA-UHFFFAOYSA-N (1,1-dioxo-1,4-thiazinan-4-yl)-[6-[[3-(4-fluorophenyl)-5-methyl-1,2-oxazol-4-yl]methoxy]pyridin-3-yl]methanone Chemical compound CC=1ON=C(C=2C=CC(F)=CC=2)C=1COC(N=C1)=CC=C1C(=O)N1CCS(=O)(=O)CC1 VCGRFBXVSFAGGA-UHFFFAOYSA-N 0.000 description 1
- MAYZWDRUFKUGGP-VIFPVBQESA-N (3s)-1-[5-tert-butyl-3-[(1-methyltetrazol-5-yl)methyl]triazolo[4,5-d]pyrimidin-7-yl]pyrrolidin-3-ol Chemical compound CN1N=NN=C1CN1C2=NC(C(C)(C)C)=NC(N3C[C@@H](O)CC3)=C2N=N1 MAYZWDRUFKUGGP-VIFPVBQESA-N 0.000 description 1
- ZGYIXVSQHOKQRZ-COIATFDQSA-N (e)-n-[4-[3-chloro-4-(pyridin-2-ylmethoxy)anilino]-3-cyano-7-[(3s)-oxolan-3-yl]oxyquinolin-6-yl]-4-(dimethylamino)but-2-enamide Chemical compound N#CC1=CN=C2C=C(O[C@@H]3COCC3)C(NC(=O)/C=C/CN(C)C)=CC2=C1NC(C=C1Cl)=CC=C1OCC1=CC=CC=N1 ZGYIXVSQHOKQRZ-COIATFDQSA-N 0.000 description 1
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 1
- ABDDQTDRAHXHOC-QMMMGPOBSA-N 1-[(7s)-5,7-dihydro-4h-thieno[2,3-c]pyran-7-yl]-n-methylmethanamine Chemical compound CNC[C@@H]1OCCC2=C1SC=C2 ABDDQTDRAHXHOC-QMMMGPOBSA-N 0.000 description 1
- KKVYYGGCHJGEFJ-UHFFFAOYSA-N 1-n-(4-chlorophenyl)-6-methyl-5-n-[3-(7h-purin-6-yl)pyridin-2-yl]isoquinoline-1,5-diamine Chemical compound N=1C=CC2=C(NC=3C(=CC=CN=3)C=3C=4N=CNC=4N=CN=3)C(C)=CC=C2C=1NC1=CC=C(Cl)C=C1 KKVYYGGCHJGEFJ-UHFFFAOYSA-N 0.000 description 1
- VZSRBBMJRBPUNF-UHFFFAOYSA-N 2-(2,3-dihydro-1H-inden-2-ylamino)-N-[3-oxo-3-(2,4,6,7-tetrahydrotriazolo[4,5-c]pyridin-5-yl)propyl]pyrimidine-5-carboxamide Chemical compound C1C(CC2=CC=CC=C12)NC1=NC=C(C=N1)C(=O)NCCC(N1CC2=C(CC1)NN=N2)=O VZSRBBMJRBPUNF-UHFFFAOYSA-N 0.000 description 1
- FYELSNVLZVIGTI-UHFFFAOYSA-N 2-[4-[2-(2,3-dihydro-1H-inden-2-ylamino)pyrimidin-5-yl]-5-ethylpyrazol-1-yl]-1-(2,4,6,7-tetrahydrotriazolo[4,5-c]pyridin-5-yl)ethanone Chemical compound C1C(CC2=CC=CC=C12)NC1=NC=C(C=N1)C=1C=NN(C=1CC)CC(=O)N1CC2=C(CC1)NN=N2 FYELSNVLZVIGTI-UHFFFAOYSA-N 0.000 description 1
- BYHQTRFJOGIQAO-GOSISDBHSA-N 3-(4-bromophenyl)-8-[(2R)-2-hydroxypropyl]-1-[(3-methoxyphenyl)methyl]-1,3,8-triazaspiro[4.5]decan-2-one Chemical compound C[C@H](CN1CCC2(CC1)CN(C(=O)N2CC3=CC(=CC=C3)OC)C4=CC=C(C=C4)Br)O BYHQTRFJOGIQAO-GOSISDBHSA-N 0.000 description 1
- YGYGASJNJTYNOL-CQSZACIVSA-N 3-[(4r)-2,2-dimethyl-1,1-dioxothian-4-yl]-5-(4-fluorophenyl)-1h-indole-7-carboxamide Chemical compound C1CS(=O)(=O)C(C)(C)C[C@@H]1C1=CNC2=C(C(N)=O)C=C(C=3C=CC(F)=CC=3)C=C12 YGYGASJNJTYNOL-CQSZACIVSA-N 0.000 description 1
- WNEODWDFDXWOLU-QHCPKHFHSA-N 3-[3-(hydroxymethyl)-4-[1-methyl-5-[[5-[(2s)-2-methyl-4-(oxetan-3-yl)piperazin-1-yl]pyridin-2-yl]amino]-6-oxopyridin-3-yl]pyridin-2-yl]-7,7-dimethyl-1,2,6,8-tetrahydrocyclopenta[3,4]pyrrolo[3,5-b]pyrazin-4-one Chemical compound C([C@@H](N(CC1)C=2C=NC(NC=3C(N(C)C=C(C=3)C=3C(=C(N4C(C5=CC=6CC(C)(C)CC=6N5CC4)=O)N=CC=3)CO)=O)=CC=2)C)N1C1COC1 WNEODWDFDXWOLU-QHCPKHFHSA-N 0.000 description 1
- SRVXSISGYBMIHR-UHFFFAOYSA-N 3-[3-[3-(2-amino-2-oxoethyl)phenyl]-5-chlorophenyl]-3-(5-methyl-1,3-thiazol-2-yl)propanoic acid Chemical compound S1C(C)=CN=C1C(CC(O)=O)C1=CC(Cl)=CC(C=2C=C(CC(N)=O)C=CC=2)=C1 SRVXSISGYBMIHR-UHFFFAOYSA-N 0.000 description 1
- VJPPLCNBDLZIFG-ZDUSSCGKSA-N 4-[(3S)-3-(but-2-ynoylamino)piperidin-1-yl]-5-fluoro-2,3-dimethyl-1H-indole-7-carboxamide Chemical compound C(C#CC)(=O)N[C@@H]1CN(CCC1)C1=C2C(=C(NC2=C(C=C1F)C(=O)N)C)C VJPPLCNBDLZIFG-ZDUSSCGKSA-N 0.000 description 1
- YFCIFWOJYYFDQP-PTWZRHHISA-N 4-[3-amino-6-[(1S,3S,4S)-3-fluoro-4-hydroxycyclohexyl]pyrazin-2-yl]-N-[(1S)-1-(3-bromo-5-fluorophenyl)-2-(methylamino)ethyl]-2-fluorobenzamide Chemical compound CNC[C@@H](NC(=O)c1ccc(cc1F)-c1nc(cnc1N)[C@H]1CC[C@H](O)[C@@H](F)C1)c1cc(F)cc(Br)c1 YFCIFWOJYYFDQP-PTWZRHHISA-N 0.000 description 1
- 101150052384 50 gene Proteins 0.000 description 1
- CYJRNFFLTBEQSQ-UHFFFAOYSA-N 8-(3-methyl-1-benzothiophen-5-yl)-N-(4-methylsulfonylpyridin-3-yl)quinoxalin-6-amine Chemical compound CS(=O)(=O)C1=C(C=NC=C1)NC=1C=C2N=CC=NC2=C(C=1)C=1C=CC2=C(C(=CS2)C)C=1 CYJRNFFLTBEQSQ-UHFFFAOYSA-N 0.000 description 1
- ZRPZPNYZFSJUPA-UHFFFAOYSA-N ARS-1620 Chemical compound Oc1cccc(F)c1-c1c(Cl)cc2c(ncnc2c1F)N1CCN(CC1)C(=O)C=C ZRPZPNYZFSJUPA-UHFFFAOYSA-N 0.000 description 1
- 102100034580 AT-rich interactive domain-containing protein 1A Human genes 0.000 description 1
- ORILYTVJVMAKLC-UHFFFAOYSA-N Adamantane Natural products C1C(C2)CC3CC1CC2C3 ORILYTVJVMAKLC-UHFFFAOYSA-N 0.000 description 1
- 101100002344 Caenorhabditis elegans arid-1 gene Proteins 0.000 description 1
- 102000005381 Cytidine Deaminase Human genes 0.000 description 1
- 108010031325 Cytidine deaminase Proteins 0.000 description 1
- 230000005778 DNA damage Effects 0.000 description 1
- 231100000277 DNA damage Toxicity 0.000 description 1
- GISRWBROCYNDME-PELMWDNLSA-N F[C@H]1[C@H]([C@H](NC1=O)COC1=NC=CC2=CC(=C(C=C12)OC)C(=O)N)C Chemical compound F[C@H]1[C@H]([C@H](NC1=O)COC1=NC=CC2=CC(=C(C=C12)OC)C(=O)N)C GISRWBROCYNDME-PELMWDNLSA-N 0.000 description 1
- 240000008168 Ficus benjamina Species 0.000 description 1
- 101000924266 Homo sapiens AT-rich interactive domain-containing protein 1A Proteins 0.000 description 1
- 239000002118 L01XE12 - Vandetanib Substances 0.000 description 1
- 208000032818 Microsatellite Instability Diseases 0.000 description 1
- 108010074346 Mismatch Repair Endonuclease PMS2 Proteins 0.000 description 1
- 102100037480 Mismatch repair endonuclease PMS2 Human genes 0.000 description 1
- 101100381978 Mus musculus Braf gene Proteins 0.000 description 1
- AYCPARAPKDAOEN-LJQANCHMSA-N N-[(1S)-2-(dimethylamino)-1-phenylethyl]-6,6-dimethyl-3-[(2-methyl-4-thieno[3,2-d]pyrimidinyl)amino]-1,4-dihydropyrrolo[3,4-c]pyrazole-5-carboxamide Chemical compound C1([C@H](NC(=O)N2C(C=3NN=C(NC=4C=5SC=CC=5N=C(C)N=4)C=3C2)(C)C)CN(C)C)=CC=CC=C1 AYCPARAPKDAOEN-LJQANCHMSA-N 0.000 description 1
- AFCARXCZXQIEQB-UHFFFAOYSA-N N-[3-oxo-3-(2,4,6,7-tetrahydrotriazolo[4,5-c]pyridin-5-yl)propyl]-2-[[3-(trifluoromethoxy)phenyl]methylamino]pyrimidine-5-carboxamide Chemical compound O=C(CCNC(=O)C=1C=NC(=NC=1)NCC1=CC(=CC=C1)OC(F)(F)F)N1CC2=C(CC1)NN=N2 AFCARXCZXQIEQB-UHFFFAOYSA-N 0.000 description 1
- IDRGFNPZDVBSSE-UHFFFAOYSA-N OCCN1CCN(CC1)c1ccc(Nc2ncc3cccc(-c4cccc(NC(=O)C=C)c4)c3n2)c(F)c1F Chemical compound OCCN1CCN(CC1)c1ccc(Nc2ncc3cccc(-c4cccc(NC(=O)C=C)c4)c3n2)c(F)c1F IDRGFNPZDVBSSE-UHFFFAOYSA-N 0.000 description 1
- 238000003559 RNA-seq method Methods 0.000 description 1
- 229940124653 Talzenna Drugs 0.000 description 1
- 208000003721 Triple Negative Breast Neoplasms Diseases 0.000 description 1
- 108700025716 Tumor Suppressor Genes Proteins 0.000 description 1
- 102000044209 Tumor Suppressor Genes Human genes 0.000 description 1
- LXRZVMYMQHNYJB-UNXOBOICSA-N [(1R,2S,4R)-4-[[5-[4-[(1R)-7-chloro-1,2,3,4-tetrahydroisoquinolin-1-yl]-5-methylthiophene-2-carbonyl]pyrimidin-4-yl]amino]-2-hydroxycyclopentyl]methyl sulfamate Chemical compound CC1=C(C=C(S1)C(=O)C1=C(N[C@H]2C[C@H](O)[C@@H](COS(N)(=O)=O)C2)N=CN=C1)[C@@H]1NCCC2=C1C=C(Cl)C=C2 LXRZVMYMQHNYJB-UNXOBOICSA-N 0.000 description 1
- 239000008186 active pharmaceutical agent Substances 0.000 description 1
- 230000006907 apoptotic process Effects 0.000 description 1
- 229960003852 atezolizumab Drugs 0.000 description 1
- 229950002916 avelumab Drugs 0.000 description 1
- 230000004611 cancer cell death Effects 0.000 description 1
- 239000012830 cancer therapeutic Substances 0.000 description 1
- 230000001413 cellular effect Effects 0.000 description 1
- 229940121420 cemiplimab Drugs 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000002939 deleterious effect Effects 0.000 description 1
- 201000010099 disease Diseases 0.000 description 1
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 1
- 229940121432 dostarlimab Drugs 0.000 description 1
- 230000005782 double-strand break Effects 0.000 description 1
- 229950009791 durvalumab Drugs 0.000 description 1
- 230000008482 dysregulation Effects 0.000 description 1
- 229940056913 eftilagimod alfa Drugs 0.000 description 1
- 230000001973 epigenetic effect Effects 0.000 description 1
- 230000004049 epigenetic modification Effects 0.000 description 1
- 238000004880 explosion Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 238000007429 general method Methods 0.000 description 1
- 238000012268 genome sequencing Methods 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 238000007490 hematoxylin and eosin (H&E) staining Methods 0.000 description 1
- 238000009396 hybridization Methods 0.000 description 1
- 230000002779 inactivation Effects 0.000 description 1
- 230000005764 inhibitory process Effects 0.000 description 1
- 238000011835 investigation Methods 0.000 description 1
- 229960005386 ipilimumab Drugs 0.000 description 1
- 238000004619 light microscopy Methods 0.000 description 1
- 208000026534 luminal B breast carcinoma Diseases 0.000 description 1
- 229940100352 lynparza Drugs 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000001394 metastastic effect Effects 0.000 description 1
- 206010061289 metastatic neoplasm Diseases 0.000 description 1
- 238000001000 micrograph Methods 0.000 description 1
- 230000003278 mimic effect Effects 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000009456 molecular mechanism Effects 0.000 description 1
- 229960003301 nivolumab Drugs 0.000 description 1
- 238000011275 oncology therapy Methods 0.000 description 1
- 230000001717 pathogenic effect Effects 0.000 description 1
- 230000007918 pathogenicity Effects 0.000 description 1
- 230000001575 pathological effect Effects 0.000 description 1
- 229960002621 pembrolizumab Drugs 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 238000011338 personalized therapy Methods 0.000 description 1
- 229950010773 pidilizumab Drugs 0.000 description 1
- 230000002250 progressing effect Effects 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 102000004169 proteins and genes Human genes 0.000 description 1
- 238000012887 quadratic function Methods 0.000 description 1
- 238000003908 quality control method Methods 0.000 description 1
- 229940121484 relatlimab Drugs 0.000 description 1
- 230000008439 repair process Effects 0.000 description 1
- INBJJAFXHQQSRW-STOWLHSFSA-N rucaparib camsylate Chemical compound CC1(C)[C@@H]2CC[C@@]1(CS(O)(=O)=O)C(=O)C2.CNCc1ccc(cc1)-c1[nH]c2cc(F)cc3C(=O)NCCc1c23 INBJJAFXHQQSRW-STOWLHSFSA-N 0.000 description 1
- 238000012216 screening Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 238000011301 standard therapy Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 239000000758 substrate Substances 0.000 description 1
- 229940061918 tebotelimab Drugs 0.000 description 1
- 238000011285 therapeutic regimen Methods 0.000 description 1
- 239000012859 tissue stain Substances 0.000 description 1
- 230000001052 transient effect Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 229950007217 tremelimumab Drugs 0.000 description 1
- 208000022679 triple-negative breast carcinoma Diseases 0.000 description 1
- 229960000241 vandetanib Drugs 0.000 description 1
- UHTHHESEBZOYNR-UHFFFAOYSA-N vandetanib Chemical compound COC1=CC(C(/N=CN2)=N/C=3C(=CC(Br)=CC=3)F)=C2C=C1OCC1CCN(C)CC1 UHTHHESEBZOYNR-UHFFFAOYSA-N 0.000 description 1
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/69—Microscopic objects, e.g. biological cells or cellular parts
- G06V20/698—Matching; Classification
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/40—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H30/00—ICT specially adapted for the handling or processing of medical images
- G16H30/40—ICT specially adapted for the handling or processing of medical images for processing medical images, e.g. editing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10056—Microscopic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30024—Cell structures in vitro; Tissue sections in vitro
Definitions
- the disclosed technology relates to systems and methods for detecting clinically actionable biomarkers.
- a method of determining the presence of a biomarker in a biological sample includes obtaining a section of a biological sample, wherein the section of the biological sample has been treated with a stain, imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution to generate a first and second plurality of image data, reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, and providing the first and the second plurality of image data to a trained predictive neural network and determining the presence of a biomarker in the biological sample as an output of the trained predictive neural network.
- a method of generating a trained predictive model configured to determine a presence of a biomarker in a biological sample includes generating stained sections of one or more biological samples and corresponding biomarker labels, imaging one or more regions of the stained sections of the one or more biological samples at a first resolution and a second resolution to generate a first and second plurality of image data, reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, and generating a trained predictive model, wherein the trained predictive model comprises a first predictive model trained with the reduced first plurality of image data and corresponding biomarker labels, and a second predictive model trained with the reduced second plurality of image data and corresponding biomarker labels.
- a method of determining the presence of a biomarker in a biological sample includes obtaining a stained section of the biological sample, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, and providing the plurality of images of the stained section as an input to a trained predictive model and determining the presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing.
- a treatment method for treating cancer in a subject in need thereof includes obtaining a stained section of a biological sample, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, providing the plurality of images of the stained section to a trained predictive model and determining a presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing, and administering treatment to the patient based on the presence of the biomarker.
- FIGS. 1A-1B show an example of multi-resolution convolutional neural network architecture to detect molecular biomarkers from histopathological tissue slides based on some implementations of the disclosed technology.
- FIGS. 2A-2D show a neural network for detecting homologous recombination deficiency and predicting response to treatment in primary and metastatic breast cancer.
- FIGS. 3A-3C show a transfer learning in ovarian cancer for predicting response to platinum treatment.
- FIGS. 4A-4C show workflow for training neural network models independently for digitalized flash frozen and FFPE breast cancer slides.
- FIGS. 5A-5B show workflow for testing the performance of neural network (e.g., DeepHRD) models for digitalized flash frozen and FFPE breast cancer slides.
- FIG. 6 shows an example method of determining the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
- FIG. 7 shows an example method of generating a trained predictive model configured to determine a presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
- FIG. 8 shows another example method of determining the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
- FIG. 9 shows a treatment method for treating cancer in a subject in need thereof based on some implementations of the disclosed technology.
- FIG. 10 shows an example of a computer system configured to determine the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
- the disclosed technology can be implemented in some embodiments to provide an artificial intelligence (AI) architecture platform for predicting cancer biomarkers, and provide therapeutic methods based on the biomarkers identified by the AI architecture platform.
- the disclosed technology can be implemented in some embodiments to provide methods and systems for detecting clinically actionable cancer biomarkers and mutational signatures directly from digital hematoxylin and eosin (H&E) slides without sequencing.
- H&E digital hematoxylin and eosin
- the disclosed technology can also be implemented in some embodiments to provide a novel deep learning architecture that, with little to no customization, can be trained to predict clinically actionable molecular cancer biomarkers directly from digital images based on scans of slides stained using hematoxylin and eosin (H&E).
- H&E hematoxylin and eosin
- the invention allows skipping DNA sequencing and provides direct prediction of these biomarkers from the scanned slides.
- the machine learning method aggregates these predictions to locate regions of interest and makes a final actionable status prediction for each patient. Further, this model introduces a multi-resolution approach that captures morphological patterns at varying zoom magnifications and implements Monte Carlo dropout to provide confidence metrics that refine the final predictive value. Regions of interest are selected using an unsupervised machine learning module composed of dimensionality reduction using principal component analysis and a custom k-means clustering algorithm applied to the extracted feature vectors for each component of the grid space. From an application perspective, the method requires training before it can be applied to a particular clinical biomarker. Specifically, the approach requires at least a thousand patients with digital H&E slides and known molecular biomarkers to generate a cancer-specific/biomarker-specific prediction model. After the model is trained, it can be applied to an individual patient to predict whether that patient's cancer has the biomarker.
- Hematoxylin and eosin (H&E) stain is one of the principal tissue stains used in histology. H&E slides are routinely and universally generated by pathologists for cancer diagnosis. However, in most cases, these slides do not allow pathologists to detect clinically actionable molecular biomarkers and do not provide guidance for personalized therapy. As such, in the majority of cases, cancer samples are subsequently sent for DNA and/or RNA sequencing to detect individual biomarkers and/or sets of biomarkers.
- the novel deep learning architecture implemented based on some embodiments of the disclosed technology allows training an AI model that can directly predict biomarkers which are incorporated in therapeutic regimens, thereby improving patient response to therapy and survival after patients are treated with therapy targeting the detected biomarker.
- the methods provided herein allow for identification of biomarkers directly from the digital H&E slides (see FIGS. 4A-4C), thus skipping the need for shipping and sequencing of bio-specimens by external providers. For example, instead of sending a biospecimen over the mail to an external CLIA lab and waiting 14 days for results from sequencing, the AI approach allows directly detecting these biomarkers in the digital slides within a fraction of a second.
- the disclosed technology can also be implemented in some embodiments to provide a novel deep learning architecture that, with little to no customization, can be trained to predict clinically actionable and/or epidemiologically relevant molecular signatures directly from digital images based on scans of slides stained using hematoxylin and eosin (H&E).
- the invention allows skipping DNA sequencing and provides direct prediction of these signatures from the digital images of the scanned slides.
- Previous methods to detect clinically actionable biomarkers for personalized cancer treatment or epidemiologically relevant biomarkers for large-scale genetic epidemiological studies relied almost exclusively on DNA sequencing or genotyping platforms (e.g., microarrays, targeted panel sequencing, whole-exome sequencing, and/or whole-genome sequencing).
- the developed convolutional neural network architecture completely avoids traditional sequencing approaches by accurately evaluating the presence or absence of molecular signatures using digital images from hematoxylin and eosin-stained histology images sampled from individual patients.
- the developed architecture requires at least 1,000 digital images of whole slides for training a model for a specific molecular signature in a particular cancer type. Nevertheless, after a model is trained, accurate predictions can be made for a digital image from a single cancer patient.
- the developed architecture utilizes semi-supervised convolutional neural networks to make segmented predictions within a grid space across whole-slide H&E images composed of three-color channels.
- the overall architecture is representative of a multi-resolution model that captures morphological patterns at two zoom magnifications with each magnification reflecting a separate convolutional neural network (CNN).
- CNN convolutional neural network
- the model performs initial predictions at a lower resolution (i.e., generally at 5x magnification) and automatically localizes regions of interest by identifying those with the highest predictive power. Subsequently, the model performs a secondary prediction at a higher resolution scale across the selected regions of interest (i.e., usually at 20x magnification).
- the resulting predictions are used in a final module that aggregates the scores across the multiresolution model to provide a final prediction for a given molecular signature in a specific cancer type.
- the architecture's little customization is generally related to adjusting one of the two zoom levels (e.g., using 25x magnification to generate a better model for a particular molecular signature).
- the developed convolutional neural network uses a convolutional neural network architecture (e.g., ResNet) as its foundation with several important and significant modifications.
- the newly developed convolutional neural network is trained using a binary cross entropy loss function based on the most predictive tile derived from each sample at a given magnification level. For example, a digital image of a whole slide is tiled at 5x magnification and all sub-tiles are evaluated through an inference stage; the sub-tile(s) with the highest predicted probability for each whole slide are used in a single training pass of the model. This process is repeated for each epoch throughout training of the model. Initially, this method aggregates the segmented predictions at a lower magnification to locate regions of interest.
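- As a minimal sketch of the training signal described above, assuming tile probabilities have already been produced by an inference pass: the most predictive tile of each slide is selected and scored against the slide-level label with binary cross entropy. The function names and the clamping constant `eps` are illustrative assumptions, not part of the disclosure.

```python
import math

def top_tile(tile_probs):
    """Return (index, probability) of the most predictive tile of one slide."""
    idx = max(range(len(tile_probs)), key=lambda i: tile_probs[i])
    return idx, tile_probs[idx]

def binary_cross_entropy(prob, label, eps=1e-7):
    """Binary cross entropy between a probability and a 0/1 slide-level label."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def epoch_loss(slides):
    """slides: list of (tile_probs, label). Mean loss over each slide's top tile."""
    losses = []
    for tile_probs, label in slides:
        _, prob = top_tile(tile_probs)
        losses.append(binary_cross_entropy(prob, label))
    return sum(losses) / len(losses)
```

In an actual training loop the selected tiles would be fed back through the network and the loss backpropagated; the sketch only shows how the per-slide loss value is formed.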
- the automatic selection of the regions of interest is performed using an unsupervised machine learning module.
- the unsupervised machine learning module encompasses a dimension reduction algorithm based on principal component analysis of the extracted feature vectors for each component of the grid space, where the feature vectors are collected from the penultimate layer of the trained CNN and, subsequently, they are reduced to the two principal components contributing to the greatest variance across the collection of vectors.
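- The reduction to two principal components can be illustrated with a small, dependency-free sketch. It assumes the feature vectors are given as lists of floats and uses power iteration with deflation in place of a library eigensolver; the function name and iteration count are hypothetical choices.

```python
import math

def pca_two_components(vectors, iters=200):
    """Project feature vectors onto their two leading principal components."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - means[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _component in range(2):
        # Power iteration from a fixed, normalized, non-uniform start vector.
        start = [float(j + 1) for j in range(d)]
        start_norm = math.sqrt(sum(x * x for x in start))
        vec = [x / start_norm for x in start]
        for _step in range(iters):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:  # no variance left in this direction
                break
            vec = [x / norm for x in nxt]
        eigval = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                     for i in range(d))
        components.append(vec)
        # Deflate: remove the variance explained by this component.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= eigval * vec[i] * vec[j]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return projected, components
```

The sign of each component is arbitrary, so downstream steps should rely only on distances between projected points.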
- a custom k-means clustering module determines the optimal number of clusters per sample by selecting the solution with maximum silhouette coefficients across all utilized iterations.
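- A hedged sketch of choosing the number of clusters by silhouette follows. It assumes the criterion is the mean silhouette coefficient over all tiles, a candidate range of 2 to 5 clusters, and a deterministic farthest-point initialization; none of these details are stated in the disclosure, and the custom module may differ.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's k-means with farthest-point initialization (deterministic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centers[c])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette_values(points, labels):
    """Per-point silhouette coefficient (b - a) / max(a, b)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    values = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1 or len(clusters) < 2:
            values.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

def best_clustering(points, k_values=range(2, 6)):
    """Return (mean silhouette, k, labels) for the best-scoring k."""
    best = None
    for k in k_values:
        if k >= len(points):
            break
        labels = kmeans(points, k)
        score = sum(silhouette_values(points, labels)) / len(points)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best
```

Here `points` stands for the two-dimensional tile coordinates obtained after principal component analysis, given as tuples.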
- the final regions of interest are chosen using the cluster encompassing the tile with the highest predictive value and including all other instances with silhouette coefficients within the top 50th percentile of the selected cluster.
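- The selection rule above can be sketched as follows, assuming that per-tile prediction scores, cluster labels, and silhouette coefficients are already available, and interpreting "top 50th percentile" as "at or above the median silhouette of the selected cluster". That interpretation and the function name are assumptions.

```python
import statistics

def select_regions_of_interest(tile_scores, labels, silhouettes):
    """Return sorted indices of the tiles kept as regions of interest."""
    # Tile with the highest predictive value.
    top = max(range(len(tile_scores)), key=lambda i: tile_scores[i])
    # All tiles that share a cluster with the top tile.
    members = [i for i, lab in enumerate(labels) if lab == labels[top]]
    # Keep cluster members whose silhouette is at or above the cluster median.
    cutoff = statistics.median(silhouettes[i] for i in members)
    keep = {i for i in members if silhouettes[i] >= cutoff}
    keep.add(top)  # the top tile itself is always retained
    return sorted(keep)
```

For example, with scores `[0.2, 0.9, 0.4, 0.8, 0.1]`, labels `[0, 1, 1, 1, 1]`, and silhouettes `[0.5, 0.1, 0.7, 0.6, 0.2]`, tiles 1, 2, and 3 are retained: tile 1 because it is the top tile, tiles 2 and 3 because their silhouettes exceed the cluster median of 0.4.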
- the complete set of selected tiles is subsequently used for training the second CNN model, where the CNN has an enhanced resolution (usually of 20x).
- the second enhanced resolution CNN is based on an identical architecture as the first CNN and it is trained solely on the regions of interest chosen in the first stage of the model. Each region of interest is resampled from the original whole-slide image at an increased magnification to capture more details at the cellular level.
- the proposed models were generally trained and tested after resampling at 20x magnification; however, the model can be used with any zoom preference.
- the tile with the highest predictive value is used to make the final prediction for a particular molecular signature across all regions of interest in a given sample. This enhanced predictive score is averaged with the predicted score from the first model to arrive at a final actionable status prediction for each patient.
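- A minimal sketch of this aggregation step, assuming the first model yields one slide-level score and the second model yields one score per region of interest, all on the same 0 to 1 scale:

```python
def final_score(low_res_score, roi_tile_scores):
    """Average the low-resolution score with the best high-resolution tile score."""
    if not roi_tile_scores:
        raise ValueError("at least one region of interest is required")
    high_res_score = max(roi_tile_scores)  # most predictive high-resolution tile
    return (low_res_score + high_res_score) / 2.0
```

With a low-resolution score of 0.6 and region-of-interest scores of 0.2, 0.8, and 0.5, the final score is the mean of 0.6 and 0.8.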
- Each pass of the model presents a single prediction score, and the resulting distribution of scores across all iterations for a single patient can be analyzed to determine the level of certainty with the final prediction. For example, a confident prediction is one with a low variance from the average predicted score, while an uncertain prediction will tend to have a high variance from the average predicted score.
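- The certainty analysis can be illustrated with a short sketch that summarizes the scores from repeated stochastic passes. The empirical percentile interval used here is an assumption; the disclosure does not specify how the interval is derived.

```python
import math
import statistics

def summarize_passes(scores, level=0.95):
    """Mean, variance, and an empirical interval over repeated prediction scores."""
    ordered = sorted(scores)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[math.floor(tail * (n - 1))]
    upper = ordered[math.ceil((1.0 - tail) * (n - 1))]
    return {
        "mean": statistics.fmean(ordered),
        "variance": statistics.pvariance(ordered),
        "interval": (lower, upper),
    }
```

A set of passes such as `[0.80, 0.81, 0.79, 0.80, 0.80]` gives a small variance (a confident prediction), while `[0.2, 0.9, 0.5, 0.7, 0.3]` gives a large one (an uncertain prediction).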
- When applied to a single image of a whole slide for an individual patient, the developed architecture will provide a normalized prediction score between 1 (low) and 100 (high) and a confidence interval for the score.
- FIGS. 1A-1B show an example of multi-resolution convolutional neural network architecture to detect molecular biomarkers from histopathological tissue slides based on some implementations of the disclosed technology.
- the multi-resolution convolutional neural network architecture implemented based on some embodiments can detect homologous recombination deficiency from histopathological tissue slides.
- FIG. 1A shows training of a neural network (e.g., DeepHRD) model for detecting homologous recombination deficiency (HRD) from whole-slide images (WSIs). For each WSI, a single prediction score is estimated based on the detection of HRD.
- HRD homologous recombination deficiency
- WSIs whole-slide images
- each WSI undergoes preprocessing and quality control.
- This module consists of tissue segmentation, filtering for nonfocused tissue, and final tiling of regions that contain tissue at 5x magnification.
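- The final tiling step can be sketched as below, assuming tissue segmentation has already produced a binary mask (1 for tissue, 0 for background) at the working magnification. The focus filter is omitted, and the minimum tissue fraction is a hypothetical parameter.

```python
def tissue_tiles(mask, tile_size, min_tissue_fraction=0.5):
    """Return (top, left) corners of non-overlapping tiles that contain tissue."""
    rows, cols = len(mask), len(mask[0])
    tiles = []
    for top in range(0, rows - tile_size + 1, tile_size):
        for left in range(0, cols - tile_size + 1, tile_size):
            tissue = sum(mask[r][c]
                         for r in range(top, top + tile_size)
                         for c in range(left, left + tile_size))
            if tissue / (tile_size * tile_size) >= min_tissue_fraction:
                tiles.append((top, left))
    return tiles
```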
- all tiles for a single image are processed through the first multiple instance learning (MIL) ResNet18 convolutional neural network.
- MIL multiple instance learning
- the ResNet18 convolutional neural network uses the average of the top 25 predicted tile scores as the WSI predicted score. Dropout is incorporated into the fully connected layers in the feature extraction module to reduce overfitting during training. The same dropout technique is also incorporated during inference to simulate Monte Carlo dropout used to calculate confidence intervals in the final WSI prediction.
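- The slide-level score described above reduces to a top-n average. A minimal sketch, assuming tile scores are plain floats and that slides with fewer than 25 tiles simply average all available tiles (the latter is an assumption):

```python
def slide_score(tile_scores, top_n=25):
    """Average of the top_n highest tile scores for one whole-slide image."""
    if not tile_scores:
        raise ValueError("a slide needs at least one tile score")
    top = sorted(tile_scores, reverse=True)[:top_n]
    return sum(top) / len(top)
```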
- the tile feature vectors from the penultimate layer of the feature extraction are used to automatically select regions of interest (ROI) from the original WSI for additional assessment.
- the feature vectors are reduced in dimensions using principal component analysis and a custom k-means clustering module is used to determine the optimal number of clusters per sample.
- the selected tiles are then resampled at a 20x magnification.
- these sets of tiles are used to train a second MIL-ResNet18 model using an identical architecture to the one previously used in 102.
- the average predictions across both models are aggregated for a single WSI. The resulting distribution of scores is used to calculate confidence intervals and establish a threshold of confidence for a final prediction.
- FIG. 1B shows a trained neural network (e.g., DeepHRD) model for HRD prediction from a single whole-slide image.
- the trained neural network (e.g., DeepHRD) model produces a final prediction score for individual patient biopsies, with a computational-based diagnosis for subsequent clinical action.
- FIGS. 2A-2D show a neural network (e.g., DeepHRD) for detecting homologous recombination deficiency and predicting response to treatment in primary and metastatic breast cancer.
- FIG. 2A shows the receiver operating characteristic curves (ROCs) for classifying homologous recombination deficiency (HRD) in the TCGA held-out set (202) and the independent set (204) of primary breast cancers, encompassing the independent CPTAC and METABRIC primary breast cancer cohorts.
- FIG. 2B shows representative TCGA tissue slides for both HRD and homologous recombination proficient (HRP) samples across multiple breast cancer subtypes, along with the resulting predictions for each segmented tile at 5x and 20x resolutions.
- FIG. 2C shows ROCs for formalin-fixed paraffin-embedded (FFPE) diagnostic model in the TCGA held-out set (212) and for classifying metastatic breast cancer (MBC) patients who are complete responders to platinum therapy.
- FIG. 2D shows Kaplan-Meier survival curves for MBC patients treated with platinum chemotherapy separated by DeepHRD model predictions (220), BRCA1/2 mutation status (230), and SBS3 activity as predicted by SigMA (240).
- Q-values are corrected after considering breast cancer subtype, age at diagnosis, and the standard-of-care binary HRD classification score >42 (i.e., HRD score).
- The log10-transformed hazard ratios from Cox regression are shown with their 95% confidence intervals (bottom of 220, 230, 240).
- Q-values less than or equal to 0.05 are annotated with * while q-values above 0.05 are annotated with n.s. (i.e., non-significant).
- FIGS. 3A-3C show a transfer learning (e.g., DeepHRD transfer learning) in ovarian cancer for predicting response to platinum treatment.
- FIG. 3A shows a schematic demonstrating the transfer learning method to train an ovarian homologous recombination deficiency (HRD) model from whole-slide H&E images (WSIs) using a pretrained breast DeepHRD model. The pretrained flash-frozen breast model is used to initialize the weights and biases of all parameters in the ovarian model.
- HRD-scores are calculated from SNP6 genotyping microarray by deriving loss of heterozygosity (LOH), large-scale transitions (LST), and telomeric allelic imbalance (TAI).
- LOH loss of heterozygosity
- LST large-scale transitions
- TAI telomeric allelic imbalance
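- As a hedged illustration of the genomic HRD score used as the comparison label, the sketch below assumes the score is the unweighted sum of the LOH, LST, and TAI counts, which is the common convention for this kind of genomic scar score; the disclosure only states that the three components are derived. The cutoffs of 42 (breast) and 63 (ovarian) are the binary classification thresholds quoted in the figure descriptions.

```python
def hrd_score(loh, lst, tai):
    """Combined HRD score, assumed here to be the unweighted sum of the three counts."""
    return loh + lst + tai

def is_hrd(score, cutoff=42):
    """Binary HRD call: True when the score is strictly above the cutoff."""
    return score > cutoff
```

For instance, component counts of 15, 20, and 10 sum to 45, which is above the breast cutoff of 42 but below the ovarian cutoff of 63.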
- FIG. 3B shows Kaplan-Meier survival curves comparing the outcomes of patients treated with platinum chemotherapy split by the prediction of the DeepHRD transfer learning model.
- FIG. 3C shows Kaplan-Meier survival curves comparing the outcomes of platinum-treated patients split by the base model predictions with no transfer learning applied (310), BRCA1/2 mutation status (320), and SBS3 activity as predicted by SigMA (330).
- Q-values are corrected after considering ovarian cancer stage, age at diagnosis, and the standard-of-care binary HRD classification score >63 (i.e., HRD score).
- Cox regression showingthe loglO-transformed hazard ratios are shown with their 95% confidence intervals (bottom of 310, 320, 330).
- Q-values less than or equal 0.05 are annotated with * while q-values above 0.05 are annotated with n.s. (i.e., non-significant).
- FIGS. 4A-4C show a workflow for training neural network (e.g., DeepHRD) models independently for digitalized flash frozen and FFPE breast cancer slides.
- FIG. 4A shows that, prior to training, the number of HRD and HRP samples within each breast cancer subtype was balanced using all available PAM50 annotations.
- FIGS. 4B and 4C show that the collections of flash frozen (FIG. 4B) and formalin-fixed paraffin-embedded (FFPE) (FIG. 4C) slides for the TCGA breast cancer cohort were used to train two independent DeepHRD models. Prior to training, the number of HRD and HRP samples was balanced within each breast cancer subtype. All downsampled individuals were added to the internal held-out test set. The validation sets were used to optimize the classification thresholds.
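The subtype balancing described for FIGS. 4A-4C can be sketched as follows. This is a minimal illustration, assuming random downsampling of the larger class within each subtype; the function name `balance_by_subtype`, the tuple layout, and the fixed seed are hypothetical and not taken from the source.

```python
import random
from collections import defaultdict

def balance_by_subtype(samples, seed=0):
    """Downsample the larger class (HRD or HRP) within each subtype.

    samples: list of (sample_id, subtype, label), label is "HRD" or "HRP".
    Returns (balanced, held_out). held_out contains the downsampled samples,
    which the text says are added to the internal held-out test set.
    Random selection is an assumption; the source does not state how
    samples were chosen for removal.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {"HRD": [], "HRP": []})
    for sample_id, subtype, label in samples:
        groups[subtype][label].append(sample_id)

    balanced, held_out = [], []
    for subtype in sorted(groups):
        by_label = groups[subtype]
        n_keep = min(len(by_label["HRD"]), len(by_label["HRP"]))
        for label in ("HRD", "HRP"):
            ids = sorted(by_label[label])
            rng.shuffle(ids)
            balanced += [(i, subtype, label) for i in ids[:n_keep]]
            held_out += [(i, subtype, label) for i in ids[n_keep:]]
    return balanced, held_out
```

Balancing per subtype, rather than over the whole cohort, keeps the class ratio equal inside every subtype so that subtype histology alone cannot separate the two labels.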
- FFPE formalin-fixed paraffin-embedded
- FIGS. 5A-5B show a workflow for testing the performance of neural network (e.g., DeepHRD) models for digitalized flash frozen and FFPE breast cancer slides.
- FIG. 5A shows that the collection of breast cancers from CPTAC and METABRIC was used to independently validate the flash frozen breast cancer model. The DeepHRD prediction scores were averaged for samples with multiple images.
- FIG. 5B shows an independent collection of metastatic breast cancers treated with platinum chemotherapy was used to validate the formalin-fixed paraffin-embedded (FFPE) breast cancer model based upon individual patient response to therapy.
- FFPE formalin-fixed paraffin-embedded
- the disclosed technology can be implemented in some embodiments to provide a deep learning artificial intelligence architecture that predicts genomic homologous recombination deficiency and platinum response from routine histology slides in breast and ovarian cancers.
- HRD homologous recombination deficiency
- Current standard diagnostic tests for detecting HRD in breast and ovarian cancers require genotyping-based or sequencing-based assays, which are not universally available.
- the disclosed technology can be implemented in some embodiments to provide a novel multi-resolution deep learning approach that allows training robust models for detecting genomic biomarkers directly from digitalized images of hematoxylin and eosin (H&E)-stained light microscopy histopathological slides.
- a model for predicting genomically derived HRD scores can be trained using a number of primary breast cancers (e.g., 1,008 primary breast cancers from the Cancer Genome Atlas (TCGA) project).
- TCGA Cancer Genome Atlas
- the trained breast cancer model was externally validated on 535 primary breast cancers from two independent research cohorts and on 77 platinum-treated metastatic breast cancers. Applicability to 589 TCGA ovarian tumors was also demonstrated by training and validating a model using transfer learning for predicting platinum response.
- the deep learning approach based on some embodiments of the disclosed technology can identify platinum-sensitive BRCA1/2 wild-type tumors as HRD-positive.
- the deep learning model implemented based on some embodiments of the disclosed technology can outperform multiple existing genomic HRD biomarkers within each cohort.
- a deep learning model applied to digitalized H&E histopathological slides from breast and ovarian cancers detected genomically derived HRD and predicted direct clinical benefit to standard-of-care platinum-based therapies.
- the approach outperformed existing genetic biomarkers across multiple cohorts, slide scanners, and tissue-fixation procedures. These results have important implications for equitable and efficient clinical management of cancer patients sensitive to targeted DNA-damage-response therapies.
- Precision oncology aims to personalize cancer therapy by first identifying and, subsequently, targeting molecular defects in tumors within each individual. Many cancers harbor failures of specific DNA repair pathways, and utilizing synthetic lethal relationships amongst peripheral pathways has proven to be an effective treatment approach.
- HRD homologous recombination deficiency
- BRCA1 and BRCA2 susceptibility genes
- somatic mutations and epigenetic dysregulation in breast and ovarian cancers have been shown to lead to HRD.
- cancers deficient in homologous recombination exhibit genomic instability with characteristic patterns of somatic mutations and gene expression. Some of these patterns have also been leveraged for detecting HRD in the absence of canonical germline or somatic defects within HRD-associated genes.
- SBS3 single-base substitution signature 3
- COSMIC Catalogue of Somatic Mutations in Cancer
- DeepHRD a weakly supervised convolutional neural network architecture based upon the fundamental assumptions of multiple-instance learning (MIL; FIG. 1).
- MIL multiple-instance learning
- FFPE formalin-fixed paraffin-embedded
- the DeepHRD method was trained separately to predict HRD status from FFPE and flash frozen tissues of breast cancers as well as from flash frozen tissue of ovarian cancers, resulting in a total of three independently trained DeepHRD models.
- the flash frozen breast cancer model was externally validated using primary breast cancers from the: (i) Clinical Proteomic Tumor Analysis Consortium (CPTAC) comprised of 116 samples with associated whole-exome sequencing; and (ii) Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) comprised of 419 samples with associated microarray genotyping data (FIG. 5A).
- CPTAC Clinical Proteomic Tumor Analysis Consortium
- METABRIC Molecular Taxonomy of Breast Cancer International Consortium
- the FFPE breast cancer model was externally validated in an independent clinical cohort of metastatic breast cancers comprised of 77 patients treated with platinum-based chemotherapy with available metastatic biopsies and associated genomic and clinical annotations. Clinical response to platinum-based therapy and progression-free survival (PFS) were assessed using the Response Evaluation Criteria in Solid Tumors, version 1.1 (RECIST 1.1; FIG. 5B).
- the collection of fixed tissue slides from the TCGA breast cancers, TCGA ovarian cancers, CPTAC cohort, and METABRIC cohort was digitalized using the Aperio ScanScope system.
- the clinical cohort of metastatic breast cancers was digitalized using the Hamamatsu Photonics Nanozoomer system.
- HRD scores were calculated from sequencing or genotyping data as previously reported using scarHRD. Briefly, scarHRD considers the combined aggregated score of the telomeric allelic imbalance score, loss of heterozygosity score, and large-scale transitions score calculated for each patient using ASCAT-derived copy number calls from SNP6 genotyping microarrays. Traditionally, an HRD score greater than 42 has been used to determine eligibility for treatment with platinum-based chemotherapy or PARP inhibitors in triple negative breast cancer, while an HRD score greater than 63 has been utilized for ovarian cancer.
- the pathogenicity of mutations found within BRCA1 and BRCA2 was determined using InterVar as previously described for the TCGA ovarian cohort. All variants predicted as pathogenic were also considered deleterious.
- the mutational status of BRCA1 and BRCA2 within the metastatic breast cancer cohort was determined by screening variants across existing database annotations that included Clinvar, Swissprot, Leiden Open Variation Database (LOVD), and the Universal Mutations Database (UMD) as previously reported.
- the deep learning architecture implemented based on some embodiments of the disclosed technology can be built upon the concept of the weakly supervised MIL assumptions (FIG. 1A). Specifically, all partitioned regions of a slide, or tiles, within a whole-slide H&E image (WSI) are assigned a weak label based upon the slide-level classification for each sample. It is assumed that all tiles within a negatively labeled slide are homologous recombination proficient (HRP), whereas at least a single tile must exhibit an HRD phenotype within a positively labeled slide. These assumptions allow the model to be trained using only a single classification label for an entire image without the need for detailed manual annotations from a pathologist, which currently do not exist for characterizing HRD.
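The MIL assumption stated above (a slide is positive if at least one tile is positive, negative only if all tiles are negative) corresponds to max-pooling over tile-level probabilities. The sketch below shows that textbook aggregation only; the function names and the 0.5 threshold are illustrative assumptions, and the document itself describes averaging the top-ranked tiles at inference rather than taking a strict maximum.

```python
def slide_probability(tile_probs):
    """Textbook MIL max-pooling: the slide score is its most HRD-like tile."""
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    return max(tile_probs)

def slide_label(tile_probs, threshold=0.5):
    """Label a slide HRD if any tile reaches the (assumed) threshold."""
    return "HRD" if slide_probability(tile_probs) >= threshold else "HRP"
```

For example, `slide_label([0.1, 0.2, 0.9])` is positive because a single tile exceeds the threshold, while `slide_label([0.1, 0.2])` is negative because every tile falls below it.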
- HRP homologous recombination proficient
- the model is based on a multi-resolution decision, which performs an initial prediction at a low magnification (i.e., 5x magnification) and then automatically selects regions of interest (ROIs) to perform a secondary prediction at an enhanced magnification within the selected ROIs (i.e., 20x magnification; FIG. 1A).
- DeepHRD's multi-resolution architecture framework was designed to mimic the standard diagnostic protocol used by pathologists to examine H&E images by first selecting ROIs at a low magnification across the entire tissue slide and then refining the specific tumor characteristics and subtypes captured at a higher resolution.
- DeepHRD maps individual tile predictions back to the original WSI, which allows visualizing the relative contribution and importance of specific tissue regions to the predictions of the model without using any pixel-level annotations (FIG. 1B).
- the final model encompasses an ensemble of five identical architectures, with each producing multi-resolution prediction scores. The average of these scores was used to make a final prediction for each tissue slide. Due to the computational cost associated with processing an entire WSI, each slide was first segmented into smaller tiles for each utilized resolution. For the first stage of the model, each slide was tiled at a 5x magnification with 256x256 pixels per tile with approximately 2 µm of tissue per pixel. Blurred tiles and those with less than 80% of pixels containing tissue were removed (FIG. 1A).
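The ensemble averaging described above can be sketched as follows. The equal weighting of the 5x and 20x scores within each ensemble member is an assumption made for illustration (the text only states that the scores are averaged), and `ensemble_score` is a hypothetical name.

```python
def ensemble_score(per_model_scores):
    """Average multi-resolution scores across ensemble members.

    per_model_scores: list of (score_5x, score_20x), one pair per member.
    Assumption: both resolutions contribute equally within a member.
    """
    if not per_model_scores:
        raise ValueError("need at least one ensemble member")
    per_model = [(s5 + s20) / 2 for s5, s20 in per_model_scores]
    return sum(per_model) / len(per_model)
```

With five members each reporting `(0.5, 0.75)`, the final slide score is `0.625`.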
- ResNet18 convolutional neural networks were trained to extract features from the collection of tiles composing a single WSI.
- the resulting encoded features from the penultimate fully connected layer were used to automatically select ROIs at the 5x resolution.
- principal component analysis PCA
- K-means clustering was then used to group each tile representation. The total number of clusters was determined for each sample based upon the value of k that provided the maximum silhouette coefficient across all tile representations. The cluster containing the tile with the maximum prediction probability was selected along with all tiles in the same cluster having a silhouette score greater than the 95% quantile of all silhouette scores across the WSI.
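The final selection rule above can be sketched once cluster assignments and per-tile silhouette scores are available. This sketch assumes those were already computed by a k-means step that is not shown; the helper names and the linear-interpolation quantile are illustrative choices, not details from the source.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def select_roi_tiles(cluster_ids, silhouettes, probs, q=0.95):
    """Return indices of tiles chosen as regions of interest.

    cluster_ids[i], silhouettes[i], probs[i] describe tile i (assumed inputs).
    Keeps tiles from the cluster that holds the highest-probability tile,
    restricted to silhouette scores above the q-quantile over all tiles.
    """
    best_tile = max(range(len(probs)), key=probs.__getitem__)
    best_cluster = cluster_ids[best_tile]
    cutoff = quantile(silhouettes, q)
    return [i for i, (c, s) in enumerate(zip(cluster_ids, silhouettes))
            if c == best_cluster and s > cutoff]
```

Because the cutoff is computed over every tile in the WSI and the comparison is strict, only the most cleanly clustered tiles of the chosen cluster survive.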
- the final ROIs were tiled at 20x magnification (0.5 µm per pixel) and used to train and test the second model.
- the top 25 tiles were averaged to calculate a final prediction score at a given resolution during an inference pass of a WSI.
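The top-25 averaging can be written in a few lines. This is a minimal sketch, assuming "top" means the 25 tiles with the highest predicted probability; `slide_score` is a hypothetical name.

```python
def slide_score(tile_probs, top_k=25):
    """Average the top_k highest tile probabilities at one resolution."""
    top = sorted(tile_probs, reverse=True)[:top_k]
    if not top:
        raise ValueError("no tiles to score")
    return sum(top) / len(top)
```

If a slide has fewer than `top_k` tiles, the sketch averages all of them; how the original method handles that case is not stated.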
- random dropout of nodes within the fully connected layers of the ResNet architecture was incorporated to prevent overfitting of the training dataset.
- This same dropout technique, known as Monte Carlo dropout, was applied during inference of each WSI to provide an estimation of the model uncertainty by performing multiple inference passes over a single WSI.
- the resulting distribution of predictions was averaged to calculate a final score encompassing any epistemic uncertainty and was used to calculate confidence thresholds for a given sample (FIG. 1A).
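How repeated Monte Carlo dropout passes might be summarized is sketched below. The number of passes, the 2.5% and 97.5% empirical bounds, and the `noisy_model` stand-in are all assumptions for illustration; the source does not specify them, and a real forward pass would run the network with dropout left active.

```python
import random
import statistics

def mc_dropout_summary(predict_once, n_passes=50, lower_q=0.025, upper_q=0.975):
    """Summarize repeated stochastic inference passes over one WSI.

    predict_once: zero-argument callable returning one slide-level score
    (hypothetical; stands in for a dropout-enabled forward pass).
    Returns (mean score, (lower bound, upper bound)).
    """
    scores = sorted(predict_once() for _ in range(n_passes))
    lo = scores[int(lower_q * (n_passes - 1))]
    hi = scores[round(upper_q * (n_passes - 1))]
    return statistics.mean(scores), (lo, hi)

# Hypothetical stand-in model: a fixed score plus small random jitter.
_rng = random.Random(0)

def noisy_model():
    return 0.7 + _rng.uniform(-0.05, 0.05)

mean_score, (low, high) = mc_dropout_summary(noisy_model)
```

The spread between `low` and `high` gives a rough view of epistemic uncertainty: a narrow interval suggests the passes agree, a wide one suggests they do not.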
- DeepHRD can be used to make predictions on individual patients using only a digitalized cancer biopsy (FIG. 1B).
- DeepHRD will produce a prediction score with confidence intervals that are used to make a computational diagnostic recommendation.
- the intended use of this method is to provide a computational diagnosis for subsequent clinical action.
- individuals with a high confidence prediction are labeled as either HRD or HRP (FIG. 1B).
- a DeepHRD model was trained to detect HRD samples using a subset of flash frozen tissue slides from the TCGA breast cancer cohort (FIG. 1A).
- the number of HRD and HRP samples in each breast cancer subtype were balanced to prevent the models from learning features specific to individual subtype histology rather than those directly associated with HRD (FIGS. 4A-4B).
- the trained DeepHRD models can then be applied to individual digital slides to provide a patient-level prediction revealing whether a breast cancer is HRD or HRP (FIG. 1B).
- DeepHRD allows overlaying an HRD probability mask to each digital slide, which can be used for subsequent investigation into pathological characteristics of each breast cancer patient (FIG. 1B).
- the final trained DeepHRD breast cancer model was then tested on the held-out TCGA test set to assess the overall performance, resulting in an AUC of 0.81 ([0.77-0.85] 95% Confidence Interval (CI); FIG. 2A).
- MBC metastatic breast cancer
- the final model was applied to the held-out test set of TCGA ovarian cancers to assess the ability of the model in separating individuals who benefit from treatment with platinum chemotherapy (FIG. 3A).
- TCGA ovarian cancers were assessed for the ability of the model in separating individuals who benefit from treatment with platinum chemotherapy.
- 66 received first-line platinum chemotherapy for advanced-stage, high-grade serous ovarian cancer. Separating these individuals by their DeepHRD prediction resulted in a differential median survival between HRD and HRP predicted patients (FIG. 3B).
- genomic testing has substantially complicated routine clinical oncology workflows as it often requires re-biopsy to procure tumor tissue sufficient for molecular assays as well as extensive analytics to analyze the large-scale data generated by these molecular assays.
- Recent deep learning AI approaches have demonstrated the ability to detect genomic biomarkers directly from H&E images, including ones indirectly related to therapy outcome (e.g., detection of microsatellite instability that can be predictive of response to immunotherapy).
- no prior study has shown direct clinical significance of AI-based models for detecting HRD by predicting treatment benefit with external validations.
- HRD is a complementary biomarker to help guide the use of platinum therapies and an FDA-approved companion diagnostic test for the use of PARP inhibitors.
- the performance of the neural network (e.g., DeepHRD)
- the neural network (e.g., DeepHRD)
- PDAC pancreatic ductal adenocarcinoma
- FFPE flash frozen and formalin-fixed paraffin-embedded
- HRD scores for the TCGA breast and ovarian cancers were obtained from a previous study.
- the 77 patients from the clinical cohort of whole-exome sequenced metastatic breast cancers were enrolled between June 2018 and March 2020 and all received at least one line of platinum chemotherapy. All clinical evaluations were determined locally at the Georges Francois Leclerc Cancer Center as previously reported.
- Each of the whole-slide images was segmented into 256x256 tiles at 5x and 20x magnifications containing 2 µm per pixel and 0.5 µm per pixel, respectively.
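The tile geometry follows directly from the stated pixel sizes. The short sketch below works through the arithmetic; the constant and function names are illustrative.

```python
TILE_PX = 256                            # tile side in pixels (from the text)
UM_PER_PX = {"5x": 2.0, "20x": 0.5}      # micrometers per pixel (from the text)

def tile_side_um(magnification):
    """Physical side length of one tile, in micrometers."""
    return TILE_PX * UM_PER_PX[magnification]

def subtiles_per_tile(low="5x", high="20x"):
    """Number of high-magnification tiles covering one low-magnification tile."""
    ratio = tile_side_um(low) / tile_side_um(high)
    return int(ratio) ** 2
```

A 5x tile spans 512 µm per side and a 20x tile spans 128 µm, so each 5x region of interest corresponds to a 4 by 4 grid of sixteen 20x tiles.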
- Blurry tiles and those with less than 80% of pixels representing tissue were removed from all training and testing cohorts.
- a Laplacian filter was applied to each tile using a 3x3 kernel, and all tiles with a variance less than 0.02 were removed from the remaining analysis. All green, red, and blue pen marks and other annotation artifacts were removed by thresholding on the RGB color channels within each pixel.
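The blur filter described above can be sketched in plain Python. The 4-neighbor kernel coefficients and the assumption that pixel intensities are grayscale values in [0, 1] are illustrative; the source gives only the 3x3 kernel size and the 0.02 variance threshold.

```python
def laplacian_variance(gray):
    """Variance of a 3x3 Laplacian response over the interior pixels.

    gray: 2D list of grayscale intensities (assumed to lie in [0, 1]).
    Kernel used (an assumption): [[0, 1, 0], [1, -4, 1], [0, 1, 0]].
    """
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def is_blurry(gray, threshold=0.02):
    """Flag a tile for removal when its Laplacian variance is below threshold."""
    return laplacian_variance(gray) < threshold
```

A uniform tile has zero Laplacian variance and is flagged, whereas a tile with sharp edges produces large responses and is kept.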
- HRD scores were calculated as previously reported using scarHRD. Specifically, the HRD score is the summation of the telomeric allelic imbalance score, loss of heterozygosity score, and large-scale transitions scores calculated for each patient using ASCAT-derived copy number calls from SNP6 genotyping microarrays.
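The summation can be illustrated with a short sketch. The cutoffs of 42 (triple negative breast cancer) and 63 (ovarian cancer) are the ones the document cites as traditional eligibility thresholds; the function and dictionary names are hypothetical, and the component scores are assumed to have been computed upstream (for example by scarHRD).

```python
# Traditional cutoffs cited in the document, keyed by an illustrative label.
HRD_CUTOFFS = {"tnbc": 42, "ovarian": 63}

def hrd_score(tai, loh, lst):
    """HRD score as the sum of the three genomic scar components."""
    return tai + loh + lst

def exceeds_cutoff(score, cancer_type):
    """True when the score is strictly greater than the cited cutoff."""
    return score > HRD_CUTOFFS[cancer_type]
```

For example, component scores of 20, 15, and 10 give an HRD score of 45, which exceeds the 42 cutoff but not the 63 cutoff.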
- the HRD scores for the CPTAC breast cancer samples were calculated based on copy number calls derived from whole-exome sequencing using Sequenza, which has been shown to result in analogous distributions of HRD scores to HRD scores calculated using ASCAT-derived copy number calls from SNP6 genotyping microarrays.
- HRD scores above 50 were considered HR-deficient and scores below 10 were considered proficient in the breast cancer cohorts. All intermediate scores were modelled as a probability of being deficient or proficient with an equal probability of both conditions at an HRD score of 30 (equation 2).
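One way to realize this soft labeling is sketched below. The referenced equation 2 is not reproduced in this text, so the linear ramp between 10 and 50 is purely an assumption chosen because it yields a probability of 0.5 at a score of 30; the actual equation may have a different shape (for example, a sigmoid).

```python
def hrd_soft_label(score, low=10.0, high=50.0):
    """Map an HRD score to an assumed probability of being HR-deficient.

    Scores at or above `high` map to 1.0 and scores at or below `low`
    map to 0.0. Intermediate scores are interpolated linearly, which is
    an assumption and not the source's equation.
    """
    if score >= high:
        return 1.0
    if score <= low:
        return 0.0
    return (score - low) / (high - low)
```

Under this assumed ramp, a score of 30 maps to 0.5 and a score of 40 maps to 0.75.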
- the Adam optimizer was used for training with a learning rate of 10^-3, a weight decay of 10^-4, and minibatches consisting of 64 tiles.
- Each model was initialized using the ResNet18 architecture that was pretrained on the ImageNet (http://www.image-net.org/) database and was trained for 200 epochs. All convolutional weights were frozen during training. Early stopping was incorporated to prevent overfitting.
- a final inference pass is performed on all slides. All features from a single WSI were selected from the penultimate layer of the feature extractor and projected into a lower dimensional latent space using principal component analysis. K-means clustering was used to automatically select regions of interest (ROIs) for retiling at 20x magnification. The number of clusters was determined by selecting the solution with the maximum silhouette coefficient. The cluster containing the tile with the highest prediction probability was used to select the ROIs. All tiles belonging to this cluster that had a silhouette score greater than the 95% quantile of all silhouette scores for the given WSI were chosen as the final ROIs. Each ROI was then tiled into 256x256 pixel sub-tiles at 20x magnification.
- DeepHRD is used to make predictions for individual whole-slide images.
- When performing the multi-resolution inference, DeepHRD generates HRD probabilities for each tile at 5x magnification and for each tile within the automatically selected regions of interest at 20x magnification. Using the location of the original tiles, the probabilities can be mapped back to the original location within the whole-slide image to visualize the regional patterns that are influencing the final model prediction.
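Mapping tile probabilities back to slide coordinates can be sketched as filling a grid. The `(row, col, prob)` tuple layout and the use of `None` for tiles that were filtered out are illustrative assumptions; rendering the grid as a color overlay is left out.

```python
def build_heatmap(tiles, n_rows, n_cols):
    """Place tile probabilities on a grid matching the slide's tile layout.

    tiles: iterable of (row, col, prob) for tiles that received a prediction.
    Cells with no prediction (e.g., removed background tiles) stay None.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for row, col, prob in tiles:
        grid[row][col] = prob
    return grid
```

The same function can be applied separately to the 5x tiles and to the 20x sub-tiles, each with its own grid dimensions.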
- FIG. 6 shows an example method 600 of determining the presence of a biomarker in a biological sample based on some implementations of the disclosed technology.
- the method 600 may include, at 610, obtaining a section of a biological sample, wherein the section of the biological sample has been treated with a stain, at 620, imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution to generate a first and second plurality of image data, at 630, reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, and at 640, providing the first and the second plurality of image data to a trained predictive neural network and determining the presence of a biomarker in the biological sample as an output of the trained predictive neural network.
- FIG. 7 shows an example method 700 of generating a trained predictive model configured to determine a presence of a biomarker in a biological sample based on some implementations of the disclosed technology.
- the method 700 may include, at 710, generating stained sections of one or more biological samples and corresponding biomarker labels, at 720, imaging one or more regions of the stained sections of the one or more biological samples at a first resolution and a second resolution to generate a first and second plurality of image data, at 730, reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, and at 740, generating a trained predictive model, wherein the trained predictive model comprises a first predictive model trained with the reduced first plurality of image data and corresponding biomarker labels, and a second predictive model trained with the reduced second plurality of image data and corresponding biomarker labels.
- FIG. 8 shows another example method 800 of determining the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
- the method 800 may include, at 810, obtaining a stained section of the biological sample, at 820, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, and at 830, providing the plurality of images of the stained section as an input to a trained predictive model and determining the presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing.
- FIG. 9 shows a treatment method 900 for treating cancer in a subject in need thereof based on some implementations of the disclosed technology.
- the method 900 may include, at 910, obtaining a stained section of a biological sample, at 920, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, at 930, providing the plurality of images of the stained section to a trained predictive model and determining a presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing, and at 940, administering treatment to the patient based on the presence of the biomarker.
- FIG. 10 shows an example of a computer system 1000 configured to determine the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
- the system 1000 includes a processor 1010 and a memory or storage medium 1020.
- the processor 1010 reads code from the memory 1020 and implements a method discussed in this patent document.
- Example 1 A method of determining the presence of a biomarker of a biological sample, comprising: (a) providing a section of a biological sample, wherein the section of the biological sample has been treated with a stain; (b) imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution thereby generating a first and second plurality of image data; (c) reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data; and (d) determining the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided an input of the reduced first and second plurality of image data.
- Example 2 The method of example 1, wherein the trained predictive model is configured to determine the presence of the biomarker with an accuracy of at least 80% as compared to genomic sequencing.
- Example 3 The method of example 2, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
- Example 4 The method of example 1, wherein the trained predictive model comprises a first predictive model trained on the first plurality of image data and a second predictive model trained on the second plurality of image data.
- Example 5 The method of example 1, wherein the biomarker comprises loss of chromosome 9p.
- Example 6 The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
- TP53 includes a tumor suppressor gene.
- TP53 indicates tumor protein P53.
- Example 7 The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene EGFR (epidermal growth factor receptor).
- EGFR epidermal growth factor receptor
- Example 8 The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
- BRAF includes a human gene that encodes a protein called B-Raf.
- BRAF indicates v-raf murine sarcoma viral oncogene homolog B 1.
- Example 9 The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
- Example 10 The method of example 1, wherein the biomarker comprises presence of MSI (vs MSS) and/or MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects.
- MSI vs MSS
- MMR gene e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2
- the MSI gene defect indicates a microsatellite instable (MSI) gene defect.
- the MMR gene defect indicates a mismatch repair gene defect.
- the MSS indicates microsatellite stable.
- Example 11 The method of example 1, wherein the biomarker comprises presence of high tumor mutational burden.
- Example 12 The method of example 1, wherein the biomarker comprises presence of hypermutator mutational signatures selected from: POLE comprised of POLE and MSI - COSMIC14 (POLE+MSI); MSI combined MSI - COSMIC15, MSI - COSMIC20 (POLD+MSI), MSI - COSMIC21, MSI - COSMIC26, and MSI - COSMIC6.
- Example 13 The method of example 1, wherein the biomarker comprises presence of apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC) alterations and mutational signature.
- APOBEC indicates a family of evolutionarily conserved cytidine deaminases.
- Example 14 The method of example 1, wherein the biomarker comprises presence of homologous recombination deficiency (HRD).
- HRD homologous recombination deficiency
- BRCA indicates breast cancer gene.
- Example 15 The method of example 1, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Example 14.
- HRP homologous recombination proficiency
- Example 16 The method of example 1, wherein the biomarker comprises presence of BRCA1/2 mutations.
- Example 17 The method of example 1, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as “mutational signature 3” (Sig3) in the COSMIC signature catalog or the presence of genomic ‘scar’ signatures.
- SNVs genome-wide somatic single nucleotide variations
- Example 18 The method of example 1, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH); number of telomeric imbalances (telomeric allelic imbalance, or TAI), which are the number of regions with allelic imbalance that extend to the sub-telomere but not across the centromere; and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
- GIS genomic instability score
- LOH loss of heterozygosity
- TAI telomeric allelic imbalance
- LST large-scale state transitions
- Example 19 The method of example 1, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ contexts features of the sequencing data, or any combination thereof.
- Example 20 The method of example 1, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1, BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, ATM, ATR, CHEK1/2, FANC genes (FANCA/C/D2/E/F/G/I//L/M/ 1), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B, ATRX, BAP1, BARD1, BRIP1, CDK12, PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, ARID1A, BLM, WRN, CDK12, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2.
- HRR homologous recombination repair
- Example 21 The method of example 1, wherein the biomarker comprises presence of potentially actionable genomic alterations in one or more of the following genes: ABL1, AKT1, ALK, APC, ATM, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, IDH1, IDH2, JAK1, JAK2, JAK3, KDR, KIT, MAP2K1, MET, NTRK, NTRK1, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH
- Example 22 The method of example 1, wherein the biomarker indicates copy number alterations, deletions, amplifications, fusions, mutation clusters, mutation signatures, or any combination thereof in the genome of the biological sample.
- Example 23 The method of example 1, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or any combination thereof sections.
- Example 24 The method of example 1, wherein the trained predictive model comprises a convolutional neural network.
- Example 25 The method of example 1, wherein the trained predictive model comprises a neural network such as a ResNet model.
- Example 26 The method of example 1, further comprising reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data.
- the parameter space of the first and second plurality of image data indicates tiles at 5x magnification, wherein the parameter space of the first and second plurality of image data is reduced to 25%, 10%, or 5% of the tiles carrying predictive information.
- Example 27 The method of example 26, wherein reducing is completed by principal component analysis.
- Example 28 The method of example 1, wherein the biological sample comprises a cancer free, or cancerous biological sample.
- Example 29 The method of example 1, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof tissues.
- Example 30 The method of example 29, wherein the unhealthy tissue comprises virally infected tissue.
- Example 31 The method of example 30, wherein virally infected tissue comprises human papilloma virus (HPV) positive tissue.
- HPV human papilloma virus
- Example 32 The method of example 30, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
- EBV Epstein-Barr virus
- HBV Hepatitis B virus
- HCV Hepatitis C virus
- HIV Human immunodeficiency virus
- HHV-8 Human herpes virus 8
- HTLV-1 Human T-cell leukemia virus type, also called human T-lymphotrophic virus
- Example 33 The method of example 29, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
- Example 34 The method of example 33, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
- Example 35 The method of example 1, wherein the stain comprises a hematoxylin and eosin stain.
- Example 36 The method of example 1, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
- Example 37 The method of example 26, further comprising clustering the reduced first and second plurality of image data thereby generating a first and second clustered dataset.
- Example 38 The method of example 37, wherein clustering is completed by k-means clustering.
- Example 39 The method of example 37, wherein the trained predictive model is trained with the clustered datasets that represent the top 15% of the variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
- Example 40 The method of example 37, wherein the trained predictive model is trained with the first and second clustered dataset and corresponding biomarker label of the biological sample, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
- Example 41 The method of example 40, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
- Example 42 The method of example 1, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
- Example 43 The method of example 1, wherein the one or more regions comprise at least 100 regions.
- Example 44 The method of example 1, wherein the one or more regions comprise at most 10,000 regions.
- Example 45 The method of example 1, comprising removing one or more nodes of the trained predictive model when the trained predictive model is provided an input of the reduced first and second plurality of image data.
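The clustering and silhouette-based selection recited in the examples above (k-means clustering, then retaining clusters whose silhouette coefficients fall within the top 50th percentile) can be illustrated with a minimal sketch. This is not the disclosed implementation: the function names `kmeans`, `cluster_silhouettes`, and `keep_top_half` are hypothetical, the inputs are assumed to be feature vectors whose parameter space has already been reduced (for example by principal component analysis, which is not shown), the farthest-first initialization is an assumption, and "top 50th percentile" is interpreted here as "at or above the median of the per-cluster mean silhouette".

```python
import math
import statistics


def kmeans(points, k, iters=50):
    """Lloyd's k-means over tuples of floats; returns one cluster index per point.

    Initial centers come from a deterministic farthest-first pass (an assumption
    made for reproducibility; the source does not specify initialization).
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def cluster_silhouettes(points, labels):
    """Mean silhouette coefficient per cluster (needs at least two clusters)."""
    clusters = sorted(set(labels))
    per_cluster = {c: [] for c in clusters}
    for i, p in enumerate(points):
        def mean_dist_to(c):
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(d) / len(d) if d else 0.0

        if labels.count(labels[i]) == 1:
            s = 0.0  # usual convention for singleton clusters
        else:
            a = mean_dist_to(labels[i])  # cohesion within own cluster
            b = min(mean_dist_to(c) for c in clusters if c != labels[i])  # nearest other cluster
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        per_cluster[labels[i]].append(s)
    return {c: statistics.fmean(v) for c, v in per_cluster.items()}


def keep_top_half(silhouettes):
    """Return cluster ids whose mean silhouette is at or above the median."""
    cutoff = statistics.median(silhouettes.values())
    return {c for c, s in silhouettes.items() if s >= cutoff}
```

In this sketch only the retained clusters would be passed on, together with their biomarker labels, to train the first and second predictive models; how the retained clusters are batched for training is left open.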
- Example 46 A method of generating a trained predictive model configured to determine a presence of a biomarker of a biological sample, comprising: (a) providing stained sections of one or more biological samples and corresponding biomarker labels; (b) imaging one or more regions of the stained sections of the one or more biological samples at a first resolution and a second resolution thereby generating a first and second plurality of image data; (c) reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data; and (d) generating a trained predictive model, wherein the trained predictive model comprises a first predictive model trained with the reduced first plurality of image data and corresponding biomarker labels, and a second predictive model trained with the reduced second plurality of image data and corresponding biomarker labels.
- Example 47 The method of example 46, wherein the trained predictive model is configured to determine the presence of a biomarker with an accuracy of at least 80% as compared to genomic sequencing.
- Example 48 The method of example 47, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
- Example 49 The method of example 46, wherein the biomarker label comprises loss of chromosome 9p.
- Example 50 The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
- Example 51 The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
- Example 52 The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
- Example 53 The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
- Example 54 The method of example 46, wherein the biomarker comprises presence of MSI (vs. MSS) and/or MMR gene defects comprising one or more of POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, and PMS2.
- Example 55 The method of example 46, wherein the biomarker comprises presence of hypermutator mutational signatures selected from POLE, MSI - COSMIC14, (POLE+MSI), MSI combined, MSI - COSMIC15, MSI - COSMIC20, (POLD+MSI), MSI - COSMIC21, MSI - COSMIC26, and MSI - COSMIC6.
- Example 56 The method of example 46, wherein the biomarker comprises presence of APOBEC alterations and mutational signature.
- Example 57 The method of example 45, wherein the biomarker comprises presence of high tumor mutational burden.
- Example 58 The method of example 46, wherein the biomarker comprises presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist.
- Example 59 The method of example 46, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Example 14.
- Example 60 The method of example 46, wherein the biomarker comprises presence of BRCA1/2 mutations.
- Example 61 The method of example 46, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) comprising “mutational signature 3” (Sig3) in the COSMIC signature catalog, or the presence of genomic ‘scar’ signatures.
- Example 62 The method of example 46, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH), which are regions of intermediate size (over 15 MB and less than the whole chromosome); number of telomeric imbalances (telomeric allelic imbalance, or TAI), which are the number of regions with allelic imbalance that extend to the sub-telomere but not across the centromere; and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
- Example 63 The method of example 46, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ context features of the sequencing data, or any combination thereof.
- Example 64 The method of example 46, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1, BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, ATM, ATR, CHEK1/2, FANC genes (FANCA/C/D2/E/F/G/I/7L/M/ 1), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B, ATRX, BAP1, BARD1, BRIP1, CDK12, PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, ARID1A, BLM, WRN, CDK12, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2.
- Example 65 The method of example 46, wherein the biomarker comprises presence of potentially actionable genomic alterations in one or more of the following genes: ABL1, AKT1, ALK, APC, ATM, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, IDH1, IDH2, JAK1, JAK2, JAK3, KDR, KIT, MAP2K1, MET, NTRK, NTRK1, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNAQ, GNAS, HNF1A, MLH1,
- Example 66 The method of example 46, wherein the stained sections of the one or more biological samples comprise paraffin embedded sections, formalin fixed sections, frozen sections, fresh sections, or any combination thereof.
- Example 67 The method of example 46, wherein the trained predictive model comprises a convolutional neural network.
- Example 68 The method of example 46, wherein the trained predictive model comprises a ResNet model.
- Example 69 The method of example 46, wherein reducing is completed by principal component analysis.
- Example 70 The method of example 46, wherein the one or more biological samples comprise a cancer-free biological sample, a cancerous biological sample, healthy tissue, unhealthy tissue, or any combination of healthy and unhealthy tissues.
- Example 71 The method of example 70, wherein the unhealthy tissue comprises virally infected tissue.
- Example 72 The method of example 71, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
- Example 73 The method of example 46, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
- Example 74 The method of example 70, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
- Example 75 The method of example 74, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
- Example 76 The method of example 46, wherein the stain comprises a hematoxylin and eosin stain.
- Example 77 The method of example 46, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
- Example 78 The method of example 46, further comprising clustering the reduced first and second plurality of image data thereby generating a first and second clustered dataset.
- Example 79 The method of example 78, wherein clustering is completed by k-means clustering.
- Example 80 The method of example 78, wherein the trained predictive model is trained with clustered datasets that represent the top 15% of the variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
- Example 81 The method of example 78, wherein the first and second predictive models are trained with one or more biological samples’ first and second clustered dataset and the corresponding biomarker labels, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
- Example 82 The method of example 46, wherein the corresponding biomarker labels of the one or more biological samples are determined by genomic sequencing.
- Example 83 The method of example 46, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
- Example 84 The method of example 46, wherein the one or more regions comprise at least 100 regions.
- Example 85 The method of example 46, wherein the one or more regions comprise at most 1,000 regions.
- Example 86 The method of example 46, wherein generating the trained predictive model comprises removing one or more nodes of the first and second predictive model during training.
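Removing one or more nodes of a predictive model during training, as recited in the example above, resembles the regularization technique commonly called dropout. The sketch below assumes that reading; the source does not name the technique, and the function name `dropout`, the `rate` parameter, and the "inverted" rescaling of surviving nodes are illustrative assumptions rather than the disclosed method.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout over a list of node activations.

    During training, each node is zeroed ("removed") with probability `rate`
    and the surviving nodes are scaled by 1 / (1 - rate) so the expected sum
    is unchanged. Outside training the activations pass through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical usage: drop roughly half of 8 nodes for one training step.
rng = random.Random(0)
hidden = [1.0] * 8
print(dropout(hidden, rate=0.5, rng=rng))
print(dropout(hidden, rate=0.5, rng=rng, training=False))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; a deep learning framework would normally supply its own dropout layer instead.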
- Example 87 A computer system configured to determine the presence of a biomarker of a biological sample, comprising: one or more processors; and a non-transient computer readable storage medium including software, wherein the software comprises executable instructions that, as a result of execution, cause the one or more processors of the computer system to: (i) receive a section of a biological sample, wherein the section of the biological sample has been stained; (ii) image one or more regions of the stained section at a first resolution and a second resolution thereby generating a first and second plurality of image data; (iii) reduce a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data; and (iv) determine the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided an input of the reduced first and second plurality of image data.
- Example 88 The system of example 87, wherein the trained predictive model is configured to determine the presence of the biomarker with an accuracy of at least 80% as compared to genomic sequencing.
- Example 89 The system of example 88, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
- Example 90 The system of example 87, wherein the trained predictive model comprises a first predictive model trained on the first plurality of image data and a second predictive model trained on the second plurality of image data.
- Example 91 The system of example 87, wherein the biomarker comprises loss of chromosome 9p.
- Example 92 The system of example 87, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
- Example 93 The system of example 87, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
- Example 94 The system of example 87, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
- Example 95 The system of example 87, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or any combination thereof.
- Example 96 The system of example 87, wherein the trained predictive model comprises a convolutional neural network.
- Example 97 The system of example 87, wherein the trained predictive model comprises a ResNet model.
- Example 98 The system of example 87, wherein reducing is completed by principal component analysis.
- Example 99 The system of example 87, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof.
- Example 100 The system of example 99, wherein the unhealthy tissue comprises virally infected tissue.
- Example 101 The system of example 100, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
- Example 102 The system of example 100, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
- Example 103 The system of example 99, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
- Example 104 The system of example 99, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
- Example 105 The system of example 87, wherein the biological sample comprises a cancer-free or cancerous biological sample.
- Example 106 The system of example 87, wherein the stain comprises a hematoxylin and eosin stain.
- Example 107 The system of example 87, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
- Example 108 The system of example 87, wherein the instructions further comprise clustering the reduced first and second plurality of image data thereby generating a first and second clustered dataset.
- Example 109 The system of example 108, wherein the clustering is completed by k-means clustering.
- Example 110 The system of example 108, wherein the trained predictive model is trained with the clustered datasets that represent the top 15% of the variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
- Example 111 The system of example 108, wherein the trained predictive model is trained with the biological sample’s first and second clustered dataset and corresponding biomarker labels of the biological samples, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
- Example 112 The system of example 111, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
- Example 113 The system of example 87, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
- Example 114 The system of example 87, wherein the one or more regions comprise at least 100 regions, or at most 1,000 regions, or at least 100 regions and at most 1,000 regions.
- Example 115 The system of example 87, wherein the one or more processors comprise one or more processors of a smartphone, tablet, laptop, desktop, server, cloud computing architecture, or any combination thereof.
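The accuracy thresholds recited above are stated relative to genomic sequencing, that is, the sequencing-derived biomarker label serves as the reference for each sample. A minimal sketch of that comparison follows; the function names `accuracy_vs_sequencing` and `meets_threshold` are hypothetical, labels are assumed to be binary (biomarker present or absent), and accuracy is taken to mean the fraction of samples on which the model call agrees with sequencing.

```python
def accuracy_vs_sequencing(predicted, sequenced):
    """Fraction of samples where the model call matches the sequencing label."""
    if not predicted or len(predicted) != len(sequenced):
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == s for p, s in zip(predicted, sequenced))
    return matches / len(predicted)


def meets_threshold(predicted, sequenced, threshold=0.80):
    """True when agreement with sequencing reaches the stated threshold."""
    return accuracy_vs_sequencing(predicted, sequenced) >= threshold


# Hypothetical labels: 1 = biomarker present, 0 = absent.
model_calls = [1, 0, 1, 1, 0]
sequencing_calls = [1, 0, 0, 1, 0]
print(accuracy_vs_sequencing(model_calls, sequencing_calls))
print(meets_threshold(model_calls, sequencing_calls))
```

Other performance measures (sensitivity, specificity, area under the curve) would need their own definitions; this sketch covers plain agreement only.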
- Example 116 A method of determining the presence of a biomarker of a biological sample, comprising: (a) providing a section of a biological sample, wherein the section of the biological sample has been stained; (b) imaging one or more regions of the stained section of the biological sample thereby generating a plurality of images of the stained section; (c) determining the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided the plurality of images of the stained section as an input, wherein the trained predictive model provides an accuracy of determining the presence of the biomarker of at least 80% as compared to genomic sequencing.
- Example 117 The method of example 116, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
- Example 118 The method of example 116, wherein the trained predictive model comprises a first predictive model trained on a first plurality of images acquired at a first resolution and a second predictive model trained on a second plurality of images acquired at a second resolution.
- Example 119 The method of example 116, wherein the biomarker comprises loss of chromosome 9p.
- Example 120 The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
- Example 121 The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
- Example 122 The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
- Example 123 The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
- Example 124 The method of example 16, wherein the biomarker comprises presence of MSI (vs. MSS) and/or MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects.
- Example 125 The method of example 116, wherein the biomarker comprises presence of hypermutator mutational signatures: POLE comprised of “POLE” and SBS6, SBS14, SBS15, SBS20, SBS21, SBS2, SBS26, SBS44.
- Example 126 The method of example 116, wherein the biomarker comprises presence of high tumor mutational burden.
- Example 127 The method of example 116, wherein the biomarker comprises presence of APOBEC alterations and mutational signature.
- Example 128 The method of example 116, wherein the biomarker comprises presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist.
- Example 129 The method of example 116, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Example 14.
- Example 130 The method of example 116, wherein the biomarker comprises presence of BRCA-1 and/or -2 mutations.
- Example 131 The method of example 116, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as “mutational signature 3” (Sig3) in the COSMIC signature catalog, or the presence of genomic ‘scar’ signatures.
- Example 132 The method of example 116, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH), which are regions of intermediate size; number of telomeric imbalances (telomeric allelic imbalance, or TAI); and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
- Example 133 The method of example 116, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ context features of the sequencing data, or any combination thereof.
- Example 134 The method of example 116, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1, BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, BARD1, ATM, BRIP1, CHEK1/2, CDK12, ATR, ATRX, BAP1, ARID1A, FANC genes (FANCA/C/D2/E/F/G/V/L/M, FANCI), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B; as well as other less common HRR gene alterations in PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, BLM, WRN, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRC
- Example 135 The method of example 116, wherein the biomarker comprises presence of potentially actionable genomic alterations in one or more of the following genes: ABL1, AKT1, APC, ALK, APC, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, KDR, MET, NTRK, NTRK1/2/3, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNA13, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1,
- Example 136 The method of example 116, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or any combination thereof.
- Example 137 The method of example 116, wherein the trained predictive model comprises a convolutional neural network.
- Example 138 The method of example 116, wherein the trained predictive model comprises a ResNet model.
- Example 139 The method of example 116, further comprising reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data.
- Example 140 The method of example 116, wherein reducing is completed by principal component analysis.
- Example 141 The method of example 116, wherein the biological sample comprises a cancer-free or cancerous biological sample.
- Example 142 The method of example 116, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof.
- Example 143 The method of example 142, wherein the unhealthy tissue comprises virally infected tissue.
- Example 144 The method of example 143, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
- Example 145 The method of example 144, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
- Example 146 The method of example 143, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
- Example 147 The method of example 143, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
- Example 148 The method of example 116, wherein the stain comprises a hematoxylin and eosin stain.
- Example 149 The method of example 118, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
- Example 150 The method of example 143, further comprising clustering the reduced first and second plurality of image data thereby generating a first and second clustered dataset.
- Example 151 The method of example 150, wherein clustering is completed by k-means clustering.
- Example 152 The method of example 150, wherein the trained predictive model is trained with the biological sample’s first and second clustered dataset and corresponding biomarker label of the biological sample, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
- Example 153 The method of example 152, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
- Example 154 The method of example 116, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
- Example 155 The method of example 116, wherein the one or more regions comprise at least 100 regions.
- Example 156 The method of example 116, wherein the one or more regions comprise at most 1,000 regions.
- Example 157 The method of example 125, further comprising removing one or more nodes of the trained predictive model when provided as an input the reduced first and second plurality of image data.
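Several of the examples above recite an output that comprises an averaged predicted probability score of the first and second predictive models (one trained on images at the first resolution, one on images at the second resolution). The sketch below illustrates one way such a score could be formed. It rests on assumptions not stated in the source: each model is assumed to emit one probability per imaged region, the per-region probabilities are assumed to be pooled by a simple mean, and the function names and the 0.5 decision threshold are hypothetical.

```python
import statistics


def averaged_probability(region_probs_first, region_probs_second):
    """Average the two models' pooled probabilities into one sample-level score.

    Each argument is a list of per-region probabilities from one model
    (for instance, the first-resolution and second-resolution models).
    """
    if not region_probs_first or not region_probs_second:
        raise ValueError("each model needs at least one region probability")
    p_first = statistics.fmean(region_probs_first)    # pooled score, first model
    p_second = statistics.fmean(region_probs_second)  # pooled score, second model
    return (p_first + p_second) / 2.0


def biomarker_present(region_probs_first, region_probs_second, threshold=0.5):
    """Binary call from the averaged score; the threshold is an assumption."""
    return averaged_probability(region_probs_first, region_probs_second) >= threshold


# Hypothetical per-region outputs from the two models.
print(averaged_probability([0.9, 0.7], [0.6, 0.4]))
print(biomarker_present([0.9, 0.7], [0.6, 0.4]))
```

Averaging weights the two resolutions equally; a weighted combination would be a different design and is not suggested by the text.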
- Example 158 A treatment method for treating cancer in a subject in need thereof, the method comprising: providing a section of a biological sample, wherein the section of the biological sample has been stained; imaging one or more regions of the stained section of the biological sample thereby generating a plurality of images of the stained section; determining the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided the plurality of images of the stained section as an input, wherein the trained predictive model provides an accuracy of determining the presence of the biomarker of at least 80% as compared to genomic sequencing; and administering treatment to the patient based on the presence of the biomarker.
- Example 159 The treatment method of example 158, wherein the biomarker comprises loss of chromosome 9p.
- Example 160 The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
- Example 161 The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
- Example 162 The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
- Example 163 The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
- Example 164 The treatment method of example 158, wherein the biomarker comprises presence of MSI (vs. MSS) and/or MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects.
- Example 165 The treatment method of example 158, wherein the biomarker comprises presence of hypermutator mutational signatures selected from: POLE comprised of “POLE” and SBS6, SBS14, SBS15, SBS20, SBS21, SBS2, SBS26, SBS44.
- Example 166 The treatment method of example 158, wherein the biomarker comprises presence of high tumor mutational burden.
- Example 167 The treatment method of example 158, wherein the biomarker comprises presence of APOBEC alterations and mutational signature.
- Example 168 The treatment method of example 158, wherein the biomarker comprises presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist.
- Example 169 The treatment method of example 158, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Example 168.
- Example 170 The treatment method of example 158, wherein the biomarker comprises presence of BRCA-1 and/or -2 mutations.
- Example 171 The treatment method of example 158, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as “mutational signature 3” (Sig3) in the COSMIC signature catalog, or the presence of genomic ‘scar’ signatures.
- SNVs genome-wide somatic single nucleotide variations
- Example 172 The treatment method of example 158, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH), which are regions of intermediate size; number of telomeric imbalances (telomeric allelic imbalance, or TAI); and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
- GIS genomic instability score
- LOH loss of heterozygosity
- LST large-scale state transitions
- Example 173 The treatment method of example 158, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at 5’-NpCpG-3’ contexts features of the sequencing data, or any combination thereof.
- Example 174 The treatment method of example 158, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1 and BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, BARD1, ATM, BRIP1, CHEK1/2, CDK12, ATR, ATRX, BAP1, ARID1A, FANC genes (FANCA/C/D2/E/F/G/I//L/M, FANCI), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B; as well as other less common HRR gene alterations in PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, BLM, WRN, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/
- Example 175. The treatment method of example 158, wherein the biomarker comprises presence of actionable genomic alterations in one or more of the following genes: ABL1, AKT1, APC, ALK, APC, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, KDR, MET, NTRK, NTRK1/2/3, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNA13, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM
- Example 176 The treatment method of example 163, wherein the patient has GIST; the treatment method comprising not administering c-Kit inhibitor imatinib.
- Example 177 The treatment method of example 163, wherein the patient has GIST or other solid tumor; the treatment method comprising not administering c-Kit inhibitors in addition to imatinib, including Axitinib, Dovitinib, Dasatinib, Motesanib diphosphate, Pazopanib, Sunitinib, Masitinib, Vatalanib, Cabozantinib, Tivozanib, Amuvatinib, Telatinib,
- Example 178 The treatment method of any of examples 164-166, further comprising administering a treatment for the cancer comprising the following drug classes: immune checkpoint inhibitors (ICIs) and other immunotherapies to said subject if said MSI, MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects, hypermutator mutational signatures (e.g., COSMIC14/15/21/26/6) and/or high TMB of said sample is detected.
- ICIs immune checkpoint inhibitors
- Example 179 The treatment method of any of examples 164-166, further comprising administering a treatment for the cancer comprising the following drug classes: PD-1 inhibitors (e.g., Pembrolizumab, Nivolumab, Cemiplimab, Pidilizumab, Dostarlimab, larotrectinib), PD-L1 inhibitors (e.g., Atezolizumab, Avelumab, Durvalumab), CTLA-4 inhibitors (e.g., Ipilimumab and tremelimumab), LAG-3 inhibitors (e.g., tebotelimab, eftilagimod alpha, Relatlimab), TIM-3 inhibitors (e.g., MBG453, Sym023, TSR-022), other immunomodulator therapies alone or in combination with other ICIs or other drugs to said subject if said MSI, MMR gene (e.g., POLE, MLH
- Example 180 The treatment method of any of examples 168-174, further comprising administering a treatment for the cancer comprising the following drug classes: platinum drugs, poly-ADP ribose polymerase (PARP) inhibitors, and/or newer agents such as ATR, Wee1 or CHK, Pol-theta or RAD52 inhibitors to said subject if said HRD or surrogate gene or signature wherein said cancer comprises breast cancer, ovarian cancer, pancreatic adenocarcinoma, prostate cancer, sarcoma, or any solid tumor or combination thereof.
- the weights collected from a final DeepHRD model trained to detect HRD in one tissue type or modality can be used to initiate the model weights for another tissue type or modality. All other training procedures can stay the same, thus, allowing the transfer of knowledge from training one tissue type or modality to another.
- the treatment method can utilize this approach to train an ovarian cancer model by utilizing the prior knowledge from the breast cancer model based on some embodiments of the disclosed technology. This AI algorithm will allow application of this deep learning AI technology for other genomic alterations and other cancer types.
- Example 181 The treatment method of any of examples 168-174, further comprising administering a treatment for the cancer comprising platinum drugs, including cisplatin, carboplatin, oxaliplatin, nedaplatin, lobaplatin, heptaplatin or satraplatin alone, or in combination with other drugs, e.g., FOLFOX, to said subject if said HRD or surrogate gene or signature thereof of said sample is detected.
- the cancer therapeutic causes inter-strand breaks of genomic molecules of the subject’s cells, leading to p53-initiated apoptosis.
- Example 182 The treatment method of any of examples 168-174, further comprising administering a treatment for the cancer comprising poly-ADP ribose polymerase (PARP) inhibitors, including the four main PARP inhibitors: olaparib (Lynparza), niraparib (Zejula), rucaparib (Rubraca), talazoparib (Talzenna) as well as other PARP inhibitors to said subject if said HRD or surrogate gene or signature thereof of said sample is detected.
- PARP poly-ADP ribose polymerase
- Example 183 The treatment method of example 158, wherein the treatment method comprises not administering immune checkpoint inhibitors (ICIs) and other immunotherapies to said subject if said 9p deletions of said sample are detected.
- Example 184 The treatment method of example 158, wherein the biomarker comprises presence of EGFR/ErbB1 mutations comprising one or more of L858R, exon 19del, and exon 20 alteration.
- Example 185 The treatment method of example 184, further comprising administering to the patient afatinib, dacomitinib, erlotinib, gefitinib, osimertinib [T790], or amivantamab.
- Example 186 The treatment method of example 158, wherein the biomarker comprises presence of HER2/ErbB2 amplification.
- Example 187 The treatment method of example 186, further comprising administering to the patient trastuzumab, ado-trastuzumab emtansine, lapatinib, margetuximab, neratinib, pertuzumab, tucatinib, deruxtecan, trastuzumab deruxtecan, or neratinib.
- Example 188 The treatment method of example 158, wherein the biomarker comprises presence of BRAF mutation.
- Example 189 The treatment method of example 188, further comprising administering to the patient encorafenib, vemurafenib, dabrafenib, trametinib, or cobimetinib.
- Example 190 The treatment method of example 158, wherein the biomarker comprises presence of FGFR1/2/3 fusions.
- Example 191 The treatment method of example 190, further comprising administering to the patient erdafitinib, futibatinib, infigratinib, pemigatinib, dovitinib, lenvatinib, pazopanib, ponatinib, or regorafenib.
- Example 192 The treatment method of example 158, wherein the biomarker comprises presence of PDGFRA exon 18 mutations.
- Example 193 The treatment method of example 192, further comprising administering to the patient avapritinib or dasatinib.
- Example 194 The treatment method of example 158, wherein the biomarker comprises presence of KIT mutations in GIST.
- Example 195 The treatment method of example 194, further comprising administering to the patient imatinib, Axitinib, Dovitinib, Dasatinib, Motesanib diphosphate, Pazopanib, Sunitinib, Masitinib, Vatalanib, Cabozantinib, Tivozanib, Amuvatinib, Telatinib, Pazopanib, Regorafenib, Ripretinib and Dovitinib, or sorafenib.
- Example 196 The treatment method of example 158, wherein the biomarker comprises presence of NRG1 fusion.
- Example 197 The treatment method of example 196, further comprising administering to the patient zenocutuzumab or seribantumab.
- Example 198 The treatment method of example 158, wherein the biomarker comprises presence of RET fusions.
- Example 199 The treatment method of example 198, further comprising administering to the patient pralsetinib, selpercatinib; crizotinib, ceritinib, cabozantinib, or vandetanib.
- Example 200 The treatment method of example 158, wherein the biomarker comprises presence of ROS1 fusions.
- Example 201 The treatment method of example 200, further comprising administering to the patient crizotinib or entrectinib.
- Example 202 The treatment method of example 158, wherein the biomarker comprises presence of NTRK1/2 or 3 fusions.
- Example 203 The treatment method of example 202, further comprising administering to the patient entrectinib, larotrectinib, or repotrectinib.
- Example 204 The treatment method of example 158, wherein the biomarker comprises presence of ALK fusions.
- Example 205 The treatment method of example 204, further comprising administering to the patient crizotinib, alectinib, brigatinib, ceritinib, or lorlatinib.
- Example 206 The treatment method of example 158, wherein the biomarker comprises presence of PIK3CA alterations.
- Example 207 The treatment method of example 206, further comprising administering to the patient alpelisib, temsirolimus, or everolimus.
- Example 208 The treatment method of example 158, wherein the biomarker comprises presence of mTOR or TSC1/2 mutations.
- Example 209 The treatment method of example 208, further comprising administering to the patient temsirolimus or everolimus.
- Example 210 The treatment method of example 158, wherein the biomarker comprises presence of Akt, or PTEN alterations.
- Example 211 The treatment method of example 210, further comprising administering to the patient capivasertib.
- Example 212 The treatment method of example 158, wherein the biomarker comprises presence of MET amplification or mutation.
- Example 213 The treatment method of example 212, further comprising administering to the patient crizotinib, tepotinib, capmatinib, telisotuzumab, tepotinib, or savolitinib.
- Example 214 The treatment method of example 158, wherein the biomarker comprises presence of MEK mutation.
- Example 215. The treatment method of example 214, further comprising administering to the patient trametinib, cobimetinib, or selumetinib.
- Example 216 The treatment method of example 158, wherein the biomarker comprises presence of NF1/2 alterations.
- Example 217 The treatment method of example 216, further comprising administering to the patient trametinib, temsirolimus, everolimus, or selumetinib.
- Example 218 The treatment method of example 158, wherein the biomarker comprises presence of STK11 alterations.
- Example 219. The treatment method of example 218, further comprising administering to the patient dasatinib, everolimus, temsirolimus, or bosutinib.
- Example 220 The treatment method of example 158, wherein the biomarker comprises presence of KDR alterations.
- Example 221 The treatment method of example 220, further comprising administering to the patient pazopanib, regorafenib, or vandetanib.
- Example 222 The treatment method of example 158, wherein the biomarker comprises presence of microsatellite stable (MS) with DNA polymerase-ε (POLE) mutation, CD274 amplification, or 9p24.1 amplicon.
- MS microsatellite stable
- POLE DNA polymerase-ε
- Example 223. The treatment method of example 222, further comprising administering ICIs to the patient.
- Example 224 The treatment method of example 158, wherein the biomarker comprises presence of MAP2K alterations.
- Example 225 The treatment method of example 224, further comprising administering to the patient trametinib.
- Example 226 The treatment method of example 158, wherein the biomarker comprises presence of alterations to CCND2, CDK4, or CDKN2A/B.
- Example 227 The treatment method of example 226, further comprising administering to the patient Palbociclib.
- Example 228 The treatment method of example 158, wherein the biomarker comprises presence of IDH1 mutation.
- Example 229 The treatment method of example 228, further comprising administering to the patient ivosidenib.
- Example 230 The treatment method of example 158, wherein the biomarker comprises presence of truncating or oncogenic mutations in B2M, PTEN, JAK1, JAK2, STK11 and EGFR, and/or 9p21 or 9p arm/genetic region loss.
- Example 231 The treatment method of example 230, further comprising not administering to the patient an immune checkpoint inhibitor.
- Example 232. The treatment method of example 158, wherein the biomarker comprises presence of mutations in the RAS genes KRAS and NRAS.
- Example 233 The treatment method of example 232, further comprising not administering to the patient epidermal growth factor receptor (EGFR) therapies, like cetuximab and panitumumab, in colorectal cancer, and EGFR tyrosine kinase inhibitors, like erlotinib, in lung cancer.
- EGFR epidermal growth factor receptor
- genomic sequencing encompasses any type of genomic profiling where DNA and RNA are subjected to a next-generation massively parallel sequencing protocol or genotyping through microarray hybridization.
- accuracy encompasses the mathematical terms: sensitivity, specificity, precision, negative predictive values, accuracy, and balanced accuracy, or any combination thereof.
- Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
- Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus.
- the computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
- data processing unit or “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
- the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program does not necessarily correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
- a processor will receive instructions and data from a read only memory or a random access memory or both.
- the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
- a computer need not have such devices.
- Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices.
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
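The weight transfer described in the examples above (using the weights of a final DeepHRD model trained on one tissue type or modality to initialize a model for another, with all other training procedures unchanged) can be pictured with the following minimal sketch. It is an illustration only: the layer sizes, the helper names `random_weights` and `init_from_pretrained`, and the single hand-written "update" are hypothetical stand-ins, not the disclosed training code.

```python
import copy
import random


def random_weights(layer_sizes, seed):
    """Hypothetical stand-in for a trained model: one weight matrix per layer."""
    rng = random.Random(seed)
    return {
        f"layer{i}": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                      for _ in range(n_out)]
        for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:]))
    }


def init_from_pretrained(source_weights):
    """Start a new model from a deep copy of another model's final weights."""
    return copy.deepcopy(source_weights)


# Assumed example: a "breast cancer" model whose weights seed an "ovarian" model.
breast_weights = random_weights([8, 4, 1], seed=1)
ovarian_weights = init_from_pretrained(breast_weights)

# Stand-in for one training step on the new tissue type; because the copy is
# deep, the source model's weights are left untouched.
ovarian_weights["layer0"][0][0] += 0.05
```

The deep copy is the only design choice that matters here: the new model starts from the prior model's knowledge, and further training on the new tissue type does not alter the source model.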
Abstract
Methods and systems that pertain to predicting cancer biomarkers are disclosed. In some embodiments of the disclosed technology, a method of determining the presence of a biomarker of a biological sample includes providing a section of a biological sample, wherein the section of the biological sample has been treated with a stain, imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution thereby generating a first and second plurality of image data, reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data, and determining the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided an input of the reduced first and second plurality of image data.
Description
ARTIFICIAL INTELLIGENCE ARCHITECTURE FOR PREDICTING CANCER BIOMARKERS
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This patent document claims priorities to and benefits of U.S. Provisional Patent Application No. 63/269,033, titled “ARTIFICIAL INTELLIGENCE ARCHITECTURE FOR PREDICTING CANCER BIOMARKERS” and filed on March 8, 2022, and U.S. Provisional Patent Application No. 63/483,237, titled “ARTIFICIAL INTELLIGENCE ARCHITECTURE FOR PREDICTING CANCER BIOMARKERS” and filed on February 3, 2023. The entire content of the aforementioned patent applications is incorporated by reference as part of the disclosure of this patent document.
TECHNICAL FIELD
[0002] The disclosed technology relates to systems and methods for detecting clinically actionable biomarkers.
BACKGROUND
[0003] Previous methods to detect clinically actionable biomarkers for personalized cancer treatment rely almost exclusively on genomic sequencing or genotyping platforms, such as microarrays, targeted panel sequencing, whole-exome sequencing, or whole-genome sequencing. Researchers are conducting a study to avoid traditional sequencing approaches.
SUMMARY
[0004] Disclosed are materials, systems, devices and methods for predicting cancer biomarkers using a deep learning architecture.
[0005] In some implementations of the disclosed technology, a method of determining the presence of a biomarker in a biological sample includes obtaining a section of a biological sample, wherein the section of the biological sample has been treated with a stain, imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution to generate a first and second plurality of image data, reducing a parameter space of
the first and second plurality of image data to produce a reduced first and second plurality of image data, and providing the first and the second plurality of image data to a trained predictive neural network and determining the presence of a biomarker in the biological sample as an output of the trained predictive neural network.
[0006] In some implementations of the disclosed technology, a method of generating a trained predictive model configured to determine a presence of a biomarker in a biological sample includes generating stained sections of one or more biological samples and corresponding biomarker labels, imaging one or more regions of the stained sections of the one or more biological samples at a first resolution and a second resolution to generate a first and second plurality of image data, reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, and generating a trained predictive model, wherein the trained predictive model comprises a first predictive model trained with the reduced first plurality of image data and corresponding biomarker labels, and a second predictive model trained with the reduced second plurality of image data and corresponding biomarker labels.
[0007] In some implementations of the disclosed technology, a computer system configured to determine the presence of a biomarker of a biological sample includes one or more processors, and a non-transitory computer readable storage medium including software stored thereon, wherein the software comprises executable instructions that, as a result of execution, cause the one or more processors of the computer system to image one or more regions of a stained section of a biological sample at a first resolution and at a second resolution to generate a first and a second plurality of image data, reduce a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, provide the first and the second plurality of image data to a trained predictive model, and determine the presence of a biomarker in the biological sample as an output of the trained predictive model.
[0008] In some implementations of the disclosed technology, a method of determining the presence of a biomarker in a biological sample includes obtaining a stained section of the biological sample, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, and providing the plurality of images of the stained section as an input to a trained predictive model and determining the presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the
trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing.
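One way to read the "preset accuracy" above is as agreement between the model's calls and sequencing-derived labels, measured with the terms this document lists under accuracy (sensitivity, specificity, precision, negative predictive value, accuracy, balanced accuracy). The sketch below is a hedged illustration under that assumption. The function names, the toy labels, and the choice of plain accuracy as the thresholded metric are hypothetical and are not specified by the source.

```python
def confusion_counts(predicted, reference):
    """Count TP, FP, TN, FN for binary labels (True = biomarker present)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    return tp, fp, tn, fn


def safe_ratio(numerator, denominator):
    """Return NaN instead of raising when a class is absent."""
    return numerator / denominator if denominator else float("nan")


def agreement_metrics(predicted, reference):
    """Compare model calls against sequencing-derived reference labels."""
    tp, fp, tn, fn = confusion_counts(predicted, reference)
    sensitivity = safe_ratio(tp, tp + fn)
    specificity = safe_ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": safe_ratio(tp, tp + fp),
        "negative_predictive_value": safe_ratio(tn, tn + fn),
        "accuracy": safe_ratio(tp + tn, tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def meets_preset(metrics, name="accuracy", threshold=0.80):
    """Check one chosen metric against the preset level (assumed 0.80)."""
    return metrics[name] >= threshold


# Toy data: model calls vs. labels obtained by genomic sequencing.
predicted = [True, True, False, False, True, False, False, True, False, False]
reference = [True, True, False, False, False, False, True, True, False, False]
metrics = agreement_metrics(predicted, reference)
```

With these toy labels there are 3 true positives, 1 false positive, 5 true negatives and 1 false negative, so `metrics["accuracy"]` is 0.8 and `meets_preset(metrics)` returns `True`.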
[0009] In some implementations of the disclosed technology, a treatment method for treating cancer in a subject in need thereof includes obtaining a stained section of a biological sample, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, providing the plurality of images of the stained section to a trained predictive model and determining a presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing, and administering treatment to the patient based on the presence of the biomarker.
[0010] The above and other aspects and implementations of the disclosed technology are described in more detail in the drawings, the description and the claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIGS. 1A-1B show an example of multi-resolution convolutional neural network architecture to detect molecular biomarkers from histopathological tissue slides based on some implementations of the disclosed technology.
[0012] FIGS. 2A-2D show a neural network for detecting homologous recombination deficiency and predicting response to treatment in primary and metastatic breast cancer.
[0013] FIGS. 3A-3C show a transfer learning in ovarian cancer for predicting response to platinum treatment.
[0014] FIGS. 4A-4C show workflow for training neural network models independently for digitalized flash frozen and FFPE breast cancer slides.
[0015] FIGS. 5A-5B show workflow for testing the performance of neural network (e.g., DeepHRD) models for digitalized flash frozen and FFPE breast cancer slides.
[0016] FIG. 6 shows an example method of determining the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
[0017] FIG. 7 shows an example method of generating a trained predictive model configured to determine a presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
[0018] FIG. 8 shows another example method of determining the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
[0019] FIG. 9 shows a treatment method for treating cancer in a subject in need thereof based on some implementations of the disclosed technology.
[0020] FIG. 10 shows an example of a computer system configured to determine the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
DETAILED DESCRIPTION
[0021] The disclosed technology can be implemented in some embodiments to provide an artificial intelligence (AI) architecture platform for predicting cancer biomarkers, and provide therapeutic methods based on the biomarkers identified by the AI architecture platform.
[0022] The disclosed technology can be implemented in some embodiments to provide methods and systems for detecting clinically actionable cancer biomarkers and mutational signatures directly from digital hematoxylin and eosin (H&E) slides without sequencing.
[0023] The disclosed technology can also be implemented in some embodiments to provide a novel deep learning architecture that, with little to no customization, can be trained to predict clinically actionable molecular cancer biomarkers directly from digital images based on scans of slides stained using hematoxylin and eosin (H&E). The invention allows skipping DNA sequencing and provides direct prediction of these biomarkers from the scanned slides.
[0024] Previous methods to detect clinically actionable biomarkers for personalized cancer treatment rely almost exclusively on genomic sequencing or genotyping platforms (e.g., microarrays, targeted panel sequencing, whole-exome sequencing, or whole-genome sequencing). The developed convolutional neural network architecture completely avoids traditional sequencing approaches by directly predicting clinically actionable biomarkers using digital images from hematoxylin and eosin-stained (H&E) histology images sampled from individual patients. From a machine learning architecture implementation perspective, the approach utilizes semi-supervised convolutional neural networks to make segmented predictions within a grid space across whole-slide H&E images composed of three color channels. The machine learning method aggregates these predictions to locate regions of interest and makes a final actionable status prediction for each patient. Further, this model introduces a multi-resolution approach that captures morphological patterns at varying zoom magnifications and implements
Monte Carlo dropout to provide confidence metrics that refine the final predictive value. Regions of interest are selected using an unsupervised machine learning module comprising dimensionality reduction using principal component analysis and a custom k-means clustering algorithm on the extracted feature vectors for each component of the grid space. From an application perspective, the method requires training before it can be applied to a particular clinical biomarker. Specifically, the approach requires at least a thousand patients with digital H&E slides and known molecular biomarkers to generate a cancer-specific/biomarker-specific prediction model. After the model is trained, it can be applied to an individual patient, making a prediction of whether the cancer of the individual patient has the biomarker.
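The Monte Carlo dropout step mentioned above can be illustrated with a small, self-contained sketch: dropout stays active at prediction time, the same input is scored many times, and the spread of the scores serves as a confidence metric. This is a toy under stated assumptions, not the disclosed network. A single logistic unit stands in for the CNN, and the feature values, weights, dropout probability and number of passes are all hypothetical.

```python
import math
import random
import statistics


def dropout_forward(features, weights, bias, drop_prob, rng):
    """One stochastic pass: randomly drop inputs, rescale the kept ones
    (inverted dropout), then apply a logistic output."""
    keep = 1.0 - drop_prob
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            z += (x / keep) * w
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predict(features, weights, bias, drop_prob=0.2, passes=100, seed=0):
    """Repeat the stochastic pass; return the mean score and its standard
    deviation (a simple uncertainty estimate). Requires passes >= 2."""
    rng = random.Random(seed)
    scores = [dropout_forward(features, weights, bias, drop_prob, rng)
              for _ in range(passes)]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical tile-level feature vector and model parameters.
features = [0.5, -1.2, 0.8, 2.0]
weights = [0.4, 0.1, -0.3, 0.6]
bias = -0.2
mean_score, spread = mc_dropout_predict(features, weights, bias)
```

A small `spread` suggests the prediction is stable under dropout, while a large one flags a tile or slide whose score should be treated with more caution. How the disclosed model uses the confidence metric to refine the final value is not detailed in this passage.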
[0025] Hematoxylin and eosin (H&E) stain is one of the principal tissue stains used in histology. H&E slides are routinely and universally generated by pathologists for cancer diagnosis. However, in most cases, these slides do not allow pathologists to detect clinically actionable molecular biomarkers and do not provide guidance for personalized therapy. As such, in the majority of cases, cancer samples are subsequently sent for DNA and/or RNA sequencing for detecting individual and/or sets of biomarkers. The novel deep learning architecture implemented based on some embodiments of the disclosed technology allows training an AI model that can directly predict biomarkers which are incorporated in therapeutic regimens, thereby improving patient response to therapy and survival after patients are treated with therapy targeting the detected biomarker. The methods provided herein allow for identification of biomarkers directly from the digital H&E slides (see FIGS. 4A-4C), thus skipping the need for shipping and sequencing of bio-specimens by external providers. For example, instead of sending a biospecimen over the mail to an external CLIA lab and waiting 14 days for results from sequencing, the AI approach allows directly detecting these biomarkers in the digital slides within a fraction of a second.
[0026] The disclosed technology can also be implemented in some embodiments to provide a novel deep learning architecture that, with little to no customization, can be trained to predict clinically actionable and/or epidemiologically relevant molecular signatures directly from digital images based on scans of slides stained using hematoxylin and eosin (H&E). The invention allows skipping DNA sequencing and provides direct prediction of these signatures from the digital images of the scanned slides. Previous methods to detect clinically actionable biomarkers for personalized cancer treatment or epidemiologically relevant biomarkers for large-scale
genetic epidemiological studies relied almost exclusively on DNA sequencing or genotyping platforms (e.g., microarrays, targeted panel sequencing, whole-exome sequencing, and/or whole-genome sequencing). The developed convolutional neural network architecture completely avoids traditional sequencing approaches by accurately evaluating the presence or absence of molecular signatures using digital images from hematoxylin and eosin-stained histology images sampled from individual patients. The developed architecture requires at least 1,000 digital images of whole slides for training a model for a specific molecular signature in a particular cancer type. Nevertheless, after a model is trained, accurate predictions can be made for a digital image from a single cancer patient.
[0027] From a machine learning and implementation perspective, the developed architecture utilizes semi-supervised convolutional neural networks to make segmented predictions within a grid space across whole-slide H&E images composed of three color channels. The overall architecture is representative of a multi-resolution model that captures morphological patterns at two zoom magnifications, with each magnification reflecting a separate convolutional neural network (CNN). Generally, the model performs initial predictions at a lower resolution (i.e., generally at 5x magnification) and automatically localizes regions of interest by identifying those with the highest predictive power. Subsequently, the model performs a secondary prediction at a higher resolution scale across the selected regions of interest (i.e., usually at 20x magnification). The resulting predictions are used in a final module that aggregates the scores across the multi-resolution model to provide a final prediction for a given molecular signature in a specific cancer type. The little customization that the architecture requires generally relates to adjusting one of the two zoom levels (e.g., using 25x magnification to generate a better model for a particular molecular signature).
[0028] The developed convolutional neural network uses a convolutional neural network architecture (e.g., ResNet) as its foundation with several important and significant modifications. In particular, the newly developed convolutional neural network is trained using a binary cross entropy loss function based on the most predictive tile derived from each sample at a given magnification level. For example, a digital image of a whole slide is tiled at 5x magnification and all sub-tiles are evaluated through an inference stage; the sub-tile(s) with the highest predicted probability for each whole slide are used in a single training pass of the model. This process is repeated for each epoch throughout training of the model. Initially, this method aggregates the
segmented predictions at a lower magnification to locate regions of interest. The automatic selection of the regions of interest is performed using an unsupervised machine learning module. The unsupervised machine learning module encompasses a dimension reduction algorithm based on principal component analysis of the extracted feature vectors for each component of the grid space, where the feature vectors are collected from the penultimate layer of the trained CNN and are subsequently reduced to the two principal components contributing the greatest variance across the collection of vectors. A custom k-means clustering module determines the optimal number of clusters per sample by selecting the solution with the maximum silhouette coefficient across all utilized iterations. The final regions of interest are chosen using the cluster encompassing the tile with the highest predictive value and including all other instances with silhouette coefficients within the top 50th percentile of the selected cluster. The complete set of selected tiles is subsequently used for training the second CNN model, where the CNN has an enhanced resolution (usually of 20x). The second enhanced-resolution CNN is based on an architecture identical to that of the first CNN, and it is trained solely on the regions of interest chosen in the first stage of the model. Each region of interest is resampled from the original whole-slide image at an increased magnification to capture more details at the cellular level. The proposed models were generally trained and tested after resampling at 20x magnification; however, the model can be used with any zoom preference. During inference, the tile with the highest predictive value is used to make the final prediction for a particular molecular signature across all regions of interest in a given sample.
This enhanced predictive score is averaged with the predicted score from the first model to arrive at a final actionable status prediction for each patient.
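The region-of-interest selection described above can be illustrated with a minimal sketch. It assumes the tile feature vectors have already been reduced to two principal components and that k-means labelings for several candidate values of k are already available; the function names (`silhouette_values`, `best_k`, `select_roi`) and the use of the median as the "top 50th percentile" cutoff are illustrative assumptions, not the implementation of the disclosed architecture.

```python
import math
from statistics import mean, median


def silhouette_values(points, labels):
    """Return the silhouette coefficient s = (b - a) / max(a, b) for each point."""
    values = []
    for i, p in enumerate(points):
        # Distances from point i to the members of every cluster (itself excluded).
        per_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                per_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        means = {c: sum(d) / len(d) for c, d in per_cluster.items()}
        a = means.get(labels[i])  # cohesion: mean distance within own cluster
        others = [m for c, m in means.items() if c != labels[i]]
        if a is None or not others or max(a, min(others)) == 0:
            values.append(0.0)  # singleton cluster, single cluster, or duplicates
        else:
            b = min(others)  # separation: nearest other cluster
            values.append((b - a) / max(a, b))
    return values


def best_k(candidate_labelings, points):
    """Pick the k whose labeling has the highest mean silhouette coefficient."""
    return max(
        candidate_labelings,
        key=lambda k: mean(silhouette_values(points, candidate_labelings[k])),
    )


def select_roi(points, labels, tile_scores):
    """Keep the cluster holding the top-scoring tile, filtered at its median silhouette."""
    sil = silhouette_values(points, labels)
    top_tile = max(range(len(points)), key=lambda i: tile_scores[i])
    members = [i for i, c in enumerate(labels) if c == labels[top_tile]]
    cutoff = median(sil[i] for i in members)  # assumed reading of "top 50th percentile"
    return sorted({top_tile} | {i for i in members if sil[i] >= cutoff})
```

In this sketch the returned indices identify the tiles that would be resampled at the higher magnification; the top-scoring tile is always retained even if its own silhouette coefficient falls below the cutoff.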
[0029] Within both CNN models, random dropout of nodes within the fully connected layers has been incorporated to improve the robustness of the approach. The dropouts are utilized both during model training, to overcome issues with overfitting, and when applying the model for inference, where they serve as a method to quantify a level of confidence for each patient-level prediction. Specifically, inference is run across a single whole-slide image over many iterations (at least 100 iterations but usually fewer than 1,000 iterations) with a new set of nodes randomly dropped with each pass of the model. This method acts as a Bayesian approximation of the underlying distribution of potential models for the developed architecture. Each pass of the model presents a single prediction score, and the resulting distribution of scores across all iterations for a single
patient can be analyzed to determine the level of certainty with the final prediction. For example, a confident prediction is one with a low variance from the average predicted score, while an uncertain prediction will tend to have a high variance from the average predicted score. When applied to a single image of a whole slide for an individual patient, the developed architecture will provide a normalized prediction score between 1 (low) and 100 (high) and a confidence interval for the score.
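A minimal sketch of how repeated dropout passes can be summarized into a mean score, a spread, and an interval is given below. The toy model (a single linear layer with input dropout and a sigmoid), the percentile-based interval, and all names (`make_toy_model`, `mc_dropout_summary`) are assumptions made for illustration; they stand in for the trained CNN and are not taken from the disclosure.

```python
import math
import random
from statistics import mean, pstdev


def make_toy_model(weights, features, p_drop=0.5):
    """Toy stand-in for a network head: linear layer + sigmoid with input dropout."""
    def predict_once(rng):
        total = 0.0
        for w, x in zip(weights, features):
            if rng.random() >= p_drop:           # this unit survives the pass
                total += w * x / (1.0 - p_drop)  # inverted-dropout rescaling
        return 1.0 / (1.0 + math.exp(-total))
    return predict_once


def mc_dropout_summary(predict_once, n_passes=100, alpha=0.05, seed=0):
    """Run n_passes stochastic passes and summarize the score distribution."""
    rng = random.Random(seed)
    scores = sorted(predict_once(rng) for _ in range(n_passes))
    lo = scores[int(alpha / 2 * (n_passes - 1))]
    hi = scores[int(round((1 - alpha / 2) * (n_passes - 1)))]
    return {"mean": mean(scores), "std": pstdev(scores), "interval": (lo, hi)}


model = make_toy_model(weights=[1.5, -0.7, 0.9], features=[1.0, 2.0, 0.5])
summary = mc_dropout_summary(model, n_passes=200)
```

A narrow interval in `summary` corresponds to the low-variance, confident case described above, while a wide interval corresponds to an uncertain prediction.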
[0030] FIGS. 1A-1B show an example of multi-resolution convolutional neural network architecture to detect molecular biomarkers from histopathological tissue slides based on some implementations of the disclosed technology. As shown in FIGS. 1A-1B, the multi-resolution convolutional neural network architecture implemented based on some embodiments can detect homologous recombination deficiency from histopathological tissue slides. FIG. 1A shows training of a neural network (e.g., DeepHRD) model for detecting homologous recombination deficiency (HRD) from whole-slide images (WSIs). For each WSI, a single prediction score is estimated based on the detection of HRD. Specifically, at 101, each WSI undergoes preprocessing and quality control. This module consists of tissue segmentation, filtering for non-focused tissue, and final tiling of regions that contain tissue at 5x magnification. At 102, all tiles for a single image are processed through the first multiple instance learning (MIL) ResNet18 convolutional neural network. This architecture uses the average of the top 25 predicted tile scores as the WSI predicted score. Dropout is incorporated into the fully connected layers in the feature extraction module to reduce overfitting during training. The same dropout technique is also incorporated during inference to simulate Monte Carlo dropout used to calculate confidence intervals in the final WSI prediction. At 103, the tile feature vectors from the penultimate layer of the feature extraction are used to automatically select regions of interest (ROI) from the original WSI for additional assessment. The feature vectors are reduced in dimensions using principal component analysis and a custom k-means clustering module is used to determine the optimal number of clusters per sample. At 104, the selected tiles are then resampled at a 20x magnification.
At 105, these sets of tiles are used to train a second MIL-ResNet18 model using an architecture identical to the one previously used in 102. At 106, the average predictions across both models are aggregated for a single WSI. The resulting distribution of scores is used to calculate confidence intervals and establish a threshold of confidence for a final prediction. FIG. 1B shows a trained neural network (e.g., DeepHRD) model for HRD prediction from a
single whole-slide image. The trained neural network (e.g., DeepHRD) model produces a final prediction score for individual patient biopsies, with a computational-based diagnosis for subsequent clinical action.
[0031] FIGS. 2A-2D show a neural network (e.g., DeepHRD) for detecting homologous recombination deficiency and predicting response to treatment in primary and metastatic breast cancer. FIG. 2A shows the receiver operating characteristic curves (ROCs) for classifying homologous recombination deficiency (HRD) in the TCGA held-out set (202) and the independent set (204) of primary breast cancers, encompassing the independent CPTAC and METABRIC primary breast cancer cohorts. FIG. 2B shows representative TCGA tissue slides for both HRD and homologous recombination proficient (HRP) samples across multiple breast cancer subtypes, along with the resulting predictions for each segmented tile at 5x and 20x resolutions. FIG. 2C shows ROCs for the formalin-fixed paraffin-embedded (FFPE) diagnostic model in the TCGA held-out set (212) and for classifying metastatic breast cancer (MBC) patients who are complete responders to platinum therapy. FIG. 2D shows Kaplan-Meier survival curves for MBC patients treated with platinum chemotherapy separated by DeepHRD model predictions (220), BRCA1/2 mutation status (230), and SBS3 activity as predicted by SigMA (240). Q-values are corrected after considering breast cancer subtype, age at diagnosis, and the standard-of-care binary HRD classification score >42 (i.e., HRD score). Cox regression results showing the log10-transformed hazard ratios are presented with their 95% confidence intervals (bottom of 220, 230, 240). Q-values less than or equal to 0.05 are annotated with * while q-values above 0.05 are annotated with n.s. (i.e., non-significant).
[0032] FIGS. 3A-3C show transfer learning (e.g., DeepHRD transfer learning) in ovarian cancer for predicting response to platinum treatment. FIG. 3A shows a schematic demonstrating the transfer learning method to train an ovarian homologous recombination deficiency (HRD) model from whole-slide H&E images (WSIs) using a pretrained breast DeepHRD model. The pretrained flash-frozen breast model is used to initialize the weights and biases of all parameters in the ovarian model. HRD scores are calculated from SNP6 genotyping microarrays by deriving loss of heterozygosity (LOH), large-scale transitions (LST), and telomeric allelic imbalance (TAI). FIG. 3B shows Kaplan-Meier survival curves comparing the outcomes of patients treated with platinum chemotherapy split by the prediction of the DeepHRD transfer learning model. FIG. 3C shows Kaplan-Meier survival curves comparing the outcomes of platinum-treated
patients split by the base model predictions with no transfer learning applied (310), BRCA1/2 mutation status (320), and SBS3 activity as predicted by SigMA (330). Q-values are corrected after considering ovarian cancer stage, age at diagnosis, and the standard-of-care binary HRD classification score >63 (i.e., HRD score). Cox regression results showing the log10-transformed hazard ratios are presented with their 95% confidence intervals (bottom of 310, 320, 330). Q-values less than or equal to 0.05 are annotated with * while q-values above 0.05 are annotated with n.s. (i.e., non-significant).
[0033] FIGS. 4A-4C show a workflow for training neural network (e.g., DeepHRD) models independently for digitalized flash frozen and FFPE breast cancer slides. FIG. 4A shows that, prior to training, the number of HRD and HRP samples within each breast cancer subtype was balanced using all available PAM50 annotations. FIGS. 4B and 4C show that the collections of flash frozen (FIG. 4B) and formalin-fixed paraffin-embedded (FFPE) (FIG. 4C) slides for the TCGA breast cancer cohort were used to train two independent DeepHRD models. Prior to training, the number of HRD and HRP samples was balanced within each breast cancer subtype. All downsampled individuals were added to the internal held-out test set. The validation sets were used to optimize the classification thresholds.
[0034] FIGS. 5A-5B show a workflow for testing the performance of neural network (e.g., DeepHRD) models for digitalized flash frozen and FFPE breast cancer slides. FIG. 5A shows that the collection of breast cancers from CPTAC and METABRIC was used to independently validate the flash frozen breast cancer model. The DeepHRD prediction scores were averaged for samples with multiple images. FIG. 5B shows that an independent collection of metastatic breast cancers treated with platinum chemotherapy was used to validate the formalin-fixed paraffin-embedded (FFPE) breast cancer model based upon individual patient response to therapy.
[0035] The disclosed technology can be implemented in some embodiments to provide a deep learning artificial intelligence architecture that predicts genomic homologous recombination deficiency and platinum response from routine histology slides in breast and ovarian cancers.
[0036] Cancers harboring deficiencies in homologous recombination repair (HRD) can benefit from platinum-based chemotherapies and PARP inhibitors. Current standard diagnostic tests for detecting HRD in breast and ovarian cancers require genotyping-based or sequencing-based assays, which are not universally available.
[0037] The disclosed technology can be implemented in some embodiments to provide a novel multi-resolution deep learning approach that allows training robust models for detecting genomic biomarkers directly from digitalized images of hematoxylin and eosin (H&E)-stained light microscopy histopathological slides. In some implementations, a model for predicting genomically derived HRD scores can be trained using a number of primary breast cancers (e.g., 1,008 primary breast cancers from the Cancer Genome Atlas (TCGA) project). In addition to a set of held-out TCGA samples, the trained breast cancer model was externally validated on 535 primary breast cancers from two independent research cohorts and on 77 platinum-treated metastatic breast cancers. Applicability to 589 TCGA ovarian tumors was also demonstrated by training and validating a model using transfer learning for predicting platinum response.
[0038] Across the TCGA breast cancer held-out validation cohort, the deep learning model trained on primary breast cancers and implemented based on some embodiments of the disclosed technology can predict genomically derived HRD scores from digital H&E slides with an AUC of 0.81 ([0.77-0.85] 95% Confidence Interval (CI)). This performance was confirmed in two independent primary breast cancer cohorts (AUC=0.76; [0.71-0.82] 95% CI). In an external clinical cohort of platinum-treated metastatic breast cancers, samples predicted by the deep learning model as HRD had a higher complete response (AUC=0.76; [0.54-0.93] 95% CI) and a 3.7-fold longer mean progression-free survival (hazard ratio=0.47; q=0.0087). Notably, the deep learning approach based on some embodiments of the disclosed technology can identify platinum-sensitive BRCA1/2 wild-type tumors as HRD-positive. By applying transfer learning, the approach based on some embodiments of the disclosed technology may also predict overall survival after first-line platinum treatment in advanced-stage, high-grade serous-type ovarian cancer (hazard ratio=0.45; q=0.024). The deep learning model implemented based on some embodiments of the disclosed technology can outperform multiple existing genomic HRD biomarkers within each cohort.
[0039] A deep learning model applied to digitalized H&E histopathological slides from breast and ovarian cancers detected genomically derived HRD and predicted direct clinical benefit to standard-of-care platinum-based therapies. The approach outperformed existing genetic biomarkers across multiple cohorts, slide scanners, and tissue-fixation procedures. These results have important implications for equitable and efficient clinical management of cancer patients sensitive to targeted DNA-damage-response therapies.
[0040] Precision oncology aims to personalize cancer therapy by first identifying and, subsequently, targeting molecular defects in tumors within each individual. Many cancers harbor failures of specific DNA repair pathways, and utilizing synthetic lethal relationships amongst peripheral pathways has proven to be an effective treatment approach. Specifically, exploiting treatments that increase DNA damage and/or inhibit additional DNA repair pathways in cells with a pre-existing DNA repair defect can lead to selective cancer cell death. Previous mechanistic studies and clinical trials have shown that breast and ovarian cancers lacking the ability to repair DNA double-strand breaks through homologous recombination are highly sensitive to DNA-damage-response targeted therapies like platinum treatment and poly (ADP-ribose) polymerase (PARP) inhibitors.
[0041] Historically, homologous recombination deficiency (HRD) has been associated with germline mutations in specific genes leading to an increased risk for developing breast and ovarian cancers, with the most notable susceptibility genes being BRCA1 and BRCA2. In addition to germline variants, somatic mutations and epigenetic dysregulation in breast and ovarian cancers have been shown to lead to HRD. Importantly, cancers deficient in homologous recombination exhibit genomic instability with characteristic patterns of somatic mutations and gene expression. Some of these patterns have also been leveraged for detecting HRD in the absence of canonical germline or somatic defects within HRD-associated genes. For instance, the pattern of single-base substitution signature 3 (SBS3), part of the Catalogue of Somatic Mutations in Cancer (COSMIC) catalog of mutational signatures, has been attributed to HRD independently of the molecular mechanisms disabling DNA repair through homologous recombination. Importantly, SBS3 has been previously utilized as a clinical biomarker for detecting HRD in breast and ovarian cancers.
[0042] In the United States, two commercial HRD companion diagnostic (CDx) tests have been approved by the U.S. Food and Drug Administration (FDA) for patients with ovarian and metastatic breast cancers. Myriad myChoice® CDx and FoundationOne® CDx both determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status. Additionally, multiple research and CLIA-certified diagnostic tests for detecting HRD have been developed by examining germline variants, somatic mutations, mutational patterns, changes in gene expression, and/or epigenetic modifications. While detection of homologous recombination deficiency can be performed using a multitude of different methods, all of these
approaches intrinsically rely on assays profiling DNA and/or RNA, leading to bottlenecks largely attributed to availability of molecular testing, time to decision making, and overall cost. In turn, this has precluded the widespread utilization of companion and complementary diagnostic biomarkers in standard therapy and CLIA-certified research testing for clinical trials. For example, the cost of a CLIA-certified companion or complementary HRD test is several thousand dollars, making such tests unaffordable for many patients in the United States and most countries around the world. Furthermore, results from sequencing-based diagnostics can take 3 to 6 weeks, thus severely delaying clinical management of many lethal and rapidly progressing solid tumors. Lastly, recent reports have demonstrated that only a small percentage of patients around the world have access to sequencing-based diagnostic tests, including FDA-approved companion diagnostic tests, with even lower testing rates in various underserved populations. This large 'gap' in cancer genomic testing presents a critical issue in delivering equitable and efficient clinical management for all cancer patients worldwide, creating a need for low-cost scalable biomarkers for clinical oncology.
[0043] While the access and uptake of sequencing-based diagnostics is limited, tissue biopsies are routinely sampled and processed with hematoxylin and eosin staining (H&E) for solid-tumor diagnostics for most patients throughout the world. In combination with recent advances in computer vision and computational pathology for detecting recurring patterns in complex, data-rich whole-slide H&E images, deep learning artificial intelligence (AI)-based models allow for both prognostic and diagnostic predictions using only histopathological tissue slides. Here we introduce DeepHRD, a weakly-supervised multi-resolution convolutional neural network architecture for detecting HRD directly from digitalized H&E tissue slides. We train and validate DeepHRD models on data from The Cancer Genome Atlas (TCGA) project and demonstrate their ability to detect HRD using data from two external research consortia. Importantly, using independent clinical samples, we demonstrate that DeepHRD outperforms existing genomic biomarkers in predicting patient response to platinum-based therapies. By circumventing current bottlenecks in genomic testing, the method based on some embodiments of the disclosed technology has direct implications for addressing global socioeconomic disparities in the diagnosis and treatment of breast and ovarian cancers.
[0044] To train a downstream model capable of predicting HRD status directly from digitalized tissue slides, we implemented DeepHRD — a weakly supervised convolutional neural network
architecture based upon the fundamental assumptions of multiple-instance learning (MIL; FIG. 1). We trained and internally validated our models using digitalized H&E images from the TCGA breast cancer cohort comprising 1,008 samples with flash frozen slides and 1,055 samples with formalin-fixed paraffin-embedded (FFPE) slides (FIGS. 4A-4C). We further trained a separate model using the TCGA ovarian cancer cohort comprising 589 samples with flash frozen slides. All included samples had accompanying whole-exome sequencing data as well as microarray genotyping data for calculating a genomic HRD score (FIGS. 4A-4C). The DeepHRD method was trained separately to predict HRD status from FFPE and flash frozen tissues of breast cancers as well as from flash frozen tissue of ovarian cancers, resulting in a total of three independently trained DeepHRD models.
[0045] The flash frozen breast cancer model was externally validated using primary breast cancers from the: (i) Clinical Proteomic Tumor Analysis Consortium (CPTAC) comprised of 116 samples with associated whole-exome sequencing; and (ii) Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) comprised of 419 samples with associated microarray genotyping data (FIG. 5A). The FFPE breast cancer model was externally validated in an independent clinical cohort of metastatic breast cancers comprised of 77 patients treated with platinum-based chemotherapy with available metastatic biopsies and associated genomic and clinical annotations. Clinical response to platinum-based therapy and progression-free survival (PFS) were assessed using the Response Evaluation Criteria in Solid Tumors, version 1.1 (RECIST 1.1; FIG. 5B). The collection of fixed tissue slides from the TCGA breast cancers, TCGA ovarian cancers, CPTAC cohort, and METABRIC cohort were digitalized using the Aperio ScanScope system. The clinical cohort of metastatic breast cancers was digitalized using the Hamamatsu Photonics Nanozoomer system.
[0046] Definition of the Genomic HRD Score and Associated Genetic Markers
[0047] HRD scores were calculated from sequencing or genotyping data as previously reported using scarHRD. Briefly, scarHRD considers the combined aggregated score of the telomeric allelic imbalance score, loss of heterozygosity score, and large-scale transitions score calculated for each patient using ASCAT-derived copy number calls from SNP6 genotyping microarrays. Traditionally, an HRD score greater than 42 has been used to determine eligibility for treatment with platinum-based chemotherapy or PARP inhibitors in triple negative breast cancer, while an HRD score greater than 63 has been utilized for ovarian cancer. For our ground truth, we
incorporated soft labelling during training to prevent the model from becoming overconfident with a single image by centering the HRD score cutoff at the median score across all breast cancer samples (i.e., HRD score cutoff of 30). All breast cancer samples with HRD scores above 50 were considered deficient, while all samples with HRD scores below 10 were considered proficient. The remaining breast cancers with HRD scores between 10 and 50 were modelled using soft labeling centered at 30 where there is an equal symmetric probability of a sample being deficient or proficient. For ovarian cancers, all ovarian samples with HRD scores above 73 were considered deficient, while all samples with HRD scores below 53 were considered proficient. Analogous to breast cancer, the remaining ovarian cancers with HRD scores between 53 and 73 were modelled using soft labeling centered at 63 where there is an equal symmetric probability of a sample being deficient or proficient.
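The soft-labeling scheme above can be sketched as follows. The text specifies only the hard cutoffs and the symmetric center of each soft-label window, so the linear ramp between the two cutoffs is an assumption made here for illustration, as are the function names `soft_label` and `soft_bce`.

```python
import math


def soft_label(hrd_score, low, high):
    """Map a genomic HRD score to a training target in [0, 1].

    Scores at or below `low` are proficient (0.0), scores at or above `high`
    are deficient (1.0), and scores in between follow an ASSUMED linear ramp
    that equals 0.5 at the center of the window.
    """
    if hrd_score <= low:
        return 0.0
    if hrd_score >= high:
        return 1.0
    return (hrd_score - low) / (high - low)


def soft_bce(predicted, target, eps=1e-12):
    """Binary cross entropy that accepts a soft (non-binary) target."""
    p = min(max(predicted, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


# Breast cancer window from the text: 10 to 50, centered at 30.
breast_target = soft_label(30, low=10, high=50)
# Ovarian cancer window from the text: 53 to 73, centered at 63.
ovarian_target = soft_label(63, low=53, high=73)
```

With a target of 0.5 the loss is minimized by a prediction of 0.5, which is one way a model can be discouraged from becoming overconfident on borderline samples.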
[0048] The pathogenicity of mutations found within BRCA1 and BRCA2 was determined using InterVar as previously described for the TCGA ovarian cohort. All variants predicted as pathogenic were also considered deleterious. The mutational status of BRCA1 and BRCA2 within the metastatic breast cancer cohort was determined by screening variants across existing database annotations that included Clinvar, Swissprot, Leiden Open Variation Database (LOVD), and the Universal Mutations Database (UMD) as previously reported.
[0049] The activity of COSMIC mutational signature SBS3 has been shown to be enriched in HRD samples. The presence of SBS3 reflects many potential deficiencies that may arise across the HR machinery independent of the underlying mechanism of inactivation, and it has more recently been used for detecting HRD from clinical sequencing data. To determine the presence or absence of SBS3 activity within the TCGA ovarian cohort and the metastatic breast cancer cohort, we used the high confidence predictions produced by the machine learning tool SigMA. SigMA classifies samples as either SBS3 positive or SBS3 negative with an estimated false positive rate (FPR) of 1% and a sensitivity of 50%.
[0050] The deep learning architecture implemented based on some embodiments of the disclosed technology can be built upon the concept of the weakly supervised MIL assumptions (FIG. 1A). Specifically, all partitioned regions of a slide, or tiles, within a whole-slide H&E image (WSI) are assigned a weak label based upon the slide-level classification for each sample. It is assumed that all tiles within a negatively labeled slide are homologous recombination proficient (HRP), whereas at least a single tile must exhibit an HRD phenotype within a positively labeled slide.
These assumptions allow the model to be trained using only a single classification label for an entire image without the need for detailed manual annotations from a pathologist, which currently do not exist for characterizing HRD.
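As an illustration of how these MIL assumptions translate into a training step, the sketch below picks the most predictive tile(s) of each slide and pairs them with the slide-level label, so that no tile-level annotation is needed. The data layout (a mapping from a slide identifier to its tile probabilities and label) and the function names are hypothetical.

```python
def top_k_tiles(tile_probs, k=1):
    """Indices of the k tiles with the highest predicted probability."""
    order = sorted(range(len(tile_probs)), key=lambda i: tile_probs[i], reverse=True)
    return order[:k]


def build_training_batch(slides, k=1):
    """Pair each slide's top-k tiles with the slide-level (weak) label.

    `slides` maps a slide id to (tile_probs, slide_label). Under the MIL
    assumption, a positive slide needs only one positive tile, so only the
    most predictive tiles inherit the slide label for the training pass.
    """
    batch = []
    for slide_id, (tile_probs, slide_label) in slides.items():
        for tile_index in top_k_tiles(tile_probs, k):
            batch.append((slide_id, tile_index, slide_label))
    return batch


slides = {
    "slide_hrd": ([0.10, 0.85, 0.40], 1),  # positive slide: tile 1 drives the label
    "slide_hrp": ([0.05, 0.20, 0.15], 0),  # negative slide: every tile is negative
}
batch = build_training_batch(slides, k=1)
```

In a full training loop this selection would be recomputed after an inference pass at every epoch, since the most predictive tile can change as the model is updated.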
[0051] The model is based on a multi-resolution decision, which performs an initial prediction on a low magnification (i.e., 5x magnification) and then automatically selects regions of interest (ROI) to perform a secondary prediction on an enhanced magnification within the selected ROIs (i.e., 20x magnification; FIG. 1A). DeepHRD's multi-resolution architecture framework was designed to mimic the standard diagnostic protocol used by pathologists to examine H&E images by first selecting ROIs at a low magnification across the entire tissue slide and then refining the specific tumor characteristics and subtypes captured at a higher resolution. Further, DeepHRD maps individual tile predictions back to the original WSI, which allows visualizing the relative contribution and importance of specific tissue regions to the predictions of the model without using any pixel-level annotations (FIG. 1B). The final model encompasses an ensemble of five identical architectures, with each producing multi-resolution prediction scores. The average of these scores was used to make a final prediction for each tissue slide. Due to the computational cost associated with processing an entire WSI, each slide was first segmented into smaller tiles for each utilized resolution. For the first stage of the model, each slide was tiled at a 5x magnification with 256x256 pixels per tile with approximately 2 µm of tissue per pixel. Blurred tiles and those with less than 80% of pixels containing tissue were removed (FIG. 1A). For both stages of the model, ResNet18 convolutional neural networks were trained to extract features from the collection of tiles composing a single WSI. The resulting encoded features from the penultimate fully connected layer were used to automatically select ROIs at the 5x resolution. Specifically, principal component analysis (PCA) was used to project the encoded features into a latent space encompassing the greatest variance.
K-means clustering was then used to group each tile representation. The total number of clusters was determined for each sample based upon the value of k that provided the maximum silhouette coefficient across all tile representations. The cluster containing the tile with the maximum prediction probability was selected along with all tiles in the same cluster having a silhouette score greater than the 95% quantile of all silhouette scores across the WSI. The final ROIs were tiled at 20x magnification (0.5 µm per pixel) and used to train and test the second model. The top 25 tiles were averaged to calculate a final prediction score at a given resolution during an inference pass of a WSI.
Importantly, during training, random dropout of nodes within the fully connected layers of the ResNet architecture was incorporated to prevent overfitting of the training dataset. This same dropout technique, known as Monte Carlo dropout, was applied during inference of each WSI to provide an estimation of the model uncertainty by performing multiple inference passes over a single WSI. The resulting distribution of predictions was averaged to calculate a final score encompassing any epistemic uncertainty and was used to calculate confidence thresholds for a given sample (FIG. 1A).
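The score aggregation described in this paragraph (top-25 tile averaging at each resolution, averaging across the two resolutions, and averaging across the five ensemble members) can be sketched as below. The function names and the nested-list layout of the ensemble input are assumptions for illustration only.

```python
def slide_score(tile_probs, top_n=25):
    """Average of the top_n highest tile probabilities for one resolution."""
    if not tile_probs:
        raise ValueError("a slide needs at least one tile")
    top = sorted(tile_probs, reverse=True)[:top_n]
    return sum(top) / len(top)


def multi_resolution_score(tiles_5x, tiles_20x, top_n=25):
    """Average the low-magnification and high-magnification slide scores."""
    return (slide_score(tiles_5x, top_n) + slide_score(tiles_20x, top_n)) / 2.0


def ensemble_score(per_model_tiles, top_n=25):
    """Average multi-resolution scores over ensemble members.

    `per_model_tiles` is a list with one (tiles_5x, tiles_20x) pair per model.
    """
    scores = [multi_resolution_score(t5, t20, top_n) for t5, t20 in per_model_tiles]
    return sum(scores) / len(scores)


# Two hypothetical ensemble members scoring the same slide.
final = ensemble_score(
    [([0.2, 0.8, 0.6], [0.9, 0.7]), ([0.3, 0.7, 0.5], [0.8, 0.6])],
    top_n=2,
)
```

When a slide has fewer than `top_n` tiles, this sketch averages all available tiles; how the disclosed model handles that case is not stated in the text.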
[0052] Once the final HRD model has been trained and tuned, DeepHRD can be used to make predictions on individual patients using only a digitalized cancer biopsy (FIG. 1B). Using a single WSI as input, DeepHRD will produce a prediction score with confidence intervals that are used to make a computational diagnostic recommendation. The intended use of this method is to provide a computational diagnosis for subsequent clinical action. Specifically, individuals with a high confidence prediction are labeled as either HRD or HRP (FIG. 1B).
[0053] To characterize the classification performance of the proposed methodology, we calculated the area under the receiver operating characteristic curve (AUROC or AUC). Confidence intervals were calculated using non-parametric resampling. Survival curves were compared using a log-rank test. Multivariate analyses were performed to calculate hazard ratios using Cox regressions. Corrections for multiple hypothesis testing were performed using the Benjamini-Hochberg procedure.
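For reference, the Benjamini-Hochberg adjustment named above can be sketched in a few lines. This is the standard step-up procedure, written as a hypothetical helper `benjamini_hochberg`; it is not taken from the patent, which does not say which implementation was used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A q-value below the chosen false discovery rate (for example 0.05) marks the corresponding test as significant after correction.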
[0054] A DeepHRD model was trained to detect HRD samples using a subset of flash frozen tissue slides from the TCGA breast cancer cohort (FIG. 1A). The TCGA breast cancer cohort was split as follows: (i) 70% of samples were used for training; (ii) 15% of samples were used for validating the training and for adjusting prediction parameters; and (iii) 15% of samples were held out and used for testing the final trained model (n=1,008 total breast cancer samples; FIGS. 4A-4C). Prior to training, the numbers of HRD and HRP samples in each breast cancer subtype were balanced to prevent the models from learning features specific to individual subtype histology rather than those directly associated with HRD (FIGS. 4A-4B). The trained DeepHRD models can then be applied to individual digital slides to provide a patient-level prediction of whether a breast cancer is HRD or HRP (FIG. 1B). Importantly, DeepHRD allows overlaying an HRD probability mask on each digital slide, which can be used for subsequent investigation into pathological characteristics of each breast cancer patient (FIG. 1B).
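The 70/15/15 partition can be illustrated with a short sketch. The helper name `split_cohort`, the random shuffle, and the fixed seed are assumptions; the text does not state how samples were assigned to each partition or whether the split was stratified.

```python
import random

def split_cohort(sample_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle sample identifiers and split them into train, validation, and test lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * fractions[0])
    n_val = round(len(ids) * fractions[1])
    # The test partition receives whatever remains after train and validation.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]
```

In practice the identifiers should be patient-level so that slides from one patient never appear in more than one partition.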
[0055] The final trained DeepHRD breast cancer model was then tested on the held-out TCGA test set to assess the overall performance, resulting in an AUC of 0.81 ([0.77-0.85] 95% Confidence Interval (CI); FIG. 2A). To assess the generalizability of the model, we performed an external validation using the collection of breast cancer slides from CPTAC and METABRIC (FIG. 5A), resulting in an AUC of 0.76 ([0.71-0.82] 95% CI; FIG. 2A). Importantly, while HRD is enriched in luminal B, basal-like, and Her2-enriched breast cancers (FIG. 4A), the model implemented based on some embodiments of the disclosed technology can distinguish HR deficiency and proficiency across all subtypes (FIG. 2B).
[0056] While flash frozen tissue slides are commonly used for downstream molecular analyses, FFPE tissue slides are standard in clinical settings. Therefore, we trained an independent model to classify HRD directly from FFPE slides from the TCGA breast cancer cohort following the same procedure as previously described for training on flash frozen tissue images (FIGS. 4A-4C). The final TCGA FFPE model exhibited an AUC of 0.81 ([0.77-0.86] 95% CI; FIG. 2C), matching the AUC of the TCGA flash frozen model. These results indicate that the fixation procedure and differences in staining coloration have minimal effects on the performance of predicting HRD status directly from breast cancer tissue slides.
[0057] Importantly, the FFPE model was capable of distinguishing metastatic breast cancer (MBC) samples, part of an independent clinical cohort, that had a complete response to platinum chemotherapy (n=9) from those having only a partial or no response to treatment (n=68), resulting in an AUC of 0.76 ([0.54-0.93] 95% CI; FIG. 2C; FIG. 5B). Separating the MBC samples treated with platinum based upon DeepHRD’s prediction revealed a differential clinical benefit between the HRD and HRP predicted samples, with a median progression-free survival of 14.4 months for HRD patients and 3.9 months for HRP patients (p-value=0.0019, log-rank test). The model’s predictive value was consistent after correcting for breast cancer subtype, age of diagnosis, and the genomic HRD score, with a hazard ratio of 0.47 ([0.27-0.83] 95% CI, q-value=0.0087; Cox Proportional Hazards regression; FIG. 2D). Further, DeepHRD captured 7 of the 9 samples that had a complete response to platinum treatment. In comparison, neither the separation of samples based upon the genomic status of BRCA1/BRCA2 nor the separation based upon the elevated activity of the mutational signature associated with HRD (SBS3 as predicted by SigMA) resulted in a significant difference in progression-free survival (q-value=0.13 and q-value=0.34, respectively; Cox Proportional Hazards regression; FIG. 2D). While the small sample size of BRCA1/2 mutated tumors (~8% of the examined MBC samples) influenced the significance levels compared to wild-type tumors, the predictions from DeepHRD captured 4-fold more platinum sensitive samples than using BRCA1/2 status alone. Lastly, the tissue slides from the MBC cohort were digitalized using the Hamamatsu Nanozoomer system, while the TCGA breast cancers were digitalized using the Aperio ScanScope system, demonstrating the generalizability of DeepHRD across different scanning protocols.
[0058] To further assess the generalizability of the proposed method in application to other cancers, we performed transfer learning on the TCGA ovarian cancer cohort (FIG. 3A). Individuals with ovarian cancer have traditionally received platinum chemotherapies as the first-line standard of care, making this cohort ideal to test whether HRD predictions from tissue slides may have a direct clinical benefit. Specifically, we trained an independent model to predict HRD status from flash frozen slides using the TCGA ovarian cohort. Due to the roughly two-fold smaller cohort size (n=589 ovarian cancers), the ovarian model was initialized using the pretrained weights and biases generated from the flash frozen breast cancer model, with the convolutional weights and biases frozen during training (FIG. 3A). The final model was applied to the held-out test set of TCGA ovarian cancers to assess the ability of the model to separate individuals who benefit from treatment with platinum chemotherapy (FIG. 3A). Of the 117 patients in the held-out set, 66 received first-line platinum chemotherapy for advanced-stage, high-grade serous ovarian cancer. Separating these individuals by their DeepHRD prediction resulted in a differential median survival between HRD and HRP predicted patients (FIG. 3B). Specifically, patients predicted to be HRD had a median survival of 4.6 years, while those predicted to be HRP had a median survival of 3.2 years (q-value=0.024), with a hazard ratio of 0.45 after correcting for the stage of the cancer, age, and the genomic HRD score ([0.22-0.90] 95% CI; Cox Proportional Hazards regression; FIG. 3B). In comparison, we observed a worse separation of samples when using a base model without transfer learning (q-value=0.076), with a hazard ratio of 0.53 after corrections ([0.26-1.07] 95% CI; Cox Proportional Hazards regression; FIG. 3C), suggesting that transfer learning provides a benefit when attempting to train AI-based approaches on smaller datasets. Consistent with the breast cancer cohort, neither separation of samples based upon the mutational status of BRCA1/BRCA2 nor separation based on the elevated activity of SBS3 resulted in a significant difference in survival (q-value=0.47 and q-value=0.32, respectively; Cox Proportional Hazards regression; FIG. 3C).
[0059] The development of DeepHRD prediction models for breast and ovarian cancers demonstrates the practicality of incorporating AI-based guidance into clinical diagnostics and precision medicine workflows. Results across multiple publicly available cohorts (TCGA, CPTAC, and METABRIC) and additional external cohorts indicate that the models are applicable to routinely sampled tissue blocks and are generalizable across different cancers, histological and molecular subtypes, digital scanning systems, and tissue fixation procedures, including variability in H&E tissue staining. The performance of DeepHRD was consistent across primary and metastatic breast cancers and, by incorporating transfer learning, the model was also applicable to serous ovarian cancer. Most importantly, based on RECIST 1.1 criteria, DeepHRD predicted clinical response and progression-free survival for DNA-damage-response targeted therapies, namely platinum therapies, and outperformed existing genomic-based diagnostic biomarkers (viz., BRCA1/2 status and signature SBS3). Furthermore, consistent with prior breast cancer genomic studies, DeepHRD captured patients with BRCA1/2 wild-type tumors who responded to platinum therapy, identifying 4-fold more responsive patients than BRCA1/2 mutation testing alone (FIG. 2D). These results demonstrate that genotyping-based and sequencing-based assays, traditionally used for assessing HRD in a clinical setting, can be substituted and/or complemented with AI-based deep learning models that can rapidly predict clinical response from routine digitalized diagnostic histopathological slides. Circumventing the reliance on genomic profiling provides a solution that is more readily deployable in the clinic while delivering greater accessibility to state-of-the-art diagnostics for a larger proportion of the population across diverse socioeconomic groups.
[0060] Historically, diagnosis of solid cancers has evolved from microscopic morphological assessment of H&E slides to genomic biomarker testing. While crucial for clinical management of certain cancers, genomic testing has substantially complicated routine clinical oncology workflows, as it often requires re-biopsy to procure tumor tissue sufficient for molecular assays as well as extensive analytics to analyze the large-scale data generated by these molecular assays. Recent deep learning AI approaches have demonstrated the ability to detect genomic biomarkers directly from H&E images, including ones indirectly related to therapy outcome (e.g., detection of microsatellite instability that can be predictive of response to immunotherapy). However, no prior study has shown direct clinical significance of AI-based models for detecting HRD by predicting treatment benefit with external validations. Since HRD is a complementary biomarker to help guide the use of platinum therapies and an FDA-approved companion diagnostic test for the use of PARP inhibitors, the performance of the neural network (e.g., DeepHRD) implemented based on some embodiments of the disclosed technology has direct implications for predicting response to DNA-damage-response targeting therapies within breast, ovarian, and other cancer types with known HR deficiencies. One such example is pancreatic ductal adenocarcinoma (PDAC), where patients with HRD detected using an FDA-approved targeted-sequencing assay had an improved clinical outcome with standard first-line platinum-based treatment. A severe limitation to using genomic detection of HRD in standard PDAC clinical practice was that, in addition to issues with tissue procurement for genomic studies, the 3-6 week turnaround for obtaining molecular profiling was not appropriate for first-line treatment of advanced disease, given a median survival of only 3-6 months with rapid progression in some PDAC patients. For such cancer types, AI-based detection of HRD from routinely generated H&E slides may provide a better and faster diagnostic alternative.
[0061] While there has been an explosion of deep learning and computer vision-based approaches in digital pathology, the immediate translation into clinical practice has been limited by the lack of global accessibility in developing countries and resource-constrained communities due to the required infrastructure and the high overhead costs associated with streamlining digital pathology. Nevertheless, a recent study has shown the potential for deploying deep learning models trained on whole slide images directly to hand-held photographs of core-needle-biopsied tissues taken from a microscope’s field of view. These types of digitalized tissue images are smaller in size and ultimately require a tenth of the computational resources to process for downstream diagnostics, making them readily deployable on a local smartphone device with a standard high-resolution camera attached to the ocular lens of a conventional light microscope. This approach promises inexpensive, efficient, and accurate deep-learning read-outs within seconds of preparing an H&E slide. In coordination with the development of lightweight deep learning architectures, there are opportunities for deploying diagnostic applications, which are traditionally computationally expensive, into a manageable package without a substantial decrease in predictive power. By relying on smartphone microscopy images, this transition would provide AI-based diagnostic solutions for equitable and efficient clinical management for all cancer patients worldwide.
[0062] Additional Methods
[0063] Data Sources
[0064] The collection of flash frozen and formalin-fixed paraffin-embedded (FFPE) slides from TCGA, along with all clinical features, was downloaded from the Genomic Data Commons (GDC; https://gdc.cancer.gov/). The collection of flash frozen slides from CPTAC was downloaded from The Cancer Imaging Archive (TCIA), and the genomics data were downloaded from the GDC. The collection of images from METABRIC and the associated SNP6 genotyping microarray data were downloaded from EGA (accession numbers: EGAD00010000270 and EGAD00010000266). The predicted cancer subtypes for a subset of the TCGA breast cancer cohort were obtained from a previous study that utilized the 50-gene PAM50 model. HRD scores for the TCGA breast and ovarian cancers were obtained from a previous study. The 77 patients from the clinical cohort of whole-exome sequenced metastatic breast cancers were enrolled between June 2018 and March 2020, and all received at least one line of platinum chemotherapy. All clinical evaluations were determined locally at the Georges Francois Leclerc Cancer Center as previously reported.
[0065] Data Preprocessing
[0066] Each of the whole-slide images (WSIs) was segmented into 256x256 pixel tiles at 5x and 20x magnifications, corresponding to 2 µm per pixel and 0.5 µm per pixel, respectively. Blurry tiles and those with less than 80% of pixels representing tissue were removed from all training and testing cohorts. To filter blurry tiles, a Laplacian filter was applied to each tile using a 3x3 kernel, and all tiles with a variance less than 0.02 were removed from the remaining analysis. All green, red, and blue pen marks and other annotation artifacts were removed by thresholding on the RGB color channels within each pixel.
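The tile filter described above can be sketched as follows. Several details are assumptions made only for illustration: tiles are grayscale lists of rows with intensities scaled to [0, 1], the 3x3 kernel is the common 4-neighbor Laplacian, and tissue is approximated as any pixel darker than a hypothetical `background_level`. The 0.02 variance cutoff and the 80% tissue requirement come from the text, but the intensity scale on which 0.02 was applied is not stated.

```python
from statistics import pvariance

# Common 4-neighbor 3x3 Laplacian kernel (an assumed choice of kernel).
LAPLACIAN = ((0, 1, 0), (1, -4, 1), (0, 1, 0))

def laplacian_variance(gray):
    """Variance of the 3x3 Laplacian response over the interior pixels of a tile."""
    h, w = len(gray), len(gray[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = sum(LAPLACIAN[dy + 1][dx + 1] * gray[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            responses.append(r)
    return pvariance(responses)

def tissue_fraction(gray, background_level=0.9):
    """Fraction of pixels darker than the assumed background level."""
    pixels = [v for row in gray for v in row]
    return sum(v < background_level for v in pixels) / len(pixels)

def keep_tile(gray, min_variance=0.02, min_tissue=0.80):
    """Keep a tile only if it is sharp enough and contains enough tissue."""
    return (laplacian_variance(gray) >= min_variance
            and tissue_fraction(gray) >= min_tissue)
```

A sharp, tissue-rich tile produces large Laplacian responses and passes both checks, while a uniform or mostly white tile is discarded.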
[0067] Calculating HRD Scores
[0068] HRD scores were calculated as previously reported using scarHRD. Specifically, the HRD score is the sum of the telomeric allelic imbalance score, the loss of heterozygosity score, and the large-scale state transitions score calculated for each patient using ASCAT-derived copy number calls from SNP6 genotyping microarrays. The HRD scores for the CPTAC breast cancer samples were calculated based on copy number calls derived from whole-exome sequencing using Sequenza, which has been shown to result in distributions of HRD scores analogous to those calculated using ASCAT-derived copy number calls from SNP6 genotyping microarrays.
[0069] For both TCGA breast and ovarian cancer cohorts, soft labeling was applied to the HRD scores using specified thresholds for samples labeled confidently as HRD or HRP. These thresholds were determined by first splitting samples within each cancer type into HRD and HRP partitions using a single cutoff (HRD>=30 for breast and HRD>=63 for ovarian). The median values of the two resulting partitions of samples for each cancer type were used to set the range of confident HRD and HRP thresholds. All intermediate HRD scores were modeled as a probability using a quadratic function (equation 1).
[0070] Specifically, HRD scores above 50 were considered HR-deficient and scores below 10 were considered proficient in the breast cancer cohorts. All intermediate scores were modelled as a probability of being deficient or proficient with an equal probability of both conditions at an HRD score of 30 (equation 2).
[0071] Within the TCGA ovarian cohort, HRD scores above 73 were considered deficient and scores below 53 were considered proficient with the intermediate probabilities centered at 63 (equation 3).
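The referenced equations are not reproduced in this text, so the sketch below is only one possible quadratic soft-labeling curve that satisfies the stated constraints: probability 0 at or below the lower threshold, 1 at or above the upper threshold, and 0.5 at the center. The piecewise form and the names `soft_label`, `breast_soft_label`, and `ovarian_soft_label` are assumptions and should not be read as equations 1 to 3.

```python
def soft_label(score, lower, center, upper):
    """Map an HRD score to a probability of HR deficiency.

    Assumed piecewise-quadratic curve: 0 at `lower`, 0.5 at `center`, 1 at `upper`.
    """
    if score <= lower:
        return 0.0
    if score >= upper:
        return 1.0
    if score <= center:
        return 0.5 * ((score - lower) / (center - lower)) ** 2
    return 1.0 - 0.5 * ((upper - score) / (upper - center)) ** 2

def breast_soft_label(score):
    # Thresholds stated in the text for the breast cancer cohorts.
    return soft_label(score, lower=10, center=30, upper=50)

def ovarian_soft_label(score):
    # Thresholds stated in the text for the TCGA ovarian cohort.
    return soft_label(score, lower=53, center=63, upper=73)
```

Under this assumed curve, scores near a confident threshold receive labels close to 0 or 1, and scores near the cutoff receive labels close to 0.5.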
[0072] Model Training and Testing
[0073] Prior to training, the numbers of HRD and HRP samples were balanced in all breast cancer subtypes using the PAM50 model classifications to normalize for specific breast cancer subtypes being enriched or depleted of HRD samples. All samples without annotated PAM50 subtype labels were considered as missing and were also balanced for the number of HRD and HRP cases (FIG. 4A). Soft labeling was incorporated to prevent overfitting during training and to account for ambiguity in the ability of the HRD score to classify true HRD samples. The entirety of training and testing was performed using the Python machine learning framework PyTorch (v.1.5.0). For both resolution models, the Adam optimizer was used for training with a learning rate of 10^-3, a weight decay of 10^-4, and minibatches consisting of 64 tiles. Each model was initialized using the ResNet18 architecture pretrained on the ImageNet (http://www.image-net.org/) database and was trained for 200 epochs. All convolutional weights were frozen during training. Early stopping was incorporated to prevent overfitting.
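The subtype balancing step can be sketched as below. The text does not say how balance was achieved, so random downsampling of the larger class within each subtype is an assumption, as are the helper name `balance_by_subtype` and the tuple layout of the input.

```python
import random
from collections import defaultdict

def balance_by_subtype(samples, seed=0):
    """samples: iterable of (sample_id, subtype, label), label being 'HRD' or 'HRP'.

    Within each subtype, randomly downsample the larger class to the size of
    the smaller one. Samples whose subtype is None are grouped as 'missing'.
    """
    rng = random.Random(seed)
    groups = defaultdict(lambda: {"HRD": [], "HRP": []})
    for sample_id, subtype, label in samples:
        key = subtype if subtype is not None else "missing"
        groups[key][label].append(sample_id)
    balanced = []
    for subtype, by_label in groups.items():
        n = min(len(by_label["HRD"]), len(by_label["HRP"]))
        for label in ("HRD", "HRP"):
            for sample_id in rng.sample(by_label[label], n):
                balanced.append((sample_id, subtype, label))
    return balanced
```

A subtype that contains only one class contributes no samples under this scheme, which is one reason a real pipeline might prefer reweighting over downsampling.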
[0074] After training the 5x resolution models, a final inference pass was performed on all slides. All features from a single WSI were taken from the penultimate layer of the feature extractor and projected into a lower dimensional latent space using principal component analysis. K-means clustering was used to automatically select regions of interest (ROIs) for retiling at 20x magnification. The number of clusters was determined by selecting the solution with the maximum silhouette coefficient. The cluster containing the tile with the highest prediction probability was used to select the ROIs. All tiles belonging to this cluster that had a silhouette score greater than the 95% quantile of all silhouette scores for the given WSI were chosen as the final ROIs. Each ROI was then tiled into 256x256 pixel sub-tiles at 20x magnification. Because the 20x resolution (0.5 µm per pixel) is four times finer than the 5x resolution (2 µm per pixel) along each axis, this results in 4 x 4 = 16 tiles at 20x magnification for each ROI at 5x magnification. To perform an inference pass of the model, a single WSI is processed across 10 iterations with a random dropout probability of 0.20 for all nodes within the fully connected layers.
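The ROI selection rule can be illustrated with a small, dependency-free sketch. It assumes that PCA projection and K-means clustering (including the choice of k) have already been carried out upstream, so the inputs are projected tile coordinates, cluster labels, and tile probabilities. The function names and the linear-interpolation quantile are assumptions; the original work may compute these quantities with a library such as scikit-learn.

```python
import math

def silhouette_scores(points, labels):
    """Per-tile silhouette score s = (b - a) / max(a, b)."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        dist_by_cluster = {}
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                dist_by_cluster.setdefault(lj, []).append(math.dist(p, q))
        own = dist_by_cluster.pop(li, None)
        if not own or not dist_by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a = sum(own) / len(own)  # mean distance to tiles in the same cluster
        b = min(sum(d) / len(d) for d in dist_by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores

def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def select_rois(points, labels, probs, q=0.95):
    """Indices of tiles in the top-probability tile's cluster whose silhouette
    score exceeds the q-quantile of all silhouette scores on the slide."""
    sil = silhouette_scores(points, labels)
    best_cluster = labels[max(range(len(probs)), key=probs.__getitem__)]
    cutoff = quantile(sil, q)
    return [i for i, (l, s) in enumerate(zip(labels, sil))
            if l == best_cluster and s > cutoff]
```

Because the cutoff is taken over all tiles on the slide, the selected set can be small, and it can be empty if the best-separated tiles all belong to other clusters; a production pipeline would need a fallback for that case.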
[0075] Transfer Learning
[0076] The weights from the final models trained to detect HRD from flash frozen breast slides were used to initialize the weights of the ovarian model, an approach known as transfer learning. The held-out internal validation set was used to perform survival analysis based upon prior treatment with platinum chemotherapy. There were not enough FFPE slides in the ovarian cohort to train and test a DeepHRD model for FFPE ovarian cancer samples.
[0077] Visualizing DeepHRD Predictions
[0078] Once successfully trained, DeepHRD is used to make predictions for individual whole-slide images. When performing the multi-resolution inference, DeepHRD generates HRD probabilities for each tile at 5x magnification and for each tile within the automatically selected regions of interest at 20x magnification. Using the location of the original tiles, the probabilities can be mapped back to the original location within the whole-slide image to visualize the regional patterns that are influencing the final model prediction.
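Mapping tile probabilities back onto the slide can be sketched as a simple grid. The helper name `probability_grid` and the input layout, where each prediction carries the top-left pixel position of its tile at the tiling resolution, are assumptions for illustration.

```python
def probability_grid(tile_predictions, tile_size=256):
    """Build a 2D grid of tile probabilities indexed as grid[row][col].

    tile_predictions: iterable of (x, y, prob), where (x, y) is the top-left
    pixel position of a tile. Cells without a retained tile hold None.
    """
    preds = list(tile_predictions)
    n_cols = max(x for x, _, _ in preds) // tile_size + 1
    n_rows = max(y for _, y, _ in preds) // tile_size + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, prob in preds:
        grid[y // tile_size][x // tile_size] = prob
    return grid
```

The resulting grid can be rendered as a color overlay on a slide thumbnail, with `None` cells left transparent for background or filtered tiles.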
[0079] Survival Analysis
[0080] Survival analysis was performed using the Lifelines Python package (v.0.24.4). For both the metastatic breast cancer (MBC) and the TCGA ovarian cohorts, samples were partitioned based upon the prediction from each respective DeepHRD model. Only samples that were treated with platinum chemotherapy were considered in the survival comparisons. Survival curves were compared using a log-rank test. Hazard ratios were calculated from Cox regressions after correcting for age of diagnosis, primary breast cancer subtype, and genomic HRD score within the MBC cohort and age of diagnosis, ovarian cancer stage, and genomic HRD score within the TCGA ovarian cohort. Median survival was calculated as the time at which the chance of surviving beyond that point is 50%.
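The definition of median survival given above corresponds to the first time at which the Kaplan-Meier survival estimate reaches 50% or less. The sketch below is a self-contained illustration with hypothetical helper names; it is not the Lifelines implementation used in the study, and it uses exact fractions to avoid floating-point ties at the 50% boundary.

```python
from fractions import Fraction

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    events[i] is True when the event was observed and False when censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += bool(data[i][1])
            removed += 1
            i += 1
        if deaths:
            surv *= Fraction(at_risk - deaths, at_risk)
            curve.append((t, surv))
        at_risk -= removed
    return curve

def median_survival(durations, events):
    """First time at which the estimated survival is 50% or lower."""
    for t, s in kaplan_meier(durations, events):
        if s <= Fraction(1, 2):
            return t
    return float("inf")  # survival never drops to 50% within follow-up
```

Censored subjects leave the risk set without lowering the curve, which is why the median can be undefined (returned here as infinity) when follow-up is short.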
[0081] Statistics
[0082] All performance metrics were calculated using the scikit-learn Python package (v.0.22.1). Confidence intervals were calculated using non-parametric resampling. Standard error bars were calculated using the NumPy Python package (v.1.18.1).
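As an illustration of an AUC with a resampling-based confidence interval, the sketch below computes the AUC as the fraction of positive-negative pairs ranked correctly and derives a percentile bootstrap interval. The percentile bootstrap, the number of resamples, and the helper names are assumptions; the text only states that non-parametric resampling was used.

```python
import random

def auroc(labels, scores):
    """Probability that a random positive scores above a random negative
    (ties count one half); this equals the area under the ROC curve."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower bound, upper bound) using a percentile bootstrap."""
    point = auroc(labels, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lo, hi
```

Labels are expected to be 0 or 1. The pairwise computation is quadratic in cohort size, which is acceptable for cohorts of the size discussed here but would be replaced by a rank-based formula for larger data.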
[0083] FIG. 6 shows an example method 600 of determining the presence of a biomarker in a biological sample based on some implementations of the disclosed technology.
[0084] In some implementations of the disclosed technology, the method 600 may include, at 610, obtaining a section of a biological sample, wherein the section of the biological sample has been treated with a stain, at 620, imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution to generate a first and second plurality of image data, at 630, reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, and at 640, providing the first and the second plurality of image data to a trained predictive neural network and determining the presence of a biomarker in the biological sample as an output of the trained predictive neural network.
[0085] FIG. 7 shows an example method 700 of generating a trained predictive model configured to determine a presence of a biomarker in a biological sample based on some implementations of the disclosed technology.
[0086] In some implementations of the disclosed technology, the method 700 may include, at 710, generating stained sections of one or more biological samples and corresponding biomarker labels, at 720, imaging one or more regions of the stained sections of the one or more biological samples at a first resolution and a second resolution to generate a first and second plurality of image data, at 730, reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data, and at 740, generating a trained predictive model, wherein the trained predictive model comprises a first predictive model trained with the reduced first plurality of image data and corresponding biomarker labels, and a second predictive model trained with the reduced second plurality of image data and corresponding biomarker labels.
[0087] FIG. 8 shows another example method 800 of determining the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
[0088] In some implementations of the disclosed technology, the method 800 may include, at 810, obtaining a stained section of the biological sample, at 820, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, and at 830, providing the plurality of images of the stained section as an input to a trained predictive model and determining the presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing.
[0089] FIG. 9 shows a treatment method 900 for treating cancer in a subject in need thereof based on some implementations of the disclosed technology.
[0090] In some implementations of the disclosed technology, the method 900 may include, at 910, obtaining a stained section of a biological sample, at 920, imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section, at 930, providing the plurality of images of the stained section to a trained predictive model and determining a presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing, and at 940, administering treatment to the patient based on the presence of the biomarker.
[0091] FIG. 10 shows an example of a computer system 1000 configured to determine the presence of a biomarker of a biological sample based on some implementations of the disclosed technology.
[0092] In some implementations of the disclosed technology, the system 1000 includes a processor 1010 and a memory or storage medium 1020. The processor 1010 reads code from the memory 1020 and implements a method discussed in this patent document.
[0093] Therefore, various implementations of features of the disclosed technology can be made based on the above disclosure, including the examples listed below.
[0094] Method of Utilizing a Trained Predictive Model to Determine the Presence of a Biomarker in a Section
[0095] Example 1. A method of determining the presence of a biomarker of a biological sample, comprising: (a) providing a section of a biological sample, wherein the section of the biological sample has been treated with a stain; (b) imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution thereby generating a first and second plurality of image data; (c) reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data; and (d) determining the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided an input of the reduced first and second plurality of image data.
[0096] Example 2. The method of example 1, wherein the trained predictive model is configured to determine the presence of the biomarker with an accuracy of at least 80% as compared to genomic sequencing.
[0097] Example 3. The method of example 2, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
[0098] Example 4. The method of example 1, wherein the trained predictive model comprises a first predictive model trained on the first plurality of image data and a second predictive model trained on the second plurality of image data.
[0099] Example 5. The method of example 1, wherein the biomarker comprises loss of chromosome 9p.
[00100] Example 6. The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene TP53. In some implementations, TP53 includes a tumor suppressor gene. In one example, TP53 indicates tumor protein P53.
[00101] Example 7. The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene EGFR (epidermal growth factor receptor).
[00102] Example 8. The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene BRAF. In some implementations, BRAF includes a human gene that encodes a protein called B-Raf. In one example, BRAF indicates v-raf murine sarcoma viral oncogene homolog B 1.
[00103] Example 9. The method of example 1, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
[00104] Example 10. The method of example 1, wherein the biomarker comprises presence of MSI (vs MSS) and/or MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects. In some implementations, the MSI gene defect indicates a microsatellite instability (MSI) gene defect, and the MMR gene defect indicates a mismatch repair gene defect. In some implementations, MSS indicates microsatellite stable.
[00105] Example 11. The method of example 1, wherein the biomarker comprises presence of high tumor mutational burden.
[00106] Example 12. The method of example 1, wherein the biomarker comprises presence of hypermutator mutational signatures selected from: POLE comprised of POLE and MSI - COSMIC14 (POLE+MSI); MSI combined MSI - COSMIC15, MSI - COSMIC20 (POLD+MSI), MSI - COSMIC21, MSI - COSMIC26, and MSI - COSMIC6.
[00107] Example 13. The method of example 1, wherein the biomarker comprises presence of apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC) alterations and mutational signature. In some implementations, APOBEC indicates a family of evolutionarily conserved cytidine deaminases.
[00108] Example 14. The method of example 1, wherein the biomarker comprises presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist. In some implementations, BRCA indicates breast cancer gene.
[00109] Example 15. The method of example 1, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive status, determined, for example, using the genomic tests in Example 14.
[00110] Example 16. The method of example 1, wherein the biomarker comprises presence of BRCA1/2 mutations.
[00111] Example 17. The method of example 1, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as “mutational signature 3” (Sig3) in the COSMIC signature catalog or the presence of genomic ‘scar’ signatures.
[00112] Example 18. The method of example 1, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH); number of telomeric imbalances (telomeric allelic imbalance, or TAI), which are the number of regions with allelic imbalance that extend to the sub-telomere but not across the centromere; and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
[00113] Example 19. The method of example 1, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3 ’ contexts features of the sequencing data, or any combination thereof.
[00114] Example 20. The method of example 1, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1, BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, ATM, ATR, CHEK1/2, FANC genes (FANCA/C/D2/E/F/G/I//L/M/ 1), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B, ATRX, BAP1, BARD1, BRIP1, CDK12, PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, ARID1A, BLM, WRN, CDK12, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2.
[00115] Example 21. The method of example 1, wherein the biomarker comprises presence of potentially actionable genomic alterations in one or more of the following genes: ABL1, AKT1, ALK, APC, ATM, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, IDH1, IDH2, JAK1, JAK2, JAK3, KDR, KIT, MAP2K1, MET, NTRK, NTRK1, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, GNA13, GNAQ, GNA11, RRAS2, KIF1A, KIF5B.
[00116] Example 22. The method of example 1, wherein the biomarker indicates copy number alterations, deletions, amplifications, fusions, mutation clusters, mutation signatures, or any combination thereof in the genome of the biological sample.
[00117] Example 23. The method of example 1, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or any combination thereof sections.
[00118] Example 24. The method of example 1, wherein the trained predictive model comprises a convolutional neural network.
[00119] Example 25. The method of example 1, wherein the trained predictive model comprises a neural network such as a ResNet model.
[00120] Example 26. The method of example 1, further comprising reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data. In some implementations, the parameter space of the first and second plurality of image data indicates tiles at 5X magnification, wherein the parameter space of the first and second plurality of image data is reduced to 25%, 10%, or 5% of the tiles carrying predictive information.
[00121] Example 27. The method of example 26, wherein reducing is completed by principal component analysis.
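As an illustration only, the principal component analysis reduction of Examples 26 and 27 might be sketched as follows. The feature matrix, its dimensions, the function name `pca_reduce`, and the number of retained components are assumptions made for this sketch and are not specified in the examples.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project rows of `features` onto the top principal components."""
    # Center each feature column so the SVD describes variance, not the mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return reduced, explained[:n_components]

# Hypothetical input: 200 image tiles, each described by 32 numeric features.
rng = np.random.default_rng(0)
tile_features = rng.normal(size=(200, 32))
reduced, explained = pca_reduce(tile_features, n_components=5)
```

Here `reduced` holds one low-dimensional row per tile, and `explained` gives the fraction of total variance carried by each retained component.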
[00122] Example 28. The method of example 1, wherein the biological sample comprises a cancer free, or cancerous biological sample.
[00123] Example 29. The method of example 1, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof tissues.
[00124] Example 30. The method of example 29, wherein the unhealthy tissue comprises virally infected tissue.
[00125] Example 31. The method of example 30, wherein virally infected tissue comprises human papilloma virus (HPV) positive tissue.
[00126] Example 32. The method of example 30, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
[00127] Example 33. The method of example 29, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
[00128] Example 34. The method of example 33, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
[00129] Example 35. The method of example 1, wherein the stain comprises a hematoxylin and eosin stain.
[00130] Example 36. The method of example 1, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
[00131] Example 37. The method of example 26, further comprising clustering the reduced first and second plurality of image data thereby generating a first and second clustered dataset.
[00132] Example 38. The method of example 37, wherein clustering is completed by k-means clustering.
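A minimal sketch of k-means clustering (Lloyd's algorithm) applied to reduced image data is given below for illustration. The two-dimensional synthetic points, the choice of k = 2, the iteration limit, and the function names are assumptions of this sketch, not parameters stated in the examples.

```python
import numpy as np

def assign_labels(points, centroids):
    """Return the index of the nearest centroid for every point."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def kmeans(points, k, n_iter=50, seed=0):
    """Cluster `points` into k groups; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_labels(points, centroids)
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign_labels(points, centroids), centroids

# Hypothetical reduced features forming two well-separated groups.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=5.0, scale=0.1, size=(20, 2))
points = np.vstack([blob_a, blob_b])
labels, centroids = kmeans(points, k=2)
```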
[00133] Example 39. The method of example 37, wherein the trained predictive model is trained with the clustered datasets that represent the top 15% of the variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
[00134] Example 40. The method of example 37, wherein the trained predictive model is trained with the first and second clustered dataset and corresponding biomarker label of the biological sample, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
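One possible reading of the silhouette-based selection in Example 40 is sketched below: a mean silhouette coefficient is computed per cluster, and clusters at or above the median score are retained. Treating "top 50th percentile" as "at or above the median of the per-cluster means" is an assumption of this sketch, as are the synthetic points, the function names, and the requirement of at least two clusters.

```python
import numpy as np

def silhouette_samples(points, labels):
    """Silhouette coefficient per point; requires at least two clusters."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    cluster_ids = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: score 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in cluster_ids if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores

def top_half_clusters(points, labels):
    """Return cluster ids whose mean silhouette is at or above the median."""
    scores = silhouette_samples(points, labels)
    cluster_scores = {int(c): float(scores[labels == c].mean()) for c in np.unique(labels)}
    cutoff = np.median(list(cluster_scores.values()))
    kept = [c for c, s in cluster_scores.items() if s >= cutoff]
    return kept, cluster_scores

# Hypothetical data: clusters 0 and 1 are tight and isolated, 2 and 3 overlap.
rng = np.random.default_rng(0)
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0), (6.0, 10.0)]
scales = [0.1, 0.1, 1.0, 1.0]
points = np.vstack([rng.normal(loc=c, scale=s, size=(15, 2)) for c, s in zip(centers, scales)])
labels = np.repeat(np.arange(4), 15)
kept, cluster_scores = top_half_clusters(points, labels)
```

In this synthetic case the two tight, isolated clusters score far higher than the two overlapping ones, so only the former would be passed on for training.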
[00135] Example 41. The method of example 40, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
[00136] Example 42. The method of example 1, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
[00137] Example 43. The method of example 1, wherein the one or more regions comprise at least 100 regions.
[00138] Example 44. The method of example 1, wherein the one or more regions comprise at most 10,000 regions.
[00139] Example 45. The method of example 1, comprising removing one or more nodes of the trained predictive model when the trained predictive model is provided an input of the reduced first and second plurality of image data.
[00140] Method of Training a Predictive Model
[00141] Example 46. A method of generating a trained predictive model configured to determine a presence of a biomarker of a biological sample, comprising: (a) providing stained sections of one or more biological samples and corresponding biomarker labels; (b) imaging one or more regions of the stained sections of the one or more biological samples at a first resolution and a second resolution thereby generating a first and second plurality of image data; (c) reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data; and (d) generating a trained predictive model, wherein the trained predictive model comprises a first predictive model trained with the reduced first plurality of image data and corresponding biomarker labels, and a second predictive model trained with the reduced second plurality of image data and corresponding biomarker labels.
[00142] Example 47. The method of example 46, wherein the trained predictive model is configured to determine the presence of a biomarker with an accuracy of at least 80% as compared to genomic sequencing.
[00143] Example 48. The method of example 47, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
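For illustration, the accuracy "as compared to genomic sequencing" recited in Examples 47 and 48 could be computed as the fraction of samples on which the model call agrees with the sequencing-derived label. The label vectors below are made-up values, and the function names are hypothetical.

```python
def accuracy_vs_sequencing(predicted, sequenced):
    """Fraction of samples where the model call matches the sequencing label."""
    if len(predicted) != len(sequenced) or not predicted:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == s for p, s in zip(predicted, sequenced))
    return matches / len(predicted)

def meets_threshold(predicted, sequenced, threshold=0.80):
    """Check the score against a minimum accuracy, for example 80%."""
    return accuracy_vs_sequencing(predicted, sequenced) >= threshold

# Hypothetical biomarker calls (1 = present, 0 = absent) for ten samples.
model_calls = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
sequencing_labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
score = accuracy_vs_sequencing(model_calls, sequencing_labels)
```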
[00144] Example 49. The method of example 46, wherein the biomarker label comprises loss of chromosome 9p.
[00145] Example 50. The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
[00146] Example 51. The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
[00147] Example 52. The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
[00148] Example 53. The method of example 46, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
[00149] Example 54. The method of example 46, wherein the biomarker comprises presence of MSI (vs. MSS) and/or MMR gene defects comprising one or more of POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, and PMS2.
[00150] Example 55. The method of example 46, wherein the biomarker comprises presence of hypermutator mutational signatures selected from POLE, MSI - COSMIC14, (POLE+MSI), MSI combined, MSI - COSMIC 15, MSI - COSMIC20, (POLD+MSI), MSI - COSMIC21, MSI - COSMIC26, and MSI - COSMIC6.
[00151] Example 56. The method of example 46, wherein the biomarker comprises presence of APOBEC alterations and mutational signature.
[00152] Example 57. The method of example 45, wherein the biomarker comprises presence of high tumor mutational burden.
[00153] Example 58. The method of example 46, wherein the biomarker comprises presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist.
[00154] Example 59. The method of example 46, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Example 14.
[00155] Example 60. The method of example 46, wherein the biomarker comprises presence of BRCA1/2 mutations.
[00156] Example 61. The method of example 46, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) comprising “mutational signature 3” (Sig3) in the COSMIC signature catalog, or the presence of genomic ‘scar’ signatures.
[00157] Example 62. The method of example 46, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH), which are regions of intermediate size (over 15 MB and less than the whole chromosome); number of telomeric imbalances (telomeric allelic imbalance, or TAI), which are the number of regions with allelic imbalance that extend to the sub-telomere but not across the centromere; and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
[00158] Example 63. The method of example 46, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ contexts features of the sequencing data, or any combination thereof.
[00159] Example 64. The method of example 46, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1, BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, ATM, ATR, CHEK1/2, FANC genes (FANCA/C/D2/E/F/G/I/7L/M/ 1), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B, ATRX, BAP1, BARD1, BRIP1, CDK12, PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, ARID1A, BLM, WRN, CDK12, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2.
[00160] Example 65. The method of example 46, wherein the biomarker comprises presence of potentially actionable genomic alterations in one or more of the following genes: ABL1, AKT1, ALK, APC, ATM, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, IDH1, IDH2, JAK1, JAK2, JAK3, KDR, KIT, MAP2K1, MET, NTRK, NTRK1, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, GNA13, GNAQ, GNA11, RRAS2, KIF1A, KIF5B.
[00161] Example 66. The method of example 46, wherein the stained sections of the one or more biological samples comprises paraffin embedded sections, formalin fixed sections, frozen sections, fresh sections, or any combination thereof sections.
[00162] Example 67. The method of example 46, wherein the trained predictive model comprises a convolutional neural network.
[00163] Example 68. The method of example 46, wherein the trained predictive model comprises a ResNet model.
[00164] Example 69. The method of example 46, wherein reducing is completed by principal component analysis.
[00165] Example 70. The method of example 46, wherein the one or more biological samples comprise a cancer free, cancerous biological sample, healthy tissue, unhealthy tissue, or any combination of healthy and unhealthy tissues.
[00166] Example 71. The method of example 70, wherein the unhealthy tissue comprises virally infected tissue.
[00167] Example 72. The method of example 71, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
[00168] Example 73. The method of example 46, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
[00169] Example 74. The method of example 70, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
[00170] Example 75. The method of example 74, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
[00171] Example 76. The method of example 46, wherein the stain comprises a hematoxylin and eosin stain.
[00172] Example 77. The method of example 46, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
[00173] Example 78. The method of example 46, further comprising clustering the reduced first and second plurality of image data thereby generating a first and second clustered dataset.
[00174] Example 79. The method of example 78, wherein clustering is completed by k-means clustering.
[00175] Example 80. The method of example 78, wherein the trained predictive model is trained with clustered datasets that represent the top 15% of the variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
[00176] Example 81. The method of example 78, wherein the first and second predictive models are trained with one or more biological samples’ first and second clustered dataset and the corresponding biomarker labels, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
[00177] Example 82. The method of example 46, wherein the corresponding biomarker labels of the one or more biological samples are determined by genomic sequencing.
[00178] Example 83. The method of example 46, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
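The averaged predicted probability score of Example 83 can be illustrated with a short sketch that combines the outputs of the first (for example, 5X) and second (for example, 20X) predictive models. The equal weighting follows the word "averaged"; the 0.5 decision threshold and the function names are assumptions introduced here for illustration.

```python
def averaged_probability(p_first, p_second):
    """Average the biomarker probabilities from the two predictive models."""
    for p in (p_first, p_second):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return (p_first + p_second) / 2.0

def call_biomarker(p_first, p_second, threshold=0.5):
    """Report the biomarker as present when the averaged score meets the
    threshold (the threshold value is an assumption of this sketch)."""
    return averaged_probability(p_first, p_second) >= threshold

# Hypothetical per-sample scores from the two models.
combined = averaged_probability(0.75, 0.25)
```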
[00179] Example 84. The method of example 46, wherein the one or more regions comprise at least 100 regions.
[00180] Example 85. The method of example 46, wherein the one or more regions comprise at most 1,000 regions.
[00181] Example 86. The method of example 46, wherein generating the trained predictive model comprises removing one or more nodes of the first and second predictive model during training.
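If "removing one or more nodes ... during training" is read as dropout-style regularization (an interpretation made here, not a statement in the example), the step might be sketched as below. The drop rate, the activation shape, and the function name are hypothetical.

```python
import numpy as np

def dropout(activations, drop_rate, rng, training=True):
    """Randomly zero a fraction of node activations during training.

    Uses inverted dropout: surviving activations are rescaled so their
    expected value is unchanged, and inference applies no mask.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    if not training or drop_rate == 0.0:
        return activations
    keep_prob = 1.0 - drop_rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

# Hypothetical hidden-layer activations for a batch of 4 tiles and 8 nodes.
rng = np.random.default_rng(0)
hidden = np.ones((4, 8))
train_out = dropout(hidden, drop_rate=0.5, rng=rng, training=True)
eval_out = dropout(hidden, drop_rate=0.5, rng=rng, training=False)
```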
[00182] System using Trained Predictive Model to Determine the Presence of a Biomarker of a Biological Sample
[00183] Example 87. A computer system configured to determine the presence of a biomarker of a biological sample, comprising: one or more processors; and a non-transient computer readable storage medium including software, wherein the software comprises executable instructions that, as a result of execution, cause the one or more processors of the computer system to: (i) receive a section of a biological sample, wherein the section of the biological sample has been stained; (ii) image one or more regions of the stained section at a first resolution and a second resolution thereby generating a first and second plurality of image data; (iii) reduce a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data; and (iv) determine the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided an input of the reduced first and second plurality of image data.
[00184] Example 88. The system of example 87, wherein the trained predictive model is configured to determine the presence of the biomarker with an accuracy of at least 80% as compared to genomic sequencing.
[00185] Example 89. The system of example 88, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
[00186] Example 90. The system of example 87, wherein the trained predictive model comprises a first predictive model trained on the first plurality of image data and a second predictive model trained on the second plurality of image data.
[00187] Example 91. The system of example 87, wherein the biomarker comprises loss of chromosome 9p.
[00188] Example 92. The system of example 87, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
[00189] Example 93. The system of example 87, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
[00190] Example 94. The system of example 87, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
[00191] Example 95. The system of example 87, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or any combination thereof sections.
[00192] Example 96. The system of example 87, wherein the trained predictive model comprises a convolutional neural network.
[00193] Example 97. The system of example 87, wherein the trained predictive model comprises a ResNet model.
[00194] Example 98. The system of example 87, wherein reducing is completed by principal component analysis.
[00195] Example 99. The system of example 87, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof tissues.
[00196] Example 100. The system of example 99, wherein the unhealthy tissue comprises virally infected tissue.
[00197] Example 101. The system of example 100, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
[00198] Example 102. The system of example 100, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
[00199] Example 103. The system of example 99, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
[00200] Example 104. The system of example 99, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
[00201] Example 105. The system of example 87, wherein the biological sample comprises a cancer free, or cancerous biological sample.
[00202] Example 106. The system of example 87, wherein the stain comprises a hematoxylin and eosin stain.
[00203] Example 107. The system of example 87, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
[00204] Example 108. The system of example 87, wherein the instructions further cause the one or more processors to cluster the reduced first and second plurality of image data, thereby generating a first and second clustered dataset.
[00205] Example 109. The system of example 108, wherein the instruction of clustering is completed by k-means clustering.
[00206] Example 110. The system of example 108, wherein the trained predictive model is trained with the clustered datasets that represent the top 15% of the variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
[00207] Example 111. The system of example 108, wherein the trained predictive model is trained with the biological sample’s first and second clustered dataset and corresponding biomarker labels of the biological samples, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
[00208] Example 112. The system of example 111, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
[00209] Example 113. The system of example 87, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
[00210] Example 114. The system of example 87, wherein the one or more regions comprise at least 100 regions, or at most 1,000 regions, or at least 100 regions and at most 1,000 regions.
[00211] Example 115. The system of example 87, wherein the one or more processors comprise one or more processors of a smartphone, tablet, laptop, desktop, server, cloud computing architecture, or any combination thereof.
[00212] General Method of Converting Image data into Biomarker Indications
[00213] Example 116. A method of determining the presence of a biomarker of a biological sample, comprising: (a) providing a section of a biological sample, wherein the section of the biological sample has been stained; (b) imaging one or more regions of the stained section of the biological sample thereby generating a plurality of images of the stained section; (c) determining the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided the plurality of images of the stained section as an input, wherein the trained predictive model provides an accuracy of determining the presence of the biomarker of at least 80% as compared to genomic sequencing.
[00214] Example 117. The method of example 116, wherein the accuracy comprises at least 85%, at least 92%, at least 95%, at least 97%, or at least 99% as compared to genomic sequencing.
[00215] Example 118. The method of example 116, wherein the trained predictive model comprises a first predictive model trained on a first plurality of images acquired at a first resolution and a second predictive model trained on a second plurality of images acquired at a second resolution.
[00216] Example 119. The method of example 116, wherein the biomarker comprises loss of chromosome 9p.
[00217] Example 120. The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
[00218] Example 121. The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
[00219] Example 122. The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
[00220] Example 123. The method of example 116, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
[00221] Example 124. The method of example 16, wherein the biomarker comprises presence of MSI (vs. MSS) and/or MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects.
[00222] Example 125. The method of example 116, wherein the biomarker comprises presence of hypermutator mutational signatures: POLE comprised of “POLE” and SBS6, SBS14, SBS15, SBS20, SBS21, SBS2, SBS26, SBS44.
[00223] Example 126. The method of example 116, wherein the biomarker comprises presence of high tumor mutational burden.
[00224] Example 127. The method of example 116, wherein the biomarker comprises presence of APOBEC alterations and mutational signature.
[00225] Example 128. The method of example 116, wherein the biomarker comprises presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist.
[00226] Example 129. The method of example 116, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Example 14.
[00227] Example 130. The method of example 116, wherein the biomarker comprises presence of BRCA-1 and/or -2 mutations.
[00228] Example 131. The method of example 116, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as “mutational signature 3” (Sig3) in the COSMIC signature catalog, or the presence of genomic ‘scar’ signatures.
[00229] Example 132. The method of example 116, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH), which are regions of intermediate size; number of telomeric imbalances (telomeric allelic imbalance, or TAI); and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
[00230] Example 133. The method of example 116, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ contexts features of the sequencing data, or any combination thereof.
[00231] Example 134. The method of example 116, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1, BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, BARD1, ATM, BRIP1, CHEK1/2, CDK12, ATR, ATRX, BAP1, ARID1A, FANC genes (FANCA/C/D2/E/F/G/V/L/M, FANCI), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B; as well as other less common HRR gene alterations in PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, BLM, WRN, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2, NPM1, PTWN, H2AX, RPA; PRK2, NF1.
[00232] Example 135. The method of example 116, wherein the biomarker comprises presence of potentially actionable genomic alterations in one or more of the following genes: ABL1, AKT1, APC, ALK, APC, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, KDR, MET, NTRK, NTRK1/2/3, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNA13, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, RRAS2, KIF1A, KIF5B, IDH1/2, JAK1/2/3, MAP2K1, MAP3K1, GATA3, PTPN11, SRC, SETBP1, FAT1, KEAP1, LRP1B, FAT3, NF1, RB.
[00233] Example 136. The method of example 116, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or any combination thereof sections.
[00234] Example 137. The method of example 116, wherein the trained predictive model comprises a convolutional neural network.
[00235] Example 138. The method of example 116, wherein the trained predictive model comprises a ResNet model.
[00236] Example 139. The method of example 116, further comprising reducing a parameter space of the first and second plurality of image data, thereby producing a reduced first and second plurality of image data.
[00237] Example 140. The method of example 116, wherein reducing is completed by principal component analysis.
[00238] Example 141. The method of example 116, wherein the biological sample comprises a cancer free, or cancerous biological sample.
[00239] Example 142. The method of example 116, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof tissues.
[00240] Example 143. The method of example 142, wherein the unhealthy tissue comprises virally infected tissue.
[00241] Example 144. The method of example 143, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
[00242] Example 145. The method of example 144, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type, also called human T-lymphotrophic virus (HTLV-1).
[00243] Example 146. The method of example 143, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
[00244] Example 147. The method of example 143, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
[00245] Example 148. The method of example 116, wherein the stain comprises a hematoxylin and eosin stain.
[00246] Example 149. The method of example 118, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
[00247] Example 150. The method of example 143, further comprising clustering the reduced first and second plurality of image data thereby generating a first and second clustered dataset.
[00248] Example 151. The method of example 150, wherein clustering is completed by k-means clustering.
[00249] Example 152. The method of example 150, wherein the trained predictive model is trained with the biological sample’s first and second clustered dataset and corresponding biomarker label of the biological sample, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
[00250] Example 153. The method of example 152, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
[00251] Example 154. The method of example 116, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
[00252] Example 155. The method of example 116, wherein the one or more regions comprise at least 100 regions.
[00253] Example 156. The method of example 116, wherein the one or more regions comprise at most 1,000 regions.
[00254] Example 157. The method of example 125, further comprising removing one or more nodes of the trained predictive model when provided as an input the reduced first and second plurality of image data.
[00255] Treatment Method for Treating Cancer in a Subject
[00256] Example 158. A treatment method for treating cancer in a subject in need thereof, the method comprising: providing a section of a biological sample, wherein the section of the biological sample has been stained; imaging one or more regions of the stained section of the biological sample thereby generating a plurality of images of the stained section; determining the presence of a biomarker of the biological sample as an output of a trained predictive model when the trained predictive model is provided the plurality of images of the stained section as an input, wherein the trained predictive model provides an accuracy of determining the presence of the biomarker of at least 80% as compared to genomic sequencing; and administering treatment to the patient based on the presence of the biomarker.
[00257] Example 159. The treatment method of example 158, wherein the biomarker comprises loss of chromosome 9p.
[00258] Example 160. The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene TP53.
[00259] Example 161. The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene EGFR.
[00260] Example 162. The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene BRAF.
[00261] Example 163. The treatment method of example 158, wherein the biomarker comprises presence of clustered mutations in the gene KIT.
[00262] Example 164. The treatment method of example 158, wherein the biomarker comprises presence of MSI (vs. MSS) and/or MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects.
[00263] Example 165. The treatment method of example 158, wherein the biomarker comprises presence of hypermutator mutational signatures selected from: POLE comprised of “POLE” and SBS6, SBS14, SBS15, SBS20, SBS21, SBS2, SBS26, SBS44.
[00264] Example 166. The treatment method of example 158, wherein the biomarker comprises presence of high tumor mutational burden.
[00265] Example 167. The treatment method of example 158, wherein the biomarker comprises presence of APOBEC alterations and mutational signature.
[00266] Example 168. The treatment method of example 158, wherein the biomarker comprises presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist.
[00267] Example 169. The treatment method of example 158, wherein the biomarker comprises presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Example 168.
[00268] Example 170. The treatment method of example 158, wherein the biomarker comprises presence of BRCA-1 and/or -2 mutations.
[00269] Example 171. The treatment method of example 158, wherein the biomarker comprises presence of “COSMIC3 - BRCA” mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as “mutational signature 3” (Sig3) in the COSMIC signature catalog, or the presence of genomic ‘scar’ signatures.
[00270] Example 172. The treatment method of example 158, wherein the biomarker comprises presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH), which are regions of intermediate size; number of telomeric imbalances (telomeric allelic imbalance, or TAI); and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
[00271] Example 173. The treatment method of example 158, wherein the biomarker comprises presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of the sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at 5’-NpCpG-3’ context features of the sequencing data, or any combination thereof.
[00272] Example 174. The treatment method of example 158, wherein the biomarker comprises presence of genomic alterations in one or more of the following homologous recombination repair (HRR)-related or -associated genes beyond BRCA1, BRCA2 (also called ‘BRCA-ness’): alterations in PALB2, BARD1, ATM, BRIP1, CHEK1/2, CDK12, ATR, ATRX, BAP1, ARID1A, FANC genes (FANCA/C/D2/E/F/G/I/L/M, FANCI), RAD50, RAD51 genes (RAD51B/C/D/L1/3), RAD52, RAD54L/C/D/B; as well as other less common HRR gene alterations in PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, BLM, WRN, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2, NPM1, PTWN, H2AX, RPA; PRK2, NF1.
[00273] Example 175. The treatment method of example 158, wherein the biomarker comprises presence of actionable genomic alterations in one or more of the following genes: ABL1, AKT1, APC, ALK, APC, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, KDR, MET, NTRK, NTRK1/2/3, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNA13, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, RRAS2, KIF1A, KIF5B, IDH1/2, JAK1/2/3, MAP2K1, MAP3K1, GATA3, PTPN11, SRC, SETBP1, FAT1, KEAP1, LRP1B, FAT3, NF1, RB.
[00274] Example 176. The treatment method of example 163, wherein the patient has GIST; the treatment method comprising not administering c-Kit inhibitor imatinib.
[00275] Example 177. The treatment method of example 163, wherein the patient has GIST or other solid tumor; the treatment method comprising not administering c-Kit inhibitors in addition to imatinib, including Axitinib, Dovitinib, Dasatinib, Motesanib diphosphate, Pazopanib, Sunitinib, Masitinib, Vatalanib, Cabozantinib, Tivozanib, Amuvatinib, Telatinib, Pazopanib, Regorafenib, Ripretinib and Dovitinib.
[00276] Example 178. The treatment method of any of examples 164-166, further comprising administering a treatment for the cancer comprising the following drug classes: immune checkpoint inhibitors (ICIs) and other immunotherapies to said subject if said MSI, MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects, hypermutator mutational signatures (e.g., COSMIC14/15/21/26/6) and/or high TMB of said sample is detected.
[00277] Example 179. The treatment method of any of examples 164-166, further comprising administering a treatment for the cancer comprising the following drug classes: PD-1 inhibitors (e.g., Pembrolizumab, Nivolumab, Cemiplimab, Pidilizumab, Dostarlimab, larotrectinib), PD-L1 inhibitors (e.g., Atezolizumab, Avelumab, Durvalumab), CTLA-4 inhibitors (e.g., Ipilimumab and tremelimumab), LAG-3 inhibitors (e.g., tebotelimab, eftilagimod alpha, Relatlimab), TIM-3 inhibitors (e.g., MBG453, Sym023, TSR-022), other immunomodulator therapies alone or in combination with other ICIs or other drugs to said subject if said MSI, MMR gene (e.g., POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2) defects, hypermutator mutational signatures (e.g., COSMIC14/15/21/26/6) and/or high TMB of said sample is detected.
[00278] Example 180. The treatment method of any of examples 168-174, further comprising administering a treatment for the cancer comprising the following drug classes: platinum drugs, poly-ADP ribose polymerase (PARP) inhibitors, and/or newer agents such as ATR, Wee1 or CHK, Pol-theta or RAD52 inhibitors to said subject if said HRD or surrogate gene or signature thereof of said sample is detected, wherein said cancer comprises breast cancer, ovarian cancer, pancreatic adenocarcinoma, prostate cancer, sarcoma, or any solid tumor or combination thereof. In some implementations, the weights collected from a final DeepHRD model trained to detect HRD in one tissue type or modality can be used to initialize the model weights for another tissue type or modality. All other training procedures can stay the same, thus allowing the transfer of knowledge from training one tissue type or modality to another. The treatment method can utilize this approach to train an ovarian cancer model by utilizing the prior knowledge from the breast cancer model based on some embodiments of the disclosed technology. This AI algorithm will allow application of this deep learning AI technology for other genomic alterations and other cancer types.
[00279] Example 181. The treatment method of any of examples 168-174, further comprising administering a treatment for the cancer comprising platinum drugs, including cisplatin, carboplatin, oxaliplatin, nedaplatin, lobaplatin, heptaplatin or satraplatin alone, or in combination with other drugs, e.g., FOLFOX to said subject if said HRD or surrogate gene or signature thereof of said sample is detected. In some embodiments, the cancer therapeutic causes inter-strand breaks of genomic molecules of the subject’s cells, leading to p53-initiated apoptosis.
[00280] Example 182. The treatment method of any of examples 168-174, further comprising administering a treatment for the cancer comprising poly-ADP ribose polymerase (PARP) inhibitors, including the four main PARP inhibitors: olaparib (Lynparza), niraparib (Zejula), rucaparib (Rubraca), talazoparib (Talzenna) as well as other PARP inhibitors to said subject if said HRD or surrogate gene or signature thereof of said sample is detected.
[00281] Example 183. The treatment method of example 158, wherein the treatment method comprises not administering immune checkpoint inhibitors (ICIs) and other immunotherapies to said subject if said 9p deletions of said sample are detected.
[00282] Example 184. The treatment method of example 158, wherein the biomarker comprises presence of EGFR/ErbB1 mutations comprising one or more of L858R, exon 19del, and exon 20 alteration.
[00283] Example 185. The treatment method of example 184, further comprising administering to the patient afatinib, dacomitinib, erlotinib, gefitinib, osimertinib [T790], or amivantamab.
[00284] Example 186. The treatment method of example 158, wherein the biomarker comprises presence of HER2/ErbB2 amplification.
[00285] Example 187. The treatment method of example 186, further comprising administering to the patient trastuzumab, ado-trastuzumab emtansine, lapatinib, margetuximab, neratinib, pertuzumab, tucatinib, deruxtecab, trastuzumab deruxtecan, or neratinib.
[00286] Example 188. The treatment method of example 158, wherein the biomarker comprises presence of BRAF mutation.
[00287] Example 189. The treatment method of example 188, further comprising administering to the patient encorafenib, vemurafenib, dabrafenib, trametinib, or cobimetinib.
[00288] Example 190. The treatment method of example 158, wherein the biomarker comprises presence of FGFR1/2/3 fusions.
[00289] Example 191. The treatment method of example 190, further comprising administering to the patient erdafitinib, futibatinib, infigratinib, pemigatinib, dovitinib, lenvatinib, pazopanib, ponatinib, or regorafenib.
[00290] Example 192. The treatment method of example 158, wherein the biomarker comprises presence of PDGFRA exon 18 mutations.
[00291] Example 193. The treatment method of example 192, further comprising administering to the patient avapritinib or dasatinib.
[00292] Example 194. The treatment method of example 158, wherein the biomarker comprises presence of KIT mutations in GIST.
[00293] Example 195. The treatment method of example 194, further comprising administering to the patient imatinib, Axitinib, Dovitinib, Dasatinib, Motesanib diphosphate, Pazopanib, Sunitinib, Masitinib, Vatalanib, Cabozantinib, Tivozanib, Amuvatinib, Telatinib, Pazopanib, Regorafenib, Ripretinib and Dovitinib, or sorafenib.
[00294] Example 196. The treatment method of example 158, wherein the biomarker comprises presence of NRG1 fusion.
[00295] Example 197. The treatment method of example 196, further comprising administering to the patient zenocutuzumab or seribantumab.
[00296] Example 198. The treatment method of example 158, wherein the biomarker comprises presence of RET fusions.
[00297] Example 199. The treatment method of example 198, further comprising administering to the patient pralsetinib, selpercatinib; crizotinib, ceritinib, cabozantinib, or vandetanib.
[00298] Example 200. The treatment method of example 158, wherein the biomarker comprises presence of ROS1 fusions.
[00299] Example 201. The treatment method of example 200, further comprising administering to the patient crizotinib or entrectinib.
[00300] Example 202. The treatment method of example 158, wherein the biomarker comprises presence of NTRK1/2 or 3 fusions.
[00301] Example 203. The treatment method of example 202, further comprising administering to the patient entrectinib, larotrectinib, or repotrectinib.
[00302] Example 204. The treatment method of example 158, wherein the biomarker comprises presence of ALK fusions.
[00303] Example 205. The treatment method of example 204, further comprising administering to the patient crizotinib, alectinib, brigatinib, ceritinib, or lorlatinib.
[00304] Example 206. The treatment method of example 158, wherein the biomarker comprises presence of PIK3CA alterations.
[00305] Example 207. The treatment method of example 206, further comprising administering to the patient alpelisib, temsirolimus, or everolimus.
[00306] Example 208. The treatment method of example 158, wherein the biomarker comprises presence of mTOR or TSC1/2 mutations.
[00307] Example 209. The treatment method of example 208, further comprising administering to the patient temsirolimus or everolimus.
[00308] Example 210. The treatment method of example 158, wherein the biomarker comprises presence of Akt, or PTEN alterations.
[00309] Example 211. The treatment method of example 210, further comprising administering to the patient capivasertib.
[00310] Example 212. The treatment method of example 158, wherein the biomarker comprises presence of MET amplification or mutation.
[00311] Example 213. The treatment method of example 212, further comprising administering to the patient crizotinib, tepotinib, capmatinib, telisotuzumab, tepotinib, or savolitinib.
[00312] Example 214. The treatment method of example 158, wherein the biomarker comprises presence of MEK mutation.
[00313] Example 215. The treatment method of example 214, further comprising administering to the patient trametinib, cobimetinib, or selumetinib.
[00314] Example 216. The treatment method of example 158, wherein the biomarker comprises presence of NF1/2 alterations.
[00315] Example 217. The treatment method of example 216, further comprising administering to the patient trametinib, temsirolimus, everolimus, or selumetinib.
[00316] Example 218. The treatment method of example 158, wherein the biomarker comprises presence of STK11 alterations.
[00317] Example 219. The treatment method of example 218, further comprising administering to the patient dasatinib, everolimus, temsirolimus, or bosutinib.
[00318] Example 220. The treatment method of example 158, wherein the biomarker comprises presence of KDR alterations.
[00319] Example 221. The treatment method of example 220, further comprising administering to the patient pazopanib, regorafenib, or vandetanib.
[00320] Example 222. The treatment method of example 158, wherein the biomarker comprises presence of microsatellite stable (MS) with DNA polymerase-ε (POLE) mutation, CD274 amplification, or 9p24.1 amplicon.
[00321] Example 223. The treatment method of example 222, further comprising administering ICIs to the patient.
[00322] Example 224. The treatment method of example 158, wherein the biomarker comprises presence of MAP2K alterations.
[00323] Example 225. The treatment method of example 224, further comprising administering to the patient trametinib.
[00324] Example 226. The treatment method of example 158, wherein the biomarker comprises presence of alterations to CCND2, CDK4, or CDKN2A/B.
[00325] Example 227. The treatment method of example 226, further comprising administering to the patient Palbociclib.
[00326] Example 228. The treatment method of example 158, wherein the biomarker comprises presence of IDH1 mutation.
[00327] Example 229. The treatment method of example 228, further comprising administering to the patient ivosidenib.
[00328] Example 230. The treatment method of example 158, wherein the biomarker comprises presence of truncating or oncogenic mutations in B2M, PTEN, JAK1, JAK2, STK11 and EGFR, and/or 9p21 or 9p arm/genetic region loss.
[00329] Example 231. The treatment method of example 230, further comprising not administering to the patient an immune checkpoint inhibitor.
[00330] Example 232. The treatment method of example 158, wherein the biomarker comprises presence of mutations in the RAS genes KRAS and NRAS.
[00331] Example 233. The treatment method of example 232, further comprising not administering to the patient epidermal growth factor receptor (EGFR) therapies, like cetuximab and panitumumab, in colorectal cancer, and EGFR tyrosine kinase inhibitors, like erlotinib, in lung cancer.
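The weight-transfer approach described in Example 180 (initializing a model for one tissue type from the final weights of a model trained on another, with the training procedure otherwise unchanged) can be illustrated with a minimal sketch. This is not the DeepHRD implementation: the logistic model, the toy two-dimensional feature vectors, and all function and variable names below are hypothetical stand-ins chosen so that the example runs with the Python standard library alone.

```python
import copy
import math
import random


def init_weights(n_features, seed=0):
    """Random initialization, used when no prior model is available."""
    rng = random.Random(seed)
    return {"w": [rng.uniform(-0.01, 0.01) for _ in range(n_features)], "b": 0.0}


def predict_proba(weights, x):
    """Logistic model: probability that the biomarker is present."""
    z = weights["b"] + sum(wi * xi for wi, xi in zip(weights["w"], x))
    return 1.0 / (1.0 + math.exp(-z))


def train(weights, samples, labels, epochs=200, lr=0.1):
    """Same training procedure regardless of how the weights were initialized."""
    weights = copy.deepcopy(weights)  # do not mutate the caller's model
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = predict_proba(weights, x) - y  # gradient of the log loss w.r.t. z
            weights["w"] = [wi - lr * err * xi for wi, xi in zip(weights["w"], x)]
            weights["b"] -= lr * err
    return weights


# Hypothetical toy features standing in for image-derived features.
breast_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
breast_y = [1, 1, 0, 0]
ovarian_x = [[0.7, 0.3], [0.3, 0.8]]
ovarian_y = [1, 0]

# Source model: trained from a random initialization on the first tissue type.
breast_model = train(init_weights(2), breast_x, breast_y)

# Transfer: the second model starts from the final weights of the first one
# and is then trained briefly on the (smaller) second dataset.
ovarian_model = train(breast_model, ovarian_x, ovarian_y, epochs=20)
```

In this sketch the only difference between the two training runs is the starting point passed to `train`, which mirrors the statement that all other training procedures can stay the same.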
[00332] In some implementations of the disclosed technology, genomic sequencing encompasses any type of genomic profiling where DNA and RNA are subjected to a next-generation massively parallel sequencing protocol or genotyping through microarray hybridization.
[00333] In some implementations of the disclosed technology, accuracy encompasses the mathematical terms: sensitivity, specificity, precision, negative predictive value, accuracy, and balanced accuracy, or any combination thereof.
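The metrics listed above can all be derived from a binary confusion matrix in which the label determined by genomic sequencing is treated as the reference. The following is a minimal sketch under that assumption; the function name, the dictionary keys, and the example labels are hypothetical and are not taken from the disclosure.

```python
def classification_metrics(y_true, y_pred):
    """Compute standard binary metrics; labels are 1 (biomarker present) or 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": ratio(tp, tp + fp),  # positive predictive value
        "negative_predictive_value": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical example: reference labels from sequencing vs. model predictions.
sequencing_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_predictions = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(sequencing_labels, model_predictions)
```

For these example labels there are 3 true positives, 1 false negative, 4 true negatives, and 2 false positives, giving a sensitivity of 0.75, a precision of 0.6, and an accuracy of 0.7.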
[00334] Implementations of the subject matter and the functional operations described in this patent document can be implemented in various systems, digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing unit” or “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
[00335] A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
[00336] The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
[00337] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
[00338] It is intended that the specification, together with the drawings, be considered exemplary only, where exemplary means an example. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Additionally, the use of “or” is intended to include “and/or”, unless the context clearly indicates otherwise.
[00339] While this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
[00340] Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
[00341] Only a few implementations and examples are described and other implementations, enhancements and variations can be made based on what is described and illustrated in this patent document.
Claims
1. A method of determining a presence of a biomarker in a biological sample, comprising: obtaining a section of a biological sample, wherein the section of the biological sample has been treated with a stain; imaging one or more regions of the stained section of the biological sample at a first resolution and a second resolution to generate a first and second plurality of image data; reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data; and providing the first and the second plurality of image data to a trained predictive neural network and determining the presence of a biomarker in the biological sample as an output of the trained predictive neural network.
2. The method of claim 1, wherein the trained predictive neural network is configured to determine the presence of the biomarker with a preset accuracy, wherein the preset accuracy is at least 80% of an accuracy of genomic sequencing.
3. The method of claim 2, wherein the preset accuracy comprises 85%, 92%, 95%, 97%, or 99% of the accuracy of genomic sequencing.
4. The method of claim 1, wherein the trained predictive neural network comprises a first predictive model trained on the first plurality of image data and a second predictive neural network trained on the second plurality of image data.
5. The method of claim 1, wherein the biomarker comprises loss of chromosome 9p.
6. The method of claim 1, wherein the biomarker comprises a presence of clustered mutations in TP53 (tumor protein P53) gene.
7. The method of claim 1, wherein the biomarker comprises a presence of clustered mutations in epidermal growth factor receptor (EGFR) gene.
8. The method of claim 1, wherein the biomarker comprises a presence of clustered mutations in BRAF (v-raf murine sarcoma viral oncogene homolog B1) gene.
9. The method of claim 1, wherein the biomarker comprises a presence of clustered mutations in KIT or c-Kit gene.
10. The method of claim 1, wherein the biomarker comprises a presence of at least one of a microsatellite instable (MSI) defect or a mismatch repair (MMR) gene defect, wherein the MMR gene defect includes at least one of POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2.
11. The method of claim 1, wherein the biomarker comprises a presence of high tumor mutational burden.
12. The method of claim 1, wherein the biomarker comprises a presence of hypermutator mutational signatures selected from: POLE including POLE and MSI-COSMIC14; MSI combined MSI-COSMIC15, MSI-COSMIC20, MSI-COSMIC21, MSI-COSMIC26, and MSI-COSMIC6.
13. The method of claim 1, wherein the biomarker comprises a presence of apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC) alterations and mutational signature.
14. The method of claim 1, wherein the biomarker comprises a presence of homologous recombination deficiency (HRD).
15. The method of claim 1, wherein the biomarker comprises a presence of HRD negative or homologous recombination proficiency (HRP) or HRD positive.
16. The method of claim 1, wherein the biomarker comprises a presence of at least one of breast cancer gene (BRCA)-1 mutation or BRCA-2 mutation.
17. The method of claim 1, wherein the biomarker comprises a presence of COSMIC3 - BRCA mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as mutational signature 3 (Sig3) in a COSMIC signature catalog or a presence of genomic scar signatures.
18. The method of claim 1, wherein the biomarker comprises a presence of a genomic instability score (GIS) including one or more of: patterns or signatures of loss of heterozygosity (LOH); a number of telomeric imbalances corresponding to a number of regions with allelic imbalance that extend to a sub-telomere but not across a centromere; or large-scale state transitions (LST) corresponding to chromosome breaks, wherein the telomeric imbalances include telomeric allelic imbalances (TAI), wherein the chromosome breaks include deletions, translocations, and inversions.
19. The method of claim 1, wherein the biomarker comprises a presence of a homologous recombination feature set comprising one or more of: a total number and proportions of deletions at microhomologies features of sequencing data; a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data; a total number and proportions of heterozygous genomic segments features of the sequencing data; or a total number and proportions of C:G>T:A single base substitutions at 5’-NpCpG-3’ context features of the sequencing data, or a combination thereof.
20. The method of claim 1, wherein the biomarker comprises a presence of genomic alterations in one or more of homologous recombination repair (HRR)-related genes beyond BRCA1 and BRCA2, including at least one of: alterations in PALB2, ATM, ATR, CHEK1/2, FANC genes, RAD50, RAD51 genes, RAD52, RAD54 genes, ATRX, BAP1, BARD1, BRIP1, CDK12, PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, ARID1A, BLM, WRN, CDK12, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, or HDAC2.
21. The method of claim 1, wherein the biomarker comprises a presence of potentially actionable genomic alterations in at least one of: ABL1, AKT1, ALK, APC, ATM, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, IDH1, IDH2, JAK1, JAK2, JAK3, KDR, KIT, MAP2K1, MET, NTRK, NTRK1, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, GNA13, GNAQ, GNA11, RRAS2, KIF1A, or KIF5B.
22. The method of claim 1, wherein the biomarker indicates at least one of immunohistochemical alterations, copy number alterations, deletions, amplifications, fusions, mutation clusters, mutation signatures, or any combination thereof in a genome of the biological sample.
23. The method of claim 1, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or a combination thereof.
24. The method of claim 1, wherein the trained predictive neural network comprises a convolutional neural network.
25. The method of claim 1, wherein the trained predictive neural network comprises a residual neural network.
26. The method of claim 1, wherein the parameter space of the first and second plurality of image data indicates tiles at 5x magnification, wherein the parameter space of the first and second plurality of image data is reduced to 25%, 10%, or 5% of the tiles carrying predictive information.
27. The method of claim 26, wherein reducing is completed by principal component analysis.
28. The method of claim 1, wherein the biological sample comprises a cancer free, or cancerous biological sample.
29. The method of claim 1, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof.
30. The method of claim 29, wherein the unhealthy tissue comprises virally infected tissue.
31. The method of claim 30, wherein virally infected tissue comprises human papilloma virus (HPV) positive tissue.
32. The method of claim 30, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type corresponding to human T-lymphotrophic virus (HTLV-1).
33. The method of claim 29, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
34. The method of claim 33, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
35. The method of claim 1, wherein the stain comprises a hematoxylin and eosin stain.
36. The method of claim 1, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
37. The method of claim 26, further comprising clustering the reduced first and second plurality of image data to generate a first and second clustered dataset to perform training that produces the trained predictive neural network.
38. The method of claim 37, wherein clustering is completed by k-means clustering.
39. The method of claim 37, wherein the trained predictive neural network is trained with the clustered datasets that represent the top 15% of a variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
40. The method of claim 37, wherein the trained predictive neural network is trained with the first and second clustered dataset and corresponding biomarker label of the biological sample, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
41. The method of claim 40, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
42. The method of claim 1, wherein the output of the trained predictive neural network comprises an averaged predicted probability score of the first and second predictive neural network.
43. The method of claim 1, wherein the one or more regions comprise at least 100 regions.
44. The method of claim 1, wherein the one or more regions comprise at most 10,000 regions.
45. The method of claim 1, comprising removing one or more nodes of the trained predictive neural network when the trained predictive neural network is provided an input of the reduced first and second plurality of image data.
46. A method of generating a trained predictive model configured to determine a presence of a biomarker in a biological sample, comprising: generating stained sections of one or more biological samples and corresponding biomarker labels; imaging one or more regions of the stained sections of the one or more biological samples at a first resolution and a second resolution to generate a first and second plurality of image data; reducing a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data; and generating a trained predictive model, wherein the trained predictive model comprises a first predictive model trained with the reduced first plurality of image data and corresponding biomarker labels, and a second predictive model trained with the reduced second plurality of image data and corresponding biomarker labels.
47. The method of claim 46, wherein the trained predictive model is configured to determine the presence of a biomarker with a preset accuracy, wherein the preset accuracy is at least 80% of an accuracy of genomic sequencing.
48. The method of claim 47, wherein the preset accuracy comprises 85%, 92%, 95%, 97%, or 99% of the accuracy of genomic sequencing.
49. The method of claim 46, wherein the biomarker label comprises a loss of chromosome 9p.
50. The method of claim 46, wherein the biomarker comprises a presence of clustered mutations in TP53 gene.
51. The method of claim 46, wherein the biomarker comprises a presence of clustered mutations in EGFR gene.
52. The method of claim 46, wherein the biomarker comprises a presence of clustered mutations in BRAF gene.
53. The method of claim 46, wherein the biomarker comprises a presence of clustered mutations in KIT gene.
54. The method of claim 46, wherein the biomarker comprises a presence of at least one of a microsatellite instable (MSI) defect or a mismatch repair (MMR) gene defect, wherein the MMR gene defect includes at least one of POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2.
55. The method of claim 46, wherein the biomarker comprises a presence of hypermutator mutational signatures selected from: POLE, MSI-COSMIC14, a combination of POLE and MSI; MSI combined MSI-COSMIC15, MSI-COSMIC20, MSI-COSMIC21, MSI-COSMIC26, and MSI-COSMIC6.
56. The method of claim 46, wherein the biomarker comprises a presence of apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC) alterations and mutational signature.
57. The method of claim 45, wherein the biomarker comprises presence of high tumor mutational burden.
58. The method of claim 46, wherein the biomarker comprises a presence of homologous recombination deficiency (HRD).
59. The method of claim 46, wherein the biomarker comprises a presence of HRD negative or homologous recombination proficiency (HRP) or HRD positive.
60. The method of claim 46, wherein the biomarker comprises presence of breast cancer gene (BRCA)1/2 mutations.
61. The method of claim 46, wherein the biomarker comprises a presence of COSMIC3-BRCA mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) comprising mutational signature 3 (Sig3) in a COSMIC signature catalog or a presence of genomic scar signatures.
62. The method of claim 46, wherein the biomarker comprises a presence of genomic instability score (GIS) including one or more of: patterns or signatures of loss of heterozygosity (LOH) corresponding to regions of intermediate size over 15 MB and less than a whole chromosome; a number of telomeric imbalances including telomeric allelic imbalance (TAI) corresponding to a number of regions with allelic imbalance that extend to a sub-telomere but not across a centromere; or large-scale state transitions (LST) corresponding to chromosome breaks including deletions, translocations, and inversions.
63. The method of claim 46, wherein the biomarker comprises a presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ contexts features of the sequencing data, or a combination thereof.
64. The method of claim 46, wherein the biomarker comprises a presence of genomic alterations in one or more of homologous recombination repair (HRR)-related genes beyond BRCA1 and BRCA2, including at least one of: alterations in PALB2, ATM, ATR, CHEK1/2, FANC genes, RAD50, RAD51 genes, RAD52, RAD54 genes, ATRX, BAP1, BARD1, BRIP1, CDK12, PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, ARID1A, BLM, WRN, CDK12, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, or HDAC2.
65. The method of claim 46, wherein the biomarker comprises a presence of potentially actionable genomic alterations in at least one of: ABL1, AKT1, ALK, APC, ATM, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, IDH1, IDH2, JAK1, JAK2, JAK3, KDR, KIT, MAP2K1, MET, NTRK, NTRK1, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, GNA13, GNAQ, GNA11, RRAS2, KIF1A, or KIF5B.
66. The method of claim 46, wherein the stained sections of the one or more biological samples comprises paraffin embedded sections, formalin fixed sections, frozen sections, fresh sections, or any combination thereof sections.
67. The method of claim 46, wherein the trained predictive model comprises a convolutional neural network.
68. The method of claim 46, wherein the trained predictive model comprises a residual neural network model.
69. The method of claim 46, wherein reducing is completed by principal component analysis.
70. The method of claim 46, wherein the one or more biological samples comprise a cancer free, cancerous biological sample, healthy tissue, unhealthy tissue, or any combination of healthy and unhealthy tissues.
71. The method of claim 70, wherein the unhealthy tissue comprises a virally infected tissue.
72. The method of claim 71, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
73. The method of claim 46, wherein a virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type corresponding to human T-lymphotrophic virus (HTLV-1).
74. The method of claim 70, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
75. The method of claim 74, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
76. The method of claim 46, wherein the stain comprises a hematoxylin and eosin stain.
77. The method of claim 46, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
78. The method of claim 46, further comprising clustering the reduced first and second plurality of image data to generate a first and second clustered dataset.
79. The method of claim 78, wherein clustering is completed by k-means clustering.
80. The method of claim 78, wherein the trained predictive model is trained with clustered datasets that represent the top 15% of a variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
81. The method of claim 78, wherein the first and second predictive models are trained with one or more biological samples’ first and second clustered dataset and the corresponding biomarker labels, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
82. The method of claim 46, wherein the corresponding biomarker labels of the one or more biological samples are determined by genomic sequencing.
83. The method of claim 46, wherein an output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
84. The method of claim 46, wherein the one or more regions comprise at least 100 regions.
85. The method of claim 46, wherein the one or more regions comprise at most 1,000 regions.
86. The method of claim 46, wherein generating the trained predictive model comprises removing one or more nodes of the first and second predictive model during training.
87. A computer system configured to determine a presence of a biomarker in a biological sample, comprising: one or more processors; and a non-transitory computer readable storage medium including software stored thereon, wherein the software comprises executable instructions that, as a result of execution, cause the one or more processors of the computer system to: image one or more regions of a stained section of a biological sample at a first resolution and at a second resolution to generate a first and a second plurality of image data; reduce a parameter space of the first and second plurality of image data to produce a reduced first and second plurality of image data; and provide the first and the second plurality of image data to a trained predictive model and determine the presence of a biomarker in the biological sample as an output of the trained predictive model.
88. The system of claim 87, wherein the trained predictive model is configured to determine the presence of a biomarker with a preset accuracy, wherein the preset accuracy is at least 80% of an accuracy of genomic sequencing.
89. The system of claim 88, wherein the preset accuracy comprises 85%, 92%, 95%, 97%, or 99% of the accuracy of genomic sequencing.
90. The system of claim 87, wherein the trained predictive model comprises a first predictive model trained on the first plurality of image data and a second predictive model trained on the second plurality of image data.
91. The system of claim 87, wherein the biomarker comprises a loss of chromosome 9p.
92. The system of claim 87, wherein the biomarker comprises a presence of clustered mutations in TP53 gene.
93. The system of claim 87, wherein the biomarker comprises a presence of clustered mutations in EGFR gene.
94. The system of claim 87, wherein the biomarker comprises a presence of clustered mutations in BRAF gene.
95. The system of claim 87, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or any combination thereof sections.
96. The system of claim 87, wherein the trained predictive model comprises a convolutional neural network.
97. The system of claim 87, wherein the trained predictive model comprises a residual neural network model.
98. The system of claim 87, wherein reducing is completed by principal component analysis.
99. The system of claim 87, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof tissues.
100. The system of claim 99, wherein the unhealthy tissue comprises a virally infected tissue.
101. The system of claim 100, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
102. The system of claim 100, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type corresponding to human T-lymphotrophic virus (HTLV-1).
103. The system of claim 99, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
104. The system of claim 99, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
105. The system of claim 87, wherein the biological sample comprises a cancer free, or cancerous biological sample.
106. The system of claim 87, wherein the stain comprises a hematoxylin and eosin stain.
107. The system of claim 87, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
108. The system of claim 87, wherein the instructions further comprise clustering the reduced first and second plurality of image data to generate a first and second clustered dataset.
109. The system of claim 108, wherein the instruction of clustering is completed by k-means clustering.
110. The system of claim 108, wherein the trained predictive model is trained with the clustered datasets that represent the top 15% of the variance between clustered datasets of the first and second clustered datasets and corresponding biomarker labels.
111. The system of claim 108, wherein the trained predictive model is trained with first and second clustered dataset of the biological sample and corresponding biomarker labels of the biological sample, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
112. The system of claim 111, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
113. The system of claim 87, wherein the output of the trained predictive model comprises an averaged predicted probability score of the first and second predictive model.
114. The system of claim 87, wherein the one or more regions comprise at least 100 regions, or at most 1,000 regions, or at least 100 regions and at most 1,000 regions.
115. The system of claim 87, wherein the one or more processors comprise one or more processors of a smartphone, tablet, laptop, desktop, server, cloud computing architecture, or any combination thereof.
116. A method of determining a presence of a biomarker in a biological sample, comprising: obtaining a stained section of the biological sample; imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section; and providing the plurality of images of the stained section as an input to a trained predictive model and determining the presence of a biomarker in the biological sample as an output of a trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing.
117. The method of claim 116, wherein the accuracy comprises 85%, 92%, 95%, 97%, or 99% of the accuracy of genomic sequencing.
118. The method of claim 116, wherein the trained predictive model comprises a first predictive model trained on a first plurality of images acquired at a first resolution and a second predictive model trained on a second plurality of images acquired at a second resolution.
119. The method of claim 116, wherein the biomarker comprises a loss of chromosome 9p.
120. The method of claim 116, wherein the biomarker comprises a presence of clustered mutations in TP53 gene.
121. The method of claim 116, wherein the biomarker comprises a presence of clustered mutations in epidermal growth factor receptor (EGFR) gene.
122. The method of claim 116, wherein the biomarker comprises a presence of clustered mutations in BRAF gene.
123. The method of claim 116, wherein the biomarker comprises a presence of clustered mutations in KIT gene.
124. The method of claim 116, wherein the biomarker comprises a presence of at least one of a microsatellite instable (MSI) defect or a mismatch repair (MMR) gene defect, wherein the MMR gene defect includes at least one of POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2.
125. The method of claim 116, wherein the biomarker comprises a presence of hypermutator mutational signatures selected from: POLE including POLE and SBS6, SBS14, SBS15, SBS20, SBS21, SBS2, SBS26, SBS44.
126. The method of claim 116, wherein the biomarker comprises a presence of high tumor mutational burden.
127. The method of claim 116, wherein the biomarker comprises a presence of apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC) alterations and mutational signature.
128. The method of claim 116, wherein the biomarker comprises a presence of homologous recombination deficiency (HRD).
129. The method of claim 116, wherein the biomarker comprises a presence of HRD negative or homologous recombination proficiency (HRP) or HRD positive.
130. The method of claim 116, wherein the biomarker comprises a presence of at least one of breast cancer gene (BRCA)-1 mutation or BRCA-2 mutation.
131. The method of claim 116, wherein the biomarker comprises a presence of COSMIC3-BRCA mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as mutational signature 3 (Sig3) in a COSMIC signature catalog or a presence of genomic scar signatures.
132. The method of claim 116, wherein the biomarker comprises a presence of a genomic instability score (GIS) including one or more of: patterns or signatures of loss of heterozygosity (LOH); a number of telomeric imbalances corresponding to a number of regions with allelic imbalance that extend to a sub-telomere but not across a centromere; or large-scale state transitions (LST) corresponding to chromosome breaks, wherein the telomeric imbalances include telomeric allelic imbalances (TAI), wherein the chromosome breaks include deletions, translocations, and inversions.
133. The method of claim 116, wherein the biomarker comprises a presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ contexts features of the sequencing data, or a combination thereof.
134. The method of claim 116, wherein the biomarker comprises a presence of genomic alterations in one or more of homologous recombination repair (HRR)-related genes beyond BRCA1 and BRCA2, including at least one of: alterations in PALB2, BARD1, ATM, BRIP1, CHEK1/2, CDK12, ATR, ATRX, BAP1, ARID1A, FANC genes, RAD50, RAD51 genes, RAD52, RAD54L/C/D/B, HRR gene alterations in PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, BLM, WRN, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2, NPM1, PTWN, H2AX, RPA; or PRK2, NF1.
135. The method of claim 116, wherein the biomarker comprises a presence of potentially actionable genomic alterations in one or more of genes including at least one of: ABL1, AKT1, APC, ALK, APC, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, KDR, MET, NTRK, NTRK1/2/3, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNA13, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, RRAS2, KIF1A, KIF5B, IDH1/2, JAK1/2/3, MAP2K1, MAP3K1, GATA3, PTPN11, SRC, SETBP1, FAT1, KEAP1, LRP1B, FAT3, NF1, or RB.
136. The method of claim 116, wherein the section of the biological sample comprises a paraffin embedded section, a formalin fixed section, a frozen section, a fresh section, or a combination thereof.
137. The method of claim 116, wherein the trained predictive model comprises a convolutional neural network.
138. The method of claim 116, wherein the trained predictive model comprises a residual neural network model.
139. The method of claim 116, further comprising reducing a parameter space of a first and second plurality of images to produce a reduced first and second plurality of images.
140. The method of claim 116, wherein reducing is completed by principal component analysis.
141. The method of claim 116, wherein the biological sample comprises a cancer free, or cancerous biological sample.
142. The method of claim 116, wherein the biological sample comprises healthy tissue, unhealthy tissue, or any combination thereof tissues.
143. The method of claim 142, wherein the unhealthy tissue comprises a virally infected tissue.
144. The method of claim 143, wherein the virally infected tissue comprises human papilloma virus (HPV) positive tissue.
145. The method of claim 144, wherein the virally infected tissue comprises Epstein-Barr virus (EBV), Hepatitis B virus (HBV), Hepatitis C virus (HCV), Human immunodeficiency virus (HIV), Human herpes virus 8 (HHV-8), and/or Human T-cell leukemia virus type corresponding to human T-lymphotrophic virus (HTLV-1).
146. The method of claim 143, wherein the unhealthy tissue comprises nuclei morphology different from nuclei morphology of healthy tissue.
147. The method of claim 143, wherein the unhealthy tissue comprises premalignant or precancerous tissue.
148. The method of claim 116, wherein the stain comprises a hematoxylin and eosin stain.
149. The method of claim 118, wherein the first resolution comprises a 5X magnification, and wherein the second resolution comprises a 20X magnification.
150. The method of claim 143, further comprising clustering a reduced first and second plurality of images generating a first and second clustered dataset.
151. The method of claim 150, wherein clustering is completed by k-means clustering.
152. The method of claim 150, wherein the trained predictive model is trained with first and second clustered dataset of the biological sample and corresponding biomarker label of the biological sample, wherein the first and second clustered dataset comprise clustered datasets with silhouette coefficients within the top 50th percentile across all clusters of the first and second clustered dataset.
153. The method of claim 152, wherein the corresponding biomarker label of the biological sample is determined by genomic sequencing.
154. The method of claim 116, wherein the output of the trained predictive model comprises an averaged predicted probability score of first and second predictive model.
155. The method of claim 116, wherein the one or more regions comprise at least 100 regions.
156. The method of claim 116, wherein the one or more regions comprise at most 1,000 regions.
157. The method of claim 125, further comprising removing one or more nodes of the trained predictive model when provided as an input a reduced first and second plurality of images.
158. A treatment method for treating cancer in a patient, the method comprising: obtaining a stained section of a biological sample; imaging one or more regions of the stained section of the biological sample to generate a plurality of images of the stained section; providing the plurality of images of the stained section to a trained predictive model and determining a presence of a biomarker in the biological sample as an output of the trained predictive model, wherein the trained predictive model is configured with a preset accuracy of determining the presence of the biomarker set to at least 80% of genomic sequencing; and administering treatment to the patient based on the presence of the biomarker.
159. The treatment method of claim 158, wherein the biomarker comprises a loss of chromosome 9p.
160. The treatment method of claim 158, wherein the biomarker comprises a presence of clustered mutations in TP53 gene.
161. The treatment method of claim 158, wherein the biomarker comprises a presence of clustered mutations in epidermal growth factor receptor (EGFR) gene.
162. The treatment method of claim 158, wherein the biomarker comprises a presence of clustered mutations in BRAF gene.
163. The treatment method of claim 158, wherein the biomarker comprises a presence of clustered mutations in KIT gene.
164. The treatment method of claim 158, wherein the biomarker comprises a presence of at least one of a microsatellite instable (MSI) defect or a mismatch repair (MMR) gene defect, wherein the MMR gene defect includes at least one of POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2.
165. The treatment method of claim 158, wherein the biomarker comprises a presence of hypermutator mutational signatures selected from: POLE including POLE and SBS6, SBS14, SBS15, SBS20, SBS21, SBS2, SBS26, SBS44.
166. The treatment method of claim 158, wherein the biomarker comprises a presence of high tumor mutational burden.
167. The treatment method of claim 158, wherein the biomarker comprises a presence of APOBEC alterations and mutational signature.
168. The treatment method of claim 158, wherein the biomarker comprises a presence of homologous recombination deficiency (HRD). Two commercial HRD companion diagnostic (CDx) tests, Myriad myChoice® CDx and FoundationOne® CDx, have been FDA approved to determine HRD by quantifying overall genomic instability in combination with BRCA1 and BRCA2 status, and at least three academic HRD detection approaches (SigMA, HRDetect, and CHORD) exist.
169. The treatment method of claim 158, wherein the biomarker comprises a presence of HRD negative (or homologous recombination proficiency, HRP) or HRD positive, for example, using genomic tests in Claim 169.
170. The treatment method of claim 158, wherein the biomarker comprises a presence of BRCA-1 and/or -2 mutations.
171. The treatment method of claim 158, wherein the biomarker comprises a presence of COSMIC3-BRCA mutational signature, comprising a specific pattern of genome-wide somatic single nucleotide variations (SNVs) defined as mutational signature 3 (Sig3) in COSMIC signature catalog, or a presence of genomic scar signatures.
172. The treatment method of claim 158, wherein the biomarker comprises a presence of genomic instability score (GIS), comprised of patterns (or signatures) of loss of heterozygosity (LOH), which are regions of intermediate size; number of telomeric imbalances (telomeric allelic imbalance, or TAI); and large-scale state transitions (LST), which are chromosome breaks (deletions, translocations, and inversions).
173. The treatment method of claim 158, wherein the biomarker comprises a presence of a homologous recombination feature set which comprises: a total number and proportions of deletions at microhomologies features of sequencing data, a total number and proportions of genomic segments with loss of heterozygosity features of the sequencing data, a total number and proportions of heterozygous genomic segments features of the sequencing data, a total number and proportions of C:G>T:A single base substitutions at a 5’-NpCpG-3’ contexts features of the sequencing data, or a combination thereof.
174. The treatment method of claim 158, wherein the biomarker comprises a presence of genomic alterations in one or more of homologous recombination repair (HRR)-related genes beyond BRCA1 and BRCA2, including at least one of: alterations in PALB2, BARD1, ATM, BRIP1, CHEK1/2, CDK12, ATR, ATRX, BAP1, ARID1A, FANC genes, RAD50, RAD51 genes, RAD52, RAD54 genes; HRR gene alterations in PPP2R2A, MRE11, MRE11A, NBN, TP53, NCOR1, PTK2, BLM, WRN, RPA1, EMSY, CCNE1, ERCC3, TAD54, XRCC2/3, HDAC2, NPM1, PTWN, H2AX, RPA; or PRK2, NF1.
175. The treatment method of claim 158, wherein the biomarker comprises a presence of actionable genomic alterations in one or more of genes including at least one of: ABL1, AKT1, APC, ALK, APC, BRAF, RET, ROS, KRAS, NRAS, HRAS, RAF1, KDR, MET, NTRK, NTRK1/2/3, CCNE, CCNE1, CDK4/6, CCND1/2, AR, PDGFRA, PIK3CA, PTEN, CDH1, CDKN2A, CSF1R, CTNNB1, DDR2, DNMT3A, EGFR, ERBB2, ERBB3, ERBB4, HER2/NEU, EZH2, FBXW7, FGF, FGFR, FGFR1, FGFR2, FGFR3, FLT3, FOXL2, GNA11, GNA13, GNAQ, GNAS, HNF1A, MLH1, MPL, MSH6, NOTCH1, VEGFA, HGF, NPM1, PTPN11, RB1, SMAD4, SMARCB1, SMO, SRC, STK11, TP53, TSC1, VHL, ESR1, MAPK3K1, GATA3, CDH1, FBXW7, NF1, KMT2C, CTNNB1, RRAS2, KIF1A, KIF5B, IDH1/2, JAK1/2/3, MAP2K1, MAP3K1, GATA3, PTPN11, SRC, SETBP1, FAT1, KEAP1, LRP1B, FAT3, NF1, or RB.
176. The treatment method of claim 163, wherein the patient has a gastrointestinal stromal tumor (GIST), and wherein the treatment method comprises not administering c-Kit inhibitor imatinib.
177. The treatment method of claim 163, wherein the patient has a GIST or another solid tumor; the treatment method comprising not administering c-Kit inhibitors in addition to imatinib, including Axitinib, Dovitinib, Dasatinib, Motesanib diphosphate, Pazopanib, Sunitinib, Masitinib, Vatalanib, Cabozantinib, Tivozanib, Amuvatinib, Telatinib, Pazopanib, Regorafenib, Ripretinib and Dovitinib.
178. The treatment method of any of claims 164-166, further comprising administering a treatment for the cancer comprising drug classes including: immune checkpoint inhibitors (ICIs) and other immunotherapies to the patient upon detection of at least one of MSI, MMR gene defects, hypermutator mutational signatures or high TMB of said sample, wherein the MMR gene defects include at least one of POLE, MLH1, MLH3, MGMT, MSH6, MSH3, MSH2, PMS1, or PMS2.
179. The treatment method of any of claims 164-166, further comprising administering a treatment for the cancer comprising drug classes including: PD-1 inhibitors, PD-L1 inhibitors, CTLA-4 inhibitors, LAG-3 inhibitors, TIM-3 inhibitors, other immunomodulator therapies alone or in combination with other ICIs or other drugs to the patient upon detection of at least one of MSI, MMR gene defects, hypermutator mutational signatures or high TMB of said sample.
180. The treatment method of any of claims 168-174, further comprising administering a treatment for the cancer comprising drug classes including: platinum drugs, poly-ADP ribose polymerase (PARP) inhibitors, and/or newer agents such as ATR, Wee1 or CHK, Pol-theta or RAD52 inhibitors to the patient upon detection of HRD or surrogate gene or signature, wherein said cancer comprises breast cancer, ovarian cancer, pancreatic adenocarcinoma, prostate cancer, sarcoma, or any solid tumor or a combination thereof.
181. The treatment method of any of claims 168-174, further comprising administering a treatment for the cancer comprising platinum drugs, including cisplatin, carboplatin, oxaliplatin, nedaplatin, lobaplatin, heptaplatin or satraplatin alone, or in combination with other drugs, e.g., FOLFOX, to the patient upon detection of HRD or surrogate gene or signature thereof.
182. The treatment method of any of claims 168-174, further comprising administering a treatment for the cancer comprising poly-ADP ribose polymerase (PARP) inhibitors, including four main PARP inhibitors: olaparib, niraparib, rucaparib, talazoparib, or PARP inhibitors to the patient upon detection of at least one of HRD or surrogate gene or signature thereof.
183. The treatment method of claim 158, wherein the treatment method comprises not administering immune checkpoint inhibitors (ICIs) and other immunotherapies to the patient upon detection of 9p deletions of said sample.
184. The treatment method of claim 158, wherein the biomarker comprises a presence of EGFR/ErbB1 mutations comprising one or more of L858R, exon 19del, and exon 20 alteration.
185. The treatment method of claim 184, further comprising administering, to the patient, afatinib, dacomitinib, erlotinib, gefitinib, osimertinib, or amivantamab.
186. The treatment method of claim 158, wherein the biomarker comprises a presence of HER2/ErbB2 amplification.
187. The treatment of claim 186, further comprising administering, to the patient, trastuzumab, ado-trastuzumab emtansine, lapatinib, margetuximab, neratinib, pertuzumab, tucatinib, deruxtecan, trastuzumab deruxtecan, or neratinib.
188. The treatment method of claim 158, wherein the biomarker comprises a presence of BRAF mutation.
189. The treatment method of claim 188, further comprising administering, to the patient, encorafenib, vemurafenib, dabrafenib, trametinib, or cobimetinib.
190. The treatment method of claim 158, wherein the biomarker comprises a presence of FGFR1/2/3 fusions.
191. The treatment method of claim 190, further comprising administering, to the patient, erdafitinib, futibatinib, infigratinib, pemigatinib, dovitinib, lenvatinib, pazopanib, ponatinib, or regorafenib.
192. The treatment method of claim 158, wherein the biomarker comprises presence of PDGFRA exon 18 mutations.
193. The treatment method of claim 192, further comprising administering, to the patient, avapritinib or dasatinib.
194. The treatment method of claim 158, wherein the biomarker comprises a presence of KIT mutations in GIST.
195. The treatment method of claim 194, further comprising administering, to the patient, imatinib, axitinib, dovitinib, dasatinib, motesanib diphosphate, pazopanib, sunitinib, masitinib, vatalanib, cabozantinib, tivozanib, amuvatinib, telatinib, pazopanib, regorafenib, ripretinib and dovitinib, or sorafenib.
196. The treatment method of claim 158, wherein the biomarker comprises a presence of NRG1 fusion.
197. The treatment method of claim 196, further comprising administering, to the patient, zenocutuzumab or seribantumab.
198. The treatment method of claim 158, wherein the biomarker comprises a presence of RET fusions.
199. The treatment method of claim 198, further comprising administering, to the patient, pralsetinib, selpercatinib, crizotinib, ceritinib, cabozantinib, or vandetanib.
200. The treatment method of claim 158, wherein the biomarker comprises a presence of ROS1 fusions.
201. The treatment method of claim 200, further comprising administering, to the patient, crizotinib or entrectinib.
202. The treatment method of claim 158, wherein the biomarker comprises a presence of NTRK1/2/3 fusions.
203. The treatment method of claim 202, further comprising administering, to the patient, entrectinib, larotrectinib, or repotrectinib.
204. The treatment method of claim 158, wherein the biomarker comprises a presence of ALK fusions.
205. The treatment method of claim 204, further comprising administering, to the patient, crizotinib, alectinib, brigatinib, ceritinib, or lorlatinib.
206. The treatment method of claim 158, wherein the biomarker comprises a presence of PIK3CA alterations.
207. The treatment method of claim 206, further comprising administering, to the patient, alpelisib, temsirolimus, or everolimus.
208. The treatment method of claim 158, wherein the biomarker comprises a presence of mTOR or TSC1/2 mutations.
209. The treatment method of claim 208, further comprising administering, to the patient, temsirolimus or everolimus.
210. The treatment method of claim 158, wherein the biomarker comprises a presence of Akt or PTEN alterations.
211. The treatment method of claim 210, further comprising administering, to the patient, capivasertib.
212. The treatment method of claim 158, wherein the biomarker comprises a presence of MET amplification or mutation.
213. The treatment method of claim 212, further comprising administering, to the patient, crizotinib, tepotinib, capmatinib, telisotuzumab, tepotinib, or savolitinib.
214. The treatment method of claim 158, wherein the biomarker comprises a presence of MEK mutation.
215. The treatment method of claim 214, further comprising administering, to the patient, trametinib, cobimetinib, or selumetinib.
216. The treatment method of claim 158, wherein the biomarker comprises a presence of NF1/2 alterations.
217. The treatment method of claim 216, further comprising administering, to the patient, trametinib, temsirolimus, everolimus, or selumetinib.
218. The treatment method of claim 158, wherein the biomarker comprises presence of STK11 alterations.
219. The treatment method of claim 218 comprising administering, to the patient, dasatinib, everolimus, temsirolimus, or bosutinib.
220. The treatment method of claim 158, wherein the biomarker comprises a presence of KDR alterations.
221. The treatment method of claim 220, further comprising administering, to the patient, pazopanib, regorafenib, or vandetanib.
222. The treatment method of claim 158, wherein the biomarker comprises a presence of microsatellite stable (MS) with DNA polymerase-ε (POLE) mutation, CD274 amplification, or 9p24.1 amplicon.
223. The treatment method of claim 222, further comprising administering ICIs to the patient.
224. The treatment method of claim 158, wherein the biomarker comprises a presence of MAP2K alterations.
225. The treatment method of claim 224, further comprising administering, to the patient, trametinib.
226. The treatment method of claim 158, wherein the biomarker comprises a presence of alterations to CCND2, CDK4, or CDKN2A/B.
227. The treatment method of claim 226, further comprising administering, to the patient, palbociclib.
228. The treatment method of claim 158, wherein the biomarker comprises a presence of IDH1 mutation.
229. The treatment method of claim 228, further comprising administering, to the patient, ivosidenib.
230. The treatment method of claim 158, wherein the biomarker comprises a presence of truncating or oncogenic mutations in B2M, PTEN, JAK1, JAK2, STK11 and EGFR, and/or 9p21 or 9p arm/genetic region loss.
231. The treatment method of claim 230, further comprising not administering, to the patient, an immune checkpoint inhibitor.
232. The treatment method of claim 158, wherein the biomarker comprises a presence of mutations in RAS genes KRAS and NRAS.
233. The treatment method of claim 232, further comprising not administering, to the patient, epidermal growth factor receptor (EGFR) therapies, including cetuximab and panitumumab, in colorectal cancer, and EGFR tyrosine kinase inhibitors, including erlotinib, in lung cancer.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263269033P | 2022-03-08 | 2022-03-08 | |
US63/269,033 | 2022-03-08 | ||
US202363483237P | 2023-02-03 | 2023-02-03 | |
US63/483,237 | 2023-02-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023172929A1 (en) | 2023-09-14 |
Family
ID=87935936
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/063887 WO2023172929A1 (en) | 2022-03-08 | 2023-03-07 | Artificial intelligence architecture for predicting cancer biomarkers |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023172929A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200258223A1 (en) * | 2018-05-14 | 2020-08-13 | Tempus Labs, Inc. | Determining biomarkers from histopathology slide images |
US20200365268A1 (en) * | 2019-05-14 | 2020-11-19 | Tempus Labs, Inc. | Systems and methods for multi-label cancer classification |
US20200388029A1 (en) * | 2017-11-30 | 2020-12-10 | The Research Foundation For The State University Of New York | System and Method to Quantify Tumor-Infiltrating Lymphocytes (TILs) for Clinical Pathology Analysis Based on Prediction, Spatial Analysis, Molecular Correlation, and Reconstruction of TIL Information Identified in Digitized Tissue Images |
US20210073986A1 (en) * | 2019-09-09 | 2021-03-11 | PAIGE.AI, Inc. | Systems and methods for processing images of slides to infer biomarkers |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200388029A1 (en) * | 2017-11-30 | 2020-12-10 | The Research Foundation For The State University Of New York | System and Method to Quantify Tumor-Infiltrating Lymphocytes (TILs) for Clinical Pathology Analysis Based on Prediction, Spatial Analysis, Molecular Correlation, and Reconstruction of TIL Information Identified in Digitized Tissue Images |
US20200258223A1 (en) * | 2018-05-14 | 2020-08-13 | Tempus Labs, Inc. | Determining biomarkers from histopathology slide images |
US20200365268A1 (en) * | 2019-05-14 | 2020-11-19 | Tempus Labs, Inc. | Systems and methods for multi-label cancer classification |
US20210073986A1 (en) * | 2019-09-09 | 2021-03-11 | PAIGE.AI, Inc. | Systems and methods for processing images of slides to infer biomarkers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23767626; Country of ref document: EP; Kind code of ref document: A1 |