US20200005901A1 - Cancer classifier models, machine learning systems and methods of use
- Publication number
- US20200005901A1 (application US16/458,589)
- Authority
- US
- United States
- Prior art keywords
- cancer
- patient
- biomarkers
- classifier model
- panel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G16B40/20—Supervised data analysis (under G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding)
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
- G06N3/045—Combinations of networks (under G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Definitions
- This application pertains generally to classifier models generated by a machine learning system and trained with longitudinal data, for identifying patients at increased risk of developing cancer, and the type of cancer, especially where the patient is otherwise asymptomatic or only vaguely symptomatic.
- Imaging and diagnostic tests have been introduced into medical practice in an attempt to help physicians detect cancer early. These include imaging modalities such as mammography, as well as diagnostic tests that identify cancer-specific "biomarkers" in the blood and other bodily fluids, such as the prostate-specific antigen (PSA) test.
- The value of many of these tests is often questioned, particularly with regard to whether the costs and risks associated with false positives, false negatives, and the like outweigh the potential benefits in terms of actual lives saved.
- Cancer detection poses significant technical challenges compared to detecting viral or bacterial infections because cancer cells, unlike viruses and bacteria, are biologically similar to normal, healthy cells and hard to distinguish from them. For this reason, tests used for the early detection of cancer often suffer from higher numbers of false positives and false negatives than comparable tests for viral or bacterial infections, or tests that measure genetic, enzymatic, or hormonal abnormalities. This often causes confusion among healthcare practitioners and their patients. In some cases it leads to unnecessary, expensive, and invasive follow-up testing; in other cases it leads to a complete disregard for follow-up testing, so that cancers are detected too late for useful intervention.
- Physicians and patients welcome tests that yield a binary decision or result, e.g., either the patient is positive or negative for a condition, such as observed in the over the counter pregnancy test kits which present, for example, an immunoassay result in the shape of a plus sign or a negative sign as an indication of pregnancy or not.
- a level not obtainable for most cancer tests such binary outputs can be highly misleading or inaccurate.
- Machine learning systems comprising diagnostic decision-support systems may use clinical decision formulas, rules, trees, or other processes for assisting a physician with making a diagnosis.
- decision-making systems have been developed, such systems are not widely used in medical practice because these systems suffer from limitations that prevent them from being integrated into the day-to-day operations of health organizations.
- decision-making systems may provide an unmanageable volume of data, rely on analysis that is marginally significant, and not correlate well with complex multimorbidity (Greenhalgh, T. Evidence based medicine: a movement in crisis? BMJ (2014) 348:g3725)
- patient data may be scattered across different computer systems in both structured and unstructured form.
- systems are difficult to interact with (Berner, 2006; Shortliffe, 2006).
- the entry of patient data is difficult, the list of diagnostic suggestions may be too long, and the reasoning behind diagnostic suggestions is not always transparent. Further, the systems are not focused enough on next actions, and do not help the clinician figure out what to do to help the patient (Shortliffe, 2006).
- Disclosed herein are classifier models, machine learning systems, computer implemented systems and methods thereof.
- a method in a computer-implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement one or more classifier models to predict an increased risk of having or developing cancer, for an asymptomatic patient, comprises obtaining measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample; obtaining clinical parameters corresponding to the patient including at least age and gender; classifying the patient into a risk category of having or developing cancer using a first classifier model, wherein the first classifier model is generated by a machine learning system using first training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator, for a population of patients; and, wherein the first classifier model classifies the patient in an increased risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the
- the machine learning system further comprises iteratively regenerating the first classifier model by training the first classifier model with new training data to improve the performance of the first classifier model.
- the classifier model is iteratively regenerated wherein the method further comprises obtaining one or more test results from the diagnostic testing which confirm or deny the presence of cancer in the patient; incorporating the one or more test results into the first training data for further training of the first classifier model of the machine learning system; and generating an improved first classifier model by the machine learning system.
- the training data used to train the classifier model generated by the machine learning system comprises a group of data from a group of patients with no cancer diagnosis three or more months after providing a sample. In certain other embodiments, the training data comprises a group of data from a group of patients with a cancer diagnosis three or more months after providing a sample.
- a method in a computer implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement one or more classifier models to predict an organ system-based malignancy for a patient with an increased risk of having or developing cancer, comprises:
- cancer classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients;
- cancer classifier model assigns the organ system class membership using input variables of age and the measured values of the panel of biomarkers from the patient;
- a method in a computer implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement one or more classifier models to predict an organ system-based malignancy for a patient with an increased risk of having or developing cancer, comprising:
- a machine learning system comprising at least one processor for predicting an organ system-based malignancy for a patient with an increased risk of having or developing cancer, wherein the processor is configured to:
- a) obtain measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample;
- e) provide a notification to a user for diagnostic testing of the patient.
- FIGS. 1A and 1B show Receiver Operating Characteristic (ROC) Curves for the best performing machine learning models, Ridge Logistic Regression (AUC 0.875, Youden Index 0.628) ( FIG. 1A ) and SVM model (AUC 0.816, Youden Index 0.631) ( FIG. 1B ) for male subject's likelihood of developing cancer within about 2 years from testing date. See Example 1 and Table 4.
- kNN pattern recognition algorithm
- FIG. 3 shows a table of input variables (biomarker measurements and age) for the classifier model and the classification of each patient into a risk category based on the output (probability value). See Example 3.
- FIG. 4 shows workflow for performing methods to predict an increased risk of having or developing cancer, for an asymptomatic patient using the present classifier models.
- FIGS. 5A and 5B show significant improvement of the present male classifier model for sensitivity and specificity ( FIG. 5A ) as compared to measurement of individual biomarkers (“any marker high” methods) for predicting cancer and the corresponding area under the curve (AUC) value of 0.87 ( FIG. 5B ). See Example 4.
- FIGS. 6A and 6B show the present male classifier model was able to distinguish cancers from noncancers with 82% sensitivity and 81% specificity with a threshold value of 0.5.
- FIGS. 7A and 7B show the present female classifier model is significantly better at predicting cancer development within one year than measurement of a panel of individual biomarkers from the same subjects ( FIG. 7A ) and corresponding AUC value of 0.67 ( FIG. 7B ).
- the present female classifier model is an improvement as compared to individual biomarker “single threshold” method wherein the sensitivity represents a 4-fold increase as compared to the single threshold method.
- the present female classifier model identifies 4× more cancers in female patients as compared to the conventional “any marker high” methods.
- FIGS. 8A and 8B show the present female classifier model was able to distinguish cancers from noncancers with 50% sensitivity and 74% specificity with a threshold value of 0.5.
- classifier models and their use with asymptomatic patients as to cancer for the early prediction of tumors and/or occult cancer.
- the classifier models were generated by a machine learning system using training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator, for a population of patients.
- the present classifier models were trained with biomarkers that were measured at least 3 months, if not longer, before patients received a diagnosis.
- training data comprises a group of data from a group of patients with no cancer diagnosis three or more months after providing a sample.
- the training data comprises a group of data from a group of patients with a cancer diagnosis three or more months after providing a sample. See Example 1A.
- the classifier models are “trained” using machine learning systems by building a model from inputs.
- Those inputs may be longitudinal data, wherein a known diagnosis of cancer (including matched controls) is determined months, if not years, after data from measured biomarkers and clinical factors of those patients is collected. See Example 1A and 2 for training of the present classifier models using longitudinal cancer patient data.
- the classifier model has a performance of a Receiver Operator Characteristic (ROC) curve with a sensitivity value of at least 0.8 and a specificity value of at least 0.8.
- a first classifier model generated by a machine learning system, that classifies a patient into a risk category of having or developing cancer.
- use of the classifier model classifies a patient in an increased risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the classifier model is above a threshold value.
- the classifier model classifies a patient in a low risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the classifier model is below a threshold value.
- the term “increased risk” refers to an increase for the presence, or development, of the cancer as compared to the known prevalence of that particular cancer across the population cohort. See Example 3.
- a second classifier model generated by a machine learning system, that classifies a patient into an organ system or specific cancer class membership.
- the second classifier model assigns the organ system or specific cancer class membership using input variables of age and the measured values of the panel of biomarkers from the patient.
- a patient is classified into an organ system or specific cancer class membership using a second classifier model, when the patient was classified into an increased risk category by the first classifier model, and wherein the second classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
- the classifier model is static, and its use is implemented by a computer-implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement the classifier model.
- a machine learning system iteratively regenerates the classifier model by training the classifier model with new training data to improve the performance of the classifier model.
- the present methods using a first classifier model and in a computer-implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement one or more classifier models to predict an increased risk of having or developing cancer, for an asymptomatic patient, comprise obtaining measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample, obtaining clinical parameters corresponding to the patient including at least age and gender, classifying the patient into a risk category of having or developing cancer using a first classifier model, wherein the first classifier model is generated by a machine learning system using first training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator, for a population of patients; and, wherein the first classifier model classifies the patient in an increased risk category using input variables of age and the measured values of a panel of bio
- the first classifier model yields a numerical risk score for each patient tested, which can be used by physicians to further inform screening procedures to better predict and diagnose early stage cancer in asymptomatic patients. Those patients classified into an increased risk category may be further classified using the second classifier model into a class membership. That class membership may be an organ system malignancy, or a specific cancer type. Also, as disclosed in more detail herein, the machine learning system is adapted to receive additional data as the system is used in a real-world clinical setting and to recalculate and improve the performance so that the classifier model becomes “smarter” the more it is used.
- the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.
- the term “about” is used to refer to an amount that is approximately, nearly, almost, or in the vicinity of being equal to or is equal to a stated amount, e.g., the stated amount plus/minus about 5%, about 4%, about 3%, about 2% or about 1%.
- asymptomatic refers to a patient or human subject that has not previously been diagnosed with the same cancer that their risk of having is now being quantified and categorized.
- human subjects may show signs such as coughing, fatigue, pain, etc., but have not been previously diagnosed with lung cancer but are now undergoing screening to categorize their increased risk for the presence of cancer and for the present methods are still considered “asymptomatic”.
- the term “AUC” refers to the Area Under the Curve, for example, of a ROC Curve. That value can assess the merit or performance of a test on a given sample population with a value of 1 representing a good test ranging down to 0.5 which means the test is providing a random response in classifying test subjects. Since the range of the AUC is only 0.5 to 1.0, a small change in AUC has greater significance than a similar change in a metric that ranges from 0 to 1 or 0 to 100%. When the % change in the AUC is given, it will be calculated based on the fact that the full range of the metric is 0.5 to 1.0.
- a variety of statistics packages can calculate AUC for a ROC curve, such as JMP™ or Analyse-It™.
- AUC can be used to compare the accuracy of the classification model across the complete data range. Classification models with greater AUC have, by definition, a greater capacity to classify unknowns correctly between the two groups of interest (disease and no disease).
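- As a minimal sketch of how an AUC can be computed from classifier outputs, the following Python function uses the rank-based (Mann-Whitney) formulation: the AUC equals the fraction of (disease, no disease) pairs in which the disease case receives the higher score, with ties counted as one half. The function name and the example scores are illustrative assumptions and are not data from this disclosure.

```python
def auc_from_scores(scores_disease, scores_no_disease):
    """AUC as the probability that a disease case outscores a no-disease case."""
    if not scores_disease or not scores_no_disease:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for d in scores_disease:
        for n in scores_no_disease:
            if d > n:
                wins += 1.0   # disease case ranked above the no-disease case
            elif d == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(scores_disease) * len(scores_no_disease))


# Hypothetical classifier outputs (probability values), for illustration only.
example_auc = auc_from_scores([0.9, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
```

- In this made-up example 11 of the 12 pairs are ordered correctly, so the AUC is 11/12. The pairwise loop is quadratic in the number of patients and is meant for clarity rather than for large cohorts.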
- biological sample and “test sample” refer to all biological fluids and excretions isolated from any given subject.
- samples include, but are not limited to, blood, blood serum, blood plasma, urine, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, bronchial and other lavage samples, or tissue extract samples.
- blood, serum, plasma and bronchial lavage or other liquid samples are convenient test samples for use in the context of the present methods.
- biomarker measure is information relating to a biomarker that is useful for characterizing the presence or absence of a disease. Such information may include measured values which are, or are proportional to, concentration, or that otherwise provide qualitative or quantitative indications of expression of the biomarker in tissues or biologic fluids.
- cancer and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth.
- examples of cancer include but are not limited to, lung cancer, breast cancer, colon cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, and brain cancer.
- the term “cohort” or “cohort population” refers to a group or segment of human subjects with shared factors or influences, such as age, family history, cancer risk factors, environmental influences, medical histories, etc.
- a “cohort” refers to a group of human subjects with shared cancer risk factors; this is also referred to herein as a “disease cohort”.
- a “cohort” refers to a normal population group matched, for example by age, to the cancer risk cohort; also referred to herein as a “normal cohort”.
- a “same cohort” refers to a group of human subjects having the same shared cancer risk factors as the individual undergoing assessment for a risk of having a disease such as cancer.
- machine learning refers to algorithms that give a computer the ability to learn without being explicitly programmed including algorithms that learn from and make predictions about data.
- Machine learning algorithms include, but are not limited to, decision tree learning, artificial neural networks (ANN) (also referred to herein as a “neural net”), deep learning neural network, support vector machines, rule base machine learning, random forest, logistic regression, pattern recognition algorithms, etc.
- linear regression or logistic regression can be used as part of a machine learning process.
- using linear regression or another algorithm as part of a machine learning process is distinct from performing a statistical analysis such as regression with a spreadsheet program such as Excel.
- the machine learning process has the ability to continually learn and adjust the classifier model as new data becomes available and does not rely on explicit or rules-based programming.
- Statistical modeling relies on finding relationships between variables (e.g., mathematical equations) to predict an outcome.
- Medical history refers to any type of medical information associated with a patient.
- the medical history is stored in an electronic medical records database.
- Medical history may include clinical data (e.g., imaging modalities, blood work, biomarkers, cancerous samples and control samples, labs, etc.), clinical notes, symptoms, severity of symptoms, number of years smoking, family history of a disease, history of illness, treatment and outcomes, an ICD code indicating a particular diagnosis, history of other diseases, radiology reports, imaging studies, reports, medical histories, genetic risk factors identified from genetic testing, genetic mutations, etc.
- the term “increased risk” refers to an increase in the risk level, for a human subject after analysis by the classifier model, for the presence, or development, of a cancer relative to a population's known prevalence of a particular cancer before testing.
- a human subject's risk for cancer before biomarker testing and/or data analysis may be 1% (based on the understood prevalence of cancer in the population), but after analysis using the classifier model the patient's risk for the presence of cancer may be 8% or alternatively reported as an increase of 8 times compared to the cohort.
- how the machine learning system calculates the 8% risk of having the cancer and the increased risk of 8 times relative to the population or cohort population is provided in more detail herein.
- markers refer to molecules that can be evaluated in a sample and are associated with a physical condition.
- markers include expressed genes or their products (e.g., proteins) or autoantibodies to those proteins that can be detected from human samples, such as blood, serum, solid tissue, and the like, and that are associated with a physical or disease condition.
- biomarkers include, but are not limited to, biomolecules comprising nucleotides, amino acids, sugars, fatty acids, steroids, metabolites, polypeptides, proteins (such as, but not limited to, antigens and antibodies), carbohydrates, lipids, hormones, antibodies, regions of interest which serve as surrogates for biological molecules, combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins) and any complexes involving any such biomolecules, such as, but not limited to, a complex formed between an antigen and an autoantibody that binds to an available epitope on said antigen.
- biomarker can also refer to a portion of a polypeptide (parent) sequence that comprises at least 5 consecutive amino acid residues, preferably at least 10 consecutive amino acid residues, more preferably at least 15 consecutive amino acid residues, and retains a biological activity and/or some functional characteristics of the parent polypeptide, e.g. antigenicity or structural domain characteristics.
- the present markers refer to both tumor antigens present on or in cancerous cells or those that have been shed from the cancerous cells into bodily fluids such as blood or serum.
- the present markers as used herein, also refer to autoantibodies produced by the body to those tumor antigens.
- a “marker” as used herein refers to both tumor antigens and autoantibodies that are capable of being detected in serum of a human subject. It is also understood in the present methods that use of the markers in a panel may each contribute equally in the classifier model or certain biomarkers may be weighted wherein the markers in a panel contribute a different weight or amount in the classifier model.
- Biomarker may include any biological substance indicative of the presence of cancer, including but not limited to, genetic, epigenetic, proteomic, glycomic or imaging biomarkers. Biomarkers include molecules secreted by tumors or cancer, including cell-free DNA, mRNA, and protein-based products (tumor markers or antigens), etc.
- pathology of (tumor) cancer includes all phenomena that compromise the well-being of the patient. This includes, without limitation, abnormal or uncontrollable cell growth, metastasis, interference with the normal functioning of neighboring cells, release of cytokines or other secretory products at abnormal levels, suppression or aggravation of inflammatory or immunological response, neoplasia, premalignancy, malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, etc.
- a “physiological sample” includes samples from biological fluids and tissues.
- Biological fluids include whole blood, blood plasma, blood serum, sputum, urine, sweat, lymph, and alveolar lavage.
- Tissue samples include biopsies from solid lung tissue or other solid tissues, lymph node biopsy tissues, biopsies of metastatic foci. Methods of obtaining physiological samples are well known.
- As used herein, the term “a positive predictive score,” “a positive predictive value,” or “PPV” refers to the likelihood that a score within a certain range on a biomarker test is a true positive result. It is defined as the number of true positive results divided by the number of total positive results. True positive results can be calculated by multiplying the test sensitivity times the prevalence of disease in the test population. False positives can be calculated by multiplying (1 − the specificity) times (1 − the prevalence of disease in the test population). Total positive results equal True Positives plus False Positives.
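- The PPV calculation described above can be sketched in Python as follows. The function name and the example inputs (0.8 sensitivity, 0.8 specificity, 1% prevalence) are assumptions chosen for illustration; only the arithmetic follows the definition given here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / (true positives + false positives)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    total_positives = true_positives + false_positives
    if total_positives == 0.0:
        raise ValueError("no positive results; PPV is undefined")
    return true_positives / total_positives


# Illustrative inputs only (assumed values).
ppv_example = positive_predictive_value(0.8, 0.8, 0.01)
```

- With these assumed inputs the PPV is roughly 4%, which illustrates how strongly a low prevalence limits the PPV even when sensitivity and specificity are both 0.8.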
- ROC curve: Receiver Operating Characteristic Curve
- ROC curves can be generated for a single feature as well as for other single outputs, for example, a combination of two or more features that are combined (such as, added, subtracted, multiplied, weighted, etc.) to provide a single combined value which can be plotted in a ROC curve.
- the ROC curve is a plot of the true positive rate (sensitivity) of a test against the false positive rate (1 − specificity) of the test.
- ROC curves provide another means to quickly screen a data set.
- performance of the present classifier models is determined using computed ROC curves with sensitivity and specificity values. The performance is used to compare models, and also importantly, to compare models with different variables to select a classifier model with the highest accuracy as to predicting having or developing cancer, for a patient.
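- As a hedged illustration of how sensitivity and specificity values might be computed at a given threshold when comparing models, the sketch below counts true and false positives and negatives and also reports the Youden Index (sensitivity + specificity − 1) quoted for FIGS. 1A and 1B. The function names, the convention that scores above the threshold are positive, and the example data are assumptions.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity); scores above threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity, specificity):
    """Youden Index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Hypothetical outputs and diagnostic indicators (1 = cancer, 0 = no cancer).
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
j = youden_index(sens, spec)
```

- Repeating the calculation over a range of thresholds yields the (1 − specificity, sensitivity) pairs that make up a ROC curve.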
- classifier models for classifying asymptomatic patients into a risk category for having or developing cancer and/or classifying a patient with an increased risk of having or developing cancer into an organ system-based malignancy class membership and/or into a specific cancer class membership.
- the machine learning system disclosed herein generated the present classifier models using longitudinal data from a cohort of over 12,000 asymptomatic male patients and over 15,000 asymptomatic female patients. See Example 1A and 2.
- biomarkers were measured, and follow-up of the patients was performed to provide a diagnostic indicator in the future (e.g. no cancer development, or diagnosis of a specific cancer).
- Using biomarkers obtained months, or even years, before cancer was detected provided a powerful tool to train the classifier models resulting in highly accurate classifier models as measured by ROC curve analysis.
- training data comprises data from a group of patients with no cancer diagnosis three or more months after providing a sample.
- training data comprises data from a group of patients with a cancer diagnosis three or more months after providing a sample.
- the cohort of asymptomatic female patients was used to train a classifier model to be used with female patients and the cohort of asymptomatic male patients was used to train a classifier model to be used with male patients.
- the gender of the patient is used to select the classifier model.
- training data comprises a greater number of patients without cancer than with cancer, wherein training of the classifier models comprises reprocessing the training data by using a stratified sampling technique to improve selection of negative samples.
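- One way such a stratified sampling step might look is sketched below: the negative (no cancer) records are grouped into strata and the same fraction is drawn from each stratum, so that the reduced negative set keeps the composition of the original. The choice of age decade as the stratum, the function names, and the synthetic records are all assumptions; the disclosure does not specify the strata used.

```python
import random
from collections import defaultdict


def stratified_negative_sample(negatives, stratum_of, fraction, seed=0):
    """Draw the same fraction of negative records from every stratum."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)          # fixed seed for reproducibility
    strata = defaultdict(list)
    for record in negatives:
        strata[stratum_of(record)].append(record)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))  # keep at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


def age_band(record):
    """Hypothetical stratum: decade of age."""
    return record["age"] // 10


# Synthetic negative records, for illustration only.
negatives = [{"id": i, "age": 40 + (i % 30)} for i in range(300)]
subset = stratified_negative_sample(negatives, age_band, fraction=0.1)
```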
- the classifier model has a performance of a Receiver Operator Characteristic (ROC) curve with a sensitivity value of at least 0.8 and a specificity value of at least 0.8.
- the machine learning system generates a classifier model that may be static.
- the classifier model is trained and then its use is implemented with a computer implemented system wherein patient data (e.g. biomarker measurements and age) are input and the classifier model provides an output that is used to classify patients.
- the classifier models are continuously, or routinely, being updated and improved wherein the input values, output values, along with a diagnostic indicator from patients are used to further train the classifier models.
- the classifier model has an improved performance of a Receiver Operator Characteristic (ROC) curve having a sensitivity value of at least 0.85 and a specificity value of at least 0.8.
- the classifier model is further trained and improved by the machine learning system comprising (1) obtaining one or more test results from the diagnostic testing which confirm or deny the presence of cancer in the patient, (2) incorporating the one or more test results into the training data for further training of the classifier model of the machine learning system; and (3) generating an improved classifier model by the machine learning system.
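- The three steps above can be sketched as a small retraining loop. This is a minimal illustration under stated assumptions: the class name, the `train` interface, and the placeholder trainer `toy_train` (which simply predicts the observed cancer fraction) are hypothetical stand-ins for whatever learning algorithm is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple

# (input variables, diagnostic indicator): 1 = cancer confirmed, 0 = cancer denied
Record = Tuple[Tuple[float, ...], int]
Model = Callable[[Sequence[float]], float]


@dataclass
class RetrainableClassifier:
    train: Callable[[List[Record]], Model]  # any training routine (assumed interface)
    training_data: List[Record] = field(default_factory=list)
    model: Optional[Model] = None
    version: int = 0

    def regenerate(self) -> None:
        """Step (3): generate an improved model from the current training data."""
        self.model = self.train(list(self.training_data))
        self.version += 1

    def add_test_result(self, inputs: Sequence[float], cancer_confirmed: bool) -> None:
        """Steps (1) and (2): take a diagnostic result and add it to the training data."""
        self.training_data.append((tuple(inputs), 1 if cancer_confirmed else 0))
        self.regenerate()


def toy_train(records: List[Record]) -> Model:
    """Placeholder trainer: predicts the observed cancer fraction for every patient."""
    fraction = sum(label for _, label in records) / len(records)
    return lambda inputs: fraction


# Hypothetical records: (age, one biomarker value) with a diagnostic indicator.
clf = RetrainableClassifier(
    train=toy_train,
    training_data=[((55.0, 2.1), 1), ((48.0, 1.0), 0),
                   ((61.0, 1.4), 0), ((39.0, 0.8), 0)],
)
clf.regenerate()
before = clf.model((50.0, 1.2))
clf.add_test_result((58.0, 3.3), cancer_confirmed=True)
after = clf.model((50.0, 1.2))
```

- Retraining after every single result is used here only to keep the sketch short; a deployed system might instead regenerate the model in batches.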
- diagnostic testing comprises radiography screening or tissue biopsy.
- this first classifier model is generated by a machine learning system using training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator, for a population of patients.
- the first classifier model was trained using data from only a male cohort or a female cohort.
- the training data comprises values of a panel of at least six biomarkers.
- the training data comprises values from a panel of biomarkers selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
- a first classifier model is generated by a machine learning system using training data that comprises a male cohort only, values of a panel of six biomarkers comprising AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
- a first classifier model is generated by a machine learning system using training data that comprises a female cohort only, values of a panel of seven biomarkers comprising AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
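- The gender-specific panels listed above can be captured in a small configuration, sketched here in Python. The dictionary layout, the function name, the ordering of the input vector (age first), and the example measurements are assumptions for illustration; only the marker names per panel come from the text.

```python
MALE_PANEL = ("AFP", "CEA", "CA19-9", "CYFRA21-1", "PSA", "SCC")
FEMALE_PANEL = ("AFP", "CEA", "CA125", "CA19-9", "CA 15-3", "CYFRA21-1", "SCC")
PANELS = {"male": MALE_PANEL, "female": FEMALE_PANEL}


def build_input_vector(gender, age, measurements):
    """Return [age, marker values...] in panel order for the given gender."""
    panel = PANELS.get(gender)
    if panel is None:
        raise ValueError(f"no panel defined for gender {gender!r}")
    missing = [marker for marker in panel if marker not in measurements]
    if missing:
        raise ValueError(f"missing biomarker values: {missing}")
    return [float(age)] + [float(measurements[marker]) for marker in panel]


# Hypothetical measured values, for illustration only.
male_vector = build_input_vector(
    "male", 52,
    {"AFP": 3.1, "CEA": 1.8, "CA19-9": 12.0,
     "CYFRA21-1": 1.5, "PSA": 0.9, "SCC": 0.7},
)
```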
- the first classifier model classifies the patient in an increased risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the first classifier model is above a threshold. In embodiments, the first classifier model classifies the patient in a low (e.g., no increased risk) risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the first classifier model is below a threshold.
- the output is a probability value, wherein the threshold is set to separate patients into a low risk category (those patients wherein their risk is no more than the population reflective of the training data) from an increased risk category (those patients with an increased risk of having or developing cancer as compared to a population reflective of the training data). See Example 3 and FIG. 3 .
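- A minimal sketch of the thresholding step follows. The default threshold of 0.5 mirrors the value quoted for FIGS. 6 and 8; treating an output exactly equal to the threshold as low risk is an assumption, as are the function name and category labels.

```python
def classify_risk(probability, threshold=0.5):
    """Map a classifier output (probability value) to a risk category."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Outputs above the threshold fall in the increased risk category;
    # outputs at or below it are treated as low risk (assumed tie rule).
    return "increased risk" if probability > threshold else "low risk"


category_high = classify_risk(0.72)
category_low = classify_risk(0.18)
```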
- the increased risk category may be further subdivided, such as a moderate risk category and a high-risk category.
- those patients classified into an increased risk category may be assigned a risk score, such as a percent, e.g., X of 100, or multiplier number.
- a patient may be assigned a 2 to 10% risk score (of having or developing cancer) wherein the incidence of cancer in the population used to train the classifier model is about 1%.
- those percentage risk scores may be presented as X of 100, e.g. 3 out of 100 wherein a patient with that score has an approximately 3 out of 100 risk of developing cancer within one year from when the biomarkers were measured.
- a threshold cut off wherein a risk score at or below would be considered normal, and a risk score above would be considered an increased risk.
- the threshold cut off value may be 1 out of 100, corresponding to a “normal” risk of having cancer in a heterogeneous population of 1%.
- the patient may be assigned a multiplier number.
- the risk score is not an output value, but a value assigned to a risk category, such as an increased risk category, wherein the output value is used to classify a patient into the risk category.
- an output value is a predicted probability value that may range from 0 to 1, wherein that value is used to classify a patient into a risk category. The risk score assigned to a risk category is then calculated by comparing the predicted probability assigned to a risk category to the prevalence of cancer in a population. See Example 3.
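- The comparison of a risk category's predicted probability with the population prevalence can be sketched as below, returning both the “X of 100” form and the multiplier form. The function name is hypothetical; the example reuses the 8% risk and 1% prevalence figures given earlier in this description.

```python
def risk_score(category_probability, prevalence):
    """Return (risk per 100 patients, multiplier relative to prevalence)."""
    if not 0.0 <= category_probability <= 1.0:
        raise ValueError("category_probability must be between 0 and 1")
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be greater than 0 and at most 1")
    per_100 = category_probability * 100.0
    multiplier = category_probability / prevalence
    return per_100, multiplier


# 8% risk in the category versus 1% prevalence in the population.
per_100, multiplier = risk_score(0.08, 0.01)
```

- Here the score would be reported as about 8 out of 100, or equivalently as an 8-fold increase over the population prevalence.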
- a patient may have an increased risk of having or developing cancer selected from the group consisting of: breast cancer, bile duct cancer, bone cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, kidney cancer, liver or hepatocellular cancer, lobular carcinoma, lung cancer, melanoma, ovarian cancer, pancreatic cancer, prostate cancer, skin cancer, and testicular cancer.
- the classifier model is selected based on the gender of the patient.
- the input variables for a male patient comprise measured values from a panel of at least six biomarkers and age.
- the panel of biomarkers is selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
- the input variable for a male patient comprises measured values from AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
- the input variables for a female patient comprise measured values from a panel of at least six biomarkers and age.
- the input variables for a female patient comprise measured values from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
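A minimal sketch of assembling the gender-specific input variables listed above is given here. The panel contents come from the text; the function name, the dictionary layout, and the measurement values are hypothetical.

```python
MALE_PANEL = ("AFP", "CEA", "CA19-9", "CYFRA21-1", "PSA", "SCC")
FEMALE_PANEL = ("AFP", "CEA", "CA125", "CA19-9", "CA 15-3", "CYFRA21-1", "SCC")


def build_inputs(gender, age, measurements):
    """Return the ordered input vector: panel biomarker values, then age."""
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    panel = MALE_PANEL if gender == "male" else FEMALE_PANEL
    missing = [name for name in panel if name not in measurements]
    if missing:
        raise KeyError(f"missing biomarker values: {missing}")
    return [float(measurements[name]) for name in panel] + [float(age)]


# Hypothetical measurements, for illustration only.
sample = {"AFP": 3.1, "CEA": 2.4, "CA19-9": 11.0,
          "CYFRA21-1": 1.8, "PSA": 0.9, "SCC": 0.7}
male_inputs = build_inputs("male", 62, sample)
print(male_inputs)
```

Keeping the panel order fixed means that the same position in the vector always refers to the same biomarker, which a trained model would rely on.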
- the first classifier model comprises a support vector machine, a decision tree, a random forest, a neural network, a deep learning neural network, or a logistic regression algorithm.
- a second classifier model to predict at least one most likely organ system malignancy and/or a specific cancer.
- the second classifier model is applied to patients that are classified into an increased risk category for having or developing cancer.
- the second classifier model was trained with measured biomarkers from a longitudinal study, and age, wherein one classifier model was trained from and for female patients and another classifier model was trained from and for male patients.
- the second classifier model was generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
- the second classifier model was trained using data from only a male cohort or only a female cohort.
- the training data comprises values of a panel of at least six biomarkers.
- the training data comprises values from a panel of biomarkers selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
- a second classifier model is generated by a machine learning system using training data that comprises a male cohort only, values of a panel of six biomarkers comprising AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
- a second classifier model is generated by a machine learning system using training data that comprises a female cohort only, values of a panel of seven biomarkers comprising AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
- the second classifier model has a performance of a Receiver Operator Characteristic (ROC) curve with a sensitivity value of at least 0.8 and a specificity value of at least 0.7.
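As an illustrative sketch of checking the stated performance at one operating point, sensitivity and specificity can be computed from known diagnoses and model predictions. The labels below are made-up data and the function names are assumptions.

```python
def sensitivity_specificity(actual, predicted):
    """Compute sensitivity and specificity from binary labels (1 = cancer)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def meets_criteria(sensitivity, specificity, min_sens=0.8, min_spec=0.7):
    """Check the operating point against minimum performance values."""
    return sensitivity >= min_sens and specificity >= min_spec


# Made-up example: 5 cancer cases (4 detected), 10 controls (8 cleared).
actual = [1] * 5 + [0] * 10
predicted = [1, 1, 1, 1, 0] + [0] * 8 + [1, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec, meets_criteria(sens, spec))
```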
- the second classifier model assigns a patient into an organ system class membership using input variables of age and the measured values of the panel of biomarkers from the patient. In certain embodiments, the second classifier model assigns a patient into a specific cancer class membership using input variables of age and the measured values of the panel of biomarkers from the patient. In embodiments, the class membership is for an organ system selected from genitourinary (GU), gastrointestinal (GI), pulmonary, dermatological, hematological, nervous system, gynecological, or general. See Example 3.
- the class membership is for a cancer selected from breast cancer, bile duct cancer, bone cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, kidney cancer, liver or hepatocellular cancer, lobular carcinoma, lung cancer, melanoma, ovarian cancer, pancreatic cancer, prostate cancer, skin cancer, or testicular cancer.
- the second classifier model is selected based on the gender of the patient.
- the input variables for a male patient comprise measured values from a panel of at least six biomarkers and age.
- the panel of biomarkers is selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
- the input variable for a male patient comprises measured values from AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
- the input variables for a female patient comprise measured values from a panel of at least six biomarkers and age.
- the input variables for a female patient comprise measured values from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
- the second classifier model comprises a pattern recognition algorithm.
- the second classifier model comprises k-Nearest Neighbors algorithm (kNN).
- the second classifier model comprises a support vector machine, a decision tree, a random forest, a neural network, a deep learning neural network, or a logistic regression algorithm.
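For the k-Nearest Neighbors option mentioned above, a minimal sketch of assigning an organ system class membership by majority vote among the closest training records might look as follows. The two-dimensional feature vectors and labels are invented, and real inputs would typically be scaled first.

```python
import math
from collections import Counter


def knn_predict(train_x, train_labels, query, k=3):
    """Return the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label)
        for x, label in zip(train_x, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Invented, already-scaled feature vectors for illustration only.
train_x = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (0.90, 0.80), (0.80, 0.90), (0.85, 0.95)]
train_labels = ["GI", "GI", "GI", "pulmonary", "pulmonary", "pulmonary"]

print(knn_predict(train_x, train_labels, (0.2, 0.2)))
print(knn_predict(train_x, train_labels, (0.9, 0.9)))
```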
- a machine learning system comprising at least one processor for predicting an increased risk for cancer, and/or an organ system-based malignancy, and/or a specific cancer.
- the processor is configured to obtain measured values of a panel of biomarkers in a sample from a patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample, obtain clinical parameters from the patient including age and gender, and generate a first classifier model by the machine learning system to classify the patient into a risk category of having or developing cancer, wherein the first classifier model classifies a patient into an increased risk category when the output of the first classifier model is greater than a threshold, and wherein the first classifier model is generated by the machine learning system using training data that comprises values from a panel of at least two biomarkers, age, gender and a diagnostic indicator for a population of patients.
- the training data is from a longitudinal study wherein the biomarker measurements are obtained months, or years, before a cancer diagnosis is confirmed (or not) for a patient in the training data cohort.
- the processor is configured to obtain measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample; obtain clinical parameters from the patient including age and gender, and generate a second classifier model by the machine learning system to classify the patient into an organ system class membership, wherein the second classifier model assigns the organ system class membership using input variables of age and the measured values of the panel of biomarkers from the patient, and wherein the second classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
- the processor is configured to obtain measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample; obtain clinical parameters from the patient including age and gender, and generate a second classifier model by the machine learning system to classify the patient into a specific cancer class membership, wherein the second classifier model assigns the specific cancer class membership using input variables of age and the measured values of the panel of biomarkers from the patient, and wherein the second classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
- a panel of markers from an asymptomatic human subject may be measured.
- gene expression e.g., mRNA
- resulting gene products e.g., polypeptides or proteins
- tumor antigens e.g. CEA, CA-125, PSA, etc.
- testing is preferably conducted using an automated immunoassay analyzer from a company with a large installed base.
- Representative analyzers include the Elecsys® system from Roche Diagnostics or the Architect® Analyzer from Abbott Diagnostics. Using such standardized platforms permits the results from one laboratory or hospital to be transferable to other laboratories around the world.
- the methods provided herein are not limited to any one assay format or to any particular set of markers that comprise a panel. For example, PCT International Pat. Pub. No. WO 2009/006323; US Pub. No. 2012/0071334; US Pat. Pub. No. 2008/0160546; US Pat. Pub. No. 2008/0133141; and US Pat. Pub. No. 2007/0178504 (each herein incorporated by reference) teach a multiplex lung cancer assay using beads as the solid phase and fluorescence or color as the reporter in an immunoassay format. Hence, the degree of fluorescence or color can be provided in the form of a qualitative score as compared to an actual quantitative value of reporter presence and amount.
- the presence and quantification of one or more antigens or antibodies in a test sample can be determined using one or more immunoassays that are known in the art.
- Immunoassays typically comprise: (a) providing an antibody (or antigen) that specifically binds to the biomarker (namely, an antigen or an antibody); (b) contacting a test sample with the antibody or antigen; and (c) detecting the presence of a complex of the antibody bound to the antigen in the test sample or a complex of the antigen bound to the antibody in the test sample.
- Well known immunological binding assays include, for example, an enzyme linked immunosorbent assay (ELISA), which is also known as a “sandwich assay”, an enzyme immunoassay (EIA), a radioimmunoassay (RIA), a fluoroimmunoassay (FIA), a chemiluminescent immunoassay (CLIA), a counting immunoassay (CIA), a filter media enzyme immunoassay (MEIA), a fluorescence-linked immunosorbent assay (FLISA), agglutination immunoassays and multiplex fluorescent immunoassays (such as the Luminex Lab MAP), immunohistochemistry, etc.
- the immunoassay can be used to determine a test amount of an antigen in a sample from a subject.
- a test amount of an antigen in a sample can be detected using the immunoassay methods described above. If an antigen is present in the sample, it will form an antibody-antigen complex with an antibody that specifically binds the antigen under suitable incubation conditions as described herein. The amount, activity, or concentration, etc. of an antibody-antigen complex can be determined by comparing the measured value to a standard or control.
- the AUC for the antigen can then be calculated using known techniques, such as, but not limited to, a ROC analysis.
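One common way to obtain an AUC from a ROC analysis is the rank-based (Mann-Whitney) formulation, sketched below. This is offered as one possible technique, not necessarily the one used by the Applicants, and the antigen values are invented.

```python
def roc_auc(values_cancer, values_control):
    """AUC as the chance a cancer sample scores above a control sample.

    Ties between a cancer value and a control value count as one half.
    """
    wins = 0.0
    for c in values_cancer:
        for n in values_control:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(values_cancer) * len(values_control))


# Invented antigen levels for illustration only.
cancer = [5.1, 7.3, 9.0]
control = [2.0, 5.1, 3.3, 1.2]
auc = roc_auc(cancer, control)
print(round(auc, 4))
```

An AUC of 0.5 indicates no discrimination between the groups, and 1.0 indicates complete separation.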
- gene expression of markers (e.g., mRNA) is measured in a sample from a human subject.
- gene expression profiling methods for use with paraffin-embedded tissue include quantitative reverse transcriptase polymerase chain reaction (qRT-PCR); however, other technology platforms, including mass spectroscopy and DNA microarrays, can also be used. These methods include, but are not limited to, PCR, Microarrays, Serial Analysis of Gene Expression (SAGE), and Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS).
- the sample from the human subject is a tissue section such as from a biopsy.
- the sample from the human subject is a bodily fluid such as blood, serum, plasma or a part or fraction thereof.
- the sample is blood or serum and the markers are proteins measured therefrom.
- the sample is a tissue section and the markers are mRNA expressed therein. Many other combinations of sample forms from the human subjects and the form of the markers are contemplated.
- a panel can be selected, or as was done by the present Applicants, a panel can be selected based on measurement of individual markers in longitudinal clinical samples wherein a panel is generated based on empirical data for a desired disease such as cancer.
- examples of biomarkers include molecules detectable, for example, in a body fluid sample, such as antibodies, antigens, small molecules, proteins, hormones, enzymes, genes and so on.
- the use of tumor antigens has many advantages due to their widespread use over many years and the fact that validated and standardized detection kits are available for many of them for use with the aforementioned automated immunoassay platforms.
- a panel of biomarkers is selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
- the panel of biomarkers is selected from anti-p53, anti-NY-ESO-1, anti-ras, anti-Neu, anti-MAPKAPK3, cytokeratin 8, cytokeratin 19, cytokeratin 18, CEA, CA125, CA15-3, CA19-9, Cyfra 21-1, serum amyloid A, proGRP and α1-anti-trypsin (US 20120071334; US 20080160546; US 20080133141; US 20070178504 (each herein incorporated by reference)).
- Additional tumor markers include human epididymal protein 4; calcitonin, PAP, BR 27.29, Her-2; and HE-4.
- Autoantibodies that are proposed to be circulating markers for lung cancer include p53, NY-ESO-1, CAGE, GBU4-5, Annexin 1, SOX2 and IMPDH, phosphoglycerate mutase, ubiquillin, Annexin I, Annexin II, and heat shock protein 70-9B (HSP70-9B).
- a panel of markers comprises markers associated with a cancer selected from bile duct cancer, bone cancer, pancreatic cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, liver or hepatocellular cancer, ovarian cancer, testicular cancer, lobular carcinoma, prostate cancer, and skin cancer or melanoma.
- a panel of markers comprises markers associated with breast cancer.
- a panel of biomarkers comprises markers associated with “pan cancer”.
- the patients were tested with the following biomarkers: AFP, CA 15-3, CA125, PSA, SCC, CEA, CA 19-9, and CYFRA 21-1 using kits available from Roche Diagnostics, Abbott Diagnostics, and Siemens Healthcare Diagnostics.
- the sensitivity of the panel for identifying the four most commonly diagnosed malignancies in that region was 90.9%, 75.0%, 100% and 76%, respectively.
- Subjects with at least one of the markers showing values above the cut-off point were considered positive for the assay. No algorithm was reported. Moreover, neither clinical parameters nor biomarker velocity were factored in with this test.
- the methods and machine learning systems according to the present invention can improve and enhance the pan-cancer biomarker panel reported by the Taiwanese group and readily permit its use in other parts of the world.
- an algorithm that combines biomarker values with clinical parameters could be employed that automatically improves using the machine learning software.
- a panel can comprise any number of markers as a design choice, seeking, for example, to maximize specificity or sensitivity of the classifier model.
- the present methods may ask for presence of at least one of two or more biomarkers, three or more biomarkers, four or more biomarkers, five or more biomarkers, six or more biomarkers, seven or more biomarkers, eight biomarkers or more as a design choice.
- the panel of biomarkers may comprise at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine or at least ten or more different markers. In one embodiment, the panel of biomarkers comprises about two to ten different markers. In another embodiment, the panel of biomarkers comprises about four to eight different markers. In yet another embodiment, the panel of markers comprises about six or about seven different markers.
- a sample is committed to the assay and the results can be a range of numbers reflecting the presence and level (e.g., concentration, amount, activity, etc.) of each of the biomarkers of the panel in the sample.
- each marker in the panel is measured and normalized wherein none of the markers are given any specific weight. In this instance each marker has a weight of 1.
- the choice of the markers may be based on the understanding that each marker, when measured and normalized, contributed unequally as an input variable for the classifier model.
- a particular marker in the panel can either be weighted as a fraction of 1 (for example if the relative contribution is low), a multiple of 1 (for example if the relative contribution is high) or as 1 (for example when the relative contribution is neutral compared to the other markers in the panel).
- a machine learning system may analyze values from biomarker panels without normalization of the values.
- the raw value obtained from the instrumentation to make the measurement may be analyzed directly.
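The weighting scheme described above can be illustrated with a short sketch. Min-max scaling against reference bounds is assumed here purely for illustration, since the text does not specify a normalization method; the bounds, weights, and values are hypothetical.

```python
def normalize(value, low, high):
    """Min-max scale a raw measurement given assumed reference bounds."""
    return (value - low) / (high - low)


def weighted_inputs(raw, bounds, weights=None):
    """Normalize each marker and apply its weight (default weight is 1)."""
    weights = weights or {}
    return {
        name: weights.get(name, 1.0) * normalize(value, *bounds[name])
        for name, value in raw.items()
    }


# Hypothetical raw values, reference bounds, and weights.
raw = {"CEA": 5.0, "PSA": 2.0}
bounds = {"CEA": (0.0, 10.0), "PSA": (0.0, 4.0)}

equal = weighted_inputs(raw, bounds)
unequal = weighted_inputs(raw, bounds, {"CEA": 2.0, "PSA": 0.5})
print(equal, unequal)
```

Skipping the call to `normalize` and passing raw instrument values straight through corresponds to the embodiment in which values are analyzed without normalization.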
- Primary care healthcare practitioners, who may include physicians specializing in internal medicine or family practice as well as physician assistants and nurse practitioners, are among the users of the techniques disclosed herein. These primary care providers typically see a large volume of patients each day. In one instance these patients are at risk for lung cancer due to smoking history, age, and other lifestyle factors. In 2012 about 18% of the U.S. population were current smokers and many more were former smokers with a lung cancer risk profile above that of a population that has never smoked.
- a blood sample from a patient, such as a patient 50 years of age or older, is sent to a laboratory qualified to test the sample using a panel of biomarkers, such as those used to train the present classifier models generated by a machine learning system.
- suitable bodily fluids such as sputum or saliva might also be utilized.
- the measured values of the biomarkers are then used as input values, along with age, to be used with the first classifier model in a computer implemented system.
- An output value is obtained and compared to a threshold value wherein the threshold is empirically determined and set to separate patients in a low risk category from those in an increased risk category for having or developing cancer.
- the threshold value is empirically determined using longitudinal clinical data. If the risk calculation is to be made at the point of care, rather than at the laboratory, a software application compatible with mobile devices (e.g. a tablet or smart phone) may be employed.
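One way a threshold could be derived empirically from longitudinal data is to pick the smallest classifier output that keeps a target fraction of cancer-free patients in the low risk category. The sketch below assumes that approach; the target specificity and the output values are hypothetical, and the disclosure does not state that the threshold was chosen this way.

```python
def threshold_for_specificity(non_cancer_outputs, target_specificity):
    """Smallest observed output t with share(outputs <= t) >= target.

    Patients with outputs above t would be placed in the increased
    risk category, so the share at or below t is the specificity.
    """
    if not 0.0 < target_specificity <= 1.0:
        raise ValueError("target specificity must be in (0, 1]")
    ordered = sorted(non_cancer_outputs)
    total = len(ordered)
    for t in ordered:
        at_or_below = sum(1 for v in ordered if v <= t)
        if at_or_below / total >= target_specificity:
            return t
    raise ValueError("no outputs supplied")


# Hypothetical classifier outputs for patients who remained cancer-free.
outputs = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.41, 0.55, 0.70]
threshold = threshold_for_specificity(outputs, 0.8)
print(threshold)
```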
- the input variables of measured biomarkers and age may be used with the second classifier model in a computer implemented system.
- An output value is obtained and compared to the longitudinal clinical data used to train the second classifier model and assigned a class membership, wherein the class memberships are organ systems.
- the class membership is further defined by a specific cancer type, e.g. lung cancer.
- Embodiments of the present invention further provide for an apparatus for assessing a subject's risk level for the presence of cancer and correlating the risk level with an increase or decrease of the presence of cancer after testing relative to a population or a cohort population.
- the apparatus may comprise a processor configured to execute computer readable media instructions (e.g., a computer program or software application, such as a machine learning system) to receive the concentration values from the evaluation of biomarkers in a sample and, in combination with other risk factors (e.g., medical history of the patient, publicly available sources of information pertaining to a risk of developing cancer, etc.), determine a risk score and compare it to a grouping of stratified cohort population comprising multiple risk categories.
- the apparatus can take any of a variety of forms, for example, a handheld device, a tablet, or any other type of computer or electronic device.
- the apparatus may also comprise a processor configured to execute instructions (e.g., a computer software product, an application for a handheld device, a handheld device configured to perform the method, a world-wide-web (WWW) page or other cloud or network accessible location, or any computing device).
- the apparatus may include a handheld device, a tablet, or any other type of computer or electronic device for accessing a machine learning system provided as a software as a service (SaaS) deployment.
- the correlation may be displayed as a graphical representation, which, in some embodiments, is stored in a database or memory, such as a random access memory, read-only memory, disk, virtual memory, etc.
- Other suitable representations, or exemplifications known in the art may also be used.
- the apparatus may further comprise a storage means for storing the correlation, an input means, and a display means for displaying the status of the subject in terms of the particular medical condition.
- the storage means can be, for example, random access memory, read-only memory, a cache, a buffer, a disk, virtual memory, or a database.
- the input means can be, for example, a keypad, a keyboard, stored data, a touch screen, a voice-activated system, a downloadable program, downloadable data, a digital interface, a hand-held device, or an infrared signal device.
- the display means can be, for example, a computer monitor, a cathode ray tube (CRT), a digital screen, a light-emitting diode (LED), a liquid crystal display (LCD), an X-ray, a compressed digitized image, a video image, or a hand-held device.
- the apparatus can further comprise or communicate with a database, wherein the database stores the correlation of factors and is accessible to the user.
- the apparatus is a computing device, for example, in the form of a computer or hand-held device that includes a processing unit, memory, and storage.
- the computing device can include or have access to a computing environment that comprises a variety of computer-readable media, such as volatile memory and non-volatile memory, removable storage and/or non-removable storage.
- Computer storage includes, for example, RAM, ROM, EPROM & EEPROM, flash memory or other memory technologies, CD ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other medium known in the art to be capable of storing computer-readable instructions.
- the computing device can also include or have access to a computing environment that comprises input, output, and/or a communication connection.
- the input can be one or several devices, such as a keyboard, mouse, touch screen, or stylus.
- the output can also be one or several devices, such as a video display, a printer, an audio output device, a touch stimulation output device, or a screen reading output device.
- the computing device can be configured to operate in a networked environment using a communication connection to connect to one or more remote computers.
- the communication connection can be, for example, a Local Area Network (LAN), a Wide Area Network (WAN) or other networks and can operate over the cloud, a wired network, wireless radio frequency network, and/or an infrared network.
- Artificial intelligence systems include computer systems configured to perform tasks usually accomplished by humans, e.g., speech recognition, decision making, language translation, image processing and recognition, etc.
- artificial intelligence systems have the capacity to learn, to maintain and access a large repository of information, to perform reasoning and analysis in order to make decisions, as well as the ability to self-correct.
- Artificial intelligence systems may include knowledge representation systems and machine learning systems.
- Knowledge representation systems generally provide structure to capture and encode information used to support decision making.
- Machine learning systems are capable of analyzing data to identify new trends and patterns in the data.
- machine learning systems may include neural networks, induction algorithms, genetic algorithms, etc. and may derive solutions by analyzing patterns in data.
- the present classifier models comprise an algorithm such as a support vector machine, a decision tree, a random forest, a neural network, a deep learning neural network, a logistic regression or a pattern recognition algorithm.
- the present classifier models may be used to classify an individual patient into one of a plurality of categories, e.g., a category indicative of a likelihood of cancer or a category indicating that cancer is not likely.
- Inputs to the classifier model may include a panel of biomarkers associated with the presence of cancer as well as clinical parameters. See Example 3.
- clinical parameters include one or more of the following: (1) age; (2) gender; (3) smoking history in years; (4) number of packs per year; (5) symptoms; (6) family history of cancer; (7) concomitant illnesses; (8) number of nodules; (9) size of nodules; and (10) imaging data and so forth.
- the clinical parameter used as an input value is age, wherein gender is used to train the classifier model, providing a classifier model for male patients and a separate classifier model for female patients.
- the clinical parameters include smoking history in years, number of packs per year, and age.
- the panel of biomarkers comprises any two, any three, any four, any five, any six, any seven, any eight, any nine, or any ten biomarkers.
- the panel of biomarkers comprises two or more biomarkers selected from the group consisting of: AFP, CA125, CA 15-3, CA 19-9, CEA, CYFRA 21-1, HE-4, NSE, Pro-GRP, PSA, SCC, anti-Cyclin E2, anti-MAPKAPK3, anti-NY-ESO-1, and anti-p53.
- the panel of biomarkers comprises CA 19-9, CEA, CYFRA 21-1, NSE, Pro-GRP, and SCC. In still other embodiments, the panel of biomarkers comprises AFP, CA125, CA 15-3, CA-19-9, CEA, HE-4, and PSA. In yet other embodiments, the panel of biomarkers comprises AFP, CA125, CA 15-3, CA-19-9, Calcitonin, CEA, PAP, and PSA. In other embodiments, the panel of biomarkers comprises AFP, BR 27.29, CA125II, CA 15-3, CA-19-9, Calcitonin, CEA, Her-2, and PSA.
- Support vector machines (SVMs) are supervised learning models that analyze data for classification and regression analysis.
- SVMs may plot a collection of data points in n-dimensional space (e.g., where n is the number of biomarkers and clinical parameters), and classification is performed by finding a hyperplane that can separate the collection of data points into classes.
- hyperplanes are linear, while in other embodiments, hyperplanes are non-linear.
- SVMs are effective in high dimensional spaces, are effective in cases in which the number of dimensions is higher than the number of data points, and generally work well on data sets with clear margins of separation.
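The role of the separating hyperplane can be illustrated with the prediction step of a linear SVM: once weights and a bias have been learned, a point is classified by which side of the hyperplane it falls on. The sketch below covers only that step, not training, and the weights, bias, and feature values are hypothetical.

```python
def svm_decision(weights, bias, features):
    """Signed score w . x + b; its sign gives the side of the hyperplane."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def svm_classify(weights, bias, features):
    """Return 1 for points on the non-negative side of the hyperplane."""
    return 1 if svm_decision(weights, bias, features) >= 0 else 0


# Hypothetical learned parameters for a two-feature model.
weights, bias = [1.0, -1.0], -0.5

print(svm_classify(weights, bias, [2.0, 0.5]))
print(svm_classify(weights, bias, [0.5, 1.0]))
```

A non-linear hyperplane would replace the dot product with a kernel evaluation against the support vectors.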
- Decision trees are a type of supervised learning algorithm also used in classification problems. Decision trees may be used to identify the most significant variable that provides the best homogenous sets of data. Decision trees split groups of data points into one or more subsets, and then may split each subset into one or more additional categories, and so forth until forming terminal nodes (e.g., nodes that do not split). Various algorithms may be used to decide where a split occurs, including a Gini Index (a type of binary split), Chi-Square, Information Gain, or Reduction in Variance. Decision trees have the capability to rapidly identify the most significant variables among a large number of variables, as well as identify relationships between two or more variables. Additionally, decision trees can handle both numerical and non-numerical data. This technique is generally considered to be a non-parametric approach, e.g., the data does not have to fit a normal distribution.
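As a minimal sketch of how a split point might be chosen using the Gini criterion named above, the following computes the weighted Gini impurity for each candidate cut on a single variable and keeps the lowest. The marker values and labels are invented.

```python
def gini_impurity(labels):
    """Gini impurity of a group: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_gini(values, labels, cut):
    """Weighted impurity of the two groups formed by value <= cut."""
    left = [lab for v, lab in zip(values, labels) if v <= cut]
    right = [lab for v, lab in zip(values, labels) if v > cut]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + \
           (len(right) / n) * gini_impurity(right)


def best_cut(values, labels):
    """Candidate cut (an observed value) with the lowest weighted impurity."""
    candidates = sorted(set(values))[:-1]  # the largest value splits nothing
    return min(candidates, key=lambda c: split_gini(values, labels, c))


# Invented marker values with 1 = cancer, 0 = no cancer.
values = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]
cut = best_cut(values, labels)
print(cut, split_gini(values, labels, cut))
```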
- Random forest (or random decision forest) is a suitable approach for both classification and regression.
- the random forest method constructs a collection of decision trees with controlled variance.
- a number of variables (nvar) less than M is used to split groups of data points. The best split is selected and the process is repeated until reaching a terminal node.
- Random forest is particularly suited to process a large number of input variables (e.g., thousands) to identify the most significant variables. Random forest is also effective for estimating missing data.
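The randomization steps of a random forest can be sketched as below: each tree sees a bootstrap sample of the rows, each split considers only nvar of the M variables, and the trees vote on the final class. This sketch assumes M denotes the total number of input variables, which the text does not state explicitly; tree growing itself is omitted and all names are illustrative.

```python
import random
from collections import Counter


def bootstrap_rows(n_rows, rng):
    """Sample row indices with replacement to build one tree's data."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def split_candidates(m_total, nvar, rng):
    """Pick nvar of the M input variables to consider at a single split."""
    if not 0 < nvar < m_total:
        raise ValueError("nvar must be positive and less than M")
    return sorted(rng.sample(range(m_total), nvar))


def forest_vote(tree_predictions):
    """Combine per-tree class predictions by majority vote."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
rows = bootstrap_rows(10, rng)
features = split_candidates(m_total=8, nvar=3, rng=rng)
print(rows, features, forest_vote(["cancer", "no cancer", "cancer"]))
```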
- Neural nets, also referred to as artificial neural nets (ANNs), are described throughout this application.
- a neural net, which is a non-deterministic machine learning technique, utilizes one or more layers of hidden nodes to compute outputs. Inputs are selected and weights are assigned to each input. Training data is used to train the neural networks, and the inputs and weights are adjusted until reaching specified metrics, e.g., a suitable specificity and sensitivity.
- ANNs may be used to classify data in cases in which correlation between dependent and independent variables is not linear or in which classification cannot be easily performed using an equation. More than 25 different types of ANNs exist, with each ANN yielding different results based on different training algorithms, activation/transfer functions, number of hidden layers, etc. In some embodiments, more than 15 types of transfer functions are available for use with the neural network. Prediction of the likelihood of having cancer is based upon one or more of the type of ANN, the activation/transfer function, the number of hidden layers, the number of neurons/nodes, and other customizable parameters.
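To make the roles of weights, hidden nodes, and the activation/transfer function concrete, the following sketch computes the output of a network with a single hidden layer. A sigmoid transfer function is assumed for illustration, and the weights are arbitrary placeholders rather than trained values.

```python
import math


def sigmoid(z):
    """Logistic transfer function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer to a single output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Placeholder parameters: 3 inputs, 2 hidden nodes, 1 output.
inputs = [0.4, 0.7, 0.1]
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = 0.05

probability = forward(inputs, hidden_weights, hidden_biases,
                      output_weights, output_bias)
print(probability)
```

Training would adjust the weights and biases until the resulting classifications reach the desired sensitivity and specificity.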
- Deep learning neural networks, another machine learning technique, are similar to regular neural nets, but are more complex (e.g., typically have multiple hidden layers) and are capable of automatically performing operations (e.g., feature extraction), generally requiring less interaction with a user than a traditional neural net.
- inputs may be selected in order to improve the performance of the classifier model. For example, rather than picking the set of inputs that achieves the highest possible sensitivity with a clinically relevant specificity such as 80% or greater, the inputs are selected to reach a sensitivity threshold (e.g., 80% or greater), and once reaching this threshold, the inputs are selected to optimize performance of the classifier model, thereby improving the performance of the classifier model.
- a set of data comprising a plurality of patient records is stored in a memory accessible by the classifier model or machine learning system, wherein each patient record includes a plurality of parameters and corresponding values for a patient, and wherein the set of data also includes a diagnostic indicator indicating whether or not the patient has been diagnosed with cancer.
- the plurality of parameters includes various biomarkers, clinical factors and other factors which may be selected as inputs into the classifier model.
- the diagnostic indicator is an affirmative indicator that the patient has cancer, e.g., a lung X-ray and/or biopsy confirming a diagnosis of cancer.
- a subset of the plurality of parameters is selected for inputs into the machine learning system, wherein the subset includes a panel of at least two different biomarkers and at least one clinical parameter, such as age.
- the set of data (e.g. longitudinal) is randomly partitioned into training data and validation data.
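A random partition of patient records into training and validation data might look like the following sketch. The 30% validation fraction and the fixed seed are assumptions for illustration only; the source does not state a split ratio.

```python
import random


def partition_records(records, validation_fraction=0.3, seed=0):
    """Randomly split patient records into (training, validation) lists.

    validation_fraction and seed are illustrative assumptions.
    """
    rng = random.Random(seed)      # local generator, so the split is reproducible
    shuffled = list(records)       # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]
```

Every record lands in exactly one of the two lists, so the validation data stays unseen during training.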
- the classifier model is generated using the machine learning system based on the training data, the subset of inputs and other parameters associated with the machine learning system as described herein. It is determined whether the classifier meets certain performance criteria, such as a predetermined Receiver Operator Characteristic (ROC) statistic, specifying a sensitivity and a specificity, for correct classification of patients. In embodiments, the specificity is at least 80% and the sensitivity is at least 75%. See Example 1A and 2.
- the classifier may be iteratively regenerated based on the training data and a different subset of inputs until the classifier meets the pre-determined ROC statistic.
- a static configuration of the classifier may be generated. This static configuration may be deployed to a physician's office for use in identifying patients at risk of having lung cancer or stored on a remote server that can be accessed by the physician's office.
- the classifier model may be validated using the validation data.
- the validation data also includes a plurality of parameters and corresponding values for a patient, and includes a diagnostic indicator indicating whether or not the patient has been diagnosed with cancer.
- the validation data may be classified using the classifier model, and it may be determined whether the classifier meets the predetermined performance criteria such as a ROC statistic based on this data.
- the classifier may be iteratively regenerated based on the training data and a different subset of the plurality of parameters, until the regenerated classifier meets the predetermined ROC statistic. The validation process may then be repeated.
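The regenerate-until-the-criteria-are-met loop described above can be sketched as follows. The callable `train_and_score` is hypothetical: it stands in for whatever trains a classifier on the chosen inputs and returns its (sensitivity, specificity). The exhaustive search over subsets is also an assumption, since the source does not say how successive subsets are chosen. The default thresholds follow the 75% sensitivity and 80% specificity mentioned in the text, and the minimum size of three reflects a panel of at least two biomarkers plus one clinical parameter.

```python
from itertools import combinations


def select_inputs(candidates, train_and_score,
                  min_sensitivity=0.75, min_specificity=0.80, min_size=3):
    """Try input subsets until one meets the performance criteria.

    candidates:      names of available parameters (biomarkers, age, ...)
    train_and_score: hypothetical callable taking a tuple of input names and
                     returning (sensitivity, specificity) for the classifier
                     trained on those inputs
    Returns (subset, sensitivity, specificity), or None if no subset qualifies.
    """
    for size in range(min_size, len(candidates) + 1):
        for subset in combinations(candidates, size):
            sensitivity, specificity = train_and_score(subset)
            if sensitivity >= min_sensitivity and specificity >= min_specificity:
                return subset, sensitivity, specificity
    return None
```

In practice the same check would be repeated on the validation data before the classifier is frozen into a static configuration.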
- a user may enter input values corresponding to a patient into the computing device.
- the patient may then be classified, using the static classifier, into a risk category indicative of a likelihood of having cancer or into another risk category indicative of a likelihood of not having cancer.
- the system may then send a notification to the user (e.g., a physician) recommending additional diagnostic testing (e.g., a CT scan, a chest x-ray or biopsy) when the patient is classified into the category indicative of a likelihood of having cancer.
- the classifier model generated by the machine learning system may be continuously trained over time. Test results obtained from the diagnostic testing, which confirm or deny the presence of cancer, may be incorporated into the training data set for further training of the machine learning system, and to generate an improved classifier by the machine learning system.
- the values of a panel of biomarkers in a sample from a patient are measured.
- a classifier model is generated by a machine learning system to classify the patient into a risk category for having or developing cancer, wherein the classifier model has a performance of a ROC curve with a sensitivity of at least 80% and a specificity of at least 80%, and wherein the classifier is generated using the panel of biomarkers comprising at least two different biomarkers, and at least one clinical parameter, such as age.
- a notification to a user for diagnostic testing is provided.
- the risk category for having or developing cancer may be further categorized into qualitative groups (e.g. high, low, medium, etc.) for the likelihood of having cancer, or into quantitative groups (e.g. a percentage, multiplier, risk score, composite score) of the likelihood of having cancer.
- a second classifier model is generated by a machine learning system to assign patients to an organ system and/or specific cancer class membership, wherein the classifier model has a performance of a ROC curve with a sensitivity of at least 70% and a specificity of at least 80%, and wherein the classifier is generated using the panel of biomarkers comprising at least two different biomarkers, and at least one clinical parameter, such as age.
- a notification to a user for diagnostic testing is provided.
- a computer implemented method for predicting a risk of having or developing cancer in a subject using a computer system having one or more processors coupled to a memory storing one or more computer readable instructions for execution by the one or more processors, the one or more computer readable instructions comprising instructions for: storing a set of data comprising a plurality of patient records, each patient record including a plurality of parameters for a patient, and wherein the set of data also includes a diagnostic indicator indicating whether or not the patient has been diagnosed with cancer; selecting a plurality of parameters for inputs into a machine learning system, wherein the parameters include a panel of at least two different biomarker values and at least one type of clinical data; and generating a classifier using the machine learning system, wherein the classifier comprises a sensitivity of at least 70% and a specificity of at least 80%, and wherein the classifier is based on a subset of the inputs.
- the machine learning system may have the capability to deploy improved predictions on a scheduled basis.
- the techniques used by the machine learning system to determine risk may remain static for a period of time, allowing consistency with regard to determination of a risk score.
- the machine learning system may deploy updated techniques that incorporate analysis of new data to produce an improved risk score.
- the machine learning systems described herein may operate: (1) in a static manner; (2) in a semi-static manner, in which the classifier is updated according to a prescribed schedule (e.g., at a specific time); or (3) in a continuous manner, being updated as new data is available.
- Example 1A Development of a Multi-Marker Model for Classifying Asymptomatic Patients as to Developing Cancer: “Pan Cancer” Test
- a multi-marker classification model and method for identifying asymptomatic patients with an increased risk for developing cancer. Patients can be categorized as “low”, “medium/moderate” or “high” risk for developing cancer, wherein the ranges for those categories may be based on, for example, the probability of developing cancer within 6 months to a year, wherein the probability is measured against the baseline level of cancer in the heterogeneous population. It is understood in the art that the rate of cancer is about 1% in the general population. The prevalence of cancer in the cohort used to develop the present Pan Cancer test was about 1.5%. See the below examples for more detail on the use of the test and probability values.
- the development of the classifier model, and the selection of markers, may be based on a combination of accuracy, area under the curve (AUC), sensitivity, specificity values, and/or Youden index (Sensitivity + Specificity − 1) that provide a measure of the performance of the classifier model.
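The performance measures named above can be computed from the four cells of a confusion matrix. The sketch below is illustrative: the function name and the dictionary layout are assumptions, while the formulas are the standard definitions, with the Youden index taken as sensitivity plus specificity minus 1.

```python
def performance(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute classifier performance measures from confusion-matrix counts.

    tp/fn: cancer patients classified as positive/negative
    tn/fp: non-cancer patients classified as negative/positive
    """
    sensitivity = tp / (tp + fn)                  # fraction of cancers detected
    specificity = tn / (tn + fp)                  # fraction of non-cancers cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)    # fraction classified correctly
    youden = sensitivity + specificity - 1.0      # 0 = no better than chance, 1 = perfect
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden": youden,
    }
```

Because the Youden index weights sensitivity and specificity equally, it gives a single number for comparing candidate variable sets.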
- the development and continued learning by the classifier model of the Pan Cancer Test was performed using longitudinal data and/or retrospective data over a 12-year period wherein biomarkers were measured (along with gender and age), statistical analysis performed, and that data correlated to those individuals that developed cancer. From that, a model comprising an algorithm was generated and trained to identify those individuals with an increased risk of developing cancer over the following 6 months to a year. The same principle is applied to continually increase the accuracy of the model, wherein individuals and their biomarker measurements are added to the cohort and further train the model.
- the present “pan cancer” model was developed using data from 12,622 asymptomatic males and 15,316 asymptomatic females who had sera biomarkers measured based on a tumor marker panel over a 12-year period in Taiwan.
- the male cohort had a panel of six markers measured (AFP, CEA, CA19-9, CYFRA21-1, SCC, and PSA) and the female cohort had a panel of seven markers measured (AFP, CEA, CA19-9, CA125, CA15-3, SCC, and CYFRA21-1). All tumor markers were measured using commercially available in vitro diagnostic (IVD) kits and instrumentation manufactured by either Roche or Abbott Diagnostics. All assays of tumor markers met the requirements of the College of American Pathologists (CAP) Laboratory Accreditation Program. Outcome data were obtained from a cancer registry to determine whether each patient had received a new diagnosis of malignancy within 1 year of the tumor marker test.
- the biomarkers AFP, CEA, CA19-9, CYFRA21-1, SCC and PSA were measured for all 12,622 male individuals, and the biomarkers AFP, CEA, CA19-9, CA125, CA15-3, SCC, and CYFRA21-1 were measured for all 15,316 female individuals.
- a variable selection process was applied to select robust variables from those serum tumor markers to design cancer detection models. The accuracy, sensitivity, specificity, AUC (area under the curve), and Youden index were compared to select the best machine learning models.
- the Youden index was used as a performance indicator for selecting the variables used in the classifier models in this study.
- the ML models are amenable to periodic review and redefinition. Using a larger data set by combining the US and Asian cohorts, the accuracy of the pan cancer model may be further improved for females by leveraging additional data and expanding the number of clinical factor predictors. It is also possible, without wishing to be bound by a theory, that a model for females may optionally account for fluctuations in hormones, such as during pregnancy or menstrual cycles, to further improve performance.
- the developed pan cancer model can be applied to the panel of measured biomarkers, along with age and gender, to determine the likelihood that an individual is at risk for developing cancer.
- the time frame for developing cancer is a few months, such as within 3 months, and up to about 2 years.
- the “likelihood” an individual is at risk for developing cancer is a probability above background that the individual tested will develop cancer within a few months to about 2 years.
- an individual may be classified as “moderate risk” wherein their probability of developing cancer is five times (5×) more than baseline, wherein baseline is about 1% in the general population.
- a tested individual that is classified as “moderate risk” has a 5% risk of developing cancer, as compared to a “low risk” individual that has a 1% risk of developing cancer over that same time period.
- individuals identified as “moderate risk” or “high risk” may then be selected for further analysis for predicting organ system-based malignancy for a patient with an increased risk of having cancer.
- individuals with a probability above 0.5 (50%) using the selected model of Table 5 were classified as “moderate risk” or “high risk”.
- Individuals with a probability value below 0.5 (50%) were classified as “low risk”.
- the performance of the selected models had a sensitivity value of 0.82 and a specificity value of 0.81.
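To relate these figures to the "X out of 100" risk levels used elsewhere in the document, the positive predictive value of a binary test can be derived from sensitivity, specificity and prevalence with Bayes' rule. The sketch below is an illustration only: the function name is an assumption, and plugging in the cohort prevalence of about 1.5% together with the sensitivity and specificity stated above is an illustrative calculation, not a result reported in the source.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a patient who tests positive actually has cancer.

    Bayes' rule: true positives / (true positives + false positives),
    with both terms expressed as fractions of the screened population.
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Illustrative inputs: sensitivity 0.82, specificity 0.81, prevalence 0.015.
ppv = positive_predictive_value(0.82, 0.81, 0.015)
```

With these inputs the value comes out at roughly 6 in 100, which shows why a low-prevalence screening population yields modest positive predictive values even at good sensitivity and specificity.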
- a method for predicting an increased risk of having cancer for an asymptomatic patient comprising measuring values of a panel of biomarkers in a sample from a patient; obtaining clinical parameters from the patient including age and gender; utilizing a classifier generated by a machine learning system to classify the patient into a low risk, moderate risk or high risk category of having or developing cancer, wherein the classifier provides a probability value and those individuals with a probability of 0.5 or greater are classified as moderate risk or high risk, and wherein the classifier is generated using a panel of at least six biomarkers, age, gender and a diagnostic indicator from a plurality of patient records and wherein the classifier has a performance based on a Receiver Operator Characteristic (ROC) curve of a sensitivity value of at least 0.8 and a specificity value of at least 0.8; and providing a notification to a user for diagnostic testing.
- the present classifier model comprises the following importance factor for each variable, and for each gender.
- Example 1B Improvement of a Multi-Marker Model for Classifying Asymptomatic Patients as to Developing Cancer: Inclusion of Clinical Factor “Age” in Model
- the classifier model using only measured sera biomarkers helped 1 in 125-200 males whereas 1 in 4-7 were harmed (false diagnosis); and, 1 in 200-333 females were helped whereas 1 in 3-8 females were harmed.
- age was used in the present classifier model along with the measured sera biomarkers AFP, CEA, CA19-9, CYFRA 21-1 and SCC along with PSA for men and CA 15-3 and CA125 for women.
- Table 1 shows a comparison of various models that includes all 6 biomarkers (AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC) and age, wherein the classifier model performance was significantly increased with a sensitivity value of at least 0.8 and a specificity value of at least 0.8 (of a ROC curve).
- Example 2 Development of a Model for Predicting Organ System-Based Malignancy for Individuals in the “High Risk” and “Moderate Risk” Category Based on the Pan Cancer Test
- Provided herein are techniques for predicting organ system-based malignancy for a patient with an increased risk of having cancer as identified in Example 1. That information can then be used to refer patients to a specialist for more invasive diagnostic testing.
- a k-Nearest Neighbors (kNN) algorithm was used to determine the top three most likely organs to develop cancer in the “moderate risk” or “high risk” classified groups. The performance of the test had a sensitivity value of 81% and a specificity value of 72%.
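One possible reading of the kNN step is that the three training patients closest to the new patient each contribute their organ-system label. The sketch below follows that reading; it is not the source's implementation. The Euclidean distance, the value k=3, the function name, and the data layout are all assumptions, and in practice the biomarker values would likely need to be put on a common scale first.

```python
import math


def nearest_organ_systems(query, training_points, k=3):
    """Return the organ-system labels of the k nearest training patients.

    query:           feature vector for the new patient (biomarkers, age, ...)
    training_points: list of (feature_vector, organ_system_label) pairs
    Labels are returned nearest first and may repeat.
    """
    ranked = sorted(training_points, key=lambda item: math.dist(item[0], query))
    return [label for _, label in ranked[:k]]
```

Repeated labels in the result indicate agreement among the neighbors on a single organ system.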
- a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer comprising: measuring values of a panel of biomarkers in a sample from a patient; obtaining clinical parameters from the patient including age and gender; utilizing a machine learning system to classify patient with an increased risk of having or developing cancer into an appropriate category, to identify at least one most likely organ system malignancy for that patient, wherein the classifier provides a class membership, and wherein the classifier is generated using a panel of at least six biomarkers, age, gender and a diagnostic indicator from a plurality of patient records and wherein the classifier has a performance based on a Receiver Operator Characteristic (ROC) curve of a sensitivity value of at least 0.8 and a specificity value of at least 0.7; and, providing a notification to a user for diagnostic testing.
- Example 3 Screening Patients for Likelihood of Developing Cancer and Predicting Most Likely Organ Involved in Cancer Using a Two-Step Model
- a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer wherein a model trained from the cohort in Example 1 is applied to the measured panel of biomarkers and the clinical factors of age and gender to identify those patients with an increased risk of having or developing cancer; the pan cancer test.
- the model trained using the cohort of Example 2 is applied to the measured panel of biomarkers and the clinical factors of age and gender to provide a class membership (e.g., the organ system, or the top 2 or 3 organ systems, most likely to be involved in the cancer); the organ system-based malignancy test.
- the trained model predicts the top three organ systems.
- the output of the model may provide a class membership in one organ system (wherein the top three organ systems are all the same), in two organ systems (wherein two of the top three organ systems are the same) or in three organ systems (wherein the top three organ systems predicted by the model are all different). See Table 6 for a list of organ systems (class membership) and representative cancer types within each class.
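Collapsing three predictions into one, two or three class memberships amounts to removing duplicates while keeping the original ranking. A minimal sketch, with a hypothetical function name and hypothetical labels:

```python
def class_membership(top_three):
    """Collapse the top three predicted organ systems into distinct classes.

    Order is preserved, so the first entry is still the highest-ranked system.
    """
    # dict.fromkeys keeps the first occurrence of each label, in order.
    return list(dict.fromkeys(top_three))
```

For instance, three identical predictions yield a single class membership, while three different predictions yield three.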
- asymptomatic patients (5 male and 3 female) were first screened using the pan cancer test according to Example 1, and then those categorized as moderate or high risk were further screened using the organ system-based malignancy test according to Example 2.
- Health History Hypertension, Diabetes, Chronic Pancreatitis, Colorectal Polyps, Crohn's Disease, Ulcerative Colitis, COPD, Chronic Bronchitis, Emphysema, etc.
- Cancer screening history colonoscopy, sigmoidoscopy, mammogram, X-Ray or CT scan for Lung cancer, PAP/HPV test
- for a male patient, a probability value categorized as low risk means that less than 1% of individuals with a probability value in that range will likely be found to have cancer. That risk level is no different than that of the general heterogeneous population; in other words, the low risk category represents no increased risk for a male patient as compared to baseline.
- for a male patient, a probability value categorized as moderate risk means that approximately 5 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having biomarkers measured. That risk level is an approximately 5% chance of having or developing cancer within one year, or a five times (5×) increase as compared to the low risk category.
- a probability value categorized as high risk means that approximately 10 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having those biomarkers measured. That risk level is an approximately 10% chance of having or developing cancer within one year, or a ten times (10×) increase as compared to the low risk category.
- the current iteration of the application of the pan cancer test model provides the following probability ranges for each category for female patients:
- for a female patient, a probability value categorized as low risk means that less than 1% of individuals with a probability value in that range will likely be found to have cancer. That risk level is no different than that of the general heterogeneous population; in other words, the low risk category represents no increased risk for a female patient as compared to baseline.
- for a female patient, a probability value categorized as moderate risk means that approximately 2 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having biomarkers measured. That risk level is an approximately 2% chance of having or developing cancer within one year, or a two times (2×) increase as compared to the low risk category.
- for a female patient, a probability value categorized as high risk means that approximately 8 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having those biomarkers measured. That risk level is an approximately 8% chance of having or developing cancer within one year, or an eight times (8×) increase as compared to the low risk category.
- the trained pattern recognition model of Example 2 was applied to the high and moderate risk male patients and the high-risk female patient. Those same variables of FIG. 3 were used as input for the organ system-based malignancy test model.
- the output, a class membership of an organ system that represents a group of cancer types, may be used to suggest a specialist for follow-up care that may include radiography or invasive diagnostic tests.
- a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer that utilizes a two-step machine learning process wherein a first machine learning model is applied using measured sera biomarkers and age as input variables, wherein gender is used to select the measured biomarkers and to train the classifier, to categorize patients as low risk (no increased risk) or moderate or high risk wherein the latter two categories represent an increased risk of having or developing cancer within one year as compared to baseline (low risk). For those patients categorized as moderate or high risk a second machine learning classifier is applied using the measured biomarkers, age and gender as input variables and providing a class membership for an organ system that represents a number of different cancer types.
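The two-step process can be sketched as a small pipeline in which the second classifier runs only for patients flagged by the first. Both models are passed in as callables and are hypothetical stand-ins, as are the function name and the result layout. The 0.5 cutoff follows the probability threshold stated in the text, and the sketch does not distinguish moderate from high risk because the source gives no separate cutoff in this passage.

```python
def two_step_screen(patient, pan_cancer_model, organ_model, threshold=0.5):
    """Apply the pan cancer test, then the organ-system test if warranted.

    patient:          mapping of input variables (biomarkers, age, gender)
    pan_cancer_model: hypothetical callable returning a probability in [0, 1]
    organ_model:      hypothetical callable returning organ-system labels
    """
    probability = pan_cancer_model(patient)
    if probability < threshold:
        # Low risk: no increased risk over baseline, so step two is skipped.
        return {"risk": "low", "probability": probability, "organ_systems": None}
    # Moderate or high risk: identify the most likely organ systems.
    return {
        "risk": "increased",
        "probability": probability,
        "organ_systems": organ_model(patient),
    }
```

Skipping the second model for low-risk patients mirrors the text, where only moderate and high risk patients receive an organ-system class membership.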
- a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer comprising: a) measuring values of a panel of biomarkers in a sample from a patient; b) obtaining clinical parameters from the patient including age and gender; c) utilizing a first classifier generated by a machine learning system to classify the patient into a low risk, moderate risk or high risk of having or developing cancer, wherein the classifier provides a probability value and those individuals with a probability of 0.5 or greater are classified as moderate risk or high risk, and wherein the classifier is generated using a panel of at least six biomarkers, age, gender and a diagnostic indicator from a plurality of patient records; utilizing a second classifier generated by a machine learning system, when a patient is classified into a medium or high risk category of developing cancer in step c), to identify at least one most likely organ system malignancy for that patient, wherein the classifier provides a class membership, and wherein the classifier is generated using a panel of
- the machine learning system comprises one or more machine learning processors.
- the machine learning processors are deep learning processors.
- the one or more deep learning processors train one or more classification models using training data.
- the machine learning system generates one or more classifiers to predict a likelihood of having cancer or developing cancer, of class membership, or of both.
- the machine learning model may comprise one or more classifiers, one or more inputs, and one or more weighting factors for weighting of the inputs, along with one or more classification models.
- the machine learning model may be continuously improved as new training data is available.
- Example 4 Male Classifier Model is Superior to a Single Threshold Method of Measuring Biomarkers for Prediction of Cancer
- Provided herein is a demonstration that the present male classifier model, as developed in Example 1, is significantly better at predicting cancer development within one year than measurement of a panel of individual biomarkers from the same subjects.
- the present methods and classifier models aggregate biomarker measurements and clinical factors, such as age, to predict a patient's cancer risk, whereas previous methods may measure the same panel of markers but predict, or deem a patient an increased risk for developing cancer, if any one measured biomarker is “high”.
- any one biomarker above a threshold deemed to be clinically relevant would indicate a positive test for an increased risk of developing cancer.
- Table 8 below provides a normal range for well-validated tumor markers; measurement of a given marker above the normal range would indicate an increased likelihood of developing cancer.
- the present male classifier model according to Example 1, and used in Example 3, provides a significant improvement to sensitivity and specificity for predicting cancer as compared to “any marker high” methods. See FIG. 5 .
Biomarker | Normal Range | Cancers |
---|---|---|
AFP | <8.3 ng/ml | Liver cancer, testicular and ovarian cancers |
CA 19-9 | <35 U/ml | Pancreatic, colorectal, stomach, liver and bile duct cancer |
CEA | <4.7 ng/ml (non-smokers); <5.6 ng/ml (smokers) | Colorectal, pancreatic, gastrointestinal cancers, lung cancer |
CYFRA 21-1 | <3.3 ng/ml | Lung, H&N cancer, uterine cancer, esophagus cancer, bladder cancer, mesothelioma, some lymphomas and sarcomas |
PSA | <4 ng/ml | Prostate cancer |
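The conventional "any marker high" rule that the classifier is compared against can be sketched as a simple threshold check. The limits below are the numeric values listed in Table 8; reading each as an upper bound of the normal range (normal meaning strictly below the limit) is an assumption, because the comparison symbol is not legible in the extracted table. The names and data layout are hypothetical.

```python
# Upper limits of the normal range, taken from the Table 8 values.
UPPER_LIMITS = {
    "AFP": 8.3,        # ng/ml
    "CA19-9": 35.0,    # U/ml
    "CYFRA21-1": 3.3,  # ng/ml
    "PSA": 4.0,        # ng/ml
}
CEA_LIMIT = {"non_smoker": 4.7, "smoker": 5.6}  # ng/ml


def any_marker_high(measured, smoker=False):
    """Return True if any measured marker is at or above its limit.

    measured: mapping of marker name to measured value; markers without a
              listed limit are ignored.
    Treating a value equal to the limit as high is an assumption.
    """
    limits = dict(UPPER_LIMITS)
    limits["CEA"] = CEA_LIMIT["smoker" if smoker else "non_smoker"]
    return any(
        value >= limits[name]
        for name, value in measured.items()
        if name in limits
    )
```

Unlike the classifier, this rule looks at each marker in isolation and ignores age, which is why a patient whose markers all sit within the normal range is never flagged.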
- the present male classifier model provides a substantial improvement in diagnostic accuracy over conventional methods, e.g., any marker high methods; an improvement in sensitivity is demonstrated wherein 2× more cancers in males are detected. Moreover, the present male classifier model was able to distinguish cancers from noncancers with 82% sensitivity and 81% specificity. See FIG. 6 . In this figure, the cutoff between low risk and moderate or high risk was 50, or 0.5. The risk score may be provided from 0 to 1, or 0 to 100.
- Example 5 Female Classifier Model is Superior to a Single Threshold Method of Measuring Biomarkers for Prediction of Cancer
- the present female classifier model as developed in Example 1, is significantly better at predicting cancer development within one year than measurement of a panel of individual biomarkers from the same subjects.
- the present female classifier model improves on the individual biomarker “single threshold” method, wherein the sensitivity represents a 4-fold increase as compared to the single threshold method.
- the present female classifier model identifies 4× more cancers in female patients as compared to the conventional methods of “any marker high”. See FIG. 7 .
- Table 9 provides a normal range for well-validated tumor markers; measurement of a given marker above the normal range would indicate an increased likelihood of developing cancer using conventional methods.
Biomarker | Normal Range | Cancers |
---|---|---|
AFP | <8.3 ng/ml | Liver cancer, testicular and ovarian cancers |
CA 19-9 | <35 U/ml | Pancreatic, colorectal, stomach, liver and bile duct cancer |
CEA | <4.7 ng/ml (non-smokers); <5.6 ng/ml (smokers) | Colorectal, pancreatic, gastrointestinal cancers, lung cancer |
CYFRA 21-1 | <3.3 ng/ml | Lung, H&N cancer, uterine cancer, esophagus cancer, bladder cancer, mesothelioma, some lymphomas and sarcomas |
CA 125 | <38 U/ml | Ovarian and lung cancers |
CA15-3 | <25 U/ml | Breast cancer |
- the present female classifier model provides a substantial improvement in diagnostic accuracy over conventional methods, e.g., any marker high methods; an improvement in sensitivity is demonstrated wherein 4× more cancers in females are detected. Moreover, the present female classifier model was able to distinguish cancers from noncancers with 50% sensitivity and 74% specificity. See FIG. 8 . In this figure, the cutoff between low risk and moderate or high risk was 50, or 0.5.
- the risk score may be provided from 0 to 1, or 0 to 100, or as X out of 100 patients (i.e., of the patients in the population used to develop the algorithm who scored at or above a given score, X out of 100 were diagnosed with cancer within one year of having these biomarkers tested).
- a heterogeneous population has a cancer incidence of 1 out of 100, wherein any risk score of 1 out of 100 is considered normal risk, or not an increased risk.
- a risk score of 2 out of 100, or greater, classifies a patient in an increased risk category.
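The "X out of 100" risk score and its comparison to baseline can be expressed as a short helper. The cutoff of 2 out of 100 and the baseline of 1 out of 100 come from the text; the function name and the returned fields are assumptions for illustration.

```python
BASELINE_PER_100 = 1.0        # cancer incidence in a heterogeneous population
INCREASED_RISK_CUTOFF = 2.0   # scores at or above this indicate increased risk


def interpret_risk_score(cases_per_100: float) -> dict:
    """Interpret a risk score expressed as cases per 100 patients."""
    return {
        "increased_risk": cases_per_100 >= INCREASED_RISK_CUTOFF,
        # How many times the baseline incidence the score represents.
        "fold_over_baseline": cases_per_100 / BASELINE_PER_100,
    }
```

A score of 5 out of 100, for example, is flagged as increased risk at five times baseline, while a score of 1 out of 100 is not flagged.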
- Example 6 Screening Patients for Likelihood of Developing Cancer and Identifying Patients with an Increased Risk of Developing Cancer when all Measured Biomarkers are in the Normal Range
- this method and present classifier model uses input variables of measured biomarkers that are within a normal clinical range, wherein the pan cancer classifier model classifies the patient in an increased risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the first classifier model is above a threshold.
- asymptomatic patients (2 male and 2 female) were screened using the pan cancer test according to Example 1 and Example 3.
- the biomarkers of Table 8 were measured within the normal range; however, the present male classifier model classified both patients in an increased risk category using a threshold of 1% (the cancer rate in a heterogeneous population).
- One patient (mp #1) was classified as having an increased risk of having cancer as 5 out of 100 (positive predictive value) and the other (mp #2) was classified as having an increased risk of having cancer as 12 out of 100.
- Mp #1 was subsequently diagnosed with stage 1 liver cancer and mp #2 was subsequently diagnosed with stage 1 bladder cancer.
- the present male classifier model classified the male patients at high risk, whereas a result in which all tumor markers are low would not normally raise concern.
- the biomarkers of Table 9 were measured within the normal range; however, the present female classifier model classified both patients in an increased risk category using a threshold of 1% (the cancer rate in a heterogeneous population).
- One patient (fp #1) was classified as having an increased risk of having cancer as 2 out of 100 (positive predictive value) and the other (fp #2) was classified as having an increased risk of having cancer as 3 out of 100.
- Fp # was subsequently diagnosed with stage 1B lung cancer and fp #2 was subsequently diagnosed with stage 2B breast cancer.
- the present female classifier model classified the female patients at high risk, whereas a result in which all tumor markers are low would not normally raise concern.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/458,589 US20200005901A1 (en) | 2018-06-30 | 2019-07-01 | Cancer classifier models, machine learning systems and methods of use |
US18/213,882 US20240040068A1 (en) | 2018-10-29 | 2023-06-25 | Fast and/or slow motion compensating timer display |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862692683P | 2018-06-30 | 2018-06-30 | |
US16/458,589 US20200005901A1 (en) | 2018-06-30 | 2019-07-01 | Cancer classifier models, machine learning systems and methods of use |
Publications (1)
Publication Number | Publication Date |
---|---|
US20200005901A1 (en) | 2020-01-02 |
Family
ID=68987635
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/458,589 Pending US20200005901A1 (en) | 2018-06-30 | 2019-07-01 | Cancer classifier models, machine learning systems and methods of use |
Country Status (4)
Country | Link |
---|---|
US (1) | US20200005901A1 (en) |
JP (1) | JP7431760B2 (ja) |
CN (1) | CN112970067A (zh) |
WO (1) | WO2020006547A1 (en) |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222575A (zh) * | 2020-01-07 | 2020-06-02 | 北京联合大学 | KLXS multi-model fusion method and system based on HRRP target recognition |
US20200185059A1 (en) * | 2018-12-10 | 2020-06-11 | Grail, Inc. | Systems and methods for classifying patients with respect to multiple cancer classes |
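CN111276243A (zh) * | 2020-01-22 | 2020-06-12 | 首都医科大学附属北京佑安医院 | Biomarker-based multivariate classification system and method |
CN111584064A (zh) * | 2020-03-27 | 2020-08-25 | 湖州市中心医院 | Colorectal cancer metastasis prediction system and method of use thereof |
CN111583993A (zh) * | 2020-05-29 | 2020-08-25 | 杭州广科安德生物科技有限公司 | Method for constructing a mathematical model for in vitro cancer detection and application thereof |
CN112259221A (zh) * | 2020-10-21 | 2021-01-22 | 北京大学第一医院 | Lung cancer diagnosis system based on multiple machine learning algorithms |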
US20210057100A1 (en) * | 2019-08-22 | 2021-02-25 | Kenneth Neumann | Methods and systems for generating a descriptor trail using artificial intelligence |
US20210057099A1 (en) * | 2019-08-22 | 2021-02-25 | Kenneth Neumann | Methods and systems for generating a descriptor trail using artificial intelligence |
CN112652361A (zh) * | 2020-12-29 | 2021-04-13 | 中国医科大学附属盛京医院 | GBDT model-based high-risk screening method for myeloma and application thereof |
US20210241046A1 (en) * | 2019-11-26 | 2021-08-05 | University Of North Texas | Compositions and methods for cancer detection and classification using neural networks |
WO2021206925A1 (en) * | 2020-04-06 | 2021-10-14 | General Genomics, Inc. | Predicting susceptibility of living organisms to medical conditions using machine learning models |
CN113539493A (zh) * | 2021-06-23 | 2021-10-22 | 吾征智能技术(北京)有限公司 | System for inferring cancer risk probability using multimodal risk factors |
US20210345925A1 (en) * | 2018-09-21 | 2021-11-11 | Carnegie Mellon University | A data processing system for detecting health risks and causing treatment responsive to the detection |
WO2021247577A1 (en) * | 2020-06-01 | 2021-12-09 | 2020 Genesystems | Methods and software systems to optimize and personalize the frequency of cancer screening blood tests |
CN113913518A (zh) * | 2021-08-31 | 2022-01-11 | 广州市金域转化医学研究院有限公司 | Typing markers for mature B-cell tumors and application thereof |
WO2022015700A1 (en) * | 2020-07-13 | 2022-01-20 | 20/20 GeneSystems | Universal pan cancer classifier models, machine learning systems and methods of use |
US20220084632A1 (en) * | 2019-06-27 | 2022-03-17 | Veracyte, Inc. | Clinical classfiers and genomic classifiers and uses thereof |
CN114974589A (zh) * | 2022-06-10 | 2022-08-30 | 燕山大学 | Cervical cancer prediction method |
US11475302B2 (en) * | 2019-04-05 | 2022-10-18 | Koninklijke Philips N.V. | Multilayer perceptron based network to identify baseline illness risk |
US11487608B2 (en) * | 2018-12-11 | 2022-11-01 | Rovi Guides, Inc. | Entity resolution framework for data matching |
WO2022251633A1 (en) * | 2021-05-28 | 2022-12-01 | University Of Southern California | A radiomic-based machine learing algorithm to reliably differentiate benign renal masses from renal carcinoma |
WO2022241264A3 (en) * | 2021-05-13 | 2023-01-26 | Arizona Board Of Regents On Behalf Of The University Of Arizona | Method of targeted multi-panel approach and tiered a.i. use for differential diagnosis and prognosis |
CN116259414A (zh) * | 2023-05-09 | 2023-06-13 | 南京诺源医疗器械有限公司 | Metastatic lymph node discrimination model, construction method and application |
US20230207128A1 (en) * | 2021-12-29 | 2023-06-29 | AiOnco, Inc. | Processing encrypted data for artificial intelligence-based analysis |
US20230243830A1 (en) * | 2020-10-05 | 2023-08-03 | Freenome Holdings, Inc. | Markers for the early detection of colon cell proliferative disorders |
CN116779179A (zh) * | 2023-08-22 | 2023-09-19 | 聊城市第二人民医院 | Support vector machine-based background information analysis system for renal cell tumors |
US11783915B2 (en) | 2018-06-01 | 2023-10-10 | Grail, Llc | Convolutional neural network systems and methods for data classification |
TWI818203B (zh) * | 2020-10-23 | 2023-10-11 | 國立臺灣大學醫學院附設醫院 | Method for establishing a classification model based on patient condition |
US11817214B1 (en) | 2019-09-23 | 2023-11-14 | FOXO Labs Inc. | Machine learning model trained to determine a biochemical state and/or medical condition using DNA epigenetic data |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11621080B2 (en) * | 2014-12-08 | 2023-04-04 | 20/20 GeneSystems | Methods and machine learning systems for predicting the likelihood or risk of having cancer |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5983211A (en) * | 1996-01-24 | 1999-11-09 | Heseltine; Gary L. | Method and apparatus for the diagnosis of colorectal cancer |
US20090061422A1 (en) * | 2005-04-19 | 2009-03-05 | Linke Steven P | Diagnostic markers of breast cancer treatment and progression and methods of use thereof |
KR101401561B1 (ko) * | 2010-12-30 | 2014-06-11 | 주식회사 바이오인프라 | Method for generating cancer diagnosis information using composite biomarkers, and cancer diagnosis prediction system apparatus |
IL278227B (en) * | 2011-04-29 | 2022-07-01 | Cancer Prevention & Cure Ltd | Data classification systems for identifying biomarkers and diagnosing diseases |
US9753043B2 (en) * | 2011-12-18 | 2017-09-05 | 20/20 Genesystems, Inc. | Methods and algorithms for aiding in the detection of cancer |
US9753037B2 (en) * | 2013-03-15 | 2017-09-05 | Rush University Medical Center | Biomarker panel for detecting lung cancer |
WO2015066564A1 (en) * | 2013-10-31 | 2015-05-07 | Cancer Prevention And Cure, Ltd. | Methods of identification and diagnosis of lung diseases using classification systems and kits thereof |
DK3071973T3 (da) * | 2013-11-21 | 2021-01-11 | Pacific Edge Ltd | Triage of patients with asymptomatic hematuria using genotypic and phenotypic biomarkers |
TWI630501B (zh) * | 2016-07-29 | 2018-07-21 | 長庚醫療財團法人林口長庚紀念醫院 | Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set |
-
2019
- 2019-07-01 WO PCT/US2019/040075 patent/WO2020006547A1/en active Application Filing
- 2019-07-01 US US16/458,589 patent/US20200005901A1/en active Pending
- 2019-07-01 CN CN201980056329.0A patent/CN112970067A/zh active Pending
- 2019-07-01 JP JP2020573269A patent/JP7431760B2/ja active Active
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11621080B2 (en) * | 2014-12-08 | 2023-04-04 | 20/20 GeneSystems | Methods and machine learning systems for predicting the likelihood or risk of having cancer |
Non-Patent Citations (5)
Title |
---|
Cairns, S. R., British Society of Gastroenterology, & Association of Coloproctology for Great Britain and Ireland (2010). Guidelines for colorectal cancer screening and surveillance in moderate and high risk groups (update from 2002). Gut, 59(5), 666–689. (Year: 2010) * |
Kovalchik, Stephanie A., et al. "A regression model for risk difference estimation in population-based case–control studies clarifies gender differences in lung cancer risk of smokers and never smokers." BMC medical research methodology 13.1 (2013): 1-8 (Year: 2013) * |
Prescott, Eva, et al. "Gender and smoking-related risk of lung cancer." Epidemiology (1998): 79-83 (Year: 1998) * |
Wen, Y. H., Chang, P. Y., Hsu, C. M., Wang, H. Y., Chiu, C. T., & Lu, J. J. (2015). Cancer screening through a multi-analyte serum biomarker panel during health check-up examinations: Results from a 12-year experience. Clinica chimica acta; international journal of clinical chemistry, 450, 273–276 (Year: 2015) * |
Yan, S., Qian, W., Guan, Y., & Zheng, B. (2016). Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method. Medical physics, 43(6) (Year: 2016) * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11783915B2 (en) | 2018-06-01 | 2023-10-10 | Grail, Llc | Convolutional neural network systems and methods for data classification |
US20210345925A1 (en) * | 2018-09-21 | 2021-11-11 | Carnegie Mellon University | A data processing system for detecting health risks and causing treatment responsive to the detection |
US20200185059A1 (en) * | 2018-12-10 | 2020-06-11 | Grail, Inc. | Systems and methods for classifying patients with respect to multiple cancer classes |
US11581062B2 (en) * | 2018-12-10 | 2023-02-14 | Grail, Llc | Systems and methods for classifying patients with respect to multiple cancer classes |
US11487608B2 (en) * | 2018-12-11 | 2022-11-01 | Rovi Guides, Inc. | Entity resolution framework for data matching |
US11475302B2 (en) * | 2019-04-05 | 2022-10-18 | Koninklijke Philips N.V. | Multilayer perceptron based network to identify baseline illness risk |
US20220084632A1 (en) * | 2019-06-27 | 2022-03-17 | Veracyte, Inc. | Clinical classfiers and genomic classifiers and uses thereof |
US20210057100A1 (en) * | 2019-08-22 | 2021-02-25 | Kenneth Neumann | Methods and systems for generating a descriptor trail using artificial intelligence |
US20210057099A1 (en) * | 2019-08-22 | 2021-02-25 | Kenneth Neumann | Methods and systems for generating a descriptor trail using artificial intelligence |
US11810669B2 (en) * | 2019-08-22 | 2023-11-07 | Kenneth Neumann | Methods and systems for generating a descriptor trail using artificial intelligence |
US11581094B2 (en) * | 2019-08-22 | 2023-02-14 | Kpn Innovations, Llc. | Methods and systems for generating a descriptor trail using artificial intelligence |
US11817214B1 (en) | 2019-09-23 | 2023-11-14 | FOXO Labs Inc. | Machine learning model trained to determine a biochemical state and/or medical condition using DNA epigenetic data |
US20210241046A1 (en) * | 2019-11-26 | 2021-08-05 | University Of North Texas | Compositions and methods for cancer detection and classification using neural networks |
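CN111222575A (zh) * | 2020-01-07 | 2020-06-02 | 北京联合大学 | KLXS multi-model fusion method and system based on HRRP target recognition |
CN111276243A (zh) * | 2020-01-22 | 2020-06-12 | 首都医科大学附属北京佑安医院 | Biomarker-based multivariate classification system and method |
CN111584064A (zh) * | 2020-03-27 | 2020-08-25 | 湖州市中心医院 | Colorectal cancer metastasis prediction system and method of use thereof |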
WO2021206925A1 (en) * | 2020-04-06 | 2021-10-14 | General Genomics, Inc. | Predicting susceptibility of living organisms to medical conditions using machine learning models |
CN111583993A (zh) * | 2020-05-29 | 2020-08-25 | 杭州广科安德生物科技有限公司 | Method for constructing a mathematical model for in vitro cancer detection and application thereof |
WO2021247577A1 (en) * | 2020-06-01 | 2021-12-09 | 2020 Genesystems | Methods and software systems to optimize and personalize the frequency of cancer screening blood tests |
WO2022015700A1 (en) * | 2020-07-13 | 2022-01-20 | 20/20 GeneSystems | Universal pan cancer classifier models, machine learning systems and methods of use |
US20230243830A1 (en) * | 2020-10-05 | 2023-08-03 | Freenome Holdings, Inc. | Markers for the early detection of colon cell proliferative disorders |
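CN112259221A (zh) * | 2020-10-21 | 2021-01-22 | 北京大学第一医院 | Lung cancer diagnosis system based on multiple machine learning algorithms |
TWI818203B (zh) * | 2020-10-23 | 2023-10-11 | 國立臺灣大學醫學院附設醫院 | Method for establishing a classification model based on patient condition |
CN112652361A (zh) * | 2020-12-29 | 2021-04-13 | 中国医科大学附属盛京医院 | GBDT model-based high-risk screening method for myeloma and application thereof |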
WO2022241264A3 (en) * | 2021-05-13 | 2023-01-26 | Arizona Board Of Regents On Behalf Of The University Of Arizona | Method of targeted multi-panel approach and tiered a.i. use for differential diagnosis and prognosis |
WO2022251633A1 (en) * | 2021-05-28 | 2022-12-01 | University Of Southern California | A radiomic-based machine learing algorithm to reliably differentiate benign renal masses from renal carcinoma |
CN113539493A (zh) * | 2021-06-23 | 2021-10-22 | 吾征智能技术(北京)有限公司 | System for inferring cancer risk probability using multimodal risk factors |
CN113913518A (zh) * | 2021-08-31 | 2022-01-11 | 广州市金域转化医学研究院有限公司 | Typing markers for mature B-cell tumors and application thereof |
US20230207128A1 (en) * | 2021-12-29 | 2023-06-29 | AiOnco, Inc. | Processing encrypted data for artificial intelligence-based analysis |
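CN114974589A (zh) * | 2022-06-10 | 2022-08-30 | 燕山大学 | Cervical cancer prediction method |
CN116259414A (zh) * | 2023-05-09 | 2023-06-13 | 南京诺源医疗器械有限公司 | Metastatic lymph node discrimination model, construction method and application |
CN116779179A (zh) * | 2023-08-22 | 2023-09-19 | 聊城市第二人民医院 | Support vector machine-based background information analysis system for renal cell tumors |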
Also Published As
Publication number | Publication date |
---|---|
JP7431760B2 (ja) | 2024-02-15 |
CN112970067A (zh) | 2021-06-15 |
JP2021529954A (ja) | 2021-11-04 |
WO2020006547A1 (en) | 2020-01-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7431760B2 (ja) | Cancer classifier models, machine learning systems, and methods of use | |
US20240112811A1 (en) | Methods and machine learning systems for predicting the likelihood or risk of having cancer | |
Xiao et al. | Comparison and development of machine learning tools in the prediction of chronic kidney disease progression | |
JP7250693B2 (ja) | Plasma-based protein profiling for early-stage lung cancer diagnosis | |
US20230263477A1 (en) | Universal pan cancer classifier models, machine learning systems and methods of use | |
US20190072554A1 (en) | Methods of Identification and Diagnosis of Lung Diseases Using Classification Systems and Kits Thereof | |
Ostrin et al. | Contribution of a blood-based protein biomarker panel to the classification of indeterminate pulmonary nodules | |
Kiessling | The changing face of cancer diagnosis: from computational image analysis to systems biology | |
US20230243830A1 (en) | Markers for the early detection of colon cell proliferative disorders | |
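CN113270188A (zh) | Method and device for constructing a prognosis prediction model for patients after radical resection of esophageal squamous cell carcinoma | |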
Rashid et al. | Artificial intelligence in acute respiratory distress syndrome: A systematic review | |
Tang et al. | Diagnosis of hepatocellular carcinoma based on salivary protein glycopatterns and machine learning algorithms | |
CA3202255A1 (en) | Markers for the early detection of colon cell proliferative disorders | |
He et al. | A novel clinical model for predicting malignancy of solitary pulmonary nodules: a multicenter study in Chinese population | |
Wang et al. | Survival risk prediction model for ESCC based on relief feature selection and CNN | |
US20230223145A1 (en) | Methods and software systems to optimize and personalize the frequency of cancer screening blood tests | |
Popa et al. | A new approach to predict ulcerative colitis activity through standard clinical–biological parameters using a robust neural network model | |
US20130080101A1 (en) | System, method and computer-accessible medium for evaluating a malignancy status in at-risk populations and during patient treatment management | |
Kanellakis et al. | Management of incidental nodules in lung cancer screening: ready for prime-time? | |
Yadav et al. | Artificial Intelligence: A Promising Tool in Diagnosis of Respiratory Diseases | |
Nayak et al. | Computational Intelligence in Cancer Diagnosis: Progress and Challenges | |
Liu et al. | Detection of Nasopharyngeal Carcinoma Using Routine Medical Tests via Machine Learning | |
CN117831690A (zh) | Computer-implemented method for quantitative detection of abnormal signals in a blood sample under test | |
Gray | Validating and Updating Lung Cancer Prediction Models | |
CN115862838A (zh) | Machine learning algorithm-based cholangiocarcinoma diagnosis model, construction method and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: ADVISORY ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |